List:Commits« Previous MessageNext Message »
From:Mats Kindahl Date:February 11 2008 7:27pm
Subject:Re: bk commit into 5.0 tree (aelkin:1.2576) BUG#33931
View as plain text  
Hi Andrei!

Good that you fix this, please check my comments and questions below.

Just my few cents,
Mats Kindahl

Andrei Elkin wrote:
> Below is the list of changes that have just been committed into a local
> 5.0 repository of aelkin.  When aelkin does a push these changes
> will be propagated to the main repository and, within 24 hours after the
> push, to the public repository.
> For information on how to access the public repository
> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>
> ChangeSet@stripped, 2008-02-04 23:57:56+02:00, aelkin@stripped +3 -0
>   Bug #33931 assertion at write_ignored_events_info_to_relay_log if
> init_slave_thread() fails
>   and
>   bug#33932  assertion at handle_slave_sql if init_slave_thread() fails
>   
>   the asserts were caused by 
>     bug33931: having thd deleted at time of executing err: code plus
>               a missed initialization;
>     bug33932: initialization of slave_is_running member was missed;
>   
>   fixed with relocating mi members initialization and removing delete thd
>   It is safe to do as deletion happens later.
>   

Perhaps say where the "later" delete is done, just to be clear. See my 
comment below.

>   
>   Todo: at merging the test is better to be moved into suite/bugs for 5.x (when
> x>0).
>
>   mysql-test/t/rpl_bug33931-slave.opt@stripped, 2008-02-04 23:57:50+02:00,
> aelkin@stripped +1 -0
>     option to spark the simulation code
>
>   mysql-test/t/rpl_bug33931-slave.opt@stripped, 2008-02-04 23:57:50+02:00,
> aelkin@stripped +0 -0
>
>   mysql-test/t/rpl_bug33931.test@stripped, 2008-02-04 23:57:54+02:00,
> aelkin@stripped +33 -0
>     tests check that slave does not crash as before.
>     Slave threads must be in NO running state in the end.
>     
>
>   mysql-test/t/rpl_bug33931.test@stripped, 2008-02-04 23:57:54+02:00,
> aelkin@stripped +0 -0
>
>   sql/slave.cc@stripped, 2008-02-04 23:57:49+02:00, aelkin@stripped +12 -4
>     adding the bugs simulating code;
>     relocating some assignments to satisfy the asserts;
>   

You're missing a result file for the test.

> diff -Nrup a/mysql-test/t/rpl_bug33931-slave.opt
> b/mysql-test/t/rpl_bug33931-slave.opt
> --- /dev/null	Wed Dec 31 16:00:00 196900
> +++ b/mysql-test/t/rpl_bug33931-slave.opt	2008-02-04 23:57:50 +02:00
> @@ -0,0 +1 @@
> +--loose-debug=d,simulate_io_slave_error,simulate_sql_slave_error
>   

Since you are using --loose-debug, it means that the flag might not be 
set (if the server was not built with debug info), and if you are 
relying on the flag to be set for the slave to stop, this means that the 
test will fail for any build that is not debug built.

Now, since you don't have an explicit "stop slave" or other statement to 
generate an error...

> diff -Nrup a/mysql-test/t/rpl_bug33931.test b/mysql-test/t/rpl_bug33931.test
> --- /dev/null	Wed Dec 31 16:00:00 196900
> +++ b/mysql-test/t/rpl_bug33931.test	2008-02-04 23:57:54 +02:00
> @@ -0,0 +1,33 @@
> +# Test for 
> +# Bug #33931  	assertion at write_ignored_events_info_to_relay_log if
> init_slave_thread() fails
> +# Bug #33932  	assertion at handle_slave_sql if init_slave_thread() fails
> +
> +source include/have_log_bin.inc;
> +
> +connect (master,127.0.0.1,root,,test,$MASTER_MYPORT,);
> +connect (slave,127.0.0.1,root,,test,$SLAVE_MYPORT,);
> +
> +
> +connection master;
> +reset master;
> +
> +connection slave;
> +--disable_warnings
> +stop slave;
> +--enable_warnings
> +reset slave;
> +start slave;
> +
>   
... between here and ...

> +connection master;
> +save_master_pos;
> +connection slave;
> +
>   

... here, it means that if the server is not debug built, the slave will 
not stop...

> +source include/wait_for_slave_to_stop.inc;
>   

... here, which in turn will cause the test to fail. If you really want 
the test to only be executed on debug build, and ignored otherwise, you 
need to add "source include/have_debug.inc" in the beginning of the file.


> +
> +--replace_result $MASTER_MYPORT MASTER_PORT                                     
> +--replace_column 1 # 8 # 9 # 23 # 33 #                                          
> +--vertical_results                                                              
> +show slave status; 
>   

I suggest using "query_vertical show slave status;"

"vertical_results" set *all* the following displays to vertical output, 
while query vertical just does it for the given statement. Removing the 
"--" before and adding ";" last on the line allow syntax errors to be 
caught. If there is a syntax error on a "--" line, it will just be 
treated as a comment, giving strange and hard to find behavior.

Also, check the include file show_slave_status.inc to see if it 
suffices. It blanks out a set of fields that are usually uninteresting 
when checking slave status.

> +
> +# no clean-up is needed
> +
> diff -Nrup a/sql/slave.cc b/sql/slave.cc
> --- a/sql/slave.cc	2007-12-20 17:07:52 +02:00
> +++ b/sql/slave.cc	2008-02-04 23:57:49 +02:00
> @@ -2895,6 +2895,9 @@ void set_slave_thread_default_charset(TH
>  static int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type)
>  {
>    DBUG_ENTER("init_slave_thread");
> +#if !defined(DBUG_OFF)
> +  int simulate_error= 0;
> +#endif
>    thd->system_thread = (thd_type == SLAVE_THD_SQL) ?
>      SYSTEM_THREAD_SLAVE_SQL : SYSTEM_THREAD_SLAVE_IO; 
>    thd->security_ctx->skip_grants();
> @@ -2914,10 +2917,15 @@ static int init_slave_thread(THD* thd, S
>    thd->thread_id = thread_id++;
>    pthread_mutex_unlock(&LOCK_thread_count);
>  
> +  DBUG_EXECUTE_IF("simulate_io_slave_error", simulate_error|= (1 <<
> SLAVE_THD_IO););
> +  DBUG_EXECUTE_IF("simulate_sql_slave_error", simulate_error|= (1 <<
> SLAVE_THD_SQL););
>   

Maybe a little more elaborate name so that it can be seen *where* it 
will get an error. It is conceivable that there might be various kinds 
of errors for the I/O and SQL thread causing it to stop, so something 
more descriptive is better.

It is, after all, just debug code, so the length of the name does not 
matter.

Just to give an idea: "slave_io_thread_error_on_init".

> +#if !defined(DBUG_OFF)
> +  if (init_thr_lock() || thd->store_globals() || simulate_error & (1<<
> thd_type))
> +#else
>    if (init_thr_lock() || thd->store_globals())
> +#endif
>    {
>      thd->cleanup();
> -    delete thd;
>   

Uh, where is the delete of the THD structure done now?

>      DBUG_RETURN(-1);
>    }
>  
> @@ -3515,6 +3523,7 @@ slave_begin:
>  
>    thd= new THD; // note that contructor of THD uses DBUG_ !
>    THD_CHECK_SENTRY(thd);
> +  mi->io_thd = thd;
>   

Why did you move this one?

>  
>    pthread_detach_this_thread();
>    thd->thread_stack= (char*) &thd; // remember where our stack is
> @@ -3525,7 +3534,6 @@ slave_begin:
>      sql_print_error("Failed during slave I/O thread initialization");
>      goto err;
>    }
> -  mi->io_thd = thd;
>    pthread_mutex_lock(&LOCK_thread_count);
>    threads.append(thd);
>    pthread_mutex_unlock(&LOCK_thread_count);
> @@ -3865,9 +3873,11 @@ slave_begin:
>  
>    thd = new THD; // note that contructor of THD uses DBUG_ !
>    thd->thread_stack = (char*)&thd; // remember where our stack is
> +  rli->sql_thd= thd;
>   

... and why did you move this one? Both moves seems to be arbitrary, but 
might not be.

>    
>    /* Inform waiting threads that slave has started */
>    rli->slave_run_id++;
> +  rli->slave_running = 1;
>  
>    pthread_detach_this_thread();
>    if (init_slave_thread(thd, SLAVE_THD_SQL))
> @@ -3882,7 +3892,6 @@ slave_begin:
>      goto err;
>    }
>    thd->init_for_queries();
> -  rli->sql_thd= thd;
>    thd->temporary_tables = rli->save_temporary_tables; // restore temp tables
>    pthread_mutex_lock(&LOCK_thread_count);
>    threads.append(thd);
> @@ -3895,7 +3904,6 @@ slave_begin:
>      start receiving data so we realize we are not caught up and
>      Seconds_Behind_Master grows. No big deal.
>    */
> -  rli->slave_running = 1;
>    rli->abort_slave = 0;
>    pthread_mutex_unlock(&rli->run_lock);
>    pthread_cond_broadcast(&rli->start_cond);
>
>   


-- 
Mats Kindahl
Lead Software Developer
Replication Team
MySQL AB, www.mysql.com


Thread
bk commit into 5.0 tree (aelkin:1.2576) BUG#33931Andrei Elkin4 Feb
  • Re: bk commit into 5.0 tree (aelkin:1.2576) BUG#33931Sven Sandberg11 Feb
  • Re: bk commit into 5.0 tree (aelkin:1.2576) BUG#33931Mats Kindahl11 Feb
    • Re: bk commit into 5.0 tree (aelkin:1.2576) BUG#33931Andrei Elkin13 Feb