List:Commits« Previous MessageNext Message »
From:Andrei Elkin Date:May 8 2008 7:32pm
Subject:Re: bk commit into 6.0 tree (aelkin:1.2646) BUG#34654
View as plain text  
Sven Sandberg <sven@stripped> writes:

> Hi Andrei!
>
> Thanks for fixing this.
>
> As we discussed on IRC, please replace 'while (!io_slave_killed)' by
> if (!io_slave_killed)' in the outer loop of handle_slave_io.
>

I could not help it and changed further to eliminate the outermost
while altogether.

Please look at newer patch.
Just hope you will stop me at this point :)

> There is one place where errors are cleared redundantly. I would also
> like more explicit comments in the test case, and a test case for
> Last_SQL_Err*. See below.
>
> /Sven
>
>
> Andrei Elkin wrote:
>> Below is the list of changes that have just been committed into a local
>> 6.0 repository of aelkin.  When aelkin does a push these changes
>> will be propagated to the main repository and, within 24 hours after the
>> push, to the public repository.
>> For information on how to access the public repository
>> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>>
>> ChangeSet@stripped, 2008-05-01 16:26:05+03:00, aelkin@stripped +5 -0
>>   Bug #34654 RESET SLAVE and START SLAVE does not clear Last_IO_Errno
>>     There were no resetting of the io thread's error status at
>> reset_slave() and   at the beginning of handle_slave_io() that
>> corresponds to start slave.
>>     Fixed with deploying clean-up at necessary places of the code.
>>
>>   mysql-test/extra/rpl_tests/rpl_reset_slave.test@stripped, 2008-05-01
> 16:26:01+03:00, aelkin@stripped +51 -0
>>     adding regression test for bug#34654 that tests clearing the IO error status
>>     for start slave and reset slave.
>>
>>   mysql-test/suite/rpl/r/rpl_row_reset_slave.result@stripped, 2008-05-01
> 16:26:01+03:00, aelkin@stripped +21 -0
>>     results changed
>>
>>   mysql-test/suite/rpl/r/rpl_stm_reset_slave.result@stripped, 2008-05-01
> 16:26:01+03:00, aelkin@stripped +21 -0
>>     results changed
>>
>>   sql/slave.cc@stripped, 2008-05-01 16:26:01+03:00, aelkin@stripped +3
> -0
>>     explicitly clearing the last mi's error at
>>         1. entering handle_slave_io
>>     2. getting to the read loop after passing successfully
>>        connecting, checking master version etc, registering and
>>        requesting for the dump
>>     3. right after successful connecting
>>
>>   sql/sql_repl.cc@stripped, 2008-05-01 16:26:01+03:00, aelkin@stripped
> +1 -0
>>     reset slave now resets the io thread error.
>>
>> diff -Nrup a/mysql-test/extra/rpl_tests/rpl_reset_slave.test
> b/mysql-test/extra/rpl_tests/rpl_reset_slave.test
>> --- a/mysql-test/extra/rpl_tests/rpl_reset_slave.test	2008-02-05 15:29:36 +02:00
>> +++ b/mysql-test/extra/rpl_tests/rpl_reset_slave.test	2008-05-01 16:26:01 +03:00
>> @@ -42,3 +42,54 @@ reset slave;
>>  start slave;
>>  sync_with_master;
>>  show status like 'slave_open_temp_tables';
>> +
>> +#
>> +#Bug#34654  	RESET SLAVE does not clear LAST_IO_Err* +#
>> +
>> +# clearing the status
>> +stop slave;
>> +reset slave;
>> +let $last_io_errno= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>> +echo *** errno must be zero: $last_io_errno ***;
>> +
>> +#
>> +# verifying start slave resets
>> +#
>
> Please be more explicit in the comment describing the purpose of the
> test. You can write something like:
>
> # Verify that "START SLAVE" resets Last_IO_Error and Last_IO_Errno.
>
>> +
>> +change master to master_user='impossible_user_name';
>> +start slave;
>> +source include/wait_for_slave_io_to_stop.inc;
>> +let $last_io_errno= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>> +--disable_query_log
>> +eval SELECT $last_io_errno > 0 as ONE;
>> +--enable_query_log
>> +
>> +stop slave;
>> +change master to master_user='root';
>> +start slave;
>> +source include/wait_for_slave_to_start.inc;
>> +let $last_io_errno= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>> +let $last_io_error= query_get_value(SHOW SLAVE STATUS, Last_IO_Error, 1);
>> +--echo *** last errno must be  zero: $last_io_errno ***
>> +--echo *** last error must be blank: $last_io_error ***
>> +
>> +#
>> +# verifying reset slave resets
>> +#
>
> Please be more explicit in the comment (see above).
>
>> +
>> +stop slave;
>> +change master to master_user='impossible_user_name';
>> +start slave;
>> +source include/wait_for_slave_io_to_stop.inc;
>> +let $last_io_errno= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>> +--disable_query_log
>> +eval SELECT $last_io_errno > 0 as ONE;
>> +--enable_query_log
>> +
>> +stop slave;
>> +reset slave;
>> +let $last_io_errno= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>> +let $last_io_error= query_get_value(SHOW SLAVE STATUS, Last_IO_Error, 1);
>> +--echo *** last errno must be  zero: $last_io_errno ***
>> +--echo *** last error must be blank: $last_io_error ***
>
> I think the test should also verify that "RESET SLAVE" resets Last_SQL_Err*.
>

Well, the fixes have not done anything in that regard but why not.

Added.

>> diff -Nrup a/mysql-test/suite/rpl/r/rpl_row_reset_slave.result
> b/mysql-test/suite/rpl/r/rpl_row_reset_slave.result
>> --- a/mysql-test/suite/rpl/r/rpl_row_reset_slave.result	2008-02-05 15:29:36
> +02:00
>> +++ b/mysql-test/suite/rpl/r/rpl_row_reset_slave.result	2008-05-01 16:26:01
> +03:00
>> @@ -175,3 +175,24 @@ start slave;
>>  show status like 'slave_open_temp_tables';
>>  Variable_name	Value
>>  Slave_open_temp_tables	0
>> +stop slave;
>> +reset slave;
>> +*** errno must be zero: 0 ***
>> +change master to master_user='impossible_user_name';
>> +start slave;
>> +ONE
>> +1
>> +stop slave;
>> +change master to master_user='root';
>> +start slave;
>> +*** last errno must be  zero: 0 ***
>> +*** last error must be blank:  ***
>> +stop slave;
>> +change master to master_user='impossible_user_name';
>> +start slave;
>> +ONE
>> +1
>> +stop slave;
>> +reset slave;
>> +*** last errno must be  zero: 0 ***
>> +*** last error must be blank:  ***
>> diff -Nrup a/mysql-test/suite/rpl/r/rpl_stm_reset_slave.result
> b/mysql-test/suite/rpl/r/rpl_stm_reset_slave.result
>> --- a/mysql-test/suite/rpl/r/rpl_stm_reset_slave.result	2008-02-05 15:29:36
> +02:00
>> +++ b/mysql-test/suite/rpl/r/rpl_stm_reset_slave.result	2008-05-01 16:26:01
> +03:00
>> @@ -175,3 +175,24 @@ start slave;
>>  show status like 'slave_open_temp_tables';
>>  Variable_name	Value
>>  Slave_open_temp_tables	1
>> +stop slave;
>> +reset slave;
>> +*** errno must be zero: 0 ***
>> +change master to master_user='impossible_user_name';
>> +start slave;
>> +ONE
>> +1
>> +stop slave;
>> +change master to master_user='root';
>> +start slave;
>> +*** last errno must be  zero: 0 ***
>> +*** last error must be blank:  ***
>> +stop slave;
>> +change master to master_user='impossible_user_name';
>> +start slave;
>> +ONE
>> +1
>> +stop slave;
>> +reset slave;
>> +*** last errno must be  zero: 0 ***
>> +*** last error must be blank:  ***
>> diff -Nrup a/sql/slave.cc b/sql/slave.cc
>> --- a/sql/slave.cc	2008-04-01 13:56:42 +03:00
>> +++ b/sql/slave.cc	2008-05-01 16:26:01 +03:00
>> @@ -2114,6 +2114,7 @@ pthread_handler_t handle_slave_io(void *
>>     pthread_detach_this_thread();
>>    thd->thread_stack= (char*) &thd; // remember where our stack is
>> +  mi->clear_error();
>>    if (init_slave_thread(thd, SLAVE_THD_IO))
>>    {
>>      pthread_cond_broadcast(&mi->start_cond);
>> @@ -2228,6 +2229,7 @@ requesting master dump") ||
>>          goto connected;
>>        });
>>  +    mi->clear_error(); // getting into read loop with clear sheet
>
> This place is actually redundant: the error is already cleared at this
> point.
>
> The error was cleared in the beginning of handle_slave_io(). If new
> errors are reported after that, either the errors will be cleared
> again before reaching the read loop, or the read loop will not be
> reached: here is the list of all possible errors reported in
> handle_slave_io() before the read loop:
>
>  - in handle_slave_io() if mysql_init() fails, but then we execute
> goto err'.
>
>  - in get_master_version_and_clock(), but then we return from
> handle_slave_io().
>
>  - in register_slave_on_master(), but only if it fails. In this case,
> either 'goto err' is executed or try_to_reconnect() succeeds and
> clears the error (actually, try_to_reconnect calls safe_reconnect
> which calls connect_to_master which resets the error).
>

You are correct that changes done in connect_to_master() and the
following events logics guarantee the status has been cleared by this
point.

Let's guard from future intrusions with

DBUG_ASSERT(mi->last_error().number == 0);

instead.

>
> Argh, the reconnect logic of handle_slave_io is so horrible :-( . I'd
> like to see this refactored in the future (not in this bug, of
> course).
>
>>      while (!io_slave_killed(thd,mi))
>>      {
>>        ulong event_len;
>> @@ -3411,6 +3413,7 @@ static int connect_to_master(THD* thd, M
>>     if (!slave_was_killed)
>>    {
>> +    mi->clear_error();
>>      if (reconnect)
>>      {
>>        if (!suppress_warnings && global_system_variables.log_warnings)
>> diff -Nrup a/sql/sql_repl.cc b/sql/sql_repl.cc
>> --- a/sql/sql_repl.cc	2008-03-08 12:14:45 +02:00
>> +++ b/sql/sql_repl.cc	2008-05-01 16:26:01 +03:00
>> @@ -1114,6 +1114,7 @@ int reset_slave(THD *thd, Master_info* m
>>       Reset errors (the idea is that we forget about the
>>       old master).
>>    */
>> +  mi->clear_error();
>>    mi->rli.clear_error();
>>    mi->rli.clear_until_condition();
>>  
>>
>
> -- 
> Sven Sandberg, Software Engineer
> MySQL AB, www.mysql.com
>
> -- 
> MySQL Code Commits Mailing List
> For list archives: http://lists.mysql.com/commits
> To unsubscribe:    http://lists.mysql.com/commits?unsub=1
>

a newer patch has been committed.

regards,

Andrei
Thread
bk commit into 6.0 tree (aelkin:1.2646) BUG#34654Andrei Elkin1 May 2008
  • Re: bk commit into 6.0 tree (aelkin:1.2646) BUG#34654Sven Sandberg2 May 2008
    • Re: bk commit into 6.0 tree (aelkin:1.2646) BUG#34654Andrei Elkin8 May 2008
      • Re: bk commit into 6.0 tree (aelkin:1.2646) BUG#34654Sven Sandberg9 May 2008
        • Re: bk commit into 6.0 tree (aelkin:1.2646) BUG#34654Magnus Svensson21 May 2008