List:Commits« Previous MessageNext Message »
From:Sven Sandberg Date:May 9 2008 10:33am
Subject:Re: bk commit into 6.0 tree (aelkin:1.2646) BUG#34654
View as plain text  
Hi,

Thanks for your work, the patch is approved!

/Sven


Andrei Elkin wrote:
> Sven Sandberg <sven@stripped> writes:
> 
>> Hi Andrei!
>>
>> Thanks for fixing this.
>>
>> As we discussed on IRC, please replace 'while (!io_slave_killed)' by
>> if (!io_slave_killed)' in the outer loop of handle_slave_io.
>>
> 
> I could not help it and changed further to eliminate the outermost
> while altogether.
> 
> Please look at newer patch.
> Just hope you will stop me at this point :)
> 
>> There is one place where errors are cleared redundantly. I would also
>> like more explicit comments in the test case, and a test case for
>> Last_SQL_Err*. See below.
>>
>> /Sven
>>
>>
>> Andrei Elkin wrote:
>>> Below is the list of changes that have just been committed into a local
>>> 6.0 repository of aelkin.  When aelkin does a push these changes
>>> will be propagated to the main repository and, within 24 hours after the
>>> push, to the public repository.
>>> For information on how to access the public repository
>>> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>>>
>>> ChangeSet@stripped, 2008-05-01 16:26:05+03:00, aelkin@stripped +5
> -0
>>>   Bug #34654 RESET SLAVE and START SLAVE does not clear Last_IO_Errno
>>>     There were no resetting of the io thread's error status at
>>> reset_slave() and   at the beginning of handle_slave_io() that
>>> corresponds to start slave.
>>>     Fixed with deploying clean-up at necessary places of the code.
>>>
>>>   mysql-test/extra/rpl_tests/rpl_reset_slave.test@stripped, 2008-05-01
> 16:26:01+03:00, aelkin@stripped +51 -0
>>>     adding regression test for bug#34654 that tests clearing the IO error
> status
>>>     for start slave and reset slave.
>>>
>>>   mysql-test/suite/rpl/r/rpl_row_reset_slave.result@stripped, 2008-05-01
> 16:26:01+03:00, aelkin@stripped +21 -0
>>>     results changed
>>>
>>>   mysql-test/suite/rpl/r/rpl_stm_reset_slave.result@stripped, 2008-05-01
> 16:26:01+03:00, aelkin@stripped +21 -0
>>>     results changed
>>>
>>>   sql/slave.cc@stripped, 2008-05-01 16:26:01+03:00, aelkin@stripped
> +3 -0
>>>     explicitly clearing the last mi's error at
>>>         1. entering handle_slave_io
>>>     2. getting to the read loop after passing successfully
>>>        connecting, checking master version etc, registering and
>>>        requesting for the dump
>>>     3. right after successful connecting
>>>
>>>   sql/sql_repl.cc@stripped, 2008-05-01 16:26:01+03:00,
> aelkin@stripped +1 -0
>>>     reset slave now resets the io thread error.
>>>
>>> diff -Nrup a/mysql-test/extra/rpl_tests/rpl_reset_slave.test
> b/mysql-test/extra/rpl_tests/rpl_reset_slave.test
>>> --- a/mysql-test/extra/rpl_tests/rpl_reset_slave.test	2008-02-05 15:29:36
> +02:00
>>> +++ b/mysql-test/extra/rpl_tests/rpl_reset_slave.test	2008-05-01 16:26:01
> +03:00
>>> @@ -42,3 +42,54 @@ reset slave;
>>>  start slave;
>>>  sync_with_master;
>>>  show status like 'slave_open_temp_tables';
>>> +
>>> +#
>>> +#Bug#34654  	RESET SLAVE does not clear LAST_IO_Err* +#
>>> +
>>> +# clearing the status
>>> +stop slave;
>>> +reset slave;
>>> +let $last_io_errno= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>>> +echo *** errno must be zero: $last_io_errno ***;
>>> +
>>> +#
>>> +# verifying start slave resets
>>> +#
>> Please be more explicit in the comment describing the purpose of the
>> test. You can write something like:
>>
>> # Verify that "START SLAVE" resets Last_IO_Error and Last_IO_Errno.
>>
>>> +
>>> +change master to master_user='impossible_user_name';
>>> +start slave;
>>> +source include/wait_for_slave_io_to_stop.inc;
>>> +let $last_io_errno= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>>> +--disable_query_log
>>> +eval SELECT $last_io_errno > 0 as ONE;
>>> +--enable_query_log
>>> +
>>> +stop slave;
>>> +change master to master_user='root';
>>> +start slave;
>>> +source include/wait_for_slave_to_start.inc;
>>> +let $last_io_errno= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>>> +let $last_io_error= query_get_value(SHOW SLAVE STATUS, Last_IO_Error, 1);
>>> +--echo *** last errno must be  zero: $last_io_errno ***
>>> +--echo *** last error must be blank: $last_io_error ***
>>> +
>>> +#
>>> +# verifying reset slave resets
>>> +#
>> Please be more explicit in the comment (see above).
>>
>>> +
>>> +stop slave;
>>> +change master to master_user='impossible_user_name';
>>> +start slave;
>>> +source include/wait_for_slave_io_to_stop.inc;
>>> +let $last_io_errno= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>>> +--disable_query_log
>>> +eval SELECT $last_io_errno > 0 as ONE;
>>> +--enable_query_log
>>> +
>>> +stop slave;
>>> +reset slave;
>>> +let $last_io_errno= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>>> +let $last_io_error= query_get_value(SHOW SLAVE STATUS, Last_IO_Error, 1);
>>> +--echo *** last errno must be  zero: $last_io_errno ***
>>> +--echo *** last error must be blank: $last_io_error ***
>> I think the test should also verify that "RESET SLAVE" resets Last_SQL_Err*.
>>
> 
> Well, the fixes have not done anything in that regard but why not.
> 
> Added.
> 
>>> diff -Nrup a/mysql-test/suite/rpl/r/rpl_row_reset_slave.result
> b/mysql-test/suite/rpl/r/rpl_row_reset_slave.result
>>> --- a/mysql-test/suite/rpl/r/rpl_row_reset_slave.result	2008-02-05 15:29:36
> +02:00
>>> +++ b/mysql-test/suite/rpl/r/rpl_row_reset_slave.result	2008-05-01 16:26:01
> +03:00
>>> @@ -175,3 +175,24 @@ start slave;
>>>  show status like 'slave_open_temp_tables';
>>>  Variable_name	Value
>>>  Slave_open_temp_tables	0
>>> +stop slave;
>>> +reset slave;
>>> +*** errno must be zero: 0 ***
>>> +change master to master_user='impossible_user_name';
>>> +start slave;
>>> +ONE
>>> +1
>>> +stop slave;
>>> +change master to master_user='root';
>>> +start slave;
>>> +*** last errno must be  zero: 0 ***
>>> +*** last error must be blank:  ***
>>> +stop slave;
>>> +change master to master_user='impossible_user_name';
>>> +start slave;
>>> +ONE
>>> +1
>>> +stop slave;
>>> +reset slave;
>>> +*** last errno must be  zero: 0 ***
>>> +*** last error must be blank:  ***
>>> diff -Nrup a/mysql-test/suite/rpl/r/rpl_stm_reset_slave.result
> b/mysql-test/suite/rpl/r/rpl_stm_reset_slave.result
>>> --- a/mysql-test/suite/rpl/r/rpl_stm_reset_slave.result	2008-02-05 15:29:36
> +02:00
>>> +++ b/mysql-test/suite/rpl/r/rpl_stm_reset_slave.result	2008-05-01 16:26:01
> +03:00
>>> @@ -175,3 +175,24 @@ start slave;
>>>  show status like 'slave_open_temp_tables';
>>>  Variable_name	Value
>>>  Slave_open_temp_tables	1
>>> +stop slave;
>>> +reset slave;
>>> +*** errno must be zero: 0 ***
>>> +change master to master_user='impossible_user_name';
>>> +start slave;
>>> +ONE
>>> +1
>>> +stop slave;
>>> +change master to master_user='root';
>>> +start slave;
>>> +*** last errno must be  zero: 0 ***
>>> +*** last error must be blank:  ***
>>> +stop slave;
>>> +change master to master_user='impossible_user_name';
>>> +start slave;
>>> +ONE
>>> +1
>>> +stop slave;
>>> +reset slave;
>>> +*** last errno must be  zero: 0 ***
>>> +*** last error must be blank:  ***
>>> diff -Nrup a/sql/slave.cc b/sql/slave.cc
>>> --- a/sql/slave.cc	2008-04-01 13:56:42 +03:00
>>> +++ b/sql/slave.cc	2008-05-01 16:26:01 +03:00
>>> @@ -2114,6 +2114,7 @@ pthread_handler_t handle_slave_io(void *
>>>     pthread_detach_this_thread();
>>>    thd->thread_stack= (char*) &thd; // remember where our stack is
>>> +  mi->clear_error();
>>>    if (init_slave_thread(thd, SLAVE_THD_IO))
>>>    {
>>>      pthread_cond_broadcast(&mi->start_cond);
>>> @@ -2228,6 +2229,7 @@ requesting master dump") ||
>>>          goto connected;
>>>        });
>>>  +    mi->clear_error(); // getting into read loop with clear sheet
>> This place is actually redundant: the error is already cleared at this
>> point.
>>
>> The error was cleared in the beginning of handle_slave_io(). If new
>> errors are reported after that, either the errors will be cleared
>> again before reaching the read loop, or the read loop will not be
>> reached: here is the list of all possible errors reported in
>> handle_slave_io() before the read loop:
>>
>>  - in handle_slave_io() if mysql_init() fails, but then we execute
>> goto err'.
>>
>>  - in get_master_version_and_clock(), but then we return from
>> handle_slave_io().
>>
>>  - in register_slave_on_master(), but only if it fails. In this case,
>> either 'goto err' is executed or try_to_reconnect() succeeds and
>> clears the error (actually, try_to_reconnect calls safe_reconnect
>> which calls connect_to_master which resets the error).
>>
> 
> You are correct that changes done in connect_to_master() and the
> following events logics guarantee the status has been cleared by this
> point.
> 
> Let's guard from future intrusions with
> 
> DBUG_ASSERT(mi->last_error().number == 0);
> 
> instead.
> 
>> Argh, the reconnect logic of handle_slave_io is so horrible :-( . I'd
>> like to see this refactored in the future (not in this bug, of
>> course).
>>
>>>      while (!io_slave_killed(thd,mi))
>>>      {
>>>        ulong event_len;
>>> @@ -3411,6 +3413,7 @@ static int connect_to_master(THD* thd, M
>>>     if (!slave_was_killed)
>>>    {
>>> +    mi->clear_error();
>>>      if (reconnect)
>>>      {
>>>        if (!suppress_warnings &&
> global_system_variables.log_warnings)
>>> diff -Nrup a/sql/sql_repl.cc b/sql/sql_repl.cc
>>> --- a/sql/sql_repl.cc	2008-03-08 12:14:45 +02:00
>>> +++ b/sql/sql_repl.cc	2008-05-01 16:26:01 +03:00
>>> @@ -1114,6 +1114,7 @@ int reset_slave(THD *thd, Master_info* m
>>>       Reset errors (the idea is that we forget about the
>>>       old master).
>>>    */
>>> +  mi->clear_error();
>>>    mi->rli.clear_error();
>>>    mi->rli.clear_until_condition();
>>>  
>>>
>> -- 
>> Sven Sandberg, Software Engineer
>> MySQL AB, www.mysql.com
>>
>> -- 
>> MySQL Code Commits Mailing List
>> For list archives: http://lists.mysql.com/commits
>> To unsubscribe:    http://lists.mysql.com/commits?unsub=1
>>
> 
> a newer patch has been committed.
> 
> regards,
> 
> Andrei

-- 
Sven Sandberg, Software Engineer
MySQL AB, www.mysql.com
Thread
bk commit into 6.0 tree (aelkin:1.2646) BUG#34654Andrei Elkin1 May
  • Re: bk commit into 6.0 tree (aelkin:1.2646) BUG#34654Sven Sandberg2 May
    • Re: bk commit into 6.0 tree (aelkin:1.2646) BUG#34654Andrei Elkin8 May
      • Re: bk commit into 6.0 tree (aelkin:1.2646) BUG#34654Sven Sandberg9 May
        • Re: bk commit into 6.0 tree (aelkin:1.2646) BUG#34654Magnus Svensson21 May