Hi Zhenxing and happy (western) new year,
Great work, patch approved! I have nothing to add; both the analysis and
the patch seem sound.
/Sven
He Zhenxing wrote:
> #At file:///media/sdb2/hezx/work/mysql/bzrwork/b41188/6.0-rpl/
>
> 2773 He Zhenxing 2008-12-29
> BUG#41188 rpl_ndb_denote_gap fails sporadically: wrong Last_IO_Error
>
> When the master shuts down, the slave I/O thread can fail for
> various reasons, producing different error codes and messages,
> for example:
> 1) reconnect error
> 2) error in get_master_version_and_clock
> 3) error in register_slave_on_master
> The actual situation can be more complicated than this.
>
> There are some other problems with this test case:
> 1) include/wait_until_disconnected.inc sets mysql_errno to a
> wrong value
> 2) it waits for reconnection on the wrong host
> 3) it did not start the slave after the slave stopped; it only
> worked because when SHOW SLAVE STATUS says the slave I/O thread
> is not running, the thread can be in RUN_NOT_CONNECT and still
> reconnecting, so if the master comes back up soon enough, it
> reconnects to the master.
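
For readers following along, point 3 can be pictured with a small toy model. This is only a sketch, not server code: the state names are modeled on the RUN_NOT_CONNECT state mentioned above, the mapping to Slave_IO_Running is an assumption based on the description in this commit message, and `after_master_restart` is a hypothetical helper.

```python
from enum import Enum


class IoThreadState(Enum):
    # Illustrative names only; RUN_NOT_CONNECT is the state named in the
    # commit message, the other two are assumed for the sake of the model.
    NOT_RUN = 0
    RUN_NOT_CONNECT = 1
    RUN_CONNECT = 2


def slave_io_running(state):
    """Value shown in the Slave_IO_Running column (assumed mapping)."""
    return "Yes" if state is IoThreadState.RUN_CONNECT else "No"


def after_master_restart(state):
    """Hypothetical helper: thread state once the master is reachable again."""
    if state is IoThreadState.RUN_NOT_CONNECT:
        return IoThreadState.RUN_CONNECT  # the retry loop succeeds
    return state  # a stopped thread stays stopped until the slave is started


for state in (IoThreadState.NOT_RUN, IoThreadState.RUN_NOT_CONNECT):
    later = after_master_restart(state)
    print(state.name, slave_io_running(state), "->", slave_io_running(later))
```

Both states report "No", but only the thread that is still retrying comes back on its own, which is why the old test passed only when the master restarted quickly enough.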
>
> This patch fixes the problem by:
> 1) removing the I/O thread error code and message, because they
> vary and are not important to this test
> 2) adding the missing --connection master before waiting for
> reconnection to the master
> 3) starting the slave I/O thread before waiting for the SQL thread
> to stop
> 4) adding warning suppressions for the slave I/O thread
> modified:
> mysql-test/suite/rpl/t/rpl_slave_status.test
> mysql-test/suite/rpl_ndb/r/rpl_ndb_denote_gap.result
> mysql-test/suite/rpl_ndb/t/rpl_ndb_denote_gap.test
>
> per-file messages:
> mysql-test/suite/rpl/t/rpl_slave_status.test
> Wait for the slave I/O thread to stop before showing status
> mysql-test/suite/rpl_ndb/r/rpl_ndb_denote_gap.result
> update result
> mysql-test/suite/rpl_ndb/t/rpl_ndb_denote_gap.test
> Add missing --connection master before waiting for reconnection to
> master
> Start slave IO thread before waiting for SQL thread to stop
> Remove showing the error code and message after master shutdown
> because they can vary and are not necessary for this test
> === modified file 'mysql-test/suite/rpl/t/rpl_slave_status.test'
> --- a/mysql-test/suite/rpl/t/rpl_slave_status.test 2008-12-24 10:48:24 +0000
> +++ b/mysql-test/suite/rpl/t/rpl_slave_status.test 2008-12-29 09:16:51 +0000
> @@ -54,6 +54,7 @@ sync_slave_with_master;
> source include/stop_slave.inc;
> start slave;
> source include/wait_for_slave_sql_to_start.inc;
> +source include/wait_for_slave_io_to_stop.inc;
>
> --echo ==== Verify that Slave_IO_Running = No ====
> let $result= query_get_value("SHOW SLAVE STATUS", Slave_IO_Running, 1);
>
> === modified file 'mysql-test/suite/rpl_ndb/r/rpl_ndb_denote_gap.result'
> --- a/mysql-test/suite/rpl_ndb/r/rpl_ndb_denote_gap.result 2008-09-09 18:47:34 +0000
> +++ b/mysql-test/suite/rpl_ndb/r/rpl_ndb_denote_gap.result 2008-12-29 09:16:51 +0000
> @@ -9,9 +9,8 @@ start slave;
> * slave status*
> Slave_IO_Running = No
> Slave_SQL_Running = Yes
> -Last_IO_Errno = 2013
> -Last_IO_Error = error reconnecting to master
> * start master *
> +start slave io_thread;
> * slave status *
> Slave_IO_Running = Yes
> Slave_SQL_Running = No
>
> === modified file 'mysql-test/suite/rpl_ndb/t/rpl_ndb_denote_gap.test'
> --- a/mysql-test/suite/rpl_ndb/t/rpl_ndb_denote_gap.test 2008-09-09 18:47:34 +0000
> +++ b/mysql-test/suite/rpl_ndb/t/rpl_ndb_denote_gap.test 2008-12-29 09:16:51 +0000
> @@ -15,6 +15,18 @@
> --source include/ndb_master-slave.inc
> --echo
>
> +# When master shuts down, the slave I/O thread can fail for various
> +# reasons, producing different error codes and messages, for example:
> +# 1) reconnect error
> +# 2) error in get_master_version_and_clock
> +# 3) error in register_slave_on_master
> +# So add suppressions for these errors.
> +--disable_query_log
> +call mtr.add_suppression("slave I/O thread stops");
> +call mtr.add_suppression("Slave I/O thread couldn't register on master");
> +call mtr.add_suppression("Slave I/O: Master command COM_REGISTER_SLAVE failed");
> +--enable_query_log
> +
> # Stop master mysql server
> --echo * shutdown master *
> --connection master
> @@ -32,12 +44,6 @@ let $io_run= query_get_value("SHOW SLAVE
> echo Slave_IO_Running = $io_run;
> let $sql_run= query_get_value("SHOW SLAVE STATUS", Slave_SQL_Running, 1);
> echo Slave_SQL_Running = $sql_run;
> -let $errno= query_get_value("SHOW SLAVE STATUS", Last_IO_Errno, 1);
> -echo Last_IO_Errno = $errno;
> -let $error= query_get_value("SHOW SLAVE STATUS", Last_IO_Error, 1);
> -#--eval SELECT SUBSTRING_INDEX('$error', 'master', -1)
> -let $error= `SELECT SUBSTRING_INDEX("$error", "'", 1)`;
> -echo Last_IO_Error = $error;
>
> # Start master server again
> --echo * start master *
> @@ -46,11 +52,31 @@ restart
> EOF
>
> # Reconnect to master
> +--connection master
> --enable_reconnect
> --source include/wait_until_connected_again.inc
>
> -# Wait for stop slave SQL thread with error LOST EVENTS
> --connection slave
> +
> +# restart the slave IO thread
> +#
> +# NOTE: wait_for_slave_io_to_stop.inc above only waits for
> +# Slave_IO_Running in SHOW SLAVE STATUS to become 'No', but this does
> +# not mean the slave I/O thread has truly stopped: it tries to
> +# reconnect to the master 'master_retry_count' times before giving up,
> +# so if the master restarts quickly enough, the slave I/O thread
> +# automatically reconnects to the master and keeps running.
> +#
> +# So the slave I/O thread may still be running, and the following
> +# statement can emit a warning.
> +#
> +# See BUG#41613, this restriction can be lifted if this bug is fixed.
> +#
> +--disable_warnings
> +start slave io_thread;
> +--enable_warnings
> +
> +# Wait for stop slave SQL thread with error LOST EVENTS
> --source include/wait_for_slave_sql_to_stop.inc
> --echo * slave status *
> let $io_run= query_get_value("SHOW SLAVE STATUS", Slave_IO_Running, 1);
>
>
--
Sven Sandberg, Software Engineer
MySQL AB, www.mysql.com