Hi Luís,
Good work with the test case. I have some suggestions below. Mainly, I'd
like to simplify some things in the test. Note: I'll review it once
again after you have committed the bugfix, so that I can run it on a
server where it succeeds.
Luís Soares wrote:
> #At file:///stuff/workspace/mysql-server/5.1-rpl-39325/
>
> 2670 Luís Soares 2008-10-07
> BUG#39325: In MYSQL_LOG::purge_first_log the order of operations is to first
> call
We usually start the changeset comment with:
BUG#<n>: <bug title>
So in this case it would be
BUG#39325: Server crash inside MYSQL_LOG::purge_first_log halts replication
> MYSQL_LOG::purge_log(), which deletes the relay log and then updates the relay
> index, and
> then to call flush_relay_log_info(), If the slave dies before the call to
> flush_relay_log_info(), on restart of the server, replication stops with an
> error.
>
> This commits the test case to reproduce the deffective behavior.
speling error: defective
> added:
> mysql-test/suite/rpl/t/rpl_relay_info_crash_slave.test
> modified:
> sql/log.cc
>
> === added file 'mysql-test/suite/rpl/t/rpl_relay_info_crash_slave.test'
> --- a/mysql-test/suite/rpl/t/rpl_relay_info_crash_slave.test 1970-01-01 00:00:00
> +0000
> +++ b/mysql-test/suite/rpl/t/rpl_relay_info_crash_slave.test 2008-10-07 10:03:06
> +0000
> @@ -0,0 +1,108 @@
> +########################################################################
> +# #
> +# Testing seamless slave restarting after crashing before flushing #
> +# the relay log info. #
> +# #
> +# Date: 2009-10-03 #
> +# Developer: lsoares (Luis.Soares@stripped) #
Please avoid dates and names in files: it's difficult to maintain, and
everything is available from 'bzr annotate' anyways.
> +# Reason: BUG#39325 #
> +# #
> +# Note: This test borrows code from tests: #
> +# * t/crash_commit_before.test #
> +# * extra/rpl_tests/rpl_max_relay_size.test #
I think this is unnecessary, we steal all the time from tests, no need
to keep track of that :-)
> +########################################################################
> +
> +
> +# This test REQUIRES:
> +# 1. master slave replication setup
> +# 2. server compiled with debug
I think it's unnecessary to state this, it doesn't say more than the two
lines below...
> +--source include/master-slave.inc
> +--source include/have_debug.inc
You probably need to source include/not_valgrind.inc as well, to avoid
warnings for the memory leak when crashing the server (like
crash_commit_before).
> +
> +##### Option, for debugging support #####
> +let $debug= 1;
I think the printouts guarded by this flag are good, no need for a
special flag guarding them. I suggest removing $debug and make it print
the connection name unconditionally.
> +
> +##### SETUP ######
> +
> +# We have to sync with master, to ensure slave had time to start properly
> +# # before we stop it. If not, we get errors about UNIX_TIMESTAMP() in the
> +# log.
> +sync_slave_with_master;
I think this can be removed, it's probably not needed.
> +connection slave;
> +stop slave;
> +connection master;
> +
> +if ( $debug ) {
> + --echo *** On Master ***
> +}
> +
> +--disable_warnings
> +DROP TABLE IF EXISTS t1;
> +--enable_warnings
Not needed: include/master-slave.inc does that already.
> +
> +CREATE TABLE t1 (id INT);
> +
> +##### Test ######
> +# Outline:
> +# 1. insert on master
> +# 2. start slave
> +# 3. crash slave before flushing relay logs
> +# 4. restart slave
> +# 5. sync it with master
> +
> +# on the master insert some rows
> +let $1=10;
> +disable_query_log;
> +begin;
> +while ($1)
> +{
> + eval INSERT INTO t1 VALUES( $1 );
> + dec $1;
> +}
> +enable_query_log;
I don't think this test needs any rows in the table, so this can
probably be removed.
> +DROP TABLE t1;
> +save_master_pos;
> +
> +# on the slave:
> +# 1. reset it and start it
> +# 2. set the crash flag
> +# 3. set the restart flag
> +# 4. flush logs
> +# 5. sync it with master
> +
> +if ( $debug ) {
> + --echo *** On Slave ***
> +}
> +
> +connection slave;
> +reset slave;
> +start slave;
I think there is a race here: the slave may either finish replicating
before the crash, or may have events still not replicated before the
crash. I think you can make the test deterministic by adding
sync_slave_with_master here...
> +
> +SET GLOBAL debug= "+d,crash_before_flush_relay_log_info";
> +
> +# Write file to make mysql-test-run.pl expect crash and restart
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.2.expect
> +
> +if ( $debug ) {
> + --echo * On Slave: Flushing logs.
> +}
> +
> +# expect one of this errors (lost connection)
> +--error 2006,2013
> +flush logs;
> +
> +# enable slave restart
> +--enable_reconnect
> +
> +# wait until the connection is up again
> +--source include/wait_until_connected_again.inc
> +--source include/wait_for_slave_to_start.inc
... and if it is important to have events that are still unreplicated at
crash time, you can do something on master here, and then
sync_slave_with_master.
> +
> +#### TEAR DOWN ####
> +
> +#
> +# As the master already does a DROP TABLE t1
> +# there is no need to redo it here.
> +
> +# sync with master to clean everything up
> +sync_with_master 0;
>
> === modified file 'sql/log.cc'
> --- a/sql/log.cc 2008-10-06 08:27:36 +0000
> +++ b/sql/log.cc 2008-10-07 10:03:06 +0000
> @@ -3013,6 +3013,9 @@ int MYSQL_BIN_LOG::purge_first_log(Relay
> rli->notify_group_relay_log_name_update();
> }
>
> + /* fault-injection required for bug#39325 (test: rpl.rpl_relay_info_crash_slave)
> */
> + DBUG_EXECUTE_IF("crash_before_flush_relay_log_info", abort(););
> +
> /* Store where we are in the new file for the execution thread */
> flush_relay_log_info(rli);
>
>
--
Sven Sandberg, Software Engineer
MySQL AB, www.mysql.com