Hi Libing,
Nice Work. Please find my review comments below.
STATUS
------
Not approved.
REQUIRED CHANGES
----------------
RC1. The test case needs some work, as I believe it is not
platform agnostic. For example, take this change:
+write_file $datadir/master-bin.index;
+./master-bin.000002
+EOF
In windows the path separator is not './' but instead
'.\'. This needs to be fixed.
RC2. I think there is another windows issue with this part of
the test, related to removing and creating a new file,
while MySQL has it opened. The specific part of the test I
am wondering is:
+remove_file $datadir/master-bin.index;
+write_file $datadir/master-bin.index;
+./master-bin.000002
+EOF
You need to try this test on windows before pushing. If
you find problems with this, I suggest to try something
like (basically, we truncate the file and then append the
required lines, so that we dont delete and create it
again):
let INDEX_FILE=$index;
perl;
$file= $ENV{'INDEX_FILE'};
open(FILE, ">$file") || die "Unable to open $file.";
truncate(FILE,0);
close ($file);
EOF
-- append_file $index
master-bin.000002
EOF
RC3. I also don't think that this is entirely correct:
"Manually changing index file is a illegal action, ..."
We allow the user to change the index file (even more
after BUG 34582). What happens is that once he does that,
it's his own responsibility. It's like playing with fire,
so the user should expect some issues and this seems
definitely like one of them.
Would you mind clarifying that in the commit message?
Furthermore, I think you need to clarify what happens. For
instance, taking your test as an example, if I stop the
slave thread before master changing the index and then
restart it after the FLUSH LOGS, the test just hangs... Is
this expected behavior? Should this be documented?
The portion of the test that I changed:
STOP SLAVE IO_THREAD;
source include/wait_for_slave_io_to_stop.inc;
# Now, all events has been replicate to slave. As current dump file
# (master-bin.000002) is the last binlog file, so master is waiting for
new
# events.
connection master;
# Delete './master-bin.000001' from index file.
let $datadir= `SELECT @@DATADIR`;
remove_file $datadir/master-bin.index;
write_file $datadir/master-bin.index;
./master-bin.000002
EOF
# Now, only 1 entry in index file.
# ./master-bin.000002
# Generate master-bin.000003, but it is in the second line.
FLUSH LOGS;
# Now, 2 entries in index file.
# ./master-bin.000002
# ./master-bin.000003
-- connection slave
START SLAVE IO_THREAD;
source include/wait_for_slave_io_to_start.inc;
-- connection master
REQUESTS
--------
n/a
SUGGESTIONS
-----------
n/a
DETAILS
-------
On Thu, 2009-12-10 at 12:10 +0000, Li-Bing.Song@stripped wrote:
> #At file:///home/anders/work/bzrwork/worktree2/mysql-5.1-bugteam/ based on
> revid:li-bing.song@stripped
>
> 3215 Li-Bing.Song@stripped 2009-12-10
> BUG #28421 Infinite loop on slave relay logs
>
> That, manually deleteing one or more entries from 'master-bin.index', will
> cause master infinitely loop to send one binlog file.
>
suggestion: Maybe loose the 'That' ^, and just make:
"Manually deleting one or more ..."
> When starting a dump session, master opens index file and search the binlog
> file
> which is being requested by the slave. The position of the binlog file in the
> index file is recorded. it will be used to find the next binlog file when
> current
> binlog file has been dumped completely. As only the position is used, it will
> can
> not get the correct file if some entries has been removed manually from the
> index file.
suggestion: It may not get the correct file
> the master will reopen the current binlog file which has been dump completely
suggestion: s/has been dump/has dumped
> and redump it if it can not get the next binlog file's name from index file.
> It obviously is a logical error.
>
>
> Manually changing index file is a illegal action, so after this patch, master
> sends a fatal error to slave and close the dump session if a new binlog file
> has been generated and master can not get it from the index file.
See RC3.
> added:
> mysql-test/suite/rpl/r/rpl_manual_change_index_file.result
> mysql-test/suite/rpl/t/rpl_manual_change_index_file.test
> modified:
> sql/sql_repl.cc
> === added file 'mysql-test/suite/rpl/r/rpl_manual_change_index_file.result'
> --- a/mysql-test/suite/rpl/r/rpl_manual_change_index_file.result 1970-01-01 00:00:00
> +0000
> +++ b/mysql-test/suite/rpl/r/rpl_manual_change_index_file.result 2009-12-10 12:10:33
> +0000
> @@ -0,0 +1,25 @@
> +stop slave;
> +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
> +reset master;
> +reset slave;
> +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
> +start slave;
> +FLUSH LOGS;
> +CREATE TABLE t1(c1 INT);
> +FLUSH LOGS;
> +call mtr.add_suppression('Got fatal error 1236 from master when reading data from
> binary log: .*could not find next log');
> +Last_IO_Error
> +Got fatal error 1236 from master when reading data from binary log: 'could not find
> next log'
> +CREATE TABLE t2(c1 INT);
> +FLUSH LOGS;
> +CREATE TABLE t3(c1 INT);
> +FLUSH LOGS;
> +CREATE TABLE t4(c1 INT);
> +START SLAVE IO_THREAD;
> +SHOW TABLES;
> +Tables_in_test
> +t1
> +t2
> +t3
> +t4
> +DROP TABLE t1, t2, t3, t4;
>
> === added file 'mysql-test/suite/rpl/t/rpl_manual_change_index_file.test'
> --- a/mysql-test/suite/rpl/t/rpl_manual_change_index_file.test 1970-01-01 00:00:00
> +0000
> +++ b/mysql-test/suite/rpl/t/rpl_manual_change_index_file.test 2009-12-10 12:10:33
> +0000
> @@ -0,0 +1,69 @@
> +source include/master-slave.inc;
> +
> +#
> +# BUG#28421 Infinite loop on slave relay logs
> +#
> +# That, manually deleteing one or more entries from 'master-bin.index', will
> +# cause master infinitely loop to send one binlog file.
> +#
> +# Manually changing index file is a illegal action, so when this happen, we
> +# send a fatal error to slave and close the dump session.
> +
> +FLUSH LOGS;
> +# Now, 2 entries in index file.
> +# ./master-bin.000001
> +# ./master-bin.000002
> +
> +CREATE TABLE t1(c1 INT);
> +# Now, the current dump file(master-bin.000002) is the second line of index
> +# file
> +sync_slave_with_master;
> +# Now, all events has been replicate to slave. As current dump file
> +# (master-bin.000002) is the last binlog file, so master is waiting for new
> +# events.
> +
> +connection master;
> +# Delete './master-bin.000001' from index file.
> +let $datadir= `SELECT @@DATADIR`;
> +remove_file $datadir/master-bin.index;
> +write_file $datadir/master-bin.index;
> +./master-bin.000002
> +EOF
RC1 and RC2 ^
> +# Now, only 1 entry in index file.
> +# ./master-bin.000002
> +
> +# Generate master-bin.000003, but it is in the second line.
> +FLUSH LOGS;
> +# Now, 2 entries in index file.
> +# ./master-bin.000002
> +# ./master-bin.000003
> +
> +# Now, master know that new binlog file(master-bin.000003) has been generated.
> +# It expects that the new binlog file is in third line of index file, but
> +# there is no third line in index file. It is so strange that master sends an
> +# error to slave.
> +call mtr.add_suppression('Got fatal error 1236 from master when reading data from
> binary log: .*could not find next log');
> +connection slave;
> +source include/wait_for_slave_io_to_stop.inc;
> +let $last_error= query_get_value(SHOW SLAVE STATUS, Last_IO_Error, 1);
> +echo Last_IO_Error;
> +echo $last_error;
> +
> +connection master;
> +CREATE TABLE t2(c1 INT);
> +FLUSH LOGS;
> +CREATE TABLE t3(c1 INT);
> +FLUSH LOGS;
> +CREATE TABLE t4(c1 INT);
> +
> +connection slave;
> +START SLAVE IO_THREAD;
> +source include/wait_for_slave_io_to_start.inc;
> +
> +connection master;
> +sync_slave_with_master;
> +SHOW TABLES;
> +
> +connection master;
> +DROP TABLE t1, t2, t3, t4;
> +source include/master-slave-end.inc;
>
> === modified file 'sql/sql_repl.cc'
> --- a/sql/sql_repl.cc 2009-09-18 08:20:29 +0000
> +++ b/sql/sql_repl.cc 2009-12-10 12:10:33 +0000
> @@ -711,11 +711,14 @@ impossible position";
>
> thd_proc_info(thd, "Finished reading one binlog; switching to next binlog");
> switch (mysql_bin_log.find_next_log(&linfo, 1)) {
> - case LOG_INFO_EOF:
> - loop_breaker = (flags & BINLOG_DUMP_NON_BLOCK);
> - break;
> case 0:
> break;
> + case LOG_INFO_EOF:
> + if (mysql_bin_log.is_active(log_file_name))
> + {
> + loop_breaker = (flags & BINLOG_DUMP_NON_BLOCK);
> + break;
> + }
> default:
> errmsg = "could not find next log";
> my_errno= ER_MASTER_FATAL_ERROR_READING_BINLOG;
>