Hi Libin,
Great work.
Cheers.
Li-Bing.Song@stripped wrote:
> #At file:///home/anders/work/bzrwork/bug45516/mysql-5.1-bugteam/ based on
> revid:v.narayanan@stripped
>
> 3162 Li-Bing.Song@stripped 2009-10-19
> Bug#21437 server_errno=29 error message flood mysqld error log
>
> Sometimes, Slaves requests a binlog file which does not exists on master.
> On this occasion, master will send an error to slave. The error number is 29
> Which means file does not find. This error message will lead a flood.
>
> Instead of error 29, error
> COM_BINLOG_DUMP has a group of its own errors, The errors which have happend
> will be
> sent to slave immediately.
> Before the COM_BINLOG_DUMP error is reported by this function, an error
> sometimes has been set into thd->main_da by a low level function such as
> my_register_filename, which is called during COM_BINLOG_DUMP is
> executing. Thus, instead of the COM_BINLOG_DUMP error, the low level
> error is sent to slave.
> The low level error should never be sent to slave, it only be reported
> to master.
I would change the comments above as follows:
Bug#21437 server_errno=29 error message flood mysqld error log
If an error happens while dumping a binary log from the master, the
master sends an specific set of error messages and the slave I/O
thread stops immediately.
However, when a slave requests a binlog file that does not exist on
master, the master sends 'EE_FILENOTFOUND' error (i.e. error code 29)
to the slave. The 'EE_FILENOTFOUND' is a low level error and the slave
I/O thread will still retry to request the binlog file thus flooding
the mysqld error log.
This happens because the master sends the first error message set in
the diagnostic area and as such the I/O thread does not stop. To fix
the problem, we reset the diagnostic area before setting the
ER_MASTER_FATAL_ERROR_READING_BINLOG in order to notify and stop the
slave I/O Thread immediately.
---Just a suggestion.---
>
> added:
> mysql-test/suite/rpl/r/rpl_master_fatal_error.result
> mysql-test/suite/rpl/t/rpl_master_fatal_error.test
> modified:
> sql/sql_repl.cc
> === added file 'mysql-test/suite/rpl/r/rpl_master_fatal_error.result'
> --- a/mysql-test/suite/rpl/r/rpl_master_fatal_error.result 1970-01-01 00:00:00 +0000
> +++ b/mysql-test/suite/rpl/r/rpl_master_fatal_error.result 2009-10-19 08:23:22 +0000
> @@ -0,0 +1,20 @@
> +stop slave;
> +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
> +reset master;
> +reset slave;
> +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
> +start slave;
> +call mtr.add_suppression("Got fatal error 1236 from master when reading data from
> binary log:");
> +CREATE TABLE t1(c1 int);
> +DROP TABLE t1;
> +FLUSH LOGS;
> +CREATE TABLE t1(c1 int);
> +DROP TABLE t1;
> +STOP SLAVE;
> +CHANGE MASTER TO master_host='127.0.0.1', master_log_file='master-bin.000001';
> +START SLAVE;
> +Last_IO_Error
> +echo Got fatal error 1236 from master when reading data from binary log: 'Could not
> open log file'
> +STOP SLAVE SQL_THREAD;
> +CHANGE MASTER TO master_host='127.0.0.1', master_log_file='master-bin.000002';
> +START SLAVE;
>
> === added file 'mysql-test/suite/rpl/t/rpl_master_fatal_error.test'
> --- a/mysql-test/suite/rpl/t/rpl_master_fatal_error.test 1970-01-01 00:00:00 +0000
> +++ b/mysql-test/suite/rpl/t/rpl_master_fatal_error.test 2009-10-19 08:23:22 +0000
> @@ -0,0 +1,47 @@
> +###############################################################################
> +# Bug #21437 server_errno=29 error message flood mysqld error log
> +#
> +# server_errno 29 is EE_FILENOTFOUND. It should never be reported to slave.
> +# Instead of EE_FILENOTFOUND, ER_MASTER_FATAL_ERROR_READING_BINLOG and the
> +# specific reason "Could not open log file" are sent to slave.
> +################################################################################
> +
> +source include/master-slave.inc;
> +#Add supression for the expected error in slave error log
> +call mtr.add_suppression("Got fatal error 1236 from master when reading data from
> binary log:");
> +
> +CREATE TABLE t1(c1 int);
> +DROP TABLE t1;
> +
> +FLUSH LOGS;
> +
> +CREATE TABLE t1(c1 int);
> +DROP TABLE t1;
> +sync_slave_with_master;
> +
> +STOP SLAVE;
> +source include/wait_for_slave_to_stop.inc;
> +
> +#This operation will result in error EE_FILENOTFOUND which happens when slave IO
> thread is connecting to master.
> +move_file $MYSQL_TEST_DIR/var/mysqld.1/data/master-bin.000001
> $MYSQL_TEST_DIR/var/mysqld.1/data/master-bin.bak;
> +
> +connection slave;
> +CHANGE MASTER TO master_host='127.0.0.1', master_log_file='master-bin.000001';
> +START SLAVE;
> +source include/wait_for_slave_io_to_stop.inc;
> +
> +# Instead of EE_FILENOTFOUND, ER_MASTER_FATAL_ERROR_READING_BINLOG and the
> +# specific information are sent to slave.
> +let $last_io_error= query_get_value(SHOW SLAVE STATUS, Last_IO_Error, 1);
> +echo Last_IO_Error
> +echo $last_io_error;
> +
> +move_file $MYSQL_TEST_DIR/var/mysqld.1/data/master-bin.bak
> $MYSQL_TEST_DIR/var/mysqld.1/data/master-bin.000001;
> +
> +STOP SLAVE SQL_THREAD;
> +source include/wait_for_slave_sql_to_stop.inc;
> +CHANGE MASTER TO master_host='127.0.0.1', master_log_file='master-bin.000002';
> +START SLAVE;
> +source include/wait_for_slave_to_start.inc;
> +
> +source include/master-slave-end.inc;
This test case is sporadically failing:
rpl.rpl_master_fatal_error 'row' w2 [ fail ]
Test ended at 2009-11-06 14:01:09
CURRENT_TEST: rpl.rpl_master_fatal_error
mysqltest: At line 26: command "move_file" failed with error 1
The result from queries just before the failure was:
stop slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
reset master;
reset slave;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
start slave;
call mtr.add_suppression("Got fatal error 1236 from master when reading data from binary
log:");
CREATE TABLE t1(c1 int);
DROP TABLE t1;
FLUSH LOGS;
CREATE TABLE t1(c1 int);
DROP TABLE t1;
STOP SLAVE;
>
> === modified file 'sql/sql_repl.cc'
> --- a/sql/sql_repl.cc 2009-09-18 08:20:29 +0000
> +++ b/sql/sql_repl.cc 2009-10-19 08:23:22 +0000
> @@ -777,6 +777,20 @@ err:
> if (file >= 0)
> (void) my_close(file, MYF(MY_WME));
>
> + /*
> + Bug #21437 server_errno=29 error message flood mysqld error log
> +
> + COM_BINLOG_DUMP has a group of its own errors, The errors which have happend
> will be
> + sent to slave immediately.
> + Before the COM_BINLOG_DUMP error is reported by this function, an error
> + sometimes has been set into thd->main_da by a low level function such as
> + my_register_filename, which is called during COM_BINLOG_DUMP is
> + executing. Thus, instead of the COM_BINLOG_DUMP error, the low level
> + error is sent to slave.
> + The low level error should never be sent to slave, it only be reported
> + to master.
> + */
I don't think you need to reference the bug number.
Just paste an explanation why you need to reset the diagnostic area.
> + thd->main_da.reset_diagnostics_area();
> my_message(my_errno, errmsg, MYF(0));
> DBUG_VOID_RETURN;
> }
>
>
>
> ------------------------------------------------------------------------
>
>