Hi Alexey,
I think the patch looks good. I have two suggestions: (1) about comments
and (2) about test cases.
(1) This is a very difficult piece of code, and I would suggest to add
comment explaining under what conditions the error message is removed
(this will make the code safer for the future).
It discards errors if check_io_slave_killed returns true.
check_io_slave_killed returns true if io_slave_killed() returns true.
io_slave_killed returns true under this condition:
DBUG_RETURN(mi->abort_slave || abort_loop || thd->killed);
mi->abort_slave is in the following conditions:
- by the slave when STOP SLAVE is issued (sql_repl.cc:stop_slave()
calls slave.cc:terminate_slave_threads() which sets the flag
- by the slave when the condition in "START SLAVE UNTIL condition" is
met (slave.cc:exec_relay_log_event()). However, this cannot happen while
starting the slave.
abort_loop and thd->killed are not set by "STOP SLAVE" AFAICT; they are
set when the slave server is shut down for various reasons. I think it
is ok to hide these messages (doesn't matter if you get a few messages
on shutdown or not), but the behavior should be documented so that the
code can be edited in a safe way in the future.
(2) Would it be difficult to create a test case? Just something that does
START SLAVE;
STOP SLAVE;
should suffice, right? If it's not much more difficult than that, I
think it should be included.
Best regards,
Sven
Alexey Kopytov wrote:
> Below is the list of changes that have just been committed into a local
> 5.1 repository of kaa. When kaa does a push these changes will
> be propagated to the main repository and, within 24 hours after the
> push, to the public repository.
> For information on how to access the public repository
> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>
> ChangeSet@stripped, 2007-11-14 14:42:40+03:00, kaa@polly.(none) +1 -0
> Fix for bug #29976: Excessive Slave I/O errors in replication tests
>
> Problem:
>
> The "Slave I/O thread couldn't register on master" error sporadically
> occurred in replication tests because the slave I/O thread got
> killed by STOP SLAVE before or while registering on master.
>
> Fixed by checking the state of the I/O thread, and issueing
> the error only if it was not explicitely killed by a user.
>
> sql/slave.cc@stripped, 2007-11-14 14:42:24+03:00, kaa@polly.(none) +12 -7
> When the slave I/O thread fails to register on master, issue an error
> message only if it is not explicitely killed by a user with STOP SLAVE.
>
> diff -Nrup a/sql/slave.cc b/sql/slave.cc
> --- a/sql/slave.cc 2007-08-30 01:28:34 +04:00
> +++ b/sql/slave.cc 2007-11-14 14:42:24 +03:00
> @@ -137,6 +137,7 @@ static int terminate_slave_thread(THD *t
> pthread_cond_t* term_cond,
> volatile uint *slave_running,
> bool skip_lock);
> +static bool check_io_slave_killed(THD *thd, Master_info *mi, const char *info);
>
> /*
> Find out which replications threads are running
> @@ -1223,7 +1224,7 @@ int register_slave_on_master(MYSQL* mysq
> {
> *suppress_warnings= TRUE; // Suppress reconnect warning
> }
> - else
> + else if (!check_io_slave_killed(mi->io_thd, mi, NULL))
> {
> char buf[256];
> my_snprintf(buf, sizeof(buf), "%s (Errno: %d)", mysql_error(mysql),
> @@ -1985,7 +1986,7 @@ static bool check_io_slave_killed(THD *t
> {
> if (io_slave_killed(thd, mi))
> {
> - if (global_system_variables.log_warnings)
> + if (info && global_system_variables.log_warnings)
> sql_print_information(info);
> return TRUE;
> }
> @@ -2170,11 +2171,15 @@ connected:
> thd->proc_info = "Registering slave on master";
> if (register_slave_on_master(mysql, mi, &suppress_warnings))
> {
> - sql_print_error("Slave I/O thread couldn't register on master");
> - if (check_io_slave_killed(thd, mi, "Slave I/O thread killed while \
> -registering slave on master") ||
> - try_to_reconnect(thd, mysql, mi, &retry_count, suppress_warnings,
> - reconnect_messages[SLAVE_RECON_ACT_REG]))
> + if (!check_io_slave_killed(thd, mi, "Slave I/O thread killed "
> + "while registering slave on master"))
> + {
> + sql_print_error("Slave I/O thread couldn't register on master");
> + if (try_to_reconnect(thd, mysql, mi, &retry_count, suppress_warnings,
> + reconnect_messages[SLAVE_RECON_ACT_REG]))
> + goto err;
> + }
> + else
> goto err;
> goto connected;
> }
>