Alexey, hello.
> Below is the list of changes that have just been committed into a local
> 5.1 repository of kaa. When kaa does a push these changes will
> be propagated to the main repository and, within 24 hours after the
> push, to the public repository.
> For information on how to access the public repository
> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>
> ChangeSet@stripped, 2007-11-14 14:42:40+03:00, kaa@polly.(none) +1 -0
> Fix for bug #29976: Excessive Slave I/O errors in replication tests
>
> Problem:
>
> The "Slave I/O thread couldn't register on master" error sporadically
> occurred in replication tests because the slave I/O thread got
> killed by STOP SLAVE before or while registering on master.
>
> Fixed by checking the state of the I/O thread, and issueing
> the error only if it was not explicitely killed by a user.
>
I think that's a sound idea. If we know why the IO thread can not
connect we should not issue an error. The error is for a case we
don't know why.
What I'd like for you to do is to extend it to cover related issues.
Before the IO thread gains ON status it makes several client-api
calls. One your patch is fixing and others are done just before
register_slave_on_master() in get_master_version_and_clock().
So let's not panic spewing error messages whenever a killed io fails
to query the master.
> sql/slave.cc@stripped, 2007-11-14 14:42:24+03:00, kaa@polly.(none) +12 -7
> When the slave I/O thread fails to register on master, issue an error
> message only if it is not explicitely killed by a user with STOP SLAVE.
>
> diff -Nrup a/sql/slave.cc b/sql/slave.cc
> --- a/sql/slave.cc 2007-08-30 01:28:34 +04:00
> +++ b/sql/slave.cc 2007-11-14 14:42:24 +03:00
> @@ -137,6 +137,7 @@ static int terminate_slave_thread(THD *t
> pthread_cond_t* term_cond,
> volatile uint *slave_running,
> bool skip_lock);
> +static bool check_io_slave_killed(THD *thd, Master_info *mi, const char *info);
>
> /*
> Find out which replications threads are running
> @@ -1223,7 +1224,7 @@ int register_slave_on_master(MYSQL* mysq
> {
> *suppress_warnings= TRUE; // Suppress reconnect warning
> }
> - else
> + else if (!check_io_slave_killed(mi->io_thd, mi, NULL))
> {
> char buf[256];
> my_snprintf(buf, sizeof(buf), "%s (Errno: %d)", mysql_error(mysql),
> @@ -1985,7 +1986,7 @@ static bool check_io_slave_killed(THD *t
> {
> if (io_slave_killed(thd, mi))
> {
> - if (global_system_variables.log_warnings)
> + if (info && global_system_variables.log_warnings)
> sql_print_information(info);
> return TRUE;
> }
> @@ -2170,11 +2171,15 @@ connected:
> thd->proc_info = "Registering slave on master";
> if (register_slave_on_master(mysql, mi, &suppress_warnings))
> {
> - sql_print_error("Slave I/O thread couldn't register on master");
> - if (check_io_slave_killed(thd, mi, "Slave I/O thread killed while \
> -registering slave on master") ||
> - try_to_reconnect(thd, mysql, mi, &retry_count, suppress_warnings,
> - reconnect_messages[SLAVE_RECON_ACT_REG]))
> + if (!check_io_slave_killed(thd, mi, "Slave I/O thread killed "
> + "while registering slave on master"))
> + {
> + sql_print_error("Slave I/O thread couldn't register on
> master");
I think we are better to remove this like as there is already
mi->report(ERROR_LEVEL,...) in register_slave_on_master().
> + if (try_to_reconnect(thd, mysql, mi, &retry_count, suppress_warnings,
> + reconnect_messages[SLAVE_RECON_ACT_REG]))
> + goto err;
> + }
> + else
> goto err;
> goto connected;
> }
>
thanks for the great analysis and work!
cheers,
Andrei