List:Commits« Previous MessageNext Message »
From:Rafal Somla Date:June 29 2007 11:01am
Subject:Re: bk commit into 5.1 tree (ramil:1.2524) BUG#19328
View as plain text  
Hi Ramil,

Thanks for implementing my suggestions and especially adding documentation to 
try_to_reconnect(). Below I propose how to extend it, hopefully making it a bit 
more clear what the function actually does.

I have also one more suggestion (implement it if you like it). I'd move handling 
of master_retry_count outside try_to_reconnect() function. I.e., make the 
function to reconnect unconditionally and simply not call it if we don't want 
to. I know it will cause code repetition but I think it will make the code 
easier to read and understand.

Best,
Rafal

ramil@stripped wrote:
> Below is the list of changes that have just been committed into a local
> 5.1 repository of ram. When ram does a push these changes will
> be propagated to the main repository and, within 24 hours after the
> push, to the public repository.
> For information on how to access the public repository
> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
> 
> ChangeSet@stripped, 2007-06-28 17:18:29+05:00, ramil@stripped +1 -0
>   Fix for bug #19328: Slave timeout with COM_REGISTER_SLAVE error causing stop
>   
>   Problem: "Under high load, the slave registering to the master can timeout 
>   during the COM_REGISTER_SLAVE execution. This causes an error, which 
>   prevents the slave from connecting at all."
>   
>   Fix: Do not abort immediately, but retry registering on master.
> 
>   sql/slave.cc@stripped, 2007-06-28 17:18:27+05:00, ramil@stripped +203 -100
>     Fix for bug #19328: Slave timeout with COM_REGISTER_SLAVE error causing stop
>       - retry connecting after a failed slave registration on master.
>         Reconnecting has been factored out (see try_to_reconnect() and 
>         check_io_slave_killed()) to get rid of similar code used during 
>         registering on master/dump requesting/event reading.
>         All messages have been moved to reconnect_messages[] array 
>         for easy usage and maintenance.
>       - now one can force slave to reconnect using 
>         ./mtr --mysqld=--loose-debug=d,FORCE_SLAVE_TO_RECONNECT_{REG, DUMP, EVENT}
> 
> diff -Nrup a/sql/slave.cc b/sql/slave.cc
> --- a/sql/slave.cc	2007-06-12 01:15:29 +05:00
> +++ b/sql/slave.cc	2007-06-28 17:18:27 +05:00
(...)
> @@ -1879,6 +1942,89 @@ on this slave.\
>  }
>  
>  
> +static bool check_io_slave_killed(THD *thd, MASTER_INFO *mi, const char *info)
> +{
> +  if (io_slave_killed(thd, mi))
> +  {
> +    if (global_system_variables.log_warnings)
> +      sql_print_information(info);
> +    return TRUE;
> +  }
> +  return FALSE;
> +}
> +
> +
> +/*

Doxygen comment should start with /**

> +  @brief Try to reconnect slave IO thread.
> +  

I propose to add this to detailed description:

Terminates current connection to master, sleeps for @c mi->connect_retry msecs 
and initiates new connection with @c safe_reconnect(). Variable pointed by @c 
retry_count is increased - if it exceeds @c master_retry_count then connection 
is not re-established and function signals error.

Unless suppres_warnings is TRUE, a warning is put in the server error log when 
reconnecting. The warning message and messages used to report errors are taken 
from @c messages array. In case @c master_retry_count is exceeded, no messages 
are added to the log.

> +  @details Under high load, the slave registering on the master/
> +  requesting a dump/reading an event can timeout during execution.
> +  If so, don't abort immediately, but retry reconnecting up to
> +  master_retry_count times.
> +  
> +  @param[in]     thd                 Thread context.
> +  @param[in]     mysql               MySQL connection.
> +  @param[in]     mi                  Master connection information.
> +  @param[in,out] retry_count         Number of attempts to reconnect.
> +  @param[in]     suppress_warnings   TRUE when a normal net read timeout 
> +                                     has caused to reconnecting.
> +  @param[in]     messages            Messages to print/log, see 
> +                                     reconnect_messages[] array.
> +
> +  @retval        0                   OK.
> +  @retval        1                   There was an error.
> +*/
> +
> +static int try_to_reconnect(THD *thd, MYSQL *mysql, MASTER_INFO *mi,
> +                            uint *retry_count, bool suppress_warnings,
> +                            const char *messages[SLAVE_RECON_MSG_MAX])
> +{
> +  mi->slave_running= MYSQL_SLAVE_RUN_NOT_CONNECT;
> +  thd->proc_info= messages[SLAVE_RECON_MSG_WAIT];
> +#ifdef SIGNAL_WITH_VIO_CLOSE  
> +  thd->clear_active_vio();
> +#endif
> +  end_server(mysql);
> +  if ((*retry_count)++)
> +  {
> +    if (*retry_count > master_retry_count)
> +      return 1;                             // Don't retry forever
> +    safe_sleep(thd, mi->connect_retry, (CHECK_KILLED_FUNC) io_slave_killed,
> +               (void *) mi);
> +  }
> +  if (check_io_slave_killed(thd, mi, messages[SLAVE_RECON_MSG_KILLED_WAITING]))
> +    return 1;
> +  thd->proc_info = messages[SLAVE_RECON_MSG_AFTER];
> +  if (!suppress_warnings) 
> +  {
> +    char buf[256], llbuff[22];
> +    my_snprintf(buf, sizeof(buf), messages[SLAVE_RECON_MSG_FAILED], 
> +                IO_RPL_LOG_NAME, llstr(mi->master_log_pos, llbuff));
> +    /* 
> +      Raise a warining during registering on master/requesting dump.
> +      Log a message reading event.
> +    */
> +    if (messages[SLAVE_RECON_MSG_COMMAND][0])
> +    {
> +      mi->report(WARNING_LEVEL, ER_SLAVE_MASTER_COM_FAILURE,
> +                 ER(ER_SLAVE_MASTER_COM_FAILURE), 
> +                 messages[SLAVE_RECON_MSG_COMMAND], buf);
> +    }
> +    else
> +    {
> +      sql_print_information(buf);
> +    }
> +  }
> +  if (safe_reconnect(thd, mysql, mi, 1) || io_slave_killed(thd, mi))
> +  {
> +    if (global_system_variables.log_warnings)
> +      sql_print_information(messages[SLAVE_RECON_MSG_KILLED_AFTER]);
> +    return 1;
> +  }
> +  return 0;
> +}
> +
Thread
bk commit into 5.1 tree (ramil:1.2524) BUG#19328ramil28 Jun
  • Re: bk commit into 5.1 tree (ramil:1.2524) BUG#19328Rafal Somla29 Jun