MySQL Lists are EOL. Please join:

List:Commits« Previous MessageNext Message »
From:Mats Kindahl Date:May 7 2008 7:32pm
Subject:Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524
View as plain text  
Hi Andrei!

Comments inline below.

/Matz

Andrei Elkin wrote:
> Below is the list of changes that have just been committed into a local
> 5.1 repository of aelkin.  When aelkin does a push these changes
> will be propagated to the main repository and, within 24 hours after the
> push, to the public repository.
> For information on how to access the public repository
> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>
> ChangeSet@stripped, 2008-05-07 15:49:18+03:00, aelkin@stripped +3 -0
>   Bug #36524 incident of deadlock on slave is overdramatized
>   
>   In cases of temporary errors in replication event execution (deadlock, timeout)
>   the error log gained an overreacting error message whereas just a warning would be
> fine.
>   
>   Fixed with checking of the temporary status of the error 
>   inside Slave_reporting_capability::report() to demote the error to the warning
> level if needed.
>   The warning gets converted into the error whenever number of successive attempts to
> re-apply
>   the event(s) gets equal to the global server variable slave_trans_retries.
>
>   sql/rpl_reporting.cc@stripped, 2008-05-07 15:49:16+03:00, aelkin@stripped
> +16 -0
>     putting burden of decision whether to report the message as the error or
>     as the warning to report() method.
>
>   sql/slave.cc@stripped, 2008-05-07 15:49:16+03:00, aelkin@stripped +4 -1
>     exporting the function to be used in log_event.cc;
>     promoting a temporary error to the fatal level so that
>     has_temporary_error() won't return TRUE if is_fatal_error is ON.
>
>   sql/slave.h@stripped, 2008-05-07 15:49:16+03:00, aelkin@stripped +1 -0
>     the function starts to be shared by log_event.cc as well.
>
> diff -Nrup a/sql/rpl_reporting.cc b/sql/rpl_reporting.cc
> --- a/sql/rpl_reporting.cc	2007-06-11 23:15:28 +03:00
> +++ b/sql/rpl_reporting.cc	2008-05-07 15:49:16 +03:00
> @@ -2,6 +2,20 @@
>  #include "mysql_priv.h"
>  #include "rpl_reporting.h"
>  
> +/**
> +   The method composes a message for the error log and/or
> +   for future show slave status exposure.
> +
> +   @param level     designate seriousity of the message.
> +   @param err_code  the user level error code
> +   @param msg       the format string of the message,
> +                    can be followed with more arguments
> +                    providing the data for the format string
> +
> +   @note  level can be corrected inside. E.g when the error
> +          appears to be a temprorary the level is demoted
>   
typo "temprorary"

> +          to WARNING_LEVEL
> + */
>  void
>  Slave_reporting_capability::report(loglevel level, int err_code,
>                                     const char *msg, ...) const
> @@ -13,6 +27,8 @@ Slave_reporting_capability::report(logle
>    va_list args;
>    va_start(args, msg);
>  
> +  if (level == ERROR_LEVEL && has_temporary_error(current_thd))
> +    level= WARNING_LEVEL;
>   

Why do you put the logic here? Isn't the what log level should be used a 
decision to be made by the caller? This seems to put to many features 
into the function, which previously had the role of just reporting an 
error at a certain level.

Why is this a problem? This will make all errors reported through this 
interface be demoted provided the current thread has a temporary error, 
regardless of what is actually being reported. This can cause two 
strange situations, to a naïve user of the reporting function:

   1. A different error than the error present for the current thread is
      passed to the reporting function. This can be a decision on the
      callers behalf to regard a certain temporary error as a fatal
      error, but the reporting function will demote this error to a
      warning level just because the caller has "forgotten" to set the
      "fatal error" flag.
   2. It is necessary to add code to prevent the error message to be
      repeated ad-nauseum, which means that part of the logic for
      deciding if this should be a warning is in the caller and part is
      inside the reporting function. This split just introduces an
      unnecessary coupling between the caller and the callee.

I would like you to pass in the correct log-level to the reporting 
function instead, which will also keep the has_temporary_error() 
function inside the file and not exposing it outwards.

>    switch (level)
>    {
>    case ERROR_LEVEL:
> diff -Nrup a/sql/slave.cc b/sql/slave.cc
> --- a/sql/slave.cc	2008-03-31 11:55:42 +03:00
> +++ b/sql/slave.cc	2008-05-07 15:49:16 +03:00
> @@ -1780,7 +1780,7 @@ int check_expected_error(THD* thd, Relay
>    that the error is temporary by pushing a warning with the error code
>    ER_GET_TEMPORARY_ERRMSG, if the originating error is temporary.
>  */
> -static int has_temporary_error(THD *thd)
> +int has_temporary_error(THD *thd)
>  {
>    DBUG_ENTER("has_temporary_error");
>  
> @@ -2121,10 +2121,13 @@ static int exec_relay_log_event(THD* thd
>            }
>          }
>          else
> +        {
>            sql_print_error("Slave SQL thread retried transaction %lu time(s) "
>                            "in vain, giving up. Consider raising the value of "
>                            "the slave_transaction_retries variable.",
>                            slave_trans_retries);
> +          thd->is_fatal_error= 1; // to force has_temporary_error() return 0
> +        }
>        }
>        else if (exec_res && !temp_err ||
>                 (opt_using_transactions &&
> diff -Nrup a/sql/slave.h b/sql/slave.h
> --- a/sql/slave.h	2008-03-29 14:19:49 +02:00
> +++ b/sql/slave.h	2008-05-07 15:49:16 +03:00
> @@ -190,6 +190,7 @@ void set_slave_thread_default_charset(TH
>  void rotate_relay_log(Master_info* mi);
>  int apply_event_and_update_pos(Log_event* ev, THD* thd, Relay_log_info* rli,
>                                 bool skip);
> +inline int has_temporary_error(THD *thd);
>   

It is pointless to *declare* a function inline when it doesn't have a 
definition. Remove the keyword, it just adds confusion.

>  
>  pthread_handler_t handle_slave_io(void *arg);
>  pthread_handler_t handle_slave_sql(void *arg);
>
>   


-- 
Mats Kindahl
Lead Software Developer
Replication Team
MySQL AB, www.mysql.com


Thread
bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin7 May
  • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl7 May
    • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin8 May
      • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl13 May
        • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin13 May
          • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin17 Jun
            • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl25 Jun
              • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin25 Jun
                • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl25 Jun
                  • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin29 Jun