Hi Andrei!
Comments inline below.
/Matz
Andrei Elkin wrote:
> Below is the list of changes that have just been committed into a local
> 5.1 repository of aelkin. When aelkin does a push these changes
> will be propagated to the main repository and, within 24 hours after the
> push, to the public repository.
> For information on how to access the public repository
> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>
> ChangeSet@stripped, 2008-05-07 15:49:18+03:00, aelkin@stripped +3 -0
> Bug #36524 incident of deadlock on slave is overdramatized
>
> In cases of temporary errors in replication event execution (deadlock, timeout)
> the error log gained an overreacting error message whereas just a warning would be
> fine.
>
> Fixed with checking of the temporary status of the error
> inside Slave_reporting_capability::report() to demote the error to the warning
> level if needed.
> The warning gets converted into the error whenever number of successive attempts to
> re-apply
> the event(s) gets equal to the global server variable slave_trans_retries.
>
> sql/rpl_reporting.cc@stripped, 2008-05-07 15:49:16+03:00, aelkin@stripped
> +16 -0
> putting burden of decision whether to report the message as the error or
> as the warning to report() method.
>
> sql/slave.cc@stripped, 2008-05-07 15:49:16+03:00, aelkin@stripped +4 -1
> exporting the function to be used in log_event.cc;
> promoting a temporary error to the fatal level so that
> has_temporary_error() won't return TRUE if is_fatal_error is ON.
>
> sql/slave.h@stripped, 2008-05-07 15:49:16+03:00, aelkin@stripped +1 -0
> the function starts to be shared by log_event.cc as well.
>
> diff -Nrup a/sql/rpl_reporting.cc b/sql/rpl_reporting.cc
> --- a/sql/rpl_reporting.cc 2007-06-11 23:15:28 +03:00
> +++ b/sql/rpl_reporting.cc 2008-05-07 15:49:16 +03:00
> @@ -2,6 +2,20 @@
> #include "mysql_priv.h"
> #include "rpl_reporting.h"
>
> +/**
> + The method composes a message for the error log and/or
> + for future show slave status exposure.
> +
> + @param level designate seriousity of the message.
> + @param err_code the user level error code
> + @param msg the format string of the message,
> + can be followed with more arguments
> + providing the data for the format string
> +
> + @note level can be corrected inside. E.g when the error
> + appears to be a temprorary the level is demoted
>
typo "temprorary"
> + to WARNING_LEVEL
> + */
> void
> Slave_reporting_capability::report(loglevel level, int err_code,
> const char *msg, ...) const
> @@ -13,6 +27,8 @@ Slave_reporting_capability::report(logle
> va_list args;
> va_start(args, msg);
>
> + if (level == ERROR_LEVEL && has_temporary_error(current_thd))
> + level= WARNING_LEVEL;
>
Why do you put the logic here? Isn't the what log level should be used a
decision to be made by the caller? This seems to put to many features
into the function, which previously had the role of just reporting an
error at a certain level.
Why is this a problem? This will make all errors reported through this
interface be demoted provided the current thread has a temporary error,
regardless of what is actually being reported. This can cause two
strange situations, to a naïve user of the reporting function:
1. A different error than the error present for the current thread is
passed to the reporting function. This can be a decision on the
callers behalf to regard a certain temporary error as a fatal
error, but the reporting function will demote this error to a
warning level just because the caller has "forgotten" to set the
"fatal error" flag.
2. It is necessary to add code to prevent the error message to be
repeated ad-nauseum, which means that part of the logic for
deciding if this should be a warning is in the caller and part is
inside the reporting function. This split just introduces an
unnecessary coupling between the caller and the callee.
I would like you to pass in the correct log-level to the reporting
function instead, which will also keep the has_temporary_error()
function inside the file and not exposing it outwards.
> switch (level)
> {
> case ERROR_LEVEL:
> diff -Nrup a/sql/slave.cc b/sql/slave.cc
> --- a/sql/slave.cc 2008-03-31 11:55:42 +03:00
> +++ b/sql/slave.cc 2008-05-07 15:49:16 +03:00
> @@ -1780,7 +1780,7 @@ int check_expected_error(THD* thd, Relay
> that the error is temporary by pushing a warning with the error code
> ER_GET_TEMPORARY_ERRMSG, if the originating error is temporary.
> */
> -static int has_temporary_error(THD *thd)
> +int has_temporary_error(THD *thd)
> {
> DBUG_ENTER("has_temporary_error");
>
> @@ -2121,10 +2121,13 @@ static int exec_relay_log_event(THD* thd
> }
> }
> else
> + {
> sql_print_error("Slave SQL thread retried transaction %lu time(s) "
> "in vain, giving up. Consider raising the value of "
> "the slave_transaction_retries variable.",
> slave_trans_retries);
> + thd->is_fatal_error= 1; // to force has_temporary_error() return 0
> + }
> }
> else if (exec_res && !temp_err ||
> (opt_using_transactions &&
> diff -Nrup a/sql/slave.h b/sql/slave.h
> --- a/sql/slave.h 2008-03-29 14:19:49 +02:00
> +++ b/sql/slave.h 2008-05-07 15:49:16 +03:00
> @@ -190,6 +190,7 @@ void set_slave_thread_default_charset(TH
> void rotate_relay_log(Master_info* mi);
> int apply_event_and_update_pos(Log_event* ev, THD* thd, Relay_log_info* rli,
> bool skip);
> +inline int has_temporary_error(THD *thd);
>
It is pointless to *declare* a function inline when it doesn't have a
definition. Remove the keyword, it just adds confusion.
>
> pthread_handler_t handle_slave_io(void *arg);
> pthread_handler_t handle_slave_sql(void *arg);
>
>
--
Mats Kindahl
Lead Software Developer
Replication Team
MySQL AB, www.mysql.com