List:Commits« Previous MessageNext Message »
From:Andrei Elkin Date:June 25 2009 2:46pm
Subject:Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524
View as plain text  
Mats, hello.

> Andrei Elkin wrote:
>> Mats, hello.
>> 
>> You are still formally a reviewer for the bug.  Since the last time we
>> discussed the issue and my idea of fixes it got a customer.
>> I have analyzed my last changeset once again to be utmost confident
>> in the idea implemented in
>> 
>> http://lists.mysql.com/commits/46451
>> 
>> Let me put it all once again to avoid spawning yet more our last year
>> thread.
>> 
>> THE PROBLEM
>> 
>> We have overreaching error messages in the slave's server error log.
>> The message is either  Deadlock or Timeout in waiting for a lock
>> at applying a replication event. It is not really a problem but rather
>> transient issue and should not be noticed by the user.
>> \footnote{The temporary status changes to the regular once the slave
>>           has re-tried to apply the failing events unsuccessfully for
>>           rli->trans_retries == slave_trans_retries times}
>> 
>> 
>> THE OFFERED SOLUTION
>> 
>> To avoid logging an error having temporary status the reporting method
>> 
>>    rli->report(error description args) 
>> 
>> should call has_temporary_error() to sort out the actual status.
>> If the error is temporary it won't force an error line in the error
>> log.
>> 
>> 
>> 
>> THE IMPLEMENTATION
>> 
>> rli->report() demotes the error's severity to the warning level
>> basing on has_temporary_error() report.
>> 
>> has_temporary_error() becomes available in rpl_reporting.cc as well
>> still to be executed exclusively by the SQL thread.
>> 
>> The temporary to the regular status promotion is achieved as simple as
>> with raising up thd->is_fatal_error flag.
>> 
>> 
>> BEING DISCUSSED ISSUES
>> 
>> has_temporary_error() is not necessary to be precise in detection of
>>   the temporary status. The function is rather optimistic to give a
>>   chance to re-apply (uselessly) an event e.g in a case of the last
>>   applying cycle produced a sequence of errors with the temporary in
>>   the end: a-reg-err_1, ..., a-temp-err_N.
>> 
>>   This optimistic feature is irrelevant to the idea of the
>>   solution. It does not make the SQL slave not to notice any errors
>>   and when it eventually stops all errors and warnings are printed
>>   out.
>> 
>> A technical issue to change signature of
>>   Slave_reporting_capability::report to accept THD* as an arg:
>>   rli->report(..., THD *thd, ...)
>> 
>>   I would do that, still in a separate patch.
>> 
>> 
>> I would be happy to hear your opinion.

>
> The problem is still the following, to which I have not seen a convincing
> explanation or alternative solution: the reporting function report() demotes the
> error to a warning if the thread has a temporary error, which may or may not be
> the same as the one that is passed.
>
> Suppose that the function is used in this way somewhere, maybe by mistake, maybe
> intentional, with a new error code ER_DEADLOCK_IS_FATAL:
>
> /*
>    If we get a deadlock here, this is to be reported as an error.
>  */
> int error= thd->stmt_da->sql_errno();
> if (error == ER_LOCK_DEADLOCK)
>   thd->report(ERROR_LEVEL, ER_DEADLOCK_IS_FATAL, "A fatal deadlock occured");

That would be an incorrect use.
What's a point of translating a temporary error into an ordinary one?
The ordinary error would be induced by a temporary, but it's not equal
to the temporary.
Therefore the correct way to achive the goal is to report the ordinary,
like ER_DEADLOCK_IS_FATAL, not instead but afterward.
And if one *really* needs translation he has to *remove* all traces of 
the temporary, incl ones from the diagnostics area, and to report the ordinary
on a "clean sheet".

>
> Now, since the patch demotes the error to a warning inside report(), this will
> cause the function above to not do what was intended because the calling code
> thinks this is an error (ER_DEADLOCK_IS_FATAL) while report() thinks this is a
> warning (ER_LOCK_DEADLOCK). This occurs because has_temporary_error() talks
> about the error stored in the diagnostics area (ER_LOCK_DEADLOCK) while the
> calling code talks about the code passed to the report() function
> (ER_DEADLOCK_IS_FATAL).
>
> From my point of view, you can solve this either by:
>
> - Convincing me that this cannot happen, i.e.: that nobody currently does or
>   ever wants to do anything like the above.

I believe it can not happen while we stick to some sensible principle.
One is suggest above: don't do tricks instead report new error.

>
> - Changing the patch to ensure that has_temporary_error() and calling code talks
>   about the same error code either by:
>
>   - Passing the error code passed to report() further on to the
>     has_temporary_error() function when doing the check
>

>   - Removing the error code from the parameter list of the report() function and
>     reading the error code from the diagnostics area inside the report()
>     function

Yes, the 2nd is something I did not make but liked.

Passing the last error through da does not make the convention of not doing tricks
irrevant. But it would make that activity exposed more so that reviewers hardly will
miss the chance to ask about.

I believe we are on the same page now and I am submitting the da-error-trasport patch
in a short time.

cheers,

Andrei

Thread
bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin7 May
  • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl7 May
    • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin8 May
      • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl13 May
        • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin13 May
          • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin17 Jun 2009
            • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl25 Jun 2009
              • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin25 Jun 2009
                • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl25 Jun 2009
                  • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin29 Jun 2009