List:Commits« Previous MessageNext Message »
From:Mats Kindahl Date:June 25 2009 2:59pm
Subject:Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524
View as plain text  

Andrei Elkin wrote:
> Mats, hello.
> 
>> Andrei Elkin wrote:
>>> Mats, hello.
>>>
>>> You are still formally a reviewer for the bug.  Since the last time we
>>> discussed the issue and my idea of fixes it got a customer.
>>> I have analyzed my last changeset once again to be utmost confident
>>> in the idea implemented in
>>>
>>> http://lists.mysql.com/commits/46451
>>>
>>> Let me put it all once again to avoid spawning yet more our last year
>>> thread.
>>>
>>> THE PROBLEM
>>>
>>> We have overreaching error messages in the slave's server error log.
>>> The message is either  Deadlock or Timeout in waiting for a lock
>>> at applying a replication event. It is not really a problem but rather
>>> transient issue and should not be noticed by the user.
>>> \footnote{The temporary status changes to the regular once the slave
>>>           has re-tried to apply the failing events unsuccessfully for
>>>           rli->trans_retries == slave_trans_retries times}
>>>
>>>
>>> THE OFFERED SOLUTION
>>>
>>> To avoid logging an error having temporary status the reporting method
>>>
>>>    rli->report(error description args) 
>>>
>>> should call has_temporary_error() to sort out the actual status.
>>> If the error is temporary it won't force an error line in the error
>>> log.
>>>
>>>
>>>
>>> THE IMPLEMENTATION
>>>
>>> rli->report() demotes the error's severity to the warning level
>>> basing on has_temporary_error() report.
>>>
>>> has_temporary_error() becomes available in rpl_reporting.cc as well
>>> still to be executed exclusively by the SQL thread.
>>>
>>> The temporary to the regular status promotion is achieved as simple as
>>> with raising up thd->is_fatal_error flag.
>>>
>>>
>>> BEING DISCUSSED ISSUES
>>>
>>> has_temporary_error() is not necessary to be precise in detection of
>>>   the temporary status. The function is rather optimistic to give a
>>>   chance to re-apply (uselessly) an event e.g in a case of the last
>>>   applying cycle produced a sequence of errors with the temporary in
>>>   the end: a-reg-err_1, ..., a-temp-err_N.
>>>
>>>   This optimistic feature is irrelevant to the idea of the
>>>   solution. It does not make the SQL slave not to notice any errors
>>>   and when it eventually stops all errors and warnings are printed
>>>   out.
>>>
>>> A technical issue to change signature of
>>>   Slave_reporting_capability::report to accept THD* as an arg:
>>>   rli->report(..., THD *thd, ...)
>>>
>>>   I would do that, still in a separate patch.
>>>
>>>
>>> I would be happy to hear your opinion.
> 
>> The problem is still the following, to which I have not seen a convincing
>> explanation or alternative solution: the reporting function report() demotes the
>> error to a warning if the thread has a temporary error, which may or may not be
>> the same as the one that is passed.
>>
>> Suppose that the function is used in this way somewhere, maybe by mistake, maybe
>> intentional, with a new error code ER_DEADLOCK_IS_FATAL:
>>
>> /*
>>    If we get a deadlock here, this is to be reported as an error.
>>  */
>> int error= thd->stmt_da->sql_errno();
>> if (error == ER_LOCK_DEADLOCK)
>>   thd->report(ERROR_LEVEL, ER_DEADLOCK_IS_FATAL, "A fatal deadlock occured");
> 
> That would be an incorrect use.
> What's a point of translating a temporary error into an ordinary one?
> The ordinary error would be induced by a temporary, but it's not equal
> to the temporary.
> Therefore the correct way to achive the goal is to report the ordinary,
> like ER_DEADLOCK_IS_FATAL, not instead but afterward.
> And if one *really* needs translation he has to *remove* all traces of 
> the temporary, incl ones from the diagnostics area, and to report the ordinary
> on a "clean sheet".

No objections there, but that means that the interface, basically the report()
function, needs to be changed so that the above cannot be done by "normal" mistakes.

>> Now, since the patch demotes the error to a warning inside report(), this will
>> cause the function above to not do what was intended because the calling code
>> thinks this is an error (ER_DEADLOCK_IS_FATAL) while report() thinks this is a
>> warning (ER_LOCK_DEADLOCK). This occurs because has_temporary_error() talks
>> about the error stored in the diagnostics area (ER_LOCK_DEADLOCK) while the
>> calling code talks about the code passed to the report() function
>> (ER_DEADLOCK_IS_FATAL).
>>
>> From my point of view, you can solve this either by:
>>
>> - Convincing me that this cannot happen, i.e.: that nobody currently does or
>>   ever wants to do anything like the above.
> 
> I believe it can not happen while we stick to some sensible principle.
> One is suggest above: don't do tricks instead report new error.

Right, so we want to eliminate the possibility of doing tricks like the above.

>> - Changing the patch to ensure that has_temporary_error() and calling code talks
>>   about the same error code either by:
>>
>>   - Passing the error code passed to report() further on to the
>>     has_temporary_error() function when doing the check
>>
> 
>>   - Removing the error code from the parameter list of the report() function and
>>     reading the error code from the diagnostics area inside the report()
>>     function
> 
> Yes, the 2nd is something I did not make but liked.
> 
> Passing the last error through da does not make the convention of not doing tricks
> irrevant. But it would make that activity exposed more so that reviewers hardly will
> miss the chance to ask about.
> 
> I believe we are on the same page now and I am submitting the da-error-trasport
> patch
> in a short time.

Sounds excellent! I'm eagerly awaiting a patch.

Best wishes,
Mats Kindahl
-- 
Mats Kindahl
Senior Software Engineer
Database Technology Group
Sun Microsystems
Thread
bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin7 May
  • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl7 May
    • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin8 May
      • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl13 May
        • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin13 May
          • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin17 Jun 2009
            • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl25 Jun 2009
              • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin25 Jun 2009
                • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Mats Kindahl25 Jun 2009
                  • Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524Andrei Elkin29 Jun 2009