Andrei Elkin wrote:
> Mats, hello.
>
>> Andrei Elkin wrote:
>>> Mats, hello.
>>>
>>> You are still formally a reviewer for the bug. Since the last time we
>>> discussed the issue and my idea of fixes it got a customer.
>>> I have analyzed my last changeset once again to be utmost confident
>>> in the idea implemented in
>>>
>>> http://lists.mysql.com/commits/46451
>>>
>>> Let me put it all once again to avoid spawning yet more our last year
>>> thread.
>>>
>>> THE PROBLEM
>>>
>>> We have overreaching error messages in the slave's server error log.
>>> The message is either Deadlock or Timeout in waiting for a lock
>>> at applying a replication event. It is not really a problem but rather
>>> transient issue and should not be noticed by the user.
>>> \footnote{The temporary status changes to the regular once the slave
>>> has re-tried to apply the failing events unsuccessfully for
>>> rli->trans_retries == slave_trans_retries times}
>>>
>>>
>>> THE OFFERED SOLUTION
>>>
>>> To avoid logging an error having temporary status the reporting method
>>>
>>> rli->report(error description args)
>>>
>>> should call has_temporary_error() to sort out the actual status.
>>> If the error is temporary it won't force an error line in the error
>>> log.
>>>
>>>
>>>
>>> THE IMPLEMENTATION
>>>
>>> rli->report() demotes the error's severity to the warning level
>>> basing on has_temporary_error() report.
>>>
>>> has_temporary_error() becomes available in rpl_reporting.cc as well
>>> still to be executed exclusively by the SQL thread.
>>>
>>> The temporary to the regular status promotion is achieved as simple as
>>> with raising up thd->is_fatal_error flag.
>>>
>>>
>>> BEING DISCUSSED ISSUES
>>>
>>> has_temporary_error() is not necessary to be precise in detection of
>>> the temporary status. The function is rather optimistic to give a
>>> chance to re-apply (uselessly) an event e.g in a case of the last
>>> applying cycle produced a sequence of errors with the temporary in
>>> the end: a-reg-err_1, ..., a-temp-err_N.
>>>
>>> This optimistic feature is irrelevant to the idea of the
>>> solution. It does not make the SQL slave not to notice any errors
>>> and when it eventually stops all errors and warnings are printed
>>> out.
>>>
>>> A technical issue to change signature of
>>> Slave_reporting_capability::report to accept THD* as an arg:
>>> rli->report(..., THD *thd, ...)
>>>
>>> I would do that, still in a separate patch.
>>>
>>>
>>> I would be happy to hear your opinion.
>
>> The problem is still the following, to which I have not seen a convincing
>> explanation or alternative solution: the reporting function report() demotes the
>> error to a warning if the thread has a temporary error, which may or may not be
>> the same as the one that is passed.
>>
>> Suppose that the function is used in this way somewhere, maybe by mistake, maybe
>> intentional, with a new error code ER_DEADLOCK_IS_FATAL:
>>
>> /*
>> If we get a deadlock here, this is to be reported as an error.
>> */
>> int error= thd->stmt_da->sql_errno();
>> if (error == ER_LOCK_DEADLOCK)
>> thd->report(ERROR_LEVEL, ER_DEADLOCK_IS_FATAL, "A fatal deadlock occured");
>
> That would be an incorrect use.
> What's a point of translating a temporary error into an ordinary one?
> The ordinary error would be induced by a temporary, but it's not equal
> to the temporary.
> Therefore the correct way to achive the goal is to report the ordinary,
> like ER_DEADLOCK_IS_FATAL, not instead but afterward.
> And if one *really* needs translation he has to *remove* all traces of
> the temporary, incl ones from the diagnostics area, and to report the ordinary
> on a "clean sheet".
No objections there, but that means that the interface, basically the report()
function, needs to be changed so that the above cannot be done by "normal" mistakes.
>> Now, since the patch demotes the error to a warning inside report(), this will
>> cause the function above to not do what was intended because the calling code
>> thinks this is an error (ER_DEADLOCK_IS_FATAL) while report() thinks this is a
>> warning (ER_LOCK_DEADLOCK). This occurs because has_temporary_error() talks
>> about the error stored in the diagnostics area (ER_LOCK_DEADLOCK) while the
>> calling code talks about the code passed to the report() function
>> (ER_DEADLOCK_IS_FATAL).
>>
>> From my point of view, you can solve this either by:
>>
>> - Convincing me that this cannot happen, i.e.: that nobody currently does or
>> ever wants to do anything like the above.
>
> I believe it can not happen while we stick to some sensible principle.
> One is suggest above: don't do tricks instead report new error.
Right, so we want to eliminate the possibility of doing tricks like the above.
>> - Changing the patch to ensure that has_temporary_error() and calling code talks
>> about the same error code either by:
>>
>> - Passing the error code passed to report() further on to the
>> has_temporary_error() function when doing the check
>>
>
>> - Removing the error code from the parameter list of the report() function and
>> reading the error code from the diagnostics area inside the report()
>> function
>
> Yes, the 2nd is something I did not make but liked.
>
> Passing the last error through da does not make the convention of not doing tricks
> irrevant. But it would make that activity exposed more so that reviewers hardly will
> miss the chance to ask about.
>
> I believe we are on the same page now and I am submitting the da-error-trasport
> patch
> in a short time.
Sounds excellent! I'm eagerly awaiting a patch.
Best wishes,
Mats Kindahl
--
Mats Kindahl
Senior Software Engineer
Database Technology Group
Sun Microsystems