From: Date: June 25 2009 2:59pm Subject: Re: bk commit into 5.1 tree (aelkin:1.2612) BUG#36524 List-Archive: http://lists.mysql.com/commits/77187 Message-Id: <4A4374B0.5030108@sun.com> MIME-Version: 1.0 Content-Type: text/plain; CHARSET=US-ASCII Content-Transfer-Encoding: 7BIT Andrei Elkin wrote: > Mats, hello. > >> Andrei Elkin wrote: >>> Mats, hello. >>> >>> You are still formally a reviewer for the bug. Since the last time we >>> discussed the issue and my idea of fixes it got a customer. >>> I have analyzed my last changeset once again to be utmost confident >>> in the idea implemented in >>> >>> http://lists.mysql.com/commits/46451 >>> >>> Let me put it all once again to avoid spawning yet more our last year >>> thread. >>> >>> THE PROBLEM >>> >>> We have overreaching error messages in the slave's server error log. >>> The message is either Deadlock or Timeout in waiting for a lock >>> at applying a replication event. It is not really a problem but rather >>> transient issue and should not be noticed by the user. >>> \footnote{The temporary status changes to the regular once the slave >>> has re-tried to apply the failing events unsuccessfully for >>> rli->trans_retries == slave_trans_retries times} >>> >>> >>> THE OFFERED SOLUTION >>> >>> To avoid logging an error having temporary status the reporting method >>> >>> rli->report(error description args) >>> >>> should call has_temporary_error() to sort out the actual status. >>> If the error is temporary it won't force an error line in the error >>> log. >>> >>> >>> >>> THE IMPLEMENTATION >>> >>> rli->report() demotes the error's severity to the warning level >>> basing on has_temporary_error() report. >>> >>> has_temporary_error() becomes available in rpl_reporting.cc as well >>> still to be executed exclusively by the SQL thread. >>> >>> The temporary to the regular status promotion is achieved as simple as >>> with raising up thd->is_fatal_error flag. >>> >>> >>> BEING DISCUSSED ISSUES >>> >>> has_temporary_error() is not necessary to be precise in detection of >>> the temporary status. The function is rather optimistic to give a >>> chance to re-apply (uselessly) an event e.g in a case of the last >>> applying cycle produced a sequence of errors with the temporary in >>> the end: a-reg-err_1, ..., a-temp-err_N. >>> >>> This optimistic feature is irrelevant to the idea of the >>> solution. It does not make the SQL slave not to notice any errors >>> and when it eventually stops all errors and warnings are printed >>> out. >>> >>> A technical issue to change signature of >>> Slave_reporting_capability::report to accept THD* as an arg: >>> rli->report(..., THD *thd, ...) >>> >>> I would do that, still in a separate patch. >>> >>> >>> I would be happy to hear your opinion. > >> The problem is still the following, to which I have not seen a convincing >> explanation or alternative solution: the reporting function report() demotes the >> error to a warning if the thread has a temporary error, which may or may not be >> the same as the one that is passed. >> >> Suppose that the function is used in this way somewhere, maybe by mistake, maybe >> intentional, with a new error code ER_DEADLOCK_IS_FATAL: >> >> /* >> If we get a deadlock here, this is to be reported as an error. >> */ >> int error= thd->stmt_da->sql_errno(); >> if (error == ER_LOCK_DEADLOCK) >> thd->report(ERROR_LEVEL, ER_DEADLOCK_IS_FATAL, "A fatal deadlock occured"); > > That would be an incorrect use. > What's a point of translating a temporary error into an ordinary one? > The ordinary error would be induced by a temporary, but it's not equal > to the temporary. > Therefore the correct way to achive the goal is to report the ordinary, > like ER_DEADLOCK_IS_FATAL, not instead but afterward. > And if one *really* needs translation he has to *remove* all traces of > the temporary, incl ones from the diagnostics area, and to report the ordinary > on a "clean sheet". No objections there, but that means that the interface, basically the report() function, needs to be changed so that the above cannot be done by "normal" mistakes. >> Now, since the patch demotes the error to a warning inside report(), this will >> cause the function above to not do what was intended because the calling code >> thinks this is an error (ER_DEADLOCK_IS_FATAL) while report() thinks this is a >> warning (ER_LOCK_DEADLOCK). This occurs because has_temporary_error() talks >> about the error stored in the diagnostics area (ER_LOCK_DEADLOCK) while the >> calling code talks about the code passed to the report() function >> (ER_DEADLOCK_IS_FATAL). >> >> From my point of view, you can solve this either by: >> >> - Convincing me that this cannot happen, i.e.: that nobody currently does or >> ever wants to do anything like the above. > > I believe it can not happen while we stick to some sensible principle. > One is suggest above: don't do tricks instead report new error. Right, so we want to eliminate the possibility of doing tricks like the above. >> - Changing the patch to ensure that has_temporary_error() and calling code talks >> about the same error code either by: >> >> - Passing the error code passed to report() further on to the >> has_temporary_error() function when doing the check >> > >> - Removing the error code from the parameter list of the report() function and >> reading the error code from the diagnostics area inside the report() >> function > > Yes, the 2nd is something I did not make but liked. > > Passing the last error through da does not make the convention of not doing tricks > irrevant. But it would make that activity exposed more so that reviewers hardly will > miss the chance to ask about. > > I believe we are on the same page now and I am submitting the da-error-trasport patch > in a short time. Sounds excellent! I'm eagerly awaiting a patch. Best wishes, Mats Kindahl -- Mats Kindahl Senior Software Engineer Database Technology Group Sun Microsystems