Mats, hi.
>>>> void
>>>> Slave_reporting_capability::report(loglevel level, int err_code,
>>>> const char *msg, ...) const
>>>> @@ -13,6 +27,8 @@ Slave_reporting_capability::report(logle
>>>> va_list args;
>>>> va_start(args, msg);
>>>> + if (level == ERROR_LEVEL &&
> has_temporary_error(current_thd))
>>>> + level= WARNING_LEVEL;
>>>>
>>
>>
>>> Why do you put the logic here? Isn't the what log level should be used
>>> a decision to be made by the caller? This seems to put to many
>>> features into the function, which previously had the role of just
>>> reporting an error at a certain level.
>>>
>>
>> When I made the first patch where
>> Rows_log_event,Query_log_event::do_apply_event
>> composed the correct arg for rli->report(),
>> I had thought there would be just two occurances where rli->report()
>> needs to be careful.
>> But overnight I realized a simple thing that load data is prone to the
>> timeout, and moreover, there is the general problem.
>>
>> Anywhere in the slave sql thread execution there might happen a situation
>> that we would not like to report as an error for some time.
>> As there are many sources I think it's wiser to implement a general
>> mechanism.
>>
>
> I don't understand what you're saying here...
The paragraph is trying to say what is implemented:
whenever there is a
rpl->report(error)
invocation there is a possibility that the error is a temporary.
Hence, its reporting should take of that.
The current scheme of reporting
where
rli->report()
in the middle of executing an event
follows with the final too late care of whether that was a
temporary error is vicious.
Here is a suggestion how it can be improved:
>> The mechanism that can block reporting and resume it later could be as
>> simple as the patch offers, which is in
>>
>> 1. check if has_temporary_error() and block (demote) the error from
>> showing up as an error;
>>
>> 2. raise is_fatal_error as soon as there is certainty that a blocked
>> error reporting needs resuming.
>>
As you did not comment on 1,2 I presume you agree conceptually.
>>> Why is this a problem?
>>
>> I hope you see my concern to avoid duplication of the same logics all
>> around the code (mentioned above as done in the first patch).
>>
>
> Duplicating the logic might be an issue, but changing the report()
> function for this purpose strikes me as fixing the problem at the
> wrong place.
> Is there any particular reason to why you don't construct a special
> function for this file only with the functionality you want?
> That way,
> you avoid code duplication and the changes are kept local to the file,
> which eliminates the risk that you are changing behavior of code that
> should not be affected by this change.
A special function would be a wrapper over rli->report() if I
understand your advice correctly.
But then, it'd be of dubious advantange as we would extend error
reporting interface.
Considering what rli->report() should do is a well good subject to
discuss.
As I see it, rli->report() provide a similar feature as my_error() does.
There is only one essencial difference though.
An error of a user session can not be temporary.
But the rest is conceptaully the same so that I think eventually
rli->report() should be replaced with my_error().
Our task in that case would be to define a necessary
int (*error_handler_hook)(uint my_err, const char *str,myf MyFlags);
hooks for IO, SQL threads.
We don't need any storage that Slave_reporting_capability provides,
because there is thd->main_da.sql_errno etc available.
Considering the current DEADLOCK temp error and imagining for a second
if there was no rli->report() I would call
my_error(error)
and hooks, I'd have to prepare/register, would call has_temporary_error() to
avoid ERROR line showing up.
> The other option is to get all
> the information from the THD instance instead of part of the error as
> parameter and part from the THD instance.
Have you meant to pass THD as an argument to rli->report() and to use
thd->main_da ?
>>
>>> This will make all errors reported through this
>>> interface be demoted provided the current thread has a temporary
>>> error, regardless of what is actually being reported.
>>>
Need to speak to you, as sorry, I am lost in guesses.
>>
>> It should not. Let's look at your cases.
>>
>>
>>> This can cause
>>> two strange situations, to a naïve user of the reporting function:
>>>
>>> 1. A different error than the error present for the current thread is
>>> passed to the reporting function. This can be a decision on the
>>> callers behalf to regard a certain temporary error as a fatal
>>> error, but the reporting function will demote this error to a
>>> warning level just because the caller has "forgotten" to set the
>>> "fatal error" flag.
>>>
>>
>> That's a good question.
>>
>> In other words, if there had been a temporary error and after that
>> the 2nd error that is not meant to be temporary. Is that the
>> concern?
>>
>
> No, that is not the case I'm describing. The report() function can be
> used to report any error, not just one that has been generated.
>
I hope we must see it in agreement that
report() is an immediate reaction on an error situation.
There can not be two pending errors to report.
We report them as soon as they are generated.
It's reasonable and basicly the current non-perfect code follows that.
This means indeed, report() can report only immediate, just has been
generated error, and nothing else.
If there is disagreemnt we will need to talk that over.
>> The current handling in rli->report() would respect
>> the current error, coming as an argument, so that the 2nd error
>> remains a legal error as long as if has_temporary_error() confirms
>> that.
>
> ... but what I'm describing is a case where has_temporary_error()
> looks at the last generated error and not at the error that was
> supplied to the reporting function.
Again, the assumption above is once an error situation happened it
must be reported.
At reporting time properties of the error have to
be shaped off, and so the patch does calling has_temporary_error().
>
>> The latter function is interested only in the latest error.
>>
>
> Yes, but that might not be what is passed to report().
>
>> This observation tells me that has_temporary_error() is combinable
>> with rli->report() that also deals only with the latest.
>>
>
> Again, this is not the scenario I describe. You are changing the
> behavior of the report() function to *all* users, not just the case
> you are trying to "fix".
Right, for *all* because there are many already. But (*many* - all)
the rest is not affected: if we had an event and an error which were
not listed in has_temporary_error() there would be no change.
>
>>
>>> 2. It is necessary to add code to prevent the error message to be
>>> repeated ad-nauseum,
>>
>> there was no such intention, I change only for demotion. But logics
>> of rli->report() can be extended to add such a rule. Again, it would
>> be an advantage to do that in the method not on the caller level.
>>
>
> I cannot see any advantage. What would the advantage be?
The only advantage is that we catch temporary errors correctly on time
and in the single central location - inside of report().
Making rli->report() just a bit more intelligent is not a big deal.
The above might be a temproray solution, but I worder if you will
share my ideal world view.
Imo logics should be like this:
an error situation happens,
my_error(error) is called,
(*error_handler_hook) having access to execution context figures out
what and how to report.
>
>>
>>> which means that part of the logic for
>>> deciding if this should be a warning is in the caller and part is
>>> inside the reporting function. This split just introduces an
>>> unnecessary coupling between the caller and the callee.
>>>
>>
>> The caller has is_fatal_error always available to express his strong
>> preference. And that idea has been worked so far. And then let's leave
>> it to the caller but release the caller from expressing the soft
>> opinion because that soft opinion is just a default.
>>
>
> I don't understand this...
thd->is_fatal_error cancels temporary meaning.
rli->report(error) can be prepared with the strong thd->is_fatal_error
opinion so that a temporary error will show up as an error at once.
The express the soft opinion is contrary to designate explictly in the
caller that the error is temporary.
That's what i like to avoid because there are several report() calling
places that would need that - duplication of logics.
So the patch offers to think about an error with the soft opinion to
allow reporting function to shape off the error fully, incl its
severity.
We have a simple way to change the opinion with thd->is_fatal_error
that we do e.g when (rli->trans_retries >= slave_trans_retries).
>>
>>> I would like you to pass in the correct log-level to the reporting
>>> function instead,
>>>
>>
>> I hope my comments will make it more clear the idea of this patch.
>>
>>> which will also keep the has_temporary_error()
>>> function inside the file and not exposing it outwards.
>>>
>>>
>>
>> You can compare one one-line hunk with what we would have to do -
>> finding all case prone to the temprorary errors, duplicate the same dumb
>> lines all over the cases, and after that sit and wait then there will
>> show up yet another mtr->report() where the caller forgets to check
>> the temporary status.
>>
>
> The role of the reporting function is to report errors... *any* error,
> not just the latest generated error.
We need to talk as it is obvious to me it's impossible to report any
error. A simplest example can be a case of two errors where the latter
is inferred by the first. We never should report the 2nd as the first
or instead of the 1st.
IMO, it's a reasonable requirement is to report errors immediately.
>
> If you want to change the logic, then information should be collected
> so that the risk of incompatible information is minimal. The
> has_temporary_error() checks one error, and the error coming through
> the parameter list is another error.
That should not be possible and unless I miss something is not
possible with the current patch.
I agree it'd be better to use the only source for the error code THD
for instance.
>If they don't match, then
> something is wrong. Either you pass both items through as parameters
> *or* you get all information from the THD instance, but don't try to
> mix the two parts of the error, because they may or may not be
> compatible.
>
> *If* you decide to take the information from the THD instance instead
> (*both* the has_temporary_error() information and the actual error
> code that has_temporary_error() has decided is the temporary error,
> then you should make sure to pass the THD structure as parameter since
> using "current_thd" is very expensive (it starts digging in the
> thread-local store).
Even though it does this "digging" how it's expensive?
First, not to say errors reporting subject is separate from
optimising, but it's not that important.
Second, current_thd is used in reporting through my_error() that I
find as a good pattern to follow.
I hope we can talk and come to consensus about this discussion's
topics.
cheers,
Andrei