List: Backup
From: Rafal Somla
Date: October 27 2009 9:08am
Subject: Re: RFC: WL#5046 - error reporting

Hi Ingo,

Thanks for your comments, valuable as always.

Ingo Strüwing wrote:
> Hi Rafal,
> 
> Rafal Somla, 24.10.2009 16:02:
> 
>> Hi,
>>
>> I've described in HLS a proposition how to handle errors from backup
>> storage modules. Please have a look and let me know if you have any
>> comments on that.
> 
> great work. Thank you.
> 
>> S12 Get error information.
> ...
>>     Note: this service will fail if it is called without any error being
>>     detected earlier.
> 
> 
> And then you can call it again and get a "no error" error? What about
> letting it always succeed and return zero and empty string (or "no
> error") if no error happened? If it can fail, its failure needs to be
> handled...

Yes, too complicated and not really needed. I'll change the specification.

> 
> What does "earlier" mean? "Anytime before in the backup storage
> session"? Or "during the last service call"?

In the note, both meanings agree because of the negation :)

I'm thinking about reformulating the specification of S12 like this:

"Return information about the last error, or information that there was no 
error so far. If there was an error, then the internal error number and a 
human-readable description of the error in plain English are returned."
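
A minimal sketch of these semantics, in Python purely for illustration (the 
real interface is not specified here). The names StorageSession, 
get_error_info and _record_error, as well as the convention that code 0 with 
the text "no error" means that nothing failed, are assumptions based on 
Ingo's suggestion above and are not part of WL#5046.

```python
from typing import Optional, Tuple


class StorageSession:
    """Hypothetical backup storage session that remembers its last error."""

    def __init__(self) -> None:
        self._last_error: Optional[Tuple[int, str]] = None

    def _record_error(self, code: int, description: str) -> None:
        # Called internally by a service when it fails.
        self._last_error = (code, description)

    def get_error_info(self) -> Tuple[int, str]:
        """S12: never fails; reports the last error or 'no error so far'."""
        if self._last_error is None:
            return (0, "no error")  # assumed convention, not from the spec
        return self._last_error


session = StorageSession()
print(session.get_error_info())
session._record_error(17, "cannot open /backup/img.bak: permission denied")
print(session.get_error_info())
```

With this shape the caller never has to handle a failure of S12 itself.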

> 
>> Examples
> 
> The examples are nice. But should they be part of the specification?
> Confusion might come from the fact that they give additional
> information, which might be taken as "specified" or "optional". E.g. is
> it part of the specification that path names in error messages have to
> be canonical, or it is just an option?

Yes, they should not be part of the specification but, well..., just 
examples. I'll add a note explaining this. I don't want to remove them 
because I think they illustrate well the idea behind this design.

> 
>> Design principles
>> -----------------
>>
>> - Several errors can be reported for a single failure - they provide information
>> which should be combined to get the most accurate description of the failure.
> 
> 
> From the text, later in the section, I guess that you mean that the
> backup kernel can add more error messages. But since we specify the API
> here, it could be misunderstood as if multiple "S12 Get error
> information" could be called in a row, to retrieve several errors from
> the failed service.
> 
> If my guess is correct, then this sentence is a duplicate of the later
> "Together with an error from storage module, backup kernel reports more
>  errors informing about the context in which the error has happened." I
> find the latter easier to understand and suggest to drop the former.
> 

OK, I removed this.

>> - Errors from storage modules are reported as a single ... error" ...
>> - Backup kernel does not try to interpret errors ...
> 
> I suggest to exchange the order of the two paragraphs. Then two adjacent
> paragraphs explain, how the errors are reported.
> 

Yes, should read better - I changed this.

>> - Backup kernel does not try to interpret errors reported by backup storage
>> modules, it only notes that an error has happened and possibly forwards its to
>> backup error log. There is no global convention about which error number means
>> what.
> 
> I'm unhappy with this. I understand that the current semantics of the
> interface distinguishes between problems that can be accepted and
> circumvented (and which are not reported as errors), and errors that
> require an abort of the backup/restore.
> 
> But I'm not sure if no future extension can make it necessary that the
> kernel distinguishes certain errors to be able to react flexibly.
> 
> It is a pretty common design pattern to deduct different problems from
> different errors. I'd like this API to work that way too. I think this
> is a design decision. Perhaps you can note my preference as an alternative.
> 

But do you have any concrete proposals for which error situations, and for 
which services, should be distinguishable (across all BSMs)? Or are you only 
concerned with possible future extensions?

In the latter case, the current specification does not completely rule out 
that possibility. Right now we only say that each service either completes 
successfully or signals an error. This does not exclude the possibility that, 
when an error is signalled, different error situations can be indicated. If 
you see a need for such distinctions, I would include them in the 
specifications of the services, in the form of a list of possible error 
situations.
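
To illustrate what such a per-service list might look like, here is a 
hypothetical sketch (Python for brevity). The service name open_for_read, the 
situations listed in OpenError and the ServiceError exception are all 
invented for this example; nothing like this is in the current specification.

```python
from enum import Enum
from typing import Dict


class OpenError(Enum):
    """Hypothetical error situations a spec could list for one service."""
    LOCATION_NOT_FOUND = "location not found"
    ACCESS_DENIED = "access denied"
    OTHER = "other error"


class ServiceError(Exception):
    """Signals that a service failed and carries the listed situation."""

    def __init__(self, situation: OpenError) -> None:
        super().__init__(situation.value)
        self.situation = situation


def open_for_read(storage: Dict[str, bytes], location: str) -> bytes:
    # The service either completes successfully or signals an error.
    if location not in storage:
        raise ServiceError(OpenError.LOCATION_NOT_FOUND)
    return storage[location]


def kernel_read(storage: Dict[str, bytes], location: str) -> str:
    # Today the kernel only notes success or failure; with a listed
    # situation it could additionally react to specific cases.
    try:
        open_for_read(storage, location)
        return "ok"
    except ServiceError as err:
        if err.situation is OpenError.LOCATION_NOT_FOUND:
            return "failed: location not found"
        return "failed"
```

The kernel stays free to ignore the situation and treat every failure alike, 
which is what the current design does.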

>> There is no global convention about which error number means what.
> 
> This is something, which might bite us one day. If multiple modules
> report similar error messages for problems that are pretty different to
> handle by the user, then the support team might have a hard time to
> figure out, what happened exactly. Sure, the final message contains the
> module name, but often the customer doesn't remember the exact text.
> Especially if he is no native English speaker. "The backup said no such
> tape", but it was "file not found". Perhaps the xbsa: type specifier had
> been forgotten. A globally unique error number would help a lot.
> 

I don't understand the example. How could "The backup said no such tape" 
possibly appear if we are using a filesystem BSM and the real problem was 
"file not found"?

A disadvantage of globally unique error numbers is that BSM implementers 
would need to register them with MySQL. This, I dare say, will never work. 
Pre-allocating error codes will not work either, because different BSM 
implementers would have to agree on which error code range each of them 
uses. So I don't see a working alternative yet.

>> - For simplicity, storage module error descriptions are in plain English. The
>> common "error from storage engine" error message is internationalized like other
>> MySQL errors (using errmsg file).
> 
> Unfortunately, we have such behavior in storage engines already as
> historical ballast. But I oppose to specify this for new software. There
> should be a way to handle internationalization for storage modules.
> 

Easier said than done :) Any proposals? If the proposal is that BSMs use 
my_error() to report errors, then I will oppose such a solution...

>> - For simplicity, storage module error descriptions are in plain English. The
>> common "error from storage engine" error message is internationalized like other
>> MySQL errors (using errmsg file).
> 
> A matter of course. I wouldn't mention it in the specification.
> 

I want to point out that this solution will lead to mixed-language messages: 
the general message in the local language and the BSM string in English.
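
A small sketch of how such a mixed-language message comes about (Python, for 
illustration only). The ERRMSG dictionary stands in for the errmsg file; both 
templates, the module name and the error number are invented for this 
example.

```python
# Stand-in for the errmsg file: one localized template per language.
# Both templates are invented for this example.
ERRMSG = {
    "en": "Error from backup storage module '{module}': {code} {text}",
    "de": "Fehler vom Backup-Speichermodul '{module}': {code} {text}",
}


def report_bsm_error(lang: str, module: str, code: int, text: str) -> str:
    # The kernel does not interpret `code` or `text`; it only embeds them
    # in the common, internationalized message.
    template = ERRMSG.get(lang, ERRMSG["en"])
    return template.format(module=module, code=code, text=text)


print(report_bsm_error("de", "filesystem", 2, "file not found"))
```

The frame of the message follows the server language, while the BSM's own 
description stays in plain English.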

>> Proposed error messages
> 
> A lot of work, a lot of details. You may have got it mostly right. My
> concern is that it may turn out as incomplete during implementation.
> Then the specification must be changed. I wonder if it won't be better
> to keep this out of the specification.

It was my intention that this is just a proposal which can be changed during 
implementation. I'll try to explain this in a comment.

Rafal