List: Backup
From: Andreas Almroth
Date: October 26 2009 1:30pm
Subject: RE: RFC: WL#5046 - error reporting

My suggestions for error handling:

S12 Get error information

    [IN]  backup storage session
    [OUT] error number

Returns the last error that occurred in the given BSM session. Returns BSM_NO_ERR if no
error has occurred.
The error description is returned by the BSM using the current I18N locale, defaulting to
English.

Note: In a practical implementation the two outputs are really split into two API calls:
one for retrieving the error code (a la errno in C), i.e. get_error, and one to map the
error code to text. This is how I envision the implementation. Calling Get_Error with the
overhead of the error text every time is too troublesome. If the backup kernel requires the
text (for output to the user), it retrieves it using get_error_desc or something similar.

So:
S12 Get last error

    [IN]  backup storage session
    [OUT] error number

Returns the last error that occurred in the given BSM session. Returns BSM_NO_ERR if no
error has occurred.

S13 Get error description

    [IN]  backup storage session
    [IN]  error code
    [OUT] error description

Returns a textual description of the error code in the current session. The error
description should be returned by the BSM using the current I18N locale, defaulting to
English.
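
To make the S12/S13 split concrete, here is a minimal sketch in C. Only BSM_NO_ERR comes
from the proposal; the type names, function names and the other error codes are
hypothetical and only serve the illustration. A real BSM would look up the text for the
current locale instead of returning fixed English strings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes; only BSM_NO_ERR is named in the proposal. */
typedef enum {
    BSM_NO_ERR = 0,
    BSM_ERR_IO = 1,        /* hypothetical */
    BSM_ERR_NO_MEDIA = 2   /* hypothetical */
} bsm_error_t;

/* Hypothetical session object holding the last error seen in the session. */
typedef struct {
    bsm_error_t last_error;
} bsm_session_t;

/* S12: cheap call that returns only the code, like reading errno. */
bsm_error_t bsm_get_last_error(const bsm_session_t *session)
{
    assert(session != NULL);
    return session->last_error;
}

/* S13: maps a code to text. This sketch always answers in English;
 * a real BSM would consult the current locale first. */
const char *bsm_get_error_desc(const bsm_session_t *session, bsm_error_t code)
{
    assert(session != NULL);
    (void)session;  /* unused here; a real BSM would read the locale from it */
    switch (code) {
    case BSM_NO_ERR:       return "no error";
    case BSM_ERR_IO:       return "I/O error on backup storage";
    case BSM_ERR_NO_MEDIA: return "no media available";
    }
    return "unknown error";
}
```

The backup kernel would call bsm_get_last_error() after every failed service and call
bsm_get_error_desc() only when it actually needs text for the user.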


Error codes must be standardized in the BSM API, as otherwise there would be no way for the
backup kernel to decide whether to abort, continue or act on an error.
In my MyBRM API there is a set of errors that MyBRM implementations are expected to report
and that the backup kernel can handle.
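
As an illustration of why standardized codes matter, the sketch below shows how a backup
kernel could choose between continuing, retrying and aborting. Apart from BSM_NO_ERR, all
codes, type names and the retry policy are assumptions made up for this example; they are
not taken from MyBRM or from the HLS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standardized codes; only BSM_NO_ERR is named in the proposal. */
typedef enum {
    BSM_NO_ERR = 0,
    BSM_ERR_BUSY,        /* hypothetical: storage temporarily unavailable */
    BSM_ERR_MEDIA_FULL,  /* hypothetical */
    BSM_ERR_FATAL        /* hypothetical */
} bsm_error_t;

typedef enum { KERNEL_CONTINUE, KERNEL_RETRY, KERNEL_ABORT } kernel_action_t;

/* Decide what the kernel does for a given standardized error code.
 * Unknown codes are treated as fatal, which is the safe default. */
kernel_action_t kernel_decide(bsm_error_t code, int *retries_left)
{
    assert(retries_left != NULL);
    switch (code) {
    case BSM_NO_ERR:
        return KERNEL_CONTINUE;
    case BSM_ERR_BUSY:
        if (*retries_left > 0) {
            --*retries_left;
            return KERNEL_RETRY;
        }
        return KERNEL_ABORT;
    case BSM_ERR_MEDIA_FULL:
    case BSM_ERR_FATAL:
        return KERNEL_ABORT;
    }
    return KERNEL_ABORT;
}
```

Without an agreed meaning for each code, the kernel could only implement the last line:
abort on anything that is not "no error".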

For the HLS, I don't see the need to go into such details, but the LLD will cover it all.
Also, errors reported by the backup kernel (not the BSM) should go into the errmsg file
for any locales we choose to cover.
These errors must be added to the mysql global database.

For BSM-reported errors, the backup kernel should either relay them to the error log or
simply not forward them to the user. I'm more in favour of adding messages to the error log
and producing something like a stack trace of messages to guide the user. That is: a fixed
part from the backup kernel error message + a dynamic part reported by the individual BSM.
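
A rough sketch of the "fixed part + dynamic part" idea in C follows. The function name,
its parameters and the layout of the log entry are all assumptions for illustration; the
HLS does not define them.

```c
#include <assert.h>
#include <stdio.h>

/* Build one log entry from the kernel's fixed (localized) message and the
 * dynamic text reported by the BSM. Returns what snprintf returns: the length
 * the full entry needs, which is >= size if the entry was truncated. */
int format_bsm_log_entry(char *buf, size_t size,
                         const char *kernel_msg,  /* fixed part, from errmsg */
                         const char *bsm_name,    /* which module reported it */
                         int bsm_code,
                         const char *bsm_desc)    /* dynamic part, may be NULL */
{
    assert(buf != NULL && size > 0);
    assert(kernel_msg != NULL && bsm_name != NULL);
    return snprintf(buf, size, "%s\n  caused by [%s] error %d: %s",
                    kernel_msg, bsm_name, bsm_code,
                    bsm_desc ? bsm_desc : "(no description)");
}
```

The kernel line stays in the server's locale while the indented line carries whatever
language the BSM produced, which keeps the two sources visually separate in the log.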

We should all be aware that if the user runs sv_SE or de_DE or whatever locale, and the
3rd party BSM does not support it, there will be mixed languages in the error log.
The only way to avoid this is to not log BSM errors at all, but then the user will have to
rely on the 3rd party providing its own error log.




Best regards / Cordialement

Andreas Almroth

-----Original Message-----
From: Ingo.Struewing@stripped [mailto:Ingo.Struewing@stripped] 
Sent: 25 October 2009 19:34
To: Rafal Somla
Cc: Ingo Strüwing; Andreas Almroth; backup@stripped
Subject: Re: RFC: WL#5046 - error reporting

Hi Rafal,

Rafal Somla, 24.10.2009 16:02:

> Hi,
> 
> I've described in HLS a proposition how to handle errors from backup
> storage modules. Please have a look and let me know if you have any
> comments on that.

great work. Thank you.

> S12 Get error information.
...
>     Note: this service will fail if it is called without any error being
>     detected earlier.


And then you can call it again and get a "no error" error? What about
letting it always succeed and return zero and empty string (or "no
error") if no error happened? If it can fail, its failure needs to be
handled...

What does "earlier" mean? "Anytime before in the backup storage
session"? Or "during the last service call"?

> Examples

The examples are nice. But should they be part of the specification?
Confusion might come from the fact that they give additional
information, which might be taken as "specified" or "optional". E.g. is
it part of the specification that path names in error messages have to
be canonical, or it is just an option?

> Design principles
> -----------------
> 
> - Several errors can be reported for a single failure - they provide information
> which should be combined to get the most accurate description of the failure.


From the text, later in the section, I guess that you mean that the
backup kernel can add more error messages. But since we specify the API
here, it could be misunderstood as if multiple "S12 Get error
information" could be called in a row, to retrieve several errors from
the failed service.

If my guess is correct, then this sentence is a duplicate of the later
"Together with an error from storage module, backup kernel reports more
 errors informing about the context in which the error has happened." I
find the latter easier to understand and suggest to drop the former.

> - Errors from storage modules are reported as a single ... error" ...
> - Backup kernel does not try to interpret errors ...

I suggest swapping the order of the two paragraphs. Then two adjacent
paragraphs explain how the errors are reported.

> - Backup kernel does not try to interpret errors reported by backup storage
> modules, it only notes that an error has happened and possibly forwards it to
> backup error log. There is no global convention about which error number means
> what.

I'm unhappy with this. I understand that the current semantics of the
interface distinguishes between problems that can be accepted and
circumvented (and which are not reported as errors), and errors that
require an abort of the backup/restore.

But I'm not sure that no future extension will make it necessary for the
kernel to distinguish certain errors in order to react flexibly.

It is a pretty common design pattern to deduce different problems from
different errors. I'd like this API to work that way too. I think this
is a design decision. Perhaps you can note my preference as an alternative.

> There is no global convention about which error number means what.

This is something which might bite us one day. If multiple modules
report similar error messages for problems that the user has to handle
quite differently, then the support team might have a hard time
figuring out what exactly happened. Sure, the final message contains the
module name, but often the customer doesn't remember the exact text,
especially if he is not a native English speaker. "The backup said no such
tape", but it was "file not found". Perhaps the xbsa: type specifier had
been forgotten. A globally unique error number would help a lot.

> - For simplicity, storage module error descriptions are in plain English. The
> common "error from storage engine" error message is internationalized like other
> MySQL errors (using errmsg file).

Unfortunately, we already have such behavior in storage engines as
historical ballast. But I am opposed to specifying this for new software.
There should be a way to handle internationalization for storage modules.

> - For simplicity, storage module error descriptions are in plain English. The
> common "error from storage engine" error message is internationalized like other
> MySQL errors (using errmsg file).

A matter of course. I wouldn't mention it in the specification.

> Proposed error messages

A lot of work, a lot of details. You may have got it mostly right. My
concern is that it may turn out as incomplete during implementation.
Then the specification must be changed. I wonder if it won't be better
to keep this out of the specification.

Regards
Ingo
-- 
Ingo Strüwing, Database Group
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schröder,   Wolfgang Engels,   Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Häring   HRB München 161028