List: Backup
From: Ingo Strüwing  Date: October 25 2009 6:33pm
Subject: Re: RFC: WL#5046 - error reporting
Hi Rafal,

Rafal Somla, 24.10.2009 16:02:

> Hi,
> 
> I've described in HLS a proposition how to handle errors from backup
> storage modules. Please have a look and let me know if you have any
> comments on that.

Great work, thank you.

> S12 Get error information.
...
>     Note: this service will fail if it is called without any error being
>     detected earlier.


And then you can call it again and get a "no error" error? What about
letting it always succeed and return zero and an empty string (or "no
error") if no error happened? If it can fail, its failure needs to be
handled, too.

What does "earlier" mean? "Anytime before in the backup storage
session"? Or "during the last service call"?
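To make the "always succeed" alternative concrete, here is a minimal C sketch of an S12-like service that cannot fail: with no recorded error it reports 0 and an empty string. The session structure, the function names and the buffer size are hypothetical and do not come from the HLS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-session error state kept by a storage module. */
struct storage_session {
  int  last_errno;        /* 0 means "no error recorded" */
  char last_errmsg[256];  /* empty string when no error */
};

/* Hypothetical helper a module calls when one of its services fails. */
static void session_set_error(struct storage_session *s, int errnum,
                              const char *msg)
{
  assert(s != NULL && msg != NULL);
  s->last_errno = errnum;
  /* Copy at most size-1 bytes and always NUL-terminate. */
  strncpy(s->last_errmsg, msg, sizeof(s->last_errmsg) - 1);
  s->last_errmsg[sizeof(s->last_errmsg) - 1] = '\0';
}

/*
  Sketch of "get error information" that never fails: without a recorded
  error it returns 0 and sets *msg to an empty string.
*/
static int session_get_error(const struct storage_session *s,
                             const char **msg)
{
  assert(s != NULL && msg != NULL);
  *msg = s->last_errmsg;  /* "" when nothing went wrong */
  return s->last_errno;   /* 0 when nothing went wrong */
}
```

This sketch simply keeps the most recent error for the whole session. Whether the state should instead be cleared at the start of every service call is exactly the "earlier" question above.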

> Examples

The examples are nice. But should they be part of the specification?
They give additional information, and it is unclear whether that
information is meant as "specified" (mandatory) or merely "optional". E.g., is
it part of the specification that path names in error messages have to
be canonical, or is it just an option?

> Design principles
> -----------------
> 
> - Several errors can be reported for a single failure - they provide information
> which should be combined to get the most accurate description of the failure.


From the text later in the section, I guess you mean that the
backup kernel can add more error messages. But since we specify the API
here, it could be misread as saying that "S12 Get error information"
can be called several times in a row to retrieve several errors from
the failed service.

If my guess is correct, then this sentence duplicates the later one:
"Together with an error from storage module, backup kernel reports more
 errors informing about the context in which the error has happened." I
find the latter easier to understand and suggest dropping the former.
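Under my reading, a failed service yields exactly one storage module error, fetched with a single S12 call, and the kernel adds its own context messages. A minimal C sketch of the kernel side follows; the error log structure, the message format and the names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_MAX      8
#define LOG_LINE_LEN 200

/* Hypothetical backup error log: a fixed-size list of message lines. */
struct error_log {
  int  count;
  char lines[LOG_MAX][LOG_LINE_LEN];
};

static void log_push(struct error_log *elog, const char *line)
{
  assert(elog != NULL && line != NULL);
  if (elog->count < LOG_MAX) {
    snprintf(elog->lines[elog->count], LOG_LINE_LEN, "%s", line);
    elog->count++;
  }
}

/*
  Kernel side: errnum/errmsg are the result of one S12 call after the
  failure. The kernel logs that single module error first, then its own
  context message (e.g. which step of the backup failed).
*/
static void kernel_report_failure(struct error_log *elog, const char *module,
                                  int errnum, const char *errmsg,
                                  const char *context)
{
  char buf[LOG_LINE_LEN];
  snprintf(buf, sizeof(buf), "Error from storage module '%s': %s (%d)",
           module, errmsg, errnum);
  log_push(elog, buf);
  log_push(elog, context);
}
```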

> - Errors from storage modules are reported as a single ... error" ...
> - Backup kernel does not try to interpret errors ...

I suggest swapping the order of the two paragraphs. Then two adjacent
paragraphs explain how the errors are reported.

> - Backup kernel does not try to interpret errors reported by backup storage
> modules, it only notes that an error has happened and possibly forwards it to
> the backup error log. There is no global convention about which error number means
> what.

I'm unhappy with this. I understand that the current semantics of the
interface distinguishes between problems that can be accepted and
circumvented (and which are not reported as errors), and errors that
require an abort of the backup/restore.

But I'm not sure that no future extension will ever require the
kernel to distinguish certain errors so that it can react flexibly.

It is a pretty common design pattern to tell different problems apart by
their errors. I'd like this API to work that way too. I think this
is a design decision. Perhaps you can note my preference as an alternative.
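A minimal C sketch of that alternative: storage modules report one of a few globally agreed error classes, and the kernel chooses its reaction from the class alone. The class names and the retry reaction are hypothetical. The interface as described only distinguishes "circumvent silently" from "abort".

```c
#include <assert.h>

/*
  Hypothetical error classes shared by all storage modules.
  Module-specific detail would still go into the message text.
*/
enum bs_error_class {
  BS_OK = 0,
  BS_ERR_NOT_FOUND,   /* location does not exist */
  BS_ERR_NO_SPACE,    /* medium is full */
  BS_ERR_TRANSIENT,   /* e.g. temporary network failure */
  BS_ERR_FATAL        /* anything else */
};

enum kernel_action { ACT_CONTINUE, ACT_RETRY, ACT_ABORT };

/* Kernel reaction chosen from the error class only. */
static enum kernel_action kernel_react(enum bs_error_class cls, int attempt,
                                       int max_attempts)
{
  assert(attempt >= 1 && max_attempts >= 1);
  switch (cls) {
  case BS_OK:
    return ACT_CONTINUE;
  case BS_ERR_TRANSIENT:
    /* Only transient errors are retried, and only a limited number of times. */
    return attempt < max_attempts ? ACT_RETRY : ACT_ABORT;
  case BS_ERR_NOT_FOUND:
  case BS_ERR_NO_SPACE:
  case BS_ERR_FATAL:
  default:
    return ACT_ABORT;
  }
}
```

The point is that such a reaction is possible only if error numbers carry a meaning shared by all modules.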

> There is no global convention about which error number means what.

This is something that might bite us one day. If multiple modules
report similar error messages for problems that the user has to handle
quite differently, the support team might have a hard time figuring
out what exactly happened. Sure, the final message contains the
module name, but often the customer doesn't remember the exact text,
especially if he is not a native English speaker. "The backup said no such
tape", but it was "file not found". Perhaps the xbsa: type specifier had
been forgotten. A globally unique error number would help a lot.
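One way to get globally unique numbers without a central list of every message is to give each module its own registered range. The following C sketch assumes ranges of 1000 numbers. The module names, base values and functions are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
  Hypothetical registry: every storage module owns a disjoint range of
  BS_RANGE_SIZE global error numbers, so a number alone identifies both
  the module and its error.
*/
#define BS_RANGE_SIZE 1000

struct module_range {
  const char *name;
  int base;  /* first global number of the module's range */
};

static const struct module_range bs_registry[] = {
  { "file", 10000 },
  { "xbsa", 11000 },
};

static const int bs_registry_len =
  (int)(sizeof(bs_registry) / sizeof(bs_registry[0]));

/* Map a module-local code (0..BS_RANGE_SIZE-1) to a global number, -1 if unknown. */
static int bs_global_errno(const char *module, int local_code)
{
  assert(module != NULL);
  if (local_code < 0 || local_code >= BS_RANGE_SIZE)
    return -1;
  for (int i = 0; i < bs_registry_len; i++)
    if (strcmp(bs_registry[i].name, module) == 0)
      return bs_registry[i].base + local_code;
  return -1;
}

/* Reverse lookup for support: which module owns a global number? */
static const char *bs_module_of(int global_errno)
{
  for (int i = 0; i < bs_registry_len; i++) {
    int base = bs_registry[i].base;
    if (global_errno >= base && global_errno < base + BS_RANGE_SIZE)
      return bs_registry[i].name;
  }
  return NULL;
}
```

With this scheme, a customer who only remembers the number still points support to the right module.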

> - For simplicity, storage module error descriptions are in plain English.

Unfortunately, we already have such behavior in storage engines as
historical ballast. But I am opposed to specifying it for new software. There
should be a way to handle internationalization for storage modules.

> The common "error from storage engine" error message is internationalized like other
> MySQL errors (using errmsg file).

A matter of course. I wouldn't mention it in the specification.

> Proposed error messages

A lot of work, a lot of details. You may have got it mostly right. My
concern is that it may turn out to be incomplete during implementation,
and then the specification must be changed. I wonder whether it would be better
to keep this out of the specification.

Regards
Ingo
-- 
Ingo Strüwing, Database Group
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Managing Directors: Thomas Schröder,   Wolfgang Engels,   Wolf Frenkel
Chairman of the Supervisory Board: Martin Häring   HRB München 161028