List: Backup
From: Ingo Strüwing
Date: October 29 2009 4:54pm
Subject: Re: RFC: WL#5046 - error reporting
Hi Rafal,

Rafal Somla, 28.10.2009 17:16:
> Hi Ingo,

> 
> The discussion continues...


But we seem to be getting closer.

> 
> Ingo Strüwing wrote:


...
> Sure. Working at MySQL I've learned that probably the best way to deal
> with that is:
> 1. do our best to design as good an interface as we can at the moment,
> accepting that we cannot predict everything;
> 2. when an unpredicted issue arises, rework the interface.


Interesting. We had this topic in the other direction when talking about
the asynchronism extensions. :)

...
> After thinking about it for a long while, for me this boils down to the
> following design choice (don't ask me why :)):
> 
> Currently, when a service is called, only two general outcomes are
> possible:
> 
> 1. Service succeeds and provides the specified information.
> 2. Service fails and this is a fatal error - the whole session is
> interrupted.
> 
> But perhaps we want to have three possible outcomes:
> 
> 1. Service succeeds and provides the specified information.
> 2. Service fails with fatal error - the whole session is interrupted.
> 3. Service fails with non-fatal error - the session can still be used.


I agree. These are the choices we were discussing.

> 
> I am still not convinced that we really need it. Although I think I can
> also buy it. The only thing that stops me right now is that I'd rather
> keep it simpler if possible.
> 
> If we are to go this way, then I think a user of a storage module
> (backup kernel) cannot decide on its own whether a given error is fatal
> or non-fatal.
> In the end, it is the storage module which knows whether the failure
> that has happened prevents further operation or not.


Funny, I feel the contrary. How can the module know which options the
kernel has to work around problems?

For example, if the medium runs full, how can the module know whether
the kernel is allowed by the user to retry with compression?

I think there are only very few errors that make a session unusable, for
example insufficient memory.

OTOH, with the current service specification, we can probably just drop
a failed session and initialize a new one. (With the compression
selection service, things would be different.)
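To illustrate the point that the kernel, not the module, knows the available workarounds, here is a minimal sketch in C. All names in it (bsm_error, kernel_options, kernel_classify and the error codes) are hypothetical and not part of WL#5046; it only assumes that the module reports a well-known error code and that the kernel holds the user's options.

```c
#include <stdbool.h>

/* Hypothetical error codes reported by a storage module (not from WL#5046). */
enum bsm_error {
    BSM_OK = 0,
    BSM_ERR_MEDIUM_FULL = -1,
    BSM_ERR_OUT_OF_MEMORY = -2
};

/* What the backup kernel decides to do after a service call returns. */
enum kernel_action {
    KERNEL_CONTINUE,
    KERNEL_RETRY_COMPRESSED,
    KERNEL_ABORT
};

/* User-selected options are visible to the kernel, not to the module. */
struct kernel_options {
    bool allow_compression_retry;
};

/* Kernel-side policy: map a module error to an action. */
enum kernel_action kernel_classify(enum bsm_error err,
                                   const struct kernel_options *opts)
{
    switch (err) {
    case BSM_OK:
        return KERNEL_CONTINUE;
    case BSM_ERR_MEDIUM_FULL:
        /* Only the kernel knows whether the user permits this workaround. */
        return opts->allow_compression_retry ? KERNEL_RETRY_COMPRESSED
                                             : KERNEL_ABORT;
    default:
        /* Anything else, e.g. insufficient memory, is treated as fatal. */
        return KERNEL_ABORT;
    }
}
```

With this split, the same "medium full" report leads to a retry or to an abort depending only on kernel-side state, and the policy can change without touching the module interface.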

...
> "
> S10 Get information about image stored in the location.
>     [IN]  backup storage session
>     [OUT] size and timestamp of the image or information that location does
>           not contain a backup image.
> "
> 
> can be a function:
> 
> int get_image_info(session, ...)
> 
> which returns:
>  0               - upon normal termination
>  BSM_LOC_EMPTY   - if location is empty
>  BSM_WRONG_DATA  - if location does not contain backup image
>  error           - negative error code if fatal error has been encountered.
> 
> This is a valid implementation of the above specification. The
> information that location does not contain a backup image is passed in
> form of positive return value. This is distinguished from (fatal) errors
> which are signalled via negative return values.


Agreed. But I find it more natural for a software developer to think in
function signatures when reading [IN] and [OUT]. I would prefer to avoid
the surprise when switching from reading the specification to reading
the code.

If we want to leave freedom to the implementor, then we could perhaps
specify the services with one paragraph per in/out "information" instead
of [IN] and [OUT].

OTOH this should not be a prerequisite for me to approve the HLS.
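For reference, the return-value convention quoted above for get_image_info could look roughly like this in C. This is only a sketch: the numeric values of BSM_LOC_EMPTY and BSM_WRONG_DATA, the BSM_ERR_INVALID_SESSION code, and the bsm_session and image_info structures are assumptions made for illustration, and the session here is a toy stand-in that simply records what the location contains.

```c
#include <stddef.h>
#include <time.h>

/* Positive status codes; the numeric values are assumed for illustration. */
enum { BSM_LOC_EMPTY = 1, BSM_WRONG_DATA = 2 };

/* Hypothetical fatal error code; fatal errors are negative. */
#define BSM_ERR_INVALID_SESSION (-1)

/* Toy stand-in for a storage session: it records what the location holds. */
enum location_state { LOC_HAS_IMAGE, LOC_EMPTY, LOC_OTHER_DATA };

struct bsm_session {
    enum location_state state;
    unsigned long long  size;       /* image size in bytes */
    time_t              timestamp;  /* image creation time */
};

/* [OUT] information, filled in only when 0 is returned. */
struct image_info {
    unsigned long long size;
    time_t             timestamp;
};

int get_image_info(const struct bsm_session *session, struct image_info *info)
{
    if (session == NULL || info == NULL)
        return BSM_ERR_INVALID_SESSION;   /* fatal: negative value */

    switch (session->state) {
    case LOC_EMPTY:
        return BSM_LOC_EMPTY;             /* non-fatal: positive value */
    case LOC_OTHER_DATA:
        return BSM_WRONG_DATA;            /* non-fatal: positive value */
    case LOC_HAS_IMAGE:
        info->size = session->size;
        info->timestamp = session->timestamp;
        return 0;                         /* normal termination */
    }
    return BSM_ERR_INVALID_SESSION;       /* unknown state: treat as fatal */
}
```

A caller would then test for a negative result first (fatal), and only afterwards distinguish 0 from the positive "no image here" codes.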

> 
>> We can leave it to the backup kernel, which errors to take as fatal, and
>> which to work around. Backup kernel could be fixed in this respect,
>> without changing the interface.
>>
> 
> But first of all, backup kernel must know if backup storage session is
> usable after an error or not. This information must be passed somehow
> from module to the kernel - the kernel can not decide it on its own.


Well, this could be solved by having all further services fail, so that
every attempt to work around the problem would fail as well.

But the most important cases will have well-known error codes. And
hence, well-known severity.
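A minimal sketch of this "all further services fail" idea in C, assuming a session that remembers it has become unusable. The structure, the bsm_write service and both error codes are hypothetical; the inject_failure flag merely simulates an unrecoverable failure inside the module.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error codes (not from WL#5046). */
#define BSM_ERR_IO               (-1)
#define BSM_ERR_SESSION_UNUSABLE (-2)

/* Toy session: remembers whether an earlier failure made it unusable. */
struct bsm_session {
    bool   unusable;        /* set once an unrecoverable failure occurred */
    bool   inject_failure;  /* test hook simulating such a failure */
    size_t bytes_written;
};

/* Every service checks the sticky flag first, so workarounds fail too. */
int bsm_write(struct bsm_session *s, const void *buf, size_t len)
{
    (void)buf;  /* data is not stored anywhere in this sketch */

    if (s->unusable)
        return BSM_ERR_SESSION_UNUSABLE;

    if (s->inject_failure) {
        s->unusable = true;  /* from now on the session refuses all work */
        return BSM_ERR_IO;
    }

    s->bytes_written += len;
    return 0;
}
```

The kernel does not need to be told the severity up front in this scheme: if it tries to continue on a broken session, the next call fails with a well-known code and it can drop the session.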

...

>    As far as we speak about the interface, it
> is most important to specify what information is passed through it and
> then how.


I agree in general. But I may have difficulty thinking strictly
top-down. For an interface I see function signatures first, and then I
try to translate them into the specification. Not academically correct,
but...

...
>> Support would have better chances if there is a unique error number.
> 
> And what if a user makes a mistake when reporting the error number ;)


Right. This happens too. But in spite of a vast number of languages in
the world, there are not so many number systems.

> 
>> Some (management-)applications could also profit from unique error
>> numbers. So they won't need to parse the error text.
>>
> 
> Ok, I still see an issue how global error codes (code ranges) would be
> assigned to particular BSM implementations? The only solution which I
> can come up with is that we reserve certain range for MySQL and then all
> BSMs developed at MySQL will use unique error numbers from that range.
> But any external implementers will use arbitrarily chosen numbers outside
> of that range. Then the error numbers are bound to overlap and the
> advantage you have in mind is not going to happen.


This applies to storage modules that are developed and used as
proprietary software. I am thinking mainly of community projects. These
will be added to the MySQL code base. If they add error messages to
errmsg-utf8.txt, their final push will reserve the numbers once and
forever.

...
> For the moment, I have updated HLS as suggested: locale setting is
> passed to "create storage session" service and error descriptions
> returned by storage module should be in appropriate language. I think
> this specification should not be strict. That is, BSM should do its best
> to describe errors in the selected language, but if not possible, it can
> use other language (English). Any thoughts about how to best specify it?


IMHO the formulation in the HLS is good enough.

Regards
Ingo
-- 
Ingo Strüwing, Database Group
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schröder,   Wolfgang Engels,   Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Häring   HRB München 161028
