Hi Ingo,
Ingo Strüwing wrote:
...
>> Sure. Working at MySQL I've learned that probably the best way to deal
>> with that is:
>> 1. do our best to design as good an interface as we can at the moment,
>> accepting that we cannot predict everything;
>> 2. when an unforeseen issue arises, rework the interface.
>
>
> Interesting. We had this topic in the other direction when talking about
> the asynchronism extensions. :)
No, no, this was point 1: "do our best to design as good an interface as we can" :)
>
> ...
>> After thinking about it for a long while, for me this boils down to the
>> following design choice (don't ask me why :)):
>>
>> Currently, when a service is called, only two general outcomes are
>> possible:
>>
>> 1. Service succeeds and provides the specified information.
>> 2. Service fails and this is a fatal error - the whole session is
>> interrupted.
>>
>> But perhaps we want to have three possible outcomes:
>>
>> 1. Service succeeds and provides the specified information.
>> 2. Service fails with fatal error - the whole session is interrupted.
>> 3. Service fails with non-fatal error - the session can still be used.
>
>
> I agree. These are the choices we were discussing.
>
I have changed the HLS to specify the second alternative. That is, the
distinction between fatal and non-fatal errors is made explicit in the
interface. I kept fatal errors because they allow more efficient
implementations, which can assume that no other services will be called
after a fatal error has been reported. Otherwise, every service
implementation would have to check session validity on each call, and this
would cost additional cycles.
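To make the convention concrete, here is a minimal C sketch. All names in it (`bsm_result`, `bsm_session`, `bsm_write`, the `device_broken` flag) are hypothetical and not taken from the HLS. A service reports one of the three outcomes, and because the kernel promises never to call it again after a fatal error, the service does not re-check session validity on the normal path; an assert() only catches violations in debug builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of a storage-module service call. */
typedef enum {
  BSM_OK = 0,       /* succeeded, specified information provided */
  BSM_ERR_NONFATAL, /* failed, but the session can still be used */
  BSM_ERR_FATAL     /* failed, the session must not be used again */
} bsm_result;

/* Hypothetical session state; `device_broken` simulates an unrecoverable
   I/O failure for the purpose of this sketch. */
typedef struct {
  bool fatal_reported;
  bool device_broken;
  size_t bytes_written;
  size_t capacity;
} bsm_session;

/* Relies on the convention: no runtime validity check on the normal path.
   The assert() documents the convention and is compiled out with NDEBUG. */
bsm_result bsm_write(bsm_session *s, size_t len) {
  assert(!s->fatal_reported && "service called after a fatal error");
  if (s->device_broken) {
    s->fatal_reported = true; /* only touched on the error path */
    return BSM_ERR_FATAL;
  }
  if (len > s->capacity - s->bytes_written)
    return BSM_ERR_NONFATAL; /* e.g. medium full: session stays usable */
  s->bytes_written += len;
  return BSM_OK;
}
```

After BSM_ERR_NONFATAL the kernel may call bsm_write() again on the same session; after BSM_ERR_FATAL it would only abort.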
>> I am still not convinced that we really need it, although I think I could
>> also buy it. The only thing which stops me right now is that I'd rather
>> keep it simpler if possible.
>>
>> If we are to go this way, then I think a user of a storage module
>> (the backup kernel) cannot decide on its own whether a given error is
>> fatal or non-fatal.
>> In the end, it is the storage module which knows whether the failure
>> that has happened prevents further operation or not.
>
>
> Funny, I feel the opposite. How can the module know which options the
> kernel has to work around problems?
>
There can be two reasons why an operation cannot continue:
a) the storage module is in a fatal error condition and cannot work,
b) the backup kernel has reached a state where nothing can be done but
aborting the operation.
Only the storage module knows about condition a), and only the backup kernel
knows about condition b). The storage module should not decide whether b)
has happened, and the backup kernel should not guess whether a) has happened.
Since it is the backup kernel that drives the whole operation, the
information about condition a) must be passed to it from the storage module.
I think there is no need to inform the storage module about condition b). If
the backup kernel decides to abort the operation, it will abort the storage
session and shut down the storage module using its services.
> For example, if the medium runs full, how can the module know whether the
> kernel is allowed by the user to retry with compression?
>
> I think there are only very few errors that make a session unusable, for
> example insufficient memory.
>
> OTOH, with the current service specification, we can probably just drop
> a failed session and initialize a new one. (With the compression
> selection service, things would be different.)
>
Very good example. In the previous version of the HLS there was no way for
the storage module to report a "disk full" condition - the only option was
to report a fatal error, after which the backup kernel would have to create a
new session. Currently I imagine the backup kernel logic would be something
like this:
1. Call the "write bytes" service.
2. If the call succeeded, continue.
3. If it reported a fatal error, report the error and abort.
4. If it reported a non-fatal error:
4a. close the stream and free the location,
4b. re-open the location for writing,
4c. restart the backup using compression.
Here I still avoid analysing the errors signalled by the storage module:
whenever a non-fatal error is reported, the compression work-around is
simply tried.
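The kernel logic above could look roughly like this in C. This is only a sketch: the in-memory `location` type and the helpers `loc_open`, `loc_write`, `loc_close_and_free`, `write_image` and `run_backup` are hypothetical stand-ins for the real storage-module services, and compression is modelled, purely as an assumption, as halving every chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { BSM_OK, BSM_ERR_NONFATAL, BSM_ERR_FATAL } bsm_result;
typedef enum { BACKUP_DONE, BACKUP_DONE_COMPRESSED, BACKUP_ABORTED } backup_outcome;

/* Hypothetical in-memory stand-in for a storage location. */
typedef struct {
  size_t capacity;
  size_t used;
  bool open;
  bool broken; /* simulates a fatal storage failure */
} location;

static bsm_result loc_open(location *l) { l->used = 0; l->open = true; return BSM_OK; }
static void loc_close_and_free(location *l) { l->open = false; l->used = 0; }

static bsm_result loc_write(location *l, size_t len) {
  assert(l->open);
  if (l->broken) return BSM_ERR_FATAL;
  if (len > l->capacity - l->used) return BSM_ERR_NONFATAL; /* medium full */
  l->used += len;
  return BSM_OK;
}

/* Writes the whole image; compression is modelled as halving each chunk. */
static bsm_result write_image(location *l, size_t chunks, size_t chunk_size,
                              bool compress) {
  size_t len = compress ? (chunk_size + 1) / 2 : chunk_size;
  for (size_t i = 0; i < chunks; i++) {
    bsm_result r = loc_write(l, len); /* step 1 */
    if (r != BSM_OK) return r;        /* steps 3 and 4 are the caller's job */
  }                                   /* step 2: continue while calls succeed */
  return BSM_OK;
}

backup_outcome run_backup(location *l, size_t chunks, size_t chunk_size) {
  loc_open(l);
  bsm_result r = write_image(l, chunks, chunk_size, false);
  if (r == BSM_OK) return BACKUP_DONE;
  if (r == BSM_ERR_FATAL) return BACKUP_ABORTED; /* step 3 (reporting omitted) */
  /* Step 4: any non-fatal error triggers the compression work-around. */
  loc_close_and_free(l);                            /* 4a */
  if (loc_open(l) != BSM_OK) return BACKUP_ABORTED; /* 4b */
  r = write_image(l, chunks, chunk_size, true);     /* 4c */
  return r == BSM_OK ? BACKUP_DONE_COMPRESSED : BACKUP_ABORTED;
}
```

Note that every non-fatal error leads to the compression retry here, which is exactly the simplification described above.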
However, it could make sense to try compression only in the "disk full"
condition and to do something else (a simple retry or an abort) upon other
non-fatal errors. To implement this behaviour, the backup kernel must be able
to distinguish these situations, and this is possible only if the storage
module provides more information. Thus I see this as a request for extending
the interface so that this information can be conveyed. In this case I would
change the specification of the "write bytes" service to return explicit
information about the disk full condition:
S6 Write bytes to location.
IN: Backup storage session, data buffer and amount of data to
be written.
OUT: Amount of data that has been written and whether there is
space for more data.
Fatal and other non-fatal errors would be reported as usual.
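One possible C shape for the extended S6 service, assuming hypothetical names (`bsm_write_bytes`, `bsm_location`) and an in-memory buffer in place of real storage. A full medium is not an error here: it shows up in the two OUT values, while the return value stays reserved for fatal and other non-fatal errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { BSM_OK, BSM_ERR_NONFATAL, BSM_ERR_FATAL } bsm_result;

/* Hypothetical in-memory location. */
typedef struct {
  unsigned char *buf;
  size_t capacity;
  size_t used;
} bsm_location;

/* S6 sketch: writes as much of `data` as fits. Disk-full is reported through
   *written (may be less than len) and *has_space, not as an error. */
bsm_result bsm_write_bytes(bsm_location *loc, const void *data, size_t len,
                           size_t *written, bool *has_space) {
  assert(loc != NULL && written != NULL && has_space != NULL);
  size_t room = loc->capacity - loc->used;
  size_t n = len < room ? len : room;
  memcpy(loc->buf + loc->used, data, n);
  loc->used += n;
  *written = n;
  *has_space = loc->used < loc->capacity;
  return BSM_OK;
}
```

The kernel would treat `*written < len` together with `!*has_space` as the disk-full case and switch to compression, while other non-fatal errors could be retried or lead to an abort.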
...
> Agreed. But I find it more natural for a software developer to think in
> function signatures when reading [IN] and [OUT]. I would prefer to avoid
> the surprise when switching from specification reading to code reading.
>
> If we want to leave freedom to the implementor, then we could perhaps
> specify the services with one paragraph per in/out "information" instead
> of [IN] and [OUT].
>
> OTOH this should not be a prerequisite for me to approve the HLS.
>
I like this more general form of specification, and I have updated the HLS
accordingly.
>>> We can leave it to the backup kernel to decide which errors to treat as
>>> fatal and which to work around. The backup kernel could be fixed in this
>>> respect without changing the interface.
>>>
>> But first of all, the backup kernel must know whether the backup storage
>> session is usable after an error. This information must be passed somehow
>> from the module to the kernel - the kernel cannot decide it on its own.
>
>
> Well, this could be solved by having all further services fail, so that
> every attempt to work around the problem would fail.
This approach has two problems:
- It might be difficult for the backup kernel to decide whether the failure is
really fatal or whether it can/should try again.
- It would require each service to check whether the session is valid on
every call.
If the storage module can report a fatal error explicitly, and there is a
convention that service calls after a fatal error are prohibited, then a
slightly more efficient implementation is possible.
>
> But the most important cases will have well-known error codes, and
> hence a well-known severity.
>
I leave it to the LLD to decide how error severity is reported. Well-known
error codes with fixed severities are one possibility.
...
>> OK, I still see an issue with how global error codes (code ranges) would
>> be assigned to particular BSM implementations. The only solution I can
>> come up with is that we reserve a certain range for MySQL, and then all
>> BSMs developed at MySQL use unique error numbers from that range.
>> But external implementers will use arbitrarily chosen numbers outside
>> of that range. Then the error numbers are bound to overlap, and the
>> advantage you have in mind is not going to happen.
>
>
> This applies to storage modules that are developed and used
> proprietarily. I am mainly thinking of community projects. These will be
> added to the MySQL code base. If they add error messages to
> errmsg-utf8.txt, their final push will reserve the numbers once and for all.
>
This suggestion contrasts with what I propose in the HLS. My idea was that a
storage module provides an error description via a service call, and the
kernel then reports it using one general "error from storage module" MySQL
error (one entry in errmsg.txt). With your proposal, storage modules would
register their error messages in errmsg.txt like the rest of the server.
Then there is no need for a service which provides an error description: it
is enough that the error number is reported, and the backup kernel can look
up its description in errmsg.txt as usual. I have described this as
alternative A5 - please verify that it is described adequately.
I think one consequence of such a design would be that a user could not
hot-plug a storage module which was not known at server build time (when
errmsg.txt was compiled). To use such a module, he would have not only to
stop the server but also to recompile it. With my design, it should be
possible to upgrade a running server by adding a new storage module to it,
even if this module did not exist at server build time.
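A sketch of the design I propose, with hypothetical names (`storage_module`, `describe_error`, `demo_module`, `kernel_format_error`) and a made-up message format standing in for the single generic errmsg.txt entry. The module carries its own error texts, so the kernel can report errors from a module it did not know at build time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical module descriptor: error texts live in the module itself,
   not in errmsg.txt. */
typedef struct {
  const char *name;
  const char *(*describe_error)(int code); /* "error description" service */
} storage_module;

/* Example module-specific error texts (illustrative codes). */
static const char *demo_describe_error(int code) {
  switch (code) {
    case 1: return "medium full";
    case 2: return "cannot open location";
    default: return "unknown error";
  }
}

static const storage_module demo_module = {"demo", demo_describe_error};

/* Kernel side: one generic "error from storage module" message that embeds
   the text obtained from the module. The format string is an assumption. */
int kernel_format_error(const storage_module *m, int code,
                        char *out, size_t size) {
  assert(m != NULL && m->describe_error != NULL);
  return snprintf(out, size, "Error from storage module '%s': %s (code %d)",
                  m->name, m->describe_error(code), code);
}
```

Overlapping code numbers between modules do no harm here, because a code is always interpreted by the module that reported it.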
Rafal