List: Backup
From: Rafal Somla
Date: October 22 2009 11:04am
Subject: Re: WL#4056: Pluggable storage modules for backup
Hi Ingo,

See my replies. Many things have been clarified already, but a few still remain.

Ingo Strüwing wrote:
> "What do we consider part of the backup image?"
> -----------------------------------------------
> Sorry to add more to this. I believe I understand your point of view.
> You see the backup image as a structure that shall be handled as a
> "black box" blob by external modules. These should not assume *any*
> piece of structure in it. Since the format of that image can change, one
> needs to know the version to be able to interpret the image correctly.
> The version number is "meta" information, which must be present before
> trying to extract information from the image.

Yes, very good description. Good that we understand each other.

> To me this is an unusual, unexpected, complicated way to handle
> versioned data formats. I think that some predefined information at the
> beginning of a data structure, which determines the rest of the format, is
> more common, expected, and simpler to handle. External modules don't
> need to care about the image's version. They don't even need to know
> that there are different formats. When they feed a backup image of
> arbitrary format into a backup kernel, they can expect that the kernel
> is able to figure out how to treat that image.

I think your preference is clear to me, and I described this alternative as
A1 in the HLS, where I list its advantages and disadvantages. Let me
know if you think something should be added or changed there. Note that what is
"usual", "expected" and "complicated" is pretty subjective. I try to avoid
this kind of argument and instead collect more concrete, objective ones.

> Also, with your approach, we need to put the burden of keeping the
> version number consistent with the image blob on the storage modules. If
> a module would become screwed up, and deliver a wrong version number
> with the image (or deliver a wrong "stream offset"), we won't have a
> chance to detect the inconsistency. It could take some time until some
> function doesn't work due to being fed with wrong data. We give away the
> most basic, effective and simple consistency check.

Yes, I agree. With my solution we must trust the BSM more if we do not do any
other consistency checking. However, the ultimate solution is proper
consistency checking using checksums or the like. One could argue that it is
better to rely on a complete solution than to have a false sense of safety
based on a simplistic, partial one.

> If we define "magic bytes" + "version number" as part of the image, we
> would immediately detect a failing storage module.

... unless it fails after giving us correct magic bytes. E.g., if the version
number is corrupted, we are screwed anyway. So even in this variant we must
trust the BSM to a large extent.

> "Marking of backup images"
> --------------------------
> My thinking is that the user supplies the "location string". By using
> distinguishable names, he "marks" backup images as such, and he can tell
> what is a backup image and what is not. In case of human failure I want
> the backup kernel to detect that error as a last resort. If storage
> modules are able to detect such failure before it hits the backup kernel
> - fine with me.

This is also the idea behind my design.

> I always try to keep the plain file storage in mind. I would like to
> keep it simple. I don't expect users to be confused about the contents
> of their files frequently. So I do not want the plain file storage to
> mark and distinguish backup image files from other files.

I also keep plain file storage in mind. My idea here is that the
filesystem storage module would mark files containing backup images with the
10-byte prefix, as is done now.

After a backup storage session is created and a stream is opened for writing, the
File BSM will write 8 magic bytes followed by a 2-byte version number to the
file and report that the data starts at offset 10. Subsequent write
requests will append data to the file. When such a file is opened for reading, the
File BSM will read the first 10 bytes, verify the magic bytes and report the version
number. If the magic bytes are wrong, the "open stream for reading" request will
fail with an error.
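A minimal sketch of that File BSM header handling in C, assuming a stdio FILE as the backing store. The magic value, the little-endian byte order of the version field and the names file_bsm_open_write() and file_bsm_open_read() are hypothetical; only the 8 + 2 byte layout and the 10-byte data offset come from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical magic value; the real 8-byte prefix is not specified here. */
static const unsigned char BSM_MAGIC[8] = {'B','A','C','K','U','P','!','\0'};

#define BSM_HEADER_LEN 10  /* 8 magic bytes + 2-byte version number */

/* Stream opened for writing: write the header first.
   Returns the offset at which backup kernel data starts, or -1 on error. */
long file_bsm_open_write(FILE *f, uint16_t version)
{
  unsigned char hdr[BSM_HEADER_LEN];
  memcpy(hdr, BSM_MAGIC, sizeof(BSM_MAGIC));
  hdr[8] = (unsigned char)(version & 0xFF);         /* byte order is an assumption */
  hdr[9] = (unsigned char)((version >> 8) & 0xFF);
  if (fwrite(hdr, 1, BSM_HEADER_LEN, f) != BSM_HEADER_LEN)
    return -1;
  return BSM_HEADER_LEN;
}

/* Stream opened for reading: read and verify the header.
   Returns 0 and stores the version on success; returns -1 if the file is
   too short or the magic bytes do not match (not a backup image). */
int file_bsm_open_read(FILE *f, uint16_t *version)
{
  unsigned char hdr[BSM_HEADER_LEN];
  if (fread(hdr, 1, BSM_HEADER_LEN, f) != BSM_HEADER_LEN)
    return -1;
  if (memcmp(hdr, BSM_MAGIC, sizeof(BSM_MAGIC)) != 0)
    return -1;
  *version = (uint16_t)(hdr[8] | (hdr[9] << 8));
  return 0;
}
```

After a successful open for reading, the file position is already at the first byte of image data, so the kernel never sees the prefix.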

An XBSA BSM, however, could arrange things differently, marking the location as a
backup image and storing its version number in XBSA object attributes. The
data would be written directly to the object's data stream. When the
location is opened for reading, the object attributes would be consulted to check
that it is a valid backup image and to report its version number.

> ...
>>> 6. "acknowledgement": The services look like they all return a status.
>>> Why do we need a second status return here? Can't the return status have
>>> different values that encode different (non-)problems? Another method
>>> could be to set algorithm "none" if the preferred one was rejected.
>> On this level of specification, I only want to say *what* information is
>> passed in and out of each service. I do not want to say *how* it is
>> done. I consider it part of LLD.
>> In this case I want to say that "Set compression algorithm service"
>> informs its user whether it successfully set up compression or not. I do
>> not want to specify yet how it is done; all the options which you
>> propose are valid. I expect this to be specified/clarified in LLD.
> Ok. You say that you want to say that the service says whether it could
> process the request successfully. Fine. But why doesn't any other service
> need to do that?
> In other cases you say the service reports an error. So why is this not
> possible for the "Set compression algorithm service"? What is so
> special about it?
> It is the inconsistency I'm stumbling over. I want either all services
> "report error" on failure, or all services to "transport" an
> "acknowledgment" information.

I think that for the "set compression algorithm" service, reporting an error and
reporting that compression is not supported are two different things. In the
latter case the service completes successfully, informing the user that
compression is not supported. This is different from the situation where the
service fails for some reason.

So, we have three possibilities here:

1. Service fails with error.
2. Service completes successfully and reports that compression is on.
3. Service completes successfully and reports that compression is not on.

You are right that the first possibility applies to all services. But the
other two are specific to this service, and this is why they are specified
here. I added a note explaining this to the newest HLS. Note also that in
the current design this service can give 3 different answers if it completes

> "Amount of data that has been written"
> --------------------------------------
> I don't see us implement asynchronous I/O over the storage module
> interface. I just see the added complexity for the backup kernel to
> handle incomplete writes.

Yes, with your suggestion the code will be simpler. I listed this as an advantage
of the alternative you propose in the HLS. As usual, there is a trade-off
between simplicity and flexibility of an interface.

> The POSIX write() function has to handle a vast amount of weird devices.
> We have the implementation of storage modules under control. We can
> insist on complete writes or failure. This reduces the complexity of the
> backup kernel, which is good. I still don't foresee a real limitation.

I don't understand why you say "we have the implementation of storage 
modules under control". I would say we have control over the specification 
of the API but not over how it will be implemented.

If we want to design an open system, where any contributor can implement the
API, plug their module into the system and have it work, we cannot
assume anything about the implementation unless we specify it in the API.

From this perspective, it is easy to imagine a BSM that talks to
more or less exotic devices such as tapes, which may even require
physical loading of the tape before any bytes can be written. The POSIX
write() interface allows the application to regain control and do something
else in such situations, while your simplified solution would mean that the
backup kernel (or one of its threads) is blocked waiting for the device
to become ready.
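To illustrate the POSIX-style alternative, here is a sketch of how the backup kernel could cope with incomplete writes. The bsm_write_fn type, kernel_write_all() and the stub module are hypothetical; a return value of 0 is assumed to mean "device not ready yet", which is the point where the kernel regains control.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical write service: returns how many bytes were accepted
   (possibly fewer than len, possibly 0 if the device is not ready),
   or -1 on error. */
typedef long (*bsm_write_fn)(void *session, const unsigned char *buf, size_t len);

/* Kernel side: call the module until the whole buffer is accepted.
   When a call returns 0, a real kernel could do other work before
   retrying; this sketch simply retries, giving up after max_idle
   consecutive zero-length writes. */
int kernel_write_all(bsm_write_fn write_fn, void *session,
                     const unsigned char *buf, size_t len, int max_idle)
{
  size_t done = 0;
  int idle = 0;
  while (done < len) {
    long n = write_fn(session, buf + done, len - done);
    if (n < 0)
      return -1;                /* module reported an error */
    if (n == 0) {
      if (++idle > max_idle)
        return -1;              /* device never became ready */
      continue;                 /* control is back in the kernel here */
    }
    idle = 0;
    done += (size_t)n;
  }
  return 0;
}

/* Stub module: an in-memory sink accepting at most 3 bytes per call. */
struct sink { unsigned char data[64]; size_t used; };

long stub_write(void *session, const unsigned char *buf, size_t len)
{
  struct sink *s = session;
  size_t n = len < 3 ? len : 3;
  if (s->used + n > sizeof(s->data))
    return -1;
  memcpy(s->data + s->used, buf, n);
  s->used += n;
  return (long)n;
}
```

The extra loop is the complexity Ingo objects to; in exchange, a slow device does not have to block the caller inside the module.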

> Having flexible and future-proof interfaces is a good thing. But if
> there is no foreseeable use for a feature, we might need to live with
> the burden forever, without taking any profit from it.
> OTOH, to assume we could specify an interface that won't need to change
> forever is naive.

Sure, I agree.

> So please come up with a real, foreseeable use case, or get rid of the
> parameter and accept an interface change one day in the future, if a
> real, but unforeseen need arises.

See above.

> "Information about end of stream"
> ---------------------------------
> If you see it this way, why don't you think it could be good that during
> a stream write the storage module gets some idea about when the stream
> is at its end?
> You probably will answer that the SM can assume end of stream when a
> "close" is requested. And then I will ask why the backup kernel cannot
> assume end of stream when the SM does not deliver any data any more
> (zero length read)?

Sorry, I don't fully understand your concern(s). Regarding the last sentence: an SM
may deliver no data for reasons other than end of stream. Perhaps it
knows that it will have to wait very long for the next byte from the stream, and
thus it returns control to the caller, reporting that no bytes have arrived yet.

I know this is not how POSIX read() works, but I see no real problem with
leaving the specification open in this respect. The specification simply says
that the user of the service, after it has completed, should know how many
bytes have been read and whether there can be more bytes in the stream or not.

One way of conveying this information from the service is the usual
read() convention. That is, the service returns a single number N. If this number
is non-zero, it means that N bytes have been read *and* there are potentially
more bytes in the stream. If N=0, it means that no bytes have been read and
there are no more bytes in the stream. So both pieces of information required by the
specification are returned using a single number. But the specification is more
general and allows other implementations as well.
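A sketch of one such other implementation, assuming a hypothetical read service that returns the byte count and a separate "more" flag explicitly. Unlike the single-number read() convention, it lets a module return zero bytes without ending the stream. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Both pieces of information the HLS requires, returned explicitly. */
struct bsm_read_result {
  size_t bytes_read;  /* how many bytes were placed in the buffer */
  int    more;        /* nonzero if the stream may still contain bytes */
};

typedef struct bsm_read_result (*bsm_read_fn)(void *session,
                                               unsigned char *buf, size_t len);

/* Stub module: serves a fixed string in chunks of at most 4 bytes and,
   on every other call, returns nothing while the stream is not yet at
   its end (modelling "no bytes have arrived yet"). */
struct source { const char *data; size_t len, pos; int calls; };

struct bsm_read_result stub_read(void *session, unsigned char *buf, size_t len)
{
  struct source *s = session;
  struct bsm_read_result r = { 0, 1 };
  if (s->calls++ % 2 == 1)
    return r;                      /* zero bytes, but not end of stream */
  size_t n = s->len - s->pos;
  if (n > 4) n = 4;
  if (n > len) n = len;
  memcpy(buf, s->data + s->pos, n);
  s->pos += n;
  r.bytes_read = n;
  r.more = s->pos < s->len;
  return r;
}

/* Kernel side: keep reading until the module reports the end of the
   stream or the output buffer is full. */
size_t kernel_read_all(bsm_read_fn read_fn, void *session,
                       unsigned char *out, size_t cap)
{
  size_t total = 0;
  for (;;) {
    struct bsm_read_result r = read_fn(session, out + total, cap - total);
    total += r.bytes_read;
    if (!r.more || total == cap)
      break;
  }
  return total;
}
```

With the read() convention the same module would need an extra return value (such as POSIX's -1 with EAGAIN) to express "nothing yet"; the explicit flag avoids overloading 0.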

> In my eyes the "information about end of stream" is an inconsistency in
> level of detail with other service specifications. I suggest to get rid
> of it. It is obvious that end of stream needs to be signaled somehow.

Maybe it is obvious, but in the HLS I try to list all the information that goes
into and out of each service.

>>> 15. "Error reporting": The methods should return non-zero on error.
>>> There should be a get_errno() and a get_errmsg() method. Internally the
>>> modules must be allowed to use include/mysqld_error.h and my_error() or
>>> my_printf_error() from include/my_sys.h.
>>   I am afraid of such design because it
>> requires a lot of knowledge from the module and makes it harder to
>> separate it from the rest of the system.
> What do you mean by "separate it from the rest of the system"? I guess,
> we agree that backup storage modules shall be dynamically loadable
> plugins. So they are limited to what such plugins are allowed to access.
> Do you have additional limitations in mind?

To clarify: I think the design should be open in the sense that a BSM
can be implemented independently of our mysql code. This does not
mean I would forbid implementations from using mysys or other parts of our
code. On the contrary, the idea is to leave implementers complete freedom
as long as they correctly implement our API. Freedom means that they *can*
use mysql code in their implementation if they wish. But this is
different from saying that they *must* use mysql code, e.g., for error
reporting. I'd rather avoid the latter and ensure that the whole interface
between mysql backup and a BSM is included in the API.
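A sketch of what keeping error reporting inside the API could look like, along the lines of the get_errno()/get_errmsg() methods from item 15. The structures, function names and the error value are hypothetical; the kernel would remain free to map such codes to server errors (e.g. with my_error()) on its side.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-session error state, kept by the module itself. */
struct bsm_session {
  int  last_errno;
  char last_errmsg[128];
};

/* Hypothetical API table; error details travel only through it. */
struct bsm_api {
  int         (*open_stream)(struct bsm_session *s, const char *location);
  int         (*get_errno)(const struct bsm_session *s);
  const char *(*get_errmsg)(const struct bsm_session *s);
};

/* Stub module: accepts only locations under "/backups/". */
static int stub_open(struct bsm_session *s, const char *location)
{
  if (strncmp(location, "/backups/", 9) != 0) {
    s->last_errno = 2;            /* module-specific code, arbitrary value */
    snprintf(s->last_errmsg, sizeof(s->last_errmsg),
             "location '%s' is outside the backup directory", location);
    return 1;                     /* non-zero on error */
  }
  s->last_errno = 0;
  s->last_errmsg[0] = '\0';
  return 0;
}

static int stub_errno(const struct bsm_session *s) { return s->last_errno; }
static const char *stub_errmsg(const struct bsm_session *s) { return s->last_errmsg; }

const struct bsm_api stub_api = { stub_open, stub_errno, stub_errmsg };
```

Nothing here requires the module to link against server code, though it still could if its author wishes.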

> This is handled through my_error() already. Unfortunately there is one
> language per server only, no session specific language yet. But this is
> how MySQL works. We should not try to solve it in a backup specific way.

I want to send some proposals for error handling soon. Let's discuss it
in more detail then. I will keep your preferences in mind, especially the strong
preference that BSM error codes are globally registered in the mysql server.

>> Note: using global error numbers is a bit problematic in an open
>> architecture. The global set of error numbers must fit any possible
>> implementation of a backup storage module. It is difficult to predict
>> all possible errors, thus either storage modules will not be able to
>> signal some errors or they will hack-around, breaking the modularity of
>> the system. In the ideal world, I'd like to have a system to which an
>> independently developed backup storage module can be plugged and it will
>> integrate smoothly, including satisfactory error reporting.
> While I see the advantages of dynamic error messages to the developer, I
> insist that global error numbers are a major requirement. When a
> customer gets an error message, he cannot understand, he'll contact the
> support team. If they don't get a unique error number, but just the
> Chinese error text, they may not be able to help.
> What we could do ideally would be to reserve number ranges for plugins
> and register their error messages when mounting them. I guess there
> exist some such ideas already, but I don't know if/when anything like
> that could be available.
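A sketch of the range-reservation idea: each module gets a block of global error numbers when it is mounted, and its messages are registered with it, so support staff always see a unique number. The base number 20000, the block size of 100 and all names are hypothetical, not existing server values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BSM_ERR_BASE     20000  /* hypothetical start of the plugin range */
#define BSM_ERR_PER_MOD  100    /* hypothetical size of each module's block */
#define BSM_MAX_MODULES  8

struct err_range {
  const char  *module;    /* plugin name */
  const char **messages;  /* messages[i] is the text for local code i */
  int          count;
};

static struct err_range registry[BSM_MAX_MODULES];
static int registered;

/* Called when a plugin is mounted. Returns the base of its reserved
   range, or -1 if no range is free or it has too many messages. */
int register_module_errors(const char *module, const char **messages, int count)
{
  if (registered == BSM_MAX_MODULES || count > BSM_ERR_PER_MOD)
    return -1;
  registry[registered].module = module;
  registry[registered].messages = messages;
  registry[registered].count = count;
  return BSM_ERR_BASE + BSM_ERR_PER_MOD * registered++;
}

/* Map a global error number back to its module and message text;
   returns NULL for numbers outside any registered range. */
const char *lookup_error(int global, const char **module)
{
  int idx = (global - BSM_ERR_BASE) / BSM_ERR_PER_MOD;
  int local = (global - BSM_ERR_BASE) % BSM_ERR_PER_MOD;
  if (global < BSM_ERR_BASE || idx >= registered || local >= registry[idx].count)
    return NULL;
  if (module)
    *module = registry[idx].module;
  return registry[idx].messages[local];
}
```

Note that with ranges assigned in mount order, a module's numbers stay unique only if the order is stable; a fixed per-module assignment would avoid that.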

Thanks for your suggestions. I'll keep them in mind when sending my proposal 
for error handling.

> I also strongly insist that backup storage module plugins must not be
> limited below the level of storage engine plugins. That is, they must be
> allowed to use libmysys (including my_error()), libdbug,
> system_charset_info, libmystrings (which can handle
> system_charset_info), etc.

Hope it is clear now that there was no intention to limit BSMs in that way.

>>> 16. "non-ascii strings": The interface should define strings in system
>>> character set.
>> Using system character set creates, unnecessary in my opinion,
>> dependence between a storage module and the server code. The module
>> would have to access this variable. Instead, we could specify "utf8" to
>> be the standard encoding for strings. This seems to be an obvious option
>> but I have not put it down yet, because I'm not sure about all the
>> consequences. And perhaps, we can be fine with just us-ascii strings?
> No. Location names and error messages must be internationalizable.

I agree.
