List: Backup
From: Rafal Somla
Date: October 14 2009 8:25am
Subject: Re: MyBRM in backup kernel (RE: Dissertation project and MySQL backup extension)
Hi Andreas,

I'm sending you replies to your comments on the design (which I should 
really have done a long time ago :/). We can also discuss/clarify these during 
our call tomorrow. I will update WL#5046 with the feedback I got from you 
and Ingo.

Andreas Almroth wrote:
> 
> I've noticed you use the terms location and image, and I read it as you 
> expect a location to contain multiple backup images?
> 
I am using the terms as follows:

backup image 	- the data which we want to store
location     	- the "place" where it is stored

In file system storage, a location will be the path to the file storing the 
backup image, e.g., "/my/backup/dir/image.bkp". In XBSA, my locations will be 
mapped to XBSA objects. AFAIK these XBSA objects are identified by "paths" 
similar to filesystem ones.

Thus, no - a single location can contain only one backup image. But it can 
also be empty (file "/my/backup/dir/image.bkp" does not exist) or be 
occupied by something other than a backup image.

> The location string given in the BACKUP/RESTORE commands must be a unique
> name. Storing the same name does not work easily, at least not in the
> context of XBSA. I think it would add too much complexity in the
> implementation layer. At best, you would get an error, stating duplicate
> object.
> 
> If the location is the prefix, and the rest of the string is the unique
> identification, most backup systems would be able to handle that without
> too much added code. This is how my current prototype works.
> 

I was considering "location" to be the full string identifying the, well, 
location at which backup image should be stored. I'm not sure what you mean 
by "storing the same name". I assume this is the situation when we try to 
store backup image at a location which is already occupied (either by 
another image or by some other data).

In the design of WL#5046 it should work as follows:

1. When a backup storage session is created, the location string is given, 
which specifies the "place" where the backup image is to be found. This 
location can be empty or already occupied.

2. When opening an input stream, the location should contain a backup image. 
If this is not the case, an error is reported.

3. When opening an output stream, the location should be empty, otherwise an 
error is reported.

4. If location is not empty, it can be emptied with the "Free the location" 
service.

5. Whether the location contains an image or not can be found using "Get 
information about image stored in the location" service.
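
A minimal sketch of the five rules above, not the WL#5046 API itself. All names here (StorageSession, StorageError, IMAGE_MARKER) are hypothetical, a dict stands in for the real storage, and the input/output streams are collapsed into single read/write calls to keep it short:

```python
class StorageError(Exception):
    """Raised when the location is in the wrong state for the operation."""


# Hypothetical marker used only to tell images from other data in this sketch.
IMAGE_MARKER = b"BKPIMG"


class StorageSession:
    def __init__(self, store, location):
        # Rule 1: the session is bound to a location, which may be
        # empty or already occupied at this point.
        self.store = store          # dict: location string -> bytes
        self.location = location

    def image_info(self):
        # Rule 5: report what the location currently holds.
        data = self.store.get(self.location)
        if data is None:
            return "empty"
        return "image" if data.startswith(IMAGE_MARKER) else "other data"

    def read_image(self):
        # Rule 2: reading requires a backup image at the location.
        if self.image_info() != "image":
            raise StorageError("no backup image at " + self.location)
        return self.store[self.location][len(IMAGE_MARKER):]

    def write_image(self, payload):
        # Rule 3: writing requires an empty location.
        if self.image_info() != "empty":
            raise StorageError("location occupied: " + self.location)
        self.store[self.location] = IMAGE_MARKER + payload

    def free_location(self):
        # Rule 4: empty the location, whatever it holds.
        self.store.pop(self.location, None)


session = StorageSession({}, "/my/backup/dir/image.bkp")
session.write_image(b"table data")
print(session.image_info())
```

Writing a second image to the same location raises StorageError until free_location() has been called, which is the "already occupied" case discussed above.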

I'm not sure whether this addresses all your concerns on this issue.

>> When integrating MyBRM into MySQL backup system, I'd like to make it
>> less general and more focused on the functionality which is needed here.
>> The main changes I want to propose are:
>>
>> 1. The API is tailored towards storing backup image data, not any kind
>> of data.
>> This is reflected in the fact that some meta-information about the image
>> will be handled by a MyBRM module, not in the backup kernel. In my
>> current proposal it is the image format version number and also the
>> "magic bytes" which is really just a way of distinguishing backup images
>> from other kinds of data (see below for more details).
>>
> 
> It may be "dangerous" to push down too much logic into a storage plug-in,
> but as long as the meta-data has a tiny footprint, most enterprise backup
> systems can store the data together with the backup image header, and not
> within the data stream. We wouldn't want to force the implementers to write
> supporting code for data we need in the backup kernel. If it is essential
> data to the functionality, it is my opinion it should be controlled by the
> backup kernel rather than the implementation library.
>
> Also, we must keep in mind that we do not know the backend storage type;
> E.g. if the storage is tape, we must expect very long access times, and
> we wouldn't want to put too much meta-data in the data stream. Thus, I
> find it an acceptable solution to put the meta data in the image header,
> and therefore accessible very fast in most backup products. Obviously we
> can not design for every possible scenario and backup product...
>
> For instance, today, my prototype stores the version and magic bytes in
> the data stream, not the header, but by using a disk based storage backend,
> restores and browsing works very fast.
> 

The only meta-info about backup image (let's use meta-info instead of 
meta-data to avoid confusion with meta-data stored in backup image) which I 
consider here is:
1. The version number of the format in which backup image is stored.
2. The "fingerprint" distinguishing backup image data from other kinds of data.

In current code, this meta-info is stored in-stream, as a (10 byte) prefix 
of a backup image.
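
To illustrate the in-stream variant, here is a sketch of a 10 byte prefix carrying both pieces of meta-info. The layout is an assumption made for this example only (an 8 byte fingerprint followed by a 2 byte little-endian version number); the thread does not give the actual layout, and the MAGIC value is made up:

```python
import struct

# Hypothetical fingerprint; the real magic bytes are not given in this thread.
MAGIC = b"MYSQLBKP"
# "<" disables padding, so 8 bytes of magic + 2 bytes of version = 10 bytes.
PREFIX = struct.Struct("<8sH")


def add_prefix(version, payload):
    """Return the image bytes with the meta-info prefix prepended."""
    return PREFIX.pack(MAGIC, version) + payload


def read_prefix(stream_bytes):
    """Check the fingerprint and return (version, payload)."""
    if len(stream_bytes) < PREFIX.size:
        raise ValueError("too short to be a backup image")
    magic, version = PREFIX.unpack_from(stream_bytes)
    if magic != MAGIC:
        raise ValueError("not a backup image")
    return version, stream_bytes[PREFIX.size:]


image = add_prefix(1, b"table data")
print(read_prefix(image))
```

With this variant the reader must fetch the first bytes of the stream before it knows whether the location holds an image at all, which is the cost Andreas mentions for slow backends such as tape.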

In WL#5046 I proposed to store it off-stream using methods specific to each 
backup storage module.

Ingo from our team thinks that it is better to keep this meta-info in stream.

I'm not sure what your opinion is - above you seem to give arguments for 
both options.

In any case, this is one of the things we must decide in the design. Usually 
we do it by voting after gathering all opinions.

>> 3. Remove from the API general functions for managing backup
>> repositories, such as listing/searching for backup images etc. This is
>> not used by BACKUP/RESTORE commands and is unlikely to be handled from
>> SQL level. More likely, there will be separate tools for managing such
>> backup repositories.
>>
> 
> I think the API functions should stay, as they can be used by other tools
> which also link to the shared library. Having two libraries from each
> implementer would just complicate software maintenance. I foresee that
> a built-in admin tool could come in the future to manage backup
> expiration and other similar tasks. Perhaps an extension of the code Lars
> has written for an admin tool?
> 
> Potentially the memory footprint of the mysqld process would be
> larger, but I don't see that as a real problem in most systems today.
>

I see no problem with having the extra functionality implemented in the 
module and exported by some kind of API. Then, the module could be used by 
an admin tool to perform administrative tasks.

However, I have a problem with adding such services to the backup storage 
module interface. I would rather keep this interface minimal and 
focused on the functionality which is needed for the task at hand.

Thus I propose this: The BSM interface will not contain functions for 
managing backup locations but a backup storage module can implement an 
extended interface. It will not be used by backup kernel but can be used by 
other clients of the module. Designing such extended interface will be 
outside the scope of WL#5046.
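
A rough sketch of what I mean by this split. The class and method names are hypothetical and only illustrate the idea; the real BSM interface is whatever WL#5046 ends up specifying:

```python
from abc import ABC, abstractmethod


class BackupStorageModule(ABC):
    """Minimal interface: only the services the backup kernel needs."""

    @abstractmethod
    def write_image(self, location, data): ...

    @abstractmethod
    def read_image(self, location): ...

    @abstractmethod
    def free_location(self, location): ...


class LocationManagement(ABC):
    """Hypothetical extended interface, never called by the backup kernel."""

    @abstractmethod
    def list_locations(self, prefix): ...


class InMemoryModule(BackupStorageModule, LocationManagement):
    """Toy module implementing both interfaces on top of a dict."""

    def __init__(self):
        self._images = {}

    def write_image(self, location, data):
        self._images[location] = data

    def read_image(self, location):
        return self._images[location]

    def free_location(self, location):
        self._images.pop(location, None)

    def list_locations(self, prefix):
        return sorted(loc for loc in self._images if loc.startswith(prefix))


def kernel_backup(module: BackupStorageModule, location, data):
    # The kernel only ever uses the minimal interface.
    module.write_image(location, data)


def admin_list(module, prefix):
    # An admin tool checks for the extended interface before using it.
    if not isinstance(module, LocationManagement):
        raise TypeError("module does not offer the extended interface")
    return module.list_locations(prefix)
```

A module that implements only BackupStorageModule still works with the kernel; it simply cannot be used by such an admin tool.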

What do you think - is this an acceptable approach?

>> 4. Add compression service (in the future also encryption) to the API as
>> it is used by our backup system and is supposed to be done "externally",
>> that is on the underlying storage level.
>>
> 
> Doable. Although, history shows that sub-linking crypto and compression
> libraries can conflict with similar/same shared libraries, but different
> versions used by mysqld. We have had this issue with Oracle and NetBackup
> implementation. I think however, mysqld is only using libcrypt today, but
> I guess/assume that could change if a new feature is introduced in a
> storage engine.
> 

Hmm that sounds scary. I'm not an expert on how dynamically loadable code 
works. The issue you describe here looks to me to be a more general one. The 
way I see it:

We have dynamic module M which uses library L. There are two possibilities

a) L is statically linked into M
b) L is dynamic

Now we load M into our system S. If I understand correctly, you say that in 
case a) we can have a problem if S also uses L but a different version of it. 
This looks bad as it would basically prevent a module developer from using any 
static libraries unless they want to count on luck that the system will not 
use the same library.

In case b) I really don't know what happens. If S is also using L and has 
already loaded it, will M find the loaded L and link to it? If S is not 
using L, will L be correctly loaded and linked when M is loaded?
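
One small experiment that bears on the first question in case b). It assumes a Linux-like system with a dlopen-style loader and a C library that ctypes can locate; the "libc.so.6" fallback name and the use of the private _handle attribute are assumptions of this sketch. It asks the loader for the same shared library twice and compares the handles it gets back:

```python
import ctypes
import ctypes.util


def load_twice(name):
    """Load the same shared library twice and return both loader handles."""
    first = ctypes.CDLL(name)
    second = ctypes.CDLL(name)
    return first._handle, second._handle


# The C library is already in use by the Python process itself, so it
# plays the role of L being loaded by S before M asks for it.
libc_name = ctypes.util.find_library("c") or "libc.so.6"
h1, h2 = load_twice(libc_name)
print(h1 == h2)
```

Equal handles mean the second request was resolved to the copy already in the process rather than a fresh one. This says nothing about case a), nor about what happens when two different versions of L are requested under different names.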

So far, I simply assumed that such things work one way or another...

>> The name of the game
>> --------------------
>> This is a topic of never ending discussions, but anyway... You named
>> your framework MyBRM which stands for Backup and Restore Module. This
>> name suggests (to me) that such modules perform backup/restore services,
>> which is not the case. Rather, they perform storage services for the
>> backup kernel (part of mysql server) and this kernel performs backup and
>> restore operations. Therefore I'd prefer a different name. For the moment
>> I propose Backup Storage Module or Backup Storage Manager, both
>> abbreviated as BSM.
>>
> 
> BSM works for me, not a problem at all. Backup Storage Module is perhaps
> the best name, as the implementation libraries should be seen as
> pluggable modules.
>

OK, great. I have already started using the name "backup storage module".

Rafal