List:Commits« Previous MessageNext Message »
From:Ingo Strüwing Date:December 12 2008 7:44pm
Subject:Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)
WL#4630
View as plain text  
Hi Øystein,

thank you very much for the first part of a very thorough review.
Please find answers below.

Øystein Grøvlen, 11.12.2008 14:24:
> Ingo,
> 
> Thanks, for this thorough and well tested patch.  I have not been
> through all of it, yet, but here are the comments and questions I have
> so far:
> 
>  * The name: With a client that is called mysqlbackup, I would expect
>    to be able to perform backup.  Has another name been considered?

No. At 1. This is similar to other programs like mysqlbinlog, which does
not perform binlogging either. At 2. Milestone 3 is defined as: Perform
a backup or restore and display the results.

> 
>  * Some questions from playing a bit with the client:
> 
>      - What does "extra data length" mean?  And why is it always 0?

It is the length of the "extra data" that could be present in the backup
image for an object. Currently, we do not write extra data to the backup
image. Hence it is always 0.

>      - Finish time is always 0.  Why?  It is set correctly in the
>        backup_history table for the same backup.

I do not know. It is the value that the stream library provides. I guess
it is not written to the backup file, or the stream library does not
fill this element of the "header" when reading the image.

>      - If I use data-chunks option, there are several entries for each
>        chunk:
>          Data chunks:
>            Chunk 1 has 29 bytes for table 'test'.'t'
>            Chunk 1 has 1025 bytes for table 'test'.'t'
>        This seems a bit strange.  Is it correct?

Good catch. This is a bug. I should have noticed it when accepting the
result file. The chunks should be numbered 1, 2, 3, ...

>      - What is --exact supposed to do?

mysqlbackup --help
  ...
  --exact   Print exact number of bytes instead of human readable form.
  ...

mysqlbackup mysql-test/mysqlbackup-test.bak
...
Image size:          8458 KB
...

mysqlbackup --exact mysql-test/mysqlbackup-test.bak
...
Image size:          8661804 bytes
...

> 
>  * All this error injection seems a bit of an overkill for a client
>    since in the event of failure all you need is to be able to stop
>    and report the error.  For a server that need to keep on going, I
>    see that it is necessary to test that errors does not take it down.

I wanted to be sure that the error is correctly reported and that we
don't get valgrind reports should someone add a new test. That is,
resources should be freed.

> 
>  * I am a bit puzzled as to why we need interface code on both side
>    the fence here.  stream_backup.c contains several functions that in
>    effect does nothing, but call functions in stream_v1.c.  (E.g.,
>    backup_read_summary is one single method call plus 20 lines of
>    initialization, debug code, and error testing.)  Why could not
>    mysqlbackup.cc use stream_v1.c directly?  Then error injection and
>    checking could be put there so that also the server could benefit
>    from it.  (This is maybe not my business since I am not the
>    architectural reviewer.)

I started with using stream_v1.c functions directly from mysqlbackup.cc.
See my first prototype: http://lists.mysql.com/commits/54743. But it
became ugly for my taste. There are a lot of constraints, which
function to call when, and several exceptions, I had to learn about.

My concept is now to use mysqlbackup.cc for data visualization and
backup_stream.c for data collection. This kind of a modularization. To
have it somewhat "clean", I decided to wrap all stream library
functions, even the few which could be used directly with little hassle.
But then the wrappers do also error reporting to free the main program
from it. This was already included in my second prototype:
http://lists.mysql.com/commits/56417

These two concepts had been present for discussion for some days. The
descision, to go with the second concept, has been made during the
Backup Meeting by phone conference on October 21:
https://intranet.mysql.com/secure/mailarchive/mail.php?folder=230&mail=2056

> 
>  * Took me some time to understand the role of the bcat_ functions.  I
>    think this should be explained a bit better.  Do the implementation
>    have to be this separate for the client and the server, or would it
>    be possible to share some more code?

It took me some time too. To say the least. These functions are defined
by the stream library interface. I had no influence on their architecture.

I must admit that I didn't spend much time in analyzing how to share
code with the server. Some small experiments required me to include most
of the server code, which didn't look right. I would also need to learn,
how the catalog is constructed in the server. A quick look shows that a
couple of classes are involved. I usually have a hard time to follow all
the classes used in the backup code. Since not even Rafal suggested to
do it, I didn't even try.

> 
>  * Is it safe to extend the structs like st_bstream_image_header?  Are
>    you guaranteed that the stream library will never make any
>    assumptions about the size of the struct?

I don't know, what you mean. Can you please give an example of an
"extension"?

> 
>  * I would prefer if the addition of error injection to mysys routines
>    was done as a separate patch.  This is a general mechanism and I
>    feel that is not related to backup.  I also think other people than
>    members of the backup team should review this.

I can't imagine, how this could be done. Such a task wouldn't get
sufficient priority to be looked at by anyone.

But I could remove it from the backup client patch. I have tested, what
I wanted to test with it.

> 
>  * The code contains a lot of hard-coded English messages.  Will there
>    not be a requirement to internationlize this?

Possible. But I can't see that any of our command line clients attempts
to do this.

> 
> 
> Some more detailed questions/comments inline:
> 
> ...
> 
>> === modified file '.bzrignore'
>> --- a/.bzrignore    2008-10-29 14:20:12 +0000
>> +++ b/.bzrignore    2008-11-25 19:33:34 +0000
>> @@ -314,6 +314,10 @@ client/mysql_upgrade
>>  client/mysqladmin
>>  client/mysqladmin.c
>>  client/mysqladmin.cpp
>> +client/mysqlbackup
>> +client/stream_v1.h
>> +client/stream_v1_services.h
>> +mysql-test/mysqlbackup-test.bak
> 
> Why are the three last entries added?

I want bzr to ignore these files when running gcommit.

client/stream_*.h are symbolic links to files in sql/backup. They are
required for #include directives. The Makefile for the client directory
does not use an -I../sql option, so I didn't add an -I../sql/backup
option either.

mysql-test/mysqlbackup-test.bak is created during backup_client.test. I
do not want to delete it at the end of the test case so that I have
something to feed mysqlbackup with in manual tests.

> 
> ...
> 
>> +/*
>> +  Error injection. Built on DBUG.
>> +  This is similar to error injection in the server.
>> +  In case that one becomes globally available, we undefine it first.
>> +*/
>> +#ifdef ERROR_INJECT
>> +#undef ERROR_INJECT
>> +#endif
>> +#define ERROR_INJECT(_keyword_, _action_) \
>> +  DBUG_EXECUTE_IF((_keyword_), errm("ERROR_INJECT(\"%s\")\n",
> (_keyword_)); \
>> +                  DBUG_PRINT("bupstrm", ("ERROR_INJECT(\"%s\")\n",
>         \
>> +                                         (_keyword_))); _action_)
> 
> Could not this be a general macro and not particular to this file?

Yes, but then I think this must be accepted outside the team. In
mysql_priv.h we have an ERROR_INJECT macro already, though conditionally
compiled. This means, it is not generally usable, but could clash, when
enabled. Mikael Ronström made this one. Even though it is not used
anywhere, we would need his approval for a change. Note that his error
injection feature works slightly different than mine. So this should be
carefully considered.

We might however find a place which is common to backup code.

> 
>> +
>> +/* For easier error tracking we do not pre-allocate in DBUG mode. */
>> +#if !defined(DBUG_OFF)
>> +#define DYN_ALLOC_INIT 0
>> +#define DYN_ALLOC_INCR 1
>> +#else
>> +#define DYN_ALLOC_INIT 1000
>> +#define DYN_ALLOC_INCR 1000
>> +#endif
> 
> Why is this done?

For efficiency.

In a debug server, every insertion into a dynamic array causes a call to
my_malloc(). This allows for better error tracking. For example we would
notice faster if we store pointers to array elements. Every reallocation
can invalidate such address.

In a production server, we preallocate 1000 array elements and need to
reallocate only for every 1000th insert.

> 
>> +
>> +/*
>> +  ====================================================
>> +  Convenience functions for access to st_blob objects.
>> +  ====================================================
>> +*/
>> +
>> +/**
>> +  Backup blob length.
>> +
>> +  This is a function, not a macro, for better type checking.
>> +
>> +  @param[in]    blob            blob
>> +
>> +  @return       length
>> +*/
>> +
>> +ulong BBL(struct st_blob *blob)
> 
> Why name it like a macro, when it is a function?

I have BBL(), BBS(), and BBLS(). The latter must be a macro, the first
two could be a macro. IMHO it would be more complicated to understand,
why they follow different naming schemes.

Having those, that can be functions, be functions, helped me in finding
problems.

While I write this, I notice a potential optimization, which would also
please you preference:

#if !defined(DBUG_OFF)
/* Backup blob length. */
ulong backup_blob_length(struct st_blob *blob);
#define BBL(_b_) backup_blob_length(_b_)
/* Backup blob string. */
char *backup_blob_string(struct st_blob *blob);
#define BBS(_b_) backup_blob_string(_b_)
#else
#define BBL(_b_) ((ulong) (blob->end - blob->begin))
#define BBS(_b_) ((char*) blob->begin)
#endif
/*
  Backup blob length and string.
  This is for use with printf format '%.*s'.
  This must be a macro because it produces two comma separated values!
*/
#define BBLS(_b_) ((uint) BBL(_b_)), BBS(_b_)

This would even save some function calls on a production system.
However, problems with argument types might lead to error messages about
backup_blob_*() in lines, which have BBL/BBS/BBLS only.

...
>> +/*
>> +  ====================================================
>> +  Memory allocation.
>> +  These are callback functions for the stream library.
>> +  ====================================================
>> +*/
>> +
>> +/**
>> +  Allocate given amount of memory and return pointer to it.
>> +
>> +  @param[in]    size            amount of memory to allocate
>> +
>> +  @return       pointer to allocated memory
>> +*/
>> +
>> +bstream_byte* bstream_alloc(unsigned long int size)
>> +{
>> +  return my_malloc(size, MYF(MY_WME));
>> +}
>> +
>> +
>> +/**
>> +  Free previously allocated memory.
>> +
>> +  @param[in]    ptr             pointer to allocated memory
>> +*/
>> +
>> +void bstream_free(bstream_byte *ptr)
>> +{
>> +  my_free(ptr, MYF(MY_ALLOW_ZERO_PTR));
>> +}
>> +
> 
> It seems a bit strange that while I can't find that bstream_alloc is
> used anywhere, bstream_free is used in several places.  Should not
> these come in pairs?  There is little backup specific about the these
> functions.  Are they actually needed, and if yes, why not make the
> functionality generally available?

These are callbacks for the stream library. The stream library allocates
the data that it hands over to the bcat_* callback functions. It is the
duty of the calling application to free it after use.

One can go without defining these functions in the application by
defining BSTREAM_USE_MALLOC before including stream_v1_services.h. but
then they use malloc() and free() directly, instead of my_malloc() and
my_free(). The main difference is that the latter report errors. So I
could never be sure if an error has been reported when a stream
functions returns BSTREAM_ERROR.

...
>> +  'stream_pos' is used to describe which section/chunk of a backup
>> +  image we have read last.
>> +*/
>> +struct st_stream
>> +{
...
>> +  const char                    *stream_pos;    /* Verbal stream
> position */
> 
> It seems a bit strange to me to use a string to record position.  Why
> not an enum?

The stream position, which I store here, is not a "record position". It
is more like a "chunk" position. The value are: "prefix", "header",
"catalog", "meta data", "table data", and "summary". They reflect the
structure of a backup image.

This is used for error reporting like this:

      errm("end of stream after %s.\n", strm->stream_pos);

We could use enums, but error reporting would require translation of
enums to strings. Probably in a function like this:

char *verbal_stream_pos(enum enum_stream_pos strpos)
{
  switch (strpos) {
  case ENUM_STREAM_POS_META_DATA: return "meta data";
  ...
}

We would need to pre-define the enum with all the positions that we want
to report. Whenever we want to report about a new position, we would
need to extend the enum and the function.

Using an enum in the structure would save 4 bytes on a 64-bit system.
But since we open one stream only, it would not multiply. So it is not a
big win.

In the proposed patch, I store a string address of a string literal. No
string copy is done. The literals are required anyway. A new position
just requires to set strm->stream_pos to a new string literal. For my
taste, this is simple, easy, and efficient.

...
>> +/**
>> +  Read from the stream/image.
>> +
>> +  @param[in,out]    strm        stream handle, updating position
>> +  @param[in,out]    data        data container, updating contents and
> ptrs
> 
> I do not think it is easy to understand from this comment what is
> producer and comsumer here.  And what "updating contents and ptrs"
> implies.

Agree. I tried to keep the parameter descriptions short. This didn't
work out well.

Please note that this is also a callback for the stream library. It's
signature is specified in stream_v1_services.h:

+  /**
+    Function reading bytes from the underlying media.
+    For specification, see @c bstream_read_part() function.
+  */
+  typedef int (*as_read_m)(void *, bstream_blob*, bstream_blob);

However, bstream_read_part() is the caller of the as_read_m typed
function. So this comment basically says "Go and figure out yourself,
how it is used." That's what I had to do.

Fortunately, at least the function comment of bstream_read_part() tries
to explain, how "bstream_blob"s are used.

And, there is a reference implementation in stream.cc: stream_read().
Unfortunately, its comments doesn't add much insight.

In the end I was happy that I made it work at all. I didn't feel much of
an obligation to explain it better than what I had to work with.

But now I see, I was in error. I take the blame. How about this description:

The 'data' parameter is a reference to an st_blob structure, which
contains two pointers, 'begin' and 'end'. When the function is called,
the pointers point into a buffer to receive the data from the stream.
They are to be placed starting at 'begin' and added up to 'end'. On
return of the function, the 'end' pointer is adjusted towards 'begin',
if less than the requested amount of data was read from the stream.

> 
>> +  @param            envelope    not used
> 
> What is the intention of envelope?

From the reference implementation and the calling function, I guess it
is the buffer as it has been allocated and into which the 'data'
pointers point. It is not necessary that 'data->begin' is at
'envelope->begin', nor that 'data->end' is at 'envelope->end'. But both
'data' pointers must be between the 'envelope' pointers.

However, the parameter is never used. Not in my implementation, nor in
the reference implementation. I don't even have an idea, what it could
be useful for.

...
>> +static int
>> +str_read(struct st_stream *strm, struct st_blob *data,
>> +         struct st_blob envelope __attribute__((unused)))
...

> 
> I suggest to put the compression related code in separate functions.
> That would ease the reading of this code.
> 
>> +#ifdef HAVE_COMPRESS
>> +  if (strm->zbuf)

Ok.

...
>> +    /* Set output buffer pointer and size. */
>> +    zstream->next_out= data->begin;
> 
> Why not use BBS here?

BBS returns char*. Both 'next_out' and 'begin' are unsigned char*.

...
>> +#ifdef NOTUSED
>> +/**
>> +  Skip part of the stream/image.
...
>> +*/
>> +
>> +static int
>> +str_forward(struct st_stream *strm, size_t *len)
> 
> Is there a plan for using this function?  Or can it be removed?

It seems that it can be used if the catalog contains "extra data", which
won't be the case with today's backup, if I understand correctly, and in
cases where reading a "chunk" is abandoned before its end and the next
chunk is requested.

Both things don't happen with current code. Hence I cannot test the
function and so I defined it away. One day the situation may change.
Then we have something to start with and don't need to think about it
from the beginning.

...
>> +static int
>> +str_open_rd(struct st_stream *strm, const char *path, uint *version_p)
>> +{
>> +  MY_STAT               stat_area;
>> +  size_t                lgt;
>> +  int                   rc= BSTREAM_OK;
>> +  int                   errpos= 0;
>> +  uint16                version;
>> +  const unsigned char   backup_magic_bytes[8]=
>> +    {
>> +      0xE0, // ###.....
>> +      0xF8, // #####...
>> +      0x7F, // .#######
>> +      0x7E, // .######.
>> +      0x7E, // .######.
>> +      0x5F, // .#.#####
>> +      0x0F, // ....####
>> +      0x03  // ......##
>> +    };
> 
> I suggest that these magic bytes are defined a place where the same
> definition could be used for both server and client.

Hehe. I tried this with another define. The problem is that the headers
in sql/backup/ are heavily interwoven with server code.

The two exceptions stream_v1.h and stream_v1_services.h are not
appropriate either, because Rafal's architecture is so that the magic
bytes will never change, while a new version of the stream library may
change everything. So we would probably need a new header file. I'm
awaiting your proposal.

...
>> +  /*
>> +    Open the image file.
>> +  */
>> +  strm->fd= my_open(strm->path, O_RDONLY, MYF(0));
>> +  if (strm->fd < 0)
>> +  {
>> +    errm("cannot open backup image '%s': %s\n", strm->path,
> strerror(my_errno));
> 
> Why generate your own error here?  Could you not let my_open generate
> the error?

The only two errors that my_open(...MYF(MY_WME)) can generate are
EE_FILENOTFOUND and EE_OUT_OF_FILERESOURCES. This is too limited for my
taste.

my_errno contains errno from open(2) and thus covers a better range of
possible open errors.

...
> Here, too, I suggest to put compress code in separate method.
> 
>> +#ifdef HAVE_COMPRESS
>> +  /*
>> +    Check for compression.
>> +  */
>> +  if (!memcmp(prefix, gzip_magic_bytes, sizeof(gzip_magic_bytes)))

Ok.

...
>> +    goto err;
>> +  }
>> +  DBUG_PRINT("bupstrm", ("opened  stream: '%s'  fd: %d  rc: %d",
>> +                         strm->path, strm->fd, rc));
>> +
>> +  goto end;
> 
> Why not just return here?

I like to minimize the number of DBUG_RETURN()s, because they add extra
code. Not a big deal, but anyway.

...
>> +  case BSTREAM_IT_CHARSET:
>> +  case BSTREAM_IT_USER:
>> +  case BSTREAM_IT_TABLESPACE:
>> +  {
>> +    struct st_backup_global     *bup_global;
>> +    DYNAMIC_ARRAY               *array;
>> +
>> +    /* Allocate memory for the item. */
>> +    ERROR_INJECT("bcat_add_item_global_malloc",
> my_malloc_error_inject= 1;);
>> +    bup_global= (struct st_backup_global*)
>> +      my_malloc(sizeof(struct st_backup_global), MYF(MY_WME));
>> +    if (!bup_global)
>> +    {
>> +      /* Error message reported by mysys. */
>> +      brc= BSTREAM_ERROR;
>> +      goto end;
>> +    }
>> +    /* Copy the item struct into the allocated struct. */
>> +    bup_global->glb_item= *item;
>> +    /* Initialize other elements. */
>> +    bzero(&bup_global->glb_metadata,
> sizeof(bup_global->glb_metadata));
> 
> The above allocation is very similar for case of this switch
> statement.  I suggest to parameterize it in a helper function.

I have brooded over this a couple of times. But I always abstained from
the attempt to do it. The only things common to all cases are the malloc
and the assignment of the item. Metadata is not always present. And if
it is present, it has a different place in every object type.

When using a helper function, its return value must be casted in each
case anyway. The error check has to be done anyway. In summary, I could
save the assignment of the item. I don't think it's worth it.

> 
> ...
> 
>> +/**
>> +  Create global iterator of a given type.
>> +
>> +  Possible iterator types.
>> +
>> +  - BSTREAM_IT_CHARSET:    all charsets
>> +  - BSTREAM_IT_USER:       all users
>> +  - BSTREAM_IT_TABLESPACE: all tablespaces
>> +  - BSTREAM_IT_DB:         all databases
>> +
>> +  The following types of iterators iterate only over items for which
>> +  some meta-data should be saved in the image.
>> +
>> +  - BSTREAM_IT_GLOBAL:     all global items in create-dependency order
>> +  - BSTREAM_IT_PERDB:      all per-db items except tables which are
> enumerated
>> +                           by a table iterator (see below)
>> +  - BSTREAM_IT_PERTABLE:   all per-table items in create-dependency
> orders.
>> +
> 
> I do not quite understand this comment.  Are the three latter no
> possible iterator types since they are in another list?

I'm not sure either. I copied the comment verbatim from the stream
library interface definition in stream_v1_services.h.

I guess that "Possible iterator types" applies to all the iterator
types, but that the latter three have an additional comment.

...
>> +    /* purecov: begin inspected */
>> +  case BSTREAM_IT_CHARSET:
>> +    iter->it_array= &bup_catalog->cat_charsets;
>> +    DBUG_PRINT("bupstrm", ("charset"));
>> +    break;
>> +
>> +  case BSTREAM_IT_USER:
>> +    iter->it_array= &bup_catalog->cat_users;
>> +    DBUG_PRINT("bupstrm", ("user"));
>> +    goto err;
>> +    /* purecov: end */
> 
> Is it not possible to test the above, yet?

Right. We don't backup character sets and users yet.

> 
>> +
>> +  case BSTREAM_IT_DB:
>> +    iter->it_array= &bup_catalog->cat_databases;
>> +    DBUG_PRINT("bupstrm", ("database"));
>> +    break;
>> +
>> +    /* purecov: begin inspected */
>> +  case BSTREAM_IT_TABLESPACE:
>> +    iter->it_array= &bup_catalog->cat_tablespaces;
>> +    DBUG_PRINT("bupstrm", ("tablespace"));
>> +    goto err;
>> +
>> +  case BSTREAM_IT_GLOBAL:
>> +    errm("iterator type not yet implemented: global\n");
>> +    DBUG_PRINT("bupstrm", ("NOTYET implemented: global"));
>> +    goto err;
>> +
>> +  case BSTREAM_IT_PERDB:
>> +    errm("iterator type not yet implemented: perdb\n");
>> +    DBUG_PRINT("bupstrm", ("NOTYET implemented: perdb"));
>> +    goto err;
>> +    /* purecov: end */
> 
> Ditto.

Well, I don't know why these have not been used during all the tests I
did. There were definitely tablespaces and views (per-db items) in the
backup files.

A vague guess may be that all of these have no sub-objects. So that the
stream library does not need to access the "parent" structure ever. Only
 the databases iterator was used. The catalogs contained sub-items of
databases. When handing over such sub-type to the bcat_add_item()
function, it sets up the item structure with a reference to the database
object within the application's catalog. To get at that reference, it
needs to use an iterator to find the database by its "catalog position".

> 
> ...
> 
>> +
>> +/**
>> +  Create database object from its meta-data.
>> +
>> +  When the meta-data section of backup image is read, items can be
> created
>> +  as their meta-data is read (so that there is no need to store these
>> +  meta-data). This functions stores them in the catalog instead of
>> +  creating database objects. So the application can make different use
>> +  of the data.
> 
> I did not quite understand this comment.  Are you saying that items
> CAN be created as the meta-data is read, but that this is not the way
> it is done?

Yes. The wording is not good.

The problem is that the declaration of this function in
stream_v1_services.h has this comment: "When the meta-data section of
backup image is read, items are created as their meta-data is read (so
that there is no need to store these meta-data). Backup stream library
calls this function to create an item."

I wanted to say that this ws the initial intent of the semantics for
this function. But I changed it to *not* creating the item, but instead
store the meta-data in the catalog. I want to display them later.

...
>> +  /*
>> +    Now that we know, how many snapshots we have, initialize the
>> +    snapshot pos to table index arrays.
>> +  */
>> +  DBUG_PRINT("bupstrm", ("allocating snapshot indexes: %u",
>> +                         bup_catalog->cat_header.snap_count));
>> +  for (idx= 0; idx < bup_catalog->cat_header.snap_count; idx++)
>> +  {
>> +    struct st_backup_snapshot *bup_snapshot;
>> +
>> +    /* Allocate memory for the item. */
>> +    ERROR_INJECT("backup_image_open_snapshot_malloc",
>> +                 my_malloc_error_inject= 1;);
>> +    bup_snapshot= (struct st_backup_snapshot*)
>> +      my_malloc(sizeof(struct st_backup_snapshot), MYF(MY_WME));
>> +    if (!bup_snapshot)
>> +    {
>> +      /* Error message reported by mysys. */
>> +      rc= BSTREAM_ERROR;
>> +      goto err;
>> +    }
>> +    /* Initialize a new index array. */
>> +    rc= my_init_dynamic_array(&bup_snapshot->snap_index_pos_to_table,
>> +                              sizeof(struct st_backup_table*),
>> +                              DYN_ALLOC_INIT, DYN_ALLOC_INCR);
>> +    ERROR_INJECT("backup_image_open_index", if (!rc) {rc= TRUE;
>> +        delete_dynamic(&bup_snapshot->snap_index_pos_to_table);});
>> +    ERROR_INJECT("backup_image_open_index1", if (idx && !rc) {rc= TRUE;
>> +        delete_dynamic(&bup_snapshot->snap_index_pos_to_table);});
>> +    if (rc)
>> +    {
>> +      /* Error message reported by mysys. */
>> +      my_free(bup_snapshot, MYF(0));
>> +      rc= BSTREAM_ERROR;
>> +      goto err;
>> +    }
>> +    /* Check consistency of array position. */
>> +    DBUG_ASSERT(idx == bup_catalog->cat_snapshots.elements);
>> +    /* Insert in the catalog. */
>> +    ERROR_INJECT("backup_image_open_snapshot_insert",
>> +                 my_malloc_error_inject= 1;);
>> +    rc= insert_dynamic(&bup_catalog->cat_snapshots, (uchar*)
> &bup_snapshot);
>> +    if (rc)
>> +    {
>> +      /* Error message reported by mysys. */
>> +      delete_dynamic(&bup_snapshot->snap_index_pos_to_table);
>> +      my_free(bup_snapshot, MYF(0));
>> +      rc= BSTREAM_ERROR;
>> +      goto err;
>> +    }
> 
> The error handling is a bit inconsistent in this function.  For the
> first part you set errpos a do the cleanup at the end.  For the latter
> part you do clean-up before going to the end.  Why not move most of
> the content of the above error handling to 'err:'?  Calling my_free
> (providing proper initialization and allowing for null) and setting
> rc, could be done unconditionally.  No reason to repeat that code for
> each function call.

Hm. I just noticed that rc= BSTREAM_ERROR in the loop is not required.
rc isn't used later.

Now we argue about combining two my_free(bup_snapshot, MYF(0)) into one.
To do this, I must move bup_snapshot from the loop to the top of the
function and either initialize it to NULL, or to set another errpos
within the loop after an allocation.

Please tell me your preference. I'll change it accordingly, though I'd
still prefer to keep bup_snapshot in the loop.

...
>> +enum enum_bstream_ret_codes
>> +backup_read_catalog(void* image_handle, struct st_backup_catalog
> *bup_catalog)
>> +{
>> +  struct st_stream *strm= (struct st_stream*) image_handle;
>> +  enum enum_bstream_ret_codes brc;
>> +  DBUG_ENTER("backup_read_catalog");
>> +  DBUG_ASSERT(strm);
>> +  DBUG_ASSERT(bup_catalog);
> 
> Would it not be appropriate here and in the other similar _read_
> functions to assert that stream_pos is as expected?

- This would not work with the current way, I assign string literals to
stream_pos (unless we do string comparison). I would at least have to
use file-global string constants, if not enums as you suggested. Since I
found that I did a string compare for "meta data", I did now define
string constants like: const char *STREAM_POS_META_DATA= "meta data". So
this argument is void now.

- Assertions are used to catch flaws in the implementation. Wrong stream
pos is more likely to happen due to data corruption. Then it is better
to check with 'if'.

- I guess that this check is done in the stream library. But if you
think we should check it here anyway, then ok.

- It may take some time to figure out allowed combinations. For example,
I guess that table data do not need to be present. If the summary
follows the meta data, this should proably be accepted. Summary can be
inline in the header. Etc.

...
>> +enum enum_bstream_ret_codes
>> +backup_read_snapshot(void *image_handle,
>> +                     struct st_backup_catalog *bup_catalog
>> +                      __attribute__((unused)),
>> +                     struct st_bstream_data_chunk *snapshot)
>> +{
>> +  struct st_stream *strm= (struct st_stream*) image_handle;
>> +  enum enum_bstream_ret_codes brc;
>> +  DBUG_ENTER("backup_read_snapshot");
>> +  DBUG_ASSERT(strm);
>> +  DBUG_ASSERT(snapshot);
>> +
>> +  /*
>> +    First read data chunk requires search of next chunk start.
>> +  */
> 
> I do not quite understand this comment.  I think you mean that before
> reading the first data chunk, you must search for it while this is not
> necessary for the succeeding chunks.

Correct. The sentence is broken.

> 
> ...
> 
>> +
>> +
>> +/*
>> +  ===================
>> +  Catalog navigation.
>> +  ===================
>> +*/
>> +
>> +/*
>> +  Catalog items are located by "catalog coordinates". The format of the
>> +  catalog coordinates depends on the type of item. It is specified as
>> +  follows:
>> +
>> +  [item position (global)]= [db no.]
>> +  [item position (table)]= [ snap no. ! pos in snapshot's table list ]
>> +  [item position (other per-db item)]= [ pos in db item list ! db no. ]
>> +  [item position (per-table item)] = [ pos in table's item list !
>> +                                       db no. ! table pos ]
> 
> What does '!' mean.  If it is just a separator, please, choose
> another character so that people do not believe it has something to do
> with negation.

I copied the from stream_v1.c. I hesitate to introduce an inconsistent
syntax notation here.

...
>> +/*
>> +  =================================
>> +  Convenience functions and macros.
>> +  =================================
>> +*/
>> +
>> +/* Backup blob length. */
>> +ulong BBL(struct st_blob *blob);
>> +
>> +/* Backup blob string. */
>> +char *BBS(struct st_blob *blob);
>> +
>> +/*
>> +  Backup blob length and string.
>> +  This is for use with printf format '%.*s'.
>> +  This must be a macro because it produces two comma separated values!
>> +*/
>> +#define BBLS(_b_)       ((uint) BBL(_b_)), BBS(_b_)
>> +
>> +/* The above shall also be usable by C++. */
>> +C_MODE_END
> 
> I am not sure I like this mix of macros and functions spread across
> several files.  Would it not be better to define everything in the
> same place?

I think this is the usual way to do it. Header files contain
declarations of global functions and macros that are used in multiple
files. Function definitions go into C files.

>   More self-explanatory names would also help.

Wait until you see the usage of these functions/macro in mysqlbackup.cc.
I guess, you wouldn't like self-explanatory names in the argument lists
of printf()...

> 
> That's how far I have got for now.  More to come ...

Thank you very much!

Regards
Ingo
-- 
Ingo Strüwing, Database Group
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schröder, Wolfgang Engels, Dr. Roland Bömer
Vorsitzender des Aufsichtsrates: Martin Häring   HRB München 161028

Thread
bzr commit into mysql-6.0-backup branch (ingo.struewing:2745) WL#4630Ingo Struewing25 Nov
  • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Øystein Grøvlen11 Dec
    • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing12 Dec
      • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Rafal Somla19 Dec
        • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing10 Feb 2009
          • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Rafal Somla10 Feb 2009
  • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Rafal Somla22 Dec
    • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing12 Feb 2009
      • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Rafal Somla17 Feb 2009
        • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing17 Feb 2009
  • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Øystein Grøvlen17 Feb 2009
    • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing18 Feb 2009
      • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Øystein Grøvlen19 Feb 2009
        • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing21 Feb 2009