| List: | Commits | « Previous MessageNext Message » | |
| From: | Ingo Strüwing | Date: | December 12 2008 7:44pm |
| Subject: | Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745) WL#4630 | ||
| View as plain text | |||
Hi Øystein, thank you very much for the first part of a very thorough review. Please find answers below. Øystein Grøvlen, 11.12.2008 14:24: > Ingo, > > Thanks, for this thorough and well tested patch. I have not been > through all of it, yet, but here are the comments and questions I have > so far: > > * The name: With a client that is called mysqlbackup, I would expect > to be able to perform backup. Has another name been considered? No. At 1. This is similar to other programs like mysqlbinlog, which does not perform binlogging either. At 2. Milestone 3 is defined as: Perform a backup or restore and display the results. > > * Some questions from playing a bit with the client: > > - What does "extra data length" mean? And why is it always 0? It is the length of the "extra data" that could be present in the backup image for an object. Currently, we do not write extra data to the backup image. Hence it is always 0. > - Finish time is always 0. Why? It is set correctly in the > backup_history table for the same backup. I do not know. It is the value that the stream library provides. I guess it is not written to the backup file, or the stream library does not fill this element of the "header" when reading the image. > - If I use data-chunks option, there are several entries for each > chunk: > Data chunks: > Chunk 1 has 29 bytes for table 'test'.'t' > Chunk 1 has 1025 bytes for table 'test'.'t' > This seems a bit strange. Is it correct? Good catch. This is a bug. I should have noticed it when accepting the result file. The chunks should be numbered 1, 2, 3, ... > - What is --exact supposed to do? mysqlbackup --help ... --exact Print exact number of bytes instead of human readable form. ... mysqlbackup mysql-test/mysqlbackup-test.bak ... Image size: 8458 KB ... mysqlbackup --exact mysql-test/mysqlbackup-test.bak ... Image size: 8661804 bytes ... > > * All this error injection seems a bit of an overkill for a client > since in the event of failure all you need is to be able to stop > and report the error. For a server that need to keep on going, I > see that it is necessary to test that errors does not take it down. I wanted to be sure that the error is correctly reported and that we don't get valgrind reports should someone add a new test. That is, resources should be freed. > > * I am a bit puzzled as to why we need interface code on both side > the fence here. stream_backup.c contains several functions that in > effect does nothing, but call functions in stream_v1.c. (E.g., > backup_read_summary is one single method call plus 20 lines of > initialization, debug code, and error testing.) Why could not > mysqlbackup.cc use stream_v1.c directly? Then error injection and > checking could be put there so that also the server could benefit > from it. (This is maybe not my business since I am not the > architectural reviewer.) I started with using stream_v1.c functions directly from mysqlbackup.cc. See my first prototype: http://lists.mysql.com/commits/54743. But it became ugly for my taste. There are a lot of constraints, which function to call when, and several exceptions, I had to learn about. My concept is now to use mysqlbackup.cc for data visualization and backup_stream.c for data collection. This kind of a modularization. To have it somewhat "clean", I decided to wrap all stream library functions, even the few which could be used directly with little hassle. But then the wrappers do also error reporting to free the main program from it. This was already included in my second prototype: http://lists.mysql.com/commits/56417 These two concepts had been present for discussion for some days. The descision, to go with the second concept, has been made during the Backup Meeting by phone conference on October 21: https://intranet.mysql.com/secure/mailarchive/mail.php?folder=230&mail=2056 > > * Took me some time to understand the role of the bcat_ functions. I > think this should be explained a bit better. Do the implementation > have to be this separate for the client and the server, or would it > be possible to share some more code? It took me some time too. To say the least. These functions are defined by the stream library interface. I had no influence on their architecture. I must admit that I didn't spend much time in analyzing how to share code with the server. Some small experiments required me to include most of the server code, which didn't look right. I would also need to learn, how the catalog is constructed in the server. A quick look shows that a couple of classes are involved. I usually have a hard time to follow all the classes used in the backup code. Since not even Rafal suggested to do it, I didn't even try. > > * Is it safe to extend the structs like st_bstream_image_header? Are > you guaranteed that the stream library will never make any > assumptions about the size of the struct? I don't know, what you mean. Can you please give an example of an "extension"? > > * I would prefer if the addition of error injection to mysys routines > was done as a separate patch. This is a general mechanism and I > feel that is not related to backup. I also think other people than > members of the backup team should review this. I can't imagine, how this could be done. Such a task wouldn't get sufficient priority to be looked at by anyone. But I could remove it from the backup client patch. I have tested, what I wanted to test with it. > > * The code contains a lot of hard-coded English messages. Will there > not be a requirement to internationlize this? Possible. But I can't see that any of our command line clients attempts to do this. > > > Some more detailed questions/comments inline: > > ... > >> === modified file '.bzrignore' >> --- a/.bzrignore 2008-10-29 14:20:12 +0000 >> +++ b/.bzrignore 2008-11-25 19:33:34 +0000 >> @@ -314,6 +314,10 @@ client/mysql_upgrade >> client/mysqladmin >> client/mysqladmin.c >> client/mysqladmin.cpp >> +client/mysqlbackup >> +client/stream_v1.h >> +client/stream_v1_services.h >> +mysql-test/mysqlbackup-test.bak > > Why are the three last entries added? I want bzr to ignore these files when running gcommit. client/stream_*.h are symbolic links to files in sql/backup. They are required for #include directives. The Makefile for the client directory does not use an -I../sql option, so I didn't add an -I../sql/backup option either. mysql-test/mysqlbackup-test.bak is created during backup_client.test. I do not want to delete it at the end of the test case so that I have something to feed mysqlbackup with in manual tests. > > ... > >> +/* >> + Error injection. Built on DBUG. >> + This is similar to error injection in the server. >> + In case that one becomes globally available, we undefine it first. >> +*/ >> +#ifdef ERROR_INJECT >> +#undef ERROR_INJECT >> +#endif >> +#define ERROR_INJECT(_keyword_, _action_) \ >> + DBUG_EXECUTE_IF((_keyword_), errm("ERROR_INJECT(\"%s\")\n", > (_keyword_)); \ >> + DBUG_PRINT("bupstrm", ("ERROR_INJECT(\"%s\")\n", > \ >> + (_keyword_))); _action_) > > Could not this be a general macro and not particular to this file? Yes, but then I think this must be accepted outside the team. In mysql_priv.h we have an ERROR_INJECT macro already, though conditionally compiled. This means, it is not generally usable, but could clash, when enabled. Mikael Ronström made this one. Even though it is not used anywhere, we would need his approval for a change. Note that his error injection feature works slightly different than mine. So this should be carefully considered. We might however find a place which is common to backup code. > >> + >> +/* For easier error tracking we do not pre-allocate in DBUG mode. */ >> +#if !defined(DBUG_OFF) >> +#define DYN_ALLOC_INIT 0 >> +#define DYN_ALLOC_INCR 1 >> +#else >> +#define DYN_ALLOC_INIT 1000 >> +#define DYN_ALLOC_INCR 1000 >> +#endif > > Why is this done? For efficiency. In a debug server, every insertion into a dynamic array causes a call to my_malloc(). This allows for better error tracking. For example we would notice faster if we store pointers to array elements. Every reallocation can invalidate such address. In a production server, we preallocate 1000 array elements and need to reallocate only for every 1000th insert. > >> + >> +/* >> + ==================================================== >> + Convenience functions for access to st_blob objects. >> + ==================================================== >> +*/ >> + >> +/** >> + Backup blob length. >> + >> + This is a function, not a macro, for better type checking. >> + >> + @param[in] blob blob >> + >> + @return length >> +*/ >> + >> +ulong BBL(struct st_blob *blob) > > Why name it like a macro, when it is a function? I have BBL(), BBS(), and BBLS(). The latter must be a macro, the first two could be a macro. IMHO it would be more complicated to understand, why they follow different naming schemes. Having those, that can be functions, be functions, helped me in finding problems. While I write this, I notice a potential optimization, which would also please you preference: #if !defined(DBUG_OFF) /* Backup blob length. */ ulong backup_blob_length(struct st_blob *blob); #define BBL(_b_) backup_blob_length(_b_) /* Backup blob string. */ char *backup_blob_string(struct st_blob *blob); #define BBS(_b_) backup_blob_string(_b_) #else #define BBL(_b_) ((ulong) (blob->end - blob->begin)) #define BBS(_b_) ((char*) blob->begin) #endif /* Backup blob length and string. This is for use with printf format '%.*s'. This must be a macro because it produces two comma separated values! */ #define BBLS(_b_) ((uint) BBL(_b_)), BBS(_b_) This would even save some function calls on a production system. However, problems with argument types might lead to error messages about backup_blob_*() in lines, which have BBL/BBS/BBLS only. ... >> +/* >> + ==================================================== >> + Memory allocation. >> + These are callback functions for the stream library. >> + ==================================================== >> +*/ >> + >> +/** >> + Allocate given amount of memory and return pointer to it. >> + >> + @param[in] size amount of memory to allocate >> + >> + @return pointer to allocated memory >> +*/ >> + >> +bstream_byte* bstream_alloc(unsigned long int size) >> +{ >> + return my_malloc(size, MYF(MY_WME)); >> +} >> + >> + >> +/** >> + Free previously allocated memory. >> + >> + @param[in] ptr pointer to allocated memory >> +*/ >> + >> +void bstream_free(bstream_byte *ptr) >> +{ >> + my_free(ptr, MYF(MY_ALLOW_ZERO_PTR)); >> +} >> + > > It seems a bit strange that while I can't find that bstream_alloc is > used anywhere, bstream_free is used in several places. Should not > these come in pairs? There is little backup specific about the these > functions. Are they actually needed, and if yes, why not make the > functionality generally available? These are callbacks for the stream library. The stream library allocates the data that it hands over to the bcat_* callback functions. It is the duty of the calling application to free it after use. One can go without defining these functions in the application by defining BSTREAM_USE_MALLOC before including stream_v1_services.h. but then they use malloc() and free() directly, instead of my_malloc() and my_free(). The main difference is that the latter report errors. So I could never be sure if an error has been reported when a stream functions returns BSTREAM_ERROR. ... >> + 'stream_pos' is used to describe which section/chunk of a backup >> + image we have read last. >> +*/ >> +struct st_stream >> +{ ... >> + const char *stream_pos; /* Verbal stream > position */ > > It seems a bit strange to me to use a string to record position. Why > not an enum? The stream position, which I store here, is not a "record position". It is more like a "chunk" position. The value are: "prefix", "header", "catalog", "meta data", "table data", and "summary". They reflect the structure of a backup image. This is used for error reporting like this: errm("end of stream after %s.\n", strm->stream_pos); We could use enums, but error reporting would require translation of enums to strings. Probably in a function like this: char *verbal_stream_pos(enum enum_stream_pos strpos) { switch (strpos) { case ENUM_STREAM_POS_META_DATA: return "meta data"; ... } We would need to pre-define the enum with all the positions that we want to report. Whenever we want to report about a new position, we would need to extend the enum and the function. Using an enum in the structure would save 4 bytes on a 64-bit system. But since we open one stream only, it would not multiply. So it is not a big win. In the proposed patch, I store a string address of a string literal. No string copy is done. The literals are required anyway. A new position just requires to set strm->stream_pos to a new string literal. For my taste, this is simple, easy, and efficient. ... >> +/** >> + Read from the stream/image. >> + >> + @param[in,out] strm stream handle, updating position >> + @param[in,out] data data container, updating contents and > ptrs > > I do not think it is easy to understand from this comment what is > producer and comsumer here. And what "updating contents and ptrs" > implies. Agree. I tried to keep the parameter descriptions short. This didn't work out well. Please note that this is also a callback for the stream library. It's signature is specified in stream_v1_services.h: + /** + Function reading bytes from the underlying media. + For specification, see @c bstream_read_part() function. + */ + typedef int (*as_read_m)(void *, bstream_blob*, bstream_blob); However, bstream_read_part() is the caller of the as_read_m typed function. So this comment basically says "Go and figure out yourself, how it is used." That's what I had to do. Fortunately, at least the function comment of bstream_read_part() tries to explain, how "bstream_blob"s are used. And, there is a reference implementation in stream.cc: stream_read(). Unfortunately, its comments doesn't add much insight. In the end I was happy that I made it work at all. I didn't feel much of an obligation to explain it better than what I had to work with. But now I see, I was in error. I take the blame. How about this description: The 'data' parameter is a reference to an st_blob structure, which contains two pointers, 'begin' and 'end'. When the function is called, the pointers point into a buffer to receive the data from the stream. They are to be placed starting at 'begin' and added up to 'end'. On return of the function, the 'end' pointer is adjusted towards 'begin', if less than the requested amount of data was read from the stream. > >> + @param envelope not used > > What is the intention of envelope? From the reference implementation and the calling function, I guess it is the buffer as it has been allocated and into which the 'data' pointers point. It is not necessary that 'data->begin' is at 'envelope->begin', nor that 'data->end' is at 'envelope->end'. But both 'data' pointers must be between the 'envelope' pointers. However, the parameter is never used. Not in my implementation, nor in the reference implementation. I don't even have an idea, what it could be useful for. ... >> +static int >> +str_read(struct st_stream *strm, struct st_blob *data, >> + struct st_blob envelope __attribute__((unused))) ... > > I suggest to put the compression related code in separate functions. > That would ease the reading of this code. > >> +#ifdef HAVE_COMPRESS >> + if (strm->zbuf) Ok. ... >> + /* Set output buffer pointer and size. */ >> + zstream->next_out= data->begin; > > Why not use BBS here? BBS returns char*. Both 'next_out' and 'begin' are unsigned char*. ... >> +#ifdef NOTUSED >> +/** >> + Skip part of the stream/image. ... >> +*/ >> + >> +static int >> +str_forward(struct st_stream *strm, size_t *len) > > Is there a plan for using this function? Or can it be removed? It seems that it can be used if the catalog contains "extra data", which won't be the case with today's backup, if I understand correctly, and in cases where reading a "chunk" is abandoned before its end and the next chunk is requested. Both things don't happen with current code. Hence I cannot test the function and so I defined it away. One day the situation may change. Then we have something to start with and don't need to think about it from the beginning. ... >> +static int >> +str_open_rd(struct st_stream *strm, const char *path, uint *version_p) >> +{ >> + MY_STAT stat_area; >> + size_t lgt; >> + int rc= BSTREAM_OK; >> + int errpos= 0; >> + uint16 version; >> + const unsigned char backup_magic_bytes[8]= >> + { >> + 0xE0, // ###..... >> + 0xF8, // #####... >> + 0x7F, // .####### >> + 0x7E, // .######. >> + 0x7E, // .######. >> + 0x5F, // .#.##### >> + 0x0F, // ....#### >> + 0x03 // ......## >> + }; > > I suggest that these magic bytes are defined a place where the same > definition could be used for both server and client. Hehe. I tried this with another define. The problem is that the headers in sql/backup/ are heavily interwoven with server code. The two exceptions stream_v1.h and stream_v1_services.h are not appropriate either, because Rafal's architecture is so that the magic bytes will never change, while a new version of the stream library may change everything. So we would probably need a new header file. I'm awaiting your proposal. ... >> + /* >> + Open the image file. >> + */ >> + strm->fd= my_open(strm->path, O_RDONLY, MYF(0)); >> + if (strm->fd < 0) >> + { >> + errm("cannot open backup image '%s': %s\n", strm->path, > strerror(my_errno)); > > Why generate your own error here? Could you not let my_open generate > the error? The only two errors that my_open(...MYF(MY_WME)) can generate are EE_FILENOTFOUND and EE_OUT_OF_FILERESOURCES. This is too limited for my taste. my_errno contains errno from open(2) and thus covers a better range of possible open errors. ... > Here, too, I suggest to put compress code in separate method. > >> +#ifdef HAVE_COMPRESS >> + /* >> + Check for compression. >> + */ >> + if (!memcmp(prefix, gzip_magic_bytes, sizeof(gzip_magic_bytes))) Ok. ... >> + goto err; >> + } >> + DBUG_PRINT("bupstrm", ("opened stream: '%s' fd: %d rc: %d", >> + strm->path, strm->fd, rc)); >> + >> + goto end; > > Why not just return here? I like to minimize the number of DBUG_RETURN()s, because they add extra code. Not a big deal, but anyway. ... >> + case BSTREAM_IT_CHARSET: >> + case BSTREAM_IT_USER: >> + case BSTREAM_IT_TABLESPACE: >> + { >> + struct st_backup_global *bup_global; >> + DYNAMIC_ARRAY *array; >> + >> + /* Allocate memory for the item. */ >> + ERROR_INJECT("bcat_add_item_global_malloc", > my_malloc_error_inject= 1;); >> + bup_global= (struct st_backup_global*) >> + my_malloc(sizeof(struct st_backup_global), MYF(MY_WME)); >> + if (!bup_global) >> + { >> + /* Error message reported by mysys. */ >> + brc= BSTREAM_ERROR; >> + goto end; >> + } >> + /* Copy the item struct into the allocated struct. */ >> + bup_global->glb_item= *item; >> + /* Initialize other elements. */ >> + bzero(&bup_global->glb_metadata, > sizeof(bup_global->glb_metadata)); > > The above allocation is very similar for case of this switch > statement. I suggest to parameterize it in a helper function. I have brooded over this a couple of times. But I always abstained from the attempt to do it. The only things common to all cases are the malloc and the assignment of the item. Metadata is not always present. And if it is present, it has a different place in every object type. When using a helper function, its return value must be casted in each case anyway. The error check has to be done anyway. In summary, I could save the assignment of the item. I don't think it's worth it. > > ... > >> +/** >> + Create global iterator of a given type. >> + >> + Possible iterator types. >> + >> + - BSTREAM_IT_CHARSET: all charsets >> + - BSTREAM_IT_USER: all users >> + - BSTREAM_IT_TABLESPACE: all tablespaces >> + - BSTREAM_IT_DB: all databases >> + >> + The following types of iterators iterate only over items for which >> + some meta-data should be saved in the image. >> + >> + - BSTREAM_IT_GLOBAL: all global items in create-dependency order >> + - BSTREAM_IT_PERDB: all per-db items except tables which are > enumerated >> + by a table iterator (see below) >> + - BSTREAM_IT_PERTABLE: all per-table items in create-dependency > orders. >> + > > I do not quite understand this comment. Are the three latter no > possible iterator types since they are in another list? I'm not sure either. I copied the comment verbatim from the stream library interface definition in stream_v1_services.h. I guess that "Possible iterator types" applies to all the iterator types, but that the latter three have an additional comment. ... >> + /* purecov: begin inspected */ >> + case BSTREAM_IT_CHARSET: >> + iter->it_array= &bup_catalog->cat_charsets; >> + DBUG_PRINT("bupstrm", ("charset")); >> + break; >> + >> + case BSTREAM_IT_USER: >> + iter->it_array= &bup_catalog->cat_users; >> + DBUG_PRINT("bupstrm", ("user")); >> + goto err; >> + /* purecov: end */ > > Is it not possible to test the above, yet? Right. We don't backup character sets and users yet. > >> + >> + case BSTREAM_IT_DB: >> + iter->it_array= &bup_catalog->cat_databases; >> + DBUG_PRINT("bupstrm", ("database")); >> + break; >> + >> + /* purecov: begin inspected */ >> + case BSTREAM_IT_TABLESPACE: >> + iter->it_array= &bup_catalog->cat_tablespaces; >> + DBUG_PRINT("bupstrm", ("tablespace")); >> + goto err; >> + >> + case BSTREAM_IT_GLOBAL: >> + errm("iterator type not yet implemented: global\n"); >> + DBUG_PRINT("bupstrm", ("NOTYET implemented: global")); >> + goto err; >> + >> + case BSTREAM_IT_PERDB: >> + errm("iterator type not yet implemented: perdb\n"); >> + DBUG_PRINT("bupstrm", ("NOTYET implemented: perdb")); >> + goto err; >> + /* purecov: end */ > > Ditto. Well, I don't know why these have not been used during all the tests I did. There were definitely tablespaces and views (per-db items) in the backup files. A vague guess may be that all of these have no sub-objects. So that the stream library does not need to access the "parent" structure ever. Only the databases iterator was used. The catalogs contained sub-items of databases. When handing over such sub-type to the bcat_add_item() function, it sets up the item structure with a reference to the database object within the application's catalog. To get at that reference, it needs to use an iterator to find the database by its "catalog position". > > ... > >> + >> +/** >> + Create database object from its meta-data. >> + >> + When the meta-data section of backup image is read, items can be > created >> + as their meta-data is read (so that there is no need to store these >> + meta-data). This functions stores them in the catalog instead of >> + creating database objects. So the application can make different use >> + of the data. > > I did not quite understand this comment. Are you saying that items > CAN be created as the meta-data is read, but that this is not the way > it is done? Yes. The wording is not good. The problem is that the declaration of this function in stream_v1_services.h has this comment: "When the meta-data section of backup image is read, items are created as their meta-data is read (so that there is no need to store these meta-data). Backup stream library calls this function to create an item." I wanted to say that this ws the initial intent of the semantics for this function. But I changed it to *not* creating the item, but instead store the meta-data in the catalog. I want to display them later. ... >> + /* >> + Now that we know, how many snapshots we have, initialize the >> + snapshot pos to table index arrays. >> + */ >> + DBUG_PRINT("bupstrm", ("allocating snapshot indexes: %u", >> + bup_catalog->cat_header.snap_count)); >> + for (idx= 0; idx < bup_catalog->cat_header.snap_count; idx++) >> + { >> + struct st_backup_snapshot *bup_snapshot; >> + >> + /* Allocate memory for the item. */ >> + ERROR_INJECT("backup_image_open_snapshot_malloc", >> + my_malloc_error_inject= 1;); >> + bup_snapshot= (struct st_backup_snapshot*) >> + my_malloc(sizeof(struct st_backup_snapshot), MYF(MY_WME)); >> + if (!bup_snapshot) >> + { >> + /* Error message reported by mysys. */ >> + rc= BSTREAM_ERROR; >> + goto err; >> + } >> + /* Initialize a new index array. */ >> + rc= my_init_dynamic_array(&bup_snapshot->snap_index_pos_to_table, >> + sizeof(struct st_backup_table*), >> + DYN_ALLOC_INIT, DYN_ALLOC_INCR); >> + ERROR_INJECT("backup_image_open_index", if (!rc) {rc= TRUE; >> + delete_dynamic(&bup_snapshot->snap_index_pos_to_table);}); >> + ERROR_INJECT("backup_image_open_index1", if (idx && !rc) {rc= TRUE; >> + delete_dynamic(&bup_snapshot->snap_index_pos_to_table);}); >> + if (rc) >> + { >> + /* Error message reported by mysys. */ >> + my_free(bup_snapshot, MYF(0)); >> + rc= BSTREAM_ERROR; >> + goto err; >> + } >> + /* Check consistency of array position. */ >> + DBUG_ASSERT(idx == bup_catalog->cat_snapshots.elements); >> + /* Insert in the catalog. */ >> + ERROR_INJECT("backup_image_open_snapshot_insert", >> + my_malloc_error_inject= 1;); >> + rc= insert_dynamic(&bup_catalog->cat_snapshots, (uchar*) > &bup_snapshot); >> + if (rc) >> + { >> + /* Error message reported by mysys. */ >> + delete_dynamic(&bup_snapshot->snap_index_pos_to_table); >> + my_free(bup_snapshot, MYF(0)); >> + rc= BSTREAM_ERROR; >> + goto err; >> + } > > The error handling is a bit inconsistent in this function. For the > first part you set errpos a do the cleanup at the end. For the latter > part you do clean-up before going to the end. Why not move most of > the content of the above error handling to 'err:'? Calling my_free > (providing proper initialization and allowing for null) and setting > rc, could be done unconditionally. No reason to repeat that code for > each function call. Hm. I just noticed that rc= BSTREAM_ERROR in the loop is not required. rc isn't used later. Now we argue about combining two my_free(bup_snapshot, MYF(0)) into one. To do this, I must move bup_snapshot from the loop to the top of the function and either initialize it to NULL, or to set another errpos within the loop after an allocation. Please tell me your preference. I'll change it accordingly, though I'd still prefer to keep bup_snapshot in the loop. ... >> +enum enum_bstream_ret_codes >> +backup_read_catalog(void* image_handle, struct st_backup_catalog > *bup_catalog) >> +{ >> + struct st_stream *strm= (struct st_stream*) image_handle; >> + enum enum_bstream_ret_codes brc; >> + DBUG_ENTER("backup_read_catalog"); >> + DBUG_ASSERT(strm); >> + DBUG_ASSERT(bup_catalog); > > Would it not be appropriate here and in the other similar _read_ > functions to assert that stream_pos is as expected? - This would not work with the current way, I assign string literals to stream_pos (unless we do string comparison). I would at least have to use file-global string constants, if not enums as you suggested. Since I found that I did a string compare for "meta data", I did now define string constants like: const char *STREAM_POS_META_DATA= "meta data". So this argument is void now. - Assertions are used to catch flaws in the implementation. Wrong stream pos is more likely to happen due to data corruption. Then it is better to check with 'if'. - I guess that this check is done in the stream library. But if you think we should check it here anyway, then ok. - It may take some time to figure out allowed combinations. For example, I guess that table data do not need to be present. If the summary follows the meta data, this should proably be accepted. Summary can be inline in the header. Etc. ... >> +enum enum_bstream_ret_codes >> +backup_read_snapshot(void *image_handle, >> + struct st_backup_catalog *bup_catalog >> + __attribute__((unused)), >> + struct st_bstream_data_chunk *snapshot) >> +{ >> + struct st_stream *strm= (struct st_stream*) image_handle; >> + enum enum_bstream_ret_codes brc; >> + DBUG_ENTER("backup_read_snapshot"); >> + DBUG_ASSERT(strm); >> + DBUG_ASSERT(snapshot); >> + >> + /* >> + First read data chunk requires search of next chunk start. >> + */ > > I do not quite understand this comment. I think you mean that before > reading the first data chunk, you must search for it while this is not > necessary for the succeeding chunks. Correct. The sentence is broken. > > ... > >> + >> + >> +/* >> + =================== >> + Catalog navigation. >> + =================== >> +*/ >> + >> +/* >> + Catalog items are located by "catalog coordinates". The format of the >> + catalog coordinates depends on the type of item. It is specified as >> + follows: >> + >> + [item position (global)]= [db no.] >> + [item position (table)]= [ snap no. ! pos in snapshot's table list ] >> + [item position (other per-db item)]= [ pos in db item list ! db no. ] >> + [item position (per-table item)] = [ pos in table's item list ! >> + db no. ! table pos ] > > What does '!' mean. If it is just a separator, please, choose > another character so that people do not believe it has something to do > with negation. I copied the from stream_v1.c. I hesitate to introduce an inconsistent syntax notation here. ... >> +/* >> + ================================= >> + Convenience functions and macros. >> + ================================= >> +*/ >> + >> +/* Backup blob length. */ >> +ulong BBL(struct st_blob *blob); >> + >> +/* Backup blob string. */ >> +char *BBS(struct st_blob *blob); >> + >> +/* >> + Backup blob length and string. >> + This is for use with printf format '%.*s'. >> + This must be a macro because it produces two comma separated values! >> +*/ >> +#define BBLS(_b_) ((uint) BBL(_b_)), BBS(_b_) >> + >> +/* The above shall also be usable by C++. */ >> +C_MODE_END > > I am not sure I like this mix of macros and functions spread across > several files. Would it not be better to define everything in the > same place? I think this is the usual way to do it. Header files contain declarations of global functions and macros that are used in multiple files. Function definitions go into C files. > More self-explanatory names would also help. Wait until you see the usage of these functions/macro in mysqlbackup.cc. I guess, you wouldn't like self-explanatory names in the argument lists of printf()... > > That's how far I have got for now. More to come ... Thank you very much! Regards Ingo -- Ingo Strüwing, Database Group Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten Geschäftsführer: Thomas Schröder, Wolfgang Engels, Dr. Roland Bömer Vorsitzender des Aufsichtsrates: Martin Häring HRB München 161028
