| List: | Commits | « Previous MessageNext Message » | |
| From: | Ingo Strüwing | Date: | February 12 2009 9:39pm |
| Subject: | Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745) WL#4630 | ||
| View as plain text | |||
Hi Rafal, Rafal Somla, 22.12.2008 13:14: ... >> 2745 Ingo Struewing 2008-11-25 >> WL#4630 - ST: MySQL Backup Client Program - Milestone 1 >> The MySQL Backup Client Program is a command line client >> program, that examines and displays the contents of a >> backup image. >> By command line options you can select different types >> of information to view. >> It does also have a search capability for finding >> specific database objects in the backup image. >> For a detailed explanation of the options, run >> mysqlbackup --help ... > [1] Suggestion: Use a macro here (perhaps only in production build). Done. > > ... >> + >> + >> +#ifdef NOTUSED > > [2] Hmm, should it be commited in? I tend to leave it in. I spent some time to make it working, until gcov showed it was not used. If one day we need it back, we could safe a little effort. > >> +/** >> + Skip part of the stream/image. >> + >> + @param[in,out] strm stream handle, updating position >> + @param[in,out] len number of bytes to skip, skipped >> + >> + @return status >> + @retval BSTREAM_OK ok >> + @retval otherwise error >> + >> + @note The return value is specified as 'int' in stream_v1.h >> + though only values from enum_bstream_ret_codes are expected. >> +*/ >> + >> +static int >> +str_forward(struct st_stream *strm, size_t *len) >> +{ > ... >> + >> + >> +/** >> + Open the stream/image for reading. >> + >> + A backup stream has a prefix, consisting of 8 bytes magic number >> + and 2 bytes version number. >> + >> + TODO A compressed image has a gzip magic number. > > [3] This is done already, or? Yes. "TODO" flag removed. ... >> + const unsigned char backup_magic_bytes[8]= >> + { >> + 0xE0, // ###..... >> + 0xF8, // #####... >> + 0x7F, // .####### >> + 0x7E, // .######. >> + 0x7E, // .######. >> + 0x5F, // .#.##### >> + 0x0F, // ....#### >> + 0x03 // ......## >> + }; >> +#ifdef HAVE_COMPRESS >> + const unsigned char gzip_magic_bytes[3]= >> + { >> + 0x1f, >> + 0x8b, >> + 0x08 >> + }; >> +#endif >> + uchar prefix[sizeof(backup_magic_bytes) + sizeof(version)]; > > [4] I think it is better not to use sizeof() here. The prefix must be 10 > bytes long - this is specified by backup stream format. Why to depend on > compiler having expected basic type sizes if we know the exact number. > However, it is ok to leave the code as is. If the sizeof()s are not 8 and 2, then there is no guarantee that the wanted values are at the expected bytes. Compromise: I'll add an assert. .... >> + bup_global= (struct st_backup_global*) >> + my_malloc(sizeof(struct st_backup_global), MYF(MY_WME)); >> + if (!bup_global) >> + { >> + /* Error message reported by mysys. */ >> + brc= BSTREAM_ERROR; >> + goto end; >> + } >> + /* Copy the item struct into the allocated struct. */ >> + bup_global->glb_item= *item; >> + /* Initialize other elements. */ >> + bzero(&bup_global->glb_metadata, > sizeof(bup_global->glb_metadata)); > > [5] Perhaps: > > bzero(bup_global, sizeof(*bup_global)); > bup->global->glb_item= *item; > > so that when structure layout changes it will still be properly > initialized. In MySQL we usually try to avoid unnecessary initializations. Especially when major parts of the initialized stuff are overwritten immediately afterwards. When the structure layout changes, caution is required anyway. Finally, if doing, what you suggest, then the right way would be to use the MY_ZEROFILL option to my_malloc(). > > ... >> +int >> +bcat_create_item(struct st_bstream_image_header *hdr, >> + struct st_bstream_item_info *item, >> + struct st_blob query, >> + struct st_blob data) >> +{ >> + struct st_backup_catalog *bup_catalog= (struct >> st_backup_catalog*) hdr; >> + struct st_backup_metadata mdata; >> + enum enum_bstream_ret_codes brc= BSTREAM_OK; > > [6] Suggestion: change initial value of brc to BSTREAM_ERROR and set it > to BSTREAM_OK when you know that all is ok. This can save few lines of > code and make it safer to change the code in the future (correct > de-allocation at end: label). Done. > > ... > > [7] Perhaps add a comment explaining that the code below frees memory > allocated by backup stream library. Done. When writing this, you didn't get aware that this should be part of the backup stream library interface documentation? .... >> +/** >> + Open a backup image. > > [8] For reading, right? Yes. Added. ... >> + strm= my_malloc(sizeof(struct st_stream), MYF(MY_WME)); >> + if (!strm) >> + { >> + /* Error message reported by mysys. */ >> + goto err; > > [9] Here you jump to err when errpos is 0, but the swich doesn't cover > this case > (probably adding a default case is what is needed). I changed to: switch (errpos) { case ... case ... default: ; /* No (further) de-initializations required. */ } Ugly, but ok. ... >> + /* Add image version to header. */ >> + bup_catalog->cat_header.version= version; >> + DBUG_PRINT("bupstrm", ("set header version: %u", version)); > > [10] Suggestion: I'd move the above assignment before the call to > bastream_rd_header(), just after the version has been detected in > str_open_rd(). No. The stream library interface does not promise to keep pre-initialized header members intact. It does not even specify if the header needs to be pre-initialized, and which members exactly it fills in. ... >> +enum enum_bstream_ret_codes >> +backup_read_catalog(void* image_handle, struct st_backup_catalog >> *bup_catalog) > > [11] Suggestion: declare image_handle as st_stream* and then cast it to > void* if > needed (but should not be needed I think). Done. But please explain why it supports your beloved modularization that mysqlbackup.cc does now need information about the internal implementation of backup_stream.c. And it requires to include compression definitions into the display module mysqlbackup.cc. :( ... >> + /* >> + First read data chunk requires search of next chunk start. >> + */ > > [12] Suggestion: move this to the end of backup_read_metadata(). So that > this > function can be written under pre-condition that the stream is already > positioned at a start of next chunk. Done. I did also move the other next_chunk() calls to the end of the predecessor functions. That way it looks a little more like symmetry. However, now these functions include the search for the next chunk, which I preferred to do in the function, where I handle the next object. For example, the catalog reading function does now search for the meta data chunk and complains if it is missing. Another downside is that the error from the missing chunk ends mysqlbackup and the already read information is not printed any more. Also it costed some time to get the coverage test right as some sections changed place. Now that I made this change to please you, can you please explain, what the win is? One less string comparison per table data chunk? Or...? BTW, triggered by Øystein's review I had changed this "if" to: if (strm->stream_pos == STREAM_POS_META_DATA) and had this at file begin: ... const char *STREAM_POS_CATALOG= "catalog"; const char *STREAM_POS_META_DATA= "meta data"; const char *STREAM_POS_TABLE_DATA= "table data"; ... So the whole change was just a pointer comparison I got rid of. ... >> + /* >> + Read data chunk. >> + */ > > [13] Perhaps add a note explaining that memory for storing the data is > allocated and freed by backup stream library so that future readers of > this code will not ask the question which I asked myself: "where the > memory for storing the data comes from and what happens to it when it is > not needed any more?". Done, but are you serious about the application to document, what data the backup stream library allocates? ... >> + [item position (table)]= [ snap no. ! pos in snapshot's table list ] > > [14] I think it is the other way around - first position of the table in > the > snapshot, then the snapshot number. No. Copied verbatim from stream_v1.c. ... >> + /* >> + Get a reference to the global object. >> + It does not fail because we checked that pos is in range. >> + Note that the array contains pointers only. >> + */ >> + bup_global= *((struct st_backup_global**) dynamic_array_ptr(array, >> pos)); > > [15] Suggestion: Here and in other places, check your assumption with > assert: > > DBUG_ASSERT(bup_global); Overkill IMHO, but ok. I added at a dozen places like: DBUG_ASSERT(bup_global); // Never fails, pos is checked ... >> + /* Check consistency of array position. */ >> + if (bup_global->glb_item.pos != pos) >> + { >> + /* purecov: begin inspected */ >> + errm("position of stored %s item: %lu " >> + "does not match position in catalog array: %lu\n", >> + typnam, bup_global->glb_item.pos, pos); > > [16] Consider: warn here and continue. If these position do not match, I might return the wrong object. Why do you think this would be safe? In contrast, this can only happen if bup_global->glb_item.pos is changed after bup_global was added to the array. I check these positions before every insert (there's an exception for tables though). So an assert may even be better than the "if". ... >> +enum options_mysqlbackup >> +{ >> + OPT_MYSQLBACKUP_CATALOG_SUMMARY=1024, > > [17] Why 1024? What does it have to allign with? The mysys function handle_options(), my_print_help(), and my_print_variables() take an options array as argument, which identifies the options by an integer value. This value is a single character for many options like 'v' for verbose. Many characters are used up for standard MySQL options. Options that do not need a single letter equivalent can specify identifiers > 255. This is done in client_priv.h: enum options_client { OPT_CHARSETS_DIR=256, OPT_DEFAULT_CHARSET, OPT_PAGER, OPT_NOPAGER, OPT_TEE, OPT_NOTEE, ... for options common to all clients. Client-specific options use numbers above these. I selected 1024, which feels safe. But now I found an example in mysqltest.cc, how to do it better: enum { OPT_SKIP_SAFEMALLOC=OPT_MAX_CLIENT_OPTION, OPT_PS_PROTOCOL, OPT_SP_PROTOCOL, OPT_CURSOR_PROTOCOL, ... I'll change 1024 to OPT_MAX_CLIENT_OPTION and add a comment. ... >> +void print_item(uint indent, void *item_arg, uint what) > > [18] Suggestion: declare item_arg as st_bstream_item_info* type. Done, inclusive the required casts at every call of print_item(). Some changed lines became too long and required a line break. I feel the client code to become more ugly the longer I follow these suggestions... Again. What is the win? A saved assignment? And what is the difference to the "void*", the backup stream library interface makes use of? > > ... >> +/** >> + Search backup object. > > [19] ... In the given catalog. What is the result of the search and how > is it > communicated to the caller? Extended function comment. > >> + >> + @param[in] search_name search name >> + @param[out] search_database_name database name, may be NULL >> + @param[out] search_object_name object name >> + >> + @return status >> + @retval 0 ok >> + @retval != 0 error >> +*/ >> + >> +void > > [20] Bad return type for a function which is supposed to return status... Fixed function comment. ... > [21] Why a separate search loop for database? Looks like it could be > handled by > search_array_array loop. The bup_catalog->cat_databases array contains pointers to a different struct. If combining these, I would need to have branches for both data types and/or be casting to death. ... >> + if (!my_wildcmp(&my_charset_latin1, (char*) name->begin, >> + (char*) name->end, (char*) search_object_name, >> + (char*) search_object_name + >> search_object_name_len, >> + '\\', '_', '%')) >> + { >> + print_item(4, bup_perdb, PRI_NAME | PRI_META); >> + } >> + } >> + } >> + } >> + } > > [22] This code repeats again - perhaps a helper function should be defined. Done. ... >> + if (brc != BSTREAM_OK) >> + { >> + goto end; >> + } > > [23] Our coding guidelines do not allow braces in such situation... :P Well, I read: "After if or else or while, when there is only one instruction after the condition, braces are not necessary and the instruction goes on the next line, indented." The important word is "not necessary". But to proceed with uglifying the code, I change to: if (brc != BSTREAM_OK) { /* Bail out on error. */ goto end; } ... >> + /* >> + Binlog coordinates are present, if the BSTREAM_FLAG_BINLOG flag is >> + set in the header *and* table data are in the image. >> + DBUG_ASSERT(!(hdr->flags & BSTREAM_FLAG_BINLOG) || >> + hdr->binlog_pos.file || hdr->binlog_group.file); >> + */ > > [24] This was a bug which is fixed now. Now binlog coordinates are > always present if BSTREAM_FLAG_BINLOG is set. The code should be changed > to reflect this. You must be mistaken. I added an assert that verifies your claim. It crashed on binlog_group.file == NULL. ... >> + if (opt_mysqlbackup_search || >> + opt_mysqlbackup_snapshots || >> + opt_mysqlbackup_catalog_summary || >> + opt_mysqlbackup_catalog_details || >> + opt_mysqlbackup_metadata_statements || >> + opt_mysqlbackup_metadata_extra || >> + opt_mysqlbackup_data_chunks || >> + opt_mysqlbackup_data_totals || >> + (opt_mysqlbackup_summary && !(hdr->flags & >> BSTREAM_FLAG_INLINE_SUMMARY))) > > [25] Suggestion: wouldn't it be more clear to formulate condition when > we *do not* want to read the catalog? Hehe. We want to read the catalog, if *any* of the options is present. We do not want to read, if *none* is present. So would inverting the condition become more clear? To my great surprise there are indeed humans who find double negations easier to understand. But I fail to do so. However, I found a different way to make it slightly less ugly. I add a flag per section of the backup image, e.g. "bool need_catalog". When reading the command line options, I set the corresponding flag. After reading the header, I know, if the summary section is contained. So I can clear need_summary, if it is in the header. Then I escalate flags from end of image to the beginning. That is, if the summary is needed, I must read table data too, etc. Finally, if a search is requested, I need the catalog. Now I have a single flag per section to decide if the section is required. ... >> + for (idx= 0; idx < hdr->snap_count; idx++) > > [26] Here and in other places - assert that snap_count < 256 which is > the size of snapshot[] array (i.e., that you don't exceed array sizes). Done. ... >> + if (snapshot->options) >> + printf(" Snapshot %u options %u\n", >> + idx, snapshot->options); /* purecov: inspected */ > > [27] I think the options should be printed as hex number (so that user > can see the bits). Done. ... > [28] Suggestion: Below you iterate over databases and tables in each > database to collect tables belonging to a given snapshot. But I think it > would be simpler and more direct to iterate over snap_index_pos_to_table > array of the given snapshot. But is there a guarantee that tables belonging to a database are kept together? > >> + for (kdx= 0; kdx < bup_catalog->cat_databases.elements; kdx++) >> + { ... >> + bup_global= *((struct st_backup_global**) >> + dynamic_array_ptr(&bup_catalog->cat_charsets, >> idx)); > > [29] Suggestion: Check with assertion that bup_global is not NULL. Done like above with a comment that this cannot happen. ... >> + /* >> + Print meta data. >> + */ > > [29] Suggestion: print metadata info as it is stored in the image. > Here you show metadata in a logical order but I think it would add > certain value to show it in the order in which it is stored in the > image. This would give a closer insight into the structure of the image > and the purpose of this client is to show this structure. Note that the > order in which metadata is stored is important because it is supposed to > obey dependencies between objects. Being able to see the actual order > will help to diagnose potential problems. I see the value from a developer's point of view. However, for a user the inner structure of the backup image won't be of that much use. I think, a user prefers a logical order. > > However, I do realize that presenting this information as I suggest > would require big changes to the structure of this code, so I leave it > for your judgement if it is needed and worth the effort. > > In the backup image, metadata for various objects from the catalogue is > stored in following order: > > - metadata for global items; > - for each database, metadata for tables from that database; > - metadata for all the other per-database objects. > > Implementing my suggestion would require printing metadata from > bcat_create_item() callback function which is called when backup > catalogue is read by bstream_rd_catalogue(). No. Fortunately not. I just need two more dynamic arrays. One to hold pointers to the catalog items in the order as presented to bcat_add_item() (the catalog items), and one to hold pointers to the catalog items in the order as presented to bcat_create_item() (the meta data). But since I disagree with your suggestion, I hope you can forgive me, that I didn't implement it. Instead, I violated the processes and added a new command line option: --image-order. This option determines, if I step through the catalog hierarchy in the logical order when printing items or meta data, or if I step through the new arrays. I hope, you can live with the compromise, and that it doesn't make my patch inapprovable. ... >> + /* Charsets don't have meta data. */ >> + /* Users don't have meta data. */ > > [30] Considering changing image specification to allow metadata for all > types of objects in which case the code should handle it. It should be easy to add the required loops then. Otherwise I would need to add purecov comments, which I would have to remove when the image specification changed. So some code change will be necessary anyway. ... I made a new patch with the changes. http://lists.mysql.com/commits/66092 Regards Ingo -- Ingo Strüwing, Database Group Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten Geschäftsführer: Thomas Schröder, Wolfgang Engels, Dr. Roland Bömer Vorsitzender des Aufsichtsrates: Martin Häring HRB München 161028
