| List: | Commits | « Previous MessageNext Message » | |
| From: | Rafal Somla | Date: | February 17 2009 8:50am |
| Subject: | Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745) WL#4630 | ||
| View as plain text | |||
Hi Ingo, See my replies below. In general I'm fine with all decisions you made. I'll have a look at the new patch as well and tell you if I see any problems there. I see we probably still have a problem with understanding the difference between reviewer's requirements and suggestions. My view of it is as follows: - Requirements are things which implementer has to address somehow in order to get the patch approved. - Suggestions are things which implementer is free to ignore - the approval of the patch does not depend on them. On top of that, my understanding is that "addressing an requirement" is not necessarily implementing it in the code. Another possibility is to convince the reviewer that the original solution is OK. Also, "ignoring a suggestion" means, for me, that the implementer does not even have to justify his decision (this is to save us some time - I'll not be offended by silently ignoring my suggestion). Ingo Strüwing wrote: > Hi Rafal, > > Rafal Somla, 22.12.2008 13:14: > ... > >>> 2745 Ingo Struewing 2008-11-25 >>> WL#4630 - ST: MySQL Backup Client Program - Milestone 1 >>> The MySQL Backup Client Program is a command line client >>> program, that examines and displays the contents of a >>> backup image. >>> By command line options you can select different types >>> of information to view. >>> It does also have a search capability for finding >>> specific database objects in the backup image. >>> For a detailed explanation of the options, run >>> mysqlbackup --help > ... > >> [1] Suggestion: Use a macro here (perhaps only in production build). > > Done. > >> ... >>> + >>> + >>> +#ifdef NOTUSED >> [2] Hmm, should it be commited in? > > > I tend to leave it in. I spent some time to make it working, until gcov > showed it was not used. If one day we need it back, we could safe a > little effort. > OK. >>> + const unsigned char backup_magic_bytes[8]= >>> + { >>> + 0xE0, // ###..... >>> + 0xF8, // #####... >>> + 0x7F, // .####### >>> + 0x7E, // .######. >>> + 0x7E, // .######. >>> + 0x5F, // .#.##### >>> + 0x0F, // ....#### >>> + 0x03 // ......## >>> + }; >>> +#ifdef HAVE_COMPRESS >>> + const unsigned char gzip_magic_bytes[3]= >>> + { >>> + 0x1f, >>> + 0x8b, >>> + 0x08 >>> + }; >>> +#endif >>> + uchar prefix[sizeof(backup_magic_bytes) + sizeof(version)]; >> [4] I think it is better not to use sizeof() here. The prefix must be 10 >> bytes long - this is specified by backup stream format. Why to depend on >> compiler having expected basic type sizes if we know the exact number. >> However, it is ok to leave the code as is. > > > If the sizeof()s are not 8 and 2, then there is no guarantee that the > wanted values are at the expected bytes. Compromise: I'll add an assert. > OK. >>> + bup_global= (struct st_backup_global*) >>> + my_malloc(sizeof(struct st_backup_global), MYF(MY_WME)); >>> + if (!bup_global) >>> + { >>> + /* Error message reported by mysys. */ >>> + brc= BSTREAM_ERROR; >>> + goto end; >>> + } >>> + /* Copy the item struct into the allocated struct. */ >>> + bup_global->glb_item= *item; >>> + /* Initialize other elements. */ >>> + bzero(&bup_global->glb_metadata, > sizeof(bup_global->glb_metadata)); >> [5] Perhaps: >> >> bzero(bup_global, sizeof(*bup_global)); >> bup->global->glb_item= *item; >> >> so that when structure layout changes it will still be properly >> initialized. > > > In MySQL we usually try to avoid unnecessary initializations. Especially > when major parts of the initialized stuff are overwritten immediately > afterwards. > > When the structure layout changes, caution is required anyway. > > Finally, if doing, what you suggest, then the right way would be to use > the MY_ZEROFILL option to my_malloc(). > As you wish - you are the implementer. >> >> [7] Perhaps add a comment explaining that the code below frees memory >> allocated by backup stream library. > > > Done. > When writing this, you didn't get aware that this should be part of the > backup stream library interface documentation? > Good point - added it to WL#4782. >>> + strm= my_malloc(sizeof(struct st_stream), MYF(MY_WME)); >>> + if (!strm) >>> + { >>> + /* Error message reported by mysys. */ >>> + goto err; >> [9] Here you jump to err when errpos is 0, but the swich doesn't cover >> this case >> (probably adding a default case is what is needed). > > > I changed to: > > switch (errpos) { > case ... > case ... > default: > ; /* No (further) de-initializations required. */ > } > > Ugly, but ok. > OK. >>> + /* Add image version to header. */ >>> + bup_catalog->cat_header.version= version; >>> + DBUG_PRINT("bupstrm", ("set header version: %u", version)); >> [10] Suggestion: I'd move the above assignment before the call to >> bastream_rd_header(), just after the version has been detected in >> str_open_rd(). > > > No. The stream library interface does not promise to keep > pre-initialized header members intact. It does not even specify if the > header needs to be pre-initialized, and which members exactly it fills in. > OK. >>> +enum enum_bstream_ret_codes >>> +backup_read_catalog(void* image_handle, struct st_backup_catalog >>> *bup_catalog) >> [11] Suggestion: declare image_handle as st_stream* and then cast it to >> void* if >> needed (but should not be needed I think). > > > Done. But please explain why it supports your beloved modularization > that mysqlbackup.cc does now need information about the internal > implementation of backup_stream.c. And it requires to include > compression definitions into the display module mysqlbackup.cc. :( > It does not. Apparently, st_stream is not a part of public interface, which I have not noticed when writing my suggestion. A pity you have not corrected me, but went on with implementing the suggestion which you disagree with. See also my next comment. > ... >>> + /* >>> + First read data chunk requires search of next chunk start. >>> + */ >> [12] Suggestion: move this to the end of backup_read_metadata(). So that >> this >> function can be written under pre-condition that the stream is already >> positioned at a start of next chunk. > > > Done. I did also move the other next_chunk() calls to the end of the > predecessor functions. That way it looks a little more like symmetry. > > However, now these functions include the search for the next chunk, > which I preferred to do in the function, where I handle the next object. > For example, the catalog reading function does now search for the meta > data chunk and complains if it is missing. > > Another downside is that the error from the missing chunk ends > mysqlbackup and the already read information is not printed any more. > > Also it costed some time to get the coverage test right as some sections > changed place. > > Now that I made this change to please you, can you please explain, what > the win is? One less string comparison per table data chunk? Or...? > Ingo, if you don't see any gain from this change then why have you implemented it? Note that this was just my suggestion - something you are not obliged to follow. If you do, then I assume you agree that it is a good idea. We might have many different goals when doing this job, but pleasing me is certainly not one of them (btw. what you have done did not please me - quite opposite actually). As a reviewer, I'd like to feel free to make suggestions in hope that they might be useful sometimes. But I consider you, the implementer, to be responsible for the implementation. After all, you create the code and you understand it best, including all the subtleties. Reviewing your code I might have some ideas which can be good or wrong. Just blindly following these suggestions and then throwing at me questions like the above feels unfair. It would be more fair if you *first* ask me questions, if you have any, and then decide if you want to follow the suggestion or not. >>> + /* >>> + Read data chunk. >>> + */ >> [13] Perhaps add a note explaining that memory for storing the data is >> allocated and freed by backup stream library so that future readers of >> this code will not ask the question which I asked myself: "where the >> memory for storing the data comes from and what happens to it when it is >> not needed any more?". > > > Done, but are you serious about the application to document, what data > the backup stream library allocates? > Hmm, I don't understand this question - can you explain/reformulate? > ... >>> + [item position (table)]= [ snap no. ! pos in snapshot's table list ] >> [14] I think it is the other way around - first position of the table in >> the >> snapshot, then the snapshot number. > > No. Copied verbatim from stream_v1.c. > Actually, what I'm telling you is that the documentation in stream_v1.c is wrong (there is BUG#40587 reported about that). If you don't want to correct the error in your comment then perhaps you can just remove the incorrect quote. Especially that the format in which the coordinates are stored inside the backup image is irrelevant for your code - this is handled by backup stream library - in your code you just read the coordinates from the members of st_bstream_*_info structures. >>> + /* >>> + Get a reference to the global object. >>> + It does not fail because we checked that pos is in range. >>> + Note that the array contains pointers only. >>> + */ >>> + bup_global= *((struct st_backup_global**) dynamic_array_ptr(array, >>> pos)); >> [15] Suggestion: Here and in other places, check your assumption with >> assert: >> >> DBUG_ASSERT(bup_global); > > > Overkill IMHO, but ok. I added at a dozen places like: > > DBUG_ASSERT(bup_global); // Never fails, pos is checked > To clarify and re-iterate: I made it a suggestion exactly because it can be considered an overkill (if I thought it is necessary, I'd make it a requirement). I think it is good but if you think it is an overkill, then simply don't follow the suggestion. >>> + /* Check consistency of array position. */ >>> + if (bup_global->glb_item.pos != pos) >>> + { >>> + /* purecov: begin inspected */ >>> + errm("position of stored %s item: %lu " >>> + "does not match position in catalog array: %lu\n", >>> + typnam, bup_global->glb_item.pos, pos); >> [16] Consider: warn here and continue. > > > If these position do not match, I might return the wrong object. Why do > you think this would be safe? > When adding items to the catalogue, you append them at the end of the corresponding array. Thus you correctly assume that the bstream library will give you items in the order in which they are stored in image's catalogue. This means that, e.g., the 3rd database will be stored in cat_databases at position 2 and when backup_locate_global() is called with pos=2, then the correct database will be fetched. All this should work ok, regardless of whether the position stored inside st_bstream_*_info structure is correct or not. Since I believe that all should work correctly even if the position stored in an item is inconsistent, I suggested not to error in that case, but warn and continue. > In contrast, this can only happen if bup_global->glb_item.pos is changed > after bup_global was added to the array. I check these positions before > every insert (there's an exception for tables though). So an assert may > even be better than the "if". > > ... >>> +enum options_mysqlbackup >>> +{ >>> + OPT_MYSQLBACKUP_CATALOG_SUMMARY=1024, >> [17] Why 1024? What does it have to allign with? > > > The mysys function handle_options(), my_print_help(), and > my_print_variables() take an options array as argument, which identifies > the options by an integer value. This value is a single character for > many options like 'v' for verbose. Many characters are used up for > standard MySQL options. Options that do not need a single letter > equivalent can specify identifiers > 255. This is done in client_priv.h: > > enum options_client > { > OPT_CHARSETS_DIR=256, OPT_DEFAULT_CHARSET, > OPT_PAGER, OPT_NOPAGER, OPT_TEE, OPT_NOTEE, > ... > > for options common to all clients. Client-specific options use numbers > above these. I selected 1024, which feels safe. But now I found an > example in mysqltest.cc, how to do it better: > > enum { > OPT_SKIP_SAFEMALLOC=OPT_MAX_CLIENT_OPTION, > OPT_PS_PROTOCOL, OPT_SP_PROTOCOL, OPT_CURSOR_PROTOCOL, > ... > > I'll change 1024 to OPT_MAX_CLIENT_OPTION and add a comment. > Cool. > ... >>> +void print_item(uint indent, void *item_arg, uint what) >> [18] Suggestion: declare item_arg as st_bstream_item_info* type. > > > Done, inclusive the required casts at every call of print_item(). Some > changed lines became too long and required a line break. > > I feel the client code to become more ugly the longer I follow these > suggestions... > > Again. What is the win? A saved assignment? And what is the difference > to the "void*", the backup stream library interface makes use of? > The win is having some basic type-checking, which is a way of detecting errors. Contrary to my previous suggestion about backup_read_catalog(), the item structures are part of public interface so it makes sense to use them and have compiler check correct typing of arguments. This is what I think - if you disagree you can simply ignore the suggestion. >> ... >>> +/** >>> + Search backup object. >> [19] ... In the given catalog. What is the result of the search and how >> is it >> communicated to the caller? > > > Extended function comment. > OK. >>> + >>> + @param[in] search_name search name >>> + @param[out] search_database_name database name, may be NULL >>> + @param[out] search_object_name object name >>> + >>> + @return status >>> + @retval 0 ok >>> + @retval != 0 error >>> +*/ >>> + >>> +void >> [20] Bad return type for a function which is supposed to return status... > > > Fixed function comment. > OK. > ... >> [21] Why a separate search loop for database? Looks like it could be >> handled by >> search_array_array loop. > > > The bup_catalog->cat_databases array contains pointers to a different > struct. If combining these, I would need to have branches for both data > types and/or be casting to death. > OK. > ... >>> + if (!my_wildcmp(&my_charset_latin1, (char*) name->begin, >>> + (char*) name->end, (char*) search_object_name, >>> + (char*) search_object_name + >>> search_object_name_len, >>> + '\\', '_', '%')) >>> + { >>> + print_item(4, bup_perdb, PRI_NAME | PRI_META); >>> + } >>> + } >>> + } >>> + } >>> + } >> [22] This code repeats again - perhaps a helper function should be defined. > > > Done. > > ... >>> + if (brc != BSTREAM_OK) >>> + { >>> + goto end; >>> + } >> [23] Our coding guidelines do not allow braces in such situation... :P > > > Well, I read: "After if or else or while, when there is only one > instruction after the condition, braces are not necessary and the > instruction goes on the next line, indented." > > The important word is "not necessary". > Good to know. I was once asked by a reviewer to remove such braces but now I see that I was mislead - I have not noticed the important word... > But to proceed with uglifying the code, I change to: > > if (brc != BSTREAM_OK) > { > /* Bail out on error. */ > goto end; > } > > ... >>> + /* >>> + Binlog coordinates are present, if the BSTREAM_FLAG_BINLOG flag is >>> + set in the header *and* table data are in the image. >>> + DBUG_ASSERT(!(hdr->flags & BSTREAM_FLAG_BINLOG) || >>> + hdr->binlog_pos.file || hdr->binlog_group.file); >>> + */ >> [24] This was a bug which is fixed now. Now binlog coordinates are >> always present if BSTREAM_FLAG_BINLOG is set. The code should be changed >> to reflect this. > > > You must be mistaken. I added an assert that verifies your claim. It > crashed on binlog_group.file == NULL. > In case BSTREAM_FLAG_BINLOG is set, binlog_pos should be initialized but binlog_group is always empty because it is not used (and perhaps will never be). > ... >>> + if (opt_mysqlbackup_search || >>> + opt_mysqlbackup_snapshots || >>> + opt_mysqlbackup_catalog_summary || >>> + opt_mysqlbackup_catalog_details || >>> + opt_mysqlbackup_metadata_statements || >>> + opt_mysqlbackup_metadata_extra || >>> + opt_mysqlbackup_data_chunks || >>> + opt_mysqlbackup_data_totals || >>> + (opt_mysqlbackup_summary && !(hdr->flags & >>> BSTREAM_FLAG_INLINE_SUMMARY))) >> [25] Suggestion: wouldn't it be more clear to formulate condition when >> we *do not* want to read the catalog? > > > Hehe. We want to read the catalog, if *any* of the options is present. > We do not want to read, if *none* is present. > > So would inverting the condition become more clear? To my great surprise > there are indeed humans who find double negations easier to understand. > But I fail to do so. > Right, the condition is such that negating it will not make it simpler. I'm sorry to see the irony again - I asked a question in hope that it might help to improve the code. Sometimes negating the condition can simplify it a lot and often such possibility can be overlooked (I'm sure it happened to me). I'll take a mental note that you are not interested in such remarks and suggestions. > However, I found a different way to make it slightly less ugly. I add a > flag per section of the backup image, e.g. "bool need_catalog". When > reading the command line options, I set the corresponding flag. After > reading the header, I know, if the summary section is contained. So I > can clear need_summary, if it is in the header. Then I escalate flags > from end of image to the beginning. That is, if the summary is needed, I > must read table data too, etc. Finally, if a search is requested, I need > the catalog. Now I have a single flag per section to decide if the > section is required. > > ... >>> + for (idx= 0; idx < hdr->snap_count; idx++) >> [26] Here and in other places - assert that snap_count < 256 which is >> the size of snapshot[] array (i.e., that you don't exceed array sizes). > > > Done. > OK. > ... >>> + if (snapshot->options) >>> + printf(" Snapshot %u options %u\n", >>> + idx, snapshot->options); /* purecov: inspected */ >> [27] I think the options should be printed as hex number (so that user >> can see the bits). > > > Done. > OK. > ... >> [28] Suggestion: Below you iterate over databases and tables in each >> database to collect tables belonging to a given snapshot. But I think it >> would be simpler and more direct to iterate over snap_index_pos_to_table >> array of the given snapshot. > > > But is there a guarantee that tables belonging to a database are kept > together? > I'm not sure - if it is the case then it is by chance, not by design. But, is it crucial that they are together? After all, you are listing all tables belonging to a given snapshot - they can come from different databases and perhaps it is not so bad if they are not grouped by database. So, one can loose some clarity of the output and gain some simplicity of the code. Which is more important is subjective. This is why I made it a suggestion and you can choose. >>> + for (kdx= 0; kdx < bup_catalog->cat_databases.elements; > kdx++) >>> + { > ... >>> + bup_global= *((struct st_backup_global**) >>> + dynamic_array_ptr(&bup_catalog->cat_charsets, >>> idx)); >> [29] Suggestion: Check with assertion that bup_global is not NULL. > > > Done like above with a comment that this cannot happen. > > ... >>> + /* >>> + Print meta data. >>> + */ >> [29] Suggestion: print metadata info as it is stored in the image. >> Here you show metadata in a logical order but I think it would add >> certain value to show it in the order in which it is stored in the >> image. This would give a closer insight into the structure of the image >> and the purpose of this client is to show this structure. Note that the >> order in which metadata is stored is important because it is supposed to >> obey dependencies between objects. Being able to see the actual order >> will help to diagnose potential problems. > > > I see the value from a developer's point of view. However, for a user > the inner structure of the backup image won't be of that much use. I > think, a user prefers a logical order. > >> However, I do realize that presenting this information as I suggest >> would require big changes to the structure of this code, so I leave it >> for your judgement if it is needed and worth the effort. >> >> In the backup image, metadata for various objects from the catalogue is >> stored in following order: >> >> - metadata for global items; >> - for each database, metadata for tables from that database; >> - metadata for all the other per-database objects. >> >> Implementing my suggestion would require printing metadata from >> bcat_create_item() callback function which is called when backup >> catalogue is read by bstream_rd_catalogue(). > > > No. Fortunately not. I just need two more dynamic arrays. One to hold > pointers to the catalog items in the order as presented to > bcat_add_item() (the catalog items), and one to hold pointers to the > catalog items in the order as presented to bcat_create_item() (the meta > data). > > But since I disagree with your suggestion, I hope you can forgive me, > that I didn't implement it. Instead, I violated the processes and added > a new command line option: --image-order. This option determines, if I > step through the catalog hierarchy in the logical order when printing > items or meta data, or if I step through the new arrays. I hope, you can > live with the compromise, and that it doesn't make my patch inapprovable. > Sure, this is the definition of a "suggestion". And the compromise is very good. > ... >>> + /* Charsets don't have meta data. */ >>> + /* Users don't have meta data. */ >> [30] Considering changing image specification to allow metadata for all >> types of objects in which case the code should handle it. > > > It should be easy to add the required loops then. Otherwise I would need > to add purecov comments, which I would have to remove when the image > specification changed. So some code change will be necessary anyway. > OK. > ... > > I made a new patch with the changes. > http://lists.mysql.com/commits/66092 > I'll look at it. Rafal
