| List: | Commits | « Previous MessageNext Message » | |
| From: | Ingo Strüwing | Date: | February 18 2009 10:57pm |
| Subject: | Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745) WL#4630 | ||
| View as plain text | |||
Hi Øystein, thank you for the review. You had some valuable comments and found some bugs. However, in a couple of cases you complained without proposing alternatives. Please see below. Øystein Grøvlen, 17.02.2009 14:57: ... > Ingo Struewing wrote: ... >> 2745 Ingo Struewing 2008-11-25 >> WL#4630 - ST: MySQL Backup Client Program - Milestone 1 >> >> The MySQL Backup Client Program is a command line client >> program, that examines and displays the contents of a >> backup image. >> >> By command line options you can select different types >> of information to view. >> >> It does also have a search capability for finding >> specific database objects in the backup image. >> >> For a detailed explanation of the options, run >> >> mysqlbackup --help ... >> +static const char *load_default_groups[]= { "mysqlbackup", >> +#ifdef MYSQL_CLIENT >> + "client", >> +#endif > > Since MYSQL_CLIENT was undef just above, is this just for future > extensions? Yes. In a later milestone, mysqlbackup shall connect to the server and initiate backups. At that point, I need to include the client part. ... >> +#ifdef MYSQL_CLIENT >> +/* Options required by a MySQL client (connecting to a server). */ >> +static const char *opt_database; > > As far as I can tell, this is usually called opt_databases in other > clients. Any reason it should be different here? I found "opt_databases" in mysqlcheck.c and mysqldump.c only. For these clients, the option means to check/dump several databases instead of a list of tables. So it is different from the option that supplies the default database name for the connection. For the latter, mysqlbinlog.cc has "database", mysql.cc has "current_db", mysqltest.cc has "opt_db". So it seems it *must* be different somehow. ;-) ... >> +enum options_mysqlbackup >> +{ >> + OPT_MYSQLBACKUP_CATALOG_SUMMARY=1024, > > What is the magic of 1024? Rafal asked the same. Here my reply: The mysys function handle_options(), my_print_help(), and my_print_variables() take an options array as argument, which identifies the options by an integer value. This value is a single character for many options like 'v' for verbose. Many characters are used up for standard MySQL options. Options that do not need a single letter equivalent can specify identifiers > 255. This is done in client_priv.h: enum options_client { OPT_CHARSETS_DIR=256, OPT_DEFAULT_CHARSET, OPT_PAGER, OPT_NOPAGER, OPT_TEE, OPT_NOTEE, ... for options common to all clients. Client-specific options use numbers above these. I selected 1024, which feels safe. But now I found an example in mysqltest.cc, how to do it better: enum { OPT_SKIP_SAFEMALLOC=OPT_MAX_CLIENT_OPTION, OPT_PS_PROTOCOL, OPT_SP_PROTOCOL, OPT_CURSOR_PROTOCOL, ... I'll change 1024 to OPT_MAX_CLIENT_OPTION and add a comment. ... >> +static struct my_option my_long_options[] = >> +{ >> + /* General options. */ > > Isn't there some way that clients can share this code? It does not > seem right to have to copy&paste this for every new client. Since this is a list of initializers to an array of structs, we would probably need a macro to do this. Or a bunch of macros that combine related options each. Anyhow, it is probably out of scope of this task. ... >> + puts("By Sun Microsystems, for your professional use\n\ >> +This software comes with NO WARRANTY: This is free software,\n\ >> +and you are welcome to modify and redistribute it under the GPL > license\n"); >> + > > Have you checked that this is correct? Should not some copyright > notice with year be included (e.g., see mysqld)? Hm. I copied it from mysqlbinlog and changed the name. It seems that every client displays a different text. Perhaps we could also leave it away. Or check with someone authority, what we should print. ... >> +/** >> + Print a human readable number. > > I do not think "number" is right here. Isn't it a size in bytes? Doesn't include a size a number too? I use the function for the file size and the number of bytes per table-data chunk. So what do you suggest to write here? "human readable number of bytes"? Or just add "K", "M", ... and leave "bytes" to be printed outside of the function? ... >> + if (!opt_mysqlbackup_exact) >> + { >> + /* Shrink value while it's > 16K and postfix list is not at end. */ >> + while (((value > 16383) || (value < -16384)) && > *(++postfix_p)) >> + value/= 1024; >> + /* >> + In case that we terminated the loop due to end of the postfixes >> + list, we need to compensate the loop's ++postfix_p so that we >> + point to the last non-NULL postfix. Note that this can only happen >> + on machines where longlong has more than 64 bits. >> + */ >> + if (!*postfix_p) >> + --postfix_p; /* purecov: inspected */ > > I am not very fond of dead code like this. It is not possible to > test, and it took me some time to verify that it is actually correct. > I would guess that if longlong ever becomes something more than 64 > bits, there are bigger problems that need to be solved ... Possible. But do you suggest to consciously leave this unsolved? Didn't we face enough problems with code that didn't expect "long" could be longer than 32 bits? If we will get problems with code that doesn't think of longlong to be longer than 64 bits ever, do we want to consciously add yet another one here? ... >> + case BSTREAM_IT_CHARSET: >> + case BSTREAM_IT_USER: >> + case BSTREAM_IT_TABLESPACE: >> + { >> + struct st_backup_global *bup_global= (struct st_backup_global*) > item; >> + type_s= bup_global->glb_typename; >> + mdata= &bup_global->glb_metadata; >> + DBUG_PRINT("bupstrm", ("%s", type_s)); >> + break; >> + } > > In the above code block, the order of the statements is different from > the others below. I think it would be good with some symmetry here. Ok. Which order do you like best? ... >> + case BSTREAM_IT_SPROC: >> + DBUG_PRINT("bupstrm", ("sproc")); >> + type_s= "Sproc"; >> + break; >> + case BSTREAM_IT_SFUNC: >> + DBUG_PRINT("bupstrm", ("sfunc")); >> + type_s= "Sfunc"; >> + break; > > Is sproc and sfunc well known abbreviations? Does the s mean stored or > sql? > I think it would be good to be more explicit here. Sure, but look. At the moment I print: Table 'mysqltest1'.'t3' Sproc 'mysqltest1'.'p1' Sproc 'mysqltest1'.'p2' Sfunc 'mysqltest1'.'f1' Sfunc 'mysqltest1'.'f2' View 'mysqltest1'.'v1' View 'mysqltest1'.'v2' Event 'mysqltest1'.'e1' Event 'mysqltest1'.'e2' Privilege 'mysqltest1'.'<empty>' If I change to "stored procedure", the indentation of the object names will become so wide that it may be difficult to associate the type name with the object name for "table", "view", and "event". For table one could print "relation", which is longer, but I don't know how to make "view" longer. So what shall I do? ... >> +/** >> + Print error from parsing the search name. >> + >> + @param[in] format printf-style format string >> + @param[in] args list of arguments for the format string >> +*/ >> + >> +static void error_parse_search_name(const char *format, ...) >> + ATTRIBUTE_FORMAT(printf, 1, 2); >> +static void error_parse_search_name(const char *format, ...) > > I do not quite understand what is going on here, but is this function > really necesssary? It seems like all you achieve is 3 less times of > typing "Cannot parse search name.\n". If you really wanted to save > typing, you could save more by using a macro. I have been asked to make functions for even less savings. Typing is not a problem. Copy-n-paste would be the natural choice. The (small) wins are less string space, and less line breaks. Since a macro cannot have a variable number of arguments, you probably mean to add the string through a macro, e.g. "error_parse_search_name(CPSN "Unrecognized stuff at '%s'\n", tend);"? But then you probably want a meaningful macro name, and I'll get line breaks again. Or did you think of something else? It would be good if you could make more concrete suggestions in a review. ... >> +/** >> + Parse the search name. >> + >> + Split search name into database name, if present, and object name. >> + >> + @param[in] search_name search name >> + @param[out] search_database_name_p database name >> + @param[out] search_object_name_p object name >> + >> + @return status >> + @retval 0 ok >> + @retval != 0 error >> +*/ >> + >> +static int parse_search_name(const char *search_name, >> + char **search_database_name_p, >> + char **search_object_name_p) >> +{ > > It seems to me like you are re-inventing the wheel here. Parsing for > quoted strings etc. must have been done before. Can't some existing > code be reused? Which existing code do you mean? ... >> + /* Skip dot. */ >> + tend++; >> + /* We need an object name. Skip space. */ >> + for (token= tend; isspace(*token); token++) {} > > Is it supposed to be OK with spaces after the dot? Well, I defined the syntax to my liking. :) And yes, I did explicitly allow for it. This having said, I realized that I didn't have a test for it. Fixed. I guess that no one ever wants to type a space after the dot (which requires escaping from the shell after all), but if the command is built by a script, it might be helpful if the program accepts space at this place. > >> + /* Check for quoted search string. If unquoted, search for end. */ >> + quote= '\0'; >> + switch (*token) { >> + case '\'': >> + case '\"': >> + case '`': >> + quote= *(token++); >> + } >> + /* Find end of token. */ >> + if (quote == '\0') >> + for (tend= token; *tend && !isspace(*tend); tend++) {} >> + else >> + for (tend= token; *tend && (*tend != quote); tend++) {} > > A lot of duplicate code here. I suggest extracting the quoted string > stuff into a separate method which can be called several times. See, what I meant above ("I have been asked to make functions for even less savings")? You suggested, not to make a function for adding a string at four places, but want a function to replace a couple of instructions at two places? The function would need to take quote, token, and tend as parameters, and return object. Tricky to select a name that makes its effect obvious. Not simple to document its operation. Reading the parser one would have to jump between two functions to see, how they interact in setting the four variables. I'd not be happy to waste time on this, but if you insist... ... >> + /* Now we must be at string end. */ >> + if (*tend) >> + { >> + error_parse_search_name("Unrecognized stuff at '%s'\n", tend); >> + goto end; > > stuff? I suggest content or something more "respectful". :) I searched for "stuff" through the MySQL sources and found > 300 lines. I found it in > 300 MySQL related emails in my archive. I found it in > 200 Linux man pages. It seems to be the way people are talking. Anyway, what do you suggest as a replacement? ... >> + printf("Searching '%s'.'%s'\n", search_database_name, > search_object_name); > > I suggest "Searching for ..." You love linebreaks, do you? ;-) Ok. fixed. BTW. Thanks for a suggestion here. Ah. And do you think the command line option should be --search-for instead of --search? Also, is it ok to have "search" without "for" in variable names? ... > I suggest a comment here. Later blocks of this method have one. Ok. > >> + for (search_array_p= search_array_array; *search_array_p; > search_array_p++) >> + { >> + for (idx= 0; idx < (*search_array_p)->elements; idx++) ... >> + if (!my_wildcmp(&my_charset_latin1, (char*) name->begin, >> + (char*) name->end, (char*) search_database_name, >> + (char*) search_database_name + > search_database_name_len, >> + '\\', '_', '%')) >> + { >> + DYNAMIC_ARRAY **search_array_p; >> + DYNAMIC_ARRAY *search_array_array[]= { >> + &bup_database->db_tables, >> + &bup_database->db_perdbs, >> + NULL >> + }; >> + struct st_backup_perdb *bup_perdb; >> + uint jdx; > > You are repeating the same pattern here. Please, consider if some > common code can be extracted into a separate method. Done. ... >> +read_and_print_summary(void *image_handle, >> + struct st_backup_catalog *bup_catalog) >> +{ ... >> + brc= backup_read_summary(image_handle, bup_catalog); > > Why is bup_catalog the parameter here? It seems backup_read_summary > only uses the header internally, so why not use the header instead? I > guess the same holds for this method. I like it better to use a common "handle" for all functions of an interface. That way I don't need to change the signature when I change the implementation. ... >> + /* >> + Print summary. >> + */ >> + if (opt_mysqlbackup_summary) > > Is it necessary to read the summary if it is not supposed to be > printed? Yes. In case of an inline summary, we need to read over it until the next chunk, even if we don't want to print it. > Anyway, why not let the caller call backup_read_summary > directly and just provide a method for printing? Are you joking with me, or do you want to frustrate me? I made a function to print error at four places, you don't want it. I did not make a function from two slightly different instruction sequences for parsing, so you want one. I made a function to combine reading and printing of summary to be called from two places with exactly the same instructions, you don't want it. (?!?) > A general design > advice is to let methods have just one responsibility. This is pretty new to me. I learned to develop in hierarchies. The bottom methods may have just one responsibility. But already the next higher level calls two or more methods from the lower level and thus has two or more responsibilities. See for example Backup_restore_ctx::do_backup(). It has the resposibilities logging, locking, starting drivers, polling drivers, writing image file, and so on. ... >> + /* >> + Binlog coordinates are present, if the BSTREAM_FLAG_BINLOG flag is >> + set in the header *and* table data are in the image. >> + DBUG_ASSERT(!(hdr->flags & BSTREAM_FLAG_BINLOG) || >> + hdr->binlog_pos.file || hdr->binlog_group.file); >> + */ > > Why is this disabled? At the time of committing this version of the patch, there was a bug in the backup code. This is fixed now and a new version of my patch hasn't anything disabled at this place. ... >> +int main(int argc, char** argv) > > This method is very long. I suggest splitting into several methods, > each with a single responsibility. Can you please define "single responsibility"? I have the impression that I won't get it right without such definition. > >> +{ >> + char **defaults_argv; >> + void *image_handle; >> + char *search_database_name= NULL; >> + char *search_object_name= NULL; >> + struct st_backup_catalog *bup_catalog= NULL; >> + struct st_bstream_image_header *hdr; >> + enum enum_bstream_ret_codes brc; >> + int rc; >> + uint idx; >> + int errpos= 0; >> + char llbuff[22]; > > Some explanations for the different local variables would be good. > In particular, I would like to know the significance of the different > values for rc. Can you please point me at some code, where this is done exemplarily? ... >> + if (hdr->flags && opt_verbose) >> + { >> + const char *separators[]= {"", ", "}; >> + int separator= 0; > > This separator trick is pretty need, but it took me some time to > understand so a comment would be good. What means "pretty need"? "nasty"? Anyway, I tried to explain this in detail. But it became quite long and still complicated. So I chose a simpler solution with a simpler comment. ... >> + if (hdr->flags & BSTREAM_FLAG_INLINE_SUMMARY) > > Is it not possible to test this? As far as I know, the current backup code won't create an image with inline summary. Spending an error injection point seems to be overkill. The same call to read_and_print_summary() is tested with end-summary later in the function. ... >> + /* >> + The first two character sets are special: >> + 1. Character set to use for interpreting the backup file. >> + 2. Server character set. >> + These do not count as catalog items. > > I have been told this a few times already. This might be an > indication that some code could be extracted and shared, or just that > this comment is superfluous. From my point of view, the problem is the special handling of the first two character sets. This makes the code pretty spaghetti. I am unsure, which comment to remove. Not everyone is a reviewer, reading all of the code. If you are sure that everybody will find the relevant comment, regardless of which part of the code he looks at, then tell me which comment to leave in and which to remove. ... >> + if (opt_mysqlbackup_search && >> + !opt_mysqlbackup_metadata_statements && >> + !opt_mysqlbackup_metadata_extra) >> + { >> + search_object(bup_catalog, search_database_name, > search_object_name); >> + goto end; >> + } > > > I do not understand why the searching is split in two places. It isn't split. It is done at one of two places, depending on command line options. But I found a way to avoid it. > Why > could not searching be done here even if one is interested in metadata? During the search, every found item is printed immediately. If we don't have read the meta data yet, we can't print it. But as said above, I changed it. ... >> + use_err: >> + usage(); >> + rc= 1; >> + > > Instead of a particular use_err target, why to just go to end and use > the value of rc to determine whether usage should be printed. I do not want to print the usage one every error, but only for wrong usage, so to say (wrong command line option, including invalid search string, wrong number of command line arguments). Using a special value for rc might do the trick, but requires careful documentation. What would be the win? > >> + end: >> + /* Flush stdout to get output in right order in case of error > messages. */ >> + fflush(stdout); >> + /* Cleanup. */ >> + switch (errpos) { >> + default: >> + case 40: >> + brc= backup_image_close(image_handle); >> + if (brc != BSTREAM_OK) >> + rc= 2; >> + case 30: >> + backup_catalog_free(bup_catalog); >> + case 25: >> + my_free(search_object_name, MYF(MY_ALLOW_ZERO_PTR)); >> + my_free(search_database_name, MYF(MY_ALLOW_ZERO_PTR)); >> + case 20: >> +#ifdef MYSQL_CLIENT >> + my_free((char*) opt_database, MYF(MY_ALLOW_ZERO_PTR)); >> + my_free((char*) opt_host, MYF(MY_ALLOW_ZERO_PTR)); >> + my_free((char*) opt_pass, MYF(MY_ALLOW_ZERO_PTR)); >> + my_free((char*) opt_user, MYF(MY_ALLOW_ZERO_PTR)); >> +#endif > > Are you sure that this clean-up is not needed if handle_option() fail? > Or to state it differently, are you sure that handle_option() does > the necessary cleanup should it fail? Anyway, why bother with errpos > when doing it in MY_ALLOW_ZERO_PTR mode. Would it not be OK to call > free for all pointers regardless of when it failed? True. Fixed. ... Regards Ingo -- Ingo Strüwing, Database Group Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten Geschäftsführer: Thomas Schröder, Wolfgang Engels, Dr. Roland Bömer Vorsitzender des Aufsichtsrates: Martin Häring HRB München 161028
