| List: | Commits | « Previous MessageNext Message » | |
| From: | Øystein Grøvlen | Date: | February 19 2009 10:17am |
| Subject: | Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745) WL#4630 | ||
| View as plain text | |||
Ingo Strüwing wrote:
> Hi Øystein,
>
> thank you for the review. You had some valuable comments and found some
> bugs.
>
> However, in a couple of cases you complained without proposing
> alternatives. Please see below.
I do this because I do not necessarily have the answer. I think it is
the implementor's responsibilty to decide on a solution, and that it
is OK for a reviewer to just challenge the existing code to make the
implementor consider whether better solutions exists. (I think Rafal
covered that point of view pretty good in an earlier email.)
Responses to your responses can be found in-line.
--
Øystein
>
> Øystein Grøvlen, 17.02.2009 14:57:
>
> ...
>> Ingo Struewing wrote:
> ...
>>> 2745 Ingo Struewing 2008-11-25
>>> WL#4630 - ST: MySQL Backup Client Program - Milestone 1
>>>
>>> The MySQL Backup Client Program is a command line client
>>> program, that examines and displays the contents of a
>>> backup image.
>>>
>>> By command line options you can select different types
>>> of information to view.
>>>
>>> It does also have a search capability for finding
>>> specific database objects in the backup image.
>>>
>>> For a detailed explanation of the options, run
>>>
>>> mysqlbackup --help
> ...
>>> +static const char *load_default_groups[]= { "mysqlbackup",
>>> +#ifdef MYSQL_CLIENT
>>> + "client",
>>> +#endif
>> Since MYSQL_CLIENT was undef just above, is this just for future
>> extensions?
>
>
> Yes. In a later milestone, mysqlbackup shall connect to the server and
> initiate backups. At that point, I need to include the client part.
>
OK.
> ...
>>> +#ifdef MYSQL_CLIENT
>>> +/* Options required by a MySQL client (connecting to a server). */
>>> +static const char *opt_database;
>> As far as I can tell, this is usually called opt_databases in other
>> clients. Any reason it should be different here?
>
>
> I found "opt_databases" in mysqlcheck.c and mysqldump.c only. For these
> clients, the option means to check/dump several databases instead of a
> list of tables. So it is different from the option that supplies the
> default database name for the connection. For the latter, mysqlbinlog.cc
> has "database", mysql.cc has "current_db", mysqltest.cc has "opt_db". So
> it seems it *must* be different somehow. ;-)
OK. It is not as consistent as I thought.
>
> ...
>>> +enum options_mysqlbackup
>>> +{
>>> + OPT_MYSQLBACKUP_CATALOG_SUMMARY=1024,
>> What is the magic of 1024?
>
>
> Rafal asked the same. Here my reply:
>
> The mysys function handle_options(), my_print_help(), and
> my_print_variables() take an options array as argument, which identifies
> the options by an integer value. This value is a single character for
> many options like 'v' for verbose. Many characters are used up for
> standard MySQL options. Options that do not need a single letter
> equivalent can specify identifiers > 255. This is done in client_priv.h:
>
> enum options_client
> {
> OPT_CHARSETS_DIR=256, OPT_DEFAULT_CHARSET,
> OPT_PAGER, OPT_NOPAGER, OPT_TEE, OPT_NOTEE,
> ...
>
> for options common to all clients. Client-specific options use numbers
> above these. I selected 1024, which feels safe. But now I found an
> example in mysqltest.cc, how to do it better:
>
> enum {
> OPT_SKIP_SAFEMALLOC=OPT_MAX_CLIENT_OPTION,
> OPT_PS_PROTOCOL, OPT_SP_PROTOCOL, OPT_CURSOR_PROTOCOL,
> ...
>
> I'll change 1024 to OPT_MAX_CLIENT_OPTION and add a comment.
Good.
>
> ...
>>> +static struct my_option my_long_options[] =
>>> +{
>>> + /* General options. */
>> Isn't there some way that clients can share this code? It does not
>> seem right to have to copy&paste this for every new client.
>
>
> Since this is a list of initializers to an array of structs, we would
> probably need a macro to do this. Or a bunch of macros that combine
> related options each. Anyhow, it is probably out of scope of this task.
I agree.
>
> ...
>>> + puts("By Sun Microsystems, for your professional use\n\
>>> +This software comes with NO WARRANTY: This is free software,\n\
>>> +and you are welcome to modify and redistribute it under the GPL
>> license\n");
>>> +
>> Have you checked that this is correct? Should not some copyright
>> notice with year be included (e.g., see mysqld)?
>
>
> Hm. I copied it from mysqlbinlog and changed the name. It seems that
> every client displays a different text. Perhaps we could also leave it
> away. Or check with someone authority, what we should print.
Maybe we should check with build or documentation people.
>
> ...
>>> +/**
>>> + Print a human readable number.
>> I do not think "number" is right here. Isn't it a size in bytes?
>
>
> Doesn't include a size a number too? I use the function for the file
> size and the number of bytes per table-data chunk. So what do you
> suggest to write here? "human readable number of bytes"? Or just add
> "K", "M", ... and leave "bytes" to be printed outside of the function?
It is not a big issue, but my point is that numbers are a sequence of
digits (and +, -, and .) What is printed here is a number AND a unit
(bytes).
>
> ...
>>> + if (!opt_mysqlbackup_exact)
>>> + {
>>> + /* Shrink value while it's > 16K and postfix list is not at
end. */
>>> + while (((value > 16383) || (value < -16384)) &&
*(++postfix_p))
>>> + value/= 1024;
>>> + /*
>>> + In case that we terminated the loop due to end of the postfixes
>>> + list, we need to compensate the loop's ++postfix_p so that we
>>> + point to the last non-NULL postfix. Note that this can only
happen
>>> + on machines where longlong has more than 64 bits.
>>> + */
>>> + if (!*postfix_p)
>>> + --postfix_p; /* purecov: inspected */
>> I am not very fond of dead code like this. It is not possible to
>> test, and it took me some time to verify that it is actually correct.
>> I would guess that if longlong ever becomes something more than 64
>> bits, there are bigger problems that need to be solved ...
>
>
> Possible. But do you suggest to consciously leave this unsolved? Didn't
> we face enough problems with code that didn't expect "long" could be
> longer than 32 bits? If we will get problems with code that doesn't
> think of longlong to be longer than 64 bits ever, do we want to
> consciously add yet another one here?
It does not seem to me that spending time on fixing .1% of the issue
we will have if that happens is worthwhile.
>
> ...
>>> + case BSTREAM_IT_CHARSET:
>>> + case BSTREAM_IT_USER:
>>> + case BSTREAM_IT_TABLESPACE:
>>> + {
>>> + struct st_backup_global *bup_global= (struct st_backup_global*)
>> item;
>>> + type_s= bup_global->glb_typename;
>>> + mdata= &bup_global->glb_metadata;
>>> + DBUG_PRINT("bupstrm", ("%s", type_s));
>>> + break;
>>> + }
>> In the above code block, the order of the statements is different from
>> the others below. I think it would be good with some symmetry here.
>
>
> Ok. Which order do you like best?
I do not really care which order. I just feel that some symmetry
makes the code easier to read.
>
> ...
>>> + case BSTREAM_IT_SPROC:
>>> + DBUG_PRINT("bupstrm", ("sproc"));
>>> + type_s= "Sproc";
>>> + break;
>>> + case BSTREAM_IT_SFUNC:
>>> + DBUG_PRINT("bupstrm", ("sfunc"));
>>> + type_s= "Sfunc";
>>> + break;
>> Is sproc and sfunc well known abbreviations? Does the s mean stored or
>> sql?
>> I think it would be good to be more explicit here.
>
>
> Sure, but look. At the moment I print:
>
> Table 'mysqltest1'.'t3'
> Sproc 'mysqltest1'.'p1'
> Sproc 'mysqltest1'.'p2'
> Sfunc 'mysqltest1'.'f1'
> Sfunc 'mysqltest1'.'f2'
> View 'mysqltest1'.'v1'
> View 'mysqltest1'.'v2'
> Event 'mysqltest1'.'e1'
> Event 'mysqltest1'.'e2'
> Privilege 'mysqltest1'.'<empty>'
>
> If I change to "stored procedure", the indentation of the object names
> will become so wide that it may be difficult to associate the type name
> with the object name for "table", "view", and "event". For table one
> could print "relation", which is longer, but I don't know how to make
> "view" longer. So what shall I do?
I see your point. Do really "stored" need to be included here? Would
just using "procedure" and "function" introduce ambiguity?
>
> ...
>>> +/**
>>> + Print error from parsing the search name.
>>> +
>>> + @param[in] format printf-style format string
>>> + @param[in] args list of arguments for the format
string
>>> +*/
>>> +
>>> +static void error_parse_search_name(const char *format, ...)
>>> + ATTRIBUTE_FORMAT(printf, 1, 2);
>>> +static void error_parse_search_name(const char *format, ...)
>> I do not quite understand what is going on here, but is this function
>> really necesssary? It seems like all you achieve is 3 less times of
>> typing "Cannot parse search name.\n". If you really wanted to save
>> typing, you could save more by using a macro.
>
>
> I have been asked to make functions for even less savings. Typing is not
> a problem. Copy-n-paste would be the natural choice. The (small) wins
> are less string space, and less line breaks.
>
> Since a macro cannot have a variable number of arguments, you probably
> mean to add the string through a macro, e.g.
> "error_parse_search_name(CPSN "Unrecognized stuff at '%s'\n", tend);"?
> But then you probably want a meaningful macro name, and I'll get line
> breaks again. Or did you think of something else? It would be good if
> you could make more concrete suggestions in a review.
>
My suggestion would be to use errm_with_prefix directly where needed
and add the "Cannot parse ..." directly to the string.
> ...
>>> +/**
>>> + Parse the search name.
>>> +
>>> + Split search name into database name, if present, and object name.
>>> +
>>> + @param[in] search_name search name
>>> + @param[out] search_database_name_p database name
>>> + @param[out] search_object_name_p object name
>>> +
>>> + @return status
>>> + @retval 0 ok
>>> + @retval != 0 error
>>> +*/
>>> +
>>> +static int parse_search_name(const char *search_name,
>>> + char **search_database_name_p,
>>> + char **search_object_name_p)
>>> +{
>> It seems to me like you are re-inventing the wheel here. Parsing for
>> quoted strings etc. must have been done before. Can't some existing
>> code be reused?
>
>
> Which existing code do you mean?
I am not sure, but I would think that MySQL would need code to handle
quoted strings (e.g., the interactive client).
>
> ...
>>> + /* Skip dot. */
>>> + tend++;
>>> + /* We need an object name. Skip space. */
>>> + for (token= tend; isspace(*token); token++) {}
>> Is it supposed to be OK with spaces after the dot?
>
>
> Well, I defined the syntax to my liking. :) And yes, I did explicitly
> allow for it. This having said, I realized that I didn't have a test for
> it. Fixed.
>
> I guess that no one ever wants to type a space after the dot (which
> requires escaping from the shell after all), but if the command is built
> by a script, it might be helpful if the program accepts space at
> this place.
I agree, and I checked that the interactive command-line sql client
also allow spaces after dot.
>
>>> + /* Check for quoted search string. If unquoted, search for
end. */
>>> + quote= '\0';
>>> + switch (*token) {
>>> + case '\'':
>>> + case '\"':
>>> + case '`':
>>> + quote= *(token++);
>>> + }
>>> + /* Find end of token. */
>>> + if (quote == '\0')
>>> + for (tend= token; *tend && !isspace(*tend); tend++) {}
>>> + else
>>> + for (tend= token; *tend && (*tend != quote); tend++) {}
>> A lot of duplicate code here. I suggest extracting the quoted string
>> stuff into a separate method which can be called several times.
>
>
> See, what I meant above ("I have been asked to make functions for even
> less savings")? You suggested, not to make a function for adding a
> string at four places, but want a function to replace a couple of
> instructions at two places?
I think it is a huge difference between reusing a simple string and
simplifying an algorithm by extracting common sub-operations. As far
as I can tell (count), we are talking about moire than 20 lines of
code that are almost identical. (Only difference is whether quote is
initialized to '.' or '\0')
>
> The function would need to take quote, token, and tend as parameters,
> and return object. Tricky to select a name that makes its effect
> obvious. Not simple to document its operation. Reading the parser one
> would have to jump between two functions to see, how they interact in
> setting the four variables.
>
> I'd not be happy to waste time on this, but if you insist...
I am not insisting on anything. It is your code. However, this issue
is more important to me than most of my other comments. By extracting
common code into a function, one will improve readability and
maintainability. One will immediately see that the code is doing the
same thing twice. This was not evident when I was reading the code,
which meant that it took me longer than necessary to understand what
the code was doing.
>
> ...
>>> + /* Now we must be at string end. */
>>> + if (*tend)
>>> + {
>>> + error_parse_search_name("Unrecognized stuff at '%s'\n",
tend);
>>> + goto end;
>> stuff? I suggest content or something more "respectful".
>
>
> :) I searched for "stuff" through the MySQL sources and found > 300
> lines. I found it in > 300 MySQL related emails in my archive. I found
> it in > 200 Linux man pages. It seems to be the way people are talking.
>
> Anyway, what do you suggest as a replacement?
content? However, this is not a big issue.
>
> ...
>>> + printf("Searching '%s'.'%s'\n", search_database_name,
>> search_object_name);
>>
>> I suggest "Searching for ..."
>
>
> You love linebreaks, do you? ;-) Ok. fixed. BTW. Thanks for a suggestion
> here.
>
> Ah. And do you think the command line option should be --search-for
> instead of --search? Also, is it ok to have "search" without "for" in
> variable names?
I do not think you need to change the names. I think there a
different standards for what a presented to programmers and to end
users. I think we need to be more careful about the visible part of
the product.
>
> ...
>> I suggest a comment here. Later blocks of this method have one.
>
>
> Ok.
>
>>> + for (search_array_p= search_array_array; *search_array_p;
>> search_array_p++)
>>> + {
>>> + for (idx= 0; idx < (*search_array_p)->elements; idx++)
>
>
> ...
>>> + if (!my_wildcmp(&my_charset_latin1, (char*) name->begin,
>>> + (char*) name->end, (char*) search_database_name,
>>> + (char*) search_database_name +
>> search_database_name_len,
>>> + '\\', '_', '%'))
>>> + {
>>> + DYNAMIC_ARRAY **search_array_p;
>>> + DYNAMIC_ARRAY *search_array_array[]= {
>>> + &bup_database->db_tables,
>>> + &bup_database->db_perdbs,
>>> + NULL
>>> + };
>>> + struct st_backup_perdb *bup_perdb;
>>> + uint jdx;
>> You are repeating the same pattern here. Please, consider if some
>> common code can be extracted into a separate method.
>
>
> Done.
Considered or extracted? ;-)
>
> ...
>>> +read_and_print_summary(void *image_handle,
>>> + struct st_backup_catalog *bup_catalog)
>>> +{
> ...
>>> + brc= backup_read_summary(image_handle, bup_catalog);
>> Why is bup_catalog the parameter here? It seems backup_read_summary
>> only uses the header internally, so why not use the header instead? I
>> guess the same holds for this method.
>
>
> I like it better to use a common "handle" for all functions of an
> interface. That way I don't need to change the signature when I change
> the implementation.
Limiting what is shared between units of code (low coupling) improves
modularization and makes it clearer what is actually used. However, I
see your point.
>
> ...
>>> + /*
>>> + Print summary.
>>> + */
>>> + if (opt_mysqlbackup_summary)
>> Is it necessary to read the summary if it is not supposed to be
>> printed?
>
>
> Yes. In case of an inline summary, we need to read over it until the
> next chunk, even if we don't want to print it.
>
>> Anyway, why not let the caller call backup_read_summary
>> directly and just provide a method for printing?
>
>
> Are you joking with me, or do you want to frustrate me? I made a
> function to print error at four places, you don't want it. I did not
> make a function from two slightly different instruction sequences for
> parsing, so you want one. I made a function to combine reading and
> printing of summary to be called from two places with exactly the same
> instructions, you don't want it. (?!?)
This is not an exact science. There are many different aspects to
consider, and probably many different opinions. That you are doing
this in two places is an argument for keeping it as it is. Some
arguments against:
- This function is not always doing read_and_print. It only prints
if that is asked for. Changing the code where
read_and_print_summary is called like this would make that more
evident:
backup_read_summary(...)
if (opt_mysqlbackup_summary)
print_summary(...) // New method that does the print part
- The current function is a bit assymetric since you call another method
for reading, but do the printing in the function. (This is not a
major point.)
The final decision is all up to you.
>
>> A general design
>> advice is to let methods have just one responsibility.
>
>
> This is pretty new to me. I learned to develop in hierarchies. The
> bottom methods may have just one responsibility. But already the next
> higher level calls two or more methods from the lower level and thus has
> two or more responsibilities.
The point is that we are talking about different levels of
abstraction. The next higher level should have a single
responsibility as seen from the caller's level of abstraction.
>
> See for example Backup_restore_ctx::do_backup(). It has the
> resposibilities logging, locking, starting drivers, polling drivers,
> writing image file, and so on.
No, at the caller's abstraction level, it has the single
responsibility of doing backup. At a lower abstraction level, this
involves logging, locking, ...
>
> ...
>>> + /*
>>> + Binlog coordinates are present, if the BSTREAM_FLAG_BINLOG
flag is
>>> + set in the header *and* table data are in the image.
>>> + DBUG_ASSERT(!(hdr->flags & BSTREAM_FLAG_BINLOG) ||
>>> + hdr->binlog_pos.file || hdr->binlog_group.file);
>>> + */
>> Why is this disabled?
>
>
> At the time of committing this version of the patch, there was a bug in
> the backup code. This is fixed now and a new version of my patch hasn't
> anything disabled at this place.
Good.
>
> ...
>>> +int main(int argc, char** argv)
>> This method is very long. I suggest splitting into several methods,
>> each with a single responsibility.
>
>
> Can you please define "single responsibility"? I have the impression
> that I won't get it right without such definition.
My suggestion is to put the parts that prints stuff in separate
methods. (E.g., print backup stram header, print snapshot section,
print catalog, print meta data). The reading seems to already happen
in separate methods.
>
>>> +{
>>> + char **defaults_argv;
>>> + void *image_handle;
>>> + char *search_database_name= NULL;
>>> + char *search_object_name= NULL;
>>> + struct st_backup_catalog *bup_catalog= NULL;
>>> + struct st_bstream_image_header *hdr;
>>> + enum enum_bstream_ret_codes brc;
>>> + int rc;
>>> + uint idx;
>>> + int errpos= 0;
>>> + char llbuff[22];
>> Some explanations for the different local variables would be good.
>> In particular, I would like to know the significance of the different
>> values for rc.
>
>
> Can you please point me at some code, where this is done exemplarily?
I am not looking for something very advanced here. Just a line for
each variable that explains it's purpose. When a method is a long as
this one, it is not straight-forward to see what a variable is used
for, and an explanation would be good.
>
> ...
>>> + if (hdr->flags && opt_verbose)
>>> + {
>>> + const char *separators[]= {"", ", "};
>>> + int separator= 0;
>> This separator trick is pretty need, but it took me some time to
>> understand so a comment would be good.
>
>
> What means "pretty need"? "nasty"? Anyway, I tried to explain this in
> detail. But it became quite long and still complicated. So I chose a
> simpler solution with a simpler comment.
Sorry, I meant "neat".
>
> ...
>>> + if (hdr->flags & BSTREAM_FLAG_INLINE_SUMMARY)
>> Is it not possible to test this?
>
>
> As far as I know, the current backup code won't create an image with
> inline summary. Spending an error injection point seems to be overkill.
> The same call to read_and_print_summary() is tested with end-summary
> later in the function.
OK.
>
> ...
>>> + /*
>>> + The first two character sets are special:
>>> + 1. Character set to use for interpreting the backup file.
>>> + 2. Server character set.
>>> + These do not count as catalog items.
>> I have been told this a few times already. This might be an
>> indication that some code could be extracted and shared, or just that
>> this comment is superfluous.
>
>
> From my point of view, the problem is the special handling of the first
> two character sets. This makes the code pretty spaghetti.
>
> I am unsure, which comment to remove. Not everyone is a reviewer,
> reading all of the code. If you are sure that everybody will find the
> relevant comment, regardless of which part of the code he looks at, then
> tell me which comment to leave in and which to remove.
My suggestion is to explain this thoroughly as you do the first time.
In the other places, you could just say "Skip first two character
sets. They have been displayed in the header section", or something
like that.
>
> ...
>>> + if (opt_mysqlbackup_search &&
>>> + !opt_mysqlbackup_metadata_statements &&
>>> + !opt_mysqlbackup_metadata_extra)
>>> + {
>>> + search_object(bup_catalog, search_database_name,
>> search_object_name);
>>> + goto end;
>>> + }
>>
>> I do not understand why the searching is split in two places.
>
>
> It isn't split. It is done at one of two places, depending on command
> line options. But I found a way to avoid it.
Good.
>
>> Why
>> could not searching be done here even if one is interested in metadata?
>
>
> During the search, every found item is printed immediately. If we don't
> have read the meta data yet, we can't print it. But as said above, I
> changed it.
>
> ...
>>> + use_err:
>>> + usage();
>>> + rc= 1;
>>> +
>> Instead of a particular use_err target, why to just go to end and use
>> the value of rc to determine whether usage should be printed.
>
>
> I do not want to print the usage one every error, but only for wrong
> usage, so to say (wrong command line option, including invalid search
> string, wrong number of command line arguments). Using a special value
> for rc might do the trick, but requires careful documentation. What
> would be the win?
I would like careful documentation of rc anyhow :-)
>
>>> + end:
>>> + /* Flush stdout to get output in right order in case of error
>> messages. */
>>> + fflush(stdout);
>>> + /* Cleanup. */
>>> + switch (errpos) {
>>> + default:
>>> + case 40:
>>> + brc= backup_image_close(image_handle);
>>> + if (brc != BSTREAM_OK)
>>> + rc= 2;
>>> + case 30:
>>> + backup_catalog_free(bup_catalog);
>>> + case 25:
>>> + my_free(search_object_name, MYF(MY_ALLOW_ZERO_PTR));
>>> + my_free(search_database_name, MYF(MY_ALLOW_ZERO_PTR));
>>> + case 20:
>>> +#ifdef MYSQL_CLIENT
>>> + my_free((char*) opt_database, MYF(MY_ALLOW_ZERO_PTR));
>>> + my_free((char*) opt_host, MYF(MY_ALLOW_ZERO_PTR));
>>> + my_free((char*) opt_pass, MYF(MY_ALLOW_ZERO_PTR));
>>> + my_free((char*) opt_user, MYF(MY_ALLOW_ZERO_PTR));
>>> +#endif
>> Are you sure that this clean-up is not needed if handle_option() fail?
>> Or to state it differently, are you sure that handle_option() does
>> the necessary cleanup should it fail? Anyway, why bother with errpos
>> when doing it in MY_ALLOW_ZERO_PTR mode. Would it not be OK to call
>> free for all pointers regardless of when it failed?
>
>
> True. Fixed.
>
> ...
>
> Regards
> Ingo
