List:Commits« Previous MessageNext Message »
From:Ingo Strüwing Date:February 18 2009 9:57pm
Subject:Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)
WL#4630
View as plain text  
Hi Øystein,

thank you for the review. You had some valuable comments and found some
bugs.

However, in a couple of cases you complained without proposing
alternatives. Please see below.

Øystein Grøvlen, 17.02.2009 14:57:

...
> Ingo Struewing wrote:
...
>>  2745 Ingo Struewing    2008-11-25
>>       WL#4630 - ST: MySQL Backup Client Program - Milestone 1
>>
>>       The MySQL Backup Client Program is a command line client
>>       program, that examines and displays the contents of a
>>       backup image.
>>
>>       By command line options you can select different types
>>       of information to view.
>>
>>       It does also have a search capability for finding
>>       specific database objects in the backup image.
>>
>>       For a detailed explanation of the options, run
>>
>>           mysqlbackup --help
...
>> +static const char *load_default_groups[]= { "mysqlbackup",
>> +#ifdef MYSQL_CLIENT
>> +                                            "client",
>> +#endif
> 
> Since MYSQL_CLIENT was undef just above, is this just for future
> extensions?


Yes. In a later milestone, mysqlbackup shall connect to the server and
initiate backups. At that point, I need to include the client part.

...
>> +#ifdef MYSQL_CLIENT
>> +/* Options required by a MySQL client (connecting to a server). */
>> +static const char       *opt_database;
> 
> As far as I can tell, this is usually called opt_databases in other
> clients.  Any reason it should be different here?


I found "opt_databases" in mysqlcheck.c and mysqldump.c only. For these
clients, the option means to check/dump several databases instead of a
list of tables. So it is different from the option that supplies the
default database name for the connection. For the latter, mysqlbinlog.cc
has "database", mysql.cc has "current_db", mysqltest.cc has "opt_db". So
it seems it *must* be different somehow. ;-)

...
>> +enum options_mysqlbackup
>> +{
>> +  OPT_MYSQLBACKUP_CATALOG_SUMMARY=1024,
> 
> What is the magic of 1024?


Rafal asked the same. Here my reply:

The mysys function handle_options(), my_print_help(), and
my_print_variables() take an options array as argument, which identifies
the options by an integer value. This value is a single character for
many options like 'v' for verbose. Many characters are used up for
standard MySQL options. Options that do not need a single letter
equivalent can specify identifiers > 255. This is done in client_priv.h:

enum options_client
{
  OPT_CHARSETS_DIR=256, OPT_DEFAULT_CHARSET,
  OPT_PAGER, OPT_NOPAGER, OPT_TEE, OPT_NOTEE,
  ...

for options common to all clients. Client-specific options use numbers
above these. I selected 1024, which feels safe. But now I found an
example in mysqltest.cc, how to do it better:

enum {
  OPT_SKIP_SAFEMALLOC=OPT_MAX_CLIENT_OPTION,
  OPT_PS_PROTOCOL, OPT_SP_PROTOCOL, OPT_CURSOR_PROTOCOL,
  ...

I'll change 1024 to OPT_MAX_CLIENT_OPTION and add a comment.

...
>> +static struct my_option my_long_options[] =
>> +{
>> +  /* General options. */
> 
> Isn't there some way that clients can share this code? It does not
> seem right to have to copy&paste this for every new client.


Since this is a list of initializers to an array of structs, we would
probably need a macro to do this. Or a bunch of macros that combine
related options each. Anyhow, it is probably out of scope of this task.

...
>> +  puts("By Sun Microsystems, for your professional use\n\
>> +This software comes with NO WARRANTY:  This is free software,\n\
>> +and you are welcome to modify and redistribute it under the GPL
> license\n");
>> +
> 
> Have you checked that this is correct?  Should not some copyright
> notice with year be included (e.g., see mysqld)?


Hm. I copied it from mysqlbinlog and changed the name. It seems that
every client displays a different text. Perhaps we could also leave it
away. Or check with someone authority, what we should print.

...
>> +/**
>> +  Print a human readable number.
> 
> I do not think "number" is right here.  Isn't it a size in bytes?


Doesn't include a size a number too? I use the function for the file
size and the number of bytes per table-data chunk. So what do you
suggest to write here? "human readable number of bytes"? Or just add
"K", "M", ... and leave "bytes" to be printed outside of the function?

...
>> +  if (!opt_mysqlbackup_exact)
>> +  {
>> +    /* Shrink value while it's > 16K and postfix list is not at end. */
>> +    while (((value > 16383) || (value < -16384)) &&
> *(++postfix_p))
>> +      value/= 1024;
>> +    /*
>> +      In case that we terminated the loop due to end of the postfixes
>> +      list, we need to compensate the loop's ++postfix_p so that we
>> +      point to the last non-NULL postfix. Note that this can only happen
>> +      on machines where longlong has more than 64 bits.
>> +    */
>> +    if (!*postfix_p)
>> +      --postfix_p; /* purecov: inspected */
> 
> I am not very fond of dead code like this.  It is not possible to
> test, and it took me some time to verify that it is actually correct.
> I would guess that if longlong ever becomes something more than 64
> bits, there are bigger problems that need to be solved ...


Possible. But do you suggest to consciously leave this unsolved? Didn't
we face enough problems with code that didn't expect "long" could be
longer than 32 bits? If we will get problems with code that doesn't
think of longlong to be longer than 64 bits ever, do we want to
consciously add yet another one here?

...
>> +  case BSTREAM_IT_CHARSET:
>> +  case BSTREAM_IT_USER:
>> +  case BSTREAM_IT_TABLESPACE:
>> +  {
>> +    struct st_backup_global *bup_global= (struct st_backup_global*)
> item;
>> +    type_s= bup_global->glb_typename;
>> +    mdata= &bup_global->glb_metadata;
>> +    DBUG_PRINT("bupstrm", ("%s", type_s));
>> +    break;
>> +  }
> 
> In the above code block, the order of the statements is different from
> the others below.  I think it would be good with some symmetry here.


Ok. Which order do you like best?

...
>> +    case BSTREAM_IT_SPROC:
>> +      DBUG_PRINT("bupstrm", ("sproc"));
>> +      type_s= "Sproc";
>> +      break;
>> +    case BSTREAM_IT_SFUNC:
>> +      DBUG_PRINT("bupstrm", ("sfunc"));
>> +      type_s= "Sfunc";
>> +      break;
> 
> Is sproc and sfunc well known abbreviations?  Does the s mean stored or
> sql?
> I think it would be good to be more explicit here.


Sure, but look. At the moment I print:

    Table     'mysqltest1'.'t3'
    Sproc     'mysqltest1'.'p1'
    Sproc     'mysqltest1'.'p2'
    Sfunc     'mysqltest1'.'f1'
    Sfunc     'mysqltest1'.'f2'
    View      'mysqltest1'.'v1'
    View      'mysqltest1'.'v2'
    Event     'mysqltest1'.'e1'
    Event     'mysqltest1'.'e2'
    Privilege 'mysqltest1'.'<empty>'

If I change to "stored procedure", the indentation of the object names
will become so wide that it may be difficult to associate the type name
with the object name for "table", "view", and "event". For table one
could print "relation", which is longer, but I don't know how to make
"view" longer. So what shall I do?

...
>> +/**
>> +  Print error from parsing the search name.
>> +
>> +  @param[in]    format          printf-style format string
>> +  @param[in]    args            list of arguments for the format string
>> +*/
>> +
>> +static void error_parse_search_name(const char *format, ...)
>> +  ATTRIBUTE_FORMAT(printf, 1, 2);
>> +static void error_parse_search_name(const char *format, ...)
> 
> I do not quite understand what is going on here, but is this function
> really necesssary?  It seems like all you achieve is 3 less times of
> typing "Cannot parse search name.\n".  If you really wanted to save
> typing, you could save more by using a macro.


I have been asked to make functions for even less savings. Typing is not
a problem. Copy-n-paste would be the natural choice. The (small) wins
are less string space, and less line breaks.

Since a macro cannot have a variable number of arguments, you probably
mean to add the string through a macro, e.g.
"error_parse_search_name(CPSN "Unrecognized stuff at '%s'\n", tend);"?
But then you probably want a meaningful macro name, and I'll get line
breaks again. Or did you think of something else? It would be good if
you could make more concrete suggestions in a review.

...
>> +/**
>> +  Parse the search name.
>> +
>> +  Split search name into database name, if present, and object name.
>> +
>> +  @param[in]    search_name             search name
>> +  @param[out]   search_database_name_p  database name
>> +  @param[out]   search_object_name_p    object name
>> +
>> +  @return       status
>> +    @retval     0                       ok
>> +    @retval     != 0                    error
>> +*/
>> +
>> +static int parse_search_name(const char *search_name,
>> +                             char **search_database_name_p,
>> +                             char **search_object_name_p)
>> +{
> 
> It seems to me like you are re-inventing the wheel here.  Parsing for
> quoted strings etc. must have been done before.  Can't some existing
> code be reused?


Which existing code do you mean?

...
>> +      /* Skip dot. */
>> +      tend++;
>> +      /* We need an object name. Skip space. */
>> +      for (token= tend; isspace(*token); token++) {}
> 
> Is it supposed to be OK with spaces after the dot?


Well, I defined the syntax to my liking. :) And yes, I did explicitly
allow for it. This having said, I realized that I didn't have a test for
it. Fixed.

I guess that no one ever wants to type a space after the dot (which
requires escaping from the shell after all), but if the command is built
by a script, it might be helpful if the program accepts space at this place.

> 
>> +      /* Check for quoted search string. If unquoted, search for end. */
>> +      quote= '\0';
>> +      switch (*token) {
>> +      case '\'':
>> +      case '\"':
>> +      case '`':
>> +        quote= *(token++);
>> +      }
>> +      /* Find end of token. */
>> +      if (quote == '\0')
>> +        for (tend= token; *tend && !isspace(*tend); tend++) {}
>> +      else
>> +        for (tend= token; *tend && (*tend != quote); tend++) {}
> 
> A lot of duplicate code here.  I suggest extracting the quoted string
> stuff into a separate method which can be called several times.


See, what I meant above ("I have been asked to make functions for even
less savings")? You suggested, not to make a function for adding a
string at four places, but want a function to replace a couple of
instructions at two places?

The function would need to take quote, token, and tend as parameters,
and return object. Tricky to select a name that makes its effect
obvious. Not simple to document its operation. Reading the parser one
would have to jump between two functions to see, how they interact in
setting the four variables.

I'd not be happy to waste time on this, but if you insist...

...
>> +        /* Now we must be at string end. */
>> +        if (*tend)
>> +        {
>> +          error_parse_search_name("Unrecognized stuff at '%s'\n", tend);
>> +          goto end;
> 
> stuff?  I suggest content or something more "respectful".


:) I searched for "stuff" through the MySQL sources and found > 300
lines. I found it in > 300 MySQL related emails in my archive. I found
it in > 200 Linux man pages. It seems to be the way people are talking.

Anyway, what do you suggest as a replacement?

...
>> +    printf("Searching '%s'.'%s'\n", search_database_name,
> search_object_name);
> 
> I suggest "Searching for ..."


You love linebreaks, do you? ;-) Ok. fixed. BTW. Thanks for a suggestion
here.

Ah. And do you think the command line option should be --search-for
instead of --search? Also, is it ok to have "search" without "for" in
variable names?

...
> I suggest a comment here.  Later blocks of this method have one.


Ok.

> 
>> +    for (search_array_p= search_array_array; *search_array_p;
> search_array_p++)
>> +    {
>> +      for (idx= 0; idx < (*search_array_p)->elements; idx++)


...
>> +    if (!my_wildcmp(&my_charset_latin1, (char*) name->begin,
>> +                    (char*) name->end, (char*) search_database_name,
>> +                    (char*) search_database_name +
> search_database_name_len,
>> +                    '\\', '_', '%'))
>> +    {
>> +      DYNAMIC_ARRAY **search_array_p;
>> +      DYNAMIC_ARRAY *search_array_array[]= {
>> +        &bup_database->db_tables,
>> +        &bup_database->db_perdbs,
>> +        NULL
>> +      };
>> +      struct st_backup_perdb  *bup_perdb;
>> +      uint                    jdx;
> 
> You are repeating the same pattern here. Please, consider if some
> common code can be extracted into a separate method.


Done.

...
>> +read_and_print_summary(void *image_handle,
>> +                       struct st_backup_catalog *bup_catalog)
>> +{
...
>> +  brc= backup_read_summary(image_handle, bup_catalog);
> 
> Why is bup_catalog the parameter here?  It seems backup_read_summary
> only uses the header internally, so why not use the header instead?  I
> guess the same holds for this method.


I like it better to use a common "handle" for all functions of an
interface. That way I don't need to change the signature when I change
the implementation.

...
>> +  /*
>> +    Print summary.
>> +  */
>> +  if (opt_mysqlbackup_summary)
> 
> Is it necessary to read the summary if it is not supposed to be
> printed?


Yes. In case of an inline summary, we need to read over it until the
next chunk, even if we don't want to print it.

>    Anyway, why not let the caller call backup_read_summary
> directly and just provide a method for printing?


Are you joking with me, or do you want to frustrate me? I made a
function to print error at four places, you don't want it. I did not
make a function from two slightly different instruction sequences for
parsing, so you want one. I made a function to combine reading and
printing of summary to be called from two places with exactly the same
instructions, you don't want it. (?!?)

>   A general design
> advice is to let methods have just one responsibility.


This is pretty new to me. I learned to develop in hierarchies. The
bottom methods may have just one responsibility. But already the next
higher level calls two or more methods from the lower level and thus has
two or more responsibilities.

See for example Backup_restore_ctx::do_backup(). It has the
resposibilities logging, locking, starting drivers, polling drivers,
writing image file, and so on.

...
>> +    /*
>> +      Binlog coordinates are present, if the BSTREAM_FLAG_BINLOG flag is
>> +      set in the header *and* table data are in the image.
>> +      DBUG_ASSERT(!(hdr->flags & BSTREAM_FLAG_BINLOG) ||
>> +                  hdr->binlog_pos.file || hdr->binlog_group.file);
>> +    */
> 
> Why is this disabled?


At the time of committing this version of the patch, there was a bug in
the backup code. This is fixed now and a new version of my patch hasn't
anything disabled at this place.

...
>> +int main(int argc, char** argv)
> 
> This method is very long.  I suggest splitting into several methods,
> each with a single responsibility.


Can you please define "single responsibility"? I have the impression
that I won't get it right without such definition.

> 
>> +{
>> +  char                                  **defaults_argv;
>> +  void                                  *image_handle;
>> +  char                                  *search_database_name= NULL;
>> +  char                                  *search_object_name= NULL;
>> +  struct st_backup_catalog              *bup_catalog= NULL;
>> +  struct st_bstream_image_header        *hdr;
>> +  enum enum_bstream_ret_codes           brc;
>> +  int                                   rc;
>> +  uint                                  idx;
>> +  int                                   errpos= 0;
>> +  char                                  llbuff[22];
> 
> Some explanations for the different local variables would be good.
> In particular, I would like to know the significance of the different
> values for rc.


Can you please point me at some code, where this is done exemplarily?

...
>> +  if (hdr->flags && opt_verbose)
>> +  {
>> +    const char  *separators[]= {"", ", "};
>> +    int         separator= 0;
> 
> This separator trick is pretty need, but it took me some time to
> understand so a comment would be good.


What means "pretty need"? "nasty"? Anyway, I tried to explain this in
detail. But it became quite long and still complicated. So I chose a
simpler solution with a simpler comment.

...
>> +  if (hdr->flags & BSTREAM_FLAG_INLINE_SUMMARY)
> 
> Is it not possible to test this?


As far as I know, the current backup code won't create an image with
inline summary. Spending an error injection point seems to be overkill.
The same call to read_and_print_summary() is tested with end-summary
later in the function.

...
>> +      /*
>> +        The first two character sets are special:
>> +        1. Character set to use for interpreting the backup file.
>> +        2. Server character set.
>> +        These do not count as catalog items.
> 
> I have been told this a few times already.  This might be an
> indication that some code could be extracted and shared, or just that
> this comment is superfluous.


From my point of view, the problem is the special handling of the first
two character sets. This makes the code pretty spaghetti.

I am unsure, which comment to remove. Not everyone is a reviewer,
reading all of the code. If you are sure that everybody will find the
relevant comment, regardless of which part of the code he looks at, then
tell me which comment to leave in and which to remove.

...
>> +  if (opt_mysqlbackup_search &&
>> +      !opt_mysqlbackup_metadata_statements &&
>> +      !opt_mysqlbackup_metadata_extra)
>> +  {
>> +    search_object(bup_catalog, search_database_name,
> search_object_name);
>> +    goto end;
>> +  }
> 
> 
> I do not understand why the searching is split in two places.


It isn't split. It is done at one of two places, depending on command
line options. But I found a way to avoid it.

>    Why
> could not searching be done here even if one is interested in metadata?


During the search, every found item is printed immediately. If we don't
have read the meta data yet, we can't print it. But as said above, I
changed it.

...
>> + use_err:
>> +  usage();
>> +  rc= 1;
>> +
> 
> Instead of a particular use_err target, why to just go to end and use
> the value of rc to determine whether usage should be printed.


I do not want to print the usage one every error, but only for wrong
usage, so to say (wrong command line option, including invalid search
string, wrong number of command line arguments). Using a special value
for rc might do the trick, but requires careful documentation. What
would be the win?

> 
>> + end:
>> +  /* Flush stdout to get output in right order in case of error
> messages. */
>> +  fflush(stdout);
>> +  /* Cleanup. */
>> +  switch (errpos) {
>> +  default:
>> +  case 40:
>> +    brc= backup_image_close(image_handle);
>> +    if (brc != BSTREAM_OK)
>> +      rc= 2;
>> +  case 30:
>> +    backup_catalog_free(bup_catalog);
>> +  case 25:
>> +    my_free(search_object_name, MYF(MY_ALLOW_ZERO_PTR));
>> +    my_free(search_database_name, MYF(MY_ALLOW_ZERO_PTR));
>> +  case 20:
>> +#ifdef MYSQL_CLIENT
>> +    my_free((char*) opt_database, MYF(MY_ALLOW_ZERO_PTR));
>> +    my_free((char*) opt_host, MYF(MY_ALLOW_ZERO_PTR));
>> +    my_free((char*) opt_pass, MYF(MY_ALLOW_ZERO_PTR));
>> +    my_free((char*) opt_user, MYF(MY_ALLOW_ZERO_PTR));
>> +#endif
> 
> Are you sure that this clean-up is not needed if handle_option() fail?
> Or to state it differently, are you sure that handle_option() does
> the necessary cleanup should it fail?  Anyway, why bother with errpos
> when doing it in MY_ALLOW_ZERO_PTR mode.  Would it not be OK to call
> free for all pointers regardless of when it failed?


True. Fixed.

...

Regards
Ingo
-- 
Ingo Strüwing, Database Group
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schröder, Wolfgang Engels, Dr. Roland Bömer
Vorsitzender des Aufsichtsrates: Martin Häring   HRB München 161028
Thread
bzr commit into mysql-6.0-backup branch (ingo.struewing:2745) WL#4630Ingo Struewing25 Nov
  • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Øystein Grøvlen11 Dec
    • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing12 Dec
      • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Rafal Somla19 Dec
        • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing10 Feb
          • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Rafal Somla10 Feb
  • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Rafal Somla22 Dec
    • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing12 Feb
      • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Rafal Somla17 Feb
        • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing17 Feb
  • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Øystein Grøvlen17 Feb
    • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing18 Feb
      • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Øystein Grøvlen19 Feb
        • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2745)WL#4630Ingo Strüwing21 Feb