List:Commits« Previous MessageNext Message »
From:Rafal Somla Date:March 3 2008 1:15pm
Subject:Re: your review of WL#4212
View as plain text  
Chuck,

Thanks for review and being open in the discussion. I think we've come to an 
agreement. Please just check below if I'm not mistaken in thinking that the code 
for selecting backup engine is actually correct.

I'll start working on the merge now - expect a merge patch from me by this 
Wednesday.

Rafal

Chuck Bell wrote:
> Hi Rafal,
> 
> Thanks for your excellent work on my lengthy review requests. Your
> objections to some of my requests are noted and we should probably ink those
> things on an agenda for a team meeting. Specifically, let's discuss coding
> guidelines (or the lack thereof) and reviewer/reviewed responsibility
> boundaries. I think we can come to some solid, healthy agreements that will
> benefit all.
> 
> I especially appreciate some of your comment and code formatting changes. I
> think these improve the maintainability and readability of the code greatly
> -- excellent work!
> 
> The patch looks good. I noted a couple of small things. I don't need to see
> another patch to review until you've changed the code to work with the
> latest mysql-6.0-backup tree. 
> 
>> @@ -438,6 +437,12 @@ int Backup_restore_ctx::prepare(LEX_STRI
>>    which objects to backup. NULL if an error was detected.
>>    
>>    @note This function reports errors.
>> +
>> +  @note It is important that changes of meta-data are blocked as part of
> the
>> +  preparations. The set of server objects and their definitions should
> not
>> +  change after the backup context has been prepared and before the actual
> backup
>> +  is performed using @c do_backup() method. In particular, meta-data
> should be 
>> +  frozen when 
>>   */ 
>>  Backup_info* Backup_restore_ctx::prepare_for_backup(LEX_STRING location)
> 
> The comment leaves me hanging. Was there something else you wanted to say
> there? I don't need to see this one fixed, just fix it with your next patch.
> 

Right -- the last, unfinished sentence is a left-over from previous versions of 
this comment. I removed it.

> 
> Please double check that the FIXME's that you know are future needs are
> added to the @todo lists so we can see them in doxygen. Again, I don't need
> to review this again.
> 

Good point. I reviewed my FIXMEs and most of them turned out to be obsolete. In 
one place I added explanations and @todo in doxygen documentation.

> One last thing...while researching a bug, I found another bug in the kernel
> code. Since I cannot apply your patches to the trees that I have (long story
> -- I'd rather wait until the merge patch), I'd like to ask you to examine
> your code for this bug. Also, since you've already done extensive
> modifications to the code it would be silly for me to open a bug report on
> it. Here's your chance to prevent it! ;)
> 
> @line#557 in old kernel.cc:
>>   /*
>>     Try default drivers in decreasing order of preferrence
>>     (first snapshot, then default)
>>   */
>>   for (no=1; no >=0; --no)
>>   {
>>     if (!m_snap[no]->accept(tbl,hton))
>>      continue;
>>
>>     DBUG_RETURN(no);
>>   }
>>
>>   report_error(ER_BACKUP_NO_BACKUP_DRIVER,tbl.describe(buf,sizeof(buf)));
>>   DBUG_RETURN(-1);
> 
> The problem is if the default drivers do not accept the table, this loop
> terminates but does not report the error. Instead, it returns 0 (no == 0)
> and that causes code upstream to fail (the DBUG_PRINT goes whacky first but
> other things get crazy from there). I have implemented the following change
> to the old code so that I can continue to work on the bug report, but it is
> best that you ensure that the error is corrected with your refactored code.
>

Please double-check my thinking on this. I think the above loop is corect, 
because the DBUG_RETURN(no) statement inside the loop is reached only if one of 
the two snapshots accepts the table. If not, then "continue" will be executed 
twice and the loop will terminate when 'no' drops below zero. Then the error 
will be reported and -1  returned to indicate the problem. The returned value is 
checked inside Backup_info::add_table() and the problem is reported up the call 
stack. If this doesn't work, the reason must be something else...


> --- a/sql/backup/kernel.cc	2008-02-15 10:57:49 -05:00
> +++ b/sql/backup/kernel.cc	2008-02-28 12:44:45 -05:00
> @@ -561,11 +561,16 @@ int Backup_info::find_backup_engine(cons
>    for (no=1; no >=0; --no)
>    {
>      if (!m_snap[no]->accept(tbl,hton))
> -     continue;
> +    {
> +      if (no == 0) // if default doesn't accept, it's an error
> +        goto error;
> +      continue;
> +    }
>  
>      DBUG_RETURN(no);
>    }
>  
> +error:
>   report_error(ER_BACKUP_NO_BACKUP_DRIVER,tbl.describe(buf,sizeof(buf)));
>    DBUG_RETURN(-1);
>  }
> @@ -1015,9 +1020,10 @@ Backup_info::add_table(Db_item &dbi, con
>  
>    int no= find_backup_engine(tl->table,t); // Note: reports errors
>  
> -  DBUG_PRINT("backup",(" table %s backed-up with %s engine",
> -                       t.describe(buf),
> -                       m_snap[no]->name()));
> +  if (no >= 0)
> +    DBUG_PRINT("backup",(" table %s backed-up with %s engine",
> +                         t.describe(buf),
> +                         m_snap[no]->name()));
>  
>    /*
>      If error debugging is switched on (see debug.h) then any table whose
> 
> Please check that your new code can handle the unlikely event (but possible
> as I have seen) that no drivers accept a table. I understand that the code
> does not work that way now, but this is a logic error that needs to be fixed
> so that later changes do not encounter this unusual bug.

The corresponding fragment in the new code is this:

   if (!snap)
   {
     List_iterator<Snapshot_info> it(snapshots);

     while ((snap= it++))
       if (snap->accept(tbl, se))
         break;
   }

   if (!snap)
     m_ctx.fatal_error(ER_BACKUP_NO_BACKUP_DRIVER,tbl.describe(buf));

   DBUG_RETURN(snap);

This time the find_backup_engine function should return pointer to snapshot 
object or NULL. If accepting snapshot is found in the 'snapshots' list, then the 
while loop is broken with 'snap' pointing at this snapshot. If no snapshot 
accepts the table, then 'snap' will remain NULL, thus error will be reported and 
NULL returned by the function. Do you agree that this is correct?

Rafal
Thread
Re: your review of WL#4212Rafal Somla26 Feb
  • RE: your review of WL#4212Chuck Bell26 Feb
    • Re: your review of WL#4212Rafal Somla27 Feb
      • RE: your review of WL#4212Chuck Bell29 Feb
        • Re: your review of WL#4212Rafal Somla3 Mar
          • RE: your review of WL#4212Chuck Bell3 Mar