List:Commits« Previous MessageNext Message »
From:Charles Bell Date:August 24 2009 7:41pm
Subject:Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722
View as plain text  
Lars, Rafal,

This is my reply to Rafal's comments to my comments to his comments...

Items 1-10 were under the 'REQUESTS' section of his latest review and 
11-13 were under 'SUGGESTIONS' section.

I think this is ready to go as I have complied with all 10 of his 
demands. I would suggest you read through this and if you agree, please 
either allow me to push the patch as-is or perhaps have Jorgen review 
the patch and emails to see if he concurs that I have met the demands of 
the reviewers.

Chuck

Rafal Somla wrote:
> Hi Chuck,
> 
> Please reconsider your decisions. See my comments below.
> 
> Charles Bell wrote:
>> Rafal,
>>
>> This is a formal reply to your first review. I have a new patch ready 
>> and will submit that soon for review.
>>
>> I have also asked that this be put on the backup decision meeting 
>> agenda in anticipation of your rejection of my replies.
>>
>> Note: I have implemented all of Jorgen's requests and suggestions 
>> where possible.
>>
>>> REQUESTS
>>> --------
>>> 1. Use a different method for testing errors during insertion of 
>>> various object types into the catalogue (ER_BACKUP_CATALOG_ADD_*) and 
>>> when restoring objects (ER_BACKUP_CANT_RESTORE_*/ER_BACKUP_GET_META_*).
>>
>> I see your request as an alternative implementation that is actually 
>> more complex that my original patch and not a case where the code is 
>> wrong. Your version is more complex because the three areas where this 
>> occurs will mean three copies of something very similar to what I have 
>> with sync points instead of debug execute if. Furthermore, your 
>> version will make the test cases much more complex. Therefore, I must 
>> reject this alternative as it does not invalidate nor improve the 
>> original patch.
>>
>> STATUS: alternative implementation is rejected
>>
> 
> It is a pity I think, but I accept your decision as the solution you 
> proposed will work. In that case I replace my request 1 by two other 
> requests:
> 
> 1a. Guard the added extra debug code (the one which sets the name of the 
> error insertion point in str variable) with appropriate #ifdef so that 
> it is not included in production code.
> 

I think this is extreme overkill and evidence that you do not trust the 
code I have written. I include the #ifdef as you demand but under 
extreme protest.

STATUS: Complied.

> 1b. Replace this fragment:
> 
> +  if (str.ptr())
> +  {
> +    DBUG_EXECUTE_IF(str.ptr(),
> +      m_log.report_error(error, db.name().ptr(), obj->get_name()->ptr());
> +      return NULL;);
> +  }
> 
> with:
> 
> +  if (str.ptr())
> +  {
> +    DBUG_EXECUTE_IF(str.ptr(), res= get_dep_node_res::ERROR;);
> +  }
> 
> so that the error handling branch in the original code is entered.
> 

STATUS: complied

>>> 2. Do not set Backup_pump::m_name member upon error insertion - it 
>>> should be already correctly initialized.
>>
>> As I have explained, this is so that the tests will be deterministic. 
>> Reminder: This form of testing can only show the error handling for a 
>> given error is possible, not that it is 100% error free.
>>
>> Since you do not believe me that the data I have masked is 
>> non-deterministic, consider this modification:
>>
>> (data_backup.cc @ 1279)
>>
>> from:
>>
>>   // Error code insertion
>>   DBUG_EXECUTE_IF("ER_BACKUP_PREPARE_DRIVER",
>>     res= ERROR; m_name= "Default";);
>>
>> to:
>>
>>   // Error code insertion
>>   DBUG_EXECUTE_IF("ER_BACKUP_PREPARE_DRIVER", res= ERROR;);
>>
>> As recorded without m_name change:
>>
>> # Show warning/error from progress and history log
>> # if they exist.
>> #
>> SELECT * FROM mysql.backup_progress WHERE error_num <> 0;
>> backup_id    object    error_num    notes
>> #        ####    Default backup driver can't create its validity point
>> PURGE BACKUP LOGS;
>>
>> And after several runs, this happens. Ooops, who'da thunk it...it 
>> really is non-deterministic!
>>
>> CURRENT_TEST: backup.backup_errors_debug_2
>> --- 
>> d:/source/bzr/mysql-6.0-wl-4722/mysql-test/suite/backup/r/backup_errors_deb 
>>
>> g_2.result      2009-08-20 21:44:42.337506400 +0300
>> +++ 
>> d:\source\bzr\mysql-6.0-wl-4722\mysql-test\suite\backup\r\backup_errors_deb 
>>
>> g_2.reject      2009-08-20 21:45:39.293106400 +0300
>> @@ -53,7 +53,7 @@
>>  # Execute backup
>>  #
>>  BACKUP DATABASE backup_test TO 'backup_test_err.bak';
>> -ERROR HY000: Default backup driver can't create its validity point
>> +ERROR HY000: MyISAM backup driver can't create its validity point
>>  #
>>  # Test case for error ER_BACKUP_CREATE_VP PASSED.
>>  #
>> @@ -63,7 +63,7 @@
>>  #
>>  SELECT * FROM mysql.backup_progress WHERE error_num <> 0;
>>  backup_id      object  error_num       notes
>> -#              ####    Default backup driver can't create its 
>> validity point
>> +#              ####    MyISAM backup driver can't create its validity 
>> point
>>  PURGE BACKUP LOGS;
>>
>> You may believe it is a bug and if so you're welcome to open a bug 
>> report but I do not agree and see this as merely expected behavior. I 
>> have explained this in another email so I will not rehash that here.
>>
>> STATUS: rejected
>>
> 
> Thanks for detailed explanation.
> 
> Even if the non-determinism is justified by the way the code works, the 
> test can be easily made deterministic using:
> 
>   --replace_regex /(Default|MyISAM|...) backup driver/<name> backup driver/
> 
> in front of the select. Thus I insist that the setting of 
> Backup_pump::m_name upon error injection is removed and the test is 
> modified instead.
> 

I have made the change as you demand albeit with extreme prejudice as I 
believe this is a case where the reviewer is usurping his role.

STATUS: Complied

>>> 3. Remove driver list cutting during error insertion in 
>>> restore_table_data() or explain why it is needed.
>>
>> I do this to avoid the non-deterministic problem.
>>
>> STATUS: rejected
>>
> 
> The error injection you propose
> 
>  > +    DBUG_EXECUTE_IF("ER_BACKUP_CREATE_RESTORE_DRIVER", +      // 
> Only process the first driver (0) for debug test
>  > +      active[1]= 0; +      drv[1]= 0;
> 
> makes the error code in restore_table_data() (data_backup.cc:1749) not 
> work as expected. In particular the loop which cancels other drivers 
> than the failing one is not fully executed. Here is the affected code:
> 
> error:
> 
>   state= ERROR;
> 
>   DBUG_PRINT("restore",("Cancelling restore process"));
> 
>   // Call cancel() for all active drivers
> 
>   for (uint n=0; n < info.snap_count(); ++n)
>   {
>     if (!active[n])
>       continue;
> 
>     DBUG_PRINT("restore",("Cancelling restore driver %s",
>                            info.m_snap[n]->name()));
>     res= active[n]->cancel();
>     if (res)
>     {
>       if (!bad_drivers.is_empty())
>         bad_drivers.append(",");
>       bad_drivers.append(info.m_snap[n]->name());
>     }
>   }
> 
> I think it is not good to change the normal execution path so 
> drastically, as we can miss problems which could occur there. We should 
> test the error handling code as it is, not an artificial modification of 
> it.
> 
> I assume that you are right about inherent non-determinism in the order 
> in which backup/restore drivers are created and listed in the catalogue. 
> In that case, I think this non-determinism should be removed in the 
> test, by e.g. result masking.
> 

STATUS: Complied

>>> 4. Directly trigger error when testing for 
>>> ER_BACKUP_STOP_RESTORE_DRIVERS.
>>
>> This does *not* work. I have changed the code exactly as you say and 
>> the test case fails. That is because we are simulating an error here 
>> and the normal path of execution does not enter the section you 
>> requested I move  the code to.
>>
>> STATUS: rejected
>>
> 
> You did not explain what and how fails. If it fails it looks suspicious 
> - probably an error in the code. Note that what I proposed is doing the 
> following error injection in the above driver cancelling loop:
> 
>   for (uint n=0; n < info.snap_count(); ++n)
>   {
>     if (!active[n])
>       continue;
> 
>     DBUG_PRINT("restore",("Cancelling restore driver %s",
>                            info.m_snap[n]->name()));
>     res= active[n]->cancel();
>     DBUG_EXECUTE_IF("trigger_error", res= TRUE;);  <--- PROPOSED INJECTION
>     if (res)
>     {
>       if (!bad_drivers.is_empty())
>         bad_drivers.append(",");
>       bad_drivers.append(info.m_snap[n]->name());
>     }
>   }
> 
> Thus it simulates exactly the situation when driver's cancel() method 
> signals error, in which case ER_BACKUP_STOP_RESTORE_DRIVERS should be 
> reported. If this does not work, then we have a bug in the code - it 
> should work!
> 
> Thus I insist that we do not go around this problem, as your original 
> patch proposed, but use this most direct method of triggering the error 
> and investigate why it does not work (report a bug if there is a problem).
> 

This will not work, Rafal. I don't how many times I must explain that 
your requested changes place the code in a situation where the code in 
question is never reached. I forced the issue under protest by inserting 
additional debug code to code around the code to comply with your 
request. I object to the use of conjecture in a review.

STATUS: Complied after a fashion


>>> 5. Also trigger direct error in send_reply() method 
>>> (ER_BACKUP_SEND_REPLY).
>>
>> Done. BUG#46594 set to 'not a bug'.
>>
>> STATUS: code changed
>>
>>> 6. Make sure that errors are simulated directly after a call to 
>>> function reporting error, so that all code following it is executed.
>>
>> Done.
>>
>> STATUS: code changed

Status: We are agreeing on this point.

>>
>>> 7. Remove extra variable if a single one can be used.
>>
>> Done.
>>
>> STATUS: code changed
>>

Status: We are agreeing on this point.

>>> 8. When triggering ER_BACKUP_RESTORE_2 do not change type of the 
>>> operation stored in the context class.
>>
>> Done.
>>
>> STATUS: code changed
>>

Status: We are agreeing on this point.

>>> 9. Remove assertion where error should be reported 
>>> (ER_BACKUP_WRONG_TABLE_BE).
>>
>> I have made the change as you requested and it does *not* work. In the 
>> case of snap= NULL, the server crashes. In the case of snap_num + 1 
>> the test fails.
>>
>> STATUS: rejected
>>
> 
> The change I proposed looks as follows (around kernel.cc:1834):
> 
>     DBUG_PRINT("restore",(" table's snapshot no. is %d", it->snap_num));
> 
>     DBUG_EXECUTE_IF("trigger_error", it->snap_num= <wrong number>); <---
> 
> HERE
>     Snapshot_info *snap= info->m_snap[it->snap_num];
> 
>     if (!snap)
>     {
>       /*
>         This can happen only if the snapshot number is too big - if we 
> failed
> 
> As already explained, if this does not work, then we have a problem with 
> error detection here. In case it->snap_num is wrong (e.g. backup image 
> corruption), the info->m_snap[it->snap_num] should be NULL and the error 
> should be reported.
> 
> I insist that we test it here as suggested and report bug if it does not 
> work, instead of injecting code which pretends that everything is ok:
> 
> 
> +    // Debug insertion for ER_BACKUP_WRONG_TABLE_BE.
> +    // We must use debug insertion to avoid assertion
> +    DBUG_EXECUTE_IF("ER_BACKUP_WRONG_TABLE_BE", + 
> log.report_error(ER_BACKUP_WRONG_TABLE_BE, it->snap_num + 1);
> +      return BSTREAM_ERROR;);
> 

With this latest suggestion, I got it to work.

STATUS : Complied

>>> 10. Remove error insertion for ER_BACKUP_INTERRUPTION since it is 
>>> already extensively tested in backup_interruption test from 
>>> backup_engines suite and this without using error injection.
>>
>> There are no tests that test this error message or its conditions. I 
>> have searched the latest tree and there is also not test result that 
>> contains this error message.
>>
>> STATUS: rejected
>>
> 
> As I have said already twice, this is tested in backup_interruption test 
> from backup_engines suite. The reason why you do not see 
> ER_BACKUP_INTERRUPTION mentioned in test and result files is that this 
> message is reported with log_level::INFO, and such messages are written 
> only to the server error log - never reported by SQL statement or pushed 
> on the error stack (see Logger::write_message() method in logger.cc). 
> You can check with code coverage tool that the line in 
> Logger::report_killed() which reports this error is executed during 
> backup_interruption test (or simply look at the server error log after 
> executing the test).
> 
> Since this particular situation and error message is tested extensively, 
> I think the error injection you proposed should be removed to not 
> pollute the code and keep injections at minimum.
> 

Ok, if you insist but I have executed said test and the code does *not* 
reach this error condition.

STATUS: Complied

>>> SUGGESTIONS
>>> -----------
>>> 11. Whenever possible, test the original error branch, not a copy of it.
>>
>> Your specific example for the binlog test still crashes the server. 
>> See BUG#46574.
>>
>> Your specific example for ER_BACKUP_UNEXPECTED_DATA will not work 
>> because, again, we are executing through the normal non-error portion 
>> of the code and your request has me move the debug insertion to inside 
>> an error condition. This will not work and the test case fails.
>>
>> STATUS: rejected
>>
> 
> I don't understand why it should not work. The injection which I propose 
> looks as follows:
> 
>   /*
>     If there are no tables stored in the image, there is nothing to do 
> in this
>     function. However, we must call bstream_rd_data_chunk() which will 
> absorb
>     the 0x00 byte signalling end of (the empty) table data chunk sequence.
>   */
>   if (info.snap_count() == 0 || info.table_count() == 0) // nothing to 
> restore
>   {
>     int res= bstream_rd_data_chunk(&s, &chunk_info);
>     DBUG_EXECUTE_IF("inject_error", res= BSTREAM_OK;); <---- HERE
>     if (res != BSTREAM_EOC)
>     {
>        info.m_log.report_error(res == BSTREAM_ERROR ?
>                                         ER_BACKUP_READ_DATA :
>                                         ER_BACKUP_UNEXPECTED_DATA);
>        DBUG_RETURN(ERROR);
>     }
> 
> This is the most direct way to trigger the situation where 
> ER_BACKUP_UNEXPECTED_DATA is reported. Of course, to get to this 
> situation you need to restore an image without tables, so that the whole 
> branch is entered. Only in such situation the error should be reported.
> 

Well, you didn't say it needed to be an empty backup and it was not 
clear from the code (but it is now) that that was the case. I have made 
the change and it works now.

STATUS: complied

>>> 12. Other minor suggestions.
>>
>> All were changed except the first which forced the server to crash and 
>> the third which I do not consider relevant.
> 
> I would rather investigate the crash - is it a bug in the error handling 
> code?
> 

No crash now that I've pulled latest code.

STATUS: complied

>>
>> STATUS: partially accepted
>>
>>> 13. Make include/basic_data.inc database independent so that it can 
>>> be easily used from other tests.
>>
>> This is not so easy to do. Consider the event, trigger, procedure, 
>> function, and grant statement. Parameterizing this would require extra 
>> work. I choose to leave it as it is.
>>
> 
> Ok, good to know. With this the situation looks much better.
> 

STATUS: Unknown. It seems like Rafal is ok with me not implementing this 
suggestion.


Thread
bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Chuck Bell9 Aug
  • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Jørgen Løland11 Aug
    • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Charles Bell20 Aug
  • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Rafal Somla17 Aug
    • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Charles Bell17 Aug
      • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Rafal Somla19 Aug
        • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Charles Bell20 Aug
          • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Rafal Somla20 Aug
          • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Rafal Somla20 Aug
            • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Charles Bell20 Aug
        • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Charles Bell20 Aug
    • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Charles Bell20 Aug
      • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Rafal Somla21 Aug
        • Re: bzr commit into mysql-6.0-backup branch (charles.bell:2857) WL#4722Charles Bell24 Aug