Hi Chuck,
I committed a new patch with fixes <http://lists.mysql.com/commits/60888>.
Regarding the windows failure, could you help me here by preparing a compiled
tree on our team's windows machine (it is dl145m I think) so that I could login
there and investigate the issue?
Chuck Bell wrote:
> STATUS
> ------
> Patch not approved - changes requested.
>
> REQUESTS
> --------
> 1. The test backup.backup_intr_errors fails on Windows.
> 2. Spelling error in comments.
> 3. Long line in mtr_report.pl.
> 4. Spacing issues in several locations (See below).
> 5. Fix patch warning.
> 6. Answer question about race condition.
> 7. Remove extra line change in logger.cc.
> 8. Insert bug numbers for several FIX ME's.
>
...
>> mysql-test/lib/mtr_report.pl
>> Suppres detection by mtr expected errors during backup_intr_errors
>> test.
>
> [2] Spelling error 'Suppress'.
>
Fixed.
> ...
>
>> sql/backup/data_backup.cc
>> - Fix error in write_table_data() logic which caused the commit
>> blocker to not be deactivated if error branch was executed.
>> - Add checks for interruptions in various stages of backup and
>> restore protocol, including Sheduler::step() which is the main
>> worker method in write_table_data().
>> - Add explicit log_level::ERROR to report_error() where needed.
>> - Add extra debug sync points needed for testing.
>
> [2] Spelling error 'Scheduler'.
Fixed.
>
> ...
>
>> sql/backup/stream.h
>> Move the temprorary cleanup code to the wrapper function
>> read_meta_data()
>> so that we don't have to depend on the caller for executing it.
>
> [2] Spelling error 'temporary'.
Fixed.
>
>> sql/share/errmsg.txt
>> Add messge used to notify about an interruption of BACKUP/RESTORE
>> operations.
>
> [2] Spelling error 'message'.
Fixed.
>
>> === modified file 'mysql-test/lib/mtr_report.pl'
>> --- a/mysql-test/lib/mtr_report.pl 2008-11-21 15:02:34 +0000
>> +++ b/mysql-test/lib/mtr_report.pl 2008-12-05 07:38:01 +0000
>> @@ -392,6 +392,16 @@ sub mtr_report_stats ($) {
>>
>> ($testname eq 'backup.backup_myisam1') and
>> (/Backup: Can't initialize MyISAM backup driver/) or
>> +
>> + ($testname eq 'backup.backup_intr_errors') and
>> + (
>> + # ignore errors triggered on purpose during
>> BACKUP/RESTORE operation shutdown
>
> [3] This line is too long. Please break up into 2 lines.
>
Done.
> ...
>
>> === added file 'mysql-test/suite/backup/t/backup_intr_errors.test'
>> --- a/mysql-test/suite/backup/t/backup_intr_errors.test 1970-01-01
>> 00:00:00 +0000
>> +++ b/mysql-test/suite/backup/t/backup_intr_errors.test 2008-12-05
>> 07:38:01 +0000
>> @@ -0,0 +1,122 @@
>
> [1] This test fails on Windows:
>
> backup.backup_intr_errors [ fail ]
>
> mysqltest: At line 86: query 'BACKUP DATABASE bup_intr TO
> 'bup_intr.bkp'' failed
> : 1638: Can't write to backup location 'bup_intr.bkp' (file already
> exists?)
>
> The result from queries just before the failure was:
> < snip >
> #
> # Examine backup logs.
> #
> # FIXME: Until BUG#39924 is fixed, change to BUP_CANCEL state will not be
> # seen in backup_progress table and backup_history table will be empty.
> # When the bug is fixed the output below will change and the result file
> # should be modifed accordingly.
> #
> SELECT object, error_num, notes FROM mysql.backup_progress;
> object error_num notes
> backup kernel 0 starting
> backup kernel 0 running
> backup kernel 0 validity point
> SELECT * FROM mysql.backup_history;
> SET SESSION DEBUG="-d";
> #
> # Prepare bup_intr.bkp for RESTORE testing. Note that above BACKUP
> # command should not create the file because it was interrupted.
> #
> BACKUP DATABASE bup_intr TO 'bup_intr.bkp';
>
> I tried for about 2 hours to track this down but wasn't able to. One
> thing that complicated the diagnosis was the execution against all
> storage engines. While not related to this problem (I think), the test
> would be much easier to deal with it if it weren't run against all
> storage engines.
>
> My gut feeling is this is a problem with how Linux and Windows deal with
> file open -- I think it may be that Windows creates the file earlier
> than you expected. But I don't know that to be true.
>
> ...
>
>> +while (`SELECT count(*) > 0 FROM bup_intr.t1`)
>> +{
>> +
>> +# read the next engine and remove it from t1
>> +let $engine=`SELECT engine FROM bup_intr.t1 LIMIT 1`;
>> +eval DELETE FROM bup_intr.t1 WHERE engine='$engine';
>
> [4] While body is not indented.
>
Fixed.
>> +
>> +--echo
>> +--echo ########################################
>> +--echo ## Testing with $engine engine.
>> +--echo ########################################
>> +--echo
>
> [10, 11] I don't think we do to this -- I don't think we need to run
> against all storage engines. By doing so, the result file is really
> large. Given that this test doesn't work on Windows diagnosing the
> problem is all that much more difficult. Unless there is a compelling
> reason to run against all storage engines, I would change this to a
> single pass and place in suite/backup.
>
It is interesting and important to see how different backup/restore engines
react to being interrupted in the middle of their operation. Thus I'll keep the
test as it is.
Note that the result file is long not because of running the test for different
engines - for each engine the same result file is used. It is long, because it
tests interruption in several places during backup/restore process.
> ...
>
>> === added file
>> 'mysql-test/suite/backup_engines/include/backup_restore_interrupt.inc'
>> ---
>> a/mysql-test/suite/backup_engines/include/backup_restore_interrupt.inc
>> 1970-01-01 00:00:00 +0000
>> +++
>> b/mysql-test/suite/backup_engines/include/backup_restore_interrupt.inc
>> 2008-12-05 07:38:01 +0000
>
> ...
>
>> + --connection killer
>> +
>> + --echo #
>> + --echo # Wait for the command to reach its synchronization point,
>> + --echo # then kill it.
>> + --echo #
>> + SET DEBUG_SYNC='now WAIT_FOR here';
>> + --replace_regex /id=[0-9]+/id=<query id>/
>> + eval SELECT state FROM INFORMATION_SCHEMA.PROCESSLIST WHERE id=$id;
>> + --replace_regex /QUERY [0-9]+/QUERY <query id>/
>> + eval KILL QUERY $id;
>> + SET DEBUG_SYNC='now SIGNAL go';
>> +
>
> [4] Why is this indented? It shouldn't be.
>
Why do you say it should not be? I think it should to make it more clear that
these lines are executed from a different connection.
> ...
>
>> +--connection default
>> +
>> +--echo #
>> +--echo # Reap the command and show results.
>> +--echo #
>> +--error ER_QUERY_INTERRUPTED
>> +reap;
>> +--replace_column 2 <error-code>
>> +# One error message contains file path - mask it out.
>> +--replace_regex /Error on delete of '.*'/Error on delete of <backup
>> image path>/
>> +SHOW WARNINGS;
>> +
>> +--echo #
>> +--echo # Examine backup logs.
>> +--echo #
>> +--echo # FIXME: Until BUG#39924 is fixed, change to BUP_CANCEL state
>> will not be
>> +--echo # seen in backup_progress table and backup_history table will
>> be empty.
>> +--echo # When the bug is fixed the output below will change and the
>> result file
>> +--echo # should be modifed accordingly.
>> +--echo #
>> +SELECT object, error_num, notes FROM mysql.backup_progress;
>> +query_vertical SELECT * FROM mysql.backup_history;
>> +
>> +# check that backup image file was removed
>> +if (!$do_restore)
>> +{
>> + --error 1
>> + --remove_file $bdir/bup_intr.bkp
>> +}
>> \ No newline at end of file
>
> [5] Add newline to remove patch warning.
>
Done.
> ...
>
>> === added file
>> 'mysql-test/suite/backup_engines/r/backup_interruption.result'
>> --- a/mysql-test/suite/backup_engines/r/backup_interruption.result
>> 1970-01-01 00:00:00 +0000
>> +++ b/mysql-test/suite/backup_engines/r/backup_interruption.result
>> 2008-12-05 07:38:01 +0000
>
> I don't think this test needs to be in backup_engines. Please explain
> why you think it should be there.
>
To run it with all possible backup/restore drivers, which is the main idea of
backup_engines test suite.
> ...
>
>> === added file
>> 'mysql-test/suite/backup_engines/t/backup_interruption.test'
>> --- a/mysql-test/suite/backup_engines/t/backup_interruption.test
>> 1970-01-01 00:00:00 +0000
>> +++ b/mysql-test/suite/backup_engines/t/backup_interruption.test
>> 2008-12-05 07:38:01 +0000
>> @@ -0,0 +1,261 @@
>
> ...
>
>> + + # before common preparations
>> + let $sync_point= before_backup_common_prepare;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # during common preparations
>> + # before checking privileges
>> + let $sync_point= before_backup_privileges;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> + + # before checking that no other BACKUP/RESTORE is running
>> + let $sync_point= before_backup_single_op;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before blocking DDLs
>> + let $sync_point= before_backup_ddl_block;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before openning the stream
>> + let $sync_point= before_backup_stream_open;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before creating catalogue
>> + let $sync_point= before_backup_catalog;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>
> [4] Why is this indented?
To better show the structure of sync points.
>
>> +
>> +# before populating backup catalogue
>> +let $sync_point= after_backup_start_backup;
>> +--source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> +# before do_backup
>> +let $sync_point= before_do_backup;
>> +--source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> +# inside do_backup
>> + # before preamble is written
>> + let $sync_point= before_backup_meta;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> + let $sync_point= backup_before_write_preamble;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before write_table_data
>> + let $sync_point= before_backup_data;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # inside write_table_data
>> + # before initial phase
>> + let $sync_point=before_backup_data_init;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before prepare phase
>> + let $sync_point=before_backup_data_prepare;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before sync phase
>> + let $sync_point=before_backup_data_lock;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # inside sync phase
>> + let $sync_point=before_backup_data_unlock;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # after sync phase
>> + let $sync_point=after_backup_binlog;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before final phase
>> + let $sync_point=before_backup_data_finish;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before writing summary section
>> + let $sync_point= before_backup_summary;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>
> [4] Why is this indented?
>
See above.
>> +
>> +# Note: after the call to do_backup() the operation is completed
>> +# and terminates successfully, even if interruption has happened +#
>> after its completion.
>> +
>> +#
>> +# Test RESTORE interruptions.
>> +#
>> +
>> +--replace_column 1 #
>> +BACKUP DATABASE bup_intr TO 'bup_intr.bkp';
>> +DROP DATABASE bup_intr;
>> +
>> +let $do_restore=1;
>> +
>> +# before preparations
>> +let $sync_point= before_restore_prepare;
>> +--source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> +# during preparations
>> + # before logger initialization
>> + let $sync_point= before_restore_logger_init;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> + + # before common preparations
>> + let $sync_point= before_restore_common_prepare;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> + + # before openning the stream
>> + let $sync_point= before_restore_stream_open;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before creating catalogue
>> + let $sync_point= before_restore_catalog;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before reading header
>> + let $sync_point= before_restore_read_header;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before reading catalogue
>> + let $sync_point= before_restore_read_catalog;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before writing incident event
>> + let $sync_point= before_restore_binlog;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>
> [4] Why is this indented?
See above.
>
>> +# before do_restore
>> +let $sync_point= after_backup_start_restore;
>> +--source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> +# inside do_restore
>> +
>> + # before disabling fkey constraints
>> + let $sync_point= before_restore_fkey_disable;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # before reading metadata and creating objects
>> + let $sync_point= before_restore_read_metadata;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> +
>> + # Note: now we pass the point when database is created during RESTORE,
>> + # thus we need to DROP it before executing RESTORE another time.
>> + DROP DATABASE bup_intr;
>> +
>> + # before locking tables
>> + let $sync_point= before_restore_lock_tables;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> + DROP DATABASE bup_intr;
>> +
>> + # before restoring table data
>> + let $sync_point= before_restore_table_data;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> + DROP DATABASE bup_intr;
>> +
>> + # inside restore_table_data
>> + # before creating restore drivers
>> + let $sync_point= restore_before_drivers_create;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> + DROP DATABASE bup_intr;
>> +
>> + # before initializing the drivers
>> + let $sync_point= restore_before_drivers_init;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> + DROP DATABASE bup_intr;
>> +
>> + # before reading table data chunk
>> + let $sync_point= restore_before_read_data_chunk;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> + DROP DATABASE bup_intr;
>> +
>> + # before sending the data to a driver
>> + let $sync_point= restore_before_sending_data;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> + DROP DATABASE bup_intr;
>> +
>> + # before shutting down the drivers
>> + let $sync_point= restore_table_data_before_end;
>> + --source suite/backup_engines/include/backup_restore_interrupt.inc
>> + DROP DATABASE bup_intr;
>
> [4] Why is this indented?
See above.
>
> ...
>
>> === modified file 'sql/backup/be_thread.cc'
>> --- a/sql/backup/be_thread.cc 2008-10-30 12:29:54 +0000
>> +++ b/sql/backup/be_thread.cc 2008-12-05 07:38:01 +0000
>
> ...
>
>> void Locking_thread_st::wait_until_locking_thread_dies()
>> {
>> + // Nothing to do if the locking thread has not been started.
>> + if (!m_thread_started)
>> + return;
>> +
>> pthread_mutex_lock(&THR_LOCK_caller);
>> if (lock_state != LOCK_DONE)
>> {
>> @@ -350,4 +363,6 @@ void Locking_thread_st::wait_until_locki
>> }
>> else
>> pthread_mutex_unlock(&THR_LOCK_caller);
>> +
>> + m_thread_started= FALSE;
>
> [6] Isn't it possible to have a race condition here with writing to
> m_thread_started? I wonder...but if you are convinced there isn't a
> change then I'm ok with that. I think it is possible.
>
I don't see any possibility of a race here. Can you give any example?
Note that methods of Locking_thread_st are called from a single thread -
synchronization is only needed between that thread and the table locking thread.
Thus Locking_thread_st needs not to be thread safe in the sense that several
threads can use a single instance of that class at the same time.
> ...
>
>> === modified file 'sql/backup/kernel.cc'
>> --- a/sql/backup/kernel.cc 2008-11-28 10:10:39 +0000
>> +++ b/sql/backup/kernel.cc 2008-12-05 07:38:01 +0000
>
> ...
>
>> - DBUG_RETURN(0);
>> + /* + FIXME: this call is here because object services doesn't
>> clean the
>> + statement execution context properly, which leads to assertion
>> failure.
>> + It should be fixed inside object services implementation and then
>> the
>> + following line should be removed.
>> + */
>
> [8] Please annotate the code here making it (more) clear that this needs
> to be fixed with that bug. I wasn't sure which parts you were referring to.
>
> [9] Please add a new bug for this problem if it doesn't already exist
> then add that bug# to this comment.
>
Done. The new bug is 41294.
> ...
>
>> @@ -1231,24 +1387,23 @@ int Backup_restore_ctx::do_restore(bool
>> DBUG_PRINT("restore",("Restoring table data"));
>>
>> - /* - FIXME: this call is here because object services doesn't
>> clean the
>> - statement execution context properly, which leads to assertion
>> failure.
>> - It should be fixed inside object services implementation and then
>> the
>> - following line should be removed.
>> - */
>
> [8] Please add a new bug for this problem and annotate the code here
> making it (more) clear that this needs to be fixed with that bug.
>
Done.
> ...
>
>> @@ -1259,22 +1414,17 @@ int Backup_restore_ctx::do_restore(bool
>> creation of these objects will fail.
>> */
>>
>> - err= restore_triggers_and_events(); // logs errors
>> + DEBUG_SYNC(m_thd, "before_restore_triggers");
>> + err= restore_triggers_and_events(); // logs errors and detects
>> interruptions
>> if (err)
>> DBUG_RETURN(fatal_error(err));
>>
>> - /* - FIXME: this call is here because object services doesn't
>> clean the
>> - statement execution context properly, which leads to assertion
>> failure.
>> - It should be fixed inside object services implementation and then
>> the
>> - following line should be removed.
>> - */
>
> [8] Please add a new bug for this problem and annotate the code here
> making it (more) clear that this needs to be fixed with that bug.
>
Done.
> ...
>
>> === modified file 'sql/backup/stream.h'
>> --- a/sql/backup/stream.h 2008-11-13 13:02:36 +0000
>> +++ b/sql/backup/stream.h 2008-12-05 07:38:01 +0000
>> @@ -192,11 +192,29 @@ read_catalog(Image_info &info, Input_str
>> return ret == BSTREAM_ERROR ? ERROR : OK;
>> }
>>
>> +/*
>> + FIXME: the thd parameter for read_meta_data() is here only because the
>> + temporary code within the function needs it. It should be removed
>> once the
>> + issue is fixed.
>> +*/
>
> [8] Please add the corresponding bug here (or create one if not already)
> and make it clear that this needs to be fixed with that bug.
>
Done.
>> inline
>> result_t
>> -read_meta_data(Image_info &info, Input_stream &s)
>> +read_meta_data(THD *thd, Image_info &info, Input_stream &s)
>> {
>> int ret= bstream_rd_meta_data(&s,
>> static_cast<st_bstream_image_header*>(&info));
>> +
>> + /* + FIXME: the following code is here because object services
>> doesn't clean the
>> + statement execution context properly, which leads to assertion
>> failure.
>> + It should be fixed inside object services implementation and then
>> the
>> + following line should be removed.
>> + */
>
> [8] Please add a new bug for this problem and annotate the code here
> making it (more) clear that this needs to be fixed with that bug.
>
Done.
> Chuck
>