List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:February 10 2009 2:51pm
Subject:bzr commit into mysql-6.0 branch (guilhem:2721) Bug#42519
View as plain text  
#At file:///home/mysql_src/bzrrepos/merge_6.0_myisam_changes_to_maria/mysql-6.0-maria-with-porting/ based on revid:guilhem@stripped

 2721 Guilhem Bichot	2009-02-10
      Maria online backup: BUG#42519 "Maria: RESTORE leads to corrupted table and assertion":
      * working around deficiencies of the restore kernel
      (occuring also in
      BUG 41716 "Backup Error when sending data (for table #1) to Default/Snapshot restore driver"
      BUG 40944 "Backup: crash after myisampack"); this fixes BUG#42519.
      * same workaround for MyISAM (which suffered from a bug similar to BUG#42519).
      * fixing a Maria restore driver bug found during debugging this ("DELETE FROM <table>;" was
      not handled during backup).
      modified:
        mysql-test/suite/backup/r/backup_maria.result
        mysql-test/suite/backup/t/backup_maria.test
        sql/backup/kernel.cc
        storage/maria/ma_bitmap.c
        storage/maria/ma_check.c
        storage/maria/ma_delete_all.c
        storage/maria/ma_examine_non_trans_log.c
        storage/maria/ma_non_trans_log.c
        storage/maria/maria_backup_engine.cc
        storage/maria/maria_def.h
        storage/myisam/myisam_backup_engine.cc

per-file messages:
  mysql-test/suite/backup/r/backup_maria.result
    result is correct; before the bugfixes:
    - a DBUG_ASSERT() would fire when "DELETE FROM <table>;" is run during backup
    - the table would be corrupted in the final CHECK TABLE:
    mysqltest.t1    check   warning Size of indexfile is: 16384      Should be: 8192
    mysqltest.t1    check   warning Size of datafile is: 16384       Should be: 8192
    mysqltest.t1    check   error   Bitmap at page 0 has pages reserved outside of data file length
    because the client's INSERT (before VP creation) would cache stuff which would survive the
    driver's restore.
  mysql-test/suite/backup/t/backup_maria.test
    adding a test which involves "DELETE FROM <table>;" (which caused a crash during backup),
    and a test for BUG#42519
  sql/backup/kernel.cc
    sync points for repeating concurrency bugs
  storage/maria/ma_bitmap.c
    _ma_bitmap_create_first() is called during maria_delete_all_rows() which can happen
    when the table is doing physical logging: do log its operations.
    Optimization: a bitmap page of a temporary table does not need to call
    maria_log_data_page_flush_physical.
  storage/maria/ma_check.c
    update to new prototype
  storage/maria/ma_delete_all.c
    This logical "MA_LOG_DELETE_ALL" log record had a problem: when log was applied it could generate
    a transaction-log record, thus assign an id to the MARIA_SHARE, which could make Checkpoint
    look at the table while it's being restored, which is not good. It also put an empty bitmap
    in memory, which would be out-of-date cached data as soon as the next log record causes
    insertion of some data in a page. Instead, we log the my_chsize() which sounds more in line
    with other places like in ma_check.c.
    We needn't do the same change in MyISAM because it does not have bitmaps or share's ids.
  storage/maria/ma_examine_non_trans_log.c
    MA_LOG_DELETE_ALL can't be found in log, and MA_LOG_CHSIZE_MAD can.
  storage/maria/ma_non_trans_log.c
    MA_LOG_DELETE_ALL can't be put in log, and MA_LOG_CHSIZE_MAD can.
  storage/maria/maria_backup_engine.cc
    * Working around a deficiency of the restore kernel:
     - it leaves a window between creating the table empty and locking it; in this window,
    another thread can do an update to the table (which conflicts with what the restore
    driver will put into files).
     - it does not always really close tables at end of restore, so restore driver has to
    make sure to invalidate any cached info before table is unlocked.
    - see this file's comments for more details about the two problems above;
    they were the causes of BUG#42519 (various corruptions or assertion failures
    in the first DMLs after RESTORE, depending on concurrency).
    * The same logic is needed independently of rebuilding the index or not.
    * keep MARIA_SHARE::close_lock when iterating over states of table,
    to protect against a concurrent checkpoint (safer).
  storage/maria/maria_def.h
    new declarations
  storage/myisam/myisam_backup_engine.cc
    Same changes as in maria_backup_engine.cc (though simplified).
=== modified file 'mysql-test/suite/backup/r/backup_maria.result'
--- a/mysql-test/suite/backup/r/backup_maria.result	2009-01-28 11:08:55 +0000
+++ b/mysql-test/suite/backup/r/backup_maria.result	2009-02-10 14:51:40 +0000
@@ -136,5 +136,121 @@ CHECK TABLE t4 EXTENDED;
 Table	Op	Msg_type	Msg_text
 mysqltest.t4	check	status	OK
 
+connection backup: Start backup
+SET DEBUG_SYNC= 'before_backup_data_prepare SIGNAL bup_sync
+                     WAIT_FOR bup_finish';
+BACKUP DATABASE mysqltest TO 'test.ba';
+
+connection default: Wait for BACKUP to reach its sync point
+SET DEBUG_SYNC= 'now WAIT_FOR bup_sync';
+Modify tables
+delete from t1;
+insert into t1 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t1 set c=repeat("NUS", 4000);
+insert into t1 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t1 select * from t1;
+insert into t1 select * from t1;
+delete from t1 where a=200;
+CHECKSUM TABLE t1;
+Table	Checksum
+mysqltest.t1	3470596356
+select count(*) from t1;
+count(*)
+4
+delete from t2;
+insert into t2 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t2 set c=repeat("NUS", 4000);
+insert into t2 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t2 select * from t2;
+insert into t2 select * from t2;
+delete from t2 where a=200;
+CHECKSUM TABLE t2;
+Table	Checksum
+mysqltest.t2	3470596356
+select count(*) from t2;
+count(*)
+4
+delete from t3;
+insert into t3 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t3 set c=repeat("NUS", 4000);
+insert into t3 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t3 select * from t3;
+insert into t3 select * from t3;
+delete from t3 where a=200;
+CHECKSUM TABLE t3;
+Table	Checksum
+mysqltest.t3	3470596356
+select count(*) from t3;
+count(*)
+4
+delete from t4;
+insert into t4 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t4 set c=repeat("NUS", 4000);
+insert into t4 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t4 select * from t4;
+insert into t4 select * from t4;
+delete from t4 where a=200;
+CHECKSUM TABLE t4;
+Table	Checksum
+mysqltest.t4	3470596356
+select count(*) from t4;
+count(*)
+4
+Signal BACKUP to finish
+SET DEBUG_SYNC= 'now SIGNAL bup_finish';
+
+connection backup: Fetch result
+backup_id
+#
+DROP DATABASE mysqltest;
+RESTORE FROM 'test.ba';
+backup_id
+#
+CHECKSUM TABLE t1;
+Table	Checksum
+mysqltest.t1	3470596356
+CHECK TABLE t1 EXTENDED;
+Table	Op	Msg_type	Msg_text
+mysqltest.t1	check	status	OK
+CHECKSUM TABLE t2;
+Table	Checksum
+mysqltest.t2	3470596356
+CHECK TABLE t2 EXTENDED;
+Table	Op	Msg_type	Msg_text
+mysqltest.t2	check	status	OK
+CHECKSUM TABLE t3;
+Table	Checksum
+mysqltest.t3	3470596356
+CHECK TABLE t3 EXTENDED;
+Table	Op	Msg_type	Msg_text
+mysqltest.t3	check	status	OK
+CHECKSUM TABLE t4;
+Table	Checksum
+mysqltest.t4	3470596356
+CHECK TABLE t4 EXTENDED;
+Table	Op	Msg_type	Msg_text
+mysqltest.t4	check	status	OK
+drop database mysqltest;
+create database mysqltest;
+CREATE TABLE mysqltest.t1 (a int, b varchar(100), unique(a)) engine=maria;
+BACKUP DATABASE mysqltest TO 'test.ba';
+backup_id
+#
+DROP DATABASE mysqltest;
+SET DEBUG_SYNC= 'before_restore_locks_tables SIGNAL wait_for_restore WAIT_FOR finish';
+RESTORE FROM 'test.ba';
+SET DEBUG_SYNC= 'now WAIT_FOR wait_for_restore';
+insert into mysqltest.t1 values(1,"def");
+SET DEBUG_SYNC= 'now SIGNAL finish';
+backup_id
+#
+select * from mysqltest.t1;
+a	b
+check table mysqltest.t1;
+Table	Op	Msg_type	Msg_text
+mysqltest.t1	check	status	OK
+select * from mysqltest.t1;
+a	b
+
 connection default: cleanup
 drop database mysqltest;

=== modified file 'mysql-test/suite/backup/t/backup_maria.test'
--- a/mysql-test/suite/backup/t/backup_maria.test	2009-01-28 14:01:33 +0000
+++ b/mysql-test/suite/backup/t/backup_maria.test	2009-02-10 14:51:40 +0000
@@ -124,6 +124,115 @@ SET DEBUG_SYNC= 'now SIGNAL bup_finish';
     CHECKSUM TABLE t4;
     CHECK TABLE t4 EXTENDED;
 
+connection default;
+--remove_file $MYSQLTEST_VARDIR/master-data/test.ba
+
+# Once more, with deletion of all rows
+
+    --echo
+    --echo connection backup: Start backup
+    connection backup;
+
+    SET DEBUG_SYNC= 'before_backup_data_prepare SIGNAL bup_sync
+                     WAIT_FOR bup_finish';
+    send BACKUP DATABASE mysqltest TO 'test.ba';
+
+--echo
+--echo connection default: Wait for BACKUP to reach its sync point
+connection default;
+SET DEBUG_SYNC= 'now WAIT_FOR bup_sync';
+
+--echo Modify tables
+delete from t1;
+insert into t1 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t1 set c=repeat("NUS", 4000); # a multi-page blob
+insert into t1 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t1 select * from t1;
+insert into t1 select * from t1;
+delete from t1 where a=200;
+CHECKSUM TABLE t1;
+select count(*) from t1;
+delete from t2;
+insert into t2 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t2 set c=repeat("NUS", 4000); # a multi-page blob
+insert into t2 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t2 select * from t2;
+insert into t2 select * from t2;
+delete from t2 where a=200;
+CHECKSUM TABLE t2;
+select count(*) from t2;
+delete from t3;
+insert into t3 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t3 set c=repeat("NUS", 4000); # a multi-page blob
+insert into t3 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t3 select * from t3;
+insert into t3 select * from t3;
+delete from t3 where a=200;
+CHECKSUM TABLE t3;
+select count(*) from t3;
+delete from t4;
+insert into t4 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t4 set c=repeat("NUS", 4000); # a multi-page blob
+insert into t4 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t4 select * from t4;
+insert into t4 select * from t4;
+delete from t4 where a=200;
+CHECKSUM TABLE t4;
+select count(*) from t4;
+
+
+--echo Signal BACKUP to finish
+SET DEBUG_SYNC= 'now SIGNAL bup_finish';
+
+    --echo
+    --echo connection backup: Fetch result
+    connection backup;
+    --replace_column 1 #
+    reap;
+    DROP DATABASE mysqltest;
+
+    --replace_column 1 #
+    RESTORE FROM 'test.ba';
+    CHECKSUM TABLE t1;
+    CHECK TABLE t1 EXTENDED;
+    CHECKSUM TABLE t2;
+    CHECK TABLE t2 EXTENDED;
+    CHECKSUM TABLE t3;
+    CHECK TABLE t3 EXTENDED;
+    CHECKSUM TABLE t4;
+    CHECK TABLE t4 EXTENDED;
+
+connection default;
+drop database mysqltest;
+--remove_file $MYSQLTEST_VARDIR/master-data/test.ba
+
+# test for BUG#42519 "Maria: RESTORE leads to corrupted table and assertion":
+
+create database mysqltest;
+CREATE TABLE mysqltest.t1 (a int, b varchar(100), unique(a)) engine=maria;
+
+--replace_column 1 #
+BACKUP DATABASE mysqltest TO 'test.ba';
+DROP DATABASE mysqltest;
+
+connection backup;
+SET DEBUG_SYNC= 'before_restore_locks_tables SIGNAL wait_for_restore WAIT_FOR finish';
+send RESTORE FROM 'test.ba';
+
+connection default;
+SET DEBUG_SYNC= 'now WAIT_FOR wait_for_restore';
+insert into mysqltest.t1 values(1,"def");
+SET DEBUG_SYNC= 'now SIGNAL finish';
+
+connection backup;
+--replace_column 1 #
+reap;
+
+connection default;
+select * from mysqltest.t1;
+check table mysqltest.t1;
+select * from mysqltest.t1;
+
 --echo
 --echo connection default: cleanup
 connection default;

=== modified file 'sql/backup/kernel.cc'
--- a/sql/backup/kernel.cc	2009-01-26 11:20:22 +0000
+++ b/sql/backup/kernel.cc	2009-02-10 14:51:40 +0000
@@ -1262,10 +1262,14 @@ int Backup_restore_ctx::do_restore(bool 
   close_thread_tables(m_thd);                   // Never errors
   m_thd->main_da.reset_diagnostics_area();      // Never errors  
 
+  DEBUG_SYNC(m_thd, "before_restore_locks_tables");
+
   err= lock_tables_for_restore();               // logs errors
   if (err)
     DBUG_RETURN(fatal_error(err));
 
+  DEBUG_SYNC(m_thd, "after_restore_locks_tables");
+
   // Here restore drivers are created to restore table data
   err= restore_table_data(m_thd, info, s);      // logs errors
 

=== modified file 'storage/maria/ma_bitmap.c'
--- a/storage/maria/ma_bitmap.c	2009-01-28 11:08:55 +0000
+++ b/storage/maria/ma_bitmap.c	2009-02-10 14:51:40 +0000
@@ -2517,13 +2517,22 @@ int _ma_bitmap_create_first(MARIA_SHARE 
     if it is needed
   */
   int4store(marker, MARIA_NO_CRC_BITMAP_PAGE);
-  DBUG_ASSERT(!share->physical_logging);
+
   if (my_chsize(file, block_size - sizeof(marker),
                 0, MYF(MY_WME)) ||
       my_pwrite(file, marker, sizeof(marker),
                 block_size - sizeof(marker),
                 MYF(MY_NABP | MY_WME)))
     return 1;
+  if (unlikely(ma_get_physical_logging_state(share)))
+  {
+    maria_log_chsize_physical(share, MA_LOG_CHSIZE_MAD,
+                              block_size - sizeof(marker));
+    maria_log_pwrite_physical(MA_LOG_WRITE_BYTES_MAD, share, marker,
+                              sizeof(marker),
+                              block_size - sizeof(marker));
+  }
+
   share->state.state.data_file_length= block_size;
   _ma_bitmap_delete_all(share);
   return 0;
@@ -2573,7 +2582,7 @@ void _ma_bitmap_set_pagecache_callbacks(
   file->callback_data= (uchar*) share;
   file->flush_log_callback= maria_flush_log_for_page_none;
   file->write_fail= maria_page_write_failure;
-  file->post_write_callback= maria_log_data_page_flush_physical;
+  file->post_write_callback= &maria_flush_log_for_page_none;
 
   if (share->temporary)
   {
@@ -2589,6 +2598,9 @@ void _ma_bitmap_set_pagecache_callbacks(
       file->pre_write_callback= &maria_page_filler_set_bitmap;
     if (share->now_transactional)
       file->flush_log_callback= flush_log_for_bitmap;
+#ifdef HAVE_MARIA_PHYSICAL_LOGGING
+    file->post_write_callback= &maria_log_data_page_flush_physical;
+#endif
   }
 }
 

=== modified file 'storage/maria/ma_check.c'
--- a/storage/maria/ma_check.c	2009-01-28 11:08:55 +0000
+++ b/storage/maria/ma_check.c	2009-02-10 14:51:40 +0000
@@ -6739,7 +6739,7 @@ static int chsize_kfile(MARIA_HA *info)
                ma_get_physical_logging_state(share)))
   {
     DBUG_ASSERT(no_length_change);
-    maria_log_chsize_kfile_physical(share, new_length);
+    maria_log_chsize_physical(share, MA_LOG_CHSIZE_MAI, new_length);
   }
 
   return ret;

=== modified file 'storage/maria/ma_delete_all.c'
--- a/storage/maria/ma_delete_all.c	2009-01-28 11:08:55 +0000
+++ b/storage/maria/ma_delete_all.c	2009-02-10 14:51:40 +0000
@@ -102,6 +102,11 @@ int maria_delete_all_rows(MARIA_HA *info
       my_chsize(info->dfile.file, 0, 0, MYF(MY_WME)) ||
       my_chsize(share->kfile.file, share->base.keystart, 0, MYF(MY_WME)))
     goto err;
+  if (unlikely(ma_get_physical_logging_state(share)))
+  {
+    maria_log_chsize_physical(share, MA_LOG_CHSIZE_MAD, 0);
+    maria_log_chsize_physical(share, MA_LOG_CHSIZE_MAI, share->base.keystart);
+  }
 
   if (_ma_initialize_data_file(share, info->dfile.file))
     goto err;
@@ -125,9 +130,6 @@ int maria_delete_all_rows(MARIA_HA *info
       goto err;
   }
 
-  if (unlikely(ma_get_physical_logging_state(info->s)))
-    _maria_log_command(&maria_physical_log, MA_LOG_DELETE_ALL, share,
-                       NULL, 0, 0);
   (void)(_ma_writeinfo(info,WRITEINFO_UPDATE_KEYFILE));
 #ifdef HAVE_MMAP
   /* Map again */

=== modified file 'storage/maria/ma_examine_non_trans_log.c'
--- a/storage/maria/ma_examine_non_trans_log.c	2009-01-28 11:08:55 +0000
+++ b/storage/maria/ma_examine_non_trans_log.c	2009-02-10 14:51:40 +0000
@@ -32,7 +32,7 @@
 /** Human-readable names of commands storable in Maria logs */
 const char *ma_log_command_name[]=
 {"open","close",
- "delete-all", "write-bytes-to-MAD", "write-bytes-to-MAI", "chsize-MAI",
+ "write-bytes-to-MAD", "write-bytes-to-MAI", "chsize-MAD", "chsize-MAI",
  /*
    This one is special: it is never in log records, it's just used by
    ma_examine_log() to tell the user that it failed when reopening a table. It
@@ -144,8 +144,8 @@ int ma_examine_log(MA_EXAMINE_LOG_PARAM 
   TREE tree;
   struct file_info file_info,*curr_file_info;
   uint head_len[][2]=
-    { { 11, 14 }, { 11, 14 }, { 11, 14 }, {  9, 16 }, {  9, 16 }, {  7, 12 }  };
-  uint has_pid_and_result[]= {1, 1, 1, 0, 0, 0};
+    { { 11, 14 }, { 11, 14 }, { 9, 16 }, { 9, 16 }, { 7, 12 }, { 7, 12 } };
+  uint has_pid_and_result[]= {1, 1, 0, 0, 0, 0};
   DBUG_ENTER("ma_examine_log");
 
   compile_time_assert((sizeof(ma_log_command_name) /
@@ -393,6 +393,7 @@ int ma_examine_log(MA_EXAMINE_LOG_PARAM 
       }
       my_free(buff,MYF(0));
       break;
+    case MA_LOG_CHSIZE_MAD:
     case MA_LOG_CHSIZE_MAI:
       /* here 'filepos' means new length of file */
       if (big_numbers)
@@ -416,24 +417,13 @@ int ma_examine_log(MA_EXAMINE_LOG_PARAM 
       if (mi_exl->update && curr_file_info && !curr_file_info->closed)
       {
         update_index_on_close= FALSE;
-        if (my_chsize(curr_file_info->isam->s->kfile.file, filepos,
-                      0, MYF(MY_WME)))
+        if (my_chsize((command == MA_LOG_CHSIZE_MAI) ?
+                      curr_file_info->isam->s->kfile.file :
+                      curr_file_info->isam->dfile.file,
+                      filepos, 0, MYF(MY_WME)))
           goto com_err;
       }
       break;
-    case MA_LOG_DELETE_ALL:
-      if (mi_exl->verbose && !mi_exl->record_pos_file &&
-	  (!mi_exl->table_selection_hook ||
-           (curr_file_info && curr_file_info->used)))
-	printf_log(mi_exl->verbose, isamlog_process, isamlog_filepos,
-                   "%s: %s -> %d",FILENAME(curr_file_info),
-		   ma_log_command_name[command],result);
-      if (mi_exl->update && curr_file_info && !curr_file_info->closed)
-      {
-	if (maria_delete_all_rows(curr_file_info->isam) != (int) result)
-	  goto com_err;
-      }
-      break;
     default:
       fflush(stdout);
       fprintf(stderr, "Error: found unknown command %d in logfile, aborted\n",

=== modified file 'storage/maria/ma_non_trans_log.c'
--- a/storage/maria/ma_non_trans_log.c	2009-01-29 20:01:10 +0000
+++ b/storage/maria/ma_non_trans_log.c	2009-02-10 14:51:40 +0000
@@ -246,13 +246,13 @@ void _maria_log_command(IO_CACHE *log, e
   uchar header[14];
   int old_errno, headerlen;
   ulong pid=(ulong) GETPID();
-  File file;
-
-  file= share->kfile.file;
-
-  DBUG_ASSERT(command == MA_LOG_OPEN  || command == MA_LOG_CLOSE ||
-              command == MA_LOG_DELETE_ALL);
+  File file= share->kfile.file;
   old_errno=my_errno;
+  DBUG_ENTER("_maria_log_command");
+  DBUG_PRINT("enter", ("command: %u share->open_file_name.str '%s'",
+                       command, share->open_file_name.str));
+  DBUG_ASSERT(command == MA_LOG_OPEN  || command == MA_LOG_CLOSE);
+
   DBUG_ASSERT(((uint)result) <= UINT_MAX16);
   if (file >= UINT_MAX16 || length >= UINT_MAX16)
   {
@@ -340,6 +340,7 @@ retry:
   }
   pthread_mutex_unlock(&THR_LOCK_maria_log);
   my_errno=old_errno;
+  DBUG_VOID_RETURN;
 }
 
 
@@ -413,24 +414,27 @@ retry:
 
 
 /**
-  Logs a my_chsize() done to the index file to the physical log.
+  Logs a my_chsize() done to the data or index file to the physical log.
 
   Also logs MA_LOG_OPEN if first time.
 
   @param  share            table's share
-  @param  new_length       new length of the table's index file
+  @param  command          MA_LOG_CHSIZE_MAD or MA_LOG_CHSIZE_MAI
+  @param  new_length       new length of the table's file
 */
 
-void maria_log_chsize_kfile_physical(MARIA_SHARE *share,
-                                      my_off_t new_length)
+void maria_log_chsize_physical(MARIA_SHARE *share,
+                               enum maria_log_commands command,
+                               my_off_t new_length)
 {
   uchar header[12];
   int old_errno, headerlen;
-  DBUG_ENTER("maria_log_chsize_kfile_physical");
+  DBUG_ENTER("maria_log_chsize_physical");
   old_errno= my_errno;
+  DBUG_ASSERT(command == MA_LOG_CHSIZE_MAD  || command == MA_LOG_CHSIZE_MAI);
   if (share->kfile.file >= UINT_MAX16 || new_length >= UINT_MAX32)
   {
-    header[0]= MA_LOG_CHSIZE_MAI | MA_LOG_BIG_NUMBERS;
+    header[0]= ((uchar) command) | MA_LOG_BIG_NUMBERS;
     DBUG_ASSERT(share->kfile.file < (2<<24));
     mi_int3store(header + 1, share->kfile.file);
     mi_sizestore(header + 4, new_length);
@@ -438,7 +442,7 @@ void maria_log_chsize_kfile_physical(MAR
   }
   else
   {
-    header[0]= MA_LOG_CHSIZE_MAI;
+    header[0]= (uchar)command;
     mi_int2store(header + 1, share->kfile.file);
     mi_int4store(header + 3, new_length);
     headerlen= 7;

=== modified file 'storage/maria/maria_backup_engine.cc'
--- a/storage/maria/maria_backup_engine.cc	2009-01-28 11:08:55 +0000
+++ b/storage/maria/maria_backup_engine.cc	2009-02-10 14:51:40 +0000
@@ -29,7 +29,10 @@
 #define MYSQL_SERVER 1 // need it to have mysql_tmpdir defined
 #include "mysql_priv.h"
 #include "ha_maria.h"
+C_MODE_START
 #include "maria_def.h" // to access dfile and kfile
+#include "ma_blockrec.h"
+C_MODE_END
 #include "backup/backup_engine.h"
 #include "backup/backup_aux.h"         // for build_table_list()
 #include <hash.h>
@@ -123,6 +126,23 @@ using backup::Buffer;
 */
 #define MARIA_BACKUP_VERSION 1
 
+/**
+  Restore kernel opens tables and locks them for the duration of restore
+  (after having created them empty); this means that cached objects stay
+  around (MARIA_SHARE, MARIA_HA) and can become out-of-sync with the
+  data/index file filled by the driver, unless we take precautions which are
+  recognizable by this symbol.
+*/
+#define RESTORE_KERNEL_KEEPS_OPEN_TABLES 1
+/**
+  Restore kernel leaves a time windows between end of creation of table (via
+  execution of CREATE TABLE) and locking of this table; in this window another
+  client can open/lock/modify/unlock the table, which conflicts with what the
+  driver is going to write to the data/index file, unless we take precautions
+  which are recognizable by this symbol.
+*/
+#define RESTORE_KERNEL_NOT_ATOMIC 1
+
 /** Like Table_ref but with file name added */
 class Maria_table_ref
 {
@@ -1652,7 +1672,9 @@ Table_restore::Table_restore(const Table
   Maria_table_ref(tbl), rebuild_index(FALSE)
 {
   MARIA_HA *mi_info;
+  MARIA_SHARE *share;
   File dfiledes= -1, kfiledes= -1;
+  my_bool save_transactional= FALSE;
   DBUG_ENTER("maria_backup::Table_restore::Table_restore");
   DBUG_PRINT("enter",("Initializing backup image for table %s",
                       file_name.ptr()));
@@ -1668,13 +1690,65 @@ Table_restore::Table_restore(const Table
     /* table does not exist or is corrupted? not normal, it's just created */
     goto err;
   }
+
+  share= mi_info->s;
+
+#ifdef RESTORE_KERNEL_NOT_ATOMIC
+  /*
+    Restore kernel leaves a window between creation of table and locking it;
+    in this window, another thread can modify the table, put pages in page
+    cache, changed cached bitmap or state, increase the files's length to
+    greater than what the driver has to write...
+    So we re-empty it here. We know we are alone using the table at this
+    point, as restore kernel has finished locking tables.
+    See BUG#42519, BUG#41716.
+  */
+  /* Another thread may have assigned an id */
+  pthread_mutex_lock(&share->intern_lock);
+  if (share->id != 0)
+  {
+    translog_deassign_id_from_share(share);
+    /*
+      Because id is 0, checkpoint will ignore this table, which is good
+      (otherwise Checkpoint may flush old info to the files, overwriting the
+      writes done by the driver.
+    */
+  }
+  pthread_mutex_unlock(&share->intern_lock);
+  save_transactional= share->now_transactional;
+  if (save_transactional) /* don't need logging */
+    _ma_tmp_disable_logging_for_table(mi_info, FALSE);
+  if (maria_delete_all_rows(mi_info))
+    goto err;
+  if (share->data_file_type == BLOCK_RECORD)
+  {
+    /*
+      maria_delete_all_rows() filled bitmap->map with zeroes and marked this
+      bitmap as changed. Flushing those zeroes would be wrong as soon as we
+      have restored the first bitmap page of the data file: prevent it.
+      Checkpoint can't flush between maria_delete_all_rows() and here, because
+      share->id is 0.
+    */
+    pthread_mutex_lock(&share->bitmap.bitmap_lock);
+    share->bitmap.changed= FALSE;
+    pthread_mutex_unlock(&share->bitmap.bitmap_lock);
+    DBUG_ASSERT(share->id == 0);
+  }
+  if (save_transactional)
+  {
+    save_transactional= FALSE;
+    if (_ma_reenable_logging_for_table(mi_info, FALSE))
+      goto err;
+  }
+#endif
+
   /*
     It's ok to copy the kfile descriptor and write() to it as the upper layers
     guarantee that we are the only user of the brand new table (nobody will
     lseek() under our feet).
   */
   if (((dfiledes= my_dup(mi_info->dfile.file, MYF(MY_WME))) < 0) ||
-      ((kfiledes= my_dup(mi_info->s->kfile.file, MYF(MY_WME))) < 0))
+      ((kfiledes= my_dup(share->kfile.file, MYF(MY_WME))) < 0))
     goto err;
   /*
     We are going to my_write() to the files without updating the table's
@@ -1697,6 +1771,8 @@ err:
     my_close(dfiledes, MYF(MY_WME));
   if (kfiledes > 0)
     my_close(kfiledes, MYF(MY_WME));
+  if (save_transactional)
+    _ma_reenable_logging_for_table(mi_info, FALSE);
   if (mi_info != NULL)
     maria_close(mi_info);
 }
@@ -1718,6 +1794,7 @@ result_t Table_restore::close()
       (kfile_restore.close_file() != backup::OK))
     SET_STATE_TO_ERROR_AND_DBUG_RETURN;
 
+#ifdef RESTORE_KERNEL_KEEPS_OPEN_TABLES
   /*
     CAUTION! Ugliest hack ever!
     This hack tries to recover from bypassing the Maria interface
@@ -1787,6 +1864,7 @@ result_t Table_restore::close()
     {
       LIST *list_element ;
       pthread_mutex_lock(&THR_LOCK_maria);
+      pthread_mutex_lock(&share->close_lock);
       pthread_mutex_lock(&share->intern_lock);
       for (list_element= maria_open_list;
            list_element;
@@ -1798,6 +1876,7 @@ result_t Table_restore::close()
       }
       pthread_mutex_unlock(&THR_LOCK_maria);
       pthread_mutex_unlock(&share->intern_lock);
+      pthread_mutex_unlock(&share->close_lock);
     }
     if (maria_close(mi_info))
       goto err;
@@ -1809,6 +1888,7 @@ result_t Table_restore::close()
   end :
     do {} while (0); /* Empty statement, syntactically required. */
   }
+#endif
 
   DBUG_RETURN(backup::OK);
 }
@@ -1850,7 +1930,6 @@ result_t Table_restore::post_restore()
   Vio* save_vio;
   DBUG_ENTER("maria_backup::Table_restore::post_restore");
 
-  if (!rebuild_index)
   {
     MARIA_HA *mi_info;
     MARIA_SHARE *share;
@@ -1874,6 +1953,31 @@ result_t Table_restore::post_restore()
       /* open_count>0 only because we copied while open, no problem */
       share->state.open_count= 0;
     }
+
+#ifdef RESTORE_KERNEL_KEEPS_OPEN_TABLES
+    if (share->data_file_type == BLOCK_RECORD)
+    {
+      /*
+        bitmap->map is full of zeroes (it dates from maria_delete_all_rows()
+        above). If we leave it like this, next threads may use it
+        (close_cached_tables() doesn't help here: if a thread has managed to
+        open the table while we had it locked, close_cached_tables() doesn't
+        close the table (BUG#40944)). So they will see it as empty, thus treat
+        data pages as empty, thus overwrite existing data records. To prevent
+        this, we reload this bitmap from disk.
+        We must do it in Table_restore::post_restore() and not in
+        Table_restore::close() (which is called two times): if we did it in
+        close(), the bitmap of before-log-applying would be read by page cache
+        and stay cached there, so the second close() (of after-log-applying)
+        will pick it from page cache instead of from disk, and so it will stay
+        old and empty.
+      */
+      pthread_mutex_lock(&share->bitmap.bitmap_lock);
+      share->bitmap.page= ~(ULL(0)); /* to force a read below */
+      (void)_ma_bitmap_get_page_bits(mi_info, &share->bitmap, 1);
+      pthread_mutex_unlock(&share->bitmap.bitmap_lock);
+    }
+#endif
     if (share->base.born_transactional)
     {
       /*
@@ -1891,9 +1995,11 @@ result_t Table_restore::post_restore()
     error= _ma_state_info_write_sub(share, share->kfile.file,
                                     &share->state, 1);
     error|= maria_close(mi_info);
-    goto err;
   }
 
+  if (!rebuild_index)
+    goto err;
+
   /*
     mariachk() as well as ha_maria::repair() do a lot of operations before
     and after maria_repair(); to not duplicate code we reuse one of them.

=== modified file 'storage/maria/maria_def.h'
--- a/storage/maria/maria_def.h	2009-01-29 20:01:10 +0000
+++ b/storage/maria/maria_def.h	2009-02-10 14:51:40 +0000
@@ -1096,9 +1096,9 @@ typedef struct st_maria_block_info
 enum maria_log_commands {
   MA_LOG_OPEN, /**< when maria_open() */
   MA_LOG_CLOSE, /**< when maria_close() */
-  MA_LOG_DELETE_ALL, /**< when maria_delete_all() */
   MA_LOG_WRITE_BYTES_MAD, /**< when Maria writes to the data file */
   MA_LOG_WRITE_BYTES_MAI, /**< when Maria writes to the index file */
+  MA_LOG_CHSIZE_MAD,      /**< when Maria changes size of data file */
   MA_LOG_CHSIZE_MAI,      /**< when Maria changes size of index file */
   MA_LOG_END_SENTINEL /**< keep this one unused and last */
 };
@@ -1139,8 +1139,9 @@ extern void maria_log_pwrite_physical(en
                                        MARIA_SHARE *share,
                                        const uchar *buffert, uint length,
                                        my_off_t filepos);
-extern void maria_log_chsize_kfile_physical(MARIA_SHARE *share,
-                                             my_off_t new_length);
+extern void maria_log_chsize_physical(MARIA_SHARE *share,
+                                      enum maria_log_commands command,
+                                      my_off_t new_length);
 #ifdef HAVE_MARIA_PHYSICAL_LOGGING
 static inline int32 ma_get_physical_logging_state(MARIA_SHARE *share)
 {

=== modified file 'storage/myisam/myisam_backup_engine.cc'
--- a/storage/myisam/myisam_backup_engine.cc	2009-01-28 11:08:55 +0000
+++ b/storage/myisam/myisam_backup_engine.cc	2009-02-10 14:51:40 +0000
@@ -118,6 +118,23 @@ using backup::Buffer;
 */
 #define MYISAM_BACKUP_VERSION 1
 
+/**
+  Restore kernel opens tables and locks them for the duration of restore
+  (after having created them empty); this means that cached objects stay
+  around (MYISAM_SHARE, MI_INFO) and can become out-of-sync with the
+  data/index file filled by the driver, unless we take precautions which are
+  recognizable by this symbol.
+*/
+#define RESTORE_KERNEL_KEEPS_OPEN_TABLES 1
+/**
+  Restore kernel leaves a time windows between end of creation of table (via
+  execution of CREATE TABLE) and locking of this table; in this window another
+  client can open/lock/modify/unlock the table, which conflicts with what the
+  driver is going to write to the data/index file, unless we take precautions
+  which are recognizable by this symbol.
+*/
+#define RESTORE_KERNEL_NOT_ATOMIC 1
+
 /** Like Table_ref but with file name added */
 class Myisam_table_ref
 {
@@ -1660,6 +1677,19 @@ Table_restore::Table_restore(const Table
     /* table does not exist or is corrupted? not normal, it's just created */
     goto err;
   }
+#ifdef RESTORE_KERNEL_NOT_ATOMIC
+  /*
+    Restore kernel leaves a window between creation of table and locking it;
+    in this window, another thread can modify the table, put pages in page
+    cache, alter state, increase the files's length to greater than what the
+    driver has to write...
+    So we re-empty it here. We know we are alone using the table at this
+    point, as restore kernel has finished locking tables.
+    See BUG#42519, BUG#41716.
+  */
+  if (mi_delete_all_rows(mi_info))
+    goto err;
+#endif
   /*
     It's ok to copy the kfile descriptor and write() to it as the upper layers
     guarantee that we are the only user of the brand new table (nobody will
@@ -1710,6 +1740,7 @@ result_t Table_restore::close()
       (kfile_restore.close_file() != backup::OK))
     SET_STATE_TO_ERROR_AND_DBUG_RETURN;
 
+#ifdef RESTORE_KERNEL_KEEPS_OPEN_TABLES
   /*
     CAUTION! Ugliest hack ever!
     This hack tries to recover from bypassing the MyISAM interface
@@ -1821,6 +1852,7 @@ result_t Table_restore::close()
   end :
     do {} while (0); /* Empty statement, syntactically required. */
   }
+#endif
 
   DBUG_RETURN(backup::OK);
 }
@@ -1862,7 +1894,6 @@ result_t Table_restore::post_restore()
   Vio* save_vio;
   DBUG_ENTER("myisam_backup::Table_restore::post_restore");
 
-  if (!rebuild_index)
   {
     MI_INFO *mi_info;
     MYISAM_SHARE *share;
@@ -1889,9 +1920,11 @@ result_t Table_restore::post_restore()
       error= mi_state_info_write(share, share->kfile, &share->state, 1);
     }
     error|= mi_close(mi_info);
-    goto err;
   }
 
+  if (!rebuild_index)
+    goto err;
+
   /*
     myisamchk() as well as ha_myisam::repair() do a lot of operations before
     and after mi_repair(); to not duplicate code we reuse one of them.

Thread
bzr commit into mysql-6.0 branch (guilhem:2721) Bug#42519Guilhem Bichot10 Feb