#At file:///home/mysql_src/bzrrepos/merge_6.0_myisam_changes_to_maria/mysql-6.0-maria-with-porting/ based on revid:guilhem@stripped
2721 Guilhem Bichot 2009-02-10
Maria online backup: BUG#42519 "Maria: RESTORE leads to corrupted table and assertion":
* working around deficiencies of the restore kernel
(occuring also in
BUG 41716 "Backup Error when sending data (for table #1) to Default/Snapshot restore driver"
BUG 40944 "Backup: crash after myisampack"); this fixes BUG#42519.
* same workaround for MyISAM (which suffered from a bug similar to BUG#42519).
* fixing a Maria restore driver bug found during debugging this ("DELETE FROM <table>;" was
not handled during backup).
modified:
mysql-test/suite/backup/r/backup_maria.result
mysql-test/suite/backup/t/backup_maria.test
sql/backup/kernel.cc
storage/maria/ma_bitmap.c
storage/maria/ma_check.c
storage/maria/ma_delete_all.c
storage/maria/ma_examine_non_trans_log.c
storage/maria/ma_non_trans_log.c
storage/maria/maria_backup_engine.cc
storage/maria/maria_def.h
storage/myisam/myisam_backup_engine.cc
per-file messages:
mysql-test/suite/backup/r/backup_maria.result
result is correct; before the bugfixes:
- a DBUG_ASSERT() would fire when "DELETE FROM <table>;" is run during backup
- the table would be corrupted in the final CHECK TABLE:
mysqltest.t1 check warning Size of indexfile is: 16384 Should be: 8192
mysqltest.t1 check warning Size of datafile is: 16384 Should be: 8192
mysqltest.t1 check error Bitmap at page 0 has pages reserved outside of data file length
because the client's INSERT (before VP creation) would cache stuff which would survive the
driver's restore.
mysql-test/suite/backup/t/backup_maria.test
adding a test which involves "DELETE FROM <table>;" (which caused a crash during backup),
and a test for BUG#42519
sql/backup/kernel.cc
sync points for repeating concurrency bugs
storage/maria/ma_bitmap.c
_ma_bitmap_create_first() is called during maria_delete_all_rows() which can happen
when the table is doing physical logging: do log its operations.
Optimization: a bitmap page of a temporary table does not need to call
maria_log_data_page_flush_physical.
storage/maria/ma_check.c
update to new prototype
storage/maria/ma_delete_all.c
This logical "MA_LOG_DELETE_ALL" log record had a problem: when log was applied it could generate
a transaction-log record, thus assign an id to the MARIA_SHARE, which could make Checkpoint
look at the table while it's being restored, which is not good. It also put an empty bitmap
in memory, which would be out-of-date cached data as soon as the next log record causes
insertion of some data in a page. Instead, we log the my_chsize() which sounds more in line
with other places like in ma_check.c.
We needn't do the same change in MyISAM because it does not have bitmaps or share's ids.
storage/maria/ma_examine_non_trans_log.c
MA_LOG_DELETE_ALL can't be found in log, and MA_LOG_CHSIZE_MAD can.
storage/maria/ma_non_trans_log.c
MA_LOG_DELETE_ALL can't be put in log, and MA_LOG_CHSIZE_MAD can.
storage/maria/maria_backup_engine.cc
* Working around a deficiency of the restore kernel:
- it leaves a window between creating the table empty and locking it; in this window,
another thread can do an update to the table (which conflicts with what the restore
driver will put into files).
- it does not always really close tables at end of restore, so restore driver has to
make sure to invalidate any cached info before table is unlocked.
- see this file's comments for more details about the two problems above;
they were the causes of BUG#42519 (various corruptions or assertion failures
in the first DMLs after RESTORE, depending on concurrency).
* The same logic is needed independently of rebuilding the index or not.
* keep MARIA_SHARE::close_lock when iterating over states of table,
to protect against a concurrent checkpoint (safer).
storage/maria/maria_def.h
new declarations
storage/myisam/myisam_backup_engine.cc
Same changes as in maria_backup_engine.cc (though simplified).
=== modified file 'mysql-test/suite/backup/r/backup_maria.result'
--- a/mysql-test/suite/backup/r/backup_maria.result 2009-01-28 11:08:55 +0000
+++ b/mysql-test/suite/backup/r/backup_maria.result 2009-02-10 14:51:40 +0000
@@ -136,5 +136,121 @@ CHECK TABLE t4 EXTENDED;
Table Op Msg_type Msg_text
mysqltest.t4 check status OK
+connection backup: Start backup
+SET DEBUG_SYNC= 'before_backup_data_prepare SIGNAL bup_sync
+ WAIT_FOR bup_finish';
+BACKUP DATABASE mysqltest TO 'test.ba';
+
+connection default: Wait for BACKUP to reach its sync point
+SET DEBUG_SYNC= 'now WAIT_FOR bup_sync';
+Modify tables
+delete from t1;
+insert into t1 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t1 set c=repeat("NUS", 4000);
+insert into t1 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t1 select * from t1;
+insert into t1 select * from t1;
+delete from t1 where a=200;
+CHECKSUM TABLE t1;
+Table Checksum
+mysqltest.t1 3470596356
+select count(*) from t1;
+count(*)
+4
+delete from t2;
+insert into t2 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t2 set c=repeat("NUS", 4000);
+insert into t2 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t2 select * from t2;
+insert into t2 select * from t2;
+delete from t2 where a=200;
+CHECKSUM TABLE t2;
+Table Checksum
+mysqltest.t2 3470596356
+select count(*) from t2;
+count(*)
+4
+delete from t3;
+insert into t3 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t3 set c=repeat("NUS", 4000);
+insert into t3 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t3 select * from t3;
+insert into t3 select * from t3;
+delete from t3 where a=200;
+CHECKSUM TABLE t3;
+Table Checksum
+mysqltest.t3 3470596356
+select count(*) from t3;
+count(*)
+4
+delete from t4;
+insert into t4 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t4 set c=repeat("NUS", 4000);
+insert into t4 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t4 select * from t4;
+insert into t4 select * from t4;
+delete from t4 where a=200;
+CHECKSUM TABLE t4;
+Table Checksum
+mysqltest.t4 3470596356
+select count(*) from t4;
+count(*)
+4
+Signal BACKUP to finish
+SET DEBUG_SYNC= 'now SIGNAL bup_finish';
+
+connection backup: Fetch result
+backup_id
+#
+DROP DATABASE mysqltest;
+RESTORE FROM 'test.ba';
+backup_id
+#
+CHECKSUM TABLE t1;
+Table Checksum
+mysqltest.t1 3470596356
+CHECK TABLE t1 EXTENDED;
+Table Op Msg_type Msg_text
+mysqltest.t1 check status OK
+CHECKSUM TABLE t2;
+Table Checksum
+mysqltest.t2 3470596356
+CHECK TABLE t2 EXTENDED;
+Table Op Msg_type Msg_text
+mysqltest.t2 check status OK
+CHECKSUM TABLE t3;
+Table Checksum
+mysqltest.t3 3470596356
+CHECK TABLE t3 EXTENDED;
+Table Op Msg_type Msg_text
+mysqltest.t3 check status OK
+CHECKSUM TABLE t4;
+Table Checksum
+mysqltest.t4 3470596356
+CHECK TABLE t4 EXTENDED;
+Table Op Msg_type Msg_text
+mysqltest.t4 check status OK
+drop database mysqltest;
+create database mysqltest;
+CREATE TABLE mysqltest.t1 (a int, b varchar(100), unique(a)) engine=maria;
+BACKUP DATABASE mysqltest TO 'test.ba';
+backup_id
+#
+DROP DATABASE mysqltest;
+SET DEBUG_SYNC= 'before_restore_locks_tables SIGNAL wait_for_restore WAIT_FOR finish';
+RESTORE FROM 'test.ba';
+SET DEBUG_SYNC= 'now WAIT_FOR wait_for_restore';
+insert into mysqltest.t1 values(1,"def");
+SET DEBUG_SYNC= 'now SIGNAL finish';
+backup_id
+#
+select * from mysqltest.t1;
+a b
+check table mysqltest.t1;
+Table Op Msg_type Msg_text
+mysqltest.t1 check status OK
+select * from mysqltest.t1;
+a b
+
connection default: cleanup
drop database mysqltest;
=== modified file 'mysql-test/suite/backup/t/backup_maria.test'
--- a/mysql-test/suite/backup/t/backup_maria.test 2009-01-28 14:01:33 +0000
+++ b/mysql-test/suite/backup/t/backup_maria.test 2009-02-10 14:51:40 +0000
@@ -124,6 +124,115 @@ SET DEBUG_SYNC= 'now SIGNAL bup_finish';
CHECKSUM TABLE t4;
CHECK TABLE t4 EXTENDED;
+connection default;
+--remove_file $MYSQLTEST_VARDIR/master-data/test.ba
+
+# Once more, with deletion of all rows
+
+ --echo
+ --echo connection backup: Start backup
+ connection backup;
+
+ SET DEBUG_SYNC= 'before_backup_data_prepare SIGNAL bup_sync
+ WAIT_FOR bup_finish';
+ send BACKUP DATABASE mysqltest TO 'test.ba';
+
+--echo
+--echo connection default: Wait for BACKUP to reach its sync point
+connection default;
+SET DEBUG_SYNC= 'now WAIT_FOR bup_sync';
+
+--echo Modify tables
+delete from t1;
+insert into t1 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t1 set c=repeat("NUS", 4000); # a multi-page blob
+insert into t1 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t1 select * from t1;
+insert into t1 select * from t1;
+delete from t1 where a=200;
+CHECKSUM TABLE t1;
+select count(*) from t1;
+delete from t2;
+insert into t2 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t2 set c=repeat("NUS", 4000); # a multi-page blob
+insert into t2 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t2 select * from t2;
+insert into t2 select * from t2;
+delete from t2 where a=200;
+CHECKSUM TABLE t2;
+select count(*) from t2;
+delete from t3;
+insert into t3 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t3 set c=repeat("NUS", 4000); # a multi-page blob
+insert into t3 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t3 select * from t3;
+insert into t3 select * from t3;
+delete from t3 where a=200;
+CHECKSUM TABLE t3;
+select count(*) from t3;
+delete from t4;
+insert into t4 values(2000,"UJLlk5454j-JK","PferZZZfPMLsn!");
+update t4 set c=repeat("NUS", 4000); # a multi-page blob
+insert into t4 values(200,"UJLlkj-JK","PferfPMLsn!");
+insert into t4 select * from t4;
+insert into t4 select * from t4;
+delete from t4 where a=200;
+CHECKSUM TABLE t4;
+select count(*) from t4;
+
+
+--echo Signal BACKUP to finish
+SET DEBUG_SYNC= 'now SIGNAL bup_finish';
+
+ --echo
+ --echo connection backup: Fetch result
+ connection backup;
+ --replace_column 1 #
+ reap;
+ DROP DATABASE mysqltest;
+
+ --replace_column 1 #
+ RESTORE FROM 'test.ba';
+ CHECKSUM TABLE t1;
+ CHECK TABLE t1 EXTENDED;
+ CHECKSUM TABLE t2;
+ CHECK TABLE t2 EXTENDED;
+ CHECKSUM TABLE t3;
+ CHECK TABLE t3 EXTENDED;
+ CHECKSUM TABLE t4;
+ CHECK TABLE t4 EXTENDED;
+
+connection default;
+drop database mysqltest;
+--remove_file $MYSQLTEST_VARDIR/master-data/test.ba
+
+# test for BUG#42519 "Maria: RESTORE leads to corrupted table and assertion":
+
+create database mysqltest;
+CREATE TABLE mysqltest.t1 (a int, b varchar(100), unique(a)) engine=maria;
+
+--replace_column 1 #
+BACKUP DATABASE mysqltest TO 'test.ba';
+DROP DATABASE mysqltest;
+
+connection backup;
+SET DEBUG_SYNC= 'before_restore_locks_tables SIGNAL wait_for_restore WAIT_FOR finish';
+send RESTORE FROM 'test.ba';
+
+connection default;
+SET DEBUG_SYNC= 'now WAIT_FOR wait_for_restore';
+insert into mysqltest.t1 values(1,"def");
+SET DEBUG_SYNC= 'now SIGNAL finish';
+
+connection backup;
+--replace_column 1 #
+reap;
+
+connection default;
+select * from mysqltest.t1;
+check table mysqltest.t1;
+select * from mysqltest.t1;
+
--echo
--echo connection default: cleanup
connection default;
=== modified file 'sql/backup/kernel.cc'
--- a/sql/backup/kernel.cc 2009-01-26 11:20:22 +0000
+++ b/sql/backup/kernel.cc 2009-02-10 14:51:40 +0000
@@ -1262,10 +1262,14 @@ int Backup_restore_ctx::do_restore(bool
close_thread_tables(m_thd); // Never errors
m_thd->main_da.reset_diagnostics_area(); // Never errors
+ DEBUG_SYNC(m_thd, "before_restore_locks_tables");
+
err= lock_tables_for_restore(); // logs errors
if (err)
DBUG_RETURN(fatal_error(err));
+ DEBUG_SYNC(m_thd, "after_restore_locks_tables");
+
// Here restore drivers are created to restore table data
err= restore_table_data(m_thd, info, s); // logs errors
=== modified file 'storage/maria/ma_bitmap.c'
--- a/storage/maria/ma_bitmap.c 2009-01-28 11:08:55 +0000
+++ b/storage/maria/ma_bitmap.c 2009-02-10 14:51:40 +0000
@@ -2517,13 +2517,22 @@ int _ma_bitmap_create_first(MARIA_SHARE
if it is needed
*/
int4store(marker, MARIA_NO_CRC_BITMAP_PAGE);
- DBUG_ASSERT(!share->physical_logging);
+
if (my_chsize(file, block_size - sizeof(marker),
0, MYF(MY_WME)) ||
my_pwrite(file, marker, sizeof(marker),
block_size - sizeof(marker),
MYF(MY_NABP | MY_WME)))
return 1;
+ if (unlikely(ma_get_physical_logging_state(share)))
+ {
+ maria_log_chsize_physical(share, MA_LOG_CHSIZE_MAD,
+ block_size - sizeof(marker));
+ maria_log_pwrite_physical(MA_LOG_WRITE_BYTES_MAD, share, marker,
+ sizeof(marker),
+ block_size - sizeof(marker));
+ }
+
share->state.state.data_file_length= block_size;
_ma_bitmap_delete_all(share);
return 0;
@@ -2573,7 +2582,7 @@ void _ma_bitmap_set_pagecache_callbacks(
file->callback_data= (uchar*) share;
file->flush_log_callback= maria_flush_log_for_page_none;
file->write_fail= maria_page_write_failure;
- file->post_write_callback= maria_log_data_page_flush_physical;
+ file->post_write_callback= &maria_flush_log_for_page_none;
if (share->temporary)
{
@@ -2589,6 +2598,9 @@ void _ma_bitmap_set_pagecache_callbacks(
file->pre_write_callback= &maria_page_filler_set_bitmap;
if (share->now_transactional)
file->flush_log_callback= flush_log_for_bitmap;
+#ifdef HAVE_MARIA_PHYSICAL_LOGGING
+ file->post_write_callback= &maria_log_data_page_flush_physical;
+#endif
}
}
=== modified file 'storage/maria/ma_check.c'
--- a/storage/maria/ma_check.c 2009-01-28 11:08:55 +0000
+++ b/storage/maria/ma_check.c 2009-02-10 14:51:40 +0000
@@ -6739,7 +6739,7 @@ static int chsize_kfile(MARIA_HA *info)
ma_get_physical_logging_state(share)))
{
DBUG_ASSERT(no_length_change);
- maria_log_chsize_kfile_physical(share, new_length);
+ maria_log_chsize_physical(share, MA_LOG_CHSIZE_MAI, new_length);
}
return ret;
=== modified file 'storage/maria/ma_delete_all.c'
--- a/storage/maria/ma_delete_all.c 2009-01-28 11:08:55 +0000
+++ b/storage/maria/ma_delete_all.c 2009-02-10 14:51:40 +0000
@@ -102,6 +102,11 @@ int maria_delete_all_rows(MARIA_HA *info
my_chsize(info->dfile.file, 0, 0, MYF(MY_WME)) ||
my_chsize(share->kfile.file, share->base.keystart, 0, MYF(MY_WME)))
goto err;
+ if (unlikely(ma_get_physical_logging_state(share)))
+ {
+ maria_log_chsize_physical(share, MA_LOG_CHSIZE_MAD, 0);
+ maria_log_chsize_physical(share, MA_LOG_CHSIZE_MAI, share->base.keystart);
+ }
if (_ma_initialize_data_file(share, info->dfile.file))
goto err;
@@ -125,9 +130,6 @@ int maria_delete_all_rows(MARIA_HA *info
goto err;
}
- if (unlikely(ma_get_physical_logging_state(info->s)))
- _maria_log_command(&maria_physical_log, MA_LOG_DELETE_ALL, share,
- NULL, 0, 0);
(void)(_ma_writeinfo(info,WRITEINFO_UPDATE_KEYFILE));
#ifdef HAVE_MMAP
/* Map again */
=== modified file 'storage/maria/ma_examine_non_trans_log.c'
--- a/storage/maria/ma_examine_non_trans_log.c 2009-01-28 11:08:55 +0000
+++ b/storage/maria/ma_examine_non_trans_log.c 2009-02-10 14:51:40 +0000
@@ -32,7 +32,7 @@
/** Human-readable names of commands storable in Maria logs */
const char *ma_log_command_name[]=
{"open","close",
- "delete-all", "write-bytes-to-MAD", "write-bytes-to-MAI", "chsize-MAI",
+ "write-bytes-to-MAD", "write-bytes-to-MAI", "chsize-MAD", "chsize-MAI",
/*
This one is special: it is never in log records, it's just used by
ma_examine_log() to tell the user that it failed when reopening a table. It
@@ -144,8 +144,8 @@ int ma_examine_log(MA_EXAMINE_LOG_PARAM
TREE tree;
struct file_info file_info,*curr_file_info;
uint head_len[][2]=
- { { 11, 14 }, { 11, 14 }, { 11, 14 }, { 9, 16 }, { 9, 16 }, { 7, 12 } };
- uint has_pid_and_result[]= {1, 1, 1, 0, 0, 0};
+ { { 11, 14 }, { 11, 14 }, { 9, 16 }, { 9, 16 }, { 7, 12 }, { 7, 12 } };
+ uint has_pid_and_result[]= {1, 1, 0, 0, 0, 0};
DBUG_ENTER("ma_examine_log");
compile_time_assert((sizeof(ma_log_command_name) /
@@ -393,6 +393,7 @@ int ma_examine_log(MA_EXAMINE_LOG_PARAM
}
my_free(buff,MYF(0));
break;
+ case MA_LOG_CHSIZE_MAD:
case MA_LOG_CHSIZE_MAI:
/* here 'filepos' means new length of file */
if (big_numbers)
@@ -416,24 +417,13 @@ int ma_examine_log(MA_EXAMINE_LOG_PARAM
if (mi_exl->update && curr_file_info && !curr_file_info->closed)
{
update_index_on_close= FALSE;
- if (my_chsize(curr_file_info->isam->s->kfile.file, filepos,
- 0, MYF(MY_WME)))
+ if (my_chsize((command == MA_LOG_CHSIZE_MAI) ?
+ curr_file_info->isam->s->kfile.file :
+ curr_file_info->isam->dfile.file,
+ filepos, 0, MYF(MY_WME)))
goto com_err;
}
break;
- case MA_LOG_DELETE_ALL:
- if (mi_exl->verbose && !mi_exl->record_pos_file &&
- (!mi_exl->table_selection_hook ||
- (curr_file_info && curr_file_info->used)))
- printf_log(mi_exl->verbose, isamlog_process, isamlog_filepos,
- "%s: %s -> %d",FILENAME(curr_file_info),
- ma_log_command_name[command],result);
- if (mi_exl->update && curr_file_info && !curr_file_info->closed)
- {
- if (maria_delete_all_rows(curr_file_info->isam) != (int) result)
- goto com_err;
- }
- break;
default:
fflush(stdout);
fprintf(stderr, "Error: found unknown command %d in logfile, aborted\n",
=== modified file 'storage/maria/ma_non_trans_log.c'
--- a/storage/maria/ma_non_trans_log.c 2009-01-29 20:01:10 +0000
+++ b/storage/maria/ma_non_trans_log.c 2009-02-10 14:51:40 +0000
@@ -246,13 +246,13 @@ void _maria_log_command(IO_CACHE *log, e
uchar header[14];
int old_errno, headerlen;
ulong pid=(ulong) GETPID();
- File file;
-
- file= share->kfile.file;
-
- DBUG_ASSERT(command == MA_LOG_OPEN || command == MA_LOG_CLOSE ||
- command == MA_LOG_DELETE_ALL);
+ File file= share->kfile.file;
old_errno=my_errno;
+ DBUG_ENTER("_maria_log_command");
+ DBUG_PRINT("enter", ("command: %u share->open_file_name.str '%s'",
+ command, share->open_file_name.str));
+ DBUG_ASSERT(command == MA_LOG_OPEN || command == MA_LOG_CLOSE);
+
DBUG_ASSERT(((uint)result) <= UINT_MAX16);
if (file >= UINT_MAX16 || length >= UINT_MAX16)
{
@@ -340,6 +340,7 @@ retry:
}
pthread_mutex_unlock(&THR_LOCK_maria_log);
my_errno=old_errno;
+ DBUG_VOID_RETURN;
}
@@ -413,24 +414,27 @@ retry:
/**
- Logs a my_chsize() done to the index file to the physical log.
+ Logs a my_chsize() done to the data or index file to the physical log.
Also logs MA_LOG_OPEN if first time.
@param share table's share
- @param new_length new length of the table's index file
+ @param command MA_LOG_CHSIZE_MAD or MA_LOG_CHSIZE_MAI
+ @param new_length new length of the table's file
*/
-void maria_log_chsize_kfile_physical(MARIA_SHARE *share,
- my_off_t new_length)
+void maria_log_chsize_physical(MARIA_SHARE *share,
+ enum maria_log_commands command,
+ my_off_t new_length)
{
uchar header[12];
int old_errno, headerlen;
- DBUG_ENTER("maria_log_chsize_kfile_physical");
+ DBUG_ENTER("maria_log_chsize_physical");
old_errno= my_errno;
+ DBUG_ASSERT(command == MA_LOG_CHSIZE_MAD || command == MA_LOG_CHSIZE_MAI);
if (share->kfile.file >= UINT_MAX16 || new_length >= UINT_MAX32)
{
- header[0]= MA_LOG_CHSIZE_MAI | MA_LOG_BIG_NUMBERS;
+ header[0]= ((uchar) command) | MA_LOG_BIG_NUMBERS;
DBUG_ASSERT(share->kfile.file < (2<<24));
mi_int3store(header + 1, share->kfile.file);
mi_sizestore(header + 4, new_length);
@@ -438,7 +442,7 @@ void maria_log_chsize_kfile_physical(MAR
}
else
{
- header[0]= MA_LOG_CHSIZE_MAI;
+ header[0]= (uchar)command;
mi_int2store(header + 1, share->kfile.file);
mi_int4store(header + 3, new_length);
headerlen= 7;
=== modified file 'storage/maria/maria_backup_engine.cc'
--- a/storage/maria/maria_backup_engine.cc 2009-01-28 11:08:55 +0000
+++ b/storage/maria/maria_backup_engine.cc 2009-02-10 14:51:40 +0000
@@ -29,7 +29,10 @@
#define MYSQL_SERVER 1 // need it to have mysql_tmpdir defined
#include "mysql_priv.h"
#include "ha_maria.h"
+C_MODE_START
#include "maria_def.h" // to access dfile and kfile
+#include "ma_blockrec.h"
+C_MODE_END
#include "backup/backup_engine.h"
#include "backup/backup_aux.h" // for build_table_list()
#include <hash.h>
@@ -123,6 +126,23 @@ using backup::Buffer;
*/
#define MARIA_BACKUP_VERSION 1
+/**
+ Restore kernel opens tables and locks them for the duration of restore
+ (after having created them empty); this means that cached objects stay
+ around (MARIA_SHARE, MARIA_HA) and can become out-of-sync with the
+ data/index file filled by the driver, unless we take precautions which are
+ recognizable by this symbol.
+*/
+#define RESTORE_KERNEL_KEEPS_OPEN_TABLES 1
+/**
+ Restore kernel leaves a time windows between end of creation of table (via
+ execution of CREATE TABLE) and locking of this table; in this window another
+ client can open/lock/modify/unlock the table, which conflicts with what the
+ driver is going to write to the data/index file, unless we take precautions
+ which are recognizable by this symbol.
+*/
+#define RESTORE_KERNEL_NOT_ATOMIC 1
+
/** Like Table_ref but with file name added */
class Maria_table_ref
{
@@ -1652,7 +1672,9 @@ Table_restore::Table_restore(const Table
Maria_table_ref(tbl), rebuild_index(FALSE)
{
MARIA_HA *mi_info;
+ MARIA_SHARE *share;
File dfiledes= -1, kfiledes= -1;
+ my_bool save_transactional= FALSE;
DBUG_ENTER("maria_backup::Table_restore::Table_restore");
DBUG_PRINT("enter",("Initializing backup image for table %s",
file_name.ptr()));
@@ -1668,13 +1690,65 @@ Table_restore::Table_restore(const Table
/* table does not exist or is corrupted? not normal, it's just created */
goto err;
}
+
+ share= mi_info->s;
+
+#ifdef RESTORE_KERNEL_NOT_ATOMIC
+ /*
+ Restore kernel leaves a window between creation of table and locking it;
+ in this window, another thread can modify the table, put pages in page
+ cache, changed cached bitmap or state, increase the files's length to
+ greater than what the driver has to write...
+ So we re-empty it here. We know we are alone using the table at this
+ point, as restore kernel has finished locking tables.
+ See BUG#42519, BUG#41716.
+ */
+ /* Another thread may have assigned an id */
+ pthread_mutex_lock(&share->intern_lock);
+ if (share->id != 0)
+ {
+ translog_deassign_id_from_share(share);
+ /*
+ Because id is 0, checkpoint will ignore this table, which is good
+ (otherwise Checkpoint may flush old info to the files, overwriting the
+ writes done by the driver.
+ */
+ }
+ pthread_mutex_unlock(&share->intern_lock);
+ save_transactional= share->now_transactional;
+ if (save_transactional) /* don't need logging */
+ _ma_tmp_disable_logging_for_table(mi_info, FALSE);
+ if (maria_delete_all_rows(mi_info))
+ goto err;
+ if (share->data_file_type == BLOCK_RECORD)
+ {
+ /*
+ maria_delete_all_rows() filled bitmap->map with zeroes and marked this
+ bitmap as changed. Flushing those zeroes would be wrong as soon as we
+ have restored the first bitmap page of the data file: prevent it.
+ Checkpoint can't flush between maria_delete_all_rows() and here, because
+ share->id is 0.
+ */
+ pthread_mutex_lock(&share->bitmap.bitmap_lock);
+ share->bitmap.changed= FALSE;
+ pthread_mutex_unlock(&share->bitmap.bitmap_lock);
+ DBUG_ASSERT(share->id == 0);
+ }
+ if (save_transactional)
+ {
+ save_transactional= FALSE;
+ if (_ma_reenable_logging_for_table(mi_info, FALSE))
+ goto err;
+ }
+#endif
+
/*
It's ok to copy the kfile descriptor and write() to it as the upper layers
guarantee that we are the only user of the brand new table (nobody will
lseek() under our feet).
*/
if (((dfiledes= my_dup(mi_info->dfile.file, MYF(MY_WME))) < 0) ||
- ((kfiledes= my_dup(mi_info->s->kfile.file, MYF(MY_WME))) < 0))
+ ((kfiledes= my_dup(share->kfile.file, MYF(MY_WME))) < 0))
goto err;
/*
We are going to my_write() to the files without updating the table's
@@ -1697,6 +1771,8 @@ err:
my_close(dfiledes, MYF(MY_WME));
if (kfiledes > 0)
my_close(kfiledes, MYF(MY_WME));
+ if (save_transactional)
+ _ma_reenable_logging_for_table(mi_info, FALSE);
if (mi_info != NULL)
maria_close(mi_info);
}
@@ -1718,6 +1794,7 @@ result_t Table_restore::close()
(kfile_restore.close_file() != backup::OK))
SET_STATE_TO_ERROR_AND_DBUG_RETURN;
+#ifdef RESTORE_KERNEL_KEEPS_OPEN_TABLES
/*
CAUTION! Ugliest hack ever!
This hack tries to recover from bypassing the Maria interface
@@ -1787,6 +1864,7 @@ result_t Table_restore::close()
{
LIST *list_element ;
pthread_mutex_lock(&THR_LOCK_maria);
+ pthread_mutex_lock(&share->close_lock);
pthread_mutex_lock(&share->intern_lock);
for (list_element= maria_open_list;
list_element;
@@ -1798,6 +1876,7 @@ result_t Table_restore::close()
}
pthread_mutex_unlock(&THR_LOCK_maria);
pthread_mutex_unlock(&share->intern_lock);
+ pthread_mutex_unlock(&share->close_lock);
}
if (maria_close(mi_info))
goto err;
@@ -1809,6 +1888,7 @@ result_t Table_restore::close()
end :
do {} while (0); /* Empty statement, syntactically required. */
}
+#endif
DBUG_RETURN(backup::OK);
}
@@ -1850,7 +1930,6 @@ result_t Table_restore::post_restore()
Vio* save_vio;
DBUG_ENTER("maria_backup::Table_restore::post_restore");
- if (!rebuild_index)
{
MARIA_HA *mi_info;
MARIA_SHARE *share;
@@ -1874,6 +1953,31 @@ result_t Table_restore::post_restore()
/* open_count>0 only because we copied while open, no problem */
share->state.open_count= 0;
}
+
+#ifdef RESTORE_KERNEL_KEEPS_OPEN_TABLES
+ if (share->data_file_type == BLOCK_RECORD)
+ {
+ /*
+ bitmap->map is full of zeroes (it dates from maria_delete_all_rows()
+ above). If we leave it like this, next threads may use it
+ (close_cached_tables() doesn't help here: if a thread has managed to
+ open the table while we had it locked, close_cached_tables() doesn't
+ close the table (BUG#40944)). So they will see it as empty, thus treat
+ data pages as empty, thus overwrite existing data records. To prevent
+ this, we reload this bitmap from disk.
+ We must do it in Table_restore::post_restore() and not in
+ Table_restore::close() (which is called two times): if we did it in
+ close(), the bitmap of before-log-applying would be read by page cache
+ and stay cached there, so the second close() (of after-log-applying)
+ will pick it from page cache instead of from disk, and so it will stay
+ old and empty.
+ */
+ pthread_mutex_lock(&share->bitmap.bitmap_lock);
+ share->bitmap.page= ~(ULL(0)); /* to force a read below */
+ (void)_ma_bitmap_get_page_bits(mi_info, &share->bitmap, 1);
+ pthread_mutex_unlock(&share->bitmap.bitmap_lock);
+ }
+#endif
if (share->base.born_transactional)
{
/*
@@ -1891,9 +1995,11 @@ result_t Table_restore::post_restore()
error= _ma_state_info_write_sub(share, share->kfile.file,
&share->state, 1);
error|= maria_close(mi_info);
- goto err;
}
+ if (!rebuild_index)
+ goto err;
+
/*
mariachk() as well as ha_maria::repair() do a lot of operations before
and after maria_repair(); to not duplicate code we reuse one of them.
=== modified file 'storage/maria/maria_def.h'
--- a/storage/maria/maria_def.h 2009-01-29 20:01:10 +0000
+++ b/storage/maria/maria_def.h 2009-02-10 14:51:40 +0000
@@ -1096,9 +1096,9 @@ typedef struct st_maria_block_info
enum maria_log_commands {
MA_LOG_OPEN, /**< when maria_open() */
MA_LOG_CLOSE, /**< when maria_close() */
- MA_LOG_DELETE_ALL, /**< when maria_delete_all() */
MA_LOG_WRITE_BYTES_MAD, /**< when Maria writes to the data file */
MA_LOG_WRITE_BYTES_MAI, /**< when Maria writes to the index file */
+ MA_LOG_CHSIZE_MAD, /**< when Maria changes size of data file */
MA_LOG_CHSIZE_MAI, /**< when Maria changes size of index file */
MA_LOG_END_SENTINEL /**< keep this one unused and last */
};
@@ -1139,8 +1139,9 @@ extern void maria_log_pwrite_physical(en
MARIA_SHARE *share,
const uchar *buffert, uint length,
my_off_t filepos);
-extern void maria_log_chsize_kfile_physical(MARIA_SHARE *share,
- my_off_t new_length);
+extern void maria_log_chsize_physical(MARIA_SHARE *share,
+ enum maria_log_commands command,
+ my_off_t new_length);
#ifdef HAVE_MARIA_PHYSICAL_LOGGING
static inline int32 ma_get_physical_logging_state(MARIA_SHARE *share)
{
=== modified file 'storage/myisam/myisam_backup_engine.cc'
--- a/storage/myisam/myisam_backup_engine.cc 2009-01-28 11:08:55 +0000
+++ b/storage/myisam/myisam_backup_engine.cc 2009-02-10 14:51:40 +0000
@@ -118,6 +118,23 @@ using backup::Buffer;
*/
#define MYISAM_BACKUP_VERSION 1
+/**
+ Restore kernel opens tables and locks them for the duration of restore
+ (after having created them empty); this means that cached objects stay
+ around (MYISAM_SHARE, MI_INFO) and can become out-of-sync with the
+ data/index file filled by the driver, unless we take precautions which are
+ recognizable by this symbol.
+*/
+#define RESTORE_KERNEL_KEEPS_OPEN_TABLES 1
+/**
+ Restore kernel leaves a time windows between end of creation of table (via
+ execution of CREATE TABLE) and locking of this table; in this window another
+ client can open/lock/modify/unlock the table, which conflicts with what the
+ driver is going to write to the data/index file, unless we take precautions
+ which are recognizable by this symbol.
+*/
+#define RESTORE_KERNEL_NOT_ATOMIC 1
+
/** Like Table_ref but with file name added */
class Myisam_table_ref
{
@@ -1660,6 +1677,19 @@ Table_restore::Table_restore(const Table
/* table does not exist or is corrupted? not normal, it's just created */
goto err;
}
+#ifdef RESTORE_KERNEL_NOT_ATOMIC
+ /*
+ Restore kernel leaves a window between creation of table and locking it;
+ in this window, another thread can modify the table, put pages in page
+ cache, alter state, increase the files's length to greater than what the
+ driver has to write...
+ So we re-empty it here. We know we are alone using the table at this
+ point, as restore kernel has finished locking tables.
+ See BUG#42519, BUG#41716.
+ */
+ if (mi_delete_all_rows(mi_info))
+ goto err;
+#endif
/*
It's ok to copy the kfile descriptor and write() to it as the upper layers
guarantee that we are the only user of the brand new table (nobody will
@@ -1710,6 +1740,7 @@ result_t Table_restore::close()
(kfile_restore.close_file() != backup::OK))
SET_STATE_TO_ERROR_AND_DBUG_RETURN;
+#ifdef RESTORE_KERNEL_KEEPS_OPEN_TABLES
/*
CAUTION! Ugliest hack ever!
This hack tries to recover from bypassing the MyISAM interface
@@ -1821,6 +1852,7 @@ result_t Table_restore::close()
end :
do {} while (0); /* Empty statement, syntactically required. */
}
+#endif
DBUG_RETURN(backup::OK);
}
@@ -1862,7 +1894,6 @@ result_t Table_restore::post_restore()
Vio* save_vio;
DBUG_ENTER("myisam_backup::Table_restore::post_restore");
- if (!rebuild_index)
{
MI_INFO *mi_info;
MYISAM_SHARE *share;
@@ -1889,9 +1920,11 @@ result_t Table_restore::post_restore()
error= mi_state_info_write(share, share->kfile, &share->state, 1);
}
error|= mi_close(mi_info);
- goto err;
}
+ if (!rebuild_index)
+ goto err;
+
/*
myisamchk() as well as ha_myisam::repair() do a lot of operations before
and after mi_repair(); to not duplicate code we reuse one of them.
| Thread |
|---|
| • bzr commit into mysql-6.0 branch (guilhem:2721) Bug#42519 | Guilhem Bichot | 10 Feb |