Thanks, for the patch, Ingo. I have a few comments and questions that
I want your response to before I finalize the review:
1. The new tests fail on Solaris x86 because a lot of warnings like
this:
+Warning 1292 Truncated incorrect DOUBLE value: 'A'
I think this is because your select query compares a varchar column
to an integer (see [1], [3] inline.)
2. If a understand the patch correctly, it is the change to kernel.cc
that fixes the problem. The changes to mf_keycache.c is to prevent
future crashes with similar bugs?
3. Are there any recommended policy for splitting test cases between
different test files? Why should not your test case go into
existing an existing backup test that tests myisam related stuff?
I am also unsure about the naming, and think I would like something
more informative that just a number to distinguish between
different backup/myisam tests.
4. The backup_myisam3_coverage test is just the same as the
backup_myisam3 test, but with error injection. It does not seem to
me that it attempts to test the actions performed upon error. If I
understand correctly, in case of the injected error, the server
will flush all tables. However, there are only one table around,
so the test performs the same actions as without error injection.
5. Which leads me to my next question: What would the disadvantages be
of simplifying this fix to always flush all tables after restore?
6. If I understand correctly, it is not possible to add to do the
close_cached_tables call from the driver code. Is this because it
should only be called within the server? It seems to me that it
would be "more right" to handle such driver specific issues on the
driver level. Also, the driver already has a list of the tables it
is restoring. It seems a bit of a waste to use the catalog to
build another list when one such already exists. Could an
alternative be to iterate over drivers, instead of snapshots here,
and do this on a per driver basis?
More questions/comments in-line.
Ingo Struewing wrote:
> #At file:///home2/mydev/bzrroot/mysql-6.0-bug40944-2/
>
> 2737 Ingo Struewing 2008-12-03
> Bug#40944 - Backup: crash after myisampack
>
> After restore of a compressed MyISAM table, the server crashed
> when the table was accessed.
>
> The problem was twofold.
>
> 1. The crash happened in the MyISAM keycache. It did not handle
> an error in reading a block correctly. This is now fixed in
> key_cache_read(), key_cache_insert(), and key_cache_write().
> Their BLOCK_ERROR handling does now follow a common pattern.
> Erroneous blocks are no longer put to the LRU ring, but freed.
>
> 2. After fixing the above, accesses to a restored compressed table
> failed with error messages about a corrupted table. Restore of a
> table works by dropping the old table if exists, creating a new
> one, and loading the data from the backup image. The MyISAM
> native driver writes directly to the table files, what was there
> at backup time. This works even for compressed tables, as long as
> the table is freshly opened after loading the data. But RESTORE
> needs to keep the table write locked until it is complete. This
> includes keeping the table open. When restore is done, it unlocks
> the tables and requests close. But our table cache does not
> perform a full close. It marks it unused and keeps it open in the
> cache. Later statements reuse the open table. But when the MyISAM
> table format changes, we must force a full close. Since this is
> difficult to detect, we do now always force close of restored
> MyISAM tables, if restored through the native driver.
> added:
> mysql-test/r/myisam_keycache_coverage.result
> mysql-test/suite/backup/r/backup_myisam3.result
> mysql-test/suite/backup/r/backup_myisam3_coverage.result
> mysql-test/suite/backup/t/backup_myisam3.test
> mysql-test/suite/backup/t/backup_myisam3_coverage.test
> mysql-test/t/myisam_keycache_coverage.test
> modified:
> .bzrignore
> mysql-test/lib/mtr_report.pl
> mysql-test/suite/backup/r/backup_myisam1.result
> mysql-test/suite/backup/t/backup_myisam1.test
> mysys/mf_keycache.c
> sql/backup/kernel.cc
> storage/myisam/mi_close.c
> storage/myisam/mi_open.c
> storage/myisam/myisam_backup_engine.cc
...
> +--replace_column 1 #
> +BACKUP DATABASE mysql_db1 to 'bup_myisam3.bak';
> +--replace_column 1 #
> +RESTORE FROM 'bup_myisam3.bak' OVERWRITE;
> +#
> +SELECT COUNT(*) FROM mysql_db1.t1 WHERE c1 < 5;
[1] Here you are comparing a varchar column to an int. I guess that
is not your intention.
...
> +#
> +# Inject error at Backup_restore_ctx_unlock_tables_alloc
> +SET debug='d,Backup_restore_ctx_unlock_tables_alloc';
[2] Other backup tests use SET SESSION debug='+d, ...'. Does the
SESSION keyword make any difference? How about d vs. +d?
Does d, replace the debug setting, while +d add another?
> +--replace_column 1 #
> +RESTORE FROM 'bup_myisam3.bak' OVERWRITE;
> +#
> +SELECT COUNT(*) FROM mysql_db1.t1 WHERE c1 < 5;
[3] Here is a similar varchar vs int comparison as in [1].
...
> +--echo #
> +--echo # Inject error key_cache_read_block_error
> +--echo #
> +SET debug='d,key_cache_read_block_error';
> +--replace_regex /'.*\//'/
[4] Is this replace really necessary? Would not the path be the same
here for all users and platforms?
> +--error 126
> +SELECT COUNT(*) FROM t1 WHERE c1 < 5;
> +FLUSH TABLE t1;
> +
> +--echo #
> +--echo # Inject error key_cache_insert_block_error
> +--echo #
> +SET debug='d,key_cache_insert_block_error';
[5] Will this replace the previous debug flag, or is both set after
this?
...
> === modified file 'mysys/mf_keycache.c'
> --- a/mysys/mf_keycache.c 2008-10-20 09:16:47 +0000
> +++ b/mysys/mf_keycache.c 2008-12-03 16:02:14 +0000
> @@ -1373,7 +1373,11 @@ static void unreg_request(KEY_CACHE *key
> DBUG_ASSERT(block->prev_changed && *block->prev_changed == block);
> DBUG_ASSERT(!block->next_used);
> DBUG_ASSERT(!block->prev_used);
> - if (! --block->requests)
> + /*
> + Unregister the request, but do not link erroneous blocks into the
> + LRU ring.
> + */
> + if (!--block->requests && !(block->status & BLOCK_ERROR))
[6] Did you consider making sure that unreg_request is not called for
error blocks instead of handling it here?
> {
> my_bool hot;
> if (block->hits_left)
> @@ -1455,8 +1459,7 @@ static void wait_for_readers(KEY_CACHE *
> #ifdef THREAD
> struct st_my_thread_var *thread= my_thread_var;
> DBUG_ASSERT(block->status & (BLOCK_READ | BLOCK_IN_USE));
> - DBUG_ASSERT(!(block->status & (BLOCK_ERROR | BLOCK_IN_FLUSH |
> - BLOCK_CHANGED)));
> + DBUG_ASSERT(!(block->status & (BLOCK_IN_FLUSH | BLOCK_CHANGED)));
[7] I do not understand this change. The intention of this patch
seems to be to put restriction on error blocks, but here it seems you
remove the restriction that wait_for_readers should not be called for
error blocks. Please, explain.
...
> === modified file 'storage/myisam/myisam_backup_engine.cc'
> --- a/storage/myisam/myisam_backup_engine.cc 2008-08-11 16:06:30 +0000
> +++ b/storage/myisam/myisam_backup_engine.cc 2008-12-03 16:02:14 +0000
> @@ -1754,6 +1754,25 @@ result_t Table_restore::close()
> But since the share does now cache the new values from the
> index file, the backup kernel's close writes the correct
> information back to the file.
> +
> + This used to work until a brave soul tried to backup and restore
> + compressed tables. Now we know, that replacing the state info is
> + insufficient. The table is always re-created as a non-compressed
> + table.
Does this imply that if I backup a compressed table, it will not be
compressed after a restore? Please, clarify.
--
Øystein