Hi Øystein,
thanks for the review. Please see below.
Øystein Grøvlen, 18.12.2008 12:20:
>
> Thanks, for the patch, Ingo. I have a few comments and questions that
> I want your response to before I finalize the review:
>
> 1. The new tests fail on Solaris x86 because a lot of warnings like
> this:
>
> +Warning 1292 Truncated incorrect DOUBLE value: 'A'
>
> I think this is because your select query compares a varchar column
> to an integer (see [1], [3] inline.)
After applying the patch to a current tree, I saw these warnings on
Linux too. Fixed.
>
> 2. If a understand the patch correctly, it is the change to kernel.cc
> that fixes the problem. The changes to mf_keycache.c is to prevent
> future crashes with similar bugs?
Yes, but the order of fixes were reverse. The initial behavior of the
bug was a crash in keycache code. I fixed this first. Only then I found
the kernel.cc problem.
>
> 3. Are there any recommended policy for splitting test cases between
> different test files? Why should not your test case go into
> existing an existing backup test that tests myisam related stuff?
> I am also unsure about the naming, and think I would like something
> more informative that just a number to distinguish between
> different backup/myisam tests.
I don't know of a policy for this. In contrast, I have seen different
opinions about test case splitting. One voice did even suggest to have
one test case per bug fix. ;-)
My first attempt was to add the new tests to one of the other
backup_myisam tests. But then I found that they run with different setups.
backup_myisam1 has an option file to enable --external-locking. This has
a high potential to fail on Windows. So we better keep this test case as
short as possible.
backup_myisam2 requires have_debug_sync. This test case will not
normally run on a production server. So it should get only those tests
that require synchronization.
So I made a new test case with no special setup, which can be used for
all MyISAM specific tests that don't require any.
How would you rename that test case?
backup.backup_myisam_with_external_locking.test,
backup.backup_myisam_with_debug_sync.test,
backup.backup_myisam_normal.test, and
backup.backup_myisam_normal_coverage.test?
>
> 4. The backup_myisam3_coverage test is just the same as the
> backup_myisam3 test, but with error injection. It does not seem to
> me that it attempts to test the actions performed upon error.
What error do you mean here?
> If I
> understand correctly, in case of the injected error, the server
> will flush all tables. However, there are only one table around,
> so the test performs the same actions as without error injection.
Coverage testing has a different goal. Its intention is to have every
line of new or changed code executed. Without error injection, I was not
able to let the execution run through the inner close_cached_tables().
Indeed, in this test case it performs exactly the same operation as the
outer close_cached_tables() does without error injection.
But this is an excellent example to show, what coverage testing can do
and cannot do.
Though I run through each line of code now, I still don't test more
cases than without coverage testing. The effort seems wasted.
However, before I added the coverage test, the error handling within the
loop looked different. Basically I copied the loop from
Backup_restore_ctx::lock_tables_for_restore() and modified it slightly.
The error handling was "return
fatal_error(log_error(ER_OUT_OF_RESOURCES));". I prepended
close_cached_tables() and all looked fine. Since the inner code was
never executed, all looked great, all tests passed.
Now, when I added the coverage test, I got problems because of the error
reporting and later due to the missing call to close_thread_tables().
Now the code became more simple and more correct. I do not report an
error, but it is not necessary because calling close_cached_tables()
with a list of tables is just an optimization. With a NULL list, all
tables are closed, which is inefficient, but still correct. If later
parts of code execution run into insufficient memory, they will
(hopefully) report an error, if they don't have a way around it.
In summary, though coverage testing catches a small class of problems
only, it is still very useful to discover problems that otherwise might
go undetected for a long time. These are mostly cases that will happen
very seldom, but anyway it's better to fix them.
>
> 5. Which leads me to my next question: What would the disadvantages be
> of simplifying this fix to always flush all tables after restore?
The disadvantage would be that all tables that are open in the whole
server would be closed, and have to be re-opened if they are used next
time. On a big, busy server, this can make a remarkable burst of system
activity and degrade performance until the table cache is filled with a
sufficient number of open TABLE objects for all frequently used tables
again.
Flushing just the tables, that we did freshly create during restore, is
much less of a threat. We should be the only one to have them open anyway.
>
> 6. If I understand correctly, it is not possible to add to do the
> close_cached_tables call from the driver code. Is this because it
> should only be called within the server? It seems to me that it
> would be "more right" to handle such driver specific issues on the
> driver level. Also, the driver already has a list of the tables it
> is restoring. It seems a bit of a waste to use the catalog to
> build another list when one such already exists. Could an
> alternative be to iterate over drivers, instead of snapshots here,
> and do this on a per driver basis?
I made some attempts to request table refresh from the driver. This gave
me nothing but crashes. It may be possible, but I don't know, how.
It is an interesting idea, to use the driver's table list. But it will
probably take me some time to locate it.
OTOH it doesn't look like the most optimal modularization to me, if the
backup driver, which belongs to the storage engine, manipulates the
server's table cache.
What we typically do in such cases is to report from the engine, if some
special treatment is required. So it would be great, if during driver
open, it could tell the backup kernel, that it needs a flush after restore.
I have read a comment that the kernel takes a copy of the table list for
later reference, but I didn't find it. And I didn't find an appropriate
place for it, if it is not there, in spite of the comment.
And last not least, I expect that other engines might have similar
requirements, once they have native drivers. Some common code doesn't
look that bad then.
...
>> +--echo #
>> +--echo # Inject error key_cache_read_block_error
>> +--echo #
>> +SET debug='d,key_cache_read_block_error';
>> +--replace_regex /'.*\//'/
>
> [4] Is this replace really necessary? Would not the path be the same
> here for all users and platforms?
Probably. I deleted it.
...
>> +--echo #
>> +--echo # Inject error key_cache_insert_block_error
>> +--echo #
>> +SET debug='d,key_cache_insert_block_error';
>
> [5] Will this replace the previous debug flag, or is both set after
> this?
It will replace the list of keywords. To add one, one would write
SET debug='+d,key_cache_insert_block_error';
>
> ...
>
>> === modified file 'mysys/mf_keycache.c'
>> --- a/mysys/mf_keycache.c 2008-10-20 09:16:47 +0000
>> +++ b/mysys/mf_keycache.c 2008-12-03 16:02:14 +0000
>> @@ -1373,7 +1373,11 @@ static void unreg_request(KEY_CACHE *key
>> DBUG_ASSERT(block->prev_changed && *block->prev_changed ==
> block);
>> DBUG_ASSERT(!block->next_used);
>> DBUG_ASSERT(!block->prev_used);
>> - if (! --block->requests)
>> + /*
>> + Unregister the request, but do not link erroneous blocks into the
>> + LRU ring.
>> + */
>> + if (!--block->requests && !(block->status & BLOCK_ERROR))
>
> [6] Did you consider making sure that unreg_request is not called for
> error blocks instead of handling it here?
It is called from free_block(), where it can be an error block. And it
does --block->requests unconditionally. It would be ugly to count block
requests in free-block() directly, just to avoid a call to
unreg_request(). OTOH, it might be slightly more performant. If you
insist...
>
>
>> {
>> my_bool hot;
>> if (block->hits_left)
>> @@ -1455,8 +1459,7 @@ static void wait_for_readers(KEY_CACHE *
>> #ifdef THREAD
>> struct st_my_thread_var *thread= my_thread_var;
>> DBUG_ASSERT(block->status & (BLOCK_READ | BLOCK_IN_USE));
>> - DBUG_ASSERT(!(block->status & (BLOCK_ERROR | BLOCK_IN_FLUSH |
>> - BLOCK_CHANGED)));
>> + DBUG_ASSERT(!(block->status & (BLOCK_IN_FLUSH | BLOCK_CHANGED)));
>
> [7] I do not understand this change. The intention of this patch
> seems to be to put restriction on error blocks, but here it seems you
> remove the restriction that wait_for_readers should not be called for
> error blocks. Please, explain.
These asserts stem from my last big keycache patch. I added them, to
understand, what kind of blocks can go through which functions. This was
a several months task. I adjusted the asserts back and forth while my
understanding of the code grew. Finally I ended up with a set which is
pretty restrictive and does still survive all tests we've done since.
However, during my work on the keycache I noticed that the handling of
block errors was virtually not present, but at least not correct. I made
some written notices and proposals to Monty, who reviewed my work at
that time. But these problems are very rare, and so nothing has been
done about it. Bug#40944 was the first time that a BLOCK_ERROR showed
up, as far as I know.
With this patch I propose a fix for the incorrect handling in
key_cache_read(), which made the reported crash, and the very similar
parts of key_cache_insert() and key_cache_write().
The crash happened on an assert, which disallowed to put erroneous
blocks in the LRU.
My main proposal is to change the named functions so that they do not
try to put erroneous blocks in the LRU any more, but to free them.
In this course, it became necessary to let erroneous blocks through some
functions, which are called from free_block().
An alternative could be to clear the error flag before freeing the
block, like I do with BLOCK_CHANGED in key_cache_write(). But I don't
feel this to be a cleaner approach. If yo prefer, we can do a poll with
Sergey.
...
>> + This used to work until a brave soul tried to backup and restore
>> + compressed tables. Now we know, that replacing the state info is
>> + insufficient. The table is always re-created as a non-compressed
>> + table.
>
> Does this imply that if I backup a compressed table, it will not be
> compressed after a restore? Please, clarify.
Restore drops and re-creates the tables it restores. While the
compression state might be visible through some SHOW functions, there is
no way to create a compressed table by SQL. A CREATE TABLE statement as
retrieved from SHOW CREATE TABLE does not show the compression state.
Hence, the freshly re-created table is always a non-compressed one.
Since the compression is handled completely inside MyISAM, and all
required information is in the MyISAM table files .MYD and .MYI, the
native driver does restore a compressed table. The next open of the
files will load the internal MyISAM structures, so that it can handle
the table correctly.
On the server level this does not make a difference. But as long as it
keeps the freshly created table open, it does also keep the MyISAM
structures open, which were loaded from the freshly created,
uncompressed table files. An attempt to use these structures to access
the now compressed files fails.
If we would use the built-in driver with MyISAM, each row would be
uncompressed on read and put into the backup stream uncompressed. On
restore, an uncompressed table would be created as usual. The built-in
driver would write the uncompressed rows into it. The result would be an
uncompressed table after restore.
Regards
Ingo
--
Ingo Strüwing, Database Group
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schröder, Wolfgang Engels, Dr. Roland Bömer
Vorsitzender des Aufsichtsrates: Martin Häring HRB München 161028