Hi Rafal,
thank you for looking at it.
Rafal Somla, 21.10.2009 16:47:
> Hi Ingo,
>
> I had a look at your patch. Here are my comments.
>
> 1. The solution which you implements here puts new responsibility on
> backup/restore drivers - they should save and restore the current
> auto_increment value for each table they handle.
>
> This new requirement should be documented, so that when new drivers are
> written, the implementer will not forget about that. I think a good
> place to document the requirement is somewhere in backup_engine.h file.
> Ideally we should also update documentation on various web pages, and in
> backup internals manual.
Agree.
>
> Note that it is not clear what "current auto_increment value for a
> table" is. One understanding is that it is the value at VP time. You use
> a different interpretation, where any value bigger than the value at VP
> time is OK.
>
> 2. If we are after saving auto_increment values at VP time, I'd arrange
> the code differently. I'll scan for these values just after unlock()
> call (i.e., when VP was just created). I'll iterate over all tables and
> save AI values either directly in the stream or in memory (to be written
> later, at the end of the data stream).
If you mean "just after lock() call" (not unlock()), the I agree.
But, as we agreed on IRC, we want to keep the lock period as short as
possible. If somebody teaches us that we need to restore exact values
from VP time, we don't have a chance than the above. However, I hope it
is acceptable if the auto_increment value is restored too high.
One possible problem could be statement based replication. After
restoring the backup from a master, the slave could produce higher
values than the master.
...
>> Note that there is a chance that the restored
>> auto_increment value can be higher than at the
>> validation point of BACKUP. Concurrent DML on a table
>> backed up with the consistent snapshot driver can
>> increase the auto_increment value before the driver
>> reads it. Since we do not guarantee gap free
>> auto_increment values, I consider this as acceptable.
>> At least this patch assures that the values cannot be
>> lower.
>
> This is a controversial choice. Another view is that we want to restore
> the situation as it was at the VP time, including the current values of
> auto_increment. For example, a user can expect that after restore
> LAST_INSERT_ID() returns the same value as it was returning at VP time.
You mean, a user can expect that after restore an INSERT would generate
the same auto_increment value as an INSERT immediately after the
BACKUP's VP.
LAST_INSERT_ID() returns 0 in a new session and the value of the
auto_increment column of the first row after an INSERT statement.
LAST_INSERT_ID() continues to return the same value again and again
until changed by another INSERT. RESTORE does not influence the value
returned by LAST_INSERT_ID(). An INSERT into a table with no
auto_increment column does not influence LAST_INSERT_ID(). Even an
INSERT into a table with an auto_increment column, but using a fixed
value for it, does not influence LAST_INSERT_ID(), even though this
raises the auto_increment value.
>
> On the other hand, I can also understand the reasoning that user should
> not rely on particular values of auto_increment - only on the fact that
> they are unique. Then not restoring exact values should be acceptable.
Right. That's my position. However, I'm concerned about a possible
impact on replication.
...
>> === modified file 'sql/backup/be_default.cc'
>> --- a/sql/backup/be_default.cc 2009-09-10 13:46:13 +0000
>> +++ b/sql/backup/be_default.cc 2009-10-21 11:46:46 +0000
>> @@ -421,8 +421,21 @@ result_t Backup::get_data(Buffer &buf)
>> */
>> if (last_read_res == HA_ERR_END_OF_FILE)
>> {
>
> This happens when last row of a table was read.
>
>> + if (cur_table->found_next_number_field)
>
> I do not understand what cur_table->found_next_number_field is. Any hints?
A "next_number_field" is a column with an AUTO_INCREMENT default value.
TABLE::found_next_number_field is set at table open to the field object
of that column, if there is any (and it can be one at most).
It is common in the server code to check for this TABLE object member to
be non-NULL to decide if the table has an auto_increment column.
I'll add a comment.
...
>> switch (block_type) {
>>
>> + case AUTO_INCR:
>> + {
>> + ulonglong auto_incr;
>> + DBUG_ASSERT(size == 8);
>> + auto_incr= uint8korr((byte *)buf.data + META_SIZE);
>> + DBUG_PRINT("default_restore",
>> + ("table: '%s'.'%s' auto_inc: %lu",
>> + cur_table->s->db.str,
>> cur_table->s->table_name.str,
>> + (ulong) auto_incr));
>> + hdl->ha_reset_auto_increment(auto_incr);
>
> Is it important that this hdl->ha_reset_auto_increment() is done last,
> after all table rows have been written with hdl->ha_write_row()? That
> is, if I first do hdl->ha_reset_auto_increment() and then a couple of
> hdl->ha_write_row() can they change the value of table's auto_increment?
This could be an issue only if a row has a higher value for the
auto_increment column than the current auto_increment value for that
table. Since we grab the number after reading the last record in backup,
this should never be possible. So the order does not matter.
Note that there is no way to set the auto_increment lower than the
highest value of the auto_increment column in the table, by using SQL
statements. At least I didn't find one.
But, assumed it would be possible to have the current auto_increment
value too low, then hdl->ha_write_row() does not increase it. Duplicate
key errors would be possible. This is what RESTORE did with the new
locking scheme and which caused this bug to be reported.
Regards
Ingo
--
Ingo Strüwing, Database Group
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schröder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Häring HRB München 161028