Zhenxing, hello.
I am glad you are confirming the set of issues the big represents as
well for a good finding on a documentation artifact.
Getting straight to the point:
> To sum up, things we may need to do:
> 1) Let the 5.1 ignore the INSERT_ID value as 5.0 does when it is
> replicating from 5.0 masters that have this problem
> 2) Make 5.0 follow the document when using SP, do not write the
> INSERT_ID value
> 3) Make 5.0 behave the same as 5.1 on this issue
>
> item 1) is necessay for upgrade purpose, and is fixed by this patch.
agreed. But remember to fix the minor version we talked about.
>
> Items 2) and 3) are exclusive, only one should be chosen, I would prefer
> 2), because it's simple, can be fixed in an hour.
agreed
>
> Item 3) would takes weeks to fix and can bring new problems if not
> properly implemented, and I think we should not introduce new features
> to or change behave of this version.
hard to disagree.
cheers,
Andrei
> Hi Andrei
>
> Thank you for your scrutiny and discussion over skype, it helped me to
> clarify some vague points of the problem. And here is my latest
> understanding of this problem after more investigation.
>
> For 5.0 masters, when a statement invokes a trigger/function that INSERT
> into an AUTO_INCREMENT column, the generated AUTO_INCREMENT value is not
> logged in the binary log, at the same time, when executing such a
> statement, the INSERT_ID values associated with the statement is ignored
> by the trigger/function invoked by the statement. This is documented in:
>
> http://dev.mysql.com/doc/refman/5.0/en/replication-features-autoincid.html
>
>
> For 5.1, this was fixed, the master will write the AUTO_INCREMENT value
> and the slave will use the value in this case.
>
> http://dev.mysql.com/doc/refman/5.1/en/replication-features-autoincid.html
>
> (NOTE: the document says that for triggers, the AUTO_INCREMENT value is
> not written into binary log, but this is not correct, the AUTO_INCREMENT
> value is written for both functions and triggers. I will report a bug on
> this once convinced.)
>
> However, for 5.0 masters, when executing such a statement within a SP,
> an Intvar event *IS* written into the binary log, with a wrong value.
> According the document there should not happen. This does not cause any
> problem when replicating to 5.0 slaves because the wrong AUTO_INCREMENT
> value is not used for trigger/function when 5.0 slaves executing such a
> statement. But this causes problem when replicating to a 5.1 slave,
> which uses the AUTO_INCREMENT value for trigger/function invoked with
> the statement.
>
> Let's use an example to make it clearer, first, the definitions of
> tables t1, t2 and trigger tr1:
>
> CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY);
> CREATE TABLE t2 (id INT AUTO_INCREMENT PRIMARY KEY);
> DELIMITEDELIMITER //;R //;
> CREATE TRIGGER tr1 BEFORE DELETE
> ON t1 FOR EACH ROW
> BEGIN
> INSERT INTO t2 VALUES(NULL);
> END//
> DELIMITER ;//
>
> and now we run the following statements:
>
> INSERT INTO t1 VALUES(NULL);
> INSERT INTO t1 VALUES(NULL);
> DELETE FROM t1 WHERE id=1;
>
> If we run the above statements on 5.0 master, the last DELETE statement
> will not have a SET INSERT_ID associated with it, which is correct
> according to the document.
>
> INSERT_ID=1
> use `test`; insert into t1 values (NULL)
> INSERT_ID=2
> use `test`; insert into t1 values (NULL)
> use `test`; delete from t1 where id = 1
>
> However, if we wrap the above tree statements into a SP and them call
> the SP, it would act the same because SP is logged by unrolling. But
> this is not the case, if we call the SP, there will be a SET INSERT_ID
> associated with the last DELETE statement. This is inconsistent with the
> document and is the cause of this bug.
>
> INSERT_ID=1
> use `test`; INSERT INTO t1 VALUES (NULL)
> INSERT_ID=2
> use `test`; INSERT INTO t1 VALUES (NULL)
> INSERT_ID=2 <- this is incorrect
> use `test`; DELETE FROM t1 WHERE id = 1
>
> when replicating to a 5.0 slave, this will not cause problem because the
> incorrect INSERT_ID value is ignored by the trigger/function. But this
> will cause problem when replicating to a 5.1 slave, which uses the
> incorrect INSERT_ID value when calling trigger/function.
>
> To sum up, things we may need to do:
> 1) Let the 5.1 ignore the INSERT_ID value as 5.0 does when it is
> replicating from 5.0 masters that have this problem
> 2) Make 5.0 follow the document when using SP, do not write the
> INSERT_ID value
> 3) Make 5.0 behave the same as 5.1 on this issue
>
> item 1) is necessay for upgrade purpose, and is fixed by this patch.
>
> Items 2) and 3) are exclusive, only one should be chosen, I would prefer
> 2), because it's simple, can be fixed in an hour.
>
> Item 3) would takes weeks to fix and can bring new problems if not
> properly implemented, and I think we should not introduce new features
> to or change behave of this version.
>
>
> On 2008-03-03 Mon 12:52 +0200,Andrei Elkin Wrote:
>> Zhenxing, good morning.
>>
>> > Hi Andrei
>> >
>> > Thank you for your review! See below.
>> >
>> > On 2008-02-29 Fri 14:11 +0200,Andrei Elkin wrote:
>> >> ZhenXing, hello.
>> >>
>> >> The bug you are dealing with is certainly very compicated.
>> >> Please read on about my findings and a solution how we can solve the
> issues.
>> >>
>> >> > Below is the list of changes that have just been committed into a
> local
>> >> > 5.1 repository of hezx. When hezx does a push these changes
>> >> > will be propagated to the main repository and, within 24 hours
> after the
>> >> > push, to the public repository.
>> >> > For information on how to access the public repository
>> >> > see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>> >> >
>> >> > ChangeSet@stripped, 2008-02-26 22:43:27+08:00, hezx@stripped +7
> -0
>> >> > BUG#33029 5.0 to 5.1 replication fails on dup key when inserting
>> >> > using a trig in SP
>> >> >
>> >> > When replicating from 5.0 to 5.1, using trigger or function in
>> >> > stored procedure will cause the replication break.
>> >> >
>> >>
>> >>
>> >> You needed to be more specific and explain what happended in that
>> >> case.
>> >> Examining a regression test i found two issues which are left as
>> >> comments on the bug page, please read them too:
>> >>
>> >> 1. when 5.0 master executes
>> >>
>> >> delete from te
>> >>
>> >> where the explicit table te has a trigger inserting into an
>> >> implicit table ti
>> >>
>> >> it does not generate set insert_id intvar event altogether.
>> >>
>> >> 2. 5.1 master does generate the intvar event in that case.
>> >>
>> >> 3. Given 5.0 master and the stored procedure of your test the binlog
> of its
>> >> execution has the intvar with an incorrect auto-inc value
>> >> (you can check that's 5 not 1)
>> >>
>> >> 4. I guess - you need to check it - 5.1 master makes the same
>> >> mistake at p.3. Btw, use set @@session.binlog_format=statement.
>> >
>> > 5.1 master is correct on at p.3, it has the correct value 1 not 5.
>> >
>>
>> Thanks for comforting there is no issue with 5.1!
>>
>> >>
>> >> > The reason of the bug is because when run a statement that calls
>> >> > function or trigger in a SP, For 5.1 masters, if the statement
> or
>> >> > functions/triggers used generate auto_increment values, the
> first
>> >> > such used value will be logged in binlog, if the statement do
> not
>> >> > used such values, the auto_increment value used by
>> >> > functions/triggers will be logged, and the value will later be
>> >> > used by 5.1 slaves. However this is not the case for 5.0
> masters,
>> >> > it's always the next auto_increment value of the statement that
>> >> > will be logged, and at the same time, 5.0 slaves will not use
> the
>> >> > value for functions/triggers. If we replicate from 5.0 to 5.1,
>> >> > the auto_increment value of the statement will be use as the
>> >> > value for functions/triggers, and then cause the break.
>> >> >
>> >>
>> >> Sorry, I did not understand the writing. What i can guess from the
>> >> patch you tried to protect the current statement from the most recent
>> >> set insert_id which carries the wrong value.
>> >> Should not we correct the value instead on master?
>> >>
>> >
>> > I think the point of this bug is compatible with the behavour of 5.0, so
>> > 5.0 servers can be upgraded to 5.1 or newer without breaking the
>> > replication.
>>
>> I assume you have agreed with p.1 + p.3 are the reason for the failure of
>> the reported hitting the prim key on the slave.
>> You must notice that such artifact happens regardless of the master's version.
>> In other words, 5.0 -> 5.0 replication is broken currently as well.
>>
>> Please consult with Lars, #support if you doubt we need to fix this.
>> I personly don't doubt that it should be mended.
>>
>> What I would offer for you to do at first is to find out what are changes that
>> made 5.1 master to binlog correctly and to merge them with 5.0.
>>
>> > it doesn't matter if the issue is fixed or not in the new
>> > 5.0 relesse, there are already releases of 5.0 (and versions before
>> > 5.1.12) suffer this problem, even if we fixed the problem, we still have
>> > to cope with this bug so that these buggy old versions can be upgraded
>> > to new versions.
>>
>> Wrt to protecting 5.0 -> 5.1 upgrade scenario the way you made it does
>> seem secure:
>>
>> "hiding" set_insert_id for the subsequent query, and thus
>> asking innodb engine for evaluating the auto_inc value, that your
>> patch implements leaves number of possibilities to reach an
>> inconsistent state on slave.
>> Look at comments on the bug page I refer as the parent for yours
>> how you can do that: start a transaction on master or slave
>> that insert auto_inc values to the implicitly involved table
>> to delete statement (tab_b) and roll it back.
>> Since this moment, innodb engines on master and slave side views
>> differently on how to calculate the auto_inc.
>>
>> Considering the upgrade problem I wonder if users are better to
>> upgrade first on a fixed 5.0 first. Perhaps for that many could afford
>> quick changing and restarting of their 5.0 master.
>>
>>
>> >> That sounds to me the major thing to do.
>> >>
>> >> please mail me your opinion,
>> >>
>> >> cheers,
>> >>
>> >> Andrei
>> >>
>> >> > This patch fixes this bug by making 5.1 slaves checking for 5.0
>> >> > masters, and if it is, don't use the saved auto_increment value
>> >> > for functions/triggers to be compatible with the 5.0 master.
>> >> >
>> >> > mysql-test/suite/rpl/r/rpl_auto_increment_bug33029.result@stripped,
> 2008-02-26 22:43:24+08:00, hezx@stripped +109 -0
>> >> > Add test for BUG#33029
>> >> >
>> >> > mysql-test/suite/rpl/r/rpl_auto_increment_bug33029.result@stripped,
> 2008-02-26 22:43:24+08:00, hezx@stripped +0 -0
>> >> >
>> >> > mysql-test/suite/rpl/t/rpl_auto_increment_bug33029.test@stripped,
> 2008-02-26 22:43:24+08:00, hezx@stripped +80 -0
>> >> > Add test for BUG#33029
>> >> >
>> >> > mysql-test/suite/rpl/t/rpl_auto_increment_bug33029.test@stripped,
> 2008-02-26 22:43:24+08:00, hezx@stripped +0 -0
>> >> >
>> >> > sql/slave.cc@stripped, 2008-02-26 22:43:23+08:00, hezx@stripped
> +22 -0
>> >> > Add function to check for bug#33029
>> >> >
>> >> > sql/slave.h@stripped, 2008-02-26 22:43:24+08:00, hezx@stripped
> +1 -0
>> >> > Add function to check for bug#33029
>> >> >
>> >> > sql/sql_class.cc@stripped, 2008-02-26 22:43:24+08:00,
> hezx@stripped +11 -0
>> >> > if master has bug#33029, reset auto_inc_intervals_forced for
> sub statements
>> >> >
>> >> > sql/sql_class.h@stripped, 2008-02-26 22:43:24+08:00,
> hezx@stripped +1 -0
>> >> > Add member to save and restore auto_inc_intervals_forced
>> >> >
>> >> > sql/structs.h@stripped, 2008-02-26 22:43:24+08:00, hezx@stripped
> +22 -0
>> >> > Add functions to save and restore Discrete_intervals_list
> object
...