List:Commits« Previous MessageNext Message »
From:Andrei Elkin Date:March 6 2008 10:08am
Subject:Re: bk commit into 5.1 tree (hezx:1.2534) BUG#33029
View as plain text  
Zhenxing, hello.

I am glad you are confirming the set of issues the big represents as
well for a good finding on a documentation artifact.

Getting straight to the point:

> To sum up, things we may need to do:
>   1) Let the 5.1 ignore the INSERT_ID value as 5.0 does when it is
> replicating from 5.0 masters that have this problem
>   2) Make 5.0 follow the document when using SP, do not write the
> INSERT_ID value
>   3) Make 5.0 behave the same as 5.1 on this issue
>
> item 1) is necessay for upgrade purpose, and is fixed by this patch.

agreed. But remember to fix the minor version we talked about.

>
> Items 2) and 3) are exclusive, only one should be chosen, I would prefer
> 2), because it's simple, can be fixed in an hour. 

agreed


>
> Item 3) would takes weeks to fix and can bring new problems if not
> properly implemented, and I think we should not introduce new features
> to or change behave of this version.

hard to disagree.

cheers,

Andrei


> Hi Andrei
>
> Thank you for your scrutiny and discussion over skype, it helped me to
> clarify some vague points of the problem. And here is my latest
> understanding of this problem after more investigation.
>
> For 5.0 masters, when a statement invokes a trigger/function that INSERT
> into an AUTO_INCREMENT column, the generated AUTO_INCREMENT value is not
> logged in the binary log, at the same time, when executing such a
> statement, the INSERT_ID values associated with the statement is ignored
> by the trigger/function invoked by the statement. This is documented in:
>
> http://dev.mysql.com/doc/refman/5.0/en/replication-features-autoincid.html
>
>
> For 5.1, this was fixed, the master will write the AUTO_INCREMENT value
> and the slave will use the value in this case.
>
> http://dev.mysql.com/doc/refman/5.1/en/replication-features-autoincid.html
>
> (NOTE: the document says that for triggers, the AUTO_INCREMENT value is
> not written into binary log, but this is not correct, the AUTO_INCREMENT
> value is written for both functions and triggers. I will report a bug on
> this once convinced.)
>
> However, for 5.0 masters, when executing such a statement within a SP,
> an Intvar event *IS* written into the binary log, with a wrong value.
> According the document there should not happen. This does not cause any
> problem when replicating to 5.0 slaves because the wrong AUTO_INCREMENT
> value is not used for trigger/function when 5.0 slaves executing such a
> statement. But this causes problem when replicating to a 5.1 slave,
> which uses the AUTO_INCREMENT value for trigger/function invoked with
> the statement.
>
> Let's use an example to make it clearer, first, the definitions of
> tables t1, t2 and trigger tr1:
>
> CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY);
> CREATE TABLE t2 (id INT AUTO_INCREMENT PRIMARY KEY);
> DELIMITEDELIMITER //;R //;
> CREATE TRIGGER tr1 BEFORE DELETE
>   ON t1 FOR EACH ROW
>   BEGIN
> 		INSERT INTO t2 VALUES(NULL);
>   END//
> DELIMITER ;//
>
> and now we run the following statements:
>
> INSERT INTO t1 VALUES(NULL);
> INSERT INTO t1 VALUES(NULL);
> DELETE FROM t1 WHERE id=1;
>
> If we run the above statements on 5.0 master, the last DELETE statement
> will not have a SET INSERT_ID associated with it, which is correct
> according to the document. 
>
>  INSERT_ID=1
>  use `test`; insert into t1 values (NULL)
>  INSERT_ID=2
>  use `test`; insert into t1 values (NULL)
>  use `test`; delete from t1 where id = 1
>
> However, if we wrap the above tree statements into a SP and them call
> the SP, it would act the same because SP is logged by unrolling. But
> this is not the case, if we call the SP, there will be a SET INSERT_ID
> associated with the last DELETE statement. This is inconsistent with the
> document and is the cause of this bug.
>
>  INSERT_ID=1
>  use `test`; INSERT INTO t1 VALUES (NULL)
>  INSERT_ID=2
>  use `test`; INSERT INTO t1 VALUES (NULL)
>  INSERT_ID=2                                     <- this is incorrect
>  use `test`; DELETE FROM t1 WHERE id = 1
>
> when replicating to a 5.0 slave, this will not cause problem because the
> incorrect INSERT_ID value is ignored by the trigger/function. But this
> will cause problem when replicating to a 5.1 slave, which uses the
> incorrect INSERT_ID value when calling trigger/function.
>

> To sum up, things we may need to do:
>   1) Let the 5.1 ignore the INSERT_ID value as 5.0 does when it is
> replicating from 5.0 masters that have this problem
>   2) Make 5.0 follow the document when using SP, do not write the
> INSERT_ID value
>   3) Make 5.0 behave the same as 5.1 on this issue
>
> item 1) is necessay for upgrade purpose, and is fixed by this patch.
>
> Items 2) and 3) are exclusive, only one should be chosen, I would prefer
> 2), because it's simple, can be fixed in an hour. 
>
> Item 3) would takes weeks to fix and can bring new problems if not
> properly implemented, and I think we should not introduce new features
> to or change behave of this version.

>
>
> On 2008-03-03 Mon 12:52 +0200,Andrei Elkin Wrote: 
>> Zhenxing, good morning.
>> 
>> > Hi Andrei
>> >
>> > Thank you for your review! See below.
>> >
>> > On 2008-02-29 Fri 14:11 +0200,Andrei Elkin wrote:
>> >> ZhenXing, hello.
>> >> 
>> >> The bug you are dealing with is certainly very compicated.
>> >> Please read on about my findings and a solution how we can solve the
> issues.
>> >> 
>> >> > Below is the list of changes that have just been committed into a
> local
>> >> > 5.1 repository of hezx.  When hezx does a push these changes
>> >> > will be propagated to the main repository and, within 24 hours
> after the
>> >> > push, to the public repository.
>> >> > For information on how to access the public repository
>> >> > see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>> >> >
>> >> > ChangeSet@stripped, 2008-02-26 22:43:27+08:00, hezx@stripped +7
> -0
>> >> >   BUG#33029 5.0 to 5.1 replication fails on dup key when inserting
>> >> >   using a trig in SP
>> >> >   
>> >> >   When replicating from 5.0 to 5.1, using trigger or function in
>> >> >   stored procedure will cause the replication break.
>> >> >   
>> >> 
>> >> 
>> >> You needed to be more specific and explain what happended in that
>> >> case.
>> >> Examining a regression test i found two issues which are left as
>> >> comments on the bug page, please read them too:
>> >> 
>> >> 1.  when 5.0 master executes
>> >> 
>> >>     delete from te 
>> >>     
>> >>     where the explicit table te has a trigger inserting into an
>> >>     implicit table ti
>> >> 
>> >>     it does not generate set insert_id intvar event altogether.
>> >> 
>> >> 2.  5.1 master does generate the intvar event in that case.
>> >> 
>> >> 3.  Given 5.0 master and the stored procedure of your test  the binlog
> of its
>> >>     execution has the intvar with an incorrect auto-inc value
>> >>     (you can check that's 5 not 1)
>> >> 
>> >> 4.  I guess  - you need to check it - 5.1 master makes the same
>> >>     mistake at p.3. Btw, use set @@session.binlog_format=statement.
>> >
>> > 5.1 master is correct on at p.3, it has the correct value 1 not 5.
>> >
>> 
>> Thanks for comforting there is no issue with 5.1!
>> 
>> >> 
>> >> >   The reason of the bug is because when run a statement that calls
>> >> >   function or trigger in a SP, For 5.1 masters, if the statement
> or
>> >> >   functions/triggers used generate auto_increment values, the
> first
>> >> >   such used value will be logged in binlog, if the statement do
> not
>> >> >   used such values, the auto_increment value used by
>> >> >   functions/triggers will be logged, and the value will later be
>> >> >   used by 5.1 slaves. However this is not the case for 5.0
> masters,
>> >> >   it's always the next auto_increment value of the statement that
>> >> >   will be logged, and at the same time, 5.0 slaves will not use
> the
>> >> >   value for functions/triggers. If we replicate from 5.0 to 5.1,
>> >> >   the auto_increment value of the statement will be use as the
>> >> >   value for functions/triggers, and then cause the break.
>> >> >   
>> >> 
>> >> Sorry, I did not understand the writing. What i can guess from the
>> >> patch you tried to protect the current statement from the most recent
>> >> set insert_id which carries the wrong value.
>> >> Should not we correct the value instead on master?
>> >> 
>> >
>> > I think the point of this bug is compatible with the behavour of 5.0, so
>> > 5.0 servers can be upgraded to 5.1 or newer without breaking the
>> > replication.
>> 
>> I assume you have agreed with p.1 + p.3 are the reason for the failure of
>> the reported hitting the prim key on the slave.
>> You must notice that such artifact happens regardless of the master's version.
>> In other words, 5.0 -> 5.0 replication is broken currently as well.
>> 
>> Please consult with Lars, #support if you doubt we need to fix this.
>> I personly don't doubt that it should be mended.
>> 
>> What I would offer for you to do at first is to find out what are changes that
>> made 5.1 master to binlog correctly and to merge them with 5.0.
>> 
>> > it doesn't matter if the issue is fixed or not in the new
>> > 5.0 relesse, there are already releases of 5.0 (and versions before
>> > 5.1.12) suffer this problem, even if we fixed the problem, we still have
>> > to cope with this bug so that these buggy old versions can be upgraded
>> > to new versions.
>> 
>> Wrt to protecting 5.0 -> 5.1 upgrade scenario the way you made it does
>> seem secure:
>> 
>>   "hiding" set_insert_id for the subsequent query, and thus
>>    asking innodb engine for evaluating the auto_inc value, that your
>>    patch implements leaves number of possibilities to reach an
>>    inconsistent state on slave.
>>    Look at comments on the bug page I refer as the parent for yours
>>    how you can do that: start a transaction on master or slave
>>    that insert auto_inc values to the implicitly involved table
>>    to delete statement (tab_b) and roll it back.
>>    Since this moment, innodb engines on master and slave side views
>>    differently on how to calculate the auto_inc.
>> 
>> Considering the upgrade problem I wonder if users are better to
>> upgrade first on a fixed 5.0 first. Perhaps for that many could afford
>> quick changing and restarting of their 5.0 master.
>> 
>> 
>> >> That sounds to me the major thing to do.
>> >> 
>> >> please mail me your opinion,
>> >> 
>> >> cheers,
>> >> 
>> >> Andrei
>> >> 
>> >> >   This patch fixes this bug by making 5.1 slaves checking for 5.0
>> >> >   masters, and if it is, don't use the saved auto_increment value
>> >> >   for functions/triggers to be compatible with the 5.0 master.
>> >> >
>> >> >   mysql-test/suite/rpl/r/rpl_auto_increment_bug33029.result@stripped,
> 2008-02-26 22:43:24+08:00, hezx@stripped +109 -0
>> >> >     Add test for BUG#33029
>> >> >
>> >> >   mysql-test/suite/rpl/r/rpl_auto_increment_bug33029.result@stripped,
> 2008-02-26 22:43:24+08:00, hezx@stripped +0 -0
>> >> >
>> >> >   mysql-test/suite/rpl/t/rpl_auto_increment_bug33029.test@stripped,
> 2008-02-26 22:43:24+08:00, hezx@stripped +80 -0
>> >> >     Add test for BUG#33029
>> >> >
>> >> >   mysql-test/suite/rpl/t/rpl_auto_increment_bug33029.test@stripped,
> 2008-02-26 22:43:24+08:00, hezx@stripped +0 -0
>> >> >
>> >> >   sql/slave.cc@stripped, 2008-02-26 22:43:23+08:00, hezx@stripped
> +22 -0
>> >> >     Add function to check for bug#33029
>> >> >
>> >> >   sql/slave.h@stripped, 2008-02-26 22:43:24+08:00, hezx@stripped
> +1 -0
>> >> >     Add function to check for bug#33029
>> >> >
>> >> >   sql/sql_class.cc@stripped, 2008-02-26 22:43:24+08:00,
> hezx@stripped +11 -0
>> >> >     if master has bug#33029, reset auto_inc_intervals_forced for
> sub statements
>> >> >
>> >> >   sql/sql_class.h@stripped, 2008-02-26 22:43:24+08:00,
> hezx@stripped +1 -0
>> >> >     Add member to save and restore auto_inc_intervals_forced
>> >> >
>> >> >   sql/structs.h@stripped, 2008-02-26 22:43:24+08:00, hezx@stripped
> +22 -0
>> >> >     Add functions to save and restore Discrete_intervals_list
> object
...
Thread
bk commit into 5.1 tree (hezx:1.2534) BUG#33029hezx26 Feb
  • Re: bk commit into 5.1 tree (hezx:1.2534) BUG#33029Andrei Elkin29 Feb
    • Re: bk commit into 5.1 tree (hezx:1.2534) BUG#33029He Zhenxing3 Mar
      • Re: bk commit into 5.1 tree (hezx:1.2534) BUG#33029Andrei Elkin3 Mar
        • Re: bk commit into 5.1 tree (hezx:1.2534) BUG#33029He Zhenxing6 Mar
          • Re: bk commit into 5.1 tree (hezx:1.2534) BUG#33029Andrei Elkin6 Mar
            • Re: bk commit into 5.1 tree (hezx:1.2534) BUG#33029He Zhenxing7 Mar
              • Re: bk commit into 5.1 tree (hezx:1.2534) BUG#33029Andrei Elkin7 Mar
                • Re: bk commit into 5.1 tree (hezx:1.2534) BUG#33029He Zhenxing10 Mar
Re: bk commit into 5.1 tree (hezx:1.2534) BUG#33029He Zhenxing29 Feb