List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:April 27 2011 1:59pm
Subject:Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872
View as plain text  
Hello,

Andrei Elkin a écrit, Le 26.04.2011 18:36:
> Guilhem, hello.
> 
> The fixes look good except one one puzzling point. Please read on.
> 
>> #At file:///home/mysql_src/bzrrepos_new/mysql-trunk/ based on
> revid:sven.sandberg@stripped
>>
>>  3350 Guilhem Bichot	2011-04-14
>>       WL#5872 "avoid using global heap memory to remember autoincrement values
> for statement-based binlog".
>>       As long as WL3404 "new binlog event Insert_id_list_event to store multiple
> auto_increment intervals"
>>       is not implemented, Discrete_intervals_list always contains at most one
> interval,
>>       so it makes sense to store the interval as a member of
> Discrete_intervals_list,
>>       rather than allocating it on the global heap with new(). This is to save
> one new()
>>       per statement inserting into an auto-increment column, and is a sub-part
> of
>>       WL5774 "Decrease number of malloc's for normal DML queries".
>>      @ sql/structs.h
>>         1) ** Edits to the full class **
>>         The simplified class has to provide all the interface of the full class.
>>         So, shorten the public interface of the full class, to shorten the work
>>         needed to code the simplified class.
>>         1.1) remove get_head()/current()/tail(), because those were used only
>>         in private functions where head/current/tail are accessible anyway;
>>         head/current/tail are implementation details, users of the class
>>         don't need to see them.
>>         1.2) make copy_shallow() and append(Discrete_interval*) private,
>>         because they are used only in private functions (same arguments as
>>         above).
>>         
>>         2) ** implementation of the simplified class **
>>         Same public interface as full class, but simpler code, and no new().
>>         This class is a "POD", so it can use the default copy constructor,
>>         and swap() is simple to write.
>>
>>     modified:
>>       sql/structs.h
>> === modified file 'sql/structs.h'
>> --- a/sql/structs.h	2010-10-21 09:49:16 +0000
>> +++ b/sql/structs.h	2011-04-14 14:42:31 +0000
>> @@ -292,7 +292,114 @@ public:
>>    };
>>  };
>>  
>> -/* List of Discrete_interval objects */
>> +/**
>> +  As long as statement-based binary logging does not support multiple
>> +  autoincrement intervals, we can use the simplified version of
>> +  Discrete_intervals_list which supports only 0 or 1 elements.
>> +*/
>> +#define DISCRETE_INTERVAL_LIST_HAS_MAX_ONE_ELEMENT 1
>> +
>> +#ifdef DISCRETE_INTERVAL_LIST_HAS_MAX_ONE_ELEMENT
>> +
>> +/**
>> +  This is a simplified version of Discrete_intervals_list: it's a degenerate
>> +  list which supports only 0 or 1 element. Compared to the full version (which
>> +  is after #else), it avoids one allocation on the global heap. Any fix done
>> +  here must also be done in the full version.
>> +  Implemented as part of WL#.
>> +*/
>> +class Discrete_intervals_list {
>> +private:
>> +  uint elements;                             ///< number of elements (0 or
> 1)
>> +  /// Storage for our only element. Meaningless if elements == 0.
>> +  Discrete_interval single_interval;
>> +  /**
>> +    Says whether the next get_next() call should return our only element. This
>> +    is false if:
>> +    - either the list is empty
>> +    - or the list has one element but we already returned it in a previous
>> +    get_next().
>> +  */
>> +  bool current_is_positioned;
>> +
>> +public:
>> +  Discrete_intervals_list()  { empty(); }
>> +  /// Empties the list
>> +  void empty()
>> +  {
>> +    elements= 0;
>> +    current_is_positioned= false;
>> +  }
>> +  void swap (Discrete_intervals_list *dli)
>> +  {
>> +    Discrete_intervals_list tmp(*dli);
>> +    *dli= *this;
>> +    *this= tmp;
>> +  }
>> +  /**
>> +    Gets next interval from list.
>> +    @returns next interval, or NULL if no next interval.
>> +  */
>> +  const Discrete_interval* get_next()
>> +  {
>> +    if (current_is_positioned)
>> +    {
>> +      /*
>> +        Positioned on single interval. We return it, and after this there is
>> +        no 'next' so we won't be positioned.
>> +      */
>> +      current_is_positioned= false;
>> +      return &single_interval;
>> +    }
>> +    else
>> +      return NULL;
>> +  }
>> +  /**
>> +    Appends an interval to the list.
>> +    @param start  start of interval
>> +    @val   how    many values it contains
>> +    @param incr   what increment between each value
>> +    @retval true  error
>> +    @retval false success
>> +  */
>> +  bool append(ulonglong start, ulonglong val, ulonglong incr)
>> +  {
>> +    if (elements == 1)
>> +    {
> 
> If the server runs with --innodb_autoinc_lock_mode=2
> there seem to be a possibility ...
> 
>> +      if (unlikely(single_interval.merge_if_contiguous(start, val, incr)))
>> +      {
>> +        /*
>> +          The merge was impossible. But we cannot create yet another interval
>> +          as we already contain one. This must not happen. Statement-based
>> +          binary logging does not support this.
>> +        */
>> +        DBUG_ASSERT(0);
> 
> ... to arrive at the assert ^ above.
> 
> I just tested the *base* code line with 
>    
>    insert into t1 values (null),(null),(1025),(null);
> 
> (where, create table t1 (a int auto_increment primary key)
> engine=innodb;)
> 
> which makes two intervals and the 2nd starting from 1026 could not be merged because
> the first ends with 5.
> 
> Being unable to merge in bool append(ulonglong start, ulonglong val, ulonglong incr)
>     
>     if ((head == NULL) || tail->merge_if_contiguous(start, val, incr))
>     {
> 
> the base code allocates a new interval
> 
>       /* it cannot, so need to add a new interval */
>       Discrete_interval *new_interval= new Discrete_interval(start, val, incr);
>       DBUG_RETURN(append(new_interval));
> 
> which does not affect negatively the eventual binary logging:
> 
> 
> mysql> show binlog events;
> ...
> 
> | master-bin.000001 | 241 | Query       |         1 |         309 | BEGIN            
>                                                            |
> | master-bin.000001 | 309 | Intvar      |         1 |         337 | INSERT_ID=1      
>                                                            |
> | master-bin.000001 | 337 | Query       |         1 |         449 | use `test`;
> insert into t1 values (null),(null),(1025),(null)                |
> ...

You have a 90% good point. Thanks.
The Discrete_intervals_list is used only in SBR, and 
innodb_autoinc_lock_mode=2 is not safe with SBR as said in
http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html#innodb-auto-increment-configurable
Your case above replicates fine in SBR as long as it's in a 
single-thread system; if a second thread comes just after thread1 has 
inserted 1025, and this thread2 inserts, it will steal 1026 from 
thread1, so thread1 will have inserted
1,2,1025,1027
which cannot be replicated in SBR as the Intvar only contains "1".

Having an assertion failure in this case (only in the debug binary) is 
not too bad, as this is a non-working configuration which, without the 
assertion, would lead to an out-of-sync slave.

But we can take your scenario in innodb_autoinc_lock_mode=1 too, and 
then the assertion still fires, which is bad (as this mode is safe for 
SBR). So I need to change my patch; instead of assert(0), I'll just 
ignore the new interval. So, in the INSERT above, where InnoDB generates 
two intervals
[1,4]
and
[1026,1026]
(I see them in gdb), then the Discrete_intervals_list will remember only 
[1,4], which leads to storing "1" in the Intvar, which is suitable for 
SBR. The [1026,1026] being "deterministically due to the insertion of 
1025", we can ignore it...

So thanks a lot for the catch!

> 
> So I am wondering in particular if the patch's assert is good in combination with
> the
> interleaved auto-increment locking of Innodb.
> 
> In any rate we need to be sure that there is a test verifying  your
> patch against --innodb_autoinc_lock_mode=2 like in above Insert (I bet there must be
> one,
> but I have not tried to locate it, sorry).

Just:
mysql> create table t(a int auto_increment primary key) engine=innodb;
mysql> insert into t values (null),(null),(1025),(null);
suffices to make the assert fire, with the default config of innodb 
(i.e. lock mode = 1). I have now put this in a test.

New commit:
http://lists.mysql.com/commits/136208
The changes are in rpl_auto_increment and the append() function.
Thread
bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Guilhem Bichot27 Apr
  • Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Sven Sandberg28 Apr
    • Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Guilhem Bichot28 Apr
      • Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Sven Sandberg28 Apr
        • Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Guilhem Bichot30 Apr
    • Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Guilhem Bichot2 May
Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Guilhem Bichot27 Apr
  • Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Guilhem Bichot27 Apr
Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Guilhem Bichot27 Apr
Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Guilhem Bichot27 Apr
Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Sven Sandberg28 Apr
Re: bzr commit into mysql-trunk branch (guilhem.bichot:3350) WL#5872Sven Sandberg28 Apr