List: Commits
From: Mattias Jonsson  Date: April 25 2008 11:41am
Subject:Re: bk commit into 5.1 tree (mattiasj:1.2518) BUG#33479
Hi,

First, I just realized that the use of ha_data will eventually collide
for partitioning when another storage engine starts to use it. So in
the future, I believe that partitioning should have its own
ha_partition_data (HA_PARTITION_DATA*) in TABLE_SHARE, since using
partitioning and having another storage engine use the new ha_data are
not mutually exclusive.

Should I make the change auto_increment-specific or partition-specific
now, or let it be handled in WL#4305?

Other than that, thanks for your comments; my replies are marked with MJ.

Regards
Mattias

Guilhem Bichot wrote:
> Hello Mattias,
> 
> My comments prefixed with GB.
> Please also see the ones I sent in a previous mail:
>  Date: Thu, 10 Apr 2008 15:59:41 +0200
>  From: Guilhem Bichot <guilhem@stripped>
>  To: Mattias Jonsson <mattiasj@stripped>
> 
> Generally I think your patch is approaching perfection. So some of my
> comments below are just ideas for how to present code, add comments.
> Regarding testing, given that your patch touches concurrency (longer
> lock for statement-based replication etc), I'd say it needs
> "additional QA testing" (you know, this checkbox in the bug
> report). Before the patch is rolled out to our customers, I think a QA
> person should run a stress test like this: multiple threads all
> inserting into the same partitioned table, using row-based and then
> statement-based replication, checking that the slave does not go wrong
> (it could go wrong without the lock specific to statement-based
> replication); checking that INSERTs don't receive duplicate key
> errors (as was the case in the past for InnoDB+partition, fixed by
> some code, which you had to remove in your patch). Doing it for
> MyISAM, InnoDB, Falcon at least.

MJ: I'll add this in the bug report.

> 
> On Tue, Apr 01, 2008 at 11:04:45PM +0200, mattiasj@stripped wrote:
>> ChangeSet@stripped, 2008-04-01 23:04:40+02:00, mattiasj@witty. +11 -0
>>   Bug#33479: auto_increment failures in partitioning
>>   
>>   Several problems with auto_increment in partitioning
>>   (with MyISAM, InnoDB and Falcon. Locking issues, not handling
>>   multi row inserts properly etc.)
>>   
>>   This is a full patch which I have updated according to sergs and
>>   guilhems comments.
>>   Changed the auto_increment handling for partitioning:
>>   Added a ha_data variable in table_share for storage engine specific data such as
>>   auto_increment value handling in partitioning
>>   and using the ha_data->mutex to lock around read + update.
>>   
>>   The idea is this:
>>   Store the table's reserved auto_increment value in
>>   the TABLE_SHARE and use a mutex to, lock it for reading and updating it
>>   and unlocking it, in one block. Only accessing all partitions
>>   when it is not initialized.
>>   Also allow reservations of ranges, and if no one has done a reservation
>>   afterwards, lower the reservation to what was actually used after
>>   the statement is done (via release_auto_increment from WL 3146).
>>   The lock is kept from the first reservation if it is statement based
>>   replication and a multi row insert statement
> 
> GB "multi row" is a bit vague, see below.
> 
>>   (insert select, load data etc.)
> 
> GB We don't know what 'etc' is.
> InnoDB has descriptions and names here:
> http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html
> ("simple inserts", "bulk inserts", "mixed inserts"), but it's not easy
> to guess from only the name, what a "simple insert" is.
> I'd say
> "multi-row INSERT statement where the number of candidate rows to
> insert is not known in advance (like INSERT SELECT, LOAD DATA; unlike
> INSERT VALUES(row1),(row2),...,(rowN))"
> Such verbose description is even more important in the code, which has
> a longer life than the changeset comment.
> 

MJ: I have changed the comment in the code and in the changeset.

>>   sql/ha_partition.cc@stripped, 2008-04-01 23:04:38+02:00, mattiasj@witty. +227 -103
>>     Bug#33479: Failures using auto_increment and partitioning
>>     
>>     Changed ha_partition::get_auto_increment from file->get_auto_increment
>>     to file->info(HA_AUTO_STATUS), since it works better with InnoDB
> 
> GB Please add why it works better.
> 

There seems to be a bug in ha_innobase::get_auto_increment: it does not
always update the *first_value variable when called a second time, for a
second partition.

If you change the original code (ha_partition::get_auto_increment,
ha_partition.cc ~ row 5637) from 'first_value_part= *first_value' to
'first_value_part= 0' and add DBUG_ASSERT(first_value_part != 0) after
the file->get_auto_increment call, it will fail on this test:

CREATE TABLE t1 (c1 INT AUTO_INCREMENT PRIMARY KEY)
ENGINE=InnoDB
PARTITION BY HASH (c1)
PARTITIONS 2;
INSERT INTO t1 VALUES (NULL);
INSERT INTO t1 VALUES (NULL);


>> diff -Nrup a/mysql-test/extra/rpl_tests/rpl_auto_increment.test b/mysql-test/extra/rpl_tests/rpl_auto_increment.test
>> --- a/mysql-test/extra/rpl_tests/rpl_auto_increment.test	2007-06-06 19:48:51 +02:00
>> +++ b/mysql-test/extra/rpl_tests/rpl_auto_increment.test	2008-04-01 23:04:38 +02:00
>> @@ -12,7 +12,7 @@
>>  -- source include/not_ndb_default.inc
>>  -- source include/master-slave.inc
>>  
>> -eval create table t1 (a int not null auto_increment,b int, primary key (a)) engine=$engine_type2 auto_increment=3;
>> +eval create table t1 (a int not null auto_increment,b int, primary key (a)) auto_increment=3 engine=$engine_type2;
>>  insert into t1 values (NULL,1),(NULL,2),(NULL,3);
>>  select * from t1;
>>  
>> @@ -76,7 +76,7 @@ insert into t1 values (NULL),(5),(NULL),
>>  insert into t1 values (500),(NULL),(502),(NULL),(NULL);
>>  select * from t1;
>>  set @@insert_id=600;
>> ---error ER_DUP_ENTRY
>> +--error ER_DUP_ENTRY, ER_DUP_KEY
> 
> GB do you know why two errors can happen now and not one?
> As for tests and results, I didn't re-check them since one of your
> previous commits. So I don't know if since then, you changed anything
> to them or the results changed.
> 

MJ: a non-partitioned table returns ER_DUP_KEY and a partitioned table
returns ER_DUP_ENTRY, but I have changed this in the testcase (and will
not include the updated rpl_auto_increment test, since I have not used it
in this patch).

>> diff -Nrup a/sql/ha_partition.cc b/sql/ha_partition.cc
>> --- a/sql/ha_partition.cc	2007-12-20 19:16:51 +01:00
>> +++ b/sql/ha_partition.cc	2008-04-01 23:04:38 +02:00
> 
>> @@ -2270,8 +2273,10 @@ int ha_partition::open(const char *name,
>>    uint alloc_len;
>>    handler **file;
>>    char name_buff[FN_REFLEN];
>> +  bool is_not_tmp_table= (table_share->tmp_table == NO_TMP_TABLE);
>>    DBUG_ENTER("ha_partition::open");
>>  
>> +  DBUG_ASSERT(table->s == table_share);
> 
> GB sounds logical but I hope there is no corner case... Are you sure
> that the '==' is always true?
> 

MJ: I'll try to be bold and say it must. I have not searched every
corner case, but whatever changes table->s should also call
table->file->change_table_ptr(table, table->s).

>> @@ -4412,20 +4439,53 @@ int ha_partition::handle_ordered_prev(uc
>>  int ha_partition::info(uint flag)
>>  {
>>    handler *file, **file_array;
> 
> GB please move those two variables down to the else{} which uses them.
> 

MJ: Moved

>> -  DBUG_ENTER("ha_partition:info");
>> +  DBUG_ENTER("ha_partition::info");
>>  
>>    if (flag & HA_STATUS_AUTO)
>>    {
>> -    ulonglong auto_increment_value= 0;
>> +    bool auto_inc_is_first_in_idx= (table_share->next_number_keypart == 0);
>> +    HA_DATA_PARTITION *ha_data= (HA_DATA_PARTITION*) table_share->ha_data;
>>      DBUG_PRINT("info", ("HA_STATUS_AUTO"));
>> -    file_array= m_file;
>> -    do
>> +    if (!table->found_next_number_field)
>> +      stats.auto_increment_value= 0;
>> +    else if (ha_data->auto_inc_initialized && auto_inc_is_first_in_idx)
> 
> GB I suspect that you can remove '&& auto_inc_is_first_in_idx' above.
> 

MJ: Removed

>> +    {
>> +      lock_auto_increment();
>> +      stats.auto_increment_value= ha_data->next_auto_inc_val;
>> +      unlock_auto_increment();
>> +    }
>> +    else
>>      {
>> -      file= *file_array;
>> -      file->info(HA_STATUS_AUTO);
>> -      set_if_bigger(auto_increment_value, file->stats.auto_increment_value);
>> -    } while (*(++file_array));
>> -    stats.auto_increment_value= auto_increment_value;
>> +      lock_auto_increment();
> <cut>
> 
>> @@ -5579,19 +5639,36 @@ int ha_partition::cmp_ref(const uchar *r
>>                  MODULE auto increment
>>  ****************************************************************************/
>>  
>> -void ha_partition::restore_auto_increment(ulonglong)
>> -{
>> -  DBUG_ENTER("ha_partition::restore_auto_increment");
>>  
>> -  DBUG_VOID_RETURN;
>> +int ha_partition::reset_auto_increment(ulonglong value)
>> +{
>> +  handler **pos, **end;
>> +  int res;
>> +  HA_DATA_PARTITION *ha_data= (HA_DATA_PARTITION*) table_share->ha_data;
>> +  DBUG_ENTER("ha_partition::reset_auto_increment");
>> +  lock_auto_increment();
>> +  ha_data->auto_inc_initialized= FALSE;
>> +  ha_data->next_auto_inc_val= 0;
>> +  for (pos=m_file, end= m_file + m_tot_parts; pos != end ; pos++)
> 
> GB one might write it like this:
>   for (handler **pos=m_file, **end= m_file + m_tot_parts; pos != end ; pos++)
> to keep "pos" and "end" local to the loop where they are used.
> This is a mere suggestion.
> 
> By the way, this m_file+m_tot_parts is not your invention, but there
> is another way to write the loop, which is more common throughout the
> ha_partition.cc file, it's
> do { ...  } while (*(++file));
> This is also a mere suggestion.
> 

MJ: I have changed to do { ... } while (*(++file));

>> +  {
>> +    if ((res= (*pos)->ha_reset_auto_increment(value)) != 0)
>> +    {
>> +      unlock_auto_increment();
>> +      DBUG_RETURN(res);
>> +    }
> 
> GB I think the code could be smaller it just did "break" and then
> DBUG_RETURN(0) below would be changed to DBUG_RETURN(res).
> With a LINT_INIT(res) at start if a compiler complains.
> This could make the code smaller because it would remove one inlined
> function (unlock_auto_increment()) and, a DBUG_RETURN() which is a
> function in debug builds (which don't really matter, I agree).
> This is a mere suggestion.
> 

MJ: Changed

>> +  }
>> +  unlock_auto_increment();
>> +  DBUG_RETURN(0);
>>  }
>>  
>>  
>> -/*
>> +/**
>>    This method is called by update_auto_increment which in turn is called
>> -  by the individual handlers as part of write_row. We will always let
>> -  the first handler keep track of the auto increment value for all
>> -  partitions.
>> +  by the individual handlers as part of write_row. We use the
>> +  table_share->ha_data->next_auto_inc_val, or search all
>> +  partitions for the highest auto_increment_value if not initialized or
>> +  if auto_increment field is a secondary part of a key, we must search
>> +  every partition when holding a mutex to be sure of correctness.
>>  */
>>  
>>  void ha_partition::get_auto_increment(ulonglong offset, ulonglong increment,
>> @@ -5599,59 +5676,82 @@ void ha_partition::get_auto_increment(ul
>>                                        ulonglong *first_value,
>>                                        ulonglong *nb_reserved_values)
>>  {
>> -  ulonglong first_value_part, last_value_part, nb_reserved_values_part,
>> -    last_value= ~ (ulonglong) 0;
>> -  handler **pos, **end;
>> -  bool retry= TRUE;
>>    DBUG_ENTER("ha_partition::get_auto_increment");
>> -
>> -again:
>> -  for (pos=m_file, end= m_file+ m_tot_parts; pos != end ; pos++)
>> +  DBUG_PRINT("info", ("offset: %lu inc: %lu desired_values: %lu "
>> +                      "first_value: %lu", (ulong) offset, (ulong) increment,
>> +                      (ulong) nb_desired_values, (ulong) *first_value));
>> +  DBUG_ASSERT(increment && nb_desired_values);
>> +  *first_value= 0;
>> +  if (table->s->next_number_keypart)
>>    {
>> -    first_value_part= *first_value;
>> -    (*pos)->get_auto_increment(offset, increment, nb_desired_values,
>> -                               &first_value_part, &nb_reserved_values_part);
>> -    if (first_value_part == ~(ulonglong)(0)) // error in one partition
>> -    {
>> -      *first_value= first_value_part;
>> -      sql_print_error("Partition failed to reserve auto_increment value");
>> -      DBUG_VOID_RETURN;
>> -    }
>>      /*
>> -      Partition has reserved an interval. Intersect it with the intervals
>> -      already reserved for the previous partitions.
>> +      next_number_keypart is != 0 if the auto_increment column is a secondary
>> +      column in the index (it is allowed in MyISAM)
>>      */
>> -    last_value_part= (nb_reserved_values_part == ULONGLONG_MAX) ?
>> -      ULONGLONG_MAX : (first_value_part + nb_reserved_values_part * increment);
>> -    set_if_bigger(*first_value, first_value_part);
>> -    set_if_smaller(last_value, last_value_part);
>> -  }
>> -  if (last_value < *first_value) /* empty intersection, error */
>> -  {
>> +    DBUG_PRINT("info", ("next_number_keypart != 0"));
>> +    ulonglong first_value_part, nb_reserved_values_part;
>> +    handler **pos, **end;
>>      /*
>> -      When we have an empty intersection, it means that one or more
>> -      partitions may have a significantly different autoinc next value.
>> -      We should not fail here - it just means that we should try to
>> -      find a new reservation making use of the current *first_value
>> -      wbich should now be compatible with all partitions.
>> +      Must lock and find highest value among all partitions.
> 
> GB comment would fit in one line together with its /* */  (advantage:
> more lines on the screen when reading this complex function).
> 

MJ: changed

>>      */
>> -    if (retry)
>> +    lock_auto_increment();
>> +    for (pos=m_file, end= m_file + m_tot_parts; pos != end ; pos++)
> 
> GB pos and end could be local to the loop.
> 

MJ: Changed to do { ... } while (*(++file));

>>      {
>> -      retry= FALSE;
>> -      last_value= ~ (ulonglong) 0;
>> -      release_auto_increment();
>> -      goto again;
>> +      first_value_part= 0;
> 
> GB please add a comment to explain the first_value_part=0 above: "hack
> for InnoDB which considers *first_value_part as an INOUT parameter
> though it's only an OUT parameter per the API". This is important.
> 

MJ: I have removed that line, since it was InnoDB-only, and InnoDB does
not support auto_increment in a secondary column of a multi-column
index. As a safety measure, I initialize it to *first_value before the loop.

>> +      /* Only nb_desired_values = 1 makes sense */
>> +      (*pos)->get_auto_increment(offset, increment, 1,
>> +                                 &first_value_part, &nb_reserved_values_part);
>> +      if (first_value_part == ~(ulonglong)(0)) // error in one partition
>> +      {
>> +        *first_value= first_value_part;
>> +        sql_print_error("Partition failed to reserve auto_increment value");
>> +        unlock_auto_increment();
>> +        DBUG_VOID_RETURN;
>> +      }
>> +      DBUG_PRINT("info", ("first_value_part: %lu", (ulong) first_value_part));
>> +      set_if_bigger(*first_value, first_value_part);
> 
> 
> GB this get_auto_increment() was partly written by me so this is more
> a critics of myself: it dereferences first_value too much, once per
> table in the partition.
> I'd try to remove most dereferences (but error cases matter less) by
> using a local variable, and in the end of the function, doing
> *first_value = the_local_variable.
> 

MJ: I added a local variable (max_first_part) to decrease the number of
dereferences.

>>      }
>> +    *nb_reserved_values= 1;
>> +    unlock_auto_increment();
>> +  }
>> +  else
>> +  {
>> +    THD *thd= ha_thd();
>> +    HA_DATA_PARTITION *ha_data= (HA_DATA_PARTITION*) table_share->ha_data;
>>      /*
>> -      We should not get here.
>> +     This is initialized in the beginning of the first write_row call.
> 
> GB indentation of "This is"
> 

MJ: Fixed.

>>      */
>> -    sql_print_error("Failed to calculate auto_increment value for partition");
>> -    
>> -    *first_value= ~(ulonglong)(0);
>> +    DBUG_ASSERT(ha_data->auto_inc_initialized);
>> +    /*
>> +      Get a lock for handling the auto_increment in table_share->ha_data
>> +      for avoiding two concurrent statements getting the same number.
>> +    */ 
>> +
>> +    lock_auto_increment();
>> +
>> +    /*
>> +      in a multi row insert statement like insert ... select, load data 
>> +      we must hold a lock/mutex for the whole statement if we have statement
>> +      based replication.
> 
> GB I'd mention the key fact that the number of candidate rows to
> insert is not known in advance. And also the reason why we have to
> hold the lock/mutex: because the statement-based binary log contains
> only the first generated value used by the statement, and slave assumes
> all other generated values used by this statement were consecutive to
> this first one, we must exclusively lock the generator until the
> statement is done. Something like that, maybe I'm too verbose. Well,
> something which a new dev could understand.
> 

MJ: Updated the comment to:
In a multi-row insert statement like INSERT SELECT and LOAD DATA
where the number of candidate rows to insert is not known in advance
we must hold a lock/mutex for the whole statement if we have statement
based replication. Because the statement-based binary log contains
only the first generated value used by the statement, and the slave assumes
all other generated values used by this statement were consecutive to
this first one, we must exclusively lock the generator until the statement
is done.

>> +    */
>> +    if (!auto_increment_multi_row_stmt_lock &&
>> +        thd->variables.binlog_format == BINLOG_FORMAT_STMT &&
>> +        (thd->lex->sql_command == SQLCOM_REPLACE_SELECT ||
>> +         thd->lex->sql_command == SQLCOM_INSERT_SELECT ||
>> +         thd->lex->sql_command == SQLCOM_LOAD))
> 
> GB As said in a previous email, this test should test other variables
> related to binlog.
> About sql_command values: InnoDB, in
> ha_innobase::innobase_autoinc_lock(), tests in another way:
> if (SQLCOM_INSERT) don't use lock else use lock.
> It is more future-safe if somebody later adds a new command to MySQL
> and does not update your test above: 
> the InnoDB way will be maybe too conservative for the new command (no
> data corruption but non-optimal performance) while the way of your
> patch will be maybe too unsafe for the new command (optimal
> performance but data corruption).
> 

MJ: changed to:
     if (!auto_increment_safe_stmt_log_lock &&
         thd->lex->sql_command != SQLCOM_INSERT &&
         mysql_bin_log.is_open() &&
         !thd->current_stmt_binlog_row_based &&
         (thd->options & OPTION_BIN_LOG))

And if there are new commands, it is easy to add them after SQLCOM_INSERT

>> +    {
>> +      DBUG_PRINT("info", ("locking auto_increment_multi_row_stmt_lock"));
>> +      auto_increment_multi_row_stmt_lock= TRUE;
>> +    }
>> +
>> +    /* this gets corrected (for offset/increment) in update_auto_increment */
>> +    *first_value= ha_data->next_auto_inc_val;
>> +    ha_data->next_auto_inc_val= *first_value + nb_desired_values * increment;
> 
> GB I'd change the line above to 
> ha_data->next_auto_inc_val+= nb_desired_values * increment;
> 

MJ: done

>> +
>> +    unlock_auto_increment();
>> +    DBUG_PRINT("info", ("*first_value: %lu", (ulong) *first_value));
>> +    *nb_reserved_values= nb_desired_values;
>>    }
>> -  if (increment)                                // If not check for values
>> -    *nb_reserved_values= (last_value == ULONGLONG_MAX) ?
>> -      ULONGLONG_MAX : ((last_value - *first_value) / increment);
>>    DBUG_VOID_RETURN;
>>  }
>> @@ -5659,9 +5759,33 @@ void ha_partition::release_auto_incremen
>>  {
>>    DBUG_ENTER("ha_partition::release_auto_increment");
> 
> GB I tested that bypassing this function makes your tests fail - good!
> 

MJ: OK, thanks!

>> -  for (uint i= 0; i < m_tot_parts; i++)
>> +  if (table->s->next_number_keypart)
>> +  {
>> +    for (uint i= 0; i < m_tot_parts; i++)
>> +      m_file[i]->ha_release_auto_increment();
>> +  }
>> +  else
>>    {
>> -    m_file[i]->ha_release_auto_increment();
>> +    if (next_insert_id)
> 
> GB we could glue 'else' and 'if' and remove a pair of {}:
>  else if (next_insert_id)
> 

MJ: Done

>> +    {
>> +      HA_DATA_PARTITION *ha_data= (HA_DATA_PARTITION*) table_share->ha_data;
>> +      lock_auto_increment();
>> +      if (next_insert_id < ha_data->next_auto_inc_val &&
>> +          auto_inc_interval_for_cur_row.maximum() >=
>> +          ha_data->next_auto_inc_val)
> 
> GB you may want to cache ha_data->next_auto_inc_val in a local
> variable to not dereference two times.
> 

MJ: Done, but this is on the fine line of over optimization :)

>> +        ha_data->next_auto_inc_val= next_insert_id;
>> +      DBUG_PRINT("info", ("ha_data->next_auto_inc_val: %lu",
>> +                          (ulong) ha_data->next_auto_inc_val));
>> +
>> +      /* Unlock the multi row statement lock taken in get_auto_increment */
>> +      if (auto_increment_multi_row_stmt_lock)
>> +      {
>> +        auto_increment_multi_row_stmt_lock= FALSE;
>> +        DBUG_PRINT("info", ("unlocking auto_increment_multi_row_stmt_lock"));
> 
> GB If there wasn't this DBUG_PRINT(), I'd suggest to remove the if() and
> unconditionally do
> auto_increment_multi_row_stmt_lock= FALSE;
> because:
> "read + test + jump"
> is probably more costly than "write".
> 

MJ: Point taken, but I would go for having the DBUG_PRINT, since it can
be good to have when debugging.

>> +      }
>> +
>> +      unlock_auto_increment();
>> +    }
>>    }
>>    DBUG_VOID_RETURN;
>>  }
>> diff -Nrup a/sql/ha_partition.h b/sql/ha_partition.h
>> --- a/sql/ha_partition.h	2007-09-24 15:30:28 +02:00
>> +++ b/sql/ha_partition.h	2008-04-01 23:04:38 +02:00
>> @@ -37,6 +37,15 @@ typedef struct st_partition_share
>>  } PARTITION_SHARE;
>>  #endif
>>  
>> +/**
>> +  Partition specific ha_data struct.
>> +  @todo: move all partition specific data from TABLE_SHARE here.
>> +*/
>> +typedef struct st_ha_data_partition
>> +{
>> +  ulonglong next_auto_inc_val;                 /**< first non reserved value */
>> +  bool auto_inc_initialized;
>> +} HA_DATA_PARTITION;
>>  
>>  #define PARTITION_BYTES_IN_POS 2
>>  class ha_partition :public handler
>> @@ -141,6 +150,8 @@ private:
>>      "own" the m_part_info structure.
>>    */
>>    bool is_clone;
>> +  bool auto_increment_lock;             /* lock reading/updating auto_inc */
>> +  bool auto_increment_multi_row_stmt_lock; /* used with auto_increment_lock */
> 
> GB Please make both comments start with /**< so that Doxygen picks
> them up.
> Is auto_increment_multi_row_stmt_lock a good name? I mean, it's not
> used for INSERT VALUES(),(),() which is a multi-row INSERT too.
> This variable is true when we don't know how many rows are to be
> inserted. So maybe we could find a name for this.
> And, "used with auto_increment_lock" is not really a description of
> the member. We should specify what it serves for. If the comment
> becomes too long to sit at the right of the declaration, it can come
> on top:
> /**
>   the comment
> */
> bool auto_increment_multi_row_stmt_lock;
> 

MJ: Changed the variable to auto_increment_safe_stmt_log_lock and added
a bigger comment.

>>  public:
>>    handler *clone(MEM_ROOT *mem_root);
>>    virtual void set_part_info(partition_info *part_info)
> 
>> @@ -828,12 +839,46 @@ public:
>>      auto_increment_column_changed
>>       -------------------------------------------------------------------------
>>    */
>> -  virtual void restore_auto_increment(ulonglong prev_insert_id);
>>    virtual void get_auto_increment(ulonglong offset, ulonglong increment,
>>                                    ulonglong nb_desired_values,
>>                                    ulonglong *first_value,
>>                                    ulonglong *nb_reserved_values);
>>    virtual void release_auto_increment();
>> +private:
>> +  virtual int reset_auto_increment(ulonglong value);
>> +  virtual void lock_auto_increment()
>> +  {
>> +    if (auto_increment_multi_row_stmt_lock)
>> +      return;
> 
> Gb I'd add above a comment like
> /* lock is already locked */
> 

MJ: Done.

>> +    DBUG_ASSERT(table_share->ha_data && !auto_increment_lock);
>> +    if(table_share->tmp_table == NO_TMP_TABLE)
>> +    {
>> +      auto_increment_lock= TRUE;
>> +      pthread_mutex_lock(&table_share->mutex);
>> +    }
>> +  }
>> +  virtual void unlock_auto_increment()
>> +  {
>> +    DBUG_ASSERT(table_share->ha_data);
>> +    if(auto_increment_lock && !auto_increment_multi_row_stmt_lock)
> 
> 
> GB I'd add a comment:
> /* if auto_increment_multi_row_lock is true, we have to keep lock, it
> will be released only at end of statement */
> 

MJ: Done.

>> +    {
>> +      pthread_mutex_unlock(&table_share->mutex);
>> +      auto_increment_lock= FALSE;
>> +    }
>> +  }
>> +  virtual void set_auto_increment_if_higher()
>> +  {
>> +    ulonglong nr= table->next_number_field->val_int();
>> +    HA_DATA_PARTITION *ha_data= (HA_DATA_PARTITION*) table_share->ha_data;
>> +    lock_auto_increment();
>> +    /* must check when the mutex is taken */
>> +    if (nr >= ha_data->next_auto_inc_val)
>> +      ha_data->next_auto_inc_val= nr + 1;
>> +    ha_data->auto_inc_initialized= TRUE;
>> +    unlock_auto_increment();
>> +  }
> 
> GB I'd add
> DBUG_ASSERT(!auto_increment_lock && !auto_increment_multi_row_stmt_lock);
> to handler::ha_external_lock() (it already contains something similar
> for next_insert_id==0).
> 

MJ: I have added the DBUG_ASSERT in ha_partition::external_lock, since
those variables are not available in handler::ha_external_lock.


Regards
-- 
Mattias Jonsson, Software Engineer
MySQL AB, www.mysql.com

Are you MySQL certified?  www.mysql.com/certification
