List:Commits« Previous MessageNext Message »
From:Davi Arnaut Date:October 4 2010 9:56pm
Subject:Re: bzr commit into mysql-5.5-runtime branch (davi:3149) Bug#49938
Bug#54678
View as plain text  
Hi Dmitry,

On 10/4/10 4:11 PM, Dmitry Lenev wrote:
> Hello Davi!
>
> Here are my comments about your patch:

Thanks!

> * Davi Arnaut<davi.arnaut@stripped>  [10/09/30 02:41]:
>>   3149 Davi Arnaut	2010-09-29
>>        Bug#49938: Failing assertion: inode or deadlock in fsp/fsp0fsp.c
>>        Bug#54678: InnoDB, TRUNCATE, ALTER, I_S SELECT, crash or deadlock
>>
>>        The problem was that for storage engines that do not support
>>        truncate table via a external drop and recreate, such as InnoDB
>>        which implements truncate via a internal drop and recreate, the
>>        delete_all_rows method could be invoked with a shared metadata
>>        lock, causing problems if the engine needed exclusive access
>>        to some internal metadata. This problem originated with the
>>        fact that there is no truncate specific handler method, which
>>        ended up leading to a abuse of the delete_all_rows method that
>>        is primarily used for delete operations without a condition.
>>
>>        The solution is to introduce a truncate handler method that is
>>        invoked when the engine does not support truncation via a table
>>        drop and recreate. This method is invoked under a exclusive
>>        metadata lock, so that there is only a single instance of the
>>        table when the method is invoked.
>>
>>        Also, the method is not invoked and a error is thrown if
>>        the table participates in a non-self-referencing foreign key
>>        relationship. This was necessary to avoid inconsistency as
>>        integrity checks are bypassed. This is inline with the fact
>>        that truncate is primarily a DDL operation that was designed
>>        to quickly remove all data from a table.
>>
>>        Incompatible change: truncate no longer resorts to a high-level
>>        row by row delete if the storage engine does not support the
>>        truncate method. Consequently, the count of rows affected does
>>        not reflect the actual number of rows.
>
> I think it makes sense to highlight in this comment the fact that
> TRUNCATE will no longer work for InnoDB tables which participate
> in non-recursive foreign key as parent. This includes cases when it
> succeeded before - when child table was empty (e.g. due to previous
> truncation) or this foreign key had a CASCADE as ON DELETE action.

OK.

> Actually this incompatible change is my main issue with your patch.
> Although I agree that this is a step in the right direction (we did
> the same thing in 6.1-fk tree after all) I am concerned with the
> fact that we are introducing it fairly late in the release cycle.
>

FWIW, I'll double check with the InnoDB team whether we can lift this.

>
>> === modified file 'mysql-test/r/trigger-trans.result'
>> --- a/mysql-test/r/trigger-trans.result	2008-10-07 16:54:12 +0000
>> +++ b/mysql-test/r/trigger-trans.result	2010-09-29 22:38:57 +0000
>> @@ -151,9 +151,14 @@ CREATE TRIGGER t1_ad AFTER DELETE ON t1
>>   SET @a = 0;
>>   SET @b = 0;
>>   TRUNCATE t1;
>> +ERROR 42000: Cannot truncate table 't1' that is referred to by foreign key in
> another table
>>   SELECT @a, @b;
>>   @a	@b
>>   0	0
>> +DELETE FROM t1;
>> +SELECT @a, @b;
>> +@a	@b
>> +1	1
>>   INSERT INTO t1 VALUES (1);
>
> I think the above 3 statements are redundant and can be simply removed.

OK.

>>   DELETE FROM t1;
>>   SELECT @a, @b;
>>
>
> ...
>
>> === modified file 'sql/ha_partition.cc'
>> --- a/sql/ha_partition.cc	2010-08-19 08:22:23 +0000
>> +++ b/sql/ha_partition.cc	2010-09-29 22:38:57 +0000
>
> ...
>
>> +
>> +
>> +/**
>> +  Manually truncate the table.
>> +
>> +  @remark Auto increment value will be truncated in that partition as well!
>> +
>
> Is the above comment still relevant?

Yes, but I think it was meant for partition truncate.

> If yes which partition is meant under "that" ?

Will move this remark to ha_partition::truncate_partition.

>> +  @retval  0    Success.
>> +  @retval>  0  Error code.
>> +*/
>> +
>> +int ha_partition::truncate()
>> +{
>
> ...
>
>>
>>
>> +/**
>> +  Truncate a set of specific partitioins.
>                                  ^^^^^^^^^^^
> Typo:                           partitions

Fixed.

[..]

>
>> === modified file 'sql/ha_partition.h'
>> --- a/sql/ha_partition.h	2010-07-23 20:09:27 +0000
>> +++ b/sql/ha_partition.h	2010-09-29 22:38:57 +0000
>> @@ -342,6 +342,7 @@ public:
>>     virtual int update_row(const uchar * old_data, uchar * new_data);
>>     virtual int delete_row(const uchar * buf);
>>     virtual int delete_all_rows(void);
>> +  virtual int truncate();
>>     virtual void start_bulk_insert(ha_rows rows);
>>     virtual int end_bulk_insert();
>>   private:
>> @@ -350,6 +351,12 @@ private:
>>     long estimate_read_buffer_size(long original_size);
>>   public:
>>
>> +  /*
>> +    Method for truncating a specific partition.
>> +    (i.e. ALTER TABLE t1 TRUNCATE PARTITION p).
>> +  */
>> +  int truncate_partition(Alter_info *);
>> +
>
>
> Please correct me if I am wrong. This method is not a method
> of general SE API but rather a partitioning-specific hook.
> So we are not adding Alter_info class to the SE API.
> Right?

Right.

> Maybe it makes sense to mention this in the above comment?

Added:

     @remark This method is partitioning-specific hook and
             thus not a member of the general SE API.

> ...
>
>> === modified file 'sql/handler.h'
>> --- a/sql/handler.h	2010-08-18 11:55:37 +0000
>> +++ b/sql/handler.h	2010-09-29 22:38:57 +0000
>> @@ -1331,6 +1331,7 @@ public:
>>     int ha_bulk_update_row(const uchar *old_data, uchar *new_data,
>>                            uint *dup_key_found);
>>     int ha_delete_all_rows();
>> +  int ha_truncate();
>>     int ha_reset_auto_increment(ulonglong value);
>>     int ha_optimize(THD* thd, HA_CHECK_OPT* check_opt);
>>     int ha_analyze(THD* thd, HA_CHECK_OPT* check_opt);
>> @@ -2010,12 +2011,31 @@ private:
>>       This is called to delete all rows in a table
>>       If the handler don't support this, then this function will
>>       return HA_ERR_WRONG_COMMAND and MySQL will delete the rows one
>> -    by one. It should reset auto_increment if
>> -    thd->lex->sql_command == SQLCOM_TRUNCATE.
>> +    by one.
>>     */
>>     virtual int delete_all_rows()
>>     { return (my_errno=HA_ERR_WRONG_COMMAND); }
>>     /**
>> +    Quickly remove all rows from a table.
>> +
>> +    @remark This method is responsible for implementing MySQL's TRUNCATE
>> +            TABLE statement, which is a DDL operation. As such, a engine
>> +            can bypass certain integrity checks and in some cases avoid
>> +            fine-grained locking (e.g. row locks) which would normally be
>> +            required for a DELETE statement.
>> +
>> +    @remark Typically, truncate is not used if it can result in integrity
>> +            violation. For example, truncate is not used when a foreign
>> +            key references the table, but it might be used if foreign key
>> +            checks are disabled.
>> +
>> +    @remark Engine is responsible for reseting the auto-increment counter.
>> +
>> +    @remark The table is locked in exclusive mode.
>> +  */
>> +  virtual int truncate()
>> +  { return HA_ERR_WRONG_COMMAND; }
>> +  /**
>>       Reset the auto-increment counter to the given value, i.e. the next row
>>       inserted will get the given value. This is called e.g. after TRUNCATE
>>       is emulated by doing a 'DELETE FROM t'. HA_ERR_WRONG_COMMAND is
>>
>
> Hmm... handler::ha_/reset_auto_increment() is no longer used outside
> of storage engines code. I am not sure if it makes sense to get rid
> of this API method at this point... OTOH it is probably a good idea

We could deprecate it.

> to update the above comment describing this method.

Updated.

> ...
>
>> === modified file 'sql/sql_parse.cc'
>> --- a/sql/sql_parse.cc	2010-09-22 08:15:41 +0000
>> +++ b/sql/sql_parse.cc	2010-09-29 22:38:57 +0000
>> @@ -49,7 +49,7 @@
>>                                 // mysql_recreate_table,
>>                                 // mysql_backup_table,
>>                                 // mysql_restore_table
>> -#include "sql_truncate.h"     // mysql_truncate_table
>> +#include "sql_truncate.h"     // Truncate_statement
>
> Taking into account that Truncate_statement class is not used in
> sql_parse.cc do we need this include at all?

Removed.

> ...
>
>> === modified file 'sql/sql_partition_admin.cc'
>> --- a/sql/sql_partition_admin.cc	2010-08-16 14:25:23 +0000
>> +++ b/sql/sql_partition_admin.cc	2010-09-29 22:38:57 +0000
>> @@ -16,10 +16,10 @@
>>   #include "sql_parse.h"                      // check_one_table_access
>>   #include "sql_table.h"                      // mysql_alter_table, etc.
>>   #include "sql_lex.h"                        // Sql_statement
>> -#include "sql_truncate.h"                   // mysql_truncate_table,
>> -                                            // Truncate_statement
>>   #include "sql_admin.h"                      //
> Analyze/Check/.._table_statement
>>   #include "sql_partition_admin.h"            // Alter_table_*_partition
>> +#include "ha_partition.h"                   // ha_partition
>> +#include "sql_base.h"                       // open_and_lock_tables
>>
>>   #ifndef WITH_PARTITION_STORAGE_ENGINE
>>
>> @@ -46,7 +46,7 @@ bool Alter_table_analyze_partition_state
>>     m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION;
>>
>>     res= Analyze_table_statement::execute(thd);
>> -
>> +
>>     DBUG_RETURN(res);
>>   }
>>
>> @@ -104,36 +104,64 @@ bool Alter_table_repair_partition_statem
>>
>>   bool Alter_table_truncate_partition_statement::execute(THD *thd)
>>   {
>> +  int error;
>> +  ha_partition *partition;
>>     TABLE_LIST *first_table= thd->lex->select_lex.table_list.first;
>> -  bool res;
>> -  enum_sql_command original_sql_command;
>>     DBUG_ENTER("Alter_table_truncate_partition_statement::execute");
>>
>>     /*
>> -    Execute TRUNCATE PARTITION just like TRUNCATE TABLE.
>> -    Some storage engines (InnoDB, partition) checks thd_sql_command,
>> -    so we set it to SQLCOM_TRUNCATE during the execution.
>> -  */
>> -  original_sql_command= m_lex->sql_command;
>> -  m_lex->sql_command= SQLCOM_TRUNCATE;
>> -
>> -  /*
>>       Flag that it is an ALTER command which administrates partitions, used
>>       by ha_partition.
>>     */
>> -  m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION;
>> -
>> +  m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION |
>> +                            ALTER_TRUNCATE_PARTITION;
>> +
>> +  /* Fix the lock types (not the same as ordinary ALTER TABLE). */
>> +  first_table->lock_type= TL_WRITE;
>> +  first_table->mdl_request.set_type(MDL_EXCLUSIVE);
>> +
>>     /*
>> -    Fix the lock types (not the same as ordinary ALTER TABLE).
>> +    Check table permissions and open it with a exclusive lock.
>> +    Ensure it is a partitioned table and finally, upcast the
>> +    handler and invoke the partition truncate method. Lastly,
>> +    write the statement to the binary log if necessary.
>>     */
>> -  first_table->lock_type= TL_WRITE;
>> -  first_table->mdl_request.set_type(MDL_SHARED_NO_READ_WRITE);
>>
>> -  /* execute as a TRUNCATE TABLE */
>> -  res= Truncate_statement::execute(thd);
>> +  if (check_one_table_access(thd, DROP_ACL, first_table))
>> +    DBUG_RETURN(TRUE);
>>
>> -  m_lex->sql_command= original_sql_command;
>> -  DBUG_RETURN(res);
>> +  if (open_and_lock_tables(thd, first_table, FALSE, 0))
>> +    DBUG_RETURN(TRUE);
>> +
>> +  /*
>> +    TODO: Add support for TRUNCATE PARTITION for NDB and other
>> +          engines supporting native partitioning.
>> +  */
>> +  if (first_table->table->s->db_type() != partition_hton)
>> +  {
>> +    my_error(ER_PARTITION_MGMT_ON_NONPARTITIONED, MYF(0));
>> +    DBUG_RETURN(TRUE);
>> +  }
>> +
>> +  partition= (ha_partition *) first_table->table->file;
>> +
>> +  /* Invoke the handler method responsible for truncating the partition. */
>> +  if ((error=
> partition->truncate_partition(&thd->lex->alter_info)))
>> +    first_table->table->file->print_error(error, MYF(0));
>> +
>> +  /*
>> +    All effects of a truncate operation are committed even if the
>> +    operation fails. Thus, the query must be written to the binary
>> +    log. The only exception is a unimplemented truncate method. Also,
>> +    it is logged in statement format, regardless of the binlog format.
>> +  */
>> +  if (error != HA_ERR_WRONG_COMMAND)
>> +    error|= write_bin_log(thd, !error, thd->query(),
> thd->query_length());
>> +
>> +  if (! error)
>> +    my_ok(thd);
>> +
>> +  DBUG_RETURN(error);
>>   }
>
> So what happens if one executes ALTER TABLE TRUNCATE PARTITION
> under LOCK TABLES? Don't we need to upgrade SNRW metadata lock
> to X lock in this case?
>

Yes. Fixed.

>
>> === modified file 'sql/sql_truncate.cc'
>> --- a/sql/sql_truncate.cc	2010-08-31 10:03:36 +0000
>> +++ b/sql/sql_truncate.cc	2010-09-29 22:38:57 +0000
>> @@ -29,145 +29,120 @@
>>   #include "sql_truncate.h"
>>
>>
>> -/*
>> -  Delete all rows of a locked table.
>> -
>> -  @param  thd           Thread context.
>> -  @param  table_list    Table list element for the table.
>> -  @param  rows_deleted  Whether rows might have been deleted.
>> -
>> -  @retval  FALSE  Success.
>> -  @retval  TRUE   Error.
>> +/**
>> +  Check if table which is going to be affected by TRUNCATE TABLE
>> +  is a parent table in some non-self-referencing foreign key and
>> +  if yes emit an error that it is illegal to truncate such table.
>> +
>> +  @param  thd    Thread context.
>> +  @param  table  Table handle.
>> +
>> +  @retval FALSE  This table is not parent in a non-self-referencing foreign
>> +                 key. Statement can proceed.
>> +  @retval TRUE   This table is parent in a non-self-referencing foreign key,
>> +                 error was emitted.
>>   */
>>
>>   static bool
>> -delete_all_rows(THD *thd, TABLE *table)
>> +fk_truncate_illegal_if_parent(THD *thd, TABLE *table)
>>   {
>> -  int error;
>> -  READ_RECORD info;
>> -  bool is_bulk_delete;
>> -  bool some_rows_deleted= FALSE;
>> -  bool save_binlog_row_based= thd->is_current_stmt_binlog_format_row();
>> -  DBUG_ENTER("delete_all_rows");
>> +  FOREIGN_KEY_INFO *f_key_info;
>> +  List<FOREIGN_KEY_INFO>  f_key_list;
>> +  List_iterator_fast<FOREIGN_KEY_INFO>  it;
>>
>> -  /* Replication of truncate table must be statement based. */
>> -  thd->clear_current_stmt_binlog_format_row();
>> +  /* Whether the table is referenced by a foreign key. */
>> +  if (! table->file->referenced_by_foreign_key())
>> +    return FALSE;
>>
>> -  /*
>> -    Update handler statistics (e.g. table->file->stats.records).
>> -    Might be used by the storage engine to aggregate information
>> -    necessary to allow deletion. Currently, this seems to be
>> -    meaningful only to the archive storage engine, which uses
>> -    the info method to set the number of records. Although
>> -    archive does not support deletion, it becomes necessary in
>> -    order to return a error if the table is not empty.
>> -  */
>> -  error= table->file->info(HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK);
>> -  if (error&&  error != HA_ERR_WRONG_COMMAND)
>> -  {
>> -    table->file->print_error(error, MYF(0));
>> -    goto end;
>> -  }
>> +  table->file->get_foreign_key_list(thd,&f_key_list);
>>
>> -  /*
>> -    Attempt to delete all rows in the table.
>> -    If it is unsupported, switch to row by row deletion.
>> -  */
>> -  if (! (error= table->file->ha_delete_all_rows()))
>> -    goto end;
>> +  it.init(f_key_list);
>>
>> -  if (error != HA_ERR_WRONG_COMMAND)
>> +  /* Check whether it is a self-referencing FK. */
>> +  while ((f_key_info= it++))
>>     {
>> -    /*
>> -      If a transactional engine fails in the middle of deletion,
>> -      we expect it to be able to roll it back.  Some reasons
>> -      for the engine to fail would be media failure or corrupted
>> -      data dictionary (i.e. in case of a partitioned table). We
>> -      have sufficiently strong metadata locks to rule out any
>> -      potential deadlocks.
>> -
>> -      If a non-transactional engine fails here (that would
>> -      not be MyISAM, since MyISAM does TRUNCATE by recreate),
>> -      and binlog is on, replication breaks, since nothing gets
>> -      written to the binary log.  (XXX: is this a bug?)
>> -    */
>> -    table->file->print_error(error, MYF(0));
>> -    goto end;
>> +    if (my_strcasecmp(system_charset_info,
> f_key_info->referenced_db->str,
>> +                      table->s->db.str) ||
>> +        my_strcasecmp(system_charset_info,
> f_key_info->referenced_table->str,
>> +                      table->s->table_name.str))
>> +      break;
>>     }
>>
>> -  /*
>> -    A workaround for Bug#53696  "Performance schema engine violates the
>> -    PSEA API by calling my_error()".
>> -  */
>> -  if (thd->is_error())
>> -    goto end;
>> -
>
> AFAIU this bug is still there. Does this mean that this code should
> have been moved somewhere else and not removed?

Yes, the bug is still here but we don't need the hack anymore because we 
no longer resort to row-by-row delete. What happens is that now my_error 
is invoked two times, once by perfschema and another by print_error, 
which is harmless.


>> -  /* Handler didn't support fast delete. Delete rows one by one. */
>> -
>> -  init_read_record(&info, thd, table, NULL, TRUE, TRUE, FALSE);
>> -
>> -  /*
>> -    Start bulk delete. If the engine does not support it, go on,
>> -    it's not an error.
>> -  */
>> -  is_bulk_delete= ! table->file->start_bulk_delete();
>> +  /* Table is parent in a non-self-referencing foreign key. */
>> +  if (f_key_list.is_empty() || f_key_info)
>> +  {
>> +    my_error(ER_TRUNCATE_ILLEGAL_FK, MYF(0), table->s->table_name.str);
>> +    return TRUE;
>> +  }
>
> Could you please clarify why "f_key_list.is_empty()" part
> of the above condition is needed?

When we reach empty we know that the table is referenced by a foreign 
key. In this case, we allow truncate only if all the FKs of this table 
are self-referencing.

I've updated the comment to:

   /*
     Table is parent in a non-self-referencing foreign key or the table
     has a foreign key that refers to another table. The intention is to
     allow truncate only for tables that do not depend on other tables.
   */

Some background on what is being avoided: Bug#56785

>>
>> -  table->mark_columns_needed_for_delete();
>> +  return FALSE;
>> +}
>>
>> -  while (!(error= info.read_record(&info))&&  !thd->killed)
>> -  {
>> -    if ((error= table->file->ha_delete_row(table->record[0])))
>> -    {
>> -      table->file->print_error(error, MYF(0));
>> -      break;
>> -    }
>>
>> -    some_rows_deleted= TRUE;
>> -  }
>> +/*
>> +  Open and truncate a locked table.
>>
>> -  /* HA_ERR_END_OF_FILE */
>> -  if (error == -1)
>> -    error= 0;
>> +  @param  thd           Thread context.
>> +  @param  table_ref     Table list element for the table to be truncated.
>> +  @param  is_tmp_table  True if element refers to a temp table.
>>
>> -  /* Close down the bulk delete. */
>> -  if (is_bulk_delete)
>> -  {
>> -    int bulk_delete_error= table->file->end_bulk_delete();
>> -    if (bulk_delete_error&&  !error)
>> -    {
>> -      table->file->print_error(bulk_delete_error, MYF(0));
>> -      error= bulk_delete_error;
>> -    }
>> -  }
>> +  @retval  0    Success.
>> +  @retval>  0  Error code.
>> +*/
>>
>> -  end_read_record(&info);
>> +int Truncate_statement::handler_truncate(THD *thd, TABLE_LIST *table_ref,
>> +                                         bool is_tmp_table)
>> +{
>> +  int error= 0;
>> +  uint flags;
>> +  DBUG_ENTER("Truncate_statement::handler_truncate");
>>
>>     /*
>> -    Regardless of the error status, the query must be written to the
>> -    binary log if rows of the table is non-transactional.
>> +    Can't recreate, the engine must mechanically delete all rows
>> +    in the table. Use open_and_lock_tables() to open a write cursor.
>>     */
>> -  if (some_rows_deleted&&  !table->file->has_transactions())
>> +
>> +  /* If it is a temporary table, no need to take locks. */
>> +  if (is_tmp_table)
>> +    flags= MYSQL_OPEN_TEMPORARY_ONLY;
>> +  else
>>     {
>> -    thd->transaction.stmt.modified_non_trans_table= TRUE;
>> -    thd->transaction.all.modified_non_trans_table= TRUE;
>> +    /* We don't need to load triggers. */
>> +    DBUG_ASSERT(table_ref->trg_event_map == 0);
>> +    /*
>> +      Our metadata lock guarantees that no transaction is reading
>> +      or writing into the table. Yet, to open a write cursor we need
>> +      a thr_lock lock. Allow to open base tables only.
>> +    */
>> +    table_ref->required_type= FRMTYPE_TABLE;
>> +    /*
>> +      Ignore pending FLUSH TABLES since we don't want to release
>> +      the MDL lock taken above and otherwise there is no way to
>> +      wait for FLUSH TABLES in deadlock-free fashion.
>> +    */
>> +    flags= MYSQL_OPEN_IGNORE_FLUSH | MYSQL_OPEN_SKIP_TEMPORARY;
>> +    /*
>> +      Even though we have an MDL lock on the table here, we don't
>> +      pass MYSQL_OPEN_HAS_MDL_LOCK to open_and_lock_tables
>> +      since to truncate a MERGE table, we must open and lock
>> +      merge children, and on those we don't have an MDL lock.
>> +      Thus clear the ticket to satisfy MDL asserts.
>> +    */
>> +    table_ref->mdl_request.ticket= NULL;
>> +    /* The handler truncate protocol dictates a exclusive lock. */
>> +    table_ref->mdl_request.set_type(MDL_EXCLUSIVE);
>>     }
>
> Maybe it makes sense to set the type of metadata lock right
> in the parser?
>

No, because we first check, with a shared lock, whether the engine 
supports table recreate. Hum, actually, but this is also poses a problem 
in a re-execution context, as the type will always be exclusive after 
the first time the a engine for the table does not support recreate.

Suggestions?

>
>> @@ -225,30 +200,27 @@ static bool recreate_temporary_table(THD
>>
>>
>>   /*
>> -  Handle opening and locking if a base table for truncate.
>> +  Handle locking a base table for truncate.
>>
>>     @param[in]  thd               Thread context.
>>     @param[in]  table_ref         Table list element for the table to
>>                                   be truncated.
>>     @param[out] hton_can_recreate Set to TRUE if table can be dropped
>>                                   and recreated.
>> -  @param[out] ticket_downgrade  Set if a lock must be downgraded after
>> -                                truncate is done.
>>
>>     @retval  FALSE  Success.
>>     @retval  TRUE   Error.
>>   */
>>
>> -static bool open_and_lock_table_for_truncate(THD *thd, TABLE_LIST *table_ref,
>> -                                             bool *hton_can_recreate,
>> -                                             MDL_ticket **ticket_downgrade)
>> +bool Truncate_statement::lock_table(THD *thd, TABLE_LIST *table_ref,
>> +                                    bool *hton_can_recreate)
>>   {
>>     TABLE *table= NULL;
>> -  handlerton *table_type;
>> -  DBUG_ENTER("open_and_lock_table_for_truncate");
>> +  DBUG_ENTER("Truncate_statement::lock_table");
>>
>>     DBUG_ASSERT(table_ref->lock_type == TL_WRITE);
>> -  DBUG_ASSERT(table_ref->mdl_request.type == MDL_SHARED_NO_READ_WRITE);
>> +  DBUG_ASSERT(table_ref->mdl_request.type>= MDL_SHARED_NO_READ_WRITE);
>> +
>>     /*
>>       Before doing anything else, acquire a metadata lock on the table,
>>       or ensure we have one.  We don't use open_and_lock_tables()
>> @@ -268,8 +240,7 @@ static bool open_and_lock_table_for_trun
>>                                               table_ref->table_name, FALSE)))
>>         DBUG_RETURN(TRUE);
>>
>> -    table_type= table->s->db_type();
>> -    *hton_can_recreate= ha_check_storage_engine_flag(table_type,
>> +    *hton_can_recreate= ha_check_storage_engine_flag(table->s->db_type(),
>>                                                        HTON_CAN_RECREATE);
>>       table_ref->mdl_request.ticket= table->mdl_ticket;
>>     }
>> @@ -285,86 +256,36 @@ static bool open_and_lock_table_for_trun
>>                            MYSQL_OPEN_SKIP_TEMPORARY))
>>         DBUG_RETURN(TRUE);
>>
>> -    if (dd_frm_storage_engine(thd, table_ref->db, table_ref->table_name,
>> -&table_type))
>> +    if (dd_check_storage_engine_flag(thd, table_ref->db,
> table_ref->table_name,
>> +                                     HTON_CAN_RECREATE, hton_can_recreate))
>>         DBUG_RETURN(TRUE);
>> -    *hton_can_recreate= ha_check_storage_engine_flag(table_type,
>> -                                                     HTON_CAN_RECREATE);
>>     }
>>
>> -#ifdef WITH_PARTITION_STORAGE_ENGINE
>> -  /*
>> -    TODO: Add support for TRUNCATE PARTITION for NDB and other engines
>> -    supporting native partitioning.
>> -  */
>> -  if (thd->lex->alter_info.flags&  ALTER_ADMIN_PARTITION&&
>> -      table_type != partition_hton)
>> -  {
>> -    my_error(ER_PARTITION_MGMT_ON_NONPARTITIONED, MYF(0));
>> -    DBUG_RETURN(TRUE);
>> -  }
>> -#endif
>>     DEBUG_SYNC(thd, "lock_table_for_truncate");
>>
>> -  if (*hton_can_recreate)
>> +  /*
>> +    Acquire an exclusive lock. A storage engine can recreate or
>> +    truncate the table only if there are no references to it from
>> +    anywhere, i.e. no cached TABLE in the table cache. To remove
>> +    the table from the cache we need an exclusive lock.
>> +  */
>> +  if (thd->locked_tables_mode)
>>     {
>> -    /*
>> -      Acquire an exclusive lock. The storage engine can recreate the
>> -      table only if there are no references to it from anywhere, i.e.
>> -      no cached TABLE in the table cache. To remove the table from the
>> -      cache we need an exclusive lock.
>> -    */
>> -    if (thd->locked_tables_mode)
>> -    {
>> -      if (wait_while_table_is_used(thd, table, HA_EXTRA_FORCE_REOPEN))
>> -        DBUG_RETURN(TRUE);
>> -      *ticket_downgrade= table->mdl_ticket;
>> +    if (wait_while_table_is_used(thd, table, HA_EXTRA_FORCE_REOPEN))
>> +      DBUG_RETURN(TRUE);
>> +    m_ticket_downgrade= table->mdl_ticket;
>
> So we init m_ticket_dowgrade in constructor and set here.
> Do we reset it anywhere to make code re-execution safe?
> (... OK, re-prepare will probably care about PS but what
> about SPs?...)

Fixed. Also added a test case.

>> +    /* Close if table is going to be recreated. */
>> +    if (*hton_can_recreate)
>>         close_all_tables_for_name(thd, table->s, FALSE);
>> -    }
>> -    else
>> -    {
>> -      ulong timeout= thd->variables.lock_wait_timeout;
>> -      if (thd->mdl_context.
>> -          upgrade_shared_lock_to_exclusive(table_ref->mdl_request.ticket,
>> -                                           timeout))
>> -        DBUG_RETURN(TRUE);
>> -      tdc_remove_table(thd, TDC_RT_REMOVE_ALL, table_ref->db,
>> -                       table_ref->table_name, FALSE);
>> -    }
>>     }
>>     else
>>     {
>> -    /*
>> -      Can't recreate, we must mechanically delete all rows in
>> -      the table. Our metadata lock guarantees that no transaction
>> -      is reading or writing into the table. Yet, to open a write
>> -      cursor we need a thr_lock lock. Use open_and_lock_tables()
>> -      to do the necessary job.
>> -    */
>> -
>> -    /* Allow to open base tables only. */
>> -    table_ref->required_type= FRMTYPE_TABLE;
>> -    /* We don't need to load triggers. */
>> -    DBUG_ASSERT(table_ref->trg_event_map == 0);
>> -    /*
>> -      Even though we have an MDL lock on the table here, we don't
>> -      pass MYSQL_OPEN_HAS_MDL_LOCK to open_and_lock_tables
>> -      since to truncate a MERGE table, we must open and lock
>> -      merge children, and on those we don't have an MDL lock.
>> -      Thus clear the ticket to satisfy MDL asserts.
>> -    */
>> -    table_ref->mdl_request.ticket= NULL;
>> -
>> -    /*
>> -      Open the table as it will handle some required preparations.
>> -      Ignore pending FLUSH TABLES since we don't want to release
>> -      the MDL lock taken above and otherwise there is no way to
>> -      wait for FLUSH TABLES in deadlock-free fashion.
>> -    */
>> -    if (open_and_lock_tables(thd, table_ref, FALSE,
>> -                             MYSQL_OPEN_IGNORE_FLUSH |
>> -                             MYSQL_OPEN_SKIP_TEMPORARY))
>> +    ulong timeout= thd->variables.lock_wait_timeout;
>> +    if (thd->mdl_context.
>> +        upgrade_shared_lock_to_exclusive(table_ref->mdl_request.ticket,
> timeout))
>>         DBUG_RETURN(TRUE);
>> +    tdc_remove_table(thd, TDC_RT_REMOVE_ALL, table_ref->db,
>> +                     table_ref->table_name, FALSE);
>>     }
>
> Taking into account that we now always upgrade metadata locks
> for base tables maybe it makes sense to take X metadata lock on
> table being truncated right from the start?

I think so but I remember that kostja had some desire to keep a shared 
lock as long as necessary. I asked him once about this but go no response...

[..]

>> -  DBUG_PRINT("exit", ("error: %d", error));
>> -  DBUG_RETURN(test(error));
>> +  DBUG_RETURN(error);
>>   }
>>
>>
>> +/**
>> +  Execute a TRUNCATE statement at runtime.
>> +
>> +  @param  thd   The current thread.
>> +
>> +  @return FALSE on success.
>> +*/
>> +
>>   bool Truncate_statement::execute(THD *thd)
>>   {
>>     TABLE_LIST *first_table= thd->lex->select_lex.table_list.first;
>> @@ -502,9 +420,10 @@ bool Truncate_statement::execute(THD *th
>>
>>     if (check_one_table_access(thd, DROP_ACL, first_table))
>>       goto error;
>> +
>>     /*
>>       Don't allow this within a transaction because we want to use
>> -    re-generate table
>> +    re-generate table.
>>     */
>>     if (thd->in_active_multi_stmt_transaction())
>>     {
>
> Taking into account that TRUNCATE TABLE causes implicit commit
> do we need the above if-statement at all?

Hum, some consider the below to be somewhat evil:

SET @autocommit = 0;
# Does implicit commit.
TRUNCATE TABLE t1;

but this now seems to be particular to truncate. Removed.

>> @@ -512,8 +431,11 @@ bool Truncate_statement::execute(THD *th
>>                  ER(ER_LOCK_OR_ACTIVE_TRANSACTION), MYF(0));
>>       goto error;
>>     }
>> -  if (! (res= mysql_truncate_table(thd, first_table)))
>> +
>> +  if (! (res= truncate_table(thd, first_table)))
>>       my_ok(thd);
>> +
>>   error:
>>     DBUG_RETURN(res);
>>   }
>> +
>>
>
> OK.
>
>> === modified file 'sql/sql_truncate.h'
>> --- a/sql/sql_truncate.h	2010-08-16 12:53:30 +0000
>> +++ b/sql/sql_truncate.h	2010-09-29 22:38:57 +0000
>> @@ -18,23 +18,26 @@
>>   class THD;
>>   struct TABLE_LIST;
>>
>> -bool mysql_truncate_table(THD *thd, TABLE_LIST *table_ref);
>> -
>>   /**
>>     Truncate_statement represents the TRUNCATE statement.
>>   */
>>   class Truncate_statement : public Sql_statement
>>   {
>> +private:
>> +  /* Set if a lock must be downgraded after truncate is done. */
>> +  MDL_ticket *m_ticket_downgrade;
>> +
>>   public:
>>     /**
>>       Constructor, used to represent a ALTER TABLE statement.
>>       @param lex the LEX structure for this statement.
>>     */
>>     Truncate_statement(LEX *lex)
>> -    : Sql_statement(lex)
>> +    : Sql_statement(lex),
>> +      m_ticket_downgrade(NULL)
>>     {}
>>
>> -  ~Truncate_statement()
>> +  virtual ~Truncate_statement()
>>     {}
>>
>>     /**
>> @@ -43,7 +46,20 @@ public:
>>       @return false on success.
>>     */
>>     bool execute(THD *thd);
>> -};
>>
>> +protected:
>> +  /** Handle locking a base table for truncate. */
>> +  bool lock_table(THD *, TABLE_LIST *, bool *);
>> +
>> +  /** Truncate table via the handler method. */
>> +  int handler_truncate(THD *, TABLE_LIST *, bool);
>> +
>> +  /**
>> +    Optimized delete of all rows by doing a full generate of the table.
>
> Maybe it is better to use word "regenerate" here ?

done.

>> +    Depending on the storage engine, it can be accomplished through a
>> +    drop and recreate or via the handler truncate method.
>> +  */
>> +  bool truncate_table(THD *, TABLE_LIST *);
>> +};
>>
>>   #endif
>>
>> === modified file 'storage/archive/ha_archive.cc'
>> --- a/storage/archive/ha_archive.cc	2010-07-26 15:54:20 +0000
>> +++ b/storage/archive/ha_archive.cc	2010-09-29 22:38:57 +0000
>> @@ -1650,9 +1650,9 @@ int ha_archive::end_bulk_insert()
>>     This is done for security reasons. In a later version we will enable this by
>>     allowing the user to select a different row format.
>>   */
>> -int ha_archive::delete_all_rows()
>> +int ha_archive::truncate()
>>   {
>> -  DBUG_ENTER("ha_archive::delete_all_rows");
>> +  DBUG_ENTER("ha_archive::truncate");
>>     DBUG_RETURN(HA_ERR_WRONG_COMMAND);
>>   }
>>
>
> OK.
>
>> === modified file 'storage/blackhole/ha_blackhole.cc'
>> --- a/storage/blackhole/ha_blackhole.cc	2010-07-08 21:20:08 +0000
>> +++ b/storage/blackhole/ha_blackhole.cc	2010-09-29 22:38:57 +0000
>> @@ -87,6 +87,12 @@ int ha_blackhole::create(const char *nam
>>     DBUG_RETURN(0);
>>   }
>>
>> +int ha_blackhole::truncate()
>> +{
>> +  DBUG_ENTER("ha_blackhole::truncate");
>> +  DBUG_RETURN(0);
>> +}
>> +
>>   const char *ha_blackhole::index_type(uint key_number)
>>   {
>>     DBUG_ENTER("ha_blackhole::index_type");
>>
>
> Do I understand correctly that the above method is needed only
> to support partitioning over blackhole engine (as this engine
> has HTON_CAN_RECREATE set)? Maybe it is worth to mention this
> in the comment for this method?

Done.

>> === modified file 'storage/csv/ha_tina.cc'
>> --- a/storage/csv/ha_tina.cc	2010-07-29 12:32:11 +0000
>> +++ b/storage/csv/ha_tina.cc	2010-09-29 22:38:57 +0000
>> @@ -1638,6 +1638,16 @@ int ha_tina::delete_all_rows()
>>   }
>>
>>   /*
>> +  Manual truncate table.
>> +*/
>> +
>> +int ha_tina::truncate()
>> +{
>> +  /* Simply delete all rows. No auto increment to reset. */
>> +  return delete_all_rows();
>> +}
>> +
>> +/*
>>     Called by the database to lock the table. Keep in mind that this
>>     is an internal lock.
>>   */
>>
>
> On one hand CSV has HTON_CAN_RECREATE flag set. So the above
> method will be called only for partitioned tables based on
> this engine.
> But on the other hand it also has HTON_NO_PARTITION set so such
> tables are impossible to create.
>
> Therefore I think it makes sense to leave this method unimplemented.

OK.

>> === modified file 'storage/example/ha_example.cc'
>> --- a/storage/example/ha_example.cc	2010-08-05 12:34:19 +0000
>> +++ b/storage/example/ha_example.cc	2010-09-29 22:38:57 +0000
>> @@ -739,6 +739,26 @@ int ha_example::delete_all_rows()
>>
>>   /**
>>     @brief
>> +  Used for handler specific truncate table.  The table is locked in
>> +  exclusive mode and handler is responsible for reseting the auto-
>> +  increment counter.
>> +
>> +    @details
>> +  Called from Truncate_statement::handler_truncate.
>> +
>> +    @see
>> +  Truncate_statement in sql_truncate.cc
>> +  Remarks in handler::truncate.
>> +*/
>> +int ha_example::truncate()
>
> Don't you think that layout of the above comment looks wrong?

Yes, but it follows the style used throughout the file. Anyway, fixed 
all cases :)

> Also maybe it makes sense to mention in this comment the role
> of HTON_CAN_RECREATE flag, especially since it is set for this
> engine?
>

Yes.

>
>> === modified file 'storage/ibmdb2i/ha_ibmdb2i.cc'
>> --- a/storage/ibmdb2i/ha_ibmdb2i.cc	2010-07-08 21:20:08 +0000
>> +++ b/storage/ibmdb2i/ha_ibmdb2i.cc	2010-09-29 22:38:57 +0000
>> @@ -1806,6 +1806,13 @@ int ha_ibmdb2i::delete_all_rows()
>>   }
>>
>>
>> +int ha_ibmdb2i::truncate()
>> +{
>> +  int error = delete_all_rows();
>> +  return error ? error : reset_auto_increment(0);
>> +}
>> +
>> +
>>   int ha_ibmdb2i::external_lock(THD *thd, int lock_type)
>>   {
>>     int rc = 0;
>
> OK. Seems like ha_ibmdb2i::delete_all_rows() contains
> some truncate-specific code, so it probably makes sense
> to notify person responsible for supporting this code
> (whoever it is) once your patch is pushed.

OK.

>>
>> === modified file 'storage/innobase/handler/ha_innodb.cc'
>> --- a/storage/innobase/handler/ha_innodb.cc	2010-08-18 07:14:06 +0000
>> +++ b/storage/innobase/handler/ha_innodb.cc	2010-09-29 22:38:57 +0000
>> @@ -7072,33 +7072,21 @@ Deletes all rows of an InnoDB table.
>>   @return	error number */
>>   UNIV_INTERN
>>   int
>> -ha_innobase::delete_all_rows(void)
>> +ha_innobase::truncate(void)
>>   /*==============================*/
>>   {
>>   	int		error;
>>
>> -	DBUG_ENTER("ha_innobase::delete_all_rows");
>> +	DBUG_ENTER("ha_innobase::truncate");
>>
>>   	/* Get the transaction associated with the current thd, or create one
>>   	if not yet created, and update prebuilt->trx */
>>
>>   	update_thd(ha_thd());
>>
>> -	if (thd_sql_command(user_thd) != SQLCOM_TRUNCATE) {
>> -	fallback:
>> -		/* We only handle TRUNCATE TABLE t as a special case.
>> -		DELETE FROM t will have to use ha_innobase::delete_row(),
>> -		because DELETE is transactional while TRUNCATE is not. */
>> -		DBUG_RETURN(my_errno=HA_ERR_WRONG_COMMAND);
>> -	}
>> -
>>   	/* Truncate the table in InnoDB */
>>
>>   	error = row_truncate_table_for_mysql(prebuilt->table, prebuilt->trx);
>> -	if (error == DB_ERROR) {
>> -		/* Cannot truncate; resort to ha_innobase::delete_row() */
>> -		goto fallback;
>> -	}
>>
>>   	error = convert_error_code_to_mysql(error, prebuilt->table->flags,
>>   					    NULL);
>>
>
> OK. Please notify InnoDB guys so they can perform necessary clean-ups
> in their code once your patch is pushed.

OK.

>> === modified file 'storage/myisam/ha_myisam.cc'
>> --- a/storage/myisam/ha_myisam.cc	2010-07-08 21:20:08 +0000
>> +++ b/storage/myisam/ha_myisam.cc	2010-09-29 22:38:57 +0000
>> @@ -1788,6 +1788,12 @@ int ha_myisam::delete_all_rows()
>>     return mi_delete_all_rows(file);
>>   }
>>
>> +int ha_myisam::truncate()
>> +{
>> +  int error= delete_all_rows();
>> +  return error ? error : reset_auto_increment(0);
>> +}
>> +
>>   int ha_myisam::reset_auto_increment(ulonglong value)
>>   {
>>     file->s->state.auto_increment= value;
>>
>
> Again maybe makes sense to mention in the comment describing
> this method that we need ha_myisam::truncate() only for the
> sake of partitioning?

Done.

>> === modified file 'storage/myisammrg/ha_myisammrg.cc'
>> --- a/storage/myisammrg/ha_myisammrg.cc	2010-09-08 08:25:37 +0000
>> +++ b/storage/myisammrg/ha_myisammrg.cc	2010-09-29 22:38:57 +0000
>> @@ -1203,6 +1203,22 @@ ha_rows ha_myisammrg::records_in_range(u
>>   }
>>
>>
>> +int ha_myisammrg::truncate()
>> +{
>> +  int err= 0;
>> +  MYRG_TABLE *table;
>> +  DBUG_ENTER("ha_myisammrg::truncate");
>> +
>> +  for (table= file->open_tables; table != file->end_table; table++)
>> +  {
>> +    if ((err= mi_delete_all_rows(table->table)))
>> +      break;
>> +  }
>> +
>> +  DBUG_RETURN(err);
>> +}
>> +
>> +
>
> Hmm... MERGE doesn't support HA_CAN_RECREATE and doesn't implement
> delete_all_rows() method. So it didn't support ordinary TRUNCATE
> before your patch. Also it has HTON_NO_PARTITION set so this method
> is not necessary for partitioning.

This is one corner case where TRUNCATE resorted to row-by-row delete.

> I don't think we should enable support of TRUNCATE for MERGE
> in your patch. Otherwise we probably should do something about
> resetting auto-increment and add test coverage for this new
> functionality.

Since the implementation is simple, I think it is worth implementing to 
remain compatible. There is some test coverage already, that's why I 
ended up implementing it. As for auto-increment, I think it does not 
apply as the engine does not implement the auto-increment handler methods.

>
>> === modified file 'storage/perfschema/ha_perfschema.cc'
>> --- a/storage/perfschema/ha_perfschema.cc	2010-09-10 09:10:38 +0000
>> +++ b/storage/perfschema/ha_perfschema.cc	2010-09-29 22:38:57 +0000
>> @@ -345,6 +345,11 @@ int ha_perfschema::delete_all_rows(void)
>>     DBUG_RETURN(result);
>>   }
>>
>> +int ha_perfschema::truncate()
>> +{
>> +  return delete_all_rows();
>> +}
>> +
>
> See my comment about bug #53696 above.
>
> ...
>
> Otherwise I am OK with your patch.
>
> I think it can be pushed after considering/addressing/discussing
> the above points and provided that we will settle somehow the
> issue with incompatible change in TRUNCATE behavior for InnoDB
> tables with FKs.
>

OK. Most points are addressed, we can discuss the rest over IRC.

Thank you.

Regards,

Davi
Thread
bzr commit into mysql-5.5-runtime branch (davi:3149) Bug#49938 Bug#54678Davi Arnaut30 Sep
Re: bzr commit into mysql-5.5-runtime branch (davi:3149) Bug#49938Bug#54678Dmitry Lenev4 Oct
  • Re: bzr commit into mysql-5.5-runtime branch (davi:3149) Bug#49938Bug#54678Davi Arnaut4 Oct
    • Re: bzr commit into mysql-5.5-runtime branch (davi:3149) Bug#49938Bug#54678Dmitry Lenev5 Oct