List:Commits« Previous MessageNext Message »
From:Dmitry Lenev Date:October 4 2010 7:11pm
Subject:Re: bzr commit into mysql-5.5-runtime branch (davi:3149) Bug#49938
Bug#54678
View as plain text  
Hello Davi!

Here are my comments about your patch:

* Davi Arnaut <davi.arnaut@stripped> [10/09/30 02:41]:
>  3149 Davi Arnaut	2010-09-29
>       Bug#49938: Failing assertion: inode or deadlock in fsp/fsp0fsp.c
>       Bug#54678: InnoDB, TRUNCATE, ALTER, I_S SELECT, crash or deadlock
>       
>       The problem was that for storage engines that do not support
>       truncate table via a external drop and recreate, such as InnoDB
>       which implements truncate via a internal drop and recreate, the
>       delete_all_rows method could be invoked with a shared metadata
>       lock, causing problems if the engine needed exclusive access
>       to some internal metadata. This problem originated with the
>       fact that there is no truncate specific handler method, which
>       ended up leading to a abuse of the delete_all_rows method that
>       is primarily used for delete operations without a condition.
>       
>       The solution is to introduce a truncate handler method that is
>       invoked when the engine does not support truncation via a table
>       drop and recreate. This method is invoked under a exclusive
>       metadata lock, so that there is only a single instance of the
>       table when the method is invoked.
>       
>       Also, the method is not invoked and a error is thrown if
>       the table participates in a non-self-referencing foreign key
>       relationship. This was necessary to avoid inconsistency as
>       integrity checks are bypassed. This is inline with the fact
>       that truncate is primarily a DDL operation that was designed
>       to quickly remove all data from a table.
>       
>       Incompatible change: truncate no longer resorts to a high-level
>       row by row delete if the storage engine does not support the
>       truncate method. Consequently, the count of rows affected does
>       not reflect the actual number of rows.

I think it makes sense to highlight in this comment the fact that
TRUNCATE will no longer work for InnoDB tables which participate
in non-recursive foreign key as parent. This includes cases when it
succeeded before - when child table was empty (e.g. due to previous
truncation) or this foreign key had a CASCADE as ON DELETE action.

Actually this incompatible change is my main issue with your patch.
Although I agree that this is a step in the right direction (we did
the same thing in 6.1-fk tree after all) I am concerned with the
fact that we are introducing it fairly late in the release cycle.

...

> === modified file 'mysql-test/r/trigger-trans.result'
> --- a/mysql-test/r/trigger-trans.result	2008-10-07 16:54:12 +0000
> +++ b/mysql-test/r/trigger-trans.result	2010-09-29 22:38:57 +0000
> @@ -151,9 +151,14 @@ CREATE TRIGGER t1_ad AFTER DELETE ON t1
>  SET @a = 0;
>  SET @b = 0;
>  TRUNCATE t1;
> +ERROR 42000: Cannot truncate table 't1' that is referred to by foreign key in
> another table
>  SELECT @a, @b;
>  @a	@b
>  0	0
> +DELETE FROM t1;
> +SELECT @a, @b;
> +@a	@b
> +1	1
>  INSERT INTO t1 VALUES (1);

I think the above 3 statements are redundant and can be simply removed.

>  DELETE FROM t1;
>  SELECT @a, @b;
> 

...

> === modified file 'sql/ha_partition.cc'
> --- a/sql/ha_partition.cc	2010-08-19 08:22:23 +0000
> +++ b/sql/ha_partition.cc	2010-09-29 22:38:57 +0000

...

> +
> +
> +/**
> +  Manually truncate the table.
> +
> +  @remark Auto increment value will be truncated in that partition as well!
> +

Is the above comment still relevant?
If yes which partition is meant under "that" ?

> +  @retval  0    Success.
> +  @retval  > 0  Error code.
> +*/
> +
> +int ha_partition::truncate()
> +{

...

>  
>  
> +/**
> +  Truncate a set of specific partitioins.
                                ^^^^^^^^^^^
Typo:                           partitions

> +
> +  ALTER TABLE t TRUNCATE PARTITION ...
> +*/
> +
> +int ha_partition::truncate_partition(Alter_info *alter_info)
> +{
> +  int error= 0;
> +  List_iterator<partition_element> part_it(m_part_info->partitions);
> +  uint num_parts= m_part_info->num_parts;
> +  uint num_subparts= m_part_info->num_subparts;
> +  uint i= 0;
> +  uint num_parts_set= alter_info->partition_names.elements;
> +  uint num_parts_found= set_part_state(alter_info, m_part_info,
> +                                        PART_ADMIN);
> +  DBUG_ENTER("ha_partition::truncate_partition");
> +
> +  /* TRUNCATE also means resetting auto_increment */
> +  lock_auto_increment();
> +  table_share->ha_part_data->next_auto_inc_val= 0;
> +  table_share->ha_part_data->auto_inc_initialized= FALSE;
> +  unlock_auto_increment();
> +
> +  if (num_parts_set != num_parts_found &&
> +      (!(alter_info->flags & ALTER_ALL_PARTITION)))
> +    DBUG_RETURN(HA_ERR_NO_PARTITION_FOUND);
> +
> +  do
> +  {
> +    partition_element *part_elem= part_it++;
> +    if (part_elem->part_state == PART_ADMIN)
> +    {
> +      if (m_is_sub_partitioned)
> +      {
> +        List_iterator<partition_element>
> +                                    subpart_it(part_elem->subpartitions);
> +        partition_element *sub_elem;
> +        uint j= 0, part;
> +        do
> +        {
> +          sub_elem= subpart_it++;
> +          part= i * num_subparts + j;
> +          DBUG_PRINT("info", ("truncate subpartition %u (%s)",
> +                              part, sub_elem->partition_name));
> +          if ((error= m_file[part]->ha_truncate()))
> +            break;
> +        } while (++j < num_subparts);
> +      }
> +      else
> +      {
> +        DBUG_PRINT("info", ("truncate partition %u (%s)", i,
> +                            part_elem->partition_name));
> +        error= m_file[i]->ha_truncate();
> +      }
> +      part_elem->part_state= PART_NORMAL;
> +    }
> +  } while (!error && (++i < num_parts));
> +  DBUG_RETURN(error);
> +}

OK. The code is so much clearer now.

...

> === modified file 'sql/ha_partition.h'
> --- a/sql/ha_partition.h	2010-07-23 20:09:27 +0000
> +++ b/sql/ha_partition.h	2010-09-29 22:38:57 +0000
> @@ -342,6 +342,7 @@ public:
>    virtual int update_row(const uchar * old_data, uchar * new_data);
>    virtual int delete_row(const uchar * buf);
>    virtual int delete_all_rows(void);
> +  virtual int truncate();
>    virtual void start_bulk_insert(ha_rows rows);
>    virtual int end_bulk_insert();
>  private:
> @@ -350,6 +351,12 @@ private:
>    long estimate_read_buffer_size(long original_size);
>  public:
>  
> +  /*
> +    Method for truncating a specific partition.
> +    (i.e. ALTER TABLE t1 TRUNCATE PARTITION p).
> +  */
> +  int truncate_partition(Alter_info *);
> +


Please correct me if I am wrong. This method is not a method
of general SE API but rather a partitioning-specific hook.
So we are not adding Alter_info class to the SE API.
Right?

Maybe it makes sense to mention this in the above comment?

...

> === modified file 'sql/handler.h'
> --- a/sql/handler.h	2010-08-18 11:55:37 +0000
> +++ b/sql/handler.h	2010-09-29 22:38:57 +0000
> @@ -1331,6 +1331,7 @@ public:
>    int ha_bulk_update_row(const uchar *old_data, uchar *new_data,
>                           uint *dup_key_found);
>    int ha_delete_all_rows();
> +  int ha_truncate();
>    int ha_reset_auto_increment(ulonglong value);
>    int ha_optimize(THD* thd, HA_CHECK_OPT* check_opt);
>    int ha_analyze(THD* thd, HA_CHECK_OPT* check_opt);
> @@ -2010,12 +2011,31 @@ private:
>      This is called to delete all rows in a table
>      If the handler don't support this, then this function will
>      return HA_ERR_WRONG_COMMAND and MySQL will delete the rows one
> -    by one. It should reset auto_increment if
> -    thd->lex->sql_command == SQLCOM_TRUNCATE.
> +    by one.
>    */
>    virtual int delete_all_rows()
>    { return (my_errno=HA_ERR_WRONG_COMMAND); }
>    /**
> +    Quickly remove all rows from a table.
> +
> +    @remark This method is responsible for implementing MySQL's TRUNCATE
> +            TABLE statement, which is a DDL operation. As such, a engine
> +            can bypass certain integrity checks and in some cases avoid
> +            fine-grained locking (e.g. row locks) which would normally be
> +            required for a DELETE statement.
> +
> +    @remark Typically, truncate is not used if it can result in integrity
> +            violation. For example, truncate is not used when a foreign
> +            key references the table, but it might be used if foreign key
> +            checks are disabled.
> +
> +    @remark Engine is responsible for reseting the auto-increment counter.
> +
> +    @remark The table is locked in exclusive mode.
> +  */
> +  virtual int truncate()
> +  { return HA_ERR_WRONG_COMMAND; }
> +  /**
>      Reset the auto-increment counter to the given value, i.e. the next row
>      inserted will get the given value. This is called e.g. after TRUNCATE
>      is emulated by doing a 'DELETE FROM t'. HA_ERR_WRONG_COMMAND is
>

Hmm... handler::ha_/reset_auto_increment() is no longer used outside
of storage engines code. I am not sure if it makes sense to get rid
of this API method at this point... OTOH it is probably a good idea
to update the above comment describing this method.

...

> === modified file 'sql/sql_parse.cc'
> --- a/sql/sql_parse.cc	2010-09-22 08:15:41 +0000
> +++ b/sql/sql_parse.cc	2010-09-29 22:38:57 +0000
> @@ -49,7 +49,7 @@
>                                // mysql_recreate_table,
>                                // mysql_backup_table,
>                                // mysql_restore_table
> -#include "sql_truncate.h"     // mysql_truncate_table
> +#include "sql_truncate.h"     // Truncate_statement

Taking into account that Truncate_statement class is not used in
sql_parse.cc do we need this include at all?

...

> === modified file 'sql/sql_partition_admin.cc'
> --- a/sql/sql_partition_admin.cc	2010-08-16 14:25:23 +0000
> +++ b/sql/sql_partition_admin.cc	2010-09-29 22:38:57 +0000
> @@ -16,10 +16,10 @@
>  #include "sql_parse.h"                      // check_one_table_access
>  #include "sql_table.h"                      // mysql_alter_table, etc.
>  #include "sql_lex.h"                        // Sql_statement
> -#include "sql_truncate.h"                   // mysql_truncate_table,
> -                                            // Truncate_statement
>  #include "sql_admin.h"                      // Analyze/Check/.._table_statement
>  #include "sql_partition_admin.h"            // Alter_table_*_partition
> +#include "ha_partition.h"                   // ha_partition
> +#include "sql_base.h"                       // open_and_lock_tables
>  
>  #ifndef WITH_PARTITION_STORAGE_ENGINE
>  
> @@ -46,7 +46,7 @@ bool Alter_table_analyze_partition_state
>    m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION;
>  
>    res= Analyze_table_statement::execute(thd);
> -    
> +
>    DBUG_RETURN(res);
>  }
>  
> @@ -104,36 +104,64 @@ bool Alter_table_repair_partition_statem
>  
>  bool Alter_table_truncate_partition_statement::execute(THD *thd)
>  {
> +  int error;
> +  ha_partition *partition;
>    TABLE_LIST *first_table= thd->lex->select_lex.table_list.first;
> -  bool res;
> -  enum_sql_command original_sql_command;
>    DBUG_ENTER("Alter_table_truncate_partition_statement::execute");
>  
>    /*
> -    Execute TRUNCATE PARTITION just like TRUNCATE TABLE.
> -    Some storage engines (InnoDB, partition) checks thd_sql_command,
> -    so we set it to SQLCOM_TRUNCATE during the execution.
> -  */
> -  original_sql_command= m_lex->sql_command;
> -  m_lex->sql_command= SQLCOM_TRUNCATE;
> -  
> -  /*
>      Flag that it is an ALTER command which administrates partitions, used
>      by ha_partition.
>    */
> -  m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION;
> -   
> +  m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION |
> +                            ALTER_TRUNCATE_PARTITION;
> +
> +  /* Fix the lock types (not the same as ordinary ALTER TABLE). */
> +  first_table->lock_type= TL_WRITE;
> +  first_table->mdl_request.set_type(MDL_EXCLUSIVE);
> +
>    /*
> -    Fix the lock types (not the same as ordinary ALTER TABLE).
> +    Check table permissions and open it with a exclusive lock.
> +    Ensure it is a partitioned table and finally, upcast the
> +    handler and invoke the partition truncate method. Lastly,
> +    write the statement to the binary log if necessary.
>    */
> -  first_table->lock_type= TL_WRITE;
> -  first_table->mdl_request.set_type(MDL_SHARED_NO_READ_WRITE);
>  
> -  /* execute as a TRUNCATE TABLE */
> -  res= Truncate_statement::execute(thd);
> +  if (check_one_table_access(thd, DROP_ACL, first_table))
> +    DBUG_RETURN(TRUE);
>  
> -  m_lex->sql_command= original_sql_command;
> -  DBUG_RETURN(res);
> +  if (open_and_lock_tables(thd, first_table, FALSE, 0))
> +    DBUG_RETURN(TRUE);
> +
> +  /*
> +    TODO: Add support for TRUNCATE PARTITION for NDB and other
> +          engines supporting native partitioning.
> +  */
> +  if (first_table->table->s->db_type() != partition_hton)
> +  {
> +    my_error(ER_PARTITION_MGMT_ON_NONPARTITIONED, MYF(0));
> +    DBUG_RETURN(TRUE);
> +  }
> +
> +  partition= (ha_partition *) first_table->table->file;
> +
> +  /* Invoke the handler method responsible for truncating the partition. */
> +  if ((error= partition->truncate_partition(&thd->lex->alter_info)))
> +    first_table->table->file->print_error(error, MYF(0));
> +
> +  /*
> +    All effects of a truncate operation are committed even if the
> +    operation fails. Thus, the query must be written to the binary
> +    log. The only exception is a unimplemented truncate method. Also,
> +    it is logged in statement format, regardless of the binlog format.
> +  */
> +  if (error != HA_ERR_WRONG_COMMAND)
> +    error|= write_bin_log(thd, !error, thd->query(), thd->query_length());
> +
> +  if (! error)
> +    my_ok(thd);
> +
> +  DBUG_RETURN(error);
>  }

So what happens if one executes ALTER TABLE TRUNCATE PARTITION
under LOCK TABLES? Don't we need to upgrade SNRW metadata lock
to X lock in this case?

...

> === modified file 'sql/sql_truncate.cc'
> --- a/sql/sql_truncate.cc	2010-08-31 10:03:36 +0000
> +++ b/sql/sql_truncate.cc	2010-09-29 22:38:57 +0000
> @@ -29,145 +29,120 @@
>  #include "sql_truncate.h"
>  
>  
> -/*
> -  Delete all rows of a locked table.
> -
> -  @param  thd           Thread context.
> -  @param  table_list    Table list element for the table.
> -  @param  rows_deleted  Whether rows might have been deleted.
> -
> -  @retval  FALSE  Success.
> -  @retval  TRUE   Error.
> +/**
> +  Check if table which is going to be affected by TRUNCATE TABLE
> +  is a parent table in some non-self-referencing foreign key and
> +  if yes emit an error that it is illegal to truncate such table.
> +
> +  @param  thd    Thread context.
> +  @param  table  Table handle.
> +
> +  @retval FALSE  This table is not parent in a non-self-referencing foreign
> +                 key. Statement can proceed.
> +  @retval TRUE   This table is parent in a non-self-referencing foreign key,
> +                 error was emitted.
>  */
>  
>  static bool
> -delete_all_rows(THD *thd, TABLE *table)
> +fk_truncate_illegal_if_parent(THD *thd, TABLE *table)
>  {
> -  int error;
> -  READ_RECORD info;
> -  bool is_bulk_delete;
> -  bool some_rows_deleted= FALSE;
> -  bool save_binlog_row_based= thd->is_current_stmt_binlog_format_row();
> -  DBUG_ENTER("delete_all_rows");
> +  FOREIGN_KEY_INFO *f_key_info;
> +  List<FOREIGN_KEY_INFO> f_key_list;
> +  List_iterator_fast<FOREIGN_KEY_INFO> it;
>  
> -  /* Replication of truncate table must be statement based. */
> -  thd->clear_current_stmt_binlog_format_row();
> +  /* Whether the table is referenced by a foreign key. */
> +  if (! table->file->referenced_by_foreign_key())
> +    return FALSE;
>  
> -  /*
> -    Update handler statistics (e.g. table->file->stats.records).
> -    Might be used by the storage engine to aggregate information
> -    necessary to allow deletion. Currently, this seems to be
> -    meaningful only to the archive storage engine, which uses
> -    the info method to set the number of records. Although
> -    archive does not support deletion, it becomes necessary in
> -    order to return a error if the table is not empty.
> -  */
> -  error= table->file->info(HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK);
> -  if (error && error != HA_ERR_WRONG_COMMAND)
> -  {
> -    table->file->print_error(error, MYF(0));
> -    goto end;
> -  }
> +  table->file->get_foreign_key_list(thd, &f_key_list);
>  
> -  /*
> -    Attempt to delete all rows in the table.
> -    If it is unsupported, switch to row by row deletion.
> -  */
> -  if (! (error= table->file->ha_delete_all_rows()))
> -    goto end;
> +  it.init(f_key_list);
>  
> -  if (error != HA_ERR_WRONG_COMMAND)
> +  /* Check whether it is a self-referencing FK. */
> +  while ((f_key_info= it++))
>    {
> -    /*
> -      If a transactional engine fails in the middle of deletion,
> -      we expect it to be able to roll it back.  Some reasons
> -      for the engine to fail would be media failure or corrupted
> -      data dictionary (i.e. in case of a partitioned table). We
> -      have sufficiently strong metadata locks to rule out any
> -      potential deadlocks.
> -
> -      If a non-transactional engine fails here (that would
> -      not be MyISAM, since MyISAM does TRUNCATE by recreate),
> -      and binlog is on, replication breaks, since nothing gets
> -      written to the binary log.  (XXX: is this a bug?)
> -    */
> -    table->file->print_error(error, MYF(0));
> -    goto end;
> +    if (my_strcasecmp(system_charset_info, f_key_info->referenced_db->str,
> +                      table->s->db.str) ||
> +        my_strcasecmp(system_charset_info, f_key_info->referenced_table->str,
> +                      table->s->table_name.str))
> +      break;
>    }
>  
> -  /*
> -    A workaround for Bug#53696  "Performance schema engine violates the
> -    PSEA API by calling my_error()".
> -  */
> -  if (thd->is_error())
> -    goto end;
> -

AFAIU this bug is still there. Does this mean that this code should
have been moved somewhere else and not removed?

> -  /* Handler didn't support fast delete. Delete rows one by one. */
> -
> -  init_read_record(&info, thd, table, NULL, TRUE, TRUE, FALSE);
> -
> -  /*
> -    Start bulk delete. If the engine does not support it, go on,
> -    it's not an error.
> -  */
> -  is_bulk_delete= ! table->file->start_bulk_delete();
> +  /* Table is parent in a non-self-referencing foreign key. */
> +  if (f_key_list.is_empty() || f_key_info)
> +  {
> +    my_error(ER_TRUNCATE_ILLEGAL_FK, MYF(0), table->s->table_name.str);
> +    return TRUE;
> +  }

Could you please clarify why "f_key_list.is_empty()" part
of the above condition is needed?

>  
> -  table->mark_columns_needed_for_delete();
> +  return FALSE;
> +}
>  
> -  while (!(error= info.read_record(&info)) && !thd->killed)
> -  {
> -    if ((error= table->file->ha_delete_row(table->record[0])))
> -    {
> -      table->file->print_error(error, MYF(0));
> -      break;
> -    }
>  
> -    some_rows_deleted= TRUE;
> -  }
> +/*
> +  Open and truncate a locked table.
>  
> -  /* HA_ERR_END_OF_FILE */
> -  if (error == -1)
> -    error= 0;
> +  @param  thd           Thread context.
> +  @param  table_ref     Table list element for the table to be truncated.
> +  @param  is_tmp_table  True if element refers to a temp table.
>  
> -  /* Close down the bulk delete. */
> -  if (is_bulk_delete)
> -  {
> -    int bulk_delete_error= table->file->end_bulk_delete();
> -    if (bulk_delete_error && !error)
> -    {
> -      table->file->print_error(bulk_delete_error, MYF(0));
> -      error= bulk_delete_error;
> -    }
> -  }
> +  @retval  0    Success.
> +  @retval  > 0  Error code.
> +*/
>  
> -  end_read_record(&info);
> +int Truncate_statement::handler_truncate(THD *thd, TABLE_LIST *table_ref,
> +                                         bool is_tmp_table)
> +{
> +  int error= 0;
> +  uint flags;
> +  DBUG_ENTER("Truncate_statement::handler_truncate");
>  
>    /*
> -    Regardless of the error status, the query must be written to the
> -    binary log if rows of the table is non-transactional.
> +    Can't recreate, the engine must mechanically delete all rows
> +    in the table. Use open_and_lock_tables() to open a write cursor.
>    */
> -  if (some_rows_deleted && !table->file->has_transactions())
> +
> +  /* If it is a temporary table, no need to take locks. */
> +  if (is_tmp_table)
> +    flags= MYSQL_OPEN_TEMPORARY_ONLY;
> +  else
>    {
> -    thd->transaction.stmt.modified_non_trans_table= TRUE;
> -    thd->transaction.all.modified_non_trans_table= TRUE;
> +    /* We don't need to load triggers. */
> +    DBUG_ASSERT(table_ref->trg_event_map == 0);
> +    /*
> +      Our metadata lock guarantees that no transaction is reading
> +      or writing into the table. Yet, to open a write cursor we need
> +      a thr_lock lock. Allow to open base tables only.
> +    */
> +    table_ref->required_type= FRMTYPE_TABLE;
> +    /*
> +      Ignore pending FLUSH TABLES since we don't want to release
> +      the MDL lock taken above and otherwise there is no way to
> +      wait for FLUSH TABLES in deadlock-free fashion.
> +    */
> +    flags= MYSQL_OPEN_IGNORE_FLUSH | MYSQL_OPEN_SKIP_TEMPORARY;
> +    /*
> +      Even though we have an MDL lock on the table here, we don't
> +      pass MYSQL_OPEN_HAS_MDL_LOCK to open_and_lock_tables
> +      since to truncate a MERGE table, we must open and lock
> +      merge children, and on those we don't have an MDL lock.
> +      Thus clear the ticket to satisfy MDL asserts.
> +    */
> +    table_ref->mdl_request.ticket= NULL;
> +    /* The handler truncate protocol dictates a exclusive lock. */
> +    table_ref->mdl_request.set_type(MDL_EXCLUSIVE);
>    }

Maybe it makes sense to set the type of metadata lock right
in the parser?

...

> @@ -225,30 +200,27 @@ static bool recreate_temporary_table(THD
>  
>  
>  /*
> -  Handle opening and locking if a base table for truncate.
> +  Handle locking a base table for truncate.
>
>    @param[in]  thd               Thread context.
>    @param[in]  table_ref         Table list element for the table to
>                                  be truncated.
>    @param[out] hton_can_recreate Set to TRUE if table can be dropped
>                                  and recreated.
> -  @param[out] ticket_downgrade  Set if a lock must be downgraded after
> -                                truncate is done.
>  
>    @retval  FALSE  Success.
>    @retval  TRUE   Error.
>  */
>  
> -static bool open_and_lock_table_for_truncate(THD *thd, TABLE_LIST *table_ref,
> -                                             bool *hton_can_recreate,
> -                                             MDL_ticket **ticket_downgrade)
> +bool Truncate_statement::lock_table(THD *thd, TABLE_LIST *table_ref,
> +                                    bool *hton_can_recreate)
>  {
>    TABLE *table= NULL;
> -  handlerton *table_type;
> -  DBUG_ENTER("open_and_lock_table_for_truncate");
> +  DBUG_ENTER("Truncate_statement::lock_table");
>  
>    DBUG_ASSERT(table_ref->lock_type == TL_WRITE);
> -  DBUG_ASSERT(table_ref->mdl_request.type == MDL_SHARED_NO_READ_WRITE);
> +  DBUG_ASSERT(table_ref->mdl_request.type >= MDL_SHARED_NO_READ_WRITE);
> +
>    /*
>      Before doing anything else, acquire a metadata lock on the table,
>      or ensure we have one.  We don't use open_and_lock_tables()
> @@ -268,8 +240,7 @@ static bool open_and_lock_table_for_trun
>                                              table_ref->table_name, FALSE)))
>        DBUG_RETURN(TRUE);
>  
> -    table_type= table->s->db_type();
> -    *hton_can_recreate= ha_check_storage_engine_flag(table_type,
> +    *hton_can_recreate= ha_check_storage_engine_flag(table->s->db_type(),
>                                                       HTON_CAN_RECREATE);
>      table_ref->mdl_request.ticket= table->mdl_ticket;
>    }
> @@ -285,86 +256,36 @@ static bool open_and_lock_table_for_trun
>                           MYSQL_OPEN_SKIP_TEMPORARY))
>        DBUG_RETURN(TRUE);
>  
> -    if (dd_frm_storage_engine(thd, table_ref->db, table_ref->table_name,
> -                              &table_type))
> +    if (dd_check_storage_engine_flag(thd, table_ref->db,
> table_ref->table_name,
> +                                     HTON_CAN_RECREATE, hton_can_recreate))
>        DBUG_RETURN(TRUE);
> -    *hton_can_recreate= ha_check_storage_engine_flag(table_type,
> -                                                     HTON_CAN_RECREATE);
>    }
>  
> -#ifdef WITH_PARTITION_STORAGE_ENGINE
> -  /*
> -    TODO: Add support for TRUNCATE PARTITION for NDB and other engines
> -    supporting native partitioning.
> -  */
> -  if (thd->lex->alter_info.flags & ALTER_ADMIN_PARTITION &&
> -      table_type != partition_hton)
> -  {
> -    my_error(ER_PARTITION_MGMT_ON_NONPARTITIONED, MYF(0));
> -    DBUG_RETURN(TRUE);
> -  }
> -#endif
>    DEBUG_SYNC(thd, "lock_table_for_truncate");
>  
> -  if (*hton_can_recreate)
> +  /*
> +    Acquire an exclusive lock. A storage engine can recreate or
> +    truncate the table only if there are no references to it from
> +    anywhere, i.e. no cached TABLE in the table cache. To remove
> +    the table from the cache we need an exclusive lock.
> +  */
> +  if (thd->locked_tables_mode)
>    {
> -    /*
> -      Acquire an exclusive lock. The storage engine can recreate the
> -      table only if there are no references to it from anywhere, i.e.
> -      no cached TABLE in the table cache. To remove the table from the
> -      cache we need an exclusive lock.
> -    */
> -    if (thd->locked_tables_mode)
> -    {
> -      if (wait_while_table_is_used(thd, table, HA_EXTRA_FORCE_REOPEN))
> -        DBUG_RETURN(TRUE);
> -      *ticket_downgrade= table->mdl_ticket;
> +    if (wait_while_table_is_used(thd, table, HA_EXTRA_FORCE_REOPEN))
> +      DBUG_RETURN(TRUE);
> +    m_ticket_downgrade= table->mdl_ticket;

So we init m_ticket_dowgrade in constructor and set here.
Do we reset it anywhere to make code re-execution safe?
(... OK, re-prepare will probably care about PS but what
about SPs?...)

> +    /* Close if table is going to be recreated. */
> +    if (*hton_can_recreate)
>        close_all_tables_for_name(thd, table->s, FALSE);
> -    }
> -    else
> -    {
> -      ulong timeout= thd->variables.lock_wait_timeout;
> -      if (thd->mdl_context.
> -          upgrade_shared_lock_to_exclusive(table_ref->mdl_request.ticket,
> -                                           timeout))
> -        DBUG_RETURN(TRUE);
> -      tdc_remove_table(thd, TDC_RT_REMOVE_ALL, table_ref->db,
> -                       table_ref->table_name, FALSE);
> -    }
>    }
>    else
>    {
> -    /*
> -      Can't recreate, we must mechanically delete all rows in
> -      the table. Our metadata lock guarantees that no transaction
> -      is reading or writing into the table. Yet, to open a write
> -      cursor we need a thr_lock lock. Use open_and_lock_tables()
> -      to do the necessary job.
> -    */
> -
> -    /* Allow to open base tables only. */
> -    table_ref->required_type= FRMTYPE_TABLE;
> -    /* We don't need to load triggers. */
> -    DBUG_ASSERT(table_ref->trg_event_map == 0);
> -    /*
> -      Even though we have an MDL lock on the table here, we don't
> -      pass MYSQL_OPEN_HAS_MDL_LOCK to open_and_lock_tables
> -      since to truncate a MERGE table, we must open and lock
> -      merge children, and on those we don't have an MDL lock.
> -      Thus clear the ticket to satisfy MDL asserts.
> -    */
> -    table_ref->mdl_request.ticket= NULL;
> -
> -    /*
> -      Open the table as it will handle some required preparations.
> -      Ignore pending FLUSH TABLES since we don't want to release
> -      the MDL lock taken above and otherwise there is no way to
> -      wait for FLUSH TABLES in deadlock-free fashion.
> -    */
> -    if (open_and_lock_tables(thd, table_ref, FALSE,
> -                             MYSQL_OPEN_IGNORE_FLUSH |
> -                             MYSQL_OPEN_SKIP_TEMPORARY))
> +    ulong timeout= thd->variables.lock_wait_timeout;
> +    if (thd->mdl_context.
> +        upgrade_shared_lock_to_exclusive(table_ref->mdl_request.ticket,
> timeout))
>        DBUG_RETURN(TRUE);
> +    tdc_remove_table(thd, TDC_RT_REMOVE_ALL, table_ref->db,
> +                     table_ref->table_name, FALSE);
>    }

Taking into account that we now always upgrade metadata locks 
for base tables maybe it makes sense to take X metadata lock on
table being truncated right from the start?

>  
>    DBUG_RETURN(FALSE);
> @@ -385,14 +306,14 @@ static bool open_and_lock_table_for_trun
>    @retval  TRUE   Error.
>  */
>  
> -bool mysql_truncate_table(THD *thd, TABLE_LIST *table_ref)
> +bool Truncate_statement::truncate_table(THD *thd, TABLE_LIST *table_ref)
>  {
> +  int error;
>    TABLE *table;
> -  bool error= TRUE, binlog_stmt;
> -  MDL_ticket *mdl_ticket= NULL;
> -  DBUG_ENTER("mysql_truncate_table");
> +  bool binlog_stmt;
> +  DBUG_ENTER("Truncate_statement::truncate_table");
>  
> -  /* Remove tables from the HANDLER's hash. */
> +  /* Remove table from the HANDLER's hash. */
>    mysql_ha_rm_tables(thd, table_ref);
>  
>    /* If it is a temporary table, no need to take locks. */
> @@ -413,14 +334,11 @@ bool mysql_truncate_table(THD *thd, TABL
>      {
>        /*
>          The engine does not support truncate-by-recreate. Open the
> -        table and delete all rows. In such a manner this can in fact
> -        open several tables if it's a temporary MyISAMMRG table.
> +        table and invoke the handler truncate. In such a manner this
> +        can in fact open several tables if it's a temporary MyISAMMRG
> +        table.
>        */
> -      if (open_and_lock_tables(thd, table_ref, FALSE,
> -                               MYSQL_OPEN_TEMPORARY_ONLY))
> -        DBUG_RETURN(TRUE);
> -
> -      error= delete_all_rows(thd, table_ref->table);
> +      error= handler_truncate(thd, table_ref, TRUE);
>      }
>  
>      /*
> @@ -434,8 +352,7 @@ bool mysql_truncate_table(THD *thd, TABL
>    {
>      bool hton_can_recreate;
>  
> -    if (open_and_lock_table_for_truncate(thd, table_ref,
> -                                         &hton_can_recreate, &mdl_ticket))
> +    if (lock_table(thd, table_ref, &hton_can_recreate))
>        DBUG_RETURN(TRUE);
>  
>      if (hton_can_recreate)
> @@ -454,13 +371,18 @@ bool mysql_truncate_table(THD *thd, TABL
>      }
>      else
>      {
> -      error= delete_all_rows(thd, table_ref->table);
> +      /*
> +        The engine does not support truncate-by-recreate.
> +        Attempt to use the handler truncate method.
> +      */
> +      error= handler_truncate(thd, table_ref, FALSE);
>  
>        /*
> -        Regardless of the error status, the query must be written to the
> -        binary log if rows of a non-transactional table were deleted.
> +        All effects of a TRUNCATE TABLE operation are committed even if
> +        truncation fails. Thus, the query must be written to the binary
> +        log. The only exception is a unimplemented truncate method.
>        */
> -      binlog_stmt= !error || thd->transaction.stmt.modified_non_trans_table;
> +      binlog_stmt= !error || error != HA_ERR_WRONG_COMMAND;
>      }
>  
>      query_cache_invalidate3(thd, table_ref, FALSE);
> @@ -471,29 +393,25 @@ bool mysql_truncate_table(THD *thd, TABL
>      error|= write_bin_log(thd, !error, thd->query(), thd->query_length());
>  
>    /*
> -    All effects of a TRUNCATE TABLE operation are rolled back if a row
> -    by row deletion fails. Otherwise, it is automatically committed at
> -    the end.
> -  */
> -  if (error)
> -  {
> -    trans_rollback_stmt(thd);
> -    trans_rollback(thd);
> -  }
> -
> -  /*
>      A locked table ticket was upgraded to a exclusive lock. After the
>      the query has been written to the binary log, downgrade the lock
>      to a shared one.
>    */
> -  if (mdl_ticket)
> -    mdl_ticket->downgrade_exclusive_lock(MDL_SHARED_NO_READ_WRITE);
> +  if (m_ticket_downgrade)
> +    m_ticket_downgrade->downgrade_exclusive_lock(MDL_SHARED_NO_READ_WRITE);
>  

See the above code about resetting m_ticket_downgrade before
statement re-execution.

> -  DBUG_PRINT("exit", ("error: %d", error));
> -  DBUG_RETURN(test(error));
> +  DBUG_RETURN(error);
>  }
>  
>  
> +/**
> +  Execute a TRUNCATE statement at runtime.
> +
> +  @param  thd   The current thread.
> +
> +  @return FALSE on success.
> +*/
> +
>  bool Truncate_statement::execute(THD *thd)
>  {
>    TABLE_LIST *first_table= thd->lex->select_lex.table_list.first;
> @@ -502,9 +420,10 @@ bool Truncate_statement::execute(THD *th
>  
>    if (check_one_table_access(thd, DROP_ACL, first_table))
>      goto error;
> +
>    /*
>      Don't allow this within a transaction because we want to use
> -    re-generate table
> +    re-generate table.
>    */
>    if (thd->in_active_multi_stmt_transaction())
>    {

Taking into account that TRUNCATE TABLE causes implicit commit
do we need the above if-statement at all?

> @@ -512,8 +431,11 @@ bool Truncate_statement::execute(THD *th
>                 ER(ER_LOCK_OR_ACTIVE_TRANSACTION), MYF(0));
>      goto error;
>    }
> -  if (! (res= mysql_truncate_table(thd, first_table)))
> +
> +  if (! (res= truncate_table(thd, first_table)))
>      my_ok(thd);
> +
>  error:
>    DBUG_RETURN(res);
>  }
> +
> 

OK.

> === modified file 'sql/sql_truncate.h'
> --- a/sql/sql_truncate.h	2010-08-16 12:53:30 +0000
> +++ b/sql/sql_truncate.h	2010-09-29 22:38:57 +0000
> @@ -18,23 +18,26 @@
>  class THD;
>  struct TABLE_LIST;
>  
> -bool mysql_truncate_table(THD *thd, TABLE_LIST *table_ref);
> -
>  /**
>    Truncate_statement represents the TRUNCATE statement.
>  */
>  class Truncate_statement : public Sql_statement
>  {
> +private:
> +  /* Set if a lock must be downgraded after truncate is done. */
> +  MDL_ticket *m_ticket_downgrade;
> +
>  public:
>    /**
>      Constructor, used to represent a ALTER TABLE statement.
>      @param lex the LEX structure for this statement.
>    */
>    Truncate_statement(LEX *lex)
> -    : Sql_statement(lex)
> +    : Sql_statement(lex),
> +      m_ticket_downgrade(NULL)
>    {}
>  
> -  ~Truncate_statement()
> +  virtual ~Truncate_statement()
>    {}
>  
>    /**
> @@ -43,7 +46,20 @@ public:
>      @return false on success.
>    */
>    bool execute(THD *thd);
> -};
>  
> +protected:
> +  /** Handle locking a base table for truncate. */
> +  bool lock_table(THD *, TABLE_LIST *, bool *);
> +
> +  /** Truncate table via the handler method. */
> +  int handler_truncate(THD *, TABLE_LIST *, bool);
> +
> +  /**
> +    Optimized delete of all rows by doing a full generate of the table.

Maybe it is better to use word "regenerate" here ?

> +    Depending on the storage engine, it can be accomplished through a
> +    drop and recreate or via the handler truncate method.
> +  */
> +  bool truncate_table(THD *, TABLE_LIST *);
> +};
>  
>  #endif
> 
> === modified file 'storage/archive/ha_archive.cc'
> --- a/storage/archive/ha_archive.cc	2010-07-26 15:54:20 +0000
> +++ b/storage/archive/ha_archive.cc	2010-09-29 22:38:57 +0000
> @@ -1650,9 +1650,9 @@ int ha_archive::end_bulk_insert()
>    This is done for security reasons. In a later version we will enable this by 
>    allowing the user to select a different row format.
>  */
> -int ha_archive::delete_all_rows()
> +int ha_archive::truncate()
>  {
> -  DBUG_ENTER("ha_archive::delete_all_rows");
> +  DBUG_ENTER("ha_archive::truncate");
>    DBUG_RETURN(HA_ERR_WRONG_COMMAND);
>  }
>  

OK.

> === modified file 'storage/blackhole/ha_blackhole.cc'
> --- a/storage/blackhole/ha_blackhole.cc	2010-07-08 21:20:08 +0000
> +++ b/storage/blackhole/ha_blackhole.cc	2010-09-29 22:38:57 +0000
> @@ -87,6 +87,12 @@ int ha_blackhole::create(const char *nam
>    DBUG_RETURN(0);
>  }
>  
> +int ha_blackhole::truncate()
> +{
> +  DBUG_ENTER("ha_blackhole::truncate");
> +  DBUG_RETURN(0);
> +}
> +
>  const char *ha_blackhole::index_type(uint key_number)
>  {
>    DBUG_ENTER("ha_blackhole::index_type");
> 

Do I understand correctly that the above method is needed only
to support partitioning over blackhole engine (as this engine
has HTON_CAN_RECREATE set)? Maybe it is worth to mention this
in the comment for this method?

> === modified file 'storage/csv/ha_tina.cc'
> --- a/storage/csv/ha_tina.cc	2010-07-29 12:32:11 +0000
> +++ b/storage/csv/ha_tina.cc	2010-09-29 22:38:57 +0000
> @@ -1638,6 +1638,16 @@ int ha_tina::delete_all_rows()
>  }
>  
>  /*
> +  Manual truncate table.
> +*/
> +
> +int ha_tina::truncate()
> +{
> +  /* Simply delete all rows. No auto increment to reset. */
> +  return delete_all_rows();
> +}
> +
> +/*
>    Called by the database to lock the table. Keep in mind that this
>    is an internal lock.
>  */
> 

On one hand CSV has HTON_CAN_RECREATE flag set. So the above
method will be called only for partitioned tables based on
this engine.
But on the other hand it also has HTON_NO_PARTITION set so such
tables are impossible to create.

Therefore I think it makes sense to leave this method unimplemented.

> === modified file 'storage/example/ha_example.cc'
> --- a/storage/example/ha_example.cc	2010-08-05 12:34:19 +0000
> +++ b/storage/example/ha_example.cc	2010-09-29 22:38:57 +0000
> @@ -739,6 +739,26 @@ int ha_example::delete_all_rows()
>  
>  /**
>    @brief
> +  Used for handler specific truncate table.  The table is locked in
> +  exclusive mode and handler is responsible for reseting the auto-
> +  increment counter.
> +
> +    @details
> +  Called from Truncate_statement::handler_truncate.
> +
> +    @see
> +  Truncate_statement in sql_truncate.cc
> +  Remarks in handler::truncate.
> +*/
> +int ha_example::truncate()

Don't you think that layout of the above comment looks wrong?
Also maybe it makes sense to mention in this comment the role
of HTON_CAN_RECREATE flag, especially since it is set for this
engine?

...

> === modified file 'storage/ibmdb2i/ha_ibmdb2i.cc'
> --- a/storage/ibmdb2i/ha_ibmdb2i.cc	2010-07-08 21:20:08 +0000
> +++ b/storage/ibmdb2i/ha_ibmdb2i.cc	2010-09-29 22:38:57 +0000
> @@ -1806,6 +1806,13 @@ int ha_ibmdb2i::delete_all_rows()
>  }
>  
>  
> +int ha_ibmdb2i::truncate()
> +{
> +  int error = delete_all_rows();
> +  return error ? error : reset_auto_increment(0);
> +}
> +
> +
>  int ha_ibmdb2i::external_lock(THD *thd, int lock_type)
>  {
>    int rc = 0;

OK. Seems like ha_ibmdb2i::delete_all_rows() contains
some truncate-specific code, so it probably makes sense
to notify person responsible for supporting this code
(whoever it is) once your patch is pushed.

> 
> === modified file 'storage/innobase/handler/ha_innodb.cc'
> --- a/storage/innobase/handler/ha_innodb.cc	2010-08-18 07:14:06 +0000
> +++ b/storage/innobase/handler/ha_innodb.cc	2010-09-29 22:38:57 +0000
> @@ -7072,33 +7072,21 @@ Deletes all rows of an InnoDB table.
>  @return	error number */
>  UNIV_INTERN
>  int
> -ha_innobase::delete_all_rows(void)
> +ha_innobase::truncate(void)
>  /*==============================*/
>  {
>  	int		error;
>  
> -	DBUG_ENTER("ha_innobase::delete_all_rows");
> +	DBUG_ENTER("ha_innobase::truncate");
>  
>  	/* Get the transaction associated with the current thd, or create one
>  	if not yet created, and update prebuilt->trx */
>  
>  	update_thd(ha_thd());
>  
> -	if (thd_sql_command(user_thd) != SQLCOM_TRUNCATE) {
> -	fallback:
> -		/* We only handle TRUNCATE TABLE t as a special case.
> -		DELETE FROM t will have to use ha_innobase::delete_row(),
> -		because DELETE is transactional while TRUNCATE is not. */
> -		DBUG_RETURN(my_errno=HA_ERR_WRONG_COMMAND);
> -	}
> -
>  	/* Truncate the table in InnoDB */
>  
>  	error = row_truncate_table_for_mysql(prebuilt->table, prebuilt->trx);
> -	if (error == DB_ERROR) {
> -		/* Cannot truncate; resort to ha_innobase::delete_row() */
> -		goto fallback;
> -	}
>  
>  	error = convert_error_code_to_mysql(error, prebuilt->table->flags,
>  					    NULL);
>

OK. Please notify InnoDB guys so they can perform necessary clean-ups
in their code once your patch is pushed.

> === modified file 'storage/myisam/ha_myisam.cc'
> --- a/storage/myisam/ha_myisam.cc	2010-07-08 21:20:08 +0000
> +++ b/storage/myisam/ha_myisam.cc	2010-09-29 22:38:57 +0000
> @@ -1788,6 +1788,12 @@ int ha_myisam::delete_all_rows()
>    return mi_delete_all_rows(file);
>  }
>  
> +int ha_myisam::truncate()
> +{
> +  int error= delete_all_rows();
> +  return error ? error : reset_auto_increment(0);
> +}
> +
>  int ha_myisam::reset_auto_increment(ulonglong value)
>  {
>    file->s->state.auto_increment= value;
> 

Again maybe makes sense to mention in the comment describing
this method that we need ha_myisam::truncate() only for the
sake of partitioning?

> === modified file 'storage/myisammrg/ha_myisammrg.cc'
> --- a/storage/myisammrg/ha_myisammrg.cc	2010-09-08 08:25:37 +0000
> +++ b/storage/myisammrg/ha_myisammrg.cc	2010-09-29 22:38:57 +0000
> @@ -1203,6 +1203,22 @@ ha_rows ha_myisammrg::records_in_range(u
>  }
>  
>  
> +int ha_myisammrg::truncate()
> +{
> +  int err= 0;
> +  MYRG_TABLE *table;
> +  DBUG_ENTER("ha_myisammrg::truncate");
> +
> +  for (table= file->open_tables; table != file->end_table; table++)
> +  {
> +    if ((err= mi_delete_all_rows(table->table)))
> +      break;
> +  }
> +
> +  DBUG_RETURN(err);
> +}
> +
> +

Hmm... MERGE doesn't support HA_CAN_RECREATE and doesn't implement
delete_all_rows() method. So it didn't support ordinary TRUNCATE
before your patch. Also it has HTON_NO_PARTITION set so this method
is not necessary for partitioning.

I don't think we should enable support of TRUNCATE for MERGE
in your patch. Otherwise we probably should do something about
resetting auto-increment and add test coverage for this new
functionality.


> === modified file 'storage/perfschema/ha_perfschema.cc'
> --- a/storage/perfschema/ha_perfschema.cc	2010-09-10 09:10:38 +0000
> +++ b/storage/perfschema/ha_perfschema.cc	2010-09-29 22:38:57 +0000
> @@ -345,6 +345,11 @@ int ha_perfschema::delete_all_rows(void)
>    DBUG_RETURN(result);
>  }
>  
> +int ha_perfschema::truncate()
> +{
> +  return delete_all_rows();
> +}
> +

See my comment about bug #53696 above.

...

Otherwise I am OK with your patch.

I think it can be pushed after considering/addressing/discussing
the above points and provided that we will settle somehow the
issue with incompatible change in TRUNCATE behavior for InnoDB
tables with FKs.

-- 
Dmitry Lenev, Software Developer
Oracle Development SPB/MySQL, www.mysql.com

Are you MySQL certified?  http://www.mysql.com/certification
Thread
bzr commit into mysql-5.5-runtime branch (davi:3149) Bug#49938 Bug#54678Davi Arnaut30 Sep
Re: bzr commit into mysql-5.5-runtime branch (davi:3149) Bug#49938Bug#54678Dmitry Lenev4 Oct
  • Re: bzr commit into mysql-5.5-runtime branch (davi:3149) Bug#49938Bug#54678Davi Arnaut4 Oct
    • Re: bzr commit into mysql-5.5-runtime branch (davi:3149) Bug#49938Bug#54678Dmitry Lenev5 Oct