Hi Dmitry,
On 10/4/10 4:11 PM, Dmitry Lenev wrote:
> Hello Davi!
>
> Here are my comments about your patch:
Thanks!
> * Davi Arnaut<davi.arnaut@stripped> [10/09/30 02:41]:
>> 3149 Davi Arnaut 2010-09-29
>> Bug#49938: Failing assertion: inode or deadlock in fsp/fsp0fsp.c
>> Bug#54678: InnoDB, TRUNCATE, ALTER, I_S SELECT, crash or deadlock
>>
>> The problem was that for storage engines that do not support
>> truncate table via a external drop and recreate, such as InnoDB
>> which implements truncate via a internal drop and recreate, the
>> delete_all_rows method could be invoked with a shared metadata
>> lock, causing problems if the engine needed exclusive access
>> to some internal metadata. This problem originated with the
>> fact that there is no truncate specific handler method, which
>> ended up leading to a abuse of the delete_all_rows method that
>> is primarily used for delete operations without a condition.
>>
>> The solution is to introduce a truncate handler method that is
>> invoked when the engine does not support truncation via a table
>> drop and recreate. This method is invoked under a exclusive
>> metadata lock, so that there is only a single instance of the
>> table when the method is invoked.
>>
>> Also, the method is not invoked and a error is thrown if
>> the table participates in a non-self-referencing foreign key
>> relationship. This was necessary to avoid inconsistency as
>> integrity checks are bypassed. This is inline with the fact
>> that truncate is primarily a DDL operation that was designed
>> to quickly remove all data from a table.
>>
>> Incompatible change: truncate no longer resorts to a high-level
>> row by row delete if the storage engine does not support the
>> truncate method. Consequently, the count of rows affected does
>> not reflect the actual number of rows.
>
> I think it makes sense to highlight in this comment the fact that
> TRUNCATE will no longer work for InnoDB tables which participate
> in non-recursive foreign key as parent. This includes cases when it
> succeeded before - when child table was empty (e.g. due to previous
> truncation) or this foreign key had a CASCADE as ON DELETE action.
OK.
> Actually this incompatible change is my main issue with your patch.
> Although I agree that this is a step in the right direction (we did
> the same thing in 6.1-fk tree after all) I am concerned with the
> fact that we are introducing it fairly late in the release cycle.
>
FWIW, I'll double check with the InnoDB team whether we can lift this.
>
>> === modified file 'mysql-test/r/trigger-trans.result'
>> --- a/mysql-test/r/trigger-trans.result 2008-10-07 16:54:12 +0000
>> +++ b/mysql-test/r/trigger-trans.result 2010-09-29 22:38:57 +0000
>> @@ -151,9 +151,14 @@ CREATE TRIGGER t1_ad AFTER DELETE ON t1
>> SET @a = 0;
>> SET @b = 0;
>> TRUNCATE t1;
>> +ERROR 42000: Cannot truncate table 't1' that is referred to by foreign key in
> another table
>> SELECT @a, @b;
>> @a @b
>> 0 0
>> +DELETE FROM t1;
>> +SELECT @a, @b;
>> +@a @b
>> +1 1
>> INSERT INTO t1 VALUES (1);
>
> I think the above 3 statements are redundant and can be simply removed.
OK.
>> DELETE FROM t1;
>> SELECT @a, @b;
>>
>
> ...
>
>> === modified file 'sql/ha_partition.cc'
>> --- a/sql/ha_partition.cc 2010-08-19 08:22:23 +0000
>> +++ b/sql/ha_partition.cc 2010-09-29 22:38:57 +0000
>
> ...
>
>> +
>> +
>> +/**
>> + Manually truncate the table.
>> +
>> + @remark Auto increment value will be truncated in that partition as well!
>> +
>
> Is the above comment still relevant?
Yes, but I think it was meant for partition truncate.
> If yes which partition is meant under "that" ?
Will move this remark to ha_partition::truncate_partition.
>> + @retval 0 Success.
>> + @retval> 0 Error code.
>> +*/
>> +
>> +int ha_partition::truncate()
>> +{
>
> ...
>
>>
>>
>> +/**
>> + Truncate a set of specific partitioins.
> ^^^^^^^^^^^
> Typo: partitions
Fixed.
[..]
>
>> === modified file 'sql/ha_partition.h'
>> --- a/sql/ha_partition.h 2010-07-23 20:09:27 +0000
>> +++ b/sql/ha_partition.h 2010-09-29 22:38:57 +0000
>> @@ -342,6 +342,7 @@ public:
>> virtual int update_row(const uchar * old_data, uchar * new_data);
>> virtual int delete_row(const uchar * buf);
>> virtual int delete_all_rows(void);
>> + virtual int truncate();
>> virtual void start_bulk_insert(ha_rows rows);
>> virtual int end_bulk_insert();
>> private:
>> @@ -350,6 +351,12 @@ private:
>> long estimate_read_buffer_size(long original_size);
>> public:
>>
>> + /*
>> + Method for truncating a specific partition.
>> + (i.e. ALTER TABLE t1 TRUNCATE PARTITION p).
>> + */
>> + int truncate_partition(Alter_info *);
>> +
>
>
> Please correct me if I am wrong. This method is not a method
> of general SE API but rather a partitioning-specific hook.
> So we are not adding Alter_info class to the SE API.
> Right?
Right.
> Maybe it makes sense to mention this in the above comment?
Added:
@remark This method is partitioning-specific hook and
thus not a member of the general SE API.
> ...
>
>> === modified file 'sql/handler.h'
>> --- a/sql/handler.h 2010-08-18 11:55:37 +0000
>> +++ b/sql/handler.h 2010-09-29 22:38:57 +0000
>> @@ -1331,6 +1331,7 @@ public:
>> int ha_bulk_update_row(const uchar *old_data, uchar *new_data,
>> uint *dup_key_found);
>> int ha_delete_all_rows();
>> + int ha_truncate();
>> int ha_reset_auto_increment(ulonglong value);
>> int ha_optimize(THD* thd, HA_CHECK_OPT* check_opt);
>> int ha_analyze(THD* thd, HA_CHECK_OPT* check_opt);
>> @@ -2010,12 +2011,31 @@ private:
>> This is called to delete all rows in a table
>> If the handler don't support this, then this function will
>> return HA_ERR_WRONG_COMMAND and MySQL will delete the rows one
>> - by one. It should reset auto_increment if
>> - thd->lex->sql_command == SQLCOM_TRUNCATE.
>> + by one.
>> */
>> virtual int delete_all_rows()
>> { return (my_errno=HA_ERR_WRONG_COMMAND); }
>> /**
>> + Quickly remove all rows from a table.
>> +
>> + @remark This method is responsible for implementing MySQL's TRUNCATE
>> + TABLE statement, which is a DDL operation. As such, a engine
>> + can bypass certain integrity checks and in some cases avoid
>> + fine-grained locking (e.g. row locks) which would normally be
>> + required for a DELETE statement.
>> +
>> + @remark Typically, truncate is not used if it can result in integrity
>> + violation. For example, truncate is not used when a foreign
>> + key references the table, but it might be used if foreign key
>> + checks are disabled.
>> +
>> + @remark Engine is responsible for reseting the auto-increment counter.
>> +
>> + @remark The table is locked in exclusive mode.
>> + */
>> + virtual int truncate()
>> + { return HA_ERR_WRONG_COMMAND; }
>> + /**
>> Reset the auto-increment counter to the given value, i.e. the next row
>> inserted will get the given value. This is called e.g. after TRUNCATE
>> is emulated by doing a 'DELETE FROM t'. HA_ERR_WRONG_COMMAND is
>>
>
> Hmm... handler::ha_/reset_auto_increment() is no longer used outside
> of storage engines code. I am not sure if it makes sense to get rid
> of this API method at this point... OTOH it is probably a good idea
We could deprecate it.
> to update the above comment describing this method.
Updated.
> ...
>
>> === modified file 'sql/sql_parse.cc'
>> --- a/sql/sql_parse.cc 2010-09-22 08:15:41 +0000
>> +++ b/sql/sql_parse.cc 2010-09-29 22:38:57 +0000
>> @@ -49,7 +49,7 @@
>> // mysql_recreate_table,
>> // mysql_backup_table,
>> // mysql_restore_table
>> -#include "sql_truncate.h" // mysql_truncate_table
>> +#include "sql_truncate.h" // Truncate_statement
>
> Taking into account that Truncate_statement class is not used in
> sql_parse.cc do we need this include at all?
Removed.
> ...
>
>> === modified file 'sql/sql_partition_admin.cc'
>> --- a/sql/sql_partition_admin.cc 2010-08-16 14:25:23 +0000
>> +++ b/sql/sql_partition_admin.cc 2010-09-29 22:38:57 +0000
>> @@ -16,10 +16,10 @@
>> #include "sql_parse.h" // check_one_table_access
>> #include "sql_table.h" // mysql_alter_table, etc.
>> #include "sql_lex.h" // Sql_statement
>> -#include "sql_truncate.h" // mysql_truncate_table,
>> - // Truncate_statement
>> #include "sql_admin.h" //
> Analyze/Check/.._table_statement
>> #include "sql_partition_admin.h" // Alter_table_*_partition
>> +#include "ha_partition.h" // ha_partition
>> +#include "sql_base.h" // open_and_lock_tables
>>
>> #ifndef WITH_PARTITION_STORAGE_ENGINE
>>
>> @@ -46,7 +46,7 @@ bool Alter_table_analyze_partition_state
>> m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION;
>>
>> res= Analyze_table_statement::execute(thd);
>> -
>> +
>> DBUG_RETURN(res);
>> }
>>
>> @@ -104,36 +104,64 @@ bool Alter_table_repair_partition_statem
>>
>> bool Alter_table_truncate_partition_statement::execute(THD *thd)
>> {
>> + int error;
>> + ha_partition *partition;
>> TABLE_LIST *first_table= thd->lex->select_lex.table_list.first;
>> - bool res;
>> - enum_sql_command original_sql_command;
>> DBUG_ENTER("Alter_table_truncate_partition_statement::execute");
>>
>> /*
>> - Execute TRUNCATE PARTITION just like TRUNCATE TABLE.
>> - Some storage engines (InnoDB, partition) checks thd_sql_command,
>> - so we set it to SQLCOM_TRUNCATE during the execution.
>> - */
>> - original_sql_command= m_lex->sql_command;
>> - m_lex->sql_command= SQLCOM_TRUNCATE;
>> -
>> - /*
>> Flag that it is an ALTER command which administrates partitions, used
>> by ha_partition.
>> */
>> - m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION;
>> -
>> + m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION |
>> + ALTER_TRUNCATE_PARTITION;
>> +
>> + /* Fix the lock types (not the same as ordinary ALTER TABLE). */
>> + first_table->lock_type= TL_WRITE;
>> + first_table->mdl_request.set_type(MDL_EXCLUSIVE);
>> +
>> /*
>> - Fix the lock types (not the same as ordinary ALTER TABLE).
>> + Check table permissions and open it with a exclusive lock.
>> + Ensure it is a partitioned table and finally, upcast the
>> + handler and invoke the partition truncate method. Lastly,
>> + write the statement to the binary log if necessary.
>> */
>> - first_table->lock_type= TL_WRITE;
>> - first_table->mdl_request.set_type(MDL_SHARED_NO_READ_WRITE);
>>
>> - /* execute as a TRUNCATE TABLE */
>> - res= Truncate_statement::execute(thd);
>> + if (check_one_table_access(thd, DROP_ACL, first_table))
>> + DBUG_RETURN(TRUE);
>>
>> - m_lex->sql_command= original_sql_command;
>> - DBUG_RETURN(res);
>> + if (open_and_lock_tables(thd, first_table, FALSE, 0))
>> + DBUG_RETURN(TRUE);
>> +
>> + /*
>> + TODO: Add support for TRUNCATE PARTITION for NDB and other
>> + engines supporting native partitioning.
>> + */
>> + if (first_table->table->s->db_type() != partition_hton)
>> + {
>> + my_error(ER_PARTITION_MGMT_ON_NONPARTITIONED, MYF(0));
>> + DBUG_RETURN(TRUE);
>> + }
>> +
>> + partition= (ha_partition *) first_table->table->file;
>> +
>> + /* Invoke the handler method responsible for truncating the partition. */
>> + if ((error=
> partition->truncate_partition(&thd->lex->alter_info)))
>> + first_table->table->file->print_error(error, MYF(0));
>> +
>> + /*
>> + All effects of a truncate operation are committed even if the
>> + operation fails. Thus, the query must be written to the binary
>> + log. The only exception is a unimplemented truncate method. Also,
>> + it is logged in statement format, regardless of the binlog format.
>> + */
>> + if (error != HA_ERR_WRONG_COMMAND)
>> + error|= write_bin_log(thd, !error, thd->query(),
> thd->query_length());
>> +
>> + if (! error)
>> + my_ok(thd);
>> +
>> + DBUG_RETURN(error);
>> }
>
> So what happens if one executes ALTER TABLE TRUNCATE PARTITION
> under LOCK TABLES? Don't we need to upgrade SNRW metadata lock
> to X lock in this case?
>
Yes. Fixed.
>
>> === modified file 'sql/sql_truncate.cc'
>> --- a/sql/sql_truncate.cc 2010-08-31 10:03:36 +0000
>> +++ b/sql/sql_truncate.cc 2010-09-29 22:38:57 +0000
>> @@ -29,145 +29,120 @@
>> #include "sql_truncate.h"
>>
>>
>> -/*
>> - Delete all rows of a locked table.
>> -
>> - @param thd Thread context.
>> - @param table_list Table list element for the table.
>> - @param rows_deleted Whether rows might have been deleted.
>> -
>> - @retval FALSE Success.
>> - @retval TRUE Error.
>> +/**
>> + Check if table which is going to be affected by TRUNCATE TABLE
>> + is a parent table in some non-self-referencing foreign key and
>> + if yes emit an error that it is illegal to truncate such table.
>> +
>> + @param thd Thread context.
>> + @param table Table handle.
>> +
>> + @retval FALSE This table is not parent in a non-self-referencing foreign
>> + key. Statement can proceed.
>> + @retval TRUE This table is parent in a non-self-referencing foreign key,
>> + error was emitted.
>> */
>>
>> static bool
>> -delete_all_rows(THD *thd, TABLE *table)
>> +fk_truncate_illegal_if_parent(THD *thd, TABLE *table)
>> {
>> - int error;
>> - READ_RECORD info;
>> - bool is_bulk_delete;
>> - bool some_rows_deleted= FALSE;
>> - bool save_binlog_row_based= thd->is_current_stmt_binlog_format_row();
>> - DBUG_ENTER("delete_all_rows");
>> + FOREIGN_KEY_INFO *f_key_info;
>> + List<FOREIGN_KEY_INFO> f_key_list;
>> + List_iterator_fast<FOREIGN_KEY_INFO> it;
>>
>> - /* Replication of truncate table must be statement based. */
>> - thd->clear_current_stmt_binlog_format_row();
>> + /* Whether the table is referenced by a foreign key. */
>> + if (! table->file->referenced_by_foreign_key())
>> + return FALSE;
>>
>> - /*
>> - Update handler statistics (e.g. table->file->stats.records).
>> - Might be used by the storage engine to aggregate information
>> - necessary to allow deletion. Currently, this seems to be
>> - meaningful only to the archive storage engine, which uses
>> - the info method to set the number of records. Although
>> - archive does not support deletion, it becomes necessary in
>> - order to return a error if the table is not empty.
>> - */
>> - error= table->file->info(HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK);
>> - if (error&& error != HA_ERR_WRONG_COMMAND)
>> - {
>> - table->file->print_error(error, MYF(0));
>> - goto end;
>> - }
>> + table->file->get_foreign_key_list(thd,&f_key_list);
>>
>> - /*
>> - Attempt to delete all rows in the table.
>> - If it is unsupported, switch to row by row deletion.
>> - */
>> - if (! (error= table->file->ha_delete_all_rows()))
>> - goto end;
>> + it.init(f_key_list);
>>
>> - if (error != HA_ERR_WRONG_COMMAND)
>> + /* Check whether it is a self-referencing FK. */
>> + while ((f_key_info= it++))
>> {
>> - /*
>> - If a transactional engine fails in the middle of deletion,
>> - we expect it to be able to roll it back. Some reasons
>> - for the engine to fail would be media failure or corrupted
>> - data dictionary (i.e. in case of a partitioned table). We
>> - have sufficiently strong metadata locks to rule out any
>> - potential deadlocks.
>> -
>> - If a non-transactional engine fails here (that would
>> - not be MyISAM, since MyISAM does TRUNCATE by recreate),
>> - and binlog is on, replication breaks, since nothing gets
>> - written to the binary log. (XXX: is this a bug?)
>> - */
>> - table->file->print_error(error, MYF(0));
>> - goto end;
>> + if (my_strcasecmp(system_charset_info,
> f_key_info->referenced_db->str,
>> + table->s->db.str) ||
>> + my_strcasecmp(system_charset_info,
> f_key_info->referenced_table->str,
>> + table->s->table_name.str))
>> + break;
>> }
>>
>> - /*
>> - A workaround for Bug#53696 "Performance schema engine violates the
>> - PSEA API by calling my_error()".
>> - */
>> - if (thd->is_error())
>> - goto end;
>> -
>
> AFAIU this bug is still there. Does this mean that this code should
> have been moved somewhere else and not removed?
Yes, the bug is still here but we don't need the hack anymore because we
no longer resort to row-by-row delete. What happens is that now my_error
is invoked two times, once by perfschema and another by print_error,
which is harmless.
>> - /* Handler didn't support fast delete. Delete rows one by one. */
>> -
>> - init_read_record(&info, thd, table, NULL, TRUE, TRUE, FALSE);
>> -
>> - /*
>> - Start bulk delete. If the engine does not support it, go on,
>> - it's not an error.
>> - */
>> - is_bulk_delete= ! table->file->start_bulk_delete();
>> + /* Table is parent in a non-self-referencing foreign key. */
>> + if (f_key_list.is_empty() || f_key_info)
>> + {
>> + my_error(ER_TRUNCATE_ILLEGAL_FK, MYF(0), table->s->table_name.str);
>> + return TRUE;
>> + }
>
> Could you please clarify why "f_key_list.is_empty()" part
> of the above condition is needed?
When we reach empty we know that the table is referenced by a foreign
key. In this case, we allow truncate only if all the FKs of this table
are self-referencing.
I've updated the comment to:
/*
Table is parent in a non-self-referencing foreign key or the table
has a foreign key that refers to another table. The intention is to
allow truncate only for tables that do not depend on other tables.
*/
Some background on what is being avoided: Bug#56785
>>
>> - table->mark_columns_needed_for_delete();
>> + return FALSE;
>> +}
>>
>> - while (!(error= info.read_record(&info))&& !thd->killed)
>> - {
>> - if ((error= table->file->ha_delete_row(table->record[0])))
>> - {
>> - table->file->print_error(error, MYF(0));
>> - break;
>> - }
>>
>> - some_rows_deleted= TRUE;
>> - }
>> +/*
>> + Open and truncate a locked table.
>>
>> - /* HA_ERR_END_OF_FILE */
>> - if (error == -1)
>> - error= 0;
>> + @param thd Thread context.
>> + @param table_ref Table list element for the table to be truncated.
>> + @param is_tmp_table True if element refers to a temp table.
>>
>> - /* Close down the bulk delete. */
>> - if (is_bulk_delete)
>> - {
>> - int bulk_delete_error= table->file->end_bulk_delete();
>> - if (bulk_delete_error&& !error)
>> - {
>> - table->file->print_error(bulk_delete_error, MYF(0));
>> - error= bulk_delete_error;
>> - }
>> - }
>> + @retval 0 Success.
>> + @retval> 0 Error code.
>> +*/
>>
>> - end_read_record(&info);
>> +int Truncate_statement::handler_truncate(THD *thd, TABLE_LIST *table_ref,
>> + bool is_tmp_table)
>> +{
>> + int error= 0;
>> + uint flags;
>> + DBUG_ENTER("Truncate_statement::handler_truncate");
>>
>> /*
>> - Regardless of the error status, the query must be written to the
>> - binary log if rows of the table is non-transactional.
>> + Can't recreate, the engine must mechanically delete all rows
>> + in the table. Use open_and_lock_tables() to open a write cursor.
>> */
>> - if (some_rows_deleted&& !table->file->has_transactions())
>> +
>> + /* If it is a temporary table, no need to take locks. */
>> + if (is_tmp_table)
>> + flags= MYSQL_OPEN_TEMPORARY_ONLY;
>> + else
>> {
>> - thd->transaction.stmt.modified_non_trans_table= TRUE;
>> - thd->transaction.all.modified_non_trans_table= TRUE;
>> + /* We don't need to load triggers. */
>> + DBUG_ASSERT(table_ref->trg_event_map == 0);
>> + /*
>> + Our metadata lock guarantees that no transaction is reading
>> + or writing into the table. Yet, to open a write cursor we need
>> + a thr_lock lock. Allow to open base tables only.
>> + */
>> + table_ref->required_type= FRMTYPE_TABLE;
>> + /*
>> + Ignore pending FLUSH TABLES since we don't want to release
>> + the MDL lock taken above and otherwise there is no way to
>> + wait for FLUSH TABLES in deadlock-free fashion.
>> + */
>> + flags= MYSQL_OPEN_IGNORE_FLUSH | MYSQL_OPEN_SKIP_TEMPORARY;
>> + /*
>> + Even though we have an MDL lock on the table here, we don't
>> + pass MYSQL_OPEN_HAS_MDL_LOCK to open_and_lock_tables
>> + since to truncate a MERGE table, we must open and lock
>> + merge children, and on those we don't have an MDL lock.
>> + Thus clear the ticket to satisfy MDL asserts.
>> + */
>> + table_ref->mdl_request.ticket= NULL;
>> + /* The handler truncate protocol dictates a exclusive lock. */
>> + table_ref->mdl_request.set_type(MDL_EXCLUSIVE);
>> }
>
> Maybe it makes sense to set the type of metadata lock right
> in the parser?
>
No, because we first check, with a shared lock, whether the engine
supports table recreate. Hum, actually, but this is also poses a problem
in a re-execution context, as the type will always be exclusive after
the first time the a engine for the table does not support recreate.
Suggestions?
>
>> @@ -225,30 +200,27 @@ static bool recreate_temporary_table(THD
>>
>>
>> /*
>> - Handle opening and locking if a base table for truncate.
>> + Handle locking a base table for truncate.
>>
>> @param[in] thd Thread context.
>> @param[in] table_ref Table list element for the table to
>> be truncated.
>> @param[out] hton_can_recreate Set to TRUE if table can be dropped
>> and recreated.
>> - @param[out] ticket_downgrade Set if a lock must be downgraded after
>> - truncate is done.
>>
>> @retval FALSE Success.
>> @retval TRUE Error.
>> */
>>
>> -static bool open_and_lock_table_for_truncate(THD *thd, TABLE_LIST *table_ref,
>> - bool *hton_can_recreate,
>> - MDL_ticket **ticket_downgrade)
>> +bool Truncate_statement::lock_table(THD *thd, TABLE_LIST *table_ref,
>> + bool *hton_can_recreate)
>> {
>> TABLE *table= NULL;
>> - handlerton *table_type;
>> - DBUG_ENTER("open_and_lock_table_for_truncate");
>> + DBUG_ENTER("Truncate_statement::lock_table");
>>
>> DBUG_ASSERT(table_ref->lock_type == TL_WRITE);
>> - DBUG_ASSERT(table_ref->mdl_request.type == MDL_SHARED_NO_READ_WRITE);
>> + DBUG_ASSERT(table_ref->mdl_request.type>= MDL_SHARED_NO_READ_WRITE);
>> +
>> /*
>> Before doing anything else, acquire a metadata lock on the table,
>> or ensure we have one. We don't use open_and_lock_tables()
>> @@ -268,8 +240,7 @@ static bool open_and_lock_table_for_trun
>> table_ref->table_name, FALSE)))
>> DBUG_RETURN(TRUE);
>>
>> - table_type= table->s->db_type();
>> - *hton_can_recreate= ha_check_storage_engine_flag(table_type,
>> + *hton_can_recreate= ha_check_storage_engine_flag(table->s->db_type(),
>> HTON_CAN_RECREATE);
>> table_ref->mdl_request.ticket= table->mdl_ticket;
>> }
>> @@ -285,86 +256,36 @@ static bool open_and_lock_table_for_trun
>> MYSQL_OPEN_SKIP_TEMPORARY))
>> DBUG_RETURN(TRUE);
>>
>> - if (dd_frm_storage_engine(thd, table_ref->db, table_ref->table_name,
>> -&table_type))
>> + if (dd_check_storage_engine_flag(thd, table_ref->db,
> table_ref->table_name,
>> + HTON_CAN_RECREATE, hton_can_recreate))
>> DBUG_RETURN(TRUE);
>> - *hton_can_recreate= ha_check_storage_engine_flag(table_type,
>> - HTON_CAN_RECREATE);
>> }
>>
>> -#ifdef WITH_PARTITION_STORAGE_ENGINE
>> - /*
>> - TODO: Add support for TRUNCATE PARTITION for NDB and other engines
>> - supporting native partitioning.
>> - */
>> - if (thd->lex->alter_info.flags& ALTER_ADMIN_PARTITION&&
>> - table_type != partition_hton)
>> - {
>> - my_error(ER_PARTITION_MGMT_ON_NONPARTITIONED, MYF(0));
>> - DBUG_RETURN(TRUE);
>> - }
>> -#endif
>> DEBUG_SYNC(thd, "lock_table_for_truncate");
>>
>> - if (*hton_can_recreate)
>> + /*
>> + Acquire an exclusive lock. A storage engine can recreate or
>> + truncate the table only if there are no references to it from
>> + anywhere, i.e. no cached TABLE in the table cache. To remove
>> + the table from the cache we need an exclusive lock.
>> + */
>> + if (thd->locked_tables_mode)
>> {
>> - /*
>> - Acquire an exclusive lock. The storage engine can recreate the
>> - table only if there are no references to it from anywhere, i.e.
>> - no cached TABLE in the table cache. To remove the table from the
>> - cache we need an exclusive lock.
>> - */
>> - if (thd->locked_tables_mode)
>> - {
>> - if (wait_while_table_is_used(thd, table, HA_EXTRA_FORCE_REOPEN))
>> - DBUG_RETURN(TRUE);
>> - *ticket_downgrade= table->mdl_ticket;
>> + if (wait_while_table_is_used(thd, table, HA_EXTRA_FORCE_REOPEN))
>> + DBUG_RETURN(TRUE);
>> + m_ticket_downgrade= table->mdl_ticket;
>
> So we init m_ticket_dowgrade in constructor and set here.
> Do we reset it anywhere to make code re-execution safe?
> (... OK, re-prepare will probably care about PS but what
> about SPs?...)
Fixed. Also added a test case.
>> + /* Close if table is going to be recreated. */
>> + if (*hton_can_recreate)
>> close_all_tables_for_name(thd, table->s, FALSE);
>> - }
>> - else
>> - {
>> - ulong timeout= thd->variables.lock_wait_timeout;
>> - if (thd->mdl_context.
>> - upgrade_shared_lock_to_exclusive(table_ref->mdl_request.ticket,
>> - timeout))
>> - DBUG_RETURN(TRUE);
>> - tdc_remove_table(thd, TDC_RT_REMOVE_ALL, table_ref->db,
>> - table_ref->table_name, FALSE);
>> - }
>> }
>> else
>> {
>> - /*
>> - Can't recreate, we must mechanically delete all rows in
>> - the table. Our metadata lock guarantees that no transaction
>> - is reading or writing into the table. Yet, to open a write
>> - cursor we need a thr_lock lock. Use open_and_lock_tables()
>> - to do the necessary job.
>> - */
>> -
>> - /* Allow to open base tables only. */
>> - table_ref->required_type= FRMTYPE_TABLE;
>> - /* We don't need to load triggers. */
>> - DBUG_ASSERT(table_ref->trg_event_map == 0);
>> - /*
>> - Even though we have an MDL lock on the table here, we don't
>> - pass MYSQL_OPEN_HAS_MDL_LOCK to open_and_lock_tables
>> - since to truncate a MERGE table, we must open and lock
>> - merge children, and on those we don't have an MDL lock.
>> - Thus clear the ticket to satisfy MDL asserts.
>> - */
>> - table_ref->mdl_request.ticket= NULL;
>> -
>> - /*
>> - Open the table as it will handle some required preparations.
>> - Ignore pending FLUSH TABLES since we don't want to release
>> - the MDL lock taken above and otherwise there is no way to
>> - wait for FLUSH TABLES in deadlock-free fashion.
>> - */
>> - if (open_and_lock_tables(thd, table_ref, FALSE,
>> - MYSQL_OPEN_IGNORE_FLUSH |
>> - MYSQL_OPEN_SKIP_TEMPORARY))
>> + ulong timeout= thd->variables.lock_wait_timeout;
>> + if (thd->mdl_context.
>> + upgrade_shared_lock_to_exclusive(table_ref->mdl_request.ticket,
> timeout))
>> DBUG_RETURN(TRUE);
>> + tdc_remove_table(thd, TDC_RT_REMOVE_ALL, table_ref->db,
>> + table_ref->table_name, FALSE);
>> }
>
> Taking into account that we now always upgrade metadata locks
> for base tables maybe it makes sense to take X metadata lock on
> table being truncated right from the start?
I think so but I remember that kostja had some desire to keep a shared
lock as long as necessary. I asked him once about this but go no response...
[..]
>> - DBUG_PRINT("exit", ("error: %d", error));
>> - DBUG_RETURN(test(error));
>> + DBUG_RETURN(error);
>> }
>>
>>
>> +/**
>> + Execute a TRUNCATE statement at runtime.
>> +
>> + @param thd The current thread.
>> +
>> + @return FALSE on success.
>> +*/
>> +
>> bool Truncate_statement::execute(THD *thd)
>> {
>> TABLE_LIST *first_table= thd->lex->select_lex.table_list.first;
>> @@ -502,9 +420,10 @@ bool Truncate_statement::execute(THD *th
>>
>> if (check_one_table_access(thd, DROP_ACL, first_table))
>> goto error;
>> +
>> /*
>> Don't allow this within a transaction because we want to use
>> - re-generate table
>> + re-generate table.
>> */
>> if (thd->in_active_multi_stmt_transaction())
>> {
>
> Taking into account that TRUNCATE TABLE causes implicit commit
> do we need the above if-statement at all?
Hum, some consider the below to be somewhat evil:
SET @autocommit = 0;
# Does implicit commit.
TRUNCATE TABLE t1;
but this now seems to be particular to truncate. Removed.
>> @@ -512,8 +431,11 @@ bool Truncate_statement::execute(THD *th
>> ER(ER_LOCK_OR_ACTIVE_TRANSACTION), MYF(0));
>> goto error;
>> }
>> - if (! (res= mysql_truncate_table(thd, first_table)))
>> +
>> + if (! (res= truncate_table(thd, first_table)))
>> my_ok(thd);
>> +
>> error:
>> DBUG_RETURN(res);
>> }
>> +
>>
>
> OK.
>
>> === modified file 'sql/sql_truncate.h'
>> --- a/sql/sql_truncate.h 2010-08-16 12:53:30 +0000
>> +++ b/sql/sql_truncate.h 2010-09-29 22:38:57 +0000
>> @@ -18,23 +18,26 @@
>> class THD;
>> struct TABLE_LIST;
>>
>> -bool mysql_truncate_table(THD *thd, TABLE_LIST *table_ref);
>> -
>> /**
>> Truncate_statement represents the TRUNCATE statement.
>> */
>> class Truncate_statement : public Sql_statement
>> {
>> +private:
>> + /* Set if a lock must be downgraded after truncate is done. */
>> + MDL_ticket *m_ticket_downgrade;
>> +
>> public:
>> /**
>> Constructor, used to represent a ALTER TABLE statement.
>> @param lex the LEX structure for this statement.
>> */
>> Truncate_statement(LEX *lex)
>> - : Sql_statement(lex)
>> + : Sql_statement(lex),
>> + m_ticket_downgrade(NULL)
>> {}
>>
>> - ~Truncate_statement()
>> + virtual ~Truncate_statement()
>> {}
>>
>> /**
>> @@ -43,7 +46,20 @@ public:
>> @return false on success.
>> */
>> bool execute(THD *thd);
>> -};
>>
>> +protected:
>> + /** Handle locking a base table for truncate. */
>> + bool lock_table(THD *, TABLE_LIST *, bool *);
>> +
>> + /** Truncate table via the handler method. */
>> + int handler_truncate(THD *, TABLE_LIST *, bool);
>> +
>> + /**
>> + Optimized delete of all rows by doing a full generate of the table.
>
> Maybe it is better to use word "regenerate" here ?
done.
>> + Depending on the storage engine, it can be accomplished through a
>> + drop and recreate or via the handler truncate method.
>> + */
>> + bool truncate_table(THD *, TABLE_LIST *);
>> +};
>>
>> #endif
>>
>> === modified file 'storage/archive/ha_archive.cc'
>> --- a/storage/archive/ha_archive.cc 2010-07-26 15:54:20 +0000
>> +++ b/storage/archive/ha_archive.cc 2010-09-29 22:38:57 +0000
>> @@ -1650,9 +1650,9 @@ int ha_archive::end_bulk_insert()
>> This is done for security reasons. In a later version we will enable this by
>> allowing the user to select a different row format.
>> */
>> -int ha_archive::delete_all_rows()
>> +int ha_archive::truncate()
>> {
>> - DBUG_ENTER("ha_archive::delete_all_rows");
>> + DBUG_ENTER("ha_archive::truncate");
>> DBUG_RETURN(HA_ERR_WRONG_COMMAND);
>> }
>>
>
> OK.
>
>> === modified file 'storage/blackhole/ha_blackhole.cc'
>> --- a/storage/blackhole/ha_blackhole.cc 2010-07-08 21:20:08 +0000
>> +++ b/storage/blackhole/ha_blackhole.cc 2010-09-29 22:38:57 +0000
>> @@ -87,6 +87,12 @@ int ha_blackhole::create(const char *nam
>> DBUG_RETURN(0);
>> }
>>
>> +int ha_blackhole::truncate()
>> +{
>> + DBUG_ENTER("ha_blackhole::truncate");
>> + DBUG_RETURN(0);
>> +}
>> +
>> const char *ha_blackhole::index_type(uint key_number)
>> {
>> DBUG_ENTER("ha_blackhole::index_type");
>>
>
> Do I understand correctly that the above method is needed only
> to support partitioning over blackhole engine (as this engine
> has HTON_CAN_RECREATE set)? Maybe it is worth to mention this
> in the comment for this method?
Done.
>> === modified file 'storage/csv/ha_tina.cc'
>> --- a/storage/csv/ha_tina.cc 2010-07-29 12:32:11 +0000
>> +++ b/storage/csv/ha_tina.cc 2010-09-29 22:38:57 +0000
>> @@ -1638,6 +1638,16 @@ int ha_tina::delete_all_rows()
>> }
>>
>> /*
>> + Manual truncate table.
>> +*/
>> +
>> +int ha_tina::truncate()
>> +{
>> + /* Simply delete all rows. No auto increment to reset. */
>> + return delete_all_rows();
>> +}
>> +
>> +/*
>> Called by the database to lock the table. Keep in mind that this
>> is an internal lock.
>> */
>>
>
> On one hand CSV has HTON_CAN_RECREATE flag set. So the above
> method will be called only for partitioned tables based on
> this engine.
> But on the other hand it also has HTON_NO_PARTITION set so such
> tables are impossible to create.
>
> Therefore I think it makes sense to leave this method unimplemented.
OK.
>> === modified file 'storage/example/ha_example.cc'
>> --- a/storage/example/ha_example.cc 2010-08-05 12:34:19 +0000
>> +++ b/storage/example/ha_example.cc 2010-09-29 22:38:57 +0000
>> @@ -739,6 +739,26 @@ int ha_example::delete_all_rows()
>>
>> /**
>> @brief
>> + Used for handler specific truncate table. The table is locked in
>> + exclusive mode and handler is responsible for reseting the auto-
>> + increment counter.
>> +
>> + @details
>> + Called from Truncate_statement::handler_truncate.
>> +
>> + @see
>> + Truncate_statement in sql_truncate.cc
>> + Remarks in handler::truncate.
>> +*/
>> +int ha_example::truncate()
>
> Don't you think that layout of the above comment looks wrong?
Yes, but it follows the style used throughout the file. Anyway, fixed
all cases :)
> Also maybe it makes sense to mention in this comment the role
> of HTON_CAN_RECREATE flag, especially since it is set for this
> engine?
>
Yes.
>
>> === modified file 'storage/ibmdb2i/ha_ibmdb2i.cc'
>> --- a/storage/ibmdb2i/ha_ibmdb2i.cc 2010-07-08 21:20:08 +0000
>> +++ b/storage/ibmdb2i/ha_ibmdb2i.cc 2010-09-29 22:38:57 +0000
>> @@ -1806,6 +1806,13 @@ int ha_ibmdb2i::delete_all_rows()
>> }
>>
>>
>> +int ha_ibmdb2i::truncate()
>> +{
>> + int error = delete_all_rows();
>> + return error ? error : reset_auto_increment(0);
>> +}
>> +
>> +
>> int ha_ibmdb2i::external_lock(THD *thd, int lock_type)
>> {
>> int rc = 0;
>
> OK. Seems like ha_ibmdb2i::delete_all_rows() contains
> some truncate-specific code, so it probably makes sense
> to notify person responsible for supporting this code
> (whoever it is) once your patch is pushed.
OK.
>>
>> === modified file 'storage/innobase/handler/ha_innodb.cc'
>> --- a/storage/innobase/handler/ha_innodb.cc 2010-08-18 07:14:06 +0000
>> +++ b/storage/innobase/handler/ha_innodb.cc 2010-09-29 22:38:57 +0000
>> @@ -7072,33 +7072,21 @@ Deletes all rows of an InnoDB table.
>> @return error number */
>> UNIV_INTERN
>> int
>> -ha_innobase::delete_all_rows(void)
>> +ha_innobase::truncate(void)
>> /*==============================*/
>> {
>> int error;
>>
>> - DBUG_ENTER("ha_innobase::delete_all_rows");
>> + DBUG_ENTER("ha_innobase::truncate");
>>
>> /* Get the transaction associated with the current thd, or create one
>> if not yet created, and update prebuilt->trx */
>>
>> update_thd(ha_thd());
>>
>> - if (thd_sql_command(user_thd) != SQLCOM_TRUNCATE) {
>> - fallback:
>> - /* We only handle TRUNCATE TABLE t as a special case.
>> - DELETE FROM t will have to use ha_innobase::delete_row(),
>> - because DELETE is transactional while TRUNCATE is not. */
>> - DBUG_RETURN(my_errno=HA_ERR_WRONG_COMMAND);
>> - }
>> -
>> /* Truncate the table in InnoDB */
>>
>> error = row_truncate_table_for_mysql(prebuilt->table, prebuilt->trx);
>> - if (error == DB_ERROR) {
>> - /* Cannot truncate; resort to ha_innobase::delete_row() */
>> - goto fallback;
>> - }
>>
>> error = convert_error_code_to_mysql(error, prebuilt->table->flags,
>> NULL);
>>
>
> OK. Please notify InnoDB guys so they can perform necessary clean-ups
> in their code once your patch is pushed.
OK.
>> === modified file 'storage/myisam/ha_myisam.cc'
>> --- a/storage/myisam/ha_myisam.cc 2010-07-08 21:20:08 +0000
>> +++ b/storage/myisam/ha_myisam.cc 2010-09-29 22:38:57 +0000
>> @@ -1788,6 +1788,12 @@ int ha_myisam::delete_all_rows()
>> return mi_delete_all_rows(file);
>> }
>>
>> +int ha_myisam::truncate()
>> +{
>> + int error= delete_all_rows();
>> + return error ? error : reset_auto_increment(0);
>> +}
>> +
>> int ha_myisam::reset_auto_increment(ulonglong value)
>> {
>> file->s->state.auto_increment= value;
>>
>
> Again maybe makes sense to mention in the comment describing
> this method that we need ha_myisam::truncate() only for the
> sake of partitioning?
Done.
>> === modified file 'storage/myisammrg/ha_myisammrg.cc'
>> --- a/storage/myisammrg/ha_myisammrg.cc 2010-09-08 08:25:37 +0000
>> +++ b/storage/myisammrg/ha_myisammrg.cc 2010-09-29 22:38:57 +0000
>> @@ -1203,6 +1203,22 @@ ha_rows ha_myisammrg::records_in_range(u
>> }
>>
>>
>> +int ha_myisammrg::truncate()
>> +{
>> + int err= 0;
>> + MYRG_TABLE *table;
>> + DBUG_ENTER("ha_myisammrg::truncate");
>> +
>> + for (table= file->open_tables; table != file->end_table; table++)
>> + {
>> + if ((err= mi_delete_all_rows(table->table)))
>> + break;
>> + }
>> +
>> + DBUG_RETURN(err);
>> +}
>> +
>> +
>
> Hmm... MERGE doesn't support HA_CAN_RECREATE and doesn't implement
> delete_all_rows() method. So it didn't support ordinary TRUNCATE
> before your patch. Also it has HTON_NO_PARTITION set so this method
> is not necessary for partitioning.
This is one corner case where TRUNCATE resorted to row-by-row delete.
> I don't think we should enable support of TRUNCATE for MERGE
> in your patch. Otherwise we probably should do something about
> resetting auto-increment and add test coverage for this new
> functionality.
Since the implementation is simple, I think it is worth implementing to
remain compatible. There is some test coverage already, that's why I
ended up implementing it. As for auto-increment, I think it does not
apply as the engine does not implement the auto-increment handler methods.
>
>> === modified file 'storage/perfschema/ha_perfschema.cc'
>> --- a/storage/perfschema/ha_perfschema.cc 2010-09-10 09:10:38 +0000
>> +++ b/storage/perfschema/ha_perfschema.cc 2010-09-29 22:38:57 +0000
>> @@ -345,6 +345,11 @@ int ha_perfschema::delete_all_rows(void)
>> DBUG_RETURN(result);
>> }
>>
>> +int ha_perfschema::truncate()
>> +{
>> + return delete_all_rows();
>> +}
>> +
>
> See my comment about bug #53696 above.
>
> ...
>
> Otherwise I am OK with your patch.
>
> I think it can be pushed after considering/addressing/discussing
> the above points and provided that we will settle somehow the
> issue with incompatible change in TRUNCATE behavior for InnoDB
> tables with FKs.
>
OK. Most points are addressed, we can discuss the rest over IRC.
Thank you.
Regards,
Davi