From: Dmitry Lenev Date: October 4 2010 7:11pm Subject: Re: bzr commit into mysql-5.5-runtime branch (davi:3149) Bug#49938 Bug#54678 List-Archive: http://lists.mysql.com/commits/119910 Message-Id: <20101004191100.GB25938@mockturtle> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Hello Davi! Here are my comments about your patch: * Davi Arnaut [10/09/30 02:41]: > 3149 Davi Arnaut 2010-09-29 > Bug#49938: Failing assertion: inode or deadlock in fsp/fsp0fsp.c > Bug#54678: InnoDB, TRUNCATE, ALTER, I_S SELECT, crash or deadlock > > The problem was that for storage engines that do not support > truncate table via a external drop and recreate, such as InnoDB > which implements truncate via a internal drop and recreate, the > delete_all_rows method could be invoked with a shared metadata > lock, causing problems if the engine needed exclusive access > to some internal metadata. This problem originated with the > fact that there is no truncate specific handler method, which > ended up leading to a abuse of the delete_all_rows method that > is primarily used for delete operations without a condition. > > The solution is to introduce a truncate handler method that is > invoked when the engine does not support truncation via a table > drop and recreate. This method is invoked under a exclusive > metadata lock, so that there is only a single instance of the > table when the method is invoked. > > Also, the method is not invoked and a error is thrown if > the table participates in a non-self-referencing foreign key > relationship. This was necessary to avoid inconsistency as > integrity checks are bypassed. This is inline with the fact > that truncate is primarily a DDL operation that was designed > to quickly remove all data from a table. > > Incompatible change: truncate no longer resorts to a high-level > row by row delete if the storage engine does not support the > truncate method. Consequently, the count of rows affected does > not reflect the actual number of rows. I think it makes sense to highlight in this comment the fact that TRUNCATE will no longer work for InnoDB tables which participate in non-recursive foreign key as parent. This includes cases when it succeeded before - when child table was empty (e.g. due to previous truncation) or this foreign key had a CASCADE as ON DELETE action. Actually this incompatible change is my main issue with your patch. Although I agree that this is a step in the right direction (we did the same thing in 6.1-fk tree after all) I am concerned with the fact that we are introducing it fairly late in the release cycle. ... > === modified file 'mysql-test/r/trigger-trans.result' > --- a/mysql-test/r/trigger-trans.result 2008-10-07 16:54:12 +0000 > +++ b/mysql-test/r/trigger-trans.result 2010-09-29 22:38:57 +0000 > @@ -151,9 +151,14 @@ CREATE TRIGGER t1_ad AFTER DELETE ON t1 > SET @a = 0; > SET @b = 0; > TRUNCATE t1; > +ERROR 42000: Cannot truncate table 't1' that is referred to by foreign key in another table > SELECT @a, @b; > @a @b > 0 0 > +DELETE FROM t1; > +SELECT @a, @b; > +@a @b > +1 1 > INSERT INTO t1 VALUES (1); I think the above 3 statements are redundant and can be simply removed. > DELETE FROM t1; > SELECT @a, @b; > ... > === modified file 'sql/ha_partition.cc' > --- a/sql/ha_partition.cc 2010-08-19 08:22:23 +0000 > +++ b/sql/ha_partition.cc 2010-09-29 22:38:57 +0000 ... > + > + > +/** > + Manually truncate the table. > + > + @remark Auto increment value will be truncated in that partition as well! > + Is the above comment still relevant? If yes which partition is meant under "that" ? > + @retval 0 Success. > + @retval > 0 Error code. > +*/ > + > +int ha_partition::truncate() > +{ ... > > > +/** > + Truncate a set of specific partitioins. ^^^^^^^^^^^ Typo: partitions > + > + ALTER TABLE t TRUNCATE PARTITION ... > +*/ > + > +int ha_partition::truncate_partition(Alter_info *alter_info) > +{ > + int error= 0; > + List_iterator part_it(m_part_info->partitions); > + uint num_parts= m_part_info->num_parts; > + uint num_subparts= m_part_info->num_subparts; > + uint i= 0; > + uint num_parts_set= alter_info->partition_names.elements; > + uint num_parts_found= set_part_state(alter_info, m_part_info, > + PART_ADMIN); > + DBUG_ENTER("ha_partition::truncate_partition"); > + > + /* TRUNCATE also means resetting auto_increment */ > + lock_auto_increment(); > + table_share->ha_part_data->next_auto_inc_val= 0; > + table_share->ha_part_data->auto_inc_initialized= FALSE; > + unlock_auto_increment(); > + > + if (num_parts_set != num_parts_found && > + (!(alter_info->flags & ALTER_ALL_PARTITION))) > + DBUG_RETURN(HA_ERR_NO_PARTITION_FOUND); > + > + do > + { > + partition_element *part_elem= part_it++; > + if (part_elem->part_state == PART_ADMIN) > + { > + if (m_is_sub_partitioned) > + { > + List_iterator > + subpart_it(part_elem->subpartitions); > + partition_element *sub_elem; > + uint j= 0, part; > + do > + { > + sub_elem= subpart_it++; > + part= i * num_subparts + j; > + DBUG_PRINT("info", ("truncate subpartition %u (%s)", > + part, sub_elem->partition_name)); > + if ((error= m_file[part]->ha_truncate())) > + break; > + } while (++j < num_subparts); > + } > + else > + { > + DBUG_PRINT("info", ("truncate partition %u (%s)", i, > + part_elem->partition_name)); > + error= m_file[i]->ha_truncate(); > + } > + part_elem->part_state= PART_NORMAL; > + } > + } while (!error && (++i < num_parts)); > + DBUG_RETURN(error); > +} OK. The code is so much clearer now. ... > === modified file 'sql/ha_partition.h' > --- a/sql/ha_partition.h 2010-07-23 20:09:27 +0000 > +++ b/sql/ha_partition.h 2010-09-29 22:38:57 +0000 > @@ -342,6 +342,7 @@ public: > virtual int update_row(const uchar * old_data, uchar * new_data); > virtual int delete_row(const uchar * buf); > virtual int delete_all_rows(void); > + virtual int truncate(); > virtual void start_bulk_insert(ha_rows rows); > virtual int end_bulk_insert(); > private: > @@ -350,6 +351,12 @@ private: > long estimate_read_buffer_size(long original_size); > public: > > + /* > + Method for truncating a specific partition. > + (i.e. ALTER TABLE t1 TRUNCATE PARTITION p). > + */ > + int truncate_partition(Alter_info *); > + Please correct me if I am wrong. This method is not a method of general SE API but rather a partitioning-specific hook. So we are not adding Alter_info class to the SE API. Right? Maybe it makes sense to mention this in the above comment? ... > === modified file 'sql/handler.h' > --- a/sql/handler.h 2010-08-18 11:55:37 +0000 > +++ b/sql/handler.h 2010-09-29 22:38:57 +0000 > @@ -1331,6 +1331,7 @@ public: > int ha_bulk_update_row(const uchar *old_data, uchar *new_data, > uint *dup_key_found); > int ha_delete_all_rows(); > + int ha_truncate(); > int ha_reset_auto_increment(ulonglong value); > int ha_optimize(THD* thd, HA_CHECK_OPT* check_opt); > int ha_analyze(THD* thd, HA_CHECK_OPT* check_opt); > @@ -2010,12 +2011,31 @@ private: > This is called to delete all rows in a table > If the handler don't support this, then this function will > return HA_ERR_WRONG_COMMAND and MySQL will delete the rows one > - by one. It should reset auto_increment if > - thd->lex->sql_command == SQLCOM_TRUNCATE. > + by one. > */ > virtual int delete_all_rows() > { return (my_errno=HA_ERR_WRONG_COMMAND); } > /** > + Quickly remove all rows from a table. > + > + @remark This method is responsible for implementing MySQL's TRUNCATE > + TABLE statement, which is a DDL operation. As such, a engine > + can bypass certain integrity checks and in some cases avoid > + fine-grained locking (e.g. row locks) which would normally be > + required for a DELETE statement. > + > + @remark Typically, truncate is not used if it can result in integrity > + violation. For example, truncate is not used when a foreign > + key references the table, but it might be used if foreign key > + checks are disabled. > + > + @remark Engine is responsible for reseting the auto-increment counter. > + > + @remark The table is locked in exclusive mode. > + */ > + virtual int truncate() > + { return HA_ERR_WRONG_COMMAND; } > + /** > Reset the auto-increment counter to the given value, i.e. the next row > inserted will get the given value. This is called e.g. after TRUNCATE > is emulated by doing a 'DELETE FROM t'. HA_ERR_WRONG_COMMAND is > Hmm... handler::ha_/reset_auto_increment() is no longer used outside of storage engines code. I am not sure if it makes sense to get rid of this API method at this point... OTOH it is probably a good idea to update the above comment describing this method. ... > === modified file 'sql/sql_parse.cc' > --- a/sql/sql_parse.cc 2010-09-22 08:15:41 +0000 > +++ b/sql/sql_parse.cc 2010-09-29 22:38:57 +0000 > @@ -49,7 +49,7 @@ > // mysql_recreate_table, > // mysql_backup_table, > // mysql_restore_table > -#include "sql_truncate.h" // mysql_truncate_table > +#include "sql_truncate.h" // Truncate_statement Taking into account that Truncate_statement class is not used in sql_parse.cc do we need this include at all? ... > === modified file 'sql/sql_partition_admin.cc' > --- a/sql/sql_partition_admin.cc 2010-08-16 14:25:23 +0000 > +++ b/sql/sql_partition_admin.cc 2010-09-29 22:38:57 +0000 > @@ -16,10 +16,10 @@ > #include "sql_parse.h" // check_one_table_access > #include "sql_table.h" // mysql_alter_table, etc. > #include "sql_lex.h" // Sql_statement > -#include "sql_truncate.h" // mysql_truncate_table, > - // Truncate_statement > #include "sql_admin.h" // Analyze/Check/.._table_statement > #include "sql_partition_admin.h" // Alter_table_*_partition > +#include "ha_partition.h" // ha_partition > +#include "sql_base.h" // open_and_lock_tables > > #ifndef WITH_PARTITION_STORAGE_ENGINE > > @@ -46,7 +46,7 @@ bool Alter_table_analyze_partition_state > m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION; > > res= Analyze_table_statement::execute(thd); > - > + > DBUG_RETURN(res); > } > > @@ -104,36 +104,64 @@ bool Alter_table_repair_partition_statem > > bool Alter_table_truncate_partition_statement::execute(THD *thd) > { > + int error; > + ha_partition *partition; > TABLE_LIST *first_table= thd->lex->select_lex.table_list.first; > - bool res; > - enum_sql_command original_sql_command; > DBUG_ENTER("Alter_table_truncate_partition_statement::execute"); > > /* > - Execute TRUNCATE PARTITION just like TRUNCATE TABLE. > - Some storage engines (InnoDB, partition) checks thd_sql_command, > - so we set it to SQLCOM_TRUNCATE during the execution. > - */ > - original_sql_command= m_lex->sql_command; > - m_lex->sql_command= SQLCOM_TRUNCATE; > - > - /* > Flag that it is an ALTER command which administrates partitions, used > by ha_partition. > */ > - m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION; > - > + m_lex->alter_info.flags|= ALTER_ADMIN_PARTITION | > + ALTER_TRUNCATE_PARTITION; > + > + /* Fix the lock types (not the same as ordinary ALTER TABLE). */ > + first_table->lock_type= TL_WRITE; > + first_table->mdl_request.set_type(MDL_EXCLUSIVE); > + > /* > - Fix the lock types (not the same as ordinary ALTER TABLE). > + Check table permissions and open it with a exclusive lock. > + Ensure it is a partitioned table and finally, upcast the > + handler and invoke the partition truncate method. Lastly, > + write the statement to the binary log if necessary. > */ > - first_table->lock_type= TL_WRITE; > - first_table->mdl_request.set_type(MDL_SHARED_NO_READ_WRITE); > > - /* execute as a TRUNCATE TABLE */ > - res= Truncate_statement::execute(thd); > + if (check_one_table_access(thd, DROP_ACL, first_table)) > + DBUG_RETURN(TRUE); > > - m_lex->sql_command= original_sql_command; > - DBUG_RETURN(res); > + if (open_and_lock_tables(thd, first_table, FALSE, 0)) > + DBUG_RETURN(TRUE); > + > + /* > + TODO: Add support for TRUNCATE PARTITION for NDB and other > + engines supporting native partitioning. > + */ > + if (first_table->table->s->db_type() != partition_hton) > + { > + my_error(ER_PARTITION_MGMT_ON_NONPARTITIONED, MYF(0)); > + DBUG_RETURN(TRUE); > + } > + > + partition= (ha_partition *) first_table->table->file; > + > + /* Invoke the handler method responsible for truncating the partition. */ > + if ((error= partition->truncate_partition(&thd->lex->alter_info))) > + first_table->table->file->print_error(error, MYF(0)); > + > + /* > + All effects of a truncate operation are committed even if the > + operation fails. Thus, the query must be written to the binary > + log. The only exception is a unimplemented truncate method. Also, > + it is logged in statement format, regardless of the binlog format. > + */ > + if (error != HA_ERR_WRONG_COMMAND) > + error|= write_bin_log(thd, !error, thd->query(), thd->query_length()); > + > + if (! error) > + my_ok(thd); > + > + DBUG_RETURN(error); > } So what happens if one executes ALTER TABLE TRUNCATE PARTITION under LOCK TABLES? Don't we need to upgrade SNRW metadata lock to X lock in this case? ... > === modified file 'sql/sql_truncate.cc' > --- a/sql/sql_truncate.cc 2010-08-31 10:03:36 +0000 > +++ b/sql/sql_truncate.cc 2010-09-29 22:38:57 +0000 > @@ -29,145 +29,120 @@ > #include "sql_truncate.h" > > > -/* > - Delete all rows of a locked table. > - > - @param thd Thread context. > - @param table_list Table list element for the table. > - @param rows_deleted Whether rows might have been deleted. > - > - @retval FALSE Success. > - @retval TRUE Error. > +/** > + Check if table which is going to be affected by TRUNCATE TABLE > + is a parent table in some non-self-referencing foreign key and > + if yes emit an error that it is illegal to truncate such table. > + > + @param thd Thread context. > + @param table Table handle. > + > + @retval FALSE This table is not parent in a non-self-referencing foreign > + key. Statement can proceed. > + @retval TRUE This table is parent in a non-self-referencing foreign key, > + error was emitted. > */ > > static bool > -delete_all_rows(THD *thd, TABLE *table) > +fk_truncate_illegal_if_parent(THD *thd, TABLE *table) > { > - int error; > - READ_RECORD info; > - bool is_bulk_delete; > - bool some_rows_deleted= FALSE; > - bool save_binlog_row_based= thd->is_current_stmt_binlog_format_row(); > - DBUG_ENTER("delete_all_rows"); > + FOREIGN_KEY_INFO *f_key_info; > + List f_key_list; > + List_iterator_fast it; > > - /* Replication of truncate table must be statement based. */ > - thd->clear_current_stmt_binlog_format_row(); > + /* Whether the table is referenced by a foreign key. */ > + if (! table->file->referenced_by_foreign_key()) > + return FALSE; > > - /* > - Update handler statistics (e.g. table->file->stats.records). > - Might be used by the storage engine to aggregate information > - necessary to allow deletion. Currently, this seems to be > - meaningful only to the archive storage engine, which uses > - the info method to set the number of records. Although > - archive does not support deletion, it becomes necessary in > - order to return a error if the table is not empty. > - */ > - error= table->file->info(HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK); > - if (error && error != HA_ERR_WRONG_COMMAND) > - { > - table->file->print_error(error, MYF(0)); > - goto end; > - } > + table->file->get_foreign_key_list(thd, &f_key_list); > > - /* > - Attempt to delete all rows in the table. > - If it is unsupported, switch to row by row deletion. > - */ > - if (! (error= table->file->ha_delete_all_rows())) > - goto end; > + it.init(f_key_list); > > - if (error != HA_ERR_WRONG_COMMAND) > + /* Check whether it is a self-referencing FK. */ > + while ((f_key_info= it++)) > { > - /* > - If a transactional engine fails in the middle of deletion, > - we expect it to be able to roll it back. Some reasons > - for the engine to fail would be media failure or corrupted > - data dictionary (i.e. in case of a partitioned table). We > - have sufficiently strong metadata locks to rule out any > - potential deadlocks. > - > - If a non-transactional engine fails here (that would > - not be MyISAM, since MyISAM does TRUNCATE by recreate), > - and binlog is on, replication breaks, since nothing gets > - written to the binary log. (XXX: is this a bug?) > - */ > - table->file->print_error(error, MYF(0)); > - goto end; > + if (my_strcasecmp(system_charset_info, f_key_info->referenced_db->str, > + table->s->db.str) || > + my_strcasecmp(system_charset_info, f_key_info->referenced_table->str, > + table->s->table_name.str)) > + break; > } > > - /* > - A workaround for Bug#53696 "Performance schema engine violates the > - PSEA API by calling my_error()". > - */ > - if (thd->is_error()) > - goto end; > - AFAIU this bug is still there. Does this mean that this code should have been moved somewhere else and not removed? > - /* Handler didn't support fast delete. Delete rows one by one. */ > - > - init_read_record(&info, thd, table, NULL, TRUE, TRUE, FALSE); > - > - /* > - Start bulk delete. If the engine does not support it, go on, > - it's not an error. > - */ > - is_bulk_delete= ! table->file->start_bulk_delete(); > + /* Table is parent in a non-self-referencing foreign key. */ > + if (f_key_list.is_empty() || f_key_info) > + { > + my_error(ER_TRUNCATE_ILLEGAL_FK, MYF(0), table->s->table_name.str); > + return TRUE; > + } Could you please clarify why "f_key_list.is_empty()" part of the above condition is needed? > > - table->mark_columns_needed_for_delete(); > + return FALSE; > +} > > - while (!(error= info.read_record(&info)) && !thd->killed) > - { > - if ((error= table->file->ha_delete_row(table->record[0]))) > - { > - table->file->print_error(error, MYF(0)); > - break; > - } > > - some_rows_deleted= TRUE; > - } > +/* > + Open and truncate a locked table. > > - /* HA_ERR_END_OF_FILE */ > - if (error == -1) > - error= 0; > + @param thd Thread context. > + @param table_ref Table list element for the table to be truncated. > + @param is_tmp_table True if element refers to a temp table. > > - /* Close down the bulk delete. */ > - if (is_bulk_delete) > - { > - int bulk_delete_error= table->file->end_bulk_delete(); > - if (bulk_delete_error && !error) > - { > - table->file->print_error(bulk_delete_error, MYF(0)); > - error= bulk_delete_error; > - } > - } > + @retval 0 Success. > + @retval > 0 Error code. > +*/ > > - end_read_record(&info); > +int Truncate_statement::handler_truncate(THD *thd, TABLE_LIST *table_ref, > + bool is_tmp_table) > +{ > + int error= 0; > + uint flags; > + DBUG_ENTER("Truncate_statement::handler_truncate"); > > /* > - Regardless of the error status, the query must be written to the > - binary log if rows of the table is non-transactional. > + Can't recreate, the engine must mechanically delete all rows > + in the table. Use open_and_lock_tables() to open a write cursor. > */ > - if (some_rows_deleted && !table->file->has_transactions()) > + > + /* If it is a temporary table, no need to take locks. */ > + if (is_tmp_table) > + flags= MYSQL_OPEN_TEMPORARY_ONLY; > + else > { > - thd->transaction.stmt.modified_non_trans_table= TRUE; > - thd->transaction.all.modified_non_trans_table= TRUE; > + /* We don't need to load triggers. */ > + DBUG_ASSERT(table_ref->trg_event_map == 0); > + /* > + Our metadata lock guarantees that no transaction is reading > + or writing into the table. Yet, to open a write cursor we need > + a thr_lock lock. Allow to open base tables only. > + */ > + table_ref->required_type= FRMTYPE_TABLE; > + /* > + Ignore pending FLUSH TABLES since we don't want to release > + the MDL lock taken above and otherwise there is no way to > + wait for FLUSH TABLES in deadlock-free fashion. > + */ > + flags= MYSQL_OPEN_IGNORE_FLUSH | MYSQL_OPEN_SKIP_TEMPORARY; > + /* > + Even though we have an MDL lock on the table here, we don't > + pass MYSQL_OPEN_HAS_MDL_LOCK to open_and_lock_tables > + since to truncate a MERGE table, we must open and lock > + merge children, and on those we don't have an MDL lock. > + Thus clear the ticket to satisfy MDL asserts. > + */ > + table_ref->mdl_request.ticket= NULL; > + /* The handler truncate protocol dictates a exclusive lock. */ > + table_ref->mdl_request.set_type(MDL_EXCLUSIVE); > } Maybe it makes sense to set the type of metadata lock right in the parser? ... > @@ -225,30 +200,27 @@ static bool recreate_temporary_table(THD > > > /* > - Handle opening and locking if a base table for truncate. > + Handle locking a base table for truncate. > > @param[in] thd Thread context. > @param[in] table_ref Table list element for the table to > be truncated. > @param[out] hton_can_recreate Set to TRUE if table can be dropped > and recreated. > - @param[out] ticket_downgrade Set if a lock must be downgraded after > - truncate is done. > > @retval FALSE Success. > @retval TRUE Error. > */ > > -static bool open_and_lock_table_for_truncate(THD *thd, TABLE_LIST *table_ref, > - bool *hton_can_recreate, > - MDL_ticket **ticket_downgrade) > +bool Truncate_statement::lock_table(THD *thd, TABLE_LIST *table_ref, > + bool *hton_can_recreate) > { > TABLE *table= NULL; > - handlerton *table_type; > - DBUG_ENTER("open_and_lock_table_for_truncate"); > + DBUG_ENTER("Truncate_statement::lock_table"); > > DBUG_ASSERT(table_ref->lock_type == TL_WRITE); > - DBUG_ASSERT(table_ref->mdl_request.type == MDL_SHARED_NO_READ_WRITE); > + DBUG_ASSERT(table_ref->mdl_request.type >= MDL_SHARED_NO_READ_WRITE); > + > /* > Before doing anything else, acquire a metadata lock on the table, > or ensure we have one. We don't use open_and_lock_tables() > @@ -268,8 +240,7 @@ static bool open_and_lock_table_for_trun > table_ref->table_name, FALSE))) > DBUG_RETURN(TRUE); > > - table_type= table->s->db_type(); > - *hton_can_recreate= ha_check_storage_engine_flag(table_type, > + *hton_can_recreate= ha_check_storage_engine_flag(table->s->db_type(), > HTON_CAN_RECREATE); > table_ref->mdl_request.ticket= table->mdl_ticket; > } > @@ -285,86 +256,36 @@ static bool open_and_lock_table_for_trun > MYSQL_OPEN_SKIP_TEMPORARY)) > DBUG_RETURN(TRUE); > > - if (dd_frm_storage_engine(thd, table_ref->db, table_ref->table_name, > - &table_type)) > + if (dd_check_storage_engine_flag(thd, table_ref->db, table_ref->table_name, > + HTON_CAN_RECREATE, hton_can_recreate)) > DBUG_RETURN(TRUE); > - *hton_can_recreate= ha_check_storage_engine_flag(table_type, > - HTON_CAN_RECREATE); > } > > -#ifdef WITH_PARTITION_STORAGE_ENGINE > - /* > - TODO: Add support for TRUNCATE PARTITION for NDB and other engines > - supporting native partitioning. > - */ > - if (thd->lex->alter_info.flags & ALTER_ADMIN_PARTITION && > - table_type != partition_hton) > - { > - my_error(ER_PARTITION_MGMT_ON_NONPARTITIONED, MYF(0)); > - DBUG_RETURN(TRUE); > - } > -#endif > DEBUG_SYNC(thd, "lock_table_for_truncate"); > > - if (*hton_can_recreate) > + /* > + Acquire an exclusive lock. A storage engine can recreate or > + truncate the table only if there are no references to it from > + anywhere, i.e. no cached TABLE in the table cache. To remove > + the table from the cache we need an exclusive lock. > + */ > + if (thd->locked_tables_mode) > { > - /* > - Acquire an exclusive lock. The storage engine can recreate the > - table only if there are no references to it from anywhere, i.e. > - no cached TABLE in the table cache. To remove the table from the > - cache we need an exclusive lock. > - */ > - if (thd->locked_tables_mode) > - { > - if (wait_while_table_is_used(thd, table, HA_EXTRA_FORCE_REOPEN)) > - DBUG_RETURN(TRUE); > - *ticket_downgrade= table->mdl_ticket; > + if (wait_while_table_is_used(thd, table, HA_EXTRA_FORCE_REOPEN)) > + DBUG_RETURN(TRUE); > + m_ticket_downgrade= table->mdl_ticket; So we init m_ticket_dowgrade in constructor and set here. Do we reset it anywhere to make code re-execution safe? (... OK, re-prepare will probably care about PS but what about SPs?...) > + /* Close if table is going to be recreated. */ > + if (*hton_can_recreate) > close_all_tables_for_name(thd, table->s, FALSE); > - } > - else > - { > - ulong timeout= thd->variables.lock_wait_timeout; > - if (thd->mdl_context. > - upgrade_shared_lock_to_exclusive(table_ref->mdl_request.ticket, > - timeout)) > - DBUG_RETURN(TRUE); > - tdc_remove_table(thd, TDC_RT_REMOVE_ALL, table_ref->db, > - table_ref->table_name, FALSE); > - } > } > else > { > - /* > - Can't recreate, we must mechanically delete all rows in > - the table. Our metadata lock guarantees that no transaction > - is reading or writing into the table. Yet, to open a write > - cursor we need a thr_lock lock. Use open_and_lock_tables() > - to do the necessary job. > - */ > - > - /* Allow to open base tables only. */ > - table_ref->required_type= FRMTYPE_TABLE; > - /* We don't need to load triggers. */ > - DBUG_ASSERT(table_ref->trg_event_map == 0); > - /* > - Even though we have an MDL lock on the table here, we don't > - pass MYSQL_OPEN_HAS_MDL_LOCK to open_and_lock_tables > - since to truncate a MERGE table, we must open and lock > - merge children, and on those we don't have an MDL lock. > - Thus clear the ticket to satisfy MDL asserts. > - */ > - table_ref->mdl_request.ticket= NULL; > - > - /* > - Open the table as it will handle some required preparations. > - Ignore pending FLUSH TABLES since we don't want to release > - the MDL lock taken above and otherwise there is no way to > - wait for FLUSH TABLES in deadlock-free fashion. > - */ > - if (open_and_lock_tables(thd, table_ref, FALSE, > - MYSQL_OPEN_IGNORE_FLUSH | > - MYSQL_OPEN_SKIP_TEMPORARY)) > + ulong timeout= thd->variables.lock_wait_timeout; > + if (thd->mdl_context. > + upgrade_shared_lock_to_exclusive(table_ref->mdl_request.ticket, timeout)) > DBUG_RETURN(TRUE); > + tdc_remove_table(thd, TDC_RT_REMOVE_ALL, table_ref->db, > + table_ref->table_name, FALSE); > } Taking into account that we now always upgrade metadata locks for base tables maybe it makes sense to take X metadata lock on table being truncated right from the start? > > DBUG_RETURN(FALSE); > @@ -385,14 +306,14 @@ static bool open_and_lock_table_for_trun > @retval TRUE Error. > */ > > -bool mysql_truncate_table(THD *thd, TABLE_LIST *table_ref) > +bool Truncate_statement::truncate_table(THD *thd, TABLE_LIST *table_ref) > { > + int error; > TABLE *table; > - bool error= TRUE, binlog_stmt; > - MDL_ticket *mdl_ticket= NULL; > - DBUG_ENTER("mysql_truncate_table"); > + bool binlog_stmt; > + DBUG_ENTER("Truncate_statement::truncate_table"); > > - /* Remove tables from the HANDLER's hash. */ > + /* Remove table from the HANDLER's hash. */ > mysql_ha_rm_tables(thd, table_ref); > > /* If it is a temporary table, no need to take locks. */ > @@ -413,14 +334,11 @@ bool mysql_truncate_table(THD *thd, TABL > { > /* > The engine does not support truncate-by-recreate. Open the > - table and delete all rows. In such a manner this can in fact > - open several tables if it's a temporary MyISAMMRG table. > + table and invoke the handler truncate. In such a manner this > + can in fact open several tables if it's a temporary MyISAMMRG > + table. > */ > - if (open_and_lock_tables(thd, table_ref, FALSE, > - MYSQL_OPEN_TEMPORARY_ONLY)) > - DBUG_RETURN(TRUE); > - > - error= delete_all_rows(thd, table_ref->table); > + error= handler_truncate(thd, table_ref, TRUE); > } > > /* > @@ -434,8 +352,7 @@ bool mysql_truncate_table(THD *thd, TABL > { > bool hton_can_recreate; > > - if (open_and_lock_table_for_truncate(thd, table_ref, > - &hton_can_recreate, &mdl_ticket)) > + if (lock_table(thd, table_ref, &hton_can_recreate)) > DBUG_RETURN(TRUE); > > if (hton_can_recreate) > @@ -454,13 +371,18 @@ bool mysql_truncate_table(THD *thd, TABL > } > else > { > - error= delete_all_rows(thd, table_ref->table); > + /* > + The engine does not support truncate-by-recreate. > + Attempt to use the handler truncate method. > + */ > + error= handler_truncate(thd, table_ref, FALSE); > > /* > - Regardless of the error status, the query must be written to the > - binary log if rows of a non-transactional table were deleted. > + All effects of a TRUNCATE TABLE operation are committed even if > + truncation fails. Thus, the query must be written to the binary > + log. The only exception is a unimplemented truncate method. > */ > - binlog_stmt= !error || thd->transaction.stmt.modified_non_trans_table; > + binlog_stmt= !error || error != HA_ERR_WRONG_COMMAND; > } > > query_cache_invalidate3(thd, table_ref, FALSE); > @@ -471,29 +393,25 @@ bool mysql_truncate_table(THD *thd, TABL > error|= write_bin_log(thd, !error, thd->query(), thd->query_length()); > > /* > - All effects of a TRUNCATE TABLE operation are rolled back if a row > - by row deletion fails. Otherwise, it is automatically committed at > - the end. > - */ > - if (error) > - { > - trans_rollback_stmt(thd); > - trans_rollback(thd); > - } > - > - /* > A locked table ticket was upgraded to a exclusive lock. After the > the query has been written to the binary log, downgrade the lock > to a shared one. > */ > - if (mdl_ticket) > - mdl_ticket->downgrade_exclusive_lock(MDL_SHARED_NO_READ_WRITE); > + if (m_ticket_downgrade) > + m_ticket_downgrade->downgrade_exclusive_lock(MDL_SHARED_NO_READ_WRITE); > See the above code about resetting m_ticket_downgrade before statement re-execution. > - DBUG_PRINT("exit", ("error: %d", error)); > - DBUG_RETURN(test(error)); > + DBUG_RETURN(error); > } > > > +/** > + Execute a TRUNCATE statement at runtime. > + > + @param thd The current thread. > + > + @return FALSE on success. > +*/ > + > bool Truncate_statement::execute(THD *thd) > { > TABLE_LIST *first_table= thd->lex->select_lex.table_list.first; > @@ -502,9 +420,10 @@ bool Truncate_statement::execute(THD *th > > if (check_one_table_access(thd, DROP_ACL, first_table)) > goto error; > + > /* > Don't allow this within a transaction because we want to use > - re-generate table > + re-generate table. > */ > if (thd->in_active_multi_stmt_transaction()) > { Taking into account that TRUNCATE TABLE causes implicit commit do we need the above if-statement at all? > @@ -512,8 +431,11 @@ bool Truncate_statement::execute(THD *th > ER(ER_LOCK_OR_ACTIVE_TRANSACTION), MYF(0)); > goto error; > } > - if (! (res= mysql_truncate_table(thd, first_table))) > + > + if (! (res= truncate_table(thd, first_table))) > my_ok(thd); > + > error: > DBUG_RETURN(res); > } > + > OK. > === modified file 'sql/sql_truncate.h' > --- a/sql/sql_truncate.h 2010-08-16 12:53:30 +0000 > +++ b/sql/sql_truncate.h 2010-09-29 22:38:57 +0000 > @@ -18,23 +18,26 @@ > class THD; > struct TABLE_LIST; > > -bool mysql_truncate_table(THD *thd, TABLE_LIST *table_ref); > - > /** > Truncate_statement represents the TRUNCATE statement. > */ > class Truncate_statement : public Sql_statement > { > +private: > + /* Set if a lock must be downgraded after truncate is done. */ > + MDL_ticket *m_ticket_downgrade; > + > public: > /** > Constructor, used to represent a ALTER TABLE statement. > @param lex the LEX structure for this statement. > */ > Truncate_statement(LEX *lex) > - : Sql_statement(lex) > + : Sql_statement(lex), > + m_ticket_downgrade(NULL) > {} > > - ~Truncate_statement() > + virtual ~Truncate_statement() > {} > > /** > @@ -43,7 +46,20 @@ public: > @return false on success. > */ > bool execute(THD *thd); > -}; > > +protected: > + /** Handle locking a base table for truncate. */ > + bool lock_table(THD *, TABLE_LIST *, bool *); > + > + /** Truncate table via the handler method. */ > + int handler_truncate(THD *, TABLE_LIST *, bool); > + > + /** > + Optimized delete of all rows by doing a full generate of the table. Maybe it is better to use word "regenerate" here ? > + Depending on the storage engine, it can be accomplished through a > + drop and recreate or via the handler truncate method. > + */ > + bool truncate_table(THD *, TABLE_LIST *); > +}; > > #endif > > === modified file 'storage/archive/ha_archive.cc' > --- a/storage/archive/ha_archive.cc 2010-07-26 15:54:20 +0000 > +++ b/storage/archive/ha_archive.cc 2010-09-29 22:38:57 +0000 > @@ -1650,9 +1650,9 @@ int ha_archive::end_bulk_insert() > This is done for security reasons. In a later version we will enable this by > allowing the user to select a different row format. > */ > -int ha_archive::delete_all_rows() > +int ha_archive::truncate() > { > - DBUG_ENTER("ha_archive::delete_all_rows"); > + DBUG_ENTER("ha_archive::truncate"); > DBUG_RETURN(HA_ERR_WRONG_COMMAND); > } > OK. > === modified file 'storage/blackhole/ha_blackhole.cc' > --- a/storage/blackhole/ha_blackhole.cc 2010-07-08 21:20:08 +0000 > +++ b/storage/blackhole/ha_blackhole.cc 2010-09-29 22:38:57 +0000 > @@ -87,6 +87,12 @@ int ha_blackhole::create(const char *nam > DBUG_RETURN(0); > } > > +int ha_blackhole::truncate() > +{ > + DBUG_ENTER("ha_blackhole::truncate"); > + DBUG_RETURN(0); > +} > + > const char *ha_blackhole::index_type(uint key_number) > { > DBUG_ENTER("ha_blackhole::index_type"); > Do I understand correctly that the above method is needed only to support partitioning over blackhole engine (as this engine has HTON_CAN_RECREATE set)? Maybe it is worth to mention this in the comment for this method? > === modified file 'storage/csv/ha_tina.cc' > --- a/storage/csv/ha_tina.cc 2010-07-29 12:32:11 +0000 > +++ b/storage/csv/ha_tina.cc 2010-09-29 22:38:57 +0000 > @@ -1638,6 +1638,16 @@ int ha_tina::delete_all_rows() > } > > /* > + Manual truncate table. > +*/ > + > +int ha_tina::truncate() > +{ > + /* Simply delete all rows. No auto increment to reset. */ > + return delete_all_rows(); > +} > + > +/* > Called by the database to lock the table. Keep in mind that this > is an internal lock. > */ > On one hand CSV has HTON_CAN_RECREATE flag set. So the above method will be called only for partitioned tables based on this engine. But on the other hand it also has HTON_NO_PARTITION set so such tables are impossible to create. Therefore I think it makes sense to leave this method unimplemented. > === modified file 'storage/example/ha_example.cc' > --- a/storage/example/ha_example.cc 2010-08-05 12:34:19 +0000 > +++ b/storage/example/ha_example.cc 2010-09-29 22:38:57 +0000 > @@ -739,6 +739,26 @@ int ha_example::delete_all_rows() > > /** > @brief > + Used for handler specific truncate table. The table is locked in > + exclusive mode and handler is responsible for reseting the auto- > + increment counter. > + > + @details > + Called from Truncate_statement::handler_truncate. > + > + @see > + Truncate_statement in sql_truncate.cc > + Remarks in handler::truncate. > +*/ > +int ha_example::truncate() Don't you think that layout of the above comment looks wrong? Also maybe it makes sense to mention in this comment the role of HTON_CAN_RECREATE flag, especially since it is set for this engine? ... > === modified file 'storage/ibmdb2i/ha_ibmdb2i.cc' > --- a/storage/ibmdb2i/ha_ibmdb2i.cc 2010-07-08 21:20:08 +0000 > +++ b/storage/ibmdb2i/ha_ibmdb2i.cc 2010-09-29 22:38:57 +0000 > @@ -1806,6 +1806,13 @@ int ha_ibmdb2i::delete_all_rows() > } > > > +int ha_ibmdb2i::truncate() > +{ > + int error = delete_all_rows(); > + return error ? error : reset_auto_increment(0); > +} > + > + > int ha_ibmdb2i::external_lock(THD *thd, int lock_type) > { > int rc = 0; OK. Seems like ha_ibmdb2i::delete_all_rows() contains some truncate-specific code, so it probably makes sense to notify person responsible for supporting this code (whoever it is) once your patch is pushed. > > === modified file 'storage/innobase/handler/ha_innodb.cc' > --- a/storage/innobase/handler/ha_innodb.cc 2010-08-18 07:14:06 +0000 > +++ b/storage/innobase/handler/ha_innodb.cc 2010-09-29 22:38:57 +0000 > @@ -7072,33 +7072,21 @@ Deletes all rows of an InnoDB table. > @return error number */ > UNIV_INTERN > int > -ha_innobase::delete_all_rows(void) > +ha_innobase::truncate(void) > /*==============================*/ > { > int error; > > - DBUG_ENTER("ha_innobase::delete_all_rows"); > + DBUG_ENTER("ha_innobase::truncate"); > > /* Get the transaction associated with the current thd, or create one > if not yet created, and update prebuilt->trx */ > > update_thd(ha_thd()); > > - if (thd_sql_command(user_thd) != SQLCOM_TRUNCATE) { > - fallback: > - /* We only handle TRUNCATE TABLE t as a special case. > - DELETE FROM t will have to use ha_innobase::delete_row(), > - because DELETE is transactional while TRUNCATE is not. */ > - DBUG_RETURN(my_errno=HA_ERR_WRONG_COMMAND); > - } > - > /* Truncate the table in InnoDB */ > > error = row_truncate_table_for_mysql(prebuilt->table, prebuilt->trx); > - if (error == DB_ERROR) { > - /* Cannot truncate; resort to ha_innobase::delete_row() */ > - goto fallback; > - } > > error = convert_error_code_to_mysql(error, prebuilt->table->flags, > NULL); > OK. Please notify InnoDB guys so they can perform necessary clean-ups in their code once your patch is pushed. > === modified file 'storage/myisam/ha_myisam.cc' > --- a/storage/myisam/ha_myisam.cc 2010-07-08 21:20:08 +0000 > +++ b/storage/myisam/ha_myisam.cc 2010-09-29 22:38:57 +0000 > @@ -1788,6 +1788,12 @@ int ha_myisam::delete_all_rows() > return mi_delete_all_rows(file); > } > > +int ha_myisam::truncate() > +{ > + int error= delete_all_rows(); > + return error ? error : reset_auto_increment(0); > +} > + > int ha_myisam::reset_auto_increment(ulonglong value) > { > file->s->state.auto_increment= value; > Again maybe makes sense to mention in the comment describing this method that we need ha_myisam::truncate() only for the sake of partitioning? > === modified file 'storage/myisammrg/ha_myisammrg.cc' > --- a/storage/myisammrg/ha_myisammrg.cc 2010-09-08 08:25:37 +0000 > +++ b/storage/myisammrg/ha_myisammrg.cc 2010-09-29 22:38:57 +0000 > @@ -1203,6 +1203,22 @@ ha_rows ha_myisammrg::records_in_range(u > } > > > +int ha_myisammrg::truncate() > +{ > + int err= 0; > + MYRG_TABLE *table; > + DBUG_ENTER("ha_myisammrg::truncate"); > + > + for (table= file->open_tables; table != file->end_table; table++) > + { > + if ((err= mi_delete_all_rows(table->table))) > + break; > + } > + > + DBUG_RETURN(err); > +} > + > + Hmm... MERGE doesn't support HA_CAN_RECREATE and doesn't implement delete_all_rows() method. So it didn't support ordinary TRUNCATE before your patch. Also it has HTON_NO_PARTITION set so this method is not necessary for partitioning. I don't think we should enable support of TRUNCATE for MERGE in your patch. Otherwise we probably should do something about resetting auto-increment and add test coverage for this new functionality. > === modified file 'storage/perfschema/ha_perfschema.cc' > --- a/storage/perfschema/ha_perfschema.cc 2010-09-10 09:10:38 +0000 > +++ b/storage/perfschema/ha_perfschema.cc 2010-09-29 22:38:57 +0000 > @@ -345,6 +345,11 @@ int ha_perfschema::delete_all_rows(void) > DBUG_RETURN(result); > } > > +int ha_perfschema::truncate() > +{ > + return delete_all_rows(); > +} > + See my comment about bug #53696 above. ... Otherwise I am OK with your patch. I think it can be pushed after considering/addressing/discussing the above points and provided that we will settle somehow the issue with incompatible change in TRUNCATE behavior for InnoDB tables with FKs. -- Dmitry Lenev, Software Developer Oracle Development SPB/MySQL, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification