List: Commits
From: marko.makela  Date: February 1 2012 2:11pm
Subject: bzr push into mysql-trunk-wl5534-stage branch (marko.makela:3812 to 3813)
WL#5526 WL#5548
 3813 Marko Mäkelä	2012-02-01
      InnoDB: WL#5526 online ADD INDEX, WL#5548 online DROP INDEX
      
      === First part: Replace the old ALTER TABLE API (WL#5534).
      
      Replace add_index(), prepare_drop_index() and final_drop_index() with
      the generic DDL methods implemented in WL#5534. This allows us to
      support things like ADD INDEX i(i), DROP INDEX i without problems. The
      new methods are roughly used as follows:
      
      ha_innobase::check_if_supported_inplace_alter_table(): Check if the
      operation is at all supported, and what kind of lock is
      needed. Creating a PRIMARY KEY or a FULLTEXT INDEX requires the table
      to be X-locked. All other supported operations can be done 'online',
      allowing concurrent modification of the table.
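
      A minimal sketch of the lock decision described above. The flag
      names, the result enum and check_if_supported() are hypothetical
      and only illustrate the rule; the real method receives an
      Alter_inplace_info and uses the server's own result codes.

```cpp
#include <cassert>

// Hypothetical operation flags (assumption: not the server's real flags).
enum AlterOp : unsigned {
    OP_ADD_INDEX       = 1u << 0,
    OP_DROP_INDEX      = 1u << 1,
    OP_ADD_PRIMARY_KEY = 1u << 2,
    OP_ADD_FULLTEXT    = 1u << 3,
    OP_OTHER           = 1u << 4   // stands in for anything not handled in place
};

// Hypothetical result type for the sketch.
enum class InplaceResult {
    NOT_SUPPORTED,   // fall back to copying the table
    EXCLUSIVE_LOCK,  // done in place, but the table is X-locked
    NO_LOCK          // done in place and online
};

// Decide whether the requested set of operations can run in place,
// and if so, whether concurrent modification can be allowed.
inline InplaceResult check_if_supported(unsigned ops) {
    const unsigned supported = OP_ADD_INDEX | OP_DROP_INDEX
                             | OP_ADD_PRIMARY_KEY | OP_ADD_FULLTEXT;
    if (ops & ~supported) {
        return InplaceResult::NOT_SUPPORTED;
    }
    if (ops & (OP_ADD_PRIMARY_KEY | OP_ADD_FULLTEXT)) {
        return InplaceResult::EXCLUSIVE_LOCK;
    }
    return InplaceResult::NO_LOCK;
}
```

      In this sketch, ADD INDEX i(i), DROP INDEX i in one statement maps
      to OP_ADD_INDEX | OP_DROP_INDEX and stays online.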
      
      ha_innobase::prepare_inplace_alter_table(): For index creation, start
      a dictionary transaction and create the data dictionary objects. For
      all operations, check whether the operation is allowed.
      
      ha_innobase::inplace_alter_table(): Create any indexes, if index
      creation was requested.  Otherwise, do nothing. This method will be
      invoked for both online and offline index creation. If the operation
      is online, allocate a modification log for the secondary index(es)
      that will be created. If the operation is offline for any reason, do
      not allocate modification logs. This method will also notice and
      report any duplicate key errors.
      
      ha_innobase::commit_inplace_alter_table(): Commit or roll back the
      operation. Rollback may be initiated after a failed operation, or
      after a successful operation when MySQL fails to upgrade the
      meta-data lock for rewriting the .frm file.
      
      If commit=true, drop the indexes that were requested to be dropped
      (WL#5548). Before dropping, rename the indexes to start with "\377" so
      that crash recovery will flag them as incomplete indexes and drop them.
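
      The rename-before-drop idea can be sketched as follows. Only the
      "\377" prefix byte comes from this description; the helper names
      and the in-memory list of index names are assumptions made for
      illustration, not the InnoDB dictionary code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Prefix byte named in the commit message ("\377", i.e. 0xFF).
const char TEMP_PREFIX = '\377';

// True if the name marks an incomplete (temporary) index.
inline bool is_temp_name(const std::string& name) {
    return !name.empty() && name[0] == TEMP_PREFIX;
}

// Hypothetical helper: rename an index so that recovery treats it as incomplete.
inline std::string to_temp_name(const std::string& name) {
    return is_temp_name(name) ? name : std::string(1, TEMP_PREFIX) + name;
}

// Hypothetical helper: strip the prefix once the index is committed.
inline std::string to_final_name(const std::string& name) {
    return is_temp_name(name) ? name.substr(1) : name;
}

// Stand-in for crash recovery: every index still carrying the prefix is dropped.
inline std::vector<std::string> recover_indexes(std::vector<std::string> names) {
    names.erase(std::remove_if(names.begin(), names.end(), is_temp_name),
                names.end());
    return names;
}
```

      If the server crashes between the rename and the actual drop, the
      prefixed name is what lets recovery finish the job.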
      
      If a successful index creation needs to be rolled back, we cannot
      immediately drop the index(es), because the table will typically be in
      use by other threads (this is what timed out the MDL upgrade in the
      first place). Instead, we mark the index as 'aborted' or 'zombie' and
      will attempt to drop it when we get the chance (reference count has
      dropped to zero, or the index is being evicted from the data
      dictionary cache).
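
      A simplified sketch of the deferred drop. The fields n_ref_count
      and drop_aborted mirror names used elsewhere in this message; the
      Table and Index types and their methods are hypothetical and
      ignore latching and the data dictionary entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Index {
    std::string name;
    bool aborted;   // flagged 'aborted' or 'zombie'
};

struct Table {
    std::vector<Index> indexes;
    int n_ref_count = 0;        // handles held by other threads
    bool drop_aborted = false;  // some index is waiting to be dropped

    void add_index(const std::string& name) {
        Index ix;
        ix.name = name;
        ix.aborted = false;
        indexes.push_back(ix);
    }

    // Drop flagged indexes, but only when nobody else uses the table.
    void try_drop_aborted() {
        if (!drop_aborted || n_ref_count > 0) return;
        indexes.erase(std::remove_if(indexes.begin(), indexes.end(),
                                     [](const Index& ix) { return ix.aborted; }),
                      indexes.end());
        drop_aborted = false;
    }

    void open() { ++n_ref_count; }

    void close() {
        assert(n_ref_count > 0);
        if (--n_ref_count == 0) try_drop_aborted();
    }

    // Roll back a created index: flag it now, drop it when possible.
    void rollback_index(const std::string& name) {
        for (Index& ix : indexes) {
            if (ix.name == name) ix.aborted = true;
        }
        drop_aborted = true;
        try_drop_aborted();
    }
};
```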
      
      innobase_alter_table_flags(), check_column_being_renamed(),
      column_is_being_renamed(), foreign_key_column_is_being_renamed():
      Remove. Remove the column rename changes from
      ha_innobase::check_if_incompatible_data(). All these checks will be
      performed as part of check_if_supported_inplace_alter_table().
      
      add_index(), final_add_index(), prepare_drop_index(),
      final_drop_index(): Remove. These are replaced by the
      prepare_inplace_alter_table(), inplace_alter_table(), and
      commit_inplace_alter_table().
      
      innodb_online_alter_log_max_size (srv_online_max_size): New parameter
      for limiting the amount of modification log that is allowed to
      accumulate during online index creation. Add DB_ONLINE_LOG_TOO_BIG to
      enum db_err for exceeding this.
      
      === Second part: online index creation (WL#5526).
      
      The algorithm:
      
      (1) row_log_allocate() creates a temporary file for each index being created
       * a logical variant of the InnoDB change (insert) buffer
       * row_log_online_op(index, tuple, trx_id, enum row_op op)
        * enum row_op: INSERT, DELETE_MARK, DELETE_UNMARK, PURGE
        * invoked by DML, rollback, and purge, just like the InnoDB change buffer
        * update invokes (INSERT,new_value), (DELETE_MARK,old_value)
      (2) row_merge_read_clustered_index() scans the clustered index of the table
       * For every index being created, write an index entry to a merge sort file
      (3) row_merge_sort() the buffers (one for every index being created)
      (4) Insert the sorted entries to the new index B-tree(s)
      (5) row_log_apply() the change logs to the index B-tree(s)
       * this will 'publish' the index inside InnoDB
      (6) MySQL upgrades the meta-data lock
      (7) commit_inplace_alter_table(commit=true) will rename the created index to
      non-temporary name, and drop any indexes that were requested to be dropped
      
      Steps (2), (3), (4) are unaffected by this change.
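
      A toy model of steps (1) to (5), assuming a table with an integer
      primary key and one indexed integer column. All types and method
      names are hypothetical, and the log is reduced to insert and
      delete; the real enum row_op also has delete-mark, delete-unmark
      and purge operations, and the real log lives in a temporary file.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Entry = std::pair<int, int>;   // (indexed column value, primary key)
enum class RowOp { Insert, Delete }; // simplified stand-in for enum row_op
struct LogRec { RowOp op; Entry entry; };

struct OnlineBuild {
    std::map<int, int> table;     // clustered index: pk -> column value
    std::vector<LogRec> row_log;  // step (1): modification log
    std::set<Entry> index;        // the secondary index being built

    // DML running concurrently with the build logs instead of touching the index.
    void dml_insert(int pk, int val) {
        table[pk] = val;
        row_log.push_back({RowOp::Insert, {val, pk}});
    }
    void dml_delete(int pk) {
        auto it = table.find(pk);
        if (it == table.end()) return;
        row_log.push_back({RowOp::Delete, {it->second, pk}});
        table.erase(it);
    }

    // Steps (2) and (3): scan the clustered index and sort the entries.
    std::vector<Entry> scan_and_sort() const {
        std::vector<Entry> entries;
        for (const auto& row : table) entries.push_back({row.second, row.first});
        std::sort(entries.begin(), entries.end());
        return entries;
    }

    // Step (4): load the sorted entries into the new index.
    void load(const std::vector<Entry>& sorted) {
        index.insert(sorted.begin(), sorted.end());
    }

    // Step (5): replay the log. Both operations are idempotent here, so it
    // does not matter whether the scan already saw a logged change.
    void apply_log() {
        for (const LogRec& rec : row_log) {
            if (rec.op == RowOp::Insert) index.insert(rec.entry);
            else index.erase(rec.entry);
        }
        row_log.clear();
    }
};
```

      The point of the sketch is the ordering: changes made after the
      scan are not lost, because they are replayed from the log before
      the index is published.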
      
      There is an anomaly. Between steps (5) and (7), a DML operation can
      fail due to a uniqueness violation in a created UNIQUE INDEX. The
      index exists in InnoDB at that point, but not in MySQL. Thus, we
      cannot report the index name to MySQL. It is somewhat wrong to report
      a duplicate key for DML before the DDL has fully finished. Other types
      of uniqueness violation observed during index creation would be
      reported to the DDL thread.
      
      Another anomaly is that when step (6) fails to upgrade the meta-data
      lock, MySQL will invoke commit_inplace_alter_table(commit=false) to
      drop any created indexes. These cannot be dropped immediately, because
      failure to upgrade the meta-data lock means that other threads must be
      operating on the table, and potentially accessing the index trees.
      Therefore, we must merely flag the indexes ONLINE_INDEX_ABORTED or
      ONLINE_INDEX_ABORTED_DROPPED. For such indexes, DML threads will
      invoke row_log_online_op() as if the index was still being created
      online. That function would do nothing, returning 'it was buffered' to
      the DML thread. If online index creation completed successfully, the
      function would return 'it was not buffered', and the DML thread would
      insert to the B-tree as usual.
      
      Online index creation can also be aborted when the log file written by
      row_log_online_op() exceeds the new parameter
      innodb_online_alter_log_max_size (a new error DB_ONLINE_LOG_TOO_BIG).
      In this case, DML threads will continue business as usual, and the DDL
      operation will fail.
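
      A sketch of the size limit, under the assumption that the log
      simply stops accepting records once the limit would be exceeded
      and remembers the error for the DDL thread. RowLog and DbErr are
      hypothetical stand-ins; only the behaviour (DML continues, DDL
      fails) is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the relevant values of enum db_err.
enum class DbErr { SUCCESS, ONLINE_LOG_TOO_BIG };

struct RowLog {
    std::size_t max_size;          // plays the role of innodb_online_alter_log_max_size
    std::size_t size = 0;          // bytes logged so far
    DbErr error = DbErr::SUCCESS;  // checked later by the DDL thread

    explicit RowLog(std::size_t max) : max_size(max) {}

    // Called on behalf of DML. It never fails the DML operation itself;
    // an overflow only poisons the log so that the DDL operation fails.
    void append(std::size_t rec_size) {
        if (error != DbErr::SUCCESS) return;
        if (size + rec_size > max_size) {
            error = DbErr::ONLINE_LOG_TOO_BIG;
            return;
        }
        size += rec_size;
    }
};
```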
      
      === Third part: new counters for INFORMATION_SCHEMA.INNODB_METRICS
      
      ddl_background_drop_indexes
        Number of indexes waiting to be dropped after failed index creation
      ddl_online_create_index
        Number of indexes being created online
      ddl_pending_alter_table
        Number of ALTER TABLE, CREATE INDEX, DROP INDEX in progress
      
      === Detailed change description
      
      dict_index_get_online_status(), dict_index_set_online_status(): New functions,
      for determining the status of index creation (enum online_index_status):
      ONLINE_INDEX_COMPLETE:
        the index is complete and ready for access
      ONLINE_INDEX_CREATION:
        the index is being created online
        (allowing concurrent modifications, not allowing index lookups)
      ONLINE_INDEX_ABORTED:
        online index creation was aborted and the index
        should be dropped as soon as index->table->n_ref_count reaches 0
      ONLINE_INDEX_ABORTED_DROPPED:
        the online index creation was aborted, the index was dropped from
        the data dictionary and the tablespace, and it should be dropped
        from the data dictionary cache as soon as index->table->n_ref_count
        reaches 0
      
      dict_index_is_online_ddl(): Determine whether the index is or was being
      created online (TRUE) or is usable for lookups (FALSE).
      
      dict_index_online_log(): Wrapper for row_log_online_op(), to resolve
      circular include file dependencies.
      
      dict_index_online_trylog(): Try logging an operation during online
      index creation. If the index is complete, return FALSE so that the
      operation will be performed directly on the index.
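
      How a DML thread might use the return value can be sketched like
      this. The status names follow enum online_index_status above; the
      Index struct, the integer keys and the function signatures are
      assumptions for illustration only.

```cpp
#include <cassert>
#include <vector>

enum class OnlineStatus { COMPLETE, CREATION, ABORTED, ABORTED_DROPPED };

struct Index {
    OnlineStatus status = OnlineStatus::COMPLETE;
    std::vector<int> online_log;  // changes buffered during online creation
    std::vector<int> btree;       // stand-in for the index tree
};

// Returns true if the operation was consumed by the online log (or
// silently discarded for an aborted index), false if the caller must
// apply it to the index tree itself.
inline bool online_trylog(Index& index, int key) {
    switch (index.status) {
    case OnlineStatus::COMPLETE:
        return false;
    case OnlineStatus::CREATION:
        index.online_log.push_back(key);
        return true;
    case OnlineStatus::ABORTED:
    case OnlineStatus::ABORTED_DROPPED:
        return true;  // report 'buffered' without doing anything
    }
    return false;
}

// A DML insert as seen from the caller's side.
inline void dml_insert(Index& index, int key) {
    if (!online_trylog(index, key)) {
        index.btree.push_back(key);
    }
}
```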
      
      dict_index_struct: Remove to_be_dropped, and add online_status. Add a
      union around search_info and a new member, online_log. The adaptive
      hash index will not be used during online index creation.
      
      dict_table_struct: Add the field drop_aborted, for noting that the
      table may contain 'aborted' or 'zombie' indexes that have to be
      dropped as soon as possible.
      
      btr_root_raise_and_insert(), btr_page_split_and_insert(),
      btr_attach_half_pages(), btr_insert_on_non_leaf_level(): Add undo
      logging and locking flags. Add the flag BTR_CREATE_FLAG, which allows
      operations to bypass row_log_online_op() when an index is being
      created online.
      
      btr_validate_index(): Skip indexes that are being created online.
      
      btr_cur_latch_leaves(): Add the latch_mode BTR_MODIFY_TREE_APPLY_LOG,
      to be invoked from row_log_apply(). It can skip most of the latching,
      because the log will be applied by a single thread.
      
      enum btr_latch_mode: Add BTR_MODIFY_TREE_APPLY_LOG and
      BTR_MODIFY_LEAF_APPLY_LOG, exclusively reserved for row_log_apply(),
      which is single-threaded for any given index that is being created
      online.
      
      btr_cur_search_to_nth_level(): Add the latch_mode
      BTR_MODIFY_TREE_APPLY_LOG and BTR_MODIFY_LEAF_APPLY_LOG. Do not update
      the adaptive hash index for indexes that are being built online.
      
      btr_cur_open_at_index_side(), btr_cur_open_at_rnd_pos(): Disallow the
      latch_mode BTR_MODIFY_TREE_APPLY_LOG and
      BTR_MODIFY_LEAF_APPLY_LOG. These functions are not to be called during
      online index creation.
      
      btr_cur_ins_lock_and_undo(), btr_cur_optimistic_insert(),
      btr_cur_pessimistic_insert(),
      btr_cur_upd_lock_and_undo(), btr_cur_update_in_place(),
      btr_cur_optimistic_update(), btr_cur_pessimistic_update(),
      btr_cur_optimistic_delete(), btr_cur_pessimistic_delete(): Assert that
      the index is not being built online, or the BTR_CREATE_FLAG is being
      passed (from row_log_apply()).
      
      btr_cur_update_in_place(), btr_cur_optimistic_update(),
      btr_cur_pessimistic_update(): Add a separate parameter for trx_id, so
      that row_log_apply() can pass thr=NULL.
      
      row_upd_write_sys_vals_to_log(), btr_cur_update_in_place_log(),
      btr_cur_del_mark_set_clust_rec_log(), btr_cur_trx_report(): Replace
      trx with trx_id.
      
      btr_search_drop_page_hash_index(), btr_search_build_page_hash_index(),
      btr_search_get_info(): Assert that the index is not being created online.
      
      dict_build_index_def_step(): Record only the first table_id created in
      the transaction. Crash recovery would drop that table if the
      data dictionary transaction is found to be incomplete.
      
      dict_table_try_drop_aborted(), dict_table_try_drop_aborted_and_mutex_exit():
      Try to drop any 'aborted' or 'zombie' indexes.
      
      dict_table_close(), dict_table_open_on_id(),
      dict_table_open_on_name_low(), dict_table_open_on_name(),
      dict_table_open_on_name_no_stats(): Add the parameter try_drop, for
      trying to drop incomplete indexes when dict_locked=FALSE.
      
      dict_table_remove_from_cache_low(): Try to drop 'aborted' or 'zombie'
      indexes.
      
      dict_index_add_to_cache(): Assert that the index is not being created
      online. The flag would be set later.
      
      dict_index_remove_from_cache_low(): Clean up after aborted online
      index creation.
      
      dict_table_get_foreign_constraint(): Consider both referencing and
      referenced indexes.
      
      dict_foreign_find_index(): Add const qualifiers. Remove the reference
      to index->to_be_dropped. This will be checked elsewhere.
      
      dict_foreign_find_equiv_index(): Replaced by dict_foreign_find_index().
      
      dict_table_replace_index_in_foreign_list(): Renamed to
      dict_foreign_replace_index(). This will not work properly until
      WL#6049 (meta-data locking for foreign key checks) has been
      implemented.
      
      dict_table_check_for_dup_indexes(): Replace the parameter ibool tmp_ok
      with enum check_name: CHECK_ALL_COMPLETE, CHECK_ABORTED_OK,
      CHECK_PARTIAL_OK.
      
      dict_lru_validate(), dict_lru_find_table(): Make static.
      
      dict_load_columns(): Check errors from dict_load_column_low() a little
      earlier.
      
      dict_stats_update_transient_for_index(): Refactored from
      dict_stats_update_transient(). We need to be able to update the
      statistics for a particular index, once the index has been created.
      
      dict_stats_update_persistent(): Skip indexes that are corrupted or
      being created online.
      
      dict_stats_fetch_from_ps_for_index(), dict_stats_update_for_index():
      New functions, for updating index statistics after index creation.
      
      dict_stats_delete_index_stats(): Take the table and index name as the
      parameter, instead of taking a dict_index_t. We will drop the
      statistics after the object has been freed.
      
      innobase_index_reserve_name: A global constant for the predefined name
      GEN_CLUST_INDEX.
      
      convert_error_code_to_mysql(): Make static in ha_innodb.cc. The ALTER
      TABLE code in handler0alter.cc will invoke a new function
      my_error_innodb() instead, so that my_error() will be invoked exactly
      once for each error.
      
      ha_innobase::info_low(): Ignore indexes that are being created online.
      
      ha_innobase::check(): Ignore indexes that are being created or dropped.
      
      my_error_innodb(): Error reporting for most conditions in DDL
      operations (except old_alter_table=1 or CREATE TABLE or DROP TABLE).
      The errors DB_DUPLICATE_KEY, DB_TABLESPACE_ALREADY_EXISTS, and
      DB_ONLINE_LOG_TOO_BIG must be handled by the caller.
      
      innobase_check_index_keys(): Replace key_info[], num_of_keys with
      Alter_inplace_info.
      
      innobase_create_index_field_def(): Add const qualifiers.
      
      innobase_create_index_def(): Add const qualifiers. Rename key_primary
      to key_clustered, because we do create a new clustered index also when
      creating the FTS_DOC_ID column.
      
      innobase_copy_index_field_def(): Remove. When creating the clustered
      index (and rebuilding the table), all index definitions will be copied
      from the MySQL data dictionary, not from the InnoDB data dictionary.
      
      innobase_fts_check_doc_id_col(),
      innobase_fts_check_doc_id_index_in_def(): Add const qualifiers.
      
      innobase_fts_check_doc_id_index(): Add ha_alter_info, for checking
      indexes that are to be added.
      
      innobase_create_key_def(): Rename to innobase_create_key_defs(). Move
      some handling of full-text index creation to the caller.
      
      innobase_check_column_length(): Change the return type from int to bool.
      
      innobase_find_equiv_index(): Similar to dict_foreign_find_index(), but
      searches the to-be-added indexes instead of existing ones.
      
      innobase_create_fts_doc_id_idx(): Add the parameter new_clustered.
      
      innobase_add_index_cleanup(): Remove.
      
      online_retry_drop_indexes_low(), online_retry_drop_indexes(): Drop
      'aborted' or 'zombie' indexes.  Invoked by
      prepare_inplace_alter_table() while sufficient locks are being held.
      
      prepare_inplace_alter_table_dict(): Prepare the data dictionary for
      inplace ALTER TABLE (or CREATE INDEX or DROP INDEX).
      
      i_s_fts_index_table_fill_one_index(), i_s_fts_config_fill(): Assert
      that the index is not being created online. Fulltext indexes are never
      created online.
      
      row_upd_build_sec_rec_difference_binary(): Take the rec offsets as a
      parameter, to avoid having to recompute it. Remove the parameter trx.
      
      ins_node_create_entry_list(): Make static.
      
      struct trx_struct: Correct the comment of trx->table_id.  It was wrong
      already when index creation was implemented in the InnoDB Plugin.
      
      lock_clust_rec_cons_read_sees(), lock_rec_create(),
      lock_rec_enqueue_waiting(), lock_rec_add_to_queue(),
      lock_rec_lock_fast(), lock_rec_lock_slow(), lock_rec_lock(),
      lock_rec_queue_validate(), lock_rec_convert_impl_to_expl(),
      lock_sec_rec_read_check_and_lock(), lock_clust_rec_read_check_and_lock(),
      lock_get_table(), lock_rec_get_index(), lock_rec_get_index_name(),
      lock_table_locks_lookup(): Assert that the index is not being created
      online. These assertions should not be reached for DML threads,
      because they should be buffering the changes with row_log_online_op().
      The row_log_apply() thread will be passing BTR_NO_LOCKING_FLAG,
      skipping the locking.
      
      lock_rec_insert_check_and_lock(),
      lock_sec_rec_modify_check_and_lock(): Assert that the index is not
      being created online, or BTR_CREATE_FLAG is being passed.
      
      lock_rec_insert_check_and_lock(): Remove a bogus assertion about
      LOCK_S during index creation. Index creation is passing the
      BTR_NO_LOCKING_FLAG, skipping locking altogether.
      
      opt_calc_index_goodness(): Ignore indexes that are being created online.
      
      row_ins_must_modify_rec(): Update a comment about the uniqueness of
      node pointers.
      
      row_purge_parse_undo_rec(), row_undo_mod_upd_exist_sec(), row_upd():
      Proceed if there are any indexes being created online.
      
      row_ins_index_entry(), row_purge_remove_sec_if_poss(),
      row_undo_ins_remove_sec_rec(), row_undo_mod_del_mark_or_remove_sec(),
      row_undo_mod_del_unmark_sec_and_undo_update(): Invoke
      dict_index_online_trylog().
      
      row_upd_sec_online(): Auxiliary function for logging the update or
      delete of a record whose index is being created online. Invoked by
      row_upd_sec_step().
      
      row_create_table_for_mysql(), row_drop_table_for_mysql(): Assert that
      at most one table is being created or dropped per transaction, or the
      table is an auxiliary table for full-text search index.
      
      rec_offs_any_null_extern(): Make available in non-debug builds. This
      will be called when scanning all rows during index creation, in
      row_merge_read_clustered_index().
      
      row_log_allocate(), row_log_free(), row_log_online_op(),
      row_log_get_max_trx(), row_log_apply(): The modification log for
      buffering changes during online index creation.
      
      enum row_op: Index record modification operations buffered by
      row_log_online_op(): ROW_OP_INSERT, ROW_OP_DELETE_MARK,
      ROW_OP_DELETE_UNMARK, ROW_OP_PURGE, ROW_OP_DELETE_PURGE.
      
      merge_index_def_struct: Add key_number for the MySQL key number that
      is being created, or ULINT_UNDEFINED if none.
      
      row_merge_dup_report(), row_merge_file_create_low(),
      row_merge_file_destroy_low(): Make public, so that these can be called
      from row0log.cc.
      
      row_merge_drop_index(): Replace with row_merge_drop_indexes_dict() and
      row_merge_drop_indexes().
      
      row_merge_rename_index_to_add(), row_merge_rename_index_to_drop(): New
      functions, used in commit_inplace_alter_table() to guarantee somewhat
      satisfactory crash recovery.
      
      row_merge_build_indexes(): Add the flag 'online'. Add key_numbers[].
      If online index creation fails, flag all created indexes as 'aborted'
      or 'zombie'.
      
      row_merge_buf_encode(): Refactored from row_merge_buf_write().
      
      row_merge_insert_index_tuples(): Replace trx with trx_id. Remove the
      parameter 'table'. Remove the dummy query graph and invoke the b-tree
      functions directly.
      
      row_merge_drop_temp_indexes(): Use direct SQL to drop all temporary
      indexes.
      
      row_merge_read_clustered_index(): Do not commit the mini-transaction
      when switching pages except when there is a lock wait on the clustered
      index tree lock.
      
      MONITOR_MUTEX_INC(), MONITOR_MUTEX_DEC(): New macros, to be used when
      the mutex protecting the counter is to be acquired and released.
      
      MONITOR_ATOMIC_INC(), MONITOR_ATOMIC_DEC(): Define these for
      non-atomic builds as well. Use a new mutex (monitor_mutex) for
      protection in that case.
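
      The two flavours of counter can be sketched with standard C++
      primitives. This is not the srv0mon.h macro code; AtomicCounter
      and MutexCounter are hypothetical classes showing an atomic
      counter and a mutex-protected fallback behind the same interface.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Counter for builds where atomic operations are available.
class AtomicCounter {
public:
    void inc() { value_.fetch_add(1, std::memory_order_relaxed); }
    void dec() { value_.fetch_sub(1, std::memory_order_relaxed); }
    long get() const { return value_.load(std::memory_order_relaxed); }
private:
    std::atomic<long> value_{0};
};

// Fallback: the same interface, protected by a dedicated mutex
// (playing the role that monitor_mutex has in the description).
class MutexCounter {
public:
    void inc() { std::lock_guard<std::mutex> guard(mutex_); ++value_; }
    void dec() { std::lock_guard<std::mutex> guard(mutex_); --value_; }
    long get() const { std::lock_guard<std::mutex> guard(mutex_); return value_; }
private:
    mutable std::mutex mutex_;
    long value_ = 0;
};
```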
      
      row_ins_index_entry_big_rec(): A new function, for inserting the
      externally stored fields (off-page columns) of a clustered index entry.
      
      rb:854 approved by Jimmy Yang

    added:
      mysql-test/suite/innodb/r/innodb-index-online.result
      mysql-test/suite/innodb/t/innodb-index-online-master.opt
      mysql-test/suite/innodb/t/innodb-index-online.test
      mysql-test/suite/sys_vars/r/innodb_online_alter_log_max_size_basic.result
      mysql-test/suite/sys_vars/t/innodb_online_alter_log_max_size_basic.test
      storage/innobase/include/row0log.h
      storage/innobase/row/row0log.cc
    modified:
      mysql-test/r/alter_table.result
      mysql-test/suite/innodb/r/innodb-autoinc-44030.result
      mysql-test/suite/innodb/r/innodb.result
      mysql-test/suite/innodb/r/innodb_16k.result
      mysql-test/suite/innodb/r/innodb_4k.result
      mysql-test/suite/innodb/r/innodb_8k.result
      mysql-test/suite/innodb/r/innodb_bug46000.result
      mysql-test/suite/innodb/r/innodb_bug53591.result
      mysql-test/suite/innodb/r/innodb_index_large_prefix.result
      mysql-test/suite/innodb/r/innodb_index_large_prefix_4k.result
      mysql-test/suite/innodb/r/innodb_index_large_prefix_8k.result
      mysql-test/suite/innodb/r/innodb_monitor.result
      mysql-test/suite/innodb/r/innodb_prefix_index_liftedlimit.result
      mysql-test/suite/innodb/t/innodb-autoinc-44030.test
      mysql-test/suite/innodb/t/innodb_16k.test
      mysql-test/suite/innodb/t/innodb_4k.test
      mysql-test/suite/innodb/t/innodb_8k.test
      mysql-test/suite/innodb/t/innodb_bug53591.test
      mysql-test/suite/innodb/t/innodb_index_large_prefix.test
      mysql-test/suite/innodb/t/innodb_index_large_prefix_4k.test
      mysql-test/suite/innodb/t/innodb_index_large_prefix_8k.test
      mysql-test/suite/innodb/t/innodb_monitor.test
      mysql-test/suite/innodb/t/innodb_prefix_index_liftedlimit.test
      mysql-test/suite/innodb_fts/r/fulltext.result
      mysql-test/suite/innodb_fts/r/innodb-fts-ddl.result
      mysql-test/suite/innodb_fts/r/innodb-fts-fic.result
      mysql-test/suite/innodb_fts/r/innodb_fts_misc.result
      mysql-test/suite/innodb_fts/r/innodb_fts_misc_1.result
      mysql-test/suite/innodb_fts/r/innodb_fts_mutiple_index.result
      mysql-test/suite/innodb_fts/r/innodb_fts_proximity.result
      mysql-test/suite/innodb_fts/r/innodb_fts_transaction.result
      mysql-test/suite/innodb_fts/t/innodb-fts-ddl.test
      mysql-test/suite/innodb_fts/t/innodb-fts-fic.test
      mysql-test/suite/sys_vars/r/innodb_monitor_disable_basic.result
      mysql-test/suite/sys_vars/r/innodb_monitor_enable_basic.result
      mysql-test/suite/sys_vars/r/innodb_monitor_reset_all_basic.result
      mysql-test/suite/sys_vars/r/innodb_monitor_reset_basic.result
      mysql-test/suite/sys_vars/t/innodb_monitor_disable_basic.test
      mysql-test/suite/sys_vars/t/innodb_monitor_enable_basic.test
      mysql-test/suite/sys_vars/t/innodb_monitor_reset_all_basic.test
      mysql-test/suite/sys_vars/t/innodb_monitor_reset_basic.test
      mysql-test/t/alter_table.test
      sql/share/errmsg-utf8.txt
      storage/innobase/CMakeLists.txt
      storage/innobase/btr/btr0btr.cc
      storage/innobase/btr/btr0cur.cc
      storage/innobase/btr/btr0sea.cc
      storage/innobase/dict/dict0crea.cc
      storage/innobase/dict/dict0dict.cc
      storage/innobase/dict/dict0load.cc
      storage/innobase/dict/dict0stats.cc
      storage/innobase/fts/fts0fts.cc
      storage/innobase/fts/fts0opt.cc
      storage/innobase/handler/ha_innodb.cc
      storage/innobase/handler/ha_innodb.h
      storage/innobase/handler/handler0alter.cc
      storage/innobase/handler/i_s.cc
      storage/innobase/ibuf/ibuf0ibuf.cc
      storage/innobase/include/btr0btr.h
      storage/innobase/include/btr0cur.h
      storage/innobase/include/btr0pcur.h
      storage/innobase/include/btr0sea.ic
      storage/innobase/include/db0err.h
      storage/innobase/include/dict0dict.h
      storage/innobase/include/dict0dict.ic
      storage/innobase/include/dict0mem.h
      storage/innobase/include/dict0stats.h
      storage/innobase/include/rem0rec.h
      storage/innobase/include/rem0rec.ic
      storage/innobase/include/row0ins.h
      storage/innobase/include/row0merge.h
      storage/innobase/include/row0mysql.h
      storage/innobase/include/row0types.h
      storage/innobase/include/row0upd.h
      storage/innobase/include/row0upd.ic
      storage/innobase/include/srv0mon.h
      storage/innobase/include/srv0srv.h
      storage/innobase/include/sync0rw.h
      storage/innobase/include/sync0sync.h
      storage/innobase/include/trx0trx.h
      storage/innobase/lock/lock0lock.cc
      storage/innobase/pars/pars0opt.cc
      storage/innobase/pars/pars0pars.cc
      storage/innobase/pars/pars0sym.cc
      storage/innobase/row/row0ftsort.cc
      storage/innobase/row/row0ins.cc
      storage/innobase/row/row0merge.cc
      storage/innobase/row/row0mysql.cc
      storage/innobase/row/row0purge.cc
      storage/innobase/row/row0sel.cc
      storage/innobase/row/row0uins.cc
      storage/innobase/row/row0umod.cc
      storage/innobase/row/row0upd.cc
      storage/innobase/srv/srv0mon.cc
      storage/innobase/srv/srv0srv.cc
      storage/innobase/srv/srv0start.cc
      storage/innobase/sync/sync0sync.cc
      storage/innobase/trx/trx0rec.cc
      storage/innobase/trx/trx0roll.cc
      storage/innobase/ut/ut0ut.cc
 3812 Jon Olav Hauglid	2012-02-01
      WL#5534 Online ALTER, Phase 1
      
      This worklog extends the handler API to add general support
      for online table changes and rewrites the ALTER TABLE
      implementation to use it. The handler API extensions are based
      on Cluster's online ALTER implementation and therefore also
      narrow the gap between the Server and Cluster codebases.
      
      Using this new API, the server supports concurrent reads
      and writes during the main phase of in-place ALTER TABLE
      operations. In this first phase the focus is on index
      creation, but the API is general enough to be extended to
      other ALTER TABLE operations.
      
      The ALTER TABLE syntax is extended to allow users to specify
      concurrency requirements as well as select between the in-place
      and copy algorithms.
      
      Execution of an ALTER TABLE statement has three phases:
      1) Initialization
         In this phase the algorithm and locking level are determined.
         This is done based on requirements from the user, capabilities
         of the given storage engine and characteristics of the
         ALTER TABLE operation.
         The new handler API calls for this phase have default
         implementations that convert them to the old API calls so
         existing engines can be used without changes.
      
      2) Execution
         In this phase, the table definition is changed. This is done
         either directly on the current table (i.e. in-place) or using
         a temporary table copy. The new API calls only concern the
         former. In this patch, a compatibility layer converts the
         new API calls for in-place execution to the old API calls for
         index creation/deletion. However, these implementations will be
         removed later once the InnoDB changes that make use of the
         new API are in place.
      
      3) Finalization
         The final phase updates the data dictionary, installs the
         new .FRM and notifies the storage engine.
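
      The compatibility layer idea can be sketched with a base class
      whose new-style method defaults to the old-style one. The
      signatures below are drastically simplified assumptions (the real
      calls take TABLE and Alter_inplace_info arguments), and
      LegacyEngine and NewEngine are hypothetical engines.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Handler {
    virtual ~Handler() = default;

    // Old API (simplified): returns true on success.
    virtual bool add_index(const std::string&) { return false; }

    // New API (simplified): the default implementation forwards to the
    // old API, so engines that only know add_index() keep working.
    virtual bool inplace_alter_table(const std::string& index_name) {
        return add_index(index_name);
    }
};

// An engine that was never updated: implements only the old API.
struct LegacyEngine : Handler {
    std::vector<std::string> added;
    bool add_index(const std::string& name) override {
        added.push_back(name);
        return true;
    }
};

// An engine that implements the new API directly.
struct NewEngine : Handler {
    std::vector<std::string> altered;
    bool inplace_alter_table(const std::string& name) override {
        altered.push_back(name);
        return true;
    }
};
```

      The server side only ever calls inplace_alter_table() in this
      sketch; which implementation runs depends on what the engine
      overrides.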
      
      The metadata subsystem is extended with a new lock type,
      MDL_SHARED_UPGRADABLE, which is used for phases when ALTER TABLE
      allows concurrent reads and writes.
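
      A reduced sketch of how the new lock type relates to others. Only
      four lock types are modelled and the function is hypothetical;
      the real MDL subsystem has more lock types and separate
      compatibility matrices for granted and waiting locks.

```cpp
#include <cassert>

// Reduced set of metadata lock types (assumption: illustration only).
enum class Mdl { SHARED_READ, SHARED_WRITE, SHARED_UPGRADABLE, EXCLUSIVE };

// Can 'requested' be granted while another connection holds 'held'?
inline bool mdl_compatible(Mdl held, Mdl requested) {
    // An exclusive lock conflicts with everything.
    if (held == Mdl::EXCLUSIVE || requested == Mdl::EXCLUSIVE) {
        return false;
    }
    // MDL_SHARED_UPGRADABLE is not compatible with itself, so only one
    // ALTER TABLE at a time can work on a table.
    if (held == Mdl::SHARED_UPGRADABLE && requested == Mdl::SHARED_UPGRADABLE) {
        return false;
    }
    // Readers and writers may proceed alongside an upgradable lock.
    return true;
}
```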
      
      The patch also updates the partitioning implementation to use the
      new handler API functions.
     @ include/mysql_com.h
        Added deprecation comment for old flag.
     @ mysql-test/suite/rpl/t/rpl_slave_grp_exec.test
        Added missing sync_slave_with_master which caused
        problems made visible by WL#5534.
     @ mysql-test/t/alter_table.test
        Added single connection regression test for WL#5534
     @ mysql-test/t/flush_read_lock.test
        ALTER TABLE on temporary transactional tables is now
        compatible with FTWRL - updated test.
     @ mysql-test/t/innodb_mysql_lock.test
        Added regression test for Bug#11750045 fixed by WL#5534.
     @ mysql-test/t/innodb_mysql_sync.test
        Updated existing tests with changed debug sync points.
        Added multi connection tests for WL#5534.
     @ mysql-test/t/lock.test
        Added MERGE related test coverage for MDL-related regression
        introduced during WL#5534 development.
     @ mysql-test/t/mdl_sync.test
        Updated existing tests with changed debug sync points.
        Updated existing tests with changed ALTER TABLE locking behavior.
        Added test coverage for new MDL_SHARED_UPGRADABLE lock.
     @ mysql-test/t/partition_debug_sync.test
        Updated existing tests with changed debug sync points.
     @ mysql-test/t/truncate_coverage.test
        Updated existing tests with changed debug sync points.
     @ sql/ha_partition.cc
        Replaced old online ALTER API implementation with
        implementation of new online ALTER API.
     @ sql/ha_partition.h
        Replaced old online ALTER API implementation with
        implementation of new online ALTER API.
     @ sql/handler.cc
        handler::print_keydup_error() now takes KEY rather than
        key number as parameter, to account for situations where
        the key number is not known.
        Added compatibility layer implementations of new handler
        API functions for online ALTER TABLE execution. This allows
        storage engines to be used without changes. Note that the parts
        of the compatibility layer that implement execution of
        in-place add/drop index will be removed once InnoDB has been
        updated to use the new API functions.
     @ sql/handler.h
        Added new handler API calls for implementing online ALTER
        TABLE operations.
        Added Alter_inplace_info class which describes the changes
        to be done by ALTER TABLE. It is used as a parameter for
        the new API calls.
        Marked parts of the old API for online add/drop index as
        deprecated.
        Added public wrappers for new handler API functions to allow
        for enforcing asserts etc. regardless of handler implementations
        of the new API functions.
     @ sql/key.cc
        key_unpack() now takes KEY rather than key number as
        parameter, to account for situations where the key number
        is not known.
     @ sql/key.h
        key_unpack() now takes KEY rather than key number as
        parameter, to account for situations where the key number
        is not known.
     @ sql/mdl.cc
        Added new lock type MDL_SHARED_UPGRADABLE. It's an upgradable
        shared metadata lock that allows concurrent reads and writes
        to table data. The lock is not compatible with itself.
        It is used by the first phase of ALTER TABLE and during
        execution of ALTER TABLE operations if supported by
        the storage engine.
        Updates to MDL lock type compatibility matrices.
        Minor adjustments to the API to account for this new lock type.
     @ sql/mdl.h
        Added new lock type MDL_SHARED_UPGRADABLE. It's an upgradable
        shared metadata lock that allows concurrent reads and writes
        to table data. The lock is not compatible with itself.
        It is used by the first phase of ALTER TABLE and during
        execution of ALTER TABLE operations if supported by
        the storage engine.
        Minor adjustments to the API to account for this new lock type.
     @ sql/share/errmsg-utf8.txt
        Add new error messages ER_UNKNOWN_ALTER_ALGORITHM
        and ER_UNKNOWN_ALTER_LOCK used during parsing of
        ALTER TABLE syntax extension.
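
        A rough sketch of how a requested algorithm value might be
        mapped to an enum, with unknown values reported as an error.
        The names AlterAlgorithm and set_requested_algorithm() are
        hypothetical, and the accepted values (DEFAULT, INPLACE, COPY)
        are an assumption based on the syntax documented for MySQL 5.6,
        not taken from this commit.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical enum for the requested ALTER TABLE algorithm.
enum class AlterAlgorithm { Default, Inplace, Copy };

// Case-insensitive comparison of an ASCII keyword.
static bool equals_ignore_case(const std::string &a, const char *b) {
  std::size_t i = 0;
  for (; i < a.size() && b[i] != '\0'; ++i) {
    if (std::toupper(static_cast<unsigned char>(a[i])) !=
        std::toupper(static_cast<unsigned char>(b[i])))
      return false;
  }
  // Equal only if both strings ended at the same position.
  return i == a.size() && b[i] == '\0';
}

// Returns true on error (unknown value), false on success. A caller
// would raise an error such as ER_UNKNOWN_ALTER_ALGORITHM on true.
bool set_requested_algorithm(const std::string &value, AlterAlgorithm *out) {
  assert(out != nullptr);
  if (equals_ignore_case(value, "INPLACE"))
    *out = AlterAlgorithm::Inplace;
  else if (equals_ignore_case(value, "COPY"))
    *out = AlterAlgorithm::Copy;
  else if (equals_ignore_case(value, "DEFAULT"))
    *out = AlterAlgorithm::Default;
  else
    return true;  // unknown keyword: leave *out untouched
  return false;
}
```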
     @ sql/sql_admin.cc
        Updated with ALTER TABLE command flags moving to Alter_info class.
     @ sql/sql_alter.cc
        Moved Alter_info class here.
        Moved ALTER TABLE command flags to Alter_info.
        Updated and documented the individual flags.
        Added support for setting requested levels of concurrency
        and algorithm from new ALTER TABLE syntax.
        Added new Alter_table_ctx class for maintaining runtime
        context during execution of ALTER TABLE statements.
        Added new Sql_cmd class for IMPORT/DISCARD TABLESPACE.
     @ sql/sql_alter.h
        Moved Alter_info class here.
        Moved ALTER TABLE command flags to Alter_info.
        Updated and documented the individual flags.
        Added support for setting requested levels of concurrency
        and algorithm from new ALTER TABLE syntax.
        Added new Alter_table_ctx class for maintaining runtime
        context during execution of ALTER TABLE statements.
        Added new Sql_cmd class for IMPORT/DISCARD TABLESPACE.
     @ sql/sql_base.cc
        Updated with changes to MDL API and introduction of
        MDL_SHARED_UPGRADABLE lock.
        rm_temporary_table() now takes path parameter without extension.
     @ sql/sql_base.h
        open_table_uncached() now takes additional parameter telling
        if table should be opened by storage engine.
        Removed unused Alter_info member from
        Alter_table_prelocking_strategy.
     @ sql/sql_class.h
        enum_enable_or_disable moved to Alter_info.
        open_table_uncached() now takes additional parameter telling
        if table should be opened by storage engine.
     @ sql/sql_insert.cc
        Adjusted parameters to mysql_create_table_no_lock()
        and quick_rm_table().
     @ sql/sql_lex.cc
        Alter_info moved to sql_alter.
     @ sql/sql_lex.h
        Alter_info and ALTER_* flags moved to sql_alter.
        Alter_table_change_level no longer needed, removed.
     @ sql/sql_partition.cc
        Adjusted parameters for prep_alter_part_table(), ALTER TABLE
        context is now stored in Alter_table_ctx rather than passed
        separately.
        Updated with changes to MDL API.
        open_table_uncached() now takes additional parameter telling
        if table should be opened by storage engine.
     @ sql/sql_partition.h
        Adjusted parameters for prep_alter_part_table(), ALTER TABLE
        context is now stored in Alter_table_ctx rather than passed
        separately.
     @ sql/sql_partition_admin.cc
        Adjusted for changes to mysql_compare_tables().
        Updated with changes to MDL API.
     @ sql/sql_table.cc
        build_tmptable_filename() now returns a path to .FRM
        without extension.
        Simplified mysql_compare_tables() since it's now only
        used by partitioning code.
        Made mysql_discard_or_import_tablespace() a global function
        since it's now called by a separate Sql_cmd.
        quick_rm_table() now takes the new NO_HA_TABLE flag
        as parameter for when the table should not be removed in
        the storage engine.
        mysql_create_table_no_lock() is now a wrapper to
        create_table_impl() which does the real work.
        create_table_impl() now returns info about KEY objects
        in the created table, which prevents redundant calls to
        mysql_prepare_create_table().
        mysql_rename_table() now takes NO_HA_TABLE flag if table
        should not be renamed in engine.
        Added fill_alter_inplace_info() which figures out which
        operations are to be done by ALTER TABLE and initializes
        structures to be used during execution.
        Added is_inplace_alter_impossible() which determines
        if the characteristics for the pending ALTER TABLE operations
        are incompatible with in-place execution.
        Added mysql_inplace_alter_table() for execution of 
        in-place ALTER TABLE operations.
        Extracted simple_rename_or_index_change() from existing
        ALTER TABLE operation.
        Rewrite of mysql_alter_table():
        - No thr_lock.c lock is now taken when the table is
          first opened. It is taken later once the metadata
          lock has been upgraded. This avoids deadlock scenarios.
        - Name handling moved to new Alter_table_ctx class.
        - Simplified control flow (e.g. reduced number of gotos).
        - Added handling of new LOCK and ALGORITHM syntax.
        - Extracted code to new simple_rename_or_index_change() function.
        - In-place code generalized and extracted to new 
          mysql_inplace_alter_table() function.
        - For in-place ALTER we now open a TABLE instance from
          the .FRM created for the new version of the table and
          use this TABLE instance for in-place handler API calls.
        - Errors during clean-up phase (e.g. deletion of temporary
          files) are no longer ignored.
     @ sql/sql_table.h
        Made build_tmptable_filename() a global function.
        Made mysql_discard_or_import_tablespace() a global function
        since it's now called by a separate Sql_cmd.
        Added NO_HA_TABLE flag.
        Simplified mysql_compare_tables() since it's now only
        used by partitioning code.
        Adjusted quick_rm_table() parameters.
     @ sql/sql_trigger.cc
        Updated with changes to MDL API.
     @ sql/sql_truncate.cc
        open_table_uncached() now takes additional parameter telling
        if table should be opened by storage engine.
        Updated with changes to MDL API.
     @ sql/sql_yacc.yy
        Added LOCK and ALGORITHM clauses to ALTER TABLE statement.
        Added separate Sql_cmd for ALTER TABLE IMPORT/DISCARD TABLESPACE.
        Changed MDL lock type used for ALTER TABLE to MDL_SHARED_UPGRADABLE.
     @ sql/unireg.cc
        Added no_ha_table flag to rea_create_table() indicating that
        only .FRM(/.PAR) needs to be created. Used by ALTER TABLE.
     @ sql/unireg.h
        Added no_ha_table flag to rea_create_table() indicating that
        only .FRM(/.PAR) needs to be created. Used by ALTER TABLE.
     @ storage/innobase/handler/ha_innodb.cc
        Fixed regression found during WL#5534 development.
        Patch from Marko.
     @ storage/innobase/handler/ha_innodb.h
        Updated with changes in handler API.
     @ storage/innobase/handler/handler0alter.cc
        handler_add_index class renamed inplace_alter_handler_ctx.
        Updated with changes in handler API.
     @ storage/myisammrg/ha_myisammrg.cc
        ALTER TABLE now takes MDL_SHARED_UPGRADABLE lock when
        opening the table to be altered.
     @ unittest/gunit/mdl-t.cc
        Updated tests with changes to MDL API and introduction
        of MDL_SHARED_UPGRADABLE lock.
     @ unittest/gunit/mdl_mytap-t.cc
        Updated tests with changes to MDL API and introduction
        of MDL_SHARED_UPGRADABLE lock.

    modified:
      include/mysql_com.h
      mysql-test/include/commit.inc
      mysql-test/r/alter_table.result
      mysql-test/r/archive.result
      mysql-test/r/commit_1innodb.result
      mysql-test/r/ctype_utf8mb4.result
      mysql-test/r/flush_read_lock.result
      mysql-test/r/innodb_mysql_lock.result
      mysql-test/r/innodb_mysql_sync.result
      mysql-test/r/lock.result
      mysql-test/r/mdl_sync.result
      mysql-test/r/partition_debug_sync.result
      mysql-test/r/truncate_coverage.result
      mysql-test/suite/innodb/r/innodb-index.result
      mysql-test/suite/innodb/r/innodb_index_large_prefix.result
      mysql-test/suite/innodb/r/innodb_index_large_prefix_4k.result
      mysql-test/suite/innodb/r/innodb_index_large_prefix_8k.result
      mysql-test/suite/innodb/r/innodb_mysql.result
      mysql-test/suite/innodb/r/innodb_prefix_index_liftedlimit.result
      mysql-test/suite/perfschema/r/stage_mdl_table.result
      mysql-test/suite/rpl/t/rpl_slave_grp_exec.test
      mysql-test/t/alter_table.test
      mysql-test/t/flush_read_lock.test
      mysql-test/t/innodb_mysql_lock.test
      mysql-test/t/innodb_mysql_sync.test
      mysql-test/t/lock.test
      mysql-test/t/mdl_sync.test
      mysql-test/t/partition_debug_sync.test
      mysql-test/t/truncate_coverage.test
      sql/ha_partition.cc
      sql/ha_partition.h
      sql/handler.cc
      sql/handler.h
      sql/key.cc
      sql/key.h
      sql/mdl.cc
      sql/mdl.h
      sql/share/errmsg-utf8.txt
      sql/sql_admin.cc
      sql/sql_alter.cc
      sql/sql_alter.h
      sql/sql_base.cc
      sql/sql_base.h
      sql/sql_class.h
      sql/sql_insert.cc
      sql/sql_lex.cc
      sql/sql_lex.h
      sql/sql_partition.cc
      sql/sql_partition.h
      sql/sql_partition_admin.cc
      sql/sql_table.cc
      sql/sql_table.h
      sql/sql_trigger.cc
      sql/sql_truncate.cc
      sql/sql_yacc.yy
      sql/unireg.cc
      sql/unireg.h
      storage/innobase/handler/ha_innodb.cc
      storage/innobase/handler/ha_innodb.h
      storage/innobase/handler/handler0alter.cc
      storage/myisammrg/ha_myisammrg.cc
      unittest/gunit/mdl-t.cc
      unittest/gunit/mdl_mytap-t.cc

Diff too large for email (22515 lines, the limit is 10000).
No bundle (reason: useless for push emails).