List:Commits« Previous MessageNext Message »
From:Dmitry Lenev Date:April 13 2010 2:36pm
Subject:bzr commit into mysql-trunk-runtime branch (dlenev:3000) Bug#51263
View as plain text  
#At file:///home/dlenev/src/bzr/mysql-trunk-rt-bg51263-2/ based on revid:dlenev@stripped

 3000 Dmitry Lenev	2010-04-13
      Tentative fix for bug #51263 "Deadlock between transactional
      SELECT and ALTER TABLE ...  REBUILD PARTITION".
      
      ALTER TABLE on InnoDB table (including partitioned tables)
      acquired exclusive locks on rows of table being altered.
      In cases when there was concurrent transaction which did
      locking reads from this table this sometimes led to a
      deadlock which was not detected by MDL subsystem nor by
      InnoDB engine (and was reported only after exceeding
      innodb_lock_wait_timeout).
      
      This problem stemmed from the fact that ALTER TABLE acquired
      TL_WRITE_ALLOW_READ lock on table being altered. This lock
      was interpreted as a write lock and thus for table being
      altered handler::external_lock() method was called with
      F_WRLCK as an argument. As result InnoDB engine treated
      ALTER TABLE as an operation which is going to change data
      and acquired LOCK_X locks on rows being read from old
      version of table.
      
      In case when there was a transaction which already acquired
      SR metadata lock on table and some LOCK_S locks on its rows
      (e.g. by using it in subquery of DML statement) concurrent
      ALTER TABLE was blocked at the moment when it tried to
      acquire LOCK_X lock before reading one of these rows.
      The transaction's attempt to acquire SW metadata lock on
      table being altered led to deadlock, since it had to wait
      for ALTER TABLE to release SNW lock. This deadlock was not
      detected and got resolved only after timeout expiring
      because waiting were happening in two different subsystems.
      
      Similar deadlocks could have occured in other situations.
      
      This patch tries to solve the problem by changing ALTER TABLE
      implementation to use TL_READ_NO_INSERT lock instead of
      TL_WRITE_ALLOW_READ. After this step handler::external_lock()
      is called with F_RDLCK as an argument and InnoDB engine
      correctly interprets ALTER TABLE as operation which only
      reads data from original version of table. Thanks to this
      ALTER TABLE acquires only LOCK_S locks on rows it reads.
      This, in its turn, causes inter-subsystem deadlocks to go
      away, as all potential lock conflicts and thus deadlocks will
      be limited to metadata locking subsystem:
      
      - When ALTER TABLE reads rows from table being altered it
        can't encounter any locks which conflict with LOCK_S row
        locks. There should be no concurrent transactions holding
        LOCK_X row locks. Such a transaction should have been
        acquired SW metadata lock on table first which would have
        conflicted with ALTER's SNW lock.
      - Vice versa, when DML which runs concurrently with ALTER
        TABLE tries to lock row it should be requesting only LOCK_S
        lock which is compatible with locks acquired by ALTER,
        as otherwise such DML must own an SW metadata lock on table
        which would be incompatible with ALTER's SNW lock.
      
      Questions for reviewer are marked by QQ.
     @ mysql-test/r/innodb_mysql_lock2.result
        Added test for bug #51263 "Deadlock between transactional
        SELECT and ALTER TABLE ... REBUILD PARTITION".
     @ mysql-test/suite/rpl_ndb/r/rpl_ndb_binlog_format_errors.result
        Since CREATE TRIGGER no longer acquires write lock on table
        it is no longer interpreted as an operation which modifies
        table data and therefore no longer fails if invoked for
        SBR-only engine in ROW mode.
     @ mysql-test/suite/rpl_ndb/t/rpl_ndb_binlog_format_errors.test
        Since CREATE TRIGGER no longer acquires write lock on table
        it is no longer interpreted as an operation which modifies
        table data and therefore no longer failes if invoked for
        SBR-only engine in ROW mode.
     @ mysql-test/t/innodb_mysql_lock2.test
        Added test for bug #51263 "Deadlock between transactional
        SELECT and ALTER TABLE ... REBUILD PARTITION".
     @ sql/ha_ndbcluster.cc
        Added question for reviewer.
     @ sql/ha_partition.cc
        When ALTER TABLE creates a new partition to be filled from
        other partition lock it in F_WRLCK mode instead of using
        mode which was used for locking the whole table (it is
        F_RDLCK now).
     @ sql/lock.cc
        Added questions for reviewer.
     @ sql/mdl.cc
        Updated outdated comment to reflect current situation.
     @ sql/sql_base.cc
        Added question for reviewer.
     @ sql/sql_table.cc
        mysql_admin_table():
          Use TL_WRITE_ALLOW_WRITE lock type instead of
          TL_WRITE_ALLOW_READ to determine that we need to acquire
          upgradable metadata lock. This should allow to completely
          get rid of TL_WRITE_ALLOW_READ in long term.
        mysql_recreate_table():
          ALTER TABLE now requires TL_READ_NO_INSERT thr_lock.c lock
          instead of TL_WRITE_ALLOW_READ.
     @ sql/sql_trigger.cc
        Changed CREATE/DROP TRIGGER implementation to use
        TL_READ_NO_INSERT lock instead of TL_WRITE_ALLOW_READ lock.
        The latter is no longer necessary since:
        a) We now can rely on metadata locks to achieve proper
           isolation between two DDL statements or DDL and DML
           statements.
        b) This statement does not change any data in table so there
           is no need to inform storage engine about it.
     @ sql/sql_yacc.yy
        Changed implementation of ALTER TABLE (and CREATE/DROP INDEX
        as a consequence) to use TL_READ_NO_INSERT lock instead of
        TL_WRITE_ALLOW_READ lock. This is possible since:
        a) We now can rely on metadata locks to achieve proper
           isolation between two DDL statements or DDL and DML
           statements.
        b) This statement only reads data in table being open.
           We write data only to the new version of table and
           then replace with it old version of table under
           X metadata lock.
        
        Thanks to this change InnoDB will no longer acquire LOCK_X
        locks on rows being read by ALTER TABLE (instead LOCK_S
        locks will be acquired) and thus cause of bug #51263
        "Deadlock between transactional SELECT and ALTER TABLE ...
        REBUILD PARTITION" is removed.
        
        Did the similar change for CREATE TRIGGER (see comments
        for sql_trigger.cc for details).

    added:
      mysql-test/r/innodb_mysql_lock2.result
      mysql-test/t/innodb_mysql_lock2.test
    modified:
      mysql-test/suite/rpl_ndb/r/rpl_ndb_binlog_format_errors.result
      mysql-test/suite/rpl_ndb/t/rpl_ndb_binlog_format_errors.test
      sql/ha_ndbcluster.cc
      sql/ha_partition.cc
      sql/lock.cc
      sql/mdl.cc
      sql/sql_base.cc
      sql/sql_table.cc
      sql/sql_trigger.cc
      sql/sql_yacc.yy
=== added file 'mysql-test/r/innodb_mysql_lock2.result'
--- a/mysql-test/r/innodb_mysql_lock2.result	1970-01-01 00:00:00 +0000
+++ b/mysql-test/r/innodb_mysql_lock2.result	2010-04-13 14:36:37 +0000
@@ -0,0 +1,65 @@
+#
+# Test for bug#51263 "Deadlock between transactional SELECT
+# and ALTER TABLE ... REBUILD PARTITION".
+#
+drop table if exists t1, t2;
+create table t1 (i int auto_increment not null primary key) engine=innodb;
+create table t2 (i int) engine=innodb;
+insert into t1 values (1), (2), (3), (4), (5);
+begin;
+# Acquire SR metadata lock on t1 and LOCK_S row-locks on its rows.
+insert into t2 select count(*) from t1;
+# Switching to connection 'con1'.
+# Sending:
+alter table t1 add column j int;
+# Switching to connection 'default'.
+# Wait until ALTER is blocked because it tries to upgrade SNW
+# metadata lock to X lock.
+# It should not be blocked during copying data to new version of
+# table as it acquires LOCK_S locks on rows of old version, which
+# are compatible with locks acquired by connection 'con1'.
+# The below statement will deadlock because it will try to acquire
+# SW lock on t1, which will conflict with ALTER's SNW lock. And
+# ALTER will be waiting for this connection to release its SR lock.
+# This deadlock should be detected by an MDL subsystem and this
+# statement should be aborted with an appropriate error.
+insert into t1 values (6);
+ERROR 40001: Deadlock found when trying to get lock; try restarting transaction
+# Unblock ALTER TABLE.
+commit;
+# Switching to connection 'con1'.
+# Reaping ALTER TABLE.
+# Switching to connection 'default'.
+#
+# Now test for scenario in which bug was reported originally.
+#
+drop tables t1, t2;
+create table t1 (i int auto_increment not null primary key) engine=innodb
+partition by hash (i) partitions 4;
+create table t2 (i int) engine=innodb;
+insert into t1 values (1), (2), (3), (4), (5);
+begin;
+# Acquire SR metadata lock on t1.
+select * from t1;
+i
+1
+2
+3
+4
+5
+# Switching to connection 'con1'.
+# Sending:
+alter table t1 rebuild partition p0;
+# Switching to connection 'default'.
+# Wait until ALTER is blocked because of active SR lock.
+# The below statement should succeed as transaction
+# has SR metadata lock on t1 and only going to read
+# rows from it.
+insert into t2 select count(*) from t1;
+# Unblock ALTER TABLE.
+commit;
+# Switching to connection 'con1'.
+# Reaping ALTER TABLE.
+# Switching to connection 'default'.
+# Clean-up.
+drop tables t1, t2;

=== modified file 'mysql-test/suite/rpl_ndb/r/rpl_ndb_binlog_format_errors.result'
--- a/mysql-test/suite/rpl_ndb/r/rpl_ndb_binlog_format_errors.result	2010-02-05 17:04:38 +0000
+++ b/mysql-test/suite/rpl_ndb/r/rpl_ndb_binlog_format_errors.result	2010-04-13 14:36:37 +0000
@@ -43,9 +43,9 @@ SELECT * FROM t /* Should be empty */;
 a
 * Modify both row-only and stmt-only table
 CREATE TRIGGER trig_2 AFTER INSERT ON t_stmt FOR EACH ROW BEGIN INSERT INTO t_row VALUES(1); END;
-ERROR HY000: Cannot execute statement: binlogging impossible since BINLOG_FORMAT = ROW and at least one table uses a storage engine limited to statement-logging.
 INSERT INTO t_stmt VALUES (1);
-ERROR HY000: Cannot execute statement: binlogging impossible since BINLOG_FORMAT = ROW and at least one table uses a storage engine limited to statement-logging.
+ERROR HY000: Cannot execute statement: binlogging impossible since both row-incapable engines and statement-incapable engines are involved.
+DROP trigger trig_2;
 SELECT * FROM t_stmt /* should be empty */;
 a
 * Stmt-only table and binlog_format=row

=== modified file 'mysql-test/suite/rpl_ndb/t/rpl_ndb_binlog_format_errors.test'
--- a/mysql-test/suite/rpl_ndb/t/rpl_ndb_binlog_format_errors.test	2010-02-05 17:04:38 +0000
+++ b/mysql-test/suite/rpl_ndb/t/rpl_ndb_binlog_format_errors.test	2010-04-13 14:36:37 +0000
@@ -97,10 +97,10 @@ SELECT * FROM t_self_logging /* Should b
 SELECT * FROM t /* Should be empty */;
 
 --echo * Modify both row-only and stmt-only table
---error ER_BINLOG_ROW_MODE_AND_STMT_ENGINE
 --eval CREATE TRIGGER trig_2 AFTER INSERT ON t_stmt FOR EACH ROW BEGIN INSERT INTO t_row VALUES(1); END
---error ER_BINLOG_ROW_MODE_AND_STMT_ENGINE
+--error ER_BINLOG_ROW_ENGINE_AND_STMT_ENGINE
 INSERT INTO t_stmt VALUES (1);
+DROP trigger trig_2;
 SELECT * FROM t_stmt /* should be empty */;
 
 --echo * Stmt-only table and binlog_format=row

=== added file 'mysql-test/t/innodb_mysql_lock2.test'
--- a/mysql-test/t/innodb_mysql_lock2.test	1970-01-01 00:00:00 +0000
+++ b/mysql-test/t/innodb_mysql_lock2.test	2010-04-13 14:36:37 +0000
@@ -0,0 +1,111 @@
+--source include/have_innodb.inc
+# This test requires MIXED or STATEMENT mode of binary log to
+# be turned on as otherwise LOCK_S row locks won't be acquired
+# for DML and test for bug#51263 won't trigger execution path
+# on which this bug was encountered.
+--source include/have_binlog_format_statement.inc
+# Original test case for bug#51263 needs partitioning.
+--source include/have_partition.inc
+# Save the initial number of concurrent sessions.
+--source include/count_sessions.inc
+
+--echo #
+--echo # Test for bug#51263 "Deadlock between transactional SELECT
+--echo # and ALTER TABLE ... REBUILD PARTITION".
+--echo #
+connect (con1,localhost,root,,test,,);
+connection default;
+--disable_warnings
+drop table if exists t1, t2;
+--enable_warnings
+create table t1 (i int auto_increment not null primary key) engine=innodb;
+create table t2 (i int) engine=innodb;
+insert into t1 values (1), (2), (3), (4), (5);
+
+begin;
+--echo # Acquire SR metadata lock on t1 and LOCK_S row-locks on its rows.
+insert into t2 select count(*) from t1;
+
+--echo # Switching to connection 'con1'.
+connection con1;
+--echo # Sending:
+--send alter table t1 add column j int
+
+--echo # Switching to connection 'default'.
+connection default;
+--echo # Wait until ALTER is blocked because it tries to upgrade SNW
+--echo # metadata lock to X lock.
+--echo # It should not be blocked during copying data to new version of
+--echo # table as it acquires LOCK_S locks on rows of old version, which
+--echo # are compatible with locks acquired by connection 'con1'.
+let $wait_condition=
+  select count(*) = 1 from information_schema.processlist where state =
+    "Waiting for table" and info = "alter table t1 add column j int";
+--source include/wait_condition.inc
+
+--echo # The below statement will deadlock because it will try to acquire
+--echo # SW lock on t1, which will conflict with ALTER's SNW lock. And
+--echo # ALTER will be waiting for this connection to release its SR lock.
+--echo # This deadlock should be detected by an MDL subsystem and this
+--echo # statement should be aborted with an appropriate error.
+--error ER_LOCK_DEADLOCK
+insert into t1 values (6);
+--echo # Unblock ALTER TABLE.
+commit;
+
+--echo # Switching to connection 'con1'.
+connection con1;
+--echo # Reaping ALTER TABLE.
+--reap
+
+--echo # Switching to connection 'default'.
+connection default;
+
+--echo #
+--echo # Now test for scenario in which bug was reported originally.
+--echo #
+drop tables t1, t2;
+create table t1 (i int auto_increment not null primary key) engine=innodb
+  partition by hash (i) partitions 4;
+create table t2 (i int) engine=innodb;
+insert into t1 values (1), (2), (3), (4), (5);
+
+begin;
+--echo # Acquire SR metadata lock on t1.
+select * from t1;
+
+--echo # Switching to connection 'con1'.
+connection con1;
+--echo # Sending:
+--send alter table t1 rebuild partition p0
+
+--echo # Switching to connection 'default'.
+connection default;
+--echo # Wait until ALTER is blocked because of active SR lock.
+let $wait_condition=
+  select count(*) = 1 from information_schema.processlist
+  where state = "Waiting for table" and info = "alter table t1 rebuild partition p0";
+--source include/wait_condition.inc
+
+--echo # The below statement should succeed as transaction
+--echo # has SR metadata lock on t1 and only going to read
+--echo # rows from it.
+insert into t2 select count(*) from t1;
+--echo # Unblock ALTER TABLE.
+commit;
+
+--echo # Switching to connection 'con1'.
+connection con1;
+--echo # Reaping ALTER TABLE.
+--reap
+
+--echo # Switching to connection 'default'.
+connection default;
+disconnect con1;
+--echo # Clean-up.
+drop tables t1, t2;
+
+
+# Check that all connections opened by test cases in this file are really
+# gone so execution of other tests won't be affected by their presence.
+--source include/wait_until_count_sessions.inc

=== modified file 'sql/ha_ndbcluster.cc'
--- a/sql/ha_ndbcluster.cc	2010-03-23 21:04:07 +0000
+++ b/sql/ha_ndbcluster.cc	2010-04-13 14:36:37 +0000
@@ -1623,6 +1623,9 @@ void ha_ndbcluster::release_metadata(THD
 
 int ha_ndbcluster::get_ndb_lock_type(enum thr_lock_type type)
 {
+  /*
+    QQ: Should we also try to take LM_Read lock for ALTER?
+  */
   if (type >= TL_WRITE_ALLOW_WRITE)
     return NdbOperation::LM_Exclusive;
   if (type ==  TL_READ_WITH_SHARED_LOCKS ||

=== modified file 'sql/ha_partition.cc'
--- a/sql/ha_partition.cc	2010-03-03 14:58:12 +0000
+++ b/sql/ha_partition.cc	2010-04-13 14:36:37 +0000
@@ -1300,7 +1300,7 @@ int ha_partition::prepare_new_partition(
     assumes that external_lock() is last call that may fail here.
     Otherwise see description for cleanup_new_partition().
   */
-  if ((error= file->ha_external_lock(ha_thd(), m_lock_type)))
+  if ((error= file->ha_external_lock(ha_thd(), F_WRLCK)))
     goto error_external_lock;
   DBUG_PRINT("info", ("partition %s external locked", part_name));
 

=== modified file 'sql/lock.cc'
--- a/sql/lock.cc	2010-03-23 21:04:07 +0000
+++ b/sql/lock.cc	2010-04-13 14:36:37 +0000
@@ -151,6 +151,10 @@ lock_tables_check(THD *thd, TABLE **tabl
 
     if (t->reginfo.lock_type >= TL_WRITE_ALLOW_WRITE)
     {
+      /*
+        QQ: should we do anything here about the fact that ALTER TABLE
+            now acquires TL_READ_NO_INSERT?
+      */
       if (t->s->table_category == TABLE_CATEGORY_SYSTEM)
         system_count++;
 
@@ -406,6 +410,12 @@ void mysql_unlock_read_tables(THD *thd, 
   uint i,found;
   DBUG_ENTER("mysql_unlock_read_tables");
 
+  /*
+    QQ: Can we get rid of this TL_WRITE_ALLOW_READ usage?
+        Code below also releases TL_WRITE_ALLOW_WRITE locks
+        which could have been acquired by SELECT ... FOR
+        UPDATE.
+  */
   /* Move all write locks first */
   THR_LOCK_DATA **lock=sql_lock->locks;
   for (i=found=0 ; i < sql_lock->lock_count ; i++)

=== modified file 'sql/mdl.cc'
--- a/sql/mdl.cc	2010-03-07 17:50:47 +0000
+++ b/sql/mdl.cc	2010-04-13 14:36:37 +0000
@@ -1689,9 +1689,8 @@ err:
         shared mode).
 
   @note There can be only one upgrader for a lock or we will have deadlock.
-        This invariant is ensured by code outside of metadata subsystem usually
-        by obtaining some sort of exclusive table-level lock (e.g. TL_WRITE,
-        TL_WRITE_ALLOW_READ) before performing upgrade of metadata lock.
+        This invariant is ensured by the fact that upgradeable locks SNW
+        and SNRW are not compatible with each other and themselves.
 
   @retval FALSE  Success
   @retval TRUE   Failure (thread was killed)

=== modified file 'sql/sql_base.cc'
--- a/sql/sql_base.cc	2010-03-23 21:04:07 +0000
+++ b/sql/sql_base.cc	2010-04-13 14:36:37 +0000
@@ -4984,6 +4984,11 @@ static bool check_lock_and_start_stmt(TH
   int error;
   DBUG_ENTER("check_lock_and_start_stmt");
 
+  /*
+    QQ: should we adjust this check? Note that the second part
+        of condition implicitly checks that engines do not
+        downgrade write locks for LOCK TABLES.
+  */
   if ((int) lock_type >= (int) TL_WRITE_ALLOW_READ &&
       (int) table->reginfo.lock_type < (int) TL_WRITE_ALLOW_READ)
   {

=== modified file 'sql/sql_table.cc'
--- a/sql/sql_table.cc	2010-03-23 21:04:07 +0000
+++ b/sql/sql_table.cc	2010-04-13 14:36:37 +0000
@@ -4661,7 +4661,7 @@ static bool mysql_admin_table(THD* thd, 
       To allow concurrent execution of read-only operations we acquire
       weak metadata lock for them.
     */
-    table->mdl_request.set_type((lock_type >= TL_WRITE_ALLOW_READ) ?
+    table->mdl_request.set_type((lock_type >= TL_WRITE_ALLOW_WRITE) ?
                                 MDL_SHARED_NO_READ_WRITE : MDL_SHARED_READ);
     /* open only one table from local list of command */
     {
@@ -7862,7 +7862,7 @@ bool mysql_recreate_table(THD *thd, TABL
   /* Same applies to MDL ticket. */
   table_list->mdl_request.ticket= NULL;
   /* Set lock type which is appropriate for ALTER TABLE. */
-  table_list->lock_type= TL_WRITE_ALLOW_READ;
+  table_list->lock_type= TL_READ_NO_INSERT;
   /* Same applies to MDL request. */
   table_list->mdl_request.set_type(MDL_SHARED_NO_WRITE);
 

=== modified file 'sql/sql_trigger.cc'
--- a/sql/sql_trigger.cc	2010-03-23 21:04:07 +0000
+++ b/sql/sql_trigger.cc	2010-04-13 14:36:37 +0000
@@ -470,7 +470,7 @@ bool mysql_create_or_drop_trigger(THD *t
   else
   {
     tables->table= open_n_lock_single_table(thd, tables,
-                                            TL_WRITE_ALLOW_READ, 0);
+                                            TL_READ_NO_INSERT, 0);
     if (! tables->table)
       goto end;
     tables->table->use_all_columns();

=== modified file 'sql/sql_yacc.yy'
--- a/sql/sql_yacc.yy	2010-03-23 21:04:07 +0000
+++ b/sql/sql_yacc.yy	2010-04-13 14:36:37 +0000
@@ -689,7 +689,7 @@ static bool add_create_index_prepare (LE
   lex->sql_command= SQLCOM_CREATE_INDEX;
   if (!lex->current_select->add_table_to_list(lex->thd, table, NULL,
                                               TL_OPTION_UPDATING,
-                                              TL_WRITE_ALLOW_READ,
+                                              TL_READ_NO_INSERT,
                                               MDL_SHARED_NO_WRITE))
     return TRUE;
   lex->alter_info.reset();
@@ -6151,7 +6151,7 @@ alter:
             lex->duplicates= DUP_ERROR; 
             if (!lex->select_lex.add_table_to_list(thd, $4, NULL,
                                                    TL_OPTION_UPDATING,
-                                                   TL_WRITE_ALLOW_READ,
+                                                   TL_READ_NO_INSERT,
                                                    MDL_SHARED_NO_WRITE))
               MYSQL_YYABORT;
             lex->col_list.empty();
@@ -10130,7 +10130,7 @@ drop:
             lex->alter_info.drop_list.push_back(ad);
             if (!lex->current_select->add_table_to_list(lex->thd, $5, NULL,
                                                         TL_OPTION_UPDATING,
-                                                        TL_WRITE_ALLOW_READ,
+                                                        TL_READ_NO_INSERT,
                                                         MDL_SHARED_NO_WRITE))
               MYSQL_YYABORT;
           }
@@ -14063,7 +14063,7 @@ trigger_tail:
             if (!lex->select_lex.add_table_to_list(YYTHD, $9,
                                                    (LEX_STRING*) 0,
                                                    TL_OPTION_UPDATING,
-                                                   TL_WRITE_ALLOW_READ,
+                                                   TL_READ_NO_INSERT,
                                                    MDL_SHARED_NO_WRITE))
               MYSQL_YYABORT;
           }


Attachment: [text/bzr-bundle] bzr/dlenev@mysql.com-20100413143637-vgogq5fc8bxmkqv5.bundle
Thread
bzr commit into mysql-trunk-runtime branch (dlenev:3000) Bug#51263Dmitry Lenev13 Apr