List: Commits
From: Mats Kindahl
Date: December 1 2006 8:43am
Subject:bk commit into 5.1 tree (mats:1.2309) BUG#22865
Below is the list of changes that have just been committed into a local
5.1 repository of mats. When mats does a push these changes will
be propagated to the main repository and, within 24 hours after the
push, to the public repository.
For information on how to access the public repository
see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html

ChangeSet@stripped, 2006-12-01 09:43:06+01:00, mats@stripped +11 -0
  BUG#22864 (Rollback following CREATE... SELECT discards 'CREATE TABLE'
  from log):
  When row-based logging is used, the CREATE-SELECT is written as two
  parts: as a CREATE TABLE statement and as the rows for the table. For
  both transactional and non-transactional tables, the CREATE TABLE
  statement was written to the transaction cache, as were the rows, and
  on statement end, the entire transaction cache was written to the binary
  log if the table was non-transactional. For transactional tables, the
  events were kept in the transaction cache until end of transaction (or
  end of statement, for statements that were not part of a transaction).
  
  For the case when AUTOCOMMIT=0 and we are creating a transactional table
  using a create select, we would then keep the CREATE TABLE statement and
  the rows for the CREATE-SELECT, while executing the following statements.
  On a rollback, the transaction cache would then be cleared, which would
  also remove the CREATE TABLE statement. Hence no table would be created
  on the slave, while there is an empty table on the master.
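
The failure mode above can be reproduced in a toy model. This is a minimal sketch, not MySQL code: `ToySession`, `create_select_old`, `create_select_new` and `logged_create` are hypothetical names, and events are plain strings rather than real log events.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model (assumption): one session with a binary log and a transaction cache.
struct ToySession {
  std::vector<std::string> binlog;     // events the slave will eventually see
  std::vector<std::string> trx_cache;  // pending events of the open transaction

  // Behaviour before the patch: CREATE TABLE and rows both go into the cache.
  void create_select_old(const std::string &table) {
    trx_cache.push_back("CREATE TABLE " + table);
    trx_cache.push_back("ROWS " + table);
  }

  // Behaviour after the patch: CREATE TABLE goes straight to the binary log,
  // only the rows of a transactional table stay in the cache.
  void create_select_new(const std::string &table) {
    binlog.push_back("CREATE TABLE " + table);
    trx_cache.push_back("ROWS " + table);
  }

  // Commit moves the cached events to the binary log.
  void commit() {
    binlog.insert(binlog.end(), trx_cache.begin(), trx_cache.end());
    trx_cache.clear();
  }

  // Rollback discards everything that is still in the cache.
  void rollback() { trx_cache.clear(); }
};

// Returns true if the binary log contains a CREATE TABLE for `table`.
inline bool logged_create(const ToySession &s, const std::string &table) {
  for (const std::string &ev : s.binlog)
    if (ev == "CREATE TABLE " + table) return true;
  return false;
}

// Self-check of the two scenarios with AUTOCOMMIT=0 followed by ROLLBACK.
inline void check_rollback_scenarios() {
  ToySession before_patch;
  before_patch.create_select_old("t2");
  before_patch.rollback();
  assert(!logged_create(before_patch, "t2"));  // the slave never creates t2

  ToySession after_patch;
  after_patch.create_select_new("t2");
  after_patch.rollback();
  assert(logged_create(after_patch, "t2"));    // the slave gets an empty t2
  assert(after_patch.binlog.size() == 1);      // the rows were discarded
}
```

In the model, only the second variant leaves the slave with the same empty table that the master has after the rollback.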
  
  This relates to BUG#22865 where the table being created exists on the
  master, but not on the slave during insertion of rows into the newly
  created table. This occurs since the CREATE TABLE statement was still
  in the transaction cache until the statement finished executing, and
  possibly longer if the table was transactional.
  
  This patch changes the behaviour of the CREATE-SELECT statement by
  writing the CREATE TABLE statement directly to the binary log as soon
  as it is created, then the rows that are inserted into the table, and
  on error a DROP table is written to the binary log, provided:
  - the table was created,
  - the table didn't exist prior to the statement,
  - we are replicating row-based, and
  - it was not a temporary table.
  
  For transactional tables, the intermediate rows are not written to the
  binary log, but for non-transactional tables, the rows are written to
  the binary log immediately (hence, changes to non-transactional tables
  are propagated to the slave as soon as enough rows are collected to form
  a rows event).
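
How rows are routed can be sketched as follows. This is an illustration only: `RowRouter`, `rows_per_event` and `flush_pending` are hypothetical, and the fixed row count per event stands in for whatever limit the server applies when it decides that a rows event is complete.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy router (assumption): packs rows into events and sends each event
// either to the binary log or to the transaction cache.
struct RowRouter {
  bool transactional;
  std::size_t rows_per_event;          // hypothetical threshold
  std::vector<int> pending;            // rows not yet packed into an event
  std::vector<std::string> binlog;
  std::vector<std::string> trx_cache;

  RowRouter(bool is_transactional, std::size_t threshold)
      : transactional(is_transactional), rows_per_event(threshold) {}

  void write_row(int row) {
    pending.push_back(row);
    if (pending.size() >= rows_per_event) flush_pending();
  }

  // Closes the current event; also called at statement end for a partial one.
  void flush_pending() {
    if (pending.empty()) return;
    std::string ev = "Write_rows(" + std::to_string(pending.size()) + ")";
    (transactional ? trx_cache : binlog).push_back(ev);
    pending.clear();
  }
};

// Self-check: non-transactional rows reach the binary log before statement end.
inline void check_row_routing() {
  RowRouter myisam(false, 2);
  myisam.write_row(1);
  myisam.write_row(2);
  assert(myisam.binlog.size() == 1);   // first event already visible
  myisam.write_row(3);
  myisam.flush_pending();
  assert(myisam.binlog.size() == 2);
  assert(myisam.trx_cache.empty());
}
```

For a transactional table the same calls leave the binary log untouched and fill only the cache.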

  mysql-test/r/binlog_row_insert_select.result@stripped, 2006-12-01 09:42:55+01:00, mats@stripped +7 -0
    Result change

  mysql-test/r/binlog_row_mix_innodb_myisam.result@stripped, 2006-12-01 09:42:55+01:00, mats@stripped +6 -0
    Result change

  mysql-test/r/rpl_row_create_table.result@stripped, 2006-12-01 09:42:56+01:00, mats@stripped +119 -11
    Result change

  mysql-test/t/rpl_row_create_table.test@stripped, 2006-12-01 09:42:56+01:00, mats@stripped +58 -6
    Requiring InnoDB for slave as well.
    Adding test CREATE-SELECT that is rolled back explicitly.
    Changing binlog positions.

  sql/log.cc@stripped, 2006-12-01 09:42:56+01:00, mats@stripped +129 -29
    Adding helper class to handle lock/unlock of mutexes using RAII.
    Factoring out code into write_cache() function to write the
      transaction cache to the binary log.
    Adding function THD::binlog_flush_transaction_cache() to flush the
      transaction cache to the binary log file.
    Factoring out code into binlog_set_stmt_begin() to set the beginning
      of statement savepoint.
    Clearing the before-statement position when the transaction cache is
     truncated to a point before it, since the position would then be
     out of range.

  sql/log.h@stripped, 2006-12-01 09:42:56+01:00, mats@stripped +2 -0
    Adding method MYSQL_BIN_LOG::write_cache()

  sql/log_event.h@stripped, 2006-12-01 09:42:57+01:00, mats@stripped +9 -3
    Replicating OPTION_NOT_AUTOCOMMIT flag (see changeset comment)

  sql/mysql_priv.h@stripped, 2006-12-01 09:42:57+01:00, mats@stripped +33 -33
    Although left-shifting signed integer values is well-defined,
    it has potential for strange errors. Using unsigned long long
    instead of signed long long since this is the type of the options
    flags.
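
A small sketch of the idea, assuming a 64-bit options word: `options_t`, `FLAG_NOT_AUTOCOMMIT`, `FLAG_FORCE_MYISAM` and `has_flag` are hypothetical names, and only the bit positions 19 and 32 are taken from the definitions in this patch.

```cpp
#include <cassert>

// Hypothetical alias for the type of the options flags.
typedef unsigned long long options_t;

// Every flag is built from an unsigned 64-bit one, so shifts, masks and
// complements never involve a sign bit.
const options_t FLAG_NOT_AUTOCOMMIT = 1ULL << 19;
const options_t FLAG_FORCE_MYISAM   = 1ULL << 32;  // needs more than 32 bits

inline bool has_flag(options_t options, options_t flag) {
  return (options & flag) != 0;
}

// Self-check: setting and clearing a high bit round-trips cleanly.
inline void check_flags() {
  options_t options = 0;
  options |= FLAG_FORCE_MYISAM;
  assert(has_flag(options, FLAG_FORCE_MYISAM));
  assert(!has_flag(options, FLAG_NOT_AUTOCOMMIT));
  options &= ~FLAG_FORCE_MYISAM;  // complement is computed in unsigned 64 bits
  assert(options == 0);
}
```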

  sql/slave.cc@stripped, 2006-12-01 09:42:57+01:00, mats@stripped +6 -0
    Adding printout of transaction-critical thread flags.

  sql/sql_class.h@stripped, 2006-12-01 09:42:58+01:00, mats@stripped +2 -0
    Adding function THD::binlog_flush_transaction_cache()
    Adding function THD::binlog_set_stmt_begin()

  sql/sql_insert.cc@stripped, 2006-12-01 09:42:59+01:00, mats@stripped +90 -32
    Writing CREATE TABLE directly to binary log at beginning of a CREATE-SELECT.
    Writing DROP TABLE statement to binary log on error *after* rolling back the statement.
    Remove changes to OPTION_STATUS_NO_TRANS_UPDATE since CREATE-SELECT is now separated
    into three sections:
    - CREATE (written non-transactionally)
    - rows (written according to the storage engine's transactionality)
    - and DROP (written non-transactionally)

# This is a BitKeeper patch.  What follows are the unified diffs for the
# set of deltas contained in the patch.  The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User:	mats
# Host:	kindahl-laptop.dnsalias.net
# Root:	/home/bk/b22864-mysql-5.1-new-rpl

--- 1.238/sql/log.cc	2006-12-01 09:43:16 +01:00
+++ 1.239/sql/log.cc	2006-12-01 09:43:16 +01:00
@@ -82,6 +82,41 @@
 }
 
 /*
+  Helper class to hold a mutex for the duration of the
+  block.
+
+  Eliminates the need for explicit unlocking of mutexes on, e.g.,
+  error returns.  On passing a null pointer, the sentry will not do
+  anything.
+ */
+class Mutex_sentry
+{
+public:
+  Mutex_sentry(pthread_mutex_t *mutex)
+    : m_mutex(mutex)
+  {
+    if (m_mutex)
+      pthread_mutex_lock(mutex);
+  }
+
+  ~Mutex_sentry()
+  {
+    if (m_mutex)
+      pthread_mutex_unlock(m_mutex);
+#ifndef DBUG_OFF
+    m_mutex= 0;
+#endif
+  }
+
+private:
+  pthread_mutex_t *m_mutex;
+
+  // It's not allowed to copy this object in any way
+  Mutex_sentry(Mutex_sentry const&);
+  void operator=(Mutex_sentry const&);
+};
+
+/*
   Helper class to store binary log transaction data.
 */
 class binlog_trx_data {
@@ -121,11 +156,17 @@
    */
   void truncate(my_off_t pos)
   {
+    DBUG_PRINT("info", ("truncating to position %lu", pos));
+    DBUG_PRINT("info", ("before_stmt_pos=%lu", before_stmt_pos));
 #ifdef HAVE_ROW_BASED_REPLICATION
     delete pending();
     set_pending(0);
 #endif
     reinit_io_cache(&trans_log, WRITE_CACHE, pos, 0, 0);
+#ifdef HAVE_ROW_BASED_REPLICATION
+    if (pos < before_stmt_pos)
+      before_stmt_pos= MY_OFF_T_UNDEF;
+#endif
   }
 
   /*
@@ -1416,12 +1457,11 @@
 
       If rolling back a statement in a transaction, we truncate the
       transaction cache to remove the statement.
-
      */
     if (all || !(thd->options & (OPTION_BEGIN | OPTION_NOT_AUTOCOMMIT)))
       trx_data->reset();
-    else
-      trx_data->truncate(trx_data->before_stmt_pos); // ...statement
+    else                                        // ...statement
+      trx_data->truncate(trx_data->before_stmt_pos);
 
     /*
       We need to step the table map version on a rollback to ensure
@@ -2010,7 +2050,7 @@
           goto err;
 
     /* command_type, thread_id */
-    length= my_snprintf(buff, 32, "%5ld ", thread_id);
+    length= my_snprintf(buff, 32, "%5ld ", static_cast<long>(thread_id));
 
     if (my_b_write(&log_file, (byte*) buff, length))
       goto err;
@@ -3338,18 +3378,7 @@
   if (trx_data == NULL ||
       trx_data->before_stmt_pos == MY_OFF_T_UNDEF)
   {
-    /*
-      The call to binlog_trans_log_savepos() might create the trx_data
-      structure, if it didn't exist before, so we save the position
-      into an auto variable and then write it into the transaction
-      data for the binary log (i.e., trx_data).
-    */
-    my_off_t pos= 0;
-    binlog_trans_log_savepos(this, &pos);
-    trx_data= (binlog_trx_data*) ha_data[binlog_hton->slot];
-
-    trx_data->before_stmt_pos= pos;
-
+    this->binlog_set_stmt_begin();
     if (options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN))
       trans_register_ha(this, TRUE, binlog_hton);
     trans_register_ha(this, FALSE, binlog_hton);
@@ -3357,6 +3386,51 @@
   DBUG_VOID_RETURN;
 }
 
+void THD::binlog_set_stmt_begin() {
+  binlog_trx_data *trx_data=
+    (binlog_trx_data*) ha_data[binlog_hton->slot];
+
+  /*
+    The call to binlog_trans_log_savepos() might create the trx_data
+    structure, if it didn't exist before, so we save the position
+    into an auto variable and then write it into the transaction
+    data for the binary log (i.e., trx_data).
+  */
+  my_off_t pos= 0;
+  binlog_trans_log_savepos(this, &pos);
+  trx_data= (binlog_trx_data*) ha_data[binlog_hton->slot];
+  trx_data->before_stmt_pos= pos;
+}
+
+int THD::binlog_flush_transaction_cache()
+{
+  DBUG_ENTER("binlog_flush_transaction_cache");
+  binlog_trx_data *trx_data= (binlog_trx_data*) ha_data[binlog_hton->slot];
+  DBUG_PRINT("enter", ("trx_data=0x%lu", trx_data));
+  if (trx_data)
+    DBUG_PRINT("enter", ("trx_data->before_stmt_pos=%u",
+                         trx_data->before_stmt_pos));
+
+  /*
+    Write the transaction cache to the binary log.  We don't flush and
+    sync the log file since we don't know if more will be written to
+    it. If the caller wants the log file sync:ed, the caller has to do
+    it.
+
+    The transaction data is only reset upon a successful write of the
+    cache to the binary log.
+  */
+
+  if (trx_data && likely(mysql_bin_log.is_open())) {
+    if (int error= mysql_bin_log.write_cache(&trx_data->trans_log, true, true))
+      DBUG_RETURN(error);
+    trx_data->reset();
+  }
+
+  DBUG_RETURN(0);
+}
+
+
 /*
   Write a table map to the binary log.
  */
@@ -3768,12 +3842,48 @@
 
 
 /*
+  Write the contents of a cache to the binary log.
+
+  SYNOPSIS
+    write_cache()
+    cache    Cache to write to the binary log
+    sync_log True if the log should be flushed and sync:ed
+
+  DESCRIPTION
+
+    Write the contents of the cache to the binary log. The cache will
+    be reset as a READ_CACHE to be able to read the contents from it.
+ */
+
+int MYSQL_BIN_LOG::write_cache(IO_CACHE *cache, bool lock_log, bool sync_log)
+{
+  Mutex_sentry sentry(lock_log ? &LOCK_log : NULL);
+
+  if (reinit_io_cache(cache, READ_CACHE, 0, 0, 0))
+    return ER_ERROR_ON_WRITE;
+  uint bytes= my_b_bytes_in_cache(cache);
+  do
+  {
+    if (my_b_write(&log_file, cache->read_pos, bytes))
+      return ER_ERROR_ON_WRITE;
+    cache->read_pos= cache->read_end;
+  } while ((bytes= my_b_fill(cache)));
+
+  if (sync_log)
+    flush_and_sync();
+
+  return 0;                                     // All OK
+}
+
+/*
   Write a cached log entry to the binary log
 
   SYNOPSIS
     write()
     thd
     cache		The cache to copy to the binlog
+    commit_event        The commit event to print after writing the
+                        contents of the cache.
 
   NOTE
     - We only come here if there is something in the cache.
@@ -3833,20 +3943,10 @@
         if (qinfo.write(&log_file))
           goto err;
       }
-      /* Read from the file used to cache the queries .*/
-      if (reinit_io_cache(cache, READ_CACHE, 0, 0, 0))
-        goto err;
-      length=my_b_bytes_in_cache(cache);
-      DBUG_EXECUTE_IF("half_binlogged_transaction", length-=100;);
-      do
-      {
-        /* Write data to the binary log file */
-        if (my_b_write(&log_file, cache->read_pos, length))
-          goto err;
-        cache->read_pos=cache->read_end;		// Mark buffer used up
-        DBUG_EXECUTE_IF("half_binlogged_transaction", goto DBUG_skip_commit;);
-      } while ((length=my_b_fill(cache)));
 
+      if ((write_error= write_cache(cache, false, false)))
+        goto err;
+      
       if (commit_event && commit_event->write(&log_file))
         goto err;
 #ifndef DBUG_OFF

--- 1.136/sql/log_event.h	2006-12-01 09:43:16 +01:00
+++ 1.137/sql/log_event.h	2006-12-01 09:43:16 +01:00
@@ -405,12 +405,18 @@
    either, as the manual says (because a too big in-memory temp table is
    automatically written to disk).
 */
-#define OPTIONS_WRITTEN_TO_BIN_LOG (OPTION_AUTO_IS_NULL | \
-OPTION_NO_FOREIGN_KEY_CHECKS | OPTION_RELAXED_UNIQUE_CHECKS)
+#define OPTIONS_WRITTEN_TO_BIN_LOG \
+  (OPTION_AUTO_IS_NULL | OPTION_NO_FOREIGN_KEY_CHECKS |  \
+   OPTION_RELAXED_UNIQUE_CHECKS | OPTION_NOT_AUTOCOMMIT)
 
-#if OPTIONS_WRITTEN_TO_BIN_LOG != ((1L << 14) | (1L << 26) | (1L << 27))
+/* Shouldn't be defined before */
+#define EXPECTED_OPTIONS \
+  ((ULL(1) << 14) | (ULL(1) << 26) | (ULL(1) << 27) | (ULL(1) << 19))
+
+#if OPTIONS_WRITTEN_TO_BIN_LOG != EXPECTED_OPTIONS
 #error OPTIONS_WRITTEN_TO_BIN_LOG must NOT change their values!
 #endif
+#undef EXPECTED_OPTIONS         /* You shouldn't use this one */
 
 enum Log_event_type
 {

--- 1.448/sql/mysql_priv.h	2006-12-01 09:43:16 +01:00
+++ 1.449/sql/mysql_priv.h	2006-12-01 09:43:16 +01:00
@@ -296,54 +296,54 @@
    TODO: separate three contexts above, move them to separate bitfields.
 */
 
-#define SELECT_DISTINCT         (LL(1) << 0)       // SELECT, user
-#define SELECT_STRAIGHT_JOIN    (LL(1) << 1)       // SELECT, user
-#define SELECT_DESCRIBE         (LL(1) << 2)       // SELECT, user
-#define SELECT_SMALL_RESULT     (LL(1) << 3)       // SELECT, user
-#define SELECT_BIG_RESULT       (LL(1) << 4)       // SELECT, user
-#define OPTION_FOUND_ROWS       (LL(1) << 5)       // SELECT, user
-#define OPTION_TO_QUERY_CACHE   (LL(1) << 6)       // SELECT, user
-#define SELECT_NO_JOIN_CACHE    (LL(1) << 7)       // intern
-#define OPTION_BIG_TABLES       (LL(1) << 8)       // THD, user
-#define OPTION_BIG_SELECTS      (LL(1) << 9)       // THD, user
-#define OPTION_LOG_OFF          (LL(1) << 10)      // THD, user
-#define OPTION_QUOTE_SHOW_CREATE (LL(1) << 11)     // THD, user
-#define TMP_TABLE_ALL_COLUMNS   (LL(1) << 12)      // SELECT, intern
-#define OPTION_WARNINGS         (LL(1) << 13)      // THD, user
-#define OPTION_AUTO_IS_NULL     (LL(1) << 14)      // THD, user, binlog
-#define OPTION_FOUND_COMMENT    (LL(1) << 15)      // SELECT, intern, parser
-#define OPTION_SAFE_UPDATES     (LL(1) << 16)      // THD, user
-#define OPTION_BUFFER_RESULT    (LL(1) << 17)      // SELECT, user
-#define OPTION_BIN_LOG          (LL(1) << 18)      // THD, user
-#define OPTION_NOT_AUTOCOMMIT   (LL(1) << 19)      // THD, user
-#define OPTION_BEGIN            (LL(1) << 20)      // THD, intern
-#define OPTION_TABLE_LOCK       (LL(1) << 21)      // THD, intern
-#define OPTION_QUICK            (LL(1) << 22)      // SELECT (for DELETE)
-#define OPTION_KEEP_LOG         (LL(1) << 23)      // Keep binlog on rollback
+#define SELECT_DISTINCT          (ULL(1) << 0)       // SELECT, user
+#define SELECT_STRAIGHT_JOIN     (ULL(1) << 1)       // SELECT, user
+#define SELECT_DESCRIBE          (ULL(1) << 2)       // SELECT, user
+#define SELECT_SMALL_RESULT      (ULL(1) << 3)       // SELECT, user
+#define SELECT_BIG_RESULT        (ULL(1) << 4)       // SELECT, user
+#define OPTION_FOUND_ROWS        (ULL(1) << 5)       // SELECT, user
+#define OPTION_TO_QUERY_CACHE    (ULL(1) << 6)       // SELECT, user
+#define SELECT_NO_JOIN_CACHE     (ULL(1) << 7)       // intern
+#define OPTION_BIG_TABLES        (ULL(1) << 8)       // THD, user
+#define OPTION_BIG_SELECTS       (ULL(1) << 9)       // THD, user
+#define OPTION_LOG_OFF           (ULL(1) << 10)      // THD, user
+#define OPTION_QUOTE_SHOW_CREATE (ULL(1) << 11)      // THD, user
+#define TMP_TABLE_ALL_COLUMNS    (ULL(1) << 12)      // SELECT, intern
+#define OPTION_WARNINGS          (ULL(1) << 13)      // THD, user
+#define OPTION_AUTO_IS_NULL      (ULL(1) << 14)      // THD, user, binlog
+#define OPTION_FOUND_COMMENT     (ULL(1) << 15)      // SELECT, intern, parser
+#define OPTION_SAFE_UPDATES      (ULL(1) << 16)      // THD, user
+#define OPTION_BUFFER_RESULT     (ULL(1) << 17)      // SELECT, user
+#define OPTION_BIN_LOG           (ULL(1) << 18)      // THD, user
+#define OPTION_NOT_AUTOCOMMIT    (ULL(1) << 19)      // THD, user
+#define OPTION_BEGIN             (ULL(1) << 20)      // THD, intern
+#define OPTION_TABLE_LOCK        (ULL(1) << 21)      // THD, intern
+#define OPTION_QUICK             (ULL(1) << 22)      // SELECT (for DELETE)
+#define OPTION_KEEP_LOG          (ULL(1) << 23)      // Keep binlog on rollback
 
 /* The following is used to detect a conflict with DISTINCT */
-#define SELECT_ALL              (LL(1) << 24)      // SELECT, user, parser
+#define SELECT_ALL               (ULL(1) << 24)      // SELECT, user, parser
 
 /* Set if we are updating a non-transaction safe table */
-#define OPTION_STATUS_NO_TRANS_UPDATE   (LL(1) << 25) // THD, intern
+#define OPTION_STATUS_NO_TRANS_UPDATE   (ULL(1) << 25) // THD, intern
 
 /* The following can be set when importing tables in a 'wrong order'
    to suppress foreign key checks */
-#define OPTION_NO_FOREIGN_KEY_CHECKS    (LL(1) << 26) // THD, user, binlog
+#define OPTION_NO_FOREIGN_KEY_CHECKS    (ULL(1) << 26) // THD, user, binlog
 /* The following speeds up inserts to InnoDB tables by suppressing unique
    key checks in some cases */
-#define OPTION_RELAXED_UNIQUE_CHECKS    (LL(1) << 27) // THD, user, binlog
-#define SELECT_NO_UNLOCK                (LL(1) << 28) // SELECT, intern
-#define OPTION_SCHEMA_TABLE             (LL(1) << 29) // SELECT, intern
+#define OPTION_RELAXED_UNIQUE_CHECKS    (ULL(1) << 27) // THD, user, binlog
+#define SELECT_NO_UNLOCK                (ULL(1) << 28) // SELECT, intern
+#define OPTION_SCHEMA_TABLE             (ULL(1) << 29) // SELECT, intern
 /* Flag set if setup_tables already done */
-#define OPTION_SETUP_TABLES_DONE        (LL(1) << 30) // intern
+#define OPTION_SETUP_TABLES_DONE        (ULL(1) << 30) // intern
 /* If not set then the thread will ignore all warnings with level notes. */
-#define OPTION_SQL_NOTES                (LL(1) << 31) // THD, user
+#define OPTION_SQL_NOTES                (ULL(1) << 31) // THD, user
 /*
   Force the used temporary table to be a MyISAM table (because we will use
   fulltext functions when reading from it.
 */
-#define TMP_TABLE_FORCE_MYISAM          (LL(1) << 32)
+#define TMP_TABLE_FORCE_MYISAM          (ULL(1) << 32)
 
 /*
   Maximum length of time zone name that we support

--- 1.288/sql/slave.cc	2006-12-01 09:43:16 +01:00
+++ 1.289/sql/slave.cc	2006-12-01 09:43:16 +01:00
@@ -31,6 +31,8 @@
 
 #include "rpl_tblmap.h"
 
+#define FLAGSTR(V,F) ((V)&(F)?#F" ":"")
+
 #define MAX_SLAVE_RETRY_PAUSE 5
 bool use_slave_mask = 0;
 MY_BITMAP slave_error_mask;
@@ -3153,6 +3155,10 @@
     if (!ev->when)
       ev->when = time(NULL);
     ev->thd = thd; // because up to this point, ev->thd == 0
+    DBUG_PRINT("info", ("thd->options={ %s%s}",
+                        FLAGSTR(thd->options, OPTION_NOT_AUTOCOMMIT),
+                        FLAGSTR(thd->options, OPTION_BEGIN)));
+
     exec_res = ev->exec_event(rli);
     DBUG_PRINT("info", ("exec_event result = %d", exec_res));
     DBUG_ASSERT(rli->sql_thd==thd);

--- 1.321/sql/sql_class.h	2006-12-01 09:43:16 +01:00
+++ 1.322/sql/sql_class.h	2006-12-01 09:43:16 +01:00
@@ -931,6 +931,8 @@
     Public interface to write RBR events to the binlog
   */
   void binlog_start_trans_and_stmt();
+  int binlog_flush_transaction_cache();
+  void binlog_set_stmt_begin();
   int binlog_write_table_map(TABLE *table, bool is_transactional);
   int binlog_write_row(TABLE* table, bool is_transactional,
                        MY_BITMAP const* cols, my_size_t colcnt,

--- 1.231/sql/sql_insert.cc	2006-12-01 09:43:16 +01:00
+++ 1.232/sql/sql_insert.cc	2006-12-01 09:43:16 +01:00
@@ -2640,8 +2640,7 @@
     If the creation of the table failed (due to a syntax error, for
     example), no table will have been opened and therefore 'table'
     will be NULL. In that case, we still need to execute the rollback
-    and the end of the function to truncate the binary log, but we can
-    skip all the intermediate steps.
+    and the end of the function.
    */
   if (table)
   {
@@ -2672,13 +2671,8 @@
       if (!table->file->has_transactions())
       {
         if (mysql_bin_log.is_open())
-        {
           thd->binlog_query(THD::ROW_QUERY_TYPE, thd->query, thd->query_length,
                             table->file->has_transactions(), FALSE);
-        }
-        if (!thd->current_stmt_binlog_row_based && !table->s->tmp_table &&
-            !can_rollback_data())
-          thd->options|= OPTION_STATUS_NO_TRANS_UPDATE;
         query_cache_invalidate3(thd, table, 1);
       }
     }
@@ -2711,14 +2705,10 @@
     */
     query_cache_invalidate3(thd, table, 1);
     /*
-      Mark that we have done permanent changes if all of the below is true
-      - Table doesn't support transactions
-      - It's a normal (not temporary) table. (Changes to temporary tables
-        are not logged in RBR)
-      - We are using statement based replication
+      Mark that we have done permanent changes if the table doesn't
+      support transactions.
     */
-    if (!trans_table &&
-        (!table->s->tmp_table || !thd->current_stmt_binlog_row_based))
+    if (!trans_table)
       thd->options|= OPTION_STATUS_NO_TRANS_UPDATE;
    }
 
@@ -2943,6 +2933,27 @@
 
   TABLEOP_HOOKS *hook_ptr= NULL;
 #ifdef HAVE_ROW_BASED_REPLICATION
+  /*
+    For row-based replication, the CREATE-SELECT statement is written
+    in two pieces: the first one contains the CREATE TABLE statement
+    necessary to create the table and the second part contains the rows
+    that should go into the table.
+
+    Since CREATE-SELECT implicitly commits the previous transaction,
+    the CREATE TABLE statement will be written directly to the binary
+    log and not go into the transaction cache.
+
+    After that, the rows will either be written directly to the binary
+    log, if the table is non-transactional, or be written to the
+    transaction cache, if the table is transactional.
+
+    On the master, the table is locked for the duration of the
+    statement, but since the CREATE part is replicated as a simple
+    statement, there is no way to lock the table for acesses on the
+    slave.  Therefore, it is possible to manipulate the table on the
+    slave in the time between the creation of the table and the
+    arrival of the first row that will go into the table.
+   */
   class MY_HOOKS : public TABLEOP_HOOKS {
   public:
     MY_HOOKS(select_create *x) : ptr(x) { }
@@ -2968,19 +2979,6 @@
 
   unit= u;
 
-#ifdef HAVE_ROW_BASED_REPLICATION
-  /*
-    Start a statement transaction before the create if we are creating
-    a non-temporary table and are using row-based replication for the
-    statement.
-  */
-  if ((thd->lex->create_info.options & HA_LEX_CREATE_TMP_TABLE) == 0 &&
-      thd->current_stmt_binlog_row_based)
-  {
-    thd->binlog_start_trans_and_stmt();
-  }
-#endif
-
   if (!(table= create_table_from_items(thd, create_info, create_table,
                                        extra_fields, keys, &values,
                                        &thd->extra_lock, hook_ptr)))
@@ -3062,7 +3060,7 @@
 
   thd->binlog_query(THD::STMT_QUERY_TYPE,
                     query.ptr(), query.length(),
-                    /* is_trans */ TRUE,
+                    /* is_trans */ FALSE,
                     /* suppress_use */ FALSE);
 }
 #endif // HAVE_ROW_BASED_REPLICATION
@@ -3076,13 +3074,71 @@
 
 void select_create::send_error(uint errcode,const char *err)
 {
+  DBUG_ENTER("select_create::send_error");
+
+  DBUG_PRINT("info",
+             ("Current statement %s row-based",
+              thd->current_stmt_binlog_row_based ? "is" : "is NOT"));
+  DBUG_PRINT("info",
+             ("Current table (at 0x%ul) %s a temporary (or non-existent) table",
+              table,
+              table && table->s->tmp_table == NO_TMP_TABLE ? "is NOT" : "is"));
+  DBUG_PRINT("info",
+             ("Table %s prior to executing this statement",
+              get_create_info()->table_existed ? "existed" : "did not exist"));
+
   /*
-   Disable binlog, because we "roll back" partial inserts in ::abort
-   by removing the table, even for non-transactional tables.
+    This will execute any rollbacks that are necessary before we write
+    the DROP (if needed). For statement-based replication, we disable
+    the binary log since nothing should be written to the binary log.
   */
-  tmp_disable_binlog(thd);
+  if (!thd->current_stmt_binlog_row_based)
+    tmp_disable_binlog(thd);
+
   select_insert::send_error(errcode, err);
-  reenable_binlog(thd);
+
+  if (!thd->current_stmt_binlog_row_based)
+    reenable_binlog(thd);
+
+  /*
+    We only write a DROP TABLE if:
+    - The current statement is logged row-based.
+    - The table creation succeeded.
+    - The table is not a temporary table: if it is a temporary table,
+      no CREATE statement nor any rows have been written to the binary
+      log.
+    - The table did not exist prior to execution of the CREATE-SELECT
+      statement: if it existed before starting to execute this
+      statement, we should not drop the table.
+   */
+  if (thd->current_stmt_binlog_row_based &&
+      table &&
+      table->s->tmp_table == NO_TMP_TABLE &&
+      !get_create_info()->table_existed)
+  {
+    char buf[2048];
+    String query(buf, sizeof(buf), system_charset_info);
+    query.length(0);
+
+    query.append(STRING_WITH_LEN("DROP TABLE "));
+    append_identifier(thd, &query,
+                      create_table->table_name,
+                      create_table->table_name_length);
+
+
+    // Save error code to be able to restore it later
+    int saved_errno= thd->net.last_errno;
+
+    // No error for the DROP statement
+    thd->net.last_errno= 0;
+    thd->binlog_query(THD::STMT_QUERY_TYPE,
+                      query.ptr(), query.length(),
+                      /* is_trans */ FALSE,
+                      /* suppress_use */ FALSE);
+    // Restore error to report it correctly upwards
+    thd->net.last_errno= saved_errno;
+  }
+  DBUG_VOID_RETURN;
 }
 
 
@@ -3111,6 +3167,7 @@
 
 void select_create::abort()
 {
+  DBUG_ENTER("select_create::abort");
   VOID(pthread_mutex_lock(&LOCK_open));
   if (thd->extra_lock)
   {
@@ -3148,6 +3205,7 @@
     table=0;                                    // Safety
   }
   VOID(pthread_mutex_unlock(&LOCK_open));
+  DBUG_VOID_RETURN;
 }
 
 

--- 1.4/mysql-test/r/binlog_row_insert_select.result	2006-12-01 09:43:16 +01:00
+++ 1.5/mysql-test/r/binlog_row_insert_select.result	2006-12-01 09:43:16 +01:00
@@ -24,4 +24,11 @@
 show binlog events;
 Log_name	Pos	Event_type	Server_id	End_log_pos	Info
 master-bin.000001	4	Format_desc	1	102	Server ver: VERSION, Binlog ver: 4
+master-bin.000001	102	Query	1	237	use `test`; CREATE TABLE `t2` (
+  `a` int(11) DEFAULT NULL,
+  UNIQUE KEY `a` (`a`)
+)
+master-bin.000001	237	Table_map	1	276	table_id: # (test.t2)
+master-bin.000001	276	Write_rows	1	310	table_id: # flags: STMT_END_F
+master-bin.000001	310	Query	1	388	use `test`; DROP TABLE `t2`
 drop table t1;

--- 1.13/mysql-test/r/binlog_row_mix_innodb_myisam.result	2006-12-01 09:43:16 +01:00
+++ 1.14/mysql-test/r/binlog_row_mix_innodb_myisam.result	2006-12-01 09:43:16 +01:00
@@ -359,6 +359,12 @@
 Log_name	Pos	Event_type	Server_id	End_log_pos	Info
 master-bin.000001	#	Table_map	1	#	table_id: # (test.t1)
 master-bin.000001	#	Write_rows	1	#	table_id: # flags: STMT_END_F
+master-bin.000001	#	Query	1	#	use `test`; CREATE TABLE `t2` (
+  `a` int(11) NOT NULL DEFAULT '0',
+  `b` int(11) DEFAULT NULL,
+  PRIMARY KEY (`a`)
+) ENGINE=InnoDB
+master-bin.000001	#	Query	1	#	use `test`; DROP TABLE `t2`
 master-bin.000001	#	Query	1	#	use `test`; DROP TABLE if exists t2
 master-bin.000001	#	Table_map	1	#	table_id: # (test.t1)
 master-bin.000001	#	Write_rows	1	#	table_id: # flags: STMT_END_F

--- 1.7/mysql-test/r/rpl_row_create_table.result	2006-12-01 09:43:16 +01:00
+++ 1.8/mysql-test/r/rpl_row_create_table.result	2006-12-01 09:43:16 +01:00
@@ -127,8 +127,16 @@
 NULL	6	12
 CREATE TABLE t7 (UNIQUE(b)) SELECT a,b FROM tt3;
 ERROR 23000: Duplicate entry '2' for key 'b'
-SHOW BINLOG EVENTS FROM 1256;
+SHOW BINLOG EVENTS FROM 1118;
 Log_name	Pos	Event_type	Server_id	End_log_pos	Info
+master-bin.000001	1118	Query	1	1281	use `test`; CREATE TABLE `t7` (
+  `a` int(11) DEFAULT NULL,
+  `b` int(11) DEFAULT NULL,
+  UNIQUE KEY `b` (`b`)
+)
+master-bin.000001	1281	Table_map	1	1321	table_id: # (test.t7)
+master-bin.000001	1321	Write_rows	1	1377	table_id: # flags: STMT_END_F
+master-bin.000001	1377	Query	1	1455	use `test`; DROP TABLE `t7`
 CREATE TABLE t7 (a INT, b INT UNIQUE);
 INSERT INTO t7 SELECT a,b FROM tt3;
 ERROR 23000: Duplicate entry '2' for key 'b'
@@ -137,11 +145,11 @@
 1	2
 2	4
 3	6
-SHOW BINLOG EVENTS FROM 1118;
+SHOW BINLOG EVENTS FROM 1455;
 Log_name	Pos	Event_type	Server_id	End_log_pos	Info
-master-bin.000001	1118	Query	1	1218	use `test`; CREATE TABLE t7 (a INT, b INT UNIQUE)
-master-bin.000001	1218	Table_map	1	1258	table_id: # (test.t7)
-master-bin.000001	1258	Write_rows	1	1314	table_id: # flags: STMT_END_F
+master-bin.000001	1455	Query	1	1555	use `test`; CREATE TABLE t7 (a INT, b INT UNIQUE)
+master-bin.000001	1555	Table_map	1	1595	table_id: # (test.t7)
+master-bin.000001	1595	Write_rows	1	1651	table_id: # flags: STMT_END_F
 SELECT * FROM t7 ORDER BY a,b;
 a	b
 1	2
@@ -154,10 +162,10 @@
 ROLLBACK;
 Warnings:
 Warning	1196	Some non-transactional changed tables couldn't be rolled back
-SHOW BINLOG EVENTS FROM 1314;
+SHOW BINLOG EVENTS FROM 1651;
 Log_name	Pos	Event_type	Server_id	End_log_pos	Info
-master-bin.000001	1314	Table_map	1	1354	table_id: # (test.t7)
-master-bin.000001	1354	Write_rows	1	1410	table_id: # flags: STMT_END_F
+master-bin.000001	1651	Table_map	1	1691	table_id: # (test.t7)
+master-bin.000001	1691	Write_rows	1	1747	table_id: # flags: STMT_END_F
 SELECT * FROM t7 ORDER BY a,b;
 a	b
 1	2
@@ -192,10 +200,10 @@
   `a` int(11) DEFAULT NULL,
   `b` int(11) DEFAULT NULL
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1
-SHOW BINLOG EVENTS FROM 1410;
+SHOW BINLOG EVENTS FROM 1747;
 Log_name	Pos	Event_type	Server_id	End_log_pos	Info
-master-bin.000001	1410	Query	1	1496	use `test`; CREATE TABLE t8 LIKE t4
-master-bin.000001	1496	Query	1	1635	use `test`; CREATE TABLE `t9` (
+master-bin.000001	1747	Query	1	1833	use `test`; CREATE TABLE t8 LIKE t4
+master-bin.000001	1833	Query	1	1972	use `test`; CREATE TABLE `t9` (
   `a` int(11) DEFAULT NULL,
   `b` int(11) DEFAULT NULL
 )
@@ -212,3 +220,103 @@
   `a` int(11) DEFAULT NULL,
   `b` int(11) DEFAULT NULL
 ) ENGINE=MEMORY DEFAULT CHARSET=latin1
+DROP TABLE IF EXISTS t1,t2,t3,t4,t5,t6,t7,t8,t9;
+STOP SLAVE;
+SET GLOBAL storage_engine=@storage_engine;
+START SLAVE;
+================ BUG#22864 ================
+STOP SLAVE;
+RESET SLAVE;
+RESET MASTER;
+START SLAVE;
+SET AUTOCOMMIT=0;
+CREATE TABLE t1 (a INT);
+INSERT INTO t1 VALUES (1),(2),(3);
+CREATE TABLE t2 ENGINE=INNODB SELECT * FROM t1;
+ROLLBACK;
+CREATE TABLE t3 ENGINE=INNODB SELECT * FROM t1;
+INSERT INTO t3 VALUES (4),(5),(6);
+ROLLBACK;
+CREATE TABLE t4 ENGINE=INNODB SELECT * FROM t1;
+INSERT INTO t1 VALUES (4),(5),(6);
+ROLLBACK;
+Warnings:
+Warning	1196	Some non-transactional changed tables couldn't be rolled back
+SHOW TABLES;
+Tables_in_test
+t1
+t2
+t3
+t4
+SELECT TABLE_NAME,ENGINE
+FROM INFORMATION_SCHEMA.TABLES
+WHERE TABLE_NAME LIKE 't_';
+TABLE_NAME	ENGINE
+t1	MyISAM
+t2	InnoDB
+t3	InnoDB
+t4	InnoDB
+SELECT * FROM t1;
+a
+1
+2
+3
+4
+5
+6
+SELECT * FROM t2;
+a
+SELECT * FROM t3;
+a
+SELECT * FROM t4;
+a
+SHOW BINLOG EVENTS;
+Log_name	Pos	Event_type	Server_id	End_log_pos	Info
+master-bin.000001	4	Format_desc	1	102	Server ver: 5.1.12-beta-debug-log, Binlog ver: 4
+master-bin.000001	102	Query	1	188	use `test`; CREATE TABLE t1 (a INT)
+master-bin.000001	188	Table_map	1	227	table_id: # (test.t1)
+master-bin.000001	227	Write_rows	1	271	table_id: # flags: STMT_END_F
+master-bin.000001	271	Query	1	396	use `test`; CREATE TABLE `t2` (
+  `a` int(11) DEFAULT NULL
+) ENGINE=InnoDB
+master-bin.000001	396	Query	1	521	use `test`; CREATE TABLE `t3` (
+  `a` int(11) DEFAULT NULL
+) ENGINE=InnoDB
+master-bin.000001	521	Query	1	646	use `test`; CREATE TABLE `t4` (
+  `a` int(11) DEFAULT NULL
+) ENGINE=InnoDB
+master-bin.000001	646	Query	1	714	use `test`; BEGIN
+master-bin.000001	714	Table_map	1	39	table_id: # (test.t4)
+master-bin.000001	753	Write_rows	1	83	table_id: # flags: STMT_END_F
+master-bin.000001	797	Table_map	1	122	table_id: # (test.t1)
+master-bin.000001	836	Write_rows	1	166	table_id: # flags: STMT_END_F
+master-bin.000001	880	Query	1	951	use `test`; ROLLBACK
+SHOW TABLES;
+Tables_in_test
+t1
+t2
+t3
+t4
+SELECT TABLE_NAME,ENGINE
+FROM INFORMATION_SCHEMA.TABLES
+WHERE TABLE_NAME LIKE 't_';
+TABLE_NAME	ENGINE
+t1	MyISAM
+t2	InnoDB
+t3	InnoDB
+t4	InnoDB
+SELECT * FROM t1;
+a
+1
+2
+3
+4
+5
+6
+SELECT * FROM t2;
+a
+SELECT * FROM t3;
+a
+SELECT * FROM t4;
+a
+DROP TABLE IF EXISTS t1,t2,t3,t4;

--- 1.7/mysql-test/t/rpl_row_create_table.test	2006-12-01 09:43:16 +01:00
+++ 1.8/mysql-test/t/rpl_row_create_table.test	2006-12-01 09:43:16 +01:00
@@ -2,6 +2,10 @@
 
 --source include/have_binlog_format_row.inc
 --source include/master-slave.inc
+--source include/have_innodb.inc
+connection slave;
+--source include/have_innodb.inc
+connection master;
 
 # Bug#18326: Do not lock table for writing during prepare of statement
 # The use of the ps protocol causes extra table maps in the binlog, so
@@ -67,7 +71,7 @@
 CREATE TABLE t7 (UNIQUE(b)) SELECT a,b FROM tt3;
 # Shouldn't be written to the binary log
 --replace_regex /table_id: [0-9]+/table_id: #/
-SHOW BINLOG EVENTS FROM 1256;
+SHOW BINLOG EVENTS FROM 1118;
 
 # Test that INSERT-SELECT works the same way as for SBR.
 CREATE TABLE t7 (a INT, b INT UNIQUE);
@@ -76,7 +80,7 @@
 SELECT * FROM t7 ORDER BY a,b;
 # Should be written to the binary log
 --replace_regex /table_id: [0-9]+/table_id: #/
-SHOW BINLOG EVENTS FROM 1118;
+SHOW BINLOG EVENTS FROM 1455;
 sync_slave_with_master;
 SELECT * FROM t7 ORDER BY a,b;
 
@@ -87,7 +91,7 @@
 INSERT INTO t7 SELECT a,b FROM tt4;
 ROLLBACK;
 --replace_regex /table_id: [0-9]+/table_id: #/
-SHOW BINLOG EVENTS FROM 1314;
+SHOW BINLOG EVENTS FROM 1651;
 SELECT * FROM t7 ORDER BY a,b;
 sync_slave_with_master;
 SELECT * FROM t7 ORDER BY a,b;
@@ -102,19 +106,67 @@
 --query_vertical SHOW CREATE TABLE t8
 --query_vertical SHOW CREATE TABLE t9
 --replace_regex /table_id: [0-9]+/table_id: #/
-SHOW BINLOG EVENTS FROM 1410;
+SHOW BINLOG EVENTS FROM 1747;
 sync_slave_with_master;
 --echo **** On Slave ****
 --query_vertical SHOW CREATE TABLE t8
 --query_vertical SHOW CREATE TABLE t9
 
 connection master;
---disable_query_log
 DROP TABLE IF EXISTS t1,t2,t3,t4,t5,t6,t7,t8,t9;
 sync_slave_with_master;
 # Here we reset the value of the default storage engine
 STOP SLAVE;
 SET GLOBAL storage_engine=@storage_engine;
 START SLAVE;
---enable_query_log
 --enable_ps_protocol
+
+# BUG#22864 (Rollback following CREATE ... SELECT discards 'CREATE
+# table' from log):
+--echo ================ BUG#22864 ================
+connection slave;
+STOP SLAVE;
+RESET SLAVE;
+connection master;
+RESET MASTER;
+connection slave;
+START SLAVE;
+connection master;
+SET AUTOCOMMIT=0;
+CREATE TABLE t1 (a INT);
+INSERT INTO t1 VALUES (1),(2),(3);
+
+CREATE TABLE t2 ENGINE=INNODB SELECT * FROM t1;
+ROLLBACK;
+
+CREATE TABLE t3 ENGINE=INNODB SELECT * FROM t1;
+INSERT INTO t3 VALUES (4),(5),(6);
+ROLLBACK;
+
+CREATE TABLE t4 ENGINE=INNODB SELECT * FROM t1;
+INSERT INTO t1 VALUES (4),(5),(6);
+ROLLBACK;
+
+SHOW TABLES;
+SELECT TABLE_NAME,ENGINE
+  FROM INFORMATION_SCHEMA.TABLES
+ WHERE TABLE_NAME LIKE 't_';
+SELECT * FROM t1;
+SELECT * FROM t2;
+SELECT * FROM t3;
+SELECT * FROM t4;
+--replace_regex /table_id: [0-9]+/table_id: #/
+SHOW BINLOG EVENTS;
+sync_slave_with_master;
+SHOW TABLES;
+SELECT TABLE_NAME,ENGINE
+  FROM INFORMATION_SCHEMA.TABLES
+ WHERE TABLE_NAME LIKE 't_';
+SELECT * FROM t1;
+SELECT * FROM t2;
+SELECT * FROM t3;
+SELECT * FROM t4;
+
+connection master;
+DROP TABLE IF EXISTS t1,t2,t3,t4;
+sync_slave_with_master;

--- 1.16/sql/log.h	2006-12-01 09:43:16 +01:00
+++ 1.17/sql/log.h	2006-12-01 09:43:16 +01:00
@@ -338,6 +338,8 @@
   bool write(Log_event* event_info); // binary log write
   bool write(THD *thd, IO_CACHE *cache, Log_event *commit_event);
 
+  int  write_cache(IO_CACHE *cache, bool lock_log, bool flush_and_sync);
+
   void start_union_events(THD *thd);
   void stop_union_events(THD *thd);
   bool is_query_in_union(THD *thd, query_id_t query_id_param);
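
To make the behaviour checked by the BUG#22864 test above easier to follow, here is a minimal sketch in C++. It is a toy model and not the server's code: the class `ToyBinlog`, its members `log_ddl`, `log_rows`, `rollback`, and the helper `run_bug22864_scenario` are all hypothetical names invented for this illustration, and `write_cache` only echoes the name of the declaration added to sql/log.h, with a simplified signature. The assumption modelled is that the CREATE TABLE part of a CREATE ... SELECT is written to the log directly, while row events stay in the transaction cache and are discarded on ROLLBACK unless a non-transactional table was changed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only. None of these names come from the MySQL source except
// write_cache, and its signature is simplified for the illustration.
class ToyBinlog {
public:
  // Assumption: the CREATE TABLE half of CREATE ... SELECT goes straight
  // to the log, so a later ROLLBACK cannot discard it.
  void log_ddl(const std::string &stmt) { log_.push_back(stmt); }

  // Row events are held in the transaction cache until the transaction ends.
  void log_rows(const std::string &table, bool transactional) {
    cache_.push_back("Write_rows(" + table + ")");
    if (!transactional) non_trans_changed_ = true;
  }

  // On ROLLBACK the cache is dropped, unless a non-transactional table was
  // changed; then the cached events are logged, followed by ROLLBACK.
  void rollback() {
    if (non_trans_changed_ && !cache_.empty()) {
      log_.push_back("BEGIN");
      write_cache();
      log_.push_back("ROLLBACK");
    }
    cache_.clear();
    non_trans_changed_ = false;
  }

  const std::vector<std::string> &events() const { return log_; }

private:
  // Copy the cached events to the end of the log.
  void write_cache() { log_.insert(log_.end(), cache_.begin(), cache_.end()); }

  std::vector<std::string> log_;    // stands in for the binary log
  std::vector<std::string> cache_;  // stands in for the transaction cache
  bool non_trans_changed_ = false;
};

// Replays the three CREATE ... SELECT cases from the test, with AUTOCOMMIT=0.
std::vector<std::string> run_bug22864_scenario() {
  ToyBinlog binlog;

  binlog.log_ddl("CREATE TABLE t2");
  binlog.log_rows("t2", true);   // rows copied from t1
  binlog.rollback();

  binlog.log_ddl("CREATE TABLE t3");
  binlog.log_rows("t3", true);   // rows copied from t1
  binlog.log_rows("t3", true);   // INSERT INTO t3
  binlog.rollback();

  binlog.log_ddl("CREATE TABLE t4");
  binlog.log_rows("t4", true);   // rows copied from t1
  binlog.log_rows("t1", false);  // INSERT INTO t1 (MyISAM)
  binlog.rollback();

  return binlog.events();
}
```

Calling `run_bug22864_scenario()` yields the three CREATE TABLE entries followed by a single BEGIN ... ROLLBACK group for the t4 case, which has the same shape as the SHOW BINLOG EVENTS output in the result file. Table_map events and log positions are deliberately left out of the model.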