MySQL Lists are EOL. Please join:

List:Commits« Previous MessageNext Message »
From:rsomla Date:January 18 2007 10:26am
Subject:bk commit into 5.1 tree (rafal:1.2372) BUG#25597
View as plain text  
Below is the list of changes that have just been committed into a local
5.1 repository of rafal. When rafal does a push these changes will
be propagated to the main repository and, within 24 hours after the
push, to the public repository.
For information on how to access the public repository
see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html

ChangeSet@stripped, 2007-01-18 11:26:27+01:00, rafal@quant.(none) +6 -0
  BUG#25597 (Extraneous error message displayed for slave SQL thread errors):
  This is incomplete patch for review purposes.
  
  The patch re-works error reporting in replication code. Some goals of the patch:
  
  - Move error reporting as close as possible to the place where error was 
    detected (to have more information about the error).
  
  - When errors happen during calls to handler methods, print handler and method
    name in error messages so that the error code can be correctly interpreted.
  
  - Write to the error log more information about exceptional situations such as
    skipping events, re-reading and re-executing event groups.
  
  - Add consistent message headers such as "Slave I/O thread ..." and "Slave SQL 
    thread ..." informing about the place where error was detected.
  
  - Suppress confusing message printouts in error log such as reporting the same
    error twice or printing non-informative "Unknown error" messages.
  
  - Some refactoring in event reading and execution code: analyzing errors from 
    Query event, master reconnection logic in I/O thread, remove base class 
    initialization order compiler warnings.
  
  TODO:
  
  - Review/update error messages generated by helper functions of the I/O thread.
  
  - After calling exec_relay_log_event() the SQL thread prints waring pushed 
    during event execution. Since exec_relay_log_event() can re-execute events it 
    sometimes create confusing output. Move warning printing code inside 
    exec_relay_log_event() so that they are printed just after an event was 
    executed.
  
  - Polish the patch.

  sql/log_event.cc@stripped, 2007-01-18 11:26:24+01:00, rafal@quant.(none) +407 -199
    - Code layout changes.
    
    - Changed (simplified?) logic analysing errors detected in Query event, also 
      more informative messages written to error log.
    
    - unpack_row(): Replaced RLI parameter by a Boolean flag controling printing of 
      informative messages.
    
    - Rephrasing of error messages in a more uniform way. Also, add more information
      about a source of a message.
    
    - In Rows event, move error reporting inside do_... functions which get 
      additional RLI argument.
    
    - Use m_table member of Rows event instead of passing TABLE argument to the 
      do_... methods.
    
    - When calling handler methods: give the name of the 
      handler in an error message.
    
    - Error reporting inside replace_record() and find_and_fetch_row() functions. 

  sql/log_event.h@stripped, 2007-01-18 11:26:24+01:00, rafal@quant.(none) +67 -55
    - Added RLI parameter to do_... methods of Rows event class.
    
    - Removed TABLE parameter from these methods - replaced by m_table member.
    
    - Changed comments of these methods to doxygen format.

  sql/partition_element.h@stripped, 2007-01-18 11:26:24+01:00, rafal@quant.(none) +2 -1
    - Constructor invocation reordering to eliminate some compiler warnings.

  sql/rpl_utility.cc@stripped, 2007-01-18 11:26:24+01:00, rafal@quant.(none) +2 -2
    - Added function name to error messages.

  sql/rpl_utility.h@stripped, 2007-01-18 11:26:24+01:00, rafal@quant.(none) +1 -1
    - Constructor invocation reordering to eliminate some compiler warnings.

  sql/slave.cc@stripped, 2007-01-18 11:26:24+01:00, rafal@quant.(none) +324 -215
    - Changes in read_event(): removed suppress_warning parameter, function returns 
      info about result of the operation.
    
    - Added documentation to exec_relay_log_event().
    
    - More information messages about event execution in exec_relay_log_event(): 
      skipping events, event re-execution.
    
    - Set thd->no_warninfs_for_error which is used in my_error. With this flag, 
      errors are nod pushed as warnings which would lead to confusing double reporting
      of the same error.
    
    - Reworked error messages inside handle_slave_io() and handle_slave_sql() thread
      handlers.
    
    - Don't print errors or warnings with ER_UNKNOWN_ERROR code as they do not 
      contain any interesting information (the my_error() function overwrites 
      description string for such errors with "Unknown error" (fill a bug about it?).
    
    - In handle_slave_io(): move duplicated reconnection code in one place 
      (at the beginning of the function), log information about reconnections to
      the master.
    

# This is a BitKeeper patch.  What follows are the unified diffs for the
# set of deltas contained in the patch.  The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User:	rafal
# Host:	quant.(none)
# Root:	/ext/mysql/bk/mysql-5.1-bug21842

--- 1.259/sql/log_event.cc	2007-01-18 11:26:33 +01:00
+++ 1.260/sql/log_event.cc	2007-01-18 11:26:33 +01:00
@@ -32,6 +32,8 @@
 
 #define log_cs	&my_charset_latin1
 
+#define TABLE_HANDLER_NAME(T) ha_resolve_storage_engine_name((T)->file->ht)
+
 /*
   Cache that will automatically be written to a dedicated file on
   destruction.
@@ -1923,15 +1925,22 @@
     thd->set_time((time_t)when);
     thd->query_length= q_len_arg;
     thd->query= (char*)query_arg;
+    
     VOID(pthread_mutex_lock(&LOCK_thread_count));
     thd->query_id = next_query_id();
     VOID(pthread_mutex_unlock(&LOCK_thread_count));
+    
     thd->variables.pseudo_thread_id= thread_id;		// for temp tables
+    
     DBUG_PRINT("query",("%s",thd->query));
 
+    // Check if the error reported on master is not so fatal that we should
+    // not even try to execute it on slave.
+
     if (ignored_error_code((expected_error= error_code)) ||
-	!check_expected_error(thd,rli,expected_error))
+        !check_expected_error(thd,rli,expected_error))
     {
+      
       if (flags2_inited)
         /*
           all bits of thd->options which are 1 in OPTIONS_WRITTEN_TO_BIN_LOG
@@ -1943,6 +1952,7 @@
         Rotate_log_event which reset thd->options and sql_mode etc, so
         nothing to do.
       */
+      
       /*
         We do not replicate IGNORE_DIR_IN_CREATE. That is, if the master is a
         slave which runs with SQL_MODE=IGNORE_DIR_IN_CREATE, this should not
@@ -1955,6 +1965,7 @@
         thd->variables.sql_mode=
           (ulong) ((thd->variables.sql_mode & MODE_NO_DIR_IN_CREATE) |
                    (sql_mode & ~(ulong) MODE_NO_DIR_IN_CREATE));
+      
       if (charset_inited)
       {
         if (rli->cached_charset_compare(charset))
@@ -1979,6 +1990,7 @@
           thd->update_charset(); // for the charset change to take effect
         }
       }
+      
       if (time_zone_len)
       {
         String tmp(time_zone_str, time_zone_len, &my_charset_bin);
@@ -1990,6 +2002,7 @@
           goto compare_errors;
         }
       }
+      
       if (lc_time_names_number)
       {
         if (!(thd->variables.lc_time_names=
@@ -2017,17 +2030,21 @@
         we exit gracefully; otherwise we warn about the bad error and tell DBA
         to check/fix it.
       */
+      DBUG_PRINT("info",("Query ignored on slave")); 
+      
       if (mysql_test_parse_for_slave(thd, thd->query, thd->query_length))
+      { 
         clear_all_errors(thd, rli);        /* Can ignore query */
+      }
       else
       {
         slave_print_msg(ERROR_LEVEL, rli, expected_error, 
-                          "\
-Query partially completed on the master (error on master: %d) \
-and was aborted. There is a chance that your master is inconsistent at this \
-point. If you are sure that your master is ok, run this query manually on the \
-slave and then restart the slave with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; \
-START SLAVE; . Query: '%s'", expected_error, thd->query);
+          "Query log event: query partially completed on the master "
+          "(error on master: %d) and was aborted. There is a chance that your master "
+          "is inconsistent at this point. If you are sure that your master is ok, "
+          "run this query manually on the slave and then restart the slave with "
+          "SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE; . Query: '%s'", 
+          expected_error, thd->query);
         thd->query_error= 1;
       }
       goto end;
@@ -2039,50 +2056,60 @@
 
 compare_errors:
 
-     /*
-      If we expected a non-zero error code, and we don't get the same error
-      code, and none of them should be ignored.
-    */
-    DBUG_PRINT("info",("expected_error: %d  last_errno: %d",
- 		       expected_error, thd->net.last_errno));
-    if ((expected_error != (actual_error= thd->net.last_errno)) &&
- 	expected_error &&
- 	!ignored_error_code(actual_error) &&
- 	!ignored_error_code(expected_error))
+    actual_error= thd->net.last_errno;  // error reported while executing a statement
+    
+    DBUG_PRINT("info",("expected_error: %d  last_errno: %d", expected_error, actual_error));
+
+    if (ignored_error_code(expected_error))
     {
-      slave_print_msg(ERROR_LEVEL, rli, 0,
-                      "\
-Query caused different errors on master and slave.     \
-Error on master: '%s' (%d), Error on slave: '%s' (%d). \
-Default database: '%s'. Query: '%s'",
-                      ER_SAFE(expected_error),
-                      expected_error,
-                      actual_error ? thd->net.last_error: "no error",
-                      actual_error,
-                      print_slave_db_safe(db), query_arg);
-      thd->query_error= 1;
+      DBUG_PRINT("info",("expected_error ignored"));
+      expected_error= 0;
     }
-    /*
-      If we get the same error code as expected, or they should be ignored. 
-    */
-    else if (expected_error == actual_error ||
- 	     ignored_error_code(actual_error))
+    
+    if (ignored_error_code(actual_error))
     {
-      DBUG_PRINT("info",("error ignored"));
-      clear_all_errors(thd, rli);
+      DBUG_PRINT("info",("last_errno ignored"));
+      actual_error= 0;
     }
-    /*
-      Other cases: mostly we expected no error and get one.
-    */
-    else if (thd->query_error || thd->is_fatal_error)
+    
+    if (expected_error) 
+    {
+      if (actual_error == expected_error)
+      { 
+        DBUG_PRINT("info",("same error as on master -- ignored"));
+        clear_all_errors(thd, rli);
+      }
+      else if (actual_error)
+      {
+        slave_print_msg(ERROR_LEVEL, rli, 0,
+                        "Query log event: different errors on master and slave. "
+                        "Error on master: '%s' (%d), Error on slave: '%s' (%d). "
+                        "Default database: '%s'. Query: '%s'",
+                        ER_SAFE(expected_error), expected_error,
+                        thd->net.last_error, actual_error,
+                        print_slave_db_safe(db), query_arg);
+        thd->query_error= 1;
+      }
+      else
+      {
+        slave_print_msg(ERROR_LEVEL, rli, 0,
+                        "Query log event: expected error '%s' (%d) but no error found. "
+                        "Default database: '%s'. Query: '%s'",
+                        ER_SAFE(expected_error), expected_error,
+                        print_slave_db_safe(thd->db), query_arg);
+        thd->query_error= 1;
+      }
+    }
+    else if (actual_error)
     {
       slave_print_msg(ERROR_LEVEL, rli, actual_error,
-                      "Error '%s' on query. Default database: '%s'. Query: '%s'",
-                      (actual_error ? thd->net.last_error :
-                       "unexpected success or fatal error"),
+                      "Query log event: unexpected error `%s`. "
+                      "Default database: '%s'. Query: '%s'",
+                      thd->net.last_error,
                       print_slave_db_safe(thd->db), query_arg);
       thd->query_error= 1;
     }
+    
 
     /*
       TODO: compare the values of "affected rows" around here. Something
@@ -5513,7 +5540,6 @@
   
   SYNOPSIS
     unpack_row()
-    rli     Relay log info
     table   Table to unpack into
     colcnt  Number of columns to read from record
     row     Packed row data
@@ -5525,6 +5551,9 @@
             record on the master side
     rw_set  Pointer to bitmap that holds either the read_set or the
             write_set of the table
+    check_no_defaults
+            Tells if ER_NO_DEFAULY_FOR_FIELD error described below 
+            should be detected
 
   DESCRIPTION
 
@@ -5546,15 +5575,15 @@
       be returned:
 
       ER_NO_DEFAULT_FOR_FIELD
-        Returned if one of the fields existing on the slave but not on
-        the master does not have a default value (and isn't nullable)
+        Returned if check_no_defaults is TRUE and one of the fields 
+        existing on the slave but not on the master does not have a 
+        default value (and isn't nullable)
  */
 static int
-unpack_row(RELAY_LOG_INFO *rli,
-           TABLE *table, uint const colcnt,
+unpack_row(TABLE *table, uint const colcnt,
            char const *row, MY_BITMAP const *cols,
            char const **row_end, ulong *master_reclength,
-           MY_BITMAP* const rw_set, Log_event_type const event_type)
+           MY_BITMAP* const rw_set, bool check_no_defaults )
 {
   byte *const record= table->record[0];
   DBUG_ENTER("unpack_row");
@@ -5639,14 +5668,15 @@
 
     DBUG_PRINT("info", ("processing column '%s' @ 0x%lx", f->field_name,
                         (long) f->ptr));
-    if (event_type == WRITE_ROWS_EVENT && (f->flags & mask) == mask)
+                        
+    if ( check_no_defaults && (f->flags & mask) == mask)
     {
-      slave_print_msg(ERROR_LEVEL, rli, ER_NO_DEFAULT_FOR_FIELD,
-                      "Field `%s` of table `%s`.`%s` "
-                      "has no default value and cannot be NULL",
-                      (*field_ptr)->field_name, table->s->db.str,
-                      table->s->table_name.str);
-      error = ER_NO_DEFAULT_FOR_FIELD;
+      sql_print_information(
+        "Slave SQL thread: unpack_row: field `%s` of table `%s`.`%s` "
+        "has no default value and cannot be NULL",
+        (*field_ptr)->field_name, table->s->db.str, table->s->table_name.str);
+
+      error= ER_NO_DEFAULT_FOR_FIELD;
     }
     else
       f->set_default();
@@ -5669,6 +5699,7 @@
    */
   if (m_table_id == ~0UL)
   {
+    DBUG_PRINT("info",("dummy Rows log event"));
     /*
        This one is supposed to be set: just an extra check so that
        nothing strange has happened.
@@ -5711,11 +5742,12 @@
       if (!need_reopen)
       {
         slave_print_msg(ERROR_LEVEL, rli, error,
-                        "Error in %s event: when locking tables",
-                        get_type_str());
+                        "Error in %s event: when doing lock_tables()", get_type_str());
         rli->clear_tables_to_lock();
         DBUG_RETURN(error);
       }
+      
+      DBUG_PRINT("info",("Error openning tables and we reopen them"));
 
       /*
         So we need to reopen the tables.
@@ -5738,21 +5770,17 @@
       {
         if (thd->query_error || thd->is_fatal_error)
         {
-          /*
-            Error reporting borrowed from Query_log_event with many excessive
-            simplifications (we don't honour --slave-skip-errors)
-          */
-          uint actual_error= thd->net.last_errno;
-          slave_print_msg(ERROR_LEVEL, rli, actual_error,
-                          "Error '%s' on reopening tables",
-                          (actual_error ? thd->net.last_error :
-                           "unexpected success or fatal error"));
+          slave_print_msg(ERROR_LEVEL, rli, error,
+                          "Error in %s event: when reopenning tables using open_tables()",
+                          get_type_str());
           thd->query_error= 1;
         }
         rli->clear_tables_to_lock();
         DBUG_RETURN(error);
       }
-    }
+    
+    }  // while ((error= lock_tables(thd, rli->tables_to_lock, ...
+    
     /*
       When the open and locking succeeded, we add all the tables to
       the table map and remove them from tables to lock.
@@ -5769,20 +5797,25 @@
     query_cache.invalidate_locked_for_write(rli->tables_to_lock);
 #endif
     rli->clear_tables_to_lock();
-  }
+  
+  } // if (!thd->lock)
 
   DBUG_ASSERT(rli->tables_to_lock == NULL && rli->tables_to_lock_count == 0);
 
-  TABLE* table= rli->m_table_map.get_table(m_table_id);
+  m_table= rli->m_table_map.get_table(m_table_id);
 
-  if (table)
+  if (!m_table)
   {
     /*
       table == NULL means that this table should not be replicated
       (this was set up by Table_map_log_event::exec_event() which
       tested replicate-* rules).
     */
-
+    DBUG_PRINT("info",("Table map has no table with id %d (table skipped)",
+                       m_table_id));   
+  }
+  else
+  {
     /*
       It's not needed to set_time() but
       1) it continues the property that "Time" in SHOW PROCESSLIST shows how
@@ -5792,6 +5825,7 @@
       So we call set_time(), like in SBR. Presently it changes nothing.
     */
     thd->set_time((time_t)when);
+    
     /*
       There are a few flags that are replicated with each row event.
       Make sure to set/clear them before executing the main body of
@@ -5806,14 +5840,19 @@
         thd->options|= OPTION_RELAXED_UNIQUE_CHECKS;
     else
         thd->options&= ~OPTION_RELAXED_UNIQUE_CHECKS;
+    
     /* A small test to verify that objects have consistent types */
     DBUG_ASSERT(sizeof(thd->options) == sizeof(OPTION_RELAXED_UNIQUE_CHECKS));
 
-    error= do_before_row_operations(table);
+    error= do_before_row_operations(rli);
+    
+    // loop over rows in the Row event
+    
     while (error == 0 && row_start < (const char*) m_rows_end)
     {
       char const *row_end= NULL;
-      if ((error= do_prepare_row(thd, rli, table, row_start, &row_end)))
+      
+      if ((error= do_prepare_row(rli, row_start, &row_end)))
         break; // We should perform the after-row operation even in
                // the case of error
 
@@ -5821,49 +5860,63 @@
       DBUG_ASSERT(row_end <= (const char*)m_rows_end);
 
       /* in_use can have been set to NULL in close_tables_for_reopen */
-      THD* old_thd= table->in_use;
-      if (!table->in_use)
-        table->in_use= thd;
-      error= do_exec_row(table);
-      table->in_use = old_thd;
-      switch (error)
-      {
+      THD* old_thd= m_table->in_use;
+      
+      if (!m_table->in_use)
+        m_table->in_use= thd;
+      
+      error= do_exec_row(rli);
+      
+      m_table->in_use = old_thd;
+      
+      switch (error) {
         /* Some recoverable errors */
       case HA_ERR_RECORD_CHANGED:
       case HA_ERR_KEY_NOT_FOUND:	/* Idempotency support: OK if
                                            tuple does not exist */
-	error= 0;
+        DBUG_PRINT("info",("Event execution error %d ignored as recoverable one.",error));                                   
+        clear_all_errors(thd,rli); // otherwise users will see the error e.g. in SHOW SLAVE
+                                   // Note: error should still be reported inside do_exec_row()
+        error= 0;
+      
       case 0:
-	break;
+        break;
 
       default:
-	slave_print_msg(ERROR_LEVEL, rli, error,
-                        "Error in %s event: row application failed",
-                        get_type_str());
-	thd->query_error= 1;
-	break;
+        thd->query_error= 1;
+      	break;
       }
 
       row_start= row_end;
-    }
-    DBUG_EXECUTE_IF("STOP_SLAVE_after_first_Rows_event",
-                    rli->abort_slave=1;);
-    error= do_after_row_operations(table, error);
+    
+    } // while (error == 0 && row_start < (const char*) m_rows_end)
+    
+    DBUG_EXECUTE_IF("STOP_SLAVE_after_first_Rows_event", rli->abort_slave=1;);
+    
+    int error1= do_after_row_operations(rli, error);
+    if( error1 ) 
+      error= error1;
+     
+    
     if (!cache_stmt)
     {
       DBUG_PRINT("info", ("Marked that we need to keep log"));
       thd->options|= OPTION_KEEP_LOG;
     }
-  }
+    
+  } // if (m_table)
 
   if (error)
   {                     /* error has occured during the transaction */
-    slave_print_msg(ERROR_LEVEL, rli, error,
+    // @note error has been already reported when it happened.
+/*
+    slave_print_msg(ERROR_LEVEL, rli, 0,
                     "Error in %s event: error during transaction execution "
                     "on table %s.%s",
                     get_type_str(), table->s->db.str, 
                     table->s->table_name.str);
-     /*
+*/     
+     /* 
       If one day we honour --skip-slave-errors in row-based replication, and
       the error should be skipped, then we would clear mappings, rollback,
       close tables, but the slave SQL thread would not stop and then may
@@ -5879,6 +5932,8 @@
     thd->query_error= 1;
     DBUG_RETURN(error);
   }
+  
+  // Here we know event execution was successful (possibly some errors ignored)
 
   if (get_flags(STMT_END_F))
   {
@@ -5898,6 +5953,7 @@
       replicate-ignore rules).
     */
     thd->binlog_flush_pending_rows_event(true);
+    
     /*
       If this event is not in a transaction, the call below will, if some
       transactional storage engines are involved, commit the statement into
@@ -5906,8 +5962,11 @@
       Xid_log_event will come next which will, if some transactional engines
       are involved, commit the transaction and flush the pending event to the
       binlog.
+      
+      @note error returned by ha_autocommit_or_rollback is noninformative.
     */
     error= ha_autocommit_or_rollback(thd, 0);
+    
     /*
       Now what if this is not a transactional engine? we still need to
       flush the pending event to the binlog; we did it with
@@ -5922,29 +5981,22 @@
     rli->cleanup_context(thd, 0);
     rli->transaction_end(thd);
 
-    if (error == 0)
-    {
-      /*
-        Clear any errors pushed in thd->net.last_err* if for example "no key
-        found" (as this is allowed). This is a safety measure; apparently
-        those errors (e.g. when executing a Delete_rows_log_event of a
-        non-existing row, like in rpl_row_mystery22.test,
-        thd->net.last_error = "Can't find record in 't1'" and last_errno=1032)
-        do not become visible. We still prefer to wipe them out.
-      */
-      thd->clear_error();
-      error= Log_event::exec_event(rli);
-    }
+    if (error)
+      slave_print_msg(ERROR_LEVEL, rli, 0,
+                      "Error in %s event for table `%s`.`%s`: when commiting row changes "
+                      "using ha_autocommit_or_rollback()",
+                      get_type_str(), m_table->s->db.str, 
+                      m_table->s->table_name.str);
     else
-      slave_print_msg(ERROR_LEVEL, rli, error,
-                      "Error in %s event: commit of row events failed, "
-                      "table `%s`.`%s`",
-                      get_type_str(), table->s->db.str, 
-                      table->s->table_name.str);
+      error= Log_event::exec_event(rli);
+      
     DBUG_RETURN(error);
-  }
+    
+  } // if (get_flags(STMT_END_F))
+
+  // This is not the last event of a statement
 
-  if (table && (table->s->primary_key == MAX_KEY) && !cache_stmt)
+  if (m_table && (m_table->s->primary_key == MAX_KEY) && !cache_stmt)
   {
     /*
       ------------ Temporary fix until WL#2975 is implemented ---------
@@ -5966,11 +6018,12 @@
   }
 
   DBUG_ASSERT(error == 0);
-  thd->clear_error();
+  thd->clear_error();             // should be redundant?
   rli->inc_event_relay_log_pos();
   
   DBUG_RETURN(0);
 }
+
 #endif /* !defined(MYSQL_CLIENT) && defined(HAVE_REPLICATION) */
 
 #ifndef MYSQL_CLIENT
@@ -6441,7 +6494,7 @@
 #endif
 
 #if !defined(MYSQL_CLIENT) && defined(HAVE_REPLICATION)
-int Write_rows_log_event::do_before_row_operations(TABLE *table)
+int Write_rows_log_event::do_before_row_operations(RELAY_LOG_INFO *rli)
 {
   int error= 0;
 
@@ -6461,15 +6514,16 @@
   */
   thd->lex->sql_command= SQLCOM_REPLACE;
 
-  table->file->extra(HA_EXTRA_IGNORE_DUP_KEY);  // Needed for ndbcluster
-  table->file->extra(HA_EXTRA_WRITE_CAN_REPLACE);  // Needed for ndbcluster
-  table->file->extra(HA_EXTRA_IGNORE_NO_KEY);   // Needed for ndbcluster
+  m_table->file->extra(HA_EXTRA_IGNORE_DUP_KEY);  // Needed for ndbcluster
+  m_table->file->extra(HA_EXTRA_WRITE_CAN_REPLACE);  // Needed for ndbcluster
+  m_table->file->extra(HA_EXTRA_IGNORE_NO_KEY);   // Needed for ndbcluster
   /*
     TODO: the cluster team (Tomas?) says that it's better if the engine knows
     how many rows are going to be inserted, then it can allocate needed memory
     from the start.
   */
-  table->file->ha_start_bulk_insert(0);
+  m_table->file->ha_start_bulk_insert(0);
+  
   /*
     We need TIMESTAMP_NO_AUTO_SET otherwise ha_write_row() will not use fill
     any TIMESTAMP column with data from the row but instead will use
@@ -6485,29 +6539,45 @@
     some cases we won't want TIMESTAMP_NO_AUTO_SET (will require some code to
     analyze if explicit data is provided for slave's TIMESTAMP columns).
   */
-  table->timestamp_field_type= TIMESTAMP_NO_AUTO_SET;
-  return error;
+  m_table->timestamp_field_type= TIMESTAMP_NO_AUTO_SET;
+  
+  return error; // @note always 0
 }
 
-int Write_rows_log_event::do_after_row_operations(TABLE *table, int error)
+int Write_rows_log_event::do_after_row_operations(RELAY_LOG_INFO *rli, int error)
 {
   if (error == 0)
-    error= table->file->ha_end_bulk_insert();
+  { 
+    error= m_table->file->ha_end_bulk_insert();
+    
+    if (error)
+    {
+      slave_print_msg(ERROR_LEVEL,rli,error,
+        "Error in %s event: when calling ha_end_bulk_insert() method of %s handler.",
+        get_type_str(), TABLE_HANDLER_NAME(m_table));
+    }
+  }  
+    
   return error;
 }
 
-int Write_rows_log_event::do_prepare_row(THD *thd, RELAY_LOG_INFO *rli,
-                                         TABLE *table,
+int Write_rows_log_event::do_prepare_row(RELAY_LOG_INFO *rli,
                                          char const *const row_start,
                                          char const **const row_end)
 {
-  DBUG_ASSERT(table != NULL);
+  DBUG_ASSERT(m_table != NULL);
   DBUG_ASSERT(row_start && row_end);
 
   int error;
-  error= unpack_row(rli, table, m_width, row_start, &m_cols, row_end,
-                    &m_master_reclength, table->write_set, WRITE_ROWS_EVENT);
-  bitmap_copy(table->read_set, table->write_set);
+  error= unpack_row(m_table, m_width, row_start, &m_cols, row_end,
+                    &m_master_reclength, m_table->write_set, 
+                    TRUE /* detect missing default values */ );
+  bitmap_copy(m_table->read_set, m_table->write_set);
+
+  if (error)
+    slave_print_msg(ERROR_LEVEL, rli, error, 
+      "Error in %s event: when unpacking row.", get_type_str());
+    
   return error;
 }
 
@@ -6642,6 +6712,7 @@
 
   SYNOPSIS
       replace_record()
+      rli    Relay log info for reporting errors.
       thd    Thread context for writing the record.
       table  Table to which record should be written.
       master_reclength
@@ -6658,7 +6729,7 @@
       indices), we do an ha_update_row() or a ha_delete_row() instead.
  */
 static int
-replace_record(THD *thd, TABLE *table,
+replace_record(RELAY_LOG_INFO *rli, THD *thd, TABLE *table,
                ulong const master_reclength,
                uint const master_fields)
 {
@@ -6671,14 +6742,28 @@
 
   while ((error= table->file->ha_write_row(table->record[0])))
   {
+
+    DBUG_PRINT("info",("ha_write_row() failed, will try to delete the record"));
+
     if (error == HA_ERR_LOCK_DEADLOCK || error == HA_ERR_LOCK_WAIT_TIMEOUT)
     {
       table->file->print_error(error, MYF(0)); /* to check at exec_relay_log_event */
+
+      slave_print_msg(ERROR_LEVEL, rli, error,
+        "replace_record: error in ha_write_row() method of %s handler.",
+        TABLE_HANDLER_NAME(table));
+      
       DBUG_RETURN(error);
     }
+    
     if ((keynum= table->file->get_dup_key(error)) < 0)
     {
       /* We failed to retrieve the duplicate key */
+
+      slave_print_msg(ERROR_LEVEL, rli, HA_ERR_FOUND_DUPP_KEY,
+        "replace_record: failed to retreive duplicate key using get_dup_key() method ",
+        "of %s handler.", TABLE_HANDLER_NAME(table));
+
       DBUG_RETURN(HA_ERR_FOUND_DUPP_KEY);
     }
 
@@ -6696,12 +6781,22 @@
     {
       error= table->file->rnd_pos(table->record[1], table->file->dup_ref);
       if (error)
+      {
+        slave_print_msg(ERROR_LEVEL, rli, error,
+          "replace_record: error in rnd_pos() method of %s handler.",
+          TABLE_HANDLER_NAME(table));
+
         DBUG_RETURN(error);
+      }
     }
     else
     {
       if (table->file->extra(HA_EXTRA_FLUSH_CACHE))
       {
+        slave_print_msg(ERROR_LEVEL, rli, my_errno,
+          "replace_record: failed to flush %s handler's cache.",
+          TABLE_HANDLER_NAME(table));
+
         DBUG_RETURN(my_errno);
       }
 
@@ -6709,17 +6804,30 @@
       {
         key.assign(static_cast<char*>(my_alloca(table->s->max_unique_length)));
         if (key.get() == NULL)
+        {
+          slave_print_msg(ERROR_LEVEL, rli, ENOMEM,
+            "replace_record: out of mem when allocating key.");
+
           DBUG_RETURN(ENOMEM);
+        }
       }
 
       key_copy((byte*)key.get(), table->record[0], table->key_info + keynum, 0);
+      
       error= table->file->index_read_idx(table->record[1], keynum, 
                                          (const byte*)key.get(),
                                          table->key_info[keynum].key_length,
                                          HA_READ_KEY_EXACT);
       if (error)
+      {
+        slave_print_msg(ERROR_LEVEL, rli, error,
+          "replace_record: error in index_read_idx() method of %s handler.",
+          TABLE_HANDLER_NAME(table));
+
         DBUG_RETURN(error);
-    }
+      }
+      
+    } 
 
     /*
        Now, table->record[1] should contain the offending row.  That
@@ -6751,22 +6859,43 @@
     {
       error=table->file->ha_update_row(table->record[1],
                                        table->record[0]);
+      
+      if (error)
+        slave_print_msg(ERROR_LEVEL, rli, error,
+          "replace_record: error in ha_update_row() method of %s handler.",
+          TABLE_HANDLER_NAME(table));
+
       DBUG_RETURN(error);
     }
     else
     {
       if ((error= table->file->ha_delete_row(table->record[1])))
+      {
+        slave_print_msg(ERROR_LEVEL, rli, error,
+          "replace_record: error in ha_delete_row() method of %s handler.",
+          TABLE_HANDLER_NAME(table));
+
         DBUG_RETURN(error);
+      }
       /* Will retry ha_write_row() with the offending row removed. */
     }
-  }
+    
+  } // while ((error= table->file->ha_write_row(table->record[0])))
+  
+  if( error )
+    slave_print_msg(ERROR_LEVEL, rli, error,
+      "replace_record: error in ha_write_row() method of %s handler.",
+      TABLE_HANDLER_NAME(table));
+      
   DBUG_RETURN(error);
 }
 
-int Write_rows_log_event::do_exec_row(TABLE *table)
+int Write_rows_log_event::do_exec_row(RELAY_LOG_INFO *rli)
 {
-  DBUG_ASSERT(table != NULL);
-  int error= replace_record(thd, table, m_master_reclength, m_width);
+  DBUG_ASSERT(m_table != NULL);
+  int error= replace_record(rli, thd, m_table, m_master_reclength, m_width);
+  
+  // @note error reporting is done inside replace_record.
   return error;
 }
 #endif /* !defined(MYSQL_CLIENT) && defined(HAVE_REPLICATION) */
@@ -6814,6 +6943,7 @@
   If the engine allows random access of the records, a combination of
   position() and rnd_pos() will be used.
 
+  @param rli   Relay log info for error reporting
   @param table Pointer to table to search
   @param key   Pointer to key to use for search, if table has key
 
@@ -6831,7 +6961,7 @@
   <code>table->record[1]</code>, error code otherwise.
  */
 
-static int find_and_fetch_row(TABLE *table, byte *key)
+static int find_and_fetch_row(RELAY_LOG_INFO *rli, TABLE *table, byte *key)
 {
   DBUG_ENTER("find_and_fetch_row(TABLE *table, byte *key, byte *record)");
   DBUG_PRINT("enter", ("table: 0x%lx, key: 0x%lx  record: 0x%lx",
@@ -6862,6 +6992,12 @@
     */
     table->file->position(table->record[0]);
     int error= table->file->rnd_pos(table->record[0], table->file->ref);
+
+    if (error)
+      slave_print_msg(ERROR_LEVEL, rli, error,
+        "find_and_fetch_row: error in rnd_pos() method of %s handler.",
+        TABLE_HANDLER_NAME(table));
+    
     /*
       rnd_pos() returns the record in table->record[0], so we have to
       move it to table->record[1].
@@ -6879,8 +7015,13 @@
     int error;
     /* We have a key: search the table using the index */
     if (!table->file->inited && (error= table->file->ha_index_init(0, FALSE)))
-      DBUG_RETURN(error);
+    {
+      slave_print_msg(ERROR_LEVEL, rli, error,
+        "find_and_fetch_row: error in ha_index_init() method of %s handler.",
+        TABLE_HANDLER_NAME(table));
 
+      DBUG_RETURN(error);
+    }
   /*
     Don't print debug messages when running valgrind since they can
     trigger false warnings.
@@ -6899,12 +7040,18 @@
     my_ptrdiff_t const pos=
       table->s->null_bytes > 0 ? table->s->null_bytes - 1 : 0;
     table->record[1][pos]= 0xFF;
+    
     if ((error= table->file->index_read(table->record[1], key,
                                         table->key_info->key_length,
                                         HA_READ_KEY_EXACT)))
     {
+      slave_print_msg(ERROR_LEVEL, rli, error,
+        "find_and_fetch_row: error in index_read() method of %s handler.",
+        TABLE_HANDLER_NAME(table));
+
       table->file->print_error(error, MYF(0));
       table->file->ha_index_end();
+
       DBUG_RETURN(error);
     }
 
@@ -6950,9 +7097,14 @@
       table->record[1][pos]= 0xFF;
       if ((error= table->file->index_next(table->record[1])))
       {
-	table->file->print_error(error, MYF(0));
+        slave_print_msg(ERROR_LEVEL, rli, error,
+          "find_and_fetch_row: error in index_next() method of %s handler.",
+          TABLE_HANDLER_NAME(table));
+
+      	table->file->print_error(error, MYF(0));
         table->file->ha_index_end();
-	DBUG_RETURN(error);
+
+      	DBUG_RETURN(error);
       }
     }
 
@@ -6968,8 +7120,14 @@
 
     /* We don't have a key: search the table using rnd_next() */
     if ((error= table->file->ha_rnd_init(1)))
-      return error;
+    {
+      slave_print_msg(ERROR_LEVEL, rli, error,
+        "find_and_fetch_row: error in ha_rnd_init() method of %s handler.",
+        TABLE_HANDLER_NAME(table));
 
+      DBUG_RETURN(error);
+    }
+    
     /* Continue until we find the right record or have made a full loop */
     do
     {
@@ -6982,23 +7140,29 @@
       my_ptrdiff_t const pos=
         table->s->null_bytes > 0 ? table->s->null_bytes - 1 : 0;
       table->record[1][pos]= 0xFF;
+    
       error= table->file->rnd_next(table->record[1]);
 
-      switch (error)
-      {
+      switch (error) {
+      
       case 0:
       case HA_ERR_RECORD_DELETED:
-	break;
+        break;
 
       case HA_ERR_END_OF_FILE:
-	if (++restart_count < 2)
-	  table->file->ha_rnd_init(1);
-	break;
+      	if (++restart_count < 2)
+      	  table->file->ha_rnd_init(1);
+      	break;
 
       default:
-	table->file->print_error(error, MYF(0));
+        slave_print_msg(ERROR_LEVEL, rli, error,
+          "find_and_fetch_row: error in rnd_next() method of %s handler.",
+          TABLE_HANDLER_NAME(table));
+
+      	table->file->print_error(error, MYF(0));
         table->file->ha_rnd_end();
-	DBUG_RETURN(error);
+      	
+        DBUG_RETURN(error);
       }
     }
     while (restart_count < 2 && record_compare(table));
@@ -7009,6 +7173,10 @@
     table->file->ha_rnd_end();
 
     DBUG_ASSERT(error == HA_ERR_END_OF_FILE || error == 0);
+    
+    if(error)
+        slave_print_msg(ERROR_LEVEL, rli, 0, "find_and_fetch_row: row not found.");
+    
     DBUG_RETURN(error);
   }
 
@@ -7050,12 +7218,12 @@
 #endif
 
 #if !defined(MYSQL_CLIENT) && defined(HAVE_REPLICATION)
-int Delete_rows_log_event::do_before_row_operations(TABLE *table)
+int Delete_rows_log_event::do_before_row_operations(RELAY_LOG_INFO *rli)
 {
   DBUG_ASSERT(m_memory == NULL);
 
-  if ((table->file->ha_table_flags() & HA_PRIMARY_KEY_REQUIRED_FOR_POSITION) &&
-      table->s->primary_key < MAX_KEY)
+  if ((m_table->file->ha_table_flags() & HA_PRIMARY_KEY_REQUIRED_FOR_POSITION) &&
+      m_table->s->primary_key < MAX_KEY)
   {
     /*
       We don't need to allocate any memory for m_after_image and
@@ -7066,30 +7234,36 @@
 
   int error= 0;
 
-  if (table->s->keys > 0)
+  if (m_table->s->keys > 0)
   {
     m_memory=
       my_multi_malloc(MYF(MY_WME),
-		      &m_after_image, table->s->reclength,
-		      &m_key, table->key_info->key_length,
+		      &m_after_image, m_table->s->reclength,
+		      &m_key, m_table->key_info->key_length,
 		      NULL);
   }
   else
   {
-    m_after_image= (byte*)my_malloc(table->s->reclength, MYF(MY_WME));
+    m_after_image= (byte*)my_malloc(m_table->s->reclength, MYF(MY_WME));
     m_memory= (gptr)m_after_image;
     m_key= NULL;
   }
+  
   if (!m_memory)
+  {
+    slave_print_msg(ERROR_LEVEL,rli,0, 
+                    "%s event: out of memory when allocating after image", 
+                    get_type_str());
     return HA_ERR_OUT_OF_MEM;
-
+  }
+  
   return error;
 }
 
-int Delete_rows_log_event::do_after_row_operations(TABLE *table, int error)
+int Delete_rows_log_event::do_after_row_operations(RELAY_LOG_INFO *rli, int error)
 {
   /*error= ToDo:find out what this should really be, this triggers close_scan in nbd, returning error?*/
-  table->file->ha_index_or_rnd_end();
+  m_table->file->ha_index_or_rnd_end();
   my_free(m_memory, MYF(MY_ALLOW_ZERO_PTR)); // Free for multi_malloc
   m_memory= NULL;
   m_after_image= NULL;
@@ -7098,8 +7272,7 @@
   return error;
 }
 
-int Delete_rows_log_event::do_prepare_row(THD *thd, RELAY_LOG_INFO *rli,
-                                          TABLE *table,
+int Delete_rows_log_event::do_prepare_row(RELAY_LOG_INFO *rli,
                                           char const *const row_start,
                                           char const **const row_end)
 {
@@ -7109,38 +7282,50 @@
     This assertion actually checks that there is at least as many
     columns on the slave as on the master.
   */
-  DBUG_ASSERT(table->s->fields >= m_width);
+  DBUG_ASSERT(m_table->s->fields >= m_width);
+
+  error= unpack_row(m_table, m_width, row_start, &m_cols, row_end,
+                    &m_master_reclength, m_table->read_set, 
+                    FALSE /* don't detect missing default values */);
+  
+  if (error)  
+    slave_print_msg(ERROR_LEVEL, rli, error,
+      "Error in %s event: when unpacking row.", get_type_str());
 
-  error= unpack_row(rli, table, m_width, row_start, &m_cols, row_end,
-                    &m_master_reclength, table->read_set, DELETE_ROWS_EVENT);
   /*
     If we will access rows using the random access method, m_key will
     be set to NULL, so we do not need to make a key copy in that case.
-   */
+  */
   if (m_key)
   {
-    KEY *const key_info= table->key_info;
+    KEY *const key_info= m_table->key_info;
 
-    key_copy(m_key, table->record[0], key_info, 0);
+    key_copy(m_key, m_table->record[0], key_info, 0);
   }
 
   return error;
 }
 
-int Delete_rows_log_event::do_exec_row(TABLE *table)
+int Delete_rows_log_event::do_exec_row(RELAY_LOG_INFO *rli)
 {
   int error;
-  DBUG_ASSERT(table != NULL);
+  DBUG_ASSERT(m_table != NULL);
 
-  if (!(error= find_and_fetch_row(table, m_key)))
+  if (!(error= find_and_fetch_row(rli, m_table, m_key)))
   { 
     /*
       Now we should have the right row to delete.  We are using
       record[0] since it is guaranteed to point to a record with the
       correct value.
     */
-    error= table->file->ha_delete_row(table->record[0]);
+    error= m_table->file->ha_delete_row(m_table->record[0]);
+
+    if (error)
+      slave_print_msg(ERROR_LEVEL, rli, error,
+         "%s event: error in ha_delete_row() method of %s handler.",
+         get_type_str(), TABLE_HANDLER_NAME(m_table));
   }
+  
   return error;
 }
 
@@ -7193,38 +7378,45 @@
 #endif
 
 #if !defined(MYSQL_CLIENT) && defined(HAVE_REPLICATION)
-int Update_rows_log_event::do_before_row_operations(TABLE *table)
+int Update_rows_log_event::do_before_row_operations(RELAY_LOG_INFO *rli)
 {
   DBUG_ASSERT(m_memory == NULL);
 
   int error= 0;
 
-  if (table->s->keys > 0)
+  if (m_table->s->keys > 0)
   {
     m_memory=
       my_multi_malloc(MYF(MY_WME),
-		      &m_after_image, table->s->reclength,
-		      &m_key, table->key_info->key_length,
+		      &m_after_image, m_table->s->reclength,
+		      &m_key, m_table->key_info->key_length,
 		      NULL);
   }
   else
   {
-    m_after_image= (byte*)my_malloc(table->s->reclength, MYF(MY_WME));
+    m_after_image= (byte*)my_malloc(m_table->s->reclength, MYF(MY_WME));
     m_memory= (gptr)m_after_image;
     m_key= NULL;
   }
+  
   if (!m_memory)
+  {
+    slave_print_msg(ERROR_LEVEL,rli,0, 
+                    "%s event: out of memory when allocating after image", 
+                    get_type_str());
     return HA_ERR_OUT_OF_MEM;
+  }
+
 
-  table->timestamp_field_type= TIMESTAMP_NO_AUTO_SET;
+  m_table->timestamp_field_type= TIMESTAMP_NO_AUTO_SET;
 
   return error;
 }
 
-int Update_rows_log_event::do_after_row_operations(TABLE *table, int error)
+int Update_rows_log_event::do_after_row_operations(RELAY_LOG_INFO *rli, int error)
 {
   /*error= ToDo:find out what this should really be, this triggers close_scan in nbd, returning error?*/
-  table->file->ha_index_or_rnd_end();
+  m_table->file->ha_index_or_rnd_end();
   my_free(m_memory, MYF(MY_ALLOW_ZERO_PTR));
   m_memory= NULL;
   m_after_image= NULL;
@@ -7233,8 +7425,7 @@
   return error;
 }
 
-int Update_rows_log_event::do_prepare_row(THD *thd, RELAY_LOG_INFO *rli,
-                                          TABLE *table,
+int Update_rows_log_event::do_prepare_row(RELAY_LOG_INFO *rli,
                                           char const *const row_start,
                                           char const **const row_end)
 {
@@ -7244,7 +7435,7 @@
     This assertion actually checks that there is at least as many
     columns on the slave as on the master.
   */
-  DBUG_ASSERT(table->s->fields >= m_width);
+  DBUG_ASSERT(m_table->s->fields >= m_width);
 
   /*
     We need to perform some juggling below since unpack_row() always
@@ -7253,23 +7444,35 @@
   */
 
   /* record[0] is the before image for the update */
-  error= unpack_row(rli, table, m_width, row_start, &m_cols, row_end,
-                    &m_master_reclength, table->read_set, UPDATE_ROWS_EVENT);
-  store_record(table, record[1]);
+  error= unpack_row(m_table, m_width, row_start, &m_cols, row_end,
+                    &m_master_reclength, m_table->read_set, 
+                    FALSE /* don't detect missing defaults */);
+  if (error)
+    slave_print_msg(ERROR_LEVEL, rli, error, 
+      "Error in %s event: when unpacking before image.", get_type_str());
+
+
+  store_record(m_table, record[1]);
+  
   char const *next_start = *row_end;
+  
   /* m_after_image is the after image for the update */
-  error= unpack_row(rli, table, m_width, next_start, &m_cols, row_end,
-                    &m_master_reclength, table->write_set, UPDATE_ROWS_EVENT);
-  bmove_align(m_after_image, table->record[0], table->s->reclength);
-  restore_record(table, record[1]);
+  error= unpack_row(m_table, m_width, next_start, &m_cols, row_end,
+                    &m_master_reclength, m_table->write_set, FALSE );
+  if (error)
+    slave_print_msg(ERROR_LEVEL, rli, error, 
+      "Error in %s event: when unpacking after image.", get_type_str());
+  
+  bmove_align(m_after_image, m_table->record[0], m_table->s->reclength);
+  restore_record(m_table, record[1]);
 
   /*
     Don't print debug messages when running valgrind since they can
     trigger false warnings.
    */
 #ifndef HAVE_purify
-  DBUG_DUMP("record[0]", (const char *)table->record[0], table->s->reclength);
-  DBUG_DUMP("m_after_image", (const char *)m_after_image, table->s->reclength);
+  DBUG_DUMP("record[0]", (const char *)m_table->record[0], m_table->s->reclength);
+  DBUG_DUMP("m_after_image", (const char *)m_after_image, m_table->s->reclength);
 #endif
 
   /*
@@ -7278,19 +7481,19 @@
    */
   if (m_key)
   {
-    KEY *const key_info= table->key_info;
+    KEY *const key_info= m_table->key_info;
 
-    key_copy(m_key, table->record[0], key_info, 0);
+    key_copy(m_key, m_table->record[0], key_info, 0);
   }
 
   return error;
 }
 
-int Update_rows_log_event::do_exec_row(TABLE *table)
+int Update_rows_log_event::do_exec_row(RELAY_LOG_INFO *rli)
 {
-  DBUG_ASSERT(table != NULL);
+  DBUG_ASSERT(m_table != NULL);
 
-  int error= find_and_fetch_row(table, m_key);
+  int error= find_and_fetch_row(rli,m_table, m_key);
   if (error)
     return error;
 
@@ -7307,8 +7510,8 @@
     overwriting the default values that where put there by the
     unpack_row() function.
   */
-  bmove_align(table->record[0], m_after_image, table->s->reclength);
-  copy_extra_record_fields(table, m_master_reclength, m_width);
+  bmove_align(m_table->record[0], m_after_image, m_table->s->reclength);
+  copy_extra_record_fields(m_table, m_master_reclength, m_width);
 
   /*
     Now we have the right row to update.  The old row (the one we're
@@ -7316,7 +7519,12 @@
     We also have copied the original values already in the slave's
     database into the after image delivered from the master.
   */
-  error= table->file->ha_update_row(table->record[1], table->record[0]);
+  error= m_table->file->ha_update_row(m_table->record[1], m_table->record[0]);
+
+  if (error)
+    slave_print_msg(ERROR_LEVEL, rli, error,
+      "Error in %s event: when calling ha_update_row() method of %s handler.",
+      get_type_str(), TABLE_HANDLER_NAME(m_table));
 
   return error;
 }

--- 1.142/sql/log_event.h	2007-01-18 11:26:33 +01:00
+++ 1.143/sql/log_event.h	2007-01-18 11:26:33 +01:00
@@ -1942,66 +1942,78 @@
 
 #if !defined(MYSQL_CLIENT) && defined(HAVE_REPLICATION)
   /*
+     Global remarks for the following do_... primitives.
+     
+     When these methods are called, m_table field is initialized.
+     
+     Metods should report any detect errors using slave_write_msg() 
+     (so that slave's error state is set accordingly). For that purpose 
+     the relay log info object which was passed to exec_event() method 
+     is given as an argument.
+   */ 
+  
+  /**
     Primitive to prepare for a sequence of row executions.
 
-    DESCRIPTION
-
-      Before doing a sequence of do_prepare_row() and do_exec_row()
-      calls, this member function should be called to prepare for the
-      entire sequence. Typically, this member function will allocate
-      space for any buffers that are needed for the two member
-      functions mentioned above.
-
-    RETURN VALUE
+    Before doing a sequence of do_prepare_row() and do_exec_row()
+    calls, this member function should be called to prepare for the
+    entire sequence. Typically, this member function will allocate
+    space for any buffers that are needed for the two member
+    functions mentioned above.
+    
+    @param rli    for error reporting,
 
-      The member function will return 0 if all went OK, or a non-zero
-      error code otherwise.
+    @returns 0 if all went OK, or a non-zero error code otherwise.
   */
-  virtual int do_before_row_operations(TABLE *table) = 0;
+  virtual int do_before_row_operations(RELAY_LOG_INFO *rli) = 0;
 
-  /*
+  /**
     Primitive to clean up after a sequence of row executions.
 
-    DESCRIPTION
+    After doing a sequence of do_prepare_row() and do_exec_row(),
+    this member function should be called to clean up and release
+    any allocated buffers.
+
+    @param rli    for error reporting,
+    @param error  non-zero if errors were detected while processing rows.
     
-      After doing a sequence of do_prepare_row() and do_exec_row(),
-      this member function should be called to clean up and release
-      any allocated buffers.
+    @returns 0 if all went OK, or a non-zero error code otherwise.    
   */
-  virtual int do_after_row_operations(TABLE *table, int error) = 0;
+  virtual int do_after_row_operations(RELAY_LOG_INFO *rli, int error) = 0;
 
-  /*
+  /**
     Primitive to prepare for handling one row in a row-level event.
     
-    DESCRIPTION 
+    The member function prepares for execution of operations needed for one
+    row in a row-level event by reading up data from the buffer containing
+    the row. No specific interpretation of the data is normally done here,
+    since SQL thread specific data is not available: that data is made
+    available for the do_exec function.
+
+    @param rli        for error reporting,
+    @param row_start  pointer to the start of the row,
+    @param row_end    (in/out) pointer to the start of the next row.
 
-      The member function prepares for execution of operations needed for one
-      row in a row-level event by reading up data from the buffer containing
-      the row. No specific interpretation of the data is normally done here,
-      since SQL thread specific data is not available: that data is made
-      available for the do_exec function.
-
-      A pointer to the start of the next row, or NULL if the preparation
-      failed. Currently, preparation cannot fail, but don't rely on this
-      behavior. 
+    @note row_end is set to NULL if the preparation failed. Currently, preparation 
+    cannot fail, but don't rely on this behavior. 
 
-    RETURN VALUE
-      Error code, if something went wrong, 0 otherwise.
+    @returns error code, if something went wrong, 0 otherwise.
    */
-  virtual int do_prepare_row(THD*, RELAY_LOG_INFO*, TABLE*,
-                             char const *row_start, char const **row_end) = 0;
+  virtual int do_prepare_row(RELAY_LOG_INFO *rli, char const *row_start, 
+                                                  char const **row_end) = 0;
 
-  /*
+  /**
     Primitive to do the actual execution necessary for a row.
 
-    DESCRIPTION
-      The member function will do the actual execution needed to handle a row.
+    It is assumed that the row on which operation should be performed was
+    prepared by do_prepare_row() and stored in the event. 
+    
+    @param rli    for error reporting,
 
-    RETURN VALUE
-      0 if execution succeeded, 1 if execution failed.
-      
+    @returns 0 if execution succeeded, 1 if execution failed.
   */
-  virtual int do_exec_row(TABLE *table) = 0;
+  virtual int do_exec_row(RELAY_LOG_INFO *rli) = 0;
+  
 #endif /* !defined(MYSQL_CLIENT) && defined(HAVE_REPLICATION) */
 };
 
@@ -2057,11 +2069,11 @@
   gptr  m_memory;
   byte *m_after_image;
 
-  virtual int do_before_row_operations(TABLE *table);
-  virtual int do_after_row_operations(TABLE *table, int error);
-  virtual int do_prepare_row(THD*, RELAY_LOG_INFO*, TABLE*,
-                             char const *row_start, char const **row_end);
-  virtual int do_exec_row(TABLE *table);
+  virtual int do_before_row_operations(RELAY_LOG_INFO *rli);
+  virtual int do_after_row_operations(RELAY_LOG_INFO *rli, int error);
+  virtual int do_prepare_row(RELAY_LOG_INFO *rli, char const *row_start, 
+                                                  char const **row_end);
+  virtual int do_exec_row(RELAY_LOG_INFO *rli);
 #endif
 };
 
@@ -2122,11 +2134,11 @@
   byte *m_key;
   byte *m_after_image;
 
-  virtual int do_before_row_operations(TABLE *table);
-  virtual int do_after_row_operations(TABLE *table, int error);
-  virtual int do_prepare_row(THD*, RELAY_LOG_INFO*, TABLE*,
-                             char const *row_start, char const **row_end);
-  virtual int do_exec_row(TABLE *table);
+  virtual int do_before_row_operations(RELAY_LOG_INFO *rli);
+  virtual int do_after_row_operations(RELAY_LOG_INFO *rli, int error);
+  virtual int do_prepare_row(RELAY_LOG_INFO *rli, char const *row_start, 
+                                                  char const **row_end);
+  virtual int do_exec_row(RELAY_LOG_INFO *rli);
 #endif /* !defined(MYSQL_CLIENT) && defined(HAVE_REPLICATION) */
 };
 
@@ -2193,11 +2205,11 @@
   byte *m_key;
   byte *m_after_image;
 
-  virtual int do_before_row_operations(TABLE *table);
-  virtual int do_after_row_operations(TABLE *table, int error);
-  virtual int do_prepare_row(THD*, RELAY_LOG_INFO*, TABLE*,
-                             char const *row_start, char const **row_end);
-  virtual int do_exec_row(TABLE *table);
+  virtual int do_before_row_operations(RELAY_LOG_INFO *rli);
+  virtual int do_after_row_operations(RELAY_LOG_INFO *rli, int error);
+  virtual int do_prepare_row(RELAY_LOG_INFO *rli, char const *row_start, 
+                                                  char const **row_end);
+  virtual int do_exec_row(RELAY_LOG_INFO *rli);
 #endif
 };
 

--- 1.301/sql/slave.cc	2007-01-18 11:26:34 +01:00
+++ 1.302/sql/slave.cc	2007-01-18 11:26:34 +01:00
@@ -1550,30 +1550,43 @@
   DBUG_RETURN(0);
 }
 
+#undef EOF
 
-/*
+namespace rpl {
+  namespace read_event_result {
+    
+    /// Enum type used to report result of reading a binlog event
+    /// packet from master.
+    
+    enum type { OK=0, RETRY, EOF, ERROR };
+  }
+}
+
+/**
   Read one event from the master
 
-  SYNOPSIS
-    read_event()
-    mysql               MySQL connection
-    mi                  Master connection information
-    suppress_warnings   TRUE when a normal net read timeout has caused us to
-                        try a reconnect.  We do not want to print anything to
-                        the error log in this case because this a anormal
-                        event in an idle server.
-
-    RETURN VALUES
-    'packet_error'      Error
-    number              Length of packet
+  @param mysql    MySQL connection to the master
+  @param mi       Master connection information
+  @param res      (out) Result of the read is stored here:
+                    - OK:    successful read,
+                    - RETRY: transient error - caller should try to read again, 
+                    - EOF:   eof packed was received,
+                    - ERROR: serious error, can't continue reading.
+  
+  @returns length of the packet read or packet_error if read failed.
+  
+  @fixme Better EOF detection: in some cases inability to read next packet means
+  that master has quit which should be distinguished from other connection errors.  
 */
 
-static ulong read_event(MYSQL* mysql, MASTER_INFO *mi, bool* suppress_warnings)
+// @todo return operation result and store data length in output parameter
+
+static ulong read_event(MYSQL *mysql, MASTER_INFO *mi, rpl::read_event_result::type *res)
 {
   ulong len;
   DBUG_ENTER("read_event");
 
-  *suppress_warnings= 0;
+  *res= rpl::read_event_result::OK;
   /*
     my_real_read() will time us out
     We check if we were told to die, and if not, try reading again
@@ -1584,31 +1597,62 @@
 #endif
 
   len = cli_safe_read(mysql);
+  int error_no= mysql_errno(mysql);
+  
+  // Check for errors
   if (len == packet_error || (long) len < 1)
   {
-    if (mysql_errno(mysql) == ER_NET_READ_INTERRUPTED)
-    {
+    *res= rpl::read_event_result::ERROR;
+    
+    
+    switch (error_no) {
+    
+    case ER_NET_READ_INTERRUPTED:
+      
+      // TODO: find out how to get correct error description: ER(errno) or mysql_error(errno)?
+      sql_print_information("Slave: read_event: Interrupted (%s).", ER(error_no));
+      
       /*
-        We are trying a normal reconnect after a read timeout;
-        we suppress prints to .err file as long as the reconnect
-        happens without problems
+        We assume this is a read timeout so we request the caller to retry the
+        operation.
       */
-      *suppress_warnings= TRUE;
+      *res= rpl::read_event_result::RETRY;
+      break;
+      
+    case ER_NET_PACKET_TOO_LARGE:
+ 
+      sql_print_error(
+        "Slave: read_event: Log entry on master is longer than "
+        "max_allowed_packet (%ld). If the entry is correct, restart the server "
+        "with a higher value of max_allowed_packet",
+        current_thd->variables.max_allowed_packet
+      );
+      break;
+ 
+    default:
+      sql_print_error("Slave: read_event: Error %d reading packet from master (%s).",
+                      error_no, ER(error_no));
+      break;
+      
     }
-    else
-      sql_print_error("Error reading packet from server: %s ( server_errno=%d)",
-                      mysql_error(mysql), mysql_errno(mysql));
+    
     DBUG_RETURN(packet_error);
   }
 
-  /* Check if eof packet */
+  // Check if eof packet 
   if (len < 8 && mysql->net.read_pos[0] == 254)
   {
-    sql_print_information("Slave: received end packet from server, apparent "
-                          "master shutdown: %s",
-                     mysql_error(mysql));
-     DBUG_RETURN(packet_error);
+    *res= rpl::read_event_result::EOF;
+    
+    sql_print_information(
+      "Slave: read_event: Received end packet from master (%s).",
+      ER(error_no)
+    );
+    
+    DBUG_RETURN(packet_error);
   }
+  
+  // Read ok.
 
   DBUG_PRINT("exit", ("len: %lu  net->read_pos[4]: %d",
                       len, mysql->net.read_pos[4]));
@@ -1676,6 +1720,28 @@
   DBUG_RETURN(0);
 }
 
+
+/**
+  Read next event from the relay log and execute it.
+  
+  This function handles such aspects of event execution as:
+  - Handle the UNTIL barier, i.e. don't execute event (and stop replication)
+    if a given position in the log has been reached.
+  - Avoid cyclic replication (e.g. skip events comming from the same server)
+  - Skip number of events specified by user.
+  - Handle transient errors by positining ralay log so that next call will
+    re-execute whole event group. Keep repetition count so that the group is
+    not repeated ad infinitum.
+  
+  @note Rafal thinks that all these tasks should be handled outside this function 
+  (i.e. in the SQL thread).
+  
+  After reading and executing the event, relay log is positioned at next
+  event to execute. In case of retries (caused by transient errors), the next 
+  event is the first event in the currently executed event group.
+  
+ */ 
+
 static int exec_relay_log_event(THD* thd, RELAY_LOG_INFO* rli)
 {
   DBUG_ENTER("exec_relay_log_event");
@@ -1686,6 +1752,7 @@
      wait for something for example inside of next_event().
    */
   pthread_mutex_lock(&rli->data_lock);
+  
   /*
     This tests if the position of the end of the last previous executed event
     hits the UNTIL barrier.
@@ -1715,10 +1782,12 @@
 
   if (sql_slave_killed(thd,rli))
   {
+    sql_print_information("Slave SQL thread killed while waiting for next event.");
     pthread_mutex_unlock(&rli->data_lock);
     delete ev;
     DBUG_RETURN(1);
   }
+
   if (ev)
   {
     int type_code = ev->get_type_code();
@@ -1747,7 +1816,21 @@
          type_code != ROTATE_EVENT && type_code != STOP_EVENT &&
          type_code != START_EVENT_V3 && type_code!= FORMAT_DESCRIPTION_EVENT))
     {
-      DBUG_PRINT("info", ("event skipped"));
+      // TODO: print event coordinates
+      if (rli->slave_skip_counter)
+        sql_print_information(
+          "Slave SQL thread: exec_relay_log_event: skipping %s event "
+          "(slave_skip_counter=%d).", 
+          ev->get_type_str(), rli->slave_skip_counter
+        );
+      else  // TODO: don't report internal events (better: don't send them to SQL thread)
+        sql_print_information(
+          "Slave SQL thread: exec_relay_log_event: skipping %s event "
+          "comming from the same server.", ev->get_type_str()
+        );
+      
+      DBUG_PRINT("info", ("(slave_skip_counter=%d).", rli->slave_skip_counter));
+      
       /*
         We only skip the event here and do not increase the group log
         position.  In the event that we have to restart, this means
@@ -1793,6 +1876,8 @@
       DBUG_RETURN(0);                                 // avoid infinite update loops
     }
     pthread_mutex_unlock(&rli->data_lock);
+    
+    // TODO: print event group position
 
     thd->server_id = ev->server_id; // use the original server id for logging
     thd->set_time();                            // time the query
@@ -1804,9 +1889,13 @@
                         FLAGSTR(thd->options, OPTION_NOT_AUTOCOMMIT),
                         FLAGSTR(thd->options, OPTION_BEGIN)));
 
-    exec_res = ev->exec_event(rli);
-    DBUG_PRINT("info", ("exec_event result: %d", exec_res));
+    thd->no_warnings_for_error= TRUE;  // don't double report errors as warnings when
+                                       // using my_error()
+    
+    exec_res= ev->exec_event(rli);
+    
     DBUG_ASSERT(rli->sql_thd==thd);
+     
     /*
        Format_description_log_event should not be deleted because it will be
        used to read info about the relay log's format; it will be deleted when
@@ -1817,8 +1906,10 @@
       DBUG_PRINT("info", ("Deleting the event after it has been executed"));
       delete ev;
     }
+    
     if (slave_trans_retries)
     {
+
       if (exec_res && has_temporary_error(thd))
       {
         const char *errmsg;
@@ -1826,8 +1917,8 @@
           We were in a transaction which has been rolled back because of a
           temporary error;
           let's seek back to BEGIN log event and retry it all again.
-	  Note, if lock wait timeout (innodb_lock_wait_timeout exceeded)
-	  there is no rollback since 5.0.13 (ref: manual).
+          Note, if lock wait timeout (innodb_lock_wait_timeout exceeded)
+          there is no rollback since 5.0.13 (ref: manual).
           We have to not only seek but also
           a) init_master_info(), to seek back to hot relay log's start for later
           (for when we will come back to this hot log after re-processing the
@@ -1838,13 +1929,16 @@
         */
         if (rli->trans_retries < slave_trans_retries)
         {
+          sql_print_information("Slave SQL thread: Transient error, re-executing event group.");
+          
           if (init_master_info(rli->mi, 0, 0, 0, SLAVE_SQL))
-            sql_print_error("Failed to initialize the master info structure");
+            sql_print_error("Slave SQL thread: Failed to initialize "
+                            "the master info structure");
           else if (init_relay_log_pos(rli,
                                       rli->group_relay_log_name,
                                       rli->group_relay_log_pos,
                                       1, &errmsg, 1))
-            sql_print_error("Error initializing relay log position: %s",
+            sql_print_error("Slave SQL thread: Error initializing relay log position: %s",
                             errmsg);
           else
           {
@@ -1875,21 +1969,35 @@
           the execution will proceed as usual; in the case of a
           non-transient error, the slave will stop with an error.
          */
-        rli->trans_retries= 0; // restart from fresh
+        if (rli->trans_retries)
+        { 
+          if (exec_res) 
+            sql_print_information("Slave SQL thread: Failed to execute event group "
+                                  "after %d re-tries",rli->trans_retries);
+          else
+            sql_print_information("Slave SQL thread: Successful event group execution "
+                                  "after %d re-tries",rli->trans_retries);
+          rli->trans_retries= 0;
+        }         
       }
-    }
+
+    }  // if (slave_trans_retries)
+
     DBUG_RETURN(exec_res);
   }
+  
   pthread_mutex_unlock(&rli->data_lock);
-  slave_print_msg(ERROR_LEVEL, rli, 0, "\
-Could not parse relay log event entry. The possible reasons are: the master's \
-binary log is corrupted (you can check this by running 'mysqlbinlog' on the \
-binary log), the slave's relay log is corrupted (you can check this by running \
-'mysqlbinlog' on the relay log), a network problem, or a bug in the master's \
-or slave's MySQL code. If you want to check the master's binary log or slave's \
-relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' \
-on this slave.\
-");
+  
+  slave_print_msg(ERROR_LEVEL, rli, 0, 
+    "Could not parse relay log event entry. The possible reasons are: the master's "
+    "binary log is corrupted (you can check this by running 'mysqlbinlog' on the "
+    "binary log), the slave's relay log is corrupted (you can check this by running "
+    "'mysqlbinlog' on the relay log), a network problem, or a bug in the master's "
+    "or slave's MySQL code. If you want to check the master's binary log or slave's "
+    "relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' "
+    "on this slave."
+  );
+  
   DBUG_RETURN(1);
 }
 
@@ -1926,13 +2034,15 @@
 
   pthread_detach_this_thread();
   thd->thread_stack= (char*) &thd; // remember where our stack is
+  
   if (init_slave_thread(thd, SLAVE_THD_IO))
   {
     pthread_cond_broadcast(&mi->start_cond);
     pthread_mutex_unlock(&mi->run_lock);
-    sql_print_error("Failed during slave I/O thread initialization");
+    sql_print_error("Slave I/O thread: Thread initialization failed");
     goto err;
   }
+  
   mi->io_thd = thd;
   pthread_mutex_lock(&LOCK_thread_count);
   threads.append(thd);
@@ -1953,33 +2063,79 @@
   }
 
   thd->proc_info = "Connecting to master";
-  // we can get killed during safe_connect
-  if (!safe_connect(thd, mysql, mi))
+  
+ reconnect:  
+  
+  if (retry_count > master_retry_count)
   {
-    sql_print_information("Slave I/O thread: connected to master '%s@%s:%d',"
-                          "replication started in log '%s' at position %s",
-                          mi->user, mi->host, mi->port,
-			  IO_RPL_LOG_NAME,
-			  llstr(mi->master_log_pos,llbuff));
-  /*
-    Adding MAX_LOG_EVENT_HEADER_LEN to the max_packet_size on the I/O
-    thread, since a replication event can become this much larger than
-    the corresponding packet (query) sent from client to master.
-  */
-    mysql->net.max_packet_size= thd->net.max_packet_size+= MAX_LOG_EVENT_HEADER;
+    sql_print_error("Slave I/O thread: Too many connection retries, aborting.");
+    goto err;                             // Don't retry forever
   }
-  else
+
+  if (retry_count) // we are already connected, disconnect first
+  {
+    mi->slave_running= MYSQL_SLAVE_RUN_NOT_CONNECT;
+    #ifdef SIGNAL_WITH_VIO_CLOSE
+    thd->clear_active_vio();
+    #endif
+    
+    end_server(mysql);
+    
+    sql_print_information(
+      "Slave I/O thread: Closed connection to master, will reconnect "
+      "(retry_count: %d).", retry_count
+    );
+    
+    /*
+      First time retry immediately, assuming that we can recover
+      right away - if first time fails, sleep between re-tries
+      hopefuly the admin can fix the problem sometime
+    */
+    if (retry_count>1)
+    {
+      
+      thd->proc_info= "Waiting before reconnect";
+      
+      safe_sleep(thd,mi->connect_retry,(CHECK_KILLED_FUNC)io_slave_killed,
+                 (void*)mi);
+    }
+    
+    if (io_slave_killed(thd,mi))
+    {
+      sql_print_information("Slave I/O thread: Killed while waiting before reconnect.");
+      goto err;
+    }
+  
+    thd->proc_info = "Reconnecting to the master";
+  
+  }
+  
+  // Establish connection to master
+  
+  if (connect_to_master(thd, mysql, mi, 
+                        retry_count>0,  // reconnecting? 
+                        FALSE))         // suppress warnings?
+    goto err;
+  
+  // we can get killed during connection
+  if (io_slave_killed(thd,mi))
   {
     sql_print_information("Slave I/O thread killed while connecting to master");
     goto err;
   }
-
-connected:
-
+  
+  sql_print_information("Slave I/O thread: Connected to master '%s@%s:%d', "
+                        "Reading log '%s' from position %s",
+                        mi->user, mi->host, mi->port,
+			                  IO_RPL_LOG_NAME, llstr(mi->master_log_pos,llbuff));
+  retry_count++;
+  
+  
   // TODO: the assignment below should be under mutex (5.0)
   mi->slave_running= MYSQL_SLAVE_RUN_CONNECT;
   thd->slave_net = &mysql->net;
   thd->proc_info = "Checking master version";
+  
   if (get_master_version_and_clock(mysql, mi))
     goto err;
 
@@ -1996,64 +2152,31 @@
   }
 
   DBUG_PRINT("info",("Starting reading binary log from master"));
+  
   while (!io_slave_killed(thd,mi))
   {
     bool suppress_warnings= 0;
     thd->proc_info = "Requesting binlog dump";
+    
+    
     if (request_dump(mysql, mi, &suppress_warnings))
-    {
-      sql_print_error("Failed on request_dump()");
+    {      
       if (io_slave_killed(thd,mi))
       {
-        sql_print_information("Slave I/O thread killed while requesting master \
-dump");
+        sql_print_information("Slave I/O thread: Killed while requesting master dump");
         goto err;
       }
 
-      mi->slave_running= MYSQL_SLAVE_RUN_NOT_CONNECT;
-      thd->proc_info= "Waiting to reconnect after a failed binlog dump request";
-#ifdef SIGNAL_WITH_VIO_CLOSE
-      thd->clear_active_vio();
-#endif
-      end_server(mysql);
-      /*
-        First time retry immediately, assuming that we can recover
-        right away - if first time fails, sleep between re-tries
-        hopefuly the admin can fix the problem sometime
-      */
-      if (retry_count++)
-      {
-        if (retry_count > master_retry_count)
-          goto err;                             // Don't retry forever
-        safe_sleep(thd,mi->connect_retry,(CHECK_KILLED_FUNC)io_slave_killed,
-                   (void*)mi);
-      }
-      if (io_slave_killed(thd,mi))
-      {
-        sql_print_information("Slave I/O thread killed while retrying master \
-dump");
-        goto err;
-      }
+      sql_print_information("Slave I/O thread: Reconnecting after failed dump request");
 
-      thd->proc_info = "Reconnecting after a failed binlog dump request";
-      if (!suppress_warnings)
-        sql_print_error("Slave I/O thread: failed dump request, \
-reconnecting to try again, log '%s' at postion %s", IO_RPL_LOG_NAME,
-                        llstr(mi->master_log_pos,llbuff));
-      if (safe_reconnect(thd, mysql, mi, suppress_warnings) ||
-          io_slave_killed(thd,mi))
-      {
-        sql_print_information("Slave I/O thread killed during or \
-after reconnect");
-        goto err;
-      }
+      thd->proc_info= "Reconnecting after failed dump request";
 
-      goto connected;
+      goto reconnect;
     }
 
     while (!io_slave_killed(thd,mi))
     {
-      bool suppress_warnings= 0;
+      rpl::read_event_result::type res;
       /*
          We say "waiting" because read_event() will wait if there's nothing to
          read. But if there's something to read, it will not wait. The
@@ -2061,81 +2184,56 @@
          we're in fact receiving nothing.
       */
       thd->proc_info = "Waiting for master to send event";
-      ulong event_len = read_event(mysql, mi, &suppress_warnings);
+      
+      ulong event_len = read_event(mysql, mi, &res);
+      
       if (io_slave_killed(thd,mi))
       {
-        if (global_system_variables.log_warnings)
-          sql_print_information("Slave I/O thread killed while reading event");
+        if (res != rpl::read_event_result::EOF)
+          sql_print_information("Slave I/O thread: Killed while reading event "
+                                "(master shutdown?)");
         goto err;
       }
 
-      if (event_len == packet_error)
-      {
-        uint mysql_error_number= mysql_errno(mysql);
-        if (mysql_error_number == ER_NET_PACKET_TOO_LARGE)
-        {
-          sql_print_error("\
-Log entry on master is longer than max_allowed_packet (%ld) on \
-slave. If the entry is correct, restart the server with a higher value of \
-max_allowed_packet",
-                          thd->variables.max_allowed_packet);
-          goto err;
-        }
-        if (mysql_error_number == ER_MASTER_FATAL_ERROR_READING_BINLOG)
-        {
-          sql_print_error(ER(mysql_error_number), mysql_error_number,
-                          mysql_error(mysql));
-          goto err;
-        }
-        mi->slave_running= MYSQL_SLAVE_RUN_NOT_CONNECT;
+      switch (res) {
+        
+      case rpl::read_event_result::OK:
+        break; // continue if no error  
+        
+      case rpl::read_event_result::RETRY:
+        sql_print_information("Slave I/O thread: "
+                              "Reconnectiong to master after failed event read, "
+                              "last event group at %s in '%s'.",
+                              llstr(mi->master_log_pos,llbuff), IO_RPL_LOG_NAME);
+
         thd->proc_info = "Waiting to reconnect after a failed master event read";
-#ifdef SIGNAL_WITH_VIO_CLOSE
-        thd->clear_active_vio();
-#endif
-        end_server(mysql);
-        if (retry_count++)
-        {
-          if (retry_count > master_retry_count)
-            goto err;                           // Don't retry forever
-          safe_sleep(thd,mi->connect_retry,(CHECK_KILLED_FUNC)io_slave_killed,
-                     (void*) mi);
-        }
-        if (io_slave_killed(thd,mi))
-        {
-          if (global_system_variables.log_warnings)
-            sql_print_information("Slave I/O thread killed while waiting to \
-reconnect after a failed read");
-          goto err;
-        }
-        thd->proc_info = "Reconnecting after a failed master event read";
-        if (!suppress_warnings)
-          sql_print_information("Slave I/O thread: Failed reading log event, \
-reconnecting to retry, log '%s' position %s", IO_RPL_LOG_NAME,
-                          llstr(mi->master_log_pos, llbuff));
-        if (safe_reconnect(thd, mysql, mi, suppress_warnings) ||
-            io_slave_killed(thd,mi))
-        {
-          if (global_system_variables.log_warnings)
-            sql_print_information("Slave I/O thread killed during or after a \
-reconnect done to recover from failed read");
-          goto err;
-        }
-        goto connected;
-      } // if (event_len == packet_error)
+        
+        goto reconnect;
+      
+      default:
+        goto err;  // stop I/O thread in all other cases
+      
+      }
 
-      retry_count=0;                    // ok event, reset retry counter
+      if( retry_count > 1 )
+      {
+        sql_print_information("Slave I/O thread: Successful event read after reconnection "
+                              "(%d retries).", retry_count);
+        retry_count=1;    // ok event, reset retry counter 
+      }
+      
       thd->proc_info = "Queueing master event to the relay log";
+      
       if (queue_event(mi,(const char*)mysql->net.read_pos + 1,
                       event_len))
-      {
-        sql_print_error("Slave I/O thread could not queue event from master");
         goto err;
-      }
+      
       if (flush_master_info(mi, 1))
       {
-        sql_print_error("Failed to flush master info file");
+        sql_print_error("Slave I/O thread: Failed to flush master info file");
         goto err;
       }
+      
       /*
         See if the relay logs take too much space.
         We don't lock mi->rli.log_space_lock here; this dirty read saves time
@@ -2151,8 +2249,8 @@
 #ifndef DBUG_OFF
       {
         char llbuf1[22], llbuf2[22];
-        DBUG_PRINT("info", ("log_space_limit=%s log_space_total=%s \
-ignore_log_space_limit=%d",
+        DBUG_PRINT("info", ("log_space_limit=%s log_space_total=%s "
+                            "ignore_log_space_limit=%d",
                             llstr(rli->log_space_limit,llbuf1),
                             llstr(rli->log_space_total,llbuf2),
                             (int) rli->ignore_log_space_limit));
@@ -2164,8 +2262,8 @@
           !rli->ignore_log_space_limit)
         if (wait_for_relay_log_space(rli))
         {
-          sql_print_error("Slave I/O thread aborted while waiting for relay \
-log space");
+          sql_print_error("Slave I/O thread: Aborted while waiting for relay "
+                          "log space.");
           goto err;
         }
     }
@@ -2174,7 +2272,7 @@
   // error = 0;
 err:
   // print the current replication position
-  sql_print_information("Slave I/O thread exiting, read up to log '%s', position %s",
+  sql_print_information("Slave I/O thread: Exiting, read up to log '%s', position %s",
                   IO_RPL_LOG_NAME, llstr(mi->master_log_pos,llbuff));
   VOID(pthread_mutex_lock(&LOCK_thread_count));
   thd->query = thd->db = 0; // extra safety
@@ -2252,6 +2350,7 @@
   rli->slave_run_id++;
 
   pthread_detach_this_thread();
+  
   if (init_slave_thread(thd, SLAVE_THD_SQL))
   {
     /*
@@ -2260,9 +2359,10 @@
     */
     pthread_cond_broadcast(&rli->start_cond);
     pthread_mutex_unlock(&rli->run_lock);
-    sql_print_error("Failed during slave thread initialization");
+    sql_print_error("Slave SQL thread: Thread initialization failed.");
     goto err;
   }
+  
   thd->init_for_queries();
   rli->sql_thd= thd;
   thd->temporary_tables = rli->save_temporary_tables; // restore temp tables
@@ -2306,7 +2406,7 @@
                          1 /*need data lock*/, &errmsg,
                          1 /*look for a description_event*/))
   {
-    sql_print_error("Error initializing relay log position: %s",
+    sql_print_error("Slave SQL thread: Error initializing relay log position: %s",
                     errmsg);
     goto err;
   }
@@ -2341,10 +2441,12 @@
                             rli->group_master_log_name,
                             llstr(rli->group_master_log_pos,llbuff)));
   if (global_system_variables.log_warnings)
-    sql_print_information("Slave SQL thread initialized, starting replication in \
-log '%s' at position %s, relay log '%s' position: %s", RPL_LOG_NAME,
-                    llstr(rli->group_master_log_pos,llbuff),rli->group_relay_log_name,
-                    llstr(rli->group_relay_log_pos,llbuff1));
+    sql_print_information(
+      "Slave SQL thread initialized, starting replication in "
+      "log '%s' at position %s, relay log '%s' position: %s", 
+      RPL_LOG_NAME, llstr(rli->group_master_log_pos,llbuff),
+      rli->group_relay_log_name, llstr(rli->group_relay_log_pos,llbuff1)
+    );
 
   /* execute init_slave variable */
   if (sys_init_slave.value_length)
@@ -2352,8 +2454,7 @@
     execute_init_command(thd, &sys_init_slave, &LOCK_sys_init_slave);
     if (thd->query_error)
     {
-      sql_print_error("\
-Slave SQL thread aborted. Can't execute init_slave query");
+      sql_print_error("Slave SQL thread aborted. Can't execute init_slave query");
       goto err;
     }
   }
@@ -2365,8 +2466,14 @@
     thd->proc_info = "Reading event from the relay log";
     DBUG_ASSERT(rli->sql_thd == thd);
     THD_CHECK_SENTRY(thd);
+    
     if (exec_relay_log_event(thd,rli))
     {
+      // FIXME: move whole error reporting inside exec_relay_log_event where we have
+      // more information to report. Alternatively, move all event execution logic 
+      // into this loop and leave to exec_relay_log_event the simple task of executing
+      // a single event.
+      
       // do not scare the user if SQL thread was simply killed or stopped
       if (!sql_slave_killed(thd,rli))
       {
@@ -2374,17 +2481,25 @@
           retrieve as much info as possible from the thd and, error codes and warnings
           and print this to the error log as to allow the user to locate the error
         */
-        if (thd->net.last_errno != 0)
+        
+        int last_errno= thd->net.last_errno;
+        
+        if (last_errno != 0)
         {
           if (rli->last_slave_errno == 0)
           {
-            slave_print_msg(ERROR_LEVEL, rli, thd->net.last_errno,
-                            thd->net.last_error ?
-                            thd->net.last_error : "<no message>");
+            slave_print_msg(ERROR_LEVEL, rli, last_errno,
+              "Slave SQL thread: error after executing event: %s",
+              thd->net.last_error ?
+              thd->net.last_error : "<no message>");
           }
-          else if (rli->last_slave_errno != (int) thd->net.last_errno)
+          else if (last_errno != ER_UNKNOWN_ERROR && rli->last_slave_errno != last_errno)
           {
-            sql_print_error("Slave (additional info): %s Error_code: %d",
+            // TODO: reconsider printing unknown errors when their desctiptions are
+            //       meaningful. In that case how to avoid double printing messages
+            //       reported via slave_print_msg?
+             
+            sql_print_error("Slave SQL thread: Additional error info: %s Error_code: %d",
                             thd->net.last_error ?
                             thd->net.last_error : "<no message>",
                             thd->net.last_errno);
@@ -2394,13 +2509,17 @@
         /* Print any warnings issued */
         List_iterator_fast<MYSQL_ERROR> it(thd->warn_list);
         MYSQL_ERROR *err;
+        
         while ((err= it++))
-          sql_print_warning("Slave: %s Error_code: %d",err->msg, err->code);
+          if( (int)err->code != rli->last_slave_errno && (int)err->code != last_errno )
+            sql_print_warning("Slave SQL thread: %s Error_code: %d",err->msg, err->code);
 
-        sql_print_error("\
-Error running query, slave SQL thread aborted. Fix the problem, and restart \
-the slave SQL thread with \"SLAVE START\". We stopped at log \
-'%s' position %s", RPL_LOG_NAME, llstr(rli->group_master_log_pos, llbuff));
+        sql_print_error(
+          "Slave SQL thread: Error executing event; thread aborted. "
+          "Fix the problem, and restart the slave SQL thread with \"SLAVE START\". "
+          "We stopped at log '%s' position %s", 
+          RPL_LOG_NAME, llstr(rli->group_master_log_pos, llbuff)
+        );
       }
       goto err;
     }
@@ -3112,14 +3231,12 @@
     {
       last_errno=mysql_errno(mysql);
       suppress_warnings= 0;
-      sql_print_error("Slave I/O thread: error %s to master "
-                      "'%s@%s:%d':                       \
-Error: '%s'  errno: %d  retry-time: %d  retries: %lu",
+      sql_print_error("Slave I/O thread: Error %s to master '%s@%s:%d': "                       \
+                      "Error: %d (%s) retry-time: %d  retries: %lu",
                       (reconnect ? "reconnecting" : "connecting"),
                       mi->user, mi->host, mi->port,
-                      mysql_error(mysql), last_errno,
-                      mi->connect_retry,
-                      master_retry_count);
+                      last_errno, mysql_error(mysql),
+                      mi->connect_retry, master_retry_count);
     }
     /*
       By default we try forever. The reason is that failure will trigger
@@ -3140,17 +3257,15 @@
 
   if (!slave_was_killed)
   {
-    if (reconnect)
-    {
-      if (!suppress_warnings && global_system_variables.log_warnings)
-        sql_print_information("Slave: connected to master '%s@%s:%d',\
-replication resumed in log '%s' at position %s", mi->user,
-                        mi->host, mi->port,
-                        IO_RPL_LOG_NAME,
-                        llstr(mi->master_log_pos,llbuff));
-    }
-    else
+    if (!reconnect)
     {
+      /*
+        Adding MAX_LOG_EVENT_HEADER_LEN to the max_packet_size on the I/O
+        thread, since a replication event can become this much larger than
+        the corresponding packet (query) sent from client to master.
+      */
+      mysql->net.max_packet_size= thd->net.max_packet_size+= MAX_LOG_EVENT_HEADER;
+
       change_rpl_status(RPL_IDLE_SLAVE,RPL_ACTIVE_SLAVE);
       general_log_print(thd, COM_CONNECT_OUT, "%s@%s:%d",
                         mi->user, mi->host, mi->port);
@@ -3590,16 +3705,10 @@
       break;                                    // To end of function
     }
   }
-  if (!errmsg && global_system_variables.log_warnings)
-  {
-    sql_print_information("Error reading relay log event: %s",
-                          "slave SQL thread was killed");
-    DBUG_RETURN(0);
-  }
-
+  
 err:
   if (errmsg)
-    sql_print_error("Error reading relay log event: %s", errmsg);
+    sql_print_error("Slave: next_event: Error reading relay log: %s", errmsg);
   DBUG_RETURN(0);
 }
 

--- 1.4/sql/rpl_utility.cc	2007-01-18 11:26:34 +01:00
+++ 1.5/sql/rpl_utility.cc	2007-01-18 11:26:34 +01:00
@@ -129,7 +129,7 @@
     DBUG_ASSERT(tsh->db.str && tsh->table_name.str);
     error= 1;
     slave_print_msg(ERROR_LEVEL, rli, ER_BINLOG_ROW_WRONG_TABLE_DEF,
-                    "Table width mismatch - "
+                    "table_def::compatible_with: Table width mismatch - "
                     "received %u columns, %s.%s has %u columns",
                     (uint) size(), tsh->db.str, tsh->table_name.str,
                     tsh->fields);
@@ -143,7 +143,7 @@
       DBUG_ASSERT(tsh->db.str && tsh->table_name.str);
       error= 1;
       slave_print_msg(ERROR_LEVEL, rli, ER_BINLOG_ROW_WRONG_TABLE_DEF,
-                      "Column %d type mismatch - "
+                      "table_def::compatible_with: Column %d type mismatch - "
                       "received type %d, %s.%s has type %d",
                       col, type(col), tsh->db.str, tsh->table_name.str,
                       table->field[col]->type());

--- 1.2/sql/rpl_utility.h	2007-01-18 11:26:34 +01:00
+++ 1.3/sql/rpl_utility.h	2007-01-18 11:26:34 +01:00
@@ -65,7 +65,7 @@
       size  Number of elements in array 'types'
    */
   table_def(field_type *types, my_size_t size)
-    : m_type(types), m_size(size)
+    : m_size(size), m_type(types)
   {
   }
 

--- 1.10/sql/partition_element.h	2007-01-18 11:26:34 +01:00
+++ 1.11/sql/partition_element.h	2007-01-18 11:26:34 +01:00
@@ -85,9 +85,10 @@
   partition_element(partition_element *part_elem)
   : part_max_rows(part_elem->part_max_rows),
     part_min_rows(part_elem->part_min_rows),
+    range_value(0), 
     partition_name(NULL),
     tablespace_name(part_elem->tablespace_name),
-    range_value(0), part_comment(part_elem->part_comment),
+    part_comment(part_elem->part_comment),
     data_file_name(part_elem->data_file_name),
     index_file_name(part_elem->index_file_name),
     engine_type(part_elem->engine_type),
Thread
bk commit into 5.1 tree (rafal:1.2372) BUG#25597rsomla18 Jan