MySQL Lists are EOL. Please join:

List:Commits« Previous MessageNext Message »
From:Andrei Elkin Date:April 27 2009 9:18pm
Subject:bzr commit into mysql-6.0-rpl branch (aelkin:2842) Bug#38716 Bug#44312
View as plain text  
#At file:///home/andrei/MySQL/BZR/FIXES/6.0-rpl-bug38716-restart_slave_kill_crashed/ based on revid:alfranio.correia@stripped

 2842 Andrei Elkin	2009-04-28
      Bug #38716 slave crashed after 'stop slave' during concurrent stop/start/reset
      Bug #44312  deadlock between IO thread and SLAVE START
      
      The issue in terminate_slave_threads() of bug#38716 was reproduced in a slighly
      different form. terminate_slave_threads() should not acquire run_lock when it's
      called from slave_thread()->start_slave_threads(); 
      that leads to an assertion on run_lock mutex.
      OTH, init_slave()->start_slave_threads() path requires terminate_slave_threads() to
      be invoked with skip_lock == false.
      
      No other crashes has been found using the regression stress test program.
      Another issues is a deadlock as described separately in Bug #44312.
      It was caused by grabbing two mutexes by IO thread and STARTing SLAVE thread in
      reverse order.
      
      Fixed:
      
      terminate_slave_threads() does not request locking in start_slave_threads()
      when it's called from start_slave() and does it when it's called from init_slave();
      
      rotate_relay_log() does not acquire mi->run_lock which is safe. 
      The rli->inited is not guarded by this mutex, and locking of the mutex 
      in the function contradicts the safe pattern of locking with run_lock.
modified:
  sql/rpl_info.h
  sql/slave.cc

per-file messages:
  sql/rpl_info.h
    rli->inited is made volatile.
  sql/slave.cc
    terminate_slave_threads() should not acquire run_lock when it's
    called from start_slave_threads() and should do that when it's called from init_slave();
    Notice, terminate_slave_thread() does not unlock start_lock in the start_slave() execution
    branch. Moreover it does not unlock in the init_slave() branch either because of the while 
    operator's argument.
=== modified file 'sql/rpl_info.h'
--- a/sql/rpl_info.h	2009-04-02 16:14:14 +0000
+++ b/sql/rpl_info.h	2009-04-27 21:18:21 +0000
@@ -35,7 +35,13 @@ public:
 
   THD *info_thd; 
 
-  bool inited;
+  /*
+    inited changes its value within LOCK_active_mi-guarded critical
+    sections  at times of start_slave_threads() (0->1) and end_slave() (1->0).
+    Readers may not acquire the mutex while they realize potential concurrency
+    issue.
+  */
+  volatile bool inited;
   volatile bool abort_slave;
   volatile uint slave_running;
   volatile ulong slave_run_id;

=== modified file 'sql/slave.cc'
--- a/sql/slave.cc	2009-04-13 13:24:28 +0000
+++ b/sql/slave.cc	2009-04-27 21:18:21 +0000
@@ -720,11 +720,15 @@ int start_slave_thread(pthread_handler h
       DBUG_PRINT("sleep",("Waiting for slave thread to start"));
       const char* old_msg = thd->enter_cond(start_cond,cond_lock,
                                             "Waiting for slave thread to start");
-      pthread_cond_wait(start_cond,cond_lock);
+      pthread_cond_wait(start_cond, cond_lock);
       thd->exit_cond(old_msg);
       pthread_mutex_lock(cond_lock); // re-acquire it as exit_cond() released
       if (thd->killed)
+      {
+        if (start_lock)
+          pthread_mutex_unlock(start_lock);
         DBUG_RETURN(thd->killed_errno());
+      }
     }
   }
   if (start_lock)
@@ -775,7 +779,7 @@ int start_slave_threads(bool need_slave_
                              &mi->rli->slave_running, &mi->rli->slave_run_id,
                              mi);
     if (error)
-      terminate_slave_threads(mi, thread_mask & SLAVE_IO, 0);
+      terminate_slave_threads(mi, thread_mask & SLAVE_IO, !need_slave_mutex);
   }
   DBUG_RETURN(error);
 }
@@ -2500,6 +2504,7 @@ pthread_handler_t handle_slave_io(void *
 
   thd= new THD; // note that contructor of THD uses DBUG_ !
   THD_CHECK_SENTRY(thd);
+  DBUG_ASSERT(mi->info_thd == 0);
   mi->info_thd = thd;
 
   pthread_detach_this_thread();
@@ -4371,9 +4376,6 @@ void rotate_relay_log(Master_info* mi)
 
   DBUG_EXECUTE_IF("crash_before_rotate_relaylog", DBUG_ABORT(););
 
-  /* We don't lock rli->run_lock. This would lead to deadlocks. */
-  pthread_mutex_lock(&mi->run_lock);
-
   /*
      We need to test inited because otherwise, new_file() will attempt to lock
      LOCK_log, which may not be inited (if we're not a slave).
@@ -4402,7 +4404,6 @@ void rotate_relay_log(Master_info* mi)
   */
   rli->relay_log.harvest_bytes_written(&rli->log_space_total);
 end:
-  pthread_mutex_unlock(&mi->run_lock);
   DBUG_VOID_RETURN;
 }
 

Thread
bzr commit into mysql-6.0-rpl branch (aelkin:2842) Bug#38716 Bug#44312Andrei Elkin27 Apr
  • Re: bzr commit into mysql-6.0-rpl branch (aelkin:2842) Bug#38716Bug#44312Luís Soares27 Apr