List: Commits
From: Stewart Smith
Date: January 12 2006 4:15am
Subject: bk commit into 5.0 tree (stewart:1.1997) BUG#15695
Below is the list of changes that have just been committed into a local
5.0 repository of stewart. When stewart does a push these changes will
be propagated to the main repository and, within 24 hours after the
push, to the public repository.
For information on how to access the public repository
see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html

ChangeSet
  1.1997 06/01/12 15:15:03 stewart@stripped +1 -0
  Bug#15695 starting nodes hang in phase 2 forever on temporary network failure
  
  Fix so that:
  - if --initial is given, we can warn that there may be network partitioning
  - if --initial is not given but we're doing an initial start, you can get an error
    once the nodes can talk to each other.
  
  This is because we hit these problems at a very early stage of startup, before we
  have inferred whether it's an initial start (among other things).
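
  The initial-start check described above can be sketched outside the NDB
  kernel roughly as follows. This is only an illustration: NodeSet,
  kMaxNodes and shouldDelayInitialStart are hypothetical names, and the
  bitsets stand in for the node-type lookup, the "waiting for" set and
  the connected-nodes set used in the real patch.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical upper bound on node ids; stands in for MAX_NDB_NODES.
constexpr std::size_t kMaxNodes = 49;
using NodeSet = std::bitset<kMaxNodes>;

// Returns true when an initial start should be held back because at least
// one data node we are still waiting for is not connected yet.
// All parameter names are assumptions made for this sketch.
bool shouldDelayInitialStart(bool initialStart,
                             const NodeSet& dataNodes,
                             const NodeSet& waitingFor,
                             const NodeSet& connected) {
  if (!initialStart) {
    return false;  // the check only applies to initial starts
  }
  // Node id 0 is skipped, mirroring the loop in the patch that starts at 1.
  for (std::size_t id = 1; id < kMaxNodes; ++id) {
    if (!dataNodes.test(id)) {
      continue;  // ignore nodes that are not data nodes
    }
    if (waitingFor.test(id) && !connected.test(id)) {
      return true;  // expected node is unreachable: warn and wait
    }
  }
  return false;
}
```

  In the patch itself the "delay" branch emits warning events and returns
  without starting the cluster; the sketch only models the decision.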

  ndb/src/kernel/blocks/qmgr/QmgrMain.cpp
    1.24 06/01/12 15:14:58 stewart@stripped +36 -0
    - allow reception of CONNECT_REP when ZRUNNING
    - in CM_REGCONF, check that we both agree on who the president is.
      - if we disagree, then there probably was network partitioning at some point.
    - in CM_REGREF, if this is an initial start, check that we can see everybody.
      If not, warn the user that they should check network connections.
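
The president-agreement check in CM_REGCONF can be illustrated with a
small standalone sketch. The names here (kNoPresident, checkPresident)
are hypothetical; kNoPresident stands in for ZNIL, and the function
returns a diagnostic string instead of calling systemErrorLab as the
real code does.

```cpp
#include <cstdio>
#include <string>

// Hypothetical sentinel meaning "no president known yet" (ZNIL in the patch).
constexpr unsigned kNoPresident = 0xFFFFu;

// Compares our view of the president with the one reported by a peer.
// Returns an empty string when the views are compatible, otherwise a
// message describing the disagreement.
std::string checkPresident(unsigned ours, unsigned theirs) {
  if (ours == kNoPresident || ours == theirs) {
    return {};  // nothing known yet, or both sides agree
  }
  char buf[256];
  std::snprintf(buf, sizeof(buf),
                "Disagreement on who the president is. "
                "We think it's %u, but somebody else thinks %u.",
                ours, theirs);
  return buf;
}
```

A non-empty result corresponds to the case the patch treats as a likely
network partition during cluster startup.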

# This is a BitKeeper patch.  What follows are the unified diffs for the
# set of deltas contained in the patch.  The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User:	stewart
# Host:	willster.(none)
# Root:	/home/stewart/Documents/MySQL/5.0/bug15695

--- 1.23/ndb/src/kernel/blocks/qmgr/QmgrMain.cpp	2005-12-06 21:25:49 +11:00
+++ 1.24/ndb/src/kernel/blocks/qmgr/QmgrMain.cpp	2006-01-12 15:14:58 +11:00
@@ -288,6 +288,8 @@
     jam();
     break;
   case ZRUNNING:
+    jam();
+    break;
   case ZPREPARE_FAIL:
   case ZFAIL_CLOSING:
     jam();
@@ -619,6 +621,19 @@
     return;
   }
 
+  if(cpresident != ZNIL && cpresident != cmRegConf->presidentNodeId)
+  {
+    jam();
+    char buf[256];
+    BaseString::snprintf(buf,sizeof(buf),"Disagreement on who the president is"
+                         ". We think it's %u, but somebody else thinks %u."
+                         " This probably means there was network partitioning "
+                         "when trying to start the cluster and you ended up "
+                         "with two nodes trying to control cluster startup.",
+                         cpresident, cmRegConf->presidentNodeId);
+    systemErrorLab(signal, __LINE__, buf);
+    return;
+  }
 
   cpdistref    = cmRegConf->presidentBlockRef;
   cpresident   = cmRegConf->presidentNodeId;
@@ -782,6 +797,27 @@
     return;
   }
 
+  if(theConfiguration.getInitialStart())
+  {
+    NodeRecPtr nodePtr;
+
+    for (nodePtr.i = 1; nodePtr.i < MAX_NDB_NODES; nodePtr.i++) {
+      jam();
+      ptrAss(nodePtr, nodeRec);
+      if(getNodeInfo(nodePtr.i).getType() != NodeInfo::DB)
+        continue;
+
+      if(c_start.m_nodes.isWaitingFor(nodePtr.i) &&
+         !c_connectedNodes.get(nodePtr.i))
+      {
+        warningEvent("Initial start without all nodes present.");
+        warningEvent("Waiting until we can communicate with other nodes"
+                     " before attempting to start the cluster.");
+        warningEvent("If other nodes are starting, check network connection.");
+        return;
+      }
+    }
+  }
   /**
    * All configured nodes has agreed
    */