Below is the list of changes that have just been committed into a local
5.0 repository of stewart. When stewart does a push these changes will
be propagated to the main repository and, within 24 hours after the
push, to the public repository.
For information on how to access the public repository
see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
ChangeSet
1.1997 06/01/12 15:15:03 stewart@stripped +1 -0
Bug#15695 starting nodes hang in phase 2 forever on temporary network failure
Fix so that:
- if --initial is given, we get a warning that there may be network partitioning
- if --initial is not given but we are doing an initial start, we get an error once the
nodes can talk to each other.
This is because the problem shows up at a very early stage of startup, before we have
inferred whether it is an initial start (among other things).
ndb/src/kernel/blocks/qmgr/QmgrMain.cpp
1.24 06/01/12 15:14:58 stewart@stripped +36 -0
- allow reception of CONNECT_REP when ZRUNNING
- in CM_REGCONF, check that we both agree on who the president is.
- if we disagree, then there probably was network partitioning at some point.
- in CM_REGREF, if this is an initial start, check that we can see every node we are
waiting for. If not, warn the user that they should check network connections.
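
As a reading aid, the two checks described above can be sketched outside the kernel as
plain predicates. This is only an illustration under stated assumptions: kNoPresident,
kMaxNodes, NodeSet and both function names are hypothetical stand-ins invented here, not
NDB definitions, and the real code works on signals and block state as shown in the diff.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not the real NDB definitions.
constexpr std::uint32_t kNoPresident = 0xFFFFFFFFu;  // plays the role of ZNIL
constexpr std::size_t kMaxNodes = 64;                // plays the role of MAX_NDB_NODES

using NodeSet = std::bitset<kMaxNodes>;

// CM_REGCONF-style check: true when we already believe in a president and the
// peer names a different one, which suggests earlier network partitioning.
bool presidentsDisagree(std::uint32_t ours, std::uint32_t theirs) {
  return ours != kNoPresident && ours != theirs;
}

// CM_REGREF-style check for an initial start: true when at least one data node
// that we are still waiting for is not currently connected to us.
bool initialStartMissingNodes(const NodeSet& dataNodes,
                              const NodeSet& waitingFor,
                              const NodeSet& connected) {
  return (dataNodes & waitingFor & ~connected).any();
}
```

In this sketch a true result from the first predicate corresponds to the system error path,
and a true result from the second corresponds to emitting the warnings and returning
without starting the cluster.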
# This is a BitKeeper patch. What follows are the unified diffs for the
# set of deltas contained in the patch. The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User: stewart
# Host: willster.(none)
# Root: /home/stewart/Documents/MySQL/5.0/bug15695
--- 1.23/ndb/src/kernel/blocks/qmgr/QmgrMain.cpp 2005-12-06 21:25:49 +11:00
+++ 1.24/ndb/src/kernel/blocks/qmgr/QmgrMain.cpp 2006-01-12 15:14:58 +11:00
@@ -288,6 +288,8 @@
jam();
break;
case ZRUNNING:
+ jam();
+ break;
case ZPREPARE_FAIL:
case ZFAIL_CLOSING:
jam();
@@ -619,6 +621,19 @@
return;
}
+ if(cpresident != ZNIL && cpresident != cmRegConf->presidentNodeId)
+ {
+ jam();
+ char buf[256];
+ BaseString::snprintf(buf,sizeof(buf),"Disagreement on who the president is"
+ ". We think it's %u, but somebody else thinks %u."
+ " This probably means there was network partitioning "
+ "when trying to start the cluster and you ended up "
+ "with two nodes trying to control cluster startup.",
+ cpresident, cmRegConf->presidentNodeId);
+ systemErrorLab(signal, __LINE__, buf);
+ return;
+ }
cpdistref = cmRegConf->presidentBlockRef;
cpresident = cmRegConf->presidentNodeId;
@@ -782,6 +797,27 @@
return;
}
+ if(theConfiguration.getInitialStart())
+ {
+ NodeRecPtr nodePtr;
+
+ for (nodePtr.i = 1; nodePtr.i < MAX_NDB_NODES; nodePtr.i++) {
+ jam();
+ ptrAss(nodePtr, nodeRec);
+ if(getNodeInfo(nodePtr.i).getType() != NodeInfo::DB)
+ continue;
+
+ if(c_start.m_nodes.isWaitingFor(nodePtr.i) &&
+ !c_connectedNodes.get(nodePtr.i))
+ {
+ warningEvent("Initial start without all nodes present.");
+ warningEvent("Waiting until we can communicate with other nodes"
+ " before attempting to start the cluster.");
+ warningEvent("If other nodes are starting, check network connection.");
+ return;
+ }
+ }
+ }
/**
* All configured nodes has agreed
*/