After a (very long) timeout, improve the error message we display and
attempt a rollback of the transaction to free resources in the kernel.
Also remove the abort for this error case, since we can mostly continue
just fine. (VM_TRACE builds only.)
I don't think this is perfect yet, though: the thread can still get
rather confused until we close the transaction properly at the end.
This may have something to do with how the handler should be doing
things, but I'm not sure. Thoughts are quite welcome!
TAKE2 changes:
- use g_eventLogger
- restore abort()
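
For discussion, here is a minimal standalone sketch of the control flow
described above: poll, and on timeout log a timestamped warning, make one
best-effort rollback attempt, then return -1 with error code 4012. It is
not the NDB API. ExecType, ExecResult, executeWithTimeout and the poll
callback are hypothetical stand-ins, and the guard that skips the rollback
when we are already rolling back is an assumption of this sketch, not
something the patch below does.

```cpp
#include <cassert>
#include <ctime>
#include <functional>
#include <string>

// Hypothetical stand-ins for illustration; not part of the NDB API.
enum class ExecType { Commit, Rollback };
const int kTimeoutError = 4012;  // error code the patch sets on timeout

struct ExecResult {
  int rc;                // 0 on success, -1 on failure
  int errorCode;         // 0, or kTimeoutError after a timeout
  int rollbackAttempts;  // number of cleanup rollbacks tried
};

// poll() returns the number of completed operations; 0 means timeout.
ExecResult executeWithTimeout(ExecType type,
                              const std::function<int(ExecType)>& poll,
                              std::string& log)
{
  assert(static_cast<bool>(poll));  // a callable must be supplied
  ExecResult res{0, 0, 0};
  if (poll(type) != 0)
    return res;  // got a response in time, nothing to clean up

  // Timed out: record when it happened. strftime() is used here so the
  // timestamp carries no trailing newline (asctime() appends one).
  char stamp[32];
  std::time_t now = std::time(nullptr);
  std::strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S",
                std::localtime(&now));
  log += std::string("At ") + stamp +
         " WARNING: timeout waiting for response from data nodes\n";

  // Assumption of this sketch: do not start a rollback from inside a
  // rollback, so the cleanup attempt cannot recurse.
  if (type != ExecType::Rollback) {
    log += "Forcibly trying to roll back to clean up resources\n";
    res.rollbackAttempts = 1;
    poll(ExecType::Rollback);  // best effort, result ignored
  }

  res.rc = -1;
  res.errorCode = kTimeoutError;
  return res;
}
```

A caller would pass whatever actually waits for the data nodes as poll;
the log string stands in for the event logger.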
===== ndb/src/ndbapi/NdbTransaction.cpp 1.59 vs edited =====
Index: ndb-work/ndb/src/ndbapi/NdbTransaction.cpp
===================================================================
--- ndb-work.orig/ndb/src/ndbapi/NdbTransaction.cpp 2007-07-02 16:01:30.626091533 +1000
+++ ndb-work/ndb/src/ndbapi/NdbTransaction.cpp 2007-07-02 16:18:07.394894150 +1000
@@ -481,12 +481,21 @@ NdbTransaction::executeNoBlobs(ExecType
while (1) {
int noOfComp = tNdb->sendPollNdb(3 * timeout, 1, forceSend);
if (noOfComp == 0) {
- /**
- * This timeout situation can occur if NDB crashes.
- */
- ndbout << "This timeout should never occur, execute(..)" << endl;
+ time_t t;
+ t= time(NULL);
+ g_eventLogger.error("At %s"
+ "WARNING: Timeout in executeNoBlobs() waiting for "
+ "response from NDB data nodes. This should NEVER "
+ "occur. You have likely hit an NDB bug. Please "
+ "file a bug.",
+ asctime(localtime(&t)));
+ DBUG_PRINT("error",("This timeout should never occur, execute()"));
+ g_eventLogger.error("Forcibly trying to rollback txn (%p"
+ ") to try to clean up data node resources.",
+ this);
+ executeNoBlobs(NdbTransaction::Rollback);
theError.code = 4012;
- setOperationErrorCodeAbort(4012); // Error code for "Cluster Failure"
+ setOperationErrorCodeAbort(4012); // ndbd timeout
DBUG_RETURN(-1);
}//if
--
Stewart Smith