List: Commits
From: stewart
Date: July 2 2007 3:00pm
Subject: [patch 1/5] BUG#28804 improve NDBAPI behaviour in execute on timeout waiting for txn responses.
After (a heck of a) timeout, improve the error message we display, and
attempt a rollback of the transaction to free resources in the kernel.
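
To make the intended control flow easier to follow, here is a rough standalone
sketch of it: poll, and on a timeout log a timestamped warning, try a
rollback, set error 4012 and return -1. Everything in it (FakeTxn, the poll
callback, lastLog, rollbackAttempts) is a hypothetical stand-in and not NDB API
code. Note one deliberate difference: the sketch skips the rollback when the
timed-out call was itself a rollback, which the patch below does not do.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <functional>
#include <string>

// Hypothetical stand-ins for illustration only; not part of the NDB API.
enum class ExecType { Commit, Rollback };

struct FakeTxn {
  // Returns the number of completed transactions; 0 means the poll timed out.
  std::function<int(ExecType)> poll;
  int errorCode = 0;
  int rollbackAttempts = 0;
  std::string lastLog;

  int execute(ExecType type) {
    assert(poll && "poll callback must be set");
    if (poll(type) > 0)
      return 0;  // got a response, nothing special to do

    // Timeout: build a timestamped message and log it.
    char stamp[32];
    std::time_t now = std::time(nullptr);
    std::strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S",
                  std::localtime(&now));
    lastLog = std::string("At ") + stamp +
              " WARNING: timeout waiting for response from data nodes";
    std::fprintf(stderr, "%s\n", lastLog.c_str());

    // Try to roll back so the data nodes can free resources.
    // Assumption of this sketch: do not retry if the rollback itself
    // timed out, otherwise we would recurse indefinitely.
    if (type != ExecType::Rollback) {
      ++rollbackAttempts;
      execute(ExecType::Rollback);
    }

    errorCode = 4012;  // same error code the patch sets on timeout
    return -1;
  }
};
```

With a poll callback that always returns 0, execute(ExecType::Commit) makes
exactly one rollback attempt, leaves errorCode at 4012 and returns -1.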

Remove abort for this error case too... we can mostly continue just
fine. (only VM_TRACE builds)

I don't think this is perfect yet though... the thread can still
get rather confused until we close the transaction properly at the end...
this could be something to do with how the handler should be doing
things... I'm just not too sure. Thoughts quite welcome!


TAKE2 changes:
 - use g_eventLogger
 - restore abort()

===== ndb/src/ndbapi/NdbTransaction.cpp 1.59 vs edited =====
Index: ndb-work/ndb/src/ndbapi/NdbTransaction.cpp
===================================================================
--- ndb-work.orig/ndb/src/ndbapi/NdbTransaction.cpp	2007-07-02 16:01:30.626091533 +1000
+++ ndb-work/ndb/src/ndbapi/NdbTransaction.cpp	2007-07-02 16:18:07.394894150 +1000
@@ -481,12 +481,21 @@ NdbTransaction::executeNoBlobs(ExecType 
     while (1) {
       int noOfComp = tNdb->sendPollNdb(3 * timeout, 1, forceSend);
       if (noOfComp == 0) {
-        /** 
-         * This timeout situation can occur if NDB crashes.
-         */
-        ndbout << "This timeout should never occur, execute(..)" << endl;
+        time_t t;
+        t= time(NULL);
+        g_eventLogger.error("At %s"
+                            "WARNING: Timeout in executeNoBlobs() waiting for "
+                            "response from NDB data nodes. This should NEVER "
+                            "occur. You have likely hit a NDB Bug. Please "
+                            "file a bug.",
+                            asctime(localtime(&t)));
+        DBUG_PRINT("error",("This timeout should never occur, execute()"));
+        g_eventLogger.error("Forcibly trying to rollback txn (%p"
+                            ") to try to clean up data node resources.",
+                            this);
+        executeNoBlobs(NdbTransaction::Rollback);
 	theError.code = 4012;
-        setOperationErrorCodeAbort(4012);  // Error code for "Cluster Failure"
+        setOperationErrorCodeAbort(4012); // ndbd timeout
         DBUG_RETURN(-1);
       }//if
 

--
Stewart Smith