From: Kevin Lewis Date: March 23 2009 12:55am Subject: Re: Deadlock detector accesses deleted Transaction? List-Archive: http://lists.mysql.com/falcon/630 Message-Id: <49C6DE1A.3040404@sun.com> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=ISO-8859-1 Content-Transfer-Encoding: 7BIT Olav, If a TransactionState::waitingFor is pointing to another TransactionState object, it should set a useCount. Would this solve the problem? Olav Sandstaa wrote: > Hi, > > Bug#41665 reports a crash situation in > Transaction::waitForTransaction() caused by Falcon accessing a deleted > transaction object. I expected this to be caused by the RecordVersion > object having a pointer to a deleted Transaction object. And by > introducing the TransactionState object and using this instead of the > Transaction object it first seemed like it fixed the problem. The > attached repro case for bug#41655 which failed almost on every run > before the fix ran without error for the ~15 test runs (or more than > an hour) - but then it crashed again. > > The new crash occurs in the "dead lock detector" in > Transaction::waitForTransaction(). In the for loop the "trans" pointer > points to an illegal address: > > volatile Transaction *trans; > for (trans = transaction->waitingFor; trans; trans = trans->waitingFor) > if (trans == this) > { > *deadlock = true; > break; > } > > Note that as part of the rewrite and introduction of the new > TransactionState object I have rewritten this code to: > > volatile TransactionState *trans; > for (trans = ts->waitingFor; trans; trans = trans->waitingFor) > if (trans == this->transactionState) > { > *deadlock = true; > break; > } > > I hope this code should behave the same as the original code but I can > not exclude that I have introduced something in the code that causes > this crash. > > My initial theory for how this crash can happen is that the > Transaction object's waitingFor pointer can access a deleted > Transaction object. I have not been able to get this in a debuger/core > file but currently my theory based on reading code is that the > following scenario can happen: > > 1. When we start, the following three Transactions A, B, C exists and > A and B is blocked by C: > > |A|->|B|->|C| > > where "->" is the waitingFor pointer. > > 2. Transaction C commits (or more likely aborts). This unlocks the > "syncIsActive" lock which B is waiting on. Note that C's transaction > object still exists since B has a use count on it. > > 3. A "new" transaction D calls Transaction::waitForTransaction(A) and > the wait for graph becomes: > > |D|->|A|->|B|->|C| > > 4. Transaction D initializes the "dead lock detector" by walking > through this list. At some point the "trans" pointer points on "C" - > at this time the thread is interrupted (or more likely runs in > parallel with thread B on a multi-core machine). > > 5. Thread B is waken-up (signalled by thread C in step 2 above) and > continue to run: it release the use count it has on C' transaction > object. I think this can be the last use count and this leads to C's > transaction object being deleted. > > 6. The thread running transaction D continues with the "dead lock > detector". The "trans" pointer now point to a deleted (and > overwritten) transaction object and this leads to it (eventually) > accessing an illegal address (and crashes). > > > Note that this is just a theory for how this might happen (and again: > I do not rule out that I have introduced something in my prototype > implementation of the new TransactionState object). Still if anyone > has a better theory for what happens or suggestions for a solutions > let me know. > > Note also the following comment in the header of the > Transaction::waitForTransaction(): > > // Note: > // Deadlock check could use locking, because there are potentially > concurrent > // threads checking and modifying the waitFor list. > // Instead, it implements a fancy lock-free algorithm that works > reliably only > // with full memory barriers. Thus "volatile"-specifier and > COMPARE_EXCHANGE > // are used when traversing and modifying waitFor list. Maybe it is > better to > // use inline assembly or intrinsics to generate memory barrier instead of > // volatile. > > I do not quite understand the purpose of the following use of > "volatile" in the above code: > > volatile Transaction *trans; > > Hopefully after a some hours of sleep I might have a good idea for > what happens and how to fix this. > > Olav > > >