List:Falcon Storage Engine« Previous MessageNext Message »
From:Olav Sandstaa Date:March 22 2009 10:59pm
Subject:Deadlock detector accesses deleted Transaction?
View as plain text  
Hi,

Bug#41665 reports a crash situation in
Transaction::waitForTransaction() caused by Falcon accessing a deleted
transaction object. I expected this to be caused by the RecordVersion
object having a pointer to a deleted Transaction object. And by
introducing the TransactionState object and using this instead of the
Transaction object it first seemed like it fixed the problem. The
attached repro case for bug#41655 which failed almost on every run
before the fix ran without error for the ~15 test runs (or more than
an hour) - but then it crashed again.

The new crash occurs in the "dead lock detector" in
Transaction::waitForTransaction(). In the for loop the "trans" pointer
points to an illegal address:

    volatile Transaction *trans;
    for (trans = transaction->waitingFor; trans; trans = trans->waitingFor)
        if (trans == this)
            {
            *deadlock = true;
            break;
            }

Note that as part of the rewrite and introduction of the new
TransactionState object I have rewritten this code to:

       volatile TransactionState *trans;
    for (trans = ts->waitingFor; trans; trans = trans->waitingFor)
        if (trans == this->transactionState)
            {
            *deadlock = true;
            break;
            }

I hope this code should behave the same as the original code but I can
not exclude that I have introduced something in the code that causes
this crash.

My initial theory for how this crash can happen is that the
Transaction object's waitingFor pointer can access a deleted
Transaction object. I have not been able to get this in a debuger/core
file but currently my theory based on reading code is that the
following scenario can happen:

1. When we start, the following three Transactions A, B, C exists and
A and B is blocked by C:

   
     |A|->|B|->|C|

   where "->" is the waitingFor pointer.

2. Transaction C commits (or more likely aborts). This unlocks the
"syncIsActive" lock which B is waiting on. Note that C's transaction
object still exists since B has a use count on it.

3. A "new" transaction D calls Transaction::waitForTransaction(A) and
the wait for graph becomes:

     |D|->|A|->|B|->|C|

4. Transaction D initializes the "dead lock detector" by walking
through this list. At some point the "trans" pointer points on "C" -
at this time the thread is interrupted (or more likely runs in
parallel with thread B on a multi-core machine).

5. Thread B is waken-up (signalled by thread C in step 2 above) and
continue to run: it release the use count it has on C' transaction
object. I think this can be the last use count and this leads to C's
transaction object being deleted.

6. The thread running transaction D continues with the "dead lock
detector". The "trans" pointer now point to a deleted (and
overwritten) transaction object and this leads to it (eventually)
accessing an illegal address (and crashes).


Note that this is just a theory for how this might happen (and again:
I do not rule out that I have introduced something in my prototype
implementation of the new TransactionState object). Still if anyone
has a better theory for what happens or suggestions for a solutions
let me know.

Note also the following comment in the header of the
Transaction::waitForTransaction():

// Note:
// Deadlock check could use locking, because  there are potentially 
concurrent
// threads checking and modifying the waitFor list.
// Instead, it implements a fancy lock-free algorithm  that works 
reliably only
// with full memory barriers. Thus "volatile"-specifier and COMPARE_EXCHANGE
// are used  when traversing and modifying waitFor list. Maybe it is 
better to
// use inline assembly or intrinsics to generate memory barrier instead of
// volatile.

I do not quite understand the purpose of the following use of
"volatile" in the above code:

  volatile Transaction *trans;

Hopefully after a some hours of sleep I might have a good idea for
what happens and how to fix this.

Olav


Thread
Deadlock detector accesses deleted Transaction?Olav Sandstaa22 Mar
  • Re: Deadlock detector accesses deleted Transaction?Kevin Lewis23 Mar
    • Re: Deadlock detector accesses deleted Transaction?Olav Sandstaa23 Mar
      • Re: Deadlock detector accesses deleted Transaction?Kevin Lewis24 Mar
        • Re: Deadlock detector accesses deleted Transaction?Olav Sandstaa24 Mar
  • RE: Deadlock detector accesses deleted Transaction?Vladislav Vaintroub23 Mar
    • RE: Deadlock detector accesses deleted Transaction?Vladislav Vaintroub23 Mar