List:Falcon Storage Engine« Previous MessageNext Message »
From:Kevin Lewis Date:March 23 2009 12:55am
Subject:Re: Deadlock detector accesses deleted Transaction?
View as plain text  
Olav, If a TransactionState::waitingFor is pointing to another 
TransactionState object, it should set a useCount.  Would this solve the 
problem?

Olav Sandstaa wrote:
> Hi,
> 
> Bug#41665 reports a crash situation in
> Transaction::waitForTransaction() caused by Falcon accessing a deleted
> transaction object. I expected this to be caused by the RecordVersion
> object having a pointer to a deleted Transaction object. And by
> introducing the TransactionState object and using this instead of the
> Transaction object it first seemed like it fixed the problem. The
> attached repro case for bug#41655 which failed almost on every run
> before the fix ran without error for the ~15 test runs (or more than
> an hour) - but then it crashed again.
> 
> The new crash occurs in the "dead lock detector" in
> Transaction::waitForTransaction(). In the for loop the "trans" pointer
> points to an illegal address:
> 
>    volatile Transaction *trans;
>    for (trans = transaction->waitingFor; trans; trans = trans->waitingFor)
>        if (trans == this)
>            {
>            *deadlock = true;
>            break;
>            }
> 
> Note that as part of the rewrite and introduction of the new
> TransactionState object I have rewritten this code to:
> 
>       volatile TransactionState *trans;
>    for (trans = ts->waitingFor; trans; trans = trans->waitingFor)
>        if (trans == this->transactionState)
>            {
>            *deadlock = true;
>            break;
>            }
> 
> I hope this code should behave the same as the original code but I can
> not exclude that I have introduced something in the code that causes
> this crash.
> 
> My initial theory for how this crash can happen is that the
> Transaction object's waitingFor pointer can access a deleted
> Transaction object. I have not been able to get this in a debuger/core
> file but currently my theory based on reading code is that the
> following scenario can happen:
> 
> 1. When we start, the following three Transactions A, B, C exists and
> A and B is blocked by C:
> 
>       |A|->|B|->|C|
> 
>   where "->" is the waitingFor pointer.
> 
> 2. Transaction C commits (or more likely aborts). This unlocks the
> "syncIsActive" lock which B is waiting on. Note that C's transaction
> object still exists since B has a use count on it.
> 
> 3. A "new" transaction D calls Transaction::waitForTransaction(A) and
> the wait for graph becomes:
> 
>     |D|->|A|->|B|->|C|
> 
> 4. Transaction D initializes the "dead lock detector" by walking
> through this list. At some point the "trans" pointer points on "C" -
> at this time the thread is interrupted (or more likely runs in
> parallel with thread B on a multi-core machine).
> 
> 5. Thread B is waken-up (signalled by thread C in step 2 above) and
> continue to run: it release the use count it has on C' transaction
> object. I think this can be the last use count and this leads to C's
> transaction object being deleted.
> 
> 6. The thread running transaction D continues with the "dead lock
> detector". The "trans" pointer now point to a deleted (and
> overwritten) transaction object and this leads to it (eventually)
> accessing an illegal address (and crashes).
> 
> 
> Note that this is just a theory for how this might happen (and again:
> I do not rule out that I have introduced something in my prototype
> implementation of the new TransactionState object). Still if anyone
> has a better theory for what happens or suggestions for a solutions
> let me know.
> 
> Note also the following comment in the header of the
> Transaction::waitForTransaction():
> 
> // Note:
> // Deadlock check could use locking, because  there are potentially 
> concurrent
> // threads checking and modifying the waitFor list.
> // Instead, it implements a fancy lock-free algorithm  that works 
> reliably only
> // with full memory barriers. Thus "volatile"-specifier and 
> COMPARE_EXCHANGE
> // are used  when traversing and modifying waitFor list. Maybe it is 
> better to
> // use inline assembly or intrinsics to generate memory barrier instead of
> // volatile.
> 
> I do not quite understand the purpose of the following use of
> "volatile" in the above code:
> 
>  volatile Transaction *trans;
> 
> Hopefully after a some hours of sleep I might have a good idea for
> what happens and how to fix this.
> 
> Olav
> 
> 
> 
Thread
Deadlock detector accesses deleted Transaction?Olav Sandstaa22 Mar
  • Re: Deadlock detector accesses deleted Transaction?Kevin Lewis23 Mar
    • Re: Deadlock detector accesses deleted Transaction?Olav Sandstaa23 Mar
      • Re: Deadlock detector accesses deleted Transaction?Kevin Lewis24 Mar
        • Re: Deadlock detector accesses deleted Transaction?Olav Sandstaa24 Mar
  • RE: Deadlock detector accesses deleted Transaction?Vladislav Vaintroub23 Mar
    • RE: Deadlock detector accesses deleted Transaction?Vladislav Vaintroub23 Mar