List:Falcon Storage Engine« Previous MessageNext Message »
From:Olav Sandstaa Date:March 23 2009 4:31pm
Subject:Re: Deadlock detector accesses deleted Transaction?
View as plain text  
Kevin Lewis wrote:
> Olav, If a TransactionState::waitingFor is pointing to another 
> TransactionState object, it should set a useCount.  Would this solve 
> the problem?

The existing code already has a useCount for this. Before a Transaction 
sets the waitingFor to point to a Transaction (old 
code)/TransactionState (new code) object it increases the use count on 
the Transaction/TransactionState object. This protect the waitingFor 
pointer to always point to a valid Transaction/TransactionState object.

But this seems not to handle all situations. In the example I sent we 
use transaction B's waitingFor pointer to locate the next transaction 
(C) in the graph. Unfortunately, there is nothing preventing the thread 
"owning" transaction B from continuing and remove the waitingFor pointer 
and decrease the use count and then delete the 
Transaction/TransactionState object. So even if the thread D got a valid 
pointer to Transaction/TransactionState object C there is nothing 
preventing the Transaction/TransactionState object from getting deleted 
while it  accesses this pointer.

So the current use of use count to protect this from happening seems not 
to be enough.

I think one way to protect this would be to introduce a "use count" on 
the waitingFor pointer itself.  Then each thread traversing the graph 
would set this new use count as it walked down the graph and removed it 
as it walked back up. This would result in two interlocked operations 
per node in the graph as we traverses it (seems a bit much to me...).

There might be other, simpler solutions for this. I will try to think of 
better ways.

The scenario I presented in the last email might also not be what 
actually happens.....


> Olav Sandstaa wrote:
>> Hi,
>> Bug#41665 reports a crash situation in
>> Transaction::waitForTransaction() caused by Falcon accessing a deleted
>> transaction object. I expected this to be caused by the RecordVersion
>> object having a pointer to a deleted Transaction object. And by
>> introducing the TransactionState object and using this instead of the
>> Transaction object it first seemed like it fixed the problem. The
>> attached repro case for bug#41655 which failed almost on every run
>> before the fix ran without error for the ~15 test runs (or more than
>> an hour) - but then it crashed again.
>> The new crash occurs in the "dead lock detector" in
>> Transaction::waitForTransaction(). In the for loop the "trans" pointer
>> points to an illegal address:
>>    volatile Transaction *trans;
>>    for (trans = transaction->waitingFor; trans; trans = 
>> trans->waitingFor)
>>        if (trans == this)
>>            {
>>            *deadlock = true;
>>            break;
>>            }
>> Note that as part of the rewrite and introduction of the new
>> TransactionState object I have rewritten this code to:
>>       volatile TransactionState *trans;
>>    for (trans = ts->waitingFor; trans; trans = trans->waitingFor)
>>        if (trans == this->transactionState)
>>            {
>>            *deadlock = true;
>>            break;
>>            }
>> I hope this code should behave the same as the original code but I can
>> not exclude that I have introduced something in the code that causes
>> this crash.
>> My initial theory for how this crash can happen is that the
>> Transaction object's waitingFor pointer can access a deleted
>> Transaction object. I have not been able to get this in a debuger/core
>> file but currently my theory based on reading code is that the
>> following scenario can happen:
>> 1. When we start, the following three Transactions A, B, C exists and
>> A and B is blocked by C:
>>       |A|->|B|->|C|
>>   where "->" is the waitingFor pointer.
>> 2. Transaction C commits (or more likely aborts). This unlocks the
>> "syncIsActive" lock which B is waiting on. Note that C's transaction
>> object still exists since B has a use count on it.
>> 3. A "new" transaction D calls Transaction::waitForTransaction(A) and
>> the wait for graph becomes:
>>     |D|->|A|->|B|->|C|
>> 4. Transaction D initializes the "dead lock detector" by walking
>> through this list. At some point the "trans" pointer points on "C" -
>> at this time the thread is interrupted (or more likely runs in
>> parallel with thread B on a multi-core machine).
>> 5. Thread B is waken-up (signalled by thread C in step 2 above) and
>> continue to run: it release the use count it has on C' transaction
>> object. I think this can be the last use count and this leads to C's
>> transaction object being deleted.
>> 6. The thread running transaction D continues with the "dead lock
>> detector". The "trans" pointer now point to a deleted (and
>> overwritten) transaction object and this leads to it (eventually)
>> accessing an illegal address (and crashes).
>> Note that this is just a theory for how this might happen (and again:
>> I do not rule out that I have introduced something in my prototype
>> implementation of the new TransactionState object). Still if anyone
>> has a better theory for what happens or suggestions for a solutions
>> let me know.
>> Note also the following comment in the header of the
>> Transaction::waitForTransaction():
>> // Note:
>> // Deadlock check could use locking, because  there are potentially 
>> concurrent
>> // threads checking and modifying the waitFor list.
>> // Instead, it implements a fancy lock-free algorithm  that works 
>> reliably only
>> // with full memory barriers. Thus "volatile"-specifier and 
>> // are used  when traversing and modifying waitFor list. Maybe it is 
>> better to
>> // use inline assembly or intrinsics to generate memory barrier 
>> instead of
>> // volatile.
>> I do not quite understand the purpose of the following use of
>> "volatile" in the above code:
>>  volatile Transaction *trans;
>> Hopefully after a some hours of sleep I might have a good idea for
>> what happens and how to fix this.
>> Olav

Deadlock detector accesses deleted Transaction?Olav Sandstaa22 Mar
  • Re: Deadlock detector accesses deleted Transaction?Kevin Lewis23 Mar
    • Re: Deadlock detector accesses deleted Transaction?Olav Sandstaa23 Mar
      • Re: Deadlock detector accesses deleted Transaction?Kevin Lewis24 Mar
        • Re: Deadlock detector accesses deleted Transaction?Olav Sandstaa24 Mar
  • RE: Deadlock detector accesses deleted Transaction?Vladislav Vaintroub23 Mar
    • RE: Deadlock detector accesses deleted Transaction?Vladislav Vaintroub23 Mar