Olav, Code looks good, please push.
Olav Sandstaa wrote:
> #At file:///home/os136802/mysql/develop/repo/falcon-bug41665/ based on
> revid:jstarkey@stripped
>
> 3087 Olav Sandstaa 2009-03-27
> Fix for Bug #41665 Falcon crash in Transaction::waitForTransaction
>
> This crash occurred in the deadlock detector code. The current thread
> A is traversing the waitingFor list while another thread C has just
> aborted (or possibly committed) and is releasing its reference count
> on the transaction state object. A third thread, B, that is just in
> front of C in the waitingFor list is waken up, and also releases its
> reference count on C. If A is inspecting object C as this happens,
> this object is deleted - and can result in thread A crashing.
>
> The fix to Transaction::waitForTransaction(): Before releasing the
> reference count on the transaction state object we aquire an exclusive
> lock on the active transaction list. This will avoid that the
> transaction state object is released and deleted while another thread
> is transversing the waitingFor list.
>
> Also added a missing call to transState->release() in the catch block.
> @ storage/falcon/Transaction.cpp
> Fix to Transaction::waitForTransaction(): Before releasing the
> reference count on the transaction state object we aquire an exclusive
> lock on the active transaction list. The cause for this is that
> during the deadlock detektion code another thread might have used this
> transaction state object's waitingFor pointer to navigate to the next
> transaction state object - and this thread might be the only one that
> has a reference count on that transaction state object. This will
> avoid that the transaction state object is released and deleted while
> another thread is transversing the waitingFor list.
>
> modified:
> storage/falcon/Transaction.cpp
> === modified file 'storage/falcon/Transaction.cpp'
> --- a/storage/falcon/Transaction.cpp 2009-03-27 13:43:10 +0000
> +++ b/storage/falcon/Transaction.cpp 2009-03-27 18:12:46 +0000
> @@ -1083,11 +1083,15 @@ State Transaction::waitForTransaction(Tr
> break;
> }
>
> + // Release the lock on the active transaction list because we will
> + // possibly be blocked and will re-take the lock exclusively
> +
> + syncActiveTransactions.unlock();
> +
> if (!(*deadlock))
> {
> try
> {
> - syncActiveTransactions.unlock();
> CycleLock *cycleLock = CycleLock::unlock();
> transState->waitForTransaction();
>
> @@ -1098,6 +1102,11 @@ State Transaction::waitForTransaction(Tr
> {
> if (!COMPARE_EXCHANGE_POINTER(&transactionState->waitingFor, transState,
> NULL))
> FATAL("waitingFor was not %p", transState);
> +
> + // See comments about this locking further down
> +
> + syncActiveTransactions.lock(Exclusive);
> + transState->release();
>
> throw;
> }
> @@ -1106,6 +1115,18 @@ State Transaction::waitForTransaction(Tr
> if (!COMPARE_EXCHANGE_POINTER(&transactionState->waitingFor, transState,
> NULL))
> FATAL("waitingFor was not %p", transState);
>
> + // Before releasing the reference count on the transaction state object we
> + // need to aquire a exclusive lock on the active transaction list. The
> + // cause for this is that another thread might have used this
> + // transaction state object's waitingFor pointer to navigate to the next
> + // transaction state object - and this thread might be the only one that
> + // has a reference count on that transaction state object. So to avoid that
> + // the transaction state object is released and deleted while another
> + // thread is transversing the waitingFor list (code above) with a shared
> + // lock on it we lock it exclusively.
> +
> + syncActiveTransactions.lock(Exclusive);
> +
> state = (State) transState->state;
> transState->release();
>
>
>
>
> ------------------------------------------------------------------------
>
>