Olav,
Thanks for the tip on running falcon_bug_22165 repeatably. It stopped
at this line on the third run;
for (int n = 0; n < numberStates; ++n)
if (states [n].transactionId == transId)
return CommittedInvisible; <-- break here
Transaction 8988 is calling Table::fetchForUpdate which has read a base
record and is calling getRelativeState to determine if it is visible.
The base record was owned by transaction 8979 which is currently the
oldest active transaction. But this records state=7
static const int recUnlocked = 7; // record is being unlocked
You will notice that Table::unlockRecord() calls removeRecord(record)
which sets the transaction pointer to NULL. That is important to do.
This base record for 8979 is not visible to 8988 since it is a lock
record that is being backed out. The dependency is tracked by the
states array, but we have to figure out another way to do it.
I have repeated this breakpoint many times and it is always a record
that has state = recUnlocked.
I think this case is actually an existing bug! It currently returns
CommittedInvisible or CommittedVisible depending on whether it is
consistent read. But this record is NOT committed. The code assumes
that if Recordversion::transaction is NULL, the transaction must be
committed. This transaction could be rolling back and undoing all
locks, or it could be committing and releasing lock records that were
not changed.
The code should go into the (transaction->isActive() section and wait
for that transaction. I'm not sure what to do about this, but let me
think about it a little.
However, my original proposal still is valid. We need to prevent these
lock records from being seen while the transaction is NULL, or handle it
correctly.
Olav Sandstaa wrote:
> Hi Kevin,
>
> I have started looking into your suggestions. Here is one initial result:
>
>
> Kevin Lewis wrote:
>> Kevin Lewis wrote:
>>>
>>>
>>> In other words, if a record no longer had a transaction pointer, it
>>> ended before any transaction that would be calling getRelativeState()
>>> ever started.
>>
>> This implies that the code you are worried about is no longer used or
>> needed;
>>
>> // Be sure it was not active when we started.
>>
>> for (int n = 0; n < numberStates; ++n)
>> if (states [n].transactionId == transId)
>> return CommittedInvisible;
>>
>> Can you run a test to see if that return is EVER hit in the current code?
>
> I have done a clean branch of mysql-6.0-falcon-team and run some tests,
> and Yes, I am able to hit the return statement in the above code. With
> the following extra print statement added:
>
> for (int n = 0; n < numberStates; ++n)
> if (states [n].transactionId == transId)
> {
> fprintf(stderr, "Transaction::getRelativeState:
> mytid=%d mystate=%d transId=%d otherTransId=%d otherState=%d
> dependencies=%d\n", transactionId, state, transId,
> states[n].transaction->transactionId,
> states[n].transaction->state,states[n].transaction->dependencies );
> FATAL("Transaction::getRelativeState");
> return CommittedInvisible;
> }
> }
>
>
> I get the following written out:
>
> Transaction::getRelativeState: mytid=1953 mystate=0 transId=1952
> otherTransId=1952 otherState=0 dependencies=1
> [Falcon] Error: Transaction::getRelativeState
>
> It seems like this code is not "quite dead" and it seems like the
> transaction that the RecordVersion object is referring to (but have no
> pointer too) is not deleted but in Active state with one existing
> dependency (if reading what the transaction pointer points to is valid
> data).
>
> It is possible to reproduce this by running falcon_bug_22165 repeatable
> on a multi-CPU server (you get the crash easier if you add some extra
> printout each time Transaction::getRelativeState() is called).
>
> So I am still "worried" about this code, and I think I need to try to
> figure out what is the answers to the follow questions:
>
>
>>>> 2) Comments in the code above seems to suggest that the reason for
>>>> not having the pointer to the transaction object is due to that the
>>>> transaction object has been deleted and it assumes that the
>>>> transaction has been committed. In my testing this seems not like it
>>>> is an correct assumption, in most cases when the transaction pointer
>>>> is NULL the transaction is actually still in Active state. Questions:
>>>>
>>>> a) Are the comments in the code wrong? Or
>>>> b) The code is wrong? And
>>>> c) Why do we not have the transaction pointer pointing to the
>>>> transaction if it is still active
>>>>
>>>> or (d) I have I introduced some error that causes this?
>>>
>>> You have introduced an error that causes Transaction::commitRecords()
>>> to be called while dependencies still exist.
>
> Kevin might still be right about that I have introduced an error but it
> seems like something is also not quite correct when using the existing
> dependency manager.
>
> Olav
>