From: Olav Sandstaa Date: November 19 2008 2:19pm Subject: Re: New dependency manager and Transaction::getRelativeState List-Archive: http://lists.mysql.com/falcon/198 Message-Id: <49242074.2090803@sun.com> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=ISO-8859-1 Content-Transfer-Encoding: 7BIT Kevin, Thanks a lot for looking into this and for providing a very detailed explanation of what happens. I agree that this is likely an existing bug in the code. Comments below: Kevin Lewis wrote: > Olav, > > Thanks for the tip on running falcon_bug_22165 repeatably. It stopped > at this line on the third run; > > for (int n = 0; n < numberStates; ++n) > if (states [n].transactionId == transId) > return CommittedInvisible; <-- break here > > Transaction 8988 is calling Table::fetchForUpdate which has read a > base record and is calling getRelativeState to determine if it is > visible. The base record was owned by transaction 8979 which is > currently the oldest active transaction. But this records state=7 > > static const int recUnlocked = 7; // record is being unlocked > > You will notice that Table::unlockRecord() calls removeRecord(record) > which sets the transaction pointer to NULL. That is important to do. > > This base record for 8979 is not visible to 8988 since it is a lock > record that is being backed out. The dependency is tracked by the > states array, but we have to figure out another way to do it. > > I have repeated this breakpoint many times and it is always a record > that has state = recUnlocked. > > I think this case is actually an existing bug! It currently returns > CommittedInvisible or CommittedVisible depending on whether it is > consistent read. But this record is NOT committed. The code assumes > that if Recordversion::transaction is NULL, the transaction must be > committed. This transaction could be rolling back and undoing all > locks, or it could be committing and releasing lock records that were > not changed. > > The code should go into the (transaction->isActive() section and wait > for that transaction. I'm not sure what to do about this, but let me > think about it a little. > > However, my original proposal still is valid. We need to prevent > these lock records from being seen while the transaction is NULL, or > handle it correctly. I have code for the new dependency manager that handles this similarly to what the existing code (and thus have the same bug as the existing code) but I am not quite satisfied with the implementation that I currently have (as pointed out in the "issue 1" in the first email I sent about this yesterday). The new dependency manager uses the "startEvent" (start time) and the "endEvent" (commit time) for each transaction to determine relationships (and thus dependency) between two transaction. The main test for this is: if (transaction A's endEvent < transaction B's startEvent) { // Transaction A committed before B even started // A and B was never active on the same time // B would not have any dependencies on A (using the old dependency manage) // A can be clean/removed/deleted } This test is basically replacing most of the tests for "dependencies == 0" in the old dependency manager. The problem with the code in Transaction::getRelativeState() is that we do not always have the transaction object for transaction A (only the transaction id of it). I have a solution for this where B commits (in Transaction::commit()) it copies it "endEvent" number into all RecordVersion objects. This makes B's endEvent available in Transaction::getRelativeState() also when we do not have the pointer to the transaction object. This makes it possible to replace the for-loop that goes through the states array with a similar test using the endEvent for the "other" and the startEvent of "this" transaction for determining if the "other" transaction was active or not when "this" transaction started. It was actully by this approach that I found out that we had cases where the pointer to transaction was NULL while the transaction in fact was not committed. The way I saw this was that Transaction::getReletiveState got called and the endEvent was zero (e.g. the transaction had not committed). So I can actually by looking at the endEvent number I have introduced in the RecordVersion object both determine if the "other" transaction was active when "this" transaction started (test above) and if it has committed or not (endEvent equal to zero). But... I do not like having to extend the use of the endEvent transition event into the RecordVersion objects... So if we could fix the existing bug that we found yesterday it would have been great if we just could have declared the for-loop that tests the states array in Transaction::getRelativeState() as "dead code". Olav