Olav, comments below;
========================================
> Olav wrote;
> Do you have any opinion about how I should proceed with this? I think we
> both agree that this probably is an existing bug, and I think that
> solving this might make this part of using the dependency manager
> simpler. One alternative is that I continue the work on the dependency
> manager making it work identical to the old one in this case and then
> later after we have solved this bug do any simplifications that can be
> done? An alternative is that that I started to focus on this bug but I
> am a bit worried that that might take some time both to understand and
> fix (and take away focus from the dependency manager).
>
> Another option for me is to spend some time writing some tests for this
> area to check and verify that the both the old and new dependency
> manager works for this parts of the code. Philip has asked a few times
> about how we plan to test this area and by spending some time on testing
> I could get a better overview of what kind of testing that is useful.
>
> I also try to get some progress on some of my other more trivial bugs
> when I feel for doing something that needs less concentration and
> effort. And it feels good to get progress on the bug count too.
>
> Let me know if you have any specific opinion of what I should do if not
> I probably will continue working as planned on the dependency manager,
> some testing and some other bugs.
>
> Is this new bug something that can be discuss in Friday's meeting or
> will it be to detailed for you to bring up?
========================================
> Olav wrote;
> The new dependency manager uses the "startEvent" (start time) and the
> "endEvent" (commit time) for each transaction to determine relationships
> (and thus dependency) between two transaction. The main test for this is:
>
> if (transaction A's endEvent < transaction B's startEvent) { //
> Transaction A committed before B even started
> // A and B was never active on the same time
> // B would not have any dependencies on A (using the old
> dependency manage)
> // A can be clean/removed/deleted
> }
Note that you cannot remove transaction A until it's endEvent is < the
startEvent of the 'oldestActiveTransaction'. In fact, without the
dependency pointers between transactions to prevent this,
Transaction::commitRecords() will probably not actually be able to break
that linkage between records and transaction at
Transaction::writeComplete(). It will have to be done later if there
are 'dependencies'. The current code uses that dependency counter to
release these older transactions when the dependencies pointers finally
go away. Fortunately, the current code also has a call to
TransactionManager::purgeTransactions() in the scavenger. You will need
to rely on this to clean up these old committed transactions that still
have records attached to them.
That means that the work I am doing is now critical for the success of
your new dependency management code. My new scavenger starts up on a
load based event and will call purgeTransactions for you much more
often. The current code will only call it every 30 seconds by default.
At least it does not wait until the scavenge threshold is reached in
the record cache like it does before releasing old records!
> This test is basically replacing most of the tests for "dependencies ==
> 0" in the old dependency manager.
>
> The problem with the code in Transaction::getRelativeState() is that we
> do not always have the transaction object for transaction A (only the
> transaction id of it).
This must change. A record needs to keep the transaction pointer until
the transaction is visible to all. The alternative is to add the
endEvent to each and every RecordVersion, which I agree is generally not
what we want.
You have found an exception to this with Table::unlockRecord(). I am
working on a solution to this exception and will provide it to you
tomorrow. Even if you add the endEvent to each Recordversion, we need
to fix this problem that a lock record can get state=recUnlock. This
state is not expected by many places in the code.
> I have a solution for this where B commits (in
> Transaction::commit()) it copies it "endEvent" number into all
> RecordVersion objects. This makes B's endEvent available in
> Transaction::getRelativeState() also when we do not have the pointer to
> the transaction object. This makes it possible to replace the for-loop
> that goes through the states array with a similar test using the
> endEvent for the "other" and the startEvent of "this" transaction for
> determining if the "other" transaction was active or not when "this"
> transaction started.
>
> It was actully by this approach that I found out that we had cases where
> the pointer to transaction was NULL while the transaction in fact was
> not committed. The way I saw this was that Transaction::getReletiveState
> got called and the endEvent was zero (e.g. the transaction had not
> committed). So I can actually by looking at the endEvent number I have
> introduced in the RecordVersion object both determine if the "other"
> transaction was active when "this" transaction started (test above) and
> if it has committed or not (endEvent equal to zero).
>
> But... I do not like having to extend the use of the endEvent transition
> event into the RecordVersion objects... So if we could fix the existing
> bug that we found yesterday it would have been great if we just could
> have declared the for-loop that tests the states array in
> Transaction::getRelativeState() as "dead code".
There is another problem with putting the endEvent into each and every
record. It takes a while to do. It would have to be done at
Transaction::commit. Transaction::getRelativeState is supposed to wait
on another active transaction until it commits. But if it sees a number
in endEvent, it will assume that the transaction is committed already.
This may cause some race conditions. It is much better to go directly
to the transaction object to see its state.
I talked to Jim today about how to fix this. He confirmed my solution
which is to get rid of the recUnlock state and leave detached lock
records attached to the transaction when they are 'unlocked'. It will
still replace this record on the recordLeaf so that future Table::fetch
calls will find the lock record's priorVersion.
In addition, I need to make sure that any code caught looking at one of
these lock records has the wherewithall to look at the prior. And if it
waits, it MUST do another Table::fetch().
I'll get this code to you as soon as I can.
Kevin