Jim Starkey wrote:
> Kevin Lewis wrote:
>> Jim Starkey wrote:
>>> Olav Sandstaa wrote:
>>>> Here is a quick summary of some possible solutions for how to solve
>>>> this problem that have popped up the last days.
>>>> First a summary of what the problem is: when determining if a
>>>> RecordVersion should be visible for a transaction we check the
>>>> RecordVersion's pointer to the transaction object. If this is
>>>> non-null we access several member variables of the transaction
>>>> object. At least one of the inconsistencies that I have seen occurs
>>>> when the transaction object is deleted (and overwritten) just as we
>>>> are using its member variables for determining the visibility of the
>>>> The list of more or less random ideas for how this can be solved:
>>>> 1. We can stop using the pointer to the transaction and instead
>>>> locate the transaction by searching for it using the transaction id.
>>>> Disadvantage of this: a) searching for the correct transaction
>>>> could become costly. b) we would likely have to introduce an extra
>>>> data structure for locating transaction (e.g. a hash table) and c)
>>>> we would need to acquire the shared lock on the
>>> We already have somebody who does that,
>>> TransactionManager::findTransaction. It doesn't use a hash table (it
>>> could), but whether it does or doesn't, it requires a shared lock and
>>> release. That's way too expensive for checking record visibility.
>> I agree that this is too expensive.
>>>> 2. We can avoid that the transaction object get deleted "behind our
>>>> back" by acquiring the shared lock on the committed transaction list
>>>> each time we call Transaction::getRelativeState and
>>>> Transaction::visible(). Disadvantage: a) a lot of access to this
>>>> shared lock, potential contention
>>> I agree. Squared.
>> Too much contention on the committed transaction list.
>>>> 3. An alternative to 2: we introduce a new shared lock for just this
>>>> purpose to avoid the contention on the committed transaction list
>>>> lock. When purging transactions we must acquire an exclusive lock on
>>>> this SyncObject (and we have to reduce the frequency of checking for
>>>> purging of transactions). Disadvantage: a new shared lock that needs
>>>> to be used by both the code checking for visibility and for purging
>>>> of transactions - could still become a hot-spot (I do not think this
>>>> is very different from cycle locking, but I have not thought too
>>>> much about it yet).
>>> I think we need to avoid both locks and interlocked instructions
>>> (increments and decrements) if we're going to keep performance up.
>> This suggestion is about a SyncObject in each transaction that is held
>> with a shared lock whenever another thread has a reference to it.
>> Normally, this is done in Falcon with a useCount and interlocked
>> instructions. The Transaction class does have a useCount. Why not
>> use it? This would be idea 3A. Transaction::addRef() could be called
>> for every RecordVersion added to the transaction in
>> Transactino::addRecord(). Then also for every stack based use of that
>> pointer, an addref and release should be called. I think this might
>> work without too much cost.
> There isn't much difference in cost between a lock(Shared)/unlock and
> addRef/release. Each requires a pair of interlocked instructions.
Right, the cost is there. How bad is it? This solution has a huge pile
of interlocked instructions associated with all the
RecordVersion::transaction pointers and the stack based pointers.
Compare that with one per thread using cycle locks. And that cycle lock
could also apply to other objects that are headed to the same purgatory.
If a call to purgeTransactions detaches all the records and then finds
that the useCount is still > 0, it must skip on to the next transaction.
That is a kind of purgatory right there. Each call to
purgeTransactions will get a chance to purge this old transaction as
soon as the useCount goes to zero.
>>>> 4. Avoid having to access the transaction object: "Duplicate" the
>>>> commit information (one integer) in every RecordVersion object. So
>>>> when committing a transaction, we update every record version with
>>>> the commitId of the transaction and all RecordVersion objects:
>>>> Disadvantages: a) duplication of information b) issues with doing
>>>> the commit as an "atomic operation".
>>> This is doable and would be my second choice. Making it visible
>>> across processors might be tricky.
>> I think this is too expensive. It makes the commit take longer
>> because the list of records attached to the committing transaction
>> need to be marked with this commitId. AND, this needs to happen while
>> both transaction lists are locked because that is where the commitId
>> is assigned to the transaction.
> I'm going to back off on my assessment of doable. Let's move this to
> the back burner for now.
>>>> 5. "Make it safe to access the transaction pointer and transaction
>>>> object": By never deleting the transaction object until all record
>>>> objects pointing to it have been deleted. This would also simplify
>>>> some of the code since we would always have the transaction
>>>> available. Disadvantages: a) foot print would increase since we
>>>> would potentially have a lot of old transaction objects laying
>>>> around (and we do not want to introduce chilling or backlogging of
>>>> transaction objects :-) ) b) The current scavenger (or a new
>>>> transaction scavenger?) would need to be able to determine when a
>>>> transaction object was ready for purging. I do not know this part of
>>>> the code, but it might be solvable by having a "record version
>>>> counter" in each transaction object. c) I do not know what extra
>>>> this would cost but I think it should be possible to do without any
>>> That's a lot of dead transaction objects. And it isn't just records
>>> that point to the transaction but also threads that picked up the
>>> transaction pointer and are waiting for someone to purge the
>>> transaction so they can crash Falcon. Evil little suckers, threads.
>> Well, this sounds doable to me also. Records would be detached from
>> these transactions by the scavenger, mostly. The exception is
>> releaseSavepoint() and rollbackSavepoint(). So records, while they
>> can still be used, will always have that transaction pointer. It
>> could be set to null in code that assures that no other thread is
>> using it. Pruning is like that. And records that are retired use an
>> exclusive lock on the recordLeaf and require the Record::useCount to
>> be 1. So I think we can exclude those other pesky threads with this
> Does that actually work? How does the scavenger know that another
> thread isn't using a record transaction pointer so it can delete the
> transaction object?
The more I think about this, it just makes the problem slightly less
likely because it takes more time to release records and purge the
transaction. Once the last record is scavenged, we still do not know
how soon to purge the transaction object. A client checking for
duplicates or a more recent version of that last record may need to be
scavenged for a savepoint. Both would cause a thread to look at that
record. That thread could grab the transaction pointer just before it is
pruned. So the problem still exists.
Whoever purges the transaction must know that no other thread is still
looking at it. Only useCounts, cycle locking, or eliminating the
pointers at commit time will help us, I guess.
>> Now, will the committed transaction list get too long? Maybe. But
>> this might also allow us to retire records sooner. The current
>> scavenger cannot retire Records until after they have been separated
>> from their transaction (purgeTransactions). But this would allow the
>> scavenger to do the separating, and retire records that are just not
>> needed by any other active transaction. So this solution has the
>> benefit of allowing records to be retired sooner. See the email chain
>> titled "Reluctant Scavenger" for a possible example of this problem.
>>>> 6. Kevin/Jim's proposal to use a cycle locking - I have not had the
>>>> time to consider this in detail but it might perform similar to
>>>> alternative 3 above.
>>> OK, I've got cycle locking on the brain. It doesn't cure everything
>>> -- it does nothing about the Palestine problem, for example. But
>>> it's a cheap, non-blocking solution to short duration pointer
>>> validity. It 1) makes it possible to compress intermediate versions,
>>> and 2) papers over problems of transaction pointer validation. Since
>>> everyone understands that the typical software sequence is 0 ... 1
>>> ... 2 ... infinite, there are clearly other places that a are just
>>> screaming for cycle locking.
>> But this change is a big deal. I plan to write up a separate email
>> with lots of questions about this.
> I don't think it is a big deal at all, so I look forward to your email.
> One thing that does need to be added, however, is a mechanism to release
> a cycle lock before going to sleep for a protracted interval, e.g.
> waiting on another transaction. I've got a version of CycleLock that
> uses thread specific data to do this.
That is exactly my big question at the moment. A call stack will have a
bunch of pointers to things in the current cycle. If that thread
releases the cycle, it cannot trust those pointers when it wakes up.
And we have a 50 second waitForTransaction! The cycle must be given up.
>> 7. Yes, I have another alternative. Probably the easiest of all.
>> Comment out this line;
>> This is new with Olav's dependency manager change and it causes
>> transactions to be purged much sooner than a scavenger cycle. It
>> extends the CPU cycles during the critical/serialized part of every
>> commit and may actually be to blame for a recent performance slowdown
>> in DBT2. (that is a big maybe! Not sure about that at all). But if we
>> leave the transactions around until the scavenge cycle, this error
>> might not happen any more...
> I don't understand how this fixes the problem. Doesn't this leave the
> race between picking up a transaction pointer from a record and deleting
> the transaction object persist? I'll grant you that it is less likely,
> maybe even highly improbable. But you need to demonstrate it works in
> all circumstances.
OK, it just a band-aid. But ouch-less, non-sticking, and has pictures
of the Flintstones!
>>>> I welcome comments and alternative suggestions.
>>>> My current "favorites" this far is 5 (simple, safe and almost
>>>> lockfree - but uses way more memory) and 6 (because it is a great
>>>> and general idea).
>>> Hmmm. Isn't that just throwing memory at the problem???? (where
>>> have I heard that before?).
>>> I like 6, of course.
>> I would like to try 7 first, then 5, then 3A, and then maybe 6.
> The theoretical problem is that there generally accessible pointers to
> transaction objects, so deleting the transaction object is problematic.
> I think the only solutions involve:
> * Eliminating the pointer to the transaction
> * A reference count of the transaction object held by the record object
> * A lock on the transaction object held by the record object
> * A lock on transaction existence for the duration of record
> visibility cycle
> I think the most fruitful place to look for a solution is the last
> alternative. The real question is the granularity of the lock (or
> implied lock). An ad hoc lock (nice rhyme) is too expensive. "Forever"
> is cheap but hard on memory. The scavenge cycle is tempting, but
> doesn't actually solve the problem.
> Maybe the first alternative has an answer. Suppose we had a reusable
> transaction id as an index into a transaction state vector. That would
> tie down only a byte per transaction rather than a large object. We
> would still need a mechanism to recycle the transaction ids, however.
> Something might take a day and think about this...
Sounds as big a change as cycle locking. Let's talk about wait states
with a cycle lock first...