List: Falcon Storage Engine
From: Kevin Lewis
Date: January 28 2009 6:57pm
Subject: Re: Problems with record visibility and how it is computed

Jim Starkey wrote:
> Kevin Lewis wrote:
>> Jim Starkey wrote:
>>> Olav Sandstaa wrote:
>>>> Hi,
>>>>
>>>> Here is a quick summary of some possible solutions for how to solve 
>>>> this problem that have popped up the last days.
>>>>
>>>> First a summary of what the problem is: when determining if a 
>>>> RecordVersion should be visible for a transaction we check the 
>>>> RecordVersion's pointer to the transaction object. If this is 
>>>> non-null we access several member variables of the transaction 
>>>> object. At least one of the inconsistencies that I have seen occurs 
>>>> when the transaction object is deleted (and overwritten) just as we 
>>>> are using its member variables for determining the visibility of the 
>>>> transaction.
>>>>
>>>> The list of more or less random ideas for how this can be solved:
>>>>
>>>> 1. We can stop using the pointer to the transaction and instead 
>>>> locate the transaction by searching for it using the transaction id. 
>>>> Disadvantage of this: a) searching for the correct transaction 
>>>> could become costly. b) we would likely have to introduce an extra 
>>>> data structure for locating transactions (e.g. a hash table) and c) 
>>>> we would need to acquire the shared lock on the 
>>>> Transaction::committedList.
>>> We already have somebody who does that, 
>>> TransactionManager::findTransaction.  It doesn't use a hash table (it 
>>> could), but whether it does or doesn't, it requires a shared lock and 
>>> release.  That's way too expensive for checking record visibility.
>> I agree that this is too expensive.
>>
>>>> 2. We can prevent the transaction object from getting deleted 
>>>> "behind our back" by acquiring the shared lock on the committed 
>>>> transaction list each time we call Transaction::getRelativeState and 
>>>> Transaction::visible(). Disadvantage: a) a lot of access to this 
>>>> shared lock, potential contention
>>> I agree.  Squared.
>> Too much contention on the committed transaction list.
>>
>>>> 3. An alternative to 2: we introduce a new shared lock for just this 
>>>> purpose to avoid the contention on the committed transaction list 
>>>> lock. When purging transactions we must acquire an exclusive lock on 
>>>> this SyncObject (and we have to reduce the frequency of checking for 
>>>> purging of transactions). Disadvantage: a new shared lock that needs 
>>>> to be used by both the code checking for visibility and for purging 
>>>> of transactions - could still become a hot-spot (I do not think this 
>>>> is very different from cycle locking, but I have not thought too 
>>>> much about it yet).
>>> I think we need to avoid both locks and interlocked instructions 
>>> (increments and decrements) if we're going to keep performance up.
>> This suggestion is about a SyncObject in each transaction that is held 
>> with a shared lock whenever another thread has a reference to it. 
>> Normally, this is done in Falcon with a useCount and interlocked 
>> instructions.  The Transaction class does have a useCount.  Why not 
>> use it?  This would be idea 3A.  Transaction::addRef() could be called 
>> for every RecordVersion added to the transaction in 
>> Transaction::addRecord().  Then also for every stack-based use of that 
>> pointer, an addref and release should be called.  I think this might 
>> work without too much cost.
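
Roughly, idea 3A looks like this (a simplified sketch with made-up members, not the actual Falcon classes); each addRef/release is one interlocked instruction:

```cpp
#include <atomic>
#include <cassert>

// Sketch: the transaction's useCount is bumped once per attached
// RecordVersion and once per stack-held pointer.
struct Transaction {
    std::atomic<int> useCount{0};
    bool purged = false;
    void addRef() { useCount.fetch_add(1, std::memory_order_relaxed); }
    // Returns true when this release dropped the last reference.
    bool release() { return useCount.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

struct RecordVersion {
    Transaction* transaction = nullptr;
    // As would happen in addRecord(): the record holds a counted reference.
    void attachTo(Transaction* t) { t->addRef(); transaction = t; }
    void detach() {
        if (transaction && transaction->release())
            transaction->purged = true;          // stand-in for actual deletion
        transaction = nullptr;
    }
};
```
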
> 
> There isn't much difference in cost between a lock(Shared)/unlock and 
> addRef/release.  Each requires a pair of interlocked instructions.

Right, the cost is there.  How bad is it?  This solution has a huge pile 
of interlocked instructions associated with all the 
RecordVersion::transaction pointers and the stack based pointers. 
Compare that with one per thread using cycle locks.  And that cycle lock 
could also apply to other objects that are headed to the same purgatory.

If a call to purgeTransactions detaches all the records and then finds 
that the useCount is still > 0, it must skip on to the next transaction. 
That is a kind of purgatory right there.  Each call to 
purgeTransactions will get a chance to purge this old transaction as 
soon as the useCount goes to zero.
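
In sketch form (hypothetical structures, not the real purgeTransactions), the skip-and-retry behavior is just:

```cpp
#include <atomic>
#include <cassert>
#include <list>
#include <memory>

struct Txn {
    std::atomic<int> useCount{0};
    bool recordsDetached = false;
};

// A transaction whose useCount is still above zero is skipped and gets
// another chance on the next purge call.  Returns how many were purged.
int purgeTransactions(std::list<std::shared_ptr<Txn>>& committed) {
    int purgedCount = 0;
    for (auto it = committed.begin(); it != committed.end(); ) {
        (*it)->recordsDetached = true;       // detach the records first
        if ((*it)->useCount.load() > 0) {
            ++it;                            // someone still holds a pointer: skip
        } else {
            it = committed.erase(it);        // now safe to delete the object
            ++purgedCount;
        }
    }
    return purgedCount;
}
```
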

>>
>>>> 4. Avoid having to access the transaction object: "Duplicate" the 
>>>> commit information (one integer) in every RecordVersion object. So 
>>>> when committing a transaction, we update every one of its 
>>>> RecordVersion objects with the commitId of the transaction. 
>>>> Disadvantages: a) duplication of information b) issues with doing 
>>>> the commit as an "atomic operation".
>>> This is doable and would be my second choice.  Making it visible 
>>> across processors might be tricky.
>> I think this is too expensive.  It makes the commit take longer 
>> because the list of records attached to the committing transaction 
>> need to be marked with this commitId.  AND, this needs to happen while 
>> both transaction lists are locked because that is where the commitId 
>> is assigned to the transaction.
> 
> I'm going to back off on my assessment of doable.  Let's move this to 
> the back burner for now.
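
For the record, idea 4 amounts to something like this (hypothetical names); the loop over the record list is the extra work that would land inside the serialized part of every commit:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct RecordVersion {
    uint64_t commitId = 0;           // 0 = written by an uncommitted transaction
};

struct Transaction {
    std::vector<RecordVersion*> records;
    uint64_t commitId = 0;

    // In the real engine this would run while both transaction lists are
    // locked, since that is where the commitId is assigned.
    void commit(uint64_t newCommitId) {
        commitId = newCommitId;
        for (RecordVersion* rv : records)    // O(records) in the critical section
            rv->commitId = newCommitId;
    }
};
```
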
> 
>>
>>
>>>> 5. "Make it safe to access the transaction pointer and transaction 
>>>> object": By never deleting the transaction object until all record 
>>>> objects pointing to it have been deleted. This would also simplify 
>>>> some of the code since we would always have the transaction 
>>>> available. Disadvantages: a) footprint would increase since we 
>>>> would potentially have a lot of old transaction objects lying 
>>>> around (and we do not want to introduce chilling or backlogging of 
>>>> transaction objects :-) ) b) The current scavenger (or a new 
>>>> transaction scavenger?) would need to be able to determine when a 
>>>> transaction object was ready for purging. I do not know this part of 
>>>> the code, but it might be solvable by having a "record version 
>>>> counter" in each transaction object. c) I do not know what extra 
>>>> this would cost but I think it should be possible to do without any 
>>>> locking.
>>> That's a lot of dead transaction objects.  And it isn't just records 
>>> that point to the transaction but also threads that picked up the 
>>> transaction pointer and are waiting for someone to purge the 
>>> transaction so they can crash Falcon.  Evil little suckers, threads.
>> Well, this sounds doable to me also.  Records would be detached from 
>> these transactions by the scavenger, mostly.  The exception is 
>> releaseSavepoint() and rollbackSavepoint().  So records, while they 
>> can still be used, will always have that transaction pointer.  It 
>> could be set to null in code that assures that no other thread is 
>> using it. Pruning is like that.  And records that are retired use an 
>> exclusive lock on the recordLeaf and require the Record::useCount to 
>> be 1. So I think we can exclude those other pesky threads with this 
>> solution.
> Does that actually work?  How does the scavenger know that another 
> thread isn't using a record transaction pointer so it can delete the 
> transaction object?

The more I think about this, the more it seems to make the problem only 
slightly less likely, because it takes more time to release records and 
purge the transaction.  Once the last record is scavenged, we still do 
not know how soon to purge the transaction object.  A client may be 
checking for duplicates, or a more recent version of that last record 
may need to be scavenged for a savepoint.  Both would cause a thread to 
look at that record.  That thread could grab the transaction pointer 
just before it is pruned.  So the problem still exists.

Whoever purges the transaction must know that no other thread is still 
looking at it.  Only useCounts, cycle locking, or eliminating the 
pointers at commit time will help us, I guess.
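
To make that concrete, idea 5's "record version counter" would look roughly like this (made-up names); note the copied pointer in the usage below, which is exactly the race that defeats it:

```cpp
#include <cassert>

// Hypothetical sketch: the transaction stays alive until every
// RecordVersion pointing at it has been detached.  The counter alone
// cannot tell us about threads that already copied the pointer.
struct Transaction5 {
    int recordVersionCount = 0;
    bool readyForPurge() const { return recordVersionCount == 0; }
};

struct RecordVersion5 {
    Transaction5* transaction = nullptr;
    void attach(Transaction5* t) { ++t->recordVersionCount; transaction = t; }
    void detach() { --transaction->recordVersionCount; transaction = nullptr; }
};
```
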


>>
>> Now, will the committed transaction list get too long?  Maybe.  But 
>> this might also allow us to retire records sooner.  The current 
>> scavenger cannot retire Records until after they have been separated 
>> from their transaction (purgeTransactions).  But this would allow the 
>> scavenger to do the separating, and retire records that are just not 
>> needed by any other active transaction.  So this solution has the 
>> benefit of allowing records to be retired sooner.  See the email chain 
>> titled "Reluctant Scavenger" for a possible example of this problem.
>>
>>>> 6. Kevin/Jim's proposal to use a cycle locking - I have not had the 
>>>> time to consider this in detail but it might perform similar to 
>>>> alternative 3 above.
>>> OK, I've got cycle locking on the brain.  It doesn't cure everything 
>>> -- it does nothing about the Palestine problem, for example.  But 
>>> it's a cheap, non-blocking solution to short duration pointer 
>>> validity.  It 1) makes it possible to compress intermediate versions, 
>>> and 2) papers over problems of transaction pointer validation.  Since 
>>> everyone understands that the typical software sequence is 0 ... 1 
>>> ... 2 ... infinite, there are clearly other places that are just 
>>> screaming for cycle locking.
>> But this change is a big deal.  I plan to write up a separate email 
>> with  lots of questions about this.
> 
> I don't think it is a big deal at all, so I look forward to your email.
> 
> One thing that does need to be added, however, is a mechanism to release 
> a cycle lock before going to sleep for a protracted interval, e.g. 
> waiting on another transaction.  I've got a version of CycleLock that 
> uses thread specific data to do this.

That is exactly my big question at the moment.  A call stack will have a 
bunch of pointers to things in the current cycle.  If that thread 
releases the cycle, it cannot trust those pointers when it wakes up. 
And we have a 50 second waitForTransaction!  The cycle must be given up.
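
To illustrate just the discipline (this toy uses an interlocked counter, which is exactly the per-use cost a real cycle lock avoids, so it is not the mechanism, only the shape), a guard that can be released before a protracted wait might look like:

```cpp
#include <atomic>
#include <cassert>

// Grossly simplified, RCU-like stand-in: pointers picked up inside a
// cycle are only valid while the thread is still in the cycle.
class CycleManager {
public:
    void enterCycle() { active.fetch_add(1, std::memory_order_acquire); }
    void leaveCycle() { active.fetch_sub(1, std::memory_order_release); }
    // A purger may only delete retired objects once the cycle is quiet.
    bool cycleQuiet() const { return active.load(std::memory_order_acquire) == 0; }
private:
    std::atomic<int> active{0};
};

// RAII guard.  Calling release() before sleeping (e.g. a long
// waitForTransaction) invalidates every pointer the stack picked up
// during the cycle, which is the problem discussed above.
class CycleLock {
public:
    explicit CycleLock(CycleManager& m) : mgr(&m) { mgr->enterCycle(); }
    ~CycleLock() { release(); }
    void release() {
        if (mgr) { mgr->leaveCycle(); mgr = nullptr; }
    }
private:
    CycleManager* mgr;
};
```
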

>>
>> 7.  Yes, I have another alternative.  Probably the easiest of all. 
>> Comment out this line;
>>
>> storage\falcon\Transaction.cpp(292):
>>     transactionManager->purgeTransactionsWithLocks();
>>
>> This is new with Olav's dependency manager change and it causes 
>> transactions to be purged much sooner than a scavenger cycle.  It 
>> extends the CPU cycles during the critical/serialized part of every 
>> commit and may actually be to blame for a recent performance slowdown 
>> in DBT2. (that is a big maybe! Not sure about that at all).  But if we 
>> leave the transactions around until the scavenge cycle, this error 
>> might not happen any more...
> 
> I don't understand how this fixes the problem.  Doesn't it let the 
> race between picking up a transaction pointer from a record and deleting 
> the transaction object persist?  I'll grant you that it is less likely, 
> maybe even highly improbable.  But you need to demonstrate it works in 
> all circumstances.

OK, it's just a band-aid.  But ouch-less, non-sticking, and has pictures 
of the Flintstones!

>>
>>>>
>>>> I welcome comments and alternative suggestions.
>>>>
>>>> My current "favorites" so far are 5 (simple, safe and almost 
>>>> lock-free - but uses way more memory) and 6 (because it is a great 
>>>> and general idea).
>>>>
>>>>
>>> Hmmm.   Isn't that just throwing memory at the problem????  (where 
>>> have I heard that before?).
>>>
>>> I like 6, of course.
>>
>> I would like to try 7 first, then 5, then 3A, and then maybe 6.
>>
> 
> The theoretical problem is that there are generally accessible pointers to 
> transaction objects, so deleting the transaction object is problematic.  
> I think the only solutions involve:
> 
>    * Eliminating the pointer to the transaction
>    * A reference count of the transaction object held by the record object
>    * A lock on the transaction object held by the record object
>    * A lock on transaction existence for the duration of record
>      visibility cycle
> 
> I think the most fruitful place to look for a solution is the last 
> alternative.  The real question is the granularity of the lock (or 
> implied lock).  An ad hoc lock (nice rhyme) is too expensive.  "Forever" 
> is cheap but hard on memory.  The scavenge cycle is tempting, but 
> doesn't actually solve the problem.
All true!

> Maybe the first alternative has an answer.  Suppose we had a reusable 
> transaction id as an index into a transaction state vector.  That would 
> tie down only a byte per transaction rather than a large object.  We 
> would still need a mechanism to recycle the transaction ids, however.  
> Someone might take a day and think about this...

Sounds as big a change as cycle locking.  Let's talk about wait states 
with a cycle lock first...
