List: Falcon Storage Engine
From: Kevin Lewis
Date: October 27 2008 5:51pm
Subject: Re: Unthawable
Ann wrote a wonderful response and summary on Saturday, but 
inadvertently only replied to me.  So I include the whole thing here and 
reply below.

On 10/25/2008 3:44 PM, Ann W. Harrison wrote:
> Kevin,
>> My opinion is that we ought to find a way to thaw any record in the 
>> record cache before it becomes unthawable if it needs to be viewed.
>> In other words, any record that is visible to any transaction should 
>> be thawable or already thawed.
>  >
>  > Imagine a record with a bunch of old record versions;
>  >
>  > Chain  RecState    TransID  TransState  WriteComplete
>  >   0    recDeleted  300      Committed   Yes
>  >   1    recChilled  150      Committed   Yes
>  >   2    recChilled  100      Committed   Yes
>  >   3    Record      N/A      N/A         N/A
> <Kevin's excellent explanation cut>
> Philip's cases are valuable tests, but not typical usage...  With
> reasonable settings, chills will happen only during loads and mass
> inserts.  The chances of modifying or deleting a record that's been
> chilled are small, and the chances of having a multi-level modify /
> chill / delete cycle are lower still.
> Here's what I thought happened in that case:
> Transaction 100 creates the record, chills it, and commits.
> Transaction 150 starts to modify the record, thaws it, and creates
> a version chain in the record cache.  If Transaction 150 hits its
> chill limit, it chills just the newest version.  If there's
> a low memory situation, the whole chain goes to the backlog.
> If not, the record version chain in cache now consists of a
> reference to Transaction 150's chilled record and the original
> version which is in the cache.  The original eventually gets
> scavenged after Transaction 150 commits and all its contemporaries
> die.  If they don't die and Transaction 300 comes along to delete
> the record, Transaction 300 thaws Transaction 150's version,
> creating a record chain consisting of a delete, Transaction 150's
> version, and Transaction 100's version.
> So, my assumption was that modifying or deleting a chilled
> record caused it to be thawed and left in the cache.  What
> seems to be happening is that the back version is a reference
> to the chilled record in the serial log.  That saves cache
> memory, but gets us into the current state, where needed old
> versions can be lost because they're kept in the serial log
> which doesn't preserve state for transaction dependencies.
>> Note that if the newest record is DELETED like this example, the 
>> current code hits an assert trying to thaw it from the page since the 
>> record does not exist.  But if the newest record is just UPDATED, I 
>> believe that we are thawing the wrong version of the record!
> That's true too.  And that's a problem, since the record from
> the database can be newer than is appropriate for any particular
> transaction.
>> Let's look at the older record version in the chain; #2.  It was never 
>> thawed, and it never needed to be.  What good luck!  This is an example
>> of why we do not want to thaw everything that goes writeComplete.  We 
>> chilled it to save memory, so why fill up the record cache again if we 
>> don't HAVE to.  
> The normal case will be that a chilled record is never referenced at
> all, assuming a reasonably high chill threshold.  Thawing everything
> that goes writeComplete largely eliminates the benefit of chilling
> and would have a terrible effect when we're running the large insert
> and update transactions for which chilling was invented.
>> This record version and the one originally read from the file, #3, 
>> will be scavenged during the next cycle.  #2 will get scavenged if it 
>> is older than the record version that is visible to the oldest 
>> currently active transaction.  (Right Ann?)
> Right.
>> So when do we thaw these records that MAY be read in the future?  I 
>> think it must be done when a newer version is put into the data page. 
>> That includes when a record is deleted from the page.  This way we do 
>> not have to rely on the serial log to thaw old versions.  That means 
>> that somewhere in SRLUpdateRecords::redo(), before we overwrite a 
>> record in a page, we need to thaw the prior record version if it is 
>> still chilled.
> Yes, I think that works and uses less cache than thawing the old
> version immediately when it's modified or deleted.  The resulting
> record chain would consist of a reference to the most recent record
> in its chilled state in the serial log plus an actual copy of the
> old record version.  It is a bit complex since you have to determine
> at a fairly low level that the old version you're replacing is
> chilled.  You'd probably also want to check that it's still valid
> for some active transaction.
> Jim has another suggestion.
>  >
>  > Why not just guarantee that if a transaction is a) visible (has
>  > dependencies) and b) has chilled records, it is retained in the serial
>  > log inactive transaction list, which will prevent the serial log from
>  > being overwritten.  There is some bookkeeping to be done in
>  > SerialLogTransaction, another test in SerialLog::pageCacheFlushed, and
>  > some tricky stuff for a Transaction to tell the SerialLogTransaction
>  > that he's gone boring, but should be no big deal and won't cost
>  > anything in performance, and won't have any effect at all unless
>  > records are actually chilled.
> Unfortunately, where there is chilling, and the chill threshold is big,
> that will make the serial log grow without end.  Starting a transaction
> without doing anything, then starting a separate large load will prevent
> the log from being swapped and truncated until the reader finally quits.
> That's the case now if the first transaction does an update, but having
> a read block log swapping doesn't seem very good.
> So, there are three possible solutions:
> Ann's   - thaw records when they're modified or deleted.
> Jim's   - make the serial log track useful chilled records and
>           preserve them.
> Kevin's - thaw old versions (if chilled) when replacing them on
>           data pages.
> Ann's uses the most record cache.
> Jim's can lead to unbounded serial log growth.
> Kevin's puts more complexity into writing a record on page.
> Going back to the beginning, modifying and deleting chilled
> records is unlikely in the wild.  I'd go for whichever of
> Ann's or Kevin's solution is most robust and easiest to
> implement.
>> Ann has another theory on how a record version may be unthawable.  
>> Maybe she can present that...
> I had a theory but it collapsed when I convinced myself that deleted
> records are not chilled.  My theory sometimes results in the record
> coming back with data that doesn't match the index, but not the absence
> of the record.  And since that theory was based on an incorrect
> assumption (thawing records before modifying or deleting them), it's
> not very interesting.
> Cheers,
> Ann

I like the presentation of the 3 solutions that Ann makes above.  They 
can be summarized by how soon, or 'just-in-time' they are.

Jim's solution is the most just-in-time solution.  It saves space in the 
record cache at the expense of the serial log.  It has a significant 
risk of growing the serial log, since even long-running read transactions 
will keep the serial log from being switched over.  And the 
complexity of the changes seems significant.
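
To make Jim's rule concrete, here is a rough sketch of the retention test as I understand it.  The struct and function names are made up for illustration; they are not the real SerialLogTransaction members, and the real bookkeeping would be more involved.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bookkeeping for one transaction in the serial log.
// These fields are illustrative only, not Falcon's actual members.
struct SketchLogTransaction {
    int  id;
    bool writeComplete;   // all of its changes have reached the data pages
    int  dependencies;    // transactions that may still need its versions
    int  chilledRecords;  // record versions that live only in the serial log
};

// Jim's rule: keep a transaction on the inactive list while it is
// still visible to someone AND still has chilled records.
inline bool mustRetain(const SketchLogTransaction& t) {
    assert(t.dependencies >= 0 && t.chilledRecords >= 0);
    if (!t.writeComplete)
        return true;  // not finished yet, always retained
    return t.dependencies > 0 && t.chilledRecords > 0;
}

// The log can only be switched over and reused when nothing is retained.
inline bool canReuseLog(const std::vector<SketchLogTransaction>& inactive) {
    for (const SketchLogTransaction& t : inactive)
        if (mustRetain(t))
            return false;
    return true;
}
```

A single reader that keeps `dependencies` above zero on a big load transaction is enough to make `canReuseLog` return false, which is the log growth risk described above.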

Ann's solution is the least just-in-time.  It happens during a 
transaction.  The thaw is wasted if the transaction rolls back or if 
that prior RecordVersion is never actually viewed.  Jim's solution only 
thaws it when it is actually viewed.  But Ann's solution has the great 
benefit of being VERY simple.  All we have to do is call a thaw in the 
constructor of a new RecordVersion for the oldRecord that is passed in. 
  This is a two-line change!
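
Here is a minimal sketch of what that eager thaw looks like.  SketchVersion is a stand-in I invented for this message, not the real RecordVersion class, and the `logCopy` member just simulates the copy that would be read back from the serial log.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a record version; not Falcon's RecordVersion.
struct SketchVersion {
    std::string payload;   // in-memory data, empty while chilled
    std::string logCopy;   // simulates the copy held in the serial log
    bool chilled;
    int  thawCount;        // lets us observe how many thaws happened
    std::shared_ptr<SketchVersion> prior;  // next-older version in the chain

    SketchVersion(std::string data, std::shared_ptr<SketchVersion> oldRecord)
        : payload(std::move(data)), chilled(false), thawCount(0),
          prior(std::move(oldRecord)) {
        // Ann's approach: thaw the prior version as soon as a newer
        // version is created on top of it.
        if (prior && prior->chilled)
            prior->thaw();
    }

    // Move the data out of memory (here, into the simulated log copy).
    void chill() {
        logCopy = std::move(payload);
        payload.clear();
        chilled = true;
    }

    // Bring the data back into memory.
    void thaw() {
        assert(chilled);
        payload = logCopy;
        chilled = false;
        ++thawCount;
    }
};
```

The thaw happens whether or not the new version ever commits, which is the wasted work mentioned above.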

Kevin's solution is not too hot, and not too cold, but is it just right? 
It thaws after the commit, during writeComplete, just before the chilled 
priorVersion would otherwise become unthawable.  Serial log processing stays 
the same.  The only advantage over Ann's solution is that it does not 
waste a thaw for a transaction that is rolled back.  The code is 
slightly more complex, maybe 5 or 6 lines in SRLUpdateRecords::redo(). 
But this might not be worth the effort just for rolled-back record versions.
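
Roughly, the check in the redo path would look like the sketch below.  Everything here is hypothetical: SketchPage, SketchPrior, and redoWrite are names I made up, and I am assuming the old data can be copied from the page just before it is overwritten.  The real SRLUpdateRecords::redo() is of course different.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical data page: record id -> stored bytes.
struct SketchPage {
    std::map<int, std::string> records;
};

// Hypothetical prior version held in the record cache.
struct SketchPrior {
    std::string payload;  // empty while chilled
    bool chilled;
    bool stillVisible;    // some active transaction can still see it
};

// Write newData over recordId (nullptr means delete the record).
// Before the page changes, thaw the prior version if it is chilled
// and still needed.  Returns true if a thaw was performed.
inline bool redoWrite(SketchPage& page, int recordId,
                      const std::string* newData, SketchPrior* prior) {
    bool thawed = false;
    std::map<int, std::string>::iterator it = page.records.find(recordId);
    if (prior && prior->chilled && prior->stillVisible &&
        it != page.records.end()) {
        prior->payload = it->second;  // last chance to capture the old data
        prior->chilled = false;
        thawed = true;
    }
    if (newData)
        page.records[recordId] = *newData;
    else
        page.records.erase(recordId);
    return thawed;
}
```

Because this runs only after the commit, a rolled-back transaction never reaches it, so no thaw is wasted.  The `stillVisible` test corresponds to Ann's suggestion to check that the old version is still valid for some active transaction.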

I talked with Chris about this and he will try Ann's solution for a 
quick fix.  He has a test that easily finds unthawable records and will 
try it out.  Then he will look into Kevin's solution to see if it is 
worth the added complexity.
