List: Falcon Storage Engine
From: Ann W. Harrison
Date: July 10 2009 3:40pm
Subject: Re: SRLOverflowPages log record and flushing of serial log
Olav Sandstaa wrote:

>>  1. Have the Gopher flush SRLOverflowPages log records to disk? 
>> (question: if we do not flush the log, why bother writing the log 
>> records?) My concern here is that this will be a "performance issue".

Overflow pages were overlooked in the original serial log design.
Generally the gopher threads don't create serial log records, but
they do create SRLOverflowPages records when moving large records
from the serial log to data pages after the transaction has committed.
To prevent the error you saw, Falcon would have to flush the serial
log for each record - as you suggest, that would slow down everything.
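
Just to make the cost concrete, here is a rough C++ sketch of what a
gopher pass would look like if it had to flush after every
SRLOverflowPages record.  The SerialLog and gopherPass names are
stand-ins for illustration, not the actual Falcon code:

// Hypothetical sketch only: shows the per-record flush cost, not the
// real gopher implementation.

#include <cstdint>
#include <vector>

struct SerialLog {
    // Append an SRLOverflowPages record to the serial log.
    void logOverflowPages(int32_t pageNumber) {}

    // Force the log to disk (fsync): the expensive, synchronous part.
    void flush() {}
};

void gopherPass(SerialLog &log, const std::vector<int32_t> &freedOverflowPages,
                bool flushPerRecord)
{
    for (int32_t pageNumber : freedOverflowPages)
    {
        log.logOverflowPages(pageNumber);

        if (flushPerRecord)
            log.flush();   // one fsync per record, as suggested above
    }
    // Without the per-record flush the records may never reach disk before
    // a crash, which is how an unreadable overflow page turns up in recovery.
}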

>>  2. Improve how we handle DataPage::deleteOverflowPage() during 
>> recovery: by either:
>>       2a: check the file size, if block is beyond EOF do not attempt 
>> to read it - assume it is freed.

We don't free space at the end of the tablespace, but sometimes, as
you noted, we don't allocate all the space that appears to be mapped
to pages in the cache.  When we allocate a new page, we "fake" it
and make it real when the page is written.  If the page is not written
because of a crash, there may be pages that are marked as in use on
a PIP or referenced as overflow pages that are not part of the file
on disk.

>>       2b: if beyond EOF is reported - ignore it during recovery - and 
>> assume that the block is freed.

Change "freed" to "never written".

>>  3. Avoid having to read in the overflow block: Are there other reasons 
>> for reading it than to check if it has a "next" overflow page? If that 
>> is the case, we could change it to store the entire list of overflow 
>> blocks in the data page (could also improve on reading large 
>> records/blobs) 

There is a solution there ... it will need to be more complicated for
blobs, where the list of overflow pages may exceed a page, but it's even
more necessary to avoid page-by-page reads and writes of blobs.
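
For what it's worth, here is one possible shape for that; the layout and
names are invented for illustration and are not the Falcon page format:

// Hypothetical layout for option 3: the data page (or, for big blobs, a
// small chain of list pages) carries the whole set of overflow page
// numbers, so freeing or reading never has to walk next-page links.

#include <cstdint>

struct OverflowList {
    int32_t count;             // entries used in pageNumbers[]
    int32_t continuation;      // next OverflowList page, 0 if none (blob case)
    int32_t pageNumbers[1];    // really 'count' entries
};

// Freeing touches only the list:
//   for each entry in pageNumbers: freePage(entry)
//   if continuation is set:        repeat on the continuation page
//
// Reading a large record or blob improves too: every overflow page is
// known up front, so they can be fetched as a batch instead of one at
// a time.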

>> Note that alternative 2: will not handle links to following overflow 
>> blocks and that can lead to permanently lost data blocks (but that is 
>> a minor issue with the current price per 4K block of disk :-) ).

There is code to locate and reclaim lost pages that can be invoked
through one of the table optimize/validate/etc. statements.
> 
>     An alternative approach (which I also test-implemented, at least for 
> Unix) could be to check the size of the database file at startup and 
> thus avoid the actual "read beyond end-of-file" operation.

I think that doesn't work on Windows.  At least their public bookkeeping
of file size tends to lag.
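
The Unix side of the startup check is simple enough; a minimal sketch,
with the page size and the way the value would be cached left as
assumptions:

#include <sys/stat.h>
#include <cstdint>

static const int64_t PAGE_SIZE = 4096;   // illustrative, not Falcon's constant

// Number of pages actually present in the tablespace file, so recovery
// can compare a page number against it instead of issuing a read that
// will fail beyond end-of-file.

int64_t pagesInFile(int fd)
{
    struct stat buf;

    if (fstat(fd, &buf) != 0)
        return 0;

    return buf.st_size / PAGE_SIZE;
}

// On Windows the reported file size can lag what has really been
// written (the bookkeeping issue mentioned above), so the same check is
// less trustworthy there.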
> 
> I also had to fix an error in Cache::fetchPage(). This code did not 
> behave correctly if an error was thrown during a read operation. If an 
> IO exception was thrown, the BDB object was not released; it would stay 
> "forever" in the page cache and lead to a crash during shutdown because 
> its reference count was not 0.

Good catch!  I've seen those errors and worried about them.
> 
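
For the record, the shape of that fix, sketched with stand-in types; the
Bdb name follows the mail, but the members and the readPage call here
are made up:

#include <cstdint>
#include <stdexcept>

// Sketch only: the real Cache::fetchPage is more involved, but the point
// is that the Bdb must be released when the read throws.

struct Bdb {
    int useCount = 0;
    void incrementUseCount() { ++useCount; }
    void release()           { --useCount; }
};

struct Page {};

struct Cache {
    Bdb buffer;                               // stand-in for the buffer pool

    Bdb *findBuffer(int32_t pageNumber) { return &buffer; }
    Page *readPage(Bdb *bdb);                 // may throw on an I/O error

    Page *fetchPage(int32_t pageNumber)
    {
        Bdb *bdb = findBuffer(pageNumber);
        bdb->incrementUseCount();

        try
        {
            return readPage(bdb);
        }
        catch (...)
        {
            // Without this release the Bdb keeps a nonzero use count,
            // stays in the cache "forever", and causes the crash at
            // shutdown described above.
            bdb->release();
            throw;
        }
    }
};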

Cheers,

Ann
Thread
SRLOverflowPages log record and flushing of serial log (Olav Sandstaa, 3 Jul)
  • Re: SRLOverflowPages log record and flushing of serial log (Ann W. Harrison, 6 Jul)
    • Re: SRLOverflowPages log record and flushing of serial log (Olav Sandstaa, 6 Jul)
      • Double Recoveries (Jim Starkey, 6 Jul)
    • Re: SRLOverflowPages log record and flushing of serial log (Olav Sandstaa, 8 Jul)
      • Re: SRLOverflowPages log record and flushing of serial log (Olav Sandstaa, 10 Jul)
        • Re: SRLOverflowPages log record and flushing of serial log (Ann W. Harrison, 10 Jul)