List:Falcon Storage Engine« Previous MessageNext Message »
From:Olav Sandstaa Date:July 3 2009 2:14pm
Subject:SRLOverflowPages log record and flushing of serial log
View as plain text  
Hi,

Here is a short status and some questions about the recovery bug I work 
on (Bug #45301 Recovery crash due to fetched page marked free in PIP 
http://bugs.mysql.com/bug.php?id=45301).

The recovery crash occurs when the following happens:

 1. Recovery is processing an UpdateRecords log record (record id 1602). 
It finds the record in the Record Locator Page which points to DataPage 461.
 2. DataPage 461 looks fine (ie. no evidence of curruption). In this 
page the record's slot says that the record is stored in an overflowPage 
(with page number 5909). So the first thing the code tires to do is to 
delete this overflow page (and then the problems starts)
 3. The code tries to fetch this page from disk by calling 
Cache::fetchPage()
 4. First indication of something wrong: Cache::fetchPage() checks that 
this page is marked as used in the PageInventoryPage. This results in 
the error written:

       "During recovery, fetched page 5909 tablespace 1 type 9 marked 
free in PIP"

 5. The code "corrects" this by calling 
PageInventoryPage::markPageInUse() (questionable??)
 6. Next indication of something wrong: When IO::readPage() tries to 
read the page it get an EOF error.

Conclusion: The overflow page does not exits in the database file.

So the questions are: Why did this happen? (it has only happend once as 
far as I am aware of...). How to correct this?

Some of the things I have found out this far is:

 * Is page "5909" a reasonable block number? Yes. The database file is 
5864 blocks - so the "missing block" is "only" 55 blocks behind the EOF 
of the database file.

I have also done a scan of the database and found that there are also 
three other DataPages that refers to OverflowPage that are behind EOF 
(data pages refer to overflow pages 5902, 5909, 5913, 5929).

 * The serial log (including log going back about three checkpoints) 
only contains SRLOverflowPages log records where the max overflow page 
size if 5854 (ie. all are within the existing database file).

So the question I am currently wondering about is (and which would be 
great to get some feedback/suggestion on): why do we not have 
SRLOverflowPages log records for the four overflow pages that are behind 
EOF? Are these created before the section of the serial log I have? Or 
are they created after the log I have? Or have we forgotten to log them?

If I read the code correctly, the only place we create overflow pages is 
in Section::storeTail(), at least this is the only place we log the 
creation of OverflowPages. This is mostly done by the Gopher (at least 
for normal data records) but I can not see that the Gopher is flushing 
the log after creating the SRLOverflowPages log record. So the main 
question is: Is ti OK that we create overflow pages and SRLOverflowPages 
log records without flushing the serial log? Is it possible that this is 
the reason for me not finding SRLOverflowPages log records for overflow 
page 5909? (and if it is OK, how should be handle this situation? Catch 
the EOF exception and ignore it? The page is going to be deleted/free 
anyway..)

I am also studying some other issues related to this crash but that can 
wait until another email - but if anyone have suggestions for what 
caused this crash to happen feel free to let me know.

BTW: I am also a bit concerned about the performance impact if we have 
to flush on every SRLOverflowPage (so it is best if we do not have to - 
and it would be great if someone confirmed that this is OK).

Olav



Thread
SRLOverflowPages log record and flushing of serial logOlav Sandstaa3 Jul
  • Re: SRLOverflowPages log record and flushing of serial logAnn W. Harrison6 Jul
    • Re: SRLOverflowPages log record and flushing of serial logOlav Sandstaa6 Jul
      • Double RecoveriesJim Starkey6 Jul
    • Re: SRLOverflowPages log record and flushing of serial logOlav Sandstaa8 Jul
      • Re: SRLOverflowPages log record and flushing of serial logOlav Sandstaa10 Jul
        • Re: SRLOverflowPages log record and flushing of serial logAnn W. Harrison10 Jul