List: Falcon Storage Engine
From: Vladislav Vaintroub
Date: February 18 2009 7:45pm
Subject: 2 recovery bugs, single patch, please check

Here is a patch that fixes 2 nasty problems with recovery and tablespaces.

Bug #42745 Exception: can't find table space during recovery
(recovery got a log record with an unknown tablespace id)

Bug #41837 Falcon recovery error: page 102/0 wrong page type, expected 7 got

The problem is TableSpaceManager::bootstrap(), which reads "system.tablespaces"
by hand, directly from the database pages. Those pages are potentially out of
date, because recovery has not started yet. This is a catch-22: recovery
needs tablespace information, but it reads that information from a tablespace
that may be in a corrupted state.

I solved this by avoiding reading system.tablespaces whenever possible.
Whenever tablespaces are modified, I serialize the TableSpaceManager state to
the serial log. During phase 1 of recovery I deserialize it, so after phase 1
I have the most recent TableSpaceManager state before the crash. Recovery
phase 2 then starts with this most recent pre-crash TableSpaceManager.
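To illustrate the snapshot idea, here is a minimal C++ sketch under simplifying assumptions. `TableSpaceInfo`, `SerialLogRecord`, `RecordType`, `serializeTableSpaces()`, `deserializeTableSpaces()` and `recoverPhase1()` are hypothetical names, and the byte layout is invented for illustration; this is not Falcon's actual SRLTableSpaces record format. Each tablespace change logs a full snapshot of the tablespace list, and phase 1 scans the log in order, so the last snapshot it decodes is the newest pre-crash state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified description of one tablespace.
struct TableSpaceInfo {
    std::int32_t id;
    std::string name;
    std::string fileName;
};

// Hypothetical serial log record: a type tag plus an opaque payload.
enum class RecordType { Data, TableSpaces };

struct SerialLogRecord {
    RecordType type;
    std::vector<std::uint8_t> payload;
};

// Append a 32-bit integer in little-endian byte order.
static void putInt(std::vector<std::uint8_t>& buf, std::int32_t v) {
    std::uint32_t u = static_cast<std::uint32_t>(v);
    for (int i = 0; i < 4; ++i)
        buf.push_back(static_cast<std::uint8_t>(u >> (8 * i)));
}

static std::int32_t getInt(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    assert(pos + 4 <= buf.size());
    std::uint32_t u = 0;
    for (int i = 0; i < 4; ++i)
        u |= static_cast<std::uint32_t>(buf[pos++]) << (8 * i);
    return static_cast<std::int32_t>(u);
}

// Strings are stored as a length prefix followed by the raw bytes.
static void putString(std::vector<std::uint8_t>& buf, const std::string& s) {
    putInt(buf, static_cast<std::int32_t>(s.size()));
    buf.insert(buf.end(), s.begin(), s.end());
}

static std::string getString(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::size_t len = static_cast<std::size_t>(getInt(buf, pos));
    assert(pos + len <= buf.size());
    std::string s(reinterpret_cast<const char*>(buf.data()) + pos, len);
    pos += len;
    return s;
}

// Called whenever tablespaces are created or dropped: log the whole list.
SerialLogRecord serializeTableSpaces(const std::vector<TableSpaceInfo>& spaces) {
    SerialLogRecord rec{RecordType::TableSpaces, {}};
    putInt(rec.payload, static_cast<std::int32_t>(spaces.size()));
    for (const auto& ts : spaces) {
        putInt(rec.payload, ts.id);
        putString(rec.payload, ts.name);
        putString(rec.payload, ts.fileName);
    }
    return rec;
}

std::vector<TableSpaceInfo> deserializeTableSpaces(const SerialLogRecord& rec) {
    std::size_t pos = 0;
    std::int32_t count = getInt(rec.payload, pos);
    std::vector<TableSpaceInfo> spaces;
    for (std::int32_t i = 0; i < count; ++i) {
        TableSpaceInfo ts;
        ts.id = getInt(rec.payload, pos);
        ts.name = getString(rec.payload, pos);
        ts.fileName = getString(rec.payload, pos);
        spaces.push_back(ts);
    }
    return spaces;
}

// Phase 1: scan the log in order; each snapshot supersedes the previous one.
// Returns nothing if the log holds no tablespace snapshot at all.
std::optional<std::vector<TableSpaceInfo>> recoverPhase1(const std::vector<SerialLogRecord>& log) {
    std::optional<std::vector<TableSpaceInfo>> latest;
    for (const auto& rec : log)
        if (rec.type == RecordType::TableSpaces)
            latest = deserializeTableSpaces(rec);
    return latest;
}
```

Logging the whole list rather than individual create/drop operations keeps replay simple in this sketch: phase 1 never applies deltas, it only keeps the newest snapshot.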

In most real-life cases, however, people do not create or drop tablespaces
every second, and 99.999% of all serial logs will not contain an
SRLTableSpaces record. So I need to fall back to
TableSpaceManager::bootstrap(), exactly as before this patch. This is safe: if
the serial log contains no SRLTableSpaces record, no tablespace was created or
dropped since the last checkpoint, so the "system.tablespaces" database pages
have not been modified since then.
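The choice made before phase 2 can be sketched as follows. `TableSpaceList` and `tableSpacesForPhase2()` are hypothetical, and the `bootstrap` callback only stands in for the pre-patch TableSpaceManager::bootstrap() path; this is an illustration of the decision, not Falcon's actual code.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the state held by TableSpaceManager.
struct TableSpaceList {
    std::vector<std::string> names;
};

// Hypothetical: chooses the tablespace state that recovery phase 2 starts with.
//   snapshotFromLog: newest SRLTableSpaces snapshot decoded in phase 1, if any.
//   bootstrap:       reads system.tablespaces from the database pages
//                    (stands in for the pre-patch bootstrap path).
TableSpaceList tableSpacesForPhase2(
        const std::optional<TableSpaceList>& snapshotFromLog,
        const std::function<TableSpaceList()>& bootstrap) {
    if (snapshotFromLog.has_value())
        return *snapshotFromLog;  // newest pre-crash state; pages not consulted
    // No SRLTableSpaces record since the last checkpoint means no tablespace
    // was created or dropped, so the system.tablespaces pages are up to date.
    return bootstrap();
}
```

In the common case the callback runs exactly once, so the rare-DDL workload pays nothing extra compared to the old code path.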

FWIW, database->tableSpaceManager is now NULL until phase 2 of recovery
starts. I do not think this is a problem, since attempts to read tablespaces
in phase 1 are already punished with an ASSERT. It could be a problem for the
scavenger, though (in this patch I needed to modify it slightly so that it
does not attempt to call reportTableSpaceStatistics()).
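One way a background thread can tolerate the missing manager is to check for it before reporting statistics. This is a hypothetical sketch, not the actual scavenger change in the patch: `Database`, `TableSpaceManager::reportStatistics()`, `publishTableSpaceManager()` and `scavengeOnce()` are simplified stand-ins. Because the scavenger runs on its own thread, the sketch assumes an atomic pointer so the check does not race with recovery publishing the manager.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in; the real TableSpaceManager does much more.
struct TableSpaceManager {
    void reportStatistics() const { /* placeholder for the real reporting */ }
};

// Hypothetical stand-in for the Database object.
struct Database {
    // Stays nullptr until recovery phase 2 starts; atomic so the scavenger
    // thread can read it while the recovery thread publishes it.
    std::atomic<TableSpaceManager*> tableSpaceManager{nullptr};
};

// Called by recovery when phase 2 begins.
void publishTableSpaceManager(Database& db, TableSpaceManager* mgr) {
    db.tableSpaceManager.store(mgr, std::memory_order_release);
}

// One scavenger pass. Returns true if tablespace statistics were reported.
bool scavengeOnce(const Database& db) {
    // ... other scavenger work would go here ...
    TableSpaceManager* mgr = db.tableSpaceManager.load(std::memory_order_acquire);
    if (mgr == nullptr)
        return false;  // before phase 2: nothing to report yet
    mgr->reportStatistics();
    return true;
}
```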

A general question unrelated to the patch, mostly for Kevin and Chris: should
background activities (scavenger, updateCardinalities) be disabled for the
whole duration of recovery, or only for phases 1 and 2? I was somewhat
surprised to see interference from background threads there, because I was
sure that recovery runs alone. I do not think recovery was written with
concurrency in mind. Was the scavenger meant to run during recovery at all?
Does updateCardinality make sense there? Any thoughts?

2 recovery bugs, single patch, please check (Vaintroub, 18 Feb)
  • Re: 2 recovery bugs, single patch, please check (Lewis, 19 Feb)
    • Re: 2 recovery bugs, single patch, please check (Starkey, 19 Feb)