List:Falcon Storage Engine« Previous MessageNext Message »
From:Ann W. Harrison Date:May 4 2009 6:15pm
Subject:Re: Corrupt indexes?
View as plain text  

> Sometimes we get too many rows returned when doing LIMIT queries. F.ex 
> my previous RQG run says:
> SELECT * FROM `E` WHERE `int_key` < 5 returns different result when 
> executed with predicate 'ORDER BY `int_key` LIMIT 1073741824' (1000 vs. 
> 1459 rows)
> The tables has 1000 rows.

That's not nice.

> In IndexWalker::getValidatedRecord, I have added debug printouts, the 
> recordNumber and the pageNumber whenever we return a  valid record. For 
> the previous example, there are plenty of index pages used, but only the 
> three first of them contains valid nodes.

That's interesting too.  If I understand correctly, only three pages
contain entries for the current content of the table.  What do the other
pages contain?  Old versions of records?  Garbage?  Nothing?

If there are three "leaf nodes" - level 0 pages - there should be one
mostly empty level 1 page for a total of four pages, unless the table
was once much larger, or had key values with a ranges that varied with
time.  If the key values are evenly distributed over time, then there
could be empty pages at the end, assuming that the key values got
smaller (lower value, not just shorter) over time.  Can you print out
an unused page to see what's in it?
> The three first pages are page 154, 136, 213 (in that order).
> All the nodes returned as valid from pages 154 and 213 match with the 
> duplicate rows returned. Page 136 contains valid nodes for all 1000 
> records in the table.

One page contains all the legitimate values, and it's the middle of
the three.  Hmmm. Doesn't exactly sound like a failed split, but it
could be.  Are the entries all ordered correctly within the page?
Is there any overlap between page 154 and 213?  I'm assuming that
154 and 213 contain about 500 entries.  Is page 136 full?
> This leads me to believe that our index is fishy. Some of the nodes have 
> two entries in the index, and these two entries are not even adjacent ...


> I assume that if this is the case, we would not notice it during a 
> regular index search (not using LIMIT), because this would only result 
> in the same bit in the bitmap being set twice, during the scan phase. I 
> tried to test this by adding the following to IndexRootPage::scanIndex
>            ASSERT (!bitmap->isSet(number));  // Added this
>            bitmap->set (number);
> And this asserted during the test.

Good test.
> Does this mean anything? Or are there valid conditions where this assert 
> may happen? I think this looks suspect.

No there are no valid cases that I know of where a single value appears
twice with the same record number.  The record number is part of the
upper level key (as you know) to distinguish duplicate values from each
other - reduces the index garbage collection cost.  But in this case,
as Kevin pointed out, garbage collection isn't going to work.

Interesting indeed.

Best regards,

Corrupt indexes?Lars-Erik Bjørk4 May
  • Re: Corrupt indexes?Kevin Lewis4 May
  • Re: Corrupt indexes?Ann W. Harrison4 May
    • Re: Corrupt indexes?Lars-Erik Bjørk5 May
      • Re: Corrupt indexes?Lars-Erik Bjørk5 May
        • Re: Corrupt indexes?Lars-Erik Bjørk5 May
      • Re: Corrupt indexes?Ann W. Harrison5 May