List:Falcon Storage Engine« Previous MessageNext Message »
From:Lars-Erik Bjørk Date:May 5 2009 1:05pm
Subject:Re: Corrupt indexes?
View as plain text  
Hi Ann and all!

OK, I have looked a little closer. The exact same case is not 
reproducible, but
it is the same bug in a slightly different wrapping. RQG says:

SELECT * FROM `E` WHERE `int_key` < 84 returns different result when 
executed with predicate 'ORDER BY `int_key` LIMIT 1073741824' (1000 vs. 
1137 rows).

57 distinct index pages are examined by getValidateRecord during the walk.
5 of these pages contain 'valid' entries.

These pages are 188, 139, 202, 138 and 147 (in search order). The number 
of 'valid'
entries in each of these are:

page 188: 20
page 139: 960
page 202: 1
page 138: 137
page 147: 19

Note that there are 137 'valid' entries in page 138, and there are 137 
rows too
many returned ...

Some aggregated data about the pages examined:

Looking at page: 137 (517 entries, all key mismatches)
Looking at page: 217 (484 entries, all key mismatches)
Looking at page: 187 (only BUCKET_END)
Looking at page: 188 (1000 key mismatches, 20 'valid' entries returned)
Looking at page: 139 (960 'valid' entries returned)
Looking at page: 202 (42 key mismatches, 1 'valid' entry returned)
Looking at page: 136 (999 key mismatches, 1 ignored as duplicate)
Looking at page: 224 (509 key mismatches)
Looking at page: 225 (590 key mismatches, 1 ignored as duplicate)
Looking at page: 184 (only BUCKET_END)
Looking at page: 186 (1 key mismatch)
Looking at page: 138 (1 key mismatch, 137 'valid' entries returned)
Looking at page: 141 (1000 key mismatches, 1 ignored as duplicate)
Looking at page: 142 (BUCKET_END only)
Looking at page: 143 (193 key mismatches, 1 ignored as duplicate)
Looking at page: 147 (1100 key mismatches, 1 ignored as duplicate, 19 
'valid' entries returned)
Looking at page: 144 (1 key mismatch)
Looking at page: 140 (1000 key mismatches, 1 ignored as duplicate)


The rest of the nodes have no 'valid' entries. Most of them have a lot 
of key
mismatches, some of them only have BUCKET_END.

The number of times getValidatedRecord was called when walking the index 
for a
1000 row table: 19875

One hour later. when running this query again, on the same database that 
has been standing
idle in the meantime, the results have changed slightly, but are still 
1051 rows are returned. This shows that some of what we see is 
transient, some is not.
We still look at 57 pages, but some of the rubbish has been cleaned.

The same pages contain 'valid' entries, this time the distribution looks 
like this:

page 188: 0
page 139: 894
page 202: 1
page 138: 137
page 147: 19

The difference is for pages 188 and 139 where a continuous sequence of 
entries that used to be 'valid' now result in a key mismatch.

To be clear, by key mismatch I mean the entries that are rejected by the 
following piece of code
in getValidatedRecord

if (recordKey.keyLength != keyLength ||
        memcmp(recordKey.key, key, keyLength) != 0)
        return NULL;

Leaving the dataabse idle for even more time, does not change the results.

I am sorry for turning this email that was supposed to be explanatory 
into something confusing and complex :)

Do you have any ideas on where to look next or why the index looks like 



Lars-Erik Bjørk wrote:
> Lars-Erik Bjørk wrote:
>> Hi Ann!
>> Thank you for the detailed response
>> I will have to reproduce the bug one more time to get answers to some 
>> of your
>> questions. I did some testing with rebuilding the index yesterday, so 
>> the
>> corrupted database is no longer corrupt :)
>> I have in-line answers to some of your questions, though
>> Ann W. Harrison wrote:
>>> Lars-Erik,
>>>> Sometimes we get too many rows returned when doing LIMIT queries. 
>>>> F.ex my previous RQG run says:
>>>> SELECT * FROM `E` WHERE `int_key` < 5 returns different result when 
>>>> executed with predicate 'ORDER BY `int_key` LIMIT 1073741824' (1000 
>>>> vs. 1459 rows)
>>>> The tables has 1000 rows.
>>> That's not nice.
>>>> In IndexWalker::getValidatedRecord, I have added debug printouts, 
>>>> the recordNumber and the pageNumber whenever we return a  valid 
>>>> record. For the previous example, there are plenty of index pages 
>>>> used, but only the three first of them contains valid nodes.
>>> That's interesting too.  If I understand correctly, only three pages
>>> contain entries for the current content of the table.  What do the 
>>> other
>>> pages contain?  Old versions of records?  Garbage?  Nothing?
> Other, not older, sorry :)
>> So far I am not sure what the older pages contain. The debug 
>> printouts I have
>> is when we start a LIMIT search (IndexRootPage::positionIndex), when 
>> we move
>> on to the next index page (IndexRootPage::repositionIndex), and when 
>> we return
>> a valid record number from IndexWalker::getValidatedRecord. So the only
>> information I have about the 'empty' pages right now is that I see 
>> that we
>> reposition to them, and then move along to the next page.
>> Just fyi:
>> A change that I have in my sandbox, is that if we come across 
>> BUCKET_END as the
>> first entry in a page, we move on to the next page instead of 
>> stopping the search.
>>> If there are three "leaf nodes" - level 0 pages - there should be one
>>> mostly empty level 1 page for a total of four pages, unless the table
>>> was once much larger, or had key values with a ranges that varied with
>>> time.  If the key values are evenly distributed over time, then there
>>> could be empty pages at the end, assuming that the key values got
>>> smaller (lower value, not just shorter) over time.  Can you print out
>>> an unused page to see what's in it?
>> As soon as the bug is reproduced, the database remains in this state, 
>> and
>> I can run the query multiple times and experience the same results every
>> time. Getting the information you asked for should be doable. I will 
>> respond
>> to that in an additional mail when I have dug a little deeper.
>>>> The three first pages are page 154, 136, 213 (in that order).
>>>> All the nodes returned as valid from pages 154 and 213 match with 
>>>> the duplicate rows returned. Page 136 contains valid nodes for all 
>>>> 1000 records in the table.
>>> One page contains all the legitimate values, and it's the middle of
>>> the three.  Hmmm. Doesn't exactly sound like a failed split, but it
>>> could be.  Are the entries all ordered correctly within the page?
>>> Is there any overlap between page 154 and 213?  I'm assuming that
>>> 154 and 213 contain about 500 entries.  Is page 136 full?
>> What I forgot to mention yesterday, is that the value of the indexed 
>> field
>> is set to the same value for every record. Within the page, the nodes 
>> are however
>> orderer according to record number. The are no overlap of the entries 
>> in page 154
>> and 213. Page 154 has about 400 'valid' entries, whereas page 213 has 
>> about 25. How
>> many 'unvalid' entries they have, I do not know. I don't know at the 
>> present time if
>> page 136 is full.
>>>> This leads me to believe that our index is fishy. Some of the nodes 
>>>> have two entries in the index, and these two entries are not even 
>>>> adjacent ...
>>> Nasty.
>>>> I assume that if this is the case, we would not notice it during a 
>>>> regular index search (not using LIMIT), because this would only 
>>>> result in the same bit in the bitmap being set twice, during the 
>>>> scan phase. I tried to test this by adding the following to 
>>>> IndexRootPage::scanIndex
>>>>            ASSERT (!bitmap->isSet(number));  // Added this
>>>>            bitmap->set (number);
>>>> And this asserted during the test.
>>> Good test.
>>>> Does this mean anything? Or are there valid conditions where this 
>>>> assert may happen? I think this looks suspect.
>>> No there are no valid cases that I know of where a single value appears
>>> twice with the same record number.  The record number is part of the
>>> upper level key (as you know) to distinguish duplicate values from each
>>> other - reduces the index garbage collection cost.  But in this case,
>>> as Kevin pointed out, garbage collection isn't going to work.
>> This seems correct, as the database is left in this corrupted state. 
>> It is not a transient
>> bug (unless the occurrence window is VERY big :)).
>> I will get back to you with more detailed answers later today.
>>> Interesting indeed.
>>> Best regards,
>>> Ann
>> Best,
>> Lars-Erik

Corrupt indexes?Lars-Erik Bjørk4 May
  • Re: Corrupt indexes?Kevin Lewis4 May
  • Re: Corrupt indexes?Ann W. Harrison4 May
    • Re: Corrupt indexes?Lars-Erik Bjørk5 May
      • Re: Corrupt indexes?Lars-Erik Bjørk5 May
        • Re: Corrupt indexes?Lars-Erik Bjørk5 May
      • Re: Corrupt indexes?Ann W. Harrison5 May