List:Falcon Storage Engine« Previous MessageNext Message »
From:Lars-Erik Bjørk Date:May 29 2009 9:01am
Subject:Missing index entry
View as plain text  
Hi team!

Here is another one of the confusing mails I tend to mail you :)
This time I am working on a LIMIT bug where we miss exactly one row.

(For those of you reluctant to read the entire mail, the question I
raise at the end is:  "how can we loose an index entry in the "middle"
of a page?")

I have come across two different versions of this issue:

1. The row is only missing when accessing using index walkers
2. The row is also missing when accessing using a bitmap

I have dug a little deeper into scenario 1.

SELECT * FROM D WHERE int_key < 164 ORDER BY int_key LIMIT 1073741824;
returns 99 rows, where 100 rows are expected.

Looking at the debug printouts (slightly massaged by a Perl hack) I
have aggregated the following data, collected when searching the bottom
level of the index using index walkers. The pages are listed in the
order they are encountered during the search.

The notation is:
* dup = number of entries that refer to the same record as the last
returned valid entry (two sequential entries referring to the same record)

* nrec = number of entries referring to a record we can't look up
(probably an entry referring to a deleted record?)

* ncan = number of entries referring to a record we can find no visible
candidate for

* nmis = number of entries where the index key does not match the record

* nret = number of entries that are returned as valid

The printouts are added in IndexWalker::getValidatedRecord

Page:216,  dup:0,  nrec:0,   ncan:0,   nmis:2,   nret:99  (valid)
Page:240,  dup:1,  nrec:0,   ncan:0,   nmis:99,  nret:0
Page:213,  dup:0,  nrec:0,   ncan:0,   nmis:1,   nret:0
Page:253,  dup:0,  nrec:0,   ncan:0,   nmis:1,   nret:0
Page:169,  dup:0,  nrec:0,   ncan:0,   nmis:1,   nret:0
Page:226,  dup:0,  nrec:0,   ncan:0,   nmis:0,   nret:0   (empty)

99 valid entries were found in page 216, I *assume* that these are
correct index entries

99 key mismatches were found in page 240, I *assume* that these are old
versions of the index entries (because the record values do not match
the index keys)

Pages 213, 253, 169 and 226 are almost empty or empty

If we look closer at the matches in page 216, we can see
that the index page does not include record number 12
(which is the missing row):

( rid = recordId, pid = pageId )
[returning record|rid:9|pid:216]
[returning record|rid:10|pid:216]
[returning record|rid:11|pid:216]
[returning record|rid:13|pid:216]
[returning record|rid:14|pid:216]
[returning record|rid:15|pid:216]

If we look at the mismatches in page 240, we can see that record
number 12 is listed among these:

[key mismatch|rid:9|pid:240]
[key mismatch|rid:10|pid:240]
[key mismatch|rid:11|pid:240]
[key mismatch|rid:12|pid:240]
[key mismatch|rid:13|pid:240]
[key mismatch|rid:14|pid:240]
[key mismatch|rid:15|pid:240]

I have added similar printouts when accessing using a bitmap,
when setting the bits in the bitmap.
For page 216, we have the following:

[Setting bit|rid:9|pid:216]
[Setting bit|rid:10|pid:216]
[Setting bit|rid:11|pid:216]
[Setting bit|rid:13|pid:216]
[Setting bit|rid:14|pid:216]
[Setting bit|rid:15|pid:216]

We can see that also here record number 12 is missing on
page 216

When we look at page 240 however, we set a bit for all of the
records, including record number 12:

[Setting bit|rid:9|pid:240]
[Setting bit|rid:10|pid:240]
[Setting bit|rid:11|pid:240]
[Setting bit|rid:12|pid:240]
[Setting bit|rid:13|pid:240]
[Setting bit|rid:14|pid:240]
[Setting bit|rid:15|pid:240]

To me this is strange by it self, as I believed we never set the same
bit twice ...
However, I guess this is because the index key is not compared to
the actual record value, and the key is within the search range,
even though it is a stale entry.
At a later stage, when we iteratively call the handler function
StorageInterface::index_next, we fetch the records, and check if the
values are within the range, which they are, and return them.
So in the bitmap case, I guess the problem is masked. Even though the
up-to-date index entry for record 12 is missing, we still happen to find
it because we have a stale index entry that is within the search range.

The difference probably lies in that accessing through the walkers, we
also check if we are using the correct version of the index entry
(checking that the index key is equal to the actual
record value)

I assume that the second version of the problem, were both accessing
using index walkers and accessing using a bitmap misses a row, we don't
happen to have an old index entry with a key within the range, masking
the actual error.
(this is a *guess*, I have not collected data for this scenario)

So my question is: "how can we loose an index entry in the "middle" of a
page?". Maybe this is not really a LIMIT issue, but a general index issue?

Missing index entryLars-Erik Bjørk29 May