It looks as if your making good progress. I'll toss in some
more theories, hoping that works.
> So far I am not sure what the older pages contain. The debug
> printouts I have is when we start a LIMIT search
> (IndexRootPage::positionIndex), when we move on to
> the next index page (IndexRootPage::repositionIndex), and
> when we return a valid record number from
> IndexWalker::getValidatedRecord. So the only information I have
> about the 'empty' pages right now is that I see that we
> reposition to them, and then move along to the next page.
Ah! I didn't realize that the useless pages were on the left
side of the index. From your description, I assumed that
the first pages in the index were valid, followed by dreck.
Instead, dreck precedes the good pages. That makes sense.
> Just fyi:
> A change that I have in my sandbox, is that if we come across
> BUCKET_END as the first entry in a page, we move on to the
> next page instead of stopping the search.
Yes, that's a good change. The normal case when you'd see BUCKET_END
as the first entry in a page is when you have an auto-increment or
timestamp key (or something like that) where new entries are always
higher value than old entries and old entries are deleted. No new
record can go into the left hand pages as they empty. So we've seen
BUCKET_END from time to time at the beginning of an index. But the
condition could occur in the middle of an index, I guess, and we
should continue looking for valid records.
> As soon as the bug is reproduced, the database remains in this state, and
> I can run the query multiple times and experience the same results every
That's good. The transient problems are hard.
> What I forgot to mention yesterday, is that the value of the
> indexed field is set to the same value for every record.
Ah! That makes sense. I should have guessed. OK, here's a
theory. The pages of dreck precede the valid data because
all records in the index have the same key value, so they're
sorted in record number order. Although record numbers are
reused, in heavy contention, they're likely to increase in
As you know, index pages split in different ways depending
on their location in the index. Ordinary pages split in
half, more or less, with half the records going to the new
page. The last page on each level splits by putting only
the last entry from the page on the new page, leaving the
old page basically full. The idea is that many indexes
contain data stored in index order (e.g. auto-increment),
so the new record always gets placed at the end of the last
page. Splitting in the middle just wastes space.
That's true absolutely when you're running in low concurrency.
At high concurrency levels, sometimes A will get a value X,
then B will get a value X+1, but B will store its value first.
If the index is in a state where the last value is X-1 and the
next entry will split that page, this messy situation occurs.
B stores X+1, causing an 'end of level' split, putting X+1 as
the lone value in a new page at the end of the level. Then
A stores X, causing the next-to-last page to split again,
but because it's not longer the last page on the level, it
splits in the middle. That's consistent with having about
a half full first page, a full second page, and a nearly
empty third page.
Is this a post-recovery database? My thinking is that we
got messed up somewhere in a double split and didn't record
the changes to the middle page correctly.
As an aside, the deferred indexes in Falcon make storing
monotonically increasing values into the index out of order
more likely than the situation would be in a database where
values are stored in the "real" index directly.
> Within the page, the nodes are however orderex according to
> record number. The are no overlap of the entries in
> page 154 and 213. Page 154 has about 400 'valid' entries,
> whereas page 213 has about 25. How many 'unvalid' entries
> they have, I do not know. I don't know at the present time
> if page 136 is full.
Given that the old database is "fixed", this isn't all that
interesting, but probably the total of valid looking entries
on pages 154 and 213 was 459. Not inconsistent with "about
400" and "about 25".
What's interesting is that we're now looking at a problem in
ordering index entries by record number, not a generic problem
with ordering by key value.... Maybe that's significant,