From: Ann W. Harrison Date: May 5 2009 3:26pm Subject: Re: Corrupt indexes? List-Archive: http://lists.mysql.com/falcon/717 Message-Id: <4A005A8F.1000700@mysql.com> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=ISO-8859-1 Content-Transfer-Encoding: 7BIT Hi Lars-Erik! It looks as if your making good progress. I'll toss in some more theories, hoping that works. > So far I am not sure what the older pages contain. The debug > printouts I have is when we start a LIMIT search > (IndexRootPage::positionIndex), when we move on to > the next index page (IndexRootPage::repositionIndex), and > when we return a valid record number from > IndexWalker::getValidatedRecord. So the only information I have > about the 'empty' pages right now is that I see that we > reposition to them, and then move along to the next page. Ah! I didn't realize that the useless pages were on the left side of the index. From your description, I assumed that the first pages in the index were valid, followed by dreck. Instead, dreck precedes the good pages. That makes sense. > > Just fyi: > A change that I have in my sandbox, is that if we come across > BUCKET_END as the first entry in a page, we move on to the > next page instead of stopping the search. Yes, that's a good change. The normal case when you'd see BUCKET_END as the first entry in a page is when you have an auto-increment or timestamp key (or something like that) where new entries are always higher value than old entries and old entries are deleted. No new record can go into the left hand pages as they empty. So we've seen BUCKET_END from time to time at the beginning of an index. But the condition could occur in the middle of an index, I guess, and we should continue looking for valid records. > > > As soon as the bug is reproduced, the database remains in this state, and > I can run the query multiple times and experience the same results every > time. That's good. The transient problems are hard. > What I forgot to mention yesterday, is that the value of the > indexed field is set to the same value for every record. Ah! That makes sense. I should have guessed. OK, here's a theory. The pages of dreck precede the valid data because all records in the index have the same key value, so they're sorted in record number order. Although record numbers are reused, in heavy contention, they're likely to increase in value. As you know, index pages split in different ways depending on their location in the index. Ordinary pages split in half, more or less, with half the records going to the new page. The last page on each level splits by putting only the last entry from the page on the new page, leaving the old page basically full. The idea is that many indexes contain data stored in index order (e.g. auto-increment), so the new record always gets placed at the end of the last page. Splitting in the middle just wastes space. That's true absolutely when you're running in low concurrency. At high concurrency levels, sometimes A will get a value X, then B will get a value X+1, but B will store its value first. If the index is in a state where the last value is X-1 and the next entry will split that page, this messy situation occurs. B stores X+1, causing an 'end of level' split, putting X+1 as the lone value in a new page at the end of the level. Then A stores X, causing the next-to-last page to split again, but because it's not longer the last page on the level, it splits in the middle. That's consistent with having about a half full first page, a full second page, and a nearly empty third page. Is this a post-recovery database? My thinking is that we got messed up somewhere in a double split and didn't record the changes to the middle page correctly. As an aside, the deferred indexes in Falcon make storing monotonically increasing values into the index out of order more likely than the situation would be in a database where values are stored in the "real" index directly. > Within the page, the nodes are however orderex according to > record number. The are no overlap of the entries in > page 154 and 213. Page 154 has about 400 'valid' entries, > whereas page 213 has about 25. How many 'unvalid' entries > they have, I do not know. I don't know at the present time > if page 136 is full. Given that the old database is "fixed", this isn't all that interesting, but probably the total of valid looking entries on pages 154 and 213 was 459. Not inconsistent with "about 400" and "about 25". What's interesting is that we're now looking at a problem in ordering index entries by record number, not a generic problem with ordering by key value.... Maybe that's significant, maybe not. Cheers, Ann