Lars-Erik Bjørk wrote:
> Hi Ann!
> Thank you for the detailed response
> I will have to reproduce the bug one more time to get answers to some
> of your
> questions. I did some testing with rebuilding the index yesterday, so the
> corrupted database is no longer corrupt :)
> I have in-line answers to some of your questions, though
> Ann W. Harrison wrote:
>>> Sometimes we get too many rows returned when doing LIMIT queries.
>>> F.ex my previous RQG run says:
>>> SELECT * FROM `E` WHERE `int_key` < 5 returns different result when
>>> executed with predicate 'ORDER BY `int_key` LIMIT 1073741824' (1000
>>> vs. 1459 rows)
>>> The tables has 1000 rows.
>> That's not nice.
>>> In IndexWalker::getValidatedRecord, I have added debug printouts,
>>> the recordNumber and the pageNumber whenever we return a valid
>>> record. For the previous example, there are plenty of index pages
>>> used, but only the three first of them contains valid nodes.
>> That's interesting too. If I understand correctly, only three pages
>> contain entries for the current content of the table. What do the other
>> pages contain? Old versions of records? Garbage? Nothing?
Other, not older, sorry :)
> So far I am not sure what the older pages contain. The debug printouts
> I have
> is when we start a LIMIT search (IndexRootPage::positionIndex), when
> we move
> on to the next index page (IndexRootPage::repositionIndex), and when
> we return
> a valid record number from IndexWalker::getValidatedRecord. So the only
> information I have about the 'empty' pages right now is that I see
> that we
> reposition to them, and then move along to the next page.
> Just fyi:
> A change that I have in my sandbox, is that if we come across
> BUCKET_END as the
> first entry in a page, we move on to the next page instead of stopping
> the search.
>> If there are three "leaf nodes" - level 0 pages - there should be one
>> mostly empty level 1 page for a total of four pages, unless the table
>> was once much larger, or had key values with a ranges that varied with
>> time. If the key values are evenly distributed over time, then there
>> could be empty pages at the end, assuming that the key values got
>> smaller (lower value, not just shorter) over time. Can you print out
>> an unused page to see what's in it?
> As soon as the bug is reproduced, the database remains in this state, and
> I can run the query multiple times and experience the same results every
> time. Getting the information you asked for should be doable. I will
> to that in an additional mail when I have dug a little deeper.
>>> The three first pages are page 154, 136, 213 (in that order).
>>> All the nodes returned as valid from pages 154 and 213 match with
>>> the duplicate rows returned. Page 136 contains valid nodes for all
>>> 1000 records in the table.
>> One page contains all the legitimate values, and it's the middle of
>> the three. Hmmm. Doesn't exactly sound like a failed split, but it
>> could be. Are the entries all ordered correctly within the page?
>> Is there any overlap between page 154 and 213? I'm assuming that
>> 154 and 213 contain about 500 entries. Is page 136 full?
> What I forgot to mention yesterday, is that the value of the indexed
> is set to the same value for every record. Within the page, the nodes
> are however
> orderer according to record number. The are no overlap of the entries
> in page 154
> and 213. Page 154 has about 400 'valid' entries, whereas page 213 has
> about 25. How
> many 'unvalid' entries they have, I do not know. I don't know at the
> present time if
> page 136 is full.
>>> This leads me to believe that our index is fishy. Some of the nodes
>>> have two entries in the index, and these two entries are not even
>>> adjacent ...
>>> I assume that if this is the case, we would not notice it during a
>>> regular index search (not using LIMIT), because this would only
>>> result in the same bit in the bitmap being set twice, during the
>>> scan phase. I tried to test this by adding the following to
>>> ASSERT (!bitmap->isSet(number)); // Added this
>>> bitmap->set (number);
>>> And this asserted during the test.
>> Good test.
>>> Does this mean anything? Or are there valid conditions where this
>>> assert may happen? I think this looks suspect.
>> No there are no valid cases that I know of where a single value appears
>> twice with the same record number. The record number is part of the
>> upper level key (as you know) to distinguish duplicate values from each
>> other - reduces the index garbage collection cost. But in this case,
>> as Kevin pointed out, garbage collection isn't going to work.
> This seems correct, as the database is left in this corrupted state.
> It is not a transient
> bug (unless the occurrence window is VERY big :)).
> I will get back to you with more detailed answers later today.
>> Interesting indeed.
>> Best regards,