MySQL Lists are EOL. Please join:

List:Falcon Storage Engine« Previous MessageNext Message »
From:Lars-Erik Bjørk Date:March 17 2009 11:29am
Subject:Indexes and their walkers
View as plain text  
Hi all, and maybe Ann in particular!

I am working with stabilizing the LIMIT feature, and I am running
Philip's RQG to expose the bugs. So far I have fixed a few, but there
seems to be more of them hiding behind each other. I have some questions
regarding indexes and walkers that maybe someone are able to answer, but
first I will give a little context. If you think this mail is too long
(you may think it is already), feel free to skip to the end to see the

When I started running the test as described at, I would get 0 rows returned
instead of 1000, when executing a LIMIT query designed to return all the
rows in a table. This was reproducible, and did not involve any deferred
indexes. The walker structure is therefore simple, and only involves a
WalkIndex that traverses the index at the bottom level.

For the index nodes, we have two record numbers with special meaning.
These are

END_BUCKET (-1) - This is the last node in this page (I hope I interpret
this correctly :) )
END_LEVEL (-2) - This is the last node at this level

At first I discovered that given enough updates, the first index page
may contain only the END_BUCKET node. There is special handling of the
first node of every page, but this logic was not able to handle this,
and we would stop the search without finding any matching records. I
have changed this to reposition the index to the next page, and continue
the search from there (IndexRootPage::repositionIndex). This made the
test return more rows, but still not all of them (and sometimes still zero).

When traversing the nodes, we check each index key value against a key
we build from the record the node refers to
(IndexWalker::getValidatedRecord). When we reach the end of an index
page we reposition to the next page and continue from there. When
debugging, after "fixing" the first error, I saw that for many records,
none of the versions in the index would match the record. I realized
that when we reposition to the next page, we don't update the (local
version of the) index key value that we compare to (at least I have not
found the place it is done). It therefore seems like we compare the key
built from the first referred record in a page to the index value of the
last node in the previous page. Changing this improves the situation,
but still not to perfection. In most runs of the test, we are returned
1001 rows instead of 1000 rows, or 101 instead of 100. In these cases, 
it is always the first
record that is duplicated. However, every once in a while we are
returned more than 1300 records instead of 1000. The excess records
seem to be in a continuous sequence. Not a random record every now and 
then, but a sequence of records.

I have no idea why this is happening. But I have made the observation
that there seems to be a lot of versions in the index for each record.
And the index does not seem to be cleaned. When counting,
IndexWalker::getValidatedRecord was called almost 9000 times for a table
with 1000 rows. I think this is strange.

So my questions are:

1. Ann, do you agree that we are comparing the wrong keys after

2. How long should it take for old versions to be removed from the
index? (I would believe this was done quickly)

3. If a record A changes it's value from x to y and back to x again, and
the there are three entries for it in the index. Is there any chance
that another record B, also with value x, can be positioned between the
two entries for A?

4. Does anyone (Ann?:)) have a gut feeling of what is going on?

Thank you for reading the mail!
Humble regards,


P.S: For the interested, my diff so far is the following:

=== modified file 'storage/falcon/WalkIndex.cpp'
--- storage/falcon/WalkIndex.cpp	2009-03-02 08:16:53 +0000
+++ storage/falcon/WalkIndex.cpp	2009-03-17 11:27:41 +0000
@@ -88,9 +88,23 @@ int32 WalkIndex::getNextNode(void)
  			recordNumber = node.getNumber();
  			if (recordNumber >= 0)
+				{
+				node.expandKey(&indexKey);
  				return recordNumber;
-			else if (recordNumber == END_LEVEL || recordNumber == END_BUCKET)
+				}
+			else if (recordNumber == END_BUCKET)
+				{
+				IndexRootPage::repositionIndex(index->dbb, index->indexId, this);
+				continue;
+				}
+			else if (recordNumber == END_LEVEL)
+				{
  				return -1;
+				}
+			else
+				{
+				ASSERT (false);
+				}

Indexes and their walkersLars-Erik Bjørk17 Mar