List:Falcon Storage Engine« Previous MessageNext Message »
From:Lars-Erik Bjørk Date:April 1 2009 8:30am
Subject:Indexes and their walkers (again, slightly edited)
View as plain text  
Hi all,

I am resending this mail, that I sent a couple of weeks ago, hoping to  
some input that can help me move on. At Kevin's request, I have put my
questions at the beginning this time, however, some of them will not  
much sense without the context given below. I have also tagged pieces of
the text so that you can skip directly to the part giving the context  
to the question
of your choice. However, the information is not perfectly modularized :)

I am working with stabilizing the LIMIT feature, and I am running
Philip's RQG to expose the bugs. So far I have fixed a few, but there
seems to be more of them hiding behind each other.
So my questions are:

1. Ann, do you agree that we are comparing the wrong keys after

2. How long should it take for old versions to be removed from the
index? (I would believe this was done quickly)

3. If a record A changes it's value from x to y and back to x again, and
there are three entries for it in the index. Is there any chance
that another record B, also with value x, can be positioned between the
two entries for A?

4. Does anyone (Ann?:)) have a gut feeling of what is going on?

When I started running the test as described at, I would get 0 rows returned
instead of 1000, when executing a LIMIT query designed to return all the
rows in a table. This was reproducible, and did not involve any deferred
indexes. The walker structure is therefore simple, and only involves a
WalkIndex that traverses the index at the bottom level.

For the index nodes, we have two record numbers with special meaning.
These are

END_BUCKET (-1) - This is the last node in this page (I hope I interpret
this correctly :) )
END_LEVEL (-2) - This is the last node at this level

At first I discovered that given enough updates, the first index page
may contain only the END_BUCKET node. There is special handling of the
first node of every page, but this logic was not able to handle this,
and we would stop the search without finding any matching records. I
have changed the behavior to reposition the index to the next page,  
and continue
the search (IndexRootPage::repositionIndex). This made the
test return more rows, but still not all of them (and sometimes still  

When traversing the nodes, we check each index key value against a key
we build from the record the node refers to
(IndexWalker::getValidatedRecord). When we reach the end of an index
page we reposition to the next page and continue from there. When
debugging, after "fixing" the first error, I saw that for many records,
none of the versions in the index would match the record. I realized
that when we reposition to the next page, we don't update the (local
version of the) index key value that we compare to (at least I have not
found the place it is done). It therefore seems like we compare the key
built from the first referred record in a page to the index value of the
last node in the previous page.

Changing this improves the situation, but still not to perfection. In  
runs of the test, we are returned 1001 rows instead of 1000 rows, or
101 instead of 100. In these cases, it is always the first
record that is duplicated. However, every once in a while we are
returned more than 1300 records instead of 1000. The excess records
seem to be in a continuous sequence. Not a random record every now
and then, but a sequence of records.

I have no idea why this is happening. But I have made the observation
that there seems to be a lot of versions in the index for each record.
And the index does not seem to be cleaned. When counting,
IndexWalker::getValidatedRecord was called almost 9000 times for a table
with 1000 rows. I think this is strange.

Thank you for reading the mail!
Humble regards,


P.S: For the interested, my diff so far is the following:

=== modified file 'storage/falcon/WalkIndex.cpp'
--- storage/falcon/WalkIndex.cpp	2009-03-02 08:16:53 +0000
+++ storage/falcon/WalkIndex.cpp	2009-03-17 11:27:41 +0000
@@ -88,9 +88,23 @@ int32 WalkIndex::getNextNode(void)
			recordNumber = node.getNumber();
			if (recordNumber >= 0)
+				{
+				node.expandKey(&indexKey);
				return recordNumber;
-			else if (recordNumber == END_LEVEL || recordNumber == END_BUCKET)
+				}
+			else if (recordNumber == END_BUCKET)
+				{
+				IndexRootPage::repositionIndex(index->dbb, index->indexId, this);
+				continue;
+				}
+			else if (recordNumber == END_LEVEL)
+				{
				return -1;
+				}
+			else
+				{
+				ASSERT (false);
+				}

Falcon Storage Engine Mailing List
For list archives:
To unsubscribe:

Indexes and their walkers (again, slightly edited)Lars-Erik Bjørk1 Apr
  • Re: Indexes and their walkers (again, slightly edited)Kevin Lewis2 Apr