From: Lars-Erik Bjørk Date: April 1 2009 8:30am Subject: Indexes and their walkers (again, slightly edited) List-Archive: http://lists.mysql.com/falcon/658 Message-Id: MIME-Version: 1.0 Content-Type: text/plain; delsp=yes; format=flowed; charset=US-ASCII Content-Transfer-Encoding: 7BIT Hi all, I am resending this mail, that I sent a couple of weeks ago, hoping to get some input that can help me move on. At Kevin's request, I have put my questions at the beginning this time, however, some of them will not make much sense without the context given below. I have also tagged pieces of the text so that you can skip directly to the part giving the context to the question of your choice. However, the information is not perfectly modularized :) I am working with stabilizing the LIMIT feature, and I am running Philip's RQG to expose the bugs. So far I have fixed a few, but there seems to be more of them hiding behind each other. So my questions are: 1. Ann, do you agree that we are comparing the wrong keys after repositioning? 2. How long should it take for old versions to be removed from the index? (I would believe this was done quickly) 3. If a record A changes it's value from x to y and back to x again, and there are three entries for it in the index. Is there any chance that another record B, also with value x, can be positioned between the two entries for A? 4. Does anyone (Ann?:)) have a gut feeling of what is going on? When I started running the test as described at http://bugs.mysql.com/bug.php?id=43630, I would get 0 rows returned instead of 1000, when executing a LIMIT query designed to return all the rows in a table. This was reproducible, and did not involve any deferred indexes. The walker structure is therefore simple, and only involves a WalkIndex that traverses the index at the bottom level. For the index nodes, we have two record numbers with special meaning. These are END_BUCKET (-1) - This is the last node in this page (I hope I interpret this correctly :) ) END_LEVEL (-2) - This is the last node at this level At first I discovered that given enough updates, the first index page may contain only the END_BUCKET node. There is special handling of the first node of every page, but this logic was not able to handle this, and we would stop the search without finding any matching records. I have changed the behavior to reposition the index to the next page, and continue the search (IndexRootPage::repositionIndex). This made the test return more rows, but still not all of them (and sometimes still zero). When traversing the nodes, we check each index key value against a key we build from the record the node refers to (IndexWalker::getValidatedRecord). When we reach the end of an index page we reposition to the next page and continue from there. When debugging, after "fixing" the first error, I saw that for many records, none of the versions in the index would match the record. I realized that when we reposition to the next page, we don't update the (local version of the) index key value that we compare to (at least I have not found the place it is done). It therefore seems like we compare the key built from the first referred record in a page to the index value of the last node in the previous page. Changing this improves the situation, but still not to perfection. In most runs of the test, we are returned 1001 rows instead of 1000 rows, or 101 instead of 100. In these cases, it is always the first record that is duplicated. However, every once in a while we are returned more than 1300 records instead of 1000. The excess records seem to be in a continuous sequence. Not a random record every now and then, but a sequence of records. I have no idea why this is happening. But I have made the observation that there seems to be a lot of versions in the index for each record. And the index does not seem to be cleaned. When counting, IndexWalker::getValidatedRecord was called almost 9000 times for a table with 1000 rows. I think this is strange. Thank you for reading the mail! Humble regards, Lars-Erik P.S: For the interested, my diff so far is the following: === modified file 'storage/falcon/WalkIndex.cpp' --- storage/falcon/WalkIndex.cpp 2009-03-02 08:16:53 +0000 +++ storage/falcon/WalkIndex.cpp 2009-03-17 11:27:41 +0000 @@ -88,9 +88,23 @@ int32 WalkIndex::getNextNode(void) recordNumber = node.getNumber(); if (recordNumber >= 0) + { + node.expandKey(&indexKey); return recordNumber; - else if (recordNumber == END_LEVEL || recordNumber == END_BUCKET) + } + else if (recordNumber == END_BUCKET) + { + IndexRootPage::repositionIndex(index->dbb, index->indexId, this); + continue; + } + else if (recordNumber == END_LEVEL) + { return -1; + } + else + { + ASSERT (false); + } } node.getNext(endNodes); -- Falcon Storage Engine Mailing List For list archives: http://lists.mysql.com/falcon To unsubscribe: http://lists.mysql.com/falcon?unsub=lars-erik.bjork@stripped