From: Lars-Erik Bjørk Date: July 14 2009 12:41pm Subject: Possible reason for the eternal loop in addIndexEntry? List-Archive: http://lists.mysql.com/falcon/791 Message-Id: <4A5C7D11.60201@sun.com> MIME-Version: 1.0 Content-Type: text/plain; CHARSET=US-ASCII; format=flowed Content-Transfer-Encoding: 7BIT Hi all, I have looked at bug#46083 (falcon_pagesize2K fails in IndexRootPage::addIndexEntry() during recovery) today, and I have debugged the relevant parts using the log and tablespace files provided by Olav. This is what seems to happen: We try to add a node in IndexRootPage::addIndexEntry. Here is a piece of the code: /* If the node fits on page, we're done */ AddNodeResult result; for (;;) { result = page->addNode(dbb, key, recordNumber); The node does not fit on the page, and we are told to split the page at the middle. In IndexPage::splitIndexPageMiddle, we search for a node to split on. We want to chose a node where the prefix compression is poor. for (; node.node < pageEnd; node.getNext(bucketEnd)) { //int l = node.expandKey(key); if (node.offset || node.length) { if (node.nextNode > midpoint) break; chain = node.node; } } Then we check to make sure that this is not the last node on the page, and if it is, we split on the previous node (prevNode) (I sent a mail about the calculation of the previous node some weeks ago, unrelated to this): // The split node should never be the last node on the page if ((UCHAR*) node.nextNode > (UCHAR*) this + length) node = prevNode; What I see in the debugger is that (UCHAR*) node.nextNode == (UCHAR*) this + length (note *equal*). To my understanding, this *is* the last node on the page, and the condition should have been '>=' instead of just '>'. This means that we in this case split on the last node instead of the second to last node. On the next iteration we will navigate to this page once again, and try to insert into it. This page is still full (same length as the last time, we split on the END_BUCKET node I assume), and we will split it again. This goes on ... and on (I have looped through 14 of the iterations). If I change this condition from '>' to '>=' and also actually update prevNode in the loop showed earlier, so that it does not refer to the first node on the page, the node will be inserted on the second iteration, after the first split. The recovery still segfaults, but on a different place that I have no knowledge of, further down the recovery process :) Does this sound like a plausible theory? /Lars-Erik