List: Falcon Storage Engine
From: Lars-Erik Bjørk  Date: July 7 2009 10:19am
Hi all!

Here is a mail briefly describing the problem seen with Vlad's
patch for the loop in IndexRootPage::addIndexEntry as requested
by Ann.

I had a quick look at the patch while I was reading through that part
of the code anyway.
The patch looks correct, and except for a typo in a variable
name, I can't find anything that looks fishy. I have discussed this with
Vlad, and we agree that this is a strange problem.

The patch is available at

What the patch essentially does is remove the loop going
from 0 to 1000 in IndexRootPage::addIndexEntry and add the
new node directly in IndexPage::splitIndexPageMiddle instead,
so that the thread splitting the page is guaranteed that there is
still space available on the page it has just split.

The only thing I have been able to spot that looks wrong with this
patch is a small error/typo in splitIndexPageMiddle:

     int32 splitRecordNumber = node.getNumber();
     IndexNode newNode;
     newNode.insert(split->nodes, 0, kLength, key, splitRecordNumber);
     int newNodeSize = IndexNode::nodeLength(0, kLength, recordNumber);   <====

Here we use recordNumber (the record number of the node we
are trying to insert) to calculate newNodeSize, instead of
splitRecordNumber (the record number of the node we are splitting
on). newNodeSize is later used when calculating the supernode
offsets for the new page:

     for (; i < SUPERNODES && superNodes[i]; i++)
         short newVal = superNodes[i] - delta + newNodeSize;   <====
         ASSERT(newVal >=0);
         if(newVal == 0)
         split->superNodes[j++] = newVal;
         superNodes[i] = 0;
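
To make the effect of the typo concrete, here is a minimal sketch (not
Falcon code). It assumes that the record number is stored in a
variable-length encoding, so that the node length depends on which
record number is passed in. The names encodedBytes, nodeLengthSketch
and shiftSuperNodes, and the one-byte header, are hypothetical and only
serve the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: bytes needed to store a non-negative record number in a
// variable-length encoding (1 to 4 bytes). Not Falcon's actual format.
inline int encodedBytes(int32_t number)
{
    int count = 1;
    for (uint32_t n = static_cast<uint32_t>(number) >> 8; n != 0; n >>= 8)
        ++count;
    return count;
}

// Hypothetical node length: one header byte, the key bytes, and the
// encoded record number.
inline int nodeLengthSketch(int keyLength, int32_t number)
{
    return 1 + keyLength + encodedBytes(number);
}

// Rebase supernode offsets for the new page: subtract the bytes that
// stayed on the old page (delta) and add the size of the node that was
// placed at the front of the new page.
inline std::vector<short> shiftSuperNodes(const std::vector<short>& offsets,
                                          int delta, int newNodeSize)
{
    std::vector<short> shifted;
    for (short offset : offsets)
        shifted.push_back(static_cast<short>(offset - delta + newNodeSize));
    return shifted;
}
```

Under these assumptions, computing the size from the wrong record
number changes newNodeSize whenever the two numbers need a different
number of bytes, and every supernode offset on the new page is then off
by that difference.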

But correcting this typo does not fix the problem; the patch still
crashes the Falcon test suite. I have narrowed it down to these lines
of code added to the end of splitIndexPageMiddle:

     AddNodeResult res = addNode(dbb, insertKey, recordNumber);
     if (res == NextPage)
         res = split->addNode(dbb, insertKey, recordNumber);
     ASSERT(res == NodeAdded || res == Duplicate);
     return splitBdb;
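
The idea behind these lines can be sketched with a toy model (again not
Falcon code): split the page in the middle, then insert the new key
into whichever half it belongs to while the splitting thread still
holds both pages. ToyPage, its upperBound field and splitAndInsert are
hypothetical; only the AddNodeResult values are taken from the excerpt
above.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

enum AddNodeResult { NodeAdded, Duplicate, NextPage };

// Toy page: a sorted set of keys with a fixed capacity.
struct ToyPage
{
    std::set<int> keys;
    std::size_t capacity;
    int upperBound;   // keys >= upperBound belong on the next page

    AddNodeResult addNode(int key)
    {
        if (key >= upperBound)
            return NextPage;
        if (keys.count(key))
            return Duplicate;
        assert(keys.size() < capacity);   // the splitter relies on free space
        keys.insert(key);
        return NodeAdded;
    }
};

// Move the upper half of `page` to `split`, then add `key` to the half
// it belongs to before anyone else can fill the freed space.
inline AddNodeResult splitAndInsert(ToyPage& page, ToyPage& split, int key)
{
    assert(!page.keys.empty());
    std::set<int>::iterator middle = page.keys.begin();
    std::advance(middle, page.keys.size() / 2);

    split.upperBound = page.upperBound;
    page.upperBound = *middle;
    split.keys.insert(middle, page.keys.end());
    page.keys.erase(middle, page.keys.end());

    AddNodeResult res = page.addNode(key);
    if (res == NextPage)
        res = split.addNode(key);
    assert(res == NodeAdded || res == Duplicate);
    return res;
}
```

In this model the final assert can never fire, because both halves have
free space right after the split. That is the property the patch is
after by inserting inside splitIndexPageMiddle instead of retrying from
addIndexEntry.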

Without fixing the recordNumber typo, these added lines result in a
lot of different crashes, but with the fix, it crashes with these two
call stacks. When compiling without debug, the second goes away. I swear
I have also observed a call stack including the scavenger, but I can't
seem to reproduce that one anymore :)

#0  0x00007f005c99f1f6 in pthread_kill () from /lib/
#1  0x0000000000bf2164 in my_write_core (sig=6) at stacktrace.c:309
#2  0x00000000006df77b in handle_segfault (sig=6) at
#3  <signal handler called>
#4  0x00007f005c9a1f3b in raise () from /lib/
#5  0x0000000000a18d20 in Error::debugBreak () at Error.cpp:94
#6  0x0000000000a18e65 in Error::error (string=0xdf00b0 "assertion  
(%s) failed at line %d in file %s\n") at Error.cpp:71
#7  0x0000000000a18f09 in Error::assertionFailed (text=0xdf1dac "key -  
(UCHAR*) indexNode < 14", fileName=0xdf1da0 "IndexNode.h", line=109)  
at Error.cpp:78
#8  0x0000000000a2c2f8 in IndexNode::parseNode (this=0x7f0056204f60,  
indexNode=0x7f005ab767f5) at IndexNode.h:109
#9  0x0000000000a2c3ac in IndexNode::getNext (this=0x7f0056204f60,  
end=0x7f005ab767f6) at IndexNode.h:132
#10 0x0000000000acab71 in IndexPage::findNodeInLeaf  
(this=0x7f005ab76000, indexKey=0x7f005620fab0, foundKey=0x0) at  
#11 0x0000000000a2a845 in IndexRootPage::addIndexEntry  
(dbb=0x7f005afeb798, indexId=0, key=0x7f005620fab0, recordNumber=2,  
transId=0) at IndexRootPage.cpp:135
#12 0x0000000000a2b02b in IndexRootPage::indexMerge  
(dbb=0x7f005afeb798, indexId=0, logRecord=0x7f0056212c88, transId=0)  
at IndexRootPage.cpp:841
#13 0x0000000000a9f247 in SRLUpdateIndex::execute  
(this=0x7f0056212c88) at SRLUpdateIndex.cpp:209
#14 0x0000000000a9f2c7 in SRLUpdateIndex::commit (this=0x7f0056212c88)  
at SRLUpdateIndex.cpp:194
#15 0x0000000000a83370 in SerialLogTransaction::commit  
(this=0x5fdd920) at SerialLogTransaction.cpp:92
#16 0x0000000000a83488 in SerialLogTransaction::doAction  
(this=0x5fdd920) at SerialLogTransaction.cpp:158
#17 0x0000000000ac37e7 in Gopher::gopherThread (this=0x7f005ac1fd50)  
at Gopher.cpp:71
#18 0x0000000000ac398b in Gopher::gopherThread (arg=0x7f005ac1fd50) at  
#19 0x00000000009cce59 in Thread::thread (this=0x7f005ac29de0) at  
#20 0x00000000009cd09d in Thread::thread (parameter=0x7f005ac29de0) at  
#21 0x00007f005c99a3ba in start_thread () from /lib/
#22 0x00007f005b906fcd in clone () from /lib/
#23 0x0000000000000000 in ?? ()

#0  0x00007fd187b931f6 in pthread_kill () from /lib/
#1  0x0000000000bf2164 in my_write_core (sig=6) at stacktrace.c:309
#2  0x00000000006df77b in handle_segfault (sig=6) at
#3  <signal handler called>
#4  0x00007fd186a47fb5 in raise () from /lib/
#5  0x00007fd186a49bc3 in abort () from /lib/
#6  0x00007fd186a40f09 in __assert_fail () from /lib/
#7  0x00000000006f835f in mysql_execute_command (thd=0x278ed20) at
#8  0x00000000006f9d2b in mysql_parse (thd=0x278ed20, inBuf=0x272b408  
"CALL p1()", length=9, found_semicolon=0x7fd188026b30) at 
#9  0x00000000006fa8ed in dispatch_command (command=COM_QUERY,  
thd=0x278ed20, packet=0x2723551 "", packet_length=9) at 
#10 0x00000000006fbdcb in do_command (thd=0x278ed20) at
#11 0x00000000006e9231 in handle_one_connection (arg=0x278ed20) at
#12 0x00007fd187b8e3ba in start_thread () from /lib/
#13 0x00007fd186afafcd in clone () from /lib/
#14 0x0000000000000000 in ?? ()

It also shows some inconsistencies in the result output at one
point, which are probably due to page inconsistencies
that are not caught by the assert in IndexNode::parseNode.

