From: Date: November 15 2006 9:43am Subject: ndb: usage of uninitialized memory List-Archive: http://lists.mysql.com/internals/34081 Message-Id: <20061115084324.GA88290@qlovarnika.bg.datamax> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="liOOAslEiF7prFVr" --liOOAslEiF7prFVr Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi, I got ndb_dd_backuprestore test failing with some core files lying around after that. Here is what happens (it is MySQL 5.1.14-beta from BK as of 2006.11.15): --- cut --- $ ./mysql-test-run.pl --mysqld=3D--binlog-format=3Dmixed --do-test=3Dndb_dd= _backuprestore =2E.. ndb_dd_backuprestore [ fail ] Errors are (from /usr/local/devel/mysql/mysql-5.1/mysql-test/var/log/mysqlt= est-time) : Failed to connect to ndb!! Failed to initialize consumers mysqltest: In included file "./include/ndb_restore_master.inc": At line 8: = command "$NDB_TOOLS_DIR/ndb_restore --no-defaults --ndb-connectstring=3D"lo= calhost:$NDBCLUSTER_PORT" -p 8 -b $the_backup_id -n 2 -r --print --print_m= eta $NDB_BACKUP_DIR/BACKUP/BACKUP-$the_backup_id >> $NDB_TOOLS_OUTPUT" fail= ed (the last lines may be the most important ones) Result from queries before failure can be found in r/ndb_dd_backuprestore.l= og --- cut --- Here are the last 4 lines from r/ndb_dd_backuprestore.log: --- r/ndb_dd_backuprestore.log begins here --- 113: ENGINE =3D NDB; 114: DROP LOGFILE GROUP log_group1 115: ENGINE =3DNDB; 116: Abort trap (core dumped) <--- great, we will play with gdb soon... --- r/ndb_dd_backuprestore.log ends here --- and the relevant kernel messages: --- kern.all begins here --- Nov 15 08:39:31 qlovarnika kernel: pid 80453 (ndbd), uid 1001: exited on si= gnal 6 (core dumped) Nov 15 08:39:31 qlovarnika kernel: pid 80454 (ndbd), uid 1001: exited on si= gnal 6 (core dumped) Nov 15 08:39:54 qlovarnika kernel: pid 80492 (ndb_restore), uid 1001: exite= d on signal 6 (core dumped) --- kern.all ends here --- ndb_restore is aborting because it cannot connect to ndb, the root of the problem is that ndbd aborts, here is its backtrace: --- ndbd backtrace begins here --- Core was generated by `ndbd'. Program terminated with signal 6, Aborted. #0 0x0000000800ac789c in pthread_testcancel () from /usr/lib/libpthread.so= =2E2 [New Thread 0x6c8c000 (sleeping)] [New Thread 0x6b75c00 (sleeping)] [New Thread 0x6b75800 (sleeping)] [New Thread 0x6b75400 (sleeping)] [New Thread 0x6b75000 (sleeping)] [New Thread 0x6a60c00 (sleeping)] [New Thread 0x6a60800 (sleeping)] [New Thread 0x6a60400 (sleeping)] [New Thread 0x6a60000 (sleeping)] [New Thread 0x6949c00 (sleeping)] [New Thread 0x6949800 (sleeping)] [New Thread 0x6949400 (sleeping)] [New Thread 0x6949000 (sleeping)] [New Thread 0x6832c00 (sleeping)] [New Thread 0x6832800 (sleeping)] [New Thread 0x6832400 (sleeping)] [New Thread 0x6832000 (sleeping)] [New Thread 0x671cc00 (sleeping)] [New Thread 0x671c800 (sleeping)] [New Thread 0x671c400 (sleeping)] [New Thread 0x671c000 (sleeping)] [New Thread 0x6604c00 (sleeping)] [New Thread 0x6604800 (sleeping)] [New Thread 0x6604400 (sleeping)] [New Thread 0x6604000 (sleeping)] [New Thread 0x6424c00 (sleeping)] [New Thread 0x6424800 (sleeping)] [New Thread 0xa14800 (LWP 100306)] [New Thread 0x8e1000 (LWP 100184)] #0 0x0000000800ac789c in pthread_testcancel () from /usr/lib/libpthread.so= =2E2 #1 0x0000000800ab55c3 in sigaction () from /usr/lib/libpthread.so.2 #2 0x0000000800ab70e2 in sigaction () from /usr/lib/libpthread.so.2 #3 0x0000000800ab0db6 in pthread_kill () from /usr/lib/libpthread.so.2 #4 0x0000000800ab0633 in raise () from /usr/lib/libpthread.so.2 #5 0x0000000800ca115d in abort () from /lib/libc.so.6 #6 0x000000000040541b in childAbort (code=3D-1, currentStartPhase=3D4) at main.cpp:105 #7 0x00000000005bd2ee in NdbShutdown (type=3DNST_ErrorHandler,=20 restartType=3DNRT_Default) at Emulator.cpp:255 #8 0x00000000005c6569 in ErrorReporter::handleError (messageID=3D2306,=20 problemData=3D0x4
,=20 objRef=3D0x6 , nst=3DNST_ErrorHandler) at ErrorReporter.cpp:206 #9 0x00000000005b5948 in SimulatedBlock::progError (this=3D0xed4000, line= =3D538,=20 err_code=3D2306, extra=3D0x629dfd "dbtup/DbtupScan.cpp") at SimulatedBlock.cpp:738 #10 0x0000000000537444 in Dbtup::scanFirst (this=3D0xed4000, scanPtr=3D {p =3D 0x4378000, i =3D 0}) at dbtup/DbtupScan.cpp:538 #11 0x000000000053904f in Dbtup::execLCP_FRAG_ORD (this=3D0xed4000,=20 signal=3D0x8c5b28) at dbtup/DbtupScan.cpp:1069 #12 0x00000000004d23cf in Dblqh::execLCP_PREPARE_CONF (this=3D0xec8000,=20 signal=3D0x8c5b28) at SimulatedBlock.hpp:576 #13 0x00000000005b9bf7 in FastScheduler::doJob (this=3D0x8cf4e0) at SimulatedBlock.hpp:576 #14 0x00000000005bb485 in ThreadConfig::ipControlLoop (this=3D0x0) at ThreadConfig.cpp:175 #15 0x0000000000405fac in main (argc=3D7, argv=3D0x7fffffffcff8) at main.cp= p:474 --- ndbd backtrace ends here --- The file names and line numbers in the backtrace are a little bit misleading because SimulatedBlock::executeFunction() is inline'd. Anyway let's dig a little bit more: --- gdb session begins here --- (gdb) f 10 #10 0x0000000000537444 in Dbtup::scanFirst (this=3D0xed4000, scanPtr=3D {p =3D 0x4378000, i =3D 0}) at dbtup/DbtupScan.cpp:538 538 ptrCheckGuard(fragPtr, cnoOfFragrec, fragrecord); --- gdb session ends here --- This is the implementation of ptrCheckGuard: --- storage/ndb/src/kernel/vm/pc.hpp:ptrCheckGuardErr begins here --- 93 #define ptrCheckGuardErr(ptr, limit, rec, error) {\ 94 UintR TxxzLimit; \ 95 TxxzLimit =3D (limit); \ 96 UintR TxxxPtr; \ 97 TxxxPtr =3D ptr.i; \ 98 ptr.p =3D &rec[TxxxPtr]; \ 99 if (TxxxPtr < (TxxzLimit)) { \ 100 ; \ 101 } else { \ 102 progError(__LINE__, error, __FILE__); \ 103 }} --- storage/ndb/src/kernel/vm/pc.hpp:ptrCheckGuardErr ends here --- It calls progError() if ptr.i >=3D limit, so let's investigate: --- gdb session begins here --- (gdb) f 10 #10 0x0000000000537444 in Dbtup::scanFirst (this=3D0xed4000, scanPtr=3D {p =3D 0x4378000, i =3D 0}) at dbtup/DbtupScan.cpp:538 538 ptrCheckGuard(fragPtr, cnoOfFragrec, fragrecord); (gdb) ins fragPtr.i $1 =3D 3503345872 <----- Seems bogus (gdb) ins cnoOfFragrec $2 =3D 452 (gdb) l 534 529 Dbtup::scanFirst(Signal*, ScanOpPtr scanPtr) 530 { 531 ScanOp& scan =3D *scanPtr.p; 532 ScanPos& pos =3D scan.m_scanPos; 533 Local_key& key =3D pos.m_key; 534 const Uint32 bits =3D scan.m_bits; 535 // fragment 536 FragrecordPtr fragPtr; 537 fragPtr.i =3D scan.m_fragPtrI; 538 ptrCheckGuard(fragPtr, cnoOfFragrec, fragrecord); // // So fragPtr.i came from scan.m_fragPtrI which came from // scanPtr.p->m_fragPtrI which was passed as an argument to // Dbtup::scanFirst() // (gdb) ins scanPtr.p->m_fragPtrI $3 =3D 3503345872 // // That's correct: fragPtr.i has the same value as scanPtr.p->m_fragPtrI, // let's see where did it came from: // (gdb) f 11 #11 0x000000000053904f in Dbtup::execLCP_FRAG_ORD (this=3D0xed4000, signal=3D0x8c5b28) at dbtup/DbtupScan.cpp:1069 1069 scanFirst(signal, scanPtr); (gdb) l 1065,1069 1065 ScanOpPtr scanPtr; 1066 c_scanOpPool.getPtr(scanPtr, frag.m_lcp_scan_op); 1067 //ndbrequire(scanPtr.p->m_fragPtrI =3D=3D fragPtr.i); ? 1068 1069 scanFirst(signal, scanPtr); // // Obviously scanPtr must have been initialized in c_scanOpPool.getPtr(). // What type is c_scanOpPool? // (gdb) ins &c_scanOpPool $4 =3D (ArrayPool