Hi,
last Monday we tried to upgrade a database that is mainly used by our
Java applications from SAP DB version 7.4 (ASCII) to MaxDB version 7.6.0.36
(UNICODE) and then to 7.6.1.15. We use the same binaries and installation
procedure as for our SAP systems. Although this database is not used by any
of our SAP systems, the bug could threaten them too.
Here are the messages from knldiag and knldiag.err:
knldiag:
---8<---
[...]
2007-05-28 13:13:15 30386 12929 TASKING Task T142 started
2007-05-28 13:13:15 30386 11007 COMMUNIC wait for connection T142
2007-05-28 13:13:15 30386 11561 COMMUNIC Connected T142 japs02.saarstahl.de 0
2007-05-28 13:13:15 30386 11560 COMMUNIC Releasing T142
2007-05-28 13:13:15 30386 12827 COMMUNIC wait for connection T142
2007-05-28 13:13:15 30370 11561 COMMUNIC Connecting T177 japs02.saarstahl.de 0
2007-05-28 13:13:15 30386 12929 TASKING Task T177 started
2007-05-28 13:13:15 30386 11007 COMMUNIC wait for connection T177
2007-05-28 13:13:15 30386 11561 COMMUNIC Connected T177 japs02.saarstahl.de 0
2007-05-28 13:13:15 30386 11560 COMMUNIC Releasing T177
2007-05-28 13:13:15 30386 12827 COMMUNIC wait for connection T177
+++++++++++++++++++++++++++++++++++++++ Kernel Exit ++++++++++++++++++++++++++++
2007-05-28 13:13:51 0 11987 dump_rte rtedump written to file 'rtedump'
2007-05-28 13:13:51 0 ERR 12005 DBCRASH Kernel exited with core and exit status 0x8b
2007-05-28 13:13:51 0 ERR 12012 DBCRASH No stack backtrace since signal handler was suppressed by SUPPRESS_CORE=NO
2007-05-28 13:13:51 0 ERR 12009 DBCRASH Kernel exited due to signal 11(SIGSEGV)
2007-05-28 13:13:51 0 12808 DBSTATE Flushing knltrace pages
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T113 kernel abort
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T121 kernel abort
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T134 kernel abort
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T147 kernel abort
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T153 kernel abort
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T184 kernel abort
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T213 kernel abort
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T242 kernel abort
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T273 kernel abort
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T278 kernel abort
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T289 kernel abort
2007-05-28 13:13:52 0 WNG 11824 COMMUNIC Releasing T295 kernel abort
2007-05-28 13:13:52 0 12696 DBSTATE Change DbState to 'OFFLINE '(29)
---8<---
knldiag.err:
---8<---
[...]
2007-05-28 13:13:51 0 ERR 12005 DBCRASH Kernel exited with core and exit status 0x8b
2007-05-28 13:13:51 0 ERR 12012 DBCRASH No stack backtrace since signal handler was suppressed by SUPPRESS_CORE=NO
2007-05-28 13:13:51 0 ERR 12009 DBCRASH Kernel exited due to signal 11(SIGSEGV)
2007-05-28 13:13:52 ___ Stopping GMT 2007-05-28 11:13:52 7.6.01 Build 015-121-147-649
---8<---
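A side note on the "exit status 0x8b" in the logs above: if that value is a raw wait(2)-style status (an assumption; the log does not say how it is encoded), it decodes to signal 11 with the core-dump flag set, which would match the SIGSEGV message. A minimal Python sketch of that decoding on Linux, where the function name decode_wait_status is made up for illustration:

```python
import os
import signal


def decode_wait_status(status: int) -> dict:
    """Decode a raw wait(2)-style status word (assumed encoding, Linux)."""
    result = {"signaled": os.WIFSIGNALED(status)}
    if result["signaled"]:
        signum = os.WTERMSIG(status)
        result["signal"] = signum
        result["signal_name"] = signal.Signals(signum).name
        result["core_dumped"] = os.WCOREDUMP(status)
    return result


if __name__ == "__main__":
    # 0x8b is the value reported in knldiag and knldiag.err
    print(decode_wait_status(0x8B))
```

Under that assumption the status carries no more information than the log already states, so it does not narrow down the cause.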
A stack backtrace with gdb showed this:
---8<---
(gdb) bt
#0 0x0000000000823675 in a93swap_from_application ()
#1 0x0000000000823cba in ak93vreceive ()
#2 0x0000000000823f29 in a93_user_commands ()
#3 0x0000000000d8ae04 in SQLTask ()
#4 0x0000000000d8afe3 in Kernel_Main ()
#5 0x0000000000e13a48 in RTETask_TaskMain ()
#6 0x0000000000ec119c in en88_CallKernelTaskMain ()
#7 0x00002b0b5740fe70 in __correctly_grouped_prefixwc () from /lib64/libc.so.6
#8 0x0000000000000000 in ?? ()
---8<---
---8<---
(gdb) info registers
rax 0x4c535f50 1280532304
rbx 0x142bfc0 21151680
rcx 0x2 2
rdx 0x4c535f5e 1280532318
rsi 0x3 3
rdi 0x4c535f50 1280532304
rbp 0x2aacb201ce5f 0x2aacb201ce5f
rsp 0x2aacb201cd60 0x2aacb201cd60
r8 0x2aacb201cd54 46921209204052
r9 0x2aacb63d4fdc 46921280212956
r10 0x2aaaaac2bc08 46912497695752
r11 0x212 530
r12 0x2aad0290af30 46922560745264
r13 0x2aacb201d320 46921209205536
r14 0x2aacb201cede 46921209204446
r15 0x2aacb63d4f88 46921280212872
rip 0x823675 0x823675 <a93swap_from_application+453>
eflags 0x10203 [ CF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x63 99
gs 0x0 0
---8<---
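One detail in the register dump may be worth checking: rax and rdi both hold 0x4c535f50, and all four bytes of that value fall in the printable ASCII range. The Python sketch below (the helper name register_as_ascii is hypothetical) only performs that byte-to-character conversion. Whether it means that text data ended up where a pointer or length was expected is a guess, not something the dump proves.

```python
from typing import Optional


def register_as_ascii(value: int) -> Optional[str]:
    """Return the register value as text if every byte is printable ASCII.

    Bytes are read most-significant first, i.e. in the order they appear
    in the hexadecimal notation printed by gdb.
    """
    length = max(1, (value.bit_length() + 7) // 8)
    raw = value.to_bytes(length, "big")
    if all(0x20 <= b <= 0x7E for b in raw):
        return raw.decode("ascii")
    return None


if __name__ == "__main__":
    # Values copied from the "info registers" output above
    registers = {
        "rax": 0x4C535F50,
        "rdx": 0x4C535F5E,
        "rdi": 0x4C535F50,
        "rsp": 0x2AACB201CD60,
    }
    for name, value in registers.items():
        print(name, hex(value), register_as_ascii(value))
```

For comparison, rsp and the other stack-related registers do not decode to printable text, which is what one would expect from ordinary addresses.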
A quick look at the source shows that the function that crashed here is a
very central one.
We first migrated to 7.6.0.36 (normal installation, import of data via
loadercli). The database ran for approximately 5 hours before the kernel
crashed for the first time (with the same backtrace, by the way). Afterwards
we updated to version 7.6.1.15, but the database crashed again multiple times.
The database runs on an 8-way Intel Core2 machine with 32 GB RAM and SuSE
SLES10 x86_64 (Linux kernel 2.6.16).
I am now trying to reproduce the error; could you give me a hint? Maybe the
request is too big for an internal buffer, or something similar?
Thanks a lot for your help!
bye
Chris
phone: +49 6898/10-4987
fax: +49 6898/10-54987
http://www.saarstahl.de
| Thread |
|---|
| • MaxDB Kernel Crash 7.6.0.36/7.6.1.15 | Christian JUNG | 1 Jun |