This crash applies to 5.0.15. I know I should upgrade, as a lot of bugs
have been fixed in newer releases, but I thought I would post what
happened here in the hope that this 'bug' (or situation) gets fixed in
future releases.
The purpose of this rolling stop/start was to increase
MaxNoOfConcurrentOperations. From what I have read, this should only
require a stop/start of each node, so no --initial restart is needed
(I hope).
There were active connections and queries running on the cluster
during this process as well (in case that is supposed to affect how a
failing node behaves).
# diff config.ini config.ini.20050609
5c5
< MaxNoOfConcurrentOperations=250000
---
> MaxNoOfConcurrentOperations=200000
This configuration change doesn't seem to have caused a problem, as the
cluster is running fine after starting all the nodes again.
I have an 8-node NDBD setup (8 machines) with 2 replicas. Each node has 6 GB of memory.
I went through the nodes from node ID 18 down to 11:
X stop (in ndb_mgm, and wait for the node to stop)
X# ndbd (on node X's machine, and wait for it to start)
where X is 18, 17, 16, 15, 14, 13, 12, 11*.
*11: This was the last node; stopping it triggered the full crash.
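For anyone who wants to script the same loop, here is a minimal sketch of the stop/wait/start/wait cycle described above. It assumes ndb_mgm is on PATH and can reach the management server, and it relies on a hypothetical START_CMD hook (for example a small ssh wrapper script you write yourself) to start ndbd on the right machine for a given node ID; that hook is not an ndb_mgm feature. The node_state helper parses the "Node N: started (Version ...)" / "Node N: not connected" lines seen in the status output. This only automates the waits; it does not change anything about the crash below.

```shell
#!/bin/sh
# Rolling restart sketch. ASSUMPTIONS (hypothetical, not from the post):
#   - ndb_mgm is on PATH and can reach the management server;
#   - START_CMD names a user-supplied command that starts ndbd for the
#     node id given as its argument, on that node's own machine.

# Read `ndb_mgm -e "N status"` output on stdin and print the state text
# for node $1, e.g. "started", "starting" or "not connected".
node_state() {
  sed -n "s/^Node $1: \([a-z][a-z ]*[a-z]\).*/\1/p" | head -n 1
}

# Poll until node $1 reports exactly the state $2.
wait_for_state() {
  w_id=$1
  w_want=$2
  while :; do
    w_state=$(ndb_mgm -e "$w_id status" | node_state "$w_id")
    [ "$w_state" = "$w_want" ] && return 0
    sleep 5
  done
}

# Stop and restart each node id given as an argument, one at a time,
# never moving on until the previous node is fully started again.
rolling_restart() {
  : "${START_CMD:?START_CMD must name a command that starts ndbd}"
  for id in "$@"; do
    ndb_mgm -e "$id stop"
    wait_for_state "$id" "not connected"
    $START_CMD "$id"
    wait_for_state "$id" started
  done
}

# Usage (start_ndbd.sh is hypothetical):
#   START_CMD=./start_ndbd.sh
#   rolling_restart 18 17 16 15 14 13 12 11
```

Waiting for "started" (not just "starting") before touching the next node mirrors the manual procedure, so only one node of a node group should be down at any moment.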
<<< Here is a paste from inside the mgm console >>>
ndb_mgm> 12 stop
Node 12: Node shutdown initiated
Node 12 has shutdown.
ndb_mgm> Node 12: Node shutdown completed.
ndb_mgm> all status;
Node 11: started (Version 5.0.15)
Node 12: not connected
ndb_mgm> Node 12: Started (version 5.0.15)
ndb_mgm> 11 stop
Node 11: Node shutdown initiated
Node 11 has shutdown.
ndb_mgm> Node 11: Node shutdown completed.
Node 18: Forced node shutdown completed. Initiated by signal 0. Caused by error
2334: 'Job buffer congestion(Internal error, programming error or missing error
message, please report a bug). Temporary error, restart node'.
Node 17: Forced node shutdown completed. Initiated by signal 0. Caused by error
2334: 'Job buffer congestion(Internal error, programming error or missing error
message, please report a bug). Temporary error, restart node'.
Node 16: Forced node shutdown completed. Initiated by signal 0. Caused by error
2305: 'Arbitrator shutdown, please investigate error(s) on other node(s)
(Arbitration error). Temporary error, restart node'.
Node 15: Forced node shutdown completed. Initiated by signal 0. Caused by error
2305: 'Arbitrator shutdown, please investigate error(s) on other node(s)
(Arbitration error). Temporary error, restart node'.
Node 14: Forced node shutdown completed. Initiated by signal 0. Caused by error
2305: 'Arbitrator shutdown, please investigate error(s) on other node(s)
(Arbitration error). Temporary error, restart node'.
Node 13: Forced node shutdown completed. Initiated by signal 0. Caused by error
2305: 'Arbitrator shutdown, please investigate error(s) on other node(s)
(Arbitration error). Temporary error, restart node'.
Node 12: Forced node shutdown completed. Initiated by signal 0. Caused by error
2305: 'Arbitrator shutdown, please investigate error(s) on other node(s)
(Arbitration error). Temporary error, restart node'.
<<< end >>>
I have uploaded all the log, out and trace files available in the ndbd
directories to: http://adixon.adam.com.au/mysql/crash20060309/
The ndb_mgm.log file is there too; it covers node 12's restart, then
node 11's stop, then the arbitration shutdown of all the other nodes.
Most notably, node 11 has no error log entry or trace file, since it
actually shut down cleanly; all the other nodes do have them.
Is it worth lodging a bug report or something like that? And/or, did I
do anything wrong?
Can anyone help here?
Adam