You can use, for example, Monit to watch the cluster processes (it is an open source
application: http://www.tildeslash.com/monit). It can detect process failures and check
service availability, response times, memory and CPU usage per process, etc. It supports
alerts and custom actions for automatic repair of the problem. It also has XML output
(both GET and PUSH methods, as in SNMP), which you can use to collect process
characteristics from an external application. A web interface and a command-line
interface are available too. It can also watch remote hosts, devices, files and
directories, and you can use a dependency tree to manage the services (so you can, for
example, stop a service when the filesystem is 99% full, etc.).
I'm just starting to use MySQL Cluster and have thought about integrating it with Monit.
I will write a 'howto' soon. This can provide redundant monitoring for MySQL Cluster
(there is already the 'angel' watchdog) and additional data for trend monitoring
(note that we are working on a centralized application, 'm/monit', for managing Monit
agents, collecting events, graphing characteristics, etc.).
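To give an idea of the kind of check-and-recover loop I mean, here is a minimal Python
sketch. It is not how Monit is implemented, only an illustration. The pidfile path and
the restart command are hypothetical, the helper names are made up for this example, and
I am assuming your ndb_mgm accepts -e to run a single command (check this against your
installed version).

```python
import os
import subprocess


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or invalid."""
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_alive(pid):
    """Return True if a process with this PID exists (POSIX systems only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def purge_command():
    """Build the ndb_mgm call; the -e flag is an assumption, verify it locally."""
    return ["ndb_mgm", "-e", "PURGE STALE SESSIONS"]


def check_and_recover(pidfile, restart_cmd, purge_cmd, run=subprocess.run):
    """Purge stale sessions and restart the node if its process is gone."""
    pid = read_pid(pidfile)
    if pid is not None and pid_is_alive(pid):
        return "ok"
    run(purge_cmd, check=False)    # clear the stale session first
    run(restart_cmd, check=False)  # then bring the node back
    return "recovered"
```

Called from cron every minute or so, for example as
check_and_recover("/var/run/ndbd.pid", ["ndbd"], purge_command()), this would cover the
"nobody awake at two in the morning" case, although it gives you none of the alerting or
dependency handling described above.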
From: Lewis Bergman [mailto:lbergman@stripped]
Sent: Wednesday, December 29, 2004 5:03 PM
Subject: Re: I have run into a hitch with the
Jonas Oreland wrote:
> Lewis Bergman wrote:
>> Back to the books. It looks like the stale session problem keeps an ndbd
>> node from coming back immediately. How do all you gurus who know how to
>> do this handle this situation without intervention?
>> If I use the --no-nodeid-checks can I then use a connect string that
>> includes the nodeid=<nodeid> parameter and avoid the stale session
> there is an ndb_mgm command: "PURGE STALE SESSIONS"
I have seen that and used it to get the nodes back up and running.
My question is more like this:
What do you do to ensure that a cluster node disappearing in the middle of
the night does not necessitate someone's manual intervention?
I may have different goals for this than most of you. I have no one
babysitting a large cluster 24/7. I would have to be alerted somehow and
then wake up, log in, start mgm, PURGE STALE SESSIONS, log out. The main
reason for me to have the cluster is to avoid such problems to the greatest
I want the thing to come back on and get back in the cluster so I can
figure it out tomorrow instead of at two in the morning.
That does bring up a good point though. Is there anyone who has any scripts
or (dare I say it) SNMP capability that the mgmd can react with? I guess I
could write one to watch the cluster log or something. That doesn't look
very friendly though. At any rate, if anyone has thoughts on how you deal
with this please let me know.