List: Cluster
From: Martin Pála
Date: December 30 2004 10:07am
Subject: RE: I have run into a hitch with the
Hi,

You can use, for example, Monit to watch the cluster processes (it's an open-source
application: http://www.tildeslash.com/monit). It can detect process failures and monitor
service availability, response times, memory and CPU usage per process, etc. It supports
alerts and custom actions for repairing problems automatically. It also has XML output
(both GET and PUSH methods, like in SNMP), which you can use to collect process
characteristics from an external application. Web and command-line interfaces are
available too. It can also watch remote hosts, devices, files and directories, and
you can use a dependency tree to manage the services (so you can, for example, stop a
service when the filesystem is 99% full, etc.).
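
As a rough sketch of what such a custom action could look like for the stale-session
case discussed in this thread: the script below runs the "PURGE STALE SESSIONS" command
through the ndb_mgm client in one shot. The function names, the timeout, and the
"mgmhost:1186" connect string in the usage note are hypothetical; it also assumes that
ndb_mgm is on the PATH and that the installed version supports the -e (execute) and -c
(connect string) options. Treat it as an illustration, not a tested recovery procedure.

```python
import subprocess


def build_purge_command(connect_string=None, ndb_mgm="ndb_mgm"):
    """Build the argv list for a one-shot PURGE STALE SESSIONS call.

    connect_string is optional; when omitted, ndb_mgm uses its default
    way of locating the management server.
    """
    cmd = [ndb_mgm]
    if connect_string:
        cmd += ["-c", connect_string]
    cmd += ["-e", "PURGE STALE SESSIONS"]
    return cmd


def purge_stale_sessions(connect_string=None, timeout=30, runner=subprocess.run):
    """Run the purge command and return (succeeded, client_output).

    runner is injectable so the function can be exercised without a
    real cluster; by default it is subprocess.run.
    """
    cmd = build_purge_command(connect_string)
    result = runner(cmd, capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0, result.stdout.strip()
```

Calling purge_stale_sessions("mgmhost:1186") returns a success flag and whatever the
client printed, so a wrapper can exit non-zero on failure. A monitoring tool that can
execute a program when a check fails could then call such a wrapper instead of waking
someone up.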

I'm just starting to use MySQL Cluster and have thought about integrating it with Monit; I
will write a 'howto' soon. This can provide redundant monitoring for MySQL Cluster
(there is already the 'angel' watchdog) and additional data for trend monitoring
(note that we are working on a centralized application, 'm/monit', for managing Monit
agents, collecting events, graphing characteristics, etc.).

Cheers,
Martin

-----Original Message-----
From: Lewis Bergman [mailto:lbergman@stripped]
Sent: Wednesday, December 29, 2004 5:03 PM
To: 'cluster@stripped'
Subject: Re: I have run into a hitch with the


Jonas Oreland wrote:
> Lewis Bergman wrote:
> 
>> Back to the books. It looks like the stale session problem keeps an ndbd
>> node from coming back immediately. How do all you gurus who know how to
>> do this handle this situation without intervention?
>>
>> If I use the --no-nodeid-checks can I then use a connect string that 
>> includes the nodeid=<nodeid> parameter and avoid the stale session 
>> problem?
> 
> 
> there is an ndb_mgm command: "PURGE STALE SESSIONS"
> 
I have seen that and used it to get the nodes back up and running.

My question is more like this:
What do you do to ensure that a cluster node disappearing in the middle of 
the night does not necessitate someone's manual intervention?

I may have different goals for this than most of you. I have no one 
babysitting a large cluster 24/7. I would have to be alerted somehow and 
then wake up, log in, start mgm, PURGE STALE SESSIONS, log out. The main 
reason for me to have the cluster is to avoid such problems to the greatest 
extent possible.

I want the thing to come back on and get back in the cluster so I can 
figure it out tomorrow instead of at two in the morning.

That does bring up a good point though. Is there anyone who has any scripts 
or (dare I say it) SNMP capability that the mgmd can interact with? I guess I 
could write one to watch the cluster log or something. That doesn't look 
very friendly though. At any rate, if anyone has thoughts on how you deal 
with this, please let me know.

-- 
Lewis Bergman
Texas Communications
4309 Maple St.
Abilene, TX 79602-8044
325-691-3301
800-299-6962

-- 
MySQL Cluster Mailing List
For list archives: http://lists.mysql.com/cluster
To unsubscribe:    http://lists.mysql.com/cluster?unsub=1
