From: Lewis Bergman
Date: December 29 2004 4:02pm
Subject: Re: I have run into a hitch with the
List-Archive: http://lists.mysql.com/cluster/1300
Message-Id: <41D2D528.9020805@wtxs.net>

Jonas Oreland wrote:
> Lewis Bergman wrote:
>
>> Back to the books. It looks like the stale session problem keeps an ndbd
>> node from coming back immediately. How do all you gurus who know how to
>> do this handle this situation without intervention?
>>
>> If I use the --no-nodeid-checks can I then use a connect string that
>> includes the nodeid= parameter and avoid the stale session
>> problem?
>
> there is a ndb_mgm command: "PURGE STALE SESSIONS"
>

I have seen that and used it to get the nodes back up and running. My
question is more like this: what do you do to ensure that a cluster node
disappearing in the middle of the night does not require someone's
manual intervention?

I may have different goals for this than most of you. I have no one
babysitting a large cluster 24/7. I would have to be alerted somehow and
then wake up, log in, start mgm, PURGE STALE SESSIONS, log out. The main
reason for me to have the cluster is to avoid such problems to the
greatest extent possible. I want the thing to come back on and rejoin
the cluster so I can figure it out tomorrow instead of at two in the
morning.

That does bring up a good point though. Does anyone have any scripts or
(dare I say it) SNMP capability that the mgmd can interact with? I guess
I could write a script to watch the cluster log or something. That
doesn't look very friendly though.

At any rate, if anyone has thoughts on how you deal with this please let
me know.

-- 
Lewis Bergman
Texas Communications
4309 Maple St.
Abilene, TX 79602-8044
325-691-3301
800-299-6962
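
A minimal sketch of the kind of watchdog script asked about above, in Python. It is an illustration under several assumptions, not a tested recipe: it assumes the installed ndb_mgm client accepts `--ndb-connectstring` and `-e` for running a single command non-interactively, that the `SHOW` output groups nodes under bracketed section headers such as `[ndbd(NDB)]` and marks missing ones with "not connected", and that the connect string below is a placeholder. The function names are hypothetical. It only issues PURGE STALE SESSIONS; restarting a dead ndbd process would still have to be handled separately (for example by an init script).

```python
import re
import subprocess
import time

# Placeholder values: adjust to the real management server address.
CONNECT_STRING = "localhost:1186"
POLL_SECONDS = 60


def build_command(mgm_command, connect_string=CONNECT_STRING):
    """Return the argv that runs one ndb_mgm command non-interactively.

    Assumes the installed client supports --ndb-connectstring and -e.
    """
    return ["ndb_mgm", "--ndb-connectstring=" + connect_string, "-e", mgm_command]


def run_mgm(mgm_command):
    """Run a management command and return whatever it printed."""
    result = subprocess.run(
        build_command(mgm_command),
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return result.stdout


def disconnected_data_nodes(show_output):
    """Return ids of data nodes that SHOW reports as not connected.

    Assumes SHOW prints a "[ndbd(NDB)]" header followed by one
    "id=N ..." line per data node.
    """
    ids = []
    in_data_section = False
    for line in show_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new section starts; only the ndbd section is of interest.
            in_data_section = stripped.startswith("[ndbd")
        elif in_data_section and "not connected" in stripped:
            match = re.search(r"id=(\d+)", stripped)
            if match:
                ids.append(int(match.group(1)))
    return ids


def check_once(run=run_mgm):
    """Poll the cluster once; purge stale sessions if a data node is missing."""
    missing = disconnected_data_nodes(run("SHOW"))
    if missing:
        run("PURGE STALE SESSIONS")
    return missing


def watch(poll_seconds=POLL_SECONDS, run=run_mgm):
    """Poll forever, logging each purge so it can be reviewed the next day."""
    while True:
        try:
            missing = check_once(run)
            if missing:
                print("purged stale sessions; data nodes down:", missing)
        except (subprocess.SubprocessError, OSError) as exc:
            print("ndb_mgm call failed:", exc)
        time.sleep(poll_seconds)
```

Calling `watch()` from a long-running process, or `check_once()` from cron, would cover the two-in-the-morning case without anyone logging in. The `run` parameter is there so the polling logic can be exercised with canned `SHOW` output before pointing it at a live cluster.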