Lewis Bergman wrote:
> Jonas Oreland wrote:
>
>> Lewis Bergman wrote:
>>
>>> Back to the books. It looks like the stale session problem keeps an ndbd
>>> node from coming back immediately. How do all you gurus who know how to
>>> do this handle this situation without intervention?
>>>
>>> If I use the --no-nodeid-checks option, can I then use a connect string
>>> that includes the nodeid=<nodeid> parameter and avoid the stale session
>>> problem?
yes
>>
>>
>>
>> There is an ndb_mgm command: "PURGE STALE SESSIONS"
>>
> I have seen that and used it to get the nodes back up and running.
>
> My question is more like this:
> What do you do to ensure that a cluster node disappearing in the middle
> of the night does not necessitate someone's manual intervention?
>
> I may have different goals for this than most of you. I have no one
> babysitting a large cluster 24/7. I would have to be alerted somehow and
> then wake up, log in, start mgm, PURGE STALE SESSIONS, log out. The main
> reason for me to have the cluster is to avoid such problems to the
> greatest extent.
>
> I want the thing to come back on and get back in the cluster so I can
> figure it out tomorrow instead of at two in the morning.
>
> That does bring up a good point though. Is there anyone who has any
> scripts or (dare I say it) SNMP capability that the mgmd can interact with?
> I guess I could write one to watch the cluster log or something. That
> doesn't look very friendly though. At any rate, if anyone has thoughts
> on how you deal with this please let me know.
>
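As a rough illustration of the monitoring idea raised above, here is a minimal Python sketch that polls the management client instead of tailing the cluster log. It assumes that ndb_mgm is on the PATH, that it accepts -c and -e in the installed version, and that disconnected nodes appear in the SHOW output as "id=N (not connected, ...)". The function names and the default host are hypothetical.

```python
import re
import subprocess

# Assumed shape of a disconnected node line in "ndb_mgm -e SHOW" output:
#   id=3 (not connected, accepting connect from 192.168.0.20)
NOT_CONNECTED = re.compile(r"^id=(\d+)\s+\(not connected", re.MULTILINE)


def disconnected_nodes(show_output):
    """Return the node IDs that SHOW reports as not connected."""
    return [int(node_id) for node_id in NOT_CONNECTED.findall(show_output)]


def poll_cluster(mgm_host="localhost"):
    """Run SHOW through the management client and list disconnected nodes.

    mgm_host is a placeholder; point it at the real management server.
    """
    result = subprocess.run(
        ["ndb_mgm", "-c", mgm_host, "-e", "SHOW"],
        capture_output=True,
        text=True,
        check=True,
    )
    return disconnected_nodes(result.stdout)
```

A cron job could call poll_cluster() every minute and send a mail or page when the returned list is not empty, which at least moves the alerting out of the cluster log.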
Hi,
The only case where "PURGE STALE SESSIONS" should be needed
is when a _computer/OS_ fails.
That will likely require manual intervention.
Anyway,
1) We're working on removing this problem, i.e.
the need to use the command even for computer/OS failures.
2) As you suggested:
If all nodes have a nodeid in their connect string and ndb_mgmd is
started without nodeid checks (i.e. no dynamic node allocation),
there should _never_ be a need for the command,
and ndb should run 24/7 without any kind of manual intervention.
3) If you find a way to force the use of "purge" without an OS/computer
failure, please file a bug report and we'll try to fix it ASAP.
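
To make point 2 concrete, here is a small Python sketch that builds a connect string with a fixed nodeid for each data node. The host name, the port default of 1186, and the helper names are assumptions for illustration, and the exact option spelling for ndbd may differ between versions. The management server is assumed to have been started separately with --no-nodeid-checks.

```python
def build_connect_string(node_id, mgm_hosts, port=1186):
    """Return a connect string that pins a node to a fixed node ID.

    The result has the form "nodeid=<id>,<host>:<port>[,<host>:<port>...]".
    The port default is an assumption; use whatever ndb_mgmd listens on.
    """
    if node_id < 1:
        raise ValueError("node_id must be a positive integer")
    if not mgm_hosts:
        raise ValueError("at least one management host is required")
    hosts = ",".join(f"{host}:{port}" for host in mgm_hosts)
    return f"nodeid={node_id},{hosts}"


def ndbd_command(node_id, mgm_hosts, port=1186):
    """Build the argument list for starting ndbd with that connect string."""
    connect = build_connect_string(node_id, mgm_hosts, port)
    return ["ndbd", f"--ndb-connectstring={connect}"]
```

For example, ndbd_command(3, ["mgmhost"]) gives the arguments for a data node that always asks for node ID 3, so it never relies on dynamic allocation when it restarts.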
/Jonas
--
Jonas Oreland, Software Engineer
MySQL AB, www.mysql.com