List: Cluster
From: Tomas Ulin
Date: November 9 2004 7:56pm
Subject: Re: node unable to rejoin cluster after reboot
We believe the problem is that when a node dies in this manner, 
some sockets are not closed.  This leads the management server to 
believe that the dead node is still making use of that node id.

We don't have a great solution to this at this point (for 4.1).

So the options available right now are, as stated before:

1. restart the management server to reset this (erroneous) state
2. once you've figured out your nodeids and decided on a config, run the 
management server with --no-nodeid-checks and specify nodeids in the
connectstrings, which avoids the issue as a whole.

and

3. in 4.1.8 we will also offer a new command in the management server: 
PURGE STALE SESSIONS
    which fixes the issue without restarting the management server.

ndb_mgm> purge stale sessions
Purged sessions with node id's: 1
ndb_mgm> purge stale sessions
No sessions purged
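
To make the failure mode and the purge behaviour above concrete, here is a 
minimal toy model in Python.  It is only a sketch under stated assumptions, 
not the actual ndb_mgmd code: the class ToyManagementServer, its methods and 
the connected_node_ids parameter are all hypothetical names invented for 
illustration.  It assumes a reservation is released only when the session's 
socket is closed, which is what goes wrong when a node dies hard.

```python
class ToyManagementServer:
    """Toy model of node id reservation (hypothetical, not ndb_mgmd itself)."""

    def __init__(self):
        self.reserved = {}          # node id -> session that reserved it
        self.open_sessions = set()  # sessions the server still thinks are open
        self._next_session = 0

    def alloc_node_id(self, node_id):
        # A reservation that was never released blocks any new allocation.
        if node_id in self.reserved:
            raise RuntimeError(
                f"Id {node_id} already allocated by another node.")
        self._next_session += 1
        session = self._next_session
        self.open_sessions.add(session)
        self.reserved[node_id] = session
        return session

    def close_session(self, session):
        # Clean shutdown: the socket is closed and the reservation released.
        self.open_sessions.discard(session)
        self.reserved = {n: s for n, s in self.reserved.items()
                         if s != session}

    def purge_stale_sessions(self, connected_node_ids):
        # Drop reservations held for node ids that are not really connected.
        stale = sorted(n for n in self.reserved
                       if n not in connected_node_ids)
        for node_id in stale:
            self.open_sessions.discard(self.reserved.pop(node_id))
        return stale


if __name__ == "__main__":
    mgm = ToyManagementServer()
    mgm.alloc_node_id(1)        # node 1 starts and reserves its id
    # node 1 dies hard here: close_session() is never called
    try:
        mgm.alloc_node_id(1)    # the restarted node is refused
    except RuntimeError as err:
        print(err)
    print(mgm.purge_stale_sessions(connected_node_ids=set()))
    print(mgm.purge_stale_sessions(connected_node_ids=set()))
    mgm.alloc_node_id(1)        # now succeeds without a server restart
```

In this model the first purge returns the stale id and the second returns 
nothing, mirroring the two ndb_mgm invocations shown above.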

For 5.0 we will redesign the protocol for reserving nodeids to avoid 
this situation (we want to avoid protocol changes in 4.1).

Hope this will suffice for 4.1.

Please comment if you wish.

T


Russell E Glaue wrote:

> Hi.
>
> This was filed as bug: #6328
> "Bug #6328 [Opn]: Cluster API node crashes, and cannot reenter cluster 
> without restarting MGM node"
>
> Also see the thread:
> "API Crash, does not reenter cluster without MGM restart"
>
>
> Please add your relevant findings of this bug to the bug report. You 
> can add a comment on to the bug report.
> http://bugs.mysql.com/bug.php?id=6328
>
> Thanks.
> -RG
>
>
>
> jon stuart wrote:
>
>> hi,
>>
>> after a reboot of a ndbd node (with node id 1), it was unable to 
>> rejoin the cluster, erroring with:
>>
>> Problem data: Unable to alloc node id
>> Object of reference: Could not alloc node id: Id 1 already allocated 
>> by another node.
>>
>> whilst the mgm client was reporting node 1 offline:
>>
>> NDB> show
>> Cluster Configuration
>> ---------------------
>> [ndbd(NDB)]     4 node(s)
>> id=1 (not connected, accepting connect from host01)
>> id=2    @x.y.z.a  (Version: 3.5.2, Nodegroup: 0)
>> id=3    @x.y.z.b  (Version: 3.5.2, Nodegroup: 1)
>> id=4    @x.y.z.c  (Version: 3.5.2, Nodegroup: 1)
>>
>> NDB> 1 status
>> Node 1: not connected
>>
>> the problem persisted for many ndbd start attempts on node 1. i 
>> cleared the problem by restarting ndb_mgmd (which is on node 5) and 
>> now all is fine.
>>
>> is this a known issue and what can i do about it?
>>
>> i am running mysql-4.1.6-gamma on redhat9 (for my sins).
>>
>> thanks and regards, jon.
>>
>
>
