List: Cluster
From:tulin Date:June 22 2004 9:37pm
Subject:Re: DB node hang on start
Brancaleoni Matteo wrote:

>Hi.
>
>I managed to get it work on the same machine.
>What I didn't know, is that a db node waits
>all other nodes on startup, if nr of db nodes >1 .
>
great

>so that is solved...
>so, can we go on with the remote one? could I
>follow this same procedure on the remote node?
>
for 2 nodes on different machines (I'm assuming you have the same config file
as you showed before):
-- kill all ndb processes
-- <vi|emacs> config.ini
-- change computer 2 to your other host (nothing else)
-- start ndb_mgmd
-- start ndbd -i on "computer 1"; Ndb.cfg or NDB_CONNECTSTRING should read
"host=bestia:2200;nodeid=2"
-- ssh to your other machine
-- start ndbd -i on "computer 2"; Ndb.cfg or NDB_CONNECTSTRING should read
"host=bestia:2200;nodeid=3". Also make sure the directory
/root/ndb/ndb_data2 exists on that computer
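
As a rough sketch of the two ndbd starts above: the only thing that differs
between the machines is the nodeid in the connect string. The Python below just
builds the environment and command for each node and prints them; it does not
start anything. The helper names (connect_string, ndbd_launch_plan) are made up
for illustration, and the environment variable is assumed to be spelled
NDB_CONNECTSTRING.

```python
import os


def connect_string(mgm_host, mgm_port, node_id):
    """Build a connect string in the form used in the steps above."""
    return f"host={mgm_host}:{mgm_port};nodeid={node_id}"


def ndbd_launch_plan(mgm_host, mgm_port, node_id):
    """Return (env, argv) for an initial start of one db node.

    Hypothetical helper: it only prepares the values, it does not run ndbd.
    """
    env = dict(os.environ)
    env["NDB_CONNECTSTRING"] = connect_string(mgm_host, mgm_port, node_id)
    argv = ["ndbd", "-i"]  # -i = initial start, as in the steps above
    return env, argv


if __name__ == "__main__":
    # nodeid 2 runs on "computer 1", nodeid 3 on "computer 2"
    for node_id in (2, 3):
        env, argv = ndbd_launch_plan("bestia", 2200, node_id)
        value = env["NDB_CONNECTSTRING"]
        command = " ".join(argv)
        print(f'NDB_CONNECTSTRING="{value}" {command}')
```

Both nodes point at the same management server (bestia:2200); run each command
on the machine that config.ini assigns to that nodeid.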


>Another question:
>NoOfReplicas what means? from the manual
>it can be from 1 to 4, but really I cannot
>set it more than 2... so seems that
>a cluster with 4 db nodes (all on the same
>machine, with multiple computer entries)
>uses only first 2, even all nodes are up.
>
>is that expected?

The data is partitioned over the ndb nodes. NoOfReplicas means how many nodes
should carry the same data. E.g. 4 nodes and 4 partitions (fragments) P1-P4:

Replicas=1
Node 1 <- P1
Node 2 <- P2
Node 3 <- P3
Node 4 <- P4

=> cluster will die if one node dies

Replicas=2
Node 1 <- P1, P2
Node 2 <- P2, P1
Node 3 <- P3, P4
Node 4 <- P4, P3

=> cluster can continue to operate even if one node dies

Running with more than 2 replicas is not well tested.
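
To make the layouts above concrete, here is a small Python model of them. It
is a toy, not how NDB assigns fragments internally: it assumes consecutive
nodes are grouped in sets of NoOfReplicas and that every node in a group holds
the group's partitions. The function names are made up for illustration.

```python
def place_fragments(num_nodes, replicas):
    """Map node id -> list of partitions, own partition first."""
    if replicas < 1 or num_nodes % replicas != 0:
        raise ValueError("number of nodes must be a multiple of replicas")
    placement = {}
    for start in range(0, num_nodes, replicas):
        # Nodes start+1 .. start+replicas share the same data.
        group = list(range(start + 1, start + replicas + 1))
        for i, node in enumerate(group):
            placement[node] = [
                f"P{group[(i + k) % replicas]}" for k in range(replicas)
            ]
    return placement


def survives(placement, dead_nodes):
    """True if every partition still lives on at least one running node."""
    all_parts = {p for parts in placement.values() for p in parts}
    alive_parts = {
        p
        for node, parts in placement.items()
        if node not in dead_nodes
        for p in parts
    }
    return alive_parts == all_parts


if __name__ == "__main__":
    for replicas in (1, 2):
        layout = place_fragments(4, replicas)
        print(f"Replicas={replicas}")
        for node, parts in layout.items():
            print(f"Node {node} <- {', '.join(parts)}")
        print("survives loss of node 1:", survives(layout, {1}))
```

In this model, with 2 replicas the cluster survives losing one node from each
pair (e.g. nodes 1 and 3), but not both nodes of the same pair (nodes 1 and 2),
since P1 and P2 would then be gone.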


>
>Thanks, Matteo
>
>Il lun, 2004-06-21 alle 20:36, Tomas Ulin ha scritto:
>
>>you should be able to run with 2 COMPUTER definitions on the same 
>>machine, let's start with making that work
>>
>>do the following:
>>- clean up directory from NDB_Trace, error.log etc
>>- get back to the stuck state (make sure it's stuck for a minute or so)
>>- identify the "ndb pid" = e.g. 17993 for the stuck node, you should see 
>>something like 2004-06-21 18:29:18 [NDB] INFO     -- Angel pid: 17991 
>>ndb pid: 17993
>>- force an abort with  kill -6 17993
>>- tar.gz the NDB_Trace... file and send it to me
>>
>>T
>>
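
For the trace-collection steps quoted above, a minimal Python sketch of picking
the "ndb pid" out of the startup log line and sending it the same signal as
kill -6 (SIGABRT on Linux). The regular expression is an assumption based only
on the one log line shown, and the function names are hypothetical.

```python
import os
import re
import signal

# Matches e.g. "... -- Angel pid: 17991 ndb pid: 17993"
NDB_PID_RE = re.compile(r"Angel pid:\s*(\d+)\s+ndb pid:\s*(\d+)")


def find_ndb_pid(log_text):
    """Return the ndb pid (not the angel pid) from log text, or None."""
    match = NDB_PID_RE.search(log_text)
    return int(match.group(2)) if match else None


def force_abort(pid):
    """Send SIGABRT to the process, like `kill -6 <pid>`."""
    os.kill(pid, signal.SIGABRT)


if __name__ == "__main__":
    line = "2004-06-21 18:29:18 [NDB] INFO     -- Angel pid: 17991 ndb pid: 17993"
    print(find_ndb_pid(line))
    # force_abort(find_ndb_pid(line))  # only run this against the stuck node
```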

