From: tulin
Date: June 22 2004 9:37pm
Subject: Re: DB node hang on start
List-Archive: http://lists.mysql.com/cluster/27
Message-Id: <1087940258.40d8a6a2e6094@mail.mysql.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

Brancaleoni Matteo wrote:

>Hi.
>
>I managed to get it work on the same machine.
>What I didn't know, is that a db node waits
>all other nodes on startup, if nr of db nodes >1 .
>

great

>so that is solved...
>so, can we go on with the remote one? could I
>follow this same procedure on the remote node?
>

for 2 nodes on different machines (I'm assuming you have the same config file as you showed before):

-- kill all ndb processes
-- config.ini: change computer 2 to your other host (nothing else)
-- start ndb_mgmd
-- start ndbd -i on "computer 1"; Ndb.cfg|NDB_CONNECTSTRING should read "host=bestia:2200;nodeid=2"
-- ssh to your other machine
-- start ndbd -i on "computer 2"; Ndb.cfg|NDB_CONNECTSTRING should read "host=bestia:2200;nodeid=3". Also make sure the dir /root/ndb/ndb_data2 exists on that computer

>Another question:
>NoOfReplicas what means? from the manual
>it can be from 1 to 4, but really I cannot
>set it more than 2... so seems that
>a cluster with 4 db nodes (all on the same
>machine, with multiple computer entries)
>uses only first 2, even all nodes are up.
>
>is that expected?

The data is partitioned over the ndb nodes. NoOfReplicas is the number of nodes that should carry the same data.

E.g. 4 nodes and 4 partitions (fragments) P1-P4:

Replicas=1
Node 1 <- P1
Node 2 <- P2
Node 3 <- P3
Node 4 <- P4
=> cluster will die if one node dies

Replicas=2
Node 1 <- P1, P2
Node 2 <- P2, P1
Node 3 <- P3, P4
Node 4 <- P4, P3
=> cluster can continue to operate even if one node dies

Running with more than 2 replicas is not well tested.
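
The 4-node layout above can be reproduced with a small Python sketch. It is only an illustration, not NDB code: it assumes the number of partitions equals the number of nodes and that nodes are grouped in consecutive runs of NoOfReplicas, as in the example. The names layout and survives are made up for this sketch.

```python
def layout(num_nodes, replicas):
    """Return {node_id: [partition names]} following the example layout.

    Assumption (not taken from NDB itself): partitions == nodes, and
    consecutive runs of `replicas` nodes share the same partitions.
    """
    if num_nodes < 1 or replicas < 1 or num_nodes % replicas != 0:
        raise ValueError("num_nodes must be a positive multiple of replicas")
    placement = {}
    for group_start in range(0, num_nodes, replicas):
        # Nodes and partitions in one group use the same 1-based numbers.
        members = list(range(group_start + 1, group_start + replicas + 1))
        for offset, node in enumerate(members):
            # Rotate so the ordering matches the listing in the example.
            rotated = members[offset:] + members[:offset]
            placement[node] = [f"P{p}" for p in rotated]
    return placement


def survives(placement, dead_nodes):
    """True if every partition still has a copy on some live node."""
    all_parts = {p for parts in placement.values() for p in parts}
    live_parts = {p for node, parts in placement.items()
                  if node not in dead_nodes for p in parts}
    return live_parts == all_parts


if __name__ == "__main__":
    for replicas in (1, 2):
        plan = layout(4, replicas)
        print(f"Replicas={replicas}", plan,
              "survives loss of node 1:", survives(plan, {1}))
```

With Replicas=2 the sketch also shows that losing both nodes that hold the same partitions (for example nodes 3 and 4) leaves P3 and P4 without a live copy.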
>
>Thanks, Matteo
>
>On Mon, 2004-06-21 at 20:36, Tomas Ulin wrote:
>
>>you should be able to run with 2 COMPUTER definitions on the same
>>machine, let's start with making that work
>>
>>do the following:
>>- clean up directory from NDB_Trace, error.log etc
>>- get back to the stuck state (make sure it's stuck for a minute or so)
>>- identify the "ndb pid" = e.g. 17993 for the stuck node, you should see
>>something like 2004-06-21 18:29:18 [NDB] INFO -- Angel pid: 17991
>>ndb pid: 17993
>>- force an abort with kill -6 17993
>>- tar.gz the NDB_Trace... file and send it to me
>>
>>T
>>
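
The "identify the ndb pid, then kill -6" steps quoted above could be scripted roughly as follows. This is a hedged sketch: the regular expression assumes the log line looks like the sample quoted in the message, and find_ndb_pid and abort_node are hypothetical helper names, not part of any NDB tool.

```python
import os
import re
import signal

# Matches the sample line "... Angel pid: 17991 ndb pid: 17993".
# Assumption: the real log uses this same wording.
PID_LINE = re.compile(r"Angel pid:\s*(\d+)\s+ndb pid:\s*(\d+)")


def find_ndb_pid(log_text):
    """Return the most recent 'ndb pid' in the log text, or None."""
    matches = PID_LINE.findall(log_text)
    return int(matches[-1][1]) if matches else None


def abort_node(pid):
    """Send SIGABRT (signal 6 on Linux), like `kill -6 <pid>`."""
    os.kill(pid, signal.SIGABRT)


if __name__ == "__main__":
    sample = "2004-06-21 18:29:18 [NDB] INFO -- Angel pid: 17991 ndb pid: 17993"
    print(find_ndb_pid(sample))
    # abort_node(find_ndb_pid(sample))  # only run against a node that is really stuck
```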