From: Matteo Brancaleoni
Date: June 21 2004 2:31pm
Subject: Re: DB node hang on start
List-Archive: http://lists.mysql.com/cluster/21
Message-Id: <1087828278.2337.27.camel@centrino>
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Hi.

I was able to run 2 db nodes on the same machine.
The problem was in the [COMPUTER] definition.

Following the demos, I thought that I needed 2 [COMPUTER] definitions,
even when both point to the same machine, with db node 1 running on
computer 1 and db node 2 running on computer 2 (which is the same machine).

Simply removing the 2nd computer entry and letting db node #2 run on
computer 1 (like the first db node) works ok.

So far so good, but I still have the problem of running the 2nd db
node on another machine... still no joy.

Matteo

On Mon, 2004-06-21 at 17:34, Tomas Ulin wrote:
> but, I saw the below. It shows that you did not start cluster empty (-i).
>
> T
>
> 2004-06-20 16:49:34 [NDB] INFO -- Angel pid: 5558 ndb pid: 5560
> 2004-06-20 16:49:34 [NDB] INFO -- NDB Cluster -- DB node 2
> 2004-06-20 16:49:34 [NDB] INFO -- Version 3.5.0 (beta) --
> 2004-06-20 16:49:34 [NDB] INFO -- Start initiated (version 3.5.0)
> Dbdict: name=sys/def/SYSTAB_0,id=0
> Dbdict: name=sys/def/NDB$EVENTS_0,id=2
> Dbdict: name=test/def/matteotabella2,id=4
> Dbdict: name=test/def/4/PRIMARY,id=6
> Dbdict: name=test/def/matteo,id=8
> Dbdict: name=test/def/8/PRIMARY,id=10
> Dbdict: name=test/def/mytabella,id=12
> Dbdict: name=test/def/12/PRIMARY,id=14
> 2004-06-20 16:50:12 [NDB] INFO -- Started (version 3.5.0)
>
> Tomas Ulin wrote:
>
> > when going from 1-node to 2-nodes, did you restart both nodes with -i
> > flag?
> >
> > T
> >
> > Matteo Brancaleoni wrote:
> >
> >> Hi
> >>
> >> On Mon, 2004-06-21 at 13:45, Tomas Ulin wrote:
> >>
> >>> Did you try to start the second node with "ndbd -i"?
> >>
> >> yes, without success.
> >>
> >>> Brancaleoni Matteo wrote:
> >>>
> >>>> Hi, thanks for the fast answer :)
> >>>> see my comments inline.
> >>>>
> >>>> On Mon, 2004-06-21 at 00:43, Tomas Ulin wrote:
> >>>>
> >>>>> first of all, if you download the latest source you don't have to
> >>>>> specify the "[TCP]" connections at all
> >>>>
> >>>> Ok, done.
> >>>>
> >>>>> 1) please look where you started ndb_mgmd, you should find a
> >>>>> cluster.log (look at the end "tail -n100 cluster.log")
> >>>>
> >>>> ok, got it. unfortunately there is no trace of db node #3, that's
> >>>> the one on the remote machine
> >>>>
> >>>>> 2) please make sure that you don't have any trailing "ndbd"
> >>>>> processes on the failing machine. (we're working on better
> >>>>> detection of clashes), if so kill and restart (if a "ndb" process
> >>>>> hangs this is often because there are "multiple" processes
> >>>>> trying to connect as the same "id")
> >>>>
> >>>> ok. no trailing processes.
> >>>>
> >>>>> 3) make sure you have your [COMPUTER] sections correct in the
> >>>>> config file
> >>>>
> >>>> ok, done
> >>>>
> >>>>> 4) make sure that your Ndb.cfg/NDB_CONNECTSTRING points to the
> >>>>> actual host:port that runs the ndb_mgmd
> >>>>
> >>>> sure, done.
> >>>> If I write something wrong (done just for testing) the node
> >>>> doesn't go into the starting phase at all (should be phase 1, I think).
> >>>> But when it starts, it is stuck in that state.
> >>>>
> >>>>> and try again until you get the config right
> >>>>
> >>>> mmh... I tried to start 2 db nodes on the same machine
> >>>> (of course with different fs), the 2nd db node starts,
> >>>> but crashes after phase #4.
> >>>>
> >>>> I have a rather long trace file for that.
> >>>> the error in the ndbd error.log is:
> >>>>
> >>>> Date/Time: x 20 June 2004 - 23:15:49
> >>>> Type of error: error
> >>>> Message: Internal program error (failed ndbrequire)
> >>>> Fault ID: 2341
> >>>> Problem data: DbdihMain.cpp
> >>>> Object of reference: DBDIH (Line: 1080) 0x00000002
> >>>> ProgramName: NDB Kernel
> >>>> ProcessID: 10904
> >>>> TraceFile: NDB_TraceFile_1.trace
> >>>> ***EOM***
> >>>>
> >>>> The mgm config is (for 2 db nodes on same machine)
> >>>>
> >>>> [COMPUTER]
> >>>> Id: 1
> >>>> ByteOrder: Little
> >>>> HostName: bestia
> >>>> [COMPUTER]
> >>>> Id: 2
> >>>> ByteOrder: Little
> >>>> HostName: bestia
> >>>> [MGM]
> >>>> Id: 1
> >>>> ExecuteOnComputer: 1
> >>>> ArbitrationRank: 1
> >>>> [DB DEFAULT]
> >>>> NoOfReplicas: 2
> >>>> [DB]
> >>>> Id: 2
> >>>> ExecuteOnComputer: 1
> >>>> FileSystemPath: /root/ndb/ndb_data1
> >>>> [DB]
> >>>> Id: 3
> >>>> ExecuteOnComputer: 2
> >>>> FileSystemPath: /root/ndb/ndb_data2
> >>>> [API]
> >>>> Id: 4
> >>>> ExecuteOnComputer: 1
> >>>> ArbitrationRank: 1
> >>>>
> >>>> Regarding 2 db nodes on different machines, I'm stuck
> >>>> with node #3 not starting (it stops at phase 1, without
> >>>> exiting...)
> >>>> The only difference in the mgm config.ini is the hostname
> >>>> of the COMPUTER with id #2
> >>>>
> >>>> any clue?

-- 
Matteo Brancaleoni
Espia - Emmegi Srl
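
For reference, a sketch of the single-machine layout that the top of this message reports as working. It assumes everything else in the quoted config.ini stays unchanged: only the second [COMPUTER] entry is dropped and db node 3 is pointed at computer 1. The "#" line is an annotation added here, not part of the original file.

```
[COMPUTER]
Id: 1
ByteOrder: Little
HostName: bestia
[MGM]
Id: 1
ExecuteOnComputer: 1
ArbitrationRank: 1
[DB DEFAULT]
NoOfReplicas: 2
[DB]
Id: 2
ExecuteOnComputer: 1
FileSystemPath: /root/ndb/ndb_data1
[DB]
Id: 3
# assumption: was "ExecuteOnComputer: 2" before the duplicate [COMPUTER] was removed
ExecuteOnComputer: 1
FileSystemPath: /root/ndb/ndb_data2
[API]
Id: 4
ExecuteOnComputer: 1
ArbitrationRank: 1
```

Each db node keeps its own FileSystemPath, as in the original config.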