From: Tomas Ulin
Date: June 21 2004 3:34pm
Subject: Re: DB node hang on start
List-Archive: http://lists.mysql.com/cluster/20
Message-Id: <40D70005.7010606@mysql.com>

but, I saw the below. It shows that you did not start the cluster empty (-i).

T

2004-06-20 16:49:34 [NDB] INFO -- Angel pid: 5558 ndb pid: 5560
2004-06-20 16:49:34 [NDB] INFO -- NDB Cluster -- DB node 2
2004-06-20 16:49:34 [NDB] INFO -- Version 3.5.0 (beta) --
2004-06-20 16:49:34 [NDB] INFO -- Start initiated (version 3.5.0)
Dbdict: name=sys/def/SYSTAB_0,id=0
Dbdict: name=sys/def/NDB$EVENTS_0,id=2
Dbdict: name=test/def/matteotabella2,id=4
Dbdict: name=test/def/4/PRIMARY,id=6
Dbdict: name=test/def/matteo,id=8
Dbdict: name=test/def/8/PRIMARY,id=10
Dbdict: name=test/def/mytabella,id=12
Dbdict: name=test/def/12/PRIMARY,id=14
2004-06-20 16:50:12 [NDB] INFO -- Started (version 3.5.0)

Tomas Ulin wrote:

> when going from 1-node to 2-nodes, did you restart both nodes with -i
> flag?
>
> T
>
> Matteo Brancaleoni wrote:
>
>> Hi
>>
>> On Mon, 2004-06-21 at 13:45, Tomas Ulin wrote:
>>
>>> Did you try to start the second node with "ndbd -i"?
>>
>> yes, without success.
>>
>>> Brancaleoni Matteo wrote:
>>>
>>>> Hi, thanks for the fast answer :)
>>>> see my comments inline.
>>>>
>>>> On Mon, 2004-06-21 at 00:43, Tomas Ulin wrote:
>>>>
>>>>> first of all, if you download the latest source you don't have to
>>>>> specify the "[TCP]" connections at all
>>>>
>>>> Ok, done.
>>>>
>>>>> 1) please look where you started ndb_mgmd, you should find a
>>>>> cluster.log (look at the end "tail -n100 cluster.log")
>>>>
>>>> ok, got it.
>>>> unfortunately no trace of db node #3, that's
>>>> the one on the remote machine
>>>>
>>>>> 2) please make sure that you don't have any trailing "ndbd"
>>>>> processes on the failing machine. (we're working on better
>>>>> detection of clashes), if so kill and restart (if a "ndb" process
>>>>> hangs this is often because there are "multiple" processes
>>>>> trying to connect as the same "id")
>>>>
>>>> ok. no trailing processes.
>>>>
>>>>> 3) make sure you have your [COMPUTER] sections correct in the
>>>>> config file
>>>>
>>>> ok, done
>>>>
>>>>> 4) make sure that your Ndb.cfg/NDB_CONNECTSTRING points to the
>>>>> actual host:port that run the ndb_mgmd
>>>>
>>>> sure done.
>>>> If I write something wrong (done just for testing) the node
>>>> doesn't go into the starting phase at all (should be phase 1, I think).
>>>> But when it starts, it is stuck in that state.
>>>>
>>>>> and try again until you get the config right
>>>>
>>>> mmh... I tried to start 2 db nodes on the same machine
>>>> (of course with different fs), the 2nd db node starts,
>>>> but crashes after phase #4.
>>>>
>>>> I have a rather long trace file for that.
>>>> the error in the ndbd error.log is:
>>>>
>>>> Date/Time: x 20 June 2004 - 23:15:49
>>>> Type of error: error
>>>> Message: Internal program error (failed ndbrequire)
>>>> Fault ID: 2341
>>>> Problem data: DbdihMain.cpp
>>>> Object of reference: DBDIH (Line: 1080) 0x00000002
>>>> ProgramName: NDB Kernel
>>>> ProcessID: 10904
>>>> TraceFile: NDB_TraceFile_1.trace
>>>> ***EOM***
>>>>
>>>> The mgm config is (for 2 db nodes on same machine)
>>>> [COMPUTER]
>>>> Id: 1
>>>> ByteOrder: Little
>>>> HostName: bestia
>>>> [COMPUTER]
>>>> Id: 2
>>>> ByteOrder: Little
>>>> HostName: bestia
>>>> [MGM]
>>>> Id: 1
>>>> ExecuteOnComputer: 1
>>>> ArbitrationRank: 1
>>>> [DB DEFAULT]
>>>> NoOfReplicas: 2
>>>> [DB]
>>>> Id: 2
>>>> ExecuteOnComputer: 1
>>>> FileSystemPath: /root/ndb/ndb_data1
>>>> [DB]
>>>> Id: 3
>>>> ExecuteOnComputer: 2
>>>> FileSystemPath: /root/ndb/ndb_data2
>>>> [API]
>>>> Id: 4
>>>> ExecuteOnComputer: 1
>>>> ArbitrationRank: 1
>>>>
>>>> Regarding 2 db nodes on different machines, I'm stuck
>>>> with node #3 not starting (stops at phase 1, without
>>>> exiting...)
>>>> The only difference in the mgm config.ini is the hostname
>>>> of the COMPUTER with id #2
>>>>
>>>> any clue?
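
For step 3 in the quoted checklist (getting the [COMPUTER] sections right), a rough self-check along these lines might help. This is only a minimal sketch and not part of NDB: the function names `parse_sections` and `check_config` are hypothetical, the parser assumes the simple "Key: value" layout exactly as pasted in the quoted message, and it assumes node Ids are meant to be unique across the [MGM], [DB] and [API] sections.

```python
# Sample taken from the config quoted in this thread (2 db nodes, same host).
SAMPLE = """\
[COMPUTER]
Id: 1
ByteOrder: Little
HostName: bestia
[COMPUTER]
Id: 2
ByteOrder: Little
HostName: bestia
[MGM]
Id: 1
ExecuteOnComputer: 1
ArbitrationRank: 1
[DB DEFAULT]
NoOfReplicas: 2
[DB]
Id: 2
ExecuteOnComputer: 1
FileSystemPath: /root/ndb/ndb_data1
[DB]
Id: 3
ExecuteOnComputer: 2
FileSystemPath: /root/ndb/ndb_data2
[API]
Id: 4
ExecuteOnComputer: 1
ArbitrationRank: 1
"""


def parse_sections(text):
    """Split 'Key: value' config text into a list of (section_name, fields)."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif ":" in line and sections:
            key, value = line.split(":", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_config(text):
    """Return a list of problems found (empty list if none were detected)."""
    sections = parse_sections(text)
    problems = []

    # Collect the COMPUTER Ids that node sections are allowed to refer to.
    computers = {}
    for name, fields in sections:
        if name == "COMPUTER":
            cid = fields.get("Id")
            if cid in computers:
                problems.append(f"duplicate COMPUTER Id {cid}")
            computers[cid] = fields.get("HostName")

    # Check every node section ([MGM], [DB], [API]); skip "... DEFAULT" blocks.
    seen_node_ids = set()
    for name, fields in sections:
        if name == "COMPUTER" or name.endswith("DEFAULT"):
            continue
        node_id = fields.get("Id")
        if node_id in seen_node_ids:  # assumption: node Ids must be unique
            problems.append(f"duplicate node Id {node_id}")
        seen_node_ids.add(node_id)
        ref = fields.get("ExecuteOnComputer")
        if ref not in computers:
            problems.append(
                f"[{name}] Id {node_id} refers to unknown COMPUTER {ref}"
            )
    return problems


if __name__ == "__main__":
    found = check_config(SAMPLE)
    print("\n".join(found) if found else "no problems found")
```

A check like this only catches cross-reference mistakes in the file itself; it cannot tell whether a HostName resolves, or whether a stale ndbd process is already connected under the same id.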