From: Matteo Brancaleoni
Date: June 21 2004 12:22pm
Subject: Re: DB node hang on start
List-Archive: http://lists.mysql.com/cluster/18
Message-Id: <1087820549.2337.15.camel@centrino>
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Hi

On Mon, 2004-06-21 at 13:45, Tomas Ulin wrote:
> Did you try to start the second node with "ndbd -i"?

yes, without success.

> Brancaleoni Matteo wrote:
>
> >Hi, thanks for the fast answer :)
> >see my comments inline.
> >
> >On Mon, 2004-06-21 at 00:43, Tomas Ulin wrote:
> >
> >>first of all, if you download the latest source you don't have to
> >>specify the "[TCP]" connections at all
> >
> >Ok, done.
> >
> >>1) please look where you started ndb_mgmd, you should find a cluster.log
> >>(look at the end "tail -n100 cluster.log")
> >
> >ok, got it. unfortunately no trace of db node #3, that's
> >the one on the remote machine
> >
> >>2) please make sure that you don't have any trailing "ndbd" processes on
> >>the failing machine. (we're working on better detection on clashes), if
> >>so kill and restart (if a "ndb" process hangs this is often due to that
> >>there are "multiple" processes trying to connect as the same "id")
> >
> >ok. no trailing processes.
> >
> >>3) make sure you have your [COMPUTER] sections correct in the config file
> >
> >ok, done
> >
> >>4) make sure that your Ndb.cfg/NDB_CONNECTSTRING points to the actual
> >>host:port that run the ndb_mgmd
> >
> >sure, done.
> >If I write something wrong (done just 4 testing) the node
> >doesn't go into the starting phase at all (should be phase 1, I think).
> >But when it starts, it is stuck in that state.
> >
> >>and try again until you get the config right
> >
> >mmh... I tried to start 2 db nodes on the same machine
> >(of course with different fs), the 2nd db node starts,
> >but crashes after phase #4.
> >
> >I have a rather long trace file for that.
> >the error in the ndbd error.log is:
> >
> >Date/Time: x 20 June 2004 - 23:15:49
> >Type of error: error
> >Message: Internal program error (failed ndbrequire)
> >Fault ID: 2341
> >Problem data: DbdihMain.cpp
> >Object of reference: DBDIH (Line: 1080) 0x00000002
> >ProgramName: NDB Kernel
> >ProcessID: 10904
> >TraceFile: NDB_TraceFile_1.trace
> >***EOM***
> >
> >The mgm config is (for 2 db nodes on same machine)
> >[COMPUTER]
> >Id: 1
> >ByteOrder: Little
> >HostName: bestia
> >[COMPUTER]
> >Id: 2
> >ByteOrder: Little
> >HostName: bestia
> >[MGM]
> >Id: 1
> >ExecuteOnComputer: 1
> >ArbitrationRank: 1
> >[DB DEFAULT]
> >NoOfReplicas: 2
> >[DB]
> >Id: 2
> >ExecuteOnComputer: 1
> >FileSystemPath: /root/ndb/ndb_data1
> >[DB]
> >Id: 3
> >ExecuteOnComputer: 2
> >FileSystemPath: /root/ndb/ndb_data2
> >[API]
> >Id: 4
> >ExecuteOnComputer: 1
> >ArbitrationRank: 1
> >
> >Regarding 2 db nodes on different machines, I'm stuck
> >with node #3 not starting (it stops at phase 1, without
> >exiting...)
> >The only difference in the mgm config.ini is the hostname
> >of the COMPUTER with id #2
> >
> >any clue?

-- 
Matteo Brancaleoni
Espia - Emmegi Srl
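
Point 3 in the quoted advice (checking the [COMPUTER] sections) could be partly automated. The following is only a rough sketch, assuming the old "Key: Value" config.ini layout quoted in the message; parse_sections and check_config are hypothetical helper names, not part of MySQL Cluster, and the script checks only that node Ids are unique and that every ExecuteOnComputer refers to a defined [COMPUTER] Id.

```python
# Sketch: sanity-check an old-style NDB config.ini like the one quoted above.
# parse_sections and check_config are hypothetical helpers, not MySQL Cluster tools.

def parse_sections(text):
    """Return a list of (section_name, {key: value}), keeping repeated sections."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip(), {}))
        elif ":" in line and sections:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_config(text):
    """Return a list of human-readable problems (empty if none were found)."""
    sections = parse_sections(text)
    computers = {s["Id"] for name, s in sections
                 if name == "COMPUTER" and "Id" in s}
    problems = []
    seen_ids = set()
    for name, s in sections:
        if name not in ("MGM", "DB", "API"):
            continue  # skips [COMPUTER] and [DB DEFAULT]
        node_id = s.get("Id")
        if node_id in seen_ids:
            problems.append(f"duplicate node Id {node_id} in [{name}]")
        seen_ids.add(node_id)
        comp = s.get("ExecuteOnComputer")
        if comp not in computers:
            problems.append(
                f"[{name}] Id {node_id} refers to unknown computer {comp}")
    return problems


# The configuration quoted in the message (two db nodes on the same machine).
CONFIG = """\
[COMPUTER]
Id: 1
ByteOrder: Little
HostName: bestia
[COMPUTER]
Id: 2
ByteOrder: Little
HostName: bestia
[MGM]
Id: 1
ExecuteOnComputer: 1
ArbitrationRank: 1
[DB DEFAULT]
NoOfReplicas: 2
[DB]
Id: 2
ExecuteOnComputer: 1
FileSystemPath: /root/ndb/ndb_data1
[DB]
Id: 3
ExecuteOnComputer: 2
FileSystemPath: /root/ndb/ndb_data2
[API]
Id: 4
ExecuteOnComputer: 1
ArbitrationRank: 1
"""

if __name__ == "__main__":
    for problem in check_config(CONFIG) or ["no problems found"]:
        print(problem)
```

A check like this only catches cross-reference mistakes in the file; it says nothing about host names, ports, or the connect string, so it would not by itself explain a node that hangs in phase 1.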