From: Tomas Ulin
Date: June 21 2004 2:57pm
Subject: Re: DB node hang on start
List-Archive: http://lists.mysql.com/cluster/19
Message-Id: <40D6F768.7020003@mysql.com>

when going from 1 node to 2 nodes, did you restart both nodes with the -i flag?

T

Matteo Brancaleoni wrote:
>Hi
>
>On Mon, 2004-06-21 at 13:45, Tomas Ulin wrote:
>
>>Did you try to start the second node with "ndbd -i"?
>
>yes, without success.
>
>>Brancaleoni Matteo wrote:
>>
>>>Hi, thanks for the fast answer :)
>>>see my comments inline.
>>>
>>>On Mon, 2004-06-21 at 00:43, Tomas Ulin wrote:
>>>
>>>>first of all, if you download the latest source you don't have to
>>>>specify the "[TCP]" connections at all
>>>
>>>Ok, done.
>>>
>>>>1) please look where you started ndb_mgmd, you should find a cluster.log
>>>>(look at the end: "tail -n100 cluster.log")
>>>
>>>ok, got it. unfortunately there is no trace of db node #3, which is
>>>the one on the remote machine
>>>
>>>>2) please make sure that you don't have any trailing "ndbd" processes on
>>>>the failing machine (we're working on better detection of clashes); if
>>>>so, kill them and restart (if an "ndb" process hangs, this is often because
>>>>there are "multiple" processes trying to connect as the same "id")
>>>
>>>ok. no trailing processes.
>>>
>>>>3) make sure you have your [COMPUTER] sections correct in the config file
>>>
>>>ok, done
>>>
>>>>4) make sure that your Ndb.cfg/NDB_CONNECTSTRING points to the actual
>>>>host:port that runs the ndb_mgmd
>>>
>>>sure, done.
>>>If I write something wrong (done just for testing), the node
>>>doesn't enter the starting phase at all (should be phase 1, I think).
>>>But when it does start, it gets stuck in that state.
>>>
>>>>and try again until you get the config right
>>>
>>>mmh... I tried to start 2 db nodes on the same machine
>>>(with a different FileSystemPath for each, of course); the 2nd db node
>>>starts, but crashes after phase #4.
>>>
>>>I have a rather long trace file for that.
>>>The error in the ndbd error.log is:
>>>
>>>Date/Time: x 20 June 2004 - 23:15:49
>>>Type of error: error
>>>Message: Internal program error (failed ndbrequire)
>>>Fault ID: 2341
>>>Problem data: DbdihMain.cpp
>>>Object of reference: DBDIH (Line: 1080) 0x00000002
>>>ProgramName: NDB Kernel
>>>ProcessID: 10904
>>>TraceFile: NDB_TraceFile_1.trace
>>>***EOM***
>>>
>>>The mgm config (for 2 db nodes on the same machine) is:
>>>[COMPUTER]
>>>Id: 1
>>>ByteOrder: Little
>>>HostName: bestia
>>>[COMPUTER]
>>>Id: 2
>>>ByteOrder: Little
>>>HostName: bestia
>>>[MGM]
>>>Id: 1
>>>ExecuteOnComputer: 1
>>>ArbitrationRank: 1
>>>[DB DEFAULT]
>>>NoOfReplicas: 2
>>>[DB]
>>>Id: 2
>>>ExecuteOnComputer: 1
>>>FileSystemPath: /root/ndb/ndb_data1
>>>[DB]
>>>Id: 3
>>>ExecuteOnComputer: 2
>>>FileSystemPath: /root/ndb/ndb_data2
>>>[API]
>>>Id: 4
>>>ExecuteOnComputer: 1
>>>ArbitrationRank: 1
>>>
>>>Regarding 2 db nodes on different machines, I'm stuck with
>>>node #3 not starting (it stops at phase 1, without
>>>exiting...)
>>>The only difference in the mgm config.ini is the HostName
>>>of the COMPUTER with Id #2.
>>>
>>>any clue?
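Check 3 in the thread (getting the [COMPUTER] sections right) can be partly automated. The following is a minimal sketch of a hypothetical helper, `check_config`, which is not part of MySQL Cluster. It assumes the `Key: Value` layout of the config.ini quoted above, where section names such as [COMPUTER] and [DB] repeat, so it uses a small hand-written parser instead of Python's `configparser` (which rejects duplicate section names by default). Assuming node Ids form a single namespace across [MGM], [DB] and [API], it flags duplicate node Ids, any ExecuteOnComputer that does not match a [COMPUTER] Id, and DB nodes on the same HostName that share a FileSystemPath. It only catches static mistakes in the file; it cannot explain a runtime hang.

```python
from collections import Counter

# Section names treated as cluster nodes; [COMPUTER] and [X DEFAULT] are handled separately.
NODE_TYPES = ("MGM", "DB", "API")


def parse_sections(text):
    """Parse '[SECTION]' headers and 'Key: Value' lines into (name, dict) pairs.

    Repeated section names (e.g. two [DB] sections) stay separate entries,
    in file order. Lines that are neither headers nor 'Key: Value' are ignored.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().upper(), {}))
        elif ":" in line and sections:
            key, _, value = line.partition(":")
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_config(text):
    """Return a list of human-readable problems; an empty list means none were found."""
    sections = parse_sections(text)
    problems = []

    # First pass: collect [XXX DEFAULT] values so they apply to every XXX node.
    defaults = {}
    for name, values in sections:
        if name.endswith(" DEFAULT"):
            defaults.setdefault(name[: -len(" DEFAULT")], {}).update(values)

    # Second pass: index computers by Id and merge defaults into each node section.
    computers = {}
    nodes = []
    for name, values in sections:
        if name == "COMPUTER":
            cid = values.get("Id")
            if cid in computers:
                problems.append(f"duplicate [COMPUTER] Id {cid}")
            computers[cid] = values
        elif name in NODE_TYPES:
            nodes.append((name, {**defaults.get(name, {}), **values}))

    # Assumption: node Ids share one namespace across MGM, DB and API sections.
    id_counts = Counter(values.get("Id") for _, values in nodes)
    for node_id, count in id_counts.items():
        if count > 1:
            problems.append(f"node Id {node_id} is used by {count} sections")

    # Every ExecuteOnComputer must point at a defined [COMPUTER] Id.
    db_dirs = Counter()
    for name, values in nodes:
        ref = values.get("ExecuteOnComputer")
        if ref is not None and ref not in computers:
            problems.append(
                f"[{name}] Id {values.get('Id')}: "
                f"ExecuteOnComputer {ref} is not a defined [COMPUTER]"
            )
        if name == "DB" and ref in computers:
            path = values.get("FileSystemPath")
            if path is not None:
                db_dirs[(computers[ref].get("HostName"), path)] += 1

    # DB nodes on the same host must not share a file system directory.
    for (host, path), count in db_dirs.items():
        if count > 1:
            problems.append(f"{count} DB nodes on {host} share FileSystemPath {path}")
    return problems
```

For the same-machine config quoted in the thread, `check_config` returns an empty list, which suggests the hang is not caused by one of these particular mistakes.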
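Check 4 can also be tested directly from the remote machine that runs node #3: if a plain TCP connection to the ndb_mgmd host:port named in Ndb.cfg/NDB_CONNECTSTRING fails, the problem is networking (firewall, name resolution, wrong port) rather than the cluster config. A minimal sketch using only Python's standard library; the function names `can_reach` and `main` are hypothetical, and the host and port must be taken from your own connect string (no default port is assumed). A successful connect only shows that something is listening on that port, not that it is ndb_mgmd, and it does not test the connections between the data nodes themselves.

```python
import socket
import sys


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def main(argv):
    host, port = argv[1], int(argv[2])
    ok = can_reach(host, port)
    print(f"{host}:{port} is {'reachable' if ok else 'NOT reachable'}")
    return 0 if ok else 1


# Only run the command-line check when given exactly <host> <port>.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Saved as a script, it can be run on the remote machine as `python3 <script> bestia <port>`, since the quoted config places ndb_mgmd on computer 1 (HostName bestia).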