List: Cluster
From: Brancaleoni Matteo
Date: June 21 2004 5:33pm
Subject: Re: DB node hang on start
Hi.

I managed to get it to work on the same machine.
What I didn't know is that a db node waits for
all other nodes on startup if the number of db nodes is > 1.

so that is solved...
so, can we go on with the remote one? Could I
follow the same procedure on the remote node?

Another question:
what does NoOfReplicas mean? According to the manual
it can be from 1 to 4, but in practice I cannot
set it higher than 2... so it seems that
a cluster with 4 db nodes (all on the same
machine, with multiple computer entries)
uses only the first 2, even when all nodes are up.

is that expected?

Thanks, Matteo
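
For reference, a minimal sketch of how NoOfReplicas relates to the number of db nodes. It assumes the usual NDB rule that db nodes are divided into node groups of NoOfReplicas nodes each, so 4 db nodes with NoOfReplicas 2 would form 2 groups instead of leaving 2 nodes unused. The function name and the grouping by ascending node id are illustrative assumptions; whether the 3.5.0 beta discussed here behaves this way is not confirmed in this thread.

```python
def node_groups(db_node_ids, no_of_replicas):
    """Split db node ids into groups of no_of_replicas nodes each.

    Hypothetical helper: it only illustrates the arithmetic, it does not
    talk to a running cluster.
    """
    if no_of_replicas < 1:
        raise ValueError("NoOfReplicas must be at least 1")
    if len(db_node_ids) % no_of_replicas != 0:
        raise ValueError("number of db nodes must be a multiple of NoOfReplicas")
    ids = sorted(db_node_ids)
    # Consecutive ids share a group; each group holds one copy set of the data.
    return [ids[i:i + no_of_replicas] for i in range(0, len(ids), no_of_replicas)]


print(node_groups([2, 3, 4, 5], 2))  # [[2, 3], [4, 5]]
```

Under that assumption, every node in a group holds the same data, and adding groups adds capacity, not extra copies.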

On Mon, 2004-06-21 at 20:36, Tomas Ulin wrote:
> you should be able to run with 2 COMPUTER definitions on the same 
> machine, let's start with making that work
> 
> do the following:
> - clean up directory from NDB_Trace, error.log etc
> - get back to the stuck state (make sure it's stuck for a minute or so)
> - identify the "ndb pid" = e.g. 17993 for the stuck node, you should see 
> something like 2004-06-21 18:29:18 [NDB] INFO     -- Angel pid: 17991 
> ndb pid: 17993
> - force an abort with  kill -6 17993
> - tar.gz the NDB_Trace... file and send it to me
> 
> T
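
The "identify the ndb pid" and "kill -6" steps above can be sketched as follows. This is only an illustration: the regular expression is written against the single log line quoted above, and the helper names are hypothetical. The sketch builds the kill command without running it.

```python
import re

# Matches the startup line quoted above, e.g.
# "2004-06-21 18:29:18 [NDB] INFO     -- Angel pid: 17991 ndb pid: 17993"
PID_LINE = re.compile(r"Angel pid:\s*(\d+)\s+ndb pid:\s*(\d+)")


def find_ndb_pids(log_text):
    """Return (angel_pid, ndb_pid) from the last matching line, or None."""
    matches = PID_LINE.findall(log_text)
    if not matches:
        return None
    angel, ndb = matches[-1]
    return int(angel), int(ndb)


def abort_command(ndb_pid):
    """Build the 'kill -6' (SIGABRT) command; the caller decides whether to run it."""
    return ["kill", "-6", str(ndb_pid)]


sample = "2004-06-21 18:29:18 [NDB] INFO     -- Angel pid: 17991 ndb pid: 17993"
pids = find_ndb_pids(sample)
print(pids, abort_command(pids[1]))
```

The signal goes to the ndb pid, not the angel pid, as in the instructions above.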
> 
> Matteo Brancaleoni wrote:
> 
> >Hi.
> >
> >I was able to run 2 db nodes on the same machine.
> >The problem was in the [COMPUTER]
> >definition. Following the demos, I thought that
> >I needed 2 [COMPUTER] definitions, even if pointing
> >to the same machine, with db node 1 running
> >on computer 1 and db node 2 running on computer 2
> >(which points to the same machine).
> >
> >Simply removing the 2nd computer entry and
> >letting db node #2 run on computer 1 (like the first
> >db node) works ok.
> >
> >so far so good.
> >
> >but now I have the problem of getting the 2nd db node
> >to run on another machine... still no joy.
> >
> >Matteo
> >
> >On Mon, 2004-06-21 at 17:34, Tomas Ulin wrote:
> >
> >>but, I saw the below.  It shows that you did not start cluster empty (-i).
> >>
> >>T
> >>
> >>2004-06-20 16:49:34 [NDB] INFO     -- Angel pid: 5558 ndb pid: 5560
> >>2004-06-20 16:49:34 [NDB] INFO     -- NDB Cluster -- DB node 2
> >>2004-06-20 16:49:34 [NDB] INFO     -- Version 3.5.0 (beta) --
> >>2004-06-20 16:49:34 [NDB] INFO     -- Start initiated (version 3.5.0)
> >>Dbdict: name=sys/def/SYSTAB_0,id=0
> >>Dbdict: name=sys/def/NDB$EVENTS_0,id=2
> >>Dbdict: name=test/def/matteotabella2,id=4
> >>Dbdict: name=test/def/4/PRIMARY,id=6
> >>Dbdict: name=test/def/matteo,id=8
> >>Dbdict: name=test/def/8/PRIMARY,id=10
> >>Dbdict: name=test/def/mytabella,id=12
> >>Dbdict: name=test/def/12/PRIMARY,id=14
> >>2004-06-20 16:50:12 [NDB] INFO     -- Started (version 3.5.0)
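
A minimal sketch of the check made above on this log: it looks for "Dbdict: name=" entries outside the sys/ schema. It assumes, following the reasoning in the message above, that user tables appearing during start mean the node came up with existing data and not from an empty start with -i. The function name and the "sys/" filter are illustrative assumptions.

```python
import re

DBDICT = re.compile(r"Dbdict: name=([^,]+),id=(\d+)")


def user_tables_in_log(log_text):
    """Return the names of non-system tables reported by Dbdict lines."""
    names = []
    for line in log_text.splitlines():
        match = DBDICT.search(line)
        if match and not match.group(1).startswith("sys/"):
            names.append(match.group(1))
    return names


sample_log = """\
2004-06-20 16:49:34 [NDB] INFO     -- Start initiated (version 3.5.0)
Dbdict: name=sys/def/SYSTAB_0,id=0
Dbdict: name=sys/def/NDB$EVENTS_0,id=2
Dbdict: name=test/def/matteotabella2,id=4
Dbdict: name=test/def/4/PRIMARY,id=6
2004-06-20 16:50:12 [NDB] INFO     -- Started (version 3.5.0)
"""

found = user_tables_in_log(sample_log)
print("existing user tables:", found)
```

A non-empty result here would match the observation that the cluster was not started empty.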
> >>
> >>Tomas Ulin wrote:
> >>
> >>>when going from 1-node to 2-nodes, did you restart both nodes with -i 
> >>>flag?
> >>>
> >>>T
> >>>
> >>>Matteo Brancaleoni wrote:
> >>>
> >>>>Hi
> >>>>
> >>>>On Mon, 2004-06-21 at 13:45, Tomas Ulin wrote:
> >>>>
> >>>>>Did you try to start the second node with "ndbd -i"?
> >>>>>
> >>>>yes, without success.
> >>>>
> >>>>>Brancaleoni Matteo wrote:
> >>>>>
> >>>>>>Hi, thanks for the fast answer :)
> >>>>>>see my comments inline.
> >>>>>>
> >>>>>>On Mon, 2004-06-21 at 00:43, Tomas Ulin wrote:
> >>>>>>
> >>>>>>>first of all, if you download the latest source you don't have to
> >>>>>>>specify the "[TCP]" connections at all
> >>>>>>>
> >>>>>>Ok, done.
> >>>>>>
> >>>>>>>1) please look where you started ndb_mgmd, you should find a
> >>>>>>>cluster.log (look at the end "tail -n100 cluster.log")
> >>>>>>>
> >>>>>>ok, got it. unfortunately there is no trace of db node #3, which is
> >>>>>>the one on the remote machine
> >>>>>>
> >>>>>>>2) please make sure that you don't have any trailing "ndbd"
> >>>>>>>processes on the failing machine (we're working on better
> >>>>>>>detection of clashes); if you do, kill them and restart (if an "ndb"
> >>>>>>>process hangs, this is often because there are "multiple" processes
> >>>>>>>trying to connect with the same "id")
> >>>>>>>
> >>>>>>ok. no trailing processes.
> >>>>>>
> >>>>>>>3) make sure you have your [COMPUTER] sections correct in the
> >>>>>>>config file
> >>>>>>>
> >>>>>>ok, done
> >>>>>>
> >>>>>>>4) make sure that your Ndb.cfg/NDB_CONNECTSTRING points to the
> >>>>>>>actual host:port that run the ndb_mgmd
> >>>>>>>
> >>>>>>sure, done.
> >>>>>>If I enter something wrong (done just for testing), the node
> >>>>>>doesn't enter the starting phase at all (should be phase 1, I think).
> >>>>>>But when it does start, it stays stuck in that state.
> >>>>>>
> >>>>>>>and try again until you get the config right
> >>>>>>>
> >>>>>>mmh... I tried to start 2 db nodes on the same machine
> >>>>>>(of course with different file system paths); the 2nd db node starts,
> >>>>>>but crashes after phase #4.
> >>>>>>
> >>>>>>I have a rather long trace file for that.
> >>>>>>the error in the ndbd error.log is:
> >>>>>>
> >>>>>>Date/Time: x 20 June 2004 - 23:15:49
> >>>>>>Type of error: error
> >>>>>>Message: Internal program error (failed ndbrequire)
> >>>>>>Fault ID: 2341
> >>>>>>Problem data: DbdihMain.cpp
> >>>>>>Object of reference: DBDIH (Line: 1080) 0x00000002
> >>>>>>ProgramName: NDB Kernel
> >>>>>>ProcessID: 10904
> >>>>>>TraceFile: NDB_TraceFile_1.trace
> >>>>>>***EOM***
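
A small sketch for pulling the fields out of an error.log entry like the one above, for example to collect Fault ID and TraceFile before sending a report. It assumes each field is a "Key: value" line and that "***EOM***" ends the entry, which is all this one sample shows; the function name is hypothetical.

```python
def parse_error_entry(text):
    """Parse 'Key: value' lines of one error.log entry into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "***EOM***":
            break  # end-of-message marker closes the entry
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


entry = """\
Date/Time: x 20 June 2004 - 23:15:49
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: DbdihMain.cpp
Object of reference: DBDIH (Line: 1080) 0x00000002
ProgramName: NDB Kernel
ProcessID: 10904
TraceFile: NDB_TraceFile_1.trace
***EOM***
"""

info = parse_error_entry(entry)
print(info["Fault ID"], info["Problem data"], info["TraceFile"])
```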
> >>>>>>
> >>>>>>
> >>>>>>The mgm config is (for 2 db nodes on same machine)
> >>>>>>[COMPUTER]
> >>>>>>Id: 1
> >>>>>>ByteOrder: Little
> >>>>>>HostName: bestia
> >>>>>>[COMPUTER]
> >>>>>>Id: 2
> >>>>>>ByteOrder: Little
> >>>>>>HostName: bestia
> >>>>>>[MGM]
> >>>>>>Id: 1
> >>>>>>ExecuteOnComputer: 1
> >>>>>>ArbitrationRank: 1
> >>>>>>[DB DEFAULT]
> >>>>>>NoOfReplicas: 2
> >>>>>>[DB]
> >>>>>>Id: 2
> >>>>>>ExecuteOnComputer: 1
> >>>>>>FileSystemPath: /root/ndb/ndb_data1
> >>>>>>[DB]
> >>>>>>Id: 3
> >>>>>>ExecuteOnComputer: 2
> >>>>>>FileSystemPath: /root/ndb/ndb_data2
> >>>>>>[API]
> >>>>>>Id: 4
> >>>>>>ExecuteOnComputer: 1
> >>>>>>ArbitrationRank: 1
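
A minimal sketch that reads a config in the "Key: value" layout above and reports [COMPUTER] entries sharing a HostName, as the two "bestia" entries do here. The parser is a hypothetical stand-in written for this layout only, not the one ndb_mgmd uses, and a shared HostName is reported as something to look at, not as an error in itself.

```python
def parse_sections(text):
    """Return a list of (section_name, fields) in file order."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif ":" in line and sections:
            key, value = line.split(":", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def shared_hosts(sections):
    """Map each HostName used by more than one [COMPUTER] to the computer ids."""
    seen = {}
    for name, fields in sections:
        if name == "COMPUTER":
            seen.setdefault(fields.get("HostName"), []).append(fields.get("Id"))
    return {host: ids for host, ids in seen.items() if len(ids) > 1}


config = """\
[COMPUTER]
Id: 1
HostName: bestia
[COMPUTER]
Id: 2
HostName: bestia
[DB]
Id: 2
ExecuteOnComputer: 1
[DB]
Id: 3
ExecuteOnComputer: 2
"""

print(shared_hosts(parse_sections(config)))  # {'bestia': ['1', '2']}
```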
> >>>>>>
> >>>>>>Regarding 2 db nodes on different machines, I'm stuck
> >>>>>>with node #3 not starting (it stops at phase 1, without
> >>>>>>exiting...)
> >>>>>>The only difference in the mgm config.ini is the hostname
> >>>>>>of the COMPUTER with id #2
> >>>>>>
> >>>>>>any clue?
> >>>>>>
-- 
Brancaleoni Matteo <mbrancaleoni@stripped>
Espia Srl
