List: Cluster
From: Tomas Ulin
Date: June 21 2004 6:36pm
Subject: Re: DB node hang on start
You should be able to run with 2 [COMPUTER] definitions on the same
machine, so let's start by making that work.

Do the following:
- clean up the directory (remove NDB_Trace files, error.log, etc.)
- get back to the stuck state (make sure it has been stuck for a minute or so)
- identify the "ndb pid" of the stuck node, e.g. 17993; you should see
  something like:
  2004-06-21 18:29:18 [NDB] INFO     -- Angel pid: 17991 ndb pid: 17993
- force an abort with: kill -6 17993
- tar.gz the NDB_Trace... file and send it to me
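
The steps above could be scripted roughly as follows. This is only a sketch: the function names, the ndbd.out file (wherever the node's startup output was captured), and the stuck_node_trace.tar.gz archive name are assumptions made for illustration and are not part of NDB.

```shell
#!/bin/sh
# Sketch only: helper names and file names below are hypothetical.

# Read ndbd startup output on stdin and print the most recent "ndb pid",
# taken from lines such as:
#   2004-06-21 18:29:18 [NDB] INFO     -- Angel pid: 17991 ndb pid: 17993
ndb_pid_from_log() {
    sed -n 's/.*Angel pid: [0-9][0-9]* ndb pid: \([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Force the stuck node to abort (signal 6, as in "kill -6 <pid>").
abort_node() {
    kill -6 "$1"
}

# Pack every NDB_Trace* file found in the given directory (default: the
# current one) into a single tar.gz archive in that same directory.
collect_traces() {
    dir=${1:-.}
    ( cd "$dir" && tar -czf stuck_node_trace.tar.gz NDB_Trace* )
}
```

Possible usage, once the node has been stuck for a minute or so (ndbd.out is an assumed file name):

    pid=$(ndb_pid_from_log < ndbd.out)
    abort_node "$pid"
    collect_traces .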

T

Matteo Brancaleoni wrote:

>Hi.
>
>I was able to run 2 db nodes on the same machine.
>The problem was in the [COMPUTER]
>definition. Following the demos, I thought that
>I needed 2 [COMPUTER] definitions, even pointing
>to the same machine, with DB node 1 running
>on computer 1 and DB node 2 running on computer 2
>(that's the same entry).
>
>Simply removing the 2nd computer entry and
>letting db node #2 run on computer 1 (like the first
>db node) works ok.
>
>so far so good.
>
>but now I have the problem of running the 2nd db node
>on another machine... still no joy.
>
>Matteo
>
>On Mon, 2004-06-21 at 17:34, Tomas Ulin wrote:
>
>>But I saw the output below. It shows that you did not start the cluster empty (-i).
>>
>>T
>>
>>2004-06-20 16:49:34 [NDB] INFO     -- Angel pid: 5558 ndb pid: 5560
>>2004-06-20 16:49:34 [NDB] INFO     -- NDB Cluster -- DB node 2
>>2004-06-20 16:49:34 [NDB] INFO     -- Version 3.5.0 (beta) --
>>2004-06-20 16:49:34 [NDB] INFO     -- Start initiated (version 3.5.0)
>>Dbdict: name=sys/def/SYSTAB_0,id=0
>>Dbdict: name=sys/def/NDB$EVENTS_0,id=2
>>Dbdict: name=test/def/matteotabella2,id=4
>>Dbdict: name=test/def/4/PRIMARY,id=6
>>Dbdict: name=test/def/matteo,id=8
>>Dbdict: name=test/def/8/PRIMARY,id=10
>>Dbdict: name=test/def/mytabella,id=12
>>Dbdict: name=test/def/12/PRIMARY,id=14
>>2004-06-20 16:50:12 [NDB] INFO     -- Started (version 3.5.0)
>>
>>
>>
>>Tomas Ulin wrote:
>>
>>>When going from 1 node to 2 nodes, did you restart both nodes with the -i
>>>flag?
>>>
>>>T
>>>
>>>Matteo Brancaleoni wrote:
>>>
>>>>Hi
>>>>
>>>>On Mon, 2004-06-21 at 13:45, Tomas Ulin wrote:
>>>>
>>>>>Did you try to start the second node with "ndbd -i"?
>>>>>
>>>>yes, without success.
>>>>
>>>>>Brancaleoni Matteo wrote:
>>>>>
>>>>>>Hi, thanks for the fast answer :)
>>>>>>see my comments inline.
>>>>>>
>>>>>>On Mon, 2004-06-21 at 00:43, Tomas Ulin wrote:
>>>>>>
>>>>>>>first of all, if you download the latest source you don't have to
>>>>>>>specify the "[TCP]" connections at all
>>>>>>>
>>>>>>Ok, done.
>>>>>>
>>>>>>>1) please look where you started ndb_mgmd, you should find a 
>>>>>>>cluster.log (look at the end "tail -n100 cluster.log")
>>>>>>>
>>>>>>ok, got it. Unfortunately there is no trace of db node #3, which is
>>>>>>the one on the remote machine.
>>>>>>
>>>>>>>2) please make sure that you don't have any trailing "ndbd"
>>>>>>>processes on the failing machine (we're working on better
>>>>>>>detection of clashes); if so, kill them and restart. If an "ndb"
>>>>>>>process hangs, this is often because multiple processes are
>>>>>>>trying to connect with the same "id".
>>>>>>>
>>>>>>ok. no trailing processes.
>>>>>>
>>>>>>>3) make sure you have your [COMPUTER] sections correct in the
>>>>>>>config file
>>>>>>>
>>>>>>ok, done
>>>>>>
>>>>>>>4) make sure that your Ndb.cfg/NDB_CONNECTSTRING points to the
>>>>>>>actual host:port that runs the ndb_mgmd
>>>>>>>
>>>>>>sure, done.
>>>>>>If I write something wrong (done just for testing), the node
>>>>>>doesn't enter the starting phase at all (should be phase 1, I think).
>>>>>>But when it does start, it is stuck in that state.
>>>>>>
>>>>>>>and try again until you get the config right
>>>>>>>
>>>>>>mmh... I tried to start 2 db nodes on the same machine
>>>>>>(of course with different fs); the 2nd db node starts,
>>>>>>but crashes after phase #4.
>>>>>>
>>>>>>I have a rather long trace file for that.
>>>>>>The error in the ndbd error.log is:
>>>>>>
>>>>>>Date/Time: x 20 June 2004 - 23:15:49
>>>>>>Type of error: error
>>>>>>Message: Internal program error (failed ndbrequire)
>>>>>>Fault ID: 2341
>>>>>>Problem data: DbdihMain.cpp
>>>>>>Object of reference: DBDIH (Line: 1080) 0x00000002
>>>>>>ProgramName: NDB Kernel
>>>>>>ProcessID: 10904
>>>>>>TraceFile: NDB_TraceFile_1.trace
>>>>>>***EOM***
>>>>>>
>>>>>>
>>>>>>The mgm config (for 2 db nodes on the same machine) is:
>>>>>>[COMPUTER]
>>>>>>Id: 1
>>>>>>ByteOrder: Little
>>>>>>HostName: bestia
>>>>>>[COMPUTER]
>>>>>>Id: 2
>>>>>>ByteOrder: Little
>>>>>>HostName: bestia
>>>>>>[MGM]
>>>>>>Id: 1
>>>>>>ExecuteOnComputer: 1
>>>>>>ArbitrationRank: 1
>>>>>>[DB DEFAULT]
>>>>>>NoOfReplicas: 2
>>>>>>[DB]
>>>>>>Id: 2
>>>>>>ExecuteOnComputer: 1
>>>>>>FileSystemPath: /root/ndb/ndb_data1
>>>>>>[DB]
>>>>>>Id: 3
>>>>>>ExecuteOnComputer: 2
>>>>>>FileSystemPath: /root/ndb/ndb_data2
>>>>>>[API]
>>>>>>Id: 4
>>>>>>ExecuteOnComputer: 1
>>>>>>ArbitrationRank: 1
>>>>>>
>>>>>>Regarding 2 db nodes on different machines, I'm stuck
>>>>>>with node #3 not starting (it stops at phase 1, without
>>>>>>exiting...).
>>>>>>The only difference in the mgm config.ini is the hostname
>>>>>>of the COMPUTER with id #2.
>>>>>>
>>>>>>any clue?
