List: Cluster
From: Jim Hoadley
Date: May 3 2005 1:45am
Subject: failed ndbrequire -- reason?
After running for many days, my cluster crashed this 
afternoon. Can someone help me understand the reason?

Any help would be greatly appreciated.

This was the sequence of events. Node 1 crashed while I
was logged in with the mysql client. I restarted node 1,
then both node 1 and node 3 (both on the same host) crashed.
I restarted nodes 1 and 3 successfully and they're still running.

Here's what the error log for node 1 says. Node 3 did not
have an error log. Please let me know which lines from the
trace logs are relevant and I'll post those too.

   Date/Time: Monday 2 May 2005 - 17:49:05
   Type of error: error
   Message: Internal program error (failed ndbrequire)
   Fault ID: 2341
   Problem data: DbtcMain.cpp
   Object of reference: DBTC (Line: 12251) 0x0000000a
   ProgramName: ndbd
   ProcessID: 3669
   TraceFile: /usr/local/mysql/ndb_1_trace.log.2
   ***EOM***
                                                                               

                       
   Date/Time: Monday 2 May 2005 - 17:52:14
   Type of error: error
   Message: Node failed during system restart
   Fault ID: 2308
   Problem data: Unhandled node failure of started node during restart
   Object of reference: NDBCNTR (Line: 1417) 0x0000000a
   ProgramName: ndbd
   ProcessID: 13585
   TraceFile: /usr/local/mysql/ndb_1_trace.log.3
   ***EOM***
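
For anyone who wants to compare these entries mechanically, here is a minimal Python sketch. It assumes each entry is a run of "Key: value" lines terminated by ***EOM***, as in the two entries above. The names parse_error_log and block_and_line are hypothetical helpers for illustration, not part of any MySQL Cluster tool.

```python
import re

# Sample entry copied from the node 1 error log above.
SAMPLE = """\
Date/Time: Monday 2 May 2005 - 17:49:05
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: DbtcMain.cpp
Object of reference: DBTC (Line: 12251) 0x0000000a
ProgramName: ndbd
ProcessID: 3669
TraceFile: /usr/local/mysql/ndb_1_trace.log.2
***EOM***
"""


def parse_error_log(text):
    """Split an error log into a list of dicts, one per ***EOM*** marker."""
    entries = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "***EOM***":
            # End of one entry: store it and start a fresh one.
            if current:
                entries.append(current)
            current = {}
        elif ": " in line:
            # Split on the first ": " only, so values may contain colons.
            key, value = line.split(": ", 1)
            current[key] = value
    return entries


def block_and_line(entry):
    """Return (block name, source line) from 'Object of reference', or None."""
    match = re.match(r"(\w+) \(Line: (\d+)\)", entry.get("Object of reference", ""))
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    for entry in parse_error_log(SAMPLE):
        print(entry["Fault ID"], entry["Problem data"], block_and_line(entry))
        print("trace file:", entry["TraceFile"])
```

The TraceFile field of each parsed entry names the trace log that belongs to that particular crash.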

These are my specs.

3-host cluster:

   host1 = node [1], node [3], API [6]
   host2 = node [2], node [4], API [7]
   host3 = mgm [5]

Each host has 6 GB RAM and two 3.6 GHz Xeons
RedHat Enterprise Linux 3 with hugemem kernel
2.4.21-27.0.4.ELhugemem #1 SMP

Here's my config.ini:

   [ndbd default]
   LockPagesInMainMemory=1
   TransactionDeadlockDetectionTimeout=14000
   NoOfReplicas= 2
   MaxNoOfConcurrentOperations=131072
   DataMemory= 1900M
   IndexMemory= 400M
   Diskless= 0
   DataDir= /var/mysql-cluster
   TimeBetweenWatchDogCheck=10000
   HeartbeatIntervalDbDb=10000
   HeartbeatIntervalDbApi=10000
   NoOfFragmentLogFiles=64
   
   NoOfDiskPagesToDiskAfterRestartTUP=54   #40
   NoOfDiskPagesToDiskAfterRestartACC=8    #20
   
   MaxNoOfAttributes = 2000                #1000
   MaxNoOfOrderedIndexes = 5000            #128
   MaxNoOfUniqueHashIndexes = 5000         #64
    
   [ndbd]
   HostName= 10.0.1.199
    
   [ndbd]
   HostName= 10.0.1.200
 
   [ndbd]
   HostName= 10.0.1.199
   
   [ndbd]
   HostName= 10.0.1.200

   [ndb_mgmd]
   HostName= 10.0.1.198
   PortNumber= 2200
   
   [mysqld]
   
   [mysqld]
   [tcp default]
   PortNumber= 2202


And the output of show:

   ndb_mgm> show
   Connected to Management Server at: 10.0.1.198:2200
   Cluster Configuration
   ---------------------
   [ndbd(NDB)]     4 node(s)
   id=1    @10.0.1.199  (Version: 4.1.11, Nodegroup: 0)
   id=2    @10.0.1.200  (Version: 4.1.11, Nodegroup: 0, Master)
   id=3    @10.0.1.199  (Version: 4.1.11, Nodegroup: 1)
   id=4    @10.0.1.200  (Version: 4.1.11, Nodegroup: 1)
    
   [ndb_mgmd(MGM)] 1 node(s)
   id=5    @10.0.1.198  (Version: 4.1.11)
 
   [mysqld(API)]   2 node(s)
   id=6    @10.0.1.199  (Version: 4.1.11)
   id=7    @10.0.1.200  (Version: 4.1.11)
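
As a sanity check on this layout, here is a rough Python sketch of how the node groups relate to the hosts. It assumes node groups are formed from consecutive node ids in sets of NoOfReplicas, which is consistent with the show output above (nodes 1 and 2 in nodegroup 0, nodes 3 and 4 in nodegroup 1). The function names are hypothetical and only illustrate the reasoning.

```python
# Values taken from config.ini and the show output above.
NO_OF_REPLICAS = 2
NODE_HOSTS = {
    1: "10.0.1.199",
    2: "10.0.1.200",
    3: "10.0.1.199",
    4: "10.0.1.200",
}


def node_groups(node_hosts, replicas):
    """Group node ids consecutively, `replicas` nodes per group (assumed rule)."""
    ids = sorted(node_hosts)
    return [ids[i:i + replicas] for i in range(0, len(ids), replicas)]


def every_group_has_survivor(node_hosts, replicas, failed):
    """True if each node group still has at least one node not in `failed`."""
    return all(
        any(node not in failed for node in group)
        for group in node_groups(node_hosts, replicas)
    )


def groups_span_hosts(node_hosts, replicas):
    """True if no node group has all of its members on a single host."""
    return all(
        len({node_hosts[node] for node in group}) > 1
        for group in node_groups(node_hosts, replicas)
    )


if __name__ == "__main__":
    print(node_groups(NODE_HOSTS, NO_OF_REPLICAS))
    print(groups_span_hosts(NODE_HOSTS, NO_OF_REPLICAS))
    # Nodes 1 and 3 share a host; losing both leaves nodes 2 and 4.
    print(every_group_has_survivor(NODE_HOSTS, NO_OF_REPLICAS, {1, 3}))
```

Under that assumption, nodes 1 and 3 going down together still leaves one node in each node group (nodes 2 and 4 on the other host).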


Thanks in advance.

-- Jim


