List: Cluster
From: Jim Hoadley
Date: May 3 2005 3:38am
Subject: Re: failed ndbrequire -- reason?

More on this problem. A couple of hours later, node 4 went down, 
then all nodes died, taking down the cluster. 

It looks like the error messages available to me are more interesting 
this time.

Here's what the ndb_error logs say:

Node 1:

Date/Time: Monday 2 May 2005 - 20:15:08
Type of error: assert
Message: Assertion, probably a programming error
Fault ID: 2301
Problem data: ArrayPool<T>::getPtr
Object of reference: ../../../../../ndb/src/kernel/vm/ArrayPool.hpp line: 350
(block: BACKUP)
ProgramName: ndbd
ProcessID: 13637
TraceFile: /usr/local/mysql/ndb_1_trace.log.4
***EOM***

Node 2:

Date/Time: Monday 2 May 2005 - 20:15:22
Type of error: assert
Message: Assertion, probably a programming error
Fault ID: 2301
Problem data: ArrayPool<T>::getPtr
Object of reference: ../../../../../ndb/src/kernel/vm/ArrayPool.hpp line: 350
(block: BACKUP)
ProgramName: ndbd
ProcessID: 3666
TraceFile: /usr/local/mysql/ndb_2_trace.log.8
***EOM***

Node 3:

<none>

Node 4:

<none>
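
For anyone who wants to compare these entries across nodes, here is a minimal Python sketch that splits an error log into one record per entry. It assumes only what the excerpts above show: "Name: value" lines, an optional wrapped continuation line, and ***EOM*** as the terminator. The field list and the function name parse_error_log are my own and not part of any MySQL tool.

```python
# Field names as they appear in the excerpts above (assumed to be complete).
FIELDS = (
    "Date/Time", "Type of error", "Message", "Fault ID", "Problem data",
    "Object of reference", "ProgramName", "ProcessID", "TraceFile",
)


def parse_error_log(text):
    """Return a list of dicts, one per entry terminated by ***EOM***."""
    entries, current, last = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "***EOM***":
            if current:
                entries.append(current)
            current, last = {}, None
            continue
        if not line:
            continue
        name, sep, value = line.partition(":")
        if sep and name in FIELDS:
            # Start of a new field.
            current[name] = value.strip()
            last = name
        elif last is not None:
            # Wrapped line such as "(block: BACKUP)": append to the previous field.
            current[last] += " " + line
    return entries


sample = """\
Date/Time: Monday 2 May 2005 - 20:15:08
Type of error: assert
Message: Assertion, probably a programming error
Fault ID: 2301
Problem data: ArrayPool<T>::getPtr
Object of reference: ../../../../../ndb/src/kernel/vm/ArrayPool.hpp line: 350
(block: BACKUP)
ProgramName: ndbd
ProcessID: 13637
TraceFile: /usr/local/mysql/ndb_1_trace.log.4
***EOM***
"""

entries = parse_error_log(sample)
for entry in entries:
    print(entry["Date/Time"], "|", entry["Fault ID"], "|", entry["Problem data"])
```

Grouping the records by "Fault ID" and "Problem data" makes it easy to see that node 1 and node 2 reported the same assertion.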

Here's what the ndb_out files say:

Node 1:
Error handler shutting down system
Error handler shutdown completed - exiting

Node 3:
2005-05-02 20:15:23 [NDB] INFO     -- Received signal 11. Running error handler.

Node 2:
Ndb kernel is stuck in: Polling for Receive
Error handler shutting down system
Error handler shutdown completed - exiting

Node 4:
2005-05-02 20:00:04 [NDB] INFO     -- Received signal 11. Running error handler.

It looks like node 1 and node 2 died; then, with no node left alive in that node
group, the management server had to shut down node 3 and node 4.
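
To make the node-group reasoning concrete, here is a small Python sketch of the rule as I understand it: the cluster can keep running only while every node group has at least one live data node. The node-group layout is taken from the "show" output quoted further down; the function name is hypothetical, and the sketch deliberately ignores arbitration and network partitioning.

```python
def cluster_can_survive(node_groups, failed):
    """True if every node group still has at least one node that has not failed."""
    return all(
        any(node not in failed for node in members)
        for members in node_groups.values()
    )


# Layout from the "show" output: nodegroup 0 = nodes 1 and 2, nodegroup 1 = nodes 3 and 4.
node_groups = {0: {1, 2}, 1: {3, 4}}

print(cluster_can_survive(node_groups, {1}))      # one replica of group 0 lost
print(cluster_can_survive(node_groups, {1, 3}))   # one node lost from each group
print(cluster_can_survive(node_groups, {1, 2}))   # all of group 0 lost
```

Only the last case, where both node 1 and node 2 are gone, leaves a node group with no live member, which matches what the logs suggest happened here.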

The "object of reference" line in the error log mentions BACKUP. I began an 
ndbcluster BACKUP just 10 or 15 minutes prior to the crash (at 20:00). Could 
that have been the cause? If so, why?
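
To see how the backup lines up with the failures, here is a rough Python sketch that sorts the timestamps quoted above and prints each one as an offset from the backup start. It assumes the backup began at exactly 20:00:00 (I only know "at 20:00") and that the clocks on both hosts agree; the event labels are my own summaries of the log lines.

```python
from datetime import datetime

DAY = "2005-05-02 "
FMT = "%Y-%m-%d %H:%M:%S"

# (time of day, description), taken from the logs and notes in this message.
events = [
    ("20:15:08", "node 1: assert in ArrayPool<T>::getPtr (block: BACKUP)"),
    ("20:15:22", "node 2: assert in ArrayPool<T>::getPtr (block: BACKUP)"),
    ("20:15:23", "node 3: received signal 11"),
    ("20:00:04", "node 4: received signal 11"),
    ("20:00:00", "backup started (approximate)"),
]


def timeline(events, reference="20:00:00"):
    """Return (seconds after reference, time of day, description), sorted by time."""
    ref = datetime.strptime(DAY + reference, FMT)
    rows = []
    for clock, what in events:
        when = datetime.strptime(DAY + clock, FMT)
        rows.append((int((when - ref).total_seconds()), clock, what))
    return sorted(rows)


rows = timeline(events)
for offset, clock, what in rows:
    print(f"{clock}  +{offset:>4d}s  {what}")
```

Under those assumptions, node 4's signal 11 falls within seconds of the backup start, and the two BACKUP asserts follow about 15 minutes later.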

The previous backup ran (at 18:00) while one of the 4 nodes was offline.
When it finished I deleted the backup directories. Could either of these 
have caused some corruption?

-- Jim




--- Jim Hoadley <j_hoadley@stripped> wrote:
> After running for many days, my cluster crashed this 
> afternoon. Can someone help me understand the reason?
> 
> Any help would be greatly appreciated.
> 
> This was the sequence of events. Node 1 crashed while I
> was logged in with the mysql client. I restarted node 1,
> then both node 1 and node 3 (both on the same host) crashed.
> I restarted nodes 1 and 3 successfully and they're still running.
> 
> Here's what the error log for node1 says. Node 3 did not 
> have an error log. Please let me know which lines from the 
> trace logs are relevant and I'll post those too.
> 
>    Date/Time: Monday 2 May 2005 - 17:49:05
>    Type of error: error
>    Message: Internal program error (failed ndbrequire)
>    Fault ID: 2341
>    Problem data: DbtcMain.cpp
>    Object of reference: DBTC (Line: 12251) 0x0000000a
>    ProgramName: ndbd
>    ProcessID: 3669
>    TraceFile: /usr/local/mysql/ndb_1_trace.log.2
>    ***EOM***
> 
>    Date/Time: Monday 2 May 2005 - 17:52:14
>    Type of error: error
>    Message: Node failed during system restart
>    Fault ID: 2308
>    Problem data: Unhandled node failure of started node during restart
>    Object of reference: NDBCNTR (Line: 1417) 0x0000000a
>    ProgramName: ndbd
>    ProcessID: 13585
>    TraceFile: /usr/local/mysql/ndb_1_trace.log.3
>    ***EOM***
> 
> These are my specs.
> 
> 3-host cluster:
> 
>    host1 = node [1], node [3], API [6]
>    host2 = node [2], node [4], API [7]
>    host3 = mgm [5]
> 
> Each host has 6GB RAM and two 3.6GHz Xeons
> RedHat Enterprise Linux 3 with hugemem kernel
> 2.4.21-27.0.4.ELhugemem #1 SMP
> 
> Here's my config.ini:
> 
>    [ndbd default]
>    LockPagesInMainMemory=1
>    TransactionDeadlockDetectionTimeout=14000
>    NoOfReplicas= 2
>    MaxNoOfConcurrentOperations=131072
>    DataMemory= 1900M
>    IndexMemory= 400M
>    Diskless= 0
>    DataDir= /var/mysql-cluster
>    TimeBetweenWatchDogCheck=10000
>    HeartbeatIntervalDbDb=10000
>    HeartbeatIntervalDbApi=10000
>    NoOfFragmentLogFiles=64
>    
>    NoOfDiskPagesToDiskAfterRestartTUP=54   #40
>    NoOfDiskPagesToDiskAfterRestartACC=8    #20
>    
>    MaxNoOfAttributes = 2000                #1000
>    MaxNoOfOrderedIndexes = 5000            #128
>    MaxNoOfUniqueHashIndexes = 5000         #64
>     
>    [ndbd]
>    HostName= 10.0.1.199
>     
>    [ndbd]
>    HostName= 10.0.1.200
>  
>    [ndbd]
>    HostName= 10.0.1.199
>    
>    [ndbd]
>    HostName= 10.0.1.200
> 
>    [ndb_mgmd]
>    HostName= 10.0.1.198
>    PortNumber= 2200
>    
>    [mysqld]
>    
>    [mysqld]
>    [tcp default]
>    PortNumber= 2202
> 
> 
> And show:
> 
>    ndb_mgm> show
>    Connected to Management Server at: 10.0.1.198:2200
>    Cluster Configuration
>    ---------------------
>    [ndbd(NDB)]     4 node(s)
>    id=1    @10.0.1.199  (Version: 4.1.11, Nodegroup: 0)
>    id=2    @10.0.1.200  (Version: 4.1.11, Nodegroup: 0, Master)
>    id=3    @10.0.1.199  (Version: 4.1.11, Nodegroup: 1)
>    id=4    @10.0.1.200  (Version: 4.1.11, Nodegroup: 1)
>     
>    [ndb_mgmd(MGM)] 1 node(s)
>    id=5    @10.0.1.198  (Version: 4.1.11)
>  
>    [mysqld(API)]   2 node(s)
>    id=6    @10.0.1.199  (Version: 4.1.11)
>    id=7    @10.0.1.200  (Version: 4.1.11)
> 
> 
> Thanks in advance.
> 
> -- Jim
> 
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around 
> http://mail.yahoo.com 
> 
> -- 
> MySQL Cluster Mailing List
> For list archives: http://lists.mysql.com/cluster
> To unsubscribe:    http://lists.mysql.com/cluster?unsub=1
> 
> 
