List: Cluster
From: Tomas Ulin
Date: October 10 2004 9:33am
Subject: Re: It's not going well....
Paul,

- You should not have a problem fitting your database into the cluster
if it is only 1.6GB; the problem lies somewhere else.
- The cluster.log will inform you of the memory usage (50% full, 60%
full, 70% full, etc.).
- If you want your data to be in the database when you restart, you have
to make sure of two things: do NOT run diskless, and do NOT start ndbd
with --initial (it will clean out your data files).
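
As a rough sanity check on the first point, here is a minimal sketch of the per-node arithmetic using the figures from the original post (1.6GB total, 700MB of it indexes, 2 replicas, 4 computers running 2 ndbd instances each). It assumes 1.6GB is about 1600MB and that NDB needs the same number of bytes as MyISAM, which is not exactly true since NDB has its own storage overhead, so treat the result as a lower bound rather than a sizing rule. The function name is made up for illustration.

```python
def per_node_mb(total_mb, replicas, data_nodes):
    """Approximate MB each data node must hold: every row is stored
    `replicas` times, spread evenly over `data_nodes` ndbd processes."""
    return total_mb * replicas / data_nodes

# Figures from the original post (1.6GB taken as 1600 MB, an assumption).
index_mb = 700
data_mb = 1600 - index_mb   # 900 MB of row data
replicas = 2
data_nodes = 4 * 2          # 4 computers, 2 ndbd instances each

index_per_node = per_node_mb(index_mb, replicas, data_nodes)
data_per_node = per_node_mb(data_mb, replicas, data_nodes)

print(f"IndexMemory needed per node: ~{index_per_node:.0f} MB")
print(f"DataMemory needed per node:  ~{data_per_node:.0f} MB")
```

Even with generous headroom for NDB overhead, these numbers sit well below the 768M IndexMemory and 1344M DataMemory mentioned in the original post, which is why the size of the database alone is unlikely to be the cause.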

Please send me your config file, cluster log, and the error logs of your
failed nodes.
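
When going through a cluster log, it helps to see which node dropped out first, since its error log and trace file are usually the ones to look at. The sketch below pulls the order of disconnects out of log lines in the format quoted further down in this thread. The regular expression and the function name are assumptions based only on those quoted lines, not on any documented log format.

```python
import re

# Matches lines such as:
# 2004-10-09 11:56:54 [MgmSrvr] ALERT    -- Node 9: Node 5 Disconnected
DISCONNECT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] ALERT\s+-- "
    r"Node \d+: Node (\d+) Disconnected"
)

def disconnect_order(lines):
    """Return (timestamp, node_id) for the first disconnect of each node,
    in the order the disconnects appear in the log."""
    seen_nodes = set()
    order = []
    for line in lines:
        match = DISCONNECT.match(line)
        if match is None:
            continue
        node_id = int(match.group(2))
        if node_id not in seen_nodes:
            seen_nodes.add(node_id)
            order.append((match.group(1), node_id))
    return order

# Sample lines copied from the log quoted in this thread.
sample = [
    "2004-10-09 11:56:54 [MgmSrvr] ALERT    -- Node 9: Node 5 Disconnected",
    "2004-10-09 11:56:54 [MgmSrvr] INFO     -- Lost connection to node 5",
    "2004-10-09 11:56:55 [MgmSrvr] ALERT    -- Node 9: Node 1 Disconnected",
    "2004-10-09 11:56:55 [MgmSrvr] INFO     -- Lost connection to node 1",
]

for timestamp, node_id in disconnect_order(sample):
    print(timestamp, "node", node_id)
```

To run it against a real file, pass the open file object instead of `sample`, for example `disconnect_order(open("cluster.log"))`, where the file name is whatever your management server writes to.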

T

Paul Gardner wrote:

>I'm running RH ES 3.  But I'm not even close to the 2GB per process
>limit, I'm at 1.3GB right now.
>
>Thanks everyone for their help, I'm going to try some more
>suggestions, and then give up if I get no joy.
>
>A MySQL sales rep told me that cluster goes to production in two weeks
>time - this surprises me - I thought 5.0 was still some way off?
>
>Cheers
>Paul
>
>
>On Sat, 9 Oct 2004 14:19:49 +0200, Eugen Leitl <eugen@stripped> wrote:
>  
>
>>On 32 bit architectures no process can use more than 2 GBytes of memory.
>>Are you running x86_64 Linux, or another 64 bit UNIX?
>>
>>
>>
>>On Sat, Oct 09, 2004 at 12:40:18PM +0100, Paul Gardner wrote:
>>    
>>
>>>Hi - desperately need some help on this, two real problems:
>>>
>>>1) I simply cannot seem to fit my database into the Cluster because I
>>>cannot set IndexMemory much higher than 768M and DataMemory much
>>>higher than 1344M.  My current db size on a running traditional MySQL
>>>is 1.6GB (700MB of which are indexes).  I'm running with 2 replicas
>>>and 4 physical computers for storage nodes - each computer running 2
>>>instances of ndbd.  Each computer has 8GB of RAM.  The storage nodes
>>>run nothing else apart from ndbd.
>>>
>>>Surely I am missing something here?  There must be a way forward to
>>>import more data?  Would more nodes fix it?  Running more instances of
>>>ndbd?  Clearly adding more RAM to the machine won't help, but I'm
>>>finding it hard to swallow that there is a limit to the database size
>>>that cluster can handle - MyISAM would eat this for breakfast.
>>>
>>>2) I suspect this problem relates to 1), but after importing about 95%
>>>of the database, I cannot import any more tables.  I get an error
>>>'4009'.
>>>
>>>ERROR 1005 (HY000) at line 1: Can't create table
>>>'./dvdrental/#sql-1f1f_3.frm' (errno: 4009)
>>>[root@cl-mn-2 tmp]# /usr/local/mysql/bin/perror --ndb 4009
>>>Error code 4009:  Cluster Failure: Unknown result: Unknown result error
>>>
>>>After this error occurs, the cluster shuts down.
>>>
>>>Desperate for any suggestions?
>>>
>>>Thanks
>>>Paul
>>>
>>>
>>>Output from all relevant log files are below (trace file was large, so
>>>not included):
>>>
>>>2004-10-09 11:56:54 [MgmSrvr] ALERT    -- Node 9: Node 5 Disconnected
>>>2004-10-09 11:56:54 [MgmSrvr] INFO     -- Lost connection to node 5
>>>2004-10-09 11:56:54 [MgmSrvr] ALERT    -- Node 1: Arbitration check
>>>won - node group majority
>>>2004-10-09 11:56:54 [MgmSrvr] INFO     -- Node 1: President restarts
>>>arbitration thread [state=6]
>>>2004-10-09 11:56:54 [MgmSrvr] INFO     -- Node 1: GCP Take over started
>>>2004-10-09 11:56:54 [MgmSrvr] INFO     -- Node 1: LCP Take over started
>>>2004-10-09 11:56:54 [MgmSrvr] INFO     -- Node 1: ParticipatingDIH =
>>>0000000000000000
>>>2004-10-09 11:56:54 [MgmSrvr] INFO     -- Node 1: ParticipatingLQH =
>>>0000000000000000
>>>2004-10-09 11:56:54 [MgmSrvr] INFO     -- Node 1:
>>>m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0
>>>0000000000000000]
>>>2004-10-09 11:56:54 [MgmSrvr] INFO     -- Node 1:
>>>m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=0
>>>0000000000000000]
>>>2004-10-09 11:56:54 [MgmSrvr] INFO     -- Node 1: m_LAST_LCP_FRAG_ORD
>>>= [SignalCounter: m_count=0 0000000000000000]
>>>2004-10-09 11:56:54 [MgmSrvr] INFO     -- Node 1:
>>>m_LCP_COMPLETE_REP_From_Master_Received = 1
>>>2004-10-09 11:56:54 [MgmSrvr] INFO     -- Node 1: GCP Take over completed
>>>2004-10-09 11:56:55 [MgmSrvr] ALERT    -- Node 9: Node 1 Disconnected
>>>2004-10-09 11:56:55 [MgmSrvr] INFO     -- Lost connection to node 1
>>>2004-10-09 11:56:55 [MgmSrvr] ALERT    -- Node 2: Arbitration check
>>>won - node group majority
>>>2004-10-09 11:56:55 [MgmSrvr] INFO     -- Node 2: President restarts
>>>arbitration thread [state=6]
>>>2004-10-09 11:56:55 [MgmSrvr] INFO     -- Node 2: GCP Take over started
>>>2004-10-09 11:56:55 [MgmSrvr] INFO     -- Node 2: LCP Take over started
>>>2004-10-09 11:56:55 [MgmSrvr] INFO     -- Node 2: ParticipatingDIH =
>>>0000000000000000
>>>2004-10-09 11:56:55 [MgmSrvr] INFO     -- Node 2: ParticipatingLQH =
>>>0000000000000000
>>>2004-10-09 11:56:55 [MgmSrvr] INFO     -- Node 2:
>>>m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0
>>>0000000000000000]
>>>2004-10-09 11:56:55 [MgmSrvr] INFO     -- Node 2:
>>>m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=0
>>>0000000000000000]
>>>2004-10-09 11:56:55 [MgmSrvr] INFO     -- Node 2: m_LAST_LCP_FRAG_ORD
>>>= [SignalCounter: m_count=0 0000000000000000]
>>>2004-10-09 11:56:55 [MgmSrvr] INFO     -- Node 2:
>>>m_LCP_COMPLETE_REP_From_Master_Received = 1
>>>2004-10-09 11:56:55 [MgmSrvr] ALERT    -- Node 9: Node 2 Disconnected
>>>2004-10-09 11:56:55 [MgmSrvr] INFO     -- Lost connection to node 2
>>>2004-10-09 11:56:56 [MgmSrvr] ALERT    -- Node 9: Node 6 Disconnected
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Lost connection to node 6
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Node 3: Possible bug in
>>>Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 4
>>>sig->failNo =
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Node 4: Possible bug in
>>>Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 4
>>>sig->failNo =
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Node 7: Possible bug in
>>>Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 4
>>>sig->failNo =
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Node 8: Possible bug in
>>>Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 4
>>>sig->failNo =
>>>2004-10-09 11:56:56 [MgmSrvr] ALERT    -- Node 9: Node 3 Disconnected
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Lost connection to node 3
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Node 7: Possible bug in
>>>Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 5
>>>sig->failNo =
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Node 4: Possible bug in
>>>Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 5
>>>sig->failNo =
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Node 8: Possible bug in
>>>Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 5
>>>sig->failNo =
>>>2004-10-09 11:56:56 [MgmSrvr] ALERT    -- Node 9: Node 7 Disconnected
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Lost connection to node 7
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Node 4: Possible bug in
>>>Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 6
>>>sig->failNo =
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Node 8: Possible bug in
>>>Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 6
>>>sig->failNo =
>>>2004-10-09 11:56:56 [MgmSrvr] ALERT    -- Node 9: Node 7 Disconnected
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Lost connection to node 7
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Node 4: Possible bug in
>>>Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 6
>>>sig->failNo =
>>>2004-10-09 11:56:56 [MgmSrvr] INFO     -- Node 8: Possible bug in
>>>Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 6
>>>sig->failNo =
>>>2004-10-09 11:56:57 [MgmSrvr] ALERT    -- Node 9: Node 4 Disconnected
>>>2004-10-09 11:56:57 [MgmSrvr] INFO     -- Lost connection to node 4
>>>2004-10-09 11:56:57 [MgmSrvr] ALERT    -- Node 9: Node 8 Disconnected
>>>2004-10-09 11:56:57 [MgmSrvr] INFO     -- Lost connection to node 8
>>>
>>>
>>>Date/Time: Saturday 9 October 2004 - 11:56:09
>>>Type of error: error
>>>Message: Internal program error (failed ndbrequire)
>>>Fault ID: 2341
>>>Problem data: Dbdict.cpp
>>>Object of reference: DBDICT (Line: 776) 0x0000000a
>>>ProgramName: NDB Kernel
>>>ProcessID: 8358
>>>TraceFile: ./ndb_5_trace.log.4
>>>***EOM***
>>>
>>>--
>>>MySQL Cluster Mailing List
>>>For list archives: http://lists.mysql.com/cluster
>>>To unsubscribe:    http://lists.mysql.com/cluster?unsub=1
>>>      
>>>
>>--
>>Eugen* Leitl <a href="http://leitl.org">leitl</a>
>>______________________________________________________________
>>ICBM: 48.07078, 11.61144            http://www.leitl.org
>>8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
>>http://moleculardevices.org         http://nanomachines.net
