List: Cluster
From: Aaron Weller // Crucial Paradigm
Date: October 1 2010 6:59am
Subject: Re: MySQL Cluster -- Table Full
Hi Johan,

1. I don't think it was swapping, as the server has 144GB RAM, with 130GB 
allocated to NDB and the rest left for the OS. However, I did not check at 
the time.
2. No, no backups were being done.  The server was not yet in 
production, and no backups were configured.
3. These are brand new Seagate 300GB 15k SAS (8 in RAID 10), so I'm not 
sure if that would be the issue or not.
4. Hope not! 

Unfortunately I don't have the iostat or free -m output from during the 
crash; however, the crashes happened very quickly (in under a minute in 
most cases).
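So the numbers exist next time, a minimal sketch of a logger that captures iostat and free output periodically (the log path and the fixed iteration count are placeholders; in production you would loop indefinitely, e.g. under nohup or as a service):

```shell
#!/bin/sh
# Sketch only: periodically capture IO and memory stats so the data
# is available after the next crash. Log path and loop count are
# placeholders, not a recommendation.
LOG=./ndb-io-monitor.log

for i in 1 2 3; do                        # production: while true; do ... done
    date >> "$LOG"
    free -m >> "$LOG" 2>&1                # memory and swap usage
    iostat -mx 1 1 >> "$LOG" 2>&1 || true # per-device IO (needs sysstat)
    sleep 1
done
```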

Are there any settings we can tweak? Should we be using multiple smaller 
tablespace files rather than one large 500GB file? Should we be doing 
checkpoints less often?
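For reference, splitting a tablespace into several smaller datafiles might look like the following sketch. The logfile group, tablespace, and file names, and all sizes, are illustrative only, not the config from this thread:

```sql
-- Sketch only: several smaller datafiles instead of one 500GB file.
-- Names (lg_1, ts_1, data_*.dat) and sizes are hypothetical.
CREATE LOGFILE GROUP lg_1
    ADD UNDOFILE 'undo_1.log'
    INITIAL_SIZE 10G
    ENGINE NDBCLUSTER;

CREATE TABLESPACE ts_1
    ADD DATAFILE 'data_1.dat'
    USE LOGFILE GROUP lg_1
    INITIAL_SIZE 50G
    ENGINE NDBCLUSTER;

-- Additional smaller files can then be added one at a time:
ALTER TABLESPACE ts_1
    ADD DATAFILE 'data_2.dat'
    INITIAL_SIZE 50G
    ENGINE NDBCLUSTER;
```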

Thanks!
Aaron

Johan Andersson wrote:
> GCP stops can happen because of:
> 1) swapping
> 2) OS backups / copying of large files
> 3) slow disks / too few disks and too much IO (write cache enabled?)
> 4) bugs
>
> What was the IO load (from iostat -mx 1) when the crash happened?
> What does free -m show?
>
> BR
> johan andersson
>
> Aaron Weller // Crucial Paradigm wrote:
>> Hey Johan and Jonas,
>>
>> Thanks for your help so far!  I've been working with Karl on these 
>> issues.
>>
>> We ended up taking your suggestions and did the following:
>>
>> * Performed an initial start (ndbd --initial) on both nodes
>> * Re-created tablespace using a single 500GB file
>> * Re-created undo log using a single 50GB file
>> * Set max rows to 650 million
>>
>> When we started restoring databases with anything larger than a few 
>> thousand records, or when we used mysqlslap to hit the database with 
>> a decent number of queries, it would crash after a few seconds and we 
>> would get an error like this in the logs:
>>
>> ---------------------------------
>> 2010-09-30 11:39:22 [ndbd] INFO     -- Node 4 killed this node 
>> because GCP stop was detected
>> 2010-09-30 11:39:22 [ndbd] INFO     -- NDBCNTR (Line: 274) 0x00000006
>> 2010-09-30 11:39:22 [ndbd] INFO     -- Error handler shutting down 
>> system
>> 2010-09-30 11:39:22 [ndbd] INFO     -- Error handler shutdown 
>> completed - exiting
>> Time: Thursday 30 September 2010 - 11:39:22
>> Status: Temporary error, restart node
>> Message: System error, node killed during node restart by other node 
>> (Internal error, programming error or missing error message, please 
>> report a bug)
>> Error: 2303
>> Error data: Node 4 killed this node because GCP stop was detected
>> Error object: NDBCNTR (Line: 274) 0x00000006
>> Program: /usr/bin/ndbd
>> Pid: 31222
>> Version: mysql-5.1.47 ndb-7.1.5
>> Trace: /var/lib/mysql-cluster/ndb_4_trace.log.7
>> ***EOM***
>>
>> 2010-09-30 11:39:27 [ndbd] ALERT    -- Node 4: Forced node shutdown 
>> completed. Caused by error 2303: 'System error, node killed during 
>> node restart by other node(Internal error, programming error or 
>> missing error message, please report a bug). Temporary error, restart 
>> node'.
>> ---------------------------------
>>
>> Doing some research I found this regarding the GCP stop errors:
>> http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#mysql-cluster-ndbd-definition-gcp-stop-errors
>>
>> And I also found this:
>> http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-diskiothreadpool
>>
>> Unfortunately, due to an extremely tight deadline and having to put 
>> the server into production 6 hours ago, we had to revert to the more 
>> stable config of not using a tablespace file. This, however, means we 
>> are limited in the amount of data we can put in our tables.
>> I have a few questions, if anyone can help?
>>
>> * Is there any command other than "all report mem" for checking the 
>> actual amount of RAM being used by a table? We have noticed that we 
>> receive "table full" errors while we still have plenty of RAM left but 
>> no tablespace remaining. However, we don't know how to check when we 
>> are reaching that point.
>>
>> * Any ideas on how to fix the issue with NDB crashing when we add a 
>> larger tablespace file, and without causing any downtime while doing 
>> it? The documentation mentions DiskIOThreadPool, but I'm not sure what 
>> the ideal values would be for our dataset.
>>
>> Thanks!
>> Aaron
>>
>> Karl Kloppenborg wrote:
>>> Hi guys,
>>>
>>> In light of the continual problems we face, Aaron will be taking 
>>> over this so I can continue testing...
>>>
>>> Thanks for helping guys!
>>>
>>> Cheers,
>>> Karl Kloppenborg.
>>>
>>>
>>> On 30/09/2010, at 17:31, Johan Andersson wrote:
>>>
>>>> Hi,
>>>>
>>>> You are running out of extents --> add a data file.
>>>> Then, if you are going to load in a lot of records (>100M), you 
>>>> need to set MAX_ROWS too.
>>>> Underneath, MAX_ROWS makes the data nodes create more fragments.
>>>> With many rows, more fragments than the default are needed to store 
>>>> the primary key hash index.
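[A minimal sketch of what setting that hint looks like in SQL; the table name and row count are taken from elsewhere in this thread, purely for illustration:]

```sql
-- Sketch only: MAX_ROWS is a hint that makes NDB create more fragments;
-- it is not a hard cap on the row count.
ALTER TABLE my_ndb_awesome_large_table MAX_ROWS = 650000000;
```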
>>>>
>>>> BR
>>>> johan
>>>>
>>>>
>>>> Karl Kloppenborg wrote:
>>>>> Hi Johan,
>>>>>
>>>>> Can you please explain that?
>>>>>
>>>>> I am a bit confused....
>>>>>
>>>>> Cheers,
>>>>> Karl.
>>>>> On 30/09/2010, at 16:51, Johan Andersson wrote:
>>>>>
>>>>>> Karl,
>>>>>> Just to summarize:
>>>>>> - You must do ALTER TS ADD DATAFILE _and_ set max_rows.
>>>>>>
>>>>>> BR
>>>>>> johan
>>>>>>
>>>>>> Jonas Oreland wrote:
>>>>>>> On 09/30/10 07:27, Karl Kloppenborg wrote:
>>>>>>>> Hey Jonas,
>>>>>>>>
>>>>>>>> Thanks for the reply; we will try to implement MAX_ROWS after 
>>>>>>>> the rebuild takes place. However, I have a few questions:
>>>>>>>>
>>>>>>>> 1) After doing some writes and getting the table full, I 
>>>>>>>> executed SHOW WARNINGS:
>>>>>>>>
>>>>>>>> +-------+------+--------------------------------------------------------+
>>>>>>>> | Level | Code | Message                                                |
>>>>>>>> +-------+------+--------------------------------------------------------+
>>>>>>>> | Error | 1296 | Got error 1601 'Out extents, tablespace full' from NDB |
>>>>>>>> | Error | 1114 | The table 'my_ndb_awesome_large_table' is full         |
>>>>>>>> +-------+------+--------------------------------------------------------+
>>>>>>>> 2 rows in set (0.00 sec)
>>>>>>> 1601 means data-on-disk if I'm not mistaken.
>>>>>>>
>>>>>>> ndbd's tablespaces don't auto-grow.
>>>>>>> "alter tablespace X add datafile Y initial_size=10G" or 
>>>>>>> something should do it.
>>>>>>>
>>>>>>> /Jonas
>>>>>>>
>>>>>>>> Could you explain this?
>>>>>>>>
>>>>>>>> 2) I will try implementing max_rows..
>>>>>>>>
>>>>>>>> 3) we are not using ndbmtd, but each server has 16 cores / 
>>>>>>>> 144GB; should we?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Karl Kloppenborg.
>>>>>>>>
>>>>>>>> On 30/09/2010, at 15:19, Jonas Oreland wrote:
>>>>>>>>
>>>>>>>>> On 09/30/10 07:13, Karl Kloppenborg wrote:
>>>>>>>>>> Hi guys,
>>>>>>>>>>
>>>>>>>>>> We initially didn't, but we got this problem at 92 million rows.
>>>>>>>>>>
>>>>>>>>>> After a lot of research we found a post stating that increasing 
>>>>>>>>>> MAX_ROWS might help; however, after reading more about MAX_ROWS 
>>>>>>>>>> with NDB, it appeared that it is not used by the NDBCLUSTER 
>>>>>>>>>> engine and is ignored. Is this true?
>>>>>>>>> 1) after you get "table full", issue "show warnings"; this 
>>>>>>>>> will show you the exact error code
>>>>>>>>>
>>>>>>>>> 2) maxrows *should* help
>>>>>>>>>
>>>>>>>>> 3) are you using ndbmtd ?
>>>>>>>>>
>>>>>>>>> /Jonas
>>>>>>>>>
>>>>>>>>>> However, take note that we require it to hold 600 million 
>>>>>>>>>> rows... challenge..
>>>>>>>>>>
>>>>>>>>>> I will also add my CREATE TABLE syntax to show you what we're 
>>>>>>>>>> doing.
>>>>>>>>>>
>>>>>>>>>> CREATE TABLE `my_ndb_awesome_large_table` (
>>>>>>>>>>   `user_id` int(4) NOT NULL,
>>>>>>>>>>   `description` varchar(50) NOT NULL,
>>>>>>>>>>   `type` varchar(64) NOT NULL,
>>>>>>>>>>   `count` int(4) NOT NULL,
>>>>>>>>>>   `after` int(3) NOT NULL,
>>>>>>>>>>   `active` int(1) NOT NULL,
>>>>>>>>>>   `lastactivity` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP 
>>>>>>>>>>     ON UPDATE CURRENT_TIMESTAMP
>>>>>>>>>> ) ENGINE=ndbcluster DEFAULT CHARSET=utf8
>>>>>>>>>>
>>>>>>>>>> Any thoughts on what this virtual "level" might be? Because, 
>>>>>>>>>> as you can see in my last email, we have not run out of 
>>>>>>>>>> index / data space.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 30/09/2010, at 14:59, Jonas Oreland wrote:
>>>>>>>>>>
>>>>>>>>>>> Are you using "maxrows" in your table definition?
>>>>>>>>>>>
>>>>>>>>>>> /Jonas
>>>>>>>>>>>
>>>>>>>>>>> On 09/30/10 06:15, Karl Kloppenborg wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> We have set up a MySQL cluster (pretty standard: two NDB 
>>>>>>>>>>>> nodes + two management servers).
>>>>>>>>>>>> 2x cluster processing nodes (running the MySQL NDB daemon + 
>>>>>>>>>>>> MySQL server):
>>>>>>>>>>>>   - 144GB RAM
>>>>>>>>>>>>   - 8x 300GB SAS in RAID 10
>>>>>>>>>>>>   - Data-Storage = 135GB RAM
>>>>>>>>>>>>   - Index-Storage = 5GB RAM
>>>>>>>>>>>> However, at 92 million rows in a table it is returning the 
>>>>>>>>>>>> TableFull error.
>>>>>>>>>>>>
>>>>>>>>>>>> My config is as follows:
>>>>>>>>>>>> [NDBD DEFAULT]
>>>>>>>>>>>> NoOfReplicas=2
>>>>>>>>>>>> LockPagesInMainMemory=1
>>>>>>>>>>>>
>>>>>>>>>>>> DataMemory=131G
>>>>>>>>>>>> IndexMemory=10G
>>>>>>>>>>>>
>>>>>>>>>>>> TimeBetweenLocalCheckpoints=6
>>>>>>>>>>>> NoOfFragmentLogFiles=500
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [MYSQLD DEFAULT]
>>>>>>>>>>>>
>>>>>>>>>>>> [NDB_MGMD DEFAULT]
>>>>>>>>>>>>
>>>>>>>>>>>> [TCP DEFAULT]
>>>>>>>>>>>> SendBufferMemory=8M
>>>>>>>>>>>> ReceiveBufferMemory=8M
>>>>>>>>>>>>
>>>>>>>>>>>> # Section for the cluster management node
>>>>>>>>>>>> [NDB_MGMD]
>>>>>>>>>>>> ID=1 #LB1 ID is 1
>>>>>>>>>>>> Datadir=/var/lib/mysql-cluster
>>>>>>>>>>>> HostName=#.#.#.# #PRIVATE IP OF LB1
>>>>>>>>>>>>
>>>>>>>>>>>> [NDB_MGMD]
>>>>>>>>>>>> HostName=#.#.#.# #PRIVATE IP OF LB2
>>>>>>>>>>>> ID=2 #ID of LB2 is 2
>>>>>>>>>>>> Datadir=/var/lib/mysql-cluster
>>>>>>>>>>>>
>>>>>>>>>>>> # Section for the storage nodes
>>>>>>>>>>>> [NDBD]
>>>>>>>>>>>> # IP address of the first storage node
>>>>>>>>>>>> HostName=#.#.#.# # PRIVATE IP OF DB1
>>>>>>>>>>>> DataDir=/var/lib/mysql-cluster
>>>>>>>>>>>>
>>>>>>>>>>>> [NDBD]
>>>>>>>>>>>> # IP address of the second storage node
>>>>>>>>>>>> HostName=#.#.#.# #PRIVATE IP OF DB2
>>>>>>>>>>>> DataDir=/var/lib/mysql-cluster
>>>>>>>>>>>>
>>>>>>>>>>>> # one [MYSQLD] per storage node
>>>>>>>>>>>> [MYSQLD]
>>>>>>>>>>>> [MYSQLD]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Can anyone please shed some light on this matter?
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Karl Kloppenborg
>>>>>>>>>>>> -- 
>>>>>>>>>>>> MySQL Cluster Mailing List
>>>>>>>>>>>> For list archives: http://lists.mysql.com/cluster
>>>>>>>>>>>> To unsubscribe: http://lists.mysql.com/cluster?unsub=1
>>>>>>>>>>>>
