From: Johan Andersson
Date: October 1 2010 7:05am
Subject: Re: MySQL Cluster -- Table Full
Hi,

First you need to investigate your environment!
Then you can tweak.

-j
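[Editorial note: Johan's "investigate your environment" advice in practice could look like the sketch below -- a minimal capture of memory (and optionally IO) state while the load test runs, so a GCP stop can be correlated with swapping or disk saturation. File names are arbitrary; iostat needs the sysstat package.]

```shell
#!/bin/sh
# Snapshot memory state while reproducing the crash; swap usage here
# would point at cause (1), swapping, from Johan's earlier list.
free -m | tee mem_snapshot.txt

# Optional, if the tools are installed:
# iostat -mx 1 10 | tee iostat_snapshot.txt   # %util / await per device
# vmstat 1 5 | tee vmstat_snapshot.txt        # si/so columns show swapping
```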

Aaron Weller // Crucial Paradigm wrote:
> Hi Johan,
>
> 1. I don't think it was swapping, as the server has 144GB RAM with 130GB 
> allocated to NDB and the rest for the OS. However, I did not check at the time.
> 2. No, no backups were being done.  The server was not yet in 
> production, and no backups were configured.
> 3. These are brand new Seagate 300GB 15k SAS (8 in RAID 10), so I'm 
> not sure if that would be the issue or not.
> 4. Hope not!
> Unfortunately I don't have the iostat or free -m output from the crash, 
> however the crashes happened very quickly (within a minute in most cases).
> Are there any settings we can tweak?  Should we be using multiple 
> smaller tablespace files rather than 1 large 500GB tablespace? Should 
> we be doing checkpoints less often?
>
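[Editorial note: on the multiple-datafile question above, splitting one large tablespace into several smaller files is done with repeated ADD DATAFILE statements. A sketch with hypothetical tablespace and file names (ts_1, data_2.dat, data_3.dat) and illustrative sizes, assuming the tablespace already exists:]

```sql
-- Each ADD DATAFILE is a separate statement against the NDB engine;
-- names and sizes here are hypothetical.
ALTER TABLESPACE ts_1
    ADD DATAFILE 'data_2.dat'
    INITIAL_SIZE 64G
    ENGINE NDBCLUSTER;

ALTER TABLESPACE ts_1
    ADD DATAFILE 'data_3.dat'
    INITIAL_SIZE 64G
    ENGINE NDBCLUSTER;
```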
> Thanks!
> Aaron
>
> Johan Andersson wrote:
>> GCP stops can happen because of:
>> 1) swapping
>> 2) OS backups / copying of large files
>> 3) slow disks / too few disks and too much IO (write cache enabled?)
>> 4) bugs
>>
>> What was the IO load (from iostat -mx 1) when the crash happened?
>> What does free -m show?
>>
>> BR
>> johan andersson
>>
>> Aaron Weller // Crucial Paradigm wrote:
>>> Hey Johan and Jonas,
>>>
>>> Thanks for your help so far!  I've been working with Karl on these 
>>> issues.
>>>
>>> We ended up taking your suggestions and did the following:
>>>
>>> * Restarted the data nodes with --initial on both nodes
>>> * Re-created tablespace using a single 500GB file
>>> * Re-created undo log using a single 50GB file
>>> * Set max rows to 650 million
>>>
>>> When we started restoring databases with anything larger than a few 
>>> thousand records, or when we used mysqlslap to hit the database with 
>>> a decent number of queries, it would crash after a few seconds and we 
>>> would get an error like this in the logs:
>>>
>>> ---------------------------------
>>> 2010-09-30 11:39:22 [ndbd] INFO     -- Node 4 killed this node 
>>> because GCP stop was detected
>>> 2010-09-30 11:39:22 [ndbd] INFO     -- NDBCNTR (Line: 274) 0x00000006
>>> 2010-09-30 11:39:22 [ndbd] INFO     -- Error handler shutting down 
>>> system
>>> 2010-09-30 11:39:22 [ndbd] INFO     -- Error handler shutdown 
>>> completed - exiting
>>> Time: Thursday 30 September 2010 - 11:39:22
>>> Status: Temporary error, restart node
>>> Message: System error, node killed during node restart by other node 
>>> (Internal error, programming error or missing error message, please 
>>> report a bug)
>>> Error: 2303
>>> Error data: Node 4 killed this node because GCP stop was detected
>>> Error object: NDBCNTR (Line: 274) 0x00000006
>>> Program: /usr/bin/ndbd
>>> Pid: 31222
>>> Version: mysql-5.1.47 ndb-7.1.5
>>> Trace: /var/lib/mysql-cluster/ndb_4_trace.log.7
>>> ***EOM***
>>>
>>> 2010-09-30 11:39:27 [ndbd] ALERT    -- Node 4: Forced node shutdown 
>>> completed. Caused by error 2303: 'System error, node killed during 
>>> node restart by other node(Internal error, programming error or 
>>> missing error message, please report a bug). Temporary error, 
>>> restart node'.
>>> ---------------------------------
>>>
>>> Doing some research I found this regarding the GCP stop errors:
>>> http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#mysql-cluster-ndbd-definition-gcp-stop-errors
>>>
>>> And I also found this:
>>> http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-diskiothreadpool
>>>
>>>
>>> Unfortunately, due to an extremely tight deadline and having to put 
>>> the server into production 6 hours ago, we had to revert to the more 
>>> stable config of not using a tablespace file. This however means we 
>>> are limited in the amount of data we can put in our tables.
>>> I have a few questions, if anyone can help?
>>>
>>> * Is there any command other than "all report mem" for checking the 
>>> actual amount of RAM being used by a table? We have noticed that we 
>>> receive "table full" errors when we have plenty of RAM left but no 
>>> tablespace; however, we don't know how to check when we are reaching 
>>> that point.
>>>
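[Editorial note: for the first question above, besides ndb_mgm's "all report mem" (which covers DataMemory/IndexMemory only), disk-data usage is visible from SQL via INFORMATION_SCHEMA.FILES. A sketch of the kind of query that shows how close the tablespace is to full:]

```sql
-- Extent accounting per NDB data file; used space grows toward
-- TOTAL_EXTENTS as the tablespace fills up.
SELECT FILE_NAME,
       TOTAL_EXTENTS,
       FREE_EXTENTS,
       (TOTAL_EXTENTS - FREE_EXTENTS) * EXTENT_SIZE / (1024 * 1024) AS used_mb
  FROM INFORMATION_SCHEMA.FILES
 WHERE FILE_TYPE = 'DATAFILE';
```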
>>> * Any ideas on how to fix the issue with NDB crashing when we add a 
>>> larger tablespace file, and without causing any downtime while doing 
>>> it? There are some notes about DiskIOThreadPool, however I'm not sure 
>>> what our ideal values would be for our dataset, etc.
>>>
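[Editorial note: for the second question, the DiskIOThreadPool link earlier in the message points at the disk-data IO parameters. A hedged sketch of config.ini settings commonly tuned when GCP stops appear under disk-data load -- every value below is an illustrative starting point, not a recommendation for this cluster:]

```ini
[NDBD DEFAULT]
# Illustrative values only -- must be validated against the real workload.
DiskIOThreadPool=8              # threads for tablespace/undo file IO
DiskPageBufferMemory=4G         # page cache for disk-data pages
RedoBuffer=64M                  # larger redo buffer for heavy writes
TimeBetweenEpochsTimeout=32000  # relax GCP stop detection (milliseconds)
```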
>>> Thanks!
>>> Aaron
>>>
>>> Karl Kloppenborg wrote:
>>>> Hi guys,
>>>>
>>>> In light of the continual problems we face, Aaron will be taking 
>>>> over this so I can continue testing...
>>>>
>>>> Thanks for helping guys!
>>>>
>>>> Cheers,
>>>> Karl Kloppenborg.
>>>>
>>>>
>>>> On 30/09/2010, at 17:31, Johan Andersson wrote:
>>>>
>>>>  
>>>>> Hi,
>>>>>
>>>>> You are running out of extents --> add a data file.
>>>>> Then, if you are going to load in a lot of records (>100M), you 
>>>>> need to set max_rows too.
>>>>> Underneath, max_rows makes the data nodes create more fragments. 
>>>>> With many rows, more fragments than the default are needed to 
>>>>> store the primary key hash index.
>>>>>
>>>>> BR
>>>>> johan
>>>>>
>>>>>
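[Editorial note: the max_rows step Johan describes, expressed as SQL against the table from this thread. The 650-million figure is the row target mentioned elsewhere in the thread; in MySQL 5.1 this ALTER likely copies the whole table, so expect a full rebuild:]

```sql
-- More fragments are created to hold the PK hash index for ~650M rows.
ALTER TABLE my_ndb_awesome_large_table MAX_ROWS = 650000000;
```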
>>>>> Karl Kloppenborg wrote:
>>>>>  
>>>>>> Hi Johan,
>>>>>>
>>>>>> Can you please explain that?
>>>>>>
>>>>>> I am a bit confused....
>>>>>>
>>>>>> Cheers,
>>>>>> Karl.
>>>>>> On 30/09/2010, at 16:51, Johan Andersson wrote:
>>>>>>
>>>>>>  
>>>>>>    
>>>>>>> Karl,
>>>>>>> Just to summarize:
>>>>>>> - You must do ALTER TS ADD DATAFILE _and_ set max_rows.
>>>>>>>
>>>>>>> BR
>>>>>>> johan
>>>>>>>
>>>>>>> Jonas Oreland wrote:
>>>>>>>         
>>>>>>>> On 09/30/10 07:27, Karl Kloppenborg wrote:
>>>>>>>>              
>>>>>>>>> Hey Jonas,
>>>>>>>>>
>>>>>>>>> Thanks for the reply, we will try to implement the max_rows 
>>>>>>>>> after the rebuild takes place. However I have a few questions:
>>>>>>>>>
>>>>>>>>> 1) after doing some writes and getting the table full I 
>>>>>>>>> executed show warnings:
>>>>>>>>>
>>>>>>>>> +-------+------+--------------------------------------------------------+
>>>>>>>>> | Level | Code | Message                                                |
>>>>>>>>> +-------+------+--------------------------------------------------------+
>>>>>>>>> | Error | 1296 | Got error 1601 'Out extents, tablespace full' from NDB |
>>>>>>>>> | Error | 1114 | The table 'my_ndb_awesome_large_table' is full         |
>>>>>>>>> +-------+------+--------------------------------------------------------+
>>>>>>>>> 2 rows in set (0.00 sec)
>>>>>>>>>                       
>>>>>>>> 1601 means data-on-disk, if I'm not mistaken.
>>>>>>>>
>>>>>>>> ndbd's tablespaces don't auto-grow.
>>>>>>>> "alter tablespace X add datafile Y initial_size=10G" or 
>>>>>>>> something should do it.
>>>>>>>>
>>>>>>>> /Jonas
>>>>>>>>
>>>>>>>>              
>>>>>>>>> Could you explain this?
>>>>>>>>>
>>>>>>>>> 2) I will try implementing max_rows..
>>>>>>>>>
>>>>>>>>> 3) we are not using ndbmtd, but each server has 16 cores / 
>>>>>>>>> 144GB. Should we?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Karl Kloppenborg.
>>>>>>>>>
>>>>>>>>> On 30/09/2010, at 15:19, Jonas Oreland wrote:
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>>> On 09/30/10 07:13, Karl Kloppenborg wrote:
>>>>>>>>>>                          
>>>>>>>>>>> Hi guys,
>>>>>>>>>>>
>>>>>>>>>>> We initially didn't, but we got this problem at 92 million 
>>>>>>>>>>> rows.
>>>>>>>>>>>
>>>>>>>>>>> After a lot of research we found a post that stated that 
>>>>>>>>>>> increasing the max rows might help; however, after reading 
>>>>>>>>>>> more on max_rows with NDB it was said that it is not used by 
>>>>>>>>>>> the NDBCluster engine and is ignored. Is this true?
>>>>>>>>>>>                                   
>>>>>>>>>> 1) after you get "table full", issue "show warnings"; this 
>>>>>>>>>> will show you the exact error code
>>>>>>>>>>
>>>>>>>>>> 2) maxrows *should* help
>>>>>>>>>>
>>>>>>>>>> 3) are you using ndbmtd ?
>>>>>>>>>>
>>>>>>>>>> /Jonas
>>>>>>>>>>
>>>>>>>>>>                          
>>>>>>>>>>> However, take note that we require it to hold 600 million 
>>>>>>>>>>> rows... challenge...
>>>>>>>>>>>
>>>>>>>>>>> I will also add my create table syntax to show you what 
>>>>>>>>>>> we're doing.
>>>>>>>>>>>
>>>>>>>>>>> CREATE TABLE `my_ndb_awesome_large_table` (
>>>>>>>>>>>   `user_id` int(4) NOT NULL,
>>>>>>>>>>>   `description` varchar(50) NOT NULL,
>>>>>>>>>>>   `type` varchar(64) NOT NULL,
>>>>>>>>>>>   `count` int(4) NOT NULL,
>>>>>>>>>>>   `after` int(3) NOT NULL,
>>>>>>>>>>>   `active` int(1) NOT NULL,
>>>>>>>>>>>   `lastactivity` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP 
>>>>>>>>>>>     ON UPDATE CURRENT_TIMESTAMP
>>>>>>>>>>> ) ENGINE=ndbcluster DEFAULT CHARSET=utf8;
>>>>>>>>>>>
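[Editorial note: a back-of-envelope check of whether 600 million such rows fit in DataMemory. The 300 bytes/row figure is purely an assumption -- the real per-row cost depends on NDB's fixed/variable-part layout and overhead; the bundled ndb_size.pl script gives a proper estimate from a populated copy of the schema.]

```shell
# Rough DataMemory estimate: rows * assumed bytes-per-row, in GiB.
ROWS=600000000
BYTES_PER_ROW=300   # assumption, not a measured value
echo "$ROWS $BYTES_PER_ROW" | awk '{printf "approx %.1f GB\n", $1*$2/1024^3}'
```

With these (assumed) numbers the result lands well above the configured DataMemory=131G, which is one way to see why a disk-data tablespace enters the picture at all.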
>>>>>>>>>>> Any thoughts on what this virtual "level" might be? Because, 
>>>>>>>>>>> as you can see in my last email, we have not run out of 
>>>>>>>>>>> index / data space.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 30/09/2010, at 14:59, Jonas Oreland wrote:
>>>>>>>>>>>
>>>>>>>>>>>                                
>>>>>>>>>>>> Are you using "maxrows" in your table definition?
>>>>>>>>>>>>
>>>>>>>>>>>> /Jonas
>>>>>>>>>>>>
>>>>>>>>>>>> On 09/30/10 06:15, Karl Kloppenborg wrote:
>>>>>>>>>>>>                                      
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> We have set up a MySQL cluster (pretty standard: two NDB 
>>>>>>>>>>>>> nodes + two management servers).
>>>>>>>>>>>>> 2x cluster processing nodes (running the MySQL NDB daemon 
>>>>>>>>>>>>> + MySQL server):
>>>>>>>>>>>>>   - 144GB RAM
>>>>>>>>>>>>>   - 8x 300GB SAS, RAID 10
>>>>>>>>>>>>>   - Data-Storage = 135GB RAM
>>>>>>>>>>>>>   - Index-Storage = 5GB RAM
>>>>>>>>>>>>> However, at 92 million rows in a table it is returning the 
>>>>>>>>>>>>> TableFull error.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My config is as follows:
>>>>>>>>>>>>> [NDBD DEFAULT]
>>>>>>>>>>>>> NoOfReplicas=2
>>>>>>>>>>>>> LockPagesInMainMemory=1
>>>>>>>>>>>>>
>>>>>>>>>>>>> DataMemory=131G
>>>>>>>>>>>>> IndexMemory=10G
>>>>>>>>>>>>>
>>>>>>>>>>>>> TimeBetweenLocalCheckpoints=6
>>>>>>>>>>>>> NoOfFragmentLogFiles=500
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [MYSQLD DEFAULT]
>>>>>>>>>>>>>
>>>>>>>>>>>>> [NDB_MGMD DEFAULT]
>>>>>>>>>>>>>
>>>>>>>>>>>>> [TCP DEFAULT]
>>>>>>>>>>>>> SendBufferMemory=8M
>>>>>>>>>>>>> ReceiveBufferMemory=8M
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Section for the cluster management node
>>>>>>>>>>>>> [NDB_MGMD]
>>>>>>>>>>>>> ID=1 #LB1 ID is 1
>>>>>>>>>>>>> Datadir=/var/lib/mysql-cluster
>>>>>>>>>>>>> HostName=#.#.#.# #PRIVATE IP OF LB1
>>>>>>>>>>>>>
>>>>>>>>>>>>> [NDB_MGMD]
>>>>>>>>>>>>> HostName=#.#.#.# #PRIVATE IP OF LB2
>>>>>>>>>>>>> ID=2 #ID of LB2 is 2
>>>>>>>>>>>>> Datadir=/var/lib/mysql-cluster
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Section for the storage nodes
>>>>>>>>>>>>> [NDBD]
>>>>>>>>>>>>> # IP address of the first storage node
>>>>>>>>>>>>> HostName=#.#.#.# # PRIVATE IP OF DB1
>>>>>>>>>>>>> DataDir=/var/lib/mysql-cluster
>>>>>>>>>>>>>
>>>>>>>>>>>>> [NDBD]
>>>>>>>>>>>>> # IP address of the second storage node
>>>>>>>>>>>>> HostName=#.#.#.# #PRIVATE IP OF DB2
>>>>>>>>>>>>> DataDir=/var/lib/mysql-cluster
>>>>>>>>>>>>>
>>>>>>>>>>>>> # one [MYSQLD] per storage node
>>>>>>>>>>>>> [MYSQLD]
>>>>>>>>>>>>> [MYSQLD]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can anyone please shed some light on this matter?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Karl Kloppenborg
>>>>>>>>>>>>> --
>>>>>>>>>>>>> MySQL Cluster Mailing List
>>>>>>>>>>>>> For list archives: http://lists.mysql.com/cluster
>>>>>>>>>>>>> To unsubscribe: http://lists.mysql.com/cluster?unsub=1
>>>>>>>>>>>>>                                      
>>>>
>>>>
>>>>   
>>>
>>>
>>
>>
>
