List: Cluster
From: Mikael Ronström
Date: April 4 2005 9:44am
Subject: Re: error on loading data
Hi Jim,
Great to see your progress. The questions you are coming up with now
are usually the last ones on the road towards a production
configuration.

The problem you're facing is that the REDO log gets filled up before 3
checkpoints have completed. The default value of NoOfFragmentLogFiles
is 8, which means 8 * 4 * 16 MBytes of REDO log files = 512 MBytes.
The speed of a local checkpoint is controlled by the parameters

Quote from the manual:

  • [NDBD] NoOfDiskPagesToDiskAfterRestartTUP

    When executing a local checkpoint the algorithm flushes all data
    pages to disk. Merely doing so as quickly as possible without any
    moderation is likely to impose excessive loads on processors,
    networks, and disks. To control the write speed, this parameter
    specifies how many pages per 100 milliseconds are to be written.
    In this context, a "page" is defined as 8KB; thus, this parameter
    is specified in units of 80KB per second. Therefore, setting
    NoOfDiskPagesToDiskAfterRestartTUP to a value of 20 means writing
    1.6MB of data pages to disk each second during a local checkpoint.
    This value includes the writing of UNDO log records for data
    pages; that is, this parameter handles the limitation of writes
    from data memory. UNDO log records for index pages are handled by
    the parameter NoOfDiskPagesToDiskAfterRestartACC. (See the entry
    for IndexMemory for information about index pages.)

    In short, this parameter specifies how quickly local checkpoints
    will be executed, and operates in conjunction with
    NoOfFragmentLogFiles, DataMemory, and IndexMemory.

    The default value is 40 (3.2MB of data pages per second).

  • [NDBD] NoOfDiskPagesToDiskAfterRestartACC

    This parameter uses the same units as
    NoOfDiskPagesToDiskAfterRestartTUP and acts in a similar fashion,
    but limits the speed of writing index pages from index memory.

    The default value of this parameter is 20 index memory pages per
    second (1.6MB per second).

So in your configuration the time to checkpoint the ACC index memory
data is around 61M / 1.6M = 38 seconds. That should not be a problem:
2 minutes should not fill up the 512 MBytes of REDO log files (that
would require 4.7 MBytes of log generated per second; the REDO log has
to keep 3 checkpoints).

The DataMemory should take around 639M / 3.2M = 200 seconds to
checkpoint. Thus the REDO log has to keep 600 seconds of data. A good
guess is then that you get this error before 10 minutes have passed.
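The arithmetic above can be sketched in a few lines (a rough check only; the 61M/639M figures come from Jim's calculation below, and the write speeds are the defaults quoted from the manual):

```python
# Rough check of the checkpoint timing above. Uses decimal megabytes,
# like the figures in the mail; memory sizes are from Jim's calculation.
MB = 1_000_000

acc_used = 61 * MB            # IndexMemory in use when the load fails
tup_used = 639 * MB           # DataMemory in use at that point

acc_speed = 20 * 80_000       # default ACC setting: 20 units of 80 kB/s
tup_speed = 40 * 80_000       # default TUP setting: 40 units of 80 kB/s

acc_time = acc_used / acc_speed    # time to checkpoint index memory
tup_time = tup_used / tup_speed    # time to checkpoint data memory

redo_size = 8 * 4 * 16 * MB        # NoOfFragmentLogFiles=8 -> 512 MB
window = 3 * tup_time              # REDO log must span 3 checkpoints

print(round(acc_time), "s ACC,", round(tup_time), "s TUP,",
      round(window), "s REDO window")
```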

So what should your parameters be?
It depends on how much disk and CPU you want to spend on checkpointing
during operation compared to how fast you want a system restart to be.
The slower your checkpoints are, the more REDO log has to be processed
at a system restart.

My rule of thumb is to set a checkpoint to take around 5 minutes. In
your case, with 1276 MBytes of DataMemory, this would require setting
NoOfDiskPagesToDiskAfterRestartTUP to (1276 MBytes / 300 seconds) / 80
kBytes per second = 53.16 => 54 (4.25 MBytes per second)
NoOfDiskPagesToDiskAfterRestartACC to (175 MBytes / 300 seconds) / 80
kBytes per second = 7.29 => 8 (640 kBytes per second)

In this case the total load on the disk per node would be 4.89 MBytes
per second for writing checkpoints.
If you allow checkpoints to take 10 minutes instead, the checkpoint
speed is halved (27 and 4 => 2.12 MBytes + 320 kBytes = 2.44 MBytes
per second).
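A short sketch of that calculation (the 1276M/175M figures are from Jim's mail; one unit of either parameter is 80 kBytes per second, as quoted from the manual above):

```python
# Pick the smallest parameter value that checkpoints a given amount of
# memory within the target checkpoint time. One unit = 80 kB/s.
import math

MB = 1_000_000
PAGE_UNIT = 80_000            # bytes per second per parameter unit

def pages_setting(memory_bytes, checkpoint_seconds):
    return math.ceil(memory_bytes / checkpoint_seconds / PAGE_UNIT)

# 5-minute (300 s) checkpoints:
tup = pages_setting(1276 * MB, 300)  # NoOfDiskPagesToDiskAfterRestartTUP
acc = pages_setting(175 * MB, 300)   # NoOfDiskPagesToDiskAfterRestartACC
print(tup, acc)   # 54 8, matching the values above
```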

When setting NoOfFragmentLogFiles I usually use a rule of thumb of
allowing for 6 checkpoints, to have a good safety margin (allocating
twice as much disk space for REDO logs usually isn't a big deal).

Using 5 minutes per checkpoint, this means that the REDO log has to
keep 30 minutes of REDO logs. How much REDO log is produced during 30
minutes is very dependent on the application.
There are 72 bytes of overhead per delete/update/insert of one record,
plus an additional overhead of 4 bytes per field changed, and the
primary key is also stored in the REDO log. Thus the REDO log record
size is = 72 + 4 * no_of_fields_updated + size_of_fields_updated +
size_of_PK.

As an example, an insert in the dog table mentioned in one of your
previous mails would consume
72 + 4 * 5 + (4 + 24 + 52 + 256 + 12) + 4 = 444 bytes
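That formula is easy to check mechanically (the field and primary-key sizes below are taken from the dog-table example, not verified against the actual table definition):

```python
# REDO log record size, per the formula above:
# 72 + 4 * no_of_fields_updated + size_of_fields_updated + size_of_PK
def redo_record_size(field_sizes, pk_size):
    return 72 + 4 * len(field_sizes) + sum(field_sizes) + pk_size

dog_fields = [4, 24, 52, 256, 12]        # sizes from the dog-table insert
print(redo_record_size(dog_fields, 4))   # 444, matching the example
```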

Usually at max speed the ndbd nodes can write around 5-15 MBytes of
REDO log per second. It is not likely that you come close to this
speed; I would guess that you are more in the range of 2 MBytes per
second. Thus setting NoOfFragmentLogFiles to 64 would handle this
(= 4 GBytes of REDO log). Actually, it is pretty likely that a smaller
figure would do as well, but I would use some margin to avoid the 410
problem.
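Putting the sizing rule together (the 2 MBytes/second REDO rate is a guess for this workload, not a measured figure):

```python
# Size NoOfFragmentLogFiles: the REDO log must span 6 checkpoints of
# 5 minutes each at the guessed REDO generation rate.
import math

MB = 1_000_000
FILE_UNIT = 4 * 16 * MB        # each NoOfFragmentLogFiles unit is 64 MB

window = 6 * 300               # 6 checkpoints * 300 s = 30 minutes
redo_rate = 2 * MB             # guessed REDO log rate, bytes/second

needed = window * redo_rate    # 3.6 GBytes of REDO log needed
files = math.ceil(needed / FILE_UNIT)
print(files)   # 57 -> rounded up to 64 in the mail for extra margin
```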

Rgrds Mikael

2005-04-04 kl. 04.12 skrev Jim Hoadley:

> Reading the documentation and various posts in this list (thank you
> Michael and Pekka especially), I have calculated the size of my
> database as
> data size=1276M and index size=175M (details available upon request).
>
> Using these formulas, and given that I have 2 replicas and 2 data
> nodes, I arrived at these settings for config.ini:
>
> DataMemory = (data size + (for each table (number of records *
> ordered indexes * 10)) * NoOfReplicas) / number of data nodes,
> i.e. (Size * 1.1) * NoOfReplicas / NoOfDataNodes
>
> IndexMemory = (for each table (for each primary or unique key (size
> of attribute + 25))) * NoOfReplicas / number of data nodes
>
> DataMemory=1404M
> IndexMemory=193M
>
> While loading my data (using 'mysql test < db.dump'), the process stops
> midway (on table 8 of 20). I've calculated, and at that point, only 
> 639M
> of DataMemory and 61M of IndexMemory should be required, therefore it's
> unlikely that the error is caused by having either DataMemory or
> IndexMemory set too low.
>
> In fact, the error received is "ERROR 1297 (HY000) at line 2777438: Got
> temporary error 410 'REDO log buffers overloaded, consult online manual
> (increase RedoBuffer, and|or decrease TimeBetweenLocalCheckpoints, 
> and|or
> increase NoOfFragmentLogFiles)' from ndbcluster". I have RedoBuffer,
> TimeBetweenLocalCheckpoints and NoOfFragmentLogFiles set to their
> defaults.
>
> What should I set these to?
>
> I'm running RHEL 3 (2.4.21-27.0.2.ELsmp) on 2 Dell PowerEdge 1850s
> each w/6GB RAM, and here's my current config.ini:
>
> #----------------------------
>
> [ndbd default]
> NoOfReplicas= 2
> MaxNoOfConcurrentOperations=131072
> DataMemory= 1404M
> IndexMemory= 193M
> Diskless= 0
> DataDir= /var/mysql-cluster
> TimeBetweenWatchDogCheck=10000
> HeartbeatIntervalDbDb=10000
> HeartbeatIntervalDbApi=10000
> MaxNoOfAttributes = 2000
> MaxNoOfOrderedIndexes = 5000
> MaxNoOfUniqueHashIndexes = 5000
>
> [ndbd]
> HostName= 10.0.1.199
>
> [ndbd]
> HostName= 10.0.1.200
>
> [ndb_mgmd]
> HostName= 10.0.1.198
> PortNumber= 2200
>
> [mysqld]
>
> [mysqld]
>
> [tcp default]
> PortNumber= 2202
>
> #----------------------------
>
> Any help would be appreciated. Thanks.
>
> -- Jim Hoadley
>     Sr Software Eng
>     Dealer Fusion Inc
>
>
>
> --
> MySQL Cluster Mailing List
> For list archives: http://lists.mysql.com/cluster
>
Mikael Ronström, Senior Software Architect
MySQL AB, www.mysql.com

Jumpstart your cluster:
http://www.mysql.com/consulting/packaged/cluster.html

