Hi Jim,
Great to see your progress. The questions you are coming up with now
are usually the last ones on the road to a production configuration.
The problem you're facing is that the REDO log fills up before 3
checkpoints have completed. The default value of NoOfFragmentLogFiles
is 8, which means 8 * 4 * 16 MBytes = 512 MBytes of REDO log files.
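As a quick check of that arithmetic, here is a minimal sketch (the
function name is made up for illustration; the 4 files of 16 MBytes per
fragment log file are the figures used above):

```python
# Sketch: total REDO log size for a given NoOfFragmentLogFiles.
# Each unit of NoOfFragmentLogFiles is a set of 4 files of 16 MBytes.
FILES_PER_SET = 4
FILE_SIZE_MB = 16


def redo_log_size_mb(no_of_fragment_log_files):
    """Return the total REDO log size in MBytes."""
    return no_of_fragment_log_files * FILES_PER_SET * FILE_SIZE_MB


print(redo_log_size_mb(8))  # default setting: 512
```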
The speed of a local checkpoint is controlled by the following two
parameters. Quote from the manual:
• [NDBD]NoOfDiskPagesToDiskAfterRestartTUP
When executing a local checkpoint the algorithm flushes all data pages
to disk. Merely doing so as quickly as possible without any moderation is
likely to impose excessive loads on processors, networks, and disks.
To control the write speed, this parameter specifies how many pages
per 100 milliseconds are to be written. In this context, a "page" is
defined as 8KB; thus, this parameter is specified in units of 80KB per
second. Therefore, setting NoOfDiskPagesToDiskAfterRestartTUP to a
value of 20 means writing 1.6MB of data pages to disk each second
during a local checkpoint. This value includes the writing of UNDO log
records for data pages; that is, this parameter handles the limitation
of writes from data memory. UNDO log records for index pages are
handled by the parameter NoOfDiskPagesToDiskAfterRestartACC. (See the
entry for IndexMemory for information about index pages.)
In short, this parameter specifies how quickly local checkpoints will
be executed, and operates in conjunction with NoOfFragmentLogFiles,
DataMemory, and IndexMemory.
The default value is 40 (3.2MB of data pages per second).
• [NDBD]NoOfDiskPagesToDiskAfterRestartACC
This parameter uses the same units as
NoOfDiskPagesToDiskAfterRestartTUP and acts in a similar fashion, but
limits the speed of writing index pages from index memory.
The default value of this parameter is 20 index memory pages per
100 milliseconds (1.6MB per second).
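To make the units concrete, a small sketch of the conversion from a
parameter value to a write rate (the helper name is hypothetical; it
uses 8KB pages, 100 ms ticks and 1MB = 1000KB, as the manual text does):

```python
# Sketch: convert a NoOfDiskPagesToDiskAfterRestart* value into KB/second.
PAGE_KB = 8            # one "page" is 8KB
TICKS_PER_SECOND = 10  # the parameter counts pages per 100 milliseconds


def write_rate_kb_per_sec(pages_per_100ms):
    """Return the checkpoint write rate in KB per second."""
    return pages_per_100ms * PAGE_KB * TICKS_PER_SECOND


print(write_rate_kb_per_sec(40))  # TUP default: 3200 (3.2MB per second)
print(write_rate_kb_per_sec(20))  # ACC default: 1600 (1.6MB per second)
```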
So in your configuration the time to checkpoint the ACC index memory
data is around 61M / 1.6M per second = 38 seconds. That should not be a
problem: the REDO log has to keep 3 checkpoints, and 3 * 38 seconds is
about 2 minutes, which should not fill up the 512 MBytes of REDO log
files (that would require 4.7 MByte of log generated per second).
The DataMemory should take around 639M / 3.2M per second = 200
seconds. Thus the REDO log has to keep 600 seconds of data (3
checkpoints). A good guess is that you get this error before 10 minutes
have passed.
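The estimate above can be redone as a rough sketch. The variable and
function names are made up; the last figure (the REDO write rate that
fills 512 MBytes within 3 DataMemory checkpoints) is derived here and is
only an approximation:

```python
# Sketch: checkpoint duration at the default write rates, and the REDO
# write rate that would fill the default 512 MBytes within 3 checkpoints.
def checkpoint_seconds(memory_mb, rate_mb_per_sec):
    """Time to write memory_mb of pages at rate_mb_per_sec."""
    return memory_mb / rate_mb_per_sec


acc_seconds = checkpoint_seconds(61, 1.6)    # index memory, about 38 s
tup_seconds = checkpoint_seconds(639, 3.2)   # data memory, about 200 s

retention_seconds = 3 * tup_seconds          # about 600 s of REDO log
redo_log_mb = 512                            # default NoOfFragmentLogFiles=8
fill_rate = redo_log_mb / retention_seconds  # about 0.85 MBytes per second

print(f"ACC checkpoint: {acc_seconds:.1f} s")
print(f"TUP checkpoint: {tup_seconds:.1f} s")
print(f"REDO log is full at {fill_rate:.2f} MBytes/second sustained")
```

A bulk load easily writes REDO log faster than that, which fits the
error showing up midway through the load.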
So what should your parameters be?
It depends on how much disk and CPU you want to spend on checkpoints
during operation compared to how fast you want a system restart to be.
The slower your checkpoints are, the more REDO log has to be processed
at a system restart.
My rule of thumb is to set a checkpoint to take around 5 minutes. In
your case, with 1276 M of data and 175 M of index, this would require
setting
NoOfDiskPagesToDiskAfterRestartTUP to (1276 MByte / 300 seconds) / 80
kBytes/second = 53.17 => 54 (4.25 MByte per second)
NoOfDiskPagesToDiskAfterRestartACC to (175 MByte / 300 seconds) / 80
kBytes/second = 7.29 => 8 (640 kBytes per second)
In this case the total load on the disk per node would be 4.89 MBytes
per second for writing checkpoints.
If you allow checkpoints to take 10 minutes instead, the checkpoint
speed is halved (27 and 4 => 2.12 MBytes + 320 kBytes = 2.44 MBytes per
second).
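The same calculation for any target checkpoint time, as a minimal sketch
(the function name is hypothetical; one parameter unit is taken as 80
kBytes/second and 1 MByte as 1000 kBytes, matching the figures above):

```python
import math

# Sketch: pick a NoOfDiskPagesToDiskAfterRestart* value so that a
# checkpoint of memory_mb finishes in about checkpoint_seconds.
UNIT_KB_PER_SEC = 80  # one unit = one 8KB page per 100 ms


def pages_setting(memory_mb, checkpoint_seconds):
    """Return the parameter value, rounded up to a whole unit."""
    kb_per_sec = memory_mb * 1000 / checkpoint_seconds
    return math.ceil(kb_per_sec / UNIT_KB_PER_SEC)


# 5 minute checkpoints
print(pages_setting(1276, 300))  # TUP: 54
print(pages_setting(175, 300))   # ACC: 8

# 10 minute checkpoints
print(pages_setting(1276, 600))  # TUP: 27
print(pages_setting(175, 600))   # ACC: 4
```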
When setting NoOfFragmentLogFiles I usually use a rule of thumb of
allowing for 6 checkpoints, to have a good safety margin (allocating
twice as much disk space for REDO logs usually isn't a big deal).
With 5 minute checkpoints this means that the REDO log has to keep 30
minutes of REDO log. How much REDO log is produced during 30 minutes is
very dependent on the application.
There are 72 bytes of overhead per delete/update/insert of one record,
plus an additional overhead of 4 bytes per field changed, and the
primary key is also stored in the REDO log. Thus the size of a REDO log
record is
72 + 4 * no_of_fields_updated + size_of_fields_updated + size_of_PK.
As an example, an insert into the dog table mentioned in one of your
previous mails would consume
72 + 4 * 5 + (4 + 24 + 52 + 256 + 12) + 4 = 444 bytes
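The record size formula as a small sketch (the function and argument
names are made up for illustration; the field sizes are the ones from
the dog table example):

```python
# Sketch: REDO log record size for one insert/update/delete.
def redo_record_bytes(field_sizes, pk_size, overhead=72, per_field=4):
    """72 bytes fixed + 4 bytes per changed field + field data + PK."""
    return overhead + per_field * len(field_sizes) + sum(field_sizes) + pk_size


# Insert into the dog table: 5 fields, 4 byte primary key
print(redo_record_bytes([4, 24, 52, 256, 12], pk_size=4))  # 444
```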
Usually at max speed the ndbd nodes can write around 5-15 MByte of REDO
log per second. It is not likely that you come close to this speed. I
would guess that you are more in the range of 2 MBytes/second. Thus
setting NoOfFragmentLogFiles to 64 would handle this (= 4 GByte of REDO
log). Actually it is pretty likely that a smaller figure would do as
well, but I would use some margin to avoid the 410 problem.
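A rough sketch of that sizing (the function name is hypothetical; the 2
MBytes/second is only the guess from above, and each unit of
NoOfFragmentLogFiles is taken as 4 * 16 = 64 MBytes):

```python
import math

# Sketch: smallest NoOfFragmentLogFiles that holds the REDO log written
# during checkpoints_kept checkpoints at a guessed REDO write rate.
MB_PER_FRAGMENT_LOG_FILE = 4 * 16


def fragment_log_files_needed(redo_mb_per_sec, checkpoint_seconds,
                              checkpoints_kept=6):
    """Return the minimum NoOfFragmentLogFiles, rounded up."""
    needed_mb = redo_mb_per_sec * checkpoint_seconds * checkpoints_kept
    return math.ceil(needed_mb / MB_PER_FRAGMENT_LOG_FILE)


# 2 MBytes/second, 5 minute checkpoints, 6 checkpoints kept
print(fragment_log_files_needed(2, 300))  # 57, so 64 leaves some margin
```

With the numbers from this mail that gives NoOfFragmentLogFiles=64,
NoOfDiskPagesToDiskAfterRestartTUP=54 and
NoOfDiskPagesToDiskAfterRestartACC=8 in the [ndbd default] section.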
Rgrds Mikael
On 2005-04-04 at 04:12, Jim Hoadley wrote:
> Reading the documentation and various posts in this list (thank you
> Michael and Pekka especially), I have calculated the size of my
> database as
> data size=1276M and index size=175M (details available upon request).
>
> Using these formulas and given that I have 2 replicas and 2 data
> nodes, I arrived at these settings for config.ini:
>
> DataMemory = (data size + (for each table(number of records * ordered
> indexes * 10)) * NoOfReplicas) / number of data nodes (Size * 1.1) *
> NoOfReplicas / NoOfDataNodes
>
> IndexMemory = (for each table (for each primary or unique key (size of
> attribute + 25))) * NoOfReplicas / number of data nodes
>
> DataMemory=1404M
> IndexMemory=193M
>
> While loading my data (using 'mysql test < db.dump'), the process stops
> midway (on table 8 of 20). I've calculated, and at that point, only
> 639M
> of DataMemory and 61M of IndexMemory should be required, therefore it's
> unlikely that the error is caused by having either DataMemory or
> IndexMemory set too low.
>
> In fact, the error received is "ERROR 1297 (HY000) at line 2777438: Got
> temporary error 410 'REDO log buffers overloaded, consult online manual
> (increase RedoBuffer, and|or decrease TimeBetweenLocalCheckpoints,
> and|or
> increase NoOfFragmentLogFiles)' from ndbcluster". I have RedoBuffer,
> TimeBetweenLocalCheckpoints and NoOfFragmentLogFiles set to their
> defaults.
>
> What should I set these to?
>
> I'm running RHEL 3 (2.4.21-27.0.2.ELsmp) on 2 Dell PowerEdge 1850s
> each w/6GB RAM, and here's my current config.ini:
>
> #----------------------------
>
> [ndbd default]
> NoOfReplicas= 2
> MaxNoOfConcurrentOperations=131072
> DataMemory= 1404M
> IndexMemory= 193M
> Diskless= 0
> DataDir= /var/mysql-cluster
> TimeBetweenWatchDogCheck=10000
> HeartbeatIntervalDbDb=10000
> HeartbeatIntervalDbApi=10000
> MaxNoOfAttributes = 2000
> MaxNoOfOrderedIndexes = 5000
> MaxNoOfUniqueHashIndexes = 5000
>
> [ndbd]
> HostName= 10.0.1.199
>
> [ndbd]
> HostName= 10.0.1.200
>
> [ndb_mgmd]
> HostName= 10.0.1.198
> PortNumber= 2200
>
> [mysqld]
>
> [mysqld]
>
> [tcp default]
> PortNumber= 2202
>
> #----------------------------
>
> Any help would be appreciated. Thanks.
>
> -- Jim Hoadley
> Sr Software Eng
> Dealer Fusion Inc
Mikael Ronström, Senior Software Architect
MySQL AB, www.mysql.com
Jumpstart your cluster:
http://www.mysql.com/consulting/packaged/cluster.html