List:Cluster
From:Adam Dixon Date:March 14 2006 1:44am
Subject:Max Operations & Checkpoint times.
Hi guys,
In the last week I have been having issues with running out of
operation records. I have since increased this value from a lowish
200,000 to 250,000. However, this only alleviates the problem, as I
need more operations than this (many more, by the looks of it).

This has caused me to read up on how Operations work and what exactly
they are used for. But it's unclear from the documentation how an
Operation Record is used in every instance, which makes it hard to
plan for what I might actually need. I've also revisited
http://mikaelronstrom.blogspot.com/2005/10/calculating-parameters-for-local.html
and am looking at perhaps revising these settings.

I'm assuming that an operation record is used for all 'operations', e.g.
select, insert, update, delete. However, are there any cases, like
BLOB/TEXT, which use more operations, and how can I calculate
(guesstimate) a suitable value?
I understand blobs are stored as follows: the first 256 bytes in the
real table, and every 256 bytes thereafter in a hidden table. So would
selecting a blob with 5 of these parts take 1+5 operation records?
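
To make the guess above concrete, here is a minimal sketch of the arithmetic, assuming the layout described in this post (256 bytes inline, 256-byte parts in the hidden table) and assuming one operation record per row touched. Both are my assumptions, not confirmed behaviour, and the function name is made up for illustration.

```python
import math

# Assumptions taken from the post, not verified against the NDB source:
INLINE_BYTES = 256  # first 256 bytes are stored in the real table row
PART_BYTES = 256    # each hidden-table row holds a further 256 bytes


def blob_operation_records(blob_size_bytes: int) -> int:
    """Estimate operation records to read one blob value:
    1 for the main row plus 1 per hidden-table part (assumed)."""
    overflow = max(0, blob_size_bytes - INLINE_BYTES)
    parts = math.ceil(overflow / PART_BYTES)
    return 1 + parts


# A blob that fits inline needs only the main-row operation.
print(blob_operation_records(200))            # 1
# A blob with 5 hidden parts would need 1 + 5 under this model.
print(blob_operation_records(256 + 5 * 256))  # 6
```

If the real part size or the per-part cost differs, only the two constants need changing.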

Also, I know that decreasing the time between checkpoints will
help minimise out-of-operation-records situations.
This is the output of my mgm log:
2006-03-14 10:34:15 INFO     -- Node 13: Local checkpoint 40208
started. Keep GCI = 6079137 oldest restorable GCI = 6079287
2006-03-14 10:39:17 INFO     -- Node 13: Local checkpoint 40208 completed

The time between the checkpoint starting and completing is 5 minutes.
I remember choosing this because the disks are only SATA and I did not
want to overload them.
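
For anyone wanting to check the figure, a small sketch that computes the checkpoint duration from the two log lines above. It only assumes that each line begins with a "YYYY-MM-DD HH:MM:SS" timestamp, as in my log; the helper name is made up.

```python
from datetime import datetime

# The two log lines from above (the start line is shortened here).
started = "2006-03-14 10:34:15 INFO     -- Node 13: Local checkpoint 40208 started."
completed = "2006-03-14 10:39:17 INFO     -- Node 13: Local checkpoint 40208 completed"


def log_timestamp(line: str) -> datetime:
    """Parse the leading 19-character timestamp of a log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


duration = log_timestamp(completed) - log_timestamp(started)
print(duration.total_seconds())  # 302.0, i.e. 5 minutes 2 seconds
```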

I have this set in my config.ini:
#
NoOfFragmentLogFiles = 32
RedoBuffer=128MB

# (4*(2^22)) = 16MB
TimeBetweenLocalCheckpoints = 18
TimeBetweenGlobalCheckpoints = 1500

#100 = 100 * 10 * 8KB = 8MB/sec
#40 = 40 * 10 * 8KB = 3.2MB/sec
NoOfDiskPagesToDiskAfterRestartTUP = 175
NoOfDiskPagesToDiskAfterRestartACC = 80
#
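
As a sanity check on the numbers in that config, here is a rough sketch of the arithmetic as I read the manual. It assumes TimeBetweenLocalCheckpoints is the base-2 logarithm of the number of 4-byte words written before a new local checkpoint starts, and that the NoOfDiskPagesToDiskAfterRestart* values are 8KB pages per 100 milliseconds. The function names are made up for illustration.

```python
WORD_BYTES = 4         # assumed: the setting counts 4-byte words
PAGE_KB = 8            # assumed: one disk page is 8KB
TICKS_PER_SECOND = 10  # assumed: the page count applies per 100 ms


def lcp_threshold_bytes(setting: int) -> int:
    """Bytes of writes before a new local checkpoint: 4 * 2^setting."""
    return WORD_BYTES * 2 ** setting


def disk_write_rate_kb_per_sec(pages_per_tick: int) -> int:
    """Checkpoint write rate in KB/sec: pages * 10 * 8KB."""
    return pages_per_tick * TICKS_PER_SECOND * PAGE_KB


print(lcp_threshold_bytes(22) // 2**20)  # 16 (MB), the figure in the comment
print(lcp_threshold_bytes(18) // 2**20)  # 1 (MB), for the value actually set

print(disk_write_rate_kb_per_sec(175))   # 14000 KB/sec for TUP
print(disk_write_rate_kb_per_sec(80))    # 6400 KB/sec for ACC
```

Under these assumptions the 16MB comment matches a setting of 22 rather than 18, and the rates for 175 and 80 pages come out higher than the 100 and 40 examples in the comments.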

So I went conservative so as not to over-configure for my hardware
(kind of a guess really, which is unfortunate). So now I am thinking of
adjusting the TimeBetweenLocalCheckpoints = 18 value. The manual
suggests setting this number to 6, so that checkpoints start
as soon as they finish; perhaps this would be the best course of
action here.

So, would it be better to try to adjust for more operation
records in the config, or to change the TimeBetweenLocalCheckpoints value
so that checkpoints run more often than they currently do?

Bear in mind that my current DataMemory configuration is pretty
much maxing out the available RAM on the data nodes, so I've only got
about 40-50 MB of available memory on each of my 8 data nodes.
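
To put that memory limit next to the operation record counts, a back-of-the-envelope sketch. It assumes each operation record costs roughly 1KB on a data node, which is my recollection of the manual and should be verified before relying on it; the constant and function names are made up.

```python
# Assumption: roughly 1KB per operation record (to be checked in the manual).
BYTES_PER_OPERATION_RECORD = 1024


def operation_record_memory_mb(records: int) -> float:
    """Approximate memory used by the given number of operation records."""
    return records * BYTES_PER_OPERATION_RECORD / (1024 * 1024)


previous = operation_record_memory_mb(200_000)
current = operation_record_memory_mb(250_000)

print(round(current, 1))             # 244.1 MB for 250,000 records
print(round(current - previous, 1))  # 48.8 MB extra for the 50,000 increase
```

If the 1KB figure is about right, every further 50,000 records would cost about as much as the memory I have left on each node.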

Further to this, I'm unsure whether the operation records are divided
evenly among the nodes, or whether a busier partition may chew up
operation records sooner than other partitions with less heavily
accessed data. Is this a possibility?

Any comments on any of the above would be great,
Adam