Awesome response! Thank you!
>> Could you explain the difference between
[DB]TimeBetweenLocalCheckpoints and [DB]TimeBetweenGlobalCheckpoints,
and how both of these relate to [DB]NoOfFragmentLogFiles?
> Even though they are similar in name local and global checkpoint is
fairly dissimilar.
> Global checkpoint is a protocol that ensures that all REDO logs upto
a certain global checkpoint identifier is flushed to disk. One can think
of global checkpoint as a method of grouping transactions together for
flushing them to disk. At system restart one restores the system from
one of those global checkpoints.
> Local checkpoint is the method used to ensure that the log size
doesn't have to be infinite. By ensuring that all data pages are written
to disk we know that we don't need old log records anymore and can thus
get rid of those.
> MySQL Cluster contains the capability to restart from old global
checkpoints and thus actually save log records until three local
checkpoints have been performed.
> There is a section planned in the documentation for an example of how
to configure those parameters.
>
I look forward to reading it as soon as it is released! :)
>> One of the functions that I am testing is importing a large quantity
of data into the cluster, and so I am trying to figure out what settings
need to be what for this to be possible - I hope to be able to move a
very large table (75mil rows, 8 columns: 4 int, 3 varchar(20), 1
varchar(100), all but 1 column allowed to be NULL, and charset has to be
UTF8). I continue to run into problems with UNDO buffers filling up
during the import process, though I have had mostly success lately, when
using small import files.
> As you don't mention any primary key, I will assume that you haven't
defined one.
> Are there any indexes, it is important to know this to make the
calculations below.
> Each row will store in DataMemory:
> 1) 16 bytes of overhead (4 bytes of pointer to linked list of
operations performed on the record, 4 bytes of NULL bits, 4 bytes for
future use, 4 bytes for checksum)
> 2) 8 bytes primary key
> 3) 4 int => 4*4 bytes = 16 bytes
> 4) 3 varchar(20) => 3*24 bytes = 72 bytes
> 5) 1 varchar(100) => 104 bytes
> => 216 bytes/per record stored in 32768 byte pages with 128 bytes
header => 151 records per 32k page.
> => 151k records per 32M => 3M rows ~ 640M => 75M rows ~16 GByte
> In IndexMemory you have with a 8 byte primary key 33 bytes of storage
per record. Thus 96M is sufficient for almost 3 M rows and with 8 DB
nodes around
> 11 M rows should be ok.
Wonderful explanation, thank you!
There is a primary key on one of the varchar(20) fields, and no other
indexes.
In the documentation, I found this regarding UTF8's memory requirements:
"Tip: To save space with UTF8, use VARCHAR instead of CHAR. Otherwise,
MySQL has to reserve 30 bytes for a CHAR(10) CHARACTER SET utf8 column,
because that's the maximum possible length." Since the cluster reserves
the maximum space for varchar columns, does this mean that a UTF8
varchar(20) field will require (20*3 + 4 = ) 64 bytes? If this is the
case, then it would be 512 bytes/per record + (25+64) bytes for the hash
index + ?? for the ordered index. Since there should never be any
lookups besides single rows based on primary key, I tried to create the
table with PRIMARY KEY ('username') USING HASH but received an error.
ERROR 1064 (42000): You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for the right
syntax to use near 'USING HASH
Assuming I can drop the ordered index and that UTF8 requires tripple the
space, there will be 601 bytes/per record => 54 records/page => 54k
records/ 32M => 75M records ~ 45GB. Then with 8 nodes and 2 replicas
(thus 4 pairs), this would require a minimum of 12GB DataMemory per
machine. Is this correct?
> I presume you have 2 replicas and 8 DB nodes.
Yes, that is correct.
> Regarding Log Files and inserting large amounts of data. Each insert
(assuming you update all attributes in the insert) will generate a log
record of
> (216 - 16) + //Attributes
> 9*4 + //Attribute Header
> 72 = //Log record header
> 308 bytes.
With the UTF8 calculations, this becomes 693 bytes.
> Assuming you manage to insert 10.000 records per second this will
generate 308*10000 bytes of REDO log per second but each node will only get
> a quarter of this with 2 replicas and 8 DB nodes => 750kBytes/second
Again, this changes to 1.73Mb/sec.
> During inserts and an ongoing local checkpoints there will also be
UNDO log records produced.
> If you run out of UNDO log space it can either be that the UNDO log
buffer in the Index-part is too small (2 M) or less likely the UNDO log
buffer
> in the Data-part (16M). I am working on a patch that will make those
buffers configurable.
I love good news :)
> It could also be however that your disk subsystem
> doesn't handle the load and then every buffer size will be too small.
This is very possible as the harddrives that I am testing on are 7200rmp
IDE drives.
> If you are using a default config and a 4.1.4-version then
> NoOfDiskPagesToDiskAfterRestartTUP
> and
> NoOfDiskPagesToDiskAfterRestartACC
> have been updated and now consume 4 MByte of disk bandwidth per
second. Thus during a local checkpoint there will be around 6 MByte per
> second of disk writes.
> Rgrds Mikael
> PS: Please ask more questions if there still are unclarities in my
description.
Thanks!
These explanations have helped me greatly, thank you very much for
taking the time to respond so clearly, and thanks to all of your team
for the ongoing great work!
Devananda
Neopets, Inc