List: Cluster
From: Devananda
Date: August 20 2004 4:45pm
Subject: Re: Data persistency
Awesome response! Thank you!


 >> Could you explain the difference between 
[DB]TimeBetweenLocalCheckpoints and [DB]TimeBetweenGlobalCheckpoints, 
and how both of these relate to [DB]NoOfFragmentLogFiles?

 > Even though they are similar in name, local and global checkpoints 
are fairly dissimilar.
 > A global checkpoint is a protocol that ensures that all REDO log 
records up to a certain global checkpoint identifier are flushed to 
disk. One can think of a global checkpoint as a method of grouping 
transactions together for flushing them to disk. At system restart one 
restores the system from one of those global checkpoints.
 > A local checkpoint is the method used to ensure that the log size 
doesn't have to be infinite. By ensuring that all data pages are 
written to disk, we know that we no longer need the old log records and 
can thus get rid of them.
 > MySQL Cluster has the capability to restart from old global 
checkpoints and thus actually saves log records until three local 
checkpoints have been performed.

 > There is a section planned in the documentation for an example of how 
to configure those parameters.
 >

I look forward to reading it as soon as it is released! :)
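
In the meantime, here is how I am currently reading the relationship 
between the three parameters, as a back-of-the-envelope Python sketch. 
The "4 files of 16 MByte per NoOfFragmentLogFiles" figure is only my 
reading of the config docs, and the REDO rate and checkpoint time below 
are made-up placeholders, so please correct me if I have any of this 
wrong:

# How much REDO log one DB node can hold, versus how much it needs to
# hold (log records are kept until three local checkpoints have
# completed, per the explanation above).

FILE_SET_BYTES = 4 * 16 * 1024 * 1024   # assumption: one set = 4 files of 16 MByte

no_of_fragment_log_files = 8            # whatever [DB]NoOfFragmentLogFiles is in config.ini
redo_capacity = no_of_fragment_log_files * FILE_SET_BYTES

redo_bytes_per_sec = 1.75e6             # placeholder: REDO written per node per second
local_checkpoint_secs = 300             # placeholder: time for one local checkpoint
needed = redo_bytes_per_sec * 3 * local_checkpoint_secs

print("capacity %.0f MByte, needed about %.0f MByte" %
      (redo_capacity / 1e6, needed / 1e6))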


 >> One of the functions that I am testing is importing a large 
quantity of data into the cluster, so I am trying to figure out what 
the various settings need to be for this to be possible - I hope to be 
able to move a very large table (75mil rows, 8 columns: 4 int, 3 
varchar(20), 1 varchar(100), all but 1 column allowed to be NULL, and 
the charset has to be UTF8). I continue to run into problems with UNDO 
buffers filling up during the import process, though I have mostly had 
success lately when using small import files.


 > As you don't mention any primary key, I will assume that you haven't 
defined one.
 > Are there any indexes? It is important to know this to make the 
calculations below.
 > Each row will use the following storage in DataMemory:
 > 1) 16 bytes of overhead (4 bytes of pointer to linked list of 
operations performed on the record, 4 bytes of NULL bits, 4 bytes for 
future use, 4 bytes for checksum)
 > 2) 8 bytes primary key
 > 3) 4 int => 4*4 bytes = 16 bytes
 > 4) 3 varchar(20) => 3*24 bytes = 72 bytes
 > 5) 1 varchar(100) => 104 bytes
 > => 216 bytes per record, stored in 32768-byte pages with a 128-byte 
header => 151 records per 32k page.
 > => 151k records per 32M => 3M rows ~ 640M => 75M rows ~ 16 GByte

 > In IndexMemory you have, with an 8-byte primary key, 33 bytes of 
storage per record. Thus 96M is sufficient for almost 3M rows, and with 
8 DB nodes around 11M rows should be ok.

Wonderful explanation, thank you!
There is a primary key on one of the varchar(20) fields, and no other 
indexes.
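
Just to be sure I am following the arithmetic, I put your numbers into 
a small Python script (the constants are the ones from your 
explanation; the rounding is mine):

# DataMemory / IndexMemory estimate using the per-row figures above
# (no UTF8 factored in yet, and still assuming an 8-byte primary key).

PAGE_SIZE = 32768
PAGE_HEADER = 128

bytes_per_record = 16 + 8 + 4*4 + 3*24 + 104                       # = 216
records_per_page = (PAGE_SIZE - PAGE_HEADER) // bytes_per_record   # = 151

rows = 75 * 1000 * 1000
total_data_bytes = rows / float(records_per_page) * PAGE_SIZE      # ~16 GByte

index_bytes_per_record = 33             # with an 8-byte primary key
rows_per_96M_index = 96 * 1024 * 1024 // index_bytes_per_record    # ~3M rows
rows_with_8_nodes = rows_per_96M_index * 4                         # 2 replicas => 4 node groups

print(records_per_page, total_data_bytes / 1e9,
      rows_per_96M_index, rows_with_8_nodes)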

In the documentation, I found this regarding UTF8's memory requirements: 
"Tip: To save space with UTF8, use VARCHAR instead of CHAR. Otherwise, 
MySQL has to reserve 30 bytes for a CHAR(10) CHARACTER SET utf8 column, 
because that's the maximum possible length."  Since the cluster reserves 
the maximum space for varchar columns, does this mean that a UTF8 
varchar(20) field will require (20*3 + 4 =) 64 bytes? If this is the 
case, then it would be 512 bytes per record + (25+64) bytes for the hash
index + ?? for the ordered index. Since there should never be any 
lookups besides single rows based on primary key, I tried to create the 
table with PRIMARY KEY ('username') USING HASH but received an error.

ERROR 1064 (42000): You have an error in your SQL syntax; check the 
manual that corresponds to your MySQL server version for the right 
syntax to use near 'USING HASH

Assuming I can drop the ordered index and that UTF8 requires triple the 
space, there will be 601 bytes per record => 54 records/page => 54k 
records per 32M => 75M records ~ 45GB. Then with 8 nodes and 2 replicas 
(thus 4 pairs), this would require a minimum of 12GB DataMemory per 
machine. Is this correct?
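
For what it is worth, here is that same arithmetic as a Python script. 
The 601 bytes per record is my own rough figure from above (UTF8 at 3 
bytes per character plus the 4-byte length word, with the 89-byte hash 
index entry lumped in even though it presumably lives in IndexMemory, 
just to err on the high side):

# Per-node memory estimate for the 75M-row table, with my UTF8 guesses.

PAGE_SIZE = 32768
PAGE_HEADER = 128

bytes_per_record = 601                  # my estimate above, hash index entry included
rows = 75 * 1000 * 1000
node_groups = 4                         # 8 DB nodes / 2 replicas

records_per_page = (PAGE_SIZE - PAGE_HEADER) // bytes_per_record   # = 54
total_data_bytes = rows / float(records_per_page) * PAGE_SIZE      # ~45 GByte
per_node_data = total_data_bytes / node_groups                     # ~11.4 GByte

index_bytes_per_record = 25 + 64        # hash index entry with a 64-byte UTF8 key
per_node_index = rows * index_bytes_per_record / node_groups       # ~1.7 GByte

print("DataMemory  per node ~ %.1f GByte" % (per_node_data / 1e9))
print("IndexMemory per node ~ %.1f GByte" % (per_node_index / 1e9))

So if my assumptions hold, it looks like roughly 12GB of DataMemory 
plus close to 2GB of IndexMemory per machine would be needed.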


 > I presume you have 2 replicas and 8 DB nodes.

Yes, that is correct.


 > Regarding Log Files and inserting large amounts of data. Each insert 
(assuming you update all attributes in the insert) will generate a log 
record of

 > (216 - 16) +  //Attributes
 > 9*4 +         //Attribute Header
 > 72            //Log record header
 > = 308 bytes.

With the UTF8 calculations, this becomes (601 - 16) + 9*4 + 72 = 693 bytes.


 > Assuming you manage to insert 10,000 records per second, this will 
generate 308*10000 bytes of REDO log per second, but each node will 
only get a quarter of this with 2 replicas and 8 DB nodes => 
750 kBytes/second

Again, this changes to 1.73MB/sec.
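
As a quick Python check of that rate with my UTF8 numbers (the 693-byte 
log record being my figure from above):

# Per-node REDO log rate during the import.
log_record_bytes = 693                  # (601 - 16) + 9*4 + 72, per the formula above
inserts_per_sec = 10000
node_groups = 4                         # 8 DB nodes / 2 replicas

redo_per_node = log_record_bytes * inserts_per_sec / float(node_groups)
print("%.2f MByte/second of REDO log per node" % (redo_per_node / 1e6))   # ~1.73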

 > During inserts and an ongoing local checkpoint there will also be 
UNDO log records produced.
 > If you run out of UNDO log space, it can either be that the UNDO log 
buffer in the Index part is too small (2M) or, less likely, the UNDO 
log buffer in the Data part (16M). I am working on a patch that will 
make those buffers configurable.

I love good news :)

 > It could also be, however, that your disk subsystem
 > doesn't handle the load, and then every buffer size will be too small.

This is very possible, as the hard drives that I am testing on are 
7200rpm IDE drives.

 > If you are using a default config and a 4.1.4 version, then
 > NoOfDiskPagesToDiskAfterRestartTUP
 > and
 > NoOfDiskPagesToDiskAfterRestartACC
 > have been updated and now consume 4 MByte of disk bandwidth per 
second. Thus during a local checkpoint there will be around 6 MByte per
 > second of disk writes.
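
Putting that figure next to the REDO rate from above (and leaving out 
the UNDO log writes, which I do not know how to estimate yet), a tiny 
Python sanity check:

# Rough disk write load on one node during a local checkpoint.
checkpoint_writes = 4.0e6               # NoOfDiskPagesToDiskAfterRestartTUP/ACC, per above
redo_writes = 1.73e6                    # my per-node REDO rate from above
print("%.1f MByte/second before UNDO log writes" %
      ((checkpoint_writes + redo_writes) / 1e6))                    # ~5.7

That is already close to the "around 6 MByte per second" you mention.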



 > Rgrds Mikael
 > PS: Please ask more questions if anything is still unclear in my 
description.

Thanks!


These explanations have helped me greatly. Thank you very much for 
taking the time to respond so clearly, and thanks to all of your team 
for the ongoing great work!

Devananda
Neopets, Inc