Author: jstephens
Date: 2008-05-09 22:13:28 +0200 (Fri, 09 May 2008)
New Revision: 10708
Log:
Added working file for improved Cluster 5.1/6.2/6.3 features overview
Added:
trunk/refman-5.1/mysql-cluster-features.xml
Added: trunk/refman-5.1/mysql-cluster-features.xml
===================================================================
--- trunk/refman-5.1/mysql-cluster-features.xml (rev 0)
+++ trunk/refman-5.1/mysql-cluster-features.xml 2008-05-09 20:13:28 UTC (rev 10708)
Changed blocks: 1, Lines Added: 317, Lines Deleted: 0; 12589 bytes
@@ -0,0 +1,317 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
+<!ENTITY % all.entities SYSTEM "all-entities.ent">
+%all.entities;
+]>
+<section id="mysql-cluster-features">
+
+ <title>MySQL Cluster Features</title>
+
+ <para>
+<programlisting>
+Monday, February 11, 2008
+MySQL Cluster Features - what they are and what they do
+
+There are currently three derivates of MySQL Cluster 5.1:
+
+ * MySQL Cluster 5.1
+ * MySQL Cluster 5.1 - Carrier Grade Edition (CGE) 6.2
+ * MySQL Cluster 5.1 - Carrier Grade Edition 6.3
+
+In this post I will explain what features exists where, what they do and what
+they mean and how they can be used.
+
+The way to read it is as follows: Whatever is included in MySQL Cluster 5.1, is
+also in CGE 6.2 and 6.3. Whatever is in 6.2 is also in 6.3.
+
+What version you should use depends on what features you need, but CGE 6.2 is
+more mature than 6.3 at the point of writing and is used in production in a
+number of places of some of the largest telco's in the world. CGE 6.3 is a
+version that many of our customers are developing and testing on right now.
+
+Moreover, the CGE versions are currently only available in source distributions.
+Next post I will explain how to build it on the Linux platform.
+
+Features in MySQL Cluster 5.1
+Feature
+
+Category
+
+Included in version and later
+Disk data support Kernel/MySQL 5.1
+Asynchronous replication (geo redundancy) Kernel/MySQL 5.1
+Variable size attributes Kernel/MySQL 5.1
+Optimized Node Recovery Kernel 5.1
+Online create index Kernel/MySQL 5.1
+Batching of Insert/Select MySQL 5.1
+Distribution awareness (direct APIs) Direct APIs 5.1
+Features in MySQL Cluster 5.1 CGE 6.2
+Feature
+
+Category
+
+Included in version and later
+255 nodes (increased Cluster size) Kernel CGE 6.2
+Page allocation Kernel CGE 6.2
+Online Alter Table Add Column Kernel/MySQL CGE 6.2
+Micro GCP Replication CGE 6.2
+Slave batching Replication CGE 6.2
+Multiconnection optimization MySQL CGE 6.2
+Features in MySQL Cluster 5.1 CGE 6.3
+Feature
+
+Category
+
+Included in version and later
+Realtime extensions Kernel CGE 6.3
+Compressed Backup/LCP Kernel CGE 6.3
+Two Local Checkpoints Kernel CGE 6.3
+Parallel Node Recovery Kernel CGE 6.3
+Page de-allocation Kernel CGE 6.3
+Epoll Kernel CGE 6.3
+Optimize table Kernel/MySQL CGE 6.3
+Conflict detection/resolution Replication CGE 6.3
+Batching of Update/Delete MySQL CGE 6.3
+Distribution awareness (MySQL) MySQL CGE 6.3
+
+Realtime extensions
+
+The Realtime extensions make it possible to set realtime priority to the data
+node process and control on which CPU cores it should operate. In environments
+with strong requirements on response times it is usually a good idea to enable
+the realtime extensions and tune the SchedulerExecutionTimer and
+SchedulerSpinTimer. Preliminary benchmarks (to be published here soon) indicates
+significantly lower response times (40% lower) and much more predictable
+response times.
+
+Below is a baseline that can be set in config.ini (please not that I have left out the
cpuids here. See below):
+
+RealtimeScheduler=1
+LockExecuteThreadToCPU=[cpuid]
+LockMaintThreadsToCPU=[cpuid]
+SchedulerExecutionTimer=80 (ms)
+SchedulerSpinTimer=400 (ms)
+
+Which CPU cores to bind to is very dependent on the enviroment (kernel, network
+adapters etc). Use cat /proc/interrupts to figure out which CPU ID that does not
+generate bulk of the interrupts on the network adapters. Set
+LockExecuteThreadToCPU to lock the data node's main execution thread to that
+CPU. Lock the maintenance threads to some other CPU.
+
+Compressed Backup/LCP
+
+The space requirements of the backups and LCPs can be lowered by compressing the
+Backup and LCP. However, compression costs CPU cycles and should only be used if
+the maintenance threads (file system threads) have been bound to a dedicated
+core, see (Realtime extensions). Enabling compression requires about 60% cpu
+time on that core.
+
+The space saving is about 42% on completely random data. Thus in many cases it
+is probably more. Saving space is usually important in ATCA platforms.
+
+To enable it, set the following in config.ini: (0=disable (default), 1=enable):
+
+CompressedBACKUP=0|1
+CompressedLCP=0|1
+Parallel Node Recovery
+
+Parallel node recovery enables more than one node at a time to perform node
+recovery. This means that all data nodes will restart faster in the case of
+multi-node failure.
+
+Two LocalCheckpoints
+
+Traditionally the data nodes are writing three LCPs before recycling the
+REDO-log. This features makes it necessary to write two LCPs before the REDO-log
+is recycled.
+
+Disk Data Support
+
+Too much to write here. Please see the reference manual.
+
+Epoll
+
+epoll is a new and improved way of handling many file descriptors. It is a
+cheaper way than using select/poll to find out if a fd has data to be read.
+Instead of doing a select and scanning through the fds (done by the OS), epoll
+can do this in constant. epoll is specific to Linux and has other names on e.g
+Solaris and FreeBSD. Currently, MySQL Cluster only supports this kind of
+functionality on Linux.
+
+Optimize table
+
+Current version of OPTIMIZE TABLE can defragment records in table which has
+DYNAMIC attributes (VARCHAR, VARBINARY , but also columns created with
+COLUMN_FORMAT DYNAMIC such as INT, BIGINT) because DYNAMIC attributes are stored
+on a separate page compared to FIXED attributes. Thus tables which has DYNAMIC
+attributes and are subject to INSERT/DELETE will be defragmented using OPTIMIZE
+TABLE, and freed pages will be returned to the page pool (DataMemory) and can
+then be reused by another table.
+
+Page de-allocation
+
+The memory allocator allocates one page at the time from the Page pool
+(DataMemory) to the table that needs it. If a page associated to a table becomes
+empty (e.g by DELETE) it will be returned to the page pool and can be reused by
+another table or the same. In earlier versions (5.0,5.1, 5.1 CGE 6.2), DELETE
+does not release the pages associated with a table. DROP TABLE returns all pages
+to the page pool. See Memory allocation and deallocation in MySQL Cluster for
+more details.
+
+Asynchronous Replication (Geo Redundancy)
+
+Use asynchronous replication between Cluster that are geographically separated
+in order to increase redundancy even more. In CGE 6.3 it is also possible to
+have master-master replication with conflict detectiona and resolution. Please
+see the reference manual.
+
+Variable size attributes
+
+VARCHAR and VARBINARY are now variable sized. This means that e.g a VARCHAR(255)
+will only occupy the amount of space that are inserted into it. In 5.0 a
+VARCHAR(255) occupies 256B of storage no matter if you store only one byte in it.
+
+Optimized Node Recovery
+
+Optimized Node Recovery means that a node that is restarted as ndbd will copy
+only changed that from the other started node in its node group (partition).
+Thus, the faster the node is restarted after a node failure, the faster it will
+recover since it will be less changes to copy. Restarting the node as ndbd
+--initial will result in that the data node's file system will be deleted and
+all data will be copied from the other node in the node group.
+
+Online create index
+
+CREATE INDEX 'indexname' ON t1(col1,..,coln);
+
+Online create index means that this index will be created online, i.e. no ALTER
+TABLE is needed and transactions can still use the table while the index is
+created.
+
+255 nodes (increased Cluster size)
+
+Maximum number of nodes has been increased from 63 to 255. However, max number
+of data nodes are still 48.
+
+Page allocation
+
+New memory allocator for ndb kernel which allocates memory to a table page by
+page (32 K bytes per page). This removes the high memory overhead with the old
+memory allocator.
+
+See Memory allocation and deallocation in MySQL Cluster for more details.
+
+Online alter table add column
+
+It is possible to add columns to the end of the table. It is not possible to add
+BLOBs and disk data columns online. If the ALTER TABLE ONLINE cannot be
+performed online a warning message will be printed and it will not perform the
+ALTER TABLE.
+
+Example usage:
+
+ALTER ONLINE TABLE t1 ADD COLUMN g INT;
+
+Micro GCP
+
+Used in conjuction to asynchronous replication. Micro GCP controls how often
+epochs should be generated. An epoch is a consistent group of committed
+transactions. If the MySQL server is generating a binary log, this group of
+committed transactions written in the binary log and sent as a unit to the slave
+mysql server. Normally the epoch is generated every TimebetweenGlobalCheckpoints
+when a Global CheckPoint is generated (GCP), which is by default every 2
+seconds. This means that the epochs sent between the master and slave can be
+big, and also latency (replication lag) is effected since data between the
+master and slave will be sent effectively every TimebetweenGlobalCheckpoints.
+
+Micro GCP alleviates that by chunking up the Global checkpoint into smaller
+parts. The parameter that controls how big the chunks are is TimeBetweenEpochs.
+Setting TimeBetweenEpochs=100 [ms] in config.ini usually works very well and
+this means that epochs are smaller and sent to the slave mysql server every
+100ms. Thus, setting this lowers the replication lag significantly. Use this
+also with Slave Batching.
+
+Slave Batching
+
+Used in conjuction to asynchronous replication. Instead of the slave applying
+one record at a time, Slave Batching allows an entire epoch (see Micro GCP)
+consisting one or more transactions. It is highly recommended to enable setting
+when using asynchronous replication.
+
+[mysqld]
+...
+#1 -- means enable, 0 -- means disable
+slave-allow-batching=1
+...
+Conflict detection/resolution
+
+To much to write here. Please see the reference manual.
+Multi-connection optimization
+
+Multi-connection optimization means that one MySQL server can make several
+connections(ndb cluster connections) to the data nodes. This means that the
+MySQL server can load balance requests on several ndb cluster connections and
+get around some mutex contention problems. Depending on queries this gives from
+70% and more improvement on throughput from the MySQL server. This feature can
+be activated by setting the following in my.cnf:
+
+[mysqld]
+...
+ndb-cluster-connection-pool=10
+...
+
+The above will create 10 ndb cluster connections to the data nodes from this
+MySQL server and will require 10 free [mysqld] slots in the config.ini. A good
+value is obtained by benchmarking your system and depends on the number of
+connections made to the MySQL server from client APIs.
+
+Batching of Update/Delete
+
+UPDATE t1 SET y=y+1 WHERE pk_key_part IN (1,2,...);
+DELETE FROM t1 WHERE pk_key_part IN (1,2,...);
+
+Batching of Insert/Select
+
+INSERT INTO t1 VALUES(1,1), (2,2), ... (n,n);
+SELECT * FROM t1 WHERE pk_key_part IN (1,2,...);
+
+Distribution awareness (MySQL)
+
+Distribution Awarness is a mechanism for the MySQL server to select the correct
+data node to query for information. Consider following example:
+
+CREATE TABLE t1 (
+ userid INTEGER,
+ data VARCHAR(255),
+ PRIMARY KEY(userid)
+) ENGINE=NDB
+PARTITION BY KEY (userid);
+
+CREATE TABLE t2 (
+ userid INTEGER,
+ serviceid INTEGER AUTO_INCREMENT,
+ data VARCHAR(255),
+ PRIMARY KEY(userid, serviceid)
+) ENGINE=NDB
+PARTITION BY KEY(userid);
+
+Table t1 will be distributed according to its primary key, userid, which is
+the default way of partitioning data.
+
+Table t2 will be distributed according to userid, even though the full primary
+key is (userid,serviceid). This means that a records with the same userid will
+be stored in the same partition (node group). Moreover, when e.g reading data
+the MySQL server will start transactions on the data node having the data
+instead of using a round-robin mechanism to select which data node to start the
+transaction on. However, this only has an impact on performance when having >= 4
+data nodes. With two data nodes, each data node will have the same data.
+Benchmarks indicates that using Distribution Awareness yields about 45%
+performance increase on four data nodes, but a lot more on bigger clusters.
+
+Distribution awareness (directAPIs)
+
+See above how to create tables. See Direct API documentation for details.
+</programlisting>
+ </para>
+
+</section>
| Thread |
|---|
| • svn commit - mysqldoc@docsrva: r10708 - trunk/refman-5.1 | jon | 9 May 2008 |