From: Date: May 9 2008 10:13pm Subject: svn commit - mysqldoc@docsrva: r10708 - trunk/refman-5.1 List-Archive: http://lists.mysql.com/commits/46581 Message-Id: <200805092013.m49KDSvo018002@docsrva.mysql.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Author: jstephens Date: 2008-05-09 22:13:28 +0200 (Fri, 09 May 2008) New Revision: 10708 Log: Added working file for improved Cluster 5.1/6.2/6.3 features overview Added: trunk/refman-5.1/mysql-cluster-features.xml Added: trunk/refman-5.1/mysql-cluster-features.xml =================================================================== --- trunk/refman-5.1/mysql-cluster-features.xml (rev 0) +++ trunk/refman-5.1/mysql-cluster-features.xml 2008-05-09 20:13:28 UTC (rev 10708) Changed blocks: 1, Lines Added: 317, Lines Deleted: 0; 12589 bytes @@ -0,0 +1,317 @@ + + +%all.entities; +]> +
+ + MySQL Cluster Features + + + +Monday, February 11, 2008 +MySQL Cluster Features - what they are and what they do + +There are currently three derivates of MySQL Cluster 5.1: + + * MySQL Cluster 5.1 + * MySQL Cluster 5.1 - Carrier Grade Edition (CGE) 6.2 + * MySQL Cluster 5.1 - Carrier Grade Edition 6.3 + +In this post I will explain what features exists where, what they do and what +they mean and how they can be used. + +The way to read it is as follows: Whatever is included in MySQL Cluster 5.1, is +also in CGE 6.2 and 6.3. Whatever is in 6.2 is also in 6.3. + +What version you should use depends on what features you need, but CGE 6.2 is +more mature than 6.3 at the point of writing and is used in production in a +number of places of some of the largest telco's in the world. CGE 6.3 is a +version that many of our customers are developing and testing on right now. + +Moreover, the CGE versions are currently only available in source distributions. +Next post I will explain how to build it on the Linux platform. + +Features in MySQL Cluster 5.1 +Feature + +Category + +Included in version and later +Disk data support Kernel/MySQL 5.1 +Asynchronous replication (geo redundancy) Kernel/MySQL 5.1 +Variable size attributes Kernel/MySQL 5.1 +Optimized Node Recovery Kernel 5.1 +Online create index Kernel/MySQL 5.1 +Batching of Insert/Select MySQL 5.1 +Distribution awareness (direct APIs) Direct APIs 5.1 +Features in MySQL Cluster 5.1 CGE 6.2 +Feature + +Category + +Included in version and later +255 nodes (increased Cluster size) Kernel CGE 6.2 +Page allocation Kernel CGE 6.2 +Online Alter Table Add Column Kernel/MySQL CGE 6.2 +Micro GCP Replication CGE 6.2 +Slave batching Replication CGE 6.2 +Multiconnection optimization MySQL CGE 6.2 +Features in MySQL Cluster 5.1 CGE 6.3 +Feature + +Category + +Included in version and later +Realtime extensions Kernel CGE 6.3 +Compressed Backup/LCP Kernel CGE 6.3 +Two Local Checkpoints Kernel CGE 6.3 +Parallel Node Recovery Kernel CGE 6.3 +Page de-allocation Kernel CGE 6.3 +Epoll Kernel CGE 6.3 +Optimize table Kernel/MySQL CGE 6.3 +Conflict detection/resolution Replication CGE 6.3 +Batching of Update/Delete MySQL CGE 6.3 +Distribution awareness (MySQL) MySQL CGE 6.3 + +Realtime extensions + +The Realtime extensions make it possible to set realtime priority to the data +node process and control on which CPU cores it should operate. In environments +with strong requirements on response times it is usually a good idea to enable +the realtime extensions and tune the SchedulerExecutionTimer and +SchedulerSpinTimer. Preliminary benchmarks (to be published here soon) indicates +significantly lower response times (40% lower) and much more predictable +response times. + +Below is a baseline that can be set in config.ini (please not that I have left out the cpuids here. See below): + +RealtimeScheduler=1 +LockExecuteThreadToCPU=[cpuid] +LockMaintThreadsToCPU=[cpuid] +SchedulerExecutionTimer=80 (ms) +SchedulerSpinTimer=400 (ms) + +Which CPU cores to bind to is very dependent on the enviroment (kernel, network +adapters etc). Use cat /proc/interrupts to figure out which CPU ID that does not +generate bulk of the interrupts on the network adapters. Set +LockExecuteThreadToCPU to lock the data node's main execution thread to that +CPU. Lock the maintenance threads to some other CPU. + +Compressed Backup/LCP + +The space requirements of the backups and LCPs can be lowered by compressing the +Backup and LCP. However, compression costs CPU cycles and should only be used if +the maintenance threads (file system threads) have been bound to a dedicated +core, see (Realtime extensions). Enabling compression requires about 60% cpu +time on that core. + +The space saving is about 42% on completely random data. Thus in many cases it +is probably more. Saving space is usually important in ATCA platforms. + +To enable it, set the following in config.ini: (0=disable (default), 1=enable): + +CompressedBACKUP=0|1 +CompressedLCP=0|1 +Parallel Node Recovery + +Parallel node recovery enables more than one node at a time to perform node +recovery. This means that all data nodes will restart faster in the case of +multi-node failure. + +Two LocalCheckpoints + +Traditionally the data nodes are writing three LCPs before recycling the +REDO-log. This features makes it necessary to write two LCPs before the REDO-log +is recycled. + +Disk Data Support + +Too much to write here. Please see the reference manual. + +Epoll + +epoll is a new and improved way of handling many file descriptors. It is a +cheaper way than using select/poll to find out if a fd has data to be read. +Instead of doing a select and scanning through the fds (done by the OS), epoll +can do this in constant. epoll is specific to Linux and has other names on e.g +Solaris and FreeBSD. Currently, MySQL Cluster only supports this kind of +functionality on Linux. + +Optimize table + +Current version of OPTIMIZE TABLE can defragment records in table which has +DYNAMIC attributes (VARCHAR, VARBINARY , but also columns created with +COLUMN_FORMAT DYNAMIC such as INT, BIGINT) because DYNAMIC attributes are stored +on a separate page compared to FIXED attributes. Thus tables which has DYNAMIC +attributes and are subject to INSERT/DELETE will be defragmented using OPTIMIZE +TABLE, and freed pages will be returned to the page pool (DataMemory) and can +then be reused by another table. + +Page de-allocation + +The memory allocator allocates one page at the time from the Page pool +(DataMemory) to the table that needs it. If a page associated to a table becomes +empty (e.g by DELETE) it will be returned to the page pool and can be reused by +another table or the same. In earlier versions (5.0,5.1, 5.1 CGE 6.2), DELETE +does not release the pages associated with a table. DROP TABLE returns all pages +to the page pool. See Memory allocation and deallocation in MySQL Cluster for +more details. + +Asynchronous Replication (Geo Redundancy) + +Use asynchronous replication between Cluster that are geographically separated +in order to increase redundancy even more. In CGE 6.3 it is also possible to +have master-master replication with conflict detectiona and resolution. Please +see the reference manual. + +Variable size attributes + +VARCHAR and VARBINARY are now variable sized. This means that e.g a VARCHAR(255) +will only occupy the amount of space that are inserted into it. In 5.0 a +VARCHAR(255) occupies 256B of storage no matter if you store only one byte in it. + +Optimized Node Recovery + +Optimized Node Recovery means that a node that is restarted as ndbd will copy +only changed that from the other started node in its node group (partition). +Thus, the faster the node is restarted after a node failure, the faster it will +recover since it will be less changes to copy. Restarting the node as ndbd +--initial will result in that the data node's file system will be deleted and +all data will be copied from the other node in the node group. + +Online create index + +CREATE INDEX 'indexname' ON t1(col1,..,coln); + +Online create index means that this index will be created online, i.e. no ALTER +TABLE is needed and transactions can still use the table while the index is +created. + +255 nodes (increased Cluster size) + +Maximum number of nodes has been increased from 63 to 255. However, max number +of data nodes are still 48. + +Page allocation + +New memory allocator for ndb kernel which allocates memory to a table page by +page (32 K bytes per page). This removes the high memory overhead with the old +memory allocator. + +See Memory allocation and deallocation in MySQL Cluster for more details. + +Online alter table add column + +It is possible to add columns to the end of the table. It is not possible to add +BLOBs and disk data columns online. If the ALTER TABLE ONLINE cannot be +performed online a warning message will be printed and it will not perform the +ALTER TABLE. + +Example usage: + +ALTER ONLINE TABLE t1 ADD COLUMN g INT; + +Micro GCP + +Used in conjuction to asynchronous replication. Micro GCP controls how often +epochs should be generated. An epoch is a consistent group of committed +transactions. If the MySQL server is generating a binary log, this group of +committed transactions written in the binary log and sent as a unit to the slave +mysql server. Normally the epoch is generated every TimebetweenGlobalCheckpoints +when a Global CheckPoint is generated (GCP), which is by default every 2 +seconds. This means that the epochs sent between the master and slave can be +big, and also latency (replication lag) is effected since data between the +master and slave will be sent effectively every TimebetweenGlobalCheckpoints. + +Micro GCP alleviates that by chunking up the Global checkpoint into smaller +parts. The parameter that controls how big the chunks are is TimeBetweenEpochs. +Setting TimeBetweenEpochs=100 [ms] in config.ini usually works very well and +this means that epochs are smaller and sent to the slave mysql server every +100ms. Thus, setting this lowers the replication lag significantly. Use this +also with Slave Batching. + +Slave Batching + +Used in conjuction to asynchronous replication. Instead of the slave applying +one record at a time, Slave Batching allows an entire epoch (see Micro GCP) +consisting one or more transactions. It is highly recommended to enable setting +when using asynchronous replication. + +[mysqld] +... +#1 -- means enable, 0 -- means disable +slave-allow-batching=1 +... +Conflict detection/resolution + +To much to write here. Please see the reference manual. +Multi-connection optimization + +Multi-connection optimization means that one MySQL server can make several +connections(ndb cluster connections) to the data nodes. This means that the +MySQL server can load balance requests on several ndb cluster connections and +get around some mutex contention problems. Depending on queries this gives from +70% and more improvement on throughput from the MySQL server. This feature can +be activated by setting the following in my.cnf: + +[mysqld] +... +ndb-cluster-connection-pool=10 +... + +The above will create 10 ndb cluster connections to the data nodes from this +MySQL server and will require 10 free [mysqld] slots in the config.ini. A good +value is obtained by benchmarking your system and depends on the number of +connections made to the MySQL server from client APIs. + +Batching of Update/Delete + +UPDATE t1 SET y=y+1 WHERE pk_key_part IN (1,2,...); +DELETE FROM t1 WHERE pk_key_part IN (1,2,...); + +Batching of Insert/Select + +INSERT INTO t1 VALUES(1,1), (2,2), ... (n,n); +SELECT * FROM t1 WHERE pk_key_part IN (1,2,...); + +Distribution awareness (MySQL) + +Distribution Awarness is a mechanism for the MySQL server to select the correct +data node to query for information. Consider following example: + +CREATE TABLE t1 ( + userid INTEGER, + data VARCHAR(255), + PRIMARY KEY(userid) +) ENGINE=NDB +PARTITION BY KEY (userid); + +CREATE TABLE t2 ( + userid INTEGER, + serviceid INTEGER AUTO_INCREMENT, + data VARCHAR(255), + PRIMARY KEY(userid, serviceid) +) ENGINE=NDB +PARTITION BY KEY(userid); + +Table t1 will be distributed according to its primary key, userid, which is +the default way of partitioning data. + +Table t2 will be distributed according to userid, even though the full primary +key is (userid,serviceid). This means that a records with the same userid will +be stored in the same partition (node group). Moreover, when e.g reading data +the MySQL server will start transactions on the data node having the data +instead of using a round-robin mechanism to select which data node to start the +transaction on. However, this only has an impact on performance when having >= 4 +data nodes. With two data nodes, each data node will have the same data. +Benchmarks indicates that using Distribution Awareness yields about 45% +performance increase on four data nodes, but a lot more on bigger clusters. + +Distribution awareness (directAPIs) + +See above how to create tables. See Direct API documentation for details. + + + +