From: jon
Date: April 26 2006 2:15am
Subject: svn commit - mysqldoc@docsrva: r1939 - trunk/ndbapi
List-Archive: http://lists.mysql.com/commits/5536
Message-Id: <200604260215.k3Q2FQpY002758@docsrva.mysql.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Author: jstephens
Date: 2006-04-26 04:15:23 +0200 (Wed, 26 Apr 2006)
New Revision: 1939
Log:
More DocBook-ification: Cluster Concepts Review, Adaptive Send Algorithm
Modified:
trunk/ndbapi/overview.xml
Modified: trunk/ndbapi/overview.xml
===================================================================
--- trunk/ndbapi/overview.xml 2006-04-25 23:58:27 UTC (rev 1938)
+++ trunk/ndbapi/overview.xml 2006-04-26 02:15:23 UTC (rev 1939)
@@ -809,13 +809,17 @@
operations operates on a defined unique hash index.
-
- Note: If you want to
- define multiple operations within the same transaction,
- then you need to call NdbTransaction::getNdbOperation()
- or NdbTransaction::getNdbIndexOperation() for each
- operation.
-
+
+
+
+ If you want to define multiple operations within the
+ same transaction, then you need to call
+ NdbTransaction::getNdbOperation() or
+ NdbTransaction::getNdbIndexOperation() for each
+ operation.
+
+
+
@@ -1081,15 +1085,18 @@
NdbIndexScanOperation::readTuples()).
-
- Note: If you want to
- define multiple scan operations within the same
- transaction, then you need to call
- NdbTransaction::getNdbScanOperation()
- or
- NdbTransaction::getNdbIndexScanOperation()
- separately for each operation.
-
+
+
+
+ If you want to define multiple scan operations within
+ the same transaction, then you need to call
+ NdbTransaction::getNdbScanOperation()
+ or
+ NdbTransaction::getNdbIndexScanOperation()
+ separately for each operation.
+
+
+
@@ -1111,12 +1118,16 @@
NdbScanFilter and bounds.
-
- Note: When
- NdbScanFilter is used, each row is examined, whether or
- not it is actually returned. However, when using bounds,
- only rows within the bounds will be examined.
-
+
+
+
+ When NdbScanFilter is used, each row is examined,
+ whether or not it is actually returned. However, when
+ using bounds, only rows within the bounds will be
+ examined.
+
+
+
@@ -1353,14 +1364,17 @@
information about the error.
-
- Note: Transactions are
- not automatically closed when an error
- occurs. You must call
- Ndb::closeTransaction() to close the
- transaction.
-
+
+
+ Transactions are not automatically
+ closed when an error occurs. You must call
+ Ndb::closeTransaction() to close the
+ transaction.
+
+
+
+
One recommended way to handle a transaction failure (that
is, when an error is reported) is as shown here:
@@ -1428,6 +1442,253 @@
+
+ The NDB Kernel is the collection of
+ storage nodes belonging to a MySQL Cluster. The application
+ programmer can for most purposes view the set of all storage
+ nodes as a single entity. Each storage node is made up of three
+ main components:
+
+
+
+
+
+
+ TC: The transaction
+ co-ordinator.
+
+
+
+
+
+ ACC: The index storage
+ component.
+
+
+
+
+
+ TUP: The data storage
+ component.
+
+
+
+
+
+
+ When an application executes a transaction, it connects to one
+ transaction co-ordinator on one storage node. Usually, the
+ programmer does not need to specify which TC should be used, but
+ in some cases where performance is important, the programmer can
+ provide "hints" to use a certain TC. (If the node
+ with the desired transaction co-ordinator is down, then another
+ TC will automatically take its place.)
+
+
+
+ Each storage node has an ACC and a TUP which store the indexes
+ and data portions of the database table fragment. Even though a
+ single TC is responsible for the transaction, several ACCs and
+ TUPs on other storage nodes might be involved in that
+ transaction's execution.
+
+
+
+
+ Selecting a Transaction Co-Ordinator
+
+
+ The default method is to select the transaction co-ordinator
+ (TC) determined to be the "nearest" storage node, using a
+ heuristic for proximity based on the type of transporter
+ connection. In order of nearest to most distant, these are:
+
+
+
+
+
+
+ SCI
+
+
+
+
+
+ SHM
+
+
+
+
+
+ TCP/IP (localhost)
+
+
+
+
+
+ TCP/IP (remote host)
+
+
+
+
+
+
+ If there are several connections available with the same
+ proximity, one is selected for each transaction in a
+ round-robin fashion. Optionally, you may set the method for TC
+ selection to round-robin mode, where each new set of
+ transactions is placed on the next data node. The pool of
+ connections from which this selection is made consists of all
+ available connections.
+
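The default selection rule described above can be sketched in a few lines of C++. This is only an illustrative model, not the NDB API's implementation: the names Transporter, Connection and TcSelector are hypothetical, and the sketch assumes the pool of connections is fixed when the selector is built.

```cpp
#include <cstddef>
#include <vector>

// Transporter types in the proximity order given above (nearest first).
enum class Transporter { SCI = 0, SHM = 1, TcpLocal = 2, TcpRemote = 3 };

// Hypothetical description of one connection to a storage node.
struct Connection {
    int nodeId;
    Transporter type;
};

// Hypothetical selector: keeps only the connections with the nearest
// transporter type found in the pool, then rotates among them.
class TcSelector {
public:
    explicit TcSelector(const std::vector<Connection>& pool) {
        if (pool.empty()) return;
        Transporter best = pool.front().type;
        for (const Connection& c : pool)
            if (c.type < best) best = c.type;
        for (const Connection& c : pool)
            if (c.type == best) nearest_.push_back(c);
    }

    // Returns the node chosen for the next transaction,
    // or -1 if no connection is available.
    int nextNode() {
        if (nearest_.empty()) return -1;
        int id = nearest_[next_].nodeId;
        next_ = (next_ + 1) % nearest_.size();
        return id;
    }

private:
    std::vector<Connection> nearest_;
    std::size_t next_ = 0;
};
```

With a pool holding one remote TCP/IP connection and two SHM connections, successive calls to nextNode() alternate between the two SHM nodes and never pick the remote one.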
+
+
+ [js] Turn "For more info" into xrefs once we have IDs for
+ these sections.
+
+
+
+ As noted earlier, the
+ application programmer can provide hints to the NDB API as to
+ which transaction co-ordinator should be used. This is done by
+ providing a table and a partition key (usually the primary
+ key). If the primary key is used as the partition key, then the
+ transaction is placed on the node where the primary replica of
+ that record resides. Note that this is only a hint; the system
+ can be reconfigured at any time, in which case the NDB API
+ chooses a transaction co-ordinator without using the hint. For
+ more information, see
+ NdbDictionary::Column::getPartitionKey()
+ and Ndb::startTransaction(). The
+ application programmer can specify the partition key from SQL
+ by using the construct,
+
+
+
+CREATE TABLE ... ENGINE=NDB PARTITION BY KEY (attribute_list);
+
+
+
+ For additional information, see
+ Partitioning
+ and in particular
+ KEY
+ Partitioning in the MySQL Manual.
+
+
+
+
+
+
+ NDB Record Structure
+
+
+ The NDB Cluster storage engine used by
+ MySQL Cluster is a relational database engine storing records
+ in tables just as with any other database system. Table rows
+ represent records as tuples of relational data. When a new
+ table is created, its attribute schema is specified for the
+ table as a whole, and thus each table row has the same
+ structure. Again, this is typical of relational databases, and
+ NDB is no different in this regard.
+
+
+
+ Primary Keys
+
+
+
+ Each record has from 1 up to 32 attributes which belong to the
+ primary key of the table.
+
+
+
+ Transactions
+
+
+
+ Transactions are committed first to main memory, and then to
+ disk after a global checkpoint (GCP) is issued. Since all data
+ are (in most NDB Cluster configurations) synchronously
+ replicated and stored on multiple data nodes, the system can
+ handle processor failures without loss of data. However, in
+ the case of a system-wide failure, all transactions (committed
+ or not) occurring since the most recent GCP are lost.
+
+
+
+ Concurrency Control
+
+
+
+ NDB Cluster uses pessimistic
+ concurrency control based on locking. If a
+ requested lock (implicit and depending on database operation)
+ cannot be attained within a specified time, then a timeout
+ error results.
+
+
+
+ Concurrent transactions as requested by parallel application
+ programs and thread-based applications can sometimes deadlock
+ when they try to access the same information simultaneously.
+ Thus, applications need to be written in a manner such that
+ timeout errors occurring due to such deadlocks are handled
+ gracefully. This generally means that the transaction
+ encountering a timeout should be rolled back and restarted.
+
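The roll-back-and-restart pattern described above might look like the following sketch. It deliberately avoids real NDB API calls: TxResult and runWithRetry are hypothetical names, and the callable stands in for application code that starts a transaction, executes it, and closes it again before reporting the outcome.

```cpp
#include <functional>

// Hypothetical outcome of one attempt at running a transaction.
enum class TxResult { Committed, Timeout, Failed };

// Runs `attempt` until it commits, fails permanently, or `maxAttempts`
// is reached. The callable is expected to roll back and close its own
// transaction before returning TxResult::Timeout, so that each retry
// starts from a fresh transaction.
// Returns the number of attempts used on success, or 0 if it gave up.
int runWithRetry(const std::function<TxResult()>& attempt, int maxAttempts) {
    for (int n = 1; n <= maxAttempts; ++n) {
        TxResult r = attempt();
        if (r == TxResult::Committed) return n;
        if (r == TxResult::Failed) return 0;  // not a timeout: do not retry
        // Timeout (possibly a deadlock): loop and try again.
    }
    return 0;
}
```

Bounding the number of attempts is a design choice of this sketch; it keeps an application from retrying indefinitely when two transactions repeatedly deadlock on the same rows.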
+
+
+ Hints and Performance
+
+
+
+ Placing the transaction co-ordinator in close proximity to the
+ actual data used in the transaction can in many cases improve
+ performance significantly. This is particularly true for
+ systems using TCP/IP. For example, a Solaris system using a
+ single 500 MHz processor has a cost model for TCP/IP
+ communication which can be represented by the formula
+
+
+
+[30 microseconds] + ([100 nanoseconds] * [number of bytes])
+
+
+
+ This means that if we can ensure that we use
+ "popular" links we increase buffering and thus
+ drastically reduce the costs of communication. The same system
+ using SCI has a different cost model:
+
+
+
+[5 microseconds] + ([10 nanoseconds] * [number of bytes])
+
+
+
+ This means that the efficiency of an SCI system is much less
+ dependent on selection of transaction co-ordinators.
+ Typically, TCP/IP systems spend 30 to 60% of their working
+ time on communication, whereas for SCI systems this figure is
+ in the range of 5 to 10%. Thus, employing SCI for data
+ transport means that less effort from the NDB API programmer
+ is required and greater scalability can be achieved, even for
+ applications using data from many different parts of the
+ database.
+
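To make the effect of buffering concrete, the two cost formulas above can be written as functions and compared for separate versus batched sends. This is a sketch only: the function names are hypothetical, and the constants are simply those quoted above for the 500 MHz Solaris example, expressed in nanoseconds so that the arithmetic stays exact.

```cpp
#include <cstdint>

// Cost of one message in nanoseconds: fixed overhead plus per-byte cost.
std::int64_t tcpCostNanos(std::int64_t bytes) { return 30000 + 100 * bytes; }
std::int64_t sciCostNanos(std::int64_t bytes) { return 5000 + 10 * bytes; }

// Cost of sending `count` messages of `bytes` each, one at a time.
std::int64_t separateCost(std::int64_t (*cost)(std::int64_t),
                          std::int64_t count, std::int64_t bytes) {
    return count * cost(bytes);
}

// Cost of sending the same payload as a single buffered message.
std::int64_t batchedCost(std::int64_t (*cost)(std::int64_t),
                         std::int64_t count, std::int64_t bytes) {
    return cost(count * bytes);
}
```

For ten 100-byte messages, the TCP/IP model gives 400 microseconds when each message is sent separately and 130 microseconds when they are batched; the SCI model gives 60 and 15 microseconds respectively. The fixed per-message overhead dominates in both cases, which is why buffering on a shared link pays off.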
+
+
+ A simple example would be an application that uses many simple
+ updates where a transaction needs to update one record. This
+ record has a 32-bit primary key which also serves as the
+ partitioning key. In this case, keyData is set to
+ the address of the integer holding the primary key, and
+ keyLen is 4.
+
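The keyData and keyLen values for this example could be prepared as in the sketch below. PartitionKeyHint and makeHint are hypothetical helpers introduced here for illustration; they are not part of the NDB API, and how the two values are passed on depends on the Ndb::startTransaction() signature in the release being used.

```cpp
#include <cstdint>

// Hypothetical container for the hint values; not an NDB API type.
struct PartitionKeyHint {
    const char* keyData;   // address of the primary key value
    std::uint32_t keyLen;  // length of the key in bytes
};

// Builds a hint from a 32-bit primary key. Only the address is stored,
// so the caller must keep `pk` alive while the hint is in use.
PartitionKeyHint makeHint(const std::uint32_t& pk) {
    return PartitionKeyHint{reinterpret_cast<const char*>(&pk),
                            static_cast<std::uint32_t>(sizeof(pk))};
}
```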
+
+
+
@@ -1444,6 +1705,89 @@
+
+ When a transaction is sent using
+ NdbTransaction::execute(), it is not in reality
+ immediately transferred to the NDB Kernel. Instead, the
+ transaction is kept in a special send list (buffer) in the Ndb
+ object to which it belongs. The adaptive send algorithm decides
+ when transactions should actually be transferred to the NDB
+ kernel.
+
+
+
+ The NDB API is designed as a multi-threaded interface, and so it
+ is often desirable to transfer database operations from more
+ than one thread at a time. The NDB API keeps track of which Ndb
+ objects are active in transferring information to the NDB kernel
+ and the expected number of threads to interact with the NDB
+ kernel. Note that a given instance of Ndb should be used in at
+ most one thread; different threads should not share the same Ndb
+ object.
+
+
+
+ There are four conditions leading to the transfer of database
+ operations from Ndb object buffers to the NDB kernel:
+
+
+
+
+
+
+ The NDB Transporter (TCP/IP, OSE, SCI or shared memory)
+ decides that a buffer is full and sends it off. The buffer
+ size is implementation-dependent and may change between
+ MySQL Cluster releases. When TCP/IP is the transporter, the
+ buffer size is usually around 64 KB; when using OSE/Delta it
+ is usually less than 2000 bytes. Since each Ndb object
+ provides a single buffer per storage node, the notion of a
+ "full" buffer is local to each storage node.
+
+
+
+
+
+ The accumulation of statistical data on transferred
+ information may force sending of buffers to all storage
+ nodes.
+
+
+
+
+
+ Every 10 ms, a special transmission thread checks whether or
+ not any send activity has occurred. If not, then the thread
+ will force transmission to all nodes. This means that 20 ms
+ is the maximum amount of time that database operations are
+ kept waiting before being dispatched. A 10-millisecond limit
+ is likely in future releases of MySQL Cluster; checks more
+ frequent than this require additional support from the
+ operating system.
+
+
+
+
+
+ For methods that are affected by the adaptive send algorithm
+ (such as NdbTransaction::execute()), there is a force
+ parameter that overrides its default behaviour in this
+ regard and forces immediate transmission to all nodes. See
+ the individual NDB API class listings for more information.
+
+
+
+
+
+
+
+
+ The conditions listed above are subject to change in future
+ releases of MySQL Cluster.
+
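Three of the four conditions above (a full buffer, the periodic idle check, and the force parameter) can be modelled with a small class. This is a simplified sketch under stated assumptions, not the NDB implementation: SendBuffer and its members are hypothetical names, "sending" is reduced to emptying a byte counter, and the statistics-driven condition is left out.

```cpp
#include <cstddef>

// Hypothetical model of one per-node send buffer.
class SendBuffer {
public:
    explicit SendBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Queues `bytes`; flushes if the buffer is now full or `force` is set.
    // Returns true if a flush happened.
    bool queue(std::size_t bytes, bool force = false) {
        pending_ += bytes;
        if (force || pending_ >= capacity_) {
            flush();
            return true;
        }
        return false;
    }

    // Called by the periodic check (every 10 ms in the text). Flushes
    // pending data only if nothing was sent since the previous check.
    bool onTimerCheck() {
        bool flushed = false;
        if (!sentSinceLastCheck_ && pending_ > 0) {
            flush();
            flushed = true;
        }
        sentSinceLastCheck_ = false;  // start a new observation interval
        return flushed;
    }

    std::size_t pending() const { return pending_; }

private:
    void flush() {
        pending_ = 0;
        sentSinceLastCheck_ = true;
    }

    std::size_t capacity_;
    std::size_t pending_ = 0;
    bool sentSinceLastCheck_ = false;
};
```

In this model, data queued just after a flush survives the next timer check (because send activity was seen) and is forced out by the check after that, which corresponds to the 20 ms worst case mentioned above.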
+
+
+