From: jon Date: April 26 2006 2:15am Subject: svn commit - mysqldoc@docsrva: r1939 - trunk/ndbapi List-Archive: http://lists.mysql.com/commits/5536 Message-Id: <200604260215.k3Q2FQpY002758@docsrva.mysql.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Author: jstephens Date: 2006-04-26 04:15:23 +0200 (Wed, 26 Apr 2006) New Revision: 1939 Log: More DocBook-ification: Cluster Concepts Review, Adaptive Send Algorithm Modified: trunk/ndbapi/overview.xml Modified: trunk/ndbapi/overview.xml =================================================================== --- trunk/ndbapi/overview.xml 2006-04-25 23:58:27 UTC (rev 1938) +++ trunk/ndbapi/overview.xml 2006-04-26 02:15:23 UTC (rev 1939) @@ -809,13 +809,17 @@ operations operates on a defined unique hash index. - - Note: If you want to - define multiple operations within the same transaction, - then you need to call NdbTransaction::getNdbOperation() - or NdbTransaction::getNdbIndexOperation() for each - operation. - + + + + If you want to define multiple operations within the + same transaction, then you need to call + NdbTransaction::getNdbOperation() or + NdbTransaction::getNdbIndexOperation() for each + operation. + + + @@ -1081,15 +1085,18 @@ NdbIndexScanOperation::readTuples()). - - Note: If you want to - define multiple scan operations within the same - transaction, then you need to call - NdbTransaction::getNdbScanOperation() - or - NdbTransaction::getNdbIndexScanOperation() - separately for each operation. - + + + + If you want to define multiple scan operations within + the same transaction, then you need to call + NdbTransaction::getNdbScanOperation() + or + NdbTransaction::getNdbIndexScanOperation() + separately for each operation. + + + @@ -1111,12 +1118,16 @@ NdbScanFilter and bounds. - - Note: When - NdbScanFilter is used, each row is examined, whether or - not it is actually returned. However, when using bounds, - only rows within the bounds will be examined. 
- + + + + When NdbScanFilter is used, each row is examined, + whether or not it is actually returned. However, when + using bounds, only rows within the bounds will be + examined. + + + @@ -1353,14 +1364,17 @@ information about the error. - - Note: Transactions are - not automatically closed when an error - occurs. You must call - Ndb::closeTransaction() to close the - transaction. - + + + Transactions are not automatically + closed when an error occurs. You must call + Ndb::closeTransaction() to close the + transaction. + + + + One recommended way to handle a transaction failure (that is, when an error is reported) is as shown here: @@ -1428,6 +1442,253 @@ + + The NDB Kernel is the collection of + storage nodes belonging to a MySQL Cluster. The application + programmer can for most purposes view the set of all storage + nodes as a single entity. Each storage node is made up of three + main components: + + + + + + + TC: The transaction + co-ordinator. + + + + + + ACC: The index storage + component. + + + + + + TUP: The data storage + component. + + + + + + + When an application executes a transaction, it connects to one + transaction co-ordinator on one storage node. Usually, the + programmer does not need to specify which TC should be used, but + in some cases where performance is important, the programmer can + provide hints to use a certain TC. (If the node + with the desired transaction co-ordinator is down, then another + TC will automatically take its place.) + + + + Each storage node has an ACC and a TUP which store the indexes + and data portions of the database table fragment. Even though a + single TC is responsible for the transaction, several ACCs and + TUPs on other storage nodes might be involved in that + transaction's execution. + + +
+ + Selecting a Transaction Co-Ordinator + + + The default method is to select the transaction co-ordinator + (TC) determined to be the "nearest" storage node, using a + heuristic for proximity based on the type of transporter + connection. In order of nearest to most distant, these are: + + + + + + + SCI + + + + + + SHM + + + + + + TCP/IP (localhost) + + + + + + TCP/IP (remote host) + + + + + + + If there are several connections available with the same + proximity, one is selected for each transaction in a + round-robin fashion. Optionally, you may set the method for TC + selection to round-robin mode, where each new set of + transactions is placed on the next data node. The pool of + connections from which this selection is made consists of all + available connections. + + + + [js] Turn "For more info" into xrefs to sections once we have + IDs for these. + + + + As noted in , the + application programmer can provide hints to the NDB API as to + which transaction co-ordinator should be used. This is done by + providing a table and a partition key (usually the primary + key). If the primary key is the partition key, then the + transaction is placed on the node where the primary replica of + that record resides. Note that this is only a hint; the system + can be reconfigured at any time, in which case the NDB API + chooses a transaction co-ordinator without using the hint. For + more information, see + NdbDictionary::Column::getPartitionKey() + and Ndb::startTransaction(). The + application programmer can specify the partition key from SQL + by using the construct, + + + +CREATE TABLE ... ENGINE=NDB PARTITION BY KEY (attribute_list); + + + + For additional information, see + Partitioning + and in particular + KEY + Partitioning in the MySQL Manual. + + 
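The default selection rule described above (prefer the nearest transporter type, then rotate among connections of equal proximity) can be sketched roughly as follows. This is only an illustration and not NDB API code: the names Transporter, Connection and TcSelector are hypothetical, and the real heuristic may weigh factors not mentioned in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Transporter types ordered from nearest to most distant, as listed in the text.
enum class Transporter { SCI = 0, SHM = 1, TcpLocalhost = 2, TcpRemote = 3 };

// Hypothetical description of one connection to a storage node.
struct Connection {
    int nodeId;
    Transporter type;
};

// Hypothetical selector: picks a node for each new transaction.
class TcSelector {
public:
    explicit TcSelector(std::vector<Connection> conns) : conns_(std::move(conns)) {
        assert(!conns_.empty());
        // Find the nearest transporter type among the available connections.
        best_ = conns_.front().type;
        for (const Connection& c : conns_) {
            if (c.type < best_) best_ = c.type;
        }
    }

    // Round-robin over the connections that share the nearest proximity.
    int nextNode() {
        for (;;) {
            const Connection& c = conns_[cursor_];
            cursor_ = (cursor_ + 1) % conns_.size();
            if (c.type == best_) return c.nodeId;
        }
    }

private:
    std::vector<Connection> conns_;
    Transporter best_;
    std::size_t cursor_ = 0;
};
```

With two SHM connections and two TCP/IP connections, successive calls to nextNode() alternate between the two SHM nodes and never return a TCP/IP node.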
+ +
+ + NDB Record Structure + + + The NDB Cluster storage engine used by + MySQL Cluster is a relational database engine storing records + in tables just as with any other database system. Table rows + represent records as tuples of relational data. When a new + table is created, its attribute schema is specified for the + table as a whole, and thus each table row has the same + structure. Again, this is typical of relational databases, and + NDB is no different in this regard. + + + + Primary Keys + + + + Each record has from 1 up to 32 attributes which belong to the + primary key of the table. + + + + Transactions + + + + Transactions are committed first to main memory, and then to + disk after a global checkpoint (GCP) is issued. Since all data + are (in most NDB Cluster configurations) synchronously + replicated and stored on multiple data nodes, the system can + handle processor failures without loss of data. However, in + the case of a system-wide failure, all transactions (committed + or not) occurring since the most recent GCP are lost. + + + + Concurrency Control + + + + NDB Cluster uses pessimistic + concurrency control based on locking. If a + requested lock (implicit and depending on database operation) + cannot be attained within a specified time, then a timeout + error results. + + + + Concurrent transactions as requested by parallel application + programs and thread-based applications can sometimes deadlock + when they try to access the same information simultaneously. + Thus, applications need to be written in a manner such that + timeout errors occurring due to such deadlocks are handled + gracefully. This generally means that the transaction + encountering a timeout should be rolled back and restarted. + + + + Hints and Performance + + + + Placing the transaction co-ordinator in close proximity to the + actual data used in the transaction can in many cases improve + performance significantly. This is particularly true for + systems using TCP/IP. 
For example, a Solaris system using a + single 500 MHz processor has a cost model for TCP/IP + communication which can be represented by the formula + + + +[30 microseconds] + ([100 nanoseconds] * [number of bytes]) + + + + This means that if we can ensure that we use + popular links we increase buffering and thus + drastically reduce the costs of communication. The same system + using SCI has a different cost model: + + + +[5 microseconds] + ([10 nanoseconds] * [number of bytes]) + + + + This means that the efficiency of an SCI system is much less + dependent on selection of transaction co-ordinators. + Typically, TCP/IP systems spend 30 to 60% of their working + time on communication, whereas for SCI systems this figure is + in the range of 5 to 10%. Thus, employing SCI for data + transport means that less effort from the NDB API programmer + is required and greater scalability can be achieved, even for + applications using data from many different parts of the + database. + + + + A simple example would be an application that uses many simple + updates where a transaction needs to update one record. This + record has a 32-bit primary key which also serves as the + partitioning key. Then the keyData is used + as the address of the integer of the primary key and + keyLen is 4. + + +
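The two cost formulas quoted above can be compared numerically with a short sketch. The constants come from the text (30 microseconds plus 100 nanoseconds per byte for TCP/IP, 5 microseconds plus 10 nanoseconds per byte for SCI); the type and function names are hypothetical, and the batching comparison assumes that the fixed term is paid once per send, which is one reading of why increased buffering reduces cost.

```cpp
#include <cassert>
#include <cstdint>

// Cost model of a link: fixed cost per send plus a cost per byte, in nanoseconds.
struct LinkCost {
    std::int64_t fixedNs;
    std::int64_t perByteNs;
};

// Figures quoted in the text for a single 500 MHz Solaris system.
const LinkCost kTcpIp = {30000, 100};  // 30 us + 100 ns per byte
const LinkCost kSci = {5000, 10};      // 5 us + 10 ns per byte

// Cost of one send carrying `bytes` bytes.
std::int64_t sendCostNs(const LinkCost& link, std::int64_t bytes) {
    assert(bytes >= 0);
    return link.fixedNs + link.perByteNs * bytes;
}

// Cost of `count` messages of `bytes` each, every message sent on its own.
std::int64_t separateCostNs(const LinkCost& link, std::int64_t count, std::int64_t bytes) {
    return count * sendCostNs(link, bytes);
}

// Cost of the same messages buffered together into a single send
// (assumption: the fixed term is paid only once per send).
std::int64_t batchedCostNs(const LinkCost& link, std::int64_t count, std::int64_t bytes) {
    return sendCostNs(link, count * bytes);
}
```

Under these assumptions, ten 100-byte messages cost 400 microseconds over TCP/IP when sent separately but 130 microseconds when buffered into one send; over SCI the same figures are 60 and 15 microseconds, so the absolute saving from buffering is much smaller.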
+
@@ -1444,6 +1705,89 @@ + + At the time a transaction is sent using + NdbTransaction::execute(), the transaction is in reality not + immediately transferred to the NDB Kernel. Instead, the + transaction is kept in a special send list (buffer) in the Ndb + object to which it belongs. The adaptive send algorithm decides + when transactions should actually be transferred to the NDB + kernel. + + + + The NDB API is designed as a multi-threaded interface, and so it + is often desirable to transfer database operations from more + than one thread at a time. The NDB API keeps track of which Ndb + objects are active in transferring information to the NDB kernel + and the expected number of threads to interact with the NDB + kernel. Note that a given instance of Ndb should be used in at + most one thread; different threads should not share the same Ndb + object. + + + + There are four conditions leading to the transfer of database + operations from Ndb object buffers to the NDB kernel: + + + + + + + The NDB Transporter (TCP/IP, OSE, SCI or shared memory) + decides that a buffer is full and sends it off. The buffer + size is implementation-dependent and may change between + MySQL Cluster releases. When TCP/IP is the transporter, the + buffer size is usually around 64 KB; when using OSE/Delta it + is usually less than 2000 bytes. Since each Ndb object + provides a single buffer per storage node, the notion of a + full buffer is local to each storage node. + + + + + + The accumulation of statistical data on transferred + information may force sending of buffers to all storage + nodes. + + + + + + Every 10 ms, a special transmission thread checks whether or + not any send activity has occurred. If not, then the thread + will force transmission to all nodes. This means that 20 ms + is the maximum amount of time that database operations are + kept waiting before being dispatched. 
A 10-millisecond limit + is likely in future releases of MySQL Cluster; checks more + frequent than this require additional support from the + operating system. + + + + + + For methods that are affected by the adaptive send algorithm + (such as NdbTransaction::execute()), there is a force + parameter that overrides its default behaviour in this + regard and forces immediate transmission to all nodes. See + the individual NDB API class listings for more information. + + + + + + + + + The conditions listed above are subject to change in future + releases of MySQL Cluster. + + + + 
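The 20 ms bound stated in the third condition above follows from the 10 ms check interval, and a small model makes the arithmetic concrete. This is a simplified, hypothetical model and not NDB code: it assumes checks happen at exact multiples of the period, that a check which observes a send since the previous check does nothing, and that no other send condition fires while the operation waits. Times are in microseconds.

```cpp
#include <cassert>
#include <cstdint>

// Round `t` up to the nearest multiple of `period` (t >= 0, period > 0).
std::int64_t nextTickAtOrAfter(std::int64_t t, std::int64_t period) {
    assert(t >= 0 && period > 0);
    return ((t + period - 1) / period) * period;
}

// Hypothetical model of the periodic check: an ordinary send happens at
// `lastSendUs`, and an operation is queued at `queuedUs` (not earlier than
// that send, so it missed that transmission). Returns how long the
// operation waits until a periodic check forces it out.
std::int64_t forcedSendWaitUs(std::int64_t lastSendUs, std::int64_t queuedUs,
                              std::int64_t periodUs = 10000) {
    assert(queuedUs >= lastSendUs);
    // The first check at or after the send observes send activity.
    const std::int64_t tickSeeingSend = nextTickAtOrAfter(lastSendUs, periodUs);
    // The first check at or after the operation was queued.
    std::int64_t flushTick = nextTickAtOrAfter(queuedUs, periodUs);
    // If that check is the one that saw activity, it does not force a send,
    // so the operation waits for the following check.
    if (flushTick == tickSeeingSend) flushTick += periodUs;
    return flushTick - queuedUs;
}
```

In this model the worst case is an operation queued immediately after a send that itself occurred just after a check: the next check sees activity and does nothing, and the check after that forces the transmission, giving a wait just under two periods (20 ms).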