List: Commits
From:jon Date:April 26 2006 2:15am
Subject:svn commit - mysqldoc@docsrva: r1939 - trunk/ndbapi
Author: jstephens
Date: 2006-04-26 04:15:23 +0200 (Wed, 26 Apr 2006)
New Revision: 1939

Log:

More DocBook-ification: Cluster Concepts Review, Adaptive Send Algorithm



Modified:
   trunk/ndbapi/overview.xml

Modified: trunk/ndbapi/overview.xml
===================================================================
--- trunk/ndbapi/overview.xml	2006-04-25 23:58:27 UTC (rev 1938)
+++ trunk/ndbapi/overview.xml	2006-04-26 02:15:23 UTC (rev 1939)
@@ -809,13 +809,17 @@
                 operations operates on a defined unique hash index.
               </para>
 
-              <para>
-                <emphasis role="bold">Note</emphasis>: If you want to
-                define multiple operations within the same transaction,
-                then you need to call NdbTransaction::getNdbOperation()
-                or NdbTransaction::getNdbIndexOperation() for each
-                operation.
-              </para>
+              <note>
+
+                <para>
+                  If you want to define multiple operations within the
+                  same transaction, then you need to call
+                  NdbTransaction::getNdbOperation() or
+                  NdbTransaction::getNdbIndexOperation() for each
+                  operation.
+                </para>
+
+              </note>
             </listitem>
 
             <listitem>
@@ -1081,15 +1085,18 @@
                 <literal>NdbIndexScanOperation::readTuples()</literal>).
               </para>
 
-              <para>
-                <emphasis role="bold">Note</emphasis>: If you want to
-                define multiple scan operations within the same
-                transaction, then you need to call
-                <literal>NdbTransaction::getNdbScanOperation()</literal>
-                or
-                <literal>NdbTransaction::getNdbIndexScanOperation()</literal>
-                separately for each operation.
-              </para>
+              <note>
+
+                <para>
+                  If you want to define multiple scan operations within
+                  the same transaction, then you need to call
+                  <literal>NdbTransaction::getNdbScanOperation()</literal>
+                  or
+                  <literal>NdbTransaction::getNdbIndexScanOperation()</literal>
+                  separately for each operation.
+                </para>
+
+              </note>
             </listitem>
 
             <listitem>
@@ -1111,12 +1118,16 @@
                 <literal>NdbScanFilter</literal> and bounds.
               </para>
 
-              <para>
-                <emphasis role="bold">Note</emphasis>: When
-                NdbScanFilter is used, each row is examined, whether or
-                not it is actually returned. However, when using bounds,
-                only rows within the bounds will be examined.
-              </para>
+              <note>
+
+                <para>
+                  When NdbScanFilter is used, each row is examined,
+                  whether or not it is actually returned. However, when
+                  using bounds, only rows within the bounds will be
+                  examined.
+                </para>
+
+              </note>
             </listitem>
 
             <listitem>
@@ -1353,14 +1364,17 @@
             information about the error.
           </para>
 
-          <para>
-            <emphasis role="bold">Note</emphasis>: Transactions are
-            <emphasis>not</emphasis> automatically closed when an error
-            occurs. You must call
-            <literal>Ndb::closeTransaction()</literal> to close the
-            transaction.
-          </para>
+          <note>
 
+            <para>
+              Transactions are <emphasis>not</emphasis> automatically
+              closed when an error occurs. You must call
+              <literal>Ndb::closeTransaction()</literal> to close the
+              transaction.
+            </para>
+
+          </note>
+
           <para>
             One recommended way to handle a transaction failure (that
             is, when an error is reported) is as shown here:
@@ -1428,6 +1442,253 @@
 
       </abstract>
 
+      <para>
+        The <firstterm>NDB Kernel</firstterm> is the collection of
+        storage nodes belonging to a MySQL Cluster. The application
+        programmer can for most purposes view the set of all storage
+        nodes as a single entity. Each storage node is made up of three
+        main components:
+      </para>
+
+      <itemizedlist>
+
+        <listitem>
+          <para>
+            <emphasis role="bold">TC</emphasis>: The transaction
+            co-ordinator.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            <emphasis role="bold">ACC</emphasis>: The index storage
+            component.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            <emphasis role="bold">TUP</emphasis>: The data storage
+            component.
+          </para>
+        </listitem>
+
+      </itemizedlist>
+
+      <para>
+        When an application executes a transaction, it connects to one
+        transaction co-ordinator on one storage node. Usually, the
+        programmer does not need to specify which TC should be used, but
+        in some cases where performance is important, the programmer can
+        provide <quote>hints</quote> to use a certain TC. (If the node
+        with the desired transaction co-ordinator is down, then another
+        TC will automatically take its place.)
+      </para>
+
+      <para>
+        Each storage node has an ACC and a TUP which store the indexes
+        and data portions of the database table fragment. Even though a
+        single TC is responsible for the transaction, several ACCs and
+        TUPs on other storage nodes might be involved in that
+        transaction's execution.
+      </para>
+
+      <section id="overview-selecting-tc">
+
+        <title>Selecting a Transaction Co-Ordinator</title>
+
+        <para>
+          The default method is to select the transaction co-ordinator
+          (TC) determined to be the <quote>nearest</quote> storage node,
+          using a heuristic for proximity based on the type of transporter
+          connection. In order of nearest to most distant, these are:
+        </para>
+
+        <orderedlist>
+
+          <listitem>
+            <para>
+              SCI
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              SHM
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              TCP/IP (localhost)
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              TCP/IP (remote host)
+            </para>
+          </listitem>
+
+        </orderedlist>
+
+        <para>
+          If there are several connections available with the same
+          proximity, one is selected for each transaction in a
+          round-robin fashion. Optionally, you may set the method for TC
+          selection to round-robin mode, where each new set of
+          transactions is placed on the next data node. The pool of
+          connections from which this selection is made consists of all
+          available connections.
+        </para>
+
+        <remark role="todo">
+          [js] Turn "For more info" into xrefs to sections once we have
+          IDs for these.
+        </remark>
+
+        <para>
+          As noted in <xref linkend="overview-cluster-concepts"/>, the
+          application programmer can provide hints to the NDB API as to
+          which transaction co-ordinator should be used. This is done by
+          providing a table and a partition key (usually the primary
+          key). If the primary key is used as the partition key, then the
+          transaction is placed on the node where the primary replica of
+          that record resides. Note that this is only a hint; the system
+          can be reconfigured at any time, in which case the NDB API
+          chooses a transaction co-ordinator without using the hint. For
+          more information, see
+          <literal>NdbDictionary::Column::getPartitionKey()</literal>
+          and <literal>Ndb::startTransaction()</literal>. The
+          application programmer can specify the partition key from SQL
+          by using the following construct:
+        </para>
+
+<programlisting>
+CREATE TABLE ... ENGINE=NDB PARTITION BY KEY (<replaceable>attribute_list</replaceable>);
+</programlisting>
+
+        <para>
+          For additional information, see
+          <ulink url="&refman-base-url;partitioning.html">Partitioning</ulink>
+          and in particular
+          <ulink url="&refman-base-url;partitioning-key.html"><literal>KEY</literal>
+          Partitioning</ulink> in the MySQL Manual.
+        </para>
+
+      </section>
+
+      <section id="overview-ndb-record-structure">
+
+        <title>NDB Record Structure</title>
+
+        <para>
+          The <literal>NDB Cluster</literal> storage engine used by
+          MySQL Cluster is a relational database engine storing records
+          in tables just as with any other database system. Table rows
+          represent records as tuples of relational data. When a new
+          table is created, its attribute schema is specified for the
+          table as a whole, and thus each table row has the same
+          structure. Again, this is typical of relational databases, and
+          <literal>NDB</literal> is no different in this regard.
+        </para>
+
+        <para>
+          <emphasis role="bold">Primary Keys</emphasis>
+        </para>
+
+        <para>
+          Each record has from 1 up to 32 attributes which belong to the
+          primary key of the table.
+        </para>
+
+        <para>
+          <emphasis role="bold">Transactions</emphasis>
+        </para>
+
+        <para>
+          Transactions are committed first to main memory, and then to
+          disk after a global checkpoint (GCP) is issued. Since all data
+          are (in most NDB Cluster configurations) synchronously
+          replicated and stored on multiple data nodes, the system can
+          handle processor failures without loss of data. However, in
+          the case of a system-wide failure, all transactions (committed
+          or not) occurring since the most recent GCP are lost.
+        </para>
+
+        <para>
+          <emphasis role="bold">Concurrency Control</emphasis>
+        </para>
+
+        <para>
+          <literal>NDB Cluster</literal> uses <firstterm>pessimistic
+          concurrency control</firstterm> based on locking. If a
+          requested lock (implicit and depending on database operation)
+          cannot be attained within a specified time, then a timeout
+          error results.
+        </para>
+
+        <para>
+          Concurrent transactions as requested by parallel application
+          programs and thread-based applications can sometimes deadlock
+          when they try to access the same information simultaneously.
+          Thus, applications need to be written in a manner such that
+          timeout errors occurring due to such deadlocks are handled
+          gracefully. This generally means that the transaction
+          encountering a timeout should be rolled back and restarted.
+        </para>
+
+        <para>
+          <emphasis role="bold">Hints and Performance</emphasis>
+        </para>
+
+        <para>
+          Placing the transaction co-ordinator in close proximity to the
+          actual data used in the transaction can in many cases improve
+          performance significantly. This is particularly true for
+          systems using TCP/IP. For example, a Solaris system using a
+          single 500 MHz processor has a cost model for TCP/IP
+          communication which can be represented by the formula
+        </para>
+
+<programlisting>
+[30 microseconds] + ([100 nanoseconds] * [number of bytes])
+</programlisting>
+
+        <para>
+          This means that if we can ensure that we use
+          <quote>popular</quote> links we increase buffering and thus
+          drastically reduce the costs of communication. The same system
+          using SCI has a different cost model:
+        </para>
+
+<programlisting>
+[5 microseconds] + ([10 nanoseconds] * [number of bytes])
+</programlisting>
+
+        <para>
+          This means that the efficiency of an SCI system is much less
+          dependent on selection of transaction co-ordinators.
+          Typically, TCP/IP systems spend 30 to 60% of their working
+          time on communication, whereas for SCI systems this figure is
+          in the range of 5 to 10%. Thus, employing SCI for data
+          transport means that less effort from the NDB API programmer
+          is required and greater scalability can be achieved, even for
+          applications using data from many different parts of the
+          database.
+        </para>
+
+        <para>
+          A simple example would be an application that uses many simple
+          updates where a transaction needs to update one record. This
+          record has a 32-bit primary key which also serves as the
+          partitioning key. In this case, <literal>keyData</literal> is
+          the address of the integer holding the primary key value, and
+          <literal>keyLen</literal> is <literal>4</literal>.
+        </para>
+
+      </section>
+
     </section>
 
     <section id="overview-adaptive-send">
@@ -1444,6 +1705,89 @@
 
       </abstract>
 
+      <para>
+        At the time a transaction is sent using
+        NdbTransaction::execute(), the transaction is in reality not
+        immediately transferred to the NDB Kernel. Instead, the
+        transaction is kept in a special send list (buffer) in the Ndb
+        object to which it belongs. The adaptive send algorithm decides
+        when transactions should actually be transferred to the NDB
+        kernel.
+      </para>
+
+      <para>
+        The NDB API is designed as a multi-threaded interface, and so it
+        is often desirable to transfer database operations from more
+        than one thread at a time. The NDB API keeps track of which Ndb
+        objects are active in transferring information to the NDB kernel
+        and the expected number of threads to interact with the NDB
+        kernel. Note that a given instance of Ndb should be used in at
+        most one thread; different threads should not share the same Ndb
+        object.
+      </para>
+
+      <para>
+        There are four conditions leading to the transfer of database
+        operations from Ndb object buffers to the NDB kernel:
+      </para>
+
+      <orderedlist>
+
+        <listitem>
+          <para>
+            The NDB Transporter (TCP/IP, OSE, SCI or shared memory)
+            decides that a buffer is full and sends it off. The buffer
+            size is implementation-dependent and may change between
+            MySQL Cluster releases. When TCP/IP is the transporter, the
+            buffer size is usually around 64 KB; when using OSE/Delta it
+            is usually less than 2000 bytes. Since each Ndb object
+            provides a single buffer per storage node, the notion of a
+            <quote>full</quote> buffer is local to each storage node.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            The accumulation of statistical data on transferred
+            information may force sending of buffers to all storage
+            nodes.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Every 10 ms, a special transmission thread checks whether or
+            not any send activity has occurred. If not, then the thread
+            will force transmission to all nodes. This means that 20 ms
+            is the maximum amount of time that database operations are
+            kept waiting before being dispatched. A 10-millisecond limit
+            is likely in future releases of MySQL Cluster; checks more
+            frequent than this require additional support from the
+            operating system.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            For methods that are affected by the adaptive send algorithm
+            (such as NdbTransaction::execute()), there is a force
+            parameter that overrides its default behaviour in this
+            regard and forces immediate transmission to all nodes. See
+            the individual NDB API class listings for more information.
+          </para>
+        </listitem>
+
+      </orderedlist>
+
+      <note>
+
+        <para>
+          The conditions listed above are subject to change in future
+          releases of MySQL Cluster.
+        </para>
+
+      </note>
+
     </section>
 
     <section id="overview-ndb-class-quickref">
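
Reader's note on the cost formulas added under "Hints and Performance" in this revision: the arithmetic behind the claim that buffering reduces communication cost can be checked with a small sketch. The helper names below (tcp_cost_ns, sci_cost_ns) are hypothetical and are not part of the NDB API; the constants are only the ones quoted in the diff for a single 500 MHz Solaris system, so the numbers are illustrative rather than a general performance model.

```cpp
#include <cstdint>

// Illustrative helpers only; these names are NOT part of the NDB API.
// Constants come from the two cost formulas quoted in the diff.
// All results are in nanoseconds.

// TCP/IP: 30 microseconds fixed cost plus 100 ns per byte.
constexpr std::uint64_t tcp_cost_ns(std::uint64_t bytes) {
    return 30000 + 100 * bytes;
}

// SCI: 5 microseconds fixed cost plus 10 ns per byte.
constexpr std::uint64_t sci_cost_ns(std::uint64_t bytes) {
    return 5000 + 10 * bytes;
}

// Ten separate 100-byte sends versus one buffered 1000-byte send.
static_assert(10 * tcp_cost_ns(100) == 400000, "TCP/IP, unbuffered");
static_assert(tcp_cost_ns(1000) == 130000, "TCP/IP, buffered");
static_assert(10 * sci_cost_ns(100) == 60000, "SCI, unbuffered");
static_assert(sci_cost_ns(1000) == 15000, "SCI, buffered");
```

Under these assumed message sizes, buffering ten 100-byte messages into one send saves 270 microseconds over TCP/IP but only 45 microseconds over SCI, because the fixed per-send cost dominates the TCP/IP formula. The functions are constexpr, so they can be evaluated at compile time or called at run time, for example tcp_cost_ns(payload_size) with any byte count.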
