From: jon
Date: June 28 2007 9:29pm
Subject: svn commit - mysqldoc@docsrva: r6948 - trunk/ndbapi
Author: jstephens
Date: 2007-06-28 23:29:23 +0200 (Thu, 28 Jun 2007)
New Revision: 6948

Log:

More of the Start Phases doc.



Modified:
   trunk/ndbapi/ndb-internals-start-phases.xml
   trunk/ndbapi/start-phases-tmp.txt


Modified: trunk/ndbapi/ndb-internals-start-phases.xml
===================================================================
--- trunk/ndbapi/ndb-internals-start-phases.xml	2007-06-28 20:07:28 UTC (rev 6947)
+++ trunk/ndbapi/ndb-internals-start-phases.xml	2007-06-28 21:29:23 UTC (rev 6948)
Changed blocks: 12, Lines Added: 325, Lines Deleted: 37; 18065 bytes

@@ -11,8 +11,8 @@
     <title>Read Configuration Phase (Phase -1)</title>
 
     <para>
-      Before the data node actually starts, a number of other things
-      must be set up and initialized, including the block objects,
+      Before the data node actually starts, a number of other setup and
+      initialization tasks must be done for the block objects,
       transporters, and watchdog checks, among others.
     </para>
 

@@ -229,9 +229,9 @@
 
     <para>
       Most <literal>NDB</literal> kernel blocks begin their start phases
-      at <literal>STTOR</literal> phase 1, with the exception of
+      at <literal>STTOR</literal> Phase 1, with the exception of
       <literal>NDBFS</literal> and <literal>NDBCNTR</literal>, which
-      begin with phase 0, as can be seen by inspecting the first value
+      begin with Phase 0, as can be seen by inspecting the first value
       for each element in the <literal>ALL_BLOCKS</literal> array
       (defined in
       <filename>src/kernel/blocks/ndbcntr/NdbcntrMain.cpp</filename>).

@@ -246,14 +246,13 @@
       <literal>STTOR</literal> signals are sent out in the order in
       which the kernel blocks are listed in the
       <literal>ALL_BLOCKS</literal> array. While
-      <literal>NDBCNTR</literal> goes through the start phases all the
-      way from start phase 0 to start phase 255, most of these start
-      phases are empty.
+      <literal>NDBCNTR</literal> goes through start phases 0 to 255,
+      most of these are empty.
     </para>
 
     <para>
-      Both activities in startphase 0 have to do with initialization of
-      the <literal>NDB</literal> filesystem. First, if necessary,
+      Both activities in Phase 0 have to do with initialization of the
+      <literal>NDB</literal> filesystem. First, if necessary,
       <literal>NDBFS</literal> creates the filesystem directory for the
       data node. In the case of an initial start,
       <literal>NDBCNTR</literal> clears any existing files from the

@@ -297,9 +296,13 @@
       <para>
         In the following table, and throughout this text, we sometimes
         refer to <literal>STTOR</literal> start phases simply as
-        <quote>start phases</quote>. <literal>NDB_STTOR</literal> start
-        phases are always qualified as such, and so referred to as
-        <quote><literal>NDB_STTOR</literal> start phases</quote>.
+        <quote>start phases</quote> or <quote>Phase
+        <replaceable>N</replaceable></quote> (where
+        <replaceable>N</replaceable> is some number).
+        <literal>NDB_STTOR</literal> start phases are always qualified
+        as such, and so referred to as
+        <quote><literal>NDB_STTOR</literal> start phases</quote> or
+        <quote><literal>NDB_STTOR</literal> phases</quote>.
       </para>
     </note>
 

@@ -384,11 +387,11 @@
           </row>
           <row>
             <entry><literal>PGMAN</literal></entry>
-            <entry>1, 3, 7 (phase 7 currently empty)</entry>
+            <entry>1, 3, 7 (Phase 7 currently empty)</entry>
           </row>
           <row>
             <entry><literal>RESTORE</literal></entry>
-            <entry>1,3 (only in phase 1 is any real work done)</entry>
+            <entry>1, 3 (only in Phase 1 is any real work done)</entry>
           </row>
         </tbody>
       </tgroup>

@@ -659,8 +662,8 @@
     </para>
 
     <para>
-      <literal>NDBCNTR</literal> executes phase 2 of the
-      <literal>NDB_STTOR</literal> startphases, with no other
+      <literal>NDBCNTR</literal> executes the second of the
+      <literal>NDB_STTOR</literal> start phases, with no other
       <literal>NDBCNTR</literal> activity taking place during this
       <literal>STTOR</literal> phase.
     </para>

@@ -784,34 +787,34 @@
       node.)
     </para>
 
-<!--  STOP POINT  -->
-
     <para>
       For node restarts and initial node restarts no more work is done
       in this phase. For initial starts the work is done when all nodes
-      have created the initial restart information and initialised the
+      have created the initial restart information and initialized the
       system file.
     </para>
 
     <para>
-      For system restarts this is where most of the work is performed
-      activated by sending the NDB_STARTREQ signal from NDBCNTR to DBDIH
-      in the master. This signal is sent when all nodes in the system
-      restart have reached to this point in the restart. Thus we first
-      have a synchronisation point. For a description of the system
-      restart version of phase 4, see
-      <xref linkend="ndb-internals-start-phases-system-restart-phase-4"/>.
+      For system restarts this is where most of the work is performed,
+      initiated by sending the <literal>NDB_STARTREQ</literal> signal
+      from <literal>NDBCNTR</literal> to <literal>DBDIH</literal> in the
+      master. This signal is sent when all nodes in the system restart
+      have reached this point in the restart. This we can mark as our
+      first synchronization point for system restarts, designated
+      <literal>WAITPOINT_4_1</literal>.
     </para>
 
     <para>
-      SYNCHRONISATION POINT for system restarts: WAITPOINT_4_1
+      For a description of the system restart version of Phase 4, see
+      <xref linkend="ndb-internals-start-phases-system-restart-phase-4"/>.
     </para>
 
     <para>
-      After completing execution of the NDB_STARTREQ signal the master
-      will send CNTR_WAITREP with WAITPOINT_4_2 to all nodes, this will
-      end the NDB_STTOR phase 3 and will also be last activity in
-      startphase 4.
+      After completing execution of the <literal>NDB_STARTREQ</literal>
+      signal, the master sends a <literal>CNTR_WAITREP</literal> signal
+      with <literal>WAITPOINT_4_2</literal> to all nodes. This ends
+      <literal>NDB_STTOR</literal> phase 3 as well as
+      (<literal>STTOR</literal>) Phase 4.
     </para>
 
   </section>

@@ -820,7 +823,13 @@
 
     <title><literal>STTOR</literal> Phase 5</title>
 
-    <para></para>
+    <para>
+      All that takes place in Phase 5 is the delivery by
+      <literal>NDBCNTR</literal> of <literal>NDB_STTOR</literal> phase
+      4; the only block that acts on this signal is
+      <literal>DBDIH</literal>, which controls most of the
+      database-related part of a data node start.
+    </para>
 
   </section>
 

@@ -828,23 +837,250 @@
 
     <title><literal>NDB_STTOR</literal> Phase 4</title>
 
-    <para></para>
+    <para>
+      Some initialization of local checkpoint variables takes place in
+      this phase; for initial restarts, this is all that happens in
+      this phase.
+    </para>
 
+    <para>
+      For system restarts, all required takeovers are also performed.
+      Currently, this means that all nodes whose states could not be
+      recovered using the redo log are restarted by copying to them all
+      the necessary data from the <quote>live</quote> data nodes.
+
+      <remark role="NOTE">
+        [js] Commented out until Mikael supplies the material on node
+        takeovers.
+      </remark>
+
+<!-- For a
+      description of this process, see 
+      <xref linkend="ndb-internals-start-phases-takeovers"/>. 
+    </para>
+    <para>-->
+
+      For node restarts and initial node restarts, the master node
+      performs a number of services when requested to do so by a
+      <literal>START_MEREQ</literal> signal sent to it. This phase is
+      complete when the master responds with a
+      <literal>START_MECONF</literal> message, and is described in
+      <xref linkend="ndb-internals-start-phases-start-mereq-handling"/>.
+    </para>
+
+    <para>
+      After ensuring that the tasks assigned to <literal>DBDIH</literal>
+      in <literal>NDB_STTOR</literal> phase 4 are complete,
+      <literal>NDBCNTR</literal> performs some work on its own. For
+      initial starts, it creates the system table that keeps track of
+      unique identifiers such as those used for
+      <literal>AUTO_INCREMENT</literal>. Following the
+      <literal>WAITPOINT_4_1</literal> synchronization point, all system
+      restarts proceed immediately to <literal>NDB_STTOR</literal> phase
+      5, which is handled by the <literal>DBDIH</literal> block. See
+      <xref linkend="ndb-internals-start-phases-ndb-sttor-5"/>, for more
+      information.
+    </para>
+
   </section>
 
   <section id="ndb-internals-start-phases-ndb-sttor-5">
 
     <title><literal>NDB_STTOR</literal> Phase 5</title>
 
-    <para></para>
+    <para>
+      For initial starts and system restarts this phase means executing
+      a local checkpoint. This is handled by the master so that the
+      other nodes will return immediately from this phase. Node restarts
+      and initial node restarts perform the copying of the records from
+      the primary replica to the starting replicas in this phase. Local
+      checkpoints are enabled before the copying process is begun.
+    </para>
 
+    <para>
+      Copying the data to a starting node is part of the node takeover
+      protocol. As part of this protocol, the node status of the
+      starting node is updated; this is communicated using the global
+      checkpoint protocol. Waiting for these events to take place
+      ensures that the new node status is communicated to all nodes and
+      their system files.
+    </para>
+
+    <para>
+      After the node's status has been communicated, all nodes are
+      signalled that we are about to start the takeover protocol for
+      this node. Part of this protocol consists of Steps 3 - 9 during
+      the system restart phase as described below. This means that
+      restoration of all the fragments, preparation for execution of the
+      redo log, execution of the redo log, and finally reporting back to
+      <literal>DBDIH</literal> when the execution of the redo log is
+      completed, are all part of this process.
+    </para>
+
+    <para>
+      After preparations are complete, the copy phase for each fragment in
+      the node must be performed. The process of copying a fragment
+      involves the following steps:
+
+      <orderedlist>
+
+        <listitem>
+          <para>
+            The <literal>DBLQH</literal> kernel block in the starting
+            node is informed that the copy process is about to begin by
+            sending it a <literal>PREPARE_COPY_FRAGREQ</literal> signal.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            When <literal>DBLQH</literal> acknowledges this request, a
+            <literal>CREATE_FRAGREQ</literal> signal is sent to all
+            nodes to notify them of the preparation being made to copy
+            data to this replica for this table fragment.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            After all nodes have acknowledged this, a
+            <literal>COPY_FRAGREQ</literal> signal is sent to the node
+            from which the data is to be copied to the new node. This is
+            always the primary replica of the fragment. The node
+            indicated copies all the data over to the starting node in
+            response to this message.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            After copying has been completed, and a
+            <literal>COPY_FRAGCONF</literal> message is sent, all nodes
+            are notified of the completion through an
+            <literal>UPDATE_TOREQ</literal> signal.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            After all nodes have updated to reflect the new state of the
+            fragment, the <literal>DBLQH</literal> kernel block of the
+            starting node is informed of the fact that the copy has been
+            completed, and that the replica is now up-to-date and any
+            failures should now be treated as real failures.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            The new replica is transformed into a primary replica if
+            this is the role it had when the table was created.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            After completing this change another round of
+            <literal>CREATE_FRAGREQ</literal> messages is sent to all
+            nodes informing them that the takeover of the fragment is
+            now committed.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            After this, the process is repeated with the next fragment
+            if another one exists.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            When there are no more fragments for takeover by the node,
+            all nodes are informed of this by means of an
+            <literal>UPDATE_TOREQ</literal> signal sent to all of them.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Wait for the next complete local checkpoint to occur,
+            running from start to finish.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            The node states are updated, using a complete global
+            checkpoint. As with the local checkpoint in the previous
+            step, the global checkpoint must be allowed to start and
+            then to finish.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            When the global checkpoint has completed, it will
+            communicate the successful local checkpoint of this node
+            restart by sending an <literal>END_TOREQ</literal> signal to
+            all nodes.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            A <literal>START_COPYCONF</literal> is sent back to the
+            starting node informing it that the node restart has been
+            completed.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Receiving the <literal>START_COPYCONF</literal> signal ends
+            <literal>NDB_STTOR</literal> phase 5. This provides another
+            synchronization point for system restarts, designated as
+            <literal>WAITPOINT_5_2</literal>.
+          </para>
+        </listitem>
+
+      </orderedlist>
+    </para>
+
+    <note>
+      <para>
+        The copy process in this phase can in theory be performed in
+        parallel by several nodes. However, all messages from the master
+        to all nodes are currently sent to a single node at a time,
+        although this can be made completely parallel. This is likely to
+        be done in the not too distant future.
+      </para>
+    </note>
+
+    <para>
+      In an initial start or an initial node restart, the
+      <literal>SUMA</literal> block requests the subscriptions from the
+      <literal>SUMA</literal> master node. <literal>NDBCNTR</literal>
+      executes <literal>NDB_STTOR</literal> phase 6. No other
+      <literal>NDBCNTR</literal> activity takes place.
+    </para>
+
   </section>
 
   <section id="ndb-internals-start-phases-ndb-sttor-6">
 
     <title><literal>NDB_STTOR</literal> Phase 6</title>
 
-    <para></para>
+    <para>
+      In this <literal>NDB_STTOR</literal> phase, both
+      <literal>DBLQH</literal> and <literal>DBDICT</literal> clear their
+      internal variables representing the current restart type. The
+      <literal>DBACC</literal> block resets the system restart flag;
+      <literal>DBACC</literal> and <literal>DBTUP</literal> start a
+      periodic signal for checking memory usage once per second.
+      <literal>DBTC</literal> sets an internal variable indicating that
+      the system restart has been completed.
+    </para>
 
   </section>
 

@@ -852,7 +1088,14 @@
 
     <title><literal>STTOR</literal> Phase 6</title>
 
-    <para></para>
+    <para>
+      The <literal>NDBCNTR</literal> block defines the cluster's node
+      groups, and the <literal>DBUTIL</literal> block initializes a
+      number of data structures to facilitate the sending of keyed
+      operations to the system tables. <literal>DBUTIL</literal>
+      also sets up a single connection to the <literal>DBTC</literal>
+      kernel block.
+    </para>
 
   </section>
 

@@ -860,9 +1103,32 @@
 
     <title><literal>STTOR</literal> Phase 7</title>
 
-    <para></para>
+    <para>
+      In <literal>QMGR</literal> the president starts an arbitrator
+      (unless this feature has been disabled by setting the value of the
+      <literal>ArbitrationRank</literal> configuration parameter to 0
+      for all nodes &mdash; see
+      <xref linkend="mysql-cluster-mgm-definition"/>, and
+      <xref linkend="mysql-cluster-api-definition"/>, for more
+      information; note that this currently can be done only when using
+      MySQL Cluster Carrier Grade Edition). In addition, checking of API
+      nodes through heartbeats is activated.
+    </para>
 
+    <para>
+      Also during this phase, the <literal>BACKUP</literal> block sets
+      the disk write speed to the value used following the completion of
+      the restart. The master node during initial start also inserts the
+      record keeping track of which backup ID is to be used next. The
+      <literal>SUMA</literal> and <literal>DBTUX</literal> blocks set
+      variables indicating start phase 7 has been completed, and that
+      requests to <literal>DBTUX</literal> that occur when running the
+      redo log should no longer be ignored.
+    </para>
+
   </section>
+  
+<!--  STOP POINT  -->
 
   <section id="ndb-internals-start-phases-sttor-8">
 

@@ -904,4 +1170,26 @@
 
   </section>
 
+<!--  
+  <section id="ndb-internals-start-phases-takeovers">
+    <title>Handling of Node Takeovers</title>
+    
+    <para>
+      <remark role="NOTE">
+        [js] Commented out until Mikael supplies the material on node
+        takeovers.
+      </remark>
+      
+    </para>
+  </section>
+-->
+
+  <section id="ndb-internals-start-phases-start-mereq-handling">
+
+    <title>START_MEREQ Handling</title>
+
+    <para></para>
+
+  </section>
+
 </section>


Modified: trunk/ndbapi/start-phases-tmp.txt
===================================================================
--- trunk/ndbapi/start-phases-tmp.txt	2007-06-28 20:07:28 UTC (rev 6947)
+++ trunk/ndbapi/start-phases-tmp.txt	2007-06-28 21:29:23 UTC (rev 6948)
Changed blocks: 2, Lines Added: 2, Lines Deleted: 4; 784 bytes

@@ -192,11 +192,7 @@
 sending heartbeats. It will also send a response message CM_ACKADD to
 the president.
 
-[***STOP POINT***]
-
 PROTOCOL TO INCLUDE STARTING NODE IN CLUSTER
-
-PROTOCOL TO INCLUDE STARTING NODE IN CLUSTER
 --------------------------------------------
 
 START NODE                    ALIVE NODES             PRESIDENT         API NODES

@@ -579,6 +575,8 @@
 indicate that we can now stop ignoring requests to DBTUX that occurs
 when running the REDO log.
 
+[*  STOP  POINT  *]
+
 STTOR PHASE 8
 ------------------------
 NDBCNTR:
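
Reviewer's sketch (not part of r6948): the fragment copy sequence that the
new NDB_STTOR Phase 5 text describes can be easier to follow as an ordered
trace of signals. The Python below is only an illustration under stated
assumptions. The signal names come from the ordered list in the diff;
SignalTrace, copy_fragment, run_takeover, and the destination strings are
hypothetical names invented for this sketch and do not correspond to
anything in the NDB kernel source.

```python
from dataclasses import dataclass, field


@dataclass
class SignalTrace:
    """Hypothetical recorder: collects (signal, destination) pairs in order."""
    events: list = field(default_factory=list)

    def send(self, signal: str, destination: str) -> None:
        self.events.append((signal, destination))


def copy_fragment(trace: SignalTrace, primary_node: str) -> None:
    """Record the per-fragment signal order given in the Phase 5 text."""
    # Step 1: DBLQH in the starting node is told that a copy is about to begin.
    trace.send("PREPARE_COPY_FRAGREQ", "DBLQH in starting node")
    # Step 2: all nodes are notified of the preparation.
    trace.send("CREATE_FRAGREQ", "all nodes")
    # Step 3: the node holding the primary replica copies the data over.
    trace.send("COPY_FRAGREQ", primary_node)
    # Step 4: once COPY_FRAGCONF has been sent, all nodes learn of completion.
    trace.send("UPDATE_TOREQ", "all nodes")
    # Steps 5-6 (DBLQH informed, replica role restored) name no signal in the
    # text, so nothing is recorded for them here.
    # Step 7: a second CREATE_FRAGREQ round commits the fragment takeover.
    trace.send("CREATE_FRAGREQ", "all nodes")


def run_takeover(trace: SignalTrace, primary_nodes: list) -> SignalTrace:
    """Record the whole takeover, one entry in primary_nodes per fragment."""
    for primary_node in primary_nodes:  # step 8: repeat for each fragment
        copy_fragment(trace, primary_node)
    # Step 9: no more fragments to take over.
    trace.send("UPDATE_TOREQ", "all nodes")
    # Steps 10-11: a complete local checkpoint, then a complete global
    # checkpoint, must run from start to finish (not modelled here).
    # Step 12: the successful checkpoint is communicated to all nodes.
    trace.send("END_TOREQ", "all nodes")
    # Steps 13-14: START_COPYCONF ends NDB_STTOR phase 5 on the starting node.
    trace.send("START_COPYCONF", "starting node")
    return trace


if __name__ == "__main__":
    for signal, destination in run_takeover(SignalTrace(), ["node 2"]).events:
        print(f"{signal:22} -> {destination}")
```

Calling run_takeover with one list entry per fragment records five signals
for each fragment followed by the three closing signals. The checkpoint
waits are left as comments because the text names no signals for them.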

