Author: jstephens
Date: 2007-06-28 23:29:23 +0200 (Thu, 28 Jun 2007)
New Revision: 6948
Log:
More of the Start Phases doc.
Modified:
trunk/ndbapi/ndb-internals-start-phases.xml
trunk/ndbapi/start-phases-tmp.txt
Modified: trunk/ndbapi/ndb-internals-start-phases.xml
===================================================================
--- trunk/ndbapi/ndb-internals-start-phases.xml 2007-06-28 20:07:28 UTC (rev 6947)
+++ trunk/ndbapi/ndb-internals-start-phases.xml 2007-06-28 21:29:23 UTC (rev 6948)
Changed blocks: 12, Lines Added: 325, Lines Deleted: 37; 18065 bytes
@@ -11,8 +11,8 @@
<title>Read Configuration Phase (Phase -1)</title>
<para>
- Before the data node actually starts, a number of other things
- must be set up and initialized, including the block objects,
+ Before the data node actually starts, a number of other setup and
+ initialization tasks must be done for the block objects,
transporters, and watchdog checks, among others.
</para>
@@ -229,9 +229,9 @@
<para>
Most <literal>NDB</literal> kernel blocks begin their start phases
- at <literal>STTOR</literal> phase 1, with the exception of
+ at <literal>STTOR</literal> Phase 1, with the exception of
<literal>NDBFS</literal> and <literal>NDBCNTR</literal>, which
- begin with phase 0, as can be seen by inspecting the first value
+ begin with Phase 0, as can be seen by inspecting the first value
for each element in the <literal>ALL_BLOCKS</literal> array
(defined in
<filename>src/kernel/blocks/ndbcntr/NdbcntrMain.cpp</filename>).
@@ -246,14 +246,13 @@
<literal>STTOR</literal> signals are sent out in the order in
which the kernel blocks are listed in the
<literal>ALL_BLOCKS</literal> array. While
- <literal>NDBCNTR</literal> goes through the start phases all the
- way from start phase 0 to start phase 255, most of these start
- phases are empty.
+ <literal>NDBCNTR</literal> goes through start phases 0 to 255,
+ most of these are empty.
</para>
<para>
- Both activities in startphase 0 have to do with initialization of
- the <literal>NDB</literal> filesystem. First, if necessary,
+ Both activities in Phase 0 have to do with initialization of the
+ <literal>NDB</literal> filesystem. First, if necessary,
<literal>NDBFS</literal> creates the filesystem directory for the
data node. In the case of an initial start,
<literal>NDBCNTR</literal> clears any existing files from the
@@ -297,9 +296,13 @@
<para>
In the following table, and throughout this text, we sometimes
refer to <literal>STTOR</literal> start phases simply as
- <quote>start phases</quote>. <literal>NDB_STTOR</literal> start
- phases are always qualified as such, and so referred to as
- <quote><literal>NDB_STTOR</literal> start phases</quote>.
+ <quote>start phases</quote> or <quote>Phase
+ <replaceable>N</replaceable></quote> (where
+ <replaceable>N</replaceable> is some number).
+ <literal>NDB_STTOR</literal> start phases are always qualified
+ as such, and so referred to as
+ <quote><literal>NDB_STTOR</literal> start phases</quote> or
+ <quote><literal>NDB_STTOR</literal> phases</quote>.
</para>
</note>
@@ -384,11 +387,11 @@
</row>
<row>
<entry><literal>PGMAN</literal></entry>
- <entry>1, 3, 7 (phase 7 currently empty)</entry>
+ <entry>1, 3, 7 (Phase 7 currently empty)</entry>
</row>
<row>
<entry><literal>RESTORE</literal></entry>
- <entry>1,3 (only in phase 1 is any real work done)</entry>
+ <entry>1,3 (only in Phase 1 is any real work done)</entry>
</row>
</tbody>
</tgroup>
@@ -659,8 +662,8 @@
</para>
<para>
- <literal>NDBCNTR</literal> executes phase 2 of the
- <literal>NDB_STTOR</literal> startphases, with no other
+ <literal>NDBCNTR</literal> executes the second of the
+ <literal>NDB_STTOR</literal> start phases, with no other
<literal>NDBCNTR</literal> activity taking place during this
<literal>STTOR</literal> phase.
</para>
@@ -784,34 +787,34 @@
node.)
</para>
-<!-- STOP POINT -->
-
<para>
For node restarts and initial node restarts no more work is done
in this phase. For initial starts the work is done when all nodes
- have created the initial restart information and initialised the
+ have created the initial restart information and initialized the
system file.
</para>
<para>
- For system restarts this is where most of the work is performed
- activated by sending the NDB_STARTREQ signal from NDBCNTR to DBDIH
- in the master. This signal is sent when all nodes in the system
- restart have reached to this point in the restart. Thus we first
- have a synchronisation point. For a description of the system
- restart version of phase 4, see
- <xref linkend="ndb-internals-start-phases-system-restart-phase-4"/>.
+ For system restarts this is where most of the work is performed,
+ initiated by sending the <literal>NDB_STARTREQ</literal> signal
+ from <literal>NDBCNTR</literal> to <literal>DBDIH</literal> in the
+ master. This signal is sent when all nodes in the system restart
+ have reached this point in the restart. This we can mark as our
+ first synchronization point for system restarts, designated
+ <literal>WAITPOINT_4_1</literal>.
</para>
<para>
- SYNCHRONISATION POINT for system restarts: WAITPOINT_4_1
+ For a description of the system restart version of Phase 4, see
+ <xref linkend="ndb-internals-start-phases-system-restart-phase-4"/>.
</para>
<para>
- After completing execution of the NDB_STARTREQ signal the master
- will send CNTR_WAITREP with WAITPOINT_4_2 to all nodes, this will
- end the NDB_STTOR phase 3 and will also be last activity in
- startphase 4.
+ After completing execution of the <literal>NDB_STARTREQ</literal>
+ signal, the master sends a <literal>CNTR_WAITREP</literal> signal
+ with <literal>WAITPOINT_4_2</literal> to all nodes. This ends
+ <literal>NDB_STTOR</literal> phase 3 as well as
+ (<literal>STTOR</literal>) Phase 4.
</para>
</section>
@@ -820,7 +823,13 @@
<title><literal>STTOR</literal> Phase 5</title>
- <para></para>
+ <para>
+ All that takes place in Phase 5 is the delivery by
+ <literal>NDBCNTR</literal> of <literal>NDB_STTOR</literal> phase
+ 4; the only block that acts on this signal is
+ <literal>DBDIH</literal>, which controls most of the
+ database-related part of a data node start.
+ </para>
</section>
@@ -828,23 +837,250 @@
<title><literal>NDB_STTOR</literal> Phase 4</title>
- <para></para>
+ <para>
+ Some initialization of local checkpoint variables takes place in
+ this phase; for initial restarts, this is all that happens
+ here.
+ </para>
+ <para>
+ For system restarts, all required takeovers are also performed.
+ Currently, this means that all nodes whose states could not be
+ recovered using the redo log are restarted by copying to them all
+ the necessary data from the <quote>live</quote> data nodes.
+
+ <remark role="NOTE">
+ [js] Commented out until Mikael supplies the material on node
+ takeovers.
+ </remark>
+
+<!-- For a
+ description of this process, see
+ <xref linkend="ndb-internals-start-phases-takeovers"/>.
+ </para>
+ <para>-->
+
+ For node restarts and initial node restarts, the master node
+ performs a number of services when a
+ <literal>START_MEREQ</literal> signal is sent to it. This phase is
+ complete when the master responds with a
+ <literal>START_MECONF</literal> message, and is described in
+ <xref linkend="ndb-internals-start-phases-start-mereq-handling"/>.
+ </para>
+
+ <para>
+ After ensuring that the tasks assigned to <literal>DBDIH</literal>
+ in <literal>NDB_STTOR</literal> phase 4 are complete,
+ <literal>NDBCNTR</literal> performs some work on its own. For
+ initial starts, it creates the system table that keeps track of
+ unique identifiers such as those used for
+ <literal>AUTO_INCREMENT</literal>. Following the <literal>WAITPOINT_4_1</literal>
+ synchronization point, all system restarts proceed immediately to
+ <literal>NDB_STTOR</literal> phase 5, which is handled by the
+ <literal>DBDIH</literal> block. See
+ <xref linkend="ndb-internals-start-phases-ndb-sttor-5"/>, for more
+ information.
+ </para>
+
</section>
<section id="ndb-internals-start-phases-ndb-sttor-5">
<title><literal>NDB_STTOR</literal> Phase 5</title>
- <para></para>
+ <para>
+ For initial starts and system restarts this phase means executing
+ a local checkpoint. This is handled by the master so that the
+ other nodes will return immediately from this phase. Node restarts
+ and initial node restarts perform the copying of the records from
+ the primary replica to the starting replicas in this phase. Local
+ checkpoints are enabled before the copying process is begun.
+ </para>
+ <para>
+ Copying the data to a starting node is part of the node takeover
+ protocol. As part of this protocol, the node status of the
+ starting node is updated; this is communicated using the global
+ checkpoint protocol. Waiting for these events to take place
+ ensures that the new node status is communicated to all nodes and
+ their system files.
+ </para>
+
+ <para>
+ After the node's status has been communicated, all nodes are
+ signalled that we are about to start the takeover protocol for
+ this node. Part of this protocol consists of Steps 3 - 9 during
+ the system restart phase as described below. This means that
+ restoration of all the fragments, preparation for execution of the
+ redo log, execution of the redo log, and finally reporting back to
+ <literal>DBDIH</literal> when the execution of the redo log is
+ completed, are all part of this process.
+ </para>
+
+ <para>
+ After preparations are complete, the copy phase for each fragment in
+ the node must be performed. The process of copying a fragment
+ involves the following steps:
+
+ <orderedlist>
+
+ <listitem>
+ <para>
+ The <literal>DBLQH</literal> kernel block in the starting
+ node is informed that the copy process is about to begin by
+ sending it a <literal>PREPARE_COPY_FRAGREQ</literal> signal.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ When <literal>DBLQH</literal> acknowledges this request a
+ <literal>CREATE_FRAGREQ</literal> signal is sent to all
+ nodes to notify them of the preparation being made to copy data
+ to this replica for this table fragment.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ After all nodes have acknowledged this, a
+ <literal>COPY_FRAGREQ</literal> signal is sent to the node
+ from which the data is to be copied to the new node. This is
+ always the primary replica of the fragment. The node
+ indicated copies all the data over to the starting node in
+ response to this message.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ After copying has been completed, and a
+ <literal>COPY_FRAGCONF</literal> message is sent, all nodes
+ are notified of the completion through an
+ <literal>UPDATE_TOREQ</literal> signal.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ After all nodes have updated to reflect the new state of the
+ fragment, the <literal>DBLQH</literal> kernel block of the
+ starting node is informed of the fact that the copy has been
+ completed, and that the replica is now up-to-date and any
+ failures should now be treated as real failures.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The new replica is transformed into a primary replica if
+ this is the role it had when the table was created.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ After completing this change another round of
+ <literal>CREATE_FRAGREQ</literal> messages is sent to all
+ nodes informing them that the takeover of the fragment is
+ now committed.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ After this, the process is repeated with the next fragment if
+ another one exists.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ When there are no more fragments for takeover by the node,
+ all nodes are informed of this by sending an
+ <literal>UPDATE_TOREQ</literal> signal to all of them.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Wait for the next complete local checkpoint to occur,
+ running from start to finish.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The node states are updated, using a complete global
+ checkpoint. As with the local checkpoint in the previous
+ step, the global checkpoint must be allowed to start and
+ then to finish.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ When the global checkpoint has completed, it will
+ communicate the successful local checkpoint of this node
+ restart by sending an <literal>END_TOREQ</literal> signal to
+ all nodes.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ A <literal>START_COPYCONF</literal> is sent back to the
+ starting node informing it that the node restart has been
+ completed.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Receiving the <literal>START_COPYCONF</literal> signal ends
+ <literal>NDB_STTOR</literal> phase 5. This provides another
+ synchronization point for system restarts, designated as
+ <literal>WAITPOINT_5_2</literal>.
+ </para>
+ </listitem>
+
+ </orderedlist>
+ </para>
+
+ <note>
+ <para>
+ The copy process in this phase can in theory be performed in
+ parallel by several nodes. However, all messages from the master
+ to all nodes are currently sent to a single node at a time, although
+ this can be made completely parallel. This is likely to be done in
+ the not too distant future.
+ </para>
+ </note>
+
+ <para>
+ In an initial start and an initial node restart, the
+ <literal>SUMA</literal> block requests the subscriptions from the
+ <literal>SUMA</literal> master node. <literal>NDBCNTR</literal>
+ executes <literal>NDB_STTOR</literal> phase 6. No other
+ <literal>NDBCNTR</literal> activity takes place.
+ </para>
+
</section>
<section id="ndb-internals-start-phases-ndb-sttor-6">
<title><literal>NDB_STTOR</literal> Phase 6</title>
- <para></para>
+ <para>
+ In this <literal>NDB_STTOR</literal> phase, both
+ <literal>DBLQH</literal> and <literal>DBDICT</literal> clear their
+ internal variables representing the current restart type. The
+ <literal>DBACC</literal> block resets the system restart flag;
+ <literal>DBACC</literal> and <literal>DBTUP</literal> start a
+ periodic signal for checking memory usage once per second.
+ <literal>DBTC</literal> sets an internal variable indicating that
+ the system restart has been completed.
+ </para>
</section>
@@ -852,7 +1088,14 @@
<title><literal>STTOR</literal> Phase 6</title>
- <para></para>
+ <para>
+ The <literal>NDBCNTR</literal> block defines the cluster's node
+ groups, and the <literal>DBUTIL</literal> block initializes a
+ number of data structures to facilitate the sending of keyed
+ operations to the system tables. <literal>DBUTIL</literal>
+ also sets up a single connection to the <literal>DBTC</literal>
+ kernel block.
+ </para>
</section>
@@ -860,9 +1103,32 @@
<title><literal>STTOR</literal> Phase 7</title>
- <para></para>
+ <para>
+ In <literal>QMGR</literal> the president starts an arbitrator
+ (unless this feature has been disabled by setting the value of the
+ <literal>ArbitrationRank</literal> configuration parameter to 0
+ for all nodes — see
+ <xref linkend="mysql-cluster-mgm-definition"/>, and
+ <xref linkend="mysql-cluster-api-definition"/>, for more
+ information; note that this currently can be done only when using
+ MySQL Cluster Carrier Grade Edition). In addition, checking of API
+ nodes through heartbeats is activated.
+ </para>
+ <para>
+ Also during this phase, the <literal>BACKUP</literal> block sets
+ the disk write speed to the value used following the completion of
+ the restart. The master node during initial start also inserts the
+ record keeping track of which backup ID is to be used next. The
+ <literal>SUMA</literal> and <literal>DBTUX</literal> blocks set
+ variables indicating start phase 7 has been completed, and that
+ requests to <literal>DBTUX</literal> that occur when running the
+ redo log should no longer be ignored.
+ </para>
+
</section>
+
+<!-- STOP POINT -->
<section id="ndb-internals-start-phases-sttor-8">
@@ -904,4 +1170,26 @@
</section>
+<!--
+ <section id="ndb-internals-start-phases-takeovers">
+ <title>Handling of Node Takeovers</title>
+
+ <para>
+ <remark role="NOTE">
+ [js] Commented out until Mikael supplies the material on node
+ takeovers.
+ </remark>
+
+ </para>
+ </section>
+-->
+
+ <section id="ndb-internals-start-phases-start-mereq-handling">
+
+ <title><literal>START_MEREQ</literal> Handling</title>
+
+ <para></para>
+
+ </section>
+
</section>
Modified: trunk/ndbapi/start-phases-tmp.txt
===================================================================
--- trunk/ndbapi/start-phases-tmp.txt 2007-06-28 20:07:28 UTC (rev 6947)
+++ trunk/ndbapi/start-phases-tmp.txt 2007-06-28 21:29:23 UTC (rev 6948)
Changed blocks: 2, Lines Added: 2, Lines Deleted: 4; 784 bytes
@@ -192,11 +192,7 @@
sending heartbeats. It will also send a response message CM_ACKADD to
the president.
-[***STOP POINT***]
-
PROTOCOL TO INCLUDE STARTING NODE IN CLUSTER
-
-PROTOCOL TO INCLUDE STARTING NODE IN CLUSTER
--------------------------------------------
START NODE ALIVE NODES PRESIDENT API NODES
@@ -579,6 +575,8 @@
indicate that we can now stop ignoring requests to DBTUX that occurs
when running the REDO log.
+[* STOP POINT *]
+
STTOR PHASE 8
------------------------
NDBCNTR: