From:jon Date:June 26 2007 3:50pm
Subject:svn commit - mysqldoc@docsrva: r6913 - trunk/ndbapi
Author: jstephens
Date: 2007-06-26 15:50:43 +0200 (Tue, 26 Jun 2007)
New Revision: 6913

Log:

Word-wrap start phases text file.



Modified:
   trunk/ndbapi/start-phases-tmp.txt


Modified: trunk/ndbapi/start-phases-tmp.txt
===================================================================
--- trunk/ndbapi/start-phases-tmp.txt	2007-06-26 12:30:12 UTC (rev 6912)
+++ trunk/ndbapi/start-phases-tmp.txt	2007-06-26 13:50:43 UTC (rev 6913)
Changed blocks: 10, Lines Added: 446, Lines Deleted: 128; 56372 bytes

@@ -1,11 +1,14 @@
 Documentation of NDB Data node start:
 
-Already before the start occurs the block objects, the transporters, the watchdog and a number of other things are set-up and initialised.
+Even before the start occurs, the block objects, the transporters, the
+watchdog and a number of other things are set up and initialised.
 
 1) Starts in main.cpp with calls to
 globalEmulatorData.theThreadConfig->doStart
-When starting with -n there is only one call to it and otherwise there are
-two calls where the second call will start. The doStart-routine will send the
+When starting with -n there is only one call to it; otherwise there are
+two calls, and the second call performs the actual start. The doStart
+routine sends the
 signal START_ORD to CMVMI and on the second call to this routine there
 will be a START_ORD signal sent to NDBCNTR.
 

@@ -36,39 +39,76 @@
 
 READ CONFIG PHASE (PHASE = -1)
 ---------------------------------------------------
-The READ_CONFIG_REQ signal gives all the blocks a chance to read the configuration. The configuration is stored in a global object accessible to all
-blocks. This phase also performs all memory allocations. Thus after this phase there are no more memory allocations in the NDB data nodes.
+The READ_CONFIG_REQ signal gives all the blocks a chance to read the
+configuration. The configuration is stored in a global object accessible
+to all blocks. This phase also performs all memory allocations. Thus
+after this phase there are no more memory allocations in the NDB data
+nodes.
 
 This phase also sets up the connections between the blocks and the
-NDB filesystem. This is necessary to enable the blocks to easily communicate
+NDB filesystem. This is necessary to enable the blocks to easily communicate
 which parts of a struct are to be written to disk.
 
-There are two ways that NDB performs memory allocations. It is using the allocRecord-method which is the old method where records are accessed using the old macros ptrCheckGuard. The other method is using templates where the method to allocate the struct-array is setSize.
+NDB performs memory allocation in two ways. The old way uses the
+allocRecord method, with records accessed through the old ptrCheckGuard
+macros. The other way uses templates, where the method that allocates
+the struct array is setSize.
 
-These methods does sometime also initialise the memory. It ensures both the memory allocation and the initialisation is done with watchdog protection.
+These methods sometimes also initialise the memory. They ensure that
+both the memory allocation and the initialisation are done under
+watchdog protection.
 
-Many blocks also perform block-specific initialisation per struct. Often these initialisations entails building linked list or double-linked list and in some cases hash-tables.
+Many blocks also perform block-specific initialisation per struct.
+Often this initialisation entails building linked lists or doubly
+linked lists and, in some cases, hash tables.
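A minimal Python sketch of the allocation pattern described above, assuming a single record type: the array is sized once (compare setSize), its records are chained into a free list during per-struct initialisation, and every access goes through a bounds check in the spirit of ptrCheckGuard. The class and method names are hypothetical and the real blocks are C++; the sketch only illustrates why no allocation is needed after the read config phase.

```python
# Hypothetical sketch; RecordPool and its methods are illustrative names,
# not NDB source code.
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class RecordPool(Generic[T]):
    """Fixed-size record array, sized once at start-up."""

    def __init__(self) -> None:
        self._records: List[T] = []
        self._next_free: List[int] = []  # free-list links, -1 terminates
        self._first_free = -1
        self._frozen = False

    def set_size(self, n: int, factory: Callable[[], T]) -> None:
        """Allocate and initialise all n records (read config phase only)."""
        if self._frozen:
            raise RuntimeError("no memory allocation after the read config phase")
        self._records = [factory() for _ in range(n)]
        # Per-struct initialisation: chain record i to record i + 1.
        self._next_free = [i + 1 for i in range(n)]
        if n:
            self._next_free[-1] = -1
        self._first_free = 0 if n else -1

    def freeze(self) -> None:
        """Mark the end of the read config phase."""
        self._frozen = True

    def seize(self) -> int:
        """Take a record from the free list; no new memory is allocated."""
        i = self._first_free
        if i == -1:
            raise MemoryError("record pool exhausted")
        self._first_free = self._next_free[i]
        return i

    def release(self, i: int) -> None:
        """Return record i to the head of the free list."""
        self.get_checked(i)  # validates the index
        self._next_free[i] = self._first_free
        self._first_free = i

    def get_checked(self, i: int) -> T:
        """Bounds-checked access, in the spirit of ptrCheckGuard."""
        if not 0 <= i < len(self._records):
            raise IndexError(f"record index {i} out of range")
        return self._records[i]
```

For example, `set_size(3, dict)` followed by `freeze()` lets `seize()` hand out indices 0, 1 and 2; a fourth `seize()` raises `MemoryError` instead of allocating, and calling `set_size` again raises `RuntimeError`.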
 
-Many of the sizes used in allocation are calculated in the method calcSizeAlt in the file Configuration.cpp.
+Many of the sizes used in allocation are calculated in the method
+calcSizeAlt in the file Configuration.cpp.
 
-Some preparation for more intelligent pooling of memory resources have been started. The DataMemory and the disk records belongs to this Global Memory Pool currently.
+Some preparation for more intelligent pooling of memory resources has
+been started. Currently the DataMemory and the disk records belong to
+this Global Memory Pool.
 
 STTOR PHASE 0
 -----------------------
-Most blocks start their start phases at phase 1. Only NDBFS and NDBCNTR start at phase 0. This can be found in the ALL_BLOCKS array. Also when sending the STTOR to a block, the return signal STTORRY will always contain a list of the start phases which the block is interested in. Only in those startphases will the block actually receive a STTOR signal.
+Most blocks start their start phases at phase 1. Only NDBFS and NDBCNTR
+start at phase 0, as can be seen in the ALL_BLOCKS array. Also, when
+NDBCNTR sends STTOR to a block, the reply signal STTORRY always contains
+a list of the start phases in which the block is interested. The block
+receives a STTOR signal only in those start phases.
 
-STTOR signals are sent in the order that blocks are listed in the ALL_BLOCKS array. NDBCNTR goes through the startphases all the way from startphase 0 to startphase 255. Most of these startphases are empty.
+STTOR signals are sent in the order that blocks are listed in the
+ALL_BLOCKS array. NDBCNTR goes through the startphases all the way from
+startphase 0 to startphase 255. Most of these startphases are empty.
 
-Both activities in startphase 0 have to do with initialisation of the file system. First NDBFS creates the directory for the data node and then in the case of an initial start NDBCNTR will clear files from the directory of the data node to ensure that DBDIH later doesn't discover a system file and thus it will interpret the start as an initial start.
+Both activities in startphase 0 have to do with initialisation of the
+file system. First NDBFS creates the directory for the data node; then,
+in the case of an initial start, NDBCNTR clears files from the directory
+of the data node so that DBDIH later finds no system file and therefore
+interprets the start as an initial start.
 
-Every time that NDBCNTR has completed sending of one startphase to all blocks it will send the signal NODE_STATE_REP to all blocks which effectively updates the NodeState in all blocks.
+Each time NDBCNTR has finished sending a startphase to all blocks, it
+sends the signal NODE_STATE_REP to all blocks, which updates the
+NodeState in all blocks.
 
-Every time that NDBCNTR has completed one non-empty startphase it will report this to the management server and in most cases this will end up in the cluster log.
+Each time NDBCNTR has completed a non-empty startphase, it reports this
+to the management server; in most cases the report ends up in the
+cluster log.
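The STTOR sequencing described above can be modelled in a few lines. This is a hedged Python sketch, not NDBCNTR's code: the Block class and the phase lists in the test are illustrative assumptions, and each block's STTORRY interest list is simplified to a list known up front rather than one returned with every STTORRY.

```python
# Hypothetical model of NDBCNTR's STTOR loop; names are illustrative.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    name: str
    phases: List[int]                     # start phases listed in STTORRY
    received: List[int] = field(default_factory=list)


def run_start_phases(all_blocks: List[Block], last_phase: int = 255) -> List[str]:
    """Walk start phases 0..last_phase, sending STTOR in ALL_BLOCKS order
    only to interested blocks, and return the events NDBCNTR would emit."""
    events: List[str] = []
    for phase in range(last_phase + 1):
        non_empty = False
        for block in all_blocks:          # ALL_BLOCKS order is preserved
            if phase in block.phases:
                block.received.append(phase)  # stands in for STTOR/STTORRY
                events.append(f"STTOR {phase} -> {block.name}")
                non_empty = True
        events.append(f"NODE_STATE_REP {phase}")  # after every phase
        if non_empty:
            events.append(f"report phase {phase} to management server")
    return events
```

With blocks interested only in phases 0 and 1, the loop still emits 256 NODE_STATE_REP events, but only two phase reports reach the management server.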
 
-Finally after completing all startphases NDBCNTR will update the node state in all blocks through the NODE_STATE_REP signal and also send a event report about that the startphases are all completed. Then also all other nodes in the cluster will receive a report about that  this node has completed all its startphases to ensure all nodes are in synch about their state. Each node will send NODE_START_REP to all blocks but in reality only DBDIH listens to it to know when it can unlock the lock on DBDICT for schema changes.
+Finally, after completing all startphases, NDBCNTR updates the node
+state in all blocks through the NODE_STATE_REP signal and also sends an
+event report stating that all startphases are complete. All other nodes
+in the cluster also receive a report that this node has completed all
+its startphases, so that all nodes are in sync about their state. Each
+node sends NODE_START_REP to all blocks, but in reality only DBDIH
+listens to it, to know when it can release the lock on DBDICT for
+schema changes.
 
 List of start phases that the blocks will listen to:
-(Note: This list is likely to changer over time so source code is the real source of this information)
+(Note: This list is likely to change over time, so the source code is
+the authoritative source of this information.)
 NDBFS: Phase 0
 DBTC: Phase 1
 DBDIH: Phase 1

@@ -92,84 +132,172 @@
 STTOR PHASE 1
 -----------------------
 
-In most blocks this is where the phases a block is involved in. Most blocks doesn't really do anything apart from this other than potentially initialising some data.
+Most blocks take part in this phase, but most of them do little in it
+other than possibly initialising some data.
 
 DBTC initialises a couple of variables.
 DBLQH initialises block references to DBTUP.
 DBACC initialises block references to DBTUP and DBLQH
 DBTUP initialises references to the blocks DBLQH, TSMAN and LGMAN.
 
-NDBCNTR initialises a few variables and sets up block references to DBTUP, DBLQH, DBACC, DBTC, DBDIH and DBDICT which will be used in the special startphase handling of these blocks using the NDB_STTOR signals. This is where the bulk of the start of the node actually takes place.
+NDBCNTR initialises a few variables and sets up block references to
+DBTUP, DBLQH, DBACC, DBTC, DBDIH and DBDICT which will be used in the
+special startphase handling of these blocks using the NDB_STTOR signals.
+This is where the bulk of the start of the node actually takes place.
 
 CMVMI locks memory if this is configured to happen.
 
-QMGR calls initData that performs work which all other blocks handles in the READ_CONFIG_REQ phase. After these initialisations it does send the DIH_RESTARTREQ signal to DIH. This signal will discover whether a proper system file exists in which case we're not performing an initial start. After receiving this signal the process of getting the node integrated among the other data nodes in the cluster happens. In this process data nodes enter the cluster serially one at a time. The first one to enter will become master and whenever the master dies the new master will always be the node that was started longest time ago.
+QMGR calls initData, which performs the work that all other blocks do
+in the READ_CONFIG_REQ phase. After these initialisations it sends the
+DIH_RESTARTREQ signal to DIH. This signal determines whether a proper
+system file exists; if one does, we are not performing an initial
+start. After the response to this signal is received, the node is
+integrated among the other data nodes in the cluster. In this process
+data nodes enter the cluster serially, one at a time. The first one to
+enter becomes master, and whenever the master dies the new master is
+always the node that was started longest ago.
 
-QMGR will first set-up timers to ensure that the inclusion in the cluster doesn't take longer than what we configured for. Then communication to all other data nodes will be established. Then we send a CM_REGREQ signal to all data nodes. Only the president of the cluster will respond to this signal and the president will allow one node at a time to enter the cluster. If no node responds within 3 seconds then the node will make itself master. If several nodes start-up simultaneously then the node with the lowest node id will become president. The president will send CM_REGCONF in response to this signal but will also send CM_ADD to all nodes currently alive.
+QMGR first sets up timers to ensure that inclusion in the cluster does
+not take longer than configured. Then communication to all other data
+nodes is established, and a CM_REGREQ signal is sent to all data nodes.
+Only the president of the cluster responds to this signal, and the
+president allows one node at a time to enter the cluster. If no node
+responds within 3 seconds, the node makes itself master. If several
+nodes start up simultaneously, the node with the lowest node id becomes
+president. The president sends CM_REGCONF in response to this signal
+and also sends CM_ADD to all nodes currently alive.
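The two selection rules in this part of the text (lowest node id among simultaneous starters, and the node started longest ago as successor when the master dies, stated earlier in this section) can be sketched as follows. This is an illustrative Python model under stated assumptions: how a node learns about other simultaneous starters is not shown, and the join order is assumed to be a list of node ids in the order they entered the cluster. The function names are hypothetical.

```python
# Hypothetical sketch of the election rules; not QMGR code.
from typing import Iterable, List, Optional, Set

REGREQ_TIMEOUT_S = 3.0  # "If no node responds within 3 seconds"


def president_after_regreq(my_id: int, president_reply: Optional[int],
                           waited_s: float,
                           other_starters: Iterable[int] = ()) -> Optional[int]:
    """Return the president as seen by a starting node, or None if it
    should keep waiting for a reply to CM_REGREQ."""
    if president_reply is not None:        # only the president answers
        return president_reply
    if waited_s < REGREQ_TIMEOUT_S:
        return None
    # Nobody answered: among nodes starting simultaneously, lowest id wins.
    return min({my_id, *other_starters})


def next_master(join_order: List[int], failed: Set[int]) -> int:
    """Nodes join one at a time, so the earliest-joined survivor is the
    node that was started longest ago and becomes the new master."""
    for node_id in join_order:
        if node_id not in failed:
            return node_id
    raise RuntimeError("no surviving data node")
```

A lone node that hears nothing for 3 seconds elects itself; with nodes 2 and 7 starting alongside node 5, node 2 becomes president.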
 
-Next step is that the starting node sends CM_NODEINFOREQ to all current nodes alive. When the alive nodes receive this signal they will send a NODE_VERSION_REP to all API nodes they have connected to them. Each node will send CM_ACKADD to the president to inform the president that they have heard the CM_NODEINFOREQ signal from the new node. Finally they will send the CM_NODEINFOCONF response to the starting node. When the starting node has received all these signals it will also send the CM_ACKADD signal to the president. Thus when the president has received all CM_ACKADD he knows that all nodes have replied to the CM_NODEINFOREQ and also the starting node have heard all this responses. When the president receives the last CM_ACKADD it sends CM_ADD to all nodes currently in the cluster except for the starting node. When receiving this signal the nodes will enable communication to the new node, start sending heartbeats to it, set up the new node in the list of neighbours for heartbeat protocol. Finally it will reset the start struct to be able to handle new starting nodes and then it will send CM_ACKADD to the president. The president will send CM_ADD to the starting node after receiving the CM_ACKADD from all nodes. Upon reception of this signal the new node will enable all communication channels to the currently alive data nodes, will set-up heartbeat structures and will start sending heartbeats. It will also send a response message CM_ACKADD to the president.
+Next, the starting node sends CM_NODEINFOREQ to all nodes currently
+alive. When the alive nodes receive this signal, they send a
+NODE_VERSION_REP to all API nodes connected to them. Each node sends
+CM_ACKADD to the president to inform it that it has heard the
+CM_NODEINFOREQ signal from the new node. Finally, each node sends the
+CM_NODEINFOCONF response to the starting node. When the starting node
+has received all these signals, it also sends the CM_ACKADD signal to
+the president. Thus, when the president has received all CM_ACKADD
+signals, it knows that all nodes have replied to the CM_NODEINFOREQ and
+that the starting node has heard all these responses. When the
+president receives the last CM_ACKADD, it sends CM_ADD to all nodes
+currently in the cluster except the starting node. On receiving this
+signal, each node enables communication to the new node, starts sending
+heartbeats to it and sets up the new node in its list of neighbours for
+the heartbeat protocol. Finally, it resets the start struct so that it
+can handle new starting nodes, and then sends CM_ACKADD to the
+president. After receiving CM_ACKADD from all nodes, the president
+sends CM_ADD to the starting node. On receiving this signal, the new
+node enables all communication channels to the currently alive data
+nodes, sets up its heartbeat structures and starts sending heartbeats.
+It also sends a CM_ACKADD response to the president.
 
 PROTOCOL TO INCLUDE STARTING NODE IN CLUSTER
---------------------------------------------------------------------------------
+--------------------------------------------------------------------------------
 
 START NODE          ALIVE NODES          PRESIDENT           API NODES
      |                   |                   |                   |
      |  CM_REGREQ        |                   |                   |
      |----------------->>|                   |                   |
      |                   |                   |                   |
      |                   |  CM_REGCONF       |                   |
      |<<-------------------------------------|                   |
      |                   |  CM_ADD           |                   |
      |                   |<<-----------------|                   |
      |<--------------------------------------|                   |
      |                   |                   |                   |
      |  CM_NODEINFOREQ   |                   |                   |
      |----------------->>|                   |                   |
      |                   |                   |                   |
      |                   |                   |  NODE_VERSION_REP |
      |                   |------------------------------------->>|
      |                   |                   |                   |
      |  CM_NODEINFOCONF  |                   |                   |
      |<<-----------------|                   |                   |
      |                   |                   |                   |
      |                   |  CM_ACKADD        |                   |
      |                   |----------------->>|                   |
      |                   |                   |                   |
      |  CM_ACKADD        |                   |                   |
      |-------------------------------------->|                   |
      |                   |                   |                   |
      |                   |  CM_ADD           |                   |
      |                   |<<-----------------|                   |
      |                   |                   |                   |
      |                   |  CM_ACKADD        |                   |
      |                   |----------------->>|                   |
      |                   |                   |                   |
      |                   |  CM_ADD           |                   |
      |<--------------------------------------|                   |
      |                   |                   |                   |
      |  CM_ACKADD        |                   |                   |
      |-------------------------------------->|                   |
 
 NOTE: ALIVE NODES include the president
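Seen from the president, the inclusion protocol described above is three rounds, each closed by collecting CM_ACKADD signals. The following Python sketch models only that bookkeeping and is an assumption-laden illustration, not QMGR code: it treats the president as one of the alive nodes that also acknowledges, ignores timeouts and node failures, and records outgoing CM_ADD signals in a list instead of sending them. PresidentInclusion and its members are hypothetical names.

```python
# Hypothetical sketch of the president's CM_ACKADD bookkeeping.
from typing import List, Set, Tuple


class PresidentInclusion:
    """President-side CM_ACKADD bookkeeping for one starting node."""

    def __init__(self, alive: Set[int], new_node: int) -> None:
        self.alive = set(alive)            # includes the president itself
        self.new_node = new_node
        self.round = 1
        # Round 1: every alive node and the starting node acknowledge
        # the CM_NODEINFOREQ exchange.
        self.waiting = self.alive | {new_node}
        self.sent: List[Tuple[str, int]] = []

    @property
    def done(self) -> bool:
        return self.round == 4

    def on_ackadd(self, sender: int) -> None:
        if self.done:
            return
        self.waiting.discard(sender)
        if self.waiting:
            return                         # still waiting for more acks
        if self.round == 1:
            # Let all nodes already in the cluster add the new node.
            self.sent += [("CM_ADD", n) for n in sorted(self.alive)]
            self.waiting = set(self.alive)
            self.round = 2
        elif self.round == 2:
            # Everyone has added it; now tell the starting node.
            self.sent.append(("CM_ADD", self.new_node))
            self.waiting = {self.new_node}
            self.round = 3
        else:
            self.round = 4                 # starting node acknowledged
```

Counting acknowledgements per round, rather than reacting to each one, is what lets the president know that every node has seen the previous step before the next CM_ADD goes out.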
 
-As a final step QMGR will also start the timer handling it is responsible for. This means that it will generate a signal every 10 ms to blocks that have requested this signal. This signal will be sent thus 100 times per second even if one signal sending is delayed somewhat.
+As a final step, QMGR also starts the timer handling for which it is
+responsible: it generates a signal every 10 ms to the blocks that have
+requested it. The signal is thus sent 100 times per second, even if an
+individual send is delayed somewhat.
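The 100-per-second guarantee holds if each tick is scheduled from the previous due time rather than from the moment the timer happens to run, so a late wake-up emits the ticks it missed. A Python sketch of that catch-up scheme, assuming millisecond timestamps; it illustrates the idea only and is not QMGR's implementation (TenMsTicker is a hypothetical name).

```python
# Hypothetical catch-up ticker; illustrates the idea, not QMGR code.
TICK_MS = 10  # one signal every 10 ms


class TenMsTicker:
    def __init__(self, start_ms: int) -> None:
        self.next_due_ms = start_ms + TICK_MS
        self.ticks = 0

    def poll(self, now_ms: int) -> int:
        """Emit every tick that has become due; return how many."""
        emitted = 0
        while now_ms >= self.next_due_ms:
            self.ticks += 1
            emitted += 1
            # Advance from the due time, not from now_ms, so delays
            # do not accumulate into a lower average rate.
            self.next_due_ms += TICK_MS
        return emitted
```

With wake-ups at 10, 20, 45, 50 and 1000 ms, the call at 45 ms emits two ticks (those due at 30 and 40 ms), and the total after one second is 100.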
 
-BACKUP will start the periodic signal to ensure that we don't write too much data to disk. We will ensure that we keep the writes within the limits of the what the configuration has specified for during restarts and after restarts.
+BACKUP starts the periodic signal that ensures we don't write too much
+data to disk, keeping the writes within the limits that the
+configuration specifies, both during and after restarts.
 DBUTIL initialises the transaction identity.
 DBTUX sets up reference to DBTUP block.
 PGMAN initialises pointers to the LGMAN block and the DBTUP block.
-RESTORE sets up references to the DBLQH and DBTUP block to enable fast access to those blocks.
+RESTORE sets up references to the DBLQH and DBTUP blocks to enable fast
+access to those blocks.
 
 STTOR PHASE 2
 ------------------------
-The only block that participates in this phase with real work is NDBCNTR.
-In this phase NDBCNTR will get all configured nodes in the cluster and the state of each of them at this moment. There will be messages sent to NDBCNTR from QMGR reporting any changes in status of the nodes. NDBCNTR will set-up timers for StartPartialTimeout, StartPartitionTimeout and StartFailureTimeout.
+The only block that participates in this phase with real work is
+NDBCNTR. In this phase NDBCNTR gets all configured nodes in the cluster
+and the current state of each of them. QMGR sends messages to NDBCNTR
+reporting any changes in the status of the nodes. NDBCNTR sets up
+timers for StartPartialTimeout, StartPartitionTimeout and
+StartFailureTimeout.
 
-The next step is to CNTR_START_REQ to the proposed master node. Normally the president choosen is also choosen as master. However if there is a system restart and the starting node has a newer global checkpoint that he survived then this node will take over as master node although he isn't president in QMGR. If the starting node is choosen as new master then all other nodes will be informed about this through a CNTR_START_REF signal.
+The next step is to send CNTR_START_REQ to the proposed master node.
+Normally the president is also chosen as master. However, in a system
+restart where the starting node has survived a newer global checkpoint,
+this node takes over as master node even though it is not president in
+QMGR. If the starting node is chosen as the new master, all other nodes
+are informed about this through a CNTR_START_REF signal.
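The master choice described above can be sketched as a small function. This is a Python illustration under assumptions: each candidate's newest survived global checkpoint is available as a number (gci), and ties are broken in favour of the president and then the lowest node id. That tie-breaking rule is a guess, not something this document states, and choose_master is a hypothetical name.

```python
# Hypothetical sketch; the tie-breaking order is an assumption.
from typing import Dict


def choose_master(president: int, gci: Dict[int, int], system_restart: bool) -> int:
    """Return the node id that should act as master for the start."""
    if not system_restart:
        return president
    # In a system restart, the node that survived the newest global
    # checkpoint takes over, even if QMGR elected someone else president.
    # Assumed tie-break: prefer the president, then the lowest node id.
    return max(gci, key=lambda n: (gci[n], n == president, -n))
```

If the returned node is not the president, that node has taken over as master, which is the case the paragraph above says is announced to the other nodes with CNTR_START_REF.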
 
-The master will hold the CNTR_START_REQ signal until it's ready to start a new node or start the cluster for an initial restart or system restart.
+The master will hold the CNTR_START_REQ signal until it's ready to start
+a new node or start the cluster for an initial restart or system
+restart.
 
-When the starting node receives CNTR_START_CONF it will start the NDB_STTOR phases which will be sent in the order:
+When the starting node receives CNTR_START_CONF it will start the
+NDB_STTOR phases which will be sent in the order:
 DBLQH
 DBDICT
 DBTUP

@@ -177,11 +305,14 @@
 DBTC
 DBDIH
 
-The phase used in the NDB_STTOR will mostly be one less than the real startphase.
+The phase used in the NDB_STTOR will mostly be one less than the real
+startphase.
 
 NDB_STTOR PHASE 1
 **************************
-DBDICT will initialise the schema file if necessary. DBIDH, DBTC, DBTUP and DBLQH will initialise some variables. DBLQH will also initialise sending of statistics on database operations.
+DBDICT initialises the schema file if necessary. DBDIH, DBTC, DBTUP and
+DBLQH initialise some variables. DBLQH also initialises the sending of
+statistics on database operations.
 
 STTOR PHASE 3
 -----------------------

@@ -189,18 +320,35 @@
 Initialise variable that keeps track of type of restart.
 
 NDBCNTR:
-NDBCNTR will execute phase 2 of the NDB_STTOR startphases and no other NDBCNTR activity.
+NDBCNTR will execute phase 2 of the NDB_STTOR startphases and no other
+NDBCNTR activity.
 
 NDB_STTOR PHASE 2
 **************************
 DBLQH will connect all internal records between DBLQH and DBTUP+DBACC.
 DBTC will connect all internal records between DBTC and DBDIH.
-DBDIH will create "mutexes" used by kernel and will read the nodes using the READ_NODESREQ signal. With the data from the response to this signal DBDIH will create node lists, node groups and so forth. For node restarts and initial node restarts it will also ask the master for permission to perform the node restart. The master will ask all alive nodes if they are ok to permit the new node to join the cluster. If the node will perform an initial node restart we'll also invalidate all LCP's as part of this phase.
+DBDIH creates "mutexes" used by the kernel and reads the nodes using the
+READ_NODESREQ signal. With the data from the response to this signal,
+DBDIH creates node lists, node groups and so forth. For node restarts
+and initial node restarts it also asks the master for permission to
+perform the node restart. The master asks all alive nodes whether they
+agree to let the new node join the cluster. If the node is performing
+an initial node restart, we also invalidate all LCPs as part of this
+phase.
 
-It's ok that we don't invalidate LCP's from nodes that aren't part of the cluster at the time of the initial node restart. The reason is that there is no chance that a node will ever become master of a system restart using any of the LCP's that have been invalidated. This is the case since the node will have to complete a node restart which includes a local checkpoint before it can join the cluster and even potentially become a master.
+It is OK that we don't invalidate LCPs from nodes that are not part of
+the cluster at the time of the initial node restart. The reason is that
+no node can ever become master of a system restart using any of the
+invalidated LCPs, since a node must complete a node restart, which
+includes a local checkpoint, before it can join the cluster and
+potentially become master.
 
 CMVMI:
-Activate the sending of packed signals. Packing of signals only occurs as part of database operations and must be enabled before any such operations starts up which they do in the execution of the REDO log and the node recovery phases.
+Activate the sending of packed signals. Packing of signals only occurs
+as part of database operations and must be enabled before any such
+operations start, which happens during execution of the REDO log and
+the node recovery phases.
 
 BACKUP:
 Initialises type of restart variable.

@@ -211,101 +359,198 @@
 Initialises variable indicating what type of start is ongoing.
 
 PGMAN:
-Starts two signals that are send repetitiously. The first handles cleanup and is sent every 200 milliseconds and the other handles statistics and is sent once per second.
+Starts two signals that are sent repeatedly. The first handles cleanup
+and is sent every 200 milliseconds; the other handles statistics and is
+sent once per second.
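One common way to get repeated signals like these is for each signal's handler to re-send the same signal to itself with a delay equal to its period. The Python simulation below applies that pattern to PGMAN's two periods. It assumes the self-rescheduling mechanism (the document only says the signals are sent repeatedly) and uses a simple time-ordered queue in place of the kernel's delayed-signal handling. The function name is hypothetical.

```python
# Hypothetical simulation of self-rescheduling periodic signals.
import heapq
from typing import Dict, List, Tuple


def simulate_periodic(periods_ms: Dict[str, int], horizon_ms: int) -> Dict[str, int]:
    """Count how often each self-rescheduling signal runs up to horizon_ms."""
    queue: List[Tuple[int, str]] = [(p, name) for name, p in periods_ms.items()]
    heapq.heapify(queue)
    counts = {name: 0 for name in periods_ms}
    while queue and queue[0][0] <= horizon_ms:
        due_ms, name = heapq.heappop(queue)
        counts[name] += 1                                         # handle it
        heapq.heappush(queue, (due_ms + periods_ms[name], name))  # resend it
    return counts
```

Over the first second, `simulate_periodic({"cleanup": 200, "statistics": 1000}, 1000)` runs the cleanup signal five times and the statistics signal once.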
 
 STTOR PHASE 4
 ------------------------
 DBLQH:
-Allocate a record in BACKUP for execution of local checkpoints using the DEFINE_BACKUP_REQ signal.
+Allocate a record in BACKUP for execution of local checkpoints using the
+DEFINE_BACKUP_REQ signal.
 
 NDBCNTR:
 NDBCNTR will execute NDB_STTOR phase 3 and no other NDBCNTR activity.
 
 NDB_STTOR PHASE 3
 **************************
-DBLQH will initiate checking of the log files here. Then it will read the nodes using the READ_NODESREQ signal. If the start isn't an initial start or an initial node restart then the check of the log files will be handled in parallel with a set of other startphases. For initial starts the log files will be initialised, this can be a lengthy process and should have some progress status attached to it.
+DBLQH initiates checking of the log files here. Then it reads the nodes
+using the READ_NODESREQ signal. If the start is neither an initial start
+nor an initial node restart, the check of the log files is handled in
+parallel with a set of other startphases. For initial starts the log
+files are initialised; this can be a lengthy process and should have
+some progress status attached to it.
 
 NOTE:
 ---------
-From here we have two parallel paths, one continuing restart and another reading REDO log files to find out state of those.
+From here on there are two parallel paths: one continues the restart,
+and the other reads the REDO log files to find out their state.
 
-DBDICT will request information about the nodes in the cluster through the READ_NODESREQ signal.
-DBACC will reset system restart variable if it's not a system restart (only used to verify we don't get requests from DBTUX during system restart).
-DBTC will request information about all nodes in the cluster through the READ_NODESREQ signal.
-DBDIH will set some internal master state and only perform other work for initial starts. In this case the non-master nodes will perform some initial work. The master node will perform this work when all non-master nodes have reported that they've done this work (this is an unnecessary delay since there is no reason to wait with initialising the master node here).
+DBDICT requests information about the nodes in the cluster through the
+READ_NODESREQ signal.
+DBACC resets the system restart variable if this is not a system
+restart (the variable is only used to verify that we don't get requests
+from DBTUX during a system restart).
+DBTC requests information about all nodes in the cluster through the
+READ_NODESREQ signal.
+DBDIH sets some internal master state and performs other work only for
+initial starts. In that case the non-master nodes perform some initial
+work, and the master node performs this work once all non-master nodes
+have reported that they have done it (this is an unnecessary delay,
+since there is no reason for the master node to wait before
+initialising here).
 
-For node restarts and initial node restarts no more work is done in this phase. For initial starts the work is done when all nodes have created the initial restart information and initialised the system file.
+For node restarts and initial node restarts no more work is done in this
+phase. For initial starts the work is done when all nodes have created
+the initial restart information and initialised the system file.
 
-For system restarts this is where most of the work is performed activated by sending the NDB_STARTREQ signal from NDBCNTR to DBDIH in the master. This signal is sent when all nodes in the system restart have reached to this point in the restart. Thus we first have a synchronisation point. The description of the system restart phase 4 is described in a chapter below.
------------------------------------------------------------------------------------------
+For system restarts, this is where most of the work is performed. It is
+activated by NDBCNTR sending the NDB_STARTREQ signal to DBDIH in the
+master. This signal is sent when all nodes in the system restart have
+reached this point in the restart, so there is first a synchronisation
+point. System restart phase 4 is described in a chapter below.
+-----------------------------------------------------------------------------------------
 SYNCHRONISATION POINT for system restarts: WAITPOINT_4_1
------------------------------------------------------------------------------------------
-After completing execution of the NDB_STARTREQ signal the master will send CNTR_WAITREP with WAITPOINT_4_2 to all nodes, this will end the NDB_STTOR phase 3 and will also be last activity in startphase 4.
+------------------------------------------------------------------------
+-----------------
+After completing execution of the NDB_STARTREQ signal, the master will
+send CNTR_WAITREP with WAITPOINT_4_2 to all nodes. This ends NDB_STTOR
+phase 3 and is also the last activity in startphase 4.
 
 STTOR PHASE 5
 ------------------------
 NDBCNTR:
-NDBCNTR will only deliver phase 4 of the NDB_STTOR phases and the only block that will act on this signal is DBDIH that controls most of the database related part of the start of a data node.
+NDBCNTR will only deliver phase 4 of the NDB_STTOR phases, and the only
+block that acts on this signal is DBDIH, which controls most of the
+database-related part of the start of a data node.
 
 NDB_STTOR PHASE 4
 **************************
-Some initialisation of local checkpoint variables. For initial restarts this is the only thing happening in this phase.
+Local checkpoint variables are initialised. For initial restarts this
+is the only thing that happens in this phase.
 
-For system restarts we'll perform all the take overs that are required. This means currently that all nodes that couldn't be recovered using the REDO log will be restarted by copying all data from the alive nodes.
+For system restarts we perform all the required take-overs. Currently
+this means that all nodes that could not be recovered using the REDO
+log are restarted by copying all data from the live nodes.
 
 This part is described in the chapter on Take Over Node Handling.
 
-For node restarts and initial node restarts we request that the master node performs some services on our behalf. We request those services by sending the signal START_MEREQ to the master. This startphase will be completed when the master responds with a START_MECONF message. The START_MEREQ handling of the Master is described in the chapter on START_MEREQ below.
+For node restarts and initial node restarts, we request that the master
+node perform some services on our behalf. We request those services by
+sending the signal START_MEREQ to the master. This startphase is
+completed when the master responds with a START_MECONF message. The
+master's handling of START_MEREQ is described in the chapter on
+START_MEREQ below.
 
-After completing DBDIH's work in the NDB_STTOR startphase 4 NDBCNTR will also perform some work on its own. For initial starts it will create the system table that keeps track of unique identifiers like autoincrement identifiers. All system restarts will after a synchronisation point start the next NDB_STTOR phase immediately. This phase is only sent to DBDIH.
------------------------------------------------------------------------------------------
+After DBDIH's work in NDB_STTOR startphase 4 has completed, NDBCNTR also
+performs some work of its own. For initial starts it creates the system
+table that keeps track of unique identifiers such as autoincrement
+identifiers. After a synchronisation point, all system restarts start
+the next NDB_STTOR phase immediately. This phase is only sent to
+DBDIH.
+------------------------------------------------------------------------
+-----------------
 SYNCHRONISATION POINT for system restarts: WAITPOINT_4_1
------------------------------------------------------------------------------------------
+------------------------------------------------------------------------
+-----------------
 
 NDB_STTOR PHASE 5
 **************************
-For initial starts and system restarts this phase means executing a local checkpoint. This is handled by the master so the other nodes will return immediately from this phase. Node restarts and initial node restarts will perform the copying of the records from the primary replica to the starting replicas in this phase. Before starting the copy phase we'll enable local checkpoints.
+For initial starts and system restarts, this phase means executing a
+local checkpoint. This is handled by the master, so the other nodes
+return immediately from this phase. Node restarts and initial node
+restarts copy the records from the primary replica to the starting
+replicas in this phase. Before starting the copy phase, local
+checkpoints are enabled.
 
-When we copy the data to a starting node this is part of a take over protocol. As part of this protocol the starting node will get a new node status. This status is communicated using the global checkpoint protocol. So we'll update the status and then wait for the global checkpoint to ensure that the new node status is communicated to all nodes and their system files.
+Copying data to a starting node is part of a take-over protocol. As
+part of this protocol the starting node gets a new node status. This
+status is communicated using the global checkpoint protocol, so we
+update the status and then wait for a global checkpoint to ensure that
+the new node status has been communicated to all nodes and written to
+their system files.
 
-When the node status have been communicated we communicate to all nodes that we are about to start the take over protocol for the node. As part of this protocol we will perform step 3) through 9) as described in the system restart phase. This means we will perform restore of all the fragments, prepare for execution of the redo log, execute the redo log and finally reporting back to DBDIH when the execution of the redo log is completed.
+When the node status has been communicated, we inform all nodes
+that we are about to start the take-over protocol for the node. As
+part of this protocol we perform steps 3) through 9) as described in
+the system restart phase. This means that we restore all the
+fragments, prepare for execution of the redo log, execute the redo
+log, and finally report back to DBDIH when execution of the redo log
+has completed.
 
-After completing the preparatory work we will start by performing the copy phase for each fragment in the node. The process to copy a fragment involves the following steps. As a side note this process is prepared to be performed in parallel. It is supposed to be possible to have several nodes in this phase in parallel. However all messages that are sent from the master to all nodes currently is single node at a time so it isn't entirely parallelisable. It is trivial though to make it completely parallel in this phase. Currently it isn't parallelised but will most likely be so in a not too distant future.:
+After completing the preparatory work, we start the copy phase for each
+fragment in the node. As a side note, this process is designed to be
+performed in parallel: it should be possible to have several nodes in
+this phase at the same time. However, all messages that are sent from
+the master to all nodes are currently sent to one node at a time, so
+the process is not entirely parallelisable. That said, it is trivial to
+make this phase completely parallel. It is not parallelised yet, but
+most likely will be in the not too distant future. The process of
+copying a fragment involves the following steps:
 
-1) We start by informing DBLQH in the starting node about our intention to start the copy process by sending a PREPARE_COPY_FRAGREQ signal.
+1) We start by informing DBLQH in the starting node about our intention
+to start the copy process by sending a PREPARE_COPY_FRAGREQ signal.
 
-2) When DBLQH responded positively to this request we continue by sending CREATE_FRAGREQ signal to all nodes to inform all nodes of our intention to copy data to this replica for this table fragment.
+2) When DBLQH has responded positively to this request, we send a
+CREATE_FRAGREQ signal to all nodes to inform them of our intention to
+copy data to this replica for this table fragment.
 
-3) After all nodes have responded positively the next step is to send COPY_FRAGREQ to the node which is to copy the data to the new node. This is always the primary replica of the fragment. This node will copy all the data over to the starting node in response to this message.
+3) After all nodes have responded positively the next step is to send
+COPY_FRAGREQ to the node which is to copy the data to the new node. This
+is always the primary replica of the fragment. This node will copy all
+the data over to the starting node in response to this message.
 
-4) After copying has been completed and the COPY_FRAGCONF message received all nodes receive information about this completion through a UPDATE_TOREQ signal.
+4) After copying has completed and the COPY_FRAGCONF message has been
+received, all nodes are informed of this completion through an
+UPDATE_TOREQ signal.
 
-5) After all nodes have updated to this new state of the fragment the DBLQH of the starting node is informed of the fact that the copying has been completed and that the replica is now up-to-date and failures should now be treated as real failures.
+5) After all nodes have updated to this new state of the fragment, the
+DBLQH of the starting node is informed that the copying has completed,
+that the replica is now up-to-date, and that failures should now be
+treated as real failures.
 
-6) The next step is that the new replica is transformed into a primary replica if this is the role it had when the table was created.
+6) The next step is that the new replica is transformed into a primary
+replica if this is the role it had when the table was created.
 
-7) After completing this change another CREATE_FRAGREQ round is sent to all nodes informing them that the take over of the fragment is now committed.
+7) After completing this change, another CREATE_FRAGREQ round is sent
+to all nodes, informing them that the take-over of the fragment is now
+committed.
 
-8) After completing this we proceed with the next fragment if more exists.
+8) After completing this, we proceed with the next fragment, if any
+remain.
 
-9) When no more fragment exists to take over for the node all nodes will be informed of this through a UPDATE_TOREQ signal sent to all nodes.
+9) When no more fragments remain to be taken over for the node, all
+nodes are informed of this through an UPDATE_TOREQ signal.
 
-10) The next step is to wait for a local checkpoint to complete, we need first to wait for the next local checkpoint to start and then also to wait for that local checkpoint to end.
+10) The next step is to wait for a local checkpoint to complete. We
+first need to wait for the next local checkpoint to start, and then
+wait for that local checkpoint to end.
 
-11) The next step is to update the node state which is done using a global checkpoint. Again we need to wait for a new global checkpoint to start and then wait for it to finish.
+11) The next step is to update the node state, which is done using a
+global checkpoint. Again, we need to wait for a new global checkpoint
+to start and then wait for it to finish.
 
-12) When the global checkpoint has completed it will communicate the successful local checkpoint of this node restart through the signal END_TOREQ sent to all nodes.
+12) When the global checkpoint has completed it will communicate the
+successful local checkpoint of this node restart through the signal
+END_TOREQ sent to all nodes.
 
-13) After sending this to all nodes we complete by sending START_COPYCONF back to the starting node informing him about that we have completed the node restart.
+13) After sending this to all nodes, we finish by sending
+START_COPYCONF back to the starting node to inform it that the node
+restart has been completed.
 
 14) Receiving START_COPYCONF ends the NDB_STTOR start phase 5.
 
------------------------------------------------------------------------------------------
+------------------------------------------------------------------------
+-----------------
 SYNCHRONISATION POINT for system restarts: WAITPOINT_5_2
------------------------------------------------------------------------------------------
+------------------------------------------------------------------------
+-----------------
 
 SUMA:
-In an initial and an initial node restart the SUMA block will request the subscriptions from the SUMA master node.
+In an initial start and an initial node restart, the SUMA block will
+request the subscriptions from the SUMA master node.
 
 NDBCNTR:
 NDB_STTOR will execute NDB_STTOR phase 6 and no other NDBCNTR activity.
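The fragment copy procedure in NDB_STTOR phase 5 (steps 1) to 14) above) is essentially a fixed signal sequence driven by the master DBDIH. The following is a minimal Python sketch, not NDB code, that only reproduces that ordering for a list of fragments. The fragment labels, the function name `take_over_trace`, and the descriptive placeholder text for steps whose signal is not named in the description (steps 5, 6, 10 and 11) are hypothetical.

```python
from typing import List

# Hypothetical model of the master's actions when copying fragments to a
# starting node in NDB_STTOR phase 5. Signal names are taken from the text;
# steps without a named signal use descriptive placeholder labels.

PER_FRAGMENT_STEPS: List[str] = [
    "PREPARE_COPY_FRAGREQ -> DBLQH of starting node",      # step 1
    "CREATE_FRAGREQ -> all nodes (intent to copy)",         # step 2
    "COPY_FRAGREQ -> primary replica",                      # step 3
    "COPY_FRAGCONF <- primary replica",                     # step 4: copy done
    "UPDATE_TOREQ -> all nodes (fragment copied)",          # step 4
    "inform DBLQH: replica up-to-date (placeholder)",       # step 5
    "restore primary role if it had one (placeholder)",     # step 6
    "CREATE_FRAGREQ -> all nodes (take-over committed)",    # step 7
]

FINAL_STEPS: List[str] = [
    "UPDATE_TOREQ -> all nodes (no fragments left)",        # step 9
    "wait for next LCP to start and end (placeholder)",     # step 10
    "wait for next GCP to start and finish (placeholder)",  # step 11
    "END_TOREQ -> all nodes",                               # step 12
    "START_COPYCONF -> starting node",                      # step 13
]


def take_over_trace(fragments: List[str]) -> List[str]:
    """Return the ordered actions for taking over the given fragments."""
    trace: List[str] = []
    # Step 8: fragments are handled one at a time, each running steps 1-7.
    for frag in fragments:
        trace.extend(f"{frag}: {step}" for step in PER_FRAGMENT_STEPS)
    # Steps 9-13 run once, after the last fragment has been taken over.
    trace.extend(FINAL_STEPS)
    return trace


if __name__ == "__main__":
    for action in take_over_trace(["t1/f0", "t1/f1"]):
        print(action)
```

Step 14), the starting node receiving START_COPYCONF, is what ends NDB_STTOR phase 5 on that node. Modelling the sequence as plain data shows the current one-fragment-at-a-time behaviour. A parallel variant, as the description anticipates, would interleave the per-fragment lists instead of concatenating them.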

@@ -313,14 +558,18 @@
 NDB_STTOR PHASE 6
 **************************
 DBLQH will clear internal flags for what type of restart is ongoing.
-DBDICT will clear all internal flags for what type of restart is ongoing.
+DBDICT will clear all internal flags for what type of restart is
+ongoing.
 DBACC will reset system restart flag.
-DBACC and DBTUP will start a periodic signal to check memory usage once every second.
+DBACC and DBTUP will start a periodic signal to check memory usage once
+every second.
 DBTC will set internal variable indicating system restart is completed.
 
------------------------------------------------------------------------------------------
+------------------------------------------------------------------------
+-----------------
 SYNCHRONISATION POINT for system restarts: WAITPOINT_5_1
------------------------------------------------------------------------------------------
+------------------------------------------------------------------------
+-----------------
 
 STTOR PHASE 6
 ------------------------
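The SYNCHRONISATION POINT markers in this description (for example WAITPOINT_5_2 and WAITPOINT_5_1) apply to system restarts: no node proceeds past one until all nodes in the restart have reached it. The sketch below models that barrier behaviour in Python. The class `WaitPoint`, its methods, and the assumption that arrivals are collected by the master, which then releases everyone at once, are illustrative and not taken from the NDBCNTR source.

```python
from typing import List, Set


class WaitPoint:
    """Toy model of a system-restart synchronisation point such as WAITPOINT_5_1.

    Assumption: every participating node reports its arrival to the master,
    and the master releases all participants together only once the last one
    has arrived (for WAITPOINT_4_2 the text says the master sends CNTR_WAITREP
    to all nodes). This illustrates the barrier semantics only.
    """

    def __init__(self, name: str, participants: Set[int]) -> None:
        self.name = name
        self.participants = set(participants)
        self.arrived: Set[int] = set()
        self.released = False

    def arrive(self, node_id: int) -> List[int]:
        """Record an arrival; return the nodes to release (empty until all arrive)."""
        if node_id not in self.participants:
            raise ValueError(f"node {node_id} is not part of this restart")
        self.arrived.add(node_id)
        # Release exactly once, on the transition to "everyone has arrived".
        if not self.released and self.arrived == self.participants:
            self.released = True
            return sorted(self.participants)
        return []


if __name__ == "__main__":
    wp = WaitPoint("WAITPOINT_5_1", {1, 2, 3})
    print(wp.arrive(2))  # []
    print(wp.arrive(1))  # []
    print(wp.arrive(3))  # [1, 2, 3]
```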

@@ -328,21 +577,27 @@
 Define the node groups in the cluster.
 
 DBUTIL:
-Initialises a number of data structures to ensure we can send keyed operations to system tables in an easy manner, also sets up one connection to DBTC.
+Initialises a number of data structures to ensure that keyed operations
+can easily be sent to system tables; also sets up one connection to
+DBTC.
 
 STTOR PHASE 7
 ------------------------
 QMGR:
-The president will start an arbitrator unless this feature is disabled. Also checking of API nodes through heartbeats will be activated.
+The president will start an arbitrator unless this feature is disabled.
+Checking of API nodes through heartbeats is also activated.
 
 BACKUP:
 Sets disk write speed to value it uses after the restart is completed.
-Master node during initial start will also insert the record keeping track of which backup id is to be used next.
+During an initial start, the master node also inserts the record that
+keeps track of which backup ID is to be used next.
 
 SUMA:
 
 DBTUX:
-Sets variable indicating that we have passed startphase 7. This is to indicate that we can now stop ignoring requests to DBTUX that occurs when running the REDO log.
+Sets a variable indicating that we have passed startphase 7. This
+indicates that we can now stop ignoring the requests to DBTUX that
+occur when running the REDO log.
 
 STTOR PHASE 8
 ------------------------

@@ -351,7 +606,8 @@
 
 NDB_STTOR PHASE 7
 **************************
-The master node in a system restart will initiate a rebuild of all indexes from DBDICT in this phase.
+The master node in a system restart will initiate a rebuild of all
+indexes from DBDICT in this phase.
 
 CMVMI:
 Open communication channels to the API nodes (MySQL Servers).

@@ -370,43 +626,105 @@
 
 SYSTEM RESTART HANDLING PHASE 4
 ---------------------------------------------------------
-1) The master will set the its latest GCI to be the restart GCI and then it will synchronise its system file to all other nodes in the system restart.
-2) Next step is to synchronise the schema of all the nodes in the system restart. This will be performed in 15 passes. The problem we are trying to solve here is when a schema object have been created while the node was up but dropped when the node was down and possibly even a new object was created with the same schema id while the node was dead. In order to handle this we recreate all objects that are supposed to exist as the starting node sees it. Then in the next phase we drop this if it was dropped by other nodes in the cluster while the node was dead, we also drop tables that have been dropped by other nodes already while we were dead in this pass. In the final phase we create tables that have been created by other nodes while the starting node was a dead node. All these create/drop will only happen locally in the node. As part of this work we will also ensure that all tables to create have been created locally and that the proper data structures have been set-up for them in all blocks.
+1) The master will set its latest GCI to be the restart GCI, and then
+it will synchronise its system file to all other nodes in the system
+restart.
+2) The next step is to synchronise the schema of all the nodes in the
+system restart. This is performed in 15 passes. The problem we are
+trying to solve here arises when a schema object was created while the
+node was up but dropped while the node was down, possibly with a new
+object even created with the same schema ID while the node was down.
+To handle this, we first recreate all objects that are supposed to
+exist as the starting node sees them. In the next phase we drop an
+object if it was dropped by other nodes in the cluster while the node
+was down; tables that other nodes dropped while we were down are also
+dropped in this pass. In the final phase we create the tables that
+other nodes created while the starting node was down. All of these
+creates and drops happen only locally in the node. As part of this
+work we also ensure that all tables to be created have been created
+locally and that the proper data structures have been set up for them
+in all blocks.
 
-After performing the above for the master node we'll send the new schema file to all other participants in the system restart and they will perform the same synchronisation as described above.
+After performing the above for the master node we'll send the new schema
+file to all other participants in the system restart and they will
+perform the same synchronisation as described above.
 
-3) The next phase is to ensure that all fragments to restart have proper parameters as derived from DBDIH. This will send a bunch of START_FRAGREQ signals from DIH to LQH. This phase will also start the restoration of the fragments. Fragments will be restored one by one and one record at a time so this will be a phase of the system restart which will read the restore data from disk and in parallel apply the restore data read from disk into main memory. This only restores the main memory parts of the tables.
+3) The next phase is to ensure that all fragments to restart have
+proper parameters as derived from DBDIH. This sends a number of
+START_FRAGREQ signals from DIH to LQH. This phase also starts the
+restoration of the fragments. Fragments are restored one by one, one
+record at a time, so in this phase of the system restart the restore
+data is read from disk and, in parallel, the data already read is
+applied to main memory. This only restores the main memory parts of
+the tables.
 
-4) The next step is to send START_RECREQ to all nodes in the starting cluster. This will wait until all fragments have been restored. After all fragments have been restored the next step is to apply all UNDO logs in the Disk Data Part.
+4) The next step is to send START_RECREQ to all nodes in the starting
+cluster. This will wait until all fragments have been restored. After
+all fragments have been restored the next step is to apply all UNDO logs
+in the Disk Data Part.
 
-5) After applying the UNDO logs in LGMAN we will also perform some restore work in TSMAN that requires scanning the extent headers of the tablespaces.
+5) After applying the UNDO logs in LGMAN we will also perform some
+restore work in TSMAN that requires scanning the extent headers of the
+tablespaces.
 
-6) The next step is to prepare for execution of the REDO log. The execution of the REDO log can be performed in upto four phases. For each fragment one might require execution of REDO logs from several nodes. To handle this we execute the REDO logs different phases for a specific fragment, the set-up of this was decided in DBDIH when sending the START_FRAGREQ signal. For each phase and fragment that requires execution in this phase we'll send an EXEC_FRAGREQ signal. After sending out all those signals we'll send an EXEC_SRREQ signal to all nodes to tell them they can start executing the REDO log.
+6) The next step is to prepare for execution of the REDO log, which can
+be performed in up to four phases. A fragment may require execution of
+REDO logs from several nodes. To handle this, we execute the REDO logs
+from different nodes in different phases for a given fragment; this
+set-up was decided by DBDIH when it sent the START_FRAGREQ signal. For
+each phase, and for each fragment that requires execution in that
+phase, we send an EXEC_FRAGREQ signal. After sending out all those
+signals, we send an EXEC_SRREQ signal to all nodes to tell them that
+they can start executing the REDO log.
 
 NOTE:
-Before starting execution of the first REDO log execution we'll ensure that the set-up which was started earlier in start phase 4 by LQH has completed. If it hasn't we'll have to wait for it to complete before starting the execution of the REDO log.
+Before starting the first REDO log execution, we ensure that the
+set-up which was started earlier in start phase 4 by LQH has
+completed. If it has not, we have to wait for it to complete before
+starting execution of the REDO log.
 
-7) Execute the REDO log phase 1. To execute the REDO log we need first to calculate where to start reading and where we should have reached end of the REDO log execution. As part of the REDO log execution in each node we'll find the end of the REDO log when we reach the last GCI to restore.
+7) Execute REDO log phase 1. To execute the REDO log, we first need
+to calculate where to start reading and where the REDO log execution
+should end. As part of the REDO log execution in each node, we
+find the end of the REDO log when we reach the last GCI to
+restore.
 
-8) After completing the execution of the REDO logs four phases we'll ensure that all REDO log pages that have been written beyond the last GCI to restore will be invalidated. This might even take the invalidation into new REDO log files after the last one executed.
+8) After completing all REDO log execution phases, we ensure that all
+REDO log pages written beyond the last GCI to restore are invalidated.
+This invalidation may even extend into REDO log files beyond the last
+one executed.
 
-9) After completing this final step of REDO log execution LQH will report back START_RECCONF to DIH.
+9) After completing this final step of REDO log execution LQH will
+report back START_RECCONF to DIH.
 
-10) When the master have received this message back from all starting nodes it will send NDB_STARTCONF back to NDBCNTR.
+10) When the master has received this message back from all starting
+nodes, it sends NDB_STARTCONF back to NDBCNTR.
 
-11) End of STTOR phase 4 sent by NDBCNTR which was the only block that did any real work in this phase.
+11) NDBCNTR, which was the only block that did any real work in this
+phase, signals the end of STTOR phase 4.
 
 
 Take Over Node Handling
 ------------------------------------
-This is part of system restart to restart nodes that couldn't be restarted using the REDO log.
+This is the part of a system restart that restarts nodes which could
+not be restarted using the REDO log.
 
 START_MEREQ Handling
 ------------------------------------
-The first step in handling START_MEREQ is to ensure that no local checkpoint is ongoing. If one is ongoing, then at first one waits until its completed. The next step is to copy all distribution information from the master DBDIH to the starting DBDIH. After this all meta data is synchronised in DBDICT (see description in System Restart handling above 2)).
+The first step in handling START_MEREQ is to ensure that no local
+checkpoint is ongoing; if one is, we first wait until it has
+completed. The next step is to copy all distribution information from
+the master DBDIH to the starting DBDIH. After this, all metadata is
+synchronised in DBDICT (see step 2) of System Restart Handling Phase 4
+above).
 
-After blocking local checkpoints, synchronising distribution information and meta data information we will block the global checkpoints.
+After blocking local checkpoints and synchronising distribution and
+metadata information, we block the global checkpoints.
 
-Next step is to integrate the starting node in the global checkpoint protocol, local checkpoint protocol and all other distributed protocols. As part of this we also update the node status.
+The next step is to integrate the starting node into the global
+checkpoint protocol, the local checkpoint protocol, and all other
+distributed protocols. As part of this, we also update the node status.
 
-After completing this step we allow the global checkpoint protocol to start again. Then we send the START_MECONF signal to indicate to the starting node we're ready for the next phase.
+After completing this step we allow the global checkpoint protocol to
+start again. Then we send the START_MECONF signal to indicate to the
+starting node we're ready for the next phase.
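Step 2) of SYSTEM RESTART HANDLING PHASE 4 describes a recreate, drop, create sequence for reconciling the starting node's view of the schema with the cluster's. The following Python sketch models only those three logical phases. It does not model the 15 passes or the per-block data structure set-up. It assumes each schema object carries a version number, so that a schema ID reused for a new object while the node was down can be told apart from the original. The function name `synchronise_schema` and the version field are hypothetical.

```python
from typing import Dict, List, Tuple

Schema = Dict[int, int]  # schema ID -> object version (assumed identifier)


def synchronise_schema(local: Schema, cluster: Schema) -> Tuple[List[str], Schema]:
    """Three-phase local schema synchronisation (simplified model).

    local   -- objects the starting node believes exist
    cluster -- objects that exist according to the master's schema file
    Returns the list of local actions and the resulting local schema.
    """
    actions: List[str] = []
    result: Schema = {}

    # Phase 1: recreate every object the starting node thinks should exist.
    for sid, ver in sorted(local.items()):
        actions.append(f"recreate {sid}v{ver}")
        result[sid] = ver

    # Phase 2: drop objects that were dropped elsewhere while this node was
    # down, including IDs that have since been reused for a different object.
    for sid, ver in sorted(local.items()):
        if cluster.get(sid) != ver:
            actions.append(f"drop {sid}v{ver}")
            del result[sid]

    # Phase 3: create objects that other nodes created while this node was down.
    for sid, ver in sorted(cluster.items()):
        if result.get(sid) != ver:
            actions.append(f"create {sid}v{ver}")
            result[sid] = ver

    return actions, result


if __name__ == "__main__":
    acts, final = synchronise_schema({1: 1, 2: 1, 3: 1}, {1: 1, 3: 2, 4: 1})
    print(acts)
    print(final)
```

In the example, table 2 was dropped while the node was down, schema ID 3 was reused for a new object, and table 4 is new. All actions are local to the node, and the resulting local schema matches the cluster's.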

