From: jon  Date: June 12 2007 7:29pm
Subject: svn commit - mysqldoc@docsrva: r6768 - trunk/ndbapi
Author: jstephens
Date: 2007-06-12 21:29:15 +0200 (Tue, 12 Jun 2007)
New Revision: 6768

Log:

Temporary file: code comments awaiting markup + edits.



Added:
   trunk/ndbapi/pgman-tmp.xml


Added: trunk/ndbapi/pgman-tmp.xml
===================================================================
--- trunk/ndbapi/pgman-tmp.xml	                        (rev 0)
+++ trunk/ndbapi/pgman-tmp.xml	2007-06-12 19:29:15 UTC (rev 6768)
Changed blocks: 1, Lines Added: 208, Lines Deleted: 0; 9067 bytes

@@ -0,0 +1,208 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE section SYSTEM "http://www.docbook.org/xml/4.4/docbookx.dtd">
+<section>
+  <title></title>
+  
+  <para>
+    PAGE ENTRIES AND REQUESTS
+ *
+ * Central structure is "page entry".  It corresponds to a disk page
+ * identified by file and page number (file_no, page_no).
+ *
+ * A page entry is created by the first request for the disk page.
+ * Subsequent requests are queued under the same page entry.
+ *
+ * There is a limited number of in-memory "cache pages", also called
+ * "buffer pages" or "real pages".  These are used by the more numerous
+ * page entries to buffer the disk pages.
+ *
+ * A new or non-resident page entry must first be "bound" to an
+ * available cache page.  Next the disk page must be "mapped" to the
+ * cache page.  If the page is empty (never written) it is considered
+ * mapped trivially.  Otherwise the cache page must be updated via
+ * "pagein" from disk.  A bound and mapped page is called "resident".
+ *
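The bind/map lifecycle above can be sketched as a small state machine. This is a minimal sketch with hypothetical names, not the actual PGMAN types:

```cpp
#include <cassert>

// Hypothetical sketch of the page entry lifecycle described above;
// state and field names are illustrative, not actual PGMAN declarations.
enum PageState {
  NEW_OR_NONRESIDENT,  // no cache page attached yet
  BOUND,               // attached ("bound") to a cache page, not mapped
  RESIDENT             // bound and mapped: disk page loaded, or empty
};

struct PageEntry {
  int file_no = 0;
  int page_no = 0;
  PageState state = NEW_OR_NONRESIDENT;
  bool never_written = false;  // empty page: considered mapped trivially
};

// Bind an available cache page, then map the disk page into it.
void make_resident(PageEntry& pe) {
  pe.state = BOUND;
  if (!pe.never_written) {
    // a real implementation would "pagein" from disk here
  }
  pe.state = RESIDENT;
}
```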
+ * Updating a resident cache page makes it "dirty".  A background
+ * clean-up process makes dirty pages "clean" via "pageout" to disk.
+ * Write-ahead logging (WAL) of the page is done first, i.e. the UNDO
+ * log is flushed up to the page's log sequence number (LSN) by calling
+ * a TSMAN method.  The reason for this is obvious but not relevant to
+ * PGMAN.
+ *
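As a rough illustration of the WAL ordering, the pageout precondition could look as follows (helper names are hypothetical; in PGMAN the flush goes through a TSMAN method):

```cpp
#include <cassert>
#include <cstdint>
#include <algorithm>

// Sketch of the WAL rule above: before a dirty page is written out,
// the UNDO log must be flushed up to the page's LSN.
static uint64_t g_undo_flushed_lsn = 0;  // stand-in for UNDO log state

void flush_undo_up_to(uint64_t lsn) {
  g_undo_flushed_lsn = std::max(g_undo_flushed_lsn, lsn);
}

void pageout(uint64_t page_lsn) {
  flush_undo_up_to(page_lsn);              // WAL: log goes first
  assert(g_undo_flushed_lsn >= page_lsn);  // now safe to write the page
  // write the cache page to disk here; the page is then "clean"
}
```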
+ * A local checkpoint (LCP) periodically performs a complete pageout of
+ * dirty pages.  It must iterate over a list covering all pages that
+ * have been dirty since the LCP started.
+ *
+ * A clean page is a candidate ("victim") for being "unmapped" and
+ * "evicted" from the cache, to allow another page to become resident.
+ * This process is called "page replacement".
+ *
+ * PAGE REPLACEMENT
+ *
+ * Page replacement uses the LIRS algorithm (Jiang-Zhang).
+ * 
+ * The "recency" of a page is the time between now and the last request
+ * for the page.  The "inter-reference recency" (IRR) of a page is the
+ * time between the last 2 requests for the page.  "Time" is advanced by
+ * request for any page.
+ *
+ * Page entries are divided into "hot" ("lir") and "cold" ("hir").  Here
+ * lir/hir refers to low/high IRR.  Hot pages are always resident but
+ * cold pages need not be.
+ *
+ * Number of hot pages is limited to slightly less than number of cache
+ * pages.  Until this number is reached, all used cache pages are hot.
+ * Then the algorithm described next is applied.  The algorithm avoids
+ * storing any of the actual recency values.
+ *
+ * Primary data structure is the "stack".  It contains all hot entries
+ * and recently referenced cold entries (resident or not).  The stack is
+ * in recency order with most recent (lowest recency) entry on top.
+ * Entries which are less recent than the least recent hot page are
+ * removed ("stack pruning").  So the bottom page is always hot.
+ *
+ * The cold entries on the stack are undergoing a "trial period".  If
+ * they are referenced soon again (see IRR), they become hot.  Otherwise
+ * they fall off the bottom of the stack.
+ *
+ * Secondary data structure is the "queue".  It contains all resident
+ * cold pages (on stack or not).  When a hot page is removed from the
+ * stack it is added to the end of the queue.  When page replacement
+ * needs a page it removes it from the front of the queue.
+ *
+ * Page requests cause the input entry to be inserted and updated in
+ * LIRS lists.  Remember that an entry can be present on both stack and
+ * queue.  The rules per type of input entry are:
+ *
+ * 1. Hot.  Move input entry to stack top.  If input entry was at stack
+ * bottom, do stack pruning.
+ *
+ * 2. Cold resident.  Move input entry to stack top.  Then:
+ *
+ * 2a. If input entry was on stack, change it to hot, remove it from
+ * queue, change stack bottom entry to cold and move the bottom entry to
+ * queue end, and do stack pruning.
+ *
+ * 2b. If input entry was on queue only, leave it cold but move it to
+ * end of queue.
+ *
+ * 3. Cold non-resident.  Remove entry at queue front and evict it from
+ * the cache.  If the evicted entry was on stack, it remains as unbound
+ * entry on stack, to continue its trial period.  Map input entry to the
+ * freed cache page.  Move input entry to stack top.  Then:
+ *
+ * 3a. If input entry was on stack, change it to hot, change stack
+ * bottom entry to cold and move the bottom entry to queue end, and do
+ * stack pruning.
+ *
+ * 3b. If input entry was new, leave it cold but move it to end of
+ * queue.
+ *
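The three rules can be exercised with a toy model. This is a sketch only: page ids in standard containers, a fixed hot limit, a simple warm-up phase (all pages hot until the limit is reached), and the simplifying assumption from rule 3 that a non-resident request always evicts the queue front. Real PGMAN links page entries into these lists intrusively.

```cpp
#include <cassert>
#include <deque>
#include <algorithm>
#include <unordered_set>

// Toy model of the LIRS rules above.  Pages are plain ints.
struct Lirs {
  size_t hot_limit;                 // max number of hot ("lir") pages
  std::deque<int> stack;            // front = top (most recent)
  std::deque<int> queue;            // front = next eviction victim
  std::unordered_set<int> hot;
  std::unordered_set<int> resident;

  bool on(const std::deque<int>& d, int p) const {
    return std::find(d.begin(), d.end(), p) != d.end();
  }
  void erase(std::deque<int>& d, int p) {
    d.erase(std::remove(d.begin(), d.end(), p), d.end());
  }
  // "Stack pruning": drop cold entries so the bottom is always hot.
  void prune() {
    while (!stack.empty() && !hot.count(stack.back()))
      stack.pop_back();
  }
  // Demote the hot stack-bottom entry to cold and append it to queue.
  void demote_bottom() {
    int b = stack.back();
    stack.pop_back();
    hot.erase(b);
    queue.push_back(b);
    prune();
  }
  void request(int p) {
    bool was_on_stack = on(stack, p);
    if (hot.count(p)) {                      // rule 1: hot
      bool was_bottom = (stack.back() == p);
      erase(stack, p);
      stack.push_front(p);
      if (was_bottom) prune();
    } else if (resident.count(p)) {          // rule 2: cold resident
      erase(stack, p);
      stack.push_front(p);
      if (was_on_stack) {                    // 2a: promote to hot
        hot.insert(p);
        erase(queue, p);
        demote_bottom();
      } else {                               // 2b: queue only
        erase(queue, p);
        queue.push_back(p);
      }
    } else {                                 // rule 3: cold non-resident
      if (hot.size() < hot_limit) {          // warm-up: all pages hot
        hot.insert(p);
        resident.insert(p);
        stack.push_front(p);
        return;
      }
      if (!queue.empty()) {                  // evict the queue front;
        int v = queue.front();               // if v is on stack it stays
        queue.pop_front();                   // there, unbound, on trial
        resident.erase(v);
      }
      resident.insert(p);                    // map p to the freed page
      erase(stack, p);
      stack.push_front(p);
      if (was_on_stack) {                    // 3a: promote to hot
        hot.insert(p);
        demote_bottom();
      } else {                               // 3b: new, stays cold
        queue.push_back(p);
      }
    }
  }
};
```

A second request to a cold page still on the stack (rule 2a) makes it hot and demotes the old stack bottom, which is exactly the "trial period" mechanism described above.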
+ * LIRS CHANGES
+ *
+ * In LIRS the 'resident' requirement is changed as follows:
+ *
+ * Stack entries, including hot ones, can have any state.  Unbound stack
+ * entries are created by new requests and by pages evicted from queue
+ * front which are still on stack.
+ *
+ * Queue entries must be bound.  They become resident and evictable
+ * within a finite time.  A page is "evictable" if it is mapped, clean,
+ * and has no requests.
+ *
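The "evictable" condition might be written as a simple predicate (field names are hypothetical):

```cpp
#include <cassert>

// Sketch of the "evictable" test above; field names are illustrative.
struct PageFlags {
  bool mapped;
  bool dirty;
  int open_requests;
};

bool evictable(const PageFlags& p) {
  // mapped, clean, and no outstanding requests
  return p.mapped && !p.dirty && p.open_requests == 0;
}
```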
+ * An unbound entry which should be on queue is added there at bind
+ * time.  Such entries are created when an unbound entry with open
+ * requests is popped (hot) or pruned (cold) from the stack.  This can
+ * happen if the cache is too small.
+ *
+ * CLEANUP PROCESS
+ *
+ * LIRS (and related algorithms) do not address dirty pages.  From the
+ * above it is clear that the clean-up process should process dirty
+ * queue entries proceeding from front to end.  This also favors pages
+ * with lower LSNs, which minimizes the amount of WAL to write.
+ *
+ * In fact the clean-up process holds a permanent pointer into the queue
+ * where all entries strictly towards the front are clean.  For such an
+ * entry to become dirty it must be referenced again which moves it to
+ * queue end and past the clean-up pointer.  (In practice, until this
+ * works, cleanup recycles back to queue front).
+ *
+ * PAGE LISTS
+ *
+ * Page entries are put on a number of lists.
+ *
+ * 1. Hash table on (file_no, page_no).  Used for fast lookup and for
+ * LCP to iterate over.
+ *
+ * The other lists are doubly-linked FIFOs.  In general entries are
+ * added to the end (last entry) and processed from the front (first
+ * entry).  When used as stack, end is top and front is bottom.
+ *
+ * 2. The LIRS stack and queue.  These control page replacement.
+ *
+ * 3. Page entries are divided into disjoint "sublists" based on page
+ * "state" i.e. the set of page properties.  Some sublists drive page
+ * processing and have next entry to process at the front.
+ *
+ * Current sublists are as follows.  Those that drive processing are
+ * marked with a plus (+).
+ *
+ * SL_BIND          + waiting for available buffer page
+ * SL_MAP           + waiting to start pagein from disk
+ * SL_MAP_IO        - above in i/o wait (the pagein)
+ * SL_CALLBACK      + request done, waiting to invoke callbacks
+ * SL_CALLBACK_IO   - above in i/o wait (pageout by cleanup)
+ * SL_BUSY          - being written to by PGMAN client
+ * SL_LOCKED        - permanently locked to cache
+ * SL_OTHER         - default sublist
+ *
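A sketch of how the sublists might be declared (names mirror the table above; the actual PGMAN declarations may differ):

```cpp
#include <cassert>

// Sublist states from the table above.  A "+" sublist drives
// processing and keeps the next entry to process at its front.
enum Sublist {
  SL_BIND,         // + waiting for an available buffer page
  SL_MAP,          // + waiting to start pagein from disk
  SL_MAP_IO,       //   in i/o wait (the pagein)
  SL_CALLBACK,     // + request done, waiting to invoke callbacks
  SL_CALLBACK_IO,  //   in i/o wait (pageout by cleanup)
  SL_BUSY,         //   being written to by a PGMAN client
  SL_LOCKED,       //   permanently locked to cache
  SL_OTHER         //   default sublist
};

bool drives_processing(Sublist s) {
  return s == SL_BIND || s == SL_MAP || s == SL_CALLBACK;
}
```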
+ * PAGE PROCESSING
+ *
+ * Page processing uses a number of independent continueB loops.
+ *
+ * 1. The "stats loop".  Started at node start.  Checks lists in debug
+ * mode.  In the future could gather statistics and adjust parameters
+ * based on load.  Continues via delay signal.
+ *
+ * 2. The "busy loop".  Started by page request.  Each loop does bind,
+ * map, and callback of a number of entries.  Continues via no-delay
+ * signal until nothing to do.
+ *
+ * 3. The "cleanup loop".  Started at node start.  Each loop starts
+ * pageout of a number of dirty queue entries.  Continues via delay
+ * signal.
+ *
+ * 4. The "LCP loop".  Started periodically by NDB.  Each loop starts
+ * pageout of a number of hash list entries.  Continues via delay signal
+ * until done.
+ *
+ * SPECIAL CASES
+ *
+ * LOCKED pages are not put on stack or queue.  They are flushed to disk
+ * by LCP but not by clean-up.
+ *
+ * A TUP scan is likely to access a page repeatedly within a short time.
+ * This can make the page hot when it should not be.  Such "correlated
+ * requests" are handled by a request flag which modifies default LIRS
+ * processing.  [fill in details later]
+ *
+ * Also PK operations make 2 rapid page references.  The 2nd one is for
+ * commit.  This too should be handled as a correlated request.
+ *
+ * CLIENT TSMAN
+ *
+ * TSMAN reads "meta" pages such as extent headers.  These are
+ * currently "locked" forever in the PGMAN cache.
+ *
+ * CLIENT DBTUP
+ * 
+ * DBTUP works with copy pages (or UNDO buffers) in memory.  The real
+ * page is updated only between a page request with the COMMIT_REQ flag
+ * and a subsequent LSN update.  These need not occur in the same
+ * timeslice, since DBTUP may need to flush the UNDO log in between.
+ *
+ * The page is "busy" if any transaction is between COMMIT_REQ and LSN
+ * update.  A busy page must be locked in buffer cache.  No pageout of
+ * a busy page can be started by clean-up or LCP.
+  </para>
+</section>

