Author: jstephens
Date: 2007-06-12 21:29:15 +0200 (Tue, 12 Jun 2007)
New Revision: 6768
Log:
Temporary file: code comments awaiting markup + edits.
Added:
trunk/ndbapi/pgman-tmp.xml
Added: trunk/ndbapi/pgman-tmp.xml
===================================================================
--- trunk/ndbapi/pgman-tmp.xml (rev 0)
+++ trunk/ndbapi/pgman-tmp.xml 2007-06-12 19:29:15 UTC (rev 6768)
Changed blocks: 1, Lines Added: 208, Lines Deleted: 0; 9067 bytes
@@ -0,0 +1,208 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE section SYSTEM "http://www.docbook.org/xml/4.4/docbookx.dtd">
+<section>
+ <title></title>
+
+ <para>
+ PAGE ENTRIES AND REQUESTS
+ *
+ * Central structure is "page entry". It corresponds to a disk page
+ * identified by file and page number (file_no, page_no).
+ *
+ * A page entry is created by first request for the disk page.
+ * Subsequent requests are queued under the same page entry.
+ *
+ * There is a limited number of in-memory "cache pages", also called
+ * "buffer pages" or "real pages". These are used by the more numerous
+ * page entries to buffer the disk pages.
+ *
+ * A new or non-resident page entry must first be "bound" to an
+ * available cache page. Next the disk page must be "mapped" to the
+ * cache page. If the page is empty (never written) it is considered
+ * mapped trivially. Otherwise the cache page must be updated via
+ * "pagein" from disk. A bound and mapped page is called "resident".
+ *
+ * Updating a resident cache page makes it "dirty". A background
+ * clean-up process makes dirty pages "clean" via "pageout" to disk.
+ * Write ahead logging (WAL) of the page is done first i.e. UNDO log is
+ * flushed up to the page log sequence number (LSN) by calling a TSMAN
+ * method. The reason for this is obvious but not relevant to PGMAN.
+ *
+ * A local check point (LCP) periodically performs a complete pageout of
+ * dirty pages. It must iterate over a list which will cover all pages
+ * which had been dirty since LCP start.
+ *
+ * A clean page is a candidate ("victim") for being "unmapped" and
+ * "evicted" from the cache, to allow another page to become resident.
+ * This process is called "page replacement".
+ *
+ * PAGE REPLACEMENT
+ *
+ * Page replacement uses the LIRS algorithm (Jiang-Zhang).
+ *
+ * The "recency" of a page is the time between now and the last request
+ * for the page. The "inter-reference recency" (IRR) of a page is the
+ * time between the last 2 requests for the page. "Time" is advanced by
+ * request for any page.
+ *
+ * Page entries are divided into "hot" ("lir") and "cold" ("hir"). Here
+ * lir/hir refers to low/high IRR. Hot pages are always resident but
+ * cold pages need not be.
+ *
+ * Number of hot pages is limited to slightly less than number of cache
+ * pages. Until this number is reached, all used cache pages are hot.
+ * Then the algorithm described next is applied. The algorithm avoids
+ * storing any of the actual recency values.
+ *
+ * Primary data structure is the "stack". It contains all hot entries
+ * and recently referenced cold entries (resident or not). The stack is
+ * in recency order with most recent (lowest recency) entry on top.
+ * Entries which are less recent than the least recent hot page are
+ * removed ("stack pruning"). So the bottom page is always hot.
+ *
+ * The cold entries on the stack are undergoing a "trial period". If
+ * they are referenced soon again (see IRR), they become hot. Otherwise
+ * they fall off the bottom of the stack.
+ *
+ * Secondary data structure is the "queue". It contains all resident
+ * cold pages (on stack or not). When a hot page is removed from the
+ * stack it is added to the end of the queue. When page replacement
+ * needs a page it removes it from the front of the queue.
+ *
+ * Page requests cause the input entry to be inserted and updated in
+ * LIRS lists. Remember that an entry can be present on both stack and
+ * queue. The rules per type of input entry are:
+ *
+ * 1. Hot. Move input entry to stack top. If input entry was at stack
+ * bottom, do stack pruning.
+ *
+ * 2. Cold resident. Move input entry to stack top. Then:
+ *
+ * 2a. If input entry was on stack, change it to hot, remove it from
+ * queue, change stack bottom entry to cold and move the bottom entry to
+ * queue end, and do stack pruning.
+ *
+ * 2b. If input entry was on queue only, leave it cold but move it to
+ * end of queue.
+ *
+ * 3. Cold non-resident. Remove entry at queue front and evict it from
+ * the cache. If the evicted entry was on stack, it remains as unbound
+ * entry on stack, to continue its trial period. Map input entry to the
+ * freed cache page. Move input entry to stack top. Then:
+ *
+ * 3a. If input entry was on stack, change it to hot, change stack
+ * bottom entry to cold and move the bottom entry to queue end, and do
+ * stack pruning.
+ *
+ * 3b. If input entry was new, leave it cold but move it to end of
+ * queue.
+ *
+ * LIRS CHANGES
+ *
+ * In LIRS the 'resident' requirement is changed as follows:
+ *
+ * Stack entries, including hot ones, can have any state. Unbound stack
+ * entries are created by new requests and by pages evicted from queue
+ * front which are still on stack.
+ *
+ * Queue entries must be bound. They become resident and evictable
+ * within a finite time. A page is "evictable" if it is mapped, clean,
+ * and has no requests.
+ *
+ * An unbound entry which should be on queue is added there at bind
+ * time. Such entries are created when an unbound entry with open
+ * requests is popped (hot) or pruned (cold) from the stack. This can
+ * happen if the cache is too small.
+ *
+ * CLEANUP PROCESS
+ *
+ * LIRS (and related algorithms) do not address dirty pages. From above
+ * it is obvious that the clean-up process should process dirty queue
+ * entries proceeding from front to end. This also favors pages with
+ * lower LSN numbers which minimizes amount of WAL to write.
+ *
+ * In fact the clean-up process holds a permanent pointer into the queue
+ * where all entries strictly towards the front are clean. For such an
+ * entry to become dirty it must be referenced again which moves it to
+ * queue end and past the clean-up pointer. (In practice, until this
+ * works, cleanup recycles back to queue front).
+ *
+ * PAGE LISTS
+ *
+ * Page entries are put on a number of lists.
+ *
+ * 1. Hash table on (file_no, page_no). Used for fast lookup and for
+ * LCP to iterate over.
+ *
+ * The other lists are doubly-linked FIFOs. In general entries are
+ * added to the end (last entry) and processed from the front (first
+ * entry). When used as stack, end is top and front is bottom.
+ *
+ * 2. The LIRS stack and queue. These control page replacement.
+ *
+ * 3. Page entries are divided into disjoint "sublists" based on page
+ * "state" i.e. the set of page properties. Some sublists drive page
+ * processing and have next entry to process at the front.
+ *
+ * Current sublists are as follows. Those that drive processing are
+ * marked with a plus (+).
+ *
+ * SL_BIND + waiting for available buffer page
+ * SL_MAP + waiting to start pagein from disk
+ * SL_MAP_IO - above in i/o wait (the pagein)
+ * SL_CALLBACK + request done, waiting to invoke callbacks
+ * SL_CALLBACK_IO - above in i/o wait (pageout by cleanup)
+ * SL_BUSY - being written to by PGMAN client
+ * SL_LOCKED - permanently locked to cache
+ * SL_OTHER - default sublist
+ *
+ * PAGE PROCESSING
+ *
+ * Page processing uses a number independent continueB loops.
+ *
+ * 1. The "stats loop". Started at node start. Checks lists in debug
+ * mode. In the future could gather statistics and adjust parameters
+ * based on load. Continues via delay signal.
+ *
+ * 2. The "busy loop". Started by page request. Each loop does bind,
+ * map, and callback of a number of entries. Continues via no-delay
+ * signal until nothing to do.
+ *
+ * 3. The "cleanup loop". Started at node start. Each loop starts
+ * pageout of a number of dirty queue entries. Continues via delay
+ * signal.
+ *
+ * 4. The "LCP loop". Started periodically by NDB. Each loop starts
+ * pageout of a number of hash list entries. Continues via delay signal
+ * until done.
+ *
+ * SPECIAL CASES
+ *
+ * LOCKED pages are not put on stack or queue. They are flushed to disk
+ * by LCP but not by clean-up.
+ *
+ * A TUP scan is likely to access a page repeatedly within a short time.
+ * This can make the page hot when it should not be. Such "correlated
+ * requests" are handled by a request flag which modifies default LIRS
+ * processing. [fill in details later]
+ *
+ * Also PK operations make 2 rapid page references. The 2nd one is for
+ * commit. This too should be handled as a correlated request.
+ *
+ * CLIENT TSMAN
+ *
+ * TSMAN reads "meta" pages such as extent headers. There are currently
+ * "locked" forever in PGMAN cache.
+ *
+ * CLIENT DBTUP
+ *
+ * DBTUP works with copy pages (or UNDO buffers) in memory. The real
+ * page is updated only between page request with COMMIT_REQ flag and
+ * a subsequent LSN update. These need not occur in same timeslice
+ * since DBTUP may need to flush UNDO log in-between.
+ *
+ * The page is "busy" if any transaction is between COMMIT_REQ and LSN
+ * update. A busy page must be locked in buffer cache. No pageout of
+ * a busy page can be started by clean-up or LCP.
+ </para>
+</section>
| Thread |
|---|
| • svn commit - mysqldoc@docsrva: r6768 - trunk/ndbapi | jon | 12 Jun |