From: Date: June 12 2007 9:29pm Subject: svn commit - mysqldoc@docsrva: r6768 - trunk/ndbapi List-Archive: http://lists.mysql.com/commits/28601 Message-Id: <200706121929.l5CJTGtO024975@docsrva.mysql.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Author: jstephens Date: 2007-06-12 21:29:15 +0200 (Tue, 12 Jun 2007) New Revision: 6768 Log: Temporary file: code comments awaiting markup + edits. Added: trunk/ndbapi/pgman-tmp.xml Added: trunk/ndbapi/pgman-tmp.xml =================================================================== --- trunk/ndbapi/pgman-tmp.xml (rev 0) +++ trunk/ndbapi/pgman-tmp.xml 2007-06-12 19:29:15 UTC (rev 6768) Changed blocks: 1, Lines Added: 208, Lines Deleted: 0; 9067 bytes @@ -0,0 +1,208 @@ + + +
+ + + + PAGE ENTRIES AND REQUESTS + * + * Central structure is "page entry". It corresponds to a disk page + * identified by file and page number (file_no, page_no). + * + * A page entry is created by first request for the disk page. + * Subsequent requests are queued under the same page entry. + * + * There is a limited number of in-memory "cache pages", also called + * "buffer pages" or "real pages". These are used by the more numerous + * page entries to buffer the disk pages. + * + * A new or non-resident page entry must first be "bound" to an + * available cache page. Next the disk page must be "mapped" to the + * cache page. If the page is empty (never written) it is considered + * mapped trivially. Otherwise the cache page must be updated via + * "pagein" from disk. A bound and mapped page is called "resident". + * + * Updating a resident cache page makes it "dirty". A background + * clean-up process makes dirty pages "clean" via "pageout" to disk. + * Write ahead logging (WAL) of the page is done first i.e. UNDO log is + * flushed up to the page log sequence number (LSN) by calling a TSMAN + * method. The reason for this is obvious but not relevant to PGMAN. + * + * A local check point (LCP) periodically performs a complete pageout of + * dirty pages. It must iterate over a list which will cover all pages + * which had been dirty since LCP start. + * + * A clean page is a candidate ("victim") for being "unmapped" and + * "evicted" from the cache, to allow another page to become resident. + * This process is called "page replacement". + * + * PAGE REPLACEMENT + * + * Page replacement uses the LIRS algorithm (Jiang-Zhang). + * + * The "recency" of a page is the time between now and the last request + * for the page. The "inter-reference recency" (IRR) of a page is the + * time between the last 2 requests for the page. "Time" is advanced by + * request for any page. + * + * Page entries are divided into "hot" ("lir") and "cold" ("hir"). Here + * lir/hir refers to low/high IRR. Hot pages are always resident but + * cold pages need not be. + * + * Number of hot pages is limited to slightly less than number of cache + * pages. Until this number is reached, all used cache pages are hot. + * Then the algorithm described next is applied. The algorithm avoids + * storing any of the actual recency values. + * + * Primary data structure is the "stack". It contains all hot entries + * and recently referenced cold entries (resident or not). The stack is + * in recency order with most recent (lowest recency) entry on top. + * Entries which are less recent than the least recent hot page are + * removed ("stack pruning"). So the bottom page is always hot. + * + * The cold entries on the stack are undergoing a "trial period". If + * they are referenced soon again (see IRR), they become hot. Otherwise + * they fall off the bottom of the stack. + * + * Secondary data structure is the "queue". It contains all resident + * cold pages (on stack or not). When a hot page is removed from the + * stack it is added to the end of the queue. When page replacement + * needs a page it removes it from the front of the queue. + * + * Page requests cause the input entry to be inserted and updated in + * LIRS lists. Remember that an entry can be present on both stack and + * queue. The rules per type of input entry are: + * + * 1. Hot. Move input entry to stack top. If input entry was at stack + * bottom, do stack pruning. + * + * 2. Cold resident. Move input entry to stack top. Then: + * + * 2a. If input entry was on stack, change it to hot, remove it from + * queue, change stack bottom entry to cold and move the bottom entry to + * queue end, and do stack pruning. + * + * 2b. If input entry was on queue only, leave it cold but move it to + * end of queue. + * + * 3. Cold non-resident. Remove entry at queue front and evict it from + * the cache. If the evicted entry was on stack, it remains as unbound + * entry on stack, to continue its trial period. Map input entry to the + * freed cache page. Move input entry to stack top. Then: + * + * 3a. If input entry was on stack, change it to hot, change stack + * bottom entry to cold and move the bottom entry to queue end, and do + * stack pruning. + * + * 3b. If input entry was new, leave it cold but move it to end of + * queue. + * + * LIRS CHANGES + * + * In LIRS the 'resident' requirement is changed as follows: + * + * Stack entries, including hot ones, can have any state. Unbound stack + * entries are created by new requests and by pages evicted from queue + * front which are still on stack. + * + * Queue entries must be bound. They become resident and evictable + * within a finite time. A page is "evictable" if it is mapped, clean, + * and has no requests. + * + * An unbound entry which should be on queue is added there at bind + * time. Such entries are created when an unbound entry with open + * requests is popped (hot) or pruned (cold) from the stack. This can + * happen if the cache is too small. + * + * CLEANUP PROCESS + * + * LIRS (and related algorithms) do not address dirty pages. From above + * it is obvious that the clean-up process should process dirty queue + * entries proceeding from front to end. This also favors pages with + * lower LSN numbers which minimizes amount of WAL to write. + * + * In fact the clean-up process holds a permanent pointer into the queue + * where all entries strictly towards the front are clean. For such an + * entry to become dirty it must be referenced again which moves it to + * queue end and past the clean-up pointer. (In practice, until this + * works, cleanup recycles back to queue front). + * + * PAGE LISTS + * + * Page entries are put on a number of lists. + * + * 1. Hash table on (file_no, page_no). Used for fast lookup and for + * LCP to iterate over. + * + * The other lists are doubly-linked FIFOs. In general entries are + * added to the end (last entry) and processed from the front (first + * entry). When used as stack, end is top and front is bottom. + * + * 2. The LIRS stack and queue. These control page replacement. + * + * 3. Page entries are divided into disjoint "sublists" based on page + * "state" i.e. the set of page properties. Some sublists drive page + * processing and have next entry to process at the front. + * + * Current sublists are as follows. Those that drive processing are + * marked with a plus (+). + * + * SL_BIND + waiting for available buffer page + * SL_MAP + waiting to start pagein from disk + * SL_MAP_IO - above in i/o wait (the pagein) + * SL_CALLBACK + request done, waiting to invoke callbacks + * SL_CALLBACK_IO - above in i/o wait (pageout by cleanup) + * SL_BUSY - being written to by PGMAN client + * SL_LOCKED - permanently locked to cache + * SL_OTHER - default sublist + * + * PAGE PROCESSING + * + * Page processing uses a number independent continueB loops. + * + * 1. The "stats loop". Started at node start. Checks lists in debug + * mode. In the future could gather statistics and adjust parameters + * based on load. Continues via delay signal. + * + * 2. The "busy loop". Started by page request. Each loop does bind, + * map, and callback of a number of entries. Continues via no-delay + * signal until nothing to do. + * + * 3. The "cleanup loop". Started at node start. Each loop starts + * pageout of a number of dirty queue entries. Continues via delay + * signal. + * + * 4. The "LCP loop". Started periodically by NDB. Each loop starts + * pageout of a number of hash list entries. Continues via delay signal + * until done. + * + * SPECIAL CASES + * + * LOCKED pages are not put on stack or queue. They are flushed to disk + * by LCP but not by clean-up. + * + * A TUP scan is likely to access a page repeatedly within a short time. + * This can make the page hot when it should not be. Such "correlated + * requests" are handled by a request flag which modifies default LIRS + * processing. [fill in details later] + * + * Also PK operations make 2 rapid page references. The 2nd one is for + * commit. This too should be handled as a correlated request. + * + * CLIENT TSMAN + * + * TSMAN reads "meta" pages such as extent headers. There are currently + * "locked" forever in PGMAN cache. + * + * CLIENT DBTUP + * + * DBTUP works with copy pages (or UNDO buffers) in memory. The real + * page is updated only between page request with COMMIT_REQ flag and + * a subsequent LSN update. These need not occur in same timeslice + * since DBTUP may need to flush UNDO log in-between. + * + * The page is "busy" if any transaction is between COMMIT_REQ and LSN + * update. A busy page must be locked in buffer cache. No pageout of + * a busy page can be started by clean-up or LCP. + +