#At file:///home/inaam/w/lru_flush/ based on revid:marc.alff@stripped
3352 Inaam Rana 2011-08-10
WL5580: Changes to LRU flushing (InnoDB)
Approved by: Marko, Sunny
rb://589
This work is performance related. The idea is to offload the flushing
activity that happens in the LRU list from user threads to a
background thread, i.e. the page_cleaner. Also in scope is a
simpler and possibly better heuristic for LRU flushing.
Summary of Changes:
New Config Options:
===================
innodb_lru_scan_depth (default 1024): dynamic, min:100, max:~0
innodb_flush_neighbors (default TRUE): dynamic
innodb_doublewrite_batch_size (default 120): static, min:1, max:127
(undocumented, for internal testing only; enabled when
UNIV_PERF_DEBUG is defined)
New LRU flushing algorithm:
===========================
* LRU flushing happens only in the page_cleaner thread
* LRU flushing includes cleaning the tail of the LRU list AND putting
blocks on the free list
* When a user thread can't find a block in the free list or a clean block
in the tail of the LRU, it triggers a new type of flush called
BUF_FLUSH_SINGLE_PAGE, in which it tries to flush a single page from the
LRU list instead of triggering a batch.
Page eviction algorithm:
========================
* iteration 0:
  * get a block from free list, success: done
  * if there is an LRU flush batch in progress:
    * wait for batch to end: retry free list
  * if buf_pool->try_LRU_scan is set
    * scan LRU up to srv_LRU_scan_depth to find a clean block
    * the above will put the block on free list
    * success: retry the free list
  * flush one dirty page from tail of LRU to disk
    * the above will put the block on free list
    * success: retry the free list
* iteration 1:
  * same as iteration 0 except:
    * scan whole LRU list
    * scan LRU list even if buf_pool->try_LRU_scan is not set
* iteration > 1:
  * same as iteration 1 but sleep 100ms
Note that the potential convoy problem, where all user threads try to
find a clean page in the tail of the LRU list when there is none, is
resolved by introducing the buf_pool->try_LRU_scan flag, which is set to
TRUE when an LRU batch completes and to FALSE when an LRU scan fails to
find a clean page.
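The iterations above can be modelled with a small toy in Python. This is only a sketch under stated assumptions, not the InnoDB code: `ToyBufferPool`, `Page`, `get_free_block`, the `sleep` hook and the `max_iterations` bound are hypothetical names introduced for illustration, `scan_depth` stands in for srv_LRU_scan_depth, and the sketch assumes the single page flush uses the same scan depth as the clean-block scan.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Page:
    page_id: int
    dirty: bool = False


@dataclass
class ToyBufferPool:
    """Hypothetical stand-in for buf_pool; not the real InnoDB structure."""
    free: List[Page] = field(default_factory=list)
    lru: List[Page] = field(default_factory=list)  # index 0 is the LRU tail
    try_lru_scan: bool = True
    lru_batch_running: bool = False
    single_page_flushes: int = 0

    def wait_for_lru_batch(self) -> None:
        # The server would block until the page_cleaner signals; the toy
        # only records that the batch ended, which re-arms try_lru_scan.
        self.lru_batch_running = False
        self.try_lru_scan = True

    def scan_for_clean(self, depth: int) -> bool:
        """Move the first clean page within `depth` of the tail to free."""
        for i, page in enumerate(self.lru[:depth]):
            if not page.dirty:
                self.free.append(self.lru.pop(i))
                return True
        self.try_lru_scan = False  # failed scan: other threads skip it
        return False

    def flush_single_page(self, depth: int) -> bool:
        """Toy BUF_FLUSH_SINGLE_PAGE: write one dirty page, then free it."""
        for i, page in enumerate(self.lru[:depth]):
            if page.dirty:
                page.dirty = False  # pretend the write reached disk
                self.single_page_flushes += 1
                self.free.append(self.lru.pop(i))
                return True
        return False


def get_free_block(pool: ToyBufferPool, scan_depth: int,
                   sleep: Callable[[float], None] = time.sleep,
                   max_iterations: int = 10) -> Optional[Page]:
    for iteration in range(max_iterations):
        if iteration > 1:
            sleep(0.1)  # iteration > 1: same as iteration 1 plus 100 ms sleep
        if pool.free:
            return pool.free.pop()
        if pool.lru_batch_running:
            pool.wait_for_lru_batch()
            if pool.free:
                return pool.free.pop()
        # Iteration 0 honours the depth limit and the flag; later ones do not.
        depth = scan_depth if iteration == 0 else len(pool.lru)
        if (pool.try_lru_scan or iteration > 0) and pool.scan_for_clean(depth):
            return pool.free.pop()
        if pool.flush_single_page(depth):
            return pool.free.pop()
    return None  # the toy gives up; this bound is not from the source
```

In the toy, a failed scan clears `try_lru_scan`, so the next caller goes straight to the single page flush in iteration 0 instead of rescanning the tail, which is the convoy avoidance described above.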
Doublewrite buffer changes:
===========================
The doublewrite buffer is split into two parts. The first part is used
for batch flushing (e.g. LRU flushing and flush_list flushing) while
the second part is used for single page flushes. The logic for
batch flushing remains the same. For single page flushing we use a
flag to indicate whether a slot is in use, and we force a write to disk
right after writing to the doublewrite buffer.
There is an undocumented, hidden config parameter,
innodb_doublewrite_batch_size, which is visible only with
UNIV_PERF_DEBUG or UNIV_DEBUG. Its value determines how much of the
doublewrite buffer is used for batch flushing. The default is 120 and
allowable values are 1 - 127. It is a static variable.
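The split can be pictured with the following Python toy. It is a rough sketch, not the trx0sys.c implementation: `ToyDoublewrite` and its methods are hypothetical names, the total of 128 slots is taken from the batch size discussion in this message, and disk writes are modelled as entries appended to a list.

```python
DOUBLEWRITE_SLOTS = 128  # total doublewrite size, as stated in this message


class ToyDoublewrite:
    """Hypothetical model: batch part first, single page slots after it."""

    def __init__(self, batch_size=120):
        # Mirrors the allowed range of innodb_doublewrite_batch_size.
        if not 1 <= batch_size <= 127:
            raise ValueError("batch_size must be in 1..127")
        self.batch_size = batch_size
        self.batch_pages = []  # pages buffered for the next batch write
        self.single_in_use = [False] * (DOUBLEWRITE_SLOTS - batch_size)
        self.disk_writes = []  # (kind, page_ids) in the order they hit disk

    def add_to_batch(self, page_id):
        """Buffer a page; write the batch part out once it is full."""
        self.batch_pages.append(page_id)
        if len(self.batch_pages) == self.batch_size:
            self.flush_batch()

    def flush_batch(self):
        if self.batch_pages:
            self.disk_writes.append(("batch", tuple(self.batch_pages)))
            self.batch_pages = []

    def reserve_single_slot(self):
        """Claim a free single page slot, or None if all are in use."""
        for slot, busy in enumerate(self.single_in_use):
            if not busy:
                self.single_in_use[slot] = True
                return slot
        return None  # a real caller would wait and retry

    def write_single_page(self, slot, page_id):
        """Write one page and force it to disk at once, then free the slot."""
        self.disk_writes.append(("single", (page_id,)))
        self.single_in_use[slot] = False
```

With the default of 120, the toy leaves 8 slots for single page flushes; because the maximum is 127, at least one such slot always exists.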
LRU batch size:
===============
The size of an LRU batch depends on how deep we scan the LRU
list, i.e. innodb_lru_scan_depth. But since user threads wait for an
LRU batch to finish, and since the size of the doublewrite buffer is 128,
it makes sense to divide one big LRU batch into multiple chunks.
PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE == 100 does that, i.e. after flushing
100 pages the page cleaner signals waiting user threads to proceed and
grab a free page.
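As a rough illustration of the chunking, the Python sketch below splits one scan depth into chunks of 100 and calls a callback after each chunk. The function names and the `signal_waiters` callback are hypothetical, and the toy assumes every scanned page is flushed, which the real page cleaner does not guarantee.

```python
PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE = 100  # value quoted in this message


def lru_batch_chunks(scan_depth, chunk_size=PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE):
    """Yield the size of each chunk that makes up one LRU batch."""
    remaining = scan_depth
    while remaining > 0:
        n = min(chunk_size, remaining)
        yield n
        remaining -= n


def run_lru_batch(scan_depth, signal_waiters):
    """Process one batch chunk by chunk, signalling waiters after each."""
    flushed = 0
    for n in lru_batch_chunks(scan_depth):
        flushed += n  # stand-in for flushing or evicting up to n pages
        signal_waiters(flushed)  # waiting user threads may retry the free list
    return flushed
```

With the default innodb_lru_scan_depth of 1024 this gives ten chunks of 100 and a final chunk of 24, so a waiting user thread is woken after at most 100 pages rather than after the whole batch.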
added:
mysql-test/suite/sys_vars/r/innodb_flush_neighbors_basic.result
mysql-test/suite/sys_vars/r/innodb_lru_scan_depth_basic.result
mysql-test/suite/sys_vars/t/innodb_flush_neighbors_basic.test
mysql-test/suite/sys_vars/t/innodb_lru_scan_depth_basic.test
modified:
mysql-test/suite/innodb/r/innodb_monitor.result
mysql-test/suite/innodb/t/innodb_buffer_pool_load-master.opt
mysql-test/suite/sys_vars/r/all_vars.result
mysql-test/suite/sys_vars/t/all_vars.test
storage/innobase/btr/btr0sea.c
storage/innobase/buf/buf0buf.c
storage/innobase/buf/buf0flu.c
storage/innobase/buf/buf0lru.c
storage/innobase/buf/buf0rea.c
storage/innobase/handler/ha_innodb.cc
storage/innobase/ibuf/ibuf0ibuf.c
storage/innobase/include/buf0buf.h
storage/innobase/include/buf0buf.ic
storage/innobase/include/buf0flu.h
storage/innobase/include/buf0lru.h
storage/innobase/include/buf0types.h
storage/innobase/include/ibuf0ibuf.ic
storage/innobase/include/srv0mon.h
storage/innobase/include/srv0srv.h
storage/innobase/include/trx0sys.h
storage/innobase/log/log0log.c
storage/innobase/srv/srv0mon.c
storage/innobase/srv/srv0srv.c
storage/innobase/trx/trx0sys.c
=== modified file 'mysql-test/suite/innodb/r/innodb_monitor.result'
--- a/mysql-test/suite/innodb/r/innodb_monitor.result revid:marc.alff@stripped
+++ b/mysql-test/suite/innodb/r/innodb_monitor.result revid:inaam.rana@stripped
@@ -38,25 +38,47 @@ buffer_pages_written enabled
buffer_pages_read enabled
buffer_data_reads enabled
buffer_data_written enabled
-buffer_flush_adaptive_flushes disabled
-buffer_flush_adaptive_pages disabled
-buffer_flush_async_flushes disabled
-buffer_flush_async_pages disabled
-buffer_flush_sync_flushes disabled
-buffer_flush_sync_pages disabled
-buffer_flush_max_dirty_flushes disabled
-buffer_flush_max_dirty_pages disabled
-buffer_flush_free_margin_flushes disabled
-buffer_flush_free_margin_pages disabled
-buffer_flush_io_capacity_pct disabled
buffer_flush_batch_scanned disabled
buffer_flush_batch_num_scan disabled
buffer_flush_batch_scanned_per_call disabled
buffer_flush_batch_total_pages disabled
buffer_flush_batches disabled
buffer_flush_batch_pages disabled
-buffer_flush_by_lru disabled
-buffer_flush_by_list disabled
+buffer_flush_neighbor_total_pages disabled
+buffer_flush_neighbor disabled
+buffer_flush_neighbor_pages disabled
+buffer_flush_max_dirty_total_pages disabled
+buffer_flush_max_dirty disabled
+buffer_flush_max_dirty_pages disabled
+buffer_flush_adaptive_total_pages disabled
+buffer_flush_adaptive disabled
+buffer_flush_adaptive_pages disabled
+buffer_flush_async_total_pages disabled
+buffer_flush_async disabled
+buffer_flush_async_pages disabled
+buffer_flush_sync_total_pages disabled
+buffer_flush_sync disabled
+buffer_flush_sync_pages disabled
+buffer_flush_background_total_pages disabled
+buffer_flush_background disabled
+buffer_flush_background_pages disabled
+buffer_LRU_batch_scanned disabled
+buffer_LRU_batch_num_scan disabled
+buffer_LRU_batch_scanned_per_call disabled
+buffer_LRU_batch_total_pages disabled
+buffer_LRU_batches disabled
+buffer_LRU_batch_pages disabled
+buffer_LRU_single_flush_scanned disabled
+buffer_LRU_single_flush_num_scan disabled
+buffer_LRU_single_flush_scanned_per_call disabled
+buffer_LRU_single_flush_failure_count disabled
+buffer_LRU_get_free_search disabled
+buffer_LRU_search_scanned disabled
+buffer_LRU_search_num_scan disabled
+buffer_LRU_search_scanned_per_call disabled
+buffer_LRU_unzip_search_scanned disabled
+buffer_LRU_unzip_search_num_scan disabled
+buffer_LRU_unzip_search_scanned_per_call disabled
buffer_page_read_index_leaf disabled
buffer_page_read_index_non_leaf disabled
buffer_page_read_index_ibuf_leaf disabled
@@ -218,25 +240,47 @@ buffer_pages_written enabled
buffer_pages_read enabled
buffer_data_reads enabled
buffer_data_written enabled
-buffer_flush_adaptive_flushes enabled
-buffer_flush_adaptive_pages enabled
-buffer_flush_async_flushes enabled
-buffer_flush_async_pages enabled
-buffer_flush_sync_flushes enabled
-buffer_flush_sync_pages enabled
-buffer_flush_max_dirty_flushes enabled
-buffer_flush_max_dirty_pages enabled
-buffer_flush_free_margin_flushes enabled
-buffer_flush_free_margin_pages enabled
-buffer_flush_io_capacity_pct enabled
buffer_flush_batch_scanned enabled
buffer_flush_batch_num_scan enabled
buffer_flush_batch_scanned_per_call enabled
buffer_flush_batch_total_pages enabled
buffer_flush_batches enabled
buffer_flush_batch_pages enabled
-buffer_flush_by_lru enabled
-buffer_flush_by_list enabled
+buffer_flush_neighbor_total_pages enabled
+buffer_flush_neighbor enabled
+buffer_flush_neighbor_pages enabled
+buffer_flush_max_dirty_total_pages enabled
+buffer_flush_max_dirty enabled
+buffer_flush_max_dirty_pages enabled
+buffer_flush_adaptive_total_pages enabled
+buffer_flush_adaptive enabled
+buffer_flush_adaptive_pages enabled
+buffer_flush_async_total_pages enabled
+buffer_flush_async enabled
+buffer_flush_async_pages enabled
+buffer_flush_sync_total_pages enabled
+buffer_flush_sync enabled
+buffer_flush_sync_pages enabled
+buffer_flush_background_total_pages enabled
+buffer_flush_background enabled
+buffer_flush_background_pages enabled
+buffer_LRU_batch_scanned enabled
+buffer_LRU_batch_num_scan enabled
+buffer_LRU_batch_scanned_per_call enabled
+buffer_LRU_batch_total_pages enabled
+buffer_LRU_batches enabled
+buffer_LRU_batch_pages enabled
+buffer_LRU_single_flush_scanned enabled
+buffer_LRU_single_flush_num_scan enabled
+buffer_LRU_single_flush_scanned_per_call enabled
+buffer_LRU_single_flush_failure_count enabled
+buffer_LRU_get_free_search enabled
+buffer_LRU_search_scanned enabled
+buffer_LRU_search_num_scan enabled
+buffer_LRU_search_scanned_per_call enabled
+buffer_LRU_unzip_search_scanned enabled
+buffer_LRU_unzip_search_num_scan enabled
+buffer_LRU_unzip_search_scanned_per_call enabled
buffer_page_read_index_leaf enabled
buffer_page_read_index_non_leaf enabled
buffer_page_read_index_ibuf_leaf enabled
@@ -400,25 +444,47 @@ buffer_pages_written disabled
buffer_pages_read disabled
buffer_data_reads disabled
buffer_data_written disabled
-buffer_flush_adaptive_flushes disabled
-buffer_flush_adaptive_pages disabled
-buffer_flush_async_flushes disabled
-buffer_flush_async_pages disabled
-buffer_flush_sync_flushes disabled
-buffer_flush_sync_pages disabled
-buffer_flush_max_dirty_flushes disabled
-buffer_flush_max_dirty_pages disabled
-buffer_flush_free_margin_flushes disabled
-buffer_flush_free_margin_pages disabled
-buffer_flush_io_capacity_pct disabled
buffer_flush_batch_scanned disabled
buffer_flush_batch_num_scan disabled
buffer_flush_batch_scanned_per_call disabled
buffer_flush_batch_total_pages disabled
buffer_flush_batches disabled
buffer_flush_batch_pages disabled
-buffer_flush_by_lru disabled
-buffer_flush_by_list disabled
+buffer_flush_neighbor_total_pages disabled
+buffer_flush_neighbor disabled
+buffer_flush_neighbor_pages disabled
+buffer_flush_max_dirty_total_pages disabled
+buffer_flush_max_dirty disabled
+buffer_flush_max_dirty_pages disabled
+buffer_flush_adaptive_total_pages disabled
+buffer_flush_adaptive disabled
+buffer_flush_adaptive_pages disabled
+buffer_flush_async_total_pages disabled
+buffer_flush_async disabled
+buffer_flush_async_pages disabled
+buffer_flush_sync_total_pages disabled
+buffer_flush_sync disabled
+buffer_flush_sync_pages disabled
+buffer_flush_background_total_pages disabled
+buffer_flush_background disabled
+buffer_flush_background_pages disabled
+buffer_LRU_batch_scanned disabled
+buffer_LRU_batch_num_scan disabled
+buffer_LRU_batch_scanned_per_call disabled
+buffer_LRU_batch_total_pages disabled
+buffer_LRU_batches disabled
+buffer_LRU_batch_pages disabled
+buffer_LRU_single_flush_scanned disabled
+buffer_LRU_single_flush_num_scan disabled
+buffer_LRU_single_flush_scanned_per_call disabled
+buffer_LRU_single_flush_failure_count disabled
+buffer_LRU_get_free_search disabled
+buffer_LRU_search_scanned disabled
+buffer_LRU_search_num_scan disabled
+buffer_LRU_search_scanned_per_call disabled
+buffer_LRU_unzip_search_scanned disabled
+buffer_LRU_unzip_search_num_scan disabled
+buffer_LRU_unzip_search_scanned_per_call disabled
buffer_page_read_index_leaf disabled
buffer_page_read_index_non_leaf disabled
buffer_page_read_index_ibuf_leaf disabled
@@ -580,25 +646,47 @@ buffer_pages_written 0 disabled
buffer_pages_read 0 disabled
buffer_data_reads 0 disabled
buffer_data_written 0 disabled
-buffer_flush_adaptive_flushes 0 disabled
-buffer_flush_adaptive_pages 0 disabled
-buffer_flush_async_flushes 0 disabled
-buffer_flush_async_pages 0 disabled
-buffer_flush_sync_flushes 0 disabled
-buffer_flush_sync_pages 0 disabled
-buffer_flush_max_dirty_flushes 0 disabled
-buffer_flush_max_dirty_pages 0 disabled
-buffer_flush_free_margin_flushes 0 disabled
-buffer_flush_free_margin_pages 0 disabled
-buffer_flush_io_capacity_pct 0 disabled
buffer_flush_batch_scanned 0 disabled
buffer_flush_batch_num_scan 0 disabled
buffer_flush_batch_scanned_per_call 0 disabled
buffer_flush_batch_total_pages 0 disabled
buffer_flush_batches 0 disabled
buffer_flush_batch_pages 0 disabled
-buffer_flush_by_lru 0 disabled
-buffer_flush_by_list 0 disabled
+buffer_flush_neighbor_total_pages 0 disabled
+buffer_flush_neighbor 0 disabled
+buffer_flush_neighbor_pages 0 disabled
+buffer_flush_max_dirty_total_pages 0 disabled
+buffer_flush_max_dirty 0 disabled
+buffer_flush_max_dirty_pages 0 disabled
+buffer_flush_adaptive_total_pages 0 disabled
+buffer_flush_adaptive 0 disabled
+buffer_flush_adaptive_pages 0 disabled
+buffer_flush_async_total_pages 0 disabled
+buffer_flush_async 0 disabled
+buffer_flush_async_pages 0 disabled
+buffer_flush_sync_total_pages 0 disabled
+buffer_flush_sync 0 disabled
+buffer_flush_sync_pages 0 disabled
+buffer_flush_background_total_pages 0 disabled
+buffer_flush_background 0 disabled
+buffer_flush_background_pages 0 disabled
+buffer_LRU_batch_scanned 0 disabled
+buffer_LRU_batch_num_scan 0 disabled
+buffer_LRU_batch_scanned_per_call 0 disabled
+buffer_LRU_batch_total_pages 0 disabled
+buffer_LRU_batches 0 disabled
+buffer_LRU_batch_pages 0 disabled
+buffer_LRU_single_flush_scanned 0 disabled
+buffer_LRU_single_flush_num_scan 0 disabled
+buffer_LRU_single_flush_scanned_per_call 0 disabled
+buffer_LRU_single_flush_failure_count 0 disabled
+buffer_LRU_get_free_search 0 disabled
+buffer_LRU_search_scanned 0 disabled
+buffer_LRU_search_num_scan 0 disabled
+buffer_LRU_search_scanned_per_call 0 disabled
+buffer_LRU_unzip_search_scanned 0 disabled
+buffer_LRU_unzip_search_num_scan 0 disabled
+buffer_LRU_unzip_search_scanned_per_call 0 disabled
buffer_page_read_index_leaf 0 disabled
buffer_page_read_index_non_leaf 0 disabled
buffer_page_read_index_ibuf_leaf 0 disabled
@@ -814,25 +902,47 @@ buffer_pages_written enabled
buffer_pages_read enabled
buffer_data_reads enabled
buffer_data_written enabled
-buffer_flush_adaptive_flushes enabled
-buffer_flush_adaptive_pages enabled
-buffer_flush_async_flushes enabled
-buffer_flush_async_pages enabled
-buffer_flush_sync_flushes enabled
-buffer_flush_sync_pages enabled
-buffer_flush_max_dirty_flushes enabled
-buffer_flush_max_dirty_pages enabled
-buffer_flush_free_margin_flushes enabled
-buffer_flush_free_margin_pages enabled
-buffer_flush_io_capacity_pct enabled
buffer_flush_batch_scanned enabled
buffer_flush_batch_num_scan enabled
buffer_flush_batch_scanned_per_call enabled
buffer_flush_batch_total_pages enabled
buffer_flush_batches enabled
buffer_flush_batch_pages enabled
-buffer_flush_by_lru enabled
-buffer_flush_by_list enabled
+buffer_flush_neighbor_total_pages enabled
+buffer_flush_neighbor enabled
+buffer_flush_neighbor_pages enabled
+buffer_flush_max_dirty_total_pages enabled
+buffer_flush_max_dirty enabled
+buffer_flush_max_dirty_pages enabled
+buffer_flush_adaptive_total_pages enabled
+buffer_flush_adaptive enabled
+buffer_flush_adaptive_pages enabled
+buffer_flush_async_total_pages enabled
+buffer_flush_async enabled
+buffer_flush_async_pages enabled
+buffer_flush_sync_total_pages enabled
+buffer_flush_sync enabled
+buffer_flush_sync_pages enabled
+buffer_flush_background_total_pages enabled
+buffer_flush_background enabled
+buffer_flush_background_pages enabled
+buffer_LRU_batch_scanned enabled
+buffer_LRU_batch_num_scan enabled
+buffer_LRU_batch_scanned_per_call enabled
+buffer_LRU_batch_total_pages enabled
+buffer_LRU_batches enabled
+buffer_LRU_batch_pages enabled
+buffer_LRU_single_flush_scanned enabled
+buffer_LRU_single_flush_num_scan enabled
+buffer_LRU_single_flush_scanned_per_call enabled
+buffer_LRU_single_flush_failure_count enabled
+buffer_LRU_get_free_search enabled
+buffer_LRU_search_scanned enabled
+buffer_LRU_search_num_scan enabled
+buffer_LRU_search_scanned_per_call enabled
+buffer_LRU_unzip_search_scanned enabled
+buffer_LRU_unzip_search_num_scan enabled
+buffer_LRU_unzip_search_scanned_per_call enabled
buffer_page_read_index_leaf enabled
buffer_page_read_index_non_leaf enabled
buffer_page_read_index_ibuf_leaf enabled
@@ -994,25 +1104,47 @@ buffer_pages_written disabled
buffer_pages_read disabled
buffer_data_reads disabled
buffer_data_written disabled
-buffer_flush_adaptive_flushes disabled
-buffer_flush_adaptive_pages disabled
-buffer_flush_async_flushes disabled
-buffer_flush_async_pages disabled
-buffer_flush_sync_flushes disabled
-buffer_flush_sync_pages disabled
-buffer_flush_max_dirty_flushes disabled
-buffer_flush_max_dirty_pages disabled
-buffer_flush_free_margin_flushes disabled
-buffer_flush_free_margin_pages disabled
-buffer_flush_io_capacity_pct disabled
buffer_flush_batch_scanned disabled
buffer_flush_batch_num_scan disabled
buffer_flush_batch_scanned_per_call disabled
buffer_flush_batch_total_pages disabled
buffer_flush_batches disabled
buffer_flush_batch_pages disabled
-buffer_flush_by_lru disabled
-buffer_flush_by_list disabled
+buffer_flush_neighbor_total_pages disabled
+buffer_flush_neighbor disabled
+buffer_flush_neighbor_pages disabled
+buffer_flush_max_dirty_total_pages disabled
+buffer_flush_max_dirty disabled
+buffer_flush_max_dirty_pages disabled
+buffer_flush_adaptive_total_pages disabled
+buffer_flush_adaptive disabled
+buffer_flush_adaptive_pages disabled
+buffer_flush_async_total_pages disabled
+buffer_flush_async disabled
+buffer_flush_async_pages disabled
+buffer_flush_sync_total_pages disabled
+buffer_flush_sync disabled
+buffer_flush_sync_pages disabled
+buffer_flush_background_total_pages disabled
+buffer_flush_background disabled
+buffer_flush_background_pages disabled
+buffer_LRU_batch_scanned disabled
+buffer_LRU_batch_num_scan disabled
+buffer_LRU_batch_scanned_per_call disabled
+buffer_LRU_batch_total_pages disabled
+buffer_LRU_batches disabled
+buffer_LRU_batch_pages disabled
+buffer_LRU_single_flush_scanned disabled
+buffer_LRU_single_flush_num_scan disabled
+buffer_LRU_single_flush_scanned_per_call disabled
+buffer_LRU_single_flush_failure_count disabled
+buffer_LRU_get_free_search disabled
+buffer_LRU_search_scanned disabled
+buffer_LRU_search_num_scan disabled
+buffer_LRU_search_scanned_per_call disabled
+buffer_LRU_unzip_search_scanned disabled
+buffer_LRU_unzip_search_num_scan disabled
+buffer_LRU_unzip_search_scanned_per_call disabled
buffer_page_read_index_leaf disabled
buffer_page_read_index_non_leaf disabled
buffer_page_read_index_ibuf_leaf disabled
@@ -1174,25 +1306,47 @@ buffer_pages_written enabled
buffer_pages_read enabled
buffer_data_reads enabled
buffer_data_written enabled
-buffer_flush_adaptive_flushes enabled
-buffer_flush_adaptive_pages enabled
-buffer_flush_async_flushes enabled
-buffer_flush_async_pages enabled
-buffer_flush_sync_flushes enabled
-buffer_flush_sync_pages enabled
-buffer_flush_max_dirty_flushes enabled
-buffer_flush_max_dirty_pages enabled
-buffer_flush_free_margin_flushes enabled
-buffer_flush_free_margin_pages enabled
-buffer_flush_io_capacity_pct enabled
buffer_flush_batch_scanned enabled
buffer_flush_batch_num_scan enabled
buffer_flush_batch_scanned_per_call enabled
buffer_flush_batch_total_pages enabled
buffer_flush_batches enabled
buffer_flush_batch_pages enabled
-buffer_flush_by_lru enabled
-buffer_flush_by_list enabled
+buffer_flush_neighbor_total_pages enabled
+buffer_flush_neighbor enabled
+buffer_flush_neighbor_pages enabled
+buffer_flush_max_dirty_total_pages enabled
+buffer_flush_max_dirty enabled
+buffer_flush_max_dirty_pages enabled
+buffer_flush_adaptive_total_pages enabled
+buffer_flush_adaptive enabled
+buffer_flush_adaptive_pages enabled
+buffer_flush_async_total_pages enabled
+buffer_flush_async enabled
+buffer_flush_async_pages enabled
+buffer_flush_sync_total_pages enabled
+buffer_flush_sync enabled
+buffer_flush_sync_pages enabled
+buffer_flush_background_total_pages enabled
+buffer_flush_background enabled
+buffer_flush_background_pages enabled
+buffer_LRU_batch_scanned enabled
+buffer_LRU_batch_num_scan enabled
+buffer_LRU_batch_scanned_per_call enabled
+buffer_LRU_batch_total_pages enabled
+buffer_LRU_batches enabled
+buffer_LRU_batch_pages enabled
+buffer_LRU_single_flush_scanned enabled
+buffer_LRU_single_flush_num_scan enabled
+buffer_LRU_single_flush_scanned_per_call enabled
+buffer_LRU_single_flush_failure_count enabled
+buffer_LRU_get_free_search enabled
+buffer_LRU_search_scanned enabled
+buffer_LRU_search_num_scan enabled
+buffer_LRU_search_scanned_per_call enabled
+buffer_LRU_unzip_search_scanned enabled
+buffer_LRU_unzip_search_num_scan enabled
+buffer_LRU_unzip_search_scanned_per_call enabled
buffer_page_read_index_leaf enabled
buffer_page_read_index_non_leaf enabled
buffer_page_read_index_ibuf_leaf enabled
@@ -1354,25 +1508,47 @@ buffer_pages_written disabled
buffer_pages_read disabled
buffer_data_reads disabled
buffer_data_written disabled
-buffer_flush_adaptive_flushes disabled
-buffer_flush_adaptive_pages disabled
-buffer_flush_async_flushes disabled
-buffer_flush_async_pages disabled
-buffer_flush_sync_flushes disabled
-buffer_flush_sync_pages disabled
-buffer_flush_max_dirty_flushes disabled
-buffer_flush_max_dirty_pages disabled
-buffer_flush_free_margin_flushes disabled
-buffer_flush_free_margin_pages disabled
-buffer_flush_io_capacity_pct disabled
buffer_flush_batch_scanned disabled
buffer_flush_batch_num_scan disabled
buffer_flush_batch_scanned_per_call disabled
buffer_flush_batch_total_pages disabled
buffer_flush_batches disabled
buffer_flush_batch_pages disabled
-buffer_flush_by_lru disabled
-buffer_flush_by_list disabled
+buffer_flush_neighbor_total_pages disabled
+buffer_flush_neighbor disabled
+buffer_flush_neighbor_pages disabled
+buffer_flush_max_dirty_total_pages disabled
+buffer_flush_max_dirty disabled
+buffer_flush_max_dirty_pages disabled
+buffer_flush_adaptive_total_pages disabled
+buffer_flush_adaptive disabled
+buffer_flush_adaptive_pages disabled
+buffer_flush_async_total_pages disabled
+buffer_flush_async disabled
+buffer_flush_async_pages disabled
+buffer_flush_sync_total_pages disabled
+buffer_flush_sync disabled
+buffer_flush_sync_pages disabled
+buffer_flush_background_total_pages disabled
+buffer_flush_background disabled
+buffer_flush_background_pages disabled
+buffer_LRU_batch_scanned disabled
+buffer_LRU_batch_num_scan disabled
+buffer_LRU_batch_scanned_per_call disabled
+buffer_LRU_batch_total_pages disabled
+buffer_LRU_batches disabled
+buffer_LRU_batch_pages disabled
+buffer_LRU_single_flush_scanned disabled
+buffer_LRU_single_flush_num_scan disabled
+buffer_LRU_single_flush_scanned_per_call disabled
+buffer_LRU_single_flush_failure_count disabled
+buffer_LRU_get_free_search disabled
+buffer_LRU_search_scanned disabled
+buffer_LRU_search_num_scan disabled
+buffer_LRU_search_scanned_per_call disabled
+buffer_LRU_unzip_search_scanned disabled
+buffer_LRU_unzip_search_num_scan disabled
+buffer_LRU_unzip_search_scanned_per_call disabled
buffer_page_read_index_leaf disabled
buffer_page_read_index_non_leaf disabled
buffer_page_read_index_ibuf_leaf disabled
@@ -1534,25 +1710,47 @@ buffer_pages_written disabled
buffer_pages_read disabled
buffer_data_reads disabled
buffer_data_written disabled
-buffer_flush_adaptive_flushes disabled
-buffer_flush_adaptive_pages disabled
-buffer_flush_async_flushes disabled
-buffer_flush_async_pages disabled
-buffer_flush_sync_flushes disabled
-buffer_flush_sync_pages disabled
-buffer_flush_max_dirty_flushes disabled
-buffer_flush_max_dirty_pages disabled
-buffer_flush_free_margin_flushes disabled
-buffer_flush_free_margin_pages disabled
-buffer_flush_io_capacity_pct disabled
buffer_flush_batch_scanned disabled
buffer_flush_batch_num_scan disabled
buffer_flush_batch_scanned_per_call disabled
buffer_flush_batch_total_pages disabled
buffer_flush_batches disabled
buffer_flush_batch_pages disabled
-buffer_flush_by_lru disabled
-buffer_flush_by_list disabled
+buffer_flush_neighbor_total_pages disabled
+buffer_flush_neighbor disabled
+buffer_flush_neighbor_pages disabled
+buffer_flush_max_dirty_total_pages disabled
+buffer_flush_max_dirty disabled
+buffer_flush_max_dirty_pages disabled
+buffer_flush_adaptive_total_pages disabled
+buffer_flush_adaptive disabled
+buffer_flush_adaptive_pages disabled
+buffer_flush_async_total_pages disabled
+buffer_flush_async disabled
+buffer_flush_async_pages disabled
+buffer_flush_sync_total_pages disabled
+buffer_flush_sync disabled
+buffer_flush_sync_pages disabled
+buffer_flush_background_total_pages disabled
+buffer_flush_background disabled
+buffer_flush_background_pages disabled
+buffer_LRU_batch_scanned disabled
+buffer_LRU_batch_num_scan disabled
+buffer_LRU_batch_scanned_per_call disabled
+buffer_LRU_batch_total_pages disabled
+buffer_LRU_batches disabled
+buffer_LRU_batch_pages disabled
+buffer_LRU_single_flush_scanned disabled
+buffer_LRU_single_flush_num_scan disabled
+buffer_LRU_single_flush_scanned_per_call disabled
+buffer_LRU_single_flush_failure_count disabled
+buffer_LRU_get_free_search disabled
+buffer_LRU_search_scanned disabled
+buffer_LRU_search_num_scan disabled
+buffer_LRU_search_scanned_per_call disabled
+buffer_LRU_unzip_search_scanned disabled
+buffer_LRU_unzip_search_num_scan disabled
+buffer_LRU_unzip_search_scanned_per_call disabled
buffer_page_read_index_leaf disabled
buffer_page_read_index_non_leaf disabled
buffer_page_read_index_ibuf_leaf disabled
=== modified file 'mysql-test/suite/innodb/t/innodb_buffer_pool_load-master.opt'
--- a/mysql-test/suite/innodb/t/innodb_buffer_pool_load-master.opt revid:marc.alff@stripped
+++ b/mysql-test/suite/innodb/t/innodb_buffer_pool_load-master.opt revid:inaam.rana@stripped
@@ -1 +1 @@
---innodb-buffer-pool-size=16M
+--innodb-buffer-pool-size=64M
=== modified file 'mysql-test/suite/sys_vars/r/all_vars.result'
--- a/mysql-test/suite/sys_vars/r/all_vars.result revid:marc.alff@stripped
+++ b/mysql-test/suite/sys_vars/r/all_vars.result revid:inaam.rana@stripped
@@ -5,6 +5,7 @@ insert into t2 select variable_name from
insert into t2 select variable_name from information_schema.session_variables;
delete from t2 where variable_name='innodb_change_buffering_debug';
delete from t2 where variable_name='innodb_page_hash_locks';
+delete from t2 where variable_name='innodb_doublewrite_batch_size';
update t2 set variable_name= replace(variable_name, "PERFORMANCE_SCHEMA_", "PFS_");
select variable_name as `There should be *no* long test name listed below:` from t2
where length(variable_name) > 50;
=== added file 'mysql-test/suite/sys_vars/r/innodb_flush_neighbors_basic.result'
--- a/mysql-test/suite/sys_vars/r/innodb_flush_neighbors_basic.result 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/sys_vars/r/innodb_flush_neighbors_basic.result revid:inaam.rana@stripped
@@ -0,0 +1,92 @@
+SET @start_global_value = @@global.innodb_flush_neighbors;
+SELECT @start_global_value;
+@start_global_value
+1
+Valid values are 'ON' and 'OFF'
+select @@global.innodb_flush_neighbors in (0, 1);
+@@global.innodb_flush_neighbors in (0, 1)
+1
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+1
+select @@session.innodb_flush_neighbors;
+ERROR HY000: Variable 'innodb_flush_neighbors' is a GLOBAL variable
+show global variables like 'innodb_flush_neighbors';
+Variable_name Value
+innodb_flush_neighbors ON
+show session variables like 'innodb_flush_neighbors';
+Variable_name Value
+innodb_flush_neighbors ON
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS ON
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS ON
+set global innodb_flush_neighbors='OFF';
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+0
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS OFF
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS OFF
+set @@global.innodb_flush_neighbors=1;
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+1
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS ON
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS ON
+set global innodb_flush_neighbors=0;
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+0
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS OFF
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS OFF
+set @@global.innodb_flush_neighbors='ON';
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+1
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS ON
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS ON
+set session innodb_flush_neighbors='OFF';
+ERROR HY000: Variable 'innodb_flush_neighbors' is a GLOBAL variable and should be set with SET GLOBAL
+set @@session.innodb_flush_neighbors='ON';
+ERROR HY000: Variable 'innodb_flush_neighbors' is a GLOBAL variable and should be set with SET GLOBAL
+set global innodb_flush_neighbors=1.1;
+ERROR 42000: Incorrect argument type to variable 'innodb_flush_neighbors'
+set global innodb_flush_neighbors=1e1;
+ERROR 42000: Incorrect argument type to variable 'innodb_flush_neighbors'
+set global innodb_flush_neighbors=2;
+ERROR 42000: Variable 'innodb_flush_neighbors' can't be set to the value of '2'
+NOTE: The following should fail with ER_WRONG_VALUE_FOR_VAR (BUG#50643)
+set global innodb_flush_neighbors=-3;
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+1
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS ON
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS ON
+set global innodb_flush_neighbors='AUTO';
+ERROR 42000: Variable 'innodb_flush_neighbors' can't be set to the value of 'AUTO'
+SET @@global.innodb_flush_neighbors = @start_global_value;
+SELECT @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+1
=== added file 'mysql-test/suite/sys_vars/r/innodb_lru_scan_depth_basic.result'
--- a/mysql-test/suite/sys_vars/r/innodb_lru_scan_depth_basic.result 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/sys_vars/r/innodb_lru_scan_depth_basic.result revid:inaam.rana@stripped
@@ -0,0 +1,69 @@
+SET @start_global_value = @@global.innodb_lru_scan_depth;
+SELECT @start_global_value;
+@start_global_value
+1024
+Valid value 128 or more
+select @@global.innodb_lru_scan_depth > 127;
+@@global.innodb_lru_scan_depth > 127
+1
+select @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+1024
+select @@session.innodb_lru_scan_depth;
+ERROR HY000: Variable 'innodb_lru_scan_depth' is a GLOBAL variable
+show global variables like 'innodb_lru_scan_depth';
+Variable_name Value
+innodb_lru_scan_depth 1024
+show session variables like 'innodb_lru_scan_depth';
+Variable_name Value
+innodb_lru_scan_depth 1024
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH 1024
+select * from information_schema.session_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH 1024
+set global innodb_lru_scan_depth=325;
+select @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+325
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH 325
+select * from information_schema.session_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH 325
+set session innodb_lru_scan_depth=444;
+ERROR HY000: Variable 'innodb_lru_scan_depth' is a GLOBAL variable and should be set with SET GLOBAL
+set global innodb_lru_scan_depth=1.1;
+ERROR 42000: Incorrect argument type to variable 'innodb_lru_scan_depth'
+set global innodb_lru_scan_depth=1e1;
+ERROR 42000: Incorrect argument type to variable 'innodb_lru_scan_depth'
+set global innodb_lru_scan_depth="foo";
+ERROR 42000: Incorrect argument type to variable 'innodb_lru_scan_depth'
+set global innodb_lru_scan_depth=7;
+Warnings:
+Warning 1292 Truncated incorrect innodb_lru_scan_depth value: '7'
+select @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+100
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH 100
+set global innodb_lru_scan_depth=-7;
+Warnings:
+Warning 1292 Truncated incorrect innodb_lru_scan_depth value: '-7'
+select @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+100
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH 100
+set global innodb_lru_scan_depth=128;
+select @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+128
+SET @@global.innodb_lru_scan_depth = @start_global_value;
+SELECT @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+1024
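The recorded results above show that a request below the minimum (7 or -7) is accepted with a "Truncated incorrect" warning and stored as 100, while in-range values such as 325 and 128 are stored unchanged. A minimal sketch of that clamping rule follows. The helper name and signature are hypothetical and are not part of the server's sys_var code; only the lower bound of 100 comes from this commit.

```c
#include <stdbool.h>

/* Lower bound taken from the commit message (min:100). */
#define LRU_SCAN_DEPTH_MIN	100L

/* Hypothetical helper, not part of the server: returns the value that
would be stored and sets *truncated when the request had to be raised
to the minimum (the case that produces warning 1292 in the results). */
static long
lru_scan_depth_clamp(long requested, bool* truncated)
{
	*truncated = requested < LRU_SCAN_DEPTH_MIN;

	return(*truncated ? LRU_SCAN_DEPTH_MIN : requested);
}
```

The upper bound is left out on purpose, since the results above do not exercise it.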
=== modified file 'mysql-test/suite/sys_vars/t/all_vars.test'
--- a/mysql-test/suite/sys_vars/t/all_vars.test revid:marc.alff@stripped
+++ b/mysql-test/suite/sys_vars/t/all_vars.test revid:inaam.rana@stripped
@@ -70,6 +70,7 @@ insert into t2 select variable_name from
# These are only present in debug builds.
delete from t2 where variable_name='innodb_change_buffering_debug';
delete from t2 where variable_name='innodb_page_hash_locks';
+delete from t2 where variable_name='innodb_doublewrite_batch_size';
# Performance schema variables are too long for files named
# 'mysql-test/suite/sys_vars/t/' ...
=== added file 'mysql-test/suite/sys_vars/t/innodb_flush_neighbors_basic.test'
--- a/mysql-test/suite/sys_vars/t/innodb_flush_neighbors_basic.test 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/sys_vars/t/innodb_flush_neighbors_basic.test revid:inaam.rana@stripped
@@ -0,0 +1,70 @@
+
+
+# 2011-02-23 - Added
+#
+
+--source include/have_innodb.inc
+
+SET @start_global_value = @@global.innodb_flush_neighbors;
+SELECT @start_global_value;
+
+#
+# exists as global only
+#
+--echo Valid values are 'ON' and 'OFF'
+select @@global.innodb_flush_neighbors in (0, 1);
+select @@global.innodb_flush_neighbors;
+--error ER_INCORRECT_GLOBAL_LOCAL_VAR
+select @@session.innodb_flush_neighbors;
+show global variables like 'innodb_flush_neighbors';
+show session variables like 'innodb_flush_neighbors';
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+
+#
+# show that it's writable
+#
+set global innodb_flush_neighbors='OFF';
+select @@global.innodb_flush_neighbors;
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+set @@global.innodb_flush_neighbors=1;
+select @@global.innodb_flush_neighbors;
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+set global innodb_flush_neighbors=0;
+select @@global.innodb_flush_neighbors;
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+set @@global.innodb_flush_neighbors='ON';
+select @@global.innodb_flush_neighbors;
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+--error ER_GLOBAL_VARIABLE
+set session innodb_flush_neighbors='OFF';
+--error ER_GLOBAL_VARIABLE
+set @@session.innodb_flush_neighbors='ON';
+
+#
+# incorrect types
+#
+--error ER_WRONG_TYPE_FOR_VAR
+set global innodb_flush_neighbors=1.1;
+--error ER_WRONG_TYPE_FOR_VAR
+set global innodb_flush_neighbors=1e1;
+--error ER_WRONG_VALUE_FOR_VAR
+set global innodb_flush_neighbors=2;
+--echo NOTE: The following should fail with ER_WRONG_VALUE_FOR_VAR (BUG#50643)
+set global innodb_flush_neighbors=-3;
+select @@global.innodb_flush_neighbors;
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+--error ER_WRONG_VALUE_FOR_VAR
+set global innodb_flush_neighbors='AUTO';
+
+#
+# Cleanup
+#
+
+SET @@global.innodb_flush_neighbors = @start_global_value;
+SELECT @@global.innodb_flush_neighbors;
=== added file 'mysql-test/suite/sys_vars/t/innodb_lru_scan_depth_basic.test'
--- a/mysql-test/suite/sys_vars/t/innodb_lru_scan_depth_basic.test 1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/sys_vars/t/innodb_lru_scan_depth_basic.test revid:inaam.rana@stripped
@@ -0,0 +1,58 @@
+
+
+# 2011-02-23 - Added
+#
+
+--source include/have_innodb.inc
+
+SET @start_global_value = @@global.innodb_lru_scan_depth;
+SELECT @start_global_value;
+
+#
+# exists as global only
+#
+--echo Valid value 128 or more
+select @@global.innodb_lru_scan_depth > 127;
+select @@global.innodb_lru_scan_depth;
+--error ER_INCORRECT_GLOBAL_LOCAL_VAR
+select @@session.innodb_lru_scan_depth;
+show global variables like 'innodb_lru_scan_depth';
+show session variables like 'innodb_lru_scan_depth';
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+select * from information_schema.session_variables where variable_name='innodb_lru_scan_depth';
+
+#
+# show that it's writable
+#
+set global innodb_lru_scan_depth=325;
+select @@global.innodb_lru_scan_depth;
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+select * from information_schema.session_variables where variable_name='innodb_lru_scan_depth';
+--error ER_GLOBAL_VARIABLE
+set session innodb_lru_scan_depth=444;
+
+#
+# incorrect types
+#
+--error ER_WRONG_TYPE_FOR_VAR
+set global innodb_lru_scan_depth=1.1;
+--error ER_WRONG_TYPE_FOR_VAR
+set global innodb_lru_scan_depth=1e1;
+--error ER_WRONG_TYPE_FOR_VAR
+set global innodb_lru_scan_depth="foo";
+
+set global innodb_lru_scan_depth=7;
+select @@global.innodb_lru_scan_depth;
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+set global innodb_lru_scan_depth=-7;
+select @@global.innodb_lru_scan_depth;
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+
+#
+# min/max values
+#
+set global innodb_lru_scan_depth=128;
+select @@global.innodb_lru_scan_depth;
+
+SET @@global.innodb_lru_scan_depth = @start_global_value;
+SELECT @@global.innodb_lru_scan_depth;
=== modified file 'storage/innobase/btr/btr0sea.c'
--- a/storage/innobase/btr/btr0sea.c revid:marc.alff@stripped
+++ b/storage/innobase/btr/btr0sea.c revid:inaam.rana@stripped
@@ -1034,7 +1034,11 @@ btr_search_drop_page_hash_index(
buf_block_t* block) /*!< in: block containing index page,
s- or x-latched, or an index page
for which we know that
- block->buf_fix_count == 0 */
+ block->buf_fix_count == 0 or it is an
+ index page which has already been
+ removed from the buf_pool->page_hash
+ i.e.: it is in state
+ BUF_BLOCK_REMOVE_HASH */
{
hash_table_t* table;
ulint n_fields;
@@ -1082,7 +1086,8 @@ retry:
#ifdef UNIV_SYNC_DEBUG
ut_ad(rw_lock_own(&(block->lock), RW_LOCK_SHARED)
|| rw_lock_own(&(block->lock), RW_LOCK_EX)
- || (block->page.buf_fix_count == 0));
+ || block->page.buf_fix_count == 0
+ || buf_block_get_state(block) == BUF_BLOCK_REMOVE_HASH);
#endif /* UNIV_SYNC_DEBUG */
n_fields = block->curr_n_fields;
=== modified file 'storage/innobase/buf/buf0buf.c'
--- a/storage/innobase/buf/buf0buf.c revid:marc.alff@stripped
+++ b/storage/innobase/buf/buf0buf.c revid:inaam.rana@stripped
@@ -1207,6 +1207,8 @@ buf_pool_init_instance(
/* All fields are initialized by mem_zalloc(). */
+ buf_pool->try_LRU_scan = TRUE;
+
buf_pool_mutex_exit(buf_pool);
return(DB_SUCCESS);
@@ -3683,9 +3685,6 @@ buf_page_create(
ibuf_merge_or_delete_for_page(NULL, space, offset, zip_size, TRUE);
- /* Flush pages from the end of the LRU list if necessary */
- buf_flush_free_margin(buf_pool);
-
frame = block->frame;
memset(frame + FIL_PAGE_PREV, 0xff, 4);
@@ -4075,7 +4074,6 @@ buf_pool_invalidate_instance(
/*=========================*/
buf_pool_t* buf_pool) /*!< in: buffer pool instance */
{
- ibool freed;
enum buf_flush i;
buf_pool_mutex_enter(buf_pool);
@@ -4104,21 +4102,17 @@ buf_pool_invalidate_instance(
ut_ad(buf_all_freed_instance(buf_pool));
- freed = TRUE;
+ buf_pool_mutex_enter(buf_pool);
- while (freed) {
- freed = buf_LRU_search_and_free_block(buf_pool, 100);
+ while (buf_LRU_scan_and_free_block(buf_pool, TRUE)) {
}
- buf_pool_mutex_enter(buf_pool);
-
ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0);
ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0);
buf_pool->freed_page_clock = 0;
buf_pool->LRU_old = NULL;
buf_pool->LRU_old_len = 0;
- buf_pool->LRU_flush_ended = 0;
memset(&buf_pool->stat, 0x00, sizeof(buf_pool->stat));
buf_refresh_io_stats(buf_pool);
@@ -4156,6 +4150,7 @@ buf_pool_validate_instance(
buf_chunk_t* chunk;
ulint i;
ulint n_lru_flush = 0;
+ ulint n_page_flush = 0;
ulint n_list_flush = 0;
ulint n_lru = 0;
ulint n_flush = 0;
@@ -4219,9 +4214,13 @@ buf_pool_validate_instance(
&block->page)) {
case BUF_FLUSH_LRU:
n_lru_flush++;
+ goto assert_s_latched;
+ case BUF_FLUSH_SINGLE_PAGE:
+ n_page_flush++;
+assert_s_latched:
ut_a(rw_lock_is_locked(
&block->lock,
- RW_LOCK_SHARED));
+ RW_LOCK_SHARED));
break;
case BUF_FLUSH_LIST:
n_list_flush++;
@@ -4312,6 +4311,9 @@ buf_pool_validate_instance(
case BUF_FLUSH_LRU:
n_lru_flush++;
break;
+ case BUF_FLUSH_SINGLE_PAGE:
+ n_page_flush++;
+ break;
case BUF_FLUSH_LIST:
n_list_flush++;
break;
@@ -4362,6 +4364,7 @@ buf_pool_validate_instance(
ut_a(buf_pool->n_flush[BUF_FLUSH_LIST] == n_list_flush);
ut_a(buf_pool->n_flush[BUF_FLUSH_LRU] == n_lru_flush);
+ ut_a(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] == n_page_flush);
buf_pool_mutex_exit(buf_pool);
@@ -4429,7 +4432,7 @@ buf_print_instance(
"modified database pages %lu\n"
"n pending decompressions %lu\n"
"n pending reads %lu\n"
- "n pending flush LRU %lu list %lu\n"
+ "n pending flush LRU %lu list %lu single page %lu\n"
"pages made young %lu, not young %lu\n"
"pages read %lu, created %lu, written %lu\n",
(ulong) size,
@@ -4440,6 +4443,7 @@ buf_print_instance(
(ulong) buf_pool->n_pend_reads,
(ulong) buf_pool->n_flush[BUF_FLUSH_LRU],
(ulong) buf_pool->n_flush[BUF_FLUSH_LIST],
+ (ulong) buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE],
(ulong) buf_pool->stat.n_pages_made_young,
(ulong) buf_pool->stat.n_pages_not_made_young,
(ulong) buf_pool->stat.n_pages_read,
@@ -4790,6 +4794,10 @@ buf_stats_get_pool_info(
(buf_pool->n_flush[BUF_FLUSH_LIST]
+ buf_pool->init_flush[BUF_FLUSH_LIST]);
+ pool_info->n_pending_flush_single_page =
+ (buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]
+ + buf_pool->init_flush[BUF_FLUSH_SINGLE_PAGE]);
+
buf_flush_list_mutex_exit(buf_pool);
current_time = time(NULL);
@@ -4894,7 +4902,7 @@ buf_print_io_instance(
"Old database pages %lu\n"
"Modified db pages %lu\n"
"Pending reads %lu\n"
- "Pending writes: LRU %lu, flush list %lu\n",
+ "Pending writes: LRU %lu, flush list %lu single page %lu\n",
pool_info->pool_size,
pool_info->free_list_len,
pool_info->lru_len,
@@ -4902,7 +4910,8 @@ buf_print_io_instance(
pool_info->flush_list_len,
pool_info->n_pend_reads,
pool_info->n_pending_flush_lru,
- pool_info->n_pending_flush_list);
+ pool_info->n_pending_flush_list,
+ pool_info->n_pending_flush_single_page);
fprintf(file,
"Pages made young %lu, not young %lu\n"
@@ -5090,6 +5099,7 @@ buf_pool_check_no_pending_io(void)
pending_io += buf_pool->n_pend_reads
+ buf_pool->n_flush[BUF_FLUSH_LRU]
+ + buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]
+ buf_pool->n_flush[BUF_FLUSH_LIST];
}
=== modified file 'storage/innobase/buf/buf0flu.c'
--- a/storage/innobase/buf/buf0flu.c revid:marc.alff@stripped
+++ b/storage/innobase/buf/buf0flu.c revid:inaam.rana@stripped
@@ -61,6 +61,11 @@ Each interval is 1 second, defined by th
srv_error_monitor_thread() calls buf_flush_stat_update(). */
#define BUF_FLUSH_STAT_N_INTERVAL 20
+/** Time in microseconds that we sleep when unable to find a slot in
+the doublewrite buffer or when we have to wait for a running batch
+to end (10000 us = 10 ms; os_thread_sleep() takes microseconds). */
+#define TRX_DOUBLEWRITE_BATCH_POLL_DELAY 10000
+
/** Sampled values buf_flush_stat_cur.
Not protected by any mutex. Updated by buf_flush_stat_update(). */
static buf_flush_stat_t buf_flush_stat_arr[BUF_FLUSH_STAT_N_INTERVAL];
@@ -86,10 +91,20 @@ need to protect it by a mutex. It is onl
doing the shutdown */
UNIV_INTERN ibool buf_page_cleaner_is_active = FALSE;
+/** LRU flush batch is further divided into this chunk size to
+reduce the wait time for the threads waiting for a clean block */
+#define PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE 100
+
#ifdef UNIV_PFS_THREAD
UNIV_INTERN mysql_pfs_key_t buf_page_cleaner_thread_key;
#endif /* UNIV_PFS_THREAD */
+/** If LRU list of a buf_pool is less than this size then LRU eviction
+should not happen. This is because when we do LRU flushing we also put
+the blocks on free list. If LRU list is very small then we can end up
+in thrashing. */
+#define BUF_LRU_MIN_LEN 256
+
/* @} */
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
@@ -479,7 +494,7 @@ buf_flush_ready_for_flush(
/*======================*/
buf_page_t* bpage, /*!< in: buffer control block, must be
buf_page_in_file(bpage) */
- enum buf_flush flush_type)/*!< in: BUF_FLUSH_LRU or BUF_FLUSH_LIST */
+ enum buf_flush flush_type)/*!< in: type of flush */
{
#ifdef UNIV_DEBUG
buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
@@ -487,26 +502,33 @@ buf_flush_ready_for_flush(
#endif
ut_a(buf_page_in_file(bpage));
ut_ad(mutex_own(buf_page_get_mutex(bpage)));
- ut_ad(flush_type == BUF_FLUSH_LRU || BUF_FLUSH_LIST);
+ ut_ad(flush_type < BUF_FLUSH_N_TYPES);
- if (bpage->oldest_modification != 0
- && buf_page_get_io_fix(bpage) == BUF_IO_NONE) {
- ut_ad(bpage->in_flush_list);
-
- if (flush_type != BUF_FLUSH_LRU) {
-
- return(TRUE);
+ if (bpage->oldest_modification == 0
+ || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {
+ return(FALSE);
+ }
- } else if (bpage->buf_fix_count == 0) {
+ ut_ad(bpage->in_flush_list);
- /* If we are flushing the LRU list, to avoid deadlocks
- we require the block not to be bufferfixed, and hence
- not latched. */
+ switch (flush_type) {
+ case BUF_FLUSH_LIST:
+ return(TRUE);
- return(TRUE);
- }
+ case BUF_FLUSH_LRU:
+ case BUF_FLUSH_SINGLE_PAGE:
+ /* Because any thread may call single page flush, even
+ when owning locks on pages, to avoid deadlocks, we must
+ make sure that the page is not buffer fixed.
+ The same holds true for LRU flush because a user thread
+ may end up waiting for an LRU flush to end while
+ holding locks on other pages. */
+ return(bpage->buf_fix_count == 0);
+ case BUF_FLUSH_N_TYPES:
+ break;
}
+ ut_error;
return(FALSE);
}
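The rewritten buf_flush_ready_for_flush() above reduces to a small decision table: a page is a flush candidate only if it is dirty and not I/O-fixed, and for LRU and single page flushes it must also not be buffer-fixed. The following standalone model is a sketch of that rule only. The struct, the enum and all names in it are hypothetical simplifications, not the InnoDB types, and the enum order is not meant to match the server's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for enum buf_flush and buf_page_t. */
enum flush_type {
	FLUSH_LRU,
	FLUSH_SINGLE_PAGE,
	FLUSH_LIST,
	FLUSH_N_TYPES
};

struct page_model {
	uint64_t	oldest_modification;	/* 0 means the page is clean */
	bool		io_fixed;		/* an I/O is already pending */
	unsigned	buf_fix_count;		/* number of buffer-fixes */
};

/* Returns true if the modelled page may be flushed now. */
static bool
page_ready_for_flush(const struct page_model* page, enum flush_type type)
{
	if (page->oldest_modification == 0 || page->io_fixed) {
		/* Clean pages and pages with pending I/O are skipped. */
		return(false);
	}

	switch (type) {
	case FLUSH_LIST:
		return(true);
	case FLUSH_LRU:
	case FLUSH_SINGLE_PAGE:
		/* A user thread may be waiting on this flush while
		holding latches, so the page must not be buffer-fixed. */
		return(page->buf_fix_count == 0);
	case FLUSH_N_TYPES:
		break;
	}

	return(false);
}
```

Unlike the patch, the model returns false for an invalid type instead of asserting, to keep the sketch free of side effects.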
@@ -664,15 +686,6 @@ buf_flush_write_complete(
flush_type = buf_page_get_flush_type(bpage);
buf_pool->n_flush[flush_type]--;
- if (flush_type == BUF_FLUSH_LRU) {
- /* Put the block to the end of the LRU list to wait to be
- moved to the free list */
-
- buf_LRU_make_block_old(bpage);
-
- buf_pool->LRU_flush_ended++;
- }
-
/* fprintf(stderr, "n pending flush %lu\n",
buf_pool->n_flush[flush_type]); */
@@ -708,6 +721,123 @@ buf_flush_sync_datafiles(void)
}
/********************************************************************//**
+Check the LSN values on the page. */
+static
+void
+buf_flush_doublewrite_check_page_lsn(
+/*=================================*/
+ const page_t* page) /*!< in: page to check */
+{
+ if (memcmp(page + (FIL_PAGE_LSN + 4),
+ page + (UNIV_PAGE_SIZE
+ - FIL_PAGE_END_LSN_OLD_CHKSUM + 4),
+ 4)) {
+
+ ut_print_timestamp(stderr);
+ fprintf(stderr,
+ " InnoDB: ERROR: The page to be written"
+ " seems corrupt!\n"
+ "InnoDB: The LSN fields do not match!"
+ " Noticed in the buffer pool\n");
+ }
+}
+
+/********************************************************************//**
+Asserts when a corrupt block is found during writing out data to the
+disk. */
+static
+void
+buf_flush_doublewrite_assert_on_corrupt_block(
+/*==========================================*/
+ const buf_block_t* block) /*!< in: block to check */
+{
+ buf_page_print(block->frame, 0);
+
+ ut_print_timestamp(stderr);
+ fprintf(stderr,
+ " InnoDB: Apparent corruption of an"
+ " index page n:o %lu in space %lu\n"
+ "InnoDB: to be written to data file."
+ " We intentionally crash server\n"
+ "InnoDB: to prevent corrupt data"
+ " from ending up in data\n"
+ "InnoDB: files.\n",
+ (ulong) buf_block_get_page_no(block),
+ (ulong) buf_block_get_space(block));
+
+ ut_error;
+}
+
+/********************************************************************//**
+Check the LSN values on the page with which this block is associated.
+Also validate the page if the option is set. */
+static
+void
+buf_flush_doublewrite_check_block(
+/*==============================*/
+ const buf_block_t* block) /*!< in: block to check */
+{
+ if (buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE
+ || block->page.zip.data) {
+ /* No simple validate for compressed pages exists. */
+ return;
+ }
+
+ buf_flush_doublewrite_check_page_lsn(block->frame);
+
+ if (!block->check_index_page_at_flush) {
+ return;
+ }
+
+ if (page_is_comp(block->frame)) {
+ if (!page_simple_validate_new(block->frame)) {
+ buf_flush_doublewrite_assert_on_corrupt_block(block);
+ }
+ } else if (!page_simple_validate_old(block->frame)) {
+
+ buf_flush_doublewrite_assert_on_corrupt_block(block);
+ }
+}
+
+/********************************************************************//**
+Writes a page that has already been written to the doublewrite buffer
+to the datafile. It is the job of the caller to sync the datafile. */
+static
+void
+buf_flush_write_block_to_datafile(
+/*==============================*/
+ const buf_block_t* block) /*!< in: block to write */
+{
+ ut_a(block);
+ ut_a(buf_page_in_file(&block->page));
+
+ if (UNIV_LIKELY_NULL(block->page.zip.data)) {
+ fil_io(OS_FILE_WRITE | OS_AIO_SIMULATED_WAKE_LATER,
+ FALSE, buf_page_get_space(&block->page),
+ buf_page_get_zip_size(&block->page),
+ buf_page_get_page_no(&block->page), 0,
+ buf_page_get_zip_size(&block->page),
+ (void*)block->page.zip.data,
+ (void*)block);
+
+ goto exit;
+ }
+
+ ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
+ buf_flush_doublewrite_check_page_lsn(block->frame);
+
+ fil_io(OS_FILE_WRITE | OS_AIO_SIMULATED_WAKE_LATER,
+ FALSE, buf_block_get_space(block), 0,
+ buf_block_get_page_no(block), 0, UNIV_PAGE_SIZE,
+ (void*)block->frame, (void*)block);
+
+exit:
+ /* Increment the counter of I/O operations used
+ for selecting LRU policy. */
+ buf_LRU_stat_inc_io();
+}
+
+/********************************************************************//**
Flushes possible buffered writes from the doublewrite memory buffer to disk,
and also wakes up the aio thread if simulated aio is used. It is very
important to call this function after a batch of writes has been posted,
@@ -729,6 +859,7 @@ buf_flush_buffered_writes(void)
return;
}
+try_again:
mutex_enter(&(trx_doublewrite->mutex));
/* Write first to doublewrite buffer blocks. We use synchronous
@@ -742,7 +873,32 @@ buf_flush_buffered_writes(void)
return;
}
- for (i = 0; i < trx_doublewrite->first_free; i++) {
+ if (trx_doublewrite->batch_running) {
+ mutex_exit(&trx_doublewrite->mutex);
+
+ /* Another thread is running the batch right now. Wait
+ for it to finish. */
+ os_thread_sleep(TRX_DOUBLEWRITE_BATCH_POLL_DELAY);
+ goto try_again;
+ }
+
+ ut_a(!trx_doublewrite->batch_running);
+
+ /* Disallow anyone else to post to doublewrite buffer or to
+ start another batch of flushing. */
+ trx_doublewrite->batch_running = TRUE;
+
+ /* Now safe to release the mutex. Note that although no other
+ thread is allowed to post to the doublewrite buffer or start
+ a batch flush, any threads working on single page flushes
+ are allowed to proceed. */
+ mutex_exit(&trx_doublewrite->mutex);
+
+ write_buf = trx_doublewrite->write_buf;
+
+ for (len2 = 0, i = 0;
+ i < trx_doublewrite->first_free;
+ len2 += UNIV_PAGE_SIZE, i++) {
const buf_block_t* block;
@@ -750,130 +906,50 @@ buf_flush_buffered_writes(void)
if (buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE
|| block->page.zip.data) {
- /* No simple validate for compressed pages exists. */
+ /* No simple validate for compressed
+ pages exists. */
continue;
}
- if (UNIV_UNLIKELY
- (memcmp(block->frame + (FIL_PAGE_LSN + 4),
- block->frame + (UNIV_PAGE_SIZE
- - FIL_PAGE_END_LSN_OLD_CHKSUM + 4),
- 4))) {
- ut_print_timestamp(stderr);
- fprintf(stderr,
- " InnoDB: ERROR: The page to be written"
- " seems corrupt!\n"
- "InnoDB: The lsn fields do not match!"
- " Noticed in the buffer pool\n"
- "InnoDB: before posting to the"
- " doublewrite buffer.\n");
- }
-
- if (!block->check_index_page_at_flush) {
- } else if (page_is_comp(block->frame)) {
- if (UNIV_UNLIKELY
- (!page_simple_validate_new(block->frame))) {
-corrupted_page:
- buf_page_print(block->frame, 0);
-
- ut_print_timestamp(stderr);
- fprintf(stderr,
- " InnoDB: Apparent corruption of an"
- " index page n:o %lu in space %lu\n"
- "InnoDB: to be written to data file."
- " We intentionally crash server\n"
- "InnoDB: to prevent corrupt data"
- " from ending up in data\n"
- "InnoDB: files.\n",
- (ulong) buf_block_get_page_no(block),
- (ulong) buf_block_get_space(block));
-
- ut_error;
- }
- } else if (UNIV_UNLIKELY
- (!page_simple_validate_old(block->frame))) {
+ /* Check that the actual page in the buffer pool is
+ not corrupt and the LSN values are sane. */
+ buf_flush_doublewrite_check_block(block);
- goto corrupted_page;
- }
+ /* Check that the page as written to the doublewrite
+ buffer has sane LSN values. */
+ buf_flush_doublewrite_check_page_lsn(write_buf + len2);
}
- /* increment the doublewrite flushed pages counter */
- srv_dblwr_pages_written+= trx_doublewrite->first_free;
- srv_dblwr_writes++;
-
+ /* Write out the first block of the doublewrite buffer */
len = ut_min(TRX_SYS_DOUBLEWRITE_BLOCK_SIZE,
trx_doublewrite->first_free) * UNIV_PAGE_SIZE;
- write_buf = trx_doublewrite->write_buf;
- i = 0;
-
fil_io(OS_FILE_WRITE, TRUE, TRX_SYS_SPACE, 0,
trx_doublewrite->block1, 0, len,
(void*) write_buf, NULL);
- for (len2 = 0; len2 + UNIV_PAGE_SIZE <= len;
- len2 += UNIV_PAGE_SIZE, i++) {
- const buf_block_t* block = (buf_block_t*)
- trx_doublewrite->buf_block_arr[i];
-
- if (UNIV_LIKELY(!block->page.zip.data)
- && UNIV_LIKELY(buf_block_get_state(block)
- == BUF_BLOCK_FILE_PAGE)
- && UNIV_UNLIKELY
- (memcmp(write_buf + len2 + (FIL_PAGE_LSN + 4),
- write_buf + len2
- + (UNIV_PAGE_SIZE
- - FIL_PAGE_END_LSN_OLD_CHKSUM + 4), 4))) {
- ut_print_timestamp(stderr);
- fprintf(stderr,
- " InnoDB: ERROR: The page to be written"
- " seems corrupt!\n"
- "InnoDB: The lsn fields do not match!"
- " Noticed in the doublewrite block1.\n");
- }
- }
-
if (trx_doublewrite->first_free <= TRX_SYS_DOUBLEWRITE_BLOCK_SIZE) {
+ /* No unwritten pages in the second block. */
goto flush;
}
+ /* Write out the second block of the doublewrite buffer. */
len = (trx_doublewrite->first_free - TRX_SYS_DOUBLEWRITE_BLOCK_SIZE)
- * UNIV_PAGE_SIZE;
+ * UNIV_PAGE_SIZE;
write_buf = trx_doublewrite->write_buf
- + TRX_SYS_DOUBLEWRITE_BLOCK_SIZE * UNIV_PAGE_SIZE;
- ut_ad(i == TRX_SYS_DOUBLEWRITE_BLOCK_SIZE);
+ + TRX_SYS_DOUBLEWRITE_BLOCK_SIZE * UNIV_PAGE_SIZE;
fil_io(OS_FILE_WRITE, TRUE, TRX_SYS_SPACE, 0,
trx_doublewrite->block2, 0, len,
(void*) write_buf, NULL);
- for (len2 = 0; len2 + UNIV_PAGE_SIZE <= len;
- len2 += UNIV_PAGE_SIZE, i++) {
- const buf_block_t* block = (buf_block_t*)
- trx_doublewrite->buf_block_arr[i];
-
- if (UNIV_LIKELY(!block->page.zip.data)
- && UNIV_LIKELY(buf_block_get_state(block)
- == BUF_BLOCK_FILE_PAGE)
- && UNIV_UNLIKELY
- (memcmp(write_buf + len2 + (FIL_PAGE_LSN + 4),
- write_buf + len2
- + (UNIV_PAGE_SIZE
- - FIL_PAGE_END_LSN_OLD_CHKSUM + 4), 4))) {
- ut_print_timestamp(stderr);
- fprintf(stderr,
- " InnoDB: ERROR: The page to be"
- " written seems corrupt!\n"
- "InnoDB: The lsn fields do not match!"
- " Noticed in"
- " the doublewrite block2.\n");
- }
- }
-
flush:
- /* Now flush the doublewrite buffer data to disk */
+ /* increment the doublewrite flushed pages counter */
+ srv_dblwr_pages_written += trx_doublewrite->first_free;
+ srv_dblwr_writes++;
+ /* Now flush the doublewrite buffer data to disk */
fil_flush(TRX_SYS_SPACE);
/* We know that the writes have been flushed to disk now
@@ -884,60 +960,17 @@ flush:
const buf_block_t* block = (buf_block_t*)
trx_doublewrite->buf_block_arr[i];
- ut_a(buf_page_in_file(&block->page));
- if (UNIV_LIKELY_NULL(block->page.zip.data)) {
- fil_io(OS_FILE_WRITE | OS_AIO_SIMULATED_WAKE_LATER,
- FALSE, buf_page_get_space(&block->page),
- buf_page_get_zip_size(&block->page),
- buf_page_get_page_no(&block->page), 0,
- buf_page_get_zip_size(&block->page),
- (void*)block->page.zip.data,
- (void*)block);
-
- /* Increment the counter of I/O operations used
- for selecting LRU policy. */
- buf_LRU_stat_inc_io();
-
- continue;
- }
-
- ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
-
- if (UNIV_UNLIKELY(memcmp(block->frame + (FIL_PAGE_LSN + 4),
- block->frame
- + (UNIV_PAGE_SIZE
- - FIL_PAGE_END_LSN_OLD_CHKSUM + 4),
- 4))) {
- ut_print_timestamp(stderr);
- fprintf(stderr,
- " InnoDB: ERROR: The page to be written"
- " seems corrupt!\n"
- "InnoDB: The lsn fields do not match!"
- " Noticed in the buffer pool\n"
- "InnoDB: after posting and flushing"
- " the doublewrite buffer.\n"
- "InnoDB: Page buf fix count %lu,"
- " io fix %lu, state %lu\n",
- (ulong)block->page.buf_fix_count,
- (ulong)buf_block_get_io_fix(block),
- (ulong)buf_block_get_state(block));
- }
-
- fil_io(OS_FILE_WRITE | OS_AIO_SIMULATED_WAKE_LATER,
- FALSE, buf_block_get_space(block), 0,
- buf_block_get_page_no(block), 0, UNIV_PAGE_SIZE,
- (void*)block->frame, (void*)block);
-
- /* Increment the counter of I/O operations used
- for selecting LRU policy. */
- buf_LRU_stat_inc_io();
+ buf_flush_write_block_to_datafile(block);
}
/* Sync the writes to the disk. */
buf_flush_sync_datafiles();
+ mutex_enter(&trx_doublewrite->mutex);
+
/* We can now reuse the doublewrite memory buffer: */
trx_doublewrite->first_free = 0;
+ trx_doublewrite->batch_running = FALSE;
mutex_exit(&(trx_doublewrite->mutex));
}
@@ -953,13 +986,28 @@ buf_flush_post_to_doublewrite_buf(
buf_page_t* bpage) /*!< in: buffer block to write */
{
ulint zip_size;
+
+ ut_a(buf_page_in_file(bpage));
+
try_again:
mutex_enter(&(trx_doublewrite->mutex));
- ut_a(buf_page_in_file(bpage));
+ ut_a(trx_doublewrite->first_free <= srv_doublewrite_batch_size);
+
+ if (trx_doublewrite->batch_running) {
+ mutex_exit(&trx_doublewrite->mutex);
+
+ /* This is not nearly as bad as it looks. There is only
+ one page_cleaner thread which does background flushing
+ in batches, therefore it is unlikely to be a contention
+ point. The only exception is when a user thread is
+ forced to do a flush batch because of a sync
+ checkpoint. */
+ os_thread_sleep(TRX_DOUBLEWRITE_BATCH_POLL_DELAY);
+ goto try_again;
+ }
- if (trx_doublewrite->first_free
- >= 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE) {
+ if (trx_doublewrite->first_free == srv_doublewrite_batch_size) {
mutex_exit(&(trx_doublewrite->mutex));
buf_flush_buffered_writes();
@@ -992,8 +1040,7 @@ try_again:
trx_doublewrite->first_free++;
- if (trx_doublewrite->first_free
- >= 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE) {
+ if (trx_doublewrite->first_free == srv_doublewrite_batch_size) {
mutex_exit(&(trx_doublewrite->mutex));
buf_flush_buffered_writes();
@@ -1003,6 +1050,140 @@ try_again:
mutex_exit(&(trx_doublewrite->mutex));
}
+
+/********************************************************************//**
+Writes a page to the doublewrite buffer on disk, sync it, then write
+the page to the datafile and sync the datafile. This function is used
+for single page flushes. If all the buffers allocated for single page
+flushes in the doublewrite buffer are in use we wait here for one to
+become free. We are guaranteed that a slot will become free because any
+thread that is using a slot must also release the slot before leaving
+this function. */
+static
+void
+buf_flush_write_to_dblwr_and_datafile(
+/*==================================*/
+ buf_page_t* bpage) /*!< in: buffer block to write */
+{
+ ulint n_slots;
+ ulint size;
+ ulint zip_size;
+ ulint offset;
+ ulint i;
+
+ ut_a(buf_page_in_file(bpage));
+ ut_a(srv_use_doublewrite_buf);
+ ut_a(trx_doublewrite != NULL);
+
+ /* The slots available for single page flushes run from
+ srv_doublewrite_batch_size to the end of the doublewrite
+ buffer. */
+ size = 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE;
+ ut_a(size > srv_doublewrite_batch_size);
+ n_slots = size - srv_doublewrite_batch_size;
+
+ if (buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE) {
+
+ /* Check that the actual page in the buffer pool is
+ not corrupt and the LSN values are sane. */
+ buf_flush_doublewrite_check_block((buf_block_t*) bpage);
+
+ /* Check that the page as written to the doublewrite
+ buffer has sane LSN values. */
+ buf_flush_doublewrite_check_page_lsn(
+ ((buf_block_t*) bpage)->frame);
+ }
+
+retry:
+ mutex_enter(&trx_doublewrite->mutex);
+ if (trx_doublewrite->n_reserved == n_slots) {
+
+ mutex_exit(&trx_doublewrite->mutex);
+ /* All slots are reserved. Since it involves two IOs
+ during the processing a sleep of 10ms should be
+ enough. */
+ os_thread_sleep(TRX_DOUBLEWRITE_BATCH_POLL_DELAY);
+ goto retry;
+ }
+
+ for (i = srv_doublewrite_batch_size; i < size; ++i) {
+
+ if (!trx_doublewrite->in_use[i]) {
+ break;
+ }
+ }
+
+ /* We are guaranteed to find a slot. */
+ ut_a(i < size);
+ trx_doublewrite->in_use[i] = TRUE;
+ trx_doublewrite->n_reserved++;
+ trx_doublewrite->buf_block_arr[i] = bpage;
+ mutex_exit(&trx_doublewrite->mutex);
+
+ /* Let's see if we are going to write in the first or second
+ block of the doublewrite buffer. */
+ if (i < TRX_SYS_DOUBLEWRITE_BLOCK_SIZE) {
+ offset = trx_doublewrite->block1 + i;
+ } else {
+ offset = trx_doublewrite->block2 + i
+ - TRX_SYS_DOUBLEWRITE_BLOCK_SIZE;
+ }
+
+ /* We deal with compressed and uncompressed pages a little
+ differently here. In case of uncompressed pages we can
+ directly write the block to the allocated slot in the
+ doublewrite buffer in the system tablespace and then after
+ syncing the system table space we can proceed to write the page
+ in the datafile.
+ In case of compressed page we first do a memcpy of the block
+ to the in-memory buffer of doublewrite before proceeding to
+ write it. This is so because we want to pad the remaining
+ bytes in the doublewrite page with zeros. */
+
+ zip_size = buf_page_get_zip_size(bpage);
+ if (zip_size) {
+ memcpy(trx_doublewrite->write_buf + UNIV_PAGE_SIZE * i,
+ bpage->zip.data, zip_size);
+ memset(trx_doublewrite->write_buf + UNIV_PAGE_SIZE * i
+ + zip_size, 0, UNIV_PAGE_SIZE - zip_size);
+
+ fil_io(OS_FILE_WRITE, TRUE, TRX_SYS_SPACE, 0,
+ offset, 0, UNIV_PAGE_SIZE,
+ (void*) (trx_doublewrite->write_buf
+ + UNIV_PAGE_SIZE * i), NULL);
+ } else {
+ /* It is a regular page. Write it directly to the
+ doublewrite buffer */
+ fil_io(OS_FILE_WRITE, TRUE, TRX_SYS_SPACE, 0,
+ offset, 0, UNIV_PAGE_SIZE,
+ (void*) ((buf_block_t*) bpage)->frame,
+ NULL);
+ }
+
+ /* Now flush the doublewrite buffer data to disk */
+ fil_flush(TRX_SYS_SPACE);
+
+ /* We know that the write has been flushed to disk now
+ and during recovery we will find it in the doublewrite buffer
+ blocks. Next do the write to the intended position. */
+ buf_flush_write_block_to_datafile((buf_block_t*) bpage);
+
+ /* Sync the writes to the disk. */
+ buf_flush_sync_datafiles();
+
+ mutex_enter(&trx_doublewrite->mutex);
+
+ trx_doublewrite->n_reserved--;
+ trx_doublewrite->buf_block_arr[i] = NULL;
+ trx_doublewrite->in_use[i] = FALSE;
+
+ /* increment the doublewrite flushed pages counter */
+ srv_dblwr_pages_written += trx_doublewrite->first_free;
+ srv_dblwr_writes++;
+
+ mutex_exit(&(trx_doublewrite->mutex));
+
+}
#endif /* !UNIV_HOTBACKUP */
/********************************************************************//**
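The slot-to-offset step in buf_flush_write_to_dblwr_and_datafile() above can be illustrated in isolation. This is a minimal sketch, not the InnoDB code: DBLWR_BLOCK_PAGES stands in for TRX_SYS_DOUBLEWRITE_BLOCK_SIZE with an illustrative value of 64, and dblwr_slot_to_page_no() is a hypothetical helper name.

```c
#include <assert.h>

/* Hypothetical stand-in for TRX_SYS_DOUBLEWRITE_BLOCK_SIZE: the number of
pages in each of the two doublewrite blocks. The value is illustrative. */
#define DBLWR_BLOCK_PAGES 64UL

/* Map a doublewrite slot index to a page number in the system tablespace.
block1 and block2 are the first page numbers of the two doublewrite blocks.
Slots [0, DBLWR_BLOCK_PAGES) land in block1, later slots in block2. */
static unsigned long
dblwr_slot_to_page_no(
	unsigned long slot,	/* slot index reserved by the caller */
	unsigned long block1,	/* first page of the first block */
	unsigned long block2)	/* first page of the second block */
{
	if (slot < DBLWR_BLOCK_PAGES) {
		return(block1 + slot);
	}

	return(block2 + slot - DBLWR_BLOCK_PAGES);
}
```

With blocks starting at pages 64 and 128, slot 63 maps to the last page of the first block and slot 64 to the first page of the second block, which mirrors the if/else on TRX_SYS_DOUBLEWRITE_BLOCK_SIZE in the patch.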
@@ -1092,7 +1273,8 @@ static
void
buf_flush_write_block_low(
/*======================*/
- buf_page_t* bpage) /*!< in: buffer block to write */
+ buf_page_t* bpage, /*!< in: buffer block to write */
+ enum buf_flush flush_type) /*!< in: type of flush */
{
ulint zip_size = buf_page_get_zip_size(bpage);
page_t* frame = NULL;
@@ -1174,88 +1356,13 @@ buf_flush_write_block_low(
buf_page_get_page_no(bpage), 0,
zip_size ? zip_size : UNIV_PAGE_SIZE,
frame, bpage);
+ } else if (flush_type == BUF_FLUSH_SINGLE_PAGE) {
+ buf_flush_write_to_dblwr_and_datafile(bpage);
} else {
buf_flush_post_to_doublewrite_buf(bpage);
}
}
-# if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
-/********************************************************************//**
-Writes a flushable page asynchronously from the buffer pool to a file.
-NOTE: buf_pool->mutex and block->mutex must be held upon entering this
-function, and they will be released by this function after flushing.
-This is loosely based on buf_flush_batch() and buf_flush_page().
-@return TRUE if the page was flushed and the mutexes released */
-UNIV_INTERN
-ibool
-buf_flush_page_try(
-/*===============*/
- buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */
- buf_block_t* block) /*!< in/out: buffer control block */
-{
- ut_ad(buf_pool_mutex_own(buf_pool));
- ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
- ut_ad(mutex_own(&block->mutex));
-
- if (!buf_flush_ready_for_flush(&block->page, BUF_FLUSH_LRU)) {
- return(FALSE);
- }
-
- if (buf_pool->n_flush[BUF_FLUSH_LRU] > 0
- || buf_pool->init_flush[BUF_FLUSH_LRU]) {
- /* There is already a flush batch of the same type running */
- return(FALSE);
- }
-
- buf_pool->init_flush[BUF_FLUSH_LRU] = TRUE;
-
- buf_page_set_io_fix(&block->page, BUF_IO_WRITE);
-
- buf_page_set_flush_type(&block->page, BUF_FLUSH_LRU);
-
- if (buf_pool->n_flush[BUF_FLUSH_LRU]++ == 0) {
-
- os_event_reset(buf_pool->no_flush[BUF_FLUSH_LRU]);
- }
-
- /* VERY IMPORTANT:
- Because any thread may call the LRU flush, even when owning
- locks on pages, to avoid deadlocks, we must make sure that the
- s-lock is acquired on the page without waiting: this is
- accomplished because buf_flush_ready_for_flush() must hold,
- and that requires the page not to be bufferfixed. */
-
- rw_lock_s_lock_gen(&block->lock, BUF_IO_WRITE);
-
- /* Note that the s-latch is acquired before releasing the
- buf_pool mutex: this ensures that the latch is acquired
- immediately. */
-
- mutex_exit(&block->mutex);
- buf_pool_mutex_exit(buf_pool);
-
- /* Even though block is not protected by any mutex at this
- point, it is safe to access block, because it is io_fixed and
- oldest_modification != 0. Thus, it cannot be relocated in the
- buffer pool or removed from flush_list or LRU_list. */
-
- buf_flush_write_block_low(&block->page);
-
- buf_pool_mutex_enter(buf_pool);
- buf_pool->init_flush[BUF_FLUSH_LRU] = FALSE;
-
- if (buf_pool->n_flush[BUF_FLUSH_LRU] == 0) {
- /* The running flush batch has ended */
- os_event_set(buf_pool->no_flush[BUF_FLUSH_LRU]);
- }
-
- buf_pool_mutex_exit(buf_pool);
- buf_flush_buffered_writes();
-
- return(TRUE);
-}
-# endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
-
/********************************************************************//**
Writes a flushable page asynchronously from the buffer pool to a file.
NOTE: in simulated aio we must call
@@ -1269,13 +1376,12 @@ buf_flush_page(
/*===========*/
buf_pool_t* buf_pool, /*!< in: buffer pool instance */
buf_page_t* bpage, /*!< in: buffer control block */
- enum buf_flush flush_type) /*!< in: BUF_FLUSH_LRU
- or BUF_FLUSH_LIST */
+ enum buf_flush flush_type) /*!< in: type of flush */
{
mutex_t* block_mutex;
ibool is_uncompressed;
- ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST);
+ ut_ad(flush_type < BUF_FLUSH_N_TYPES);
ut_ad(buf_pool_mutex_own(buf_pool));
ut_ad(buf_page_in_file(bpage));
@@ -1311,8 +1417,6 @@ buf_flush_page(
BUF_IO_WRITE);
}
- MONITOR_INC(MONITOR_BUF_FLUSH_LIST);
-
mutex_exit(block_mutex);
buf_pool_mutex_exit(buf_pool);
@@ -1334,20 +1438,23 @@ buf_flush_page(
break;
case BUF_FLUSH_LRU:
+ case BUF_FLUSH_SINGLE_PAGE:
/* VERY IMPORTANT:
- Because any thread may call the LRU flush, even when owning
- locks on pages, to avoid deadlocks, we must make sure that the
- s-lock is acquired on the page without waiting: this is
- accomplished because buf_flush_ready_for_flush() must hold,
- and that requires the page not to be bufferfixed. */
+ Because any thread may call single page flush, even when
+ owning locks on pages, to avoid deadlocks, we must make
+ sure that the s-lock is acquired on the page without
+ waiting: this is accomplished because
+ buf_flush_ready_for_flush() must hold, and that requires
+ the page not to be bufferfixed.
+ The same holds true for LRU flush because a user thread
+ may end up waiting for an LRU flush to end while
+ holding locks on other pages. */
if (is_uncompressed) {
rw_lock_s_lock_gen(&((buf_block_t*) bpage)->lock,
BUF_IO_WRITE);
}
- MONITOR_INC(MONITOR_BUF_FLUSH_LRU);
-
/* Note that the s-latch is acquired before releasing the
buf_pool mutex: this ensures that the latch is acquired
immediately. */
@@ -1372,9 +1479,37 @@ buf_flush_page(
flush_type, bpage->space, bpage->offset);
}
#endif /* UNIV_DEBUG */
- buf_flush_write_block_low(bpage);
+ buf_flush_write_block_low(bpage, flush_type);
}
+# if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
+/********************************************************************//**
+Writes a flushable page asynchronously from the buffer pool to a file.
+NOTE: buf_pool->mutex and block->mutex must be held upon entering this
+function, and they will be released by this function after flushing.
+This is loosely based on buf_flush_batch() and buf_flush_page().
+@return TRUE if the page was flushed and the mutexes released */
+UNIV_INTERN
+ibool
+buf_flush_page_try(
+/*===============*/
+ buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */
+ buf_block_t* block) /*!< in/out: buffer control block */
+{
+ ut_ad(buf_pool_mutex_own(buf_pool));
+ ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
+ ut_ad(mutex_own(&block->mutex));
+
+ if (!buf_flush_ready_for_flush(&block->page, BUF_FLUSH_SINGLE_PAGE)) {
+ return(FALSE);
+ }
+
+ /* The following call will release the buffer pool and
+ block mutex. */
+ buf_flush_page(buf_pool, &block->page, BUF_FLUSH_SINGLE_PAGE);
+ return(TRUE);
+}
+# endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
/***********************************************************//**
Flushes to disk all flushable pages within the flush area.
@return number of pages flushed */
@@ -1399,10 +1534,10 @@ buf_flush_try_neighbors(
ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST);
- if (UT_LIST_GET_LEN(buf_pool->LRU) < BUF_LRU_OLD_MIN_LEN) {
- /* If there is little space, it is better not to flush
- any block except from the end of the LRU list */
-
+ if (UT_LIST_GET_LEN(buf_pool->LRU) < BUF_LRU_OLD_MIN_LEN
+ || !srv_flush_neighbors) {
+ /* If there is little space or neighbor flushing is
+ not enabled then just flush the victim. */
low = offset;
high = offset + 1;
} else {
@@ -1493,6 +1628,14 @@ buf_flush_try_neighbors(
buf_pool_mutex_exit(buf_pool);
}
+ if (count > 0) {
+ MONITOR_INC_VALUE_CUMULATIVE(
+ MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE,
+ MONITOR_FLUSH_NEIGHBOR_COUNT,
+ MONITOR_FLUSH_NEIGHBOR_PAGES,
+ (count - 1));
+ }
+
return(count);
}
@@ -1500,7 +1643,7 @@ buf_flush_try_neighbors(
Check if the block is modified and ready for flushing. If the block
is ready to flush then flush the page and try to flush its neighbors.
-@return TRUE if buf_pool mutex was not released during this function.
+@return TRUE if buf_pool mutex was released during this function.
This does not guarantee that some pages were written as well.
Number of pages written are incremented to the count. */
static
@@ -1566,36 +1709,77 @@ buf_flush_page_and_try_neighbors(
/*******************************************************************//**
This utility flushes dirty blocks from the end of the LRU list.
-In the case of an LRU flush the calling thread may own latches to
-pages: to avoid deadlocks, this function must be written so that it
-cannot end up waiting for these latches!
+The calling thread is not allowed to own any latches on pages!
+It attempts to make 'max' blocks available in the free list. Note that
+it is a best effort attempt and it is not guaranteed that after a call
+to this function there will be 'max' blocks in the free list.
@return number of blocks for which the write request was queued. */
static
ulint
buf_flush_LRU_list_batch(
/*=====================*/
buf_pool_t* buf_pool, /*!< in: buffer pool instance */
- ulint max) /*!< in: max of blocks to flush */
+ ulint max) /*!< in: desired number of
+ blocks in the free_list */
{
buf_page_t* bpage;
+ ulint scanned = 0;
ulint count = 0;
+ ulint free_len = UT_LIST_GET_LEN(buf_pool->free);
+ ulint lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
ut_ad(buf_pool_mutex_own(buf_pool));
- do {
- /* Start from the end of the list looking for a
- suitable block to be flushed. */
- bpage = UT_LIST_GET_LAST(buf_pool->LRU);
+ bpage = UT_LIST_GET_LAST(buf_pool->LRU);
+ while (bpage != NULL && count < max
+ && free_len < srv_LRU_scan_depth
+ && lru_len > BUF_LRU_MIN_LEN) {
- /* Iterate backwards over the flush list till we find
- a page that isn't ready for flushing. */
- while (bpage != NULL
- && !buf_flush_page_and_try_neighbors(
- bpage, BUF_FLUSH_LRU, max, &count)) {
+ mutex_t* block_mutex = buf_page_get_mutex(bpage);
+ ibool evict;
+
+ mutex_enter(block_mutex);
+ evict = buf_flush_ready_for_replace(bpage);
+ mutex_exit(block_mutex);
+ ++scanned;
+
+ /* If the block is ready to be replaced we try to
+ free it i.e.: put it on the free list.
+ Otherwise we try to flush the block and its
+ neighbors. In this case we'll put it on the
+ free list in the next pass. We do this extra work
+ of putting blocks to the free list instead of
+ just flushing them because after every flush
+ we have to restart the scan from the tail of
+ the LRU list and if we don't clear the tail
+ of the flushed pages then the scan becomes
+ O(n*n). */
+ if (evict) {
+
+ ibool evict_zip;
+
+ evict_zip = !buf_LRU_evict_from_unzip_LRU(buf_pool);
+
+ /* This will potentially release the
+ buf_pool->mutex. */
+ buf_LRU_free_block(bpage, evict_zip);
+ bpage = UT_LIST_GET_LAST(buf_pool->LRU);
+
+ } else if (buf_flush_page_and_try_neighbors(
+ bpage,
+ BUF_FLUSH_LRU, max, &count)) {
+
+ /* buf_pool->mutex was released.
+ Restart the scan. */
+ bpage = UT_LIST_GET_LAST(buf_pool->LRU);
+ } else {
bpage = UT_LIST_GET_PREV(LRU, bpage);
}
- } while (bpage != NULL && count < max);
+
+ free_len = UT_LIST_GET_LEN(buf_pool->free);
+ lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
+ }
/* We keep track of all flushes happening as part of LRU
flush. When estimating the desired rate at which flush_list
@@ -1604,6 +1788,13 @@ buf_flush_LRU_list_batch(
ut_ad(buf_pool_mutex_own(buf_pool));
+ if (scanned) {
+ MONITOR_INC_VALUE_CUMULATIVE(MONITOR_LRU_BATCH_SCANNED,
+ MONITOR_LRU_BATCH_SCANNED_NUM_CALL,
+ MONITOR_LRU_BATCH_SCANNED_PER_CALL,
+ scanned);
+ }
+
return(count);
}
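The control flow of the new buf_flush_LRU_list_batch() loop can be modelled on a plain array. This is a simplified sketch under stated assumptions, not the InnoDB implementation: struct lru_model, enum page_state, lru_prev() and lru_tail_batch() are hypothetical names, neighbor flushing is left out, and a flush is assumed to complete at once (in the patch the write is asynchronous and the block is freed in a later pass).

```c
#include <assert.h>
#include <stddef.h>

enum page_state { PAGE_CLEAN, PAGE_DIRTY, PAGE_PINNED, PAGE_FREED };

struct lru_model {
	enum page_state*	pages;	/* pages[n_pages - 1] is the LRU tail */
	size_t			n_pages;
	size_t			lru_len;
	size_t			free_len;
};

/* Return the index of the nearest non-freed page below 'from', or -1. */
static long
lru_prev(const struct lru_model* m, long from)
{
	long	i;

	for (i = from - 1; i >= 0; i--) {
		if (m->pages[i] != PAGE_FREED) {
			return(i);
		}
	}

	return(-1);
}

/* Scan from the tail: free clean pages, flush dirty ones, skip pinned
ones. After a free or a flush the scan restarts from the tail, as in the
patch. Returns the number of pages flushed. */
static size_t
lru_tail_batch(struct lru_model* m, size_t max, size_t scan_depth,
	       size_t min_len)
{
	size_t	count = 0;
	long	i = lru_prev(m, (long) m->n_pages);

	while (i >= 0 && count < max
	       && m->free_len < scan_depth
	       && m->lru_len > min_len) {

		if (m->pages[i] == PAGE_CLEAN) {
			/* Replaceable: move it to the free list. */
			m->pages[i] = PAGE_FREED;
			m->lru_len--;
			m->free_len++;
			i = lru_prev(m, (long) m->n_pages);
		} else if (m->pages[i] == PAGE_DIRTY) {
			/* Assumption: the write completes immediately. */
			m->pages[i] = PAGE_CLEAN;
			count++;
			i = lru_prev(m, (long) m->n_pages);
		} else {
			/* Neither replaceable nor flushable: step back. */
			i = lru_prev(m, i);
		}
	}

	return(count);
}
```

Because freed pages leave the tail, each restart skips straight to the first page that still needs work, which is the reason the patch comment gives for freeing blocks instead of only flushing them.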
@@ -1857,6 +2048,8 @@ buf_flush_end(
buf_pool->init_flush[flush_type] = FALSE;
+ buf_pool->try_LRU_scan = TRUE;
+
if (buf_pool->n_flush[flush_type] == 0) {
/* The running flush batch has ended */
@@ -1899,17 +2092,17 @@ buf_flush_wait_batch_end(
}
/*******************************************************************//**
-This utility flushes dirty blocks from the end of the LRU list.
-NOTE: The calling thread may own latches to pages: to avoid deadlocks,
-this function must be written so that it cannot end up waiting for these
-latches!
+This utility flushes dirty blocks from the end of the LRU list and also
+puts replaceable clean pages from the end of the LRU list to the free
+list.
+NOTE: The calling thread is not allowed to own any latches on pages!
@return number of blocks for which the write request was queued;
ULINT_UNDEFINED if there was a flush of the same type already running */
-UNIV_INTERN
+static
ulint
buf_flush_LRU(
/*==========*/
- buf_pool_t* buf_pool, /*!< in: buffer pool instance */
+ buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */
ulint min_n) /*!< in: wished minimum number of blocks
flushed (it is not guaranteed that the
actual number is that big, though) */
@@ -1993,121 +2186,109 @@ buf_flush_list(
total_page_count += page_count;
- MONITOR_INC_VALUE_CUMULATIVE(
+ if (page_count) {
+ MONITOR_INC_VALUE_CUMULATIVE(
MONITOR_FLUSH_BATCH_TOTAL_PAGE,
MONITOR_FLUSH_BATCH_COUNT,
MONITOR_FLUSH_BATCH_PAGES,
page_count);
+ }
}
return(lsn_limit != LSN_MAX && skipped
? ULINT_UNDEFINED : total_page_count);
}
-
+
/******************************************************************//**
-Gives a recommendation of how many blocks should be flushed to establish
-a big enough margin of replaceable blocks near the end of the LRU list
-and in the free list.
-@return number of blocks which should be flushed from the end of the
-LRU list */
-static
-ulint
-buf_flush_LRU_recommendation(
-/*=========================*/
- buf_pool_t* buf_pool) /*!< in: Buffer pool instance */
+This function picks up a single dirty page from the tail of the LRU
+list, flushes it, removes it from page_hash and LRU list and puts
+it on the free list. It is called from user threads when they are
+unable to find a replaceable page at the tail of the LRU list i.e.:
+when the background LRU flushing in the page_cleaner thread is not
+fast enough to keep pace with the workload.
+@return TRUE if success. */
+UNIV_INTERN
+ibool
+buf_flush_single_page_from_LRU(
+/*===========================*/
+ buf_pool_t* buf_pool) /*!< in/out: buffer pool instance */
{
+ ulint scanned;
buf_page_t* bpage;
- ulint n_replaceable;
- ulint distance = 0;
+ mutex_t* block_mutex;
+ ibool freed;
+ ibool evict_zip;
buf_pool_mutex_enter(buf_pool);
- n_replaceable = UT_LIST_GET_LEN(buf_pool->free);
-
- bpage = UT_LIST_GET_LAST(buf_pool->LRU);
-
- while ((bpage != NULL)
- && (n_replaceable < BUF_FLUSH_FREE_BLOCK_MARGIN(buf_pool)
- + BUF_FLUSH_EXTRA_MARGIN(buf_pool))
- && (distance < BUF_LRU_FREE_SEARCH_LEN(buf_pool))) {
-
- mutex_t* block_mutex = buf_page_get_mutex(bpage);
+ for (bpage = UT_LIST_GET_LAST(buf_pool->LRU), scanned = 1;
+ bpage != NULL;
+ bpage = UT_LIST_GET_PREV(LRU, bpage), ++scanned) {
+ block_mutex = buf_page_get_mutex(bpage);
mutex_enter(block_mutex);
-
- if (buf_flush_ready_for_replace(bpage)) {
- n_replaceable++;
+ if (buf_flush_ready_for_flush(bpage,
+ BUF_FLUSH_SINGLE_PAGE)) {
+ /* buf_flush_page() will release the block
+ mutex */
+ break;
}
-
mutex_exit(block_mutex);
-
- distance++;
-
- bpage = UT_LIST_GET_PREV(LRU, bpage);
}
- buf_pool_mutex_exit(buf_pool);
-
- if (n_replaceable >= BUF_FLUSH_FREE_BLOCK_MARGIN(buf_pool)) {
+ MONITOR_INC_VALUE_CUMULATIVE(
+ MONITOR_LRU_SINGLE_FLUSH_SCANNED,
+ MONITOR_LRU_SINGLE_FLUSH_SCANNED_NUM_CALL,
+ MONITOR_LRU_SINGLE_FLUSH_SCANNED_PER_CALL,
+ scanned);
- return(0);
+ if (!bpage) {
+ /* Can't find a single flushable page. */
+ buf_pool_mutex_exit(buf_pool);
+ return(FALSE);
}
- return(BUF_FLUSH_FREE_BLOCK_MARGIN(buf_pool)
- + BUF_FLUSH_EXTRA_MARGIN(buf_pool)
- - n_replaceable);
-}
-
-/*********************************************************************//**
-Flushes pages from the end of the LRU list if there is too small a margin
-of replaceable pages there or in the free list. VERY IMPORTANT: this function
-is called also by threads which have locks on pages. To avoid deadlocks, we
-flush only pages such that the s-lock required for flushing can be acquired
-immediately, without waiting. */
-UNIV_INTERN
-void
-buf_flush_free_margin(
-/*==================*/
- buf_pool_t* buf_pool) /*!< in: Buffer pool instance */
-{
- ulint n_to_flush;
-
- n_to_flush = buf_flush_LRU_recommendation(buf_pool);
-
- if (n_to_flush > 0) {
- ulint n_flushed;
+ /* The following call will release the buffer pool and
+ block mutex. */
+ buf_flush_page(buf_pool, bpage, BUF_FLUSH_SINGLE_PAGE);
+
+ /* At this point the page has been written to the disk.
+ Because we hold neither the buffer pool mutex nor the block
+ mutex, we cannot use bpage safely. It may have been plucked
+ out of the LRU list by some other thread, or it may even have
+ been relocated in case of a compressed page. We need to start
+ the scan of the LRU list again to remove the block from the
+ LRU list and put it on the free list. */
+ buf_pool_mutex_enter(buf_pool);
- n_flushed = buf_flush_LRU(buf_pool, n_to_flush);
+ for (bpage = UT_LIST_GET_LAST(buf_pool->LRU);
+ bpage != NULL;
+ bpage = UT_LIST_GET_PREV(LRU, bpage)) {
- if (n_flushed == ULINT_UNDEFINED) {
- /* There was an LRU type flush batch already running;
- let us wait for it to end */
+ ibool ready;
- buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);
- } else {
- MONITOR_INC(MONITOR_NUM_FREE_MARGIN_FLUSHES);
- MONITOR_INC_VALUE(MONITOR_FLUSH_FREE_MARGIN_PAGES,
- n_flushed);
+ block_mutex = buf_page_get_mutex(bpage);
+ mutex_enter(block_mutex);
+ ready = buf_flush_ready_for_replace(bpage);
+ mutex_exit(block_mutex);
+ if (ready) {
+ break;
}
+
}
-}
-/*********************************************************************//**
-Flushes pages from the end of all the LRU lists. */
-UNIV_INTERN
-void
-buf_flush_free_margins(void)
-/*========================*/
-{
- ulint i;
+ if (!bpage) {
+ /* Can't find a single replaceable page. */
+ buf_pool_mutex_exit(buf_pool);
+ return(FALSE);
+ }
- for (i = 0; i < srv_buf_pool_instances; i++) {
- buf_pool_t* buf_pool;
+ evict_zip = !buf_LRU_evict_from_unzip_LRU(buf_pool);
- buf_pool = buf_pool_from_array(i);
+ freed = buf_LRU_free_block(bpage, evict_zip);
+ buf_pool_mutex_exit(buf_pool);
- buf_flush_free_margin(buf_pool);
- }
+ return(freed);
}
/*********************************************************************
@@ -2233,6 +2414,84 @@ buf_flush_get_desired_flush_rate(void)
}
/*********************************************************************//**
+Clears up the tail of the LRU lists:
+* Put replaceable pages at the tail of LRU to the free list
+* Flush dirty pages at the tail of LRU to the disk
+The depth to which we scan each buffer pool is controlled by the dynamic
+config parameter innodb_lru_scan_depth.
+@return total pages flushed */
+UNIV_INLINE
+ulint
+page_cleaner_flush_LRU_tail(void)
+/*=============================*/
+{
+ ulint i;
+ ulint j;
+ ulint total_flushed = 0;
+
+ for (i = 0; i < srv_buf_pool_instances; i++) {
+
+ buf_pool_t* buf_pool = buf_pool_from_array(i);
+
+ /* We divide LRU flush into smaller chunks because
+ there may be user threads waiting for the flush to
+ end in buf_LRU_get_free_block(). */
+ for (j = 0;
+ j < srv_LRU_scan_depth;
+ j += PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE) {
+
+ ulint n_flushed = buf_flush_LRU(buf_pool,
+ PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE);
+
+ /* Currently page_cleaner is the only thread
+ that can trigger an LRU flush. It is possible
+ that a batch triggered during the last iteration
+ is still running. */
+ if (n_flushed != ULINT_UNDEFINED) {
+ total_flushed += n_flushed;
+ }
+ }
+ }
+
+ if (total_flushed) {
+ MONITOR_INC_VALUE_CUMULATIVE(
+ MONITOR_LRU_BATCH_TOTAL_PAGE,
+ MONITOR_LRU_BATCH_COUNT,
+ MONITOR_LRU_BATCH_PAGES,
+ total_flushed);
+ }
+
+ return(total_flushed);
+}
+
+/*********************************************************************//**
+Wait for any possible LRU flushes that are in progress to end. */
+UNIV_INLINE
+void
+page_cleaner_wait_LRU_flush(void)
+/*=============================*/
+{
+ ulint i;
+
+ for (i = 0; i < srv_buf_pool_instances; i++) {
+ buf_pool_t* buf_pool;
+
+ buf_pool = buf_pool_from_array(i);
+
+ buf_pool_mutex_enter(buf_pool);
+
+ if (buf_pool->n_flush[BUF_FLUSH_LRU] > 0
+ || buf_pool->init_flush[BUF_FLUSH_LRU]) {
+
+ buf_pool_mutex_exit(buf_pool);
+ buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);
+ } else {
+ buf_pool_mutex_exit(buf_pool);
+ }
+ }
+}
+
+/*********************************************************************//**
Flush a batch of dirty pages from the flush list
@return number of pages flushed, 0 if no page is flushed or if another
flush_list type batch is running */
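To see how page_cleaner_flush_LRU_tail() splits its work, the number of buf_flush_LRU() calls per buffer pool instance can be computed from the scan depth and the chunk size. This is a small sketch only: lru_chunks_per_instance() is a hypothetical helper, and the chunk size of 100 used in the note below is an assumption, since the value of PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE is not shown in this part of the diff.

```c
#include <assert.h>

/* Number of iterations of
	for (j = 0; j < scan_depth; j += chunk_size)
i.e. the ceiling of scan_depth / chunk_size, written so that it cannot
overflow. chunk_size must be non-zero. */
static unsigned long
lru_chunks_per_instance(
	unsigned long	scan_depth,	/* models srv_LRU_scan_depth */
	unsigned long	chunk_size)	/* models the batch chunk size */
{
	return(scan_depth / chunk_size
	       + (scan_depth % chunk_size != 0 ? 1UL : 0UL));
}
```

With the default innodb_lru_scan_depth of 1024 and an assumed chunk size of 100, each instance would get 11 short batches per pass, which keeps the wait short for user threads blocked in buf_LRU_get_free_block().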
@@ -2256,12 +2515,6 @@ page_cleaner_do_flush_batch(
n_flushed = 0;
}
- /* Record the IO capacity percentage used for the flush.
- Note that this can be more than 100% in case where we
- are being asked to flush to a certain lsn_limit */
- MONITOR_SET(MONITOR_FLUSH_IO_CAPACITY_PCT,
- n_flushed * 100 / srv_io_capacity)
-
return(n_flushed);
}
@@ -2303,8 +2556,11 @@ page_cleaner_flush_pages_if_needed(void)
n_pages_flushed = page_cleaner_do_flush_batch(ULINT_MAX,
lsn_limit);
- MONITOR_INC(MONITOR_NUM_ASYNC_FLUSHES);
- MONITOR_SET(MONITOR_FLUSH_ASYNC_PAGES, n_pages_flushed);
+ MONITOR_INC_VALUE_CUMULATIVE(
+ MONITOR_FLUSH_ASYNC_TOTAL_PAGE,
+ MONITOR_FLUSH_ASYNC_COUNT,
+ MONITOR_FLUSH_ASYNC_PAGES,
+ n_pages_flushed);
}
if (UNIV_UNLIKELY(n_pages_flushed < PCT_IO(100)
@@ -2316,8 +2572,12 @@ page_cleaner_flush_pages_if_needed(void)
n_pages_flushed += page_cleaner_do_flush_batch(PCT_IO(100),
LSN_MAX);
- MONITOR_INC(MONITOR_NUM_MAX_DIRTY_FLUSHES);
- MONITOR_SET(MONITOR_FLUSH_MAX_DIRTY_PAGES, n_pages_flushed);
+
+ MONITOR_INC_VALUE_CUMULATIVE(
+ MONITOR_FLUSH_MAX_DIRTY_TOTAL_PAGE,
+ MONITOR_FLUSH_MAX_DIRTY_COUNT,
+ MONITOR_FLUSH_MAX_DIRTY_PAGES,
+ n_pages_flushed);
}
if (srv_adaptive_flushing && n_pages_flushed == 0) {
@@ -2330,12 +2590,13 @@ page_cleaner_flush_pages_if_needed(void)
ut_ad(n_flush <= PCT_IO(100));
if (n_flush) {
n_pages_flushed = page_cleaner_do_flush_batch(
- n_flush,
- LSN_MAX);
+ n_flush, LSN_MAX);
- MONITOR_INC(MONITOR_NUM_ADAPTIVE_FLUSHES);
- MONITOR_SET(MONITOR_FLUSH_ADAPTIVE_PAGES,
- n_pages_flushed);
+ MONITOR_INC_VALUE_CUMULATIVE(
+ MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
+ MONITOR_FLUSH_ADAPTIVE_COUNT,
+ MONITOR_FLUSH_ADAPTIVE_PAGES,
+ n_pages_flushed);
}
}
@@ -2407,11 +2668,24 @@ buf_flush_page_cleaner_thread(
if (srv_check_activity(last_activity)) {
last_activity = srv_get_activity_count();
- n_flushed = page_cleaner_flush_pages_if_needed();
+
+ /* Flush pages from end of LRU if required */
+ n_flushed = page_cleaner_flush_LRU_tail();
+
+ /* Flush pages from flush_list if required */
+ n_flushed += page_cleaner_flush_pages_if_needed();
} else {
n_flushed = page_cleaner_do_flush_batch(
PCT_IO(100),
LSN_MAX);
+
+ if (n_flushed) {
+ MONITOR_INC_VALUE_CUMULATIVE(
+ MONITOR_FLUSH_BACKGROUND_TOTAL_PAGE,
+ MONITOR_FLUSH_BACKGROUND_COUNT,
+ MONITOR_FLUSH_BACKGROUND_PAGES,
+ n_flushed);
+ }
}
}
@@ -2456,6 +2730,8 @@ buf_flush_page_cleaner_thread(
sweep and we'll come out of the loop leaving behind dirty pages
in the flush_list */
buf_flush_wait_batch_end(NULL, BUF_FLUSH_LIST);
+ page_cleaner_wait_LRU_flush();
+
do {
n_flushed = buf_flush_list(PCT_IO(100), LSN_MAX);
=== modified file 'storage/innobase/buf/buf0lru.c'
--- a/storage/innobase/buf/buf0lru.c revid:marc.alff@stripped
+++ b/storage/innobase/buf/buf0lru.c revid:inaam.rana@stripped
@@ -49,6 +49,7 @@ Created 11/5/1995 Heikki Tuuri
#include "page0zip.h"
#include "log0recv.h"
#include "srv0srv.h"
+#include "srv0mon.h"
/** The number of blocks from the LRU_old pointer onward, including
the block pointed to, must be buf_pool->LRU_old_ratio/BUF_LRU_OLD_RATIO_DIV
@@ -155,7 +156,7 @@ buf_LRU_block_free_hashed_page(
Determines if the unzip_LRU list should be used for evicting a victim
instead of the general LRU list.
@return TRUE if should use unzip_LRU */
-UNIV_INLINE
+UNIV_INTERN
ibool
buf_LRU_evict_from_unzip_LRU(
/*=========================*/
@@ -548,52 +549,39 @@ ibool
buf_LRU_free_from_unzip_LRU_list(
/*=============================*/
buf_pool_t* buf_pool, /*!< in: buffer pool instance */
- ulint n_iterations) /*!< in: how many times this has
- been called repeatedly without
- result: a high value means that
- we should search farther; we will
- search n_iterations / 5 of the
- unzip_LRU list, or nothing if
- n_iterations >= 5 */
+ ibool scan_all) /*!< in: scan whole unzip_LRU list
+ if TRUE, otherwise scan at most
+ srv_LRU_scan_depth blocks. */
{
buf_block_t* block;
- ulint distance;
+ ibool freed;
+ ulint scanned;
ut_ad(buf_pool_mutex_own(buf_pool));
- /* Theoratically it should be much easier to find a victim
- from unzip_LRU as we can choose even a dirty block (as we'll
- be evicting only the uncompressed frame). In a very unlikely
- eventuality that we are unable to find a victim from
- unzip_LRU, we fall back to the regular LRU list. We do this
- if we have done five iterations so far. */
-
- if (UNIV_UNLIKELY(n_iterations >= 5)
- || !buf_LRU_evict_from_unzip_LRU(buf_pool)) {
-
+ if (!buf_LRU_evict_from_unzip_LRU(buf_pool)) {
return(FALSE);
}
- distance = 100 + (n_iterations
- * UT_LIST_GET_LEN(buf_pool->unzip_LRU)) / 5;
-
- for (block = UT_LIST_GET_LAST(buf_pool->unzip_LRU);
- UNIV_LIKELY(block != NULL) && UNIV_LIKELY(distance > 0);
- block = UT_LIST_GET_PREV(unzip_LRU, block), distance--) {
-
- ibool freed;
+ for (block = UT_LIST_GET_LAST(buf_pool->unzip_LRU),
+ scanned = 1, freed = FALSE;
+ block != NULL && !freed
+ && (scan_all || scanned < srv_LRU_scan_depth);
+ block = UT_LIST_GET_PREV(unzip_LRU, block), ++scanned) {
ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
ut_ad(block->in_unzip_LRU_list);
ut_ad(block->page.in_LRU_list);
freed = buf_LRU_free_block(&block->page, FALSE);
- if (freed) {
- return(TRUE);
- }
}
- return(FALSE);
+ MONITOR_INC_VALUE_CUMULATIVE(
+ MONITOR_LRU_UNZIP_SEARCH_SCANNED,
+ MONITOR_LRU_UNZIP_SEARCH_SCANNED_NUM_CALL,
+ MONITOR_LRU_UNZIP_SEARCH_SCANNED_PER_CALL,
+ scanned);
+ return(freed);
}
/******************************************************************//**
@@ -603,27 +591,23 @@ UNIV_INLINE
ibool
buf_LRU_free_from_common_LRU_list(
/*==============================*/
- buf_pool_t* buf_pool,
- ulint n_iterations)
- /*!< in: how many times this has been called
- repeatedly without result: a high value means
- that we should search farther; if
- n_iterations < 10, then we search
- n_iterations / 10 * buf_pool->curr_size
- pages from the end of the LRU list */
+ buf_pool_t* buf_pool, /*!< in: buffer pool instance */
+ ibool scan_all) /*!< in: scan whole LRU list
+ if TRUE, otherwise scan at most
+ srv_LRU_scan_depth blocks. */
{
buf_page_t* bpage;
- ulint distance;
+ ibool freed;
+ ulint scanned;
ut_ad(buf_pool_mutex_own(buf_pool));
- distance = 100 + (n_iterations * buf_pool->curr_size) / 10;
+ for (bpage = UT_LIST_GET_LAST(buf_pool->LRU),
+ scanned = 1, freed = FALSE;
+ bpage != NULL && !freed
+ && (scan_all || scanned < srv_LRU_scan_depth);
+ bpage = UT_LIST_GET_PREV(LRU, bpage), ++scanned) {
- for (bpage = UT_LIST_GET_LAST(buf_pool->LRU);
- UNIV_LIKELY(bpage != NULL) && UNIV_LIKELY(distance > 0);
- bpage = UT_LIST_GET_PREV(LRU, bpage), distance--) {
-
- ibool freed;
unsigned accessed;
ut_ad(buf_page_in_file(bpage));
@@ -631,18 +615,21 @@ buf_LRU_free_from_common_LRU_list(
accessed = buf_page_is_accessed(bpage);
freed = buf_LRU_free_block(bpage, TRUE);
- if (freed) {
+ if (freed && !accessed) {
/* Keep track of pages that are evicted without
ever being accessed. This gives us a measure of
the effectiveness of readahead */
- if (!accessed) {
- ++buf_pool->stat.n_ra_pages_evicted;
- }
- return(TRUE);
+ ++buf_pool->stat.n_ra_pages_evicted;
}
}
- return(FALSE);
+ MONITOR_INC_VALUE_CUMULATIVE(
+ MONITOR_LRU_SEARCH_SCANNED,
+ MONITOR_LRU_SEARCH_SCANNED_NUM_CALL,
+ MONITOR_LRU_SEARCH_SCANNED_PER_CALL,
+ scanned);
+
+ return(freed);
}
/******************************************************************//**
@@ -650,78 +637,18 @@ Try to free a replaceable block.
@return TRUE if found and freed */
UNIV_INTERN
ibool
-buf_LRU_search_and_free_block(
-/*==========================*/
- buf_pool_t* buf_pool,
- /*!< in: buffer pool instance */
- ulint n_iterations)
- /*!< in: how many times this has been called
- repeatedly without result: a high value means
- that we should search farther; if
- n_iterations < 10, then we search
- n_iterations / 10 * buf_pool->curr_size
- pages from the end of the LRU list; if
- n_iterations < 5, then we will also search
- n_iterations / 5 of the unzip_LRU list. */
-{
- ibool freed = FALSE;
-
- buf_pool_mutex_enter(buf_pool);
-
- freed = buf_LRU_free_from_unzip_LRU_list(buf_pool, n_iterations);
-
- if (!freed) {
- freed = buf_LRU_free_from_common_LRU_list(
- buf_pool, n_iterations);
- }
-
- if (!freed) {
- buf_pool->LRU_flush_ended = 0;
- } else if (buf_pool->LRU_flush_ended > 0) {
- buf_pool->LRU_flush_ended--;
- }
-
- buf_pool_mutex_exit(buf_pool);
-
- return(freed);
-}
-
-/******************************************************************//**
-Tries to remove LRU flushed blocks from the end of the LRU list and put them
-to the free list. This is beneficial for the efficiency of the insert buffer
-operation, as flushed pages from non-unique non-clustered indexes are here
-taken out of the buffer pool, and their inserts redirected to the insert
-buffer. Otherwise, the flushed blocks could get modified again before read
-operations need new buffer blocks, and the i/o work done in flushing would be
-wasted. */
-UNIV_INTERN
-void
-buf_LRU_try_free_flushed_blocks(
-/*============================*/
- buf_pool_t* buf_pool) /*!< in: buffer pool instance */
+buf_LRU_scan_and_free_block(
+/*========================*/
+ buf_pool_t* buf_pool, /*!< in: buffer pool instance */
+ ibool scan_all) /*!< in: scan whole LRU list
+ if TRUE, otherwise scan at most
+ srv_LRU_scan_depth blocks. */
{
+ ut_ad(buf_pool_mutex_own(buf_pool));
- if (buf_pool == NULL) {
- ulint i;
-
- for (i = 0; i < srv_buf_pool_instances; i++) {
- buf_pool = buf_pool_from_array(i);
- buf_LRU_try_free_flushed_blocks(buf_pool);
- }
- } else {
- buf_pool_mutex_enter(buf_pool);
-
- while (buf_pool->LRU_flush_ended > 0) {
-
- buf_pool_mutex_exit(buf_pool);
-
- buf_LRU_search_and_free_block(buf_pool, 1);
-
- buf_pool_mutex_enter(buf_pool);
- }
-
- buf_pool_mutex_exit(buf_pool);
- }
+ return(buf_LRU_free_from_unzip_LRU_list(buf_pool, scan_all)
+ || buf_LRU_free_from_common_LRU_list(
+ buf_pool, scan_all));
}
/******************************************************************//**
@@ -797,23 +724,17 @@ buf_LRU_get_free_only(
}
/******************************************************************//**
-Returns a free block from the buf_pool. The block is taken off the
-free list. If it is empty, blocks are moved from the end of the
-LRU list to the free list.
-@return the free control block, in state BUF_BLOCK_READY_FOR_USE */
-UNIV_INTERN
-buf_block_t*
-buf_LRU_get_free_block(
-/*===================*/
- buf_pool_t* buf_pool) /*!< in/out: buffer pool instance */
+Checks how much of buf_pool is occupied by non-data objects like
+AHI, lock heaps etc. Depending on the size of non-data objects this
+function will either assert or issue a warning and switch on the
+status monitor. */
+static
+void
+buf_LRU_check_size_of_non_data_objects(
+/*===================================*/
+ const buf_pool_t* buf_pool) /*!< in: buffer pool instance */
{
- buf_block_t* block = NULL;
- ibool freed;
- ulint n_iterations = 1;
- ibool mon_value_was = FALSE;
- ibool started_monitor = FALSE;
-loop:
- buf_pool_mutex_enter(buf_pool);
+ ut_ad(buf_pool_mutex_own(buf_pool));
if (!recv_recovery_on && UT_LIST_GET_LEN(buf_pool->free)
+ UT_LIST_GET_LEN(buf_pool->LRU) < buf_pool->curr_size / 20) {
@@ -878,12 +799,59 @@ loop:
buf_lru_switched_on_innodb_mon = FALSE;
srv_print_innodb_monitor = FALSE;
}
+}
+
+/******************************************************************//**
+Returns a free block from the buf_pool. The block is taken off the
+free list. If free list is empty, blocks are moved from the end of the
+LRU list to the free list.
+This function is called from a user thread when it needs a clean
+block to read in a page. Note that we only ever get a block from
+the free list. Even when we flush a page or find a page in an LRU scan,
+we put it on the free list to be used.
+* iteration 0:
+ * get a block from free list, success:done
+ * if there is an LRU flush batch in progress:
+ * wait for batch to end: retry free list
+ * if buf_pool->try_LRU_scan is set
+ * scan LRU up to srv_LRU_scan_depth to find a clean block
+ * the above will put the block on free list
+ * success:retry the free list
+ * flush one dirty page from tail of LRU to disk
+ * the above will put the block on free list
+ * success: retry the free list
+* iteration 1:
+ * same as iteration 0 except:
+ * scan whole LRU list
+ * scan LRU list even if buf_pool->try_LRU_scan is not set
+* iteration > 1:
+ * same as iteration 1 but sleep 100ms
+@return the free control block, in state BUF_BLOCK_READY_FOR_USE */
+UNIV_INTERN
+buf_block_t*
+buf_LRU_get_free_block(
+/*===================*/
+ buf_pool_t* buf_pool) /*!< in/out: buffer pool instance */
+{
+ buf_block_t* block = NULL;
+ ibool freed = FALSE;
+ ulint n_iterations = 0;
+ ulint flush_failures = 0;
+ ibool mon_value_was = FALSE;
+ ibool started_monitor = FALSE;
+
+ MONITOR_INC(MONITOR_LRU_GET_FREE_SEARCH);
+loop:
+ buf_pool_mutex_enter(buf_pool);
+
+ buf_LRU_check_size_of_non_data_objects(buf_pool);
/* If there is a block in the free list, take it */
block = buf_LRU_get_free_only(buf_pool);
- buf_pool_mutex_exit(buf_pool);
if (block) {
+
+ buf_pool_mutex_exit(buf_pool);
ut_ad(buf_pool_from_block(block) == buf_pool);
memset(&block->page.zip, 0, sizeof block->page.zip);
@@ -894,20 +862,52 @@ loop:
return(block);
}
- /* If no block was in the free list, search from the end of the LRU
- list and try to free a block there */
+ if (buf_pool->init_flush[BUF_FLUSH_LRU]
+ && srv_use_doublewrite_buf
+ && trx_doublewrite != NULL) {
+
+ /* If there is an LRU flush happening in the background
+ then we wait for it to end instead of trying a single
+ page flush. If, however, we are not using doublewrite
+ buffer then it is better to do our own single page
+ flush instead of waiting for LRU flush to end. */
+ buf_pool_mutex_exit(buf_pool);
+ buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);
+ goto loop;
+ }
- freed = buf_LRU_search_and_free_block(buf_pool, n_iterations);
+ freed = FALSE;
+ if (buf_pool->try_LRU_scan || n_iterations > 0) {
+ /* If no block was in the free list, search from the
+ end of the LRU list and try to free a block there.
+ On the first iteration we scan only the tail of the
+ LRU list; on later iterations we scan the whole LRU
+ list. */
+ freed = buf_LRU_scan_and_free_block(buf_pool,
+ n_iterations > 0);
+
+ if (!freed && n_iterations == 0) {
+ /* Tell other threads that there is no point
+ in scanning the LRU list. This flag is set to
+ TRUE again when we flush a batch from this
+ buffer pool. */
+ buf_pool->try_LRU_scan = FALSE;
+ }
+ }
- if (freed > 0) {
+ buf_pool_mutex_exit(buf_pool);
+
+ if (freed) {
goto loop;
+
}
- if (n_iterations > 30) {
+ if (n_iterations > 20) {
ut_print_timestamp(stderr);
fprintf(stderr,
" InnoDB: Warning: difficult to find free blocks in\n"
- "InnoDB: the buffer pool (%lu search iterations)!"
+ "InnoDB: the buffer pool (%lu search iterations)!\n"
+ "InnoDB: %lu failed attempts to flush a page!"
" Consider\n"
"InnoDB: increasing the buffer pool size.\n"
"InnoDB: It is also possible that"
@@ -926,6 +926,7 @@ loop:
"InnoDB: Starting InnoDB Monitor to print further\n"
"InnoDB: diagnostics to the standard output.\n",
(ulong) n_iterations,
+ (ulong) flush_failures,
(ulong) fil_n_pending_log_flushes,
(ulong) fil_n_pending_tablespace_flushes,
(ulong) os_n_file_reads, (ulong) os_n_file_writes,
@@ -937,31 +938,31 @@ loop:
os_event_set(srv_timeout_event);
}
- /* No free block was found: try to flush the LRU list */
-
- buf_flush_free_margin(buf_pool);
- ++srv_buf_pool_wait_free;
-
- os_aio_simulated_wake_handler_threads();
-
- buf_pool_mutex_enter(buf_pool);
-
- if (buf_pool->LRU_flush_ended > 0) {
- /* We have written pages in an LRU flush. To make the insert
- buffer more efficient, we try to move these pages to the free
- list. */
-
- buf_pool_mutex_exit(buf_pool);
-
- buf_LRU_try_free_flushed_blocks(buf_pool);
- } else {
- buf_pool_mutex_exit(buf_pool);
+ /* If we have scanned the whole LRU and still are unable to
+ find a free block then we should sleep here to let the
+ page_cleaner do an LRU batch for us.
+ TODO: It'd be better if we can signal the page_cleaner. Perhaps
+ we should use timed wait for page_cleaner. */
+ if (n_iterations > 1) {
+
+ os_thread_sleep(100000);
+ }
+
+ /* No free block was found: try to flush the LRU list.
+ This call will flush one page from the LRU and put it on the
+ free list. That means that the free block is up for grabs for
+ all user threads.
+ TODO: A more elegant way would have been to return the freed
+ up block to the caller here but the code that deals with
+ removing the block from page_hash and LRU_list is fairly
+ involved (particularly in case of compressed pages). We
+ can do that in a separate patch sometime in future. */
+ if (!buf_flush_single_page_from_LRU(buf_pool)) {
+ MONITOR_INC(MONITOR_LRU_SINGLE_FLUSH_FAILURE_COUNT);
+ ++flush_failures;
}
- if (n_iterations > 10) {
-
- os_thread_sleep(500000);
- }
+ ++srv_buf_pool_wait_free;
n_iterations++;
@@ -1589,6 +1590,22 @@ func_exit:
rw_lock_x_unlock(hash_lock);
mutex_exit(block_mutex);
+ } else {
+
+ /* There can be multiple threads doing an LRU scan to
+ free a block. The page_cleaner thread can be doing an
+ LRU batch whereas user threads can potentially be doing
+ multiple single page flushes. As we release
+ buf_pool->mutex below we need to make sure that no one
+ else considers this block as a victim for page
+ replacement. This block is already out of page_hash
+ and we are about to remove it from the LRU list and put
+ it on the free list. To avoid this situation we set the
+ buf_fix_count and io_fix fields here. */
+ mutex_enter(block_mutex);
+ buf_block_buf_fix_inc((buf_block_t*) bpage, __FILE__, __LINE__);
+ buf_page_set_io_fix(bpage, BUF_IO_READ);
+ mutex_exit(block_mutex);
}
buf_pool_mutex_exit(buf_pool);
@@ -1629,6 +1646,13 @@ func_exit:
b->buf_fix_count--;
buf_page_set_io_fix(b, BUF_IO_NONE);
mutex_exit(&buf_pool->zip_mutex);
+ } else {
+ mutex_enter(block_mutex);
+ ut_ad(bpage->buf_fix_count > 0);
+ ut_ad(bpage->io_fix == BUF_IO_READ);
+ buf_block_buf_fix_dec((buf_block_t*) bpage);
+ buf_page_set_io_fix(bpage, BUF_IO_NONE);
+ mutex_exit(block_mutex);
}
buf_LRU_block_free_hashed_page((buf_block_t*) bpage);
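Reviewer note on the buf0lru.c changes above: the retry order that buf_LRU_get_free_block() now follows can be modelled in isolation. The sketch below is a toy, single-threaded model and not InnoDB code. toy_pool_t, toy_scan_and_free(), toy_flush_single_page() and toy_get_free_block() are hypothetical names; the mutexes, the wait for a running LRU batch and the 100ms sleep are left out, and the LRU scan and single page flush are reduced to counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one buffer pool instance. */
typedef struct {
	size_t	n_free;		/* blocks on the free list */
	size_t	n_clean_tail;	/* clean blocks within the scan depth */
	size_t	n_clean_rest;	/* clean blocks deeper in the LRU list */
	size_t	n_dirty;	/* dirty blocks on the LRU list */
	bool	try_LRU_scan;	/* false after a failed tail scan */
} toy_pool_t;

/* Move one clean block to the free list. The tail is always tried;
the rest of the LRU list only when scan_all is set. */
static bool
toy_scan_and_free(toy_pool_t* pool, bool scan_all)
{
	if (pool->n_clean_tail > 0) {
		pool->n_clean_tail--;
	} else if (scan_all && pool->n_clean_rest > 0) {
		pool->n_clean_rest--;
	} else {
		return(false);
	}

	pool->n_free++;
	return(true);
}

/* Flush one dirty block and put it on the free list. */
static bool
toy_flush_single_page(toy_pool_t* pool)
{
	if (pool->n_dirty == 0) {
		return(false);
	}

	pool->n_dirty--;
	pool->n_free++;
	return(true);
}

/* Returns the iteration number at which a block was obtained, or -1
if max_iterations passes did not produce one. */
static int
toy_get_free_block(toy_pool_t* pool, int max_iterations)
{
	int	n_iterations = 0;

	while (n_iterations < max_iterations) {
		/* A block is only ever taken from the free list. */
		if (pool->n_free > 0) {
			pool->n_free--;
			return(n_iterations);
		}

		if (pool->try_LRU_scan || n_iterations > 0) {
			if (toy_scan_and_free(pool, n_iterations > 0)) {
				continue;	/* retry the free list */
			}

			if (n_iterations == 0) {
				/* Tell later callers not to rescan the tail. */
				pool->try_LRU_scan = false;
			}
		}

		/* The real code sleeps here when n_iterations > 1. */
		(void) toy_flush_single_page(pool);
		n_iterations++;
	}

	return(-1);
}
```

In this model a caller that finds only dirty pages pays for one failed tail scan, clears try_LRU_scan, and gets its block on iteration 1 through the single page flush. Later callers skip the tail scan on iteration 0 until something (in the patch, a flush batch) sets the flag again.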
=== modified file 'storage/innobase/buf/buf0rea.c'
--- a/storage/innobase/buf/buf0rea.c revid:marc.alff@stripped
+++ b/storage/innobase/buf/buf0rea.c revid:inaam.rana@stripped
@@ -355,7 +355,6 @@ buf_read_page(
ulint zip_size,/*!< in: compressed page size in bytes, or 0 */
ulint offset) /*!< in: page number */
{
- buf_pool_t* buf_pool = buf_pool_get(space, offset);
ib_int64_t tablespace_version;
ulint count;
ulint err;
@@ -379,9 +378,6 @@ buf_read_page(
(ulong) space, (ulong) offset);
}
- /* Flush pages from the end of the LRU list if necessary */
- buf_flush_free_margin(buf_pool);
-
/* Increment number of I/O operations used for LRU policy. */
buf_LRU_stat_inc_io();
@@ -401,7 +397,6 @@ buf_read_page_async(
ulint space, /*!< in: space id */
ulint offset) /*!< in: page number */
{
- buf_pool_t* buf_pool = buf_pool_get(space, offset);
ulint zip_size;
ib_int64_t tablespace_version;
ulint count;
@@ -422,9 +417,6 @@ buf_read_page_async(
tablespace_version, offset);
srv_buf_pool_reads += count;
- /* Flush pages from the end of the LRU list if necessary */
- buf_flush_free_margin(buf_pool);
-
/* We do not increment number of I/O operations used for LRU policy
here (buf_LRU_stat_inc_io()). We use this in heuristics to decide
about evicting uncompressed version of compressed pages from the
@@ -701,9 +693,6 @@ buf_read_ahead_linear(
os_aio_simulated_wake_handler_threads();
- /* Flush pages from the end of the LRU list if necessary */
- buf_flush_free_margin(buf_pool);
-
#ifdef UNIV_DEBUG
if (buf_debug_prints && (count > 0)) {
fprintf(stderr,
@@ -789,9 +778,6 @@ tablespace_deleted:
os_aio_simulated_wake_handler_threads();
- /* Flush pages from the end of all the LRU lists if necessary */
- buf_flush_free_margins();
-
#ifdef UNIV_DEBUG
if (buf_debug_prints) {
fprintf(stderr,
@@ -883,9 +869,6 @@ buf_read_recv_pages(
os_aio_simulated_wake_handler_threads();
- /* Flush pages from the end of all the LRU lists if necessary */
- buf_flush_free_margins();
-
#ifdef UNIV_DEBUG
if (buf_debug_prints) {
fprintf(stderr,
=== modified file 'storage/innobase/handler/ha_innodb.cc'
--- a/storage/innobase/handler/ha_innodb.cc revid:marc.alff@stripped
+++ b/storage/innobase/handler/ha_innodb.cc revid:inaam.rana@stripped
@@ -12430,6 +12430,11 @@ static MYSQL_SYSVAR_ULONG(page_hash_lock
PLUGIN_VAR_OPCMDARG | PLUGIN_VAR_READONLY,
"Number of rw_locks protecting buffer pool page_hash. Rounded up to the next power of 2",
NULL, NULL, 16, 1, MAX_PAGE_HASH_LOCKS, 0);
+
+static MYSQL_SYSVAR_ULONG(doublewrite_batch_size, srv_doublewrite_batch_size,
+ PLUGIN_VAR_OPCMDARG | PLUGIN_VAR_READONLY,
+ "Number of pages reserved in doublewrite buffer for batch flushing",
+ NULL, NULL, 120, 1, 127, 0);
#endif /* defined UNIV_DEBUG || defined UNIV_PERF_DEBUG */
static MYSQL_SYSVAR_LONG(buffer_pool_instances, innobase_buffer_pool_instances,
@@ -12468,6 +12473,16 @@ static MYSQL_SYSVAR_BOOL(buffer_pool_loa
"Load the buffer pool from a file named @@innodb_buffer_pool_filename",
NULL, NULL, FALSE);
+static MYSQL_SYSVAR_ULONG(lru_scan_depth, srv_LRU_scan_depth,
+ PLUGIN_VAR_RQCMDARG,
+ "How deep to scan LRU to keep it clean",
+ NULL, NULL, 1024, 100, ~0L, 0);
+
+static MYSQL_SYSVAR_BOOL(flush_neighbors, srv_flush_neighbors,
+ PLUGIN_VAR_NOCMDARG,
+ "Flush neighbors from buffer pool when flushing a block.",
+ NULL, NULL, TRUE);
+
static MYSQL_SYSVAR_ULONG(commit_concurrency, innobase_commit_concurrency,
PLUGIN_VAR_RQCMDARG,
"Helps in performance tuning in heavily concurrent environments.",
@@ -12698,6 +12713,8 @@ static struct st_mysql_sys_var* innobase
MYSQL_SYSVAR(buffer_pool_load_now),
MYSQL_SYSVAR(buffer_pool_load_abort),
MYSQL_SYSVAR(buffer_pool_load_at_startup),
+ MYSQL_SYSVAR(lru_scan_depth),
+ MYSQL_SYSVAR(flush_neighbors),
MYSQL_SYSVAR(checksums),
MYSQL_SYSVAR(commit_concurrency),
MYSQL_SYSVAR(concurrency_tickets),
@@ -12770,6 +12787,7 @@ static struct st_mysql_sys_var* innobase
MYSQL_SYSVAR(purge_batch_size),
#if defined UNIV_DEBUG || defined UNIV_PERF_DEBUG
MYSQL_SYSVAR(page_hash_locks),
+ MYSQL_SYSVAR(doublewrite_batch_size),
#endif /* defined UNIV_DEBUG || defined UNIV_PERF_DEBUG */
MYSQL_SYSVAR(print_all_deadlocks),
MYSQL_SYSVAR(undo_logs),
=== modified file 'storage/innobase/ibuf/ibuf0ibuf.c'
--- a/storage/innobase/ibuf/ibuf0ibuf.c revid:marc.alff@stripped
+++ b/storage/innobase/ibuf/ibuf0ibuf.c revid:inaam.rana@stripped
@@ -197,9 +197,6 @@ UNIV_INTERN uint ibuf_debug;
/** The insert buffer control structure */
UNIV_INTERN ibuf_t* ibuf = NULL;
-/** Counter for ibuf_should_try() */
-UNIV_INTERN ulint ibuf_flush_count = 0;
-
#ifdef UNIV_PFS_MUTEX
UNIV_INTERN mysql_pfs_key_t ibuf_pessimistic_insert_mutex_key;
UNIV_INTERN mysql_pfs_key_t ibuf_mutex_key;
=== modified file 'storage/innobase/include/buf0buf.h'
--- a/storage/innobase/include/buf0buf.h revid:marc.alff@stripped
+++ b/storage/innobase/include/buf0buf.h revid:inaam.rana@stripped
@@ -145,6 +145,10 @@ struct buf_pool_info_struct{
ulint n_pend_reads; /*!< buf_pool->n_pend_reads, pages
pending read */
ulint n_pending_flush_lru; /*!< Pages pending flush in LRU */
+ ulint n_pending_flush_single_page;/*!< Pages pending to be
+ flushed as part of single page
+ flushes issued by various user
+ threads */
ulint n_pending_flush_list; /*!< Pages pending flush in FLUSH
LIST */
ulint n_pages_made_young; /*!< number of pages made young */
@@ -1844,10 +1848,16 @@ struct buf_pool_struct{
to read this for heuristic
purposes without holding any
mutex or latch */
- ulint LRU_flush_ended;/*!< when an LRU flush ends for a page,
- this is incremented by one; this is
- set to zero when a buffer block is
- allocated */
+ ibool try_LRU_scan; /*!< Set to FALSE when an LRU
+ scan for free block fails. This
+ flag is used to avoid repeated
+ scans of LRU list when we know
+ that there is no free block
+ available in the scan depth for
+ eviction. Set to TRUE whenever
+ we flush a batch from the
+ buffer pool. Protected by the
+ buf_pool->mutex */
/* @} */
/** @name LRU replacement algorithm fields */
=== modified file 'storage/innobase/include/buf0buf.ic'
--- a/storage/innobase/include/buf0buf.ic revid:marc.alff@stripped
+++ b/storage/innobase/include/buf0buf.ic revid:inaam.rana@stripped
@@ -373,9 +373,10 @@ buf_page_get_flush_type(
switch (flush_type) {
case BUF_FLUSH_LRU:
case BUF_FLUSH_LIST:
+ case BUF_FLUSH_SINGLE_PAGE:
return(flush_type);
case BUF_FLUSH_N_TYPES:
- break;
+ ut_error;
}
ut_error;
#endif /* UNIV_DEBUG */
=== modified file 'storage/innobase/include/buf0flu.h'
--- a/storage/innobase/include/buf0flu.h revid:marc.alff@stripped
+++ b/storage/innobase/include/buf0flu.h revid:inaam.rana@stripped
@@ -60,21 +60,6 @@ void
buf_flush_write_complete(
/*=====================*/
buf_page_t* bpage); /*!< in: pointer to the block in question */
-/*********************************************************************//**
-Flushes pages from the end of the LRU list if there is too small
-a margin of replaceable pages there. If buffer pool is NULL it
-means flush free margin on all buffer pool instances. */
-UNIV_INTERN
-void
-buf_flush_free_margin(
-/*==================*/
- buf_pool_t* buf_pool);
-/*********************************************************************//**
-Flushes pages from the end of all the LRU lists. */
-UNIV_INTERN
-void
-buf_flush_free_margins(void);
-/*=========================*/
#endif /* !UNIV_HOTBACKUP */
/********************************************************************//**
Initializes a page for writing to the tablespace. */
@@ -103,21 +88,6 @@ buf_flush_page_try(
__attribute__((nonnull, warn_unused_result));
# endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
/*******************************************************************//**
-This utility flushes dirty blocks from the end of the LRU list.
-NOTE: The calling thread may own latches to pages: to avoid deadlocks,
-this function must be written so that it cannot end up waiting for these
-latches!
-@return number of blocks for which the write request was queued;
-ULINT_UNDEFINED if there was a flush of the same type already running */
-UNIV_INTERN
-ulint
-buf_flush_LRU(
-/*==========*/
- buf_pool_t* buf_pool, /*!< in: buffer pool instance */
- ulint min_n); /*!< in: wished minimum mumber of blocks
- flushed (it is not guaranteed that the
- actual number is that big, though) */
-/*******************************************************************//**
This utility flushes dirty blocks from the end of the flush_list of
all buffer pool instances.
NOTE: The calling thread is not allowed to own any latches on pages!
@@ -136,6 +106,19 @@ buf_flush_list(
(if their number does not exceed
min_n), otherwise ignored */
/******************************************************************//**
+This function picks up a single dirty page from the tail of the LRU
+list, flushes it, removes it from page_hash and LRU list and puts
+it on the free list. It is called from user threads when they are
+unable to find a replaceable page at the tail of the LRU list i.e.:
+when the background LRU flushing in the page_cleaner thread is not
+fast enough to keep pace with the workload.
+@return TRUE if success. */
+UNIV_INTERN
+ibool
+buf_flush_single_page_from_LRU(
+/*===========================*/
+ buf_pool_t* buf_pool); /*!< in/out: buffer pool instance */
+/******************************************************************//**
Waits until a flush batch of the given type ends */
UNIV_INTERN
void
@@ -249,15 +232,6 @@ UNIV_INTERN
void
buf_flush_free_flush_rbt(void);
/*==========================*/
-
-/** When buf_flush_free_margin is called, it tries to make this many blocks
-available to replacement in the free list and at the end of the LRU list (to
-make sure that a read-ahead batch can be read efficiently in a single
-sweep). */
-#define BUF_FLUSH_FREE_BLOCK_MARGIN(b) (5 + BUF_READ_AHEAD_AREA(b))
-/** Extra margin to apply above BUF_FLUSH_FREE_BLOCK_MARGIN */
-#define BUF_FLUSH_EXTRA_MARGIN(b) ((BUF_FLUSH_FREE_BLOCK_MARGIN(b) / 4 \
- + 100) / srv_buf_pool_instances)
#endif /* !UNIV_HOTBACKUP */
#ifndef UNIV_NONINL
=== modified file 'storage/innobase/include/buf0lru.h'
--- a/storage/innobase/include/buf0lru.h revid:marc.alff@stripped
+++ b/storage/innobase/include/buf0lru.h revid:inaam.rana@stripped
@@ -32,19 +32,6 @@ Created 11/5/1995 Heikki Tuuri
#include "buf0types.h"
/******************************************************************//**
-Tries to remove LRU flushed blocks from the end of the LRU list and put them
-to the free list. This is beneficial for the efficiency of the insert buffer
-operation, as flushed pages from non-unique non-clustered indexes are here
-taken out of the buffer pool, and their inserts redirected to the insert
-buffer. Otherwise, the flushed blocks could get modified again before read
-operations need new buffer blocks, and the i/o work done in flushing would be
-wasted. */
-UNIV_INTERN
-void
-buf_LRU_try_free_flushed_blocks(
-/*============================*/
- buf_pool_t* buf_pool); /*!< in: buffer pool instance */
-/******************************************************************//**
Returns TRUE if less than 25 % of the buffer pool is available. This can be
used in heuristics to prevent huge transactions eating up the whole buffer
pool for their locks.
@@ -61,9 +48,6 @@ These are low-level functions
/** Minimum LRU list length for which the LRU_old pointer is defined */
#define BUF_LRU_OLD_MIN_LEN 512 /* 8 megabytes of 16k pages */
-/** Maximum LRU list search length in buf_flush_LRU_recommendation() */
-#define BUF_LRU_FREE_SEARCH_LEN(b) (5 + 2 * BUF_READ_AHEAD_AREA(b))
-
/******************************************************************//**
Invalidates all pages belonging to a given tablespace when we are deleting
the data file(s) of that tablespace. A PROBLEM: if readahead is being started,
@@ -108,19 +92,13 @@ Try to free a replaceable block.
@return TRUE if found and freed */
UNIV_INTERN
ibool
-buf_LRU_search_and_free_block(
-/*==========================*/
+buf_LRU_scan_and_free_block(
+/*========================*/
buf_pool_t* buf_pool, /*!< in: buffer pool instance */
- ulint n_iterations); /*!< in: how many times this has
- been called repeatedly without
- result: a high value means that
- we should search farther; if
- n_iterations < 10, then we search
- n_iterations / 10 * buf_pool->curr_size
- pages from the end of the LRU list; if
- n_iterations < 5, then we will
- also search n_iterations / 5
- of the unzip_LRU list. */
+ ibool scan_all) /*!< in: scan whole LRU list
+ if TRUE, otherwise scan only
+ 'old' blocks. */
+ __attribute__((nonnull,warn_unused_result));
/******************************************************************//**
Returns a free block from the buf_pool. The block is taken off the
free list. If it is empty, returns NULL.
@@ -134,6 +112,27 @@ buf_LRU_get_free_only(
Returns a free block from the buf_pool. The block is taken off the
free list. If it is empty, blocks are moved from the end of the
LRU list to the free list.
+This function is called from a user thread when it needs a clean
+block to read in a page. Note that we only ever get a block from
+the free list. Even when we flush a page or find a page in an LRU scan,
+we put it on the free list to be used.
+* iteration 0:
+ * get a block from free list, success:done
+ * if there is an LRU flush batch in progress:
+ * wait for batch to end: retry free list
+ * if buf_pool->try_LRU_scan is set
+ * scan LRU up to srv_LRU_scan_depth to find a clean block
+ * the above will put the block on free list
+ * success:retry the free list
+ * flush one dirty page from tail of LRU to disk
+ * the above will put the block on free list
+ * success: retry the free list
+* iteration 1:
+ * same as iteration 0 except:
+ * scan whole LRU list
+ * scan LRU list even if buf_pool->try_LRU_scan is not set
+* iteration > 1:
+ * same as iteration 1 but sleep 100ms
@return the free control block, in state BUF_BLOCK_READY_FOR_USE */
UNIV_INTERN
buf_block_t*
@@ -141,7 +140,15 @@ buf_LRU_get_free_block(
/*===================*/
buf_pool_t* buf_pool) /*!< in/out: buffer pool instance */
__attribute__((nonnull,warn_unused_result));
-
+/******************************************************************//**
+Determines if the unzip_LRU list should be used for evicting a victim
+instead of the general LRU list.
+@return TRUE if should use unzip_LRU */
+UNIV_INTERN
+ibool
+buf_LRU_evict_from_unzip_LRU(
+/*=========================*/
+ buf_pool_t* buf_pool);
/******************************************************************//**
Puts a block back to the free list. */
UNIV_INTERN
=== modified file 'storage/innobase/include/buf0types.h'
--- a/storage/innobase/include/buf0types.h revid:marc.alff@stripped
+++ b/storage/innobase/include/buf0types.h revid:inaam.rana@stripped
@@ -47,6 +47,8 @@ enum buf_flush {
BUF_FLUSH_LRU = 0, /*!< flush via the LRU list */
BUF_FLUSH_LIST, /*!< flush via the flush list
of dirty blocks */
+ BUF_FLUSH_SINGLE_PAGE, /*!< flush via the LRU list
+ but only a single page */
BUF_FLUSH_N_TYPES /*!< index of last element + 1 */
};
=== modified file 'storage/innobase/include/ibuf0ibuf.ic'
--- a/storage/innobase/include/ibuf0ibuf.ic revid:marc.alff@stripped
+++ b/storage/innobase/include/ibuf0ibuf.ic revid:inaam.rana@stripped
@@ -28,9 +28,6 @@ Created 7/19/1997 Heikki Tuuri
#ifndef UNIV_HOTBACKUP
#include "buf0lru.h"
-/** Counter for ibuf_should_try() */
-extern ulint ibuf_flush_count;
-
/** An index page must contain at least UNIV_PAGE_SIZE /
IBUF_PAGE_SIZE_PER_FREE_SPACE bytes of free space for ibuf to try to
buffer inserts to this page. If there is this much of free space, the
@@ -127,22 +124,10 @@ ibuf_should_try(
a secondary index when we
decide */
{
- if (ibuf_use != IBUF_USE_NONE
- && ibuf->max_size != 0
- && !dict_index_is_clust(index)
- && (ignore_sec_unique || !dict_index_is_unique(index))) {
-
- ibuf_flush_count++;
-
- if (ibuf_flush_count % 4 == 0) {
-
- buf_LRU_try_free_flushed_blocks(NULL);
- }
-
- return(TRUE);
- }
-
- return(FALSE);
+ return(ibuf_use != IBUF_USE_NONE
+ && ibuf->max_size != 0
+ && !dict_index_is_clust(index)
+ && (ignore_sec_unique || !dict_index_is_unique(index)));
}
/******************************************************************//**
=== modified file 'storage/innobase/include/srv0mon.h'
--- a/storage/innobase/include/srv0mon.h revid:marc.alff@stripped
+++ b/storage/innobase/include/srv0mon.h revid:inaam.rana@stripped
@@ -167,25 +167,47 @@ enum monitor_id_value {
MONITOR_OVLD_PAGES_READ,
MONITOR_OVLD_BYTE_READ,
MONITOR_OVLD_BYTE_WRITTEN,
- MONITOR_NUM_ADAPTIVE_FLUSHES,
- MONITOR_FLUSH_ADAPTIVE_PAGES,
- MONITOR_NUM_ASYNC_FLUSHES,
- MONITOR_FLUSH_ASYNC_PAGES,
- MONITOR_NUM_SYNC_FLUSHES,
- MONITOR_FLUSH_SYNC_PAGES,
- MONITOR_NUM_MAX_DIRTY_FLUSHES,
- MONITOR_FLUSH_MAX_DIRTY_PAGES,
- MONITOR_NUM_FREE_MARGIN_FLUSHES,
- MONITOR_FLUSH_FREE_MARGIN_PAGES,
- MONITOR_FLUSH_IO_CAPACITY_PCT,
MONITOR_FLUSH_BATCH_SCANNED,
MONITOR_FLUSH_BATCH_SCANNED_NUM_CALL,
MONITOR_FLUSH_BATCH_SCANNED_PER_CALL,
MONITOR_FLUSH_BATCH_TOTAL_PAGE,
MONITOR_FLUSH_BATCH_COUNT,
MONITOR_FLUSH_BATCH_PAGES,
- MONITOR_BUF_FLUSH_LRU,
- MONITOR_BUF_FLUSH_LIST,
+ MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE,
+ MONITOR_FLUSH_NEIGHBOR_COUNT,
+ MONITOR_FLUSH_NEIGHBOR_PAGES,
+ MONITOR_FLUSH_MAX_DIRTY_TOTAL_PAGE,
+ MONITOR_FLUSH_MAX_DIRTY_COUNT,
+ MONITOR_FLUSH_MAX_DIRTY_PAGES,
+ MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
+ MONITOR_FLUSH_ADAPTIVE_COUNT,
+ MONITOR_FLUSH_ADAPTIVE_PAGES,
+ MONITOR_FLUSH_ASYNC_TOTAL_PAGE,
+ MONITOR_FLUSH_ASYNC_COUNT,
+ MONITOR_FLUSH_ASYNC_PAGES,
+ MONITOR_FLUSH_SYNC_TOTAL_PAGE,
+ MONITOR_FLUSH_SYNC_COUNT,
+ MONITOR_FLUSH_SYNC_PAGES,
+ MONITOR_FLUSH_BACKGROUND_TOTAL_PAGE,
+ MONITOR_FLUSH_BACKGROUND_COUNT,
+ MONITOR_FLUSH_BACKGROUND_PAGES,
+ MONITOR_LRU_BATCH_SCANNED,
+ MONITOR_LRU_BATCH_SCANNED_NUM_CALL,
+ MONITOR_LRU_BATCH_SCANNED_PER_CALL,
+ MONITOR_LRU_BATCH_TOTAL_PAGE,
+ MONITOR_LRU_BATCH_COUNT,
+ MONITOR_LRU_BATCH_PAGES,
+ MONITOR_LRU_SINGLE_FLUSH_SCANNED,
+ MONITOR_LRU_SINGLE_FLUSH_SCANNED_NUM_CALL,
+ MONITOR_LRU_SINGLE_FLUSH_SCANNED_PER_CALL,
+ MONITOR_LRU_SINGLE_FLUSH_FAILURE_COUNT,
+ MONITOR_LRU_GET_FREE_SEARCH,
+ MONITOR_LRU_SEARCH_SCANNED,
+ MONITOR_LRU_SEARCH_SCANNED_NUM_CALL,
+ MONITOR_LRU_SEARCH_SCANNED_PER_CALL,
+ MONITOR_LRU_UNZIP_SEARCH_SCANNED,
+ MONITOR_LRU_UNZIP_SEARCH_SCANNED_NUM_CALL,
+ MONITOR_LRU_UNZIP_SEARCH_SCANNED_PER_CALL,
/* Buffer Page I/O specific counters. */
MONITOR_MODULE_BUF_PAGE,
=== modified file 'storage/innobase/include/srv0srv.h'
--- a/storage/innobase/include/srv0srv.h revid:marc.alff@stripped
+++ b/storage/innobase/include/srv0srv.h revid:inaam.rana@stripped
@@ -178,6 +178,10 @@ extern ulint srv_buf_pool_size; /*!< req
extern ulint srv_buf_pool_instances; /*!< requested number of buffer pool instances */
extern ulong srv_n_page_hash_locks; /*!< number of locks to
protect buf_pool->page_hash */
+extern ulong srv_LRU_scan_depth; /*!< Scan depth for LRU
+ flush batch */
+extern my_bool srv_flush_neighbors; /*!< whether or not to flush
+ neighbors of a block */
extern ulint srv_buf_pool_old_size; /*!< previously requested size */
extern ulint srv_buf_pool_curr_size; /*!< current size in bytes */
extern ulint srv_mem_pool_size;
@@ -230,6 +234,7 @@ extern unsigned long long srv_stats_tran
extern unsigned long long srv_stats_persistent_sample_pages;
extern ibool srv_use_doublewrite_buf;
+extern ulong srv_doublewrite_batch_size;
extern ibool srv_use_checksums;
extern ulong srv_max_buf_pool_modified_pct;
=== modified file 'storage/innobase/include/trx0sys.h'
--- a/storage/innobase/include/trx0sys.h revid:marc.alff@stripped
+++ b/storage/innobase/include/trx0sys.h revid:inaam.rana@stripped
@@ -659,6 +659,14 @@ struct trx_doublewrite_struct{
ulint block2; /*!< page number of the second block */
ulint first_free; /*!< first free position in write_buf measured
in units of UNIV_PAGE_SIZE */
+ ulint n_reserved; /*!< number of slots currently reserved
+ for single page flushes. */
+ ibool* in_use; /*!< flag used to indicate if a slot is
+ in use. Only used for single page
+ flushes. */
+ ibool batch_running; /*!< set to TRUE if currently a batch
+ is being written from the doublewrite
+ buffer. */
byte* write_buf; /*!< write buffer used in writing to the
doublewrite buffer, aligned to an
address divisible by UNIV_PAGE_SIZE
=== modified file 'storage/innobase/log/log0log.c'
--- a/storage/innobase/log/log0log.c revid:marc.alff@stripped
+++ b/storage/innobase/log/log0log.c revid:inaam.rana@stripped
@@ -1644,8 +1644,11 @@ log_preflush_pool_modified_pages(
return(FALSE);
}
- MONITOR_INC(MONITOR_NUM_SYNC_FLUSHES);
- MONITOR_SET(MONITOR_FLUSH_SYNC_PAGES, n_pages);
+ MONITOR_INC_VALUE_CUMULATIVE(
+ MONITOR_FLUSH_SYNC_TOTAL_PAGE,
+ MONITOR_FLUSH_SYNC_COUNT,
+ MONITOR_FLUSH_SYNC_PAGES,
+ n_pages);
return(TRUE);
}
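Reviewer note on the counter change above: replacing MONITOR_INC plus MONITOR_SET with MONITOR_INC_VALUE_CUMULATIVE means each flush type is now tracked by three related counters. The toy code below illustrates the bookkeeping this appears to imply. It is an assumption inferred from the counter names and descriptions in srv0mon, not a copy of the macro, and toy_counter_set_t, toy_inc_value_cumulative() and toy_avg_pages_per_batch() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical stand-in for one TOTAL_PAGE/COUNT/PAGES counter set. */
typedef struct {
	unsigned long	total_page;	/* cf. ..._TOTAL_PAGE: running sum */
	unsigned long	count;		/* cf. ..._COUNT: number of batches */
	unsigned long	pages;		/* cf. ..._PAGES: most recent batch */
} toy_counter_set_t;

/* Record one batch of n_pages pages in all three counters. */
static void
toy_inc_value_cumulative(toy_counter_set_t* set, unsigned long n_pages)
{
	set->total_page += n_pages;
	set->count++;
	set->pages = n_pages;
}

/* Average batch size derived from the cumulative counters. */
static unsigned long
toy_avg_pages_per_batch(const toy_counter_set_t* set)
{
	return(set->count == 0 ? 0 : set->total_page / set->count);
}
```

Keeping the total and the call count together lets a monitoring query derive an average batch size, which the old pair of an occurrence counter and a last-value counter could not provide.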
=== modified file 'storage/innobase/srv/srv0mon.c'
--- a/storage/innobase/srv/srv0mon.c revid:marc.alff@stripped
+++ b/storage/innobase/srv/srv0mon.c revid:inaam.rana@stripped
@@ -253,55 +253,7 @@ static monitor_info_t innodb_counter_inf
MONITOR_EXISTING | MONITOR_DEFAULT_ON, 0,
MONITOR_OVLD_BYTE_WRITTEN},
- {"buffer_flush_adaptive_flushes", "buffer",
- "Occurrences of adaptive flush", 0, 0,
- MONITOR_NUM_ADAPTIVE_FLUSHES},
-
- {"buffer_flush_adaptive_pages", "buffer",
- "Number of pages flushed as part of adaptive flushing",
- MONITOR_DISPLAY_CURRENT, 0,
- MONITOR_FLUSH_ADAPTIVE_PAGES},
-
- {"buffer_flush_async_flushes", "buffer",
- "Occurrences of async flush",
- 0, 0, MONITOR_NUM_ASYNC_FLUSHES},
-
- {"buffer_flush_async_pages", "buffer",
- "Number of pages flushed as part of async flushing",
- MONITOR_DISPLAY_CURRENT, 0, MONITOR_FLUSH_ASYNC_PAGES},
-
- {"buffer_flush_sync_flushes", "buffer", "Number of sync flushes",
- 0, 0, MONITOR_NUM_SYNC_FLUSHES},
-
- {"buffer_flush_sync_pages", "buffer",
- "Number of pages flushed as part of sync flushing",
- MONITOR_DISPLAY_CURRENT, 0, MONITOR_FLUSH_SYNC_PAGES},
-
- {"buffer_flush_max_dirty_flushes", "buffer",
- "Number of flushes as part of max dirty page flush",
- 0, 0, MONITOR_NUM_MAX_DIRTY_FLUSHES},
-
- {"buffer_flush_max_dirty_pages", "buffer",
- "Number of pages flushed as part of max dirty flushing",
- MONITOR_DISPLAY_CURRENT, 0, MONITOR_FLUSH_MAX_DIRTY_PAGES},
-
- {"buffer_flush_free_margin_flushes", "buffer",
- "Number of flushes due to lack of replaceable pages in free list",
- 0, 0, MONITOR_NUM_FREE_MARGIN_FLUSHES},
-
- {"buffer_flush_free_margin_pages", "buffer",
- "Number of pages flushed due to lack of replaceable pages"
- " in free list",
- MONITOR_DISPLAY_CURRENT, 0, MONITOR_FLUSH_FREE_MARGIN_PAGES},
-
- {"buffer_flush_io_capacity_pct", "buffer",
- "Percent of Server I/O capacity during flushing",
- MONITOR_DISPLAY_CURRENT, 0, MONITOR_FLUSH_IO_CAPACITY_PCT},
-
- /* Following three counters are of one monitor set, with
- "buffer_flush_batch_scanned" being the set owner, and averaged
- by "buffer_flush_batch_scanned_num_calls" */
-
+ /* Cumulative counter for scanning in flush batches */
{"buffer_flush_batch_scanned", "buffer",
"Total pages scanned as part of flush batch",
MONITOR_SET_OWNER,
@@ -314,16 +266,13 @@ static monitor_info_t innodb_counter_inf
MONITOR_FLUSH_BATCH_SCANNED_NUM_CALL},
{"buffer_flush_batch_scanned_per_call", "buffer",
- "Page scanned per flush batch scanned",
+ "Pages scanned per flush batch scan",
MONITOR_SET_MEMBER, MONITOR_FLUSH_BATCH_SCANNED,
MONITOR_FLUSH_BATCH_SCANNED_PER_CALL},
- /* Following three counters are of one monitor set, with
- "buffer_flush_batch_scanned" being the set owner, and averaged
- by "buffer_flush_batch_count" */
-
+ /* Cumulative counter for pages flushed in flush batches */
{"buffer_flush_batch_total_pages", "buffer",
- "Total pages scanned as part of flush batch",
+ "Total pages flushed as part of flush batch",
MONITOR_SET_OWNER, MONITOR_FLUSH_BATCH_COUNT,
MONITOR_FLUSH_BATCH_TOTAL_PAGE},
@@ -333,16 +282,196 @@ static monitor_info_t innodb_counter_inf
MONITOR_FLUSH_BATCH_COUNT},
{"buffer_flush_batch_pages", "buffer",
- "Page queued as a flush batch",
+ "Pages queued as a flush batch",
MONITOR_SET_MEMBER, MONITOR_FLUSH_BATCH_TOTAL_PAGE,
MONITOR_FLUSH_BATCH_PAGES},
- {"buffer_flush_by_lru", "buffer",
- "buffer flushed via LRU list", 0, 0, MONITOR_BUF_FLUSH_LRU},
+ /* Cumulative counter for flush batches because of neighbor */
+ {"buffer_flush_neighbor_total_pages", "buffer",
+ "Total neighbors flushed as part of neighbor flush",
+ MONITOR_SET_OWNER, MONITOR_FLUSH_NEIGHBOR_COUNT,
+ MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE},
+
+ {"buffer_flush_neighbor", "buffer",
+ "Number of times neighbors flushing is invoked",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE,
+ MONITOR_FLUSH_NEIGHBOR_COUNT},
+
+ {"buffer_flush_neighbor_pages", "buffer",
+ "Pages queued as a neighbor batch",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE,
+ MONITOR_FLUSH_NEIGHBOR_PAGES},
+
+ /* Cumulative counter for flush batches because of max_dirty */
+ {"buffer_flush_max_dirty_total_pages", "buffer",
+ "Total pages flushed as part of max_dirty batches",
+ MONITOR_SET_OWNER, MONITOR_FLUSH_MAX_DIRTY_COUNT,
+ MONITOR_FLUSH_MAX_DIRTY_TOTAL_PAGE},
+
+ {"buffer_flush_max_dirty", "buffer",
+ "Number of max_dirty batches",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_MAX_DIRTY_TOTAL_PAGE,
+ MONITOR_FLUSH_MAX_DIRTY_COUNT},
+
+ {"buffer_flush_max_dirty_pages", "buffer",
+ "Pages queued as a max_dirty batch",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_MAX_DIRTY_TOTAL_PAGE,
+ MONITOR_FLUSH_MAX_DIRTY_PAGES},
+
+ /* Cumulative counter for flush batches because of adaptive */
+ {"buffer_flush_adaptive_total_pages", "buffer",
+ "Total pages flushed as part of adaptive batches",
+ MONITOR_SET_OWNER, MONITOR_FLUSH_ADAPTIVE_COUNT,
+ MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE},
+
+ {"buffer_flush_adaptive", "buffer",
+ "Number of adaptive batches",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
+ MONITOR_FLUSH_ADAPTIVE_COUNT},
+
+ {"buffer_flush_adaptive_pages", "buffer",
+ "Pages queued as an adaptive batch",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
+ MONITOR_FLUSH_ADAPTIVE_PAGES},
+
+ /* Cumulative counter for flush batches because of async */
+ {"buffer_flush_async_total_pages", "buffer",
+ "Total pages flushed as part of async batches",
+ MONITOR_SET_OWNER, MONITOR_FLUSH_ASYNC_COUNT,
+ MONITOR_FLUSH_ASYNC_TOTAL_PAGE},
+
+ {"buffer_flush_async", "buffer",
+ "Number of async batches",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_ASYNC_TOTAL_PAGE,
+ MONITOR_FLUSH_ASYNC_COUNT},
+
+ {"buffer_flush_async_pages", "buffer",
+ "Pages queued as an async batch",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_ASYNC_TOTAL_PAGE,
+ MONITOR_FLUSH_ASYNC_PAGES},
+
+ /* Cumulative counter for flush batches because of sync */
+ {"buffer_flush_sync_total_pages", "buffer",
+ "Total pages flushed as part of sync batches",
+ MONITOR_SET_OWNER, MONITOR_FLUSH_SYNC_COUNT,
+ MONITOR_FLUSH_SYNC_TOTAL_PAGE},
+
+ {"buffer_flush_sync", "buffer",
+ "Number of sync batches",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_SYNC_TOTAL_PAGE,
+ MONITOR_FLUSH_SYNC_COUNT},
+
+ {"buffer_flush_sync_pages", "buffer",
+ "Pages queued as a sync batch",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_SYNC_TOTAL_PAGE,
+ MONITOR_FLUSH_SYNC_PAGES},
+
+ /* Cumulative counter for flush batches because of background */
+ {"buffer_flush_background_total_pages", "buffer",
+ "Total pages flushed as part of background batches",
+ MONITOR_SET_OWNER, MONITOR_FLUSH_BACKGROUND_COUNT,
+ MONITOR_FLUSH_BACKGROUND_TOTAL_PAGE},
+
+ {"buffer_flush_background", "buffer",
+ "Number of background batches",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_BACKGROUND_TOTAL_PAGE,
+ MONITOR_FLUSH_BACKGROUND_COUNT},
+
+ {"buffer_flush_background_pages", "buffer",
+ "Pages queued as a background batch",
+ MONITOR_SET_MEMBER, MONITOR_FLUSH_BACKGROUND_TOTAL_PAGE,
+ MONITOR_FLUSH_BACKGROUND_PAGES},
+
+ /* Cumulative counter for LRU batch scan */
+ {"buffer_LRU_batch_scanned", "buffer",
+ "Total pages scanned as part of LRU batch",
+ MONITOR_SET_OWNER, MONITOR_LRU_BATCH_SCANNED_NUM_CALL,
+ MONITOR_LRU_BATCH_SCANNED},
+
+ {"buffer_LRU_batch_num_scan", "buffer",
+ "Number of times LRU batch is called",
+ MONITOR_SET_MEMBER, MONITOR_LRU_BATCH_SCANNED,
+ MONITOR_LRU_BATCH_SCANNED_NUM_CALL},
+
+ {"buffer_LRU_batch_scanned_per_call", "buffer",
+ "Pages scanned per LRU batch call",
+ MONITOR_SET_MEMBER, MONITOR_LRU_BATCH_SCANNED,
+ MONITOR_LRU_BATCH_SCANNED_PER_CALL},
+
+ /* Cumulative counter for LRU batch pages flushed */
+ {"buffer_LRU_batch_total_pages", "buffer",
+ "Total pages flushed as part of LRU batches",
+ MONITOR_SET_OWNER, MONITOR_LRU_BATCH_COUNT,
+ MONITOR_LRU_BATCH_TOTAL_PAGE},
+
+ {"buffer_LRU_batches", "buffer",
+ "Number of LRU batches",
+ MONITOR_SET_MEMBER, MONITOR_LRU_BATCH_TOTAL_PAGE,
+ MONITOR_LRU_BATCH_COUNT},
+
+ {"buffer_LRU_batch_pages", "buffer",
+ "Pages queued as an LRU batch",
+ MONITOR_SET_MEMBER, MONITOR_LRU_BATCH_TOTAL_PAGE,
+ MONITOR_LRU_BATCH_PAGES},
+
+ /* Cumulative counter for single page LRU scans */
+ {"buffer_LRU_single_flush_scanned", "buffer",
+ "Total pages scanned as part of single page LRU flush",
+ MONITOR_SET_OWNER,
+ MONITOR_LRU_SINGLE_FLUSH_SCANNED_NUM_CALL,
+ MONITOR_LRU_SINGLE_FLUSH_SCANNED},
+
+ {"buffer_LRU_single_flush_num_scan", "buffer",
+ "Number of times single page LRU flush is called",
+ MONITOR_SET_MEMBER, MONITOR_LRU_SINGLE_FLUSH_SCANNED,
+ MONITOR_LRU_SINGLE_FLUSH_SCANNED_NUM_CALL},
+
+ {"buffer_LRU_single_flush_scanned_per_call", "buffer",
+ "Pages scanned per single LRU flush",
+ MONITOR_SET_MEMBER, MONITOR_LRU_SINGLE_FLUSH_SCANNED,
+ MONITOR_LRU_SINGLE_FLUSH_SCANNED_PER_CALL},
+
+ {"buffer_LRU_single_flush_failure_count", "buffer",
+ "Number of times an attempt to flush a single page from LRU failed",
+ 0, 0, MONITOR_LRU_SINGLE_FLUSH_FAILURE_COUNT},
+
+ {"buffer_LRU_get_free_search", "buffer",
+ "Number of searches performed for a clean page",
+ 0, 0, MONITOR_LRU_GET_FREE_SEARCH},
+
+ /* Cumulative counter for LRU search scans */
+ {"buffer_LRU_search_scanned", "buffer",
+ "Total pages scanned as part of LRU search",
+ MONITOR_SET_OWNER,
+ MONITOR_LRU_SEARCH_SCANNED_NUM_CALL,
+ MONITOR_LRU_SEARCH_SCANNED},
+
+ {"buffer_LRU_search_num_scan", "buffer",
+ "Number of times LRU search is performed",
+ MONITOR_SET_MEMBER, MONITOR_LRU_SEARCH_SCANNED,
+ MONITOR_LRU_SEARCH_SCANNED_NUM_CALL},
+
+ {"buffer_LRU_search_scanned_per_call", "buffer",
+ "Pages scanned per single LRU search",
+ MONITOR_SET_MEMBER, MONITOR_LRU_SEARCH_SCANNED,
+ MONITOR_LRU_SEARCH_SCANNED_PER_CALL},
+
+ /* Cumulative counter for LRU unzip search scans */
+ {"buffer_LRU_unzip_search_scanned", "buffer",
+ "Total pages scanned as part of LRU unzip search",
+ MONITOR_SET_OWNER,
+ MONITOR_LRU_UNZIP_SEARCH_SCANNED_NUM_CALL,
+ MONITOR_LRU_UNZIP_SEARCH_SCANNED},
- {"buffer_flush_by_list", "buffer",
- "buffer flushed via flush list of dirty pages",
- 0, 0, MONITOR_BUF_FLUSH_LIST},
+ {"buffer_LRU_unzip_search_num_scan", "buffer",
+ "Number of times LRU unzip search is performed",
+ MONITOR_SET_MEMBER, MONITOR_LRU_UNZIP_SEARCH_SCANNED,
+ MONITOR_LRU_UNZIP_SEARCH_SCANNED_NUM_CALL},
+
+ {"buffer_LRU_unzip_search_scanned_per_call", "buffer",
+ "Pages scanned per single LRU unzip search",
+ MONITOR_SET_MEMBER, MONITOR_LRU_UNZIP_SEARCH_SCANNED,
+ MONITOR_LRU_UNZIP_SEARCH_SCANNED_PER_CALL},
/* ========== Counters for Buffer Page I/O ========== */
{"module_buffer_page", "buffer_page_io", "Buffer Page I/O Module",
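
The counter entries above come in sets of three: a set owner holding a cumulative total, a member counting calls, and a member holding a per-call value. As a rough illustration of how such a set could be maintained, here is a minimal sketch in C. The type counter_set_sketch_t and both functions are hypothetical and are not part of the InnoDB monitor code; the sketch assumes that the per-call member stores the value from the most recent call and that an average is derived as total divided by number of calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one owner/member counter set.
   This is NOT the real monitor_info_t layout. */
typedef struct {
	uint64_t total;     /* set owner: cumulative pages scanned */
	uint64_t num_calls; /* member: number of scans recorded */
	uint64_t per_call;  /* member: pages scanned by the latest call
	                       (assumption of this sketch) */
} counter_set_sketch_t;

/* Record one scan that visited `value` pages. */
void counter_set_record(counter_set_sketch_t *set, uint64_t value)
{
	assert(set != NULL);
	set->total += value;
	set->num_calls += 1;
	set->per_call = value;
}

/* Average pages per call; returns 0 when nothing was recorded,
   which avoids a division by zero. */
uint64_t counter_set_average(const counter_set_sketch_t *set)
{
	assert(set != NULL);
	return set->num_calls == 0 ? 0 : set->total / set->num_calls;
}
```

Keeping the total and the call count as separate counters means an average can be computed at read time without storing any history.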
=== modified file 'storage/innobase/srv/srv0srv.c'
--- a/storage/innobase/srv/srv0srv.c revid:marc.alff@stripped
+++ b/storage/innobase/srv/srv0srv.c revid:inaam.rana@stripped
@@ -204,6 +204,10 @@ UNIV_INTERN ulint srv_buf_pool_size = UL
UNIV_INTERN ulint srv_buf_pool_instances = 1;
/* number of locks to protect buf_pool->page_hash */
UNIV_INTERN ulong srv_n_page_hash_locks = 16;
+/** Scan depth for LRU flush batch, i.e., number of blocks scanned */
+UNIV_INTERN ulong srv_LRU_scan_depth = 1024;
+/** whether or not to flush neighbors of a block */
+UNIV_INTERN my_bool srv_flush_neighbors = TRUE;
/* previously requested size */
UNIV_INTERN ulint srv_buf_pool_old_size;
/* current size in kilobytes */
@@ -345,6 +349,12 @@ UNIV_INTERN unsigned long long srv_stats
UNIV_INTERN unsigned long long srv_stats_persistent_sample_pages = 20;
UNIV_INTERN ibool srv_use_doublewrite_buf = TRUE;
+
+/** The doublewrite buffer is 2MB in size, i.e., it can hold 128 16K pages.
+The following parameter is the size of the buffer that is used for
+batch flushing, i.e., LRU flushing and flush_list flushing. The rest
+of the pages are used for single page flushing. */
+UNIV_INTERN ulong srv_doublewrite_batch_size = 120;
UNIV_INTERN ibool srv_use_checksums = TRUE;
UNIV_INTERN ulong srv_replication_delay = 0;
=== modified file 'storage/innobase/trx/trx0sys.c'
--- a/storage/innobase/trx/trx0sys.c revid:marc.alff@stripped
+++ b/storage/innobase/trx/trx0sys.c revid:inaam.rana@stripped
@@ -180,7 +180,18 @@ trx_doublewrite_init(
byte* doublewrite) /*!< in: pointer to the doublewrite buf
header on trx sys page */
{
- trx_doublewrite = mem_alloc(sizeof(trx_doublewrite_t));
+ ulint buf_size;
+
+ trx_doublewrite = mem_zalloc(sizeof(trx_doublewrite_t));
+
+ /* There are two blocks of the same size in the doublewrite
+ buffer. */
+ buf_size = 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE;
+
+ /* There must be at least one buffer for single page writes
+ and one buffer for batch writes. */
+ ut_a(srv_doublewrite_batch_size > 0
+ && srv_doublewrite_batch_size < buf_size);
/* Since we now start to use the doublewrite buffer, no need to call
fsync() after every write to a data file */
@@ -192,18 +203,22 @@ trx_doublewrite_init(
&trx_doublewrite->mutex, SYNC_DOUBLEWRITE);
trx_doublewrite->first_free = 0;
+ trx_doublewrite->n_reserved = 0;
trx_doublewrite->block1 = mach_read_from_4(
doublewrite + TRX_SYS_DOUBLEWRITE_BLOCK1);
trx_doublewrite->block2 = mach_read_from_4(
doublewrite + TRX_SYS_DOUBLEWRITE_BLOCK2);
- trx_doublewrite->write_buf_unaligned = ut_malloc(
- (1 + 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE) * UNIV_PAGE_SIZE);
+ trx_doublewrite->in_use = mem_zalloc(buf_size * sizeof(ibool));
+
+ trx_doublewrite->write_buf_unaligned = ut_malloc(
+ (1 + buf_size) * UNIV_PAGE_SIZE);
trx_doublewrite->write_buf = ut_align(
trx_doublewrite->write_buf_unaligned, UNIV_PAGE_SIZE);
- trx_doublewrite->buf_block_arr = mem_alloc(
- 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE * sizeof(void*));
+
+ trx_doublewrite->buf_block_arr = mem_zalloc(
+ buf_size * sizeof(void*));
}
/****************************************************************//**
@@ -1673,12 +1688,17 @@ trx_sys_close(void)
/* Free the double write data structures. */
ut_a(trx_doublewrite != NULL);
+ ut_ad(trx_doublewrite->n_reserved == 0);
+
ut_free(trx_doublewrite->write_buf_unaligned);
trx_doublewrite->write_buf_unaligned = NULL;
mem_free(trx_doublewrite->buf_block_arr);
trx_doublewrite->buf_block_arr = NULL;
+ mem_free(trx_doublewrite->in_use);
+ trx_doublewrite->in_use = NULL;
+
mutex_free(&trx_doublewrite->mutex);
mem_free(trx_doublewrite);
trx_doublewrite = NULL;
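
The doublewrite changes above split the 128-page buffer into a batch region of srv_doublewrite_batch_size pages and a remainder for single page flushes, tracked by the new in_use array and n_reserved count. The following is a minimal sketch of how such a split could work, not the code from this patch. The names dblwr_sketch_t, dblwr_sketch_init, dblwr_sketch_reserve_single and dblwr_sketch_release_single are hypothetical, and the sketch assumes that slots [0, batch_size) belong to batch flushing while single page slots are found by a linear scan of the remainder. The real code runs under trx_doublewrite->mutex; locking is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE, assuming 64-page blocks. */
#define DBLWR_SKETCH_SLOTS 128

/* Hypothetical stand-in for the relevant trx_doublewrite_t fields. */
typedef struct {
	size_t batch_size; /* slots [0, batch_size) serve batch flushes */
	size_t n_reserved; /* single page slots currently reserved */
	bool   in_use[DBLWR_SKETCH_SLOTS];
} dblwr_sketch_t;

/* Mirrors the ut_a() check in trx_doublewrite_init(): both regions
   must be non-empty. Returns 0 on success, -1 on a bad batch size. */
int dblwr_sketch_init(dblwr_sketch_t *d, size_t batch_size)
{
	if (batch_size == 0 || batch_size >= DBLWR_SKETCH_SLOTS) {
		return -1;
	}
	memset(d, 0, sizeof(*d));
	d->batch_size = batch_size;
	return 0;
}

/* Reserve the first free single page slot.
   Returns the slot index, or -1 if all single page slots are taken. */
long dblwr_sketch_reserve_single(dblwr_sketch_t *d)
{
	for (size_t i = d->batch_size; i < DBLWR_SKETCH_SLOTS; i++) {
		if (!d->in_use[i]) {
			d->in_use[i] = true;
			d->n_reserved++;
			return (long) i;
		}
	}
	return -1;
}

/* Release a slot previously returned by dblwr_sketch_reserve_single(). */
void dblwr_sketch_release_single(dblwr_sketch_t *d, size_t slot)
{
	assert(slot >= d->batch_size && slot < DBLWR_SKETCH_SLOTS);
	assert(d->in_use[slot]);
	d->in_use[slot] = false;
	d->n_reserved--;
}
```

With the default batch size of 120, this leaves 8 slots for single page flushes. A caller that gets -1 would have to wait and retry, and n_reserved returning to 0 corresponds to the ut_ad() check added in trx_sys_close().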
Attachment: [text/bzr-bundle] bzr/inaam.rana@oracle.com-20110810062639-wucshtazrqtzjrc1.bundle