List: Commits
From: Inaam Rana
Date: August 10 2011 6:31am
Subject: bzr commit into mysql-trunk branch (inaam.rana:3352)
#At file:///home/inaam/w/lru_flush/ based on revid:marc.alff@stripped

 3352 Inaam Rana	2011-08-10
      WL5580: Changes to LRU flushing (InnoDB)
      
      Approved by: Marko, Sunny
      rb://589
      
      This work is performance related. The idea is to offload flushing
      activity on the LRU list from user threads to the background
      thread, i.e. the page_cleaner. Also in scope is a simpler, and
      possibly better, heuristic for LRU flushing.
      
      Summary of Changes:
      
      New Config Options:
      ===================
      innodb_lru_scan_depth (default 1024): dynamic, min: 100, max: ~0
      innodb_flush_neighbors (default TRUE): dynamic
      innodb_doublewrite_batch_size (default 120): static, min: 1, max: 127
      (undocumented, for internal testing only; enabled when
      UNIV_PERF_DEBUG is defined)
      
      New LRU flushing algorithm:
      ===========================
      * LRU flushing happens only in the page_cleaner thread
      * LRU flushing includes both cleaning the tail of the LRU list AND
      putting blocks on the free list
      * When a user thread can't find a block in the free list or a clean
      block in the tail of the LRU, it triggers a new type of flush called
      BUF_FLUSH_SINGLE_PAGE, in which it tries to flush a single page from
      the LRU list instead of triggering a batch.
      
      Page eviction algorithm:
      ========================
      * iteration 0:
        * get a block from the free list; success: done
        * if there is an LRU flush batch in progress:
          * wait for the batch to end; retry the free list
        * if buf_pool->try_LRU_scan is set:
          * scan the LRU up to srv_LRU_scan_depth to find a clean block
          * the above puts the block on the free list
          * success: retry the free list
        * flush one dirty page from the tail of the LRU to disk
          * the above puts the block on the free list
          * success: retry the free list
      * iteration 1:
        * same as iteration 0 except:
          * scan the whole LRU list
          * scan the LRU list even if buf_pool->try_LRU_scan is not set
      * iteration > 1:
        * same as iteration 1, but sleep 100ms between attempts
      
      Note that the potential convoy problem, where all user threads try
      to find a clean page in the tail of the LRU list when there is none,
      is resolved by the new buf_pool->try_LRU_scan flag: it is set to
      TRUE when an LRU batch completes and to FALSE when an LRU scan fails
      to find a clean page. A C sketch of the resulting eviction loop
      follows.
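
      Below is a minimal C sketch of the loop above, for illustration
      only: the helper names (free_list_get(), lru_batch_in_progress(),
      try_lru_scan_flag(), lru_scan_for_clean_block(),
      flush_one_dirty_page_from_lru_tail(), lru_list_len()) are
      hypothetical stand-ins, not the real functions; the actual logic
      lives in buf_LRU_get_free_block() in storage/innobase/buf/buf0lru.c.

      /* Sketch of the free-page eviction loop; all helpers below are
      hypothetical stand-ins for buf0lru.c internals. */
      typedef struct buf_pool_struct  buf_pool_t;
      typedef struct buf_block_struct buf_block_t;

      extern buf_block_t*   free_list_get(buf_pool_t*);
      extern int            lru_batch_in_progress(buf_pool_t*);
      extern void           wait_for_lru_batch(buf_pool_t*);
      extern int            try_lru_scan_flag(buf_pool_t*);
      extern int            lru_scan_for_clean_block(buf_pool_t*,
                                                     unsigned long depth);
      extern int            flush_one_dirty_page_from_lru_tail(buf_pool_t*);
      extern unsigned long  lru_list_len(buf_pool_t*);
      extern void           os_thread_sleep(unsigned long usec);
      extern unsigned long  srv_LRU_scan_depth;

      buf_block_t*
      get_free_block_sketch(buf_pool_t* buf_pool)
      {
          unsigned long n_iterations = 0;

          for (;;) {
              buf_block_t* block = free_list_get(buf_pool);

              if (block != NULL) {
                  return(block);        /* success: done */
              }

              if (lru_batch_in_progress(buf_pool)) {
                  /* Wait for the batch to end, then retry the
                  free list. */
                  wait_for_lru_batch(buf_pool);
                  continue;
              }

              if (n_iterations > 0 || try_lru_scan_flag(buf_pool)) {
                  /* Iteration 0 scans at most srv_LRU_scan_depth
                  blocks; later iterations scan the whole list. The
                  helper resets try_LRU_scan on failure. */
                  unsigned long depth = n_iterations == 0
                      ? srv_LRU_scan_depth
                      : lru_list_len(buf_pool);

                  if (lru_scan_for_clean_block(buf_pool, depth)) {
                      continue;         /* freed: retry free list */
                  }
              }

              /* BUF_FLUSH_SINGLE_PAGE: flush one dirty page from the
              tail of the LRU; on I/O completion the block is placed
              on the free list. */
              if (flush_one_dirty_page_from_lru_tail(buf_pool)) {
                  continue;
              }

              if (++n_iterations > 1) {
                  os_thread_sleep(100000);  /* 100 ms */
              }
          }
      }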
      
      Doublewrite buffer changes:
      ===========================
      The doublewrite buffer is split into two parts. The first part is
      used for batch flushing (e.g. LRU flushing and flush_list flushing)
      while the second part is used for single-page flushes. The logic for
      batch flushing remains the same. For single-page flushes we use a
      flag to indicate whether a slot is in use, and we force the write to
      disk immediately after writing to the doublewrite buffer.
      
      There is an undocumented config parameter,
      innodb_doublewrite_batch_size, which is visible only with
      UNIV_PERF_DEBUG or UNIV_DEBUG. Its value determines how much of the
      doublewrite buffer is used for batch flushing. The default is 120
      and allowed values are 1 - 127. It is a static variable.
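
      As a rough model of the split (a sketch with hypothetical names;
      the real structure is trx_doublewrite in
      storage/innobase/trx/trx0sys.c), slots [0, batch_size) serve batch
      flushes and slots [batch_size, 128) serve single-page flushes, each
      guarded by an in-use flag:

      #include <stdbool.h>
      #include <stddef.h>

      enum { DBLWR_SLOTS = 128 };     /* total doublewrite buffer pages */

      /* Hypothetical model of the split doublewrite buffer. */
      typedef struct {
          size_t batch_size;          /* innodb_doublewrite_batch_size,
                                      default 120; slots [0, batch_size)
                                      are reserved for batch flushing */
          bool   in_use[DBLWR_SLOTS]; /* reservation flags, meaningful
                                      for the single-page region */
      } dblwr_model_t;

      /* Reserve a slot in the single-page region; the caller is assumed
      to hold the doublewrite mutex. Returns the slot index, or -1 when
      all single-page slots are busy and the caller must poll and
      retry. */
      static int
      dblwr_reserve_single_page_slot(dblwr_model_t* dw)
      {
          size_t i;

          for (i = dw->batch_size; i < DBLWR_SLOTS; i++) {
              if (!dw->in_use[i]) {
                  dw->in_use[i] = true;
                  return (int) i;
              }
          }

          return -1;
      }

      Presumably the single-page path then writes the page into its
      reserved slot, forces the write to the datafile right away, and
      clears the flag when the I/O completes.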
      
      LRU batch size:
      ===============
      The size of an LRU batch depends on how deep we scan the LRU list,
      i.e. on innodb_lru_scan_depth. But since user threads wait for an
      LRU batch to finish, and since the doublewrite buffer is 128 pages,
      it makes sense to divide one big LRU batch into multiple chunks.
      PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE == 100 does that: after flushing
      100 pages, the page cleaner signals waiting user threads to proceed
      and grab a free page. A sketch of the chunking appears below.
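
      A minimal sketch of the chunking, with hypothetical helper names
      (the actual loop is part of the page_cleaner code in
      storage/innobase/buf/buf0flu.c):

      #include <stddef.h>

      #define PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE 100

      typedef struct buf_pool_struct buf_pool_t;

      /* Hypothetical helpers: flush up to n pages from the LRU tail and
      report how many were flushed; wake threads waiting on the free
      list. */
      extern size_t flush_lru_tail_pages(buf_pool_t*, size_t n);
      extern void   signal_free_list_waiters(buf_pool_t*);

      /* Flush `target` pages in chunks so that user threads waiting for
      a free block are released after every chunk rather than after the
      whole batch. */
      static void
      lru_batch_in_chunks(buf_pool_t* buf_pool, size_t target)
      {
          size_t flushed = 0;

          while (flushed < target) {
              size_t chunk = target - flushed;
              size_t n;

              if (chunk > PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE) {
                  chunk = PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE;
              }

              n = flush_lru_tail_pages(buf_pool, chunk);

              /* Let waiting user threads grab the freed pages. */
              signal_free_list_waiters(buf_pool);

              if (n == 0) {
                  break;    /* nothing flushable left */
              }

              flushed += n;
          }
      }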

    added:
      mysql-test/suite/sys_vars/r/innodb_flush_neighbors_basic.result
      mysql-test/suite/sys_vars/r/innodb_lru_scan_depth_basic.result
      mysql-test/suite/sys_vars/t/innodb_flush_neighbors_basic.test
      mysql-test/suite/sys_vars/t/innodb_lru_scan_depth_basic.test
    modified:
      mysql-test/suite/innodb/r/innodb_monitor.result
      mysql-test/suite/innodb/t/innodb_buffer_pool_load-master.opt
      mysql-test/suite/sys_vars/r/all_vars.result
      mysql-test/suite/sys_vars/t/all_vars.test
      storage/innobase/btr/btr0sea.c
      storage/innobase/buf/buf0buf.c
      storage/innobase/buf/buf0flu.c
      storage/innobase/buf/buf0lru.c
      storage/innobase/buf/buf0rea.c
      storage/innobase/handler/ha_innodb.cc
      storage/innobase/ibuf/ibuf0ibuf.c
      storage/innobase/include/buf0buf.h
      storage/innobase/include/buf0buf.ic
      storage/innobase/include/buf0flu.h
      storage/innobase/include/buf0lru.h
      storage/innobase/include/buf0types.h
      storage/innobase/include/ibuf0ibuf.ic
      storage/innobase/include/srv0mon.h
      storage/innobase/include/srv0srv.h
      storage/innobase/include/trx0sys.h
      storage/innobase/log/log0log.c
      storage/innobase/srv/srv0mon.c
      storage/innobase/srv/srv0srv.c
      storage/innobase/trx/trx0sys.c
=== modified file 'mysql-test/suite/innodb/r/innodb_monitor.result'
--- a/mysql-test/suite/innodb/r/innodb_monitor.result	revid:marc.alff@stripped
+++ b/mysql-test/suite/innodb/r/innodb_monitor.result	revid:inaam.rana@stripped
@@ -38,25 +38,47 @@ buffer_pages_written	enabled
 buffer_pages_read	enabled
 buffer_data_reads	enabled
 buffer_data_written	enabled
-buffer_flush_adaptive_flushes	disabled
-buffer_flush_adaptive_pages	disabled
-buffer_flush_async_flushes	disabled
-buffer_flush_async_pages	disabled
-buffer_flush_sync_flushes	disabled
-buffer_flush_sync_pages	disabled
-buffer_flush_max_dirty_flushes	disabled
-buffer_flush_max_dirty_pages	disabled
-buffer_flush_free_margin_flushes	disabled
-buffer_flush_free_margin_pages	disabled
-buffer_flush_io_capacity_pct	disabled
 buffer_flush_batch_scanned	disabled
 buffer_flush_batch_num_scan	disabled
 buffer_flush_batch_scanned_per_call	disabled
 buffer_flush_batch_total_pages	disabled
 buffer_flush_batches	disabled
 buffer_flush_batch_pages	disabled
-buffer_flush_by_lru	disabled
-buffer_flush_by_list	disabled
+buffer_flush_neighbor_total_pages	disabled
+buffer_flush_neighbor	disabled
+buffer_flush_neighbor_pages	disabled
+buffer_flush_max_dirty_total_pages	disabled
+buffer_flush_max_dirty	disabled
+buffer_flush_max_dirty_pages	disabled
+buffer_flush_adaptive_total_pages	disabled
+buffer_flush_adaptive	disabled
+buffer_flush_adaptive_pages	disabled
+buffer_flush_async_total_pages	disabled
+buffer_flush_async	disabled
+buffer_flush_async_pages	disabled
+buffer_flush_sync_total_pages	disabled
+buffer_flush_sync	disabled
+buffer_flush_sync_pages	disabled
+buffer_flush_background_total_pages	disabled
+buffer_flush_background	disabled
+buffer_flush_background_pages	disabled
+buffer_LRU_batch_scanned	disabled
+buffer_LRU_batch_num_scan	disabled
+buffer_LRU_batch_scanned_per_call	disabled
+buffer_LRU_batch_total_pages	disabled
+buffer_LRU_batches	disabled
+buffer_LRU_batch_pages	disabled
+buffer_LRU_single_flush_scanned	disabled
+buffer_LRU_single_flush_num_scan	disabled
+buffer_LRU_single_flush_scanned_per_call	disabled
+buffer_LRU_single_flush_failure_count	disabled
+buffer_LRU_get_free_search	disabled
+buffer_LRU_search_scanned	disabled
+buffer_LRU_search_num_scan	disabled
+buffer_LRU_search_scanned_per_call	disabled
+buffer_LRU_unzip_search_scanned	disabled
+buffer_LRU_unzip_search_num_scan	disabled
+buffer_LRU_unzip_search_scanned_per_call	disabled
 buffer_page_read_index_leaf	disabled
 buffer_page_read_index_non_leaf	disabled
 buffer_page_read_index_ibuf_leaf	disabled
@@ -218,25 +240,47 @@ buffer_pages_written	enabled
 buffer_pages_read	enabled
 buffer_data_reads	enabled
 buffer_data_written	enabled
-buffer_flush_adaptive_flushes	enabled
-buffer_flush_adaptive_pages	enabled
-buffer_flush_async_flushes	enabled
-buffer_flush_async_pages	enabled
-buffer_flush_sync_flushes	enabled
-buffer_flush_sync_pages	enabled
-buffer_flush_max_dirty_flushes	enabled
-buffer_flush_max_dirty_pages	enabled
-buffer_flush_free_margin_flushes	enabled
-buffer_flush_free_margin_pages	enabled
-buffer_flush_io_capacity_pct	enabled
 buffer_flush_batch_scanned	enabled
 buffer_flush_batch_num_scan	enabled
 buffer_flush_batch_scanned_per_call	enabled
 buffer_flush_batch_total_pages	enabled
 buffer_flush_batches	enabled
 buffer_flush_batch_pages	enabled
-buffer_flush_by_lru	enabled
-buffer_flush_by_list	enabled
+buffer_flush_neighbor_total_pages	enabled
+buffer_flush_neighbor	enabled
+buffer_flush_neighbor_pages	enabled
+buffer_flush_max_dirty_total_pages	enabled
+buffer_flush_max_dirty	enabled
+buffer_flush_max_dirty_pages	enabled
+buffer_flush_adaptive_total_pages	enabled
+buffer_flush_adaptive	enabled
+buffer_flush_adaptive_pages	enabled
+buffer_flush_async_total_pages	enabled
+buffer_flush_async	enabled
+buffer_flush_async_pages	enabled
+buffer_flush_sync_total_pages	enabled
+buffer_flush_sync	enabled
+buffer_flush_sync_pages	enabled
+buffer_flush_background_total_pages	enabled
+buffer_flush_background	enabled
+buffer_flush_background_pages	enabled
+buffer_LRU_batch_scanned	enabled
+buffer_LRU_batch_num_scan	enabled
+buffer_LRU_batch_scanned_per_call	enabled
+buffer_LRU_batch_total_pages	enabled
+buffer_LRU_batches	enabled
+buffer_LRU_batch_pages	enabled
+buffer_LRU_single_flush_scanned	enabled
+buffer_LRU_single_flush_num_scan	enabled
+buffer_LRU_single_flush_scanned_per_call	enabled
+buffer_LRU_single_flush_failure_count	enabled
+buffer_LRU_get_free_search	enabled
+buffer_LRU_search_scanned	enabled
+buffer_LRU_search_num_scan	enabled
+buffer_LRU_search_scanned_per_call	enabled
+buffer_LRU_unzip_search_scanned	enabled
+buffer_LRU_unzip_search_num_scan	enabled
+buffer_LRU_unzip_search_scanned_per_call	enabled
 buffer_page_read_index_leaf	enabled
 buffer_page_read_index_non_leaf	enabled
 buffer_page_read_index_ibuf_leaf	enabled
@@ -400,25 +444,47 @@ buffer_pages_written	disabled
 buffer_pages_read	disabled
 buffer_data_reads	disabled
 buffer_data_written	disabled
-buffer_flush_adaptive_flushes	disabled
-buffer_flush_adaptive_pages	disabled
-buffer_flush_async_flushes	disabled
-buffer_flush_async_pages	disabled
-buffer_flush_sync_flushes	disabled
-buffer_flush_sync_pages	disabled
-buffer_flush_max_dirty_flushes	disabled
-buffer_flush_max_dirty_pages	disabled
-buffer_flush_free_margin_flushes	disabled
-buffer_flush_free_margin_pages	disabled
-buffer_flush_io_capacity_pct	disabled
 buffer_flush_batch_scanned	disabled
 buffer_flush_batch_num_scan	disabled
 buffer_flush_batch_scanned_per_call	disabled
 buffer_flush_batch_total_pages	disabled
 buffer_flush_batches	disabled
 buffer_flush_batch_pages	disabled
-buffer_flush_by_lru	disabled
-buffer_flush_by_list	disabled
+buffer_flush_neighbor_total_pages	disabled
+buffer_flush_neighbor	disabled
+buffer_flush_neighbor_pages	disabled
+buffer_flush_max_dirty_total_pages	disabled
+buffer_flush_max_dirty	disabled
+buffer_flush_max_dirty_pages	disabled
+buffer_flush_adaptive_total_pages	disabled
+buffer_flush_adaptive	disabled
+buffer_flush_adaptive_pages	disabled
+buffer_flush_async_total_pages	disabled
+buffer_flush_async	disabled
+buffer_flush_async_pages	disabled
+buffer_flush_sync_total_pages	disabled
+buffer_flush_sync	disabled
+buffer_flush_sync_pages	disabled
+buffer_flush_background_total_pages	disabled
+buffer_flush_background	disabled
+buffer_flush_background_pages	disabled
+buffer_LRU_batch_scanned	disabled
+buffer_LRU_batch_num_scan	disabled
+buffer_LRU_batch_scanned_per_call	disabled
+buffer_LRU_batch_total_pages	disabled
+buffer_LRU_batches	disabled
+buffer_LRU_batch_pages	disabled
+buffer_LRU_single_flush_scanned	disabled
+buffer_LRU_single_flush_num_scan	disabled
+buffer_LRU_single_flush_scanned_per_call	disabled
+buffer_LRU_single_flush_failure_count	disabled
+buffer_LRU_get_free_search	disabled
+buffer_LRU_search_scanned	disabled
+buffer_LRU_search_num_scan	disabled
+buffer_LRU_search_scanned_per_call	disabled
+buffer_LRU_unzip_search_scanned	disabled
+buffer_LRU_unzip_search_num_scan	disabled
+buffer_LRU_unzip_search_scanned_per_call	disabled
 buffer_page_read_index_leaf	disabled
 buffer_page_read_index_non_leaf	disabled
 buffer_page_read_index_ibuf_leaf	disabled
@@ -580,25 +646,47 @@ buffer_pages_written	0	disabled
 buffer_pages_read	0	disabled
 buffer_data_reads	0	disabled
 buffer_data_written	0	disabled
-buffer_flush_adaptive_flushes	0	disabled
-buffer_flush_adaptive_pages	0	disabled
-buffer_flush_async_flushes	0	disabled
-buffer_flush_async_pages	0	disabled
-buffer_flush_sync_flushes	0	disabled
-buffer_flush_sync_pages	0	disabled
-buffer_flush_max_dirty_flushes	0	disabled
-buffer_flush_max_dirty_pages	0	disabled
-buffer_flush_free_margin_flushes	0	disabled
-buffer_flush_free_margin_pages	0	disabled
-buffer_flush_io_capacity_pct	0	disabled
 buffer_flush_batch_scanned	0	disabled
 buffer_flush_batch_num_scan	0	disabled
 buffer_flush_batch_scanned_per_call	0	disabled
 buffer_flush_batch_total_pages	0	disabled
 buffer_flush_batches	0	disabled
 buffer_flush_batch_pages	0	disabled
-buffer_flush_by_lru	0	disabled
-buffer_flush_by_list	0	disabled
+buffer_flush_neighbor_total_pages	0	disabled
+buffer_flush_neighbor	0	disabled
+buffer_flush_neighbor_pages	0	disabled
+buffer_flush_max_dirty_total_pages	0	disabled
+buffer_flush_max_dirty	0	disabled
+buffer_flush_max_dirty_pages	0	disabled
+buffer_flush_adaptive_total_pages	0	disabled
+buffer_flush_adaptive	0	disabled
+buffer_flush_adaptive_pages	0	disabled
+buffer_flush_async_total_pages	0	disabled
+buffer_flush_async	0	disabled
+buffer_flush_async_pages	0	disabled
+buffer_flush_sync_total_pages	0	disabled
+buffer_flush_sync	0	disabled
+buffer_flush_sync_pages	0	disabled
+buffer_flush_background_total_pages	0	disabled
+buffer_flush_background	0	disabled
+buffer_flush_background_pages	0	disabled
+buffer_LRU_batch_scanned	0	disabled
+buffer_LRU_batch_num_scan	0	disabled
+buffer_LRU_batch_scanned_per_call	0	disabled
+buffer_LRU_batch_total_pages	0	disabled
+buffer_LRU_batches	0	disabled
+buffer_LRU_batch_pages	0	disabled
+buffer_LRU_single_flush_scanned	0	disabled
+buffer_LRU_single_flush_num_scan	0	disabled
+buffer_LRU_single_flush_scanned_per_call	0	disabled
+buffer_LRU_single_flush_failure_count	0	disabled
+buffer_LRU_get_free_search	0	disabled
+buffer_LRU_search_scanned	0	disabled
+buffer_LRU_search_num_scan	0	disabled
+buffer_LRU_search_scanned_per_call	0	disabled
+buffer_LRU_unzip_search_scanned	0	disabled
+buffer_LRU_unzip_search_num_scan	0	disabled
+buffer_LRU_unzip_search_scanned_per_call	0	disabled
 buffer_page_read_index_leaf	0	disabled
 buffer_page_read_index_non_leaf	0	disabled
 buffer_page_read_index_ibuf_leaf	0	disabled
@@ -814,25 +902,47 @@ buffer_pages_written	enabled
 buffer_pages_read	enabled
 buffer_data_reads	enabled
 buffer_data_written	enabled
-buffer_flush_adaptive_flushes	enabled
-buffer_flush_adaptive_pages	enabled
-buffer_flush_async_flushes	enabled
-buffer_flush_async_pages	enabled
-buffer_flush_sync_flushes	enabled
-buffer_flush_sync_pages	enabled
-buffer_flush_max_dirty_flushes	enabled
-buffer_flush_max_dirty_pages	enabled
-buffer_flush_free_margin_flushes	enabled
-buffer_flush_free_margin_pages	enabled
-buffer_flush_io_capacity_pct	enabled
 buffer_flush_batch_scanned	enabled
 buffer_flush_batch_num_scan	enabled
 buffer_flush_batch_scanned_per_call	enabled
 buffer_flush_batch_total_pages	enabled
 buffer_flush_batches	enabled
 buffer_flush_batch_pages	enabled
-buffer_flush_by_lru	enabled
-buffer_flush_by_list	enabled
+buffer_flush_neighbor_total_pages	enabled
+buffer_flush_neighbor	enabled
+buffer_flush_neighbor_pages	enabled
+buffer_flush_max_dirty_total_pages	enabled
+buffer_flush_max_dirty	enabled
+buffer_flush_max_dirty_pages	enabled
+buffer_flush_adaptive_total_pages	enabled
+buffer_flush_adaptive	enabled
+buffer_flush_adaptive_pages	enabled
+buffer_flush_async_total_pages	enabled
+buffer_flush_async	enabled
+buffer_flush_async_pages	enabled
+buffer_flush_sync_total_pages	enabled
+buffer_flush_sync	enabled
+buffer_flush_sync_pages	enabled
+buffer_flush_background_total_pages	enabled
+buffer_flush_background	enabled
+buffer_flush_background_pages	enabled
+buffer_LRU_batch_scanned	enabled
+buffer_LRU_batch_num_scan	enabled
+buffer_LRU_batch_scanned_per_call	enabled
+buffer_LRU_batch_total_pages	enabled
+buffer_LRU_batches	enabled
+buffer_LRU_batch_pages	enabled
+buffer_LRU_single_flush_scanned	enabled
+buffer_LRU_single_flush_num_scan	enabled
+buffer_LRU_single_flush_scanned_per_call	enabled
+buffer_LRU_single_flush_failure_count	enabled
+buffer_LRU_get_free_search	enabled
+buffer_LRU_search_scanned	enabled
+buffer_LRU_search_num_scan	enabled
+buffer_LRU_search_scanned_per_call	enabled
+buffer_LRU_unzip_search_scanned	enabled
+buffer_LRU_unzip_search_num_scan	enabled
+buffer_LRU_unzip_search_scanned_per_call	enabled
 buffer_page_read_index_leaf	enabled
 buffer_page_read_index_non_leaf	enabled
 buffer_page_read_index_ibuf_leaf	enabled
@@ -994,25 +1104,47 @@ buffer_pages_written	disabled
 buffer_pages_read	disabled
 buffer_data_reads	disabled
 buffer_data_written	disabled
-buffer_flush_adaptive_flushes	disabled
-buffer_flush_adaptive_pages	disabled
-buffer_flush_async_flushes	disabled
-buffer_flush_async_pages	disabled
-buffer_flush_sync_flushes	disabled
-buffer_flush_sync_pages	disabled
-buffer_flush_max_dirty_flushes	disabled
-buffer_flush_max_dirty_pages	disabled
-buffer_flush_free_margin_flushes	disabled
-buffer_flush_free_margin_pages	disabled
-buffer_flush_io_capacity_pct	disabled
 buffer_flush_batch_scanned	disabled
 buffer_flush_batch_num_scan	disabled
 buffer_flush_batch_scanned_per_call	disabled
 buffer_flush_batch_total_pages	disabled
 buffer_flush_batches	disabled
 buffer_flush_batch_pages	disabled
-buffer_flush_by_lru	disabled
-buffer_flush_by_list	disabled
+buffer_flush_neighbor_total_pages	disabled
+buffer_flush_neighbor	disabled
+buffer_flush_neighbor_pages	disabled
+buffer_flush_max_dirty_total_pages	disabled
+buffer_flush_max_dirty	disabled
+buffer_flush_max_dirty_pages	disabled
+buffer_flush_adaptive_total_pages	disabled
+buffer_flush_adaptive	disabled
+buffer_flush_adaptive_pages	disabled
+buffer_flush_async_total_pages	disabled
+buffer_flush_async	disabled
+buffer_flush_async_pages	disabled
+buffer_flush_sync_total_pages	disabled
+buffer_flush_sync	disabled
+buffer_flush_sync_pages	disabled
+buffer_flush_background_total_pages	disabled
+buffer_flush_background	disabled
+buffer_flush_background_pages	disabled
+buffer_LRU_batch_scanned	disabled
+buffer_LRU_batch_num_scan	disabled
+buffer_LRU_batch_scanned_per_call	disabled
+buffer_LRU_batch_total_pages	disabled
+buffer_LRU_batches	disabled
+buffer_LRU_batch_pages	disabled
+buffer_LRU_single_flush_scanned	disabled
+buffer_LRU_single_flush_num_scan	disabled
+buffer_LRU_single_flush_scanned_per_call	disabled
+buffer_LRU_single_flush_failure_count	disabled
+buffer_LRU_get_free_search	disabled
+buffer_LRU_search_scanned	disabled
+buffer_LRU_search_num_scan	disabled
+buffer_LRU_search_scanned_per_call	disabled
+buffer_LRU_unzip_search_scanned	disabled
+buffer_LRU_unzip_search_num_scan	disabled
+buffer_LRU_unzip_search_scanned_per_call	disabled
 buffer_page_read_index_leaf	disabled
 buffer_page_read_index_non_leaf	disabled
 buffer_page_read_index_ibuf_leaf	disabled
@@ -1174,25 +1306,47 @@ buffer_pages_written	enabled
 buffer_pages_read	enabled
 buffer_data_reads	enabled
 buffer_data_written	enabled
-buffer_flush_adaptive_flushes	enabled
-buffer_flush_adaptive_pages	enabled
-buffer_flush_async_flushes	enabled
-buffer_flush_async_pages	enabled
-buffer_flush_sync_flushes	enabled
-buffer_flush_sync_pages	enabled
-buffer_flush_max_dirty_flushes	enabled
-buffer_flush_max_dirty_pages	enabled
-buffer_flush_free_margin_flushes	enabled
-buffer_flush_free_margin_pages	enabled
-buffer_flush_io_capacity_pct	enabled
 buffer_flush_batch_scanned	enabled
 buffer_flush_batch_num_scan	enabled
 buffer_flush_batch_scanned_per_call	enabled
 buffer_flush_batch_total_pages	enabled
 buffer_flush_batches	enabled
 buffer_flush_batch_pages	enabled
-buffer_flush_by_lru	enabled
-buffer_flush_by_list	enabled
+buffer_flush_neighbor_total_pages	enabled
+buffer_flush_neighbor	enabled
+buffer_flush_neighbor_pages	enabled
+buffer_flush_max_dirty_total_pages	enabled
+buffer_flush_max_dirty	enabled
+buffer_flush_max_dirty_pages	enabled
+buffer_flush_adaptive_total_pages	enabled
+buffer_flush_adaptive	enabled
+buffer_flush_adaptive_pages	enabled
+buffer_flush_async_total_pages	enabled
+buffer_flush_async	enabled
+buffer_flush_async_pages	enabled
+buffer_flush_sync_total_pages	enabled
+buffer_flush_sync	enabled
+buffer_flush_sync_pages	enabled
+buffer_flush_background_total_pages	enabled
+buffer_flush_background	enabled
+buffer_flush_background_pages	enabled
+buffer_LRU_batch_scanned	enabled
+buffer_LRU_batch_num_scan	enabled
+buffer_LRU_batch_scanned_per_call	enabled
+buffer_LRU_batch_total_pages	enabled
+buffer_LRU_batches	enabled
+buffer_LRU_batch_pages	enabled
+buffer_LRU_single_flush_scanned	enabled
+buffer_LRU_single_flush_num_scan	enabled
+buffer_LRU_single_flush_scanned_per_call	enabled
+buffer_LRU_single_flush_failure_count	enabled
+buffer_LRU_get_free_search	enabled
+buffer_LRU_search_scanned	enabled
+buffer_LRU_search_num_scan	enabled
+buffer_LRU_search_scanned_per_call	enabled
+buffer_LRU_unzip_search_scanned	enabled
+buffer_LRU_unzip_search_num_scan	enabled
+buffer_LRU_unzip_search_scanned_per_call	enabled
 buffer_page_read_index_leaf	enabled
 buffer_page_read_index_non_leaf	enabled
 buffer_page_read_index_ibuf_leaf	enabled
@@ -1354,25 +1508,47 @@ buffer_pages_written	disabled
 buffer_pages_read	disabled
 buffer_data_reads	disabled
 buffer_data_written	disabled
-buffer_flush_adaptive_flushes	disabled
-buffer_flush_adaptive_pages	disabled
-buffer_flush_async_flushes	disabled
-buffer_flush_async_pages	disabled
-buffer_flush_sync_flushes	disabled
-buffer_flush_sync_pages	disabled
-buffer_flush_max_dirty_flushes	disabled
-buffer_flush_max_dirty_pages	disabled
-buffer_flush_free_margin_flushes	disabled
-buffer_flush_free_margin_pages	disabled
-buffer_flush_io_capacity_pct	disabled
 buffer_flush_batch_scanned	disabled
 buffer_flush_batch_num_scan	disabled
 buffer_flush_batch_scanned_per_call	disabled
 buffer_flush_batch_total_pages	disabled
 buffer_flush_batches	disabled
 buffer_flush_batch_pages	disabled
-buffer_flush_by_lru	disabled
-buffer_flush_by_list	disabled
+buffer_flush_neighbor_total_pages	disabled
+buffer_flush_neighbor	disabled
+buffer_flush_neighbor_pages	disabled
+buffer_flush_max_dirty_total_pages	disabled
+buffer_flush_max_dirty	disabled
+buffer_flush_max_dirty_pages	disabled
+buffer_flush_adaptive_total_pages	disabled
+buffer_flush_adaptive	disabled
+buffer_flush_adaptive_pages	disabled
+buffer_flush_async_total_pages	disabled
+buffer_flush_async	disabled
+buffer_flush_async_pages	disabled
+buffer_flush_sync_total_pages	disabled
+buffer_flush_sync	disabled
+buffer_flush_sync_pages	disabled
+buffer_flush_background_total_pages	disabled
+buffer_flush_background	disabled
+buffer_flush_background_pages	disabled
+buffer_LRU_batch_scanned	disabled
+buffer_LRU_batch_num_scan	disabled
+buffer_LRU_batch_scanned_per_call	disabled
+buffer_LRU_batch_total_pages	disabled
+buffer_LRU_batches	disabled
+buffer_LRU_batch_pages	disabled
+buffer_LRU_single_flush_scanned	disabled
+buffer_LRU_single_flush_num_scan	disabled
+buffer_LRU_single_flush_scanned_per_call	disabled
+buffer_LRU_single_flush_failure_count	disabled
+buffer_LRU_get_free_search	disabled
+buffer_LRU_search_scanned	disabled
+buffer_LRU_search_num_scan	disabled
+buffer_LRU_search_scanned_per_call	disabled
+buffer_LRU_unzip_search_scanned	disabled
+buffer_LRU_unzip_search_num_scan	disabled
+buffer_LRU_unzip_search_scanned_per_call	disabled
 buffer_page_read_index_leaf	disabled
 buffer_page_read_index_non_leaf	disabled
 buffer_page_read_index_ibuf_leaf	disabled
@@ -1534,25 +1710,47 @@ buffer_pages_written	disabled
 buffer_pages_read	disabled
 buffer_data_reads	disabled
 buffer_data_written	disabled
-buffer_flush_adaptive_flushes	disabled
-buffer_flush_adaptive_pages	disabled
-buffer_flush_async_flushes	disabled
-buffer_flush_async_pages	disabled
-buffer_flush_sync_flushes	disabled
-buffer_flush_sync_pages	disabled
-buffer_flush_max_dirty_flushes	disabled
-buffer_flush_max_dirty_pages	disabled
-buffer_flush_free_margin_flushes	disabled
-buffer_flush_free_margin_pages	disabled
-buffer_flush_io_capacity_pct	disabled
 buffer_flush_batch_scanned	disabled
 buffer_flush_batch_num_scan	disabled
 buffer_flush_batch_scanned_per_call	disabled
 buffer_flush_batch_total_pages	disabled
 buffer_flush_batches	disabled
 buffer_flush_batch_pages	disabled
-buffer_flush_by_lru	disabled
-buffer_flush_by_list	disabled
+buffer_flush_neighbor_total_pages	disabled
+buffer_flush_neighbor	disabled
+buffer_flush_neighbor_pages	disabled
+buffer_flush_max_dirty_total_pages	disabled
+buffer_flush_max_dirty	disabled
+buffer_flush_max_dirty_pages	disabled
+buffer_flush_adaptive_total_pages	disabled
+buffer_flush_adaptive	disabled
+buffer_flush_adaptive_pages	disabled
+buffer_flush_async_total_pages	disabled
+buffer_flush_async	disabled
+buffer_flush_async_pages	disabled
+buffer_flush_sync_total_pages	disabled
+buffer_flush_sync	disabled
+buffer_flush_sync_pages	disabled
+buffer_flush_background_total_pages	disabled
+buffer_flush_background	disabled
+buffer_flush_background_pages	disabled
+buffer_LRU_batch_scanned	disabled
+buffer_LRU_batch_num_scan	disabled
+buffer_LRU_batch_scanned_per_call	disabled
+buffer_LRU_batch_total_pages	disabled
+buffer_LRU_batches	disabled
+buffer_LRU_batch_pages	disabled
+buffer_LRU_single_flush_scanned	disabled
+buffer_LRU_single_flush_num_scan	disabled
+buffer_LRU_single_flush_scanned_per_call	disabled
+buffer_LRU_single_flush_failure_count	disabled
+buffer_LRU_get_free_search	disabled
+buffer_LRU_search_scanned	disabled
+buffer_LRU_search_num_scan	disabled
+buffer_LRU_search_scanned_per_call	disabled
+buffer_LRU_unzip_search_scanned	disabled
+buffer_LRU_unzip_search_num_scan	disabled
+buffer_LRU_unzip_search_scanned_per_call	disabled
 buffer_page_read_index_leaf	disabled
 buffer_page_read_index_non_leaf	disabled
 buffer_page_read_index_ibuf_leaf	disabled

=== modified file 'mysql-test/suite/innodb/t/innodb_buffer_pool_load-master.opt'
--- a/mysql-test/suite/innodb/t/innodb_buffer_pool_load-master.opt	revid:marc.alff@stripped
+++ b/mysql-test/suite/innodb/t/innodb_buffer_pool_load-master.opt	revid:inaam.rana@stripped
@@ -1 +1 @@
---innodb-buffer-pool-size=16M
+--innodb-buffer-pool-size=64M

=== modified file 'mysql-test/suite/sys_vars/r/all_vars.result'
--- a/mysql-test/suite/sys_vars/r/all_vars.result	revid:marc.alff@stripped
+++ b/mysql-test/suite/sys_vars/r/all_vars.result	revid:inaam.rana@stripped
@@ -5,6 +5,7 @@ insert into t2 select variable_name from
 insert into t2 select variable_name from information_schema.session_variables;
 delete from t2 where variable_name='innodb_change_buffering_debug';
 delete from t2 where variable_name='innodb_page_hash_locks';
+delete from t2 where variable_name='innodb_doublewrite_batch_size';
 update t2 set variable_name= replace(variable_name, "PERFORMANCE_SCHEMA_", "PFS_");
 select variable_name as `There should be *no* long test name listed below:` from t2
 where length(variable_name) > 50;

=== added file 'mysql-test/suite/sys_vars/r/innodb_flush_neighbors_basic.result'
--- a/mysql-test/suite/sys_vars/r/innodb_flush_neighbors_basic.result	1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/sys_vars/r/innodb_flush_neighbors_basic.result	revid:inaam.rana@stripped
@@ -0,0 +1,92 @@
+SET @start_global_value = @@global.innodb_flush_neighbors;
+SELECT @start_global_value;
+@start_global_value
+1
+Valid values are 'ON' and 'OFF' 
+select @@global.innodb_flush_neighbors in (0, 1);
+@@global.innodb_flush_neighbors in (0, 1)
+1
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+1
+select @@session.innodb_flush_neighbors;
+ERROR HY000: Variable 'innodb_flush_neighbors' is a GLOBAL variable
+show global variables like 'innodb_flush_neighbors';
+Variable_name	Value
+innodb_flush_neighbors	ON
+show session variables like 'innodb_flush_neighbors';
+Variable_name	Value
+innodb_flush_neighbors	ON
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	ON
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	ON
+set global innodb_flush_neighbors='OFF';
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+0
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	OFF
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	OFF
+set @@global.innodb_flush_neighbors=1;
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+1
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	ON
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	ON
+set global innodb_flush_neighbors=0;
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+0
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	OFF
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	OFF
+set @@global.innodb_flush_neighbors='ON';
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+1
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	ON
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	ON
+set session innodb_flush_neighbors='OFF';
+ERROR HY000: Variable 'innodb_flush_neighbors' is a GLOBAL variable and should be set with SET GLOBAL
+set @@session.innodb_flush_neighbors='ON';
+ERROR HY000: Variable 'innodb_flush_neighbors' is a GLOBAL variable and should be set with SET GLOBAL
+set global innodb_flush_neighbors=1.1;
+ERROR 42000: Incorrect argument type to variable 'innodb_flush_neighbors'
+set global innodb_flush_neighbors=1e1;
+ERROR 42000: Incorrect argument type to variable 'innodb_flush_neighbors'
+set global innodb_flush_neighbors=2;
+ERROR 42000: Variable 'innodb_flush_neighbors' can't be set to the value of '2'
+NOTE: The following should fail with ER_WRONG_VALUE_FOR_VAR (BUG#50643)
+set global innodb_flush_neighbors=-3;
+select @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+1
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	ON
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_FLUSH_NEIGHBORS	ON
+set global innodb_flush_neighbors='AUTO';
+ERROR 42000: Variable 'innodb_flush_neighbors' can't be set to the value of 'AUTO'
+SET @@global.innodb_flush_neighbors = @start_global_value;
+SELECT @@global.innodb_flush_neighbors;
+@@global.innodb_flush_neighbors
+1

=== added file 'mysql-test/suite/sys_vars/r/innodb_lru_scan_depth_basic.result'
--- a/mysql-test/suite/sys_vars/r/innodb_lru_scan_depth_basic.result	1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/sys_vars/r/innodb_lru_scan_depth_basic.result	revid:inaam.rana@stripped
@@ -0,0 +1,69 @@
+SET @start_global_value = @@global.innodb_lru_scan_depth;
+SELECT @start_global_value;
+@start_global_value
+1024
+Valid value 128 or more
+select @@global.innodb_lru_scan_depth > 127;
+@@global.innodb_lru_scan_depth > 127
+1
+select @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+1024
+select @@session.innodb_lru_scan_depth;
+ERROR HY000: Variable 'innodb_lru_scan_depth' is a GLOBAL variable
+show global variables like 'innodb_lru_scan_depth';
+Variable_name	Value
+innodb_lru_scan_depth	1024
+show session variables like 'innodb_lru_scan_depth';
+Variable_name	Value
+innodb_lru_scan_depth	1024
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH	1024
+select * from information_schema.session_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH	1024
+set global innodb_lru_scan_depth=325;
+select @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+325
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH	325
+select * from information_schema.session_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH	325
+set session innodb_lru_scan_depth=444;
+ERROR HY000: Variable 'innodb_lru_scan_depth' is a GLOBAL variable and should be set with SET GLOBAL
+set global innodb_lru_scan_depth=1.1;
+ERROR 42000: Incorrect argument type to variable 'innodb_lru_scan_depth'
+set global innodb_lru_scan_depth=1e1;
+ERROR 42000: Incorrect argument type to variable 'innodb_lru_scan_depth'
+set global innodb_lru_scan_depth="foo";
+ERROR 42000: Incorrect argument type to variable 'innodb_lru_scan_depth'
+set global innodb_lru_scan_depth=7;
+Warnings:
+Warning	1292	Truncated incorrect innodb_lru_scan_depth value: '7'
+select @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+100
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH	100
+set global innodb_lru_scan_depth=-7;
+Warnings:
+Warning	1292	Truncated incorrect innodb_lru_scan_depth value: '-7'
+select @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+100
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+VARIABLE_NAME	VARIABLE_VALUE
+INNODB_LRU_SCAN_DEPTH	100
+set global innodb_lru_scan_depth=128;
+select @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+128
+SET @@global.innodb_lru_scan_depth = @start_global_value;
+SELECT @@global.innodb_lru_scan_depth;
+@@global.innodb_lru_scan_depth
+1024

=== modified file 'mysql-test/suite/sys_vars/t/all_vars.test'
--- a/mysql-test/suite/sys_vars/t/all_vars.test	revid:marc.alff@stripped
+++ b/mysql-test/suite/sys_vars/t/all_vars.test	revid:inaam.rana@stripped
@@ -70,6 +70,7 @@ insert into t2 select variable_name from
 # These are only present in debug builds.
 delete from t2 where variable_name='innodb_change_buffering_debug';
 delete from t2 where variable_name='innodb_page_hash_locks';
+delete from t2 where variable_name='innodb_doublewrite_batch_size';
 
 # Performance schema variables are too long for files named
 # 'mysql-test/suite/sys_vars/t/' ...

=== added file 'mysql-test/suite/sys_vars/t/innodb_flush_neighbors_basic.test'
--- a/mysql-test/suite/sys_vars/t/innodb_flush_neighbors_basic.test	1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/sys_vars/t/innodb_flush_neighbors_basic.test	revid:inaam.rana@stripped
@@ -0,0 +1,70 @@
+
+
+# 2011-02-23 - Added
+#
+
+--source include/have_innodb.inc
+
+SET @start_global_value = @@global.innodb_flush_neighbors;
+SELECT @start_global_value;
+
+#
+# exists as global only
+#
+--echo Valid values are 'ON' and 'OFF' 
+select @@global.innodb_flush_neighbors in (0, 1);
+select @@global.innodb_flush_neighbors;
+--error ER_INCORRECT_GLOBAL_LOCAL_VAR
+select @@session.innodb_flush_neighbors;
+show global variables like 'innodb_flush_neighbors';
+show session variables like 'innodb_flush_neighbors';
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+
+#
+# show that it's writable
+#
+set global innodb_flush_neighbors='OFF';
+select @@global.innodb_flush_neighbors;
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+set @@global.innodb_flush_neighbors=1;
+select @@global.innodb_flush_neighbors;
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+set global innodb_flush_neighbors=0;
+select @@global.innodb_flush_neighbors;
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+set @@global.innodb_flush_neighbors='ON';
+select @@global.innodb_flush_neighbors;
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+--error ER_GLOBAL_VARIABLE
+set session innodb_flush_neighbors='OFF';
+--error ER_GLOBAL_VARIABLE
+set @@session.innodb_flush_neighbors='ON';
+
+#
+# incorrect types
+#
+--error ER_WRONG_TYPE_FOR_VAR
+set global innodb_flush_neighbors=1.1;
+--error ER_WRONG_TYPE_FOR_VAR
+set global innodb_flush_neighbors=1e1;
+--error ER_WRONG_VALUE_FOR_VAR
+set global innodb_flush_neighbors=2;
+--echo NOTE: The following should fail with ER_WRONG_VALUE_FOR_VAR (BUG#50643)
+set global innodb_flush_neighbors=-3;
+select @@global.innodb_flush_neighbors;
+select * from information_schema.global_variables where variable_name='innodb_flush_neighbors';
+select * from information_schema.session_variables where variable_name='innodb_flush_neighbors';
+--error ER_WRONG_VALUE_FOR_VAR
+set global innodb_flush_neighbors='AUTO';
+
+#
+# Cleanup
+#
+
+SET @@global.innodb_flush_neighbors = @start_global_value;
+SELECT @@global.innodb_flush_neighbors;

=== added file 'mysql-test/suite/sys_vars/t/innodb_lru_scan_depth_basic.test'
--- a/mysql-test/suite/sys_vars/t/innodb_lru_scan_depth_basic.test	1970-01-01 00:00:00 +0000
+++ b/mysql-test/suite/sys_vars/t/innodb_lru_scan_depth_basic.test	revid:inaam.rana@stripped
@@ -0,0 +1,58 @@
+
+
+# 2011-02-23 - Added
+#
+
+--source include/have_innodb.inc
+
+SET @start_global_value = @@global.innodb_lru_scan_depth;
+SELECT @start_global_value;
+
+#
+# exists as global only
+#
+--echo Valid value 128 or more
+select @@global.innodb_lru_scan_depth > 127;
+select @@global.innodb_lru_scan_depth;
+--error ER_INCORRECT_GLOBAL_LOCAL_VAR
+select @@session.innodb_lru_scan_depth;
+show global variables like 'innodb_lru_scan_depth';
+show session variables like 'innodb_lru_scan_depth';
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+select * from information_schema.session_variables where variable_name='innodb_lru_scan_depth';
+
+#
+# show that it's writable
+#
+set global innodb_lru_scan_depth=325;
+select @@global.innodb_lru_scan_depth;
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+select * from information_schema.session_variables where variable_name='innodb_lru_scan_depth';
+--error ER_GLOBAL_VARIABLE
+set session innodb_lru_scan_depth=444;
+
+#
+# incorrect types
+#
+--error ER_WRONG_TYPE_FOR_VAR
+set global innodb_lru_scan_depth=1.1;
+--error ER_WRONG_TYPE_FOR_VAR
+set global innodb_lru_scan_depth=1e1;
+--error ER_WRONG_TYPE_FOR_VAR
+set global innodb_lru_scan_depth="foo";
+
+set global innodb_lru_scan_depth=7;
+select @@global.innodb_lru_scan_depth;
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+set global innodb_lru_scan_depth=-7;
+select @@global.innodb_lru_scan_depth;
+select * from information_schema.global_variables where variable_name='innodb_lru_scan_depth';
+
+#
+# min/max values
+#
+set global innodb_lru_scan_depth=128;
+select @@global.innodb_lru_scan_depth;
+
+SET @@global.innodb_lru_scan_depth = @start_global_value;
+SELECT @@global.innodb_lru_scan_depth;

=== modified file 'storage/innobase/btr/btr0sea.c'
--- a/storage/innobase/btr/btr0sea.c	revid:marc.alff@stripped
+++ b/storage/innobase/btr/btr0sea.c	revid:inaam.rana@stripped
@@ -1034,7 +1034,11 @@ btr_search_drop_page_hash_index(
 	buf_block_t*	block)	/*!< in: block containing index page,
 				s- or x-latched, or an index page
 				for which we know that
-				block->buf_fix_count == 0 */
+				block->buf_fix_count == 0 or it is an
+				index page which has already been
+				removed from the buf_pool->page_hash
+				i.e.: it is in state
+				BUF_BLOCK_REMOVE_HASH */
 {
 	hash_table_t*		table;
 	ulint			n_fields;
@@ -1082,7 +1086,8 @@ retry:
 #ifdef UNIV_SYNC_DEBUG
 	ut_ad(rw_lock_own(&(block->lock), RW_LOCK_SHARED)
 	      || rw_lock_own(&(block->lock), RW_LOCK_EX)
-	      || (block->page.buf_fix_count == 0));
+	      || block->page.buf_fix_count == 0
+	      || buf_block_get_state(block) == BUF_BLOCK_REMOVE_HASH);
 #endif /* UNIV_SYNC_DEBUG */
 
 	n_fields = block->curr_n_fields;

=== modified file 'storage/innobase/buf/buf0buf.c'
--- a/storage/innobase/buf/buf0buf.c	revid:marc.alff@stripped
+++ b/storage/innobase/buf/buf0buf.c	revid:inaam.rana@stripped
@@ -1207,6 +1207,8 @@ buf_pool_init_instance(
 
 	/* All fields are initialized by mem_zalloc(). */
 
+	buf_pool->try_LRU_scan = TRUE;
+
 	buf_pool_mutex_exit(buf_pool);
 
 	return(DB_SUCCESS);
@@ -3683,9 +3685,6 @@ buf_page_create(
 
 	ibuf_merge_or_delete_for_page(NULL, space, offset, zip_size, TRUE);
 
-	/* Flush pages from the end of the LRU list if necessary */
-	buf_flush_free_margin(buf_pool);
-
 	frame = block->frame;
 
 	memset(frame + FIL_PAGE_PREV, 0xff, 4);
@@ -4075,7 +4074,6 @@ buf_pool_invalidate_instance(
 /*=========================*/
 	buf_pool_t*	buf_pool)	/*!< in: buffer pool instance */
 {
-	ibool		freed;
 	enum buf_flush	i;
 
 	buf_pool_mutex_enter(buf_pool);
@@ -4104,21 +4102,17 @@ buf_pool_invalidate_instance(
 
 	ut_ad(buf_all_freed_instance(buf_pool));
 
-	freed = TRUE;
+	buf_pool_mutex_enter(buf_pool);
 
-	while (freed) {
-		freed = buf_LRU_search_and_free_block(buf_pool, 100);
+	while (buf_LRU_scan_and_free_block(buf_pool, TRUE)) {
 	}
 
-	buf_pool_mutex_enter(buf_pool);
-
 	ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0);
 	ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0);
 
 	buf_pool->freed_page_clock = 0;
 	buf_pool->LRU_old = NULL;
 	buf_pool->LRU_old_len = 0;
-	buf_pool->LRU_flush_ended = 0;
 
 	memset(&buf_pool->stat, 0x00, sizeof(buf_pool->stat));
 	buf_refresh_io_stats(buf_pool);
@@ -4156,6 +4150,7 @@ buf_pool_validate_instance(
 	buf_chunk_t*	chunk;
 	ulint		i;
 	ulint		n_lru_flush	= 0;
+	ulint		n_page_flush	= 0;
 	ulint		n_list_flush	= 0;
 	ulint		n_lru		= 0;
 	ulint		n_flush		= 0;
@@ -4219,9 +4214,13 @@ buf_pool_validate_instance(
 							&block->page)) {
 					case BUF_FLUSH_LRU:
 						n_lru_flush++;
+						goto assert_s_latched;
+					case BUF_FLUSH_SINGLE_PAGE:
+						n_page_flush++;
+assert_s_latched:
 						ut_a(rw_lock_is_locked(
 							     &block->lock,
-							     RW_LOCK_SHARED));
+								     RW_LOCK_SHARED));
 						break;
 					case BUF_FLUSH_LIST:
 						n_list_flush++;
@@ -4312,6 +4311,9 @@ buf_pool_validate_instance(
 				case BUF_FLUSH_LRU:
 					n_lru_flush++;
 					break;
+				case BUF_FLUSH_SINGLE_PAGE:
+					n_page_flush++;
+					break;
 				case BUF_FLUSH_LIST:
 					n_list_flush++;
 					break;
@@ -4362,6 +4364,7 @@ buf_pool_validate_instance(
 
 	ut_a(buf_pool->n_flush[BUF_FLUSH_LIST] == n_list_flush);
 	ut_a(buf_pool->n_flush[BUF_FLUSH_LRU] == n_lru_flush);
+	ut_a(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] == n_page_flush);
 
 	buf_pool_mutex_exit(buf_pool);
 
@@ -4429,7 +4432,7 @@ buf_print_instance(
 		"modified database pages %lu\n"
 		"n pending decompressions %lu\n"
 		"n pending reads %lu\n"
-		"n pending flush LRU %lu list %lu\n"
+		"n pending flush LRU %lu list %lu single page %lu\n"
 		"pages made young %lu, not young %lu\n"
 		"pages read %lu, created %lu, written %lu\n",
 		(ulong) size,
@@ -4440,6 +4443,7 @@ buf_print_instance(
 		(ulong) buf_pool->n_pend_reads,
 		(ulong) buf_pool->n_flush[BUF_FLUSH_LRU],
 		(ulong) buf_pool->n_flush[BUF_FLUSH_LIST],
+		(ulong) buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE],
 		(ulong) buf_pool->stat.n_pages_made_young,
 		(ulong) buf_pool->stat.n_pages_not_made_young,
 		(ulong) buf_pool->stat.n_pages_read,
@@ -4790,6 +4794,10 @@ buf_stats_get_pool_info(
 		 (buf_pool->n_flush[BUF_FLUSH_LIST]
 		  + buf_pool->init_flush[BUF_FLUSH_LIST]);
 
+	pool_info->n_pending_flush_single_page =
+		 (buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]
+		  + buf_pool->init_flush[BUF_FLUSH_SINGLE_PAGE]);
+
 	buf_flush_list_mutex_exit(buf_pool);
 
 	current_time = time(NULL);
@@ -4894,7 +4902,7 @@ buf_print_io_instance(
 		"Old database pages %lu\n"
 		"Modified db pages  %lu\n"
 		"Pending reads %lu\n"
-		"Pending writes: LRU %lu, flush list %lu\n",
+		"Pending writes: LRU %lu, flush list %lu single page %lu\n",
 		pool_info->pool_size,
 		pool_info->free_list_len,
 		pool_info->lru_len,
@@ -4902,7 +4910,8 @@ buf_print_io_instance(
 		pool_info->flush_list_len,
 		pool_info->n_pend_reads,
 		pool_info->n_pending_flush_lru,
-		pool_info->n_pending_flush_list);
+		pool_info->n_pending_flush_list,
+		pool_info->n_pending_flush_single_page);
 
 	fprintf(file,
 		"Pages made young %lu, not young %lu\n"
@@ -5090,6 +5099,7 @@ buf_pool_check_no_pending_io(void)
 
 		pending_io += buf_pool->n_pend_reads
 			      + buf_pool->n_flush[BUF_FLUSH_LRU]
+			      + buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]
 			      + buf_pool->n_flush[BUF_FLUSH_LIST];
 
 	}

=== modified file 'storage/innobase/buf/buf0flu.c'
--- a/storage/innobase/buf/buf0flu.c	revid:marc.alff@stripped
+++ b/storage/innobase/buf/buf0flu.c	revid:inaam.rana@stripped
@@ -61,6 +61,11 @@ Each interval is 1 second, defined by th
 srv_error_monitor_thread() calls buf_flush_stat_update(). */
 #define BUF_FLUSH_STAT_N_INTERVAL 20
 
+/** Time in milliseconds that we sleep when unable to find a slot in
+the doublewrite buffer or when we have to wait for a running batch
+to end. */
+#define TRX_DOUBLEWRITE_BATCH_POLL_DELAY	10000
+
 /** Sampled values buf_flush_stat_cur.
 Not protected by any mutex.  Updated by buf_flush_stat_update(). */
 static buf_flush_stat_t	buf_flush_stat_arr[BUF_FLUSH_STAT_N_INTERVAL];
@@ -86,10 +91,20 @@ need to protect it by a mutex. It is onl
 doing the shutdown */
 UNIV_INTERN ibool buf_page_cleaner_is_active = FALSE;
 
+/** LRU flush batch is further divided into this chunk size to
+reduce the wait time for the threads waiting for a clean block */
+#define PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE	100
+
 #ifdef UNIV_PFS_THREAD
 UNIV_INTERN mysql_pfs_key_t buf_page_cleaner_thread_key;
 #endif /* UNIV_PFS_THREAD */
 
+/** If the LRU list of a buf_pool is shorter than this then LRU eviction
+should not happen. This is because when we do LRU flushing we also put
+the blocks on the free list. If the LRU list is very small then we can
+end up thrashing. */
+#define BUF_LRU_MIN_LEN		256
+
 /* @} */
 
 #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
@@ -479,7 +494,7 @@ buf_flush_ready_for_flush(
 /*======================*/
 	buf_page_t*	bpage,	/*!< in: buffer control block, must be
 				buf_page_in_file(bpage) */
-	enum buf_flush	flush_type)/*!< in: BUF_FLUSH_LRU or BUF_FLUSH_LIST */
+	enum buf_flush	flush_type)/*!< in: type of flush */
 {
 #ifdef UNIV_DEBUG
 	buf_pool_t*	buf_pool = buf_pool_from_bpage(bpage);
@@ -487,26 +502,33 @@ buf_flush_ready_for_flush(
 #endif
 	ut_a(buf_page_in_file(bpage));
 	ut_ad(mutex_own(buf_page_get_mutex(bpage)));
-	ut_ad(flush_type == BUF_FLUSH_LRU || BUF_FLUSH_LIST);
+	ut_ad(flush_type < BUF_FLUSH_N_TYPES);
 
-	if (bpage->oldest_modification != 0
-	    && buf_page_get_io_fix(bpage) == BUF_IO_NONE) {
-		ut_ad(bpage->in_flush_list);
-
-		if (flush_type != BUF_FLUSH_LRU) {
-
-			return(TRUE);
+	if (bpage->oldest_modification == 0
+	    || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {
+		return(FALSE);
+	}
 
-		} else if (bpage->buf_fix_count == 0) {
+	ut_ad(bpage->in_flush_list);
 
-			/* If we are flushing the LRU list, to avoid deadlocks
-			we require the block not to be bufferfixed, and hence
-			not latched. */
+	switch (flush_type) {
+	case BUF_FLUSH_LIST:
+		return(TRUE);
 
-			return(TRUE);
-		}
+	case BUF_FLUSH_LRU:
+	case BUF_FLUSH_SINGLE_PAGE:
+		/* Because any thread may call single page flush, even
+		when owning locks on pages, to avoid deadlocks, we must
+		make sure that the block is not buffer fixed.
+		The same holds true for LRU flush because a user thread
+		may end up waiting for an LRU flush to end while
+		holding locks on other pages. */
+		return(bpage->buf_fix_count == 0);
+	case BUF_FLUSH_N_TYPES:
+		break;
 	}
 
+	ut_error;
 	return(FALSE);
 }
 
@@ -664,15 +686,6 @@ buf_flush_write_complete(
 	flush_type = buf_page_get_flush_type(bpage);
 	buf_pool->n_flush[flush_type]--;
 
-	if (flush_type == BUF_FLUSH_LRU) {
-		/* Put the block to the end of the LRU list to wait to be
-		moved to the free list */
-
-		buf_LRU_make_block_old(bpage);
-
-		buf_pool->LRU_flush_ended++;
-	}
-
 	/* fprintf(stderr, "n pending flush %lu\n",
 	buf_pool->n_flush[flush_type]); */
 
@@ -708,6 +721,123 @@ buf_flush_sync_datafiles(void)
 }
 
 /********************************************************************//**
+Check the LSN values on the page. */
+static
+void
+buf_flush_doublewrite_check_page_lsn(
+/*=================================*/
+	const page_t*	page)		/*!< in: page to check */
+{
+	if (memcmp(page + (FIL_PAGE_LSN + 4),
+		   page + (UNIV_PAGE_SIZE
+			   - FIL_PAGE_END_LSN_OLD_CHKSUM + 4),
+		   4)) {
+
+		ut_print_timestamp(stderr);
+		fprintf(stderr,
+			"  InnoDB: ERROR: The page to be written"
+			" seems corrupt!\n"
+			"InnoDB: The LSN fields do not match!"
+			" Noticed in the buffer pool\n");
+	}
+}
+
+/********************************************************************//**
+Asserts when a corrupt block is find during writing out data to the
+disk. */
+static
+void
+buf_flush_doublewrite_assert_on_corrupt_block(
+/*==========================================*/
+	const buf_block_t*	block)	/*!< in: block to check */
+{
+	buf_page_print(block->frame, 0);
+
+	ut_print_timestamp(stderr);
+	fprintf(stderr,
+		"  InnoDB: Apparent corruption of an"
+		" index page n:o %lu in space %lu\n"
+		"InnoDB: to be written to data file."
+		" We intentionally crash server\n"
+		"InnoDB: to prevent corrupt data"
+		" from ending up in data\n"
+		"InnoDB: files.\n",
+		(ulong) buf_block_get_page_no(block),
+		(ulong) buf_block_get_space(block));
+
+	ut_error;
+}
+
+/********************************************************************//**
+Check the LSN values on the page with which this block is associated.
+Also validate the page if the option is set. */
+static
+void
+buf_flush_doublewrite_check_block(
+/*==============================*/
+	const buf_block_t*	block)	/*!< in: block to check */
+{
+	if (buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE
+	    || block->page.zip.data) {
+		/* No simple validate for compressed pages exists. */
+		return;
+	}
+
+	buf_flush_doublewrite_check_page_lsn(block->frame);
+
+	if (!block->check_index_page_at_flush) {
+		return;
+	}
+
+	if (page_is_comp(block->frame)) {
+		if (!page_simple_validate_new(block->frame)) {
+			buf_flush_doublewrite_assert_on_corrupt_block(block);
+		}
+	} else if (!page_simple_validate_old(block->frame)) {
+
+		buf_flush_doublewrite_assert_on_corrupt_block(block);
+	}
+}
+
+/********************************************************************//**
+Writes a page that has already been written to the doublewrite buffer
+to the datafile. It is the job of the caller to sync the datafile. */
+static
+void
+buf_flush_write_block_to_datafile(
+/*==============================*/
+	const buf_block_t*	block)	/*!< in: block to write */
+{
+	ut_a(block);
+	ut_a(buf_page_in_file(&block->page));
+
+	if (UNIV_LIKELY_NULL(block->page.zip.data)) {
+		fil_io(OS_FILE_WRITE | OS_AIO_SIMULATED_WAKE_LATER,
+		       FALSE, buf_page_get_space(&block->page),
+		       buf_page_get_zip_size(&block->page),
+		       buf_page_get_page_no(&block->page), 0,
+		       buf_page_get_zip_size(&block->page),
+		       (void*)block->page.zip.data,
+		       (void*)block);
+
+		goto exit;
+	}
+
+	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
+	buf_flush_doublewrite_check_page_lsn(block->frame);
+
+	fil_io(OS_FILE_WRITE | OS_AIO_SIMULATED_WAKE_LATER,
+	       FALSE, buf_block_get_space(block), 0,
+	       buf_block_get_page_no(block), 0, UNIV_PAGE_SIZE,
+	       (void*)block->frame, (void*)block);
+
+exit:
+	/* Increment the counter of I/O operations used
+	for selecting LRU policy. */
+	buf_LRU_stat_inc_io();
+}
+
+/********************************************************************//**
 Flushes possible buffered writes from the doublewrite memory buffer to disk,
 and also wakes up the aio thread if simulated aio is used. It is very
 important to call this function after a batch of writes has been posted,
@@ -729,6 +859,7 @@ buf_flush_buffered_writes(void)
 		return;
 	}
 
+try_again:
 	mutex_enter(&(trx_doublewrite->mutex));
 
 	/* Write first to doublewrite buffer blocks. We use synchronous
@@ -742,7 +873,32 @@ buf_flush_buffered_writes(void)
 		return;
 	}
 
-	for (i = 0; i < trx_doublewrite->first_free; i++) {
+	if (trx_doublewrite->batch_running) {
+		mutex_exit(&trx_doublewrite->mutex);
+
+		/* Another thread is running the batch right now. Wait
+		for it to finish. */
+		os_thread_sleep(TRX_DOUBLEWRITE_BATCH_POLL_DELAY);
+		goto try_again;
+	}
+
+	ut_a(!trx_doublewrite->batch_running);
+
+	/* Disallow anyone else to post to doublewrite buffer or to
+	start another batch of flushing. */
+	trx_doublewrite->batch_running = TRUE;
+
+	/* Now safe to release the mutex. Note that though no other
+	thread is allowed to post to the doublewrite batch flushing
+	but any threads working on single page flushes are allowed
+	to proceed. */
+	mutex_exit(&trx_doublewrite->mutex);
+
+	write_buf = trx_doublewrite->write_buf;
+
+	for (len2 = 0, i = 0;
+	     i < trx_doublewrite->first_free;
+	     len2 += UNIV_PAGE_SIZE, i++) {
 
 		const buf_block_t*	block;
 
@@ -750,130 +906,50 @@ buf_flush_buffered_writes(void)
 
 		if (buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE
 		    || block->page.zip.data) {
-			/* No simple validate for compressed pages exists. */
+			/* No simple validate for compressed
+			pages exists. */
 			continue;
 		}
 
-		if (UNIV_UNLIKELY
-		    (memcmp(block->frame + (FIL_PAGE_LSN + 4),
-			    block->frame + (UNIV_PAGE_SIZE
-					    - FIL_PAGE_END_LSN_OLD_CHKSUM + 4),
-			    4))) {
-			ut_print_timestamp(stderr);
-			fprintf(stderr,
-				"  InnoDB: ERROR: The page to be written"
-				" seems corrupt!\n"
-				"InnoDB: The lsn fields do not match!"
-				" Noticed in the buffer pool\n"
-				"InnoDB: before posting to the"
-				" doublewrite buffer.\n");
-		}
-
-		if (!block->check_index_page_at_flush) {
-		} else if (page_is_comp(block->frame)) {
-			if (UNIV_UNLIKELY
-			    (!page_simple_validate_new(block->frame))) {
-corrupted_page:
-				buf_page_print(block->frame, 0);
-
-				ut_print_timestamp(stderr);
-				fprintf(stderr,
-					"  InnoDB: Apparent corruption of an"
-					" index page n:o %lu in space %lu\n"
-					"InnoDB: to be written to data file."
-					" We intentionally crash server\n"
-					"InnoDB: to prevent corrupt data"
-					" from ending up in data\n"
-					"InnoDB: files.\n",
-					(ulong) buf_block_get_page_no(block),
-					(ulong) buf_block_get_space(block));
-
-				ut_error;
-			}
-		} else if (UNIV_UNLIKELY
-			   (!page_simple_validate_old(block->frame))) {
+		/* Check that the actual page in the buffer pool is
+		not corrupt and the LSN values are sane. */
+		buf_flush_doublewrite_check_block(block);
 
-			goto corrupted_page;
-		}
+		/* Check that the page as written to the doublewrite
+		buffer has sane LSN values. */
+		buf_flush_doublewrite_check_page_lsn(write_buf + len2);
 	}
 
-	/* increment the doublewrite flushed pages counter */
-	srv_dblwr_pages_written+= trx_doublewrite->first_free;
-	srv_dblwr_writes++;
-
+	/* Write out the first block of the doublewrite buffer */
 	len = ut_min(TRX_SYS_DOUBLEWRITE_BLOCK_SIZE,
 		     trx_doublewrite->first_free) * UNIV_PAGE_SIZE;
 
-	write_buf = trx_doublewrite->write_buf;
-	i = 0;
-
 	fil_io(OS_FILE_WRITE, TRUE, TRX_SYS_SPACE, 0,
 	       trx_doublewrite->block1, 0, len,
 	       (void*) write_buf, NULL);
 
-	for (len2 = 0; len2 + UNIV_PAGE_SIZE <= len;
-	     len2 += UNIV_PAGE_SIZE, i++) {
-		const buf_block_t* block = (buf_block_t*)
-			trx_doublewrite->buf_block_arr[i];
-
-		if (UNIV_LIKELY(!block->page.zip.data)
-		    && UNIV_LIKELY(buf_block_get_state(block)
-				   == BUF_BLOCK_FILE_PAGE)
-		    && UNIV_UNLIKELY
-		    (memcmp(write_buf + len2 + (FIL_PAGE_LSN + 4),
-			    write_buf + len2
-			    + (UNIV_PAGE_SIZE
-			       - FIL_PAGE_END_LSN_OLD_CHKSUM + 4), 4))) {
-			ut_print_timestamp(stderr);
-			fprintf(stderr,
-				"  InnoDB: ERROR: The page to be written"
-				" seems corrupt!\n"
-				"InnoDB: The lsn fields do not match!"
-				" Noticed in the doublewrite block1.\n");
-		}
-	}
-
 	if (trx_doublewrite->first_free <= TRX_SYS_DOUBLEWRITE_BLOCK_SIZE) {
+		/* No unwritten pages in the second block. */
 		goto flush;
 	}
 
+	/* Write out the second block of the doublewrite buffer. */
 	len = (trx_doublewrite->first_free - TRX_SYS_DOUBLEWRITE_BLOCK_SIZE)
-		* UNIV_PAGE_SIZE;
+	       * UNIV_PAGE_SIZE;
 
 	write_buf = trx_doublewrite->write_buf
-		+ TRX_SYS_DOUBLEWRITE_BLOCK_SIZE * UNIV_PAGE_SIZE;
-	ut_ad(i == TRX_SYS_DOUBLEWRITE_BLOCK_SIZE);
+		    + TRX_SYS_DOUBLEWRITE_BLOCK_SIZE * UNIV_PAGE_SIZE;
 
 	fil_io(OS_FILE_WRITE, TRUE, TRX_SYS_SPACE, 0,
 	       trx_doublewrite->block2, 0, len,
 	       (void*) write_buf, NULL);
 
-	for (len2 = 0; len2 + UNIV_PAGE_SIZE <= len;
-	     len2 += UNIV_PAGE_SIZE, i++) {
-		const buf_block_t* block = (buf_block_t*)
-			trx_doublewrite->buf_block_arr[i];
-
-		if (UNIV_LIKELY(!block->page.zip.data)
-		    && UNIV_LIKELY(buf_block_get_state(block)
-				   == BUF_BLOCK_FILE_PAGE)
-		    && UNIV_UNLIKELY
-		    (memcmp(write_buf + len2 + (FIL_PAGE_LSN + 4),
-			    write_buf + len2
-			    + (UNIV_PAGE_SIZE
-			       - FIL_PAGE_END_LSN_OLD_CHKSUM + 4), 4))) {
-			ut_print_timestamp(stderr);
-			fprintf(stderr,
-				"  InnoDB: ERROR: The page to be"
-				" written seems corrupt!\n"
-				"InnoDB: The lsn fields do not match!"
-				" Noticed in"
-				" the doublewrite block2.\n");
-		}
-	}
-
 flush:
-	/* Now flush the doublewrite buffer data to disk */
+	/* increment the doublewrite flushed pages counter */
+	srv_dblwr_pages_written += trx_doublewrite->first_free;
+	srv_dblwr_writes++;
 
+	/* Now flush the doublewrite buffer data to disk */
 	fil_flush(TRX_SYS_SPACE);
 
 	/* We know that the writes have been flushed to disk now
@@ -884,60 +960,17 @@ flush:
 		const buf_block_t* block = (buf_block_t*)
 			trx_doublewrite->buf_block_arr[i];
 
-		ut_a(buf_page_in_file(&block->page));
-		if (UNIV_LIKELY_NULL(block->page.zip.data)) {
-			fil_io(OS_FILE_WRITE | OS_AIO_SIMULATED_WAKE_LATER,
-			       FALSE, buf_page_get_space(&block->page),
-			       buf_page_get_zip_size(&block->page),
-			       buf_page_get_page_no(&block->page), 0,
-			       buf_page_get_zip_size(&block->page),
-			       (void*)block->page.zip.data,
-			       (void*)block);
-
-			/* Increment the counter of I/O operations used
-			for selecting LRU policy. */
-			buf_LRU_stat_inc_io();
-
-			continue;
-		}
-
-		ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
-
-		if (UNIV_UNLIKELY(memcmp(block->frame + (FIL_PAGE_LSN + 4),
-					 block->frame
-					 + (UNIV_PAGE_SIZE
-					    - FIL_PAGE_END_LSN_OLD_CHKSUM + 4),
-					 4))) {
-			ut_print_timestamp(stderr);
-			fprintf(stderr,
-				"  InnoDB: ERROR: The page to be written"
-				" seems corrupt!\n"
-				"InnoDB: The lsn fields do not match!"
-				" Noticed in the buffer pool\n"
-				"InnoDB: after posting and flushing"
-				" the doublewrite buffer.\n"
-				"InnoDB: Page buf fix count %lu,"
-				" io fix %lu, state %lu\n",
-				(ulong)block->page.buf_fix_count,
-				(ulong)buf_block_get_io_fix(block),
-				(ulong)buf_block_get_state(block));
-		}
-
-		fil_io(OS_FILE_WRITE | OS_AIO_SIMULATED_WAKE_LATER,
-		       FALSE, buf_block_get_space(block), 0,
-		       buf_block_get_page_no(block), 0, UNIV_PAGE_SIZE,
-		       (void*)block->frame, (void*)block);
-
-		/* Increment the counter of I/O operations used
-		for selecting LRU policy. */
-		buf_LRU_stat_inc_io();
+		buf_flush_write_block_to_datafile(block);
 	}
 
 	/* Sync the writes to the disk. */
 	buf_flush_sync_datafiles();
 
+	mutex_enter(&trx_doublewrite->mutex);
+
 	/* We can now reuse the doublewrite memory buffer: */
 	trx_doublewrite->first_free = 0;
+	trx_doublewrite->batch_running = FALSE;
 
 	mutex_exit(&(trx_doublewrite->mutex));
 }
@@ -953,13 +986,28 @@ buf_flush_post_to_doublewrite_buf(
 	buf_page_t*	bpage)	/*!< in: buffer block to write */
 {
 	ulint	zip_size;
+
+	ut_a(buf_page_in_file(bpage));
+
 try_again:
 	mutex_enter(&(trx_doublewrite->mutex));
 
-	ut_a(buf_page_in_file(bpage));
+	ut_a(trx_doublewrite->first_free <= srv_doublewrite_batch_size);
+
+	if (trx_doublewrite->batch_running) {
+		mutex_exit(&trx_doublewrite->mutex);
+
+		/* This is not nearly as bad as it looks. Only the
+		page_cleaner thread does background flushing in
+		batches, therefore it is unlikely to be a contention
+		point. The only exception is when a user thread is
+		forced to do a flush batch because of a sync
+		checkpoint. */
+		os_thread_sleep(TRX_DOUBLEWRITE_BATCH_POLL_DELAY);
+		goto try_again;
+	}
 
-	if (trx_doublewrite->first_free
-	    >= 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE) {
+	if (trx_doublewrite->first_free == srv_doublewrite_batch_size) {
 		mutex_exit(&(trx_doublewrite->mutex));
 
 		buf_flush_buffered_writes();
@@ -992,8 +1040,7 @@ try_again:
 
 	trx_doublewrite->first_free++;
 
-	if (trx_doublewrite->first_free
-	    >= 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE) {
+	if (trx_doublewrite->first_free == srv_doublewrite_batch_size) {
 		mutex_exit(&(trx_doublewrite->mutex));
 
 		buf_flush_buffered_writes();
@@ -1003,6 +1050,140 @@ try_again:
 
 	mutex_exit(&(trx_doublewrite->mutex));
 }
+
+/********************************************************************//**
+Writes a page to the doublewrite buffer on disk, syncs it, then writes
+the page to the datafile and syncs the datafile. This function is used
+for single page flushes. If all the buffers allocated for single page
+flushes in the doublewrite buffer are in use we wait here for one to
+become free. We are guaranteed that a slot will become free because any
+thread that is using a slot must also release the slot before leaving
+this function. */
+static
+void
+buf_flush_write_to_dblwr_and_datafile(
+/*==================================*/
+	buf_page_t*	bpage)	/*!< in: buffer block to write */
+{
+	ulint		n_slots;
+	ulint		size;
+	ulint		zip_size;
+	ulint		offset;
+	ulint		i;
+
+	ut_a(buf_page_in_file(bpage));
+	ut_a(srv_use_doublewrite_buf);
+	ut_a(trx_doublewrite != NULL);
+
+	/* The slots reserved for single page flushes run from
+	srv_doublewrite_batch_size to the end of the doublewrite
+	buffer. */
+	size = 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE;
+	ut_a(size > srv_doublewrite_batch_size);
+	n_slots = size - srv_doublewrite_batch_size;
+
+	if (buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE) {
+
+		/* Check that the actual page in the buffer pool is
+		not corrupt and the LSN values are sane. */
+		buf_flush_doublewrite_check_block((buf_block_t*) bpage);
+
+		/* Check that the page as written to the doublewrite
+		buffer has sane LSN values. */
+		buf_flush_doublewrite_check_page_lsn(
+			((buf_block_t*) bpage)->frame);
+	}
+
+retry:
+	mutex_enter(&trx_doublewrite->mutex);
+	if (trx_doublewrite->n_reserved == n_slots) {
+
+		mutex_exit(&trx_doublewrite->mutex);
+		/* All slots are reserved. Since the processing of a
+		slot involves two IOs, a sleep of 10ms before retrying
+		should be enough. */
+		os_thread_sleep(TRX_DOUBLEWRITE_BATCH_POLL_DELAY);
+		goto retry;
+	}
+
+	for (i = srv_doublewrite_batch_size; i < size; ++i) {
+
+		if (!trx_doublewrite->in_use[i]) {
+			break;
+		}
+	}
+
+	/* We are guaranteed to find a slot. */
+	ut_a(i < size);
+	trx_doublewrite->in_use[i] = TRUE;
+	trx_doublewrite->n_reserved++;
+	trx_doublewrite->buf_block_arr[i] = bpage;
+	mutex_exit(&trx_doublewrite->mutex);
+
+	/* Let's see whether we are going to write to the first or
+	the second block of the doublewrite buffer. */
+	if (i < TRX_SYS_DOUBLEWRITE_BLOCK_SIZE) {
+		offset = trx_doublewrite->block1 + i;
+	} else {
+		offset = trx_doublewrite->block2 + i
+			 - TRX_SYS_DOUBLEWRITE_BLOCK_SIZE;
+	}
+
+	/* We deal with compressed and uncompressed pages a little
+	differently here. In the case of an uncompressed page we can
+	write the block directly to the allocated slot in the
+	doublewrite buffer in the system tablespace and, after syncing
+	the system tablespace, proceed to write the page to the
+	datafile.
+	In the case of a compressed page we first memcpy the block to
+	the in-memory buffer of the doublewrite before writing it,
+	because we want to pad the remaining bytes of the doublewrite
+	page with zeros. */
+
+	zip_size = buf_page_get_zip_size(bpage);
+	if (zip_size) {
+		memcpy(trx_doublewrite->write_buf + UNIV_PAGE_SIZE * i,
+		       bpage->zip.data, zip_size);
+		memset(trx_doublewrite->write_buf + UNIV_PAGE_SIZE * i
+		       + zip_size, 0, UNIV_PAGE_SIZE - zip_size);
+
+		fil_io(OS_FILE_WRITE, TRUE, TRX_SYS_SPACE, 0,
+		       offset, 0, UNIV_PAGE_SIZE,
+		       (void*) (trx_doublewrite->write_buf
+				+ UNIV_PAGE_SIZE * i), NULL);
+	} else {
+		/* It is a regular page. Write it directly to the
+		doublewrite buffer */
+		fil_io(OS_FILE_WRITE, TRUE, TRX_SYS_SPACE, 0,
+		       offset, 0, UNIV_PAGE_SIZE,
+		       (void*) ((buf_block_t*) bpage)->frame,
+		       NULL);
+	}
+
+	/* Now flush the doublewrite buffer data to disk */
+	fil_flush(TRX_SYS_SPACE);
+
+	/* We know that the write has been flushed to disk now
+	and during recovery we will find it in the doublewrite buffer
+	blocks. Next do the write to the intended position. */
+	buf_flush_write_block_to_datafile((buf_block_t*) bpage);
+
+	/* Sync the writes to the disk. */
+	buf_flush_sync_datafiles();
+
+	mutex_enter(&trx_doublewrite->mutex);
+
+	trx_doublewrite->n_reserved--;
+	trx_doublewrite->buf_block_arr[i] = NULL;
+	trx_doublewrite->in_use[i] = FALSE;
+
+	/* increment the doublewrite flushed pages counter; this
+	path flushes exactly one page */
+	srv_dblwr_pages_written++;
+	srv_dblwr_writes++;
+
+	mutex_exit(&(trx_doublewrite->mutex));
+}
 #endif /* !UNIV_HOTBACKUP */
 
 /********************************************************************//**
@@ -1092,7 +1273,8 @@ static
 void
 buf_flush_write_block_low(
 /*======================*/
-	buf_page_t*	bpage)	/*!< in: buffer block to write */
+	buf_page_t*	bpage,		/*!< in: buffer block to write */
+	enum buf_flush	flush_type)	/*!< in: type of flush */
 {
 	ulint	zip_size	= buf_page_get_zip_size(bpage);
 	page_t*	frame		= NULL;
@@ -1174,88 +1356,13 @@ buf_flush_write_block_low(
 		       buf_page_get_page_no(bpage), 0,
 		       zip_size ? zip_size : UNIV_PAGE_SIZE,
 		       frame, bpage);
+	} else if (flush_type == BUF_FLUSH_SINGLE_PAGE) {
+		buf_flush_write_to_dblwr_and_datafile(bpage);
 	} else {
 		buf_flush_post_to_doublewrite_buf(bpage);
 	}
 }
 
-# if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
-/********************************************************************//**
-Writes a flushable page asynchronously from the buffer pool to a file.
-NOTE: buf_pool->mutex and block->mutex must be held upon entering this
-function, and they will be released by this function after flushing.
-This is loosely based on buf_flush_batch() and buf_flush_page().
-@return TRUE if the page was flushed and the mutexes released */
-UNIV_INTERN
-ibool
-buf_flush_page_try(
-/*===============*/
-	buf_pool_t*	buf_pool,	/*!< in/out: buffer pool instance */
-	buf_block_t*	block)		/*!< in/out: buffer control block */
-{
-	ut_ad(buf_pool_mutex_own(buf_pool));
-	ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
-	ut_ad(mutex_own(&block->mutex));
-
-	if (!buf_flush_ready_for_flush(&block->page, BUF_FLUSH_LRU)) {
-		return(FALSE);
-	}
-
-	if (buf_pool->n_flush[BUF_FLUSH_LRU] > 0
-	    || buf_pool->init_flush[BUF_FLUSH_LRU]) {
-		/* There is already a flush batch of the same type running */
-		return(FALSE);
-	}
-
-	buf_pool->init_flush[BUF_FLUSH_LRU] = TRUE;
-
-	buf_page_set_io_fix(&block->page, BUF_IO_WRITE);
-
-	buf_page_set_flush_type(&block->page, BUF_FLUSH_LRU);
-
-	if (buf_pool->n_flush[BUF_FLUSH_LRU]++ == 0) {
-
-		os_event_reset(buf_pool->no_flush[BUF_FLUSH_LRU]);
-	}
-
-	/* VERY IMPORTANT:
-	Because any thread may call the LRU flush, even when owning
-	locks on pages, to avoid deadlocks, we must make sure that the
-	s-lock is acquired on the page without waiting: this is
-	accomplished because buf_flush_ready_for_flush() must hold,
-	and that requires the page not to be bufferfixed. */
-
-	rw_lock_s_lock_gen(&block->lock, BUF_IO_WRITE);
-
-	/* Note that the s-latch is acquired before releasing the
-	buf_pool mutex: this ensures that the latch is acquired
-	immediately. */
-
-	mutex_exit(&block->mutex);
-	buf_pool_mutex_exit(buf_pool);
-
-	/* Even though block is not protected by any mutex at this
-	point, it is safe to access block, because it is io_fixed and
-	oldest_modification != 0.  Thus, it cannot be relocated in the
-	buffer pool or removed from flush_list or LRU_list. */
-
-	buf_flush_write_block_low(&block->page);
-
-	buf_pool_mutex_enter(buf_pool);
-	buf_pool->init_flush[BUF_FLUSH_LRU] = FALSE;
-
-	if (buf_pool->n_flush[BUF_FLUSH_LRU] == 0) {
-		/* The running flush batch has ended */
-		os_event_set(buf_pool->no_flush[BUF_FLUSH_LRU]);
-	}
-
-	buf_pool_mutex_exit(buf_pool);
-	buf_flush_buffered_writes();
-
-	return(TRUE);
-}
-# endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
-
 /********************************************************************//**
 Writes a flushable page asynchronously from the buffer pool to a file.
 NOTE: in simulated aio we must call
@@ -1269,13 +1376,12 @@ buf_flush_page(
 /*===========*/
 	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
 	buf_page_t*	bpage,		/*!< in: buffer control block */
-	enum buf_flush	flush_type)	/*!< in: BUF_FLUSH_LRU
-					or BUF_FLUSH_LIST */
+	enum buf_flush	flush_type)	/*!< in: type of flush */
 {
 	mutex_t*	block_mutex;
 	ibool		is_uncompressed;
 
-	ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST);
+	ut_ad(flush_type < BUF_FLUSH_N_TYPES);
 	ut_ad(buf_pool_mutex_own(buf_pool));
 	ut_ad(buf_page_in_file(bpage));
 
@@ -1311,8 +1417,6 @@ buf_flush_page(
 					   BUF_IO_WRITE);
 		}
 
-		MONITOR_INC(MONITOR_BUF_FLUSH_LIST);
-
 		mutex_exit(block_mutex);
 		buf_pool_mutex_exit(buf_pool);
 
@@ -1334,20 +1438,23 @@ buf_flush_page(
 		break;
 
 	case BUF_FLUSH_LRU:
+	case BUF_FLUSH_SINGLE_PAGE:
 		/* VERY IMPORTANT:
-		Because any thread may call the LRU flush, even when owning
-		locks on pages, to avoid deadlocks, we must make sure that the
-		s-lock is acquired on the page without waiting: this is
-		accomplished because buf_flush_ready_for_flush() must hold,
-		and that requires the page not to be bufferfixed. */
+		Because any thread may call single page flush, even when
+		owning locks on pages, to avoid deadlocks, we must make
+		sure that the s-lock is acquired on the page without
+		waiting: this is accomplished because
+		buf_flush_ready_for_flush() must hold, and that requires
+		the page not to be bufferfixed.
+		The same holds true for LRU flush because a user thread
+		may end up waiting for an LRU flush to end while
+		holding locks on other pages. */
 
 		if (is_uncompressed) {
 			rw_lock_s_lock_gen(&((buf_block_t*) bpage)->lock,
 					   BUF_IO_WRITE);
 		}
 
-		MONITOR_INC(MONITOR_BUF_FLUSH_LRU);
-
 		/* Note that the s-latch is acquired before releasing the
 		buf_pool mutex: this ensures that the latch is acquired
 		immediately. */
@@ -1372,9 +1479,37 @@ buf_flush_page(
 			flush_type, bpage->space, bpage->offset);
 	}
 #endif /* UNIV_DEBUG */
-	buf_flush_write_block_low(bpage);
+	buf_flush_write_block_low(bpage, flush_type);
 }
 
+# if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
+/********************************************************************//**
+Writes a flushable page asynchronously from the buffer pool to a file.
+NOTE: buf_pool->mutex and block->mutex must be held upon entering this
+function, and they will be released by this function after flushing.
+This is loosely based on buf_flush_batch() and buf_flush_page().
+@return TRUE if the page was flushed and the mutexes released */
+UNIV_INTERN
+ibool
+buf_flush_page_try(
+/*===============*/
+	buf_pool_t*	buf_pool,	/*!< in/out: buffer pool instance */
+	buf_block_t*	block)		/*!< in/out: buffer control block */
+{
+	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
+	ut_ad(mutex_own(&block->mutex));
+
+	if (!buf_flush_ready_for_flush(&block->page, BUF_FLUSH_SINGLE_PAGE)) {
+		return(FALSE);
+	}
+
+	/* The following call will release the buffer pool and
+	block mutex. */
+	buf_flush_page(buf_pool, &block->page, BUF_FLUSH_SINGLE_PAGE);
+	return(TRUE);
+}
+# endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
 /***********************************************************//**
 Flushes to disk all flushable pages within the flush area.
 @return	number of pages flushed */
@@ -1399,10 +1534,10 @@ buf_flush_try_neighbors(
 
 	ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST);
 
-	if (UT_LIST_GET_LEN(buf_pool->LRU) < BUF_LRU_OLD_MIN_LEN) {
-		/* If there is little space, it is better not to flush
-		any block except from the end of the LRU list */
-
+	if (UT_LIST_GET_LEN(buf_pool->LRU) < BUF_LRU_OLD_MIN_LEN
+	    || !srv_flush_neighbors) {
+		/* If there is little space or neighbor flushing is
+		not enabled then just flush the victim. */
 		low = offset;
 		high = offset + 1;
 	} else {
@@ -1493,6 +1628,14 @@ buf_flush_try_neighbors(
 		buf_pool_mutex_exit(buf_pool);
 	}
 
+	if (count > 0) {
+		MONITOR_INC_VALUE_CUMULATIVE(
+					MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE,
+					MONITOR_FLUSH_NEIGHBOR_COUNT,
+					MONITOR_FLUSH_NEIGHBOR_PAGES,
+					(count - 1));
+	}
+
 	return(count);
 }
 
@@ -1500,7 +1643,7 @@ buf_flush_try_neighbors(
 Check if the block is modified and ready for flushing. If the block
 is ready to flush then flush the page and try to flush its neighbors.
 
-@return	TRUE if buf_pool mutex was not released during this function.
+@return	TRUE if buf_pool mutex was released during this function.
 This does not guarantee that any pages were written.
 The number of pages written is added to the count. */
 static
@@ -1566,36 +1709,77 @@ buf_flush_page_and_try_neighbors(
 
 /*******************************************************************//**
 This utility flushes dirty blocks from the end of the LRU list.
-In the case of an LRU flush the calling thread may own latches to
-pages: to avoid deadlocks, this function must be written so that it
-cannot end up waiting for these latches!
+The calling thread is not allowed to own any latches on pages!
+It attempts to make 'max' blocks available in the free list. Note that
+it is a best effort attempt and it is not guaranteed that after a call
+to this function there will be 'max' blocks in the free list.
 @return number of blocks for which the write request was queued. */
 static
 ulint
 buf_flush_LRU_list_batch(
 /*=====================*/
 	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
-	ulint		max)		/*!< in: max of blocks to flush */
+	ulint		max)		/*!< in: desired number of
+					blocks in the free_list */
 {
 	buf_page_t*	bpage;
+	ulint		scanned = 0;
 	ulint		count = 0;
+	ulint		free_len = UT_LIST_GET_LEN(buf_pool->free);
+	ulint		lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
 
 	ut_ad(buf_pool_mutex_own(buf_pool));
 
-	do {
-		/* Start from the end of the list looking for a
-		suitable block to be flushed. */
-		bpage = UT_LIST_GET_LAST(buf_pool->LRU);
+	bpage = UT_LIST_GET_LAST(buf_pool->LRU);
+	while (bpage != NULL && count < max
+	       && free_len < srv_LRU_scan_depth
+	       && lru_len > BUF_LRU_MIN_LEN) {
 
-		/* Iterate backwards over the flush list till we find
-		a page that isn't ready for flushing. */
-		while (bpage != NULL
-		       && !buf_flush_page_and_try_neighbors(
-				bpage, BUF_FLUSH_LRU, max, &count)) {
+		mutex_t* block_mutex = buf_page_get_mutex(bpage);
+		ibool	 evict;
+
+		mutex_enter(block_mutex);
+		evict = buf_flush_ready_for_replace(bpage);
+		mutex_exit(block_mutex);
 
+		++scanned;
+
+		/* If the block is ready to be replaced we try to
+		free it, i.e.: put it on the free list.
+		Otherwise we try to flush the block and its
+		neighbors; in that case it will be put on the
+		free list in the next pass. We do the extra work
+		of putting blocks on the free list instead of
+		just flushing them because after every flush
+		we have to restart the scan from the tail of
+		the LRU list, and if we don't clear the tail
+		of the flushed pages the scan becomes
+		O(n*n). */
+		if (evict) {
+
+			ibool	evict_zip;
+
+			evict_zip = !buf_LRU_evict_from_unzip_LRU(buf_pool);
+
+			/* This will potentially release the
+			buf_pool->mutex. */
+			buf_LRU_free_block(bpage, evict_zip);
+			bpage = UT_LIST_GET_LAST(buf_pool->LRU);
+
+		} else if (buf_flush_page_and_try_neighbors(
+				bpage,
+				BUF_FLUSH_LRU, max, &count)) {
+
+			/* buf_pool->mutex was released.
+			Restart the scan. */
+			bpage = UT_LIST_GET_LAST(buf_pool->LRU);
+		} else {
 			bpage = UT_LIST_GET_PREV(LRU, bpage);
 		}
-	} while (bpage != NULL && count < max);
+
+		free_len = UT_LIST_GET_LEN(buf_pool->free);
+		lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
+	}
 
 	/* We keep track of all flushes happening as part of LRU
 	flush. When estimating the desired rate at which flush_list
@@ -1604,6 +1788,13 @@ buf_flush_LRU_list_batch(
 
 	ut_ad(buf_pool_mutex_own(buf_pool));
 
+	if (scanned) {
+		MONITOR_INC_VALUE_CUMULATIVE(MONITOR_LRU_BATCH_SCANNED,
+				MONITOR_LRU_BATCH_SCANNED_NUM_CALL,
+				MONITOR_LRU_BATCH_SCANNED_PER_CALL,
+				scanned);
+	}
+
 	return(count);
 }
 
@@ -1857,6 +2048,8 @@ buf_flush_end(
 
 	buf_pool->init_flush[flush_type] = FALSE;
 
+	buf_pool->try_LRU_scan = TRUE;
+
 	if (buf_pool->n_flush[flush_type] == 0) {
 
 		/* The running flush batch has ended */
@@ -1899,17 +2092,17 @@ buf_flush_wait_batch_end(
 }
 
 /*******************************************************************//**
-This utility flushes dirty blocks from the end of the LRU list.
-NOTE: The calling thread may own latches to pages: to avoid deadlocks,
-this function must be written so that it cannot end up waiting for these
-latches!
+This utility flushes dirty blocks from the end of the LRU list and also
+puts replaceable clean pages from the end of the LRU list to the free
+list.
+NOTE: The calling thread is not allowed to own any latches on pages!
 @return number of blocks for which the write request was queued;
 ULINT_UNDEFINED if there was a flush of the same type already running */
-UNIV_INTERN
+static
 ulint
 buf_flush_LRU(
 /*==========*/
-	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
+	buf_pool_t*	buf_pool,	/*!< in/out: buffer pool instance */
 	ulint		min_n)		/*!< in: wished minimum number of blocks
 					flushed (it is not guaranteed that the
 					actual number is that big, though) */
@@ -1993,121 +2186,109 @@ buf_flush_list(
 
 		total_page_count += page_count;
 
-		MONITOR_INC_VALUE_CUMULATIVE(
+		if (page_count) {
+			MONITOR_INC_VALUE_CUMULATIVE(
 				MONITOR_FLUSH_BATCH_TOTAL_PAGE,
 				MONITOR_FLUSH_BATCH_COUNT,
 				MONITOR_FLUSH_BATCH_PAGES,
 				page_count);
+		}
 	}
 
 	return(lsn_limit != LSN_MAX && skipped
 	       ? ULINT_UNDEFINED : total_page_count);
 }
- 
+
 /******************************************************************//**
-Gives a recommendation of how many blocks should be flushed to establish
-a big enough margin of replaceable blocks near the end of the LRU list
-and in the free list.
-@return number of blocks which should be flushed from the end of the
-LRU list */
-static
-ulint
-buf_flush_LRU_recommendation(
-/*=========================*/
-	buf_pool_t*	buf_pool)		/*!< in: Buffer pool instance */
+This function picks up a single dirty page from the tail of the LRU
+list, flushes it, removes it from page_hash and LRU list and puts
+it on the free list. It is called from user threads when they are
+unable to find a replaceable page at the tail of the LRU list i.e.:
+when the background LRU flushing in the page_cleaner thread is not
+fast enough to keep pace with the workload.
+@return TRUE if success. */
+UNIV_INTERN
+ibool
+buf_flush_single_page_from_LRU(
+/*===========================*/
+	buf_pool_t*	buf_pool)	/*!< in/out: buffer pool instance */
 {
+	ulint		scanned;
 	buf_page_t*	bpage;
-	ulint		n_replaceable;
-	ulint		distance	= 0;
+	mutex_t*	block_mutex;
+	ibool		freed;
+	ibool		evict_zip;
 
 	buf_pool_mutex_enter(buf_pool);
 
-	n_replaceable = UT_LIST_GET_LEN(buf_pool->free);
-
-	bpage = UT_LIST_GET_LAST(buf_pool->LRU);
-
-	while ((bpage != NULL)
-	       && (n_replaceable < BUF_FLUSH_FREE_BLOCK_MARGIN(buf_pool)
-		   + BUF_FLUSH_EXTRA_MARGIN(buf_pool))
-	       && (distance < BUF_LRU_FREE_SEARCH_LEN(buf_pool))) {
-
-		mutex_t* block_mutex = buf_page_get_mutex(bpage);
+	for (bpage = UT_LIST_GET_LAST(buf_pool->LRU), scanned = 1;
+	     bpage != NULL;
+	     bpage = UT_LIST_GET_PREV(LRU, bpage), ++scanned) {
 
+		block_mutex = buf_page_get_mutex(bpage);
 		mutex_enter(block_mutex);
-
-		if (buf_flush_ready_for_replace(bpage)) {
-			n_replaceable++;
+		if (buf_flush_ready_for_flush(bpage,
+					      BUF_FLUSH_SINGLE_PAGE)) {
+			/* buf_flush_page() will release the block
+			mutex */
+			break;
 		}
-
 		mutex_exit(block_mutex);
-
-		distance++;
-
-		bpage = UT_LIST_GET_PREV(LRU, bpage);
 	}
 
-	buf_pool_mutex_exit(buf_pool);
-
-	if (n_replaceable >= BUF_FLUSH_FREE_BLOCK_MARGIN(buf_pool)) {
+	MONITOR_INC_VALUE_CUMULATIVE(
+		MONITOR_LRU_SINGLE_FLUSH_SCANNED,
+		MONITOR_LRU_SINGLE_FLUSH_SCANNED_NUM_CALL,
+		MONITOR_LRU_SINGLE_FLUSH_SCANNED_PER_CALL,
+		scanned);
 
-		return(0);
+	if (!bpage) {
+		/* Can't find a single flushable page. */
+		buf_pool_mutex_exit(buf_pool);
+		return(FALSE);
 	}
 
-	return(BUF_FLUSH_FREE_BLOCK_MARGIN(buf_pool)
-	       + BUF_FLUSH_EXTRA_MARGIN(buf_pool)
-	       - n_replaceable);
-}
-
-/*********************************************************************//**
-Flushes pages from the end of the LRU list if there is too small a margin
-of replaceable pages there or in the free list. VERY IMPORTANT: this function
-is called also by threads which have locks on pages. To avoid deadlocks, we
-flush only pages such that the s-lock required for flushing can be acquired
-immediately, without waiting. */
-UNIV_INTERN
-void
-buf_flush_free_margin(
-/*==================*/
-	buf_pool_t*	buf_pool)		/*!< in: Buffer pool instance */
-{
-	ulint	n_to_flush;
-
-	n_to_flush = buf_flush_LRU_recommendation(buf_pool);
-
-	if (n_to_flush > 0) {
-		ulint	n_flushed;
+	/* The following call will release the buffer pool and
+	block mutex. */
+	buf_flush_page(buf_pool, bpage, BUF_FLUSH_SINGLE_PAGE);
+
+	/* At this point the page has been written to the disk.
+	As we are not holding the buffer pool or block mutex we
+	cannot use the bpage safely. It may have been plucked out
+	of the LRU list by some other thread, or it may even have
+	been relocated in the case of a compressed page. We need to
+	scan the LRU list again to remove the block from the LRU
+	list and put it on the free list. */
+	buf_pool_mutex_enter(buf_pool);
 
-		n_flushed = buf_flush_LRU(buf_pool, n_to_flush);
+	for (bpage = UT_LIST_GET_LAST(buf_pool->LRU);
+	     bpage != NULL;
+	     bpage = UT_LIST_GET_PREV(LRU, bpage)) {
 
-		if (n_flushed == ULINT_UNDEFINED) {
-			/* There was an LRU type flush batch already running;
-			let us wait for it to end */
+		ibool	ready;
 
-			buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);
-		} else {
-			MONITOR_INC(MONITOR_NUM_FREE_MARGIN_FLUSHES);
-			MONITOR_INC_VALUE(MONITOR_FLUSH_FREE_MARGIN_PAGES,
-					  n_flushed);
+		block_mutex = buf_page_get_mutex(bpage);
+		mutex_enter(block_mutex);
+		ready = buf_flush_ready_for_replace(bpage);
+		mutex_exit(block_mutex);
+		if (ready) {
+			break;
 		}
 	}
-}
 
-/*********************************************************************//**
-Flushes pages from the end of all the LRU lists. */
-UNIV_INTERN
-void
-buf_flush_free_margins(void)
-/*========================*/
-{
-	ulint	i;
+	if (!bpage) {
+		/* Can't find a single replaceable page. */
+		buf_pool_mutex_exit(buf_pool);
+		return(FALSE);
+	}
 
-	for (i = 0; i < srv_buf_pool_instances; i++) {
-		buf_pool_t*	buf_pool;
+	evict_zip = !buf_LRU_evict_from_unzip_LRU(buf_pool);
 
-		buf_pool = buf_pool_from_array(i);
+	freed = buf_LRU_free_block(bpage, evict_zip);
+	buf_pool_mutex_exit(buf_pool);
 
-		buf_flush_free_margin(buf_pool);
-	}
+	return(freed);
 }
 
 /*********************************************************************
@@ -2233,6 +2414,84 @@ buf_flush_get_desired_flush_rate(void)
 }
 
 /*********************************************************************//**
+Clears up the tail of the LRU lists:
+* Put replaceable pages at the tail of the LRU to the free list
+* Flush dirty pages at the tail of the LRU to the disk
+The depth to which we scan each buffer pool is controlled by the
+dynamic config parameter innodb_lru_scan_depth.
+@return total pages flushed */
+UNIV_INLINE
+ulint
+page_cleaner_flush_LRU_tail(void)
+/*=============================*/
+{
+	ulint	i;
+	ulint	j;
+	ulint	total_flushed = 0;
+
+	for (i = 0; i < srv_buf_pool_instances; i++) {
+
+		buf_pool_t*	buf_pool = buf_pool_from_array(i);
+
+		/* We divide LRU flush into smaller chunks because
+		there may be user threads waiting for the flush to
+		end in buf_LRU_get_free_block(). */
+		for (j = 0;
+		     j < srv_LRU_scan_depth;
+		     j += PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE) {
+
+			ulint	n_flushed = buf_flush_LRU(buf_pool,
+				PAGE_CLEANER_LRU_BATCH_CHUNK_SIZE);
+
+			/* Currently page_cleaner is the only thread
+			that can trigger an LRU flush. It is possible
+			that a batch triggered during the last
+			iteration is still running. */
+			if (n_flushed != ULINT_UNDEFINED) {
+				total_flushed += n_flushed;
+			}
+		}
+	}
+
+	if (total_flushed) {
+		MONITOR_INC_VALUE_CUMULATIVE(
+			MONITOR_LRU_BATCH_TOTAL_PAGE,
+			MONITOR_LRU_BATCH_COUNT,
+			MONITOR_LRU_BATCH_PAGES,
+			total_flushed);
+	}
+
+	return(total_flushed);
+}
+
+/*********************************************************************//**
+Wait for any possible LRU flushes that are in progress to end. */
+UNIV_INLINE
+void
+page_cleaner_wait_LRU_flush(void)
+/*=============================*/
+{
+	ulint	i;
+
+	for (i = 0; i < srv_buf_pool_instances; i++) {
+		buf_pool_t*	buf_pool;
+
+		buf_pool = buf_pool_from_array(i);
+
+		buf_pool_mutex_enter(buf_pool);
+
+		if (buf_pool->n_flush[BUF_FLUSH_LRU] > 0
+		   || buf_pool->init_flush[BUF_FLUSH_LRU]) {
+
+			buf_pool_mutex_exit(buf_pool);
+			buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);
+		} else {
+			buf_pool_mutex_exit(buf_pool);
+		}
+	}
+}
+
+/*********************************************************************//**
 Flush a batch of dirty pages from the flush list
 @return number of pages flushed, 0 if no page is flushed or if another
 flush_list type batch is running */
@@ -2256,12 +2515,6 @@ page_cleaner_do_flush_batch(
 		n_flushed = 0;
 	}
 
-	/* Record the IO capacity percentage used for the flush.
-	Note that this can be more than 100% in case where we
-	are being asked to flush to a certain lsn_limit */
-	MONITOR_SET(MONITOR_FLUSH_IO_CAPACITY_PCT,
-		    n_flushed * 100 / srv_io_capacity)
-
 	return(n_flushed);
 }
 
@@ -2303,8 +2556,11 @@ page_cleaner_flush_pages_if_needed(void)
 		n_pages_flushed = page_cleaner_do_flush_batch(ULINT_MAX,
 							      lsn_limit);
 
-		MONITOR_INC(MONITOR_NUM_ASYNC_FLUSHES);
-		MONITOR_SET(MONITOR_FLUSH_ASYNC_PAGES, n_pages_flushed);
+		MONITOR_INC_VALUE_CUMULATIVE(
+			MONITOR_FLUSH_ASYNC_TOTAL_PAGE,
+			MONITOR_FLUSH_ASYNC_COUNT,
+			MONITOR_FLUSH_ASYNC_PAGES,
+			n_pages_flushed);
 	}
 
 	if (UNIV_UNLIKELY(n_pages_flushed < PCT_IO(100)
@@ -2316,8 +2572,12 @@ page_cleaner_flush_pages_if_needed(void)
 
 		n_pages_flushed += page_cleaner_do_flush_batch(PCT_IO(100),
 							       LSN_MAX);
-		MONITOR_INC(MONITOR_NUM_MAX_DIRTY_FLUSHES);
-		MONITOR_SET(MONITOR_FLUSH_MAX_DIRTY_PAGES, n_pages_flushed);
+
+		MONITOR_INC_VALUE_CUMULATIVE(
+			MONITOR_FLUSH_MAX_DIRTY_TOTAL_PAGE,
+			MONITOR_FLUSH_MAX_DIRTY_COUNT,
+			MONITOR_FLUSH_MAX_DIRTY_PAGES,
+			n_pages_flushed);
 	}
 
 	if (srv_adaptive_flushing && n_pages_flushed == 0) {
@@ -2330,12 +2590,13 @@ page_cleaner_flush_pages_if_needed(void)
 		ut_ad(n_flush <= PCT_IO(100));
 		if (n_flush) {
 			n_pages_flushed = page_cleaner_do_flush_batch(
-							n_flush,
-							LSN_MAX);
+				n_flush, LSN_MAX);
 
-			MONITOR_INC(MONITOR_NUM_ADAPTIVE_FLUSHES);
-			MONITOR_SET(MONITOR_FLUSH_ADAPTIVE_PAGES,
-				    n_pages_flushed);
+			MONITOR_INC_VALUE_CUMULATIVE(
+				MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
+				MONITOR_FLUSH_ADAPTIVE_COUNT,
+				MONITOR_FLUSH_ADAPTIVE_PAGES,
+				n_pages_flushed);
 		}
 	}
 
@@ -2407,11 +2668,24 @@ buf_flush_page_cleaner_thread(
 
 		if (srv_check_activity(last_activity)) {
 			last_activity = srv_get_activity_count();
-			n_flushed = page_cleaner_flush_pages_if_needed();
+
+			/* Flush pages from end of LRU if required */
+			n_flushed = page_cleaner_flush_LRU_tail();
+
+			/* Flush pages from flush_list if required */
+			n_flushed += page_cleaner_flush_pages_if_needed();
 		} else {
 			n_flushed = page_cleaner_do_flush_batch(
 							PCT_IO(100),
 							LSN_MAX);
+
+			if (n_flushed) {
+				MONITOR_INC_VALUE_CUMULATIVE(
+					MONITOR_FLUSH_BACKGROUND_TOTAL_PAGE,
+					MONITOR_FLUSH_BACKGROUND_COUNT,
+					MONITOR_FLUSH_BACKGROUND_PAGES,
+					n_flushed);
+			}
 		}
 	}
 
@@ -2456,6 +2730,8 @@ buf_flush_page_cleaner_thread(
 	sweep and we'll come out of the loop leaving behind dirty pages
 	in the flush_list */
 	buf_flush_wait_batch_end(NULL, BUF_FLUSH_LIST);
+	page_cleaner_wait_LRU_flush();
+
 	do {
 
 		n_flushed = buf_flush_list(PCT_IO(100), LSN_MAX);
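
For readers unfamiliar with the slot protocol implemented by
buf_flush_write_to_dblwr_and_datafile() above, the reservation logic
can be modeled in isolation. The following is a minimal standalone
sketch, not code from this patch: dblwr_model_t, slot_reserve() and
slot_release() are invented names, and the constants merely mirror the
defaults (2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE = 128 pages, batch size
120, 10ms poll delay). A thread reserves a free slot under the mutex,
polls with a short sleep when all single page slots are taken, and
releases the slot after both the doublewrite and datafile writes have
been synced.

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define DBLWR_PAGES	128	/* 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE */
#define BATCH_SIZE	120	/* innodb_doublewrite_batch_size default */
#define POLL_DELAY	10000	/* microseconds between retries */

typedef struct {
	pthread_mutex_t	mutex;
	bool		in_use[DBLWR_PAGES];
	size_t		n_reserved;
} dblwr_model_t;

/* Reserve one of the single page slots [BATCH_SIZE, DBLWR_PAGES).
Polls until a slot is free; a slot must become free because every
holder releases it before leaving the flush path. */
static size_t
slot_reserve(dblwr_model_t* d)
{
	for (;;) {
		size_t	i;

		pthread_mutex_lock(&d->mutex);

		if (d->n_reserved < DBLWR_PAGES - BATCH_SIZE) {
			for (i = BATCH_SIZE; i < DBLWR_PAGES; i++) {
				if (!d->in_use[i]) {
					d->in_use[i] = true;
					d->n_reserved++;
					pthread_mutex_unlock(&d->mutex);
					return(i);
				}
			}
		}

		pthread_mutex_unlock(&d->mutex);
		usleep(POLL_DELAY);	/* all slots taken: retry */
	}
}

/* Release a previously reserved slot. */
static void
slot_release(dblwr_model_t* d, size_t i)
{
	pthread_mutex_lock(&d->mutex);
	d->in_use[i] = false;
	d->n_reserved--;
	pthread_mutex_unlock(&d->mutex);
}

int
main(void)
{
	static dblwr_model_t	d = { PTHREAD_MUTEX_INITIALIZER };
	size_t			slot = slot_reserve(&d);

	/* ... write the page to doublewrite slot 'slot', fsync the
	system tablespace, write the page to the datafile, fsync ... */

	slot_release(&d, slot);
	return(0);
}

In the patch itself the reserved index doubles as the page offset into
the doublewrite area: slots below TRX_SYS_DOUBLEWRITE_BLOCK_SIZE map to
block1 and the rest to block2.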

=== modified file 'storage/innobase/buf/buf0lru.c'
--- a/storage/innobase/buf/buf0lru.c	revid:marc.alff@stripped
+++ b/storage/innobase/buf/buf0lru.c	revid:inaam.rana@stripped
@@ -49,6 +49,7 @@ Created 11/5/1995 Heikki Tuuri
 #include "page0zip.h"
 #include "log0recv.h"
 #include "srv0srv.h"
+#include "srv0mon.h"
 
 /** The number of blocks from the LRU_old pointer onward, including
 the block pointed to, must be buf_pool->LRU_old_ratio/BUF_LRU_OLD_RATIO_DIV
@@ -155,7 +156,7 @@ buf_LRU_block_free_hashed_page(
 Determines if the unzip_LRU list should be used for evicting a victim
 instead of the general LRU list.
 @return	TRUE if should use unzip_LRU */
-UNIV_INLINE
+UNIV_INTERN
 ibool
 buf_LRU_evict_from_unzip_LRU(
 /*=========================*/
@@ -548,52 +549,39 @@ ibool
 buf_LRU_free_from_unzip_LRU_list(
 /*=============================*/
 	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
-	ulint		n_iterations)	/*!< in: how many times this has
-					been called repeatedly without
-					result: a high value means that
-					we should search farther; we will
-					search n_iterations / 5 of the
-					unzip_LRU list, or nothing if
-					n_iterations >= 5 */
+	ibool		scan_all)	/*!< in: scan whole LRU list
+					if TRUE, otherwise scan only
+					srv_LRU_scan_depth / 2 blocks. */
 {
 	buf_block_t*	block;
-	ulint		distance;
+	ibool 		freed;
+	ulint		scanned;
 
 	ut_ad(buf_pool_mutex_own(buf_pool));
 
-	/* Theoratically it should be much easier to find a victim
-	from unzip_LRU as we can choose even a dirty block (as we'll
-	be evicting only the uncompressed frame).  In a very unlikely
-	eventuality that we are unable to find a victim from
-	unzip_LRU, we fall back to the regular LRU list.  We do this
-	if we have done five iterations so far. */
-
-	if (UNIV_UNLIKELY(n_iterations >= 5)
-	    || !buf_LRU_evict_from_unzip_LRU(buf_pool)) {
-
+	if (!buf_LRU_evict_from_unzip_LRU(buf_pool)) {
 		return(FALSE);
 	}
 
-	distance = 100 + (n_iterations
-			  * UT_LIST_GET_LEN(buf_pool->unzip_LRU)) / 5;
-
-	for (block = UT_LIST_GET_LAST(buf_pool->unzip_LRU);
-	     UNIV_LIKELY(block != NULL) && UNIV_LIKELY(distance > 0);
-	     block = UT_LIST_GET_PREV(unzip_LRU, block), distance--) {
-
-		ibool freed;
+	for (block = UT_LIST_GET_LAST(buf_pool->unzip_LRU),
+	     scanned = 1, freed = FALSE;
+	     block != NULL && !freed
+	     && (scan_all || scanned < srv_LRU_scan_depth);
+	     block = UT_LIST_GET_PREV(unzip_LRU, block), ++scanned) {
 
 		ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
 		ut_ad(block->in_unzip_LRU_list);
 		ut_ad(block->page.in_LRU_list);
 
 		freed = buf_LRU_free_block(&block->page, FALSE);
-		if (freed) {
-			return(TRUE);
-		}
 	}
 
-	return(FALSE);
+	MONITOR_INC_VALUE_CUMULATIVE(
+		MONITOR_LRU_UNZIP_SEARCH_SCANNED,
+		MONITOR_LRU_UNZIP_SEARCH_SCANNED_NUM_CALL,
+		MONITOR_LRU_UNZIP_SEARCH_SCANNED_PER_CALL,
+		scanned);
+	return(freed);
 }
 
 /******************************************************************//**
@@ -603,27 +591,23 @@ UNIV_INLINE
 ibool
 buf_LRU_free_from_common_LRU_list(
 /*==============================*/
-	buf_pool_t*	buf_pool,
-	ulint		n_iterations)
-				/*!< in: how many times this has been called
-				repeatedly without result: a high value means
-				that we should search farther; if
-				n_iterations < 10, then we search
-				n_iterations / 10 * buf_pool->curr_size
-				pages from the end of the LRU list */
+	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
+	ibool		scan_all)	/*!< in: scan whole LRU list
+					if TRUE, otherwise scan only
+					srv_LRU_scan_depth / 2 blocks. */
 {
 	buf_page_t*	bpage;
-	ulint		distance;
+	ibool		freed;
+	ulint		scanned;
 
 	ut_ad(buf_pool_mutex_own(buf_pool));
 
-	distance = 100 + (n_iterations * buf_pool->curr_size) / 10;
+	for (bpage = UT_LIST_GET_LAST(buf_pool->LRU),
+	     scanned = 1, freed = FALSE;
+	     bpage != NULL && !freed
+	     && (scan_all || scanned < srv_LRU_scan_depth);
+	     bpage = UT_LIST_GET_PREV(LRU, bpage), ++scanned) {
 
-	for (bpage = UT_LIST_GET_LAST(buf_pool->LRU);
-	     UNIV_LIKELY(bpage != NULL) && UNIV_LIKELY(distance > 0);
-	     bpage = UT_LIST_GET_PREV(LRU, bpage), distance--) {
-
-		ibool		freed;
 		unsigned	accessed;
 
 		ut_ad(buf_page_in_file(bpage));
@@ -631,18 +615,21 @@ buf_LRU_free_from_common_LRU_list(
 
 		accessed = buf_page_is_accessed(bpage);
 		freed = buf_LRU_free_block(bpage, TRUE);
-		if (freed) {
+		if (freed && !accessed) {
 			/* Keep track of pages that are evicted without
 			ever being accessed. This gives us a measure of
 			the effectiveness of readahead */
-			if (!accessed) {
-				++buf_pool->stat.n_ra_pages_evicted;
-			}
-			return(TRUE);
+			++buf_pool->stat.n_ra_pages_evicted;
 		}
 	}
 
-	return(FALSE);
+	MONITOR_INC_VALUE_CUMULATIVE(
+		MONITOR_LRU_SEARCH_SCANNED,
+		MONITOR_LRU_SEARCH_SCANNED_NUM_CALL,
+		MONITOR_LRU_SEARCH_SCANNED_PER_CALL,
+		scanned);
+
+	return(freed);
 }
 
 /******************************************************************//**
@@ -650,78 +637,18 @@ Try to free a replaceable block.
 @return	TRUE if found and freed */
 UNIV_INTERN
 ibool
-buf_LRU_search_and_free_block(
-/*==========================*/
-	buf_pool_t*	buf_pool,
-				/*!< in: buffer pool instance */
-	ulint		n_iterations)
-				/*!< in: how many times this has been called
-				repeatedly without result: a high value means
-				that we should search farther; if
-				n_iterations < 10, then we search
-				n_iterations / 10 * buf_pool->curr_size
-				pages from the end of the LRU list; if
-				n_iterations < 5, then we will also search
-				n_iterations / 5 of the unzip_LRU list. */
-{
-	ibool	freed = FALSE;
-
-	buf_pool_mutex_enter(buf_pool);
-
-	freed = buf_LRU_free_from_unzip_LRU_list(buf_pool, n_iterations);
-
-	if (!freed) {
-		freed = buf_LRU_free_from_common_LRU_list(
-			buf_pool, n_iterations);
-	}
-
-	if (!freed) {
-		buf_pool->LRU_flush_ended = 0;
-	} else if (buf_pool->LRU_flush_ended > 0) {
-		buf_pool->LRU_flush_ended--;
-	}
-
-	buf_pool_mutex_exit(buf_pool);
-
-	return(freed);
-}
-
-/******************************************************************//**
-Tries to remove LRU flushed blocks from the end of the LRU list and put them
-to the free list. This is beneficial for the efficiency of the insert buffer
-operation, as flushed pages from non-unique non-clustered indexes are here
-taken out of the buffer pool, and their inserts redirected to the insert
-buffer. Otherwise, the flushed blocks could get modified again before read
-operations need new buffer blocks, and the i/o work done in flushing would be
-wasted. */
-UNIV_INTERN
-void
-buf_LRU_try_free_flushed_blocks(
-/*============================*/
-	buf_pool_t*	buf_pool)		/*!< in: buffer pool instance */
+buf_LRU_scan_and_free_block(
+/*========================*/
+	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
+	ibool		scan_all)	/*!< in: scan whole LRU list
+					if TRUE, otherwise scan only
+					'old' blocks. */
 {
+	ut_ad(buf_pool_mutex_own(buf_pool));
 
-	if (buf_pool == NULL) {
-		ulint	i;
-
-		for (i = 0; i < srv_buf_pool_instances; i++) {
-			buf_pool = buf_pool_from_array(i);
-			buf_LRU_try_free_flushed_blocks(buf_pool);
-		}
-	} else {
-		buf_pool_mutex_enter(buf_pool);
-
-		while (buf_pool->LRU_flush_ended > 0) {
-
-			buf_pool_mutex_exit(buf_pool);
-
-			buf_LRU_search_and_free_block(buf_pool, 1);
-
-			buf_pool_mutex_enter(buf_pool);
-		}
-
-		buf_pool_mutex_exit(buf_pool);
-	}
+	return(buf_LRU_free_from_unzip_LRU_list(buf_pool, scan_all)
+	       || buf_LRU_free_from_common_LRU_list(
+			buf_pool, scan_all));
 }
 
 /******************************************************************//**
@@ -797,23 +724,17 @@ buf_LRU_get_free_only(
 }
 
 /******************************************************************//**
-Returns a free block from the buf_pool. The block is taken off the
-free list. If it is empty, blocks are moved from the end of the
-LRU list to the free list.
-@return	the free control block, in state BUF_BLOCK_READY_FOR_USE */
-UNIV_INTERN
-buf_block_t*
-buf_LRU_get_free_block(
-/*===================*/
-	buf_pool_t*	buf_pool)	/*!< in/out: buffer pool instance */
+Checks how much of buf_pool is occupied by non-data objects like
+AHI, lock heaps etc. Depending on the size of non-data objects this
+function will either assert or issue a warning and switch on the
+status monitor. */
+static
+void
+buf_LRU_check_size_of_non_data_objects(
+/*===================================*/
+	const buf_pool_t*	buf_pool)	/*!< in: buffer pool instance */
 {
-	buf_block_t*	block		= NULL;
-	ibool		freed;
-	ulint		n_iterations	= 1;
-	ibool		mon_value_was	= FALSE;
-	ibool		started_monitor	= FALSE;
-loop:
-	buf_pool_mutex_enter(buf_pool);
+	ut_ad(buf_pool_mutex_own(buf_pool));
 
 	if (!recv_recovery_on && UT_LIST_GET_LEN(buf_pool->free)
 	    + UT_LIST_GET_LEN(buf_pool->LRU) < buf_pool->curr_size / 20) {
@@ -878,12 +799,59 @@ loop:
 		buf_lru_switched_on_innodb_mon = FALSE;
 		srv_print_innodb_monitor = FALSE;
 	}
+}
+
+/******************************************************************//**
+Returns a free block from the buf_pool. The block is taken off the
+free list. If the free list is empty, blocks are moved from the end
+of the LRU list to the free list.
+This function is called from a user thread when it needs a clean
+block to read in a page. Note that we only ever get a block from
+the free list. Even when we flush a page or find a page in the LRU
+scan it is first put on the free list and taken from there.
+* iteration 0:
+  * get a block from free list, success:done
+  * if there is an LRU flush batch in progress:
+    * wait for batch to end: retry free list
+  * if buf_pool->try_LRU_scan is set
+    * scan LRU up to srv_LRU_scan_depth to find a clean block
+    * the above will put the block on free list
+    * success:retry the free list
+  * flush one dirty page from tail of LRU to disk
+    * the above will put the block on free list
+    * success: retry the free list
+* iteration 1:
+  * same as iteration 0 except:
+    * scan whole LRU list
+    * scan LRU list even if buf_pool->try_LRU_scan is not set
+* iteration > 1:
+  * same as iteration 1 but sleep 100ms
+@return	the free control block, in state BUF_BLOCK_READY_FOR_USE */
+UNIV_INTERN
+buf_block_t*
+buf_LRU_get_free_block(
+/*===================*/
+	buf_pool_t*	buf_pool)	/*!< in/out: buffer pool instance */
+{
+	buf_block_t*	block		= NULL;
+	ibool		freed		= FALSE;
+	ulint		n_iterations	= 0;
+	ulint		flush_failures	= 0;
+	ibool		mon_value_was	= FALSE;
+	ibool		started_monitor	= FALSE;
+
+	MONITOR_INC(MONITOR_LRU_GET_FREE_SEARCH);
+loop:
+	buf_pool_mutex_enter(buf_pool);
+
+	buf_LRU_check_size_of_non_data_objects(buf_pool);
 
 	/* If there is a block in the free list, take it */
 	block = buf_LRU_get_free_only(buf_pool);
-	buf_pool_mutex_exit(buf_pool);
 
 	if (block) {
+
+		buf_pool_mutex_exit(buf_pool);
 		ut_ad(buf_pool_from_block(block) == buf_pool);
 		memset(&block->page.zip, 0, sizeof block->page.zip);
 
@@ -894,20 +862,52 @@ loop:
 		return(block);
 	}
 
-	/* If no block was in the free list, search from the end of the LRU
-	list and try to free a block there */
+	if (buf_pool->init_flush[BUF_FLUSH_LRU]
+	    && srv_use_doublewrite_buf
+	    && trx_doublewrite != NULL) {
+
+		/* If there is an LRU flush happening in the background
+		then we wait for it to end instead of trying a single
+		page flush. If, however, we are not using the
+		doublewrite buffer then it is better to do our own
+		single page flush instead of waiting for the LRU flush
+		to end. */
+		buf_pool_mutex_exit(buf_pool);
+		buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);
+		goto loop;
+	}
 
-	freed = buf_LRU_search_and_free_block(buf_pool, n_iterations);
+	freed = FALSE;
+	if (buf_pool->try_LRU_scan || n_iterations > 0) {
+		/* If no block was in the free list, search from the
+		end of the LRU list and try to free a block there.
+		If we are doing this for the first time we'll scan
+		only the tail of the LRU list; otherwise we scan the
+		whole LRU list. */
+		freed = buf_LRU_scan_and_free_block(buf_pool,
+						    n_iterations > 0);
+
+		if (!freed && n_iterations == 0) {
+			/* Tell other threads that there is no point
+			in scanning the LRU list. This flag is set to
+			TRUE again when we flush a batch from this
+			buffer pool. */
+			buf_pool->try_LRU_scan = FALSE;
+		}
+	}
 
-	if (freed > 0) {
+	buf_pool_mutex_exit(buf_pool);
+
+	if (freed) {
 		goto loop;
 	}
 
-	if (n_iterations > 30) {
+	if (n_iterations > 20) {
 		ut_print_timestamp(stderr);
 		fprintf(stderr,
 			"  InnoDB: Warning: difficult to find free blocks in\n"
-			"InnoDB: the buffer pool (%lu search iterations)!"
+			"InnoDB: the buffer pool (%lu search iterations)!\n"
+			"InnoDB: %lu failed attempts to flush a page!"
 			" Consider\n"
 			"InnoDB: increasing the buffer pool size.\n"
 			"InnoDB: It is also possible that"
@@ -926,6 +926,7 @@ loop:
 			"InnoDB: Starting InnoDB Monitor to print further\n"
 			"InnoDB: diagnostics to the standard output.\n",
 			(ulong) n_iterations,
+			(ulong)	flush_failures,
 			(ulong) fil_n_pending_log_flushes,
 			(ulong) fil_n_pending_tablespace_flushes,
 			(ulong) os_n_file_reads, (ulong) os_n_file_writes,
@@ -937,31 +938,31 @@ loop:
 		os_event_set(srv_timeout_event);
 	}
 
-	/* No free block was found: try to flush the LRU list */
-
-	buf_flush_free_margin(buf_pool);
-	++srv_buf_pool_wait_free;
-
-	os_aio_simulated_wake_handler_threads();
-
-	buf_pool_mutex_enter(buf_pool);
-
-	if (buf_pool->LRU_flush_ended > 0) {
-		/* We have written pages in an LRU flush. To make the insert
-		buffer more efficient, we try to move these pages to the free
-		list. */
-
-		buf_pool_mutex_exit(buf_pool);
-
-		buf_LRU_try_free_flushed_blocks(buf_pool);
-	} else {
-		buf_pool_mutex_exit(buf_pool);
+	/* If we have scanned the whole LRU and are still unable to
+	find a free block then we should sleep here to let the
+	page_cleaner do an LRU batch for us.
+	TODO: It'd be better if we could signal the page_cleaner.
+	Perhaps we should use a timed wait for the page_cleaner. */
+	if (n_iterations > 1) {
+
+		os_thread_sleep(100000);
+	}
+
+	/* No free block was found: try to flush the LRU list.
+	This call will flush one page from the LRU and put it on the
+	free list. That means the freed block is up for grabs for
+	all user threads.
+	TODO: A more elegant way would have been to return the
+	freed-up block to the caller here, but the code that deals
+	with removing the block from page_hash and the LRU list is
+	fairly involved (particularly in the case of compressed
+	pages). We can do that in a separate patch sometime in the
+	future. */
+	if (!buf_flush_single_page_from_LRU(buf_pool)) {
+		MONITOR_INC(MONITOR_LRU_SINGLE_FLUSH_FAILURE_COUNT);
+		++flush_failures;
 	}
 
-	if (n_iterations > 10) {
-
-		os_thread_sleep(500000);
-	}
+	++srv_buf_pool_wait_free;
 
 	n_iterations++;
 
@@ -1589,6 +1590,22 @@ func_exit:
 
 		rw_lock_x_unlock(hash_lock);
 		mutex_exit(block_mutex);
+	} else {
+
+		/* There can be multiple threads doing an LRU scan to
+		free a block. The page_cleaner thread can be doing an
+		LRU batch whereas user threads can potentially be doing
+		multiple single page flushes. As we release
+		buf_pool->mutex below we need to make sure that no one
+		else considers this block a victim for page
+		replacement. The block is already out of page_hash
+		and we are about to remove it from the LRU list and put
+		it on the free list. To keep other threads away from it
+		meanwhile, we set the buf_fix_count and io_fix fields
+		here. */
+		mutex_enter(block_mutex);
+		buf_block_buf_fix_inc((buf_block_t*) bpage, __FILE__, __LINE__);
+		buf_page_set_io_fix(bpage, BUF_IO_READ);
+		mutex_exit(block_mutex);
 	}
 
 	buf_pool_mutex_exit(buf_pool);
@@ -1629,6 +1646,13 @@ func_exit:
 		b->buf_fix_count--;
 		buf_page_set_io_fix(b, BUF_IO_NONE);
 		mutex_exit(&buf_pool->zip_mutex);
+	} else {
+		mutex_enter(block_mutex);
+		ut_ad(bpage->buf_fix_count > 0);
+		ut_ad(bpage->io_fix == BUF_IO_READ);
+		buf_block_buf_fix_dec((buf_block_t*) bpage);
+		buf_page_set_io_fix(bpage, BUF_IO_NONE);
+		mutex_exit(block_mutex);
 	}
 
 	buf_LRU_block_free_hashed_page((buf_block_t*) bpage);

=== modified file 'storage/innobase/buf/buf0rea.c'
--- a/storage/innobase/buf/buf0rea.c	revid:marc.alff@stripped
+++ b/storage/innobase/buf/buf0rea.c	revid:inaam.rana@stripped
@@ -355,7 +355,6 @@ buf_read_page(
 	ulint	zip_size,/*!< in: compressed page size in bytes, or 0 */
 	ulint	offset)	/*!< in: page number */
 {
-	buf_pool_t*	buf_pool = buf_pool_get(space, offset);
 	ib_int64_t	tablespace_version;
 	ulint		count;
 	ulint		err;
@@ -379,9 +378,6 @@ buf_read_page(
 			(ulong) space, (ulong) offset);
 	}
 
-	/* Flush pages from the end of the LRU list if necessary */
-	buf_flush_free_margin(buf_pool);
-
 	/* Increment number of I/O operations used for LRU policy. */
 	buf_LRU_stat_inc_io();
 
@@ -401,7 +397,6 @@ buf_read_page_async(
 	ulint	space,	/*!< in: space id */
 	ulint	offset)	/*!< in: page number */
 {
-	buf_pool_t*	buf_pool = buf_pool_get(space, offset);
 	ulint		zip_size;
 	ib_int64_t	tablespace_version;
 	ulint		count;
@@ -422,9 +417,6 @@ buf_read_page_async(
 				  tablespace_version, offset);
 	srv_buf_pool_reads += count;
 
-	/* Flush pages from the end of the LRU list if necessary */
-	buf_flush_free_margin(buf_pool);
-
 	/* We do not increment number of I/O operations used for LRU policy
 	here (buf_LRU_stat_inc_io()). We use this in heuristics to decide
 	about evicting uncompressed version of compressed pages from the
@@ -701,9 +693,6 @@ buf_read_ahead_linear(
 
 	os_aio_simulated_wake_handler_threads();
 
-	/* Flush pages from the end of the LRU list if necessary */
-	buf_flush_free_margin(buf_pool);
-
 #ifdef UNIV_DEBUG
 	if (buf_debug_prints && (count > 0)) {
 		fprintf(stderr,
@@ -789,9 +778,6 @@ tablespace_deleted:
 
 	os_aio_simulated_wake_handler_threads();
 
-	/* Flush pages from the end of all the LRU lists if necessary */
-	buf_flush_free_margins();
-
 #ifdef UNIV_DEBUG
 	if (buf_debug_prints) {
 		fprintf(stderr,
@@ -883,9 +869,6 @@ buf_read_recv_pages(
 
 	os_aio_simulated_wake_handler_threads();
 
-	/* Flush pages from the end of all the LRU lists if necessary */
-	buf_flush_free_margins();
-
 #ifdef UNIV_DEBUG
 	if (buf_debug_prints) {
 		fprintf(stderr,

=== modified file 'storage/innobase/handler/ha_innodb.cc'
--- a/storage/innobase/handler/ha_innodb.cc	revid:marc.alff@stripped
+++ b/storage/innobase/handler/ha_innodb.cc	revid:inaam.rana@stripped
@@ -12430,6 +12430,11 @@ static MYSQL_SYSVAR_ULONG(page_hash_lock
   PLUGIN_VAR_OPCMDARG | PLUGIN_VAR_READONLY,
   "Number of rw_locks protecting buffer pool page_hash. Rounded up to the next power of 2",
   NULL, NULL, 16, 1, MAX_PAGE_HASH_LOCKS, 0);
+
+static MYSQL_SYSVAR_ULONG(doublewrite_batch_size, srv_doublewrite_batch_size,
+  PLUGIN_VAR_OPCMDARG | PLUGIN_VAR_READONLY,
+  "Number of pages reserved in doublewrite buffer for batch flushing",
+  NULL, NULL, 120, 1, 127, 0);
 #endif /* defined UNIV_DEBUG || defined UNIV_PERF_DEBUG */
 
 static MYSQL_SYSVAR_LONG(buffer_pool_instances, innobase_buffer_pool_instances,
@@ -12468,6 +12473,16 @@ static MYSQL_SYSVAR_BOOL(buffer_pool_loa
   "Load the buffer pool from a file named @@innodb_buffer_pool_filename",
   NULL, NULL, FALSE);
 
+static MYSQL_SYSVAR_ULONG(lru_scan_depth, srv_LRU_scan_depth,
+  PLUGIN_VAR_RQCMDARG,
+  "How deep to scan LRU to keep it clean",
+  NULL, NULL, 1024, 100, ~0L, 0);
+
+static MYSQL_SYSVAR_BOOL(flush_neighbors, srv_flush_neighbors,
+  PLUGIN_VAR_NOCMDARG,
+  "Flush neighbors from buffer pool when flushing a block.",
+  NULL, NULL, TRUE);
+
 static MYSQL_SYSVAR_ULONG(commit_concurrency, innobase_commit_concurrency,
   PLUGIN_VAR_RQCMDARG,
   "Helps in performance tuning in heavily concurrent environments.",
@@ -12698,6 +12713,8 @@ static struct st_mysql_sys_var* innobase
   MYSQL_SYSVAR(buffer_pool_load_now),
   MYSQL_SYSVAR(buffer_pool_load_abort),
   MYSQL_SYSVAR(buffer_pool_load_at_startup),
+  MYSQL_SYSVAR(lru_scan_depth),
+  MYSQL_SYSVAR(flush_neighbors),
   MYSQL_SYSVAR(checksums),
   MYSQL_SYSVAR(commit_concurrency),
   MYSQL_SYSVAR(concurrency_tickets),
@@ -12770,6 +12787,7 @@ static struct st_mysql_sys_var* innobase
   MYSQL_SYSVAR(purge_batch_size),
 #if defined UNIV_DEBUG || defined UNIV_PERF_DEBUG
   MYSQL_SYSVAR(page_hash_locks),
+  MYSQL_SYSVAR(doublewrite_batch_size),
 #endif /* defined UNIV_DEBUG || defined UNIV_PERF_DEBUG */
   MYSQL_SYSVAR(print_all_deadlocks),
   MYSQL_SYSVAR(undo_logs),

=== modified file 'storage/innobase/ibuf/ibuf0ibuf.c'
--- a/storage/innobase/ibuf/ibuf0ibuf.c	revid:marc.alff@stripped
+++ b/storage/innobase/ibuf/ibuf0ibuf.c	revid:inaam.rana@stripped
@@ -197,9 +197,6 @@ UNIV_INTERN uint	ibuf_debug;
 /** The insert buffer control structure */
 UNIV_INTERN ibuf_t*	ibuf			= NULL;
 
-/** Counter for ibuf_should_try() */
-UNIV_INTERN ulint	ibuf_flush_count	= 0;
-
 #ifdef UNIV_PFS_MUTEX
 UNIV_INTERN mysql_pfs_key_t	ibuf_pessimistic_insert_mutex_key;
 UNIV_INTERN mysql_pfs_key_t	ibuf_mutex_key;

=== modified file 'storage/innobase/include/buf0buf.h'
--- a/storage/innobase/include/buf0buf.h	revid:marc.alff@stripped
+++ b/storage/innobase/include/buf0buf.h	revid:inaam.rana@stripped
@@ -145,6 +145,10 @@ struct buf_pool_info_struct{
 	ulint	n_pend_reads;		/*!< buf_pool->n_pend_reads, pages
 					pending read */
 	ulint	n_pending_flush_lru;	/*!< Pages pending flush in LRU */
+	ulint	n_pending_flush_single_page;/*!< Pages pending to be
+					flushed as part of single page
+					flushes issued by various user
+					threads */
 	ulint	n_pending_flush_list;	/*!< Pages pending flush in FLUSH
 					LIST */
 	ulint	n_pages_made_young;	/*!< number of pages made young */
@@ -1844,10 +1848,16 @@ struct buf_pool_struct{
 					to read this for heuristic
 					purposes without holding any
 					mutex or latch */
-	ulint		LRU_flush_ended;/*!< when an LRU flush ends for a page,
-					this is incremented by one; this is
-					set to zero when a buffer block is
-					allocated */
+	ibool		try_LRU_scan;	/*!< Set to FALSE when an LRU
+					scan for free block fails. This
+					flag is used to avoid repeated
+					scans of LRU list when we know
+					that there is no free block
+					available in the scan depth for
+					eviction. Set to TRUE whenever
+					we flush a batch from the
+					buffer pool. Protected by the
+					buf_pool->mutex */
 	/* @} */
 
 	/** @name LRU replacement algorithm fields */
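
The try_LRU_scan hint replaces the old LRU_flush_ended counter: once a
scan has proved that there is no free block within the scan depth,
other threads get a cheap flag test instead of repeating the same
useless scan. The toy model below shows the two halves of the
protocol; it is a sketch, not InnoDB code, and evict_from_tail() and
on_flush_batch_end() are invented names. A user thread that fails to
find a clean page clears the flag, and the completion of any flush
batch (buf_flush_end() in the buf0flu.c diff above) sets it again.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t	pool_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool		try_LRU_scan = true;	/* protected by pool_mutex */

/* User thread side (first iteration only): skip the tail scan
entirely when a previous scan already proved it pointless. */
static bool
evict_from_tail(bool (*scan)(void))
{
	bool	freed = false;

	pthread_mutex_lock(&pool_mutex);

	if (try_LRU_scan) {
		freed = scan();

		if (!freed) {
			/* Tell the other threads not to bother
			scanning until a flush batch completes. */
			try_LRU_scan = false;
		}
	}

	pthread_mutex_unlock(&pool_mutex);

	return(freed);
}

/* Flusher side: a completed batch may have produced clean pages,
so scanning is worth trying again. */
static void
on_flush_batch_end(void)
{
	pthread_mutex_lock(&pool_mutex);
	try_LRU_scan = true;
	pthread_mutex_unlock(&pool_mutex);
}

static bool
failing_scan(void)
{
	return(false);
}

int
main(void)
{
	evict_from_tail(failing_scan);	/* scan fails: hint cleared */
	printf("try_LRU_scan = %d\n", (int) try_LRU_scan);

	on_flush_batch_end();		/* batch done: hint restored */
	printf("try_LRU_scan = %d\n", (int) try_LRU_scan);

	return(0);
}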

=== modified file 'storage/innobase/include/buf0buf.ic'
--- a/storage/innobase/include/buf0buf.ic	revid:marc.alff@stripped
+++ b/storage/innobase/include/buf0buf.ic	revid:inaam.rana@stripped
@@ -373,9 +373,10 @@ buf_page_get_flush_type(
 	switch (flush_type) {
 	case BUF_FLUSH_LRU:
 	case BUF_FLUSH_LIST:
+	case BUF_FLUSH_SINGLE_PAGE:
 		return(flush_type);
 	case BUF_FLUSH_N_TYPES:
-		break;
+		ut_error;
 	}
 	ut_error;
 #endif /* UNIV_DEBUG */

=== modified file 'storage/innobase/include/buf0flu.h'
--- a/storage/innobase/include/buf0flu.h	revid:marc.alff@stripped
+++ b/storage/innobase/include/buf0flu.h	revid:inaam.rana@stripped
@@ -60,21 +60,6 @@ void
 buf_flush_write_complete(
 /*=====================*/
 	buf_page_t*	bpage);	/*!< in: pointer to the block in question */
-/*********************************************************************//**
-Flushes pages from the end of the LRU list if there is too small
-a margin of replaceable pages there. If buffer pool is NULL it
-means flush free margin on all buffer pool instances. */
-UNIV_INTERN
-void
-buf_flush_free_margin(
-/*==================*/
-	 buf_pool_t*	buf_pool);
-/*********************************************************************//**
-Flushes pages from the end of all the LRU lists. */
-UNIV_INTERN
-void
-buf_flush_free_margins(void);
-/*=========================*/
 #endif /* !UNIV_HOTBACKUP */
 /********************************************************************//**
 Initializes a page for writing to the tablespace. */
@@ -103,21 +88,6 @@ buf_flush_page_try(
 	__attribute__((nonnull, warn_unused_result));
 # endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
 /*******************************************************************//**
-This utility flushes dirty blocks from the end of the LRU list.
-NOTE: The calling thread may own latches to pages: to avoid deadlocks,
-this function must be written so that it cannot end up waiting for these
-latches!
-@return number of blocks for which the write request was queued;
-ULINT_UNDEFINED if there was a flush of the same type already running */
-UNIV_INTERN
-ulint
-buf_flush_LRU(
-/*==========*/
-	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
-	ulint		min_n);		/*!< in: wished minimum mumber of blocks
-					flushed (it is not guaranteed that the
-					actual number is that big, though) */
-/*******************************************************************//**
 This utility flushes dirty blocks from the end of the flush_list of
 all buffer pool instances.
 NOTE: The calling thread is not allowed to own any latches on pages!
@@ -136,6 +106,19 @@ buf_flush_list(
 					(if their number does not exceed
 					min_n), otherwise ignored */
 /******************************************************************//**
+This function picks up a single dirty page from the tail of the LRU
+list, flushes it, removes it from the page_hash and the LRU list and
+puts it on the free list. It is called from user threads when they
+are unable to find a replaceable page at the tail of the LRU list,
+i.e.: when the background LRU flushing in the page_cleaner thread is
+not fast enough to keep pace with the workload.
+@return TRUE if success. */
+UNIV_INTERN
+ibool
+buf_flush_single_page_from_LRU(
+/*===========================*/
+	buf_pool_t*	buf_pool);	/*!< in/out: buffer pool instance */
+/******************************************************************//**
 Waits until a flush batch of the given type ends */
 UNIV_INTERN
 void
@@ -249,15 +232,6 @@ UNIV_INTERN
 void
 buf_flush_free_flush_rbt(void);
 /*==========================*/
-
-/** When buf_flush_free_margin is called, it tries to make this many blocks
-available to replacement in the free list and at the end of the LRU list (to
-make sure that a read-ahead batch can be read efficiently in a single
-sweep). */
-#define BUF_FLUSH_FREE_BLOCK_MARGIN(b)	(5 + BUF_READ_AHEAD_AREA(b))
-/** Extra margin to apply above BUF_FLUSH_FREE_BLOCK_MARGIN */
-#define BUF_FLUSH_EXTRA_MARGIN(b)	((BUF_FLUSH_FREE_BLOCK_MARGIN(b) / 4 \
-					+ 100) / srv_buf_pool_instances)
 #endif /* !UNIV_HOTBACKUP */
 
 #ifndef UNIV_NONINL
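buf_flush_single_page_from_LRU is only declared here; a rough standalone model of what the comment documents (hypothetical types and helpers, not the committed implementation) is:

#include <stdbool.h>
#include <stddef.h>

typedef struct model_page {
	struct model_page *prev;	/* towards the LRU head */
	bool	dirty;
	bool	io_fixed;
} model_page_t;

typedef struct {
	model_page_t	*lru_tail;
} model_pool_t;

static void
model_write_page(model_page_t *page)
{
	/* stub: a real single page flush writes through the part of
	the doublewrite buffer reserved for single page flushes and
	forces the write to disk right away */
	page->dirty = false;
}

static void
model_move_to_free_list(model_pool_t *pool, model_page_t *page)
{
	(void) pool; (void) page;	/* stub: unlink the page from
					the LRU list and page_hash,
					then add it to the free list */
}

static bool
model_flush_single_page_from_LRU(model_pool_t *pool)
{
	model_page_t	*page;

	for (page = pool->lru_tail; page != NULL; page = page->prev) {
		if (page->io_fixed || !page->dirty) {
			continue;	/* not flushable here */
		}

		model_write_page(page);
		model_move_to_free_list(pool, page);
		return(true);
	}

	return(false);	/* nothing flushable near the tail */
}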

=== modified file 'storage/innobase/include/buf0lru.h'
--- a/storage/innobase/include/buf0lru.h	revid:marc.alff@stripped
+++ b/storage/innobase/include/buf0lru.h	revid:inaam.rana@stripped
@@ -32,19 +32,6 @@ Created 11/5/1995 Heikki Tuuri
 #include "buf0types.h"
 
 /******************************************************************//**
-Tries to remove LRU flushed blocks from the end of the LRU list and put them
-to the free list. This is beneficial for the efficiency of the insert buffer
-operation, as flushed pages from non-unique non-clustered indexes are here
-taken out of the buffer pool, and their inserts redirected to the insert
-buffer. Otherwise, the flushed blocks could get modified again before read
-operations need new buffer blocks, and the i/o work done in flushing would be
-wasted. */
-UNIV_INTERN
-void
-buf_LRU_try_free_flushed_blocks(
-/*============================*/
-	buf_pool_t*	buf_pool);	/*!< in: buffer pool instance */
-/******************************************************************//**
 Returns TRUE if less than 25 % of the buffer pool is available. This can be
 used in heuristics to prevent huge transactions eating up the whole buffer
 pool for their locks.
@@ -61,9 +48,6 @@ These are low-level functions
 /** Minimum LRU list length for which the LRU_old pointer is defined */
 #define BUF_LRU_OLD_MIN_LEN	512	/* 8 megabytes of 16k pages */
 
-/** Maximum LRU list search length in buf_flush_LRU_recommendation() */
-#define BUF_LRU_FREE_SEARCH_LEN(b)	(5 + 2 * BUF_READ_AHEAD_AREA(b))
-
 /******************************************************************//**
 Invalidates all pages belonging to a given tablespace when we are deleting
 the data file(s) of that tablespace. A PROBLEM: if readahead is being started,
@@ -108,19 +92,13 @@ Try to free a replaceable block.
 @return	TRUE if found and freed */
 UNIV_INTERN
 ibool
-buf_LRU_search_and_free_block(
-/*==========================*/
+buf_LRU_scan_and_free_block(
+/*========================*/
 	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
-	ulint		n_iterations);	/*!< in: how many times this has
-					been called repeatedly without
-					result: a high value means that
-					we should search farther; if
-					n_iterations < 10, then we search
-					n_iterations / 10 * buf_pool->curr_size
-					pages from the end of the LRU list; if
-					n_iterations < 5, then we will
-					also search n_iterations / 5
-					of the unzip_LRU list. */
+	ibool		scan_all)	/*!< in: scan whole LRU list
+					if TRUE, otherwise scan only
+					'old' blocks. */
+	__attribute__((nonnull,warn_unused_result));
 /******************************************************************//**
 Returns a free block from the buf_pool.  The block is taken off the
 free list.  If it is empty, returns NULL.
@@ -134,6 +112,27 @@ buf_LRU_get_free_only(
 Returns a free block from the buf_pool. The block is taken off the
 free list. If it is empty, blocks are moved from the end of the
 LRU list to the free list.
+This function is called from a user thread when it needs a clean
+block to read in a page. Note that we only ever get a block from
+the free list. Even when we flush a page or find one in the LRU
+scan, we first put it on the free list and then take it from there.
+* iteration 0:
+  * get a block from free list, success: done
+  * if there is an LRU flush batch in progress:
+    * wait for batch to end: retry free list
+  * if buf_pool->try_LRU_scan is set
+    * scan LRU up to srv_LRU_scan_depth to find a clean block
+    * the above will put the block on free list
+    * success: retry the free list
+  * flush one dirty page from tail of LRU to disk
+    * the above will put the block on free list
+    * success: retry the free list
+* iteration 1:
+  * same as iteration 0 except:
+    * scan whole LRU list
+    * scan LRU list even if buf_pool->try_LRU_scan is not set
+* iteration > 1:
+  * same as iteration 1 but sleep 100ms
 @return	the free control block, in state BUF_BLOCK_READY_FOR_USE */
 UNIV_INTERN
 buf_block_t*
@@ -141,7 +140,15 @@ buf_LRU_get_free_block(
 /*===================*/
 	buf_pool_t*	buf_pool)	/*!< in/out: buffer pool instance */
 	__attribute__((nonnull,warn_unused_result));
-
+/******************************************************************//**
+Determines if the unzip_LRU list should be used for evicting a victim
+instead of the general LRU list.
+@return	TRUE if should use unzip_LRU */
+UNIV_INTERN
+ibool
+buf_LRU_evict_from_unzip_LRU(
+/*=========================*/
+	buf_pool_t*	buf_pool);
 /******************************************************************//**
 Puts a block back to the free list. */
 UNIV_INTERN
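The iteration scheme documented in buf_LRU_get_free_block above reduces to a retry loop. A standalone model under hypothetical helper names (the committed function additionally deals with the buf_pool mutex and the monitor counters):

#include <stdbool.h>
#include <stddef.h>

typedef struct { int id; } model_block_t;

typedef struct {
	bool	try_LRU_scan;
	bool	lru_batch_running;
} model_pool_t;

/* Stubs standing in for the real free list, LRU scan, single page
flush and sleep primitives. */
static model_block_t *model_free_list_get(model_pool_t *p)
	{ (void) p; return(NULL); }
static void model_wait_for_lru_batch(model_pool_t *p) { (void) p; }
static bool model_scan_lru(model_pool_t *p, bool scan_all)
	{ (void) p; (void) scan_all; return(false); }
static bool model_flush_one_dirty_page(model_pool_t *p)
	{ (void) p; return(false); }
static void model_sleep_ms(int ms) { (void) ms; }

/* Loops until a free block is found, like the real function. */
static model_block_t *
model_get_free_block(model_pool_t *pool)
{
	size_t	n_iterations = 0;

	for (;;) {
		model_block_t	*block = model_free_list_get(pool);

		if (block != NULL) {
			return(block);	/* done */
		}

		if (n_iterations > 1) {
			model_sleep_ms(100);
		}

		if (n_iterations == 0 && pool->lru_batch_running) {
			model_wait_for_lru_batch(pool);
			continue;	/* retry the free list */
		}

		/* Iteration 0 scans only srv_LRU_scan_depth blocks
		and only if try_LRU_scan is set; later iterations
		scan the whole LRU list unconditionally. */
		if ((n_iterations > 0 || pool->try_LRU_scan)
		    && model_scan_lru(pool, n_iterations > 0)) {
			continue;	/* block freed: retry */
		}

		if (model_flush_one_dirty_page(pool)) {
			continue;	/* block freed: retry */
		}

		n_iterations++;
	}
}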

=== modified file 'storage/innobase/include/buf0types.h'
--- a/storage/innobase/include/buf0types.h	revid:marc.alff@stripped
+++ b/storage/innobase/include/buf0types.h	revid:inaam.rana@stripped
@@ -47,6 +47,8 @@ enum buf_flush {
 	BUF_FLUSH_LRU = 0,		/*!< flush via the LRU list */
 	BUF_FLUSH_LIST,			/*!< flush via the flush list
 					of dirty blocks */
+	BUF_FLUSH_SINGLE_PAGE,		/*!< flush via the LRU list
+					but only a single page */
 	BUF_FLUSH_N_TYPES		/*!< index of last element + 1  */
 };
 

=== modified file 'storage/innobase/include/ibuf0ibuf.ic'
--- a/storage/innobase/include/ibuf0ibuf.ic	revid:marc.alff@stripped
+++ b/storage/innobase/include/ibuf0ibuf.ic	revid:inaam.rana@stripped
@@ -28,9 +28,6 @@ Created 7/19/1997 Heikki Tuuri
 #ifndef UNIV_HOTBACKUP
 #include "buf0lru.h"
 
-/** Counter for ibuf_should_try() */
-extern ulint	ibuf_flush_count;
-
 /** An index page must contain at least UNIV_PAGE_SIZE /
 IBUF_PAGE_SIZE_PER_FREE_SPACE bytes of free space for ibuf to try to
 buffer inserts to this page.  If there is this much of free space, the
@@ -127,22 +124,10 @@ ibuf_should_try(
 						a secondary index when we
 						decide */
 {
-	if (ibuf_use != IBUF_USE_NONE
-	    && ibuf->max_size != 0
-	    && !dict_index_is_clust(index)
-	    && (ignore_sec_unique || !dict_index_is_unique(index))) {
-
-		ibuf_flush_count++;
-
-		if (ibuf_flush_count % 4 == 0) {
-
-			buf_LRU_try_free_flushed_blocks(NULL);
-		}
-
-		return(TRUE);
-	}
-
-	return(FALSE);
+	return(ibuf_use != IBUF_USE_NONE
+	       && ibuf->max_size != 0
+	       && !dict_index_is_clust(index)
+	       && (ignore_sec_unique || !dict_index_is_unique(index)));
 }
 
 /******************************************************************//**

=== modified file 'storage/innobase/include/srv0mon.h'
--- a/storage/innobase/include/srv0mon.h	revid:marc.alff@stripped
+++ b/storage/innobase/include/srv0mon.h	revid:inaam.rana@stripped
@@ -167,25 +167,47 @@ enum monitor_id_value {
 	MONITOR_OVLD_PAGES_READ,
 	MONITOR_OVLD_BYTE_READ,
 	MONITOR_OVLD_BYTE_WRITTEN,
-	MONITOR_NUM_ADAPTIVE_FLUSHES,
-	MONITOR_FLUSH_ADAPTIVE_PAGES,
-	MONITOR_NUM_ASYNC_FLUSHES,
-	MONITOR_FLUSH_ASYNC_PAGES,
-	MONITOR_NUM_SYNC_FLUSHES,
-	MONITOR_FLUSH_SYNC_PAGES,
-	MONITOR_NUM_MAX_DIRTY_FLUSHES,
-	MONITOR_FLUSH_MAX_DIRTY_PAGES,
-	MONITOR_NUM_FREE_MARGIN_FLUSHES,
-	MONITOR_FLUSH_FREE_MARGIN_PAGES,
-	MONITOR_FLUSH_IO_CAPACITY_PCT,
 	MONITOR_FLUSH_BATCH_SCANNED,
 	MONITOR_FLUSH_BATCH_SCANNED_NUM_CALL,
 	MONITOR_FLUSH_BATCH_SCANNED_PER_CALL,
 	MONITOR_FLUSH_BATCH_TOTAL_PAGE,
 	MONITOR_FLUSH_BATCH_COUNT,
 	MONITOR_FLUSH_BATCH_PAGES,
-	MONITOR_BUF_FLUSH_LRU,
-	MONITOR_BUF_FLUSH_LIST,
+	MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE,
+	MONITOR_FLUSH_NEIGHBOR_COUNT,
+	MONITOR_FLUSH_NEIGHBOR_PAGES,
+	MONITOR_FLUSH_MAX_DIRTY_TOTAL_PAGE,
+	MONITOR_FLUSH_MAX_DIRTY_COUNT,
+	MONITOR_FLUSH_MAX_DIRTY_PAGES,
+	MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
+	MONITOR_FLUSH_ADAPTIVE_COUNT,
+	MONITOR_FLUSH_ADAPTIVE_PAGES,
+	MONITOR_FLUSH_ASYNC_TOTAL_PAGE,
+	MONITOR_FLUSH_ASYNC_COUNT,
+	MONITOR_FLUSH_ASYNC_PAGES,
+	MONITOR_FLUSH_SYNC_TOTAL_PAGE,
+	MONITOR_FLUSH_SYNC_COUNT,
+	MONITOR_FLUSH_SYNC_PAGES,
+	MONITOR_FLUSH_BACKGROUND_TOTAL_PAGE,
+	MONITOR_FLUSH_BACKGROUND_COUNT,
+	MONITOR_FLUSH_BACKGROUND_PAGES,
+	MONITOR_LRU_BATCH_SCANNED,
+	MONITOR_LRU_BATCH_SCANNED_NUM_CALL,
+	MONITOR_LRU_BATCH_SCANNED_PER_CALL,
+	MONITOR_LRU_BATCH_TOTAL_PAGE,
+	MONITOR_LRU_BATCH_COUNT,
+	MONITOR_LRU_BATCH_PAGES,
+	MONITOR_LRU_SINGLE_FLUSH_SCANNED,
+	MONITOR_LRU_SINGLE_FLUSH_SCANNED_NUM_CALL,
+	MONITOR_LRU_SINGLE_FLUSH_SCANNED_PER_CALL,
+	MONITOR_LRU_SINGLE_FLUSH_FAILURE_COUNT,
+	MONITOR_LRU_GET_FREE_SEARCH,
+	MONITOR_LRU_SEARCH_SCANNED,
+	MONITOR_LRU_SEARCH_SCANNED_NUM_CALL,
+	MONITOR_LRU_SEARCH_SCANNED_PER_CALL,
+	MONITOR_LRU_UNZIP_SEARCH_SCANNED,
+	MONITOR_LRU_UNZIP_SEARCH_SCANNED_NUM_CALL,
+	MONITOR_LRU_UNZIP_SEARCH_SCANNED_PER_CALL,
 
 	/* Buffer Page I/O specific counters. */
 	MONITOR_MODULE_BUF_PAGE,

=== modified file 'storage/innobase/include/srv0srv.h'
--- a/storage/innobase/include/srv0srv.h	revid:marc.alff@stripped
+++ b/storage/innobase/include/srv0srv.h	revid:inaam.rana@stripped
@@ -178,6 +178,10 @@ extern ulint	srv_buf_pool_size;	/*!< req
 extern ulint    srv_buf_pool_instances; /*!< requested number of buffer pool instances */
 extern ulong	srv_n_page_hash_locks;	/*!< number of locks to
 					protect buf_pool->page_hash */
+extern ulong	srv_LRU_scan_depth;	/*!< Scan depth for LRU
+					flush batch */
+extern my_bool	srv_flush_neighbors;	/*!< whether or not to flush
+					neighbors of a block */
 extern ulint	srv_buf_pool_old_size;	/*!< previously requested size */
 extern ulint	srv_buf_pool_curr_size;	/*!< current size in bytes */
 extern ulint	srv_mem_pool_size;
@@ -230,6 +234,7 @@ extern unsigned long long	srv_stats_tran
 extern unsigned long long	srv_stats_persistent_sample_pages;
 
 extern ibool	srv_use_doublewrite_buf;
+extern ulong	srv_doublewrite_batch_size;
 extern ibool	srv_use_checksums;
 
 extern ulong	srv_max_buf_pool_modified_pct;

=== modified file 'storage/innobase/include/trx0sys.h'
--- a/storage/innobase/include/trx0sys.h	revid:marc.alff@stripped
+++ b/storage/innobase/include/trx0sys.h	revid:inaam.rana@stripped
@@ -659,6 +659,14 @@ struct trx_doublewrite_struct{
 	ulint	block2;		/*!< page number of the second block */
 	ulint	first_free;	/*!< first free position in write_buf measured
 				in units of UNIV_PAGE_SIZE */
+	ulint	n_reserved;	/*!< number of slots currently reserved
+				for single page flushes. */
+	ibool*	in_use;		/*!< flag used to indicate if a slot is
+				in use. Only used for single page
+				flushes. */
+	ibool	batch_running;	/*!< set to TRUE if currently a batch
+				is being written from the doublewrite
+				buffer. */
 	byte*	write_buf;	/*!< write buffer used in writing to the
 				doublewrite buffer, aligned to an
 				address divisible by UNIV_PAGE_SIZE

=== modified file 'storage/innobase/log/log0log.c'
--- a/storage/innobase/log/log0log.c	revid:marc.alff@stripped
+++ b/storage/innobase/log/log0log.c	revid:inaam.rana@stripped
@@ -1644,8 +1644,11 @@ log_preflush_pool_modified_pages(
 		return(FALSE);
 	}
 
-	MONITOR_INC(MONITOR_NUM_SYNC_FLUSHES);
-	MONITOR_SET(MONITOR_FLUSH_SYNC_PAGES, n_pages);
+	MONITOR_INC_VALUE_CUMULATIVE(
+		MONITOR_FLUSH_SYNC_TOTAL_PAGE,
+		MONITOR_FLUSH_SYNC_COUNT,
+		MONITOR_FLUSH_SYNC_PAGES,
+		n_pages);
 
 	return(TRUE);
 }
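MONITOR_INC_VALUE_CUMULATIVE replaces the old MONITOR_INC/MONITOR_SET pair with a three-counter set: a running total of pages (the set owner), a count of flush events, and the page count of the current event, from which the _per_call members are derived. Assuming that reading of the macro (the authoritative definition is in srv0mon.h), its effect can be modeled as:

/* Model of one cumulative monitor set, e.g. the FLUSH_SYNC one. */
typedef struct {
	unsigned long	total_pages;	/* set owner: running total */
	unsigned long	count;		/* number of flush events */
	unsigned long	pages;		/* pages in this event */
} model_cumulative_set_t;

static void
model_inc_value_cumulative(model_cumulative_set_t *set,
			   unsigned long n_pages)
{
	set->total_pages += n_pages;
	set->count += 1;
	set->pages = n_pages;
	/* a "per call" reading is then total_pages / count */
}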

=== modified file 'storage/innobase/srv/srv0mon.c'
--- a/storage/innobase/srv/srv0mon.c	revid:marc.alff@stripped
+++ b/storage/innobase/srv/srv0mon.c	revid:inaam.rana@stripped
@@ -253,55 +253,7 @@ static monitor_info_t	innodb_counter_inf
 	 MONITOR_EXISTING | MONITOR_DEFAULT_ON, 0,
 	 MONITOR_OVLD_BYTE_WRITTEN},
 
-	{"buffer_flush_adaptive_flushes", "buffer",
-	 "Occurrences of adaptive flush", 0, 0,
-	 MONITOR_NUM_ADAPTIVE_FLUSHES},
-
-	{"buffer_flush_adaptive_pages", "buffer",
-	 "Number of pages flushed as part of adaptive flushing",
-	 MONITOR_DISPLAY_CURRENT, 0,
-	 MONITOR_FLUSH_ADAPTIVE_PAGES},
-
-	{"buffer_flush_async_flushes", "buffer",
-	 "Occurrences of async flush",
-	 0, 0, MONITOR_NUM_ASYNC_FLUSHES},
-
-	{"buffer_flush_async_pages", "buffer",
-	 "Number of pages flushed as part of async flushing",
-	 MONITOR_DISPLAY_CURRENT, 0, MONITOR_FLUSH_ASYNC_PAGES},
-
-	{"buffer_flush_sync_flushes", "buffer", "Number of sync flushes",
-	 0, 0, MONITOR_NUM_SYNC_FLUSHES},
-
-	{"buffer_flush_sync_pages", "buffer",
-	 "Number of pages flushed as part of sync flushing",
-	 MONITOR_DISPLAY_CURRENT, 0, MONITOR_FLUSH_SYNC_PAGES},
-
-	{"buffer_flush_max_dirty_flushes", "buffer",
-	 "Number of flushes as part of max dirty page flush",
-	 0, 0, MONITOR_NUM_MAX_DIRTY_FLUSHES},
-
-	{"buffer_flush_max_dirty_pages", "buffer",
-	 "Number of pages flushed as part of max dirty flushing",
-	 MONITOR_DISPLAY_CURRENT, 0, MONITOR_FLUSH_MAX_DIRTY_PAGES},
-
-	{"buffer_flush_free_margin_flushes", "buffer",
-	 "Number of flushes due to lack of replaceable pages in free list",
-	 0, 0, MONITOR_NUM_FREE_MARGIN_FLUSHES},
-
-	{"buffer_flush_free_margin_pages", "buffer",
-	 "Number of pages flushed due to lack of replaceable pages"
-	 " in free list",
-	 MONITOR_DISPLAY_CURRENT, 0, MONITOR_FLUSH_FREE_MARGIN_PAGES},
-
-	{"buffer_flush_io_capacity_pct", "buffer",
-	 "Percent of Server I/O capacity during flushing",
-	 MONITOR_DISPLAY_CURRENT, 0, MONITOR_FLUSH_IO_CAPACITY_PCT},
-
-	/* Following three counters are of one monitor set, with
-	"buffer_flush_batch_scanned" being the set owner, and averaged
-	by "buffer_flush_batch_scanned_num_calls" */
-
+	/* Cumulative counter for scanning in flush batches */
 	{"buffer_flush_batch_scanned", "buffer",
 	 "Total pages scanned as part of flush batch",
 	 MONITOR_SET_OWNER,
@@ -314,16 +266,13 @@ static monitor_info_t	innodb_counter_inf
 	 MONITOR_FLUSH_BATCH_SCANNED_NUM_CALL},
 
 	{"buffer_flush_batch_scanned_per_call", "buffer",
-	 "Page scanned per flush batch scanned",
+	 "Pages scanned per flush batch scan",
 	 MONITOR_SET_MEMBER, MONITOR_FLUSH_BATCH_SCANNED,
 	 MONITOR_FLUSH_BATCH_SCANNED_PER_CALL},
 
-	/* Following three counters are of one monitor set, with
-	"buffer_flush_batch_scanned" being the set owner, and averaged
-	by "buffer_flush_batch_count" */
-
+	/* Cumulative counter for pages flushed in flush batches */
 	{"buffer_flush_batch_total_pages", "buffer",
-	 "Total pages scanned as part of flush batch",
+	 "Total pages flushed as part of flush batch",
 	 MONITOR_SET_OWNER, MONITOR_FLUSH_BATCH_COUNT,
 	 MONITOR_FLUSH_BATCH_TOTAL_PAGE},
 
@@ -333,16 +282,196 @@ static monitor_info_t	innodb_counter_inf
 	 MONITOR_FLUSH_BATCH_COUNT},
 
 	{"buffer_flush_batch_pages", "buffer",
-	 "Page queued as a flush batch",
+	 "Pages queued as a flush batch",
 	 MONITOR_SET_MEMBER, MONITOR_FLUSH_BATCH_TOTAL_PAGE,
 	 MONITOR_FLUSH_BATCH_PAGES},
 
-	{"buffer_flush_by_lru", "buffer",
-	 "buffer flushed via LRU list", 0, 0, MONITOR_BUF_FLUSH_LRU},
+	/* Cumulative counter for flush batches because of neighbor */
+	{"buffer_flush_neighbor_total_pages", "buffer",
+	 "Total neighbors flushed as part of neighbor flush",
+	 MONITOR_SET_OWNER, MONITOR_FLUSH_NEIGHBOR_COUNT,
+	 MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE},
+
+	{"buffer_flush_neighbor", "buffer",
+	 "Number of times neighbors flushing is invoked",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE,
+	 MONITOR_FLUSH_NEIGHBOR_COUNT},
+
+	{"buffer_flush_neighbor_pages", "buffer",
+	 "Pages queued as a neighbor batch",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE,
+	 MONITOR_FLUSH_NEIGHBOR_PAGES},
+
+	/* Cumulative counter for flush batches because of max_dirty */
+	{"buffer_flush_max_dirty_total_pages", "buffer",
+	 "Total pages flushed as part of max_dirty batches",
+	 MONITOR_SET_OWNER, MONITOR_FLUSH_MAX_DIRTY_COUNT,
+	 MONITOR_FLUSH_MAX_DIRTY_TOTAL_PAGE},
+
+	{"buffer_flush_max_dirty", "buffer",
+	 "Number of max_dirty batches",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_MAX_DIRTY_TOTAL_PAGE,
+	 MONITOR_FLUSH_MAX_DIRTY_COUNT},
+
+	{"buffer_flush_max_dirty_pages", "buffer",
+	 "Pages queued as a max_dirty batch",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_MAX_DIRTY_TOTAL_PAGE,
+	 MONITOR_FLUSH_MAX_DIRTY_PAGES},
+
+	/* Cumulative counter for flush batches because of adaptive */
+	{"buffer_flush_adaptive_total_pages", "buffer",
+	 "Total pages flushed as part of adaptive batches",
+	 MONITOR_SET_OWNER, MONITOR_FLUSH_ADAPTIVE_COUNT,
+	 MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE},
+
+	{"buffer_flush_adaptive", "buffer",
+	 "Number of adaptive batches",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
+	 MONITOR_FLUSH_ADAPTIVE_COUNT},
+
+	{"buffer_flush_adaptive_pages", "buffer",
+	 "Pages queued as an adaptive batch",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_ADAPTIVE_TOTAL_PAGE,
+	 MONITOR_FLUSH_ADAPTIVE_PAGES},
+
+	/* Cumulative counter for flush batches because of async */
+	{"buffer_flush_async_total_pages", "buffer",
+	 "Total pages flushed as part of async batches",
+	 MONITOR_SET_OWNER, MONITOR_FLUSH_ASYNC_COUNT,
+	 MONITOR_FLUSH_ASYNC_TOTAL_PAGE},
+
+	{"buffer_flush_async", "buffer",
+	 "Number of async batches",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_ASYNC_TOTAL_PAGE,
+	 MONITOR_FLUSH_ASYNC_COUNT},
+
+	{"buffer_flush_async_pages", "buffer",
+	 "Pages queued as an async batch",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_ASYNC_TOTAL_PAGE,
+	 MONITOR_FLUSH_ASYNC_PAGES},
+
+	/* Cumulative counter for flush batches because of sync */
+	{"buffer_flush_sync_total_pages", "buffer",
+	 "Total pages flushed as part of sync batches",
+	 MONITOR_SET_OWNER, MONITOR_FLUSH_SYNC_COUNT,
+	 MONITOR_FLUSH_SYNC_TOTAL_PAGE},
+
+	{"buffer_flush_sync", "buffer",
+	 "Number of sync batches",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_SYNC_TOTAL_PAGE,
+	 MONITOR_FLUSH_SYNC_COUNT},
+
+	{"buffer_flush_sync_pages", "buffer",
+	 "Pages queued as a sync batch",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_SYNC_TOTAL_PAGE,
+	 MONITOR_FLUSH_SYNC_PAGES},
+
+	/* Cumulative counter for flush batches because of background */
+	{"buffer_flush_background_total_pages", "buffer",
+	 "Total pages flushed as part of background batches",
+	 MONITOR_SET_OWNER, MONITOR_FLUSH_BACKGROUND_COUNT,
+	 MONITOR_FLUSH_BACKGROUND_TOTAL_PAGE},
+
+	{"buffer_flush_background", "buffer",
+	 "Number of background batches",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_BACKGROUND_TOTAL_PAGE,
+	 MONITOR_FLUSH_BACKGROUND_COUNT},
+
+	{"buffer_flush_background_pages", "buffer",
+	 "Pages queued as a background batch",
+	 MONITOR_SET_MEMBER, MONITOR_FLUSH_BACKGROUND_TOTAL_PAGE,
+	 MONITOR_FLUSH_BACKGROUND_PAGES},
+
+	/* Cumulative counter for LRU batch scan */
+	{"buffer_LRU_batch_scanned", "buffer",
+	 "Total pages scanned as part of LRU batch",
+	 MONITOR_SET_OWNER, MONITOR_LRU_BATCH_SCANNED_NUM_CALL,
+	 MONITOR_LRU_BATCH_SCANNED},
+
+	{"buffer_LRU_batch_num_scan", "buffer",
+	 "Number of times LRU batch is called",
+	 MONITOR_SET_MEMBER, MONITOR_LRU_BATCH_SCANNED,
+	 MONITOR_LRU_BATCH_SCANNED_NUM_CALL},
+
+	{"buffer_LRU_batch_scanned_per_call", "buffer",
+	 "Pages scanned per LRU batch call",
+	 MONITOR_SET_MEMBER, MONITOR_LRU_BATCH_SCANNED,
+	 MONITOR_LRU_BATCH_SCANNED_PER_CALL},
+
+	/* Cumulative counter for LRU batch pages flushed */
+	{"buffer_LRU_batch_total_pages", "buffer",
+	 "Total pages flushed as part of LRU batches",
+	 MONITOR_SET_OWNER, MONITOR_LRU_BATCH_COUNT,
+	 MONITOR_LRU_BATCH_TOTAL_PAGE},
+
+	{"buffer_LRU_batches", "buffer",
+	 "Number of LRU batches",
+	 MONITOR_SET_MEMBER, MONITOR_LRU_BATCH_TOTAL_PAGE,
+	 MONITOR_LRU_BATCH_COUNT},
+
+	{"buffer_LRU_batch_pages", "buffer",
+	 "Pages queued as an LRU batch",
+	 MONITOR_SET_MEMBER, MONITOR_LRU_BATCH_TOTAL_PAGE,
+	 MONITOR_LRU_BATCH_PAGES},
+
+	/* Cumulative counter for single page LRU scans */
+	{"buffer_LRU_single_flush_scanned", "buffer",
+	 "Total pages scanned as part of single page LRU flush",
+	 MONITOR_SET_OWNER,
+	 MONITOR_LRU_SINGLE_FLUSH_SCANNED_NUM_CALL,
+	 MONITOR_LRU_SINGLE_FLUSH_SCANNED},
+
+	{"buffer_LRU_single_flush_num_scan", "buffer",
+	 "Number of times single page LRU flush is called",
+	 MONITOR_SET_MEMBER, MONITOR_LRU_SINGLE_FLUSH_SCANNED,
+	 MONITOR_LRU_SINGLE_FLUSH_SCANNED_NUM_CALL},
+
+	{"buffer_LRU_single_flush_scanned_per_call", "buffer",
+	 "Pages scanned per single page LRU flush",
+	 MONITOR_SET_MEMBER, MONITOR_LRU_SINGLE_FLUSH_SCANNED,
+	 MONITOR_LRU_SINGLE_FLUSH_SCANNED_PER_CALL},
+
+	{"buffer_LRU_single_flush_failure_count", "buffer",
+	 "Number of times an attempt to flush a single page from LRU failed",
+	 0, 0, MONITOR_LRU_SINGLE_FLUSH_FAILURE_COUNT},
+
+	{"buffer_LRU_get_free_search", "buffer",
+	 "Number of searches performed for a clean page",
+	 0, 0, MONITOR_LRU_GET_FREE_SEARCH},
+
+	/* Cumulative counter for LRU search scans */
+	{"buffer_LRU_search_scanned", "buffer",
+	 "Total pages scanned as part of LRU search",
+	 MONITOR_SET_OWNER,
+	 MONITOR_LRU_SEARCH_SCANNED_NUM_CALL,
+	 MONITOR_LRU_SEARCH_SCANNED},
+
+	{"buffer_LRU_search_num_scan", "buffer",
+	 "Number of times LRU search is performed",
+	 MONITOR_SET_MEMBER, MONITOR_LRU_SEARCH_SCANNED,
+	 MONITOR_LRU_SEARCH_SCANNED_NUM_CALL},
+
+	{"buffer_LRU_search_scanned_per_call", "buffer",
+	 "Pages scanned per LRU search call",
+	 MONITOR_SET_MEMBER, MONITOR_LRU_SEARCH_SCANNED,
+	 MONITOR_LRU_SEARCH_SCANNED_PER_CALL},
+
+	/* Cumulative counter for LRU unzip search scans */
+	{"buffer_LRU_unzip_search_scanned", "buffer",
+	 "Total pages scanned as part of LRU unzip search",
+	 MONITOR_SET_OWNER,
+	 MONITOR_LRU_UNZIP_SEARCH_SCANNED_NUM_CALL,
+	 MONITOR_LRU_UNZIP_SEARCH_SCANNED},
 
-	{"buffer_flush_by_list", "buffer",
-	 "buffer flushed via flush list of dirty pages",
-	 0, 0, MONITOR_BUF_FLUSH_LIST},
+	{"buffer_LRU_unzip_search_num_scan", "buffer",
+	 "Number of times LRU unzip search is performed",
+	 MONITOR_SET_MEMBER, MONITOR_LRU_UNZIP_SEARCH_SCANNED,
+	 MONITOR_LRU_UNZIP_SEARCH_SCANNED_NUM_CALL},
+
+	{"buffer_LRU_unzip_search_scanned_per_call", "buffer",
+	 "Pages scanned per LRU unzip search call",
+	 MONITOR_SET_MEMBER, MONITOR_LRU_UNZIP_SEARCH_SCANNED,
+	 MONITOR_LRU_UNZIP_SEARCH_SCANNED_PER_CALL},
 
 	/* ========== Counters for Buffer Page I/O ========== */
 	{"module_buffer_page", "buffer_page_io", "Buffer Page I/O Module",

=== modified file 'storage/innobase/srv/srv0srv.c'
--- a/storage/innobase/srv/srv0srv.c	revid:marc.alff@stripped
+++ b/storage/innobase/srv/srv0srv.c	revid:inaam.rana@stripped
@@ -204,6 +204,10 @@ UNIV_INTERN ulint	srv_buf_pool_size	= UL
 UNIV_INTERN ulint       srv_buf_pool_instances  = 1;
 /* number of locks to protect buf_pool->page_hash */
 UNIV_INTERN ulong	srv_n_page_hash_locks = 16;
+/** Scan depth for LRU flush batch i.e.: number of blocks scanned */
+UNIV_INTERN ulong	srv_LRU_scan_depth	= 1024;
+/** whether or not to flush neighbors of a block */
+UNIV_INTERN my_bool	srv_flush_neighbors	= TRUE;
 /* previously requested size */
 UNIV_INTERN ulint	srv_buf_pool_old_size;
 /* current size in kilobytes */
@@ -345,6 +349,12 @@ UNIV_INTERN unsigned long long	srv_stats
 UNIV_INTERN unsigned long long	srv_stats_persistent_sample_pages = 20;
 
 UNIV_INTERN ibool	srv_use_doublewrite_buf	= TRUE;
+
+/** The doublewrite buffer is 2MB in size i.e.: it can hold 128 16K
+pages. The following parameter is the number of pages that are
+reserved for batch flushing i.e.: LRU flushing and flush_list
+flushing. The rest of the pages (8 with the default value of 120)
+are used for single page flushing. */
 UNIV_INTERN ibool	srv_use_checksums = TRUE;
 
 UNIV_INTERN ulong	srv_replication_delay		= 0;

=== modified file 'storage/innobase/trx/trx0sys.c'
--- a/storage/innobase/trx/trx0sys.c	revid:marc.alff@stripped
+++ b/storage/innobase/trx/trx0sys.c	revid:inaam.rana@stripped
@@ -180,7 +180,18 @@ trx_doublewrite_init(
 	byte*	doublewrite)	/*!< in: pointer to the doublewrite buf
 				header on trx sys page */
 {
-	trx_doublewrite = mem_alloc(sizeof(trx_doublewrite_t));
+	ulint	buf_size;
+
+	trx_doublewrite = mem_zalloc(sizeof(trx_doublewrite_t));
+
+	/* There are two blocks of the same size in the doublewrite
+	buffer. */
+	buf_size = 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE;
+
+	/* There must be at least one page reserved for single page
+	writes and at least one for batch writes. */
+	ut_a(srv_doublewrite_batch_size > 0
+	     && srv_doublewrite_batch_size < buf_size);
 
 	/* Since we now start to use the doublewrite buffer, no need to call
 	fsync() after every write to a data file */
@@ -192,18 +203,22 @@ trx_doublewrite_init(
 		     &trx_doublewrite->mutex, SYNC_DOUBLEWRITE);
 
 	trx_doublewrite->first_free = 0;
+	trx_doublewrite->n_reserved = 0;
 
 	trx_doublewrite->block1 = mach_read_from_4(
 		doublewrite + TRX_SYS_DOUBLEWRITE_BLOCK1);
 	trx_doublewrite->block2 = mach_read_from_4(
 		doublewrite + TRX_SYS_DOUBLEWRITE_BLOCK2);
-	trx_doublewrite->write_buf_unaligned = ut_malloc(
-		(1 + 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE) * UNIV_PAGE_SIZE);
 
+	trx_doublewrite->in_use = mem_zalloc(buf_size * sizeof(ibool));
+
+	trx_doublewrite->write_buf_unaligned = ut_malloc(
+		(1 + buf_size) * UNIV_PAGE_SIZE);
 	trx_doublewrite->write_buf = ut_align(
 		trx_doublewrite->write_buf_unaligned, UNIV_PAGE_SIZE);
-	trx_doublewrite->buf_block_arr = mem_alloc(
-		2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE * sizeof(void*));
+
+	trx_doublewrite->buf_block_arr = mem_zalloc(
+		buf_size * sizeof(void*));
 }
 
 /****************************************************************//**
@@ -1673,12 +1688,17 @@ trx_sys_close(void)
 
 	/* Free the double write data structures. */
 	ut_a(trx_doublewrite != NULL);
+	ut_ad(trx_doublewrite->n_reserved == 0);
+
 	ut_free(trx_doublewrite->write_buf_unaligned);
 	trx_doublewrite->write_buf_unaligned = NULL;
 
 	mem_free(trx_doublewrite->buf_block_arr);
 	trx_doublewrite->buf_block_arr = NULL;
 
+	mem_free(trx_doublewrite->in_use);
+	trx_doublewrite->in_use = NULL;
+
 	mutex_free(&trx_doublewrite->mutex);
 	mem_free(trx_doublewrite);
 	trx_doublewrite = NULL;
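The in_use/n_reserved bookkeeping initialized above suggests slot management along the following lines for the single page part of the doublewrite buffer. This is a standalone model with hypothetical names; the committed reservation logic lives in the flush code, not here:

#include <stdbool.h>
#include <stddef.h>

enum { MODEL_DBLWR_SLOTS = 128 };	/* two 64-page blocks */

typedef struct {
	bool	in_use[MODEL_DBLWR_SLOTS];
	size_t	n_reserved;
	size_t	batch_size;	/* srv_doublewrite_batch_size */
} model_dblwr_t;

/* Reserve a slot in the single page flush part of the buffer,
i.e. one of the slots in [batch_size, MODEL_DBLWR_SLOTS).
Returns the slot index, or MODEL_DBLWR_SLOTS if all are busy.
Assumes the doublewrite mutex is held by the caller. */
static size_t
model_dblwr_reserve_single_slot(model_dblwr_t *dw)
{
	size_t	i;

	for (i = dw->batch_size; i < MODEL_DBLWR_SLOTS; i++) {
		if (!dw->in_use[i]) {
			dw->in_use[i] = true;
			dw->n_reserved++;
			return(i);
		}
	}

	return(MODEL_DBLWR_SLOTS);	/* caller waits and retries */
}

/* Release the slot once the page has been written to the data
file and the write has been forced to disk. */
static void
model_dblwr_release_single_slot(model_dblwr_t *dw, size_t i)
{
	dw->in_use[i] = false;
	dw->n_reserved--;
}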

