From: ingo Date: September 1 2006 10:35am Subject: bk commit into 5.0 tree (istruewing:1.2208) BUG#17332 List-Archive: http://lists.mysql.com/commits/11242 X-Bug: 17332 Message-Id: Below is the list of changes that have just been committed into a local 5.0 repository of istruewing. When istruewing does a push these changes will be propagated to the main repository and, within 24 hours after the push, to the public repository. For information on how to access the public repository see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html ChangeSet@stripped, 2006-09-01 12:34:55+02:00, istruewing@stripped +2 -0 Bug#17332 - changing key_buffer_size on a running server can crash under load Resizing a key cache while it was in heavy use could crash the server. There were several race conditions. I reworked some of the algorithms to fix the race conditions. include/keycache.h@stripped, 2006-09-01 12:34:53+02:00, istruewing@stripped +2 -0 Bug#17332 - changing key_buffer_size on a running server can crash under load Added a wait queue for use by the "resizer" if it has to wait for other threads to finish with blocks after the flush is complete. mysys/mf_keycache.c@stripped, 2006-09-01 12:34:53+02:00, istruewing@stripped +401 -113 Bug#17332 - changing key_buffer_size on a running server can crash under load Changed the resize_queue handling so that the "resizer" does not hang in this queue while it flushes the cache. A thread can only wait in one queue. The "resizer" needs to wait for other thing during the flush. So it has to leave the queue as soon as it is in charge of resizing. Added a new queue to be used if the resizer has to wait for other threads to finish with blocks after the flush is complete. Added a check if a block has been freed while a thread was waiting for it. Added a check for a changed page when writing during resize. A changed block must be returned as the caller might want to overwrite a part of the block only. Added checks for blocks marked in switch or to be freed by another thread. These must not be freed by this thread. Added flush handling to the eviction procedure. If a block is marked in flush, the "evicter" has to wait until this is complete. Otherwise it has to mark it in flush before releasing the mutex. Changed the assignment of a block status bit after a first-time read to an addition of the bit to the block status. Other status bits must not be cleared here. The block might have been selected to be freed while the read was in progress. Changed the flush handling for changed blocks so that it skips blocks which are marked in flush already and waits for them later. Changed the flush handlind for clean blocks (this means freeing them) so that it skips blocks which are are marked in switch or to be freed. Also we need to restart from the beginning of the chain whenever we detect that the next block changed its state. It might not be at the same position in the chain any more. The change for the next block can happen while we need to wait for readers of the current block before we can free it. Changed the approach for flushing all blocks. It was not safe to select blocks from the LRU ring to find a file for flush. The LRU ring can be empty though blocks in use exist. The only safe way is to go through the 'changed_blocks' and 'file_blocks' hashes. This two-step approach has the advantage that changed blocks do not need to be freed immediately after their write. They can remain in the cache for further reads until the second step. # This is a BitKeeper patch. What follows are the unified diffs for the # set of deltas contained in the patch. The rest of the patch, the part # that BitKeeper cares about, is below these diffs. # User: istruewing # Host: chilla.local # Root: /home/mydev/mysql-5.0-bug17332-one --- 1.56/mysys/mf_keycache.c 2006-09-01 12:35:01 +02:00 +++ 1.57/mysys/mf_keycache.c 2006-09-01 12:35:01 +02:00 @@ -413,6 +413,7 @@ int init_key_cache(KEY_CACHE *keycache, keycache->resize_in_flush= 0; keycache->can_be_used= 1; + keycache->waiting_for_resize_cnt.last_thread= NULL; keycache->waiting_for_hash_link.last_thread= NULL; keycache->waiting_for_block.last_thread= NULL; DBUG_PRINT("exit", @@ -503,14 +504,22 @@ int resize_key_cache(KEY_CACHE *keycache #ifdef THREAD wqueue= &keycache->resize_queue; thread= my_thread_var; - link_into_queue(wqueue, thread); - while (wqueue->last_thread->next != thread) + /* We may need to wait for another thread which is doing a resize already. */ + while (keycache->resize_in_flush) { + /* Enter the resize_queue every time as the signalling thread removes us. */ + link_into_queue(wqueue, thread); keycache_pthread_cond_wait(&thread->suspend, &keycache->cache_lock); + /* We are removed from the resize_queue here. */ } #endif + /* + Mark the operation in progress. This blocks other threads from doing + a resize in parallel. And it prohibits changes to cache blocks. + Writing of index blocks has to bypass the cache. + */ keycache->resize_in_flush= 1; if (flush_all_key_blocks(keycache)) { @@ -520,19 +529,46 @@ int resize_key_cache(KEY_CACHE *keycache keycache->can_be_used= 0; goto finish; } + + /* + Mark the flush finished. This would allow other threads to start a + resize. But we do still have the cache_lock and do not release it + until the end of this function. + */ keycache->resize_in_flush= 0; + + /* Mark the cache unusable. This prohibits all uses of the cache. */ keycache->can_be_used= 0; + #ifdef THREAD + /* + Some cache read/write operations may still be unfinished. Wait until + they are done. However I doubt that this can ever happen. During + flush_all_key_blocks() we waited for every block in use and freed + all blocks. There cannot be any block in use any more (IMHO). + */ while (keycache->cnt_for_resize_op) { + /* + Enter the waiting_for_resize_cnt queue every time as the + signalling thread removes us. + */ + link_into_queue(&keycache->waiting_for_resize_cnt, thread); KEYCACHE_DBUG_PRINT("resize_key_cache: wait", ("suspend thread %ld", thread->id)); keycache_pthread_cond_wait(&thread->suspend, &keycache->cache_lock); + /* We are removed from the waiting_for_resize_cnt queue here. */ } #else KEYCACHE_DBUG_ASSERT(keycache->cnt_for_resize_op == 0); #endif + /* + Free old cache structures, allocate new structures, and initialize + them. Note that the cache_lock mutex and the resize_queue are left + untouched. We do not lose the cache_lock and will release it only at + the end of this function. + */ end_key_cache(keycache, 0); /* Don't free mutex */ /* The following will work even if use_mem is 0 */ blocks= init_key_cache(keycache, key_cache_block_size, use_mem, @@ -540,13 +576,17 @@ int resize_key_cache(KEY_CACHE *keycache finish: #ifdef THREAD - unlink_from_queue(wqueue, thread); /* Signal for the next resize request to proceeed if any */ if (wqueue->last_thread) { KEYCACHE_DBUG_PRINT("resize_key_cache: signal", ("thread %ld", wqueue->last_thread->next->id)); keycache_pthread_cond_signal(&wqueue->last_thread->next->suspend); + /* + Remove the signalled thread from the resize_queue. It will enter + itself again if necessary. + */ + unlink_from_queue(wqueue, wqueue->last_thread->next); } #endif keycache_pthread_mutex_unlock(&keycache->cache_lock); @@ -572,11 +612,17 @@ static inline void dec_counter_for_resiz #ifdef THREAD struct st_my_thread_var *last_thread; if (!--keycache->cnt_for_resize_op && - (last_thread= keycache->resize_queue.last_thread)) + (last_thread= keycache->waiting_for_resize_cnt.last_thread)) { + /* + The count became zero and there is a thread waiting for this + event. Wake it. + */ KEYCACHE_DBUG_PRINT("dec_counter_for_resize_op: signal", ("thread %ld", last_thread->next->id)); keycache_pthread_cond_signal(&last_thread->next->suspend); + /* Unlink the signalled thread from the queue. */ + unlink_from_queue(&keycache->waiting_for_resize_cnt, last_thread->next); } #else keycache->cnt_for_resize_op--; @@ -819,6 +865,10 @@ static inline void unlink_changed(BLOCK_ if (block->next_changed) block->next_changed->prev_changed= block->prev_changed; *block->prev_changed= block->next_changed; + + /* Avoid accidental traverse into the wrong chain. */ + block->next_changed= NULL; + block->prev_changed= NULL; } @@ -1138,6 +1188,9 @@ static inline void wait_for_readers(KEY_ block->condvar= &thread->suspend; keycache_pthread_cond_wait(&thread->suspend, &keycache->cache_lock); block->condvar= NULL; + /* The other thread might have freed the block in between. */ + if (!block->hash_link) + break; } #else KEYCACHE_DBUG_ASSERT(block->hash_link->requests == 0); @@ -1377,29 +1430,54 @@ restart: if (page_status != PAGE_READ) { - /* We don't need the page in the cache: we are going to write on disk */ + /* + The page is not in the cache. We don't need the page in the + cache: we are going to write on disk. Cancel the request. + */ hash_link->requests--; unlink_hash(keycache, hash_link); - return 0; + DBUG_RETURN(0); } + + /* The page is in the cache. */ + if (!(block->status & BLOCK_IN_FLUSH)) { - hash_link->requests--; + /* The page is not yet or no longer part of a flush operation. */ + if (block->status & BLOCK_CHANGED) + { + /* The block has changed contents, which must not be lost. */ + reg_requests(keycache, block, 1); + *page_st= page_status; + DBUG_RETURN(block); + } + /* - Remove block to invalidate the page in the block buffer - as we are going to write directly on disk. - Although we have an exlusive lock for the updated key part - the control can be yieded by the current thread as we might - have unfinished readers of other key parts in the block - buffer. Still we are guaranteed not to have any readers - of the key part we are writing into until the block is - removed from the cache as we set the BLOCL_REASSIGNED - flag (see the code below that handles reading requests). + Unregister from the hash_link. This must be done before freeing + the block. And it must also be done if not freeing it. */ - free_block(keycache, block); - return 0; + hash_link->requests--; + + /* If the block is not in eviction and not in free, we can free it. */ + if (!(block->status & (BLOCK_IN_SWITCH | BLOCK_REASSIGNED))) + { + /* + Remove block to invalidate the page in the block buffer + as we are going to write directly on disk. + Although we have an exlusive lock for the updated key part + the control can be yielded by the current thread as we might + have unfinished readers of other key parts in the block + buffer. Still we are guaranteed not to have any readers + of the key part we are writing into until the block is + removed from the cache as we set the BLOCK_REASSIGNED + flag (see the code below that handles reading requests). + */ + free_block(keycache, block); + } + DBUG_RETURN(0); } - /* Wait intil the page is flushed on disk */ + + /* Wait until the page is flushed on disk */ hash_link->requests--; { #ifdef THREAD @@ -1424,10 +1502,14 @@ restart: */ #endif } - /* Invalidate page in the block if it has not been done yet */ - if (block->status) + /* + Invalidate page in the block if it has not been done yet, and if + it is neither marked for eviction nor for to be freed. + */ + if (block->status && + !(block->status & (BLOCK_IN_SWITCH | BLOCK_REASSIGNED))) free_block(keycache, block); - return 0; + DBUG_RETURN(0); } if (page_status == PAGE_READ && @@ -1518,17 +1600,20 @@ restart: } else { - /* There are no never used blocks, use a block from the LRU chain */ - - /* - Wait until a new block is added to the LRU chain; - several threads might wait here for the same page, - all of them must get the same block + /* + There are no free blocks and no never used blocks, use a block + from the LRU ring. */ #ifdef THREAD if (! keycache->used_last) { + /* + The LRU ring is empty. Wait until a new block is added to + it. Several threads might wait here for the same page, all + of them must get the same block + */ + struct st_my_thread_var *thread= my_thread_var; thread->opt_info= (void *) hash_link; link_into_queue(&keycache->waiting_for_block, thread); @@ -1545,6 +1630,11 @@ restart: #else KEYCACHE_DBUG_ASSERT(keycache->used_last); #endif + /* + If we waited above, hash_link->block has been assigned by + link_block(). Otherwise it is still NULL. In the latter case + we need to grab a block from the LRU ring ourselves. + */ block= hash_link->block; if (! block) { @@ -1559,6 +1649,11 @@ restart: hash_link->block= block; } + /* + If we had to wait above, there is a small chance that another + thread grabbed this block for the same file block already. But + in most cases the first condition is true. + */ if (block->hash_link != hash_link && ! (block->status & BLOCK_IN_SWITCH) ) { @@ -1573,19 +1668,40 @@ restart: /* The block contains a dirty page - push it out of the cache */ KEYCACHE_DBUG_PRINT("find_key_block", ("block is dirty")); - - keycache_pthread_mutex_unlock(&keycache->cache_lock); - /* - The call is thread safe because only the current - thread might change the block->hash_link value - */ - error= my_pwrite(block->hash_link->file, - block->buffer+block->offset, - block->length - block->offset, - block->hash_link->diskpos+ block->offset, - MYF(MY_NABP | MY_WAIT_IF_FULL)); - keycache_pthread_mutex_lock(&keycache->cache_lock); - keycache->global_cache_write++; + if (block->status & BLOCK_IN_FLUSH) + { +#ifdef THREAD + /* Wait until the block is flushed. */ + struct st_my_thread_var *thread= my_thread_var; + add_to_queue(&block->wqueue[COND_FOR_SAVED], thread); + do + { + KEYCACHE_DBUG_PRINT("find_key_block: wait", + ("suspend thread %ld", thread->id)); + keycache_pthread_cond_wait(&thread->suspend, + &keycache->cache_lock); + } while (thread->next); +#else + KEYCACHE_DBUG_ASSERT(0); + /* No parallel requests in single-threaded case */ +#endif + } + else + { + block->status|= BLOCK_IN_FLUSH; + keycache_pthread_mutex_unlock(&keycache->cache_lock); + /* + The call is thread safe because only the current + thread might change the block->hash_link value + */ + error= my_pwrite(block->hash_link->file, + block->buffer+block->offset, + block->length - block->offset, + block->hash_link->diskpos+ block->offset, + MYF(MY_NABP | MY_WAIT_IF_FULL)); + keycache_pthread_mutex_lock(&keycache->cache_lock); + keycache->global_cache_write++; + } } block->status|= BLOCK_REASSIGNED; @@ -1715,7 +1831,7 @@ static void read_block(KEY_CACHE *keycac block->status|= BLOCK_ERROR; else { - block->status= BLOCK_READ; + block->status|= BLOCK_READ; block->length= got_length; } KEYCACHE_DBUG_PRINT("read_block", @@ -1983,7 +2099,7 @@ int key_cache_insert(KEY_CACHE *keycache keycache_pthread_mutex_lock(&keycache->cache_lock); /* Here we are alone again. */ #endif - block->status= BLOCK_READ; + block->status|= BLOCK_READ; block->length= read_length+offset; KEYCACHE_DBUG_PRINT("key_cache_insert", ("primary request: new page in cache")); @@ -2199,6 +2315,16 @@ static void free_block(KEY_CACHE *keycac KEYCACHE_DBUG_PRINT("free_block", ("block %u to be freed, hash_link %p", BLOCK_NUMBER(block), block->hash_link)); + /* + IMHO the below conditional makes no sense. I can't see how it could + be possible that free_block() is entered with a NULL hash_link + pointer. The only place where it can become NULL is in free_block() + (or before its first use ever, but for those blocks free_block() is + not called). I don't remove the conditional as it cannot harm, but + place an DBUG_ASSERT to confirm my hypothesis. Eventually the + conditional can be removed. + */ + DBUG_ASSERT(block->hash_link); if (block->hash_link) { /* @@ -2209,6 +2335,8 @@ static void free_block(KEY_CACHE *keycac */ block->status|= BLOCK_REASSIGNED; wait_for_readers(keycache, block); + /* The block must not have been freed by another thread. */ + DBUG_ASSERT(block->hash_link); unlink_hash(keycache, block->hash_link); } @@ -2300,6 +2428,8 @@ static int flush_cached_blocks(KEY_CACHE { keycache->blocks_changed--; keycache->global_blocks_changed--; + DBUG_ASSERT((block->status & BLOCK_IN_FLUSH) && + !(block->status & (BLOCK_IN_SWITCH | BLOCK_REASSIGNED))); free_block(keycache, block); } else @@ -2355,6 +2485,7 @@ static int flush_key_blocks_int(KEY_CACH uint count= 0; BLOCK_LINK **pos,**end; BLOCK_LINK *first_in_switch= NULL; + BLOCK_LINK *last_in_flush; BLOCK_LINK *block, *next; #if defined(KEYCACHE_DEBUG) uint cnt=0; @@ -2370,7 +2501,8 @@ static int flush_key_blocks_int(KEY_CACH block ; block= block->next_changed) { - if (block->hash_link->file == file) + if ((block->hash_link->file == file) && + !(block->status & BLOCK_IN_FLUSH)) { count++; KEYCACHE_DBUG_ASSERT(count<= keycache->blocks_used); @@ -2388,6 +2520,7 @@ static int flush_key_blocks_int(KEY_CACH /* Retrieve the blocks and write them to a buffer to be flushed */ restart: + last_in_flush= NULL; end= (pos= cache)+count; for (block= keycache->changed_blocks[FILE_HASH(file)] ; block ; @@ -2400,54 +2533,63 @@ restart: next= block->next_changed; if (block->hash_link->file == file) { - /* - Mark the block with BLOCK_IN_FLUSH in order not to let - other threads to use it for new pages and interfere with - our sequence ot flushing dirty file pages - */ - block->status|= BLOCK_IN_FLUSH; - - if (! (block->status & BLOCK_IN_SWITCH)) + if (!(block->status & BLOCK_IN_FLUSH)) { - /* - We care only for the blocks for which flushing was not - initiated by other threads as a result of page swapping + /* + Mark the block with BLOCK_IN_FLUSH in order not to let + other threads to use it for new pages and interfere with + our sequence ot flushing dirty file pages */ - reg_requests(keycache, block, 1); - if (type != FLUSH_IGNORE_CHANGED) + block->status|= BLOCK_IN_FLUSH; + + if (!(block->status & BLOCK_IN_SWITCH)) { - /* It's not a temporary file */ - if (pos == end) + /* + We care only for the blocks for which flushing was not + initiated by other threads as a result of page swapping + */ + reg_requests(keycache, block, 1); + if (type != FLUSH_IGNORE_CHANGED) { - /* - This happens only if there is not enough - memory for the big block - */ - if ((error= flush_cached_blocks(keycache, file, cache, - end,type))) - last_errno=error; - /* - Restart the scan as some other thread might have changed - the changed blocks chain: the blocks that were in switch - state before the flush started have to be excluded - */ - goto restart; + /* It's not a temporary file */ + if (pos == end) + { + /* + This happens only if there is not enough + memory for the big block + */ + if ((error= flush_cached_blocks(keycache, file, cache, + end,type))) + last_errno=error; + /* + Restart the scan as some other thread might have changed + the changed blocks chain: the blocks that were in switch + state before the flush started have to be excluded + */ + goto restart; + } + *pos++= block; + } + else + { + /* It's a temporary file */ + keycache->blocks_changed--; + keycache->global_blocks_changed--; + DBUG_ASSERT(!(block->status & BLOCK_REASSIGNED)); + free_block(keycache, block); } - *pos++= block; } else { - /* It's a temporary file */ - keycache->blocks_changed--; - keycache->global_blocks_changed--; - free_block(keycache, block); + /* Link the block into a list of blocks 'in switch' */ + unlink_changed(block); + link_changed(block, &first_in_switch); } } else { - /* Link the block into a list of blocks 'in switch' */ - unlink_changed(block); - link_changed(block, &first_in_switch); + /* Remember the last block found to be in flush. */ + last_in_flush= block; } } } @@ -2456,7 +2598,35 @@ restart: if ((error= flush_cached_blocks(keycache, file, cache, pos, type))) last_errno= error; } - /* Wait until list of blocks in switch is empty */ + else if (last_in_flush) + { + /* + There are no blocks to be flushed by this thread, but blocks in + flush by other threads. Wait until one of the blocks is flushed. + */ +#ifdef THREAD + struct st_my_thread_var *thread= my_thread_var; + add_to_queue(&last_in_flush->wqueue[COND_FOR_SAVED], thread); + do + { + KEYCACHE_DBUG_PRINT("flush_key_blocks_int: wait", + ("suspend thread %ld", thread->id)); + keycache_pthread_cond_wait(&thread->suspend, + &keycache->cache_lock); + } while (thread->next); +#else + KEYCACHE_DBUG_ASSERT(0); + /* No parallel requests in single-threaded case */ +#endif + /* Be sure not to lose a block. They may be flushed in random order. */ + goto restart; + } + + /* + Wait until the list of blocks in switch is empty. The threads that + are switching these blocks will relink them to clean file chains + while we wait and thus empty the 'first_in_switch' chain. + */ while (first_in_switch) { #if defined(KEYCACHE_DEBUG) @@ -2485,29 +2655,87 @@ restart: KEYCACHE_DBUG_ASSERT(cnt <= keycache->blocks_used); #endif } - /* The following happens very seldom */ + if (! (type == FLUSH_KEEP || type == FLUSH_FORCE_WRITE)) { -#if defined(KEYCACHE_DEBUG) - cnt=0; -#endif - for (block= keycache->file_blocks[FILE_HASH(file)] ; - block ; - block= next) + uint found; + + /* + Finally free all clean blocks for this file. + During resize this may be run by two threads in parallel. + */ + do { -#if defined(KEYCACHE_DEBUG) - cnt++; - KEYCACHE_DBUG_ASSERT(cnt <= keycache->blocks_used); -#endif - next= block->next_changed; - if (block->hash_link->file == file && - (! (block->status & BLOCK_CHANGED) - || type == FLUSH_IGNORE_CHANGED)) + found= 0; + for (block= keycache->file_blocks[FILE_HASH(file)] ; + block ; + block= next) { - reg_requests(keycache, block, 1); - free_block(keycache, block); + /* Remember the next block. After freeing we cannot get at it. */ + next= block->next_changed; + + /* + We must not free blocks in eviction (BLOCK_IN_SWITCH | + BLOCK_REASSIGNED), blocks intended to be freed + (BLOCK_REASSIGNED) or changed blocks (though I don't know + how this could happen here. Changed blocks should not appear + in the file_blocks hash. But I left this check as it was + before.) + */ + if ((block->hash_link->file == file) && + !(block->status & (BLOCK_IN_SWITCH | BLOCK_REASSIGNED)) && + (!(block->status & BLOCK_CHANGED) + || (type == FLUSH_IGNORE_CHANGED))) + { + struct st_hash_link *hash_link; + my_off_t diskpos; + File file; + uint status; + uint requests; + + found++; + KEYCACHE_DBUG_ASSERT(found <= keycache->blocks_used); + + reg_requests(keycache, block, 1); + + /* + free_block() may need to wait for readers of the block. + This is the moment where the other thread can move the + 'next' blocks from the chain. free_block() needs to wait + if there are requests for the block pending. + */ + if ((requests= block->requests) && next) + { + /* Copy values from the 'next' block and its hash_link. */ + hash_link= next->hash_link; + diskpos= hash_link->diskpos; + file= hash_link->file; + status= next->status; + } + + free_block(keycache, block); + + /* + If free_block() needed to wait and the state of the 'next' + block changed, break the inner loop. 'next' may no longer + be part of the current chain. + + We do not want to break the loop after every free_block(), + not even only after waits. The chain might be quite long + and contain blocks for many files. Traversing it again and + again to find more blocks for this file could become quite + inefficient. + */ + if (requests && next && + ((status != next->status) || + (hash_link != next->hash_link) || + (file != hash_link->file) || + (diskpos != hash_link->diskpos))) + break; + } } - } + + } while (found); } } @@ -2562,29 +2790,89 @@ int flush_key_blocks(KEY_CACHE *keycache static int flush_all_key_blocks(KEY_CACHE *keycache) { -#if defined(KEYCACHE_DEBUG) - uint cnt=0; -#endif - while (keycache->blocks_changed > 0) + uint found; + + /* Flush all changed blocks first. */ + do { BLOCK_LINK *block; - for (block= keycache->used_last->next_used ; ; block=block->next_used) + uint idx; + + found= 0; + /* Step over the whole changed_blocks hash array. */ + for (idx= 0; idx < CHANGED_BLOCKS_HASH; idx++) { - if (block->hash_link) + /* + If an array element is non-empty, use the first block from its + chain to find a file for flush. All blocks for this file are + flushed. So the same block will not appear at this place again + with the next iteration. New writes for blocks are not accepted + during the flush. + */ + if ((block= keycache->changed_blocks[idx])) { -#if defined(KEYCACHE_DEBUG) - cnt++; - KEYCACHE_DBUG_ASSERT(cnt <= keycache->blocks_used); -#endif + /* A block in the changed_blocks hash must have a hash_link. */ + DBUG_ASSERT(block->hash_link); + + found++; + /* + Flush dirty block but do not free them yet. They can be used + for reading until all other blocks are flushed too. + */ + if (flush_key_blocks_int(keycache, block->hash_link->file, + FLUSH_FORCE_WRITE)) + return 1; + } + } + + } while (found); + + /* + Now there are no dirty blocks in the cache any more and no new + writes are accepted since the start of the flush. + Now do also stop new clean blocks from being read into the cache. + This prevents a fight between the flushing thread and other threads + that requests new pages for read. + However there may be blocks still in use for reading. We need to + wait for them whenever we hit a block in use. + */ + keycache->can_be_used= FALSE; + + /* Now flush (free) all clean blocks. */ + do + { + BLOCK_LINK *block; + uint idx; + + found= 0; + /* Step over the whole file_blocks hash array. */ + for (idx= 0; idx < CHANGED_BLOCKS_HASH; idx++) + { + /* + If an array element is non-empty, use the first block from its + chain to find a file for flush. All blocks for this file are + freed. So the same block will not appear at this place again + with the next iteration. Unless it has been read into the cache + anew. In this case readers and the flusher fight against each + other. But since the flusher does not need to do I/O for clean + blocks, and writes for blocks are not accepted during the flush, + it will win finally. + */ + if ((block= keycache->file_blocks[idx])) + { + /* A block in the file_blocks hash must have a hash_link. */ + DBUG_ASSERT(block->hash_link); + + found++; if (flush_key_blocks_int(keycache, block->hash_link->file, FLUSH_RELEASE)) return 1; - break; } - if (block == keycache->used_last) - break; } - } + + } while (found); + + /* Now all blocks are free an no new blocks can come in (!can_be_used). */ return 0; } --- 1.6/include/keycache.h 2006-09-01 12:35:01 +02:00 +++ 1.7/include/keycache.h 2006-09-01 12:35:01 +02:00 @@ -73,6 +73,8 @@ typedef struct st_key_cache BLOCK_LINK *used_ins; /* ptr to the insertion block in LRU chain */ pthread_mutex_t cache_lock; /* to lock access to the cache structure */ KEYCACHE_WQUEUE resize_queue; /* threads waiting during resize operation */ + KEYCACHE_WQUEUE waiting_for_resize_cnt; /* waiting for a zero resize count. + Using a queue for symmetry though only one thread can wait here. */ KEYCACHE_WQUEUE waiting_for_hash_link; /* waiting for a free hash link */ KEYCACHE_WQUEUE waiting_for_block; /* requests waiting for a free block */ BLOCK_LINK *changed_blocks[CHANGED_BLOCKS_HASH]; /* hash for dirty file bl.*/