List:Commits
From:kpettersson Date:July 2 2007 5:14pm
Subject:bk commit into 5.1 tree (thek:1.2541) BUG#21074
Below is the list of changes that have just been committed into a local
5.1 repository of thek. When thek does a push these changes will
be propagated to the main repository and, within 24 hours after the
push, to the public repository.
For information on how to access the public repository
see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html

ChangeSet@stripped, 2007-07-02 19:14:48+02:00, thek@adventure.(none) +10 -0
  Bug#21074 Large query_cache freezes mysql server sporadically under heavy load
  
  Invalidating a subset of a sufficiently large query cache can take a long time.
  During this time the server is effectively frozen and no other operation can
  be executed. This patch addresses the problem by moving the locks that cause
  the freezing and by temporarily disabling the query cache while the
  invalidation takes place.

  sql/ha_ndbcluster.cc@stripped, 2007-07-02 19:14:46+02:00, thek@adventure.(none) +7 -6
    - mysql_rm_table_part2 has a new parameter to indicate whether LOCK_open mutex
      protection is needed.

  sql/lock.cc@stripped, 2007-07-02 19:14:46+02:00, thek@adventure.(none) +96 -0
    - Added function for acquiring table name exclusive locks.
    - Added function for asserting that table name lock is acquired.

  sql/mysql_priv.h@stripped, 2007-07-02 19:14:46+02:00, thek@adventure.(none) +7 -4
    - Added function for acquiring table name exclusive locks.
    - Added function for asserting that table name lock is acquired.
    - Added parameter to mysql_rm_table_part2 to indicate whether LOCK_open mutex
      protection is needed or not.

  sql/sql_cache.cc@stripped, 2007-07-02 19:14:46+02:00, thek@adventure.(none) +444 -267
    - Changed the flush_in_progress flag into a state and added a function,
      is_flushing(), to report this state. A new state was needed to indicate
      that a partial invalidation is in progress.
    - Removed the unused parameter 'under_guard'.
    - The Query_cache mutex structure_guard_mutex was pushed down into a single
      invalidate_table function to avoid multiple entry points, which make
      maintenance more difficult.
    - Instead of holding the structure_guard_mutex during the entire invalidation,
      we set the query cache status to TABLE_FLUSH_IN_PROGRESS to temporarily
      disable the cache and avoid blocking other threads that need the
      Query_cache resource.

  sql/sql_cache.h@stripped, 2007-07-02 19:14:46+02:00, thek@adventure.(none) +60 -14
    - Changed the flush_in_progress flag into a state and added a function,
      is_flushing(), to report this state. A new state was needed to indicate
      that a partial invalidation is in progress.
    - Removed the unused parameter 'under_guard'.
    - The Query_cache mutex structure_guard_mutex was pushed down into a single
      invalidate_table function to avoid multiple entry points, which make
      maintenance more difficult.
    - Instead of holding the structure_guard_mutex during the entire invalidation,
      we set the query cache status to TABLE_FLUSH_IN_PROGRESS to temporarily
      disable the cache and avoid blocking other threads that need the
      Query_cache resource.

  sql/sql_db.cc@stripped, 2007-07-02 19:14:46+02:00, thek@adventure.(none) +1 -1
    - mysql_rm_table_part2_with_lock is redundant and is replaced
      with mysql_rm_table_part2.

  sql/sql_parse.cc@stripped, 2007-07-02 19:14:47+02:00, thek@adventure.(none) +1 -1
    - Function query_cache_invalidate3 isn't protected by a lock, which leaves a
      race condition.
    - Moved this call into mysql_rename_tables and made sure it is protected
      by an exclusive table name lock.

  sql/sql_rename.cc@stripped, 2007-07-02 19:14:47+02:00, thek@adventure.(none) +12 -4
    - Function query_cache_invalidate3 isn't protected by a lock, which leaves a
      race condition.
    - Moved this call into mysql_rename_tables and made sure it is protected
      by an exclusive table name lock.
    - Instead of using the LOCK_open mutex, which excludes all other threads, the
      lock is changed to exclusive table name locks. This prevents us from
      locking the server if a query cache invalidation takes a long time to
      complete.

  sql/sql_table.cc@stripped, 2007-07-02 19:14:47+02:00, thek@adventure.(none) +24 -58
    - Instead of using the LOCK_open mutex, which excludes all other threads, the
      lock is changed to exclusive table name locks. This prevents us from
      locking the server if a query cache invalidation takes a long time to
      complete.
    - Added new parameter to mysql_rm_table_part2 to control whether the LOCK_open
      mutex needs to be acquired or not. This is currently needed by the NDB
      implementation.

  sql/sql_trigger.cc@stripped, 2007-07-02 19:14:47+02:00, thek@adventure.(none) +24 -16
    - Table_triggers don't need to be protected by the LOCK_open mutex. This
      patch removes that restriction.
    - Refactored comments to doxygen style.
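
The cache-status handshake described in the sql_cache.cc comments above can be
sketched as follows. This is a minimal illustration using standard C++
threading primitives rather than the server's pthread wrappers. The class
SketchCache, the enum CacheStatus, and all method names are hypothetical and
do not appear in the patch; only the three states and the idea of waiting on a
status-change condition come from the changeset.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the three cache states named in the patch.
enum class CacheStatus {
  NoFlushInProgress,
  FlushInProgress,      // full flush: the whole cache is being emptied
  TableFlushInProgress  // partial invalidation of queries using one table
};

// Illustrative class only; it is not the server's Query_cache.
class SketchCache {
public:
  // Readers skip the cache instead of waiting while any flush is running.
  bool try_use_cache() {
    std::lock_guard<std::mutex> guard(mutex_);
    return status_ == CacheStatus::NoFlushInProgress;
  }

  // Returns false if a full flush makes this invalidation redundant.
  bool begin_table_flush() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (wait_while_table_flush(lock))
      return false;
    status_ = CacheStatus::TableFlushInProgress;
    return true;  // mutex is released on return; slow work runs unlocked
  }

  void end_table_flush() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      assert(status_ == CacheStatus::TableFlushInProgress);
      status_ = CacheStatus::NoFlushInProgress;
    }
    status_changed_.notify_all();
  }

  // A full flush waits for every other flush to finish before starting.
  void begin_full_flush() {
    std::unique_lock<std::mutex> lock(mutex_);
    status_changed_.wait(lock, [this] {
      return status_ == CacheStatus::NoFlushInProgress;
    });
    status_ = CacheStatus::FlushInProgress;
  }

  void end_full_flush() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      assert(status_ == CacheStatus::FlushInProgress);
      status_ = CacheStatus::NoFlushInProgress;
    }
    status_changed_.notify_all();
  }

private:
  // Caller holds 'lock'. Returns true ("interrupt") if a full flush is
  // running; otherwise waits until no table flush is in progress.
  bool wait_while_table_flush(std::unique_lock<std::mutex> &lock) {
    while (status_ != CacheStatus::NoFlushInProgress) {
      if (status_ == CacheStatus::FlushInProgress)
        return true;
      status_changed_.wait(lock);
    }
    return false;
  }

  std::mutex mutex_;
  std::condition_variable status_changed_;
  CacheStatus status_ = CacheStatus::NoFlushInProgress;
};
```

In this sketch the mutex is held only long enough to read or change the
status, so a long-running invalidation between begin_table_flush() and
end_table_flush() makes other threads bypass the cache rather than block on it.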

# This is a BitKeeper patch.  What follows are the unified diffs for the
# set of deltas contained in the patch.  The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User:	thek
# Host:	adventure.(none)
# Root:	/home/thek/Development/cpp/bug21074/my51-bug21074

--- 1.108/sql/lock.cc	2007-05-24 12:24:28 +02:00
+++ 1.109/sql/lock.cc	2007-07-02 19:14:46 +02:00
@@ -1027,6 +1027,102 @@ end:
 }
 
 
+/**
+  @brief Lock all tables in list with an exclusive table name lock.
+
+  @param thd Thread handle.
+  @param table_list Names of tables to lock.
+
+  @note This function needs to be protected by LOCK_open. If we're 
+    under LOCK TABLES, this function does not work as advertised. Namely,
+    it does not exclude other threads from using this table and does not
+    put an exclusive name lock on this table into the table cache.
+
+  @see lock_table_names
+  @see unlock_table_names
+
+  @retval TRUE An error occurred.
+  @retval FALSE Name lock successfully acquired.
+*/
+
+bool lock_table_names_exclusively(THD *thd, TABLE_LIST *table_list)
+{
+  if (lock_table_names(thd, table_list))
+    return TRUE;
+
+  /*
+    Upgrade the table name locks from semi-exclusive to exclusive locks.
+  */
+  for (TABLE_LIST *table= table_list; table; table= table->next_global)
+  {
+    if (table->table)
+      table->table->open_placeholder= 1;
+  }
+  return FALSE;
+}
+
+
+/**
+  @brief Test if 'table' is protected by an exclusive name lock.
+
+  @param[in] thd The current thread handler
+  @param[in] table Table container containing the single table to be tested
+
+  @note Needs to be protected by LOCK_open mutex.
+
+  @return Error status code
+    @retval TRUE Table is protected
+    @retval FALSE Table is not protected
+*/
+
+bool
+is_table_name_exclusively_locked_by_this_thread(THD *thd,
+                                                TABLE_LIST *table_list)
+{
+  char  key[MAX_DBKEY_LENGTH];
+  uint  key_length;
+
+  key_length= create_table_def_key(thd, key, table_list, 0);
+
+  return is_table_name_exclusively_locked_by_this_thread(thd, (uchar *)key,
+                                                         key_length);
+}
+
+
+/**
+  @brief Test if 'table key' is protected by an exclusive name lock.
+
+  @param[in] thd The current thread handler.
+  @param[in] table Table container containing the single table to be tested.
+
+  @note Needs to be protected by LOCK_open mutex
+
+  @retval TRUE Table is protected
+  @retval FALSE Table is not protected
+ */
+
+bool
+is_table_name_exclusively_locked_by_this_thread(THD *thd, uchar *key,
+                                                int key_length)
+{
+  HASH_SEARCH_STATE state;
+  TABLE *table;
+
+  for (table= (TABLE*) hash_first(&open_cache, key,
+                                  key_length, &state);
+       table ;
+       table= (TABLE*) hash_next(&open_cache, key,
+                                 key_length, &state))
+  {
+    if (table->in_use == thd &&
+        table->open_placeholder == 1 &&
+        table->s->version == 0)
+      return TRUE;
+  }
+
+  return FALSE;
+}
+
 /*
   Unlock all tables in list with a name lock
 

--- 1.511/sql/mysql_priv.h	2007-06-01 09:43:52 +02:00
+++ 1.512/sql/mysql_priv.h	2007-07-02 19:14:46 +02:00
@@ -820,10 +820,8 @@ void mysql_client_binlog_statement(THD *
 bool mysql_rm_table(THD *thd,TABLE_LIST *tables, my_bool if_exists,
                     my_bool drop_temporary);
 int mysql_rm_table_part2(THD *thd, TABLE_LIST *tables, bool if_exists,
-			 bool drop_temporary, bool drop_view, bool log_query);
-int mysql_rm_table_part2_with_lock(THD *thd, TABLE_LIST *tables,
-				   bool if_exists, bool drop_temporary,
-				   bool log_query);
+                         bool drop_temporary, bool drop_view, bool log_query,
+                         bool need_lock_open);
 bool quick_rm_table(handlerton *base,const char *db,
                     const char *table_name, uint flags);
 void close_cached_table(THD *thd, TABLE *table);
@@ -1799,6 +1797,11 @@ bool wait_for_locked_table_names(THD *th
 bool lock_table_names(THD *thd, TABLE_LIST *table_list);
 void unlock_table_names(THD *thd, TABLE_LIST *table_list,
 			TABLE_LIST *last_table);
+bool lock_table_names_exclusively(THD *thd, TABLE_LIST *table_list);
+bool is_table_name_exclusively_locked_by_this_thread(THD *thd, 
+                                                     TABLE_LIST *table_list);
+bool is_table_name_exclusively_locked_by_this_thread(THD *thd, uchar *key,
+                                                     int key_length);
 
 
 /* old unireg functions */

--- 1.112/sql/sql_cache.cc	2007-06-01 10:12:02 +02:00
+++ 1.113/sql/sql_cache.cc	2007-07-02 19:14:46 +02:00
@@ -268,6 +268,39 @@ are stored in one block.
 
 If join_results allocated new block(s) then we need call pack_cache again.
 
+7. Interface
+The query cache interfaces with the rest of the server code through 7
+functions:
+ 1. Query_cache::send_result_to_client
+       - Called before parsing and used to match a statement with the stored
+         queries hash.
+         If a match is found the cached result set is sent through repeated
+         calls to net_real_write. (note: calling thread doesn't have a regis-
+         tered result set writer: thd->net.query_cache_query=0)
+ 2. Query_cache::store_query
+       - Called just before handle_select() and is used to register a result
+         set writer to the statement currently being processed
+         (thd->net.query_cache_query).
+ 3. query_cache_insert
+       - Called from net_real_write to append a result set to a cached query
+         if (and only if) this query has a registered result set writer
+         (thd->net.query_cache_query).
+ 4. Query_cache::invalidate
+       - Called from various places to invalidate query cache based on data-
+         base, table and myisam file name. During an on going invalidation
+         the query cache is temporarily disabled.
+ 5. Query_cache::flush
+       - Used when a RESET QUERY CACHE is issued. This clears the entire
+         cache block by block.
+ 6. Query_cache::resize
+       - Used to change the available memory used by the query cache. This
+         will also invalidate the entire query cache in one free operation.
+ 7. Query_cache::pack
+       - Used when a FLUSH QUERY CACHE is issued. This changes the order of
+         the used memory blocks in physical memory order and move all avail-
+         able memory to the 'bottom' of the memory.
+
+
 TODO list:
 
   - Delayed till after-parsing qache answer (for column rights processing)
@@ -615,49 +648,55 @@ void query_cache_insert(NET *net, const 
     DBUG_VOID_RETURN;
 
   STRUCT_LOCK(&query_cache.structure_guard_mutex);
+  bool interrupt;
+  query_cache.wait_while_table_flush_is_in_progress(&interrupt);
+  if (interrupt)
+  {
+    STRUCT_UNLOCK(&query_cache.structure_guard_mutex);
+    DBUG_VOID_RETURN;
+  }
 
-  if (unlikely(query_cache.query_cache_size == 0 ||
-               query_cache.flush_in_progress))
+  Query_cache_block *query_block= (Query_cache_block*)net->query_cache_query;
+  if (!query_block)
   {
+    /*
+      We lost the writer and the currently processed query has been
+      invalidated; there is nothing left to do.
+    */
     STRUCT_UNLOCK(&query_cache.structure_guard_mutex);
     DBUG_VOID_RETURN;
   }
 
-  Query_cache_block *query_block = ((Query_cache_block*)
-				    net->query_cache_query);
-  if (query_block)
-  {
-    Query_cache_query *header = query_block->query();
-    Query_cache_block *result = header->result();
+  Query_cache_query *header= query_block->query();
+  Query_cache_block *result= header->result();
 
-    DUMP(&query_cache);
-    BLOCK_LOCK_WR(query_block);
-    DBUG_PRINT("qcache", ("insert packet %lu bytes long",length));
+  DUMP(&query_cache);
+  BLOCK_LOCK_WR(query_block);
+  DBUG_PRINT("qcache", ("insert packet %lu bytes long",length));
 
-    /*
-      On success STRUCT_UNLOCK(&query_cache.structure_guard_mutex) will be
-      done by query_cache.append_result_data if success (if not we need
-      query_cache.structure_guard_mutex locked to free query)
-    */
-    if (!query_cache.append_result_data(&result, length, (uchar*) packet,
-					query_block))
-    {
-      DBUG_PRINT("warning", ("Can't append data"));
-      header->result(result);
-      DBUG_PRINT("qcache", ("free query 0x%lx", (ulong) query_block));
-      // The following call will remove the lock on query_block
-      query_cache.free_query(query_block);
-      // append_result_data no success => we need unlock
-      STRUCT_UNLOCK(&query_cache.structure_guard_mutex);
-      DBUG_VOID_RETURN;
-    }
+  /*
+    On success, STRUCT_UNLOCK is done by append_result_data. Otherwise, we
+    still need structure_guard_mutex to free the query, and therefore unlock
+    it later in this function.
+  */
+  if (!query_cache.append_result_data(&result, length, (uchar*) packet,
+                                      query_block))
+  {
+    DBUG_PRINT("warning", ("Can't append data"));
     header->result(result);
-    header->last_pkt_nr= net->pkt_nr;
-    BLOCK_UNLOCK_WR(query_block);
-    DBUG_EXECUTE("check_querycache",query_cache.check_integrity(0););
-  }
-  else
+    DBUG_PRINT("qcache", ("free query 0x%lx", (ulong) query_block));
+    // The following call will remove the lock on query_block
+    query_cache.free_query(query_block);
+    // append_result_data no success => we need unlock
     STRUCT_UNLOCK(&query_cache.structure_guard_mutex);
+    DBUG_VOID_RETURN;
+  }
+
+  header->result(result);
+  header->last_pkt_nr= net->pkt_nr;
+  BLOCK_UNLOCK_WR(query_block);
+  DBUG_EXECUTE("check_querycache",query_cache.check_integrity(0););
+
   DBUG_VOID_RETURN;
 }
 
@@ -671,17 +710,21 @@ void query_cache_abort(NET *net)
     DBUG_VOID_RETURN;
 
   STRUCT_LOCK(&query_cache.structure_guard_mutex);
-
-  if (unlikely(query_cache.query_cache_size == 0 ||
-               query_cache.flush_in_progress))
+  bool interrupt;
+  query_cache.wait_while_table_flush_is_in_progress(&interrupt);
+  if (interrupt)
   {
     STRUCT_UNLOCK(&query_cache.structure_guard_mutex);
     DBUG_VOID_RETURN;
   }
 
+  /*
+    While we were waiting another thread might have changed the status
+    of the writer. Make sure the writer still exists before continuing.
+  */
   Query_cache_block *query_block= ((Query_cache_block*)
                                    net->query_cache_query);
-  if (query_block)			// Test if changed by other thread
+  if (query_block)
   {
     DUMP(&query_cache);
     BLOCK_LOCK_WR(query_block);
@@ -713,13 +756,22 @@ void query_cache_end_of_result(THD *thd)
 
   STRUCT_LOCK(&query_cache.structure_guard_mutex);
 
-  if (unlikely(query_cache.query_cache_size == 0 ||
-               query_cache.flush_in_progress))
-    goto end;
+  bool interrupt;
+  query_cache.wait_while_table_flush_is_in_progress(&interrupt);
+  if (interrupt)
+  {
+    STRUCT_UNLOCK(&query_cache.structure_guard_mutex);
+    DBUG_VOID_RETURN;
+  }
 
   query_block= ((Query_cache_block*) thd->net.query_cache_query);
   if (query_block)
   {
+    /*
+      The writer is still present; finish last result block by chopping it to 
+      suitable size if needed and setting block type. Since this is the last
+      block, the writer should be dropped.
+    */
     DUMP(&query_cache);
     BLOCK_LOCK_WR(query_block);
     Query_cache_query *header= query_block->query();
@@ -746,8 +798,11 @@ void query_cache_end_of_result(THD *thd)
 #endif
     header->found_rows(current_thd->limit_found_rows);
     header->result()->type= Query_cache_block::RESULT;
+
+    /* Drop the writer. */
     header->writer(0);
     thd->net.query_cache_query= 0;
+
     BLOCK_UNLOCK_WR(query_block);
     DBUG_EXECUTE("check_querycache",query_cache.check_integrity(1););
 
@@ -801,9 +856,9 @@ ulong Query_cache::resize(ulong query_ca
   DBUG_ASSERT(initialized);
 
   STRUCT_LOCK(&structure_guard_mutex);
-  while (flush_in_progress)
-    pthread_cond_wait(&COND_flush_finished, &structure_guard_mutex);
-  flush_in_progress= TRUE;
+  while (is_flushing())
+    pthread_cond_wait(&COND_cache_status_changed, &structure_guard_mutex);
+  m_cache_status= Query_cache::FLUSH_IN_PROGRESS;
   STRUCT_UNLOCK(&structure_guard_mutex);
 
   free_cache();
@@ -814,8 +869,8 @@ ulong Query_cache::resize(ulong query_ca
   DBUG_EXECUTE("check_querycache",check_integrity(0););
 
   STRUCT_LOCK(&structure_guard_mutex);
-  flush_in_progress= FALSE;
-  pthread_cond_signal(&COND_flush_finished);
+  m_cache_status= Query_cache::NO_FLUSH_IN_PROGRESS;
+  pthread_cond_signal(&COND_cache_status_changed);
   STRUCT_UNLOCK(&structure_guard_mutex);
 
   DBUG_RETURN(new_query_cache_size);
@@ -910,8 +965,13 @@ def_week_frmt: %lu",                    
     ha_release_temporary_latches(thd);
 
     STRUCT_LOCK(&structure_guard_mutex);
-    if (query_cache_size == 0 || flush_in_progress)
+    if (query_cache_size == 0 || is_flushing())
     {
+      /*
+        A table- or a full flush operation can potentially take a long time to 
+        finish. We choose not to wait for them and skip caching statements
+        instead.
+      */
       STRUCT_UNLOCK(&structure_guard_mutex);
       DBUG_VOID_RETURN;
     }
@@ -954,7 +1014,7 @@ def_week_frmt: %lu",                    
       Query_cache_block *query_block;
       query_block= write_block_data(tot_length, (uchar*) thd->query,
 				    ALIGN_SIZE(sizeof(Query_cache_query)),
-				    Query_cache_block::QUERY, local_tables, 1);
+				    Query_cache_block::QUERY, local_tables);
       if (query_block != 0)
       {
 	DBUG_PRINT("qcache", ("query block 0x%lx allocated, %lu",
@@ -1088,13 +1148,21 @@ Query_cache::send_result_to_client(THD *
   }
 
   STRUCT_LOCK(&structure_guard_mutex);
-  if (query_cache_size == 0 || flush_in_progress)
+
+  if (query_cache_size == 0)
+    goto err_unlock;
+
+  if (is_flushing())
   {
-    DBUG_PRINT("qcache", ("query cache disabled"));
+    /* Return; Query cache is temporarily disabled while we flush. */
+    DBUG_PRINT("qcache",("query cache disabled"));
     goto err_unlock;
   }
 
-  /* Check that we haven't forgot to reset the query cache variables */
+  /*
+    Check that we haven't forgotten to reset the query cache variables;
+    make sure there is no query cache writer attached to this thread.
+   */
   DBUG_ASSERT(thd->net.query_cache_query == 0);
 
   Query_cache_block *query_block;
@@ -1267,7 +1335,7 @@ def_week_frmt: %lu",                    
                    ("Handler require invalidation queries of %s.%s %lu-%lu",
                     table_list.db, table_list.alias,
                     (ulong) engine_data, (ulong) table->engine_data()));
-        invalidate_table((uchar *) table->db(), table->key_length());
+        invalidate_table(thd, (uchar *) table->db(), table->key_length());
       }
       else
         thd->lex->safe_to_cache_query= 0;       // Don't try to cache this
@@ -1330,32 +1398,26 @@ void Query_cache::invalidate(THD *thd, T
 			     my_bool using_transactions)
 {
   DBUG_ENTER("Query_cache::invalidate (table list)");
-  STRUCT_LOCK(&structure_guard_mutex);
-  if (query_cache_size > 0 && !flush_in_progress)
-  {
-    DUMP(this);
 
-    using_transactions= using_transactions &&
-      (thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN));
-    for (; tables_used; tables_used= tables_used->next_local)
-    {
-      DBUG_ASSERT(!using_transactions || tables_used->table!=0);
-      if (tables_used->derived)
-        continue;
-      if (using_transactions &&
-         (tables_used->table->file->table_cache_type() ==
-          HA_CACHE_TBL_TRANSACT))
-        /*
-           Tables_used->table can't be 0 in transaction.
-           Only 'drop' invalidate not opened table, but 'drop'
-           force transaction finish.
-        */
-        thd->add_changed_table(tables_used->table);
-      else
-        invalidate_table(tables_used);
-    }
+  using_transactions= using_transactions &&
+    (thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN));
+  for (; tables_used; tables_used= tables_used->next_local)
+  {
+    DBUG_ASSERT(!using_transactions || tables_used->table!=0);
+    if (tables_used->derived)
+      continue;
+    if (using_transactions &&
+        (tables_used->table->file->table_cache_type() ==
+        HA_CACHE_TBL_TRANSACT))
+      /*
+        tables_used->table can't be 0 in transaction.
+        Only 'drop' invalidate not opened table, but 'drop'
+        force transaction finish.
+      */
+      thd->add_changed_table(tables_used->table);
+    else
+      invalidate_table(thd, tables_used);
   }
-  STRUCT_UNLOCK(&structure_guard_mutex);
 
   DBUG_VOID_RETURN;
 }
@@ -1363,21 +1425,13 @@ void Query_cache::invalidate(THD *thd, T
 void Query_cache::invalidate(CHANGED_TABLE_LIST *tables_used)
 {
   DBUG_ENTER("Query_cache::invalidate (changed table list)");
-  if (tables_used)
+  THD *thd= current_thd;
+  for (; tables_used; tables_used= tables_used->next)
   {
-    STRUCT_LOCK(&structure_guard_mutex);
-    if (query_cache_size > 0 && !flush_in_progress)
-    {
-      DUMP(this);
-      for (; tables_used; tables_used= tables_used->next)
-      {
-	invalidate_table((uchar*) tables_used->key, tables_used->key_length);
-	DBUG_PRINT("qcache", ("db: %s  table: %s", tables_used->key,
-			      tables_used->key+
-			      strlen(tables_used->key)+1));
-      }
-    }
-    STRUCT_UNLOCK(&structure_guard_mutex);
+    invalidate_table(thd, (uchar*) tables_used->key, tables_used->key_length);
+    DBUG_PRINT("qcache", ("db: %s  table: %s", tables_used->key,
+                          tables_used->key+
+                          strlen(tables_used->key)+1));
   }
   DBUG_VOID_RETURN;
 }
@@ -1396,20 +1450,14 @@ void Query_cache::invalidate(CHANGED_TAB
 void Query_cache::invalidate_locked_for_write(TABLE_LIST *tables_used)
 {
   DBUG_ENTER("Query_cache::invalidate_locked_for_write");
-  if (tables_used)
+  for (; tables_used; tables_used= tables_used->next_local)
   {
-    STRUCT_LOCK(&structure_guard_mutex);
-    if (query_cache_size > 0 && !flush_in_progress)
+    if (tables_used->lock_type & (TL_WRITE_LOW_PRIORITY | TL_WRITE) &&
+        tables_used->table)
     {
-      DUMP(this);
-      for (; tables_used; tables_used= tables_used->next_local)
-      {
-        if (tables_used->lock_type & (TL_WRITE_LOW_PRIORITY | TL_WRITE) &&
-            tables_used->table)
-	  invalidate_table(tables_used->table);
-      }
+      THD *thd= current_thd; 
+      invalidate_table(thd, tables_used->table);
     }
-    STRUCT_UNLOCK(&structure_guard_mutex);
   }
   DBUG_VOID_RETURN;
 }
@@ -1423,18 +1471,14 @@ void Query_cache::invalidate(THD *thd, T
 {
   DBUG_ENTER("Query_cache::invalidate (table)");
   
-  STRUCT_LOCK(&structure_guard_mutex);
-  if (query_cache_size > 0 && !flush_in_progress)
-  {
-    using_transactions= using_transactions &&
-      (thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN));
-    if (using_transactions && 
-        (table->file->table_cache_type() == HA_CACHE_TBL_TRANSACT))
-      thd->add_changed_table(table);
-    else
-      invalidate_table(table);
-  }
-  STRUCT_UNLOCK(&structure_guard_mutex);
+  using_transactions= using_transactions &&
+    (thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN));
+  if (using_transactions && 
+      (table->file->table_cache_type() == HA_CACHE_TBL_TRANSACT))
+    thd->add_changed_table(table);
+  else
+    invalidate_table(thd, table);
+
 
   DBUG_VOID_RETURN;
 }
@@ -1443,31 +1487,77 @@ void Query_cache::invalidate(THD *thd, c
 			     my_bool using_transactions)
 {
   DBUG_ENTER("Query_cache::invalidate (key)");
-  
-  STRUCT_LOCK(&structure_guard_mutex);
-  if (query_cache_size > 0 && !flush_in_progress)
-  {
-    using_transactions= using_transactions &&
-      (thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN));
-    if (using_transactions) // used for innodb => has_transactions() is TRUE
-      thd->add_changed_table(key, key_length);
-    else
-      invalidate_table((uchar*)key, key_length);
-  }
-  STRUCT_UNLOCK(&structure_guard_mutex);  
+
+  using_transactions= using_transactions &&
+    (thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN));
+  if (using_transactions) // used for innodb => has_transactions() is TRUE
+    thd->add_changed_table(key, key_length);
+  else
+    invalidate_table(thd, (uchar*)key, key_length);
 
   DBUG_VOID_RETURN;
 }
 
+
+/**
+  @brief Synchronize the thread with any flushing operations.
+
+  This helper function is called whenever a thread needs to operate on the
+  query cache structure (example: during invalidation). If a table flush is in
+  progress this function will wait for it to stop. If a full flush is in
+  progress, the function will set the interrupt parameter to indicate that the
+  current operation is redundant and should be interrupted.
+
+  @param[out] interrupt This out-parameter will be set to TRUE if the calling
+    function is redundant and should be interrupted.
+
+  @return If the interrupt-parameter is TRUE then m_cache_status is set to
+    NO_FLUSH_IN_PROGRESS. If the interrupt-parameter is FALSE then
+    m_cache_status is set to FLUSH_IN_PROGRESS.
+    The structure_guard_mutex will in any case be locked.
+*/
+
+void Query_cache::wait_while_table_flush_is_in_progress(bool *interrupt)
+{
+  while (is_flushing())
+  {
+    /*
+      If there already is a full flush in progress query cache isn't enabled
+      and additional flushes are redundant; just return instead.
+    */
+    if (m_cache_status == Query_cache::FLUSH_IN_PROGRESS)
+    {
+      *interrupt= TRUE;
+      return;
+    }
+    /*
+      If a table flush is in progress; wait on cache status to change.
+    */
+    if (m_cache_status == Query_cache::TABLE_FLUSH_IN_PROGRESS)
+      pthread_cond_wait(&COND_cache_status_changed, &structure_guard_mutex);
+  }
+  *interrupt= FALSE;
+}
+
 /*
   Remove all cached queries that uses the given database
 */
-
 void Query_cache::invalidate(char *db)
 {
   DBUG_ENTER("Query_cache::invalidate (db)");
+
   STRUCT_LOCK(&structure_guard_mutex);
-  if (query_cache_size > 0 && !flush_in_progress)
+  bool interrupt;
+  wait_while_table_flush_is_in_progress(&interrupt);
+  if (interrupt)
+  {
+    STRUCT_UNLOCK(&structure_guard_mutex);
+    DBUG_VOID_RETURN;
+  }
+
+  THD *thd= current_thd;
+
+  if (query_cache_size > 0)
   {
     DUMP(this);
   restart_search:
@@ -1479,7 +1569,10 @@ void Query_cache::invalidate(char *db)
       {
         next= curr->next;
         if (strcmp(db, (char*)(curr->table()->db())) == 0)
-          invalidate_table(curr);
+        {
+          Query_cache_block_table *list_root= curr->table(0);
+          invalidate_query_block_list(thd,list_root);
+        }
         /*
           invalidate_table can freed block on which point 'next' (if
           table of this block used only in queries which was deleted
@@ -1491,6 +1584,11 @@ void Query_cache::invalidate(char *db)
         if (next->type == Query_cache_block::FREE)
           goto restart_search;
         curr= next;
+        /*
+          The loop will end if the circular list pointer has reached the
+          point where it started from, or if the current thread was signaled
+          to die.
+        */
       } while (curr != tables_blocks);
     }
   }
@@ -1504,21 +1602,12 @@ void Query_cache::invalidate_by_MyISAM_f
 {
   DBUG_ENTER("Query_cache::invalidate_by_MyISAM_filename");
 
-  STRUCT_LOCK(&structure_guard_mutex);
-  if (query_cache_size > 0 && !flush_in_progress)
-  {
-    /* Calculate the key outside the lock to make the lock shorter */
-    char key[MAX_DBKEY_LENGTH];
-    uint32 db_length;
-    uint key_length= filename_2_table_key(key, filename, &db_length);
-    Query_cache_block *table_block;
-    if ((table_block = (Query_cache_block*) hash_search(&tables,
-                                                        (uchar*) key,
-                                                        key_length)))
-      invalidate_table(table_block);
-  }
-  STRUCT_UNLOCK(&structure_guard_mutex);
-
+  /* Calculate the key outside the lock to make the lock shorter */
+  char key[MAX_DBKEY_LENGTH];
+  uint32 db_length;
+  uint key_length= filename_2_table_key(key, filename, &db_length);
+  THD *thd= current_thd;
+  invalidate_table(thd,(uchar *)key, key_length);
   DBUG_VOID_RETURN;
 }
 
@@ -1540,16 +1629,43 @@ void Query_cache::flush()
   DBUG_VOID_RETURN;
 }
 
-  /* Join result in cache in 1 block (if result length > join_limit) */
+
+/**
+  @brief Rearrange the memory blocks and join result in cache in 1 block (if
+    result length > join_limit)
+
+  @param[in] join_limit The minimum length of a result block to be joined.
+  @param[in] iteration_limit The maximum number of packing and joining
+    sequences.
+
+*/
 
 void Query_cache::pack(ulong join_limit, uint iteration_limit)
 {
   DBUG_ENTER("Query_cache::pack");
+
+  bool interrupt;
+  STRUCT_LOCK(&structure_guard_mutex);
+  wait_while_table_flush_is_in_progress(&interrupt);
+  if (interrupt)
+  {
+    STRUCT_UNLOCK(&structure_guard_mutex);
+    DBUG_VOID_RETURN;
+  }
+
+  if (query_cache_size == 0)
+  {
+    STRUCT_UNLOCK(&structure_guard_mutex);
+    DBUG_VOID_RETURN;
+  }
+
   uint i = 0;
   do
   {
     pack_cache();
   } while ((++i < iteration_limit) && join_results(join_limit));
+
+  STRUCT_UNLOCK(&structure_guard_mutex);
   DBUG_VOID_RETURN;
 }
 
@@ -1568,7 +1684,7 @@ void Query_cache::destroy()
     free_cache();
     STRUCT_UNLOCK(&structure_guard_mutex);
 
-    pthread_cond_destroy(&COND_flush_finished);
+    pthread_cond_destroy(&COND_cache_status_changed);
     pthread_mutex_destroy(&structure_guard_mutex);
     initialized = 0;
   }
@@ -1584,8 +1700,8 @@ void Query_cache::init()
 {
   DBUG_ENTER("Query_cache::init");
   pthread_mutex_init(&structure_guard_mutex,MY_MUTEX_INIT_FAST);
-  pthread_cond_init(&COND_flush_finished, NULL);
-  flush_in_progress= FALSE;
+  pthread_cond_init(&COND_cache_status_changed, NULL);
+  m_cache_status= Query_cache::NO_FLUSH_IN_PROGRESS;
   initialized = 1;
   DBUG_VOID_RETURN;
 }
@@ -1787,9 +1903,10 @@ void Query_cache::make_disabled()
 /**
   @class Query_cache
   @brief Free all resources allocated by the cache.
-  @details  This function frees all resources allocated by the cache.  You
-    have to call init_cache() before using the cache again. This function requires
-    the structure_guard_mutex to be locked.
+
+  This function frees all resources allocated by the cache.  You
+  have to call init_cache() before using the cache again. This function
+  requires the structure_guard_mutex to be locked.
 */
 
 void Query_cache::free_cache()
@@ -1808,24 +1925,17 @@ void Query_cache::free_cache()
 *****************************************************************************/
 
 
-/*
-  flush_cache() - flush the cache.
-
-  SYNOPSIS
-    flush_cache()
+/**
+  @brief Flush the cache.
 
-  DESCRIPTION
-    This function will flush cache contents.  It assumes we have
-    'structure_guard_mutex' locked.  The function sets the
-    flush_in_progress flag and releases the lock, so other threads may
-    proceed skipping the cache as if it is disabled.  Concurrent
-    flushes are performed in turn.
-
-    After flush_cache() call, the cache is flushed, all the freed
-    memory is accumulated in bin[0], and the 'structure_guard_mutex'
-    is locked.  However, since we could release the mutex during
-    execution, the rest of the cache state could have been changed,
-    and should not be relied on.
+  This function will flush cache contents.  It assumes we have
+  'structure_guard_mutex' locked. The function sets the m_cache_status flag and
+  releases the lock, so other threads may proceed skipping the cache as if it
+  is disabled. Concurrent flushes are performed in turn.
+  After flush_cache() call, the cache is flushed, all the freed memory is
+  accumulated in bin[0], and the 'structure_guard_mutex' is locked. However,
+  since we could release the mutex during execution, the rest of the cache
+  state could have been changed, and should not be relied on.
 */
 
 void Query_cache::flush_cache()
@@ -1837,15 +1947,15 @@ void Query_cache::flush_cache()
     Query_cache::free_cache()) depends on the fact that after the
     flush the cache is empty.
   */
-  while (flush_in_progress)
-    pthread_cond_wait(&COND_flush_finished, &structure_guard_mutex);
+  while (is_flushing())
+    pthread_cond_wait(&COND_cache_status_changed, &structure_guard_mutex);
 
   /*
-    Setting 'flush_in_progress' will prevent other threads from using
+    Setting 'FLUSH_IN_PROGRESS' will prevent other threads from using
     the cache while we are in the middle of the flush, and we release
     the lock so that other threads won't block.
   */
-  flush_in_progress= TRUE;
+  m_cache_status= Query_cache::FLUSH_IN_PROGRESS;
   STRUCT_UNLOCK(&structure_guard_mutex);
 
   my_hash_reset(&queries);
@@ -1856,8 +1966,8 @@ void Query_cache::flush_cache()
   }
 
   STRUCT_LOCK(&structure_guard_mutex);
-  flush_in_progress= FALSE;
-  pthread_cond_signal(&COND_flush_finished);
+  m_cache_status= Query_cache::NO_FLUSH_IN_PROGRESS;
+  pthread_cond_signal(&COND_cache_status_changed);
 }
 
 /*
@@ -1875,7 +1985,7 @@ my_bool Query_cache::free_old_query()
       sequence is breached.
       Also we don't need remove locked queries at this point.
     */
-    Query_cache_block *query_block = 0;
+    Query_cache_block *query_block= 0;
     if (queries_blocks != 0)
     {
       Query_cache_block *block = queries_blocks;
@@ -2013,8 +2123,7 @@ Query_cache_block *
 Query_cache::write_block_data(ulong data_len, uchar* data,
 			      ulong header_len,
 			      Query_cache_block::block_type type,
-			      TABLE_COUNTER_TYPE ntab,
-			      my_bool under_guard)
+			      TABLE_COUNTER_TYPE ntab)
 {
   ulong all_headers_len = (ALIGN_SIZE(sizeof(Query_cache_block)) +
 			   ALIGN_SIZE(ntab*sizeof(Query_cache_block_table)) +
@@ -2024,9 +2133,8 @@ Query_cache::write_block_data(ulong data
   DBUG_ENTER("Query_cache::write_block_data");
   DBUG_PRINT("qcache", ("data: %ld, header: %ld, all header: %ld",
 		      data_len, header_len, all_headers_len));
-  Query_cache_block *block = allocate_block(max(align_len, 
-						min_allocation_unit),
-					    1, 0, under_guard);
+  Query_cache_block *block= allocate_block(max(align_len,
+                                           min_allocation_unit),1, 0);
   if (block != 0)
   {
     block->type = type;
@@ -2243,8 +2351,7 @@ my_bool Query_cache::allocate_data_chain
 
     if (!(new_block= allocate_block(max(min_size, align_len),
 				    min_result_data_size == 0,
-				    all_headers_len + min_result_data_size,
-				    1)))
+				    all_headers_len + min_result_data_size)))
     {
       DBUG_PRINT("warning", ("Can't allocate block for results"));
       DBUG_RETURN(FALSE);
@@ -2286,51 +2393,94 @@ my_bool Query_cache::allocate_data_chain
   Invalidate the first table in the table_list
 */
 
-void Query_cache::invalidate_table(TABLE_LIST *table_list)
+void Query_cache::invalidate_table(THD *thd, TABLE_LIST *table_list)
 {
   if (table_list->table != 0)
-    invalidate_table(table_list->table);	// Table is open
+    invalidate_table(thd, table_list->table);	// Table is open
   else
   {
     char key[MAX_DBKEY_LENGTH];
     uint key_length;
-    Query_cache_block *table_block;
+
     key_length=(uint) (strmov(strmov(key,table_list->db)+1,
 			      table_list->table_name) -key)+ 1;
 
     // We don't store temporary tables => no key_length+=4 ...
-    if ((table_block = (Query_cache_block*)
-	 hash_search(&tables,(uchar*) key,key_length)))
-      invalidate_table(table_block);
+    invalidate_table(thd, (uchar *)key, key_length);
   }
 }
 
-void Query_cache::invalidate_table(TABLE *table)
+void Query_cache::invalidate_table(THD *thd, TABLE *table)
 {
-  invalidate_table((uchar*) table->s->table_cache_key.str,
+  invalidate_table(thd, (uchar*) table->s->table_cache_key.str,
                    table->s->table_cache_key.length);
 }
 
-void Query_cache::invalidate_table(uchar * key, uint32  key_length)
+void Query_cache::invalidate_table(THD *thd, uchar * key, uint32  key_length)
 {
-  Query_cache_block *table_block;
-  if ((table_block = ((Query_cache_block*)
-		      hash_search(&tables, key, key_length))))
-    invalidate_table(table_block);
+  bool interrupt;
+  STRUCT_LOCK(&structure_guard_mutex);
+  wait_while_table_flush_is_in_progress(&interrupt);
+  if (interrupt)
+  {
+    STRUCT_UNLOCK(&structure_guard_mutex);
+    return;
+  }
+
+  /*
+    Setting 'TABLE_FLUSH_IN_PROGRESS' will temporarily disable the cache
+    so that structural changes to cache won't block the entire server.
+    However, threads requesting to change the query cache will still have
+    to wait for the flush to finish.
+  */
+  m_cache_status= Query_cache::TABLE_FLUSH_IN_PROGRESS;
+  STRUCT_UNLOCK(&structure_guard_mutex);
+
+  Query_cache_block *table_block=
+    (Query_cache_block*)hash_search(&tables, key, key_length);
+  if (query_cache_size > 0 && table_block)
+  {
+    Query_cache_block_table *list_root= table_block->table(0);
+    invalidate_query_block_list(thd, list_root);
+  }
+
+  STRUCT_LOCK(&structure_guard_mutex);
+  m_cache_status= Query_cache::NO_FLUSH_IN_PROGRESS;
+
+  /*
+    net_real_write might be waiting for a change of the m_cache_status
+    variable.
+  */
+  pthread_cond_signal(&COND_cache_status_changed);
+  STRUCT_UNLOCK(&structure_guard_mutex);
 }
 
-void Query_cache::invalidate_table(Query_cache_block *table_block)
+
+/**
+  @brief Invalidate a linked list of query cache blocks.
+
+  Each block tries to acquire a block-level lock before
+  free_query is called. This function will in turn affect
+  related table- and result-blocks.
+
+  @param[in,out] thd Thread context.
+  @param[in,out] list_root A pointer to a circular list of query blocks.
+
+*/
+
+void
+Query_cache::invalidate_query_block_list(THD *thd,
+                                         Query_cache_block_table *list_root)
 {
-  Query_cache_block_table *list_root =	table_block->table(0);
   while (list_root->next != list_root)
   {
-    Query_cache_block *query_block = list_root->next->block();
+    Query_cache_block *query_block= list_root->next->block();
     BLOCK_LOCK_WR(query_block);
     free_query(query_block);
+    DBUG_EXECUTE_IF("debug_cache_locks", sleep(10););
   }
 }
 
-
 /*
   Register given table list begining with given position in tables table of
   block
@@ -2463,7 +2613,6 @@ my_bool Query_cache::register_all_tables
 
   if (n)
   {
-    DBUG_PRINT("qcache", ("failed at table %d", (int) n));
     /* Unlink the tables we allocated above */
     for (Query_cache_block_table *tmp = block->table(0) ;
 	 tmp != block_table;
@@ -2473,9 +2622,13 @@ my_bool Query_cache::register_all_tables
   return (n);
 }
 
-/*
-  Insert used tablename in cache
-  Returns 0 on error
+
+/**
+  @brief Insert used table name into the cache.
+
+  @return Error status
+    @retval FALSE On error
+    @retval TRUE On success
 */
 
 my_bool
@@ -2489,9 +2642,10 @@ Query_cache::insert_table(uint key_len, 
   DBUG_PRINT("qcache", ("insert table node 0x%lx, len %d",
 		      (ulong)node, key_len));
 
-  Query_cache_block *table_block = ((Query_cache_block *)
-				    hash_search(&tables, (uchar*) key,
-						key_len));
+  THD *thd= current_thd;
+
+  Query_cache_block *table_block= 
+    (Query_cache_block *)hash_search(&tables, (uchar*) key, key_len);
 
   if (table_block &&
       table_block->table()->engine_data() != engine_data)
@@ -2506,7 +2660,11 @@ Query_cache::insert_table(uint key_len, 
       as far as we delete all queries with this table, table block will be
       deleted, too
     */
-    invalidate_table(table_block);
+    {
+      Query_cache_block_table *list_root= table_block->table(0);
+      invalidate_query_block_list(thd, list_root);
+    }
+
     table_block= 0;
   }
 
@@ -2514,21 +2672,29 @@ Query_cache::insert_table(uint key_len, 
   {
     DBUG_PRINT("qcache", ("new table block from 0x%lx (%u)",
 			(ulong) key, (int) key_len));
-    table_block = write_block_data(key_len, (uchar*) key,
-				   ALIGN_SIZE(sizeof(Query_cache_table)),
-				   Query_cache_block::TABLE,
-				   1, 1);
+    table_block= write_block_data(key_len, (uchar*) key,
+                                  ALIGN_SIZE(sizeof(Query_cache_table)),
+                                  Query_cache_block::TABLE, 1);
     if (table_block == 0)
     {
       DBUG_PRINT("qcache", ("Can't write table name to cache"));
       DBUG_RETURN(0);
     }
-    Query_cache_table *header = table_block->table();
+    Query_cache_table *header= table_block->table();
     double_linked_list_simple_include(table_block,
-				      &tables_blocks);
-    Query_cache_block_table *list_root = table_block->table(0);
-    list_root->n = 0;
-    list_root->next = list_root->prev = list_root;
+                                      &tables_blocks);
+    /*
+      First node in the Query_cache_block_table-chain is the table-type
+      block. This block will only have one Query_cache_block_table (n=0).
+    */
+    Query_cache_block_table *list_root= table_block->table(0);
+    list_root->n= 0;
+
+    /*
+      The node list is circular in nature.
+    */
+    list_root->next= list_root->prev= list_root;
+
     if (my_hash_insert(&tables, (const uchar *) table_block))
     {
       DBUG_PRINT("qcache", ("Can't insert table to hash"));
@@ -2536,20 +2702,37 @@ Query_cache::insert_table(uint key_len, 
       free_memory_block(table_block);
       DBUG_RETURN(0);
     }
-    char *db = header->db();
+    char *db= header->db();
     header->table(db + db_length + 1);
     header->key_length(key_len);
     header->type(cache_type);
     header->callback(callback);
     header->engine_data(engine_data);
+
+    /*
+      We insert this table with the assumption that it isn't referenced by
+      any queries.
+    */
+    header->m_cached_query_count= 0;
   }
 
-  Query_cache_block_table *list_root = table_block->table(0);
-  node->next = list_root->next;
-  list_root->next = node;
-  node->next->prev = node;
-  node->prev = list_root;
-  node->parent = table_block->table();
+  /*
+    Table is now in the cache; link the table_block-node associated
+    with the currently processed query into the chain of queries depending
+    on the cached table.
+  */
+  Query_cache_block_table *list_root= table_block->table(0);
+  node->next= list_root->next;
+  list_root->next= node;
+  node->next->prev= node;
+  node->prev= list_root;
+  node->parent= table_block->table();
+  /*
+    Increase the counter to keep track of how long this chain
+    of queries is.
+  */
+  Query_cache_table *table_block_data= table_block->table();
+  table_block_data->m_cached_query_count++;
   DBUG_RETURN(1);
 }
 
@@ -2557,15 +2740,27 @@ Query_cache::insert_table(uint key_len, 
 void Query_cache::unlink_table(Query_cache_block_table *node)
 {
   DBUG_ENTER("Query_cache::unlink_table");
-  node->prev->next = node->next;
-  node->next->prev = node->prev;
-  Query_cache_block_table *neighbour = node->next;
+  node->prev->next= node->next;
+  node->next->prev= node->prev;
+  Query_cache_block_table *neighbour= node->next;
+  Query_cache_table *table_block_data= node->parent;
+  table_block_data->m_cached_query_count--;
+
+  DBUG_ASSERT(table_block_data->m_cached_query_count >= 0);
+
   if (neighbour->next == neighbour)
   {
-    // list is empty (neighbor is root of list)
-    Query_cache_block *table_block = neighbour->block();
+    DBUG_ASSERT(table_block_data->m_cached_query_count == 0);
+    /*
+      If neighbor is root of list, the list is empty.
+      The root of the list is always a table-type block
+      which contains exactly one Query_cache_block_table
+      node object, thus we can use the block() method
+      to calculate the Query_cache_block address.
+    */
+    Query_cache_block *table_block= neighbour->block();
     double_linked_list_exclude(table_block,
-			       &tables_blocks);
+                               &tables_blocks);
     hash_delete(&tables,(uchar *) table_block);
     free_memory_block(table_block);
   }
@@ -2577,12 +2772,11 @@ void Query_cache::unlink_table(Query_cac
 *****************************************************************************/
 
 Query_cache_block *
-Query_cache::allocate_block(ulong len, my_bool not_less, ulong min,
-			    my_bool under_guard)
+Query_cache::allocate_block(ulong len, my_bool not_less, ulong min)
 {
   DBUG_ENTER("Query_cache::allocate_block");
-  DBUG_PRINT("qcache", ("len %lu, not less %d, min %lu, uder_guard %d",
-		      len, not_less,min,under_guard));
+  DBUG_PRINT("qcache", ("len %lu, not less %d, min %lu",
+             len, not_less,min));
 
   if (len >= min(query_cache_size, query_cache_limit))
   {
@@ -2591,17 +2785,6 @@ Query_cache::allocate_block(ulong len, m
     DBUG_RETURN(0); // in any case we don't have such piece of memory
   }
 
-  if (!under_guard)
-  {
-    STRUCT_LOCK(&structure_guard_mutex);
-
-    if (unlikely(query_cache.query_cache_size == 0 || flush_in_progress))
-    {
-      STRUCT_UNLOCK(&structure_guard_mutex);
-      DBUG_RETURN(0);
-    }
-  }
-
   /* Free old queries until we have enough memory to store this block */
   Query_cache_block *block;
   do
@@ -2616,8 +2799,6 @@ Query_cache::allocate_block(ulong len, m
       split_block(block,ALIGN_SIZE(len));
   }
 
-  if (!under_guard)
-    STRUCT_UNLOCK(&structure_guard_mutex);
   DBUG_RETURN(block);
 }
 
@@ -2852,9 +3033,7 @@ uint Query_cache::find_bin(ulong size)
   }
   uint bin =  steps[left].idx - 
     (uint)((size - steps[left].size)/steps[left].increment);
-#ifndef DBUG_OFF
-  bins_dump();
-#endif
+
   DBUG_PRINT("qcache", ("bin %u step %u, size %lu step size %lu",
 			bin, left, size, steps[left].size));
   DBUG_RETURN(bin);
@@ -3140,18 +3319,17 @@ my_bool Query_cache::ask_handler_allowan
   Packing
 *****************************************************************************/
 
+
+/**
+  @brief Rearrange all memory blocks so that free memory joins at the
+    'bottom' of the allocated memory block containing all cache data.
+  @see Query_cache::pack(ulong join_limit, uint iteration_limit)
+*/
+
 void Query_cache::pack_cache()
 {
   DBUG_ENTER("Query_cache::pack_cache");
 
-  STRUCT_LOCK(&structure_guard_mutex);
-
-  if (unlikely(query_cache_size == 0 || flush_in_progress))
-  {
-    STRUCT_UNLOCK(&structure_guard_mutex);
-    DBUG_VOID_RETURN;
-  }
-
   DBUG_EXECUTE("check_querycache",query_cache.check_integrity(1););
 
   uchar *border = 0;
@@ -3185,7 +3363,6 @@ void Query_cache::pack_cache()
   }
 
   DBUG_EXECUTE("check_querycache",query_cache.check_integrity(1););
-  STRUCT_UNLOCK(&structure_guard_mutex);
   DBUG_VOID_RETURN;
 }
 
@@ -3460,8 +3637,7 @@ my_bool Query_cache::join_results(ulong 
   my_bool has_moving = 0;
   DBUG_ENTER("Query_cache::join_results");
 
-  STRUCT_LOCK(&structure_guard_mutex);
-  if (queries_blocks != 0 && !flush_in_progress)
+  if (queries_blocks != 0)
   {
     DBUG_ASSERT(query_cache_size > 0);
     Query_cache_block *block = queries_blocks;
@@ -3524,7 +3700,6 @@ my_bool Query_cache::join_results(ulong 
       block = block->next;
     } while ( block != queries_blocks );
   }
-  STRUCT_UNLOCK(&structure_guard_mutex);
   DBUG_RETURN(has_moving);
 }
 
@@ -3760,6 +3935,14 @@ void Query_cache::tables_dump()
 }
 
 
+/**
+  @brief Checks integrity of the various linked lists
+
+  @return Error status code
+    @retval FALSE Query cache is operational.
+    @retval TRUE Query cache is broken.
+*/
+
 my_bool Query_cache::check_integrity(bool locked)
 {
   my_bool result = 0;
@@ -3769,14 +3952,8 @@ my_bool Query_cache::check_integrity(boo
   if (!locked)
     STRUCT_LOCK(&structure_guard_mutex);
 
-  if (unlikely(query_cache_size == 0 || flush_in_progress))
-  {
-    if (!locked)
-      STRUCT_UNLOCK(&query_cache.structure_guard_mutex);
-
-    DBUG_PRINT("qcache", ("Query Cache not initialized"));
-    DBUG_RETURN(0);
-  }
+  while (is_flushing())
+    pthread_cond_wait(&COND_cache_status_changed,&structure_guard_mutex);
 
   if (hash_check(&queries))
   {

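The m_cache_status handshake used by flush_cache() and invalidate_table() in
the sql_cache.cc changes above can be sketched in isolation. This is only a
minimal sketch under stated assumptions, not the committed code:
CacheStatusSketch and its invalidate() callback are hypothetical names, and
std::mutex / std::condition_variable stand in for the pthread primitives
used in the patch.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>

// Hypothetical stand-in for Query_cache; only the status handling is modelled.
class CacheStatusSketch
{
public:
  enum Cache_status { NO_FLUSH_IN_PROGRESS, FLUSH_IN_PROGRESS,
                      TABLE_FLUSH_IN_PROGRESS };

  CacheStatusSketch() : m_cache_status(NO_FLUSH_IN_PROGRESS) {}

  // True while either kind of flush is running. Unlike the patch's
  // is_flushing(), this sketch takes the mutex itself.
  bool is_flushing()
  {
    std::lock_guard<std::mutex> guard(structure_guard_mutex);
    return m_cache_status != NO_FLUSH_IN_PROGRESS;
  }

  // Runs 'work' (the slow invalidation) with the status set and the
  // mutex released, so other threads can bypass the cache meanwhile.
  void invalidate(const std::function<void()> &work)
  {
    std::unique_lock<std::mutex> lock(structure_guard_mutex);
    // Concurrent flushes are performed in turn.
    while (m_cache_status != NO_FLUSH_IN_PROGRESS)
      cache_status_changed.wait(lock);

    m_cache_status= TABLE_FLUSH_IN_PROGRESS;
    lock.unlock();

    work();

    lock.lock();
    assert(m_cache_status == TABLE_FLUSH_IN_PROGRESS);
    m_cache_status= NO_FLUSH_IN_PROGRESS;
    // The patch calls pthread_cond_signal(); notify_all() is a
    // simplification chosen for this sketch.
    cache_status_changed.notify_all();
  }

private:
  std::mutex structure_guard_mutex;
  std::condition_variable cache_status_changed;
  Cache_status m_cache_status;
};
```

A caller would pass the long-running work as the callback; any thread that
calls is_flushing() while the callback runs sees true and can skip the cache
instead of blocking on the mutex.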
--- 1.153/sql/sql_db.cc	2007-06-01 09:43:53 +02:00
+++ 1.154/sql/sql_db.cc	2007-07-02 19:14:46 +02:00
@@ -1084,7 +1084,7 @@ static long mysql_rm_known_files(THD *th
     }
   }
   if (thd->killed ||
-      (tot_list && mysql_rm_table_part2_with_lock(thd, tot_list, 1, 0, 1)))
+      (tot_list && mysql_rm_table_part2(thd, tot_list, 1, 0, 1, 1, 1)))
     goto err;
 
   /* Remove RAID directories */

--- 1.675/sql/sql_parse.cc	2007-06-01 09:43:54 +02:00
+++ 1.676/sql/sql_parse.cc	2007-07-02 19:14:47 +02:00
@@ -2445,7 +2445,7 @@ end_with_restore_list:
           check_grant(thd, INSERT_ACL | CREATE_ACL, &new_list, 0, 1, 0)))
         goto error;
     }
-    query_cache_invalidate3(thd, first_table, 0);
+
     if (end_active_trans(thd) || mysql_rename_tables(thd, first_table, 0))
       goto error;
     break;

--- 1.424/sql/sql_table.cc	2007-06-01 11:33:54 +02:00
+++ 1.425/sql/sql_table.cc	2007-07-02 19:14:47 +02:00
@@ -1411,14 +1411,7 @@ bool mysql_rm_table(THD *thd,TABLE_LIST 
     LOCK_open during wait_if_global_read_lock(), other threads could not
     close their tables. This would make a pretty deadlock.
   */
-  thd->mysys_var->current_mutex= &LOCK_open;
-  thd->mysys_var->current_cond= &COND_refresh;
-  VOID(pthread_mutex_lock(&LOCK_open));
-
-  error= mysql_rm_table_part2(thd, tables, if_exists, drop_temporary, 0, 0);
-
-  pthread_mutex_unlock(&LOCK_open);
-
+  error= mysql_rm_table_part2(thd, tables, if_exists, drop_temporary, 0, 0, 1);
   pthread_mutex_lock(&thd->mysys_var->mutex);
   thd->mysys_var->current_mutex= 0;
   thd->mysys_var->current_cond= 0;
@@ -1433,49 +1426,6 @@ bool mysql_rm_table(THD *thd,TABLE_LIST 
   DBUG_RETURN(FALSE);
 }
 
-
-/*
- delete (drop) tables.
-
-  SYNOPSIS
-    mysql_rm_table_part2_with_lock()
-    thd			Thread handle
-    tables		List of tables to delete
-    if_exists		If 1, don't give error if one table doesn't exists
-    dont_log_query	Don't write query to log files. This will also not
-                        generate warnings if the handler files doesn't exists
-
- NOTES
-   Works like documented in mysql_rm_table(), but don't check
-   global_read_lock and don't send_ok packet to server.
-
- RETURN
-  0	ok
-  1	error
-*/
-
-int mysql_rm_table_part2_with_lock(THD *thd,
-				   TABLE_LIST *tables, bool if_exists,
-				   bool drop_temporary, bool dont_log_query)
-{
-  int error;
-  thd->mysys_var->current_mutex= &LOCK_open;
-  thd->mysys_var->current_cond= &COND_refresh;
-  VOID(pthread_mutex_lock(&LOCK_open));
-
-  error= mysql_rm_table_part2(thd, tables, if_exists, drop_temporary, 1,
-			      dont_log_query);
-
-  pthread_mutex_unlock(&LOCK_open);
-
-  pthread_mutex_lock(&thd->mysys_var->mutex);
-  thd->mysys_var->current_mutex= 0;
-  thd->mysys_var->current_cond= 0;
-  pthread_mutex_unlock(&thd->mysys_var->mutex);
-  return error;
-}
-
-
 /*
   Execute the drop of a normal or temporary table
 
@@ -1508,7 +1458,7 @@ int mysql_rm_table_part2_with_lock(THD *
 
 int mysql_rm_table_part2(THD *thd, TABLE_LIST *tables, bool if_exists,
 			 bool drop_temporary, bool drop_view,
-			 bool dont_log_query)
+			 bool dont_log_query, bool need_lock_open)
 {
   TABLE_LIST *table;
   char path[FN_REFLEN], *alias;
@@ -1520,9 +1470,11 @@ int mysql_rm_table_part2(THD *thd, TABLE
   String built_query;
   DBUG_ENTER("mysql_rm_table_part2");
 
+  if (need_lock_open)
+    pthread_mutex_lock(&LOCK_open);
+
   LINT_INIT(alias);
   LINT_INIT(path_length);
-  safe_mutex_assert_owner(&LOCK_open);
 
   if (thd->current_stmt_binlog_row_based && !dont_log_query)
   {
@@ -1555,8 +1507,15 @@ int mysql_rm_table_part2(THD *thd, TABLE
     }
   }
 
-  if (!drop_temporary && lock_table_names(thd, tables))
+  if (!drop_temporary && lock_table_names_exclusively(thd, tables))
+  {
+    if (need_lock_open)
+      pthread_mutex_unlock(&LOCK_open);
     DBUG_RETURN(1);
+  }
+
+  if (need_lock_open)
+    pthread_mutex_unlock(&LOCK_open);
 
   /* Don't give warnings for not found errors, as we already generate notes */
   thd->no_warnings_for_error= 1;
@@ -1567,7 +1526,7 @@ int mysql_rm_table_part2(THD *thd, TABLE
     handlerton *table_type;
     enum legacy_db_type frm_db_type;
 
-    mysql_ha_flush(thd, table, MYSQL_HA_CLOSE_FINAL, TRUE);
+    mysql_ha_flush(thd, table, MYSQL_HA_CLOSE_FINAL, !need_lock_open);
     if (!close_temporary_table(thd, table))
     {
       tmp_table_deleted=1;
@@ -1604,6 +1563,8 @@ int mysql_rm_table_part2(THD *thd, TABLE
     {
       TABLE *locked_table;
       abort_locked_tables(thd, db, table->table_name);
+      if (need_lock_open)
+        pthread_mutex_lock(&LOCK_open);
       remove_table_from_cache(thd, db, table->table_name,
 	                      RTFC_WAIT_OTHER_THREAD_FLAG |
 			      RTFC_CHECK_KILLED_FLAG);
@@ -1614,6 +1575,9 @@ int mysql_rm_table_part2(THD *thd, TABLE
       if ((locked_table= drop_locked_tables(thd, db, table->table_name)))
         table->table= locked_table;
 
+      if (need_lock_open)
+        pthread_mutex_unlock(&LOCK_open);
+
       if (thd->killed)
       {
         thd->no_warnings_for_error= 0;
@@ -1739,9 +1703,11 @@ int mysql_rm_table_part2(THD *thd, TABLE
       */
     }
   }
-
-  if (!drop_temporary)
-    unlock_table_names(thd, tables, (TABLE_LIST*) 0);
+  if (need_lock_open)
+    pthread_mutex_lock(&LOCK_open);
+  unlock_table_names(thd, tables, (TABLE_LIST*) 0);
+  if (need_lock_open)
+    pthread_mutex_unlock(&LOCK_open);
   thd->no_warnings_for_error= 0;
   DBUG_RETURN(error);
 }

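The need_lock_open parameter added to mysql_rm_table_part2() above makes the
function take and release LOCK_open itself only when the caller does not
already hold it. The locking skeleton can be sketched as follows. This is a
simplified illustration, not the server code: Counting_mutex and
rm_table_sketch() are hypothetical names, the per-table locking around
remove_table_from_cache() is left out, and name_lock_fails simulates
lock_table_names_exclusively() returning an error.

```cpp
#include <cassert>

// Hypothetical mutex stand-in that records how it is used.
struct Counting_mutex
{
  bool held;
  int lock_calls;
  Counting_mutex() : held(false), lock_calls(0) {}
  void lock()   { assert(!held); held= true; ++lock_calls; }
  void unlock() { assert(held); held= false; }
};

// Mirrors the conditional locking of mysql_rm_table_part2() in outline.
int rm_table_sketch(Counting_mutex &lock_open, bool need_lock_open,
                    bool name_lock_fails)
{
  if (need_lock_open)
    lock_open.lock();

  if (name_lock_fails)
  {
    if (need_lock_open)
      lock_open.unlock();        // the error path must release the mutex too
    return 1;
  }

  if (need_lock_open)
    lock_open.unlock();

  /* ... the tables would be dropped here, without holding the mutex ... */

  if (need_lock_open)
    lock_open.lock();
  /* ... unlock_table_names() would run here ... */
  if (need_lock_open)
    lock_open.unlock();
  return 0;
}
```

With need_lock_open set to false, as in the ndbcluster_find_files() call, the
sketch never touches the mutex and the caller keeps holding it throughout.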
--- 1.94/sql/sql_trigger.cc	2007-05-24 00:39:25 +02:00
+++ 1.95/sql/sql_trigger.cc	2007-07-02 19:14:47 +02:00
@@ -1268,8 +1268,6 @@ bool Table_triggers_list::drop_all_trigg
   bzero(&table, sizeof(table));
   init_alloc_root(&table.mem_root, 8192, 0);
 
-  safe_mutex_assert_owner(&LOCK_open);
-
   if (Table_triggers_list::check_n_load(thd, db, name, &table, 1))
   {
     result= 1;
@@ -1431,26 +1429,24 @@ Table_triggers_list::change_table_name_i
 }
 
 
-/*
-  Update .TRG and .TRN files after renaming triggers' subject table.
+/**
+  @brief Update .TRG and .TRN files after renaming triggers' subject table.
 
-  SYNOPSIS
-    change_table_name()
-      thd        Thread context
-      db         Old database of subject table
-      old_table  Old name of subject table
-      new_db     New database for subject table
-      new_table  New name of subject table
+  @param[in,out] thd Thread context
+  @param[in] db Old database of subject table
+  @param[in] old_table Old name of subject table
+  @param[in] new_db New database for subject table
+  @param[in] new_table New name of subject table
 
-  NOTE
+  @note
     This method tries to leave trigger related files in consistent state,
     i.e. it either will complete successfully, or will fail leaving files
     in their initial state.
     Also this method assumes that subject table is not renamed to itself.
+    This method needs to be called under an exclusive table name lock.
 
-  RETURN VALUE
-    FALSE  Success
-    TRUE   Error
+  @retval FALSE Success
+  @retval TRUE  Error
 */
 
 bool Table_triggers_list::change_table_name(THD *thd, const char *db,
@@ -1466,7 +1462,19 @@ bool Table_triggers_list::change_table_n
   bzero(&table, sizeof(table));
   init_alloc_root(&table.mem_root, 8192, 0);
 
-  safe_mutex_assert_owner(&LOCK_open);
+  uchar key[MAX_DBKEY_LENGTH];
+  uint key_length= (uint) (strmov(strmov((char*)&key[0], db)+1,
+                    old_table)-(char*)&key[0])+1;
+
+  /*
+    This method interfaces with MySQL server code that is protected
+    either by the LOCK_open mutex or by an exclusive table name lock.
+    In the future, an exclusive table name lock alone will be enough.
+  */
+#ifndef DBUG_OFF
+  if (!is_table_name_exclusively_locked_by_this_thread(thd, key, key_length))
+    safe_mutex_assert_owner(&LOCK_open);
+#endif
 
   DBUG_ASSERT(my_strcasecmp(table_alias_charset, db, new_db) ||
               my_strcasecmp(table_alias_charset, old_table, new_table));

--- 1.454/sql/ha_ndbcluster.cc	2007-05-24 18:47:54 +02:00
+++ 1.455/sql/ha_ndbcluster.cc	2007-07-02 19:14:46 +02:00
@@ -6996,7 +6996,6 @@ int ndbcluster_find_files(handlerton *ht
 
   // Lock mutex before deleting and creating frm files
   pthread_mutex_lock(&LOCK_open);
-
   if (!global_read_lock)
   {
     // Delete old files
@@ -7010,10 +7009,12 @@ int ndbcluster_find_files(handlerton *ht
       table_list.db= (char*) db;
       table_list.alias= table_list.table_name= (char*)file_name;
       (void)mysql_rm_table_part2(thd, &table_list,
-                                                                 /* if_exists */ FALSE,
-                                                                 /* drop_temporary */ FALSE,
-                                                                 /* drop_view */ FALSE,
-                                                                 /* dont_log_query*/ TRUE);
+                                 FALSE,   /* if_exists */
+                                 FALSE,   /* drop_temporary */ 
+                                 FALSE,   /* drop_view */
+                                 TRUE,    /* dont_log_query*/ 
+                                 FALSE);  /* need lock open */
+
       /* Clear error message that is returned when table is deleted */
       thd->clear_error();
     }
@@ -7029,7 +7030,7 @@ int ndbcluster_find_files(handlerton *ht
   }
 
   pthread_mutex_unlock(&LOCK_open);
-  
+
   hash_free(&ok_tables);
   hash_free(&ndb_tables);
 

--- 1.43/sql/sql_rename.cc	2006-12-31 01:06:36 +01:00
+++ 1.44/sql/sql_rename.cc	2007-07-02 19:14:47 +02:00
@@ -144,10 +144,14 @@ bool mysql_rename_tables(THD *thd, TABLE
     }
   }
 
-  VOID(pthread_mutex_lock(&LOCK_open));
-  if (lock_table_names(thd, table_list))
+  pthread_mutex_lock(&LOCK_open);
+  if (lock_table_names_exclusively(thd, table_list))
+  {
+    pthread_mutex_unlock(&LOCK_open);
     goto err;
-  
+  }
+  pthread_mutex_unlock(&LOCK_open);
+
   error=0;
   if ((ren_table=rename_tables(thd,table_list,0)))
   {
@@ -183,10 +187,14 @@ bool mysql_rename_tables(THD *thd, TABLE
     send_ok(thd);
   }
 
+  if (!error)
+    query_cache_invalidate3(thd, table_list, 0);
+
+  pthread_mutex_lock(&LOCK_open);
   unlock_table_names(thd, table_list, (TABLE_LIST*) 0);
+  pthread_mutex_unlock(&LOCK_open);
 
 err:
-  pthread_mutex_unlock(&LOCK_open);
   /* enable logging back if needed */
   if (disable_logs)
   {

--- 1.38/sql/sql_cache.h	2007-05-10 11:59:29 +02:00
+++ 1.39/sql/sql_cache.h	2007-07-02 19:14:46 +02:00
@@ -65,17 +65,44 @@ struct Query_cache_query;
 struct Query_cache_result;
 class Query_cache;
 
-
+/**
+  @brief This class represents a node in the linked chain of queries
+         belonging to one table.
+
+  @note The root of this linked list is not a query-type block, but the
+        table-type block which all queries have in common.
+*/
 struct Query_cache_block_table
 {
   Query_cache_block_table() {}                /* Remove gcc warning */
-  TABLE_COUNTER_TYPE n;		// numbr in table (from 0)
+
+  /**
+    This node holds a position in a static table list belonging
+    to the associated query (base 0).
+  */
+  TABLE_COUNTER_TYPE n;
+
+  /**
+    Pointers to the next and previous node, linking all queries with 
+    a common table.
+  */
   Query_cache_block_table *next, *prev;
+
+  /**
+    A pointer to the table-type block which all
+    linked queries have in common.
+  */
   Query_cache_table *parent;
+
+  /**
+    A method to calculate the address of the query cache block
+    owning this node. The purpose of this calculation is to 
+    make it easier to move the query cache block without having
+    to modify all the pointer addresses.
+  */
   inline Query_cache_block *block();
 };
 
-
 struct Query_cache_block
 {
   Query_cache_block() {}                      /* Remove gcc warning */
@@ -151,6 +178,11 @@ struct Query_cache_table
   /* data need by some engines */
   ulonglong engine_data_buff;
 
+  /**
+    The number of queries depending on this table.
+  */
+  int32 m_cached_query_count;
+
   inline char *db()			     { return (char *) data(); }
   inline char *table()			     { return tbl; }
   inline void table(char *table_arg)	     { tbl= table_arg; }
@@ -237,9 +269,14 @@ public:
   ulong free_memory, queries_in_cache, hits, inserts, refused,
     free_memory_blocks, total_blocks, lowmem_prunes;
 
+
 private:
-  pthread_cond_t COND_flush_finished;
-  bool flush_in_progress;
+  pthread_cond_t COND_cache_status_changed;
+
+  enum Cache_status { NO_FLUSH_IN_PROGRESS, FLUSH_IN_PROGRESS,
+                      TABLE_FLUSH_IN_PROGRESS };
+
+  Cache_status m_cache_status;
 
   void free_query_internal(Query_cache_block *point);
 
@@ -253,7 +290,7 @@ protected:
       2. query block (for operation inside query (query block/results))
 
     Thread doing cache flush releases the mutex once it sets
-    flush_in_progress flag, so other threads may bypass the cache as
+    m_cache_status flag, so other threads may bypass the cache as
     if it is disabled, not waiting for reset to finish.  The exception
     is other threads that were going to do cache flush---they'll wait
     till the end of a flush operation.
@@ -270,6 +307,7 @@ protected:
   /* options */
   ulong min_allocation_unit, min_result_data_size;
   uint def_query_hash_size, def_table_hash_size;
+  
   uint mem_bin_num, mem_bin_steps;		// See at init_cache & find_bin
 
   my_bool initialized;
@@ -295,10 +333,13 @@ protected:
 			      ulong data_len,
 			      Query_cache_block *query_block,
 			      my_bool first_block);
-  void invalidate_table(TABLE_LIST *table);
-  void invalidate_table(TABLE *table);
-  void invalidate_table(uchar *key, uint32  key_length);
-  void invalidate_table(Query_cache_block *table_block);
+  void invalidate_table(THD *thd, TABLE_LIST *table);
+  void invalidate_table(THD *thd, TABLE *table);
+  void invalidate_table(THD *thd, uchar *key, uint32  key_length);
+  void invalidate_table(THD *thd, Query_cache_block *table_block);
+  void invalidate_query_block_list(THD *thd, 
+                                   Query_cache_block_table *list_root);
+
   TABLE_COUNTER_TYPE
     register_tables_from_list(TABLE_LIST *tables_used,
                               TABLE_COUNTER_TYPE counter,
@@ -337,6 +378,8 @@ protected:
 	      Query_cache_block *pprev);
   my_bool join_results(ulong join_limit);
 
+  void wait_while_table_flush_is_in_progress(bool *interrupt);
+
   /*
     Following function control structure_guard_mutex
     by themself or don't need structure_guard_mutex
@@ -347,8 +390,7 @@ protected:
   Query_cache_block *write_block_data(ulong data_len, uchar* data,
 				       ulong header_len,
 				       Query_cache_block::block_type type,
-				       TABLE_COUNTER_TYPE ntab = 0,
-				       my_bool under_guard=0);
+				       TABLE_COUNTER_TYPE ntab = 0);
   my_bool append_result_data(Query_cache_block **result,
 			     ulong data_len, uchar* data,
 			     Query_cache_block *parent);
@@ -360,8 +402,7 @@ protected:
   inline ulong get_min_first_result_data_size();
   inline ulong get_min_append_result_data_size();
   Query_cache_block *allocate_block(ulong len, my_bool not_less,
-				     ulong min,
-				     my_bool under_guard=0);
+				     ulong min);
   /*
     If query is cacheable return number tables in query
     (query without tables not cached)
@@ -423,6 +464,11 @@ protected:
   friend void query_cache_insert(NET *net, const char *packet, ulong length);
   friend void query_cache_end_of_result(THD *thd);
   friend void query_cache_abort(NET *net);
+
+  bool is_flushing(void) 
+  { 
+    return (m_cache_status != Query_cache::NO_FLUSH_IN_PROGRESS);
+  }
 
   /*
     The following functions are only used when debugging