List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:February 25 2008 10:47am
Subject:review of patch for BUG#34634 "Concurrent pagecache_delete() and
eviction of the same page cause crash"
View as plain text  
Hello Sanja,

# ChangeSet
#   2008/02/22 10:42:29+02:00 bell@stripped 
#   Fixed problem of deleting blocks which are removing at the moment.

"delete" and "remove" mean the same thing in English.
Suggestion:
 Fixed problem of deleting blocks which are being evicted at the moment.

# storage/maria/ma_pagecache.c
#   2008/02/22 10:42:19+02:00 bell@stripped +33 -1
#   Avoid deleting blocks which already chosen for deleting.

They may have been chosen not for deleting but simply evicting.

>--- a/storage/maria/ma_pagecache.c      2008-02-22 10:45:48 +02:00
>+++ b/storage/maria/ma_pagecache.c      2008-02-22 10:45:48 +02:00
>@@ -1774,7 +1774,7 @@
>   PAGECACHE_HASH_LINK *hash_link;
>   PAGECACHE_BLOCK_LINK *block;
>   int error= 0;
>-  int page_status;
>+  int page_status, old_block_status;
> 
>   DBUG_ENTER("find_block");
>   KEYCACHE_THREAD_TRACE("find_block:begin");
>@@ -1909,7 +1909,12 @@
>       /* Resubmit the request */
>       goto restart;
>     }
>+    old_block_status= block->status;
>     block->status&= ~PCBLOCK_IN_SWITCH;
>+    if (old_block_status & PCBLOCK_IN_SWITCH)
>+      DBUG_PRINT("info", ("BUGINFO block loses PCBLOCK_IN_SWITCH; "
>+                          "status %d changes to %d",
>+                          old_block_status, block->status));

I added the above debug info to track down the problem and show the
wrong lines to the bugfixer, but I don't think it should go into the 
code. Especially this BUGINFO tag: it was to show where the bug's web
starts knitting, but it's not a bug per se to remove
BLOCK_IN_SWITCH. The bug was in pagecache_delete*(), not here.
If you think old_block_status and the DBUG_PRINT() are useful, then
please make sure that
old_block_status= block->status;
is not done in release builds, and remove the BUGINFO tag.

>   }
>   else
>   {
>@@ -2009,9 +2014,13 @@
>            ! (block->status & PCBLOCK_IN_SWITCH) )
>         {
>          /* this is a primary request for a new page */
>+          int old_block_status= block->status;

same here

>           DBUG_ASSERT(block->wlocks == 0);
>           DBUG_ASSERT(block->pins == 0);
>           block->status|= PCBLOCK_IN_SWITCH;
>+          DBUG_PRINT("info", ("BUGINFO block got PCBLOCK_IN_SWITCH; "
>+                              "status %d changes to %d",
>+                              old_block_status, block->status));
> 
>           KEYCACHE_DBUG_PRINT("find_block",
>                               ("got block %u for new page",
>@@ -3225,6 +3234,15 @@
>     if (!pagecache->can_be_used)
>       goto end;
> 
>+    DBUG_ASSERT((block->status & PCBLOCK_IN_SWITCH) == 0);
>+    if (block->status & PCBLOCK_REASSIGNED)

This is pagecache_delete_by_link(). As a link is provided by caller,
it implies that block is pinned by caller already. I imagine this pin
prevents anybody from evicting the block, so PCBLOCK_REASSIGNED is not
possible. Is this change really needed?
We have a lack of testing of this function: it's not used anywhere in
Maria except only once in a unit test which has no concurrency
(ma_pagecache_single.c), so the above change looks untested.

>+    {
>+      DBUG_PRINT("info", ("Block 0x%0lx already is reassigned",
>+                          (ulong) block));
>+      /* The block (will be | is) flushed and we can't prevent it */
>+      error= !flush;
>+      goto end;
>+    }
>     if (make_lock_and_pin(pagecache, block, lock, pin))
>     {
>       /*
>@@ -3336,6 +3354,15 @@
>       pagecache_pthread_mutex_unlock(&pagecache->cache_lock);
>       DBUG_RETURN(0);
>     }
>+    if (block->status & (PCBLOCK_REASSIGNED | PCBLOCK_IN_SWITCH))
>+    {
>+      DBUG_PRINT("info", ("Block 0x%0lx already is reassigned or in switch",
>+                          (ulong) block));
>+      /* The block (will be | is) flushed and we can't prevent it */
>+      error= !flush;

Not sure that "error" should be 1 if "flush" is 0.
'flush == 1' means "I want this page to be flushed".
'flush == 0' means: "I don't need this page to be flushed, but it's
not a drama if it's flushed". See how pagecache_delete() is used in
practice: ma_blockrec.c calls it with 'flush==0' when it wants to
eliminate blob pages out of the page cache during a
maria_delete(). It's ok if those pages are on disk already (they may
have been flushed by a checkpoint one second ago); it's just that as
we know we delete the row which fills the entire page, and mark the
page free in the bitmap, the data in this page does not matter, there
is no need to flush it.
Another reason: pagecache_delete() starts with:

    page_link= get_present_hash_link(pagecache, file, pageno, &unused_start);
    if (!page_link)
    {
      DBUG_PRINT("info", ("There is no such page in the cache"));
      pagecache_pthread_mutex_unlock(&pagecache->cache_lock);
      DBUG_RETURN(0);
    }

So, if it does not find the page, it returns ok. Thus, consider this
situation:

thread1 (calls pagecache_delete())        thread2 (evicts same page)

                                          evicts page X...
                                          sets IN_SWITCH...
                                          flushes to disk...
                                          sets REASSIGNED...
                                          waits for readers...
                                          reassigns to page Y...
                                          ...done

wants to delete page X...
finds no such page in the cache
so returns 0.

Change the timings a bit

thread1 (calls pagecache_delete())        thread2 (evicts same page)

                                          evicts page X...
                                          sets IN_SWITCH...
                                          flushes to disk...
                                          sets REASSIGNED...

wants to delete page X...
finds page in the cache
sees REASSIGNED
so returns 1.

                                          waits for readers...
                                          reassigns to page Y...
                                          ...done

There is no reason why the second scenario would return 1 if the first
scenario returns 0.

And it's going to provoke real-life bugs:
thread1 can be code in ma_blockrec.c deleting blob pages, and thread2
can be any other thread using the page cache, so thread1 will receive
an error and mark the table corrupted.

>+      page_link->requests--;
>+      goto end;
>+    }
>     block= page_link->block;
>     /* See NOTE for pagecache_unlock about registering requests. */
>     if (pin == PAGECACHE_PIN)
>@@ -3348,6 +3375,7 @@
>         lock is released, we will try to get the block again.
>       */
>       pagecache_pthread_mutex_unlock(&pagecache->cache_lock);
>+      page_link->requests--;

I suggest to decrement before releasing the mutex, otherwise it's
unprotected.

A few lines above this change, there is:
    if (pin == PAGECACHE_PIN)
      reg_requests(pagecache, block, 1);

So, if we come to the block where make_lock_and_pin() fails, we
should cancel this reg_requests().

>       DBUG_PRINT("info", ("restarting..."));
>       goto restart;
>     }
>@@ -3750,10 +3778,14 @@
>   KEYCACHE_THREAD_TRACE("free block");
>   KEYCACHE_DBUG_PRINT("free_block",
>                       ("block is freed"));
>+  if (block->requests > 1)
>+    DBUG_PRINT("info",("BUGINFO block->requests is %d", block->requests));

not sure we want my debug info in the code.
How about changing it to DBUG_ASSERT() (Ingo has one in mf_keycache.c
at this place)?

>   unreg_request(pagecache, block, 0);
>   block->hash_link= NULL;
> 
>   /* Remove the free block from the LRU ring. */
>+  if (block->next_used == NULL)
>+    DBUG_PRINT("info",("BUGINFO block->next_used is NULL"));

Suggestion: remove this and instead add
DBUG_ASSERT(block->next_used != NULL) at start of unlink_block()
function.

>   unlink_block(pagecache, block);
>   if (block->temperature == PCBLOCK_WARM)
>     pagecache->warm_blocks--;

Thanks for fixing this bug.

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /    Mr. Guilhem Bichot <guilhem@stripped>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Lead Software Engineer
/_/  /_/\_, /___/\___\_\___/   Bordeaux, France
       <___/   www.mysql.com   
Thread
review of patch for BUG#34634 "Concurrent pagecache_delete() andeviction of the same page cause crash"Guilhem Bichot25 Feb