List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:January 5 2009 2:24pm
Subject:Re: bzr commit into mysql-6.0-maria branch (serg:2707) Bug#40990
View as plain text  
Hello Serg,

Sergei Golubchik a écrit, Le 12/22/2008 07:04 PM:
> #At file:///usr/home/serg/Abk/mysql/6.0-maria/

I guess this will be ported back to 5.1-maria.

>  2707 Sergei Golubchik	2008-12-22
>       Bug#40990 Maria: failure of maria.test & maria_notemebedded in deadlock
> detection
>       detect a case when a blocker has removed itself and signalled after the
> condition timed out
>       but before it (cond_wait) acquired the mutex back
> modified:
>   include/waiting_threads.h
>   mysys/waiting_threads.c
> 
> === modified file 'include/waiting_threads.h'
> --- a/include/waiting_threads.h	2008-11-04 13:09:32 +0000
> +++ b/include/waiting_threads.h	2008-12-22 18:04:24 +0000
> @@ -182,6 +182,7 @@ typedef struct st_wt_thd {
>  } WT_THD;
>  
>  #define WT_TIMEOUT              ETIMEDOUT
> +#define WT_FREE_TO_GO           1

Can we be sure that on all platforms, ETIMEDOUT != 1 (is there some 
POSIX rule somewhere) ?
Why not pick a negative value for our "custom errors" like is already 
the case for WT_DEADLOCK, WT_DEPTH_EXCEEDED? We would use -3 for 
WT_FREE_TO_GO.

>  #define WT_OK                   0
>  #define WT_DEADLOCK             -1
>  #define WT_DEPTH_EXCEEDED       -2
> 
> === modified file 'mysys/waiting_threads.c'
> --- a/mysys/waiting_threads.c	2008-11-10 19:11:27 +0000
> +++ b/mysys/waiting_threads.c	2008-12-22 18:04:24 +0000
> @@ -613,7 +613,16 @@ static int deadlock(WT_THD *thd, WT_THD 
>    if (ret == WT_DEADLOCK && depth)
>      change_victim(blocker, &arg);
>    if (arg.rc)
> +  {
> +    /* special return code if there's nobody to wait for */
> +    if (depth == 0 && ret == WT_OK && arg.rc->owners.elements ==
> 0)

I guess the added code has to be in deadlock() and not in 
wt_thd_timedwait() (which could have been more natural), to avoid adding 
locking/unlocking of the rc in wt_thd_timedwait().

So, looking at the new if(), if one of these four is false:
arg.rc
depth == 0
ret == WT_OK
arg.rc->owners.elements == 0

then it is considered ok to have missed the broadcast and go into the 
long timeout wait... Let's see:

- if "arg.rc->owners.elements == 0" is false, then there is some other 
thread to wait for, we have to do the long timeout
- if "ret == WT_OK" is false: it could be that ret==WT_DEADLOCK, imagine 
then that we go into "/* notify the victim, if appropriate */" below: 
the if() there sets 'ret' to WT_OK, so it looks like we will do the long 
timeout wait; but the killed victim thread will call 
wt_thd_release_self() and so wake us up, ok
- if "depth == 0" is false: then it's not wt_thd_cond_timedwait() which 
is calling us, so ok; is there another more fundamental reason for the 
"depth==0"?
- if "arg.rc" is false: assuming "depth" is 0, this is possible only if 
deadlock_search(), at first call of the recursion, exceeded search depth 
(which is impossible as it is the first call) or found no resource to 
wait on ("exit: OK (early)"); that last case should be impossible, as 
wt_thd_cond_timedwait() logically assumes thd->waiting_for!=NULL.

Given this, the patch looks ok to me, but is there a more 
straightforward explanation of these added lines than mine?
I suggest adding a comment before this added if(). To me
"depth == 0 && ret == WT_OK" is the non-obvious part, I translate 
"nobody to wait for" to only "arg.rc->owners.elements == 0".

> +    {
> +      DBUG_ASSERT(thd == blocker);
> +      DBUG_ASSERT(arg.rc == thd->waiting_for);
> +      ret= WT_FREE_TO_GO;
> +    }
>      rc_unlock(arg.rc);
> +  }
>    /* notify the victim, if appropriate */
>    if (ret == WT_DEADLOCK && arg.victim != thd)
>    {
> @@ -888,7 +897,10 @@ int wt_thd_cond_timedwait(WT_THD *thd, p
>      ret= pthread_cond_timedwait(&rc->cond, mutex, &timeout);
>    if (ret == WT_TIMEOUT)
>    {
> -    if (deadlock(thd, thd, 0, *thd->deadlock_search_depth_long))
> +    int r= deadlock(thd, thd, 0, *thd->deadlock_search_depth_long);
> +    if (r == WT_FREE_TO_GO)
> +      ret= WT_OK;
> +    else if (r != WT_OK)
>        ret= WT_DEADLOCK;
>      else if (*thd->timeout_long > *thd->timeout_short)
>      {

If we had a lot of time free, we could put some DBUG_EXECUTE_IF(make a 
pause) into our Windows pthread_cond_timedwait() implementation between 
when it wakes up and when it takes the mutex again, so that we can test 
the scenario I imagined (we don't even know if this scenario is what 
causes BUG#40990 in practice). But this may be overkill.

-- 
Mr. Guilhem Bichot <guilhem@stripped>
Sun Microsystems / MySQL, Lead Software Engineer
Bordeaux, France
www.sun.com / www.mysql.com
Thread
bzr commit into mysql-6.0-maria branch (serg:2707) Bug#40990Sergei Golubchik22 Dec
  • Re: bzr commit into mysql-6.0-maria branch (serg:2707) Bug#40990Guilhem Bichot5 Jan 2009
    • Re: bzr commit into mysql-6.0-maria branch (serg:2707) Bug#40990Sergei Golubchik7 Jan 2009