Hello Serg,
Sergei Golubchik a écrit, Le 12/22/2008 07:04 PM:
> #At file:///usr/home/serg/Abk/mysql/6.0-maria/
I guess this will be ported back to 5.1-maria.
> 2707 Sergei Golubchik 2008-12-22
> Bug#40990 Maria: failure of maria.test & maria_notemebedded in deadlock
> detection
> detect a case when a blocker has removed itself and signalled after the
> condition timed out
> but before it (cond_wait) acquired the mutex back
> modified:
> include/waiting_threads.h
> mysys/waiting_threads.c
>
> === modified file 'include/waiting_threads.h'
> --- a/include/waiting_threads.h 2008-11-04 13:09:32 +0000
> +++ b/include/waiting_threads.h 2008-12-22 18:04:24 +0000
> @@ -182,6 +182,7 @@ typedef struct st_wt_thd {
> } WT_THD;
>
> #define WT_TIMEOUT ETIMEDOUT
> +#define WT_FREE_TO_GO 1
Can we be sure that on all platforms, ETIMEDOUT != 1 (is there some
POSIX rule somewhere) ?
Why not pick a negative value for our "custom errors" like is already
the case for WT_DEADLOCK, WT_DEPTH_EXCEEDED? We would use -3 for
WT_FREE_TO_GO.
> #define WT_OK 0
> #define WT_DEADLOCK -1
> #define WT_DEPTH_EXCEEDED -2
>
> === modified file 'mysys/waiting_threads.c'
> --- a/mysys/waiting_threads.c 2008-11-10 19:11:27 +0000
> +++ b/mysys/waiting_threads.c 2008-12-22 18:04:24 +0000
> @@ -613,7 +613,16 @@ static int deadlock(WT_THD *thd, WT_THD
> if (ret == WT_DEADLOCK && depth)
> change_victim(blocker, &arg);
> if (arg.rc)
> + {
> + /* special return code if there's nobody to wait for */
> + if (depth == 0 && ret == WT_OK && arg.rc->owners.elements ==
> 0)
I guess the added code has to be in deadlock() and not in
wt_thd_timedwait() (which could have been more natural), to avoid adding
locking/unlocking of the rc in wt_thd_timedwait().
So, looking at the new if(), if one of these four is false:
arg.rc
depth == 0
ret == WT_OK
arg.rc->owners.elements == 0
then it is considered ok to have missed the broadcast and go into the
long timeout wait... Let's see:
- if "arg.rc->owners.elements == 0" is false, then there is some other
thread to wait for, we have to do the long timeout
- if "ret == WT_OK" is false: it could be that ret==WT_DEADLOCK, imagine
then that we go into "/* notify the victim, if appropriate */" below:
the if() there sets 'ret' to WT_OK, so it looks like we will do the long
timeout wait; but the killed victim thread will call
wt_thd_release_self() and so wake us up, ok
- if "depth == 0" is false: then it's not wt_thd_cond_timedwait() which
is calling us, so ok; is there another more fundamental reason for the
"depth==0"?
- if "arg.rc" is false: assuming "depth" is 0, this is possible only if
deadlock_search(), at first call of the recursion, exceeded search depth
(which is impossible as it is the first call) or found no resource to
wait on ("exit: OK (early)"); that last case should be impossible, as
wt_thd_cond_timedwait() logically assumes thd->waiting_for!=NULL.
Given this, the patch looks ok to me, but is there a more
straightforward explanation of these added lines than mine?
I suggest adding a comment before this added if(). To me
"depth == 0 && ret == WT_OK" is the non-obvious part, I translate
"nobody to wait for" to only "arg.rc->owners.elements == 0".
> + {
> + DBUG_ASSERT(thd == blocker);
> + DBUG_ASSERT(arg.rc == thd->waiting_for);
> + ret= WT_FREE_TO_GO;
> + }
> rc_unlock(arg.rc);
> + }
> /* notify the victim, if appropriate */
> if (ret == WT_DEADLOCK && arg.victim != thd)
> {
> @@ -888,7 +897,10 @@ int wt_thd_cond_timedwait(WT_THD *thd, p
> ret= pthread_cond_timedwait(&rc->cond, mutex, &timeout);
> if (ret == WT_TIMEOUT)
> {
> - if (deadlock(thd, thd, 0, *thd->deadlock_search_depth_long))
> + int r= deadlock(thd, thd, 0, *thd->deadlock_search_depth_long);
> + if (r == WT_FREE_TO_GO)
> + ret= WT_OK;
> + else if (r != WT_OK)
> ret= WT_DEADLOCK;
> else if (*thd->timeout_long > *thd->timeout_short)
> {
If we had a lot of time free, we could put some DBUG_EXECUTE_IF(make a
pause) into our Windows pthread_cond_timedwait() implementation between
when it wakes up and when it takes the mutex again, so that we can test
the scenario I imagined (we don't even know if this scenario is what
causes BUG#40990 in practice). But this may be overkill.
--
Mr. Guilhem Bichot <guilhem@stripped>
Sun Microsystems / MySQL, Lead Software Engineer
Bordeaux, France
www.sun.com / www.mysql.com