From: Date: January 5 2009 2:24pm Subject: Re: bzr commit into mysql-6.0-maria branch (serg:2707) Bug#40990 List-Archive: http://lists.mysql.com/commits/62479 Message-Id: <49620A10.2000203@sun.com> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT Hello Serg, Sergei Golubchik a écrit, Le 12/22/2008 07:04 PM: > #At file:///usr/home/serg/Abk/mysql/6.0-maria/ I guess this will be ported back to 5.1-maria. > 2707 Sergei Golubchik 2008-12-22 > Bug#40990 Maria: failure of maria.test & maria_notemebedded in deadlock detection > detect a case when a blocker has removed itself and signalled after the condition timed out > but before it (cond_wait) acquired the mutex back > modified: > include/waiting_threads.h > mysys/waiting_threads.c > > === modified file 'include/waiting_threads.h' > --- a/include/waiting_threads.h 2008-11-04 13:09:32 +0000 > +++ b/include/waiting_threads.h 2008-12-22 18:04:24 +0000 > @@ -182,6 +182,7 @@ typedef struct st_wt_thd { > } WT_THD; > > #define WT_TIMEOUT ETIMEDOUT > +#define WT_FREE_TO_GO 1 Can we be sure that on all platforms, ETIMEDOUT != 1 (is there some POSIX rule somewhere) ? Why not pick a negative value for our "custom errors" like is already the case for WT_DEADLOCK, WT_DEPTH_EXCEEDED? We would use -3 for WT_FREE_TO_GO. > #define WT_OK 0 > #define WT_DEADLOCK -1 > #define WT_DEPTH_EXCEEDED -2 > > === modified file 'mysys/waiting_threads.c' > --- a/mysys/waiting_threads.c 2008-11-10 19:11:27 +0000 > +++ b/mysys/waiting_threads.c 2008-12-22 18:04:24 +0000 > @@ -613,7 +613,16 @@ static int deadlock(WT_THD *thd, WT_THD > if (ret == WT_DEADLOCK && depth) > change_victim(blocker, &arg); > if (arg.rc) > + { > + /* special return code if there's nobody to wait for */ > + if (depth == 0 && ret == WT_OK && arg.rc->owners.elements == 0) I guess the added code has to be in deadlock() and not in wt_thd_timedwait() (which could have been more natural), to avoid adding locking/unlocking of the rc in wt_thd_timedwait(). So, looking at the new if(), if one of these four is false: arg.rc depth == 0 ret == WT_OK arg.rc->owners.elements == 0 then it is considered ok to have missed the broadcast and go into the long timeout wait... Let's see: - if "arg.rc->owners.elements == 0" is false, then there is some other thread to wait for, we have to do the long timeout - if "ret == WT_OK" is false: it could be that ret==WT_DEADLOCK, imagine then that we go into "/* notify the victim, if appropriate */" below: the if() there sets 'ret' to WT_OK, so it looks like we will do the long timeout wait; but the killed victim thread will call wt_thd_release_self() and so wake us up, ok - if "depth == 0" is false: then it's not wt_thd_cond_timedwait() which is calling us, so ok; is there another more fundamental reason for the "depth==0"? - if "arg.rc" is false: assuming "depth" is 0, this is possible only if deadlock_search(), at first call of the recursion, exceeded search depth (which is impossible as it is the first call) or found no resource to wait on ("exit: OK (early)"); that last case should be impossible, as wt_thd_cond_timedwait() logically assumes thd->waiting_for!=NULL. Given this, the patch looks ok to me, but is there a more straightforward explanation of these added lines than mine? I suggest adding a comment before this added if(). To me "depth == 0 && ret == WT_OK" is the non-obvious part, I translate "nobody to wait for" to only "arg.rc->owners.elements == 0". > + { > + DBUG_ASSERT(thd == blocker); > + DBUG_ASSERT(arg.rc == thd->waiting_for); > + ret= WT_FREE_TO_GO; > + } > rc_unlock(arg.rc); > + } > /* notify the victim, if appropriate */ > if (ret == WT_DEADLOCK && arg.victim != thd) > { > @@ -888,7 +897,10 @@ int wt_thd_cond_timedwait(WT_THD *thd, p > ret= pthread_cond_timedwait(&rc->cond, mutex, &timeout); > if (ret == WT_TIMEOUT) > { > - if (deadlock(thd, thd, 0, *thd->deadlock_search_depth_long)) > + int r= deadlock(thd, thd, 0, *thd->deadlock_search_depth_long); > + if (r == WT_FREE_TO_GO) > + ret= WT_OK; > + else if (r != WT_OK) > ret= WT_DEADLOCK; > else if (*thd->timeout_long > *thd->timeout_short) > { If we had a lot of time free, we could put some DBUG_EXECUTE_IF(make a pause) into our Windows pthread_cond_timedwait() implementation between when it wakes up and when it takes the mutex again, so that we can test the scenario I imagined (we don't even know if this scenario is what causes BUG#40990 in practice). But this may be overkill. -- Mr. Guilhem Bichot Sun Microsystems / MySQL, Lead Software Engineer Bordeaux, France www.sun.com / www.mysql.com