Hi,
I see the LATEST subtest taking more than 15 minutes, which is not normal,
and looks like what the pushbuild Windows machines are showing.
In my tests, after a few seconds CPU goes to 0%.
So I freeze the process in the debugger (SIGSTOP-like) and I look at the
state of all threads. Many are in pthread_cond_timedwait called by
waiting_threads-t.c:93, waiting to acquire the "lock" global mutex
defined at the start of waiting_threads-t.c, after waking up from the condition
(my_wincond.c:142). Some others are waiting for the same mutex
(waiting_threads-t.c:92 or 100). Two are at my_wincond.c:124
(WaitForMultipleObjects(), the real cond wait), waiting to acquire the
same rwlock: one at waiting_threads.c:666 (write-lock), and the other
(read-lock, by the thread that owns the "lock" global mutex) at
waiting_threads.c:419.
Why don't the two threads wanting the rwlock get this rwlock?
The read-locker is the 8th recursive call to deadlock_search(); it wants
38d8b4 (hex address of rwlock), but it has already read-locked these
other rwlocks in previous calls of the recursion (waiting_threads.c:419):
38e638, 38d898, 38e638, 38afe8, 38dd38, 38a6d0, 3875c8.
This does not include the wanted rwlock.
So, I don't know.
Maybe it is the deadlock which you feared on certain rwlock
implementations? Note that Windows build is using our home-made
thr_rwlock.c implementation.
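For what it's worth, the list of read-locked rwlocks above can be
double-checked mechanically. Below is a minimal sketch in C; the array is
just the addresses copied from the debugger session, and the helper names
(count_held, first_duplicate, check_held_list) are made up for this
illustration and are not part of waiting_threads.c. It confirms that the
wanted rwlock is not in the list, and it also shows that 38e638 is listed
twice. That may be a copy/paste slip on my side or a real recursive
read-lock of the same rwlock; the sketch cannot tell which.

```c
#include <assert.h>
#include <stddef.h>

/* rwlock addresses already read-locked, as copied from the debugger */
static const unsigned long held[] = {
  0x38e638, 0x38d898, 0x38e638, 0x38afe8, 0x38dd38, 0x38a6d0, 0x3875c8
};
static const size_t n_held = sizeof(held) / sizeof(held[0]);

/* the rwlock the 8th deadlock_search() call is waiting for */
static const unsigned long wanted = 0x38d8b4;

/* hypothetical helper: how many times addr appears in held[] */
static size_t count_held(unsigned long addr)
{
  size_t i, n = 0;
  for (i = 0; i < n_held; i++)
    if (held[i] == addr)
      n++;
  return n;
}

/* hypothetical helper: first address listed more than once, 0 if none */
static unsigned long first_duplicate(void)
{
  size_t i, j;
  for (i = 0; i < n_held; i++)
    for (j = i + 1; j < n_held; j++)
      if (held[i] == held[j])
        return held[i];
  return 0;
}

/* hypothetical helper: the two observations, stated as assertions */
static void check_held_list(void)
{
  assert(count_held(wanted) == 0);        /* wanted rwlock is not held */
  assert(first_duplicate() == 0x38e638);  /* one address is listed twice */
}
```

Calling check_held_list() from a main() passes silently with the addresses
as listed.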
Still I noticed something: tens of threads are in
res= wt_thd_cond_timedwait(& thds[id].thd, &lock);
(waiting_threads-t.c:103) at the same moment. That means that they call
pthread_cond_timedwait() with different conditions but with the same
single mutex (the "lock" global one).
I know that using one cond with multiple mutexes is forbidden
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_cond_wait.html
but here it's the opposite situation...
If you have any idea...
--
Mr. Guilhem Bichot <guilhem@stripped>
Sun Microsystems / MySQL, Lead Software Engineer
Bordeaux, France
www.sun.com / www.mysql.com