>>>>> "Joe" == Joe Kislo <joe@stripped> writes:
>> As soon as the above problem (segfault) got to my attention I provided
>> a fix for it; I think there may have been a small delay as I may have
>> been on a conference trip when you reported it first time, but this is
>> the only reason it wasn't fixed at once; We are really trying to fix
>> bugs as soon as they come to our attention!
Joe> Agreed! Your response was most impressive. I think I was just taken
Joe> aback when the first thing I got back from the mysql devel team was to try
Joe> not dropping the table :)
Did anyone from the development team really say that? In that case I
have to apologize on his behalf; I am on the other hand quite sure
that if he said this, he only meant it as a temporary fix until we have
solved the problem...
>> I thought that I did fix the server :( After the fix, I couldn't get
>> any test program to produce anything wrong...
Joe> Yeah worked great after the patch. Now there seems to be a weird race
Joe> condition though [as described in my last message]. Only bites us once
Joe> a month. Once it nailed us twice in one weekend, so that stepped up the
Joe> effort to code around it :)
> Joe> 000301 21:30:08 4152 Query show tables like 'Event_CLASS'
> Joe> 4152 Query show tables like 'Event'
> Joe> 4152 Query LOCK TABLES Event_CLASS WRITE, Event WRITE;
> Joe> 4153 Query show tables like 'Event_CLASS'
> Joe> 4152 Query SELECT ClassKey, cl_previousEventID from Event_CLASS;
> Joe> 4153 Query show tables like 'Event'
> Joe> 4152 Query SELECT Event.eventAction, Event.rawTimeCreated, Event.courseID, Event.sectionID, Event.eventCode, Event.rawTimeUpdated, Event.eventTime, Event.eventID, Event.roundID from Event where ( Event.eventTime<'951964208.868' );
> Joe> 4153 Query LOCK TABLES Event_CLASS WRITE, Event WRITE;
> Joe> 4152 Query DELETE from Event where eventID='35';
> Joe> 4152 Query SELECT count(eventID) from Event;
> Joe> 4152 Query DROP table Event;
>> I can't see from the above that 4153 would get any LOCK on the table.
Joe> Are you saying this because 4153 should NOT get the lock, or because
Joe> mysql prints to the log SQL queries as it receives them --before it
Joe> processes them? Meaning that although 4153 printed "lock tables" to the
Joe> log prior to 4152 yielding the lock, 4153 did not GET the lock. yeah, I
Joe> think that makes sense.
Yes; The normal log will print the entry as soon as the server
receives it. The update log will print the entry when it has been processed.
Joe> I'm not sure WHO has the lock, but I was hoping
Joe> you guys were able to decipher the "waiting on cond" state that thread
Joe> 4152 turns into in this situation. I assume that means it's waiting on
Joe> a condition variable... any reason why it would be STUCK doing that?
After some additional thinking, you are probably right that this
could be a deadlock situation that we haven't thought about before;
This only happens if you are locking more than one table in two threads.
The problem is that to be able to drop Event, MySQL will send a signal
to all threads to close Event as soon as possible. It will detect
threads that are waiting for a lock on this table and abort the lock.
In this case the 4153 thread is probably waiting for Event_CLASS and
the code that breaks locks doesn't notice that 'Event' is kept open
by this thread :( In other words, 4152 is waiting for Event to come
free while 4153 is waiting for 4152 to unlock Event_CLASS :(
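
To make the suspected cycle concrete, here is a small Python sketch of a
wait-for graph. It is only an illustration of the hypothesis above, not
anything from the MySQL source: the thread ids come from the log, while
the function name find_wait_cycle and the dictionary layout are made up
for this example.

```python
def find_wait_cycle(waits_for):
    """Return the thread ids that form a wait cycle, or None if there is none.

    waits_for maps a thread id to the single thread id it is blocked on.
    (Hypothetical helper for illustration; not part of MySQL.)
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of "is waiting for" edges until it ends or repeats.
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            # The chain came back to a thread already visited: a deadlock.
            return seen[seen.index(current):]
    return None


# Edges encode the assumption made in this mail, nothing more.
waits_for = {
    4152: 4153,  # DROP TABLE Event waits for 4153 to close Event
    4153: 4152,  # LOCK TABLES waits for 4152 to unlock Event_CLASS
}
cycle = find_wait_cycle(waits_for)
print(cycle)
```

If the edge from 4153 back to 4152 is removed (that is, if the lock-abort
code did notice that 4153 keeps Event open), the function returns None.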
One reason that this doesn't always happen is that locks are taken in a
predetermined order depending on how the blocks are allocated in
memory. If Event is allocated before Event_CLASS the above code works
until something forces Event to be closed (for example a delete and
recreate of Event).
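
The general idea of taking locks in one fixed order can be sketched in
Python as follows. This is a rough analogy under stated assumptions, not
the MySQL lock code: lock_all and unlock_all are hypothetical helpers,
and sorting by id() (which in CPython happens to be the object's memory
address) merely stands in for "ordered by how the blocks are allocated".

```python
import threading


def lock_all(locks):
    """Acquire all locks in one global order, whatever order the caller used."""
    ordered = sorted(locks, key=id)  # assumption: id() stands in for allocation order
    for lock in ordered:
        lock.acquire()
    return ordered


def unlock_all(ordered):
    """Release the locks in the reverse of the order they were taken."""
    for lock in reversed(ordered):
        lock.release()


event = threading.Lock()        # stands in for table Event
event_class = threading.Lock()  # stands in for table Event_CLASS
counter = 0


def worker(requested):
    global counter
    for _ in range(1000):
        held = lock_all(requested)
        try:
            counter += 1  # protected by both locks
        finally:
            unlock_all(held)


# The two threads ask for the tables in opposite orders, but lock_all
# sorts them, so neither can hold one lock while waiting for the other.
t1 = threading.Thread(target=worker, args=([event_class, event],))
t2 = threading.Thread(target=worker, args=([event, event_class],))
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)
```

The sketch only covers the normal case where ordering is enough; it does
not model the forced close of a table that triggers the problem described
above.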
This means it's very likely that this is a bug in the MySQL code and
not in the thread library.
The big problem is that I, who am most familiar with the lock code, am
VERY busy for at least 2 weeks, so I can't promise a patch before then :(
The question is, now that you know the problem, can you work around it
in some way until we get time to fix this?