On 1/21/09 3:05 PM, Ingo Strüwing wrote:
> Hi Davi,
> thanks for the review. Please find comments below.
> Davi Arnaut, 20.01.2009 21:22:
>> On 1/11/09 4:05 PM, Ingo Struewing wrote:
>> The usual way to do this, as I've seen in other projects, is to use
>> non-blocking i/o and have a file descriptor for thread notification.
>> Have you considered this alternative?
> No. I want to keep the changes to the VIO subsystem as small as
> possible. But I agree that this would probably solve the problem ultimately.
Less is good, but not always.
>> Nonetheless, this approach of closing the socket needs more detailing:
>> Is it portable? I mean, is it guaranteed (by a standard) that it will
>> interrupt a thread blocked on a read/write?
> No guarantee. It definitely does not work on Windows. But the change was
> meant to reduce the chances of missing a kill, not to make it 100%
> safe. The user can repeat the KILL statement anyway, but the tests
> should benefit from a reduced chance of missing a kill.
OK, please make this clear in the changeset comment.
>> Will the client stay connected after the read direction is closed?
> Yes. Keeping the write direction open means that the client can still
> receive data. The worst thing that can happen is that the client gets an
> error on an attempt to write to the socket. If it interprets this as
> "Lost connection to server", then we've won anyway. At most an error
> message from the server might get lost.
My doubt is about a client polling its socket and bailing out in case
poll returns an error when one of the directions is closed. This needs
to be documented as a protocol change.
>> Why don't we just default to closing the socket? Sending an error message
>> seems like a corner case that is only meaningful when the connection is
>> sitting idle.
> The client can read the error message from the socket whenever it is
> going to await a reply from the server. The socket buffers it until then.
>> Is there anything that prevents the thread being killed from trying to
>> read or write to the "closed" socket? For example, a re-use case:
>> thr1> sleeping right before a write(fd=10)
>> thr2> kill thr1
>> thr2> close(fd=10)
>> thr0> new connection, fd=accept()=10
>> thr1> awakes, write(fd=10)
> That's an interesting scenario. Perhaps it is the reason why killing by
> close was disabled some time ago.
> When closing the read direction only, the file descriptor remains in
> use. So we are safe in this case.
My worries are related to the case where the socket is closed.
> The above also suggests not defaulting to a full close if a
> half-close does not work.
>> What happens if we close one direction of a SSL connection?
> Hm. Good catch. While analyzing the code path of the server
> communication, I did not stumble over SSL. Now I looked a bit deeper.
> net_serv.cc uses calls like vio_read() and vio_write(); mysqld.cc and
> sql_class.cc use vio_close(). Functions of these names exist in
> viosocket.c, so I worked on these.
> After trying to find code for SSL handling, I stumbled over definitions
> in violite.h. The above mentioned "functions" are macros. In case of SSL
> they expand to vio_ssl_read(), vio_ssl_write(), and vio_ssl_close().
> So I will have to rework the patch to make vio_close_read() work with
> socket based connections, while returning an error for other types. In
> case of SSL it will break the connection and prevent error message
> transport, but that's probably better than missing the kill.
>> I've moved the declarations to a mdl_sync.h header on December 12. You
>> might want to pull from main and drop the conflicting changes as mine
>> was done in a slightly different way.
> I would like to keep the addition of sync points as easy as possible. If
> debug_sync.h is included in mysql_priv.h, it is available in most files.
> No extra check for "#include <debug_sync.h>" needs to be done after
> adding a sync point. Since debug_sync.h has protection against double
> inclusion, I'll leave the explicit includes in the files where you added them.
Please don't add it back to mysql_priv.h. Ease of use is not a concern.
> BTW. The quoted patch does not include debug_sync.h itself.
Yes, I fixed it later.
> You did also move the old macro DBUG_SYNC_POINT to debug_sync.h. This is
> an independent facility. But I verified that it is not in real use any
> more. So I will remove it together with its only use in sql_repl.cc,
> which is not triggered from tests or otherwise anyway.
>>> per-file messages:
>>> Bug#37780 - main.kill fails randomly
>> Out of curiosity, why do you add the bug name to each per-file comment?
> Once upon a time, when we had BitKeeper, I often used revtool on single
> files to find certain changes. Bug/WL numbers helped me avoid taking the
> detour through the changeset comment to understand in which context a
> change was added. This may work with bzr too, or will be added later.
gannotate has a "General" tab which shows the main changeset comment and
a "Per-file" tab that shows the per-file comments for each line.
>>> --- a/client/mysql.cc 2008-12-05 01:05:05 +0000
>>> +++ b/client/mysql.cc 2009-01-11 18:05:39 +0000
>>> @@ -1133,10 +1133,13 @@ int main(int argc,char *argv)
>>> if (mysql_server_init(embedded_server_arg_count,
>>> (char**) embedded_server_groups))
>>> + /* purecov: begin inspected */
>>> + batch_readline_end(status.line_buff);
>>> + /* purecov: end */
>> Unrelated, please commit separated so it can be pushed early.
> Reporting a bug for this, and waiting for it to be tagged by triage,
> may well take longer than having it included here.
OK to push separately. I didn't suggest a separate bug report.
>>> --- a/mysql-test/t/debug_sync.test 2008-04-29 09:22:04 +0000
>>> +++ b/mysql-test/t/debug_sync.test 2009-01-11 18:05:39 +0000
>>> @@ -206,7 +206,7 @@ SET GLOBAL DEBUG_SYNC= 'p0 CLEAR';
>>> # immediately after setting of the DEBUG_SYNC variable.
>>> # So it is executed before the SET statement ends.
>>> -# NOTE: There is only on global signal (say "signal post" or "flag
>>> +# NOTE: There is only one global signal (say "signal post" or "flag
>>> # A SIGNAL action writes its signal into it ("sets a flag").
>>> # The signal persists until explicitly overwritten.
>>> # To avoid confusion for later tests, it is recommended to clear
>> Feel free to push this kind of stuff anytime without approval.
> This is not what I have learned in my 5 years with MySQL. Never push
> anything without approval. The only exception is changes made during a merge.
> While I would like to have a rule that allows pushing trivial changes, I
> fear it can go wrong to leave it to the developer to decide what a
> "trivial" change is.
Right, developers can't be trusted! :)
>>> === modified file 'mysql-test/t/kill.test'
>>> --- a/mysql-test/t/kill.test 2008-05-25 07:19:02 +0000
>>> +++ b/mysql-test/t/kill.test 2009-01-11 18:05:39 +0000
>>> @@ -4,329 +4,594 @@
>>> # This would work if mysqltest run would be threaded and handle each
>>> # connection in a separate thread.
>>> --- source include/not_embedded.inc
>>> +--source include/not_embedded.inc
>>> +--source include/have_debug_sync.inc
>> Hum, we now restrict kill.test to debug builds. BTW, did all tests really
>> need a port to sync points?
> Yes. All tests had a sleep or multiple wait_conditions (short sleeps) in
> them. My goal was to replace them all. This made the test case much
> faster. OTOH each sleep can someday become too short. So syncing seems
> to be the right thing to do.
>>> +--echo # Create two connections for use throughout the test case.
>>> +--connect (con1, localhost, root,,)
>>> +--connect (con2, localhost, root,,)
>>> +--echo #
>>> +--echo # Test kill connection when blocked in read/recv().
>>> +--echo #
>>> + --echo # connection con1, store own connection id
>>> + --connection con1
>>> + --let $ID= `SELECT @id := CONNECTION_ID()`
>>> + --echo # connection con2, store con1 connection id
>>> + --connection con2
>>> + --let $ignore= `SELECT @id := $ID`
>>> + --echo # connection con1, request a signal from thread end
>>> + --connection con1
>>> + SET DEBUG_SYNC= 'thread_end SIGNAL con1_end';
>>> + #
>>> + --echo # connection con1, disable reconnect
>>> + --disable_reconnect
>> What's up with all those spaces? I've never seen this in a test case
>> and it makes things harder to read.
> Oh. What a surprise. Everyone else I have talked to about this agreed
> that indentation per thread makes the tests more readable. Or are the
> empty lines disturbing you?
If it's just me... OK then.
>>> + Note that it is important to repeat the test for thd->killed after
>>> + enter_cond(). Otherwise the killing thread may kill this thread after
>>> + it tested thd->killed in the loop condition and bfore it registered
>> bfore> before
>>> + the condition variable and mutex in enter_cond(). In this case, the
>>> + killing thread does not know that this thread is going to wait on a
>>> + condition variable. It would just set THD::killed. But if we would not
>>> + test it again, we would go asleep though we are killed. If the killing
>>> + thread would kill us when we are after the second test, but still
>>> + before sleeping, we hold the mutex, which is registered in mysys_var.
>>> + The killing thread would try to acquire the mutex before signalling
>> signalling -> signaling
>>> + the condition variable. Since the mutex is only released implicitly in
>>> + pthread_cond_wait(), the signalling happens at the right place. We
>>> + have a safe synchronization.
>> I think you should just drop the above paragraph and point out that
>> people should know how to properly use wait conditions.
> Not everybody is as familiar with "wait conditions" as you. IMHO we have
> far too few comments in the code. Many things are not properly
> documented. This may be a result of specialists thinking that everybody
> knows how things work.
> If people knew "how to properly use wait conditions", the mistake of not
> checking thd->killed inside the "condition" would not have been made.
> Please allow me to leave the paragraph in.
OK, but if they didn't bother to read and understand the
pthread_cond_wait documentation, I don't think they will bother to read
your long comments either.
>>> --- a/sql/event_queue.cc 2008-05-14 13:49:41 +0000
>>> +++ b/sql/event_queue.cc 2009-01-11 18:05:39 +0000
>>> @@ -723,6 +723,18 @@ Event_queue::unlock_data(const char *fun
>>> abstime If not null then call pthread_cond_timedwait()
>>> func Which function is requesting cond_wait
>>> line On which line cond_wait is requested
>>> + @note 'killed': It is not sufficient to test thd->killed at the top of
>>> + the loop (from which this function is called). If the thread is killed
>>> + after it tested thd->killed and before it registered the mutex (here:
>>> + LOCK_event_queue) in its mysys_var (in enter_cond()), the killing
>>> + thread won't do anything but set thd->killed. If we don't check it
>>> + again after registering the mutex, we would start waiting, though
>>> + thd->killed is set. The solution is to check thd->killed again,
>>> + registering the mutex and before start waiting. An additional
>>> + advantage is the memory barrier, included in the mutex handling. This
>>> + prevents out-of-order execution in the processor. Thus we cannot read
>>> + an old value.
>> Let's shorten or drop this comment, it's already explained in various
>> other places.
> I noticed the same when I read through my patch. But then I thought that
> a reader of the code won't normally read all these places in one go. So
> I think it is better to repeat this explanation in different places than
> to risk someone copying a section without the explanation and removing
> the extra check of thd->killed to "optimize" the code.
>>> --- a/sql/lock.cc 2008-10-23 16:29:52 +0000
>>> +++ b/sql/lock.cc 2009-01-11 18:05:39 +0000
>>> @@ -1127,6 +1127,14 @@ bool lock_global_read_lock(THD *thd)
>>> const char *new_message= "Waiting to get readlock";
>>> (void) pthread_mutex_lock(&LOCK_global_read_lock);
>>> + new_message);
>>> + DBUG_PRINT("info",
>>> + ("waiting_for: %d protect_against: %d",
>>> + waiting_for_read_lock,
>>> + waiting_for_read_lock++;
>> Care to explain why this is being moved?
> I thought I had explained it enough in the code comment below:
>>> #if defined(ENABLED_DEBUG_SYNC)
>>> + Call the sync point after enter_cond() so that the proc_info is
>>> + available immediately after the sync point sends a SIGNAL. This
>>> + makes tests more reliable.
> If this explanation is insufficient, can you please say what the open
> questions are?
>>> --- a/sql/mysqld.cc 2008-12-16 20:54:07 +0000
>>> +++ b/sql/mysqld.cc 2009-01-11 18:05:39 +0000
>>> @@ -5804,7 +5804,6 @@ enum options_mysqld
>>> - OPT_INNODB_ROLLBACK_ON_TIMEOUT,
>> Push separately. You don't need approval for trivial stuff.
> As mentioned before, I doubt it. I would at least need my devlead to
> agree to this once. This has never happened yet. And I'm definitely not
> the only one who does not push even trivial stuff without approval.
OK from me to push separately.
>>> === modified file 'sql/slave.cc'
>>> --- a/sql/slave.cc 2008-12-08 11:29:15 +0000
>>> +++ b/sql/slave.cc 2009-01-11 18:05:39 +0000
>>> @@ -557,9 +557,17 @@ int start_slave_thread(pthread_handler h
>>> while (start_id == *slave_run_id)
>>> DBUG_PRINT("sleep",("Waiting for slave thread to start"));
>>> - const char* old_msg = thd->enter_cond(start_cond,cond_lock,
>>> - "Waiting for slave thread
>>> to start");
>>> - pthread_cond_wait(start_cond,cond_lock);
>>> + const char* old_msg= thd->enter_cond(start_cond,cond_lock,
>>> + "Waiting for slave thread
>>> to start");
>>> + /*
>>> + It is not sufficient to test this at loop bottom. We must test
>>> + it after registering the mutex in enter_cond(). If the kill
>>> + happens after testing of thd->killed and before the mutex is
>>> + registered, we would otherwise go waiting though thd->killed is
>>> + set.
>>> + */
>>> + if (!thd->killed)
>>> + pthread_cond_wait(start_cond,cond_lock);
>>> pthread_mutex_lock(cond_lock); // re-acquire it as exit_cond()
>>> if (thd->killed)
>> BTW, don't we need to release start_lock on this error path?
> Agree. Seems like the slave has never before been killed in this situation.
>>> + @note 'killed': It is not sufficient to test thd->killed at the top of
>>> + the loop. If the thread is killed after it tested thd->killed and
>>> + before it registered the mutex (here: LOCK_open) in its mysys_var (in
>>> + enter_cond()), the killing thread won't do anything but set
>>> + thd->killed. If we don't check it again after registering the mutex,
>>> + we would start waiting, though thd->killed is set. The solution is to
>>> + check thd->killed again, after registering the mutex and before start
>>> + waiting. Since we cannot 'break' between enter_cond() and exit_cond(),
>>> + we just omit the wait. An additional advantage is the memory barrier,
>>> + included in the mutex handling. This prevents out-of-order execution
>>> + in the processor. Thus we cannot read an old value.
>> Theoretically, this only applies to the same mutex. Better to stick to
>> the pthreads rules.
> Agree. But do you suggest a change here?
I need to think about it.
-- Davi Arnaut