From: Ingo Strüwing
Date: January 21 2009 5:05pm
Subject: Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2749) Bug#37780
Hi Davi,

thanks for the review. Please find comments below.

Davi Arnaut, 20.01.2009 21:22:

...
> On 1/11/09 4:05 PM, Ingo Struewing wrote:
...
>>   2749 Ingo Struewing    2009-01-11
>>        Bug#37780 - main.kill fails randomly
...
>>        1. There is no safe way to detect, when a thread enters the
>> blocking
>>        state of a read(2) or recv(2) system call, where it can be
>> interrupted
>>        by a signal. Hence it is not possible to wait for the right
>> moment to
>>        send a kill signal. To be safe, we need to close the
>> connection. This
>>        has the disadvantage, that we cannot send a status or error
>> message.
>>        This patch tries to close the read direction of a socket only,
>> which
>>        avoids this problem, but terminates a read(2) or recv(2)
>> anyway. If it
>>        is not possible to keep the write direction open, we close the
>>        connection completely to be safe.
>
> The usual way to do this, as I've seen in other projects, is to use
> non-blocking i/o and have a file descriptor for thread notification.
> Have you considered this alternative?



No. I want to keep the changes to the VIO subsystem as small as
possible. But I agree that this would probably be the ultimate solution.

>
> Nonetheless, this approach of closing the socket needs more detailing:
>
> Is it portable? I mean, its it guaranteed (standard) that it will
> interrupt a thread blocked on a read/write?


No guarantee. It definitely does not work on Windows. But the change was
meant to reduce the chances of missing a kill, not to make it 100%
safe. The user can repeat the KILL statement anyway, but the tests
should benefit from a reduced chance to miss a kill.

>
> Will the client stay connected after the read direction is closed?


Yes. Keeping the write direction open means that the client can still
receive data. The worst that can happen is that the client gets an
error on an attempt to write to the socket. If it interprets this as
"Lost connection to server", then we have won anyway. Only an error
message from the server might get lost.

>
> Why don't we just default to close the socket? Sending a error message
> seems like a corner case that is only meaningful when the connection is
> sitting idle.


The client can read the error message from the socket whenever it is
going to await a reply from the server. The socket buffers it until then.

>
> Is there anything that prevents the thread being killed to try to read
> or write to the "closed" socket? For example, a re-using case:
>
> thr1> sleeping right before a write(fd=10)
> thr2> kill thr1
> thr2> close(fd=10)
> thr0> new connection, fd=accept()=10
> thr1> awakes, write(fd=10)


That's an interesting scenario. Perhaps it is the reason why killing by
close was disabled some time ago.

When closing the read direction only, the file descriptor remains in
use. So we are safe in this case.

The above also suggests that we should not default to a full close if a
half-close does not work.

>
> What happens if we close one direction of a SSL connection?


Hm. Good catch. While analyzing the code path of the server
communication, I did not stumble over SSL. Now I have looked a bit deeper.

net_serv.cc uses calls like vio_read() and vio_write(); mysqld.cc and
sql_class.cc use vio_close(). Functions of these names exist in
viosocket.c, so I worked on these.

After trying to find code for SSL handling, I stumbled over definitions
in violite.h. The above-mentioned "functions" are macros. In the SSL case
they expand to vio_ssl_read(), vio_ssl_write(), and vio_ssl_close().

So I will have to rework the patch to make vio_close_read() work with
socket-based connections, while returning an error for other types. In
the case of SSL it will break the connection and prevent error message
transport, but that is probably better than missing the kill.

>
>>        2. Before waiting on a condition variable, we register it
>> together with
>>        a synchronizating mutex in THD::mysys_var. After this, we need
>> to test
>
> synchronizating -> synchronizing
>
>>        THD::killed again. At some places we did only test it in a loop
>>        condition before the registration. When THD::killed had been
>> set between
>>        this test and the registration, we entered waiting without
>> noticing the
>>        killed flag.
>
> OK.
>
>>        3a. The string value for the DEBUG_SYNC system variable
>> received '\0'
>>        terminators. It could not be re-used. This was a problem for
>> setting
>>        sync points in stored procedures/functions. It worked only with
>> the
>>        first execution of the procedure/function. Using a user
>> variable instead
>>        of a string literal did not work (e.g. SET DEBUG_SYNC= @a).
>> Fixed by a
>>        correct retrieval of the set value and ensuring a copy of the
>> string.
>
> This seems like a separate bug fix.


Yes, but without this fix I cannot rewrite all of the tests. Some would
still need their sleep, which can turn out too short if the machine is
extremely busy.

...
> I've moved the declarations to a mdl_sync.h header on December 12. You
> might want to pull from main and drop the conflicting changes as mine
> was done in a slightly different way.
>
> http://lists.mysql.com/commits/61466


I would like to keep adding sync points as easy as possible. If
debug_sync.h is included in mysql_priv.h, it is available in most files.
No extra check for "#include <debug_sync.h>" needs to be done after
adding a sync point. Since debug_sync.h is protected against double
inclusion, I'll leave the explicit includes in the files where you added
them.

BTW, the quoted patch does not include debug_sync.h itself.

You also moved the old macro DBUG_SYNC_POINT to debug_sync.h. This is
an independent facility. But I verified that it is no longer in real
use. So I will remove it together with its only use in sql_repl.cc,
which is not triggered from tests or otherwise anyway.

>
>> added:
>>    sql/debug_sync.h
>> modified:
>>    client/mysql.cc
>>    include/my_net.h
>>    include/violite.h
>>    mysql-test/r/kill.result
>>    mysql-test/suite/backup/t/backup_triggers_and_events.test
>>    mysql-test/t/debug_sync.test
>>    mysql-test/t/kill.test
>>    sql/CMakeLists.txt
>>    sql/Makefile.am
>>    sql/debug_sync.cc
>>    sql/event_queue.cc
>>    sql/lock.cc
>>    sql/mdl.cc
>>    sql/mysql_priv.h
>>    sql/mysqld.cc
>>    sql/slave.cc
>>    sql/sql_base.cc
>>    sql/sql_class.cc
>>    sql/sql_class.h
>>    sql/sql_parse.cc
>>    sql/sql_select.cc
>>    vio/viosocket.c
>>
>> per-file messages:
>>    client/mysql.cc
>>      Bug#37780 - main.kill fails randomly
>
> Out of curiosity, why do you add the bug name to each per-file comment?


Once upon a time, when we had BitKeeper, I often used revtool on single
files to find certain changes. Bug/WL numbers helped me avoid the detour
through the changeset comment to understand in which context a change
was added. This may work with bzr too, or may be added later.

...
>> --- a/client/mysql.cc    2008-12-05 01:05:05 +0000
>> +++ b/client/mysql.cc    2009-01-11 18:05:39 +0000
>> @@ -1133,10 +1133,13 @@ int main(int argc,char *argv[])
>>     if (mysql_server_init(embedded_server_arg_count,
>> embedded_server_args,
>>                           (char**) embedded_server_groups))
>>     {
>> +    /* purecov: begin inspected */
>>       put_error(NULL);
>>       free_defaults(defaults_argv);
>> +    batch_readline_end(status.line_buff);
>>       my_end(0);
>>       exit(1);
>> +    /* purecov: end */
>
> Unrelated, please commit separated so it can be pushed early.


Reporting a bug for this, and waiting for it to be tagged by triage,
may well take longer than having it included here.

...
>> --- a/mysql-test/t/debug_sync.test    2008-04-29 09:22:04 +0000
>> +++ b/mysql-test/t/debug_sync.test    2009-01-11 18:05:39 +0000
>> @@ -206,7 +206,7 @@ SET GLOBAL DEBUG_SYNC= 'p0 CLEAR';
>>   #       immediately after setting of the DEBUG_SYNC variable.
>>   #       So it is executed before the SET statement ends.
>>   #
>> -# NOTE: There is only on global signal (say "signal post" or "flag
>> mast").
>> +# NOTE: There is only one global signal (say "signal post" or "flag
>> mast").
>>   #       A SIGNAL action writes its signal into it ("sets a flag").
>>   #       The signal persists until explicitly overwritten.
>>   #       To avoid confusion for later tests, it is recommended to clear
>
> Feel free to push this kind of stuff anytime without approval.


This is not what I have learned in my 5 years with MySQL. Never push
anything without approval. The only exception is changes made during a
merge.

While I would like to have a rule that allows pushing trivial changes, I
fear it can go wrong to leave it to the developer to decide what a
"trivial" change is.

>
>>
>> === modified file 'mysql-test/t/kill.test'
>> --- a/mysql-test/t/kill.test    2008-05-25 07:19:02 +0000
>> +++ b/mysql-test/t/kill.test    2009-01-11 18:05:39 +0000
>> @@ -4,329 +4,594 @@
>>   # This would work if mysqltest run would be threaded and handle each
>>   # connection in a separate thread.
>>   #
>> --- source include/not_embedded.inc
>> +--source include/not_embedded.inc
>> +--source include/have_debug_sync.inc
>
> Hum, we now restrict kill.test to debug builds. BTW, all tests really
> needed a port to sync points?


Yes. All tests had a sleep or multiple wait_conditions (short sleeps) in
them. My goal was to replace them all. This made the test case much
faster. OTOH each sleep can someday turn out too short. So syncing seems
to be the right thing to do.

>
> [..]
>
>> +
>> +--echo # Create two connections for use throughout the test case.
>> +--connect (con1, localhost, root,,)
>> +--connect (con2, localhost, root,,)
>> +
>> +--echo #
>> +--echo # Test kill connection when blocked in read/recv().
>> +--echo #
>> +
>> +    --echo # connection con1, store own connection id
>> +    --connection con1
>> +    --let $ID= `SELECT @id := CONNECTION_ID()`
>> +
>> +        --echo # connection con2, store con1 connection id
>> +        --connection con2
>> +        --let $ignore= `SELECT @id := $ID`
>> +
>> +    --echo # connection con1, request a signal from thread end
>> +    --connection con1
>> +    SET DEBUG_SYNC= 'thread_end SIGNAL con1_end';
>> +    #
>> +    --echo # connection con1, disable reconnect
>> +    --disable_reconnect
>
> Whats up with all those spaces? never seem this on test case and it
> makes things harder to read.


Oh. What a surprise. Everyone else I have talked to about this agreed
that indenting the threads makes the tests more readable. Or is it the
empty lines that disturb you?

...

>> +  Note that it is important to repeat the test for thd->killed after
>> +  enter_cond(). Otherwise the killing thread may kill this thread after
>> +  it tested thd->killed in the loop condition and bfore it registered
>
> bfore > before
>
>> +  the condition variable and mutex in enter_cond(). In this case, the
>> +  killing thread does not know that this thread is going to wait on a
>> +  condition variable. It would just set THD::killed. But if we would not
>> +  test it again, we would go asleep though we are killed. If the killing
>> +  thread would kill us when we are after the second test, but still
>> +  before sleeping, we hold the mutex, which is registered in mysys_var.
>> +  The killing thread would try to acquire the mutex before signalling
>
> signalling -> signaling
>
>> +  the condition variable. Since the mutex is only released implicitly in
>> +  pthread_cond_wait(), the signalling happens at the right place. We
>> +  have a safe synchronization.
>
> I think you should just drop the above paragraph and point out that
> people should know how to properly use wait conditions.


Not everybody is as familiar with "wait conditions" as you. IMHO we have
far too few comments in the code. Many things are not properly
documented. This may be a result of specialists assuming that everybody
knows how things work.

If people knew "how to properly use wait conditions", the mistake of not
checking thd->killed inside the "condition" would not have been made.

Please allow me to leave the paragraph in.

...

>> --- a/sql/event_queue.cc    2008-05-14 13:49:41 +0000
>> +++ b/sql/event_queue.cc    2009-01-11 18:05:39 +0000
>> @@ -723,6 +723,18 @@ Event_queue::unlock_data(const char *fun
>>         abstime If not null then call pthread_cond_timedwait()
>>         func    Which function is requesting cond_wait
>>         line    On which line cond_wait is requested
>> +
>> +  @note 'killed': It is not sufficient to test thd->killed at the top of
>> +  the loop (from which this function is called). If the thread is killed
>> +  after it tested thd->killed and before it registered the mutex (here:
>> +  LOCK_event_queue) in its mysys_var (in enter_cond()), the killing
>> +  thread won't do anything but set thd->killed. If we don't check it
>> +  again after registering the mutex, we would start waiting, though
>> +  thd->killed is set. The solution is to check thd->killed again, after
>> +  registering the mutex and before start waiting. An additional
>> +  advantage is the memory barrier, included in the mutex handling. This
>> +  prevents out-of-order execution in the processor. Thus we cannot read
>> +  an old value.
>>   */
>
> Let's shorten or drop this comment, its already explained in various
> other places.


I noticed the same when I read through my patch. But then I thought that
a reader of the code won't normally read all these places in one go. So
I think it is better to repeat this explanation in different places than
to risk someone copying a section without the explanation and removing
the extra check of thd->killed to "optimize" the code.

...
>> --- a/sql/lock.cc    2008-10-23 16:29:52 +0000
>> +++ b/sql/lock.cc    2009-01-11 18:05:39 +0000
>> @@ -1127,6 +1127,14 @@ bool lock_global_read_lock(THD *thd)
>>       const char *new_message= "Waiting to get readlock";
>>       (void) pthread_mutex_lock(&LOCK_global_read_lock);
>>
>> +    old_message= thd->enter_cond(&COND_global_read_lock,
>> +                                 &LOCK_global_read_lock,
>> +                                 new_message);
>> +    DBUG_PRINT("info",
>> +               ("waiting_for: %d  protect_against: %d",
>> +                waiting_for_read_lock,
>> protect_against_global_read_lock));
>> +
>> +    waiting_for_read_lock++;
>
> Care to explain why this is being moved?


I thought I had explained it sufficiently in the code comment below:

...
>>   #if defined(ENABLED_DEBUG_SYNC)
...
>> +      Call the sync point after enter_cond() so that the proc_info is
>> +      available immediately after the sync point sends a SIGNAL. This
>> +      makes tests more reliable.


If this explanation is insufficient, can you please say what the open
questions are?

...

>> --- a/sql/mysqld.cc    2008-12-16 20:54:07 +0000
>> +++ b/sql/mysqld.cc    2009-01-11 18:05:39 +0000
>> @@ -5804,7 +5804,6 @@ enum options_mysqld
>>     OPT_BACKUP_PROGRESS_LOG,
>>     OPT_SLOW_LOG,
>>     OPT_THREAD_HANDLING,
>> -  OPT_INNODB_ROLLBACK_ON_TIMEOUT,
>>     OPT_SECURE_FILE_PRIV,
>>     OPT_MIN_EXAMINED_ROW_LIMIT,
>>     OPT_LOG_SLOW_SLAVE_STATEMENTS,
>
> Push separatedly. You don't need approval for trivial stuff.


As mentioned before, I doubt it. I would at least need my devlead to
agree to this once, which has never happened yet. And I'm definitely not
the only one who does not push even trivial stuff without approval.

>
>>
>> === modified file 'sql/slave.cc'
>> --- a/sql/slave.cc    2008-12-08 11:29:15 +0000
>> +++ b/sql/slave.cc    2009-01-11 18:05:39 +0000
>> @@ -557,9 +557,17 @@ int start_slave_thread(pthread_handler h
>>       while (start_id == *slave_run_id)
>>       {
>>         DBUG_PRINT("sleep",("Waiting for slave thread to start"));
>> -      const char* old_msg = thd->enter_cond(start_cond,cond_lock,
>> -                                            "Waiting for slave thread
>> to start");
>> -      pthread_cond_wait(start_cond,cond_lock);
>> +      const char* old_msg= thd->enter_cond(start_cond,cond_lock,
>> +                                           "Waiting for slave thread
>> to start");
>> +      /*
>> +        It is not sufficient to test this at loop bottom. We must test
>> +        it after registering the mutex in enter_cond(). If the kill
>> +        happens after testing of thd->killed and before the mutex is
>> +        registered, we would otherwise go waiting though thd->killed is
>> +        set.
>> +      */
>> +      if (!thd->killed)
>> +        pthread_cond_wait(start_cond,cond_lock);
>>         thd->exit_cond(old_msg);
>>         pthread_mutex_lock(cond_lock); // re-acquire it as exit_cond()
>> released
>>         if (thd->killed)
>
> BTW, don't we need to release start_lock on this error path?


Agree. Seems like the slave has never before been killed in this situation.

...
>> +  @note 'killed': It is not sufficient to test thd->killed at the top of
>> +  the loop. If the thread is killed after it tested thd->killed and
>> +  before it registered the mutex (here: LOCK_open) in its mysys_var (in
>> +  enter_cond()), the killing thread won't do anything but set
>> +  thd->killed. If we don't check it again after registering the mutex,
>> +  we would start waiting, though thd->killed is set. The solution is to
>> +  check thd->killed again, after registering the mutex and before start
>> +  waiting. Since we cannot 'break' between enter_cond() and exit_cond(),
>> +  we just omit the wait. An additional advantage is the memory barrier,
>> +  included in the mutex handling. This prevents out-of-order execution
>> +  in the processor. Thus we cannot read an old value.
>
> Theorically, this only applies for the same mutex. Better stick to
> pthread rules.


Agree. But do you suggest a change here?

...
>> +  /*
>> +    In case of a connection kill, we want to place a memory barrier
>> +    before sending the kill signal. This should prevent the processor
>> +    from out-of-order execution of setting the killed flag and sending
>> +    the signal. Also a possible existing write queue in the processor is
>> +    to be flushed. The memory barrier is included in mutex locking and
>> +    unlocking. We can use an arbitrary un-locked mutex here, as we do
>> +    not always read the flag in a mutex. The cases where we do it are
>> +    handled below (mysys_var).
>> +  */
>> +  if (state_to_set != THD::KILL_QUERY)
>> +    pthread_mutex_lock(&LOCK_thread_count);
>> +
>> +  /*
>> +    Set the 'killed' flag of 'this', which is the target THD object.
>> +  */
>>     killed= state_to_set;
>> +
>>     if (state_to_set != THD::KILL_QUERY)
>>     {
>> +    /* Memory barrier after setting the killed flag. */
>> +    pthread_mutex_unlock(&LOCK_thread_count);
>
> Again, the only visibility guarantee is for threads that will lock this
> mutex.


Agree. But still the goal is to reduce the chance of missing a kill, not
to make it guaranteed safe. I guess that on many systems the mutex
handling includes a memory barrier, so that we avoid out-of-order
execution here on the write side. This may reduce the probability of a
miss.

But I was unsure from the beginning whether this construct would be
accepted, especially as it establishes a required lock order between
LOCK_delete and LOCK_thread_count. I am ready to remove the locks here,
or replace them with another mutex, if the reviewers want that.

...
>>         /*
>> -        In addition to a signal, let's close the socket of the thread
>> that
>> -        is being killed. This is to make sure it does not block if the
>> -        signal is lost. This needs to be done only on platforms where
>> -        signals are not a reliable interruption mechanism.
>> -
>> -        If we're killing ourselves, we know that we're not blocked,
>> so this
>> -        hack is not used.
>> +        In addition to a signal, let's close the socket of the thread
>> +        that is being killed. This is to make sure it does not block if
>> +        the signal is lost. Another reason is that the target
>> +        thread may not have been blocked on the socket at the moment
>> +        when we sent the signal. It may come later to the blocking
>> +        point. The closed connection prevents it from blocking.
>> +
>> +        We try to keep the write direction of the socket open. This
>> +        allows to send a status or error message. If it fails, the
>> +        connection is completely closed.
>> +
>> +        If we are killing ourselves, we know that we are not blocked.
>> +        We also know that we will check thd->killed before we go for
>> +        reading the next statement.
>>         */
>>
>>         close_active_vio();
>
> Back to my initial question, what prevents the thread being killed to
> read from or write to this socket?


I guess your question means: "... after the read direction is closed".
Then the answer is that a read will return 0 (EOF) and a write will
succeed.

If the socket is closed in both directions, read and write will return
-1 and EBADF. If a read was waiting for data when the socket was closed,
it will probably also return 0 (EOF). If a write was waiting for space
in the socket buffer, it will probably return -1 and EPIPE, and perhaps
emit SIGPIPE, which we set to ignore.

...
>> +int vio_close_read(Vio *vio)
>> +{
>> +  int rc= 0;
>> +  DBUG_ENTER("vio_close_read");
>> +
>> +  if (vio->type != VIO_CLOSED)
>> +  {
>> +    DBUG_ASSERT(vio->sd>= 0);
>> +
>> +    if ((vio->type == VIO_TYPE_NAMEDPIPE) || shutdown(vio->sd, SHUT_RD))
>
> What if the type is VIO_TYPE_SHARED_MEMORY?


As mentioned above, I have to fix this.

>
>> +      rc= vio_close(vio); /* purecov: inspected */
>> +  }
>> +  DBUG_RETURN(rc);
>> +}
>>
>
> I would prefer that the caller handled the close case.

Ok. I was indifferent between the two approaches. I'll go with your
preference.

Regards
Ingo
-- 
Ingo Strüwing, Database Group
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schröder, Wolfgang Engels, Dr. Roland Bömer
Vorsitzender des Aufsichtsrates: Martin Häring   HRB München 161028
