List:Commits« Previous MessageNext Message »
From:Davi Arnaut Date:January 20 2009 8:22pm
Subject:Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2749)
Bug#37780
View as plain text  
Hi Ingo,

Excellent work, some initial comments below.

On 1/11/09 4:05 PM, Ingo Struewing wrote:
> #At file:///home2/mydev/bzrroot/mysql-6.0-bug37780-2/ based on
> revid:rafal.somla@stripped
>
>   2749 Ingo Struewing	2009-01-11
>        Bug#37780 - main.kill fails randomly
>
>        The test case main.kill did not work reliably.
>
>        The following problems have been identified:
>
>        1. A kill signal could go lost if it came in, short before a thread went
>        reading on the client connection.
>
>        2. A kill signal could go lost if it came in, short before a thread went
>        waiting on a condition variable.
>
>        3. The Debug Sync Facility implementation had flaws that prevented safe
>        synchronization is some cases.
>
>        These problems have been solved as follows. Please see also added code
>        comments for more details.
>
>        1. There is no safe way to detect, when a thread enters the blocking
>        state of a read(2) or recv(2) system call, where it can be interrupted
>        by a signal. Hence it is not possible to wait for the right moment to
>        send a kill signal. To be safe, we need to close the connection. This
>        has the disadvantage, that we cannot send a status or error message.
>        This patch tries to close the read direction of a socket only, which
>        avoids this problem, but terminates a read(2) or recv(2) anyway. If it
>        is not possible to keep the write direction open, we close the
>        connection completely to be safe.

The usual way to do this, as I've seen in other projects, is to use 
non-blocking i/o and have a file descriptor for thread notification.
Have you considered this alternative?

Nonetheless, this approach of closing the socket needs more detailing:

Is it portable? I mean, its it guaranteed (standard) that it will 
interrupt a thread blocked on a read/write?

Will the client stay connected after the read direction is closed?

Why don't we just default to close the socket? Sending a error message 
seems like a corner case that is only meaningful when the connection is 
sitting idle.

Is there anything that prevents the thread being killed to try to read 
or write to the "closed" socket? For example, a re-using case:

thr1> sleeping right before a write(fd=10)
thr2> kill thr1
thr2> close(fd=10)
thr0> new connection, fd=accept()=10
thr1> awakes, write(fd=10)

What happens if we close one direction of a SSL connection?

>        2. Before waiting on a condition variable, we register it together with
>        a synchronizating mutex in THD::mysys_var. After this, we need to test

synchronizating -> synchronizing

>        THD::killed again. At some places we did only test it in a loop
>        condition before the registration. When THD::killed had been set between
>        this test and the registration, we entered waiting without noticing the
>        killed flag.

OK.

>        3a. The string value for the DEBUG_SYNC system variable received '\0'
>        terminators. It could not be re-used. This was a problem for setting
>        sync points in stored procedures/functions. It worked only with the
>        first execution of the procedure/function. Using a user variable instead
>        of a string literal did not work (e.g. SET DEBUG_SYNC= @a). Fixed by a
>        correct retrieval of the set value and ensuring a copy of the string.

This seems like a separate bug fix.

>        3b. A Debug Sync point can time out and it can be killed. In both cases it
>        sets the THD::killed flag. When it times out, it reports an error. When
>        it is killed, it does not report an error. When it does report an error,
>        it must return a TRUE value, so that the calling function does not add
>        send_ok(). The detection of time out was based on the killed flag. This
>        failed when the sync point was killed. Fixed by making the return value
>        dependend from the error reporting state of the thread and not from the
>        killed flag.
>
>        In addition to the above, a re-write of the main.kill test case has been
>        done. Most sleeps have been replaced by Debug Sync Facility
>        synchronization. The test case run time decreased from over 30 to below
>        three seconds. A couple of sync points have been added to the server
>        code. the declarations have been moved from mysql_priv.h to the new file
>        debug_sync.h. This was required to place sync points the the meta data
>        locking code.

I've moved the declarations to a mdl_sync.h header on December 12. You 
might want to pull from main and drop the conflicting changes as mine 
was done in a slightly different way.

http://lists.mysql.com/commits/61466

> added:
>    sql/debug_sync.h
> modified:
>    client/mysql.cc
>    include/my_net.h
>    include/violite.h
>    mysql-test/r/kill.result
>    mysql-test/suite/backup/t/backup_triggers_and_events.test
>    mysql-test/t/debug_sync.test
>    mysql-test/t/kill.test
>    sql/CMakeLists.txt
>    sql/Makefile.am
>    sql/debug_sync.cc
>    sql/event_queue.cc
>    sql/lock.cc
>    sql/mdl.cc
>    sql/mysql_priv.h
>    sql/mysqld.cc
>    sql/slave.cc
>    sql/sql_base.cc
>    sql/sql_class.cc
>    sql/sql_class.h
>    sql/sql_parse.cc
>    sql/sql_select.cc
>    vio/viosocket.c
>
> per-file messages:
>    client/mysql.cc
>      Bug#37780 - main.kill fails randomly

Out of curiosity, why do you add the bug name to each per-file comment?

>      Fixed some memory leaks, which caused several test failures
>      on a Windows debug build. (unrelated)
>    include/my_net.h
>      Bug#37780 - main.kill fails randomly
>      Added definitions for SHUT_WR and SHUT_RD.

"Added Windows definitions for .."

>    include/violite.h
>      Bug#37780 - main.kill fails randomly
>      Added declaration for vio_close_read();
>    mysql-test/r/kill.result
>      Bug#37780 - main.kill fails randomly
>      Updated test result.
>    mysql-test/suite/backup/t/backup_triggers_and_events.test
>      Bug#37780 - main.kill fails randomly
>      Added a forgotten clean up. (unrelated)
>    mysql-test/t/debug_sync.test
>      Bug#37780 - main.kill fails randomly
>      Fixed a comment.
>    mysql-test/t/kill.test
>      Bug#37780 - main.kill fails randomly
>      Re-wrote test case to use Debug Sync points instead of sleeps.
>    sql/CMakeLists.txt
>      Bug#37780 - main.kill fails randomly
>      Added debug_sync.h.
>    sql/Makefile.am
>      Bug#37780 - main.kill fails randomly
>      Added debug_sync.h.
>    sql/debug_sync.cc
>      Bug#37780 - main.kill fails randomly
>      Included debug_sync.h.
>      Added a sync point.
>      Fixed time out detection from thd->killed to thd->is_error().
>      Fixed SET value retrieval and assured a copy is taken.
>      Fixed comments.
>      Added comments.
>    sql/debug_sync.h
>      Bug#37780 - main.kill fails randomly
>      Moved Debug Sync Facility declarations from mysql_priv.h to here.
>    sql/event_queue.cc
>      Bug#37780 - main.kill fails randomly
>      Fixed kill detection in Event_queue::cond_wait().
>    sql/lock.cc
>      Bug#37780 - main.kill fails randomly
>      Moved Debug Sync points behind enter_cond().
>      Fixed comments.
>    sql/mdl.cc
>      Bug#37780 - main.kill fails randomly
>      Included debug_sync.h.
>      Fixed a compiler warning.
>      Added Debug Sync points.
>    sql/mysql_priv.h
>      Bug#37780 - main.kill fails randomly
>      Moved Debug Sync Facility declarations from here to debug_sync.h
>      Included debug_sync.h.
>    sql/mysqld.cc
>      Bug#37780 - main.kill fails randomly
>      Removed an unused enumeration value. (unrelated)
>    sql/slave.cc
>      Bug#37780 - main.kill fails randomly
>      Fixed kill detection in start_slave_thread().
>    sql/sql_base.cc
>      Bug#37780 - main.kill fails randomly
>      Fixed kill detection in close_cached_tables() and
>      tdc_wait_for_old_versions().
>    sql/sql_class.cc
>      Bug#37780 - main.kill fails randomly
>      Encapsulated assignment to THD::killed in a mutex
>      to enforce a memory barrier.
>      Changed vio_close() to vio_close_read(), trying to keep the
>      write direction open. Disabled this for Windows, where closing
>      of the read direction only is insufficient.
>      Fixed and added comments.
>    sql/sql_class.h
>      Bug#37780 - main.kill fails randomly
>      Unconditionally enabled SIGNAL_WITH_VIO_CLOSE with a comment.
>    sql/sql_parse.cc
>      Bug#37780 - main.kill fails randomly
>      Added a sync point in do_command().
>    sql/sql_select.cc
>      Bug#37780 - main.kill fails randomly
>      Added a sync point in JOIN::optimize().
>    vio/viosocket.c
>      Bug#37780 - main.kill fails randomly
>      Added the new function vio_close_read(), which tries to close
>      the read direction of a socket and keep the write direction open.
> === modified file 'client/mysql.cc'
> --- a/client/mysql.cc	2008-12-05 01:05:05 +0000
> +++ b/client/mysql.cc	2009-01-11 18:05:39 +0000
> @@ -1133,10 +1133,13 @@ int main(int argc,char *argv[])
>     if (mysql_server_init(embedded_server_arg_count, embedded_server_args,
>                           (char**) embedded_server_groups))
>     {
> +    /* purecov: begin inspected */
>       put_error(NULL);
>       free_defaults(defaults_argv);
> +    batch_readline_end(status.line_buff);
>       my_end(0);
>       exit(1);
> +    /* purecov: end */

Unrelated, please commit separated so it can be pushed early.

>     }
>     glob_buffer.realloc(512);
>     completion_hash_init(&ht, 128);
> @@ -1203,8 +1206,12 @@ int main(int argc,char *argv[])
>         if (!(histfile_tmp= (char*) my_malloc((uint) strlen(histfile) + 5,
>   					    MYF(MY_WME))))
>         {
> -	fprintf(stderr, "Couldn't allocate memory for temp histfile!\n");
> -	exit(1);
> +        /* purecov: begin inspected */
> +        fprintf(stderr, "Couldn't allocate memory for temp histfile!\n");
> +        status.exit_status= 1;
> +        mysql_end(1);
> +        /* purecov: end */
> +        exit(1); /* purecov: deadcode */

Same as above.

>         }
>         sprintf(histfile_tmp, "%s.TMP", histfile);
>       }
> @@ -1238,11 +1245,11 @@ sig_handler mysql_end(int sig)
>       if (!write_history(histfile_tmp))
>         my_rename(histfile_tmp, histfile, MYF(MY_WME));
>     }
> -  batch_readline_end(status.line_buff);
>     completion_hash_free(&ht);
>     free_root(&hash_mem_root,MYF(0));
> -
>   #endif
> +  batch_readline_end(status.line_buff);
> +
>     if (sig>= 0)
>       put_info(sig ? "Aborted" : "Bye", INFO_RESULT);
>     glob_buffer.free();
>
> === modified file 'include/my_net.h'
> --- a/include/my_net.h	2008-02-14 20:53:25 +0000
> +++ b/include/my_net.h	2009-01-11 18:05:39 +0000
> @@ -61,6 +61,8 @@ C_MODE_START
>     #define SD_BOTH 0x02
>   */
>   #define SHUT_RDWR 0x02
> +#define SHUT_WR   SD_SEND
> +#define SHUT_RD   SD_RECEIVE
>
>   #endif
>
>
> === modified file 'include/violite.h'
> --- a/include/violite.h	2008-04-23 12:33:25 +0000
> +++ b/include/violite.h	2009-01-11 18:05:39 +0000
> @@ -60,6 +60,7 @@ int vio_close_pipe(Vio * vio);
>
>   void	vio_delete(Vio* vio);
>   int	vio_close(Vio* vio);
> +int	vio_close_read(Vio* vio);
>   void    vio_reset(Vio* vio, enum enum_vio_type type,
>                     my_socket sd, HANDLE hPipe, uint flags);
>   size_t	vio_read(Vio *vio, uchar *	buf, size_t size);
>

[..]

>
> === modified file 'mysql-test/t/debug_sync.test'
> --- a/mysql-test/t/debug_sync.test	2008-04-29 09:22:04 +0000
> +++ b/mysql-test/t/debug_sync.test	2009-01-11 18:05:39 +0000
> @@ -206,7 +206,7 @@ SET GLOBAL DEBUG_SYNC= 'p0 CLEAR';
>   #       immediately after setting of the DEBUG_SYNC variable.
>   #       So it is executed before the SET statement ends.
>   #
> -# NOTE: There is only on global signal (say "signal post" or "flag mast").
> +# NOTE: There is only one global signal (say "signal post" or "flag mast").
>   #       A SIGNAL action writes its signal into it ("sets a flag").
>   #       The signal persists until explicitly overwritten.
>   #       To avoid confusion for later tests, it is recommended to clear

Feel free to push this kind of stuff anytime without approval.

>
> === modified file 'mysql-test/t/kill.test'
> --- a/mysql-test/t/kill.test	2008-05-25 07:19:02 +0000
> +++ b/mysql-test/t/kill.test	2009-01-11 18:05:39 +0000
> @@ -4,329 +4,594 @@
>   # This would work if mysqltest run would be threaded and handle each
>   # connection in a separate thread.
>   #
> --- source include/not_embedded.inc
> +--source include/not_embedded.inc
> +--source include/have_debug_sync.inc

Hum, we now restrict kill.test to debug builds. BTW, all tests really 
needed a port to sync points?

[..]

> +
> +--echo # Create two connections for use throughout the test case.
> +--connect (con1, localhost, root,,)
> +--connect (con2, localhost, root,,)
> +
> +--echo #
> +--echo # Test kill connection when blocked in read/recv().
> +--echo #
> +
> +    --echo # connection con1, store own connection id
> +    --connection con1
> +    --let $ID= `SELECT @id := CONNECTION_ID()`
> +
> +        --echo # connection con2, store con1 connection id
> +        --connection con2
> +        --let $ignore= `SELECT @id := $ID`
> +
> +    --echo # connection con1, request a signal from thread end
> +    --connection con1
> +    SET DEBUG_SYNC= 'thread_end SIGNAL con1_end';
> +    #
> +    --echo # connection con1, disable reconnect
> +    --disable_reconnect

Whats up with all those spaces? never seem this on test case and it 
makes things harder to read.

[..]

>
> -###########################################################################
> +--echo # Restore concurrent_insert value. Keep in the end of the test file.
> +SET @@global.concurrent_insert= @old_concurrent_insert;
>
> -# Restore global concurrent_insert value. Keep in the end of the test file.
> -set @@global.concurrent_insert= @old_concurrent_insert;
>
> === modified file 'sql/CMakeLists.txt'
> --- a/sql/CMakeLists.txt	2008-11-12 01:37:42 +0000
> +++ b/sql/CMakeLists.txt	2009-01-11 18:05:39 +0000
> @@ -57,7 +57,7 @@ ADD_EXECUTABLE(mysqld
>                  sql_error.cc sql_handler.cc sql_help.cc sql_insert.cc sql_lex.cc
>                  sql_list.cc sql_load.cc sql_manager.cc sql_map.cc sql_parse.cc
>                  sql_partition.cc sql_plugin.cc sql_prepare.cc sql_rename.cc
> -               debug_sync.cc
> +               debug_sync.cc debug_sync.h
>                  sql_repl.cc sql_select.cc sql_show.cc sql_state.c sql_string.cc
>                  sql_table.cc sql_test.cc sql_trigger.cc sql_udf.cc sql_union.cc
>                  sql_update.cc sql_view.cc strfunc.cc table.cc thr_malloc.cc
>
> === modified file 'sql/Makefile.am'
> --- a/sql/Makefile.am	2008-12-03 16:53:23 +0000
> +++ b/sql/Makefile.am	2009-01-11 18:05:39 +0000
> @@ -78,6 +78,7 @@ noinst_HEADERS =	item.h item_func.h item
>   			ha_ndbcluster_binlog.h ha_ndbcluster_tables.h \
>   			ha_ndbcluster_connection.h ha_ndbcluster_connection.h \
>   			ha_partition.h rpl_constants.h \
> +			debug_sync.h \
>   			opt_range.h protocol.h rpl_tblmap.h rpl_utility.h \
>   			rpl_reporting.h \
>   			log.h sql_show.h rpl_rli.h rpl_mi.h \
>
> === modified file 'sql/debug_sync.cc'
> --- a/sql/debug_sync.cc	2008-11-20 11:01:12 +0000
> +++ b/sql/debug_sync.cc	2009-01-11 18:05:39 +0000
> @@ -13,11 +13,11 @@
>      along with this program; if not, write to the Free Software
>      Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA */
>
> -/*
> -  === Debug Sync Facility. ===
> +/**
> +  == Debug Sync Facility ==
>
> -  The Debug Sync Facility allows to place synchronization points in the
> -  code:
> +  The Debug Sync Facility allows placement of synchronization points in
> +  the server code by using the DEBUG_SYNC macro:
>
>         open_tables(...)
>
> @@ -27,7 +27,7 @@
>
>     When activated, a sync point can
>
> -    - Send a signal and/or
> +    - Emit a signal and/or
>       - Wait for a signal
>
>     Nomenclature:
> @@ -38,7 +38,7 @@
>                             or "flag mast". Then the signal is what is
>                             attached to the "signal post" or "flag mast".
>
> -    - send a signal:      Assign the value (the signal) to the global
> +    - emit a signal:      Assign the value (the signal) to the global
>                             variable ("set a flag") and broadcast a
>                             global condition to wake those waiting for
>                             a signal.
> @@ -46,16 +46,17 @@
>       - wait for a signal:  Loop over waiting for the global condition until
>                             the global value matches the wait-for signal.
>
> -  By default, all sync points are inactive. They do nothing (except of
> -  burning a couple of CPU cycles for checking if they are active).
> +  By default, all sync points are inactive. They do nothing (except to
> +  burn a couple of CPU cycles for checking if they are active).
>
> -  A sync point becomes active when an action is requested for it:
> +  A sync point becomes active when an action is requested for it.
> +  To do so, put a line like this in the test case file:
>
>         SET DEBUG_SYNC= 'after_open_tables SIGNAL opened WAIT_FOR flushed';
>
>     This activates the sync point 'after_open_tables'. It requests it to
> -  send the signal 'opened' and wait for another thread to send the signal
> -  'flushed' when the threads execution runs through the sync point.
> +  emit the signal 'opened' and wait for another thread to emit the signal
> +  'flushed' when the thread's execution runs through the sync point.
>
>     For every sync point there can be one action per thread only. Every
>     thread can request multiple actions, but only one per sync point. In
> @@ -73,14 +74,14 @@
>
>     When conn1 runs through the INSERT statement, it hits the sync point
>     'after_open_tables'. It notices that it is active and executes its
> -  action. It sends the signal 'opened' and waits for another thread to
> -  send the signal 'flushed'.
> +  action. It emits the signal 'opened' and waits for another thread to
> +  emit the signal 'flushed'.
>
>     conn2 waits immediately at the special sync point 'now' for another
> -  thread to send the 'opened' signal.
> +  thread to emit the 'opened' signal.
>
>     A signal remains in effect until it is overwritten. If conn1 signals
> -  'opened' before conn2 reaches 'now', it will still find the 'opened'
> +  'opened' before conn2 reaches 'now', conn2 will still find the 'opened'
>     signal. It does not wait in this case.
>
>     When conn2 reaches 'after_abort_locks', it signals 'flushed', which lets
> @@ -92,16 +93,16 @@
>
>         SET DEBUG_SYNC= 'name SIGNAL sig EXECUTE 3';
>
> -  This will set an activation counter to 3. Each execution decrements the
> -  counter. After the third execution the sync point becomes inactive in
> -  this example.
> +  This sets the signal point's activation counter to 3. Each execution
> +  decrements the counter. After the third execution the sync point
> +  becomes inactive.
>
>     One of the primary goals of this facility is to eliminate sleeps from
>     the test suite. In most cases it should be possible to rewrite test
>     cases so that they do not need to sleep. (But this facility cannot
> -  synchronize multiple processes.) However to support developing of the
> -  tests, and as a last resort, sync point waiting times out. There is a
> -  default timeout, but it can be overridden:
> +  synchronize multiple processes.) However, to support test development,
> +  and as a last resort, sync point waiting times out. There is a default
> +  timeout, but it can be overridden:
>
>         SET DEBUG_SYNC= 'name WAIT_FOR sig TIMEOUT 10 EXECUTE 2';
>
> @@ -120,7 +121,7 @@
>
>         SET DEBUG_SYNC= 'name SIGNAL sig EXECUTE 2 HIT_LIMIT 3';
>
> -  Here the first two hits send the signal, the third hit returns the error
> +  Here the first two hits emit the signal, the third hit returns the error
>     message and kills the query.
>
>     For cases where you are not sure that an action is taken and thus
> @@ -134,13 +135,13 @@
>
>     This is the only way to reset the global signal to an empty string.
>
> -  For testing of the facility itself you can test a sync point just as
> -  if it had been hit:
> +  For testing of the facility itself you can execute a sync point just
> +  as if it had been hit:
>
>         SET DEBUG_SYNC= 'name TEST';
>
>
> -  ==== Formal Syntax ====
> +  === Formal Syntax ===
>
>     The string to "assign" to the DEBUG_SYNC variable can contain:
>
> @@ -155,9 +156,10 @@
>     separated by '&|' must be present or both of them.
>
>
> -  ==== Activation/Deactivation ====
> +  === Activation/Deactivation ===
>
>     The facility is an optional part of the MySQL server.
> +  It is enabled in a debug server by default.
>
>         ./configure --enable-debug-sync
>
> @@ -178,8 +180,27 @@
>
>         mysql-test-run.pl ... --debug-sync-timeout=10 ...
>
> +  The command line option influences the readable value of the system
> +  variable 'debug_sync'.
> +
> +  * If the facility is not compiled in, the system variable does not exist.
> +
> +  * If --debug-sync-timeout=0 the value of the variable reads as "OFF".
> +
> +  * Otherwise the value reads as "ON - current signal: " followed by the
> +    current signal string, which can be empty.
> +
> +  The readable variable value is the same, regardless if read as global
> +  or session value.
> +
> +  Setting the 'debug-sync' system variable requires 'SUPER' privilege.
> +  You can never read back the string that you assigned to the variable,
> +  unless you assign the value that the variable does already have. But
> +  that would give a parse error. A syntactically correct string is
> +  parsed into a debug sync action and stored apart from the variable value.
> +
>
> -  ==== Implementation ====
> +  === Implementation ===
>
>     Pseudo code for a sync point:
>
> @@ -195,7 +216,79 @@
>     new action, the array is sorted again.
>
>
> -  ==== Further reading ====
> +  === A typical synchronization pattern ===
> +
> +  There are quite a few places in MySQL, where we use a synchronization
> +  pattern like this:
> +
> +  pthread_mutex_lock(&mutex);
> +  thd->enter_cond(&condition_variable,&mutex, new_message);
> +  #if defined(ENABLE_DEBUG_SYNC)
> +  if (!thd->killed&&  !end_of_wait_condition)
> +     DEBUG_SYNC(thd, "sync_point_name");
> +  #endif
> +  while (!thd->killed && !end_of_wait_condition)
> +    pthread_cond_wait(&condition_variable,&mutex);
> +  thd->exit_cond(old_message);
> +
> +  Here some explanations:
> +
> +  thd->enter_cond() is used to register the condition variable and the
> +  synchronizating mutex in thd->mysys_var. This is done to allow the
> +  thread to be interrupted (killed) from its sleep. Another thread can
> +  find the condition variable to signal and mutex to use for
> +  synchronization in this thread's THD::mysys_var.
> +
> +  thd->enter_cond() requires the mutex to be acquired in advance.
> +
> +  thd->exit_cond() unregisters the condition variable and mutex and
> +  releases the mutex.
> +
> +  If you want to have a Debug Sync point with the wait, please place it
> +  behind enter_cond(). Only then you can safely decide, if the wait will
> +  be taken. Also you will have THD::proc_info correct when the sync
> +  point emits a signal. DEBUG_SYNC sets its own proc_info, but restores
> +  the previous one before releasing its internal mutex. As soon as
> +  another thread sees the signal, it does also see the proc_info from
> +  before entering the sync point. In this case it will be "new_message",
> +  which is associated with the wait that is to be synchronized.
> +
> +  In the example above, the wait condition is repeated before the sync
> +  point. This is done to skip the sync point, if no wait takes place.
> +  The sync point is before the loop (not inside the loop) to have it hit
> +  once only. It is possible that the condition variable is signalled

signalled -> signaled

> +  multiple times without the wait condition to be true.
> +
> +  A bit off-topic: At some places, the loop is taken around the whole
> +  synchronization pattern:
> +
> +  while (!thd->killed&&  !end_of_wait_condition)
> +  {
> +    pthread_mutex_lock(&mutex);
> +    thd->enter_cond(&condition_variable,&mutex, new_message);
> +    if (!thd->killed [&&  !end_of_wait_condition])
> +    {
> +      [DEBUG_SYNC(thd, "sync_point_name");]
> +      pthread_cond_wait(&condition_variable,&mutex);
> +    }
> +    thd->exit_cond(old_message);
> +  }
> +
> +  Note that it is important to repeat the test for thd->killed after
> +  enter_cond(). Otherwise the killing thread may kill this thread after
> +  it tested thd->killed in the loop condition and bfore it registered

bfore > before

> +  the condition variable and mutex in enter_cond(). In this case, the
> +  killing thread does not know that this thread is going to wait on a
> +  condition variable. It would just set THD::killed. But if we would not
> +  test it again, we would go asleep though we are killed. If the killing
> +  thread would kill us when we are after the second test, but still
> +  before sleeping, we hold the mutex, which is registered in mysys_var.
> +  The killing thread would try to acquire the mutex before signalling

signalling -> signaling

> +  the condition variable. Since the mutex is only released implicitly in
> +  pthread_cond_wait(), the signalling happens at the right place. We
> +  have a safe synchronization.

I think you should just drop the above paragraph and point out that 
people should know how to properly use wait conditions.

> +  === Further reading ===
>
>     For a discussion of other methods to synchronize threads see
>     http://forge.mysql.com/wiki/MySQL_Internals_Test_Synchronization
> @@ -206,8 +299,21 @@
>     See also worklog entry WL#4259 - Test Synchronization Facility
>   */
>
> +/*
> +  Due to weaknesses in our include files, we need to include
> +  mysql_priv.h here. To have THD declared, we need to include
> +  sql_class.h. This includes log_event.h, which in turn requires
> +  declarations from mysql_priv.h (e.g. OPTION_AUTO_IS_NULL).
> +  mysql_priv.h includes almost everything, so is sufficient here.
> +*/
>   #include "mysql_priv.h"
>
> +/*
> +  This is included from mysql_priv.h already.
> +  Repeated here to document the normal way to use this facility.
> +*/
> +#include "debug_sync.h"
> +
>   #if defined(ENABLED_DEBUG_SYNC)
>
>   /*
> @@ -223,7 +329,7 @@ struct st_debug_sync_action
>     ulong         hit_limit;              /* hits before kill query */
>     ulong         execute;                /* executes before self-clear */
>     ulong         timeout;                /* wait_for timeout */
> -  String        signal;                 /* signal to send */
> +  String        signal;                 /* signal to emit */

You can push this s/send/emit/ noise anytime..

>     String        wait_for;               /* signal to wait for */
>     String        sync_point;             /* sync point name */
>     bool          need_sort;              /* if new action, array needs sort */
> @@ -458,6 +564,12 @@ void debug_sync_end_thread(THD *thd)
>     {
>       st_debug_sync_control *ds_control= thd->debug_sync_control;
>
> +    /*
> +      This synchronization point can be used to synchronize on thread end.
> +      This is the latest point in a THD's life, where this can be done.
> +    */
> +    DEBUG_SYNC(thd, "thread_end");
> +
>       if (ds_control->ds_action)
>       {
>         st_debug_sync_action *action= ds_control->ds_action;
> @@ -969,11 +1081,12 @@ static bool debug_sync_set_action(THD *t
>         The calling functions will not call send_ok(), if we return TRUE
>         from this function.
>
> -      To detect, if an error message has been reported, we can check one
> -      of the above conditions. That is, either examine the error message
> -      stack for an entry, or check thd->killed. We do the latter below.
> +      thd->killed is also set if the wait is interrupted from a
> +      KILL or KILL QUERY statement. In this case, no error is reported
> +      and shall not be reported as a result of SET DEBUG_SYNC.
> +      Hence, we check for the first condition above.
>       */
> -    if (thd->killed)
> +    if (thd->is_error())
>         DBUG_RETURN(TRUE);
>     }
>
> @@ -984,9 +1097,9 @@ static bool debug_sync_set_action(THD *t
>   /**
>     Extract a token from a string.
>
> -  @param[out]   token_p         returns start of token
> -  @param[out]   token_length_p  returns length of token
> -  @param[in]    ptr             current string pointer
> +  @param[out]     token_p         returns start of token
> +  @param[out]     token_length_p  returns length of token
> +  @param[in,out]  ptr             current string pointer, adds '\0' terminators
>
>     @return       string pointer or NULL
>       @retval     != NULL         ptr behind token terminator or at string end
> @@ -1468,17 +1581,40 @@ bool sys_var_debug_sync::check(THD *thd,
>       "Setting" of the system variable 'debug_sync' does not mean to
>       assign a value to it as usual. Instead a debug sync action is parsed
>       from the input string and stored apart from the variable value.
> +
> +  @note
> +    For efficiency reasons, the action string parser places '\0'
> +    terminators in the string. So we need to take a copy here.
>   */
>
>   bool sys_var_debug_sync::update(THD *thd, set_var *var)
>   {
> -  char empty= '\0';
> -  char *val_str= var ? var->value->str_value.c_ptr() :∅
> +  char   *val_str;
> +  String *val_ptr;
> +  String val_buf;
>     DBUG_ENTER("sys_var_debug_sync::update");
>     DBUG_ASSERT(thd);
>
> +  /*
> +    Depending on the value type (string literal, user variable, ...)
> +    val_buf receives a copy of the value or not. But we always need
> +    a copy. So we take a copy, if it is not done by val_str().
> +    If val_str() puts a copy into val_buf, then it returns&val_buf,
> +    otherwise it returns a pointer to the string object that we need
> +    to copy.
> +  */
> +  val_ptr= var ? var->value->val_str(&val_buf) :&val_buf;
> +  if (val_ptr !=&val_buf)
> +  {
> +    val_buf.copy(*val_ptr);
> +  }
> +  val_str= val_buf.c_ptr();
>     DBUG_PRINT("debug_sync", ("set action: '%s'", val_str));
>
> +  /*
> +    debug_sync_eval_action() places '\0' in the string, which itself
> +    must be '\0' terminated.
> +  */
>     DBUG_RETURN(opt_debug_sync_timeout ?
>                 debug_sync_eval_action(thd, val_str) :
>                 FALSE);
> @@ -1596,6 +1732,7 @@ static void debug_sync_execute(THD *thd,
>         threads just reads an old cached version of the signal.
>       */
>       pthread_mutex_lock(&debug_sync_global.ds_mutex);
> +
>       if (action->signal.length())
>       {
>         /* Copy the signal to the global variable. */
> @@ -1664,9 +1801,16 @@ static void debug_sync_execute(THD *thd,
>           }
>           error= 0;
>         }
> -      DBUG_PRINT("debug_sync_exec", ("%s from '%s'  at: '%s'",
> -                                     error ? "timeout" : "resume",
> -                                     sig_wait, dsp_name));
> +      DBUG_EXECUTE("debug_sync_exec",
> +                   if (thd->killed)
> +                     DBUG_PRINT("debug_sync_exec",
> +                                ("killed %d from '%s'  at: '%s'",
> +                                 thd->killed, sig_wait, dsp_name));
> +                   else
> +                     DBUG_PRINT("debug_sync_exec",
> +                                ("%s from '%s'  at: '%s'",
> +                                 error ? "timeout" : "resume",
> +                                 sig_wait, dsp_name)););
>
>         /*
>           We don't use enter_cond()/exit_cond(). They do not save old
> @@ -1682,12 +1826,12 @@ static void debug_sync_execute(THD *thd,
>         thd_proc_info(thd, old_proc_info);
>         pthread_mutex_unlock(&thd->mysys_var->mutex);
>
> -    } /* end if (action->wait_for.length()) */
> +    }
>       else
>       {
> -      // Make sure to realease mutex also when there is no WAIT_FOR
> +      /* In case we don't wait, we just release the mutex. */
>         pthread_mutex_unlock(&debug_sync_global.ds_mutex);
> -    }
> +    } /* end if (action->wait_for.length()) */
>
>     } /* end if (action->execute) */
>
>
> === added file 'sql/debug_sync.h'
> --- a/sql/debug_sync.h	1970-01-01 00:00:00 +0000
> +++ b/sql/debug_sync.h	2009-01-11 18:05:39 +0000
> @@ -0,0 +1,53 @@
> +/* Copyright (C) 2008 Sun Microsystems, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; version 2 of the License.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program; if not, write to the Free Software
> +   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA */
> +
> +/**
> +  @file
> +
> +  Declarations for the Debug Sync Facility. See debug_sync.cc for details.
> +*/
> +
> +#ifndef DEBUG_SYNC_H
> +#define DEBUG_SYNC_H
> +
> +#if defined(ENABLED_DEBUG_SYNC)
> +
> +/* Macro to be put in the code at synchronization points. */
> +#define DEBUG_SYNC(_thd_, _sync_point_name_)                            \
> +          do { if (unlikely(opt_debug_sync_timeout))                    \
> +               debug_sync(_thd_, STRING_WITH_LEN(_sync_point_name_));   \
> +             } while (0)
> +
> +/* Command line option --debug-sync-timeout. See mysqld.cc. */
> +extern uint opt_debug_sync_timeout;
> +
> +/* Default WAIT_FOR timeout if command line option is given without argument. */
> +#define DEBUG_SYNC_DEFAULT_WAIT_TIMEOUT 300
> +
> +/* Debug Sync prototypes. See debug_sync.cc. */
> +extern int  debug_sync_init(void);
> +extern void debug_sync_end(void);
> +extern void debug_sync_init_thread(THD *thd);
> +extern void debug_sync_end_thread(THD *thd);
> +extern void debug_sync(THD *thd, const char *sync_point_name, size_t name_len);
> +
> +#else /* defined(ENABLED_DEBUG_SYNC) */
> +
> +#define DEBUG_SYNC(_thd_, _sync_point_name_)    /* disabled DEBUG_SYNC */
> +
> +#endif /* defined(ENABLED_DEBUG_SYNC) */
> +
> +#endif /* DEBUG_SYNC_H */
> +
>
> === modified file 'sql/event_queue.cc'
> --- a/sql/event_queue.cc	2008-05-14 13:49:41 +0000
> +++ b/sql/event_queue.cc	2009-01-11 18:05:39 +0000
> @@ -723,6 +723,18 @@ Event_queue::unlock_data(const char *fun
>         abstime If not null then call pthread_cond_timedwait()
>         func    Which function is requesting cond_wait
>         line    On which line cond_wait is requested
> +
> +  @note 'killed': It is not sufficient to test thd->killed at the top of
> +  the loop (from which this function is called). If the thread is killed
> +  after it tested thd->killed and before it registered the mutex (here:
> +  LOCK_event_queue) in its mysys_var (in enter_cond()), the killing
> +  thread won't do anything but set thd->killed. If we don't check it
> +  again after registering the mutex, we would start waiting, though
> +  thd->killed is set. The solution is to check thd->killed again, after
> +  registering the mutex and before start waiting. An additional
> +  advantage is the memory barrier, included in the mutex handling. This
> +  prevents out-of-order execution in the processor. Thus we cannot read
> +  an old value.
>   */

Let's shorten or drop this comment, its already explained in various 
other places.

>
>   void
> @@ -737,11 +749,15 @@ Event_queue::cond_wait(THD *thd, struct
>
>     thd->enter_cond(&COND_queue_state,&LOCK_event_queue, msg);
>
> -  DBUG_PRINT("info", ("pthread_cond_%swait", abstime? "timed":""));
> -  if (!abstime)
> -    pthread_cond_wait(&COND_queue_state,&LOCK_event_queue);
> -  else
> -    pthread_cond_timedwait(&COND_queue_state,&LOCK_event_queue, abstime);
> +  /* See function comment for 'killed'. */
> +  if (!thd->killed)
> +  {
> +    DBUG_PRINT("info", ("pthread_cond_%swait", abstime? "timed":""));
> +    if (!abstime)
> +      pthread_cond_wait(&COND_queue_state,&LOCK_event_queue);
> +    else
> +      pthread_cond_timedwait(&COND_queue_state,&LOCK_event_queue, abstime);
> +  }
>
>     mutex_last_locked_in_func= func;
>     mutex_last_locked_at_line= line;
>
> === modified file 'sql/lock.cc'
> --- a/sql/lock.cc	2008-10-23 16:29:52 +0000
> +++ b/sql/lock.cc	2009-01-11 18:05:39 +0000
> @@ -1127,6 +1127,14 @@ bool lock_global_read_lock(THD *thd)
>       const char *new_message= "Waiting to get readlock";
>       (void) pthread_mutex_lock(&LOCK_global_read_lock);
>
> +   
> old_message=thd->enter_cond(&COND_global_read_lock,&LOCK_global_read_lock,
> +                                new_message);
> +    DBUG_PRINT("info",
> +               ("waiting_for: %d  protect_against: %d",
> +                waiting_for_read_lock, protect_against_global_read_lock));
> +
> +    waiting_for_read_lock++;

Care to explain why this is being moved?

> +
>   #if defined(ENABLED_DEBUG_SYNC)
>       /*
>         The below sync point fires if we have to wait for
> @@ -1135,27 +1143,18 @@ bool lock_global_read_lock(THD *thd)
>         WARNING: Beware to use WAIT_FOR with this sync point. We hold
>         LOCK_global_read_lock here.
>
> -      Call the sync point before calling enter_cond() as it does use
> -      enter_cond() and exit_cond() itself if a WAIT_FOR action is
> -      executed in spite of the above warning.
> +      Call the sync point after enter_cond() so that the proc_info is
> +      available immediately after the sync point sends a SIGNAL. This
> +      makes tests more reliable.
>
> -      Pre-set proc_info so that it is available immediately after the
> -      sync point sends a SIGNAL. This makes tests more reliable.
> +      Call the sync point before the loop so that it is executed once only.
>       */
> -    if (protect_against_global_read_lock)
> +    if (protect_against_global_read_lock&&  !thd->killed)
>       {
> -      thd_proc_info(thd, new_message);
>         DEBUG_SYNC(thd, "wait_lock_global_read_lock");
>       }
>   #endif /* defined(ENABLED_DEBUG_SYNC) */
>
> -   
> old_message=thd->enter_cond(&COND_global_read_lock,&LOCK_global_read_lock,
> -                                new_message);
> -    DBUG_PRINT("info",
> -	       ("waiting_for: %d  protect_against: %d",
> -		waiting_for_read_lock, protect_against_global_read_lock));
> -
> -    waiting_for_read_lock++;
>       while (protect_against_global_read_lock&&  !thd->killed)
>         pthread_cond_wait(&COND_global_read_lock,&LOCK_global_read_lock);
>       waiting_for_read_lock--;
> @@ -1271,6 +1270,9 @@ bool wait_if_global_read_lock(THD *thd,
>         DBUG_RETURN(is_not_commit);
>       }
>
> +   
> old_message=thd->enter_cond(&COND_global_read_lock,&LOCK_global_read_lock,
> +                                new_message);
> +
>   #if defined(ENABLED_DEBUG_SYNC)
>       /*
>         The below sync point fires if we have to wait for
> @@ -1279,22 +1281,19 @@ bool wait_if_global_read_lock(THD *thd,
>         WARNING: Beware to use WAIT_FOR with this sync point. We hold
>         LOCK_global_read_lock here.
>
> -      Call the sync point before calling enter_cond() as it does use
> -      enter_cond() and exit_cond() itself if a WAIT_FOR action is
> -      executed in spite of the above warning.
> +      Call the sync point after enter_cond() so that the proc_info is
> +      available immediately after the sync point sends a SIGNAL. This
> +      makes tests more reliable.
>
> -      Pre-set proc_info so that it is available immediately after the
> -      sync point sends a SIGNAL. This makes tests more reliable.
> +      Call the sync point before the loop so that it is executed once only.
>       */
> -    if (must_wait)
> +    if (must_wait&&  ! thd->killed&&
> +        (!abort_on_refresh || thd->version == refresh_version))
>       {
> -      thd_proc_info(thd, new_message);
>         DEBUG_SYNC(thd, "wait_if_global_read_lock");
>       }
>   #endif /* defined(ENABLED_DEBUG_SYNC) */
>
> -   
> old_message=thd->enter_cond(&COND_global_read_lock,&LOCK_global_read_lock,
> -                                new_message);
>       while (must_wait&&  ! thd->killed&&
>   	(!abort_on_refresh || thd->version == refresh_version))
>       {
>
> === modified file 'sql/mdl.cc'
> --- a/sql/mdl.cc	2008-10-14 12:08:56 +0000
> +++ b/sql/mdl.cc	2009-01-11 18:05:39 +0000
> @@ -15,6 +15,7 @@
>
>
>   #include "mdl.h"
> +#include "debug_sync.h"
>
>   #include<hash.h>
>   #include<mysqld_error.h>
> @@ -627,8 +628,8 @@ static bool can_grant_lock(MDL_LOCK *loc
>     case MDL_SHARED_HIGH_PRIO:
>       if ((lock->active_exclusive.is_empty()&&
>            (lock_data->type == MDL_SHARED_HIGH_PRIO ||
> -          lock->waiting_exclusive.is_empty()&&
> -          lock->active_shared_waiting_upgrade.is_empty())) ||
> +          (lock->waiting_exclusive.is_empty()&&
> +           lock->active_shared_waiting_upgrade.is_empty()))) ||
>           (!lock->active_exclusive.is_empty()&&
>            lock->active_exclusive.head()->ctx == lock_data->ctx))
>       {
> @@ -890,6 +891,7 @@ bool mdl_acquire_exclusive_locks(MDL_CON
>       }
>       if (!lock_data)
>         break;
> +    DEBUG_SYNC(context->thd, "before_wait_excl_mdl_lock");
>       if (signalled)
>         pthread_cond_wait(&COND_mdl,&LOCK_mdl);
>       else
> @@ -1021,6 +1023,7 @@ bool mdl_upgrade_shared_lock_to_exclusiv
>         }
>       }
>
> +    DEBUG_SYNC(context->thd, "before_wait_upgrade_mdl_lock");
>       if (signalled)
>         pthread_cond_wait(&COND_mdl,&LOCK_mdl);
>       else
> @@ -1154,6 +1157,25 @@ bool mdl_acquire_global_shared_lock(MDL_
>     global_lock.waiting_shared++;
>     old_msg= MDL_ENTER_COND(context, mysys_var);
>
> +#if defined(ENABLED_DEBUG_SYNC)
> +  /*
> +    The below sync point fires if we have to wait for a table lock.
> +
> +    WARNING: Beware to use WAIT_FOR with this sync point. We hold
> +    LOCK_mdl here.
> +
> +    Call the sync point after enter_cond() so that the proc_info is
> +    available immediately after the sync point sends a SIGNAL. This
> +    makes tests more reliable.
> +
> +    Call the sync point before the loop so that it is executed once only.
> +  */
> +  if (!mysys_var->abort && global_lock.active_intention_exclusive)
> +  {
> +    DEBUG_SYNC(context->thd, "wait_mdl_global_shared");
> +  }
> +#endif /* defined(ENABLED_DEBUG_SYNC) */

Please drop, unused sync point.

> +
>     while (!mysys_var->abort && global_lock.active_intention_exclusive)
>       pthread_cond_wait(&COND_mdl,&LOCK_mdl);
>
> @@ -1233,6 +1255,7 @@ bool mdl_wait_for_locks(MDL_CONTEXT *con
>         pthread_mutex_unlock(&LOCK_mdl);
>         break;
>       }
> +    DEBUG_SYNC(context->thd, "before_wait_mdl_locks");
>       pthread_cond_wait(&COND_mdl,&LOCK_mdl);
>       /* As a side-effect MDL_EXIT_COND() unlocks LOCK_mdl. */
>       MDL_EXIT_COND(context, mysys_var, old_msg);
>
> === modified file 'sql/mysql_priv.h'
> --- a/sql/mysql_priv.h	2008-12-16 20:54:07 +0000
> +++ b/sql/mysql_priv.h	2009-01-11 18:05:39 +0000
> @@ -613,27 +613,6 @@ void debug_sync_point(const char* lock_n
>   #define DBUG_SYNC_POINT(lock_name,lock_timeout)
>   #endif /* EXTRA_DEBUG */
>
> -/* Debug Sync Facility. */
> -#if defined(ENABLED_DEBUG_SYNC)
> -/* Macro to be put in the code at synchronization points. */
> -#define DEBUG_SYNC(_thd_, _sync_point_name_)                            \
> -          do { if (unlikely(opt_debug_sync_timeout))                    \
> -               debug_sync(_thd_, STRING_WITH_LEN(_sync_point_name_));   \
> -             } while (0)
> -/* Command line option --debug-sync-timeout. See mysqld.cc. */
> -extern uint opt_debug_sync_timeout;
> -/* Default WAIT_FOR timeout if command line option is given without argument. */
> -#define DEBUG_SYNC_DEFAULT_WAIT_TIMEOUT 300
> -/* Debug Sync prototypes. See debug_sync.cc. */
> -extern int  debug_sync_init(void);
> -extern void debug_sync_end(void);
> -extern void debug_sync_init_thread(THD *thd);
> -extern void debug_sync_end_thread(THD *thd);
> -extern void debug_sync(THD *thd, const char *sync_point_name, size_t name_len);
> -#else /* defined(ENABLED_DEBUG_SYNC) */
> -#define DEBUG_SYNC(_thd_, _sync_point_name_)    /* disabled DEBUG_SYNC */
> -#endif /* defined(ENABLED_DEBUG_SYNC) */
> -
>   #define BACKUP_WAIT_TIMEOUT_DEFAULT 50
>
>   /* BINLOG_DUMP options */
> @@ -917,6 +896,7 @@ bool general_log_write(THD *thd, enum en
>   #include "sql_servers.h"
>   #include "records.h"
>   #include "opt_range.h"
> +#include "debug_sync.h"
>
>   #ifdef HAVE_QUERY_CACHE
>   struct Query_cache_query_flags
>
> === modified file 'sql/mysqld.cc'
> --- a/sql/mysqld.cc	2008-12-16 20:54:07 +0000
> +++ b/sql/mysqld.cc	2009-01-11 18:05:39 +0000
> @@ -5804,7 +5804,6 @@ enum options_mysqld
>     OPT_BACKUP_PROGRESS_LOG,
>     OPT_SLOW_LOG,
>     OPT_THREAD_HANDLING,
> -  OPT_INNODB_ROLLBACK_ON_TIMEOUT,
>     OPT_SECURE_FILE_PRIV,
>     OPT_MIN_EXAMINED_ROW_LIMIT,
>     OPT_LOG_SLOW_SLAVE_STATEMENTS,

Push separatedly. You don't need approval for trivial stuff.

>
> === modified file 'sql/slave.cc'
> --- a/sql/slave.cc	2008-12-08 11:29:15 +0000
> +++ b/sql/slave.cc	2009-01-11 18:05:39 +0000
> @@ -557,9 +557,17 @@ int start_slave_thread(pthread_handler h
>       while (start_id == *slave_run_id)
>       {
>         DBUG_PRINT("sleep",("Waiting for slave thread to start"));
> -      const char* old_msg = thd->enter_cond(start_cond,cond_lock,
> -                                            "Waiting for slave thread to start");
> -      pthread_cond_wait(start_cond,cond_lock);
> +      const char* old_msg= thd->enter_cond(start_cond,cond_lock,
> +                                           "Waiting for slave thread to start");
> +      /*
> +        It is not sufficient to test this at loop bottom. We must test
> +        it after registering the mutex in enter_cond(). If the kill
> +        happens after testing of thd->killed and before the mutex is
> +        registered, we would otherwise go waiting though thd->killed is
> +        set.
> +      */
> +      if (!thd->killed)
> +        pthread_cond_wait(start_cond,cond_lock);
>         thd->exit_cond(old_msg);
>         pthread_mutex_lock(cond_lock); // re-acquire it as exit_cond() released
>         if (thd->killed)

BTW, don't we need to release start_lock on this error path?

>
> === modified file 'sql/sql_base.cc'
> --- a/sql/sql_base.cc	2008-12-08 11:29:15 +0000
> +++ b/sql/sql_base.cc	2009-01-11 18:05:39 +0000
> @@ -886,6 +886,18 @@ static void kill_delayed_threads_for_tab
>           when some write-locked table is being reopened (by FLUSH TABLES or
>           ALTER TABLE) we have to rely on additional global shared metadata
>           lock taken by thread trying to obtain global read lock.
> +
> +  @note 'killed': It is not sufficient to test thd->killed at the top of
> +  the loop. If the thread is killed after it tested thd->killed and
> +  before it registered the mutex (here: LOCK_open) in its mysys_var (in
> +  enter_cond()), the killing thread won't do anything but set
> +  thd->killed. If we don't check it again after registering the mutex,
> +  we would start waiting, though thd->killed is set. The solution is to
> +  check thd->killed again, after registering the mutex and before start
> +  waiting. Since we cannot 'break' between enter_cond() and exit_cond(),
> +  we just omit the wait. An additional advantage is the memory barrier,
> +  included in the mutex handling. This prevents out-of-order execution
> +  in the processor. Thus we cannot read an old value.
>   */
>
>   bool close_cached_tables(THD *thd, TABLE_LIST *tables, bool have_lock,
> @@ -1015,7 +1027,8 @@ bool close_cached_tables(THD *thd, TABLE
>         }
>       }
>
> -    if (found)
> +    /* See function comment for 'killed'. */
> +    if (found&&  ! thd->killed)
>       {
>         DBUG_PRINT("signal", ("Waiting for COND_refresh"));
>         pthread_cond_wait(&COND_refresh,&LOCK_open);
> @@ -7644,11 +7657,25 @@ void tdc_remove_table(THD *thd, enum_tdc
>
>
>   /**
> -   Wait until there are no old versions of tables in the table
> -   definition cache for the metadata locks that we try to acquire.
> +  Wait until there are no old versions of tables in the table
> +  definition cache for the metadata locks that we try to acquire.
> +
> +  @param[in]    thd             thread context
> +  @param[in]    context         metadata locking context with locks
> +
> +  @return       killed          if the thread has been killed
>
> -   @param thd      Thread context
> -   @param context  Metadata locking context with locks.
> +  @note 'killed': It is not sufficient to test thd->killed at the top of
> +  the loop. If the thread is killed after it tested thd->killed and
> +  before it registered the mutex (here: LOCK_open) in its mysys_var (in
> +  enter_cond()), the killing thread won't do anything but set
> +  thd->killed. If we don't check it again after registering the mutex,
> +  we would start waiting, though thd->killed is set. The solution is to
> +  check thd->killed again, after registering the mutex and before start
> +  waiting. Since we cannot 'break' between enter_cond() and exit_cond(),
> +  we just omit the wait. An additional advantage is the memory barrier,
> +  included in the mutex handling. This prevents out-of-order execution
> +  in the processor. Thus we cannot read an old value.

Theorically, this only applies for the same mutex. Better stick to 
pthread rules.

>   */
>
>   static bool tdc_wait_for_old_versions(THD *thd, MDL_CONTEXT *context)
> @@ -7685,7 +7712,17 @@ static bool tdc_wait_for_old_versions(TH
>         break;
>       }
>       old_msg= thd->enter_cond(&COND_refresh,&LOCK_open, "Waiting for
> table");
> -    pthread_cond_wait(&COND_refresh,&LOCK_open);
> +    /* See function comment for 'killed'. */
> +    if (!thd->killed)
> +    {
> +      /*
> +        Call the sync point after enter_cond() so that the proc_info is
> +        available immediately after the sync point sends a SIGNAL. This
> +        makes tests more reliable.
> +      */
> +      DEBUG_SYNC(thd, "tdc_wait_for_table");
> +      pthread_cond_wait(&COND_refresh,&LOCK_open);
> +    }
>       /* LOCK_open mutex is unlocked by THD::exit_cond() as side-effect. */
>       thd->exit_cond(old_msg);
>     }
>
> === modified file 'sql/sql_class.cc'
> --- a/sql/sql_class.cc	2008-12-16 11:51:34 +0000
> +++ b/sql/sql_class.cc	2009-01-11 18:05:39 +0000
> @@ -892,36 +892,84 @@ void add_diff_to_status(STATUS_VAR *to_v
>   }
>
>
> +/**
> +  Awake a thread.
> +
> +  @param[in]    state_to_set            value for THD::killed
> +
> +  This is normally called from another thread's THD object.
> +
> +  @note Do not call this within LOCK_thread_count.
> +
> +  @note Do always call this within LOCK_delete.
> +*/
> +
>   void THD::awake(THD::killed_state state_to_set)
>   {
>     DBUG_ENTER("THD::awake");
>     DBUG_PRINT("enter", ("this: %p", this));
>     THD_CHECK_SENTRY(this);
> -  safe_mutex_assert_owner(&LOCK_delete);
> +  safe_mutex_assert_owner(&LOCK_delete);
> +  safe_mutex_assert_not_owner(&LOCK_thread_count);
>
> +  /*
> +    In case of a connection kill, we want to place a memory barrier
> +    before sending the kill signal. This should prevent the processor
> +    from out-of-order execution of setting the killed flag and sending
> +    the signal. Also a possible existing write queue in the processor is
> +    to be flushed. The memory barrier is included in mutex locking and
> +    unlocking. We can use an arbitrary un-locked mutex here, as we do
> +    not always read the flag in a mutex. The cases where we do it are
> +    handled below (mysys_var).
> +  */
> +  if (state_to_set != THD::KILL_QUERY)
> +    pthread_mutex_lock(&LOCK_thread_count);
> +
> +  /*
> +    Set the 'killed' flag of 'this', which is the target THD object.
> +  */
>     killed= state_to_set;
> +
>     if (state_to_set != THD::KILL_QUERY)
>     {
> +    /* Memory barrier after setting the killed flag. */
> +    pthread_mutex_unlock(&LOCK_thread_count);

Again, the only visibility guarantee is for threads that will lock this 
mutex.

> +    /* Mark the target thread's alarm request expired, and signal alarm. */
>       thr_alarm_kill(thread_id);
> +
> +    /* Send an event to "libevent" that a thread should be killed. */
>       if (!slave_thread)
>         thread_scheduler.post_kill_notification(this);
> +
>   #ifdef SIGNAL_WITH_VIO_CLOSE
>       if (this != current_thd)
>       {
>         /*
> -        In addition to a signal, let's close the socket of the thread that
> -        is being killed. This is to make sure it does not block if the
> -        signal is lost. This needs to be done only on platforms where
> -        signals are not a reliable interruption mechanism.
> -
> -        If we're killing ourselves, we know that we're not blocked, so this
> -        hack is not used.
> +        In addition to a signal, let's close the socket of the thread
> +        that is being killed. This is to make sure it does not block if
> +        the signal is lost. Another reason is that the target
> +        thread may not have been blocked on the socket at the moment
> +        when we sent the signal. It may come later to the blocking
> +        point. The closed connection prevents it from blocking.
> +
> +        We try to keep the write direction of the socket open. This
> +        allows to send a status or error message. If it fails, the
> +        connection is completely closed.
> +
> +        If we are killing ourselves, we know that we are not blocked.
> +        We also know that we will check thd->killed before we go for
> +        reading the next statement.
>         */
>
>         close_active_vio();

Back to my initial question, what prevents the thread being killed to 
read from or write to this socket?

>       }
> -#endif
> +#endif
>     }
> +
> +  /*
> +    Broadcast a condition to kick the target if it is waiting on it.
> +  */
>     if (mysys_var)
>     {
>       pthread_mutex_lock(&mysys_var->mutex);
> @@ -946,6 +994,9 @@ void THD::awake(THD::killed_state state_
>         we issue a second KILL or the status it's waiting for happens).
>         It's true that we have set its thd->killed but it may not
>         see it immediately and so may have time to reach the cond_wait().
> +
> +      However, where possible, we test for killed once again after
> +      enter_cond(). This should make the signalling safe.
>       */
>       if (mysys_var->current_cond&&  mysys_var->current_mutex)
>       {
> @@ -1302,7 +1353,18 @@ void THD::close_active_vio()
>   #ifndef EMBEDDED_LIBRARY
>     if (active_vio)
>     {
> +    /*
> +      Close read direction, try to keep write direction open.
> +      If this is not possible, close connection completely.
> +
> +      On Windows, closing only the read direction does not terminate
> +      a blocked recv(2). We need to to a full close there.
> +    */
> +#if defined(__WIN__)
>       vio_close(active_vio);
> +#else
> +    vio_close_read(active_vio);
> +#endif
>       active_vio = 0;
>     }
>   #endif
>
> === modified file 'sql/sql_class.h'
> --- a/sql/sql_class.h	2008-12-05 23:47:51 +0000
> +++ b/sql/sql_class.h	2009-01-11 18:05:39 +0000
> @@ -1177,6 +1177,35 @@ struct Ha_data
>   };
>
>
> +/*
> +  The initial intent of SIGNAL_WITH_VIO_CLOSE was to avoid closing of
> +  the connection if the operating system has reliable signals. The
> +  assumption was that setting THD::killed, followed by sending a signal
> +  and broadcasting a condition can make the other thread aware of the
> +  kill request in any situation.
> +
> +  Unfortunately, it was forgotton that there is always some code between
> +  a test of thd->killed and the blocked operation. With
> +  pthread_cond_wait() this is defeated by taking a mutex, registering
> +  the condition, and then checking thd->killed again before calling the
> +  blocking function.
> +
> +  For blocking system services like read() or recv(), we cannot do this.
> +  They won't release a mutex or similar facility when blocked. When the
> +  signal comes in while the code executes between the test of
> +  thd->killed and read/recv() being blocked, the signal handler will
> +  run, but when it returns, the code will continue to run into
> +  read/recv().
> +
> +  The only reliable workaround is to close the connection. This will
> +  prevent read/recv() to block, and even awake them if they are blocked
> +  already.
> +*/
> +#ifndef SIGNAL_WITH_VIO_CLOSE
> +#define SIGNAL_WITH_VIO_CLOSE 1
> +#endif
> +
> +
>   /**
>     @class THD
>     For each client connection we create a separate thread with THD serving as
>
> === modified file 'sql/sql_parse.cc'
> --- a/sql/sql_parse.cc	2009-01-06 09:53:50 +0000
> +++ b/sql/sql_parse.cc	2009-01-11 18:05:39 +0000
> @@ -633,6 +633,22 @@ bool do_command(THD *thd)
>
>     net_new_transaction(net);
>
> +  /*
> +    Synchronization point for testing of KILL_CONNECTION.
> +    This sync point can wait here, to simulate slow code execution
> +    between the last test of thd->killed and blocking in read().
> +
> +    The goal of this test is to verify that a connection does not
> +    hang, if it is killed at this point of execution.
> +    (Bug#37780 - main.kill fails randomly)
> +
> +    Note that the sync point wait itself will be terminated by a
> +    kill. In this case it consumes a condition broadcast, but does
> +    not change anything else. The consumed broadcast should not
> +    matter here, because the read/recv() below doesn't use it.
> +  */
> +  DEBUG_SYNC(thd, "before_do_command_net_read");
> +
>     if ((packet_length= my_net_read(net)) == packet_error)
>     {
>       DBUG_PRINT("info",("Got error %d reading command from socket %s",
>
> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc	2008-12-08 11:29:15 +0000
> +++ b/sql/sql_select.cc	2009-01-11 18:05:39 +0000
> @@ -1395,6 +1395,7 @@ JOIN::optimize()
>     if (optimized)
>       DBUG_RETURN(0);
>     optimized= 1;
> +  DEBUG_SYNC(thd, "before_join_optimize");
>
>     thd_proc_info(thd, "optimizing");
>     row_limit= ((select_distinct || order || group_list) ? HA_POS_ERROR :
>
> === modified file 'vio/viosocket.c'
> --- a/vio/viosocket.c	2008-06-05 20:44:29 +0000
> +++ b/vio/viosocket.c	2009-01-11 18:05:39 +0000
> @@ -291,6 +291,41 @@ int vio_close(Vio * vio)
>   }
>
>
> +/**
> +  Close read direction.
> +
> +  @param[in]    vio             vio handle
> +
> +  @retun        status
> +    @retval     0               ok
> +    @retval     -1              error
> +
> +  This closes the read direction of a connection. It tries to keep the
> +  write direction open.
> +
> +  If this is a socket connection, the send direction stays open for
> +  status or error reports, and read attempts receive zero bytes (EOF).
> +
> +  If it is not a socket connection, or closing of the read direction
> +  fails, the connection is completely closed.
> +*/
> +
> +int vio_close_read(Vio *vio)
> +{
> +  int rc= 0;
> +  DBUG_ENTER("vio_close_read");
> +
> +  if (vio->type != VIO_CLOSED)
> +  {
> +    DBUG_ASSERT(vio->sd>= 0);
> +
> +    if ((vio->type == VIO_TYPE_NAMEDPIPE) || shutdown(vio->sd, SHUT_RD))

What if the type is VIO_TYPE_SHARED_MEMORY?

> +      rc= vio_close(vio); /* purecov: inspected */
> +  }
> +  DBUG_RETURN(rc);
> +}
>

I would prefer that the caller handled the close case.

Regards,

-- Davi Arnaut
Thread
bzr commit into mysql-6.0-backup branch (ingo.struewing:2749) Bug#37780Ingo Struewing11 Jan
  • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2749)Bug#37780Davi Arnaut20 Jan
    • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2749)Bug#37780Ingo Strüwing21 Jan
      • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2749)Bug#37780Davi Arnaut26 Jan
        • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2749)Bug#37780Ingo Strüwing27 Jan
          • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2749) Bug#37780Paul DuBois27 Jan
          • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2749)Bug#37780Davi Arnaut28 Jan
    • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2749)Bug#37780Ingo Strüwing25 Feb