List:Commits« Previous MessageNext Message »
From:Davi Arnaut Date:December 15 2009 3:05pm
Subject:Re: bzr commit into mysql-6.0-codebase-bugfixing branch
(jon.hauglid:3728) Bug#48541
View as plain text  
Hi Jon,

On 11/24/09 10:23 AM, Jon Olav Hauglid wrote:
> #At file:///export/home/z/mysql-6.0-codebase-bugfixing-bug48541/ based on
> revid:alik@stripped
>
>   3728 Jon Olav Hauglid	2009-11-24
>        Bug #48541 Server-wide deadlock involving LOCK_thread_count
>
>        The bug title is misleading. This is a deadlock between LOCK_open and
> LOCK_mdl.
>        The reason for the deadlock was an improper exit from
>        MDL_context::wait_for_locks() which caused mysys_var->current_mutex to
> remain
>        LOCK_mdl even though LOCK_mdl was no longer held by that connection.
>
>        This could lead to a deadlock in the following way:
>        1) INSERT DELAYED tries to open a table but fails, and trying to recover it
>        calls wait_for_locks().
>        2) Due to a pending exclusive request, wait_for_locks() fails and exits
> without
>        resetting mysys_var->current_mutex for the delayed insert handler thread.
> So it
>        continues to point to LOCK_mdl.
>        3) The handler thread manages to open a table.
>        4) A different connection takes LOCK_open and tries to take LOCK_mdl.
>        5) FLUSH TABLES from a third connection notices that the handler thread has a
>        table open, and tries to kill it. This involves locking
> mysys_var->current_mutex
>        while having LOCK_open locked. Since current_mutex mistakenly points to
> LOCK_mdl,
>        we have a deadlock.

FWIW, there is another scenario which can lead to a self-deadlock: the 
insert delayed thread opens a table but fails. It tries again and is 
successful, but the current_mutex still points to LOCK_mdl. A exclusive 
lock request comes in and, while holding LOCK_mdl, tries to notify the 
insert delayed thread (via mysql_notify_thread_having_shared_lock) -- 
the notification process involves attempting to lock the current_mutex 
of the insert delayed thread, leading to a self-deadlock.

>        This patch makes sure MDL_EXIT_COND() is called before exiting
> wait_for_locks().

Suggestion: could we extend the MDL_ENTER_COND and MDL_EXIT_COND to 
detect, under debug builds, this kind of mishap? Perhaps declare on the 
stack a class that in its destructor detects whether the condition has 
been entered and exited.

>        This clears mysys->current_mutex which resolves the issue. The patch also
> removes
>        the unnecessary locking of mysys_var->current_mutex in
> kill_delayed_threads() and
>        kill_delayed_threads_for_table().

Why is it unnecessary?

Unfortunately, it appears to be necessary as the CV in this case is used 
as a pulse event -- at the time of the signal, only threads already 
waiting on the CV are woken up. Please consider the following scenario:

insert delayed thread>
	set current_mutex
	set current_cond
	check killed flag

kill connection thread>
	set killed flag
	signal condition variable

insert delayed thread>
	wait on the condition
	# missed the signal

>        An assert is added to recover_from_failed_open_table_attempt() after
>        wait_for_locks() is called, to check that current_mutex is indeed reset.
>        With this assert in place, existing tests in (e.g.) mdl_sync.test will fail
>        without this patch.
>

[..]

>
> === modified file 'sql/sql_base.cc'
> --- a/sql/sql_base.cc	2009-11-23 12:36:34 +0000
> +++ b/sql/sql_base.cc	2009-11-24 12:22:57 +0000
> @@ -899,11 +899,7 @@ static void kill_delayed_threads_for_tab
>         in_use->killed= THD::KILL_CONNECTION;
>         pthread_mutex_lock(&in_use->mysys_var->mutex);
>         if (in_use->mysys_var->current_cond)
> -      {
> -        pthread_mutex_lock(in_use->mysys_var->current_mutex);
>           pthread_cond_broadcast(in_use->mysys_var->current_cond);
> -        pthread_mutex_unlock(in_use->mysys_var->current_mutex);
> -      }
>         pthread_mutex_unlock(&in_use->mysys_var->mutex);
>       }
>     }
> @@ -3700,6 +3696,7 @@ recover_from_failed_open_table_attempt(T
>       case OT_WAIT:
>         result= (thd->mdl_context.wait_for_locks(&m_mdl_requests) ||
>                  tdc_wait_for_old_versions(thd,&m_mdl_requests));
> +      DBUG_ASSERT(thd->mysys_var->current_mutex == 0);

0 -> NULL

Regards,

--
Davi Arnaut
Thread
bzr commit into mysql-6.0-codebase-bugfixing branch (jon.hauglid:3728)Bug#48541Jon Olav Hauglid24 Nov
  • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(jon.hauglid:3728) Bug#48541Davi Arnaut15 Dec