#At file:///H:/bzr-new/mysql-5.5-bugfixing/ based on revid:marc.alff@stripped
3206 Vladislav Vaintroub 2010-09-17
Yet another take on rwlock on Windows.
This is the port of Vance Morrison's one from
"Low-Lock Techniques in action: Implementing
a Reader-Writer lock" originally written in C#
and available here
http://blogs.msdn.com/b/vancem/archive/2006/03/28/563180.aspx
This features own fast spinlock to protect the structure,
and lazy event initialization.
A minimal change is done to for "prefer readers".
Additionally, writers are protected by critical section, which
helps *a lot* to reduce kernel-mode CPU consumption, so this
is one of the cases, where adding additional synchronization
help peformance.
The likely reason for high CPU consumption is lock convoy -
kernel object like event are trying to be fair (FIFO), which
confuses the dispatcher when locks are highly contended and
held for short time.
Critical Section avoid lock convoys,so for our degenerate RWlock
case with high writer contention for very short code paths, they
are appropriate.
modified:
include/my_pthread.h
mysys/thr_rwlock.c
=== modified file 'include/my_pthread.h'
--- a/include/my_pthread.h 2010-08-10 21:12:01 +0000
+++ b/include/my_pthread.h 2010-09-17 16:47:40 +0000
@@ -1,4 +1,4 @@
-/* Copyright (C) 2000-2008 MySQL AB, 2008-2009 Sun Microsystems, Inc.
+/* Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -650,6 +650,21 @@ extern int rw_pr_init(struct st_my_rw_lo
On systems which don't support native read/write locks, or don't support
read/write locks which prefer readers we have to use own implementation.
*/
+#ifdef _WIN32
+
+typedef struct st_my_rw_lock_t {
+ volatile LONG my_lock; /* spinlock for the structure */
+ CRITICAL_SECTION writer_cs; /* writer serialization lock */
+ int owners; /* < 0 : writers, 0 :unlocked, >0 : readers */
+ int num_write_waiters; /* number of waiting writers */
+ int num_read_waiters; /* number of waiting readers */
+ DWORD write_owner; /* thread that holds exclusive lock */
+ HANDLE write_event; /* waiting readers event */
+ HANDLE read_event; /* waiting writers event */
+ my_bool prefer_readers; /* prefer readers flag */
+} my_rw_lock_t;
+
+#else
typedef struct st_my_rw_lock_t {
pthread_mutex_t lock; /* lock for structure */
pthread_cond_t readers; /* waiting readers */
@@ -661,6 +676,7 @@ typedef struct st_my_rw_lock_t {
pthread_t write_thread;
#endif
} my_rw_lock_t;
+#endif
extern int my_rw_init(my_rw_lock_t *, my_bool *);
extern int my_rw_destroy(my_rw_lock_t *);
=== modified file 'mysys/thr_rwlock.c'
--- a/mysys/thr_rwlock.c 2010-08-12 13:50:23 +0000
+++ b/mysys/thr_rwlock.c 2010-09-17 16:47:40 +0000
@@ -1,4 +1,4 @@
-/* Copyright (C) 2000 MySQL AB
+/* Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -19,6 +19,7 @@
#if defined(THREAD)
#if defined(NEED_MY_RW_LOCK)
#include <errno.h>
+#if !(defined _WIN32)
/*
Source base from Sun Microsystems SPILT, simplified for MySQL use
@@ -199,7 +200,551 @@ int rw_pr_init(struct st_my_rw_lock_t *r
return my_rw_init(rwlock, &prefer_readers_attr);
}
-#else
+#else /* WIN32 */
+
+/*
+ Reader-Writer lock on Windows.
+
+ There is a reason why we do not use the one-size-fits all, and that reason is
+ speed. General implementation is based on POSIX conditions, which are not only
+ heavy-weight,as implemented inside mysys but also not necessarily needed in
+ presence of other more suitable native primitives, e.g events.
+
+ And there is also reason why we cannot, in general case, use native Vista
+ reader-writer locks - MySQL runtime sometimes requires reader-writer locks
+ with special properties (e.g prefer-reader property, where writer must not
+ preempt a reader, or that reader lock needs to support recursion). These
+ properties are not given with native slim reader-writer locks, which means
+ home-backed implementation is necessary, at least for special cases.
+
+
+ Following is the reader/writer lock implementation taken from Vance
+ Morrison's blog post "Low-Lock Techniques in action: Implementing a Reader-
+ Writer lock". The original blog post can be found here:
+ http://blogs.msdn.com/b/vancem/archive/2006/03/28/563180.aspx
+
+ Analysis of the implementation is given in followup article
+ http://blogs.msdn.com/b/vancem/archive/2006/03/29/564854.aspx
+
+ Implementation here is a straightforward port from original C# code (
+ that can be found at http://tinyurl.com/2dgkw56) .Lock upgrading/downgrading
+ is currently ommitted, as we do not need it right now. The code is very
+ slightly modified for prefer-reader option, and adapted to MySQL naming
+ conventions.
+
+ There is an additional optimization (not in the cited article) for
+ writer-heavy scenarios (holding a critical section for the whole duration of
+ the writer lock). See comment inside my_rw_wrlock() on what it fixes as why it
+ helps.
+
+ This implementation features own spinlock and lazy initialization of event
+ objects. Using spinlock, instead critical section, is justified because
+ spinlock protects short code paths, since implemenetation avoids holding a
+ lock while in system calls. Lazy initialization also comes handy in MySQL case
+ as rwlocks are created and destructed at relatively high rate in MDL.
+
+ Besides the features listed above, the underlying ideas are the same as used
+ in general POSIX case (with events instead of conditions, spinlock instead of
+ mutex).
+
+
+*/
+
+
+/*
+ Spinlock routines, implemented with standard techniques, i.e loop with
+ InterlockedCompareExchange, pause instruction (YieldProcessor) or
+ yielding on OS level (SwitchToThread). There is no Sleep(N) here, the locks
+ are fast enough without it.
+
+ Executing these routines will cause appropriate memory barriers, due to
+ InterlockedCompareExchange used, so the routines are just as good as
+ critical sections when it comes to memory visibility.
+
+ Refer to MSDN article "Synchronization and Multiprocessor Issues"
+ http://msdn.microsoft.com/en-us/library/ms686355(VS.85).aspx for more
+ info on memory visibility.
+*/
+
+/**
+ Helper routine to acquire spinlock
+ @param my_rw_lock_t reader writer lock
+*/
+
+static void enter_my_lock_spin(my_rw_lock_t *rwp)
+{
+ int i;
+ for (i=0; ;i++)
+ {
+ if (i < 3)
+ {
+ int j;
+ for (j=0; j < 20; j++)
+ /* Wait a few dozen instructions to let another processor release lock. */
+ YieldProcessor();
+ }
+ else
+ {
+ /* Give up my quantum. */
+ SwitchToThread();
+ }
+
+ if (InterlockedCompareExchange(&rwp->my_lock, 1, 0) == 0)
+ return;
+ }
+}
+
+
+/**
+ Acquire spinlock
+ @param rwp reader writer lock
+*/
+static void enter_my_lock(my_rw_lock_t *rwp)
+{
+
+ if (InterlockedCompareExchange(&rwp->my_lock, 1, 0) != 0)
+ enter_my_lock_spin(rwp);
+
+}
+
+
+static BOOL my_lock_held(my_rw_lock_t *rwp)
+{
+ return (rwp->my_lock == 1);
+}
+
+/**
+ Release spinlock
+ @param rwp reader writer lock
+*/
+static void exit_my_lock(my_rw_lock_t *rwp)
+{
+ DBUG_ASSERT(my_lock_held(rwp));
+ rwp->my_lock= 0;
+}
+
+/**
+ Create event lazily
+
+ @param rwp the rwlock for which event is allocated
+ @param wait_event event to allocate
+ @param make_autoreset_event TRUE for autorest event, FALSE for manual reset
+
+ Lazily create event outside the lock (so if errors happen they are outside the
+ lock and that we don't do much work while holding a spin lock).
+ If all goes well, reenter the lock and set 'wait_event'.
+*/
+static int lazy_create_event(my_rw_lock_t *rwp, HANDLE *wait_event,
+ BOOL make_autoreset_event)
+{
+ HANDLE new_event;
+
+ DBUG_ASSERT(my_lock_held(rwp));
+ DBUG_ASSERT(*wait_event == NULL);
+
+ exit_my_lock(rwp);
+ new_event = CreateEvent(NULL, make_autoreset_event, FALSE, NULL);
+ enter_my_lock(rwp);
+
+ if (!new_event)
+ return ENOMEM;
+
+ if (*wait_event == NULL)
+ {
+ *wait_event = new_event;
+ }
+ else
+ {
+ /* Someone snuck in. Close the handle we just created. */
+ exit_my_lock(rwp);
+ CloseHandle(new_event);
+ enter_my_lock(rwp);
+ }
+ return 0;
+}
+
+/**
+ Waits on event with a given timeout.
+
+ @param rwp the lock
+ @param wait_event handle to wait on
+ @param num_waiters waiter count(incremented and decremented in this routine)
+ @param timeout_ms timeout (in milliseconds)
+
+ @note Before the wait waiter count is incremented and is restored before
+ leaving this routine.
+*/
+static int wait_on_event(my_rw_lock_t *rwp, HANDLE wait_event, int *num_waiters,
+ int timeout_ms)
+{
+ DWORD wait_retval;
+ BOOL ok;
+
+ DBUG_ASSERT(rwp->my_lock);
+
+ ok = ResetEvent(wait_event);
+ DBUG_ASSERT(ok);
+
+ (*num_waiters)++;
+ exit_my_lock(rwp);
+
+ /* Do the wait outside of any lock */
+ wait_retval = WaitForSingleObject(wait_event, timeout_ms);
+
+ enter_my_lock(rwp);
+ (*num_waiters)--;
+
+ if (wait_retval != WAIT_OBJECT_0)
+ {
+ DBUG_ASSERT(wait_retval == WAIT_TIMEOUT);
+ return EBUSY;
+ }
+
+ return 0;
+}
+
+
+/**
+ Determines the appropriate events to set, leaves the locks, and sets the
+ events.
+
+ @param rwp the lock
+*/
+static void exit_and_wakeup_appropriate_waiters(my_rw_lock_t *rwp)
+{
+ BOOL ok;
+
+ DBUG_ASSERT(my_lock_held(rwp));
+ /*
+ Exit below will be done before signaling to improve efficiency (wakee will
+ need the lock)
+ */
+ if (rwp->owners == 0 && rwp->num_write_waiters > 0)
+ {
+ exit_my_lock(rwp);
+ ok= SetEvent(rwp->write_event); /* release one writer */
+ DBUG_ASSERT(ok);
+ }
+ else if (rwp->owners >= 0 && rwp->num_read_waiters != 0)
+ {
+ exit_my_lock(rwp);
+ ok= SetEvent(rwp->read_event); /* release all readers. */
+ DBUG_ASSERT(ok);
+ }
+ else
+ {
+ exit_my_lock(rwp);
+ }
+}
+
+
+/**
+ Acquire reader lock, with timeout parameter.
+
+ @param rwp the lock
+ @param timeout_ms tmeout (in millisections)
+*/
+static int acquire_reader_lock(my_rw_lock_t *rwp, int timeout_ms)
+{
+ int ret = 0;
+
+ enter_my_lock(rwp);
+ for (;;)
+ {
+ /*
+ We can enter a read lock if there are only read-locks have been given out
+ If it is a normal (NOT prefer-reader) lock, we also check that a writer is
+ not trying to get in. If it is a prefer-reader lock, incoming readers
+ always preempt writers.
+ */
+ if (rwp->owners >= 0 && (rwp->num_write_waiters == 0 || rwp->prefer_readers))
+ {
+ /*
+ Good case, there is no contention, we are basically done.
+ Indicate we have another reader
+ */
+ rwp->owners++;
+ break;
+ }
+
+ /* Drat, we need to wait. Mark that we have waiters and wait. */
+ if (rwp->read_event == NULL)
+ {
+ /* Create the needed event. */
+ ret = lazy_create_event(rwp, &rwp->read_event, FALSE);
+ if(ret != 0)
+ {
+ /* This must be resource allocation error. */
+ DBUG_ASSERT(ret == ENOMEM);
+ break;
+ }
+
+ /* Since we left the lock, in 'lazy_create_event', start over. */
+ continue;
+ }
+
+ ret = wait_on_event(rwp, rwp->read_event, &rwp->num_read_waiters,
+ timeout_ms);
+ if(ret)
+ {
+ /* This must be timeout error. */
+ DBUG_ASSERT(ret == EBUSY && timeout_ms != INFINITE);
+ break;
+ }
+ }
+ exit_my_lock(rwp);
+
+ return ret;
+}
+
+
+
+/**
+ Acquire writer lock, with timeout parameter.
+
+ @param rwp the lock
+ @param timeout_ms timeout (in milliseconds)
+*/
+static int acquire_writer_lock(my_rw_lock_t *rwp, int timeout_ms)
+{
+ int ret= 0;
+
+#ifndef DBUG_OFF
+ DWORD tid = GetCurrentThreadId(); /* track the owner */
+#endif
+
+ enter_my_lock(rwp);
+
+ for (;;)
+ {
+ if (rwp->owners == 0)
+ {
+ /*
+ Good case, there is no contention, we are basically done. Indicate we
+ have a writer.
+ */
+ rwp->owners= -1;
+#ifndef DBUG_OFF
+ rwp->write_owner= tid;
+#endif
+ break;
+ }
+
+ /* Drat, we need to wait. Mark that we have waiters and wait.*/
+ if (rwp->write_event == NULL)
+ {
+ /* Create the needed event. */
+ ret= lazy_create_event(rwp, &rwp->write_event, TRUE);
+ if (ret != 0)
+ {
+ /* This must be resource allocation error. */
+ DBUG_ASSERT(ret == ENOMEM);
+ break;
+ }
+
+ /* Since we left the lock, start over. */
+ continue;
+ }
+
+
+ ret = wait_on_event(rwp, rwp->write_event, &rwp->num_write_waiters,
+ timeout_ms);
+
+ if(ret)
+ {
+ /* This must be timeout. */
+ DBUG_ASSERT(ret == EBUSY && timeout_ms != INFINITE);
+ break;
+ }
+ }
+ exit_my_lock(rwp);
+ return ret;
+}
+
+/**
+ Release reader lock
+*/
+static void release_reader_lock(my_rw_lock_t *rwp)
+{
+ DBUG_ASSERT(rwp->owners > 0);
+ DBUG_ASSERT(rwp->write_owner == 0);
+ enter_my_lock(rwp);
+ --(rwp->owners);
+ exit_and_wakeup_appropriate_waiters(rwp);
+}
+
+
+/**
+ Release writer lock
+*/
+static void release_writer_lock(my_rw_lock_t *rwp)
+{
+ DBUG_ASSERT(rwp->owners < 0);
+ DBUG_ASSERT(rwp->write_owner == GetCurrentThreadId());
+
+ enter_my_lock(rwp);
+ rwp->owners++;
+ rwp->write_owner= 0;
+ exit_and_wakeup_appropriate_waiters(rwp);
+}
+
+
+static void release_reader_or_writer_lock(my_rw_lock_t *rwp, my_bool *was_writer)
+{
+ enter_my_lock(rwp);
+
+ *was_writer= (rwp->owners < 0);
+ if(*was_writer)
+ rwp->owners++;
+ else
+ rwp->owners--;
+
+ exit_and_wakeup_appropriate_waiters(rwp);
+}
+
+/**
+ Initialite reader writer lock
+
+ @param rwp the lock
+ @param prefer_readers whether writers should be starved in presence of readers
+
+*/
+int my_rw_init(my_rw_lock_t *rwp, my_bool *prefer_readers)
+{
+ rwp->prefer_readers= prefer_readers ? *prefer_readers : FALSE;
+ rwp->num_read_waiters= 0;
+ rwp->num_write_waiters= 0;
+ rwp->owners= 0;
+ rwp->write_owner= 0;
+
+ /* Events will be lazily allocated only in case of contention */
+ rwp->read_event= NULL;
+ rwp->write_event= NULL;
+ rwp->my_lock= 0;
+ InitializeCriticalSection(&rwp->writer_cs);
+ return 0;
+}
+
+
+/**
+ Destroy reader writer lock
+*/
+int my_rw_destroy(my_rw_lock_t *rwp)
+{
+ /* Lock must not be busy.*/
+ DBUG_ASSERT(rwp->owners == 0);
+
+ if(rwp->read_event)
+ CloseHandle(rwp->read_event);
+ if(rwp->write_event)
+ CloseHandle(rwp->write_event);
+
+ DeleteCriticalSection(&rwp->writer_cs);
+ return 0;
+}
+
+
+/**
+ Acquire writer lock
+*/
+int my_rw_wrlock(my_rw_lock_t *rwp)
+{
+ int ret;
+
+ /*
+ Additional serialization of writers via critical section (writer_cs)
+ for the whole acquire/release cycle actually helps performance in writer-
+ mostly workloads, and has minimal costs otherwise.
+
+ Without critical section serialization with writer-heavy scenarios, a kernel
+ CPU usage goes high, and context switching and readythread events is going
+ on inside wait_on_event() in WaitForSingleObject.
+
+ Critical section eliminates this switching almost completely, decreases CPU
+ time spent in kernel, and improves benchmark results a lot.
+
+ The difference between critical section and kernel object synchronization
+ likely stemms from the fact that kernel locks are trying to be fair (FIFO),
+ thus subjects to lock convoy, while critical sections include mechanisms to
+ prevent it (there is mentioning about it in MSDN, but no details of lock
+ convoy prevention is exactly implemented)
+
+ It is theoretically possible to spin in acquire_write_lock(), but we
+ are not doing it for the sake of simplicity, and also because we probably
+ cannot do this as efficiently as tried-and-true critical section.
+
+ Note also that wrapping writer with critical section does not change the
+ correctness of the original algorithm.
+ */
+
+ EnterCriticalSection(&rwp->writer_cs);
+ ret = acquire_writer_lock(rwp, INFINITE);
+ if(ret != 0)
+ LeaveCriticalSection(&rwp->writer_cs);
+ return ret;
+}
+
+
+/**
+ Try to acquire writer lock
+*/
+int my_rw_trywrlock(my_rw_lock_t *rwp)
+{
+ if(TryEnterCriticalSection(&rwp->writer_cs))
+ {
+ int ret = acquire_writer_lock(rwp, 0);
+ if(ret != 0)
+ LeaveCriticalSection(&rwp->writer_cs);
+ return ret;
+ }
+
+ return EBUSY; /* TryEnter failed */
+}
+
+
+/**
+ Acquire reader lock
+*/
+int my_rw_rdlock(my_rw_lock_t *rwp)
+{
+ return acquire_reader_lock(rwp, INFINITE);
+}
+
+/**
+ Try to acquire reader lock
+*/
+int my_rw_tryrdlock(my_rw_lock_t *rwp)
+{
+ return acquire_reader_lock(rwp, 0);
+}
+
+/**
+ Unlock previously acquired lock
+*/
+int my_rw_unlock(my_rw_lock_t *rwp)
+{
+ my_bool was_writer;
+ release_reader_or_writer_lock(rwp, &was_writer);
+ if(was_writer)
+ {
+ LeaveCriticalSection(&rwp->writer_cs);
+ }
+ return 0;
+}
+
+
+/**
+ Initialize prefer-readers lock
+*/
+int rw_pr_init(struct st_my_rw_lock_t *rwlock)
+{
+ my_bool prefer_readers_attr= TRUE;
+ return my_rw_init(rwlock, &prefer_readers_attr);
+}
+
+#endif /* WIN32 */
+
+#else /* !defined(NEED_MY_RWLOCK) */
/*
We are on system which has native read/write locks which support
Attachment: [text/bzr-bundle] bzr/vvaintroub@mysql.com-20100917164740-elrcm5xlk2govptb.bundle
| Thread |
|---|
| • bzr commit into mysql-5.5-bugfixing branch (vvaintroub:3206) | Vladislav Vaintroub | 17 Sep |