From: Vladislav Vaintroub  Date: September 17 2010 4:47pm
Subject: bzr commit into mysql-5.5-bugfixing branch (vvaintroub:3206)
#At file:///H:/bzr-new/mysql-5.5-bugfixing/ based on revid:marc.alff@stripped

 3206 Vladislav Vaintroub	2010-09-17
      Yet another take on rwlock on Windows.
      
      This is a port of Vance Morrison's reader-writer lock from
      "Low-Lock Techniques in action: Implementing
      a Reader-Writer lock", originally written in C#
      and available here:
      http://blogs.msdn.com/b/vancem/archive/2006/03/28/563180.aspx
      
      It features its own fast spinlock to protect the structure,
      and lazy event initialization.
      
      A minimal change is made to support "prefer readers".
      
      Additionally, writers are protected by a critical section, which
      helps *a lot* to reduce kernel-mode CPU consumption, so this
      is one of the cases where adding extra synchronization
      helps performance.
      
      The likely reason for the high CPU consumption is a lock convoy -
      kernel objects like events try to be fair (FIFO), which
      confuses the dispatcher when locks are highly contended and
      held for a short time.
      
      Critical sections avoid lock convoys, so for our degenerate rwlock
      case with high writer contention on very short code paths, they
      are appropriate.

    modified:
      include/my_pthread.h
      mysys/thr_rwlock.c
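
For context, a minimal caller-side sketch of the interface this patch implements
(hypothetical usage, not part of the commit; it assumes only the my_rw_* functions
declared in include/my_pthread.h and implemented below; the counter_* names are
made up for illustration):

  #include <my_global.h>
  #include <my_pthread.h>

  static my_rw_lock_t counter_lock;
  static int counter;

  static void counter_init(void)
  {
    /* NULL attribute: default behavior, readers do not preempt waiting writers */
    my_rw_init(&counter_lock, NULL);
  }

  static int counter_get(void)
  {
    int val;
    my_rw_rdlock(&counter_lock);       /* shared (reader) lock */
    val= counter;
    my_rw_unlock(&counter_lock);
    return val;
  }

  static void counter_increment(void)
  {
    my_rw_wrlock(&counter_lock);       /* exclusive (writer) lock */
    counter++;
    my_rw_unlock(&counter_lock);
  }

  static void counter_destroy(void)
  {
    my_rw_destroy(&counter_lock);
  }
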
=== modified file 'include/my_pthread.h'
--- a/include/my_pthread.h	2010-08-10 21:12:01 +0000
+++ b/include/my_pthread.h	2010-09-17 16:47:40 +0000
@@ -1,4 +1,4 @@
-/* Copyright (C) 2000-2008 MySQL AB, 2008-2009 Sun Microsystems, Inc.
+/* Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -650,6 +650,21 @@ extern int rw_pr_init(struct st_my_rw_lo
   On systems which don't support native read/write locks, or don't support
   read/write locks which prefer readers we have to use own implementation.
 */
+#ifdef _WIN32
+
+typedef struct  st_my_rw_lock_t {
+  volatile LONG my_lock;         /* spinlock for the structure */
+  CRITICAL_SECTION writer_cs;    /* writer serialization lock  */
+  int owners;                    /* -1: writer, 0: unlocked, >0: number of readers */
+  int num_write_waiters;         /* number of waiting writers */
+  int num_read_waiters;          /* number of waiting readers */
+  DWORD  write_owner;            /* thread that holds exclusive lock */
+  HANDLE write_event;            /* event for waiting writers */
+  HANDLE read_event;             /* event for waiting readers */
+  my_bool prefer_readers;        /* prefer readers flag */
+} my_rw_lock_t;
+
+#else
 typedef struct st_my_rw_lock_t {
 	pthread_mutex_t lock;		/* lock for structure		*/
 	pthread_cond_t	readers;	/* waiting readers		*/
@@ -661,6 +676,7 @@ typedef struct st_my_rw_lock_t {
         pthread_t       write_thread;
 #endif
 } my_rw_lock_t;
+#endif
 
 extern int my_rw_init(my_rw_lock_t *, my_bool *);
 extern int my_rw_destroy(my_rw_lock_t *);

=== modified file 'mysys/thr_rwlock.c'
--- a/mysys/thr_rwlock.c	2010-08-12 13:50:23 +0000
+++ b/mysys/thr_rwlock.c	2010-09-17 16:47:40 +0000
@@ -1,4 +1,4 @@
-/* Copyright (C) 2000 MySQL AB
+/* Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -19,6 +19,7 @@
 #if defined(THREAD)
 #if defined(NEED_MY_RW_LOCK)
 #include <errno.h>
+#if !(defined _WIN32)
 
 /*
   Source base from Sun Microsystems SPILT, simplified for MySQL use
@@ -199,7 +200,551 @@ int rw_pr_init(struct st_my_rw_lock_t *r
   return my_rw_init(rwlock, &prefer_readers_attr);
 }
 
-#else
+#else /* WIN32 */
+
+/* 
+  Reader-Writer lock on Windows.
+
+  There is a reason why we do not use the one-size-fits-all implementation, and
+  that reason is speed. The general implementation is based on POSIX condition
+  variables, which are not only heavy-weight as implemented inside mysys, but
+  also not necessarily needed in the presence of more suitable native
+  primitives, e.g. events.
+
+  There is also a reason why we cannot, in the general case, use native Vista
+  reader-writer locks - the MySQL runtime sometimes requires reader-writer locks
+  with special properties (e.g. a prefer-reader property, where a writer must
+  not preempt a reader, or a reader lock that supports recursion). These
+  properties are not provided by the native slim reader-writer locks, which
+  means a home-baked implementation is necessary, at least for special cases.
+
+
+  What follows is the reader/writer lock implementation taken from Vance
+  Morrison's blog post "Low-Lock Techniques in action: Implementing a Reader-
+  Writer lock". The original blog post can be found here:
+  http://blogs.msdn.com/b/vancem/archive/2006/03/28/563180.aspx
+
+  An analysis of the implementation is given in the follow-up article
+  http://blogs.msdn.com/b/vancem/archive/2006/03/29/564854.aspx
+
+  The implementation here is a straightforward port of the original C# code
+  (which can be found at http://tinyurl.com/2dgkw56). Lock upgrading/downgrading
+  is currently omitted, as we do not need it right now. The code is very
+  slightly modified for the prefer-reader option, and adapted to MySQL naming
+  conventions.
+
+  There is an additional optimization (not in the cited article) for
+  writer-heavy scenarios (holding a critical section for the whole duration of
+  the writer lock). See the comment inside my_rw_wrlock() on what it fixes and
+  why it helps.
+
+  This implementation features its own spinlock and lazy initialization of event
+  objects. Using a spinlock instead of a critical section is justified because
+  the spinlock protects only short code paths; the implementation avoids holding
+  it while in system calls. Lazy initialization also comes in handy in the MySQL
+  case, as rwlocks are created and destroyed at a relatively high rate in MDL.
+
+  Apart from the features listed above, the underlying ideas are the same as in
+  the general POSIX case (with events instead of condition variables, and a
+  spinlock instead of a mutex).
+
+
+*/
+
+
+/*
+  Spinlock routines, implemented with standard techniques, i.e. a loop with
+  InterlockedCompareExchange, the pause instruction (YieldProcessor) or
+  yielding at the OS level (SwitchToThread). There is no Sleep(N) here; the
+  locks are fast enough without it.
+
+  Executing these routines causes the appropriate memory barriers, due to the
+  InterlockedCompareExchange calls, so the routines are just as good as
+  critical sections when it comes to memory visibility.
+
+  Refer to MSDN article "Synchronization and Multiprocessor Issues" 
+  http://msdn.microsoft.com/en-us/library/ms686355(VS.85).aspx for more
+  info on memory visibility.
+*/
+
+/**
+  Helper routine to acquire the spinlock, spinning until it succeeds.
+  @param rwp   reader writer lock
+*/
+
+static void enter_my_lock_spin(my_rw_lock_t *rwp)
+{
+  int i;
+  for (i=0; ;i++)
+  {
+    if (i < 3)
+    {
+      int j;
+      for (j=0; j < 20; j++)
+        /* Wait a few dozen instructions to let another processor release lock. */
+        YieldProcessor();
+    }
+    else
+    {
+       /* Give up my quantum. */
+      SwitchToThread();
+    }
+
+    if (InterlockedCompareExchange(&rwp->my_lock, 1, 0) == 0)
+      return;
+  }
+}
+
+
+/**
+  Acquire spinlock 
+  @param rwp   reader writer lock
+*/
+static void enter_my_lock(my_rw_lock_t *rwp) 
+{
+
+  if (InterlockedCompareExchange(&rwp->my_lock, 1, 0) != 0)
+    enter_my_lock_spin(rwp);
+
+}
+
+
+static BOOL my_lock_held(my_rw_lock_t *rwp)
+{
+  return (rwp->my_lock == 1);
+}
+
+/**
+  Release spinlock
+  @param rwp   reader writer lock
+*/
+static void exit_my_lock(my_rw_lock_t *rwp)
+{
+  DBUG_ASSERT(my_lock_held(rwp));
+  rwp->my_lock= 0;
+}
+
+/**
+  Create event lazily
+
+  @param rwp                    the rwlock for which the event is allocated
+  @param wait_event             event to allocate
+  @param make_autoreset_event   TRUE for an auto-reset event, FALSE for manual reset
+
+  Lazily create the event outside the lock (so that errors happen outside the
+  lock and we don't do much work while holding the spinlock).
+  If all goes well, reenter the lock and set 'wait_event'.
+*/
+static int lazy_create_event(my_rw_lock_t *rwp, HANDLE *wait_event, 
+  BOOL make_autoreset_event) 
+{
+  HANDLE new_event;
+
+  DBUG_ASSERT(my_lock_held(rwp));
+  DBUG_ASSERT(*wait_event == NULL);
+
+  exit_my_lock(rwp);
+  /* CreateEvent()'s second argument is bManualReset, hence the negation. */
+  new_event = CreateEvent(NULL, !make_autoreset_event, FALSE, NULL);
+  enter_my_lock(rwp);
+
+  if (!new_event)
+    return ENOMEM;
+
+  if (*wait_event == NULL) 
+  {
+    *wait_event = new_event;
+  }
+  else
+  {
+    /* Someone snuck in. Close the handle we just created. */
+    exit_my_lock(rwp);
+    CloseHandle(new_event);
+    enter_my_lock(rwp);
+  }
+  return 0;
+}
+
+/**
+  Waits on event with a given timeout. 
+  
+  @param rwp          the lock
+  @param wait_event   handle to wait on
+  @param num_waiters  waiter count (incremented and decremented in this routine)
+  @param timeout_ms   timeout (in milliseconds)
+
+  @note The waiter count is incremented before the wait and restored before
+  leaving this routine.
+*/
+static int wait_on_event(my_rw_lock_t *rwp, HANDLE wait_event, int *num_waiters,
+  int timeout_ms)
+{
+  DWORD wait_retval;
+  BOOL ok;
+
+  DBUG_ASSERT(rwp->my_lock);
+
+  ok = ResetEvent(wait_event);
+  DBUG_ASSERT(ok);
+
+  (*num_waiters)++;
+  exit_my_lock(rwp); 
+
+  /* Do the wait outside of any lock */
+  wait_retval = WaitForSingleObject(wait_event, timeout_ms);
+
+  enter_my_lock(rwp);
+  (*num_waiters)--;
+
+  if (wait_retval != WAIT_OBJECT_0)
+  {
+    DBUG_ASSERT(wait_retval == WAIT_TIMEOUT);
+    return EBUSY;
+  }
+
+  return 0;
+}
+
+
+/**
+  Determines the appropriate events to set, leaves the lock, and sets the
+  events.
+
+  @param rwp  the lock
+*/
+static void exit_and_wakeup_appropriate_waiters(my_rw_lock_t *rwp)
+{
+  BOOL ok;
+
+  DBUG_ASSERT(my_lock_held(rwp));
+  /*
+    The lock is released below before signaling, to improve efficiency (the
+    wakee will need the lock).
+  */
+  if (rwp->owners == 0 && rwp->num_write_waiters > 0)
+  {
+    exit_my_lock(rwp);  
+    ok= SetEvent(rwp->write_event);   /* release one writer */
+    DBUG_ASSERT(ok);
+  }
+  else if (rwp->owners >= 0 && rwp->num_read_waiters != 0)
+  {
+    exit_my_lock(rwp);    
+    ok= SetEvent(rwp->read_event);  /* release all readers. */
+    DBUG_ASSERT(ok);
+  }
+  else
+  {
+    exit_my_lock(rwp);
+  }
+}
+
+
+/**
+  Acquire reader lock, with timeout parameter.
+
+  @param rwp          the lock
+  @param timeout_ms   timeout (in milliseconds)
+*/
+static int acquire_reader_lock(my_rw_lock_t *rwp, int timeout_ms)
+{
+  int ret = 0;
+
+  enter_my_lock(rwp);
+  for (;;)
+  {
+    /*
+      We can enter a read lock if only read locks have been given out.
+      If it is a normal (NOT prefer-reader) lock, we also check that a writer is
+      not trying to get in. If it is a prefer-reader lock, incoming readers
+      always preempt writers.
+    */
+    if (rwp->owners >= 0 && (rwp->num_write_waiters == 0 || rwp->prefer_readers))
+    {
+      /*
+        Good case, there is no contention, we are basically done.
+        Indicate we have another reader 
+      */
+      rwp->owners++;
+      break;
+    }
+
+    /* Drat, we need to wait.  Mark that we have waiters and wait. */
+    if (rwp->read_event == NULL)
+    {
+      /* Create the needed event. */
+      ret = lazy_create_event(rwp, &rwp->read_event, FALSE);
+      if(ret != 0)
+      {
+        /* This must be resource allocation error. */
+        DBUG_ASSERT(ret == ENOMEM);
+        break;
+      }
+
+      /* Since we left the lock, in 'lazy_create_event', start over. */
+      continue;
+    }
+
+    ret = wait_on_event(rwp, rwp->read_event, &rwp->num_read_waiters, 
+      timeout_ms);
+    if(ret)
+    {
+      /* This must be timeout error. */
+      DBUG_ASSERT(ret == EBUSY && timeout_ms != INFINITE);
+      break;
+    }
+  }
+  exit_my_lock(rwp);
+
+  return ret;
+}
+
+
+
+/**
+  Acquire writer lock, with timeout parameter.
+
+  @param rwp         the lock
+  @param timeout_ms  timeout (in milliseconds)
+*/
+static int acquire_writer_lock(my_rw_lock_t *rwp, int timeout_ms)
+{
+  int ret= 0;
+
+#ifndef DBUG_OFF
+  DWORD tid = GetCurrentThreadId(); /* track the owner */
+#endif
+
+  enter_my_lock(rwp);
+
+  for (;;)
+  {
+    if (rwp->owners == 0)
+    {
+      /* 
+        Good case, there is no contention, we are basically done.  Indicate we 
+        have a writer.
+      */
+      rwp->owners= -1;
+#ifndef DBUG_OFF
+      rwp->write_owner= tid;
+#endif
+      break;
+    }
+
+    /* Drat, we need to wait.  Mark that we have waiters and wait.*/
+    if (rwp->write_event == NULL)
+    {
+       /* Create the needed event. */
+      ret= lazy_create_event(rwp, &rwp->write_event, TRUE);
+      if (ret != 0)
+      {
+         /* This must be resource allocation error. */
+        DBUG_ASSERT(ret == ENOMEM);
+        break;
+      }
+
+      /* Since we left the lock, start over. */
+      continue;
+    }
+
+
+    ret = wait_on_event(rwp, rwp->write_event, &rwp->num_write_waiters, 
+      timeout_ms);
+
+    if(ret)
+    {
+      /* This must be timeout. */
+      DBUG_ASSERT(ret == EBUSY && timeout_ms != INFINITE);
+      break;
+    }
+  }
+  exit_my_lock(rwp);
+  return ret;
+}
+
+/**
+  Release reader lock
+*/
+static void release_reader_lock(my_rw_lock_t *rwp)
+{
+  DBUG_ASSERT(rwp->owners > 0);
+  DBUG_ASSERT(rwp->write_owner == 0);
+  enter_my_lock(rwp);
+  --(rwp->owners);
+  exit_and_wakeup_appropriate_waiters(rwp);
+}
+
+
+/**
+  Release writer lock
+*/
+static void release_writer_lock(my_rw_lock_t *rwp)
+{
+  DBUG_ASSERT(rwp->owners < 0);
+  DBUG_ASSERT(rwp->write_owner == GetCurrentThreadId());
+
+  enter_my_lock(rwp);
+  rwp->owners++;
+  rwp->write_owner= 0;
+  exit_and_wakeup_appropriate_waiters(rwp);
+}
+
+
+static void release_reader_or_writer_lock(my_rw_lock_t *rwp, my_bool *was_writer)
+{
+   enter_my_lock(rwp);
+
+   *was_writer=  (rwp->owners < 0);
+   if(*was_writer)
+     rwp->owners++;
+   else
+     rwp->owners--;
+
+   exit_and_wakeup_appropriate_waiters(rwp);
+}
+
+/**
+  Initialize reader writer lock
+
+  @param rwp             the lock
+  @param prefer_readers  if non-NULL and TRUE, readers are preferred (a writer
+                         must not preempt waiting readers)
+
+*/
+int my_rw_init(my_rw_lock_t *rwp, my_bool *prefer_readers)
+{
+  rwp->prefer_readers= prefer_readers ? *prefer_readers : FALSE;
+  rwp->num_read_waiters= 0; 
+  rwp->num_write_waiters= 0;
+  rwp->owners= 0;
+  rwp->write_owner= 0;
+
+  /* Events will be lazily allocated only in case of contention */
+  rwp->read_event= NULL;
+  rwp->write_event= NULL;
+  rwp->my_lock= 0;
+  InitializeCriticalSection(&rwp->writer_cs);
+  return 0;
+}
+
+
+/**
+  Destroy reader writer lock
+*/
+int my_rw_destroy(my_rw_lock_t *rwp)
+{
+  /* Lock must not be busy.*/
+  DBUG_ASSERT(rwp->owners == 0);
+
+  if(rwp->read_event)
+    CloseHandle(rwp->read_event);
+  if(rwp->write_event)
+    CloseHandle(rwp->write_event);
+
+  DeleteCriticalSection(&rwp->writer_cs);
+  return 0;
+}
+
+
+/**
+  Acquire writer lock
+*/
+int my_rw_wrlock(my_rw_lock_t *rwp)
+{
+  int ret;
+
+  /*
+    Additional serialization of writers via a critical section (writer_cs)
+    for the whole acquire/release cycle actually helps performance in writer-
+    mostly workloads, and has minimal cost otherwise.
+
+    Without the critical section serialization, in writer-heavy scenarios kernel
+    CPU usage goes high, and a lot of context switching and ready-thread events
+    go on inside wait_on_event() in WaitForSingleObject().
+
+    The critical section eliminates this switching almost completely, decreases
+    CPU time spent in the kernel, and improves benchmark results a lot.
+
+    The difference between critical section and kernel object synchronization
+    likely stems from the fact that kernel locks try to be fair (FIFO) and are
+    thus subject to lock convoys, while critical sections include mechanisms to
+    prevent them (MSDN mentions this, but gives no details on how exactly lock
+    convoy prevention is implemented).
+
+    It is theoretically possible to spin in acquire_writer_lock(), but we
+    are not doing it, for the sake of simplicity and also because we probably
+    cannot do it as efficiently as the tried-and-true critical section.
+
+    Note also that wrapping the writer with a critical section does not change
+    the correctness of the original algorithm.
+  */
+
+  EnterCriticalSection(&rwp->writer_cs);
+  ret = acquire_writer_lock(rwp, INFINITE);
+  if(ret != 0)
+    LeaveCriticalSection(&rwp->writer_cs);
+  return ret;
+}
+
+
+/**
+  Try to acquire writer lock
+*/
+int my_rw_trywrlock(my_rw_lock_t *rwp)
+{
+  if(TryEnterCriticalSection(&rwp->writer_cs))
+  {
+    int ret = acquire_writer_lock(rwp, 0);
+    if(ret != 0)
+      LeaveCriticalSection(&rwp->writer_cs);
+    return ret;
+  }
+
+  return EBUSY; /* TryEnter failed */
+}
+
+
+/**
+  Acquire reader lock
+*/
+int my_rw_rdlock(my_rw_lock_t *rwp)
+{
+  return acquire_reader_lock(rwp, INFINITE);
+}
+
+/**
+  Try to acquire reader lock
+*/
+int my_rw_tryrdlock(my_rw_lock_t *rwp)
+{
+  return acquire_reader_lock(rwp, 0);
+}
+
+/**
+  Unlock previously acquired lock
+*/
+int my_rw_unlock(my_rw_lock_t *rwp)
+{
+  my_bool was_writer;
+  release_reader_or_writer_lock(rwp, &was_writer);
+  if(was_writer)
+  {
+    LeaveCriticalSection(&rwp->writer_cs);
+  }
+  return 0;
+}
+
+
+/**
+  Initialize prefer-readers lock
+*/
+int rw_pr_init(struct st_my_rw_lock_t *rwlock)
+{
+  my_bool prefer_readers_attr= TRUE;
+  return my_rw_init(rwlock, &prefer_readers_attr);
+}
+
+#endif /* WIN32 */
+
+#else  /* !defined(NEED_MY_RW_LOCK) */
 
 /*
   We are on system which has native read/write locks which support


Attachment: [text/bzr-bundle] bzr/vvaintroub@mysql.com-20100917164740-elrcm5xlk2govptb.bundle