On Fri, Dec 12, 2008 at 7:20 AM, Jim Starkey <jstarkey@stripped> wrote:
> Falcon uses a user mode SyncObject for thread synchronization.
> SyncObject is portable, supports read/write locks, and handles
> recursion. It is also fair in the sense that locks are granted in the
> order requested. Instrumenting pthread mutex and rwlocks does nothing
> for Falcon.
> As for the callback during monitoring, there are at least three (no! four!)
> reasons why it is desirable, if not necessary:
> 1. A goal is to be unobtrusive as possible when monitoring is
> compiled in but turned off. Adding memory for monitoring to the
> objects monitored bulks them up, changing the profile of memory
> usage. Since Falcon allocates hundreds of thousands of SyncObjects
> (one per buffer in the page cache), this can amount to a
> significant amount of memory. Allocating memory only for actual
> monitored objects mitigates this problem.
> 2. It isn't enough to monitor usage. You also have to be able to
> collect the data. Without a central registry of monitored objects,
> how are you going to collect the data for reporting? In some
> cases, the monitored objects themselves may have been deleted.
> Without that data, how can someone draw accurate conclusions?
> 3. Potential deadlock analysis is a difficult but necessary process.
> Centralizing monitoring allows collection of data on overlapping
> locks, data that can reveal potential deadlocks.
> 4. A central monitoring facility can track locks on a per thread basis.
> Handling this in target objects is expensive in code, effort, debugging,
> and resource utilization.
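Jim's four points together suggest a shape like the following: a hedged C++ sketch (all names here — WaitMonitor, WaitStats, record — are illustrative, not Falcon's actual code) of a central registry that keeps per-object wait statistics outside the monitored objects and allocates them only when contention is actually observed:

```cpp
#include <cstdint>
#include <map>
#include <mutex>

// Per-object statistics live in the registry, not in the monitored
// object, so SyncObjects carry no extra bytes when monitoring is off.
struct WaitStats {
    uint64_t waits = 0;       // number of blocked requests seen
    uint64_t totalMicros = 0; // accumulated wait time
};

class WaitMonitor {
public:
    // Global mask of interesting wait classes; zero disables everything,
    // so the compiled-in-but-off cost is one load, one test, one branch.
    static inline uint32_t waitEvents = 0;

    static void record(uint32_t waitType, const void* object, uint64_t micros) {
        if (!(waitEvents & waitType))
            return;                       // monitoring off or class not selected
        std::lock_guard<std::mutex> g(lock_);
        WaitStats& s = stats_[object];    // allocated lazily, per contended object
        ++s.waits;
        s.totalMicros += micros;
    }

    static uint64_t waitsFor(const void* object) {
        std::lock_guard<std::mutex> g(lock_);
        auto it = stats_.find(object);
        return it == stats_.end() ? 0 : it->second.waits;
    }

private:
    static inline std::mutex lock_;
    static inline std::map<const void*, WaitStats> stats_;
};
```

Because the registry owns the statistics, they survive deletion of the monitored object and can be walked centrally for reporting or overlap (deadlock) analysis.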
> I hope your message was in the spirit of on-going discussion rather
> than the delivery of stone tablets memorializing decisions made.
> Peter Gulutzan wrote:
>> Hi Jim,
>> Jim Starkey wrote:
>>> Waiting for the hotel shuttle at the Riga airport, Chris and Robin
>>> raised the question of instrumenting lock waits in Falcon. While it is
>>> true that 99.9% of lock (SyncObject) requests are granted without
>>> contention, it would be nice to know a great deal more about the 0.1%
>>> that gum up the works.
>> I hope that "instrumenting lock waits" includes instrumenting mutexes,
>> rwlocks, etc., in which case I think we've made relevant progress.
>>> What I think makes sense is a mechanism for a thread to register itself
>>> before it actually waits on something, with a corresponding call when it
>>> wakes up after the lock has been granted. Since we would only register
>>> waits, the overhead should be insignificant. We would want to register
>>> waits on SyncObjects and transactions (we might even register the record
>>> we're blocking on).
>>> An off-the-top-of-my-head API might be something like:
>>> WaitMask waitEvents;
>>> WaitToken waitRegister(int waitType, void *waitObject);
>>> void waitCompleted(WaitToken token);
>>> (by all means, change the names).
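As a rough illustration of how the two proposed entry points might fit together, here is a self-contained C++ sketch; the enum values, token layout, timing mechanism, and the aggregate counter are all assumptions, since the message only names the calls:

```cpp
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// Assumed wait classes; the message mentions SyncObjects, transactions,
// and possibly records.
enum WaitType { WAIT_SYNC_OBJECT = 1, WAIT_TRANSACTION = 2 };

// The token remembers what is being waited on and when the wait began.
struct WaitToken {
    int waitType;
    const void* waitObject;
    Clock::time_point start;
};

static uint64_t totalWaits = 0;   // illustrative aggregate kept by the monitor

WaitToken waitRegister(int waitType, const void* waitObject) {
    return WaitToken{waitType, waitObject, Clock::now()};
}

void waitCompleted(const WaitToken& token) {
    auto waited = Clock::now() - token.start;
    (void)waited;                 // a real monitor would accumulate this
    ++totalWaits;
}
```

Mainline code would call waitRegister() just before blocking and waitCompleted() after the lock is granted; timing and accumulation stay inside the monitor, as the message proposes.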
>> Sorry, we change more than the names and (rather than registering
>> a routine to call) we have fixed code which either gets executed
>> or skipped. The spec and the code now exist in mysql-6.0-perf tree.
>> We surround code snippets with instrumentation before/after macros.
>> Users can select from read-only in-memory tables and find
>> * what the latest mutex/rwlock/etc. waits were for each thread,
>> and how long they took
>> * summary of the above, by thread
>> * other stuff but I won't repeat what's in WL#2360 specification.
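The before/after macro approach Peter describes might look roughly like this sketch (the macro and counter names are stand-ins, not the actual WL#2360 identifiers); when instrumentation is compiled out, the macros expand to nothing and the wrapped code runs unchanged:

```cpp
#include <cstdint>

#define HAVE_INSTRUMENTATION 1   // defined here so the sketch exercises
                                 // the instrumented path

static uint64_t g_mutexWaits = 0;   // illustrative in-memory counter

#ifdef HAVE_INSTRUMENTATION
#define INSTR_WAIT_BEGIN() do { ++g_mutexWaits; } while (0)
#define INSTR_WAIT_END()   do { /* record end timestamp here */ } while (0)
#else
#define INSTR_WAIT_BEGIN() do {} while (0)
#define INSTR_WAIT_END()   do {} while (0)
#endif

// A code snippet surrounded by the before/after macros, as described.
void lockSomething() {
    INSTR_WAIT_BEGIN();
    // ... pthread_mutex_lock(&m) or similar would go here ...
    INSTR_WAIT_END();
}
```

This is the "fixed code which either gets executed or skipped" trade-off: no callback indirection, at the cost of the flexibility Jim argues for.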
>> At the Riga meeting Chris liked our (Marc Alff's and my) description,
>> and now the specification is 'done' unless Mikael rejects it.
>>> The global mask would indicate which classes of events are interesting
>>> to the wait monitor; if zero, nobody would register anything. If
>>> non-zero, specific wait types would be registered. Timing and
>>> accumulating would be the responsibility of the wait monitor, not the
>>> mainline code.
>> No. Flexibility = a bit more overhead, which we can ill afford.
>> I posited the idea of a callback in discussions; there were no takers.
>>> The monitor could, if it understood a specific wait
>>> type, get more specific information about the blocking object
>>> (SyncObject name, table name and record number for record, etc.).
>>> Even without Robin pushing for external use, this might be a very useful
>>> tool for internal analysis, particularly with Philip's pathological test
>> Our big hope is the "external use" bit, especially via Enterprise Tools.
> Jim Starkey
> President, NimbusDB, Inc.
> 978 526-1376
> Falcon Storage Engine Mailing List
> For list archives: http://lists.mysql.com/falcon
> To unsubscribe: http://lists.mysql.com/falcon?unsub=1
Worklog 2360 is private or doesn't exist.
InnoDB has the same problem that Jim mentions for SyncObject. The
number of InnoDB mutex and rw-mutex objects allocated is linear in
the number of buffer cache pages. InnoDB has ~1400 bytes of overhead per
buffer cache page --
After the changes we made to InnoDB to make mutex and rw-mutex faster,
I have no desire to add extra macros to the code to support a standard
mechanism for collecting mutex stats provided by MySQL. I would extend
InnoDB to support a new mechanism for reporting such stats and I would
consider reusing common code that provided fast and scalable system
libraries. If I were implementing Falcon, I would expect the same. But
if Falcon must add this and this makes it slower, that is better for
Wrapping pthread_mutex is convenient because that is what MySQL uses
above the storage engine level. But I hope that MySQL will do
something better here. pthread_mutex should only be used for
short-term locks but some of the uses of it appear to have long code
paths when the lock is held. Something better will allow lock-holder
and lock-waiter states to be exported via a SQL command, optionally
collect contention and usage stats, and do the right thing when a
thread spins too long waiting for a lock (use less power -- 'green'
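A wrapper of the kind described here — pthread_mutex underneath, with an uncontended fast path and an optional contention counter for later export via SQL — could be sketched as follows (the class name and layout are assumptions, not InnoDB's or MySQL's actual code):

```cpp
#include <atomic>
#include <cstdint>
#include <pthread.h>

class StatMutex {
public:
    StatMutex() { pthread_mutex_init(&mutex_, nullptr); }
    ~StatMutex() { pthread_mutex_destroy(&mutex_); }

    void lock() {
        if (pthread_mutex_trylock(&mutex_) == 0)
            return;                          // uncontended fast path
        contended_.fetch_add(1, std::memory_order_relaxed);
        pthread_mutex_lock(&mutex_);         // blocking slow path
    }

    void unlock() { pthread_mutex_unlock(&mutex_); }

    // Aggregate contention count, suitable for export in a status table.
    uint64_t contendedCount() const {
        return contended_.load(std::memory_order_relaxed);
    }

private:
    pthread_mutex_t mutex_;
    std::atomic<uint64_t> contended_{0};
};
```

The relaxed atomic keeps stats collection off the critical path, which matters if, as argued, any slowdown of the lock itself is unacceptable.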
I don't get many servers hung on locks, but when that happens I am
stuck. Eventually the state is noticed because there will be a
connection pile-up, monitoring probes will fail, and alerts will go
out. InnoDB is kind enough to dump SHOW INNODB STATUS output, including
lock holders and lock waiters, when it detects a long lock wait, and
eventually it crashes the server.
I am not interested in lock-wait stats per thread. I am very
interested in aggregate performance data -- which lock has the most
waits/calls, and which lock-get call site in the code has the most
waits/calls. Aggregate performance data is much cheaper to collect. I
am sure you will hear differently from others, but I have been working
on SMP contention in MySQL for a year and I have yet to need per-thread
stats. Also, lock-wait performance stats might help sell a server, but
if that comes at the cost of reducing SMP performance I hope you make
the right choice. I don't think that everyone needs these stats from
their production servers -- and if they do, they can always use DTrace.
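Per-call-site aggregation of the kind described above -- answering "which lock-get call in the code has the most waits" without any per-thread state -- can be sketched with a counter keyed by file and line (all names here are illustrative):

```cpp
#include <cstdint>
#include <map>
#include <string>

struct CallSiteStats {
    uint64_t calls = 0;   // total lock acquisitions from this site
    uint64_t waits = 0;   // acquisitions that had to block
};

// Keyed by "file:line" of the lock call site.
static std::map<std::string, CallSiteStats> g_siteStats;

void recordLockCall(const char* file, int line, bool waited) {
    CallSiteStats& s =
        g_siteStats[std::string(file) + ":" + std::to_string(line)];
    ++s.calls;
    if (waited)
        ++s.waits;
}

// The macro captures the call site automatically.
#define RECORD_LOCK_CALL(waited) recordLockCall(__FILE__, __LINE__, (waited))
```

A table sorted by waits/calls then points straight at the hottest lock-get call sites, with no per-thread bookkeeping at all.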