Marc Alff a écrit, Le 31.03.2010 17:48:
> #At file:///Users/malff/BZR_TREE/mysql-next-mr-bugfixing-52502/ based on
> 3144 Marc Alff 2010-03-31
> Bug#52502 Performance schema does not start with large mutex_instance buffers
> This is a performance issue affecting scalability.
> Prior to this fix, the performance schema code did not scale well
> when a very large number of instruments are created in the code.
> The root cause was create_mutex() spinning "forever",
> trying to find an empty slot in a full mutex_array.
> With this fix:
> - functions like create_mutex() abort sooner,
> when used with a very large array which is full or almost full.
> - randomize_index() has been revised to improve
> the statistical spread of values in the mutex_array.
> See the comments in the code for details.
> Other allocation functions similar to create_mutex()
> have been also fixed.
> No MTR test case provided (performance scalability issue).
> Fix tested manually.
> === modified file 'storage/perfschema/pfs_global.h'
> --- a/storage/perfschema/pfs_global.h 2010-01-12 01:47:27 +0000
> +++ b/storage/perfschema/pfs_global.h 2010-03-31 15:48:47 +0000
> @@ -1,4 +1,4 @@
> -/* Copyright (C) 2008-2009 Sun Microsystems, Inc
> +/* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
> This program is free software; you can redistribute it and/or modify
> it under the terms of the GNU General Public License as published by
> @@ -30,27 +30,49 @@ void pfs_free(void *ptr);
> inline uint randomized_index(const void *ptr, uint max_size)
> + static uint seed1= 0;
> + static uint seed2= 0;
> + uint result;
> + register intptr value;
> if (unlikely(max_size == 0))
> return 0;
> - ptr is typically an aligned structure,
> - so the last bits are not really random, but this has no effect.
> - Apply a factor A*x to spread
> - close values of ptr further apart (which helps with arrays),
> - and to spread values way beyond a typical max_size.
> - Then, apply a modulo to end within [0, max_size - 1].
> - A is big prime numbers, to avoid resonating with max_size,
> - to have a uniform distribution in [0, max_size - 1].
> - The value of A is chosen so that index(ptr) and index(ptr + N) (for arrays)
> - are likely to be not similar for typical values of max_size
> - (50, 100, 1000, etc).
> - In other words, (sizeof(T)*A % max_size) should not be a small number,
> - to avoid that with 'T array[max_size]', index(array[i])
> - and index(array[i + 1]) end up pointing in the same area in [0, max_size - 1].
> + ptr is typically an aligned structure, and can be in an array.
> + - The last bits are not random because of alignment,
> + so we divide by 8.
> + - The high bits are mostly constant, especially with 64 bits architectures,
> + but we keep most of them anyway, by doing computation in intptr.
> + The high bits are significant depending on where the data is
> + stored (the data segment, the stack, the heap, ...).
> + - To spread consecutive cells in an array further, we multiply by
> + a factor A. This factor should not be too high, which would cause
> + an overflow and cause loss of randomness (droping the top high bits).
> + The factor is a prime number, to help spread the distribution.
> + - To add more noise, and to be more robust if the calling code is
> + passing a constant value instead of a random identity,
> + we add the previous results, for hysteresys, with a degree 2 polynom,
> + X^2 + X + 1.
> + - Last, a modulo is applied to be within the [0, max_size - 1] range.
> + Note that seed1 and seed2 are static, and are *not* thread safe,
> + which is even better.
In spite of the copious details (thanks!), this is a bit too complicated
for me but it all looks plausible and I'll admit it. I wish I had read
this Knuth book about pseudo-random numbers, which is closed on my shelf
since too long.
> + Effect with arrays: T array[N]
> + - ptr(i) = & array[i] = & array + i * sizeof(T)
> + - ptr(i+1) = ptr(i) + sizeof(T).
> + What we want here, is to have index(i) and index(i+1) fall into
> + very different areas in [0, max_size - 1], to avoid locality.
> - return static_cast<uint>
> - (((reinterpret_cast<intptr> (ptr)) * 2166179) % max_size);
> + value= (reinterpret_cast<intptr> (ptr)) >> 3;
> + value*= 1789;
> + value+= seed2 + seed1 + 1;
> + value%= max_size;
> + result= static_cast<uint> (value);
I would have dropped the "value" variable and done:
result= static_cast<uint>((((reinterpret_cast<intptr> (ptr)) >> 3)
* 1789 + seed2 + seed1 + 1 ) % max_size;
If the compiler really puts "value" in a register as requested in the
declaration, it probably doesn't make a difference.
Up to you.
> + seed2= seed1*seed1;
> + seed1= result;
> + return result;
> void pfs_print_error(const char *format, ...);
> === modified file 'storage/perfschema/pfs_instr.cc'
> --- a/storage/perfschema/pfs_instr.cc 2010-03-22 12:48:18 +0000
> +++ b/storage/perfschema/pfs_instr.cc 2010-03-31 15:48:47 +0000
> @@ -454,6 +454,7 @@ void cleanup_file_hash(void)
> PFS_mutex* create_mutex(PFS_mutex_class *klass, const void *identity)
> int pass;
> + uint attempt= 0;
> uint i= randomized_index(identity, mutex_max);
> @@ -464,7 +465,9 @@ PFS_mutex* create_mutex(PFS_mutex_class
> PFS_mutex *pfs= mutex_array + i;
> PFS_mutex *pfs_last= mutex_array + mutex_max;
> - for ( ; pfs < pfs_last; pfs++)
> + for ( ;
> + (pfs < pfs_last) && (attempt < PFS_MAX_ALLOC_RETRY);
> + pfs++, attempt++)
This scans at most the first PFS_MAX_ALLOC_RETRY elements of [pfs,
pfs_last[. I suggest to save one comparison and increment in each
iteration by setting a single bound before the for(), like:
PFS_mutex *pfs= mutex_array + i;
PFS_mutex *pfs_last= mutex_array + mutex_max;
PFS_mutex *bound= pfs + PFS_MAX_ALLOC_RETRY;
if (pfs_last > bound)
for ( ; pfs < pfs_last; pfs++)
> if (pfs->m_lock.is_free())
> === modified file 'storage/perfschema/pfs_instr.h'
> --- a/storage/perfschema/pfs_instr.h 2010-01-12 01:47:27 +0000
> +++ b/storage/perfschema/pfs_instr.h 2010-03-31 15:48:47 +0000
> @@ -136,6 +136,13 @@ struct PFS_table : public PFS_instr
> #define LOCKER_STACK_SIZE 3
> + @def PFS_MAX_ALLOC_RETRY
> + Maximun number of times the code attempts to allocate an item