List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:March 31 2010 8:27pm
Subject:Re: bzr commit into mysql-next-mr-bugfixing branch (marc.alff:3144)
Bug#52502
View as plain text  
Hello Marc,

Marc Alff a écrit, Le 31.03.2010 17:48:
> #At file:///Users/malff/BZR_TREE/mysql-next-mr-bugfixing-52502/ based on
> revid:alik@stripped
> 
>  3144 Marc Alff	2010-03-31
>       Bug#52502 Performance schema does not start with large mutex_instance buffers
>       
>       This is a performance issue affecting scalability.
>       
>       Prior to this fix, the performance schema code did not scale well
>       when a very large number of instruments are created in the code.
>       
>       The root cause was create_mutex() spinning "forever",
>       trying to find an empty slot in a full mutex_array.
>       
>       With this fix:
>       - functions like create_mutex() abort sooner,
>         when used with a very large array which is full or almost full.
>       - randomize_index() has been revised to improve
>         the statistical spread of values in the mutex_array.  
>         See the comments in the code for details.
>       
>       Other allocation functions similar to create_mutex()
>       have been also fixed.
>       
>       No MTR test case provided (performance scalability issue).
>       Fix tested manually.
> 
>     modified:
>       storage/perfschema/pfs_global.h
>       storage/perfschema/pfs_instr.cc
>       storage/perfschema/pfs_instr.h
> === modified file 'storage/perfschema/pfs_global.h'
> --- a/storage/perfschema/pfs_global.h	2010-01-12 01:47:27 +0000
> +++ b/storage/perfschema/pfs_global.h	2010-03-31 15:48:47 +0000
> @@ -1,4 +1,4 @@
> -/* Copyright (C) 2008-2009 Sun Microsystems, Inc
> +/* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
>  
>    This program is free software; you can redistribute it and/or modify
>    it under the terms of the GNU General Public License as published by
> @@ -30,27 +30,49 @@ void pfs_free(void *ptr);
>  
>  inline uint randomized_index(const void *ptr, uint max_size)
>  {
> +  static uint seed1= 0;
> +  static uint seed2= 0;
> +  uint result;
> +  register intptr value;
> +
>    if (unlikely(max_size == 0))
>      return 0;
>  
>    /*
> -    ptr is typically an aligned structure,
> -    so the last bits are not really random, but this has no effect.
> -    Apply a factor A*x to spread
> -    close values of ptr further apart (which helps with arrays),
> -    and to spread values way beyond a typical max_size.
> -    Then, apply a modulo to end within [0, max_size - 1].
> -    A is big prime numbers, to avoid resonating with max_size,
> -    to have a uniform distribution in [0, max_size - 1].
> -    The value of A is chosen so that index(ptr) and index(ptr + N) (for arrays)
> -    are likely to be not similar for typical values of max_size
> -    (50, 100, 1000, etc).
> -    In other words, (sizeof(T)*A % max_size) should not be a small number,
> -    to avoid that with 'T array[max_size]', index(array[i])
> -    and index(array[i + 1]) end up pointing in the same area in [0, max_size - 1].
> +    ptr is typically an aligned structure, and can be in an array.
> +    - The last bits are not random because of alignment,
> +      so we divide by 8.
> +    - The high bits are mostly constant, especially with 64 bits architectures,
> +      but we keep most of them anyway, by doing computation in intptr.
> +      The high bits are significant depending on where the data is
> +      stored (the data segment, the stack, the heap, ...).
> +    - To spread consecutive cells in an array further, we multiply by
> +      a factor A. This factor should not be too high, which would cause
> +      an overflow and cause loss of randomness (droping the top high bits).
> +      The factor is a prime number, to help spread the distribution.
> +    - To add more noise, and to be more robust if the calling code is
> +      passing a constant value instead of a random identity,
> +      we add the previous results, for hysteresys, with a degree 2 polynom,
> +      X^2 + X + 1.
> +    - Last, a modulo is applied to be within the [0, max_size - 1] range.
> +    Note that seed1 and seed2 are static, and are *not* thread safe,
> +    which is even better.

In spite of the copious details (thanks!), this is a bit too complicated 
for me but it all looks plausible and I'll admit it. I wish I had read 
this Knuth book about pseudo-random numbers, which is closed on my shelf 
since too long.

> +    Effect with arrays: T array[N]
> +    - ptr(i) = & array[i] = & array[0] + i * sizeof(T)
> +    - ptr(i+1) = ptr(i) + sizeof(T).
> +    What we want here, is to have index(i) and index(i+1) fall into
> +    very different areas in [0, max_size - 1], to avoid locality.
>    */
> -  return static_cast<uint>
> -    (((reinterpret_cast<intptr> (ptr)) * 2166179) % max_size);
> +  value= (reinterpret_cast<intptr> (ptr)) >> 3;
> +  value*= 1789;
> +  value+= seed2 + seed1 + 1;
> +  value%= max_size;
> +  
> +  result= static_cast<uint> (value);

I would have dropped the "value" variable and done:
result= static_cast<uint>((((reinterpret_cast<intptr> (ptr)) >> 3)
                             * 1789 + seed2 + seed1 + 1 ) % max_size;
If the compiler really puts "value" in a register as requested in the 
declaration, it probably doesn't make a difference.
Up to you.

> +  seed2= seed1*seed1;
> +  seed1= result;
> +
> +  return result;
>  }
>  
>  void pfs_print_error(const char *format, ...);
> 
> === modified file 'storage/perfschema/pfs_instr.cc'
> --- a/storage/perfschema/pfs_instr.cc	2010-03-22 12:48:18 +0000
> +++ b/storage/perfschema/pfs_instr.cc	2010-03-31 15:48:47 +0000
> @@ -454,6 +454,7 @@ void cleanup_file_hash(void)
>  PFS_mutex* create_mutex(PFS_mutex_class *klass, const void *identity)
>  {
>    int pass;
> +  uint attempt= 0;
>    uint i= randomized_index(identity, mutex_max);
>  
>    /*
> @@ -464,7 +465,9 @@ PFS_mutex* create_mutex(PFS_mutex_class 
>    {
>      PFS_mutex *pfs= mutex_array + i;
>      PFS_mutex *pfs_last= mutex_array + mutex_max;
> -    for ( ; pfs < pfs_last; pfs++)
> +    for ( ;
> +         (pfs < pfs_last) && (attempt < PFS_MAX_ALLOC_RETRY);
> +         pfs++, attempt++)

This scans at most the first PFS_MAX_ALLOC_RETRY elements of [pfs, 
pfs_last[. I suggest to save one comparison and increment in each 
iteration by setting a single bound before the for(), like:
     PFS_mutex *pfs= mutex_array + i;
     PFS_mutex *pfs_last= mutex_array + mutex_max;
     PFS_mutex *bound= pfs + PFS_MAX_ALLOC_RETRY;
     if (pfs_last > bound)
        pfs_last= bound;
     for ( ; pfs < pfs_last; pfs++)

>      {
>        if (pfs->m_lock.is_free())
>        {

> === modified file 'storage/perfschema/pfs_instr.h'
> --- a/storage/perfschema/pfs_instr.h	2010-01-12 01:47:27 +0000
> +++ b/storage/perfschema/pfs_instr.h	2010-03-31 15:48:47 +0000
> @@ -136,6 +136,13 @@ struct PFS_table : public PFS_instr
>  */
>  #define LOCKER_STACK_SIZE 3
>  
> +/**
> +  @def PFS_MAX_ALLOC_RETRY
> +  Maximun number of times the code attempts to allocate an item

s/Maximun/Maximum

Thread
bzr commit into mysql-next-mr-bugfixing branch (marc.alff:3144)Bug#52502Marc Alff31 Mar
  • Re: bzr commit into mysql-next-mr-bugfixing branch (marc.alff:3144)Bug#52502Guilhem Bichot31 Mar
    • Re: bzr commit into mysql-next-mr-bugfixing branch (marc.alff:3144)Bug#52502Marc Alff1 Apr