List:Commits« Previous MessageNext Message »
From:Sergei Golubchik Date:February 12 2008 8:54pm
Subject:Re: bk commit into 6.0 tree (gkodinov:1.2799) BUG#14637
View as plain text  
Hi!

On Feb 06, kgeorge@stripped wrote:
> ChangeSet@stripped, 2008-02-06 18:45:46+02:00, gkodinov@stripped +5 -0
>   Bug #14637: trim trailing spaces processes data only byte wise
>    Use and int * where possible to scan for trailing space in a
>    string instead of always iterating char-by-char.
> 
> diff -Nrup a/include/m_string.h b/include/m_string.h
> --- a/include/m_string.h	2007-11-30 15:52:49 +02:00
> +++ b/include/m_string.h	2008-02-06 18:45:44 +02:00
> @@ -263,4 +263,63 @@ typedef struct st_mysql_lex_string LEX_S
>  #define USTRING_WITH_LEN(X) ((uchar*) X), ((size_t) (sizeof(X) - 1))
>  #define C_STRING_WITH_LEN(X) ((char *) (X)), ((size_t) (sizeof(X) - 1))
>  
> +/* 
> +  Skip trailing space. Made as a marcro so it can inline.  The idea is simple :

monty prefers it as an inline function (difficult to read, hell to
debug).

> +  On most systems reading memory in larger chunks (ideally equal to the size of
> +  the chinks that the machine physically reads from memory) causes fewer memory
> +  access loops and hence increased performance.
> +  This is why the 'int' type is used : it's closest to that (according to how
> +  it's defined in C).
> +  So when we determine the amount of whitespace at the end of a string we do
> +  the following :
> +    1. We divide the string into 3 zones :
> +      a) from the start of the string (__start) to the first multiple
> +        of sizeof(int)  (__start_words)
> +      b) from the end of the string (__end) to the last multiple of sizeof(int)
> +        (__end_words)
> +      c) a zone that is aligned to sizeof(int) and can be safely accessed
> +        through an int *
> +    2. We start comparing backwards from (c) char-by-char. If all we find is
> +       space then we continue
> +    3. If there are elements in zone (b) we compare them as unsigned ints to a
> +       int mask (SPACE_INT) consisting of all spaces
> +    4. Finally we compare the remaining part (a) of the string char by char.
> +       This covers for the last non-space unsigned int from 3. (if any)
> +   
> +   This algorithm works well for relatively larger strings, but it will slow
> +   the things down for smaller strings (because of the additional calculations
> +   and checks compared to the naive method). Thus the barrier of length 20
> +   is added.
> +*/
> +#if SIZEOF_INT == 2
> +#define SPACE_INT 0x2020

I doubt you need to support that :)

> +#elif SIZEOF_INT == 4
> +#define SPACE_INT 0x20202020
> +#elif SIZEOF_INT == 8
> +#define SPACE_INT 0x2020202020202020
> +#else
> +#error define the appropriate constant for a word full of spaces
> +#endif
> +

the define below is so big that it needs a comment explaining the arguments

> +#define SKIP_TRAILING_SPACE(type,ptr,len,end_ret) \

use

  do { ... } while(0)

pattern for macros that should be used as a statement (an not as an
expression).

as a bonus you'll be able to use break to leave the macro early (like
return in a function).

> +{ \

do you need 'type' ? only if some callers really need uchar and other
need char - but it doesn't look like they have a good reason for this
inconsistency. better to change my_lengthsp_8bit() to use uchar ?

> +  type *__start= ptr; \
> +  register type *__end= ptr + len; \

I tend to avoid 'register' - compiler usually knows better what to keep
on registers, at best it'll ignore your declaration, in the worst case
you'll prevent it from making a good choice. Also modern compilers (e.g.
gcc-4 or may be even older) use SSA optimization where, basically every
change of a variable value creates a new variable (that is a=a+1 is
treated as a_2=a_1+1) which won't necessary use the same memory that the
old one.

> +  if (len > 20) \
> +  { \
> +    type *__end_words= (type *)(((intptr)__end) / SIZEOF_INT * SIZEOF_INT); \
> +    type *__start_words= (type *) \
> +       ((((intptr)__start) + SIZEOF_INT - 1) / SIZEOF_INT * SIZEOF_INT); \
> +    DBUG_ASSERT(((intptr)__start) >= SIZEOF_INT); \
> +    if (__end_words > __start) \
> +      while (__end > __end_words && __end[-1] == 0x20) \
> +        __end--; \
> +    if (__end[-1] == 0x20 && __start_words < __end_words) \

why this if() isn't inside the previous one ?

> +      while (__end > __start_words && ((unsigned *)__end)[-1] ==
> SPACE_INT) \
> +        __end -= SIZEOF_INT; \
> +  } \
> +  while (__end > __start && __end[-1] == 0x20) \
> +    __end--; \
> +  end_ret= __end; \
> +}
>  #endif

I assume you've benchmarked your code. What did you see ?

Regards / Mit vielen Grüssen,
Sergei

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik <serg@stripped>
 / /|_/ / // /\ \/ /_/ / /__  Principal Software Developer/Server Architect
/_/  /_/\_, /___/\___\_\___/  MySQL GmbH, Dachauer Str. 37, D-80335 München
       <___/                  Geschäftsführer: Kaj Arnö - HRB
München 162140
Thread
bk commit into 6.0 tree (gkodinov:1.2799) BUG#14637kgeorge6 Feb
  • Re: bk commit into 6.0 tree (gkodinov:1.2799) BUG#14637Sergei Golubchik12 Feb