Hi!
On Feb 06, kgeorge@stripped wrote:
> ChangeSet@stripped, 2008-02-06 18:45:46+02:00, gkodinov@stripped +5 -0
> Bug #14637: trim trailing spaces processes data only byte wise
> Use and int * where possible to scan for trailing space in a
> string instead of always iterating char-by-char.
>
> diff -Nrup a/include/m_string.h b/include/m_string.h
> --- a/include/m_string.h 2007-11-30 15:52:49 +02:00
> +++ b/include/m_string.h 2008-02-06 18:45:44 +02:00
> @@ -263,4 +263,63 @@ typedef struct st_mysql_lex_string LEX_S
> #define USTRING_WITH_LEN(X) ((uchar*) X), ((size_t) (sizeof(X) - 1))
> #define C_STRING_WITH_LEN(X) ((char *) (X)), ((size_t) (sizeof(X) - 1))
>
> +/*
> + Skip trailing space. Made as a marcro so it can inline. The idea is simple :
monty prefers it as an inline function (difficult to read, hell to
debug).
> + On most systems reading memory in larger chunks (ideally equal to the size of
> + the chinks that the machine physically reads from memory) causes fewer memory
> + access loops and hence increased performance.
> + This is why the 'int' type is used : it's closest to that (according to how
> + it's defined in C).
> + So when we determine the amount of whitespace at the end of a string we do
> + the following :
> + 1. We divide the string into 3 zones :
> + a) from the start of the string (__start) to the first multiple
> + of sizeof(int) (__start_words)
> + b) from the end of the string (__end) to the last multiple of sizeof(int)
> + (__end_words)
> + c) a zone that is aligned to sizeof(int) and can be safely accessed
> + through an int *
> + 2. We start comparing backwards from (c) char-by-char. If all we find is
> + space then we continue
> + 3. If there are elements in zone (b) we compare them as unsigned ints to a
> + int mask (SPACE_INT) consisting of all spaces
> + 4. Finally we compare the remaining part (a) of the string char by char.
> + This covers for the last non-space unsigned int from 3. (if any)
> +
> + This algorithm works well for relatively larger strings, but it will slow
> + the things down for smaller strings (because of the additional calculations
> + and checks compared to the naive method). Thus the barrier of length 20
> + is added.
> +*/
> +#if SIZEOF_INT == 2
> +#define SPACE_INT 0x2020
I doubt you need to support that :)
> +#elif SIZEOF_INT == 4
> +#define SPACE_INT 0x20202020
> +#elif SIZEOF_INT == 8
> +#define SPACE_INT 0x2020202020202020
> +#else
> +#error define the appropriate constant for a word full of spaces
> +#endif
> +
the define below is so big that it needs a comment explaining the arguments
> +#define SKIP_TRAILING_SPACE(type,ptr,len,end_ret) \
use
do { ... } while(0)
pattern for macros that should be used as a statement (an not as an
expression).
as a bonus you'll be able to use break to leave the macro early (like
return in a function).
> +{ \
do you need 'type' ? only if some callers really need uchar and other
need char - but it doesn't look like they have a good reason for this
inconsistency. better to change my_lengthsp_8bit() to use uchar ?
> + type *__start= ptr; \
> + register type *__end= ptr + len; \
I tend to avoid 'register' - compiler usually knows better what to keep
on registers, at best it'll ignore your declaration, in the worst case
you'll prevent it from making a good choice. Also modern compilers (e.g.
gcc-4 or may be even older) use SSA optimization where, basically every
change of a variable value creates a new variable (that is a=a+1 is
treated as a_2=a_1+1) which won't necessary use the same memory that the
old one.
> + if (len > 20) \
> + { \
> + type *__end_words= (type *)(((intptr)__end) / SIZEOF_INT * SIZEOF_INT); \
> + type *__start_words= (type *) \
> + ((((intptr)__start) + SIZEOF_INT - 1) / SIZEOF_INT * SIZEOF_INT); \
> + DBUG_ASSERT(((intptr)__start) >= SIZEOF_INT); \
> + if (__end_words > __start) \
> + while (__end > __end_words && __end[-1] == 0x20) \
> + __end--; \
> + if (__end[-1] == 0x20 && __start_words < __end_words) \
why this if() isn't inside the previous one ?
> + while (__end > __start_words && ((unsigned *)__end)[-1] ==
> SPACE_INT) \
> + __end -= SIZEOF_INT; \
> + } \
> + while (__end > __start && __end[-1] == 0x20) \
> + __end--; \
> + end_ret= __end; \
> +}
> #endif
I assume you've benchmarked your code. What did you see ?
Regards / Mit vielen Grüssen,
Sergei
--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@stripped>
/ /|_/ / // /\ \/ /_/ / /__ Principal Software Developer/Server Architect
/_/ /_/\_, /___/\___\_\___/ MySQL GmbH, Dachauer Str. 37, D-80335 München
<___/ Geschäftsführer: Kaj Arnö - HRB
München 162140