Evgeny,
I will leave up the Serg (2nd reviewer) to decide whether he thinks
the performance improvements are of value. As I said, I am a bit
sceptic.
See inline for comments to the actual patch. My comments are
just suggestions. Feel free to disregard, but please give me some
arguments when you disagree (so that I can learn from it).
--
Øystein
Evgeny Potemkin wrote:
> #At file:///work/bzrroot/34384-bug-next/ based on
revid:epotemkin@stripped
>
> 2864 Evgeny Potemkin 2009-09-19
> Bug#34384: Slow down on constant conversion.
>
> When values of different types are compared they're converted
to a type that
> allows correct comparison. This conversion is done for each
comparison and
> takes some time. When a constant is being compared it's
possible to cache the
> value after conversion to speedup comparison.
>
> A test case isn't provided because all changes are internal and
isn't visible
> outside.
>
> A flag named uncacheable is added to the Item class. It denies
caching of the
> item.
> A flag named late_caching is added to the Item_cache class.
It's set to TRUE
> when we need to postpone caching till the first use of cached
value.
> A function named check_late_caching is added to the Item_cache
flag. Its
> purpose is to check state of the late_caching flag and cache
value accordingly.
> Item_cache_xxx::val_xxx functions are changed to call it prior
to return
> cached value.
and Item_cache_xxx::store functions
> The Arg_comparator::set_cmp_func function is changed to cache a
value being
> compared if it's a constant and needs type conversion.
> Now the Item_func_like::fix_fields function denies caching
arguments of the LIKE
> function.
> The Item_cache::get_cache function is overloaded to allow
setting of the
> late_caching flag and cache type.
> @ mysql-test/r/range.result
> Adjusted a test case after fix for the bug#34384.
> @ sql/item.cc
> Bug#34384: Slow down on constant conversion.
> A flag named uncacheable is added to the Item class. It
denies caching of the
> item.
> The Item_cache::get_cache function is overloaded to allow
setting of the
> late_caching flag and cache type.
> Item_cache_xxx::val_xxx functions are changed to call it
prior to return
> cached value.
> @ sql/item.h
> Bug#34384: Slow down on constant conversion.
> A function named check_late_caching is added to the
Item_cache flag. Its
> purpose is to check state of the late_caching flag and cache
value accordingly.
> The Item_cache::get_cache function is overloaded to allow
setting of the
> late_caching flag and cache type.
> @ sql/item_cmpfunc.cc
> Bug#34384: Slow down on constant conversion.
> The Arg_comparator::set_cmp_func function is changed to cache
a value being
> compared if it's a constant and needs type conversion.
> @ sql/item_cmpfunc.h
> Bug#34384: Slow down on constant conversion.
> @ sql/item_xmlfunc.cc
> Bug#34384: Slow down on constant conversion.
>
> modified:
> mysql-test/r/range.result
> sql/item.cc
> sql/item.h
> sql/item_cmpfunc.cc
> sql/item_cmpfunc.h
> sql/item_xmlfunc.cc
> === modified file 'mysql-test/r/range.result'
> --- a/mysql-test/r/range.result 2008-06-10 22:27:52 +0000
> +++ b/mysql-test/r/range.result 2009-09-19 17:55:58 +0000
> @@ -1230,8 +1230,6 @@ c1
> Warnings:
> Warning 1366 Incorrect decimal value: 'A' for column 'c1' at row 1
> Warning 1292 Truncated incorrect DOUBLE value: 'A'
> -Warning 1292 Truncated incorrect DOUBLE value: 'A'
> -Warning 1292 Truncated incorrect DOUBLE value: 'A'
I guess the extra warnings disappear because there are only one
conversion now.
> DROP TABLE t1;
> create table t1 (a int,b int,key (b),key (a),key (b,a));
> insert into t1(a,b) values (1,2),(3,4),(5,6),(7,8);
>
> === modified file 'sql/item.cc'
> --- a/sql/item.cc 2009-07-23 12:21:41 +0000
> +++ b/sql/item.cc 2009-09-19 17:55:58 +0000
> @@ -376,7 +376,8 @@ int Item::save_str_value_in_field(Field
> Item::Item():
> is_expensive_cache(-1), rsize(0), name(0), orig_name(0),
name_length(0),
> fixed(0), is_autogenerated_name(TRUE),
> - collation(&my_charset_bin, DERIVATION_COERCIBLE)
> + collation(&my_charset_bin, DERIVATION_COERCIBLE),
> + uncacheable(0)
> {
> marker= 0;
> maybe_null=null_value=with_sum_func=unsigned_flag=0;
> @@ -7136,17 +7137,22 @@ bool field_is_equal_to_item(Field *field
>
> Item_cache* Item_cache::get_cache(const Item *item)
> {
> - switch (item->result_type()) {
> + return get_cache(item->result_type(), item, FALSE);
> +}
> +
> +Item_cache* Item_cache::get_cache(const Item_result type, const Item
*item, bool late_caching)
> +{
You could consider using a default parameter here, but I am not quite sure
whether that is kosher in MySQL.
...
> === modified file 'sql/item.h'
> --- a/sql/item.h 2009-06-24 17:00:59 +0000
> +++ b/sql/item.h 2009-09-19 17:55:58 +0000
> @@ -532,6 +532,7 @@ public:
> of its arguments is or
contains a
> subselect. Computed by
fix_fields. */
> Item_result cmp_context; /* Comparison context */
> + bool uncacheable;
A (doxygen) comment explaining this field would be good.
I think you should consider making this field private.
> // alloc & destruct is done as start of select using sql_alloc
> Item();
> /*
> @@ -2995,14 +2996,24 @@ protected:
> Field *cached_field;
> enum enum_field_types cached_field_type;
> public:
> + bool late_caching;
I think you should add a comment here explaining the late caching
mechanism. I think you should also consider making this field private.
> Item_cache():
> - example(0), used_table_map(0), cached_field(0),
cached_field_type(MYSQL_TYPE_STRING)
> + example(0), used_table_map(0), cached_field(0),
cached_field_type(MYSQL_TYPE_STRING),
> + late_caching(0)
> + {
> + fixed= 1;
> + null_value= 1;
> + }
> + Item_cache(bool late_caching_arg):
> + example(0), used_table_map(0), cached_field(0),
cached_field_type(MYSQL_TYPE_STRING),
> + late_caching(late_caching_arg)
> {
> fixed= 1;
> null_value= 1;
> }
A default argument could also be used here.
> Item_cache(enum_field_types field_type_arg):
> - example(0), used_table_map(0), cached_field(0),
cached_field_type(field_type_arg)
> + example(0), used_table_map(0), cached_field(0),
cached_field_type(field_type_arg),
> + late_caching(0)
> {
> fixed= 1;
> null_value= 1;
> @@ -3026,6 +3037,8 @@ public:
> enum Type type() const { return CACHE_ITEM; }
> enum_field_types field_type() const { return cached_field_type; }
> static Item_cache* get_cache(const Item *item);
> + static Item_cache* get_cache(const Item_result type, const Item* item,
> + bool late_caching);
> table_map used_tables() const { return used_table_map; }
> virtual void keep_array() {}
> virtual void print(String *str, enum_query_type query_type);
> @@ -3037,6 +3050,18 @@ public:
> {
> return this == item;
> }
> + inline bool check_late_caching(Item *item)
> + {
> + if (!late_caching)
> + return FALSE;
> + if (item)
> + example= item;
> + else {
> + late_caching= FALSE;
> + store(example);
> + }
> + return TRUE;
> + }
> };
It seems like this method is used for two different purposes, both for
accessing cached values and for storing them. Generally, I do not
think functions with quite different behavior based on the values of
parameters is a good idea. Splitting this into two, would also give
opportunities for more descriptive names. (BTW, placement of curly
braces for else-block is not according to coding standard.)
Maybe something like this could be done:
inline bool init_late_caching(Item *item)
{
if (!late_caching)
return FALSE;
example= item;
return TRUE;
}
inline void cache_value()
{
late_caching= FALSE;
store(example);
}
Item_cache_xxx::store(...)
{
if (init_late_caching(item))
return ...;
...
}
Item_cache_xxx::val_xxx(...)
{
if (late_caching)
cache_value();
...
}
>
>
> @@ -3046,13 +3071,24 @@ protected:
> longlong value;
> public:
> Item_cache_int(): Item_cache(), value(0) {}
> + Item_cache_int(bool late_caching_arg):
Item_cache(late_caching_arg), value(0) {}
> Item_cache_int(enum_field_types field_type_arg):
> Item_cache(field_type_arg), value(0) {}
>
> void store(Item *item);
> void store(Item *item, longlong val_arg);
> - double val_real() { DBUG_ASSERT(fixed == 1); return (double) value; }
> - longlong val_int() { DBUG_ASSERT(fixed == 1); return value; }
> + double val_real()
> + {
> + DBUG_ASSERT(fixed == 1);
> + check_late_caching(NULL);
> + return (double) value;
> + }
> + longlong val_int()
> + {
> + DBUG_ASSERT(fixed == 1);
> + check_late_caching(NULL);
> + return value;
> + }
Maybe the above functions should be moved to the .cc-file when they
are not one-liners any more. This has the added value that all
val_xxx implementations can be co-located.
> String* val_str(String *str);
> my_decimal *val_decimal(my_decimal *);
> enum Item_result result_type() const { return INT_RESULT; }
> @@ -3065,9 +3101,15 @@ class Item_cache_real: public Item_cache
> double value;
> public:
> Item_cache_real(): Item_cache(), value(0) {}
> + Item_cache_real(bool late_caching_arg):
Item_cache(late_caching_arg), value(0) {}
>
> void store(Item *item);
> - double val_real() { DBUG_ASSERT(fixed == 1); return value; }
> + double val_real()
> + {
> + DBUG_ASSERT(fixed == 1);
> + check_late_caching(NULL);
> + return value;
> + }
> longlong val_int();
> String* val_str(String *str);
> my_decimal *val_decimal(my_decimal *);
> @@ -3081,6 +3123,7 @@ protected:
> my_decimal decimal_value;
> public:
> Item_cache_decimal(): Item_cache() {}
> + Item_cache_decimal(bool late_caching_arg):
Item_cache(late_caching_arg) {}
>
> void store(Item *item);
> double val_real();
> @@ -3105,10 +3148,22 @@ public:
> MYSQL_TYPE_VARCHAR &&
> !((const Item_field *) item)->field->has_charset())
> {}
> + Item_cache_str(const Item *item, bool late_caching_arg) :
> + Item_cache(late_caching_arg), value(0),
> + is_varbinary(item->type() == FIELD_ITEM &&
> + ((const Item_field *) item)->field->type() ==
> + MYSQL_TYPE_VARCHAR &&
> + !((const Item_field *) item)->field->has_charset())
> + {}
You are duplicating some complex logic here, and duplication is the
root of all evil ;-) Making late_caching_arg have a default value is
one of several ways to avoid this.
> void store(Item *item);
> double val_real();
> longlong val_int();
> - String* val_str(String *) { DBUG_ASSERT(fixed == 1); return value; }
> + String* val_str(String *)
> + {
> + DBUG_ASSERT(fixed == 1);
> + check_late_caching(NULL);
> + return value;
> + }
> my_decimal *val_decimal(my_decimal *);
> enum Item_result result_type() const { return STRING_RESULT; }
> CHARSET_INFO *charset() const { return value->charset(); };
> @@ -3123,6 +3178,9 @@ class Item_cache_row: public Item_cache
> public:
> Item_cache_row()
> :Item_cache(), values(0), item_count(2), save_array(0) {}
> + Item_cache_row(bool late_caching_arg)
> + :Item_cache(late_caching_arg), values(0), item_count(2),
> + save_array(0) {}
>
> /*
> 'allocate' used only in row transformer, to preallocate space
for row
>
> === modified file 'sql/item_cmpfunc.cc'
> --- a/sql/item_cmpfunc.cc 2009-07-03 10:39:01 +0000
> +++ b/sql/item_cmpfunc.cc 2009-09-19 17:55:58 +0000
> @@ -524,6 +524,13 @@ void Item_bool_func2::fix_length_and_dec
> set_cmp_func();
> }
>
> +void Item_bool_func2::set_cmp_func()
> +{
> + /* Don't need cache for context analysis. */
> + if (!current_thd->is_context_analysis_only())
> + tmp_arg[0]->uncacheable= tmp_arg[1]->uncacheable= TRUE;
The comment and the name of the method called seems contradictory.
From looking at the if-statement, it seems like the comment should be:
"If not doing only context_analysis, disable caching", but maybe
is_context_analysis_only is just another bad name ...
> + cmp.set_cmp_func(this, tmp_arg, tmp_arg+1);
> +}
>
> int Arg_comparator::set_compare_func(Item_bool_func2 *item,
Item_result type)
> {
> @@ -847,6 +854,7 @@ int Arg_comparator::set_cmp_func(Item_bo
> Item_result type)
> {
> ulonglong const_value= (ulonglong)-1;
> + Item_cache *cache;
Note that variables with same name is created locally in statements
blocks within this method. While quite legal, it may create some
confusion. Maybe yours should be local to?
> a= a1;
> b= a2;
>
> @@ -906,6 +914,28 @@ int Arg_comparator::set_cmp_func(Item_bo
> return 1;
> }
>
> + {
> + /*
> + When one of items being compared is a constant and its type
> + differs from comparison type then cache such item to avoid
> + type conversion of this constant on each evaluation.
> + */
> + if ((*a)->const_item() && !(*a)->uncacheable &&
> + type != (*a)->result_type())
> + {
> + cache= Item_cache::get_cache(type, *a, TRUE);
> + cache->store(*a);
> + a_cache= cache;
> + a= (Item **)&a_cache;
> + } else if ((*b)->const_item() && !(*b)->uncacheable &&
> + type != (*b)->result_type())
> + {
> + cache= Item_cache::get_cache(type, *b, TRUE);
> + cache->store(*b);
> + b_cache= cache;
> + b= (Item **)&b_cache;
> + }
> + }
Quite some code duplication here, too. You could make a helper
function to avoid that. For example, something like this:
void cache_converted_constant(Item **value, Item **cached_value,
Item_result type)
{
if (!value->const_item() || value->uncacheable ||
type == value->result_type())
return;
Item_cache *cache= Item_cache::get_cache(type, value, TRUE);
cache->store(value);
*cached_value= cache;
*cache= *cached_value;
}
And used like this:
cache_converted_constant(a, &a_cache, type);
cache_converted_constant(b, &b_cache, type);
> return set_compare_func(owner_arg, type);
> }
>
> @@ -4465,6 +4495,11 @@ Item_func::optimize_type Item_func_like:
> bool Item_func_like::fix_fields(THD *thd, Item **ref)
> {
> DBUG_ASSERT(fixed == 0);
> +
> + /* LIKE doesn't compare arguments directly, thus no reason to
cache them. */
> + for (int i= 0; i < arg_count; i++)
> + args[i]->uncacheable= TRUE;
> +
> if (Item_bool_func2::fix_fields(thd, ref) ||
> escape_item->fix_fields(thd, &escape_item))
> return TRUE;
>
> === modified file 'sql/item_cmpfunc.h'
> --- a/sql/item_cmpfunc.h 2009-06-22 09:36:50 +0000
> +++ b/sql/item_cmpfunc.h 2009-09-19 17:55:58 +0000
> @@ -335,10 +335,7 @@ public:
> Item_bool_func2(Item *a,Item *b)
> :Item_int_func(a,b), cmp(tmp_arg, tmp_arg+1),
abort_on_null(FALSE) {}
> void fix_length_and_dec();
> - void set_cmp_func()
> - {
> - cmp.set_cmp_func(this, tmp_arg, tmp_arg+1);
> - }
> + void set_cmp_func();
> optimize_type select_optimize() const { return OPTIMIZE_OP; }
> virtual enum Functype rev_functype() const { return UNKNOWN_FUNC; }
> bool have_rev_func() const { return rev_functype() != UNKNOWN_FUNC; }
>
> === modified file 'sql/item_xmlfunc.cc'
> --- a/sql/item_xmlfunc.cc 2009-07-28 14:16:37 +0000
> +++ b/sql/item_xmlfunc.cc 2009-09-19 17:55:58 +0000
> @@ -952,6 +952,8 @@ static Item *create_comparator(MY_XPATH
> */
>
> Item *fake= new Item_string("", 0, xpath->cs);
> + /* Don't cache fake because its value will be changed during
comparison.*/
> + fake->uncacheable= TRUE;
> Item_nodeset_func *nodeset;
> Item *scalar, *comp;
> if (a->type() == Item::XPATH_NODESET)
>
>
>
> ------------------------------------------------------------------------
>
>