Hi Ole,
The idea of the fix is right. The optimization was done at a wrong place thus
led to a bug, But the implementation could be better.
Please see comment below.
Regards, Evgen.
On 12/03/10 17:45, Ole John Aske wrote:
> #At file:///net/fimafeng09/export/home/tmp/oleja/mysql/mysql-5.1/ based on
> revid:georgi.kodinov@stripped
>
> 3477 Ole John Aske 2010-12-03
> Alternative fix for bug#57030: 'BETWEEN' evaluation is incorrect
>
> This is a more extensive, but IMHO a more correct fix for handling BETWEEN
> optimization.
> It introduce some refactoring where a BETWEEN condition is rewritten
> to a 'GE AND LE' condition inside add_key_fields() and optimized as such.
>
> The special condition 'field' BETWEEN<c1> and<c1> is still
> detected and rewritten
> to 'field' EQ<c1>.
>
> ALternative (shorter and more hacky) fix is at:
> http://lists.mysql.com/commits/125940
>
> ..................
>
> Root cause for this bug is that the optimizer try to detect& optimize the
> special case:
> '<field> BETWEEN c1 AND c1' and handle this as the condition
> '<field> = c1'
>
> This was implemented inside add_key_field(.. *field, *value[]...) which
> assumed
> field to refer key Field, and value[] to refer a [low...high] constant pair.
> value[0] and value[1] was then compared for equality.
>
> In a 'normal' BETWEEN condition of the form '<field> BETWEEN val1 and
> val2' the
> BETWEEN operation is represented with an argementlist containing the
> values [<field>, val1, val2] - add_key_field() is then called with
> parameters field=<field>, *value=val1.
>
> However, if the BETWEEN predicate specified:
>
> 1) '<const1> BETWEEN<const2> AND<field>
>
> the 'field' and 'value' arguments to add_key_field() had to be swapped. This
> was implemented
> by trying the cheat add_key_field() to handle it like:
>
> 2) '<const1> GE<const2> AND<const1> LE<field>'
>
> As we didn't really replace the BETWEEN operation with 'ge' and 'le',
> add_key_field() still handled it as a 'BETWEEN' and compared the (swapped)
> arguments<const1> and<const2> for equality. If they was equal,
> the
> condition 1) was incorrectly 'optimized' to:
>
> 3) '<field> EQ<const1>'
>
> This fix replace the BETWEEN with a 'GE AND LE' pair and let add_key_field()
> extract
> any key equalities / ranges from from this transformed condition.
>
> modified:
> mysql-test/r/range.result
> mysql-test/t/range.test
> sql/sql_select.cc
> === modified file 'mysql-test/r/range.result'
> --- a/mysql-test/r/range.result 2010-08-24 15:51:32 +0000
> +++ b/mysql-test/r/range.result 2010-12-03 14:45:55 +0000
> @@ -1666,4 +1666,91 @@ c_key c_notkey
> 1 1
> 3 3
> DROP TABLE t1;
> +#
> +# Bug #57030: 'BETWEEN' evaluation is incorrect
> +#
> +CREATE TABLE t1(pk INT PRIMARY KEY, i4 INT);
> +CREATE UNIQUE INDEX i4_uq ON t1(i4);
> +INSERT INTO t1 VALUES (1,10), (2,20), (3,30);
> +EXPLAIN
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND 10;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 const i4_uq i4_uq 5 const 1
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND 10;
> +pk i4
> +1 10
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN i4 AND i4;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 range i4_uq i4_uq 5 NULL 1 Using where
> +SELECT * FROM t1 WHERE 10 BETWEEN i4 AND i4;
> +pk i4
> +1 10
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN 10 AND i4;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 range i4_uq i4_uq 5 NULL 3 Using where
> +SELECT * FROM t1 WHERE 10 BETWEEN 10 AND i4;
> +pk i4
> +1 10
> +2 20
> +3 30
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN i4 AND 10;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 range i4_uq i4_uq 5 NULL 1 Using where
> +SELECT * FROM t1 WHERE 10 BETWEEN i4 AND 10;
> +pk i4
> +1 10
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN 10 AND 10;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 ALL NULL NULL NULL NULL 3
> +SELECT * FROM t1 WHERE 10 BETWEEN 10 AND 10;
> +pk i4
> +1 10
> +2 20
> +3 30
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN 11 AND 11;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
> +SELECT * FROM t1 WHERE 10 BETWEEN 11 AND 11;
> +pk i4
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN 100 AND 0;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
> +SELECT * FROM t1 WHERE 10 BETWEEN 100 AND 0;
> +pk i4
> +EXPLAIN
> +SELECT * FROM t1 WHERE i4 BETWEEN 100 AND 0;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading
> const tables
> +SELECT * FROM t1 WHERE i4 BETWEEN 100 AND 0;
> +pk i4
> +EXPLAIN
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND 99999999999999999;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 range i4_uq i4_uq 5 NULL 2 Using where
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND 99999999999999999;
> +pk i4
> +1 10
> +2 20
> +3 30
> +EXPLAIN
> +SELECT * FROM t1 WHERE i4 BETWEEN 999999999999999 AND 30;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading
> const tables
> +SELECT * FROM t1 WHERE i4 BETWEEN 999999999999999 AND 30;
> +pk i4
> +EXPLAIN
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND '20';
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 range i4_uq i4_uq 5 NULL 1 Using where
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND '20';
> +pk i4
> +1 10
> +2 20
> +DROP TABLE t1;
> End of 5.1 tests
>
> === modified file 'mysql-test/t/range.test'
> --- a/mysql-test/t/range.test 2010-08-24 15:51:32 +0000
> +++ b/mysql-test/t/range.test 2010-12-03 14:45:55 +0000
> @@ -1325,4 +1325,63 @@ SELECT * FROM t1 WHERE 2 NOT BETWEEN c_n
>
> DROP TABLE t1;
>
> +--echo #
> +--echo # Bug #57030: 'BETWEEN' evaluation is incorrect
> +--echo #
> +
> +# Test some BETWEEN predicates which does *not* follow the
> +# 'normal' pattern of<field> BETWEEN<low const> AND<high const>
> +
> +CREATE TABLE t1(pk INT PRIMARY KEY, i4 INT);
> +CREATE UNIQUE INDEX i4_uq ON t1(i4);
> +
> +INSERT INTO t1 VALUES (1,10), (2,20), (3,30);
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND 10;
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND 10;
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN i4 AND i4;
> +SELECT * FROM t1 WHERE 10 BETWEEN i4 AND i4;
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN 10 AND i4;
> +SELECT * FROM t1 WHERE 10 BETWEEN 10 AND i4;
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN i4 AND 10;
> +SELECT * FROM t1 WHERE 10 BETWEEN i4 AND 10;
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN 10 AND 10;
> +SELECT * FROM t1 WHERE 10 BETWEEN 10 AND 10;
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN 11 AND 11;
> +SELECT * FROM t1 WHERE 10 BETWEEN 11 AND 11;
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE 10 BETWEEN 100 AND 0;
> +SELECT * FROM t1 WHERE 10 BETWEEN 100 AND 0;
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE i4 BETWEEN 100 AND 0;
> +SELECT * FROM t1 WHERE i4 BETWEEN 100 AND 0;
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND 99999999999999999;
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND 99999999999999999;
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE i4 BETWEEN 999999999999999 AND 30;
> +SELECT * FROM t1 WHERE i4 BETWEEN 999999999999999 AND 30;
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND '20';
> +SELECT * FROM t1 WHERE i4 BETWEEN 10 AND '20';
Please add also a query with 'WHERE field1 BETWEEN field2 and field2'.
It in an appropriate join ref access should be used for it.
> +
> +
> +DROP TABLE t1;
> +
> --echo End of 5.1 tests
>
> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc 2010-10-29 08:23:06 +0000
> +++ b/sql/sql_select.cc 2010-12-03 14:45:55 +0000
> @@ -3348,26 +3348,7 @@ add_key_field(KEY_FIELD **key_fields,uin
> eq_func is NEVER true when num_values> 1
> */
> if (!eq_func)
> - {
> - /*
> - Additional optimization: if we're processing
> - "t.key BETWEEN c1 AND c1" then proceed as if we were processing
> - "t.key = c1".
> - TODO: This is a very limited fix. A more generic fix is possible.
> - There are 2 options:
> - A) Make equality propagation code be able to handle BETWEEN
> - (including cases like t1.key BETWEEN t2.key AND t3.key)
> - B) Make range optimizer to infer additional "t.key = c" equalities
> - and use them in equality propagation process (see details in
> - OptimizerKBAndTodo)
> - */
> - if ((cond->functype() != Item_func::BETWEEN) ||
> - ((Item_func_between*) cond)->negated ||
> - !value[0]->eq(value[1], field->binary()))
> - return;
> - eq_func= TRUE;
> - }
> -
> + return;
> if (field->result_type() == STRING_RESULT)
> {
> if ((*value)->result_type() != STRING_RESULT)
> @@ -3563,9 +3544,49 @@ add_key_fields(JOIN *join, KEY_FIELD **k
> case Item_func::OPTIMIZE_KEY:
> {
> Item **values;
> - // BETWEEN, IN, NE
> - if (is_local_field (cond_func->key_item())&&
> - !(cond_func->used_tables()& OUTER_REF_TABLE_BIT))
> + /*
> + 'a BETWEEN low AND high' is transformed to
> + 'a>= low AND a<= high' before further key optimization:
It's not a transformation. Here is build just a list of possible keys.
> + */
> + if (cond_func->functype() == Item_func::BETWEEN)
> + {
> + values= cond_func->arguments();
> + /*
> + Additional optimization: If 'low = high', transform to
> + "t.key = low".
> + */
> + if (!((Item_func_between*)cond_func)->negated&&
> + values[0]->real_item()->type() == Item::FIELD_ITEM&&
> + values[1]->const_item()&&
Values don't have to be const, they just have to be equal. 'field1 BETWEEN
field2 and field2' also should be handled.
> + values[1]->eq(values[2],
> ((Item_field*)values[0]->real_item())->field->binary()))
> + {
> + Item_bool_func *eq= new
> Item_equal(values[1],((Item_field*)values[0]->real_item()));
> + if (eq)
> + {
> + add_key_fields(join, key_fields, and_level, eq, usable_tables,
> + sargables);
> + }
Instead of creating a new item and passing it better to call
add_key_equal_fields directly.
In this particular case the call would be:
add_key_equal_fields(key_fields, *and_level, cond_func,
(Item_field*) (cond_func->key_item()->real_item()),
0, values + 1,
cond_func->argument_count()-1,
usable_tables, sargables);
> + }
> + else
> + {
> + Item_bool_func2 *ge= new Item_func_ge(values[0],values[1]);
> + if (ge)
> + {
> + add_key_fields(join, key_fields, and_level, ge, usable_tables,
> + sargables);
> + }
> + Item_bool_func2 *le= new Item_func_le(values[0],values[2]);
> + if (le)
> + {
> + add_key_fields(join, key_fields, and_level, le, usable_tables,
> + sargables);
> + }
Same as above.
> + }
> + }
> +
> + // IN, NE
> + else if (is_local_field (cond_func->key_item())&&
> + !(cond_func->used_tables()& OUTER_REF_TABLE_BIT))
> {
> values= cond_func->arguments()+1;
> if (cond_func->functype() == Item_func::NE_FUNC&&
> @@ -3579,21 +3600,6 @@ add_key_fields(JOIN *join, KEY_FIELD **k
> cond_func->argument_count()-1,
> usable_tables, sargables);
> }
> - if (cond_func->functype() == Item_func::BETWEEN)
> - {
> - values= cond_func->arguments();
> - for (uint i= 1 ; i< cond_func->argument_count() ; i++)
> - {
> - Item_field *field_item;
> - if (is_local_field (cond_func->arguments()[i]))
> - {
> - field_item= (Item_field *)
> (cond_func->arguments()[i]->real_item());
> - add_key_equal_fields(key_fields, *and_level, cond_func,
> - field_item, 0, values, 1, usable_tables,
> - sargables);
> - }
> - }
> - }
> break;
> }
> case Item_func::OPTIMIZE_OP:
>