Hi Guilhem,
Thanks for review. Please find comments inline:
On 04/01/2011 11:19 AM, Guilhem Bichot wrote:
> Hello Bar,
>
> Alexander Barkov a écrit, Le 18.03.2011 13:46:
>> #At file:///home/bar/mysql-bzr/mysql-trunk.b58329/ based on
>> revid:alexander.nozdrin@stripped
>>
>> 3302 Alexander Barkov 2011-03-18
>> Bug#11765369 - Bug#58329: BETWEEN DOES NOT USE INDEXES FOR DATE OR
>> DATETIME FIELDS FOR UCS2+ Problem:
>> WHERE date_column = 'ucs2 literal'
>> was rewritten to
>> WHERE CONVERT(date_column USING ucs2) = 'ucs2 literal'
>> so index on date_column did not work any more.
>> Fix: rewrite to
>> WHERE date_column = CONVERT('ucs2 literal' USING latin1)
>
> Without the patch: why did we choose to convert date_column instead of
> the literal? There was probably a rule somewhere?
Yes. We choose the collation of an operation using special rules,
which are very close to the SQL Standard collation aggregation rules,
with some MySQL extensions.
Please have a look into item.cc, into this method:
bool DTCollation::aggregate(DTCollation &dt, uint flags)
and the comments. The comments should give the idea of how it works.
>
> Why is the bug specific of UCS2?
The same problem happens with UTF16 and UTF32.
What happens in DTCollation::aggregate() in this particular example is:
- both sides have the same coercibility
- *only* one of the sides is Unicode
So we allow automatic conversion of the non-Unicode side to the
character set of the Unicode side (UCS2 in this example).
Note, the problem does not happen with UTF8:
we optimize away the CONVERT() function, because:
- DATE/TIME columns use latin1 internally
- DATE/TIME column values are limited to MY_REPERTOIRE_ASCII
- latin1 and utf8 are compatible on the ASCII range
This optimization is done in agg_item_set_converter().
(have a look into its comments as well, if interested)
>> === modified file 'mysql-test/t/ctype_ucs.test'
>> --- a/mysql-test/t/ctype_ucs.test 2011-03-04 14:59:32 +0000
>> +++ b/mysql-test/t/ctype_ucs.test 2011-03-18 12:43:19 +0000
>> @@ -779,5 +779,15 @@ DO IFNULL(CHAR(NULL USING ucs2), '');
>> DO CAST(CONVERT('' USING ucs2) AS UNSIGNED);
>>
>> --echo #
>> +--echo # Bug#11765369 - 58329: BETWEEN DOES NOT USE INDEXES FOR DATE
>> OR DATETIME FIELDS FOR UCS2+
>> +--echo #
>> +SET NAMES utf8, character_set_connection=ucs2;
>> +CREATE TABLE t1 (id INT DEFAULT NULL, date_column DATE DEFAULT NULL,
>> KEY (date_column)) engine=myisam;
>
> I'd delete "=engine=myisam"; so that the test can also run in InnoDB, in
> case one starts MTR with --mysqld=--default-storage-engine=innodb
I agree.
>
>> +INSERT INTO t1 VALUES (1,'2010-09-01'), (2,'2010-10-01');
>> +EXPLAIN EXTENDED SELECT * FROM t1 WHERE date_column= '2010-09-01';
>> +DROP TABLE t1;
>> +SET NAMES latin1;
>> +
>> +--echo #
>> --echo # End of 5.6 tests
>> --echo #
>>
>> === modified file 'sql/item_cmpfunc.cc'
>> --- a/sql/item_cmpfunc.cc 2011-03-17 11:33:17 +0000
>> +++ b/sql/item_cmpfunc.cc 2011-03-18 12:43:19 +0000
>> @@ -484,10 +484,31 @@ void Item_bool_func2::fix_length_and_dec
>> DTCollation coll;
>> if (args[0]->result_type() == STRING_RESULT &&
>> - args[1]->result_type() == STRING_RESULT &&
>> - agg_arg_charsets_for_comparison(coll, args, 2))
>> - return;
>> - + args[1]->result_type() == STRING_RESULT)
>> + {
>> + if (args[0]->is_datetime() || args[1]->is_datetime())
>> + {
>> + /*
>> + For DATE/TIME we always convert to my_charset_numeric (latin1),
>> + to make this:
>> + WHERE date_column = 'ucs2 literal';
>> +
>> + rewrite to
>> + WHERE date_column = CONVERT('ucs2 literal' USING latin1);
>> +
>> + instead of + WHERE CONVERT(date_column USING ucs2) = 'ucs2 literal';
>> +
>> + This makes it possible to use index on date_column for optimization.
>> + */
>> + coll.set(&my_charset_numeric);
>> + if (agg_item_set_converter(coll, func_name(), args, 2, 0, 1))
>> + return;
>> + }
>> + else if (agg_arg_charsets_for_comparison(coll, args, 2))
>> + return;
>> + }
>
> I worry to have one agg* function called for DATE/TIME and another one
> called for other types,
The agg* functions are called only for STRING_RESULT types.
Never for the other result_type codes.
> i.e. I worry about "if/else".
The source of the problem is that DATE/TIME data types do not
have their own result_type code - they pretend to be STRING_RESULT.
In some cases we need to distinguish between "normal" strings
and "fake" strings, like DATE/TIME. This is one of the cases.
The "real" solution is to introduce separate type codes
for DATE/TIME data types, but this is too much for the scope
of this bug fix.
With the current code, I don't have a good idea how
to do this without "if/else".
We don't need to convert to Unicode in this case.
We need to do the other way around here:
convert the Unicode (i.e. UCS2) side to the character
set of DATE column, which is latin1.
(strictly speaking, it's pure ASCII, but we use my_charset_latin1
for ASCII internally)
> I feel that this
> could introduce DATE/TIME specific code paths, and thus bugs in the future.
> It looks like agg_arg_charsets_for_comparison() calls
> agg_item_set_converter(), so can't we do this instead, to avoid if/else:
>
> > if (args[0]->result_type() == STRING_RESULT &&
> > + args[1]->result_type() == STRING_RESULT)
> > + {
> > + if (args[0]->is_datetime() || args[1]->is_datetime())
> > + {
> > + /*
> > + For DATE/TIME we always convert to my_charset_numeric (latin1),
> > + to make this:
> > + WHERE date_column = 'ucs2 literal';
> > +
> > + rewrite to
> > + WHERE date_column = CONVERT('ucs2 literal' USING latin1);
> > +
> > + instead of
> > + WHERE CONVERT(date_column USING ucs2) = 'ucs2 literal';
> > +
> > + This makes it possible to use index on date_column for optimization.
> > + */
> > + coll.set(&my_charset_numeric);
> > + }
> > + if (agg_arg_charsets_for_comparison(coll, args, 2))
> > + return;
> > + }
It will not work. The original value of "coll" does not matter.
agg_arg_charsets_for_comparison() will reset "coll" to UCS2.
It does two things:
- calculates the collation of the operation using
agg_item_collations()
- install CONVERT() function for the arguments which need it,
using agg_item_set_converter()
>
> ?