List:Commits« Previous MessageNext Message »
From:Alexander Barkov Date:April 1 2011 11:42am
Subject:Re: bzr commit into mysql-trunk branch (alexander.barkov:3302) Bug#58329
Bug#11765369
View as plain text  
Hi Guilhem,

Thanks for review. Please find comments inline:


On 04/01/2011 11:19 AM, Guilhem Bichot wrote:
> Hello Bar,
>
> Alexander Barkov a écrit, Le 18.03.2011 13:46:
>> #At file:///home/bar/mysql-bzr/mysql-trunk.b58329/ based on
>> revid:alexander.nozdrin@stripped
>>
>> 3302 Alexander Barkov 2011-03-18
>> Bug#11765369 - Bug#58329: BETWEEN DOES NOT USE INDEXES FOR DATE OR
>> DATETIME FIELDS FOR UCS2+ Problem:
>> WHERE date_column = 'ucs2 literal'
>> was rewritten to
>> WHERE CONVERT(date_column USING ucs2) = 'ucs2 literal'
>> so index on date_column did not work any more.
>> Fix: rewrite to
>> WHERE date_column = CONVERT('ucs2 literal' USING latin1)
>
> Without the patch: why did we choose to convert date_column instead of
> the literal? There was probably a rule somewhere?

Yes. We choose the collation of an operation using special rules,
which are very close to the SQL Standard collation aggregation rules,
with some MySQL extensions.

Please have a look into item.cc, into this method:

bool DTCollation::aggregate(DTCollation &dt, uint flags)

and the comments. The comments should give the idea of how it works.


>
> Why is the bug specific of UCS2?

The same problem happens with UTF16 and UTF32.

What happens in DTCollation::aggregate() in this particular example is:
- both sides have the same coercibility
- *only* one of the sides is Unicode
So we allow automatic conversion of the non-Unicode side to the
character set of the Unicode side (UCS2 in this example).


Note, the problem does not happen with UTF8:
we optimize away the CONVERT() function, because:
- DATE/TIME columns use latin1 internally
- DATE/TIME column values are limited to MY_REPERTOIRE_ASCII
- latin1 and utf8 are compatible on the ASCII range

This optimization is done in agg_item_set_converter().
(have a look into its comments as well, if interested)


>> === modified file 'mysql-test/t/ctype_ucs.test'
>> --- a/mysql-test/t/ctype_ucs.test 2011-03-04 14:59:32 +0000
>> +++ b/mysql-test/t/ctype_ucs.test 2011-03-18 12:43:19 +0000
>> @@ -779,5 +779,15 @@ DO IFNULL(CHAR(NULL USING ucs2), '');
>> DO CAST(CONVERT('' USING ucs2) AS UNSIGNED);
>>
>> --echo #
>> +--echo # Bug#11765369 - 58329: BETWEEN DOES NOT USE INDEXES FOR DATE
>> OR DATETIME FIELDS FOR UCS2+
>> +--echo #
>> +SET NAMES utf8, character_set_connection=ucs2;
>> +CREATE TABLE t1 (id INT DEFAULT NULL, date_column DATE DEFAULT NULL,
>> KEY (date_column)) engine=myisam;
>
> I'd delete "=engine=myisam"; so that the test can also run in InnoDB, in
> case one starts MTR with --mysqld=--default-storage-engine=innodb

I agree.

>
>> +INSERT INTO t1 VALUES (1,'2010-09-01'), (2,'2010-10-01');
>> +EXPLAIN EXTENDED SELECT * FROM t1 WHERE date_column= '2010-09-01';
>> +DROP TABLE t1;
>> +SET NAMES latin1;
>> +
>> +--echo #
>> --echo # End of 5.6 tests
>> --echo #
>>
>> === modified file 'sql/item_cmpfunc.cc'
>> --- a/sql/item_cmpfunc.cc 2011-03-17 11:33:17 +0000
>> +++ b/sql/item_cmpfunc.cc 2011-03-18 12:43:19 +0000
>> @@ -484,10 +484,31 @@ void Item_bool_func2::fix_length_and_dec
>> DTCollation coll;
>> if (args[0]->result_type() == STRING_RESULT &&
>> - args[1]->result_type() == STRING_RESULT &&
>> - agg_arg_charsets_for_comparison(coll, args, 2))
>> - return;
>> - + args[1]->result_type() == STRING_RESULT)
>> + {
>> + if (args[0]->is_datetime() || args[1]->is_datetime())
>> + {
>> + /*
>> + For DATE/TIME we always convert to my_charset_numeric (latin1),
>> + to make this:
>> + WHERE date_column = 'ucs2 literal';
>> +
>> + rewrite to
>> + WHERE date_column = CONVERT('ucs2 literal' USING latin1);
>> +
>> + instead of + WHERE CONVERT(date_column USING ucs2) = 'ucs2 literal';
>> +
>> + This makes it possible to use index on date_column for optimization.
>> + */
>> + coll.set(&my_charset_numeric);
>> + if (agg_item_set_converter(coll, func_name(), args, 2, 0, 1))
>> + return;
>> + }
>> + else if (agg_arg_charsets_for_comparison(coll, args, 2))
>> + return;
>> + }
>
> I worry to have one agg* function called for DATE/TIME and another one
> called for other types,

The agg* functions are called only for STRING_RESULT types.
Never for the other result_type codes.

 > i.e. I worry about "if/else".

The source of the problem is that DATE/TIME data types do not
have their own result_type code - they pretend to be STRING_RESULT.
In some cases we need to distinguish between "normal" strings
and "fake" strings, like DATE/TIME. This is one of the cases.

The "real" solution is to introduce separate type codes
for DATE/TIME data types, but this is too much for the scope
of this bug fix.

With the current code, I don't have a good idea how
to do this without "if/else".

We don't need to convert to Unicode in this case.
We need to do the other way around here:
convert the Unicode (i.e. UCS2) side to the character
set of DATE column, which is latin1.

(strictly speaking, it's pure ASCII, but we use my_charset_latin1
for ASCII internally)


> I feel that this
> could introduce DATE/TIME specific code paths, and thus bugs in the future.
> It looks like agg_arg_charsets_for_comparison() calls
> agg_item_set_converter(), so can't we do this instead, to avoid if/else:
>
>  > if (args[0]->result_type() == STRING_RESULT &&
>  > + args[1]->result_type() == STRING_RESULT)
>  > + {
>  > + if (args[0]->is_datetime() || args[1]->is_datetime())
>  > + {
>  > + /*
>  > + For DATE/TIME we always convert to my_charset_numeric (latin1),
>  > + to make this:
>  > + WHERE date_column = 'ucs2 literal';
>  > +
>  > + rewrite to
>  > + WHERE date_column = CONVERT('ucs2 literal' USING latin1);
>  > +
>  > + instead of
>  > + WHERE CONVERT(date_column USING ucs2) = 'ucs2 literal';
>  > +
>  > + This makes it possible to use index on date_column for optimization.
>  > + */
>  > + coll.set(&my_charset_numeric);
>  > + }
>  > + if (agg_arg_charsets_for_comparison(coll, args, 2))
>  > + return;
>  > + }


It will not work. The original value of "coll" does not matter.

agg_arg_charsets_for_comparison() will reset "coll" to UCS2.


It does two things:

- calculates the collation of the operation using
   agg_item_collations()

- install CONVERT() function for the arguments which need it,
   using agg_item_set_converter()


>
> ?
Thread
bzr commit into mysql-trunk branch (alexander.barkov:3302) Bug#58329Bug#11765369Alexander Barkov18 Mar
  • Re: bzr commit into mysql-trunk branch (alexander.barkov:3302) Bug#58329Bug#11765369Guilhem Bichot1 Apr
    • Re: bzr commit into mysql-trunk branch (alexander.barkov:3302) Bug#58329Bug#11765369Alexander Barkov1 Apr
      • Re: bzr commit into mysql-trunk branch (alexander.barkov:3302) Bug#58329Bug#11765369Guilhem Bichot4 Apr