Hello,
Alexander Barkov a écrit, Le 01.04.2011 13:42:
> Hi Guilhem,
>
> Thanks for review. Please find comments inline:
>
>
> On 04/01/2011 11:19 AM, Guilhem Bichot wrote:
>> Hello Bar,
>>
>> Alexander Barkov a écrit, Le 18.03.2011 13:46:
>>> #At file:///home/bar/mysql-bzr/mysql-trunk.b58329/ based on
>>> revid:alexander.nozdrin@stripped
>>>
>>> 3302 Alexander Barkov 2011-03-18
>>> Bug#11765369 - Bug#58329: BETWEEN DOES NOT USE INDEXES FOR DATE OR
>>> DATETIME FIELDS FOR UCS2+ Problem:
>>> WHERE date_column = 'ucs2 literal'
>>> was rewritten to
>>> WHERE CONVERT(date_column USING ucs2) = 'ucs2 literal'
>>> so index on date_column did not work any more.
>>> Fix: rewrite to
>>> WHERE date_column = CONVERT('ucs2 literal' USING latin1)
>>
>> Without the patch: why did we choose to convert date_column instead of
>> the literal? There was probably a rule somewhere?
>
> Yes. We choose the collation of an operation using special rules,
> which are very close to the SQL Standard collation aggregation rules,
> with some MySQL extensions.
>
> Please have a look into item.cc, into this method:
>
> bool DTCollation::aggregate(DTCollation &dt, uint flags)
>
> and the comments. The comments should give the idea of how it works.
yes they do
>> Why is the bug specific of UCS2?
>
> The same problem happens with UTF16 and UTF32.
>
> What happens in DTCollation::aggregate() in this particular example is:
> - both sides have the same coercibility
> - *only* one of the sides is Unicode
ok and Unicode is superset.
> So we allow automatic conversion of the non-Unicode side to the
> character set of the Unicode side (UCS2 in this example).
>
>
> Note, the problem does not happen with UTF8:
> we optimize away the CONVERT() function, because:
> - DATE/TIME columns use latin1 internally
> - DATE/TIME column values are limited to MY_REPERTOIRE_ASCII
> - latin1 and utf8 are compatible on the ASCII range
>
> This optimization is done in agg_item_set_converter().
> (have a look into its comments as well, if interested)
done. Could you please add some doxygen comment to agg_item_set_converter()?
>>> === modified file 'sql/item_cmpfunc.cc'
>>> --- a/sql/item_cmpfunc.cc 2011-03-17 11:33:17 +0000
>>> +++ b/sql/item_cmpfunc.cc 2011-03-18 12:43:19 +0000
>>> @@ -484,10 +484,31 @@ void Item_bool_func2::fix_length_and_dec
>>> DTCollation coll;
>>> if (args[0]->result_type() == STRING_RESULT &&
>>> - args[1]->result_type() == STRING_RESULT &&
>>> - agg_arg_charsets_for_comparison(coll, args, 2))
>>> - return;
>>> - + args[1]->result_type() == STRING_RESULT)
>>> + {
>>> + if (args[0]->is_datetime() || args[1]->is_datetime())
>>> + {
>>> + /*
>>> + For DATE/TIME we always convert to my_charset_numeric (latin1),
>>> + to make this:
>>> + WHERE date_column = 'ucs2 literal';
>>> +
>>> + rewrite to
>>> + WHERE date_column = CONVERT('ucs2 literal' USING latin1);
>>> +
>>> + instead of + WHERE CONVERT(date_column USING ucs2) = 'ucs2 literal';
>>> +
>>> + This makes it possible to use index on date_column for optimization.
>>> + */
>>> + coll.set(&my_charset_numeric);
>>> + if (agg_item_set_converter(coll, func_name(), args, 2, 0, 1))
>>> + return;
>>> + }
>>> + else if (agg_arg_charsets_for_comparison(coll, args, 2))
>>> + return;
>>> + }
>>
>> I worry to have one agg* function called for DATE/TIME and another one
>> called for other types,
>
> The agg* functions are called only for STRING_RESULT types.
> Never for the other result_type codes.
ok. Rephrasing:
"I worry to have one agg* function called for DATE/TIME and another one
called for other STRING_RESULT".
> > i.e. I worry about "if/else".
>
> The source of the problem is that DATE/TIME data types do not
> have their own result_type code - they pretend to be STRING_RESULT.
> In some cases we need to distinguish between "normal" strings
> and "fake" strings, like DATE/TIME. This is one of the cases.
>
> The "real" solution is to introduce separate type codes
> for DATE/TIME data types, but this is too much for the scope
> of this bug fix.
>
> With the current code, I don't have a good idea how
> to do this without "if/else".
>
> We don't need to convert to Unicode in this case.
> We need to do the other way around here:
> convert the Unicode (i.e. UCS2) side to the character
> set of DATE column, which is latin1.
>
> (strictly speaking, it's pure ASCII, but we use my_charset_latin1
> for ASCII internally)
>
>
>> I feel that this
>> could introduce DATE/TIME specific code paths, and thus bugs in the
>> future.
>> It looks like agg_arg_charsets_for_comparison() calls
>> agg_item_set_converter(), so can't we do this instead, to avoid if/else:
>>
>> > if (args[0]->result_type() == STRING_RESULT &&
>> > + args[1]->result_type() == STRING_RESULT)
>> > + {
>> > + if (args[0]->is_datetime() || args[1]->is_datetime())
>> > + {
>> > + /*
>> > + For DATE/TIME we always convert to my_charset_numeric (latin1),
>> > + to make this:
>> > + WHERE date_column = 'ucs2 literal';
>> > +
>> > + rewrite to
>> > + WHERE date_column = CONVERT('ucs2 literal' USING latin1);
>> > +
>> > + instead of
>> > + WHERE CONVERT(date_column USING ucs2) = 'ucs2 literal';
>> > +
>> > + This makes it possible to use index on date_column for optimization.
>> > + */
>> > + coll.set(&my_charset_numeric);
>> > + }
>> > + if (agg_arg_charsets_for_comparison(coll, args, 2))
>> > + return;
>> > + }
>
>
> It will not work. The original value of "coll" does not matter.
>
> agg_arg_charsets_for_comparison() will reset "coll" to UCS2.
Could you please then document, in a doxygen comment for the proper
function, that "coll" is an _out_ parameter? I see at least
agg_item_collations() would deserve it, as it does:
c.set(something).
Callers of this function, if they also take a "coll" parameter, should
also mention in their doxygen comment that it's an _out_ parameter.
This is important, because different from agg_item_set_converter(),
which uses "coll" as an _in_ parameter. By the way that function doesn't
change "coll", so I would emphasize this by declaring a const:
agg_item_set_converter(const DTCollation &const, etc)
> It does two things:
>
> - calculates the collation of the operation using
> agg_item_collations()
>
> - install CONVERT() function for the arguments which need it,
> using agg_item_set_converter()
I understand your explanations (I think :-).
But there is something illogical. The current code (unpatched) looks like:
DTCollation coll;
if (args[0]->result_type() == STRING_RESULT &&
args[1]->result_type() == STRING_RESULT &&
agg_arg_charsets_for_comparison(coll, args, 2))
return;
args[0]->cmp_context= args[1]->cmp_context=
item_cmp_type(args[0]->result_type(), args[1]->result_type());
// Make a special case of compare with fields to get nicer DATE
comparisons
if (functype() == LIKE_FUNC) // Disable conversion in case of LIKE
function.
{
set_cmp_func();
return;
}
Two questions:
Q1) the code above says something like: "if those are two strings let's
aggregate the charsets for comparison of the strings". To me, this is
valid logic. It should apply the same for DATE fields. So I think that
the fix should be in the function which is responsible for saying how
two charsets aggregate: you wrote that agg_arg_charsets_for_comparison():
> does two things:
>
> - calculates the collation of the operation using
> agg_item_collations()
>
> - install CONVERT() function for the arguments which need it,
> using agg_item_set_converter()
so:
> It does two things:
>
> - calculates the collation of the operation using
> agg_item_collations()
here agg_item_collations() could realize that we're using DATEs and make
an exception (so it's this function which would have your bugfix),
> - install CONVERT() function for the arguments which need it,
> using agg_item_set_converter()
and so here it would naturally know what arguments need a CONVERT().
Reusing the logic of agg_arg_charsets_for_comparison() entirely.
In other words: it's the job of agg_item_collations() to say how
collations aggregate, so it's this function which needs a fix. Callers
of this function should not have to care that a DATE requires special
treatments.
I suspect that this way, we could have a more "unique execution path"
for DATE and other STRING_RESULT.
It also prepares nicely for when DATE is not a STRING_RESULT anymore: at
that time, maybe we will put in agg_item_collations() some virtual
function call based on type, which will implement the exception.
Now, maybe I'm dreaming and this is impossible. In that case, I'll say
"Bar knows his domain and future plans, his patch fixes the bug, so
trust him" and you can push your fix.
2) for LIKE_FUNC (see code above), we make a special case it seems. "="
is a simplified form of LIKE after all, can't we fix bug#58329 with the
same code as is used for LIKE_FUNC? If we can't, will the same bug as
bug#58329 remain with
"WHERE col_datetime LIKE ucs2_literal" (where in theory an index could
be usable too).
?