Hello,
Alexander Barkov a écrit, Le 28.04.2010 21:42:
> Hi Guilhem,
>
> Thanks for review! Please see my comments below:
I realized that what I am reviewing could be called a "Bar code", which
is a funny pun.
Comments below.
> Guilhem Bichot wrote:
>> Hello,
>>
>> Alexander Barkov a écrit, Le 28.04.2010 14:13:
>>> #At file:///home/bar/mysql-bzr/mysql-trunk-bugfixing/ based on
>>> revid:dao-gang.qu@stripped
>>>
>>> 3025 Alexander Barkov 2010-04-28
>>> Bug#52849 [Com]: datetime index not work
>>> Problem: after introduction of "WL#2649 Number-to-string
>>> conversions"
>>> This query:
>>> SET NAMES cp850; -- Or any other non-latin1 ASCII-based
>>> character set
>>> SELECT * FROM t1
>>> WHERE datetime_column='2010-01-01 00:00:00'
>>> started to add extra character set conversion: SELECT *
>>> FROM t1
>>> WHERE CONVERT(datetime_column USING cp850)='2010-01-01
>>> 00:00:00';
>>> so index on DATETIME column was not used anymore.
>>> Fix:
>>> avoid convertion of NUMERIC/DATETIME items
>>> (i.e. those with derivation DERIVATION_NUMERIC).
>>> === modified file 'sql/item.cc'
>>> --- a/sql/item.cc 2010-04-19 08:27:46 +0000
>>> +++ b/sql/item.cc 2010-04-28 12:04:20 +0000
>>> @@ -1796,6 +1796,20 @@ bool agg_item_set_converter(DTCollation
>>> &dummy_offset))
>>> continue;
>>>
>>> + /*
>>> + No needs to add converter if an "arg" is NUMERIC or DATETIME
>>> + value (which is pure ASCII) and at the same time target
>>> DTCollation
>>> + is ASCII-compatible. For example, no needs to rewrite:
>>> + SELECT * FROM t1 WHERE datetime_field = '2010-01-01';
>>> + to
>>> + SELECT * FROM t1 WHERE CONVERT(datetime_field USING cs) =
>>> '2010-01-01';
>>> + */
>>> + if ((*arg)->collation.derivation == DERIVATION_NUMERIC &&
>>
>>> + (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII &&
>>
>> is this test needed? Isn't the first one (about derivations) enough?
>
> It is enough now. I have put some future assumptions already.
>
> Now we can rewrite as follows:
>
> if ((*arg)->collation.derivation == DERIVATION_NUMERIC &&
> !(coll.collation->state & MY_CS_NONASCII))
> {
> DBUG_ASSERT((*arg)->collation.repertoire == MY_REPERTOIRE_ASCII);
> DBUG_ASSERT(!(*arg)->collation.collation->state & MY_CS_NONASCII)));
> continue;
> }
>
>
> I.e. checking for repertoire is somewhat redundant at the moment,
> and checking for coll.collation->state is redundant too.
>
> But in fact these conditions test very different things.
> See explanation about three parts of DTCollation below.
>
> In theory, it is possible that derivation is DERIVATION_NUMERIC,
> but at the same time there's MY_CS_NONASCII flag, and at the same
> time repertoire is not MY_REPERTOIRE_ASCII. In practice, checking for
> DERIVATION_NUMERIC is enough for *now*. But may change quite soon.
>
>> What is a repertoire (ok it's a French word but what does it mean here
>> ;-)? What does MY_REPERTOIRE_ASCII mean?
>
> It means that the value can consist only of characters
> from Unicode range U+0000..U+007F.
>
> Note, a value can be of "tricky" character set, like UCS2,
> but at the same time its repertoire can be MY_REPERTOIRE_ASCII.
>
>
>> There exists MY_REPERTOIRE_NUMERIC, should it be used instead?
>
> Currently MY_REPERTOIRE_NUMERIC is just an alias for
> MY_REPERTOIRE_ASCII.
>
> In the future we can split MY_REPERTOIRE_ASCII into smaller parts, like:
> MY_REPERTOIRE_ASCII_CONTROLS
> MY_REPERTOIRE_ASCII_LETTERS
> MY_PEPERTOIRE_ASCII_PUNCTUATION
> MY_REPERTOIRE_ASCII_DIGITS
>
> and MY_REPERTOIRE_NUMERIC will be something different -
> a smaller subset of MY_REPERTOIRE_ASCII. This will allow
> to avoid "Illegal mix of collations" in some more cases.
>
> In the above condition it's only important that to check
> that repertoire is MY_REPERTOIRE_ASCII.
>
>>
>>> + !((*arg)->collation.collation->state & MY_CS_NONASCII)
> &&
>>
>> a member named "collation" in an object named "collation"
>> (collation.collation) - what is it about?
>
> Object "collation" is a complex object of DTCollation type,
> which has the following attributes:
>
> * collation - a pointer of CHARSET_INFO type,
> which describes character set and collation pair -
> low level handlers doing things like conversion and comparison.
>
> * repertoire - the set of Unicode characters which can
> appear in value. It helps to make sure that safe conversion
> is possible even before knowing values.
>
> * derivation - collation strength - the main factor which
> is used when mixing values with different collation.
> For example, a field of CHAR/VARCHAR/TEXT type is stronger
> than a string literal, so if you mix a field and a literal
> for comparison, they will be compared according to the field
> collation.
> On contrary, a field of numeric/datetime type is weaker
> than literal (starting from 5.5.3).
>
>
> All three above attributes determine what happens when
> we mix objects (like field, literal, function, etc): which
> side character set and collation to use, and whether we
> really need character set conversion, or character set re-cast
> is enough.
>
>
>>
>> I have the impression that the 3 first tests are about the
>> number/date's collation? do we need to test all three?
>
> Will definitely need in the future. But not now.
> So moving the two extra conditions into DBUG_ASSERT looks
> the best solution now for me.
>
>>
>>> + !(coll.collation->state & MY_CS_NONASCII))
>>
>> this one is about the target collation it seems.
>
> Yes. There's not much to test about the target collation.
> We only need to know if it is compatible with 7bit ASCII.
Thanks for the detailed explanations.
Your original version or the one with asserts is ok to push.
Though I wonder: if we had this patch instead:
>> if !((*arg)->collation.collation->state & MY_CS_NONASCII) &&
>> !(coll.collation->state & MY_CS_NONASCII))
>> {
>> //it's all ASCII
>> continue;
>> }
you see: if the "source" and "target" collations don't have
MY_CS_NONASCII then they are ASCII and there is no conversion to do.
Wouldn't that fix the bug? I understand that this test would not be
limited to numeric things.