List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:April 29 2010 8:55am
Subject:Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849
WL#2649
View as plain text  
Hello,

Alexander Barkov a écrit, Le 28.04.2010 21:42:
> Hi Guilhem,
> 
> Thanks for review! Please see my comments below:

I realized that what I am reviewing could be called a "Bar code", which 
is a funny pun.
Comments below.

> Guilhem Bichot wrote:
>> Hello,
>>
>> Alexander Barkov a écrit, Le 28.04.2010 14:13:
>>> #At file:///home/bar/mysql-bzr/mysql-trunk-bugfixing/ based on 
>>> revid:dao-gang.qu@stripped
>>>
>>>  3025 Alexander Barkov    2010-04-28
>>>       Bug#52849 [Com]: datetime index not work
>>>       Problem: after introduction of "WL#2649 Number-to-string 
>>> conversions"
>>>             This query:
>>>         SET NAMES cp850; -- Or any other non-latin1 ASCII-based 
>>> character set
>>>         SELECT * FROM t1
>>>         WHERE datetime_column='2010-01-01 00:00:00'
>>>       started to add extra character set conversion:         SELECT * 
>>> FROM t1
>>>         WHERE CONVERT(datetime_column USING cp850)='2010-01-01 
>>> 00:00:00';
>>>             so index on DATETIME column was not used anymore.
>>>             Fix:
>>>       avoid convertion of NUMERIC/DATETIME items
>>>       (i.e. those with derivation DERIVATION_NUMERIC).

>>> === modified file 'sql/item.cc'
>>> --- a/sql/item.cc    2010-04-19 08:27:46 +0000
>>> +++ b/sql/item.cc    2010-04-28 12:04:20 +0000
>>> @@ -1796,6 +1796,20 @@ bool agg_item_set_converter(DTCollation 
>>>                                    &dummy_offset))
>>>        continue;
>>>  
>>> +    /*
>>> +      No needs to add converter if an "arg" is NUMERIC or DATETIME
>>> +      value (which is pure ASCII) and at the same time target 
>>> DTCollation
>>> +      is ASCII-compatible. For example, no needs to rewrite:
>>> +        SELECT * FROM t1 WHERE datetime_field = '2010-01-01';
>>> +      to
>>> +        SELECT * FROM t1 WHERE CONVERT(datetime_field USING cs) = 
>>> '2010-01-01';
>>> +    */
>>> +    if ((*arg)->collation.derivation == DERIVATION_NUMERIC &&
>>
>>> +        (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII &&
>>
>> is this test needed? Isn't the first one (about derivations) enough? 
> 
> It is enough now. I have put some future assumptions already.
> 
> Now we can rewrite as follows:
> 
>    if ((*arg)->collation.derivation == DERIVATION_NUMERIC &&
>        !(coll.collation->state & MY_CS_NONASCII))
>    {
>       DBUG_ASSERT((*arg)->collation.repertoire == MY_REPERTOIRE_ASCII);
>       DBUG_ASSERT(!(*arg)->collation.collation->state & MY_CS_NONASCII)));
>       continue;
>    }
> 
> 
> I.e. checking for repertoire is somewhat redundant at the moment,
> and checking for coll.collation->state is redundant too.
> 
> But in fact these conditions test very different things.
> See explanation about three parts of DTCollation below.
> 
> In theory, it is possible that derivation is DERIVATION_NUMERIC,
> but at the same time there's MY_CS_NONASCII flag, and at the same
> time repertoire is not MY_REPERTOIRE_ASCII. In practice, checking for 
> DERIVATION_NUMERIC is enough for *now*. But may change quite soon.
> 
>> What is a repertoire (ok it's a French word but what does it mean here 
>> ;-)? What does MY_REPERTOIRE_ASCII mean?
> 
> It means that the value can consist only of characters
> from Unicode range U+0000..U+007F.
> 
> Note, a value can be of "tricky" character set, like UCS2,
> but at the same time its repertoire can be MY_REPERTOIRE_ASCII.
> 
> 
>> There exists MY_REPERTOIRE_NUMERIC, should it be used instead?
> 
> Currently MY_REPERTOIRE_NUMERIC is just an alias for
> MY_REPERTOIRE_ASCII.
> 
> In the future we can split MY_REPERTOIRE_ASCII into smaller parts, like:
> MY_REPERTOIRE_ASCII_CONTROLS
> MY_REPERTOIRE_ASCII_LETTERS
> MY_PEPERTOIRE_ASCII_PUNCTUATION
> MY_REPERTOIRE_ASCII_DIGITS
> 
> and MY_REPERTOIRE_NUMERIC will be something different -
> a smaller subset of MY_REPERTOIRE_ASCII. This will allow
> to avoid "Illegal mix of collations" in some more cases.
> 
> In the above condition it's only important that to check
> that repertoire is MY_REPERTOIRE_ASCII.
> 
>>
>>> +        !((*arg)->collation.collation->state & MY_CS_NONASCII)
> &&
>>
>> a member named "collation" in an object named "collation" 
>> (collation.collation) - what is it about?
> 
> Object "collation" is a complex object of DTCollation type,
> which has the following attributes:
> 
> * collation - a pointer of CHARSET_INFO type,
>   which describes character set and collation pair -
>   low level handlers doing things like conversion and comparison.
> 
> * repertoire - the set of Unicode characters which can
>   appear in value. It helps to make sure that safe conversion
>   is possible even before knowing values.
> 
> * derivation - collation strength - the main factor which
>   is used when mixing values with different collation.
>   For example, a field of CHAR/VARCHAR/TEXT type is stronger
>   than a string literal, so if you mix a field and a literal
>   for comparison, they will be compared according to the field
>   collation.
>   On contrary, a field of numeric/datetime type is weaker
>   than literal (starting from 5.5.3).
> 
> 
> All three above attributes determine what happens when
> we mix objects (like field, literal, function, etc): which
> side character set and collation to use, and whether we
> really need character set conversion, or character set re-cast
> is enough.
> 
> 
>>
>> I have the impression that the 3 first tests are about the 
>> number/date's collation? do we need to test all three?
> 
> Will definitely need in the future. But not now.
> So moving the two extra conditions into DBUG_ASSERT looks
> the best solution now for me.
> 
>>
>>> +        !(coll.collation->state & MY_CS_NONASCII))
>>
>> this one is about the target collation it seems.
> 
> Yes. There's not much to test about the target collation.
> We only need to know if it is compatible with 7bit ASCII.

Thanks for the detailed explanations.
Your original version or the one with asserts is ok to push.
Though I wonder: if we had this patch instead:

>> if !((*arg)->collation.collation->state & MY_CS_NONASCII) &&
>> !(coll.collation->state & MY_CS_NONASCII))
>> {
>>  //it's all ASCII
>>  continue;
>> }
you see: if the "source" and "target" collations don't have 
MY_CS_NONASCII then they are ASCII and there is no conversion to do.
Wouldn't that fix the bug? I understand that this test would not be 
limited to numeric things.

Thread
bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849 WL#2649Alexander Barkov28 Apr
  • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Guilhem Bichot28 Apr
    • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov28 Apr
      • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Guilhem Bichot29 Apr
        • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov29 Apr
          • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov29 Apr