List:Commits« Previous MessageNext Message »
From:Alexander Barkov Date:April 29 2010 10:18am
Subject:Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849
WL#2649
View as plain text  
Hi Guilhem,


Guilhem Bichot wrote:
> Hello,
> 
> Alexander Barkov a écrit, Le 28.04.2010 21:42:
>> Hi Guilhem,
>>
>> Thanks for review! Please see my comments below:
> 
> I realized that what I am reviewing could be called a "Bar code", which 
> is a funny pun.

Funny :)

> Comments below.
> 
>> Guilhem Bichot wrote:
>>> Hello,
>>>
>>> Alexander Barkov a écrit, Le 28.04.2010 14:13:
>>>> #At file:///home/bar/mysql-bzr/mysql-trunk-bugfixing/ based on 
>>>> revid:dao-gang.qu@stripped
>>>>
>>>>  3025 Alexander Barkov    2010-04-28
>>>>       Bug#52849 [Com]: datetime index not work
>>>>       Problem: after introduction of "WL#2649 Number-to-string 
>>>> conversions"
>>>>             This query:
>>>>         SET NAMES cp850; -- Or any other non-latin1 ASCII-based 
>>>> character set
>>>>         SELECT * FROM t1
>>>>         WHERE datetime_column='2010-01-01 00:00:00'
>>>>       started to add extra character set conversion:         SELECT 
>>>> * FROM t1
>>>>         WHERE CONVERT(datetime_column USING cp850)='2010-01-01 
>>>> 00:00:00';
>>>>             so index on DATETIME column was not used anymore.
>>>>             Fix:
>>>>       avoid convertion of NUMERIC/DATETIME items
>>>>       (i.e. those with derivation DERIVATION_NUMERIC).
> 
>>>> === modified file 'sql/item.cc'
>>>> --- a/sql/item.cc    2010-04-19 08:27:46 +0000
>>>> +++ b/sql/item.cc    2010-04-28 12:04:20 +0000
>>>> @@ -1796,6 +1796,20 @@ bool agg_item_set_converter(DTCollation 
>>>>                                    &dummy_offset))
>>>>        continue;
>>>>  
>>>> +    /*
>>>> +      No needs to add converter if an "arg" is NUMERIC or DATETIME
>>>> +      value (which is pure ASCII) and at the same time target 
>>>> DTCollation
>>>> +      is ASCII-compatible. For example, no needs to rewrite:
>>>> +        SELECT * FROM t1 WHERE datetime_field = '2010-01-01';
>>>> +      to
>>>> +        SELECT * FROM t1 WHERE CONVERT(datetime_field USING cs) = 
>>>> '2010-01-01';
>>>> +    */
>>>> +    if ((*arg)->collation.derivation == DERIVATION_NUMERIC
> &&
>>>
>>>> +        (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII
> &&
>>>
>>> is this test needed? Isn't the first one (about derivations) enough? 
>>
>> It is enough now. I have put some future assumptions already.
>>
>> Now we can rewrite as follows:
>>
>>    if ((*arg)->collation.derivation == DERIVATION_NUMERIC &&
>>        !(coll.collation->state & MY_CS_NONASCII))
>>    {
>>       DBUG_ASSERT((*arg)->collation.repertoire == MY_REPERTOIRE_ASCII);
>>       DBUG_ASSERT(!(*arg)->collation.collation->state & 
>> MY_CS_NONASCII)));
>>       continue;
>>    }
>>
>>
>> I.e. checking for repertoire is somewhat redundant at the moment,
>> and checking for coll.collation->state is redundant too.
>>
>> But in fact these conditions test very different things.
>> See explanation about three parts of DTCollation below.
>>
>> In theory, it is possible that derivation is DERIVATION_NUMERIC,
>> but at the same time there's MY_CS_NONASCII flag, and at the same
>> time repertoire is not MY_REPERTOIRE_ASCII. In practice, checking for 
>> DERIVATION_NUMERIC is enough for *now*. But may change quite soon.
>>
>>> What is a repertoire (ok it's a French word but what does it mean 
>>> here ;-)? What does MY_REPERTOIRE_ASCII mean?
>>
>> It means that the value can consist only of characters
>> from Unicode range U+0000..U+007F.
>>
>> Note, a value can be of "tricky" character set, like UCS2,
>> but at the same time its repertoire can be MY_REPERTOIRE_ASCII.
>>
>>
>>> There exists MY_REPERTOIRE_NUMERIC, should it be used instead?
>>
>> Currently MY_REPERTOIRE_NUMERIC is just an alias for
>> MY_REPERTOIRE_ASCII.
>>
>> In the future we can split MY_REPERTOIRE_ASCII into smaller parts, like:
>> MY_REPERTOIRE_ASCII_CONTROLS
>> MY_REPERTOIRE_ASCII_LETTERS
>> MY_PEPERTOIRE_ASCII_PUNCTUATION
>> MY_REPERTOIRE_ASCII_DIGITS
>>
>> and MY_REPERTOIRE_NUMERIC will be something different -
>> a smaller subset of MY_REPERTOIRE_ASCII. This will allow
>> to avoid "Illegal mix of collations" in some more cases.
>>
>> In the above condition it's only important that to check
>> that repertoire is MY_REPERTOIRE_ASCII.
>>
>>>
>>>> +        !((*arg)->collation.collation->state & MY_CS_NONASCII)
> &&
>>>
>>> a member named "collation" in an object named "collation" 
>>> (collation.collation) - what is it about?
>>
>> Object "collation" is a complex object of DTCollation type,
>> which has the following attributes:
>>
>> * collation - a pointer of CHARSET_INFO type,
>>   which describes character set and collation pair -
>>   low level handlers doing things like conversion and comparison.
>>
>> * repertoire - the set of Unicode characters which can
>>   appear in value. It helps to make sure that safe conversion
>>   is possible even before knowing values.
>>
>> * derivation - collation strength - the main factor which
>>   is used when mixing values with different collation.
>>   For example, a field of CHAR/VARCHAR/TEXT type is stronger
>>   than a string literal, so if you mix a field and a literal
>>   for comparison, they will be compared according to the field
>>   collation.
>>   On contrary, a field of numeric/datetime type is weaker
>>   than literal (starting from 5.5.3).
>>
>>
>> All three above attributes determine what happens when
>> we mix objects (like field, literal, function, etc): which
>> side character set and collation to use, and whether we
>> really need character set conversion, or character set re-cast
>> is enough.
>>
>>
>>>
>>> I have the impression that the 3 first tests are about the 
>>> number/date's collation? do we need to test all three?
>>
>> Will definitely need in the future. But not now.
>> So moving the two extra conditions into DBUG_ASSERT looks
>> the best solution now for me.
>>
>>>
>>>> +        !(coll.collation->state & MY_CS_NONASCII))
>>>
>>> this one is about the target collation it seems.
>>
>> Yes. There's not much to test about the target collation.
>> We only need to know if it is compatible with 7bit ASCII.
> 
> Thanks for the detailed explanations.
> Your original version or the one with asserts is ok to push.

Thanks!

> Though I wonder: if we had this patch instead:
> 
>>> if !((*arg)->collation.collation->state & MY_CS_NONASCII)
> &&
>>> !(coll.collation->state & MY_CS_NONASCII))
>>> {
>>>  //it's all ASCII
>>>  continue;
>>> }
> you see: if the "source" and "target" collations don't have 
> MY_CS_NONASCII then they are ASCII and there is no conversion to do.
 > Wouldn't that fix the bug? I understand that this test would not be
 > limited to numeric things.
 >

This is not enough. Note, the first argument can be say latin1 and
the second argument can be latin2. Both do not have MY_CS_NONASCII,
because both are compatible with 7bit ASCII on U+0000..U+007F range
(= binary codes 0x00..0x7F).

So generally we cannot avoid conversion in this case.

But we can avoid conversion if we know that repertoire of
the first argument is MY_REPERTOIRE_ASCII, i.e. the first
argument does not have any extended characters in byte range
0x80..0xFF. So we would have to have this condition anyway:

(*arg)->collation.repertoire == MY_REPERTOIRE_ASCII


What we could have is:

    if (!(coll.collation->state & MY_CS_NONASCII)) &&
        (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII &&
        (!(*arg)->collation.collation->state & MY_CS_NONASCII))
    {
      //it's all ASCII
      continue;
    }


I.e. remove limitation for numeric/datetime origin of the source.
But this needs additional investigations: my quick tests yesterday
showed that a number of tests started to fail.

At this point I'd like to fix the problem which appeared in 5.5.3.
But in the future this piece of code can be revised for better
optimization. I will put a comment about this.

Thanks!

Thread
bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849 WL#2649Alexander Barkov28 Apr
  • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Guilhem Bichot28 Apr
    • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov28 Apr
      • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Guilhem Bichot29 Apr
        • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov29 Apr
          • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov29 Apr