List:Commits« Previous MessageNext Message »
From:Alexander Barkov Date:April 29 2010 1:27pm
Subject:Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849
WL#2649
View as plain text  
Hi Guilhem,

Alexander Barkov wrote:
> Hi Guilhem,
> 
> 
> Guilhem Bichot wrote:
>> Hello,
>>
>> Alexander Barkov a écrit, Le 28.04.2010 21:42:
>>> Hi Guilhem,
>>>
>>> Thanks for review! Please see my comments below:
>>
>> I realized that what I am reviewing could be called a "Bar code", 
>> which is a funny pun.
> 
> Funny :)
> 
>> Comments below.
>>
>>> Guilhem Bichot wrote:
>>>> Hello,
>>>>
>>>> Alexander Barkov a écrit, Le 28.04.2010 14:13:
>>>>> #At file:///home/bar/mysql-bzr/mysql-trunk-bugfixing/ based on 
>>>>> revid:dao-gang.qu@stripped
>>>>>
>>>>>  3025 Alexander Barkov    2010-04-28
>>>>>       Bug#52849 [Com]: datetime index not work
>>>>>       Problem: after introduction of "WL#2649 Number-to-string 
>>>>> conversions"
>>>>>             This query:
>>>>>         SET NAMES cp850; -- Or any other non-latin1 ASCII-based 
>>>>> character set
>>>>>         SELECT * FROM t1
>>>>>         WHERE datetime_column='2010-01-01 00:00:00'
>>>>>       started to add extra character set conversion:         SELECT 
>>>>> * FROM t1
>>>>>         WHERE CONVERT(datetime_column USING cp850)='2010-01-01 
>>>>> 00:00:00';
>>>>>             so index on DATETIME column was not used anymore.
>>>>>             Fix:
>>>>>       avoid convertion of NUMERIC/DATETIME items
>>>>>       (i.e. those with derivation DERIVATION_NUMERIC).
>>
>>>>> === modified file 'sql/item.cc'
>>>>> --- a/sql/item.cc    2010-04-19 08:27:46 +0000
>>>>> +++ b/sql/item.cc    2010-04-28 12:04:20 +0000
>>>>> @@ -1796,6 +1796,20 @@ bool agg_item_set_converter(DTCollation 
>>>>>                                    &dummy_offset))
>>>>>        continue;
>>>>>  
>>>>> +    /*
>>>>> +      No needs to add converter if an "arg" is NUMERIC or DATETIME
>>>>> +      value (which is pure ASCII) and at the same time target 
>>>>> DTCollation
>>>>> +      is ASCII-compatible. For example, no needs to rewrite:
>>>>> +        SELECT * FROM t1 WHERE datetime_field = '2010-01-01';
>>>>> +      to
>>>>> +        SELECT * FROM t1 WHERE CONVERT(datetime_field USING cs) = 
>>>>> '2010-01-01';
>>>>> +    */
>>>>> +    if ((*arg)->collation.derivation == DERIVATION_NUMERIC
> &&
>>>>
>>>>> +        (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII
> &&
>>>>
>>>> is this test needed? Isn't the first one (about derivations) enough? 
>>>
>>> It is enough now. I have put some future assumptions already.
>>>
>>> Now we can rewrite as follows:
>>>
>>>    if ((*arg)->collation.derivation == DERIVATION_NUMERIC &&
>>>        !(coll.collation->state & MY_CS_NONASCII))
>>>    {
>>>       DBUG_ASSERT((*arg)->collation.repertoire == MY_REPERTOIRE_ASCII);
>>>       DBUG_ASSERT(!(*arg)->collation.collation->state & 
>>> MY_CS_NONASCII)));
>>>       continue;
>>>    }
>>>
>>>
>>> I.e. checking for repertoire is somewhat redundant at the moment,
>>> and checking for coll.collation->state is redundant too.
>>>
>>> But in fact these conditions test very different things.
>>> See explanation about three parts of DTCollation below.
>>>
>>> In theory, it is possible that derivation is DERIVATION_NUMERIC,
>>> but at the same time there's MY_CS_NONASCII flag, and at the same
>>> time repertoire is not MY_REPERTOIRE_ASCII. In practice, checking for 
>>> DERIVATION_NUMERIC is enough for *now*. But may change quite soon.
>>>
>>>> What is a repertoire (ok it's a French word but what does it mean 
>>>> here ;-)? What does MY_REPERTOIRE_ASCII mean?
>>>
>>> It means that the value can consist only of characters
>>> from Unicode range U+0000..U+007F.
>>>
>>> Note, a value can be of "tricky" character set, like UCS2,
>>> but at the same time its repertoire can be MY_REPERTOIRE_ASCII.
>>>
>>>
>>>> There exists MY_REPERTOIRE_NUMERIC, should it be used instead?
>>>
>>> Currently MY_REPERTOIRE_NUMERIC is just an alias for
>>> MY_REPERTOIRE_ASCII.
>>>
>>> In the future we can split MY_REPERTOIRE_ASCII into smaller parts, like:
>>> MY_REPERTOIRE_ASCII_CONTROLS
>>> MY_REPERTOIRE_ASCII_LETTERS
>>> MY_PEPERTOIRE_ASCII_PUNCTUATION
>>> MY_REPERTOIRE_ASCII_DIGITS
>>>
>>> and MY_REPERTOIRE_NUMERIC will be something different -
>>> a smaller subset of MY_REPERTOIRE_ASCII. This will allow
>>> to avoid "Illegal mix of collations" in some more cases.
>>>
>>> In the above condition it's only important that to check
>>> that repertoire is MY_REPERTOIRE_ASCII.
>>>
>>>>
>>>>> +        !((*arg)->collation.collation->state &
> MY_CS_NONASCII) &&
>>>>
>>>> a member named "collation" in an object named "collation" 
>>>> (collation.collation) - what is it about?
>>>
>>> Object "collation" is a complex object of DTCollation type,
>>> which has the following attributes:
>>>
>>> * collation - a pointer of CHARSET_INFO type,
>>>   which describes character set and collation pair -
>>>   low level handlers doing things like conversion and comparison.
>>>
>>> * repertoire - the set of Unicode characters which can
>>>   appear in value. It helps to make sure that safe conversion
>>>   is possible even before knowing values.
>>>
>>> * derivation - collation strength - the main factor which
>>>   is used when mixing values with different collation.
>>>   For example, a field of CHAR/VARCHAR/TEXT type is stronger
>>>   than a string literal, so if you mix a field and a literal
>>>   for comparison, they will be compared according to the field
>>>   collation.
>>>   On contrary, a field of numeric/datetime type is weaker
>>>   than literal (starting from 5.5.3).
>>>
>>>
>>> All three above attributes determine what happens when
>>> we mix objects (like field, literal, function, etc): which
>>> side character set and collation to use, and whether we
>>> really need character set conversion, or character set re-cast
>>> is enough.
>>>
>>>
>>>>
>>>> I have the impression that the 3 first tests are about the 
>>>> number/date's collation? do we need to test all three?
>>>
>>> Will definitely need in the future. But not now.
>>> So moving the two extra conditions into DBUG_ASSERT looks
>>> the best solution now for me.
>>>
>>>>
>>>>> +        !(coll.collation->state & MY_CS_NONASCII))
>>>>
>>>> this one is about the target collation it seems.
>>>
>>> Yes. There's not much to test about the target collation.
>>> We only need to know if it is compatible with 7bit ASCII.
>>
>> Thanks for the detailed explanations.
>> Your original version or the one with asserts is ok to push.


The version with DBUG_ASSERT crashes with stored functions.
ASCII repertoire gets lost when argument is passed to the function.

This is an example:

create function f1 (par1 int) returns int
begin
return concat(par1);
end|
set @a= f1(1);

So I'll use my original version.


> 
> Thanks!
> 
>> Though I wonder: if we had this patch instead:
>>
>>>> if !((*arg)->collation.collation->state & MY_CS_NONASCII)
> &&
>>>> !(coll.collation->state & MY_CS_NONASCII))
>>>> {
>>>>  //it's all ASCII
>>>>  continue;
>>>> }
>> you see: if the "source" and "target" collations don't have 
>> MY_CS_NONASCII then they are ASCII and there is no conversion to do.
>  > Wouldn't that fix the bug? I understand that this test would not be
>  > limited to numeric things.
>  >
> 
> This is not enough. Note, the first argument can be say latin1 and
> the second argument can be latin2. Both do not have MY_CS_NONASCII,
> because both are compatible with 7bit ASCII on U+0000..U+007F range
> (= binary codes 0x00..0x7F).
> 
> So generally we cannot avoid conversion in this case.
> 
> But we can avoid conversion if we know that repertoire of
> the first argument is MY_REPERTOIRE_ASCII, i.e. the first
> argument does not have any extended characters in byte range
> 0x80..0xFF. So we would have to have this condition anyway:
> 
> (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII
> 
> 
> What we could have is:
> 
>    if (!(coll.collation->state & MY_CS_NONASCII)) &&
>        (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII &&
>        (!(*arg)->collation.collation->state & MY_CS_NONASCII))
>    {
>      //it's all ASCII
>      continue;
>    }
> 
> 
> I.e. remove limitation for numeric/datetime origin of the source.
> But this needs additional investigations: my quick tests yesterday
> showed that a number of tests started to fail.
> 
> At this point I'd like to fix the problem which appeared in 5.5.3.
> But in the future this piece of code can be revised for better
> optimization. I will put a comment about this.
> 
> Thanks!
> 
> 

Thread
bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849 WL#2649Alexander Barkov28 Apr
  • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Guilhem Bichot28 Apr
    • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov28 Apr
      • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Guilhem Bichot29 Apr
        • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov29 Apr
          • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov29 Apr