List:Commits« Previous MessageNext Message »
From:Alexander Barkov Date:April 28 2010 7:42pm
Subject:Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849
WL#2649
View as plain text  
Hi Guilhem,

Thanks for review! Please see my comments below:

Guilhem Bichot wrote:
> Hello,
> 
> Alexander Barkov a écrit, Le 28.04.2010 14:13:
>> #At file:///home/bar/mysql-bzr/mysql-trunk-bugfixing/ based on 
>> revid:dao-gang.qu@stripped
>>
>>  3025 Alexander Barkov    2010-04-28
>>       Bug#52849 [Com]: datetime index not work
>>       Problem: after introduction of "WL#2649 Number-to-string 
>> conversions"
>>             This query:
>>         SET NAMES cp850; -- Or any other non-latin1 ASCII-based 
>> character set
>>         SELECT * FROM t1
>>         WHERE datetime_column='2010-01-01 00:00:00'
>>       started to add extra character set conversion:         SELECT * 
>> FROM t1
>>         WHERE CONVERT(datetime_column USING cp850)='2010-01-01 00:00:00';
>>             so index on DATETIME column was not used anymore.
>>             Fix:
>>       avoid convertion of NUMERIC/DATETIME items
>>       (i.e. those with derivation DERIVATION_NUMERIC).
>>
>>     modified:
>>       mysql-test/r/type_datetime.result
>>       mysql-test/t/type_datetime.test
>>       sql/item.cc
> 
> I'm reviewing this but being quite ignorant of the concepts here, I have 
> only questions so far.
> 
>> === modified file 'sql/item.cc'
>> --- a/sql/item.cc    2010-04-19 08:27:46 +0000
>> +++ b/sql/item.cc    2010-04-28 12:04:20 +0000
>> @@ -1796,6 +1796,20 @@ bool agg_item_set_converter(DTCollation 
>>                                    &dummy_offset))
>>        continue;
>>  
>> +    /*
>> +      No needs to add converter if an "arg" is NUMERIC or DATETIME
>> +      value (which is pure ASCII) and at the same time target 
>> DTCollation
>> +      is ASCII-compatible. For example, no needs to rewrite:
>> +        SELECT * FROM t1 WHERE datetime_field = '2010-01-01';
>> +      to
>> +        SELECT * FROM t1 WHERE CONVERT(datetime_field USING cs) = 
>> '2010-01-01';
>> +    */
>> +    if ((*arg)->collation.derivation == DERIVATION_NUMERIC &&
> 
>> +        (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII &&
> 
> is this test needed? Isn't the first one (about derivations) enough? 

It is enough now. I have put some future assumptions already.

Now we can rewrite as follows:

    if ((*arg)->collation.derivation == DERIVATION_NUMERIC &&
        !(coll.collation->state & MY_CS_NONASCII))
    {
       DBUG_ASSERT((*arg)->collation.repertoire == MY_REPERTOIRE_ASCII);
       DBUG_ASSERT(!(*arg)->collation.collation->state & MY_CS_NONASCII)));
       continue;
    }


I.e. checking for repertoire is somewhat redundant at the moment,
and checking for coll.collation->state is redundant too.

But in fact these conditions test very different things.
See explanation about three parts of DTCollation below.

In theory, it is possible that derivation is DERIVATION_NUMERIC,
but at the same time there's MY_CS_NONASCII flag, and at the same
time repertoire is not MY_REPERTOIRE_ASCII. In practice, checking for 
DERIVATION_NUMERIC is enough for *now*. But may change quite soon.

> What is a repertoire (ok it's a French word but what does it mean here 
> ;-)? What does MY_REPERTOIRE_ASCII mean?

It means that the value can consist only of characters
from Unicode range U+0000..U+007F.

Note, a value can be of "tricky" character set, like UCS2,
but at the same time its repertoire can be MY_REPERTOIRE_ASCII.


> There exists MY_REPERTOIRE_NUMERIC, should it be used instead?

Currently MY_REPERTOIRE_NUMERIC is just an alias for
MY_REPERTOIRE_ASCII.

In the future we can split MY_REPERTOIRE_ASCII into smaller parts, like:
MY_REPERTOIRE_ASCII_CONTROLS
MY_REPERTOIRE_ASCII_LETTERS
MY_PEPERTOIRE_ASCII_PUNCTUATION
MY_REPERTOIRE_ASCII_DIGITS

and MY_REPERTOIRE_NUMERIC will be something different -
a smaller subset of MY_REPERTOIRE_ASCII. This will allow
to avoid "Illegal mix of collations" in some more cases.

In the above condition it's only important that to check
that repertoire is MY_REPERTOIRE_ASCII.

> 
>> +        !((*arg)->collation.collation->state & MY_CS_NONASCII)
> &&
> 
> a member named "collation" in an object named "collation" 
> (collation.collation) - what is it about?

Object "collation" is a complex object of DTCollation type,
which has the following attributes:

* collation - a pointer of CHARSET_INFO type,
   which describes character set and collation pair -
   low level handlers doing things like conversion and comparison.

* repertoire - the set of Unicode characters which can
   appear in value. It helps to make sure that safe conversion
   is possible even before knowing values.

* derivation - collation strength - the main factor which
   is used when mixing values with different collation.
   For example, a field of CHAR/VARCHAR/TEXT type is stronger
   than a string literal, so if you mix a field and a literal
   for comparison, they will be compared according to the field
   collation.
   On contrary, a field of numeric/datetime type is weaker
   than literal (starting from 5.5.3).


All three above attributes determine what happens when
we mix objects (like field, literal, function, etc): which
side character set and collation to use, and whether we
really need character set conversion, or character set re-cast
is enough.


> 
> I have the impression that the 3 first tests are about the number/date's 
> collation? do we need to test all three?

Will definitely need in the future. But not now.
So moving the two extra conditions into DBUG_ASSERT looks
the best solution now for me.

> 
>> +        !(coll.collation->state & MY_CS_NONASCII))
> 
> this one is about the target collation it seems.

Yes. There's not much to test about the target collation.
We only need to know if it is compatible with 7bit ASCII.

> 
> Can't the entire if() just be:
> if !((*arg)->collation.collation->state & MY_CS_NONASCII) &&
> !(coll.collation->state & MY_CS_NONASCII))
> {
>  //it's all ASCII
>  continue;
> }
> I guess no, but why?
> 
>> +      continue;
>> +
>>      if (!(conv= (*arg)->safe_charset_converter(coll.collation)) &&
>>          ((*arg)->collation.repertoire == MY_REPERTOIRE_ASCII))
>>        conv= new Item_func_conv_charset(*arg, coll.collation, 1);
> 
> Thanks!
> 

Thread
bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849 WL#2649Alexander Barkov28 Apr
  • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Guilhem Bichot28 Apr
    • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov28 Apr
      • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Guilhem Bichot29 Apr
        • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov29 Apr
          • Re: bzr commit into mysql-trunk-bugfixing branch (bar:3025) Bug#52849WL#2649Alexander Barkov29 Apr