Hi Guilhem,
Alexander Barkov wrote:
> Hi Guilhem,
>
>
> Guilhem Bichot wrote:
>> Hello,
>>
>> Alexander Barkov a écrit, Le 28.04.2010 21:42:
>>> Hi Guilhem,
>>>
>>> Thanks for review! Please see my comments below:
>>
>> I realized that what I am reviewing could be called a "Bar code",
>> which is a funny pun.
>
> Funny :)
>
>> Comments below.
>>
>>> Guilhem Bichot wrote:
>>>> Hello,
>>>>
>>>> Alexander Barkov a écrit, Le 28.04.2010 14:13:
>>>>> #At file:///home/bar/mysql-bzr/mysql-trunk-bugfixing/ based on
>>>>> revid:dao-gang.qu@stripped
>>>>>
>>>>> 3025 Alexander Barkov 2010-04-28
>>>>> Bug#52849 [Com]: datetime index not work
>>>>> Problem: after introduction of "WL#2649 Number-to-string
>>>>> conversions"
>>>>> This query:
>>>>> SET NAMES cp850; -- Or any other non-latin1 ASCII-based
>>>>> character set
>>>>> SELECT * FROM t1
>>>>> WHERE datetime_column='2010-01-01 00:00:00'
>>>>> started to add extra character set conversion: SELECT
>>>>> * FROM t1
>>>>> WHERE CONVERT(datetime_column USING cp850)='2010-01-01
>>>>> 00:00:00';
>>>>> so index on DATETIME column was not used anymore.
>>>>> Fix:
>>>>> avoid convertion of NUMERIC/DATETIME items
>>>>> (i.e. those with derivation DERIVATION_NUMERIC).
>>
>>>>> === modified file 'sql/item.cc'
>>>>> --- a/sql/item.cc 2010-04-19 08:27:46 +0000
>>>>> +++ b/sql/item.cc 2010-04-28 12:04:20 +0000
>>>>> @@ -1796,6 +1796,20 @@ bool agg_item_set_converter(DTCollation
>>>>> &dummy_offset))
>>>>> continue;
>>>>>
>>>>> + /*
>>>>> + No needs to add converter if an "arg" is NUMERIC or DATETIME
>>>>> + value (which is pure ASCII) and at the same time target
>>>>> DTCollation
>>>>> + is ASCII-compatible. For example, no needs to rewrite:
>>>>> + SELECT * FROM t1 WHERE datetime_field = '2010-01-01';
>>>>> + to
>>>>> + SELECT * FROM t1 WHERE CONVERT(datetime_field USING cs) =
>>>>> '2010-01-01';
>>>>> + */
>>>>> + if ((*arg)->collation.derivation == DERIVATION_NUMERIC
> &&
>>>>
>>>>> + (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII
> &&
>>>>
>>>> is this test needed? Isn't the first one (about derivations) enough?
>>>
>>> It is enough now. I have put some future assumptions already.
>>>
>>> Now we can rewrite as follows:
>>>
>>> if ((*arg)->collation.derivation == DERIVATION_NUMERIC &&
>>> !(coll.collation->state & MY_CS_NONASCII))
>>> {
>>> DBUG_ASSERT((*arg)->collation.repertoire == MY_REPERTOIRE_ASCII);
>>> DBUG_ASSERT(!(*arg)->collation.collation->state &
>>> MY_CS_NONASCII)));
>>> continue;
>>> }
>>>
>>>
>>> I.e. checking for repertoire is somewhat redundant at the moment,
>>> and checking for coll.collation->state is redundant too.
>>>
>>> But in fact these conditions test very different things.
>>> See explanation about three parts of DTCollation below.
>>>
>>> In theory, it is possible that derivation is DERIVATION_NUMERIC,
>>> but at the same time there's MY_CS_NONASCII flag, and at the same
>>> time repertoire is not MY_REPERTOIRE_ASCII. In practice, checking for
>>> DERIVATION_NUMERIC is enough for *now*. But may change quite soon.
>>>
>>>> What is a repertoire (ok it's a French word but what does it mean
>>>> here ;-)? What does MY_REPERTOIRE_ASCII mean?
>>>
>>> It means that the value can consist only of characters
>>> from Unicode range U+0000..U+007F.
>>>
>>> Note, a value can be of "tricky" character set, like UCS2,
>>> but at the same time its repertoire can be MY_REPERTOIRE_ASCII.
>>>
>>>
>>>> There exists MY_REPERTOIRE_NUMERIC, should it be used instead?
>>>
>>> Currently MY_REPERTOIRE_NUMERIC is just an alias for
>>> MY_REPERTOIRE_ASCII.
>>>
>>> In the future we can split MY_REPERTOIRE_ASCII into smaller parts, like:
>>> MY_REPERTOIRE_ASCII_CONTROLS
>>> MY_REPERTOIRE_ASCII_LETTERS
>>> MY_PEPERTOIRE_ASCII_PUNCTUATION
>>> MY_REPERTOIRE_ASCII_DIGITS
>>>
>>> and MY_REPERTOIRE_NUMERIC will be something different -
>>> a smaller subset of MY_REPERTOIRE_ASCII. This will allow
>>> to avoid "Illegal mix of collations" in some more cases.
>>>
>>> In the above condition it's only important that to check
>>> that repertoire is MY_REPERTOIRE_ASCII.
>>>
>>>>
>>>>> + !((*arg)->collation.collation->state &
> MY_CS_NONASCII) &&
>>>>
>>>> a member named "collation" in an object named "collation"
>>>> (collation.collation) - what is it about?
>>>
>>> Object "collation" is a complex object of DTCollation type,
>>> which has the following attributes:
>>>
>>> * collation - a pointer of CHARSET_INFO type,
>>> which describes character set and collation pair -
>>> low level handlers doing things like conversion and comparison.
>>>
>>> * repertoire - the set of Unicode characters which can
>>> appear in value. It helps to make sure that safe conversion
>>> is possible even before knowing values.
>>>
>>> * derivation - collation strength - the main factor which
>>> is used when mixing values with different collation.
>>> For example, a field of CHAR/VARCHAR/TEXT type is stronger
>>> than a string literal, so if you mix a field and a literal
>>> for comparison, they will be compared according to the field
>>> collation.
>>> On contrary, a field of numeric/datetime type is weaker
>>> than literal (starting from 5.5.3).
>>>
>>>
>>> All three above attributes determine what happens when
>>> we mix objects (like field, literal, function, etc): which
>>> side character set and collation to use, and whether we
>>> really need character set conversion, or character set re-cast
>>> is enough.
>>>
>>>
>>>>
>>>> I have the impression that the 3 first tests are about the
>>>> number/date's collation? do we need to test all three?
>>>
>>> Will definitely need in the future. But not now.
>>> So moving the two extra conditions into DBUG_ASSERT looks
>>> the best solution now for me.
>>>
>>>>
>>>>> + !(coll.collation->state & MY_CS_NONASCII))
>>>>
>>>> this one is about the target collation it seems.
>>>
>>> Yes. There's not much to test about the target collation.
>>> We only need to know if it is compatible with 7bit ASCII.
>>
>> Thanks for the detailed explanations.
>> Your original version or the one with asserts is ok to push.
The version with DBUG_ASSERT crashes with stored functions.
ASCII repertoire gets lost when argument is passed to the function.
This is an example:
create function f1 (par1 int) returns int
begin
return concat(par1);
end|
set @a= f1(1);
So I'll use my original version.
>
> Thanks!
>
>> Though I wonder: if we had this patch instead:
>>
>>>> if !((*arg)->collation.collation->state & MY_CS_NONASCII)
> &&
>>>> !(coll.collation->state & MY_CS_NONASCII))
>>>> {
>>>> //it's all ASCII
>>>> continue;
>>>> }
>> you see: if the "source" and "target" collations don't have
>> MY_CS_NONASCII then they are ASCII and there is no conversion to do.
> > Wouldn't that fix the bug? I understand that this test would not be
> > limited to numeric things.
> >
>
> This is not enough. Note, the first argument can be say latin1 and
> the second argument can be latin2. Both do not have MY_CS_NONASCII,
> because both are compatible with 7bit ASCII on U+0000..U+007F range
> (= binary codes 0x00..0x7F).
>
> So generally we cannot avoid conversion in this case.
>
> But we can avoid conversion if we know that repertoire of
> the first argument is MY_REPERTOIRE_ASCII, i.e. the first
> argument does not have any extended characters in byte range
> 0x80..0xFF. So we would have to have this condition anyway:
>
> (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII
>
>
> What we could have is:
>
> if (!(coll.collation->state & MY_CS_NONASCII)) &&
> (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII &&
> (!(*arg)->collation.collation->state & MY_CS_NONASCII))
> {
> //it's all ASCII
> continue;
> }
>
>
> I.e. remove limitation for numeric/datetime origin of the source.
> But this needs additional investigations: my quick tests yesterday
> showed that a number of tests started to fail.
>
> At this point I'd like to fix the problem which appeared in 5.5.3.
> But in the future this piece of code can be revised for better
> optimization. I will put a comment about this.
>
> Thanks!
>
>