Thanks for review! Please see my comments below:
Guilhem Bichot wrote:
> Alexander Barkov a écrit, Le 28.04.2010 14:13:
>> #At file:///home/bar/mysql-bzr/mysql-trunk-bugfixing/ based on
>> 3025 Alexander Barkov 2010-04-28
>> Bug#52849 [Com]: datetime index not work
>> Problem: after introduction of "WL#2649 Number-to-string
>> This query:
>> SET NAMES cp850; -- Or any other non-latin1 ASCII-based
>> character set
>> SELECT * FROM t1
>> WHERE datetime_column='2010-01-01 00:00:00'
>> started to add extra character set conversion: SELECT *
>> FROM t1
>> WHERE CONVERT(datetime_column USING cp850)='2010-01-01 00:00:00';
>> so index on DATETIME column was not used anymore.
>> avoid convertion of NUMERIC/DATETIME items
>> (i.e. those with derivation DERIVATION_NUMERIC).
> I'm reviewing this but being quite ignorant of the concepts here, I have
> only questions so far.
>> === modified file 'sql/item.cc'
>> --- a/sql/item.cc 2010-04-19 08:27:46 +0000
>> +++ b/sql/item.cc 2010-04-28 12:04:20 +0000
>> @@ -1796,6 +1796,20 @@ bool agg_item_set_converter(DTCollation
>> + /*
>> + No needs to add converter if an "arg" is NUMERIC or DATETIME
>> + value (which is pure ASCII) and at the same time target
>> + is ASCII-compatible. For example, no needs to rewrite:
>> + SELECT * FROM t1 WHERE datetime_field = '2010-01-01';
>> + to
>> + SELECT * FROM t1 WHERE CONVERT(datetime_field USING cs) =
>> + */
>> + if ((*arg)->collation.derivation == DERIVATION_NUMERIC &&
>> + (*arg)->collation.repertoire == MY_REPERTOIRE_ASCII &&
> is this test needed? Isn't the first one (about derivations) enough?
It is enough now. I have put some future assumptions already.
Now we can rewrite as follows:
if ((*arg)->collation.derivation == DERIVATION_NUMERIC &&
!(coll.collation->state & MY_CS_NONASCII))
DBUG_ASSERT((*arg)->collation.repertoire == MY_REPERTOIRE_ASCII);
DBUG_ASSERT(!(*arg)->collation.collation->state & MY_CS_NONASCII)));
I.e. checking for repertoire is somewhat redundant at the moment,
and checking for coll.collation->state is redundant too.
But in fact these conditions test very different things.
See explanation about three parts of DTCollation below.
In theory, it is possible that derivation is DERIVATION_NUMERIC,
but at the same time there's MY_CS_NONASCII flag, and at the same
time repertoire is not MY_REPERTOIRE_ASCII. In practice, checking for
DERIVATION_NUMERIC is enough for *now*. But may change quite soon.
> What is a repertoire (ok it's a French word but what does it mean here
> ;-)? What does MY_REPERTOIRE_ASCII mean?
It means that the value can consist only of characters
from Unicode range U+0000..U+007F.
Note, a value can be of "tricky" character set, like UCS2,
but at the same time its repertoire can be MY_REPERTOIRE_ASCII.
> There exists MY_REPERTOIRE_NUMERIC, should it be used instead?
Currently MY_REPERTOIRE_NUMERIC is just an alias for
In the future we can split MY_REPERTOIRE_ASCII into smaller parts, like:
and MY_REPERTOIRE_NUMERIC will be something different -
a smaller subset of MY_REPERTOIRE_ASCII. This will allow
to avoid "Illegal mix of collations" in some more cases.
In the above condition it's only important that to check
that repertoire is MY_REPERTOIRE_ASCII.
>> + !((*arg)->collation.collation->state & MY_CS_NONASCII)
> a member named "collation" in an object named "collation"
> (collation.collation) - what is it about?
Object "collation" is a complex object of DTCollation type,
which has the following attributes:
* collation - a pointer of CHARSET_INFO type,
which describes character set and collation pair -
low level handlers doing things like conversion and comparison.
* repertoire - the set of Unicode characters which can
appear in value. It helps to make sure that safe conversion
is possible even before knowing values.
* derivation - collation strength - the main factor which
is used when mixing values with different collation.
For example, a field of CHAR/VARCHAR/TEXT type is stronger
than a string literal, so if you mix a field and a literal
for comparison, they will be compared according to the field
On contrary, a field of numeric/datetime type is weaker
than literal (starting from 5.5.3).
All three above attributes determine what happens when
we mix objects (like field, literal, function, etc): which
side character set and collation to use, and whether we
really need character set conversion, or character set re-cast
> I have the impression that the 3 first tests are about the number/date's
> collation? do we need to test all three?
Will definitely need in the future. But not now.
So moving the two extra conditions into DBUG_ASSERT looks
the best solution now for me.
>> + !(coll.collation->state & MY_CS_NONASCII))
> this one is about the target collation it seems.
Yes. There's not much to test about the target collation.
We only need to know if it is compatible with 7bit ASCII.
> Can't the entire if() just be:
> if !((*arg)->collation.collation->state & MY_CS_NONASCII) &&
> !(coll.collation->state & MY_CS_NONASCII))
> //it's all ASCII
> I guess no, but why?
>> + continue;
>> if (!(conv= (*arg)->safe_charset_converter(coll.collation)) &&
>> ((*arg)->collation.repertoire == MY_REPERTOIRE_ASCII))
>> conv= new Item_func_conv_charset(*arg, coll.collation, 1);