I'm adding Svoj in CC because I know he's fixing
a very similar bug at the moment.
I have taken a look into
which I guess is an improved version of
originally mentioned in your letter.
I noticed three things:
Sorry, I'm not strong with Falcon internals,
so I don't know why you need to trim trailing minSortChar.
This makes MySQLCollation::compare() work differently from
how collation really works.
Can you please give some insight for this?
Unfortunately, MySQLCollation::computeKeyLength() won't work
in some cases because it scans from the end of the string.
Some Asian character sets, like can have bytes in ASCII range
in the second byte of a multi-byte character.
For example, in SJIS:
#define issjishead(c) ((0x81<=(c) && (c)<=0x9f) || \
((0xe0<=(c)) && (c)<=0xfc))
#define issjistail(c) ((0x40<=(c) && (c)<=0x7e) || \
(0x80<=(c) && (c)<=0xfc))
For example, if you scan from the end, you can never guess
what is byte 0x40:
- either a single byte character (@)
- or the second byte of a two-byte character.
The same problem exists for all bytes 0x40..0x7e.
So only scanning from the beginning guarantees ambiguity for
character sets like SJIS.
int falcon_cs_is_binary (void *cs)
return (0 == strcmp(((CHARSET_INFO*) cs)->name, "binary"));
You can modify the above code to check to this:
return ((CHARSET_INFO*) cs)->number == 63;
Lars-Erik Bjørk wrote:
> Hi Alexander (bar)!
> I am cc'ing you on this one, in case you have an opinion on the
> matter :)
> On Tue, 2008-12-02 at 15:18 +0100, Lars-Erik Bjørk wrote:
>> On Tue, 2008-12-02 at 15:08 +0100, Vladislav Vaintroub wrote:
>>> Lars-Erik, while I admit that is it a good job, except for taking slightly
>>> non-MBCS compliant handling of minSortChar.
>> Ahem, yes :) I forgot this one
>>> But, I do believe there are already too many workarounds in MySQLCollation
>>> The original problems are not documented anywhere and comments are missing
>>> , as usual. Thus I think for sake of portability, readability and other
>>> abilities a rewrite of this stuff using Mr. MySQL-Unicode( aka Bar) as
>>> consultant would be good. I bet almost all problems could be fixed using
>>> functions strings library already provides (this stuff might be not well
>>> documented as well, but it works) and there is no reasons to
>>> reinvent/reimplement stuff.
>> If Bar knows a better, and less complex way using the strings library,
>> then I am all for that! :)
>>> -First, why do we need computeKeyLength at all? What does it work around?
>>> -Which role does minSortChar play? Why are there attempts to trim it
>>> together with the blanks?
>>> -We don't we use charsetinfo->lengthsp() to trim blanks. Why not?
>>> -We use some ad-hoc technique to trim strings. Why?
>>> -why there are so many if (isBinary) in MySQLCollation?
>>>> -----Original Message-----
>>>> From: Lars-Erik.Bjork@stripped [mailto:Lars-Erik.Bjork@stripped]
>>>> Sent: Tuesday, December 02, 2008 2:21 PM
>>>> To: falcon@stripped
>>>> Subject: Please review fox for bug#34479
>>>> The fix is available at:
>>>> The function MySQLCollation::computeKeyLength(...) is no longer
>>>> because it would clutter the header file. Also, some of you (Vlad? :))
>>>> may have problems with using a couple of the CHARSET_INFO struct's
>>>> members 'directly', but this is already done elsewhere :)
>>>> Please enjoy! ;)
>>>> Best regards,
>>>> Falcon Storage Engine Mailing List
>>>> For list archives: http://lists.mysql.com/falcon
>>>> To unsubscribe: http://lists.mysql.com/falcon?unsub=1