List: Internals
From: Michael Widenius
Date: August 29 2001 8:49pm
Subject: Re: varchar casting not correct
Hi!

>>>>> "Dirk" == Dirk Nehring <dnehring@stripped> writes:

<cut>

Dirk> Hi Tim,

Dirk> I vote for +CF and -AF for all charset, since accented characters have
Dirk> another meaning than non-accented characters. Mapping every character to
Dirk> their lowercase presentation is very useful (if you don't like it, you
Dirk> can use the binary representation), but ignoring the difference of
Dirk> "â", "à", "á", "ä", "ã", "æ", ... is fatal
Dirk> in my view.

Unfortunately, this is not true here in Scandinavia.  When you are
comparing and sorting things, many, but not all, of the accented
characters are to be compared as equal.

The current latin1 behavior matches how things are compared in at
least Sweden and Finland.

>> The only solution I can think of is to offer all four versions
>> and let the user decide which to use.  The naming convention
>> could be something like:
>> 
>> latin1_af           +CF -AF (your proposed behavior)
>> latin1              +CF +AF (current behavior, roughly)
>> latin1_cf_af        -CF -AF (pretty close to CHAR(x) BINARY)
>> latin1_cf           -CF +AF (probably not too useful)
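
The four combinations listed above can be sketched as comparison keys.
This is only an illustration: it assumes that accent folding (AF) means
stripping every combining mark uniformly, which is NOT how the real
latin1 tables behave (they fold some accented characters and not
others). The helper names are hypothetical; the variant names are taken
from the proposal.

```python
import unicodedata

def strip_accents(s):
    # Assumption: AF = decompose and drop combining marks.
    # Letters with no decomposition (e.g. "æ") are left untouched.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def make_key(case_fold, accent_fold):
    # Build a key function for one +CF/-CF, +AF/-AF combination.
    def key(s):
        if accent_fold:
            s = strip_accents(s)
        if case_fold:
            s = s.lower()
        return s
    return key

# Names follow the proposed convention; the mapping is illustrative only.
VARIANTS = {
    "latin1_af":    make_key(case_fold=True,  accent_fold=False),  # +CF -AF
    "latin1":       make_key(case_fold=True,  accent_fold=True),   # +CF +AF
    "latin1_cf_af": make_key(case_fold=False, accent_fold=False),  # -CF -AF
    "latin1_cf":    make_key(case_fold=False, accent_fold=True),   # -CF +AF
}

def equal(variant, a, b):
    # Two strings compare equal when their keys are identical.
    key = VARIANTS[variant]
    return key(a) == key(b)

if __name__ == "__main__":
    for name in VARIANTS:
        print(name, equal(name, "A", "a"), equal(name, "a", "ä"))
```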

Dirk> This seems too flexible, which does not make anything easier. "+CF -AF"
Dirk> should be enough for all needs (hopefully).

Sorry, but this will not help.

>> The last question is, how do we accommodate the Finns?  And it's
>> not just the Finns, either.  There is a separate character set
>> for german sort order.  There could be one for french, etc., etc.
>> How should they be handled?

Dirk> Sort order is another problem. "a", "ä" should be different in the
Dirk> index, but sorted equal. Oh, oh, I see problems arising...

Yes, this is a problem.  If you want to be able to use the index to
retrieve rows in sorted order, you quickly get into trouble.
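
One way to read the "different in the index, but sorted equal"
requirement is a two-level key: a folded primary part that decides the
order, plus the original string as a tie-breaker so entries stay
distinct. The sketch below assumes a simple fold (strip combining marks,
lowercase) and a hypothetical `two_level_key` helper; it illustrates the
idea only and does not describe how MySQL stores its indexes.

```python
import unicodedata

def fold(s):
    # Assumption: folding = drop combining marks, then lowercase.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()

def two_level_key(s):
    # Primary level: folded form (decides the sort position).
    # Secondary level: original string (keeps "a" and "ä" distinct).
    return (fold(s), s)

words = ["xaä", "xan", "xaa", "xa1", "xaà", "xaa1"]
ordered = sorted(words, key=two_level_key)

if __name__ == "__main__":
    print(ordered)
```

With this key, "xaa", "xaà" and "xaä" end up adjacent in the order while
still having different keys. An equality lookup that should treat them
as the same value would then have to scan the whole group that shares a
primary key, which is part of the trouble.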

Dirk> Oracle ignores the language specific characters when sorting:

SQL> select * from tt order by t;

Dirk> xa1
Dirk> xaa
Dirk> xaa1
Dirk> xan
Dirk> xaà
Dirk> xaä

Dirk> with "export NLS_LANG=AMERICAN_GERMANY.WE8ISO8859P1"

Dirk> This is what I suggest. They have a NLSSORT function which does this.
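
For comparison, the order in the listing above is the same order one
gets by sorting the strings on their raw ISO 8859-1 byte values. The
sketch below only demonstrates that coincidence; how Oracle actually
produced the listing is not stated in this thread.

```python
# Sort the six sample values by their ISO 8859-1 (latin-1) bytes.
words = ["xaä", "xan", "xaa1", "xaà", "xa1", "xaa"]
by_bytes = sorted(words, key=lambda w: w.encode("latin-1"))

if __name__ == "__main__":
    for w in by_bytes:
        print(w)
```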

The big question is how much effort one should spend to solve the
uncommon case, if this will make things harder for everyone else.

Personally, I think that in most cases it's better to store the index
in sorted order than in any other order.

The only major disadvantage with doing this is that one can't have
UNIQUE on such an index, but in most cases this isn't a problem.
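
A small sketch of why UNIQUE becomes awkward on such an index, under the
assumption that the index is ordered purely by a folded sort key (here:
strip combining marks, lowercase). `SortedIndex` and `sort_key` are
hypothetical names for illustration, not MySQL internals.

```python
import bisect
import unicodedata

def sort_key(s):
    # Assumption: the collation ignores accents and case.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()

class SortedIndex:
    """Toy index that keeps (key, value) pairs ordered by sort key."""

    def __init__(self, unique=False):
        self.unique = unique
        self.entries = []  # list of (key, value), sorted by key

    def insert(self, value):
        key = sort_key(value)
        keys = [k for k, _ in self.entries]
        pos = bisect.bisect_right(keys, key)
        # Values whose keys compare equal are indistinguishable here,
        # so a UNIQUE index has to reject the second one.
        if self.unique and pos > 0 and keys[pos - 1] == key:
            raise ValueError("duplicate key for %r" % value)
        self.entries.insert(pos, (key, value))

    def in_order(self):
        # Reading the index front to back already yields sorted order.
        return [v for _, v in self.entries]

if __name__ == "__main__":
    idx = SortedIndex(unique=True)
    idx.insert("a")
    try:
        idx.insert("ä")
    except ValueError as err:
        print("rejected:", err)
```

Without the unique flag both values are accepted and simply sit next to
each other in the index.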

>> This is exhaustive, and exhausting.  But I'm not sure what else
>> to do to solve this problem.

Dirk> Yes, you're right. Let's take a look to our competitors like Oracle,
Dirk> they have found a solution (by providing special functions for this
Dirk> case).

Dirk> And, we should definitely solve this problem before 4.0 will be
Dirk> released, for 3.23 it is a point of discussion.

Sorry, but there is no way to solve this properly until 4.1 is out.
(We plan to start test-compiling binary versions of MySQL 4.0 this
weekend)

Regards,
Monty