>>>>> "Dirk" == Dirk Nehring <dnehring@stripped> writes:
Dirk> On Wed, Aug 29, 2001 at 11:49:57PM +0300, Michael Widenius wrote:
>> >>>>> "Dirk" == Dirk Nehring <dnehring@stripped> writes:
Dirk> Hi Tim,
Dirk> I vote for +CF and -AF for all charset, since accented characters have
Dirk> another meaning than non-accented characters. Mapping every character to
Dirk> their lowercase presentation is very useful (if you don't like it, you
Dirk> can use the binary representation), but ignoring the difference of
Dirk> "â", "à" "á", "ä", "ã", "æ", ... is fatal
> in my view.
>> Unfortunate this is not true here in Scandinavia. When you are
>> comparing and sorting things, many, but not all, of the accented
>> characters are to be compared as equal.
Dirk> In a linguistic manner this is correct (in germany "a" and "ä" are
Dirk> handled equal), but from a technical view it is not the same. The user
Dirk> should have the possibility to choose between "correct" indexing/sorting
Dirk> and linguistic based sorting. Other database vendors and even the OSes
Dirk> handled this in the same way, the default sorting method is the one I
You can get this done in MySQL by specifying BINARY for strings that
you want to be treated specially.
When sorting, you can then apply a function on the column that will
handle the sort order.
>> The big question is how much effort one should spend to solve the
>> uncommon case, if this will make things harder for everyone else.
>> I think personally that in most cases it's better to store the index
>> according to sorted order than in any other order.
Dirk> I agree. But what is the default sorted order?
This is depending the option to '--default-character-set' when you
start MySQL. In our binary distributions this is 'latin1' that is the
sort order we use in Sweden/Finland but that works (in most cases) for
To define a new sort/compare order is not hard, and you should be able
to do this within half an hour if you really want this.
(Without having to recompile MySQL).
>> The only major disadvantage with doing this is that one can't have
>> UNIQUE on such a index, but in most cases this isn't a problem.
Dirk> Hmmh, this is _exactly_ one of my problems. I'm forced to use "BINARY"
Dirk> as attribute, but I need to have each word in lowercase in the index.
Why do you need to have words in lowercase in the index ?
Have you considered using 'lower()' or 'upper()' in your WHERE/ORDER
BY clauses to fix the problems you have.
>> Sorry, but there is no way to solve this properly until 4.1 is out.
>> (We plan to start test-compiling binary versions of MySQL 4.0 this
Dirk> For this I wish you all the best, hopefully we you can solve this NLS
Dirk> problem. I suggest a new nls-sorting function, but it can be done in
Dirk> different ways.
If you have special needs, the fastest way to solve this is to just
define a new character set that fixes those issues you have with the
default 'latin1' character set.