In the last episode (Jun 30), Pooly said:
> 2008/6/30 Dan Nelson <dnelson@stripped>:
> > In the last episode (Jun 29), Pooly said:
> >> Hi,
> >>
> >> I'm trying to convert my tables to UTF8 but I'm getting the
> >> following error: ERROR 1062 (23000): Duplicate entry 'Zorglüb' for
> >> key 1
> >>
> >> Not too sure why I'm getting this error since the current (latin1)
> >> data are:
> >>
> >> mysql> select * from topics_lookup where label like 'Zor%';
> >> +----------+----------+------+
> >> | label    | topic_id | main |
> >> +----------+----------+------+
> >> | Zorglub  |       72 |    0 |
> >> | Zorglüb  |       72 |    1 |
> >> +----------+----------+------+
> >> 2 rows in set (0.00 sec)
> >>
> >> There is a unique index on label, however the 2 data are different.
> >>
> >> Any ideas ?
> >
> > I can't reproduce this. Can you provide example commands
> > demonstrating your problem?
>
> Yes, sorry I should have been more precise in my email.
>
> mysql> select version();
> +--------------------------+
> | version()                |
> +--------------------------+
> | 5.0.32-Debian_7etch5-log |
> +--------------------------+
> 1 row in set (0.00 sec)
>
> create table mytable2 ( label varchar(200) primary key ) charset latin1;
> insert into mytable2 values ('Zorglub'), ('Zorglüb');
> alter table mytable2 convert to character set utf8 collate utf8_general_ci;
>
> this gives:
> ERROR 1062 (23000): Duplicate entry 'Zorglüb' for key 1
>
> I tried to search the changelog and the bug tracking system, but
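> without much luck.

MySQL's default collation is latin1_swedish_ci, which sorts ü along
with y, so your two labels compare as different there. utf8_general_ci
sorts ü along with u, so after the conversion 'Zorglub' and 'Zorglüb'
compare as equal and the primary key rejects the second one: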
http://www.collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html
http://www.collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html
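
If you want to see which rows will collide before running the ALTER, a
query along these lines should list them. This is only a sketch against
your mytable2 example; the aliases "folded" and "n" are names I made up:

```sql
-- Group the latin1 labels the way utf8_general_ci would compare them;
-- any group with more than one row would violate the unique key.
SELECT CONVERT(label USING utf8) COLLATE utf8_general_ci AS folded,
       COUNT(*) AS n
  FROM mytable2
 GROUP BY folded
HAVING n > 1;
```

For the two rows in your example I would expect a single group with
n = 2.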

More reading:

http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html

  ... To further illustrate, the following equalities hold in both
  utf8_general_ci and utf8_unicode_ci (for the effect this has in
  comparisons or when doing searches, see Section 9.1.5.6, "Examples of
  the Effect of Collation"):

    Ä = A
    Ö = O
    Ü = U

http://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html

  mysql> SELECT * FROM germanutf8 WHERE c = 'Bär';
  +------+
  | c    |
  +------+
  | Bar  |
  | Bär  |
  +------+

  ... This is not a bug but rather a consequence of the sorting that
  latin1_german1_ci or utf8_unicode_ci do (the sorting shown is done
  according to the German DIN 5007 standard).
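
If you need 'Zorglub' and 'Zorglüb' to stay distinct keys after the
conversion, one option (a sketch, not tested on your data) is a binary
collation, which compares by code point and so treats accented and
unaccented letters as different. The catch is that comparisons on the
column also become case-sensitive:

```sql
-- utf8_bin compares by code point, so ü and u are different keys,
-- but so are 'Zorglub' and 'zorglub'.
ALTER TABLE mytable2 CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
```
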
--
Dan Nelson
dnelson@stripped