List: General Discussion
From: Dan Nelson
Date: July 1 2008 1:37am
Subject: Re: convertion to utf-8
In the last episode (Jun 30), Pooly said:
> 2008/6/30 Dan Nelson <dnelson@stripped>:
> > In the last episode (Jun 29), Pooly said:
> >> Hi,
> >>
> >> I'm trying to convert my tables to UTF8 but I'm getting the
> >> following error: ERROR 1062 (23000): Duplicate entry 'Zorglüb' for
> >> key 1
> >>
> >> Not too sure why I'm getting this error since the current (latin1)
> >> data are:
> >>
> >> mysql> select * from topics_lookup where label like 'Zor%';
> >> +----------+----------+------+
> >> | label    | topic_id | main |
> >> +----------+----------+------+
> >> | Zorglub  |       72 |    0 |
> >> | Zorglüb  |       72 |    1 |
> >> +----------+----------+------+
> >> 2 rows in set (0.00 sec)
> >>
> >> There is a unique index on label, however the 2 data are different.
> >>
> >> Any ideas ?
> >
> > I can't reproduce this.  Can you provide example commands
> > demonstrating your problem?
> 
> Yes, sorry I should have been more precise in my email.
> 
> mysql> select version();
> +--------------------------+
> | version()                |
> +--------------------------+
> | 5.0.32-Debian_7etch5-log |
> +--------------------------+
> 1 row in set (0.00 sec)
> 
> create table mytable2 ( label varchar(200) primary key ) charset latin1;
> insert into mytable2 values ('Zorglub'), ('Zorglüb');
> alter table mytable2 convert to character set utf8 collate utf8_general_ci;
> 
> this gives:
> ERROR 1062 (23000): Duplicate entry 'Zorglüb' for key 1
> 
> I tried to search the changelog and the bug tracking system, but
> without much luck.

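MySQL's default collation is latin1_swedish_ci, which sorts ü along
with y, so your two labels are distinct there.  utf8_general_ci sorts
it along with u, so after the conversion 'Zorglub' and 'Zorglüb'
compare equal and collide in the primary key: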

http://www.collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html
http://www.collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html
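
If you want to see which rows would collide before running the ALTER,
something along these lines should work.  This is only a sketch using
the mytable2 table from your example; I have not run it on 5.0.32, and
it assumes your client encoding matches the connection character set.

```sql
-- Compare the two sample values under the target collation.
-- A result of 1 means utf8_general_ci treats them as equal.
SELECT CONVERT('Zorglub' USING utf8)
     = CONVERT('Zorglüb' USING utf8) COLLATE utf8_general_ci AS same_key;

-- List every group of labels that would become duplicates
-- once the column is compared with utf8_general_ci.
SELECT CONVERT(label USING utf8) COLLATE utf8_general_ci AS folded,
       COUNT(*) AS n
FROM mytable2
GROUP BY folded
HAVING n > 1;
```

Any row returned by the second query is a set of labels you would need
to merge or rename before the unique index can be rebuilt.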

More reading:

http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html

 ... To further illustrate, the following equalities hold in both
 utf8_general_ci and utf8_unicode_ci (for the effect this has in
 comparisons or when doing searches, see Section 9.1.5.6, "Examples of
 the Effect of Collation"):

  Ä = A
  Ö = O
  Ü = U

http://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html

  mysql> SELECT * FROM germanutf8 WHERE c = 'Bär';
  +------+
  | c    |
  +------+
  | Bar  |
  | Bär  |
  +------+

 ... This is not a bug but rather a consequence of the sorting that
 latin1_german1_ci or utf8_unicode_ci do (the sorting shown is done
 according to the German DIN 5007 standard).
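
If the two labels really do need to stay distinct, one possible
workaround is to convert to a collation that does not fold accents.
The sketch below uses utf8_bin, which compares by code point; note that
this also makes comparisons on the column case-sensitive, which may or
may not be what you want for your lookups.

```sql
-- Convert the table to utf8 with a binary collation, so that
-- 'Zorglub' and 'Zorglüb' remain different primary key values.
-- Side effect: 'zorglub' and 'Zorglub' are also treated as different.
ALTER TABLE mytable2 CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
```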

-- 
	Dan Nelson
	dnelson@stripped