List: General Discussion
From: Pooly  Date: July 3 2008 8:10pm
Subject: Re: convertion to utf-8
2008/7/1 Dan Nelson <dnelson@stripped>:
> In the last episode (Jun 30), Pooly said:
>> 2008/6/30 Dan Nelson <dnelson@stripped>:
>> > In the last episode (Jun 29), Pooly said:
>> >> Hi,
>> >>
>> >> I'm trying to convert my tables to UTF8 but I'm getting the
>> >> following error: ERROR 1062 (23000): Duplicate entry 'Zorglüb' for
>> >> key 1
>> >>
>> >> Not too sure why I'm getting this error since the current (latin1)
>> >> data are:
>> >>
>> >> mysql> select * from topics_lookup where label like 'Zor%';
>> >> +----------+----------+------+
>> >> | label    | topic_id | main |
>> >> +----------+----------+------+
>> >> | Zorglub  |       72 |    0 |
>> >> | Zorglüb  |       72 |    1 |
>> >> +----------+----------+------+
>> >> 2 rows in set (0.00 sec)
>> >>
>> >> There is a unique index on label, but the two values are different.
>> >>
>> >> Any ideas ?
>> >
>> > I can't reproduce this.  Can you provide example commands
>> > demonstrating your problem?
>>
>> Yes, sorry I should have been more precise in my email.
>>
>> mysql> select version();
>> +--------------------------+
>> | version()                |
>> +--------------------------+
>> | 5.0.32-Debian_7etch5-log |
>> +--------------------------+
>> 1 row in set (0.00 sec)
>>
>> create table mytable2 ( label varchar(200) primary key ) charset latin1;
>> insert into mytable2 values ('Zorglub'), ('Zorglüb');
>> alter table mytable2 convert to character set utf8 collate utf8_general_ci;
>>
>> this gives:
>> ERROR 1062 (23000): Duplicate entry 'Zorglüb' for key 1
>>
>> I tried to search the changelog and the bug tracking system, but
>> without much luck.
>
> MySQL's default collation is latin1_swedish_ci, which sorts ü along
> with y.  utf8_general_ci sorts it along with u:
>
> http://www.collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html
> http://www.collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html
>
> More reading:
>
> http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
>
>  ... To further illustrate, the following equalities hold in both
>  utf8_general_ci and utf8_unicode_ci (for the effect this has in
>  comparisons or when doing searches, see Section 9.1.5.6, "Examples of
>  the Effect of Collation"):
>
>  Ä = A
>  Ö = O
>  Ü = U
>

Thanks for the link and the detailed explanation. It's all clear now
with the collation, and I know what to do with my data.
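
For reference, here is a rough sketch of how the colliding values could be
found before running the conversion. It assumes the mytable2 example from
earlier in the thread. The alias names folded_label and n are made up for
illustration. The second statement is only one possible alternative, not
necessarily the right choice for every table.

```sql
-- Sketch: list labels that compare as equal under utf8_general_ci.
-- folded_label and n are illustrative alias names.
SELECT CONVERT(label USING utf8) COLLATE utf8_general_ci AS folded_label,
       COUNT(*) AS n
FROM mytable2
GROUP BY folded_label
HAVING n > 1;

-- Possible alternative: convert with a binary collation, which keeps
-- 'Zorglub' and 'Zorglüb' distinct. Note that utf8_bin also makes
-- comparisons case-sensitive.
ALTER TABLE mytable2 CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
```

Any row returned by the SELECT is a group of values that would violate the
unique key after converting to utf8_general_ci, so those rows need to be
merged or renamed first, unless a collation that distinguishes them is used.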
Cheers,