List:General Discussion« Previous MessageNext Message »
From:Johan De Meersman Date:August 27 2012 2:42pm
Subject:MySQL, UTF8 and collations
View as plain text  
Hey, 

We're upgrading MySQLs from 5.0 to 5.5, and running into the predictable utf8_general_ci
bu^Wwe-meant-to-do-that with german ß and similar characters. The server swallowed
the existing datafiles (thank god for that) so we're up and running, but "check table"
does whine about needing a full rebuild, which is clearly that issue. 

Now, I'm aware that this should be fixed by using the utf8_unicode_ci collation; but
somehow it isn't... 

The its-not-a-bug-but-heres-the-fix collation utf8_general_mysql500_ci does work, but
honestly I feel that that's kind of dirty. 

Any ideas why utf8_unicode_ci doesn't work as expected? The new server version is
5.5.24-1~dotdeb.1-log. 

Thx, 
Johan 


mysql> set names UTF8; 
Query OK, 0 rows affected (0.00 sec) 

mysql> show create table search_index; 
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Table | Create Table | 
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| search_index | CREATE TABLE `search_index` ( 
`word` varchar(50) NOT NULL DEFAULT '', 
`sid` int(10) unsigned NOT NULL DEFAULT '0', 
`type` varchar(16) DEFAULT NULL, 
`score` float DEFAULT NULL, 
UNIQUE KEY `word_sid_type` (`word`,`sid`,`type`), 
KEY `sid_type` (`sid`,`type`), 
KEY `word` (`word`) 
) ENGINE=InnoDB DEFAULT CHARSET=utf8 | 
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec) 

mysql> show table status like "search_index"; 
+--------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+

| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length |
Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time |
Check_time | Collation | Checksum | Create_options | Comment | 
+--------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+

| search_index | InnoDB | 10 | Compact | 6893296 | 58 | 405536768 | 0 | 725172224 |
255852544 | NULL | 2012-08-27 14:53:20 | NULL | NULL | utf8_general_ci | NULL | | | 
+--------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+

1 row in set (0.07 sec) 

mysql> CREATE TABLE `search_johan` ( `word` varchar(50) NOT NULL DEFAULT '', `count`
> float DEFAULT NULL, PRIMARY KEY (`word`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8
> collate=utf8_unicode_ci; 
Query OK, 0 rows affected (0.00 sec) 

mysql> insert into search_johan select * from search_total; 
ERROR 1062 (23000): Duplicate entry 'cœur' for key 'PRIMARY' 




-- 

Linux Bier Wanderung 2012, now also available in Belgium! 
August, 12 to 19, Diksmuide, Belgium - http://lbw2012.tuxera.be 

Thread
MySQL, UTF8 and collationsJohan De Meersman27 Aug
  • RE: MySQL, UTF8 and collationsRick James27 Aug
    • Re: MySQL, UTF8 and collationsJohan De Meersman28 Aug
      • RE: MySQL, UTF8 and collationsRick James28 Aug
      • Re: MySQL, UTF8 and collationsShawn Green28 Aug