Author: shinz
Date: 2006-06-26 18:07:34 +0200 (Mon, 26 Jun 2006)
New Revision: 2517
Log:
Attached PeterG's CJK FAQ to the Charset chapter (the actual FAQ was missing in my last commit ;-)
Added:
trunk/refman-common/cjk-faq.en.xml
Added: trunk/refman-common/cjk-faq.en.xml
===================================================================
--- trunk/refman-common/cjk-faq.en.xml (rev 0)
+++ trunk/refman-common/cjk-faq.en.xml 2006-06-26 16:07:34 UTC (rev 2517)
@@ -0,0 +1,1353 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
+ "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"
+ [
+ <!ENTITY % fixedchars.entities SYSTEM "fixedchars.ent">
+ %fixedchars.entities;
+ <!ENTITY % title.entities SYSTEM "titles.en.ent">
+ %title.entities;
+ ]>
+<section id="cjk-faq">
+
+ <title>&title-cjk-faq;</title>
+
+ <indexterm type="concept">
+ <primary>CJK</primary>
+ <secondary>FAQ</secondary>
+ </indexterm>
+
+ <indexterm type="concept">
+ <primary>Chinese, Japanese, Korean character sets</primary>
+ <secondary>frequently asked questions</secondary>
+ </indexterm>
+
+ <indexterm type="concept">
+ <primary>Japanese, Korean, Chinese character sets</primary>
+ <secondary>frequently asked questions</secondary>
+ </indexterm>
+
+ <indexterm type="concept">
+ <primary>Korean, Chinese, Japanese character sets</primary>
+ <secondary>frequently asked questions</secondary>
+ </indexterm>
+
+ <para>
+ This Frequently Asked Questions section draws on the experience
+ of MySQL's Support and Development groups in handling many
+ enquiries about CJK (Chinese, Japanese, Korean) issues.
+ </para>
+
+<!-- This list can be removed if TOC shows up correctly
+ <para>
+ Contents:
+
+ <itemizedlist>
+
+ <listitem>
+ <para></para>
+ </listitem>
+
+ <listitem>
+ <para>
+ SELECT shows non-Latin characters as "?"s. Why?
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Troubles with GB character sets (Chinese)
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Troubles with Big5 character set (Chinese)
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Troubles with character-set conversions (Japanese)
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The Great Yen Sign Problem (Japanese)
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Troubles with euckr character set (Korean)
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The "Data truncated" message
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Troubles with Access (or Perl) (or PHP) (etc.)
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ How can I get old MySQL-4.0 behaviour back?
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Why do some LIKE and FULLTEXT searches fail?
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ What CJK character sets are available?
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Is character X available in all character sets?
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Strings Don't Sort Correctly in Unicode (I)
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Strings Don't Sort Correctly in Unicode (II)
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ My supplementary characters get rejected
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Shouldn't it be CJKV (V for Vietnamese)?
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Will MySQL fix any CJK problems in version 5.1?
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ When will MySQL translate the manual again?
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Whom can I talk to?
+ </para>
+ </listitem>
+
+ </itemizedlist>
+ </para>
+-->
+
+ <section id="cjk-faq-question-marks">
+
+ <title>SELECT shows non-Latin characters as "?"s. Why?</title>
+
+ <para>
+ You inserted CJK characters with <literal>INSERT</literal>, but
+ when you do a <literal>SELECT</literal>, they all look like
+ <quote>?</quote>. The cause is usually a setting in MySQL that
+ doesn't match the settings for the application program or the
+ operating system. These are common troubleshooting steps:
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Find out: what version do you have? The statement
+ <literal>SELECT VERSION();</literal> will tell you. This FAQ
+ is for MySQL version 5, so some of the answers here will not
+ apply to you if you have version 4.0 or 4.1.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Find out: what character set is the database column really
+ in? Too frequently, people think that the character set will
+ be the same as the server's set (false), or the set used for
+ display purposes (false). Make sure, by saying <literal>SHOW
+ CREATE TABLE tablename</literal>, or better yet by saying
+ this:
+
+<programlisting>
+ SELECT character_set_name, collation_name
+ FROM information_schema.columns
+ WHERE table_schema = 'your_database_name'
+ AND table_name = 'your_table_name'
+ AND column_name = 'your_column_name';
+ </programlisting>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Find out: what is the hexadecimal value?
+
+<programlisting>
+ SELECT HEX(your_column_name)
+ FROM your_table_name;
+ </programlisting>
+
+ If you see <literal>3F</literal>, then that really is the
+ encoding for <literal>?</literal>, so no wonder you see
+ <quote>?</quote>. Probably this happened because of a
+ problem converting a particular character from your client
+ character set to the target character set.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Find out: is a literal round trip possible, that is, if you
+ select <quote>literal</quote> (or <quote>_introducer
+ hexadecimal-value</quote>) do you get <quote>literal</quote>
+ as a result? For example, with the Japanese Katakana Letter
+ Pe, which looks like <literal>ペ</literal>, which exists in
+ all CJK character sets, and which has the code point value
+ (hexadecimal coding) <literal>0x30da</literal>, enter:
+
+<programlisting>
+SELECT 'ペ' AS `ペ`; /* or SELECT _ucs2 0x30DA; */
+</programlisting>
+
+ If the result doesn't look like <literal>ペ</literal>, a
+ round trip failed. For bug reports, we might ask people to
+ follow up with <literal>SELECT hex('ペ');</literal>. Then
+ we can see whether the client encoding is right.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Find out: is it the browser or application? Just use
+ <command>mysql</command> (the MySQL client program, which on
+ Windows will be <command>mysql.exe</command>). If
+ <command>mysql</command> displays correctly but your
+ application doesn't, then your problem is probably
+ <quote>Settings</quote>, but consult also the question about
+ <quote>Troubles with Access (or Perl) (or PHP)
+ (etc.)</quote> much later in this FAQ.
+ </para>
+
+ <para>
+ To find your settings, the statement you need here is
+ <literal>SHOW VARIABLES</literal>. For example:
+
+<programlisting>
+mysql> <userinput>SHOW VARIABLES LIKE 'char%';</userinput>
++--------------------------+----------------------------------------+
+| Variable_name | Value |
++--------------------------+----------------------------------------+
+| character_set_client | utf8 |
+| character_set_connection | utf8 |
+| character_set_database | latin1 |
+| character_set_filesystem | binary |
+| character_set_results | utf8 |
+| character_set_server | latin1 |
+| character_set_system | utf8 |
+| character_sets_dir | /usr/local/mysql/share/mysql/charsets/ |
++--------------------------+----------------------------------------+
+8 rows in set (0.03 sec)
+</programlisting>
+
+ The above are typical character-set settings for an
+ international-oriented client (notice the use of
+ <literal>utf8</literal> Unicode) connected to a server in
+ the West (<literal>latin1</literal> is a West Europe
+ character set and a default for MySQL).
+ </para>
+
+ <para>
+ Although Unicode (usually the <literal>utf8</literal>
+ variant on Unix, usually the <literal>ucs2</literal> variant
+ on Windows) is better than <literal>latin1</literal>, it's often
+ not what your operating system utilities support best. Many
+ Windows users find that a Microsoft character set, such as
+ <literal>cp932</literal> for Japanese Windows, is what's
+ suitable.
+ </para>
+
+ <para>
+ If you can't control the server settings, and you have no
+ idea what your underlying computer is about, then try
+ changing to a common character set for the country that
+ you're in (<literal>euckr</literal> = Korea,
+ <literal>gb2312</literal> or <literal>gbk</literal> =
+ People's Republic of China, <literal>big5</literal> = other
+ China, <literal>sjis</literal> or <literal>ujis</literal> or
+ <literal>cp932</literal> or <literal>eucjpms</literal> =
+ Japan, <literal>ucs2</literal> or <literal>utf8</literal> =
+ anywhere). Usually it is only necessary to change the client
+ and connection and results settings, and there is a simple
+ statement which changes all three at once, namely
+ <literal>SET NAMES</literal>. For example:
+
+<programlisting>
+SET NAMES 'big5';
+</programlisting>
+
+ Once you get the correct setting, you can make it permanent
+ by editing <filename>my.cnf</filename> or
+ <filename>my.ini</filename>. For example you might add lines
+ looking like this:
+
+<programlisting>
+[mysqld]
+character-set-server=big5
+[client]
+default-character-set=big5
+</programlisting>
+ </para>
+ </listitem>
+
+ </itemizedlist>
+ </para>
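+
+ <para>
+ As a sketch of what <literal>SET NAMES</literal> does for the
+ current connection, the single statement is equivalent to
+ setting the three session variables one by one. Here
+ <literal>big5</literal> is only the example value used above;
+ substitute the character set that suits your client.
+
+<programlisting>
+-- Equivalent to: SET NAMES 'big5';
+SET character_set_client = big5;
+SET character_set_results = big5;
+SET character_set_connection = big5;
+
+-- Confirm that the three session variables changed.
+SHOW VARIABLES LIKE 'character_set_%';
+</programlisting>
+
+ These settings last only until the connection closes, so an
+ application has to issue them again after each reconnect.
+ </para>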
+
+ </section>
+
+ <section id="cjk-faq-gb-charset-problems">
+
+ <title>Troubles with GB character sets (Chinese)</title>
+
+ <para>
+ <remark role="update">
+ [SH] References to d.udm.net (Bar's pages) need to be changed
+ once we've moved those pages to the Reference Manual.
+ </remark>
+
+ MySQL supports the two common variants of the GB (<quote>Guojia
+ Biaozhun</quote> or <quote>National Standard</quote>) character
+ sets which are official in the People's Republic of China:
+ <literal>gb2312</literal> and <literal>gbk</literal>. Sometimes
+ people try to insert <literal>gbk</literal> characters into
+ <literal>gb2312</literal>, and it works most of the time because
+ <literal>gbk</literal> is a superset of <literal>gb2312</literal>.
+ But eventually they try to insert a rarer Chinese character and it
+ doesn't work. (Example: bug #16072 in our bugs database,
+ <ulink url="http://bugs.mysql.com/bug.php?id=16072"/>). So we'll
+ try to clarify here exactly what characters are legitimate in
+ <literal>gb2312</literal> or <literal>gbk</literal>, with
+ reference to the official documents. Please check these references
+ before reporting <literal>gb2312</literal> or
+ <literal>gbk</literal> bugs. We now have a graphic listing of the
+ <literal>gb2312</literal> characters, currently on the site of Mr
+ Alexander Barkov (MySQL's principal programmer for character set
+ issues). The chart is in order according to the
+ <literal>gb2312_chinese_ci</literal> collation:
+ <ulink url="http://d.udm.net/bar/~bar/charts/gb2312_chinese_ci.html"/>.
+ MySQL's <literal>gbk</literal> is in reality <quote>Microsoft code
+ page 936</quote>. This differs from the official
+ <literal>gbk</literal> for characters <literal>A1A4</literal>
+ (middle dot), <literal>A1AA</literal> (em dash),
+ <literal>A6E0-A6F5</literal>, and <literal>A8BB-A8C0</literal>.
+ For a listing of the differences, see
+ <ulink url="http://recode.progiciels-bpi.ca/showfile.html?name=dist/libiconv/gbk.h"/>.
+ For a listing of gbk/Unicode mappings, see
+ <ulink url="http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT"/>.
+ For MySQL's listing of gbk characters, see
+ <ulink url="http://d.udm.net/bar/~bar/charts/gbk_chinese_ci.html"/>.
+ </para>
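+
+ <para>
+ To check a particular character yourself before reporting a bug,
+ a conversion probe along these lines may help. This is a minimal
+ sketch: the code point <literal>0x6C4C</literal> is only an
+ example value, and a result of <literal>3F</literal> in a column
+ means that MySQL found no mapping for the character in that
+ character set.
+
+<programlisting>
+-- Convert one UCS-2 code point to each GB character set and show the bytes.
+SELECT HEX(CONVERT(_ucs2 0x6C4C USING gb2312)) AS in_gb2312,
+       HEX(CONVERT(_ucs2 0x6C4C USING gbk)) AS in_gbk;
+</programlisting>
+ </para>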
+
+ </section>
+
+ <section id="cjk-faq-big5-charset-problems">
+
+ <title>Troubles with big5 character set (Chinese)</title>
+
+ <para>
+ MySQL supports the Big5 character set which is common in Hong Kong
+ and the Republic of China (Taiwan). MySQL's
+ <literal>big5</literal> is in reality <quote>Microsoft code page
+ 950</quote>, which is very similar to the original
+ <literal>big5</literal> character set. This is a recent change,
+ starting with MySQL version 4.1.16 / 5.0.16. We made the change as
+ a result of a bug report, bug #12476 in our bugs database,
+ <ulink url="http://bugs.mysql.com/bug.php?id=12476"/> (title:
+ <quote>Some big5 codes are still missing ...</quote>). For
+ example, the following statements work in the current version of
+ MySQL, but not in old versions:
+
+<programlisting>
+mysql> <userinput>create table big5 (big5 char(1) character set big5);</userinput>
+Query OK, 0 rows affected (0.13 sec)
+
+mysql> <userinput>insert into big5 values (0xf9dc);</userinput>
+Query OK, 1 row affected (0.00 sec)
+
+mysql> <userinput>select * from big5;</userinput>
++------+
+| big5 |
++------+
+| 嫺 |
++------+
+1 row in set (0.02 sec)
+</programlisting>
+
+ There is a feature request for adding HKSCS extensions (bug #13577
+ in our bugs database,
+ <ulink url="http://bugs.mysql.com/bug.php?id=13577"/>). People who
+ need the extension may find the suggested patch for bug #13577
+ of interest.
+ </para>
+
+ </section>
+
+ <section id="cjk-faq-charset-conversion-problems">
+
+ <title>Troubles with character-set conversions (Japanese)</title>
+
+ <para>
+ MySQL supports the <literal>sjis</literal>,
+ <literal>ujis</literal>, <literal>cp932</literal>, and <literal>eucjpms</literal>
+ character sets, as well as Unicode. A common need is to convert
+ between character sets. For example, there might be a Unix server
+ (typically with <literal>sjis</literal> or
+ <literal>ujis</literal>) and a Windows client (typically with
+ <literal>cp932</literal>). But conversions can seem to fail.
+ Here's why. In this conversion table, the <literal>ucs2</literal>
+ column is the source, and the
+ <literal>sjis</literal>/<literal>cp932</literal>/<literal>ujis</literal>/<literal>eucjpms</literal>
+ columns are the destination, that is, what the hexadecimal result
+ would be if we used <literal>CONVERT(ucs2)</literal> or if we
+ assigned a <literal>ucs2</literal> column containing the value to
+ an
+ <literal>sjis</literal>/<literal>cp932</literal>/<literal>ujis</literal>/<literal>eucjpms</literal>
+ column.
+
+<programlisting>
+character name             ucs2  sjis  cp932  ujis    eucjpms
+--------------             ----  ----  -----  ----    -------
+
+BROKEN BAR                 00A6  3F    3F     8FA2C3  3F
+FULLWIDTH BROKEN BAR       FFE4  3F    FA55   3F      8FA2
+
+YEN SIGN                   00A5  3F    3F     20      3F
+FULLWIDTH YEN SIGN         FFE5  818F  818F   A1EF    3F
+
+TILDE                      007E  7E    7E     7E      7E
+OVERLINE                   203E  3F    3F     20      3F
+
+HORIZONTAL BAR             2015  815C  815C   A1BD    A1BD
+EM DASH                    2014  3F    3F     3F      3F
+
+REVERSE SOLIDUS            005C  815F  5C     5C      5C
+FULLWIDTH REVERSE SOLIDUS  FF3C  3F    815F   3F      A1C0
+
+WAVE DASH                  301C  8160  3F     A1C1    3F
+FULLWIDTH TILDE            FF5E  3F    8160   3F      A1C1
+
+DOUBLE VERTICAL LINE       2016  8161  3F     A1C2    3F
+PARALLEL TO                2225  3F    8161   3F      A1C2
+
+MINUS SIGN                 2212  817C  3F     A1DD    3F
+FULLWIDTH HYPHEN-MINUS     FF0D  3F    817C   3F      A1DD
+
+CENT SIGN                  00A2  8191  3F     A1F1    3F
+FULLWIDTH CENT SIGN        FFE0  3F    8191   3F      A1F1
+
+POUND SIGN                 00A3  8192  3F     A1F2    3F
+FULLWIDTH POUND SIGN       FFE1  3F    8192   3F      A1F2
+
+NOT SIGN                   00AC  81CA  3F     A2CC    3F
+FULLWIDTH NOT SIGN         FFE2  3F    81CA   3F      A2CC
+</programlisting>
+
+ For example, consider this extract from the table:
+
+<programlisting>
+                           ucs2  sjis  cp932
+                           ----  ----  -----
+NOT SIGN                   00AC  81CA  3F
+FULLWIDTH NOT SIGN         FFE2  3F    81CA
+</programlisting>
+
+ It means <quote>for NOT SIGN, which is Unicode U+00AC, MySQL
+ converts to sjis code point 0x81CA and to cp932 code point
+ 0x3F</quote>. (<literal>3F</literal> is the question mark
+ (<quote>?</quote>) and is what we always use when we can't
+ convert.) Now, what should we do if we want to convert
+ <literal>sjis 81CA</literal> to <literal>cp932</literal>? Our
+ answer is: <quote>?</quote>. There are serious complaints about
+ this: many people would prefer a <quote>loose</quote> conversion,
+ so that <literal>81CA (NOT SIGN)</literal> in
+ <literal>sjis</literal> becomes <literal>81CA (FULLWIDTH NOT
+ SIGN)</literal> in <literal>cp932</literal>. We are considering
+ a change to this behaviour.
+ </para>
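+
+ <para>
+ Any row of the table can be checked with
+ <literal>CONVERT(... USING ...)</literal>. The following is a
+ sketch for NOT SIGN only; the values to expect are the ones in
+ the table above, and they may differ on other server versions.
+
+<programlisting>
+-- U+00AC NOT SIGN converted from ucs2 to each Japanese character set.
+SELECT HEX(CONVERT(_ucs2 0x00AC USING sjis)) AS sjis,
+       HEX(CONVERT(_ucs2 0x00AC USING cp932)) AS cp932,
+       HEX(CONVERT(_ucs2 0x00AC USING ujis)) AS ujis,
+       HEX(CONVERT(_ucs2 0x00AC USING eucjpms)) AS eucjpms;
+
+-- The strict sjis-to-cp932 conversion discussed above.
+SELECT HEX(CONVERT(_sjis 0x81CA USING cp932)) AS sjis_to_cp932;
+</programlisting>
+ </para>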
+
+ </section>
+
+ <section id="cjk-faq-great-yen-sign-problem">
+
+ <title>The Great Yen Sign Problem (Japanese)</title>
+
+ <para>
+ In SJIS the code for Yen Sign (<literal>¥</literal>) is
+ <literal>5C</literal>. In SJIS the code for Reverse Solidus
+ (<literal>\</literal>) is <literal>5C</literal>. Since the above
+ statements are contradictory, confusion often results. Well, to
+ put it more seriously, some versions of Japanese character sets
+ (both <literal>sjis</literal> and <literal>euc</literal>) have
+ treated <literal>5C</literal> as a reverse solidus, also known as
+ a backslash, and others have treated it as a yen sign. There's
+ nothing we can do, except take sides: MySQL follows only one
+ version of the JIS (Japanese Industrial Standards) standard
+ description, and <emphasis>5C is Reverse Solidus</emphasis>,
+ always. Should we make a separate character set where
+ <literal>5C</literal> is Yen Sign, as another DBMS (Oracle) does?
+ We haven't decided. Certainly not in version 5.1 or 5.2. But if
+ people keep complaining about The Great Yen Sign Problem, that's
+ one possible solution.
+ </para>
+
+ </section>
+
+ <section id="cjk-faq-euckr-charset-problems">
+
+ <title>Troubles with euckr character set (Korean)</title>
+
+ <para>
+ MySQL supports the <literal>euckr</literal> (Extended Unix Code
+ Korea) character set which is common in South Korea. In theory,
+ problems could arise because there have been several versions of
+ this character set. So far, only one problem has been noted, for
+ Korea's currency symbol. We use the <quote>ASCII</quote> variant
+ of EUC-KR, in which the code point <literal>0x5c</literal> is
+ REVERSE SOLIDUS, that is <literal>\</literal>, instead of the
+ <quote>KS-Roman</quote> variant of EUC-KR, in which the code point
+ <literal>0x5c</literal> is WON SIGN, that is <quote>₩</quote>.
+ You can't convert Unicode <literal>U+20A9</literal> WON SIGN to
+ <literal>euckr</literal>:
+
+<programlisting>
+mysql> <userinput>SELECT CONVERT('₩' USING euckr) AS euckr,</userinput>
+-> <userinput>HEX(CONVERT('₩' USING euckr)) AS hexeuckr;</userinput>
++-------+----------+
+| euckr | hexeuckr |
++-------+----------+
+| ? | 3F |
++-------+----------+
+1 row in set (0.00 sec)
+</programlisting>
+
+ MySQL's graphic Korean chart is here:
+ <ulink url="http://d.udm.net/bar/~bar/charts/euckr_korean_ci.html"/>.
+ </para>
+
+ </section>
+
+ <section id="cjk-faq-data-truncated">
+
+ <title>The <quote>Data truncated</quote> message</title>
+
+ <para>
+ For illustration, we'll make a table with one Unicode
+ (<literal>ucs2</literal>) column and one Chinese
+ (<literal>gb2312</literal>) column.
+
+<programlisting>
+mysql> <userinput>CREATE TABLE ch</userinput>
+ -> <userinput>(ucs2 CHAR(3) CHARACTER SET ucs2,</userinput>
+ -> <userinput>gb2312 CHAR(3) CHARACTER SET gb2312);</userinput>
+Query OK, 0 rows affected (0.05 sec)
+</programlisting>
+
+ We'll try to place the rare character <literal>汌</literal> in
+ both columns.
+
+<programlisting>
+mysql> <userinput>INSERT INTO ch VALUES ('A汌B','A汌B');</userinput>
+Query OK, 1 row affected, 1 warning (0.00 sec)
+</programlisting>
+
+ Ah, there's a warning. Let's see what it is.
+
+<programlisting>
+mysql> <userinput>SHOW WARNINGS;</userinput>
++---------+------+---------------------------------------------+
+| Level | Code | Message |
++---------+------+---------------------------------------------+
+| Warning | 1265 | Data truncated for column 'gb2312' at row 1 |
++---------+------+---------------------------------------------+
+1 row in set (0.00 sec)
+</programlisting>
+
+ So it's a warning about the gb2312 column only.
+
+<programlisting>
+mysql> <userinput>SELECT ucs2,HEX(ucs2),gb2312,HEX(gb2312) FROM ch;</userinput>
++-------+--------------+--------+-------------+
+| ucs2 | HEX(ucs2) | gb2312 | HEX(gb2312) |
++-------+--------------+--------+-------------+
+| A汌B | 00416C4C0042 | A?B | 413F42 |
++-------+--------------+--------+-------------+
+1 row in set (0.00 sec)
+</programlisting>
+
+ There are several things that need explanation here.
+
+ <orderedlist>
+
+ <listitem>
+ <para>
+ The fact that it's a <quote>warning</quote> rather than an
+ <quote>error</quote> is characteristic of MySQL. We like to
+ try to do what we can, to get the best fit, rather than give
+ up.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The <literal>汌</literal> character isn't in the
+ <literal>gb2312</literal> character set. We described that
+ problem earlier.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Admittedly the message is misleading. We didn't
+ <quote>truncate</quote> in this case; we replaced the
+ character with a question mark. We've had a complaint about
+ this message (bug #9337). But until we come up with something
+ better, just accept that error/warning code 1265 can mean a
+ variety of things.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ With <literal>SQL_MODE=TRADITIONAL</literal>, there would be
+ an error message, but instead of error 1265 you would see:
+ <literal>ERROR 1406 (22001): Data too long for column
+ 'gb2312' at row 1</literal>.
+ </para>
+ </listitem>
+
+ </orderedlist>
+ </para>
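+
+ <para>
+ If you would rather have the statement fail than store a
+ question mark, the strict-mode case from the last list item can
+ be tried as follows. This is a sketch that reuses the
+ <literal>ch</literal> table created above and changes the mode
+ for the current session only; the exact error text may vary
+ between versions.
+
+<programlisting>
+-- Turn on strict behaviour for this session only.
+SET SESSION sql_mode = 'TRADITIONAL';
+
+-- The same INSERT should now be rejected instead of producing a warning.
+INSERT INTO ch VALUES ('A汌B','A汌B');
+
+-- The row count should be unchanged if the INSERT was rejected.
+SELECT COUNT(*) FROM ch;
+</programlisting>
+ </para>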
+
+ </section>
+
+ <section id="cjk-faq-access-perl-php-troubles">
+
+ <title>Troubles with Access, Perl, PHP, etc.</title>
+
+ <para>
+ You can't get things to look right with your special program for a
+ GUI front end or browser? Get a direct connection to the server
+ (with <command>mysql</command> on Unix or with
+ <command>mysql.exe</command> on Windows) and try the same query
+ there. If <command>mysql</command> is okay, then the trouble is
+ probably that your application interface needs some initializing. Use
+ <command>mysql</command> to tell you what character set(s) it
+ uses, by saying <literal>SHOW VARIABLES LIKE 'char%';</literal>.
+ If it's Access, you're probably connecting with MyODBC. So you'll
+ want to check out the Reference Manual page for
+ <xref linkend="myodbc-configuration-dsn-windows"/>, and pay
+ attention particularly to the illustrations for <quote>SQL command
+ on connect</quote>. You should enter <literal>SET NAMES
+ 'big5'</literal> (supposing that you use <literal>big5</literal>)
+ (you don't need a <literal>;</literal> here). If it's ASP, you
+ might need to add <literal>SET NAMES</literal> in the code. Here
+ is an example that has worked in the past:
+
+<programlisting>
+&lt;%
+Session.CodePage=0
+Dim strConnection
+Dim Conn
+strConnection="driver={MySQL ODBC 3.51 Driver};server=yourserver;uid=yourusername;pwd=yourpassword;database=yourdatabase;stmt=SET NAMES 'big5';"
+Set Conn = Server.CreateObject("ADODB.Connection")
+Conn.Open strConnection
+%>
+</programlisting>
+
+ If it's PHP, here's a slightly different user suggestion:
+
+<programlisting>
+<?php
+ $link = mysql_connect($host,$usr,$pwd);
+ mysql_select_db($db);
+ if (mysql_error()) { print "Database ERROR: " . mysql_error(); }
+ mysql_query("SET CHARACTER SET utf8", $link);
+ mysql_query("SET NAMES 'utf8'", $link);
+?>
+</programlisting>
+
+ In this case, the tipper used a <literal>SET CHARACTER SET</literal>
+ statement to change <literal>character_set_client</literal> and
+ <literal>character_set_results</literal>, and used <literal>SET
+ NAMES</literal> to change <literal>character_set_client</literal>,
+ <literal>character_set_connection</literal>, and
+ <literal>character_set_results</literal>. (Incidentally, MySQL
+ people encourage the use of the <literal>mysqli</literal>
+ extension, rather than the <literal>mysql</literal> extension that
+ this example uses.) Another thing to check with PHP is the browser
+ assumptions. Sometimes a meta tag change in the heading area
+ suffices, for example: <literal>&lt;meta http-equiv="Content-Type"
+ content="text/html; charset=utf-8"&gt;</literal>
+ </para>
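+
+ <para>
+ The two statements in the PHP example are not interchangeable.
+ As a sketch of the difference, <literal>SET CHARACTER
+ SET</literal> takes the connection collation from the default
+ database rather than from the character set you name, so it can
+ leave the connection in a different character set from the
+ client. The expansion below follows the Reference Manual's
+ description; <literal>utf8</literal> is only the value used in
+ the example.
+
+<programlisting>
+-- SET CHARACTER SET utf8 is documented as shorthand for:
+SET character_set_client = utf8;
+SET character_set_results = utf8;
+SET collation_connection = @@collation_database;
+</programlisting>
+
+ Because the example issues <literal>SET NAMES</literal> last,
+ the connection character set ends up as <literal>utf8</literal>
+ as well.
+ </para>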
+
+ </section>
+
+ <section id="cjk-faq-restore-mysql40-behavior">
+
+ <title>How can I get old MySQL 4.0 behaviour back?</title>
+
+ <para>
+ In the old days, with MySQL Version 4.0, there was a single
+ <quote>global</quote> character set for both server and client
+ sides, and the decision was made by the server administrator. We
+ changed that starting with MySQL Version 4.1. What happens now is
+ a <quote>handshake</quote>. The MySQL Reference Manual describes
+ it thus:
+
+ <blockquote>
+
+ <para>
+ When a client connects, it sends to the server the name of the
+ character set that it wants to use. The server uses the name
+ to set the <literal>character_set_client</literal>,
+ <literal>character_set_results</literal>, and
+ <literal>character_set_connection</literal> system variables.
+ In effect, the server performs a <literal>SET NAMES</literal>
+ operation using the character set name.
+ </para>
+
+ </blockquote>
+
+ The effect of this is: you can't control the client character set
+ by saying <literal>mysqld --character-set-server=utf8</literal>.
+ But some Asian customers said that they don't like that; they want
+ the MySQL 4.0 behaviour. So we added a <command>mysqld</command>
+ switch, <option>--character-set-client-handshake</option>, which
+ (and this is the interesting part) can be turned off with
+ <option>--skip-character-set-client-handshake</option>. If you
+ start <command>mysqld</command> with
+ <option>--skip-character-set-client-handshake</option>, then the
+ behaviour is like this: When a client connects, it sends to the
+ server the name of the character set that it wants to use. The
+ server ignores it! Here is an illustration with the handshake
+ switch on or off. Pretend that your favourite server character set
+ is <literal>latin1</literal> (of course that's unlikely in a CJK
+ area but it's MySQL's default if there's no
+ <filename>my.ini</filename> or <filename>my.cnf</filename> file).
+ Pretend that the client operates with <literal>utf8</literal>
+ because that's what the client's operating system supports. Start
+ the server with a default character set,
+ <literal>latin1</literal>:
+
+<programlisting>
+mysqld --character-set-server=latin1
+</programlisting>
+
+ Start the client with a default character set,
+ <literal>utf8</literal>:
+
+<programlisting>
+mysql --default-character-set=utf8
+</programlisting>
+
+ Show what the current settings are:
+
+<programlisting>
+mysql> <userinput>SHOW VARIABLES LIKE 'char%';</userinput>
++--------------------------+----------------------------------------+
+| Variable_name | Value |
++--------------------------+----------------------------------------+
+| character_set_client | utf8 |
+| character_set_connection | utf8 |
+| character_set_database | latin1 |
+| character_set_filesystem | binary |
+| character_set_results | utf8 |
+| character_set_server | latin1 |
+| character_set_system | utf8 |
+| character_sets_dir | /usr/local/mysql/share/mysql/charsets/ |
++--------------------------+----------------------------------------+
+8 rows in set (0.01 sec)
+</programlisting>
+
+ Stop the client. Stop the server with
+ <command>mysqladmin</command>. Start the server again but this
+ time say <quote>skip the handshake</quote>:
+
+<programlisting>
+mysqld --character-set-server=latin1 --skip-character-set-client-handshake
+</programlisting>
+
+ Start the client with a default character set,
+ <literal>utf8</literal>, again. Show what the current settings
+ are, again:
+
+<programlisting>
+mysql> <userinput>SHOW VARIABLES LIKE 'char%';</userinput>
++--------------------------+----------------------------------------+
+| Variable_name | Value |
++--------------------------+----------------------------------------+
+| character_set_client | latin1 |
+| character_set_connection | latin1 |
+| character_set_database | latin1 |
+| character_set_filesystem | binary |
+| character_set_results | latin1 |
+| character_set_server | latin1 |
+| character_set_system | utf8 |
+| character_sets_dir | /usr/local/mysql/share/mysql/charsets/ |
++--------------------------+----------------------------------------+
+8 rows in set (0.01 sec)
+</programlisting>
+
+ As you can see by comparing the <literal>SHOW VARIABLES</literal>
+ results, the server ignores the client's initial settings if the
+ <option>--skip-character-set-client-handshake</option> option is used.
+ </para>
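+
+ <para>
+ A more compact way to compare the two runs is to select the
+ relevant session variables directly. This is only a sketch; the
+ column aliases are arbitrary names chosen for readability.
+
+<programlisting>
+-- One-row summary of the character sets in effect for this connection.
+SELECT @@character_set_client AS client_cs,
+       @@character_set_connection AS connection_cs,
+       @@character_set_results AS results_cs,
+       @@character_set_server AS server_cs;
+</programlisting>
+
+ With the handshake skipped, the first three columns should
+ match the server column regardless of what the client asked for.
+ </para>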
+
+ </section>
+
+ <section id="cjk-faq-fulltext-searches">
+
+ <title>Why do some LIKE and FULLTEXT searches fail?</title>
+
+ <para>
+ There is a simple problem with <literal>LIKE</literal> searches on
+ <literal>BINARY</literal> and <literal>BLOB</literal> columns: we
+ need to know the end of a character. With multi-byte character
+ sets, different characters might have different octet lengths. For
+ example, in <literal>utf8</literal>, <literal>A</literal> requires
+ one byte but <literal>ペ</literal> requires three bytes.
+ Illustration:
+
+<programlisting>
+mysql> <userinput>SELECT octet_length(_utf8 'A'), octet_length(_utf8 'ペ');</userinput>
+ +-------------------------+---------------------------+
+ | octet_length(_utf8 'A') | octet_length(_utf8 'ペ') |
+ +-------------------------+---------------------------+
+ | 1 | 3 |
+ +-------------------------+---------------------------+
+ 1 row in set (0.00 sec)
+ </programlisting>
+
+ If we don't know where the first character ends, then we don't
+ know where the second character begins, and even simple-looking
+ searches like <literal>LIKE '_A%'</literal> will fail. The
+ solution is to use a regular CJK character set in the first place,
+ or convert to a CJK character set before comparing.
+ Incidentally, this is one reason why MySQL cannot allow encodings
+ of nonexistent characters: It must be strict about rejecting bad
+ input, or it won't know where characters end.
+ </para>
+
+ <para>
+ There is a simple
+ problem with <literal>FULLTEXT</literal>: we need to know the end
+ of a word. With Western writing this is rarely a problem because
+ there are spaces between words. With Asian writing this is not the
+ case. We could use half-good solutions, like saying that all Han
+ characters represent words, or depending on (Japanese) changes
+ from Katakana to Hiragana which are due to grammatical endings.
+ But the only good solution requires a dictionary, and we haven't
+ found a good open-source dictionary.
+ </para>
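+
+ <para>
+ For the <literal>LIKE</literal> problem, the conversion approach
+ can be sketched as follows. The table <literal>notes</literal>
+ and its <literal>BLOB</literal> column <literal>body</literal>
+ are hypothetical names, and the sketch assumes that the stored
+ bytes are valid <literal>utf8</literal> and that the client
+ sends <literal>utf8</literal>.
+
+<programlisting>
+-- Byte length versus character length of the same two-character string.
+SELECT OCTET_LENGTH(_utf8 'ペA') AS byte_count,
+       CHAR_LENGTH(_utf8 'ペA') AS char_count;
+
+-- Interpret the binary column as utf8 so that _ matches one character,
+-- not one byte.
+SELECT id
+FROM notes
+WHERE CONVERT(body USING utf8) LIKE '_A%';
+</programlisting>
+ </para>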
+
+ </section>
+
+ <section id="cjk-faq-available-cjk-charsets">
+
+ <title>What CJK character sets are available?</title>
+
+ <para>
+ The list of CJK character sets may vary depending on version. For
+ example, the <literal>eucjpms</literal> character set is a recent
+ addition. But the language name appears in the
+ <literal>DESCRIPTION</literal> column for every entry in
+ <literal>information_schema.character_sets</literal>. Therefore,
+ to get a current list of all the non-Unicode CJK character sets,
+ say:
+
+<programlisting>
+mysql> <userinput>SELECT character_set_name, description</userinput>
+ -> <userinput>FROM information_schema.character_sets</userinput>
+ -> <userinput>WHERE description LIKE '%Chinese%'</userinput>
+ -> <userinput>OR description LIKE '%Japanese%'</userinput>
+ -> <userinput>OR description LIKE '%Korean%'</userinput>
+ -> <userinput>ORDER BY character_set_name;</userinput>
++--------------------+---------------------------+
+| character_set_name | description |
++--------------------+---------------------------+
+| big5 | Big5 Traditional Chinese |
+| cp932 | SJIS for Windows Japanese |
+| eucjpms | UJIS for Windows Japanese |
+| euckr | EUC-KR Korean |
+| gb2312 | GB2312 Simplified Chinese |
+| gbk | GBK Simplified Chinese |
+| sjis | Shift-JIS Japanese |
+| ujis | EUC-JP Japanese |
++--------------------+---------------------------+
+8 rows in set (0.01 sec)
+</programlisting>
+ </para>
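+
+ <para>
+ The same table also records the default collation and the
+ maximum number of bytes per character for each set, which is
+ useful when sizing columns. A sketch that reuses the filter
+ above:
+
+<programlisting>
+-- Default collation and maximum bytes per character for each CJK set.
+SELECT character_set_name, default_collate_name, maxlen
+FROM information_schema.character_sets
+WHERE description LIKE '%Chinese%'
+   OR description LIKE '%Japanese%'
+   OR description LIKE '%Korean%'
+ORDER BY character_set_name;
+</programlisting>
+ </para>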
+
+ </section>
+
+ <section id="cjk-faq-character-x-availability">
+
+ <title>Is character X available in all character sets?</title>
+
+ <para>
+ The majority of everyday-use Chinese/Japanese characters
+ (simplified Chinese and basic non-halfwidth Kana Japanese) appear
+ in all CJK character sets. Here is a stored procedure which
+ accepts a UCS-2 Unicode character, converts it to all other
+ character sets, and displays the results in hexadecimal.
+
+<programlisting>
+DELIMITER //
+
+CREATE PROCEDURE p_convert (ucs2_char CHAR(1) CHARACTER SET ucs2)
+BEGIN
+
+CREATE TABLE tj
+ (ucs2 CHAR(1) character set ucs2,
+ utf8 CHAR(1) character set utf8,
+ big5 CHAR(1) character set big5,
+ cp932 CHAR(1) character set cp932,
+ eucjpms CHAR(1) character set eucjpms,
+ euckr CHAR(1) character set euckr,
+ gb2312 CHAR(1) character set gb2312,
+ gbk CHAR(1) character set gbk,
+ sjis CHAR(1) character set sjis,
+ ujis CHAR(1) character set ujis);
+
+INSERT INTO tj (ucs2) VALUES (ucs2_char);
+
+UPDATE tj SET utf8=ucs2,
+ big5=ucs2,
+ cp932=ucs2,
+ eucjpms=ucs2,
+ euckr=ucs2,
+ gb2312=ucs2,
+ gbk=ucs2,
+ sjis=ucs2,
+ ujis=ucs2;
+
+/* If there's a conversion problem, UPDATE will produce a warning. */
+
+SELECT hex(ucs2) AS ucs2,
+ hex(utf8) AS utf8,
+ hex(big5) AS big5,
+ hex(cp932) AS cp932,
+ hex(eucjpms) AS eucjpms,
+ hex(euckr) AS euckr,
+ hex(gb2312) AS gb2312,
+ hex(gbk) AS gbk,
+ hex(sjis) AS sjis,
+ hex(ujis) AS ujis
+FROM tj;
+
+DROP TABLE tj;
+
+END//
+</programlisting>
+
+ The input can be any single <literal>ucs2</literal> character, or
+ it can be the code point value (hexadecimal representation) of
+ that character. Here's an example of what
+ <function>P_CONVERT()</function> can do. An earlier answer said
+ that the character <quote>Katakana Letter Pe</quote> appears in
+ all CJK character sets. We know that the code point value of
+ Katakana Letter Pe is <literal>0x30da</literal>. (By the way, we
+ got the name from Unicode's list of ucs2 encodings and names:
+ <ulink url="http://www.unicode.org/Public/UNIDATA/UnicodeData.txt"/>.)
+ So we'll say:
+
+<programlisting>
+mysql> <userinput>CALL P_CONVERT(0x30da)//</userinput>
++------+--------+------+-------+---------+-------+--------+------+------+------+
+| ucs2 | utf8 | big5 | cp932 | eucjpms | euckr | gb2312 | gbk | sjis | ujis |
++------+--------+------+-------+---------+-------+--------+------+------+------+
+| 30DA | E3839A | C772 | 8379 | A5DA | ABDA | A5DA | A5DA | 8379 | A5DA |
++------+--------+------+-------+---------+-------+--------+------+------+------+
+1 row in set (0.04 sec)
+</programlisting>
+
+ Since none of the column values is <literal>3F</literal>, we know
+ that every conversion worked.
+ </para>
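+
+ <para>
+ For a quick check of a single target character set, a stored
+ procedure is not strictly necessary. The following is a minimal
+ sketch, assuming that the <literal>_ucs2</literal> introducer and
+ <function>CONVERT(... USING ...)</function> behave as they do
+ inside the procedure above; the choice of
+ <literal>sjis</literal> and <literal>euckr</literal> is only for
+ illustration. The values should match the corresponding columns
+ of the <function>P_CONVERT()</function> result, and a value of
+ <literal>3F</literal> would again indicate a failed conversion.
+
+<programlisting>
+/* Sketch: convert the example code point 0x30da without a helper table. */
+SELECT HEX(CONVERT(_ucs2 0x30da USING sjis)) AS sjis,
+ HEX(CONVERT(_ucs2 0x30da USING euckr)) AS euckr;
+</programlisting>
+ </para>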
+
+ </section>
+
+ <section id="cjk-faq-sorting-problems-unicode-1">
+
+ <title>Strings don't sort correctly in Unicode (I)</title>
+
+ <para>
+ Sometimes people observe that the result of a
+ <literal>utf8_unicode_ci</literal> or
+ <literal>ucs2_unicode_ci</literal> search or <literal>ORDER
+ BY</literal> sort is not what they think a native speaker would expect.
+ Although we never rule out the chance that there is a bug, we have
+ found in the past that people are not correctly reading the
+ standard table of weights for the Unicode Collation Algorithm. So,
+ here's how to check whether we're using the right collation. The
+ correct table for MySQL is this one:
+ <ulink url="http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt"/>.
+ This is different from the first table you will find by navigating
+ from the <literal>unicode.org</literal> home page. MySQL
+ deliberately uses the older 4.0.0 <quote>allkeys</quote> table,
+ instead of the current 4.1.0 table. We are very wary about
+ changing ordering which affects indexes. Here is an example of a
+ problem that we handled recently, for a complaint in our bugs
+ database, <ulink url="http://bugs.mysql.com/bug.php?id=16526"/>:
+
+<programlisting>
+mysql> <userinput>CREATE TABLE tj (s1 CHAR(1) CHARACTER SET utf8 COLLATE utf8_unicode_ci);</userinput>
+Query OK, 0 rows affected (0.05 sec)
+
+mysql> <userinput>INSERT INTO tj VALUES ('が'),('か');</userinput>
+Query OK, 2 rows affected (0.00 sec)
+Records: 2 Duplicates: 0 Warnings: 0
+
+mysql> <userinput>SELECT * FROM tj WHERE s1 = 'か';</userinput>
++------+
+| s1 |
++------+
+| が |
+| か |
++------+
+2 rows in set (0.00 sec)
+</programlisting>
+
+ If your eyes are sharp, you'll see that the character in the first
+ result row isn't the one that we searched for. Why did MySQL
+ retrieve it? First we look for the Unicode code point value, which
+ is possible by reading the hexadecimal number for the
+ <literal>ucs2</literal> version of the characters:
+
+<programlisting>
+mysql> <userinput>SELECT s1,HEX(CONVERT(s1 USING ucs2)) FROM tj;</userinput>
++------+-----------------------------+
+| s1 | HEX(CONVERT(s1 USING ucs2)) |
++------+-----------------------------+
+| が | 304C |
+| か | 304B |
++------+-----------------------------+
+2 rows in set (0.03 sec)
+</programlisting>
+
+ Now let's search for <literal>304B</literal> and
+ <literal>304C</literal> in the 4.0.0 allkeys table. We'll find
+ these lines:
+
+<programlisting>
+304B ; [.1E57.0020.000E.304B] # HIRAGANA LETTER KA
+304C ; [.1E57.0020.000E.304B][.0000.0140.0002.3099] # HIRAGANA LETTER GA; QQCM
+</programlisting>
+
+ The official Unicode names (following the <quote>#</quote> mark)
+ are informative; they tell us the Japanese syllabary (Hiragana),
+ the informal classification (letter instead of digit or
+ punctuation), and the Western identifier (<literal>KA</literal> or
+ <literal>GA</literal>, which happen to be the unvoiced and voiced
+ members of the same letter pair). More importantly, the Primary
+ Weight (the first hexadecimal number inside the square brackets)
+ is <literal>1E57</literal> on both lines. For comparisons in both
+ searching and sorting, MySQL pays attention only to the Primary
+ Weight; it ignores all the other numbers. So now we know that
+ we're sorting <literal>が</literal> and <literal>か</literal>
+ correctly according to the Unicode specification. If we wanted to
+ distinguish them, we'd have to use a
+ non-Unicode-Collation-Algorithm collation
+ (<literal>utf8_bin</literal> or
+ <literal>utf8_general_ci</literal>), or compare the
+ <function>HEX()</function> values, or say <literal>ORDER BY
+ CONVERT(s1 USING sjis)</literal>. Being correct <quote>according
+ to Unicode</quote> isn't enough, of course: the person who
+ submitted the bug was equally correct. We plan to add another
+ collation for Japanese according to the JIS X 4061 standard, where
+ voiced/unvoiced letters like KA/GA are distinguishable for
+ ordering purposes.
+ </para>
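+
+ <para>
+ As a minimal sketch of the first workaround (an illustration, not
+ part of the original bug discussion), a binary collation can be
+ applied to a single comparison or sort without altering the
+ table. It assumes the <literal>tj</literal> table from the example
+ above and a connection character set of <literal>utf8</literal>,
+ since <literal>COLLATE utf8_bin</literal> is valid only for
+ <literal>utf8</literal> strings:
+
+<programlisting>
+/* Assumption: the client really sends utf8, so declare it. */
+SET NAMES utf8;
+
+/* Compare by encoded value instead of UCA primary weight. */
+SELECT * FROM tj WHERE s1 = 'か' COLLATE utf8_bin;
+
+/* Sort by encoded value as well. */
+SELECT * FROM tj ORDER BY s1 COLLATE utf8_bin;
+</programlisting>
+
+ With the binary collation, the search should match only the row
+ that holds <literal>か</literal>.
+ </para>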
+
+ </section>
+
+ <section id="cjk-faq-sorting-problems-unicode-2">
+
+ <title>Strings don't sort correctly in Unicode (II)</title>
+
+ <para>
+ You're using Unicode (<literal>ucs2</literal> or
+ <literal>utf8</literal>), and you know what the Unicode sort order
+ is (see the previous question and answer), but MySQL still seems
+ to sort your table wrong? The cause may be simple.
+
+<programlisting>
+mysql> <userinput>SHOW CREATE TABLE t\G</userinput>
+******************** 1. row ******************
+Table: t
+Create Table: CREATE TABLE `t` (
+`s1` char(1) CHARACTER SET ucs2 DEFAULT NULL
+) ENGINE=MyISAM DEFAULT CHARSET=latin1
+1 row in set (0.00 sec)
+</programlisting>
+
+ Hmm, the character set looks okay. Let's look at the
+ <literal>information_schema</literal> for this column.
+
+<programlisting>
+mysql> <userinput>SELECT column_name, character_set_name, collation_name</userinput>
+ -> <userinput>FROM information_schema.columns</userinput>
+ -> <userinput>WHERE column_name = 's1'</userinput>
+ -> <userinput>AND table_name = 't';</userinput>
++-------------+--------------------+-----------------+
+| column_name | character_set_name | collation_name |
++-------------+--------------------+-----------------+
+| s1 | ucs2 | ucs2_general_ci |
++-------------+--------------------+-----------------+
+1 row in set (0.01 sec)
+</programlisting>
+
+ Oops, the collation is <literal>ucs2_general_ci</literal> instead
+ of <literal>ucs2_unicode_ci</literal>! Here's why:
+
+<programlisting>
+mysql> <userinput>SHOW CHARSET LIKE 'ucs2%';</userinput>
++---------+---------------+-------------------+--------+
+| Charset | Description | Default collation | Maxlen |
++---------+---------------+-------------------+--------+
+| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
++---------+---------------+-------------------+--------+
+1 row in set (0.00 sec)
+</programlisting>
+
+ For <literal>ucs2</literal> and <literal>utf8</literal>, the
+ <quote>general</quote> collation is the default. To specify that
+ you wanted a <quote>unicode</quote> collation, you should have
+ specified <literal>COLLATE ucs2_unicode_ci</literal>.
+ </para>
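+
+ <para>
+ A hedged sketch of the fix, assuming the table
+ <literal>t</literal> and column <literal>s1</literal> shown above:
+ first list the collations available for <literal>ucs2</literal>,
+ then redefine the column with an explicit
+ <literal>COLLATE</literal> clause.
+
+<programlisting>
+/* See which ucs2 collations this server offers. */
+SHOW COLLATION LIKE 'ucs2%';
+
+/* Redefine the column so that it uses the UCA-based collation. */
+ALTER TABLE t MODIFY s1 CHAR(1) CHARACTER SET ucs2 COLLATE ucs2_unicode_ci;
+</programlisting>
+
+ Repeating the <literal>information_schema.columns</literal> query
+ afterward should then report <literal>ucs2_unicode_ci</literal>.
+ </para>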
+
+ </section>
+
+ <section id="cjk-faq-supplementary-chars-rejected">
+
+ <title>My supplementary characters get rejected</title>
+
+ <para>
+ Right. MySQL doesn't support supplementary characters (characters
+ which need more than 3 bytes with UTF-8). We support only what
+ Unicode calls the <emphasis>Basic Multilingual Plane / Plane
+ 0</emphasis>. Only a few very rare Han characters are
+ supplementary; support for them is uncommon. This has led to bug
+ #12600 (<ulink url="http://bugs.mysql.com/bug.php?id=12600"/>)
+ which we rejected as <quote>not a bug</quote>. With
+ <literal>utf8</literal>, we must truncate an input string when we
+ encounter bytes that we don't understand. Otherwise, we wouldn't
+ know how long the bad multi-byte character is. A workaround is: if
+ you use <literal>ucs2</literal> instead of
+ <literal>utf8</literal>, then the bad characters will change to
+ question marks, but there will be no truncation. Or change the
+ data type to <literal>BLOB</literal> or <literal>BINARY</literal>,
+ which have no validity checking. In our bugs database, bug #14052
+ (<ulink url="http://bugs.mysql.com/bug.php?id=14052"/>) is a
+ feature request for Wikipedia, asking us to support supplementary
+ characters extending <literal>ucs2</literal> as well as
+ <literal>utf8</literal>.
+ </para>
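+
+ <para>
+ The binary-type workaround can be sketched as follows. This is an
+ illustration under stated assumptions: the table name
+ <literal>tb</literal> is hypothetical, and the value
+ <literal>0xF0A0808B</literal> is the four-byte UTF-8 encoding of
+ the supplementary character U+2000B. Because the column is
+ binary, MySQL stores the bytes without character set validation,
+ but it also cannot compare or sort them as characters.
+
+<programlisting>
+/* Hypothetical table: a binary column has no validity checking. */
+CREATE TABLE tb (s1 VARBINARY(16));
+
+/* Store the raw UTF-8 bytes of U+2000B. */
+INSERT INTO tb VALUES (0xF0A0808B);
+
+/* Read the stored bytes back in hexadecimal. */
+SELECT HEX(s1) FROM tb;
+</programlisting>
+ </para>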
+
+ </section>
+
+ <section id="cjk-faq-cjkv">
+
+ <title>Shouldn't it be CJKV (V for Vietnamese)?</title>
+
+ <para>
+ No. The term CJKV (Chinese Japanese Korean Vietnamese) refers to
+ character sets which contain Han (originally Chinese) characters.
+ MySQL has no plan to support the old Vietnamese script using Han
+ characters. MySQL does of course support the modern Vietnamese
+ script with Western characters. Another question that has come up
+ (once) is a request for a specialized Vietnamese collation; see
+ <ulink url="http://bugs.mysql.com/bug.php?id=4745"/>. We might do
+ something about it someday, if many more requests arise.
+ </para>
+
+ </section>
+
+ <section id="cjk-faq-fixing-cjk-problems">
+
+ <title>Will MySQL fix any CJK problems in version 5.1?</title>
+
+ <remark role="update">
+ [SH] Remove (or rewrite) whole section once the fixes it talks
+ about are implemented.
+ </remark>
+
+ <para>
+ Yes. We're changing the names of files and directories. Here's an
+ example, using mysql as <literal>root</literal> under Linux:
+
+ <orderedlist>
+
+ <listitem>
+ <para>
+ Create a table with a name containing a Han character:
+
+<programlisting>
+mysql> <userinput>CREATE TABLE tab_楮 (s1 INT);</userinput>
+Query OK, 0 rows affected (0.07 sec)
+</programlisting>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Find out where MySQL stores database files:
+
+<programlisting>
+mysql> <userinput>SHOW VARIABLES LIKE 'datadir';</userinput>
++---------------+-----------------------+
+| Variable_name | Value |
++---------------+-----------------------+
+| datadir | /usr/local/mysql/var/ |
++---------------+-----------------------+
+1 row in set (0.00 sec)
+</programlisting>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Look at the directory to see the MyISAM table files:
+
+<programlisting>
+# cd /usr/local/mysql/var/dba
+# dir tab_*
+-rw-rw---- 1 root root 0 2006-05-16 10:22 tab_@stripped
+-rw-rw---- 1 root root 1024 2006-05-16 10:22 tab_@stripped
+-rw-rw---- 1 root root 8556 2006-05-16 10:22 tab_@stripped
+</programlisting>
+ </para>
+ </listitem>
+
+ </orderedlist>
+
+ Notice that MySQL has converted the Han character to
+ <literal>@</literal> + (Unicode value of Han character), that is,
+ to a purely ASCII representation. This solves an old problem, that
+ database files weren't portable, because some computers wouldn't
+ allow <literal>楮</literal> in a file name. Conversion to the new
+ file names will be automatic when you upgrade to version 5.1. This
+ should take care of bug #6313 in our bugs database,
+ <ulink url="http://bugs.mysql.com/bug.php?id=6313"/>.
+ </para>
+
+ </section>
+
+ <section id="cjk-faq-manual-translation">
+
+ <title>When will MySQL translate the manual again?</title>
+
+ <remark role="update">
+ [SH] Update as CJK translations of manuals are updated.
+ </remark>
+
+ <para>
+ A Beijing-based group has produced a Simplified Chinese version
+ for us under contract. It's complete and can be found on
+ <ulink url="http://dev.mysql.com/doc/#chinese-5.1"/>. It's up to
+ date as of version 5.1.2. The Japanese manual can be downloaded
+ from <ulink url="http://dev.mysql.com/doc/"/>. (Scroll down the page
+ until you see the word <quote>Japanese</quote>.) It is still for
+ version 4.1.
+ </para>
+
+ </section>
+
+ <section id="cjk-faq-contact">
+
+ <title>Whom can I talk to?</title>
+
+ <remark role="update">
+ [SH] Update as things change.
+ </remark>
+
+ <para>
+ Check <ulink url="http://dev.mysql.com/user-groups/"/> to see if
+ there is a MySQL user group near you. If there isn't: why not
+ start one yourself? To contact a sales engineer in MySQL KK's
+ Japan office:
+
+<programlisting>
+Tel: +81(0)3-5326-3133
+Fax: +81(0)3-5326-3001
+Email: dsaito@stripped
+</programlisting>
+
+ To see feature requests about language issues:
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Go to <ulink url="http://bugs.mysql.com"/>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Click <guimenu>Advanced Search</guimenu>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ In the <guilabel>Severity</guilabel> dropdown box, click
+ <literal>S4 (Feature Request)</literal>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ In the list box beside <guilabel>Category</guilabel>, click
+ <literal>Character Sets</literal>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Click the <guibutton>Search</guibutton> button.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ You can post CJK questions, or see previous answers, on MySQL's
+ <quote>Character Sets, Collation, Unicode</quote> forum:
+ <ulink url="http://forums.mysql.com/list.php?103"/>. MySQL plans
+ to add native-language forums on
+ <ulink url="http://forums.mysql.com/"/> very soon.
+ </para>
+
+ </section>
+
+</section>