Author: jstephens
Date: 2006-10-23 14:55:19 +0200 (Mon, 23 Oct 2006)
New Revision: 3701
Log:
More CJK refactoring for new FAQ.
Modified:
trunk/refman-5.0/faqs.xml
Modified: trunk/refman-5.0/faqs.xml
===================================================================
--- trunk/refman-5.0/faqs.xml 2006-10-23 06:44:04 UTC (rev 3700)
+++ trunk/refman-5.0/faqs.xml 2006-10-23 12:55:19 UTC (rev 3701)
Changed blocks: 21, Lines Added: 268, Lines Deleted: 139; 24560 bytes
@@ -3925,9 +3925,10 @@
<programlisting>
SELECT character_set_name, collation_name
-FROM information_schema.columns WHERE table_schema = your_database_name
-AND table_name = your_table_name
-AND column_name = your_column_name;
+ FROM information_schema.columns
+ WHERE table_schema = your_database_name
+ AND table_name = your_table_name
+ AND column_name = your_column_name;
</programlisting>
</para>
@@ -4032,10 +4033,7 @@
Windows: <command>mysql.exe</command>). If
<command>mysql</command> displays correctly but your
application doesn't, then your problem is probably due to
- settings in the application. <quote>Settings</quote>, but
- consult also the question about <quote>Troubles with
- Access (or Perl) (or PHP) (etc.)</quote> much later in
- this FAQ.
+ system settings.
</para>
<para>
@@ -4111,6 +4109,17 @@
</programlisting>
</para>
+ <remark role="todo">
+ [js] Fix refs once all sections are reworked.
+ </remark>
+
+ <para>
+ It is also possible that there are issues with application
+ settings — consult also the question about
+ <quote>Troubles with Access (or Perl) (or PHP)
+ (etc.)</quote> much later in this FAQ.
+ </para>
+
</answer>
</qandaentry>
@@ -4143,8 +4152,8 @@
</remark>
<remark role="todo">
- [js] Get the charts from Bar perferably as CSV or some other
- format that will be easy to turn into DB-XML for inlcusion as
+ [js] Get the charts from Bar - preferably as CSV or some other
+ format that will be easy to turn into DB-XML - for inclusion as
appendices to the RefMan.
</remark>
@@ -4255,6 +4264,10 @@
<primary>big5</primary>
</indexterm>
+ <remark role="todo">
+ [js] Needs to be framed as a Q-and-A.
+ </remark>
+
<para>
MySQL supports the Big5 character set which is common in Hong
Kong and Taiwan (Republic of China). MySQL's
@@ -4337,7 +4350,7 @@
<answer>
<para>
- Here's why. In this conversion table, the
+ In the following conversion table, the
<literal>ucs2</literal> column is the source, and the
<literal>sjis</literal>, <literal>cp932</literal>,
<literal>ujis</literal>, and <literal>eucjpms</literal>
@@ -4396,34 +4409,53 @@
FULLWIDTH NOT SIGN FFE2 3F 81CA
</programlisting>
- It means that MySQL converts the NOT SIGN (Unicode
+ This means that MySQL converts the NOT SIGN (Unicode
<literal>U+00AC</literal>) to <literal>sjis</literal> code
point <literal>0x81CA</literal> and to
<literal>cp932</literal> code point <literal>3F</literal>.
(<literal>3F</literal> is the question mark
(<quote>?</quote>) — this is what we always use when
- we can't convert.) Now, what should we do if we want to
- convert <literal>sjis 81CA</literal> to
- <literal>cp932</literal>? Our answer is: <quote>?</quote>.
- There are serious complaints about this, many people would
- prefer a <quote>loose</quote> conversion, so that
- <literal>81CA (NOT SIGN)</literal> in
- <literal>sjis</literal> becomes <literal>81CA (FULLWIDTH
- NOT SIGN)</literal> in <literal>cp932</literal>. We are
- considering changing.
+ we can't convert.)
</para>
</answer>
</qandaentry>
+ <qandaentry>
+
+ <question>
+
+ <para>
+ What should I do if I want to convert <literal>sjis
+ 81CA</literal> to <literal>cp932</literal>?
+ </para>
+
+ </question>
+
+ <answer>
+
+ <para>
+ Our answer is: <quote>?</quote>. There are serious
+ complaints about this: many people would prefer a
+ <quote>loose</quote> conversion, so that <literal>81CA
+ (NOT SIGN)</literal> in <literal>sjis</literal> becomes
+ <literal>81CA (FULLWIDTH NOT SIGN)</literal> in
+ <literal>cp932</literal>. We are considering a change to
+ this behaviour.
+ </para>
+
+ </answer>
+
+ </qandaentry>
+
</qandaset>
</section>
<section id="cjk-faq-great-yen-sign-problem">
- <title>The Great Yen Sign problem (Japanese)</title>
+ <title>How Do I Work With the Yen Sign (<literal>¥</literal>)?</title>
<indexterm type="concept">
<primary>CJK (Chinese, Japanese, Korean)</primary>
@@ -4439,30 +4471,70 @@
<primary>Yen sign (Japanese)</primary>
</indexterm>
- <para>
- In SJIS the code for Yen Sign (<literal>¥</literal>) is
- <literal>5C</literal>. In SJIS the code for Reverse Solidus
- (<literal>\</literal>) is <literal>5C</literal>. Since the above
- statements are contradictory, confusion often results. Well, to
- put it more seriously, some versions of Japanese character sets
- (both <literal>sjis</literal> and <literal>euc</literal>) have
- treated <literal>5C</literal> as a reverse solidus, also known
- as a backslash, and others have treated it as a yen sign.
- There's nothing we can do, except take sides: MySQL follows only
- one version of the JIS (Japanese Industrial Standards) standard
- description, and <emphasis>5C is Reverse Solidus</emphasis>,
- always. Should we make a separate character set where
- <literal>5C</literal> is Yen Sign, as another DBMS (Oracle)
- does? We haven't decided. Certainly not in version 5.1 or 5.2.
- But if people keep complaining about The Great Yen Sign Problem,
- that's one possible solution.
- </para>
+ <qandaset>
+ <qandaentry>
+
+ <question>
+
+ <para>
+ How does MySQL represent the Yen (<literal>¥</literal>)
+ sign?
+ </para>
+
+ </question>
+
+ <answer>
+
+ <para>
+ A problem arises because some versions of Japanese
+ character sets (both <literal>sjis</literal> and
+ <literal>euc</literal>) treat <literal>5C</literal> as a
+ reverse solidus (<literal>\</literal> — also known
+ as a backslash), and others treat it as a yen sign
+ (<literal>¥</literal>).
+ </para>
+
+ <para>
+ MySQL follows only one version of the JIS (Japanese
+ Industrial Standards) standard description. In MySQL,
+ <emphasis>5C is always Reverse Solidus</emphasis>.
+ </para>
+
+ </answer>
+
+ </qandaentry>
+
+ <qandaentry>
+
+ <question>
+
+ <para>
+ Do MySQL plan to make a separate character set where
+ <literal>5C</literal> is the Yen sign, as at least one
+ other major DBMS does?
+ </para>
+
+ </question>
+
+ <answer>
+
+ <para>
+ This is one possible solution to the Yen sign issue;
+ however, this will not happen in MySQL 5.1 or 5.2.
+ </para>
+
+ </answer>
+
+ </qandaentry>
+
+ </qandaset>
+
</section>
<section id="cjk-faq-euckr-charset-problems">
- <title>Troubles with euckr character set (Korean)</title>
+ <title>Issues with <literal>euckr</literal> (Korean) Character Set Support</title>
<indexterm type="concept">
<primary>CJK (Chinese, Japanese, Korean)</primary>
@@ -4478,22 +4550,44 @@
<primary>Korean</primary>
</indexterm>
+ <qandaset>
+
+ <qandaentry>
+
+ <question>
+
+ <para>
+ Of what issues should I be aware when working with Korean
+ character sets in MySQL?
+ </para>
+
+ </question>
+
+ </qandaentry>
+
+ </qandaset>
+
<para>
- MySQL supports the <literal>euckr</literal> (Extended Unix Code
- Korea) character set which is common in South Korea. In theory,
- problems could arise because there have been several versions of
- this character set. So far, only one problem has been noted, for
- Korea's currency symbol. We use the <quote>ASCII</quote> variant
- of EUC-KR, in which the code point <literal>0x5c</literal> is
- REVERSE SOLIDUS, that is <literal>\</literal>, instead of the
- <quote>KS-Roman</quote> variant of EUC-KR, in which the code
- point <literal>0x5c</literal> is WON SIGN, that is
- <quote>₩</quote>. You can't convert Unicode
- <literal>U+20A9</literal> WON SIGN to <literal>euckr</literal>:
+ In theory, while there have been several versions of the
+ <literal>euckr</literal> (<firstterm>Extended Unix Code
+ Korea</firstterm>) character set, only one problem has been
+ noted.
+ </para>
+ <para>
+ We use the <quote>ASCII</quote> variant of EUC-KR, in which the
+ code point <literal>0x5c</literal> is REVERSE SOLIDUS, that is
+ <literal>\</literal>, instead of the <quote>KS-Roman</quote>
+ variant of EUC-KR, in which the code point
+ <literal>0x5c</literal> is <literal>WON
+ SIGN</literal>(<literal>₩</literal>). This means that you
+ cannot convert Unicode <literal>U+20A9</literal> to
+ <literal>euckr</literal>:
+
<programlisting>
-mysql> <userinput>SELECT CONVERT('₩' USING euckr) AS euckr,</userinput>
--> <userinput>HEX(CONVERT('₩' USING euckr)) AS hexeuckr;</userinput>
+mysql> <userinput>SELECT</userinput>
+ -> <userinput>CONVERT('₩' USING euckr) AS euckr,</userinput>
+ -> <userinput>HEX(CONVERT('₩' USING euckr)) AS hexeuckr;</userinput>
+-------+----------+
| euckr | hexeuckr |
+-------+----------+
@@ -4502,6 +4596,11 @@
1 row in set (0.00 sec)
</programlisting>
+ <remark role="todo">
+ [js] Update following reference once we have incorporated
+ these charts into the Manual.
+ </remark>
+
MySQL's graphic Korean chart is here:
<ulink url="http://d.udm.net/bar/~bar/charts/euckr_korean_ci.html"/>.
</para>
@@ -4542,7 +4641,7 @@
both columns.
<programlisting>
-mysql> <userinput>INSERT INTO ch VALUES ('A汌B','A汌B');</userinput>
+mysql> <userinput>INSERT INTO ch VALUES ('A汌B','A汌B');</userinput>
Query OK, 1 row affected, 1 warning (0.00 sec)
</programlisting>
@@ -4558,10 +4657,11 @@
1 row in set (0.00 sec)
</programlisting>
- So it's a warning about the gb2312 column only.
+ So it's a warning about the <literal>gb2312</literal> column
+ only.
<programlisting>
-mysql> SELECT ucs2,HEX(ucs2),gb2312,HEX(gb2312) FROM ch;
+mysql> SELECT ucs2,HEX(ucs2),gb2312,HEX(gb2312) FROM ch;
+-------+--------------+--------+-------------+
| ucs2 | HEX(ucs2) | gb2312 | HEX(gb2312) |
+-------+--------------+--------+-------------+
@@ -4596,9 +4696,9 @@
Admittedly the message is misleading. We didn't
<quote>truncate</quote> in this case, we replaced with a
question mark. We've had a complaint about this message
- (bug #9337). But until we come up with something better,
- just accept that error/warning code 2165 can mean a
- variety of things.
+ (See Bug #9337). But until we come up with something
+ better, just accept that error/warning code 2165 can mean
+ a variety of things.
</para>
</listitem>
@@ -4616,9 +4716,9 @@
</section>
- <section id="cjk-faq-access-perl-php-troubles">
+ <section id="faqs-cjk-access-perl-php">
- <title>Troubles with Access, Perl, PHP, etc.</title>
+ <title>Using CJK Character Sets with Access, PHP, and other APIs</title>
<indexterm type="concept">
<primary>CJK (Chinese, Japanese, Korean)</primary>
@@ -4633,19 +4733,18 @@
<para>
You can't get things to look right with your special program for
a GUI front end or browser? Get a direct connection to the
- server (with <command>mysql</command> on Unix or with
- <command>mysql.exe</command> on Windows) and try the same query
- there. If mysql is okay, then the trouble is probably that your
- application interface needs some initializing. Use
- <command>mysql</command> to tell you what character set(s) it
- uses, by saying <literal>SHOW VARIABLES LIKE 'char%';</literal>.
- If it's Access, you're probably connecting with MyODBC. So
- you'll want to check out the Reference Manual page for
- configuring an ODBC DSN, and pay attention particularly to the
- illustrations for <quote>SQL command on connect</quote>. You
- should enter <literal>SET NAMES 'big5'</literal> (supposing that
- you use <literal>big5</literal>) (you don't need a
- <literal>;</literal> here). If it's ASP, you might need to add
+ server using <command>mysql</command> (Windows:
+ <command>mysql.exe</command>) and try the same query there. If
+ <command>mysql</command> responds correctly, then the trouble
+ may be that your application interface requires initialization.
+ Use <command>mysql</command> to tell you what character set or
+ sets it uses with the statement <literal>SHOW VARIABLES LIKE
+ 'char%';</literal>. If you are using Access, then you are most
+ likely connecting with MyODBC. In this case, you should check
+ <xref linkend="myodbc-configuration"/>. If, for instance, you
+ use <literal>big5</literal>, you would enter <literal>SET NAMES
+ 'big5'</literal>. (Note that no <literal>;</literal> is required
+ in this case). If you are using ASP, you might need to add
<literal>SET NAMES</literal> in the code. Here is an example
that has worked in the past:
@@ -4654,48 +4753,75 @@
Session.CodePage=0
Dim strConnection
Dim Conn
-strConnection="driver={MySQL ODBC 3.51 Driver};server=yourserver;uid=yourusername;" \
- & "pwd=yourpassword;database=yourdatabase;stmt=SET NAMES 'big5';"
-Set Conn = Server.CreateObject(<quote>ADODB.Connection</quote>)
+strConnection="driver={MySQL ODBC 3.51 Driver};server=<replaceable>server</replaceable>;uid=<replaceable>username</replaceable>;" \
+ & "pwd=<replaceable>password</replaceable>;database=<replaceable>database</replaceable>;stmt=SET NAMES 'big5';"
+Set Conn = Server.CreateObject("ADODB.Connection")
Conn.Open strConnection
%>
</programlisting>
- If it's PHP, here's a slightly different user suggestion:
+ In much the same way, if you are using any character set other
+ than <literal>latin1</literal> with Connector/NET, then you must
+ specify the character set in the connection string. See
+ <xref linkend="connector-net-using-connecting"/>, for more
+ information.
+ </para>
+ <para>
+ If you are using PHP, try this:
+
<programlisting>
<?php
- $link = mysql_connect($host,$usr,$pwd);
- mysql_select_db($db);
- if (mysql_error()) { print "Database ERROR: " . mysql_error(); }
- mysql_query("SET CHARACTER SET utf8", $link);
+ $link = mysql_connect($host, $usr, $pwd);
+
+ mysql_select_db($db);
+
+ if( mysql_error() ) { print "Database ERROR: " . mysql_error(); }
mysql_query("SET NAMES 'utf8'", $link);
-?>
+?>
</programlisting>
- In this case, the tipper used <literal>SET CHARACTER
- SET</literal> statement to change
+ In this case, we used <literal>SET NAMES</literal> to change
<literal>character_set_client</literal> and
- <literal>character_set_result</literal>, and used <literal>SET
- NAMES</literal> to change
- <literal>character_set_client</literal> and
<literal>character_set_connection</literal> and
- <literal>character_set_results</literal>. So actually the
- <literal>SET CHARACTER SET</literal> statement is redundant.
- (Incidentally, MySQL people encourage the use of the
- <literal>mysqli</literal> extension, rather than the
- <literal>mysql</literal> example that this example uses.)
- Another thing to check with PHP is the browser assumptions.
- Sometimes a meta tag change in the heading area suffices, for
- example: <literal><meta http-equiv="Content-Type"
- content="text/html; charset=utf-8"></literal>
+ <literal>character_set_results</literal>.
</para>
<para>
- For Connector/J tips, see the manual section in the Connectors
- chapter titled <quote>Using Character Sets and Unicode</quote>.
+ We encourage the use of the newer <literal>mysqli</literal>
+ extension, rather than <literal>mysql</literal>. Using
+ <literal>mysqli</literal>, the previous example could be
+ rewritten as shown here:
+
+<programlisting>
+<?php
+ $link = new mysqli($host, $usr, $pwd, $db);
+
+ if( mysqli_connect_errno() )
+ {
+ printf("Connect failed: %s\n", mysqli_connect_error());
+ exit();
+ }
+
+ $link->query("SET NAMES 'utf8'");
+?>
+</programlisting>
+
+ Another issue often encountered in PHP applications has to do
+ with assumptions made by the browser. Sometimes adding or
+ changing a <literal><meta></literal> tag suffices to
+ correct the problem: for example, to insure that the user agent
+ interpets page content as <literal>UTF-8</literal>, you should
+ include <literal><meta http-equiv="Content-Type"
+ content="text/html; charset=utf-8"></literal> in the
+ <literal><head></literal> of the HTML page.
</para>
+ <para>
+ If you are using Connector/J, see
+ <xref linkend="connector-j-reference-charsets"/>.
+ </para>
+
</section>
<section id="cjk-faq-restore-mysql40-behavior">
@@ -4708,12 +4834,12 @@
</indexterm>
<para>
- In the old days, with MySQL Version 4.0, there was a single
- <quote>global</quote> character set for both server and client
- sides, and the decision was made by the server administrator. We
- changed that starting with MySQL Version 4.1. What happens now
- is a <quote>handshake</quote>. The MySQL Reference Manual
- describes it thus:
+ In MySQL Version 4.0, there was a single <quote>global</quote>
+ character set for both server and client, and the decision as to
+ which character to use was made by the server administrator.
+ This changed starting with MySQL Version 4.1. What happens now
+ is a <quote>handshake</quote>, as described in
+ <xref linkend="charset-connection"/>:
<blockquote>
@@ -4729,43 +4855,46 @@
</blockquote>
- The effect of this is: you can't control the client character
- set by saying <literal>mysqld
- --character-set-server=utf8</literal>. But some Asian customers
- said that they don't like that, they want the MySQL 4.0
- behaviour. So we added a <command>mysqld</command> switch,
- <option>--character-set-client-handshake</option>, which (and
- this is the interesting part) can be turned off with
+ The effect of this is that you cannot control the client
+ character set by starting <command>mysqld</command> with
+ <option>--character-set-server=utf8</option>. However, some of
+ our Asian customers have said that prefer the MySQL 4.0
+ behaviour. To make it possible to retain this behaviour, we
+ added a <command>mysqld</command> switch,
+ <option>--character-set-client-handshake</option>, which can be
+ turned off with
<option>--skip-character-set-client-handshake</option>. If you
- start mysqld with
- <option>--skip-character-set-client-handshake</option>, then the
- behaviour is like this: When a client connects, it sends to the
- server the name of the character set that it wants to use. The
- server ignores it! Here is an illustration with the handshake
- switch on or off. Pretend that your favourite server character
- set is <literal>latin1</literal> (of course that's unlikely in a
- CJK area but it's MySQL's default if there's no
- <filename>my.ini</filename> or <filename>my.cnf</filename>
- file). Pretend that the client operates with
- <literal>utf8</literal> because that's what the client's
- operating system supports. Start the server with a default
- character set, <literal>latin1</literal>:
+ start <command>mysqld</command> with
+ <option>--skip-character-set-client-handshake</option>, then,
+ when a client connects, it sends to the server the name of the
+ character set that it wants to use — however,
+ <emphasis>the server ignores this request from the
+ client</emphasis>.
+ </para>
+ <para>
+ By way of example, suppose that your favourite server character
+ set is <literal>latin1</literal> (unlikely in a CJK area, but
+ this is the default value). Suppose further that the client uses
+ <literal>utf8</literal> because this is what the client's
+ operating system supports. Now, start the server with
+ <literal>latin1</literal> as its default character set:
+
<programlisting>
mysqld --character-set-server=latin1
</programlisting>
- Start the client with a default character set,
+ And then start the client with the default character set
<literal>utf8</literal>:
<programlisting>
mysql --default-character-set=utf8
</programlisting>
- Show what the current settings are:
+ The current settings can be seen as shown here:
<programlisting>
-mysql> <userinput>SHOW VARIABLES LIKE 'char%';</userinput>
+mysql> <userinput>SHOW VARIABLES LIKE 'char%';</userinput>
+--------------------------+----------------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------------+
@@ -4781,20 +4910,19 @@
8 rows in set (0.01 sec)
</programlisting>
- Stop the client. Stop the server with
- <command>mysqladmin</command>. Start the server again but this
- time say <quote>skip the handshake</quote>:
+ Now stop the client, and then stop the server using
+ <command>mysqladmin</command>. Then start the server again, but
+ this time tell it to skip the handshake like so:
<programlisting>
mysqld --character-set-server=utf8 --skip-character-set-client-handshake
</programlisting>
- Start the client with a default character set,
- <literal>utf8</literal>, again. Show what the current settings
- are, again:
+ Start the client with <literal>utf8</literal> once again as the
+ default character set, then display the current settings:
<programlisting>
-mysql> <userinput>SHOW VARIABLES LIKE 'char%';</userinput>
+mysql> <userinput>SHOW VARIABLES LIKE 'char%';</userinput>
+--------------------------+----------------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------------+
@@ -4810,9 +4938,9 @@
8 rows in set (0.01 sec)
</programlisting>
- As you can see by comparing the <literal>SHOW
- VARIABLES</literal> results, the server ignores the client's
- initial settings if the
+ As you can see by comparing the differing results from
+ <literal>SHOW VARIABLES</literal>, the server ignores the
+ client's initial settings if the
<option>--skip-character-set-client-handshake</option> is used.
</para>
@@ -4820,7 +4948,8 @@
<section id="cjk-faq-fulltext-searches">
- <title>Why do some LIKE and FULLTEXT searches fail?</title>
+ <title>Why do some <literal>LIKE</literal> and <literal>FULLTEXT</literal>
+ searches fail?</title>
<indexterm type="concept">
<primary>CJK (Chinese, Japanese, Korean)</primary>
| Thread |
|---|
| • svn commit - mysqldoc@docsrva: r3701 - trunk/refman-5.0 | jon | 23 Oct |