List:Commits« Previous MessageNext Message »
From:jon Date:October 23 2006 12:55pm
Subject:svn commit - mysqldoc@docsrva: r3701 - trunk/refman-5.0
View as plain text  
Author: jstephens
Date: 2006-10-23 14:55:19 +0200 (Mon, 23 Oct 2006)
New Revision: 3701

Log:

More CJK refactoring for new FAQ.



Modified:
   trunk/refman-5.0/faqs.xml


Modified: trunk/refman-5.0/faqs.xml
===================================================================
--- trunk/refman-5.0/faqs.xml	2006-10-23 06:44:04 UTC (rev 3700)
+++ trunk/refman-5.0/faqs.xml	2006-10-23 12:55:19 UTC (rev 3701)
Changed blocks: 21, Lines Added: 268, Lines Deleted: 139; 24560 bytes

@@ -3925,9 +3925,10 @@
 
 <programlisting>
 SELECT character_set_name, collation_name
-FROM   information_schema.columns WHERE  table_schema = your_database_name
-AND    table_name = your_table_name
-AND    column_name = your_column_name;
+    FROM information_schema.columns 
+    WHERE table_schema = your_database_name
+        AND table_name = your_table_name
+        AND column_name = your_column_name;
 </programlisting>
             </para>
 

@@ -4032,10 +4033,7 @@
               Windows: <command>mysql.exe</command>). If
               <command>mysql</command> displays correctly but your
               application doesn't, then your problem is probably due to
-              settings in the application. <quote>Settings</quote>, but
-              consult also the question about <quote>Troubles with
-              Access (or Perl) (or PHP) (etc.)</quote> much later in
-              this FAQ.
+              system settings.
             </para>
 
             <para>

@@ -4111,6 +4109,17 @@
 </programlisting>
             </para>
 
+            <remark role="todo">
+              [js] Fix refs once all sections are reworked.
+            </remark>
+
+            <para>
+              It is also possible that there are issues with application
+              settings &mdash; consult also the question about
+              <quote>Troubles with Access (or Perl) (or PHP)
+              (etc.)</quote> much later in this FAQ.
+            </para>
+
           </answer>
 
         </qandaentry>

@@ -4143,8 +4152,8 @@
       </remark>
 
       <remark role="todo">
-        [js] Get the charts from Bar perferably as CSV or some other
-        format that will be easy to turn into DB-XML for inlcusion as
+        [js] Get the charts from Bar - preferably as CSV or some other
+        format that will be easy to turn into DB-XML - for inclusion as
         appendices to the RefMan.
       </remark>
 

@@ -4255,6 +4264,10 @@
         <primary>big5</primary>
       </indexterm>
 
+      <remark role="todo">
+        [js] Needs to be framed as a Q-and-A.
+      </remark>
+
       <para>
         MySQL supports the Big5 character set which is common in Hong
         Kong and Taiwan (Republic of China). MySQL's

@@ -4337,7 +4350,7 @@
           <answer>
 
             <para>
-              Here's why. In this conversion table, the
+              In the following conversion table, the
               <literal>ucs2</literal> column is the source, and the
               <literal>sjis</literal>, <literal>cp932</literal>,
               <literal>ujis</literal>, and <literal>eucjpms</literal>

@@ -4396,34 +4409,53 @@
 FULLWIDTH NOT SIGN     FFE2   3F  81CA
 </programlisting>
 
-              It means that MySQL converts the NOT SIGN (Unicode
+              This means that MySQL converts the NOT SIGN (Unicode
               <literal>U+00AC</literal>) to <literal>sjis</literal> code
               point <literal>0x81CA</literal> and to
               <literal>cp932</literal> code point <literal>3F</literal>.
               (<literal>3F</literal> is the question mark
               (<quote>?</quote>) &mdash; this is what we always use when
-              we can't convert.) Now, what should we do if we want to
-              convert <literal>sjis 81CA</literal> to
-              <literal>cp932</literal>? Our answer is: <quote>?</quote>.
-              There are serious complaints about this, many people would
-              prefer a <quote>loose</quote> conversion, so that
-              <literal>81CA (NOT SIGN)</literal> in
-              <literal>sjis</literal> becomes <literal>81CA (FULLWIDTH
-              NOT SIGN)</literal> in <literal>cp932</literal>. We are
-              considering changing.
+              we can't convert.)
             </para>
 
           </answer>
 
         </qandaentry>
 
+        <qandaentry>
+
+          <question>
+
+            <para>
+              What should I do if I want to convert <literal>sjis
+              81CA</literal> to <literal>cp932</literal>?
+            </para>
+
+          </question>
+
+          <answer>
+
+            <para>
+              Our answer is: <quote>?</quote>. There are serious
+              complaints about this: many people would prefer a
+              <quote>loose</quote> conversion, so that <literal>81CA
+              (NOT SIGN)</literal> in <literal>sjis</literal> becomes
+              <literal>81CA (FULLWIDTH NOT SIGN)</literal> in
+              <literal>cp932</literal>. We are considering a change to
+              this behaviour.
+            </para>
+
+          </answer>
+
+        </qandaentry>
+
       </qandaset>
 
     </section>
 
     <section id="cjk-faq-great-yen-sign-problem">
 
-      <title>The Great Yen Sign problem (Japanese)</title>
+      <title>How Do I Work With the Yen Sign (<literal>¥</literal>)?</title>
 
       <indexterm type="concept">
         <primary>CJK (Chinese, Japanese, Korean)</primary>

@@ -4439,30 +4471,70 @@
         <primary>Yen sign (Japanese)</primary>
       </indexterm>
 
-      <para>
-        In SJIS the code for Yen Sign (<literal>¥</literal>) is
-        <literal>5C</literal>. In SJIS the code for Reverse Solidus
-        (<literal>\</literal>) is <literal>5C</literal>. Since the above
-        statements are contradictory, confusion often results. Well, to
-        put it more seriously, some versions of Japanese character sets
-        (both <literal>sjis</literal> and <literal>euc</literal>) have
-        treated <literal>5C</literal> as a reverse solidus, also known
-        as a backslash, and others have treated it as a yen sign.
-        There's nothing we can do, except take sides: MySQL follows only
-        one version of the JIS (Japanese Industrial Standards) standard
-        description, and <emphasis>5C is Reverse Solidus</emphasis>,
-        always. Should we make a separate character set where
-        <literal>5C</literal> is Yen Sign, as another DBMS (Oracle)
-        does? We haven't decided. Certainly not in version 5.1 or 5.2.
-        But if people keep complaining about The Great Yen Sign Problem,
-        that's one possible solution.
-      </para>
+      <qandaset>
 
+        <qandaentry>
+
+          <question>
+
+            <para>
+              How does MySQL represent the Yen (<literal>¥</literal>)
+              sign?
+            </para>
+
+          </question>
+
+          <answer>
+
+            <para>
+              A problem arises because some versions of Japanese
+              character sets (both <literal>sjis</literal> and
+              <literal>euc</literal>) treat <literal>5C</literal> as a
+              reverse solidus (<literal>\</literal> &mdash; also known
+              as a backslash), and others treat it as a yen sign
+              (<literal>¥</literal>).
+            </para>
+
+            <para>
+              MySQL follows only one version of the JIS (Japanese
+              Industrial Standards) standard description. In MySQL,
+              <emphasis>5C is always Reverse Solidus</emphasis>.
+            </para>
+
+          </answer>
+
+        </qandaentry>
+
+        <qandaentry>
+
+          <question>
+
+            <para>
+              Do MySQL plan to make a separate character set where
+              <literal>5C</literal> is the Yen sign, as at least one
+              other major DBMS does?
+            </para>
+
+          </question>
+
+          <answer>
+
+            <para>
+              This is one possible solution to the Yen sign issue;
+              however, this will not happen in MySQL 5.1 or 5.2.
+            </para>
+
+          </answer>
+
+        </qandaentry>
+
+      </qandaset>
+
     </section>
 
     <section id="cjk-faq-euckr-charset-problems">
 
-      <title>Troubles with euckr character set (Korean)</title>
+      <title>Issues with <literal>euckr</literal> (Korean) Character Set Support</title>
 
       <indexterm type="concept">
         <primary>CJK (Chinese, Japanese, Korean)</primary>

@@ -4478,22 +4550,44 @@
         <primary>Korean</primary>
       </indexterm>
 
+      <qandaset>
+
+        <qandaentry>
+
+          <question>
+
+            <para>
+              Of what issues should I be aware when working with Korean
+              character sets in MySQL?
+            </para>
+
+          </question>
+
+        </qandaentry>
+
+      </qandaset>
+
       <para>
-        MySQL supports the <literal>euckr</literal> (Extended Unix Code
-        Korea) character set which is common in South Korea. In theory,
-        problems could arise because there have been several versions of
-        this character set. So far, only one problem has been noted, for
-        Korea's currency symbol. We use the <quote>ASCII</quote> variant
-        of EUC-KR, in which the code point <literal>0x5c</literal> is
-        REVERSE SOLIDUS, that is <literal>\</literal>, instead of the
-        <quote>KS-Roman</quote> variant of EUC-KR, in which the code
-        point <literal>0x5c</literal> is WON SIGN, that is
-        <quote>₩</quote>. You can't convert Unicode
-        <literal>U+20A9</literal> WON SIGN to <literal>euckr</literal>:
+        In theory, while there have been several versions of the
+        <literal>euckr</literal> (<firstterm>Extended Unix Code
+        Korea</firstterm>) character set, only one problem has been
+        noted.
+      </para>
 
+      <para>
+        We use the <quote>ASCII</quote> variant of EUC-KR, in which the
+        code point <literal>0x5c</literal> is REVERSE SOLIDUS, that is
+        <literal>\</literal>, instead of the <quote>KS-Roman</quote>
+        variant of EUC-KR, in which the code point
+        <literal>0x5c</literal> is <literal>WON
+        SIGN</literal>(<literal>₩</literal>). This means that you
+        cannot convert Unicode <literal>U+20A9</literal> to
+        <literal>euckr</literal>:
+
 <programlisting>
-mysql> <userinput>SELECT CONVERT('₩' USING euckr) AS euckr,</userinput>
--> <userinput>HEX(CONVERT('₩' USING euckr)) AS hexeuckr;</userinput>
+mysql&gt; <userinput>SELECT</userinput>
+    -&gt;     <userinput>CONVERT('₩' USING euckr) AS euckr,</userinput>
+    -&gt;     <userinput>HEX(CONVERT('₩' USING euckr)) AS hexeuckr;</userinput>
 +-------+----------+
 | euckr | hexeuckr |
 +-------+----------+

@@ -4502,6 +4596,11 @@
 1 row in set (0.00 sec)
 </programlisting>
 
+        <remark role="todo">
+          [js] Update following reference once we have incorporated
+          these charts into the Manual.
+        </remark>
+
         MySQL's graphic Korean chart is here:
         <ulink url="http://d.udm.net/bar/~bar/charts/euckr_korean_ci.html"/>.
       </para>

@@ -4542,7 +4641,7 @@
         both columns.
 
 <programlisting>
-mysql> <userinput>INSERT INTO ch VALUES ('A汌B','A汌B');</userinput>
+mysql&gt; <userinput>INSERT INTO ch VALUES ('A汌B','A汌B');</userinput>
 Query OK, 1 row affected, 1 warning (0.00 sec) 
 </programlisting>
 

@@ -4558,10 +4657,11 @@
 1 row in set (0.00 sec)
 </programlisting>
 
-        So it's a warning about the gb2312 column only.
+        So it's a warning about the <literal>gb2312</literal> column
+        only.
 
 <programlisting>
-mysql> SELECT ucs2,HEX(ucs2),gb2312,HEX(gb2312) FROM ch;
+mysql&gt; SELECT ucs2,HEX(ucs2),gb2312,HEX(gb2312) FROM ch;
 +-------+--------------+--------+-------------+
 | ucs2  | HEX(ucs2)    | gb2312 | HEX(gb2312) |
 +-------+--------------+--------+-------------+

@@ -4596,9 +4696,9 @@
               Admittedly the message is misleading. We didn't
               <quote>truncate</quote> in this case, we replaced with a
               question mark. We've had a complaint about this message
-              (bug #9337). But until we come up with something better,
-              just accept that error/warning code 2165 can mean a
-              variety of things.
+              (See Bug #9337). But until we come up with something
+              better, just accept that error/warning code 2165 can mean
+              a variety of things.
             </para>
           </listitem>
 

@@ -4616,9 +4716,9 @@
 
     </section>
 
-    <section id="cjk-faq-access-perl-php-troubles">
+    <section id="faqs-cjk-access-perl-php">
 
-      <title>Troubles with Access, Perl, PHP, etc.</title>
+      <title>Using CJK Character Sets with Access, PHP, and other APIs</title>
 
       <indexterm type="concept">
         <primary>CJK (Chinese, Japanese, Korean)</primary>

@@ -4633,19 +4733,18 @@
       <para>
         You can't get things to look right with your special program for
         a GUI front end or browser? Get a direct connection to the
-        server (with <command>mysql</command> on Unix or with
-        <command>mysql.exe</command> on Windows) and try the same query
-        there. If mysql is okay, then the trouble is probably that your
-        application interface needs some initializing. Use
-        <command>mysql</command> to tell you what character set(s) it
-        uses, by saying <literal>SHOW VARIABLES LIKE 'char%';</literal>.
-        If it's Access, you're probably connecting with MyODBC. So
-        you'll want to check out the Reference Manual page for
-        configuring an ODBC DSN, and pay attention particularly to the
-        illustrations for <quote>SQL command on connect</quote>. You
-        should enter <literal>SET NAMES 'big5'</literal> (supposing that
-        you use <literal>big5</literal>) (you don't need a
-        <literal>;</literal> here). If it's ASP, you might need to add
+        server using <command>mysql</command> (Windows:
+        <command>mysql.exe</command>) and try the same query there. If
+        <command>mysql</command> responds correctly, then the trouble
+        may be that your application interface requires initialization.
+        Use <command>mysql</command> to tell you what character set or
+        sets it uses with the statement <literal>SHOW VARIABLES LIKE
+        'char%';</literal>. If you are using Access, then you are most
+        likely connecting with MyODBC. In this case, you should check
+        <xref linkend="myodbc-configuration"/>. If, for instance, you
+        use <literal>big5</literal>, you would enter <literal>SET NAMES
+        'big5'</literal>. (Note that no <literal>;</literal> is required
+        in this case). If you are using ASP, you might need to add
         <literal>SET NAMES</literal> in the code. Here is an example
         that has worked in the past:
 

@@ -4654,48 +4753,75 @@
 Session.CodePage=0
 Dim strConnection
 Dim Conn
-strConnection="driver={MySQL ODBC 3.51 Driver};server=yourserver;uid=yourusername;" \ 
-        &amp; "pwd=yourpassword;database=yourdatabase;stmt=SET NAMES 'big5';"
-Set Conn = Server.CreateObject(<quote>ADODB.Connection</quote>) 
+strConnection="driver={MySQL ODBC 3.51 Driver};server=<replaceable>server</replaceable>;uid=<replaceable>username</replaceable>;" \ 
+               &amp; "pwd=<replaceable>password</replaceable>;database=<replaceable>database</replaceable>;stmt=SET NAMES 'big5';"
+Set Conn = Server.CreateObject("ADODB.Connection") 
 Conn.Open strConnection
 %> 
 </programlisting>
 
-        If it's PHP, here's a slightly different user suggestion:
+        In much the same way, if you are using any character set other
+        than <literal>latin1</literal> with Connector/NET, then you must
+        specify the character set in the connection string. See
+        <xref linkend="connector-net-using-connecting"/>, for more
+        information.
+      </para>
 
+      <para>
+        If you are using PHP, try this:
+
 <programlisting>
 &lt;?php 
-  $link = mysql_connect($host,$usr,$pwd); 
-  mysql_select_db($db); 
-  if (mysql_error()) { print "Database ERROR: " . mysql_error(); } 
-  mysql_query("SET CHARACTER SET utf8", $link); 
+  $link = mysql_connect($host, $usr, $pwd); 
+
+  mysql_select_db($db);
+ 
+  if( mysql_error() ) { print "Database ERROR: " . mysql_error(); }
   mysql_query("SET NAMES 'utf8'", $link); 
-?>
+?&gt;
 </programlisting>
 
-        In this case, the tipper used <literal>SET CHARACTER
-        SET</literal> statement to change
+        In this case, we used <literal>SET NAMES</literal> to change
         <literal>character_set_client</literal> and
-        <literal>character_set_result</literal>, and used <literal>SET
-        NAMES</literal> to change
-        <literal>character_set_client</literal> and
         <literal>character_set_connection</literal> and
-        <literal>character_set_results</literal>. So actually the
-        <literal>SET CHARACTER SET</literal> statement is redundant.
-        (Incidentally, MySQL people encourage the use of the
-        <literal>mysqli</literal> extension, rather than the
-        <literal>mysql</literal> example that this example uses.)
-        Another thing to check with PHP is the browser assumptions.
-        Sometimes a meta tag change in the heading area suffices, for
-        example: <literal>&lt;meta http-equiv="Content-Type"
-        content="text/html; charset=utf-8"></literal>
+        <literal>character_set_results</literal>.
       </para>
 
       <para>
-        For Connector/J tips, see the manual section in the Connectors
-        chapter titled <quote>Using Character Sets and Unicode</quote>.
+        We encourage the use of the newer <literal>mysqli</literal>
+        extension, rather than <literal>mysql</literal>. Using
+        <literal>mysqli</literal>, the previous example could be
+        rewritten as shown here:
+
+<programlisting>               
+&lt;?php 
+  $link = new mysqli($host, $usr, $pwd, $db);
+ 
+  if( mysqli_connect_errno() )
+  {
+    printf("Connect failed: %s\n", mysqli_connect_error());
+    exit();
+  }
+
+  $link-&gt;query("SET NAMES 'utf8'"); 
+?&gt;
+</programlisting>
+
+        Another issue often encountered in PHP applications has to do
+        with assumptions made by the browser. Sometimes adding or
+        changing a <literal>&lt;meta&gt;</literal> tag suffices to
+        correct the problem: for example, to insure that the user agent
+        interpets page content as <literal>UTF-8</literal>, you should
+        include <literal>&lt;meta http-equiv="Content-Type"
+        content="text/html; charset=utf-8"></literal> in the
+        <literal>&lt;head&gt;</literal> of the HTML page.
       </para>
 
+      <para>
+        If you are using Connector/J, see
+        <xref linkend="connector-j-reference-charsets"/>.
+      </para>
+
     </section>
 
     <section id="cjk-faq-restore-mysql40-behavior">

@@ -4708,12 +4834,12 @@
       </indexterm>
 
       <para>
-        In the old days, with MySQL Version 4.0, there was a single
-        <quote>global</quote> character set for both server and client
-        sides, and the decision was made by the server administrator. We
-        changed that starting with MySQL Version 4.1. What happens now
-        is a <quote>handshake</quote>. The MySQL Reference Manual
-        describes it thus:
+        In MySQL Version 4.0, there was a single <quote>global</quote>
+        character set for both server and client, and the decision as to
+        which character to use was made by the server administrator.
+        This changed starting with MySQL Version 4.1. What happens now
+        is a <quote>handshake</quote>, as described in
+        <xref linkend="charset-connection"/>:
 
         <blockquote>
 

@@ -4729,43 +4855,46 @@
 
         </blockquote>
 
-        The effect of this is: you can't control the client character
-        set by saying <literal>mysqld
-        --character-set-server=utf8</literal>. But some Asian customers
-        said that they don't like that, they want the MySQL 4.0
-        behaviour. So we added a <command>mysqld</command> switch,
-        <option>--character-set-client-handshake</option>, which (and
-        this is the interesting part) can be turned off with
+        The effect of this is that you cannot control the client
+        character set by starting <command>mysqld</command> with
+        <option>--character-set-server=utf8</option>. However, some of
+        our Asian customers have said that prefer the MySQL 4.0
+        behaviour. To make it possible to retain this behaviour, we
+        added a <command>mysqld</command> switch,
+        <option>--character-set-client-handshake</option>, which can be
+        turned off with
         <option>--skip-character-set-client-handshake</option>. If you
-        start mysqld with
-        <option>--skip-character-set-client-handshake</option>, then the
-        behaviour is like this: When a client connects, it sends to the
-        server the name of the character set that it wants to use. The
-        server ignores it! Here is an illustration with the handshake
-        switch on or off. Pretend that your favourite server character
-        set is <literal>latin1</literal> (of course that's unlikely in a
-        CJK area but it's MySQL's default if there's no
-        <filename>my.ini</filename> or <filename>my.cnf</filename>
-        file). Pretend that the client operates with
-        <literal>utf8</literal> because that's what the client's
-        operating system supports. Start the server with a default
-        character set, <literal>latin1</literal>:
+        start <command>mysqld</command> with
+        <option>--skip-character-set-client-handshake</option>, then,
+        when a client connects, it sends to the server the name of the
+        character set that it wants to use &mdash; however,
+        <emphasis>the server ignores this request from the
+        client</emphasis>.
+      </para>
 
+      <para>
+        By way of example, suppose that your favourite server character
+        set is <literal>latin1</literal> (unlikely in a CJK area, but
+        this is the default value). Suppose further that the client uses
+        <literal>utf8</literal> because this is what the client's
+        operating system supports. Now, start the server with
+        <literal>latin1</literal> as its default character set:
+
 <programlisting>
 mysqld --character-set-server=latin1
 </programlisting>
 
-        Start the client with a default character set,
+        And then start the client with the default character set
         <literal>utf8</literal>:
 
 <programlisting>
 mysql --default-character-set=utf8
 </programlisting>
 
-        Show what the current settings are:
+        The current settings can be seen as shown here:
 
 <programlisting>
-mysql> <userinput>SHOW VARIABLES LIKE 'char%';</userinput>
+mysql&gt; <userinput>SHOW VARIABLES LIKE 'char%';</userinput>
 +--------------------------+----------------------------------------+
 | Variable_name            | Value                                  |
 +--------------------------+----------------------------------------+

@@ -4781,20 +4910,19 @@
 8 rows in set (0.01 sec)
 </programlisting>
 
-        Stop the client. Stop the server with
-        <command>mysqladmin</command>. Start the server again but this
-        time say <quote>skip the handshake</quote>:
+        Now stop the client, and then stop the server using
+        <command>mysqladmin</command>. Then start the server again, but
+        this time tell it to skip the handshake like so:
 
 <programlisting>
 mysqld --character-set-server=utf8 --skip-character-set-client-handshake
 </programlisting>
 
-        Start the client with a default character set,
-        <literal>utf8</literal>, again. Show what the current settings
-        are, again:
+        Start the client with <literal>utf8</literal> once again as the
+        default character set, then display the current settings:
 
 <programlisting>
-mysql> <userinput>SHOW VARIABLES LIKE 'char%';</userinput>
+mysql&gt; <userinput>SHOW VARIABLES LIKE 'char%';</userinput>
 +--------------------------+----------------------------------------+
 | Variable_name            | Value                                  |
 +--------------------------+----------------------------------------+

@@ -4810,9 +4938,9 @@
 8 rows in set (0.01 sec)
 </programlisting>
 
-        As you can see by comparing the <literal>SHOW
-        VARIABLES</literal> results, the server ignores the client's
-        initial settings if the
+        As you can see by comparing the differing results from
+        <literal>SHOW VARIABLES</literal>, the server ignores the
+        client's initial settings if the
         <option>--skip-character-set-client-handshake</option> is used.
       </para>
 

@@ -4820,7 +4948,8 @@
 
     <section id="cjk-faq-fulltext-searches">
 
-      <title>Why do some LIKE and FULLTEXT searches fail?</title>
+      <title>Why do some <literal>LIKE</literal> and <literal>FULLTEXT</literal>
+        searches fail?</title>
 
       <indexterm type="concept">
         <primary>CJK (Chinese, Japanese, Korean)</primary>


Thread
svn commit - mysqldoc@docsrva: r3701 - trunk/refman-5.0jon23 Oct