List: Commits
From: paul.dubois  Date: May 13 2011 5:41pm
Subject: svn commit - mysqldoc@oter02: r26218 - in trunk: . refman-5.0 refman-5.1 refman-5.5 refman-5.6 refman-6.0
Author: pd221994
Date: 2011-05-13 19:41:11 +0200 (Fri, 13 May 2011)
New Revision: 26218

Log:
 r47978@dhcp-adc-twvpn-1-vpnpool-10-154-14-71:  paul | 2011-05-13 12:26:51 -0500
 Adding charset/collation instruction revisions


Modified:
   svk:merge
   trunk/refman-5.0/internationalization.xml
   trunk/refman-5.1/internationalization.xml
   trunk/refman-5.5/internationalization.xml
   trunk/refman-5.6/internationalization.xml
   trunk/refman-6.0/internationalization.xml

Property changes on: trunk
___________________________________________________________________

Modified: svk:merge
===================================================================


Changed blocks: 0, Lines Added: 0, Lines Deleted: 0; 1277 bytes


Modified: trunk/refman-5.0/internationalization.xml
===================================================================
--- trunk/refman-5.0/internationalization.xml	2011-05-13 16:12:35 UTC (rev 26217)
+++ trunk/refman-5.0/internationalization.xml	2011-05-13 17:41:11 UTC (rev 26218)
Changed blocks: 29, Lines Added: 254, Lines Deleted: 198; 26827 bytes

@@ -6129,9 +6129,9 @@
 
       <listitem>
         <para>
-          If the character set does not need to use special string
-          collating routines for sorting and does not need multi-byte
-          character support, it is simple.
+          If the character set does not need special string collating
+          routines for sorting and does not need multi-byte character
+          support, it is simple.
         </para>
       </listitem>
 

@@ -6165,7 +6165,8 @@
           <replaceable>MYSET</replaceable> to the
           <filename>sql/share/charsets/Index.xml</filename> file. Use
           the existing contents in the file as a guide to adding new
-          contents.
+          contents. A partial listing for the <literal>latin1</literal>
+          <literal>&lt;charset&gt;</literal> element follows:
         </para>
 
 <programlisting>

@@ -6179,14 +6180,19 @@
   &lt;/collation&gt;
   &lt;collation name="latin1_danish_ci"	id="15"	order="Danish"/&gt;
   ...
+  &lt;collation name="latin1_bin"		id="47"	order="Binary"&gt;
+    &lt;flag&gt;binary&lt;/flag&gt;
+    &lt;flag&gt;compiled&lt;/flag&gt;
+  &lt;/collation&gt;
+  ...
 &lt;/charset&gt;
 </programlisting>
 
         <para>
           The <literal>&lt;charset&gt;</literal> element must list all
           the collations for the character set. These must include at
-          least a binary collation and a default collation. The default
-          collation is usually named using a suffix of
+          least a binary collation and a default (primary) collation.
+          The default collation is often named using a suffix of
           <literal>general_ci</literal> (general, case insensitive). It
           is possible for the binary collation to be the default
           collation, but usually they are different. The default

@@ -6277,7 +6283,7 @@
           character set:
         </para>
 
-        <orderedlist>
+        <itemizedlist>
 
           <listitem>
             <para>

@@ -6319,7 +6325,7 @@
             </para>
           </listitem>
 
-        </orderedlist>
+        </itemizedlist>
       </listitem>
 
       <listitem>

@@ -6465,7 +6471,8 @@
 
       <para>
         Each simple character set has a configuration file located in
-        the <filename>sql/share/charsets</filename> directory. The file
+        the <filename>sql/share/charsets</filename> directory. For a
+        character set named <replaceable>MYSET</replaceable>, the file
         is named
         <filename><replaceable>MYSET</replaceable>.xml</filename>. It
         uses <literal>&lt;map&gt;</literal> array elements to list

@@ -6517,17 +6524,18 @@
         <literal>ctype_<replaceable>MYSET</replaceable>[]</literal>,
         <literal>to_lower_<replaceable>MYSET</replaceable>[]</literal>,
         and so forth. Not every complex character set has all of the
-        arrays. See the existing <filename>ctype-*.c</filename> files
-        for examples. See the <filename>CHARSET_INFO.txt</filename> file
-        in the <filename>strings</filename> directory for additional
+        arrays. See also the existing <filename>ctype-*.c</filename>
+        files for examples. See the
+        <filename>CHARSET_INFO.txt</filename> file in the
+        <filename>strings</filename> directory for additional
         information.
       </para>
 
       <para>
-        The <literal>&lt;ctype&gt;</literal> array is indexed by
-        character value + 1 and has 257 elements. This is a legacy
-        convention for handling <literal>EOF</literal>. The other arrays
-        are indexed by character value and have 256 elements.
+        Most of the arrays are indexed by character value and have 256
+        elements. The <literal>&lt;ctype&gt;</literal> array is indexed
+        by character value + 1 and has 257 elements. This is a legacy
+        convention for handling <literal>EOF</literal>.
       </para>
 
       <para>

@@ -6582,14 +6590,14 @@
 </programlisting>
 
       <para>
-        Each <literal>&lt;collation&gt;</literal> element contains a
-        mapping array that indicates how characters should be ordered
-        for comparison and sorting purposes. MySQL sorts characters
-        based on the values of this information. In some cases, this is
-        the same as the <literal>upper</literal> array, which means that
-        sorting is case-insensitive. For more complicated sorting rules
-        (for complex character sets), see the discussion of string
-        collating in <xref linkend="string-collating"/>.
+        Each <literal>&lt;collation&gt;</literal> array indicates how
+        characters should be ordered for comparison and sorting
+        purposes. MySQL sorts characters based on the values of this
+        information. In some cases, this is the same as the
+        <literal>&lt;upper&gt;</literal> array, which means that sorting
+        is case-insensitive. For more complicated sorting rules (for
+        complex character sets), see the discussion of string collating
+        in <xref linkend="string-collating"/>.
       </para>
 
     </section>

@@ -6608,12 +6616,13 @@
       </indexterm>
 
       <para>
-        For simple character sets, sorting rules are specified in the
-        <filename><replaceable>MYSET</replaceable>.xml</filename>
+        For a simple character set named
+        <replaceable>MYSET</replaceable>, sorting rules are specified in
+        the <filename><replaceable>MYSET</replaceable>.xml</filename>
         configuration file using <literal>&lt;map&gt;</literal> array
         elements within <literal>&lt;collation&gt;</literal> elements.
         If the sorting rules for your language are too complex to be
-        handled with simple arrays, you need to define string collating
+        handled with simple arrays, you must define string collating
         functions in the
         <filename>ctype-<replaceable>MYSET</replaceable>.c</filename>
         source file in the <filename>strings</filename> directory.

@@ -6628,9 +6637,10 @@
         <literal>gbk</literal>, <literal>sjis</literal>, and
         <literal>tis620</literal> character sets. Take a look at the
         <literal>MY_COLLATION_HANDLER</literal> structures to see how
-        they are used, and see the <filename>CHARSET_INFO.txt</filename>
-        file in the <filename>strings</filename> directory for
-        additional information.
+        they are used. See also the
+        <filename>CHARSET_INFO.txt</filename> file in the
+        <filename>strings</filename> directory for additional
+        information.
       </para>
 
     </section>

@@ -6649,9 +6659,9 @@
       </indexterm>
 
       <para>
-        If you want to add support for a new character set that includes
-        multi-byte characters, you need to use multi-byte character
-        functions in the
+        If you want to add support for a new character set named
+        <replaceable>MYSET</replaceable> that includes multi-byte
+        characters, you must use multi-byte character functions in the
         <filename>ctype-<replaceable>MYSET</replaceable>.c</filename>
         source file in the <filename>strings</filename> directory.
       </para>

@@ -6665,9 +6675,9 @@
         <literal>gbk</literal>, <literal>sjis</literal>, and
         <literal>ujis</literal> character sets. Take a look at the
         <literal>MY_CHARSET_HANDLER</literal> structures to see how they
-        are used, and see the <filename>CHARSET_INFO.txt</filename> file
-        in the <filename>strings</filename> directory for additional
-        information.
+        are used. See also the <filename>CHARSET_INFO.txt</filename>
+        file in the <filename>strings</filename> directory for
+        additional information.
       </para>
 
     </section>

@@ -6727,9 +6737,9 @@
     </itemizedlist>
 
     <para>
-      The following discussion describes how to add collations of the
-      first two types to existing character sets. All existing character
-      sets already have a binary collation, so there is no need here to
+      The following sections describe how to add collations of the first
+      two types to existing character sets. All existing character sets
+      already have a binary collation, so there is no need here to
       describe how to add one.
     </para>
 

@@ -6775,10 +6785,8 @@
       all the information required for a complete character set, just
       modify the appropriate files for an existing character set. That
       is, based on what is already present for the character set's
-      current collations, add new data structures, functions, and
-      configuration information for the new collation. For an example,
-      see the MySQL Blog article in the following list of additional
-      resources.
+      current collations, add data structures, functions, and
+      configuration information for the new collation.
     </para>
 
     <bridgehead>

@@ -6855,6 +6863,11 @@
 </programlisting>
 
       <para>
+        For implementation instructions, see
+        <xref linkend="adding-collation-simple-8bit"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Complex collations for 8-bit character
         sets</emphasis>
       </para>

@@ -6908,6 +6921,11 @@
       </itemizedlist>
 
       <para>
+        For implementation instructions, see
+        <xref linkend="adding-character-set"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Collations for Unicode multi-byte
         character sets</emphasis>
       </para>

@@ -6994,6 +7012,12 @@
       </para>
 
       <para>
+        For implementation instructions, for a non-UCA collation, see
+        <xref linkend="adding-character-set"/>. For a UCA collation, see
+        <xref linkend="adding-collation-unicode-uca"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Miscellaneous collations</emphasis>
       </para>
 

@@ -7019,16 +7043,16 @@
 
         <listitem>
           <para>
-            The <literal>Id</literal> column of
-            <literal role="stmt">SHOW COLLATION</literal> output
+            The <literal>ID</literal> column of the
+            <literal role="is">INFORMATION_SCHEMA.COLLATIONS</literal>
+            table
           </para>
         </listitem>
 
         <listitem>
           <para>
-            The <literal>ID</literal> column of the
-            <literal role="is">INFORMATION_SCHEMA.COLLATIONS</literal>
-            table
+            The <literal>Id</literal> column of
+            <literal role="stmt">SHOW COLLATION</literal> output
           </para>
         </listitem>
 

@@ -7193,9 +7217,9 @@
             add a <literal>&lt;collation&gt;</literal> element that
             names the collation and that contains a
             <literal>&lt;map&gt;</literal> element that defines a
-            character code-to-weight mapping table. Each word within the
-            <literal>&lt;map&gt;</literal> element must be a number in
-            hexadecimal format.
+            character code-to-weight mapping table for character codes 0
+            to 255. Each value within the <literal>&lt;map&gt;</literal>
+            element must be a number in hexadecimal format.
           </para>
 
 <programlisting>

@@ -7252,8 +7276,8 @@
         <literal>&lt;collation&gt;</literal> element within a
         <literal>&lt;charset&gt;</literal> character set description.
         The procedure described here does not require recompiling MySQL.
-        It uses a subset of the Locale Data Markup Language (LDML),
-        which is available at
+        It uses a subset of the Locale Data Markup Language (LDML)
+        specification, which is available at
         <ulink url="http://www.unicode.org/reports/tr35/"/>. In
         &current-series;, this method of adding collations is supported
         as of MySQL 5.0.46. With this method, you need not define the

@@ -7264,7 +7288,8 @@
         for which UCA collations can be defined.
       </para>
 
-      <informaltable>
+      <table>
+        <title>MySQL Character Sets Available for User-Defined UCA Collations</title>
         <tgroup cols="2">
           <colspec colwidth="30*"/>
           <colspec colwidth="60*"/>

@@ -7285,65 +7310,79 @@
             </row>
           </tbody>
         </tgroup>
-      </informaltable>
+      </table>
 
       <para>
-        The following brief summary describes the LDML characteristics
-        required to understand the procedure for adding a collation
-        given later in this section:
+        The following sections show how to add a collation that is
+        defined using LDML syntax, and provide a summary of LDML rules
+        supported in MySQL.
       </para>
 
-      <itemizedlist>
+      <section id="ldml-rules">
 
-        <listitem>
-          <para>
-            LDML has reset rules and shift rules.
-          </para>
-        </listitem>
+        <title>LDML Syntax Supported in MySQL</title>
 
-        <listitem>
-          <para>
-            Characters named in these rules can be written in
-            <literal>\u<replaceable>nnnn</replaceable></literal> format,
-            where <replaceable>nnnn</replaceable> is the hexadecimal
-            Unicode code point value. Basic Latin letters
-            <literal>A-Z</literal> and <literal>a-z</literal> can also
-            be written literally (this is a MySQL limitation; the LDML
-            specification permits literal non-Latin1 characters in the
-            rules). Only characters in the Basic Multilingual Plane can
-            be specified. This notation does not apply to characters
-            outside the BMP range of <literal>0000</literal> to
-            <literal>FFFF</literal>.
-          </para>
-        </listitem>
+        <para>
+          This section describes the LDML rules that MySQL recognizes.
+          These are a subset of the rules described in the LDML
+          specification available at
+          <ulink url="http://www.unicode.org/reports/tr35/"/>. The rules
+          here are all supported except that character sorting occurs
+          only at the primary level. Rules that specify secondary or
+          higher sort levels are recognized but have no effect.
+        </para>
 
-        <listitem>
-          <para>
-            A reset rule does not specify any ordering in and of itself.
-            Instead, it <quote>resets</quote> the ordering for
-            subsequent shift rules to cause them to be taken in relation
-            to a given character. Either of the following rules resets
-            subsequent shift rules to be taken in relation to the letter
-            <literal>'A'</literal>:
-          </para>
+        <itemizedlist>
 
+          <listitem>
+            <para>
+              Characters named in LDML rules can be written in
+              <literal>\u<replaceable>nnnn</replaceable></literal>
+              format, where <replaceable>nnnn</replaceable> is the
+              hexadecimal Unicode code point value. Basic Latin letters
+              <literal>A-Z</literal> and <literal>a-z</literal> can also
+              be written literally (this is a MySQL limitation; the LDML
+              specification permits literal non-Latin1 characters in the
+              rules). Only characters in the Basic Multilingual Plane
+              can be specified. This notation does not apply to
+              characters outside the BMP range of
+              <literal>0000</literal> to <literal>FFFF</literal>.
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              LDML has reset rules and shift rules.
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              A reset rule does not specify any ordering in and of
+              itself. Instead, it <quote>resets</quote> the ordering for
+              subsequent shift rules to cause them to be taken in
+              relation to a given character. Either of the following
+              rules resets subsequent shift rules to be taken in
+              relation to the letter <literal>'A'</literal>:
+            </para>
+
 <programlisting>
 &lt;reset&gt;A&lt;/reset&gt;
 
 &lt;reset&gt;\u0041&lt;/reset&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            Shift rules define primary, secondary, and tertiary
-            differences of a character from another character. They are
-            specified using <literal>&lt;p&gt;</literal>,
-            <literal>&lt;s&gt;</literal>, and
-            <literal>&lt;t&gt;</literal> elements. Either of the
-            following rules specifies a primary shift rule for the
-            <literal>'G'</literal> character:
-          </para>
+          <listitem>
+            <para>
+              Shift rules define primary, secondary, and tertiary
+              differences of a character from another character. They
+              are specified using <literal>&lt;p&gt;</literal>,
+              <literal>&lt;s&gt;</literal>, and
+              <literal>&lt;t&gt;</literal> elements. Either of the
+              following rules specifies a primary shift rule for the
+              <literal>'G'</literal> character:
+            </para>
 
 <programlisting>
 &lt;p&gt;G&lt;/p&gt;

@@ -7351,43 +7390,57 @@
 &lt;p&gt;\u0047&lt;/p&gt;
 </programlisting>
 
-          <itemizedlist>
+            <itemizedlist>
 
-            <listitem>
-              <para>
-                Use primary differences to distinguish separate letters.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use primary differences to distinguish separate
+                  letters.
+                </para>
+              </listitem>
 
-            <listitem>
-              <para>
-                Use secondary differences to distinguish accent
-                variations.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use secondary differences to distinguish accent
+                  variations.
+                </para>
+              </listitem>
 
-            <listitem>
-              <para>
-                Use tertiary differences to distinguish lettercase
-                variations.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use tertiary differences to distinguish lettercase
+                  variations.
+                </para>
+              </listitem>
 
-          </itemizedlist>
-        </listitem>
+            </itemizedlist>
+          </listitem>
 
-      </itemizedlist>
+        </itemizedlist>
 
-      <para>
-        To add a UCA collation for a Unicode character set without
-        recompiling MySQL, use the following procedure. The example adds
-        a collation named <literal>utf8_phone_ci</literal> to the
-        <literal>utf8</literal> character set. The collation is designed
-        for a scenario involving a Web application for which users post
-        their names and phone numbers. Phone numbers can be given in
-        very different formats:
-      </para>
+      </section>
 
+      <section id="ldml-collation-example">
+
+        <title>Defining a UCA Collation using LDML Syntax</title>
+
+        <para>
+          To add a UCA collation for a Unicode character set without
+          recompiling MySQL, use the following procedure. If you are
+          unfamiliar with the LDML rules used to describe the
+          collation's sort characteristics, see
+          <xref linkend="ldml-rules"/>.
+        </para>
+
+        <para>
+          The example adds a collation named
+          <literal>utf8_phone_ci</literal> to the
+          <literal>utf8</literal> character set. The collation is
+          designed for a scenario involving a Web application for which
+          users post their names and phone numbers. Phone numbers can be
+          given in very different formats:
+        </para>
+
 <programlisting>
 +7-12345-67
 +7-12-345-67

@@ -7396,33 +7449,33 @@
 +71234567
 </programlisting>
 
-      <para>
-        The problem raised by dealing with these kinds of values is that
-        the varying permissible formats make searching for a specific
-        phone number very difficult. The solution is to define a new
-        collation that reorders punctuation characters, making them
-        ignorable.
-      </para>
+        <para>
+          The problem raised by dealing with these kinds of values is
+          that the varying permissible formats make searching for a
+          specific phone number very difficult. The solution is to
+          define a new collation that reorders punctuation characters,
+          making them ignorable.
+        </para>
 
-      <orderedlist>
+        <orderedlist>
 
-        <listitem>
-          <para>
-            Choose a collation ID, as shown in
-            <xref linkend="adding-collation-choosing-id"/>. The
-            following steps use an ID of 252.
-          </para>
-        </listitem>
+          <listitem>
+            <para>
+              Choose a collation ID, as shown in
+              <xref linkend="adding-collation-choosing-id"/>. The
+              following steps use an ID of 252.
+            </para>
+          </listitem>
 
-        <listitem>
-          <para>
-            Modify the <literal>Index.xml</literal> configuration
-            file. This file will be located in the directory named by
-            the <literal role="sysvar">character_sets_dir</literal>
-            system variable. You can check the variable value as
-            follows, although the path name might be different on your
-            system:
-          </para>
+          <listitem>
+            <para>
+              Modify the <literal>Index.xml</literal> configuration
+              file. This file will be located in the directory named by
+              the <literal role="sysvar">character_sets_dir</literal>
+              system variable. You can check the variable value as
+              follows, although the path name might be different on your
+              system:
+            </para>
 
 <programlisting>
 mysql&gt; <userinput>SHOW VARIABLES LIKE 'character_sets_dir';</userinput>

@@ -7432,21 +7485,22 @@
 | character_sets_dir | /user/local/mysql/share/mysql/charsets/ |
 +--------------------+-----------------------------------------+
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            Choose a name for the collation and list it in the
-            <filename>Index.xml</filename> file. In addition, you'll
-            need to provide the collation ordering rules. Find the
-            <literal>&lt;charset&gt;</literal> element for the character
-            set to which the collation is being added, and add a
-            <literal>&lt;collation&gt;</literal> element that indicates
-            the collation name and ID, to associate the name with the
-            ID. Within the <literal>&lt;collation&gt;</literal> element,
-            provide a <literal>&lt;rules&gt;</literal> element
-            containing the ordering rules:
-          </para>
+          <listitem>
+            <para>
+              Choose a name for the collation and list it in the
+              <filename>Index.xml</filename> file. In addition, you'll
+              need to provide the collation ordering rules. Find the
+              <literal>&lt;charset&gt;</literal> element for the
+              character set to which the collation is being added, and
+              add a <literal>&lt;collation&gt;</literal> element that
+              indicates the collation name and ID, to associate the name
+              with the ID. Within the
+              <literal>&lt;collation&gt;</literal> element, provide a
+              <literal>&lt;rules&gt;</literal> element containing the
+              ordering rules:
+            </para>
 
 <programlisting>
 &lt;charset name="utf8"&gt;

@@ -7464,25 +7518,25 @@
   ...
 &lt;/charset&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            If you want a similar collation for other Unicode character
-            sets, add other <literal>&lt;collation&gt;</literal>
-            elements. For example, to define
-            <literal>ucs2_phone_ci</literal>, add a
-            <literal>&lt;collation&gt;</literal> element to the
-            <literal>&lt;charset name="ucs2"&gt;</literal> element.
-            Remember that each collation must have its own unique ID.
-          </para>
-        </listitem>
+          <listitem>
+            <para>
+              If you want a similar collation for other Unicode
+              character sets, add other
+              <literal>&lt;collation&gt;</literal> elements. For
+              example, to define <literal>ucs2_phone_ci</literal>, add a
+              <literal>&lt;collation&gt;</literal> element to the
+              <literal>&lt;charset name="ucs2"&gt;</literal> element.
+              Remember that each collation must have its own unique ID.
+            </para>
+          </listitem>
 
-        <listitem>
-          <para>
-            Restart the server and use this statement to verify that the
-            collation is present:
-          </para>
+          <listitem>
+            <para>
+              Restart the server and use this statement to verify that
+              the collation is present:
+            </para>
 
 <programlisting>
 mysql&gt; <userinput>SHOW COLLATION LIKE 'utf8_phone_ci';</userinput>

@@ -7492,19 +7546,19 @@
 | utf8_phone_ci | utf8    | 252 |         |          |       8 |
 +---------------+---------+-----+---------+----------+---------+
 </programlisting>
-        </listitem>
+          </listitem>
 
-      </orderedlist>
+        </orderedlist>
 
-      <para>
-        Now test the collation to make sure that it has the desired
-        properties.
-      </para>
+        <para>
+          Now test the collation to make sure that it has the desired
+          properties.
+        </para>
 
-      <para>
-        Create a table containing some sample phone numbers using the
-        new collation:
-      </para>
+        <para>
+          Create a table containing some sample phone numbers using the
+          new collation:
+        </para>
 
 <programlisting>
 <!--

@@ -7533,10 +7587,10 @@
 Query OK, 1 row affected (0.00 sec)
 </programlisting>
 
-      <para>
-        Run some queries to see whether the ignored punctuation
-        characters are in fact ignored for sorting and comparisons:
-      </para>
+        <para>
+          Run some queries to see whether the ignored punctuation
+          characters are in fact ignored for sorting and comparisons:
+        </para>
 
 <programlisting>
 mysql&gt; <userinput>SELECT * FROM phonebook ORDER BY phone;</userinput>

@@ -7576,6 +7630,8 @@
 1 row in set (0.00 sec)
 </programlisting>
 
+      </section>
+
     </section>
 
   </section>


Modified: trunk/refman-5.1/internationalization.xml
===================================================================
--- trunk/refman-5.1/internationalization.xml	2011-05-13 16:12:35 UTC (rev 26217)
+++ trunk/refman-5.1/internationalization.xml	2011-05-13 17:41:11 UTC (rev 26218)
Changed blocks: 29, Lines Added: 254, Lines Deleted: 198; 26827 bytes

@@ -6319,9 +6319,9 @@
 
       <listitem>
         <para>
-          If the character set does not need to use special string
-          collating routines for sorting and does not need multi-byte
-          character support, it is simple.
+          If the character set does not need special string collating
+          routines for sorting and does not need multi-byte character
+          support, it is simple.
         </para>
       </listitem>
 

@@ -6355,7 +6355,8 @@
           <replaceable>MYSET</replaceable> to the
           <filename>sql/share/charsets/Index.xml</filename> file. Use
           the existing contents in the file as a guide to adding new
-          contents.
+          contents. A partial listing for the <literal>latin1</literal>
+          <literal>&lt;charset&gt;</literal> element follows:
         </para>
 
 <programlisting>

@@ -6369,14 +6370,19 @@
   &lt;/collation&gt;
   &lt;collation name="latin1_danish_ci"	id="15"	order="Danish"/&gt;
   ...
+  &lt;collation name="latin1_bin"		id="47"	order="Binary"&gt;
+    &lt;flag&gt;binary&lt;/flag&gt;
+    &lt;flag&gt;compiled&lt;/flag&gt;
+  &lt;/collation&gt;
+  ...
 &lt;/charset&gt;
 </programlisting>
 
         <para>
           The <literal>&lt;charset&gt;</literal> element must list all
           the collations for the character set. These must include at
-          least a binary collation and a default collation. The default
-          collation is usually named using a suffix of
+          least a binary collation and a default (primary) collation.
+          The default collation is often named using a suffix of
           <literal>general_ci</literal> (general, case insensitive). It
           is possible for the binary collation to be the default
           collation, but usually they are different. The default

@@ -6467,7 +6473,7 @@
           character set:
         </para>
 
-        <orderedlist>
+        <itemizedlist>
 
           <listitem>
             <para>

@@ -6509,7 +6515,7 @@
             </para>
           </listitem>
 
-        </orderedlist>
+        </itemizedlist>
       </listitem>
 
       <listitem>

@@ -6655,7 +6661,8 @@
 
       <para>
         Each simple character set has a configuration file located in
-        the <filename>sql/share/charsets</filename> directory. The file
+        the <filename>sql/share/charsets</filename> directory. For a
+        character set named <replaceable>MYSET</replaceable>, the file
         is named
         <filename><replaceable>MYSET</replaceable>.xml</filename>. It
         uses <literal>&lt;map&gt;</literal> array elements to list

@@ -6707,17 +6714,18 @@
         <literal>ctype_<replaceable>MYSET</replaceable>[]</literal>,
         <literal>to_lower_<replaceable>MYSET</replaceable>[]</literal>,
         and so forth. Not every complex character set has all of the
-        arrays. See the existing <filename>ctype-*.c</filename> files
-        for examples. See the <filename>CHARSET_INFO.txt</filename> file
-        in the <filename>strings</filename> directory for additional
+        arrays. See the existing <filename>ctype-*.c</filename> files
+        for examples. See also the
+        <filename>CHARSET_INFO.txt</filename> file in the
+        <filename>strings</filename> directory for additional
         information.
       </para>
 
       <para>
-        The <literal>&lt;ctype&gt;</literal> array is indexed by
-        character value + 1 and has 257 elements. This is a legacy
-        convention for handling <literal>EOF</literal>. The other arrays
-        are indexed by character value and have 256 elements.
+        Most of the arrays are indexed by character value and have 256
+        elements. The <literal>&lt;ctype&gt;</literal> array is indexed
+        by character value + 1 and has 257 elements. This is a legacy
+        convention for handling <literal>EOF</literal>.
       </para>
 
       <para>
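
Reviewer note on the hunk above: the "character value + 1" indexing of the ctype array can be sketched in a few lines. This is only an illustration, not MySQL source code. It assumes EOF is represented as -1 (as in C), and the flag bits and helper names (IS_DIGIT, IS_SPACE, build_ctype, is_digit) are hypothetical.

```python
# Sketch of a 257-element ctype table indexed by (character value + 1).
# Assumption: EOF is -1, so EOF + 1 selects slot 0, which carries no flags.
EOF = -1

# Hypothetical flag bits for illustration; not MySQL's actual values.
IS_DIGIT = 0x01
IS_SPACE = 0x02


def build_ctype():
    """Build a 257-element table: slot 0 for EOF, slots 1..256 for codes 0..255."""
    table = [0] * 257
    for code in range(256):
        flags = 0
        ch = chr(code)
        if "0" <= ch <= "9":
            flags |= IS_DIGIT
        if ch in " \t\n\r\v\f":
            flags |= IS_SPACE
        table[code + 1] = flags  # shift by one to leave room for EOF
    return table


CTYPE = build_ctype()


def is_digit(value):
    """Classify a character code (0..255) or EOF without a bounds check."""
    return bool(CTYPE[value + 1] & IS_DIGIT)
```

Because of the offset, a lookup for EOF lands on slot 0 instead of indexing before the start of the table, which is why this one array has 257 elements while the others have 256.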

@@ -6772,14 +6780,14 @@
 </programlisting>
 
       <para>
-        Each <literal>&lt;collation&gt;</literal> element contains a
-        mapping array that indicates how characters should be ordered
-        for comparison and sorting purposes. MySQL sorts characters
-        based on the values of this information. In some cases, this is
-        the same as the <literal>upper</literal> array, which means that
-        sorting is case-insensitive. For more complicated sorting rules
-        (for complex character sets), see the discussion of string
-        collating in <xref linkend="string-collating"/>.
+        Each <literal>&lt;collation&gt;</literal> array indicates how
+        characters should be ordered for comparison and sorting
+        purposes. MySQL sorts characters based on the values of this
+        information. In some cases, this is the same as the
+        <literal>&lt;upper&gt;</literal> array, which means that sorting
+        is case-insensitive. For more complicated sorting rules (for
+        complex character sets), see the discussion of string collating
+        in <xref linkend="string-collating"/>.
       </para>
 
     </section>

@@ -6798,12 +6806,13 @@
       </indexterm>
 
       <para>
-        For simple character sets, sorting rules are specified in the
-        <filename><replaceable>MYSET</replaceable>.xml</filename>
+        For a simple character set named
+        <replaceable>MYSET</replaceable>, sorting rules are specified in
+        the <filename><replaceable>MYSET</replaceable>.xml</filename>
         configuration file using <literal>&lt;map&gt;</literal> array
         elements within <literal>&lt;collation&gt;</literal> elements.
         If the sorting rules for your language are too complex to be
-        handled with simple arrays, you need to define string collating
+        handled with simple arrays, you must define string collating
         functions in the
         <filename>ctype-<replaceable>MYSET</replaceable>.c</filename>
         source file in the <filename>strings</filename> directory.

@@ -6818,9 +6827,10 @@
         <literal>gbk</literal>, <literal>sjis</literal>, and
         <literal>tis620</literal> character sets. Take a look at the
         <literal>MY_COLLATION_HANDLER</literal> structures to see how
-        they are used, and see the <filename>CHARSET_INFO.txt</filename>
-        file in the <filename>strings</filename> directory for
-        additional information.
+        they are used. See also the
+        <filename>CHARSET_INFO.txt</filename> file in the
+        <filename>strings</filename> directory for additional
+        information.
       </para>
 
     </section>

@@ -6839,9 +6849,9 @@
       </indexterm>
 
       <para>
-        If you want to add support for a new character set that includes
-        multi-byte characters, you need to use multi-byte character
-        functions in the
+        If you want to add support for a new character set named
+        <replaceable>MYSET</replaceable> that includes multi-byte
+        characters, you must use multi-byte character functions in the
         <filename>ctype-<replaceable>MYSET</replaceable>.c</filename>
         source file in the <filename>strings</filename> directory.
       </para>

@@ -6855,9 +6865,9 @@
         <literal>gbk</literal>, <literal>sjis</literal>, and
         <literal>ujis</literal> character sets. Take a look at the
         <literal>MY_CHARSET_HANDLER</literal> structures to see how they
-        are used, and see the <filename>CHARSET_INFO.txt</filename> file
-        in the <filename>strings</filename> directory for additional
-        information.
+        are used. See also the <filename>CHARSET_INFO.txt</filename>
+        file in the <filename>strings</filename> directory for
+        additional information.
       </para>
 
     </section>

@@ -6917,9 +6927,9 @@
     </itemizedlist>
 
     <para>
-      The following discussion describes how to add collations of the
-      first two types to existing character sets. All existing character
-      sets already have a binary collation, so there is no need here to
+      The following sections describe how to add collations of the first
+      two types to existing character sets. All existing character sets
+      already have a binary collation, so there is no need here to
       describe how to add one.
     </para>
 

@@ -6965,10 +6975,8 @@
       all the information required for a complete character set, just
       modify the appropriate files for an existing character set. That
       is, based on what is already present for the character set's
-      current collations, add new data structures, functions, and
-      configuration information for the new collation. For an example,
-      see the MySQL Blog article in the following list of additional
-      resources.
+      current collations, add data structures, functions, and
+      configuration information for the new collation.
     </para>
 
     <bridgehead>

@@ -7045,6 +7053,11 @@
 </programlisting>
 
       <para>
+        For implementation instructions, see
+        <xref linkend="adding-collation-simple-8bit"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Complex collations for 8-bit character
         sets</emphasis>
       </para>

@@ -7098,6 +7111,11 @@
       </itemizedlist>
 
       <para>
+        For implementation instructions, see
+        <xref linkend="adding-character-set"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Collations for Unicode multi-byte
         character sets</emphasis>
       </para>

@@ -7184,6 +7202,12 @@
       </para>
 
       <para>
+        For implementation instructions, for a non-UCA collation, see
+        <xref linkend="adding-character-set"/>. For a UCA collation, see
+        <xref linkend="adding-collation-unicode-uca"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Miscellaneous collations</emphasis>
       </para>
 

@@ -7209,16 +7233,16 @@
 
         <listitem>
           <para>
-            The <literal>Id</literal> column of
-            <literal role="stmt">SHOW COLLATION</literal> output
+            The <literal>ID</literal> column of the
+            <literal role="is">INFORMATION_SCHEMA.COLLATIONS</literal>
+            table
           </para>
         </listitem>
 
         <listitem>
           <para>
-            The <literal>ID</literal> column of the
-            <literal role="is">INFORMATION_SCHEMA.COLLATIONS</literal>
-            table
+            The <literal>Id</literal> column of
+            <literal role="stmt">SHOW COLLATION</literal> output
           </para>
         </listitem>
 

@@ -7383,9 +7407,9 @@
             add a <literal>&lt;collation&gt;</literal> element that
             names the collation and that contains a
             <literal>&lt;map&gt;</literal> element that defines a
-            character code-to-weight mapping table. Each word within the
-            <literal>&lt;map&gt;</literal> element must be a number in
-            hexadecimal format.
+            character code-to-weight mapping table for character codes 0
+            to 255. Each value within the <literal>&lt;map&gt;</literal>
+            element must be a number in hexadecimal format.
           </para>
 
 <programlisting>

@@ -7442,8 +7466,8 @@
         <literal>&lt;collation&gt;</literal> element within a
         <literal>&lt;charset&gt;</literal> character set description.
         The procedure described here does not require recompiling MySQL.
-        It uses a subset of the Locale Data Markup Language (LDML),
-        which is available at
+        It uses a subset of the Locale Data Markup Language (LDML)
+        specification, which is available at
         <ulink url="http://www.unicode.org/reports/tr35/"/>. In
         &current-series;, this method of adding collations is supported
         as of MySQL 5.1.20. With this method, you need not define the

@@ -7454,7 +7478,8 @@
         for which UCA collations can be defined.
       </para>
 
-      <informaltable>
+      <table>
+        <title>MySQL Character Sets Available for User-Defined UCA Collations</title>
         <tgroup cols="2">
           <colspec colwidth="30*"/>
           <colspec colwidth="60*"/>

@@ -7475,65 +7500,79 @@
             </row>
           </tbody>
         </tgroup>
-      </informaltable>
+      </table>
 
       <para>
-        The following brief summary describes the LDML characteristics
-        required to understand the procedure for adding a collation
-        given later in this section:
+        The following sections show how to add a collation that is
+        defined using LDML syntax, and provide a summary of LDML rules
+        supported in MySQL.
       </para>
 
-      <itemizedlist>
+      <section id="ldml-rules">
 
-        <listitem>
-          <para>
-            LDML has reset rules and shift rules.
-          </para>
-        </listitem>
+        <title>LDML Syntax Supported in MySQL</title>
 
-        <listitem>
-          <para>
-            Characters named in these rules can be written in
-            <literal>\u<replaceable>nnnn</replaceable></literal> format,
-            where <replaceable>nnnn</replaceable> is the hexadecimal
-            Unicode code point value. Basic Latin letters
-            <literal>A-Z</literal> and <literal>a-z</literal> can also
-            be written literally (this is a MySQL limitation; the LDML
-            specification permits literal non-Latin1 characters in the
-            rules). Only characters in the Basic Multilingual Plane can
-            be specified. This notation does not apply to characters
-            outside the BMP range of <literal>0000</literal> to
-            <literal>FFFF</literal>.
-          </para>
-        </listitem>
+        <para>
+          This section describes the LDML rules that MySQL recognizes.
+          These are a subset of the rules described in the LDML
+          specification available at
+          <ulink url="http://www.unicode.org/reports/tr35/"/>. The rules
+          here are all supported except that character sorting occurs
+          only at the primary level. Rules that specify secondary or
+          higher sort levels are recognized but have no effect.
+        </para>
 
-        <listitem>
-          <para>
-            A reset rule does not specify any ordering in and of itself.
-            Instead, it <quote>resets</quote> the ordering for
-            subsequent shift rules to cause them to be taken in relation
-            to a given character. Either of the following rules resets
-            subsequent shift rules to be taken in relation to the letter
-            <literal>'A'</literal>:
-          </para>
+        <itemizedlist>
 
+          <listitem>
+            <para>
+              Characters named in LDML rules can be written in
+              <literal>\u<replaceable>nnnn</replaceable></literal>
+              format, where <replaceable>nnnn</replaceable> is the
+              hexadecimal Unicode code point value. Basic Latin letters
+              <literal>A-Z</literal> and <literal>a-z</literal> can also
+              be written literally (this is a MySQL limitation; the LDML
+              specification permits literal non-Latin1 characters in the
+              rules). Only characters in the Basic Multilingual Plane
+              can be specified. This notation does not apply to
+              characters outside the BMP range of
+              <literal>0000</literal> to <literal>FFFF</literal>.
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              LDML has reset rules and shift rules.
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              A reset rule does not specify any ordering in and of
+              itself. Instead, it <quote>resets</quote> the ordering for
+              subsequent shift rules to cause them to be taken in
+              relation to a given character. Either of the following
+              rules resets subsequent shift rules to be taken in
+              relation to the letter <literal>'A'</literal>:
+            </para>
+
 <programlisting>
 &lt;reset&gt;A&lt;/reset&gt;
 
 &lt;reset&gt;\u0041&lt;/reset&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            Shift rules define primary, secondary, and tertiary
-            differences of a character from another character. They are
-            specified using <literal>&lt;p&gt;</literal>,
-            <literal>&lt;s&gt;</literal>, and
-            <literal>&lt;t&gt;</literal> elements. Either of the
-            following rules specifies a primary shift rule for the
-            <literal>'G'</literal> character:
-          </para>
+          <listitem>
+            <para>
+              Shift rules define primary, secondary, and tertiary
+              differences of a character from another character. They
+              are specified using <literal>&lt;p&gt;</literal>,
+              <literal>&lt;s&gt;</literal>, and
+              <literal>&lt;t&gt;</literal> elements. Either of the
+              following rules specifies a primary shift rule for the
+              <literal>'G'</literal> character:
+            </para>
 
 <programlisting>
 &lt;p&gt;G&lt;/p&gt;

@@ -7541,43 +7580,57 @@
 &lt;p&gt;\u0047&lt;/p&gt;
 </programlisting>
 
-          <itemizedlist>
+            <itemizedlist>
 
-            <listitem>
-              <para>
-                Use primary differences to distinguish separate letters.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use primary differences to distinguish separate
+                  letters.
+                </para>
+              </listitem>
 
-            <listitem>
-              <para>
-                Use secondary differences to distinguish accent
-                variations.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use secondary differences to distinguish accent
+                  variations.
+                </para>
+              </listitem>
 
-            <listitem>
-              <para>
-                Use tertiary differences to distinguish lettercase
-                variations.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use tertiary differences to distinguish lettercase
+                  variations.
+                </para>
+              </listitem>
 
-          </itemizedlist>
-        </listitem>
+            </itemizedlist>
+          </listitem>
 
-      </itemizedlist>
+        </itemizedlist>
 
-      <para>
-        To add a UCA collation for a Unicode character set without
-        recompiling MySQL, use the following procedure. The example adds
-        a collation named <literal>utf8_phone_ci</literal> to the
-        <literal>utf8</literal> character set. The collation is designed
-        for a scenario involving a Web application for which users post
-        their names and phone numbers. Phone numbers can be given in
-        very different formats:
-      </para>
+      </section>
 
+      <section id="ldml-collation-example">
+
+        <title>Defining a UCA Collation Using LDML Syntax</title>
+
+        <para>
+          To add a UCA collation for a Unicode character set without
+          recompiling MySQL, use the following procedure. If you are
+          unfamiliar with the LDML rules used to describe the
+          collation's sort characteristics, see
+          <xref linkend="ldml-rules"/>.
+        </para>
+
+        <para>
+          The example adds a collation named
+          <literal>utf8_phone_ci</literal> to the
+          <literal>utf8</literal> character set. The collation is
+          designed for a scenario involving a Web application for which
+          users post their names and phone numbers. Phone numbers can be
+          given in very different formats:
+        </para>
+
 <programlisting>
 +7-12345-67
 +7-12-345-67

@@ -7586,33 +7639,33 @@
 +71234567
 </programlisting>
 
-      <para>
-        The problem raised by dealing with these kinds of values is that
-        the varying permissible formats make searching for a specific
-        phone number very difficult. The solution is to define a new
-        collation that reorders punctuation characters, making them
-        ignorable.
-      </para>
+        <para>
+          The problem raised by dealing with these kinds of values is
+          that the varying permissible formats make searching for a
+          specific phone number very difficult. The solution is to
+          define a new collation that reorders punctuation characters,
+          making them ignorable.
+        </para>
 
-      <orderedlist>
+        <orderedlist>
 
-        <listitem>
-          <para>
-            Choose a collation ID, as shown in
-            <xref linkend="adding-collation-choosing-id"/>. The
-            following steps use an ID of 252.
-          </para>
-        </listitem>
+          <listitem>
+            <para>
+              Choose a collation ID, as shown in
+              <xref linkend="adding-collation-choosing-id"/>. The
+              following steps use an ID of 252.
+            </para>
+          </listitem>
 
-        <listitem>
-          <para>
-            To modify the <literal>Index.xml</literal> configuration
-            file. This file will be located in the directory named by
-            the <literal role="sysvar">character_sets_dir</literal>
-            system variable. You can check the variable value as
-            follows, although the path name might be different on your
-            system:
-          </para>
+          <listitem>
+            <para>
+              Modify the <literal>Index.xml</literal> configuration
+              file, which is located in the directory named by
+              the <literal role="sysvar">character_sets_dir</literal>
+              system variable. You can check the variable value as
+              follows, although the path name might be different on your
+              system:
+            </para>
 
 <programlisting>
 mysql&gt; <userinput>SHOW VARIABLES LIKE 'character_sets_dir';</userinput>

@@ -7622,21 +7675,22 @@
 | character_sets_dir | /usr/local/mysql/share/mysql/charsets/  |
 +--------------------+-----------------------------------------+
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            Choose a name for the collation and list it in the
-            <filename>Index.xml</filename> file. In addition, you'll
-            need to provide the collation ordering rules. Find the
-            <literal>&lt;charset&gt;</literal> element for the character
-            set to which the collation is being added, and add a
-            <literal>&lt;collation&gt;</literal> element that indicates
-            the collation name and ID, to associate the name with the
-            ID. Within the <literal>&lt;collation&gt;</literal> element,
-            provide a <literal>&lt;rules&gt;</literal> element
-            containing the ordering rules:
-          </para>
+          <listitem>
+            <para>
+              Choose a name for the collation and list it in the
+              <filename>Index.xml</filename> file. In addition, you'll
+              need to provide the collation ordering rules. Find the
+              <literal>&lt;charset&gt;</literal> element for the
+              character set to which the collation is being added, and
+              add a <literal>&lt;collation&gt;</literal> element that
+              indicates the collation name and ID, to associate the name
+              with the ID. Within the
+              <literal>&lt;collation&gt;</literal> element, provide a
+              <literal>&lt;rules&gt;</literal> element containing the
+              ordering rules:
+            </para>
 
 <programlisting>
 &lt;charset name="utf8"&gt;

@@ -7654,25 +7708,25 @@
   ...
 &lt;/charset&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            If you want a similar collation for other Unicode character
-            sets, add other <literal>&lt;collation&gt;</literal>
-            elements. For example, to define
-            <literal>ucs2_phone_ci</literal>, add a
-            <literal>&lt;collation&gt;</literal> element to the
-            <literal>&lt;charset name="ucs2"&gt;</literal> element.
-            Remember that each collation must have its own unique ID.
-          </para>
-        </listitem>
+          <listitem>
+            <para>
+              If you want a similar collation for other Unicode
+              character sets, add other
+              <literal>&lt;collation&gt;</literal> elements. For
+              example, to define <literal>ucs2_phone_ci</literal>, add a
+              <literal>&lt;collation&gt;</literal> element to the
+              <literal>&lt;charset name="ucs2"&gt;</literal> element.
+              Remember that each collation must have its own unique ID.
+            </para>
+          </listitem>
 
-        <listitem>
-          <para>
-            Restart the server and use this statement to verify that the
-            collation is present:
-          </para>
+          <listitem>
+            <para>
+              Restart the server and use this statement to verify that
+              the collation is present:
+            </para>
 
 <programlisting>
 mysql&gt; <userinput>SHOW COLLATION LIKE 'utf8_phone_ci';</userinput>

@@ -7682,19 +7736,19 @@
 | utf8_phone_ci | utf8    | 252 |         |          |       8 |
 +---------------+---------+-----+---------+----------+---------+
 </programlisting>
-        </listitem>
+          </listitem>
 
-      </orderedlist>
+        </orderedlist>
 
-      <para>
-        Now test the collation to make sure that it has the desired
-        properties.
-      </para>
+        <para>
+          Now test the collation to make sure that it has the desired
+          properties.
+        </para>
 
-      <para>
-        Create a table containing some sample phone numbers using the
-        new collation:
-      </para>
+        <para>
+          Create a table containing some sample phone numbers using the
+          new collation:
+        </para>
 
 <programlisting>
 <!--

@@ -7723,10 +7777,10 @@
 Query OK, 1 row affected (0.00 sec)
 </programlisting>
 
-      <para>
-        Run some queries to see whether the ignored punctuation
-        characters are in fact ignored for sorting and comparisons:
-      </para>
+        <para>
+          Run some queries to see whether the ignored punctuation
+          characters are in fact ignored for sorting and comparisons:
+        </para>
 
 <programlisting>
 mysql&gt; <userinput>SELECT * FROM phonebook ORDER BY phone;</userinput>

@@ -7766,6 +7820,8 @@
 1 row in set (0.00 sec)
 </programlisting>
 
+      </section>
+
     </section>
 
   </section>
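
Reviewer note on the utf8_phone_ci example above: the intended effect of making punctuation ignorable can be emulated in application code by comparing on a key with those characters removed. This is a rough sketch only. It does not reflect how MySQL assigns collation weights, and the set of ignorable characters (space, parentheses, plus, hyphen) is an assumption, since the rules themselves are not shown in this diff. The names IGNORABLE, phone_key, and phone_equal are hypothetical.

```python
# Emulate "ignorable punctuation" by stripping it before comparing.
# Assumption: these are the characters the collation treats as ignorable.
IGNORABLE = set(" ()+-")


def phone_key(value):
    """Return the comparison key: the input with ignorable characters removed."""
    return "".join(ch for ch in value if ch not in IGNORABLE)


def phone_equal(a, b):
    """Compare two phone numbers the way the example collation is meant to."""
    return phone_key(a) == phone_key(b)


if __name__ == "__main__":
    numbers = ["+7-12345-67", "+7-12-345-67", "+71234567"]
    # All three formats reduce to the same key, so they compare as equal.
    print({phone_key(n) for n in numbers})
```

Stripping characters in the application is a workaround; the collation approach in the documentation moves the same logic into the server so that ordinary comparisons and ORDER BY behave this way without rewriting queries.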


Modified: trunk/refman-5.5/internationalization.xml
===================================================================
--- trunk/refman-5.5/internationalization.xml	2011-05-13 16:12:35 UTC (rev 26217)
+++ trunk/refman-5.5/internationalization.xml	2011-05-13 17:41:11 UTC (rev 26218)
Changed blocks: 29, Lines Added: 267, Lines Deleted: 210; 27861 bytes

@@ -7585,9 +7585,9 @@
 
       <listitem>
         <para>
-          If the character set does not need to use special string
-          collating routines for sorting and does not need multi-byte
-          character support, it is simple.
+          If the character set does not need special string collating
+          routines for sorting and does not need multi-byte character
+          support, it is simple.
         </para>
       </listitem>
 

@@ -7621,7 +7621,8 @@
           <replaceable>MYSET</replaceable> to the
           <filename>sql/share/charsets/Index.xml</filename> file. Use
           the existing contents in the file as a guide to adding new
-          contents.
+          contents. A partial listing for the <literal>latin1</literal>
+          <literal>&lt;charset&gt;</literal> element follows:
         </para>
 
 <programlisting>

@@ -7635,14 +7636,19 @@
   &lt;/collation&gt;
   &lt;collation name="latin1_danish_ci"	id="15"	order="Danish"/&gt;
   ...
+  &lt;collation name="latin1_bin"		id="47"	order="Binary"&gt;
+    &lt;flag&gt;binary&lt;/flag&gt;
+    &lt;flag&gt;compiled&lt;/flag&gt;
+  &lt;/collation&gt;
+  ...
 &lt;/charset&gt;
 </programlisting>
 
         <para>
           The <literal>&lt;charset&gt;</literal> element must list all
           the collations for the character set. These must include at
-          least a binary collation and a default collation. The default
-          collation is usually named using a suffix of
+          least a binary collation and a default (primary) collation.
+          The default collation is often named using a suffix of
           <literal>general_ci</literal> (general, case insensitive). It
           is possible for the binary collation to be the default
           collation, but usually they are different. The default

@@ -7735,7 +7741,7 @@
           character set:
         </para>
 
-        <orderedlist>
+        <itemizedlist>
 
           <listitem>
             <para>

@@ -7777,7 +7783,7 @@
             </para>
           </listitem>
 
-        </orderedlist>
+        </itemizedlist>
       </listitem>
 
       <listitem>

@@ -7879,7 +7885,8 @@
 
       <para>
         Each simple character set has a configuration file located in
-        the <filename>sql/share/charsets</filename> directory. The file
+        the <filename>sql/share/charsets</filename> directory. For a
+        character set named <replaceable>MYSET</replaceable>, the file
         is named
         <filename><replaceable>MYSET</replaceable>.xml</filename>. It
         uses <literal>&lt;map&gt;</literal> array elements to list

@@ -7931,17 +7938,18 @@
         <literal>ctype_<replaceable>MYSET</replaceable>[]</literal>,
         <literal>to_lower_<replaceable>MYSET</replaceable>[]</literal>,
         and so forth. Not every complex character set has all of the
-        arrays. See the existing <filename>ctype-*.c</filename> files
-        for examples. See the <filename>CHARSET_INFO.txt</filename> file
-        in the <filename>strings</filename> directory for additional
+        arrays. See the existing <filename>ctype-*.c</filename> files
+        for examples. See also the
+        <filename>CHARSET_INFO.txt</filename> file in the
+        <filename>strings</filename> directory for additional
         information.
       </para>
 
       <para>
-        The <literal>&lt;ctype&gt;</literal> array is indexed by
-        character value + 1 and has 257 elements. This is a legacy
-        convention for handling <literal>EOF</literal>. The other arrays
-        are indexed by character value and have 256 elements.
+        Most of the arrays are indexed by character value and have 256
+        elements. The <literal>&lt;ctype&gt;</literal> array is indexed
+        by character value + 1 and has 257 elements. This is a legacy
+        convention for handling <literal>EOF</literal>.
       </para>
 
       <para>

@@ -7996,14 +8004,14 @@
 </programlisting>
 
       <para>
-        Each <literal>&lt;collation&gt;</literal> element contains a
-        mapping array that indicates how characters should be ordered
-        for comparison and sorting purposes. MySQL sorts characters
-        based on the values of this information. In some cases, this is
-        the same as the <literal>upper</literal> array, which means that
-        sorting is case-insensitive. For more complicated sorting rules
-        (for complex character sets), see the discussion of string
-        collating in <xref linkend="string-collating"/>.
+        Each <literal>&lt;collation&gt;</literal> array indicates how
+        characters should be ordered for comparison and sorting
+        purposes. MySQL sorts characters based on the values of this
+        information. In some cases, this is the same as the
+        <literal>&lt;upper&gt;</literal> array, which means that sorting
+        is case-insensitive. For more complicated sorting rules (for
+        complex character sets), see the discussion of string collating
+        in <xref linkend="string-collating"/>.
       </para>
 
     </section>

@@ -8022,12 +8030,13 @@
       </indexterm>
 
       <para>
-        For simple character sets, sorting rules are specified in the
-        <filename><replaceable>MYSET</replaceable>.xml</filename>
+        For a simple character set named
+        <replaceable>MYSET</replaceable>, sorting rules are specified in
+        the <filename><replaceable>MYSET</replaceable>.xml</filename>
         configuration file using <literal>&lt;map&gt;</literal> array
         elements within <literal>&lt;collation&gt;</literal> elements.
         If the sorting rules for your language are too complex to be
-        handled with simple arrays, you need to define string collating
+        handled with simple arrays, you must define string collating
         functions in the
         <filename>ctype-<replaceable>MYSET</replaceable>.c</filename>
         source file in the <filename>strings</filename> directory.

@@ -8042,9 +8051,10 @@
         <literal>gbk</literal>, <literal>sjis</literal>, and
         <literal>tis620</literal> character sets. Take a look at the
         <literal>MY_COLLATION_HANDLER</literal> structures to see how
-        they are used, and see the <filename>CHARSET_INFO.txt</filename>
-        file in the <filename>strings</filename> directory for
-        additional information.
+        they are used. See also the
+        <filename>CHARSET_INFO.txt</filename> file in the
+        <filename>strings</filename> directory for additional
+        information.
       </para>
 
     </section>

@@ -8063,9 +8073,9 @@
       </indexterm>
 
       <para>
-        If you want to add support for a new character set that includes
-        multi-byte characters, you need to use multi-byte character
-        functions in the
+        If you want to add support for a new character set named
+        <replaceable>MYSET</replaceable> that includes multi-byte
+        characters, you must use multi-byte character functions in the
         <filename>ctype-<replaceable>MYSET</replaceable>.c</filename>
         source file in the <filename>strings</filename> directory.
       </para>

@@ -8079,9 +8089,9 @@
         <literal>gbk</literal>, <literal>sjis</literal>, and
         <literal>ujis</literal> character sets. Take a look at the
         <literal>MY_CHARSET_HANDLER</literal> structures to see how they
-        are used, and see the <filename>CHARSET_INFO.txt</filename> file
-        in the <filename>strings</filename> directory for additional
-        information.
+        are used. See also the <filename>CHARSET_INFO.txt</filename>
+        file in the <filename>strings</filename> directory for
+        additional information.
       </para>
 
     </section>

@@ -8141,9 +8151,9 @@
     </itemizedlist>
 
     <para>
-      The following discussion describes how to add collations of the
-      first two types to existing character sets. All existing character
-      sets already have a binary collation, so there is no need here to
+      The following sections describe how to add collations of the first
+      two types to existing character sets. All existing character sets
+      already have a binary collation, so there is no need here to
       describe how to add one.
     </para>
 

@@ -8189,10 +8199,8 @@
       all the information required for a complete character set, just
       modify the appropriate files for an existing character set. That
       is, based on what is already present for the character set's
-      current collations, add new data structures, functions, and
-      configuration information for the new collation. For an example,
-      see the MySQL Blog article in the following list of additional
-      resources.
+      current collations, add data structures, functions, and
+      configuration information for the new collation.
     </para>
 
     <bridgehead>

@@ -8269,6 +8277,11 @@
 </programlisting>
 
       <para>
+        For implementation instructions, see
+        <xref linkend="adding-collation-simple-8bit"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Complex collations for 8-bit character
         sets</emphasis>
       </para>

@@ -8322,6 +8335,11 @@
       </itemizedlist>
 
       <para>
+        For implementation instructions, see
+        <xref linkend="adding-character-set"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Collations for Unicode multi-byte
         character sets</emphasis>
       </para>

@@ -8408,6 +8426,12 @@
       </para>
 
       <para>
+        For implementation instructions for a non-UCA collation, see
+        <xref linkend="adding-character-set"/>. For a UCA collation, see
+        <xref linkend="adding-collation-unicode-uca"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Miscellaneous collations</emphasis>
       </para>
 

@@ -8434,16 +8458,16 @@
 
         <listitem>
           <para>
-            The <literal>Id</literal> column of
-            <literal role="stmt">SHOW COLLATION</literal> output
+            The <literal>ID</literal> column of the
+            <literal role="is">INFORMATION_SCHEMA.COLLATIONS</literal>
+            table
           </para>
         </listitem>
 
         <listitem>
           <para>
-            The <literal>ID</literal> column of the
-            <literal role="is">INFORMATION_SCHEMA.COLLATIONS</literal>
-            table
+            The <literal>Id</literal> column of
+            <literal role="stmt">SHOW COLLATION</literal> output
           </para>
         </listitem>
 

@@ -8597,9 +8621,9 @@
             add a <literal>&lt;collation&gt;</literal> element that
             names the collation and that contains a
             <literal>&lt;map&gt;</literal> element that defines a
-            character code-to-weight mapping table. Each word within the
-            <literal>&lt;map&gt;</literal> element must be a number in
-            hexadecimal format.
+            character code-to-weight mapping table for character codes 0
+            to 255. Each value within the <literal>&lt;map&gt;</literal>
+            element must be a number in hexadecimal format.
           </para>
 
 <programlisting>

@@ -8656,8 +8680,8 @@
         <literal>&lt;collation&gt;</literal> element within a
         <literal>&lt;charset&gt;</literal> character set description.
         The procedure described here does not require recompiling MySQL.
-        It uses a subset of the Locale Data Markup Language (LDML),
-        which is available at
+        It uses a subset of the Locale Data Markup Language (LDML)
+        specification, which is available at
         <ulink url="http://www.unicode.org/reports/tr35/"/>. With this
         method, you need not define the entire collation. Instead, you
         begin with an existing <quote>base</quote> collation and

@@ -8667,7 +8691,8 @@
         defined.
       </para>
 
-      <informaltable>
+      <table>
+        <title>MySQL Character Sets Available for User-Defined UCA Collations</title>
         <tgroup cols="2">
           <colspec colwidth="30*"/>
           <colspec colwidth="60*"/>

@@ -8696,65 +8721,79 @@
             </row>
           </tbody>
         </tgroup>
-      </informaltable>
+      </table>
 
       <para>
-        The following brief summary describes the LDML characteristics
-        required to understand the procedure for adding a collation
-        given later in this section:
+        The following sections show how to add a collation that is
+        defined using LDML syntax, and provide a summary of LDML rules
+        supported in MySQL.
       </para>
 
-      <itemizedlist>
+      <section id="ldml-rules">
 
-        <listitem>
-          <para>
-            LDML has reset, shift, and identity rules.
-          </para>
-        </listitem>
+        <title>LDML Syntax Supported in MySQL</title>
 
-        <listitem>
-          <para>
-            Characters named in these rules can be written in
-            <literal>\u<replaceable>nnnn</replaceable></literal> format,
-            where <replaceable>nnnn</replaceable> is the hexadecimal
-            Unicode code point value. Basic Latin letters
-            <literal>A-Z</literal> and <literal>a-z</literal> can also
-            be written literally (this is a MySQL limitation; the LDML
-            specification permits literal non-Latin1 characters in the
-            rules). Only characters in the Basic Multilingual Plane can
-            be specified. This notation does not apply to characters
-            outside the BMP range of <literal>0000</literal> to
-            <literal>FFFF</literal>.
-          </para>
-        </listitem>
+        <para>
+          This section describes the LDML rules that MySQL recognizes.
+          These are a subset of the rules described in the LDML
+          specification available at
+          <ulink url="http://www.unicode.org/reports/tr35/"/>. The rules
+          here are all supported except that character sorting occurs
+          only at the primary level. Rules that specify secondary or
+          higher sort levels are recognized but have no effect.
+        </para>
 
-        <listitem>
-          <para>
-            A reset rule does not specify any ordering in and of itself.
-            Instead, it <quote>resets</quote> the ordering for
-            subsequent shift rules to cause them to be taken in relation
-            to a given character. Either of the following rules resets
-            subsequent shift rules to be taken in relation to the letter
-            <literal>'A'</literal>:
-          </para>
+        <itemizedlist>
 
+          <listitem>
+            <para>
+              Characters named in LDML rules can be written in
+              <literal>\u<replaceable>nnnn</replaceable></literal>
+              format, where <replaceable>nnnn</replaceable> is the
+              hexadecimal Unicode code point value. Basic Latin letters
+              <literal>A-Z</literal> and <literal>a-z</literal> can also
+              be written literally (this is a MySQL limitation; the LDML
+              specification permits literal non-Latin1 characters in the
+              rules). Only characters in the Basic Multilingual Plane
+              can be specified. This notation does not apply to
+              characters outside the BMP range of
+              <literal>0000</literal> to <literal>FFFF</literal>.
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              LDML has reset rules and shift rules.
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              A reset rule does not specify any ordering in and of
+              itself. Instead, it <quote>resets</quote> the ordering for
+              subsequent shift rules to cause them to be taken in
+              relation to a given character. Either of the following
+              rules resets subsequent shift rules to be taken in
+              relation to the letter <literal>'A'</literal>:
+            </para>
+
 <programlisting>
 &lt;reset&gt;A&lt;/reset&gt;
 
 &lt;reset&gt;\u0041&lt;/reset&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            Shift rules define primary, secondary, and tertiary
-            differences of a character from another character. They are
-            specified using <literal>&lt;p&gt;</literal>,
-            <literal>&lt;s&gt;</literal>, and
-            <literal>&lt;t&gt;</literal> elements. Either of the
-            following rules specifies a primary shift rule for the
-            <literal>'G'</literal> character:
-          </para>
+          <listitem>
+            <para>
+              Shift rules define primary, secondary, and tertiary
+              differences of a character from another character. They
+              are specified using <literal>&lt;p&gt;</literal>,
+              <literal>&lt;s&gt;</literal>, and
+              <literal>&lt;t&gt;</literal> elements. Either of the
+              following rules specifies a primary shift rule for the
+              <literal>'G'</literal> character:
+            </para>
 
 <programlisting>
 &lt;p&gt;G&lt;/p&gt;

@@ -8762,62 +8801,77 @@
 &lt;p&gt;\u0047&lt;/p&gt;
 </programlisting>
 
-          <itemizedlist>
+            <itemizedlist>
 
-            <listitem>
-              <para>
-                Use primary differences to distinguish separate letters.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use primary differences to distinguish separate
+                  letters.
+                </para>
+              </listitem>
 
-            <listitem>
-              <para>
-                Use secondary differences to distinguish accent
-                variations.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use secondary differences to distinguish accent
+                  variations.
+                </para>
+              </listitem>
 
-            <listitem>
-              <para>
-                Use tertiary differences to distinguish lettercase
-                variations.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use tertiary differences to distinguish lettercase
+                  variations.
+                </para>
+              </listitem>
 
-          </itemizedlist>
-        </listitem>
+            </itemizedlist>
+          </listitem>
 
-        <listitem>
-          <para>
-            Identity rules indicate that one character sorts identically
-            to another. The following rules cause <literal>'b'</literal>
-            sort the same as <literal>'a'</literal>:
-          </para>
+          <listitem>
+            <para>
+              Identity rules indicate that one character sorts
+              identically to another. The following rules cause
+              <literal>'b'</literal> to sort the same as
+              <literal>'a'</literal>:
+            </para>
 
 <programlisting>
 &lt;reset&gt;a&lt;/reset&gt;
 &lt;i&gt;b&lt;/i&gt;
 </programlisting>
 
-          <para>
-            Identity rules are supported as of MySQL 5.5.3. Prior to
-            5.5.3, use <literal>&lt;s&gt; ... &lt;/s&gt;</literal>
-            instead.
-          </para>
-        </listitem>
+            <para>
+              Identity rules are supported as of MySQL 5.5.3. Prior to
+              5.5.3, use <literal>&lt;s&gt; ... &lt;/s&gt;</literal>
+              instead.
+            </para>
+          </listitem>
 
-      </itemizedlist>
+        </itemizedlist>
 
-      <para>
-        To add a UCA collation for a Unicode character set without
-        recompiling MySQL, use the following procedure. The example adds
-        a collation named <literal>utf8_phone_ci</literal> to the
-        <literal>utf8</literal> character set. The collation is designed
-        for a scenario involving a Web application for which users post
-        their names and phone numbers. Phone numbers can be given in
-        very different formats:
-      </para>
+      </section>
 
+      <section id="ldml-collation-example">
+
+        <title>Defining a UCA Collation Using LDML Syntax</title>
+
+        <para>
+          To add a UCA collation for a Unicode character set without
+          recompiling MySQL, use the following procedure. If you are
+          unfamiliar with the LDML rules used to describe the
+          collation's sort characteristics, see
+          <xref linkend="ldml-rules"/>.
+        </para>
+
+        <para>
+          The example adds a collation named
+          <literal>utf8_phone_ci</literal> to the
+          <literal>utf8</literal> character set. The collation is
+          designed for a scenario involving a Web application for which
+          users post their names and phone numbers. Phone numbers can be
+          given in very different formats:
+        </para>
+
 <programlisting>
 +7-12345-67
 +7-12-345-67

@@ -8826,33 +8880,33 @@
 +71234567
 </programlisting>
 
-      <para>
-        The problem raised by dealing with these kinds of values is that
-        the varying permissible formats make searching for a specific
-        phone number very difficult. The solution is to define a new
-        collation that reorders punctuation characters, making them
-        ignorable.
-      </para>
+        <para>
+          The problem raised by dealing with these kinds of values is
+          that the varying permissible formats make searching for a
+          specific phone number very difficult. The solution is to
+          define a new collation that reorders punctuation characters,
+          making them ignorable.
+        </para>
 
-      <orderedlist>
+        <orderedlist>
 
-        <listitem>
-          <para>
-            Choose a collation ID, as shown in
-            <xref linkend="adding-collation-choosing-id"/>. The
-            following steps use an ID of 1029.
-          </para>
-        </listitem>
+          <listitem>
+            <para>
+              Choose a collation ID, as shown in
+              <xref linkend="adding-collation-choosing-id"/>. The
+              following steps use an ID of 1029.
+            </para>
+          </listitem>
 
-        <listitem>
-          <para>
-            To modify the <literal>Index.xml</literal> configuration
-            file. This file will be located in the directory named by
-            the <literal role="sysvar">character_sets_dir</literal>
-            system variable. You can check the variable value as
-            follows, although the path name might be different on your
-            system:
-          </para>
+          <listitem>
+            <para>
+              Modify the <literal>Index.xml</literal> configuration
+              file. This file is located in the directory named by
+              the <literal role="sysvar">character_sets_dir</literal>
+              system variable. You can check the variable value as
+              follows, although the path name might be different on your
+              system:
+            </para>
 
 <programlisting>
 mysql&gt; <userinput>SHOW VARIABLES LIKE 'character_sets_dir';</userinput>

@@ -8862,21 +8916,22 @@
 | character_sets_dir | /user/local/mysql/share/mysql/charsets/ |
 +--------------------+-----------------------------------------+
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            Choose a name for the collation and list it in the
-            <filename>Index.xml</filename> file. In addition, you'll
-            need to provide the collation ordering rules. Find the
-            <literal>&lt;charset&gt;</literal> element for the character
-            set to which the collation is being added, and add a
-            <literal>&lt;collation&gt;</literal> element that indicates
-            the collation name and ID, to associate the name with the
-            ID. Within the <literal>&lt;collation&gt;</literal> element,
-            provide a <literal>&lt;rules&gt;</literal> element
-            containing the ordering rules:
-          </para>
+          <listitem>
+            <para>
+              Choose a name for the collation and list it in the
+              <filename>Index.xml</filename> file. In addition, you'll
+              need to provide the collation ordering rules. Find the
+              <literal>&lt;charset&gt;</literal> element for the
+              character set to which the collation is being added, and
+              add a <literal>&lt;collation&gt;</literal> element that
+              indicates the collation name and ID, to associate the name
+              with the ID. Within the
+              <literal>&lt;collation&gt;</literal> element, provide a
+              <literal>&lt;rules&gt;</literal> element containing the
+              ordering rules:
+            </para>
 
 <programlisting>
 &lt;charset name="utf8"&gt;

@@ -8894,25 +8949,25 @@
   ...
 &lt;/charset&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            If you want a similar collation for other Unicode character
-            sets, add other <literal>&lt;collation&gt;</literal>
-            elements. For example, to define
-            <literal>ucs2_phone_ci</literal>, add a
-            <literal>&lt;collation&gt;</literal> element to the
-            <literal>&lt;charset name="ucs2"&gt;</literal> element.
-            Remember that each collation must have its own unique ID.
-          </para>
-        </listitem>
+          <listitem>
+            <para>
+              If you want a similar collation for other Unicode
+              character sets, add other
+              <literal>&lt;collation&gt;</literal> elements. For
+              example, to define <literal>ucs2_phone_ci</literal>, add a
+              <literal>&lt;collation&gt;</literal> element to the
+              <literal>&lt;charset name="ucs2"&gt;</literal> element.
+              Remember that each collation must have its own unique ID.
+            </para>
+          </listitem>
 
-        <listitem>
-          <para>
-            Restart the server and use this statement to verify that the
-            collation is present:
-          </para>
+          <listitem>
+            <para>
+              Restart the server and use this statement to verify that
+              the collation is present:
+            </para>
 
 <programlisting>
 mysql&gt; <userinput>SHOW COLLATION LIKE 'utf8_phone_ci';</userinput>

@@ -8922,19 +8977,19 @@
 | utf8_phone_ci | utf8    | 1029 |         |          |       8 |
 +---------------+---------+------+---------+----------+---------+
 </programlisting>
-        </listitem>
+          </listitem>
 
-      </orderedlist>
+        </orderedlist>
 
-      <para>
-        Now test the collation to make sure that it has the desired
-        properties.
-      </para>
+        <para>
+          Now test the collation to make sure that it has the desired
+          properties.
+        </para>
 
-      <para>
-        Create a table containing some sample phone numbers using the
-        new collation:
-      </para>
+        <para>
+          Create a table containing some sample phone numbers using the
+          new collation:
+        </para>
 
 <programlisting>
 <!--

@@ -8963,10 +9018,10 @@
 Query OK, 1 row affected (0.00 sec)
 </programlisting>
 
-      <para>
-        Run some queries to see whether the ignored punctuation
-        characters are in fact ignored for sorting and comparisons:
-      </para>
+        <para>
+          Run some queries to see whether the ignored punctuation
+          characters are in fact ignored for sorting and comparisons:
+        </para>
 
 <programlisting>
 mysql&gt; <userinput>SELECT * FROM phonebook ORDER BY phone;</userinput>

@@ -9006,6 +9061,8 @@
 1 row in set (0.00 sec)
 </programlisting>
 
+      </section>
+
     </section>
 
   </section>


Modified: trunk/refman-5.6/internationalization.xml
===================================================================
--- trunk/refman-5.6/internationalization.xml	2011-05-13 16:12:35 UTC (rev 26217)
+++ trunk/refman-5.6/internationalization.xml	2011-05-13 17:41:11 UTC (rev 26218)
Changed blocks: 29, Lines Added: 278, Lines Deleted: 221; 29525 bytes

@@ -7725,9 +7725,9 @@
 
       <listitem>
         <para>
-          If the character set does not need to use special string
-          collating routines for sorting and does not need multi-byte
-          character support, it is simple.
+          If the character set does not need special string collating
+          routines for sorting and does not need multi-byte character
+          support, it is simple.
         </para>
       </listitem>
 

@@ -7761,7 +7761,8 @@
           <replaceable>MYSET</replaceable> to the
           <filename>sql/share/charsets/Index.xml</filename> file. Use
           the existing contents in the file as a guide to adding new
-          contents.
+          contents. A partial listing for the <literal>latin1</literal>
+          <literal>&lt;charset&gt;</literal> element follows:
         </para>
 
 <programlisting>

@@ -7775,14 +7776,19 @@
   &lt;/collation&gt;
   &lt;collation name="latin1_danish_ci"	id="15"	order="Danish"/&gt;
   ...
+  &lt;collation name="latin1_bin"		id="47"	order="Binary"&gt;
+    &lt;flag&gt;binary&lt;/flag&gt;
+    &lt;flag&gt;compiled&lt;/flag&gt;
+  &lt;/collation&gt;
+  ...
 &lt;/charset&gt;
 </programlisting>
 
         <para>
           The <literal>&lt;charset&gt;</literal> element must list all
           the collations for the character set. These must include at
-          least a binary collation and a default collation. The default
-          collation is usually named using a suffix of
+          least a binary collation and a default (primary) collation.
+          The default collation is often named using a suffix of
           <literal>general_ci</literal> (general, case insensitive). It
           is possible for the binary collation to be the default
           collation, but usually they are different. The default

@@ -7874,7 +7880,7 @@
           character set:
         </para>
 
-        <orderedlist>
+        <itemizedlist>
 
           <listitem>
             <para>

@@ -7916,7 +7922,7 @@
             </para>
           </listitem>
 
-        </orderedlist>
+        </itemizedlist>
       </listitem>
 
       <listitem>

@@ -8018,7 +8024,8 @@
 
       <para>
         Each simple character set has a configuration file located in
-        the <filename>sql/share/charsets</filename> directory. The file
+        the <filename>sql/share/charsets</filename> directory. For a
+        character set named <replaceable>MYSET</replaceable>, the file
         is named
         <filename><replaceable>MYSET</replaceable>.xml</filename>. It
         uses <literal>&lt;map&gt;</literal> array elements to list

@@ -8070,17 +8077,18 @@
         <literal>ctype_<replaceable>MYSET</replaceable>[]</literal>,
         <literal>to_lower_<replaceable>MYSET</replaceable>[]</literal>,
         and so forth. Not every complex character set has all of the
-        arrays. See the existing <filename>ctype-*.c</filename> files
-        for examples. See the <filename>CHARSET_INFO.txt</filename> file
-        in the <filename>strings</filename> directory for additional
+        arrays. See the existing <filename>ctype-*.c</filename>
+        files for examples. See also the
+        <filename>CHARSET_INFO.txt</filename> file in the
+        <filename>strings</filename> directory for additional
         information.
       </para>
 
       <para>
-        The <literal>&lt;ctype&gt;</literal> array is indexed by
-        character value + 1 and has 257 elements. This is a legacy
-        convention for handling <literal>EOF</literal>. The other arrays
-        are indexed by character value and have 256 elements.
+        Most of the arrays are indexed by character value and have 256
+        elements. The <literal>&lt;ctype&gt;</literal> array is indexed
+        by character value + 1 and has 257 elements. This is a legacy
+        convention for handling <literal>EOF</literal>.
       </para>
 
       <para>

@@ -8135,14 +8143,14 @@
 </programlisting>
 
       <para>
-        Each <literal>&lt;collation&gt;</literal> element contains a
-        mapping array that indicates how characters should be ordered
-        for comparison and sorting purposes. MySQL sorts characters
-        based on the values of this information. In some cases, this is
-        the same as the <literal>upper</literal> array, which means that
-        sorting is case-insensitive. For more complicated sorting rules
-        (for complex character sets), see the discussion of string
-        collating in <xref linkend="string-collating"/>.
+        Each <literal>&lt;collation&gt;</literal> array indicates how
+        characters should be ordered for comparison and sorting
+        purposes. MySQL sorts characters based on the values of this
+        information. In some cases, this is the same as the
+        <literal>&lt;upper&gt;</literal> array, which means that sorting
+        is case-insensitive. For more complicated sorting rules (for
+        complex character sets), see the discussion of string collating
+        in <xref linkend="string-collating"/>.
       </para>
 
     </section>

@@ -8161,12 +8169,13 @@
       </indexterm>
 
       <para>
-        For simple character sets, sorting rules are specified in the
-        <filename><replaceable>MYSET</replaceable>.xml</filename>
+        For a simple character set named
+        <replaceable>MYSET</replaceable>, sorting rules are specified in
+        the <filename><replaceable>MYSET</replaceable>.xml</filename>
         configuration file using <literal>&lt;map&gt;</literal> array
         elements within <literal>&lt;collation&gt;</literal> elements.
         If the sorting rules for your language are too complex to be
-        handled with simple arrays, you need to define string collating
+        handled with simple arrays, you must define string collating
         functions in the
         <filename>ctype-<replaceable>MYSET</replaceable>.c</filename>
         source file in the <filename>strings</filename> directory.

@@ -8181,9 +8190,10 @@
         <literal>gbk</literal>, <literal>sjis</literal>, and
         <literal>tis620</literal> character sets. Take a look at the
         <literal>MY_COLLATION_HANDLER</literal> structures to see how
-        they are used, and see the <filename>CHARSET_INFO.txt</filename>
-        file in the <filename>strings</filename> directory for
-        additional information.
+        they are used. See also the
+        <filename>CHARSET_INFO.txt</filename> file in the
+        <filename>strings</filename> directory for additional
+        information.
       </para>
 
     </section>

@@ -8202,9 +8212,9 @@
       </indexterm>
 
       <para>
-        If you want to add support for a new character set that includes
-        multi-byte characters, you need to use multi-byte character
-        functions in the
+        If you want to add support for a new character set named
+        <replaceable>MYSET</replaceable> that includes multi-byte
+        characters, you must use multi-byte character functions in the
         <filename>ctype-<replaceable>MYSET</replaceable>.c</filename>
         source file in the <filename>strings</filename> directory.
       </para>

@@ -8218,9 +8228,9 @@
         <literal>gbk</literal>, <literal>sjis</literal>, and
         <literal>ujis</literal> character sets. Take a look at the
         <literal>MY_CHARSET_HANDLER</literal> structures to see how they
-        are used, and see the <filename>CHARSET_INFO.txt</filename> file
-        in the <filename>strings</filename> directory for additional
-        information.
+        are used. See also the <filename>CHARSET_INFO.txt</filename>
+        file in the <filename>strings</filename> directory for
+        additional information.
       </para>
 
     </section>

@@ -8307,9 +8317,9 @@
     </itemizedlist>
 
     <para>
-      The following discussion describes how to add collations of the
-      first two types to existing character sets. All existing character
-      sets already have a binary collation, so there is no need here to
+      The following sections describe how to add collations of the first
+      two types to existing character sets. All existing character sets
+      already have a binary collation, so there is no need here to
       describe how to add one.
     </para>
 

@@ -8355,10 +8365,8 @@
       all the information required for a complete character set, just
       modify the appropriate files for an existing character set. That
       is, based on what is already present for the character set's
-      current collations, add new data structures, functions, and
-      configuration information for the new collation. For an example,
-      see the MySQL Blog article in the following list of additional
-      resources.
+      current collations, add data structures, functions, and
+      configuration information for the new collation.
     </para>
 
     <bridgehead>

@@ -8443,6 +8451,11 @@
 </programlisting>
 
       <para>
+        For implementation instructions, see
+        <xref linkend="adding-collation-simple-8bit"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Complex collations for 8-bit character
         sets</emphasis>
       </para>

@@ -8544,6 +8557,11 @@
       </itemizedlist>
 
       <para>
+        For implementation instructions, see
+        <xref linkend="adding-character-set"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Collations for Unicode multi-byte
         character sets</emphasis>
       </para>

@@ -8684,6 +8702,12 @@
       </para>
 
       <para>
+        For implementation instructions for a non-UCA collation, see
+        <xref linkend="adding-character-set"/>. For a UCA collation, see
+        <xref linkend="adding-collation-unicode-uca"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Miscellaneous collations</emphasis>
       </para>
 

@@ -8709,16 +8733,16 @@
 
         <listitem>
           <para>
-            The <literal>Id</literal> column of
-            <literal role="stmt">SHOW COLLATION</literal> output
+            The <literal>ID</literal> column of the
+            <literal role="is">INFORMATION_SCHEMA.COLLATIONS</literal>
+            table
           </para>
         </listitem>
 
         <listitem>
           <para>
-            The <literal>ID</literal> column of the
-            <literal role="is">INFORMATION_SCHEMA.COLLATIONS</literal>
-            table
+            The <literal>Id</literal> column of
+            <literal role="stmt">SHOW COLLATION</literal> output
           </para>
         </listitem>
 

@@ -8872,9 +8896,9 @@
             add a <literal>&lt;collation&gt;</literal> element that
             names the collation and that contains a
             <literal>&lt;map&gt;</literal> element that defines a
-            character code-to-weight mapping table. Each word within the
-            <literal>&lt;map&gt;</literal> element must be a number in
-            hexadecimal format.
+            character code-to-weight mapping table for character codes 0
+            to 255. Each value within the <literal>&lt;map&gt;</literal>
+            element must be a number in hexadecimal format.
           </para>
 
 <programlisting>

@@ -8931,8 +8955,8 @@
         <literal>&lt;collation&gt;</literal> element within a
         <literal>&lt;charset&gt;</literal> character set description.
         The procedure described here does not require recompiling MySQL.
-        It uses a subset of the Locale Data Markup Language (LDML),
-        which is available at
+        It uses a subset of the Locale Data Markup Language (LDML)
+        specification, which is available at
         <ulink url="http://www.unicode.org/reports/tr35/"/>. With this
         method, you need not define the entire collation. Instead, you
         begin with an existing <quote>base</quote> collation and

@@ -8940,12 +8964,13 @@
         base collation. The following table lists the base collations of
         the Unicode character sets for which UCA collations can be
         defined. It is not possible to create user-defined UCA
-        collations for <literal>utf16le</literal> because there is no
-        <literal>utf16le_unicode_ci</literal> collation, which would
-        serve as the basis for such collations.
+        collations for <literal>utf16le</literal>; there is no
+        <literal>utf16le_unicode_ci</literal> collation that would serve
+        as the basis for such collations.
       </para>
 
-      <informaltable>
+      <table>
+        <title>MySQL Character Sets Available for User-Defined UCA Collations</title>
         <tgroup cols="2">
           <colspec colwidth="30*"/>
           <colspec colwidth="60*"/>

@@ -8974,65 +8999,79 @@
             </row>
           </tbody>
         </tgroup>
-      </informaltable>
+      </table>
 
       <para>
-        The following brief summary describes the LDML characteristics
-        required to understand the procedure for adding a collation
-        given later in this section:
+        The following sections show how to add a collation that is
+        defined using LDML syntax, and provide a summary of LDML rules
+        supported in MySQL.
       </para>
 
-      <itemizedlist>
+      <section id="ldml-rules">
 
-        <listitem>
-          <para>
-            LDML has reset, shift, and identity rules.
-          </para>
-        </listitem>
+        <title>LDML Syntax Supported in MySQL</title>
 
-        <listitem>
-          <para>
-            Characters named in these rules can be written in
-            <literal>\u<replaceable>nnnn</replaceable></literal> format,
-            where <replaceable>nnnn</replaceable> is the hexadecimal
-            Unicode code point value. Basic Latin letters
-            <literal>A-Z</literal> and <literal>a-z</literal> can also
-            be written literally (this is a MySQL limitation; the LDML
-            specification permits literal non-Latin1 characters in the
-            rules). Only characters in the Basic Multilingual Plane can
-            be specified. This notation does not apply to characters
-            outside the BMP range of <literal>0000</literal> to
-            <literal>FFFF</literal>.
-          </para>
-        </listitem>
+        <para>
+          This section describes the LDML rules that MySQL recognizes.
+          These are a subset of the rules described in the LDML
+          specification available at
+          <ulink url="http://www.unicode.org/reports/tr35/"/>. The rules
+          here are all supported except that character sorting occurs
+          only at the primary level. Rules that specify secondary or
+          higher sort levels are recognized but have no effect.
+        </para>
 
-        <listitem>
-          <para>
-            A reset rule does not specify any ordering in and of itself.
-            Instead, it <quote>resets</quote> the ordering for
-            subsequent shift rules to cause them to be taken in relation
-            to a given character. Either of the following rules resets
-            subsequent shift rules to be taken in relation to the letter
-            <literal>'A'</literal>:
-          </para>
+        <itemizedlist>
 
+          <listitem>
+            <para>
+              Characters named in LDML rules can be written in
+              <literal>\u<replaceable>nnnn</replaceable></literal>
+              format, where <replaceable>nnnn</replaceable> is the
+              hexadecimal Unicode code point value. Basic Latin letters
+              <literal>A-Z</literal> and <literal>a-z</literal> can also
+              be written literally (this is a MySQL limitation; the LDML
+              specification permits literal non-Latin1 characters in the
+              rules). Only characters in the Basic Multilingual Plane
+              can be specified. This notation does not apply to
+              characters outside the BMP range of
+              <literal>0000</literal> to <literal>FFFF</literal>.
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              LDML has reset rules and shift rules.
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              A reset rule does not specify any ordering in and of
+              itself. Instead, it <quote>resets</quote> the ordering for
+              subsequent shift rules to cause them to be taken in
+              relation to a given character. Either of the following
+              rules resets subsequent shift rules to be taken in
+              relation to the letter <literal>'A'</literal>:
+            </para>
+
 <programlisting>
 &lt;reset&gt;A&lt;/reset&gt;
 
 &lt;reset&gt;\u0041&lt;/reset&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            Shift rules define primary, secondary, and tertiary
-            differences of a character from another character. They are
-            specified using <literal>&lt;p&gt;</literal>,
-            <literal>&lt;s&gt;</literal>, and
-            <literal>&lt;t&gt;</literal> elements. Either of the
-            following rules specifies a primary shift rule for the
-            <literal>'G'</literal> character:
-          </para>
+          <listitem>
+            <para>
+              Shift rules define primary, secondary, and tertiary
+              differences of a character from another character. They
+              are specified using <literal>&lt;p&gt;</literal>,
+              <literal>&lt;s&gt;</literal>, and
+              <literal>&lt;t&gt;</literal> elements. Either of the
+              following rules specifies a primary shift rule for the
+              <literal>'G'</literal> character:
+            </para>
 
 <programlisting>
 &lt;p&gt;G&lt;/p&gt;

@@ -9040,76 +9079,91 @@
 &lt;p&gt;\u0047&lt;/p&gt;
 </programlisting>
 
-          <itemizedlist>
+            <itemizedlist>
 
-            <listitem>
-              <para>
-                Use primary differences to distinguish separate letters.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use primary differences to distinguish separate
+                  letters.
+                </para>
+              </listitem>
 
-            <listitem>
-              <para>
-                Use secondary differences to distinguish accent
-                variations.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use secondary differences to distinguish accent
+                  variations.
+                </para>
+              </listitem>
 
-            <listitem>
-              <para>
-                Use tertiary differences to distinguish lettercase
-                variations.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use tertiary differences to distinguish lettercase
+                  variations.
+                </para>
+              </listitem>
 
-          </itemizedlist>
-        </listitem>
+            </itemizedlist>
+          </listitem>
 
-        <listitem>
-          <para>
-            Identity rules indicate that one character sorts identically
-            to another. The following rules cause <literal>'b'</literal>
-            sort the same as <literal>'a'</literal>:
-          </para>
+          <listitem>
+            <para>
+              Identity rules indicate that one character sorts
+              identically to another. The following rules cause
+              <literal>'b'</literal> to sort the same as
+              <literal>'a'</literal>:
+            </para>
 
 <programlisting>
 &lt;reset&gt;a&lt;/reset&gt;
 &lt;i&gt;b&lt;/i&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            In MySQL &current-series;, an extension to LDML rules is
-            that the <literal>&lt;collation&gt;</literal> element
-            permits an optional <literal>version</literal> attribute in
-            <literal>&lt;collation&gt;</literal> tags to indicate the
-            UCA version on which the collation is based. If the
-            <literal>version</literal> attribute is omitted, its default
-            value is <literal>4.0.0</literal>. For example, the
-            following specification indicates that the collation is
-            based on UCA 5.2.0:
-          </para>
+          <listitem>
+            <para>
+              In MySQL &current-series;, an extension to LDML rules is
+              that the <literal>&lt;collation&gt;</literal> element
+              permits an optional <literal>version</literal> attribute
+              in <literal>&lt;collation&gt;</literal> tags to indicate
+              the UCA version on which the collation is based. If the
+              <literal>version</literal> attribute is omitted, its
+              default value is <literal>4.0.0</literal>. For example,
+              the following specification indicates that the collation
+              is based on UCA 5.2.0:
+            </para>
 
 <programlisting>
 &lt;collation id="<replaceable>nnn</replaceable>" name="utf8_<replaceable>xxx</replaceable>_ci" version="5.2.0"&gt;
 ...
 &lt;/collation&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-      </itemizedlist>
+        </itemizedlist>
 
-      <para>
-        To add a UCA collation for a Unicode character set without
-        recompiling MySQL, use the following procedure. The example adds
-        a collation named <literal>utf8_phone_ci</literal> to the
-        <literal>utf8</literal> character set. The collation is designed
-        for a scenario involving a Web application for which users post
-        their names and phone numbers. Phone numbers can be given in
-        very different formats:
-      </para>
+      </section>
 
+      <section id="ldml-collation-example">
+
+        <title>Defining a UCA Collation Using LDML Syntax</title>
+
+        <para>
+          To add a UCA collation for a Unicode character set without
+          recompiling MySQL, use the following procedure. If you are
+          unfamiliar with the LDML rules used to describe the
+          collation's sort characteristics, see
+          <xref linkend="ldml-rules"/>.
+        </para>
+
+        <para>
+          The example adds a collation named
+          <literal>utf8_phone_ci</literal> to the
+          <literal>utf8</literal> character set. The collation is
+          designed for a scenario involving a Web application for which
+          users post their names and phone numbers. Phone numbers can be
+          given in very different formats:
+        </para>
+
 <programlisting>
 +7-12345-67
 +7-12-345-67

@@ -9118,33 +9172,33 @@
 +71234567
 </programlisting>
 
-      <para>
-        The problem raised by dealing with these kinds of values is that
-        the varying permissible formats make searching for a specific
-        phone number very difficult. The solution is to define a new
-        collation that reorders punctuation characters, making them
-        ignorable.
-      </para>
+        <para>
+          The problem raised by dealing with these kinds of values is
+          that the varying permissible formats make searching for a
+          specific phone number very difficult. The solution is to
+          define a new collation that reorders punctuation characters,
+          making them ignorable.
+        </para>
 
-      <orderedlist>
+        <orderedlist>
 
-        <listitem>
-          <para>
-            Choose a collation ID, as shown in
-            <xref linkend="adding-collation-choosing-id"/>. The
-            following steps use an ID of 1029.
-          </para>
-        </listitem>
+          <listitem>
+            <para>
+              Choose a collation ID, as shown in
+              <xref linkend="adding-collation-choosing-id"/>. The
+              following steps use an ID of 1029.
+            </para>
+          </listitem>
 
-        <listitem>
-          <para>
-            To modify the <literal>Index.xml</literal> configuration
-            file. This file will be located in the directory named by
-            the <literal role="sysvar">character_sets_dir</literal>
-            system variable. You can check the variable value as
-            follows, although the path name might be different on your
-            system:
-          </para>
+          <listitem>
+            <para>
+              Modify the <literal>Index.xml</literal> configuration
+              file. This file is located in the directory named by
+              the <literal role="sysvar">character_sets_dir</literal>
+              system variable. You can check the variable value as
+              follows, although the path name might be different on your
+              system:
+            </para>
 
 <programlisting>
 mysql&gt; <userinput>SHOW VARIABLES LIKE 'character_sets_dir';</userinput>

@@ -9154,21 +9208,22 @@
 | character_sets_dir | /user/local/mysql/share/mysql/charsets/ |
 +--------------------+-----------------------------------------+
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            Choose a name for the collation and list it in the
-            <filename>Index.xml</filename> file. In addition, you'll
-            need to provide the collation ordering rules. Find the
-            <literal>&lt;charset&gt;</literal> element for the character
-            set to which the collation is being added, and add a
-            <literal>&lt;collation&gt;</literal> element that indicates
-            the collation name and ID, to associate the name with the
-            ID. Within the <literal>&lt;collation&gt;</literal> element,
-            provide a <literal>&lt;rules&gt;</literal> element
-            containing the ordering rules:
-          </para>
+          <listitem>
+            <para>
+              Choose a name for the collation and list it in the
+              <filename>Index.xml</filename> file. In addition, you'll
+              need to provide the collation ordering rules. Find the
+              <literal>&lt;charset&gt;</literal> element for the
+              character set to which the collation is being added, and
+              add a <literal>&lt;collation&gt;</literal> element that
+              indicates the collation name and ID, to associate the name
+              with the ID. Within the
+              <literal>&lt;collation&gt;</literal> element, provide a
+              <literal>&lt;rules&gt;</literal> element containing the
+              ordering rules:
+            </para>
 
 <programlisting>
 &lt;charset name="utf8"&gt;

@@ -9186,25 +9241,25 @@
   ...
 &lt;/charset&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            If you want a similar collation for other Unicode character
-            sets, add other <literal>&lt;collation&gt;</literal>
-            elements. For example, to define
-            <literal>ucs2_phone_ci</literal>, add a
-            <literal>&lt;collation&gt;</literal> element to the
-            <literal>&lt;charset name="ucs2"&gt;</literal> element.
-            Remember that each collation must have its own unique ID.
-          </para>
-        </listitem>
+          <listitem>
+            <para>
+              If you want a similar collation for other Unicode
+              character sets, add other
+              <literal>&lt;collation&gt;</literal> elements. For
+              example, to define <literal>ucs2_phone_ci</literal>, add a
+              <literal>&lt;collation&gt;</literal> element to the
+              <literal>&lt;charset name="ucs2"&gt;</literal> element.
+              Remember that each collation must have its own unique ID.
+            </para>
+          </listitem>
 
-        <listitem>
-          <para>
-            Restart the server and use this statement to verify that the
-            collation is present:
-          </para>
+          <listitem>
+            <para>
+              Restart the server and use this statement to verify that
+              the collation is present:
+            </para>
 
 <programlisting>
 mysql&gt; <userinput>SHOW COLLATION LIKE 'utf8_phone_ci';</userinput>

@@ -9214,19 +9269,19 @@
 | utf8_phone_ci | utf8    | 1029 |         |          |       8 |
 +---------------+---------+------+---------+----------+---------+
 </programlisting>
-        </listitem>
+          </listitem>
 
-      </orderedlist>
+        </orderedlist>
 
-      <para>
-        Now test the collation to make sure that it has the desired
-        properties.
-      </para>
+        <para>
+          Now test the collation to make sure that it has the desired
+          properties.
+        </para>
 
-      <para>
-        Create a table containing some sample phone numbers using the
-        new collation:
-      </para>
+        <para>
+          Create a table containing some sample phone numbers using the
+          new collation:
+        </para>
 
 <programlisting>
 <!--

@@ -9255,10 +9310,10 @@
 Query OK, 1 row affected (0.00 sec)
 </programlisting>
 
-      <para>
-        Run some queries to see whether the ignored punctuation
-        characters are in fact ignored for sorting and comparisons:
-      </para>
+        <para>
+          Run some queries to see whether the ignored punctuation
+          characters are in fact ignored for sorting and comparisons:
+        </para>
 
 <programlisting>
 mysql&gt; <userinput>SELECT * FROM phonebook ORDER BY phone;</userinput>

@@ -9298,6 +9353,8 @@
 1 row in set (0.00 sec)
 </programlisting>
 
+      </section>
+
     </section>
 
   </section>
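
Reviewer note on the utf8_phone_ci example in the hunks above: the intended
effect (punctuation made ignorable so that differently formatted phone numbers
compare equal) can be approximated outside the server. The following is a
minimal Python sketch, not MySQL's implementation. The rule set in
ASSUMED_RULES and the helper names are assumptions made for illustration; the
actual <rules> content is not shown in this diff.

```python
import re

# Hypothetical identity rules in the style the section describes: each
# character listed in an <i> element is declared to sort identically to
# U+0000, which makes it ignorable. This rule set is an assumption.
ASSUMED_RULES = r"""
<reset>\u0000</reset>
<i>\u0020</i>
<i>\u0028</i>
<i>\u0029</i>
<i>\u002B</i>
<i>\u002D</i>
"""


def decode_ldml_char(text):
    """Decode a character written as a backslash, 'u', and four hex digits.

    Text that is not in that form (for example a literal Basic Latin
    letter) is returned unchanged.
    """
    match = re.fullmatch(r"\\u([0-9A-Fa-f]{4})", text)
    return chr(int(match.group(1), 16)) if match else text


def ignorable_chars(rules):
    """Collect the characters named in <i> (identity) rules."""
    return {decode_ldml_char(m) for m in re.findall(r"<i>(.*?)</i>", rules)}


def phone_key(phone, ignorable):
    """Build a comparison key that drops the ignorable characters."""
    return "".join(ch for ch in phone if ch not in ignorable)


IGNORABLE = ignorable_chars(ASSUMED_RULES)
numbers = ["+7-12345-67", "+7-12-345-67", "+71234567"]
keys = {phone_key(n, IGNORABLE) for n in numbers}
```

Under these assumed rules, all three sample numbers reduce to the same key,
which is the behavior the documented queries are meant to verify with the
real collation.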


Modified: trunk/refman-6.0/internationalization.xml
===================================================================
--- trunk/refman-6.0/internationalization.xml	2011-05-13 16:12:35 UTC (rev 26217)
+++ trunk/refman-6.0/internationalization.xml	2011-05-13 17:41:11 UTC (rev 26218)
Changed blocks: 29, Lines Added: 267, Lines Deleted: 210; 27891 bytes

@@ -7923,9 +7923,9 @@
 
       <listitem>
         <para>
-          If the character set does not need to use special string
-          collating routines for sorting and does not need multi-byte
-          character support, it is simple.
+          If the character set does not need special string collating
+          routines for sorting and does not need multi-byte character
+          support, it is simple.
         </para>
       </listitem>
 

@@ -7959,7 +7959,8 @@
           <replaceable>MYSET</replaceable> to the
           <filename>sql/share/charsets/Index.xml</filename> file. Use
           the existing contents in the file as a guide to adding new
-          contents.
+          contents. A partial listing for the <literal>latin1</literal>
+          <literal>&lt;charset&gt;</literal> element follows:
         </para>
 
 <programlisting>

@@ -7973,14 +7974,19 @@
   &lt;/collation&gt;
   &lt;collation name="latin1_danish_ci"	id="15"	order="Danish"/&gt;
   ...
+  &lt;collation name="latin1_bin"		id="47"	order="Binary"&gt;
+    &lt;flag&gt;binary&lt;/flag&gt;
+    &lt;flag&gt;compiled&lt;/flag&gt;
+  &lt;/collation&gt;
+  ...
 &lt;/charset&gt;
 </programlisting>
 
         <para>
           The <literal>&lt;charset&gt;</literal> element must list all
           the collations for the character set. These must include at
-          least a binary collation and a default collation. The default
-          collation is usually named using a suffix of
+          least a binary collation and a default (primary) collation.
+          The default collation is often named using a suffix of
           <literal>general_ci</literal> (general, case insensitive). It
           is possible for the binary collation to be the default
           collation, but usually they are different. The default

@@ -8073,7 +8079,7 @@
           character set:
         </para>
 
-        <orderedlist>
+        <itemizedlist>
 
           <listitem>
             <para>

@@ -8115,7 +8121,7 @@
             </para>
           </listitem>
 
-        </orderedlist>
+        </itemizedlist>
       </listitem>
 
       <listitem>

@@ -8261,7 +8267,8 @@
 
       <para>
         Each simple character set has a configuration file located in
-        the <filename>sql/share/charsets</filename> directory. The file
+        the <filename>sql/share/charsets</filename> directory. For a
+        character set named <replaceable>MYSET</replaceable>, the file
         is named
         <filename><replaceable>MYSET</replaceable>.xml</filename>. It
         uses <literal>&lt;map&gt;</literal> array elements to list

@@ -8313,17 +8320,18 @@
         <literal>ctype_<replaceable>MYSET</replaceable>[]</literal>,
         <literal>to_lower_<replaceable>MYSET</replaceable>[]</literal>,
         and so forth. Not every complex character set has all of the
-        arrays. See the existing <filename>ctype-*.c</filename> files
-        for examples. See the <filename>CHARSET_INFO.txt</filename> file
-        in the <filename>strings</filename> directory for additional
+        arrays. See also the existing <filename>ctype-*.c</filename>
+        files for examples. See the
+        <filename>CHARSET_INFO.txt</filename> file in the
+        <filename>strings</filename> directory for additional
         information.
       </para>
 
       <para>
-        The <literal>&lt;ctype&gt;</literal> array is indexed by
-        character value + 1 and has 257 elements. This is a legacy
-        convention for handling <literal>EOF</literal>. The other arrays
-        are indexed by character value and have 256 elements.
+        Most of the arrays are indexed by character value and have 256
+        elements. The <literal>&lt;ctype&gt;</literal> array is indexed
+        by character value + 1 and has 257 elements. This is a legacy
+        convention for handling <literal>EOF</literal>.
       </para>
 
       <para>

@@ -8378,14 +8386,14 @@
 </programlisting>
 
       <para>
-        Each <literal>&lt;collation&gt;</literal> element contains a
-        mapping array that indicates how characters should be ordered
-        for comparison and sorting purposes. MySQL sorts characters
-        based on the values of this information. In some cases, this is
-        the same as the <literal>upper</literal> array, which means that
-        sorting is case-insensitive. For more complicated sorting rules
-        (for complex character sets), see the discussion of string
-        collating in <xref linkend="string-collating"/>.
+        Each <literal>&lt;collation&gt;</literal> array indicates how
+        characters should be ordered for comparison and sorting
+        purposes. MySQL sorts characters based on the values of this
+        information. In some cases, this is the same as the
+        <literal>&lt;upper&gt;</literal> array, which means that sorting
+        is case-insensitive. For more complicated sorting rules (for
+        complex character sets), see the discussion of string collating
+        in <xref linkend="string-collating"/>.
       </para>
 
     </section>

@@ -8404,12 +8412,13 @@
       </indexterm>
 
       <para>
-        For simple character sets, sorting rules are specified in the
-        <filename><replaceable>MYSET</replaceable>.xml</filename>
+        For a simple character set named
+        <replaceable>MYSET</replaceable>, sorting rules are specified in
+        the <filename><replaceable>MYSET</replaceable>.xml</filename>
         configuration file using <literal>&lt;map&gt;</literal> array
         elements within <literal>&lt;collation&gt;</literal> elements.
         If the sorting rules for your language are too complex to be
-        handled with simple arrays, you need to define string collating
+        handled with simple arrays, you must define string collating
         functions in the
         <filename>ctype-<replaceable>MYSET</replaceable>.c</filename>
         source file in the <filename>strings</filename> directory.

@@ -8424,9 +8433,10 @@
         <literal>gbk</literal>, <literal>sjis</literal>, and
         <literal>tis620</literal> character sets. Take a look at the
         <literal>MY_COLLATION_HANDLER</literal> structures to see how
-        they are used, and see the <filename>CHARSET_INFO.txt</filename>
-        file in the <filename>strings</filename> directory for
-        additional information.
+        they are used. See also the
+        <filename>CHARSET_INFO.txt</filename> file in the
+        <filename>strings</filename> directory for additional
+        information.
       </para>
 
     </section>

@@ -8445,9 +8455,9 @@
       </indexterm>
 
       <para>
-        If you want to add support for a new character set that includes
-        multi-byte characters, you need to use multi-byte character
-        functions in the
+        If you want to add support for a new character set named
+        <replaceable>MYSET</replaceable> that includes multi-byte
+        characters, you must use multi-byte character functions in the
         <filename>ctype-<replaceable>MYSET</replaceable>.c</filename>
         source file in the <filename>strings</filename> directory.
       </para>

@@ -8461,9 +8471,9 @@
         <literal>gbk</literal>, <literal>sjis</literal>, and
         <literal>ujis</literal> character sets. Take a look at the
         <literal>MY_CHARSET_HANDLER</literal> structures to see how they
-        are used, and see the <filename>CHARSET_INFO.txt</filename> file
-        in the <filename>strings</filename> directory for additional
-        information.
+        are used. See also the <filename>CHARSET_INFO.txt</filename>
+        file in the <filename>strings</filename> directory for
+        additional information.
       </para>
 
     </section>

@@ -8550,9 +8560,9 @@
     </itemizedlist>
 
     <para>
-      The following discussion describes how to add collations of the
-      first two types to existing character sets. All existing character
-      sets already have a binary collation, so there is no need here to
+      The following sections describe how to add collations of the first
+      two types to existing character sets. All existing character sets
+      already have a binary collation, so there is no need here to
       describe how to add one.
     </para>
 

@@ -8598,10 +8608,8 @@
       all the information required for a complete character set, just
       modify the appropriate files for an existing character set. That
       is, based on what is already present for the character set's
-      current collations, add new data structures, functions, and
-      configuration information for the new collation. For an example,
-      see the MySQL Blog article in the following list of additional
-      resources.
+      current collations, add data structures, functions, and
+      configuration information for the new collation.
     </para>
 
     <bridgehead>

@@ -8686,6 +8694,11 @@
 </programlisting>
 
       <para>
+        For implementation instructions, see
+        <xref linkend="adding-collation-simple-8bit"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Complex collations for 8-bit character
         sets</emphasis>
       </para>

@@ -8787,6 +8800,11 @@
       </itemizedlist>
 
       <para>
+        For implementation instructions, see
+        <xref linkend="adding-character-set"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Collations for Unicode multi-byte
         character sets</emphasis>
       </para>

@@ -8927,6 +8945,12 @@
       </para>
 
       <para>
+        For implementation instructions, for a non-UCA collation, see
+        <xref linkend="adding-character-set"/>. For a UCA collation, see
+        <xref linkend="adding-collation-unicode-uca"/>.
+      </para>
+
+      <para>
         <emphasis role="bold">Miscellaneous collations</emphasis>
       </para>
 

@@ -8954,16 +8978,16 @@
 
         <listitem>
           <para>
-            The <literal>Id</literal> column of
-            <literal role="stmt">SHOW COLLATION</literal> output
+            The <literal>ID</literal> column of the
+            <literal role="is">INFORMATION_SCHEMA.COLLATIONS</literal>
+            table
           </para>
         </listitem>
 
         <listitem>
           <para>
-            The <literal>ID</literal> column of the
-            <literal role="is">INFORMATION_SCHEMA.COLLATIONS</literal>
-            table
+            The <literal>Id</literal> column of
+            <literal role="stmt">SHOW COLLATION</literal> output
           </para>
         </listitem>
 

@@ -9117,9 +9141,9 @@
             add a <literal>&lt;collation&gt;</literal> element that
             names the collation and that contains a
             <literal>&lt;map&gt;</literal> element that defines a
-            character code-to-weight mapping table. Each word within the
-            <literal>&lt;map&gt;</literal> element must be a number in
-            hexadecimal format.
+            character code-to-weight mapping table for character codes 0
+            to 255. Each value within the <literal>&lt;map&gt;</literal>
+            element must be a number in hexadecimal format.
           </para>
 
 <programlisting>

@@ -9176,8 +9200,8 @@
         <literal>&lt;collation&gt;</literal> element within a
         <literal>&lt;charset&gt;</literal> character set description.
         The procedure described here does not require recompiling MySQL.
-        It uses a subset of the Locale Data Markup Language (LDML),
-        which is available at
+        It uses a subset of the Locale Data Markup Language (LDML)
+        specification, which is available at
         <ulink url="http://www.unicode.org/reports/tr35/"/>. In
         &current-series;, this method of adding collations is supported
         as of MySQL 6.0.4. With this method, you need not define the

@@ -9188,7 +9212,8 @@
         for which UCA collations can be defined.
       </para>
 
-      <informaltable>
+      <table>
+        <title>MySQL Character Sets Available for User-Defined UCA Collations</title>
         <tgroup cols="2">
           <colspec colwidth="30*"/>
           <colspec colwidth="60*"/>

@@ -9217,65 +9242,79 @@
             </row>
           </tbody>
         </tgroup>
-      </informaltable>
+      </table>
 
       <para>
-        The following brief summary describes the LDML characteristics
-        required to understand the procedure for adding a collation
-        given later in this section:
+        The following sections show how to add a collation that is
+        defined using LDML syntax, and provide a summary of LDML rules
+        supported in MySQL.
       </para>
 
-      <itemizedlist>
+      <section id="ldml-rules">
 
-        <listitem>
-          <para>
-            LDML has reset, shift, and identity rules.
-          </para>
-        </listitem>
+        <title>LDML Syntax Supported in MySQL</title>
 
-        <listitem>
-          <para>
-            Characters named in these rules can be written in
-            <literal>\u<replaceable>nnnn</replaceable></literal> format,
-            where <replaceable>nnnn</replaceable> is the hexadecimal
-            Unicode code point value. Basic Latin letters
-            <literal>A-Z</literal> and <literal>a-z</literal> can also
-            be written literally (this is a MySQL limitation; the LDML
-            specification permits literal non-Latin1 characters in the
-            rules). Only characters in the Basic Multilingual Plane can
-            be specified. This notation does not apply to characters
-            outside the BMP range of <literal>0000</literal> to
-            <literal>FFFF</literal>.
-          </para>
-        </listitem>
+        <para>
+          This section describes the LDML rules that MySQL recognizes.
+          These are a subset of the rules described in the LDML
+          specification available at
+          <ulink url="http://www.unicode.org/reports/tr35/"/>. The rules
+          here are all supported except that character sorting occurs
+          only at the primary level. Rules that specify secondary or
+          higher sort levels are recognized but have no effect.
+        </para>
 
-        <listitem>
-          <para>
-            A reset rule does not specify any ordering in and of itself.
-            Instead, it <quote>resets</quote> the ordering for
-            subsequent shift rules to cause them to be taken in relation
-            to a given character. Either of the following rules resets
-            subsequent shift rules to be taken in relation to the letter
-            <literal>'A'</literal>:
-          </para>
+        <itemizedlist>
 
+          <listitem>
+            <para>
+              Characters named in LDML rules can be written in
+              <literal>\u<replaceable>nnnn</replaceable></literal>
+              format, where <replaceable>nnnn</replaceable> is the
+              hexadecimal Unicode code point value. Basic Latin letters
+              <literal>A-Z</literal> and <literal>a-z</literal> can also
+              be written literally (this is a MySQL limitation; the LDML
+              specification permits literal non-Latin1 characters in the
+              rules). Only characters in the Basic Multilingual Plane
+              can be specified. This notation does not apply to
+              characters outside the BMP range of
+              <literal>0000</literal> to <literal>FFFF</literal>.
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              LDML has reset rules and shift rules.
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              A reset rule does not specify any ordering in and of
+              itself. Instead, it <quote>resets</quote> the ordering for
+              subsequent shift rules to cause them to be taken in
+              relation to a given character. Either of the following
+              rules resets subsequent shift rules to be taken in
+              relation to the letter <literal>'A'</literal>:
+            </para>
+
 <programlisting>
 &lt;reset&gt;A&lt;/reset&gt;
 
 &lt;reset&gt;\u0041&lt;/reset&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            Shift rules define primary, secondary, and tertiary
-            differences of a character from another character. They are
-            specified using <literal>&lt;p&gt;</literal>,
-            <literal>&lt;s&gt;</literal>, and
-            <literal>&lt;t&gt;</literal> elements. Either of the
-            following rules specifies a primary shift rule for the
-            <literal>'G'</literal> character:
-          </para>
+          <listitem>
+            <para>
+              Shift rules define primary, secondary, and tertiary
+              differences of a character from another character. They
+              are specified using <literal>&lt;p&gt;</literal>,
+              <literal>&lt;s&gt;</literal>, and
+              <literal>&lt;t&gt;</literal> elements. Either of the
+              following rules specifies a primary shift rule for the
+              <literal>'G'</literal> character:
+            </para>
 
 <programlisting>
 &lt;p&gt;G&lt;/p&gt;

@@ -9283,62 +9322,77 @@
 &lt;p&gt;\u0047&lt;/p&gt;
 </programlisting>
 
-          <itemizedlist>
+            <itemizedlist>
 
-            <listitem>
-              <para>
-                Use primary differences to distinguish separate letters.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use primary differences to distinguish separate
+                  letters.
+                </para>
+              </listitem>
 
-            <listitem>
-              <para>
-                Use secondary differences to distinguish accent
-                variations.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use secondary differences to distinguish accent
+                  variations.
+                </para>
+              </listitem>
 
-            <listitem>
-              <para>
-                Use tertiary differences to distinguish lettercase
-                variations.
-              </para>
-            </listitem>
+              <listitem>
+                <para>
+                  Use tertiary differences to distinguish lettercase
+                  variations.
+                </para>
+              </listitem>
 
-          </itemizedlist>
-        </listitem>
+            </itemizedlist>
+          </listitem>
 
-        <listitem>
-          <para>
-            Identity rules indicate that one character sorts identically
-            to another. The following rules cause <literal>'b'</literal>
-            sort the same as <literal>'a'</literal>:
-          </para>
+          <listitem>
+            <para>
+              Identity rules indicate that one character sorts
+              identically to another. The following rules cause
+              <literal>'b'</literal> to sort the same as
+              <literal>'a'</literal>:
+            </para>
 
 <programlisting>
 &lt;reset&gt;a&lt;/reset&gt;
 &lt;i&gt;b&lt;/i&gt;
 </programlisting>
 
-          <para>
-            Identity rules are supported as of MySQL 6.0.9. Prior to
-            6.0.9, use <literal>&lt;s&gt; ... &lt;/s&gt;</literal>
-            instead.
-          </para>
-        </listitem>
+            <para>
+              Identity rules are supported as of MySQL 6.0.9. Prior to
+              6.0.9, use <literal>&lt;s&gt; ... &lt;/s&gt;</literal>
+              instead.
+            </para>
+          </listitem>
 
-      </itemizedlist>
+        </itemizedlist>
 
-      <para>
-        To add a UCA collation for a Unicode character set without
-        recompiling MySQL, use the following procedure. The example adds
-        a collation named <literal>utf8_phone_ci</literal> to the
-        <literal>utf8</literal> character set. The collation is designed
-        for a scenario involving a Web application for which users post
-        their names and phone numbers. Phone numbers can be given in
-        very different formats:
-      </para>
+      </section>
 
+      <section id="ldml-collation-example">
+
+        <title>Defining a UCA Collation Using LDML Syntax</title>
+
+        <para>
+          To add a UCA collation for a Unicode character set without
+          recompiling MySQL, use the following procedure. If you are
+          unfamiliar with the LDML rules used to describe the
+          collation's sort characteristics, see
+          <xref linkend="ldml-rules"/>.
+        </para>
+
+        <para>
+          The example adds a collation named
+          <literal>utf8_phone_ci</literal> to the
+          <literal>utf8</literal> character set. The collation is
+          designed for a scenario involving a Web application for which
+          users post their names and phone numbers. Phone numbers can be
+          given in very different formats:
+        </para>
+
 <programlisting>
 +7-12345-67
 +7-12-345-67

@@ -9347,33 +9401,33 @@
 +71234567
 </programlisting>
 
-      <para>
-        The problem raised by dealing with these kinds of values is that
-        the varying permissible formats make searching for a specific
-        phone number very difficult. The solution is to define a new
-        collation that reorders punctuation characters, making them
-        ignorable.
-      </para>
+        <para>
+          The problem raised by dealing with these kinds of values is
+          that the varying permissible formats make searching for a
+          specific phone number very difficult. The solution is to
+          define a new collation that reorders punctuation characters,
+          making them ignorable.
+        </para>
 
-      <orderedlist>
+        <orderedlist>
 
-        <listitem>
-          <para>
-            Choose a collation ID, as shown in
-            <xref linkend="adding-collation-choosing-id"/>. The
-            following steps use an ID of 1029.
-          </para>
-        </listitem>
+          <listitem>
+            <para>
+              Choose a collation ID, as shown in
+              <xref linkend="adding-collation-choosing-id"/>. The
+              following steps use an ID of 1029.
+            </para>
+          </listitem>
 
-        <listitem>
-          <para>
-            To modify the <literal>Index.xml</literal> configuration
-            file. This file will be located in the directory named by
-            the <literal role="sysvar">character_sets_dir</literal>
-            system variable. You can check the variable value as
-            follows, although the path name might be different on your
-            system:
-          </para>
+          <listitem>
+            <para>
+              Find the <literal>Index.xml</literal> configuration file
+              to modify. It is located in the directory named by
+              the <literal role="sysvar">character_sets_dir</literal>
+              system variable. You can check the variable value as
+              follows, although the path name might be different on your
+              system:
+            </para>
 
 <programlisting>
 mysql&gt; <userinput>SHOW VARIABLES LIKE 'character_sets_dir';</userinput>

@@ -9383,21 +9437,22 @@
 | character_sets_dir | /user/local/mysql/share/mysql/charsets/ |
 +--------------------+-----------------------------------------+
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            Choose a name for the collation and list it in the
-            <filename>Index.xml</filename> file. In addition, you'll
-            need to provide the collation ordering rules. Find the
-            <literal>&lt;charset&gt;</literal> element for the character
-            set to which the collation is being added, and add a
-            <literal>&lt;collation&gt;</literal> element that indicates
-            the collation name and ID, to associate the name with the
-            ID. Within the <literal>&lt;collation&gt;</literal> element,
-            provide a <literal>&lt;rules&gt;</literal> element
-            containing the ordering rules:
-          </para>
+          <listitem>
+            <para>
+              Choose a name for the collation and list it in the
+              <filename>Index.xml</filename> file. In addition, you'll
+              need to provide the collation ordering rules. Find the
+              <literal>&lt;charset&gt;</literal> element for the
+              character set to which the collation is being added, and
+              add a <literal>&lt;collation&gt;</literal> element that
+              indicates the collation name and ID, to associate the name
+              with the ID. Within the
+              <literal>&lt;collation&gt;</literal> element, provide a
+              <literal>&lt;rules&gt;</literal> element containing the
+              ordering rules:
+            </para>
 
 <programlisting>
 &lt;charset name="utf8"&gt;

@@ -9415,25 +9470,25 @@
   ...
 &lt;/charset&gt;
 </programlisting>
-        </listitem>
+          </listitem>
 
-        <listitem>
-          <para>
-            If you want a similar collation for other Unicode character
-            sets, add other <literal>&lt;collation&gt;</literal>
-            elements. For example, to define
-            <literal>ucs2_phone_ci</literal>, add a
-            <literal>&lt;collation&gt;</literal> element to the
-            <literal>&lt;charset name="ucs2"&gt;</literal> element.
-            Remember that each collation must have its own unique ID.
-          </para>
-        </listitem>
+          <listitem>
+            <para>
+              If you want a similar collation for other Unicode
+              character sets, add other
+              <literal>&lt;collation&gt;</literal> elements. For
+              example, to define <literal>ucs2_phone_ci</literal>, add a
+              <literal>&lt;collation&gt;</literal> element to the
+              <literal>&lt;charset name="ucs2"&gt;</literal> element.
+              Remember that each collation must have its own unique ID.
+            </para>
+          </listitem>
 
-        <listitem>
-          <para>
-            Restart the server and use this statement to verify that the
-            collation is present:
-          </para>
+          <listitem>
+            <para>
+              Restart the server and use this statement to verify that
+              the collation is present:
+            </para>
 
 <programlisting>
 mysql&gt; <userinput>SHOW COLLATION LIKE 'utf8_phone_ci';</userinput>

@@ -9443,19 +9498,19 @@
 | utf8_phone_ci | utf8    | 1029 |         |          |       8 |
 +---------------+---------+------+---------+----------+---------+
 </programlisting>
-        </listitem>
+          </listitem>
 
-      </orderedlist>
+        </orderedlist>
 
-      <para>
-        Now test the collation to make sure that it has the desired
-        properties.
-      </para>
+        <para>
+          Now test the collation to make sure that it has the desired
+          properties.
+        </para>
 
-      <para>
-        Create a table containing some sample phone numbers using the
-        new collation:
-      </para>
+        <para>
+          Create a table containing some sample phone numbers using the
+          new collation:
+        </para>
 
 <programlisting>
 <!--

@@ -9484,10 +9539,10 @@
 Query OK, 1 row affected (0.00 sec)
 </programlisting>
 
-      <para>
-        Run some queries to see whether the ignored punctuation
-        characters are in fact ignored for sorting and comparisons:
-      </para>
+        <para>
+          Run some queries to see whether the ignored punctuation
+          characters are in fact ignored for sorting and comparisons:
+        </para>
 
 <programlisting>
 mysql&gt; <userinput>SELECT * FROM phonebook ORDER BY phone;</userinput>

@@ -9527,6 +9582,8 @@
 1 row in set (0.00 sec)
 </programlisting>
 
+      </section>
+
     </section>
 
   </section>
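
The phone-number scenario in the last hunk can be pictured with a small sketch outside the server. This is only an illustration of what "ignorable" punctuation means for comparisons, not how MySQL implements collations. The set of ignored characters (`IGNORABLE`) and the helper name `phone_key` are assumptions made for this example, since the collation's actual rules fall outside the lines shown in the diff.

```python
# Hypothetical set of characters to ignore; the real rule list is not
# shown in the diff above, so this choice is an assumption.
IGNORABLE = set(" ()+-")


def phone_key(value: str) -> str:
    """Return a comparison key with the ignorable characters removed."""
    return "".join(ch for ch in value if ch not in IGNORABLE)


# Three of the formats listed in the documentation example.
samples = ["+7-12345-67", "+7-12-345-67", "+71234567"]

# If punctuation is ignorable, all three values share one key.
keys = {phone_key(s) for s in samples}
print(keys)
```

With such a key, a search for any one format matches the others, which is the behavior the documented `utf8_phone_ci` tests check with `ORDER BY` and equality queries.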

