List: Commits
From: paul
Date: May 22 2008 6:17pm
Subject: svn commit - mysqldoc@docsrva: r10823 - in trunk: . refman-6.0
Author: paul
Date: 2008-05-22 20:17:36 +0200 (Thu, 22 May 2008)
New Revision: 10823

Log:
 r31651@frost:  paul | 2008-05-22 13:15:57 -0500
 adding-a-collation revisions


Modified:
   trunk/refman-6.0/collation-tmp.xml

Property changes on: trunk
___________________________________________________________________
Name: svk:merge
   - 4767c598-dc10-0410-bea0-d01b485662eb:/mysqldoc-local/mysqldoc/trunk:35828
7d8d2c4e-af1d-0410-ab9f-b038ce55645b:/mysqldoc-local/mysqldoc:31645
b5ec3a16-e900-0410-9ad2-d183a3acac99:/mysqldoc-local/mysqldoc/trunk:14218
bf112a9c-6c03-0410-a055-ad865cd57414:/mysqldoc-local/mysqldoc/trunk:31428
   + 4767c598-dc10-0410-bea0-d01b485662eb:/mysqldoc-local/mysqldoc/trunk:35828
7d8d2c4e-af1d-0410-ab9f-b038ce55645b:/mysqldoc-local/mysqldoc:31651
b5ec3a16-e900-0410-9ad2-d183a3acac99:/mysqldoc-local/mysqldoc/trunk:14218
bf112a9c-6c03-0410-a055-ad865cd57414:/mysqldoc-local/mysqldoc/trunk:31428


Modified: trunk/refman-6.0/collation-tmp.xml
===================================================================
--- trunk/refman-6.0/collation-tmp.xml	2008-05-22 17:02:16 UTC (rev 10822)
+++ trunk/refman-6.0/collation-tmp.xml	2008-05-22 18:17:36 UTC (rev 10823)
Changed blocks: 12, Lines Added: 251, Lines Deleted: 174; 17371 bytes

@@ -20,11 +20,10 @@
   </para>
 
   <para>
-    Collations order characters based on weights. For each character in
-    a character set, the character code maps to a weight. Characters
-    with equal weights compare as equal, and characters with unequal
-    weights compare according to the relative magnitude of their
-    weights.
+    A collation orders characters based on weights. Each character in a
+    character set maps to a weight. Characters with equal weights
+    compare as equal, and characters with unequal weights compare
+    according to the relative magnitude of their weights.
   </para>
 
   <remark role="todo">

@@ -36,16 +35,88 @@
     used to see the weights for the characters in a string. It returns a
     binary string that indicates the weights, so it is convenient to use
     <literal>HEX(WEIGHT_STRING(<replaceable>str</replaceable>))</literal>
-    to display the weights in printable form.
+    to display the weights in printable form. The following example
+    shows that the weights for <literal>'AaBb'</literal> do not differ
+    by lettercase if it is a non-binary string, but do differ if it is
+    a binary string:
   </para>
 
+<programlisting>
+mysql&gt; <userinput>SELECT HEX(WEIGHT_STRING('AaBb'));</userinput>
++----------------------------+
+| HEX(WEIGHT_STRING('AaBb')) |
++----------------------------+
+| 41414242                   | 
++----------------------------+
+mysql&gt; <userinput>SELECT HEX(WEIGHT_STRING(BINARY 'AaBb'));</userinput>
++-----------------------------------+
+| HEX(WEIGHT_STRING(BINARY 'AaBb')) |
++-----------------------------------+
+| 41614262                          | 
++-----------------------------------+
+</programlisting>
+
   <para>
-    [SHOW SOME EXAMPLES]
+    MySQL implements several types of collations, as discussed in
+    <xref linkend="charset-collation-types"/>. Some types of collations
+    can be added to MySQL without recompiling:
   </para>
 
+  <itemizedlist>
+
+    <listitem>
+      <para>
+        Simple collations for 8-bit character sets
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        UCA-based collations for Unicode character sets
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        Binary (<literal><replaceable>xxx</replaceable>_bin</literal>)
+        collations
+      </para>
+    </listitem>
+
+  </itemizedlist>
+
   <para>
-    This section describes how to add a collation to an existing
-    character set. (If the character set does not exist, see
+    All existing character sets already have a binary collation, so
+    there is no need here to describe how to add one. The following
+    sections describe how to add collations of the first two types.
+
+    <remark role="todo">
+      4.1-only note, should replace previous sentence
+    </remark>
+
+    In MySQL 4.1, simple collations for 8-bit character sets can be
+    added without recompiling. To add a UCA-based Unicode collation,
+    MySQL 5.0 or higher is required.
+  </para>
+
+  <para>
+    To add a collation that does require recompiling (as implemented by
+    means of functions in a C source file), use the instructions in
+    <xref linkend="adding-character-set"/>. However, instead of adding
+    all the information required for a complete character set, just
+    modify the appropriate files for an existing character set. That is,
+    based on what is already present for the character set's current
+    collations, add new data structures, functions, and configuration
+    information for the new collation.
+  </para>
+
+  <remark role="todo">
+    MERGE prev with following
+  </remark>
+
+  <para>
+    The following discussion describes how to add a collation to an
+    existing character set. (If the character set does not exist, see
     <xref linkend="adding-character-set"/>.)
 
     <remark role="todo">

@@ -57,6 +128,46 @@
   </para>
 
   <para>
+    Summary of the procedure:
+  </para>
+
+  <orderedlist>
+
+    <listitem>
+      <para>
+        Choose a collation ID
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        Add configuration information that names the collation and
+        describes the character-ordering rules
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        Restart the server
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        Verify that the collation is present
+      </para>
+    </listitem>
+
+  </orderedlist>
+
+  <para>
+    The instructions cover only collations that can be added without
+    recompiling MySQL. For guidelines on adding collations that require
+    recompiling, see the MySQL Blog article in the following list of
+    additional resources.
+  </para>
+
+  <para>
     <emphasis role="bold">Additional resources</emphasis>
   </para>
 

@@ -83,36 +194,47 @@
       </para>
     </listitem>
 
+    <listitem>
+      <para>
+        MySQL Blog article <quote>Instructions for adding a new Unicode
+        collation</quote>:
+        <ulink url="http://blogs.mysql.com/peterg/2008/05/19/instructions-for-adding-a-new-unicode-collation/"/>
+      </para>
+    </listitem>
+
   </itemizedlist>
 
-  <section id="adding-collation-types">
+  <section id="charset-collation-types">
 
     <title>Types of Collations</title>
 
     <remark role="todo">
-      It might be that some of the following would be better placed in a
-      collation section of more general interest.
+      Ultimately, this section will be moved to the more general
+      character set discussion, but I have it here for now to keep all
+      the new material together.
     </remark>
 
     <para>
       MySQL includes several types of collation implementations:
     </para>
 
-    <itemizedlist>
+    <para>
+      <emphasis role="bold">Simple collations for 8-bit character
+      sets</emphasis>
+    </para>
 
-      <listitem>
-        <para>
-          Simple collations for 8-bit character sets
-        </para>
+    <para>
+      This kind of collation is implemented using an array of 256
+      weights that defines a one-to-one mapping from character codes to
+      weights. <literal>latin1_swedish_ci</literal> is an example. It is
+      a case-insensitive collation, so the weights for uppercase and
+      lowercase versions of a character are the same and they compare as
+      equal.
+    </para>
 
-        <para>
-          This kind of collation is implemented using an array of 256
-          weights that defines a one-to-one mapping from character codes
-          to weights. <literal>latin1_swedish_ci</literal> is an
-          example. It is a case-insensitive collation, so the weights
-          for uppercase and lowercase versions of a character are the
-          same and they compare as equal.
-        </para>
+    <remark role="todo">
+      pre-6.0 example
+    </remark>
 
 <programlisting>
 mysql&gt; <userinput>SET NAMES 'latin1' COLLATE 'latin1_swedish_ci';</userinput>

@@ -127,9 +249,9 @@
 1 row in set (0.00 sec)
 </programlisting>
 
-        <remark role="todo">
-          Example will not work pre-6.0.
-        </remark>
+    <remark role="todo">
+      6.0 example
+    </remark>
 
 <programlisting>
 mysql&gt; <userinput>SET NAMES 'latin1' COLLATE 'latin1_swedish_ci';</userinput>

@@ -151,45 +273,43 @@
 +-----------+
 1 row in set (0.12 sec)
 </programlisting>
-      </listitem>
 
-      <listitem>
-        <para>
-          Complex collations for 8-bit character sets
-        </para>
+    <para>
+      <emphasis role="bold">Complex collations for 8-bit character
+      sets</emphasis>
+    </para>
 
-        <para>
-          This kind of collation is implemented using functions in a C
-          source file that define how to order characters, as described
-          in <xref linkend="adding-character-set"/>.
-        </para>
-      </listitem>
+    <para>
+      This kind of collation is implemented using functions in a C
+      source file that define how to order characters, as described in
+      <xref linkend="adding-character-set"/>.
+    </para>
 
+    <para>
+      <emphasis role="bold">Collations for non-Unicode multi-byte
+      character sets</emphasis>
+    </para>
+
+    <para>
+      For characters in the ASCII range, character codes map to weights
+      in case-insensitive fashion. For multi-byte characters outside the
+      ASCII range, there are two types of relationship between character
+      codes and weights:
+    </para>
+
+    <itemizedlist>
+
       <listitem>
         <para>
-          Collations for non-Unicode multi-byte character sets
+          Weights equal character codes.
+          <literal>sjis_japanese_ci</literal> is an example of this kind
+          of collation.
         </para>
 
-        <para>
-          For characters in the ASCII range, character codes map to
-          weights in case-insensitive fashion. For multi-byte characters
-          outside the ASCII range, there are two types of relationship
-          between character codes and weights:
-        </para>
+        <remark role="todo">
+          Example will not work pre-6.0.
+        </remark>
 
-        <itemizedlist>
-
-          <listitem>
-            <para>
-              Weights equal character codes.
-              <literal>sjis_japanese_ci</literal> is an example of this
-              kind of collation.
-            </para>
-
-            <remark role="todo">
-              Example will not work pre-6.0.
-            </remark>
-
 <programlisting>
 <!--
 mysql> DROP TABLE IF EXISTS t1;

@@ -213,19 +333,19 @@
 +------+---------+------------------------+
 3 rows in set (0.00 sec)
 </programlisting>
-          </listitem>
+      </listitem>
 
-          <listitem>
-            <para>
-              Character codes map one-to-one to weights, but a code is
-              not necessarily equal to the weight.
-              <literal>gbk_chinese_ci</literal> is an example of this
-              kind of collation.
-            </para>
+      <listitem>
+        <para>
+          Character codes map one-to-one to weights, but a code is not
+          necessarily equal to the weight.
+          <literal>gbk_chinese_ci</literal> is an example of this kind
+          of collation.
+        </para>
 
-            <remark role="todo">
-              Example will not work pre-6.0.
-            </remark>
+        <remark role="todo">
+          Example will not work pre-6.0.
+        </remark>
 
 <programlisting>
 <!--

@@ -251,35 +371,34 @@
 +------+---------+------------------------+
 4 rows in set (0.00 sec)
 </programlisting>
-          </listitem>
-
-        </itemizedlist>
       </listitem>
 
-      <listitem>
-        <para>
-          Collations for Unicode multi-byte character sets
-        </para>
+    </itemizedlist>
 
-        <para>
-          Some of these are based on the Unicode Collation Algorithm
-          (UCA). Others are not.
-        </para>
+    <para>
+      <emphasis role="bold">Collations for Unicode multi-byte character
+      sets</emphasis>
+    </para>
 
-        <para>
-          Non-UCA collations have a one-to-one mapping from character
-          code to weight. In MySQL, such collations are case insensitive
-          and accent insensitive. <literal>ut8_general_ci</literal> is
-          an example: <literal>'a'</literal>, <literal>'A'</literal>,
-          <literal>'À'</literal>, and <literal>'á'</literal> each have
-          different character codes but all have a weight of
-          <literal>0x0041</literal> and compare as equal.
-        </para>
+    <para>
+      Some of these are based on the Unicode Collation Algorithm (UCA).
+      Others are not.
+    </para>
 
-        <remark role="todo">
-          6.0: Use slide 9. Pre-6.0: Use explicit comparisons?
-        </remark>
+    <para>
+      Non-UCA collations have a one-to-one mapping from character code
+      to weight. In MySQL, such collations are case insensitive and
+      accent insensitive. <literal>utf8_general_ci</literal> is an
+      example: <literal>'a'</literal>, <literal>'A'</literal>,
+      <literal>'À'</literal>, and <literal>'á'</literal> each have
+      different character codes but all have a weight of
+      <literal>0x0041</literal> and compare as equal.
+    </para>
 
+    <remark role="todo">
+      6.0: Use slide 9. Pre-6.0: Use explicit comparisons?
+    </remark>
+
 <programlisting>
 mysql&gt; <userinput>SET NAMES 'utf8' COLLATE 'utf8_general_ci';</userinput>
 Query OK, 0 rows affected (0.00 sec)

@@ -321,33 +440,32 @@
 4 rows in set (0.00 sec)
 </programlisting>
 
+    <para>
+      UCA-based collations in MySQL have these properties:
+    </para>
+
+    <itemizedlist>
+
+      <listitem>
         <para>
-          UCA-based collations in MySQL have these properties:
+          If a character has weights, each weight uses 2 bytes (16 bits)
         </para>
+      </listitem>
 
-        <itemizedlist>
+      <listitem>
+        <para>
+          A character may have zero weights (or an empty weight). In
+          this case, the character is ignorable. Example: "U+0000 NULL"
+          does not have a weight and is ignorable
+        </para>
+      </listitem>
 
-          <listitem>
-            <para>
-              If a character has weights, each weight uses 2 bytes (16
-              bits)
-            </para>
-          </listitem>
+      <listitem>
+        <para>
+          A character may have one weight. Examples:
+          <literal>'a'</literal> and <literal>'A'</literal>.
+        </para>
 
-          <listitem>
-            <para>
-              A character may have zero weights (or an empty weight). In
-              this case, the character is ignorable. Example: "U+0000
-              NULL" does not have a weight and is ignorable
-            </para>
-          </listitem>
-
-          <listitem>
-            <para>
-              A character may have one weight. Examples:
-              <literal>'a'</literal> and <literal>'A'</literal>.
-            </para>
-
 <programlisting>
 mysql&gt; <userinput>SET NAMES 'utf8' COLLATE 'utf8_unicode_ci';</userinput>
 Query OK, 0 rows affected (0.05 sec)

@@ -360,14 +478,14 @@
 +----------+-------------------------+
 1 row in set (0.02 sec)
 </programlisting>
-          </listitem>
+      </listitem>
 
-          <listitem>
-            <para>
-              A character may have many weights. This is an expansion.
-              Example: German letter <literal>'ß'</literal> (SZ LEAGUE,
-              or SHARP S)
-            </para>
+      <listitem>
+        <para>
+          A character may have many weights. This is an expansion.
+          Example: German letter <literal>'ß'</literal> (SZ ligature, or
+          SHARP S)
+        </para>
 
 <programlisting>
 mysql&gt; <userinput>SET NAMES 'utf8' COLLATE 'utf8_unicode_ci';</userinput>

@@ -381,14 +499,13 @@
 +-----------+--------------------------+
 1 row in set (0.00 sec)
 </programlisting>
-          </listitem>
+      </listitem>
 
-          <listitem>
-            <para>
-              Many characters may have one weight. This is a
-              contraction. Example: <literal>'ch'</literal> is a single
-              letter in Czech
-            </para>
+      <listitem>
+        <para>
+          Many characters may have one weight. This is a contraction.
+          Example: <literal>'ch'</literal> is a single letter in Czech
+        </para>
 
 <programlisting>
 mysql&gt; <userinput>SET NAMES 'utf8' COLLATE 'utf8_czech_ci';</userinput>

@@ -402,62 +519,22 @@
 +-----------+--------------------------+
 1 row in set (0.00 sec)
 </programlisting>
-          </listitem>
-
-        </itemizedlist>
-
-        <para>
-          A many-characters-to-many-weights mapping is also possible
-          (this is contraction with expansion), but is not supported by
-          MySQL.
-        </para>
       </listitem>
 
     </itemizedlist>
 
     <para>
-      Certain types of collations can be added to MySQL without
-      recompiling:
+      A many-characters-to-many-weights mapping is also possible (this
+      is contraction with expansion), but is not supported by MySQL.
     </para>
 
-    <itemizedlist>
-
-      <listitem>
-        <para>
-          Simple collations for 8-bit character sets
-        </para>
-      </listitem>
-
-      <listitem>
-        <para>
-          UCA-based collations for Unicode character sets
-        </para>
-      </listitem>
-
-      <listitem>
-        <para>
-          Binary (<literal><replaceable>xxx</replaceable>_bin</literal>)
-          collations
-        </para>
-      </listitem>
-
-    </itemizedlist>
-
     <para>
-      The following sections describe how to add collations of the first
-      two types. All existing character sets already have a binary
-      collation, so there is no need here to describe how to add one.
+      <emphasis role="bold">Miscellaneous collations</emphasis>
     </para>
 
     <para>
-      To add a collation that does require recompiling (as implemented
-      by means of functions in a C source file), use the instructions in
-      <xref linkend="adding-character-set"/>. However, instead of adding
-      all the information required for a complete character set, just
-      modify the appropriate files for an existing character set. That
-      is, based on what is already present for the character set's
-      current collations, add new data structures, functions, and
-      configuration information for the new collation.
+      There are also a few collations that do not fall into any of the
+      previous categories.
     </para>
 
   </section>
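
The weight model described in the revised text can be sketched in Python. This is a toy model under stated assumptions, not MySQL's implementation: the function names (`simple_weights`, `weight_string`, `compare`, `toy_uca_weights`) and the `TOY_UCA` table are hypothetical, and the numeric weights in `TOY_UCA` are made-up placeholders rather than real UCA values. The first part mimics a simple 8-bit collation (an array of 256 weights); the second part mimics the UCA-style cases of ignorable characters, expansions, and contractions.

```python
def simple_weights():
    """Build a 256-entry weight table for a hypothetical
    case-insensitive 8-bit collation (ASCII letters only)."""
    table = list(range(256))          # default: weight equals character code
    for code in range(ord("a"), ord("z") + 1):
        table[code] = code - 32       # lowercase shares the uppercase weight
    return table


def weight_string(data: bytes, table) -> bytes:
    """Map each byte to its weight, one weight per character code."""
    return bytes(table[b] for b in data)


def compare(a: bytes, b: bytes, table) -> int:
    """Return -1, 0, or 1 by comparing weight strings, not raw bytes."""
    wa, wb = weight_string(a, table), weight_string(b, table)
    return (wa > wb) - (wa < wb)


# Placeholder weights (assumptions for illustration only).
TOY_UCA = {
    "\u0000": [],        # zero weights: ignorable character
    "a": [1], "A": [1],  # one weight each, equal for both cases
    "c": [3], "h": [8], "s": [19],
    "ß": [19, 19],       # expansion: one character, many weights
    "ch": [9],           # contraction: many characters, one weight
}


def toy_uca_weights(text: str, table=TOY_UCA):
    """Collect weights, preferring a two-character contraction when present."""
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in table:
            out.extend(table[pair])
            i += 2
        else:
            # Unknown characters fall back to their code point as weight.
            out.extend(table.get(text[i], [ord(text[i])]))
            i += 1
    return out


if __name__ == "__main__":
    table = simple_weights()
    print(weight_string(b"AaBb", table).hex())  # case-insensitive weights
    print(b"AaBb".hex())                        # binary: weights equal codes
    print(toy_uca_weights("ch"), toy_uca_weights("ß"))
```

In this sketch, a binary collation corresponds to using the raw bytes as weights, which is why the two printed hex strings differ only in the positions of the lowercase letters.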

