Author: paul
Date: 2008-05-22 20:17:36 +0200 (Thu, 22 May 2008)
New Revision: 10823
Log:
r31651@frost: paul | 2008-05-22 13:15:57 -0500
adding-a-collation revisions
Modified:
trunk/refman-6.0/collation-tmp.xml
Property changes on: trunk
___________________________________________________________________
Name: svk:merge
- 4767c598-dc10-0410-bea0-d01b485662eb:/mysqldoc-local/mysqldoc/trunk:35828
7d8d2c4e-af1d-0410-ab9f-b038ce55645b:/mysqldoc-local/mysqldoc:31645
b5ec3a16-e900-0410-9ad2-d183a3acac99:/mysqldoc-local/mysqldoc/trunk:14218
bf112a9c-6c03-0410-a055-ad865cd57414:/mysqldoc-local/mysqldoc/trunk:31428
+ 4767c598-dc10-0410-bea0-d01b485662eb:/mysqldoc-local/mysqldoc/trunk:35828
7d8d2c4e-af1d-0410-ab9f-b038ce55645b:/mysqldoc-local/mysqldoc:31651
b5ec3a16-e900-0410-9ad2-d183a3acac99:/mysqldoc-local/mysqldoc/trunk:14218
bf112a9c-6c03-0410-a055-ad865cd57414:/mysqldoc-local/mysqldoc/trunk:31428
Modified: trunk/refman-6.0/collation-tmp.xml
===================================================================
--- trunk/refman-6.0/collation-tmp.xml 2008-05-22 17:02:16 UTC (rev 10822)
+++ trunk/refman-6.0/collation-tmp.xml 2008-05-22 18:17:36 UTC (rev 10823)
Changed blocks: 12, Lines Added: 251, Lines Deleted: 174; 17371 bytes
@@ -20,11 +20,10 @@
</para>
<para>
- Collations order characters based on weights. For each character in
- a character set, the character code maps to a weight. Characters
- with equal weights compare as equal, and characters with unequal
- weights compare according to the relative magnitude of their
- weights.
+ A collation orders characters based on weights. Each character in a
+ character set maps to a weight. Characters with equal weights
+ compare as equal, and characters with unequal weights compare
+ according to the relative magnitude of their weights.
</para>
<remark role="todo">
@@ -36,16 +35,88 @@
used to see the weights for the characters in a string. It returns a
binary string that indicates the weights, so it is convenient to use
<literal>HEX(WEIGHT_STRING(<replaceable>str</replaceable>))</literal>
- to display the weights in printable form.
+ to display the weights in printable form. The following example
+ shows that the weights for <literal>'AaBb'</literal> do not differ
+ by lettercase if it is a non-binary string, but do differ if it is
+ a binary string:
</para>
+<programlisting>
+mysql> <userinput>SELECT HEX(WEIGHT_STRING('AaBb'));</userinput>
++----------------------------+
+| HEX(WEIGHT_STRING('AaBb')) |
++----------------------------+
+| 41414242 |
++----------------------------+
+mysql> <userinput>SELECT HEX(WEIGHT_STRING(BINARY 'AaBb'));</userinput>
++-----------------------------------+
+| HEX(WEIGHT_STRING(BINARY 'AaBb')) |
++-----------------------------------+
+| 41614262 |
++-----------------------------------+
+</programlisting>
+
<para>
- [SHOW SOME EXAMPLES]
+ MySQL implements several types of collations, as discussed in
+ <xref linkend="charset-collation-types"/>. Some types of collations
+ can be added to MySQL without recompiling:
</para>
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Simple collations for 8-bit character sets
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ UCA-based collations for Unicode character sets
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Binary (<literal><replaceable>xxx</replaceable>_bin</literal>)
+ collations
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
<para>
- This section describes how to add a collation to an existing
- character set. (If the character set does not exist, see
+ All existing character sets already have a binary collation, so
+ there is no need here to describe how to add one. The following
+ sections describe how to add collations of the first two types.
+
+ <remark role="todo">
+ 4.1-only note, should replace previous sentence
+ </remark>
+
+ In MySQL 4.1, simple collations for 8-bit character sets can be
+ added without recompiling. To add a UCA-based Unicode collation,
+ MySQL 5.0 or higher is required.
+ </para>
+
+ <para>
+ To add a collation that does require recompiling (as implemented by
+ means of functions in a C source file), use the instructions in
+ <xref linkend="adding-character-set"/>. However, instead of adding
+ all the information required for a complete character set, just
+ modify the appropriate files for an existing character set. That is,
+ based on what is already present for the character set's current
+ collations, add new data structures, functions, and configuration
+ information for the new collation.
+ </para>
+
+ <remark role="todo">
+ MERGE prev with following
+ </remark>
+
+ <para>
+ The following discussion describes how to add a collation to an
+ existing character set. (If the character set does not exist, see
<xref linkend="adding-character-set"/>.)
<remark role="todo">
@@ -57,6 +128,46 @@
</para>
<para>
+ Summary of the procedure:
+ </para>
+
+ <orderedlist>
+
+ <listitem>
+ <para>
+ Choose a collation ID
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Add configuration information that names the collation and
+ describes the character-ordering rules
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Restart the server
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Verify that the collation is present
+ </para>
+ </listitem>
+
+ </orderedlist>
+
+ <para>
+ The instructions cover only collations that can be added without
+ recompiling MySQL. For guidelines on adding collations that require
+ recompiling, see the MySQL Blog article in the following list of
+ additional resources.
+ </para>
+
+ <para>
<emphasis role="bold">Additional resources</emphasis>
</para>
@@ -83,36 +194,47 @@
</para>
</listitem>
+ <listitem>
+ <para>
+ MySQL Blog article <quote>Instructions for adding a new Unicode
+ collation</quote>:
+ <ulink url="http://blogs.mysql.com/peterg/2008/05/19/instructions-for-adding-a-new-unicode-collation/"/>
+ </para>
+ </listitem>
+
</itemizedlist>
- <section id="adding-collation-types">
+ <section id="charset-collation-types">
<title>Types of Collations</title>
<remark role="todo">
- It might be that some of the following would be better placed in a
- collation section of more general interest.
+ Ultimately, this section will be moved to the more general
+ character set discussion, but I have it here for now to keep all
+ the new material together.
</remark>
<para>
MySQL includes several types of collation implementations:
</para>
- <itemizedlist>
+ <para>
+ <emphasis role="bold">Simple collations for 8-bit character
+ sets</emphasis>
+ </para>
- <listitem>
- <para>
- Simple collations for 8-bit character sets
- </para>
+ <para>
+ This kind of collation is implemented using an array of 256
+ weights that defines a one-to-one mapping from character codes to
+ weights. <literal>latin1_swedish_ci</literal> is an example. It is
+ a case-insensitive collation, so the weights for uppercase and
+ lowercase versions of a character are the same and they compare as
+ equal.
+ </para>
- <para>
- This kind of collation is implemented using an array of 256
- weights that defines a one-to-one mapping from character codes
- to weights. <literal>latin1_swedish_ci</literal> is an
- example. It is a case-insensitive collation, so the weights
- for uppercase and lowercase versions of a character are the
- same and they compare as equal.
- </para>
+ <remark role="todo">
+ pre-6.0 example
+ </remark>
<programlisting>
mysql> <userinput>SET NAMES 'latin1' COLLATE 'latin1_swedish_ci';</userinput>
@@ -127,9 +249,9 @@
1 row in set (0.00 sec)
</programlisting>
- <remark role="todo">
- Example will not work pre-6.0.
- </remark>
+ <remark role="todo">
+ 6.0 example
+ </remark>
<programlisting>
mysql> <userinput>SET NAMES 'latin1' COLLATE 'latin1_swedish_ci';</userinput>
@@ -151,45 +273,43 @@
+-----------+
1 row in set (0.12 sec)
</programlisting>
- </listitem>
- <listitem>
- <para>
- Complex collations for 8-bit character sets
- </para>
+ <para>
+ <emphasis role="bold">Complex collations for 8-bit character
+ sets</emphasis>
+ </para>
- <para>
- This kind of collation is implemented using functions in a C
- source file that define how to order characters, as described
- in <xref linkend="adding-character-set"/>.
- </para>
- </listitem>
+ <para>
+ This kind of collation is implemented using functions in a C
+ source file that define how to order characters, as described in
+ <xref linkend="adding-character-set"/>.
+ </para>
+ <para>
+ <emphasis role="bold">Collations for non-Unicode multi-byte
+ character sets</emphasis>
+ </para>
+
+ <para>
+ For characters in the ASCII range, character codes map to weights
+ in case-insensitive fashion. For multi-byte characters outside the
+ ASCII range, there are two types of relationship between character
+ codes and weights:
+ </para>
+
+ <itemizedlist>
+
<listitem>
<para>
- Collations for non-Unicode multi-byte character sets
+ Weights equal character codes.
+ <literal>sjis_japanese_ci</literal> is an example of this kind
+ of collation.
</para>
- <para>
- For characters in the ASCII range, character codes map to
- weights in case-insensitive fashion. For multi-byte characters
- outside the ASCII range, there are two types of relationship
- between character codes and weights:
- </para>
+ <remark role="todo">
+ Example will not work pre-6.0.
+ </remark>
- <itemizedlist>
-
- <listitem>
- <para>
- Weights equal character codes.
- <literal>sjis_japanese_ci</literal> is an example of this
- kind of collation.
- </para>
-
- <remark role="todo">
- Example will not work pre-6.0.
- </remark>
-
<programlisting>
<!--
mysql> DROP TABLE IF EXISTS t1;
@@ -213,19 +333,19 @@
+------+---------+------------------------+
3 rows in set (0.00 sec)
</programlisting>
- </listitem>
+ </listitem>
- <listitem>
- <para>
- Character codes map one-to-one to weights, but a code is
- not necessarily equal to the weight.
- <literal>gbk_chinese_ci</literal> is an example of this
- kind of collation.
- </para>
+ <listitem>
+ <para>
+ Character codes map one-to-one to weights, but a code is not
+ necessarily equal to the weight.
+ <literal>gbk_chinese_ci</literal> is an example of this kind
+ of collation.
+ </para>
- <remark role="todo">
- Example will not work pre-6.0.
- </remark>
+ <remark role="todo">
+ Example will not work pre-6.0.
+ </remark>
<programlisting>
<!--
@@ -251,35 +371,34 @@
+------+---------+------------------------+
4 rows in set (0.00 sec)
</programlisting>
- </listitem>
-
- </itemizedlist>
</listitem>
- <listitem>
- <para>
- Collations for Unicode multi-byte character sets
- </para>
+ </itemizedlist>
- <para>
- Some of these are based on the Unicode Collation Algorithm
- (UCA). Others are not.
- </para>
+ <para>
+ <emphasis role="bold">Collations for Unicode multi-byte character
+ sets</emphasis>
+ </para>
- <para>
- Non-UCA collations have a one-to-one mapping from character
- code to weight. In MySQL, such collations are case insensitive
- and accent insensitive. <literal>ut8_general_ci</literal> is
- an example: <literal>'a'</literal>, <literal>'A'</literal>,
- <literal>'À'</literal>, and <literal>'á'</literal> each have
- different character codes but all have a weight of
- <literal>0x0041</literal> and compare as equal.
- </para>
+ <para>
+ Some of these are based on the Unicode Collation Algorithm (UCA).
+ Others are not.
+ </para>
- <remark role="todo">
- 6.0: Use slide 9. Pre-6.0: Use explicit comparisons?
- </remark>
+ <para>
+ Non-UCA collations have a one-to-one mapping from character code
+ to weight. In MySQL, such collations are case insensitive and
+ accent insensitive. <literal>utf8_general_ci</literal> is an
+ example: <literal>'a'</literal>, <literal>'A'</literal>,
+ <literal>'À'</literal>, and <literal>'á'</literal> each have
+ different character codes but all have a weight of
+ <literal>0x0041</literal> and compare as equal.
+ </para>
+ <remark role="todo">
+ 6.0: Use slide 9. Pre-6.0: Use explicit comparisons?
+ </remark>
+
<programlisting>
mysql> <userinput>SET NAMES 'utf8' COLLATE 'utf8_general_ci';</userinput>
Query OK, 0 rows affected (0.00 sec)
@@ -321,33 +440,32 @@
4 rows in set (0.00 sec)
</programlisting>
+ <para>
+ UCA-based collations in MySQL have these properties:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
<para>
- UCA-based collations in MySQL have these properties:
+ If a character has weights, each weight uses 2 bytes (16 bits)
</para>
+ </listitem>
- <itemizedlist>
+ <listitem>
+ <para>
+ A character may have zero weights (or an empty weight). In
+ this case, the character is ignorable. Example: "U+0000 NULL"
+ does not have a weight and is ignorable.
+ </para>
+ </listitem>
- <listitem>
- <para>
- If a character has weights, each weight uses 2 bytes (16
- bits)
- </para>
- </listitem>
+ <listitem>
+ <para>
+ A character may have one weight. Examples:
+ <literal>'a'</literal> and <literal>'A'</literal>.
+ </para>
- <listitem>
- <para>
- A character may have zero weights (or an empty weight). In
- this case, the character is ignorable. Example: "U+0000
- NULL" does not have a weight and is ignorable
- </para>
- </listitem>
-
- <listitem>
- <para>
- A character may have one weight. Examples:
- <literal>'a'</literal> and <literal>'A'</literal>.
- </para>
-
<programlisting>
mysql> <userinput>SET NAMES 'utf8' COLLATE 'utf8_unicode_ci';</userinput>
Query OK, 0 rows affected (0.05 sec)
@@ -360,14 +478,14 @@
+----------+-------------------------+
1 row in set (0.02 sec)
</programlisting>
- </listitem>
+ </listitem>
- <listitem>
- <para>
- A character may have many weights. This is an expansion.
- Example: German letter <literal>'ß'</literal> (SZ LEAGUE,
- or SHARP S)
- </para>
+ <listitem>
+ <para>
+ A character may have many weights. This is an expansion.
+ Example: German letter <literal>'ß'</literal> (SZ LIGATURE, or
+ SHARP S)
+ </para>
<programlisting>
mysql> <userinput>SET NAMES 'utf8' COLLATE 'utf8_unicode_ci';</userinput>
@@ -381,14 +499,13 @@
+-----------+--------------------------+
1 row in set (0.00 sec)
</programlisting>
- </listitem>
+ </listitem>
- <listitem>
- <para>
- Many characters may have one weight. This is a
- contraction. Example: <literal>'ch'</literal> is a single
- letter in Czech
- </para>
+ <listitem>
+ <para>
+ Many characters may have one weight. This is a contraction.
+ Example: <literal>'ch'</literal> is a single letter in Czech
+ </para>
<programlisting>
mysql> <userinput>SET NAMES 'utf8' COLLATE 'utf8_czech_ci';</userinput>
@@ -402,62 +519,22 @@
+-----------+--------------------------+
1 row in set (0.00 sec)
</programlisting>
- </listitem>
-
- </itemizedlist>
-
- <para>
- A many-characters-to-many-weights mapping is also possible
- (this is contraction with expansion), but is not supported by
- MySQL.
- </para>
</listitem>
</itemizedlist>
<para>
- Certain types of collations can be added to MySQL without
- recompiling:
+ A many-characters-to-many-weights mapping is also possible (this
+ is contraction with expansion), but is not supported by MySQL.
</para>
- <itemizedlist>
-
- <listitem>
- <para>
- Simple collations for 8-bit character sets
- </para>
- </listitem>
-
- <listitem>
- <para>
- UCA-based collations for Unicode character sets
- </para>
- </listitem>
-
- <listitem>
- <para>
- Binary (<literal><replaceable>xxx</replaceable>_bin</literal>)
- collations
- </para>
- </listitem>
-
- </itemizedlist>
-
<para>
- The following sections describe how to add collations of the first
- two types. All existing character sets already have a binary
- collation, so there is no need here to describe how to add one.
+ <emphasis role="bold">Miscellaneous collations</emphasis>
</para>
<para>
- To add a collation that does require recompiling (as implemented
- by means of functions in a C source file), use the instructions in
- <xref linkend="adding-character-set"/>. However, instead of adding
- all the information required for a complete character set, just
- modify the appropriate files for an existing character set. That
- is, based on what is already present for the character set's
- current collations, add new data structures, functions, and
- configuration information for the new collation.
+ There are also a few collations that do not fall into any of the
+ previous categories.
</para>
</section>
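
Note on the "simple collations for 8-bit character sets" paragraph in this revision: the idea of a 256-entry weight array can be sketched in a few lines of Python. This is only a toy illustration, not MySQL code. The names build_ci_weights, weight_string, and compare are hypothetical, and the case-insensitive table below folds only ASCII letters; it is not the real latin1_swedish_ci table.

```python
# Toy model of a "simple" 8-bit collation: one weight per character code.
# All names here are illustrative assumptions, not MySQL internals.

def build_ci_weights():
    """Return a 256-entry table where lowercase ASCII letters
    share the weight of their uppercase counterparts."""
    weights = list(range(256))  # start with weight == character code
    for code in range(ord("a"), ord("z") + 1):
        weights[code] = code - 32  # 'a' (0x61) gets the weight of 'A' (0x41)
    return weights

# A binary collation can be modeled as the identity mapping.
BINARY_WEIGHTS = list(range(256))
CI_WEIGHTS = build_ci_weights()

def weight_string(data, weights):
    """Map each byte of data to its weight, one weight byte per character."""
    return bytes(weights[b] for b in data)

def compare(a, b, weights):
    """Compare two byte strings by their weight strings: -1, 0, or 1."""
    wa, wb = weight_string(a, weights), weight_string(b, weights)
    return (wa > wb) - (wa < wb)

if __name__ == "__main__":
    print(weight_string(b"AaBb", CI_WEIGHTS).hex().upper())      # 41414242
    print(weight_string(b"AaBb", BINARY_WEIGHTS).hex().upper())  # 41614262
    print(compare(b"a", b"A", CI_WEIGHTS))      # 0: equal weights compare as equal
    print(compare(b"a", b"A", BINARY_WEIGHTS))  # 1: 0x61 sorts after 0x41
```

The two hex strings match the HEX(WEIGHT_STRING(...)) listings added in this commit for the non-binary and binary cases, which is the only behavior the sketch is meant to mirror.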