List: Commits
From: paul.dubois  Date: May 23 2011 3:26pm
Subject: svn commit - mysqldoc@oter02: r26311 - in trunk: . dynamic-docs/changelog refman-5.0 refman-5.1 refman-5.5 refman-5.6 refman-6.0
Author: pd221994
Date: 2011-05-23 17:26:29 +0200 (Mon, 23 May 2011)
New Revision: 26311

Log:
 r48262@dhcp-adc-twvpn-1-vpnpool-10-154-20-51:  paul | 2011-05-23 10:20:24 -0500
 Document WL#5624: Collation customization improvements


Modified:
   svk:merge
   trunk/dynamic-docs/changelog/mysqld-2.xml
   trunk/refman-5.0/globalization.xml
   trunk/refman-5.1/globalization.xml
   trunk/refman-5.5/globalization.xml
   trunk/refman-5.6/globalization.xml
   trunk/refman-6.0/globalization.xml

Property changes on: trunk
___________________________________________________________________

Modified: svk:merge
===================================================================


Changed blocks: 0, Lines Added: 0, Lines Deleted: 0; 1277 bytes


Modified: trunk/dynamic-docs/changelog/mysqld-2.xml
===================================================================
--- trunk/dynamic-docs/changelog/mysqld-2.xml	2011-05-23 14:32:58 UTC (rev 26310)
+++ trunk/dynamic-docs/changelog/mysqld-2.xml	2011-05-23 15:26:29 UTC (rev 26311)
Changed blocks: 1, Lines Added: 73, Lines Deleted: 0; 2331 bytes

@@ -47566,4 +47566,77 @@
 
   </logentry>
 
+  <logentry entrytype="feature">
+
+    <tags>
+      <manual type="collations"/>
+      <manual type="LDML"/>
+    </tags>
+
+    <bugs>
+      <fixes wlid="5624"/>
+    </bugs>
+
+    <versions>
+      <version ver="5.6.1"/>
+    </versions>
+
+    <message>
+
+      <para>
+        Support for adding Unicode collations that are based on the
+        Unicode Collation Algorithm (UCA) has been improved:
+      </para>
+
+      <itemizedlist>
+
+        <listitem>
+          <para>
+            MySQL now recognizes a larger subset of the LDML syntax that
+            is used to write collation descriptions. In many cases, it
+            is possible to download a collation definition from the
+            Unicode Common Locale Data Repository and paste the relevant
+            part (that is, the part between the
+            <literal>&lt;rules&gt;</literal> and
+            <literal>&lt;/rules&gt;</literal> tags) into the MySQL
+            <filename>Index.xml</filename> file.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Character representation in LDML rules is more flexible. Any
+            character can be written literally, not just basic Latin
+            letters. For collations based on UCA 5.2.0, hexadecimal
+            notation can be used for any character, not just BMP
+            characters.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            When problems are found while parsing
+            <filename>Index.xml</filename>, better diagnostics are
+            produced.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            For collations that require tailoring rules, there is no
+            longer a fixed size limit on the tailoring information.
+          </para>
+        </listitem>
+
+      </itemizedlist>
+
+      <para>
+        For more information, see <xref linkend="ldml-rules"/> and
+        <xref linkend="collation-diagnostics"/>.
+      </para>
+
+    </message>
+
+  </logentry>
+
 </changelog>
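
To illustrate the first item in the changelog entry above, here is a minimal sketch of how a pasted rules block might sit inside Index.xml. The collation name utf8_example_ci and the ID 1029 are hypothetical placeholders, and the two rules are illustrative only; none of this is taken from the commit itself. A real file contains many other charset and collation elements around this fragment.

```xml
<charset name="utf8">
  <!-- Hypothetical collation: name and id are placeholders, not from this commit -->
  <collation name="utf8_example_ci" id="1029">
    <rules>
      <!-- The part copied from between the rules tags of a CLDR definition goes here -->
      <reset>a</reset>
      <p>\u00E1</p>
    </rules>
  </collation>
</charset>
```

Only the content between the rules tags comes from the downloaded definition; the surrounding charset and collation elements are what the MySQL file supplies.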


Modified: trunk/refman-5.0/globalization.xml
===================================================================
--- trunk/refman-5.0/globalization.xml	2011-05-23 14:32:58 UTC (rev 26310)
+++ trunk/refman-5.0/globalization.xml	2011-05-23 15:26:29 UTC (rev 26311)
Changed blocks: 3, Lines Added: 22, Lines Deleted: 18; 2432 bytes

@@ -7593,12 +7593,13 @@
 
           <listitem>
             <para>
-              A reset rule does not specify any ordering in and of
-              itself. Instead, it <quote>resets</quote> the ordering for
-              subsequent shift rules to cause them to be taken in
-              relation to a given character. Either of these rules
-              resets subsequent shift rules to be taken in relation to
-              the letter <literal>'A'</literal>:
+              A <literal>&lt;reset&gt;</literal> rule does not specify
+              any ordering in and of itself. Instead, it
+              <quote>resets</quote> the ordering for subsequent shift
+              rules to cause them to be taken in relation to a given
+              character. Either of the following rules resets subsequent
+              shift rules to be taken in relation to the letter
+              <literal>'A'</literal>:
             </para>
 
 <programlisting>

@@ -7610,21 +7611,13 @@
 
           <listitem>
             <para>
-              Shift rules define primary, secondary, and tertiary
-              differences of a character from another character. To
-              specify them, use <literal>&lt;p&gt;</literal>,
+              The <literal>&lt;p&gt;</literal>,
               <literal>&lt;s&gt;</literal>, and
-              <literal>&lt;t&gt;</literal> elements. Either of these
-              rules specifies a primary shift rule for the
-              <literal>'G'</literal> character:
+              <literal>&lt;t&gt;</literal> shift rules define primary,
+              secondary, and tertiary differences of a character from
+              another character:
             </para>
 
-<programlisting>
-&lt;p&gt;G&lt;/p&gt;
-
-&lt;p&gt;\u0047&lt;/p&gt;
-</programlisting>
-
             <itemizedlist>
 
               <listitem>

@@ -7649,6 +7642,17 @@
               </listitem>
 
             </itemizedlist>
+
+            <para>
+              Either of these rules specifies a primary shift rule for
+              the <literal>'G'</literal> character:
+            </para>
+
+<programlisting>
+&lt;p&gt;G&lt;/p&gt;
+
+&lt;p&gt;\u0047&lt;/p&gt;
+</programlisting>
           </listitem>
 
         </itemizedlist>
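
As a sketch of how the reset and shift rules described in these hunks combine, the fragment below anchors at 'A' and then places 'G' relative to it. The pairing of these two characters is an assumption made for illustration. The second pair repeats the first in \unnnn notation only to show that the two forms are interchangeable; a real definition would use one form.

```xml
<rules>
  <!-- Anchor: subsequent shift rules are taken in relation to 'A' -->
  <reset>A</reset>
  <!-- Primary shift rule: 'G' differs from 'A' at the primary level -->
  <p>G</p>

  <!-- The same two rules written in \unnnn notation (U+0041 is 'A', U+0047 is 'G') -->
  <reset>\u0041</reset>
  <p>\u0047</p>
</rules>
```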


Modified: trunk/refman-5.1/globalization.xml
===================================================================
--- trunk/refman-5.1/globalization.xml	2011-05-23 14:32:58 UTC (rev 26310)
+++ trunk/refman-5.1/globalization.xml	2011-05-23 15:26:29 UTC (rev 26311)
Changed blocks: 3, Lines Added: 22, Lines Deleted: 18; 2432 bytes

@@ -7783,12 +7783,13 @@
 
           <listitem>
             <para>
-              A reset rule does not specify any ordering in and of
-              itself. Instead, it <quote>resets</quote> the ordering for
-              subsequent shift rules to cause them to be taken in
-              relation to a given character. Either of these rules
-              resets subsequent shift rules to be taken in relation to
-              the letter <literal>'A'</literal>:
+              A <literal>&lt;reset&gt;</literal> rule does not specify
+              any ordering in and of itself. Instead, it
+              <quote>resets</quote> the ordering for subsequent shift
+              rules to cause them to be taken in relation to a given
+              character. Either of the following rules resets subsequent
+              shift rules to be taken in relation to the letter
+              <literal>'A'</literal>:
             </para>
 
 <programlisting>

@@ -7800,21 +7801,13 @@
 
           <listitem>
             <para>
-              Shift rules define primary, secondary, and tertiary
-              differences of a character from another character. To
-              specify them, use <literal>&lt;p&gt;</literal>,
+              The <literal>&lt;p&gt;</literal>,
               <literal>&lt;s&gt;</literal>, and
-              <literal>&lt;t&gt;</literal> elements. Either of these
-              rules specifies a primary shift rule for the
-              <literal>'G'</literal> character:
+              <literal>&lt;t&gt;</literal> shift rules define primary,
+              secondary, and tertiary differences of a character from
+              another character:
             </para>
 
-<programlisting>
-&lt;p&gt;G&lt;/p&gt;
-
-&lt;p&gt;\u0047&lt;/p&gt;
-</programlisting>
-
             <itemizedlist>
 
               <listitem>

@@ -7839,6 +7832,17 @@
               </listitem>
 
             </itemizedlist>
+
+            <para>
+              Either of these rules specifies a primary shift rule for
+              the <literal>'G'</literal> character:
+            </para>
+
+<programlisting>
+&lt;p&gt;G&lt;/p&gt;
+
+&lt;p&gt;\u0047&lt;/p&gt;
+</programlisting>
           </listitem>
 
         </itemizedlist>


Modified: trunk/refman-5.5/globalization.xml
===================================================================
--- trunk/refman-5.5/globalization.xml	2011-05-23 14:32:58 UTC (rev 26310)
+++ trunk/refman-5.5/globalization.xml	2011-05-23 15:26:29 UTC (rev 26311)
Changed blocks: 4, Lines Added: 28, Lines Deleted: 24; 3355 bytes

@@ -9020,12 +9020,13 @@
 
           <listitem>
             <para>
-              A reset rule does not specify any ordering in and of
-              itself. Instead, it <quote>resets</quote> the ordering for
-              subsequent shift rules to cause them to be taken in
-              relation to a given character. Either of these rules
-              resets subsequent shift rules to be taken in relation to
-              the letter <literal>'A'</literal>:
+              A <literal>&lt;reset&gt;</literal> rule does not specify
+              any ordering in and of itself. Instead, it
+              <quote>resets</quote> the ordering for subsequent shift
+              rules to cause them to be taken in relation to a given
+              character. Either of the following rules resets subsequent
+              shift rules to be taken in relation to the letter
+              <literal>'A'</literal>:
             </para>
 
 <programlisting>

@@ -9037,21 +9038,13 @@
 
           <listitem>
             <para>
-              Shift rules define primary, secondary, and tertiary
-              differences of a character from another character. To
-              specify them, use <literal>&lt;p&gt;</literal>,
+              The <literal>&lt;p&gt;</literal>,
               <literal>&lt;s&gt;</literal>, and
-              <literal>&lt;t&gt;</literal> elements. Either of these
-              rules specifies a primary shift rule for the
-              <literal>'G'</literal> character:
+              <literal>&lt;t&gt;</literal> shift rules define primary,
+              secondary, and tertiary differences of a character from
+              another character:
             </para>
 
-<programlisting>
-&lt;p&gt;G&lt;/p&gt;
-
-&lt;p&gt;\u0047&lt;/p&gt;
-</programlisting>
-
             <itemizedlist>
 
               <listitem>

@@ -9076,13 +9069,24 @@
               </listitem>
 
             </itemizedlist>
+
+            <para>
+              Either of these rules specifies a primary shift rule for
+              the <literal>'G'</literal> character:
+            </para>
+
+<programlisting>
+&lt;p&gt;G&lt;/p&gt;
+
+&lt;p&gt;\u0047&lt;/p&gt;
+</programlisting>
           </listitem>
 
           <listitem>
             <para>
-              Identity rules indicate that one character sorts
-              identically to another. These rules cause
-              <literal>'b'</literal> to sort the same as
+              The <literal>&lt;i&gt;</literal> shift rule indicates that
+              one character sorts identically to another. The following
+              rules cause <literal>'b'</literal> to sort the same as
               <literal>'a'</literal>:
             </para>
 

@@ -9092,9 +9096,9 @@
 </programlisting>
 
             <para>
-              Identity rules are supported as of MySQL 5.5.3. Prior to
-              5.5.3, use <literal>&lt;s&gt; ... &lt;/s&gt;</literal>
-              instead.
+              The <literal>&lt;i&gt;</literal> shift rule is supported
+              as of MySQL 5.5.3. Prior to 5.5.3, use <literal>&lt;s&gt;
+              ... &lt;/s&gt;</literal> instead.
             </para>
           </listitem>
 


Modified: trunk/refman-5.6/globalization.xml
===================================================================
--- trunk/refman-5.6/globalization.xml	2011-05-23 14:32:58 UTC (rev 26310)
+++ trunk/refman-5.6/globalization.xml	2011-05-23 15:26:29 UTC (rev 26311)
Changed blocks: 6, Lines Added: 548, Lines Deleted: 43; 24284 bytes

@@ -9233,41 +9233,54 @@
           specification available at
           <ulink url="http://www.unicode.org/reports/tr35/"/>, which
           should be consulted for further information. MySQL recognizes
-          a large enough subset of the rules that in many cases, it is
+          a large enough subset of the syntax that, in many cases, it is
           possible to download a collation definition from the Unicode
-          Common Locale Data Repository and paste into the
-          <filename>Index.xml</filename> file the relevant part (that
-          is, the part between the <literal>&lt;rules&gt;</literal> and
-          <literal>&lt;/rules&gt;</literal> tags). The rules described
-          here are all supported except that character sorting occurs
-          only at the primary level. Rules that specify differences at
-          secondary or higher sort levels are recognized (and thus can
-          be included in collation definitions) but are treated as
-          equality at the primary level.
+          Common Locale Data Repository and paste the relevant part
+          (that is, the part between the
+          <literal>&lt;rules&gt;</literal> and
+          <literal>&lt;/rules&gt;</literal> tags) into the MySQL
+          <filename>Index.xml</filename> file. The rules described here
+          are all supported except that character sorting occurs only at
+          the primary level. Rules that specify differences at secondary
+          or higher sort levels are recognized (and thus can be included
+          in collation definitions) but are treated as equality at the
+          primary level.
         </para>
 
         <para>
+          The MySQL server generates diagnostics when it finds problems
+          while parsing the <filename>Index.xml</filename> file. See
+          <xref linkend="collation-diagnostics"/>.
+        </para>
+
+        <para>
           <emphasis role="bold">Character Representation</emphasis>
         </para>
 
         <para>
-          Characters named in LDML rules can be written in
+          Characters named in LDML rules can be written literally or in
           <literal>\u<replaceable>nnnn</replaceable></literal> format,
           where <replaceable>nnnn</replaceable> is the hexadecimal
-          Unicode code point value. Within hexadecimal values, the
-          digits <literal>A</literal> through <literal>F</literal> are
-          not case sensitive; <literal>\u00E1</literal> and
-          <literal>\u00e1</literal> are equivalent. Basic Latin letters
-          <literal>A-Z</literal> and <literal>a-z</literal> can also be
-          written literally (this is a MySQL limitation; the LDML
-          specification permits literal non-Latin1 characters in the
-          rules). Only characters in the Basic Multilingual Plane can be
-          specified. This notation does not apply to characters outside
-          the BMP range of <literal>0000</literal> to
-          <literal>FFFF</literal>.
+          Unicode code point value. For example, <literal>A</literal>
+          and <literal>&aacute;</literal> can be written literally or as
+          <literal>\u0041</literal> and <literal>\u00E1</literal>.
+          Within hexadecimal values, the digits <literal>A</literal>
+          through <literal>F</literal> are not case sensitive;
+          <literal>\u00E1</literal> and <literal>\u00e1</literal> are
+          equivalent. For UCA 4.0.0 collations, hexadecimal notation can
+          be used only for characters in the Basic Multilingual Plane,
+          not for characters outside the BMP range of
+          <literal>0000</literal> to <literal>FFFF</literal>. For UCA
+          5.2.0 collations, hexadecimal notation can be used for any
+          character.
         </para>
 
         <para>
+          The <filename>Index.xml</filename> file itself should be
+          written using UTF-8 encoding.
+        </para>
+
+        <para>
           <emphasis role="bold">Syntax Rules</emphasis>
         </para>
 

@@ -9283,12 +9296,13 @@
 
           <listitem>
             <para>
-              A reset rule does not specify any ordering in and of
-              itself. Instead, it <quote>resets</quote> the ordering for
-              subsequent shift rules to cause them to be taken in
-              relation to a given character. Either of these rules
-              resets subsequent shift rules to be taken in relation to
-              the letter <literal>'A'</literal>:
+              A <literal>&lt;reset&gt;</literal> rule does not specify
+              any ordering in and of itself. Instead, it
+              <quote>resets</quote> the ordering for subsequent shift
+              rules to cause them to be taken in relation to a given
+              character. Either of the following rules resets subsequent
+              shift rules to be taken in relation to the letter
+              <literal>'A'</literal>:
             </para>
 
 <programlisting>

@@ -9300,21 +9314,13 @@
 
           <listitem>
             <para>
-              Shift rules define primary, secondary, and tertiary
-              differences of a character from another character. To
-              specify them, use <literal>&lt;p&gt;</literal>,
+              The <literal>&lt;p&gt;</literal>,
               <literal>&lt;s&gt;</literal>, and
-              <literal>&lt;t&gt;</literal> elements. Either of these
-              rules specifies a primary shift rule for the
-              <literal>'G'</literal> character:
+              <literal>&lt;t&gt;</literal> shift rules define primary,
+              secondary, and tertiary differences of a character from
+              another character:
             </para>
 
-<programlisting>
-&lt;p&gt;G&lt;/p&gt;
-
-&lt;p&gt;\u0047&lt;/p&gt;
-</programlisting>
-
             <itemizedlist>
 
               <listitem>

@@ -9338,14 +9344,79 @@
                 </para>
               </listitem>
 
+<!--
+If we add this back in, there is also quaternary material
+in the reset-before and abbreviated-syntax descriptions
+that should be added
+          <listitem>
+            <para>
+              Use quaternary differences to distinguish punctuation in
+              <quote>Shifted</quote> mode.
+            </para>
+          </listitem>
+-->
+
             </itemizedlist>
+
+            <para>
+              Either of these rules specifies a primary shift rule for
+              the <literal>'G'</literal> character:
+            </para>
+
+<programlisting>
+&lt;p&gt;G&lt;/p&gt;
+
+&lt;p&gt;\u0047&lt;/p&gt;
+</programlisting>
           </listitem>
 
           <listitem>
             <para>
-              Identity rules indicate that one character sorts
-              identically to another. These rules cause
-              <literal>'b'</literal> to sort the same as
+              Reset rules permit a <literal>before</literal> attribute.
+              Normally, shift rules after a reset rule indicate
+              characters that sort after the reset character. Shift
+              rules after a reset rule that has the
+              <literal>before</literal> attribute indicate characters
+              that sort before the reset character. The following rules
+              put the character <literal>'b'</literal> immediately
+              before <literal>'a'</literal> at the primary level:
+            </para>
+
+<programlisting>
+&lt;reset before="primary"&gt;a&lt;/reset&gt;
+&lt;p&gt;b&lt;/p&gt;
+</programlisting>
+
+            <para>
+              Permissible <literal>before</literal> attribute values
+              specify the sort level by name or the equivalent numeric
+              value:
+            </para>
+
+<programlisting>
+&lt;reset before="primary"&gt;
+&lt;reset before="1"&gt;
+
+&lt;reset before="secondary"&gt;
+&lt;reset before="2"&gt;
+
+&lt;reset before="tertiary"&gt;
+&lt;reset before="3"&gt;
+</programlisting>
+
+<!--
+<programlisting>
+&lt;reset before="quaternary"&gt;
+&lt;reset before="4"&gt;
+</programlisting>
+-->
+          </listitem>
+
+          <listitem>
+            <para>
+              The <literal>&lt;i&gt;</literal> shift rule indicates that
+              one character sorts identically to another. The following
+              rules cause <literal>'b'</literal> to sort the same as
               <literal>'a'</literal>:
             </para>
 

@@ -9355,6 +9426,364 @@
 </programlisting>
           </listitem>
 
+          <listitem>
+            <para>
+              Abbreviated shift syntax specifies multiple shift rules
+              using a single pair of tags. The following table shows the
+              correspondence between abbreviated syntax rules and the
+              equivalent nonabbreviated rules.
+            </para>
+
+            <table>
+              <title>Abbreviated Shift Syntax</title>
+              <tgroup cols="2">
+                <colspec colwidth="40*"/>
+                <colspec colwidth="60*"/>
+                <thead>
+                  <row>
+                    <entry>Abbreviated Syntax</entry>
+                    <entry>Nonabbreviated Syntax</entry>
+                  </row>
+                </thead>
+                <tbody>
+                  <row>
+                    <entry><literal>&lt;pc&gt;xyz&lt;/pc&gt;</literal></entry>
+                    <entry><literal>&lt;p&gt;x&lt;/p&gt;&lt;p&gt;y&lt;/p&gt;&lt;p&gt;z&lt;/p&gt;</literal></entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;sc&gt;xyz&lt;/sc&gt;</literal></entry>
+                    <entry><literal>&lt;s&gt;x&lt;/s&gt;&lt;s&gt;y&lt;/s&gt;&lt;s&gt;z&lt;/s&gt;</literal></entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;tc&gt;xyz&lt;/tc&gt;</literal></entry>
+                    <entry><literal>&lt;t&gt;x&lt;/t&gt;&lt;t&gt;y&lt;/t&gt;&lt;t&gt;z&lt;/t&gt;</literal></entry>
+                  </row>
+<!--
+          <row>
+            <entry><literal>&lt;qc&gt;xyz&lt;/qc&gt;</literal></entry>
+            <entry><literal>&lt;q&gt;x&lt;/q&gt;&lt;q&gt;y&lt;/q&gt;&lt;q&gt;z&lt;/q&gt;</literal></entry>
+          </row>
+-->
+                  <row>
+                    <entry><literal>&lt;ic&gt;xyz&lt;/ic&gt;</literal></entry>
+                    <entry><literal>&lt;i&gt;x&lt;/i&gt;&lt;i&gt;y&lt;/i&gt;&lt;i&gt;z&lt;/i&gt;</literal></entry>
+                  </row>
+                </tbody>
+              </tgroup>
+            </table>
+          </listitem>
+
+          <listitem>
+            <para>
+              MySQL supports expansions 2 to 6 characters long. An
+              expansion is a reset rule that establishes an anchor point
+              for a multiple-character sequence. The following rules put
+              <literal>'z'</literal> greater at the primary level than
+              the sequence of three characters <literal>'abc'</literal>:
+            </para>
+
+<programlisting>
+&lt;reset&gt;abc&lt;/reset&gt;
+&lt;p&gt;z&lt;/p&gt;
+</programlisting>
+          </listitem>
+
+          <listitem>
+            <para>
+              MySQL supports contractions 2 to 6 characters long. A
+              contraction is a shift rule that sorts a
+              multiple-character sequence. The following rules put the
+              sequence of three characters <literal>'xyz'</literal>
+              greater at the primary level than <literal>'a'</literal>:
+            </para>
+
+<programlisting>
+&lt;reset&gt;a&lt;/reset&gt;
+&lt;p&gt;xyz&lt;/p&gt;
+</programlisting>
+          </listitem>
+
+          <listitem>
+            <para>
+              Long expansions and long contractions can be used
+              together. These rules put the sequence of three characters
+              <literal>'xyz'</literal> greater at the primary level than
+              the sequence of three characters <literal>'abc'</literal>:
+            </para>
+
+<programlisting>
+&lt;reset&gt;abc&lt;/reset&gt;
+&lt;p&gt;xyz&lt;/p&gt;
+</programlisting>
+          </listitem>
+
+          <listitem>
+            <para>
+              Normal expansion syntax uses <literal>&lt;x&gt;</literal>
+              plus <literal>&lt;extend&gt;</literal> elements to specify
+              an expansion. The following rules put the character
+              <literal>'k'</literal> greater at the secondary level than
+              the sequence <literal>'ch'</literal>. That is,
+              <literal>'k'</literal> behaves as if it expands to a
+              character after <literal>'c'</literal> followed by
+              <literal>'h'</literal>:
+            </para>
+
+<programlisting>
+&lt;reset&gt;c&lt;/reset&gt;
+&lt;x&gt;&lt;s&gt;k&lt;/s&gt;&lt;extend&gt;h&lt;/extend&gt;&lt;/x&gt;
+</programlisting>
+
+            <para>
+              This syntax permits long sequences. These rules sort the
+              sequence <literal>'ccs'</literal> greater at the tertiary
+              level than the sequence <literal>'cscs'</literal>:
+            </para>
+
+<programlisting>
+&lt;reset&gt;cs&lt;/reset&gt;
+&lt;x&gt;&lt;t&gt;ccs&lt;/t&gt;&lt;extend&gt;cs&lt;/extend&gt;&lt;/x&gt;
+</programlisting>
+
+            <para>
+              The LDML specification describes normal expansion syntax
+              as <quote>tricky.</quote> See that specification for
+              details.
+            </para>
+          </listitem>
+
+          <listitem>
+            <para>
+              Previous context syntax uses <literal>&lt;x&gt;</literal>
+              plus <literal>&lt;context&gt;</literal> elements to
+              specify that the context before a character affects how it
+              sorts. The following rules put <literal>'-'</literal>
+              greater at the secondary level than
+              <literal>'a'</literal>, but only when
+              <literal>'-'</literal> goes after <literal>'b'</literal>:
+            </para>
+
+<programlisting>
+&lt;reset&gt;a&lt;/reset&gt;
+&lt;x&gt;&lt;context&gt;b&lt;/context&gt;&lt;s&gt;-&lt;/s&gt;&lt;/x&gt;
+</programlisting>
+          </listitem>
+
+          <listitem>
+            <para>
+              Previous context syntax can include the
+              <literal>&lt;extend&gt;</literal> element. These rules put
+              <literal>'def'</literal> greater at the primary level than
+              <literal>'aghi'</literal>, but only when
+              <literal>'def'</literal> comes after
+              <literal>'abc'</literal>:
+            </para>
+
+<programlisting>
+&lt;reset&gt;a&lt;/reset&gt;
+&lt;x&gt;&lt;context&gt;abc&lt;/context&gt;&lt;p&gt;def&lt;/p&gt;&lt;extend&gt;ghi&lt;/extend&gt;&lt;/x&gt;
+</programlisting>
+          </listitem>
+
+          <listitem>
+            <para>
+              A reset rule can name a logical reset position rather than
+              a literal character:
+            </para>
+
+<programlisting>
+&lt;first_tertiary_ignorable/&gt;
+&lt;last_tertiary_ignorable/&gt;
+&lt;first_secondary_ignorable/&gt;
+&lt;last_secondary_ignorable/&gt;
+&lt;first_primary_ignorable/&gt;
+&lt;last_primary_ignorable/&gt;
+&lt;first_variable/&gt;
+&lt;last_variable/&gt;
+&lt;first_non_ignorable/&gt;
+&lt;last_non_ignorable/&gt;
+&lt;first_trailing/&gt;
+&lt;last_trailing/&gt;
+</programlisting>
+
+            <para>
+              These rules put <literal>'z'</literal> greater at the
+              primary level than nonignorable characters that have a
+              Default Unicode Collation Element Table (DUCET) entry and
+              that are not CJK:
+            </para>
+
+<programlisting>
+&lt;reset&gt;&lt;last_non_ignorable/&gt;&lt;/reset&gt;
+&lt;p&gt;z&lt;/p&gt;
+</programlisting>
+
+            <para>
+              Logical positions have the code points shown in the
+              following table.
+            </para>
+
+            <table>
+              <title>Logical Reset Position Code Points</title>
+              <tgroup cols="3">
+                <colspec colwidth="40*"/>
+                <colspec colwidth="30*"/>
+                <colspec colwidth="30*"/>
+                <thead>
+                  <row>
+                    <entry>Logical Position</entry>
+                    <entry>Unicode 4.0.0 Code Point</entry>
+                    <entry>Unicode 5.2.0 Code Point</entry>
+                  </row>
+                </thead>
+                <tbody>
+                  <row>
+                    <entry><literal>&lt;first_non_ignorable/&gt;</literal></entry>
+                    <entry>U+02D0</entry>
+                    <entry>U+02D0</entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;last_non_ignorable/&gt;</literal></entry>
+                    <entry>U+A48C</entry>
+                    <entry>U+1342E</entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;first_primary_ignorable/&gt;</literal></entry>
+                    <entry>U+0332</entry>
+                    <entry>U+0332</entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;last_primary_ignorable/&gt;</literal></entry>
+                    <entry>U+20EA</entry>
+                    <entry>U+101FD</entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;first_secondary_ignorable/&gt;</literal></entry>
+                    <entry>U+0000</entry>
+                    <entry>U+0000</entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;last_secondary_ignorable/&gt;</literal></entry>
+                    <entry>U+FE73</entry>
+                    <entry>U+FE73</entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;first_tertiary_ignorable/&gt;</literal></entry>
+                    <entry>U+0000</entry>
+                    <entry>U+0000</entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;last_tertiary_ignorable/&gt;</literal></entry>
+                    <entry>U+FE73</entry>
+                    <entry>U+FE73</entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;first_trailing/&gt;</literal></entry>
+                    <entry>U+0000</entry>
+                    <entry>U+0000</entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;last_trailing/&gt;</literal></entry>
+                    <entry>U+0000</entry>
+                    <entry>U+0000</entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;first_variable/&gt;</literal></entry>
+                    <entry>U+0009</entry>
+                    <entry>U+0009</entry>
+                  </row>
+                  <row>
+                    <entry><literal>&lt;last_variable/&gt;</literal></entry>
+                    <entry>U+2183</entry>
+                    <entry>U+1D371</entry>
+                  </row>
+                </tbody>
+              </tgroup>
+            </table>
+          </listitem>
+
+          <listitem>
+            <para>
+              The <literal>&lt;collation&gt;</literal> element permits a
+              <literal>shift-after-method</literal> attribute that
+              affects character weight calculation for shift rules. The
+              attribute has these permitted values:
+            </para>
+
+            <itemizedlist>
+
+              <listitem>
+                <para>
+                  <literal>simple</literal>: Calculate character weights
+                  as for reset rules that do not have a
+                  <literal>before</literal> attribute. This is the
+                  default.
+                </para>
+              </listitem>
+
+              <listitem>
+                <para>
+                  <literal>expand</literal>: Use expansions for shifts
+                  after reset rules.
+                </para>
+              </listitem>
+
+            </itemizedlist>
+
+            <para>
+              Suppose that <literal>'0'</literal> and
+              <literal>'1'</literal> have weights of
+              <literal>0E29</literal> and <literal>0E2A</literal> and we
+              want to put all basic Latin letters between
+              <literal>'0'</literal> and <literal>'1'</literal>:
+            </para>
+
+<programlisting>
+&lt;reset&gt;0&lt;/reset&gt;
+&lt;pc&gt;abcdefghijklmnopqrstuvwxyz&lt;/pc&gt;
+</programlisting>
+
+            <para>
+              For simple shift mode, weights are calculated as follows:
+            </para>
+
+<programlisting>
+'a' has weight 0E29+1
+'b' has weight 0E29+2
+'c' has weight 0E29+3
+...
+</programlisting>
+
+            <para>
+              However, <literal>0E29</literal> and
+              <literal>0E2A</literal> are adjacent weights, so there are
+              not enough vacant positions to put 26 characters between
+              <literal>'0'</literal> and <literal>'1'</literal>. The
+              result is that digits and letters are intermixed.
+            </para>
+
+            <para>
+              To solve this, use
+              <literal>shift-after-method="expand"</literal>. Then
+              weights are calculated like this:
+            </para>
+
+<programlisting>
+'a' has weight [0E29][233D+1]
+'b' has weight [0E29][233D+2]
+'c' has weight [0E29][233D+3]
+...
+</programlisting>
+
+            <para>
+              <literal>233D</literal> is the UCA 4.0.0 weight for
+              character <literal>0xA48C</literal>, which is the last
+              nonignorable character (in effect, the greatest character
+              in the collation, excluding CJK). UCA 5.2.0 is similar but
+              uses <literal>3ACA</literal>, for character
+              <literal>0x1342E</literal>.
+            </para>
+          </listitem>
+
         </itemizedlist>
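+
+        <para>
+          As a sketch of how the
+          <literal>shift-after-method</literal> attribute described
+          above fits into a complete definition, the rules from the
+          example could be placed in <filename>Index.xml</filename>
+          roughly as follows. The collation name and the
+          <literal>id</literal> value are placeholders chosen for
+          illustration, not values taken from a real installation:
+        </para>
+
+<programlisting>
+&lt;!-- id="253" is a placeholder; use an ID that is unused on your server --&gt;
+&lt;charset name="utf8"&gt;
+  &lt;collation name="utf8_test_ci" id="253" shift-after-method="expand"&gt;
+    &lt;rules&gt;
+      &lt;reset&gt;0&lt;/reset&gt;
+      &lt;pc&gt;abcdefghijklmnopqrstuvwxyz&lt;/pc&gt;
+    &lt;/rules&gt;
+  &lt;/collation&gt;
+&lt;/charset&gt;
+</programlisting>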
 
         <para>

@@ -9382,6 +9811,82 @@
 
       </section>
 
+      <section id="collation-diagnostics">
+
+        <title>Diagnostics During <filename>Index.xml</filename> Parsing</title>
+
+        <para>
+          The MySQL server generates diagnostics when it finds problems
+          while parsing the <filename>Index.xml</filename> file:
+        </para>
+
+        <itemizedlist>
+
+          <listitem>
+            <para>
+              Unknown tags are written to the error log. For example,
+              the following message results if a collation definition
+              contains a <literal>&lt;aaa&gt;</literal> tag:
+            </para>
+
+<programlisting>
+[Warning] Buffered warning: Unknown LDML tag:
+'charsets/charset/collation/rules/aaa'
+</programlisting>
+          </listitem>
+
+          <listitem>
+            <para>
+              Problems with collations generate warnings that clients
+              can display with <literal role="stmt">SHOW
+              WARNINGS</literal>. Suppose that a reset rule contains an
+              expansion longer than the maximum supported length of 6
+              characters:
+            </para>
+
+<programlisting>
+&lt;reset&gt;abcdefghi&lt;/reset&gt;
+&lt;i&gt;x&lt;/i&gt;
+</programlisting>
+
+            <para>
+              An attempt to use the collation produces an error, and
+              <literal role="stmt">SHOW WARNINGS</literal> displays the
+              underlying problem:
+            </para>
+
+<programlisting>
+mysql&gt; <userinput>SELECT _utf8'test' COLLATE utf8_test_ci;</userinput>
+ERROR 1273 (HY000): Unknown collation: 'utf8_test_ci'
+mysql&gt; <userinput>SHOW WARNINGS;</userinput>
++---------+------+---------------------------------------+
+| Level   | Code | Message                               |
++---------+------+---------------------------------------+
+| Error   | 1273 | Unknown collation: 'utf8_test_ci'     |
+| Warning | 1273 | Expansion is too long at 'hi&lt;/reset&gt;' |
++---------+------+---------------------------------------+
+</programlisting>
+          </listitem>
+
+          <listitem>
+            <para>
+              If collation initialization is not possible, the server
+              reports an <quote>Unknown collation</quote> error, and
+              also generates warnings explaining the problems, such as
+              in the previous example. In other cases, when a collation
+              description is generally correct but contains some unknown
+              tags, the collation is initialized and is available for
+              use. The unknown parts are ignored, but a warning is
+              generated in the error log.
+            </para>
+          </listitem>
+
+        </itemizedlist>
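+
+        <para>
+          To illustrate the unknown-tag case, the following sketch
+          places an unrecognized <literal>&lt;aaa&gt;</literal> element
+          within otherwise valid rules. The <literal>id</literal> value
+          is a placeholder. Given the behavior described above, such a
+          collation would be expected to initialize and be usable, with
+          the unknown element reported in the error log rather than to
+          the client:
+        </para>
+
+<programlisting>
+&lt;!-- id="253" is a placeholder; use an ID that is unused on your server --&gt;
+&lt;charsets&gt;
+  &lt;charset name="utf8"&gt;
+    &lt;collation name="utf8_test_ci" id="253"&gt;
+      &lt;rules&gt;
+        &lt;aaa/&gt;
+        &lt;reset&gt;a&lt;/reset&gt;
+        &lt;i&gt;b&lt;/i&gt;
+      &lt;/rules&gt;
+    &lt;/collation&gt;
+  &lt;/charset&gt;
+&lt;/charsets&gt;
+</programlisting>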
+
+      </section>
+
     </section>
 
   </section>


Modified: trunk/refman-6.0/globalization.xml
===================================================================
--- trunk/refman-6.0/globalization.xml	2011-05-23 14:32:58 UTC (rev 26310)
+++ trunk/refman-6.0/globalization.xml	2011-05-23 15:26:29 UTC (rev 26311)
Changed blocks: 4, Lines Added: 28, Lines Deleted: 24; 3355 bytes

@@ -9525,12 +9525,13 @@
 
           <listitem>
             <para>
-              A reset rule does not specify any ordering in and of
-              itself. Instead, it <quote>resets</quote> the ordering for
-              subsequent shift rules to cause them to be taken in
-              relation to a given character. Either of these rules
-              resets subsequent shift rules to be taken in relation to
-              the letter <literal>'A'</literal>:
+              A <literal>&lt;reset&gt;</literal> rule does not specify
+              any ordering in and of itself. Instead, it
+              <quote>resets</quote> the ordering for subsequent shift
+              rules to cause them to be taken in relation to a given
+              character. Either of the following rules resets subsequent
+              shift rules to be taken in relation to the letter
+              <literal>'A'</literal>:
             </para>
 
 <programlisting>

@@ -9542,21 +9543,13 @@
 
           <listitem>
             <para>
-              Shift rules define primary, secondary, and tertiary
-              differences of a character from another character. To
-              specify them, use <literal>&lt;p&gt;</literal>,
+              The <literal>&lt;p&gt;</literal>,
               <literal>&lt;s&gt;</literal>, and
-              <literal>&lt;t&gt;</literal> elements. Either of these
-              rules specifies a primary shift rule for the
-              <literal>'G'</literal> character:
+              <literal>&lt;t&gt;</literal> shift rules define primary,
+              secondary, and tertiary differences of a character from
+              another character:
             </para>
 
-<programlisting>
-&lt;p&gt;G&lt;/p&gt;
-
-&lt;p&gt;\u0047&lt;/p&gt;
-</programlisting>
-
             <itemizedlist>
 
               <listitem>

@@ -9581,13 +9574,24 @@
               </listitem>
 
             </itemizedlist>
+
+            <para>
+              Either of these rules specifies a primary shift rule for
+              the <literal>'G'</literal> character:
+            </para>
+
+<programlisting>
+&lt;p&gt;G&lt;/p&gt;
+
+&lt;p&gt;\u0047&lt;/p&gt;
+</programlisting>
           </listitem>
 
           <listitem>
             <para>
-              Identity rules indicate that one character sorts
-              identically to another. These rules cause
-              <literal>'b'</literal> to sort the same as
+              The <literal>&lt;i&gt;</literal> shift rule indicates that
+              one character sorts identically to another. The following
+              rules cause <literal>'b'</literal> to sort the same as
               <literal>'a'</literal>:
             </para>
 

@@ -9597,9 +9601,9 @@
 </programlisting>
 
             <para>
-              Identity rules are supported as of MySQL 6.0.9. Prior to
-              6.0.9, use <literal>&lt;s&gt; ... &lt;/s&gt;</literal>
-              instead.
+              The <literal>&lt;i&gt;</literal> shift rule is supported
+              as of MySQL 6.0.9. Prior to 6.0.9, use <literal>&lt;s&gt;
+              ... &lt;/s&gt;</literal> instead.
             </para>
           </listitem>
 

