Author: paul
Date: 2006-04-10 17:41:01 +0200 (Mon, 10 Apr 2006)
New Revision: 1775
Log:
r9334@frost: paul | 2006-04-10 10:39:28 -0500
Document database/table identifier encoding in filesystem. (WL#1324)
Modified:
trunk/
trunk/refman-4.1/language-structure.xml
trunk/refman-5.0/language-structure.xml
trunk/refman-5.1/installing.xml
trunk/refman-5.1/language-structure.xml
trunk/refman-5.1/sql-syntax.xml
trunk/refman-5.1/storage-engines.xml
trunk/refman-common/news-5.1.xml
trunk/refman-common/titles.en.ent
Property changes on: trunk
___________________________________________________________________
Name: svk:merge
- b5ec3a16-e900-0410-9ad2-d183a3acac99:/mysqldoc-local/mysqldoc/trunk:9331
bf112a9c-6c03-0410-a055-ad865cd57414:/mysqldoc-local/mysqldoc/trunk:4334
+ b5ec3a16-e900-0410-9ad2-d183a3acac99:/mysqldoc-local/mysqldoc/trunk:9334
bf112a9c-6c03-0410-a055-ad865cd57414:/mysqldoc-local/mysqldoc/trunk:4334
Modified: trunk/refman-4.1/language-structure.xml
===================================================================
--- trunk/refman-4.1/language-structure.xml 2006-04-10 15:29:51 UTC (rev 1774)
+++ trunk/refman-4.1/language-structure.xml 2006-04-10 15:41:01 UTC (rev 1775)
@@ -782,64 +782,94 @@
</para>
<para>
- The following table describes the maximum length and allowable
- characters for each type of identifier.
+ The following table describes the maximum length for each type of
+ identifier.
</para>
<informaltable>
<tgroup cols="3">
<colspec colwidth="15*"/>
<colspec colwidth="15*"/>
- <colspec colwidth="70*"/>
<tbody>
<row>
<entry><emphasis role="bold">Identifier</emphasis></entry>
- <entry><emphasis role="bold">Maximum Length (bytes)</emphasis></entry>
- <entry><emphasis role="bold">Allowed Characters</emphasis></entry>
+ <entry><emphasis role="bold">Maximum Length</emphasis></entry>
</row>
<row>
<entry>Database</entry>
<entry>64</entry>
- <entry>Any character that is allowed in a directory name, except
- ‘<literal>/</literal>’,
- ‘<literal>\</literal>’, or
- ‘<literal>.</literal>’</entry>
</row>
<row>
<entry>Table</entry>
<entry>64</entry>
- <entry>Any character that is allowed in a filename, except
- ‘<literal>/</literal>’,
- ‘<literal>\</literal>’, or
- ‘<literal>.</literal>’</entry>
</row>
<row>
<entry>Column</entry>
<entry>64</entry>
- <entry>All characters</entry>
</row>
<row>
<entry>Index</entry>
<entry>64</entry>
- <entry>All characters</entry>
</row>
<row>
<entry>Alias</entry>
<entry>255</entry>
- <entry>All characters</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
- In addition to the restrictions noted in the table, no identifier
- can contain ASCII 0 or a byte with a value of 255. Database,
- table, and column names should not end with space characters.
- Before MySQL 4.1, identifier quote characters should not be used
- in identifiers.
+ There are some restrictions on the characters that may appear in
+ identifiers:
</para>
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ No identifier can contain ASCII 0 (<literal>0x00</literal>) or
+ a byte with a value of 255.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Before MySQL 4.1, identifier quote characters should not be
+ used in identifiers. As of 4.1, the use of identifier quote
+ characters in identifiers is permitted, although it is best to
+ avoid doing so if possible.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Database, table, and column names should not end with space
+ characters.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Database names cannot contain
+ ‘<literal>/</literal>’,
+ ‘<literal>\</literal>’,
+ ‘<literal>.</literal>’, or characters that are not
+ allowed in a directory name.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Table names cannot contain ‘<literal>/</literal>’,
+ ‘<literal>\</literal>’,
+ ‘<literal>.</literal>’, or characters that are not
+ allowed in a filename.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
<para>
Beginning with MySQL 4.1, identifiers are stored using Unicode
(UTF-8). This applies to identifiers in table definitions that
Modified: trunk/refman-5.0/language-structure.xml
===================================================================
--- trunk/refman-5.0/language-structure.xml 2006-04-10 15:29:51 UTC (rev 1774)
+++ trunk/refman-5.0/language-structure.xml 2006-04-10 15:41:01 UTC (rev 1775)
@@ -808,64 +808,92 @@
</para>
<para>
- The following table describes the maximum length and allowable
- characters for each type of identifier.
+ The following table describes the maximum length for each type of
+ identifier.
</para>
<informaltable>
<tgroup cols="3">
<colspec colwidth="15*"/>
<colspec colwidth="15*"/>
- <colspec colwidth="70*"/>
<tbody>
<row>
<entry><emphasis role="bold">Identifier</emphasis></entry>
- <entry><emphasis role="bold">Maximum Length (bytes)</emphasis></entry>
- <entry><emphasis role="bold">Allowed Characters</emphasis></entry>
+ <entry><emphasis role="bold">Maximum Length</emphasis></entry>
</row>
<row>
<entry>Database</entry>
<entry>64</entry>
- <entry>Any character that is allowed in a directory name, except
- ‘<literal>/</literal>’,
- ‘<literal>\</literal>’, or
- ‘<literal>.</literal>’</entry>
</row>
<row>
<entry>Table</entry>
<entry>64</entry>
- <entry>Any character that is allowed in a filename, except
- ‘<literal>/</literal>’,
- ‘<literal>\</literal>’, or
- ‘<literal>.</literal>’</entry>
</row>
<row>
<entry>Column</entry>
<entry>64</entry>
- <entry>All characters</entry>
</row>
<row>
<entry>Index</entry>
<entry>64</entry>
- <entry>All characters</entry>
</row>
<row>
<entry>Alias</entry>
<entry>255</entry>
- <entry>All characters</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
- In addition to the restrictions noted in the table, no identifier
- can contain ASCII 0 or a byte with a value of 255. Database,
- table, and column names should not end with space characters. The
- use of identifier quote characters in identifiers is permitted,
- although it is best to avoid doing so if possible.
+ There are some restrictions on the characters that may appear in
+ identifiers:
</para>
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ No identifier can contain ASCII 0 (<literal>0x00</literal>) or
+ a byte with a value of 255.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The use of identifier quote characters in identifiers is
+ permitted, although it is best to avoid doing so if possible.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Database, table, and column names should not end with space
+ characters.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Database names cannot contain
+ ‘<literal>/</literal>’,
+ ‘<literal>\</literal>’,
+ ‘<literal>.</literal>’, or characters that are not
+ allowed in a directory name.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Table names cannot contain ‘<literal>/</literal>’,
+ ‘<literal>\</literal>’,
+ ‘<literal>.</literal>’, or characters that are not
+ allowed in a filename.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
<para>
Identifiers are stored using Unicode (UTF-8). This applies to
identifiers in table definitions that are stored in
Modified: trunk/refman-5.1/installing.xml
===================================================================
--- trunk/refman-5.1/installing.xml 2006-04-10 15:29:51 UTC (rev 1774)
+++ trunk/refman-5.1/installing.xml 2006-04-10 15:41:01 UTC (rev 1775)
@@ -11914,6 +11914,19 @@
</para>
</listitem>
+ <listitem>
+ <para>
+ As of MySQL 5.1.6, special characters in database and table
+ identifiers are encoded when creating the corresponding
+ directory names and filenames. This relaxes the restrictions
+ on the characters that can appear in identifiers. See
+ <xref linkend="identifier-mapping"/>. If any database or
+ table names contain special characters, running
+ <command>mysql_upgrade</command> updates those names to the
+ new format.
+ </para>
+ </listitem>
+
</itemizedlist>
<remark>
Modified: trunk/refman-5.1/language-structure.xml
===================================================================
--- trunk/refman-5.1/language-structure.xml 2006-04-10 15:29:51 UTC (rev 1774)
+++ trunk/refman-5.1/language-structure.xml 2006-04-10 15:41:01 UTC (rev 1775)
@@ -808,65 +808,100 @@
</para>
<para>
- The following table describes the maximum length and allowable
- characters for each type of identifier.
+ The following table describes the maximum length for each type of
+ identifier.
</para>
<informaltable>
<tgroup cols="3">
<colspec colwidth="15*"/>
<colspec colwidth="15*"/>
- <colspec colwidth="70*"/>
<tbody>
<row>
<entry><emphasis role="bold">Identifier</emphasis></entry>
- <entry><emphasis role="bold">Maximum Length (bytes)</emphasis></entry>
- <entry><emphasis role="bold">Allowed Characters</emphasis></entry>
+ <entry><emphasis role="bold">Maximum Length</emphasis></entry>
</row>
<row>
<entry>Database</entry>
<entry>64</entry>
- <entry>Any character that is allowed in a directory name, except
- ‘<literal>/</literal>’,
- ‘<literal>\</literal>’, or
- ‘<literal>.</literal>’</entry>
</row>
<row>
<entry>Table</entry>
<entry>64</entry>
- <entry>Any character that is allowed in a filename, except
- ‘<literal>/</literal>’,
- ‘<literal>\</literal>’, or
- ‘<literal>.</literal>’</entry>
</row>
<row>
<entry>Column</entry>
<entry>64</entry>
- <entry>All characters</entry>
</row>
<row>
<entry>Index</entry>
<entry>64</entry>
- <entry>All characters</entry>
</row>
<row>
<entry>Alias</entry>
<entry>255</entry>
- <entry>All characters</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>
- In addition to the restrictions noted in the table, no identifier
- can contain ASCII 0 or a byte with a value of 255. Database,
- table, and column names should not end with space characters. The
- use of identifier quote characters in identifiers is permitted,
- although it is best to avoid doing so if possible.
+ There are some restrictions on the characters that may appear in
+ identifiers:
</para>
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ No identifier can contain ASCII 0 (<literal>0x00</literal>) or
+ a byte with a value of 255.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The use of identifier quote characters in identifiers is
+ permitted, although it is best to avoid doing so if possible.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Database, table, and column names should not end with space
+ characters.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Before MySQL 5.1.6, database names cannot contain
+ ‘<literal>/</literal>’,
+ ‘<literal>\</literal>’,
+ ‘<literal>.</literal>’, or characters that are not
+ allowed in a directory name.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Before MySQL 5.1.6, table names cannot contain
+ ‘<literal>/</literal>’,
+ ‘<literal>\</literal>’,
+ ‘<literal>.</literal>’, or characters that are not
+ allowed in a filename.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
<para>
+ As of MySQL 5.1.6, special characters in database and table names
+ are encoded in the corresponding filesystem names as described in
+ <xref linkend="identifier-mapping"/>.
+ </para>
+
+ <para>
Identifiers are stored using Unicode (UTF-8). This applies to
identifiers in table definitions that are stored in
<filename>.frm</filename> files and to identifiers stored in the
@@ -1289,6 +1324,111 @@
</section>
+ <section id="identifier-mapping">
+
+ <title>&title-identifier-mapping;</title>
+
+ <para>
+ There is a correspondence between database and table identifiers
+ and names in the filesystem. MySQL represents each database as a
+ directory in the data directory, and each table by one or more
+ files in the appropriate database directory.
+ </para>
+
+ <para>
+ Before MySQL 5.1.6, there are some limitations on the characters
+ that can be used in identifiers for database objects that
+ correspond to filesystem objects. For example, pathname
+ separator characters are disallowed, and
+ ‘<literal>.</literal>’ is disallowed because it
+ begins the extension for table files.
+ </para>
+
+ <para>
+ As of MySQL 5.1.6, any character is legal in database or table
+ identifiers except ASCII NUL (<literal>0x00</literal>). MySQL
+ encodes any characters that are problematic in the corresponding
+ filesystem objects when it creates database directories or table
+ files:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Basic Latin letters (<literal>a..zA..Z</literal>) and digits
+ (<literal>0..9</literal>) are encoded as is. Consequently,
+ their case sensitivity directly depends on filesystem
+ features.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ All other national letters from alphabets that have
+ uppercase/lowercase mapping are encoded as follows:
+ </para>
+
+<programlisting>
+Code range  Pattern          Number     Used  Unused  Blocks
+-----------------------------------------------------------------------------
+00C0..017F  [@][0..4][g..z]  5*20=100     97       3  Latin1 Supplement + Ext A
+0370..03FF  [@][5..9][g..z]  5*20=100     88      12  Greek + Coptic
+0400..052F  [@][g..z][0..6]  20*7=140    137       3  Cyrillic
+0530..058F  [@][g..z][7..8]  20*2=40      38       2  Armenian
+2160..217F  [@][g..z][9]     20*1=20      16       4  Number Forms
+0180..02AF  [@][g..z][a..k]  20*11=220   203      17  Latin Ext B + IPA
+1E00..1EFF  [@][g..z][l..r]  20*7=140    136       4  Latin Additional Extended
+1F00..1FFF  [@][g..z][s..z]  20*8=160    144      16  Greek Extended
+.... ....   [@][a..f][g..z]  6*20=120      0     120  RESERVED
+24B6..24E9  [@][@][a..z]     26           26       0  Enclosed Alphanumerics
+FF21..FF5A  [@][a..z][@]     26           26       0  Full Width forms
+</programlisting>
+
+ <para>
+ One of the bytes in the sequence encodes lettercase. For
+ example: <literal>LATIN CAPITAL LETTER A WITH
+ GRAVE</literal> is encoded as <literal>@0G</literal>,
+ whereas <literal>LATIN SMALL LETTER A WITH GRAVE</literal>
+ is encoded as <literal>@0g</literal>. Here the third byte
+ (<literal>G</literal> or <literal>g</literal>) indicates
+ lettercase. (On a case-insensitive filesystem, both letters
+ will be treated as the same.)
+ </para>
+
+ <para>
+ For some blocks, such as Cyrillic, the second byte
+ determines lettercase. For other blocks, such as Latin1
+ Supplement, the third byte determines lettercase. If two
+ bytes in the sequence are letters (as in Greek Extended),
+ the leftmost letter character stands for lettercase. All
+ other letter bytes must be in lowercase.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ All non-letter characters, as well as letters from alphabets
+ that do not have uppercase/lowercase mapping (such as Hebrew),
+ are encoded in hexadecimal, with lowercase letters for the
+ hex digits <literal>a..f</literal>:
+ </para>
+
+<programlisting>
+0x003F -> @003f
+0xFFFF -> @ffff
+</programlisting>
+
+ <para>
+ The hexadecimal values correspond to character values in
+ the <literal>ucs2</literal> double-byte character set.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ </section>
+
</section>
<section id="user-variables">
Modified: trunk/refman-5.1/sql-syntax.xml
===================================================================
--- trunk/refman-5.1/sql-syntax.xml 2006-04-10 15:29:51 UTC (rev 1774)
+++ trunk/refman-5.1/sql-syntax.xml 2006-04-10 15:41:01 UTC (rev 1775)
@@ -1436,7 +1436,10 @@
<literal>CREATE DATABASE</literal> statement creates only a
directory under the MySQL data directory and the
<filename>db.opt</filename> file. Rules for allowable database
- names are given in <xref linkend="legal-names"/>.
+ names are given in <xref linkend="legal-names"/>. If a database
+ name contains special characters, the name for the database
+ directory contains encoded versions of those characters as
+ described in <xref linkend="identifier-mapping"/>.
</para>
<para>
@@ -1947,7 +1950,10 @@
<para>
<xref linkend="storage-engines"/>, describes what files each
- storage engine creates to represent tables.
+ storage engine creates to represent tables. If a table name
+ contains special characters, the names for the table files
+ contain encoded versions of those characters as described in
+ <xref linkend="identifier-mapping"/>.
</para>
<para>
Modified: trunk/refman-5.1/storage-engines.xml
===================================================================
--- trunk/refman-5.1/storage-engines.xml 2006-04-10 15:29:51 UTC (rev 1774)
+++ trunk/refman-5.1/storage-engines.xml 2006-04-10 15:41:01 UTC (rev 1775)
@@ -273,7 +273,9 @@
storage engine. The server creates the <filename>.frm</filename>
file above the storage engine level. Individual storage engines
create any additional files required for the tables that they
- manage.
+ manage. If a table name contains special characters, the names for
+ the table files contain encoded versions of those characters as
+ described in <xref linkend="identifier-mapping"/>.
</para>
<para>
Modified: trunk/refman-common/news-5.1.xml
===================================================================
--- trunk/refman-common/news-5.1.xml 2006-04-10 15:29:51 UTC (rev 1774)
+++ trunk/refman-common/news-5.1.xml 2006-04-10 15:41:01 UTC (rev 1775)
@@ -2681,6 +2681,16 @@
<listitem>
<para>
+ Special characters in database and table identifiers are now
+ encoded when creating the corresponding directory names and
+ filenames. This relaxes the restrictions on the characters
+ that can appear in identifiers. See
+ <xref linkend="identifier-mapping"/>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
Queries against partitioned tables can now take advantage of
partition pruning. In some cases, this can result in query
execution that is an order of magnitude faster than the same
Modified: trunk/refman-common/titles.en.ent
===================================================================
--- trunk/refman-common/titles.en.ent 2006-04-10 15:29:51 UTC (rev 1774)
+++ trunk/refman-common/titles.en.ent 2006-04-10 15:41:01 UTC (rev 1775)
@@ -530,6 +530,7 @@
<!ENTITY title-hp-ux-11-x "HP-UX Version 11.x Notes">
<!ENTITY title-ibm-aix "IBM-AIX notes">
<!ENTITY title-identifier-qualifiers "Identifier Qualifiers">
+<!ENTITY title-identifier-mapping "Mapping of Identifiers to Filenames">
<!ENTITY title-if-statement "<literal>IF</literal> Statement">
<!ENTITY title-ignoring-user "<literal>Ignoring user</literal>">
<!ENTITY title-implicit-commit "Statements That Cause an Implicit Commit">
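
Note on the new identifier-mapping section: the pass-through and hexadecimal rules can be illustrated with a minimal Python sketch. This is not the server's implementation. The function name encode_identifier_sketch is hypothetical, and the sketch covers only the two rules that the text specifies completely: basic Latin letters and digits are kept as is, and characters without an uppercase/lowercase mapping become "@" followed by four lowercase hex digits of their ucs2 value. Cased letters outside basic Latin follow the block table, whose exact per-character assignments are not given in the text, so the sketch refuses them rather than guess. Detecting a case mapping with str.lower() and str.upper() is also an assumption, and real server behavior may differ for characters the text does not mention.

```python
import string

# Characters the text says are "encoded as is".
PASS_THROUGH = set(string.ascii_letters + string.digits)


def encode_identifier_sketch(name: str) -> str:
    """Illustrative encoding of a database or table name (hypothetical helper).

    Handles only basic Latin letters/digits and the hexadecimal rule.
    """
    encoded = []
    for ch in name:
        code_point = ord(ch)
        if code_point == 0:
            # The text excludes ASCII NUL from all identifiers.
            raise ValueError("ASCII NUL (0x00) is not allowed in identifiers")
        if code_point > 0xFFFF:
            # The hex values are described as ucs2 (double-byte) values.
            raise ValueError("character is outside the ucs2 range")
        if ch in PASS_THROUGH:
            encoded.append(ch)
        elif ch.lower() != ch.upper():
            # Assumption: a differing lower/upper form means the letter
            # belongs to the block table, which this sketch does not cover.
            raise NotImplementedError(
                "cased letters outside basic Latin use the block table"
            )
        else:
            # '@' plus four lowercase hex digits, e.g. 0x003F -> @003f.
            encoded.append("@{:04x}".format(code_point))
    return "".join(encoded)


if __name__ == "__main__":
    print(encode_identifier_sketch("a?b"))  # a@003fb
```

The two examples from the section, 0x003F and 0xFFFF, come out as @003f and @ffff under this sketch.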