Author: paul
Date: 2006-01-30 16:00:57 +0100 (Mon, 30 Jan 2006)
New Revision: 1124
Log:
r6910@frost: paul | 2006-01-30 08:57:54 -0600
Move regexp.xml to refman-common directory.
Added:
trunk/refman-common/regexp.xml
Removed:
trunk/refman-4.1/regexp.xml
trunk/refman-5.0/regexp.xml
trunk/refman-5.1/regexp.xml
Modified:
trunk/
Property changes on: trunk
___________________________________________________________________
Name: svk:merge
- b5ec3a16-e900-0410-9ad2-d183a3acac99:/mysqldoc-local/mysqldoc/trunk:6904
bf112a9c-6c03-0410-a055-ad865cd57414:/mysqldoc-local/mysqldoc/trunk:2588
+ b5ec3a16-e900-0410-9ad2-d183a3acac99:/mysqldoc-local/mysqldoc/trunk:6910
bf112a9c-6c03-0410-a055-ad865cd57414:/mysqldoc-local/mysqldoc/trunk:2588
Deleted: trunk/refman-4.1/regexp.xml
Deleted: trunk/refman-5.0/regexp.xml
Deleted: trunk/refman-5.1/regexp.xml
Added: trunk/refman-common/regexp.xml
===================================================================
--- trunk/refman-common/regexp.xml 2006-01-30 14:13:09 UTC (rev 1123)
+++ trunk/refman-common/regexp.xml 2006-01-30 15:00:57 UTC (rev 1124)
@@ -0,0 +1,477 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE appendix PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
+"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"
+[
+ <!ENTITY % fixedchars.entities SYSTEM "../refman-common/fixedchars.ent">
+ %fixedchars.entities;
+ <!ENTITY % title.entities SYSTEM "../refman-common/titles.en.ent">
+ %title.entities;
+ <!ENTITY % versions.entities SYSTEM "versions.ent">
+ %versions.entities;
+]>
+<appendix id="regexp">
+
+ <title>&title-regexp;</title>
+
+ <indexterm>
+ <primary>regex</primary>
+ </indexterm>
+
+ <indexterm>
+ <primary>regular expression syntax</primary>
+ <secondary>described</secondary>
+ </indexterm>
+
+ <indexterm>
+ <primary>syntax</primary>
+ <secondary>regular expression</secondary>
+ </indexterm>
+
+ <para>
+ A regular expression is a powerful way of specifying a pattern for a
+ complex search.
+ </para>
+
+ <para>
+ MySQL uses Henry Spencer's implementation of regular expressions,
+ which is aimed at conformance with POSIX 1003.2. See
+ <xref linkend="credits"/>. MySQL uses the extended version to
+ support pattern-matching operations performed with the
+ <literal>REGEXP</literal> operator in SQL statements. See
+ <xref linkend="pattern-matching"/>.
+ </para>
+
+ <para>
+ This appendix is a summary, with examples, of the special characters
+ and constructs that can be used in MySQL for
+ <literal>REGEXP</literal> operations. It does not contain all the
+ details that can be found in Henry Spencer's
+ <literal>regex(7)</literal> manual page. That manual page is
+ included in MySQL source distributions, in the
+ <filename>regex.7</filename> file under the
+ <filename>regex</filename> directory.
+ </para>
+
+ <para>
+ A regular expression describes a set of strings. The simplest
+ regular expression is one that has no special characters in it. For
+ example, the regular expression <literal>hello</literal> matches
+ <literal>hello</literal> and nothing else.
+ </para>
+
+ <para>
+ Non-trivial regular expressions use certain special constructs so
+ that they can match more than one string. For example, the regular
+ expression <literal>hello|word</literal> matches either the string
+ <literal>hello</literal> or the string <literal>word</literal>.
+ </para>
+
+ <para>
+ As a more complex example, the regular expression
+ <literal>B[an]*s</literal> matches any of the strings
+ <literal>Bananas</literal>, <literal>Baaaaas</literal>,
+ <literal>Bs</literal>, and any other string starting with a
+ <literal>B</literal>, ending with an <literal>s</literal>, and
+ containing any number of <literal>a</literal> or
+ <literal>n</literal> characters in between.
+ </para>
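+
+ <para>
+ As a brief sketch of how the preceding patterns behave with the
+ <literal>REGEXP</literal> operator (the test strings here are chosen
+ only for illustration): <literal>REGEXP</literal> succeeds if the
+ pattern matches anywhere in the value, so the <literal>^</literal>
+ and <literal>$</literal> anchors described below are needed when the
+ pattern must match the entire string.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'hello world' REGEXP 'hello';</userinput> -> 1
+mysql> <userinput>SELECT 'hello world' REGEXP '^hello$';</userinput> -> 0
+mysql> <userinput>SELECT 'word' REGEXP '^(hello|word)$';</userinput> -> 1
+mysql> <userinput>SELECT 'Bananas' REGEXP '^B[an]*s$';</userinput> -> 1
+mysql> <userinput>SELECT 'Bus' REGEXP '^B[an]*s$';</userinput> -> 0
+</programlisting>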
+
+ <para>
+ A regular expression for the <literal>REGEXP</literal> operator may
+ use any of the following special characters and constructs:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ <literal>^</literal>
+ </para>
+
+ <para>
+ Match the beginning of a string.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'fo\nfo' REGEXP '^fo$';</userinput> -> 0
+mysql> <userinput>SELECT 'fofo' REGEXP '^fo';</userinput> -> 1
+</programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>$</literal>
+ </para>
+
+ <para>
+ Match the end of a string.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'fo\no' REGEXP '^fo\no$';</userinput> -> 1
+mysql> <userinput>SELECT 'fo\no' REGEXP '^fo$';</userinput> -> 0
+</programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>.</literal>
+ </para>
+
+ <para>
+ Match any character (including carriage return and newline).
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'fofo' REGEXP '^f.*$';</userinput> -> 1
+mysql> <userinput>SELECT 'fo\r\nfo' REGEXP '^f.*$';</userinput> -> 1
+</programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>a*</literal>
+ </para>
+
+ <para>
+ Match any sequence of zero or more <literal>a</literal>
+ characters.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'Ban' REGEXP '^Ba*n';</userinput> -> 1
+mysql> <userinput>SELECT 'Baaan' REGEXP '^Ba*n';</userinput> -> 1
+mysql> <userinput>SELECT 'Bn' REGEXP '^Ba*n';</userinput> -> 1
+</programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>a+</literal>
+ </para>
+
+ <para>
+ Match any sequence of one or more <literal>a</literal>
+ characters.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'Ban' REGEXP '^Ba+n';</userinput> -> 1
+mysql> <userinput>SELECT 'Bn' REGEXP '^Ba+n';</userinput> -> 0
+</programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>a?</literal>
+ </para>
+
+ <para>
+ Match either zero or one <literal>a</literal> character.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'Bn' REGEXP '^Ba?n';</userinput> -> 1
+mysql> <userinput>SELECT 'Ban' REGEXP '^Ba?n';</userinput> -> 1
+mysql> <userinput>SELECT 'Baan' REGEXP '^Ba?n';</userinput> -> 0
+</programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>de|abc</literal>
+ </para>
+
+ <para>
+ Match either of the sequences <literal>de</literal> or
+ <literal>abc</literal>.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'pi' REGEXP 'pi|apa';</userinput> -> 1
+mysql> <userinput>SELECT 'axe' REGEXP 'pi|apa';</userinput> -> 0
+mysql> <userinput>SELECT 'apa' REGEXP 'pi|apa';</userinput> -> 1
+mysql> <userinput>SELECT 'apa' REGEXP '^(pi|apa)$';</userinput> -> 1
+mysql> <userinput>SELECT 'pi' REGEXP '^(pi|apa)$';</userinput> -> 1
+mysql> <userinput>SELECT 'pix' REGEXP '^(pi|apa)$';</userinput> -> 0
+</programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>(abc)*</literal>
+ </para>
+
+ <para>
+ Match zero or more instances of the sequence
+ <literal>abc</literal>.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'pi' REGEXP '^(pi)*$';</userinput> -> 1
+mysql> <userinput>SELECT 'pip' REGEXP '^(pi)*$';</userinput> -> 0
+mysql> <userinput>SELECT 'pipi' REGEXP '^(pi)*$';</userinput> -> 1
+</programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>{1}</literal>, <literal>{2,3}</literal>
+ </para>
+
+ <para>
+ <literal>{n}</literal> or <literal>{m,n}</literal> notation
+ provides a more general way of writing regular expressions that
+ match many occurrences of the previous atom (or
+ <quote>piece</quote>) of the pattern. <literal>m</literal> and
+ <literal>n</literal> are integers.
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ <literal>a*</literal>
+ </para>
+
+ <para>
+ Can be written as <literal>a{0,}</literal>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>a+</literal>
+ </para>
+
+ <para>
+ Can be written as <literal>a{1,}</literal>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>a?</literal>
+ </para>
+
+ <para>
+ Can be written as <literal>a{0,1}</literal>.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ To be more precise, <literal>a{n}</literal> matches exactly
+ <literal>n</literal> instances of <literal>a</literal>.
+ <literal>a{n,}</literal> matches <literal>n</literal> or more
+ instances of <literal>a</literal>. <literal>a{m,n}</literal>
+ matches <literal>m</literal> through <literal>n</literal>
+ instances of <literal>a</literal>, inclusive.
+ </para>
+
+ <para>
+ <literal>m</literal> and <literal>n</literal> must be in the
+ range from <literal>0</literal> to <literal>RE_DUP_MAX</literal>
+ (default 255), inclusive. If both <literal>m</literal> and
+ <literal>n</literal> are given, <literal>m</literal> must be
+ less than or equal to <literal>n</literal>.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'abcde' REGEXP 'a[bcd]{2}e';</userinput> -> 0
+mysql> <userinput>SELECT 'abcde' REGEXP 'a[bcd]{3}e';</userinput> -> 1
+mysql> <userinput>SELECT 'abcde' REGEXP 'a[bcd]{1,10}e';</userinput> -> 1
+</programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>[a-dX]</literal>, <literal>[^a-dX]</literal>
+ </para>
+
+ <para>
+ Matches any character that is (or is not, if <literal>^</literal> is used) either
+ <literal>a</literal>, <literal>b</literal>,
+ <literal>c</literal>, <literal>d</literal> or
+ <literal>X</literal>. A <literal>-</literal> character between
+ two other characters forms a range that matches all characters
+ from the first character to the second. For example,
+ <literal>[0-9]</literal> matches any decimal digit. To include a
+ literal <literal>]</literal> character, it must immediately
+ follow the opening bracket <literal>[</literal>. To include a
+ literal <literal>-</literal> character, it must be written first
+ or last. Any character that does not have a defined special
+ meaning inside a <literal>[]</literal> pair matches only itself.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'aXbc' REGEXP '[a-dXYZ]';</userinput> -> 1
+mysql> <userinput>SELECT 'aXbc' REGEXP '^[a-dXYZ]$';</userinput> -> 0
+mysql> <userinput>SELECT 'aXbc' REGEXP '^[a-dXYZ]+$';</userinput> -> 1
+mysql> <userinput>SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$';</userinput> -> 0
+mysql> <userinput>SELECT 'gheis' REGEXP '^[^a-dXYZ]+$';</userinput> -> 1
+mysql> <userinput>SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$';</userinput> -> 0
+</programlisting>
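+
+ <para>
+ The placement rules for a literal <literal>]</literal> or
+ <literal>-</literal> inside a bracket expression can be sketched as
+ follows. The test strings are illustrative only:
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT ']' REGEXP '^[]a]$';</userinput> -> 1
+mysql> <userinput>SELECT 'a-b' REGEXP '^[a-c-]+$';</userinput> -> 1
+mysql> <userinput>SELECT 'a-b' REGEXP '^[a-c]+$';</userinput> -> 0
+</programlisting>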
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>[.characters.]</literal>
+ </para>
+
+ <para>
+ Within a bracket expression (written using <literal>[</literal>
+ and <literal>]</literal>), matches the sequence of characters of
+ that collating element. <literal>characters</literal> is either
+ a single character or a character name like
+ <literal>newline</literal>. You can find the full list of
+ character names in the <filename>regex/cname.h</filename> file.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT '~' REGEXP '[[.~.]]';</userinput> -> 1
+mysql> <userinput>SELECT '~' REGEXP '[[.tilde.]]';</userinput> -> 1
+</programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>[=character_class=]</literal>
+ </para>
+
+ <para>
+ Within a bracket expression (written using <literal>[</literal>
+ and <literal>]</literal>),
+ <literal>[=character_class=]</literal> represents an equivalence
+ class. It matches all characters with the same collation value,
+ including itself. For example, if <literal>o</literal> and
+ <literal>(+)</literal> are the members of an equivalence class,
+ then <literal>[[=o=]]</literal>, <literal>[[=(+)=]]</literal>,
+ and <literal>[o(+)]</literal> are all synonymous. An equivalence
+ class may not be used as an endpoint of a range.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>[:character_class:]</literal>
+ </para>
+
+ <para>
+ Within a bracket expression (written using <literal>[</literal>
+ and <literal>]</literal>),
+ <literal>[:character_class:]</literal> represents a character
+ class that matches all characters belonging to that class. The
+ standard class names are:
+ </para>
+
+ <informaltable>
+ <tgroup cols="2">
+ <colspec colwidth="10*"/>
+ <colspec colwidth="90*"/>
+ <tbody>
+ <row>
+ <entry><literal>alnum</literal></entry>
+ <entry>Alphanumeric characters</entry>
+ </row>
+ <row>
+ <entry><literal>alpha</literal></entry>
+ <entry>Alphabetic characters</entry>
+ </row>
+ <row>
+ <entry><literal>blank</literal></entry>
+ <entry>Space and tab characters</entry>
+ </row>
+ <row>
+ <entry><literal>cntrl</literal></entry>
+ <entry>Control characters</entry>
+ </row>
+ <row>
+ <entry><literal>digit</literal></entry>
+ <entry>Digit characters</entry>
+ </row>
+ <row>
+ <entry><literal>graph</literal></entry>
+ <entry>Graphic characters</entry>
+ </row>
+ <row>
+ <entry><literal>lower</literal></entry>
+ <entry>Lowercase alphabetic characters</entry>
+ </row>
+ <row>
+ <entry><literal>print</literal></entry>
+ <entry>Graphic or space characters</entry>
+ </row>
+ <row>
+ <entry><literal>punct</literal></entry>
+ <entry>Punctuation characters</entry>
+ </row>
+ <row>
+ <entry><literal>space</literal></entry>
+ <entry>Space, tab, newline, and carriage return</entry>
+ </row>
+ <row>
+ <entry><literal>upper</literal></entry>
+ <entry>Uppercase alphabetic characters</entry>
+ </row>
+ <row>
+ <entry><literal>xdigit</literal></entry>
+ <entry>Hexadecimal digit characters</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+
+ <para>
+ These stand for the character classes defined in the
+ <literal>ctype(3)</literal> manual page. A particular locale may
+ provide other class names. A character class may not be used as
+ an endpoint of a range.
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'justalnums' REGEXP '[[:alnum:]]+';</userinput> -> 1
+mysql> <userinput>SELECT '!!' REGEXP '[[:alnum:]]+';</userinput> -> 0
+</programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>[[:&lt;:]]</literal>, <literal>[[:&gt;:]]</literal>
+ </para>
+
+ <para>
+ These markers stand for word boundaries. They match the
+ beginning and end of words, respectively. A word is a sequence
+ of word characters that is not preceded by or followed by word
+ characters. A word character is an alphanumeric character in the
+ <literal>alnum</literal> class or an underscore
+ (<literal>_</literal>).
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT 'a word a' REGEXP '[[:&lt;:]]word[[:&gt;:]]';</userinput> -> 1
+mysql> <userinput>SELECT 'a xword a' REGEXP '[[:&lt;:]]word[[:&gt;:]]';</userinput> -> 0
+</programlisting>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ To use a literal instance of a special character in a regular
+ expression, precede it by two backslash (\) characters. The MySQL
+ parser interprets one of the backslashes, and the regular expression
+ library interprets the other. For example, to match the string
+ <literal>1+2</literal> that contains the special
+ <literal>+</literal> character, only the last of the following
+ regular expressions is the correct one:
+ </para>
+
+<programlisting>
+mysql> <userinput>SELECT '1+2' REGEXP '1+2';</userinput> -> 0
+mysql> <userinput>SELECT '1+2' REGEXP '1\+2';</userinput> -> 0
+mysql> <userinput>SELECT '1+2' REGEXP '1\\+2';</userinput> -> 1
+</programlisting>
+
+</appendix>
Property changes on: trunk/refman-common/regexp.xml
___________________________________________________________________
Name: svn:eol-style
+ LF