List: Internals
From: stefan
Date: June 19 2005 7:56pm
Subject: bk commit - mysqldoc@docsrva tree (stefan:1.2823)

Below is the list of changes that have just been committed into a local
mysqldoc repository of stefan. When stefan does a push these changes will
be propagated to the main repository and, within 24 hours after the
push, to the public repository.
For information on how to access the public repository
see http://www.mysql.com/doc/I/n/Installing_source_tree.html

ChangeSet
  1.2823 05/06/19 21:56:21 stefan@stripped +3 -0
  functions.xml:
    Update paragraphs on FULLTEXT parser
    to clarify why we can't index Chinese and
    other languages that don't have word delimiters;
    add a hint on how you could still use FULLTEXT
    with Chinese
    Sync with refman

  refman/functions.xml
    1.2 05/06/19 21:54:45 stefan@stripped +21 -10
    Update paragraphs on FULLTEXT parser
    to clarify why we can't index Chinese and
    other languages that don't have word delimiters;
    add a hint on how you could still use FULLTEXT
    with Chinese

  refman-4.1/functions.xml
    1.2 05/06/19 21:54:33 stefan@stripped +21 -10
    Sync with refman

  refman-5.0/functions.xml
    1.2 05/06/19 21:54:16 stefan@stripped +21 -10
    Sync with refman

# This is a BitKeeper patch.  What follows are the unified diffs for the
# set of deltas contained in the patch.  The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User:	stefan
# Host:	apollon.site
# Root:	/home/stefan/bk/mysqldoc

--- 1.1/refman-4.1/functions.xml	2005-06-16 21:35:22 +02:00
+++ 1.2/refman-4.1/functions.xml	2005-06-19 21:54:33 +02:00
@@ -9435,16 +9435,27 @@
 </programlisting>
 
   <para>
-   MySQL uses a very simple parser to split text into words. A ``word''
-   is any sequence of true word characters (letters, digits, and
-   underscores), optionally separated by no more than one sequential
-   '<literal>'</literal>' character. For example,
-   <literal>wasn't</literal> is parsed as a single word, but
-   <literal>wasn''t</literal> is parsed as two words
-   <literal>wasn</literal> and <literal>t</literal>. (And then
-   <literal>t</literal> would be ignored as too short according to the
-   rules following.) Also, single quotes at the ends of words are
-   stripped; only embedded single quotes are retained.
+   The MySQL <literal>FULLTEXT</literal> implementation regards any sequence 
+   of true word characters (letters, digits, and underscores) as a word. 
+   That sequence may also contain apostrophes (<literal>'</literal>), 
+   but not more than one in a row. This means that <literal>aaa'bbb</literal> 
+   is regarded as one word, but <literal>aaa''bbb</literal> is regarded as two 
+   words. Apostrophes at the beginning or the end of a word are stripped by 
+   the <literal>FULLTEXT</literal> parser; <literal>'aaa'bbb'</literal> would 
+   be parsed as <literal>aaa'bbb</literal>.
+  </para>
+
+  <para>
+   The <literal>FULLTEXT</literal> parser determines where words start and end 
+   by looking for certain delimiters, for example <literal>' '</literal> 
+   (the space character), <literal>,</literal> (the comma), and 
+   <literal>.</literal> (the period). If words aren't separated by delimiters, 
+   like for example Chinese words, the <literal>FULLTEXT</literal> parser 
+   cannot determine where a word starts and where it ends. To be able to add 
+   words or other indexed terms in such languages to a 
+   <literal>FULLTEXT</literal> index, you'd have to preprocess them so that 
+   they are separated by some delimiter (for example, 
+   by <literal>''</literal>). 
   </para>
 
   <para>

--- 1.1/refman-5.0/functions.xml	2005-06-16 21:46:21 +02:00
+++ 1.2/refman-5.0/functions.xml	2005-06-19 21:54:16 +02:00
@@ -9435,16 +9435,27 @@
 </programlisting>
 
   <para>
-   MySQL uses a very simple parser to split text into words. A ``word''
-   is any sequence of true word characters (letters, digits, and
-   underscores), optionally separated by no more than one sequential
-   '<literal>'</literal>' character. For example,
-   <literal>wasn't</literal> is parsed as a single word, but
-   <literal>wasn''t</literal> is parsed as two words
-   <literal>wasn</literal> and <literal>t</literal>. (And then
-   <literal>t</literal> would be ignored as too short according to the
-   rules following.) Also, single quotes at the ends of words are
-   stripped; only embedded single quotes are retained.
+   The MySQL <literal>FULLTEXT</literal> implementation regards any sequence 
+   of true word characters (letters, digits, and underscores) as a word. 
+   That sequence may also contain apostrophes (<literal>'</literal>), 
+   but not more than one in a row. This means that <literal>aaa'bbb</literal> 
+   is regarded as one word, but <literal>aaa''bbb</literal> is regarded as two 
+   words. Apostrophes at the beginning or the end of a word are stripped by 
+   the <literal>FULLTEXT</literal> parser; <literal>'aaa'bbb'</literal> would 
+   be parsed as <literal>aaa'bbb</literal>.
+  </para>
+
+  <para>
+   The <literal>FULLTEXT</literal> parser determines where words start and end 
+   by looking for certain delimiters, for example <literal>' '</literal> 
+   (the space character), <literal>,</literal> (the comma), and 
+   <literal>.</literal> (the period). If words aren't separated by delimiters, 
+   like for example Chinese words, the <literal>FULLTEXT</literal> parser 
+   cannot determine where a word starts and where it ends. To be able to add 
+   words or other indexed terms in such languages to a 
+   <literal>FULLTEXT</literal> index, you'd have to preprocess them so that 
+   they are separated by some delimiter (for example, 
+   by <literal>''</literal>). 
   </para>
 
   <para>

--- 1.1/refman/functions.xml	2005-06-16 01:46:57 +02:00
+++ 1.2/refman/functions.xml	2005-06-19 21:54:45 +02:00
@@ -9435,16 +9435,27 @@
 </programlisting>
 
   <para>
-   MySQL uses a very simple parser to split text into words. A ``word''
-   is any sequence of true word characters (letters, digits, and
-   underscores), optionally separated by no more than one sequential
-   '<literal>'</literal>' character. For example,
-   <literal>wasn't</literal> is parsed as a single word, but
-   <literal>wasn''t</literal> is parsed as two words
-   <literal>wasn</literal> and <literal>t</literal>. (And then
-   <literal>t</literal> would be ignored as too short according to the
-   rules following.) Also, single quotes at the ends of words are
-   stripped; only embedded single quotes are retained.
+   The MySQL <literal>FULLTEXT</literal> implementation regards any sequence 
+   of true word characters (letters, digits, and underscores) as a word. 
+   That sequence may also contain apostrophes (<literal>'</literal>), 
+   but not more than one in a row. This means that <literal>aaa'bbb</literal> 
+   is regarded as one word, but <literal>aaa''bbb</literal> is regarded as two 
+   words. Apostrophes at the beginning or the end of a word are stripped by 
+   the <literal>FULLTEXT</literal> parser; <literal>'aaa'bbb'</literal> would 
+   be parsed as <literal>aaa'bbb</literal>.
+  </para>
+
+  <para>
+   The <literal>FULLTEXT</literal> parser determines where words start and end 
+   by looking for certain delimiters, for example <literal>' '</literal> 
+   (the space character), <literal>,</literal> (the comma), and 
+   <literal>.</literal> (the period). If words aren't separated by delimiters, 
+   like for example Chinese words, the <literal>FULLTEXT</literal> parser 
+   cannot determine where a word starts and where it ends. To be able to add 
+   words or other indexed terms in such languages to a 
+   <literal>FULLTEXT</literal> index, you'd have to preprocess them so that 
+   they are separated by some delimiter (for example, 
+   by <literal>''</literal>). 
   </para>
 
   <para>
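
The word-splitting rule described in the added paragraphs can be approximated
with a short Python sketch. This is only an illustration under stated
assumptions, not MySQL's actual parser: the regular expression, the function
names split_words and join_with_delimiter, and the sample strings are all
hypothetical, and the sketch ignores the minimum word length and stopword
handling. Python's \w also treats CJK characters as word characters, which
makes the problem with undelimited text easy to see.

```python
import re

# Assumption: a "word" is a run of word characters (letters, digits,
# underscores) that may contain single embedded apostrophes, but never
# two apostrophes in a row. Leading and trailing apostrophes fall outside
# the match, so they are effectively stripped.
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def split_words(text):
    """Split text into words following the rule sketched above."""
    return WORD_RE.findall(text)


def join_with_delimiter(segments, delimiter=" "):
    """Join already-segmented terms with a delimiter the parser recognizes.

    Segmenting the text into terms is assumed to happen elsewhere; this
    helper only performs the final preprocessing step.
    """
    return delimiter.join(segments)


if __name__ == "__main__":
    for sample in ["aaa'bbb", "aaa''bbb", "'aaa'bbb'", "one, two. three"]:
        print(repr(sample), "->", split_words(sample))

    # Without delimiters the whole run is seen as a single word.
    print(split_words("全文检索"))
    # After preprocessing, each term becomes a separate word.
    print(split_words(join_with_delimiter(["全文", "检索"])))
```

The preprocessing step has to be applied both to the text stored in the
indexed column and to the search terms, so that the two are split the same
way.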