Below is the list of changes that have just been committed to stefan's
local mysqldoc repository. When stefan does a push, these changes will
be propagated to the main repository and, within 24 hours of the
push, to the public repository.
For information on how to access the public repository,
see http://www.mysql.com/doc/I/n/Installing_source_tree.html
ChangeSet
1.2823 05/06/19 21:56:21 stefan@stripped +3 -0
functions.xml:
Update paragraphs on FULLTEXT parser
to clarify why we can't index Chinese and
other languages that don't have word delimiters;
add a hint on how you could still use FULLTEXT
with Chinese
Sync with refman
refman/functions.xml
1.2 05/06/19 21:54:45 stefan@stripped +21 -10
Update paragraphs on FULLTEXT parser
to clarify why we can't index Chinese and
other languages that don't have word delimiters;
add a hint on how you could still use FULLTEXT
with Chinese
refman-4.1/functions.xml
1.2 05/06/19 21:54:33 stefan@stripped +21 -10
Sync with refman
refman-5.0/functions.xml
1.2 05/06/19 21:54:16 stefan@stripped +21 -10
Sync with refman
# This is a BitKeeper patch. What follows are the unified diffs for the
# set of deltas contained in the patch. The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User: stefan
# Host: apollon.site
# Root: /home/stefan/bk/mysqldoc
--- 1.1/refman-4.1/functions.xml 2005-06-16 21:35:22 +02:00
+++ 1.2/refman-4.1/functions.xml 2005-06-19 21:54:33 +02:00
@@ -9435,16 +9435,27 @@
</programlisting>
<para>
- MySQL uses a very simple parser to split text into words. A ``word''
- is any sequence of true word characters (letters, digits, and
- underscores), optionally separated by no more than one sequential
- '<literal>'</literal>' character. For example,
- <literal>wasn't</literal> is parsed as a single word, but
- <literal>wasn''t</literal> is parsed as two words
- <literal>wasn</literal> and <literal>t</literal>. (And then
- <literal>t</literal> would be ignored as too short according to the
- rules following.) Also, single quotes at the ends of words are
- stripped; only embedded single quotes are retained.
+ The MySQL <literal>FULLTEXT</literal> implementation regards any sequence
+ of true word characters (letters, digits, and underscores) as a word.
+ That sequence may also contain apostrophes (<literal>'</literal>),
+ but not more than one in a row. This means that <literal>aaa'bbb</literal>
+ is regarded as one word, but <literal>aaa''bbb</literal> is regarded as two
+ words. Apostrophes at the beginning or the end of a word are stripped by
+ the <literal>FULLTEXT</literal> parser; <literal>'aaa'bbb'</literal> would
+ be parsed as <literal>aaa'bbb</literal>.
+ </para>
+
+ <para>
+ The <literal>FULLTEXT</literal> parser determines where words start and end
+ by looking for certain delimiters, for example <literal>' '</literal>
+ (the space character), <literal>,</literal> (the comma), and
+ <literal>.</literal> (the period). If words aren't separated by delimiters,
+ like for example Chinese words, the <literal>FULLTEXT</literal> parser
+ cannot determine where a word starts and where it ends. To be able to add
+ words or other indexed terms in such languages to a
+ <literal>FULLTEXT</literal> index, you'd have to preprocess them so that
+ they are separated by some delimiter (for example,
+ by <literal>''</literal>).
</para>
<para>
--- 1.1/refman-5.0/functions.xml 2005-06-16 21:46:21 +02:00
+++ 1.2/refman-5.0/functions.xml 2005-06-19 21:54:16 +02:00
@@ -9435,16 +9435,27 @@
</programlisting>
<para>
- MySQL uses a very simple parser to split text into words. A ``word''
- is any sequence of true word characters (letters, digits, and
- underscores), optionally separated by no more than one sequential
- '<literal>'</literal>' character. For example,
- <literal>wasn't</literal> is parsed as a single word, but
- <literal>wasn''t</literal> is parsed as two words
- <literal>wasn</literal> and <literal>t</literal>. (And then
- <literal>t</literal> would be ignored as too short according to the
- rules following.) Also, single quotes at the ends of words are
- stripped; only embedded single quotes are retained.
+ The MySQL <literal>FULLTEXT</literal> implementation regards any sequence
+ of true word characters (letters, digits, and underscores) as a word.
+ That sequence may also contain apostrophes (<literal>'</literal>),
+ but not more than one in a row. This means that <literal>aaa'bbb</literal>
+ is regarded as one word, but <literal>aaa''bbb</literal> is regarded as two
+ words. Apostrophes at the beginning or the end of a word are stripped by
+ the <literal>FULLTEXT</literal> parser; <literal>'aaa'bbb'</literal> would
+ be parsed as <literal>aaa'bbb</literal>.
+ </para>
+
+ <para>
+ The <literal>FULLTEXT</literal> parser determines where words start and end
+ by looking for certain delimiters, for example <literal>' '</literal>
+ (the space character), <literal>,</literal> (the comma), and
+ <literal>.</literal> (the period). If words aren't separated by delimiters,
+ like for example Chinese words, the <literal>FULLTEXT</literal> parser
+ cannot determine where a word starts and where it ends. To be able to add
+ words or other indexed terms in such languages to a
+ <literal>FULLTEXT</literal> index, you'd have to preprocess them so that
+ they are separated by some delimiter (for example,
+ by <literal>''</literal>).
</para>
<para>
--- 1.1/refman/functions.xml 2005-06-16 01:46:57 +02:00
+++ 1.2/refman/functions.xml 2005-06-19 21:54:45 +02:00
@@ -9435,16 +9435,27 @@
</programlisting>
<para>
- MySQL uses a very simple parser to split text into words. A ``word''
- is any sequence of true word characters (letters, digits, and
- underscores), optionally separated by no more than one sequential
- '<literal>'</literal>' character. For example,
- <literal>wasn't</literal> is parsed as a single word, but
- <literal>wasn''t</literal> is parsed as two words
- <literal>wasn</literal> and <literal>t</literal>. (And then
- <literal>t</literal> would be ignored as too short according to the
- rules following.) Also, single quotes at the ends of words are
- stripped; only embedded single quotes are retained.
+ The MySQL <literal>FULLTEXT</literal> implementation regards any sequence
+ of true word characters (letters, digits, and underscores) as a word.
+ That sequence may also contain apostrophes (<literal>'</literal>),
+ but not more than one in a row. This means that <literal>aaa'bbb</literal>
+ is regarded as one word, but <literal>aaa''bbb</literal> is regarded as two
+ words. Apostrophes at the beginning or the end of a word are stripped by
+ the <literal>FULLTEXT</literal> parser; <literal>'aaa'bbb'</literal> would
+ be parsed as <literal>aaa'bbb</literal>.
+ </para>
+
+ <para>
+ The <literal>FULLTEXT</literal> parser determines where words start and end
+ by looking for certain delimiters, for example <literal>' '</literal>
+ (the space character), <literal>,</literal> (the comma), and
+ <literal>.</literal> (the period). If words aren't separated by delimiters,
+ like for example Chinese words, the <literal>FULLTEXT</literal> parser
+ cannot determine where a word starts and where it ends. To be able to add
+ words or other indexed terms in such languages to a
+ <literal>FULLTEXT</literal> index, you'd have to preprocess them so that
+ they are separated by some delimiter (for example,
+ by <literal>''</literal>).
</para>
<para>
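
The word rules described in the new paragraphs can be illustrated with a short Python sketch. This is not the MySQL parser, only an approximation of the documented behavior: the function name tokenize and the regular expression are assumptions made for illustration, and the sketch ignores minimum word length and stopwords.

```python
import re

# Hypothetical approximation of the documented rules: a word is a run of
# word characters (letters, digits, underscores) that may contain single
# embedded apostrophes. Leading and trailing apostrophes never match, so
# they are effectively stripped; two apostrophes in a row split the word.
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def tokenize(text):
    """Return the words found in text under the approximate rules above."""
    return WORD_RE.findall(text)


print(tokenize("aaa'bbb"))     # one word
print(tokenize("aaa''bbb"))    # two words
print(tokenize("'aaa'bbb'"))   # outer apostrophes dropped

# Text without delimiters comes back as one undivided run, so the
# individual words cannot be indexed separately.
print(tokenize("中文分词"))

# If the text has already been segmented by some external tool (assumed
# here, not shown), joining the segments with a delimiter makes each
# segment a separate word.
segments = ["中文", "分词"]
print(tokenize(" ".join(segments)))
```

The last two calls show the preprocessing hint from the commit: the parser itself does not segment the text, so the word boundaries have to be supplied before the text is indexed.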