List: Internals
From: konstantin
Date: October 25 2005 8:05pm
Subject: bk commit into 5.0-fulltext tree (konstantin:1.2016)

Below is the list of changes that have just been committed into a local
5.0-fulltext repository of kostja. When kostja does a push these changes will
be propagated to the main repository and, within 24 hours after the
push, to the public repository.
For information on how to access the public repository
see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html

ChangeSet
  1.2016 05/10/25 22:05:39 konstantin@stripped +3 -0
  Add comments to the custom parser and UDF for fulltext, cleanup.
  Change the commit trigger to report the right code branch.

  plugin/fulltext/cnet_weight.c
    1.7 05/10/25 22:05:27 konstantin@stripped +106 -12
    - remove non-portable includes; my_global.h must be included first
    - add comments.

  plugin/fulltext/cnet_parser.c
    1.8 05/10/25 22:05:27 konstantin@stripped +1 -9
    - use MySQL includes for portability; my_global.h must be included first.

  BitKeeper/triggers/post-commit
    1.38 05/10/25 22:05:27 konstantin@stripped +1 -1
    Change commit trigger to report the right version.

# This is a BitKeeper patch.  What follows are the unified diffs for the
# set of deltas contained in the patch.  The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User:	konstantin
# Host:	dragonfly.local
# Root:	/opt/local/work/mysql-5.0-cnet

--- 1.37/BitKeeper/triggers/post-commit	2005-04-05 21:52:01 +04:00
+++ 1.38/BitKeeper/triggers/post-commit	2005-10-25 22:05:27 +04:00
@@ -5,7 +5,7 @@
 INTERNALS=internals@stripped
 DOCS=docs-commit@stripped
 LIMIT=10000
-VERSION="5.0"
+VERSION="5.0-fulltext"
 
 if [ "$REAL_EMAIL" = "" ]
 then

--- 1.7/plugin/fulltext/cnet_parser.c	2005-10-25 01:19:54 +04:00
+++ 1.8/plugin/fulltext/cnet_parser.c	2005-10-25 22:05:27 +04:00
@@ -13,16 +13,8 @@
    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA */
 
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <fcntl.h>
-#include <unistd.h>
-#include <stdlib.h>
-#include <string.h>
-#include <ctype.h>
-#include <stdio.h>
-
 #include <my_global.h>
+#include <m_string.h>
 #include <m_ctype.h>
 #include <plugin.h>
 

--- 1.6/plugin/fulltext/cnet_weight.c	2005-10-25 01:19:54 +04:00
+++ 1.7/plugin/fulltext/cnet_weight.c	2005-10-25 22:05:27 +04:00
@@ -13,22 +13,113 @@
    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA */
 
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <fcntl.h>
-#include <unistd.h>
-#include <stdlib.h>
-#include <string.h>
-#include <ctype.h>
-#include <stdio.h>
-
 #include <my_global.h>
 #include <mysql.h>
+#include <m_string.h>
 #include <m_ctype.h>
 #include <plugin.h>
 
+/*
+  This file defines a non-aggregate User Defined Function (UDF)
+  for relevance calculation.
+
+  An introduction to UDFs in MySQL.
+  ---------------------------------
+
+  By means of a non-aggregate UDF a user can extend MySQL Server with an SQL
+  level function. Such a function can be used for query processing just like
+  a built-in function, e.g. SUBSTRING, CONCAT, UPPER and so on.
+
+  A non-aggregate UDF is defined by 3 callbacks which the server locates in
+  a dynamic library when a UDF is installed. Each callback is invoked at
+  a particular juncture of query processing:
+
+  * the init function is called after the query has been parsed, but before
+    execution. This function is called once per query.
+  * the UDF body is invoked during query execution, possibly many times
+    if the arguments of the UDF refer to non-constant expressions such
+    as table columns
+  * the deinit function is called at the end of the query
+
+  For more information on UDFs please see
+  http://dev.mysql.com/doc/refman/5.0/en/adding-udf.html
+
+  CNET_WEIGHT(DOCUMENT, QUERY) -- a weighting UDF
+  -----------------------------------------------
+
+  A weighting UDF `CNET_WEIGHT' is a function that demonstrates how
+  relevance evaluation of fulltext can be implemented.
+  The function fulfills the following criteria:
+   * proximity between the matched words of a document is taken into
+     account: if the matched words stand closer to each other, the
+     relevance is higher
+
+     Example:
+     CNET_WEIGHT("My dearest aunt Cynthia", "dearest aunt") >
+     CNET_WEIGHT("My dearest Cynthia! My aunt", "dearest aunt")
+
+   * case-sensitive matches are ranked higher than case-insensitive ones
+
+     Example:
+     CNET_WEIGHT("MySQL", "MySQL") > CNET_WEIGHT("MySQL", "mysql")
+
+  The function accepts two arguments, a document and a search query
+  respectively. Whereas a document can refer to a non-constant expression,
+  such as a table column, a search query must be a constant.
+
+     Example:
+
+     CREATE TABLE t1 (a TEXT);
+     INSERT INTO t1 (a)
+     VALUES ("an ambitious proposal that could revolutionize"),
+            ("the approach to architectural design");
+
+     SELECT CNET_WEIGHT(a, "ambitious") FROM t1;
+     +-----------------------------+
+     | CNET_WEIGHT(a, "ambitious") |
+     +-----------------------------+
+     |                     2.00000 |
+     |                     0.00000 |
+     +-----------------------------+
+
+  Although a UDF itself cannot make use of an index, it's easy to
+  utilize one by using CNET_WEIGHT in conjunction with a MATCH ... AGAINST
+  clause:
+
+     ALTER TABLE t1 ADD FULLTEXT KEY(a);
+
+     SELECT CNET_WEIGHT(a, "ambitious")
+     FROM t1
+     WHERE MATCH(a) AGAINST ("ambitious" IN BOOLEAN MODE) > 0;
+     +-----------------------------+
+     | CNET_WEIGHT(a, "ambitious") |
+     +-----------------------------+
+     |                     2.00000 |
+     +-----------------------------+
+
+
+  The architecture of CNET_WEIGHT.
+  --------------------------------
+  In order to evaluate relevance, the UDF parses the query and the
+  document, and tries to find every word of the query in the document.
+  Parsing of the query is done once per query, in cnet_weight_init,
+  whereas the document (or if it corresponds to a table column, the record)
+  is parsed on every invocation of cnet_weight().
+
+  For parsing purposes an external function is used; at the moment it
+  is the parsing function from the CNET parser plugin, which gives the
+  best correlation between relevance values and table contents:
+
+    ALTER TABLE t1 ADD FULLTEXT KEY(a) WITH PARSER cnet_parser;
+
+  The signature of the plugin parser and its input buffers make it
+  available for reuse in the UDF.
+
+  The relevance calculation formula of CNET_WEIGHT
+  --------------------------------------------------
+*/
 
-/* This function will be used to parse query and document */
+/* This function will be used to parse the query and the document */
 extern int cnet_parser_parse(MYSQL_FTPARSER_PARAM *param);
 
 
@@ -251,12 +342,14 @@
 
   SYNOPSIS
     cnet_weight()
-    initid               holds broken into words query
+    initid               holds the query broken down into a list of words
     args                 only first argument is used, which is document
     is_null              unused
     error                unused
 
   DESCRIPTION
+    This function is a callback that is registered in the server
+    and called whenever a user uses "CNET_WEIGHT()".
     This function initializes parser variables and calls
     parser with first argument (args->args[0]) which is
     document.
@@ -266,7 +359,8 @@
 */
 
 double cnet_weight(UDF_INIT *initid, UDF_ARGS *args,
-                   char *is_null, char *error)
+                   char *is_null __attribute__((unused)),
+                   char *error __attribute__((unused)))
 {
   MYSQL_FTPARSER_PARAM param;
   CNET_WEIGHT_PARAM *weight_param= (CNET_WEIGHT_PARAM *)initid->ptr;
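
The comment added to cnet_weight.c describes three callbacks (init, body, deinit) and notes that the query is parsed once in the init function while the document is parsed on every invocation. The following is a minimal, self-contained sketch of that lifecycle, not the code from this ChangeSet. Everything named SKETCH_* or sketch_* is hypothetical: the structs are simplified stand-ins for UDF_INIT and UDF_ARGS from <mysql.h>, the callback signatures drop the message, is_null and error parameters, a plain alphanumeric tokenizer replaces cnet_parser_parse, and the returned score is a placeholder hit count rather than the CNET_WEIGHT formula.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_WORDS 16

/* Hypothetical stand-ins for UDF_INIT and UDF_ARGS; the real structs
   are declared in <mysql.h> and carry more fields. */
typedef struct { char *ptr; } SKETCH_UDF_INIT;
typedef struct {
  unsigned int arg_count;
  char **args;
  unsigned long *lengths;
} SKETCH_UDF_ARGS;

/* Per-query state built by the init callback. */
typedef struct {
  char *buf;                       /* private copy of the query text */
  char *words[SKETCH_MAX_WORDS];   /* NUL-terminated words inside buf */
  size_t n_words;
} SKETCH_QUERY;

/* init: runs once per query and splits the constant query into words. */
int sketch_weight_init(SKETCH_UDF_INIT *initid, SKETCH_UDF_ARGS *args)
{
  SKETCH_QUERY *q;
  char *p;
  if (args->arg_count != 2 || args->args[1] == NULL)
    return 1;                      /* the query must be a constant */
  if ((q = calloc(1, sizeof(*q))) == NULL)
    return 1;
  if ((q->buf = malloc(args->lengths[1] + 1)) == NULL) {
    free(q);
    return 1;
  }
  memcpy(q->buf, args->args[1], args->lengths[1]);
  q->buf[args->lengths[1]] = '\0';
  for (p = q->buf; *p; ) {
    while (*p && !isalnum((unsigned char)*p))
      *p++ = '\0';                 /* separators become terminators */
    if (*p && q->n_words < SKETCH_MAX_WORDS)
      q->words[q->n_words++] = p;
    while (isalnum((unsigned char)*p))
      p++;
  }
  initid->ptr = (char *)q;
  return 0;
}

/* body: runs once per row, tokenizes the document and counts the words
   that match a query word exactly (placeholder scoring only). */
double sketch_weight(SKETCH_UDF_INIT *initid, SKETCH_UDF_ARGS *args)
{
  const SKETCH_QUERY *q = (const SKETCH_QUERY *)initid->ptr;
  const char *doc = args->args[0];
  size_t len = doc ? args->lengths[0] : 0;
  size_t i = 0, k;
  double hits = 0.0;
  while (i < len) {
    size_t start;
    while (i < len && !isalnum((unsigned char)doc[i]))
      i++;
    start = i;
    while (i < len && isalnum((unsigned char)doc[i]))
      i++;
    for (k = 0; i > start && k < q->n_words; k++)
      if (strlen(q->words[k]) == i - start &&
          memcmp(q->words[k], doc + start, i - start) == 0)
        hits += 1.0;
  }
  return hits;
}

/* deinit: runs at the end of the query and releases the parsed query. */
void sketch_weight_deinit(SKETCH_UDF_INIT *initid)
{
  SKETCH_QUERY *q = (SKETCH_QUERY *)initid->ptr;
  if (q != NULL) {
    free(q->buf);
    free(q);
  }
  initid->ptr = NULL;
}
```

Keeping the parsed query behind initid->ptr is what lets the body avoid re-parsing it for every row; the real cnet_weight() reads its CNET_WEIGHT_PARAM from the same pointer.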