Below is the list of changes that have just been committed into a local
5.0-fulltext repository of paul. When paul does a push these changes will
be propagated to the main repository and, within 24 hours after the
push, to the public repository.
For information on how to access the public repository
see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
ChangeSet
1.2031 05/11/02 16:18:04 paul@stripped +1 -0
plugin-setup.txt:
Additional information about tests done with CNET data.
plugin/fulltext/plugin-setup.txt
1.3 05/11/02 16:17:20 paul@stripped +177 -0
Additional information about tests done with CNET data.
# This is a BitKeeper patch. What follows are the unified diffs for the
# set of deltas contained in the patch. The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User: paul
# Host: frost.snake.net
# Root: /Users/paul/bk/mysql-5.0-cnet
--- 1.2/plugin/fulltext/plugin-setup.txt 2005-10-29 20:30:11 -05:00
+++ 1.3/plugin/fulltext/plugin-setup.txt 2005-11-02 16:17:20 -06:00
@@ -243,3 +243,180 @@
| 0.00000 |
+-------------------------------------------+
1 row in set (0.00 sec)
+
+----------
+
+The following mail message contains some additional examples that
+we were able to run based on CNET-provided data. The following
+procedure was used to prepare the sample data used in the examples:
+
+We selected all NOT NULL values from the following columns of the
+productcoredata table that was provided by C|NET:
+
+productName, manufacturerName, SKU, keywordsHardcode, shortSpecs,
+7bottomLine, 16bottomLine, 7good, 16good, 7bad, 16bad, 7whatFor,
+16whatFor, 7whoFor, 16whoFor, 7bizUse, 16bizUse.
+
+These columns were selected into a single-column table with a fulltext
+index, and then all benchmarks/tests were run against that table.
+
+----- Forwarded message from Sergey Vojtovich <svoj@stripped> -----
+
+Date: Fri, 28 Oct 2005 16:49:48 +0500
+From: Sergey Vojtovich <svoj@stripped>
+To: Konstantin Osipov <konstantin@stripped>
+Subject: cnet examples
+
+Hi!
+
+Please find CNET examples attached. There are still two points I wasn't able
+to comment on: word proximity and text literals.
+
+Regards,
+Sergey
+
+1. Word frequency
+SELECT a, cnet_weight(a, 'acceleraid') AS weight FROM t2
+WHERE MATCH a AGAINST ('acceleraid' IN BOOLEAN MODE)
+ORDER BY weight DESC LIMIT 10\G
+*************************** 1. row ***************************
+a: ACCELERAID 150 W/SCSI IF SCSI TO PCI CONTR W/4MB ACCELERAID 150 W/SCSI IF SCSI TO PCI CONTR W/4MB
+weight: 1.25000
+*************************** 2. row ***************************
+a: ACCELERAID 352 64BIT PCI U160 2CH 64MB RAID
+weight: 1.00000
+*************************** 3. row ***************************
+a: ACCELERAID 170 SCSI RAID 1CH U160 64MB CACHE
+weight: 1.00000
+*************************** 4. row ***************************
+a: ACCELERAID 170 SCSI RAID 1CH U160 32MB CACHE
+weight: 1.00000
+*************************** 5. row ***************************
+a: ACCELERAID 352 2 CHANNEL PCI U160 SCSI RAID ADPTR 32MB NO BBU
+weight: 1.00000
+*************************** 6. row ***************************
+a: ACCELERAID 170LP SCSI RAID 1CH U160 16MB W/SW
+weight: 1.00000
+*************************** 7. row ***************************
+a: ACCELERAID 352 2 CHANNEL PCI U160 SCSI RAID ADPTR 64MB NO BBU
+weight: 1.00000
+*************************** 8. row ***************************
+a: ACCELERAID 150 1CH PCI ULTRA2 SCSI LVD RAID MODEL 4MB EDO CACHE
+weight: 1.00000
+*************************** 9. row ***************************
+a: Mylex AcceleRAID 150 - storage controller (RAID) - Ultra2 Wide SCSI - PCI
+weight: 0.50000
+
+This example demonstrates how the number of occurrences affects the weight.
+
+2. Term proximity
+
+3. Common Word Handling
+MySQL does not skip words that appear in >50% of documents in
+boolean mode.
+
+4. Case selectivity
+SELECT a, cnet_weight(a, 'AC21') AS weight FROM t2
+WHERE MATCH a AGAINST ('AC21' IN BOOLEAN MODE)
+ORDER BY weight DESC LIMIT 10;
++------------------------------------+---------+
+| a | weight |
++------------------------------------+---------+
+| AC21 - Adapter Cable | 2.00000 |
+| AC21 | 2.00000 |
+| JWIN JV AC21 - video cable - 60 ft | 1.00000 |
++------------------------------------+---------+
+
+SELECT a, cnet_weight(a, 'ac21') AS weight FROM t2
+WHERE MATCH a AGAINST ('ac21' IN BOOLEAN MODE)
+ORDER BY weight DESC LIMIT 10;
++------------------------------------+---------+
+| a | weight |
++------------------------------------+---------+
+| AC21 - Adapter Cable | 1.00000 |
+| AC21 | 1.00000 |
+| JWIN JV AC21 - video cable - 60 ft | 0.50000 |
++------------------------------------+---------+
+
+Terms that match in a case-sensitive fashion yield a higher
+relevance score.
+
+5. Weighting relevancy
+This point is not covered by the example. However, the weighting
+function could easily be extended to meet this requirement.
+
+For example, cnet_weight(text, query) could be extended to:
+cnet_weight(field0, weight0, ... fieldN, weightN, query).
+This function could then be used to set a relevancy factor
+for each field, e.g.:
+cnet_weight(title, 3.0, keywords, 2.0, body, 1.0, query)
+
+6. Intra-Word delimiters
+SELECT a, cnet_weight(a, 'C++') AS weight FROM t2
+WHERE MATCH a AGAINST ('C++' IN BOOLEAN MODE)
+ORDER BY weight DESC LIMIT 10;
++---------------------------------------------------------------------------------------+---------+
+| a | weight |
++---------------------------------------------------------------------------------------+---------+
+| C++Builder Professional - ( v. 6 ) - 1 year upgrade plan | 2.00000 |
+| C++Builder Enterprise - 1 year upgrade plan | 2.00000 |
+| AE VC++.NET STD 2003 WIN32 EN CD | 2.00000 |
+| UPG-V BORLAND C++ BUILDER 6 PRO C++BLDR PRO 3.X OR HIGHER 98/W2K | 2.00000 |
+| UPG-V BORLAND C++ BUILDER 6 ENT C++BLDR ENT OR PRO 3.X OR HIGHER | 2.00000 |
+| VC++.NET STD 2003 WIN32 EN CD | 2.00000 |
+| CE TOOLKIT FOR VISUAL C++ V6.0 CD WNT V/U CE TOOLKIT FOR VISUAL C++ V6.0 CD WNT V/U | 1.33333 |
+| CE TOOLKIT FOR VISUAL C++ V6.0 CD NT CE TOOLKIT FOR VISUAL C++ V6.0 CD NT | 1.33333 |
+| VLA BORLAND C++ BUILDER V6 ENT UPG ENT V5/4/3 LIC | 1.00000 |
+| Intel C++ Compiler for Windows - 1 year upgrade plan (renewal) | 1.00000 |
++---------------------------------------------------------------------------------------+---------+
+
+SELECT a, cnet_weight(a, 'u.s. cellular') AS weight FROM t2
+WHERE MATCH a AGAINST ('u.s. cellular' IN BOOLEAN MODE)
+ORDER BY weight DESC LIMIT 10;
++------------------------------------------------------------------------+---------+
+| a | weight |
++------------------------------------------------------------------------+---------+
+| 5.8 oz=Up to 180 min =U.S. Cellular | 1.00000 |
+| Motorola V120x (U.S. Cellular) | 0.33333 |
+| 3.9 oz=Up to 120 min =CDMA 1900 =U.S. Cellular | 0.33333 |
+| 3.4 oz=Up to 240 min =CDMA 800/1900 =U.S. Cellular | 0.33333 |
+| Kyocera Slider SE47 (U.S. Cellular) | 0.25000 |
+| 5.3 oz=Up to 210 min =D-AMPS 800/1900 / AMPS 800 =U.S. Cellular | 0.25000 |
+| 3.8 oz=Up to 210 min =D-AMPS 800/1900 / AMPS 800 =U.S. Cellular | 0.25000 |
+| 4.5 oz=Up to 150 min =CDMA2000 1X 1900/800 / AMPS 800 =U.S. Cellular | 0.25000 |
+| 4.2 oz=Up to 300 min =D-AMPS 800/1900 / AMPS 800 =U.S. Cellular | 0.25000 |
++------------------------------------------------------------------------+---------+
+
+The parser preloads a list of "always-index" words from ordinary-dict.txt. The
+format of ordinary-dict.txt is one word per line. With this approach, words
+containing intra-word delimiters can be used as single words.
+
+7. Text literals
+To add good examples, the set of synonyms needs to be extended.
+
+
+
+----- End forwarded message -----
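
The "always-index" word list described under point 6 of the forwarded message can be illustrated outside the server. The Python sketch below is not the plugin's parser and makes no claim about how it scores or splits text. It only assumes the one-word-per-line file format stated in the message, plus case-insensitive matching, which is a guess. The function names load_always_index and tokenize are hypothetical, and the two dictionary entries stand in for the real contents of ordinary-dict.txt, which are not shown in the patch.

```python
import io
import re


def load_always_index(lines):
    """Read one word per line, skipping blank lines.

    Words are lowercased so that lookups are case-insensitive (an assumption).
    """
    return {line.strip().lower() for line in lines if line.strip()}


def tokenize(text, always_index):
    """Split text into lowercase tokens, keeping always-index words whole.

    Listed words are tried first, longest first, so 'c++' is preferred over
    a plain 'c'. Anything else falls back to runs of word characters.
    """
    listed = sorted(always_index, key=len, reverse=True)
    alternatives = [re.escape(word) for word in listed] + [r"\w+"]
    pattern = re.compile("|".join(alternatives), re.IGNORECASE)
    return [match.group(0).lower() for match in pattern.finditer(text)]


# Hypothetical dictionary contents standing in for ordinary-dict.txt.
words = load_always_index(io.StringIO("c++\nu.s.\n"))

print(tokenize("Intel C++ Compiler for Windows", words))
# ['intel', 'c++', 'compiler', 'for', 'windows']
print(tokenize("Motorola V120x (U.S. Cellular)", words))
# ['motorola', 'v120x', 'u.s.', 'cellular']
```

Without the word list, the same tokenizer would split "U.S." into "u" and "s", which is why the list has to be loaded before indexing. This simple sketch does not recognize a listed word embedded in a longer run of word characters such as "VC++.NET", so it should not be read as reproducing the results shown in the message.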