On 22 Apr 1999 12:55:36 +0100
"Robin Bowes" <robin.bowes@stripped> wrote:
> Don's query is exactly the same as mine, only using slightly different
> syntax.
>
> Substitute a for s1 and b for s2 and you're there...
>
> You might like to check the archives for more on this issue - I seem to
> remember a similar technique that allows the results to be "scored" ie
> return the results ordered by the number of words that match but I don't
> recall the details.
I've had good results using this scheme:
documents
docid
url
title
search
docid
word
then to get a list of matchings docs:
SELECT COUNT(*) as hits, d.docid, d.url, d.title
FROM search as s, documents as d
WHERE (s.word = 'word1' OR
s.word = 'word2' OR
s.word = 'word3') AND
s.docid = d.docid
GROUP BY s.docid
HAVING hits >= 3
will do an AND match (all three terms must be there). To do an OR match
(any terms will match), just remove the HAVING BY line. You can return
results with best matches first (most words matching) by adding in
'ORDER BY hits DESC' to the query. I also like the scheme because you
can add a 'score' field to the search table and give ceartain words a
better score, you can then pull out the score of the match using
SUM(score).
My search table currently has 2.5 million words, and there is no
noticeable delay for the user searching (through a web page).
Hope this helps,
Alex
-------------------- Gossamer Threads Inc. ----------------------
Alex Krohn Email: alex@stripped
Internet Consultant Phone: (604) 922-4536
http://www.gossamer-threads.com Fax : (604) 926-9320
PGP Public Key: http://www.gossamer-threads.com/pgpkey.txt
Page me Online: http://wwp.mirabilis.com/1591987