At 7:37 AM +1000 4/11/06, Taco Fleur wrote:
>Hi all,
>
>I am trying to find out whether it is possible to return the count of
>keywords found in a text field, does anyone know?
>For example; "ColdFusion or Java" is entered in the search string and 20
>records are found that match, I need to count how many times "ColdFusion"
>and "Java" appears in each match, add those two and than sort descending on
>that total.
>
>I was actually using verity for the search, but the client insists he sees
>the number of keywords found, which Verity does not do.
>
>The text searched are Résumé's, I initially thought that MySQL could
> search
>the résumé's when stored as binary data, but I was wrong, so I now have
> the
>CV's converted to HTML and then store them in the DB as VARCHAR
>
>Any help would be much appreciated, I am having a hard time coming from a MS
>SQL background ;-)
There's no function that I know of to do that directly; however, you
could do something like this:
select
(length(your_text_field)-length(replace(your_text_field,
'coldfusion','')))/length('coldfusion')
as wordcount
from
your_table
That is, it removes all instances of 'coldfusion' from your string,
gets the difference in length from the unaltered string, and divides
that by the number of characters in your search string. I've used
this method several times.
Of course, if you want to avoid matching against terms like
'javalike' or 'coldfusionista' then you've gotta do some additional
checking, for example:
select
(length(your_text_field)-length(replace(concat('
',your_text_field,' '), ' coldfusion ','')))/length(' coldfusion ')
as wordcount
from
your_table
See
http://dev.mysql.com/doc/refman/4.1/en/string-functions.html
for more info.
If you're doing this alot, it might be more efficient to build a word
index table like:
word char(32) not null # or whatever your max word
length is likely to be
word_count integer unsigned not null
document_id integer unsigned not null
where document_id is a foreign key pointing at the table containing
your fulltext. This would be easier to extend to handle synonym
handling too, and you could do all the suffix handling/stemming you
need (eg; to take care of plurals). I've done something like that as
well, and included an extra field for the metaphone version of the
word, to match approximate spellings. If the text fields were all in
plain text, you could even include character positions like
word char(32) not null
word_position integer unsigned not null
document_id integer unsigned not null
then you could get word counts by doing a select count(word).
steve
> Kind regards,
>
>
>Taco Fleur
>
>Free Call 1800 032 982 or Mobile 0421 851 786
>Pacific Fox <http://www.pacificfox.com.au/> http://www.pacificfox.com.au an
>industry leader with commercial IT experience since 1994
>
>*
>
> Web Design and Development
>*
>
> SMS Solutions, including developer API
>*
>
> Domain Registration, .COM for as low as fifteen dollars a year,
>.COM.AU for fifty dollars two years!
>
>
--
+--------------- my people are the people of the dessert, ---------------+
| Steve Edberg http://pgfsun.ucdavis.edu/ |
| UC Davis Genome Center sbedberg@stripped |
| Bioinformatics programming/database/sysadmin (530)754-9127 |
+---------------- said t e lawrence, picking up his fork ----------------+