In the last episode (May 20), Roberto Slepetys Ferreira said:
> I am working on a big indexing system that will index almost 10
> years of a newspaper: about 300 articles/day, about 4 KB (350 words)
> each, i.e. 10*365*300 = 1,095,000 articles (about 4.3 GB of text).
> This is my first use of MySQL's Full Text Search, and I am having
> some problems:
> 1st) I'd like to access the dictionary table directly, to retrieve
> all the words that were indexed. This will be used to build the
> front end, allowing the user to search only on words that really
> exist in the dictionary table.
If you build from source, there's a debugging program called ft_dump
that gets built in the myisam/ directory. It can dump the wordlist,
plus per-word stats if you want them.
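For example, something along these lines (the index number and flags may
differ in your version; run ft_dump with no arguments to see its usage
message, and note the table name is hypothetical):

```
# dump the fulltext wordlist of index #1 of table "articles",
# run from the database's data directory
./ft_dump -d /var/lib/mysql/news/articles 1

# -s prints per-word statistics instead of the raw dump
./ft_dump -s /var/lib/mysql/news/articles 1
```

You could capture that output once and load it into a lookup table that
your front end queries.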
> 2nd) I like to limit the grown of the dictionary to words from 4
> letters to 25 letters.
Set ft_min_word_len=4 and ft_max_word_len=25 before building the index.
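That is, in my.cnf:

```
[mysqld]
ft_min_word_len=4
ft_max_word_len=25
```

Note that these only affect indexes built after the change; to apply
them to an existing table you need to rebuild its fulltext index, e.g.
with REPAIR TABLE articles QUICK (table name is just an example).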
> 3rd) How can I improve the search speed? The primary search is always
> a COUNT(*) of occurrences of some word in the text table.
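(For reference, the kind of query being described looks like this; the
table and column names are made up:)

```sql
-- count articles matching a word via the fulltext index
SELECT COUNT(*)
  FROM articles
 WHERE MATCH(body) AGAINST('election');
```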
> 4th) I ran some performance tests, and I noticed that the first
> search for a very common word (about 1% of the data) takes 100 times
> as long as an identical second search. This smells to me like a very
> sophisticated cache mechanism: how does it work? Could I have some
> information on that? Could I tune it?
I think what you're seeing is mysql retrieving the record list for each
word in your query. After the first query, the lists are cached in
mysql's key buffer, and the second query doesn't have to do any disk
I/O. Regular indexes benefit the same way, but the effect is more
pronounced with fulltext indexes because an entry in a fulltext index
is much larger than a regular index entry.
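So if you can spare the memory, enlarging the key buffer helps keep
those record lists cached across queries. For example, in my.cnf (the
size is only an illustration; pick one that fits your RAM and the total
size of your .MYI files):

```
[mysqld]
key_buffer_size=256M
```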