I am working hard to optimize full-text search (FTS) over 1.65 million small texts (about
300 words each). In tests with a smaller database of 300,000 texts, performance with the
built-in FTS is very poor: a cold-cache COUNT(*) search for a single word takes up to
180 seconds.
I am considering several approaches to this problem:
- The MySQL Manual explicitly says that FTS needs fine-tuning, by changing
internal parameters and redefining the stop words.
To do that, I am working with the ft_dump tool to analyze the word frequency and
- I am also working on a different approach, a brute-force FTS that explicitly builds
3 tables: correlation, text and dictionary.
The Correlation table has 3 int fields (text, word position and word) with 3
indexes, on which I will do the binary searches.
The Text table is simply the text and the int field (prim_key) associated to the
The Dictionary table is a list of words and the prim_key.
For indexing, I split each text into its words and insert them in the correlation
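
The three-table layout above can be sketched roughly as follows. This is only a minimal illustration, not my actual code: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table names (texts, dictionary, correlation), the column names (text_id, position, word_id) and the helper functions are hypothetical. The simple \w+ word splitting is also an assumption.

```python
import re
import sqlite3


def build_index(conn, texts):
    """Create the three tables and record every word occurrence."""
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE texts (prim_key INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE dictionary (prim_key INTEGER PRIMARY KEY, word TEXT UNIQUE);
        CREATE TABLE correlation (text_id INTEGER, position INTEGER, word_id INTEGER);
        -- one index per int field, as described in the text
        CREATE INDEX corr_text ON correlation (text_id);
        CREATE INDEX corr_position ON correlation (position);
        CREATE INDEX corr_word ON correlation (word_id);
    """)
    for body in texts:
        cur.execute("INSERT INTO texts (body) VALUES (?)", (body,))
        text_id = cur.lastrowid
        # Assumed tokenization: lowercase, runs of word characters.
        for position, word in enumerate(re.findall(r"\w+", body.lower())):
            # Add the word to the dictionary only if it is not there yet.
            cur.execute("INSERT OR IGNORE INTO dictionary (word) VALUES (?)", (word,))
            cur.execute("SELECT prim_key FROM dictionary WHERE word = ?", (word,))
            word_id = cur.fetchone()[0]
            cur.execute(
                "INSERT INTO correlation (text_id, position, word_id) VALUES (?, ?, ?)",
                (text_id, position, word_id),
            )
    conn.commit()


def count_texts_with(conn, word):
    """Count the distinct texts that contain the given word."""
    row = conn.execute(
        """SELECT COUNT(DISTINCT c.text_id)
           FROM correlation c
           JOIN dictionary d ON d.prim_key = c.word_id
           WHERE d.word = ?""",
        (word.lower(),),
    ).fetchone()
    return row[0]
```

With this layout, the COUNT(*) test for a single word becomes a lookup in the dictionary followed by an indexed scan of the correlation table, e.g. count_texts_with(conn, "fox").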
To understand which approach is better, I am looking for information on how FTS is
implemented in MySQL, but I cannot find any on the site, especially about the
space required and the computational cost of the search operation (and of the indexing operation).
I found some information on the N3Labs page about full-text search algorithms, but nothing
about the FTS implemented in MySQL.
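
For the brute-force approach at least, the raw space can be estimated from the numbers above. The sketch below is only a back-of-the-envelope calculation under stated assumptions: 1.65 million texts of about 300 words each, every word occurrence stored (no stop words removed), and three 4-byte INT fields per correlation row. The three indexes and any per-row storage overhead are not included, so the real size would be larger.

```python
TEXTS = 1_650_000        # assumed from "1.65 million texts"
WORDS_PER_TEXT = 300     # approximate average from the description
FIELDS_PER_ROW = 3       # text, word position, word
INT_BYTES = 4            # size of a MySQL INT column


def correlation_rows(texts=TEXTS, words_per_text=WORDS_PER_TEXT):
    """One correlation row per word occurrence."""
    return texts * words_per_text


def correlation_data_bytes(rows, fields=FIELDS_PER_ROW, int_bytes=INT_BYTES):
    """Raw column data only, excluding indexes and row overhead."""
    return rows * fields * int_bytes


rows = correlation_rows()
data_bytes = correlation_data_bytes(rows)
print(f"{rows:,} rows, about {data_bytes / 1e9:.2f} GB of raw column data")
```

So the correlation table alone would hold roughly 495 million rows and close to 6 GB of raw data before indexes, which is why removing frequent stop words matters so much for this design.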
bits, clicks and results
Technology Director
H O M E W O R K S
r. helena, 275 - 5º andar
são paulo - tel.3845-0722