List: General Discussion
From: Roberto Slepetys Ferreira
Date: May 21 2003 2:55pm
Subject: Full text search algorithm
Hi MySQLers,

I am working hard to optimize full-text search (FTS) over 1.65 million small texts (about 300 words each).

In tests with a smaller database of 300,000 texts, performance with the built-in FTS is
very poor: with a cold cache, a COUNT(*) search for a single word takes up to 180
seconds.

I am considering several approaches to this problem:

    - The MySQL Manual explicitly says that FTS needs fine-tuning, by changing
internal parameters and redefining the stop words.
      To do that, I am working with the ft_dump tool to analyze the word frequency and

    - I am also working on a different approach, a brute-force FTS that explicitly builds
3 tables: correlation, text and dictionary.
      The Correlation table has 3 int fields (text, word position and word) with 3
indexes, on which I will do the binary searches.
      The Text table is simply the text plus the int field (prim_key) that links it to the
correlation table.
      The Dictionary table is a list of words, each with its prim_key.
      To index it, I split out all the words of each text, and insert in the correlation
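
To make the three-table layout concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for MySQL so that it runs on its own. The table and column names (texts, dictionary, correlation, text_id, word_pos, word_id), the \w+ word splitting and the lowercasing are all assumptions made for illustration, not the actual schema.

```python
import re
import sqlite3


def build_index(texts):
    """Build the three tables in an in-memory SQLite database (stand-in for MySQL)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE texts (text_id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE dictionary (word_id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
        CREATE TABLE correlation (
            text_id  INTEGER NOT NULL,
            word_pos INTEGER NOT NULL,
            word_id  INTEGER NOT NULL
        );
        -- One index per int field, as described above (assumed to be single-column).
        CREATE INDEX corr_text ON correlation (text_id);
        CREATE INDEX corr_pos  ON correlation (word_pos);
        CREATE INDEX corr_word ON correlation (word_id);
    """)
    for body in texts:
        text_id = db.execute("INSERT INTO texts (body) VALUES (?)", (body,)).lastrowid
        # Split the text into lowercase words; the splitting rule is an assumption.
        for word_pos, word in enumerate(re.findall(r"\w+", body.lower())):
            db.execute("INSERT OR IGNORE INTO dictionary (word) VALUES (?)", (word,))
            (word_id,) = db.execute(
                "SELECT word_id FROM dictionary WHERE word = ?", (word,)
            ).fetchone()
            db.execute(
                "INSERT INTO correlation (text_id, word_pos, word_id) VALUES (?, ?, ?)",
                (text_id, word_pos, word_id),
            )
    db.commit()
    return db


def count_texts_with_word(db, word):
    """Count the texts containing the word: the COUNT(*) single-word search."""
    row = db.execute(
        """
        SELECT COUNT(DISTINCT c.text_id)
        FROM dictionary d
        JOIN correlation c ON c.word_id = d.word_id
        WHERE d.word = ?
        """,
        (word.lower(),),
    ).fetchone()
    return row[0]
```

With this layout a single-word search is one lookup in the dictionary followed by an index range scan on the word field of the correlation table. The word position field is there so that phrase or proximity searches could be added later.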

To understand which approach is better, I am looking for information on how FTS is
implemented in MySQL, but I cannot find any on the site, especially about the space
required and the computational cost of the search operation (and of the index operation).
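
For comparison, the raw size of the brute-force correlation table can be estimated on the back of an envelope. The small Python sketch below assumes 4-byte ints and counts only the three data fields. It ignores per-row storage overhead and the three indexes, so the real footprint would be larger. The function name is hypothetical.

```python
def correlation_table_size(num_texts, words_per_text, int_bytes=4, fields=3):
    """Return (rows, raw_bytes) for one correlation row per word occurrence."""
    rows = num_texts * words_per_text
    return rows, rows * fields * int_bytes


# Figures from this message: 1.65 million texts of about 300 words each.
rows, raw_bytes = correlation_table_size(1_650_000, 300)
print(f"{rows:,} rows, about {raw_bytes / 1e9:.2f} GB before indexes")
```

Under these assumptions the table holds 495 million rows and close to 6 GB of raw data for the full collection.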

I found some information on the N3Labs page about full-text search algorithms, but nothing
about the FTS implemented in MySQL.



Roberto Slepetys
Technology Director
H O M E W O R K S  
bits, clicks and results
r. helena, 275 - 5º andar
são paulo - tel.3845-0722
