List: General Discussion
From: Roberto Slepetys Ferreira
Date: May 21 2003 2:55pm
Subject: Full text search algorithm
Hi MySQLers,

I am working hard to optimize full-text search (FTS) over 1.65 million small texts (about 300 words each).

In tests with a smaller database of 300,000 texts, the performance of the built-in FTS is
very poor: with a cleared cache, a COUNT(*) search for a single word takes up to 180
seconds.

I am considering several approaches to this problem:

    - The MySQL manual explicitly says that FTS needs fine-tuning, by changing internal
parameters and redefining the stop words.
      To do that, I am using the ft_dump tool to analyze word frequency and
distribution.

    - I am also working on a different, brute-force approach to FTS, explicitly building
3 tables: correlation, text and dictionary.
      The correlation table has 3 int fields (text, word position and word) with 3
indexes, on which I will do the binary searches.
      The text table is simply the text plus the int field (prim_key) that links it to the
correlation table.
      The dictionary table is a list of words and their prim_key.
      To index, I split each text into words and insert one row per word into the
correlation table.
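
The three-table layout described above can be sketched as follows. This is only a minimal illustration, not a tuned implementation: it uses Python's sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table names, column names, tokenizer and the choice of one index per column are all assumptions made for the example.

```python
import re
import sqlite3

# Assumption: sqlite3 stands in for MySQL; all names below are hypothetical.
SCHEMA = """
CREATE TABLE texts (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE dictionary (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
CREATE TABLE correlation (
    text_id  INTEGER NOT NULL,  -- which text the word occurs in
    position INTEGER NOT NULL,  -- word offset inside that text
    word_id  INTEGER NOT NULL   -- key into the dictionary table
);
CREATE INDEX idx_corr_text ON correlation (text_id);
CREATE INDEX idx_corr_position ON correlation (position);
CREATE INDEX idx_corr_word ON correlation (word_id);
"""


def tokenize(body):
    """Split a text into lowercase words (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", body.lower())


def index_text(conn, body):
    """Store one text and insert one correlation row per word occurrence."""
    text_id = conn.execute(
        "INSERT INTO texts (body) VALUES (?)", (body,)
    ).lastrowid
    for position, word in enumerate(tokenize(body)):
        # SQLite syntax; the MySQL counterpart would be INSERT IGNORE.
        conn.execute(
            "INSERT OR IGNORE INTO dictionary (word) VALUES (?)", (word,)
        )
        (word_id,) = conn.execute(
            "SELECT id FROM dictionary WHERE word = ?", (word,)
        ).fetchone()
        conn.execute(
            "INSERT INTO correlation (text_id, position, word_id) "
            "VALUES (?, ?, ?)",
            (text_id, position, word_id),
        )
    return text_id


def count_texts_with(conn, word):
    """Count the distinct texts that contain the given word."""
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT c.text_id) "
        "FROM correlation AS c "
        "JOIN dictionary AS d ON d.id = c.word_id "
        "WHERE d.word = ?",
        (word.lower(),),
    ).fetchone()
    return count


def build_demo():
    """Build a tiny in-memory index over three sample texts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for body in (
        "MySQL full text search",
        "Search the text index",
        "Stop words matter",
    ):
        index_text(conn, body)
    return conn
```

With this layout a single-word search touches only integer columns: one lookup in the dictionary, then an index scan on the word column of the correlation table. Grouping the correlation rows by word would also give the word frequencies needed to pick stop words.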

To decide which approach is better, I am looking for information on how FTS is
implemented in MySQL, but I cannot find any on the site, especially about the space
required and the computational cost of the search operation (and of the indexing
operation).
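
For the explicit three-table approach, at least, the space and search cost can be estimated roughly from the numbers above. The sketch below assumes 4-byte integers, no per-row overhead, no stop-word removal and an idealized binary search over a sorted index; it ignores the size of the indexes themselves, so the real figures would be higher. The function name and parameters are hypothetical.

```python
import math


def correlation_table_estimate(num_texts, words_per_text,
                               columns=3, int_bytes=4):
    """Rough size of the correlation table and cost of one lookup.

    Assumptions: one row per word occurrence, fixed-width integer
    columns, no row overhead, indexes not counted.
    """
    rows = num_texts * words_per_text
    data_bytes = rows * columns * int_bytes
    # Comparisons needed by an idealized binary search over `rows` keys.
    search_steps = math.ceil(math.log2(rows))
    return rows, data_bytes, search_steps


rows, data_bytes, search_steps = correlation_table_estimate(1_650_000, 300)
print(rows, data_bytes / 1e9, search_steps)
```

Under these assumptions, 1.65 million texts of about 300 words each give 495 million correlation rows, about 5.94 GB of raw data before indexes, and roughly 29 comparisons to locate a key.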

I found some information on the N3Labs page about full-text search algorithms, but nothing
about the FTS implemented in MySQL.

Thanks
Slepetys

www.homeworks.com.br
bits, clicks and results


   

Roberto Slepetys
Technology Director
....................................
H O M E W O R K S
bits, clicks and results
....................................
r. helena, 275 - 5º andar
são paulo - tel.3845-0722
www.homeworks.com.br


