MySQL Lists are EOL. Please join:

List:General Discussion« Previous MessageNext Message »
From:Sander Pilon Date:September 10 2000 4:14pm
Subject:RE: search engine.
View as plain text  
And be sure to check out the FULLTEXT search capabilities, I believe they
were made
exactly for this purpose. See the manual for details.

Sander

> -----Original Message-----
> From: James Treworgy [mailto:jamie@stripped]
> Sent: 10 September 2000 17:52
> To: Nicolas Moldavsky
> Cc: mysql@stripped
> Subject: Re: search engine.
>
>
>  From past discussions I believe what is the most efficient way
> to be able
> to accomodate all types of searching is not to try to tokenize
> your entire
> string (you don't save that much space that way, particularly if the
> documents are large) - tokenize only the unique words to perform your
> initial search, and then search ONLY the matching documents using a
> substring search (e.g. LIKE to perform a phrase search.  This way
> you get a
> fast search to filter out most of the non-matching documents on a much
> smaller index database, and then to match exact phrases you can do it the
> old fashioned way.
>
> e.g.
>
> tbl_tokens (TokenID, Word)
> tbl_masterindex (DocID, TokenID, WordCountID) <-- the last field
> containst
> the # of occurrences of the word in the document, useful for ordering
> results by relevance
> tbl_docs (DocID, Body)  <-- Body contains the fulll text
>
> Select * from tbl_masterindex LEFT JOIN tbl_tokens ON
> tbl_masterindex.TokenID = tbl_tokens.TokenID where Word =
> 'firstsearchterm'
> (OR|AND) word='secondsearchterm' .. etc
>
> (now you build a where clause based on results... you could probably
> integrate these two SQL statements but I think it's more
> efficient to do it
> separately rather than having a three-level join... any comments from
> efficiency gurus on this?)
>
> Select * from tbl_docs WHERE (... your where clause based on results of
> previous query to select correct DocIDs ..) AND Body like '%full phrase
> search if needed%'
>
> I don't remember reading anything about the optimization of this final
> query to answer the question of, "does it try to search all the documents
> for the full phrase, or will it only try to do the LIKE if the
> part before
> the AND is met").
>
> Anyway there is a lot of room for inventiveness and discussion in
> optimizing the queries here. But the basic point is how you organize the
> information so that you minimize both the amount of data stored in your
> index, and the number of records in which you must do a substring search.
>
> Jamie
>
> At 11:22 AM 9/10/00, Nicolas Moldavsky wrote:
> >It all depends on what table structure you use.  The best schema I've
> >found is having a documents table, a words table and a table holding the
> >relationship between these two: document_word.   The latter should have a
> >document_id, a word_id and integer specifying the position of
> each word in
> >the document.
> >
> >With this structure you can do OR queries (any of the words) easily, but
> >AND queries (all the words) are a bit more difficult because you have to
> >do self joins of the document_word and the words table.    The 'exact
> >phrase' query should be similar to the AND query but you have to
> take into
> >account that the position of the words should be sequential and in order.
>
>
> --
> ---------------------------------------------------------------------
> Please check "http://www.mysql.com/php/manual.php" before
> posting. To request this thread, e-mail mysql-thread50205@stripped
>
> To unsubscribe, send a message to:
>     <mysql-unsubscribe-sander=3dnews.net@stripped>
>
> If you have a broken mail client that cannot send a message to
> the above address(Microsoft Outlook), you can use
http://lists.mysql.com/php/unsubscribe.php

Thread
search engine.Ariel Manzur10 Sep
  • Re: search engine.James Treworgy10 Sep
  • Re: search engine.Nicolas Moldavsky10 Sep
    • Re: search engine.James Treworgy10 Sep
      • RE: search engine.Sander Pilon10 Sep
        • RE: search engine.James Treworgy10 Sep
          • RE: search engine.Sander Pilon10 Sep
            • Compile warningSander Pilon10 Sep
              • FullText documentation / code bugSander Pilon10 Sep
                • MySQL *crash* on FulltextSander Pilon10 Sep
                  • RE: MySQL *crash* on Fulltext & FullText conclusion so farSander Pilon10 Sep
                • Re: FullText documentation / code bugRob McMillin10 Sep
                  • RE: FullText documentation / code bugSander Pilon10 Sep
Re: search engine.Ariel Manzur11 Sep
  • Re: search engine.James Treworgy11 Sep
    • Re: search engine.Ariel Manzur13 Sep
Re: search engine.James Treworgy11 Sep