List:General Discussion« Previous MessageNext Message »
From:Peter Burden Date:January 30 2004 11:51pm
Subject:Re: MySQL optimisations for search engine
View as plain text  
Jasper Bryant-Greene wrote:

>Hi
>
> 
>
>I'm running a small search engine that indexes the web, but have been having
>trouble with optimising it to handle the load.
>
>There are two tables involved in searching - `pages`, which stores the ID,
>URL, title and crawl date, and `words` which has two rows - `word` and
>`page` where `word` is a word found on the page and `page` is the ID from
>the `pages` table.
>
> 
>
>When we crawl an URL we rip all the words from the page and add them to the
>`words` table with the ID of the page.
>
> 
>
>The query we use for searches is:
>
> 
>
>SELECT COUNT(words.word) AS score, words.page AS id, pages.title, pages.url
>FROM words,pages WHERE pages.id=words.page AND words.word IN($words) GROUP
>BY words.page ORDER BY score DESC LIMIT 10
>  
>

Not sure which columns are indexed, but the main problem is almost certainly
"words.word IN($words)" - this will yield a set of rows from the 'words' 
table
assuming that the string "$words" contains the query terms(s), this 
requires an
expensive "serial walk" of the "words" table and an expensive string 
matching
operation associated with each row - unless the IN() operator is a lot 
cleverer
than I suspect.

A better approach would, perhaps, be to parse the query into an array of
words in the application and then construct suitable SQL along the lines of
.... words.word = qword1 OR words.word = qword2 OR words.word = qword3
    ...... etc with as many or few terms as required.

Index the words table on word for a further really big performance boost .

> 
>
>I've put the LIMIT 10 in there because it's been going slow as hell. not
>only that but it's still going rather slow since we're getting rather high
>load on the search engine at the moment.
>
> 
>
>If anyone could suggest ways to make it run faster that'd be great, bearing
>in mind that:
>
> 
>
>a)      I can't change MySQL server parameters since the host won't allow it
>
>b)      I'd rather not start crawling again with a different method - the
>words table has over 1,700,000 rows.
>
> 
>
>Thanks
>
>  
>
>>Jasper Bryant-Greene
>>Cabbage Promotions
>> <mailto:jasp@stripped> jasp@stripped
>> <http://fatalnetwork.com/> http://fatalnetwork.com/
>>US: +1 (509) 691 3287
>>NZ: +64 (21) 232 3303
>>    
>>
>
>
>  
>


Thread
MySQL optimisations for search engineJasper Bryant-Greene30 Jan
  • Re: MySQL optimisations for search enginePeter Burden31 Jan
RE: MySQL optimisations for search engineAndrew Braithwaite1 Feb