Jasper Bryant-Greene wrote:
>Hi
>
>
>
>I'm running a small search engine that indexes the web, but have been having
>trouble with optimising it to handle the load.
>
>There are two tables involved in searching - `pages`, which stores the ID,
>URL, title and crawl date, and `words` which has two rows - `word` and
>`page` where `word` is a word found on the page and `page` is the ID from
>the `pages` table.
>
>
>
>When we crawl an URL we rip all the words from the page and add them to the
>`words` table with the ID of the page.
>
>
>
>The query we use for searches is:
>
>
>
>SELECT COUNT(words.word) AS score, words.page AS id, pages.title, pages.url
>FROM words,pages WHERE pages.id=words.page AND words.word IN($words) GROUP
>BY words.page ORDER BY score DESC LIMIT 10
>
>
Not sure which columns are indexed, but the main problem is almost certainly
"words.word IN($words)" - this will yield a set of rows from the 'words'
table
assuming that the string "$words" contains the query terms(s), this
requires an
expensive "serial walk" of the "words" table and an expensive string
matching
operation associated with each row - unless the IN() operator is a lot
cleverer
than I suspect.
A better approach would, perhaps, be to parse the query into an array of
words in the application and then construct suitable SQL along the lines of
.... words.word = qword1 OR words.word = qword2 OR words.word = qword3
...... etc with as many or few terms as required.
Index the words table on word for a further really big performance boost .
>
>
>I've put the LIMIT 10 in there because it's been going slow as hell. not
>only that but it's still going rather slow since we're getting rather high
>load on the search engine at the moment.
>
>
>
>If anyone could suggest ways to make it run faster that'd be great, bearing
>in mind that:
>
>
>
>a) I can't change MySQL server parameters since the host won't allow it
>
>b) I'd rather not start crawling again with a different method - the
>words table has over 1,700,000 rows.
>
>
>
>Thanks
>
>
>
>>Jasper Bryant-Greene
>>Cabbage Promotions
>> <mailto:jasp@stripped> jasp@stripped
>> <http://fatalnetwork.com/> http://fatalnetwork.com/
>>US: +1 (509) 691 3287
>>NZ: +64 (21) 232 3303
>>
>>
>
>
>
>