List:General Discussion« Previous MessageNext Message »
From:Sascha Mantscheff Date:November 13 2000 10:27pm
Subject:Performance problems and query/table design
View as plain text  
Here my question:

The explanation of a query reads:

+----------------+--------+----------------------------------+---------+---------+--------------------+------+-------------+
| table          | type   | possible_keys                    | key     |
key_len | ref                | rows | Extra       |
+----------------+--------+----------------------------------+---------+---------+--------------------+------+-------------+
| ft_temp1       | ref    | id,word_no,id_source,id_source_2 | word_no
|       1 | ???                |   19 |             |
| ft_words       | eq_ref | PRIMARY                          | PRIMARY
|       4 | ft_temp1.id_word   |    1 | Using index |
| ft_sources     | eq_ref | PRIMARY                          | PRIMARY
|       4 | ft_temp1.id_source |    1 |             |
| ft_occurrences | ref    | id_word,id_source                | id_word
|       4 | ft_temp1.id_word   | 9400 | where used  |
+----------------+--------+----------------------------------+---------+---------+--------------------+------+-------------+

Obviously the key id_word in the ft_occurrences table is not used. Why?
Can I force it?

============================================

And here the explanation for my question:

I'm using MySQL for a full text index on our website. For this search
I'm using three tables with the following structure:


ft_words;
+--------------+------------------+------+-----+---------+-------+
| Field        | Type             | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+-------+
| id           | int(10) unsigned |      | PRI | 0       |       |
| word         | varchar(30)      |      | UNI |         |       |
| source_count | int(11)          | YES  |     | NULL    |       |
+--------------+------------------+------+-----+---------+-------+

ft_sources;
+------------+-------------------------------------------------+------+-----+---------+-------+
| Field      | Type                                            | Null |
Key | Default | Extra |
+------------+-------------------------------------------------+------+-----+---------+-------+
| id         | int(10) unsigned                                |      |
PRI | 0       |       |
| sourcetype | enum('B','H','G','F','M','N','NE','P','PE','Q') |     
|     | B       |       |
| sourcename | varchar(50)                                     |     
|     |         |       |
+------------+-------------------------------------------------+------+-----+---------+-------+

ft_occurrences;
+------------+----------------------+------+-----+---------+-------+
| Field      | Type                 | Null | Key | Default | Extra |
+------------+----------------------+------+-----+---------+-------+
| id_word    | int(10) unsigned     |      | MUL | 0       |       |
| id_source  | int(10) unsigned     |      | MUL | 0       |       |
| position   | smallint(5) unsigned |      |     | 0       |       |
| wordlength | tinyint(3) unsigned  |      |     | 0       |       |
+------------+----------------------+------+-----+---------+-------+

Now I'd like to do a search with several several search words combined
by "AND".
The naive approach would be (for word1, word2, word3):
        select sourcetype,sourcename,position,wordlength
        from ft_occurrences as o1, ft_occurrences as o2, ft_occurrences
as o3,
                        ft_sources,
                        ft_words as w1,ft_words as w2,ft_words as w3
        where w1.word='word1' and w2.word='word2' and w3.word='word3'
        and w1.id=o1.id_word
        and w2.id=o2.id_word
        and w3.id=o3.id_word
        and o1.id_source=o2.id_source and o2.id_source=o3.id_source;

This leads to a table overflow, since we have about 6.000.000
occurrences, 100.000 words and 100.000 sources.

Then I tried another approach with temporary tables. The results for the
first word go into a temporary table, then this temporary table is
joined with the results for the 2nd word; the results go into the
temporary table, and so on.
This approach works, but there is a performance problem with the last
step. The query for the last step reads:

        select sourcetype, sourcename, position, wordlength
        from ft_occurrences,ft_words,ft_sources,ft_temp1
        where (
                        ft_temp1.id_word=ft_occurrences.id_word
                and ft_temp1.id_source=ft_occurrences.id_source
                and ft_temp1.id_source=ft_sources.id
                and ft_words.id=ft_temp1.id_word
                and ft_temp1.word_no=1
                and ft_temp1.id=5
        )
        order by sourcetype,sourcename,position;

This query takes about 90 seconds, with a result set of about 6 rows.
The temporary table ft_temp1 contains only a few hundred entries, and
the filter condition
        "ft_temp1.word_no=1 and ft_temp1.id=5"
leave only the 6 rows of the result set. I do not understand why this
query takes that much time. The explanation of the query reads:

+----------------+--------+----------------------------------+---------+---------+--------------------+------+-------------+
| table          | type   | possible_keys                    | key     |
key_len | ref                | rows | Extra       |
+----------------+--------+----------------------------------+---------+---------+--------------------+------+-------------+
| ft_temp1       | ref    | id,word_no,id_source,id_source_2 | word_no
|       1 | ???                |   19 |             |
| ft_words       | eq_ref | PRIMARY                          | PRIMARY
|       4 | ft_temp1.id_word   |    1 | Using index |
| ft_sources     | eq_ref | PRIMARY                          | PRIMARY
|       4 | ft_temp1.id_source |    1 |             |
| ft_occurrences | ref    | id_word,id_source                | id_word
|       4 | ft_temp1.id_word   | 9400 | where used  |
+----------------+--------+----------------------------------+---------+---------+--------------------+------+-------------+

Obviously the key id_word in the ft_occurrences table is not used. Why?
Can I force it?

Any suggestions are appreciated.
Thread
Performance problems and query/table designSascha Mantscheff14 Nov
  • Re: Performance problems and query/table designBenjamin Pflugmann16 Nov