MySQL Lists are EOL. Please join:

List:General Discussion« Previous MessageNext Message »
From:Tobias Lind Date:September 10 2003 12:26pm
Subject:Suboptimal index-usage with FULLTEXT-search
View as plain text  
Hi!
I'm running a large web application with MySQL 4.0.14.
The application is performing searches in a large table (>500.000 rows).
The WHERE-clause for these searches sometimes is a combination of different
columns in the table, and I have noticed some very sub-optimal index-usage
when the search includes a FULLTEXT-indexed column. MySQL seems to ALWAYS
use the fulltext-index in these searches - even if, for instance the
PRIMARY-index would be far more optimal.

How to repeat:

CREATE TABLE testTable (
  usrName varchar(40) NOT NULL default '',
  age tinyint(4) default NULL,
  music text,
  film text,
  otherInterests text,
  PRIMARY KEY  (usrName),
  KEY ageIndex (age),
  FULLTEXT KEY music (music),
  FULLTEXT KEY film (film),
  FULLTEXT KEY otherInterests (otherInterests)
) TYPE=MyISAM;

INSERT INTO testTable VALUES ('kalle',14,'Eminem, hiphop, heavy
metal','matrix, lord of the rings, the ring','football');
INSERT INTO testTable VALUES ('pelle',16,'Jazz, blues','Steve
McQueen','cars');
INSERT INTO testTable VALUES ('olle',14,'Slipknot, Eminem','The
Matrix','nothing');
INSERT INTO testTable VALUES ('lasse',15,'Reggae,
Eminem','Rockers','Reading, football');
INSERT INTO testTable VALUES ('nisse',15,'Reggae, Eminem','The
Matrix','Reading, football');

Examples:

mysql> EXPLAIN SELECT * FROM testTable WHERE usrName='olle' AND MATCH (film)
AGAINST ('"matrix"' IN BOOLEAN MODE);
+-----------+--------+---------------+---------+---------+-------+------+---
----------+
| table     | type   | possible_keys | key     | key_len | ref   | rows |
Extra       |
+-----------+--------+---------------+---------+---------+-------+------+---
----------+
| testTable | eq_ref | PRIMARY,film  | PRIMARY |      40 | const |    1 |
Using where |
+-----------+--------+---------------+---------+---------+-------+------+---
----------+
1 row in set (0.00 sec)
This is ok - the PRIMARY index is selected...



mysql> EXPLAIN SELECT * FROM testTable WHERE usrName IN ('olle','pelle') AND
MATCH (film) AGAINST ('"matrix"' IN BOOLEAN MODE);
+-----------+----------+---------------+------+---------+------+------+-----
--------+
| table     | type     | possible_keys | key  | key_len | ref  | rows |
Extra       |
+-----------+----------+---------------+------+---------+------+------+-----
--------+
| testTable | fulltext | PRIMARY,film  | film |       0 |      |    1 |
Using where |
+-----------+----------+---------------+------+---------+------+------+-----
--------+
1 row in set (0.00 sec)
But here the fulltext index "film" is selected!
I think the optimal selection would be to always use PRIMARY-index if
possible!
In this small table it's ok of course, but in my application I have over
500.000 rows. When searching for very common films (that matches a lot of
rows) this leads to very poor performance. Using PRIMARY-index gives A LOT
better performance in all cases I have tested.


mysql> EXPLAIN SELECT * FROM testTable WHERE usrName IN ('olle','pelle') AND
age=14 AND MATCH (film) AGAINST ('"matrix"' IN BOOLEAN MODE);
+-----------+----------+-----------------------+------+---------+------+----
--+-------------+
| table     | type     | possible_keys         | key  | key_len | ref  |
rows | Extra       |
+-----------+----------+-----------------------+------+---------+------+----
--+-------------+
| testTable | fulltext | PRIMARY,ageIndex,film | film |       0 |      |
1 | Using where |
+-----------+----------+-----------------------+------+---------+------+----
--+-------------+
1 row in set (0.00 sec)
Here the fulltext index "film" is also selected.
PRIMARY or ageIndex would be better...



mysql> EXPLAIN SELECT * FROM testTable ignore index (film) WHERE usrName IN
('olle','pelle') AND age=14 AND MATCH (film) AGAINST ('"matrix"' IN BOOLEAN
MODE);
+-----------+------+------------------+----------+---------+-------+------+-
------------+
| table     | type | possible_keys    | key      | key_len | ref   | rows |
Extra       |
+-----------+------+------------------+----------+---------+-------+------+-
------------+
| testTable | ref  | PRIMARY,ageIndex | ageIndex |       2 | const |    2 |
Using where |
+-----------+------+------------------+----------+---------+-------+------+-
------------+
1 row in set (0.00 sec)
If I use IGNORE INDEX (film), it works better.
This is actuelly how I deal with it in my application - if the search
contains very limiting where-clauses (other than a FULLTEXT-columns), I add
"IGNORE INDEX (music,film,otherInterests)" to the query...


The main problem with all this seems to be that when a fulltext-indexed
column is used in the where-clause, The MySQL optimizer will ALWAYS choose
the FULLTEXT-index since it is assumed that this will match only 1 row
(always 1 row).
I understand that it is very hard to do a correct analysis and estimation of
rows for a BOOLEAN FULLTEXT-search, but to assume that it will always return
1 row is not at all optimal in my opinion!
If MySQL instead assumed - say 1000 rows - we would get far better results
in most situations. Then we would use another index if it would result in
less rows than 1000 (not so many in my large table). If that assumption
would be incorrect, the worst thing that could happen would be that another
index (with less that 1000 possible rows) would be used instead even though
the fulltext-index would have been better. But that's not so bad - we would
scan maximum 999 rows with the use of another index - no problem! ...not
compared to my 100.000 rows scans that's the result of picking a
fulltext-index when it gives a lot of matches...

The best solution would of course be that MySQL could estimate the number of
possible rows from the boolean fulltest-search. I guess that this is
impossible.
The next best thing would be to be able to manually configure the minimum
estimated number of rows from a boolean search as a hint to the MySQL
optimizer... Right now this estimation is always 1, I'd like to set it
higher!!

Best regards,
Tobias Lind


Thread
Suboptimal index-usage with FULLTEXT-searchTobias Lind10 Sep
  • Re: Suboptimal index-usage with FULLTEXT-searchSergei Golubchik10 Sep
Re: Suboptimal index-usage with FULLTEXT-searchSteven Roussey17 Sep