Indeed, as you say, Brent, correlated subqueries are not well-optimized
in MySQL. The specific subquery (the IN() subquery) demonstrated in the
original post is, however, optimized in MySQL 6.0 :)
More comments inline.
Brent Baisley wrote:
> You are using a correlated subquery, which MySQL is terrible at.
> Whenever you find yourself doing a correlated subquery, see if you can
> switch it to a derived table with a join, which MySQL is far better at.
> A derived table is like a "virtual" table you create on the fly. It's
> very simple, just assign a name to your query and then treat it as if it
> is a regular table.
Actually, in this case, no need for a derived table. A simple join will
suffice:
SELECT * FROM projects p
JOIN project_tags pt ON p.project_id = pt.project_id
JOIN tags t ON pt.tag_id = t.tag_id
WHERE tags.name='foo';
Make sure you've got indexes on p (project_id), pt (project_id, tag_id),
t (name)
Cheers,
Jay
> So your query would look something like this:
> SELECT projects.* FROM projects
> JOIN (SELECT projects_tags.project_id FROM tags, projects_tags WHERE
> tags.name='foo' AND projects_tags.project_id=projects.id) AS ptagids
> ON project.id=ptagids.project_id
>
> Your IN has become a JOIN and mysql optimizes it far better.
>
> On Oct 19, 2007, at 6:57 PM, Ryan Bates wrote:
>
>> I'm trying to determine why a subquery is slower than running two
>> separate queries. I have a simple many-to-many association using 3
>> tables: projects, tags and projects_tags. Here's the query I'm using
>> to find the projects with a given tag:
>>
>> SELECT * FROM projects WHERE id IN (SELECT projects_tags.project_id
>> FROM tags, projects_tags WHERE tags.name='foo' AND
>> projects_tags.project_id=projects.id);
>> (0.36 sec)
>>
>> Compare that with splitting it into two queries:
>>
>> SELECT projects_tags.project_id FROM tags, projects_tags WHERE
>> tags.name='foo' AND projects_tags.project_id=projects.id
>> (0.00 sec) /* returns 1, 2, 3 */
>>
>> SELECT * FROM projects WHERE id IN (1, 2, 3);
>> (0.00 sec)
>>
>> Why is it so much faster? Looking at the explain statement (below) of
>> the one with the subquery, it appears it's not using the primary key
>> index on the projects table. Why is it that MySQL doesn't perform this
>> simple optimization? And is there a solution that will allow me to
>> still use a subquery?
>>
>> I realize I can use a join instead of a subquery, but this is a
>> simplified example.
>>
>>
>> Here's the explain statement:
>>
>> *************************** 1. row ***************************
>> id: 1
>> select_type: PRIMARY
>> table: projects
>> type: ALL
>> possible_keys: NULL
>> key: NULL
>> key_len: NULL
>> ref: NULL
>> rows: 15433
>> Extra: Using where
>> *************************** 2. row ***************************
>> id: 2
>> select_type: DEPENDENT SUBQUERY
>> table: tags
>> type: ref
>> possible_keys: PRIMARY,index_tags_on_name
>> key: index_tags_on_name
>> key_len: 258
>> ref: const
>> rows: 1
>> Extra: Using where; Using index
>> *************************** 3. row ***************************
>> id: 2
>> select_type: DEPENDENT SUBQUERY
>> table: projects_tags
>> type: ref
>> possible_keys: tag_id
>> key: tag_id
>> key_len: 5
>> ref: my_database.tags.id
>> rows: 10
>> Extra: Using where
>>
>>
>> Here's the table dumps:
>>
>> CREATE TABLE `projects` (
>> `id` int(11) NOT NULL auto_increment,
>> `name` varchar(255) default NULL,
>> PRIMARY KEY (`id`)
>> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>>
>> CREATE TABLE `tags` (
>> `id` int(11) NOT NULL auto_increment,
>> `name` varchar(255) default NULL,
>> PRIMARY KEY (`id`),
>> KEY `index_tags_on_name` (`name`)
>> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>>
>> CREATE TABLE `projects_tags` (
>> `project_id` int(11) default NULL,
>> `tag_id` int(11) default NULL,
>> KEY `tag_id` (`tag_id`)
>> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>>
>>
>> I'm using MySQL 5.0.37. Thanks in advance.
>>
>> Ryan
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe: http://lists.mysql.com/mysql?unsub=1
>>
>
>