List:General Discussion« Previous MessageNext Message »
From:Michael Stassen Date:September 14 2005 3:21pm
Subject:Re: unexpected EXPLAIN result with subselect
View as plain text  
Matthias Pigulla wrote:
> Hi all,
> 
> can someone explain me why this query gets executed the way it is? I
> simplified it as much as possible and think it's a conceptual/logical
> thing, so I'll omit - at least for now - the table definitions and
> sample data for brevity.
> 
> -- "Superquery"
> EXPLAIN SELECT t1.id, t1.name
> FROM document AS t1
> WHERE t1.id
> IN (
> 	-- "Subquery"
> 	SELECT DISTINCT data_id
> 	FROM wfd_reference
> 	WHERE wfd_field_id =699
> 	AND ref_data_id
> 	IN ( 171 )
> )
> 
> This gives:
> 
> *************************** 1. row ***************************
>            id: 1
>   select_type: PRIMARY
>         table: t1
>          type: ALL
> possible_keys: NULL
>           key: NULL
>       key_len: NULL
>           ref: NULL
>          rows: 277
>         Extra: Using where
> *************************** 2. row ***************************
>            id: 2
>   select_type: DEPENDENT SUBQUERY
>         table: wfd_reference
>          type: ref
> possible_keys: field_data,test
>           key: field_data
>       key_len: 4
>           ref: const,func
>          rows: 4
>         Extra: Using where; Using temporary
> 
> However - why do we need the *dependent* subquery at all? The subquery
> can be executed on its own, as it does not depend on any information of
> the "superquery".
> 
> EXPLAIN SELECT DISTINCT data_id
> FROM wfd_reference
> WHERE wfd_field_id =699
> AND ref_data_id
> IN ( 171 ) 
> 
> *************************** 1. row ***************************
>            id: 1
>   select_type: SIMPLE
>         table: wfd_reference
>          type: ref
> possible_keys: field_data,test
>           key: test
>       key_len: 4
>           ref: const,const
>          rows: 9
>         Extra: Using where; Using temporary
> 
> Now if I just take the result of this query, concat the data_ids on the
> application level and build the superquery as follows:
> 
> -- "two-staged superquery variant"
> EXPLAIN SELECT t1.id, t1.name
> FROM document AS t1
> WHERE t1.id
> IN (
> 32, 31, 30, 53, 56, 57, 58, 59, 60, 111
> )
> 
> *************************** 1. row ***************************
>            id: 1
>   select_type: SIMPLE
>         table: t1
>          type: range
> possible_keys: PRIMARY
>           key: PRIMARY
>       key_len: 2
>           ref: NULL
>          rows: 10
>         Extra: Using where
> 
> the results seem to be much better. 
> 
> I was afraid of the dependent subquery for the "ALL" scan of t1, as t1
> will become huge. OTOH, the subquery will be very restrictive: The
> number of data_ids will always be very small, at least compared to the
> number of rows in t1.
> 
> Are there any non-obvious reasons for the behaviour described above? Is
> that something that cannot be optimized right now? Am I too
> short-sighted with my "optimization" approach?
> 
> Thanks a lot,
> Matthias

As others have already pointed out, mysql often doesn't optimize subqueries 
properly, and fixing that doesn't seem to be a high priority right now. 
Subqueries are relatively new in mysql, so it probably shouldn't be 
surprising that they aren't optimized as well as other things.  Fortunately, 
most subqueries can be rewritten as joins 
<http://dev.mysql.com/doc/mysql/en/rewriting-subqueries.html>, which mysql 
does a good job of optimizing.

I believe your query is equivalent to

   SELECT DISTINCT t1.id, t1.name
   FROM document AS t1
   JOIN wfd_reference AS wfd ON t1.id = wfd.data_id
   WHERE wfd.wfd_field_id = 699
     AND wfd.ref_data_id IN (171);

which should perform better than the (incorrectly) dependent subquery, 
because it will be properly optimized.  Mysql will use the 'test' index in 
wfd_reference to pick the rows which match the conditions, then use t1's 
primary key to get the corresponding rows from t1.

I notice the DISTINCT in your subquery.  I take it that there may be 
multiple rows in wfd_reference with the same data_id, wfd_field_id = 699, 
and ref_data_id IN (171)?  (If not, just leave out the DISTINCT in the 
above.)  In that case, it may be possible that your 2-step approach will be 
even faster than the JOIN with DISTINCT.  You can do this directly in mysql, 
instead of in your app, by using a temporary table to store the inner query 
result.  Something like

   CREATE TEMPORARY TABLE matches
     SELECT DISTINCT data_id
     FROM wfd_reference
     WHERE wfd_field_id = 699
       AND ref_data_id IN (171);

   SELECT t1.id, t1.name
   FROM document AS t1
   JOIN matches ON t1.id = matches.data_id;

   DROP TABLE matches;

Michael


Thread
unexpected EXPLAIN result with subselectMatthias Pigulla14 Sep
  • Re: unexpected EXPLAIN result with subselectFelix Geerinckx14 Sep
  • Re: unexpected EXPLAIN result with subselectGleb Paharenko14 Sep
  • Re: unexpected EXPLAIN result with subselectMichael Stassen14 Sep