List:General Discussion« Previous MessageNext Message »
From:Michael Stassen Date:September 26 2005 5:43pm
Subject:Re: Exists BUG in "IN" ?
View as plain text  
Dyego Souza Dantas Leal wrote:
> Hello Guys,
> 
> 
> I'm using the 5.0.12 version of MySQL PRO on AMD64 3000+ with 1 GB of 
> ram and using the InnoDB Tables..
> 
<snip>
> 
> Here is the Select:
> 
> mysql> select f1.fc_package from svcs_filecontrol f1 where f1.fc_id in (
>    -> *select max(f2.fc_id) from svcs_filecontrol f2
>    -> where f2.fc_lbl_Id = 1
>    -> group by f2.fc_package
>    -> order by f2.fc_versao desc*)
>    -> ;

I assume you added the '*'s for emphasis?

I fail to see how ORDER BY in the _subquery_ helps you here.  In fact, I 
think it slows you down, with no effect on  the output.  Shouldn't this be 
"ORDER BY f1.fc_versao DESC", in the main query?

<snip>
> 
> The problem is ... 7.77 sec to a simple select :( , is a bug ? because 

   Not so simple, as it turns out.

> if i run the SUB-SELECT separated of query , this run faster... look:
> 
> mysql> *select max(f2.fc_id) from svcs_filecontrol f2 where f2.fc_lbl_Id 
> = 1 group by f2.fc_package order by f2.fc_versao desc;*
> +---------------+
> | max(f2.fc_id) |
> +---------------+
> |           243 |
> |             2 |
> |           235 |
> .....
> +---------------+
> 29 rows in set (0.00 sec)
> 
> The DESC of this command is:
> 
> mysql> desc select f1.fc_package from svcs_filecontrol f1 where f1.fc_id 
> in (select max(f2.fc_id) from svcs_filecontrol f2 where f2.fc_lbl_Id = 1 
> group by f
> 2.fc_package order by f2.fc_versao desc);
>
> +----+--------------------+-------+------+-------------------------------------+-----------+---------+-------+------+----------------------------------------------+
> 
> 
> | id | select_type        | table | type | 
> possible_keys                       | key       | key_len | ref   | rows 
> | Extra                                        |
>
> +----+--------------------+-------+------+-------------------------------------+-----------+---------+-------+------+----------------------------------------------+
> 
> 
> |  1 | PRIMARY            | f1    | ALL  | 
> NULL                                | NULL      | NULL    | NULL  | 2440 
> | Using where                                  |
> |  2 | DEPENDENT SUBQUERY | f2    | ref  | 
> fc_lbl_id,fc_lbl_nome_pacote_versao | fc_lbl_id | 4       | const |  435 
> | Using where; Using temporary; Using filesort |
>
> +----+--------------------+-------+------+-------------------------------------+-----------+---------+-------+------+----------------------------------------------+
> 
> 
> 2 rows in set (0.00 sec)
> 
> mysql>
> 
> Is a bug ?

No (not exactly), it's a missing feature.  The problem is that mysql thinks 
you have a dependent subquery, so your subquery is being run once for each 
row of your table.  Ideally, I suppose the optimizer should notice that it 
can run the subquery once, then match rows against the IN list using the 
index, but it doesn't.  Subqueries are relatively new in mysql.  They work, 
but they often are not optimized well.  Fixing that is on the TO-DO list, 
but seems to be a low priority.  Joins, on the other hand, have been around 
a long time and are well-optimized.

This seems to be a version of the groupwise-maximum problem 
<http://dev.mysql.com/doc/mysql/en/example-maximum-column-group-row.html>. 
If I understand your query, I think it's equivalent to

   SELECT f1.fc_package from svcs_filecontrol f1
   WHERE f1.fc_id = ( SELECT max(f2.fc_id)
                      FROM svcs_filecontrol f2
                      WHERE f1.fc_package = f2.fc_package
                        AND f2.fc_lbl_Id = 1)
   ORDER BY f1.fc_versao DESC;

which is how the manual solves this problem.  I think that will still be a 
dependent subquery, though, so I'm not sure it will be any faster.

A better bet is probably to do this in two steps.  Get the ids using the 
inner query and store them in a temporary table, then join to the temporary 
table to get the rows you want.  Something like:

   CREATE TEMPORARY TABLE max_ids
     SELECT MAX(fc_id) AS max_id
     FROM svcs_filecontrol
     WHERE fc_lbl_Id = 1
     GROUP BY fc_package;

   SELECT f.fc_package
   FROM svcs_filecontrol f
   JOIN max_ids m ON f.fc_id = m.max_id
   ORDER BY f.fc_versao DESC;

   DROP TABLE max_ids;

That may look like a pain, but it should certainly be fast.

Michael
Thread
Exists BUG in "IN" ?Dyego Souza Dantas Leal26 Sep
  • Re: Exists BUG in "IN" ?Michael Stassen26 Sep
    • Global ReplaceJohn Berman30 Sep
      • Re: Global ReplaceScott Noyes30 Sep
      • Re: Global ReplaceSGreen30 Sep
RE: Global ReplaceSujay Koduri30 Sep
  • RE: Global ReplaceJohn Berman30 Sep