MySQL Lists are EOL. Please join:

List:General Discussion« Previous MessageNext Message »
From:Rick James Date:September 15 2008 5:46pm
Subject:RE: SELECT DISTINCT with ORDER BY implementation
View as plain text  
DISTINCT will return only one row.

GROUP BY will return one of each set.

Maybe something like this (a huge kludge):
SELECT id
  FROM (SELECT ... ORDER BY ...)
  GROUP BY ...
Which will get the first id from each 'group', as ordered by the inner
sort order.

 
Rick James
MySQL Geeks - Consulting & Review

 

> -----Original Message-----
> From: Mariella Petrini [mailto:mariellapetrini@stripped] 
> Sent: Monday, September 15, 2008 10:29 AM
> To: Rick James; spetrunia@stripped; 
> internals@stripped; mysql@stripped
> Subject: RE: SELECT DISTINCT with ORDER BY implementation
> 
> 
> > Eh?  Don't you need two SELECTs to "Retrieve ...
> > only one id ... and
> > list the unique ids ..."?
> 
> What do you mean exactly ?
> 
> 
> The query that I was using is the following:
> 
> 
> select DISTINCT(id) from 
> (select id from
> sort_table FORCE INDEX (info1) where info1=7 AND info2=7 and 
> info3=7 and  (sort  >='2008-10-10' AND sort <= '2008-11-11') 
> ORDER BY sort DESC) as t
> 
> The inner query supposedly returns a list of ids that match 
> the WHERE condition and sorted by sort field DESC
> 
> The outer query should pick up only the first id in each 
> duplicate subset of ids and discard the duplicates
> 
> In reality the set of distinct ids returned is sorted by the 
> sort field.
> 
> HOwever Sergey pointed out that when the upper query is run, 
> using the temporary table as a regular base table, is assumed 
> that the data in the temporary table is not  ordered.
> 
> 
> I got one more question, plz:
> 
> 1) How is a result set returned by an inner query 
> copied/dumped/inserted into a temporary table for the outer 
> query. How is the copy criteria chosen (top-bottom, 
> bottom-top, random depending on ..) ?
> 
> 
> Thanks,
> 
> Mariella
> 
> 
> 
> --- On Mon, 9/15/08, Rick James <rjames@stripped> wrote:
> 
> > From: Rick James <rjames@stripped>
> > Subject: RE: SELECT DISTINCT with ORDER BY implementation
> > To: mariellapetrini@stripped, "Sergey Petrunia" 
> <spetrunia@stripped>, internals@stripped, 
> mysql@stripped
> > Date: Monday, September 15, 2008, 9:44 AM
> > Eh?  Don't you need two SELECTs to "Retrieve ...
> > only one id ... and
> > list the unique ids ..."?
> > 
> >  
> > Rick James
> > MySQL Geeks - Consulting & Review
> > 
> >  
> > 
> > > -----Original Message-----
> > > From: Mariella Petrini
> > [mailto:mariellapetrini@stripped] 
> > > Sent: Sunday, September 14, 2008 8:04 PM
> > > To: Sergey Petrunia; internals@stripped; Rick
> > James; 
> > > mysql@stripped
> > > Subject: Re: SELECT DISTINCT with ORDER BY
> > implementation
> > > 
> > > Hi,
> > > 
> > > Thanks a lot for your answers.
> > > 
> > > 
> > > > 
> > > >   SELECT ... FROM (SELECT ... ORDER BY x) tbl
> > > > 
> > > > With MySQL, the inner ORDER BY just adds
> > overhead, because
> > > > execution
> > > > proceeds as follows:
> > > > 
> > > >  * Run the inner query, making it to produce
> > result in the
> > > > specified
> > > >    ordering. Dump the result into a temporaty
> > table.
> > > >  * Run the upper query, using the temporary table
> > as a
> > > > regular base table,
> > > >    in particular assuming that the data in the
> > temporary
> > > > table is not
> > > >    ordered.
> > > > 
> > > > That is, the ORDER BY just causes overhead and no
> > benefit.
> > > 
> > > Did I understand correctly that the ORDER BY  executed
> > in the 
> > > inner query could get lost when the "result
> > set" is dumped 
> > > into the temporary table to be used by the upper query
> > ? 
> > > 
> > > 
> > > 
> > > My original intent was the following:
> > > 
> > > Retrieve from the "sort" table only one id
> > (several rows with 
> > > the same id match the WHERE condition)
> > > that would match the WHERE criteria and list the
> > unique ids 
> > > sorted by the "sort" field in DESC mode
> > > 
> > > 
> > > WHERE condition is :
> > > WHERE info1=7 AND info2=7 and info3=7 and 
> > > (sort >='2008-10-10' AND sort <=
> > '2008-11-11') .... ) 
> > > 
> > > SORT is :
> > > ORDER BY sort DESC
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 1)
> > > The following query does not retrieve what I would
> > like:
> > > 
> > > SELECT DISTINCT (id) from sort_table WHERE info1=7 AND
> > 
> > > info2=7 and info3=7 and  (sort
> > >='2008-10-10' AND sort <= 
> > > '2008-11-11') ORDER BY sort DESC)
> > > because in the case of DISTINCT the execution of
> > DISTINCT is 
> > > run before of the ORDER BY (so the unique rows are not
> > 
> > > ordered by sort field)
> > > 
> > > 
> > > 2)
> > > The following query it seems to retrieve only one row 
> > that 
> > > matches the WHERE condition and the unique rows
> > (unique ids) 
> > > are ordered by the sort field DESC
> > > 
> > > 
> > > select DISTINCT(id) from (select id from
> > > sort_table FORCE INDEX
> > > (info1) where info1=7 AND info2=7 and info3=7 and 
> > (sort 
> > > >='2008-10-10' AND sort <=
> > '2008-11-11') ORDER BY sort DESC) as t
> > > 
> > > 
> > > 
> > > 3) If there is a better way (more efficient and
> > possibly 
> > > without changing the structure of the table and
> > possibly the index) 
> > > to get the list of the unique ids 
> > > that satisfy the WHERE condition and get them ordered
> > by the 
> > > 'sort' field DESC, I would be happy to follow
> > it
> > > 
> > > 
> > > P.S. The index "info" is a combined index on
> > (info1,info2,info3)
> > > P.S. The table contains one extra field (info4) that
> > is listed 
> > >      in the WHERE condition as well
> > >      So in reality the WHERE conditions is a little
> > longer 
> > > and looks like:
> > > 
> > > 
> > > WHERE info1=7 AND info2=7 and info3=7 and  (sort 
> > > >='2008-10-10' AND sort <=
> > '2008-11-11') and (info4 >= ... 
> > > and info4 <=...) ORDER BY sort DESC
> > > 
> > > 
> > > 
> > > Thanks a lot for your help,
> > > 
> > > Mariella
> > > 
> > > 
> > > --- On Fri, 9/12/08, Sergey Petrunia
> > <spetrunia@stripped> wrote:
> > > 
> > > > From: Sergey Petrunia <spetrunia@stripped>
> > > > Subject: Re: SELECT DISTINCT with ORDER BY
> > implementation
> > > > To: mariellapetrini@stripped
> > > > Cc: internals@stripped, "Rick
> > James" 
> > > <rjames@stripped>, mysql@stripped
> > > > Date: Friday, September 12, 2008, 1:24 AM
> > > > Hi Mariella,
> > > > 
> > > > Short answers:
> > > >  * you're seeing an EXPLAIN bug
> > > >  * FROM (SELECT ... ORDER BY) just adds overhead.
> > > > 
> > > > Long answer:
> > > > 
> > > > #
> > > > #  Filling the table
> > > > #
> > > > create table t0 (a int);
> > > > insert into t0 values
> > > > (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
> > > > # insert 100K rows
> > > > insert into sort_table select A.a, now(), B.a,
> > C.a, D.a
> > > > from t0 A, t0
> > > > B, t0 C, t0 D, t0 E;
> > > > 
> > > > #
> > > > # Pick the constants such that conditions on
> > info(1,2,3)
> > > > match 100 records,
> > > > # and condition on sort matches nothing.
> > > > #
> > > > mysql> explain select DISTINCT(id) from
> > (select id from
> > > > sort_table
> > > > FORCE INDEX (info1) where info1=7 AND info2=7 and
> > info3=7
> > > > and  (sort
> > > > >='2008-10-10' AND sort <=
> > > > '2008-11-11') ORDER BY sort DESC) t\G
> > > > *************************** 1. row
> > > > ***************************
> > > >            id: 1
> > > >   select_type: PRIMARY
> > > >         table: <derived2>
> > > >          type: system
> > > > possible_keys: NULL
> > > >           key: NULL
> > > >       key_len: NULL
> > > >           ref: NULL
> > > >          rows: 0
> > > >         Extra: const row not found
> > > > *************************** 2. row
> > > > ***************************
> > > >            id: 2
> > > >   select_type: DERIVED
> > > >         table: sort_table
> > > >          type: ALL
> > > > possible_keys: info1
> > > >           key: info1
> > > >       key_len: 12
> > > >           ref:
> > > >          rows: 99
> > > >         Extra: Using filesort
> > > > 2 rows in set (0.01 sec)
> > > > 
> > > > #
> > > > # Now analyze the counters:
> > > > #
> > > > mysql> flush status;
> > > > Query OK, 0 rows affected (0.01 sec)
> > > > 
> > > > mysql> select DISTINCT(id) from (select id
> > from
> > > > sort_table FORCE INDEX
> > > > (info1) where info1=7 AND info2=7 and info3=7 and
> >  (sort
> > > > >='2008-10-10' AND sort <=
> > > > '2008-11-11') ORDER BY sort DESC) t\G
> > > > Empty set (0.01 sec)
> > > > 
> > > > mysql> show status like 'handler%';
> > > > +----------------------------+-------+
> > > > | Variable_name              | Value |
> > > > +----------------------------+-------+
> > > > | Handler_commit             | 0     |
> > > > | Handler_delete             | 0     |
> > > > | Handler_discover           | 0     |
> > > > | Handler_prepare            | 0     |
> > > > | Handler_read_first         | 1     |
> > > > | Handler_read_key           | 1     |
> > > > | Handler_read_next          | 100   | ---(1)
> > > > | Handler_read_prev          | 0     |
> > > > | Handler_read_rnd           | 0     |
> > > > | Handler_read_rnd_next      | 1     | ---(2)
> > > > | Handler_rollback           | 0     |
> > > > | Handler_savepoint          | 0     |
> > > > | Handler_savepoint_rollback | 0     |
> > > > | Handler_update             | 0     |
> > > > | Handler_write              | 14    |
> > > > +----------------------------+-------+
> > > > 15 rows in set (0.00 sec)
> > > > 
> > > > When we look at (1) and (2) it is a apparent that
> > > > * Full table scan wasn't done. If it was,
> > we'd see
> > > > 100K reads
> > > > * Instead it did an index scan, and got 100
> > records. This
> > > > is exactly how
> > > >   many records the range scan over "info1=7
> > AND
> > > > info2=7 and info3=7" would
> > > >   produce.
> > > > This leads us to conclude that this is an EXPLAIN
> > bug.
> > > > I'm going to file
> > > > this.
> > > > 
> > > > 
> > > > Btw, what was your intent in having a query in
> > form
> > > > 
> > > >   SELECT ... FROM (SELECT ... ORDER BY x) tbl
> > > > 
> > > > With MySQL, the inner ORDER BY just adds
> > overhead, because
> > > > execution
> > > > proceeds as follows:
> > > > 
> > > >  * Run the inner query, making it to produce
> > result in the
> > > > specified
> > > >    ordering. Dump the result into a temporaty
> > table.
> > > >  * Run the upper query, using the temporary table
> > as a
> > > > regular base table,
> > > >    in particular assuming that the data in the
> > temporary
> > > > table is not
> > > >    ordered.
> > > > 
> > > > That is, the ORDER BY just causes overhead and no
> > benefit.
> > > > 
> > > > BR
> > > >  Sergey
> > > > --
> > > > Sergey Petrunia, Lead Software Engineer
> > > > MySQL AB, www.mysql.com
> > > > Office: N/A
> > > > Blog: http://s.petrunia.net/blog
> > > > 
> > > > -- 
> > > > MySQL Internals Mailing List
> > > > For list archives:
> > http://lists.mysql.com/internals
> > > > To unsubscribe:   
> > > >
> > http://lists.mysql.com/internals?unsub=1
> > > 
> > > 
> > >       
> > >
> 
> 
>       
> 
Thread
SELECT DISTINCT with ORDER BY implementationMariella Petrini10 Sep
  • RE: SELECT DISTINCT with ORDER BY implementationRick James10 Sep
    • RE: SELECT DISTINCT with ORDER BY implementationMariella Petrini10 Sep
RE: SELECT DISTINCT with ORDER BY implementationMariella Petrini11 Sep
  • Re: SELECT DISTINCT with ORDER BY implementationSergey Petrunia12 Sep
RE: SELECT DISTINCT with ORDER BY implementationMariella Petrini11 Sep