List:General Discussion« Previous MessageNext Message »
From:Michael Stassen Date:September 16 2004 2:47pm
Subject:Re: Query with group by
View as plain text  
Jose Miguel Pérez wrote:

> Hi Vincent!
> 
>>i'm trying to built a query that report me the latest tuple for a given
>>field.
>>I explain myself a bit. Let's say i have a table such this :
> 
>     There is no simple solution to your problem. Strictly speaking, it's
> forbidden to name a column in the SELECT clause which is NOT on the GROUP BY
> clause. (Read chapter: "13.9.3 GROUP BY with Hidden Fields" from the MySQL
> Documentation.). Anyway, MySQL is relaxed from ANSI SQL in such a way it's
> permitted to include a column which is not in the GROUP BY clause (hidden
> columns). The values returned from such a query are "unpredictable".
> 
>     I started saying there is no _simple_ solution. However, you can think a
> little bit and your problem goes away. I can think of two solutions to your
> problem which neither involves a subquery.
> 
>     You are lucky in that you know for sure what your last value is (you do
> have a "date" field"). You already know the value of using MAX(date) in the
> SELECT. You can concatenate the date column with the version, and return a
> "SUBSTRING" from that operation, something like this:
> 
>         SELECT SUBSTRING_INDEX(MAX(CONCAT(date, ' ', version)), ' ', -1) AS
> correct_version,
>                 location, version AS wrong_version
>         FROM cities
>         WHERE content='ALPHA'
>         GROUP BY location

This is the MAX-CONCAT trick.  It works, but it's inefficient, as it has to 
do a full table scan, with calculations done on each row.  An index on date 
cannot be used in this case, because we are searching for the maximum 
CONCAT(date, ' ', version), rather than the maximum date.

>     You will get the following result set: (Notice the "correct_version"
> column).
> 
> +-----------------+----------+---------------+
> | correct_version | location | wrong_version |
> +-----------------+----------+---------------+
> | 10              | NEW-YORK |            11 |
> | 10              | PARIS    |            10 |
> | 11              | TOKYO    |            10 |
> +-----------------+----------+---------------+
> 
>     There is another option, however, and this one is what I like most. In
> two words, you can join the table with itself, like this:
> 
>         SELECT c1.date, c1.location, c1.version
>         FROM cities c1
>         LEFT JOIN cities c2
>             ON c1.location=c2.location AND c1.content=c2.content
>                 AND c2.date>c1.date
>         WHERE c2.id IS NULL AND c1.content = 'ALPHA'
> 
>     What we have done here is first LEFT JOIN the table (cities in my
> example) with itself so that we have in the left part those rows which have
> the maximum date and with NULL on the right as the value of c2. We then
> select those saying "WHERE c2.id IS NULL". The last operation is selecting
> the rows for an 'ALPHA' content.

This will work, but it is also somewhat inefficient.  The LEFT JOIN is 
creating numerous extra, unwanted rows, only to throw them away with the 
WHERE c2.id IS NULL.  Assuming n rows for a particular location value, you 
are creating 1 + (n * (n - 1)/2) rows {optimized down to n rows, if id is 
defined NOT NULL} to find the one row you want for that group.  That's no 
big deal for this small sample table, but it may not scale well.

>     Notice that with this version we also eliminate the GROUP BY, since all
> rows returned from the join are unique. The result set returned with the
> last example is this:
> 
> +------------+----------+---------+
> | date       | location | version |
> +------------+----------+---------+
> | 2004-09-16 | PARIS    |      10 |
> | 2004-09-16 | NEW-YORK |      10 |
> | 2004-09-15 | TOKYO    |      11 |
> +------------+----------+---------+

The most efficient way is probably to use a temporary table.

   CREATE TEMPORARY TABLE max_dates
   SELECT location, MAX(date) AS max_date
   FROM temp
   WHERE content = 'ALPHA'
   GROUP BY location;

   SELECT t.*
   FROM temp t, max_dates m
   WHERE t.location = m.location
   AND t.date = m.max_date;

   DROP TABLE max_dates;

The manual describes the MAX-CONCAT trick, the temporary table solution, and 
a subquery solution for 4.1+ 
<http://dev.mysql.com/doc/mysql/en/example-Maximum-column-group-row.html>.

>     Cheers,
>     Jose Miguel.

Michael
Thread
Query with group byVincent.Badier16 Sep
  • Re: Query with group byRhino16 Sep
    • Re: Query with group byVincent.Badier16 Sep
  • Re: Query with group byJose Miguel Pérez16 Sep
    • Re: Query with group byMichael Stassen16 Sep
  • Re: Query with group byRhino16 Sep
    • Re: Query with group byMichael Stassen17 Sep
  • Re: Query with group byJose Miguel Pérez16 Sep
    • Re: Query with group byMichael Stassen17 Sep
      • RE: Query with group byJose Miguel Pérez22 Sep
  • Re: Query with group byRhino17 Sep
    • Re: Query with group byMichael Stassen18 Sep
  • Re: Query with group byRhino18 Sep