List:General Discussion« Previous MessageNext Message »
From:Michael Stassen Date:October 31 2005 6:49pm
Subject:Re: 'GROUP BY' behavior
View as plain text  
Bill Adams wrote:
>  
> Shawn,
> 
> That's a very reasonable answer.  Thanks for pointing me to the
> examples.  This one addresses the second question:
> http://dev.mysql.com/doc/refman/4.1/en/example-maximum-column-group-row.
> html .  There is no example answering both questions in one query.
> 
> Regards,
> Bill

Right.  The key is in the questions you posed in your first post:

> SELECT hostname, volname, qtreename, round(avg(used/allocated*100),0), 
>        round(avg(used)), allocated, available
> FROM quota_entries
> WHERE date_sub(now(), interval 1 day) < timestamp
> GROUP BY qtreename, hostname
> 
> My questions: 
> From which of the several averaged rows do 'allocated' and 'available'
> come in the results?  Can I control this?  I would like the row with 
> maximum timestamp.

You don't get rows with GROUP BY.  Instead, you get group names and aggregate 
functions.  In your query, qtreename and hostname define the groups, and AVG() 
is an aggregate function.  That leaves volname, allocated, and available.  Other 
systems wouldn't even allow this query, because those 3 are neither aggregate 
functions nor named in the GROUP BY.  MySQL allows this as a convenience, but 
you are warned not to use columns whose values vary within groups 
<http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-fields.html>.

As mysql assumes a unique value of volname, allocated, and available per group, 
it is free to pull those values out of *any* row in the group.  In practice, it 
seems to use the first row found per group (for efficiency, I expect), but you 
should treat it as if it were random.

So, there just isn't a one-query solution.  You should be able to modify the 
example Shawn pointed to in the manual to do what you want, though.  If you 
store the two averages along with the max timestamp per group in a temporary 
table with your first query, you can join your table to the temp table in your 
second query to get what you want.  Something like this:

## Create the temp table.  Adjust the column types to fit
   CREATE TEMPORARY TABLE group_stats
   (  qtreename CHAR(30),
      hostname CHAR(30),
      max_time TIMESTAMP,
      percent INT,
      used INT
   );

   LOCK TABLES quota_entries READ;

## Put the aggregate values for each group into the temp table
   INSERT INTO group_stats
   SELECT qtreename, hostname,
          MAX(timestamp),
          ROUND(AVG(used)),
          ROUND(AVG(used/allocated*100),0)
   FROM quota_entries
   WHERE timestamp > NOW() - INTERVAL 1 DAY
   GROUP BY qtreename, hostname;

## Join your table to the temp table to get your results
   SELECT qe.hostname, qe.volname, qe.qtreename,
          gs.used, gs.percent,
          qe.allocated, qe.available
   FROM group_stats gs
   JOIN quota_entries qe
     ON  qe.qtreename = gs.qtreename
     AND qe.hostname = gs.hostname
     AND qe.timestamp = gs.max_time

## Clean up
   UNLOCK TABLES;
   DROP TABLE group_stats;

Michael
Thread
'GROUP BY' behaviorBill Adams28 Oct
  • Re: 'GROUP BY' behaviorSGreen28 Oct
RE: 'GROUP BY' behaviorBill Adams29 Oct
  • Re: 'GROUP BY' behaviorMichael Stassen31 Oct