List:General Discussion« Previous MessageNext Message »
From:Larry Martell Date:September 19 2012 2:26pm
Subject:Re: getting certain rows from a group by
View as plain text  
On Wed, Sep 19, 2012 at 12:04 AM,  <hsv@stripped> wrote:
>>>>> 2012/09/18 06:53 -0400, Larry Martell >>>>
> This works fine. But now I need to get a certain column
> (image_measurer_id) with each row returned that corresponds to the row
> from the group that has bottom = Min(bottom), bottom = Max(bottom),
> bottom closest to Avg(bottom), and bottom from the row where date_time
> = Max(date_time).
> <<<<<<<<
> There is a standard and ugly way of getting such a thing, but it entails repeating
> the table reference (supposing you want any, not every, to match):
>
> SELECT data_cst.target_name_id, image_measurer_id, nb, xb, vb, xt, bottom
> FROM (SELECT Min(bottom) AS nb, Max(bottom) AS xb, MIN(ABS(Avg(bottom) - bottom)) AS
> vb, Max(date_time) AS xt
> FROM data_cst
>       WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117, 118,
> 119, 120, 121)
>       AND DATE(data_cst.date_time) = '2010-03-04'
>       GROUP BY target_name_id, ep, wafer_id
>       HAVING count(*) < 999) AS st
> JOIN data_cst ON st.target_name_id = data_cst.target_name_id AND (nb = bottom OR xb =
> bottom OR vb = bottom OR date_time = xt)
>
> One record will be returned for every row that holds a relevant extremum, not
> guaranteed to be unique.
>
> This query pertains only to your original subquery, not the whole query. To get the
> result to which you refer, join this to your original query, to which you have added
> something like vb:
>
> SELECT *
> FROM (original query) JOIN (this query) ON nb = minbottom AND xb = maxbottom AND xt =
> "Last Run" AND vb = .... AND (original query).target_name_id =
> (this_query).target_name_id

Thanks. I will look into doing it like this.

I got the min, max, and date conditions to work by adding the joins
that Rick suggested:

join data_cst mn ON mn.bottom = q1.minbottom
join data_cst mx ON mx.bottom = q1.maxbottom
join data_cst lr ON lr.date_time = q1.LastRun

But now you have me wondering - if this just matching on that one
field, then it can be returning a row from data_cst that is not part
of the original result set? Do I need to add all the same select
criteria to each join?

> --but I am not confident in the result. There are problems in the original query, the
> biggest that of avgbottom, stdbottom, maxbottom, minbottom none are aggregated over
> wafer_id.

The inner query aggregates over target_name_id, ep, wafer_id, and the
the outer query aggregates that result set over target_name_id, ep.
The avg and std are what the client wants - average of averages and
standards deviations. As are the min and max - the outer query returns
the min and max of all the wafers for each target, ep.

> Therefore, it is not certain from which record from q1 they are returned. MySQL tends
> to pick the first that fits, but not only is nothing guaranteed, it is explicitly written
> that if such not aggregated fields appear, the output is not determinate unless all
> pertinent are equal, the possibility wherof the reason for allowing it.

Yes, if case with dups probably had not been thought of by them, and
will have to be addressed at some point.

>
> When that has been handled, it is needful to change the foregoing query to match that
> one in two levels, because averaging is not associative (there is also the problem of
> equality-testing on generated floating-point numbers). If it were only MAX and MIN, one
> level of not GROUPing BY wafer_id would be all right.
>
> By the way, in the original query, I suggest instead of
> data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
> '2010-03-04 23:59:59'
> this,
> DATE(data_cst.date_time) = '2010-03-04'

The date_time was just an example. Depending on user input the query
could have times in it.

> Instead of
>
>      ... q1,
>      data_target
> WHERE data_target.id = target_name_id
>
> this would be better:
>
>      ... q1 JOIN data_target ON data_target.id = target_name_id

Why is one better then the other?

> I believe that
>      100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100)
>    = 100*SUM(numbottoms)/SUM(numonep)
> and
>      SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END)
>    = SUM(GREATEST(reruns, 0))

Yes, thanks - yours is clearer and I've made those changes.
Thread
getting certain rows from a group byLarry Martell18 Sep
  • Re: getting certain rows from a group byPeter Brawley18 Sep
    • Re: getting certain rows from a group byLarry Martell18 Sep
      • RE: getting certain rows from a group byRick James18 Sep
        • Re: getting certain rows from a group byLarry Martell18 Sep
          • RE: getting certain rows from a group byRick James18 Sep
            • Re: getting certain rows from a group byLarry Martell18 Sep
              • RE: getting certain rows from a group byRick James18 Sep
                • Re: getting certain rows from a group byLarry Martell19 Sep
                  • Re: getting certain rows from a group byLarry Martell19 Sep
                    • Re: getting certain rows from a group byLarry Martell19 Sep
                      • RE: getting certain rows from a group byRick James19 Sep
                        • Re: getting certain rows from a group byLarry Martell20 Sep
      • Re: getting certain rows from a group byLarry Martell18 Sep
  • Re: getting certain rows from a group byhsv19 Sep
    • Re: getting certain rows from a group byLarry Martell19 Sep
      • Re: getting certain rows from a group byLarry Martell19 Sep
    • Re: getting certain rows from a group byLarry Martell19 Sep
      • Re: getting certain rows from a group byhsv19 Sep