From: Larry Martell Date: September 19 2012 2:26pm Subject: Re: getting certain rows from a group by List-Archive: http://lists.mysql.com/mysql/228195 Message-Id: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Wed, Sep 19, 2012 at 12:04 AM, wrote: >>>>> 2012/09/18 06:53 -0400, Larry Martell >>>> > This works fine. But now I need to get a certain column > (image_measurer_id) with each row returned that corresponds to the row > from the group that has bottom =3D Min(bottom), bottom =3D Max(bottom), > bottom closest to Avg(bottom), and bottom from the row where date_time > =3D Max(date_time). > <<<<<<<< > There is a standard and ugly way of getting such a thing, but it entails = repeating the table reference (supposing you want any, not every, to match)= : > > SELECT data_cst.target_name_id, image_measurer_id, nb, xb, vb, xt, bottom > FROM (SELECT Min(bottom) AS nb, Max(bottom) AS xb, MIN(ABS(Avg(bottom) - = bottom)) AS vb, Max(date_time) AS xt > FROM data_cst > WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116,= 117, 118, 119, 120, 121) > AND DATE(data_cst.date_time) =3D '2010-03-04' > GROUP BY target_name_id, ep, wafer_id > HAVING count(*) < 999) AS st > JOIN data_cst ON st.target_name_id =3D data_cst.target_name_id AND (nb = =3D bottom OR xb =3D bottom OR vb =3D bottom OR date_time =3D xt) > > One record will be returned for every row that holds a relevant extremum,= not guaranteed to be unique. > > This query pertains only to your original subquery, not the whole query. = To get the result to which you refer, join this to your original query, to = which you have added something like vb: > > SELECT * > FROM (original query) JOIN (this query) ON nb =3D minbottom AND xb =3D ma= xbottom AND xt =3D "Last Run" AND vb =3D .... AND (original query).target_n= ame_id =3D (this_query).target_name_id Thanks. I will look into doing it like this. I got the min, max, and date conditions to work by adding the joins that Rick suggested: join data_cst mn ON mn.bottom =3D q1.minbottom join data_cst mx ON mx.bottom =3D q1.maxbottom join data_cst lr ON lr.date_time =3D q1.LastRun But now you have me wondering - if this just matching on that one field, then it can be returning a row from data_cst that is not part of the original result set? Do I need to add all the same select criteria to each join? > --but I am not confident in the result. There are problems in the origina= l query, the biggest that of avgbottom, stdbottom, maxbottom, minbottom non= e are aggregated over wafer_id. The inner query aggregates over target_name_id, ep, wafer_id, and the the outer query aggregates that result set over target_name_id, ep. The avg and std are what the client wants - average of averages and standards deviations. As are the min and max - the outer query returns the min and max of all the wafers for each target, ep. > Therefore, it is not certain from which record from q1 they are returned.= MySQL tends to pick the first that fits, but not only is nothing guarantee= d, it is explicitly written that if such not aggregated fields appear, the = output is not determinate unless all pertinent are equal, the possibility w= herof the reason for allowing it. Yes, if case with dups probably had not been thought of by them, and will have to be addressed at some point. > > When that has been handled, it is needful to change the foregoing query t= o match that one in two levels, because averaging is not associative (there= is also the problem of equality-testing on generated floating-point number= s). If it were only MAX and MIN, one level of not GROUPing BY wafer_id woul= d be all right. > > By the way, in the original query, I suggest instead of > data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND > '2010-03-04 23:59:59' > this, > DATE(data_cst.date_time) =3D '2010-03-04' The date_time was just an example. Depending on user input the query could have times in it. > Instead of > > ... q1, > data_target > WHERE data_target.id =3D target_name_id > > this would be better: > > ... q1 JOIN data_target ON data_target.id =3D target_name_id Why is one better then the other? > I believe that > 100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) > =3D 100*SUM(numbottoms)/SUM(numonep) > and > SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) > =3D SUM(GREATEST(reruns, 0)) Yes, thanks - yours is clearer and I've made those changes.