From: Takeshi Hashimoto Date: January 21 2014 11:03pm Subject: Re: grouping by the difference between values in rows List-Archive: http://lists.mysql.com/mysql/230053 Message-Id: MIME-Version: 1.0 (1.0) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable For me, it seems just use [case - when ] on the difference between x and y, a= nd group by with output.=20 Good luck *\(^o^)/* Sent from my iPhone On Jan 21, 2014, at 15:38, hsv@stripped wrote: >>>>> 2014/01/12 14:17 -0500, Larry Martell >>>> > I've been asked to do something that I do not think is possible in SQL. >=20 > I have a query that has this basic form: >=20 > SELECT a, b, c, d, AVG(e), STD(e), CONCAT(x, ',', y) as f > FROM t > GROUP BY a, b, c, d, f >=20 > x and y are numbers (378.18, 2213.797 or 378.218, 2213.949 or > 10053.490, 2542.094). >=20 > The business issue is that if either x or y in 2 rows that are in the > same a, b, c, d group are within 1 of each other then they should be > grouped together. And to make it more complicated, the tolerance is > applied as a rolling continuum. For example, if the x and y in a set > of grouped rows are: >=20 > row 1: 1.5, 9.5 > row 2: 2.4, 20.8 > row 3: 3.3, 40.6 > row 4: 4.2, 2.5 > row 5: 5.1, 10.1 > row 6: 6.0, 7.9 > row 7: 8.0, 21.0 > row 8: 100, 200 >=20 > 1 through 6 get combined because all their X values are within the > tolerance of some other X in the set that's been combined. 7's Y value > is within the tolerance of 2's Y, so that should be combined as well. > 8 is not combined because neither the X or Y value is within the > tolerance of any X or Y in the set that was combined. >=20 > In python I can easily parse the data and identify the rows that need > to be combined, but then I've lost the ability to calculate the > average and std. The only way I can think of to do this is to remove > the grouping from the SQL and do all the grouping and aggregating > myself. But this query often returns 20k to 30k rows after grouping. > It could easily be 80k to 100k rows that I have to process if I remove > the grouping and I think that will be very slow. >=20 > Anyone have any ideas? > <<<<<<<< > I suspect you can carry out their ideas by something like this, in an SQL p= rocedure: >=20 > Besides your table "t", there are tables "t1", "t2", "t3", "tpair", and "t= q". >=20 > With a cursor copy records from "t" to "t1" (with all of "t" s fields and a= n "g1" besides) ordered by a, b, c, d, x, going through all the complicatio= ns of deciding where a group boundary falls, numbering the groups by "g1". >=20 > Repeat this copying from "t1" to "t2" (which has besides "g1" also "g2"), o= rdered by a, b, c, d, y, numbering the groups by "g2". Now "t1" no longer is= needed. >=20 > Copy all distinct pairs of "g1" and "g2" to "tpair". >=20 > Until g =3D MIN(g1) of "tpair" is null, move (INSERT ... SELECT; DELETE ..= .) from "tpair" to "tq" all pairs where g1=3Dg, and as long as there is anyt= hing to move from "tpair" to "tq" alternatly move records where any in "tpai= r" s "g2" match any already in "tq" and any in "tpair" s "g1" match any alre= ady in "tq". Now all the pairs in "tq" represent the same group: every recor= d in "t2" with a pair in "tq" is copied into "t3" with a new number "g3" ins= tead of the pair. >=20 > After this one may group "t3" by "g3". >=20 >=20 > --=20 > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql >=20