List:General Discussion« Previous MessageNext Message »
From:Peter Brawley Date:January 13 2014 7:43am
Subject:Re: grouping by the difference between values in rows
View as plain text  
On 2014-01-12 9:13 PM, Larry Martell wrote:
> On Sun, Jan 12, 2014 at 2:47 PM, Peter Brawley
> <peter.brawley@stripped> wrote:
>> On 2014-01-12 1:17 PM, Larry Martell wrote:
>>> I've been asked to do something that I do not think is possible in SQL.
>>>
>>> I have a query that has this basic form:
>>>
>>> SELECT a, b, c, d, AVG(e), STD(e), CONCAT(x, ',', y) as f
>>> FROM t
>>> GROUP BY a, b, c, d, f
>>>
>>> x and y are numbers (378.18, 2213.797 or 378.218, 2213.949 or
>>> 10053.490, 2542.094).
>>>
>>> The business issue is that if either x or y in 2 rows that are in the
>>> same a, b, c, d group are within 1 of each other then they should be
>>> grouped together. And to make it more complicated, the tolerance is
>>> applied as a rolling continuum. For example, if the x and y in a set
>>> of grouped rows are:
>>>
>>> row 1: 1.5, 9.5
>>> row 2: 2.4, 20.8
>>> row 3: 3.3, 40.6
>>> row 4: 4.2, 2.5
>>> row 5: 5.1, 10.1
>>> row 6: 6.0, 7.9
>>> row 7: 8.0, 21.0
>>> row 8: 100, 200
>>>
>>> 1 through 6 get combined because all their X values are within the
>>> tolerance of some other X in the set that's been combined. 7's Y value
>>> is within the tolerance of 2's Y, so that should be combined as well.
>>> 8 is not combined because neither the X or Y value is within the
>>> tolerance of any X or Y in the set that was combined.
>>>
>>> In python I can easily parse the data and identify the rows that need
>>> to be combined, but then I've lost the ability to calculate the
>>> average and std. The only way I can think of to do this is to remove
>>> the grouping from the SQL and do all the grouping and aggregating
>>> myself. But this query often returns 20k to 30k rows after grouping.
>>> It could easily be 80k to 100k rows that I have to process if I remove
>>> the grouping and I think that will be very slow.
>>>
>>> Anyone have any ideas?
>>
>> Could you compute the row-to-row values & write them to a temp table, then
>> run the SQL that incorporates that result column?
> I thought of temp tables, but I could not come up with a way to use
> them for this. How can I apply the x/y tolerance grouping in sql?

Run the query you showed, saving the result to a temp table. In an sproc 
or your preferred app language, do the row-to-row processing to generate 
a new column in the temp table from the biz rules you outlined, now 
query the revised temp table as desired.

PB

-----

>

Thread
grouping by the difference between values in rowsLarry Martell12 Jan 2014
  • Re: grouping by the difference between values in rowsPeter Brawley12 Jan 2014
    • Re: grouping by the difference between values in rowsLarry Martell13 Jan 2014
      • Re: grouping by the difference between values in rowsPeter Brawley13 Jan 2014
        • Re: grouping by the difference between values in rowsLarry Martell13 Jan 2014
        • Re: grouping by the difference between values in rowsshawn l.green15 Jan 2014
          • Re: grouping by the difference between values in rowsLarry Martell16 Jan 2014
  • Re: grouping by the difference between values in rowshsv21 Jan 2014
    • Re: grouping by the difference between values in rowsTakeshi Hashimoto21 Jan 2014