List:General Discussion« Previous MessageNext Message »
From:Larry Martell Date:January 13 2014 6:31pm
Subject:Re: grouping by the difference between values in rows
View as plain text  
On Mon, Jan 13, 2014 at 2:43 AM, Peter Brawley
<peter.brawley@stripped> wrote:
>
> On 2014-01-12 9:13 PM, Larry Martell wrote:
>>
>> On Sun, Jan 12, 2014 at 2:47 PM, Peter Brawley
>> <peter.brawley@stripped> wrote:
>>>
>>> On 2014-01-12 1:17 PM, Larry Martell wrote:
>>>>
>>>> I've been asked to do something that I do not think is possible in SQL.
>>>>
>>>> I have a query that has this basic form:
>>>>
>>>> SELECT a, b, c, d, AVG(e), STD(e), CONCAT(x, ',', y) as f
>>>> FROM t
>>>> GROUP BY a, b, c, d, f
>>>>
>>>> x and y are numbers (378.18, 2213.797 or 378.218, 2213.949 or
>>>> 10053.490, 2542.094).
>>>>
>>>> The business issue is that if either x or y in 2 rows that are in the
>>>> same a, b, c, d group are within 1 of each other then they should be
>>>> grouped together. And to make it more complicated, the tolerance is
>>>> applied as a rolling continuum. For example, if the x and y in a set
>>>> of grouped rows are:
>>>>
>>>> row 1: 1.5, 9.5
>>>> row 2: 2.4, 20.8
>>>> row 3: 3.3, 40.6
>>>> row 4: 4.2, 2.5
>>>> row 5: 5.1, 10.1
>>>> row 6: 6.0, 7.9
>>>> row 7: 8.0, 21.0
>>>> row 8: 100, 200
>>>>
>>>> 1 through 6 get combined because all their X values are within the
>>>> tolerance of some other X in the set that's been combined. 7's Y value
>>>> is within the tolerance of 2's Y, so that should be combined as well.
>>>> 8 is not combined because neither the X or Y value is within the
>>>> tolerance of any X or Y in the set that was combined.
>>>>
>>>> In python I can easily parse the data and identify the rows that need
>>>> to be combined, but then I've lost the ability to calculate the
>>>> average and std. The only way I can think of to do this is to remove
>>>> the grouping from the SQL and do all the grouping and aggregating
>>>> myself. But this query often returns 20k to 30k rows after grouping.
>>>> It could easily be 80k to 100k rows that I have to process if I remove
>>>> the grouping and I think that will be very slow.
>>>>
>>>> Anyone have any ideas?
>>>
>>>
>>> Could you compute the row-to-row values & write them to a temp table,
>>> then
>>> run the SQL that incorporates that result column?
>>
>> I thought of temp tables, but I could not come up with a way to use
>> them for this. How can I apply the x/y tolerance grouping in sql?
>
>
> Run the query you showed, saving the result to a temp table. In an sproc or
> your preferred app language, do the row-to-row processing to generate a new
> column in the temp table from the biz rules you outlined, now query the
> revised temp table as desired.

Not too clear on how this will help me. The issue with the query I
showed is that I don't have the individual rows that make up the
aggregate data of the rows I need to combine. I think I have to run a
query with no group by and do all the grouping and aggregation myself.

In any case, unfortunately this has been made a low priority task and
I've been put on to something else (I hate when they do that). I'll
revive this thread when I'm allowed to get back on this.
Thread
grouping by the difference between values in rowsLarry Martell12 Jan 2014
  • Re: grouping by the difference between values in rowsPeter Brawley12 Jan 2014
    • Re: grouping by the difference between values in rowsLarry Martell13 Jan 2014
      • Re: grouping by the difference between values in rowsPeter Brawley13 Jan 2014
        • Re: grouping by the difference between values in rowsLarry Martell13 Jan 2014
        • Re: grouping by the difference between values in rowsshawn l.green15 Jan 2014
          • Re: grouping by the difference between values in rowsLarry Martell16 Jan 2014
  • Re: grouping by the difference between values in rowshsv21 Jan 2014
    • Re: grouping by the difference between values in rowsTakeshi Hashimoto21 Jan 2014