List:General Discussion« Previous MessageNext Message »
From:Baron Schwartz Date:June 18 2007 4:36pm
Subject:Re: Hiding columns used in GROUP BY and HAVING clauses
View as plain text  

Dan Nelson wrote:
> In the last episode (Jun 18), Edward Kay said:
>> From: Dan Nelson [mailto:dnelson@stripped]
>> <snip>
>>>>>> At the moment, I have this and it works:
>>>>>>
>>>>>> 	select * from contact_address
>>>>>> 	group by primary_entity_id
>>>>>> 	having count(primary_entity_id) = 1
>>>>>> 	and is_primary = 0;
>>>>>>
>>>>>> This is fine except I want to use the result in a sub-query.
>>>>>> Since it returns two columns this doesn't work:
>>>>>>
>>>>>> 	update contact_address set is_primary = 1 where address_id in (
>>>>>> 		select * from contact_address
>>>>>> 		group by primary_entity_id
>>>>>> 		having count(primary_entity_id) = 1
>>>>>> 		and is_primary = 0
>>>>>> 	);
>>>>>>
>>>>>> Normally, I'd only return the address_id in the sub-SELECT,
>>>>>> but I need the is_primary column for the HAVING clause.
>>>>> I did some tests, and it looks like you can use aggregate
>>>>> functions in your HAVING clause without actually selecting the
>>>>> column.  So "HAVING COUNT(primary_entity_id) = 1" should work
>>>>> even if you only select address_id.
>>>> Yes, that is true and it does work.
>>>>
>>>> What doesn't work however, is the extra 'AND is_primary = 0' HAVING
>>>> clause.
>>> Couldn't you move that up into a WHERE clause (still in the
>>> subquery)? It's just a regular field comparison, so it doesn't have
>>> to be in the HAVING clause.
>> No, because that's not the same thing. Imagine the following data:
>>
>> address_id   primary_entity_id   is_primary
>> -------------------------------------------
>>   1            293                 0
>>   2            293                 1
>>
>> With my query above, the sub-query result set would be empty since
>> count(primary_entity_id) = 2.
>>
>> If I move the is_primary = 0 requirement to a WHERE clause, then the
>> first row would be incorrectly updated since the group function would
>> only be working on the sub-set of data (where is_primary = 0).
>>
>> It's a subtle but important difference.
> 
> Yes, that complicates things a bit...  That does mean you'll need to
> pull another column in your subquery.  I guess you could strip that
> column out with a subquery :) Something like:
> 
> update contact_address set is_primary = 1 where address_id in ( 
>   select address_id from (
>     select address_id, primary_entity_id from contact_address
>       group by primary_entity_id
>       having count(primary_entity_id) = 1
>       and primary_entity_id = 0 
>   )
> ); 
> 

This is fine, but as I wrote earlier, MySQL does not do well with IN() 
subqueries, and nesting a subquery in a dependent FROM clause is probably even 
harder for it to optimize well, since that will end up being an un-indexed 
temporary table.  It is probably better to write this as a JOIN.  I gave an 
example a few days ago.

Cheers
Baron
Thread
Hiding columns used in GROUP BY and HAVING clausesEdward Kay15 Jun
  • Re: Hiding columns used in GROUP BY and HAVING clausesBaron Schwartz15 Jun
  • Re: Hiding columns used in GROUP BY and HAVING clausesDan Nelson15 Jun
    • RE: Hiding columns used in GROUP BY and HAVING clausesEdward Kay18 Jun
      • Re: Hiding columns used in GROUP BY and HAVING clausesDan Nelson18 Jun
        • RE: Hiding columns used in GROUP BY and HAVING clausesEdward Kay18 Jun
          • Re: Hiding columns used in GROUP BY and HAVING clausesBaron Schwartz18 Jun
            • RE: Hiding columns used in GROUP BY and HAVING clausesEdward Kay18 Jun
          • Re: Hiding columns used in GROUP BY and HAVING clausesDan Nelson18 Jun
            • Re: Hiding columns used in GROUP BY and HAVING clausesBaron Schwartz18 Jun
              • RE: Hiding columns used in GROUP BY and HAVING clausesEdward Kay19 Jun