Dan Nelson wrote:
> In the last episode (Jun 18), Edward Kay said:
>> From: Dan Nelson [mailto:dnelson@stripped]
>> <snip>
>>>>>> At the moment, I have this and it works:
>>>>>>
>>>>>> select * from contact_address
>>>>>> group by primary_entity_id
>>>>>> having count(primary_entity_id) = 1
>>>>>> and is_primary = 0;
>>>>>>
>>>>>> This is fine except I want to use the result in a sub-query.
>>>>>> Since it returns two columns this doesn't work:
>>>>>>
>>>>>> update contact_address set is_primary = 1 where address_id in (
>>>>>> select * from contact_address
>>>>>> group by primary_entity_id
>>>>>> having count(primary_entity_id) = 1
>>>>>> and is_primary = 0
>>>>>> );
>>>>>>
>>>>>> Normally, I'd only return the address_id in the sub-SELECT,
>>>>>> but I need the is_primary column for the HAVING clause.
>>>>> I did some tests, and it looks like you can use aggregate
>>>>> functions in your HAVING clause without actually selecting the
>>>>> column. So "HAVING COUNT(primary_entity_id) = 1" should work
>>>>> even if you only select address_id.
>>>> Yes, that is true and it does work.
>>>>
>>>> What doesn't work however, is the extra 'AND is_primary = 0' HAVING
>>>> clause.
>>> Couldn't you move that up into a WHERE clause (still in the
>>> subquery)? It's just a regular field comparison, so it doesn't have
>>> to be in the HAVING clause.
>> No, because that's not the same thing. Imagine the following data:
>>
>> address_id primary_entity_id is_primary
>> -------------------------------------------
>> 1 293 0
>> 2 293 1
>>
>> With my query above, the sub-query result set would be empty since
>> count(primary_entity_id) = 2.
>>
>> If I move the is_primary = 0 requirement to a WHERE clause, then the
>> first row would be incorrectly updated since the group function would
>> only be working on the sub-set of data (where is_primary = 0).
>>
>> It's a subtle but important difference.
>
> Yes, that complicates things a bit... That does mean you'll need to
> pull another column in your subquery. I guess you could strip that
> column out with a subquery :) Something like:
>
> update contact_address set is_primary = 1 where address_id in (
> select address_id from (
> select address_id, primary_entity_id from contact_address
> group by primary_entity_id
> having count(primary_entity_id) = 1
> and primary_entity_id = 0
> )
> );
>
This is fine, but as I wrote earlier, MySQL does not do well with IN()
subqueries, and nesting a subquery in a dependent FROM clause is probably even
harder for it to optimize well, since that will end up being an un-indexed
temporary table. It is probably better to write this as a JOIN. I gave an
example a few days ago.
Cheers
Baron