On 11/22/2012 7:14 PM, hsv@stripped wrote:
>>>>> 2012/11/22 14:30 +0000, Neil Tompkins >>>>
> I'm struggling with what I think is a basic select but can't think how to
> do it : My data is
> From this I what to get a distinct list of id where the type equals 2 and 5
> Any ideas ?
> This ugly one, which generalizes:
> select id,group_concat(type) AS tl from the_table group by id having
> find_in_set('2',tl) and find_in_set('5',tl)
> Ugly becaus it involves so much converting between number & string.
> For full generality one would indeed write
> GROUP_CONCAT(type ORDER BY type)
> and pass my "tl" and a string, say '1,2', to a procedure that using SUBSTRING_INDEX
> taking the strings for arrays ensures that all found in the first string is also in the
> second string. There are times when I wish SQL had arrays.
The fun part of solving this is to remember that SQL is a set-oriented
language. For each element in the set, none of them can be both 2 and 5
at the same time. So, you have to build two sets and check to see which
rows are in both.
One pattern works if you need to aggregate for just a few terms
from (select distinct id from mytable where type=2) a
INNER JOIN (select distinct id from mytable where type=5) b
However, this gets numerically very expensive with more than a few JOINS
to the pattern. Also, there is no index on either of the temporary
results (a or b) so this is a full Cartesian product of both tables.
That means that although it gives you a correct answer, it will not
scale to 100000's of rows (or more) in either set.
So, here is a way to assemble the same result that uses much less
resources. Remember, each row you want is a member of a set.
CREATE TEMPORARY TABLE tmpList (
, type int
, PRIMARY KEY (id,type)
INSERT IGNORE tmpList
WHERE type in (2,5)
SELECT id, count(type) hits
GROUP BY id
DROP TEMPORARY TABLE tmpList
Can you see why this works?
I created an indexed subset of rows that match either value (2 or 5) but
only keep one example of each. I accomplished that by the combination of
PRIMARY KEY and INSERT IGNORE. Then I counted many type values each ID
value represented in the subset. If I looked for 2 terms and I ended up
with hits=2, then I know that those ID values matched on both terms.
You can expand on this pattern to also do partial (M of N search terms)
or best-fit determinations.
I hope this was the kind of help you were looking for.
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN