From: Shawn Green Date: December 19 2012 3:11am Subject: Re: Basic SELECT help List-Archive: http://lists.mysql.com/mysql/228833 Message-Id: <50D13073.9070508@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi Neil, On 11/22/2012 7:14 PM, hsv@stripped wrote: >>>>> 2012/11/22 14:30 +0000, Neil Tompkins >>>> > I'm struggling with what I think is a basic select but can't think how to > do it : My data is > > id,type > > 1000,5 > 1001,5 > 1002,2 > 1001,2 > 1003,2 > 1005,2 > 1006,1 > > From this I what to get a distinct list of id where the type equals 2 and 5 > > Any ideas ? > <<<<<<<< > This ugly one, which generalizes: > > select id,group_concat(type) AS tl from the_table group by id having find_in_set('2',tl) and find_in_set('5',tl) > > Ugly becaus it involves so much converting between number & string. > > For full generality one would indeed write > GROUP_CONCAT(type ORDER BY type) > and pass my "tl" and a string, say '1,2', to a procedure that using SUBSTRING_INDEX taking the strings for arrays ensures that all found in the first string is also in the second string. There are times when I wish SQL had arrays. The fun part of solving this is to remember that SQL is a set-oriented language. For each element in the set, none of them can be both 2 and 5 at the same time. So, you have to build two sets and check to see which rows are in both. One pattern works if you need to aggregate for just a few terms SELECT a.id from (select distinct id from mytable where type=2) a INNER JOIN (select distinct id from mytable where type=5) b on a.id=b.id However, this gets numerically very expensive with more than a few JOINS to the pattern. Also, there is no index on either of the temporary results (a or b) so this is a full Cartesian product of both tables. That means that although it gives you a correct answer, it will not scale to 100000's of rows (or more) in either set. So, here is a way to assemble the same result that uses much less resources. Remember, each row you want is a member of a set. CREATE TEMPORARY TABLE tmpList ( id int , type int , PRIMARY KEY (id,type) ) INSERT IGNORE tmpList SELECT id,type FROM mytable WHERE type in (2,5) SELECT id, count(type) hits FROM tmplist GROUP BY id HAVING hits=2 DROP TEMPORARY TABLE tmpList Can you see why this works? I created an indexed subset of rows that match either value (2 or 5) but only keep one example of each. I accomplished that by the combination of PRIMARY KEY and INSERT IGNORE. Then I counted many type values each ID value represented in the subset. If I looked for 2 terms and I ended up with hits=2, then I know that those ID values matched on both terms. You can expand on this pattern to also do partial (M of N search terms) or best-fit determinations. I hope this was the kind of help you were looking for. Regards, -- Shawn Green MySQL Principal Technical Support Engineer Oracle USA, Inc. - Hardware and Software, Engineered to Work Together. Office: Blountville, TN