List:General Discussion« Previous MessageNext Message »
From:Shawn Green Date:December 19 2012 3:11am
Subject:Re: Basic SELECT help
View as plain text  
Hi Neil,

On 11/22/2012 7:14 PM, hsv@stripped wrote:
>>>>> 2012/11/22 14:30 +0000, Neil Tompkins >>>>
> I'm struggling with what I think is a basic select but can't think how to
> do it : My data is
>
> id,type
>
> 1000,5
> 1001,5
> 1002,2
> 1001,2
> 1003,2
> 1005,2
> 1006,1
>
>  From this I what to get a distinct list of id where the type equals 2 and 5
>
> Any ideas ?
> <<<<<<<<
> This ugly one, which generalizes:
>
> select id,group_concat(type) AS tl from the_table group by id having
> find_in_set('2',tl) and find_in_set('5',tl)
>
> Ugly becaus it involves so much converting between number & string.
>
> For full generality one would indeed write
> GROUP_CONCAT(type ORDER BY type)
> and pass my "tl" and a string, say '1,2', to a procedure that using SUBSTRING_INDEX
> taking the strings for arrays ensures that all found in the first string is also in the
> second string. There are times when I wish SQL had arrays.

The fun part of solving this is to remember that SQL is a set-oriented 
language. For each element in the set, none of them can be both 2 and 5 
at the same time. So, you have to build two sets and check to see which 
rows are in both.

One pattern works if you need to aggregate for just a few terms

SELECT a.id
from (select distinct id from mytable where type=2) a
INNER JOIN (select distinct id from mytable where type=5) b
   on a.id=b.id

However, this gets numerically very expensive with more than a few JOINS 
to the pattern.  Also, there is no index on either of the temporary 
results (a or b) so this is a full Cartesian product of both tables. 
That means that although it gives you a correct answer, it will not 
scale to 100000's of rows (or more) in either set.


So, here is a way to assemble the same result that uses much less 
resources. Remember, each row you want is a member of a set.

CREATE TEMPORARY TABLE tmpList (
   id int
, type int
, PRIMARY KEY (id,type)
)

INSERT IGNORE tmpList
SELECT id,type
FROM mytable
WHERE type in (2,5)

SELECT id, count(type) hits
FROM tmplist
GROUP BY id
HAVING hits=2

DROP TEMPORARY TABLE tmpList

Can you see why this works?

I created an indexed subset of rows that match either value (2 or 5) but 
only keep one example of each. I accomplished that by the combination of 
PRIMARY KEY and INSERT IGNORE. Then I counted many type values each ID 
value represented in the subset.  If I looked for 2 terms and I ended up 
with hits=2, then I know that those ID values matched on both terms.

You can expand on this pattern to also do partial (M of N search terms) 
or best-fit determinations.

I hope this was the kind of help you were looking for.

Regards,
-- 
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN


Thread
Basic SELECT helpNeil Tompkins22 Nov
  • Re: Basic SELECT helpMike OK22 Nov
  • Re: Basic SELECT helpmarek gutowski22 Nov
    • Re: Basic SELECT helpNeil Tompkins22 Nov
      • Re: Basic SELECT helpBenaya Paul22 Nov
        • Re: Basic SELECT helpNeil Tompkins22 Nov
  • RE: Basic SELECT helpJason Trebilcock22 Nov
    • Re: Basic SELECT helpMichael Dykman22 Nov
      • Re: Basic SELECT helpNeil Tompkins22 Nov
        • Re: Basic SELECT helpMichael Dykman22 Nov
      • Re: Basic SELECT helpNeil Tompkins22 Nov
      • Re: Basic SELECT helpNeil Tompkins22 Nov
      • Re: Basic SELECT helpNeil Tompkins22 Nov
  • Re: Basic SELECT helphsv22 Nov
Re: Basic SELECT helpMogens Melander22 Nov
Re: Basic SELECT helpMogens Melander22 Nov
  • Re: Basic SELECT helpMichael Dykman22 Nov
Re: Basic SELECT helphsv23 Nov
  • Re: Basic SELECT helpShawn Green19 Dec