List:General Discussion« Previous MessageNext Message »
From:Paul Halliday Date:June 13 2013 12:31am
Subject:RE: How do I select all rows of table that h
ave some rows in
another table (AND, not OR)
View as plain text  
I am so, so glad that someone finally said what I think each time I see a message from you Mr. James. 

-------- Original message --------
From: Rick James <rjames@yahoo-inc.com> 
Date: 06-12-2013  8:45 PM  (GMT-04:00) 
To: Daevid Vincent <daevid@stripped>,mysql@stripped 
Subject: RE: How do I select all rows of table that have some rows in
  another table (AND, not OR) 
 
"I'm the ORIGINAL Rick James, B...."  (And, I'm still alive.)  LOL

If you are using PHP, you might want to stop at 31 bits per INT/SET.  PHP seems not to yet be in the 64-bit world.

> -----Original Message-----
> From: Daevid Vincent [mailto:daevid@stripped]
> Sent: Wednesday, June 12, 2013 11:26 AM
> To: mysql@stripped
> Subject: RE: How do I select all rows of table that have some rows in
> another table (AND, not OR)
> 
> This idea is so fancy pants and clever I *wish* it could have worked for
> me.
> I checked and we actually have 65 genres currently (with more to come I'm
> sure) LOL *FML*. I'm almost ashamed I didn't think of this. I <3 me some
> bitmasks and this solution is so elegant. It's unfortunate there isn't a
> way to use more than 64-bits natively.
> 
> You're RICK JAMES Bitch! :-p   (please tell me you know the Dave
> Chappelles
> skit I'm referring to)
> 
> > -----Original Message-----
> > From: Rick James [mailto:rjames@stripped]
> > Sent: Wednesday, June 12, 2013 9:39 AM
> > To: Daevid Vincent; mysql@stripped
> > Cc: 'shawn green'
> > Subject: RE: How do I select all rows of table that have some rows in
> > another table (AND, not OR)
> >
> > Thinking out of the box... (And posting my reply at the 'wrong' end of
> > the email.)...
> >
> > Are there fewer than 64 genres?  Use a SET or BIGINT UNSIGNED.
> >
> > > >
> >                 
> AND sg.`genre_id` IN (10,38)
> > > >
>                 
AND sg.`genre_id` NOT IN (22,61)
> >
> > -->
> >     AND  genre & ((1<<10) | (1<<38)) != 0
> >     AND  genre & ((1<<22) | (1<<61))  = 0
> >
> > This would avoid having that extra table, and save a lot of space.
> >
> > If you have more than 64 genres, then
> > Plan A: clump them into some kind of application grouping and use
> > multiple INTs/SETs.
> > Plan B: do mod & div arithmetic to compute which genre field to tackle.
> >
> > For B, something like:
> >     AND  (genre1 & (1<<0)) + (genre3 & (1<<8)) != 0
> >     AND  (genre2 & (1<<2)) + (genre6 & (1<<1))  = 0 (That's assuming
> > 10 bits per genre# field.  I would use 32 and INT
> > UNSIGNED.)
> >
> >
> >
> > > -----Original Message-----
> > > From: Daevid Vincent [mailto:daevid@stripped]
> > > Sent: Tuesday, June 11, 2013 4:17 PM
> > > To: mysql@stripped
> > > Cc: 'shawn green'
> > > Subject: RE: How do I select all rows of table that have some rows
> > > in another table (AND, not OR)
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: shawn green [mailto:shawn.l.green@stripped]
> > > > Sent: Tuesday, June 11, 2013 2:16 PM
> > > > To: mysql@stripped
> > > > Subject: Re: How do I select all rows of table that have some rows
> > > > in another table (AND, not OR)
> > > >
> > > > Hello Daevid,
> > > >
> > > > On 6/11/2013 3:59 PM, Daevid Vincent wrote:
> > > > > I am trying to implement a filter so that a user could select
> > > > > various
> > > > genres
> > > > > they want "in" or "out". Perhaps they like 'action' and 'car
> chases'
> > > > > but don't like 'foreign' and 'drifting' (or whatever...)
> > > > >
> > > > > So I want something sort of like this, however IN() is using an
> "OR"
> > > > > comparison when I need it to be an "AND"
> > > > >
> > > > > SELECT DISTINCT
> > > > >      GROUP_CONCAT(sg.`genre_id`) FROM
> > > > >          `dvds` AS d
> > > > >          JOIN `scenes_list` AS s
> > > >
>                  ON s.`dvd_id` = d.`dvd_id`
> > > > >          JOIN `scenes_genres` AS sg
> > > > >                  ON sg.`scene_id` = s.`scene_id`
> > > >
> >                 
> AND sg.`genre_id` IN (10,38)
> > > >
>                 
AND sg.`genre_id` NOT IN (22,61) GROUP BY
> > > > > s.`scene_id`;
> > > > >
> > > > > This is giving me way way too many rows returned.
> > > > >
> > > > > For example, I would expect this scene_id to be in the result set:
> > > > >
> > > > > SELECT * FROM scenes_genres WHERE scene_id = 17;
> > > > >
> > > > > scene_id  genre_id
> > > > > --------  ----------
> > > > >       
17           1
> > > > >       
17           3
> > > > >       
17          10 <--
> > > > >       
> 17          19
> > > > >       
17          38 <--
> > > > >   7          58
> > > > >       
17          59
> > > > >
> > > > > And this scene ID to NOT be in the result set:
> > > > >
> > > > > SELECT * FROM scenes_genres WHERE scene_id = 11;
> > > > >
> > > > > scene_id  genre_id
> > > > > --------  ----------
> > > > >       
> 11           1
> > > > >     11          19
> > > > >        11          31
> > > > >       
11     8
> > > > >       
11          59
> > > > >
> > > > > I've tried various subselect ideas, but of course this fails b/c
> > > genre_id
> > > > > can't be multiple things at one time (AND)
> > > > >
> > > > >     JOIN `scenes_genres` AS sg
> > > > >       ON sg.`scene_id` = s.`scene_id`
> > > > >          AND sg.`genre_id` IN (
> > > >
>            SELECT `genre_id` FROM `scenes_genres`
> > > > >                   AND `genre_id` =
38
> > > > > > > > >
> >                   
> AND `genre_id` <> 61
> > > > >         )
> > > > >
> > > > > And straight up like this failure too...
> > > > >
> > > > >    JOIN `scenes_genres` AS sg
> > > > >      ON sg.`scene_id` = s.`scene_id`
> > > > >       > >         AND NOT (sg.`genre_id` = 22 AND
sg.`genre_id` = 61)
> > > > >
> > > > > So I'm sort of out of ideas at this point and hoping someone has
> > > > > a way
> > > to
> > > > do
> > > > > this.
> > > > >
> > > > > Also, just for S&G this is how we are currently implementing it,
> > > > > but we
> > > > feel
> > > > > the REGEXP is killing our queries and while "clever" is a bit
> > > > > hacky and nullifies any indexes we have on the genres column as
> > > > > it requires a file_sort table scan to compare substrings
> basically...
> > > > >
> > > > > SELECT * FROM scene_all_genres WHERE scene_id = 17;
> > > > >
> > > > > scene_id  genres
> > > > > --------  ------------------------
> > > > >        17  1|3|10|19|38|53|58|59|
> > > > >
> > > > > SELECT * FROM scene_all_genres WHERE scene_id = 11;
> > > > >
> > > > > scene_id  genres
> > > > > --------  -------------------
> > > > >        11 
> 1|10|19|31|32|59|
> > > > >
> > > > > SELECT DISTINCT
> > > > >          s.`scene_id` AS `id`,
> > > > >          sg.`genres`
> > > > > FROM
> > > > >         
`scene_all_genres` AS sg,
> > > > >          `dvds` AS
d,
> > > > >          `scenes_list` AS
> s
> > > > > WHERE dvd_id` = d.`dvd_id`
> > > > >          AND sg.`scene_id` = s.`scene_id`
> > > > >          AND sg.`genres` REGEXP
> '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]'
> > > > >          AND
sg.`genres` NOT REGEXP
> > > > '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))'
> > > > >
> > > > > ;
> > > > >
> > > > > http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_rege
> > > > > xp
> > > > >
> > > > > [[:<:]], [[:>:]]
> > > > >
> > > > > These markers stand for word boundaries. They match the
> > > > > beginning and
> > > end
> > > > of
> > > > > words, respectively. A word is a sequence of word characters
> > > > > that is not preceded by or followed by word characters. A word
> > > > > character is an alphanumeric character in the alnum class or an
> underscore (_).
> > > > >
> > > >
> > > > To me it looks like you want to know how to match N for N when
> > > > looking for subset of properties. It's a multi-step process using
> > > > plain SQL but it's fast:
> > > >
> > > > 1) Build a list of scenes containing the genres you want to see
> > > >
> > > > CREATE TEMPORARY TABLE tmpList(key(scene_id, genre_id)) SELECT
> > > > scene_id, genre_id FROM scenes_genres WHERE genre_id` IN (10,38)
> > > > <-- in this case you have 2 terms to mach
> > > >
> > > > 2) From that list, determine which of those scenes also contain
> > > > unwanted genres and remove them.
> > > >
> > > > DELETE tmpList
> > > > FROM tmpList t
> > > > INNER JOIN scenes_genres sg
> > > >    on sg.scene_id = t.scene_id
> > > >    and sg.genre_id IN (22,61)
> > > >
> > > > # at this point, tmpList contains all scenes that have any of the
> > > > desired genres but none of the unwanted ones.
> > > >
> > > > 3) Check to see if any scene has all N matches.
> > > >
> > > > SELECT scene_id, count(genre_id) as matches FROM tmpList GROUP BY
> > > > scene_id HAVING matches = 2
> > > > /* ^--- this is the number of terms you are trying to match. */
> > > >
> > > >
> > > > Of course you can modify the last query to eliminate the HAVING
> > > > clause and pick the top 5 matching scenes (even if they are
> > > > partial matches) like this
> > > >
> > > > SELECT scene_id, count(genre_id) as matches FROM tmpList GROUP BY
> > > > scene_id ORDER BY matches DESC LIMIT 5
> > > >
> > > > Let us know if this is not what you wanted to do.
> > > >
> > > >
> > > >
> > > > Note to the rest of the list:
> > > >
> > > > Yes, we could have done most of this with self-joins or
> > > > subqueries, an EXISTS, and a NOT EXISTS inside a single statement
> > > > but I think the execution time would have been miserable. I also
> > > > think that that approach also would not have allowed us to
> > > > evaluate a partial match (like 5 out of 7 target genres), only
> > > > complete matches would have been returned.
> > > >
> > > > I am still very interested in seeing alternative solutions :)
> > > >
> > > > --
> > > > Shawn Green
> > > > MySQL Principal Technical Support Engineer Oracle USA, Inc. -
> > > > Hardware and Software, Engineered to Work Together.
> > > > Office: Blountville, TN
> > >
> > > Shawn, thank you for taking the time to reply. I wasn't expecting
> > > the solution to be so much "work" with multiple statements like
> > > that. I was thinking it could be done in one (or two, as in split
> > > out a portion of
> it
> > > in
> > > PHP and re-insert it to the original SQL to avoid a JOIN or
> something).
> > > Part
> > > of the issue is that we use PHP to generate the $sql string by
> > > appending bits and pieces depending on the search criteria thereby
> > > keeping the 'path'
> > > through the SQL statement simple and relatively linear. To implement
> this
> > > would require significant re-writing and/or special cases where we
> > > could introduce errors or omissions in the future. The frustrating
> > > part is
> that
> > > the REGEXP query we use now only takes about 2 seconds on my DEV VM
> (same
> > > database as PROD), however when the RDBMS is loaded it then takes up
> > > to
> 30
> > > seconds so in theory it's not even that inefficient given the #
> > > rows. We do use memcached for the results, but since there are so
> > > many combinations
> a
> > > user could choose, our hit ratio is not so great and therefore the
> > > cache isn't doing us much good and this is why the RDBMS can get
> > > loaded up easily.
> > >
> > > How can an "OR" be so simple using IN() but "AND" be so overly
> complex?
> > > Seems that mysql should have another function for ALL() that works
> > > just like
> > > IN() to handle this kind of scenario.
> > >
> > >
> > >
> > >
> > > --
> > > MySQL General Mailing List
> > > For list archives: http://lists.mysql.com/mysql
> > > To unsubscribe:    http://lists.mysql.com/mysql
> >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:    http://lists.mysql.com/mysql
> 
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Thread
RE: How do I select all rows of table that have some rows in
another table (AND, not OR)
Paul Halliday13 Jun