List:General Discussion« Previous MessageNext Message »
From:Daevid Vincent Date:June 12 2013 6:25pm
Subject:RE: How do I select all rows of table that have some rows in another table (AND, not OR)
View as plain text  
This idea is so fancy pants and clever I *wish* it could have worked for me.
I checked and we actually have 65 genres currently (with more to come I'm
sure) LOL *FML*. I'm almost ashamed I didn't think of this. I <3 me some
bitmasks and this solution is so elegant. It's unfortunate there isn't a way
to use more than 64-bits natively.

You're RICK JAMES Bitch! :-p   (please tell me you know the Dave Chappelles
skit I'm referring to)

> -----Original Message-----
> From: Rick James [mailto:rjames@stripped]
> Sent: Wednesday, June 12, 2013 9:39 AM
> To: Daevid Vincent; mysql@stripped
> Cc: 'shawn green'
> Subject: RE: How do I select all rows of table that have some rows in
> another table (AND, not OR)
> 
> Thinking out of the box... (And posting my reply at the 'wrong' end of the
> email.)...
> 
> Are there fewer than 64 genres?  Use a SET or BIGINT UNSIGNED.
> 
> > > >                  AND sg.`genre_id` IN (10,38)
> > > >                  AND sg.`genre_id` NOT IN (22,61)
> 
> -->
>     AND  genre & ((1<<10) | (1<<38)) != 0
>     AND  genre & ((1<<22) | (1<<61))  = 0
> 
> This would avoid having that extra table, and save a lot of space.
> 
> If you have more than 64 genres, then
> Plan A: clump them into some kind of application grouping and use multiple
> INTs/SETs.
> Plan B: do mod & div arithmetic to compute which genre field to tackle.
> 
> For B, something like:
>     AND  (genre1 & (1<<0)) + (genre3 & (1<<8)) != 0
>     AND  (genre2 & (1<<2)) + (genre6 & (1<<1))  = 0
> (That's assuming 10 bits per genre# field.  I would use 32 and INT
> UNSIGNED.)
> 
> 
> 
> > -----Original Message-----
> > From: Daevid Vincent [mailto:daevid@stripped]
> > Sent: Tuesday, June 11, 2013 4:17 PM
> > To: mysql@stripped
> > Cc: 'shawn green'
> > Subject: RE: How do I select all rows of table that have some rows in
> > another table (AND, not OR)
> >
> >
> >
> > > -----Original Message-----
> > > From: shawn green [mailto:shawn.l.green@stripped]
> > > Sent: Tuesday, June 11, 2013 2:16 PM
> > > To: mysql@stripped
> > > Subject: Re: How do I select all rows of table that have some rows in
> > > another table (AND, not OR)
> > >
> > > Hello Daevid,
> > >
> > > On 6/11/2013 3:59 PM, Daevid Vincent wrote:
> > > > I am trying to implement a filter so that a user could select
> > > > various
> > > genres
> > > > they want "in" or "out". Perhaps they like 'action' and 'car chases'
> > > > but don't like 'foreign' and 'drifting' (or whatever...)
> > > >
> > > > So I want something sort of like this, however IN() is using an "OR"
> > > > comparison when I need it to be an "AND"
> > > >
> > > > SELECT DISTINCT
> > > >          s.`scene_id` AS `id`,
> > > >          GROUP_CONCAT(sg.`genre_id`) FROM
> > > >          `dvds` AS d
> > > >          JOIN `scenes_list` AS s
> > > >                  ON s.`dvd_id` = d.`dvd_id`
> > > >          JOIN `scenes_genres` AS sg
> > > >                  ON sg.`scene_id` = s.`scene_id`
> > > >                  AND sg.`genre_id` IN (10,38)
> > > >                  AND sg.`genre_id` NOT IN (22,61) GROUP BY
> > > > s.`scene_id`;
> > > >
> > > > This is giving me way way too many rows returned.
> > > >
> > > > For example, I would expect this scene_id to be in the result set:
> > > >
> > > > SELECT * FROM scenes_genres WHERE scene_id = 17;
> > > >
> > > > scene_id  genre_id
> > > > --------  ----------
> > > >        17           1
> > > >        17           3
> > > >        17          10 <--
> > > >        17          19
> > > >        17          38 <--
> > > >        17          53
> > > >        17          58
> > > >        17          59
> > > >
> > > > And this scene ID to NOT be in the result set:
> > > >
> > > > SELECT * FROM scenes_genres WHERE scene_id = 11;
> > > >
> > > > scene_id  genre_id
> > > > --------  ----------
> > > >        11           1
> > > >        11          10 <--
> > > >        11          19
> > > >        11          31
> > > >        11          32
> > > > 		       <-- but does not have 38
> > > >        11          59
> > > >
> > > > I've tried various subselect ideas, but of course this fails b/c
> > genre_id
> > > > can't be multiple things at one time (AND)
> > > >
> > > >     JOIN `scenes_genres` AS sg
> > > >       ON sg.`scene_id` = s.`scene_id`
> > > >          AND sg.`genre_id` IN (
> > > >            SELECT `genre_id` FROM `scenes_genres`
> > > >             WHERE `genre_id` = 10
> > > >                    AND `genre_id` = 38
> > > >                    AND `genre_id` <> 22
> > > >                    AND `genre_id` <> 61
> > > >         )
> > > >
> > > > And straight up like this failure too...
> > > >
> > > >    JOIN `scenes_genres` AS sg
> > > >      ON sg.`scene_id` = s.`scene_id`
> > > >         AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38)
> > > >         AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61)
> > > >
> > > > So I'm sort of out of ideas at this point and hoping someone has a
> > > > way
> > to
> > > do
> > > > this.
> > > >
> > > > Also, just for S&G this is how we are currently implementing it,
> but
> > > > we
> > > feel
> > > > the REGEXP is killing our queries and while "clever" is a bit hacky
> > > > and nullifies any indexes we have on the genres column as it
> > > > requires a file_sort table scan to compare substrings basically...
> > > >
> > > > SELECT * FROM scene_all_genres WHERE scene_id = 17;
> > > >
> > > > scene_id  genres
> > > > --------  ------------------------
> > > >        17  1|3|10|19|38|53|58|59|
> > > >
> > > > SELECT * FROM scene_all_genres WHERE scene_id = 11;
> > > >
> > > > scene_id  genres
> > > > --------  -------------------
> > > >        11  1|10|19|31|32|59|
> > > >
> > > > SELECT DISTINCT
> > > >          s.`scene_id` AS `id`,
> > > >          sg.`genres`
> > > > FROM
> > > >          `scene_all_genres` AS sg,
> > > >          `dvds` AS d,
> > > >          `scenes_list` AS s
> > > > WHERE 	 dvd_id` = d.`dvd_id`
> > > >          AND sg.`scene_id` = s.`scene_id`
> > > >          AND sg.`genres` REGEXP
> '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]'
> > > >          AND sg.`genres` NOT REGEXP
> > > '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))'
> > > >
> > > > ;
> > > >
> > > > http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp
> > > >
> > > > [[:<:]], [[:>:]]
> > > >
> > > > These markers stand for word boundaries. They match the beginning
> > > > and
> > end
> > > of
> > > > words, respectively. A word is a sequence of word characters that is
> > > > not preceded by or followed by word characters. A word character is
> > > > an alphanumeric character in the alnum class or an underscore (_).
> > > >
> > >
> > > To me it looks like you want to know how to match N for N when looking
> > > for subset of properties. It's a multi-step process using plain SQL
> > > but it's fast:
> > >
> > > 1) Build a list of scenes containing the genres you want to see
> > >
> > > CREATE TEMPORARY TABLE tmpList(key(scene_id, genre_id)) SELECT
> > > scene_id, genre_id FROM scenes_genres WHERE genre_id` IN (10,38)  <--
> > > in this case you have 2 terms to mach
> > >
> > > 2) From that list, determine which of those scenes also contain
> > > unwanted genres and remove them.
> > >
> > > DELETE tmpList
> > > FROM tmpList t
> > > INNER JOIN scenes_genres sg
> > >    on sg.scene_id = t.scene_id
> > >    and sg.genre_id IN (22,61)
> > >
> > > # at this point, tmpList contains all scenes that have any of the
> > > desired genres but none of the unwanted ones.
> > >
> > > 3) Check to see if any scene has all N matches.
> > >
> > > SELECT scene_id, count(genre_id) as matches FROM tmpList GROUP BY
> > > scene_id HAVING matches = 2
> > > /* ^--- this is the number of terms you are trying to match. */
> > >
> > >
> > > Of course you can modify the last query to eliminate the HAVING clause
> > > and pick the top 5 matching scenes (even if they are partial matches)
> > > like this
> > >
> > > SELECT scene_id, count(genre_id) as matches
> > > FROM tmpList
> > > GROUP BY scene_id
> > > ORDER BY matches DESC
> > > LIMIT 5
> > >
> > > Let us know if this is not what you wanted to do.
> > >
> > >
> > >
> > > Note to the rest of the list:
> > >
> > > Yes, we could have done most of this with self-joins or subqueries, an
> > > EXISTS, and a NOT EXISTS inside a single statement but I think the
> > > execution time would have been miserable. I also think that that
> > > approach also would not have allowed us to evaluate a partial match
> > > (like 5 out of 7 target genres), only complete matches would have been
> > > returned.
> > >
> > > I am still very interested in seeing alternative solutions :)
> > >
> > > --
> > > Shawn Green
> > > MySQL Principal Technical Support Engineer
> > > Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
> > > Office: Blountville, TN
> >
> > Shawn, thank you for taking the time to reply. I wasn't expecting the
> > solution to be so much "work" with multiple statements like that. I was
> > thinking it could be done in one (or two, as in split out a portion of
it
> > in
> > PHP and re-insert it to the original SQL to avoid a JOIN or something).
> > Part
> > of the issue is that we use PHP to generate the $sql string by appending
> > bits and pieces depending on the search criteria thereby keeping the
> > 'path'
> > through the SQL statement simple and relatively linear. To implement
this
> > would require significant re-writing and/or special cases where we could
> > introduce errors or omissions in the future. The frustrating part is
that
> > the REGEXP query we use now only takes about 2 seconds on my DEV VM
(same
> > database as PROD), however when the RDBMS is loaded it then takes up to
30
> > seconds so in theory it's not even that inefficient given the # rows. We
> > do
> > use memcached for the results, but since there are so many combinations
a
> > user could choose, our hit ratio is not so great and therefore the cache
> > isn't doing us much good and this is why the RDBMS can get loaded up
> > easily.
> >
> > How can an "OR" be so simple using IN() but "AND" be so overly complex?
> > Seems that mysql should have another function for ALL() that works just
> > like
> > IN() to handle this kind of scenario.
> >
> >
> >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:    http://lists.mysql.com/mysql
> 
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql

Thread
How do I select all rows of table that have some rows in another table (AND, not OR)Daevid Vincent11 Jun
  • Re: How do I select all rows of table that have some rows in anothertable (AND, not OR)shawn green11 Jun
    • RE: How do I select all rows of table that have some rows in another table (AND, not OR)Daevid Vincent11 Jun
      • RE: How do I select all rows of table that have some rows inanother table (AND, not OR)Rick James12 Jun
        • RE: How do I select all rows of table that have some rows in another table (AND, not OR)Daevid Vincent12 Jun
          • RE: How do I select all rows of table that have some rows inanother table (AND, not OR)Rick James12 Jun
      • Re: How do I select all rows of table that have some rows in anothertable (AND, not OR)shawn green12 Jun
  • Re: How do I select all rows of table that have some rows inanother table (AND, not OR)hsv13 Jun
RE: How do I select all rows of table that have some rows in another table (AND, not OR)Daevid Vincent12 Jun
Re: How do I select all rows of table that have some rows inanother table (AND, not OR)hsv13 Jun
RE: How do I select all rows of table that have some rows in another table (AND, not OR)hsv18 Jun