List:General Discussion« Previous MessageNext Message »
From:Daevid Vincent Date:June 12 2013 6:30pm
Subject:RE: How do I select all rows of table that have some rows in another table (AND, not OR)
View as plain text  
Oh! I must have misread. I didn't see how you had a solution for > 64 bits.
I may have to experiment with that!

> -----Original Message-----
> From: Daevid Vincent [mailto:daevid@stripped]
> Sent: Wednesday, June 12, 2013 11:26 AM
> To: mysql@stripped
> Subject: RE: How do I select all rows of table that have some rows in
> another table (AND, not OR)
> 
> This idea is so fancy pants and clever I *wish* it could have worked for
> me. I checked and we actually have 65 genres currently (with more to come
> I'm sure) LOL *FML*. I'm almost ashamed I didn't think of this. I <3 me
> some bitmasks and this solution is so elegant. It's unfortunate there
> isn't a way to use more than 64-bits natively.
> 
> You're RICK JAMES Bitch! :-p   (please tell me you know the Dave
> Chappelles skit I'm referring to)
> 
> > -----Original Message-----
> > From: Rick James [mailto:rjames@stripped]
> > Sent: Wednesday, June 12, 2013 9:39 AM
> > To: Daevid Vincent; mysql@stripped
> > Cc: 'shawn green'
> > Subject: RE: How do I select all rows of table that have some rows in
> > another table (AND, not OR)
> >
> > Thinking out of the box... (And posting my reply at the 'wrong' end of
> the
> > email.)...
> >
> > Are there fewer than 64 genres?  Use a SET or BIGINT UNSIGNED.
> >
> > > > >                  AND sg.`genre_id` IN (10,38)
> > > > >                  AND sg.`genre_id` NOT IN (22,61)
> >
> > -->
> >     AND  genre & ((1<<10) | (1<<38)) != 0
> >     AND  genre & ((1<<22) | (1<<61))  = 0
> >
> > This would avoid having that extra table, and save a lot of space.
> >
> > If you have more than 64 genres, then
> > Plan A: clump them into some kind of application grouping and use
> multiple
> > INTs/SETs.
> > Plan B: do mod & div arithmetic to compute which genre field to tackle.
> >
> > For B, something like:
> >     AND  (genre1 & (1<<0)) + (genre3 & (1<<8)) != 0
> >     AND  (genre2 & (1<<2)) + (genre6 & (1<<1))  = 0
> > (That's assuming 10 bits per genre# field.  I would use 32 and INT
> > UNSIGNED.)
> >
> >
> >
> > > -----Original Message-----
> > > From: Daevid Vincent [mailto:daevid@stripped]
> > > Sent: Tuesday, June 11, 2013 4:17 PM
> > > To: mysql@stripped
> > > Cc: 'shawn green'
> > > Subject: RE: How do I select all rows of table that have some rows in
> > > another table (AND, not OR)
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: shawn green [mailto:shawn.l.green@stripped]
> > > > Sent: Tuesday, June 11, 2013 2:16 PM
> > > > To: mysql@stripped
> > > > Subject: Re: How do I select all rows of table that have some rows
> in
> > > > another table (AND, not OR)
> > > >
> > > > Hello Daevid,
> > > >
> > > > On 6/11/2013 3:59 PM, Daevid Vincent wrote:
> > > > > I am trying to implement a filter so that a user could select
> > > > > various
> > > > genres
> > > > > they want "in" or "out". Perhaps they like 'action' and 'car
> chases'
> > > > > but don't like 'foreign' and 'drifting' (or whatever...)
> > > > >
> > > > > So I want something sort of like this, however IN() is using an
> "OR"
> > > > > comparison when I need it to be an "AND"
> > > > >
> > > > > SELECT DISTINCT
> > > > >          s.`scene_id` AS `id`,
> > > > >          GROUP_CONCAT(sg.`genre_id`) FROM
> > > > >          `dvds` AS d
> > > > >          JOIN `scenes_list` AS s
> > > > >                  ON s.`dvd_id` = d.`dvd_id`
> > > > >          JOIN `scenes_genres` AS sg
> > > > >                  ON sg.`scene_id` = s.`scene_id`
> > > > >                  AND sg.`genre_id` IN (10,38)
> > > > >                  AND sg.`genre_id` NOT IN (22,61) GROUP BY
> > > > > s.`scene_id`;
> > > > >
> > > > > This is giving me way way too many rows returned.
> > > > >
> > > > > For example, I would expect this scene_id to be in the result
> set:
> > > > >
> > > > > SELECT * FROM scenes_genres WHERE scene_id = 17;
> > > > >
> > > > > scene_id  genre_id
> > > > > --------  ----------
> > > > >        17           1
> > > > >        17           3
> > > > >        17          10 <--
> > > > >        17          19
> > > > >        17          38 <--
> > > > >        17          53
> > > > >        17          58
> > > > >        17          59
> > > > >
> > > > > And this scene ID to NOT be in the result set:
> > > > >
> > > > > SELECT * FROM scenes_genres WHERE scene_id = 11;
> > > > >
> > > > > scene_id  genre_id
> > > > > --------  ----------
> > > > >        11           1
> > > > >        11          10 <--
> > > > >        11          19
> > > > >        11          31
> > > > >        11          32
> > > > > 		       <-- but does not have 38
> > > > >        11          59
> > > > >
> > > > > I've tried various subselect ideas, but of course this fails b/c
> > > genre_id
> > > > > can't be multiple things at one time (AND)
> > > > >
> > > > >     JOIN `scenes_genres` AS sg
> > > > >       ON sg.`scene_id` = s.`scene_id`
> > > > >          AND sg.`genre_id` IN (
> > > > >            SELECT `genre_id` FROM `scenes_genres`
> > > > >             WHERE `genre_id` = 10
> > > > >                    AND `genre_id` = 38
> > > > >                    AND `genre_id` <> 22
> > > > >                    AND `genre_id` <> 61
> > > > >         )
> > > > >
> > > > > And straight up like this failure too...
> > > > >
> > > > >    JOIN `scenes_genres` AS sg
> > > > >      ON sg.`scene_id` = s.`scene_id`
> > > > >         AND (sg.`genre_id` = 10 AND sg.`genre_id` = 38)
> > > > >         AND NOT (sg.`genre_id` = 22 AND sg.`genre_id` = 61)
> > > > >
> > > > > So I'm sort of out of ideas at this point and hoping someone has
> a
> > > > > way
> > > to
> > > > do
> > > > > this.
> > > > >
> > > > > Also, just for S&G this is how we are currently implementing
> it,
> but
> > > > > we
> > > > feel
> > > > > the REGEXP is killing our queries and while "clever" is a bit
> hacky
> > > > > and nullifies any indexes we have on the genres column as it
> > > > > requires a file_sort table scan to compare substrings
> basically...
> > > > >
> > > > > SELECT * FROM scene_all_genres WHERE scene_id = 17;
> > > > >
> > > > > scene_id  genres
> > > > > --------  ------------------------
> > > > >        17  1|3|10|19|38|53|58|59|
> > > > >
> > > > > SELECT * FROM scene_all_genres WHERE scene_id = 11;
> > > > >
> > > > > scene_id  genres
> > > > > --------  -------------------
> > > > >        11  1|10|19|31|32|59|
> > > > >
> > > > > SELECT DISTINCT
> > > > >          s.`scene_id` AS `id`,
> > > > >          sg.`genres`
> > > > > FROM
> > > > >          `scene_all_genres` AS sg,
> > > > >          `dvds` AS d,
> > > > >          `scenes_list` AS s
> > > > > WHERE 	 dvd_id` = d.`dvd_id`
> > > > >          AND sg.`scene_id` = s.`scene_id`
> > > > >          AND sg.`genres` REGEXP
> '[[:<:]]10[[:>:]].*[[:<:]]38[[:>:]]'
> > > > >          AND sg.`genres` NOT REGEXP
> > > > '(([[:<:]]22[[:>:]])|([[:<:]]61[[:>:]]))'
> > > > >
> > > > > ;
> > > > >
> > > > >
> http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp
> > > > >
> > > > > [[:<:]], [[:>:]]
> > > > >
> > > > > These markers stand for word boundaries. They match the
> beginning
> > > > > and
> > > end
> > > > of
> > > > > words, respectively. A word is a sequence of word characters
> that
> is
> > > > > not preceded by or followed by word characters. A word character
> is
> > > > > an alphanumeric character in the alnum class or an underscore
> (_).
> > > > >
> > > >
> > > > To me it looks like you want to know how to match N for N when
> looking
> > > > for subset of properties. It's a multi-step process using plain SQL
> > > > but it's fast:
> > > >
> > > > 1) Build a list of scenes containing the genres you want to see
> > > >
> > > > CREATE TEMPORARY TABLE tmpList(key(scene_id, genre_id)) SELECT
> > > > scene_id, genre_id FROM scenes_genres WHERE genre_id` IN (10,38)
> <--
> > > > in this case you have 2 terms to mach
> > > >
> > > > 2) From that list, determine which of those scenes also contain
> > > > unwanted genres and remove them.
> > > >
> > > > DELETE tmpList
> > > > FROM tmpList t
> > > > INNER JOIN scenes_genres sg
> > > >    on sg.scene_id = t.scene_id
> > > >    and sg.genre_id IN (22,61)
> > > >
> > > > # at this point, tmpList contains all scenes that have any of the
> > > > desired genres but none of the unwanted ones.
> > > >
> > > > 3) Check to see if any scene has all N matches.
> > > >
> > > > SELECT scene_id, count(genre_id) as matches FROM tmpList GROUP BY
> > > > scene_id HAVING matches = 2
> > > > /* ^--- this is the number of terms you are trying to match. */
> > > >
> > > >
> > > > Of course you can modify the last query to eliminate the HAVING
> clause
> > > > and pick the top 5 matching scenes (even if they are partial
> matches)
> > > > like this
> > > >
> > > > SELECT scene_id, count(genre_id) as matches
> > > > FROM tmpList
> > > > GROUP BY scene_id
> > > > ORDER BY matches DESC
> > > > LIMIT 5
> > > >
> > > > Let us know if this is not what you wanted to do.
> > > >
> > > >
> > > >
> > > > Note to the rest of the list:
> > > >
> > > > Yes, we could have done most of this with self-joins or subqueries,
> an
> > > > EXISTS, and a NOT EXISTS inside a single statement but I think the
> > > > execution time would have been miserable. I also think that that
> > > > approach also would not have allowed us to evaluate a partial match
> > > > (like 5 out of 7 target genres), only complete matches would have
> been
> > > > returned.
> > > >
> > > > I am still very interested in seeing alternative solutions :)
> > > >
> > > > --
> > > > Shawn Green
> > > > MySQL Principal Technical Support Engineer
> > > > Oracle USA, Inc. - Hardware and Software, Engineered to Work
> Together.
> > > > Office: Blountville, TN
> > >
> > > Shawn, thank you for taking the time to reply. I wasn't expecting the
> > > solution to be so much "work" with multiple statements like that. I
> was
> > > thinking it could be done in one (or two, as in split out a portion of
> it
> > > in
> > > PHP and re-insert it to the original SQL to avoid a JOIN or
> something).
> > > Part
> > > of the issue is that we use PHP to generate the $sql string by
> appending
> > > bits and pieces depending on the search criteria thereby keeping the
> > > 'path'
> > > through the SQL statement simple and relatively linear. To implement
> this
> > > would require significant re-writing and/or special cases where we
> could
> > > introduce errors or omissions in the future. The frustrating part is
> that
> > > the REGEXP query we use now only takes about 2 seconds on my DEV VM
> (same
> > > database as PROD), however when the RDBMS is loaded it then takes up
> to 30
> > > seconds so in theory it's not even that inefficient given the # rows.
> We
> > > do
> > > use memcached for the results, but since there are so many
> combinations a
> > > user could choose, our hit ratio is not so great and therefore the
> cache
> > > isn't doing us much good and this is why the RDBMS can get loaded up
> > > easily.
> > >
> > > How can an "OR" be so simple using IN() but "AND" be so overly
> complex?
> > > Seems that mysql should have another function for ALL() that works
> just
> > > like
> > > IN() to handle this kind of scenario.
> > >
> > >
> > >
> > >
> > > --
> > > MySQL General Mailing List
> > > For list archives: http://lists.mysql.com/mysql
> > > To unsubscribe:    http://lists.mysql.com/mysql
> >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:    http://lists.mysql.com/mysql

Thread
How do I select all rows of table that have some rows in another table (AND, not OR)Daevid Vincent11 Jun
  • Re: How do I select all rows of table that have some rows in anothertable (AND, not OR)shawn green11 Jun
    • RE: How do I select all rows of table that have some rows in another table (AND, not OR)Daevid Vincent11 Jun
      • RE: How do I select all rows of table that have some rows inanother table (AND, not OR)Rick James12 Jun
        • RE: How do I select all rows of table that have some rows in another table (AND, not OR)Daevid Vincent12 Jun
          • RE: How do I select all rows of table that have some rows inanother table (AND, not OR)Rick James12 Jun
      • Re: How do I select all rows of table that have some rows in anothertable (AND, not OR)shawn green12 Jun
  • Re: How do I select all rows of table that have some rows inanother table (AND, not OR)hsv13 Jun
RE: How do I select all rows of table that have some rows in another table (AND, not OR)Daevid Vincent12 Jun
Re: How do I select all rows of table that have some rows inanother table (AND, not OR)hsv13 Jun
RE: How do I select all rows of table that have some rows in another table (AND, not OR)hsv18 Jun