List:General Discussion« Previous MessageNext Message »
From:Michael Stassen Date:September 19 2005 3:06pm
Subject:Re: Query matches twice, but not simultaneously...
View as plain text  
Hobbs, Richard wrote:
 > Hello,
 >
 > I have two tables - one containing messages, and another containing links
 > between messages and other messages in a tree structure (much like a
 > threaded mailing list archiving thing). A single message can have
 > multiple "parents" though, meaning the links table can have several
 > entries for a single message.
 > For example:
 >
 > ----------------
 > child    parent
 > 10        5
 > 10        7
 > 11        5
 > 12        7
 > 13        5
 > 13        7
 > ----------------
 >
 > I would like to display all messages which match both 5 and 7 in terms of
 > the parent_id, meaning messages 10 and 13 would be displayed.
 >
 > I have used the following query:
 >
 > ------------------------------------------------------------
 > SELECT DISTINCT message.username,message.content
 > FROM message,links WHERE links.child_id = message.id AND (
 > links.parent_id = 5 OR links.parent_id = 7
 > );
 > ------------------------------------------------------------
 >
 > NOTE: Without the word DISTINCT, if this query finds a message that
 > matches both 5 AND 7, it will display the message twice. I have obviously
 > used DISTINCT as an easy way to get around this problem.
 >
 > However, this query displays the message if it matches 5 OR 7. I only
 > want it to be displayed if it matches 5 AND 7.
 >
 > However, if i change the word "OR" to "AND", it displays no message at
 > all!
 >
 > I presume this is because it finds two instances of each message, neither
 > of which match both 5 and 7 (i.e. the first instance matches 5, but not
 > 7, and the second instance matches 7, but not 5).
 >
 > Does anyone know how i can get around this problem?
 >
 > Thanks in advance to anyone who can help! :-)
 >
 > Richard.


Peter Brawley wrote:
 > Richard
 >
 > >I would like to display all messages which match both 5 and 7 in terms
 > >of the parent_id, meaning messages 10 and 13 would be displayed.
 >
 > SELECT f1.child
 > FROM foo AS f1
 > INNER JOIN foo AS f2 USING(child)
 > WHERE  f1.parent=5 AND f2.parent=7;
 >
 > PB

Hobbs, Richard wrote:
 > Hello,
 >
 > Perfect :-)
 >
 > Thank you,
 > Hobbs.

Peter's solution is a self-join.  Here it is translated to your tables:

   SELECT message.username, message.content
   FROM message
   JOIN links l1 ON l1.child_id = message.id
   JOIN links l2 ON l2.child_id = message.id
   WHERE l1.parent_id = 5
     AND l2.parent_id = 7;

This works fine.  For completeness, I'll point out an alternate solution.

Your original query, before you added DISTINCT, produced two rows for each 
message you wanted, and 1 row for each message that had one, but not both, 
of the desired parents.  That is, number of rows per message equals number 
of matching criteria.  We can use this difference to select only the rows 
you want:

   SELECT message.username, message.content
   FROM message
   JOIN links ON links.child_id = message.id
   WHERE links.parent_id IN (5, 7)
   GROUP BY message.id
   HAVING COUNT(*) = 2;

This replaces a JOIN with a GROUP BY, which may be faster.  You might want 
to test both ways to see which works better for your data.

If you will ever need messages with more than 2 specified parents, I think 
you'll find the second method works better.  The self-join method requires 
an additional JOIN and an additional WHERE condition for each requirement. 
For example, here's the self join for 3 criteria:

   SELECT message.username, message.content
   FROM message
   JOIN links l1 ON l1.child_id = message.id
   JOIN links l2 ON l2.child_id = message.id
   JOIN links l3 ON l3.child_id = message.id
   WHERE l1.parent_id = 5
     AND l2.parent_id = 7
     AND l3.parent_id = 8;

In contrast, the GROUP BY solution changes very little:

   SELECT message.username, message.content
   FROM message
   JOIN links ON links.child_id = message.id
   WHERE links.parent_id IN (5, 7, 8)
   GROUP BY message.id
   HAVING COUNT(*) = 3;

The extra criteria are added to the IN list, and the HAVING clause is 
changed to look for COUNT(*) = number_of_criteria.  In this case, notice 
that we have replaced two JOINs with one GROUP BY.

(Note: For the GROUP BY versions, I am assuming there is a unique value of 
message.username and message.content for each message.id, and I'm using a 
mysql extension 
<http://dev.mysql.com/doc/mysql/en/group-by-hidden-fields.html>.  If the 
assumption is wrong, or you want portability, change the GROUP BY clause to 
"GROUP BY message.username, message.content".)

Michael
Thread
Query matches twice, but not simultaneously...Richard Hobbs18 Sep
  • Re: Query matches twice, but not simultaneously...Peter Brawley19 Sep
    • Re: Query matches twice, but not simultaneously...Richard Hobbs19 Sep
      • Re: Query matches twice, but not simultaneously...Michael Stassen19 Sep