List:General Discussion« Previous MessageNext Message »
From:SGreen Date:February 1 2006 8:07pm
Subject:Re: NOT IN vs IS NULL
View as plain text  
You are correct, that the situation you describe was not clearly presented 
in your previous reply.  I humbly apologize for any offense.

Using the EXPLAIN you posted in your latest reply, you can translate your 
subquery into the following JOINed query

SELECT p.offer_id
FROM paytable p
LEFT JOIN offers o
        ON o.advertiser_id = p.advertiser_id
WHERE pt.login_id=1
        AND o.offer_id is null;

Which is not what I think you were actually trying to write. The terms 
selecting values from  publisher_advertiser_blocks disappeared because 
they are on the *right* side of a LEFT JOIN and played no part in actually 
limiting your final results.  Here is the needs statement from your first 
post: 
>>>>The goal of these queries is to select all offer_id's from `paytable` 
for a known login_id where that login_id is not "blocked" from that 
offer_id.<<<<

 I would write it this way

CREATE TEMPORARY TABLE tmpBlocks SELECT DISTINCT
        o.offer_id
FROM offers
INNER JOIN publisher_advertiser_blocks pab
        ON pab.advertiser_id = o.advertiser_id
        AND  pab.login_id = 1;

ALTER TABLE  tmpBlocks ADD KEY(offer_id);

SELECT p.offer_id
FROM paytable p
LEFT JOIN tmpBlocks tb
        ON tb.offer_id = p.offer_id
WHERE tb.offer_id IS NULL;

DROP TEMPORARY TABLE tmpBlocks;

One trick to working efficiently with "larger" datasets (millions of rows 
per table)  is to minimize the number of rows being joined at one time. By 
breaking this query into two statements we keep our JOIN combinations to a 
minimum so that at each successive stage we are working with smaller sets 
of data than if we had written it as a single statement.

The term "pab.login_id=1" is in the ON clause because your index on 
publisher_advertiser_block is defined in such a way that makes it better 
to be in the ON clause than in the WHERE clause (also information from 
your original post). There was a posting not long ago that said that the 
statistics of a temporary table's indexes were not updated if they exist 
before you fill the table with data. That reason and the fact that it's 
faster to build an index on a populated table than populate an indexed 
table is why I added the index after filling the table with data.

Again, I apologize for any offense I may have caused,

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



Devananda <karnah805@stripped> wrote on 02/01/2006 02:17:36 PM:

> SGreen@stripped wrote:
> > Devananda,
> >
> > I have to support Peter on this one.  What he submitted to you is a 
> > perfectly appropriate solution. It seems as though you rejected his 
> > assistance before even trying to see if it would work. 
> On the contrary, I have tried his suggestions and they do not work (see 
> the email I just sent to the list). I have also tried, but obviously not 

> done very well, to explain why they will not work in this particular 
> case. The queries Peter has suggested actually take many minutes to 
> return when run on the real data, whereas my initial subquery takes only 

> a few seconds. I had initially hoped to find a way, using a JOIN, that 
> would take less than a second.
> 
> 
> d1 mysql> explain  SELECT DISTINCT pt.offer_id FROM paytable AS pt INNER 

> JOIN offers AS o USING (offer_id) LEFT JOIN publisher_advertiser_blocks 
> AS pab1   ON pt.login_id=pab1.login_id LEFT JOIN 
> publisher_advertiser_blocks AS pab2 ON 
> o.advertiser_id=pab2.advertiser_id WHERE pt.login_id=1   AND 
> pab1.login_id IS NULL OR pab2.advertiser_id IS NULL;
> +----+-------------+-------+--------+---------------+---------
> +---------+-----------------------+---------
> +------------------------------------+
> | id | select_type | table | type   | possible_keys | key     | key_len 
> | ref                   | rows    | Extra                              |
> +----+-------------+-------+--------+---------------+---------
> +---------+-----------------------+---------
> +------------------------------------+
> |  1 | SIMPLE      | pt    | index  | PRIMARY       | PRIMARY |       5 
> | NULL                  | 1814973 | Using index; Using temporary       |
> |  1 | SIMPLE      | o     | eq_ref | PRIMARY       | PRIMARY |       4 
> | affiliate.pt.offer_id |       1 | Using where; Distinct              |
> |  1 | SIMPLE      | pab1  | ref    | PRIMARY       | PRIMARY |       4 
> | affiliate.pt.login_id |       7 | Using index; Distinct              |
> |  1 | SIMPLE      | pab2  | index  | NULL          | PRIMARY |       8 
> | NULL                  |      62 | Using where; Using index; Distinct |
> +----+-------------+-------+--------+---------------+---------
> +---------+-----------------------+---------
> +------------------------------------+
> 
> 
> 
> d1 mysql> explain SELECT offer_id FROM paytable WHERE login_id=1 AND 
> offer_id NOT IN (  SELECT offer_id  FROM offers  LEFT JOIN 
> publisher_advertiser_blocks AS pab USING (advertiser_id)  WHERE 
> pab.login_id=1 );
> +----+--------------------+----------+--------
> +-----------------------+---------+---------
> 
+--------------------------------------+---------+--------------------------+
> | id | select_type        | table    | type   | possible_keys         | 
> key     | key_len | ref                                  | rows    | 
> Extra                    |
> +----+--------------------+----------+--------
> +-----------------------+---------+---------
> 
+--------------------------------------+---------+--------------------------+
> |  1 | PRIMARY            | paytable | index  | NULL                  | 
> PRIMARY |       5 | NULL                                 | 1814973 | 
> Using where; Using index |
> |  2 | DEPENDENT SUBQUERY | offers   | eq_ref | PRIMARY,advertiser_id | 
> PRIMARY |       4 | func                                 |       1 | 
> Using where              |
> |  2 | DEPENDENT SUBQUERY | pab      | eq_ref | PRIMARY               | 
> PRIMARY |       8 | const,affiliate.offers.advertiser_id |       1 | 
> Using where; Using index |
> +----+--------------------+----------+--------
> +-----------------------+---------+---------
> 
+--------------------------------------+---------+--------------------------+
> 3 rows in set (0.03 sec)
> 
> > There have been and 
> > continue to be SQL-driven databases around that have not had and do 
not 
> > have subquery support. They function perfectly well without them.  By 
your 
> > response, it appears to me you don't know how a LEFT JOIN is supposed 
to 
> > operate. That is one of the most basic tools of any relational DBA's 
> > arsenal.
> > 
> Perhaps I didn't phrase my response well, regarding 1to1 relationship, 
> but I do understand how to use a LEFT JOIN. I freely admit I don't have 
> as much experience with all the types of joins, or in other DBMS, as 
> many of the folks on this list. Also, I _know_ that subqueries and IN 
> lists are not nearly as efficient as JOINs. That is  exactly _why_ I 
> asked in the first place!!
> 
> :)
> 
> Regards,
> Devananda
> 
> 
> > I implore everyone not comfortable with them to learn their "JOIN"s 
(INNER 
> > JOIN, LEFT JOIN, RIGHT JOIN, CROSS JOIN, etc.) before you learn how to 

> > subquery (and I do NOT mean that comma-separated crap that poses as a 
> > CROSS JOIN). So many of the performance problems posted to this list 
are 
> > subquery-related when a simple, straightforward JOIN or two will solve 
the 
> > same problem usually with _much_ better response time. This post is a 
> > perfect example.
> >
> > Explicit JOIN statements have the advantage of possibly using indexes 
at 
> > every layer of data aggregation. Subqueries lose that ability at the 
level 
> > of each subquery.  Some subqueries, depending on how they are written, 
are 
> > executed once PER ROW of your parent query data. If it takes .01 
seconds 
> > to run a subquery by itself and you have 10000 rows of parent data, 
that 
> > would be 100 seconds of nothing but subquery execution time before you 

> > even get to the GROUP BY or ORDER BY processing part of the query. 
> > Rewriting the same query to use an explicit JOIN may turn the .01 
seconds 
> > of the "naked" subquery into .02-.04 seconds because of the indexing 
> > advantage and because it computes a set of matches between the table 
only 
> > once.
> >
> > Subqueries have their uses and are appropriate for many situations. 
> > However, IMHO, they should be your second-line of attack because of 
their 
> > inherent weaknesses. I believe this to be true for the majority of the 

> > RDBM systems in the world today.
> >
> > Shawn Green
> > Database Administrator
> > Unimin Corporation - Spruce Pine
> >
> > 
> 
> -- 
> Devananda vdv
> 
> 
> http://devananda-vdv.blogspot.com/
> http://mycat.sourceforge.net/
> 

Thread
NOT IN vs IS NULLDevananda27 Jan
  • Re: NOT IN vs IS NULLPeter Brawley28 Jan
Re: NOT IN vs IS NULLPeter Brawley31 Jan
Re: NOT IN vs IS NULLPeter Brawley1 Feb
  • Re: NOT IN vs IS NULLSGreen1 Feb
    • Re: NOT IN vs IS NULLDevananda1 Feb
      • Re: NOT IN vs IS NULLSGreen1 Feb
        • Re: NOT IN vs IS NULLDevananda2 Feb