List:General Discussion« Previous MessageNext Message »
From:Devananda Date:February 2 2006 7:05pm
Subject:Re: NOT IN vs IS NULL
View as plain text  
Shawn,

I've just found out that most of my emails during this discussion were 
NOT posted to the list because I was sending mail in HTML format and 
Yahoo was not delivering the bounce notices to me. That explains why you 
believed I was not listening to Peter's input - only _his_ messages were 
appearing on the list!

I'm adding my last two emails to the end of this one so that it doesn't 
appear as though I have ignored everyone, and in the hope that the 
conclusions I reached (with help from both of you) will be of use to 
someone else.

My sincere apologies for the confusion that I inadvertently caused.

Best regards,
Devananda

----------------

Actually, what was missing all along was "pab.login_id=1" in the ON 
clause, not the WHERE clause!!! I guess I've not used that sort of 
condition in a JOIN before...  If I rewrite your query like this, which 
is much simpler, it does in fact work very, very well!

SELECT pt.offer_id
FROM paytable AS pt
INNER JOIN offers AS o USING (offer_id)
LEFT JOIN publisher_advertiser_blocks AS pab
ON o.advertiser_id=pab.advertiser_id AND pab.login_id=1
WHERE pt.login_id=1 AND pab.advertiser_id IS NULL;

That is just the answer I was hoping for from the start! Thanks :)

On a side note, I looked at the MySQL reference manual for JOINs ( 
http://dev.mysql.com/doc/refman/5.0/en/join.html ) again, and while the 
syntax of putting a row-limiting condition on the right table in a LEFT 
JOIN isn't discussed in the actual documentation (that's where I do most 
of my reading...) it IS mentioned in a post at the bottom of the page! 
Damn, I wish I had seen that last week ...

Thanks for the help Peter



------------

SGreen@stripped wrote:
> You are correct, that the situation you describe was not clearly 
> presented in your previous reply.  I humbly apologize for any offense.
>
Apology humbly accepted. I only took offense because I have been paying 
close attention to all the suggestions that Peter and you have been 
offering, trying to learn from them (as I do with most of your posts to 
the mailing list).

> Using the EXPLAIN you posted in your latest reply, you can translate 
> your subquery into the following JOINed query
>
> SELECT p.offer_id
> FROM paytable p
> LEFT JOIN offers o
>         ON o.advertiser_id = p.advertiser_id
> WHERE pt.login_id=1
>         AND o.offer_id is null;
>
> Which is not what I think you were actually trying to write. The terms 
> selecting values from  publisher_advertiser_blocks disappeared because 
> they are on the *right* side of a LEFT JOIN and played no part in 
> actually limiting your final results.  Here is the needs statement 
> from your first post:
> >>>>The goal of these queries is to select all offer_id's from
> `paytable`
> for a known login_id where that login_id is not "blocked" from that
> offer_id.<<<<
>
>  I would write it this way
>
> CREATE TEMPORARY TABLE tmpBlocks SELECT DISTINCT
>         o.offer_id
> FROM offers
> INNER JOIN publisher_advertiser_blocks pab
>         ON pab.advertiser_id = o.advertiser_id
>         AND  pab.login_id = 1;
>
> ALTER TABLE  tmpBlocks ADD KEY(offer_id);
>
> SELECT p.offer_id
> FROM paytable p
> LEFT JOIN tmpBlocks tb
>         ON tb.offer_id = p.offer_id
> WHERE tb.offer_id IS NULL;
>
> DROP TEMPORARY TABLE tmpBlocks;
With a slight change, what you suggested does work quite well. The final 
SELECT needs "AND p.login_id=1" in the WHERE clause, or else it will 
return data for all login_id that have records in paytable.
>
> One trick to working efficiently with "larger" datasets (millions of 
> rows per table)  is to minimize the number of rows being joined at one 
> time. By breaking this query into two statements we keep our JOIN 
> combinations to a minimum so that at each successive stage we are 
> working with smaller sets of data than if we had written it as a 
> single statement.
>
Indeed! I often use temporary tables, since some of the datasets I work 
with are in the tens or hundreds of millions of rows.

> The term "pab.login_id=1" is in the ON clause because your index on 
> publisher_advertiser_block is defined in such a way that makes it 
> better to be in the ON clause than in the WHERE clause (also 
> information from your original post). There was a posting not long ago 
> that said that the statistics of a temporary table's indexes were not 
> updated if they exist before you fill the table with data.
That's very good to know! I will keep it in mind, and probably could get 
more performance out of queries I've written in the past by doing this.
>
> Again, I apologize for any offense I may have caused,
No harm done, we all live and learn :)

Best regards, and thanks again,
Devananda

-- 
http://devananda-vdv.blogspot.com/
http://mycat.sourceforge.net/

Thread
NOT IN vs IS NULLDevananda27 Jan
  • Re: NOT IN vs IS NULLPeter Brawley28 Jan
Re: NOT IN vs IS NULLPeter Brawley31 Jan
Re: NOT IN vs IS NULLPeter Brawley1 Feb
  • Re: NOT IN vs IS NULLSGreen1 Feb
    • Re: NOT IN vs IS NULLDevananda1 Feb
      • Re: NOT IN vs IS NULLSGreen1 Feb
        • Re: NOT IN vs IS NULLDevananda2 Feb