From: Peter Brawley Date: January 27 2006 11:30pm Subject: Re: NOT IN vs IS NULL List-Archive: http://lists.mysql.com/mysql/194304 Message-Id: <43DAACFF.30409@earthlink.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=======AVGMAIL-43DAAD000A62=======" --=======AVGMAIL-43DAAD000A62======= Content-Type: multipart/alternative; boundary=------------080202000906000003080401 --------------080202000906000003080401 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Devananda, />Could anyone give me some insight as to which of the following queries >is "better" (and why) - or if there is another query that would be faster than either? / It's late on Friday so I could be missing something, but wouldn't the following be simpler? SELECT offer_id FROM paytable AS pt LEFT JOIN publisher_advertiser_blocks as pab USING (login_id) WHERE pab.login_id IS NULL; PB ----- > In general, I try to stay away from very large IN(..) lists because I > have seen them regularly degrade performance, but in this case the > alternative that I have found doesn't seem to perform any faster. > Could anyone give me some insight as to which of the following queries > is "better" (and why) - or if there is another query that would be > faster than either? > > I am using MySQL 4.1.14. There are three tables, > > offers: > CREATE TABLE `offers` ( > `offer_id` int(11) NOT NULL auto_increment, > `advertiser_id` int(11) NOT NULL default '0', > ... > PRIMARY KEY (`offer_id`), > KEY `advertiser_id` (`advertiser_id`) > ) > > paytable: > CREATE TABLE `paytable` ( > `offer_id` int(11) NOT NULL default '0', > `login_id` int(11) NOT NULL default '0', > ... > PRIMARY KEY (`offer_id`,`login_id`) > ) > > publisher_advertiser_blocks: > CREATE TABLE `publisher_advertiser_blocks` ( > `login_id` int(11) NOT NULL default '0', > `advertiser_id` int(11) NOT NULL default '0', > PRIMARY KEY (`login_id`,`advertiser_id`) > ) > > The goal of these queries is to select all offer_id's from `paytable` > for a known login_id where that login_id is not "blocked" from that > offer_id. While testing I simply selected count(*) to keep my result > set from crowding the screen. > > The two queries have about the same execution time but very different > EXPLAIN results... without further ado, here they are: > > explain > select count(*) from paytable > where login_id=# > and offer_id NOT IN > ( > select distinct offer_id > from offers > left join publisher_advertiser_blocks pab > using (advertiser_id) > where pab.login_id=# > ); > > +----+--------------------+----------+--------+-----------------------+---------+---------+--------------------------------------+---------+----------------------------- > > -+ > | id | select_type | table | type | possible_keys > | key | key_len | ref | rows | > Extra | > +----+--------------------+----------+--------+-----------------------+---------+---------+--------------------------------------+---------+----------------------------- > > -+ > | 1 | PRIMARY | paytable | index | NULL > | PRIMARY | 5 | NULL | 1773152 | > Using where; Using index | > | 2 | DEPENDENT SUBQUERY | offers | eq_ref | PRIMARY,advertiser_id > | PRIMARY | 4 | func | 1 | > Using where; Using temporary > | > | 2 | DEPENDENT SUBQUERY | pab | eq_ref | PRIMARY > | PRIMARY | 8 | const,affiliate.offers.advertiser_id | 1 | > Using where; Using index | > +----+--------------------+----------+--------+-----------------------+---------+---------+--------------------------------------+---------+----------------------------- > > -+ > 3 rows in set (0.00 sec) > > > > explain > select count(*) > from paytable > left join > ( > select distinct offer_id > from offers > left join publisher_advertiser_blocks pab > using (advertiser_id) > where pab.login_id=# > ) as a using (offer_id) > where login_id=# and a.offer_id IS NULL; > +----+-------------+------------+-------+---------------+---------------+---------+-----------------------------+---------+-------------------------------------------+ > > | id | select_type | table | type | possible_keys | > key | key_len | ref | rows | > Extra | > +----+-------------+------------+-------+---------------+---------------+---------+-----------------------------+---------+-------------------------------------------+ > > | 1 | PRIMARY | paytable | index | NULL | > PRIMARY | 5 | NULL | 1773152 | > Using where; Using index | > | 1 | PRIMARY | | ALL | NULL | > NULL | NULL | NULL | 309 | > Using where; Not exists | > | 2 | DERIVED | pab | ref | PRIMARY | > PRIMARY | 4 | | 2 | > Using where; Using index; Using temporary | > | 2 | DERIVED | offers | ref | advertiser_id | > advertiser_id | 2 | affiliate.pab.advertiser_id | 8 | > Using where | > +----+-------------+------------+-------+---------------+---------------+---------+-----------------------------+---------+-------------------------------------------+ > > 4 rows in set (0.01 sec) > > > > Thanks in advance, > --------------080202000906000003080401 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Devananda,

>Could anyone give me some insight as to which of the following queries
>is "better" (and why) - or if there is another query that would be faster than either?


It's late on Friday so I could be missing something, but wouldn't the following be simpler?

SELECT offer_id
FROM paytable AS pt
LEFT JOIN publisher_advertiser_blocks as pab USING (login_id)
WHERE pab.login_id IS NULL;

PB

-----
In general, I try to stay away from very large IN(..) lists because I have seen them regularly degrade performance, but in this case the alternative that I have found doesn't seem to perform any faster. Could anyone give me some insight as to which of the following queries is "better" (and why) - or if there is another query that would be faster than either?

I am using MySQL 4.1.14. There are three tables,

offers:
CREATE TABLE `offers` (
 `offer_id` int(11) NOT NULL auto_increment,
 `advertiser_id` int(11) NOT NULL default '0',
...
 PRIMARY KEY  (`offer_id`),
 KEY `advertiser_id` (`advertiser_id`)
)

paytable:
CREATE TABLE `paytable` (
 `offer_id` int(11) NOT NULL default '0',
 `login_id` int(11) NOT NULL default '0',
...
 PRIMARY KEY  (`offer_id`,`login_id`)
)

publisher_advertiser_blocks:
CREATE TABLE `publisher_advertiser_blocks` (
 `login_id` int(11) NOT NULL default '0',
 `advertiser_id` int(11) NOT NULL default '0',
 PRIMARY KEY  (`login_id`,`advertiser_id`)
)

The goal of these queries is to select all offer_id's from `paytable` for a known login_id where that login_id is not "blocked" from that offer_id. While testing I simply selected count(*) to keep my result set from crowding the screen.

The two queries have about the same execution time but very different EXPLAIN results... without further ado, here they are:

explain
select count(*) from paytable
where login_id=#
and offer_id NOT IN
(
select distinct offer_id
from offers
left join publisher_advertiser_blocks pab
using (advertiser_id)
where pab.login_id=#
);

+----+--------------------+----------+--------+-----------------------+---------+---------+--------------------------------------+---------+-----------------------------
-+
| id | select_type        | table    | type   | possible_keys         | key     | key_len | ref                                  | rows    | Extra                      |
+----+--------------------+----------+--------+-----------------------+---------+---------+--------------------------------------+---------+-----------------------------
-+
|  1 | PRIMARY            | paytable | index  | NULL                  | PRIMARY |       5 | NULL                                 | 1773152 | Using where; Using index   |
|  2 | DEPENDENT SUBQUERY | offers   | eq_ref | PRIMARY,advertiser_id | PRIMARY |       4 | func                                 |       1 | Using where; Using temporary
|
|  2 | DEPENDENT SUBQUERY | pab      | eq_ref | PRIMARY               | PRIMARY |       8 | const,affiliate.offers.advertiser_id |       1 | Using where; Using index   |
+----+--------------------+----------+--------+-----------------------+---------+---------+--------------------------------------+---------+-----------------------------
-+
3 rows in set (0.00 sec)



explain
select count(*)
from paytable
left join
(
select distinct offer_id
from offers
left join publisher_advertiser_blocks pab
using (advertiser_id)
where pab.login_id=#
) as a using (offer_id)
where login_id=# and a.offer_id IS NULL;   
+----+-------------+------------+-------+---------------+---------------+---------+-----------------------------+---------+-------------------------------------------+
| id | select_type | table      | type  | possible_keys | key           | key_len | ref                         | rows    | Extra                                     |
+----+-------------+------------+-------+---------------+---------------+---------+-----------------------------+---------+-------------------------------------------+
|  1 | PRIMARY     | paytable   | index | NULL          | PRIMARY       |       5 | NULL                        | 1773152 | Using where; Using index                  |
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL          |    NULL | NULL                        |     309 | Using where; Not exists                   |
|  2 | DERIVED     | pab        | ref   | PRIMARY       | PRIMARY       |       4 |                             |       2 | Using where; Using index; Using temporary |
|  2 | DERIVED     | offers     | ref   | advertiser_id | advertiser_id |       2 | affiliate.pab.advertiser_id |       8 | Using where                               |
+----+-------------+------------+-------+---------------+---------------+---------+-----------------------------+---------+-------------------------------------------+
4 rows in set (0.01 sec)



Thanks in advance,

--------------080202000906000003080401-- --=======AVGMAIL-43DAAD000A62======= Content-Type: text/plain; x-avg=cert; charset=us-ascii Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Content-Description: "AVG certification" No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.375 / Virus Database: 267.14.23/243 - Release Date: 1/27/2006 --=======AVGMAIL-43DAAD000A62=======--