List:General Discussion« Previous MessageNext Message »
From:James Nobis Date:April 21 2005 4:19pm
Subject:Re: possible bug left join and null
View as plain text  
Thanks everyone for such quick and thorough responses!

Quoting SGreen@stripped:

> James Nobis <james_nobis@stripped> wrote on 04/21/2005
> 10:44:07 AM:
>
>> The problem is something fairly simple but yet MySQL seems to make this
>> complicated.  Essentially, find a list of customers who have not
>> bought product
>> X ever.  (Customers have orders, orders have order line items).  All
>> 3 coworkers
>> independently arrived at the same sql which failed to work.  Then, we
> wrote it
>> as a subquery which has performance issue and finally rewrote it with a
> temp
>> table and a join.  However, it seems like what we had should have
> worked.
>>
>> Borrowing from http://builder.com.com/5100-6388_14-5532304.html about
> midway
>> down the page I set out to create an identical schema and query in
> MySQL.
>>
>> CREATE TABLE `Customer` (
>>   `id` int(11) NOT NULL default '0',
>>   `name` varchar(255) NOT NULL default ''
>> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
>>
>> INSERT INTO `Customer` VALUES (1, 'bob');
>> INSERT INTO `Customer` VALUES (2, 'nathan');
>>
>> CREATE TABLE `Order` (
>>   `id` int(11) NOT NULL auto_increment,
>>   `customer_id` int(11) NOT NULL default '0',
>>   `order_date` datetime NOT NULL default '0000-00-00 00:00:00',
>>   PRIMARY KEY  (`id`)
>> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;
>>
>> INSERT INTO `Order` VALUES (1, 1, '0000-00-00 00:00:00');
>> INSERT INTO `Order` VALUES (2, 2, '0000-00-00 00:00:00');
>>
>> CREATE TABLE `OrderLines` (
>>   `order_id` int(11) NOT NULL default '0',
>>   `product_id` int(11) NOT NULL default '0',
>>   `quantity` int(11) NOT NULL default '0'
>> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
>>
>> INSERT INTO `OrderLines` VALUES (1, 5, 1);
>> INSERT INTO `OrderLines` VALUES (1, 9, 1);
>> INSERT INTO `OrderLines` VALUES (2, 15, 1);
>> INSERT INTO `OrderLines` VALUES (2, 25, 1);
>>
>> Then, I run the following query:
>> SELECT DISTINCT Customer.id, Customer.name
>> FROM Customer
>> LEFT JOIN `Order` ON Customer.id = Order.customer_id
>> INNER JOIN OrderLines ON Order.id = OrderLines.order_id
>> AND OrderLines.product_id =9
>> WHERE Order.customer_id IS NULL
>>
>> I would expect this to return a single row with Customer.id 2.
>>
>> Is there something obvious my coworkers and I are missing?
>>
>> James Nobis
>> Web Developer
>> Academic Superstore
>> 223 W. Anderson Ln. Suite A110, Austin, TX 78752
>> Voice: (512) 450-1199 x453 Fax: (512) 450-0263
>> http://www.academicsuperstore.com
>>
> It's hard to remember where I picked this up but I once read that it's
> generally bad form to start with an outer join (LEFT or RIGHT JOIN) and
> move into an INNER JOIN like you are doing. Because if the rows from the
> Order table are optional to the results of the query, the rows from the
> OrderLines are transitively optional as well (if an Order row doesn't
> exist then there can't be any OrderLine rows either). So an equivalent
> form of your query could have been:
>
> SELECT DISTINCT Customer.id, Customer.name
> FROM Customer
> LEFT JOIN `Order`
>        ON Customer.id = Order.customer_id
> LEFT JOIN OrderLines
>        ON Order.id = OrderLines.order_id
>        AND OrderLines.product_id =9
> WHERE Order.customer_id IS NULL;
>
> But this won't help you to determine if a Customer had NEVER ordered that
> product because you are including Order rows regardless of whether that
> order had a product #9 on it or not. I then tried a nested JOIN using
> parentheses like this and got no names:
>
> SELECT DISTINCT Customer.id, Customer.name
> FROM Customer
> LEFT JOIN (`Order`
> INNER JOIN OrderLines
>        ON Order.id = OrderLines.order_id
>        AND OrderLines.product_id =9
> ) ON Customer.id = Order.customer_id
> WHERE Order.customer_id IS NULL;
>
> The unfiltered results of that join look like this(sorry if it wraps):
>
> SELECT *
> FROM Customer
> LEFT JOIN (
>        `Order` INNER JOIN OrderLines
>                ON Order.id = OrderLines.order_id
>                AND OrderLines.product_id =9
> ) ON Customer.id = Order.customer_id;
>
> +----+--------+----+-------------+---------------------+----------+------------+----------+
> | id | name   | id | customer_id | order_date          | order_id |
> product_id | quantity |
>
> +----+--------+----+-------------+---------------------+----------+------------+----------+
> |  1 | bob    |  1 |           1 | 0000-00-00 00:00:00 |        1 |   9 |
>     1 |
> |  2 | nathan |  1 |           1 | 0000-00-00 00:00:00 |     NULL | NULL |
>    NULL |
> |  1 | bob    |  2 |           2 | 0000-00-00 00:00:00 |     NULL | NULL |
>    NULL |
> |  2 | nathan |  2 |           2 | 0000-00-00 00:00:00 |     NULL | NULL |
>    NULL |
>
> +----+--------+----+-------------+---------------------+----------+------------+----------+
> 4 rows in set (0.00 sec)
>
> Each customer has at least one order so the nested JOIN didn't work to
> find your answer either (BTW- nested joins are not documented as a valid
> syntax so I wasn't sure if it was going to work or not).
>
> However, I thought, why not do exactly what the original question stated:
> count how many times product 9 appears as a line item on an order and
> return the names of the customers where that count is 0.
>
> SELECT Customer.id
>        , Customer.name
>        , COUNT(orderlines.product_id) as LineItemCount
> FROM Customer
> LEFT JOIN `Order`
>        ON Customer.id = Order.customer_id
> LEFT JOIN OrderLines
>        ON Order.id = OrderLines.order_id
>        AND OrderLines.product_id =9
> GROUP BY 1,2;
> +----+--------+---------------+
> | id | name   | LineItemCount |
> +----+--------+---------------+
> |  1 | bob    |             1 |
> |  2 | nathan |             0 |
> +----+--------+---------------+
> 2 rows in set (0.01 sec)
>
> All we need now is a HAVING clause to pick out those who have never
> ordered #9:
>
> SELECT Customer.id
>        , Customer.name
>        , COUNT(orderlines.product_id) as LineItemCount
> FROM Customer
> LEFT JOIN `Order`
>        ON Customer.id = Order.customer_id
> LEFT JOIN OrderLines
>        ON Order.id = OrderLines.order_id
>        AND OrderLines.product_id =9
> GROUP BY 1,2
> HAVING LineItemCount=0;
> +----+--------+---------------+
> | id | name   | LineItemCount |
> +----+--------+---------------+
> |  2 | nathan |             0 |
> +----+--------+---------------+
> 1 row in set (0.00 sec)
>
> Which is the results you wanted, right?  Why didn't your original query
> work? I can't say for sure but I am sure it has something to do with the
> fact that your INNER join was subordinate to your LEFT join. There are
> several bugs about similar situations (mixing LEFT, RIGHT, and INNER in
> the same query, mixing LEFT and RIGHT) and I think the development team
> are still trying to work out the correct algorithms to use to apply the
> correct logic algebra to this class of query. Who knows, maybe your query
> will be the one that helps the light go off in their head so they can get
> this all straightened out.  Until they do, try to keep your INNER joins
> superior to your OUTER joins and you should stay out of trouble or
> refactor your query to precompute your subordinate INNER join into a temp
> table and work with it from there.
>
> With that advice in mind, this may be a faster solution
>
> CREATE TEMPORARY TABLE tmpOrders(KEY(customer_id))
> SELECT DISTINCT o.customer_id
> FROM Order o
> INNER JOIN OrderLines ol
>        ON ol.order_id = o.id
>
> SELECT DISTINCT Customer.id, Customer.name
> FROM Customer
> LEFT JOIN tmpOrders
>        ON Customer.id = tmpOrders.customer_id
> WHERE tmpOrders.customer_id is null;
>
> DROP TABLE tmpOrders;
>
> FWIW....
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine



James Nobis
Web Developer
Academic Superstore
223 W. Anderson Ln. Suite A110, Austin, TX 78752
Voice: (512) 450-1199 x453 Fax: (512) 450-0263
http://www.academicsuperstore.com

Thread
possible bug left join and nullJames Nobis21 Apr
  • Re: possible bug left join and nullJigal van Hemert21 Apr
  • Re: possible bug left join and nullSGreen21 Apr
    • Re: possible bug left join and nullJames Nobis21 Apr
  • Re: possible bug left join and nullBrent Baisley21 Apr