I want to run a query which show the top selling other videos bought by
people who have bought a specified video at our shop.
The information required is split over 2 tables - sale, which has the
details of the individual sale including the customer ID, and date of the
sale, and sale_items, which has the details of the items in that sale.
So, to do this neatly, I use a double self join thus:
SELECT si2.itemid, COUNT(si2.itemid) as ordered
FROM sale_items as si1, sale as s1,
sale_items as si2, sale as s2
WHERE s1.saleid = si1.saleid
AND s2.saleid = si2.saleid
AND s1.customerid = s2.customerid
AND si1.itemid = 7000000023298
AND si2.itemid != 7000000023298
AND si1.status != 99 # Don't count cancelled orders
AND si2.status != 99 # Don't count cancelled orders
GROUP BY si2.itemid
ORDER BY ordered DESC, s2.when DESC
LIMIT 5
Now, this works fine, and is fast (approx .04 secs on average)
However, if I want to not just eliminate cancelled orders (status 99),
but all orders that are still in progress, ie only include orders that
are shipped (status 40), and change the status != 99 to
status = 40, the query takes over 20 seconds! (there are, of course,
indexes on status, itemid and saleid)
An explain on the second query gives:
+-------+--------+------------------+---------+---------+---------------+------+------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra
|
+-------+--------+------------------+---------+---------+---------------+------+------------+
| si1 | ref | saleid,itemid | itemid | 13 | 7000000023298 | 10 | where
used |
| s1 | eq_ref | PRIMARY,personid | PRIMARY | 3 | si1.saleid | 1 |
|
| si2 | ref | saleid,status | status | 1 | ??? | 19 | where
used |
| s2 | eq_ref | PRIMARY,personid | PRIMARY | 3 | si2.saleid | 1 | where
used |
+-------+--------+------------------+---------+---------+---------------+------+------------+
whereas on the first it gives:
+-------+--------+------------------+----------+---------+---------------+------+------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra
|
+-------+--------+------------------+----------+---------+---------------+------+------------+
| si1 | ref | saleid,itemid | itemid | 13 | 7000000023298 | 10 | where
used |
| s1 | eq_ref | PRIMARY,personid | PRIMARY | 3 | si1.saleid | 1 |
|
| s2 | ref | PRIMARY,personid | personid | 3 | s1.customerid | 16 |
|
| si2 | ref | saleid | saleid | 3 | s2.saleid | 18 | where
used |
+-------+--------+------------------+----------+---------+---------------+------+------------+
This is on mysql-3.21.33c under Debian Linux 2.0.35 which is still
our production server database.
Under 3.22.20a, both queries take about .88 seonds, and give an explain
of:
+-------+--------+------------------+----------+---------+---------------+-------+------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra
|
+-------+--------+------------------+----------+---------+---------------+-------+------------+
| si1 | range | status | status | NULL | NULL | 43123 |
|
| s1 | eq_ref | PRIMARY,personid | PRIMARY | 3 | si1.saleid | 1 |
|
| s2 | ref | PRIMARY,personid | personid | 3 | s1.customerid | 16 |
|
| si2 | ref | saleid | saleid | 3 | s2.saleid | 18 | where
used |
+-------+--------+------------------+----------+---------+---------------+-------+------------+
Which seems to be doing it differently again ...
Would a straight_join help get it to about .04 for both queries? Or what's
going on?
Tony
--
-----------------------------------------------------------------------------
Tony Bowden | tony@stripped http://www.blackstar.co.uk/
Black Star | The UK's Biggest Video & DVD store * Free Postage Worldwide
-----------------------------------------------------------------------------
| Thread |
|---|
| • Strange index behaviour on double self join | tony | 26 Mar |