In the last episode (Feb 01), Michael Griffith said:
> CREATE TABLE test(
> userID int, # (non-unique)
> testID int, # (non-unique)
> PRIMARY key(testid,userid)
> Suppose this table is populated with 1,000,000 rows. Then do this:
> DELETE FROM test WHERE userID=XXXX AND testID<200000 OR testID>800000;
> This query is EXTREMELY slow becasue it looks at every row in the DB.
> A significant improvement can be acheived by splitting it into 2
> DELETE FROM test WHERE userID=XXXX AND testID<200000;
> DELETE FROM test WHERE userID=XXXX AND testID>800000;
Avoiding ORs is a good idea, but I don't think your example
demonstrates it. The real problem is ORs on two different columns.
You have: "(testID<200000 AND userID=XXXX) OR testID>800000", which can
be optimized with a two-column index on (testID,userID). A better
example would be "DELETE FROM test WHERE userID=XXXX OR testID=XXXX".
> On real data I've acheived at least a ten fold increase doing this.
> This is easy to optimize from the client side, however, I don't see
> any reason why this optimization can't or shouldn't be build into the
> server. Whenever an OR can be split into two separate queries on the
> same index this optimization should work.
What you're suggesting here is splitting a query into a UNION of two
smaller queries, essentially. But this is not always the best way.
Keep in mind that your DELETE example doesn't have to worry about
records that are true for both halves of your OR (since the record
would get deleted by the first statement). If it were really a SELECT
query, mysql would have to run the first half of the query, remember
all the record numbers, then run the second query and make sure that no
duplicate records are returned:
SELECT * FROM test WHERE userID=XXXX or testID=XXXX;
i.e. full table scan
would become (if mysql supported UNION):
SELECT DISTINCT * FROM
SELECT * FROM test WHERE userID=XXXX
UNION SELECT * FROM test WHERE testID=XXXX
i.e. random record pull based on userID index +
random record pull based on testID index +
mergesort + remove duplicate records
If there is a lot of overlap between the two OR clauses, or the
percentage of returned records is high relative to the size of the
table, doing a full table scan would be faster.