Hi! Comments inline...
> MySQL internalists:
> I want to write an assertion (for ActiveRecord, but that's not important
> here) which traps production SQL SELECT statements, calls EXPLAIN on
> each one, then parses their results, looking for tell-tail issues. (The
> goal is we wrap target code in assert_efficient_sql, then if a future
> upgrade makes the code inefficient, the assertion catches the problem
> before the clients do.)
> Here's an example of my query failing:
> def test_assert_inefficient_sql
> assert_raise_message Test::Unit::AssertionFailedError,
> /Pessimistic.*Accessory Load.*ALL/m do
> assert_efficient_sql do
> Accessory.find_by_sql('select * from accessories, rings
> where accessories.id = rings.id')
> # noooo! they don't link like that!
> That assertion would print this diagnostic (edited for width):
> pessimistic query Accessory Load!
> select * from accessories, rings where accessories.id = rings.id
> select_type | key_len | type | rows | table | ref |
> SIMPLE | | ALL | 1 | rings | |
> SIMPLE | 4 | eq_ref | 1 | accessories | testdb.rings.id |
> The assertion failed, semantically, because this application has no good
> reason to JOIN accessories and rings, so they have no index in common.
Seems to me that rings and accessories *would* have an index in common,
as shown above: the rings.id column is related to the accessories.id
> Syntactically, it fails because it has a type of ALL. (I heard somewhere
> that reading every row in a table, just to find one record, is bad DB
It has an ALL because there is no WHERE condition supplied on an indexed
column, no because of the join.
> The relevant source code, inside the assertion, just goes 'ALL' ==
> explanation ['type'].
This is a good basic check for SQL inefficiency, as it checks for full
table scans, which are indicators that an index is missing or a WHERE
condition is not being passed. However, there are *many* cases when
full table scans are perfectly fine:
* SELECTing all of a lookup table
* Data warehousing scenarios (indexes are uncommon due to data-loading
* Small tables -- the optimizer may simply do a full sequential table
scan versus numerous index->data page lookups.
So, checking for inefficiencies can be a little tricky. I think a
better rule might be something along the lines of:
if (('ALL' == explanation['type'] &&
10000 < explanation['rows'] ) ||
or something to that effect (I don't know Ruby, that's just pseudo-code,
but I think you get the idea...
Essentially, you will want to make sets of rules that you run the
EXPLAIN plan through to check for common inefficiencies...
> The question now is what else can I load up inside the assertion? What
> other checks should
> A> always,
> B> frequently, or
> C> opportunistically
> pass in the response from EXPLAIN, to detect unneeded DB loading?
Well, you might want to check the diffs of SHOW SESSION STATUS variables
also. Significant increases in the following counter variables may show
inefficient queries or indexing:
Created_tmp_disk_tables (worse than above...)
Table_locks_waited (occurs in badly written transaction code sometimes)
Select_full_join (occurs when joins are done across non-indexed columns
or when a join condition uses a function on an indexed column)
Select_full_range_join (similar to above but for range conditions)
to name just a few... :)
Qcache_lowmem_prunes (Can occur when query cache gets very fragmented)
> (Please rank suggestions as listed, because a multi-tiered
> assert_efficient_sql is more valuable than one that only targets A, such
> as "ALL"!)
> This assertion is for _unit_ tests, not customer acceptance tests or QA
> tests, so I hope to bypass the time-honored method of loading up a
> billion records, running them for 20 minutes, and profiling them.
> Developers should learn within seconds if their new SQL statements are
> inefficient, not within days, or longer.
Hope the above leads you in the right direction. cheers!