List:Internals« Previous MessageNext Message »
From:Jay Pipes Date:January 3 2008 2:40pm
Subject:Re: what should an assert_efficient_sql check for?
View as plain text  
Hi!  Comments inline...

Phlip wrote:
> MySQL internalists:
> 
> I want to write an assertion (for ActiveRecord, but that's not important 
> here) which traps production SQL SELECT statements, calls EXPLAIN on 
> each one, then parses their results, looking for tell-tail issues. (The 
> goal is we wrap target code in assert_efficient_sql, then if a future 
> upgrade makes the code inefficient, the assertion catches the problem 
> before the clients do.)
> 
> Here's an example of my query failing:
> 
>   def test_assert_inefficient_sql
>     assert_raise_message Test::Unit::AssertionFailedError,
>                         /Pessimistic.*Accessory Load.*ALL/m do
>       assert_efficient_sql do
>         Accessory.find_by_sql('select * from accessories, rings
>                                 where accessories.id = rings.id')
>                              #  noooo! they don't link like that!
>       end
>     end
>   end
> 
> That assertion would print this diagnostic (edited for width):
> 
> pessimistic query Accessory Load!
> select * from accessories, rings where accessories.id = rings.id
>  select_type | key_len | type   | rows | table       | ref             | 
> key
> -------------------------------------------------------------------------------- 
> 
>  SIMPLE      |         | ALL    | 1    | rings       |                 |
>  SIMPLE      | 4       | eq_ref | 1    | accessories | testdb.rings.id | 
> PRIMARY
> 
> The assertion failed, semantically, because this application has no good 
> reason to JOIN accessories and rings, so they have no index in common.

Seems to me that rings and accessories *would* have an index in common, 
as shown above: the rings.id column is related to the accessories.id 
column, no?

> Syntactically, it fails because it has a type of ALL. (I heard somewhere 
> that reading every row in a table, just to find one record, is bad DB 
> Karma.)

It has an ALL because there is no WHERE condition supplied on an indexed 
column, no because of the join.

> The relevant source code, inside the assertion, just goes 'ALL' == 
> explanation ['type'].

This is a good basic check for SQL inefficiency, as it checks for full 
table scans, which are indicators that an index is missing or a WHERE 
condition is not being passed.  However, there are *many* cases when 
full table scans are perfectly fine:

* SELECTing all of a lookup table
* Data warehousing scenarios (indexes are uncommon due to data-loading 
performance issues)
* Small tables -- the optimizer may simply do a full sequential table 
scan versus numerous index->data page lookups.

So, checking for inefficiencies can be a little tricky.  I think a 
better rule might be something along the lines of:

if (('ALL' == explanation['type'] &&
	10000 < explanation['rows'] ) ||
	explanation['Extra'].contains('Using filesort'))

or something to that effect (I don't know Ruby, that's just pseudo-code, 
but I think you get the idea...

Essentially, you will want to make sets of rules that you run the 
EXPLAIN plan through to check for common inefficiencies...

> The question now is what else can I load up inside the assertion? What 
> other checks should
> 
>   A> always,
>   B> frequently, or
>   C> opportunistically
> 
> pass in the response from EXPLAIN, to detect unneeded DB loading?

Well, you might want to check the diffs of SHOW SESSION STATUS variables 
also.  Significant increases in the following counter variables may show 
inefficient queries or indexing:

Created_tmp_tables
Created_tmp_disk_tables (worse than above...)
Table_locks_waited (occurs in badly written transaction code sometimes)
Sort_merge_passes
Select_full_join (occurs when joins are done across non-indexed columns 
or when a join condition uses a function on an indexed column)
Select_full_range_join (similar to above but for range conditions)

to name just a few... :)
Qcache_lowmem_prunes (Can occur when query cache gets very fragmented)


> (Please rank suggestions as listed, because a multi-tiered 
> assert_efficient_sql is more valuable than one that only targets A, such 
> as "ALL"!)
> 
> This assertion is for _unit_ tests, not customer acceptance tests or QA 
> tests, so I hope to bypass the time-honored method of loading up a 
> billion records, running them for 20 minutes, and profiling them. 
> Developers should learn within seconds if their new SQL statements are 
> inefficient, not within days, or longer.

Hope the above leads you in the right direction.  cheers!

Jay
Thread
what should an assert_efficient_sql check for?Phlip3 Jan
  • Re: what should an assert_efficient_sql check for?Jay Pipes3 Jan
    • Re: what should an assert_efficient_sql check for?Phlip3 Jan
    • Re: what should an assert_efficient_sql check for?Phlip3 Jan
      • Re: what should an assert_efficient_sql check for?Baron Schwartz3 Jan
        • Re: what should an assert_efficient_sql check for?Phlip3 Jan
          • Re: what should an assert_efficient_sql check for?Baron Schwartz3 Jan
      • Re: what should an assert_efficient_sql check for?Phlip3 Jan
Re: what should an assert_efficient_sql check for?Phlip4 Jan