List:General Discussion« Previous MessageNext Message »
From:Shawn Green Date:July 11 2012 7:37pm
Subject:Re: Composite Index Usage in Joins
View as plain text  
On 7/10/2012 5:50 PM, Jeffrey Grollo wrote:
> Hi,
>
> I’m attempting to optimize a join and am having a difficult time using
> multiple columns from a composite index. The second column of the composite
> key is being used when tested for equality, but not for IN or BETWEEN
> criteria.
>
> As an example, say that I’m searching two tables: portfolio and trades. The
> portfolio table contains a list of security IDs. The trades table keeps
> tracks of the price and time when I’ve traded securities in my portfolio.
> Tables are:
>
> CREATE TABLE portfolio (
>    sec_id bigint(20) NOT NULL AUTO_INCREMENT,
>    name char(10) NOT NULL,
>    PRIMARY KEY (sec_id)
> ) ENGINE=InnoDB ;
>
> CREATE TABLE trades (
>    tx_id bigint(20) NOT NULL AUTO_INCREMENT,
>    sec_id bigint(20) NOT NULL,
>    trade_time datetime NOT NULL,
>    price int NOT NULL,
>    PRIMARY KEY (tx_id),
>    KEY sec_time (sec_id, trade_time)
> ) ENGINE=InnoDB ;
>
> If I query the trades table directly both columns of the composite index
> "sec_time" will be used when I'm using a range criteria on the trade_time
> column:
>
>
> mysql> explain select price from trades force index(sec_time)
>      -> where sec_id IN (1, 2) and trade_time BETWEEN '2012-07-01' AND
> '2012-07-04';
>
> +----+-------------+--------+-------+---------------+----------+---------+------+------+-------------+
> | id | select_type | table  | type  | possible_keys | key      | key_len |
> ref  | rows | Extra       |
>
> +----+-------------+--------+-------+---------------+----------+---------+------+------+-------------+
> |  1 | SIMPLE      | trades | range | sec_time      | sec_time | 16      |
> NULL |    2 | Using where |
>
> +----+-------------+--------+-------+---------------+----------+---------+------+------+-------------+
>
>
> If I introduce a join to retrieve all trades for my portfolio, the entire
> index will continue to be used if I make trade_time a constant:
>
> mysql> explain select price from portfolio p inner join trades t force
> index(sec_time) on p.sec_id = t.sec_id
>      -> where trade_time = '2012-07-01';
>
> +----+-------------+-------+-------+---------------+----------+---------+------------------------+------+-------------+
> | id | select_type | table | type  | possible_keys | key      | key_len |
> ref                    | rows | Extra       |
>
> +----+-------------+-------+-------+---------------+----------+---------+------------------------+------+-------------+
> |  1 | SIMPLE      | p     | index | PRIMARY       | PRIMARY  | 8       |
> NULL                   |    1 | Using index |
> |  1 | SIMPLE      | t     | ref   | sec_time      | sec_time | 16      |
> vantage.p.sec_id,const |    1 |             |
>
> +----+-------------+-------+-------+---------------+----------+---------+------------------------+------+-------------+
>
>
> However, if I expand the trade_time search (either using IN or BETWEEN),
> only the sec_id column of the composite query is used:
>
>
> mysql> explain select price from portfolio p inner join trades t force
> index(sec_time) on p.sec_id = t.sec_id
>      -> where trade_time IN ('2012-07-01', '2012-07-02');
>
> +----+-------------+-------+-------+---------------+----------+---------+------------------+------+-------------+
> | id | select_type | table | type  | possible_keys | key      | key_len |
> ref              | rows | Extra       |
>
> +----+-------------+-------+-------+---------------+----------+---------+------------------+------+-------------+
> |  1 | SIMPLE      | p     | index | PRIMARY       | PRIMARY  | 8       |
> NULL             |    1 | Using index |
> |  1 | SIMPLE      | t     | ref   | sec_time      | sec_time | 8       |
> vantage.p.sec_id |    1 | Using where |
>
> +----+-------------+-------+-------+---------------+----------+---------+------------------+------+-------------+
>
>
> My expectation is that MySQL would be able to use both columns of the
> sec_time index, but I've been unable to find either confirmation of
> refutation of that assumption. If MySQL cannot optimize a join in this
> case, is there another approach to optimizing this query that I should
> pursue? Essentially, my trades table may contain many historical records
> and pulling the entire history of trades for each security would produce a
> much larger result set than would be retrieved if the trade_time criteria
> was applied in the index reference.
>
> I'm using MySQL 5.5.11.
>
> Thanks for any guidance,
> Jeff
>

Indexes are stored as b-TREE structures. For InnoDB tables, the leaf 
nodes of the tree will either contain the PRIMARY KEY of the table or a 
6-byte hidden value that acts as a row identifier.

The tree is structured so that the key values are parsed together and a 
balanced binary tree is built that represents the range of values on the 
table. For example, the index entry for one of your rows could be 
"1_2012-07-01". At each level above the leaves, you have a node that 
lists the first and last elements of the range of leaves beneath it. For 
example an intermediate node may have "1_2009-01-01" and "1_2009-10-15" 
to represent that those are the values within that branch of the tree.

Ranged scans only happen for the last portion of an index being used. OR 
queries (or those using IN) can also only be applied to the last part of 
an index search. This means that if you are using IN (or OR) against the 
first part of an index, that's where the usage of the index stops. The 
rest of the conditions are evaluated during the WHERE processing phase 
of the query.

Also, trying to force index usage may be creating more work for your 
disks than necessary. An indexed lookup is a random access to a portion 
of a file. Should that need to come directly off the disk, then that 
lookup will be about 3x more expensive than a scan (because of the 
various head positioning maneuvers required). The thumbrule is that if 
more than about 30% of a table needs to be randomly located, then switch 
to a full scan. It saves a lot of time.

Additional information about how indexes are used and abused during 
queries is located in the Optimization chapter in the manual:
http://dev.mysql.com/doc/refman/5.1/en/optimization.html

I suggest you start here and work your way out:
http://dev.mysql.com/doc/refman/5.1/en/optimization-indexes.html

Best wishes,
-- 
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN


Thread
Composite Index Usage in JoinsJeffrey Grollo10 Jul
  • Re: Composite Index Usage in JoinsShawn Green11 Jul
    • Re: Composite Index Usage in JoinsJeffrey Grollo12 Jul
  • Re: Composite Index Usage in JoinsSergei Petrunia11 Jul
    • Re: Composite Index Usage in JoinsJeffrey Grollo12 Jul
  • RE: Composite Index Usage in JoinsRick James13 Jul