List:General Discussion« Previous MessageNext Message »
From:FFF FFF Date:December 4 2004 3:35am
Subject:Performance and indexing for time intervals...
View as plain text  
I'm running into some performance problems with a table of time
intervals.  I'd like to look up the record that covers/overlaps a
given instant, and I was hoping that someone might help me out.

Consider these tables:

create table items (
                item_id integer auto_increment not null,
                item_name varchar(40),
                primary key(item_id)
);

create table price_intervals (
                item_id integer not null,
                beginning datetime not null,
                end datetime not null,
                price decimal(6,2) not null,
                primary key(item_id, beginning, end)
);

Items contains a list of 100,000 items.  Price_intervals contains a
list of 1,000,000 item prices and the intervals during which they are
valid.  The intervals over which they are valid are continuous and of
varying lengths.

Given a point in time, I'd like to be able to look up a price for each
of the items at that moment.  To do this, I'll need to know the record
in the price_intervals table that begins most recently before my
sample point.

Since I need to do this performantly, I'd ideally like to do it using
an index in O(1) time.

My initial attempt at this was the following:

SELECT item_name, price from
       items join price_intervals on
         (items.item_id = price_intervals.item_id)
WHERE
       price_intervals.beginning <= '11-01-01T00:00:00' AND
       price_intervals.end > '11-01-01T00:00:00';

This query works, but it only uses the item_id portion of the
price_interval primary key, and it ends up scanning through all of the
1,000,000 price_intervals for each journey (this sort of makes sense
since the 'less than' and 'greater than' can't be combined on the same
index).

(explain plan)
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: price_intervals
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 1000000
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.price_intervals.item_id
rows: 1
Extra:


I thought I could optimize it with this query:

SELECT item_name, price from
       items join price_intervals on
         (items.item_id = price_intervals.item_id)
WHERE
       price_intervals.beginning =
   (SELECT
    max(beginning) from
    price_intervals
   WHERE
   price_intervals.item_id = items.item_id and
   price_intervals.beginning <= '11-01-01T00:00:00');

It turns out, however, that mysql doesn't seem to use the index on
(item_id, beginning) on this query -- I would expect it to roll
backward through this index to get the MAX() value -- instead, the
subquery visits all of the price_interval records for the item; this
makes it signifcantly slower than the first.

(explain plan)
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: items
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 100000
Extra:
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: price_intervals
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 12
ref: test.items.item_id,func
rows: 1
Extra: Using where
*************************** 3. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: price_intervals
type: ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.items.item_id
rows: 10000
Extra: Using where; Using index

I'm curious how others have handled problems like this -- I know mysql
doesn't have much in the way of time interval support, but are their
performant workaround people have developed?  In particular, is there
a way to get the predecessor of a record in an indexed column in O(1)
time?  I think this would be what I need to locate the band of a
continuous time series that corresponds to a sample.

Thanks very much for any insights,

Dylan

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today - it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/

Thread
Performance and indexing for time intervals...FFF FFF4 Dec
  • Re: Performance and indexing for time intervals...Dan Nelson4 Dec