I'm running into some performance problems with a table of time
intervals. I'd like to look up the record that covers/overlaps a
given instant, and I was hoping that someone might help me out.
Consider these tables:
create table items (
item_id integer auto_increment not null,
item_name varchar(40),
primary key(item_id)
);
create table price_intervals (
item_id integer not null,
beginning datetime not null,
end datetime not null,
price decimal(6,2) not null,
primary key(item_id, beginning, end)
);
Items contains a list of 100,000 items. Price_intervals contains a
list of 1,000,000 item prices and the intervals during which they are
valid. The intervals over which they are valid are continuous and of
varying lengths.
Given a point in time, I'd like to be able to look up a price for each
of the items at that moment. To do this, I'll need to know the record
in the price_intervals table that begins most recently before my
sample point.
Since I need to do this performantly, I'd ideally like to do it using
an index in O(1) time.
My initial attempt at this was the following:
SELECT item_name, price from
items join price_intervals on
(items.item_id = price_intervals.item_id)
WHERE
price_intervals.beginning <= '11-01-01T00:00:00' AND
price_intervals.end > '11-01-01T00:00:00';
This query works, but it only uses the item_id portion of the
price_interval primary key, and it ends up scanning through all of the
1,000,000 price_intervals for each journey (this sort of makes sense
since the 'less than' and 'greater than' can't be combined on the same
index).
(explain plan)
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: price_intervals
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 1000000
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: items
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.price_intervals.item_id
rows: 1
Extra:
I thought I could optimize it with this query:
SELECT item_name, price from
items join price_intervals on
(items.item_id = price_intervals.item_id)
WHERE
price_intervals.beginning =
(SELECT
max(beginning) from
price_intervals
WHERE
price_intervals.item_id = items.item_id and
price_intervals.beginning <= '11-01-01T00:00:00');
It turns out, however, that mysql doesn't seem to use the index on
(item_id, beginning) on this query -- I would expect it to roll
backward through this index to get the MAX() value -- instead, the
subquery visits all of the price_interval records for the item; this
makes it signifcantly slower than the first.
(explain plan)
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: items
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 100000
Extra:
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: price_intervals
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 12
ref: test.items.item_id,func
rows: 1
Extra: Using where
*************************** 3. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: price_intervals
type: ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.items.item_id
rows: 10000
Extra: Using where; Using index
I'm curious how others have handled problems like this -- I know mysql
doesn't have much in the way of time interval support, but are their
performant workaround people have developed? In particular, is there
a way to get the predecessor of a record in an indexed column in O(1)
time? I think this would be what I need to locate the band of a
continuous time series that corresponds to a sample.
Thanks very much for any insights,
Dylan
_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today - it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/