Emmanuel van der Meulen wrote:
> On Monday, June 07, 2004 19:09, Michael Stassen wrote;
>
>>This doesn't quite make sense. You seem to say that several rows
>>will match but then you say only one will. It must be one or the
>>other. Perhaps I've misunderstood you.
>
> Firstly, Michael thank you for your assistance; I'll elaborate.
>
> Michael I'll attempt first without supplying the table definitions and
> actual sql and etc..
>
> Table 'a' contains a property which does not match any property on table 'b'
> directly, but matches within a range.
>
> Example;
>
> table 'a' property 'num'
>
> num
> ---
> 1000
> 2000
> 3000
>
> table 'b' properties 'fromNum' and 'toNum'
> fromNum toNum
> row 1. 1 1500
> row 2. 1501 2000
> row 3. 2001 4000
>
> select... where 'num' between 'fromNum' and 'toNum'.
>
> As seen in this example, using between only one row actually matches,
> however the query engine would at first see more than one matching row i.e.,
What do you mean by "at first"?
> with 'fromNum' (between is same as num >= fromNum), value 2000 from table a
> would match row 1 & 2 on table b; and with 'toNum' (between is same as num
> <= toNum), value 2000 from table a would match row 2 & 3 on table b; only
> once the range is taken together the result matches one row, viz., row 2 on
> table b.
This analysis is not quite right. You seem to be saying that rows which
match half of the condition will be retrieved, then filtered according to
the other half of the condition, but that's not how it works. If it did,
BETWEEN would always be slow, but BETWEEN can be very fast, if the range is
a constant and the column is indexed. Assuming indexes on num, fromNum, and
toNum, you should be able to verify that
SELECT * FROM table_a WHERE num BETWEEN 1501 AND 2000;
SELECT * FROM table_b WHERE fromNum BETWEEN 1 AND 3000;
SELECT * FROM table_b WHERE toNum BETWEEN 1500 AND 4000;
are all very quick. In your case, "num BETWEEN fromNum AND toNum" is the
same as "num >= fromNum AND num <= toNum". There is only one matching row.
The problem is finding it.
To find which rows in table b match, we have to look at each row and compare
the values of num, fromNum, and toNum. Indexes won't help much here,
because each column's index is unlikely to narrow the list of possible
matches much. If the optimizer could put 2 indexes together, it might see
that fromNum matches from row 23 up, while toNum matches from rows 23 down,
so row 23 is it, but the optimizer doesn't combine indexes -- it chooses the
best (most restrictive) one. Even then, it only uses the index if it
restricts us to no more than 30% of the rows. Furthermore, the value of num
changes with each row of table a. The result is an unavoidable full table
scan on table b.
In other words, the problem is not, strictly speaking, the BETWEEN.
> Now please remember table b has 1.4 million rows as in this example, with
> fromNum and toNum running consecutively, so the query takes 4 seconds to
> find a row in table b. With say 200 rows in table a, that means the query
> runs for a long time.
That's 4 seconds to scan all of table b, rather than 4 seconds to find one
row. A subtle but important distinction.
Is that right? You have 1.4 million ranges? Wow! Up till now, I've been
assuming you had a lot of data in table a to be assigned one of a relatively
few ranges from table b. Apparently, I've been picturing it backwards.
I'm not sure there's a way to improve this specific query, but there may be
another way to accomplish the same thing which works better. It's hard to
say what that might be without a better picture of what you're doing than I
currently have. Perhaps if you described your data someone could make a
suggestion. I'm still assuming table b must be static, right? Otherwise
you couldn't guarantee uniqueness of ranges, I think. On the other hand,
different rows in table a could be in the same range (match the same row of
table b), right?
> What I did was to use limit 1, and ran query with 1 row, this took .01
> second.
I assume by "ran query with 1 row" you mean one row from table a. Now num
is effectively a constant, and LIMIT 1 short circuits the full table scan as
soon as one match is found.
>>You also seem to imply that with BETWEEN you get a full
>>table scan even though there is only one match for each row. That sounds
>>like an indexing problem, but it is hard to say without more information.
>
> Indexed individually on;
> fromNum
> toNum
>
> Also experimented by adding combining index on;
> fromNum/toNum
>
> Either way no difference, query runs 4 seconds.
>
>>It would help us help you if you at least posted the query and the results
>>of EXPLAIN. It would probably also help if you told us more about the
>>tables, perhaps with SHOW CREATE TABLE.
>
> If above does not help you, I'll bring the whole lot to the post.
Well, the more info, the better. At least describe the data a little more.
Depending on exactly what you want, there may be a way to change the
process to make this work at an acceptable speed.
> Michael, again thank you for you assistance.
>
> Kind regards
> Emmanuel
Michael