Sorry for the delayed reply; I have been on vacation for a week and
am slowly catching up with all my old emails.
>>>>> "Antony" == Antony T Curtis <antony@stripped> writes:
Antony> Thimble Smith wrote:
>> On Wed, Feb 21, 2001 at 12:59:43PM +0000, Antony T Curtis wrote:
>> > In the case where there is a SELECT on a table where no index is
>> > directly applicable, if there is a compound index where the second part
>> > (or third, etc) of the index is used in the WHERE clause, it should be
>> > possible to still use that index by being able to skip rows rather than
>> > doing a brute-force search through all the records.
>> > Any hints where I should look to see if such an optimisation could be
>> > implemented?
>> It isn't clear to me how that would work. Do you mean, to scan
>> the whole index in order to find rows where the secondary index
>> part matches, instead of scanning the data file? Or do you have
>> something else in mind? I think this optimization wouldn't work,
>> but I'm not sure I understand what you're suggesting.
Antony> The kind of thing it'd do is something like this...
Antony> If the key-part to scan is the (n)th key part of the index,
Antony> 1. Seek index to start
Antony> 2. Read key from index
Antony> 3. Check (n)th key part against end-range, if less or equal, goto 6
Antony> 4. Increment (n-1)th key part and seek to greater/equal key
Antony> 5. goto 2
Antony> 6. Check (n)th key part against start-range, if greater/equal, goto 9
Antony> 7. Set (n)th key part to start-range and seek to greater/equal key
Antony> 8. goto 2
Antony> 9. (the key should point to valid row in range)
Antony> 10. Seek to next key
Antony> 11. if not end of index, goto 6
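As a rough illustration (this is not MySQL code), the skip logic in the
steps above could be sketched in Python, using a sorted list of
(part1, part2) tuples as a stand-in for a two-part index and bisection
as a stand-in for index seeks:

```python
# Sketch of a "loose" scan over a two-part index for a range predicate
# on the SECOND key part: start <= part2 <= end. Instead of reading
# every key, we skip ahead within the sorted order whenever the current
# part1 group can no longer match.
import bisect

def loose_scan(index, start, end):
    """index: sorted list of (part1, part2) tuples (part2 as strings)."""
    results = []
    pos = 0                                # step 1: seek to index start
    while pos < len(index):
        part1, part2 = index[pos]          # step 2: read key
        if part2 > end:
            # steps 4-5: past end-range in this part1 group;
            # seek to the first key with the next part1 value.
            # '\U0010FFFF' acts as a string supremum for the sketch.
            pos = bisect.bisect_left(index, (part1, '\U0010FFFF'))
            continue
        if part2 < start:
            # step 7: seek forward to (part1, start)
            pos = bisect.bisect_left(index, (part1, start))
            continue                       # step 8: re-check landed key
        results.append((part1, part2))     # step 9: key is in range
        pos += 1                           # step 10: seek to next key
    return results
```

The sketch only pays off when each part1 group is large enough that the
seeks skip many keys, which is exactly the objection below.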
The problem is that the above will only work if a major part of the
keys are identical in the first n-1 parts. (My guess is that at
least 1/3 of the keys per key block must be identical for this to
pay off, which is not a very common situation.)
The major problem is that scanning an index, compared to scanning a
fixed-length data file, is at least 3 times slower (for B-trees). The
reason is that there are many more seeks involved when scanning the
index structure, and also that the index structure is more complicated
than the data file structure, especially when we talk about compressed
indexes. The above is at least true for MyISAM tables, where the data
file is stored separately from the index.
If you have a database structure where the data is stored in the leaves
of the trees (like Berkeley DB), the above optimization could be faster,
as scanning the index then involves no more seeks than scanning the
data itself.
Antony> As an aside, IBM DB2 does not allow two indexes to exist where the two
Antony> keys have the same components but are in different orders.
Antony> Something like the following will fail in DB2:
>> CREATE TABLE db.test (
>> COL1 CHAR(10) NOT NULL,
>> COL2 CHAR(10) NOT NULL,
>> INDEX (COL1, COL2),
>> INDEX (COL2, COL1)
>> );
Antony> I suggest that their product does something similar to the above.
I can't see any reason why the above should fail; if you don't allow
the above, you can't optimize a wide range of queries, like:
SELECT * from test where col1 >= "A" and col1 <= "B"
SELECT * from test where col2 >= "A" and col2 <= "B"
SELECT * from test where col1 >= "A" and col2 >= "E"
SELECT * from test where col2 >= "A" and col1 >= "E"
SELECT * from test ORDER BY col1,col2;
SELECT * from test ORDER BY col2,col1;
To optimize a major part of the above queries, you must have both
indexes! Not allowing this would only make sense if you were using
some kind of index type where you can't do range searches or ORDER BY
over the index.
As MySQL keeps a note about which indexes can be used to satisfy the
WHERE and the SELECT part, it would not be hard to do this
optimization for the handlers where the data is stored in the trees
(Currently only BDB tables).
The general optimization, where you allow any operator on the key
parts, is easy to do:
- Add a handler function: read_next_unique()
- Add in sql_select.cc two new functions:
  join_read_next_unique() and init_join_read_next_unique(),
  which could be based on join_read_next() and init_join_read_next().
In this case you would call handler::read_next() as long as the
WHERE clause is satisfied, and read_next_unique() when the WHERE
clause isn't satisfied.
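To make the intended control flow concrete, here is a toy sketch in
Python. The handler class and the method names read_next() /
read_next_unique() only mirror the proposal above; they are not actual
MySQL handler APIs:

```python
class IndexHandler:
    """Toy handler over a sorted list of (part1, part2) index entries."""
    def __init__(self, entries):
        self.entries = sorted(entries)
        self.pos = -1

    def index_first(self):
        self.pos = 0
        return self._cur()

    def read_next(self):
        # advance to the next key in index order
        self.pos += 1
        return self._cur()

    def read_next_unique(self):
        # skip all remaining entries sharing the current part1 value,
        # i.e. jump to the next distinct key prefix
        part1 = self.entries[self.pos][0]
        while self.pos < len(self.entries) and \
                self.entries[self.pos][0] == part1:
            self.pos += 1
        return self._cur()

    def _cur(self):
        if 0 <= self.pos < len(self.entries):
            return self.entries[self.pos]
        return None

def join_read_next_unique(handler, where):
    """Driver loop: read_next() while WHERE holds, else skip the prefix."""
    row = handler.index_first()
    while row is not None:
        if where(row):
            yield row
            row = handler.read_next()
        else:
            row = handler.read_next_unique()
```

Note the skip is only safe when WHERE, once false, stays false for the
rest of the current prefix group (e.g. an upper-bound range on part2).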
The general optimization to handle cases like:
SELECT * from test where (key_part2 >= "A" and key_part2 <= "B") or
       (key_part2 >= "D" and key_part2 <= "E")
is much harder to do. In this case we could theoretically skip to the
next key_part1 value when we have found a key (#, "F"), but to do this
one would have to write a lot of new code, in the style of opt_range.cc.
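Just to sketch the idea (again in toy Python, not opt_range.cc code):
with a sorted list of ranges on the second key part, a key past the
last range lets us skip to the next key_part1 value, and a key in a gap
between ranges lets us seek to the next range start:

```python
# Sketch: loose scan for a UNION of ranges on the second key part,
# e.g. ("A".."B") OR ("D".."E"). The index is again a sorted list of
# (p1, p2) tuples; bisection stands in for index seeks.
import bisect

def multi_range_scan(index, ranges):
    """ranges: list of (lo, hi) pairs, sorted and non-overlapping."""
    out = []
    pos = 0
    while pos < len(index):
        p1, p2 = index[pos]
        # first range whose upper bound could still contain p2
        r = next((rng for rng in ranges if p2 <= rng[1]), None)
        if r is None:
            # past the last range (e.g. found (p1, "F") with hi "E"):
            # skip to the next key_part1 value
            pos = bisect.bisect_left(index, (p1, '\U0010FFFF'))
            continue
        lo, hi = r
        if p2 < lo:
            # in a gap between ranges: seek to this range's start
            pos = bisect.bisect_left(index, (p1, lo))
            continue
        out.append((p1, p2))
        pos += 1
    return out
```

The hard part in real code is not this loop but deciding, per query,
which range lists are worth the extra seeks; that is the opt_range.cc
style analysis mentioned above.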