Our storage engine, TokuDB, allows multiple keys to be clustering
(include all columns in the index). For context, an short explanation
of the feature is here:
To get the optimizer to properly know when to use a clustering key, we
reimplemented handler::read_time to return a better estimate in the
case of clustering keys (similar to what InnoDB does in
ha_innobase::read_time for the primary key). This helps the optimizer
in most cases.
The problem I have now run into is getting index_merge to work
properly. Because MySQL does not know that an arbitrary key can be
clustering, it will proceed to use it like a normal key, and this can
lead to a sub-optimal query plan.
Suppose we have the following query:
select * from foo where a=1 and b=1;
If we have a normal secondary key on "a" and a key on "b", then MySQL
gets the values from the keys, does an intersect, and then uses
handler::rnd_pos to retrieve the entire row. This is the right thing
to do here.
Unfortunately, at the moment, if we have clustering keys on "a" and
"b", it does the same thing. This is not the ideal behavior. We want
the optimizer to select either the key on "a" or "b" (suppose we
select "a"), retrieve all of the rows where "a = 1", and from this
result set, filter out the rows where "b != 1". This is what the
optimizer would do if "a" were the primary key.
Does anyone have any idea how one could get this to properly?