List:Internals« Previous MessageNext Message »
From:Zardosht Kasheff Date:June 18 2009 6:28pm
Subject:query optimization and multiple clustering keys
View as plain text  
Hello,

Our storage engine, TokuDB, allows multiple keys to be clustering. For
context, an explanation of the feature is here:
http://blogs.tokutek.com/tokuview/introducing_multiple_clustering_indexes

What I am trying to figure out is what changes need to be done to the
query optimizer to properly support this feature.

Currently, this is what we do. I have filed a feature request and
submitted a patch in MySQL bug #45458
(http://bugs.mysql.com/bug.php?id=45458). The submitted patch makes
the necessary grammar changes and provides a flag to the storage
engine that says if a key is clustering or not. In our storage engine,
we reimplement handler::read_time to properly estimate the cost of
reading rows from a clustered key (similar to what InnoDB does in
ha_innobase::read_time when the key is the primary key). This helps
the optimizer know when to use the clustering key.

What I am wondering is the following: what else needs to be done to
the optimizer to use clustering keys properly?
For example, I am concerned that there will be unnecessary calls to
handler::position and handler::rnd_pos because the optimizer does not
know it is just faster to get all of the data out of the clustered
key.

Here is my failed attempt at answer the question:
I looked at places where handler::primary_key_is_clustered is called.
I figured that if there are locations where the optimizer needs to
know if the primary key is clustered, they may also be places where
the optimizer needs to know if another key is clustered. I only saw
one potential location where this might be necessary:

in sql_select.cc, in function test_if_skip_sort_order,
   we have :            bool is_covering= table->covering_keys.is_set(nr) ||
                          nr == table->s->primary_key &&
	                  table->file->primary_key_is_clustered();

I figure that this will also need to take into account if the key is
clustered, correct?


So, in conclusion, given everything above, I have one general question
and one specific question.
The general question: can anyone think of any other place where the
optimizer needs to be tweaked to be aware of clustered keys?
The specific question: does any tweaking need to happen in
test_if_skip_sort_order

I plan on submitting any patch derived from this mail thread to the
bug report for #45458

Thanks
-Zardosht
Thread
query optimization and multiple clustering keysZardosht Kasheff18 Jun
  • Re: query optimization and multiple clustering keysSergei Golubchik4 Jul