From: Zardosht Kasheff Date: June 18 2009 6:28pm Subject: query optimization and multiple clustering keys List-Archive: http://lists.mysql.com/internals/36930 Message-Id: <2f9663ba0906181128k48c52a73u9b5b7202bf6962d1@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Hello, Our storage engine, TokuDB, allows multiple keys to be clustering. For context, an explanation of the feature is here: http://blogs.tokutek.com/tokuview/introducing_multiple_clustering_indexes What I am trying to figure out is what changes need to be done to the query optimizer to properly support this feature. Currently, this is what we do. I have filed a feature request and submitted a patch in MySQL bug #45458 (http://bugs.mysql.com/bug.php?id=45458). The submitted patch makes the necessary grammar changes and provides a flag to the storage engine that says if a key is clustering or not. In our storage engine, we reimplement handler::read_time to properly estimate the cost of reading rows from a clustered key (similar to what InnoDB does in ha_innobase::read_time when the key is the primary key). This helps the optimizer know when to use the clustering key. What I am wondering is the following: what else needs to be done to the optimizer to use clustering keys properly? For example, I am concerned that there will be unnecessary calls to handler::position and handler::rnd_pos because the optimizer does not know it is just faster to get all of the data out of the clustered key. Here is my failed attempt at answer the question: I looked at places where handler::primary_key_is_clustered is called. I figured that if there are locations where the optimizer needs to know if the primary key is clustered, they may also be places where the optimizer needs to know if another key is clustered. I only saw one potential location where this might be necessary: in sql_select.cc, in function test_if_skip_sort_order, we have : bool is_covering= table->covering_keys.is_set(nr) || nr == table->s->primary_key && table->file->primary_key_is_clustered(); I figure that this will also need to take into account if the key is clustered, correct? So, in conclusion, given everything above, I have one general question and one specific question. The general question: can anyone think of any other place where the optimizer needs to be tweaked to be aware of clustered keys? The specific question: does any tweaking need to happen in test_if_skip_sort_order I plan on submitting any patch derived from this mail thread to the bug report for #45458 Thanks -Zardosht