From: Zardosht Kasheff Date: July 10 2009 1:25pm Subject: Re: help with index_merge and clustering keys List-Archive: http://lists.mysql.com/internals/37166 Message-Id: <2f9663ba0907100625y1a584d2bwc8ac2eb06f5839b9@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Hello Sergey, Thank you very much for taking the time to look into this and provide feedback. I am trying the suggestion as we speak. I want to answer several questions you have below. 1) Yes, you are correct in understanding the clustering keys. Just like InnoDB has an "invisible tail" including all of the columns that does not show up in table->key_info[index_no].key_parts, so do clustering keys. 2) Of the proposed solutions below, I was hoping to implement #2 over time in a piece-wise fashion. I realize that the entire solution is difficult, so I was hoping to figure out pieces of the optimizer. There must be cases in the optimizer that deal with the fact that InnoDB's primary key is clustered. I was hoping to extend that logic to deal with clustering keys. For the internals list, I also have a question of my own about the optimizer. Suppose I have an InnoDB table: create table foo (a int, b int, c int, d int ..., primary key (a), key(b), key(c))engine =3D InnoDB Now suppose I have the query "select * from foo where a=3D1 and b =3D 1 and c =3D 1 and d =3D 1". The general way to answer this query is to use the keys to retrieve all rows were a, b, and c are all 1, and then filter the rows where d =3D 1. There seem to be two ways to retrieve all rows where a, b, and c are all 1. One is to do an intersect of b=3D1 and c=3D1, and then filter the rows retrieved where a !=3D 1. Another way is to do a range query on the clustered primary key where a=3D1, and filter the rows where b !=3D 1 and c !=3D 1. (There are also bad query plans where we can retrieve all rows where b=3D1 and filter those a !=3D1 and c !=3D 1). Where is the logic that makes that determines these specific options? Unless I do not understand the code correctly, it does not seem to be in get_best_ror_intersect. Thanks -Zardosht On Fri, Jul 10, 2009 at 5:39 AM, Sergey Petrunya wrot= e: > Hi! > > On Thu, Jul 09, 2009 at 12:06:03PM -0400, Zardosht Kasheff wrote: >> Does anyone have any pointers on any functions to look at for a hint >> of where to look here? >> >> For the example query, select * from foo where a =3D 1 and b =3D 1, if w= e >> have a primary key of a, the optimizer knows that an intersect is not >> necessary. Where is this code so that I may try to extend that logic >> to clustering keys? > > Do I understand it correctly (from reading > http://blogs.tokutek.com/tokuview/introducing_multiple_clustering_indexes= ) that > each key with HA_CLUSTERED_KEY flag has an invisble 'tail' which has all = table > columns that were not explicitly specified in the key? > > And the said 'tail' is not =A0present in table->key_info[index_no].key_pa= rts > array? (like it is for InnoDB's clustered primary key) > > In that case, I see three possible solutions: > > Solution1: Make the 'tail' be shown in table->key_info[index_no].key_part= s. > Then the optimizer won't need to care about clustered keys having some > implicitly-present columns at the end. > > Solution2: Make the range/index_merge optimizer be aware of the implicitl= y- > present 'tail' so that it uses tail members as regular key parts (one of > the things the range optimizer will need will be the order of elements in > the 'tail'). > > > #1 and #2 will require relatively big changes/reviews etc, although they = will > provide complete solutions (the fact that the key is clustered will be us= ed to > the fullest extent). > > If you're not after the broadest solution and just need to make the optim= izer > not to construct index_merge/interesect(..., clustered_key, ...), here is= patch > that should do exactly that: > > Solution3: > =3D=3D=3D modified file 'sql/opt_range.cc' > --- sql/opt_range.cc =A0 =A02009-03-24 13:58:52 +0000 > +++ sql/opt_range.cc =A0 =A02009-07-10 09:22:49 +0000 > @@ -4535,8 +4535,16 @@ TRP_ROR_INTERSECT *get_best_ror_intersec > =A0 for (idx=3D 0, cur_ror_scan=3D tree->ror_scans; idx < param->keys; id= x++) > =A0 { > =A0 =A0 ROR_SCAN_INFO *scan; > - =A0 =A0if (!tree->ror_scans_map.is_set(idx)) > + =A0 =A0/* > + =A0 =A0 =A0Ignore non-ROR scans or 'clustered' (in Tokutek's sense) key= s. > + =A0 =A0*/ > + =A0 =A0uint keyno=3D param->real_keynr[idx]; > + =A0 =A0if (!tree->ror_scans_map.is_set(idx) || > + =A0 =A0 =A0 =A0(keyno !=3D cpk_no && > + =A0 =A0 =A0 =A0 param->table->key_info[keyno].flags & HA_HA_CLUSTERED_K= EY)) > + =A0 =A0{ > =A0 =A0 =A0 continue; > + =A0 =A0} > =A0 =A0 if (!(scan=3D make_ror_scan(param, idx, tree->keys[idx]))) > =A0 =A0 =A0 return NULL; > =A0 =A0 if (param->real_keynr[idx] =3D=3D cpk_no) > > > (about the special treatment of clustered primary key in index_merge/inte= rsect: it is > included in index_merge but it's not really a scan. We get columns of clu= stered primary > key from secondary key scans, and then use clustered primary key scan con= dition > only for filtering) > > BR > =A0Sergey > -- > Sergey Petrunia, Software Developer > Monty Program AB, http://askmonty.org > Blog: http://s.petrunia.net/blog >