From: Zardosht Kasheff Date: July 10 2009 8:20pm Subject: Re: help with index_merge and clustering keys List-Archive: http://lists.mysql.com/internals/37167 Message-Id: <2f9663ba0907101320i7768f436se32d3c5e71aec598@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Yes, with a minor modification. If both a and b are clustered, use the key that is more selective (as reported by handler::records_in_range) Thanks -Zardosht On Fri, Jul 10, 2009 at 4:17 PM, Rick James wrote: > Let's look at "select * from foo where a=3D1 and b =3D 1" in multiple con= texts: > > (I am assuming there an index containing both a and b.) > > * Neither a nor b is clustered: =A0index_merge could be the most efficien= t way to perform the query... =A0Find all PKs for a=3D1, find all PKs for b= =3D1, take the intersection; then fetch the rows. > > * One of a or b is clustered: =A0As mentioned, use the clustered key only= . =A0The other key buys nothing. > > * Both a and b are clustered: =A0Again, the other key is useless; simply = use the clustered key. > > Are these the directions you are headed? > > > Rick James > MySQL Geeks - Consulting & Review > > > >> -----Original Message----- >> From: Zardosht Kasheff [mailto:zardosht@stripped] >> Sent: Friday, July 10, 2009 6:26 AM >> To: Sergey Petrunya >> Cc: internals@stripped; serg@stripped >> Subject: Re: help with index_merge and clustering keys >> >> Hello Sergey, >> >> Thank you very much for taking the time to look into this and provide >> feedback. I am trying the suggestion as we speak. >> >> I want to answer several questions you have below. >> >> 1) Yes, you are correct in understanding the clustering keys. Just >> like InnoDB has an "invisible tail" including all of the columns that >> does not show up in table->key_info[index_no].key_parts, so do >> clustering keys. >> 2) Of the proposed solutions below, I was hoping to implement #2 over >> time in a piece-wise fashion. I realize that the entire solution is >> difficult, so I was hoping to figure out pieces of the optimizer. >> There must be cases in the optimizer that deal with the fact that >> InnoDB's primary key is clustered. I was hoping to extend that logic >> to deal with clustering keys. >> >> For the internals list, I also have a question of my own about the >> optimizer. Suppose I have an InnoDB table: >> create table foo (a int, b int, c int, d int ..., primary key (a), >> key(b), key(c))engine =3D InnoDB >> >> Now suppose I have the query "select * from foo where a=3D1 and b =3D 1 >> and c =3D 1 and d =3D 1". >> >> The general way to answer this query is to use the keys to retrieve >> all rows were a, b, and c are all 1, and then filter the rows where d >> =3D 1. There seem to be two ways to retrieve all rows where a, b, and c >> are all 1. One is to do an intersect of b=3D1 and c=3D1, and then filter >> the rows retrieved where a !=3D 1. Another way is to do a range query on >> the clustered primary key where a=3D1, and filter the rows where b !=3D = 1 >> and c !=3D 1. (There are also bad query plans where we can retrieve all >> rows where b=3D1 and filter those a !=3D1 and c !=3D 1). >> >> Where is the logic that makes that determines these specific options? >> Unless I do not understand the code correctly, it does not seem to be >> in =A0get_best_ror_intersect. >> >> Thanks >> -Zardosht >> >> On Fri, Jul 10, 2009 at 5:39 AM, Sergey >> Petrunya wrote: >> > Hi! >> > >> > On Thu, Jul 09, 2009 at 12:06:03PM -0400, Zardosht Kasheff wrote: >> >> Does anyone have any pointers on any functions to look at >> for a hint >> >> of where to look here? >> >> >> >> For the example query, select * from foo where a =3D 1 and b >> =3D 1, if we >> >> have a primary key of a, the optimizer knows that an >> intersect is not >> >> necessary. Where is this code so that I may try to extend >> that logic >> >> to clustering keys? >> > >> > Do I understand it correctly (from reading >> > >> http://blogs.tokutek.com/tokuview/introducing_multiple_cluster >> ing_indexes) that >> > each key with HA_CLUSTERED_KEY flag has an invisble 'tail' >> which has all table >> > columns that were not explicitly specified in the key? >> > >> > And the said 'tail' is not =A0present in >> table->key_info[index_no].key_parts >> > array? (like it is for InnoDB's clustered primary key) >> > >> > In that case, I see three possible solutions: >> > >> > Solution1: Make the 'tail' be shown in >> table->key_info[index_no].key_parts. >> > Then the optimizer won't need to care about clustered keys >> having some >> > implicitly-present columns at the end. >> > >> > Solution2: Make the range/index_merge optimizer be aware of >> the implicitly- >> > present 'tail' so that it uses tail members as regular key >> parts (one of >> > the things the range optimizer will need will be the order >> of elements in >> > the 'tail'). >> > >> > >> > #1 and #2 will require relatively big changes/reviews etc, >> although they will >> > provide complete solutions (the fact that the key is >> clustered will be used to >> > the fullest extent). >> > >> > If you're not after the broadest solution and just need to >> make the optimizer >> > not to construct index_merge/interesect(..., clustered_key, >> ...), here is patch >> > that should do exactly that: >> > >> > Solution3: >> > =3D=3D=3D modified file 'sql/opt_range.cc' >> > --- sql/opt_range.cc =A0 =A02009-03-24 13:58:52 +0000 >> > +++ sql/opt_range.cc =A0 =A02009-07-10 09:22:49 +0000 >> > @@ -4535,8 +4535,16 @@ TRP_ROR_INTERSECT *get_best_ror_intersec >> > =A0 for (idx=3D 0, cur_ror_scan=3D tree->ror_scans; idx < >> param->keys; idx++) >> > =A0 { >> > =A0 =A0 ROR_SCAN_INFO *scan; >> > - =A0 =A0if (!tree->ror_scans_map.is_set(idx)) >> > + =A0 =A0/* >> > + =A0 =A0 =A0Ignore non-ROR scans or 'clustered' (in Tokutek's >> sense) keys. >> > + =A0 =A0*/ >> > + =A0 =A0uint keyno=3D param->real_keynr[idx]; >> > + =A0 =A0if (!tree->ror_scans_map.is_set(idx) || >> > + =A0 =A0 =A0 =A0(keyno !=3D cpk_no && >> > + =A0 =A0 =A0 =A0 param->table->key_info[keyno].flags & >> HA_HA_CLUSTERED_KEY)) >> > + =A0 =A0{ >> > =A0 =A0 =A0 continue; >> > + =A0 =A0} >> > =A0 =A0 if (!(scan=3D make_ror_scan(param, idx, tree->keys[idx]))) >> > =A0 =A0 =A0 return NULL; >> > =A0 =A0 if (param->real_keynr[idx] =3D=3D cpk_no) >> > >> > >> > (about the special treatment of clustered primary key in >> index_merge/intersect: it is >> > included in index_merge but it's not really a scan. We get >> columns of clustered primary >> > key from secondary key scans, and then use clustered >> primary key scan condition >> > only for filtering) >> > >> > BR >> > =A0Sergey >> > -- >> > Sergey Petrunia, Software Developer >> > Monty Program AB, http://askmonty.org >> > Blog: http://s.petrunia.net/blog >> > >> >> -- >> MySQL Internals Mailing List >> For list archives: http://lists.mysql.com/internals >> To unsubscribe: >> http://lists.mysql.com/internals?unsub=3Drjames@stripped >> >> >