Let's look at "select * from foo where a=1 and b = 1" in multiple contexts:
(I am assuming there an index containing both a and b.)
* Neither a nor b is clustered: index_merge could be the most efficient way to perform
the query... Find all PKs for a=1, find all PKs for b=1, take the intersection; then
fetch the rows.
* One of a or b is clustered: As mentioned, use the clustered key only. The other key
buys nothing.
* Both a and b are clustered: Again, the other key is useless; simply use the clustered
key.
Are these the directions you are headed?
Rick James
MySQL Geeks - Consulting & Review
> -----Original Message-----
> From: Zardosht Kasheff [mailto:zardosht@stripped]
> Sent: Friday, July 10, 2009 6:26 AM
> To: Sergey Petrunya
> Cc: internals@stripped; serg@stripped
> Subject: Re: help with index_merge and clustering keys
>
> Hello Sergey,
>
> Thank you very much for taking the time to look into this and provide
> feedback. I am trying the suggestion as we speak.
>
> I want to answer several questions you have below.
>
> 1) Yes, you are correct in understanding the clustering keys. Just
> like InnoDB has an "invisible tail" including all of the columns that
> does not show up in table->key_info[index_no].key_parts, so do
> clustering keys.
> 2) Of the proposed solutions below, I was hoping to implement #2 over
> time in a piece-wise fashion. I realize that the entire solution is
> difficult, so I was hoping to figure out pieces of the optimizer.
> There must be cases in the optimizer that deal with the fact that
> InnoDB's primary key is clustered. I was hoping to extend that logic
> to deal with clustering keys.
>
> For the internals list, I also have a question of my own about the
> optimizer. Suppose I have an InnoDB table:
> create table foo (a int, b int, c int, d int ..., primary key (a),
> key(b), key(c))engine = InnoDB
>
> Now suppose I have the query "select * from foo where a=1 and b = 1
> and c = 1 and d = 1".
>
> The general way to answer this query is to use the keys to retrieve
> all rows were a, b, and c are all 1, and then filter the rows where d
> = 1. There seem to be two ways to retrieve all rows where a, b, and c
> are all 1. One is to do an intersect of b=1 and c=1, and then filter
> the rows retrieved where a != 1. Another way is to do a range query on
> the clustered primary key where a=1, and filter the rows where b != 1
> and c != 1. (There are also bad query plans where we can retrieve all
> rows where b=1 and filter those a !=1 and c != 1).
>
> Where is the logic that makes that determines these specific options?
> Unless I do not understand the code correctly, it does not seem to be
> in get_best_ror_intersect.
>
> Thanks
> -Zardosht
>
> On Fri, Jul 10, 2009 at 5:39 AM, Sergey
> Petrunya<psergey@stripped> wrote:
> > Hi!
> >
> > On Thu, Jul 09, 2009 at 12:06:03PM -0400, Zardosht Kasheff wrote:
> >> Does anyone have any pointers on any functions to look at
> for a hint
> >> of where to look here?
> >>
> >> For the example query, select * from foo where a = 1 and b
> = 1, if we
> >> have a primary key of a, the optimizer knows that an
> intersect is not
> >> necessary. Where is this code so that I may try to extend
> that logic
> >> to clustering keys?
> >
> > Do I understand it correctly (from reading
> >
> http://blogs.tokutek.com/tokuview/introducing_multiple_cluster
> ing_indexes) that
> > each key with HA_CLUSTERED_KEY flag has an invisble 'tail'
> which has all table
> > columns that were not explicitly specified in the key?
> >
> > And the said 'tail' is not present in
> table->key_info[index_no].key_parts
> > array? (like it is for InnoDB's clustered primary key)
> >
> > In that case, I see three possible solutions:
> >
> > Solution1: Make the 'tail' be shown in
> table->key_info[index_no].key_parts.
> > Then the optimizer won't need to care about clustered keys
> having some
> > implicitly-present columns at the end.
> >
> > Solution2: Make the range/index_merge optimizer be aware of
> the implicitly-
> > present 'tail' so that it uses tail members as regular key
> parts (one of
> > the things the range optimizer will need will be the order
> of elements in
> > the 'tail').
> >
> >
> > #1 and #2 will require relatively big changes/reviews etc,
> although they will
> > provide complete solutions (the fact that the key is
> clustered will be used to
> > the fullest extent).
> >
> > If you're not after the broadest solution and just need to
> make the optimizer
> > not to construct index_merge/interesect(..., clustered_key,
> ...), here is patch
> > that should do exactly that:
> >
> > Solution3:
> > === modified file 'sql/opt_range.cc'
> > --- sql/opt_range.cc 2009-03-24 13:58:52 +0000
> > +++ sql/opt_range.cc 2009-07-10 09:22:49 +0000
> > @@ -4535,8 +4535,16 @@ TRP_ROR_INTERSECT *get_best_ror_intersec
> > for (idx= 0, cur_ror_scan= tree->ror_scans; idx <
> param->keys; idx++)
> > {
> > ROR_SCAN_INFO *scan;
> > - if (!tree->ror_scans_map.is_set(idx))
> > + /*
> > + Ignore non-ROR scans or 'clustered' (in Tokutek's
> sense) keys.
> > + */
> > + uint keyno= param->real_keynr[idx];
> > + if (!tree->ror_scans_map.is_set(idx) ||
> > + (keyno != cpk_no &&
> > + param->table->key_info[keyno].flags &
> HA_HA_CLUSTERED_KEY))
> > + {
> > continue;
> > + }
> > if (!(scan= make_ror_scan(param, idx, tree->keys[idx])))
> > return NULL;
> > if (param->real_keynr[idx] == cpk_no)
> >
> >
> > (about the special treatment of clustered primary key in
> index_merge/intersect: it is
> > included in index_merge but it's not really a scan. We get
> columns of clustered primary
> > key from secondary key scans, and then use clustered
> primary key scan condition
> > only for filtering)
> >
> > BR
> > Sergey
> > --
> > Sergey Petrunia, Software Developer
> > Monty Program AB, http://askmonty.org
> > Blog: http://s.petrunia.net/blog
> >
>
> --
> MySQL Internals Mailing List
> For list archives: http://lists.mysql.com/internals
> To unsubscribe:
> http://lists.mysql.com/internals?unsub=1
>
>