List:Internals« Previous MessageNext Message »
From:Zardosht Kasheff Date:July 10 2009 8:20pm
Subject:Re: help with index_merge and clustering keys
View as plain text  
Yes, with a minor modification. If both a and b are clustered, use the
key that is more selective (as reported by handler::records_in_range)

Thanks
-Zardosht

On Fri, Jul 10, 2009 at 4:17 PM, Rick James<rjames@stripped> wrote:
> Let's look at "select * from foo where a=1 and b = 1" in multiple contexts:
>
> (I am assuming there an index containing both a and b.)
>
> * Neither a nor b is clustered:  index_merge could be the most efficient way to
> perform the query...  Find all PKs for a=1, find all PKs for b=1, take the
> intersection; then fetch the rows.
>
> * One of a or b is clustered:  As mentioned, use the clustered key only.
>  The other key buys nothing.
>
> * Both a and b are clustered:  Again, the other key is useless; simply use the
> clustered key.
>
> Are these the directions you are headed?
>
>
> Rick James
> MySQL Geeks - Consulting & Review
>
>
>
>> -----Original Message-----
>> From: Zardosht Kasheff [mailto:zardosht@stripped]
>> Sent: Friday, July 10, 2009 6:26 AM
>> To: Sergey Petrunya
>> Cc: internals@stripped; serg@stripped
>> Subject: Re: help with index_merge and clustering keys
>>
>> Hello Sergey,
>>
>> Thank you very much for taking the time to look into this and provide
>> feedback. I am trying the suggestion as we speak.
>>
>> I want to answer several questions you have below.
>>
>> 1) Yes, you are correct in understanding the clustering keys. Just
>> like InnoDB has an "invisible tail" including all of the columns that
>> does not show up in table->key_info[index_no].key_parts, so do
>> clustering keys.
>> 2) Of the proposed solutions below, I was hoping to implement #2 over
>> time in a piece-wise fashion. I realize that the entire solution is
>> difficult, so I was hoping to figure out pieces of the optimizer.
>> There must be cases in the optimizer that deal with the fact that
>> InnoDB's primary key is clustered. I was hoping to extend that logic
>> to deal with clustering keys.
>>
>> For the internals list, I also have a question of my own about the
>> optimizer. Suppose I have an InnoDB table:
>> create table foo (a int, b int, c int, d int ..., primary key (a),
>> key(b), key(c))engine = InnoDB
>>
>> Now suppose I have the query "select * from foo where a=1 and b = 1
>> and c = 1 and d = 1".
>>
>> The general way to answer this query is to use the keys to retrieve
>> all rows were a, b, and c are all 1, and then filter the rows where d
>> = 1. There seem to be two ways to retrieve all rows where a, b, and c
>> are all 1. One is to do an intersect of b=1 and c=1, and then filter
>> the rows retrieved where a != 1. Another way is to do a range query on
>> the clustered primary key where a=1, and filter the rows where b != 1
>> and c != 1. (There are also bad query plans where we can retrieve all
>> rows where b=1 and filter those a !=1 and c != 1).
>>
>> Where is the logic that makes that determines these specific options?
>> Unless I do not understand the code correctly, it does not seem to be
>> in  get_best_ror_intersect.
>>
>> Thanks
>> -Zardosht
>>
>> On Fri, Jul 10, 2009 at 5:39 AM, Sergey
>> Petrunya<psergey@stripped> wrote:
>> > Hi!
>> >
>> > On Thu, Jul 09, 2009 at 12:06:03PM -0400, Zardosht Kasheff wrote:
>> >> Does anyone have any pointers on any functions to look at
>> for a hint
>> >> of where to look here?
>> >>
>> >> For the example query, select * from foo where a = 1 and b
>> = 1, if we
>> >> have a primary key of a, the optimizer knows that an
>> intersect is not
>> >> necessary. Where is this code so that I may try to extend
>> that logic
>> >> to clustering keys?
>> >
>> > Do I understand it correctly (from reading
>> >
>> http://blogs.tokutek.com/tokuview/introducing_multiple_cluster
>> ing_indexes) that
>> > each key with HA_CLUSTERED_KEY flag has an invisble 'tail'
>> which has all table
>> > columns that were not explicitly specified in the key?
>> >
>> > And the said 'tail' is not  present in
>> table->key_info[index_no].key_parts
>> > array? (like it is for InnoDB's clustered primary key)
>> >
>> > In that case, I see three possible solutions:
>> >
>> > Solution1: Make the 'tail' be shown in
>> table->key_info[index_no].key_parts.
>> > Then the optimizer won't need to care about clustered keys
>> having some
>> > implicitly-present columns at the end.
>> >
>> > Solution2: Make the range/index_merge optimizer be aware of
>> the implicitly-
>> > present 'tail' so that it uses tail members as regular key
>> parts (one of
>> > the things the range optimizer will need will be the order
>> of elements in
>> > the 'tail').
>> >
>> >
>> > #1 and #2 will require relatively big changes/reviews etc,
>> although they will
>> > provide complete solutions (the fact that the key is
>> clustered will be used to
>> > the fullest extent).
>> >
>> > If you're not after the broadest solution and just need to
>> make the optimizer
>> > not to construct index_merge/interesect(..., clustered_key,
>> ...), here is patch
>> > that should do exactly that:
>> >
>> > Solution3:
>> > === modified file 'sql/opt_range.cc'
>> > --- sql/opt_range.cc    2009-03-24 13:58:52 +0000
>> > +++ sql/opt_range.cc    2009-07-10 09:22:49 +0000
>> > @@ -4535,8 +4535,16 @@ TRP_ROR_INTERSECT *get_best_ror_intersec
>> >   for (idx= 0, cur_ror_scan= tree->ror_scans; idx <
>> param->keys; idx++)
>> >   {
>> >     ROR_SCAN_INFO *scan;
>> > -    if (!tree->ror_scans_map.is_set(idx))
>> > +    /*
>> > +      Ignore non-ROR scans or 'clustered' (in Tokutek's
>> sense) keys.
>> > +    */
>> > +    uint keyno= param->real_keynr[idx];
>> > +    if (!tree->ror_scans_map.is_set(idx) ||
>> > +        (keyno != cpk_no &&
>> > +         param->table->key_info[keyno].flags
> &
>> HA_HA_CLUSTERED_KEY))
>> > +    {
>> >       continue;
>> > +    }
>> >     if (!(scan= make_ror_scan(param, idx, tree->keys[idx])))
>> >       return NULL;
>> >     if (param->real_keynr[idx] == cpk_no)
>> >
>> >
>> > (about the special treatment of clustered primary key in
>> index_merge/intersect: it is
>> > included in index_merge but it's not really a scan. We get
>> columns of clustered primary
>> > key from secondary key scans, and then use clustered
>> primary key scan condition
>> > only for filtering)
>> >
>> > BR
>> >  Sergey
>> > --
>> > Sergey Petrunia, Software Developer
>> > Monty Program AB, http://askmonty.org
>> > Blog: http://s.petrunia.net/blog
>> >
>>
>> --
>> MySQL Internals Mailing List
>> For list archives: http://lists.mysql.com/internals
>> To unsubscribe:
>> http://lists.mysql.com/internals?unsub=1
>>
>>
>
Thread
help with index_merge and clustering keysZardosht Kasheff23 Jun
  • Re: help with index_merge and clustering keysZardosht Kasheff9 Jul
    • Re: help with index_merge and clustering keysSergey Petrunya10 Jul
      • Re: help with index_merge and clustering keysZardosht Kasheff10 Jul
        • RE: help with index_merge and clustering keysRick James10 Jul
          • RE: help with index_merge and clustering keysMichael Widenius16 Jul
            • Re: help with index_merge and clustering keysZardosht Kasheff16 Jul
            • RE: help with index_merge and clustering keysRick James16 Jul
              • RE: help with index_merge and clustering keysMichael Widenius19 Jul
        • Re: help with index_merge and clustering keysSergey Petrunya12 Jul
          • Re: help with index_merge and clustering keysZardosht Kasheff12 Jul
            • Re: help with index_merge and clustering keysSergey Petrunya15 Jul
              • RE: help with index_merge and clustering keysRick James15 Jul
              • Re: help with index_merge and clustering keysZardosht Kasheff24 Jul
      • Re: help with index_merge and clustering keysMichael Widenius16 Jul
Re: help with index_merge and clustering keysZardosht Kasheff10 Jul
Re: help with index_merge and clustering keysZardosht Kasheff16 Jul
  • RE: help with index_merge and clustering keysRick James16 Jul
    • Re: help with index_merge and clustering keysZardosht Kasheff16 Jul