List:Commits« Previous MessageNext Message »
From:Olav Sandstaa Date:May 5 2010 11:50am
Subject:Re: bzr commit into mysql-6.0-codebase-bugfixing branch
(Olav.Sandstaa:3791) Bug#42991
View as plain text  
Hi Evgeny and Jørgen,

On 03/18/10 11:35, Evgeny Potemkin wrote:
> Hi Olav,
>
> On 03/18/10 12:47, Olav Sandstaa wrote:
>> Hi Evgeny,
>>
>> Thanks for the review. Here are some responses and comment to your
>> comments.
>>
>> Evgeny Potemkin wrote:
>>> Hi Olav,
>>>
>>> There is few wrong things with your fix:
>>> 1) Generally speaking: with your check you effectively deny ICP at all
>>> if the used key has a partial keypart. It's better just to not allow
>>> fields for this keypart to be used, so this key part wouldn't be
>>> checked and no error will occur.
>>
>> Yes, this is correct. The implemented fix is based on the proposal given
>> by Sergey P in his comment to the bug:
>>
>> "Suggested solution right now is to disable index condition for indexes
>> that have partially-covered columns (check key_part->flag &
>> HA_PART_KEY_SEG for all key parts?)"
> Ok, but it's not a good way to fix it, IMO.
>>
>>> 2) Actually, as far as I can see from the code this ^^ never could
>>> happen because any field (BLOBs are among them) which gets
>>> KEY_PART_SEG flag can't be pushed due to part_of_key is never set on
>>> them. Thus, at least, the comment is wrong.
>>
>> I am not sure I understand this comment. I agree that the existing code
>> would prevent a "condition on a BLOB field" from being pushed. The
>> existing code does not prevent an INDEX containing a BLOB field from
>> being used for ICP. Which of my comments do you think is wrong?
> For a predicate to became a part of ICP condition it is required that 
> the field has the bit for the current index set in the part_of_key 
> bitmap.
> And this never done for BLOB fields. You can easily check it by yourself.
> The problem actually isn't in ICP itself. It's in the range access+ICP.
> The end_range variable is used for the range access method and it 
> seems that when ICP is enabled it is handled is a wrong manner.
> Because of all this I said that your cset comment is wrong.
>>
>>> 3) Your example doesn't hit ICP feature. You missed the thing that
>>> original id1 index has 2 fields - the first is gets pushed and the
>>> second is saved into end_range since the query uses range scan.
>>
>> I think it does. If I run an explain on the query before applying my
>> code change I get:
>>
>> explain SELECT * FROM t1 WHERE (c1 <= '6566-06-15' AND c2 <> 3);
>> id select_type table type possible_keys key key_len ref rows Extra
>> 1 SIMPLE t1 range PRIMARY,id1 id1 6 NULL 1 Using index condition; Using
>> where
>>
>> If I run explain on the query after applying the proposed fix I get:
>>
>> explain SELECT * FROM t1 WHERE (c1 <= '6566-06-15' AND c2 <> 3);
>> id select_type table type possible_keys key key_len ref rows Extra
>> 1 SIMPLE t1 range PRIMARY,id1 id1 6 NULL 1 Using where
>>
>> which shows that my change does as intended for this query by disabling
>> the use of ICP for it. And yes, I agree it will be good to include an
>> explain in the test case but it would not show that ICP is in use given
>> that the fix disables ICP.
> I haven't checked why it prints "Using index condition", but when I 
> run your example on the unmodified server the function that actually 
> check the pushed condition (index_cond_func_xxx) is never called. 
> Probably there is another bug - it prints that ICP is used while 
> actually it isn't.
>>
>>> 4) But even after this modification, as far as I can see, the key
>>> being compared is limited to the index length. This means that in your
>>> example end_range will compare key only with '6566', not with full
>>> constant. This is for MyISAM.
>>
>> Yes. this is correct. The problem is "what" does it compare the
>> end_range to? Ie. what does it compare '6566' against?
>>
>> Short answer:
>>
>> * In InnoDB: we compare end_range against "freed memory" :-( (and thus
>> we have this valgrind issue reported in this bug report)
>> * MyISAM: we compare end_range against the BLOB value from "the last
>> previous record" that was read - and it ALWAYS(?) evaluate to TRUE :-)
>>
>> More details follows:
>>
>> The ICP code that does the end_range check and the check of pushed
>> condition is the following (from ha_innodb.cc):
>>
>> static uint index_cond_func_innodb(void *arg)
>> {
>> ha_innobase *h= (ha_innobase*)arg;
>> if (h->end_range)
>> {
>> if (h->compare_key2(h->end_range) > 0)
>> return 2; /* caller should return HA_ERR_END_OF_FILE already */
>> }
>> return test(h->pushed_idx_cond->val_int());
>> }
>>
>> When this code is executed we have read all non-BLOB fields from the
>> index entry. So in case of the test which has an index containing the
>> primary key (int) and a prefix of a TINYTEXT column we will have only
>> read the int field, not the BLOB prefix (see SergeyP's comment in bug
>> report: "we don't ever try decoding blob columns from index images of
>> their prefixes"). So the evaluation of the end range has "nothing to
>> compare against" (h->compare_key2(h->end_range)) and leads InnoDB into
>> reading "free memory".
> Ok.
>>
>>> Suggestions:
>>> AFAIU, this feature was developed first for MyISAM and then ported to
>>> InnoDB.
>>> It seems it's safe to check end_range in MyISAM, and it's not for 
>>> InnoDB.
>>
>> Yes, it is "safe" in MyISAM but "unsafe" in InnoDB mainly because MyISAM
>> does not delete the BLOB value it last read (so it is there) while
>> InnoDB frees the space used for storing the BLOB value when starting
>> reading the next record. The ICP code in MyISAM is very similar to what
>> is in InnoDB. The ICP code in MyISAM does not either seem to read prefix
>> of BLOB columns from index entries. Instead it uses whatever value the
>> BLOB field of the previously fully read record had and evaluates the
>> end_range against this (which hopefully always evaluates to true?).
>>
>>> But since range scans over BLOBs are allowed in InnoDB (AFAIK) it
>>> seems that the bug is that end_range isn't correctly done.
>>> Thus, I suggest you to analyze how InnoDB scans BLOBs and adjust ICP
>>> to do it properly.
>>
>> So your suggestions is that we should extend the ICP code in InnoDB to
>> decode/read the BLOB prefix from the index entry and use this for
>> evaluating the end_range? (since as commented by SergeyP: "we don't ever
>> try decoding blob columns from index images of their prefixes".)
>>
>> This is probably the most correct solution to this problem since then we
>> will be able to both do ICP evaluation and the end_range check using the
>> index entry. I think it is doable but it will possibly require more work
>> than just a simple bug fix. For instance the current administrative data
>> structure use for InnoDB is currently only able to have one entry for
>> each field of the record. Doing this we must extend this to both being
>> able to have "two entries" for fields that are different in the index
>> entry and the actual record.
>>
>> Other alternatives are:
>>
>> -(alt 2): change InnoDB so that it does not free the memory used for
>> storing the BLOB value read for the last read record (and let the
>> end_range test use this). This will be similar to how MyISAM is doing it
>> (I think). I do not consider this as a good solution.
>> -(alt 3): change the ICP code so that it skips the end_range evaluation
>> in case it is a BLOB field (a prefix value). This is possibly an OK
>> solution since the correct end_range evaluation is done later anyway.
> This one seems to have best effect/efforts ratio. Let's try it.
> But again, it would be better to not just disable end_range at all but 
> only strip BLOB fields from it, and leave the original end_range 
> condition for
> later check.
>> -(alt 4): and there is of course the current patch which disables ICP
>> for the case where the index contains a partial field to avoid this
>> problem (the initial solution suggested by SergeyP).
>>
>> Do you have any opinion about which to choose? Or do you have other
>> proposals?
> Could you spend some time to investigate whether it is that hard to 
> unpack
> blob fields from indexes? It would be really nice to have.
> I have an (unsupported) opinion that it's not too hard because this is 
> done
> for the range scan and we probably just need to copy-paste some code.

I did some investigation of how InnoDB is handling blob fields when 
doing a range scan. I did not get all details and might have overlooked 
something but my impression is that InnoDB does:

   * for locating the start of a range scan it converts the "start 
position blob value" from "mysql format" to "innodb format" and uses 
this for navigating the index structure.

   * for locating the end range position it evaluates the "end position 
blob value" after it has read the BLOB from the base record.

  I have now made a new patch that extends the existing ICP code in 
InnoDB to read out prefixes of BLOB fields from index entries and use 
this for the end-range evaluation in the ICP code path. The new patch is 
available here:

    http://lists.mysql.com/commits/107499

Please have a look at this and see if you think this is a better 
approach then in the previous patch.

Thanks,
Olav

Thread
bzr commit into mysql-6.0-codebase-bugfixing branch (Olav.Sandstaa:3791)Bug#42991Olav.Sandstaa5 Mar
  • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(Olav.Sandstaa:3791) Bug#42991Evgeny Potemkin10 Mar
    • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(Olav.Sandstaa:3791) Bug#42991Olav Sandstaa18 Mar
      • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(Olav.Sandstaa:3791) Bug#42991Evgeny Potemkin18 Mar
        • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(Olav.Sandstaa:3791) Bug#42991Olav Sandstaa18 Mar
          • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(Olav.Sandstaa:3791) Bug#42991Jørgen Løland19 Mar
        • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(Olav.Sandstaa:3791) Bug#42991Jørgen Løland19 Mar
        • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(Olav.Sandstaa:3791) Bug#42991Olav Sandstaa5 May