List:Internals« Previous MessageNext Message »
From:Timothy P Clark Date:May 20 2009 7:25pm
Subject:re: character fields in records_in_range
View as plain text  
Hi Monty,

Thanks for your response!

> Hi!
> 
> >>>>> "Timothy" == Timothy P Clark <Timothy> writes:
> 
> Timothy> Hi,
> Timothy> We're trying to figure out the best way to determine a key 
field's 
> Timothy> non-padded length in records_in_range().
> 
> Timothy> Our storage engine (IBM DB2) stores key values in a format that 
is 
> Timothy> different from MySQL's. One consequence of this is that we 
> need to be able 
> Timothy> to determine in handler::records_in_range() the true length
> of a padded 
> Timothy> character field in a key. An example of a query which 
> generates a padded 
> Timothy> character field in the key would be: select * from t1 wherekeyA 
like 
> Timothy> "ABC%". For this query, we need to be able to know in 
> records_in_range() 
> Timothy> that there are exactly 3 characters in the key describing 
> the lower bound. 
> Timothy> As far as I can tell, this information is not available 
> explicitly in any 
> Timothy> data structure available to records_in_range().
> 
> Timothy> Therefore, we need to analyze the key field data to 
> determine the length. 
> Timothy> This appears simple for single-byte or double-byte 
> character sets, because 
> Timothy> we can use a function like memchr() to search for the first
> occurrence of 
> Timothy> the CHARSET_INFO::min_sort_char in the field. This is what we 
are 
> Timothy> currently doing. However, from looking at the various 
> my_like_range_* 
> Timothy> functions in ./strings/ it appears that this strategy 
> cannot be applied to 
> Timothy> multi-byte character sets, such as cp932 or utf8. In fact, 
> it looks like 
> Timothy> we almost have to reverse-engineer each implementation of 
> my_like_range to 
> Timothy> produce a complementary function that can determine when 
> padding starts. 
> Timothy> Obviously, we would prefer a solution that is more 
> maintainable and more 
> Timothy> easily implemented. Do we have any options?
> 
> records_in_range is meant to ask how many keys there is between two
> different keys.
> 
> There shouldn't be a different answer if the user would ask for
> 
> 'LIKE "ABC%"' or BETWEEN "ABC\0" AND "ABC\FF"
> 
> How MySQL works internally is that it converts the first version to
> the second format, which are idenital if the \0 and \FF are choosen
> correctly.

My original post probably wasn't clear, but it's the second format that is 
giving us trouble.  In some cases, that \FF (i.e. max_sort_char) is not a 
valid (i.e. convertable) character for a given character set (e.g. cp932), 
and this causes DB2 to complain when the \FF comes through 
records_in_range. Since the storage engine is just a bridge between MySQL 
and DB2, we need to find a way to make the data look clean to DB2 for the 
existing interfaces into the index support. If we can tell DB2 that 
(key="ABC", length=3), it is smart enough to give us back the estimate 
that MySQL expects for ("ABC\0\0\0..." <= key <= "ABC\FF\FF\FF..."). The 
hard part in records_in_range is figuring out that the length is 3. Thus 
my original post. 

> 
> I fear, that if you try to solve the LIKE case in the way you describe
> it, you may break the second case.

Unforunately, I think it is already broken for some multi-byte character 
sets.

> 
> Regards,
> Monty
> 
> For information of MariaDB, the community developed server based on
> source code from MySQL, check out www.askmonty.org

Thank you,

Tim Clark
Thread
character fields in records_in_rangeTimothy P Clark15 May
  • Re: character fields in records_in_rangeSergei Golubchik18 May
    • Re: character fields in records_in_rangeTimothy P Clark18 May
      • Re: character fields in records_in_rangeSergei Golubchik19 May
        • Re: character fields in records_in_rangeTimothy P Clark20 May
          • Re: character fields in records_in_rangeSergei Golubchik20 May
            • Re: character fields in records_in_rangeTimothy P Clark21 May
  • re: character fields in records_in_rangeMichael Widenius19 May
    • re: character fields in records_in_rangeTimothy P Clark20 May
      • Re: re: character fields in records_in_rangeSergei Golubchik20 May
        • Re: re: character fields in records_in_rangeTimothy P Clark20 May
          • Re: re: character fields in records_in_rangeSergei Golubchik21 May
            • Re: re: character fields in records_in_rangeTimothy P Clark21 May