List:Internals« Previous MessageNext Message »
From:Michael Widenius Date:May 19 2009 11:06am
Subject:re: character fields in records_in_range
View as plain text  
Hi!

>>>>> "Timothy" == Timothy P Clark <Timothy> writes:

Timothy> Hi,
Timothy> We're trying to figure out the best way to determine a key field's 
Timothy> non-padded length in records_in_range().

Timothy> Our storage engine (IBM DB2) stores key values in a format that is 
Timothy> different from MySQL's. One consequence of this is that we need to be able 
Timothy> to determine in handler::records_in_range() the true length of a padded 
Timothy> character field in a key. An example of a query which generates a padded 
Timothy> character field in the key would be: select * from t1 where keyA like 
Timothy> "ABC%". For this query, we need to be able to know in records_in_range() 
Timothy> that there are exactly 3 characters in the key describing the lower bound. 
Timothy> As far as I can tell, this information is not available explicitly in any 
Timothy> data structure available to records_in_range().

Timothy> Therefore, we need to analyze the key field data to determine the length. 
Timothy> This appears simple for single-byte or double-byte character sets, because 
Timothy> we can use a function like memchr() to search for the first occurrence of 
Timothy> the CHARSET_INFO::min_sort_char in the field. This is what we are 
Timothy> currently doing. However, from looking at the various my_like_range_* 
Timothy> functions in ./strings/ it appears that this strategy cannot be applied to 
Timothy> multi-byte character sets, such as cp932 or utf8. In fact, it looks like 
Timothy> we almost have to reverse-engineer each implementation of my_like_range to 
Timothy> produce a complementary function that can determine when padding starts. 
Timothy> Obviously, we would prefer a solution that is more maintainable and more 
Timothy> easily implemented. Do we have any options?

records_in_range is meant to ask how many keys there is between two
different keys.

There shouldn't be a different answer if the user would ask for

'LIKE "ABC%"' or BETWEEN "ABC\0" AND "ABC\FF"

How MySQL works internally is that it converts the first version to
the second format, which are idenital if the \0 and \FF are choosen
correctly.

I fear, that if you try to solve the LIKE case in the way you describe
it, you may break the second case.

Regards,
Monty

For information of MariaDB, the community developed server based on
source code from MySQL, check out www.askmonty.org
Thread
character fields in records_in_rangeTimothy P Clark15 May
  • Re: character fields in records_in_rangeSergei Golubchik18 May
    • Re: character fields in records_in_rangeTimothy P Clark18 May
      • Re: character fields in records_in_rangeSergei Golubchik19 May
        • Re: character fields in records_in_rangeTimothy P Clark20 May
          • Re: character fields in records_in_rangeSergei Golubchik20 May
            • Re: character fields in records_in_rangeTimothy P Clark21 May
  • re: character fields in records_in_rangeMichael Widenius19 May
    • re: character fields in records_in_rangeTimothy P Clark20 May
      • Re: re: character fields in records_in_rangeSergei Golubchik20 May
        • Re: re: character fields in records_in_rangeTimothy P Clark20 May
          • Re: re: character fields in records_in_rangeSergei Golubchik21 May
            • Re: re: character fields in records_in_rangeTimothy P Clark21 May