Thanks for your response!
> >>>>> "Timothy" == Timothy P Clark <Timothy> writes:
> Timothy> Hi,
> Timothy> We're trying to figure out the best way to determine a key
> Timothy> non-padded length in records_in_range().
> Timothy> Our storage engine (IBM DB2) stores key values in a format that
> Timothy> different from MySQL's. One consequence of this is that we
> need to be able
> Timothy> to determine in handler::records_in_range() the true length
> of a padded
> Timothy> character field in a key. An example of a query which
> generates a padded
> Timothy> character field in the key would be: select * from t1 wherekeyA
> Timothy> "ABC%". For this query, we need to be able to know in
> Timothy> that there are exactly 3 characters in the key describing
> the lower bound.
> Timothy> As far as I can tell, this information is not available
> explicitly in any
> Timothy> data structure available to records_in_range().
> Timothy> Therefore, we need to analyze the key field data to
> determine the length.
> Timothy> This appears simple for single-byte or double-byte
> character sets, because
> Timothy> we can use a function like memchr() to search for the first
> occurrence of
> Timothy> the CHARSET_INFO::min_sort_char in the field. This is what we
> Timothy> currently doing. However, from looking at the various
> Timothy> functions in ./strings/ it appears that this strategy
> cannot be applied to
> Timothy> multi-byte character sets, such as cp932 or utf8. In fact,
> it looks like
> Timothy> we almost have to reverse-engineer each implementation of
> my_like_range to
> Timothy> produce a complementary function that can determine when
> padding starts.
> Timothy> Obviously, we would prefer a solution that is more
> maintainable and more
> Timothy> easily implemented. Do we have any options?
> records_in_range is meant to ask how many keys there is between two
> different keys.
> There shouldn't be a different answer if the user would ask for
> 'LIKE "ABC%"' or BETWEEN "ABC\0" AND "ABC\FF"
> How MySQL works internally is that it converts the first version to
> the second format, which are idenital if the \0 and \FF are choosen
My original post probably wasn't clear, but it's the second format that is
giving us trouble. In some cases, that \FF (i.e. max_sort_char) is not a
valid (i.e. convertable) character for a given character set (e.g. cp932),
and this causes DB2 to complain when the \FF comes through
records_in_range. Since the storage engine is just a bridge between MySQL
and DB2, we need to find a way to make the data look clean to DB2 for the
existing interfaces into the index support. If we can tell DB2 that
(key="ABC", length=3), it is smart enough to give us back the estimate
that MySQL expects for ("ABC\0\0\0..." <= key <= "ABC\FF\FF\FF..."). The
hard part in records_in_range is figuring out that the length is 3. Thus
my original post.
> I fear, that if you try to solve the LIKE case in the way you describe
> it, you may break the second case.
Unforunately, I think it is already broken for some multi-byte character
> For information of MariaDB, the community developed server based on
> source code from MySQL, check out www.askmonty.org