Hi Monty,
Thanks for your response!
> Hi!
>
> >>>>> "Timothy" == Timothy P Clark <Timothy> writes:
>
> Timothy> Hi,
> Timothy> We're trying to figure out the best way to determine a key
field's
> Timothy> non-padded length in records_in_range().
>
> Timothy> Our storage engine (IBM DB2) stores key values in a format that
is
> Timothy> different from MySQL's. One consequence of this is that we
> need to be able
> Timothy> to determine in handler::records_in_range() the true length
> of a padded
> Timothy> character field in a key. An example of a query which
> generates a padded
> Timothy> character field in the key would be: select * from t1 wherekeyA
like
> Timothy> "ABC%". For this query, we need to be able to know in
> records_in_range()
> Timothy> that there are exactly 3 characters in the key describing
> the lower bound.
> Timothy> As far as I can tell, this information is not available
> explicitly in any
> Timothy> data structure available to records_in_range().
>
> Timothy> Therefore, we need to analyze the key field data to
> determine the length.
> Timothy> This appears simple for single-byte or double-byte
> character sets, because
> Timothy> we can use a function like memchr() to search for the first
> occurrence of
> Timothy> the CHARSET_INFO::min_sort_char in the field. This is what we
are
> Timothy> currently doing. However, from looking at the various
> my_like_range_*
> Timothy> functions in ./strings/ it appears that this strategy
> cannot be applied to
> Timothy> multi-byte character sets, such as cp932 or utf8. In fact,
> it looks like
> Timothy> we almost have to reverse-engineer each implementation of
> my_like_range to
> Timothy> produce a complementary function that can determine when
> padding starts.
> Timothy> Obviously, we would prefer a solution that is more
> maintainable and more
> Timothy> easily implemented. Do we have any options?
>
> records_in_range is meant to ask how many keys there is between two
> different keys.
>
> There shouldn't be a different answer if the user would ask for
>
> 'LIKE "ABC%"' or BETWEEN "ABC\0" AND "ABC\FF"
>
> How MySQL works internally is that it converts the first version to
> the second format, which are idenital if the \0 and \FF are choosen
> correctly.
My original post probably wasn't clear, but it's the second format that is
giving us trouble. In some cases, that \FF (i.e. max_sort_char) is not a
valid (i.e. convertable) character for a given character set (e.g. cp932),
and this causes DB2 to complain when the \FF comes through
records_in_range. Since the storage engine is just a bridge between MySQL
and DB2, we need to find a way to make the data look clean to DB2 for the
existing interfaces into the index support. If we can tell DB2 that
(key="ABC", length=3), it is smart enough to give us back the estimate
that MySQL expects for ("ABC\0\0\0..." <= key <= "ABC\FF\FF\FF..."). The
hard part in records_in_range is figuring out that the length is 3. Thus
my original post.
>
> I fear, that if you try to solve the LIKE case in the way you describe
> it, you may break the second case.
Unforunately, I think it is already broken for some multi-byte character
sets.
>
> Regards,
> Monty
>
> For information of MariaDB, the community developed server based on
> source code from MySQL, check out www.askmonty.org
Thank you,
Tim Clark