From: Michael Widenius Date: May 19 2009 11:06am Subject: re: character fields in records_in_range List-Archive: http://lists.mysql.com/internals/36755 Message-Id: <18962.37573.619039.230021@narttu.askmonty.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi! >>>>> "Timothy" == Timothy P Clark writes: Timothy> Hi, Timothy> We're trying to figure out the best way to determine a key field's Timothy> non-padded length in records_in_range(). Timothy> Our storage engine (IBM DB2) stores key values in a format that is Timothy> different from MySQL's. One consequence of this is that we need to be able Timothy> to determine in handler::records_in_range() the true length of a padded Timothy> character field in a key. An example of a query which generates a padded Timothy> character field in the key would be: select * from t1 where keyA like Timothy> "ABC%". For this query, we need to be able to know in records_in_range() Timothy> that there are exactly 3 characters in the key describing the lower bound. Timothy> As far as I can tell, this information is not available explicitly in any Timothy> data structure available to records_in_range(). Timothy> Therefore, we need to analyze the key field data to determine the length. Timothy> This appears simple for single-byte or double-byte character sets, because Timothy> we can use a function like memchr() to search for the first occurrence of Timothy> the CHARSET_INFO::min_sort_char in the field. This is what we are Timothy> currently doing. However, from looking at the various my_like_range_* Timothy> functions in ./strings/ it appears that this strategy cannot be applied to Timothy> multi-byte character sets, such as cp932 or utf8. In fact, it looks like Timothy> we almost have to reverse-engineer each implementation of my_like_range to Timothy> produce a complementary function that can determine when padding starts. Timothy> Obviously, we would prefer a solution that is more maintainable and more Timothy> easily implemented. Do we have any options? records_in_range is meant to ask how many keys there is between two different keys. There shouldn't be a different answer if the user would ask for 'LIKE "ABC%"' or BETWEEN "ABC\0" AND "ABC\FF" How MySQL works internally is that it converts the first version to the second format, which are idenital if the \0 and \FF are choosen correctly. I fear, that if you try to solve the LIKE case in the way you describe it, you may break the second case. Regards, Monty For information of MariaDB, the community developed server based on source code from MySQL, check out www.askmonty.org