Hi Sergei,
Sergei Golubchik <serg@stripped> wrote on 05/20/2009 03:22:11 PM:
> Hi, Timothy!
>
> On May 20, Timothy P Clark wrote:
> > >
> > > You should have problems with ORDER BY and GROUP BY, unless you
> > > pretend that your indexes are not ordered. You'll be able to do no
> > > ranges at all. Even equalities aren't guaranteed to work, I'm
> > > afraid.
> > >
> > > Do ORDER BY and, say, BETWEEN ranges work ? If yes - LIKE should
> > > work automatically. If they are not, I think that fixing LIKE is
> > > wrong, it's just patching up the sympthoms.
> > Yes, these do work, but that is because the actual I/O operations
(i.e.
> > index_read) run through a much different set of interfaces than
> > records_in_range.
>
> You should be able to trigger records_in_range() also without LIKE -
> with BETWEEN or inequalities.
OK, right. I misunderstood your question to mean "do queries with
BETWEEN/ORDER BY/GROUP BY/etc. return correct *results*?" Which they do,
and for all character sets.
But I think that you're asking whether records_in_range gives reasonable
*estimates* in those cases with the problematic character sets. I don't
have a definitive answer, but in general I believe that it should. A basic
SELECT ... WHERE col BETWEEN "AB" AND "CD" will obtain a good estimate
from records_in_range(). That's because the key values are padded with
blanks and contain entirely valid data. On the other hand, LIKE (and the
equivalent BETWEEN) introduces the padding with min_sort_char and
max_sort_char, and, as I've indicated, it's those specific values that can
cause a problem.
However, I think I've got a reasonably simple solution to all of this, and
it looks to be working with some preliminary tests. Instead of trying to
calculate the non-padded length of the key and sending a single key into
DB2, we'll honor the min/max format that MySQL is using. But when iconv
encounters an invalid character, we'll just fill the remainder of the
output buffer with the appropriate min/max value for the destination
character set.