Sergei Golubchik <serg@stripped> wrote on 05/21/2009 12:08:23 PM:
> Hi, Timothy!
>
> On May 20, Timothy P Clark wrote:
> > Sergei Golubchik <serg@stripped> wrote on 05/20/2009 02:44:22 PM:
> > > On May 20, Timothy P Clark wrote:
> > >
> > > Just trying to clarify.
> > >
> > > Are you saying that for LIKE "ABC%" you get "ABC\FF" in
> > > records_in_range() even in character sets where \FF is not a valid
> > > character ?
> > Yes, that is what I see.
> >
> > create table t2 (c char(10) collate cp932_ci, index(c));
> > insert into t2 values("a"),("b"),("c");
> > select * from t2 force index (c) where c like "a%";
> >
> > records_in_range shows a min_key of 0x0061000000...
> > and a max_key of 0x0061FFFFFFF...
> >
> > But iconv doesn't think that 0xFF is a valid character in cp932, and from
> > http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
> > I don't see any definition for either 0xFF or 0xFFFF in cp932.
> >
> > > If yes - it must be a bug, both range ends must be valid strings in
> > > their character set.
>
> I've looked in ctype-cp932.c - and indeed, it uses 0xFF as
> max_sort_char, but it's not a valid cp932 character.
>
> Tim, it's a bug - please don't implement a workaround for it, we have to
> fix it in MySQL. I've just submitted this problem as Bug#45012.
Great! Thanks, Sergei.
I believe other character sets are also affected (Big5, euckr, etc.), and
I'll add those to the bug report once I have them verified.
Thanks,
Tim Clark
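
For anyone who wants to reproduce the check discussed above without iconv, here is a rough sketch of a structural validity test for cp932 byte strings. It is not taken from MySQL or from ctype-cp932.c. The function name is made up for illustration, and the byte ranges are an assumption based on the commonly documented cp932 layout (single bytes 0x00-0x7F and 0xA1-0xDF, lead bytes 0x81-0x9F and 0xE0-0xFC, trail bytes 0x40-0x7E and 0x80-0xFC). The test keys are simplified: the leading 0x00 byte from the dump is dropped and the padding length of 9 is a guess. The check only tests byte structure, not whether every two-byte pair is actually assigned in CP932.TXT.

```python
def is_wellformed_cp932(data: bytes) -> bool:
    """Return True if data is structurally well-formed cp932 (hypothetical helper)."""
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F or 0xA1 <= b <= 0xDF:
            # Single-byte character: ASCII range or half-width katakana.
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            # Lead byte of a two-byte character: a trail byte must follow.
            if i + 1 >= n:
                return False
            t = data[i + 1]
            if not (0x40 <= t <= 0x7E or 0x80 <= t <= 0xFC):
                return False
            i += 2
        else:
            # Assumed unassigned as single or lead bytes: 0x80, 0xA0, 0xFD-0xFF.
            return False
    return True


# Simplified range ends for LIKE "a%" (padding width is an assumption).
prefix = b"a"
min_end = prefix + b"\x00" * 9
max_end = prefix + b"\xff" * 9

print(is_wellformed_cp932(min_end))  # True
print(is_wellformed_cp932(max_end))  # False
```

Under these assumptions the 0xFF-padded upper end fails the check, which matches what iconv reports.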