List: MySQL and Perl
From: Gaal Yahas
Date: June 11 2004 11:41am
Subject: Re: [PATCH] Re: blessing db data as utf8
On Fri, Jun 11, 2004 at 09:14:21AM +0100, Steve Hay wrote:
> I was concerned that searching for some sequence of bytes that make up a 
> UTF-8 character might accidentally match in the wrong place, like the 
> last byte of one character and the first byte of another.  Are you 
> implying that this can't ever happen because of how UTF-8 works?  I've 
> never really looked into the detail of the UTF-8 coding; I've just used 
> interfaces that manipulate it and took the view that the internals don't 
> really interest me (and shouldn't do, if I'm doing things properly).
> 
> If it's true, then it certainly does alleviate some of the pain.

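Yes, utf8 is self-synchronizing: lead bytes and continuation bytes come
from disjoint ranges, so a complete encoded character can never match
starting in the middle of another one. If both the needle and the haystack
are valid utf8, you don't get false positives even with non-utf8-aware
strcmp-like code.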
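
To make that concrete, here is a small sketch in Python (not Perl, and not part of the patch; the helper names byte_find_all and char_boundaries are made up for illustration). It searches the raw encoded bytes with no knowledge of characters and then checks that every hit starts on a character boundary:

```python
def byte_find_all(haystack: str, needle: str) -> list:
    """Byte offsets of every match of needle in haystack, comparing raw UTF-8 bytes only."""
    h = haystack.encode("utf-8")
    n = needle.encode("utf-8")
    offsets = []
    start = h.find(n)
    while start != -1:
        offsets.append(start)
        start = h.find(n, start + 1)
    return offsets


def char_boundaries(text: str) -> set:
    """Byte offsets at which a character starts in the UTF-8 encoding of text."""
    data = text.encode("utf-8")
    # Continuation bytes look like 10xxxxxx; any other byte starts a character.
    return {i for i, b in enumerate(data) if (b & 0xC0) != 0x80}


haystack = "na\u00efve caf\u00e9, \u65e5\u672c\u8a9e"
for needle in ["\u00e9", "\u672c", "a"]:
    hits = byte_find_all(haystack, needle)
    # Every byte-level hit lands on the start of a character, never mid-character.
    assert hits and all(offset in char_boundaries(haystack) for offset in hits)
```

This only holds when both strings really are valid utf8; a needle in some other encoding gets no such guarantee.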

> What about things like UPPER() and LOWER(), though?  Presumably they're 
> not going to work because they'll operate on bytes and completely screw 
> everything up?

True, that will not work.

> >of course, collating and therefore ORDER BY won't
> >work correctly either, and the sizes the database knows about will all
> >be in bytes instead of characters. Bothersome but not insurmountable. :)
> >
> I assume you mean pull the data into Perl, have the data correctly 
> flagged as UTF-8 there, and do things like sorting in the Perl code?
> 
> I could live with that, but the UPPER()/LOWER() issue is more of a 
> problem.  I make a lot of use of them and it's not so easy to work around 
> in the Perl.

You're right, but there's not much we can do about it until the database
supports utf8 natively.
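
In the meantime, the matching has to happen after the data has been pulled out and decoded. A rough sketch of the idea, in Python rather than Perl, with made-up row values and a hypothetical find_ci helper (nothing here comes from DBI or the patch):

```python
# Hypothetical raw values, as a database that only sees bytes would hand them back.
rows = [b"Zo\xc3\xab", b"\xc3\x89mile", b"steve"]

# Decode once at the boundary so the application works with characters.
names = [raw.decode("utf-8") for raw in rows]


def find_ci(candidates, wanted):
    """Case-insensitive lookup that returns the stored spelling unchanged."""
    key = wanted.casefold()
    return [name for name in candidates if name.casefold() == key]


# The lookup ignores case but the original case of the stored value survives.
assert find_ci(names, "ZO\u00cb") == ["Zo\u00eb"]

# The database counts bytes; the application counts characters.
assert len(rows[0]) == 4
assert len(names[0]) == 3
```

The obvious cost is that the filtering no longer happens in the WHERE clause, so every candidate row has to cross the wire first.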


Out of curiosity, where do you make use of this? Case-insensitive lookups
that preserve the original case?

Note that ORDER BY and UPPER()/LOWER() will continue to work on the subset
of your strings that happen to be ASCII, even if some of your data is not.
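
A quick illustration of the ASCII point, again as a Python sketch. It assumes a case mapping that only knows about ASCII letters, which is what Python's bytes.upper() does; a real database may behave differently depending on the character set it thinks the column has. The bytewise_upper name is made up for the example:

```python
def bytewise_upper(text: str) -> str:
    """Uppercase the UTF-8 bytes of text with an ASCII-only, byte-wise mapping."""
    # bytes.upper() changes only the ASCII letters a-z and leaves all other bytes alone.
    return text.encode("utf-8").upper().decode("utf-8")


# Pure ASCII data: the byte-wise result is the correct one.
assert bytewise_upper("steve") == "STEVE"

# Mixed data: the ASCII letters are converted, the non-ASCII one is left as it was.
assert bytewise_upper("zo\u00eb") == "ZO\u00eb"

# A character-aware upper() converts the whole string.
assert "zo\u00eb".upper() == "ZO\u00cb"
```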

-- 
Gaal Yahas <gaal@stripped>
http://gaal.livejournal.com/