From:
Date: June 11 2004 10:14am
Subject: Re: [PATCH] Re: blessing db data as utf8
List-Archive: http://lists.mysql.com/perl/3018
Message-Id: <40C969DD.7090206@uk.radan.com>
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Gaal Yahas wrote:

>On Thu, Jun 10, 2004 at 11:45:07AM +0100, Steve Hay wrote:
>
>>Getting UTF-8 data stored in the database back into properly flagged Perl
>>strings without mangling anything is only part of the problem. How do
>>you perform SQL SELECT's on such data in the database without the
>>database understanding that the bytes it is storing are UTF-8 characters?
>
>Having the database knowing about utf8 is (very) nice to have, but it
>isn't essential. '=' and LIKE should continue to work thanks to the
>cleverness of utf8;

I was concerned that searching for the sequence of bytes that makes up a
UTF-8 character might accidentally match in the wrong place, e.g. spanning
the last byte of one character and the first byte of the next. Are you
implying that this can never happen because of the way UTF-8 works? I've
never really looked into the details of the UTF-8 encoding; I've just used
interfaces that manipulate it and taken the view that the internals don't
really interest me (and shouldn't, if I'm doing things properly). If it's
true, then it certainly does alleviate some of the pain.

What about things like UPPER() and LOWER(), though? Presumably they're not
going to work because they'll operate on bytes and completely screw
everything up?

>of course, collating and therefore ORDER BY won't
>work correctly either, and the sizes the database knows about will all
>be in bytes instead of characters. Bothersome but not insurmountable. :)

I assume you mean pulling the data into Perl, having it correctly flagged
as UTF-8 there, and doing things like sorting in the Perl code? I could
live with that, but the UPPER()/LOWER() issue is more of a problem.
I make a lot of use of them, and they're not so easy to work around in
Perl.

- Steve

------------------------------------------------
Radan Computational Ltd.
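The byte-level behaviour debated above can be illustrated with a short sketch. It is written in Python 3.8+ rather than Perl, uses only the standard library, and models the database as something that stores and compares raw UTF-8 bytes without knowing they are UTF-8. The helper latin1_bytewise is hypothetical: it stands in for a case function that treats each stored byte as one Latin-1 character, and is not a description of any particular MySQL version. The sketch shows why a byte search for a complete UTF-8 character can only match on character boundaries (continuation bytes, 10xxxxxx, never look like lead bytes), and why byte-wise UPPER()/LOWER() can produce invalid UTF-8.

```python
"""Sketch: UTF-8 byte-level search vs. byte-level case mapping.

Python 3.8+, standard library only. The "database" is modelled as a store
that keeps raw UTF-8 bytes without knowing they are UTF-8.
"""
from typing import List, Union

Text = Union[str, bytes]


def offsets(haystack: Text, needle: Text) -> List[int]:
    """Every offset where needle occurs in haystack (overlapping hits allowed).

    Works on str (character offsets) and bytes (byte offsets) alike.
    """
    hits, start = [], 0
    while (i := haystack.find(needle, start)) != -1:
        hits.append(i)
        start = i + 1
    return hits


def byte_search_as_char_offsets(text: str, pattern: str) -> List[int]:
    """Search the UTF-8 bytes, then map each byte hit to a character offset.

    UTF-8 lead bytes look like 0xxxxxxx or 11xxxxxx; continuation bytes
    always look like 10xxxxxx. A non-empty needle starts with a lead byte,
    so it can only line up with a lead byte in the haystack, never with the
    tail of a character. If a hit ever landed mid-character, decoding the
    prefix hay[:i] would raise UnicodeDecodeError.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    hay = text.encode("utf-8")
    return [len(hay[:i].decode("utf-8"))
            for i in offsets(hay, pattern.encode("utf-8"))]


def latin1_bytewise(case_fn, text: str) -> bytes:
    """HYPOTHETICAL model of UPPER()/LOWER() in a store that treats every
    byte as one Latin-1 character. Returns the bytes it would write back.
    Only suitable for inputs whose case mapping stays within Latin-1."""
    return case_fn(text.encode("utf-8").decode("latin-1")).encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    text = "naïve café, crème brûlée, €5, ©2004"

    # 1. '=' / LIKE-style matching: byte hits agree with character hits.
    #    Note "é" is C3 A9 and "©" is C2 A9; they share a trailing byte,
    #    yet searching for one never finds the other.
    for pattern in ["é", "café", "€", "©", "e"]:
        assert byte_search_as_char_offsets(text, pattern) == offsets(text, pattern)

    # 2. UPPER()/LOWER() on bytes: the result is not even valid UTF-8.
    for case_fn, word in [(str.upper, "€"), (str.lower, "É")]:
        stored = latin1_bytewise(case_fn, word)
        print(case_fn.__name__, word, stored, is_valid_utf8(stored))
        # upper € b'\xc2\x82\xac' False
        # lower É b'\xe3\x89' False

    # 3. Sizes: a byte-oriented store counts bytes; len() on str counts characters.
    print(len("café"), len("café".encode("utf-8")))  # 4 5

    # 4. Sorting UTF-8 bytes gives code point order, which is not a
    #    language-aware collation: "é" sorts after "f".
    words = ["f", "é", "e"]
    assert sorted(w.encode("utf-8") for w in words) == [w.encode("utf-8") for w in sorted(words)]
    print(sorted(words))  # ['e', 'f', 'é']
```

The design choice here is deliberate: case functions are the one place where byte-oriented processing does more than give a merely "unsorted" or "oversized" answer; it rewrites lead bytes (â becomes Â, ã from Ã) and so corrupts the stored UTF-8, which matches the concern raised about UPPER() and LOWER().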