Gaal Yahas wrote:
>On Thu, Jun 10, 2004 at 11:45:07AM +0100, Steve Hay wrote:
>
>>Getting UTF-8 data stored in the database back into properly flagged Perl
>>strings without mangling anything is only part of the problem. How do
>>you perform SQL SELECTs on such data in the database without the
>>database understanding that the bytes it is storing are UTF-8 characters?
>
>Having the database know about utf8 is (very) nice to have, but it
>isn't essential. '=' and LIKE should continue to work thanks to the
>cleverness of utf8;
>

I was concerned that searching for the sequence of bytes that makes up a
UTF-8 character might accidentally match in the wrong place, such as the
last byte of one character followed by the first byte of the next. Are you
implying that this can never happen because of how UTF-8 works? I've
never really looked into the details of the UTF-8 encoding; I've just used
interfaces that manipulate it and taken the view that the internals don't
really interest me (and shouldn't, if I'm doing things properly).

If it's true, then it certainly does alleviate some of the pain.
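
The property in question can be checked with a small sketch. It is in Python rather than Perl, the sample string and helper names are made up for illustration, and it assumes that both the stored data and the search term are valid UTF-8 in the same normalization form. Under those assumptions a byte-wise search can only match at a character boundary, because the bytes that start a character and the bytes that continue one come from disjoint ranges.

```python
def find_all(haystack: bytes, needle: bytes) -> list:
    """Return the byte offset of every occurrence of needle in haystack."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def is_continuation(byte: int) -> bool:
    """Continuation bytes (0x80-0xBF) never start a UTF-8 character."""
    return 0x80 <= byte <= 0xBF


# 'ï' is C3 AF and 'é' is C3 A9: they share a lead byte but differ overall.
stored = "naïve café".encode("utf-8")
needle = "é".encode("utf-8")

offsets = find_all(stored, needle)
# Every match begins on a byte that starts a character.
starts_on_character = all(not is_continuation(stored[i]) for i in offsets)

# The "tail of one character plus head of the next" case: AF followed by 'v'.
straddling = stored[3:5]
try:
    straddling.decode("utf-8")
    straddling_is_valid = True
except UnicodeDecodeError:
    straddling_is_valid = False
```

A search term that straddles two characters is not itself valid UTF-8, so it cannot arise from a properly encoded query string; the accidental match would need a malformed needle.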

What about things like UPPER() and LOWER(), though? Presumably they're
not going to work, because they'll operate on bytes and completely screw
everything up?

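
What a byte-oriented LOWER() does to UTF-8 data depends on which single-byte rules it applies, and that depends on the database and its configured character set. The sketch below imitates two plausible behaviours in Python; neither is taken from any particular database, and the sample word is an assumption. ASCII-only case mapping leaves the multi-byte characters untouched, so the result is valid but incompletely lowercased. Latin-1 case mapping rewrites the lead byte and produces invalid UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


word = "CAFÉ"
stored = word.encode("utf-8")        # 'É' is stored as the two bytes C3 89

# Behaviour 1: ASCII-only lowercasing. bytes.lower() changes only A-Z.
ascii_lowered = stored.lower()

# Behaviour 2: Latin-1 lowercasing, treating each byte as one character.
# Byte C3 is 'Ã' in Latin-1, so it is mapped to 'ã' (E3).
latin1_lowered = stored.decode("latin-1").lower().encode("latin-1")

# For comparison: lowercasing the decoded character string.
expected = word.lower()
```

In this sketch `ascii_lowered` decodes to `cafÉ`, `latin1_lowered` no longer decodes at all, and only the character-level operation gives `café`.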
>of course, collating and therefore ORDER BY won't
>work correctly either, and the sizes the database knows about will all
>be in bytes instead of characters. Bothersome but not insurmountable. :)
>

I assume you mean pulling the data into Perl, having it correctly
flagged as UTF-8 there, and doing things like sorting in the Perl code?
I could live with that, but the UPPER()/LOWER() issue is more of a
problem. I make a lot of use of them and it's not so easy to work around
in Perl.

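
As a rough illustration of the client-side approach, here is a minimal sketch of decoding first and then sorting and measuring in application code. It is written in Python with made-up sample rows; in Perl the decoding step would be done with the Encode module. It also shows that sorting the raw UTF-8 bytes gives code point order, which is consistent but is not linguistic collation.

```python
# Hypothetical rows as a byte-oriented database would return them.
rows = [s.encode("utf-8") for s in ["zebra", "école", "apple", "Zürich"]]

# What the database can do on its own: compare and count bytes.
byte_sorted = sorted(rows)
byte_length = len("école".encode("utf-8"))

# What the client can do after decoding: compare and count characters.
decoded = [r.decode("utf-8") for r in rows]
codepoint_sorted = sorted(decoded)
char_length = len("école")

# Byte order and code point order agree for valid UTF-8.
same_order = [r.decode("utf-8") for r in byte_sorted] == codepoint_sorted

# A crude case-insensitive ordering; real collation needs locale rules.
caseless_sorted = sorted(decoded, key=str.casefold)
```

Even the case-insensitive sort still places "école" after "zebra", so a language-aware ordering would need a proper collation library on the client.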
- Steve