-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
David Karlton wrote:
>
>
> Mark Matthews wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> David Karlton wrote:
>>
>>> hi there,
>>>
>>> we have a 3.23.47 MySQL server running on linux, and are using the
>>> 3.0.6 JDBC driver. the database is set up with the default (Latin1)
>>> character set.
>>>
>>> if we try to store a String that contains any characters in the range
>>> 0x80 to 0x9f, the data does not seem to be saved correctly. in the
>>> standard ISO-8859-1 (Latin1) charset, these characters are control
>>> characters, while in CP1252 (Windows Latin1), they are printable
>>> characters. for example, in CP1252:
>>>
>>> 0x83 --> ƒ
>>> 0x85 --> …
>>> 0x86 --> †
>>> 0x87 --> ‡
>>> 0x88 --> ˆ
>>> 0x89 --> ‰
>>> 0x8a --> Š
>>> 0x8b --> ‹
>>> 0x8c --> Œ
>>> 0x91 --> ‘
>>> 0x92 --> ’
>>> 0x93 --> “
>>> 0x94 --> ”
>>> 0x95 --> •
>>> 0x96 --> –
>>> 0x97 --> —
>>> 0x99 --> ™
>>> 0x9a --> š
>>> 0x9b --> ›
>>> 0x9c --> œ
>>> 0x9f --> Ÿ
>>>
>>> it doesn't seem that there is a CP1252 charset available for mysql,
>>> since the only way we can store these characters at all is to force
>>> the driver to use UTF-8 to encode the text. however, this means that
>>> other database clients must also expect UTF-8, and the database is
>>> advertising Latin1 by default.
>>>
>>> why isn't there a CP1252 character set for mysql?
>>>
>>> here's some java source code that demonstrates the problem:
>>
>>
>>
>> I think there's a Java problem here too. If I use 'Cp1252' as the
>> encoding and create a String from the character array above, the
>> chars are all kept. But if I then derive another String from that one,
>> by calling String.getBytes("Cp1252") and passing the result to new
>> String(byte[], "Cp1252"), the characters are all replaced with '?'.
>>
>> -Mark
>
>
> so, how do i solve this? i need to store these characters and read them
> out again with java (JDBC), and presumably non-java clients, without
> encoding to UTF-8. is the issue here that these characters are indeed
> being stored properly in the table, and the bug is somewhere in the
> chain of me trying to read them out again? or is the driver failing to
> store the characters in the first place (since there's no CP1252 charset
> included with mysql, or does that even matter)? in other words, is this
> a database problem or a driver problem?
>
> thanks for the prompt reply!
>
> dk
Hmm, I checked this again, and here's what I found. If you add
'&useUnicode=true&characterEncoding=Cp1252' to your JDBC URL, the driver
stores and retrieves these characters fine. They are the same on the way
in as they are on the way out.
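
For illustration, here is a minimal sketch of how those two parameters
might be attached to a connection URL. The class name, the helper
methods, and the host and database names ("localhost", "test") are all
made up for this example; only the useUnicode and characterEncoding
parameters come from the message above. It assumes the URL has no other
parameters yet, so the list starts with '?' rather than '&'.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class Cp1252JdbcUrl {
    // Builds a MySQL JDBC URL that asks the driver to encode and decode
    // text as Cp1252. host and database are placeholders (assumptions).
    static String buildUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=Cp1252";
    }

    // Opens a connection with that URL. This needs the MySQL JDBC driver
    // on the classpath and a reachable server, so main() does not call it.
    static Connection open(String host, String database,
                           String user, String password) throws SQLException {
        return DriverManager.getConnection(buildUrl(host, database), user, password);
    }

    public static void main(String[] args) {
        // Only prints the URL; no database is contacted here.
        System.out.println(buildUrl("localhost", "test"));
    }
}
```

If the URL already carries other parameters, the two settings would be
appended with '&' instead, as in the form quoted above.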
However, MySQL doesn't have a Cp1252 character set, so sorting and
searching will work oddly. I've put in an inquiry with our
character/internationalization person to see what can be done about that.
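
To make the mismatch concrete, here is a small standalone sketch (plain
Java, no database involved) of how the same byte is read under
ISO-8859-1 and under Cp1252. The class and method names are invented for
this example. It also shows one way a '?' can appear: if a String holds
the control character U+0083 instead of U+0192, Cp1252 has no byte for
it. That may or may not be what happened in the earlier test, since the
test code itself isn't shown in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1252VersusLatin1 {
    // "windows-1252" is the canonical Java name; "Cp1252" is an alias.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decodes a single byte with the given charset and returns the
    // resulting UTF-16 code unit as an int.
    static int decodeByte(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs).charAt(0);
    }

    // Encodes a single char with the given charset and returns the
    // first output byte as an unsigned value.
    static int encodeChar(char c, Charset cs) {
        return String.valueOf(c).getBytes(cs)[0] & 0xFF;
    }

    public static void main(String[] args) {
        // Same byte, two interpretations.
        System.out.printf("0x83 as ISO-8859-1 -> U+%04X%n",
                decodeByte(0x83, StandardCharsets.ISO_8859_1));
        System.out.printf("0x83 as Cp1252     -> U+%04X%n",
                decodeByte(0x83, CP1252));
        // Encoding back: U+0192 maps to 0x83, but the control
        // character U+0083 has no Cp1252 byte and is replaced.
        System.out.printf("U+0192 to Cp1252   -> 0x%02X%n",
                encodeChar((char) 0x192, CP1252));
        System.out.printf("U+0083 to Cp1252   -> 0x%02X%n",
                encodeChar((char) 0x83, CP1252));
    }
}
```

A server that treats the column as Latin1 sees 0x83 as a control
character, while a Cp1252 client sees a printable letter, which is
consistent with the odd sorting and searching mentioned above.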
-Mark
- --
MySQL 2003 Users Conference -> http://www.mysql.com/events/uc2003/
For technical support contracts, visit https://order.mysql.com/?ref=mmma
    __  ___     ___ ____  __
   /  |/  /_ __/ __/ __ \/ /  Mark Matthews <mark@stripped>
  / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Developer - JDBC/Java
 /_/  /_/\_, /___/\___\_\___/ Flossmoor (Chicago), IL USA
        <___/ www.mysql.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.1.90 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQE+Z9FytvXNTca6JD8RAgInAKCMUk8fdz5aZxE+VCTALjzVpvHvgACfRPpX
TOco78ui8rr61TQcZWpuxio=
=ruPo
-----END PGP SIGNATURE-----