List: MySQL and Java
From: Mark Matthews
Date: March 6 2003 10:53pm
Subject: Re: problem with CP1252 (Windows Latin1 chars)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Karlton wrote:
> 
> 
> Mark Matthews wrote:
> 
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> David Karlton wrote:
>>
>>> hi there,
>>>
>>> we have a 3.23.47 MySQL server running on linux, and are using the 
>>> 3.0.6 JDBC driver.  the database is set up with the default (Latin1) 
>>> character set.
>>>
>>> if we try to store a String that contains any characters in the range 
>>> 0x80 to 0x9f, the data does not seem to be saved correctly.  in the 
>>> standard ISO-8859-1 (Latin1) charset, these characters are control 
>>> characters, while in CP1252 (Windows Latin1), they are printable 
>>> characters.  for example, in CP1252:
>>>
>>> 0x83 --> ƒ
>>> 0x85 --> …
>>> 0x86 --> †
>>> 0x87 --> ‡
>>> 0x88 --> ˆ
>>> 0x89 --> ‰
>>> 0x8a --> Š
>>> 0x8b --> ‹
>>> 0x8c --> Œ
>>> 0x91 --> ‘
>>> 0x92 --> ’
>>> 0x93 --> “
>>> 0x94 --> ”
>>> 0x95 --> •
>>> 0x96 --> –
>>> 0x97 --> —
>>> 0x99 --> ™
>>> 0x9a --> š
>>> 0x9b --> ›
>>> 0x9c --> œ
>>> 0x9f --> Ÿ
>>>
>>> it doesn't seem that there is a CP1252 charset available for mysql, 
>>> since the only way we can store these characters at all is to force 
>>> the driver to use UTF-8 to encode the text.  however, this means that 
>>> other database clients must also expect UTF-8, and the database is 
>>> advertising Latin1 by default.
>>>
>>> why isn't there a CP1252 character set for mysql?
>>>
>>> here's some java source code that demonstrates the problem:
>>
>>
>>
>> I think there's a Java problem here too. If I use 'Cp1252' as an 
>> encoding and create a String from the character array above, the 
>> chars are all kept. If I then call String.getBytes("Cp1252") on that 
>> String and build a new String with new String(byte[], "Cp1252"), the 
>> characters are all replaced with '?'.
>>
>>     -Mark
> 
> 
> so, how do i solve this?  i need to store these characters and read them 
> out again with java (JDBC), and presumably non-java clients, without 
> encoding to UTF-8.  is the issue here that these characters are indeed 
> being stored properly in the table, and the bug is somewhere in the 
> chain of me trying to read them out again?  or is the driver failing to 
> store the characters in the first place (since there's no CP1252 charset 
> included with mysql, or does that even matter)?  in other words, is this 
> a database problem or a driver problem?
> 
> thanks for the prompt reply!
> 
> dk
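The byte-versus-character question above can be checked in plain Java with no database involved. The following is a minimal sketch, not code from this thread: the class `Cp1252Demo` is hypothetical, and the last case assumes the original test built its strings from raw values such as `(char) 0x85`. In Java that value is the C1 control U+0085, not the ellipsis, and Cp1252 has no byte for it, which would be consistent with the '?' substitution Mark describes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original thread.
class Cp1252Demo {
    // "windows-1252" is the canonical Java name; "Cp1252" is an alias for it.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode a single byte with the given charset and return the resulting char.
    static char decode(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs).charAt(0);
    }

    // Encode a string with the given charset, then decode the bytes back.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // The same byte 0x85 means a different character in each charset.
        System.out.printf("ISO-8859-1: U+%04X%n",
                (int) decode(0x85, StandardCharsets.ISO_8859_1)); // U+0085, a control char
        System.out.printf("Cp1252:     U+%04X%n",
                (int) decode(0x85, CP1252)); // U+2026, horizontal ellipsis

        // Real Unicode characters (ellipsis, dagger, trade mark) survive the round trip.
        String proper = "\u2026\u2020\u2122";
        System.out.println(roundTrip(proper, CP1252).equals(proper)); // true

        // A raw (char) 0x85 is U+0085, which Cp1252 cannot encode,
        // so getBytes() substitutes the replacement byte '?'.
        String raw = String.valueOf((char) 0x85);
        System.out.println(roundTrip(raw, CP1252)); // ?
    }
}
```

Built from the real Unicode characters (U+2026, U+2020, U+2122, and so on), the Cp1252 round trip is lossless; the bytes 0x80 to 0x9F only stand for the printable characters listed above when they are decoded as Cp1252.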

Hrrmm... I checked this again, and here's what I found. If you add 
'&useUnicode=true&characterEncoding=Cp1252' to your JDBC URL, the driver 
stores and retrieves these characters correctly: they are the same on the 
way in as they are on the way out.

However, MySQL doesn't have a Cp1252 code page, so sorting and searching 
on these characters won't behave as expected. I've sent an inquiry to our 
character set/internationalization person to see what can be done about that.

	-Mark
- -- 
MySQL 2003 Users Conference -> http://www.mysql.com/events/uc2003/

For technical support contracts, visit https://order.mysql.com/?ref=mmma

     __  ___     ___ ____  __
    /  |/  /_ __/ __/ __ \/ /  Mark Matthews <mark@stripped>
   / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Developer - JDBC/Java
  /_/  /_/\_, /___/\___\_\___/ Flossmoor (Chicago), IL USA
         <___/ www.mysql.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.1.90 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+Z9FytvXNTca6JD8RAgInAKCMUk8fdz5aZxE+VCTALjzVpvHvgACfRPpX
TOco78ui8rr61TQcZWpuxio=
=ruPo
-----END PGP SIGNATURE-----

Thread
problem with CP1252 (Windows Latin1 chars), David Karlton, 6 Mar
  • Re: problem with CP1252 (Windows Latin1 chars), Mark Matthews, 6 Mar
    • Re: problem with CP1252 (Windows Latin1 chars), David Karlton, 6 Mar
      • Re: problem with CP1252 (Windows Latin1 chars), Mark Matthews, 6 Mar