List:General Discussion« Previous MessageNext Message »
From:Jerry Schwartz Date:January 21 2008 4:30pm
Subject:RE: Handling Special Characters
View as plain text  
> From: baron.schwartz@stripped [mailto:baron.schwartz@stripped] On
> Behalf Of Baron Schwartz
> Sent: Friday, January 18, 2008 2:33 PM
> To: Jerry Schwartz
> Cc: mysql@stripped
> Subject: Re: Handling Special Characters
>
> Jerry,
>
> On Jan 18, 2008 2:27 PM, Jerry Schwartz <jschwartz@stripped>
> wrote:
> > I am having trouble inserting special characters into a table. I am
> using
> > the MySQL client. I put the following commands into a text file (I'm
> on
> > WinXP, using Notepad), copy them, and paste them into the MySQL
> command line
> > client.
> >
> > SET NAMES utf8;
> >
> > CREATE TEMPORARY TABLE `giiexpr_db`.`eo_name_table` (
> > `eo_name` VARCHAR( 255 ) NOT NULL
> > ) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_general_ci;
> >
> > INSERT INTO eo_name_table
> >     (eo_name)
> > VALUES
> > ("Associated British Foods Plc (Abf) - Hot Drinks - World");
> >
> > SELECT * FROM eo_name_table;
> > +---------------------------------------------------------+
> > | eo_name                                                 |
> > +---------------------------------------------------------+
> > | Associated British Foods Plc (Abf) - Hot Drinks - World |
> > +---------------------------------------------------------+
> > 1 row in set (0.04 sec)
> >
> > That symbol before "World" is an N-dash, 0x96. This works perfectly.
> >
> > Here's where things go wrong. If instead of pasting these commands
> into the
> > client, I source the exact same file, I get this:
> >
> > +-------------------------------------------------+
> > | eo_name                                         |
> > +-------------------------------------------------+
> > | Associated British Foods Plc (Abf) - Hot Drinks |
> > +-------------------------------------------------+
> > 1 row in set (0.04 sec)
> >
> > It seems that special characters are treated differently when they
> are in a
> > sourced file. It isn't just the N-dash, French accented characters
> cause the
> > same problem. These are all single-byte characters.
> >
> > Just to be sure, I checked the length of the string in the field, and
> it
> > accurately reflected the difference. That proves, to my satisfaction,
> that
> > the problem is on the input side.
> >
> > I'm using version 4.1.22 community server on a Linux platform. I'm
> using
> > version 5.0.45 of the CLI client.
> >
> > Anyone have any ideas? I was trying to avoid having to write a
> program to do
> > this.
>
> After you get the data you want into the table via copy/paste, can you
> dump and reload it correctly with mysqldump > dump.sql and mysql <
> dump.sql?  If so, try peeking into the dump file and see how it's
> written there.
[JS] It turns out that the actual copy/paste on the Windows side wasn't
giving me the proper results either. It was not obvious with a visual
inspection, but the paste into a command window was apparently converting
the 0x96 (an N-dash) to 0x20 (a regular hyphen). That strikes me as a little
weird.

If I work on the Linux side, I get better results. I brought my input file
over to Linux, then sourced it into MySQL CLI there. The correct data went
into the data base. I dumped it out on the Linux side and brought the file
back to my PC. Looking at the dumped file with an editor that displays UTF-8
shows that the output of mysqldump was correct.

That gave me a clue. It appears that all of my problems are on the Windows
side. When I forced my input file to be saved using UTF-8 encoding, I got
the right results.

This explains why I've been having trouble entering Chinese characters into
my database. (This N-quad business was just the tip of a very large
iceberg.) I had discovered that copying from a UTF-8 encoded web page
directly into Excel, and then going through some gyrations on the PC end,
worked; but pasting it directly into an editor did not. I'll bet if I force
UTF-8 encoding when I save the file, I'll have better luck.

Thanks for the hint.



Thread
Handling Special CharactersJerry Schwartz18 Jan
  • Re: Handling Special CharactersBaron Schwartz18 Jan
    • RE: Handling Special CharactersJerry Schwartz19 Jan
    • RE: Handling Special CharactersJerry Schwartz21 Jan
  • Re: Handling Special CharactersSebastian Mendel21 Jan
    • RE: Handling Special CharactersJerry Schwartz21 Jan