List: General Discussion
From: Rick James
Date: September 28 2012 2:59pm
Subject:RE: Need Help Converting Character Sets
Thanks for that link!  That's another subtle issue I had not noted.

There are so many combinations that it is hard to say "do this":
* Incoming bytes are latin1 / utf8 / Microsquish control characters.
* You do/don't have SET NAMES (or equivalent).
* The database/table/column is declared latin1/utf8/other.
* The problem is on ingestion / on retrieval.
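
One way to narrow down which combination applies is to look at what the connection, the table, and the stored bytes each say. A rough sketch, borrowing the table and column names (Articles, body) from Mark's message; the LIMIT and the 40-character prefix are arbitrary choices for illustration:

```sql
-- Character sets the client connection is using (what SET NAMES changes).
SHOW VARIABLES LIKE 'character_set%';

-- Character set the table and each column are declared with.
SHOW CREATE TABLE Articles;

-- The bytes actually stored. For a curly apostrophe, E28099 is the UTF-8
-- encoding, while a lone 92 is the Windows-1252 byte.
SELECT LEFT(body, 40) AS sample, HEX(LEFT(body, 40)) AS sample_hex
FROM Articles
LIMIT 5;
```

If the hex shows bytes that do not match the column's declared character set, the damage happened on ingestion; if the bytes look right but display wrong, the connection settings on retrieval are the more likely suspect.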

The technique mentioned involves two steps:
ALTER TABLE ... MODIFY COLUMN  BINARY (or BLOB);  -- to forget any charset knowledge
ALTER TABLE ... MODIFY COLUMN  CHARACTER SET ...;  -- coming from BINARY, this does not check the encoding.
(sorry, don't have the link handy)
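
Spelled out for one column, the two steps might look like the sketch below. The table and column names come from Mark's message; the TEXT type, the choice of latin1 as the "real" encoding of the stored bytes, and the Articles_copy name are all assumptions for illustration, so check the actual bytes first and try this on a copy.

```sql
-- Work on a copy so the original rows stay untouched (hypothetical name).
CREATE TABLE Articles_copy LIKE Articles;
INSERT INTO Articles_copy SELECT * FROM Articles;

-- Step 1: drop the charset label. The stored bytes are left as they are.
ALTER TABLE Articles_copy MODIFY COLUMN body BLOB;

-- Step 2: re-label the same bytes with the charset they are really in.
-- Coming from BLOB, no transcoding and no validity check is applied.
-- latin1 here is an assumption, not something stated in the thread.
ALTER TABLE Articles_copy MODIFY COLUMN body TEXT CHARACTER SET latin1;
```

Once the label matches the bytes, an ordinary MODIFY COLUMN ... CHARACTER SET utf8 from the text type does transcode, which is the opposite behavior from step 2, so the order of operations matters.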

> -----Original Message-----
> From: hsv@stripped [mailto:hsv@stripped]
> Sent: Thursday, September 27, 2012 2:24 PM
> To: Mark Phillips
> Cc: Mysql List
> Subject: Re: Need Help Converting Character Sets
> 
> >>>> 2012/09/24 16:28 -0700, Mark Phillips >>>>
> I have a table, Articles, of news articles (in English) with three text
> columns for the intro, body, and caption. The data came from a web
> page, and the content was cut and pasted from other sources. I am
> finding that there are some non utf-8 characters in these three text
> columns. I would like to (1) convert these text fields to be strict
> utf-8 and then (2) fix the input page to keep all new submissions
> utf-8.
> 
> (1) For the first step, fixing the current database, I tried:
> 
> update Articles set body = CONVERT(body USING ASCII);
> 
> However, when I checked one of the articles I found an apostrophe had
> been converted into a question mark. (FWIW, the apostrophe was one of
> those offending non utf-8 characters):
> 
> Before conversion: "I stepped into the observatory’s control room ..."
> 
> After conversion: "I stepped into the observatory?s control room..."
> 
> Is there a better way to accomplish my first goal, without reading each
> article and manually making the changes?
> <<<<<<<<
> I do not remember where on the MySQL website this is, but there was an
> article about converting from character sets in version 4 to those in
> version 5, when UTF-8 first was supported. It sounds to me that maybe
> the tricks shown there would be useful to you, since, in effect,
> MySQL was fooled into accepting as UTF-8 that which was not.
> Conversion to a binary string was mentioned.
