List: General Discussion
From: Derek Downey  Date: September 28 2012 1:11am
Subject:Re: Need Help Converting Character Sets
To go along with what Rick is saying, this link might help you:
http://dba.stackexchange.com/questions/10467/how-to-convert-control-characters-in-mysql-from-latin1-to-utf-8

I remember having to convert a bunch of control characters found via HEX() (such as an
apostrophe copied from a Word document) before attempting the SET NAMES.
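
For anyone who wants to see what such a "Word apostrophe" looks like at the byte level, here is a rough sketch in Python rather than SQL. It assumes the stray bytes are Windows-1252 (which is what MySQL's latin1 actually is) and that each value is either entirely valid UTF-8 or entirely Windows-1252; the sample string and the helper name repair_cp1252 are made up for illustration.

```python
# Hypothetical sample: the bytes a cp1252/latin1 client might have stored.
raw = b"observatory\x92s control room"

# Rough equivalent of SELECT HEX(the_field): look at the stored bytes.
# A lone 92 is the Windows-1252 right single quote; E28099 is the same
# character correctly encoded as UTF-8.
print(raw.hex().upper())


def repair_cp1252(data: bytes) -> str:
    """Decode as UTF-8 if valid, otherwise fall back to Windows-1252.

    Assumption: a value is never a mix of both encodings. A few bytes
    (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in cp1252 and would
    still raise UnicodeDecodeError here.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("cp1252")


fixed = repair_cp1252(raw)
print(fixed)
print(fixed.encode("utf-8").hex().upper())
```

If a single field really does mix encodings, this per-value fallback is not enough and the bytes have to be inspected individually.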

Derek Downey


On Sep 24, 2012, at 7:53 PM, Rick James wrote:

> If you have a mixture of encodings, you are in deep doodoo.
> 
> This page describes some debugging techniques and some issues:
>   http://mysql.rjweb.org/doc.php/charcoll
> 
> That apostrophe might be MicroSquish's "smart quote".
> 
> Can you provide SELECT HEX(the_field) FROM... ?  We (or the above page) might be able
> to interpret the character.
> 
> To prevent future char set issues, you must know what encoding the source is.  Then,
> with SET NAMES (etc), you tell mysqld that the bytes you have in hand are encoded that
> way.  mysqld will then convert those bytes to the character set declared for the column
> they go in.  (Presumably, all the text columns will be declared utf8 or utf8mb4.)
> 
>> -----Original Message-----
>> From: Mark Phillips [mailto:mark@stripped]
>> Sent: Monday, September 24, 2012 4:28 PM
>> To: Mysql List
>> Subject: Need Help Converting Character Sets
>> 
>> I have a table, Articles, of news articles (in English) with three text
>> columns for the intro, body, and caption. The data came from a web
>> page, and the content was cut and pasted from other sources. I am
>> finding that there are some non utf-8 characters in these three text
>> columns. I would like to (1) convert these text fields to be strict
>> utf-8 and then (2) fix the input page to keep all new submissions
>> utf-8.
>> 
>> (1) For the first step, fixing the current database, I tried:
>> 
>> update Articles set body = CONVERT(body USING ASCII);
>> 
>> However, when I checked one of the articles I found an apostrophe had
>> been converted into a question mark. (FWIW, the apostrophe was one of
>> those offending non utf-8 characters):
>> 
>> Before conversion: "I stepped into the observatory's control room ..."
>> 
>> After conversion: "I stepped into the observatory?s control room..."
>> 
>> Is there a better way to accomplish my first goal, without reading each
>> article and manually making the changes?
>> 
>> (2) For the second goal, ensuring that all future articles are utf-8,
>> do I need to change the table structure or the insert query to ensure I
>> get the correct utf-8 characters into the database?
>> 
>> Thanks,
>> 
>> Mark
> 
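
On the original question of why CONVERT(body USING ASCII) turned the apostrophe into a question mark, and why declaring the source encoding (the SET NAMES advice) avoids that: the following is only a Python analogy, not what mysqld runs internally. The function name transcode and the sample text are assumptions made for illustration.

```python
# The curly apostrophe (U+2019) has no ASCII representation.
text = "I stepped into the observatory\u2019s control room"

# Analogy for CONVERT(body USING ASCII): a character that cannot be
# represented in the target character set is replaced with '?'.
ascii_lossy = text.encode("ascii", errors="replace").decode("ascii")
print(ascii_lossy)


def transcode(data: bytes, source_encoding: str,
              column_encoding: str = "utf-8") -> bytes:
    """Interpret bytes in their declared source encoding, then re-encode.

    This mirrors the idea behind SET NAMES: say what the bytes in hand
    are, and let the conversion to the column's character set happen
    from there.
    """
    return data.decode(source_encoding).encode(column_encoding)


# Bytes from a Windows-1252 source, correctly declared as such.
stored = transcode(b"observatory\x92s", "cp1252")
print(stored.hex().upper())
```

The point of the second half is that nothing is lost as long as the declared source encoding matches the bytes; the question mark only appears when the target character set cannot hold the character.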


Thread
Need Help Converting Character Sets (Mark Phillips, 24 Sep)
  • RE: Need Help Converting Character Sets (Rick James, 24 Sep)
    • Re: Need Help Converting Character Sets (Derek Downey, 28 Sep)
  • Re: Need Help Converting Character Sets (hsv, 28 Sep)
    • RE: Need Help Converting Character Sets (Rick James, 28 Sep)
      • Re: Need Help Converting Character Sets (Mark Phillips, 30 Sep)
        • RE: Need Help Converting Character Sets (Rick James, 1 Oct)
        • Re: Need Help Converting Character Sets (hsv, 2 Oct)