List: General Discussion
From: Rick James
Date: September 24 2012 11:53pm
Subject: RE: Need Help Converting Character Sets

If you have a mixture of encodings, you are in deep doodoo.

This page describes some debugging techniques and some issues:
   http://mysql.rjweb.org/doc.php/charcoll

That apostrophe might be MicroSquish's "smart quote".

Can you provide SELECT HEX(the_field) FROM... ?  We (or the above page) might be able to
interpret the character.
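
For what it's worth, here is a rough sketch of what such a query and its interpretation could look like. Articles and body come from the original post; the id column and the value 123 are made up for illustration, and utf8mb4 assumes MySQL 5.5.3 or later.

```sql
-- Table and column names (Articles, body) come from the original post;
-- the id column and the value 123 are hypothetical.
SELECT id, HEX(body) FROM Articles WHERE id = 123;

-- Rows containing at least one byte >= 0x80, i.e. anything beyond plain ASCII.
-- HEX() yields two uppercase hex digits per byte, so the pattern skips whole
-- bytes and then looks for a high nibble of 8..F.
SELECT id FROM Articles WHERE HEX(body) REGEXP '^(..)*[89A-F]';

-- What a curly apostrophe (U+2019) typically looks like in the hex:
--   92                 cp1252 byte (MySQL's latin1 includes it)
--   E28099             correct UTF-8
--   C3A2E282ACE284A2   UTF-8 that was misread as latin1 and converted again

-- Sanity check of the first two lines: label byte 92 as latin1 and convert.
SELECT HEX(CONVERT(_latin1 X'92' USING utf8mb4));  -- E28099
```

If the hex shows a lone 92 in one row and E28099 in another, the column holds a mixture of encodings, and no single conversion will fix every row.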

To prevent future character set issues, you must know what encoding the source uses.  Then,
with SET NAMES (etc), you tell mysqld that the bytes you have in hand are encoded that way.
mysqld will then convert those bytes to the character set declared for the column they
go in.  (Presumably, all the text columns will be declared utf8 or utf8mb4.)
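
A minimal sketch of that flow, under the assumption that the client sends cp1252 bytes (latin1 in MySQL's naming). The table charset_demo is hypothetical, and utf8mb4 needs MySQL 5.5.3 or later.

```sql
-- Hypothetical demo table; the column is declared utf8mb4 as suggested above.
CREATE TABLE charset_demo (
    txt VARCHAR(100) CHARACTER SET utf8mb4
);

-- Assumption: the client (web form, script) sends cp1252/latin1 bytes.
-- This tells mysqld how to interpret the bytes arriving on this connection.
SET NAMES latin1;

-- The _latin1 introducer labels this literal explicitly, so the example does
-- not depend on the terminal's encoding. X'92' is the cp1252 curly apostrophe.
INSERT INTO charset_demo (txt) VALUES (_latin1 X'92');

-- The stored bytes should be the UTF-8 form of that character: E28099.
SELECT HEX(txt) FROM charset_demo;
```

If the source is already UTF-8, the matching statement would be SET NAMES utf8mb4 instead. What matters is that the name given to SET NAMES describes the bytes the client really sends, not the encoding you want to end up with.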

> -----Original Message-----
> From: Mark Phillips [mailto:mark@stripped]
> Sent: Monday, September 24, 2012 4:28 PM
> To: Mysql List
> Subject: Need Help Converting Character Sets
> 
> I have a table, Articles, of news articles (in English) with three text
> columns for the intro, body, and caption. The data came from a web
> page, and the content was cut and pasted from other sources. I am
> finding that there are some non utf-8 characters in these three text
> columns. I would like to (1) convert these text fields to be strict
> utf-8 and then (2) fix the input page to keep all new submissions
> utf-8.
> 
> (1) For the first step, fixing the current database, I tried:
> 
> update Articles set body = CONVERT(body USING ASCII);
> 
> However, when I checked one of the articles I found an apostrophe had
> been converted into a question mark. (FWIW, the apostrophe was one of
> those offending non utf-8 characters):
> 
> Before conversion: "I stepped into the observatory's control room ..."
> 
> After conversion: "I stepped into the observatory?s control room..."
> 
> Is there a better way to accomplish my first goal, without reading each
> article and manually making the changes?
> 
> (2) For the second goal, ensuring that all future articles are utf-8,
> do I need to change the table structure or the insert query to ensure I
> get the correct utf-8 characters into the database?
> 
> Thanks,
> 
> Mark