From: Rick James
Date: September 24 2012 11:53pm
Subject: RE: Need Help Converting Character Sets
List-Archive: http://lists.mysql.com/mysql/228266
Message-Id: <2E7DD7ADE53B044C8C8BCD9C5829E1EB148CF91AB9@SP2-EX07VS01.ds.corp.yahoo.com>

If you have a mixture of encodings, you are in deep doodoo.

This page describes some debugging techniques and some issues:
http://mysql.rjweb.org/doc.php/charcoll

That apostrophe might be MicroSquish's "smart quote". Can you provide
SELECT HEX(the_field) FROM... ?
We (or the above page) might be able to interpret the character.

To prevent future char set issues, you must know what encoding the source is. Then, with SET NAMES (etc), you tell mysqld that the bytes you have in hand are encoded that way. mysqld will then convert those bytes to the character set declared for the column they go in. (Presumably, all the text columns will be declared utf8 or utf8mb4.)

> -----Original Message-----
> From: Mark Phillips [mailto:mark@stripped]
> Sent: Monday, September 24, 2012 4:28 PM
> To: Mysql List
> Subject: Need Help Converting Character Sets
>
> I have a table, Articles, of news articles (in English) with three text
> columns for the intro, body, and caption. The data came from a web
> page, and the content was cut and pasted from other sources. I am
> finding that there are some non utf-8 characters in these three text
> columns. I would like to (1) convert these text fields to be strict
> utf-8 and then (2) fix the input page to keep all new submissions utf-8.
>
> (1) For the first step, fixing the current database, I tried:
>
> update Articles set body = CONVERT(body USING ASCII);
>
> However, when I checked one of the articles I found an apostrophe had
> been converted into a question mark.
> (FWIW, the apostrophe was one of those offending non utf-8 characters):
>
> Before conversion: "I stepped into the observatory's control room ..."
>
> After conversion: "I stepped into the observatory?s control room..."
>
> Is there a better way to accomplish my first goal, without reading each
> article and manually making the changes?
>
> (2) For the second goal, ensuring that all future articles are utf-8,
> do I need to change the table structure or the insert query to ensure I
> get the correct utf-8 characters into the database?
>
> Thanks,
>
> Mark
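
As a rough illustration of what interpreting SELECT HEX(the_field) output can look like, here is a minimal Python sketch. The function name classify_hex is hypothetical, and the choice of cp1252 as the fallback is an assumption based on the "smart quote" guess in the reply: the Windows-1252 right single quote is the single byte 92, while the same character in utf-8 is the three bytes E2 80 99. It is only a heuristic, since bytes that happen to be valid utf-8 could in principle have come from another encoding.

```python
def classify_hex(hex_string):
    """Guess the encoding of the bytes behind a MySQL HEX() result.

    Returns a (label, decoded_text) tuple. This is a heuristic, not proof:
    it tries ASCII, then utf-8, then cp1252 (assumed here to be the likely
    source of "smart quotes"), in that order.
    """
    raw = bytes.fromhex(hex_string)

    # Pure 7-bit data is the same in ASCII, utf-8 and cp1252.
    if all(b < 0x80 for b in raw):
        return "ascii", raw.decode("ascii")

    # Strict utf-8 decoding fails on stray cp1252 bytes such as 0x92.
    try:
        return "utf-8", raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # cp1252 leaves a few byte values undefined, so this can fail too.
    try:
        return "cp1252", raw.decode("cp1252")
    except UnicodeDecodeError:
        # latin-1 maps every byte, so it at least shows something.
        return "unknown", raw.decode("latin-1")
```

For the "y's" in "observatory's", a HEX() value of 799273 would come back labelled cp1252, while 79E2809973 would come back labelled utf-8; both decode to the same curly apostrophe (U+2019). A plain 792773 is ordinary ASCII.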