From: Derek Downey
Date: September 28 2012 1:11am
Subject: Re: Need Help Converting Character Sets
List-Archive: http://lists.mysql.com/mysql/228284

To go along with what Rick is saying, this link might help you:
http://dba.stackexchange.com/questions/10467/how-to-convert-control-characters-in-mysql-from-latin1-to-utf-8

I remember converting a bunch of HEX() control characters (such as an apostrophe copied from a Word document) before attempting the SET NAMES.

Derek Downey

On Sep 24, 2012, at 7:53 PM, Rick James wrote:

> If you have a mixture of encodings, you are in deep doodoo.
>
> This page describes some debugging techniques and some issues:
> http://mysql.rjweb.org/doc.php/charcoll
>
> That apostrophe might be MicroSquish's "smart quote".
>
> Can you provide SELECT HEX(the_field) FROM... ? We (or the above page) might be able to interpret the character.
>
> To prevent future char set issues, you must know what encoding the source uses. Then, with SET NAMES (etc), you tell mysqld that the bytes you have in hand are encoded that way. mysqld will then convert those bytes to the character set declared for the column they go in. (Presumably, all the text columns will be declared utf8 or utf8mb4.)
>
>> -----Original Message-----
>> From: Mark Phillips [mailto:mark@stripped]
>> Sent: Monday, September 24, 2012 4:28 PM
>> To: Mysql List
>> Subject: Need Help Converting Character Sets
>>
>> I have a table, Articles, of news articles (in English) with three text
>> columns for the intro, body, and caption. The data came from a web
>> page, and the content was cut and pasted from other sources.
>> I am finding that there are some non-utf-8 characters in these three text
>> columns. I would like to (1) convert these text fields to be strict
>> utf-8 and then (2) fix the input page to keep all new submissions utf-8.
>>
>> (1) For the first step, fixing the current database, I tried:
>>
>> update Articles set body = CONVERT(body USING ASCII);
>>
>> However, when I checked one of the articles I found an apostrophe had
>> been converted into a question mark. (FWIW, the apostrophe was one of
>> those offending non-utf-8 characters):
>>
>> Before conversion: "I stepped into the observatory's control room ..."
>>
>> After conversion: "I stepped into the observatory?s control room..."
>>
>> Is there a better way to accomplish my first goal, without reading each
>> article and manually making the changes?
>>
>> (2) For the second goal, ensuring that all future articles are utf-8,
>> do I need to change the table structure or the insert query to ensure I
>> get the correct utf-8 characters into the database?
>>
>> Thanks,
>>
>> Mark
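
The byte-level picture behind the advice above can be sketched in a few lines of Python. This is only an illustration under an assumption: that the stray apostrophe is the Windows-1252 "smart quote" byte 0x92, as Rick guesses. The actual bytes in the Articles table are unknown, and the sample bytes and the helper name to_utf8 are made up for the example.

```python
# Assumption: the pasted text arrived as Windows-1252 (cp1252) bytes, where
# 0x92 is the right single "smart quote". The real table contents are unknown.
raw = b"the observatory\x92s control room"


def to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes using the known source encoding, then re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Roughly the effect seen with CONVERT(body USING ASCII): a character with
# no ASCII equivalent is replaced by '?'.
lossy = raw.decode("cp1252").encode("ascii", errors="replace")

# Declaring the correct source encoding first keeps the character intact.
fixed = to_utf8(raw)

print(lossy)        # the smart quote has become '?'
print(fixed.hex())  # the smart quote shows up as the byte sequence e28099
```

If the assumption holds, SELECT HEX(the_field) on an affected row should show 92 where the apostrophe sits; if it shows E28099 instead, that character is already stored as UTF-8 and the problem lies elsewhere (for example in the connection character set).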