List: MySQL++
From: Charles J. Daniels
Date: May 27 2013 10:52am
Subject:Not getting utf8 result, continued
Hello,

So I wrote recently about a result that's not showing up for me as utf8.
The specific value I'm trying to receive is a-zA-ZÀ-ÖØ-öø-ȳ and I can get
all of it except for that last character; the ȳ shows up as a ?. The thing
is, I'm not getting the result in utf8. If I pass the result through
ToUCS2, I get an error due to invalid characters and end up with a lot of
lost data and yield-sign-shaped question marks. I know this value is stored
and retrievable as intended. I indexed a StoreQueryResult of the value into
a mysqlpp::String and accessed it through data() to make sure I wasn't
getting an unintentional conversion, and the raw data showed non-utf8
behavior. For instance, À is stored in a single char, as 0xffffffc0,
instead of spread over two chars as a multi-byte sequence. It looks rather
like latin-1 encoding, which would be 0xc0, but I don't understand why all
the f's (I guess a char to dword conversion from sprintf with "0x%x"?). But
that table has a create statement that puts utf8 as default, so it
shouldn't be in latin-1 at all.
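
To check my guess about the f's, here is a minimal sketch of what I think
happens when a byte held in a signed char goes through "0x%x". The function
names are made up for illustration, and it assumes a 32-bit int:

```cpp
#include <cstdio>
#include <string>

// Mimics my debug output: the byte sits in a signed char, gets promoted to
// int (which sign-extends it), and only then reaches "%x".
std::string hex_sign_extended(unsigned char raw) {
    signed char c = static_cast<signed char>(raw);  // 0xC0 becomes -64
    int promoted = c;                               // sign-extended to a full int
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%x", static_cast<unsigned int>(promoted));
    return buf;
}

// Same byte, but kept unsigned, so no sign extension takes place.
std::string hex_byte(unsigned char raw) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%x", static_cast<unsigned int>(raw));
    return buf;
}
```

If that is right, the f's are only a printing artifact and the byte really
is 0xc0, which is the latin-1 code for À. In utf8 I would expect the two
bytes 0xc3 0x80 instead.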

Am I missing something? It's very fishy to me that the result comes back
with the final character specifically given as '?'. That shows me loss of
data, and supports the idea that the value is not truly stored in latin-1
unbeknownst to me, for then it wouldn't be retrievable through Workbench
and yet pass through mysql++ as a question mark.
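
One thing that would fit the pattern: ȳ is U+0233, while every other
character in the value is at or below U+00FF, so it is the only one that
cannot be represented in latin-1. A rough sketch of a lossy conversion that
would produce exactly what I see (the helper is hypothetical, and I am only
assuming that something between the server and my code behaves like this):

```cpp
#include <string>

// Hypothetical helper: squeeze Unicode code points into latin-1 bytes,
// substituting '?' for anything above U+00FF.
std::string to_latin1_lossy(const std::u32string& code_points) {
    std::string out;
    for (char32_t cp : code_points) {
        if (cp <= 0xFF) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            out.push_back('?');  // not representable in latin-1
        }
    }
    return out;
}
```

With this, À comes out as the single byte 0xc0 and ȳ comes out as '?',
which matches the raw data. If that is what is going on, the connection
character set (SHOW VARIABLES LIKE 'character_set%') is probably the first
thing to look at.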

I guess I'm gonna have to step through the mysql++ code and get more
familiar, see if I can pick up the point at which it first accesses a
result from mysql and see if it ever gets something utf8 that it
subsequently converts away, or maybe it comes in with loss of data from
point one.

Here's a code snippet I run:

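// LgID is a string holding the language's ID
string sqlString = "select LgRegexpWordCharacters from languages where LgID = ";
sqlString.append(LgID);
Query q = conn.query(sqlString);
StoreQueryResult sqr = q.store();
// Convert the first field of the first row, expected to be utf8
wchar_t wc[200];
ToUCS2(wc, 200, sqr[0][0]);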

That string's pretty garbled. Through a different conversion (not ToUCS2) I
can get all but the last char.

--charlie
