List:Commits« Previous MessageNext Message »
From:Alexander Barkov Date:March 1 2011 2:13pm
Subject:Re: bzr commit into mysql-5.5 branch (alexander.barkov:3357) Bug#57341
Bug#11764503
View as plain text  
Hi Guilhem,

Guilhem Bichot wrote:
> Hello,
> 
> Alexander Barkov a écrit, Le 01.03.2011 11:24:
>> #At file:///home/bar/mysql-bzr/mysql-5.5.b57341/ based on 
>> revid:tor.didriksen@stripped
>>
>>  3357 Alexander Barkov    2011-03-01
>>       Bug#11764503 (Bug#57341) Query in EXPLAIN EXTENDED shows wrong 
>> characters
>>               @ mysql-test/r/ctype_latin1.result
>>         @ mysql-test/r/ctype_utf8.result
>>         @ mysql-test/t/ctype_latin1.test
>>         @ mysql-test/t/ctype_utf8.test
>>         Adding tests
>>               @ sql/mysqld.h
>>         Adding a new Item::print() mode - QT_EXPLAIN, which
>>         - displays introducers
>>         - prints ASCII characters as is
>>         - prints non-ASCII characters using hex-escape
>>         Note: as "EXPLAIN" output is only for human readability purposes
>>         and does not need to be a pasrable SQL, using hex-escape is Ok.
>>               @ sql/item.cc
>>         Implementing QT_EXPLAIN mode for Item_string::print().
>>         Note: ErrConvString class perfectly suites for EXPLAIN purposes.
>>               @ sql/sql_parse.cc
>>         Using new QT_EXPLAIN mode for "EXPLAIN EXTENDED SELECT".
> 
>> === modified file 'mysql-test/r/ctype_latin1.result'
>> --- a/mysql-test/r/ctype_latin1.result    2011-02-10 08:18:08 +0000
>> +++ b/mysql-test/r/ctype_latin1.result    2011-03-01 10:17:10 +0000
>> @@ -3246,5 +3246,20 @@ maketime(`a`,`a`,`a`)
>>  DROP TABLE t1;
>>  SET sql_mode=default;
>>  #
>> +# Bug#11764503 (Bug#57341) Query in EXPLAIN EXTENDED shows wrong 
>> characters
>> +#
>> +SET NAMES utf8;
>> +EXPLAIN EXTENDED SELECT 'abcdó', _latin1'abcdó',
> _utf8'abcdó';
>> +id    select_type    table    type    possible_keys    key    
>> key_len    ref    rows    filtered    Extra
>> +1    SIMPLE    NULL    NULL    NULL    NULL    NULL    NULL    
>> NULL    NULL    No tables used
>> +Warnings:
>> +Note    1003    select 'abcdó' AS `abcdó`,_latin1'abcd\xC3\xB3' AS
> 
>> `abcdó`,_utf8'abcd\xC3\xB3' AS `abcdó`
>> +SET NAMES latin1;
> 
> I had looked a bit into this bug, and my impression was that the cause 
> is rather one piece of code (the one printing literals of the SELECT 
> list, or the one printing AS clauses?), which forgets to convert to the 
> proper charset.
> In other words, if I type
> EXPLAIN EXTENDED SELECT 'abcdó', _latin1'abcdó', _utf8'abcdó';
> wouldn't the proper solution be to display
> Note    1003    select 'abcdó' AS `abcdó`, _latin1'abcdó' AS
> `abcdó`, etc?
> 
> Isn't the solution chosen in the patch hiding the real bug?

Does not really forgets/hides. I had been thinking
a few hours how to fix it properly, then intentionally made it this way.
Let me try to explain why. Sorry for a long answer. This is tricky.


1. When there are no character set introducers (e.g. _latin1 or _utf8),
everything works fine with my change:
a. the patch does not touch behaviour for ASCII characters
b. the patch fixes behaviour for non-ASCII characters
   (which was the report about), so non-ASCII characters
   are displayed well now in "EXPLAIN EXTENDED".

2. When there are introducers:
2a. the patch does not change behaviour for ASCII characters


2b. when there are some non-ASCII characters, it's a tough situation.
(it's not in the original report)

Suppose we have this query:

SET NAMES utf8;
EXPLAIN EXTENDED SELECT _latin1'abcdó';  # where 'ó' is \xC3\xB3 (utf8)

By the above query we say that:
"although the character set of the entire query is utf8,
please take the constant in quotes as latin1",
so the sequence "\xC3\xB3" is treated as:

- 0xC3 = LATIN CAPITAL LETTER A WITH TILDE
- 0xB3 = SUPERSCRIPT TREE


To make display it in "EXPLAIN EXTENDED" in the original
form _latin1'abcdó', Item_string would have to remember
two character sets:

- latin1, which is the real character set of the constant,
   e.g. for val_str() and all other usual Item functionality

- utf8, which is the "visible" character set,
   i.e. how it was seen to the user when the query was typed.
   So in Item_string::print() we could do:
   "although the string is latin1, pretend to be utf8 when displaying",
   i.e. backward substitution for what happened at parse time.


This has a disadvantage:
This would need a new member "CHARSET_INFO *visible_charset"
in Item_string, which would be used extremely rarely: *only* for
EXPLAIN EXTENDED purposes, only for the case when there
is a string with an introducer and with non-ASCII characters.
I think it would be a slow down without a good reason,
because it would need initialization of the new member in all
constructors.

Another serious disadvantage is that raw 8-bit data with
introducer totally  messes everything up after conversion
to another character set, and we're planning some work in this
direction: "WL#3529 Unicode Escape Sequences",
so we'll use escape sequences every time when an extended string
with an introducer needs to be  displayed.

With WL#3529, it will be displayed as follows:

SET NAMES utf8;
EXPLAIN EXTENDED _latin1'abcdó';
-> Note    1003    select _latin1 U&'abcd\00C3\00B3' AS `abcdó`

Notice the U& syntax, where
- 00C3 is Unicode code point for "LATIN1 CAPITAL LETTER A WITH TILDE"
- 00B3 is Unicode code point for "SUPERSCRIPT THREE"


It will be similar in other context, for example for VIEWs:

SET NAMES utf8;
CREATE VIEW v1 AS SELECT _latin1'abcdó';
SHOW CREATE VIEW v1;

will display:
   CREATE VIEW v1 AS SELECT _latin1'abcd\00C3\00B3';

which can be harmlessly converted to any character set
and then parsed back without data loss.


So I think, my decision should be fine:
- it does not slow down Item_string to "fix" very rare situation

- it is inline with the general development direction:
   a kind of small step towards "introducer-safe environment"
   which will be later fully implemented in WL#3529
   (I'm just using HEX sequences at the moment,
    rather than Unicode sequences - for simplicity, and not
    to confuse users by unknown yet syntax)

Thread
bzr commit into mysql-5.5 branch (alexander.barkov:3357) Bug#57341Bug#11764503Alexander Barkov1 Mar
  • Re: bzr commit into mysql-5.5 branch (alexander.barkov:3357) Bug#57341Bug#11764503Guilhem Bichot1 Mar
    • Re: bzr commit into mysql-5.5 branch (alexander.barkov:3357) Bug#57341Bug#11764503Alexander Barkov1 Mar
      • Re: bzr commit into mysql-5.5 branch (alexander.barkov:3357) Bug#57341Bug#11764503Guilhem Bichot2 Mar
        • Re: bzr commit into mysql-5.5 branch (alexander.barkov:3357) Bug#57341Bug#11764503Alexander Barkov3 Mar
  • Re: bzr commit into mysql-5.5 branch (alexander.barkov:3357) Bug#57341Bug#11764503Guilhem Bichot3 Mar
    • Re: bzr commit into mysql-5.5 branch (alexander.barkov:3357) Bug#57341Bug#11764503Alexander Barkov4 Mar
      • Re: bzr commit into mysql-5.5 branch (alexander.barkov:3357) Bug#57341Bug#11764503Guilhem Bichot4 Mar
        • Re: bzr commit into mysql-5.5 branch (alexander.barkov:3357) Bug#57341Bug#11764503Alexander Barkov4 Mar