List:Internals« Previous MessageNext Message »
From:Michael Widenius Date:December 22 2009 12:18pm
Subject:Re: system_charset_info
View as plain text  
Hi!

>>>>> "Yan" == Yan Yu <yan2yu2@stripped> writes:

<cut>

Yan> There are still many other broken test cases that we need to
Yan> investigate.

Yan> What is the bug ?

Yan> http://bugs.mysql.com/bug.php?id=49609

Thanks. Either MySQL people or MariaDB people will certainly take a
look at this.

<cut>

Monty> Most errors in this case should mainly be in the test suite.

Monty> If you run mysql-test-run --force and examine the results, ignoring
Monty> errors in the test suite, you should be able to get some assurance
Monty> that your change works.

Yan> Could you elaborate on what do you mean by "ignoring the errors in the test
> suite"?  what kind of error I can ignore?

Yan> Currently I encounter three types of errors:
Yan> (1) most benign type of error is:
Yan> If I run mysql-test-run, some individual tests would fail; However, if I run a
> failed test by itself, using command:
Yan> mysql-test-run --record [single-test-name],
Yan> the test would succeed.  I even tried to bundle several tests proceeding the
> failed test together with the failed
Yan> I assume I can ignore this type of error.

--record overwrites the old test data, so by using this, you may hide
a error.

What I would suggest you to do:

mysql-test-run --force

and after it's done, do a diff of the .result and .reject files in the
'r' directories.

You can ignore
- Only the characterset name in the result files has changed
  (should be a majority of the cases)
- EXPLAIN statements where the key lengths are different..
- Test results where the order of lines has changed that depends on
  the different sorting of character sets.
- Difference caused because of different max string lengths
  (varchar/blob differences)

Any other test results you should try to understand what could cause
them.

Yan> (2) this type of error seems caused by MYSQL_TYPE_VAR_STRING vs. MYSQL_TYPE_BLOB.
Yan> for example, in sql/sql_select.cc, line  9456, function
> create_tmp_field_for_schema(),
Yan> based on item->max_length, it will create either a  Field_blob object, or a
> Field_varstring object.

Yan>  9459     if (item->max_length > MAX_FIELD_VARCHARLENGTH)
Yan>  9460       field= new Field_blob(item->max_length, item->maybe_null,
Yan>  9461                             item->name, item->collation.collation);
Yan>  9462     else
Yan>  9463       field= new Field_varstring(item->max_length, item->maybe_null,
Yan>  9464                                  item->name,
Yan>  9465                                  table->s,
> item->collation.collation);

Yan> if the system_charset_info is a multibyte encoding, e.g., utf8, the max_length of
> the column type would be greater than MAX_FIELD_VARCHARLENGTH, it will create an
> MYSQL_TYPE_BLOB object, otherwise, it will create an MYSQL_TYPE_VAR_STRING object if the
> system_charset_info is latin1.

Yan> The difference between MYSQL_TYPE_VAR_STRING vs. MYSQL_TYPE_BLOB manifests itself
> in at least two aspects:
Yan> (I) the "Type" of the COLUMN_TYPE returned to "describe/explain [table name]"
> command is different in these two cases.
Yan> this causes mysql_client_test fails.

This you can ignore.

Yan> (II) the wire format (i.e., packing results to the client) is different for these
> two fields: MYSQL_TYPE_VAR_STRING vs. MYSQL_TYPE_BLOB.  Field_varstring::store() and
> Field_blob::store() in field.cc.
Yan> Somehow it seems like that the client got confused after I change
> system_charset_info to latin1, (I stepped traced the server code, it seems fine)

Should not cause a problem for the client (if there isn't an unknown
bug somewhere)

Yan> the results from "describe [table name] would miss information on two columns:
> Type and Null Column.
Yan> e.g., result for "describe Columns;" returns the following:
Yan> +--------------------------+------+------+-----+---------+-------+
Yan> | Field                    | Type | Null | Key | Default | Extra |
Yan> +--------------------------+------+------+-----+---------+-------+
Yan> | TABLE_CATALOG            |      |      |     | NULL    |       |
Yan> | TABLE_SCHEMA             |      |      |     |         |       |
Yan> | TABLE_NAME               |      |      |     |         |       |
Yan> | COLUMN_NAME              |      |      |     |         |       |
Yan> | ORDINAL_POSITION         |      |      |     | 0       |       |
Yan> | COLUMN_DEFAULT           |      |      |     | NULL    |       |

Yan> I have not been able to pinpoint the root cause yet, BUT after I change
Yan> one line of ST_FIELD_INFO columns_fields_info[] from

Yan> +  {"COLUMN_TYPE", 65535, MYSQL_TYPE_STRING, 0, 0, "Type", OPEN_FRM_ONLY},
Yan> to
Yan> -  {"COLUMN_TYPE", 5000, MYSQL_TYPE_STRING, 0, 0, "Type", OPEN_FRM_ONLY},

Yan> the problem seems gone.

Yan> My Q is:  would it be safe for me to make this change?

For your purpose, the above should be safe.

However, I need to look at what could cause the above display bug.

Yan> (3) problems caused by conversion between different charsets. mb_wc() or wc_mb()
Yan> our system_charset uses latin1, and
Yan> "create.test"  includes the following SQL statement:
Yan> set names utf8;
Yan> create database
> имя_базы_в_кодировке_утф8_длиной_больше_чем_45;
Yan> use
> имя_базы_в_кодировке_утф8_длиной_больше_чем_45;
Yan> select database();

Yan>
> "имя_базы_в_кодировке_утф8_длиной_больше_чем_45" was
> converted to unicode string internally.
Yan> it seems like either the conversion is mis-handled somewhere, or the conversion is
> missing in one direction. Would this be possible?

This could be becase of two reasons

- files_charset_info needs to be utf8.
or
- files_charset_info needs to be same as system_charset_info.

Bar, can you answer this one ?

Yan> somehow, the database name is represented internally as
Yan> "???_????_?_?????????_???8_??????_??????_???_45",
Yan> and this same string  is also returned back to the client for "select database();"
> command.
Yan> I do not know much about the unicode string, so not sure about the conversion
> code.

Regards,
Monty
Thread
system_charset_infoYan Yu20 Dec
  • re: system_charset_infoMichael Widenius22 Dec
  • Re: system_charset_infoAlexander Barkov23 Dec
Re: system_charset_infoMichael Widenius22 Dec
  • RE: system_charset_infoYan Yu23 Dec
  • Re: system_charset_infoAlexander Barkov23 Dec
    • Re: system_charset_infoYan Yu24 Dec