Hi Monty, Yan,
Michael Widenius wrote:
> Hi!
>
>>>>>> "Yan" == Yan Yu <yan2yu2@stripped> writes:
>
> <cut>
>
> Yan> There are still many other broken test cases that we need to
> Yan> investigate.
>
> Yan> What is the bug ?
>
> Yan> http://bugs.mysql.com/bug.php?id=49609
>
> Thanks. Either MySQL people or MariaDB people will certainly take a
> look at this.
This bug is fixed in MySQL-6.0 sources.
I don't think it the fix will ever be back-ported to an
earlier MySQL version. So you can try to back-port
the fix from MySQL-6.0 to the desired Maria version.
>
> <cut>
>
> Monty> Most errors in this case should mainly be in the test suite.
>
> Monty> If you run mysql-test-run --force and examine the results, ignoring
> Monty> errors in the test suite, you should be able to get some assurance
> Monty> that your change works.
>
> Yan> Could you elaborate on what do you mean by "ignoring the errors in the test
> suite"? what kind of error I can ignore?
>
> Yan> Currently I encounter three types of errors:
> Yan> (1) most benign type of error is:
> Yan> If I run mysql-test-run, some individual tests would fail; However, if I run
> a failed test by itself, using command:
> Yan> mysql-test-run --record [single-test-name],
> Yan> the test would succeed. I even tried to bundle several tests proceeding the
> failed test together with the failed
> Yan> I assume I can ignore this type of error.
>
> --record overwrites the old test data, so by using this, you may hide
> a error.
>
> What I would suggest you to do:
>
> mysql-test-run --force
>
> and after it's done, do a diff of the .result and .reject files in the
> 'r' directories.
>
> You can ignore
> - Only the characterset name in the result files has changed
> (should be a majority of the cases)
> - EXPLAIN statements where the key lengths are different..
> - Test results where the order of lines has changed that depends on
> the different sorting of character sets.
> - Difference caused because of different max string lengths
> (varchar/blob differences)
>
> Any other test results you should try to understand what could cause
> them.
>
> Yan> (2) this type of error seems caused by MYSQL_TYPE_VAR_STRING vs.
> MYSQL_TYPE_BLOB.
> Yan> for example, in sql/sql_select.cc, line 9456, function
> create_tmp_field_for_schema(),
> Yan> based on item->max_length, it will create either a Field_blob object, or
> a Field_varstring object.
>
> Yan> 9459 if (item->max_length > MAX_FIELD_VARCHARLENGTH)
> Yan> 9460 field= new Field_blob(item->max_length, item->maybe_null,
> Yan> 9461 item->name,
> item->collation.collation);
> Yan> 9462 else
> Yan> 9463 field= new Field_varstring(item->max_length,
> item->maybe_null,
> Yan> 9464 item->name,
> Yan> 9465 table->s,
> item->collation.collation);
>
> Yan> if the system_charset_info is a multibyte encoding, e.g., utf8, the
> max_length of the column type would be greater than MAX_FIELD_VARCHARLENGTH, it will
> create an MYSQL_TYPE_BLOB object, otherwise, it will create an MYSQL_TYPE_VAR_STRING
> object if the system_charset_info is latin1.
>
> Yan> The difference between MYSQL_TYPE_VAR_STRING vs. MYSQL_TYPE_BLOB manifests
> itself in at least two aspects:
> Yan> (I) the "Type" of the COLUMN_TYPE returned to "describe/explain [table name]"
> command is different in these two cases.
> Yan> this causes mysql_client_test fails.
>
> This you can ignore.
>
> Yan> (II) the wire format (i.e., packing results to the client) is different for
> these two fields: MYSQL_TYPE_VAR_STRING vs. MYSQL_TYPE_BLOB. Field_varstring::store() and
> Field_blob::store() in field.cc.
> Yan> Somehow it seems like that the client got confused after I change
> system_charset_info to latin1, (I stepped traced the server code, it seems fine)
>
> Should not cause a problem for the client (if there isn't an unknown
> bug somewhere)
>
> Yan> the results from "describe [table name] would miss information on two
> columns: Type and Null Column.
> Yan> e.g., result for "describe Columns;" returns the following:
> Yan> +--------------------------+------+------+-----+---------+-------+
> Yan> | Field | Type | Null | Key | Default | Extra |
> Yan> +--------------------------+------+------+-----+---------+-------+
> Yan> | TABLE_CATALOG | | | | NULL | |
> Yan> | TABLE_SCHEMA | | | | | |
> Yan> | TABLE_NAME | | | | | |
> Yan> | COLUMN_NAME | | | | | |
> Yan> | ORDINAL_POSITION | | | | 0 | |
> Yan> | COLUMN_DEFAULT | | | | NULL | |
>
> Yan> I have not been able to pinpoint the root cause yet, BUT after I change
> Yan> one line of ST_FIELD_INFO columns_fields_info[] from
>
> Yan> + {"COLUMN_TYPE", 65535, MYSQL_TYPE_STRING, 0, 0, "Type", OPEN_FRM_ONLY},
> Yan> to
> Yan> - {"COLUMN_TYPE", 5000, MYSQL_TYPE_STRING, 0, 0, "Type", OPEN_FRM_ONLY},
>
> Yan> the problem seems gone.
>
> Yan> My Q is: would it be safe for me to make this change?
>
> For your purpose, the above should be safe.
>
> However, I need to look at what could cause the above display bug.
>
> Yan> (3) problems caused by conversion between different charsets. mb_wc() or
> wc_mb()
> Yan> our system_charset uses latin1, and
> Yan> "create.test" includes the following SQL statement:
> Yan> set names utf8;
> Yan> create database
> имя_базы_в_кодировке_утф8_длиной_больше_чем_45;
> Yan> use
> имя_базы_в_кодировке_утф8_длиной_больше_чем_45;
> Yan> select database();
>
> Yan>
> "имя_базы_в_кодировке_утф8_длиной_больше_чем_45" was
> converted to unicode string internally.
> Yan> it seems like either the conversion is mis-handled somewhere, or the
> conversion is missing in one direction. Would this be possible?
>
> This could be becase of two reasons
>
> - files_charset_info needs to be utf8.
> or
> - files_charset_info needs to be same as system_charset_info.
>
> Bar, can you answer this one ?
The original idea of introducing "files_charset_info" was the assumption
that the file system character set can be different. For example,
on Windows, it depends on OEM code page. In case of a Western machine
it would be cp850. But we never went this way. Instead we
we introduced table-name-to-file-name encoding, so files_charset_info is
just an alias for system_charset_info now.
>
> Yan> somehow, the database name is represented internally as
> Yan> "???_????_?_?????????_???8_??????_??????_???_45",
> Yan> and this same string is also returned back to the client for "select
> database();" command.
> Yan> I do not know much about the unicode string, so not sure about the conversion
> code.
If you changed system_character_set to latin1, then
non-Latin1 letters (e.g. Cyrillic in the above example)
won't work well in identifiers. So I suggest to fix the
code to disallow using of non-latin1 characters in identifiers.
I think the fix should be done on parser level, in sql_yacc.yy.
See the code corresponding to the "IDENT_sys" and maybe
also "TEST_STRING_sys" rules. It uses thd->convert_string()
implemented in sql_class.cc.
You can extend this method by adding a new parameter, a pointer
to "uint", to return conversion error count to the upper level.
Conversion error count is returned in the "dummy_errors" variable,
from the copy_and_convert() call.
The new method version can do something like this:
bool THD::convert_string(LEX_STRING *to, CHARSET_INFO *to_cs,
const char *from, uint from_length,
CHARSET_INFO *from_cs,
uint *conversion_error_count)
{
DBUG_ENTER("convert_string");
size_t new_length= to_cs->mbmaxlen * from_length;
if (!(to->str= (char*) alloc(new_length+1)))
{
to->length= 0; // Safety fix
DBUG_RETURN(1); // EOM
}
to->length= copy_and_convert((char*) to->str, new_length, to_cs,
from, from_length, from_cs,
conversion_error_count);
to->str[to->length]=0; // Safety
DBUG_RETURN(0);
}
Now in sql_yacc.yy you can check conversion_error_count
returned from thd->convert_string().
If conversion_error_count is 0, then everything is OK.
If conversion_error_count > 0, then the identifier
contains some non-Latin1 character and you just send
a "Bad identifier" error to the client and terminate
query execution.
>
> Regards,
> Monty