Bar,
Here are some more detailed comments. See in-line. By the way, I
there will be file specific comments in the final commit.
Alexander Barkov wrote:
> #At file:///home/bar/mysql-bzr/mysql-6.0.w2649/ based on
revid:alik@stripped
>
> 2701 Alexander Barkov 2009-06-19
> WL#2649 Number-to-string conversions
...
>
> === added file 'mysql-test/r/ctype_binary.result'
> --- a/mysql-test/r/ctype_binary.result 1970-01-01 00:00:00 +0000
> +++ b/mysql-test/r/ctype_binary.result 2009-06-19 14:04:18 +0000
> @@ -0,0 +1,1537 @@
> +set names binary;
> +Start of 6.1 tests
I guess "6.1" is not the most appropriate term anymore.
...
> @@ -1917,25 +1917,25 @@ from t9 where c1= 1 ;
> 1 1 1 1 1 1 1 1 1 1 1.0000 1.0000 2004-02-29 2004-02-29 11:11:11
2004-02-29 11:11:11 11:11:11 2004 1 1 a 123456789a
123456789a123456789b123456789c tinyblob tinytext blob text mediumblob
mediumtext longblob longtext one monday
> execute full_info ;
> Catalog Database Table Table_alias Column Column_alias Type
Length Max length Is_null Flags Decimals Charsetnr
> -def @arg01 8 20 1 Y 32896 0 63
> -def @arg02 8 20 1 Y 32896 0 63
> -def @arg03 8 20 1 Y 32896 0 63
> -def @arg04 8 20 1 Y 32896 0 63
> -def @arg05 8 20 1 Y 32896 0 63
> -def @arg06 8 20 1 Y 32896 0 63
> -def @arg07 5 23 1 Y 32896 31 63
> -def @arg08 5 23 1 Y 32896 31 63
> -def @arg09 5 23 1 Y 32896 31 63
> -def @arg10 5 23 1 Y 32896 31 63
> -def @arg11 246 83 6 Y 128 30 63
> -def @arg12 246 83 6 Y 128 30 63
> +def @arg01 8 20 1 Y 32768 0 8
> +def @arg02 8 20 1 Y 32768 0 8
> +def @arg03 8 20 1 Y 32768 0 8
> +def @arg04 8 20 1 Y 32768 0 8
> +def @arg05 8 20 1 Y 32768 0 8
> +def @arg06 8 20 1 Y 32768 0 8
> +def @arg07 5 23 1 Y 32768 31 8
> +def @arg08 5 23 1 Y 32768 31 8
> +def @arg09 5 23 1 Y 32768 31 8
> +def @arg10 5 23 1 Y 32768 31 8
> +def @arg11 246 83 6 Y 0 30 8
> +def @arg12 246 83 6 Y 0 30 8
> def @arg13 251 16777216 10 Y 128 31 63
> def @arg14 251 16777216 19 Y 128 31 63
> def @arg15 251 16777216 19 Y 128 31 63
> -def @arg16 251 16777216 8 Y 128 31 63
> -def @arg17 8 20 4 Y 32928 0 63
> -def @arg18 8 20 1 Y 32896 0 63
> -def @arg19 8 20 1 Y 32896 0 63
> +def @arg16 251 16777216 8 Y 0 31 8
If I understand this correctly, DATETIME will still convert to a
string with a binary charset while TIME now will give you latin1. I
do not understand the logic behind the difference.
...
> === modified file 'sql/item.h'
> --- a/sql/item.h 2009-02-02 15:58:48 +0000
> +++ b/sql/item.h 2009-06-19 14:04:18 +0000
> @@ -687,6 +687,77 @@ public:
> If value is not null null_value flag will be reset to FALSE.
> */
> virtual String *val_str(String *str)=0;
> +
> + /*
> + Returns string representation of this item in ASCII format.
> +
> + SYNOPSIS
> + val_str_ascii()
> + str - similar to val_str();
> +
> + NOTE
> + This method is introduced for performance optimization purposes.
> +
> + 1. val_str() result of some Items in string context
> + depends on @@character_set_results.
> + @@character_set_results can be set to a "real multibyte" character
> + set like UCS2, UTF16, UTF32. (We'll use only UTF32 in the examples
> + below for convenience.)
> +
> + So the default string result of such functions
> + in this circumstances is real multi-byte character set, like
UTF32.
'these circumstances' or 'this circumstance'?
> +
> + For example, all numbers in string context
> + return result in @@character_set_results:
> +
> + SELECT CONCAT(20010101); -> UTF32
> +
> + We do sprintf() first (to get ASCII representation)
> + and then convert to UTF32;
> +
> + So these kind "data sources" can use ASCII representation
> + internally, but return multi-byte data only because
> + @@character_set_results wants so.
> + Therefore, conversion from ASCII to UTF32 is applied internally.
> +
> +
> + 2. Some other functions need in fact ASCII input.
> +
> + For example,
> + inet_aton(), GeometryFromText(), Convert_TZ(), GET_FORMAT().
> +
> + Similar, fields of certain type, like DATE, TIME,
> + when you insert string data into them, expect in fact ASCII input.
> + If they get non-ASCII input, for example UTF32, they
> + convert input from ASCII to UTF32, and then use ASCII
Do you mean 'from UTF32 to ASCII'?
> + representation to do further processing.
> +
> +
> + 3. Now imagine we pass result of a data source of the first type
> + to a data destination of the second type.
> +
> + What happens:
> + a. data source converts data from ASCII to UTF32, because
> + @@character_set_results wants so and passes the result to
> + data destination.
> + b. data destination gets UTF32 string.
> + c. data destination converts UTF32 string to ASCII,
> + because it needs ASCII representation to be able to
handle data
> + correctly.
> +
> + As a result we get two steps of unnecessary conversion:
> + From ASCII to UTF32, then from UTF32 to ASCII.
> +
> + A better way to handle these situations is to pass ASCII
> + representation directly from the source to the destination.
> +
> + This is why val_str_ascii() introduced.
> +
> + RETURN
> + Symilar to val_str()
Typo: 'Symilar'
> + */
> + virtual String *val_str_ascii(String *str);
> +
> /*
> Return decimal representation of item with fixed point.
>
> @@ -1051,6 +1122,14 @@ public:
> { return Field::GEOM_GEOMETRY; };
> String *check_well_formed_result(String *str, bool send_error= 0);
> bool eq_by_collation(Item *item, bool binary_cmp, CHARSET_INFO *cs);
> + uint32 max_char_length() const
> + { return max_length / collation.collation->mbmaxlen; }
> + void fix_charset(CHARSET_INFO *cs)
> + {
> + max_length= max_char_length() * cs->mbmaxlen;
> + collation.collation= cs;
> + }
> + void fix_charset() { fix_charset(default_charset()); }
If this is what you end up with (See my previous email for
alternatives), I suggest you avoid function overloading and call the
latter function fix_default_charset(). The first fix_charset function
does not seem to be called from other places than here. Is it really
needed?
...
> @@ -2960,12 +2964,15 @@ void Item_func_coalesce::fix_length_and_
> break;
> case DECIMAL_RESULT:
> count_decimal_length();
> + fix_charset();
> break;
> case REAL_RESULT:
> count_real_length();
> + fix_charset();
> break;
> case INT_RESULT:
> count_only_length();
> + fix_charset();
> decimals= 0;
Would it not be better to do the fix_charset() from within the count
methods?
...
> @@ -631,6 +631,12 @@ public:
> {
> Item_func::print(str, query_type);
> }
> + void fix_length_and_dec()
> + {
> + Item_bool_func2::fix_length_and_dec();
> + // returns "1" or "0" or "-1"
> + max_length= 2 * collation.collation->mbmaxlen;
> + }
Why is not fix_charset() used here?
...
> @@ -531,7 +533,7 @@ void Item_func::fix_num_length_and_dec()
> for (uint i=0 ; i < arg_count ; i++)
> {
> set_if_bigger(decimals,args[i]->decimals);
> - set_if_bigger(fl_length, args[i]->max_length);
> + set_if_bigger(fl_length, args[i]->max_char_length());
> }
> max_length=float_length(decimals);
> if (fl_length > max_length)
> @@ -578,7 +580,7 @@ void Item_func::count_only_length()
> unsigned_flag= 0;
> for (uint i=0 ; i < arg_count ; i++)
> {
> - set_if_bigger(max_length, args[i]->max_length);
> + set_if_bigger(max_length, args[i]->max_char_length());
> set_if_bigger(unsigned_flag, args[i]->unsigned_flag);
> }
> }
You earlier promised me a temporary variable here. I still find it a
bit confusing that the unit of max_length is #chars in the beginning
and #bytes at the end. (There are several such places around.)
...
--
Øystein