List:Commits« Previous MessageNext Message »
From:Øystein Grøvlen Date:June 25 2009 12:56pm
Subject:Re: bzr commit into mysql-6.0 branch (bar:2701) WL#2649
View as plain text  
Bar,

Here are some more detailed comments.  See in-line.  By the way, I
there will be file specific comments in the final commit.

Alexander Barkov wrote:
 > #At file:///home/bar/mysql-bzr/mysql-6.0.w2649/ based on 
revid:alik@stripped
 >
 >  2701 Alexander Barkov	2009-06-19
 >        WL#2649 Number-to-string conversions

...
 >
 > === added file 'mysql-test/r/ctype_binary.result'
 > --- a/mysql-test/r/ctype_binary.result	1970-01-01 00:00:00 +0000
 > +++ b/mysql-test/r/ctype_binary.result	2009-06-19 14:04:18 +0000
 > @@ -0,0 +1,1537 @@
 > +set names binary;
 > +Start of 6.1 tests

I guess "6.1" is not the most appropriate term anymore.

...

 > @@ -1917,25 +1917,25 @@ from t9 where c1= 1 ;
 >  1	1	1	1	1	1	1	1	1	1	1.0000	1.0000	2004-02-29	2004-02-29 11:11:11 
2004-02-29 11:11:11	11:11:11	2004	1	1	a	123456789a 
123456789a123456789b123456789c	tinyblob	tinytext	blob	text	mediumblob 
mediumtext	longblob	longtext	one	monday
 >  execute full_info ;
 >  Catalog	Database	Table	Table_alias	Column	Column_alias	Type 
Length	Max length	Is_null	Flags	Decimals	Charsetnr
 > -def					@arg01	8	20	1	Y	32896	0	63
 > -def					@arg02	8	20	1	Y	32896	0	63
 > -def					@arg03	8	20	1	Y	32896	0	63
 > -def					@arg04	8	20	1	Y	32896	0	63
 > -def					@arg05	8	20	1	Y	32896	0	63
 > -def					@arg06	8	20	1	Y	32896	0	63
 > -def					@arg07	5	23	1	Y	32896	31	63
 > -def					@arg08	5	23	1	Y	32896	31	63
 > -def					@arg09	5	23	1	Y	32896	31	63
 > -def					@arg10	5	23	1	Y	32896	31	63
 > -def					@arg11	246	83	6	Y	128	30	63
 > -def					@arg12	246	83	6	Y	128	30	63
 > +def					@arg01	8	20	1	Y	32768	0	8
 > +def					@arg02	8	20	1	Y	32768	0	8
 > +def					@arg03	8	20	1	Y	32768	0	8
 > +def					@arg04	8	20	1	Y	32768	0	8
 > +def					@arg05	8	20	1	Y	32768	0	8
 > +def					@arg06	8	20	1	Y	32768	0	8
 > +def					@arg07	5	23	1	Y	32768	31	8
 > +def					@arg08	5	23	1	Y	32768	31	8
 > +def					@arg09	5	23	1	Y	32768	31	8
 > +def					@arg10	5	23	1	Y	32768	31	8
 > +def					@arg11	246	83	6	Y	0	30	8
 > +def					@arg12	246	83	6	Y	0	30	8
 >  def					@arg13	251	16777216	10	Y	128	31	63
 >  def					@arg14	251	16777216	19	Y	128	31	63
 >  def					@arg15	251	16777216	19	Y	128	31	63
 > -def					@arg16	251	16777216	8	Y	128	31	63
 > -def					@arg17	8	20	4	Y	32928	0	63
 > -def					@arg18	8	20	1	Y	32896	0	63
 > -def					@arg19	8	20	1	Y	32896	0	63
 > +def					@arg16	251	16777216	8	Y	0	31	8

If I understand this correctly, DATETIME will still convert to a
string with a binary charset while TIME now will give you latin1.  I
do not understand the logic behind the difference.

...

 > === modified file 'sql/item.h'
 > --- a/sql/item.h	2009-02-02 15:58:48 +0000
 > +++ b/sql/item.h	2009-06-19 14:04:18 +0000
 > @@ -687,6 +687,77 @@ public:
 >        If value is not null null_value flag will be reset to FALSE.
 >    */
 >    virtual String *val_str(String *str)=0;
 > +
 > +  /*
 > +    Returns string representation of this item in ASCII format.
 > +
 > +    SYNOPSIS
 > +      val_str_ascii()
 > +      str - similar to val_str();
 > +
 > +    NOTE
 > +      This method is introduced for performance optimization purposes.
 > +
 > +      1. val_str() result of some Items in string context
 > +      depends on @@character_set_results.
 > +      @@character_set_results can be set to a "real multibyte" character
 > +      set like UCS2, UTF16, UTF32. (We'll use only UTF32 in the examples
 > +      below for convenience.)
 > +
 > +      So the default string result of such functions
 > +      in this circumstances is real multi-byte character set, like 
UTF32.

'these circumstances' or 'this circumstance'?

 > +
 > +      For example, all numbers in string context
 > +      return result in @@character_set_results:
 > +
 > +      SELECT CONCAT(20010101); -> UTF32
 > +
 > +      We do sprintf() first (to get ASCII representation)
 > +      and then convert to UTF32;
 > +
 > +      So these kind "data sources" can use ASCII representation
 > +      internally, but return multi-byte data only because
 > +      @@character_set_results wants so.
 > +      Therefore, conversion from ASCII to UTF32 is applied internally.
 > +
 > +
 > +      2. Some other functions need in fact ASCII input.
 > +
 > +      For example,
 > +        inet_aton(), GeometryFromText(), Convert_TZ(), GET_FORMAT().
 > +
 > +      Similar, fields of certain type, like DATE, TIME,
 > +      when you insert string data into them, expect in fact ASCII input.
 > +      If they get non-ASCII input, for example UTF32, they
 > +      convert input from ASCII to UTF32, and then use ASCII

Do you mean 'from UTF32 to ASCII'?

 > +      representation to do further processing.
 > +
 > +
 > +      3. Now imagine we pass result of a data source of the first type
 > +         to a data destination of the second type.
 > +
 > +      What happens:
 > +        a. data source converts data from ASCII to UTF32, because
 > +           @@character_set_results wants so and passes the result to
 > +           data destination.
 > +        b. data destination gets UTF32 string.
 > +        c. data destination converts UTF32 string to ASCII,
 > +           because it needs ASCII representation to be able to 
handle data
 > +           correctly.
 > +
 > +      As a result we get two steps of unnecessary conversion:
 > +      From ASCII to UTF32, then from UTF32 to ASCII.
 > +
 > +      A better way to handle these situations is to pass ASCII
 > +      representation directly from the source to the destination.
 > +
 > +      This is why val_str_ascii() introduced.
 > +
 > +    RETURN
 > +      Symilar to val_str()

Typo: 'Symilar'

 > +  */
 > +  virtual String *val_str_ascii(String *str);
 > +
 >    /*
 >      Return decimal representation of item with fixed point.
 >
 > @@ -1051,6 +1122,14 @@ public:
 >      { return Field::GEOM_GEOMETRY; };
 >    String *check_well_formed_result(String *str, bool send_error= 0);
 >    bool eq_by_collation(Item *item, bool binary_cmp, CHARSET_INFO *cs);
 > +  uint32 max_char_length() const
 > +  { return max_length / collation.collation->mbmaxlen; }
 > +  void fix_charset(CHARSET_INFO *cs)
 > +  {
 > +    max_length= max_char_length() * cs->mbmaxlen;
 > +    collation.collation= cs;
 > +  }
 > +  void fix_charset() { fix_charset(default_charset()); }

If this is what you end up with (See my previous email for
alternatives), I suggest you avoid function overloading and call the
latter function fix_default_charset().  The first fix_charset function
does not seem to be called from other places than here.  Is it really
needed?

...

 > @@ -2960,12 +2964,15 @@ void Item_func_coalesce::fix_length_and_
 >      break;
 >    case DECIMAL_RESULT:
 >      count_decimal_length();
 > +    fix_charset();
 >      break;
 >    case REAL_RESULT:
 >      count_real_length();
 > +    fix_charset();
 >      break;
 >    case INT_RESULT:
 >      count_only_length();
 > +    fix_charset();
 >      decimals= 0;

Would it not be better to do the fix_charset() from within the count
methods?

...

 > @@ -631,6 +631,12 @@ public:
 >    {
 >      Item_func::print(str, query_type);
 >    }
 > +  void fix_length_and_dec()
 > +  {
 > +    Item_bool_func2::fix_length_and_dec();
 > +    // returns "1" or "0" or "-1"
 > +    max_length= 2 * collation.collation->mbmaxlen;
 > +  }

Why is not fix_charset() used here?

...

 > @@ -531,7 +533,7 @@ void Item_func::fix_num_length_and_dec()
 >    for (uint i=0 ; i < arg_count ; i++)
 >    {
 >      set_if_bigger(decimals,args[i]->decimals);
 > -    set_if_bigger(fl_length, args[i]->max_length);
 > +    set_if_bigger(fl_length, args[i]->max_char_length());
 >    }
 >    max_length=float_length(decimals);
 >    if (fl_length > max_length)
 > @@ -578,7 +580,7 @@ void Item_func::count_only_length()
 >    unsigned_flag= 0;
 >    for (uint i=0 ; i < arg_count ; i++)
 >    {
 > -    set_if_bigger(max_length, args[i]->max_length);
 > +    set_if_bigger(max_length, args[i]->max_char_length());
 >      set_if_bigger(unsigned_flag, args[i]->unsigned_flag);
 >    }
 >  }

You earlier promised me a temporary variable here.  I still find it a
bit confusing that the unit of max_length is #chars in the beginning
and #bytes at the end.  (There are several such places around.)

...

-- 
Øystein
Thread
bzr commit into mysql-6.0 branch (bar:2701) WL#2649Alexander Barkov19 Jun
  • Re: bzr commit into mysql-6.0 branch (bar:2701) WL#2649Øystein Grøvlen25 Jun