List: Internals
From: Alexander (Shurik) Barkov
Date: October 20 2002 5:28am
Subject: Re: Question about CONVERT(str,charset_to,charset_from)
Paul DuBois wrote:

> I've been puzzling over this patch, which implements a form of the
> CONVERT() function.  I can see that this can be useful for specifying
> the destination character set as a string expression rather than as
> an unquoted character set name.  But I'm wondering why the second argument
> is necessary at all.  Strings have a charset already, why do you have
> to specify what it is?



There are some reasons against this style of CONVERT():

   CONVERT(string,from_charset,to_charset)

They are:

1. As we recently decided, a function should never return strings
    with different charsets in different rows. The above style of
    CONVERT() breaks this rule.

2. Also, I've already implemented the COLLATE syntax, so one can easily
    force a string to change its charset. For example:
    CONVERT(latin1_string COLLATE latin5 USING utf8)
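
To make the effect of the COLLATE step concrete, here is a minimal
sketch in Python, with Python codecs standing in for the MySQL charsets.
It assumes that COLLATE in the example above only relabels the stored
bytes and that CONVERT ... USING then transcodes them. The function name
relabel_and_convert and the sample bytes are hypothetical, chosen only
for illustration.

```python
# Sketch only: Python codecs stand in for MySQL charsets, and the helper
# name is hypothetical, not part of MySQL.

def relabel_and_convert(raw: bytes, actual_charset: str, target_charset: str) -> bytes:
    """Interpret raw as text in actual_charset, then transcode to target_charset."""
    # Step 1 (the assumed COLLATE step): no bytes change, only the label.
    text = raw.decode(actual_charset, errors="replace")
    # Step 2 (the CONVERT ... USING step): re-encode in the target charset.
    return text.encode(target_charset, errors="replace")


# Byte 0xF0 is U+011F in latin5 (ISO 8859-9) but U+00F0 in latin1.
stored = b"da\xf0"
as_latin5 = relabel_and_convert(stored, "latin5", "utf-8")
as_latin1 = relabel_and_convert(stored, "latin1", "utf-8")
```

The same stored bytes give different UTF-8 output depending on the label,
which is why the source charset has to be stated correctly before
converting.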

On the other hand, let's imagine that we have a mail store (as
Peter suggested in his example) with a structure like this:

CREATE TABLE mail (
   body         BLOB BINARY,
   body_charset VARCHAR(32)
);


Now, if we want to send the bodies of all letters in UTF8,
this style of CONVERT() wants an unquoted charset name in
the second position. Since body_charset is an expression
rather than an unquoted charset name, this will fail:

SELECT CONVERT(body COLLATE body_charset USING utf8) ...


So, taking all this into account, I would conclude:

1. we should remove this style: CONVERT(expr,expr,expr), since
    it can produce strings in different charsets in
    different rows, which is wrong;

2. we can't extend COLLATE to accept an expression:
    SELECT body COLLATE expr
    because it, too, could produce different charsets
    in different rows.


Probably the solution is to extend the CONVERT syntax to support
something like this:

CONVERT(expr FROM expr USING unquoted_charset_name)

  where the first expr is the source string to convert and the
  second expr is its source charset. This would not reduce
  functionality, and it would not break the rule about
  different charsets in different rows, because the result
  charset is fixed by unquoted_charset_name.
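
A rough sketch of the intended per-row behaviour, in Python with Python
codecs standing in for MySQL charsets. The source charset varies per row,
as body_charset would, while the target is a single fixed name. The
function convert_from_using, the constant TARGET and the sample rows are
hypothetical. Returning None for an unknown charset name is an
assumption, borrowed from the NULL result in the quoted patch.

```python
from typing import Optional

# Sketch only: plays the role of unquoted_charset_name, fixed for the query.
TARGET = "utf-8"


def convert_from_using(body: Optional[bytes], from_charset: Optional[str]) -> Optional[bytes]:
    """Sketch of CONVERT(body FROM from_charset USING utf8) for a single row."""
    if body is None or from_charset is None:
        return None  # NULL in, NULL out
    try:
        # Undecodable bytes become a replacement character instead of failing.
        text = body.decode(from_charset, errors="replace")
    except LookupError:
        return None  # assumption: unknown charset name yields NULL
    # Every non-NULL result is encoded in the same TARGET charset.
    return text.encode(TARGET, errors="replace")


# Hypothetical rows of the mail table: (body, body_charset).
mail = [
    (b"caf\xe9", "latin1"),
    (b"\xcf\xf0\xe8\xe2\xe5\xf2", "cp1251"),
    (b"plain", "no_such_charset"),
]
results = [convert_from_using(body, charset) for body, charset in mail]
```

Whatever body_charset holds, every non-NULL result is in the one target
charset, so the rule about a single result charset per column still holds.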

What do you think?


> 
> At 19:11 +0400 3/29/02, bar@stripped wrote:
> 
>> Below is the list of changes that have just been committed into a
>> 4.1 repository of bar. When bar does a push, they will be propagated to
>> the main repository and within 24 hours after the push to the public 
>> repository.
>> For information on how to access the public repository
>> see http://www.mysql.com/doc/I/n/Installing_source_tree.html
>>
>> ChangeSet
>>   1.1178 02/03/29 19:11:06 bar@stripped +3 -0
>>   Now this syntax works too:  CONVERT(string,charset_to,charset_from)
>>   where charset_to and charset_from are expressions. For example:
>>
>>   CONVERT('test','latin2','cp1250')
>>
>>   sql/sql_yacc.yy
>>     1.155 02/03/29 19:11:05 bar@stripped +4 -0
>>     Now this syntax works too:  CONVERT(string,charset_to,charset_from)
>>
>>   sql/item_strfunc.h
>>     1.18 02/03/29 19:11:04 bar@stripped +10 -0
>>     Now this syntax works too:  CONVERT(string,charset_to,charset_from)
>>
>>   sql/item_strfunc.cc
>>     1.42 02/03/29 19:11:04 bar@stripped +73 -0
>>     Now this syntax works too:  CONVERT(string,charset_to,charset_from)
> 
> 
> Also, it appears to me that the names of the second and third arguments
> in the preceding descriptions are backward, because the function result has
> the charset of the third argument, not the second:
> 
> mysql> select charset(convert('abc','latin1','utf8'));
> +-----------------------------------------+
> | charset(convert('abc','latin1','utf8')) |
> +-----------------------------------------+
> | utf8                                    |
> +-----------------------------------------+
> mysql> select charset(convert('abc','utf8','latin1'));
> +-----------------------------------------+
> | charset(convert('abc','utf8','latin1')) |
> +-----------------------------------------+
> | latin1                                  |
> +-----------------------------------------+
> 
>>
>> # This is a BitKeeper patch.  What follows are the unified diffs for the
>> # set of deltas contained in the patch.  The rest of the patch, the part
>> # that BitKeeper cares about, is below these diffs.
>> # User:    bar
>> # Host:    gw.udmsearch.izhnet.ru
>> # Root:    /usr/home/bar/mysql-4.1
>>
>> --- 1.41/sql/item_strfunc.cc    Fri Mar 29 18:22:18 2002
>> +++ 1.42/sql/item_strfunc.cc    Fri Mar 29 19:11:04 2002
>> @@ -1843,6 +1843,79 @@
>>    /* BAR TODO: What to do here??? */
>>  }
>>
>> +
>> +String *Item_func_conv_charset3::val_str(String *str)
>> +{
>> +  my_wc_t wc;
>> +  int cnvres;
>> +  const uchar *s, *se;
>> +  uchar *d, *d0, *de;
>> +  uint dmaxlen;
>> +  String *arg= args[0]->val_str(str);
>> +  String *to_cs= args[1]->val_str(str);
>> +  String *from_cs= args[2]->val_str(str);
>> +  CHARSET_INFO *from_charset;
>> +  CHARSET_INFO *to_charset;
>> +
>> +  if (!arg     || args[0]->null_value ||
>> +      !to_cs   || args[1]->null_value ||
>> +      !from_cs || args[2]->null_value ||
>> +      !(from_charset=find_compiled_charset_by_name(from_cs->ptr())) ||
>> +      !(to_charset=find_compiled_charset_by_name(to_cs->ptr())))
>> +  {
>> +    null_value=1;
>> +    return 0;
>> +  }
>> +
>> +  s=(const uchar*)arg->ptr();
>> +  se=s+arg->length();
>> +
>> +  dmaxlen=arg->length()*(to_charset->mbmaxlen?to_charset->mbmaxlen:1)+1;
>> +  str->alloc(dmaxlen);
>> +  d0=d=(unsigned char*)str->ptr();
>> +  de=d+dmaxlen;
>> +
>> +  while( s < se && d < de){
>> +
>> +    cnvres=from_charset->mb_wc(from_charset,&wc,s,se);
>> +    if (cnvres>0)
>> +    {
>> +      s+=cnvres;
>> +    }
>> +    else if (cnvres==MY_CS_ILSEQ)
>> +    {
>> +      s++;
>> +      wc='?';
>> +    }
>> +    else
>> +      break;
>> +
>> +outp:
>> +    cnvres=to_charset->wc_mb(to_charset,wc,d,de);
>> +    if (cnvres>0)
>> +    {
>> +      d+=cnvres;
>> +    }
>> +    else if (cnvres==MY_CS_ILUNI && wc!='?')
>> +    {
>> +        wc='?';
>> +        goto outp;
>> +    }
>> +    else
>> +      break;
>> +  };
>> +
>> +  str->length((uint) (d-d0));
>> +  str->set_charset(to_charset);
>> +  return str;
>> +}
>> +
>> +void Item_func_conv_charset3::fix_length_and_dec()
>> +{
>> +  /* BAR TODO: What to do here??? */
>> +}
>> +
>> +
>>  String *Item_func_hex::val_str(String *str)
>>  {
>>    if (args[0]->result_type() != STRING_RESULT)
>>
>> --- 1.17/sql/item_strfunc.h    Fri Mar 29 18:22:19 2002
>> +++ 1.18/sql/item_strfunc.h    Fri Mar 29 19:11:04 2002
>> @@ -489,6 +489,16 @@
>>    const char *func_name() const { return "conv_charset"; }
>>  };
>>
>> +class Item_func_conv_charset3 :public Item_str_func
>> +{
>> +public:
>> +  Item_func_conv_charset3(Item *arg1,Item *arg2,Item *arg3)
>> +    :Item_str_func(arg1,arg2,arg3) {}
>> +  String *val_str(String *);
>> +  void fix_length_and_dec();
>> +  const char *func_name() const { return "conv_charset3"; }
>> +};
>> +
>>
>>  /*******************************************************
>>  Spatial functions
>>
>> --- 1.154/sql/sql_yacc.yy    Fri Mar 29 18:22:20 2002
>> +++ 1.155/sql/sql_yacc.yy    Fri Mar 29 19:11:05 2002
>> @@ -1664,6 +1664,10 @@
>>          }
>>          $$= new Item_func_conv_charset($3,cs);
>>        }
>> +    | CONVERT_SYM '(' expr ',' expr ',' expr ')'
>> +      {
>> +        $$= new Item_func_conv_charset3($3,$5,$7);
>> +      }
>>      | FUNC_ARG0 '(' ')'
>>        { $$= ((Item*(*)(void))($1.symbol->create_func))();}
>>      | FUNC_ARG1 '(' expr ')'
>>
> 
> 
> 


