Paul DuBois wrote:
> I've been puzzling over this patch, which implements a form of the
> CONVERT() function. I can see that this can be useful for specifying
> the destination character set as a string expression rather than as
> an unquoted character set name. But I'm wondering why the second argument
> is necessary at all. Strings have a charset already, why do you have
> to specify what it is?
There are some reasons against this style of CONVERT():
CONVERT(string,from_charset,to_charset)
They are:
1. As we recently decided, a function should never return strings
with different charsets in different rows. The above style of
CONVERT() breaks this rule.
2. Also, I've already implemented the COLLATE syntax, so one can easily
force a string to change its charset. For example:
CONVERT(latin1_string COLLATE latin5 USING utf8)
On the other hand, let's imagine that we have a mail store (as
Peter suggested in his example) with a structure like this:
CREATE TABLE mail (
body BLOB BINARY,
body_charset VARCHAR(32)
);
Now, if we want to send the bodies of all letters in UTF8,
this style of CONVERT() wants an unquoted charset name in
the second position. Since body_charset is an expression
and not an unquoted charset name, this will fail:
SELECT CONVERT(body COLLATE body_charset USING utf8) ...
So, taking all this into account, I suppose:
1. we should remove this style: CONVERT(expr,expr,expr), because
it can produce strings in different charsets in
different rows, which is wrong;
2. we can't extend COLLATE to support an expression:
SELECT body COLLATE expr
because it would again be able to produce different charsets
in different rows.
Probably the way out is to extend the CONVERT syntax to support
something like this:
CONVERT(expr FROM expr USING unquoted_charset_name)
where the first expr is the source string to convert and the
second expr is the source charset. This will not reduce
functionality and also will not break the rule about
different charsets in different rows.
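To make the idea concrete, here is a minimal sketch in standalone C++ (not
the server code) of what CONVERT(body FROM body_charset USING utf8) would
mean for each row. All names in it (SrcCharset, find_src_charset,
append_utf8, convert_from_using_utf8) are hypothetical, and it only knows
two single-byte source charsets. The source charset is looked up by name
per row, while the result charset is fixed to utf8, so every row comes back
in the same charset. I assume an unknown charset name gives NULL and a byte
that is illegal in the source charset becomes '?'.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a charset lookup by name. This sketch only
// knows two single-byte source charsets.
enum class SrcCharset { Unknown, Ascii, Latin1 };

SrcCharset find_src_charset(const std::string &name)
{
  if (name == "ascii")  return SrcCharset::Ascii;
  if (name == "latin1") return SrcCharset::Latin1;
  return SrcCharset::Unknown;
}

// Append one code point to a UTF-8 string. Only code points below 0x800
// can occur here, because both source charsets are single-byte.
void append_utf8(std::string &out, uint32_t wc)
{
  assert(wc < 0x800);
  if (wc < 0x80)
  {
    out += static_cast<char>(wc);
  }
  else
  {
    out += static_cast<char>(0xC0 | (wc >> 6));
    out += static_cast<char>(0x80 | (wc & 0x3F));
  }
}

// Models CONVERT(body FROM from_name USING utf8) for a single row.
// The source charset is a per-row expression; the result is always UTF-8.
// Returns false (standing in for SQL NULL) if the charset name is unknown.
bool convert_from_using_utf8(const std::string &body,
                             const std::string &from_name,
                             std::string &out)
{
  SrcCharset cs = find_src_charset(from_name);
  if (cs == SrcCharset::Unknown)
    return false;

  out.clear();
  for (unsigned char c : body)
  {
    uint32_t wc = c;              // latin1 bytes map directly to code points
    if (cs == SrcCharset::Ascii && c >= 0x80)
      wc = '?';                   // illegal byte in ascii: substitute '?'
    append_utf8(out, wc);
  }
  return true;
}
```

Calling convert_from_using_utf8() once per row with that row's body and
body_charset corresponds to the mail table example above: the input
charset may differ from row to row, but the output never does.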
What do you think?
>
> At 19:11 +0400 3/29/02, bar@stripped wrote:
>
>> Below is the list of changes that have just been committed into a
>> 4.1 repository of bar. When bar does a push, they will be propagated to
>> the main repository and within 24 hours after the push to the public
>> repository.
>> For information on how to access the public repository
>> see http://www.mysql.com/doc/I/n/Installing_source_tree.html
>>
>> ChangeSet
>> 1.1178 02/03/29 19:11:06 bar@stripped +3 -0
>> Now this syntax works too: CONVERT(string,charset_to,charset_from)
>> where charset_to and charset_from are expressions. For example:
>>
>> CONVERT('test','latin2','cp1250')
>>
>> sql/sql_yacc.yy
>> 1.155 02/03/29 19:11:05 bar@stripped +4 -0
>> Now this syntax works too: CONVERT(string,charset_to,charset_from)
>>
>> sql/item_strfunc.h
>> 1.18 02/03/29 19:11:04 bar@stripped +10 -0
>> Now this syntax works too: CONVERT(string,charset_to,charset_from)
>>
>> sql/item_strfunc.cc
>> 1.42 02/03/29 19:11:04 bar@stripped +73 -0
>> Now this syntax works too: CONVERT(string,charset_to,charset_from)
>
>
> Also, it appears to me that the names of the second and third arguments
> in the preceding descriptions are backward, because the function result has
> the charset of the third argument, not the second:
>
> mysql> select charset(convert('abc','latin1','utf8'));
> +-----------------------------------------+
> | charset(convert('abc','latin1','utf8')) |
> +-----------------------------------------+
> | utf8 |
> +-----------------------------------------+
> mysql> select charset(convert('abc','utf8','latin1'));
> +-----------------------------------------+
> | charset(convert('abc','utf8','latin1')) |
> +-----------------------------------------+
> | latin1 |
> +-----------------------------------------+
>
>>
>> # This is a BitKeeper patch. What follows are the unified diffs for the
>> # set of deltas contained in the patch. The rest of the patch, the part
>> # that BitKeeper cares about, is below these diffs.
>> # User: bar
>> # Host: gw.udmsearch.izhnet.ru
>> # Root: /usr/home/bar/mysql-4.1
>>
>> --- 1.41/sql/item_strfunc.cc Fri Mar 29 18:22:18 2002
>> +++ 1.42/sql/item_strfunc.cc Fri Mar 29 19:11:04 2002
>> @@ -1843,6 +1843,79 @@
>> /* BAR TODO: What to do here??? */
>> }
>>
>> +
>> +String *Item_func_conv_charset3::val_str(String *str)
>> +{
>> + my_wc_t wc;
>> + int cnvres;
>> + const uchar *s, *se;
>> + uchar *d, *d0, *de;
>> + uint dmaxlen;
>> + String *arg= args[0]->val_str(str);
>> + String *to_cs= args[1]->val_str(str);
>> + String *from_cs= args[2]->val_str(str);
>> + CHARSET_INFO *from_charset;
>> + CHARSET_INFO *to_charset;
>> +
>> + if (!arg || args[0]->null_value ||
>> + !to_cs || args[1]->null_value ||
>> + !from_cs || args[2]->null_value ||
>> + !(from_charset=find_compiled_charset_by_name(from_cs->ptr())) ||
>> + !(to_charset=find_compiled_charset_by_name(to_cs->ptr())))
>> + {
>> + null_value=1;
>> + return 0;
>> + }
>> +
>> + s=(const uchar*)arg->ptr();
>> + se=s+arg->length();
>> +
>> + dmaxlen=arg->length()*(to_charset->mbmaxlen?to_charset->mbmaxlen:1)+1;
>> + str->alloc(dmaxlen);
>> + d0=d=(unsigned char*)str->ptr();
>> + de=d+dmaxlen;
>> +
>> + while( s < se && d < de){
>> +
>> + cnvres=from_charset->mb_wc(from_charset,&wc,s,se);
>> + if (cnvres>0)
>> + {
>> + s+=cnvres;
>> + }
>> + else if (cnvres==MY_CS_ILSEQ)
>> + {
>> + s++;
>> + wc='?';
>> + }
>> + else
>> + break;
>> +
>> +outp:
>> + cnvres=to_charset->wc_mb(to_charset,wc,d,de);
>> + if (cnvres>0)
>> + {
>> + d+=cnvres;
>> + }
>> + else if (cnvres==MY_CS_ILUNI && wc!='?')
>> + {
>> + wc='?';
>> + goto outp;
>> + }
>> + else
>> + break;
>> + };
>> +
>> + str->length((uint) (d-d0));
>> + str->set_charset(to_charset);
>> + return str;
>> +}
>> +
>> +void Item_func_conv_charset3::fix_length_and_dec()
>> +{
>> + /* BAR TODO: What to do here??? */
>> +}
>> +
>> +
>> String *Item_func_hex::val_str(String *str)
>> {
>> if (args[0]->result_type() != STRING_RESULT)
>>
>> --- 1.17/sql/item_strfunc.h Fri Mar 29 18:22:19 2002
>> +++ 1.18/sql/item_strfunc.h Fri Mar 29 19:11:04 2002
>> @@ -489,6 +489,16 @@
>> const char *func_name() const { return "conv_charset"; }
>> };
>>
>> +class Item_func_conv_charset3 :public Item_str_func
>> +{
>> +public:
>> + Item_func_conv_charset3(Item *arg1,Item *arg2,Item *arg3)
>> + :Item_str_func(arg1,arg2,arg3) {}
>> + String *val_str(String *);
>> + void fix_length_and_dec();
>> + const char *func_name() const { return "conv_charset3"; }
>> +};
>> +
>>
>> /*******************************************************
>> Spatial functions
>>
>> --- 1.154/sql/sql_yacc.yy Fri Mar 29 18:22:20 2002
>> +++ 1.155/sql/sql_yacc.yy Fri Mar 29 19:11:05 2002
>> @@ -1664,6 +1664,10 @@
>> }
>> $$= new Item_func_conv_charset($3,cs);
>> }
>> + | CONVERT_SYM '(' expr ',' expr ',' expr ')'
>> + {
>> + $$= new Item_func_conv_charset3($3,$5,$7);
>> + }
>> | FUNC_ARG0 '(' ')'
>> { $$= ((Item*(*)(void))($1.symbol->create_func))();}
>> | FUNC_ARG1 '(' expr ')'
>>
>
>
>