Hi, Gleb!
On Jul 22, Gleb Shchepa wrote:
> 3034 Gleb Shchepa 2009-07-22
> Bug# 30946: mysqldump silently ignores --default-character-set
> when used with --tab
Looks good to me.
I have few minor remarks, see below (add comments, add more tests,
simplify one function).
ok to push after doing all that.
> 1) New syntax: added CHARACTER SET clause to the
> SELECT ... INTO OUTFILE (to complement the same clause in
> LOAD DATA INFILE).
> mysqldump is updated to use this in --tab mode.
>
> 2) ESCAPED BY/ENCLOSED BY field parameters are documented as
> accepting CHAR argument, however SELECT .. INTO OUTFILE
> silently ignored rests of multisymbol arguments.
> For the symmetrical behavior with LOAD DATA INFILE the
> server has been modified to fail with the same error:
>
> ERROR 42000: Field separator argument is not what is
> expected; check the manual
>
> 3) Current LOAD DATA INFILE recognizes field/line separators
> "as is" without converting from client charset to data
> file charset. So, it is supposed, that input file of
> LOAD DATA INFILE consists of data in one charset and
> separators in other charset. For the compatibility with
> that [buggy] behaviour SELECT INTO OUTFILE implementation
> has been saved "as is" too, but the new warning message
> has been added:
>
> Non-ASCII separator arguments are not fully supported
>
> This message warns on field/line separators that contain
> non-ASCII symbols.
> === modified file 'mysql-test/r/outfile_loaddata.result'
> --- a/mysql-test/r/outfile_loaddata.result 2007-11-20 16:15:20 +0000
> +++ b/mysql-test/r/outfile_loaddata.result 2009-07-22 08:23:31 +0000
could you also add new tests for just SELECT ... OUTFILE - from SQL, not
via mysqldump ?
> === modified file 'sql/sql_class.cc'
> --- a/sql/sql_class.cc 2009-07-10 23:12:13 +0000
> +++ b/sql/sql_class.cc 2009-07-22 08:23:31 +0000
> @@ -1790,6 +1790,8 @@ select_export::prepare(List<Item> &list,
> if ((uint) strlen(exchange->file_name) + NAME_LEN >= FN_REFLEN)
> strmake(path,exchange->file_name,FN_REFLEN-1);
>
> + write_cs= exchange->cs ? exchange->cs : &my_charset_bin;
> +
> if ((file= create_file(thd, path, exchange, &cache)) < 0)
> return 1;
> /* Check if there is any blobs in data */
> @@ -1809,6 +1811,21 @@ select_export::prepare(List<Item> &list,
> non_string_results= TRUE;
> }
> }
> + if (exchange->escaped->numchars() > 1 ||
> exchange->enclosed->numchars() > 1)
> + {
> + my_error(ER_WRONG_FIELD_TERMINATORS, MYF(0));
> + return TRUE;
> + }
> + if (exchange->escaped->length() > 1 || exchange->enclosed->length()
> > 1 ||
> + !my_isascii(exchange->escaped->ptr()[0]) ||
> + !my_isascii(exchange->enclosed->ptr()[0]) ||
> + !exchange->field_term->is_ascii() ||
> !exchange->line_term->is_ascii() ||
> + !exchange->line_start->is_ascii())
> + {
> + push_warning(thd, MYSQL_ERROR::WARN_LEVEL_WARN,
> + WARN_NON_ASCII_SEPARATOR_NOT_IMPLEMENTED,
> + ER(WARN_NON_ASCII_SEPARATOR_NOT_IMPLEMENTED));
> + }
please explain here too (in a comment) why you don't recode field
separators (number 3 from the changeset comment).
> field_term_length=exchange->field_term->length();
> field_term_char= field_term_length ?
> (int) (uchar) (*exchange->field_term)[0] : INT_MAX;
> @@ -1858,6 +1875,8 @@ bool select_export::send_data(List<Item>
>
> DBUG_ENTER("select_export::send_data");
> char buff[MAX_FIELD_WIDTH],null_buff[2],space[MAX_FIELD_WIDTH];
> + char cvt_buff[MAX_FIELD_WIDTH];
> + String cvt_str(cvt_buff, sizeof(cvt_buff), write_cs);
> bool space_inited=0;
> String tmp(buff,sizeof(buff),&my_charset_bin),*res;
> tmp.length(0);
> @@ -1881,6 +1900,39 @@ bool select_export::send_data(List<Item>
> bool enclosed = (exchange->enclosed->length() &&
> (!exchange->opt_enclosed || result_type == STRING_RESULT));
> res=item->str_result(&tmp);
> + if (res && !my_charset_same(write_cs, res->charset()) &&
> + !my_charset_same(write_cs, &my_charset_bin))
> + {
> + char printable_buff[64];
> + const char *well_formed_error_pos;
> + const char *cannot_convert_error_pos;
> + const char *from_end_pos;
> + const char *error_pos;
> + uint32 bytes;
> + bytes= well_formed_copy_nchars(write_cs, cvt_buff, sizeof(cvt_buff),
> + res->charset(), res->ptr(),
> res->length(),
> + sizeof(cvt_buff),
> + &well_formed_error_pos,
> + &cannot_convert_error_pos,
> + &from_end_pos);
> + error_pos= well_formed_error_pos ? well_formed_error_pos
> + : cannot_convert_error_pos;
> + if (error_pos)
> + {
would be logical to declare printable_buff[] here, not in the outer
scope, right ?
(I had to search to see where printable_buff is [not] used, as I
expected it to be used somewhere outside that if() as the code
suggested)
> + const char *end= res->ptr() + res->length(), *orig_end= end;
> + set_if_smaller(end, error_pos + 6);
> + convert_to_printable(printable_buff,
> + printable_buff + sizeof(printable_buff),
> + error_pos, end, res->charset(), end < orig_end);
> + push_warning_printf(thd, MYSQL_ERROR::WARN_LEVEL_WARN,
> + ER_TRUNCATED_WRONG_VALUE_FOR_FIELD,
> + ER(ER_TRUNCATED_WRONG_VALUE_FOR_FIELD),
> + "string", printable_buff,
> + item->name, row_count);
> + }
> + cvt_str.length(bytes);
> + res= &cvt_str;
> + }
> if (res && enclosed)
> {
> if (my_b_write(&cache,(uchar*) exchange->enclosed->ptr(),
>
> === modified file 'sql/sql_string.cc'
> --- a/sql/sql_string.cc 2009-06-16 14:36:15 +0000
> +++ b/sql/sql_string.cc 2009-07-22 08:23:31 +0000
> @@ -1119,3 +1119,69 @@ void String::swap(String &s)
> swap_variables(bool, alloced, s.alloced);
> swap_variables(CHARSET_INFO*, str_charset, s.str_charset);
> }
> +
> +
> +/**
> + Convert string to printable ASCII string
> +
> + @details This function converts input string "from" replacing non-ASCII bytes
> + with hexadecimal sequences ("\xXX") optionally appending "..." to the end of
> + the resulting string.
please add something about the usage ("... used in the error messages
... e.g. when a string cannot be converted to a result charset...")
> +
> + @param to output buffer
> + @param to_end end of the output buffer
> + @param from input string
> + @param from_end end of the input string
> + @param from_cs input charset
> + @param dots append '...' to output if true
> +
> + @return number of bytes in the output string
> +*/
> +
> +uint convert_to_printable(char *to, const char *to_end,
> + const char *from, const char *from_end,
> + CHARSET_INFO *from_cs, bool dots)
> +{
> + char *to_orig= to;
> +
> + if (!from || to == to_end)
> + return 0;
> +
> + to_end--; // for '\0' at the end
> +
> + for (; to < to_end && from < from_end; from++)
> + {
....
> + }
> + if (dots && to_end - to >= 3)
> + {
> + *to++= '.';
> + *to++= '.';
> + *to++= '.';
> + }
> + *to= '\0';
> + return to - to_orig;
> +}
it's a bit complicated semantics, you expect the caller to truncate the
original string (that caused an error), set from_end accordingly and set
'dots' to specify whether there was a truncation (whether from_end
points to the end of the string or not).
I think it's kind of more natural to have this logic in the
convert_to_printable() function. That is, it should take
from, from_length (or from_end) -- real length/end of the string
to, to_length (or end)
and the function decides automatically whether add dots or not.
Examples:
to=char[10] buffer, to_length=10.
from="abc" (that is, from_length=3)
result is "abc" (to="abc")
from="abc1234567890"
result is "abc123..."
from="123456789"
result="123456789"
etc.
Regards / Mit vielen Grüßen,
Sergei
--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@stripped>
/ /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect
/_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028
<___/ Sonnenallee 1, 85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Häring