List:Commits« Previous MessageNext Message »
From:Sergei Golubchik Date:July 24 2009 8:55pm
Subject:Re: bzr commit into mysql-5.1-bugteam branch (gshchepa:3034)
Bug#30946
View as plain text  
Hi, Gleb!

On Jul 22, Gleb Shchepa wrote:
>  3034 Gleb Shchepa	2009-07-22
>       Bug# 30946: mysqldump silently ignores --default-character-set
>                   when used with --tab

Looks good to me.

I have few minor remarks, see below (add comments, add more tests,
simplify one function).

ok to push after doing all that.

>       1) New syntax: added CHARACTER SET clause to the
>         SELECT ... INTO OUTFILE (to complement the same clause in
>         LOAD DATA INFILE).
>         mysqldump is updated to use this in --tab mode.
>
>       2) ESCAPED BY/ENCLOSED BY field parameters are documented as
>          accepting CHAR argument, however SELECT .. INTO OUTFILE
>          silently ignored rests of multisymbol arguments.
>          For the symmetrical behavior with LOAD DATA INFILE the
>          server has been modified to fail with the same error:
>
>            ERROR 42000: Field separator argument is not what is
>                         expected; check the manual
>
>       3) Current LOAD DATA INFILE recognizes field/line separators
>          "as is" without converting from client charset to data
>          file charset. So, it is supposed, that input file of
>          LOAD DATA INFILE consists of data in one charset and
>          separators in other charset. For the compatibility with
>          that [buggy] behaviour SELECT INTO OUTFILE implementation
>          has been saved "as is" too, but the new warning message
>          has been added:
>
>            Non-ASCII separator arguments are not fully supported
>
>          This message warns on field/line separators that contain
>          non-ASCII symbols.

> === modified file 'mysql-test/r/outfile_loaddata.result'
> --- a/mysql-test/r/outfile_loaddata.result	2007-11-20 16:15:20 +0000
> +++ b/mysql-test/r/outfile_loaddata.result	2009-07-22 08:23:31 +0000

could you also add new tests for just SELECT ... OUTFILE - from SQL, not
via mysqldump ?

> === modified file 'sql/sql_class.cc'
> --- a/sql/sql_class.cc	2009-07-10 23:12:13 +0000
> +++ b/sql/sql_class.cc	2009-07-22 08:23:31 +0000
> @@ -1790,6 +1790,8 @@ select_export::prepare(List<Item> &list,
>    if ((uint) strlen(exchange->file_name) + NAME_LEN >= FN_REFLEN)
>      strmake(path,exchange->file_name,FN_REFLEN-1);
>
> +  write_cs= exchange->cs ? exchange->cs : &my_charset_bin;
> +
>    if ((file= create_file(thd, path, exchange, &cache)) < 0)
>      return 1;
>    /* Check if there is any blobs in data */
> @@ -1809,6 +1811,21 @@ select_export::prepare(List<Item> &list,
>          non_string_results= TRUE;
>      }
>    }
> +  if (exchange->escaped->numchars() > 1 ||
> exchange->enclosed->numchars() > 1)
> +  {
> +    my_error(ER_WRONG_FIELD_TERMINATORS, MYF(0));
> +    return TRUE;
> +  }
> +  if (exchange->escaped->length() > 1 || exchange->enclosed->length()
> > 1 ||
> +      !my_isascii(exchange->escaped->ptr()[0]) ||
> +      !my_isascii(exchange->enclosed->ptr()[0]) ||
> +      !exchange->field_term->is_ascii() ||
> !exchange->line_term->is_ascii() ||
> +      !exchange->line_start->is_ascii())
> +  {
> +    push_warning(thd, MYSQL_ERROR::WARN_LEVEL_WARN,
> +                 WARN_NON_ASCII_SEPARATOR_NOT_IMPLEMENTED,
> +                 ER(WARN_NON_ASCII_SEPARATOR_NOT_IMPLEMENTED));
> +  }

please explain here too (in a comment) why you don't recode field
separators (number 3 from the changeset comment).

>    field_term_length=exchange->field_term->length();
>    field_term_char= field_term_length ?
>                     (int) (uchar) (*exchange->field_term)[0] : INT_MAX;
> @@ -1858,6 +1875,8 @@ bool select_export::send_data(List<Item>
>
>    DBUG_ENTER("select_export::send_data");
>    char buff[MAX_FIELD_WIDTH],null_buff[2],space[MAX_FIELD_WIDTH];
> +  char cvt_buff[MAX_FIELD_WIDTH];
> +  String cvt_str(cvt_buff, sizeof(cvt_buff), write_cs);
>    bool space_inited=0;
>    String tmp(buff,sizeof(buff),&my_charset_bin),*res;
>    tmp.length(0);
> @@ -1881,6 +1900,39 @@ bool select_export::send_data(List<Item>
>      bool enclosed = (exchange->enclosed->length() &&
>                       (!exchange->opt_enclosed || result_type == STRING_RESULT));
>      res=item->str_result(&tmp);
> +    if (res && !my_charset_same(write_cs, res->charset()) &&
> +        !my_charset_same(write_cs, &my_charset_bin))
> +    {
> +      char printable_buff[64];
> +      const char *well_formed_error_pos;
> +      const char *cannot_convert_error_pos;
> +      const char *from_end_pos;
> +      const char *error_pos;
> +      uint32 bytes;
> +      bytes= well_formed_copy_nchars(write_cs, cvt_buff, sizeof(cvt_buff),
> +                                     res->charset(), res->ptr(),
> res->length(),
> +                                     sizeof(cvt_buff),
> +                                     &well_formed_error_pos,
> +                                     &cannot_convert_error_pos,
> +                                     &from_end_pos);
> +      error_pos= well_formed_error_pos ? well_formed_error_pos
> +                                       : cannot_convert_error_pos;
> +      if (error_pos)
> +      {

would be logical to declare printable_buff[] here, not in the outer
scope, right ?

(I had to search to see where printable_buff is [not] used, as I
expected it to be used somewhere outside that if() as the code
suggested)

> +        const char *end= res->ptr() + res->length(), *orig_end= end;
> +        set_if_smaller(end, error_pos + 6);
> +        convert_to_printable(printable_buff,
> +                             printable_buff + sizeof(printable_buff),
> +                             error_pos, end, res->charset(), end < orig_end);
> +        push_warning_printf(thd, MYSQL_ERROR::WARN_LEVEL_WARN,
> +                            ER_TRUNCATED_WRONG_VALUE_FOR_FIELD,
> +                            ER(ER_TRUNCATED_WRONG_VALUE_FOR_FIELD),
> +                            "string", printable_buff,
> +                            item->name, row_count);
> +      }
> +      cvt_str.length(bytes);
> +      res= &cvt_str;
> +    }
>      if (res && enclosed)
>      {
>        if (my_b_write(&cache,(uchar*) exchange->enclosed->ptr(),
>
> === modified file 'sql/sql_string.cc'
> --- a/sql/sql_string.cc	2009-06-16 14:36:15 +0000
> +++ b/sql/sql_string.cc	2009-07-22 08:23:31 +0000
> @@ -1119,3 +1119,69 @@ void String::swap(String &s)
>    swap_variables(bool, alloced, s.alloced);
>    swap_variables(CHARSET_INFO*, str_charset, s.str_charset);
>  }
> +
> +
> +/**
> +  Convert string to printable ASCII string
> +
> +  @details This function converts input string "from" replacing non-ASCII bytes
> +  with hexadecimal sequences ("\xXX") optionally appending "..." to the end of
> +  the resulting string.

please add something about the usage ("... used in the error messages
... e.g.  when a string cannot be converted to a result charset...")

> +
> +  @param    to          output buffer
> +  @param    to_end      end of the output buffer
> +  @param    from        input string
> +  @param    from_end    end of the input string
> +  @param    from_cs     input charset
> +  @param    dots        append '...' to output if true
> +
> +  @return   number of bytes in the output string
> +*/
> +
> +uint convert_to_printable(char *to, const char *to_end,
> +                          const char *from, const char *from_end,
> +                          CHARSET_INFO *from_cs, bool dots)
> +{
> +  char *to_orig= to;
> +
> +  if (!from || to == to_end)
> +    return 0;
> +
> +  to_end--; // for '\0' at the end
> +
> +  for (; to < to_end && from < from_end; from++)
> +  {
....
> +  }
> +  if (dots && to_end - to >= 3)
> +  {
> +    *to++= '.';
> +    *to++= '.';
> +    *to++= '.';
> +  }
> +  *to= '\0';
> +  return to - to_orig;
> +}

it's a bit complicated semantics, you expect the caller to truncate the
original string (that caused an error), set from_end accordingly and set
'dots' to specify whether there was a truncation (whether from_end
points to the end of the string or not).

I think it's kind of more natural to have this logic in the
convert_to_printable() function. That is, it should take

   from, from_length (or from_end) -- real length/end of the string
   to, to_length (or end)

and the function decides automatically whether add dots or not.
Examples:

  to=char[10] buffer, to_length=10.

  from="abc" (that is, from_length=3)
  result is "abc" (to="abc")

  from="abc1234567890"
  result is "abc123..."

  from="123456789"
  result="123456789"

etc.

Regards / Mit vielen Grüßen,
Sergei

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik <serg@stripped>
 / /|_/ / // /\ \/ /_/ / /__  Principal Software Engineer/Server Architect
/_/  /_/\_, /___/\___\_\___/  Sun Microsystems GmbH, HRB München 161028
       <___/                  Sonnenallee 1, 85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Häring
Thread
bzr commit into mysql-5.1-bugteam branch (gshchepa:3034) Bug#30946Gleb Shchepa23 Jul
  • Re: bzr commit into mysql-5.1-bugteam branch (gshchepa:3034)Bug#30946Sergei Golubchik24 Jul