List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:November 12 2010 2:32pm
Subject:Re: bzr push into mysql-next-mr-bugfixing branch (guilhem:3228 to
3229)
View as plain text  
Hello,

Jorgen Loland a écrit, Le 09.11.2010 13:55:
> Review of changesets up to 3218:
> 
> === modified file 'mysql-test/r/optimizer_trace_no_prot.result'
> --- mysql-test/r/optimizer_trace_no_prot.result    2010-09-22 19:50:59 
> +0000
> +++ mysql-test/r/optimizer_trace_no_prot.result    2010-10-05 13:19:55 
> +0000
>  > @@ -1962,7 +1949,15 @@ QUERY_ID    TRACE    MISSING_BYTES_BEYOND_MAX_
>  >      }
>  >    ] /* steps */
>  >  }    0    0
>  > -set optimizer_trace="skip_plan=off";
>  > +set @@optimizer_trace_features="greedy_search=on,misc=off";
>  > +explain select * from t1,t2;
>  > +id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > +1    SIMPLE    t2    ALL    NULL    NULL    NULL    NULL    2   
>  > +1    SIMPLE    t1    ALL    NULL    NULL    NULL    NULL    3    
> Using join buffer (BNL, regular buffers)
>  > +select * from information_schema.OPTIMIZER_TRACE;
>  > +QUERY_ID    TRACE    MISSING_BYTES_BEYOND_MAX_MEM_SIZE    
> OS_MALLOC_ERROR
>  > +0        0    0
>  > +set @@optimizer_trace_features=default;
> 
> jl: So turning feature "misc" off is the same as disabling tracing? I
> don't find that all to intuitive. If this is correct (misc=off ==
> enable=off), why don't we remove the misc feature name?

What happens is that the top object (the {} around the entire trace) is 
in category "misc" (every structure needs to have a category...), so if 
you disable "misc", you disable this top object, and thus everything 
below it.
Indeed "misc" isn't intended to be used by users, so I will hide it.

>  > === modified file 'sql/opt_trace.cc'
>  > --- sql/opt_trace.cc    2010-09-22 19:50:59 +0000
>  > +++ sql/opt_trace.cc    2010-09-27 08:00:25 +0000
>  > @@ -86,15 +105,27 @@ void Opt_trace_ooa::do_construct(Opt_tra
>  >        */
>  >        size_t new_size= alloced + increment;
>  >        const size_t max_size= stmt->buffer.allowed_mem_size;
>  > -      if (new_size > max_size /* don't pre-allocate more than the 
> limit */ &&
>  > -          max_size >= alloced /* don't shrink string if limit 
> exceeded */ )
>  > +      if (new_size > max_size) // don't pre-allocate more than the 
> limit
>  >          new_size= max_size;
>  > -      stmt->buffer.realloc(new_size - 1 /* because String::realloc 
> has +1 */);
>  > +      const size_t overhead=
>  > +        1 + (ALIGN_SIZE(1) - 1) // see start of String::realloc()
>  > +        /*
>  > +          Stay strictly below max_size or it would trigger 
> truncation at
>  > +          next append(), which is potentially wrong for a "pre-emptive
>  > +          allocation" as we do here.
>  > +        */
>  > +        + 1;
> 
> jl: I have problems understanding the size of the overhead. Why is 1
> used as parameter to ALIGN_SIZE?

This is an horribly complicated piece of code on which I spent hours, 
until truncation finally happened when I expected :-(
Trying to remember...
The idea is that the MySQL user wants to limit memory size. So we should 
not be interested in the String's length but in the String's _allocated_ 
length (which is possibly bigger).
We are going to re-allocate the String to make room for more bytes (as 
you see we try to allocate in increments of 1024 to not have too 
frequent realloc()).
So the simple code would be:

// grow by 1024 bytes
size_t new_size= alloced + increment;
const size_t max_size= stmt->buffer.allowed_mem_size;
if (new_size > max_size) // don't pre-allocate more than the
                          // limit
   new_size= max_size;
stmt->buffer.realloc(new_size);
But String::realloc() does not reallocate "new_size" bytes, it has a 
more complex formula: it reallocates ALIGN_SIZE(length + 1).
So the simple code above will do
realloc(ALIGN_SIZE(new_size + 1)).
Imagine that max_size is 0 (because all previous traces, still kept due 
to optimizer_trace_limit and optimizer_trace_offset, have consumed all 
allowed memory). new_size is then 0, and the above realloc() allocates 8 
bytes. So for each new kept trace, we will eat 8 bytes. I found it was a 
"security issue", or at least not "works as expectable". I wanted that 
we never allocate if max_size is 0. Which is why we dive into the gory 
details of String::realloc(): we say that this function has an overhead 
of 1+(ALIGN_SIZE(1)-1). Ok I don't remember why I didn't write directly 
ALIGN_SIZE(1): this means 8 bytes (sizeof(double) exactly), and if you 
look at the start of String::realloc() this is indeed correct. Then I 
finally add 1, as the comment in code explains.
Let me know what you think; there is probably a simpler way to write 
this code...

>  > === modified file 'sql/opt_trace.h'
>  > --- sql/opt_trace.h    2010-09-22 19:50:59 +0000
>  > +++ sql/opt_trace.h    2010-09-27 08:00:25 +0000
>  > @@ -531,7 +552,7 @@ private:
>  >    /**
>  > -     Adds a value (of string type) to the OOA. If a key is 
> specified, then it
>  > -     is adds the key/value pair (the OOA must thus be an object). If 
> no key is
>  > -     specified, only the value is added (the OOA must thus be an 
> array).
>  > -     @param  value  the value
>  > -     @param  key    the optional key
>  > -     @return a reference to the OOA, useful for chaining like this:
>  > +     Adds a value (of string type) to the structure. A key is 
> specified, so it
>  > +     is adds the key/value pair (the structure must thus be an object).
>  > +     @param  key    key
>  > +     @param  value  value
>  > +     @return a reference to the structure, useful for chaining like 
> this:
>  >       @verbatim add(x,y).add(z,t).add(u,v) @endverbatim
> 
> jl: typo "so it is adds"

fixed

>  > @@ -881,40 +996,54 @@ public:
>  > /**
>  >    A JSON array (ordered set of values).
>  >    Defines only a constructor, all the rest in inherited from 
> Opt_trace_struct.
>  > */
> 
> jl: s/in/is

fixed

>  > -class Opt_trace_array: public Opt_trace_ooa
>  > +class Opt_trace_array: public Opt_trace_struct
>  >  {
>  >  public:
>  >    /**
>  > -     Constructs an array. If a key is specified, the array is the 
> value of a
>  > -     key/value pair (serves for adding the array to an object). 
> Otherwise the
>  > -     array is just a value (serves for adding the array to an array).
>  > +     Constructs an array. @sa Opt_trace_object::Opt_trace_object.
>  >       @param  ctx  context for this array
>  > -     @param  key  optional key
>  > +     @param  key  key
>  >    */
>  > -  Opt_trace_array(Opt_trace_context *ctx, const char *key= NULL) :
>  > -  Opt_trace_ooa(ctx, false, key) {}
>  > +  Opt_trace_array(Opt_trace_context *ctx, const char *key,
>  > +                  Opt_trace_context::feature_value feature=
>  > +                  Opt_trace_context::MISC) :
>  > +  Opt_trace_struct(ctx, false, key, feature) {}
>  > +  Opt_trace_array(Opt_trace_context *ctx,
>  > +                  Opt_trace_context::feature_value feature=
>  > +                  Opt_trace_context::MISC) :
>  > +  Opt_trace_struct(ctx, false, NULL, feature) {}
>  >  };
> 
> jl: Please document both constructors and explicitly state the
> difference between having and not having a key

done (copied from Opt_trace_object)

>  > @@ -980,15 +1111,15 @@ private:
>  >  /**
>  > -   Grabs the current OOA from the context and call its methods.
>  > -   Most of the time, there is a direct reference to this OOA available
>  > +   Grabs the current structure from the context and call its methods.
>  > +   Most of the time, there is a direct reference to this structure 
> available
>  >     (because it has been instantianted a few lines earlier), then 
> prefer using
>  >     it than this macro.
>  >  */
>  >  #define OPT_TRACE(trace, action) do 
> {                                  \
>  >      if (unlikely((trace) != 
> NULL))                                     \
>  >        if 
> (unlikely((trace)->is_started()))                             \
>  > -        
> (trace)->get_current_ooa()->action;                            \
>  > +        
> (trace)->get_current_struct()->action;                            \
>  >    } while (0)
> 
> jl: align final '\' with the others

fixed

>  > @@ -1054,34 +1185,54 @@ extern ST_FIELD_INFO optimizer_trace_inf
>  >  /* all empty */
>  >
>  >  class Opt_trace_context
>  > -{};
>  > +{
>  > +public:
>  > +  /* We need this one */
>  > +  enum feature_value { MISC= 1, GREEDY_SEARCH= 2 };
>  > +};
> 
> jl: suggestion: "We need this enum even if tracing is disabled"

fixed

>  > @@ -1120,8 +1271,8 @@ public:
>  >  #define 
> OPT_TRACE_TRANSFO(trace,object_level0,object_level1,select_number,from,to) 
> \
>  >      Opt_trace_object 
> object_level0(trace);                              \
>  >      Opt_trace_object object_level1(trace, 
> "transformation");            \
>  > -    object_level1.add((ulonglong)select_number, "select 
> #").            \
>  > -      add(from, "from").add(to, "to");
>  > +    object_level1.add("select #", select_number).          \
>  > +    add("from", from).add("to", to);
> 
> jl: Align '\'

fixed, by typing <TAB> in Emacs. This makes all \ be aligned except the 
one on the line of OPT_TRACE_TRANSFO, which Emacs put more to the right 
because the line is very long.

>  > --- sql/sql_view.cc    2010-08-30 06:38:09 +0000
>  > +++ sql/sql_view.cc    2010-10-05 10:29:43 +0000
>  > @@ -1138,6 +1139,12 @@ bool mysql_make_view(THD *thd, File_pars
>  >    table->definer.user.str= table->definer.host.str= 0;
>  >    table->definer.user.length= table->definer.host.length= 0;
>  >
>  > +  Opt_trace_object oto0(thd->opt_trace);
>  > +  Opt_trace_object oto1(thd->opt_trace, "view");
>  > +  oto1.add("db", table->db, table->db_length).
>  > +    add("name", table->table_name, table->table_name_length).
>  > +    add("in_select#", old_lex->select_lex.select_number);
>  > +
> 
> jl: This has been changed to add_str in a later cs, but don't we need
> to use add_str_escaped when we don't know the character set?

Normally in MySQL (I just checked with Bar the specialist), TABLE's and 
TABLE_LIST's db and table_name are in utf8. So we can pass them to 
add_str() without charset conversion; but you are right that they can 
contain characters like \n which must be escaped. I have put it in my 
TODO to reconsider all add_str* calls in code, to see which ones need 
escaping, and to add tests (see also same discussion in a review of your 
range opt patch).

> opt_trace-t.cc
> --------------
> 1) There are many add_str calls in the test. These could cover more
>    code if they were called with different parameters, specifically
>      a) with length parameter
>      b) with another character set/escaped characters
> 
> 2) Test get_current_struct() by add()-ing to it
> 
> 3) Test set_query() and then print it with "normal" and "strange"
>    character set.

Ok, will work on this.

> opt_trace2server.cc
> -------------------
>  > if (thd->opt_trace->start(need_it_for_I_S,
>  >                           (var & Opt_trace_context::FLAG_END_MARKER),
>  >                           (var & Opt_trace_context::FLAG_ONE_LINE),
>  >                           thd->variables.optimizer_trace_offset,
>  >                           thd->variables.optimizer_trace_limit,
>  >                           thd->variables.optimizer_trace_max_mem_size,
>  > 
> Opt_trace_context::feature_value(thd->variables.optimizer_trace_features)))
> 
> jl: long line

Yes... even just
Opt_trace_context::feature_value(thd->variables.optimizer_trace_features 
is 72 chars, so I don't see how to break the line properly. What would 
you suggest?

>  >  char buff[1024];
>  >  String str(buff,(uint32) sizeof(buff), system_charset_info);
>  >  str.length(0);
>  >  /**
>  >     If this statement is not SELECT, what is shown here can be
>  >     inexact. INSERT SELECT is shown as SELECT. DELETE WHERE is shown
>  >     as SELECT WHERE.
>  >     This is acceptable given the audience (developers) and the goal (the
>  >     inexact parts are irrelevant for the optimizer).
>  >  */
>  >  thd->lex->unit.print(&str, enum_query_type(QT_EXPAND_VIEW_DEFINITION
> |
>  >                                             QT_SHOW_SELECT_NUMBER));
> 
> jl: what happens if the expanded query turns out to be longer than
> 1024 characters?

Most print() uses String::append(), which calls String::realloc(), which 
has this logic:
if (this string's buffer was allocated) then realloc() it
else malloc() a new buffer and use it for the String.
Normally, when we exceed 1024 bytes for the first time, we fall into the 
"else" case.
The 1024-byte buffer is only, I think, an optimization to use the stack 
instead of the heap, most of the times. But is not a limitation.
It's by reading code that I say this, you may want to double-check, or 
ask me to add a test with a very long query and run with Valgrind.

I will make a commit when ready.
Thread
bzr push into mysql-next-mr-bugfixing branch (guilhem:3228 to 3229) Guilhem Bichot22 Oct
Re: bzr push into mysql-next-mr-bugfixing branch (guilhem:3228 to3229)Jorgen Loland26 Oct
  • Re: bzr push into mysql-next-mr-bugfixing branch (guilhem:3228 to3229)Guilhem Bichot5 Nov
    • Re: bzr push into mysql-next-mr-bugfixing branch (guilhem:3228 to3229)Jorgen Loland10 Nov
  • Re: bzr push into mysql-next-mr-bugfixing branch (guilhem:3228 to3229)Jorgen Loland9 Nov
    • Re: bzr push into mysql-next-mr-bugfixing branch (guilhem:3228 to3229)Guilhem Bichot12 Nov
      • Re: bzr push into mysql-next-mr-bugfixing branch (guilhem:3228 to3229)Jorgen Loland15 Nov
        • Re: bzr push into mysql-next-mr-bugfixing branch (guilhem:3228 to3229)Guilhem Bichot15 Nov
    • Re: bzr push into mysql-next-mr-bugfixing branch (guilhem:3228 to3229)Guilhem Bichot15 Nov