List:Commits« Previous MessageNext Message »
From:Jorgen Loland Date:October 18 2010 7:08am
Subject:Re: one trace per row scanned, by range optimizer
View as plain text  
Guilhem,

I started to look at the JSON syntax errors last week and already made an 
attempt to remedy the problem you're describing. Currently, I have this code:

static int
join_init_quick_read_record(JOIN_TAB *tab)
{
   /* This is for QS_DYNAMIC_RANGE, i.e., "Range checked for each
      record". The trace for the range analysis below this point will
      be printed with different ranges for every record to the left of
      this table in the join.
    */
   Opt_trace_object wrapper(tab->join->thd->opt_trace);
   Opt_trace_object trace_table(tab->join->thd->opt_trace,
                                "records_estimation_per_record",
                                Opt_trace_context::DYNAMIC_RANGE);
   trace_table.add("table", tab->table->alias);

   if (test_if_quick_select(tab) == -1)
     return -1;					/* No possible records */
   return join_init_read_record(tab);
}

Further, Opt_trace_context::DYNAMIC_RANGE is disabled by default. The sandbox 
I've done this in also contains a bunch of other subquery JSON fixes. It looks 
like this:

       "join_execution": {
         "select#": 1,
         "steps": [
           {                     <- much interesting in this block
             "subselect_single_select_exec_steps": [
               {
                 "join_optimization": {
                   <cut>
                 }
               },
               {
                 "join_execution": {
                   "select#": 2,
                   "steps": [
                     {
                       "records_estimation_per_record": "..."
                     }
                   ]
                 }
               }
             ]
           },
           {                     <- NOT much interesting in this block
             "subselect_single_select_exec_steps": [
               {
                 "join_execution": {
                   "select#": 2,
                   "steps": [
                     {
                       "records_estimation_per_record": "..."
                     }
                   ]
                 }
               }
             ]
           },
           {                     <- NOT much interesting in this block
             "subselect_single_select_exec_steps": [
               {
                 "join_execution": {
                   "select#": 2,
                   "steps": [
                     {
                       "records_estimation_per_record": "..."
                     }
                   ]
                 }
               }
             ]
           },
           {
             <and so on>


You can see from the above that even with traces disabling from and below 
join_init_quick_read_record(), we will still produce 14 lines for each record 
evaluated by dynamic range. Even this may be too much, but on the other hand it 
is important information that the range optimizer is in fact evaluated for each 
record. To reduce it even further, maybe we could disable tracing in 
subselect_*_engine::exec, but I haven't investigated if that makes sense yet. 
There may be other information we need from the subselect*::exec() functions, 
and especially the first time exec() is called. Maybe tracing can be disabled 
only on second execution and onwards?

So my reply is:
I think it should be possible to print this range information by enabling an 
optimizer feature. My suggestions:
   1) We can be satisfied with disabling tracing below
      join_init_quick...() and accept 14 lines of output per record, or
   2) We can disable tracing earlier, e.g. in subselect*::exec() the second
      time it is called and onwards. We could e.g. cut down to 3 lines
      (below) or hide exec completely by not tracing anything.

           {
             "subselect_single_select_exec_steps": "..."
           },
           {
             "subselect_single_select_exec_steps": "..."
           },
           {
             "subselect_single_select_exec_steps": "..."
           }


On 10/16/2010 06:12 PM, Guilhem Bichot wrote:
> Hello,
>
> More thoughts:
>
> Guilhem Bichot a écrit, Le 15.10.2010 20:45:
>> Hello,
>>
>> I'm attaching a result file for discussion.
>> I got it by using your patch and disabling the assertion in
>> Opt_trace_context::syntax_error(), so that the test can proceed further.
>> There is a first problem in "attached_conditions", but easy to solve
>> and I have just pushed a fix for it (revno 3221). The more interesting
>> happens at end of object "join_optimization" (look for the "invalid
>> JSON" words around there).
>>
>> For such query:
>>
>> select * from t1, t2
>> where t2.a between t1.a - interval 2 day and t1.a + interval 2 day;
>>
>> we can see that the range optimizer is called once per row of t1: we
>> read a row of t1, and we call the range optimizer with the range "2
>> days before or after this row of t1". This is the so-called "range
>> checked for each record" (visible in EXPLAIN). Two rows in t1, so two
>> pieces of trace related to the range optimizer.
>> In my test there are only two outer rows, but imagine if there would
>> be 1000 outer rows: we would have 1000 traces of the range optimizer.
>> What is the right approach?
>> 1) Try to disable tracing at the right moment, to avoid this?
>
> If we do this, the right moment could be in
> join_init_quick_read_record(), putting this
> Opt_trace_disable_I_S otd(tab->join->thd->opt_trace, true);
> at start of the function.
> That also fixes the syntax error in index_merge_myisam
> --opt-trace-protocol , where we have the range optimizer called once per
> evaluation of a subquery.
>
>> 2) Or let this tracing be, saying that the range optimizer, even
>> called 1000 times in the execution phase, is still an important
>> optimization information, and just count on truncation by
>> @@optimizer_trace_max_mem_size (default: 16kB) if things go
>> unreasonably big? That doesn't sound too bad to me. If we go this way,
>> I should define a proper container for those range-opt-at-exec-time
>> traces: a "join_execution" object in the trace seems mandated, with a
>> "steps" array underneath; each range optimizer trace becomes an
>> element of the array. There is another problem: traces generated by
>> execution have the potential to be big, so truncation by max_mem_size
>> would become frequent; a truncated trace is not parsable by a JSON
>> parser. So the only way for a user to get a parsable trace is to
>> increase max_mem_size until it's big enough to host the execution
>> trace, and it may be unreasonably big then. So maybe we need a flag in
>> JOIN::exec()'s top object ("EXECUTION"?), so that the user can request
>> no tracing in JOIN::exec() and its children. Then the trace stays
>> small, below max_mem_size, not truncated, and parsable.
>
> If we do this, then what should be the default for the "EXECUTION" flag:
> enabled or disabled? disabled is safer, and more in line with the name
> "optimizer trace". Its drawback is that we lose the join_optimization
> block of subqueries (as they are optimized only at execution of the top
> join).
>
>> Disabling tracing at the start of JOIN::exec() is not a solution: in
>> other queries (like the first one using a subquery, in
>> optimizer_trace_no_prot.test), JOIN::optimize for the subquery is
>> called by the top query's JOIN::exec... so if I disable tracing of
>> JOIN::exec(), I lose tracing of the subquery's JOIN::optimize().
>>
>> Do you have an opinion?
>
> Still asking the question :-)
> I am hesitant. I think that, to keep things under control, I would
> prefer to silence the execution phase, either by disabling the range opt
> calls done in execution, or by having the EXECUTION flag off by default.
>

-- 
Jørgen Løland | Senior Software Engineer | +47 73842138
Oracle MySQL
Trondheim, Norway
Thread
one trace per row scanned, by range optimizerGuilhem Bichot15 Oct
  • Re: one trace per row scanned, by range optimizerGuilhem Bichot16 Oct
    • Re: one trace per row scanned, by range optimizerJorgen Loland18 Oct
      • Re: one trace per row scanned, by range optimizerGuilhem Bichot21 Oct
        • Re: one trace per row scanned, by range optimizerJorgen Loland21 Oct
          • Re: one trace per row scanned, by range optimizerGuilhem Bichot22 Oct