List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:October 21 2010 8:19am
Subject:Re: one trace per row scanned, by range optimizer
View as plain text  
Hello Jorgen,

Jorgen Loland a écrit, Le 18.10.2010 09:08:
> Guilhem,
> 
> I started to look at the JSON syntax errors last week and already made 
> an attempt to remedy the problem you're describing. Currently, I have 
> this code:
> 
> static int
> join_init_quick_read_record(JOIN_TAB *tab)
> {
>   /* This is for QS_DYNAMIC_RANGE, i.e., "Range checked for each
>      record". The trace for the range analysis below this point will
>      be printed with different ranges for every record to the left of
>      this table in the join.
>    */
>   Opt_trace_object wrapper(tab->join->thd->opt_trace);
>   Opt_trace_object trace_table(tab->join->thd->opt_trace,
>                                "records_estimation_per_record",
>                                Opt_trace_context::DYNAMIC_RANGE);
>   trace_table.add("table", tab->table->alias);
> 
>   if (test_if_quick_select(tab) == -1)
>     return -1;                    /* No possible records */
>   return join_init_read_record(tab);
> }
> 
> Further, Opt_trace_context::DYNAMIC_RANGE is disabled by default.

ok, I think we need this indeed.

> The 
> sandbox I've done this in also contains a bunch of other subquery JSON 
> fixes. It looks like this:
> 
>       "join_execution": {
>         "select#": 1,
>         "steps": [
>           {                     <- much interesting in this block
>             "subselect_single_select_exec_steps": [
>               {
>                 "join_optimization": {
>                   <cut>
>                 }
>               },
>               {
>                 "join_execution": {
>                   "select#": 2,
>                   "steps": [
>                     {
>                       "records_estimation_per_record": "..."
>                     }
>                   ]
>                 }
>               }
>             ]
>           },
>           {                     <- NOT much interesting in this block
>             "subselect_single_select_exec_steps": [
>               {
>                 "join_execution": {
>                   "select#": 2,
>                   "steps": [
>                     {
>                       "records_estimation_per_record": "..."
>                     }
>                   ]
>                 }
>               }
>             ]
>           },
>           {                     <- NOT much interesting in this block
>             "subselect_single_select_exec_steps": [
>               {
>                 "join_execution": {
>                   "select#": 2,
>                   "steps": [
>                     {
>                       "records_estimation_per_record": "..."
>                     }
>                   ]
>                 }
>               }
>             ]
>           },
>           {
>             <and so on>
> 
> 
> You can see from the above that even with traces disabling from and 
> below join_init_quick_read_record(), we will still produce 14 lines for 
> each record evaluated by dynamic range.

That's too much.

> Even this may be too much, but 
> on the other hand it is important information that the range optimizer 
> is in fact evaluated for each record.

Yes. User needs to be told about this, but not 1000s of times.

> To reduce it even further, maybe 
> we could disable tracing in subselect_*_engine::exec, but I haven't 
> investigated if that makes sense yet. There may be other information we 
> need from the subselect*::exec() functions, and especially the first 
> time exec() is called. Maybe tracing can be disabled only on second 
> execution and onwards?

I'm thinking about this:
have a counter in JOIN (or in select_lex), counting the number of times 
this JOIN's exec() has been called; when the number exceeds a maximum 
(1?), we don't trace this JOIN's exec() anymore. The idea being that the 
first call brings useful info, but the subsequent calls don't. Then we 
probably need an option to set this counter (for the case where a bug 
happens in the 36th exec() of a JOIN and we want to trace this)?

> So my reply is:
> I think it should be possible to print this range information by 
> enabling an optimizer feature. My suggestions:
>   1) We can be satisfied with disabling tracing below
>      join_init_quick...() and accept 14 lines of output per record, or

sounds too much to me

>   2) We can disable tracing earlier, e.g. in subselect*::exec() the second
>      time it is called and onwards. We could e.g. cut down to 3 lines
>      (below) or hide exec completely by not tracing anything.
> 
>           {
>             "subselect_single_select_exec_steps": "..."
>           },
>           {
>             "subselect_single_select_exec_steps": "..."
>           },
>           {
>             "subselect_single_select_exec_steps": "..."
>           }

even an ellipsis repeated for each record could lead to a big trace. I 
would go silent after the first record.

> 
> 
> On 10/16/2010 06:12 PM, Guilhem Bichot wrote:
>> Hello,
>>
>> More thoughts:
>>
>> Guilhem Bichot a écrit, Le 15.10.2010 20:45:
>>> Hello,
>>>
>>> I'm attaching a result file for discussion.
>>> I got it by using your patch and disabling the assertion in
>>> Opt_trace_context::syntax_error(), so that the test can proceed further.
>>> There is a first problem in "attached_conditions", but easy to solve
>>> and I have just pushed a fix for it (revno 3221). The more interesting
>>> happens at end of object "join_optimization" (look for the "invalid
>>> JSON" words around there).
>>>
>>> For such query:
>>>
>>> select * from t1, t2
>>> where t2.a between t1.a - interval 2 day and t1.a + interval 2 day;
>>>
>>> we can see that the range optimizer is called once per row of t1: we
>>> read a row of t1, and we call the range optimizer with the range "2
>>> days before or after this row of t1". This is the so-called "range
>>> checked for each record" (visible in EXPLAIN). Two rows in t1, so two
>>> pieces of trace related to the range optimizer.
>>> In my test there are only two outer rows, but imagine if there would
>>> be 1000 outer rows: we would have 1000 traces of the range optimizer.
>>> What is the right approach?
>>> 1) Try to disable tracing at the right moment, to avoid this?
>>
>> If we do this, the right moment could be in
>> join_init_quick_read_record(), putting this
>> Opt_trace_disable_I_S otd(tab->join->thd->opt_trace, true);
>> at start of the function.
>> That also fixes the syntax error in index_merge_myisam
>> --opt-trace-protocol , where we have the range optimizer called once per
>> evaluation of a subquery.
>>
>>> 2) Or let this tracing be, saying that the range optimizer, even
>>> called 1000 times in the execution phase, is still an important
>>> optimization information, and just count on truncation by
>>> @@optimizer_trace_max_mem_size (default: 16kB) if things go
>>> unreasonably big? That doesn't sound too bad to me. If we go this way,
>>> I should define a proper container for those range-opt-at-exec-time
>>> traces: a "join_execution" object in the trace seems mandated, with a
>>> "steps" array underneath; each range optimizer trace becomes an
>>> element of the array. There is another problem: traces generated by
>>> execution have the potential to be big, so truncation by max_mem_size
>>> would become frequent; a truncated trace is not parsable by a JSON
>>> parser. So the only way for a user to get a parsable trace is to
>>> increase max_mem_size until it's big enough to host the execution
>>> trace, and it may be unreasonably big then. So maybe we need a flag in
>>> JOIN::exec()'s top object ("EXECUTION"?), so that the user can request
>>> no tracing in JOIN::exec() and its children. Then the trace stays
>>> small, below max_mem_size, not truncated, and parsable.
>>
>> If we do this, then what should be the default for the "EXECUTION" flag:
>> enabled or disabled? disabled is safer, and more in line with the name
>> "optimizer trace". Its drawback is that we lose the join_optimization
>> block of subqueries (as they are optimized only at execution of the top
>> join).
>>
>>> Disabling tracing at the start of JOIN::exec() is not a solution: in
>>> other queries (like the first one using a subquery, in
>>> optimizer_trace_no_prot.test), JOIN::optimize for the subquery is
>>> called by the top query's JOIN::exec... so if I disable tracing of
>>> JOIN::exec(), I lose tracing of the subquery's JOIN::optimize().
>>>
>>> Do you have an opinion?
>>
>> Still asking the question :-)
>> I am hesitant. I think that, to keep things under control, I would
>> prefer to silence the execution phase, either by disabling the range opt
>> calls done in execution, or by having the EXECUTION flag off by default.
>>
> 


-- 
Mr. Guilhem Bichot <guilhem.bichot@stripped>
Oracle / MySQL / Optimizer team, Lead Software Engineer
Bordeaux, France
www.oracle.com / www.mysql.com
Thread
one trace per row scanned, by range optimizerGuilhem Bichot15 Oct
  • Re: one trace per row scanned, by range optimizerGuilhem Bichot16 Oct
    • Re: one trace per row scanned, by range optimizerJorgen Loland18 Oct
      • Re: one trace per row scanned, by range optimizerGuilhem Bichot21 Oct
        • Re: one trace per row scanned, by range optimizerJorgen Loland21 Oct
          • Re: one trace per row scanned, by range optimizerGuilhem Bichot22 Oct