From: Jorgen Loland
Date: November 17 2010 12:24pm
Subject: Re: bzr commit into mysql-next-mr-bugfixing branch (jorgen.loland:3229) WL#5594
Hi Guilhem,

On 11/10/2010 11:05 PM, Guilhem Bichot wrote:
 > Hello Jorgen,
 >
 > Jorgen Loland wrote, on 22.10.2010 10:31:
 > GB the condition is printed as
 > "c <= ��"
 > in "ranges", but as
 > "(`test`.`t1`.`c` < '?????')"
 > in "condition_processing"... what is correct and what is wrong...
 > The "ranges" one shows 2 symbols, though in the condition
 > there were 3 symbols...
 > I propose that I address this myself later (see further below).

I vote: yes, please :)

 >  > === modified file 'mysql-test/r/optimizer_trace_no_prot.result'
 >
 > GB optimizer_trace_ps_prot.result also needs an update.
 >
 >  > --- a/mysql-test/r/optimizer_trace_no_prot.result 2010-10-21 15:22:32
 > +0000
 >  > +++ b/mysql-test/r/optimizer_trace_no_prot.result 2010-10-22 08:31:43
 > +0000
 >  > @@ -1,5 +1,6 @@
 >  > select * from information_schema.OPTIMIZER_TRACE;
 >  > QUERY TRACE MISSING_BYTES_BEYOND_MAX_MEM_SIZE OS_MALLOC_ERROR
 >  > +set optimizer_trace_max_mem_size=1048576;
 >  > set @@session.optimizer_trace="enabled=on";
 >  > select * from information_schema.OPTIMIZER_TRACE;
 >  > QUERY TRACE MISSING_BYTES_BEYOND_MAX_MEM_SIZE OS_MALLOC_ERROR
 >  > @@ -40,6 +41,13 @@ SELECT (SELECT 1 FROM D WHERE d = c) AS
 >  > "C"
 >  > ],
 >  > "records_estimation": [
 >  > + {
 >  > + "database": "test",
 >  > + "table": "C",
 >  > + "records": 1,
 >  > + "cost": 1,
 >  > + "cause": "system_table"
 >
 > GB this is the "cause"... of what?

changed to table_type: (system|const)_table
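
So the entry quoted above now reads something like this (sketch,
taking the new value names literally):

   {
     "database": "test",
     "table": "C",
     "records": 1,
     "cost": 1,
     "table_type": "system_table"
   }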

 >  > + ] /* ranges */
 >  > + } /* range_access_plan */,
 >  > + "records_in_plan": 1,
 >  > + "cost_of_plan": 2.21,
 >
 > GB best_extension_by_limited_search() uses "cost_for_plan" and
 > "records_for_plan". There is also "cumulated_cost/records"
 > elsewhere. We should standardize on one pair of words to represent
 > this concept of "cost after including previous costs".

Changed to records_for_plan/cost_for_plan, but this is not the same as
cumulated*, because there is no adding happening here. We are simply
stating "this is the plan we've chosen, and this is the number of
records/the cost for it". Thus, the cumulated* labels remain separate.

 >  > + "chosen": true
 >  > + } /* chosen_range_access_summary */
 >  > + } /* range_analysis */
 >  > }
 >  > ] /* steps */
 >  > } 0 0
 >  > @@ -3730,6 +4248,55 @@ delete from D where d=5 {
 >  > "steps": [
 >  > {
 >  > "expanded_query": "/* select#1 */ select from dual where (`d` = 5)"
 >  > + },
 >  > + {
 >  > + "range_analysis": {
 >  > + "table_scan": {
 >  > + "records": 2,
 >
 > GB there is no indication of the table, but as this is a single-table
 > DELETE anyway, it's ok. Or should we give it a longer introduction,
 > like in SELECT, which has:
 > {
 > "records_estimation": [
 > {
 > "database": "test",
 > "table": "D",
 > "range_analysis": {
 > ?

Ok, done. But note that we only enter the optimizer this way if it is
a single-table operation. Multi-table operations enter the optimizer
the "normal" way:

delete t1 from t1,t2 where t1.i>1;
SELECT trace FROM information_schema.optimizer_trace /* injected by 
--opt-trace-protocol */;
trace
{
   "steps": [
     {
       "expanded_query": "/* select#1 */ select NULL AS `NULL` from `test`.`t1` 
join `test`.`t2` where (`t1`.`i` > 1)"
     },
     {
       "join_preparation": {
         "select#": 1,
         "steps": [
         ]
       }
     },
     {
       "join_optimization": {
...

 >  > @@ -5477,6 +6425,7 @@ explain select * from v1 where id="b" {
 >  > ] /* constant_tables */,
 >  > "records_estimation": [
 >  > {
 >  > + "database": "test",
 >
 > GB Question: should we omit printing the database if it's equal to
 > the default database (THD::db)?
 > Pros: in 99% of our debugging cases, there is a single database, so we
 > wouldn't see "database" in the traces at all
 > Cons: less uniform.

I think no. Let's keep it simple and uniform.

 >  > "table": "t1",
 >  > "table_scan": {
 >  > "records": 3,
 >
 >  > @@ -5548,6 +6500,7 @@ explain select * from v1 where id="b" {
 >  > ] /* constant_tables */,
 >  > "records_estimation": [
 >  > {
 >  > + "database": "",
 >  > "table": "v1",
 >
 > GB The view "v1" is said to have database "". It's not very intuitive;
 > I assume it's because the optimizer is rather using a temp table
 > materializing the view, and such internal temp tables don't have a
 > database.
 > So, probably nothing to worry about.

Indeed. The trace starts with

     {
       "view": {
         "db": "test",
         "name": "v1",
         "in_select#": 1,
         "materialized": true
       } /* view */
     },

Maybe we should do the same as we do for fake selects and print
something more understandable? I'm not sure if this is possible; I
didn't check. I notice that the database and table name are empty for
fake selects as well, by the way. Whatever we choose to do about the
empty name (if anything) should be done the same way in both cases.

 >  > === added file 'mysql-test/r/optimizer_trace_range.result'
 >  > + "ranges": [
 >  > + "key2 < 5",
 >  > + "1020 < key2"
 >
 > GB here we have a nice comma-delimited list of ranges...
 >
 >  > +EXPLAIN SELECT * FROM t2 WHERE key1a = 5 and key1b < 10 {
 >  > + "steps": [
 >  > + {
 >  > + "expanded_query": "/* select#1 */ select `*` AS `*` from
 > `test`.`t2` where ((`key1a` = 5) and (`key1b` < 10))"
 >  > + },
 > ...
 >  > + "group_index_range": {
 >  > + "chosen": false,
 >  > + "cause": "not_group_by_or_distinct"
 >  > + } /* group_index_range */,
 >  > + "analyzing_range_alternatives": {
 >  > + "range_scan_alternatives": [
 >  > + {
 >  > + "index": "PRIMARY",
 >  > + "ranges": [
 >  > + "5 <= key1a <= 5 : key1b < 10"
 >
 > GB ... but here we have it delimited with ":" ?

Yes we do, and that's on purpose :-)

Comma-delimited means different ranges; colon-delimited means multiple
conditions in the same range. So

ranges [
   col1 < 5 : col2 = 5,
   7 <= col1 : col2 = 5
]

would mean the two ranges we would get from

where (
        (col1 < 5 OR col1 >= 7)
          AND
         col2 = 5
      )
(Assuming, of course, that both col1 and col2 are covered by the
index).

If you have suggestions for how to make this more self-explanatory,
I'm all ears.

 >  > + for (st_ror_scan_info **cur_scan= first_scan;
 >  > + cur_scan != last_scan;
 >  > + cur_scan++)
 >  > + {
 >  > + const KEY cur_key= param->table->key_info[(*cur_scan)->keynr];
 >  > + const KEY_PART_INFO *key_part= cur_key.key_part;
 >  > +
 >  > + Opt_trace_object trace_isect_idx(trace_ctx);
 >  > + trace_isect_idx.add_str("type", "range_scan").
 >  > + add_str("index", cur_key.name, strlen(cur_key.name)).
 >
 > GB there are few add_str() calls in the patch, they may need escaping
 > or not. The current API is that
 > - add_str() (with or without a "length" parameter) expects UTF8
 > without special characters (like, no \n), so does no escaping.
 > - add_str_escaped() expects UTF8 and does escaping (of
 > \n and other special characters)
 > So add_str_escaped() is when data may contain "whatever" (a range
 > would fall in this category: it can contain a varchar with \n in the
 > middle...).
 > add_str_escaped() is slower (tests each byte for escaping), which is
 > why add_str() exists.
 > add_str() is ok for hard-coded strings made of A-Za-z0-9_-
 > (non-special ASCII characters) like
 > trace->add_str("type", "index_roworder_intersect")
 > or
 > trace_summary.add_str("cause", cause);
 > When the string is variable and user-provided, like the cur_key.name
 > above, we should use add_str_escaped(). Because it's possible to
 > create an index named index\n1, as Bar showed to me:
 > create table... index `index
 > 1`));
 > Don't bother with fixing this now. I can, after your push, make a pass
 > over all add_str() calls and fix what needs to be (there are problems
 > in my code too).
 > There are also less performant solutions (but less a burden on the
 > developer): I can remove add_str_escaped() and make add_str() always
 > do escaping. Feels stupid to spend CPU time on escaping "cost" which
 > doesn't need to be escaped, though.
 > What do you think?

The risk is high that we'll call add_str() in some places where
add_str_escaped() is really needed, but on the other hand the
consequences are almost negligible as I see it (we'll get some strange
characters from a few trace locations, but then we notice it and fix
it immediately). However, my impression is that the majority of (or
even all?) the strings we get from the server (typically names) will
need to be escaped anyway.

If it is correct that almost all add_str() calls, except the ones with
"static labels" ("cause", "cost"), need escaping, we also have these
possibilities (ignore if the assumption is not true):

1)
Merge the two functions, add_str() and add_str_escaped(), into one,
and add a defaulted bool argument that we override only in the
"labeling" cases:

   Opt_trace_struct& add_str(const char *key,
                             const char *value, size_t val_length,
                             bool needs_escape= true)

or 2)
rename so that add_str() is for strings that need escaping and
add_str_noescape() is for the others.

Both make it more explicit in the code that the implementer really
intended to use the unescaped version of add.
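
For illustration only, option 2 would look roughly like this at the
call sites (add_str_noescape() is just a sketched name, not existing
code):

   /* user-provided string: escaped by default */
   trace_idx.add_str("index", cur_key.name, strlen(cur_key.name));
   /* hard-coded label: explicitly opt out of escaping */
   trace_summary.add_str_noescape("cause", "cost");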

 >  > + void trace_basic_info(const PARAM *param, Opt_trace_object *trace)
 > const
 >  > + {
 >  > + trace->add_str("type", "index_group").
 >  > + add_str("index", index_info->name, strlen(index_info->name)).
 >  > + add_str("group_attribute", min_max_arg_part->field->field_name,
 >  > + strlen(min_max_arg_part->field->field_name)).
 >  > + add("min_aggregate", have_min).
 >  > + add("max_aggregate", have_max).
 >  > + add("distinct_aggregate", have_agg_distinct).
 >  > + add("key_parts_used_for_access", used_key_parts).
 >
 > GB when printing "potential_range_indices" key_part is printed in a
 > human-readable form (names of columns), should be do the same here
 > instead of printing a number?

Good idea. Done
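
I.e. the group_index trace now lists the key parts by column name
rather than as a count, roughly like this (made-up column names):

   "key_parts_used_for_access": [
     "col1",
     "col2"
   ]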

 >  > @@ -2494,18 +2777,26 @@ int SQL_SELECT::test_quick_select(THD *t
 >  > }
 >  > }
 >  >
 >  > -#ifndef NO_OPT_TRACE_FOR_RANGE_OPT
 >  > - oto.add(param.table->quick_condition_rows, "quick_condition_rows");
 >  > -#endif
 >  > - free_mem:
 >  > +free_mem:
 >  > + if (quick)
 >  > + {
 >  > + Opt_trace_object trace_range_summary(thd->opt_trace,
 >  > + "chosen_range_access_summary");
 >  > + {
 >  > + Opt_trace_object trace_range_plan(thd->opt_trace,
 >  > + "range_access_plan");
 >  > + best_trp->trace_basic_info(&param, &trace_range_plan);
 >  > +
 >  > + }
 >  > + trace_range_summary.add("records_in_plan", quick->records).
 >  > + add("cost_of_plan", quick->read_time).
 >  > + add("chosen", true);
 >  > + }
 >  > +
 >  > free_root(&alloc,MYF(0)); // Return memory & allocator
 >  > thd->mem_root= param.old_root;
 >  > thd->no_errors=0;
 >  > }
 >  > -#ifndef NO_OPT_TRACE_FOR_RANGE_OPT
 >  > - oto.add(records, "records");
 >  > - OPT_TRACE2(thd->opt_trace, print_quick(quick, &needed_reg));
 >
 > GB after that change, print_quick() is not used anymore; could you
 > please delete it (I get a compiler warning)? Or if it's code which we
 > want to keep around, maybe disable it inside "#if 0"?
 > It may be a good time to diff opt_range.cc between opt-backporting and
 > your tree: maybe some old 2009 prototype stuff is not needed
 > anymore?

We agreed to revert print_quick() to how it works in opt-backporting,
so this is on the todo list. Until we get time to look into that, the
function is inside "#if 0" as suggested.

 >  > @@ -4683,28 +5003,42 @@ TRP_ROR_INTERSECT *get_best_ror_intersec
 >  > ROR_SCAN_INFO **intersect_scans_best;
 >  > cur_ror_scan= tree->ror_scans;
 >  > intersect_scans_best= intersect_scans;
 >  > - while (cur_ror_scan != tree->ror_scans_end &&
!intersect->is_covering)
 >  > {
 >  > - /* S= S + first(R); R= R - first(R); */
 >  > - if (!ror_intersect_add(intersect, *cur_ror_scan, FALSE))
 >  > + Opt_trace_array ota(param->thd->opt_trace, "intersecting_indices");
 >  > + while (cur_ror_scan != tree->ror_scans_end &&
!intersect->is_covering)
 >  > {
 >  > - cur_ror_scan++;
 >  > - continue;
 >  > - }
 >  > + Opt_trace_object trace_idx(param->thd->opt_trace);
 >  > + char *idx_name=
param->table->key_info[(*cur_ror_scan)->keynr].name;
 >  > + trace_idx.add_str("index", idx_name, strlen(idx_name));
 >  > + /* S= S + first(R); R= R - first(R); */
 >  > + if (!ror_intersect_add(intersect, *cur_ror_scan, FALSE))
 >  > + {
 >  > + trace_idx.add("used_in_intersect", false).
 >  > + add_str("cause", "does_not_reduce_cost_of_intersect");
 >
 > GB would add_str("cause, "cost") be clear enough? Maybe not...

In this case I think we add valuable information.

 >  > @@ -4728,9 +5062,22 @@ TRP_ROR_INTERSECT *get_best_ror_intersec
 >  > if (ror_intersect_add(intersect, cpk_scan, TRUE) &&
 >  > (intersect->total_cost < min_cost))
 >  > {
 >  > + Opt_trace_object (param->thd->opt_trace, "clustered_pk").
 >  > + add("cpk_scan_added_to_intersect", true).
 >  > + add("cumulated_cost", intersect->total_cost);
 >  > cpk_scan_used= TRUE;
 >  > intersect_best= intersect; //just set pointer here
 >  > }
 >  > + else
 >  > + Opt_trace_object (param->thd->opt_trace, "clustered_pk").
 >  > + add("cpk_added_to_intersect", false).add_str("cause", "cost");
 >  > + }
 >  > + else
 >  > + {
 >  > + Opt_trace_object trace_cpk(param->thd->opt_trace, "clustered_pk");
 >  > + trace_cpk.add("cpk_added_to_intersect", false);
 >  > + cpk_scan ? trace_cpk.add_str("cause", "ror_is_covering")
 >  > + : trace_cpk.add_str("cause", "no_clustered_pk_index");
 >
 > GB The two last lines could be
 > trace_cpk.add_str("cause", cpk_scan ? "ror_is_covering" :
 > "no_clustered_pk_index");
 > (I guess it generates less function calls).

I don't see how. My code checks the cpk_scan variable and calls
add_str() once; your code checks the cpk_scan variable and calls
add_str() once. What am I missing?

 >  > /* Ok, return ROR-intersect plan if we have found one */
 >  > @@ -4755,10 +5102,18 @@ TRP_ROR_INTERSECT *get_best_ror_intersec
 >  > trp->records= best_rows;
 >  > trp->index_scan_costs= intersect_best->index_scan_costs;
 >  > trp->cpk_scan= cpk_scan_used? cpk_scan: NULL;
 >  > +
 >  > + trace_ror.add("records", trp->records).
 >  > + add("cost", trp->read_cost).
 >  > + add("covering", trp->is_covering).
 >  > + add("chosen", true);
 >  > +
 >  > DBUG_PRINT("info", ("Returning non-covering ROR-intersect plan:"
 >  > "cost %g, records %lu",
 >  > trp->read_cost, (ulong) trp->records));
 >  > }
 >  > + else
 >  > + trace_ror.add("chosen", false);
 >
 > GB the two trace_ror.add("chosen", true/false) could be replaced with
 > one final
 > trace_ror.add("chosen", trp != NULL);

Yes, but do you think that increases readability? I think the explicit
printing of true/false is easier.

 >  > @@ -6121,6 +6492,11 @@ get_mm_leaf(RANGE_OPT_PARAM *param, Item
 >  > {
 >  > if (type == Item_func::LT_FUNC || type == Item_func::LE_FUNC)
 >  > {
 >  > + Opt_trace_object (param->thd->opt_trace, "impossible_condition",
 >  > + Opt_trace_context::RANGE_OPTIMIZER).
 >  > + add_str("table", *field->table_name, strlen(*field->table_name)).
 >  > + add_str("field", field->field_name, strlen(field->field_name)).
 >  > + add_str("cause", "unsigned_int_cannot_be_negative");
 >
 > GB We have three such blocks which follow the same pattern: they
 > create "impossible_condition", fill "table" and "field" and then
 > "cause", they differ only by "cause", and then go to label
 > "end". I propose to reduce code duplication by setting a "const char
 > *cause" to the cause, and doing the duplicated trace code at label
 > "end".
 > I worry a bit about growing the code's size too much: unlike DBUG, the
 > optimizer trace code will be put into release binaries where speed
 > matters. My nightmare is that the QA team runs a benchmark and finds a
 > notable difference between when opt trace is not compiled in and when
 > it is compiled in.

Done, but to what extent does program size degrade performance when
there is no difference in the functions that are called? I thought
this was negligible, but I may be wrong.
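
For reference, the dedup follows the pattern you suggested, roughly
like this (sketch, not the exact patch):

   const char *impossible_cond_cause= NULL;
   /* ... */
   if (type == Item_func::LT_FUNC || type == Item_func::LE_FUNC)
   {
     impossible_cond_cause= "unsigned_int_cannot_be_negative";
     goto end;
   }
   /* the other two blocks only set a different cause string */
   /* ... */
 end:
   if (impossible_cond_cause != NULL)
   {
     Opt_trace_object wrapper(param->thd->opt_trace, "impossible_condition",
                              Opt_trace_context::RANGE_OPTIMIZER);
     wrapper.add_str("table", *field->table_name, strlen(*field->table_name)).
       add_str("field", field->field_name, strlen(field->field_name)).
       add_str("cause", impossible_cond_cause);
   }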

 >  > @@ -7632,8 +8008,26 @@ range_seq_t sel_arg_range_seq_init(void
 >  > }
 >  >
 >  >
 >  > -static void step_down_to(SEL_ARG_RANGE_SEQ *arg, SEL_ARG *key_tree)
 >  > +static void step_down_to(String *s, SEL_ARG_RANGE_SEQ *arg, SEL_ARG
 > *key_tree)
 >  > {
 >  > +
 >  > + if (arg->param->thd->opt_trace &&
 > arg->param->thd->opt_trace->is_started())
 >
 > GB this will not compile if OPTIMIZER_TRACE is not defined (cmake
 > -DOPTIMIZER_TRACE=0).

Added

#ifdef OPTIMIZER_TRACE

around this and the other opt_trace->is_started() calls.
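
I.e. the check is now wrapped roughly like this (sketch):

   #ifdef OPTIMIZER_TRACE
   if (arg->param->thd->opt_trace &&
       arg->param->thd->opt_trace->is_started())
   {
     /* build the human-readable range description for the trace */
   }
   #endif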

 >  > DBUG_RETURN(NULL);
 >  > -
 >  > + }
 >  > /* The query passes all tests, so construct a new TRP object. */
 >
 >  > void QUICK_RANGE_SELECT::dbug_dump(int indent, bool verbose)
 >  > {
 >  > Opt_trace_context *out= current_thd->opt_trace;
 >  > + Opt_trace_object oto(out, "range_scan");
 >  > + oto.add_str("index", head->key_info[index].name,
 >
 > GB as agreed in a previous mail, we should revert that to their state
 > of opt-backporting.
 > I believe that, as print_quick() is unused now, dbug_dump() is unused
 > except in TEST_join() (which prints to DBUG); but maybe even that is
 > redundant and we don't need to call it from TEST_join() anymore,
 > because the range optimizer traces go to DBUG? In that case, we could
 > remove dbug_dump() functions... Let me know if you want to do it, or I
 > should put it on my TODO.

This is already an item in the todo list.

 >  > void QUICK_ROR_INTERSECT_SELECT::dbug_dump(int indent, bool verbose)
 >  > {
 >  > -#ifndef NO_OPT_TRACE_FOR_RANGE_OPT
 >  > Opt_trace_context *out= current_thd->opt_trace;
 >  > -#endif
 >  > List_iterator_fast<QUICK_RANGE_SELECT> it(quick_selects);
 >  > QUICK_RANGE_SELECT *quick;
 >  > -#ifndef NO_OPT_TRACE_FOR_RANGE_OPT
 >  > - Opt_trace_object oto(out, "index_merge_intersect");
 >  > -#endif
 >  > + Opt_trace_array ota(out, "index_roworder_intersect");
 >  > while ((quick= it++))
 >  > + {
 >  > + Opt_trace_object wrapper(out);
 >  > quick->dbug_dump(indent+2, verbose);
 >  > + }
 >  > if (cpk_quick)
 >  > {
 >  > -#ifndef NO_OPT_TRACE_FOR_RANGE_OPT
 >  > Opt_trace_object oto2(out, "clustered_pk_scan");
 >
 > GB sometimes we use "cpk" sometimes "clustered_pk", could we
 > standardize?

cpk changed to clustered_pk

 >  > + test_quick_select adds tracing with keys, and we are currently
 >  > + in a trace array that does not accept keys.
 >  > + */
 >  > + Opt_trace_object wrapper(thd->opt_trace);
 >  > return test_quick_select(thd, tmp, 0, limit, force_quick_range,
 > FALSE) < 0;
 >  > }
 >
 > GB how about leaving check_quick() as it was, and rather creating the
 > wrapper object directly in the UPDATE and DELETE code? I think it would be
 > more logical. Otherwise it introduces a kind of "API difference"
 > between check_quick() and test_quick_select(): both functions become
 > different because one can be called in a context and the other
 > cannot.

OK, done. Now also printing the database and table name in the
locations where the wrappers are added.
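
Roughly, at those call sites in the single-table UPDATE/DELETE code it
now looks like this (sketch only; exact member names may differ):

   Opt_trace_object trace_wrapper(thd->opt_trace);
   trace_wrapper.add_str("database", table_list->db, strlen(table_list->db)).
     add_str("table", table_list->table_name, strlen(table_list->table_name));
   /* ... followed by the existing check_quick()/test_quick_select() call */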

 >  > @@ -7434,6 +7509,7 @@ best_access_path(JOIN *join,
 >  > }
 >  > else
 >  > {
 >  > + trace_access_scan.add_str("access_type", "scan");
 >
 > GB do we, further down in the trace, tell the user whether
 > we decided to use _table_ scan or _index_ scan?

This decision is made in make_join_readinfo(). I added a tracepoint to
this method ("refine_plan") that currently does nothing more than
output whether a table scan or an index scan will be used. We probably
want to add more info to this trace object later (but maybe not as a
first priority).
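
For now the new trace object is minimal, something like this (key
names are illustrative only):

   "refine_plan": [
     {
       "table": "t1",
       "access_type": "table_scan"
     }
   ]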


 >  > /* Estimate cost of reading table. */
 >  > if (s->table->force_index && !best_key)
 >  > tmp= s->table->file->read_time(s->ref.key, 1, s->records);
 >
 > GB A comment about SQL_SELECT::test_quick_select(): it looks like we
 > generate different plans, print the description of each plan to the
 > trace, and at function's the very end, print the chosen one. Thus the
 > chosen one is printed twice (when it's considered, and in
 > conclusion). An alternative (which I used when advance_sj_state()
 > tries semijoin strategies one after the other), would be to not print
 > at the very end, but when we generate each plan, print a "chosen"
 > information for it. The user can understand which one was the final
 > choice, by looking at the last one which has chosen==true. Though if
 > you do that, then the order of plans matters, so there needs to be a
 > list of plans, not a dictionary of plans (a dictionary is not ordered,
 > a JSON viewer may shuffle its attributes).
 > I'm ok with the current solution if you prefer it.

To me, the current way is very easy to understand. Besides,
range_optimizer=off currently hides the details while still showing
the summary, as can be seen below. That would not be possible if we
removed the summary.

             "records_estimation": [
               {
                 "database": "test",
                 "table": "t1",
                 "range_analysis": {
                   "table_scan": {
                     "records": 1024,
                     "cost": 217.15
                   } /* table_scan */,
                   "potential_range_indices": "...",
                   "group_index_range": "...",
                   "analyzing_range_alternatives": "...",
                   "chosen_range_access_summary": {
                     "range_access_plan": {
                       "type": "range_scan",
                       "index": "i2",
                       "records": 47,
                       "ranges": [
                         "key2 < 5",
                         "1020 < key2"
                       ] /* ranges */
                     } /* range_access_plan */,
                     "records_for_plan": 47,
                     "cost_for_plan": 58.41,
                     "chosen": true
                   } /* chosen_range_access_summary */
                 } /* range_analysis */
               }
             ] /* records_estimation */

-- 
Jørgen Løland | Senior Software Engineer | +47 73842138
Oracle MySQL
Trondheim, Norway