List:Commits« Previous MessageNext Message »
From:Jorgen Loland Date:November 12 2010 9:39am
Subject:Re: bzr commit into mysql-next-mr-bugfixing branch (jorgen.loland:3219)
View as plain text  
Hi Guilhem,
On 11/10/2010 10:58 PM, Guilhem Bichot wrote:
 >>>> === modified file 'mysql-test/r/optimizer_trace_no_prot.result' ---
 >>>> a/mysql-test/r/optimizer_trace_no_prot.result 2010-10-05
 >> 13:19:55 +0000
 >>>> +++ b/mysql-test/r/optimizer_trace_no_prot.result 2010-10-07
 >> 09:02:31 +0000
 >>>> @@ -459,9 +615,13 @@ QUERY_ID TRACE MISSING_BYTES_BEYOND_MAX_
 >>>> "best_access_path": { "considered_access_paths": [ { - "access_type":
 >>>> "table scan", - "cost": 10, - "records": 0, + "access_type": "scan", +
 >>>> "using_range_access": false,
 >>> GB I wonder how a range access can be used when we are doing a "scan"
 >>> (which means table scan)? I think range access is only when there is an
 >>> index...?
 >> Changed to print either
 >> "access_type": "scan" or "access_type": "range"
 >> but note that scan can also be index scan.
 > GB Looks like you are right, scan can be index scan. But in the trace, the
 > opposite of "access_type:scan" is "access_type:index"; maybe this latter one
 > is misleading: it is about an index, but not about an index scan (it is
 > "scan" which is about an "index scan" possibly). Then what is
 > "access_type:index" about? Is it "ref" access, or possibly something else?
 > Maybe it can only be "ref" access: I see that if we set best_key, we then set
 > POSITION::key, which is described in sql_select.h as non-NULL only for "ref"
 > methods. If that's only for "ref" access, maybe we should say "ref" instead
 > of "index" for "access_type".

Good point. "access_type", "index" is added to the trace in best_access_path() here:

7023      loose_scan_opt.next_ref_key();
7024      DBUG_PRINT("info", ("Considering ref access on key %s",
7025                          keyuse->table->key_info[keyuse->key].name));
7026      Opt_trace_object trace_access_idx(thd->opt_trace);
7027      trace_access_idx.add_str("access_type", "index").
7028        add_str("index", keyinfo->name, strlen(keyinfo->name));

As you can see on line 7024, the dbug_print comment is indeed that ref access is

 >>> GB do you know what set_key_image() does? I imagine it leaves data in
 >>> some member of the field; so the content of that member is different
 >>> depending on whether trace is enabled or not; we need to be sure that
 >>> this difference is innocuous.
 >> I am pretty sure it sets the key value (less than value, greater than value
 >> etc) in field's part of record[0] so that field->val_*() (which is
 >> instructed to read from some offset in record[0]) can output the key value
 >> as string. I am also pretty sure record[0] is overwritten every time new
 >> data is requested, so whatever we put in here will not be visible when
 >> reading records.
 > GB ok
 >> Btw: note that this is SergeyP's code
 > GB yes. And it's us who are going to put it into a production release, and
 > who will fix the bug if a customer hits one ;-)

You are of course right. Did you find the explanation satisfactory? It's
difficult to prove that something always happens :-/

 >>>> + if (field->type() == MYSQL_TYPE_BIT) + (void)
 >>>> field->val_int_as_str(&tmp, 1); + else +
 >>> GB do we need to have such special case for MYSQL_TYPE_BIT?
 >> I have no idea. SP wrote this.
 > GB Ok, we need to investigate/test this (if not before the push, we should
 > put it in WL4800_TODO.txt).

Added to WL4800_TODO.txt

 >>>> + out->append("'",1);
 >>> GB out->append('\'') should do. And, what does this quote server
 > for? Is
 >>> it that for a multi-column key we want to print part1'part2'part3?
 >> Yes, that's what it is for. If we go for trace_basic_info() (comment
 >> further down) we should be able to do without this.
 > GB but trace_basic_info() uses append_range() which uses print_key2().

Yes, sorry for the ambiguity: What I meant is that trace_basic_info will not use
print_key2 on multi-column key part. trace_basic_info calls print_key2() for
individual parts, so the ' is not needed :)

 >>> If this has to be, we see that we compute key+store_length above, compare
 >>> it to key_end, then we go to the next iteration, where we compute
 >>> key+store_length again (in the incrementation in the for(...) line) and
 >>> compare it with key_end again (in the for(...)), it's a bit inefficient.
 >>> Also, we don't append a quote after NULL (as we just "continue"), this
 >>> sounds strange. Will a multi-column key (NULL,3,4) be printed as NULL3'4
 >>> ? That should be tested. I suggest to test every branch of this function
 >>> (the BIT one, the quote one, etc) in some .test file.
 >> Ok, but can we reach a conclusion on tracing QUICK vs TRP first?
 > GB but even if we stick with TRP, trace_basic_info() will remain unchanged (I
 > think), and it uses append_range() which uses print_key2() so we need to test
 > every branch of print_key2()...?

See previous comment: TRP will not call print_key2 with multiple key parts, so
the loop can be removed. That is why I wanted to reach agreement on QUICK vs
TRP. I verified with this diff and ran the trace tests:

=== modified file 'sql/'
--- sql/	2010-10-22 13:36:50 +0000
+++ sql/	2010-11-11 13:12:07 +0000
@@ -1988,8 +1988,13 @@ print_key2(String *out, const KEY_PART_I
+  int i =0;
    for (; key < key_end; key+= store_length, key_part++)
+    i++;
+    if (i>1)
+      DBUG_ASSERT(false);

 >>>> @@ -1996,8 +2052,54 @@ public: } DBUG_RETURN(quick); } + const char
 >>>> *get_name() const {return "range_scan";} + void trace_basic_info(const
 >>>> PARAM *param, Opt_trace_object
 >> *trace) const
 >>>> + { + trace->add("table_read_plan_type", get_name()). + add("index",
 >> param->table->key_info[param->real_keynr[key_idx]].name).
 >>> GB is usage of real_keynr here inspired from existing places of code? I
 >>> need to ask, I don't know when real_keynr is good and when not. About the
 >>> latest patch: keynr_in_table could be const (maybe the compiler can do
 >>> more optimizations with this information).
 >> Variables storing key number can refer to the position in either table
 >> (table->key_info[<this>]) or in param
(param->real_keynr[<this>]). If the
 >> variable is the position in param, we use real_keynr to get the position in
 >> table.
 >> You really have to check which one you're dealing with for each class.
 >> TRP::key_idx is key number in param. In the followup patch, I tried to make
 >> this more explicit. My code was wrong in some places due to copy-pasting.
 > GB ok. I suggest to change uint key_idx; /* key number in PARAM::key */ to
 > uint key_idx; /* key number in PARAM::key and PARAM::real_keynr */


 >>>> + { + String multikey; + for (SEL_ARG *part= current; part; part=
 >>>> part->next_key_part)
 >>> GB "part" could be const. In all logic, in those functions which
 > only do
 >>> tracing, all pointers to the optimizer's structures should be const, to
 >>> emphasize that this only updates the trace.
 >> Including TRP_ROR_INTERSECT and others with double-pointers? That requires
 >> a bunch of casting as well:
 >> from this: for (st_ror_scan_info **cur_scan= first_scan; to this: for
 >> (const st_ror_scan_info **cur_scan= (const st_ror_scan_info**) first_scan;
 >> That's a lot of noise as I see it, but I'll do it if you insist :)
 > GB I don't insist when C rules make it hard; as you noted, const
 > st_ror_scan_info **cur_scan= first_scan; though logical, won't compile (I
 > just re-read the explanation in some book, where the author recognizes that
 > this is annoying). For simple pointers, I find const to be very valuable as
 > safeguard.

I agree that const on simple pointers is good. Thanks for commenting on this

 >>>> @@ -2246,12 +2425,10 @@ int SQL_SELECT::test_quick_select(THD *t + } +
 >>>> +free_mem: + Opt_trace_object trace_range_summary(thd->opt_trace, +
 >> "chosen_range_access_summary");
 >>>> + if (quick) + { + /* Guilhem question: I've worked with two
 >>>> alternative ways of + printing the chosen range access. + + 1) Use
 >>>> trace_basic_info() of the TABLE_READ_PLAN object. This + can print
 >>>> multi-key-part ranges separately (I prefer this) + + 2) Use dbug_dump()
 >>>> of the QUICK_SELECT object. Maybe + conceptually more correct since it
 >>>> is what will be + executed, but I cannot make it print multi-key-part
 >>>> ranges + in an understandable way (and I tried a lot).
 >>> GB you mean that in a QUICK_SELECT object, we don't have all info which
 >>> trace_basic_info() uses? Assuming we go with trace_basic_info(), do you
 >>> think that what it prints may be misleading compared to what is in
 >> The information in QUICK_SELECT is in some compressed form and I am unable
 >> to find individual ranges. I don't think it misleads in any way, it only
 >> shows the range in a human readable form.
 > GB Then let's go for the information source which is easier for you (TRP I
 > think). Do I understand correctly that: - check_quick_select() calls the
 > handler (the storage engine), asking it to give statistics for a set of
 > ranges; this set is made accessible to the engine via an "iterator"
 > (sel_arg_range_seq_next()) - this iterator creates range intervals as it runs
 > (so it's quite a smart iterator), and prints them to the trace? If you see
 > anything which would be useful to add to the function comment of
 > sel_arg_range_seq_next(), please don't hesitate (have you noticed this subtle
 > way to hint that I don't fully understand how this function works?).

Prereq: get_mm_tree() has been called and a SEL_ARG graph consisting of range 
intervals has therefore been created. The SEL_ARG graph is constructed in a way 
so that SEL_ARG::next/prev pointers form OR conditions and 
SEL_ARG::next_key_part form AND conditions.

1) check_quick_select asks multi_range_read_info_const() for the cost (in number
   of records) of reading a set of ranges.
2) multi_range_read_info_const() summarizes the records in the different ranges.
    It gets the different ranges as range_seq_t structs from the
    sel_arg_range_seq_next() iterator. Each of these range_seq_t are the ANDed
    form from the SEL_ARG graph in a "compressed" way (see comment on
    KeyTupleFormat in, around line 64).

Now comes the part I don't understand, and is the reason why I'm not able to 
print a human readable range from range_seq_t: SEL_ARG has min_flag and 
max_flag, which effectively determines the boundaries of the range like this:

min_val < X
min_val <= X

X < max_val
X <= max_val

and with both upper and lower bound we get

min_val <= X <= max_val
min_val <= X < max_val

where X is an indexed column.

For the range "key1a = 5 and key1b < 10", trace_basic_info will pick up the 
SEL_ARGs and print

ranges: [5 <= key1a <= 5 : key1b < 10] (*1)

because the max_flag for the first SEL_ARG is "<=" and the max_flag for the 
second SEL_ARG is "<" (see key range flags NO_MIN_RANGE/NEAR_MIN in my_base.h 
and in append_range.

In range_seq_t with ANDed conditions, the min-values (5) and max-values (5'10) 
of the range have been concatenated to the same min_key and max_key uchar*. So 
far so good. The problem is that the flag that determines if we're dealing with 
"<", "<=" etc has also been concatenated using bitwise | (see e.g. 
step_down_to()). So when we try to print the range above using print_quick() we get:

ranges: [ 5 <= X < 5'10 ]

The min/max values are understandable, but notice that the "upper bound" (col1 
<= 5 and col2 < 10) is printed only with '<' due to bitwise | of the flags. I 
gave up understanding how this works (the queries return the correct rows, so it 
has to work). Because I don't understand this, I am reluctant to add more 
documentation to the method in case I add even more confusion.

(*1) Apropos ":" vs "," in ranges: ":" separates ANDed conditions while a new 
array entry (separated by ",") separates ORed conditions

 >>> GB what does cost_not_reduced mean?
 >> adding this index to the intersect does not reduce the cost of the plan.
 >> changed to "does_not_reduce_cost_of_intersect"
 > GB Ok. Maybe then "used_in_intersect" could be "usable"? Trying to
 > standardize...


 >>>> @@ -4805,6 +5105,12 @@ TRP_ROR_INTERSECT *get_best_covering_ror
 >>>> ROR_SCAN_INFO **ror_scans_end= tree->ror_scans_end;
 >>>> DBUG_ENTER("get_best_covering_ror_intersect");
 >>>> + // None of our tests enter this function + Opt_trace_object
 >>>> (param->thd->opt_trace). + add("get_best_covering_ror_intersect",
 >>>> true). + add("untested_code", true). + add("need_tracing",true);
 >>> GB by "our tests" do you mean the entire mtr testsuite?
 >> yes
 > GB in another place, the patch uses "done_tracing" instead of
 > "need_tracing".

Incredible catch!

 >> I think we should revert print_quick() and dbug_dump() to how they are in
 >> mysql-next-mr-opt-backporting, i.e, not use them for tracing at all.
 > GB sounds good; who will do it?

I can do it, but not in this changeset. I added it to WL4800_TODO.txt

 >>>> @@ -7358,27 +7425,31 @@ best_access_path(JOIN *join, */ if (!(records
 >>>> >= s->found_records || best >
 >> s->read_time)) // (1)
 >>>> { - oto1.add("cost", s->read_time).add("records",
s->found_records); +
 >>>> trace_access_scan.add("cost", s->read_time). + add("records",
 >>>> s->found_records). + add("chosen", false);
 >>> GB There was a line, which this patch deletes, which made sure to
 > output
 >>> '"chosen":false' at label skip_table_scan, instead of once per if()
 >>> block. Why change this?
 >> Because I wanted it to print "chosen:false, cause:<whatever>", and
 >> <whatever> is different in each if. I could have made a "String cause"
 >> instead, or I could ignore cost and just print chosen:F. What do you
 > think?
 > GB you can print cause, but postpone printing "chosen" to a single call right
 > after skip_table_scan (and not right before), like this: skip_table_scan:
 > trace_scan_path.add("chosen", best_key == NULL); That would replace a few
 > add("chosen",false) calls.


Jørgen Løland | Senior Software Engineer | +47 73842138
Oracle MySQL
Trondheim, Norway
bzr commit into mysql-next-mr-bugfixing branch (jorgen.loland:3219) WL#5594Jorgen Loland7 Oct
  • Re: bzr commit into mysql-next-mr-bugfixing branch (jorgen.loland:3219)WL#5594Guilhem Bichot16 Oct
    • Re: bzr commit into mysql-next-mr-bugfixing branch (jorgen.loland:3219)WL#5594Jorgen Loland22 Oct
      • Re: bzr commit into mysql-next-mr-bugfixing branch (jorgen.loland:3219)WL#5594Guilhem Bichot10 Nov
        • Re: bzr commit into mysql-next-mr-bugfixing branch (jorgen.loland:3219)WL#5594Jorgen Loland12 Nov
          • Re: bzr commit into mysql-next-mr-bugfixing branch (jorgen.loland:3219)WL#5594Guilhem Bichot15 Nov
      • Re: bzr commit into mysql-next-mr-bugfixing branch (jorgen.loland:3219)WL#5594Guilhem Bichot7 Feb
        • Re: bzr commit into mysql-next-mr-bugfixing branch (jorgen.loland:3219)WL#5594Jorgen Loland8 Feb