Hi Guilhem,
On 11/10/2010 10:58 PM, Guilhem Bichot wrote:
>>>> === modified file 'mysql-test/r/optimizer_trace_no_prot.result' ---
>>>> a/mysql-test/r/optimizer_trace_no_prot.result 2010-10-05
>> 13:19:55 +0000
>>>> +++ b/mysql-test/r/optimizer_trace_no_prot.result 2010-10-07
>> 09:02:31 +0000
>>>> @@ -459,9 +615,13 @@ QUERY_ID TRACE MISSING_BYTES_BEYOND_MAX_
>>>> "best_access_path": { "considered_access_paths": [ { - "access_type":
>>>> "table scan", - "cost": 10, - "records": 0, + "access_type": "scan", +
>>>> "using_range_access": false,
>>>
>>> GB I wonder how a range access can be used when we are doing a "scan"
>>> (which means table scan)? I think range access is only when there is an
>>> index...?
>>
>> Changed to print either
>>
>> "access_type": "scan" or "access_type": "range"
>>
>> but note that scan can also be index scan.
>
> GB Looks like you are right, scan can be index scan. But in the trace, the
> opposite of "access_type:scan" is "access_type:index"; maybe this latter one
> is misleading: it is about an index, but not about an index scan (it is
> "scan" which is about an "index scan" possibly). Then what is
> "access_type:index" about? Is it "ref" access, or possibly something else?
> Maybe it can only be "ref" access: I see that if we set best_key, we then set
> POSITION::key, which is described in sql_select.h as non-NULL only for "ref"
> methods. If that's only for "ref" access, maybe we should say "ref" instead
> of "index" for "access_type".
Good point. "access_type", "index" is added to the trace in best_access_path() here:
7023 loose_scan_opt.next_ref_key();
7024 DBUG_PRINT("info", ("Considering ref access on key %s",
7025 keyuse->table->key_info[keyuse->key].name));
7026 Opt_trace_object trace_access_idx(thd->opt_trace);
7027 trace_access_idx.add_str("access_type", "index").
7028 add_str("index", keyinfo->name, strlen(keyinfo->name));
As you can see on line 7024, the dbug_print comment is indeed that ref access is
considered.
>>> GB do you know what set_key_image() does? I imagine it leaves data in
>>> some member of the field; so the content of that member is different
>>> depending on whether trace is enabled or not; we need to be sure that
>>> this difference is innocuous.
>>
>> I am pretty sure it sets the key value (less than value, greater than value
>> etc) in field's part of record[0] so that field->val_*() (which is
>> instructed to read from some offset in record[0]) can output the key value
>> as string. I am also pretty sure record[0] is overwritten every time new
>> data is requested, so whatever we put in here will not be visible when
>> reading records.
>
> GB ok
>
>> Btw: note that this is SergeyP's code
>
> GB yes. And it's us who are going to put it into a production release, and
> who will fix the bug if a customer hits one ;-)
You are of course right. Did you find the explanation satisfactory? It's
difficult to prove that something always happens :-/
>>>> + if (field->type() == MYSQL_TYPE_BIT) + (void)
>>>> field->val_int_as_str(&tmp, 1); + else +
field->val_str(&tmp);
>>>
>>> GB do we need to have such special case for MYSQL_TYPE_BIT?
>>
>> I have no idea. SP wrote this.
>
> GB Ok, we need to investigate/test this (if not before the push, we should
> put it in WL4800_TODO.txt).
Added to WL4800_TODO.txt
>>>> + out->append("'",1);
>>>
>>> GB out->append('\'') should do. And, what does this quote server
> for? Is
>>> it that for a multi-column key we want to print part1'part2'part3?
>>
>> Yes, that's what it is for. If we go for trace_basic_info() (comment
>> further down) we should be able to do without this.
>
> GB but trace_basic_info() uses append_range() which uses print_key2().
Yes, sorry for the ambiguity: What I meant is that trace_basic_info will not use
print_key2 on multi-column key part. trace_basic_info calls print_key2() for
individual parts, so the ' is not needed :)
>>> If this has to be, we see that we compute key+store_length above, compare
>>> it to key_end, then we go to the next iteration, where we compute
>>> key+store_length again (in the incrementation in the for(...) line) and
>>> compare it with key_end again (in the for(...)), it's a bit inefficient.
>>> Also, we don't append a quote after NULL (as we just "continue"), this
>>> sounds strange. Will a multi-column key (NULL,3,4) be printed as NULL3'4
>>> ? That should be tested. I suggest to test every branch of this function
>>> (the BIT one, the quote one, etc) in some .test file.
>>
>> Ok, but can we reach a conclusion on tracing QUICK vs TRP first?
> GB but even if we stick with TRP, trace_basic_info() will remain unchanged (I
> think), and it uses append_range() which uses print_key2() so we need to test
> every branch of print_key2()...?
See previous comment: TRP will not call print_key2 with multiple key parts, so
the loop can be removed. That is why I wanted to reach agreement on QUICK vs
TRP. I verified with this diff and ran the trace tests:
=== modified file 'sql/opt_range.cc'
--- sql/opt_range.cc 2010-10-22 13:36:50 +0000
+++ sql/opt_range.cc 2010-11-11 13:12:07 +0000
@@ -1988,8 +1988,13 @@ print_key2(String *out, const KEY_PART_I
+ int i =0;
for (; key < key_end; key+= store_length, key_part++)
{
+ i++;
+ if (i>1)
+ DBUG_ASSERT(false);
+
>>>> @@ -1996,8 +2052,54 @@ public: } DBUG_RETURN(quick); } + const char
>>>> *get_name() const {return "range_scan";} + void trace_basic_info(const
>>>> PARAM *param, Opt_trace_object
>> *trace) const
>>>> + { + trace->add("table_read_plan_type", get_name()). + add("index",
>> param->table->key_info[param->real_keynr[key_idx]].name).
>>>
>>> GB is usage of real_keynr here inspired from existing places of code? I
>>> need to ask, I don't know when real_keynr is good and when not. About the
>>> latest patch: keynr_in_table could be const (maybe the compiler can do
>>> more optimizations with this information).
>>
>> Variables storing key number can refer to the position in either table
>> (table->key_info[<this>]) or in param
(param->real_keynr[<this>]). If the
>> variable is the position in param, we use real_keynr to get the position in
>> table.
>>
>> You really have to check which one you're dealing with for each class.
>> TRP::key_idx is key number in param. In the followup patch, I tried to make
>> this more explicit. My code was wrong in some places due to copy-pasting.
>
> GB ok. I suggest to change uint key_idx; /* key number in PARAM::key */ to
> uint key_idx; /* key number in PARAM::key and PARAM::real_keynr */
Done
>>>> + { + String multikey; + for (SEL_ARG *part= current; part; part=
>>>> part->next_key_part)
>>>
>>> GB "part" could be const. In all logic, in those functions which
> only do
>>> tracing, all pointers to the optimizer's structures should be const, to
>>> emphasize that this only updates the trace.
>>
>> Including TRP_ROR_INTERSECT and others with double-pointers? That requires
>> a bunch of casting as well:
>>
>> from this: for (st_ror_scan_info **cur_scan= first_scan; to this: for
>> (const st_ror_scan_info **cur_scan= (const st_ror_scan_info**) first_scan;
>>
>> That's a lot of noise as I see it, but I'll do it if you insist :)
>
> GB I don't insist when C rules make it hard; as you noted, const
> st_ror_scan_info **cur_scan= first_scan; though logical, won't compile (I
> just re-read the explanation in some book, where the author recognizes that
> this is annoying). For simple pointers, I find const to be very valuable as
> safeguard.
I agree that const on simple pointers is good. Thanks for commenting on this
earlier.
>>>> @@ -2246,12 +2425,10 @@ int SQL_SELECT::test_quick_select(THD *t + } +
>>>> +free_mem: + Opt_trace_object trace_range_summary(thd->opt_trace, +
>> "chosen_range_access_summary");
>>>> + if (quick) + { + /* Guilhem question: I've worked with two
>>>> alternative ways of + printing the chosen range access. + + 1) Use
>>>> trace_basic_info() of the TABLE_READ_PLAN object. This + can print
>>>> multi-key-part ranges separately (I prefer this) + + 2) Use dbug_dump()
>>>> of the QUICK_SELECT object. Maybe + conceptually more correct since it
>>>> is what will be + executed, but I cannot make it print multi-key-part
>>>> ranges + in an understandable way (and I tried a lot).
>>>
>>> GB you mean that in a QUICK_SELECT object, we don't have all info which
>>> trace_basic_info() uses? Assuming we go with trace_basic_info(), do you
>>> think that what it prints may be misleading compared to what is in
>>> QUICK_SELECT?
>>
>> The information in QUICK_SELECT is in some compressed form and I am unable
>> to find individual ranges. I don't think it misleads in any way, it only
>> shows the range in a human readable form.
>
> GB Then let's go for the information source which is easier for you (TRP I
> think). Do I understand correctly that: - check_quick_select() calls the
> handler (the storage engine), asking it to give statistics for a set of
> ranges; this set is made accessible to the engine via an "iterator"
> (sel_arg_range_seq_next()) - this iterator creates range intervals as it runs
> (so it's quite a smart iterator), and prints them to the trace? If you see
> anything which would be useful to add to the function comment of
> sel_arg_range_seq_next(), please don't hesitate (have you noticed this subtle
> way to hint that I don't fully understand how this function works?).
Prereq: get_mm_tree() has been called and a SEL_ARG graph consisting of range
intervals has therefore been created. The SEL_ARG graph is constructed in a way
so that SEL_ARG::next/prev pointers form OR conditions and
SEL_ARG::next_key_part form AND conditions.
1) check_quick_select asks multi_range_read_info_const() for the cost (in number
of records) of reading a set of ranges.
2) multi_range_read_info_const() summarizes the records in the different ranges.
It gets the different ranges as range_seq_t structs from the
sel_arg_range_seq_next() iterator. Each of these range_seq_t are the ANDed
form from the SEL_ARG graph in a "compressed" way (see comment on
KeyTupleFormat in opt_range.cc, around line 64).
Now comes the part I don't understand, and is the reason why I'm not able to
print a human readable range from range_seq_t: SEL_ARG has min_flag and
max_flag, which effectively determines the boundaries of the range like this:
min_flag
min_val < X
min_val <= X
max_flag
X < max_val
X <= max_val
and with both upper and lower bound we get
min_val <= X <= max_val
min_val <= X < max_val
etc
where X is an indexed column.
For the range "key1a = 5 and key1b < 10", trace_basic_info will pick up the
SEL_ARGs and print
ranges: [5 <= key1a <= 5 : key1b < 10] (*1)
because the max_flag for the first SEL_ARG is "<=" and the max_flag for the
second SEL_ARG is "<" (see key range flags NO_MIN_RANGE/NEAR_MIN in my_base.h
and in append_range.
In range_seq_t with ANDed conditions, the min-values (5) and max-values (5'10)
of the range have been concatenated to the same min_key and max_key uchar*. So
far so good. The problem is that the flag that determines if we're dealing with
"<", "<=" etc has also been concatenated using bitwise | (see e.g.
step_down_to()). So when we try to print the range above using print_quick() we get:
ranges: [ 5 <= X < 5'10 ]
The min/max values are understandable, but notice that the "upper bound" (col1
<= 5 and col2 < 10) is printed only with '<' due to bitwise | of the flags. I
gave up understanding how this works (the queries return the correct rows, so it
has to work). Because I don't understand this, I am reluctant to add more
documentation to the method in case I add even more confusion.
(*1) Apropos ":" vs "," in ranges: ":" separates ANDed conditions while a new
array entry (separated by ",") separates ORed conditions
>>> GB what does cost_not_reduced mean?
>>
>> adding this index to the intersect does not reduce the cost of the plan.
>> changed to "does_not_reduce_cost_of_intersect"
>
> GB Ok. Maybe then "used_in_intersect" could be "usable"? Trying to
> standardize...
Done
>>
>>>> @@ -4805,6 +5105,12 @@ TRP_ROR_INTERSECT *get_best_covering_ror
>>>> ROR_SCAN_INFO **ror_scans_end= tree->ror_scans_end;
>>>> DBUG_ENTER("get_best_covering_ror_intersect");
>>>>
>>>> + // None of our tests enter this function + Opt_trace_object
>>>> (param->thd->opt_trace). + add("get_best_covering_ror_intersect",
>>>> true). + add("untested_code", true). + add("need_tracing",true);
>>>
>>> GB by "our tests" do you mean the entire mtr testsuite?
>>
>> yes
>
> GB in another place, the patch uses "done_tracing" instead of
> "need_tracing".
Incredible catch!
>> I think we should revert print_quick() and dbug_dump() to how they are in
>> mysql-next-mr-opt-backporting, i.e, not use them for tracing at all.
>
> GB sounds good; who will do it?
I can do it, but not in this changeset. I added it to WL4800_TODO.txt
>>>> @@ -7358,27 +7425,31 @@ best_access_path(JOIN *join, */ if (!(records
>>>> >= s->found_records || best >
>> s->read_time)) // (1)
>>>> { - oto1.add("cost", s->read_time).add("records",
s->found_records); +
>>>> trace_access_scan.add("cost", s->read_time). + add("records",
>>>> s->found_records). + add("chosen", false);
>>> GB There was a line, which this patch deletes, which made sure to
> output
>>> '"chosen":false' at label skip_table_scan, instead of once per if()
>>> block. Why change this?
>>
>> Because I wanted it to print "chosen:false, cause:<whatever>", and
>> <whatever> is different in each if. I could have made a "String cause"
>> instead, or I could ignore cost and just print chosen:F. What do you
> think?
>
> GB you can print cause, but postpone printing "chosen" to a single call right
> after skip_table_scan (and not right before), like this: skip_table_scan:
> trace_scan_path.add("chosen", best_key == NULL); That would replace a few
> add("chosen",false) calls.
Done
--
Jørgen Løland | Senior Software Engineer | +47 73842138
Oracle MySQL
Trondheim, Norway