Hello Jorgen,
Some replies inline.
I have a request: could you please commit the review-related changes as
an incremental patch? that would be me review them faster.
Jorgen Loland a écrit, Le 12.11.2010 10:39:
> Hi Guilhem,
> On 11/10/2010 10:58 PM, Guilhem Bichot wrote:
> >>>> === modified file 'mysql-test/r/optimizer_trace_no_prot.result'
> ---
> >>>> a/mysql-test/r/optimizer_trace_no_prot.result 2010-10-05
> >> 13:19:55 +0000
> >>>> +++ b/mysql-test/r/optimizer_trace_no_prot.result 2010-10-07
> >> 09:02:31 +0000
> >>>> @@ -459,9 +615,13 @@ QUERY_ID TRACE MISSING_BYTES_BEYOND_MAX_
> >>>> "best_access_path": { "considered_access_paths": [ { -
> "access_type":
> >>>> "table scan", - "cost": 10, - "records": 0, + "access_type":
> "scan", +
> >>>> "using_range_access": false,
> >>>
> >>> GB I wonder how a range access can be used when we are doing a "scan"
> >>> (which means table scan)? I think range access is only when there
> is an
> >>> index...?
> >>
> >> Changed to print either
> >>
> >> "access_type": "scan" or "access_type": "range"
> >>
> >> but note that scan can also be index scan.
> >
> > GB Looks like you are right, scan can be index scan. But in the
> trace, the
> > opposite of "access_type:scan" is "access_type:index"; maybe this
> latter one
> > is misleading: it is about an index, but not about an index scan (it is
> > "scan" which is about an "index scan" possibly). Then what is
> > "access_type:index" about? Is it "ref" access, or possibly something
> else?
> > Maybe it can only be "ref" access: I see that if we set best_key, we
> then set
> > POSITION::key, which is described in sql_select.h as non-NULL only
> for "ref"
> > methods. If that's only for "ref" access, maybe we should say "ref"
> instead
> > of "index" for "access_type".
>
> Good point. "access_type", "index" is added to the trace in
> best_access_path() here:
>
> 7023 loose_scan_opt.next_ref_key();
> 7024 DBUG_PRINT("info", ("Considering ref access on key %s",
> 7025 keyuse->table->key_info[keyuse->key].name));
> 7026 Opt_trace_object trace_access_idx(thd->opt_trace);
> 7027 trace_access_idx.add_str("access_type", "index").
> 7028 add_str("index", keyinfo->name, strlen(keyinfo->name));
>
> As you can see on line 7024, the dbug_print comment is indeed that ref
> access is
> considered.
GB do you mean you would change "index" to "ref" in the first add_str()?
that would be good, to me.
> >>> GB do you know what set_key_image() does? I imagine it leaves data in
> >>> some member of the field; so the content of that member is different
> >>> depending on whether trace is enabled or not; we need to be sure that
> >>> this difference is innocuous.
> >>
> >> I am pretty sure it sets the key value (less than value, greater
> than value
> >> etc) in field's part of record[0] so that field->val_*() (which is
> >> instructed to read from some offset in record[0]) can output the key
> value
> >> as string. I am also pretty sure record[0] is overwritten every time
> new
> >> data is requested, so whatever we put in here will not be visible when
> >> reading records.
> >
> > GB ok
> >
> >> Btw: note that this is SergeyP's code
> >
> > GB yes. And it's us who are going to put it into a production
> release, and
> > who will fix the bug if a customer hits one ;-)
>
> You are of course right. Did you find the explanation satisfactory? It's
> difficult to prove that something always happens :-/
GB Yes, it is satisfactory, and it looks like the function uses
set_key_image() like other existing functions do.
> >>>> + if (field->type() == MYSQL_TYPE_BIT) + (void)
> >>>> field->val_int_as_str(&tmp, 1); + else +
> field->val_str(&tmp);
> >>>
> >>> GB do we need to have such special case for MYSQL_TYPE_BIT?
> >>
> >> I have no idea. SP wrote this.
> >
> > GB Ok, we need to investigate/test this (if not before the push, we
> should
> > put it in WL4800_TODO.txt).
>
> Added to WL4800_TODO.txt
GB ok
> >>>> + out->append("'",1);
> >>>
> >>> GB out->append('\'') should do. And, what does this quote server
> > for? Is
> >>> it that for a multi-column key we want to print part1'part2'part3?
> >>
> >> Yes, that's what it is for. If we go for trace_basic_info() (comment
> >> further down) we should be able to do without this.
> >
> > GB but trace_basic_info() uses append_range() which uses print_key2().
>
> Yes, sorry for the ambiguity: What I meant is that trace_basic_info will
> not use
> print_key2 on multi-column key part. trace_basic_info calls print_key2()
> for
> individual parts, so the ' is not needed :)
>
> >>> If this has to be, we see that we compute key+store_length above,
> compare
> >>> it to key_end, then we go to the next iteration, where we compute
> >>> key+store_length again (in the incrementation in the for(...) line)
> and
> >>> compare it with key_end again (in the for(...)), it's a bit
> inefficient.
> >>> Also, we don't append a quote after NULL (as we just "continue"), this
> >>> sounds strange. Will a multi-column key (NULL,3,4) be printed as
> NULL3'4
> >>> ? That should be tested. I suggest to test every branch of this
> function
> >>> (the BIT one, the quote one, etc) in some .test file.
> >>
> >> Ok, but can we reach a conclusion on tracing QUICK vs TRP first?
> > GB but even if we stick with TRP, trace_basic_info() will remain
> unchanged (I
> > think), and it uses append_range() which uses print_key2() so we need
> to test
> > every branch of print_key2()...?
>
> See previous comment: TRP will not call print_key2 with multiple key
> parts, so
> the loop can be removed. That is why I wanted to reach agreement on
> QUICK vs
> TRP. I verified with this diff and ran the trace tests:
>
> === modified file 'sql/opt_range.cc'
> --- sql/opt_range.cc 2010-10-22 13:36:50 +0000
> +++ sql/opt_range.cc 2010-11-11 13:12:07 +0000
> @@ -1988,8 +1988,13 @@ print_key2(String *out, const KEY_PART_I
> + int i =0;
> for (; key < key_end; key+= store_length, key_part++)
> {
> + i++;
> + if (i>1)
> + DBUG_ASSERT(false);
> +
GB Ok; then I suggest simplifying print_key2(): the for() and the if()
towards the iteration's end (which adds '\'') can be replaced by a
simple block with an assertion that key+store_length==key_end. And,
append_range() has an argument "key_parts" which could be made
non-plural, to emphasize that it's a single keypart.
> >>>> @@ -2246,12 +2425,10 @@ int SQL_SELECT::test_quick_select(THD *t +
>
> } +
> >>>> +free_mem: + Opt_trace_object
> trace_range_summary(thd->opt_trace, +
> >> "chosen_range_access_summary");
> >>>> + if (quick) + { + /* Guilhem question: I've worked with two
> >>>> alternative ways of + printing the chosen range access. + + 1) Use
> >>>> trace_basic_info() of the TABLE_READ_PLAN object. This + can print
> >>>> multi-key-part ranges separately (I prefer this) + + 2) Use
> dbug_dump()
> >>>> of the QUICK_SELECT object. Maybe + conceptually more correct
> since it
> >>>> is what will be + executed, but I cannot make it print
> multi-key-part
> >>>> ranges + in an understandable way (and I tried a lot).
> >>>
> >>> GB you mean that in a QUICK_SELECT object, we don't have all info
> which
> >>> trace_basic_info() uses? Assuming we go with trace_basic_info(), do
> you
> >>> think that what it prints may be misleading compared to what is in
> >>> QUICK_SELECT?
> >>
> >> The information in QUICK_SELECT is in some compressed form and I am
> unable
> >> to find individual ranges. I don't think it misleads in any way, it
> only
> >> shows the range in a human readable form.
> >
> > GB Then let's go for the information source which is easier for you
> (TRP I
> > think). Do I understand correctly that: - check_quick_select() calls the
> > handler (the storage engine), asking it to give statistics for a set of
> > ranges; this set is made accessible to the engine via an "iterator"
> > (sel_arg_range_seq_next()) - this iterator creates range intervals as
> it runs
> > (so it's quite a smart iterator), and prints them to the trace? If
> you see
> > anything which would be useful to add to the function comment of
> > sel_arg_range_seq_next(), please don't hesitate (have you noticed
> this subtle
> > way to hint that I don't fully understand how this function works?).
>
> Prereq: get_mm_tree() has been called and a SEL_ARG graph consisting of
> range intervals has therefore been created. The SEL_ARG graph is
> constructed in a way so that SEL_ARG::next/prev pointers form OR
> conditions and SEL_ARG::next_key_part form AND conditions.
>
> 1) check_quick_select asks multi_range_read_info_const() for the cost
> (in number
> of records) of reading a set of ranges.
> 2) multi_range_read_info_const() summarizes the records in the different
> ranges.
> It gets the different ranges as range_seq_t structs from the
> sel_arg_range_seq_next() iterator. Each of these range_seq_t are the
> ANDed
> form from the SEL_ARG graph in a "compressed" way (see comment on
> KeyTupleFormat in opt_range.cc, around line 64).
>
> Now comes the part I don't understand, and is the reason why I'm not
> able to print a human readable range from range_seq_t: SEL_ARG has
> min_flag and max_flag, which effectively determines the boundaries of
> the range like this:
>
> min_flag
> min_val < X
> min_val <= X
>
> max_flag
> X < max_val
> X <= max_val
>
> and with both upper and lower bound we get
>
> min_val <= X <= max_val
> min_val <= X < max_val
> etc
>
> where X is an indexed column.
>
> For the range "key1a = 5 and key1b < 10", trace_basic_info will pick up
> the SEL_ARGs and print
>
> ranges: [5 <= key1a <= 5 : key1b < 10] (*1)
>
> because the max_flag for the first SEL_ARG is "<=" and the max_flag for
> the second SEL_ARG is "<" (see key range flags NO_MIN_RANGE/NEAR_MIN in
> my_base.h and in append_range.
>
> In range_seq_t with ANDed conditions, the min-values (5) and max-values
> (5'10) of the range have been concatenated to the same min_key and
> max_key uchar*. So far so good. The problem is that the flag that
> determines if we're dealing with "<", "<=" etc has also been
> concatenated using bitwise | (see e.g. step_down_to()). So when we try
> to print the range above using print_quick() we get:
>
> ranges: [ 5 <= X < 5'10 ]
>
> The min/max values are understandable, but notice that the "upper bound"
> (col1 <= 5 and col2 < 10) is printed only with '<' due to bitwise | of
> the flags. I gave up understanding how this works (the queries return
> the correct rows, so it has to work). Because I don't understand this, I
> am reluctant to add more documentation to the method in case I add even
> more confusion.
GB Mmm I'll need to re-read again this long explanation (too tired now ;-)
> (*1) Apropos ":" vs "," in ranges: ":" separates ANDed conditions while
> a new array entry (separated by ",") separates ORed conditions
> >> I think we should revert print_quick() and dbug_dump() to how they
> are in
> >> mysql-next-mr-opt-backporting, i.e, not use them for tracing at all.
> >
> > GB sounds good; who will do it?
>
> I can do it, but not in this changeset. I added it to WL4800_TODO.txt
GB ok