From: Jorgen Loland
Date: October 22, 2010 7:04am
Subject: Re: bzr commit into mysql-next-mr-bugfixing branch (jorgen.loland:3219) WL#5594
Hi Guilhem

Due to the sheer size of this patch, I have removed the comments from you with 
which I agree and to which I would only have replied "done". Places where I 
disagree or have further information can be found inline (the size is now down 
to 1700 lines!).

See commit email for the new patch.

 > I don't see big problems with your approach.
 > Comments below, prefixed with "GB" (by the way I discovered
 > control-<downarrow> in emacs today, it's great! when pressed, it leaps
 > over any contiguous block of lines starting with ">", so reaches my
 > next comment).
 >
 > I had two choices: either take a few more days to figure out
 > everything by myself, or send now with quite a few questions for you,
 > which I have done.
 >
 > Regarding the produced trace (as seen in the result file), I find it
 > understandable enough (after reading it a few times); I guess we don't
 > need to aim at perfection; if someone later finds a better way to
 > express this or that key/value pair, she/he can always change it.
 >
 > Jorgen Loland a écrit, Le 07.10.2010 11:02:
 >> #At 
file:///export/home/jl208045/mysql/wl4800/mysql-next-mr-opt-backporting-wl4800-patchcleanup/

based on revid:guilhem@stripped
 >>
 >>  3219 Jorgen Loland	2010-10-07
 >>       WL#5594 - Add optimizer traces to the range optimizer
 >>
 >>       Add optimizer trace points to the range optimizer.
 >>      @ mysql-test/include/optimizer_trace.inc
 >>         Added tests to cover range optimizer
 >>      @ mysql-test/r/optimizer_trace_no_prot.result
 >>         Added tests to cover range optimizer. Also adds range information to 
trace of existing queries
 >>      @ sql/handler.cc
 >>         Add tracepoints to range optimizer
 >>      @ sql/opt_range.cc
 >>         Add tracepoints to range optimizer
 >>      @ sql/opt_range.h
 >>         Add tracepoints to range optimizer
 >>      @ sql/sql_select.cc
 >>         Add tracepoints to range optimizer
 >
 >GB Ah! now I see... this patch is about adding tracepoints to the range
 >optimizer!

You're not in favor of tagging each individual file?

 >> === modified file 'mysql-test/include/optimizer_trace.inc'
 >> --- a/mysql-test/include/optimizer_trace.inc	2010-10-05 10:29:43 +0000
 >> +++ b/mysql-test/include/optimizer_trace.inc	2010-10-07 09:02:31 +0000
 >> @@ -473,4 +473,138 @@ select 1 union select 2;
 >>
 >>  drop table t1,t2;
 >>  DROP TABLE C,D;
 >> +
 >> +set optimizer_trace=default;
 >> +#################
 >

 >GB Maybe ##### could be changed to
 ># test of range optimizer tracing
 >I believe that tracing of modules should be moved to separate
 >"optimizer_trace*" tests; optimizer_trace_no_prot becomes big.
 >That applies to pieces which I added to this test too (subquery etc);
 >in fact, to all pieces which are not pretexts for demonstrating
 >certain features of @@optimizer_trace* variables. I can take care of
 >moving my parts if you agree.

Ack. Creating new test file.

 >> +CREATE TABLE t1 (
 >> +  cola char(3) not null,
 >> +  colb char(3) not null,
 >> +  filler char(200),
 >> +  key(cola),
 >> +  key(colb)
 >> +) engine=innodb;
 >

 >GB so far there was a ban on explicit setting of the engine in a generic
 >test; for the case where an engine would not be enabled in a certain
 >binary. Now that InnoDB is default, maybe enabled all the time, I
 >don't know if this is still a problem.
 >Note: if a test doesn't have "require_innodb.inc", then the default
 >engine used is myisam.

There's no need to require innodb in this test. Removed.

 >> === modified file 'mysql-test/r/optimizer_trace_no_prot.result'
 >> --- a/mysql-test/r/optimizer_trace_no_prot.result	2010-10-05 13:19:55 +0000
 >> +++ b/mysql-test/r/optimizer_trace_no_prot.result	2010-10-07 09:02:31 +0000
 >> @@ -41,6 +41,15 @@ QUERY_ID	TRACE	MISSING_BYTES_BEYOND_MAX_
 >>                "C"
 >>              ],
 >>              "records_estimation": [
 >> +              {
 >> +                "table": "C",
 >

 >GB Should we print the database's name when we print the table's name
 >(something which I didn't do myself, but now I start thinking :-)?
 >Otherwise there is an ambiguity if the query contains db1.C and db2.C.

I agree. I updated all tracepoints that print the table name, including the
ones I didn't add in this patch.

 >> +                "found_records": 1,
 >> +                "records": 1,
 >
 >GB how is "found_records" different from "records"? What is our policy:
 >have names which an advanced user could understand, or, stick to names
 >used in code (assuming the reader reads code at the same time)? Maybe
 >the former is impossible...
 >
 >> +                "cost": 1,
 >> +                "worst_seeks": 1,
 >
 >GB nobody can know what "worst_seeks" is about, without consulting
 >code (and there is no comment near its declaration in
 >sql_select.h :-( ).

I think I overdid it here. I removed found_records and worst_seeks
since I haven't found any decisions worth tracing that are based on
worst_seeks (thus, this was the only place it was mentioned).

 >> @@ -80,9 +89,33 @@ QUERY_ID	TRACE	MISSING_BYTES_BEYOND_MAX_
 >>              "records_estimation": [
 >>                {
 >>                  "table": "D",
 >> -                "more_range_optimizer_trace": {
 >> -                  "TODO?": "yes!"
 >> -                }
 >> +                "range_analysis": {
 >> +                  "table_scan": {
 >> +                    "records": 2,
 >> +                    "cost": 4.5034
 >> +                  },
 >> +                  "potential_range_indices": [
 >> +                    {
 >> +                      "index": "d",
 >> +                      "key_parts": [
 >> +                        "d"
 >> +                      ]
 >> +                    }
 >> +                  ],
 >> +                  "best_full_index_scan": {
 >> +                    "index": "d",
 >> +                    "cost": 1.4233,
 >> +                    "chosen": true
 >> +                  },
 >> +                  "impossible_select": true,
 >

 >GB Here we don't see the ranges which led the range optimizer to conclude
 >that it was impossible to find records. Maybe that would be
 >interesting, if possible (if not, don't bother).

Good point - added tracing like this:

                   "impossible_condition": {
                     "table": "t1",
                     "field": "key4",
                     "cause": "null_field_in_non_null_column"
                   },

Note that if this impossible condition is ORed, it is simply removed;
if it is ANDed, it makes the entire condition impossible. This is
difficult to trace at the place in the code where the decision to
remove the condition is made.
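To make the OR/AND behavior above concrete, here is a minimal sketch (plain Python, not MySQL's SEL_TREE code; all names are illustrative) of the simplification rule: an impossible part of an OR is dropped, while an impossible part of an AND poisons the whole condition.

```python
IMPOSSIBLE = object()  # sentinel: a condition that can never be true

def simplify(node):
    """node is ("leaf", satisfiable) or ("AND"/"OR", [children])."""
    kind = node[0]
    if kind == "leaf":
        return IMPOSSIBLE if not node[1] else node
    children = []
    for child in node[1]:
        s = simplify(child)
        if s is IMPOSSIBLE:
            if kind == "AND":
                return IMPOSSIBLE  # ANDed impossibility: whole condition impossible
            continue               # ORed impossibility: silently dropped
        children.append(s)
    return (kind, children) if children else IMPOSSIBLE
```

Under this rule, a condition like `key4 <=> NULL OR key2 = 10` silently degenerates to `key2 = 10`, which is exactly why the removal is hard to see at the point where it happens.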

 >> +                  "stop_range_analysis": true,
 >> +                  "chosen_range_access_summary": {
 >> +                    "chosen": false
 >> +                  },
 >> +                  "records": 0
 >> +                },
 >> +                "records": 0,
 >> +                "cause": "impossible_where_condition"
 >
 >GB we have "impossible_select", and a few lines below,
 >"impossible_where_condition": can we avoid having both? If not, can we
 >choose one standard name?
 >It takes all this:
 >                   "impossible_select": true,
 >                   "stop_range_analysis": true,
 >                   "chosen_range_access_summary": {
 >                     "chosen": false
 >                   },
 >                   "records": 0
 >                 },
 >                 "records": 0,
 >                 "cause": "impossible_where_condition"
 >to say that we have found something impossible; is it possible to
 >shorten it easily? Maybe not, just asking.

We can easily remove
   "stop_range_analysis": true
which I find unnecessary in retrospect.

Further, we can remove

                    "chosen_range_access_summary": {
                      "chosen": false
                    },

easily. That's a matter of taste. I have no strong preference to keep
it, so it's gone :)

 >> @@ -459,9 +615,13 @@ QUERY_ID	TRACE	MISSING_BYTES_BEYOND_MAX_
 >>                  "best_access_path": {
 >>                    "considered_access_paths": [
 >>                      {
 >> -                      "access_type": "table scan",
 >> -                      "cost": 10,
 >> -                      "records": 0,
 >> +                      "access_type": "scan",
 >> +                      "using_range_access": false,
 >
 >GB I wonder how a range access can be used when we are doing a "scan"
 >(which means table scan)? I think range access is only when there is
 >an index...?

Changed to print either

"access_type": "scan"
or
"access_type": "range"

but note that "scan" can also mean an index scan.

 >
 >> +                      "using_join_cache": true,
 >
 >GB A note unrelated to your patch: this means that best_access_path()
 >computes costs under the assumption that join cache will be used, but
 >it may not be, the decision is taken much later.

Probably. I haven't looked into that yet.

 >> +                  "analyzing_range_alternatives": {
 >> +                    "range_scan_alternatives": [
 >> +                      {
 >> +                        "index": "d",
 >> +                        "ranges": [
 >> +                          "NULL <= d <= NULL"
 >> +                        ] /* ranges */,
 >> +                        "index_only": true,
 >> +                        "records": 1,
 >> +                        "cost": 2.21,
 >> +                        "rowid_ordered": true,
 >> +                        "chosen": false,
 >> +                        "cause": "higher_cost"
 >> +                      }
 >> +                    ] /* range_scan_alternatives */,
 >> +                    "analyzing_ror_intersect": {
 >> +                      "usable": false,
 >> +                      "cause": "too_few_ror_scans"
 >> +                    } /* analyzing_ror_intersect */
 >> +                  } /* analyzing_range_alternatives */,
 >> +                  "chosen_range_access_summary": {
 >> +                    "chosen": false
 >

 >GB this probably means that we abandon any range method and will go
 >with full index scan or full table scan...?

Yes, because "range_scan_alternatives" -> "higher_cost".

But "chosen_range_access_summary" -> "chosen": false is no longer
printed, due to the impossible-condition question above.

 >> @@ -4252,9 +4885,53 @@ QUERY_ID	TRACE	MISSING_BYTES_BEYOND_MAX_
 >>              "records_estimation": [
 >>                {
 >>                  "table": "D",
 >> -                "more_range_optimizer_trace": {
 >> -                  "TODO?": "yes!"
 >> -                } /* more_range_optimizer_trace */
 >> +                "range_analysis": {
 >> +                  "table_scan": {
 >> +                    "records": 4,
 >> +                    "cost": 4.9068
 >> +                  } /* table_scan */,
 >> +                  "potential_range_indices": [
 >> +                    {
 >> +                      "index": "d",
 >> +                      "key_parts": [
 >> +                        "d"
 >> +                      ] /* key_parts */
 >> +                    }
 >> +                  ] /* potential_range_indices */,
 >> +                  "best_full_index_scan": {
 >> +                    "index": "d",
 >> +                    "cost": 1.8698,
 >> +                    "chosen": true
 >> +                  } /* best_full_index_scan */,
 >> +                  "group_index_range": {
 >> +                    "chosen": false,
 >> +                    "cause": "not_single_table"
 >> +                  } /* group_index_range */,
 >> +                  "analyzing_range_alternatives": {
 >> +                    "range_scan_alternatives": [
 >> +                      {
 >> +                        "index": "d",
 >> +                        "ranges": [
 >> +                          "3 <= d <= 3"
 >> +                        ] /* ranges */,
 >> +                        "index_only": true,
 >> +                        "records": 1,
 >> +                        "cost": 2.21,
 >> +                        "rowid_ordered": true,
 >> +                        "chosen": false,
 >> +                        "cause": "higher_cost"
 >

 >GB how can a range access with 3 <= d <= 3 (i.e. an index lookup?) be
 >more expensive (than index scan?)?

A guess: there are very few records in D, and a full index scan means
that the values we're looking for are in the index. Thus,
best_full_index_scan reads the index only. Further, range access with
d=3 is not the same as the ref access that is actually chosen a bit
further down.
I think "best_covering_index_scan" makes this more explicit. Agree?

 >> -  ] /* steps */
 >> -}	0	0
 >> +          }	662	0
 >

 >GB This 662 says that 662 bytes are missing because the trace is bigger
 >than @@optimizer_trace_max_mem_size (it has grown due to range opt
 >trace); I suggest raising the variable for this query.

No longer a problem after changing the default size to 1MB.

 >> +                  "potential_range_indices": [
 >> +                    {
 >> +                      "index": "i1",
 >> +                      "key_parts": [
 >> +                        "key1"
 >> +                      ]
 >> +                    },
 >> +                    {
 >> +                      "index": "i2",
 >> +                      "key_parts": [
 >> +                        "key2"
 >> +                      ]
 >> +                    },
 >> +                    {
 >> +                      "index": "i3",
 >> +                      "usable": false,
 >> +                      "cause": "not_in_keys_to_use"
 >

 >GB Looks like not_in_keys_to_use is related to keys which the WHERE
 >clause does not use?

Changed to "not_usable_by_conditions" (not mentioning WHERE since it
may also be an ON condition)

 >> +                  "analyzing_index_merge": [
 >> +                    {
 >> +                      "indices_to_merge": [
 >> +                        {
 >> +                          "range_scan_alternatives": [
 >> +                            {
 >> +                              "index": "i1",
 >> +                              "ranges": [
 >> +                                "key1 < 3"
 >> +                              ],
 >> +                              "index_only": true,
 >
 >GB Here I don't understand what "index_only" means. The SELECT wants all
 >columns, and index i1 doesn't contain all columns...?

Good question. In this case, we're analyzing index_merge, which
explicitly calls get_key_scans_params with
index_read_must_be_used==TRUE, and we have

       bool read_index_only= index_read_must_be_used ? TRUE :
                             (bool) param->table->covering_keys.is_set(keynr);

So my interpretation is that we only need the values available in the
index to perform the index_merge. The output of the merge is then
used for looking up records.
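The quoted C++ can be paraphrased like this (a Python sketch with illustrative names, not the real get_key_scans_params interface): index_merge forces index-only reads because each index only has to deliver rowids for the merge, while a normal range scan is index-only exactly when the key covers every column the query needs.

```python
def read_index_only(index_read_must_be_used, covering_keys, keynr):
    """Mirror of the quoted expression, with illustrative Python types.

    index_read_must_be_used: True when analyzing index_merge candidates.
    covering_keys: set of key numbers that cover all needed columns.
    keynr: the key being considered.
    """
    if index_read_must_be_used:
        return True                 # index_merge: only rowids come from the index
    return keynr in covering_keys   # plain range scan: index-only iff key is covering
```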

 >> +                              "records": 3,
 >> +                              "cost": 1.6526,
 >> +                              "rowid_ordered": false,
 >> +                              "chosen": true
 >> +                            }
 >> +                          ],
 >> +                          "index_to_merge": "i1",
 >> +                          "new_cost": 1.6526
 >> +                        },
 >> +                        {
 >> +                          "range_scan_alternatives": [
 >> +                            {
 >> +                              "index": "i2",
 >> +                              "ranges": [
 >> +                                "1020 < key2"
 >> +                              ],
 >> +                              "index_only": true,
 >> +                              "records": 42,
 >> +                              "cost": 10.282,
 >> +                              "rowid_ordered": false,
 >> +                              "chosen": true
 >> +                            }
 >> +                          ],
 >> +                          "index_to_merge": "i2",
 >> +                          "new_cost": 11.935
 >
 >GB ok, so "new_cost" is a cumulated cost. Maybe we should name it
 >"cumulated_cost"? or a better name?

Agree. cumulated_cost it is.

 >> +                        }
 >> +                      ],
 >> +                      "scan_cost": 11.935,
 >

 >GB "scan_cost": what type of scan? this seems to be the cost of the range
 >accesses above.

scan_cost -> cost_of_reading_ranges

 >> +                      "total_cost": 58.363
 >
 >GB Maybe we should also print the difference between scan_cost and
 >total_cost, which seems to be some Unique::get_use_cost?

I added some more info in this area:

                           ] /* range_scan_alternatives */,
                           "index_to_merge": "i2",
                           "cumulated_cost": 11.935
                         }
                       ] /* indices_to_merge */,
                       "cost_of_reading_ranges": 11.935,
                       "cost_sort_rowid_and_read_disk": 8.0666,
                       "cost_duplicate_removal": 38.361,
                       "total_cost": 58.363

 >> +                    }
 >> +                  ],
 >> +                  "chosen_range_access_summary": {
 >> +                    "table_read_plan_type": "index_merge",
 >> +                    "index_merge_of": [
 >> +                      {
 >> +                        "table_read_plan_type": "range_scan",
 >> +                        "index": "i1",
 >> +                        "records": 3,
 >> +                        "ranges": [
 >> +                          "key1 < 3"
 >> +                        ]
 >> +                      },
 >> +                      {
 >> +                        "table_read_plan_type": "range_scan",
 >> +                        "index": "i2",
 >> +                        "records": 42,
 >> +                        "ranges": [
 >> +                          "1020 < key2"
 >> +                        ]
 >> +                      }
 >> +                    ],
 >> +                    "total_records": 45,
 >> +                    "total_cost": 58.363,
 >> +                    "chosen": true
 >> +                  },
 >> +                  "records": 45
 >> +                }
 >> +              }
 >> +            ]
 >> +          },
 >> +          {
 >> +            "considered_execution_plans": [
 >> +              {
 >> +                "table": "t1",
 >> +                "best_access_path": {
 >> +                  "considered_access_paths": [
 >> +                    {
 >> +                      "access_type": "scan",
 >> +                      "using_range_access": true,
 >> +                      "with_where_cond_processing": {
 >> +                        "records": 45,
 >> +                        "cost": 58.363
 >> +                      },
 >> +                      "chosen": true
 >> +                    }
 >> +                  ]
 >> +                },
 >> +                "cost_for_plan": 58.363,
 >> +                "records_for_plan": 45,
 >> +                "chosen": true
 >> +              }
 >> +            ]
 >> +          },
 >> +          {
 >> +            "attaching_conditions_to_tables": {
 >> +              "original_condition": "((`test`.`t1`.`key1` < 3) or 
(`test`.`t1`.`key2` > 1020))",
 >> +              "attached_conditions": [
 >> +                {
 >> +                  "table": "t1",
 >> +                  "attached": "((`test`.`t1`.`key1` < 3) or 
(`test`.`t1`.`key2` > 1020))"
 >
 >GB interesting; as we chose index_merge, I imagine that we're doing index
 >range accesses, so it should not be needed to have attached conditions
 >like this (those are conditions which are tested after fetching
 >records). This is unrelated to your patch.

Interesting indeed... conditions directly involved in setting the
ranges should not be needed here. We should investigate this further.
From
http://dev.mysql.com/doc/refman/5.5/en/range-access-single-part.html:

"In general (and as demonstrated by the preceding example), the
condition used for a range scan is less restrictive than the WHERE
clause. MySQL performs an additional check to filter out rows that
satisfy the range condition but not the full WHERE clause. "


 >> +                }
 >> +              ]
 >> +            }
 >> +          }
 >> +        ]
 >> +      }
 >> +    }
 >> +  ]
 >> +}
 >> +
 >> +EXPLAIN SELECT key2, MIN(key2_1) FROM t2 GROUP BY key2;
 >> +id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
 >> +1	SIMPLE	t2	range	NULL	i2_1	4	NULL	103	Using index for group-by
 >> +
 >> +SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
 >> +TRACE
 >> +{
 >> +  "top_query": "EXPLAIN SELECT key2, MIN(key2_1) FROM t2 GROUP BY key2",
 >> +  "steps": [
 >> +    {
 >> +      "expanded_query": "/* select#1 */ select `key2` AS 
`key2`,min(`key2_1`) AS `MIN(key2_1)` from `test`.`t2` group by `key2`"
 >> +    },
 >> +    {
 >> +      "join_preparation": {
 >> +        "select#": 1,
 >> +        "steps": [
 >> +        ]
 >> +      }
 >> +    },
 >> +    {
 >> +      "join_optimization": {
 >> +        "select#": 1,
 >> +        "steps": [
 >> +          {
 >> +            "constant_tables": [
 >> +            ],
 >> +            "records_estimation": [
 >> +              {
 >> +                "table": "t2",
 >> +                "const_keys_added": {
 >
 >GB Here again, understanding "const_key" requires knowing code.

The only problem is that I don't fully understand which indices are put
into const_keys either. If you have a better naming suggestion, I will
change it.

 >> +                  "keys": [
 >> +                    "i2_1",
 >> +                    "i2_2"
 >> +                  ],
 >> +                  "cause": "group_by"
 >> +                },
 >> +                "range_analysis": {
 >> +                  "table_scan": {
 >> +                    "records": 1024,
 >> +                    "cost": 215.15
 >> +                  },
 >> +                  "potential_range_indices": [
 >> +                    {
 >> +                      "index": "i1a",
 >> +                      "usable": false,
 >> +                      "cause": "not_in_keys_to_use"
 >> +                    },
 >> +                    {
 >> +                      "index": "i1b",
 >> +                      "usable": false,
 >> +                      "cause": "not_in_keys_to_use"
 >> +                    },
 >> +                    {
 >> +                      "index": "i2_1",
 >> +                      "key_parts": [
 >> +                        "key2",
 >> +                        "key2_1"
 >> +                      ]
 >> +                    },
 >> +                    {
 >> +                      "index": "i2_2",
 >> +                      "key_parts": [
 >> +                        "key2",
 >> +                        "key2_1"
 >> +                      ]
 >> +                    }
 >> +                  ],
 >> +                  "best_full_index_scan": {
 >> +                    "index": "i2_1",
 >> +                    "cost": 235.03,
 >> +                    "chosen": false
 >GB not chosen because more expensive than table scan?

More expensive than the best plan found so far. In this case: table_scan.

Added: "cause": "higher_cost"

 >> +                  },
 >> +                  "group_index_range": {
 >> +                    "potential_group_range_indices": [
 >> +                      {
 >> +                        "index": "i1a",
 >> +                        "usable": false,
 >> +                        "covering_index": false
 >> +                      },
 >> +                      {
 >> +                        "index": "i1b",
 >> +                        "usable": false,
 >> +                        "covering_index": false
 >> +                      },
 >> +                      {
 >> +                        "index": "i2_1",
 >> +                        "covering": true,
 >> +                        "records": 103,
 >> +                        "cost": 50.6
 >> +                      },
 >> +                      {
 >> +                        "index": "i2_2",
 >> +                        "covering": true,
 >> +                        "records": 103,
 >> +                        "cost": 50.6
 >> +                      }
 >> +                    ]
 >GB Was group_index_range chosen or not?
 >I could not find a place where this group_index_range is marked
 >"chosen".

Added "chosen": true/false and "cause": "cost".

 >> +                  },
 >> +                  "best_group_range_summary": {
 >> +                    "table_read_plan_type": "index_group",
 >> +                    "index": "i2_1",
 >> +                    "records": 103,
 >> +                    "cost": 50.6
 >> +                  },
 >> +                  "chosen_range_access_summary": {
 >> +                    "table_read_plan_type": "index_group",
 >> +                    "index": "i2_1",
 >> +                    "records": 103,
 >> +                    "cost": 50.6,
 >> +                    "total_records": 103,
 >> +                    "total_cost": 50.6,
 >GB what's the diff between "total_records" and "records"?

"records" is the number of records from accessing index_group, while
"total_records" is the number of records in the entire plan. These
produce different numbers for index_merge. To clarify, it now prints
records_in_plan/cost_of_plan:

For index_merge:

                   "chosen_range_access_summary": {
                     "range_access_plan": {
                       "type": "index_merge",
                       "index_merge_of": [
                         {
                           "type": "range_scan",
                           "index": "i1",
                           "records": 3,
                           "ranges": [
                             "key1 < 3"
                           ] /* ranges */
                         },
                         {
                           "type": "range_scan",
                           "index": "i2",
                           "records": 42,
                           "ranges": [
                             "1020 < key2"
                           ] /* ranges */
                         }
                       ] /* index_merge_of */
                     } /* range_access_plan */,
                     "records_in_plan": 45,
                     "cost_of_plan": 58.363,
                     "chosen": true


For index_group:

                   "chosen_range_access_summary": {
                     "range_access_plan": {
                       "type": "index_group",
                       "index": "i2_1",
                       "records": 103,
                       "cost": 50.6
                     } /* range_access_plan */,
                     "records_in_plan": 103,
                     "cost_of_plan": 50.6,


 >> +                    "chosen": true
 >> +                  },
 >> +                  "records": 103

I removed this ^ "records"

 >> +EXPLAIN SELECT * FROM t1 WHERE key2=10 OR key3=3 OR key4 <=> null;
 >> +id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
 >> +1	SIMPLE	t1	index_merge	i2,i3,i4	i2,i3	4,4	NULL	2	Using union(i2,i3); Using 
where
 >

 >GB do you know why we don't also union with i4?

Because key4 is defined as NOT NULL, that part of the condition is
always false. This is made more explicit by the previously suggested
change of displaying what happens inside range analysis:

                   ] /* potential_range_indices */,
                   "impossible_condition": {
                     "table": "t1",
                     "field": "key4",
                     "cause": "null_field_in_non_null_column"
                   } /* impossible_condition */,

 >> +            "records_estimation": [
 >> +              {
 >> +                "table": "t2",
 >> +                "range_analysis": {
 >> +                  "table_scan": {
 >> +                    "records": 1024,
 >> +                    "cost": 215.15
 >> +                  },
 >> +                  "potential_range_indices": [
 >> +                    {
 >> +                      "index": "i1a",
 >> +                      "usable": false,
 >> +                      "cause": "not_in_keys_to_use"
 >> +                    },
 >> +                    {
 >> +                      "index": "i1b",
 >> +                      "usable": false,
 >> +                      "cause": "not_in_keys_to_use"
 >> +                    },
 >> +                    {
 >> +                      "index": "i2_1",
 >> +                      "key_parts": [
 >> +                        "key2",
 >> +                        "key2_1"
 >> +                      ]
 >GB here we don't have "usable":true. We need to solve the problem of
 >boolean attributes which are sometimes present sometimes absent: when
 >absent, what is their implicit value? Or should we explicitely print
 >"true" for them?
 >For what it's worth, in Python the None value evaluate to false.

Ok, I didn't pay much attention to always display T or F for boolean
outputs. In this case I added usable T to potential_range_indices:

                      "index": "i2_1",
                       "usable": true,
                       "key_parts": [
                         "key2",
                         "key2_1"
                       ] /* key_parts */

Are you suggesting that any Opt_trace_object that may contain
<something>:F should in all other cases contain <something>:T? That
will be a big job...

 >> +                    },
 >> +                    {
 >> +                      "index": "i2_2",
 >> +                      "key_parts": [
 >> +                        "key2",
 >> +                        "key2_1"
 >> +                      ]
 >> +                    }
 >> +                  ],
 >> +                  "range_scan_possible": false,
 >GB does the trace above tell why range scan is not possible? I could not find.

It's because the condition evaluates to always true. The code that
figures this out is very hard to trace, so I prefer not to do that, but
the result (SEL_TREE::ALWAYS => condition always true) is pretty easy
to print. I added this:

                   "range_scan_possible": false,
                   "cause": "condition_always_true",

 >> +            "records_estimation": [
 >> +              {
 >> +                "table": "t1",
 >> +                "range_analysis": {
 >> +                  "table_scan": {
 >> +                    "records": 1024,
 >> +                    "cost": 217.15
 >> +                  },
 >> +                  "potential_range_indices": [
  ...
 >> +                  ],
 >> +                  "group_index_range": {
 >> +                    "chosen": false,
 >> +                    "cause": "not_group_by_or_distinct"
 >> +                  },
 >> +                  "analyzing_index_merge": [
 >> +                    {
 >> +                      "indices_to_merge": [
 >> +                        {
 >> +                          "range_scan_alternatives": [
 >> +                            {
 >> +                              "index": "i1",
 >> +                              "ranges": [
 >> +                                "1 < key1"
 >> +                              ],
 >> +                              "index_only": true,
 >> +                              "records": 1023,
 >> +                              "cost": 227.35,
 >> +                              "rowid_ordered": false,
 >> +                              "chosen": false,
 >> +                              "cause": "higher_cost"
 >
 >GB higher than table scan?

Higher than the best access method evaluated so far. In this case:
table_scan. We don't know that from the code easily. All we know is
that the cost is higher than some previously found cost. It should be
fairly easy to understand from the trace though.
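The comparison being described can be sketched as follows (illustrative Python, not the actual optimizer code): each alternative is judged against the cheapest access method found so far, which may well be the table scan, so "cause": "cost" only means "something cheaper was already found".

```python
def pick_cheapest(table_scan_cost, alternatives):
    """alternatives: list of (name, cost) pairs; returns (winner, trace)."""
    best_cost, best = table_scan_cost, "table_scan"
    trace = []
    for name, cost in alternatives:
        if cost < best_cost:
            best_cost, best = cost, name
            trace.append({"index": name, "chosen": True})
        else:
            # Rejected relative to the running best, not any one named rival.
            trace.append({"index": name, "chosen": False, "cause": "cost"})
    return best, trace
```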

 >> +                            }
 >> +                          ],
 >> +                          "chosen": false,
 >> +                          "cause": "expensive"
 >
 >GB could we use the same cause ("higher_cost"), or do "expensive" and
 >"higher_cost" carry different meanings?

higher_cost, expensive etc => cause:cost as suggested further down the review.

 >> +                    "analyzing_ror_intersect": {
 >> +                      "intersecting_indices": [
 >> +                        {
 >> +                          "index": "cola",
 >> +                          "used_in_intersect": true,
 >> +                          "matching_records_now": 512,
 >> +                          "cost_now": 518.14,
 >> +                          "covering_now": false
 >
 >GB why "_now" ?

If the intersection of indices contains all fields, the intersection
is covering. Changed to "isect_covering_with_this_index".
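
The covering test described here can be sketched with bitmasks (a hypothetical stand-in, not the actual ROR_INTERSECT_INFO code): the intersection becomes covering once the union of fields available from the intersected indices contains every field the query needs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the "covering" check: field sets are modeled
// as bitmasks; none of these names are MySQL's.
struct RorIntersectSketch {
  uint64_t fields_covered = 0;  // union of fields from indices added so far

  // Add one index's field set; return true if the intersection now
  // covers every needed field (i.e. no table rows must be read).
  bool add_index(uint64_t index_fields, uint64_t needed_fields) {
    fields_covered |= index_fields;
    return (fields_covered & needed_fields) == needed_fields;
  }
};
```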

 >> === modified file 'sql/handler.cc'
 >> --- a/sql/handler.cc	2010-09-27 09:26:39 +0000
 >> +++ b/sql/handler.cc	2010-10-07 09:02:31 +0000
 >> @@ -4381,14 +4381,12 @@ handler::multi_range_read_info_const(uin
 >>    *bufsz= 0;
 >>
 >>    seq_it= seq->init(seq_init_param, n_ranges, *flags);
 >> -#ifndef NO_OPT_TRACE_FOR_RANGE_OPT
 >>    /*
 >>      Given that this method can be overridden by engines, it doesn't sound
 >>      smart to do tracing here. seq->next() produces tracing when it's equal
to
 >>      sel_arg_range_seq_next() (prints the range).
 >>    */
 >>    Opt_trace_array ota(thd->opt_trace, "ranges");
 >> -#endif
 >
 >GB What do you think about my comment under #ifndef? If you find that
 >there's no big problem, or no choice, you can delete it.

I haven't found any other way to do this.

 >> === modified file 'sql/opt_range.cc'
 >> --- a/sql/opt_range.cc	2010-09-27 09:26:39 +0000
 >> +++ b/sql/opt_range.cc	2010-10-07 09:02:31 +0000
 >> @@ -30,7 +30,8 @@
 >>      and builds lists of intervals (in index/partitioning space), such that
 >>      all possible records that match the condition are contained within the
 >>      intervals.
 >> -    The entry point for the range analysis module is get_mm_tree() function.
 >> +    The entry point for the range analysis module is get_mm_tree()
 >> +    (mm=min_max) function.
 >
 >GB ahah, one less mystery! thanks!
 >
 >GB How about having a "RANGE_OPTIMIZER" flag, used in the constructor of
 >a few top-level objects/arrays of the range opt code, so that a user
 >can request hiding of the trace pieces generated by the range
 >optimizer?

Of course. I have added a flag and use it so that, when disabled, we
still get the table_scan and range access summary parts, which I
find interesting even if the intermediate decisions are not shown:

                 "range_analysis": {
                   "table_scan": {
                     "records": 1024,
                     "cost": 217.15
                   } /* table_scan */,
                   "potential_range_indices": "...",
                   "group_index_range": "...",
                   "analyzing_index_merge": "...",
                   "chosen_range_access_summary": {
                     "range_access_plan": {
                       "type": "index_merge",
                       "index_merge_of": [
                         {
                           "type": "range_scan",
                           "index": "i1",
                           "records": 3,
                           "ranges": [
                             "key1 < 3"
                           ] /* ranges */
                         },
                         {
                           "type": "range_scan",
                           "index": "i2",
                           "records": 42,
                           "ranges": [
                             "1020 < key2"
                           ] /* ranges */
                         }
                       ] /* index_merge_of */
                     } /* range_access_plan */,
                     "records_in_plan": 45,
                     "cost_of_plan": 58.363,
                     "chosen": true
                   } /* chosen_range_access_summary */
                 } /* range_analysis */

Is this acceptable? Should we go as far as
"range_analysis": "..."
?

 >> +        continue;
 >> +      }
 >> +      key++;                                    // Skip null byte
 >> +      store_length--;
 >> +    }
 >> +    field->set_key_image(key, key_part->length);
 >

 >GB do you know what set_key_image() does? I imagine it leaves data in
 >some member of the field; so the content of that member is different
 >depending on whether trace is enabled or not; we need to be sure that
 >this difference is innocuous.

I am pretty sure it sets the key value (less-than value, greater-than
value etc.) in the field's part of record[0] so that field->val_*()
(which is instructed to read from some offset in record[0]) can output
the key value as a string. I am also pretty sure record[0] is
overwritten every time new data is requested, so whatever we put in
here will not be visible when reading records.

Btw: note that this is SergeyP's code

 >
 >> +    if (field->type() == MYSQL_TYPE_BIT)
 >> +      (void) field->val_int_as_str(&tmp, 1);
 >> +    else
 >> +      field->val_str(&tmp);
 >
 >GB do we need to have such special case for MYSQL_TYPE_BIT?

I have no idea. SP wrote this.

 >> +      out->append("'",1);
 >
 >GB out->append('\'') should do. And, what does this quote serve for? Is
 >it that for a multi-column key we want to print
 >part1'part2'part3?

Yes, that's what it is for. If we go for trace_basic_info() (comment
further down) we should be able to do without this.

 >If this has to be, we see that we compute key+store_length above,
 >compare it to key_end, then we go to the next iteration, where we
 >compute key+store_length again (in the incrementation in the for(...)
 >line) and compare it with key_end again (in the for(...)), it's a bit
 >inefficient.
 >Also, we don't append a quote after NULL (as we just "continue"), this
 >sounds strange. Will a multi-column key (NULL,3,4) be printed as
 >NULL3'4 ? That should be tested.
 >I suggest to test every branch of this function (the BIT one, the
 >quote one, etc) in some .test file.

Ok, but can we reach a conclusion on tracing QUICK vs TRP first?

 >> @@ -1996,8 +2052,54 @@ public:
 >>      }
 >>      DBUG_RETURN(quick);
 >>    }
 >> +  const char *get_name() const {return "range_scan";}
 >> +  void trace_basic_info(const PARAM *param, Opt_trace_object *trace) const
 >> +  {
 >> +    trace->add("table_read_plan_type", get_name()).
 >> +      add("index",
param->table->key_info[param->real_keynr[key_idx]].name).
 >
 >GB is usage of real_keynr here inspired from existing places of code?
 >I need to ask, I don't know when real_keynr is good and when not.
 >About the latest patch: keynr_in_table could be const (maybe the
 >compiler can do more optimizations with this information).

Variables storing key number can refer to the position in either table
(table->key_info[<this>]) or in param (param->real_keynr[<this>]). If
the variable is the position in param, we use real_keynr to get the
position in table.

You really have to check which one you're dealing with for each class.
TRP::key_idx is key number in param. In the followup patch, I tried to
make this more explicit. My code was wrong in some places due to
copy-pasting.
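
The two numbering spaces can be illustrated with a small stand-alone sketch (types and names are hypothetical, not MySQL's): a param-local key index must be translated through a real_keynr-style map before indexing into the table's key array.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of the two key-numbering spaces: the range
// optimizer's PARAM only considers a subset of the table's indexes,
// so param-local key numbers differ from table-global ones.
struct TableSketch {
  std::vector<std::string> key_names;  // indexed by table-global key number
};

struct ParamSketch {
  std::vector<int> real_keynr;  // param-local position -> table-global number
};

// Resolve a param-local key index to an index name, mirroring
// param->table->key_info[param->real_keynr[key_idx]].name.
const std::string &key_name(const TableSketch &t, const ParamSketch &p,
                            int param_key_idx) {
  return t.key_names[p.real_keynr[param_key_idx]];
}
```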

 >
 >> +      add("records", records);
 >
 >GB should we print also "read_cost" (like we do for ror_intersect)? And
 >the is_ror member?

This one has been removed (see further up)

 >> +      param->table->key_info[param->real_keynr[key_idx]].key_part;
 >
 >GB you may want to cache the address of
 >param->table->key_info[param->real_keynr[key_idx]] to avoid computing
 >twice this long expression.
 >
 >> +    for (SEL_ARG *current= key; current; current= current->next)

done in followup patch already sent to you

 >GB "current" could be const
 >
 >> +    {
 >> +      String multikey;
 >> +      for (SEL_ARG *part= current; part; part= part->next_key_part)
 >
 >GB "part" could be const. In all logic, in those functions which only do
 >tracing, all pointers to the optimizer's structures should be const,
 >to emphasize that this only updates the trace.

Including TRP_ROR_INTERSECT and others with double-pointers? That
requires a bunch of casting as well:

from this:
    for (st_ror_scan_info **cur_scan= first_scan;
to this:
    for (const st_ror_scan_info **cur_scan= (const st_ror_scan_info**) first_scan;

That's a lot of noise as I see it, but I'll do it if you insist :)
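
For what it's worth, the cast noise comes from a general C++ rule rather than anything MySQL-specific: `T**` does not implicitly convert to `const T**` (that would allow writing to a const object through the inner pointer), but it does convert to `const T *const *`. Declaring the cursor that way needs no cast at all; a minimal sketch with a stand-in type:

```cpp
#include <cassert>

// Stand-in for st_ror_scan_info; the name is hypothetical.
struct st_ror_scan_info_sketch { int keynr; };

int sum_keynr(st_ror_scan_info_sketch **first,
              st_ror_scan_info_sketch **last) {
  int sum = 0;
  // No cast needed: T** converts implicitly to const T *const *,
  // and the const pointee documents that tracing never mutates the scans.
  for (const st_ror_scan_info_sketch *const *cur = first; cur != last; cur++)
    sum += (*cur)->keynr;
  return sum;
}
```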

 >
 >> +    {
 >> +      KEY key=
param->table->key_info[param->real_keynr[(*cur_scan)->keynr]];
 >
 >GB print_ror_scans_arr gets a KEY through
 >   param->table->key_info[(*cur_scan)->keynr]
 >so I am clueless about whether param->real_keynr is good/bad/useless
 >here. The description of real_keynr says "Only makes sense if
 >using_real_indexes==TRUE"...? Is
 >param->table->key_info[(*cur_scan)->keynr] sometimes unreliable?

Fixed in followup patch

 >> +      Opt_trace_object trace_isect_idx(param->thd->opt_trace);
 >> +      trace_isect_idx.add("table_read_plan_type", "range_scan").
 >GB ok, looks like inside the for() we print ranges as in the
 >TRP_RANGE case.

yes, this mimics TRP_RANGE::trace_...()

 >> +        add("index", key.name).
 >> +        add("records", (*cur_scan)->records);
 >> +      {
 >GB I'm not sure we need to open a new scope here. Destructors are called
 >in the reverse order of constructors, so trace_range will be closed
 >before trace_isect_idx, as is desirable.

No, we don't. Most likely, I did this because I previously added
something to trace_isect after the for() loop.
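
GB's point about destruction order can be demonstrated with a tiny stand-alone Tracer type (a stand-in for Opt_trace_object/Opt_trace_array, not the real class): within one scope, destructors run in reverse order of construction, so the inner trace struct closes before the outer one without any extra `{ }` block.

```cpp
#include <cassert>
#include <string>

// Records construction/destruction events so the order can be checked.
static std::string log_out;

struct Tracer {
  std::string name;
  Tracer(const std::string &n) : name(n) { log_out += "open:" + name + " "; }
  ~Tracer() { log_out += "close:" + name + " "; }
};

void run() {
  Tracer outer("isect_idx");
  Tracer inner("range");
}  // inner is destroyed first, then outer
```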

 >> +        for (SEL_ARG *current= (*cur_scan)->sel_arg;
 >> +             current;
 >> +             current= current->next)
 >> +        {
 >> +          String range;
 >> +          for (SEL_ARG *part= current;
 >> +               part;
 >> +               part= part->next_key_part)
 >> +          {
 >> +            KEY_PART_INFO *key_part= key.key_part+part->part;
 >> +            append_range(&range, key_part,
 >> +                         part->min_value, key_part->length,
 >> +                         part->max_value, key_part->length,
 >> +                         part->min_flag|part->max_flag);
 >GB I suggest to make the code similar to the TRP_RANGE case (or the
 >contrary); for example, in both do cache key_part (as above), and do
 >cache "length" (as in TRP_RANGE).

Good idea

 >GB Is this 3-loop function covered by tests?

Yes, search for "intersect_of" in the result file.

 >> +          }
 >> +          trace_range.add(range.c_ptr());
 >GB In the latest tree, there is add(char*,size_t), so a suggestion is to
 >change all calls like above to:
 >add(range.ptr(),range.length());
 >Because the value (range.ptr()) may contain non-ASCII characters, it
 >should be in UTF8 (JSON parses only UTF8); the add(char*, size_t) will
 >additionally do escaping of strange characters found in the value
 >(like \0, \n) so that a JSON parser can parse it.
 >Let's study how to achieve this.
 >append_range() uses print_key2() which uses Field::val_str():
 >   inline String *val_str(String *str) { return val_str(str, str); }
 >If we look at Field_blob::val_str():
 >String *Field_blob::val_str(String *val_buffer __attribute__((unused)),
 >			    String *val_ptr)
 >{
 >   ASSERT_COLUMN_MARKED_FOR_READ;
 >   char *blob;
 >   memcpy(&blob, ptr+packlength, sizeof(char*));
 >   if (!blob)
 >     val_ptr->set("",0,charset());	// A bit safer than ->length(0)
 >   else
 >     val_ptr->set((const char*) blob,get_length(ptr),charset());
 >   return val_ptr;
 >}
 >this val_ptr->set() changes the String's charset to be the field's
 >charset.
 >So it is possible that in the first iteration of the loops of
 >print_key2() or of trace_basic_info(), "range"'s charset is set to X,
 >"range" receives characters in this charset, then in a second
 >iteration of the loop, "range"'s charset becomes Y and "range"
 >receives characters in this charset. In the end, we don't know what
 >charset "range" is in, and it contains a mix of charsets. Possibly not
 >in UTF8 at all, and we cannot make it UTF8 (it's too late, we have a
 >mix).
 >I suggest to change code to be like this:
 >- in print_key2(), change this line:
 >     out->append(tmp.ptr(), tmp.length());
 >to
 >     out->append(tmp.ptr(), tmp.length(), tmp.charset());
 >if all goes well this should convert from tmp.charset() to "out"'s
 >charset before storing in "out"
 >- we need "out" charset to be declared as UTF8 for the previous point
 >to work; thus, add after each declaration of "out" a mention of the
 >charset, for example in
 >void QUICK_RANGE_SELECT::dbug_dump(int indent, bool verbose),
 >change
 >       String range_info;
 >       range= *pr;
 >to
 >       String range_info;
 >       range_info.set_charset(system_charset_info);
 >       range= *pr;
 >

Ok, done

 >GB Could you also, please, add a test for this:
 >- a test of escaping, by storing \n in some range bound of your query,
 >which shows up in the trace via print_key2(), so that we can check
 >that displays well (escaped) and that it's valid JSON (you can use
 >WL4800_validate_json.py on the result file to see)


Done.

Problem: it breaks the line in the trace output. The cause is that
escape=FALSE in the call to do_add(..., bool escape).
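
To show why escape=FALSE breaks the trace: JSON string values may not contain raw control characters, so a literal newline in a range bound has to be emitted as the two characters `\n` (or a `\u00XX` sequence). A minimal escaper sketch, not the actual do_add() implementation:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Minimal sketch of the escaping a JSON emitter must apply to string
// values: quote, backslash, and control characters are rewritten so a
// JSON parser can consume the output.
std::string json_escape(const std::string &in) {
  std::string out;
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c < 0x20) {  // remaining control characters
          char buf[8];
          std::snprintf(buf, sizeof(buf), "\\u%04x", c);
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}
```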

 >- a test of charset correctness: you can add queries/indices to
 >optimizer_trace_charset.test (test present in the latest tree).

done


 >> @@ -2033,6 +2177,19 @@ public:
 >>                               MEM_ROOT *parent_alloc);
 >>    TABLE_READ_PLAN **first_ror; /* array of ptrs to plans for merged scans */
 >>    TABLE_READ_PLAN **last_ror;  /* end of the above array */
 >> +  const char *get_name() const {return "index_roworder_union";}
 >
 >GB Sometimes in the patch, words are separated by "_", sometimes they are
 >glued, like index_roworder_union: is there an idea behind the
 >differences?

Not really. I try to make them easily readable.

 >> +  void trace_basic_info(const PARAM *param, Opt_trace_object *trace) const
 >> +  {
 >> +    trace->add("table_read_plan_type", get_name());
 >> +    Opt_trace_array ota(param->thd->opt_trace, "union_of");
 >> +    for (TABLE_READ_PLAN **current= first_ror;
 >> +         current != last_ror;
 >> +         current++)
 >> +    {
 >> +      Opt_trace_object trp_info(param->thd->opt_trace);
 >> +      (*current)->trace_basic_info(param,&trp_info);
 >
 >GB space after ',' . I hope you can later do a pass to spot and fix all
 >little coding style issues.

I'll do my best. Hopefully emacs regexp will be helpful.

 >> +      add("index", get_idx_name()).
 >> +      add("records", records).
 >> +      add("cost", read_cost);
 >
 >GB This is scarce info compared to other trace_basic_info(), don't we
 >need to dump something here (range_tree? index_tree? no clue).

You're right. I found something interesting to print: range info :)

 >> @@ -2246,12 +2425,10 @@ int SQL_SELECT::test_quick_select(THD *t
 >>    else if (read_time <= 2.0 && !force_quick_range)
 >>      DBUG_RETURN(0);				/* No need for quick select */
 >>
 >> -#ifndef NO_OPT_TRACE_FOR_RANGE_OPT
 >> -  Opt_trace_object oto(thd->opt_trace, "range_analysis");
 >> +  Opt_trace_object trace_range(thd->opt_trace, "range_analysis");
 >>    Opt_trace_object(thd->opt_trace, "table_scan").
 >

 >GB In some other place of the patch further down, "table scan" is
 >replaced by "scan". We should decide and be consistent. What do you
 >prefer?
 >"scan" is vague; it could well be an "index scan", which is a way of
 >scanning. But "table scan" is vague too: scanning the index is a way
 >to scan the table, so a table scan could be an index scan. What "table
 >scan" really means in MySQL is "scan not using an index", "non-index
 >scan". The doc talks about "table scan" and defines it as "full table
 >scan" (see the description of "ALL" in
 >http://dev.mysql.com/doc/refman/5.5/en/explain-output.html ).
 >A google search for "oracle table scan" brings up "full table scan".

The reason I changed "table scan" to "scan" further up was that it
can be either a table or an index scan (or maybe I'm wrong?). The
reason for printing "table_scan" here is that it is a table scan, not
an index scan (again, maybe I'm wrong).

 >> @@ -2359,12 +2543,10 @@ int SQL_SELECT::test_quick_select(THD *t
 >>          chosen= true;
 >>        }
 >>
 >> -#ifndef NO_OPT_TRACE_FOR_RANGE_OPT
 >> -      Opt_trace_object(thd->opt_trace, "best_covering_scan").
 >> -        add(head->key_info[key_for_use].name, "index").
 >> -        add(key_read_time, "cost").
 >> -        add(chosen, "chosen");
 >> -#endif
 >> +      Opt_trace_object(thd->opt_trace, "best_full_index_scan").
 >
 >GB So here we have "full_index_scan". For non-index scan, we used
 >"table_scan", or "scan". We should normalize on one term.

As stated above, this is now called "best_covering_index_scan". I can
remove "covering", but I think "best" should be there to indicate that
there may be other covering indices, but this is the best one.

 >> +    }
 >> +
 >> +free_mem:
 >> +    Opt_trace_object trace_range_summary(thd->opt_trace,
 >> +                                         "chosen_range_access_summary");
 >> +    if (quick)
 >> +    {
 >> +      /* Guilhem question: I've worked with two alternative ways of
 >> +         printing the chosen range access.
 >> +
 >> +         1) Use trace_basic_info() of the TABLE_READ_PLAN object. This
 >> +            can print multi-key-part ranges separately (I prefer this)
 >> +
 >> +         2) Use dbug_dump() of the QUICK_SELECT object. Maybe
 >> +            conceptually more correct since it is what will be
 >> +            executed, but I cannot make it print multi-key-part ranges
 >> +            in an understandable way (and I tried a lot).
 >
 >GB you mean that in a QUICK_SELECT object, we don't have all info which
 >trace_basic_info() uses?
 >Assuming we go with trace_basic_info(), do you think that what it
 >prints may be misleading compared to what is in QUICK_SELECT?

The information in QUICK_SELECT is in some compressed form and I am
unable to find the individual ranges. I don't think it misleads in any
way; it only shows the ranges in a human-readable form.

 >> +      trace_range_summary.add("total_records", quick->records).
 >> +        add("total_cost", quick->read_time).
 >GB Isn't this info (records/cost) already printed by the
 >trace_basic_info() call above?

See comment about total_records further up in the .result section

 >> +        add("chosen", true);
 >> +    }
 >> +    else
 >> +    {
 >> +      trace_range_summary.add("chosen", false);
 >> +      print_quick(quick, &needed_reg);
 >GB You want to use print_quick() in this branch?

Hmm... I wonder what I was thinking here!

 >> @@ -3928,6 +4194,8 @@ TABLE_READ_PLAN *get_best_disjunct_quick
 >>    if (imerge_cost > read_time ||
 >> 
!param->thd->optimizer_switch_flag(OPTIMIZER_SWITCH_INDEX_MERGE_SORT_UNION))
 >>    {
 >> +    trace_best_disjunct.add("use_ror_index_merge", true).
 >GB Sometimes the patches uses "roworder" sometimes "ror", in
 >keys/values. We should pick one.

=> roworder


 >> @@ -4614,9 +4870,17 @@ TRP_ROR_INTERSECT *get_best_ror_intersec
 >>    double min_cost= DBL_MAX;
 >>    DBUG_ENTER("get_best_ror_intersect");
 >>
 >> +  Opt_trace_object trace_ror(param->thd->opt_trace,
"analyzing_ror_intersect");
 >> +
 >>    if ((tree->n_ror_scans < 2) ||
!param->table->file->stats.records ||
 >> 
!param->thd->optimizer_switch_flag(OPTIMIZER_SWITCH_INDEX_MERGE_INTERSECT))
 >> +  {
 >> +    if (tree->n_ror_scans < 2)
 >> +      trace_ror.add("usable", false).add("cause", "too_few_ror_scans");
 >
 >GB That means, if there is only one, there is nothing to intersect?
 >
 >> +    else
 >> +      trace_ror.add("usable", false).add("done_tracing", false);
 >
 >GB What does "done_tracing" mean here? looks like something is missing?

I haven't been able to find a query that gets here, so I have no idea
what it means.

 >> @@ -4683,28 +4947,42 @@ TRP_ROR_INTERSECT *get_best_ror_intersec
 >>    ROR_SCAN_INFO **intersect_scans_best;
 >>    cur_ror_scan= tree->ror_scans;
 >>    intersect_scans_best= intersect_scans;
 >> -  while (cur_ror_scan != tree->ror_scans_end &&
!intersect->is_covering)
 >>    {
 >> -    /* S= S + first(R);  R= R - first(R); */
 >> -    if (!ror_intersect_add(intersect, *cur_ror_scan, FALSE))
 >> +    Opt_trace_array ota(param->thd->opt_trace, "intersecting_indices");
 >> +    while (cur_ror_scan != tree->ror_scans_end &&
!intersect->is_covering)
 >>      {
 >> -      cur_ror_scan++;
 >> -      continue;
 >> -    }
 >> +      Opt_trace_object trace_idx(param->thd->opt_trace);
 >> +      trace_idx.add("index",
 >> +                   
param->table->key_info[(*cur_ror_scan)->keynr].name);
 >> +      /* S= S + first(R);  R= R - first(R); */
 >> +      if (!ror_intersect_add(intersect, *cur_ror_scan, FALSE))
 >> +      {
 >> +        trace_idx.add("used_in_intersect", false).
 >> +          add("cause", "cost_not_reduced");
 >
 >GB what does cost_not_reduced mean?

Adding this index to the intersect does not reduce the cost of the
plan. Changed to "does_not_reduce_cost_of_intersect".


 >> @@ -4728,9 +5006,22 @@ TRP_ROR_INTERSECT *get_best_ror_intersec
 >>      if (ror_intersect_add(intersect, cpk_scan, TRUE) &&
 >>          (intersect->total_cost < min_cost))
 >>      {
 >> +      Opt_trace_object (param->thd->opt_trace, "clustered_pk").
 >> +        add("cpk_scan_added_to_intersect", true).
 >> +        add("new_cost", intersect->total_cost);
 >
 >GB Sometimes we have "cost", "cost_now", "new_cost"; we need to
 >standardize.

Discussed above: cumulated_cost

 >> @@ -4805,6 +5105,12 @@ TRP_ROR_INTERSECT *get_best_covering_ror
 >>    ROR_SCAN_INFO **ror_scans_end= tree->ror_scans_end;
 >>    DBUG_ENTER("get_best_covering_ror_intersect");
 >>
 >> +  // None of our tests enter this function
 >> +  Opt_trace_object (param->thd->opt_trace).
 >> +    add("get_best_covering_ror_intersect", true).
 >> +    add("untested_code", true).
 >> +    add("need_tracing",true);
 >
 >GB by "our tests" do you mean the entire mtr testsuite?

yes

 >> +      else
 >> +        trace_idx.add("chosen", false).add("cause", "higher_cost");
 >

 >GB Does this mean, cost of range methods is bigger than cost of full
 >index scan? How could that be? I think a range method scans only a
 >piece of the index, so how can it be more costly than a full index
 >scan?

For an index scan, the interesting pieces of information can be found
in the index itself, so we do not need to look at the records. This can
probably only happen if the range does not filter away many records.

 >> @@ -5100,10 +5396,12 @@ QUICK_SELECT_I *TRP_ROR_INTERSECT::make_
 >>                                               "creating ROR-intersect",
 >>                                               first_scan, last_scan););
 >>      alloc= parent_alloc? parent_alloc: &quick_intrsect->alloc;
 >> -    for (; first_scan != last_scan;++first_scan)
 >> +    for (st_ror_scan_info **current= first_scan;
 >> +         current != last_scan;
 >> +         current++)
 >
 >GB It sounds much better indeed to not increment the first_scan
 >member. I guess no other code depended on this incrementation?

No other code depends on it

 >> @@ -7839,17 +8159,18 @@ walk_up_n_right:
 >>          seq->param->is_ror_scan= FALSE;
 >>      }
 >>    }
 >> -  OPT_TRACE2(seq->param->thd->opt_trace,
 >> -             print_range(seq->param->thd->opt_trace,
 >> -                        
seq->param->table->key_info[seq->real_keyno].key_part,
 >> -                         range->start_key.key,
 >> -                         range->start_key.length,
 >> -                         range->end_key.key,
 >> -                         range->end_key.length,
 >> -                         cur->min_key_flag | cur->max_key_flag));
 >>
 >>    seq->param->range_count++;
 >>   
seq->param->max_key_part=max(seq->param->max_key_part,key_tree->part);
 >> +
 >> +  if (key_range_trace.length())
 >> +  {
 >> +    DBUG_ASSERT(seq->param->thd->opt_trace);
 >> +    DBUG_ASSERT(seq->param->thd->opt_trace->is_started());
 >> +    Opt_trace_array *trace_range=
 >> +     
(Opt_trace_array*)seq->param->thd->opt_trace->get_current_struct();
 >

 >GB Do we have a way to pass this array down from the callers, for more
 >safety? Maybe by adding it as a member of "SEL_ARG_RANGE_SEQ seq" (I'm
 >looking for a way which doesn't affect the storage engine API).

Yes, that's possible, but I'm not sure I like that better than getting
it like here.

 >If impossible, it would be better to not cast, and declare trace_range
 >of type Opt_trace_struct*. Indeed, with casting, if the struct happens
 >to be Opt_trace_object, a wrong cast may give a crash including in
 >release binaries. Without casting, the worse which can happen is a
 >JSON syntax error, which asserts in debug builds but is not dangerous
 >in release binaries (see Opt_trace_context::syntax_error()).

Good point! Cast removed.

 >I have the goal to make get_current_struct() private soon if
 >possible. So, nobody could access "the current struct"; it could just
 >add objects/arrays (which implicitely adds them as children of the
 >current struct), or pass down an object/array if a function wants to
 >add to it.

Ok, but can I please get this patch in first?

 >> +static void append_range(String *out,
 >> +                         KEY_PART_INFO *key_parts,
 >> +                         const uchar *min_key, uint16 min_length,
 >> +                         const uchar *max_key, uint16 max_length,
 >> +                         uint flag)
 >>  {
 >> -  const uchar *key_end= key+used_length;
 >> -  String tmp;
 >> -  uint store_length;
 >> -  TABLE *table= key_part->field->table;
 >> -  my_bitmap_map *old_sets[2];
 >> +  if (out->length() > 0)
 >> +    out->append(" : ");

 >GB for literals like this,
 >    out->append(" : ", 3);
 >should avoid strlen(), or alternatively
 >    out->append(STRING_WITH_LEN(" : "));
(cut)
 >Is this addition of ":" covered in tests?

Yet another test not moved from the dev-test file to the committed
test file. Done.
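
GB's `STRING_WITH_LEN` suggestion works because, for a string literal, `sizeof(X)` is the array size including the terminating NUL, so `sizeof(X) - 1` gives the length at compile time with no strlen() call. A stand-alone sketch of the idea (MySQL defines a macro along these lines; the append helper here is hypothetical, standing in for String::append(str, len)):

```cpp
#include <cassert>
#include <string>

// Expands to "pointer, length" for a string literal; length is computed
// at compile time from the array size (NUL excluded).
#define STRING_WITH_LEN_SKETCH(X) (X), (sizeof(X) - 1)

// Hypothetical length-taking append, like String::append(const char*, size_t).
void append_bytes(std::string *out, const char *s, size_t len) {
  out->append(s, len);
}
```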

 >> @@ -11927,20 +12280,17 @@ static void print_quick(QUICK_SELECT_I *
 >>    DBUG_UNLOCK_FILE;
 >>    DBUG_VOID_RETURN;
 >>  }
 >> -#endif
 >> -
 >>
 >>  void QUICK_RANGE_SELECT::dbug_dump(int indent, bool verbose)
 >>  {
 >>    Opt_trace_context *out= current_thd->opt_trace;
 >>    /* purecov: begin inspected */
 >> -#ifndef NO_OPT_TRACE_FOR_RANGE_OPT
 >> -  Opt_trace_object oto(out, "range_select");
 >> -  oto.add(head->key_info[index].name, "index").
 >> -    add(static_cast<ulonglong>(max_used_key_length), "key_length");
 >> +  Opt_trace_object oto(out, "range_scan");
 >> +  oto.add("index", head->key_info[index].name).
 >> +    add("key_length", max_used_key_length).
 >> +    add("used_keyparts", used_key_parts);
 >

 >GB This isn't covered in optimizer_trace_no_prot; do we need to print to
 >the trace in this dbug_dump()? My experience is that such dbug_dump()
 >is called from several contexts, including some where a trace would
 >not be desirable.
 >It looks like it's only print_quick() which calls dbug_dump()
 >functions, and print_quick() is called only once, around line 2757, in
 >an "else" branch, whereas the alternate branch prints to the optimizer
 >trace; I suggest not making print_quick() print to the opt trace, same
 >for dbug_dump() functions.

I think we should revert print_quick() and dbug_dump() to how they are
in mysql-next-mr-opt-backporting, i.e., not use them for tracing at all.

 >> @@ -863,6 +864,14 @@ class SQL_SELECT :public Sql_alloc {
 >>    bool check_quick(THD *thd, bool force_quick_range, ha_rows limit)
 >>    {
 >>      key_map tmp(key_map::ALL_BITS);
 >> +
 >> +    /* Entrypoint for inser/update/deletes with condition.
 >> +       test_quick_select adds tracing with keys, and we are currently
 >> +       in a trace array that does not accept keys. Todo: move this
 >> +       wrapper trace object to sql_insert/update/delete where it is
 >> +       more meaningful.
 >> +    */
 >
 >GB Do you plan to implement this todo now, or later (before or after it
 >reaches the main tree)?

Hmm... I had a look at insert/update/delete and think it could remain
here. Objections?

 >> @@ -4943,14 +4947,28 @@ make_join_statistics(JOIN *join, TABLE_L
 >>
 >>        for (s= stat ; s < stat_end ; s++)
 >>        {
 >> +        Opt_trace_object trace_table(join->thd->opt_trace);
 >> +        trace_table.add("table", s->table->alias);
 >>          if (s->type == JT_SYSTEM || s->type == JT_CONST)
 >>          {
 >> +          trace_table.add("found_records", 1).
 >> +            add("records", 1).
 >> +            add("cost", 1).
 >> +            add("worst_seeks", 1.0);
 >> +
 >> +          if (s->ref.key_parts)
 >> +            trace_table.add("index", (s->table->key_info+
s->ref.key)->name);
 >
 >GB what does this if() do? I mean, what is the "index" about?

Removed

 >> @@ -5013,9 +5027,16 @@ make_join_statistics(JOIN *join, TABLE_L
 >>              {
 >>                /* Generate empty row */
 >>                s->info= "Impossible ON condition";
 >> +              trace_table.add("returning_empty_null_row", true).
 >> +                add("cause", "impossible_on_condition");
 >
 >GB or impossible_ON_condition? just an idea to make sure the "on" is
 >not taken as a real part of the sentence but as a SQL word.

I've tried to stick to lowercase in all tracing. Maybe that's not
necessary but I think it looks nice.

 >> @@ -6494,11 +6515,14 @@ add_group_and_distinct_keys(JOIN *join,
 >>    Item_field *cur_item;
 >>    key_map possible_keys;
 >>
 >> +  char* cause;
 >

 >GB given that we assign string literals to it, this should be const char*.

Puzzling

 >>    }
 >>    else if (is_indexed_agg_distinct(join, &indexed_fields))
 >GB Do you know what this is?

Yes, it is "select count(distinct x)"

 >> +/**
 >> +  Print keys that were appended to join_tab->const_keys because they
 >> +  can be used for GROUP BY or DISTINCT to the optimizer trace.
 >> +
 >> +  @param thd       Thread for the connection that submitted the query
 >> +  @param join_tab  The table the indices cover
 >> +  @param new_keys  The keys that are considered useful because they can
 >> +                   be used for GROUP BY or DISTINCT
 >> +  @param cause     Zero-terminated string with reason for adding indices
 >> +                   to const_keys
 >> +
 >> +  @see add_group_and_distinct_keys()
 >> + */
 >> +static void trace_indices_added_group_distinct(THD *thd,
 >

 >GB The only member of thd which this function uses is thd->opt_trace, I
 >suggest passing that as argument.
 >There is probably matter for arguing here. Some would say that thd is
 >just a context which should be passed all around; I rather believe
 >that a function should be passed only what's needed. I imagine both
 >have pros and cons.

I agree with passing the least amount of info possible. Changed.

 >> +      if (new_keys.is_set(j) && !existing_keys.is_set(j))
 >> +        trace_key.add(key_info[j].name);
 >> +  }
 >> +  trace_summary.add("cause", cause);
 >GB Is it normal that "cause" is an attribute of trace_summary? Shouldn't
 >it be of the upper object instead?

No, in this case I don't think so. It's "added indices x,y,z. cause:
can be used for group by".

 >> @@ -6970,8 +7037,8 @@ best_access_path(JOIN      *join,
 >>        loose_scan_opt.next_ref_key();
 >>        DBUG_PRINT("info", ("Considering ref access on key %s",
 >>                            keyuse->table->key_info[keyuse->key].name));
 >> -      Opt_trace_object oto1(thd->opt_trace);
 >> -      oto1.add("access_type", "index").add("index", keyinfo->name);
 >> +      Opt_trace_object trace_access_idx(thd->opt_trace);
 >> +      trace_access_idx.add("access_type", "index").add("index",
keyinfo->name);
 >>
 >>        /*
 >>           True if we find some keys from the range optimizer that match more
 >> @@ -7027,7 +7094,7 @@ best_access_path(JOIN      *join,
 >>        */
 >> +      if (!found_part && !ft_key && !loose_scan_opt.have_a_case())
 >>        {
 >> -        oto1.add("usable", false);
 >> +        trace_access_idx.add("usable", false);
 >>          goto done_with_index;                  // Nothing usable found
 >>        }
 >>
 >> @@ -7306,7 +7373,7 @@ best_access_path(JOIN      *join,
 >>          loose_scan_opt.check_ref_access_part2(key, start_key, records, tmp);
 >>
 >>        } /* not ft_key */
 >> -      oto1.add("cost", tmp).add("records", records);
 >> +      trace_access_idx.add("records", records).add("cost", tmp);
 >>        /** @todo trace quick_matches_more_parts etc? */
 >>        if (tmp < best_time - records/(double) TIME_FOR_COMPARE ||
 >>            (quick_matches_more_parts &&
 >> @@ -7321,13 +7388,13 @@ best_access_path(JOIN      *join,
 >>          best_ref_depends_map= found_ref;
 >>        }
 >>    done_with_index:
 >> -      oto1.add("chosen", best_key == start_key);
 >> +      trace_access_idx.add("chosen", best_key == start_key);
 >>      } /* for each key */
 >>      records= best_records;
 >>    }
 >>
 >> -  Opt_trace_object oto1(thd->opt_trace);
 >> -  oto1.add("access_type", "table scan");
 >> +  Opt_trace_object trace_access_scan(thd->opt_trace);
 >> +  trace_access_scan.add("access_type", "scan");
 >>    /*
 >>      Don't test table scan if it can't be better.
 >>      Prefer key lookup if we would use the same key for scanning.
 >> @@ -7358,27 +7425,31 @@ best_access_path(JOIN      *join,
 >>    */
 >>    if (!(records >= s->found_records || best > s->read_time))   // (1)
 >>    {
 >> -    oto1.add("cost", s->read_time).add("records", s->found_records);
 >> +    trace_access_scan.add("cost", s->read_time).
 >> +      add("records", s->found_records).
 >> +      add("chosen", false);
 >GB There was a line, which this patch deletes, which made sure to output
 >'"chosen":false' at label skip_table_scan, instead of once per if()
 >block. Why change this?

Because I wanted it to print "chosen:false, cause:<whatever>", and
<whatever> differs in each if block. I could have used a "String cause"
variable instead, or ignored the cost and just printed chosen:false.
What do you think?
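
The trade-off can be sketched as follows: emitting "chosen":false once at a common exit label loses the branch-specific cause, so each rejecting branch emits both keys itself. Trace_obj is a hypothetical stand-in for Opt_trace_object, and the cause strings are illustrative:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Opt_trace_object with chainable add().
struct Trace_obj {
  std::vector<std::pair<std::string, std::string>> kv;
  Trace_obj &add(const std::string &k, const std::string &v) {
    kv.emplace_back(k, v);
    return *this;
  }
};

// Each rejecting branch records its own cause next to "chosen":false,
// instead of a single "chosen" emitted at a shared exit label.
static void evaluate_scan(Trace_obj &trace, bool worse_than_ref,
                          bool same_key_as_ref) {
  if (worse_than_ref) {
    trace.add("chosen", "false").add("cause", "cost");  // branch-specific
    return;
  }
  if (same_key_as_ref) {
    trace.add("chosen", "false").add("cause", "prefer_ref_on_same_key");
    return;
  }
  trace.add("chosen", "true");
}
```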

 >> @@ -7466,7 +7540,8 @@ best_access_path(JOIN      *join,
 >>        }
 >>      }
 >>
 >> -    oto1.add("cost", tmp).add("records", rows2double(rnd_records));
 >> +    Opt_trace_object (thd->opt_trace, "with_where_cond_processing").
 >> +      add("records", rows2double(rnd_records)).add("cost", tmp);
 >
 >GB This "cost" (tmp) does not include the cost of WHERE; according to the
 >code a few lines down, that would be
 >tmp + record_count * rnd_records / TIME_FOR_COMPARE.
 >So I wonder about the name "with_where_cond_processing".

Puzzling how it got that way :-/ Maybe I shuffled the code at some point.
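
To make the naming concern concrete, the cost Guilhem refers to can be sketched as a small helper (hypothetical; the default of 5 mirrors the server's historical TIME_FOR_COMPARE constant, and the inputs in the usage note are made up):

```cpp
#include <cassert>
#include <cmath>

// Full scan cost including WHERE evaluation, per the formula quoted
// above: tmp + record_count * rnd_records / TIME_FOR_COMPARE.
// The bare "tmp" traced under "with_where_cond_processing" omits the
// second term, which is what makes the name misleading.
static double scan_cost_with_where(double tmp, double record_count,
                                   double rnd_records,
                                   double time_for_compare = 5.0) {
  return tmp + record_count * rnd_records / time_for_compare;
}
```

For example, scan_cost_with_where(100.0, 2.0, 50.0) adds 2 * 50 / 5 = 20 units of comparison cost on top of the base cost of 100.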

-- 
Jørgen Løland | Senior Software Engineer | +47 73842138
Oracle MySQL
Trondheim, Norway