Roy Lyseng a écrit, Le 21.03.2011 14:43:
> Hi Guilhem,
> thanks for commenting.
> There is one question for you below, otherwise I think I have good
> answers for all your comments.
> On 17.03.11 15.54, Guilhem Bichot wrote:
>> Hello Roy,
>> Looks good, but I'm still nit-picking:
>> Roy Lyseng a écrit, Le 15.03.2011 14:14:
>>> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
>>> 3348 Roy Lyseng 2011-03-15
>>> Bug#11822517: Refactor call interface of choose_plan()
>>> The main purposes of this patch is
>>> 1. Make the interface of choose_plan() and greedy_search() simpler.
>>> 2. Prepare for a simpler test for dependencies when doing
>>> semi join materialization together with outer join (WL#5561).
>>> The fix involves replacing the join_tables argument to choose_plan()
>>> with a join nest pointer, which is NULL when the entire query will
>>> be optimized. Thus, the set of relevant tables is instead calculated
>>> inside this function. We also clarify that we need to pass as
>>> remaining_tables to
>>> best_access_path() not only the tables remaining to be optimized,
>>> but also all tables outside of the current semi join nest.
>>> optimize_semijoin_nests() is now taking the map of all query tables
>>> from the passed JOIN object instead of an argument.
>>> best_access_path() - meaning of argument remaining_tables is changed so
>>> that it actually is interpreted as all tables not added to the current
>>> partial plan.
>>> @@ -7537,24 +7547,31 @@ best_access_path(JOIN *join,
>>> TRUE Fatal error
>>> -static bool
>>> -choose_plan(JOIN *join, table_map join_tables)
>>> +static bool choose_plan(JOIN *join, TABLE_LIST *sjm_nest)
>>> uint search_depth= join->thd->variables.optimizer_search_depth;
>>> uint prune_level= join->thd->variables.optimizer_prune_level;
>>> bool straight_join= test(join->select_options &
>>> + table_map join_tables;
>>> + join->emb_sjm_nest= sjm_nest;
>> This is unrelated to your patch, but join->emb_sjm_nest is used only for
>> reading, so it could be made "pointer to const"; same for "n_tables" in
>> the greedy_search() function. I can send you a patch incremental over
>> yours, if
>> you are interested. I would also make the "emb_sjm_nest"
>> parameter of choose_plan() a pointer to const. And I could make a few
>> variables constant too.
> Please do.
Thanks for accepting! I'll write and send a patch as soon as your patch
is final (it will be easier for you: there will be no conflicts).
>>> @@ -7567,6 +7584,7 @@ choose_plan(JOIN *join, table_map join_t
>>> records accessed.
>>> jtab_sort_func= straight_join ? join_tab_cmp_straight : join_tab_cmp;
>>> + join_tables= join->all_table_map & ~join->const_table_map;
>>> my_qsort2(join->best_ref + join->const_tables,
>>> join->tables - join->const_tables, sizeof(JOIN_TAB*),
>>> @@ -7594,6 +7612,9 @@ choose_plan(JOIN *join, table_map join_t
>>> if (join->thd->lex->is_single_level_stmt())
>>> join->thd->status_var.last_query_cost= join->best_read;
>>> + join->emb_sjm_nest= NULL;
>> If we left the function with DBUG_RETURN(TRUE) we forget to restore
>> join->emb_sj_nest. I don't know whether this is a real problem, as we
>> will return an error anyway.
> AFAIK, emb_sj_nest is only used inside choose_plan(), so as long as it
> is set on entry, it will be safe. I've also added this fact to
>>> @@ -7986,16 +8007,14 @@ greedy_search(JOIN *join,
>> > +, but also all tables outside of this nest, because there may
>>> + be key references between the semi join nest and the outside tables
>>> + that should not be considered when materializing the semi join nest.
>>> + */
>>> + const table_map excluded_tables=
>>> + join->emb_sjm_nest ?
>>> + join->all_table_map & ~join->emb_sjm_nest->sj_inner_tables :
>> I continue to think that this 'excluded_tables' should be a parameter to
>> best_extension_...(), on par with 'remaining_tables'.
>> 'remaining_tables' is set by the caller using a formula which involves
>> sj_inner_tables; this is precisely so that best_extension...() does
>> not look
>> into sj_inner_tables...
>> I am not asking that best_access_path() gets a new parameter; only
>> best_extension...(), greedy_search() and optimize_straight_join()
>> would get a
>> new parameter.
> As it is constant throughout the optimization, I'd rather make it a
> member of the work object (ie JOIN), instead of adding an argument. What
> do you think?
Ideally I prefer to keep information which is useful to a function local
to that function, so, in the present case, passed as a parameter.
Having it as a member in JOIN, though possibly leading to faster code
(no repeated pushing of parameters on the stack), has the problem that
people may later copy-paste code involving join->excluded_tables, into
some other context where this variable is not set/reliable:
if we have
f(JOIN *join) // called from choose_plan()
if (join->excluded_tables) ...
then someone later decides to call this f() from outside of
choose_plan(), f() will do wrong.
It is more natural to have the variable created on the stack of
choose_plan(), and passed to f():
f(JOIN *join, uint excluded_tables) // called from choose_plan()
if (excluded_tables) ...
With this, someone using f() elsewhere necessarily needs to figure out a
value to pass as parameter "excluded_tables". Stupid copy-paste is made
harder and dependencies are clearly shown.
On the other hand, we already have JOIN::emb_sjm_nest which has this
characteristic (present in JOIN though meaningful only in a certain
function)... but I'm not sure this is a technique which we should spread
Having information local to where it's used is nice. Having lots of
function parameters is bad. So I don't know what to think.
If you agree, let's wait with this particular item until we get the
opinion of a second reviewer.
>>> + 0;
>>> - table_map allowed_tables= ~(table_map)0;
>>> - if (join->emb_sjm_nest)
>>> - allowed_tables= join->emb_sjm_nest->sj_inner_tables;
>>> - bool has_sj= !join->select_lex->sj_nests.is_empty();
>>> + const bool has_sj=
>>> + !(join->select_lex->sj_nests.is_empty() || join->emb_sjm_nest);
>> I define 'has_sj' like this:
>> "whether the join nest we're optimizing contains a semi join nest".
>> If join->emb_sjm_nest is non-NULL, then this semi join nest doesn't
>> contain a semi join nest, for sure, as you explain in some comments.
>> If join->emb_sjm_nest is NULL, we're optimizing the complete query, and
>> its semi join nests are _then_ listed in join->select_lex->sj_nests.
>> So I'd prefer if join->emb_sj_nest was tested first instead of second,
>> computing 'has_sj'.
>> The idea is that if join->emb_sj_nest is non-NULL then we shouldn't even
>> consult join->select_lex->sj_nests because it is unrelated to the semi
>> join nests of the *to-be-optimized join nest*.
>> I know it may sound like bikeshed painting...
> More bikeshed painting: If join->select_lex->sj_nests is empty,
> join->emb_sj_nest cannot possibly be non-NULL, and this is slightly more
> efficient in in the non-semijoin case...
ok then let's keep it as you wrote.