List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:March 21 2011 9:45pm
Subject:Re: bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517
View as plain text  
Hello,

Roy Lyseng a écrit, Le 21.03.2011 14:43:
> Hi Guilhem,
> 
> thanks for commenting.
> 
> There is one question for you below, otherwise I think I have good 
> answers for all your comments.
> 
> On 17.03.11 15.54, Guilhem Bichot wrote:
>> Hello Roy,
>>
>> Looks good, but I'm still nit-picking:
>>
>> Roy Lyseng a écrit, Le 15.03.2011 14:14:
>>> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
>>> revid:roy.lyseng@stripped
>>>
>>> 3348 Roy Lyseng 2011-03-15
>>> Bug#11822517: Refactor call interface of choose_plan()
>>> The main purposes of this patch is
>>> 1. Make the interface of choose_plan() and greedy_search() simpler.
>>> 2. Prepare for a simpler test for dependencies when doing
>>> semi join materialization together with outer join (WL#5561).
>>> The fix involves replacing the join_tables argument to choose_plan()
>>> with a join nest pointer, which is NULL when the entire query will
>>> be optimized. Thus, the set of relevant tables is instead calculated
>>> inside this function. We also clarify that we need to pass as 
>>> remaining_tables to
>>> best_access_path() not only the tables remaining to be optimized,
>>> but also all tables outside of the current semi join nest.
>>> sql/sql_select.cc
>>> optimize_semijoin_nests() is now taking the map of all query tables
>>> from the passed JOIN object instead of an argument.
>>> best_access_path() - meaning of argument remaining_tables is changed so
>>> that it actually is interpreted as all tables not added to the current
>>> partial plan.

>>> @@ -7537,24 +7547,31 @@ best_access_path(JOIN *join,
>>> TRUE Fatal error
>>> */
>>>
>>> -static bool
>>> -choose_plan(JOIN *join, table_map join_tables)
>>> +static bool choose_plan(JOIN *join, TABLE_LIST *sjm_nest)
>>> {
>>> uint search_depth= join->thd->variables.optimizer_search_depth;
>>> uint prune_level= join->thd->variables.optimizer_prune_level;
>>> bool straight_join= test(join->select_options &
> SELECT_STRAIGHT_JOIN);
>>> + table_map join_tables;
>>> DBUG_ENTER("choose_plan");
>>>
>>> + join->emb_sjm_nest= sjm_nest;
>>
>> This is unrelated to your patch, but join->emb_sjm_nest is used only for
>> reading, so it could be made "pointer to const"; same for "n_tables" in
>> the greedy_search() function. I can send you a patch incremental over 
>> yours, if
>> you are interested. I would also make the "emb_sjm_nest"
>> parameter of choose_plan() a pointer to const. And I could make a few 
>> "n_tables"
>> variables constant too.
> 
> Please do.

Thanks for accepting! I'll write and send a patch as soon as your patch 
is final (it will be easier for you: there will be no conflicts).

>>
>>> }
>>> else
>>> {
>>> @@ -7567,6 +7584,7 @@ choose_plan(JOIN *join, table_map join_t
>>> records accessed.
>>> */
>>> jtab_sort_func= straight_join ? join_tab_cmp_straight : join_tab_cmp;
>>> + join_tables= join->all_table_map & ~join->const_table_map;
>>> }
>>> my_qsort2(join->best_ref + join->const_tables,
>>> join->tables - join->const_tables, sizeof(JOIN_TAB*),
>>> @@ -7594,6 +7612,9 @@ choose_plan(JOIN *join, table_map join_t
>>> */
>>> if (join->thd->lex->is_single_level_stmt())
>>> join->thd->status_var.last_query_cost= join->best_read;
>>> +
>>> + join->emb_sjm_nest= NULL;
>>
>> If we left the function with DBUG_RETURN(TRUE) we forget to restore
>> join->emb_sj_nest. I don't know whether this is a real problem, as we
>> will return an error anyway.
> 
> AFAIK, emb_sj_nest is only used inside choose_plan(), so as long as it 
> is set on entry, it will be safe. I've also added this fact to 
> sql_select.h.

ok

>>
>>> DBUG_RETURN(FALSE);
>>> }
>>>
>>> @@ -7986,16 +8007,14 @@ greedy_search(JOIN *join,
>>> DBUG_ENTER("greedy_search");
...
>>
>>  > +, but also all tables outside of this nest, because there may
>>> + be key references between the semi join nest and the outside tables
>>> + that should not be considered when materializing the semi join nest.
>>> + */
>>> + const table_map excluded_tables=
>>> + join->emb_sjm_nest ?
>>> + join->all_table_map & ~join->emb_sjm_nest->sj_inner_tables :
>>
>> I continue to think that this 'excluded_tables' should be a parameter to
>> best_extension_...(), on par with 'remaining_tables'.
>> 'remaining_tables' is set by the caller using a formula which involves
>> sj_inner_tables; this is precisely so that best_extension...() does 
>> not look
>> into sj_inner_tables...
>> I am not asking that best_access_path() gets a new parameter; only
>> best_extension...(), greedy_search() and optimize_straight_join() 
>> would get a
>> new parameter.
> 
> As it is constant throughout the optimization, I'd rather make it a 
> member of the work object (ie JOIN), instead of adding an argument. What 
> do you think?

Ideally I prefer to keep information which is useful to a function local 
to that function, so, in the present case, passed as a parameter.

Having it as a member in JOIN, though possibly leading to faster code 
(no repeated pushing of parameters on the stack), has the problem that 
people may later copy-paste code involving join->excluded_tables, into 
some other context where this variable is not set/reliable:
if we have
f(JOIN *join) // called from choose_plan()
{
   if (join->excluded_tables) ...
}
then someone later decides to call this f() from outside of 
choose_plan(), f() will do wrong.
It is more natural to have the variable created on the stack of 
choose_plan(), and passed to f():
f(JOIN *join, uint excluded_tables) // called from choose_plan()
{
   if (excluded_tables) ...
}
With this, someone using f() elsewhere necessarily needs to figure out a 
value to pass as parameter "excluded_tables". Stupid copy-paste is made 
harder and dependencies are clearly shown.

On the other hand, we already have JOIN::emb_sjm_nest which has this 
characteristic (present in JOIN though meaningful only in a certain 
function)... but I'm not sure this is a technique which we should spread 
more.

Having information local to where it's used is nice. Having lots of 
function parameters is bad. So I don't know what to think.

If you agree, let's wait with this particular item until we get the 
opinion of a second reviewer.

>>> + 0;
>>>
>>> - table_map allowed_tables= ~(table_map)0;
>>> - if (join->emb_sjm_nest)
>>> - allowed_tables= join->emb_sjm_nest->sj_inner_tables;
>>> -
>>> - bool has_sj= !join->select_lex->sj_nests.is_empty();
>>> + const bool has_sj=
>>> + !(join->select_lex->sj_nests.is_empty() || join->emb_sjm_nest);
>>
>> I define 'has_sj' like this:
>> "whether the join nest we're optimizing contains a semi join nest".
>> If join->emb_sjm_nest is non-NULL, then this semi join nest doesn't
>> contain a semi join nest, for sure, as you explain in some comments.
>> If join->emb_sjm_nest is NULL, we're optimizing the complete query, and
>> its semi join nests are _then_ listed in join->select_lex->sj_nests.
>> So I'd prefer if join->emb_sj_nest was tested first instead of second, 
>> when
>> computing 'has_sj'.
>> The idea is that if join->emb_sj_nest is non-NULL then we shouldn't even
>> consult join->select_lex->sj_nests because it is unrelated to the semi
>> join nests of the *to-be-optimized join nest*.
>> I know it may sound like bikeshed painting...
> 
> More bikeshed painting: If join->select_lex->sj_nests is empty, 
> join->emb_sj_nest cannot possibly be non-NULL, and this is slightly more 
> efficient in in the non-semijoin case...

ok then let's keep it as you wrote.

Thread
bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517Roy Lyseng15 Mar
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517Guilhem Bichot17 Mar
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517Ole John Aske17 Mar
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517Roy Lyseng21 Mar
      • Re: bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517Guilhem Bichot21 Mar