Hi Øystein,
thanks for good suggestions!
Øystein Grøvlen wrote:
> Hi Roy,
>
> Thanks for the updated patch. This looks even better. See in-line for
> some nit-picking.
>
> Roy Lyseng wrote:
> > #At file:///home/rl136806/mysql/repo/mysql-6.0-work2/ based on
> revid:horst.hunger@stripped
> >
> > 3829 Roy Lyseng 2010-04-21
> > WL#5266 - Refactor data used to collect correlated expressions
> in semijoins
>
> ...
>
> > === modified file 'sql/sql_select.cc'
> > --- a/sql/sql_select.cc 2010-04-17 06:34:02 +0000
> > +++ b/sql/sql_select.cc 2010-04-21 10:27:39 +0000
> > @@ -275,7 +275,15 @@ static uint make_join_orderinfo(JOIN *jo
> > static int
> > join_read_record_no_init(JOIN_TAB *tab);
> > static
> > -bool subquery_types_allow_materialization(Item_in_subselect *in_subs);
> > +bool subquery_types_allow_materialization(Item_in_subselect
> *predicate);
> > +static
> > +void semijoin_types_allow_materialization(TABLE_LIST *sj_nest,
> > + bool
> types_allow_materialization,
> > + bool *);
>
> This forward declaration does not match the actual declaration. Also,
> please, mention all parameters.
Outch...
>
> ...
>
> > +static
> > +bool resolve_subquery(THD *thd, JOIN *join)
> > +
> > +{
> > + DBUG_ENTER("resolve_subquery");
> > +
> > + SELECT_LEX *select_lex= join->select_lex;
> > +
> > + /* Return if we are just normalizing a view */
> > + if (thd->lex->view_prepare_mode)
> > + DBUG_RETURN(FALSE);
> > + /* Return if the select lex is tagged as within a DESCRIBE */
> > + if (join->select_options & SELECT_DESCRIBE)
> > + DBUG_RETURN(FALSE);
> > +
> > + /*
> > + @todo for PS, make the whole block execute only on the first
> execution.
> > + resolve_subquery() is only invoked in the first execution for
> subqueries
> > + that are transformed to semijoin, but for other subqueries, this
> function
> > + is called for every execution. One solution is perhaps to define
> > + exec_method in class Item_subselect and exit immediately if
> unequal to
> > + EXEC_UNSPECIFIED.
> > + */
> > + Item_subselect *subq_predicate= select_lex->master_unit()->item;
> > + DBUG_ASSERT(subq_predicate);
>
> Do you really need subq_predicate? Could not in_exists_predicate be
> used instead? It is OK to use it as an abbreviation in order to make
> the statement just below shorter, but I think it is a bit confusing to
> have two pointers for the same thing (see comment below).
subq_predicate is required in the call to select_transformer(), so I cannot
remove it. I could replace it with select_lex->master_unit()->item, but I do not
think that is a good idea.
As an afterthought, assigning exec_method and the other fields and functions to
Item_exists_subselect was not my best idea. It would have been better to put
them on Item_subselect, which would mean that we would have general
transformation capabilities for all subquery types.
In this case, in_exists_predicate would be replaced with subq_predicate in most
cases.
However, it would mean violating the approved design, and I will not do that
unless the reviewers insist...
>
> > +
> > + /* in_exists_predicate is non-NULL for IN, =ANY and EXISTS
> predicates */
> > + Item_exists_subselect *in_exists_predicate=
> > + (subq_predicate->substype() == Item_subselect::IN_SUBS ||
> > + subq_predicate->substype() == Item_subselect::EXISTS_SUBS) ?
> > + (Item_exists_subselect*)subq_predicate :
> > + NULL;
>
> ...
>
> > + if (thd->optimizer_switch_flag(OPTIMIZER_SWITCH_MATERIALIZATION)
> &&
> > + subq_predicate->substype() == Item_subselect::IN_SUBS &&
>
> // 1
> > + !select_lex->is_part_of_union() && // 2
> > + select_lex->master_unit()->first_select()->leaf_tables
> &&
> // 3
> > + thd->lex->sql_command == SQLCOM_SELECT && // *
> > + select_lex->outer_select()->leaf_tables && // 3A
> > + in_predicate->is_top_level_item() && // 4
> > + !in_predicate->is_correlated && // 5
> > + in_predicate->exec_method ==
> > + Item_exists_subselect::EXEC_UNSPECIFIED &&
> // 6
> > + subquery_types_allow_materialization(in_predicate)) // 7
> > + in_predicate->exec_method=
> Item_exists_subselect::EXEC_MATERIALIZATION;
>
> Ditto.
I admit it can be confusing, but in_predicate is not valid unless the test
subq_predicate->substype() == Item_subselect::IN_SUBS is True. Notice also that
the assignment to in_predicate is immediately above this big if test, so there
should not be a big confusion.
Please advice what I should do here...
>
> ...
>
> > @@ -972,46 +1039,66 @@ err:
> > */
> >
> > static
> > -bool subquery_types_allow_materialization(Item_in_subselect *in_subs)
> > +void semijoin_types_allow_materialization(TABLE_LIST *sj_nest,
> > + bool
> *materialization_allowed,
> > + bool *sjm_scan_allowed)
>
> I suggest dropping the materialization_allowed parameter let it be
> the return value. Then the function call could be used directly in
> the if-statement that checks this.
OK.
>
> ...
>
> > {
> > - DBUG_ENTER("subquery_types_allow_materialization");
> > + DBUG_ENTER("semijoin_types_allow_materialization");
> >
> > - DBUG_ASSERT(in_subs->left_expr->fixed);
> > + DBUG_ASSERT(sj_nest->nested_join->sj_outer_exprs.elements ==
> > + sj_nest->nested_join->sj_inner_exprs.elements);
> >
> > - List_iterator<Item>
> it(in_subs->unit->first_select()->item_list);
> > - uint elements= in_subs->unit->first_select()->item_list.elements;
> > + List_iterator<Item> it1(sj_nest->nested_join->sj_outer_exprs);
> > + List_iterator<Item> it2(sj_nest->nested_join->sj_inner_exprs);
> > +
> > + *materialization_allowed= FALSE;
> > + *sjm_scan_allowed= FALSE;
> >
> > - in_subs->types_allow_materialization= FALSE; // Assign default
> values
> > - in_subs->sjm_scan_allowed= FALSE;
> > -
> > bool all_are_fields= TRUE;
> > - for (uint i= 0; i < elements; i++)
> > + Item *outer, *inner;
> > + while (outer= it1++, inner= it2++)
>
> Since outer and inner is not needed outside the loop, I wonder whether
> a for loop would be better. Also, would not
> (outer= it1++ && inner= it2++)
> be a bit safer?
A for loop will either need a loop variable or it becomes unnecessary
complicated, I think. Can you suggest a simple solution with a for loop?
About safety, we have already asserted that the element counts are the same.
>
> ...
>
> > @@ -9301,6 +9446,8 @@ static bool make_join_select(JOIN *join,
> > if (pushdown_on_conditions(join, tab))
> > DBUG_RETURN(1);
> >
> > + DBUG_ASSERT(save_used_tables ? tab->emb_sj_nest != NULL : TRUE);
> > +
>
> Why not DBUG_ASSERT(save_used_tables && tab->emb_sj_nest != NULL)?
It would be:
DBUG_ASSERT((!save_used_tables) ||
(save_used_tables && tab->emb_sj_nest != NULL));
then. I think the assert in the commit is the best one.
>
> > if (save_used_tables && !(used_tables &
> > ~(tab->emb_sj_nest->sj_inner_tables |
> > join->const_table_map |
> PSEUDO_TABLE_BITS)))
>
> ...
>
> > @@ -13476,6 +13611,9 @@ void advance_sj_state(JOIN *join, table_
> > join->cur_dups_producing_tables |= emb_sj_nest->sj_inner_tables;
> >
> > /* Remove the sj_nest if all of its SJ-inner tables are in
> cur_table_map */
> > + DBUG_ASSERT((remaining_tables & emb_sj_nest->sj_inner_tables &
> > + ~new_join_tab->table->map) ==
> > + (remaining_tables & emb_sj_nest->sj_inner_tables));
> > if (!(remaining_tables &
> > emb_sj_nest->sj_inner_tables &
> ~new_join_tab->table->map))
> > join->cur_sj_inner_tables &= ~emb_sj_nest->sj_inner_tables;
>
> I not quite sure what/why you are trying to assert here, but it seems
> like the above assert is equivalent to
>
> DBUG_ASSERT(!(remaining_tables & emb_sj_nest->sj_inner_tables &
> new_join_tab->table->map));
It was a start of an experiment I did to see whether it was possible to
eliminate table->map from the statement below. The DBUG_ASSERT documents that it
is redundant. I forgot to followup on that, and I also forgot to remove the
DBUG_ASSERT. Sorry about that...
Now, the DBUG_ASSERT is removed and table->map is eliminated from the statement.
> ...
>
> > @@ -13627,10 +13764,12 @@ void advance_sj_state(JOIN *join, table_
> > pos->first_dupsweedout_table= idx;
> >
> > pos->dupsweedout_tables |= nest->sj_inner_tables |
> > - nest->nested_join->sj_depends_on |
> > - nest->nested_join->sj_corr_tables;
> > + nest->nested_join->sj_depends_on;
> > }
> >
> > + DBUG_ASSERT((remaining_tables & ~new_join_tab->table->map &
> > + pos->dupsweedout_tables) ==
> > + (remaining_tables & pos->dupsweedout_tables));
> > if (pos->dupsweedout_tables &&
> > !(remaining_tables &
> > ~new_join_tab->table->map & pos->dupsweedout_tables))
>
> And the above assert seems to be equivalent to
>
> DBUG_ASSERT(!(remaining_tables & pos->dupsweedout_tables &
> new_join_tab->table->map))
>
> But I do not understand the purpose of this assert.
>
Same as above.
Thanks,
Roy