Roy,
I am fine with your proposed resolutions below. No further comments.
--
Øystein
Roy Lyseng wrote:
> Hi Øystein,
>
> thanks for good suggestions!
>
> Øystein Grøvlen wrote:
>> Hi Roy,
>>
>> Thanks for the updated patch. This looks even better. See in-line for
>> some nit-picking.
>>
>> Roy Lyseng wrote:
>> > #At file:///home/rl136806/mysql/repo/mysql-6.0-work2/ based on
>> revid:horst.hunger@stripped
>> >
>> > 3829 Roy Lyseng 2010-04-21
>> > WL#5266 - Refactor data used to collect correlated
>> expressions in semijoins
>>
>> ...
>>
>> > === modified file 'sql/sql_select.cc'
>> > --- a/sql/sql_select.cc 2010-04-17 06:34:02 +0000
>> > +++ b/sql/sql_select.cc 2010-04-21 10:27:39 +0000
>> > @@ -275,7 +275,15 @@ static uint make_join_orderinfo(JOIN *jo
>> > static int
>> > join_read_record_no_init(JOIN_TAB *tab);
>> > static
>> > -bool subquery_types_allow_materialization(Item_in_subselect
>> *in_subs);
>> > +bool subquery_types_allow_materialization(Item_in_subselect
>> *predicate);
>> > +static
>> > +void semijoin_types_allow_materialization(TABLE_LIST *sj_nest,
>> > + bool
>> types_allow_materialization,
>> > + bool *);
>>
>> This forward declaration does not match the actual declaration. Also,
>> please, mention all parameters.
>
> Outch...
>>
>> ...
>>
>> > +static
>> > +bool resolve_subquery(THD *thd, JOIN *join)
>> > +
>> > +{
>> > + DBUG_ENTER("resolve_subquery");
>> > +
>> > + SELECT_LEX *select_lex= join->select_lex;
>> > +
>> > + /* Return if we are just normalizing a view */
>> > + if (thd->lex->view_prepare_mode)
>> > + DBUG_RETURN(FALSE);
>> > + /* Return if the select lex is tagged as within a DESCRIBE */
>> > + if (join->select_options & SELECT_DESCRIBE)
>> > + DBUG_RETURN(FALSE);
>> > +
>> > + /*
>> > + @todo for PS, make the whole block execute only on the first
>> execution.
>> > + resolve_subquery() is only invoked in the first execution for
>> subqueries
>> > + that are transformed to semijoin, but for other subqueries,
>> this function
>> > + is called for every execution. One solution is perhaps to define
>> > + exec_method in class Item_subselect and exit immediately if
>> unequal to
>> > + EXEC_UNSPECIFIED.
>> > + */
>> > + Item_subselect *subq_predicate= select_lex->master_unit()->item;
>> > + DBUG_ASSERT(subq_predicate);
>>
>> Do you really need subq_predicate? Could not in_exists_predicate be
>> used instead? It is OK to use it as an abbreviation in order to make
>> the statement just below shorter, but I think it is a bit confusing to
>> have two pointers for the same thing (see comment below).
>
> subq_predicate is required in the call to select_transformer(), so I
> cannot remove it. I could replace it with
> select_lex->master_unit()->item, but I do not think that is a good idea.
>
> As an afterthought, assigning exec_method and the other fields and
> functions to Item_exists_subselect was not my best idea. It would have
> been better to put them on Item_subselect, which would mean that we
> would have general transformation capabilities for all subquery types.
>
> In this case, in_exists_predicate would be replaced with subq_predicate
> in most cases.
>
> However, it would mean violating the approved design, and I will not do
> that unless the reviewers insist...
>>
>> > +
>> > + /* in_exists_predicate is non-NULL for IN, =ANY and EXISTS
>> predicates */
>> > + Item_exists_subselect *in_exists_predicate=
>> > + (subq_predicate->substype() == Item_subselect::IN_SUBS ||
>> > + subq_predicate->substype() == Item_subselect::EXISTS_SUBS) ?
>> > + (Item_exists_subselect*)subq_predicate :
>> > + NULL;
>>
>> ...
>>
>> > + if
>> (thd->optimizer_switch_flag(OPTIMIZER_SWITCH_MATERIALIZATION) &&
>> > + subq_predicate->substype() == Item_subselect::IN_SUBS
>> && // 1
>> > + !select_lex->is_part_of_union() && // 2
>> > + select_lex->master_unit()->first_select()->leaf_tables
> &&
>> // 3
>> > + thd->lex->sql_command == SQLCOM_SELECT && // *
>> > + select_lex->outer_select()->leaf_tables && //
> 3A
>> > + in_predicate->is_top_level_item() && // 4
>> > + !in_predicate->is_correlated && // 5
>> > + in_predicate->exec_method ==
>> > + Item_exists_subselect::EXEC_UNSPECIFIED
>> && // 6
>> > + subquery_types_allow_materialization(in_predicate)) // 7
>> > + in_predicate->exec_method=
>> Item_exists_subselect::EXEC_MATERIALIZATION;
>>
>> Ditto.
>
> I admit it can be confusing, but in_predicate is not valid unless the
> test subq_predicate->substype() == Item_subselect::IN_SUBS is True.
> Notice also that the assignment to in_predicate is immediately above
> this big if test, so there should not be a big confusion.
>
> Please advice what I should do here...
>>
>> ...
>>
>> > @@ -972,46 +1039,66 @@ err:
>> > */
>> >
>> > static
>> > -bool subquery_types_allow_materialization(Item_in_subselect *in_subs)
>> > +void semijoin_types_allow_materialization(TABLE_LIST *sj_nest,
>> > + bool
>> *materialization_allowed,
>> > + bool *sjm_scan_allowed)
>>
>> I suggest dropping the materialization_allowed parameter let it be
>> the return value. Then the function call could be used directly in
>> the if-statement that checks this.
>
> OK.
>>
>> ...
>>
>> > {
>> > - DBUG_ENTER("subquery_types_allow_materialization");
>> > + DBUG_ENTER("semijoin_types_allow_materialization");
>> >
>> > - DBUG_ASSERT(in_subs->left_expr->fixed);
>> > + DBUG_ASSERT(sj_nest->nested_join->sj_outer_exprs.elements ==
>> > + sj_nest->nested_join->sj_inner_exprs.elements);
>> >
>> > - List_iterator<Item>
> it(in_subs->unit->first_select()->item_list);
>> > - uint elements=
> in_subs->unit->first_select()->item_list.elements;
>> > + List_iterator<Item>
> it1(sj_nest->nested_join->sj_outer_exprs);
>> > + List_iterator<Item>
> it2(sj_nest->nested_join->sj_inner_exprs);
>> > +
>> > + *materialization_allowed= FALSE;
>> > + *sjm_scan_allowed= FALSE;
>> >
>> > - in_subs->types_allow_materialization= FALSE; // Assign default
>> values
>> > - in_subs->sjm_scan_allowed= FALSE;
>> > -
>> > bool all_are_fields= TRUE;
>> > - for (uint i= 0; i < elements; i++)
>> > + Item *outer, *inner;
>> > + while (outer= it1++, inner= it2++)
>>
>> Since outer and inner is not needed outside the loop, I wonder whether
>> a for loop would be better. Also, would not
>> (outer= it1++ && inner= it2++)
>> be a bit safer?
>
> A for loop will either need a loop variable or it becomes unnecessary
> complicated, I think. Can you suggest a simple solution with a for loop?
>
> About safety, we have already asserted that the element counts are the
> same.
>
>>
>> ...
>>
>> > @@ -9301,6 +9446,8 @@ static bool make_join_select(JOIN *join,
>> > if (pushdown_on_conditions(join, tab))
>> > DBUG_RETURN(1);
>> >
>> > + DBUG_ASSERT(save_used_tables ? tab->emb_sj_nest != NULL :
>> TRUE);
>> > +
>>
>> Why not DBUG_ASSERT(save_used_tables && tab->emb_sj_nest != NULL)?
>
> It would be:
> DBUG_ASSERT((!save_used_tables) ||
> (save_used_tables && tab->emb_sj_nest != NULL));
>
> then. I think the assert in the commit is the best one.
>>
>> > if (save_used_tables && !(used_tables &
>> > ~(tab->emb_sj_nest->sj_inner_tables
> |
>> > join->const_table_map |
>> PSEUDO_TABLE_BITS)))
>>
>> ...
>>
>> > @@ -13476,6 +13611,9 @@ void advance_sj_state(JOIN *join, table_
>> > join->cur_dups_producing_tables |=
> emb_sj_nest->sj_inner_tables;
>> >
>> > /* Remove the sj_nest if all of its SJ-inner tables are in
>> cur_table_map */
>> > + DBUG_ASSERT((remaining_tables & emb_sj_nest->sj_inner_tables
> &
>> > + ~new_join_tab->table->map) ==
>> > + (remaining_tables &
> emb_sj_nest->sj_inner_tables));
>> > if (!(remaining_tables &
>> > emb_sj_nest->sj_inner_tables &
> ~new_join_tab->table->map))
>> > join->cur_sj_inner_tables &=
> ~emb_sj_nest->sj_inner_tables;
>>
>> I not quite sure what/why you are trying to assert here, but it seems
>> like the above assert is equivalent to
>>
>> DBUG_ASSERT(!(remaining_tables & emb_sj_nest->sj_inner_tables &
>> new_join_tab->table->map));
>
> It was a start of an experiment I did to see whether it was possible to
> eliminate table->map from the statement below. The DBUG_ASSERT documents
> that it is redundant. I forgot to followup on that, and I also forgot to
> remove the DBUG_ASSERT. Sorry about that...
>
> Now, the DBUG_ASSERT is removed and table->map is eliminated from the
> statement.
>> ...
>>
>> > @@ -13627,10 +13764,12 @@ void advance_sj_state(JOIN *join, table_
>> > pos->first_dupsweedout_table= idx;
>> >
>> > pos->dupsweedout_tables |= nest->sj_inner_tables |
>> > - nest->nested_join->sj_depends_on |
>> > - nest->nested_join->sj_corr_tables;
>> > + nest->nested_join->sj_depends_on;
>> > }
>> >
>> > + DBUG_ASSERT((remaining_tables & ~new_join_tab->table->map
> &
>> > + pos->dupsweedout_tables) ==
>> > + (remaining_tables & pos->dupsweedout_tables));
>> > if (pos->dupsweedout_tables &&
>> > !(remaining_tables &
>> > ~new_join_tab->table->map &
> pos->dupsweedout_tables))
>>
>> And the above assert seems to be equivalent to
>>
>> DBUG_ASSERT(!(remaining_tables & pos->dupsweedout_tables &
>> new_join_tab->table->map))
>>
>> But I do not understand the purpose of this assert.
>>
> Same as above.
>
> Thanks,
> Roy