| List: | Commits | « Previous MessageNext Message » | |
| From: | Øystein Grøvlen | Date: | April 15 2010 11:06am |
| Subject: | Re: bzr commit into mysql-6.0-codebase-bugfixing branch (roy.lyseng:3790) WL#5266 | ||
| View as plain text | |||
Roy,
Thanks for the refactoring. Several good things here. See my comments
in-line:
Roy Lyseng wrote:
> #At file:///home/rl136806/mysql/repo/mysql-6.0-work2/ based on
revid:alik@stripped
>
> 3790 Roy Lyseng 2010-03-05
> WL#5266 - Refactor data used to collect correlated expressions
in semijoins
>
> sql/item_subselect.cc
> Modified constructors for Item_exists_subselect and
Item_in_subselect.
> Performed necessary updates after changes to subselect classes.
>
> sql/item_subselect.h
> Fields exec_method, sj_convert_priority, emb_on_expr_nest,
> expr_join_nest, types_allow_materialization, sjm_scan_allowed
> moved from class Item_in_subselect to Item_exists_subselect,
> to prepare for future semijoin transformation of EXISTS.
>
> Modified constructors for Item_exists_subselect and
Item_in_subselect.
>
> Added a simple select_transformer() function for
Item_exists_subselect.
>
> sql/sql_lex.cc
> Changes in exec_method constant values.
>
> sql/sql_select.cc
> New function resolve_subquery() is refactored out of
JOIN::prepare().
> It performs all resolving, including preparing semijoin
transformations,
> for predicates that involve a subquery (ie IN, EXISTS, ANY/ALL).
>
> In JOIN::prepare(), removed out-commented test on
Query_arena::PREPARED
> because it did not work as anticipated, and it could be omitted.
>
> In JOIN::prepare(), removed call to
subquery_types_allow_materialization().
> Now handled by call to semijoin_types_allow_materialization() in
> convert_subquery_to_semijoin().
>
> subquery_types_allow_materialization() is modified so that it
is only
> used for non-semijoin-transformed subqueries.
> semijoin_types_allow_materialization() is created to handle
> semijoin-transformed subqueries.
> Some common functionality is factored out into
> types_allow_materialization().
>
> convert_subq_to_sj() is renamed to
convert_subquery_to_semijoin().
> Doxygen comments are added, more comments about the actual
transformation
> are added.
>
> Removed unnecessary fix_fields() on subq_pred->left_expr in
> convert_subquery_to_semijoin(), replaced with DBUG_ASSERT().
>
> Pushed IN expressions into the lists
nested_join->sj_outer_exprs and
> nested_join->sj_inner_exprs, added comments on use of
sj_corr_tables
> and sj_depends_on.
>
> Added error handling when creating objects for the semijoin
condition.
>
> In JOIN::flatten_subqueries(), modified pointer types so that
they
> allow EXISTS predicates to be transformed (for future work).
>
> removed manipulation of the is_correlated flag when the
subquery is
removed => Removed
> transformed to a semijoin. The set sj_corr_tables is
maintained instead
> of this, and can be used for testing wherever is_correlated
was used.
>
> In get_semi_join_select_list_index(), use sj_inner_exprs to
retrieve
> expressions from subquery instead of using the select list of
the subquery
> (which is not resolved after first execution of query).
>
> In make_join_select(), cleaned up subset test (sj_inner_tables in
> used_tables).
>
> create_subq_in_equalities() has been renamed to
create_subq_equalities(),
> because it may be used for trivially-correlated expressions
as well
> as equalities generated from IN predicates in the future.
>
> In setup_sj_materialization(), use sj_inner_exprs to retrieve
> expressions from subquery instead of using the select list of
the subquery.
> Use sj_outer_exprs instead of
emb_sj_nest->sj_subq_pred->left_expr.
>
> In advance_sj_state(), simplified some expressions by taking
advantage
> of the fact that sj_corr_tables is always a true subset of
sj_depends_on.
>
> sql/sql_select.h
> Modified type of sj_subselects, so that EXISTS predicates can
be semijoin-
> transformed in the future.
>
> sql/table.h
> Removed sj_in_exprs (size of expression lists can be used
instead).
> Changed type of sj_subq_pred so that EXISTS predicates can be
handled.
> Added the list sj_inner_exprs to replace the use of the
subquery's
> select list. This also opens for processing of
trivially-correlated
> expressions in the future.
> Renamed sj_outer_expr_list to sj_outer_exprs.
>
> modified:
> sql/item_subselect.cc
> sql/item_subselect.h
> sql/sql_lex.cc
> sql/sql_select.cc
> sql/sql_select.h
> sql/table.h
> === modified file 'sql/item_subselect.cc'
> --- a/sql/item_subselect.cc 2010-03-01 10:04:07 +0000
> +++ b/sql/item_subselect.cc 2010-03-05 09:43:47 +0000
> @@ -298,8 +298,8 @@ bool Item_subselect::exec()
> bool Item_in_subselect::exec()
> {
> DBUG_ENTER("Item_in_subselect::exec");
> - DBUG_ASSERT(exec_method != MATERIALIZATION ||
> - (exec_method == MATERIALIZATION &&
> + DBUG_ASSERT(exec_method != EXEC_MATERIALIZATION ||
> + (exec_method == EXEC_MATERIALIZATION &&
> engine->engine_type() ==
subselect_engine::HASH_SJ_ENGINE));
> /*
> Initialize the cache of the left predicate operand. This has to
be done as
> @@ -313,7 +313,7 @@ bool Item_in_subselect::exec()
> - on a cost-based basis, that takes into account the cost of a cache
> lookup, the cache hit rate, and the savings per cache hit.
> */
> - if (!left_expr_cache && exec_method == MATERIALIZATION)
> + if (!left_expr_cache && exec_method == EXEC_MATERIALIZATION)
> init_left_expr_cache();
>
> /* If the new left operand is already in the cache, reuse the old
result. */
> @@ -693,7 +693,9 @@ bool Item_singlerow_subselect::val_bool(
>
>
> Item_exists_subselect::Item_exists_subselect(st_select_lex *select_lex):
> - Item_subselect()
> + Item_subselect(), value(FALSE), exec_method(EXEC_UNSPECIFIED),
> + sj_convert_priority(0), emb_on_expr_nest(NULL),
expr_join_nest(NULL),
> + types_allow_materialization(FALSE), sjm_scan_allowed(FALSE)
This initialization is duplicated between contructors. Would it be
better to have an init function? (I am not sure myself)
> {
> DBUG_ENTER("Item_exists_subselect::Item_exists_subselect");
> bool val_bool();
> @@ -701,7 +703,6 @@ Item_exists_subselect::Item_exists_subse
> max_columns= UINT_MAX;
> null_value= 0; //can't be NULL
> maybe_null= 0; //can't be NULL
> - value= 0;
> DBUG_VOID_RETURN;
> }
>
> @@ -731,8 +732,7 @@ bool Item_in_subselect::test_limit(st_se
> Item_in_subselect::Item_in_subselect(Item * left_exp,
> st_select_lex *select_lex):
> Item_exists_subselect(), left_expr_cache(0), first_execution(TRUE),
> - optimizer(0), pushed_cond_guards(NULL), exec_method(NOT_TRANSFORMED),
> - upper_item(0)
> + optimizer(0), pushed_cond_guards(NULL), upper_item(0)
> {
> DBUG_ENTER("Item_in_subselect::Item_in_subselect");
> left_expr= left_exp;
> @@ -1141,7 +1141,7 @@ Item_in_subselect::single_value_transfor
> If this IN predicate can be computed via materialization, do not
> perform the IN -> EXISTS transformation.
> */
> - if (exec_method == MATERIALIZATION)
> + if (exec_method == EXEC_MATERIALIZATION)
> DBUG_RETURN(RES_OK);
>
> /* Perform the IN=>EXISTS transformation. */
> @@ -1420,7 +1420,7 @@ Item_in_subselect::row_value_transformer
> If this IN predicate can be computed via materialization, do not
> perform the IN -> EXISTS transformation.
> */
> - if (exec_method == MATERIALIZATION)
> + if (exec_method == EXEC_MATERIALIZATION)
> DBUG_RETURN(RES_OK);
>
> /* Perform the IN=>EXISTS transformation. */
> @@ -1736,8 +1736,8 @@ Item_in_subselect::select_in_like_transf
> If we didn't choose an execution method up to this point, we choose
> the IN=>EXISTS transformation.
> */
> - if (exec_method == NOT_TRANSFORMED)
> - exec_method= IN_TO_EXISTS;
> + if (exec_method == EXEC_UNSPECIFIED)
> + exec_method= EXEC_EXISTS;
> arena= thd->activate_stmt_arena_if_needed(&backup);
>
> /*
> @@ -1771,7 +1771,7 @@ err:
>
> void Item_in_subselect::print(String *str, enum_query_type query_type)
> {
> - if (exec_method == IN_TO_EXISTS)
> + if (exec_method == EXEC_EXISTS)
> str->append(STRING_WITH_LEN("<exists>"));
> else
> {
> @@ -1786,7 +1786,7 @@ bool Item_in_subselect::fix_fields(THD *
> {
> bool result = 0;
>
> - if (exec_method == SEMI_JOIN)
> + if (exec_method == EXEC_SEMI_JOIN)
> return !( (*ref)= new Item_int(1));
>
> if (thd_arg->lex->view_prepare_mode && left_expr &&
!left_expr->fixed)
> @@ -1852,7 +1852,7 @@ bool Item_in_subselect::setup_engine()
> */
> delete new_engine;
> new_engine= NULL;
> - exec_method= NOT_TRANSFORMED;
> + exec_method= EXEC_UNSPECIFIED;
> if (left_expr->cols() == 1)
> trans_res=
single_value_in_to_exists_transformer(old_engine->join,
> &eq_creator);
> @@ -1960,7 +1960,7 @@ bool Item_in_subselect::init_left_expr_c
>
> bool Item_in_subselect::is_expensive_processor(uchar *arg)
> {
> - return exec_method == MATERIALIZATION;
> + return exec_method == EXEC_MATERIALIZATION;
> }
>
>
> @@ -1968,7 +1968,7 @@ Item_subselect::trans_res
> Item_allany_subselect::select_transformer(JOIN *join)
> {
> DBUG_ENTER("Item_allany_subselect::select_transformer");
> - exec_method= IN_TO_EXISTS;
> + exec_method= EXEC_EXISTS;
> if (upper_item)
> upper_item->show= 1;
> DBUG_RETURN(select_in_like_transformer(join, func));
> @@ -1977,7 +1977,7 @@ Item_allany_subselect::select_transforme
>
> void Item_allany_subselect::print(String *str, enum_query_type
query_type)
> {
> - if (exec_method == IN_TO_EXISTS)
> + if (exec_method == EXEC_EXISTS)
> str->append(STRING_WITH_LEN("<exists>"));
> else
> {
>
> === modified file 'sql/item_subselect.h'
> --- a/sql/item_subselect.h 2010-01-13 06:34:01 +0000
> +++ b/sql/item_subselect.h 2010-03-05 09:43:47 +0000
> @@ -77,7 +77,10 @@ public:
> /* subquery is transformed */
> bool changed;
I suggest using doxygen syntax for all the declarations below.
>
> - /* TRUE <=> The underlying SELECT is correlated w.r.t some
ancestor select */
> + /*
> + TRUE <=> The underlying SELECT is correlated w.r.t some ancestor
select.
> + Note that this field is not maintained in semijoin-transformed
subqueries.
> + */
> bool is_correlated;
>
> enum trans_res {RES_OK, RES_REDUCE, RES_ERROR};
> @@ -237,9 +240,56 @@ protected:
> bool value; /* value of this item (boolean: exists/not-exists) */
>
> public:
> + /*
> + The method chosen to execute the predicate, currently used for
IN, =ANY
> + and EXISTS predicates.
> + */
> + enum enum_exec_method {
> + EXEC_UNSPECIFIED, /* No execution method specified yet. */
> + EXEC_SEMI_JOIN, /* Predicate is converted to semi-join nest. */
> + EXEC_EXISTS, /* IN was converted to correlated EXISTS. */
> + EXEC_MATERIALIZATION /* Predicate executed via subquery
materialization. */
> + };
> + enum_exec_method exec_method;
> + /* Priority of this predicate in the convert-to-semi-join-nest
process. */
> + int sj_convert_priority;
> + /*
> + Used by subquery optimizations to keep track about in which
clause this
> + subquery predicate is located:
> + (TABLE_LIST*) 1 - the predicate is an AND-part of the WHERE
> + join nest pointer - the predicate is an AND-part of ON expression
> + of a join nest
> + NULL - for all other locations
> + See also THD::emb_on_expr_nest.
> + */
> + TABLE_LIST *emb_on_expr_nest;
> + /*
> + Location of the subquery predicate. It is either
> + - pointer to join nest if the subquery predicate is in the ON
expression
> + - (TABLE_LIST*)1 if the predicate is in the WHERE.
> + */
> + TABLE_LIST *expr_join_nest;
Description of the above two pointers are almost identical. Do you
know what is the difference? Why are both needed? Would be good to
have a comment about that.
> + /*
> + Types of left_expr and subquery's select list allow to perform
subquery
> + materialization. Currently, we set this to FALSE when it as well
could
> + be TRUE. This is to be properly addressed with fix for BUG#36752.
> + */
> + bool types_allow_materialization;
Is this variable really needed anymore? It is only checked once, why
not call semijoin_types_allow_materialization() at that point instead?
> + /* Same as above, but they also allow to scan the materialized
table. */
I suggest replacing "they" with something more specific here.
> + bool sjm_scan_allowed;
> +
> Item_exists_subselect(st_select_lex *select_lex);
> - Item_exists_subselect(): Item_subselect() {}
> + Item_exists_subselect()
> + :Item_subselect(), value(FALSE), exec_method(EXEC_UNSPECIFIED),
> + sj_convert_priority(0), emb_on_expr_nest(NULL),
expr_join_nest(NULL),
> + types_allow_materialization(FALSE), sjm_scan_allowed(FALSE)
> + {}
>
> + virtual trans_res select_transformer(JOIN *join)
> + {
> + exec_method= EXEC_EXISTS;
> + return RES_OK;
> + }
> subs_type substype() { return EXISTS_SUBS; }
> void reset()
> {
> @@ -299,45 +349,6 @@ protected:
> public:
> /* Used to trigger on/off conditions that were pushed down to
subselect */
> bool *pushed_cond_guards;
> -
> - /* Priority of this predicate in the convert-to-semi-join-nest
process. */
> - int sj_convert_priority;
> - /*
> - Used by subquery optimizations to keep track about in which
clause this
> - subquery predicate is located:
> - (TABLE_LIST*) 1 - the predicate is an AND-part of the WHERE
> - join nest pointer - the predicate is an AND-part of ON expression
> - of a join nest
> - NULL - for all other locations
> - See also THD::emb_on_expr_nest.
> - */
> - TABLE_LIST *emb_on_expr_nest;
> - /*
> - Location of the subquery predicate. It is either
> - - pointer to join nest if the subquery predicate is in the ON
expression
> - - (TABLE_LIST*)1 if the predicate is in the WHERE.
> - */
> - TABLE_LIST *expr_join_nest;
> - /*
> - Types of left_expr and subquery's select list allow to perform
subquery
> - materialization. Currently, we set this to FALSE when it as well
could
> - be TRUE. This is to be properly addressed with fix for BUG#36752.
> - */
> - bool types_allow_materialization;
> -
> - /*
> - Same as above, but they also allow to scan the materialized table.
> - */
> - bool sjm_scan_allowed;
> -
> - /* The method chosen to execute the IN predicate. */
> - enum enum_exec_method {
> - NOT_TRANSFORMED, /* No execution method was chosen for this IN. */
> - SEMI_JOIN, /* IN was converted to semi-join nest and should be
removed. */
> - IN_TO_EXISTS, /* IN was converted to correlated EXISTS. */
> - MATERIALIZATION /* IN will be executed via subquery
materialization. */
> - };
> - enum_exec_method exec_method;
>
> bool *get_cond_guard(int i)
> {
> @@ -355,8 +366,7 @@ public:
> Item_in_subselect(Item * left_expr, st_select_lex *select_lex);
> Item_in_subselect()
> :Item_exists_subselect(), left_expr_cache(0), first_execution(TRUE),
> - optimizer(0), abort_on_null(0), pushed_cond_guards(NULL),
> - exec_method(NOT_TRANSFORMED), upper_item(0)
> + optimizer(0), abort_on_null(0), pushed_cond_guards(NULL),
upper_item(0)
> {}
> void cleanup();
> subs_type substype() { return IN_SUBS; }
>
> === modified file 'sql/sql_lex.cc'
> --- a/sql/sql_lex.cc 2010-02-25 13:55:28 +0000
> +++ b/sql/sql_lex.cc 2010-03-05 09:43:47 +0000
> @@ -2168,7 +2168,7 @@ void st_select_lex::print_limit(THD *thd
> */
> (((subs_type == Item_subselect::IN_SUBS) &&
> ((Item_in_subselect*)item)->exec_method ==
> - Item_in_subselect::MATERIALIZATION) ?
> + Item_in_subselect::EXEC_MATERIALIZATION) ?
> TRUE :
> (select_limit->val_int() == 1LL) &&
> offset_limit == 0));
>
> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc 2010-03-01 10:04:07 +0000
> +++ b/sql/sql_select.cc 2010-03-05 09:43:47 +0000
> @@ -256,7 +256,13 @@ static uint make_join_orderinfo(JOIN *jo
> static int
> join_read_record_no_init(JOIN_TAB *tab);
> static
> -bool subquery_types_allow_materialization(Item_in_subselect *in_subs);
> +bool subquery_types_allow_materialization(Item_in_subselect *predicate);
> +static
> +void semijoin_types_allow_materialization(TABLE_LIST *sj_nest);
> +static
> +bool types_allow_materialization(Item *outer, Item *inner);
> +static
> +bool resolve_subquery(THD *thd, JOIN *join);
> int do_sj_reset(SJ_TMP_TABLE *sj_tbl);
> TABLE *create_duplicate_weedout_tmp_table(THD *thd, uint
uniq_tuple_length_arg,
> SJ_TMP_TABLE *sjtbl);
> @@ -575,171 +581,11 @@ JOIN::prepare(Item ***rref_pointer_array
> thd->lex->allow_sum_func= save_allow_sum_func;
> }
>
> - /*
> - If
> - 1) this join is inside a subquery (of any type except FROM-clause
> - subquery) and
> - 2) we aren't just normalizing a VIEW
> - 2) we aren't in a DESCRIBE
> -
> - Then perform early unconditional subquery transformations:
> - - Convert subquery predicate into semi-join, or
> - - Mark the subquery for execution using materialization, or
> - - Perform IN->EXISTS transformation, or
> - - Perform more/less ALL/ANY -> MIN/MAX rewrite
> - - Substitute trivial scalar-context subquery with its value
> -
> - TODO: for PS, make the whole block execute only on the first
execution
> - */
> - Item_subselect *subselect;
> - if (!thd->lex->view_prepare_mode && // (1)
> - (subselect= select_lex->master_unit()->item) && // (2)
> - !(select_options & SELECT_DESCRIBE))
> + if (select_lex->master_unit()->item)
> {
> - Item_in_subselect *in_subs= NULL;
> - if (subselect->substype() == Item_subselect::IN_SUBS)
> - in_subs= (Item_in_subselect*)subselect;
> -
> - /* Resolve expressions and perform semantic analysis for IN query */
> - if (in_subs != NULL)
> - /*
> - TODO: Add the condition below to this if statement when we
have proper
> - support for is_correlated handling for materialized semijoins.
> - If we were to add this condition now, the fix_fields() call in
> - convert_subq_to_sj() would force the flag is_correlated to
be set
> - erroneously for prepared queries.
> -
> - thd->stmt_arena->state != Query_arena::PREPARED)
> - */
> - {
> - /*
> - Check if the left and right expressions have the same # of
> - columns, i.e. we don't have a case like
> - (oe1, oe2) IN (SELECT ie1, ie2, ie3 ...)
> -
> - TODO why do we have this duplicated in IN->EXISTS transformers?
> - psergey-todo: fix these: grep for
duplicated_subselect_card_check
> - */
> - if (select_lex->item_list.elements != in_subs->left_expr->cols())
> - {
> - my_error(ER_OPERAND_COLUMNS, MYF(0),
in_subs->left_expr->cols());
> - DBUG_RETURN(-1);
> - }
> -
> - SELECT_LEX *current= thd->lex->current_select;
> - thd->lex->current_select= current->return_after_parsing();
> - char const *save_where= thd->where;
> - thd->where= "IN/ALL/ANY subquery";
> -
> - bool failure= !in_subs->left_expr->fixed &&
> - in_subs->left_expr->fix_fields(thd,
&in_subs->left_expr);
> - thd->lex->current_select= current;
> - thd->where= save_where;
> - if (failure)
> - DBUG_RETURN(-1); /* purecov: deadcode */
> - }
> - DBUG_PRINT("info", ("Checking if subq can be converted to
semi-join"));
> - /*
> - Check if we're in subquery that is a candidate for flattening
into a
> - semi-join (which is done in flatten_subqueries()). The
> - requirements are:
> - 1. Subquery predicate is an IN/=ANY subq predicate
> - 2. Subquery is a single SELECT (not a UNION)
> - 3. Subquery does not have GROUP BY or ORDER BY
> - 4. Subquery does not use aggregate functions or HAVING
> - 5. Subquery predicate is at the AND-top-level of ON/WHERE clause
> - 6. We are not in a subquery of a single table UPDATE/DELETE
that
> - doesn't have a JOIN (TODO: We should handle this at some
> - point by switching to multi-table UPDATE/DELETE)
> - 7. We're not in a confluent table-less subquery, like
"SELECT 1".
> - 8. No execution method was already chosen (by a prepared
statement)
> - 9. Parent select is not a confluent table-less select
> - 10. Neither parent nor child select have STRAIGHT_JOIN option.
> - */
> - if (thd->optimizer_switch_flag(OPTIMIZER_SWITCH_SEMIJOIN) &&
> - in_subs &&
// 1
> - !select_lex->is_part_of_union() &&
// 2
> - !select_lex->group_list.elements && !order &&
// 3
> - !having && !select_lex->with_sum_func &&
// 4
> - thd->thd_marker.emb_on_expr_nest &&
// 5
> - select_lex->outer_select()->join &&
// 6
> - select_lex->master_unit()->first_select()->leaf_tables &&
// 7
> - in_subs->exec_method == Item_in_subselect::NOT_TRANSFORMED
&& // 8
> - select_lex->outer_select()->leaf_tables &&
// 9
> - !((select_options |
select_lex->outer_select()->join->select_options)
> - & SELECT_STRAIGHT_JOIN))
// 10
> - {
> - DBUG_PRINT("info", ("Subquery is semi-join conversion
candidate"));
> -
> - (void)subquery_types_allow_materialization(in_subs);
> -
> - in_subs->emb_on_expr_nest= thd->thd_marker.emb_on_expr_nest;
> -
> - /* Register the subquery for further processing in
flatten_subqueries() */
> - select_lex->
> - outer_select()->join->sj_subselects.append(thd->mem_root,
in_subs);
> - in_subs->expr_join_nest= thd->thd_marker.emb_on_expr_nest;
> - }
> - else
> - {
> - DBUG_PRINT("info", ("Subquery can't be converted to semi-join"));
> - /*
> - Check if the subquery predicate can be executed via
materialization.
> - The required conditions are:
> - 1. Subquery predicate is an IN/=ANY subq predicate
> - 2. Subquery is a single SELECT (not a UNION)
> - 3. Subquery is not a table-less query. In this case there is no
> - point in materializing.
> - 3A The upper query is not a confluent SELECT ... FROM DUAL. We
> - can't do materialization for SELECT .. FROM DUAL because it
> - does not call setup_subquery_materialization(). We
could make
> - SELECT ... FROM DUAL call that function but that
doesn't seem
> - to be the case that is worth handling.
> - 4. Subquery predicate is a top-level predicate
> - (this implies it is not negated)
> - TODO: this is a limitation that should be lifted once we
> - implement correct NULL semantics (WL#3830)
> - 5. Subquery is non-correlated
> - TODO:
> - This is an overly restrictive condition. It can be
extended to:
> - (Subquery is non-correlated ||
> - Subquery is correlated to any query outer to IN predicate ||
> - (Subquery is correlated to the immediate outer query &&
> - Subquery !contains {GROUP BY, ORDER BY [LIMIT],
> - aggregate functions}) && subquery predicate is not
under "NOT IN"))
> - 6. No execution method was already chosen (by a prepared
statement).
> -
> - (*) The subquery must be part of a SELECT statement. The current
> - condition also excludes multi-table update statements.
> -
> - We have to determine whether we will perform subquery
materialization
> - before calling the IN=>EXISTS transformation, so that we
know whether to
> - perform the whole transformation or only that part of it
which wraps
> - Item_in_subselect in an Item_in_optimizer.
> - */
> - if
(thd->optimizer_switch_flag(OPTIMIZER_SWITCH_MATERIALIZATION) &&
> - in_subs &&
// 1
> - !select_lex->is_part_of_union() &&
// 2
> - select_lex->master_unit()->first_select()->leaf_tables
&&
// 3
> - thd->lex->sql_command == SQLCOM_SELECT &&
// *
> - select_lex->outer_select()->leaf_tables &&
// 3A
> - subquery_types_allow_materialization(in_subs))
> - {
> - // psergey-todo: duplicated_subselect_card_check: where it's
done?
> - if (in_subs->is_top_level_item() &&
// 4
> - !in_subs->is_correlated &&
// 5
> - in_subs->exec_method ==
Item_in_subselect::NOT_TRANSFORMED) // 6
> - in_subs->exec_method= Item_in_subselect::MATERIALIZATION;
> - }
> -
> - Item_subselect::trans_res trans_res;
> - if ((trans_res= subselect->select_transformer(this)) !=
> - Item_subselect::RES_OK)
> - {
> - select_lex->fix_prepare_information(thd, &conds, &having);
> - DBUG_RETURN((trans_res == Item_subselect::RES_ERROR));
> - }
> - }
> + /* Join object is a subquery within an IN/ANY/ALL/EXISTS
predicate */
> + if (resolve_subquery(thd, this))
> + DBUG_RETURN(-1);
> }
>
> select_lex->fix_prepare_information(thd, &conds, &having);
> @@ -903,9 +749,189 @@ err:
>
>
> /**
> - @brief Check if subquery's compared types allow materialization.
> + @brief Resolve predicate involving subquery
> +
> + @param thd Pointer to THD.
> + @param join Join that is part of a subquery predicate.
> +
> + @retval FALSE Success.
> + @retval TRUE Error.
> +
> + @details
> + Perform early unconditional subquery transformations:
> + - Convert subquery predicate into semi-join, or
> + - Mark the subquery for execution using materialization, or
> + - Perform IN->EXISTS transformation, or
> + - Perform more/less ALL/ANY -> MIN/MAX rewrite
> + - Substitute trivial scalar-context subquery with its value
> +
> + TODO: for PS, make the whole block execute only on the first execution
> +
> +*/
> +
> +static
> +bool resolve_subquery(THD *thd, JOIN *join)
> +
> +{
> + DBUG_ENTER("resolve_subquery");
> +
> + SELECT_LEX *select_lex= join->select_lex;
> +
> + /* Return if we are just normalizing a view */
> + if (thd->lex->view_prepare_mode)
> + DBUG_RETURN(FALSE);
> + /* Return if the select lex is tagged as within a DESCRIBE */
> + if (join->select_options & SELECT_DESCRIBE)
> + DBUG_RETURN(FALSE);
> +
> + Item_subselect *subq_predicate= select_lex->master_unit()->item;
> + DBUG_ASSERT(subq_predicate);
> +
> + /* in_exists_predicate is non-NULL for IN, =ANY and EXISTS
predicates */
> + Item_exists_subselect *in_exists_predicate=
> + (subq_predicate->substype() == Item_subselect::IN_SUBS ||
> + subq_predicate->substype() == Item_subselect::EXISTS_SUBS) ?
> + (Item_exists_subselect*)subq_predicate :
> + NULL;
> +
> + /* in_predicate is non-NULL for IN and =ANY predicates */
> + Item_in_subselect *in_predicate=
> + (subq_predicate->substype() == Item_subselect::IN_SUBS) ?
> + (Item_in_subselect *)subq_predicate :
> + NULL;
I think having to pointers that point to the same is a source of
confusion, and especially when you use both in the same if-statement.
It seems to me that in_predicate is strictly not necessary, you can
use in_predicate where applicable and use the test
(subq_predicate->substype() == Item_subselect::IN_SUBS) when a
distinction is needed.
In general, I do not think infering state from a null-pointer is a
very good practice.
> +
> + if (in_predicate != NULL)
I suggest replacing the above with
if (subq_predicate->substype() == Item_subselect::IN_SUBS)
> + {
> + /*
> + Check if the left and right expressions have the same # of
> + columns, i.e. we don't have a case like
> + (oe1, oe2) IN (SELECT ie1, ie2, ie3 ...)
> +
> + TODO why do we have this duplicated in IN->EXISTS transformers?
> + psergey-todo: fix these: grep for duplicated_subselect_card_check
> + */
> + if (select_lex->item_list.elements !=
in_predicate->left_expr->cols())
> + {
> + my_error(ER_OPERAND_COLUMNS, MYF(0),
in_predicate->left_expr->cols());
> + DBUG_RETURN(TRUE);
> + }
> +
> + SELECT_LEX *current= thd->lex->current_select;
> + thd->lex->current_select= current->return_after_parsing();
> + char const *save_where= thd->where;
> + thd->where= "IN/ALL/ANY subquery";
> +
> + bool result= !in_predicate->left_expr->fixed &&
> + in_predicate->left_expr->fix_fields(thd,
> +
&in_predicate->left_expr);
> + thd->lex->current_select= current;
> + thd->where= save_where;
> + if (result)
> + DBUG_RETURN(TRUE); /* purecov: deadcode */
> + }
> +
> + DBUG_PRINT("info", ("Checking if subq can be converted to
semi-join"));
> + /*
> + Check if we're in subquery that is a candidate for flattening into a
> + semi-join (which is done in flatten_subqueries()). The
requirements are:
> + 1. Subquery predicate is an IN/=ANY subquery predicate
> + 2. Subquery is a single SELECT (not a UNION)
> + 3. Subquery does not have GROUP BY or ORDER BY
> + 4. Subquery does not use aggregate functions or HAVING
> + 5. Subquery predicate is at the AND-top-level of ON/WHERE clause
> + 6. We are not in a subquery of a single table UPDATE/DELETE that
> + doesn't have a JOIN (TODO: We should handle this at some
> + point by switching to multi-table UPDATE/DELETE)
> + 7. We're not in a confluent table-less subquery, like "SELECT 1".
> + 8. No execution method was already chosen (by a prepared
statement)
> + 9. Parent select is not a confluent table-less select
> + 10. Neither parent nor child select have STRAIGHT_JOIN option.
> + */
> + if (thd->optimizer_switch_flag(OPTIMIZER_SWITCH_SEMIJOIN) &&
> + in_predicate &&
// 1
> + !select_lex->is_part_of_union() &&
// 2
> + !select_lex->group_list.elements && !join->order &&
// 3
> + !join->having && !select_lex->with_sum_func &&
// 4
> + thd->thd_marker.emb_on_expr_nest &&
// 5
> + select_lex->outer_select()->join &&
// 6
> + select_lex->master_unit()->first_select()->leaf_tables &&
// 7
> + in_exists_predicate->exec_method ==
> + Item_exists_subselect::EXEC_UNSPECIFIED
&& // 8
Test #8 relies on the assumption that in_exists_predicate is non-null
when in_predicate is non-null. I do not like this. Why cannot Test#1
test on in_exists_predicate? I assume this in the future is supposed
to apply to EXISTS-queries, too.
> + select_lex->outer_select()->leaf_tables &&
// 9
> + !((join->select_options |
> + select_lex->outer_select()->join->select_options)
> + & SELECT_STRAIGHT_JOIN))
// 10
> + {
> + DBUG_PRINT("info", ("Subquery is semi-join conversion candidate"));
> +
> + in_exists_predicate->emb_on_expr_nest=
thd->thd_marker.emb_on_expr_nest;
> + in_exists_predicate->expr_join_nest=
thd->thd_marker.emb_on_expr_nest;
> +
> + /* Register the subquery for further processing in
flatten_subqueries() */
> + select_lex->outer_select()->join->
> + sj_subselects.append(thd->mem_root, in_exists_predicate);
> + }
> + else
> + {
> + DBUG_PRINT("info", ("Subquery can't be converted to semi-join"));
> + /*
> + Check if the subquery predicate can be executed via
materialization.
> + The required conditions are:
> + 1. Subquery predicate is an IN/=ANY subquery predicate
> + 2. Subquery is a single SELECT (not a UNION)
> + 3. Subquery is not a table-less query. In this case there is no
> + point in materializing.
> + 3A The upper query is not a confluent SELECT ... FROM DUAL. We
> + can't do materialization for SELECT .. FROM DUAL because it
> + does not call setup_subquery_materialization(). We could
make
> + SELECT ... FROM DUAL call that function but that doesn't seem
> + to be the case that is worth handling.
> + 4. Subquery predicate is a top-level predicate
> + (this implies it is not negated)
> + TODO: this is a limitation that should be lifted once we
> + implement correct NULL semantics (WL#3830)
> + 5. Subquery is non-correlated
> + TODO:
> + This is an overly restrictive condition. It can be extended to:
> + (Subquery is non-correlated ||
> + Subquery is correlated to any query outer to IN predicate ||
> + (Subquery is correlated to the immediate outer query &&
> + Subquery !contains {GROUP BY, ORDER BY [LIMIT],
> + aggregate functions}) && subquery predicate is not under
"NOT IN"))
> + 6. No execution method was already chosen (by a prepared
statement).
> + 7. Involved expression types allow materialization (temporary
only)
> +
> + (*) The subquery must be part of a SELECT statement. The current
> + condition also excludes multi-table update statements.
> +
> + We have to determine whether we will perform subquery
materialization
> + before calling the IN=>EXISTS transformation, so that we know
whether to
> + perform the whole transformation or only that part of it which
wraps
> + Item_in_subselect in an Item_in_optimizer.
> + */
> + if (thd->optimizer_switch_flag(OPTIMIZER_SWITCH_MATERIALIZATION)
&&
> + in_predicate &&
// 1
> + !select_lex->is_part_of_union() &&
// 2
> + select_lex->master_unit()->first_select()->leaf_tables &&
// 3
> + thd->lex->sql_command == SQLCOM_SELECT &&
// *
> + select_lex->outer_select()->leaf_tables &&
// 3A
> + in_predicate->is_top_level_item() &&
// 4
> + !in_predicate->is_correlated &&
// 5
> + in_exists_predicate->exec_method ==
> + Item_exists_subselect::EXEC_UNSPECIFIED &&
// 6
> + subquery_types_allow_materialization(in_predicate))
// 7
> + in_predicate->exec_method=
Item_exists_subselect::EXEC_MATERIALIZATION;
> +
> + if (subq_predicate->select_transformer(join) ==
Item_subselect::RES_ERROR)
> + DBUG_RETURN(TRUE);
> + }
> + DBUG_RETURN(FALSE);
> +}
> +
> +/**
> + @brief Check if subquery predicate's compared types allow
materialization.
>
> - @param in_subs Subquery predicate, updated as follows:
> + @param predicate[in,out] subquery predicate, updated as follows:
> types_allow_materialization TRUE if subquery materialization is
allowed.
> sjm_scan_allowed If types_allow_materialization is TRUE,
> indicates whether it is possible to
use subquery
> @@ -953,49 +979,103 @@ err:
> */
>
> static
> -bool subquery_types_allow_materialization(Item_in_subselect *in_subs)
> +bool subquery_types_allow_materialization(Item_in_subselect *predicate)
Almost all data accessed in this method is members of
Item_in_subselect. This indicates that this function should really be
aw Item_in_subselect member function.
> {
> DBUG_ENTER("subquery_types_allow_materialization");
>
> - DBUG_ASSERT(in_subs->left_expr->fixed);
> -
> - List_iterator<Item> it(in_subs->unit->first_select()->item_list);
> - uint elements= in_subs->unit->first_select()->item_list.elements;
> + DBUG_ASSERT(predicate->left_expr->fixed);
>
> - in_subs->types_allow_materialization= FALSE; // Assign default values
> - in_subs->sjm_scan_allowed= FALSE;
> + List_iterator<Item>
it(predicate->unit->first_select()->item_list);
> + uint elements= predicate->unit->first_select()->item_list.elements;
>
> bool all_are_fields= TRUE;
> for (uint i= 0; i < elements; i++)
> {
> - Item *outer= in_subs->left_expr->element_index(i);
> + Item *outer= predicate->left_expr->element_index(i);
> Item *inner= it++;
Why not have iterators for both as in
semijoin_types_allow_materialization()?
> - all_are_fields &= (outer->real_item()->type() ==
Item::FIELD_ITEM &&
> - inner->real_item()->type() == Item::FIELD_ITEM);
> - if (outer->result_type() != inner->result_type())
> + all_are_fields&= (outer->real_item()->type() == Item::FIELD_ITEM
&&
> + inner->real_item()->type() == Item::FIELD_ITEM);
> + if (!types_allow_materialization(outer, inner))
> DBUG_RETURN(FALSE);
> - switch (outer->result_type()) {
> - case STRING_RESULT:
> - if (outer->is_datetime() != inner->is_datetime())
> - DBUG_RETURN(FALSE);
> -
> - if (!(outer->collation.collation == inner->collation.collation
> - /*&& outer->max_length <= inner->max_length */))
> - DBUG_RETURN(FALSE);
> - /*case INT_RESULT:
> - if (!(outer->unsigned_flag ^ inner->unsigned_flag))
> - DBUG_RETURN(FALSE); */
> - default:
> - ;/* suitable for materialization */
> - }
> }
> - in_subs->types_allow_materialization= TRUE;
> - in_subs->sjm_scan_allowed= all_are_fields;
> + predicate->types_allow_materialization= TRUE;
> + predicate->sjm_scan_allowed= all_are_fields;
> DBUG_PRINT("info",("subquery_types_allow_materialization: ok,
allowed"));
> DBUG_RETURN(TRUE);
> }
>
>
> +/**
> + @brief Check if semijoin's compared types allow materialization.
> +
> + @param sj_nest Semi-join nest containing information about
correlated
> + expressions.
> + Subquery predicate attached to sj_nest is updated
with
> + information about whether materialization is allowed.
> +
> + See subquery_types_allow_materialization() for additional
documentation
> +*/
> +
> +static
> +void semijoin_types_allow_materialization(TABLE_LIST *sj_nest)
> +{
> + DBUG_ENTER("semijoin_types_allow_materialization");
> +
> + List_iterator<Item> it1(sj_nest->nested_join->sj_outer_exprs);
> + List_iterator<Item> it2(sj_nest->nested_join->sj_inner_exprs);
> +
> + Item_exists_subselect *predicate= sj_nest->sj_subq_pred;
> + uint elements= sj_nest->nested_join->sj_outer_exprs.elements;
> +
> + bool all_are_fields= TRUE;
> + for (uint i= 0; i < elements; i++)
Why do you need to find the #elements and use that for iteration? It
seems better to me to just iterate until you reach the end of the
list. The way it is now, there is no check that outer or inner does
not become empty before the loop is terminated. At least, an assert
should be added.
> + {
> + Item *outer= it1++;
> + Item *inner= it2++;
> + all_are_fields &= (outer->real_item()->type() ==
Item::FIELD_ITEM &&
> + inner->real_item()->type() == Item::FIELD_ITEM);
> + if (!types_allow_materialization(outer, inner))
> + DBUG_VOID_RETURN;
> + }
> + predicate->types_allow_materialization= TRUE;
> + predicate->sjm_scan_allowed= all_are_fields;
> + DBUG_PRINT("info",("semijoin_types_allow_materialization: ok,
allowed"));
> + DBUG_VOID_RETURN;
> +}
I am concerned about the duplication of logic between
subquery_types_allow_materialization() and
semijoin_types_allow_materialization(). Why not make
types_allow_materialization() take two lists or iterators as
parameters and do the iteration in this common method?
> +
> +
> +/**
> + @brief Check if two items are compatible wrt. materialization.
> +
> + @param outer Expression from outer query
> + @param inner Expression from inner query
> +
> + @retval TRUE If subquery types allow materialization.
> + @retval FALSE Otherwise.
> +*/
> +
> +static bool types_allow_materialization(Item *outer, Item *inner)
> +
> +{
> + if (outer->result_type() != inner->result_type())
> + return FALSE;
> + switch (outer->result_type()) {
> + case STRING_RESULT:
> + if (outer->is_datetime() != inner->is_datetime())
> + return FALSE;
> + if (!(outer->collation.collation == inner->collation.collation
> + /*&& outer->max_length <= inner->max_length */))
> + return FALSE;
> + /*case INT_RESULT:
> + if (!(outer->unsigned_flag ^ inner->unsigned_flag))
> + return FALSE; */
Do you know if there is any good reason for keeping this inactive
code?
> + default:
> + ; /* suitable for materialization */
> + }
> + return TRUE;
> +}
> +
> +
> /*
> Remove the predicates pushed down into the subquery
>
> @@ -3205,8 +3285,8 @@ err:
> }
>
>
> -int subq_sj_candidate_cmp(Item_in_subselect* const *el1,
> - Item_in_subselect* const *el2)
> +int subq_sj_candidate_cmp(Item_exists_subselect* const *el1,
> + Item_exists_subselect* const *el2)
> {
> return ((*el1)->sj_convert_priority < (*el2)->sj_convert_priority)
? 1 :
> ( ((*el1)->sj_convert_priority ==
(*el2)->sj_convert_priority)? 0 : -1);
> @@ -3245,42 +3325,90 @@ void fix_list_after_tbl_changes(SELECT_L
> }
>
>
> -/*
> +/**
> Convert a subquery predicate into a TABLE_LIST semi-join nest
>
> - SYNOPSIS
> - convert_subq_to_sj()
> - parent_join Parent join, the one that has subq_pred in its
WHERE/ON
> - clause
> - subq_pred Subquery predicate to be converted
> -
> - DESCRIPTION
> - Convert a subquery predicate into a TABLE_LIST semi-join nest.
All the
> - prerequisites are already checked, so the conversion is always
successfull.
> + @param parent_join Parent join, which has subq_pred in its
WHERE/ON clause.
> + @param subq_pred Subquery predicate to be converted.
> + This is either an IN, =ANY or EXISTS predicate.
>
> - Prepared Statements: the transformation is permanent:
> - - Changes in TABLE_LIST structures are naturally permanent
> - - Item tree changes are performed on statement MEM_ROOT:
> - = we activate statement MEM_ROOT
> - = this function is called before the first
fix_prepare_information
> - call.
> + @retval FALSE OK
> + @retval TRUE Error
>
> - This is intended because the criteria for subquery-to-sj
conversion remain
> - constant for the lifetime of the Prepared Statement.
> + @details
> + Convert a subquery predicate into a TABLE_LIST semi-join nest. All
the
> + prerequisites are already checked, so the conversion is always
successfull.
No reason to repeat the summary in @details. Typo: "successfull"
>
> - RETURN
> - FALSE OK
> - TRUE Out of memory error
> + The following transformations are performed:
> +
> + 1. IN/=ANY predicates on the form:
> +
> + SELECT ...
> + FROM ot1 ... otN
> + WHERE (oe1, ... oeM) IN (SELECT ie1, ..., ieM)
> + FROM it1 ... itK
> + [WHERE inner-cond])
> + [AND outer-cond]
> + [GROUP BY ...] [HAVING ...] [ORDER BY ...]
> +
> + are transformed into:
> +
> + SELECT ...
> + FROM (ot1 ... otN) SJ (it1 ... itK)
> + ON (oe1, ... oeM) = (ie1, ..., ieM)
> + [AND inner-cond]
> + [WHERE outer-cond]
> + [GROUP BY ...] [HAVING ...] [ORDER BY ...]
> +
> + 2. EXISTS predicates on the form (this is for FUTURE implementation):
> +
> + SELECT ...
> + FROM ot1 ... otN
> + WHERE EXISTS (SELECT whatever
> + FROM it1 ... itK
> + [WHERE inner-cond])
> + [AND outer-cond]
> + [GROUP BY ...] [HAVING ...] [ORDER BY ...]
> +
> + are transformed into:
> +
> + SELECT ...
> + FROM (ot1 ... otN) SJ (it1 ... itK)
> + ON inner-cond
> + [WHERE outer-cond]
> + [GROUP BY ...] [HAVING ...] [ORDER BY ...]
> +
> + If inner-cond in the EXISTS subquery is empty, it is replaced with
"1=1",
> + which is always TRUE.
> +
> + Notice that the inner-cond may contain correlated and non-correlated
> + expressions. Further transformations will analyze and break up such
> + expressions.
> +
> + Prepared Statements: the transformation is permanent:
> + - Changes in TABLE_LIST structures are naturally permanent
> + - Item tree changes are performed on statement MEM_ROOT:
> + = we activate statement MEM_ROOT
> + = this function is called before the first
fix_prepare_information call.
> +
> + This is intended because the criteria for subquery-to-sj
conversion remain
> + constant for the lifetime of the Prepared Statement.
> */
>
> -bool convert_subq_to_sj(JOIN *parent_join, Item_in_subselect *subq_pred)
> +bool convert_subquery_to_semijoin(JOIN *parent_join,
> + Item_exists_subselect *subq_pred)
> {
> SELECT_LEX *parent_lex= parent_join->select_lex;
> TABLE_LIST *emb_tbl_nest= NULL;
> List<TABLE_LIST> *emb_join_list= &parent_lex->top_join_list;
> THD *thd= parent_join->thd;
> - DBUG_ENTER("convert_subq_to_sj");
> + DBUG_ENTER("convert_subquery_to_semijoin");
>
> + // Shorthand pointer used for IN subqueries
> + Item_in_subselect *in_subq_pred =
> + subq_pred->substype() == Item_subselect::IN_SUBS ?
> + (Item_in_subselect *)subq_pred :
> + NULL;
Do you really need this pointer outside the if-block for
IN-equalities?
> /*
> 1. Find out where to put the predicate into.
> Note: for "t1 LEFT JOIN t2" this will be t2, a leaf.
> @@ -3336,7 +3464,7 @@ bool convert_subq_to_sj(JOIN *parent_joi
> A3: changes in the TABLE_LIST::outer_join will make
everything work
> automatically.
> */
> - if (!(wrap_nest= alloc_join_nest(parent_join->thd)))
> + if (!(wrap_nest= alloc_join_nest(thd)))
> {
> DBUG_RETURN(TRUE);
> }
> @@ -3381,7 +3509,7 @@ bool convert_subq_to_sj(JOIN *parent_joi
>
> TABLE_LIST *sj_nest;
> NESTED_JOIN *nested_join;
> - if (!(sj_nest= alloc_join_nest(parent_join->thd)))
> + if (!(sj_nest= alloc_join_nest(thd)))
> {
> DBUG_RETURN(TRUE);
> }
> @@ -3439,8 +3567,13 @@ bool convert_subq_to_sj(JOIN *parent_joi
> /* 3. Remove the original subquery predicate from the WHERE/ON */
>
> // The subqueries were replaced for Item_int(1) earlier
> - subq_pred->exec_method=
> - Item_in_subselect::SEMI_JOIN; // for subsequent executions
> + if (in_subq_pred != NULL)
> + {
> + // @todo: Add semi-join execution for EXISTS as well.
> + in_subq_pred->exec_method= Item_exists_subselect::EXEC_SEMI_JOIN;
> + /* Left side of IN predicate is already resolved */
> + DBUG_ASSERT(in_subq_pred->left_expr->fixed);
> + }
Why do you need to do the above here? Can it not wait until the
big if-block for IN-equalities?
> /*TODO: also reset the 'with_subselect' there. */
>
> /* n. Adjust the parent_join->tables counter */
> @@ -3457,26 +3590,34 @@ bool convert_subq_to_sj(JOIN *parent_joi
> emb= emb->embedding)
> emb->select_lex= parent_join->select_lex;
> }
> - parent_join->tables += subq_lex->join->tables;
> + parent_join->tables+= subq_lex->join->tables;
>
> - /*
> - Put the subquery's WHERE into semi-join's sj_on_expr
> - Add the subquery-induced equalities too.
> - */
> - SELECT_LEX *save_lex= thd->lex->current_select;
> - thd->lex->current_select=subq_lex;
> - if (!subq_pred->left_expr->fixed &&
> - subq_pred->left_expr->fix_fields(thd, &subq_pred->left_expr))
> - DBUG_RETURN(TRUE);
> - thd->lex->current_select=save_lex;
> -
> - sj_nest->nested_join->sj_corr_tables= subq_pred->used_tables();
> - sj_nest->nested_join->sj_depends_on= subq_pred->used_tables() |
> -
subq_pred->left_expr->used_tables();
> - sj_nest->sj_on_expr= subq_lex->where;
> + nested_join->sj_outer_exprs.empty();
> + nested_join->sj_inner_exprs.empty();
>
> /*
> - Create the IN-equalities and inject them into semi-join's ON
expression.
> + @todo: Convert the following if statement to a switch and add a case
> + for EXISTS queries.
> + */
> + if (subq_pred->substype() == Item_subselect::IN_SUBS)
> + {
> + /*
> + sj_corr_tables is supposed to contain non-trivially correlated
tables,
> + but here it is set to contain all correlated tables.
> + */
What is the intention of this comment? Is it something that we should
fix? If so, a TODO or something would be good. Otherwise, a
justification is needed.
> + nested_join->sj_corr_tables= subq_pred->used_tables();
> + /*
> + sj_depends_on contains the set of outer tables referred in the
> + subquery's WHERE clause as well as tables referred in the IN
predicate's
> + left-hand side.
> + */
> + nested_join->sj_depends_on= subq_pred->used_tables() |
> + in_subq_pred->left_expr->used_tables();
> + /* Put the subquery's WHERE into semi-join's condition. */
> + sj_nest->sj_on_expr= subq_lex->where;
> +
> + /*
> + Create the IN-equalities and inject them into semi-join's ON
condition.
> Additionally, for LooseScan strategy
> - Record the number of IN-equalities.
> - Create list of pointers to (oe1, ..., ieN). We'll need the
list to
> @@ -3489,41 +3630,40 @@ bool convert_subq_to_sj(JOIN *parent_joi
> pointers to Item_direct_view_refs are guaranteed to be stable as
> Item_direct_view_refs doesn't substitute itself with anything in
> Item_direct_view_ref::fix_fields.
> - */
> - sj_nest->sj_in_exprs= subq_pred->left_expr->cols();
> - sj_nest->nested_join->sj_outer_expr_list.empty();
> + */
>
> - if (subq_pred->left_expr->cols() == 1)
> - {
> - nested_join->sj_outer_expr_list.push_back(subq_pred->left_expr);
> - Item_func_eq *item_eq=
> - new Item_func_eq(subq_pred->left_expr,
subq_lex->ref_pointer_array[0]);
> - item_eq->in_equality_no= 0;
> - sj_nest->sj_on_expr= and_items(sj_nest->sj_on_expr, item_eq);
> - }
> - else
> - {
> - for (uint i= 0; i < subq_pred->left_expr->cols(); i++)
> + for (uint i= 0; i < in_subq_pred->left_expr->cols(); i++)
> {
> - nested_join->sj_outer_expr_list.push_back(subq_pred->left_expr->
> - element_index(i));
> + nested_join->sj_outer_exprs.push_back(in_subq_pred->left_expr->
> + element_index(i));
> +
nested_join->sj_inner_exprs.push_back(subq_lex->ref_pointer_array[i]);
> +
> Item_func_eq *item_eq=
> - new Item_func_eq(subq_pred->left_expr->element_index(i),
> + new Item_func_eq(in_subq_pred->left_expr->element_index(i),
> subq_lex->ref_pointer_array[i]);
> + if (item_eq == NULL)
> + DBUG_RETURN(TRUE);
> +
> + /*
> + Mark this item as a subquery equality predicate. Currently, we
> + only check that this field is different from UINT_MAX, so a
boolean
> + would do better than an integer.
> + */
> item_eq->in_equality_no= i;
> sj_nest->sj_on_expr= and_items(sj_nest->sj_on_expr, item_eq);
> + if (sj_nest->sj_on_expr == NULL)
> + DBUG_RETURN(TRUE);
> }
> + /* Fix the created equality and AND */
> + sj_nest->sj_on_expr->fix_fields(thd, &sj_nest->sj_on_expr);
> }
> - /* Fix the created equality and AND */
> - sj_nest->sj_on_expr->fix_fields(parent_join->thd,
&sj_nest->sj_on_expr);
>
> /*
> Walk through sj nest's WHERE and ON expressions and call
> item->fix_table_changes() for all items.
> */
> sj_nest->sj_on_expr->fix_after_pullout(parent_lex,
&sj_nest->sj_on_expr);
> - fix_list_after_tbl_changes(parent_lex,
&sj_nest->nested_join->join_list);
> -
> + fix_list_after_tbl_changes(parent_lex, &nested_join->join_list);
>
> /* Unlink the child select_lex so it doesn't show up in EXPLAIN: */
> subq_lex->master_unit()->exclude_level();
> @@ -3531,20 +3671,25 @@ bool convert_subq_to_sj(JOIN *parent_joi
> DBUG_EXECUTE("where",
> print_where(sj_nest->sj_on_expr,"SJ-EXPR",
QT_ORDINARY););
>
> - /* Inject sj_on_expr into the parent's WHERE or ON */
> if (emb_tbl_nest)
> {
> + /* Inject sj_on_expr into the parent's ON condition */
> emb_tbl_nest->on_expr= and_items(emb_tbl_nest->on_expr,
> sj_nest->sj_on_expr);
> - emb_tbl_nest->on_expr->fix_fields(parent_join->thd,
&emb_tbl_nest->on_expr);
> + if (emb_tbl_nest->on_expr == NULL)
> + DBUG_RETURN(TRUE);
> + emb_tbl_nest->on_expr->fix_fields(thd, &emb_tbl_nest->on_expr);
> }
> else
> {
> - /* Inject into the WHERE */
> + /* Inject sj_on_expr into the parent's WHERE condition */
> parent_join->conds= and_items(parent_join->conds,
sj_nest->sj_on_expr);
> - parent_join->conds->fix_fields(parent_join->thd,
&parent_join->conds);
> + if (parent_join->conds == NULL)
> + DBUG_RETURN(TRUE);
> + parent_join->conds->fix_fields(thd, &parent_join->conds);
> parent_join->select_lex->where= parent_join->conds;
> }
> + semijoin_types_allow_materialization(sj_nest);
Any particular reason for calling this here? Could it not wait until
the info is actually needed?
>
> if (subq_lex->ftfunc_list->elements)
> {
> @@ -3608,18 +3753,19 @@ bool convert_subq_to_sj(JOIN *parent_joi
> bool JOIN::flatten_subqueries()
> {
> Query_arena *arena, backup;
> - Item_in_subselect **in_subq;
> - Item_in_subselect **in_subq_end;
> + Item_exists_subselect **subq;
> + Item_exists_subselect **subq_end;
> DBUG_ENTER("JOIN::flatten_subqueries");
>
> if (sj_subselects.elements() == 0)
> DBUG_RETURN(FALSE);
>
> /* First, convert child join's subqueries. We proceed bottom-up
here */
> - for (in_subq= sj_subselects.front(), in_subq_end=
sj_subselects.back();
> - in_subq != in_subq_end; in_subq++)
> + for (subq= sj_subselects.front(), subq_end= sj_subselects.back();
> + subq != subq_end;
> + subq++)
> {
> - st_select_lex *child_select= (*in_subq)->get_select_lex();
> + st_select_lex *child_select= (*subq)->get_select_lex();
> JOIN *child_join= child_select->join;
> child_join->outer_tables = child_join->tables;
>
> @@ -3633,8 +3779,8 @@ bool JOIN::flatten_subqueries()
>
> if (child_join->flatten_subqueries())
> DBUG_RETURN(TRUE);
> - (*in_subq)->sj_convert_priority=
> - (*in_subq)->is_correlated * MAX_TABLES + child_join->outer_tables;
> + (*subq)->sj_convert_priority=
> + (*subq)->is_correlated * MAX_TABLES + child_join->outer_tables;
> }
>
> // Temporary measure: disable semi-joins when they are together
with outer
> @@ -3645,7 +3791,7 @@ bool JOIN::flatten_subqueries()
> if (tbl->on_expr || (tbl->embedding && !(embedding->sj_on_expr
&&
> !embedding->embedding)))
> {
> - in_subq= sj_subselects.front();
> + subq= sj_subselects.front();
> arena= thd->activate_stmt_arena_if_needed(&backup);
> goto skip_conversion;
> }
> @@ -3662,66 +3808,74 @@ bool JOIN::flatten_subqueries()
> // #tables-in-parent-query + #tables-in-subquery < MAX_TABLES
> /* Replace all subqueries to be flattened with Item_int(1) */
> arena= thd->activate_stmt_arena_if_needed(&backup);
> - for (in_subq= sj_subselects.front();
> - in_subq != in_subq_end &&
> - tables + (*in_subq)->unit->first_select()->join->tables <
MAX_TABLES;
> - in_subq++)
> - {
> - Item **tree= ((*in_subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
> - &conds : &((*in_subq)->emb_on_expr_nest->on_expr);
> - if (replace_where_subcondition(this, tree, *in_subq, new
Item_int(1),
> + for (subq= sj_subselects.front();
> + subq != subq_end &&
> + tables + (*subq)->unit->first_select()->join->tables <
MAX_TABLES;
> + subq++)
> + {
> + Item **tree= ((*subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
> + &conds : &((*subq)->emb_on_expr_nest->on_expr);
> + if (replace_where_subcondition(this, tree, *subq, new Item_int(1),
> FALSE))
> DBUG_RETURN(TRUE); /* purecov: inspected */
> }
>
> - for (in_subq= sj_subselects.front();
> - in_subq != in_subq_end &&
> - tables + (*in_subq)->unit->first_select()->join->tables <
MAX_TABLES;
> - in_subq++)
> + for (subq= sj_subselects.front();
> + subq != subq_end &&
> + tables + (*subq)->unit->first_select()->join->tables <
MAX_TABLES;
> + subq++)
> {
> - if (convert_subq_to_sj(this, *in_subq))
> + if (convert_subquery_to_semijoin(this, *subq))
> DBUG_RETURN(TRUE);
> }
> skip_conversion:
> /*
> - 3. Finalize (perform IN->EXISTS rewrite) the subqueries that we
didn't
> - convert:
> + 3. Finalize the subqueries that we did not convert,
> + ie. perform IN->EXISTS rewrite.
> */
> - for (; in_subq!= in_subq_end; in_subq++)
> + for (; subq!= subq_end; subq++)
> {
> - JOIN *child_join= (*in_subq)->unit->first_select()->join;
> + JOIN *child_join= (*subq)->unit->first_select()->join;
> Item_subselect::trans_res res;
> - (*in_subq)->changed= 0;
> - (*in_subq)->fixed= 0;
> + (*subq)->changed= 0;
> + (*subq)->fixed= 0;
>
> SELECT_LEX *save_select_lex= thd->lex->current_select;
> - thd->lex->current_select= (*in_subq)->unit->first_select();
> + thd->lex->current_select= (*subq)->unit->first_select();
>
> - res= (*in_subq)->select_transformer(child_join);
> + res= (*subq)->select_transformer(child_join);
>
> thd->lex->current_select= save_select_lex;
>
> if (res == Item_subselect::RES_ERROR)
> DBUG_RETURN(TRUE);
>
> - (*in_subq)->changed= 1;
> - (*in_subq)->fixed= 1;
> + (*subq)->changed= 1;
> + (*subq)->fixed= 1;
>
> - Item *substitute= (*in_subq)->substitution;
> - bool do_fix_fields= !(*in_subq)->substitution->fixed;
> - Item **tree= ((*in_subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
> - &conds : &((*in_subq)->emb_on_expr_nest->on_expr);
> - if (replace_where_subcondition(this, tree, *in_subq, substitute,
> + Item *substitute= (*subq)->substitution;
> + /*
> + @todo: When EXISTS transformation is implemented, "substitute"
will be
> + NULL here and we need to skip the remainder of the for loop
for this
> + subquery. The check is also related to handling of outer joins and
> + may be skipped when semijoins can be combined with outer joins.
> + */
> + bool do_fix_fields= !(*subq)->substitution->fixed;
> + Item **tree= ((*subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
> + &conds : &((*subq)->emb_on_expr_nest->on_expr);
> + if (replace_where_subcondition(this, tree, *subq, substitute,
> do_fix_fields))
> DBUG_RETURN(TRUE);
> - (*in_subq)->substitution= NULL;
> +
> + (*subq)->substitution= NULL;
>
> if (!thd->stmt_arena->is_conventional())
> {
> - tree= ((*in_subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
> - &select_lex->prep_where :
&((*in_subq)->emb_on_expr_nest->prep_on_expr);
> + tree= ((*subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
> + &select_lex->prep_where :
> + &((*subq)->emb_on_expr_nest->prep_on_expr);
>
> - if (replace_where_subcondition(this, tree, *in_subq, substitute,
> + if (replace_where_subcondition(this, tree, *subq, substitute,
> FALSE))
> DBUG_RETURN(TRUE);
> }
> @@ -3769,7 +3923,8 @@ bool JOIN::setup_subquery_materializatio
> subquery_predicate->substype() == Item_subselect::IN_SUBS)
> {
> Item_in_subselect *in_subs= (Item_in_subselect*)
subquery_predicate;
> - if (in_subs->exec_method ==
Item_in_subselect::MATERIALIZATION &&
> + if (in_subs->exec_method ==
> + Item_exists_subselect::EXEC_MATERIALIZATION &&
> in_subs->setup_engine())
I like the previous formatting better.
> return TRUE;
> }
> @@ -3946,7 +4101,6 @@ int pull_out_semijoin_tables(JOIN *join)
> Pulling a table out of uncorrelated subquery in
general makes
> makes it correlated. See the NOTE to this funtion.
> */
> - sj_nest->sj_subq_pred->is_correlated= TRUE;
> sj_nest->nested_join->sj_corr_tables|= tbl->table->map;
> sj_nest->nested_join->sj_depends_on|= tbl->table->map;
> }
> @@ -4634,10 +4788,16 @@ static bool optimize_semijoin_nests(JOIN
> /* semi-join nests with only constant tables are not valid */
> DBUG_ASSERT(sj_nest->sj_inner_tables & ~join->const_table_map);
>
> + /*
> + Try semijoin materialization if
> + 1) Not all tables have been pulled out of the semijoin nest, and
> + 2) The semijoin is classified as non-trivially-correlated, and
> + 3) Data types allow execution with materialization
> + */
> sj_nest->sj_mat_info= NULL;
> - if (sj_nest->sj_inner_tables && /* not everything was pulled
out */
> - !sj_nest->sj_subq_pred->is_correlated &&
> - sj_nest->sj_subq_pred->types_allow_materialization)
> + if (sj_nest->sj_inner_tables &&
// (1)
> + !sj_nest->nested_join->sj_corr_tables &&
// (2)
> + sj_nest->sj_subq_pred->types_allow_materialization)
// (3)
> {
> join->emb_sjm_nest= sj_nest;
> if (choose_plan(join, all_table_map & ~join->const_table_map))
> @@ -4890,49 +5050,41 @@ merge_key_fields(KEY_FIELD *start,KEY_FI
> return first_free;
> }
>
> -/*
> +
> +/**
> Given a field, return its index in semi-join's select list, or
UINT_MAX
>
> - DESCRIPTION
> - Given a field, we find its table; then see if the table is within a
> - semi-join nest and if the field was in select list of the subselect.
> - If it was, we return field's index in the select list. The value
is used
> - by LooseScan strategy.
> + @param field Field that we are looking up table for
> +
> + @retval =UINT_MAX Field is not from a semijoin-transformed subquery
> + @retval <UINT_MAX Index in select list of subquery
> +
> + @details
> + Given a field, find its table; then see if the table is within a
> + semi-join nest and if the field was in select list of the subquery
> + (if subquery was part of a quantified comparison predicate), or
> + the field was a result of subquery decorrelation.
> + If it was, then return the field's index in the select list.
> + The value is used by LooseScan strategy.
> */
>
> static uint get_semi_join_select_list_index(Field *field)
> {
> - uint res= UINT_MAX;
> TABLE_LIST *emb_sj_nest;
> if ((emb_sj_nest= field->table->pos_in_table_list->embedding) &&
> emb_sj_nest->sj_on_expr)
> {
> - Item_in_subselect *subq_pred= emb_sj_nest->sj_subq_pred;
> - st_select_lex *subq_lex= subq_pred->unit->first_select();
> - if (subq_pred->left_expr->cols() == 1)
> + List<Item> &items= emb_sj_nest->nested_join->sj_inner_exprs;
> + List_iterator<Item> it(items);
> + for (uint i= 0; i < items.elements; i++)
> {
> - Item *sel_item= subq_lex->ref_pointer_array[0];
> + Item *sel_item= it++;
> if (sel_item->type() == Item::FIELD_ITEM &&
> ((Item_field*)sel_item)->field->eq(field))
> - {
> - res= 0;
> - }
> - }
> - else
> - {
> - for (uint i= 0; i < subq_pred->left_expr->cols(); i++)
> - {
> - Item *sel_item= subq_lex->ref_pointer_array[i];
> - if (sel_item->type() == Item::FIELD_ITEM &&
> - ((Item_field*)sel_item)->field->eq(field))
> - {
> - res= i;
> - break;
> - }
> - }
> + return i;
> }
> }
> - return res;
> + return UINT_MAX;
> }
>
> /**
> @@ -6013,7 +6165,7 @@ set_position(JOIN *join,uint idx,JOIN_TA
> ulonglong get_bound_sj_equalities(TABLE_LIST *sj_nest,
> table_map remaining_tables)
> {
> - List_iterator<Item> li(sj_nest->nested_join->sj_outer_expr_list);
> + List_iterator<Item> li(sj_nest->nested_join->sj_outer_exprs);
> Item *item;
> uint i= 0;
> ulonglong res= 0;
> @@ -6119,10 +6271,10 @@ public:
> */
> best_loose_scan_cost= DBL_MAX;
> if (!join->emb_sjm_nest && s->emb_sj_nest &&
// (1)
> - s->emb_sj_nest->sj_in_exprs < 64 &&
> + s->emb_sj_nest->nested_join->sj_inner_exprs.elements < 64
&&
> ((remaining_tables & s->emb_sj_nest->sj_inner_tables) ==
// (2)
> s->emb_sj_nest->sj_inner_tables) &&
// (2)
> - join->cur_sj_inner_tables == 0 &&
// (3)
> + join->cur_sj_inner_tables == 0 &&
// (3)
> !(remaining_tables &
> s->emb_sj_nest->nested_join->sj_corr_tables) &&
// (4)
> remaining_tables &
s->emb_sj_nest->nested_join->sj_depends_on &&// (5)
> @@ -6184,12 +6336,13 @@ public:
> (handled_col|bound_col)* (other_col|bound_col)
>
> */
> - if (try_loosescan && // (1)
> - (handled_sj_equalities | bound_sj_equalities) == // (2)
> - PREV_BITS(ulonglong, s->emb_sj_nest->sj_in_exprs) && //
(2)
> - (PREV_BITS(key_part_map, max_loose_keypart+1) & // (3)
> - (found_part | loose_scan_keyparts)) == // (3)
> - (found_part | loose_scan_keyparts) && // (3)
> + if (try_loosescan &&
// (1)
> + (handled_sj_equalities | bound_sj_equalities) ==
// (2)
> + PREV_BITS(ulonglong,
> +
s->emb_sj_nest->nested_join->sj_inner_exprs.elements)&& // (2)
> + (PREV_BITS(key_part_map, max_loose_keypart+1) &
// (3)
> + (found_part | loose_scan_keyparts)) ==
// (3)
> + (found_part | loose_scan_keyparts) &&
// (3)
> !key_uses_partial_cols(s->table, key))
> {
> /* Ok, can use the strategy */
> @@ -9219,9 +9372,13 @@ static bool make_join_select(JOIN *join,
> if (pushdown_on_conditions(join, tab))
> DBUG_RETURN(1);
>
> - if (save_used_tables && !(used_tables &
> - ~(tab->emb_sj_nest->sj_inner_tables |
> - join->const_table_map |
PSEUDO_TABLE_BITS)))
> + if (save_used_tables)
> + {
> + DBUG_ASSERT(tab->emb_sj_nest);
> + }
> + if (save_used_tables && // Inside an SJM nest, and
> + !(tab->emb_sj_nest->sj_inner_tables &
> + ~used_tables)) // All inner tables in
prefix
> {
> /*
> We have reached the end of semi join nest. That is, the
join order
> @@ -9230,12 +9387,8 @@ static bool make_join_select(JOIN *join,
> outer_tbl1 SJ-Materialize(inner_tbl1 ... inner_tblN)
outer_tbl ...
> ^
>
\-we're here
> - At this point, we need to produce two conditions
> - - A condition that can be checked when we have all of the
sj-inner
> - tables (inner_tbl1 ... inner_tblN). This will be used
while doing
> - materialization.
> - - A condition that can be checked when we have all of the
tables
> - in the prefix (both inner and outer).
> + At this point, we need to produce a condition that can be
checked
> + when we have all of the tables in the prefix (both inner
and outer).
> */
> tab->emb_sj_nest->sj_mat_info->join_cond=
> cond ?
I think we have aggreed earlier to not include the above two diffs in
this patch.
> @@ -9828,47 +9981,40 @@ void remove_sj_conds(Item **tree)
>
>
> /*
> - Create subquery IN-equalities assuming use of materialization strategy
> + Create subquery equalities assuming use of materialization strategy
>
> - SYNOPSIS
> - create_subq_in_equalities()
> - thd Thread handle
> - sjm Semi-join materialization structure
> - subq_pred The subquery predicate
> + @param thd Thread handle
> + @param sjm Semi-join materialization structure
> + @param sj_nest Semi-join nest
>
> - DESCRIPTION
> - Create subquery IN-equality predicates. That is, for a subquery
> + @retval <>NULL Created condition
> + @retval = NULL Error
> +
> + @details
> + Create subquery equality predicates. That is, for a subquery
>
> - (oe1, oe2, ...) IN (SELECT ie1, ie2, ... FROM ...)
> + (oe1, oe2, ...) IN (SELECT ie1, ie2, ... FROM ...)
>
> - create "oe1=ie1 AND ie1=ie2 AND ..." expression, such that ie1,
ie2, ..
> - refer to the columns of the table that's used to materialize the
> - subquery.
> -
> - RETURN
> - Created condition
> + create "oe1=ie1 AND oe2=ie2 AND ..." expression, such that ie1,
ie2, ..
> + refer to the columns of the table that is used to materialize the
subquery.
> + This function will also generate proper equality predicates for
> + trivially-correlated subqueries corresponding to the above IN query.
> */
>
> -Item *create_subq_in_equalities(THD *thd, SJ_MATERIALIZATION_INFO *sjm,
> - Item_in_subselect *subq_pred)
> +Item *create_subq_equalities(THD *thd, SJ_MATERIALIZATION_INFO *sjm,
> + TABLE_LIST *sj_nest)
> {
> Item *res= NULL;
> - if (subq_pred->left_expr->cols() == 1)
> - {
> - if (!(res= new Item_func_eq(subq_pred->left_expr,
> - new Item_field(sjm->table->field[0]))))
> - return NULL; /* purecov: inspected */
> - }
> - else
> + List_iterator<Item> outer_expr(sj_nest->nested_join->sj_outer_exprs);
> +
> + for (uint i= 0; i < sj_nest->nested_join->sj_outer_exprs.elements;
i++)
> {
> Item *conj;
> - for (uint i= 0; i < subq_pred->left_expr->cols(); i++)
> - {
> - if (!(conj= new
Item_func_eq(subq_pred->left_expr->element_index(i),
> - new
Item_field(sjm->table->field[i]))) ||
> - !(res= and_items(res, conj)))
> - return NULL; /* purecov: inspected */
> - }
> + Item *outer= outer_expr++;
> + if (!(conj= new Item_func_eq(outer, new
Item_field(sjm->table->field[i]))))
> + return NULL; /* purecov: inspected */
> + if (!(res= and_items(res, conj)))
> + return NULL; /* purecov: inspected */
> }
> if (res->fix_fields(thd, &res))
> return NULL; /* purecov: inspected */
> @@ -9905,8 +10051,7 @@ bool setup_sj_materialization(JOIN_TAB *
> SJ_MATERIALIZATION_INFO *sjm= emb_sj_nest->sj_mat_info;
> THD *thd= tab->join->thd;
> /* First the calls come to the materialization function */
> - List<Item> &item_list=
emb_sj_nest->sj_subq_pred->unit->first_select()->item_list;
> -
> + List<Item> &item_list= emb_sj_nest->nested_join->sj_inner_exprs;
> /*
> Set up the table to write to, do as
select_union::create_result_table does
> */
> @@ -9965,10 +10110,11 @@ bool setup_sj_materialization(JOIN_TAB *
> KEY_PART_INFO *cur_key_part= tmp_key->key_part;
> store_key **ref_key= tab_ref->key_copy;
> uchar *cur_ref_buff= tab_ref->key_buff;
> -
> + List_iterator<Item>
outer_expr(emb_sj_nest->nested_join->sj_outer_exprs);
> +
> for (i= 0; i < tmp_key_parts; i++, cur_key_part++, ref_key++)
> {
> - tab_ref->items[i]=
emb_sj_nest->sj_subq_pred->left_expr->element_index(i);
> + tab_ref->items[i]= outer_expr++;
> int null_count= test(cur_key_part->field->real_maybe_null());
> *ref_key= new store_key_item(thd, cur_key_part->field,
> /* TODO:
> @@ -9999,8 +10145,7 @@ bool setup_sj_materialization(JOIN_TAB *
> if (tab[i].select)
> remove_sj_conds(&tab[i].select->cond);
> }
> - if (!(sjm->in_equality= create_subq_in_equalities(thd, sjm,
> -
emb_sj_nest->sj_subq_pred)))
> + if (!(sjm->in_equality= create_subq_equalities(thd, sjm,
emb_sj_nest)))
> DBUG_RETURN(TRUE); /* purecov: inspected */
> }
> else
> @@ -13224,11 +13369,9 @@ void advance_sj_state(JOIN *join, table_
> join->thd->optimizer_switch_flag(OPTIMIZER_SWITCH_FIRSTMATCH))
> {
> const table_map outer_corr_tables=
> - new_join_tab->emb_sj_nest->nested_join->sj_corr_tables |
> new_join_tab->emb_sj_nest->nested_join->sj_depends_on;
> const table_map sj_inner_tables=
> new_join_tab->emb_sj_nest->sj_inner_tables;
> -
> /*
> Enter condition:
> 1. The next join tab belongs to semi-join nest
> @@ -13323,8 +13466,7 @@ void advance_sj_state(JOIN *join, table_
> pos->first_loosescan_table= idx;
> pos->loosescan_need_tables=
> new_join_tab->emb_sj_nest->sj_inner_tables |
> - new_join_tab->emb_sj_nest->nested_join->sj_depends_on |
> - new_join_tab->emb_sj_nest->nested_join->sj_corr_tables;
> + new_join_tab->emb_sj_nest->nested_join->sj_depends_on;
> }
>
> if ((pos->first_loosescan_table != MAX_TABLES) &&
> @@ -13409,8 +13551,7 @@ void advance_sj_state(JOIN *join, table_
> */
> pos->sjm_scan_need_tables=
> new_join_tab->emb_sj_nest->sj_inner_tables |
> - new_join_tab->emb_sj_nest->nested_join->sj_depends_on |
> - new_join_tab->emb_sj_nest->nested_join->sj_corr_tables;
> + new_join_tab->emb_sj_nest->nested_join->sj_depends_on;
> pos->sjm_scan_last_inner= idx;
> }
> else
> @@ -13525,8 +13666,7 @@ void advance_sj_state(JOIN *join, table_
> pos->first_dupsweedout_table= idx;
>
> pos->dupsweedout_tables |= nest->sj_inner_tables |
> - nest->nested_join->sj_depends_on |
> - nest->nested_join->sj_corr_tables;
> + nest->nested_join->sj_depends_on;
> }
>
> if (pos->dupsweedout_tables &&
> @@ -18336,6 +18476,26 @@ make_cond_for_table_from_pred(COND *root
> }
>
>
> +/**
> + Generate a condition that can be checked after materializing a
semi-join nest
> +
> + @param root_cond Root condition, ancestor of the condition being
analyzed.
> + @param cond Condition to analyze.
> + @param tables Tables in the outer part of the join nest.
> + @param sjm_tables Tables within the semi-join nest (the inner part).
> +
> + @retval <>NULL Generated condition
> + @retval = NULL Already checked, or error
> +
> + @details
> + A regular semi-join materialization is always non-correlated, ie
> + the subquery is always resolved by performing a lookup generated in
> + create_subq_equalities, hence this function should never produce
> + any condition for regular semi-join materialization.
> + For a scan semi-join materialization, this function may return a
condition
> + to be checked.
> +*/
> +
> static COND *
> make_cond_after_sjm(Item *root_cond, Item *cond, table_map tables,
> table_map sjm_tables)
>
> === modified file 'sql/sql_select.h'
> --- a/sql/sql_select.h 2010-03-01 10:04:07 +0000
> +++ b/sql/sql_select.h 2010-03-05 09:43:47 +0000
> @@ -1649,7 +1649,7 @@ public:
> bool union_part; ///< this subselect is part of union
> bool optimized; ///< flag to avoid double optimization in EXPLAIN
>
> - Array<Item_in_subselect> sj_subselects;
> + Array<Item_exists_subselect> sj_subselects;
>
> /* Temporary tables used to weed-out semi-join duplicates */
> List<TABLE> sj_tmp_tables;
>
> === modified file 'sql/table.h'
> --- a/sql/table.h 2010-02-25 13:55:28 +0000
> +++ b/sql/table.h 2010-03-05 09:43:47 +0000
> @@ -1147,7 +1147,7 @@ enum enum_open_type
>
> class SJ_MATERIALIZATION_INFO;
> class Index_hint;
> -class Item_in_subselect;
> +class Item_exists_subselect;
>
>
> /*
> @@ -1223,7 +1223,7 @@ struct TABLE_LIST
> char *db, *alias, *table_name, *schema_table_name;
> char *option; /* Used by cache index */
> Item *on_expr; /* Used with outer join */
> - Item *sj_on_expr;
> + Item *sj_on_expr; /* Synthesized semijoin
condition */
> /*
> (Valid only for semi-join nests) Bitmap of tables that are
within the
> semi-join (this is different from bitmap of all nest's children
because
> @@ -1231,9 +1231,7 @@ struct TABLE_LIST
> nest's children).
> */
> table_map sj_inner_tables;
> - /* Number of IN-compared expressions */
> - uint sj_in_exprs;
> - Item_in_subselect *sj_subq_pred;
> + Item_exists_subselect *sj_subq_pred;
> SJ_MATERIALIZATION_INFO *sj_mat_info;
>
> /*
> @@ -1795,13 +1793,17 @@ typedef struct st_nested_join
> uint counter_;
> nested_join_map nj_map; /* Bit used to identify this
nested join*/
> /*
> - (Valid only for semi-join nests) Bitmap of tables outside the
semi-join
> - that are used within the semi-join's ON condition.
> + Tables outside the semi-join that are used within the semi-join's
> + ON condition (ie. the subquery WHERE clause and optional IN
equalities).
> */
> table_map sj_depends_on;
> - /* Outer non-trivially correlated tables */
> + /* Outer non-trivially correlated tables, a true subset of
sj_depends_on */
> table_map sj_corr_tables;
> - List<Item> sj_outer_expr_list;
> + /*
> + Lists of trivially-correlated expressions from the outer and
inner tables
> + of the semi-join, respectively.
> + */
> + List<Item> sj_outer_exprs, sj_inner_exprs;
> } NESTED_JOIN;
>
>
>
>
>
> ------------------------------------------------------------------------
>
>
That's all for now,
--
Øystein
