List:Commits« Previous MessageNext Message »
From:Øystein Grøvlen Date:April 15 2010 11:06am
Subject:Re: bzr commit into mysql-6.0-codebase-bugfixing branch
(roy.lyseng:3790) WL#5266
View as plain text  
Roy,

Thanks for the refactoring.  Several good things here. See my comments 
in-line:


Roy Lyseng wrote:
 > #At file:///home/rl136806/mysql/repo/mysql-6.0-work2/ based on 
revid:alik@stripped
 >
 >  3790 Roy Lyseng	2010-03-05
 >       WL#5266 - Refactor data used to collect correlated expressions 
in semijoins
 >
 >       sql/item_subselect.cc
 >         Modified constructors for Item_exists_subselect and 
Item_in_subselect.
 >         Performed necessary updates after changes to subselect classes.
 >
 >       sql/item_subselect.h
 >         Fields exec_method, sj_convert_priority, emb_on_expr_nest,
 >         expr_join_nest, types_allow_materialization, sjm_scan_allowed
 >         moved from class Item_in_subselect to Item_exists_subselect,
 >         to prepare for future semijoin transformation of EXISTS.
 >
 >         Modified constructors for Item_exists_subselect and 
Item_in_subselect.
 >
 >         Added a simple select_transformer() function for 
Item_exists_subselect.
 >
 >       sql/sql_lex.cc
 >         Changes in exec_method constant values.
 >
 >       sql/sql_select.cc
 >         New function resolve_subquery() is refactored out of 
JOIN::prepare().
 >         It performs all resolving, including preparing semijoin 
transformations,
 >         for predicates that involve a subquery (ie IN, EXISTS, ANY/ALL).
 >
 >         In JOIN::prepare(), removed out-commented test on 
Query_arena::PREPARED
 >         because it did not work as anticipated, and it could be omitted.
 >
 >         In JOIN::prepare(), removed call to 
subquery_types_allow_materialization().
 >         Now handled by call to semijoin_types_allow_materialization() in
 >         convert_subquery_to_semijoin().
 >
 >         subquery_types_allow_materialization() is modified so that it 
is only
 >         used for non-semijoin-transformed subqueries.
 >         semijoin_types_allow_materialization() is created to handle
 >         semijoin-transformed subqueries.
 >         Some common functionality is factored out into
 >         types_allow_materialization().
 >
 >         convert_subq_to_sj() is renamed to 
convert_subquery_to_semijoin().
 >         Doxygen comments are added, more comments about the actual 
transformation
 >         are added.
 >
 >         Removed unnecessary fix_fields() on subq_pred->left_expr in
 >         convert_subquery_to_semijoin(), replaced with DBUG_ASSERT().
 >
 >         Pushed IN expressions into the lists 
nested_join->sj_outer_exprs and
 >         nested_join->sj_inner_exprs, added comments on use of 
sj_corr_tables
 >         and sj_depends_on.
 >
 >         Added error handling when creating objects for the semijoin 
condition.
 >
 >         In JOIN::flatten_subqueries(), modified pointer types so that 
they
 >         allow EXISTS predicates to be transformed (for future work).
 >
 >         removed manipulation of the is_correlated flag when the 
subquery is

removed => Removed

 >         transformed to a semijoin. The set sj_corr_tables is 
maintained instead
 >         of this, and can be used for testing wherever is_correlated 
was used.
 >
 >         In get_semi_join_select_list_index(), use sj_inner_exprs to 
retrieve
 >         expressions from subquery instead of using the select list of 
the subquery
 >         (which is not resolved after first execution of query).
 >
 >         In make_join_select(), cleaned up subset test (sj_inner_tables in
 >         used_tables).
 >
 >         create_subq_in_equalities() has been renamed to 
create_subq_equalities(),
 >         because it may be used for trivially-correlated expressions 
as well
 >         as equalities generated from IN predicates in the future.
 >
 >         In setup_sj_materialization(), use sj_inner_exprs to retrieve
 >         expressions from subquery instead of using the select list of 
the subquery.
 >         Use sj_outer_exprs instead of 
emb_sj_nest->sj_subq_pred->left_expr.
 >
 >         In advance_sj_state(), simplified some expressions by taking 
advantage
 >         of the fact that sj_corr_tables is always a true subset of 
sj_depends_on.
 >
 >       sql/sql_select.h
 >         Modified type of sj_subselects, so that EXISTS predicates can 
be semijoin-
 >         transformed in the future.
 >
 >       sql/table.h
 >         Removed sj_in_exprs (size of expression lists can be used 
instead).
 >         Changed type of sj_subq_pred so that EXISTS predicates can be 
handled.
 >         Added the list sj_inner_exprs to replace the use of the 
subquery's
 >         select list. This also opens for processing of 
trivially-correlated
 >         expressions in the future.
 >         Renamed sj_outer_expr_list to sj_outer_exprs.
 >
 >     modified:
 >       sql/item_subselect.cc
 >       sql/item_subselect.h
 >       sql/sql_lex.cc
 >       sql/sql_select.cc
 >       sql/sql_select.h
 >       sql/table.h
 > === modified file 'sql/item_subselect.cc'
 > --- a/sql/item_subselect.cc	2010-03-01 10:04:07 +0000
 > +++ b/sql/item_subselect.cc	2010-03-05 09:43:47 +0000
 > @@ -298,8 +298,8 @@ bool Item_subselect::exec()
 >  bool Item_in_subselect::exec()
 >  {
 >    DBUG_ENTER("Item_in_subselect::exec");
 > -  DBUG_ASSERT(exec_method != MATERIALIZATION ||
 > -              (exec_method == MATERIALIZATION &&
 > +  DBUG_ASSERT(exec_method != EXEC_MATERIALIZATION ||
 > +              (exec_method == EXEC_MATERIALIZATION &&
 >                 engine->engine_type() == 
subselect_engine::HASH_SJ_ENGINE));
 >    /*
 >      Initialize the cache of the left predicate operand. This has to 
be done as
 > @@ -313,7 +313,7 @@ bool Item_in_subselect::exec()
 >      - on a cost-based basis, that takes into account the cost of a cache
 >        lookup, the cache hit rate, and the savings per cache hit.
 >    */
 > -  if (!left_expr_cache && exec_method == MATERIALIZATION)
 > +  if (!left_expr_cache && exec_method == EXEC_MATERIALIZATION)
 >      init_left_expr_cache();
 >
 >    /* If the new left operand is already in the cache, reuse the old 
result. */
 > @@ -693,7 +693,9 @@ bool Item_singlerow_subselect::val_bool(
 >
 >
 >  Item_exists_subselect::Item_exists_subselect(st_select_lex *select_lex):
 > -  Item_subselect()
 > +  Item_subselect(), value(FALSE), exec_method(EXEC_UNSPECIFIED),
 > +     sj_convert_priority(0), emb_on_expr_nest(NULL), 
expr_join_nest(NULL),
 > +     types_allow_materialization(FALSE), sjm_scan_allowed(FALSE)

This initialization is duplicated between contructors.  Would it be
better to have an init function?  (I am not sure myself)

 >  {
 >    DBUG_ENTER("Item_exists_subselect::Item_exists_subselect");
 >    bool val_bool();
 > @@ -701,7 +703,6 @@ Item_exists_subselect::Item_exists_subse
 >    max_columns= UINT_MAX;
 >    null_value= 0; //can't be NULL
 >    maybe_null= 0; //can't be NULL
 > -  value= 0;
 >    DBUG_VOID_RETURN;
 >  }
 >
 > @@ -731,8 +732,7 @@ bool Item_in_subselect::test_limit(st_se
 >  Item_in_subselect::Item_in_subselect(Item * left_exp,
 >  				     st_select_lex *select_lex):
 >    Item_exists_subselect(), left_expr_cache(0), first_execution(TRUE),
 > -  optimizer(0), pushed_cond_guards(NULL), exec_method(NOT_TRANSFORMED),
 > -  upper_item(0)
 > +  optimizer(0), pushed_cond_guards(NULL), upper_item(0)
 >  {
 >    DBUG_ENTER("Item_in_subselect::Item_in_subselect");
 >    left_expr= left_exp;
 > @@ -1141,7 +1141,7 @@ Item_in_subselect::single_value_transfor
 >      If this IN predicate can be computed via materialization, do not
 >      perform the IN -> EXISTS transformation.
 >    */
 > -  if (exec_method == MATERIALIZATION)
 > +  if (exec_method == EXEC_MATERIALIZATION)
 >      DBUG_RETURN(RES_OK);
 >
 >    /* Perform the IN=>EXISTS transformation. */
 > @@ -1420,7 +1420,7 @@ Item_in_subselect::row_value_transformer
 >      If this IN predicate can be computed via materialization, do not
 >      perform the IN -> EXISTS transformation.
 >    */
 > -  if (exec_method == MATERIALIZATION)
 > +  if (exec_method == EXEC_MATERIALIZATION)
 >      DBUG_RETURN(RES_OK);
 >
 >    /* Perform the IN=>EXISTS transformation. */
 > @@ -1736,8 +1736,8 @@ Item_in_subselect::select_in_like_transf
 >      If we didn't choose an execution method up to this point, we choose
 >      the IN=>EXISTS transformation.
 >    */
 > -  if (exec_method == NOT_TRANSFORMED)
 > -    exec_method= IN_TO_EXISTS;
 > +  if (exec_method == EXEC_UNSPECIFIED)
 > +    exec_method= EXEC_EXISTS;
 >    arena= thd->activate_stmt_arena_if_needed(&backup);
 >
 >    /*
 > @@ -1771,7 +1771,7 @@ err:
 >
 >  void Item_in_subselect::print(String *str, enum_query_type query_type)
 >  {
 > -  if (exec_method == IN_TO_EXISTS)
 > +  if (exec_method == EXEC_EXISTS)
 >      str->append(STRING_WITH_LEN("<exists>"));
 >    else
 >    {
 > @@ -1786,7 +1786,7 @@ bool Item_in_subselect::fix_fields(THD *
 >  {
 >    bool result = 0;
 >
 > -  if (exec_method == SEMI_JOIN)
 > +  if (exec_method == EXEC_SEMI_JOIN)
 >      return !( (*ref)= new Item_int(1));
 >
 >    if (thd_arg->lex->view_prepare_mode && left_expr &&
!left_expr->fixed)
 > @@ -1852,7 +1852,7 @@ bool Item_in_subselect::setup_engine()
 >        */
 >        delete new_engine;
 >        new_engine= NULL;
 > -      exec_method= NOT_TRANSFORMED;
 > +      exec_method= EXEC_UNSPECIFIED;
 >        if (left_expr->cols() == 1)
 >          trans_res= 
single_value_in_to_exists_transformer(old_engine->join,
 >                                                           &eq_creator);
 > @@ -1960,7 +1960,7 @@ bool Item_in_subselect::init_left_expr_c
 >
 >  bool Item_in_subselect::is_expensive_processor(uchar *arg)
 >  {
 > -  return exec_method == MATERIALIZATION;
 > +  return exec_method == EXEC_MATERIALIZATION;
 >  }
 >
 >
 > @@ -1968,7 +1968,7 @@ Item_subselect::trans_res
 >  Item_allany_subselect::select_transformer(JOIN *join)
 >  {
 >    DBUG_ENTER("Item_allany_subselect::select_transformer");
 > -  exec_method= IN_TO_EXISTS;
 > +  exec_method= EXEC_EXISTS;
 >    if (upper_item)
 >      upper_item->show= 1;
 >    DBUG_RETURN(select_in_like_transformer(join, func));
 > @@ -1977,7 +1977,7 @@ Item_allany_subselect::select_transforme
 >
 >  void Item_allany_subselect::print(String *str, enum_query_type 
query_type)
 >  {
 > -  if (exec_method == IN_TO_EXISTS)
 > +  if (exec_method == EXEC_EXISTS)
 >      str->append(STRING_WITH_LEN("<exists>"));
 >    else
 >    {
 >
 > === modified file 'sql/item_subselect.h'
 > --- a/sql/item_subselect.h	2010-01-13 06:34:01 +0000
 > +++ b/sql/item_subselect.h	2010-03-05 09:43:47 +0000
 > @@ -77,7 +77,10 @@ public:
 >    /* subquery is transformed */
 >    bool changed;

I suggest using doxygen syntax for all the declarations below.

 >
 > -  /* TRUE <=> The underlying SELECT is correlated w.r.t some 
ancestor select */
 > +  /*
 > +    TRUE <=> The underlying SELECT is correlated w.r.t some ancestor 
select.
 > +    Note that this field is not maintained in semijoin-transformed 
subqueries.
 > +   */
 >    bool is_correlated;
 >
 >    enum trans_res {RES_OK, RES_REDUCE, RES_ERROR};
 > @@ -237,9 +240,56 @@ protected:
 >    bool value; /* value of this item (boolean: exists/not-exists) */
 >
 >  public:
 > +  /*
 > +    The method chosen to execute the predicate, currently used for 
IN, =ANY
 > +    and EXISTS predicates.
 > +  */
 > +  enum enum_exec_method {
 > +    EXEC_UNSPECIFIED, /* No execution method specified yet. */
 > +    EXEC_SEMI_JOIN,   /* Predicate is converted to semi-join nest. */
 > +    EXEC_EXISTS,      /* IN was converted to correlated EXISTS. */
 > +    EXEC_MATERIALIZATION /* Predicate executed via subquery 
materialization. */
 > +  };
 > +  enum_exec_method exec_method;
 > +  /* Priority of this predicate in the convert-to-semi-join-nest 
process. */
 > +  int sj_convert_priority;
 > +  /*
 > +    Used by subquery optimizations to keep track about in which 
clause this
 > +    subquery predicate is located:
 > +      (TABLE_LIST*) 1   - the predicate is an AND-part of the WHERE
 > +      join nest pointer - the predicate is an AND-part of ON expression
 > +                          of a join nest
 > +      NULL              - for all other locations
 > +    See also THD::emb_on_expr_nest.
 > +  */
 > +  TABLE_LIST *emb_on_expr_nest;
 > +  /*
 > +    Location of the subquery predicate. It is either
 > +     - pointer to join nest if the subquery predicate is in the ON 
expression
 > +     - (TABLE_LIST*)1 if the predicate is in the WHERE.
 > +  */
 > +  TABLE_LIST *expr_join_nest;

Description of the above two pointers are almost identical. Do you
know what is the difference?  Why are both needed?  Would be good to
have a comment about that.

 > +  /*
 > +    Types of left_expr and subquery's select list allow to perform 
subquery
 > +    materialization. Currently, we set this to FALSE when it as well 
could
 > +    be TRUE. This is to be properly addressed with fix for BUG#36752.
 > +  */
 > +  bool types_allow_materialization;

Is this variable really needed anymore?  It is only checked once, why
not call semijoin_types_allow_materialization() at that point instead?


 > +  /* Same as above, but they also allow to scan the materialized 
table. */

I suggest replacing "they" with something more specific here.

 > +  bool sjm_scan_allowed;
 > +
 >    Item_exists_subselect(st_select_lex *select_lex);
 > -  Item_exists_subselect(): Item_subselect() {}
 > +  Item_exists_subselect()
 > +    :Item_subselect(), value(FALSE), exec_method(EXEC_UNSPECIFIED),
 > +     sj_convert_priority(0), emb_on_expr_nest(NULL), 
expr_join_nest(NULL),
 > +     types_allow_materialization(FALSE), sjm_scan_allowed(FALSE)
 > +  {}
 >
 > +  virtual trans_res select_transformer(JOIN *join)
 > +  {
 > +    exec_method= EXEC_EXISTS;
 > +    return RES_OK;
 > +  }
 >    subs_type substype() { return EXISTS_SUBS; }
 >    void reset()
 >    {
 > @@ -299,45 +349,6 @@ protected:
 >  public:
 >    /* Used to trigger on/off conditions that were pushed down to 
subselect */
 >    bool *pushed_cond_guards;
 > -
 > -  /* Priority of this predicate in the convert-to-semi-join-nest 
process. */
 > -  int sj_convert_priority;
 > -  /*
 > -    Used by subquery optimizations to keep track about in which 
clause this
 > -    subquery predicate is located:
 > -      (TABLE_LIST*) 1   - the predicate is an AND-part of the WHERE
 > -      join nest pointer - the predicate is an AND-part of ON expression
 > -                          of a join nest
 > -      NULL              - for all other locations
 > -    See also THD::emb_on_expr_nest.
 > -  */
 > -  TABLE_LIST *emb_on_expr_nest;
 > -  /*
 > -    Location of the subquery predicate. It is either
 > -     - pointer to join nest if the subquery predicate is in the ON 
expression
 > -     - (TABLE_LIST*)1 if the predicate is in the WHERE.
 > -  */
 > -  TABLE_LIST *expr_join_nest;
 > -  /*
 > -    Types of left_expr and subquery's select list allow to perform 
subquery
 > -    materialization. Currently, we set this to FALSE when it as well 
could
 > -    be TRUE. This is to be properly addressed with fix for BUG#36752.
 > -  */
 > -  bool types_allow_materialization;
 > -
 > -  /*
 > -    Same as above, but they also allow to scan the materialized table.
 > -  */
 > -  bool sjm_scan_allowed;
 > -
 > -  /* The method chosen to execute the IN predicate.  */
 > -  enum enum_exec_method {
 > -    NOT_TRANSFORMED, /* No execution method was chosen for this IN. */
 > -    SEMI_JOIN,   /* IN was converted to semi-join nest and should be 
removed. */
 > -    IN_TO_EXISTS, /* IN was converted to correlated EXISTS. */
 > -    MATERIALIZATION /* IN will be executed via subquery 
materialization. */
 > -  };
 > -  enum_exec_method exec_method;
 >
 >    bool *get_cond_guard(int i)
 >    {
 > @@ -355,8 +366,7 @@ public:
 >    Item_in_subselect(Item * left_expr, st_select_lex *select_lex);
 >    Item_in_subselect()
 >      :Item_exists_subselect(), left_expr_cache(0), first_execution(TRUE),
 > -    optimizer(0), abort_on_null(0), pushed_cond_guards(NULL),
 > -    exec_method(NOT_TRANSFORMED), upper_item(0)
 > +    optimizer(0), abort_on_null(0), pushed_cond_guards(NULL), 
upper_item(0)
 >    {}
 >    void cleanup();
 >    subs_type substype() { return IN_SUBS; }
 >
 > === modified file 'sql/sql_lex.cc'
 > --- a/sql/sql_lex.cc	2010-02-25 13:55:28 +0000
 > +++ b/sql/sql_lex.cc	2010-03-05 09:43:47 +0000
 > @@ -2168,7 +2168,7 @@ void st_select_lex::print_limit(THD *thd
 >                    */
 >                    (((subs_type == Item_subselect::IN_SUBS) &&
 >                      ((Item_in_subselect*)item)->exec_method ==
 > -                    Item_in_subselect::MATERIALIZATION) ?
 > +                    Item_in_subselect::EXEC_MATERIALIZATION) ?
 >                     TRUE :
 >                     (select_limit->val_int() == 1LL) &&
 >                     offset_limit == 0));
 >
 > === modified file 'sql/sql_select.cc'
 > --- a/sql/sql_select.cc	2010-03-01 10:04:07 +0000
 > +++ b/sql/sql_select.cc	2010-03-05 09:43:47 +0000
 > @@ -256,7 +256,13 @@ static uint make_join_orderinfo(JOIN *jo
 >  static int
 >  join_read_record_no_init(JOIN_TAB *tab);
 >  static
 > -bool subquery_types_allow_materialization(Item_in_subselect *in_subs);
 > +bool subquery_types_allow_materialization(Item_in_subselect *predicate);
 > +static
 > +void semijoin_types_allow_materialization(TABLE_LIST *sj_nest);
 > +static
 > +bool types_allow_materialization(Item *outer, Item *inner);
 > +static
 > +bool resolve_subquery(THD *thd, JOIN *join);
 >  int do_sj_reset(SJ_TMP_TABLE *sj_tbl);
 >  TABLE *create_duplicate_weedout_tmp_table(THD *thd, uint 
uniq_tuple_length_arg,
 >                                            SJ_TMP_TABLE *sjtbl);
 > @@ -575,171 +581,11 @@ JOIN::prepare(Item ***rref_pointer_array
 >      thd->lex->allow_sum_func= save_allow_sum_func;
 >    }
 >
 > -  /*
 > -    If
 > -      1) this join is inside a subquery (of any type except FROM-clause
 > -         subquery) and
 > -      2) we aren't just normalizing a VIEW
 > -      2) we aren't in a DESCRIBE
 > -
 > -    Then perform early unconditional subquery transformations:
 > -     - Convert subquery predicate into semi-join, or
 > -     - Mark the subquery for execution using materialization, or
 > -     - Perform IN->EXISTS transformation, or
 > -     - Perform more/less ALL/ANY -> MIN/MAX rewrite
 > -     - Substitute trivial scalar-context subquery with its value
 > -
 > -    TODO: for PS, make the whole block execute only on the first 
execution
 > -  */
 > -  Item_subselect *subselect;
 > -  if (!thd->lex->view_prepare_mode &&                    // (1)
 > -      (subselect= select_lex->master_unit()->item) &&    // (2)
 > -      !(select_options & SELECT_DESCRIBE))
 > +  if (select_lex->master_unit()->item)
 >    {
 > -    Item_in_subselect *in_subs= NULL;
 > -    if (subselect->substype() == Item_subselect::IN_SUBS)
 > -      in_subs= (Item_in_subselect*)subselect;
 > -
 > -    /* Resolve expressions and perform semantic analysis for IN query */
 > -    if (in_subs != NULL)
 > -      /*
 > -        TODO: Add the condition below to this if statement when we 
have proper
 > -        support for is_correlated handling for materialized semijoins.
 > -        If we were to add this condition now, the fix_fields() call in
 > -        convert_subq_to_sj() would force the flag is_correlated to 
be set
 > -        erroneously for prepared queries.
 > -
 > -        thd->stmt_arena->state != Query_arena::PREPARED)
 > -      */
 > -    {
 > -      /*
 > -        Check if the left and right expressions have the same # of
 > -        columns, i.e. we don't have a case like
 > -          (oe1, oe2) IN (SELECT ie1, ie2, ie3 ...)
 > -
 > -        TODO why do we have this duplicated in IN->EXISTS transformers?
 > -        psergey-todo: fix these: grep for 
duplicated_subselect_card_check
 > -      */
 > -      if (select_lex->item_list.elements != in_subs->left_expr->cols())
 > -      {
 > -        my_error(ER_OPERAND_COLUMNS, MYF(0), 
in_subs->left_expr->cols());
 > -        DBUG_RETURN(-1);
 > -      }
 > -
 > -      SELECT_LEX *current= thd->lex->current_select;
 > -      thd->lex->current_select= current->return_after_parsing();
 > -      char const *save_where= thd->where;
 > -      thd->where= "IN/ALL/ANY subquery";
 > -
 > -      bool failure= !in_subs->left_expr->fixed &&
 > -                     in_subs->left_expr->fix_fields(thd, 
&in_subs->left_expr);
 > -      thd->lex->current_select= current;
 > -      thd->where= save_where;
 > -      if (failure)
 > -        DBUG_RETURN(-1); /* purecov: deadcode */
 > -    }
 > -    DBUG_PRINT("info", ("Checking if subq can be converted to 
semi-join"));
 > -    /*
 > -      Check if we're in subquery that is a candidate for flattening 
into a
 > -      semi-join (which is done in flatten_subqueries()). The
 > -      requirements are:
 > -        1. Subquery predicate is an IN/=ANY subq predicate
 > -        2. Subquery is a single SELECT (not a UNION)
 > -        3. Subquery does not have GROUP BY or ORDER BY
 > -        4. Subquery does not use aggregate functions or HAVING
 > -        5. Subquery predicate is at the AND-top-level of ON/WHERE clause
 > -        6. We are not in a subquery of a single table UPDATE/DELETE 
that
 > -             doesn't have a JOIN (TODO: We should handle this at some
 > -             point by switching to multi-table UPDATE/DELETE)
 > -        7. We're not in a confluent table-less subquery, like 
"SELECT 1".
 > -        8. No execution method was already chosen (by a prepared 
statement)
 > -        9. Parent select is not a confluent table-less select
 > -        10. Neither parent nor child select have STRAIGHT_JOIN option.
 > -    */
 > -    if (thd->optimizer_switch_flag(OPTIMIZER_SWITCH_SEMIJOIN) &&
 > -        in_subs && 
   // 1
 > -        !select_lex->is_part_of_union() && 
   // 2
 > -        !select_lex->group_list.elements && !order && 
   // 3
 > -        !having && !select_lex->with_sum_func && 
   // 4
 > -        thd->thd_marker.emb_on_expr_nest && 
   // 5
 > -        select_lex->outer_select()->join && 
   // 6
 > -        select_lex->master_unit()->first_select()->leaf_tables && 
   // 7
 > -        in_subs->exec_method == Item_in_subselect::NOT_TRANSFORMED 
&& // 8
 > -        select_lex->outer_select()->leaf_tables && 
   // 9
 > -        !((select_options | 
select_lex->outer_select()->join->select_options)
 > -          & SELECT_STRAIGHT_JOIN)) 
   // 10
 > -    {
 > -      DBUG_PRINT("info", ("Subquery is semi-join conversion 
candidate"));
 > -
 > -      (void)subquery_types_allow_materialization(in_subs);
 > -
 > -      in_subs->emb_on_expr_nest= thd->thd_marker.emb_on_expr_nest;
 > -
 > -      /* Register the subquery for further processing in 
flatten_subqueries() */
 > -      select_lex->
 > -        outer_select()->join->sj_subselects.append(thd->mem_root, 
in_subs);
 > -      in_subs->expr_join_nest= thd->thd_marker.emb_on_expr_nest;
 > -    }
 > -    else
 > -    {
 > -      DBUG_PRINT("info", ("Subquery can't be converted to semi-join"));
 > -      /*
 > -        Check if the subquery predicate can be executed via 
materialization.
 > -        The required conditions are:
 > -        1. Subquery predicate is an IN/=ANY subq predicate
 > -        2. Subquery is a single SELECT (not a UNION)
 > -        3. Subquery is not a table-less query. In this case there is no
 > -           point in materializing.
 > -          3A The upper query is not a confluent SELECT ... FROM DUAL. We
 > -             can't do materialization for SELECT .. FROM DUAL because it
 > -             does not call setup_subquery_materialization(). We 
could make
 > -             SELECT ... FROM DUAL call that function but that 
doesn't seem
 > -             to be the case that is worth handling.
 > -        4. Subquery predicate is a top-level predicate
 > -           (this implies it is not negated)
 > -           TODO: this is a limitation that should be lifted once we
 > -           implement correct NULL semantics (WL#3830)
 > -        5. Subquery is non-correlated
 > -           TODO:
 > -           This is an overly restrictive condition. It can be 
extended to:
 > -           (Subquery is non-correlated ||
 > -            Subquery is correlated to any query outer to IN predicate ||
 > -            (Subquery is correlated to the immediate outer query &&
 > -             Subquery !contains {GROUP BY, ORDER BY [LIMIT],
 > -             aggregate functions}) && subquery predicate is not 
under "NOT IN"))
 > -        6. No execution method was already chosen (by a prepared 
statement).
 > -
 > -        (*) The subquery must be part of a SELECT statement. The current
 > -             condition also excludes multi-table update statements.
 > -
 > -        We have to determine whether we will perform subquery 
materialization
 > -        before calling the IN=>EXISTS transformation, so that we 
know whether to
 > -        perform the whole transformation or only that part of it 
which wraps
 > -        Item_in_subselect in an Item_in_optimizer.
 > -      */
 > -      if 
(thd->optimizer_switch_flag(OPTIMIZER_SWITCH_MATERIALIZATION) &&
 > -          in_subs  && 
     // 1
 > -          !select_lex->is_part_of_union() && 
     // 2
 > -          select_lex->master_unit()->first_select()->leaf_tables
&& 
     // 3
 > -          thd->lex->sql_command == SQLCOM_SELECT && 
     // *
 > -          select_lex->outer_select()->leaf_tables && 
     // 3A
 > -          subquery_types_allow_materialization(in_subs))
 > -      {
 > -        // psergey-todo: duplicated_subselect_card_check: where it's 
done?
 > -        if (in_subs->is_top_level_item() && 
     // 4
 > -            !in_subs->is_correlated && 
     // 5
 > -            in_subs->exec_method == 
Item_in_subselect::NOT_TRANSFORMED) // 6
 > -          in_subs->exec_method= Item_in_subselect::MATERIALIZATION;
 > -      }
 > -
 > -      Item_subselect::trans_res trans_res;
 > -      if ((trans_res= subselect->select_transformer(this)) !=
 > -          Item_subselect::RES_OK)
 > -      {
 > -        select_lex->fix_prepare_information(thd, &conds, &having);
 > -        DBUG_RETURN((trans_res == Item_subselect::RES_ERROR));
 > -      }
 > -    }
 > +    /* Join object is a subquery within an IN/ANY/ALL/EXISTS 
predicate */
 > +    if (resolve_subquery(thd, this))
 > +      DBUG_RETURN(-1);
 >    }
 >
 >    select_lex->fix_prepare_information(thd, &conds, &having);
 > @@ -903,9 +749,189 @@ err:
 >
 >
 >  /**
 > -  @brief Check if subquery's compared types allow materialization.
 > +  @brief Resolve predicate involving subquery
 > +
 > +  @param thd     Pointer to THD.
 > +  @param join    Join that is part of a subquery predicate.
 > +
 > +  @retval FALSE  Success.
 > +  @retval TRUE   Error.
 > +
 > +  @details
 > +  Perform early unconditional subquery transformations:
 > +   - Convert subquery predicate into semi-join, or
 > +   - Mark the subquery for execution using materialization, or
 > +   - Perform IN->EXISTS transformation, or
 > +   - Perform more/less ALL/ANY -> MIN/MAX rewrite
 > +   - Substitute trivial scalar-context subquery with its value
 > +
 > +  TODO: for PS, make the whole block execute only on the first execution
 > +
 > +*/
 > +
 > +static
 > +bool resolve_subquery(THD *thd, JOIN *join)
 > +
 > +{
 > +  DBUG_ENTER("resolve_subquery");
 > +
 > +  SELECT_LEX *select_lex= join->select_lex;
 > +
 > +  /* Return if we are just normalizing a view */
 > +  if (thd->lex->view_prepare_mode)
 > +    DBUG_RETURN(FALSE);
 > +  /* Return if the select lex is tagged as within a DESCRIBE */
 > +  if (join->select_options & SELECT_DESCRIBE)
 > +    DBUG_RETURN(FALSE);
 > +
 > +  Item_subselect *subq_predicate= select_lex->master_unit()->item;
 > +  DBUG_ASSERT(subq_predicate);
 > +
 > +  /* in_exists_predicate is non-NULL for IN, =ANY and EXISTS 
predicates */
 > +  Item_exists_subselect *in_exists_predicate=
 > +    (subq_predicate->substype() == Item_subselect::IN_SUBS ||
 > +     subq_predicate->substype() == Item_subselect::EXISTS_SUBS) ?
 > +        (Item_exists_subselect*)subq_predicate :
 > +        NULL;
 > +
 > +  /* in_predicate is non-NULL for IN and =ANY predicates */
 > +  Item_in_subselect *in_predicate=
 > +    (subq_predicate->substype() == Item_subselect::IN_SUBS) ?
 > +        (Item_in_subselect *)subq_predicate :
 > +        NULL;

I think having to pointers that point to the same is a source of
confusion, and especially when you use both in the same if-statement.
It seems to me that in_predicate is strictly not necessary, you can
use in_predicate where applicable and use the test
(subq_predicate->substype() == Item_subselect::IN_SUBS) when a
distinction is needed.

In general, I do not think infering state from a null-pointer is a
very good practice.

 > +
 > +  if (in_predicate != NULL)

I suggest replacing the above with

if (subq_predicate->substype() == Item_subselect::IN_SUBS)

 > +  {
 > +    /*
 > +      Check if the left and right expressions have the same # of
 > +      columns, i.e. we don't have a case like
 > +        (oe1, oe2) IN (SELECT ie1, ie2, ie3 ...)
 > +
 > +      TODO why do we have this duplicated in IN->EXISTS transformers?
 > +      psergey-todo: fix these: grep for duplicated_subselect_card_check
 > +    */
 > +    if (select_lex->item_list.elements != 
in_predicate->left_expr->cols())
 > +    {
 > +      my_error(ER_OPERAND_COLUMNS, MYF(0), 
in_predicate->left_expr->cols());
 > +      DBUG_RETURN(TRUE);
 > +    }
 > +
 > +    SELECT_LEX *current= thd->lex->current_select;
 > +    thd->lex->current_select= current->return_after_parsing();
 > +    char const *save_where= thd->where;
 > +    thd->where= "IN/ALL/ANY subquery";
 > +
 > +    bool result= !in_predicate->left_expr->fixed &&
 > +                  in_predicate->left_expr->fix_fields(thd,
 > + 
&in_predicate->left_expr);
 > +    thd->lex->current_select= current;
 > +    thd->where= save_where;
 > +    if (result)
 > +      DBUG_RETURN(TRUE); /* purecov: deadcode */
 > +  }
 > +
 > +  DBUG_PRINT("info", ("Checking if subq can be converted to 
semi-join"));
 > +  /*
 > +    Check if we're in subquery that is a candidate for flattening into a
 > +    semi-join (which is done in flatten_subqueries()). The 
requirements are:
 > +      1. Subquery predicate is an IN/=ANY subquery predicate
 > +      2. Subquery is a single SELECT (not a UNION)
 > +      3. Subquery does not have GROUP BY or ORDER BY
 > +      4. Subquery does not use aggregate functions or HAVING
 > +      5. Subquery predicate is at the AND-top-level of ON/WHERE clause
 > +      6. We are not in a subquery of a single table UPDATE/DELETE that
 > +           doesn't have a JOIN (TODO: We should handle this at some
 > +           point by switching to multi-table UPDATE/DELETE)
 > +      7. We're not in a confluent table-less subquery, like "SELECT 1".
 > +      8. No execution method was already chosen (by a prepared 
statement)
 > +      9. Parent select is not a confluent table-less select
 > +      10. Neither parent nor child select have STRAIGHT_JOIN option.
 > +  */
 > +  if (thd->optimizer_switch_flag(OPTIMIZER_SWITCH_SEMIJOIN) &&
 > +      in_predicate && 
     // 1
 > +      !select_lex->is_part_of_union() && 
     // 2
 > +      !select_lex->group_list.elements && !join->order && 
     // 3
 > +      !join->having && !select_lex->with_sum_func && 
     // 4
 > +      thd->thd_marker.emb_on_expr_nest && 
     // 5
 > +      select_lex->outer_select()->join && 
     // 6
 > +      select_lex->master_unit()->first_select()->leaf_tables && 
     // 7
 > +      in_exists_predicate->exec_method ==
 > +                           Item_exists_subselect::EXEC_UNSPECIFIED 
&&   // 8

Test #8 relies on the assumption that in_exists_predicate is non-null
when in_predicate is non-null.  I do not like this. Why cannot Test#1
test on in_exists_predicate?  I assume this in the future is supposed
to apply to EXISTS-queries, too.

 > +      select_lex->outer_select()->leaf_tables && 
     // 9
 > +      !((join->select_options |
 > +         select_lex->outer_select()->join->select_options)
 > +        & SELECT_STRAIGHT_JOIN)) 
     // 10
 > +  {
 > +    DBUG_PRINT("info", ("Subquery is semi-join conversion candidate"));
 > +
 > +    in_exists_predicate->emb_on_expr_nest= 
thd->thd_marker.emb_on_expr_nest;
 > +    in_exists_predicate->expr_join_nest= 
thd->thd_marker.emb_on_expr_nest;
 > +
 > +    /* Register the subquery for further processing in 
flatten_subqueries() */
 > +    select_lex->outer_select()->join->
 > +      sj_subselects.append(thd->mem_root, in_exists_predicate);
 > +  }
 > +  else
 > +  {
 > +    DBUG_PRINT("info", ("Subquery can't be converted to semi-join"));
 > +    /*
 > +      Check if the subquery predicate can be executed via 
materialization.
 > +      The required conditions are:
 > +      1. Subquery predicate is an IN/=ANY subquery predicate
 > +      2. Subquery is a single SELECT (not a UNION)
 > +      3. Subquery is not a table-less query. In this case there is no
 > +         point in materializing.
 > +        3A The upper query is not a confluent SELECT ... FROM DUAL. We
 > +           can't do materialization for SELECT .. FROM DUAL because it
 > +           does not call setup_subquery_materialization(). We could 
make
 > +           SELECT ... FROM DUAL call that function but that doesn't seem
 > +           to be the case that is worth handling.
 > +      4. Subquery predicate is a top-level predicate
 > +         (this implies it is not negated)
 > +         TODO: this is a limitation that should be lifted once we
 > +         implement correct NULL semantics (WL#3830)
 > +      5. Subquery is non-correlated
 > +         TODO:
 > +         This is an overly restrictive condition. It can be extended to:
 > +         (Subquery is non-correlated ||
 > +          Subquery is correlated to any query outer to IN predicate ||
 > +          (Subquery is correlated to the immediate outer query &&
 > +           Subquery !contains {GROUP BY, ORDER BY [LIMIT],
 > +           aggregate functions}) && subquery predicate is not under 
"NOT IN"))
 > +      6. No execution method was already chosen (by a prepared 
statement).
 > +      7. Involved expression types allow materialization (temporary 
only)
 > +
 > +      (*) The subquery must be part of a SELECT statement. The current
 > +           condition also excludes multi-table update statements.
 > +
 > +      We have to determine whether we will perform subquery 
materialization
 > +      before calling the IN=>EXISTS transformation, so that we know 
whether to
 > +      perform the whole transformation or only that part of it which 
wraps
 > +      Item_in_subselect in an Item_in_optimizer.
 > +    */
 > +    if (thd->optimizer_switch_flag(OPTIMIZER_SWITCH_MATERIALIZATION) 
  &&
 > +        in_predicate && 
     // 1
 > +        !select_lex->is_part_of_union() && 
     // 2
 > +        select_lex->master_unit()->first_select()->leaf_tables && 
     // 3
 > +        thd->lex->sql_command == SQLCOM_SELECT && 
     // *
 > +        select_lex->outer_select()->leaf_tables && 
     // 3A
 > +        in_predicate->is_top_level_item() && 
     // 4
 > +        !in_predicate->is_correlated && 
     // 5
 > +        in_exists_predicate->exec_method ==
 > +                      Item_exists_subselect::EXEC_UNSPECIFIED && 
     // 6
 > +        subquery_types_allow_materialization(in_predicate)) 
     // 7
 > +      in_predicate->exec_method= 
Item_exists_subselect::EXEC_MATERIALIZATION;
 > +
 > +    if (subq_predicate->select_transformer(join) == 
Item_subselect::RES_ERROR)
 > +      DBUG_RETURN(TRUE);
 > +  }
 > +  DBUG_RETURN(FALSE);
 > +}
 > +
 > +/**
 > +  @brief Check if subquery predicate's compared types allow 
materialization.
 >
 > -  @param in_subs Subquery predicate, updated as follows:
 > +  @param predicate[in,out] subquery predicate, updated as follows:
 >      types_allow_materialization TRUE if subquery materialization is 
allowed.
 >      sjm_scan_allowed            If types_allow_materialization is TRUE,
 >                                  indicates whether it is possible to 
use subquery
 > @@ -953,49 +979,103 @@ err:
 >  */
 >
 >  static
 > -bool subquery_types_allow_materialization(Item_in_subselect *in_subs)
 > +bool subquery_types_allow_materialization(Item_in_subselect *predicate)

Almost all data accessed in this method is members of
Item_in_subselect.  This indicates that this function should really be
aw Item_in_subselect member function.


 >  {
 >    DBUG_ENTER("subquery_types_allow_materialization");
 >
 > -  DBUG_ASSERT(in_subs->left_expr->fixed);
 > -
 > -  List_iterator<Item> it(in_subs->unit->first_select()->item_list);
 > -  uint elements= in_subs->unit->first_select()->item_list.elements;
 > +  DBUG_ASSERT(predicate->left_expr->fixed);
 >
 > -  in_subs->types_allow_materialization= FALSE;  // Assign default values
 > -  in_subs->sjm_scan_allowed= FALSE;
 > +  List_iterator<Item>
it(predicate->unit->first_select()->item_list);
 > +  uint elements= predicate->unit->first_select()->item_list.elements;
 >
 >    bool all_are_fields= TRUE;
 >    for (uint i= 0; i < elements; i++)
 >    {
 > -    Item *outer= in_subs->left_expr->element_index(i);
 > +    Item *outer= predicate->left_expr->element_index(i);
 >      Item *inner= it++;

Why not have iterators for both as in
semijoin_types_allow_materialization()?

 > -    all_are_fields &= (outer->real_item()->type() == 
Item::FIELD_ITEM &&
 > -                       inner->real_item()->type() == Item::FIELD_ITEM);
 > -    if (outer->result_type() != inner->result_type())
 > +    all_are_fields&= (outer->real_item()->type() == Item::FIELD_ITEM
&&
 > +                      inner->real_item()->type() == Item::FIELD_ITEM);
 > +    if (!types_allow_materialization(outer, inner))
 >        DBUG_RETURN(FALSE);
 > -    switch (outer->result_type()) {
 > -    case STRING_RESULT:
 > -      if (outer->is_datetime() != inner->is_datetime())
 > -        DBUG_RETURN(FALSE);
 > -
 > -      if (!(outer->collation.collation == inner->collation.collation
 > -          /*&& outer->max_length <= inner->max_length */))
 > -        DBUG_RETURN(FALSE);
 > -    /*case INT_RESULT:
 > -      if (!(outer->unsigned_flag ^ inner->unsigned_flag))
 > -        DBUG_RETURN(FALSE); */
 > -    default:
 > -      ;/* suitable for materialization */
 > -    }
 >    }
 > -  in_subs->types_allow_materialization= TRUE;
 > -  in_subs->sjm_scan_allowed= all_are_fields;
 > +  predicate->types_allow_materialization= TRUE;
 > +  predicate->sjm_scan_allowed= all_are_fields;
 >    DBUG_PRINT("info",("subquery_types_allow_materialization: ok, 
allowed"));
 >    DBUG_RETURN(TRUE);
 >  }
 >
 >
 > +/**
 > +  @brief Check if semijoin's compared types allow materialization.
 > +
 > +  @param sj_nest   Semi-join nest containing information about 
correlated
 > +                   expressions.
 > +                   Subquery predicate attached to sj_nest is updated 
with
 > +                   information about whether materialization is allowed.
 > +
 > +  See subquery_types_allow_materialization() for additional 
documentation
 > +*/
 > +
 > +static
 > +void semijoin_types_allow_materialization(TABLE_LIST *sj_nest)
 > +{
 > +  DBUG_ENTER("semijoin_types_allow_materialization");
 > +
 > +  List_iterator<Item> it1(sj_nest->nested_join->sj_outer_exprs);
 > +  List_iterator<Item> it2(sj_nest->nested_join->sj_inner_exprs);
 > +
 > +  Item_exists_subselect *predicate= sj_nest->sj_subq_pred;
 > +  uint elements= sj_nest->nested_join->sj_outer_exprs.elements;
 > +
 > +  bool all_are_fields= TRUE;
 > +  for (uint i= 0; i < elements; i++)

Why do you need to find the #elements and use that for iteration?  It
seems better to me to just iterate until you reach the end of the
list.  The way it is now, there is no check that outer or inner does
not become empty before the loop is terminated.  At least, an assert
should be added.

 > +  {
 > +    Item *outer= it1++;
 > +    Item *inner= it2++;
 > +    all_are_fields &= (outer->real_item()->type() == 
Item::FIELD_ITEM &&
 > +                       inner->real_item()->type() == Item::FIELD_ITEM);
 > +    if (!types_allow_materialization(outer, inner))
 > +      DBUG_VOID_RETURN;
 > +  }
 > +  predicate->types_allow_materialization= TRUE;
 > +  predicate->sjm_scan_allowed= all_are_fields;
 > +  DBUG_PRINT("info",("semijoin_types_allow_materialization: ok, 
allowed"));
 > +  DBUG_VOID_RETURN;
 > +}

I am concerned about the duplication of logic between
subquery_types_allow_materialization() and
semijoin_types_allow_materialization().  Why not make
types_allow_materialization() take two lists or iterators as
parameters and do the iteration in this common method?

 > +
 > +
 > +/**
 > +  @brief Check if two items are compatible wrt. materialization.
 > +
 > +  @param outer Expression from outer query
 > +  @param inner Expression from inner query
 > +
 > +  @retval TRUE   If subquery types allow materialization.
 > +  @retval FALSE  Otherwise.
 > +*/
 > +
 > +static bool types_allow_materialization(Item *outer, Item *inner)
 > +
 > +{
 > +  if (outer->result_type() != inner->result_type())
 > +    return FALSE;
 > +  switch (outer->result_type()) {
 > +  case STRING_RESULT:
 > +    if (outer->is_datetime() != inner->is_datetime())
 > +      return FALSE;
 > +    if (!(outer->collation.collation == inner->collation.collation
 > +        /*&& outer->max_length <= inner->max_length */))
 > +      return FALSE;
 > +  /*case INT_RESULT:
 > +    if (!(outer->unsigned_flag ^ inner->unsigned_flag))
 > +      return FALSE; */

Do you know if there is any good reason for keeping this inactive
code?

 > +  default:
 > +    ;                 /* suitable for materialization */
 > +  }
 > +  return TRUE;
 > +}
 > +
 > +
 >  /*
 >    Remove the predicates pushed down into the subquery
 >
 > @@ -3205,8 +3285,8 @@ err:
 >  }
 >
 >
 > -int subq_sj_candidate_cmp(Item_in_subselect* const *el1,
 > -                          Item_in_subselect* const *el2)
 > +int subq_sj_candidate_cmp(Item_exists_subselect* const *el1,
 > +                          Item_exists_subselect* const *el2)
 >  {
 >    return ((*el1)->sj_convert_priority < (*el2)->sj_convert_priority) 
? 1 :
 >           ( ((*el1)->sj_convert_priority == 
(*el2)->sj_convert_priority)? 0 : -1);
 > @@ -3245,42 +3325,90 @@ void fix_list_after_tbl_changes(SELECT_L
 >  }
 >
 >
 > -/*
 > +/**
 >    Convert a subquery predicate into a TABLE_LIST semi-join nest
 >
 > -  SYNOPSIS
 > -    convert_subq_to_sj()
 > -       parent_join  Parent join, the one that has subq_pred in its 
WHERE/ON
 > -                    clause
 > -       subq_pred    Subquery predicate to be converted
 > -
 > -  DESCRIPTION
 > -    Convert a subquery predicate into a TABLE_LIST semi-join nest. 
All the
 > -    prerequisites are already checked, so the conversion is always 
successfull.
 > +  @param parent_join Parent join, which has subq_pred in its 
WHERE/ON clause.
 > +  @param subq_pred   Subquery predicate to be converted.
 > +                     This is either an IN, =ANY or EXISTS predicate.
 >
 > -    Prepared Statements: the transformation is permanent:
 > -     - Changes in TABLE_LIST structures are naturally permanent
 > -     - Item tree changes are performed on statement MEM_ROOT:
 > -        = we activate statement MEM_ROOT
 > -        = this function is called before the first 
fix_prepare_information
 > -          call.
 > +  @retval FALSE OK
 > +  @retval TRUE  Error
 >
 > -    This is intended because the criteria for subquery-to-sj 
conversion remain
 > -    constant for the lifetime of the Prepared Statement.
 > +  @details
 > +  Convert a subquery predicate into a TABLE_LIST semi-join nest. All 
the
 > +  prerequisites are already checked, so the conversion is always 
successfull.

No reason to repeat the summary in @details. Typo: "successfull"

 >
 > -  RETURN
 > -    FALSE  OK
 > -    TRUE   Out of memory error
 > +  The following transformations are performed:
 > +
 > +  1. IN/=ANY predicates on the form:
 > +
 > +  SELECT ...
 > +  FROM ot1 ... otN
 > +  WHERE (oe1, ... oeM) IN (SELECT ie1, ..., ieM)
 > +                           FROM it1 ... itK
 > +                          [WHERE inner-cond])
 > +   [AND outer-cond]
 > +  [GROUP BY ...] [HAVING ...] [ORDER BY ...]
 > +
 > +  are transformed into:
 > +
 > +  SELECT ...
 > +  FROM (ot1 ... otN) SJ (it1 ... itK)
 > +                     ON (oe1, ... oeM) = (ie1, ..., ieM)
 > +                        [AND inner-cond]
 > +  [WHERE outer-cond]
 > +  [GROUP BY ...] [HAVING ...] [ORDER BY ...]
 > +
 > +  2. EXISTS predicates on the form (this is for FUTURE implementation):
 > +
 > +  SELECT ...
 > +  FROM ot1 ... otN
 > +  WHERE EXISTS (SELECT whatever
 > +                FROM it1 ... itK
 > +                [WHERE inner-cond])
 > +   [AND outer-cond]
 > +  [GROUP BY ...] [HAVING ...] [ORDER BY ...]
 > +
 > +  are transformed into:
 > +
 > +  SELECT ...
 > +  FROM (ot1 ... otN) SJ (it1 ... itK)
 > +                       ON inner-cond
 > +  [WHERE outer-cond]
 > +  [GROUP BY ...] [HAVING ...] [ORDER BY ...]
 > +
 > +  If inner-cond in the EXISTS subquery is empty, it is replaced with 
"1=1",
 > +  which is always TRUE.
 > +
 > +  Notice that the inner-cond may contain correlated and non-correlated
 > +  expressions. Further transformations will analyze and break up such
 > +  expressions.
 > +
 > +  Prepared Statements: the transformation is permanent:
 > +   - Changes in TABLE_LIST structures are naturally permanent
 > +   - Item tree changes are performed on statement MEM_ROOT:
 > +      = we activate statement MEM_ROOT
 > +      = this function is called before the first 
fix_prepare_information call.
 > +
 > +  This is intended because the criteria for subquery-to-sj 
conversion remain
 > +  constant for the lifetime of the Prepared Statement.
 >  */
 >
 > -bool convert_subq_to_sj(JOIN *parent_join, Item_in_subselect *subq_pred)
 > +bool convert_subquery_to_semijoin(JOIN *parent_join,
 > +                                  Item_exists_subselect *subq_pred)
 >  {
 >    SELECT_LEX *parent_lex= parent_join->select_lex;
 >    TABLE_LIST *emb_tbl_nest= NULL;
 >    List<TABLE_LIST> *emb_join_list= &parent_lex->top_join_list;
 >    THD *thd= parent_join->thd;
 > -  DBUG_ENTER("convert_subq_to_sj");
 > +  DBUG_ENTER("convert_subquery_to_semijoin");
 >
 > +  // Shorthand pointer used for IN subqueries
 > +  Item_in_subselect *in_subq_pred =
 > +      subq_pred->substype() == Item_subselect::IN_SUBS ?
 > +                                  (Item_in_subselect *)subq_pred :
 > +                                  NULL;

Do you really need this pointer outside the if-block for
IN-equalities?

 >    /*
 >      1. Find out where to put the predicate into.
 >       Note: for "t1 LEFT JOIN t2" this will be t2, a leaf.
 > @@ -3336,7 +3464,7 @@ bool convert_subq_to_sj(JOIN *parent_joi
 >          A3: changes in the TABLE_LIST::outer_join will make 
everything work
 >              automatically.
 >        */
 > -      if (!(wrap_nest= alloc_join_nest(parent_join->thd)))
 > +      if (!(wrap_nest= alloc_join_nest(thd)))
 >        {
 >          DBUG_RETURN(TRUE);
 >        }
 > @@ -3381,7 +3509,7 @@ bool convert_subq_to_sj(JOIN *parent_joi
 >
 >    TABLE_LIST *sj_nest;
 >    NESTED_JOIN *nested_join;
 > -  if (!(sj_nest= alloc_join_nest(parent_join->thd)))
 > +  if (!(sj_nest= alloc_join_nest(thd)))
 >    {
 >      DBUG_RETURN(TRUE);
 >    }
 > @@ -3439,8 +3567,13 @@ bool convert_subq_to_sj(JOIN *parent_joi
 >    /* 3. Remove the original subquery predicate from the WHERE/ON */
 >
 >    // The subqueries were replaced for Item_int(1) earlier
 > -  subq_pred->exec_method=
 > -    Item_in_subselect::SEMI_JOIN;         // for subsequent executions
 > +  if (in_subq_pred != NULL)
 > +  {
 > +    // @todo: Add semi-join execution for EXISTS as well.
 > +    in_subq_pred->exec_method= Item_exists_subselect::EXEC_SEMI_JOIN;
 > +    /* Left side of IN predicate is already resolved */
 > +    DBUG_ASSERT(in_subq_pred->left_expr->fixed);
 > +  }

Why do you need to do the above here?  Can it not wait until the
big if-block for IN-equalities?

 >    /*TODO: also reset the 'with_subselect' there. */
 >
 >    /* n. Adjust the parent_join->tables counter */
 > @@ -3457,26 +3590,34 @@ bool convert_subq_to_sj(JOIN *parent_joi
 >           emb= emb->embedding)
 >        emb->select_lex= parent_join->select_lex;
 >    }
 > -  parent_join->tables += subq_lex->join->tables;
 > +  parent_join->tables+= subq_lex->join->tables;
 >
 > -  /*
 > -    Put the subquery's WHERE into semi-join's sj_on_expr
 > -    Add the subquery-induced equalities too.
 > -  */
 > -  SELECT_LEX *save_lex= thd->lex->current_select;
 > -  thd->lex->current_select=subq_lex;
 > -  if (!subq_pred->left_expr->fixed &&
 > -       subq_pred->left_expr->fix_fields(thd, &subq_pred->left_expr))
 > -    DBUG_RETURN(TRUE);
 > -  thd->lex->current_select=save_lex;
 > -
 > -  sj_nest->nested_join->sj_corr_tables= subq_pred->used_tables();
 > -  sj_nest->nested_join->sj_depends_on=  subq_pred->used_tables() |
 > - 
subq_pred->left_expr->used_tables();
 > -  sj_nest->sj_on_expr= subq_lex->where;
 > +  nested_join->sj_outer_exprs.empty();
 > +  nested_join->sj_inner_exprs.empty();
 >
 >    /*
 > -    Create the IN-equalities and inject them into semi-join's ON 
expression.
 > +    @todo: Convert the following if statement to a switch and add a case
 > +           for EXISTS queries.
 > +  */
 > +  if (subq_pred->substype() == Item_subselect::IN_SUBS)
 > +  {
 > +    /*
 > +      sj_corr_tables is supposed to contain non-trivially correlated 
tables,
 > +      but here it is set to contain all correlated tables.
 > +    */

What is the intention of this comment? Is it something that we should
fix?  If so, a TODO or something would be good.  Otherwise, a
justification is needed.

 > +    nested_join->sj_corr_tables= subq_pred->used_tables();
 > +    /*
 > +      sj_depends_on contains the set of outer tables referred in the
 > +      subquery's WHERE clause as well as tables referred in the IN 
predicate's
 > +      left-hand side.
 > +    */
 > +    nested_join->sj_depends_on=  subq_pred->used_tables() |
 > +                                 in_subq_pred->left_expr->used_tables();
 > +    /* Put the subquery's WHERE into semi-join's condition. */
 > +    sj_nest->sj_on_expr= subq_lex->where;
 > +
 > +    /*
 > +    Create the IN-equalities and inject them into semi-join's ON 
condition.
 >      Additionally, for LooseScan strategy
 >       - Record the number of IN-equalities.
 >       - Create list of pointers to (oe1, ..., ieN). We'll need the 
list to
 > @@ -3489,41 +3630,40 @@ bool convert_subq_to_sj(JOIN *parent_joi
 >         pointers to Item_direct_view_refs are guaranteed to be stable as
 >         Item_direct_view_refs doesn't substitute itself with anything in
 >         Item_direct_view_ref::fix_fields.
 > -  */
 > -  sj_nest->sj_in_exprs= subq_pred->left_expr->cols();
 > -  sj_nest->nested_join->sj_outer_expr_list.empty();
 > +    */
 >
 > -  if (subq_pred->left_expr->cols() == 1)
 > -  {
 > -    nested_join->sj_outer_expr_list.push_back(subq_pred->left_expr);
 > -    Item_func_eq *item_eq=
 > -      new Item_func_eq(subq_pred->left_expr, 
subq_lex->ref_pointer_array[0]);
 > -    item_eq->in_equality_no= 0;
 > -    sj_nest->sj_on_expr= and_items(sj_nest->sj_on_expr, item_eq);
 > -  }
 > -  else
 > -  {
 > -    for (uint i= 0; i < subq_pred->left_expr->cols(); i++)
 > +    for (uint i= 0; i < in_subq_pred->left_expr->cols(); i++)
 >      {
 > -      nested_join->sj_outer_expr_list.push_back(subq_pred->left_expr->
 > -                                                element_index(i));
 > +      nested_join->sj_outer_exprs.push_back(in_subq_pred->left_expr->
 > +                                            element_index(i));
 > + 
nested_join->sj_inner_exprs.push_back(subq_lex->ref_pointer_array[i]);
 > +
 >        Item_func_eq *item_eq=
 > -        new Item_func_eq(subq_pred->left_expr->element_index(i),
 > +        new Item_func_eq(in_subq_pred->left_expr->element_index(i),
 >                           subq_lex->ref_pointer_array[i]);
 > +      if (item_eq == NULL)
 > +        DBUG_RETURN(TRUE);
 > +
 > +      /*
 > +        Mark this item as a subquery equality predicate. Currently, we
 > +        only check that this field is different from UINT_MAX, so a 
boolean
 > +        would do better than an integer.
 > +      */
 >        item_eq->in_equality_no= i;
 >        sj_nest->sj_on_expr= and_items(sj_nest->sj_on_expr, item_eq);
 > +      if (sj_nest->sj_on_expr == NULL)
 > +        DBUG_RETURN(TRUE);
 >      }
 > +    /* Fix the created equality and AND */
 > +    sj_nest->sj_on_expr->fix_fields(thd, &sj_nest->sj_on_expr);
 >    }
 > -  /* Fix the created equality and AND */
 > -  sj_nest->sj_on_expr->fix_fields(parent_join->thd, 
&sj_nest->sj_on_expr);
 >
 >    /*
 >      Walk through sj nest's WHERE and ON expressions and call
 >      item->fix_table_changes() for all items.
 >    */
 >    sj_nest->sj_on_expr->fix_after_pullout(parent_lex, 
&sj_nest->sj_on_expr);
 > -  fix_list_after_tbl_changes(parent_lex, 
&sj_nest->nested_join->join_list);
 > -
 > +  fix_list_after_tbl_changes(parent_lex, &nested_join->join_list);
 >
 >    /* Unlink the child select_lex so it doesn't show up in EXPLAIN: */
 >    subq_lex->master_unit()->exclude_level();
 > @@ -3531,20 +3671,25 @@ bool convert_subq_to_sj(JOIN *parent_joi
 >    DBUG_EXECUTE("where",
 >                 print_where(sj_nest->sj_on_expr,"SJ-EXPR", 
QT_ORDINARY););
 >
 > -  /* Inject sj_on_expr into the parent's WHERE or ON */
 >    if (emb_tbl_nest)
 >    {
 > +    /* Inject sj_on_expr into the parent's ON condition */
 >      emb_tbl_nest->on_expr= and_items(emb_tbl_nest->on_expr,
 >                                       sj_nest->sj_on_expr);
 > -    emb_tbl_nest->on_expr->fix_fields(parent_join->thd, 
&emb_tbl_nest->on_expr);
 > +    if (emb_tbl_nest->on_expr == NULL)
 > +      DBUG_RETURN(TRUE);
 > +    emb_tbl_nest->on_expr->fix_fields(thd, &emb_tbl_nest->on_expr);
 >    }
 >    else
 >    {
 > -    /* Inject into the WHERE */
 > +    /* Inject sj_on_expr into the parent's WHERE condition */
 >      parent_join->conds= and_items(parent_join->conds, 
sj_nest->sj_on_expr);
 > -    parent_join->conds->fix_fields(parent_join->thd, 
&parent_join->conds);
 > +    if (parent_join->conds == NULL)
 > +      DBUG_RETURN(TRUE);
 > +    parent_join->conds->fix_fields(thd, &parent_join->conds);
 >      parent_join->select_lex->where= parent_join->conds;
 >    }
 > +  semijoin_types_allow_materialization(sj_nest);

Any particular reason for calling this here?  Could it not wait until
the info is actually needed?

 >
 >    if (subq_lex->ftfunc_list->elements)
 >    {
 > @@ -3608,18 +3753,19 @@ bool convert_subq_to_sj(JOIN *parent_joi
 >  bool JOIN::flatten_subqueries()
 >  {
 >    Query_arena *arena, backup;
 > -  Item_in_subselect **in_subq;
 > -  Item_in_subselect **in_subq_end;
 > +  Item_exists_subselect **subq;
 > +  Item_exists_subselect **subq_end;
 >    DBUG_ENTER("JOIN::flatten_subqueries");
 >
 >    if (sj_subselects.elements() == 0)
 >      DBUG_RETURN(FALSE);
 >
 >    /* First, convert child join's subqueries. We proceed bottom-up 
here */
 > -  for (in_subq= sj_subselects.front(), in_subq_end= 
sj_subselects.back();
 > -       in_subq != in_subq_end; in_subq++)
 > +  for (subq= sj_subselects.front(), subq_end= sj_subselects.back();
 > +       subq != subq_end;
 > +       subq++)
 >    {
 > -    st_select_lex *child_select= (*in_subq)->get_select_lex();
 > +    st_select_lex *child_select= (*subq)->get_select_lex();
 >      JOIN *child_join= child_select->join;
 >      child_join->outer_tables = child_join->tables;
 >
 > @@ -3633,8 +3779,8 @@ bool JOIN::flatten_subqueries()
 >
 >      if (child_join->flatten_subqueries())
 >        DBUG_RETURN(TRUE);
 > -    (*in_subq)->sj_convert_priority=
 > -      (*in_subq)->is_correlated * MAX_TABLES + child_join->outer_tables;
 > +    (*subq)->sj_convert_priority=
 > +      (*subq)->is_correlated * MAX_TABLES + child_join->outer_tables;
 >    }
 >
 >    // Temporary measure: disable semi-joins when they are together 
with outer
 > @@ -3645,7 +3791,7 @@ bool JOIN::flatten_subqueries()
 >      if (tbl->on_expr || (tbl->embedding && !(embedding->sj_on_expr
&&
 >                                              !embedding->embedding)))
 >      {
 > -      in_subq= sj_subselects.front();
 > +      subq= sj_subselects.front();
 >        arena= thd->activate_stmt_arena_if_needed(&backup);
 >        goto skip_conversion;
 >      }
 > @@ -3662,66 +3808,74 @@ bool JOIN::flatten_subqueries()
 >    // #tables-in-parent-query + #tables-in-subquery < MAX_TABLES
 >    /* Replace all subqueries to be flattened with Item_int(1) */
 >    arena= thd->activate_stmt_arena_if_needed(&backup);
 > -  for (in_subq= sj_subselects.front();
 > -       in_subq != in_subq_end &&
 > -       tables + (*in_subq)->unit->first_select()->join->tables < 
MAX_TABLES;
 > -       in_subq++)
 > -  {
 > -    Item **tree= ((*in_subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
 > -                   &conds : &((*in_subq)->emb_on_expr_nest->on_expr);
 > -    if (replace_where_subcondition(this, tree, *in_subq, new 
Item_int(1),
 > +  for (subq= sj_subselects.front();
 > +       subq != subq_end &&
 > +       tables + (*subq)->unit->first_select()->join->tables < 
MAX_TABLES;
 > +       subq++)
 > +  {
 > +    Item **tree= ((*subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
 > +                   &conds : &((*subq)->emb_on_expr_nest->on_expr);
 > +    if (replace_where_subcondition(this, tree, *subq, new Item_int(1),
 >                                     FALSE))
 >        DBUG_RETURN(TRUE); /* purecov: inspected */
 >    }
 >
 > -  for (in_subq= sj_subselects.front();
 > -       in_subq != in_subq_end &&
 > -       tables + (*in_subq)->unit->first_select()->join->tables < 
MAX_TABLES;
 > -       in_subq++)
 > +  for (subq= sj_subselects.front();
 > +       subq != subq_end &&
 > +       tables + (*subq)->unit->first_select()->join->tables < 
MAX_TABLES;
 > +       subq++)
 >    {
 > -    if (convert_subq_to_sj(this, *in_subq))
 > +    if (convert_subquery_to_semijoin(this, *subq))
 >        DBUG_RETURN(TRUE);
 >    }
 >  skip_conversion:
 >    /*
 > -    3. Finalize (perform IN->EXISTS rewrite) the subqueries that we 
didn't
 > -    convert:
 > +    3. Finalize the subqueries that we did not convert,
 > +       ie. perform IN->EXISTS rewrite.
 >    */
 > -  for (; in_subq!= in_subq_end; in_subq++)
 > +  for (; subq!= subq_end; subq++)
 >    {
 > -    JOIN *child_join= (*in_subq)->unit->first_select()->join;
 > +    JOIN *child_join= (*subq)->unit->first_select()->join;
 >      Item_subselect::trans_res res;
 > -    (*in_subq)->changed= 0;
 > -    (*in_subq)->fixed= 0;
 > +    (*subq)->changed= 0;
 > +    (*subq)->fixed= 0;
 >
 >      SELECT_LEX *save_select_lex= thd->lex->current_select;
 > -    thd->lex->current_select= (*in_subq)->unit->first_select();
 > +    thd->lex->current_select= (*subq)->unit->first_select();
 >
 > -    res= (*in_subq)->select_transformer(child_join);
 > +    res= (*subq)->select_transformer(child_join);
 >
 >      thd->lex->current_select= save_select_lex;
 >
 >      if (res == Item_subselect::RES_ERROR)
 >        DBUG_RETURN(TRUE);
 >
 > -    (*in_subq)->changed= 1;
 > -    (*in_subq)->fixed= 1;
 > +    (*subq)->changed= 1;
 > +    (*subq)->fixed= 1;
 >
 > -    Item *substitute= (*in_subq)->substitution;
 > -    bool do_fix_fields= !(*in_subq)->substitution->fixed;
 > -    Item **tree= ((*in_subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
 > -                   &conds : &((*in_subq)->emb_on_expr_nest->on_expr);
 > -    if (replace_where_subcondition(this, tree, *in_subq, substitute,
 > +    Item *substitute= (*subq)->substitution;
 > +    /*
 > +      @todo: When EXISTS transformation is implemented, "substitute" 
will be
 > +      NULL here and we need to skip the remainder of the for loop 
for this
 > +      subquery. The check is also related to handling of outer joins and
 > +      may be skipped when semijoins can be combined with outer joins.
 > +    */
 > +    bool do_fix_fields= !(*subq)->substitution->fixed;
 > +    Item **tree= ((*subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
 > +                   &conds : &((*subq)->emb_on_expr_nest->on_expr);
 > +    if (replace_where_subcondition(this, tree, *subq, substitute,
 >                                     do_fix_fields))
 >        DBUG_RETURN(TRUE);
 > -    (*in_subq)->substitution= NULL;
 > +
 > +    (*subq)->substitution= NULL;
 >
 >      if (!thd->stmt_arena->is_conventional())
 >      {
 > -      tree= ((*in_subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
 > -                     &select_lex->prep_where : 
&((*in_subq)->emb_on_expr_nest->prep_on_expr);
 > +      tree= ((*subq)->emb_on_expr_nest == (TABLE_LIST*)1)?
 > +                     &select_lex->prep_where :
 > +                     &((*subq)->emb_on_expr_nest->prep_on_expr);
 >
 > -      if (replace_where_subcondition(this, tree, *in_subq, substitute,
 > +      if (replace_where_subcondition(this, tree, *subq, substitute,
 >                                       FALSE))
 >          DBUG_RETURN(TRUE);
 >      }
 > @@ -3769,7 +3923,8 @@ bool JOIN::setup_subquery_materializatio
 >            subquery_predicate->substype() == Item_subselect::IN_SUBS)
 >        {
 >          Item_in_subselect *in_subs= (Item_in_subselect*) 
subquery_predicate;
 > -        if (in_subs->exec_method == 
Item_in_subselect::MATERIALIZATION &&
 > +        if (in_subs->exec_method ==
 > +                     Item_exists_subselect::EXEC_MATERIALIZATION &&
 >              in_subs->setup_engine())

I like the previous formatting better.


 >            return TRUE;
 >        }
 > @@ -3946,7 +4101,6 @@ int pull_out_semijoin_tables(JOIN *join)
 >                Pulling a table out of uncorrelated subquery in 
general makes
 >                makes it correlated. See the NOTE to this funtion.
 >              */
 > -            sj_nest->sj_subq_pred->is_correlated= TRUE;
 >              sj_nest->nested_join->sj_corr_tables|= tbl->table->map;
 >              sj_nest->nested_join->sj_depends_on|= tbl->table->map;
 >            }
 > @@ -4634,10 +4788,16 @@ static bool optimize_semijoin_nests(JOIN
 >        /* semi-join nests with only constant tables are not valid */
 >        DBUG_ASSERT(sj_nest->sj_inner_tables & ~join->const_table_map);
 >
 > +      /*
 > +        Try semijoin materialization if
 > +        1) Not all tables have been pulled out of the semijoin nest, and
 > +        2) The semijoin is classified as non-trivially-correlated, and
 > +        3) Data types allow execution with materialization
 > +      */
 >        sj_nest->sj_mat_info= NULL;
 > -      if (sj_nest->sj_inner_tables && /* not everything was pulled 
out */
 > -          !sj_nest->sj_subq_pred->is_correlated &&
 > -           sj_nest->sj_subq_pred->types_allow_materialization)
 > +      if (sj_nest->sj_inner_tables && 
  // (1)
 > +          !sj_nest->nested_join->sj_corr_tables && 
  // (2)
 > +           sj_nest->sj_subq_pred->types_allow_materialization) 
  // (3)
 >        {
 >          join->emb_sjm_nest= sj_nest;
 >          if (choose_plan(join, all_table_map & ~join->const_table_map))
 > @@ -4890,49 +5050,41 @@ merge_key_fields(KEY_FIELD *start,KEY_FI
 >    return first_free;
 >  }
 >
 > -/*
 > +
 > +/**
 >    Given a field, return its index in semi-join's select list, or 
UINT_MAX
 >
 > -  DESCRIPTION
 > -    Given a field, we find its table; then see if the table is within a
 > -    semi-join nest and if the field was in select list of the subselect.
 > -    If it was, we return field's index in the select list. The value 
is used
 > -    by LooseScan strategy.
 > +  @param field Field that we are looking up table for
 > +
 > +  @retval =UINT_MAX Field is not from a semijoin-transformed subquery
 > +  @retval <UINT_MAX Index in select list of subquery
 > +
 > +  @details
 > +  Given a field, find its table; then see if the table is within a
 > +  semi-join nest and if the field was in select list of the subquery
 > +  (if subquery was part of a quantified comparison predicate), or
 > +  the field was a result of subquery decorrelation.
 > +  If it was, then return the field's index in the select list.
 > +  The value is used by LooseScan strategy.
 >  */
 >
 >  static uint get_semi_join_select_list_index(Field *field)
 >  {
 > -  uint res= UINT_MAX;
 >    TABLE_LIST *emb_sj_nest;
 >    if ((emb_sj_nest= field->table->pos_in_table_list->embedding) &&
 >        emb_sj_nest->sj_on_expr)
 >    {
 > -    Item_in_subselect *subq_pred= emb_sj_nest->sj_subq_pred;
 > -    st_select_lex *subq_lex= subq_pred->unit->first_select();
 > -    if (subq_pred->left_expr->cols() == 1)
 > +    List<Item> &items= emb_sj_nest->nested_join->sj_inner_exprs;
 > +    List_iterator<Item> it(items);
 > +    for (uint i= 0; i < items.elements; i++)
 >      {
 > -      Item *sel_item= subq_lex->ref_pointer_array[0];
 > +      Item *sel_item= it++;
 >        if (sel_item->type() == Item::FIELD_ITEM &&
 >            ((Item_field*)sel_item)->field->eq(field))
 > -      {
 > -        res= 0;
 > -      }
 > -    }
 > -    else
 > -    {
 > -      for (uint i= 0; i < subq_pred->left_expr->cols(); i++)
 > -      {
 > -        Item *sel_item= subq_lex->ref_pointer_array[i];
 > -        if (sel_item->type() == Item::FIELD_ITEM &&
 > -            ((Item_field*)sel_item)->field->eq(field))
 > -        {
 > -          res= i;
 > -          break;
 > -        }
 > -      }
 > +        return i;
 >      }
 >    }
 > -  return res;
 > +  return UINT_MAX;
 >  }
 >
 >  /**
 > @@ -6013,7 +6165,7 @@ set_position(JOIN *join,uint idx,JOIN_TA
 >  ulonglong get_bound_sj_equalities(TABLE_LIST *sj_nest,
 >                                    table_map remaining_tables)
 >  {
 > -  List_iterator<Item> li(sj_nest->nested_join->sj_outer_expr_list);
 > +  List_iterator<Item> li(sj_nest->nested_join->sj_outer_exprs);
 >    Item *item;
 >    uint i= 0;
 >    ulonglong res= 0;
 > @@ -6119,10 +6271,10 @@ public:
 >      */
 >      best_loose_scan_cost= DBL_MAX;
 >      if (!join->emb_sjm_nest && s->emb_sj_nest && 
     // (1)
 > -        s->emb_sj_nest->sj_in_exprs < 64 &&
 > +        s->emb_sj_nest->nested_join->sj_inner_exprs.elements < 64
&&
 >          ((remaining_tables & s->emb_sj_nest->sj_inner_tables) == 
     // (2)
 >           s->emb_sj_nest->sj_inner_tables) && 
     // (2)
 > -        join->cur_sj_inner_tables == 0 && 
        // (3)
 > +        join->cur_sj_inner_tables == 0 && 
     // (3)
 >          !(remaining_tables &
 >            s->emb_sj_nest->nested_join->sj_corr_tables) && 
     // (4)
 >          remaining_tables & 
s->emb_sj_nest->nested_join->sj_depends_on &&// (5)
 > @@ -6184,12 +6336,13 @@ public:
 >             (handled_col|bound_col)* (other_col|bound_col)
 >
 >      */
 > -    if (try_loosescan &&                                       // (1)
 > -        (handled_sj_equalities | bound_sj_equalities) ==         // (2)
 > -        PREV_BITS(ulonglong, s->emb_sj_nest->sj_in_exprs) &&     //
(2)
 > -        (PREV_BITS(key_part_map, max_loose_keypart+1) &        // (3)
 > -         (found_part | loose_scan_keyparts)) ==                // (3)
 > -         (found_part | loose_scan_keyparts) &&                 // (3)
 > +    if (try_loosescan && 
     // (1)
 > +        (handled_sj_equalities | bound_sj_equalities) == 
     // (2)
 > +        PREV_BITS(ulonglong,
 > + 
s->emb_sj_nest->nested_join->sj_inner_exprs.elements)&& // (2)
 > +        (PREV_BITS(key_part_map, max_loose_keypart+1) & 
     // (3)
 > +         (found_part | loose_scan_keyparts)) == 
     // (3)
 > +         (found_part | loose_scan_keyparts) && 
     // (3)
 >          !key_uses_partial_cols(s->table, key))
 >      {
 >        /* Ok, can use the strategy */
 > @@ -9219,9 +9372,13 @@ static bool make_join_select(JOIN *join,
 >        if (pushdown_on_conditions(join, tab))
 >          DBUG_RETURN(1);
 >
 > -      if (save_used_tables && !(used_tables &
 > -                                ~(tab->emb_sj_nest->sj_inner_tables |
 > -                                  join->const_table_map | 
PSEUDO_TABLE_BITS)))
 > +      if (save_used_tables)
 > +      {
 > +        DBUG_ASSERT(tab->emb_sj_nest);
 > +      }
 > +      if (save_used_tables &&                 // Inside an SJM nest, and
 > +         !(tab->emb_sj_nest->sj_inner_tables &
 > +          ~used_tables))                      // All inner tables in 
prefix
 >        {
 >          /*
 >            We have reached the end of semi join nest. That is, the 
join order
 > @@ -9230,12 +9387,8 @@ static bool make_join_select(JOIN *join,
 >             outer_tbl1 SJ-Materialize(inner_tbl1 ... inner_tblN) 
outer_tbl ...
 >                                                                 ^
 > 
\-we're here
 > -          At this point, we need to produce two conditions
 > -           - A condition that can be checked when we have all of the 
sj-inner
 > -             tables (inner_tbl1 ... inner_tblN). This will be used 
while doing
 > -             materialization.
 > -           - A condition that can be checked when we have all of the 
tables
 > -             in the prefix (both inner and outer).
 > +          At this point, we need to produce a condition that can be 
checked
 > +          when we have all of the tables in the prefix (both inner 
and outer).
 >          */
 >          tab->emb_sj_nest->sj_mat_info->join_cond=
 >            cond ?

I think we have aggreed earlier to not include the above two diffs in
this patch.

 > @@ -9828,47 +9981,40 @@ void remove_sj_conds(Item **tree)
 >
 >
 >  /*
 > -  Create subquery IN-equalities assuming use of materialization strategy
 > +  Create subquery equalities assuming use of materialization strategy
 >
 > -  SYNOPSIS
 > -    create_subq_in_equalities()
 > -      thd        Thread handle
 > -      sjm        Semi-join materialization structure
 > -      subq_pred  The subquery predicate
 > +  @param thd       Thread handle
 > +  @param sjm       Semi-join materialization structure
 > +  @param sj_nest   Semi-join nest
 >
 > -  DESCRIPTION
 > -    Create subquery IN-equality predicates. That is, for a subquery
 > +  @retval <>NULL Created condition
 > +  @retval = NULL Error
 > +
 > +  @details
 > +  Create subquery equality predicates. That is, for a subquery
 >
 > -      (oe1, oe2, ...) IN (SELECT ie1, ie2, ... FROM ...)
 > +    (oe1, oe2, ...) IN (SELECT ie1, ie2, ... FROM ...)
 >
 > -    create "oe1=ie1 AND ie1=ie2 AND ..." expression, such that ie1, 
ie2, ..
 > -    refer to the columns of the table that's used to materialize the
 > -    subquery.
 > -
 > -  RETURN
 > -    Created condition
 > +  create "oe1=ie1 AND oe2=ie2 AND ..." expression, such that ie1, 
ie2, ..
 > +  refer to the columns of the table that is used to materialize the 
subquery.
 > +  This function will also generate proper equality predicates for
 > +  trivially-correlated subqueries corresponding to the above IN query.
 >  */
 >
 > -Item *create_subq_in_equalities(THD *thd, SJ_MATERIALIZATION_INFO *sjm,
 > -                                Item_in_subselect *subq_pred)
 > +Item *create_subq_equalities(THD *thd, SJ_MATERIALIZATION_INFO *sjm,
 > +                             TABLE_LIST *sj_nest)
 >  {
 >    Item *res= NULL;
 > -  if (subq_pred->left_expr->cols() == 1)
 > -  {
 > -    if (!(res= new Item_func_eq(subq_pred->left_expr,
 > -                                new Item_field(sjm->table->field[0]))))
 > -      return NULL; /* purecov: inspected */
 > -  }
 > -  else
 > +  List_iterator<Item> outer_expr(sj_nest->nested_join->sj_outer_exprs);
 > +
 > +  for (uint i= 0; i < sj_nest->nested_join->sj_outer_exprs.elements; 
i++)
 >    {
 >      Item *conj;
 > -    for (uint i= 0; i < subq_pred->left_expr->cols(); i++)
 > -    {
 > -      if (!(conj= new 
Item_func_eq(subq_pred->left_expr->element_index(i),
 > -                                   new 
Item_field(sjm->table->field[i]))) ||
 > -          !(res= and_items(res, conj)))
 > -        return NULL; /* purecov: inspected */
 > -    }
 > +    Item *outer= outer_expr++;
 > +    if (!(conj= new Item_func_eq(outer, new 
Item_field(sjm->table->field[i]))))
 > +      return NULL; /* purecov: inspected */
 > +    if (!(res= and_items(res, conj)))
 > +      return NULL; /* purecov: inspected */
 >    }
 >    if (res->fix_fields(thd, &res))
 >      return NULL; /* purecov: inspected */
 > @@ -9905,8 +10051,7 @@ bool setup_sj_materialization(JOIN_TAB *
 >    SJ_MATERIALIZATION_INFO *sjm= emb_sj_nest->sj_mat_info;
 >    THD *thd= tab->join->thd;
 >    /* First the calls come to the materialization function */
 > -  List<Item> &item_list= 
emb_sj_nest->sj_subq_pred->unit->first_select()->item_list;
 > -
 > +  List<Item> &item_list= emb_sj_nest->nested_join->sj_inner_exprs;
 >    /*
 >      Set up the table to write to, do as 
select_union::create_result_table does
 >    */
 > @@ -9965,10 +10110,11 @@ bool setup_sj_materialization(JOIN_TAB *
 >      KEY_PART_INFO *cur_key_part= tmp_key->key_part;
 >      store_key **ref_key= tab_ref->key_copy;
 >      uchar *cur_ref_buff= tab_ref->key_buff;
 > -
 > +    List_iterator<Item> 
outer_expr(emb_sj_nest->nested_join->sj_outer_exprs);
 > +
 >      for (i= 0; i < tmp_key_parts; i++, cur_key_part++, ref_key++)
 >      {
 > -      tab_ref->items[i]= 
emb_sj_nest->sj_subq_pred->left_expr->element_index(i);
 > +      tab_ref->items[i]= outer_expr++;
 >        int null_count= test(cur_key_part->field->real_maybe_null());
 >        *ref_key= new store_key_item(thd, cur_key_part->field,
 >                                     /* TODO:
 > @@ -9999,8 +10145,7 @@ bool setup_sj_materialization(JOIN_TAB *
 >        if (tab[i].select)
 >          remove_sj_conds(&tab[i].select->cond);
 >      }
 > -    if (!(sjm->in_equality= create_subq_in_equalities(thd, sjm,
 > - 
emb_sj_nest->sj_subq_pred)))
 > +    if (!(sjm->in_equality= create_subq_equalities(thd, sjm, 
emb_sj_nest)))
 >        DBUG_RETURN(TRUE); /* purecov: inspected */
 >    }
 >    else
 > @@ -13224,11 +13369,9 @@ void advance_sj_state(JOIN *join, table_
 >        join->thd->optimizer_switch_flag(OPTIMIZER_SWITCH_FIRSTMATCH))
 >    {
 >      const table_map outer_corr_tables=
 > -      new_join_tab->emb_sj_nest->nested_join->sj_corr_tables |
 >        new_join_tab->emb_sj_nest->nested_join->sj_depends_on;
 >      const table_map sj_inner_tables=
 >        new_join_tab->emb_sj_nest->sj_inner_tables;
 > -
 >      /*
 >        Enter condition:
 >         1. The next join tab belongs to semi-join nest
 > @@ -13323,8 +13466,7 @@ void advance_sj_state(JOIN *join, table_
 >        pos->first_loosescan_table= idx;
 >        pos->loosescan_need_tables=
 >          new_join_tab->emb_sj_nest->sj_inner_tables |
 > -        new_join_tab->emb_sj_nest->nested_join->sj_depends_on |
 > -        new_join_tab->emb_sj_nest->nested_join->sj_corr_tables;
 > +        new_join_tab->emb_sj_nest->nested_join->sj_depends_on;
 >      }
 >
 >      if ((pos->first_loosescan_table != MAX_TABLES) &&
 > @@ -13409,8 +13551,7 @@ void advance_sj_state(JOIN *join, table_
 >        */
 >        pos->sjm_scan_need_tables=
 >          new_join_tab->emb_sj_nest->sj_inner_tables |
 > -        new_join_tab->emb_sj_nest->nested_join->sj_depends_on |
 > -        new_join_tab->emb_sj_nest->nested_join->sj_corr_tables;
 > +        new_join_tab->emb_sj_nest->nested_join->sj_depends_on;
 >        pos->sjm_scan_last_inner= idx;
 >      }
 >      else
 > @@ -13525,8 +13666,7 @@ void advance_sj_state(JOIN *join, table_
 >          pos->first_dupsweedout_table= idx;
 >
 >        pos->dupsweedout_tables |= nest->sj_inner_tables |
 > -                                 nest->nested_join->sj_depends_on |
 > -                                 nest->nested_join->sj_corr_tables;
 > +                                 nest->nested_join->sj_depends_on;
 >      }
 >
 >      if (pos->dupsweedout_tables &&
 > @@ -18336,6 +18476,26 @@ make_cond_for_table_from_pred(COND *root
 >  }
 >
 >
 > +/**
 > +  Generate a condition that can be checked after materializing a 
semi-join nest
 > +
 > +  @param root_cond  Root condition, ancestor of the condition being 
analyzed.
 > +  @param cond       Condition to analyze.
 > +  @param tables     Tables in the outer part of the join nest.
 > +  @param sjm_tables Tables within the semi-join nest (the inner part).
 > +
 > +  @retval <>NULL Generated condition
 > +  @retval = NULL Already checked, or error
 > +
 > +  @details
 > +  A regular semi-join materialization is always non-correlated, ie
 > +  the subquery is always resolved by performing a lookup generated in
 > +  create_subq_equalities, hence this function should never produce
 > +  any condition for regular semi-join materialization.
 > +  For a scan semi-join materialization, this function may return a 
condition
 > +  to be checked.
 > +*/
 > +
 >  static COND *
 >  make_cond_after_sjm(Item *root_cond, Item *cond, table_map tables,
 >                      table_map sjm_tables)
 >
 > === modified file 'sql/sql_select.h'
 > --- a/sql/sql_select.h	2010-03-01 10:04:07 +0000
 > +++ b/sql/sql_select.h	2010-03-05 09:43:47 +0000
 > @@ -1649,7 +1649,7 @@ public:
 >    bool union_part; ///< this subselect is part of union
 >    bool optimized; ///< flag to avoid double optimization in EXPLAIN
 >
 > -  Array<Item_in_subselect> sj_subselects;
 > +  Array<Item_exists_subselect> sj_subselects;
 >
 >    /* Temporary tables used to weed-out semi-join duplicates */
 >    List<TABLE> sj_tmp_tables;
 >
 > === modified file 'sql/table.h'
 > --- a/sql/table.h	2010-02-25 13:55:28 +0000
 > +++ b/sql/table.h	2010-03-05 09:43:47 +0000
 > @@ -1147,7 +1147,7 @@ enum enum_open_type
 >
 >  class SJ_MATERIALIZATION_INFO;
 >  class Index_hint;
 > -class Item_in_subselect;
 > +class Item_exists_subselect;
 >
 >
 >  /*
 > @@ -1223,7 +1223,7 @@ struct TABLE_LIST
 >    char		*db, *alias, *table_name, *schema_table_name;
 >    char          *option;                /* Used by cache index  */
 >    Item		*on_expr;		/* Used with outer join */
 > -  Item          *sj_on_expr;
 > +  Item          *sj_on_expr;            /* Synthesized semijoin 
condition */
 >    /*
 >      (Valid only for semi-join nests) Bitmap of tables that are 
within the
 >      semi-join (this is different from bitmap of all nest's children 
because
 > @@ -1231,9 +1231,7 @@ struct TABLE_LIST
 >      nest's children).
 >    */
 >    table_map     sj_inner_tables;
 > -  /* Number of IN-compared expressions */
 > -  uint          sj_in_exprs;
 > -  Item_in_subselect  *sj_subq_pred;
 > +  Item_exists_subselect  *sj_subq_pred;
 >    SJ_MATERIALIZATION_INFO *sj_mat_info;
 >
 >    /*
 > @@ -1795,13 +1793,17 @@ typedef struct st_nested_join
 >    uint              counter_;
 >    nested_join_map   nj_map;          /* Bit used to identify this 
nested join*/
 >    /*
 > -    (Valid only for semi-join nests) Bitmap of tables outside the 
semi-join
 > -    that are used within the semi-join's ON condition.
 > +    Tables outside the semi-join that are used within the semi-join's
 > +    ON condition (ie. the subquery WHERE clause and optional IN 
equalities).
 >    */
 >    table_map         sj_depends_on;
 > -  /* Outer non-trivially correlated tables */
 > +  /* Outer non-trivially correlated tables, a true subset of 
sj_depends_on */
 >    table_map         sj_corr_tables;
 > -  List<Item>        sj_outer_expr_list;
 > +  /*
 > +    Lists of trivially-correlated expressions from the outer and 
inner tables
 > +    of the semi-join, respectively.
 > +  */
 > +  List<Item>        sj_outer_exprs, sj_inner_exprs;
 >  } NESTED_JOIN;
 >
 >
 >
 >
 >
 > ------------------------------------------------------------------------
 >
 >


That's all for now,

-- 
Øystein
Thread
bzr commit into mysql-6.0-codebase-bugfixing branch (roy.lyseng:3790) WL#5266Roy Lyseng5 Mar
  • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3790) WL#5266Jørgen Løland10 Mar
    • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3790) WL#5266Roy Lyseng19 Apr
  • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3790) WL#5266Øystein Grøvlen15 Apr
    • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3790) WL#5266Øystein Grøvlen15 Apr
    • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3790) WL#5266Roy Lyseng19 Apr
      • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3790) WL#5266Øystein Grøvlen21 Apr
        • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3790) WL#5266Roy Lyseng21 Apr
  • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3790) WL#5266Evgeny Potemkin16 Apr
    • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3790) WL#5266Roy Lyseng20 Apr