List:Commits« Previous MessageNext Message »
From:Øystein Grøvlen Date:October 22 2010 1:41pm
Subject:Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265)
Bug#45174 Bug#50019 Bug#52068
View as plain text  
Hi,

Thanks for the patch.  You have made a good job of understanding the
mechanisms here and an equally good job of explaining what goes on. I
learned much from this review.

My main comments:

  - I am not very comfortable with the new get_first function, neither
    its name nor its location (see comments inline).  I see that it is
    not easy to see what it the best compromise here within the limits
    of this spaghetti, but please consider whether there are other ways
    of doing this.

  - Making many different fixes in the same patch is a challenge for
    the reviewers since it is not obvious which changes leads to which
    changes in behavior (e.g., test results.)  I understand when there
    is a lot of problems with a given test case, it is bit difficult to
    split things up, but I think you should consider a separate patch
    for the fix to the scanning of the materilization table (with test
    case for Bug#52068)

See inline for more comments.


On 10/22/10 09:58 AM, Roy Lyseng wrote:
 > #At file:///home/rl136806/mysql/repo/mysql-work5/ based on 
revid:guilhem@stripped
 >
 >   3265 Roy Lyseng	2010-10-22
 >        Bug#45174: Incorrectly applied equality propagation caused 
wrong result
 >        on a query with a materialized semi-join.
 >        Bug#50019: Wrong result for IN-query with materialization.
 >        Bug#52068: Optimizer generates invalid semijoin 
materialization plan
 >
 >        When a subquery is subject to a semijoin optimization, its tables
 >        are merged to the outer query and later are treated as regular 
tables.
 >        This allows a bunch of optimizations to be applied, equality
 >        propagation is among them. Equality propagation is done after 
query
 >        execution plan is chosen. It substitutes fields from tables being
 >        retrieved later for fields from tables being retrieved earlier.
 >        However, it can't be applied as is to any semijoin table.
 >        The semijoin materialization strategy differs from other semijoin
 >        strategies that the data from materialized semijoin tables 
isn't used
 >        directly but saved to a temporary table first.
 >        The materialization isn't isolated in a separate step, it is done
 >        inline within the nested loop execution.
 >        When it comes to fetching rows from the first table in the 
block of
 >        materialized semijoin tables, sub_select() function is called to
 >        materialize the result of the subquery and save it in the
 >        materialized table. Later, data from the materialized table is 
used
 >        as if they were regular table rows.
 >        Due to this we can't substitute fields that belong to the 
semi-join
 >        for fields from outer query and vice versa.
 >

The above explanation seems unecessary detailed/complex to me. I
cannot see how doing materialization in a separate step would help
here.

 >        Example: suppose we have a join order:
 >
 >          ot1 ot2  SJ-Mat(it1  it2  it3)  ot3
 >
 >        and equality ot2.col = it1.col = it2.col
 >        If we're looking for best substitute for 'it2.col', we should pick
 >        it1.col and not ot2.col.
 >
 >        For a field that is not in a materialized semijoin we can use 
any field,
 >        even those that are embedded in a materialized semijoin. This 
is because
 >        such fields are "copied back" to their original join-tab 
structures when
 >        the materialized temporary table is being read.
 >
 >        Now we have added another Item_equal::get_first() function 
that accepts
 >        as a parameter a field being substituted and checks whether it 
belongs
 >        to a materialized semijoin.
 >        The field to substitute will be from the same materialized 
semijoin nest
 >        (if supplied field is within such nest), otherwise it will be 
the first
 >        field in the multiple equality.
 >
 >        The new checks rely on the first_sj_inner_tab and 
first_sj_inner_tab
 >        fields of the join-tab. These fields are therefore set as soon as
 >        possible after the join strategy is fixed.

Maybe you could mentioned that they before only were used for
DuplicateWeedout strategy.

 >
 >        Also fixed problem appearing in Bug#52068: When MaterializeScan
 >        semijoin strategy was used and there were one or more outer 
dependent
 >        tables before the semijoin tables, the scan over the materialized
 >        table was not properly reset for each row of the prefix outer 
tables.
 >
 >        Also fixed problems with pushdown of SJM-aware predicates during
 >        make_join_select(): wrong predicates were sometimes generated,
 >        make_cond_after_sjm() was called at the wrong position, and
 >        make_cond_after_sjm() was never actually considering the 
pushed-down
 >        SJM predicates.
 >
 >        mysql-test/include/subquery_sj.inc
 >          Bug#45174: Incorrectly applied equality propagation caused 
wrong result
 >          on a query with a materialized semi-join.
 >          Bug#50019: Wrong result for IN-query with materialization.
 >          Bug#52068: Optimizer generates invalid semijoin 
materialization plan
 >          Added new tests.
 >
 >        mysql-test/r/subquery_mat_all.result
 >        mysql-test/r/subquery_sj_all.result
 >        mysql-test/r/subquery_sj_all_jcl6.result
 >        mysql-test/r/subquery_sj_all_jcl7.result
 >        mysql-test/r/subquery_sj_dupsweed.result
 >        mysql-test/r/subquery_sj_dupsweed_jcl6.result
 >        mysql-test/r/subquery_sj_dupsweed_jcl7.result
 >        mysql-test/r/subquery_sj_firstmatch.result
 >        mysql-test/r/subquery_sj_firstmatch_jcl6.result
 >        mysql-test/r/subquery_sj_firstmatch_jcl7.result
 >        mysql-test/r/subquery_sj_loosescan.result
 >        mysql-test/r/subquery_sj_loosescan_jcl6.result
 >        mysql-test/r/subquery_sj_loosescan_jcl7.result
 >        mysql-test/r/subquery_sj_mat.result
 >        mysql-test/r/subquery_sj_mat_jcl6.result
 >        mysql-test/r/subquery_sj_mat_jcl7.result
 >        mysql-test/r/subquery_sj_mat_nosj.result
 >        mysql-test/r/subquery_sj_none.result
 >        mysql-test/r/subquery_sj_none_jcl6.result
 >        mysql-test/r/subquery_sj_none_jcl7.result
 >          Bug#45174: Incorrectly applied equality propagation caused 
wrong result
 >          on a query with a materialized semi-join.
 >          Bug#50019: Wrong result for IN-query with materialization.
 >          Bug#52068: Optimizer generates invalid semijoin 
materialization plan
 >          Results for three new tests added.
 >          Some tests using semijoin materialization show that where clause
 >          has moved from the outer query into the materialized inner 
query.
 >          This is caused by the changed call to get_first() in
 >          eliminate_item_equal().
 >          Ex: select * from ot where a in(select b from it where b>0);
 >          The clause "b>0" is now evaluated on the inner query 
materialization.
 >          Performance-wise this is never worse when using 
MaterializeScan and
 >          usually better for MaterializeLookup. For the latter 
strategy, the

I had to read the above sentence a few times before understanding the
meaning of it.  Never worse than what?  I assume than before.  Does
usually better mean "usually better, but sometimes worse" or "usually
better, and never worse"?

 >          best possible solution is probably to evaluate the clause in 
both
 >          queries, this can be subject for a later feature development.
 >          Another test that applies the same condition to both the 
outer and
 >          the inner query is added, to show the plan for such types of 
queries.
 >          Earlier wrong semijoin materialization test results are 
corrected.
 >
 >        sql/item.cc
 >          Bug#45174: Incorrectly applied equality propagation caused 
wrong result
 >          on a query with a materialized semi-join.
 >
 >          Calling new get_first() function instead of old.
 >
 >        sql/item_cmpfunc.cc
 >          Bug#45174: Incorrectly applied equality propagation caused 
wrong result
 >          on a query with a materialized semi-join.
 >
 >          New function Item_equal::get_first() that accepts as argument
 >          a field being substituted.
 >
 >        sql/item_cmpfunc.h
 >          Bug#45174: Incorrectly applied equality propagation caused 
wrong result
 >          on a query with a materialized semi-join.
 >
 >          New function Item_equal::get_first() that accepts as argument
 >          a field being substituted.
 >
 >        sql/sql_select.cc
 >          Bug#45174: Incorrectly applied equality propagation caused 
wrong result
 >          on a query with a materialized semi-join.
 >          Bug#50019: Wrong result for IN-query with materialization.
 >          Bug#52068: Optimizer generates invalid semijoin 
materialization plan
 >
 >          Setting fields first_sj_inner_tab and last_sj_inner_tab 
moved from
 >          setup_semijoin_dups_elimination() to get_best_combination(), 
so they
 >          are set as early as possible after join order optimization.
 >
 >          In make_join_select(), the test that determined when to pushdown
 >          SJM-specific predicates was wrong, in addition to improving the
 >          comments.
 >
 >          The logic of eliminate_item_equal() has been simplified and
 >          adjusted so that it generates equalities that are useful also
 >          when the semijoin materialization strategy is being used.
 >          Some simplification was possible by taking advantage of the new
 >          Item_equal::get_first() function.
 >
 >          In sub_select_sjm(), moved code that initializes the scan 
over the
 >          materialized table so that it is now performed for each scan of
 >          table, instead of only for the first scan.
 >
 >          In make_cond_for_table_from_pred(), a number of comments has 
been
 >          added, and TAB characters are replaced by spaces.
 >
 >          In make_cond_after_sjm(), make sure that it handles equalities
 >          generated for semijoin materialization (with marker=3).
 >          Added comments and removed TAB characters.
 >
 >      modified:
 >        mysql-test/include/subquery_sj.inc
 >        mysql-test/r/subquery_mat_all.result
 >        mysql-test/r/subquery_sj_all.result
 >        mysql-test/r/subquery_sj_all_jcl6.result
 >        mysql-test/r/subquery_sj_all_jcl7.result
 >        mysql-test/r/subquery_sj_dupsweed.result
 >        mysql-test/r/subquery_sj_dupsweed_jcl6.result
 >        mysql-test/r/subquery_sj_dupsweed_jcl7.result
 >        mysql-test/r/subquery_sj_firstmatch.result
 >        mysql-test/r/subquery_sj_firstmatch_jcl6.result
 >        mysql-test/r/subquery_sj_firstmatch_jcl7.result
 >        mysql-test/r/subquery_sj_loosescan.result
 >        mysql-test/r/subquery_sj_loosescan_jcl6.result
 >        mysql-test/r/subquery_sj_loosescan_jcl7.result
 >        mysql-test/r/subquery_sj_mat.result
 >        mysql-test/r/subquery_sj_mat_jcl6.result
 >        mysql-test/r/subquery_sj_mat_jcl7.result
 >        mysql-test/r/subquery_sj_mat_nosj.result
 >        mysql-test/r/subquery_sj_none.result
 >        mysql-test/r/subquery_sj_none_jcl6.result
 >        mysql-test/r/subquery_sj_none_jcl7.result
 >        sql/item.cc
 >        sql/item_cmpfunc.cc
 >        sql/item_cmpfunc.h
 >        sql/sql_select.cc
 > === modified file 'mysql-test/include/subquery_sj.inc'
 > --- a/mysql-test/include/subquery_sj.inc	2010-10-13 13:27:36 +0000
 > +++ b/mysql-test/include/subquery_sj.inc	2010-10-22 07:57:40 +0000
 > @@ -393,6 +393,9 @@ insert into t3 select A.a + 10*B.a from
 >   explain select * from t3 where a in (select kp1 from t1 where kp1<20);
 >   select * from t3 where a in (select kp1 from t1 where kp1<20);
 >
 > +explain select * from t3 where a in (select kp1 from t1 where 
kp1<20) and a<20;
 > +select * from t3 where a in (select kp1 from t1 where kp1<20) and a<20;
 > +
 >   create table t4 (pk int primary key);
 >   insert into t4 select a from t3;
 >
 > @@ -637,11 +640,9 @@ drop table t0, t1,t2,t3;
 >
 >
 >   --echo
 > ---echo Test that MaterializeLookup strategy for semijoin,
 > +--echo Test that neither MaterializeLookup strategy for semijoin,
 >   --echo nor subquery materialization is used when BLOBs are involved
 >   --echo (except when arguments of some functions).
 > ---echo Note: Due to Bug#52068, wrong may occur below if MaterializeScan
 > ---echo strategy is used instead,
 >   --echo
 >   set @prefix_len = 6;
 >
 > @@ -3371,3 +3372,89 @@ WHERE (a1, a2) IN (
 >   DROP TABLE t1,t2a,t2b,t2c;
 >
 >   --echo # End BUG#52329
 > +
 > +--echo #
 > +--echo # Bug#45174: Incorrectly applied equality propagation caused 
wrong
 > +--echo # result on a query with a materialized semi-join.
 > +--echo #
 > +
 > +CREATE TABLE t1 (
 > +  varchar_nokey varchar(1) NOT NULL
 > +);
 > +
 > +INSERT INTO t1 VALUES
 > + ('v'), ('u'), ('n'), ('l'), ('h'), ('u'), ('n'), ('j'), ('k'),
 > + ('e'), ('i'), ('u'), ('n'), ('b'), ('x'), (''), ('q'), ('u');
 > +
 > +CREATE TABLE t2 (
 > +  pk int NOT NULL,
 > +  varchar_key varchar(1) NOT NULL,
 > +  varchar_nokey varchar(1) NOT NULL,
 > +  PRIMARY KEY(pk),
 > +  KEY varchar_key(varchar_key)
 > +);
 > +
 > +INSERT INTO t2 VALUES
 > + (11,'m','m'), (12,'j','j'), (13,'z','z'), (14,'a','a'), (15,'',''),
 > + (16,'e','e'), (17,'t','t'), (19,'b','b'), (20,'w','w'), (21,'m','m'),
 > + (23,'',''), (24,'w','w'), (26,'e','e'), (27,'e','e'), (28,'p','p');
 > +
 > +let $query=
 > +SELECT varchar_nokey
 > +FROM t1
 > +WHERE (varchar_nokey, varchar_nokey) IN (SELECT varchar_key, 
varchar_nokey
 > +                                         FROM t2
 > +                                         WHERE varchar_nokey<  'n' 
XOR pk);

I feel the XOR part is an unecessary distraction here.  AFAICT, you
will get the same error with just "WHERE varchar_nokey < pk". Ref. new
duplicate BUG#57421.

...

 > === modified file 'sql/item.cc'
 > --- a/sql/item.cc	2010-10-15 10:32:50 +0000
 > +++ b/sql/item.cc	2010-10-22 07:57:40 +0000
 > @@ -5075,8 +5075,8 @@ Item *Item_field::replace_equal_field(uc
 >           return this;
 >         return const_item;
 >       }
 > -    Item_field *subst= item_equal->get_first();
 > -    if (subst&&  field->table != subst->field->table&& 
!field->eq(subst->field))
 > +    Item_field *subst= item_equal->get_first(this);
 > +    if (field->table != subst->field->table&& 
!field->eq(subst->field))
 >         return subst;

Since it is no longer checked that subst is not null, maybe an assert
should be added?

 >     }
 >     return this;
 >
 > === modified file 'sql/item_cmpfunc.cc'
 > --- a/sql/item_cmpfunc.cc	2010-10-15 10:32:50 +0000
 > +++ b/sql/item_cmpfunc.cc	2010-10-22 07:57:40 +0000
 > @@ -5840,3 +5840,90 @@ void Item_equal::print(String *str, enum
 >     str->append(')');
 >   }
 >
 > +
 > +/**
 > +  Get the first equal field of multiple equality, use when execution 
order
 > +  must be considered (see details below).
 > +
 > +  @param field  field to get substitution field for, which must be
 > +                present within the multiple equality itself.

Actually, field is just used to identify the correct semijoin nest.
It does not have to be present within the multiple equality itself to
work correctly.  Maybe it would be better to generalize this function
to take JOIN_TAB* as parameter.  That is, if the passed in JOIN_TAB is
part of a materialization, return the first field from that semijoin
nest.  Otherwise, return first field.

 > +
 > +  @retval Found first field in the multiple equality.
 > +
 > +  @details Get the first field of multiple equality that is equal to the
 > +  given field. In order to make semijoin materialization strategy work

The "that is equal to the given field" seems unecessary since all fields of
the multiple equality should be equal to this field.

 > +  correctly we can't propagate equal fields between a materialized 
semijoin
 > +  and the outer query (or any other semijoin) unconditionally.
 > +  Thus the field is returned according to following rules:
 > +
 > +  1) If the given field belongs to a materialized semijoin then the
 > +     first field in multiple equality which belong to the same semijoin
 > +     is returned.
 > +  2) If the given field doesn't belong to a materialized semijoin then
 > +     the first field in the multiple equality is returned.
 > +*/
 > +
 > +Item_field* Item_equal::get_first(const Item_field *field)

I am neither comfortable with the name nor the location of this
function.  Since this function does something different from the
existing get_first, I think it should have another name.  I am also a
bit concerned with making Item classes dependent on JOIN_TAB (except
for a few methods of Item_subselect, this is the first time JOIN_TAB
is used in Item classes) and join strategies.  Maybe a free function
in sql_select.cc that is friend of Item_equal is a better choice.
(Like eliminate_item_equal() today.)


 > +{
 > +  DBUG_ASSERT(field != NULL);
 > +
 > +  const JOIN_TAB *field_tab= field->field->table->reginfo.join_tab;
 > +
 > +  if (sj_is_materialize_strategy(field_tab->get_sj_strategy()))
 > +  {
 > +    /*
 > +      It's a field from a materialized semijoin. We can substitute 
it only
 > +      with a field from the same semijoin.
 > +
 > +      Example: suppose we have a join order:
 > +
 > +       ot1 ot2  SJM(it1  it2  it3)  ot3
 > +
 > +      and equality ot2.col = it1.col = it2.col
 > +
 > +      If we're looking for best substitute for 'it2.col', we must 
pick it1.col
 > +      and not ot2.col. it2.col is evaluated while performing 
materialization,
 > +      when the outer tables are not available in the execution.
 > +    */
 > +    List_iterator<Item_field>  it(fields);
 > +    Item_field *item;
 > +    const JOIN_TAB *first= field_tab->first_sj_inner_tab;
 > +    const JOIN_TAB *last=  field_tab->last_sj_inner_tab;
 > +
 > +    while ((item= it++))
 > +    {
 > +      if (item->field->table->reginfo.join_tab>= first&&
 > +          item->field->table->reginfo.join_tab<= last)
 > +      {
 > +        return item;
 > +      }
 > +    }
 > +  }
 > +  else
 > +  {
 > +    /*
 > +      The field is not in a materialized semijoin nest. We can return
 > +      the first field in the multiple equality.
 > +
 > +      Example: suppose we have a join order with MaterializeLookup:
 > +
 > +       ot1 ot2  SJM-Lookup(it1  it2)
 > +
 > +      Here we should always pick the first field in the multiple 
equality,
 > +      as this will be present before all other dependent fields.
 > +
 > +      Example: suppose we have a join order with MaterializeScan:
 > +
 > +          SJM-Scan(it1  it2)  ot1  ot2
 > +
 > +      and equality ot2.col = ot1.col = it2.col.
 > +
 > +      When looking for best substitute for 'ot2.col', we can pick 
it2.col,
 > +      because when we run the scan, column values from the inner 
materialized
 > +      tables will be copied back to the column buffers for it1 and it2.
 > +    */
 > +    return fields.head();
 > +  }
 > +  DBUG_ASSERT(FALSE);                          // Should never get here.
 > +  return NULL;
 > +}
 >
 > === modified file 'sql/item_cmpfunc.h'
 > --- a/sql/item_cmpfunc.h	2010-10-15 10:32:50 +0000
 > +++ b/sql/item_cmpfunc.h	2010-10-22 07:57:40 +0000
 > @@ -1637,7 +1637,13 @@ public:
 >     void add(Item_field *f);
 >     uint members();
 >     bool contains(Field *field);
 > +  /**
 > +  Get the first field of multiple equality, use for semantic checking.
 > +
 > +  @retval First field in the multiple equality.
 > +  */
 >     Item_field* get_first() { return fields.head(); }
 > +  Item_field* get_first(const Item_field *field);
 >     void merge(Item_equal *item);
 >     void update_const();
 >     enum Functype functype() const { return MULT_EQUAL_FUNC; }
 >
 > === modified file 'sql/sql_select.cc'
 > --- a/sql/sql_select.cc	2010-10-21 15:44:35 +0000
 > +++ b/sql/sql_select.cc	2010-10-22 07:57:40 +0000
 > @@ -1414,13 +1414,16 @@ bool might_do_join_buffering(uint join_c
 >     setup_sj_materialization() (todo: can't we move that to here also?)
 >   */
 >
 > -int setup_semijoin_dups_elimination(JOIN *join, ulonglong options,
 > -                                    uint no_jbuf_after)
 > +bool setup_semijoin_dups_elimination(JOIN *join, ulonglong options,
 > +                                     uint no_jbuf_after)
 >   {
 >     uint tableno;
 >     THD *thd= join->thd;
 >     DBUG_ENTER("setup_semijoin_dups_elimination");
 >
 > +  if (join->select_lex->sj_nests.is_empty())
 > +    DBUG_RETURN(FALSE);
 > +
 >     for (tableno= join->const_tables ; tableno<  join->tables; )
 >     {
 >       JOIN_TAB *tab=join->join_tab + tableno;
 > @@ -1661,21 +1664,6 @@ int setup_semijoin_dups_elimination(JOIN
 >           break;
 >         }
 >       }
 > -    /*
 > -      Remember the first and last semijoin inner tables; this serves 
to tell
 > -      a JOIN_TAB's semijoin strategy (like in check_join_cache_usage()).
 > -    */
 > -    JOIN_TAB *last_sj_inner=
 > -      (pos->sj_strategy == SJ_OPT_DUPS_WEEDOUT) ?
 > -      /* Range may end with non-inner table so cannot set 
last_sj_inner_tab */
 > -      NULL : last_sj_tab;
 > -    for (JOIN_TAB *tab_in_range= tab;
 > -         tab_in_range<= last_sj_tab;
 > -         tab_in_range++)
 > -    {
 > -      tab_in_range->first_sj_inner_tab= tab;
 > -      tab_in_range->last_sj_inner_tab=  last_sj_inner;
 > -    }
 >     }
 >     DBUG_RETURN(FALSE);
 >   }
 > @@ -4253,14 +4241,18 @@ bool find_eq_ref_candidate(TABLE *table,
 >       using Materialization or LooseScan to execute it.
 >
 >     RETURN
 > -    0 - OK
 > -    1 - Out of memory error
 > +    FALSE - OK
 > +    TRUE  - Out of memory error
 >   */
 >
 > -int pull_out_semijoin_tables(JOIN *join)
 > +bool pull_out_semijoin_tables(JOIN *join)
 >   {
 >     TABLE_LIST *sj_nest;
 >     DBUG_ENTER("pull_out_semijoin_tables");
 > +
 > +  if (join->select_lex->sj_nests.is_empty())
 > +    DBUG_RETURN(FALSE);
 > +
 >     List_iterator<TABLE_LIST>  sj_list_it(join->select_lex->sj_nests);
 >
 >     /* Try pulling out of the each of the semi-joins */
 > @@ -4343,7 +4335,12 @@ int pull_out_semijoin_tables(JOIN *join)
 >               pointers.
 >             */
 >             child_li.remove();
 > -          upper_join_list->push_back(tbl);
 > +          if (upper_join_list->push_back(tbl))
 > +          {
 > +            if (arena)
 > +              join->thd->restore_active_arena(arena,&backup);
 > +            DBUG_RETURN(TRUE);
 > +          }
 >             tbl->join_list= upper_join_list;
 >             tbl->embedding= sj_nest->embedding;
 >           }
 > @@ -4364,7 +4361,7 @@ int pull_out_semijoin_tables(JOIN *join)
 >           join->thd->restore_active_arena(arena,&backup);
 >       }
 >     }
 > -  DBUG_RETURN(0);
 > +  DBUG_RETURN(FALSE);
 >   }
 >
 >
 > @@ -8426,15 +8423,18 @@ prev_record_reads(JOIN *join, uint idx,
 >
 >   static bool fix_semijoin_strategies_for_picked_join_order(JOIN *join)
 >   {
 > -  uint tablenr;
 > +  uint tableno;

I suggest declarin the above within each loop.  It does not seem to be
referred outside of the loops.

 >     table_map remaining_tables= 0;
 >     table_map handled_tabs= 0;
 >
 >     DBUG_ENTER("fix_semijoin_strategies_for_picked_join_order");
 >
 > -  for (tablenr= join->tables - 1 ; tablenr != join->const_tables - 
1; tablenr--)
 > +  if (join->select_lex->sj_nests.is_empty())
 > +    DBUG_RETURN(FALSE);
 > +
 > +  for (tableno= join->tables - 1; tableno != join->const_tables - 1; 
tableno--)
 >     {
 > -    POSITION *pos= join->best_positions + tablenr;
 > +    POSITION *pos= join->best_positions + tableno;
 >       JOIN_TAB *s= pos->table;
 >       TABLE_LIST *emb_sj_nest= s->emb_sj_nest;
 >       uint first;
 > @@ -8471,7 +8471,7 @@ static bool fix_semijoin_strategies_for_
 >         */
 >         memcpy(pos - table_count + 1, 
emb_sj_nest->nested_join->sjm.positions,
 >                sizeof(POSITION) * table_count);
 > -      first= tablenr - table_count + 1;
 > +      first= tableno - table_count + 1;
 >         join->best_positions[first].n_sj_tables= table_count;
 >         join->best_positions[first].sj_strategy= 
SJ_OPT_MATERIALIZE_LOOKUP;
 >
 > @@ -8507,14 +8507,13 @@ static bool fix_semijoin_strategies_for_
 >         /* Add materialization record count*/
 >         prefix_rec_count *= 
mat_sj_nest->nested_join->sjm.expected_rowcount;
 >
 > -      uint i;
 >         table_map rem_tables= remaining_tables;
 > -      for (i= tablenr; i != (first + table_count - 1); i--)
 > +      for (uint i= tableno; i != (first + table_count - 1); i--)
 >           rem_tables |= join->best_positions[i].table->table->map;
 >
 >         POSITION dummy;
 >         join->cur_sj_inner_tables= 0;
 > -      for (i= first + table_count; i<= tablenr; i++)
 > +      for (uint i= first + table_count; i<= tableno; i++)
 >         {
 >           best_access_path(join, join->best_positions[i].table, 
rem_tables, i, FALSE,
 >                            prefix_rec_count, join->best_positions + 
i,&dummy);
 > @@ -8528,14 +8527,14 @@ static bool fix_semijoin_strategies_for_
 >       {
 >         first= pos->first_firstmatch_table;
 >         join->best_positions[first].sj_strategy= SJ_OPT_FIRST_MATCH;
 > -      join->best_positions[first].n_sj_tables= tablenr - first + 1;
 > +      join->best_positions[first].n_sj_tables= tableno - first + 1;
 >         POSITION dummy; // For loose scan paths
 >         double record_count= (first== join->const_tables)? 1.0:
 > -                           join->best_positions[tablenr - 
1].prefix_record_count;
 > +                           join->best_positions[tableno - 
1].prefix_record_count;
 >
 >         table_map rem_tables= remaining_tables;
 > -      uint idx;
 > -      for (idx= first; idx<= tablenr; idx++)
 > +
 > +      for (uint idx= first; idx<= tableno; idx++)
 >         {
 >           rem_tables |= join->best_positions[idx].table->table->map;
 >         }
 > @@ -8544,7 +8543,7 @@ static bool fix_semijoin_strategies_for_
 >           join buffering
 >         */
 >         join->cur_sj_inner_tables= 0;
 > -      for (idx= first; idx<= tablenr; idx++)
 > +      for (uint idx= first; idx<= tableno; idx++)
 >         {
 >           if (join->best_positions[idx].use_join_buffer)
 >           {
 > @@ -8562,18 +8561,18 @@ static bool fix_semijoin_strategies_for_
 >         POSITION *first_pos= join->best_positions + first;
 >         POSITION loose_scan_pos; // For loose scan paths
 >         double record_count= (first== join->const_tables)? 1.0:
 > -                           join->best_positions[tablenr - 
1].prefix_record_count;
 > +                           join->best_positions[tableno - 
1].prefix_record_count;
 >
 >         table_map rem_tables= remaining_tables;
 > -      uint idx;
 > -      for (idx= first; idx<= tablenr; idx++)
 > +
 > +      for (uint idx= first; idx<= tableno; idx++)
 >           rem_tables |= join->best_positions[idx].table->table->map;
 >         /*
 >           Re-run best_access_path to produce best access methods that 
do not use
 >           join buffering
 >         */
 >         join->cur_sj_inner_tables= 0;
 > -      for (idx= first; idx<= tablenr; idx++)
 > +      for (uint idx= first; idx<= tableno; idx++)
 >         {
 >           if (join->best_positions[idx].use_join_buffer || (idx == 
first))
 >           {
 > @@ -8598,7 +8597,7 @@ static bool fix_semijoin_strategies_for_
 >         */
 >         first= pos->first_dupsweedout_table;
 >         join->best_positions[first].sj_strategy= SJ_OPT_DUPS_WEEDOUT;
 > -      join->best_positions[first].n_sj_tables= tablenr - first + 1;
 > +      join->best_positions[first].n_sj_tables= tableno - first + 1;
 >       }
 >
 >       uint i_end= first + join->best_positions[first].n_sj_tables;
 > @@ -8613,7 +8612,7 @@ static bool fix_semijoin_strategies_for_
 >         handled_tabs |= join->best_positions[i].table->table->map;
 >       }
 >
 > -    if (tablenr != first)
 > +    if (tableno != first)
 >         pos->sj_strategy= SJ_OPT_NONE;
 >       remaining_tables |= s->table->map;
 >     }
 > @@ -8650,16 +8649,13 @@ static bool fix_semijoin_strategies_for_
 >
 >   static bool get_best_combination(JOIN *join)
 >   {
 > -  uint i,tablenr;
 >     table_map used_tables;
 > -  JOIN_TAB *join_tab,*j;
 >     KEYUSE *keyuse;
 > -  uint table_count;
 > +  const uint table_count= join->tables;
 >     THD *thd=join->thd;
 >     DBUG_ENTER("get_best_combination");
 >
 > -  table_count=join->tables;
 > -  if (!(join->join_tab= join_tab= new (thd->mem_root) 
JOIN_TAB[table_count]))
 > +  if (!(join->join_tab= new (thd->mem_root) JOIN_TAB[table_count]))
 >       DBUG_RETURN(TRUE);
 >
 >     join->full_join=0;
 > @@ -8669,11 +8665,12 @@ static bool get_best_combination(JOIN *j
 >     if (fix_semijoin_strategies_for_picked_join_order(join))
 >       DBUG_RETURN(TRUE);
 >
 > -  for (j=join_tab, tablenr=0 ; tablenr<  table_count ; tablenr++,j++)
 > +  for (uint tableno= 0; tableno<  table_count; tableno++)
 >     {
 > +    JOIN_TAB *j= join->join_tab + tableno;
 >       TABLE *form;
 > -    *j= *join->best_positions[tablenr].table;
 > -    form=join->all_tables[tablenr]=j->table;
 > +    *j= *join->best_positions[tableno].table;
 > +    form=join->all_tables[tableno]= j->table;
 >       used_tables|= form->map;
 >       form->reginfo.join_tab=j;
 >       if (!*j->on_expr_ref)
 > @@ -8683,31 +8680,69 @@ static bool get_best_combination(JOIN *j
 >       if (j->type == JT_CONST)
 >         continue;					// Handled in make_join_stat..
 >
 > -
 >       j->loosescan_match_tab= NULL;  //non-nulls will be set later
 >       j->ref.key = -1;
 >       j->ref.key_parts=0;
 >
 > -
 >       if (j->type == JT_SYSTEM)
 >         continue;
 >
 > -    if (j->keys.is_clear_all() || !(keyuse= 
join->best_positions[tablenr].key) ||
 > -        (join->best_positions[tablenr].sj_strategy == 
SJ_OPT_LOOSE_SCAN))
 > +    if (j->keys.is_clear_all() ||
 > +       !(keyuse= join->best_positions[tableno].key) ||

IMHO, the above line should be indented one more character.

 > +        (join->best_positions[tableno].sj_strategy == 
SJ_OPT_LOOSE_SCAN))
 >       {
 >         j->type=JT_ALL;
 > -      j->index= join->best_positions[tablenr].loosescan_key;
 > -      if (tablenr != join->const_tables)
 > +      j->index= join->best_positions[tableno].loosescan_key;
 > +      if (tableno != join->const_tables)
 >   	join->full_join=1;
 >       }
 >       else if (create_ref_for_key(join, j, keyuse, used_tables))
 >         DBUG_RETURN(TRUE);                        // Something went wrong
 >     }
 >
 > -  for (i=0 ; i<  table_count ; i++)
 > -    join->map2table[join->join_tab[i].table->tablenr]=join->join_tab+i;
 > +  for (uint tableno= 0; tableno<  table_count; tableno++)
 > +    join->map2table[join->join_tab[tableno].table->tablenr]=
 > +      join->join_tab + tableno;
 > +
 >     update_depend_map(join);
 > -  DBUG_RETURN(0);
 > +

I think the purpose of the loop below should be explained in a comment
above the love. (Or the entire loop could be put into a separate
function and explained in a function comment.)

 > +  for (uint tableno= join->const_tables; tableno<  table_count; )
 > +  {
 > +    JOIN_TAB *tab= join->join_tab + tableno;
 > +    const POSITION *pos= join->best_positions + tableno;
 > +
 > +    switch (pos->sj_strategy)
 > +    {
 > +    case SJ_OPT_NONE:
 > +      tableno++;
 > +      break;
 > +    case SJ_OPT_MATERIALIZE_LOOKUP:
 > +    case SJ_OPT_MATERIALIZE_SCAN:
 > +    case SJ_OPT_LOOSE_SCAN:
 > +    case SJ_OPT_DUPS_WEEDOUT:
 > +    case SJ_OPT_FIRST_MATCH:
 > +      /*
 > +        Remember the first and last semijoin inner tables; this 
serves to tell
 > +        a JOIN_TAB's semijoin strategy (like in 
check_join_cache_usage()).
 > +      */
 > +      JOIN_TAB *last_sj_tab= tab + pos->n_sj_tables - 1;
 > +      JOIN_TAB *last_sj_inner=
 > +        (pos->sj_strategy == SJ_OPT_DUPS_WEEDOUT) ?
 > +        /* Range may end with non-inner table so cannot set 
last_sj_inner_tab */
 > +        NULL : last_sj_tab;
 > +      for (JOIN_TAB *tab_in_range= tab;
 > +           tab_in_range<= last_sj_tab;
 > +           tab_in_range++)
 > +      {
 > +        tab_in_range->first_sj_inner_tab= tab;
 > +        tab_in_range->last_sj_inner_tab=  last_sj_inner;
 > +      }
 > +      tableno+= pos->n_sj_tables;
 > +      break;
 > +    }
 > +  }
 > +
 > +  DBUG_RETURN(FALSE);
 >   }
 >
 >
 > @@ -9350,6 +9385,22 @@ static bool pushdown_on_conditions(JOIN*
 >   }
 >
 >
 > +/**
 > +  Separates the predicates in a join condition and pushes them to the
 > +  join step where all involved tables are available in the join prefix.
 > +  ON clauses from JOIN expressions are also pushed to the most 
appropriate step.
 > +
 > +  @param join Join object where predicates are pushed.
 > +
 > +  @param cond Pointer to condition which may contain an arbitrary 
number of
 > +              predicates, combined using AND, OR and XOR items.
 > +              If NULL, equivalent to a predicate that returns TRUE 
for all
 > +              row combinations.
 > +
 > +  @retval TRUE if condition makes query always false OR an error 
occurred.
 > +  @retval FALSE otherwise.
 > +*/
 > +
 >   static bool make_join_select(JOIN *join, Item *cond)
 >   {
 >     THD *thd= join->thd;
 > @@ -9445,8 +9496,7 @@ static bool make_join_select(JOIN *join,
 >            - If we're looking at the first SJM table, reset used_tables
 >              to refer to only allowed tables
 >         */
 > -      if (tab->emb_sj_nest&&
 > -          tab->emb_sj_nest->sj_mat_exec&&
 > +      if (sj_is_materialize_strategy(tab->get_sj_strategy())&&
 >             !(used_tables&  tab->emb_sj_nest->sj_inner_tables))
 >         {
 >           save_used_tables= used_tables;
 > @@ -9689,32 +9739,38 @@ static bool make_join_select(JOIN *join,
 >
 >         DBUG_ASSERT(save_used_tables ? tab->emb_sj_nest != NULL : TRUE);
 >
 > -      if (save_used_tables&&  !(used_tables&
 > -                                ~(tab->emb_sj_nest->sj_inner_tables |
 > -                                  join->const_table_map | 
PSEUDO_TABLE_BITS)))
 > +      /*
 > +         1. We are inside a materialized semijoin nest, and
 > +         2. All inner tables of the nest are covered.
 > +      */
 > +      if (save_used_tables&& 
  // 1
 > +         !(tab->emb_sj_nest->sj_inner_tables&  ~used_tables)) 
  // 2

Would it not be an idea to create a subset function soon so that we do
not have to twist our minds to understand this each time.  This
example shows that even an experienced MySQL developer had problems
getting this right.

 >         {
 >           /*
 > -          We have reached the end of semi join nest. That is, the 
join order
 > -          looks like this:
 > +          The join order now looks like this:
 >
 > -           outer_tbl1 SJ-Materialize(inner_tbl1 ... inner_tblN) 
outer_tbl ...
 > -                                                               ^
 > - 
\-we're here
 > -          At this point, we need to produce two conditions
 > -           - A condition that can be checked when we have all of the 
sj-inner
 > -             tables (inner_tbl1 ... inner_tblN). This will be used 
while doing
 > -             materialization.
 > -           - A condition that can be checked when we have all of the 
tables
 > -             in the prefix (both inner and outer).
 > +           ot1 ... otI SJM(it1 ... itN) otI+1 ... otM
 > +                                       ^
 > +                                        \-we're here
 > +          At this point, we have generated a condition that can be 
checked
 > +          when we have all of the sj-inner tables (it1 ... itN).
 > +          This will be used while doing materialization.
 > +
 > +          In addition, we need a condition that can be checked when 
we have
 > +          all of the tables in the prefix (both inner and outer).
 > +          This condition is only generated (and used) when we have 
an SJM-scan
 > +          operation. For SJM-lookup, the condition is completely 
fulfilled
 > +          through the lookup into the materialized table.
 > +          This constraint will last as long as we do not allow 
correlated
 > +          subqueries with materialized semijoin execution.
 >           */
 > -        tab->emb_sj_nest->sj_mat_exec->join_cond=
 > -          cond ?
 > -             make_cond_after_sjm(cond, cond, save_used_tables, 
used_tables):
 > -            NULL;
 > +        if (cond&&  tab->emb_sj_nest->sj_mat_exec->is_scan)
 > +          tab->emb_sj_nest->sj_mat_exec->join_cond=
 > +            make_cond_after_sjm(cond, cond, save_used_tables, 
used_tables);
 > +
 >           used_tables= save_used_tables | used_tables;
 >           save_used_tables= 0;
 >         }
 > -
 >       }
 >     }
 >     DBUG_RETURN(0);
 > @@ -10877,8 +10933,7 @@ make_join_readinfo(JOIN *join, ulonglong
 >     uint last_sjm_table= MAX_TABLES;
 >     DBUG_ENTER("make_join_readinfo");
 >
 > -  if (!join->select_lex->sj_nests.is_empty()&&
 > -      setup_semijoin_dups_elimination(join, options, no_jbuf_after))
 > +  if (setup_semijoin_dups_elimination(join, options, no_jbuf_after))
 >       DBUG_RETURN(TRUE); /* purecov: inspected */
 >
 >     for (i=join->const_tables ; i<  join->tables ; i++)
 > @@ -12527,17 +12582,18 @@ Item *eliminate_item_equal(Item *cond, C
 >                              Item_equal *item_equal)
 >   {
 >     List<Item>  eq_list;
 > -  Item_func_eq *eq_item= 0;
 > +  Item_func_eq *eq_item= NULL;
 >     if (((Item *) item_equal)->const_item()&&  !item_equal->val_int())
 >       return new Item_int((longlong) 0,1);
 >     Item *item_const= item_equal->get_const();
 >     Item_equal_iterator it(*item_equal);
 >     Item *head;
 > -  if (item_const)
 > -    head= item_const;
 > -  else
 > +  if (!item_const)
 >     {
 > -    head= item_equal->get_first();
 > +    /*
 > +      If there is a const item, match all field items with the const 
item,
 > +      otherwise match the second and subsequent field items with the 
first one:
 > +    */
 >       it++;
 >     }
 >     Item_field *item_field;
 > @@ -12548,7 +12604,7 @@ Item *eliminate_item_equal(Item *cond, C
 >       if (upper)
 >       {
 >         if (item_const&&  upper->get_const())
 > -        item= 0;
 > +        item= NULL;
 >         else
 >         {
 >           Item_equal_iterator li(*item_equal);
 > @@ -12563,57 +12619,42 @@ Item *eliminate_item_equal(Item *cond, C
 >       {
 >         if (eq_item)
 >           eq_list.push_back(eq_item);
 > -      /*
 > -        item_field might refer to a table that is within a semi-join
 > -        materialization nest. In that case, the join order looks 
like this:
 >
 > -          outer_tbl1 outer_tbl2 SJM (inner_tbl1 inner_tbl2) outer_tbl3
 > +      /*
 > +        item_field may refer to a table that is within a semijoin

I suggest prefixing item_field with @c, so that doxygen formats the
name as a C++ identifier.

 > +        materialization nest. In that case, the join order may look 
like:
 >
 > -        We must not construct equalities like
 > +          ot1 ot2 SJM (it3 it4) ot5
 >
 > -           outer_tbl1.col = inner_tbl1.col
 > +        If we have a multiple equality (ot1.c1, ot2.c2, it3.c3, 
it4.c4, ot5.c5),
 > +        we should generate the following equalities:
 > +         1. ot1.c1 = ot2.c2
 > +         2. ot1.c1 = it1.c3
 > +         3. it1.c3 = it2.c4
 > +         4. ot1.c1 = ot5.c5
 > +
 > +        Equalities 1) and 4) are regular equalities between two 
outer tables.
 > +        Equality 2) is an equality that matches the outer query with a
 > +        materialized semijoin table. It is either performed as a lookup
 > +        into the materialized table (SJM-lookup), or as a condition 
on the
 > +        outer table (SJM-scan).
 > +        Equality 3) is evaluated during semijoin materialization.
 > +
 > +        If there is a const item, match against this one.
 > +        Otherwise, match against the first field item in the 
multiple equality,
 > +        unless the item is within a materialized semijoin nest, 
where we match
 > +        against the first item within the SJM nest (if the item is 
not the first
 > +        item within the SJM nest), or match against the first item 
in the
 > +        list (if the item is the first one in the SJM list).
 > +      */

The last part got a bit complex.  The terminology is also a bit
inconsistent.  I think you use SJM nest and SJM list for the same
thing, and multiple equality and list likewise.

I also think you need to explain get_first(Item_field*) role in this.
Otherwise, it is not easy to understand the connection between the
description above and the code below.

 > +      head= item_const ? item_const : item_equal->get_first(item);
 > +      if (head == item)
 > +        head= item_equal->get_first();
 >
 > -        because they would get attached to inner_tbl1 and will get 
evaluated
 > -        during materialization phase, when we don't have current 
value of
 > -        outer_tbl1.col.
 > -      */
 > -      TABLE_LIST *emb_nest=
 > -        item_field->field->table->pos_in_table_list->embedding;
 > -      if (!item_const&&  emb_nest&&  emb_nest->sj_mat_exec)
 > -      {
 > -        /*
 > -          Find the first equal expression that refers to a table that is
 > -          within the semijoin nest. If we can't find it, do nothing
 > -        */
 > -        List_iterator<Item_field>  fit(item_equal->fields);

Since this function no longer refers to Item_equal::fields, it can be
removed as friend of Item_equal.

 > -        Item_field *head_in_sjm;
 > -        bool found= FALSE;
 > -        while ((head_in_sjm= fit++))
 > -        {
 > -          if (head_in_sjm->used_tables()&  emb_nest->sj_inner_tables)
 > -          {
 > -            if (head_in_sjm == item_field)
 > -            {
 > -              /* This is the first table inside the semi-join*/
 > -              eq_item= new Item_func_eq(item_field, head);
 > -              /* Tell make_cond_for_table don't use this. */
 > -              eq_item->marker=3;
 > -            }
 > -            else
 > -            {
 > -              eq_item= new Item_func_eq(item_field, head_in_sjm);
 > -              found= TRUE;
 > -            }
 > -            break;
 > -          }
 > -        }
 > -        if (!found)
 > -          continue;
 > -      }
 > -      else
 > -        eq_item= new Item_func_eq(item_field, head);
 > +      eq_item= new Item_func_eq(item_field, head);

I think it is a bit confusing that, AFAICT, both item and item_field
is used to refer to the same item.  I suggest that inside the
item==item_field block that all references are to item_field.

 >         if (!eq_item)
 > -        return 0;
 > +        return NULL;
 > +
 >         eq_item->set_cmp_func();
 >         eq_item->quick_fix_field();
 >       }
 > @@ -16908,28 +16949,27 @@ sub_select_sjm(JOIN *join, JOIN_TAB *joi
 >         Ok, materialization finished. Initialize the access to the 
temptable
 >       */
 >       sjm->materialized= TRUE;
 > +
 >       join_tab->read_record.read_record= join_no_more_records;
 > -    if (sjm->is_scan)
 > -    {
 > -      /* Initialize full scan */
 > -      JOIN_TAB *last_tab= join_tab + (sjm->table_count - 1);
 > -      init_read_record(&last_tab->read_record, join->thd,
 > -                       sjm->table, NULL, TRUE, TRUE, FALSE);
 > -
 > -      DBUG_ASSERT(last_tab->read_record.read_record == rr_sequential);
 > -      last_tab->read_first_record= join_read_record_no_init;
 > -      last_tab->read_record.copy_field= sjm->copy_field;
 > -      last_tab->read_record.copy_field_end= sjm->copy_field +
 > -                                            sjm->table_cols.elements;
 > -      last_tab->read_record.read_record= rr_sequential_and_unpack;
 > -    }
 >     }
 >
 >     if (sjm->is_scan)
 >     {
 > -    /* Do full scan of the materialized table */
 > +    /*
 > +      For each prefix partial record (if any), initialize and perform
 > +      a full scan of the materialized table.
 > +    */
 >       JOIN_TAB *last_tab= join_tab + (sjm->table_count - 1);
 >
 > +    init_read_record(&last_tab->read_record, join->thd,
 > +                     sjm->table, NULL, TRUE, TRUE, FALSE);
 > +
 > +    last_tab->read_first_record= join_read_record_no_init;
 > +    last_tab->read_record.copy_field= sjm->copy_field;
 > +    last_tab->read_record.copy_field_end= sjm->copy_field +
 > +                                          sjm->table_cols.elements;
 > +    last_tab->read_record.read_record= rr_sequential_and_unpack;
 > +

Since this code does not deal anything with equality predicates, I
would prefer if the above bug fix was separate patch.

Anyhow, I am afraid that readers of this code, may be a bit puzzled
that you set last_tab->read_first_record and
last_tab->read_record.read_record for each scan.  I doubt that which
functions to use will change between scans.  copy_field probably does
not change either.

 >       Item *save_cond= last_tab->select_cond;
 >       last_tab->set_select_cond(sjm->join_cond, __LINE__);
 >       rc= sub_select(join, last_tab, end_of_records);
 > @@ -18840,21 +18880,23 @@ static bool replace_subcondition(JOIN *j
 >   }
 >
 >
 > -/*
 > +/**
 >     Extract a condition that can be checked after reading given table
 >
 > -  SYNOPSIS
 > -    make_cond_for_table()
 > -      cond         Condition to analyze
 > -      tables       Tables for which "current field values" are available
 > -      used_table   Table that we're extracting the condition for (may
 > -                   also include PSEUDO_TABLE_BITS
 > -      exclude_expensive_cond  Do not push expensive conditions
 > +  @param cond       Condition to analyze
 > +  @param tables     Tables for which "current field values" are 
available
 > +  @param used_table Table that we're extracting the condition for (may
 > +                    also include PSEUDO_TABLE_BITS, and may be zero)
 > +  @param exclude_expensive_cond  Do not push expensive conditions
 >
 > -  DESCRIPTION
 > +  @retval<>NULL Generated condition
 > +  @retval = NULL Already checked, OR error
 > +
 > +  @details
 >       Extract the condition that can be checked after reading the table
 >       specified in 'used_table', given that current-field values for 
tables
 >       specified in 'tables' bitmap are available.
 > +    If 'used_table' is 0, extract conditions for all tables in 'tables'.
 >
 >       The function assumes that
 >         - Constant parts of the condition has already been checked.
 > @@ -18865,12 +18907,12 @@ static bool replace_subcondition(JOIN *j
 >       guaranteed to be true by employed 'ref' access methods (the 
code that
 >       does this is located at the end, search down for "EQ_FUNC").
 >
 > -
 > -  SEE ALSO
 > +  @see
 >       make_cond_for_info_schema uses similar algorithm
 >
 > -  RETURN
 > -    Extracted condition
 > +  @note
 > +    Make sure to keep the implementations of make_cond_for_table() and
 > +    make_cond_after_sjm() synchronized.
 >   */
 >
 >   static Item *
 > @@ -18880,26 +18922,32 @@ make_cond_for_table(Item *cond, table_ma
 >     return make_cond_for_table_from_pred(cond, cond, tables, used_table,
 >                                          exclude_expensive_cond);
 >   }
 > -
 > +
 >   static Item *
 >   make_cond_for_table_from_pred(Item *root_cond, Item *cond,
 >                                 table_map tables, table_map used_table,
 >                                 bool exclude_expensive_cond)
 >   {
 > -  if (used_table&&  !(cond->used_tables()&  used_table)&&
 > -      /*
 > -        Exclude constant conditions not checked at optimization time if
 > +  /*
 > +    Ignore this condition if
 > +     1. We are extracting conditions for a specific table, and
 > +     2. that table is not referenced by the condition, and
 > +     3. exclude constant conditions not checked at optimization time if
 >           the table we are pushing conditions to is the first one.
 >           As a result, such conditions are not considered as already 
checked
 >           and will be checked at execution time, attached to the 
first table.
 > -
 > +  */
 > +  if (used_table&&                                                  // 1
 > +      !(cond->used_tables()&  used_table)&&                        
// 2
 > +      /*
 >           psergey: TODO: "used_table&  1" doesn't make sense in 
nearly any
 >           context. Look at setup_table_map(), table bits reflect the 
order
 >           the tables were encountered by the parser. Check what we should
 >           replace this condition with.
 >         */
 > -      !((used_table&  1)&&  cond->is_expensive()))
 > -    return (Item*) 0;				// Already checked
 > +      !((used_table&  1)&&  cond->is_expensive()))                 
// 3
 > +    return NULL;
 > +
 >     if (cond->type() == Item::COND_ITEM)
 >     {
 >       if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC)
 > @@ -18907,48 +18955,48 @@ make_cond_for_table_from_pred(Item *root
 >         /* Create new top level AND item */
 >         Item_cond_and *new_cond=new Item_cond_and;
 >         if (!new_cond)
 > -	return (Item*) 0;			// OOM /* purecov: inspected */
 > +        return NULL;
 >         List_iterator<Item>  li(*((Item_cond*) cond)->argument_list());
 >         Item *item;
 >         while ((item=li++))
 >         {
 > -	Item *fix=make_cond_for_table_from_pred(root_cond, item,
 > +        Item *fix=make_cond_for_table_from_pred(root_cond, item,

Since you are changing the above line anyway, maybe you should change
to the MySQL-standard assignment format, too.  (There are several
cases of this.)


 >                                                   tables, used_table,
 > 
exclude_expensive_cond);
 > -	if (fix)
 > -	  new_cond->argument_list()->push_back(fix);
 > +        if (fix)
 > +          new_cond->argument_list()->push_back(fix);
 >         }
 >         switch (new_cond->argument_list()->elements) {
 >         case 0:
 > -	return (Item*) 0;			// Always true
 > +        return NULL;                          // Always true
 >         case 1:
 > -	return new_cond->argument_list()->head();
 > +        return new_cond->argument_list()->head();
 >         default:
 > -	/*
 > -	  Item_cond_and do not need fix_fields for execution, its parameters
 > -	  are fixed or do not need fix_fields, too
 > -	*/
 > -	new_cond->quick_fix_field();
 > -	new_cond->used_tables_cache=
 > -	  ((Item_cond_and*) cond)->used_tables_cache&
 > -	  tables;
 > -	return new_cond;
 > +        /*
 > +          Item_cond_and do not need fix_fields for execution, its 
parameters
 > +          are fixed or do not need fix_fields, too
 > +        */
 > +        new_cond->quick_fix_field();
 > +        new_cond->used_tables_cache=
 > +          ((Item_cond_and*) cond)->used_tables_cache&
 > +          tables;

I suggest merging the to lines above.


 > +          return new_cond;
 >         }
 >       }
 >       else
 > -    {						// Or list
 > +    {                                         // Or list
 >         Item_cond_or *new_cond=new Item_cond_or;
 >         if (!new_cond)
 > -	return (Item*) 0;			// OOM /* purecov: inspected */
 > +        return NULL;
 >         List_iterator<Item>  li(*((Item_cond*) cond)->argument_list());
 >         Item *item;
 >         while ((item=li++))
 >         {
 > -	Item *fix=make_cond_for_table_from_pred(root_cond, item,
 > +        Item *fix=make_cond_for_table_from_pred(root_cond, item,
 >                                                   tables, 0L,
 > 
exclude_expensive_cond);
 >   	if (!fix)
 > -	  return (Item*) 0;			// Always true
 > +          return NULL;                        // Always true
 >   	new_cond->argument_list()->push_back(fix);
 >         }
 >         /*
 > @@ -18963,20 +19011,26 @@ make_cond_for_table_from_pred(Item *root
 >     }
 >
 >     /*
 > -    Because the following test takes a while and it can be done
 > -    table_count times, we mark each item that we have examined with 
the result
 > -    of the test
 > -  */
 > +    Omit this condition if
 > +     1. It has been marked as omittable before, or
 > +     2. Some tables referred by the condition are not available, or
 > +     3. We are extracting conditions for any tables, the condition is
 > +        considered 'expensive', and we want to delay evaluation of such
 > +        conditions to the execution phase.
 > +  */
 > +  if (cond->marker == 3 || 
    // 1
 > +      (cond->used_tables()&  ~tables) || 
    // 2
 > +      (!used_table&&  exclude_expensive_cond&& 
cond->is_expensive())) // 3
 > +    return NULL;
 >
 > -  if (cond->marker == 3 || (cond->used_tables()&  ~tables) ||
 > -      /*
 > -        When extracting constant conditions, treat expensive 
conditions as
 > -        non-constant, so that they are not evaluated at optimization 
time.
 > -      */
 > -      (!used_table&&  exclude_expensive_cond&& 
cond->is_expensive()))
 > -    return (Item*) 0;				// Can't check this yet
 > -  if (cond->marker == 2 || cond->eq_cmp_result() == Item::COND_OK)
 > -    return cond;				// Not boolean op
 > +  /*
 > +    Extract this condition if
 > +     1. It has already been marked as applicable, or
 > +     2. It is not a<comparison predicate> 
(=,<,>,<=,>=,<=>)
 > +  */
 > +  if (cond->marker == 2 || 
    // 1
 > +      cond->eq_cmp_result() == Item::COND_OK) 
    // 2
 > +    return cond;
 >
 >     /*
 >       Remove equalities that are guaranteed to be true by use of 
'ref' access
 > @@ -19004,19 +19058,19 @@ make_cond_for_table_from_pred(Item *root
 >       Item *left_item= ((Item_func*) cond)->arguments()[0]->real_item();
 >       Item *right_item= ((Item_func*) cond)->arguments()[1]->real_item();
 >       if (left_item->type() == Item::FIELD_ITEM&&
 > -	test_if_ref(root_cond, (Item_field*) left_item,right_item))
 > +        test_if_ref(root_cond, (Item_field*) left_item,right_item))
 >       {
 > -      cond->marker=3;			// Checked when read
 > -      return (Item*) 0;
 > +      cond->marker=3;                    // Condition can be omitted
 > +      return NULL;
 >       }
 >       if (right_item->type() == Item::FIELD_ITEM&&
 > -	test_if_ref(root_cond, (Item_field*) right_item,left_item))
 > +        test_if_ref(root_cond, (Item_field*) right_item,left_item))
 >       {
 > -      cond->marker=3;			// Checked when read
 > -      return (Item*) 0;
 > +      cond->marker=3;                   // Condition can be omitted
 > +      return NULL;
 >       }

I suggest merging the above two if-statements.  No need to duplicate
this code.  If readability is a concern, it can be achieved with
white-space.

Even better, make a small inline function so that you can write

   if (x(root_cond, right_item, left_item) ||
       x(root_cond, left_item, right_item))


 >     }
 > -  cond->marker=2;
 > +  cond->marker=2;                       // Mark condition as applicable
 >     return cond;
 >   }
 >
 > @@ -19030,24 +19084,35 @@ make_cond_for_table_from_pred(Item *root
 >     @param sjm_tables Tables within the semi-join nest (the inner part).
 >
 >     @retval<>NULL Generated condition
 > -  @retval = NULL Already checked, or error
 > +  @retval = NULL Already checked, OR error
 >
 >     @details
 >     A regular semi-join materialization is always non-correlated, ie
 >     the subquery is always resolved by performing a lookup generated in
 > -  create_subquery_equalities, hence this function should never produce
 > +  create_subquery_equalities, hence this function never needs to produce
 >     any condition for regular semi-join materialization.
 >     For a scan semi-join materialization, this function may return a 
condition
 > -  to be checked.
 > +  to be checked, when there are outer tables before the SJM tables 
in the
 > +  join prefix.
 > +
 > +  @note
 > +    Make sure to keep the implementations of make_cond_for_table() and
 > +    make_cond_after_sjm() synchronized.
 >   */
 >
 >   static Item *
 >   make_cond_after_sjm(Item *root_cond, Item *cond, table_map tables,
 >                       table_map sjm_tables)
 >   {
 > +  /*
 > +    We can only test conditions that cover tables from the join prefix
 > +    and tables from the semijoin nest. Other conditions will be handled
 > +    by make_cond_for_table().
 > +  */
 >     if ((!(cond->used_tables()&  ~tables) ||
 >          !(cond->used_tables()&  ~sjm_tables)))
 > -    return (Item*) 0;				// Already checked
 > +    return NULL;
 > +
 >     if (cond->type() == Item::COND_ITEM)
 >     {
 >       if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC)
 > @@ -19055,49 +19120,49 @@ make_cond_after_sjm(Item *root_cond, Ite
 >         /* Create new top level AND item */
 >         Item_cond_and *new_cond=new Item_cond_and;
 >         if (!new_cond)
 > -	return (Item*) 0;			// OOM /* purecov: inspected */
 > +        return NULL;
 >         List_iterator<Item>  li(*((Item_cond*) cond)->argument_list());
 >         Item *item;
 >         while ((item=li++))
 >         {
 > -	Item *fix=make_cond_after_sjm(root_cond, item, tables, sjm_tables);
 > -	if (fix)
 > -	  new_cond->argument_list()->push_back(fix);
 > +        Item *fix=make_cond_after_sjm(root_cond, item, tables, 
sjm_tables);
 > +        if (fix)
 > +          new_cond->argument_list()->push_back(fix);
 >         }
 >         switch (new_cond->argument_list()->elements) {
 >         case 0:
 > -	return (Item*) 0;			// Always true
 > +        return NULL;                    // Always true
 >         case 1:
 > -	return new_cond->argument_list()->head();
 > +        return new_cond->argument_list()->head();
 >         default:
 >   	/*
 > -	  Item_cond_and do not need fix_fields for execution, its parameters
 > -	  are fixed or do not need fix_fields, too
 > +          Item_cond_and do not need fix_fields for execution, its 
parameters
 > +          are fixed or do not need fix_fields, too
 >   	*/
 > -	new_cond->quick_fix_field();
 > -	new_cond->used_tables_cache=
 > -	  ((Item_cond_and*) cond)->used_tables_cache&
 > -	  tables;
 > -	return new_cond;
 > +        new_cond->quick_fix_field();
 > +        new_cond->used_tables_cache=
 > +          ((Item_cond_and*) cond)->used_tables_cache&
 > +          tables;

Likewise.

 > +        return new_cond;
 >         }
 >       }
 >       else
 > -    {						// Or list
 > +    {                                          // Or list
 >         Item_cond_or *new_cond=new Item_cond_or;
 >         if (!new_cond)
 > -	return (Item*) 0;			// OOM /* purecov: inspected */
 > +        return NULL;
 >         List_iterator<Item>  li(*((Item_cond*) cond)->argument_list());
 >         Item *item;
 >         while ((item=li++))
 >         {
 > -	Item *fix= make_cond_after_sjm(root_cond, item, tables, 0L);
 > -	if (!fix)
 > -	  return (Item*) 0;			// Always true
 > -	new_cond->argument_list()->push_back(fix);
 > +        Item *fix= make_cond_after_sjm(root_cond, item, tables, 0L);
 > +        if (!fix)
 > +          return NULL;                  // Always true
 > +        new_cond->argument_list()->push_back(fix);
 >         }
 >         /*
 > -	Item_cond_and do not need fix_fields for execution, its parameters
 > -	are fixed or do not need fix_fields, too
 > +        Item_cond_and do not need fix_fields for execution, its 
parameters
 > +        are fixed or do not need fix_fields, too
 >         */
 >         new_cond->quick_fix_field();
 >         new_cond->used_tables_cache= ((Item_cond_or*) 
cond)->used_tables_cache;
 > @@ -19107,15 +19172,22 @@ make_cond_after_sjm(Item *root_cond, Ite
 >     }
 >
 >     /*
 > -    Because the following test takes a while and it can be done
 > -    table_count times, we mark each item that we have examined with 
the result
 > -    of the test
 > +    Omit this condition if
 > +     1. It has been marked as omittable before, or
 > +     2. Some tables referred by the condition are not available.
 >     */
 > +  if (cond->marker == 3 || 
    // 1
 > +      cond->used_tables()&  ~(tables | sjm_tables)) 
    // 2
 > +    return NULL;
 >
 > -  if (cond->marker == 3 || (cond->used_tables()&  ~(tables | 
sjm_tables)))
 > -    return (Item*) 0;				// Can't check this yet
 > -  if (cond->marker == 2 || cond->eq_cmp_result() == Item::COND_OK)
 > -    return cond;				// Not boolean op
 > +  /*
 > +    Extract this condition if
 > +     1. It has already been marked as applicable, or
 > +     2. It is not a<comparison predicate> 
(=,<,>,<=,>=,<=>)
 > +  */
 > +  if (cond->marker == 2 || 
    // 1
 > +      cond->eq_cmp_result() == Item::COND_OK) 
    // 2
 > +    return cond;
 >
 >     /*
 >       Remove equalities that are guaranteed to be true by use of 
'ref' access
 > @@ -19128,17 +19200,17 @@ make_cond_after_sjm(Item *root_cond, Ite
 >       if (left_item->type() == Item::FIELD_ITEM&&
 >   	test_if_ref(root_cond, (Item_field*) left_item,right_item))
 >       {
 > -      cond->marker=3;			// Checked when read
 > -      return (Item*) 0;
 > +      cond->marker=3;                   // Condition can be omitted
 > +      return NULL;
 >       }
 >       if (right_item->type() == Item::FIELD_ITEM&&
 >   	test_if_ref(root_cond, (Item_field*) right_item,left_item))
 >       {
 > -      cond->marker=3;			// Checked when read
 > -      return (Item*) 0;
 > +      cond->marker=3;                   // Condition can be omitted
 > +      return NULL;

Likewise.

 >       }
 >     }
 > -  cond->marker=2;
 > +  cond->marker=2;                       // Mark condition as applicable
 >     return cond;
 >   }
 >
 >
 >
 >
 >
 >

-- 
Øystein

Thread
bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265) Bug#45174Bug#50019 Bug#52068Roy Lyseng22 Oct
  • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265)Bug#45174 Bug#50019 Bug#52068Guilhem Bichot22 Oct
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265)Bug#45174 Bug#50019 Bug#52068Roy Lyseng22 Oct
  • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265)Bug#45174 Bug#50019 Bug#52068Øystein Grøvlen22 Oct
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265)Bug#45174 Bug#50019 Bug#52068Roy Lyseng25 Oct