List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:October 20 2010 9:13pm
Subject:Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3260)
Bug#45174 Bug#50019 Bug#52068
View as plain text  
Hello Roy,

Really good work. I hope we can close the remaining points in time 
before I go on vacation :-/
I realize some of my comments are not requests below; to make it clear 
and given the short time available, I marked my comments with different 
tags:
REQ means a request for a change, or a question to answer, or tell me 
why I'm wrong (i.e. something which I'd like to be solved before 
approving the patch)
BLAH is an observation, which doesn't require any action.

Roy Lyseng a écrit, Le 07.10.2010 17:19:
> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
> revid:tor.didriksen@stripped
> 
>  3260 Roy Lyseng	2010-10-07
>       Bug#45174: Incorrectly applied equality propagation caused wrong result
>       on a query with a materialized semi-join.
>       Bug#50019: Wrong result for IN-query with materialization.
>       Bug#52068: Optimizer generates invalid semijoin materialization plan
>       
>       When a subquery is subject to a semijoin optimization, it's tables

REQ it's -> its

>       are merged to the outer query and later are treated as regular tables. 
>       This allows a bunch of optimizations to be applied, equality
>       propagation is among them. Equality propagation is done after query
>       execution plan is chosen. It substitutes fields from tables being
>       retrieved later for fields from tables being retrieved earlier.
>       However, it can't be applied as is to any semijoin table.
>       The semijoin materialization strategy differs from other semijoin
>       strategies that the data from materialized semijoin tables isn't used
>       directly but saved to a temporary table first.
>       The materialization isn't isolated in a separate step, it is done
>       inline within the nested loop execution.
>       When it comes to fetching rows from the first table in the block of
>       materialized semijoin tables, sub_select() function is called to
>       materialize the result of the subquery and save it in the
>       materialized table. Later, data from the materialized table is used
>       as they were regular table rows.

REQ "as" -> "as if"

>       Due to this we can't substitute fields that belong to the semi-join
>       for fields from outer query and vice versa. 
>       
>       Example: suppose we have a join order:
>       
>         ot1 ot2  SJ-Mat(it1  it2  it3)  ot3
>       
>       and equality ot2.col = it1.col = it2.col
>       If we're looking for best substitute for 'it2.col', we should pick
>       it1.col and not ot2.col.
>       
>       For a field that is not in a materialized semijoin we can use any field,
>       even those that are embedded in a materialized semijoin. This is because
>       such fields are "copied back" to their original join-tab structures when
>       the materialized temporary table is being read.
>       
>       Now we have added another Item_equal::get_first() function that accepts
>       as a parameter a field being substituted and checks whether it belongs
>       to a materialized semijoin.
>       The field to substitute will be from the same materialized semijoin nest
>       (if supplied field is within such nest), otherwise it will be the first
>       field in the multiple equality.
>       
>       The new checks rely on the first_sj_inner_tab and first_sj_inner_tab
>       fields of the join-tab. These fields are therefore set as soon as
>       possible after the join strategy is fixed.
>       
>       Also fixed problem appearing in Bug#52068: When MaterializeScan
>       semijoin strategy was used and there were one or more outer dependent
>       tables before the semijoin tables, the scan over the materialized
>       table was not properly reset for each row of the prefix outer tables.
>       
>       Also fixed problems with pushdown of SJM-aware predicates during
>       make_join_select(): wrong predicates were sometimes generated,
>       make_cond_after_sjm() was called at the wrong position, and
>       make_cond_after_sjm() was never actually considering the pushed-down
>       SJM predicates.

>       mysql-test/r/subquery_sj_none_jcl7.result
>         Bug#45174: Incorrectly applied equality propagation caused wrong result
>         on a query with a materialized semi-join.
>         Bug#50019: Wrong result for IN-query with materialization.
>         Bug#52068: Optimizer generates invalid semijoin materialization plan
>         Results for three new tests added.
>         Some tests using semijoin materialization show that where clause
>         has moved from the outer query into the materialized inner query.
>         This is caused by the changed call to get_first() in
>         eliminate_item_equal().
>         Ex: select * from ot where a in(select b from it where b>0);
>         The clause "b>0" is now evaluated on the inner query materialization.
>         Performance-wise this is never worse when using MaterializeScan and
>         usually better for MaterializeLookup. For the latter strategy, the
>         best possible solution is probably to evaluate the clause in both
>         queries, this can be subject for a later feature development.

REQ question: should we create a WL for this later hypothetical development?

>       sql/sql_select.cc
>         Bug#45174: Incorrectly applied equality propagation caused wrong result
>         on a query with a materialized semi-join.
>         Bug#50019: Wrong result for IN-query with materialization.
>         Bug#52068: Optimizer generates invalid semijoin materialization plan
>       
>         Setting fields first_sj_inner_tab and last_sj_inner_tab moved from
>         setup_semijoin_dups_elimination() to get_best_combination(), so they
>         are set as early as possible after join order optimization.
>         
>         In make_join_select(), the test that determined when to pushdown
>         SJM-specific predicates was wrong, in addition to approving the
>         comments.

REQ approving -> improving?

>       
>         The logic of eliminate_item_equal() has been simplified and
>         adjusted so that it generates equalities that are useful also
>         when the semijoin materialization strategy is being used.
>         Some simplification was possible by taking advantage of the new
>         Item_equal::get_first() function.
>       
>         In sub_select_sjm(), moved code that initializes the scan over the
>         materialized table so that it is now performed for each scan of
>         table, instead of only for the first scan.
>       
>         In make_cond_for_table_from_pred(), a number of comments has been
>         added, and TAB characters are replaced by spaces.
>       
>         In make_cond_after_sjm(), make sure that it handles equalities
>         generated for semijoin materialization (with marker=3).
>         Otherwise, removed marker optimizations for this function, as
>         it will only be called once per materialized semijoin nest
>         in a query. Added comments and removed TAB characters.


> === modified file 'sql/item_cmpfunc.cc'
> --- a/sql/item_cmpfunc.cc	2010-09-28 15:17:29 +0000
> +++ b/sql/item_cmpfunc.cc	2010-10-07 15:15:10 +0000
> @@ -5816,3 +5816,101 @@ void Item_equal::print(String *str, enum
>    str->append(')');
>  }
>  
> +
> +/**
> +  Get the first field of multiple equality, use for semantic checking.
> +
> +  @retval First field in the multiple equality.
> +*/
> +
> +Item_field* Item_equal::get_first()
> +{
> +  return fields.head();

REQ This one is so small that I would leave it inline as it was (we can 
move the new comment to item_cmpfunc.h)...?

> +}
> +
> +/**
> +  Get the first equal field of multiple equality, use when execution order
> +  must be considered (see details below).
> +
> +  @param field  field to get substitution field for, which must be
> +                present within the multiple equality itself.
> +
> +  @retval Found first field in the multiple equality.
> +
> +  @details Get the first field of multiple equality that is equal to the
> +  given field. In order to make semijoin materialization strategy work
> +  correctly we can't propagate equal fields between a materialized semijoin
> +  and the outer query (or any other semijoin) unconditionally.
> +  Thus the field is returned according to following rules:
> +
> +  1) If the given field belongs to a materialized semijoin then the
> +     first field in multiple equality which belong to the same semijoin
> +     is returned.
> +  2) If the given field doesn't belong to a materialized semijoin then
> +     the first field in the multiple equality is returned.
> +*/
> +
> +Item_field* Item_equal::get_first(Item_field *field)

REQ the parameter can be declared as
"const Item_field *field"

> === modified file 'sql/item_cmpfunc.h'
> --- a/sql/item_cmpfunc.h	2010-08-12 00:26:10 +0000
> +++ b/sql/item_cmpfunc.h	2010-10-07 15:15:10 +0000
> @@ -1634,7 +1634,8 @@ public:
>    void add(Item_field *f);
>    uint members();
>    bool contains(Field *field);
> -  Item_field* get_first() { return fields.head(); }
> +  Item_field* get_first();
> +  Item_field* get_first(Item_field *field);

BLAH As already discussed in a previous round of comments, I wanted to 
add "const" at the declarations' end, to stress that those only read 
data, but as you observed, compiler does not permit. To be more precise 
(I always get caught by this): declaring
Item_field* get_first() const
{
   return fields.head();
}
means that "fields" becomes a constant list; thus head() can be applied 
to it only if it is a const function, which it is not. If I make head() 
be a const function in sql_list.h, it gets better. But not quite:
the iterator in get_first(field) is then a problem:
List_iterator<Item_field> it(fields);
because as "fields" is a constant list, it can be iterated only with a
const_iterator, and we don't have that in sql_list.h. I gave up :-)

> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc	2010-09-30 14:53:11 +0000
> +++ b/sql/sql_select.cc	2010-10-07 15:15:10 +0000
> @@ -4338,7 +4330,8 @@ int pull_out_semijoin_tables(JOIN *join)
>              pointers.
>            */
>            child_li.remove();
> -          upper_join_list->push_back(tbl);
> +          if (upper_join_list->push_back(tbl))
> +            DBUG_RETURN(TRUE);

REQ I think we should restore the active arena before returning TRUE, 
even though this is an error. Who knows what happens if we don't restore 
it... Also, I would add unlikely() around the if(), it should happen rarely.

> @@ -8639,16 +8635,13 @@ static bool fix_semijoin_strategies_for_
>  
>  static bool get_best_combination(JOIN *join)
>  {
> -  uint i,tablenr;
>    table_map used_tables;
> -  JOIN_TAB *join_tab,*j;
>    KEYUSE *keyuse;
> -  uint table_count;
> +  uint table_count= join->tables;

REQ make it "const uint"

> +    case SJ_OPT_FIRST_MATCH:
> +      /*
> +        Remember the first and last semijoin inner tables; this serves to tell
> +        a JOIN_TAB's semijoin strategy (like in check_join_cache_usage()).
> +      */
> +      JOIN_TAB *last_sj_tab= tab + pos->n_sj_tables - 1;
> +      JOIN_TAB *last_sj_inner=
> +        (pos->sj_strategy == SJ_OPT_DUPS_WEEDOUT) ?
> +        /* Range may end with non-inner table so cannot set last_sj_inner_tab */
> +        NULL : last_sj_tab;
> +      for (JOIN_TAB *tab_in_range= tab; 
> +           tab_in_range <= last_sj_tab; 
> +           tab_in_range++)

REQ there is trailing whitespace in those for() lines.

> @@ -9678,32 +9710,32 @@ static bool make_join_select(JOIN *join,

REQ You seem to understand well what make_join_select() does; could you
please add a short doxygen comment for it?

>        DBUG_ASSERT(save_used_tables ? tab->emb_sj_nest != NULL : TRUE);

BLAH save_used_tables is set here:
       if (tab->emb_sj_nest &&
           tab->emb_sj_nest->sj_mat_exec &&
           !(used_tables & tab->emb_sj_nest->sj_inner_tables))
       {
         save_used_tables= used_tables;
At your option, the two first expressions in the if() clause could maybe
be replaced with
sj_is_materialize_strategy(tab->get_sj_strategy())

> -      if (save_used_tables && !(used_tables & 
> -                                ~(tab->emb_sj_nest->sj_inner_tables |
> -                                  join->const_table_map | PSEUDO_TABLE_BITS)))

REQ Do I understand correctly: this test was wrong because, if there 
were 2 inner tables and we were positioned after the first and before 
the second, used_tables would be:
   const tables + some pseudo table bits + first inner
and so the if() above would be true, which is not correct, as we are not
yet at the last inner so should not call make_cond_after_sjm()?
Is that what you call "make_cond_after_sjm() was called at the wrong
position" in the commit comment?
I put back the old code, ran the test suite, and no SELECT gave bad
results. Is it normal? Maybe we need some test to cover that?

> +      /*
> +         1. We are inside a materialized semijoin nest, and
> +         2. All inner tables of the nest are covered.
> +      */ 
> +      if (save_used_tables &&                                        // 1
> +         !(tab->emb_sj_nest->sj_inner_tables & ~used_tables))        // 2
>        {
>          /*
> -          We have reached the end of semi join nest. That is, the join order

> @@ -10866,8 +10898,7 @@ make_join_readinfo(JOIN *join, ulonglong
>    uint last_sjm_table= MAX_TABLES;
>    DBUG_ENTER("make_join_readinfo");
>  
> -  if (!join->select_lex->sj_nests.is_empty() &&
> -      setup_semijoin_dups_elimination(join, options, no_jbuf_after))
> +  if (setup_semijoin_dups_elimination(join, options, no_jbuf_after))
>      DBUG_RETURN(TRUE); /* purecov: inspected */
>  
>    for (i=join->const_tables ; i < join->tables ; i++)
> @@ -12516,17 +12547,18 @@ Item *eliminate_item_equal(Item *cond, C
>                             Item_equal *item_equal)
>  {
>    List<Item> eq_list;
> -  Item_func_eq *eq_item= 0;
> +  Item_func_eq *eq_item= NULL;
>    if (((Item *) item_equal)->const_item() && !item_equal->val_int())
>      return new Item_int((longlong) 0,1); 
>    Item *item_const= item_equal->get_const();
>    Item_equal_iterator it(*item_equal);
>    Item *head;
> -  if (item_const)
> -    head= item_const;
> -  else
> +  if (!item_const)
>    {
> -    head= item_equal->get_first();
> +    /*
> +      If there is a const item, match all field items with the const item,
> +      otherwise match the second and subsequent field items with the first one:
> +    */
>      it++;
>    }
>    Item_field *item_field;
> @@ -12537,7 +12569,7 @@ Item *eliminate_item_equal(Item *cond, C
>      if (upper)
>      { 
>        if (item_const && upper->get_const())
> -        item= 0;
> +        item= NULL;
>        else
>        {
>          Item_equal_iterator li(*item_equal);
> @@ -12552,57 +12584,55 @@ Item *eliminate_item_equal(Item *cond, C
>      {
>        if (eq_item)
>          eq_list.push_back(eq_item);
> +
>        /*
> -        item_field might refer to a table that is within a semi-join
> -        materialization nest. In that case, the join order looks like this:
> +        item_field may refer to a table that is within a semijoin
> +        materialization nest. In that case, the join order may look like:
>  
> -          outer_tbl1 outer_tbl2 SJM (inner_tbl1 inner_tbl2) outer_tbl3 
> +          ot1 ot2 SJM (it1 it2) ot3 
>  
> -        We must not construct equalities like 
> +        We must not construct general equalities like 
>  
> -           outer_tbl1.col = inner_tbl1.col 
> +           ot1.col = it1.col 
>  
>          because they would get attached to inner_tbl1 and will get evaluated
>          during materialization phase, when we don't have current value of
> -        outer_tbl1.col.
> +        ot1.col. When such equalities are generated, add a special marker to
> +        the condition so that it will be evaluated after the inner tables
> +        have been materialized, and the materialized table is joined with the
> +        outer tables.
> +      */
> +
> +      bool need_marker= FALSE;
> +      /*
> +        If there is a const item, match against this one.
> +        Otherwise, match against the first field item in the multiple equality,
> +        unless the item is within a materialized semijoin nest, where we match
> +        against the first item within the SJM nest (if the item is not the first
> +        item within the SJM nest), or match against the first item in the
> +        list (if the item is the first one in the SJM list).

REQ About the final "or match..." part above: isn't it strange to build
equality fo=fi where "fi" is a field of the first SJM inner table, and 
"fo" is a field of an outer, preceding table? Doesn't it go against the 
idea that outer and inner fields should be separated when propagating
equalities? Why is the _first_ field of the SJM an exception?

> +        The latter equalities are the ones that must be tagged with a marker,
> +        to prevent them from being evaluated at the wrong place.
>        */
> -      TABLE_LIST *emb_nest= 
> -        item_field->field->table->pos_in_table_list->embedding;
> -      if (!item_const && emb_nest && emb_nest->sj_mat_exec)
> +      head= item_const ? item_const : item_equal->get_first(item);
> +      if (head == item)

REQ this if(head==item) branch, I don't understand.

>        {
> -        /* 
> -          Find the first equal expression that refers to a table that is
> -          within the semijoin nest. If we can't find it, do nothing
> -        */
> -        List_iterator<Item_field> fit(item_equal->fields);
> -        Item_field *head_in_sjm;
> -        bool found= FALSE;
> -        while ((head_in_sjm= fit++))
> -        {
> -          if (head_in_sjm->used_tables() & emb_nest->sj_inner_tables)
> -          {
> -            if (head_in_sjm == item_field)
> -            {
> -              /* This is the first table inside the semi-join*/
> -              eq_item= new Item_func_eq(item_field, head);
> -              /* Tell make_cond_for_table don't use this. */
> -              eq_item->marker=3;
> -            }
> -            else
> -            {
> -              eq_item= new Item_func_eq(item_field, head_in_sjm);
> -              found= TRUE;
> -            }
> -            break;
> -          }
> -        }
> -        if (!found)
> -          continue;
> +        head= item_equal->get_first();
> +        need_marker= TRUE;
>        }
> -      else
> -        eq_item= new Item_func_eq(item_field, head);
> +      eq_item= new Item_func_eq(item_field, head);
>        if (!eq_item)
>          return 0;
> +      if (need_marker)
> +      {
> +        /*
> +          Setting 'marker= 3' means that make_cond_for_table() will ignore this
> +          condition, but make_cond_after_sjm() will pick it up.

REQ why is it needed to make make_cond_for_table() ignore this 
first-table condition (I see a comment says that otherwise the condition 
would be evaluated at the wrong place?).
I removed this "if(need_marker)" block, and no test output changed. 
Could you please give an example, possibly add it to a test file to 
improve coverage?

> +          The design is not pretty, but the alternative is to have a hairy
> +          check for SJM-related conditions inside make_cond_for_table().
> +        */
> +        eq_item->marker= 3;
> +      }
>        eq_item->set_cmp_func();
>        eq_item->quick_fix_field();
>      }
> @@ -16897,21 +16927,8 @@ sub_select_sjm(JOIN *join, JOIN_TAB *joi
>        Ok, materialization finished. Initialize the access to the temptable
>      */
>      sjm->materialized= TRUE;
> +
>      join_tab->read_record.read_record= join_no_more_records;
> -    if (sjm->is_scan)
> -    {
> -      /* Initialize full scan */
> -      JOIN_TAB *last_tab= join_tab + (sjm->table_count - 1);
> -      init_read_record(&last_tab->read_record, join->thd,
> -                       sjm->table, NULL, TRUE, TRUE, FALSE);
> -
> -      DBUG_ASSERT(last_tab->read_record.read_record == rr_sequential);
> -      last_tab->read_first_record= join_read_record_no_init;
> -      last_tab->read_record.copy_field= sjm->copy_field;
> -      last_tab->read_record.copy_field_end= sjm->copy_field +
> -                                            sjm->table_cols.elements;
> -      last_tab->read_record.read_record= rr_sequential_and_unpack;
> -    }
>    }
>  
>    if (sjm->is_scan)
> @@ -16919,6 +16936,15 @@ sub_select_sjm(JOIN *join, JOIN_TAB *joi
>      /* Do full scan of the materialized table */

REQ I'd change comment to something like:
/* For each prefix partial record, initialize full scan of the
materialized table */

>      JOIN_TAB *last_tab= join_tab + (sjm->table_count - 1);
>  
> +    init_read_record(&last_tab->read_record, join->thd,
> +                     sjm->table, NULL, TRUE, TRUE, FALSE);
> +
> +    last_tab->read_first_record= join_read_record_no_init;
> +    last_tab->read_record.copy_field= sjm->copy_field;
> +    last_tab->read_record.copy_field_end= sjm->copy_field +
> +                                          sjm->table_cols.elements;
> +    last_tab->read_record.read_record= rr_sequential_and_unpack;
> +
>      Item *save_cond= last_tab->select_cond;

BLAH Out of curiosity, I don't understand why we need to save/restore 
the cond; no subquery_sj test fails without this. But this is not 
related to your patch, you can keep it.

>      last_tab->set_select_cond(sjm->join_cond, __LINE__);
>      rc= sub_select(join, last_tab, end_of_records);
> @@ -18837,13 +18863,14 @@ static bool replace_subcondition(JOIN *j
>        cond         Condition to analyze
>        tables       Tables for which "current field values" are available
>        used_table   Table that we're extracting the condition for (may 
> -                   also include PSEUDO_TABLE_BITS
> +                   also include PSEUDO_TABLE_BITS, and may be zero)
>        exclude_expensive_cond  Do not push expensive conditions
>  
>    DESCRIPTION
>      Extract the condition that can be checked after reading the table
>      specified in 'used_table', given that current-field values for tables
>      specified in 'tables' bitmap are available.
> +    If 'used_table' is 0 (zero), extract conditions for all tables in 'tables'.

BLAH "0 (zero)" is funny; keep it if you like, or just choose between 0 
and zero.

>  static Item *
>  make_cond_for_table_from_pred(Item *root_cond, Item *cond,
>                                table_map tables, table_map used_table,
>                                bool exclude_expensive_cond)
>  {
> -  if (used_table && !(cond->used_tables() & used_table) &&
> -      /*
> -        Exclude constant conditions not checked at optimization time if
> +  /*
> +    Ignore this condition if
> +     1. We are extracting conditions for a specific table, and
> +     2. that table is not referenced by the condition, and
> +     3. exclude constant conditions not checked at optimization time if
>          the table we are pushing conditions to is the first one.
>          As a result, such conditions are not considered as already checked
>          and will be checked at execution time, attached to the first table.

BIG PRAISE thank for the wealth of new comments, which do make those 
functions more understandable for me. Simply put: before you added 
comments, I never dared to parse those bitmap &~ things.

> -
> +  */
> +  if (used_table &&                                                 // 1
> +      !(cond->used_tables() & used_table) &&                       
> // 2
> +      /*
>          psergey: TODO: "used_table & 1" doesn't make sense in nearly any
>          context. Look at setup_table_map(), table bits reflect the order 
>          the tables were encountered by the parser. Check what we should
>          replace this condition with.
>        */
> -      !((used_table & 1) && cond->is_expensive()))
> -    return (Item*) 0;				// Already checked
> +      !((used_table & 1) && cond->is_expensive()))                 
> // 3
> +    return (Item*) NULL;                      // Already checked

REQ in all places where (Item*)0 is changed to (Item*)NULL, I'd go even
further and just use NULL without a cast; my gcc seems to be ok with it.

>      else
> -    {						// Or list
> +    {                                         // Or list
>        Item_cond_or *new_cond=new Item_cond_or;
>        if (!new_cond)
> -	return (Item*) 0;			// OOM /* purecov: inspected */
> +        return (Item*) NULL;
>        List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
>        Item *item;
>        while ((item=li++))
>        {
> -	Item *fix=make_cond_for_table_from_pred(root_cond, item,
> +        Item *fix=make_cond_for_table_from_pred(root_cond, item,
>                                                  tables, 0L,

BLAH It took me a while to figure out why when splitting OR we pass 0L, 
whereas when splitting AND we pass used_table; with an example I was 
able to understand (boils down to: AND-ed parts can be pushed down 
independently but not OR-ed parts), but I cannot find a short comment 
which would clearly explain it and could be added in code.

>                                                  exclude_expensive_cond);
>  	if (!fix)
> -	  return (Item*) 0;			// Always true
> +          return (Item*) NULL;                // Always true
>  	new_cond->argument_list()->push_back(fix);
>        }
>        /*
> @@ -18952,20 +18985,25 @@ make_cond_for_table_from_pred(Item *root
>    }
>  
>    /*
> -    Because the following test takes a while and it can be done
> -    table_count times, we mark each item that we have examined with the result
> -    of the test
> -  */
> +    Omit this condition if
> +     1. It has been marked as omittable before, or
> +     2. All tables referred by the condition are not available, or

REQ I'd change "All" to "Some" in (2).

> +     3. We are extracting constant conditions, the condition is
> +        considered 'expensive', and we want to delay evaluation of such 
> +        conditions to the execution phase.

REQ Actually, "!used_table" may not always mean "extracting constant
condition"; we see that used_table==0 when we split an OR a few lines
above in the same function; and your comment describes used_table==0 as
"extract conditions for all tables in 'tables'".

> +  */
> +  if (cond->marker == 3 ||                                             // 1

REQ I think it's time to:
- add a doxygen comment near the declaration of "marker" in item.h
- make this an enum with meaningful names

> +      (cond->used_tables() & ~tables) ||                               // 2
> +      (!used_table && exclude_expensive_cond &&
> cond->is_expensive())) // 3
> +    return (Item*) NULL;
>  
> -  if (cond->marker == 3 || (cond->used_tables() & ~tables) ||
> -      /*
> -        When extracting constant conditions, treat expensive conditions as
> -        non-constant, so that they are not evaluated at optimization time.
> -      */
> -      (!used_table && exclude_expensive_cond &&
> cond->is_expensive()))
> -    return (Item*) 0;				// Can't check this yet
> +  /*
> +    Extract this condition if
> +     1. It has already been marked as applicable, or
> +     2. It is not a <comparison predicate> (=, <, >, <=, >=,
> <=>)
> +  */
>    if (cond->marker == 2 || cond->eq_cmp_result() == Item::COND_OK)
> -    return cond;				// Not boolean op
> +    return cond;
>  
>    /* 
>      Remove equalities that are guaranteed to be true by use of 'ref' access
> @@ -18993,19 +19031,19 @@ make_cond_for_table_from_pred(Item *root
>      Item *left_item= ((Item_func*) cond)->arguments()[0]->real_item();
>      Item *right_item= ((Item_func*) cond)->arguments()[1]->real_item();
>      if (left_item->type() == Item::FIELD_ITEM &&
> -	test_if_ref(root_cond, (Item_field*) left_item,right_item))
> +        test_if_ref(root_cond, (Item_field*) left_item,right_item))
>      {
> -      cond->marker=3;			// Checked when read
> -      return (Item*) 0;
> +      cond->marker=3;                    // Condition can be omitted
> +      return (Item*) NULL;
>      }
>      if (right_item->type() == Item::FIELD_ITEM &&
> -	test_if_ref(root_cond, (Item_field*) right_item,left_item))
> +        test_if_ref(root_cond, (Item_field*) right_item,left_item))
>      {
> -      cond->marker=3;			// Checked when read
> -      return (Item*) 0;
> +      cond->marker=3;                   // Condition can be omitted

BLAH I wondered: but who checks this marker later? Looks like in 
make_join_select(), make_cond_for_table() may be called with the same 
predicate again and again: for example, one "normal call" and one call 
for engine condition pushdown.
Apparently this marker thing is only an optimization.

> +      return (Item*) NULL;
>      }
>    }
> -  cond->marker=2;
> +  cond->marker=2;                       // Mark condition as applicable
>    return cond;
>  }
>  
> @@ -19034,9 +19072,11 @@ static Item *
>  make_cond_after_sjm(Item *root_cond, Item *cond, table_map tables,
>                      table_map sjm_tables)

REQ the code of make_cond_after_sjm() is 90% copy-paste from
make_cond_for_table_from_pred() (even some indentation errors are
identical); could you please put a comment in both functions which
suggests to keep them in sync?

This function has comment:
"A regular semi-join materialization is always non-correlated, ie
the subquery is always resolved by performing a lookup generated in
create_subquery_equalities, hence this function should never produce
any condition for regular semi-join materialization.
For a scan semi-join materialization, this function may return a 
condition to be checked."

but, I see that this function does produce a condition for 
materialization-lookup in some cases, before and after your patch. For 
example, in subquery_sj_mat.test, in the section for BUG#45174, for the 
"EXPLAIN SELECT varchar_nokey etc", a condition is produced by 
make_cond_after_sjm(). This happens before and after your patch, though 
the produced conditions are different (and I find that your patch makes 
things better, as it manages to test more conditions on t2 when 
materializing; more conditions at low levels, is better :-).

So this comment looks wrong, but this isn't due to your patch.

I looked at the changes in pushed conditions introduced by the patch (by 
"pushed", I mean conditions which we attach to each table, evaluated 
after a row has been read from the table; not index/engine condition 
pushdown). It will be more understandable for you, if you use the same 
method as I did: I prepared a playground for this. I have prepared a 
custom version of the optimizer trace tree (you can get it from tree 
mysql-next-mr-guilhem at usual machine). It has your patch applied, both 
the code and test file changes. As for the result files, I ran
./mtr --record t/subquery_sj*.test --opt-trace-protocol
which generates the optimizer trace of each SELECT or EXPLAIN SELECT 
(good work Jorgen). So the result files show conditions "after your patch".
If you then revert only the code piece of your patch by applying the 
patch attached to this mail, you can re-run tests and see the 
differences; the differences in the optimizer traces will show what 
condition disappears/appears/moves. Remember: the "result" file is 
after-your-patch; the "reject" file is before-your-patch. The 
interesting test is subquery_sj_mat. For browsing differences, I warmly 
recommend a GUI diff tool, which shows line numbers and can do text 
searches (I use kdiff3). Please ignore the differences of records' 
counts in the trace, they are probably due to InnoDB's "random" 
statistics. Conditions generated by make_cond_after_sjm() are named 
"attached_after_semijoin_materialization" in the optimizer trace.

Below I go through some noteworthy changes of subquery_sj_mat which I 
observed by the procedure above.

1) I see a good change, the last query of the test for BUG#52068 (line 
109510 of the result file): after your patch we have 
make_cond_after_sjm() which produces "it2.a=ot1.a" which is then 
attached to the mat-scan-SJM, so tested before we reach the last table 
in plan (ot4). That is a win.

2) Another win, the last query of the test for BUG#45174 (line 108683 of 
the result file): your patch moves a condition from after-SJM to 
inside-SJM (from "attached_after_semijoin_materialization" to 
"attached"-to-t2)

REQ 3) This time make_cond_after_sjm() produces a condition only after 
your patch: at line 2359 of the result file:
"select * from t2 where t2.a in (select a from t1)"
mat-lookup is used, which, I guess, looks up value of t2.a in the 
materialized table, but make_cond_after_sjm() produces
"t1.a=t2.a"
which is think is a superfluous condition to check (as index lookup 
should already guarantee this?).
There are other examples, in the queries which follow the one above 
(just scroll down in the diff). Unfortunately it seems to be often such 
unneeded conditions (repeating what the index lookup already 
implements), for mat-lookup, and added by the patch.
Somehow it makes the questionable comment ("a regular semi-join etc") be 
wrong more often.
On the one hand, I don't see this as a big problem: I'm not sure this is 
a real speed penalty to check this, even for every row.
On the other hand, I cannot explain why we see this change. Could you 
please have a look, to be sure?

REQ 4) at line 65381 of the result file: query
  explain select *
  from t0 where a in
  (select t2.a+t3.a from t1 left join (t2 join t3) on t2.a=t1.a and 
t3.a=t1.a);
is worrying: before the patch it used to have a condition attached to 
t3, i.e. evaluated inside materialization, which had the potential to 
make the materialized table smaller. The patch moves this condition out 
into a make_cond_after_sjm() condition (so, the materialized table will 
be bigger). I don't understand what can explain the move, and if it has 
to be. I don't think it can be be explained the same way as the "kp1<20" 
move which we discussed in a previous mail. I don't see that equality 
propagation plays a direct role here. Could you please investigate this?

5) In subquery_sj_mat, line 65488 of the result file, query
   explain extended select left(a1,7), left(a2,Y)
   from t1_16
   where a1 in (select b1 from t2_16 where b1 > '0')
we have MatScan; your patch adds a make_cond_after_sjm condition, which 
seems to be key to give correct query results. I don't really understand 
why; I thought that the make_cond_after_sjm condition was just an 
optimization. Bah, no need to research, this is a good thing.

>  {
> +  if (!cond)
> +    return (Item*) NULL;

REQ but make_cond_for_table() behaves differently: it crashes if
cond is NULL. Can we make those two twins have the same expectations on 
'cond'?

>    if ((!(cond->used_tables() & ~tables) || 
>         !(cond->used_tables() & ~sjm_tables)))
> -    return (Item*) 0;				// Already checked
> +    return (Item*) NULL;                        // Already checked
>    if (cond->type() == Item::COND_ITEM)
>    {
>      if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC)
> @@ -19044,49 +19084,49 @@ make_cond_after_sjm(Item *root_cond, Ite
>        /* Create new top level AND item */
>        Item_cond_and *new_cond=new Item_cond_and;
>        if (!new_cond)
> -	return (Item*) 0;			// OOM /* purecov: inspected */
> +        return (Item*) NULL;
>        List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
>        Item *item;
>        while ((item=li++))
>        {
> -	Item *fix=make_cond_after_sjm(root_cond, item, tables, sjm_tables);
> -	if (fix)
> -	  new_cond->argument_list()->push_back(fix);
> +        Item *fix=make_cond_after_sjm(root_cond, item, tables, sjm_tables);
> +        if (fix)
> +          new_cond->argument_list()->push_back(fix);
>        }
>        switch (new_cond->argument_list()->elements) {
>        case 0:
> -	return (Item*) 0;			// Always true
> +        return (Item*) NULL;                     // Always true
>        case 1:
> -	return new_cond->argument_list()->head();
> +        return new_cond->argument_list()->head();
>        default:
>  	/*
> -	  Item_cond_and do not need fix_fields for execution, its parameters
> -	  are fixed or do not need fix_fields, too
> +          Item_cond_and do not need fix_fields for execution, its parameters
> +          are fixed or do not need fix_fields, too
>  	*/
> -	new_cond->quick_fix_field();
> -	new_cond->used_tables_cache=
> -	  ((Item_cond_and*) cond)->used_tables_cache &
> -	  tables;
> -	return new_cond;
> +        new_cond->quick_fix_field();
> +        new_cond->used_tables_cache=
> +          ((Item_cond_and*) cond)->used_tables_cache &
> +          tables;
> +        return new_cond;
>        }
>      }
>      else
> -    {						// Or list
> +    {                                          // Or list
>        Item_cond_or *new_cond=new Item_cond_or;
>        if (!new_cond)
> -	return (Item*) 0;			// OOM /* purecov: inspected */
> +        return (Item*) NULL;
>        List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
>        Item *item;
>        while ((item=li++))
>        {
> -	Item *fix= make_cond_after_sjm(root_cond, item, tables, 0L);
> -	if (!fix)
> -	  return (Item*) 0;			// Always true
> -	new_cond->argument_list()->push_back(fix);
> +        Item *fix= make_cond_after_sjm(root_cond, item, tables, 0L);
> +        if (!fix)
> +          return (Item*) NULL;                 // Always true
> +        new_cond->argument_list()->push_back(fix);
>        }
>        /*
> -	Item_cond_and do not need fix_fields for execution, its parameters
> -	are fixed or do not need fix_fields, too
> +        Item_cond_and do not need fix_fields for execution, its parameters
> +        are fixed or do not need fix_fields, too
>        */
>        new_cond->quick_fix_field();
>        new_cond->used_tables_cache= ((Item_cond_or*) cond)->used_tables_cache;
> @@ -19095,16 +19135,10 @@ make_cond_after_sjm(Item *root_cond, Ite
>      }
>    }
>  
> -  /*
> -    Because the following test takes a while and it can be done
> -    table_count times, we mark each item that we have examined with the result
> -    of the test
> -  */
> -
> -  if (cond->marker == 3 || (cond->used_tables() & ~(tables |
> sjm_tables)))
> -    return (Item*) 0;				// Can't check this yet
> -  if (cond->marker == 2 || cond->eq_cmp_result() == Item::COND_OK)
> -    return cond;				// Not boolean op
> +  if (cond->used_tables() & ~(tables | sjm_tables))
> +    return (Item*) NULL;                       // Can't check this yet
> +  if (cond->eq_cmp_result() == Item::COND_OK)
> +    return cond;                               // Not a <comparison
> predicate>
>  
>    /* 
>      Remove equalities that are guaranteed to be true by use of 'ref' access
> @@ -19117,17 +19151,14 @@ make_cond_after_sjm(Item *root_cond, Ite
>      if (left_item->type() == Item::FIELD_ITEM &&
>  	test_if_ref(root_cond, (Item_field*) left_item,right_item))
>      {
> -      cond->marker=3;			// Checked when read
>        return (Item*) 0;
>      }
>      if (right_item->type() == Item::FIELD_ITEM &&
>  	test_if_ref(root_cond, (Item_field*) right_item,left_item))
>      {
> -      cond->marker=3;			// Checked when read
>        return (Item*) 0;
>      }
>    }
> -  cond->marker=2;

REQ The commit comment says that we remove all the cond->marker things 
in this function because we will not come back to this condition. Is it 
the only reason, or also to support the "not pretty design" described 
earlier (which would be a ugly ;-).
By default, I would have preferred make_cond_after_sjm() and 
make_cond_for_table_from_pred() to stay as identical as possible, so 
that one day we can merge them in one function and kill this horrible 
copy-pasting.

BLAH In my simplified view, make_cond_after_sjm() should be as simple as 
a call to
make_cond_for_table_from_pred(cond, cond, 0, tables|sjm_tables);

-- 
Mr. Guilhem Bichot <guilhem.bichot@stripped>
Oracle / MySQL / Optimizer team, Lead Software Engineer
Bordeaux, France
www.oracle.com / www.mysql.com

=== modified file 'sql/item.cc'
--- sql/item.cc	2010-10-20 14:13:28 +0000
+++ sql/item.cc	2010-10-20 15:27:35 +0000
@@ -5023,8 +5023,8 @@
         return this;
       return const_item;
     }
-    Item_field *subst= item_equal->get_first(this);
-    if (field->table != subst->field->table &&
!field->eq(subst->field))
+    Item_field *subst= item_equal->get_first();
+    if (subst && field->table != subst->field->table &&
!field->eq(subst->field))
       return subst;
   }
   return this;

=== modified file 'sql/item_cmpfunc.cc'
--- sql/item_cmpfunc.cc	2010-10-20 14:13:28 +0000
+++ sql/item_cmpfunc.cc	2010-10-20 15:27:35 +0000
@@ -5777,102 +5777,3 @@
   args[0]->print(str, query_type);
   str->append(", true)");
 }
-
-
-/**
-  Get the first field of multiple equality, use for semantic checking.
-
-  @retval First field in the multiple equality.
-*/
-
-Item_field* Item_equal::get_first()
-{
-  return fields.head();
-}
-
-/**
-  Get the first equal field of multiple equality, use when execution order
-  must be considered (see details below).
-
-  @param field  field to get substitution field for, which must be
-                present within the multiple equality itself.
-
-  @retval Found first field in the multiple equality.
-
-  @details Get the first field of multiple equality that is equal to the
-  given field. In order to make semijoin materialization strategy work
-  correctly we can't propagate equal fields between a materialized semijoin
-  and the outer query (or any other semijoin) unconditionally.
-  Thus the field is returned according to following rules:
-
-  1) If the given field belongs to a materialized semijoin then the
-     first field in multiple equality which belong to the same semijoin
-     is returned.
-  2) If the given field doesn't belong to a materialized semijoin then
-     the first field in the multiple equality is returned.
-*/
-
-Item_field* Item_equal::get_first(Item_field *field)
-{
-  DBUG_ASSERT(field != NULL);
-
-  const JOIN_TAB *field_tab= field->field->table->reginfo.join_tab;
-
-  if (sj_is_materialize_strategy(field_tab->get_sj_strategy()))
-  {
-    /*
-      It's a field from a materialized semijoin. We can substitute it only
-      with a field from the same semijoin.
-
-      Example: suppose we have a join order:
-
-       ot1 ot2  SJM(it1  it2  it3)  ot3
-
-      and equality ot2.col = it1.col = it2.col
-
-      If we're looking for best substitute for 'it2.col', we must pick it1.col
-      and not ot2.col. it2.col is evaluated while performing materialization,
-      when the outer tables are not available in the execution.
-    */
-    List_iterator<Item_field> it(fields);
-    Item_field *item;
-    const JOIN_TAB *first= field_tab->first_sj_inner_tab;
-    const JOIN_TAB *last=  field_tab->last_sj_inner_tab;
-
-    while ((item= it++))
-    {
-      if (item->field->table->reginfo.join_tab >= first &&
-          item->field->table->reginfo.join_tab <= last)
-      {
-        return item;
-      }
-    }
-  }
-  else
-  {
-    /*
-      The field is not in a materialized semijoin nest. We can return
-      the first field in the multiple equality.
-
-      Example: suppose we have a join order with MaterializeLookup:
-
-       ot1 ot2  SJM-Lookup(it1  it2)
-
-      Here we should always pick the first field in the multiple equality,
-      as this will be present before all other dependent fields.
-
-      Example: suppose we have a join order with MaterializeScan:
-
-          SJM-Scan(it1  it2)  ot1  ot2
-
-      and equality ot2.col = ot1.col = it2.col.
-
-      When looking for best substitute for 'ot2.col', we can pick it2.col,
-      because when we run the scan, column values from the inner materialized
-      tables will be copied back to the column buffers for it1 and it2.
-    */
-    return fields.head();
-  }
-  DBUG_ASSERT(FALSE);                          // Should never get here.
-  return NULL;
-}

=== modified file 'sql/item_cmpfunc.h'
--- sql/item_cmpfunc.h	2010-10-20 14:13:28 +0000
+++ sql/item_cmpfunc.h	2010-10-20 15:27:35 +0000
@@ -1674,8 +1674,7 @@
   void add(Item_field *f);
   uint members();
   bool contains(Field *field);
-  Item_field* get_first();
-  Item_field* get_first(Item_field *field);
+  Item_field* get_first() { return fields.head(); }
   void merge(Item_equal *item);
   void update_const();
   enum Functype functype() const { return MULT_EQUAL_FUNC; }

=== modified file 'sql/sql_select.cc'
--- sql/sql_select.cc	2010-10-20 14:13:28 +0000
+++ sql/sql_select.cc	2010-10-20 15:27:35 +0000
@@ -1459,16 +1459,13 @@
   setup_sj_materialization() (todo: can't we move that to here also?)
 */
 
-bool setup_semijoin_dups_elimination(JOIN *join, ulonglong options,
-                                     uint no_jbuf_after)
+int setup_semijoin_dups_elimination(JOIN *join, ulonglong options,
+                                    uint no_jbuf_after)
 {
   uint tableno;
   THD *thd= join->thd;
   DBUG_ENTER("setup_semijoin_dups_elimination");
 
-  if (join->select_lex->sj_nests.is_empty())
-    DBUG_RETURN(FALSE);
-
   for (tableno= join->const_tables ; tableno < join->tables; )
   {
     JOIN_TAB *tab=join->join_tab + tableno;
@@ -1709,6 +1706,21 @@
         break;
       }
     }
+    /*
+      Remember the first and last semijoin inner tables; this serves to tell
+      a JOIN_TAB's semijoin strategy (like in check_join_cache_usage()).
+    */
+    JOIN_TAB *last_sj_inner=
+      (pos->sj_strategy == SJ_OPT_DUPS_WEEDOUT) ?
+      /* Range may end with non-inner table so cannot set last_sj_inner_tab */
+      NULL : last_sj_tab;
+    for (JOIN_TAB *tab_in_range= tab; 
+         tab_in_range <= last_sj_tab; 
+         tab_in_range++)
+    {
+      tab_in_range->first_sj_inner_tab= tab;
+      tab_in_range->last_sj_inner_tab=  last_sj_inner;
+    }
   }
   DBUG_RETURN(FALSE);
 }
@@ -4318,18 +4330,14 @@
     using Materialization or LooseScan to execute it. 
 
   RETURN 
-    FALSE - OK
-    TRUE  - Out of memory error
+    0 - OK
+    1 - Out of memory error
 */
 
-bool pull_out_semijoin_tables(JOIN *join)
+int pull_out_semijoin_tables(JOIN *join)
 {
   TABLE_LIST *sj_nest;
   DBUG_ENTER("pull_out_semijoin_tables");
-
-  if (join->select_lex->sj_nests.is_empty())
-    DBUG_RETURN(FALSE);
-
   List_iterator<TABLE_LIST> sj_list_it(join->select_lex->sj_nests);
   Opt_trace_context * const trace= join->thd->opt_trace;
 
@@ -4419,8 +4427,7 @@
             pointers.
           */
           child_li.remove();
-          if (upper_join_list->push_back(tbl))
-            DBUG_RETURN(TRUE);
+          upper_join_list->push_back(tbl);
           tbl->join_list= upper_join_list;
           tbl->embedding= sj_nest->embedding;
         }
@@ -4441,7 +4448,7 @@
         join->thd->restore_active_arena(arena, &backup);
     }
   }
-  DBUG_RETURN(FALSE);
+  DBUG_RETURN(0);
 }
 
 
@@ -8583,18 +8590,15 @@
 
 static bool fix_semijoin_strategies_for_picked_join_order(JOIN *join)
 {
-  uint tableno;
+  uint tablenr;
   table_map remaining_tables= 0;
   table_map handled_tabs= 0;
   Opt_trace_context * const trace= join->thd->opt_trace;
   DBUG_ENTER("fix_semijoin_strategies_for_picked_join_order");
 
-  if (join->select_lex->sj_nests.is_empty())
-    DBUG_RETURN(FALSE);
-
-  for (tableno= join->tables - 1 ; tableno != join->const_tables - 1; tableno--)
+  for (tablenr= join->tables - 1 ; tablenr != join->const_tables - 1; tablenr--)
   {
-    POSITION *pos= join->best_positions + tableno;
+    POSITION *pos= join->best_positions + tablenr;
     JOIN_TAB *s= pos->table;
     TABLE_LIST *emb_sj_nest= s->emb_sj_nest;
     uint first;
@@ -8631,7 +8635,7 @@
       */
       memcpy(pos - table_count + 1, emb_sj_nest->nested_join->sjm.positions, 
              sizeof(POSITION) * table_count);
-      first= tableno - table_count + 1;
+      first= tablenr - table_count + 1;
       join->best_positions[first].n_sj_tables= table_count;
       join->best_positions[first].sj_strategy= SJ_OPT_MATERIALIZE_LOOKUP;
 
@@ -8669,12 +8673,12 @@
       
       uint i;
       table_map rem_tables= remaining_tables;
-      for (i= tableno; i != (first + table_count - 1); i--)
+      for (i= tablenr; i != (first + table_count - 1); i--)
         rem_tables |= join->best_positions[i].table->table->map;
 
       POSITION dummy;
       join->cur_sj_inner_tables= 0;
-      for (i= first + table_count; i <= tableno; i++)
+      for (i= first + table_count; i <= tablenr; i++)
       {
         Opt_trace_object oto0(trace);
         Opt_trace_object oto1(trace,
@@ -8693,14 +8697,14 @@
     {
       first= pos->first_firstmatch_table;
       join->best_positions[first].sj_strategy= SJ_OPT_FIRST_MATCH;
-      join->best_positions[first].n_sj_tables= tableno - first + 1;
+      join->best_positions[first].n_sj_tables= tablenr - first + 1;
       POSITION dummy; // For loose scan paths
       double record_count= (first== join->const_tables)? 1.0: 
-                           join->best_positions[tableno - 1].prefix_record_count;
+                           join->best_positions[tablenr - 1].prefix_record_count;
       
       table_map rem_tables= remaining_tables;
       uint idx;
-      for (idx= first; idx <= tableno; idx++)
+      for (idx= first; idx <= tablenr; idx++)
       {
         rem_tables |= join->best_positions[idx].table->table->map;
       }
@@ -8709,7 +8713,7 @@
         join buffering
       */ 
       join->cur_sj_inner_tables= 0;
-      for (idx= first; idx <= tableno; idx++)
+      for (idx= first; idx <= tablenr; idx++)
       {
         if (join->best_positions[idx].use_join_buffer)
         {
@@ -8732,18 +8736,18 @@
       POSITION *first_pos= join->best_positions + first;
       POSITION loose_scan_pos; // For loose scan paths
       double record_count= (first== join->const_tables)? 1.0: 
-                           join->best_positions[tableno - 1].prefix_record_count;
+                           join->best_positions[tablenr - 1].prefix_record_count;
       
       table_map rem_tables= remaining_tables;
       uint idx;
-      for (idx= first; idx <= tableno; idx++)
+      for (idx= first; idx <= tablenr; idx++)
         rem_tables |= join->best_positions[idx].table->table->map;
       /*
         Re-run best_access_path to produce best access methods that do not use
         join buffering
       */ 
       join->cur_sj_inner_tables= 0;
-      for (idx= first; idx <= tableno; idx++)
+      for (idx= first; idx <= tablenr; idx++)
       {
         if (join->best_positions[idx].use_join_buffer || (idx == first))
         {
@@ -8773,7 +8777,7 @@
       */
       first= pos->first_dupsweedout_table;
       join->best_positions[first].sj_strategy= SJ_OPT_DUPS_WEEDOUT;
-      join->best_positions[first].n_sj_tables= tableno - first + 1;
+      join->best_positions[first].n_sj_tables= tablenr - first + 1;
     }
     
     uint i_end= first + join->best_positions[first].n_sj_tables;
@@ -8788,7 +8792,7 @@
       handled_tabs |= join->best_positions[i].table->table->map;
     }
 
-    if (tableno != first)
+    if (tablenr != first)
       pos->sj_strategy= SJ_OPT_NONE;
     remaining_tables |= s->table->map;
   }
@@ -8825,13 +8829,16 @@
 
 static bool get_best_combination(JOIN *join)
 {
+  uint i,tablenr;
   table_map used_tables;
+  JOIN_TAB *join_tab,*j;
   KEYUSE *keyuse;
-  uint table_count= join->tables;
+  uint table_count;
   THD *thd=join->thd;
   DBUG_ENTER("get_best_combination");
 
-  if (!(join->join_tab= new (thd->mem_root) JOIN_TAB[table_count]))
+  table_count=join->tables;
+  if (!(join->join_tab= join_tab= new (thd->mem_root) JOIN_TAB[table_count]))
     DBUG_RETURN(TRUE);
 
   join->full_join=0;
@@ -8841,12 +8848,11 @@
   if (fix_semijoin_strategies_for_picked_join_order(join))
     DBUG_RETURN(TRUE);
 
-  for (uint tableno= 0; tableno < table_count; tableno++)
+  for (j=join_tab, tablenr=0 ; tablenr < table_count ; tablenr++,j++)
   {
-    JOIN_TAB *j= join->join_tab + tableno;
     TABLE *form;
-    *j= *join->best_positions[tableno].table;
-    form=join->all_tables[tableno]= j->table;
+    *j= *join->best_positions[tablenr].table;
+    form=join->all_tables[tablenr]=j->table;
     used_tables|= form->map;
     form->reginfo.join_tab=j;
     if (!*j->on_expr_ref)
@@ -8856,69 +8862,31 @@
     if (j->type == JT_CONST)
       continue;					// Handled in make_join_stat..
 
+
     j->loosescan_match_tab= NULL;  //non-nulls will be set later
     j->ref.key = -1;
     j->ref.key_parts=0;
 
+
     if (j->type == JT_SYSTEM)
       continue;
     
-    if (j->keys.is_clear_all() ||
-       !(keyuse= join->best_positions[tableno].key) || 
-        (join->best_positions[tableno].sj_strategy == SJ_OPT_LOOSE_SCAN))
+    if (j->keys.is_clear_all() || !(keyuse= join->best_positions[tablenr].key) || 
+        (join->best_positions[tablenr].sj_strategy == SJ_OPT_LOOSE_SCAN))
     {
       j->type=JT_ALL;
-      j->index= join->best_positions[tableno].loosescan_key;
-      if (tableno != join->const_tables)
+      j->index= join->best_positions[tablenr].loosescan_key;
+      if (tablenr != join->const_tables)
 	join->full_join=1;
     }
     else if (create_ref_for_key(join, j, keyuse, used_tables))
       DBUG_RETURN(TRUE);                        // Something went wrong
   }
 
-  for (uint tableno= 0; tableno < table_count; tableno++)
-    join->map2table[join->join_tab[tableno].table->tablenr]=
-      join->join_tab + tableno;
-
+  for (i=0 ; i < table_count ; i++)
+    join->map2table[join->join_tab[i].table->tablenr]=join->join_tab+i;
   update_depend_map(join);
-
-  for (uint tableno= join->const_tables; tableno < table_count; )
-  {
-    JOIN_TAB *tab= join->join_tab + tableno;
-    const POSITION *pos= join->best_positions + tableno;
-
-    switch (pos->sj_strategy)
-    {
-    case SJ_OPT_NONE:
-      tableno++;
-      break;
-    case SJ_OPT_MATERIALIZE_LOOKUP:
-    case SJ_OPT_MATERIALIZE_SCAN:
-    case SJ_OPT_LOOSE_SCAN:
-    case SJ_OPT_DUPS_WEEDOUT:
-    case SJ_OPT_FIRST_MATCH:
-      /*
-        Remember the first and last semijoin inner tables; this serves to tell
-        a JOIN_TAB's semijoin strategy (like in check_join_cache_usage()).
-      */
-      JOIN_TAB *last_sj_tab= tab + pos->n_sj_tables - 1;
-      JOIN_TAB *last_sj_inner=
-        (pos->sj_strategy == SJ_OPT_DUPS_WEEDOUT) ?
-        /* Range may end with non-inner table so cannot set last_sj_inner_tab */
-        NULL : last_sj_tab;
-      for (JOIN_TAB *tab_in_range= tab; 
-           tab_in_range <= last_sj_tab; 
-           tab_in_range++)
-      {
-        tab_in_range->first_sj_inner_tab= tab;
-        tab_in_range->last_sj_inner_tab=  last_sj_inner;
-      }
-      tableno+= pos->n_sj_tables;
-      break;
-    }
-  }
-
-  DBUG_RETURN(FALSE);
+  DBUG_RETURN(0);
 }
 
 
@@ -9923,12 +9891,9 @@
 
       DBUG_ASSERT(save_used_tables ? tab->emb_sj_nest != NULL : TRUE);
 
-      /*
-        1. We are inside a materialized semijoin nest, and
-        2. All inner tables of the nest are covered.
-      */
-      if (save_used_tables &&                                        // 1
-          !(tab->emb_sj_nest->sj_inner_tables & ~used_tables))        // 2
+      if (save_used_tables && !(used_tables & 
+                                ~(tab->emb_sj_nest->sj_inner_tables |
+                                  join->const_table_map | PSEUDO_TABLE_BITS)))
       {
         /*
           We have reached the end of semi join nest. That is, the join order
@@ -9944,8 +9909,9 @@
            - A condition that can be checked when we have all of the tables
              in the prefix (both inner and outer).
         */
-        Item * const sjm_cond=
-          make_cond_after_sjm(cond, cond, save_used_tables, used_tables);
+        Item * const sjm_cond= cond ?
+          make_cond_after_sjm(cond, cond, save_used_tables, used_tables):
+          NULL;
         tab->emb_sj_nest->sj_mat_exec->join_cond= sjm_cond;
         used_tables= save_used_tables | used_tables;
         save_used_tables= 0;
@@ -11115,7 +11081,8 @@
   uint last_sjm_table= MAX_TABLES;
   DBUG_ENTER("make_join_readinfo");
 
-  if (setup_semijoin_dups_elimination(join, options, no_jbuf_after))
+  if (!join->select_lex->sj_nests.is_empty() &&
+      setup_semijoin_dups_elimination(join, options, no_jbuf_after))
     DBUG_RETURN(TRUE); /* purecov: inspected */
 
   for (i=join->const_tables ; i < join->tables ; i++)
@@ -12764,18 +12731,17 @@
                            Item_equal *item_equal)
 {
   List<Item> eq_list;
-  Item_func_eq *eq_item= NULL;
+  Item_func_eq *eq_item= 0;
   if (((Item *) item_equal)->const_item() && !item_equal->val_int())
     return new Item_int((longlong) 0,1); 
   Item *item_const= item_equal->get_const();
   Item_equal_iterator it(*item_equal);
   Item *head;
-  if (!item_const)
+  if (item_const)
+    head= item_const;
+  else
   {
-    /*
-      If there is a const item, match all field items with the const item,
-      otherwise match the second and subsequent field items with the first one:
-    */
+    head= item_equal->get_first();
     it++;
   }
   Item_field *item_field;
@@ -12786,7 +12752,7 @@
     if (upper)
     { 
       if (item_const && upper->get_const())
-        item= NULL;
+        item= 0;
       else
       {
         Item_equal_iterator li(*item_equal);
@@ -12801,55 +12767,57 @@
     {
       if (eq_item)
         eq_list.push_back(eq_item);
-
       /*
-        item_field may refer to a table that is within a semijoin
-        materialization nest. In that case, the join order may look like:
-
-          ot1 ot2 SJM (it1 it2) ot3 
-
-        We must not construct general equalities like 
-
-           ot1.col = it1.col 
+        item_field might refer to a table that is within a semi-join
+        materialization nest. In that case, the join order looks like this:
+
+          outer_tbl1 outer_tbl2 SJM (inner_tbl1 inner_tbl2) outer_tbl3 
+
+        We must not construct equalities like 
+
+           outer_tbl1.col = inner_tbl1.col 
 
         because they would get attached to inner_tbl1 and will get evaluated
         during materialization phase, when we don't have current value of
-        ot1.col. When such equalities are generated, add a special marker to
-        the condition so that it will be evaluated after the inner tables
-        have been materialized, and the materialized table is joined with the
-        outer tables.
-      */
-
-      bool need_marker= FALSE;
-      /*
-        If there is a const item, match against this one.
-        Otherwise, match against the first field item in the multiple equality,
-        unless the item is within a materialized semijoin nest, where we match
-        against the first item within the SJM nest (if the item is not the first
-        item within the SJM nest), or match against the first item in the
-        list (if the item is the first one in the SJM list).
-        The latter equalities are the ones that must be tagged with a marker,
-        to prevent them from being evaluated at the wrong place.
-      */
-      head= item_const ? item_const : item_equal->get_first(item);
-      if (head == item)
+        outer_tbl1.col.
+      */
+      TABLE_LIST *emb_nest= 
+        item_field->field->table->pos_in_table_list->embedding;
+      if (!item_const && emb_nest && emb_nest->sj_mat_exec)
       {
-        head= item_equal->get_first();
-        need_marker= TRUE;
+        /* 
+          Find the first equal expression that refers to a table that is
+          within the semijoin nest. If we can't find it, do nothing
+        */
+        List_iterator<Item_field> fit(item_equal->fields);
+        Item_field *head_in_sjm;
+        bool found= FALSE;
+        while ((head_in_sjm= fit++))
+        {
+          if (head_in_sjm->used_tables() & emb_nest->sj_inner_tables)
+          {
+            if (head_in_sjm == item_field)
+            {
+              /* This is the first table inside the semi-join*/
+              eq_item= new Item_func_eq(item_field, head);
+              /* Tell make_cond_for_table don't use this. */
+              eq_item->marker=3;
+            }
+            else
+            {
+              eq_item= new Item_func_eq(item_field, head_in_sjm);
+              found= TRUE;
+            }
+            break;
+          }
+        }
+        if (!found)
+          continue;
       }
-      eq_item= new Item_func_eq(item_field, head);
+      else
+        eq_item= new Item_func_eq(item_field, head);
       if (!eq_item)
         return 0;
-      if (need_marker)
-      {
-        /*
-          Setting 'marker= 3' means that make_cond_for_table() will ignore this
-          condition, but make_cond_after_sjm() will pick it up.
-          The design is not pretty, but the alternative is to have a hairy
-          check for SJM-related conditions inside make_cond_for_table().
-        */
-        eq_item->marker= 3;
-      }
       eq_item->set_cmp_func();
       eq_item->quick_fix_field();
     }
@@ -17198,8 +17166,21 @@
       Ok, materialization finished. Initialize the access to the temptable
     */
     sjm->materialized= TRUE;
-
     join_tab->read_record.read_record= join_no_more_records;
+    if (sjm->is_scan)
+    {
+      /* Initialize full scan */
+      JOIN_TAB *last_tab= join_tab + (sjm->table_count - 1);
+      init_read_record(&last_tab->read_record, join->thd,
+                       sjm->table, NULL, TRUE, TRUE, FALSE);
+
+      DBUG_ASSERT(last_tab->read_record.read_record == rr_sequential);
+      last_tab->read_first_record= join_read_record_no_init;
+      last_tab->read_record.copy_field= sjm->copy_field;
+      last_tab->read_record.copy_field_end= sjm->copy_field +
+                                            sjm->table_cols.elements;
+      last_tab->read_record.read_record= rr_sequential_and_unpack;
+    }
   }
 
   if (sjm->is_scan)
@@ -17207,15 +17188,6 @@
     /* Do full scan of the materialized table */
     JOIN_TAB *last_tab= join_tab + (sjm->table_count - 1);
 
-    init_read_record(&last_tab->read_record, join->thd,
-                     sjm->table, NULL, TRUE, TRUE, FALSE);
-
-    last_tab->read_first_record= join_read_record_no_init;
-    last_tab->read_record.copy_field= sjm->copy_field;
-    last_tab->read_record.copy_field_end= sjm->copy_field +
-                                          sjm->table_cols.elements;
-    last_tab->read_record.read_record= rr_sequential_and_unpack;
-
     Item *save_cond= last_tab->select_cond;
     last_tab->set_select_cond(sjm->join_cond, __LINE__);
     rc= sub_select(join, last_tab, end_of_records);
@@ -19145,14 +19117,13 @@
       cond         Condition to analyze
       tables       Tables for which "current field values" are available
       used_table   Table that we're extracting the condition for (may 
-                   also include PSEUDO_TABLE_BITS, and may be zero)
+                   also include PSEUDO_TABLE_BITS
       exclude_expensive_cond  Do not push expensive conditions
 
   DESCRIPTION
     Extract the condition that can be checked after reading the table
     specified in 'used_table', given that current-field values for tables
     specified in 'tables' bitmap are available.
-    If 'used_table' is 0 (zero), extract conditions for all tables in 'tables'.
 
     The function assumes that
       - Constant parts of the condition has already been checked.
@@ -19178,32 +19149,26 @@
   return make_cond_for_table_from_pred(cond, cond, tables, used_table,
                                        exclude_expensive_cond);
 }
-
+               
 static Item *
 make_cond_for_table_from_pred(Item *root_cond, Item *cond,
                               table_map tables, table_map used_table,
                               bool exclude_expensive_cond)
 {
-  /*
-    Ignore this condition if
-     1. We are extracting conditions for a specific table, and
-     2. that table is not referenced by the condition, and
-     3. exclude constant conditions not checked at optimization time if
+  if (used_table && !(cond->used_tables() & used_table) &&
+      /*
+        Exclude constant conditions not checked at optimization time if
         the table we are pushing conditions to is the first one.
         As a result, such conditions are not considered as already checked
         and will be checked at execution time, attached to the first table.
-  */
-  if (used_table &&                                                 // 1
-      !(cond->used_tables() & used_table) &&                        // 2
-      /*
+
         psergey: TODO: "used_table & 1" doesn't make sense in nearly any
         context. Look at setup_table_map(), table bits reflect the order 
         the tables were encountered by the parser. Check what we should
         replace this condition with.
       */
-      !((used_table & 1) && cond->is_expensive()))                  // 3
-    return (Item*) NULL;                      // Already checked
-
+      !((used_table & 1) && cond->is_expensive()))
+    return (Item*) 0;				// Already checked
   if (cond->type() == Item::COND_ITEM)
   {
     if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC)
@@ -19211,48 +19176,48 @@
       /* Create new top level AND item */
       Item_cond_and *new_cond=new Item_cond_and;
       if (!new_cond)
-        return (Item*) NULL;
+	return (Item*) 0;			// OOM /* purecov: inspected */
       List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
       Item *item;
       while ((item=li++))
       {
-        Item *fix=make_cond_for_table_from_pred(root_cond, item, 
+	Item *fix=make_cond_for_table_from_pred(root_cond, item, 
                                                 tables, used_table,
                                                 exclude_expensive_cond);
-        if (fix)
-          new_cond->argument_list()->push_back(fix);
+	if (fix)
+	  new_cond->argument_list()->push_back(fix);
       }
       switch (new_cond->argument_list()->elements) {
       case 0:
-        return (Item*) NULL;                  // Always true
+	return (Item*) 0;			// Always true
       case 1:
-        return new_cond->argument_list()->head();
+	return new_cond->argument_list()->head();
       default:
-        /*
-          Item_cond_and do not need fix_fields for execution, its parameters
-          are fixed or do not need fix_fields, too
-        */
-        new_cond->quick_fix_field();
-        new_cond->used_tables_cache=
-          ((Item_cond_and*) cond)->used_tables_cache &
-          tables;
-          return new_cond;
+	/*
+	  Item_cond_and do not need fix_fields for execution, its parameters
+	  are fixed or do not need fix_fields, too
+	*/
+	new_cond->quick_fix_field();
+	new_cond->used_tables_cache=
+	  ((Item_cond_and*) cond)->used_tables_cache &
+	  tables;
+	return new_cond;
       }
     }
     else
-    {                                         // Or list
+    {						// Or list
       Item_cond_or *new_cond=new Item_cond_or;
       if (!new_cond)
-        return (Item*) NULL;
+	return (Item*) 0;			// OOM /* purecov: inspected */
       List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
       Item *item;
       while ((item=li++))
       {
-        Item *fix=make_cond_for_table_from_pred(root_cond, item,
+	Item *fix=make_cond_for_table_from_pred(root_cond, item,
                                                 tables, 0L,
                                                 exclude_expensive_cond);
 	if (!fix)
-          return (Item*) NULL;                // Always true
+	  return (Item*) 0;			// Always true
 	new_cond->argument_list()->push_back(fix);
       }
       /*
@@ -19267,25 +19232,20 @@
   }
 
   /*
-    Omit this condition if
-     1. It has been marked as omittable before, or
-     2. All tables referred by the condition are not available, or
-     3. We are extracting constant conditions, the condition is
-        considered 'expensive', and we want to delay evaluation of such 
-        conditions to the execution phase.
+    Because the following test takes a while and it can be done
+    table_count times, we mark each item that we have examined with the result
+    of the test
   */
-  if (cond->marker == 3 ||                                             // 1
-      (cond->used_tables() & ~tables) ||                               // 2
-      (!used_table && exclude_expensive_cond && cond->is_expensive()))
// 3
-    return (Item*) NULL;
 
-  /*
-    Extract this condition if
-     1. It has already been marked as applicable, or
-     2. It is not a <comparison predicate> (=, <, >, <=, >=, <=>)
-  */
+  if (cond->marker == 3 || (cond->used_tables() & ~tables) ||
+      /*
+        When extracting constant conditions, treat expensive conditions as
+        non-constant, so that they are not evaluated at optimization time.
+      */
+      (!used_table && exclude_expensive_cond && cond->is_expensive()))
+    return (Item*) 0;				// Can't check this yet
   if (cond->marker == 2 || cond->eq_cmp_result() == Item::COND_OK)
-    return cond;
+    return cond;				// Not boolean op
 
   /* 
     Remove equalities that are guaranteed to be true by use of 'ref' access
@@ -19313,19 +19273,19 @@
     Item *left_item= ((Item_func*) cond)->arguments()[0]->real_item();
     Item *right_item= ((Item_func*) cond)->arguments()[1]->real_item();
     if (left_item->type() == Item::FIELD_ITEM &&
-        test_if_ref(root_cond, (Item_field*) left_item,right_item))
+	test_if_ref(root_cond, (Item_field*) left_item,right_item))
     {
-      cond->marker=3;                    // Condition can be omitted
-      return (Item*) NULL;
+      cond->marker=3;			// Checked when read
+      return (Item*) 0;
     }
     if (right_item->type() == Item::FIELD_ITEM &&
-        test_if_ref(root_cond, (Item_field*) right_item,left_item))
+	test_if_ref(root_cond, (Item_field*) right_item,left_item))
     {
-      cond->marker=3;                   // Condition can be omitted
-      return (Item*) NULL;
+      cond->marker=3;			// Checked when read
+      return (Item*) 0;
     }
   }
-  cond->marker=2;                       // Mark condition as applicable
+  cond->marker=2;
   return cond;
 }
 
@@ -19354,11 +19314,9 @@
 make_cond_after_sjm(Item *root_cond, Item *cond, table_map tables,
                     table_map sjm_tables)
 {
-  if (!cond)
-    return (Item*) NULL;
   if ((!(cond->used_tables() & ~tables) || 
        !(cond->used_tables() & ~sjm_tables)))
-    return (Item*) NULL;                        // Already checked
+    return (Item*) 0;				// Already checked
   if (cond->type() == Item::COND_ITEM)
   {
     if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC)
@@ -19366,49 +19324,49 @@
       /* Create new top level AND item */
       Item_cond_and *new_cond=new Item_cond_and;
       if (!new_cond)
-        return (Item*) NULL;
+	return (Item*) 0;			// OOM /* purecov: inspected */
       List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
       Item *item;
       while ((item=li++))
       {
-        Item *fix=make_cond_after_sjm(root_cond, item, tables, sjm_tables);
-        if (fix)
-          new_cond->argument_list()->push_back(fix);
+	Item *fix=make_cond_after_sjm(root_cond, item, tables, sjm_tables);
+	if (fix)
+	  new_cond->argument_list()->push_back(fix);
       }
       switch (new_cond->argument_list()->elements) {
       case 0:
-        return (Item*) NULL;                     // Always true
+	return (Item*) 0;			// Always true
       case 1:
-        return new_cond->argument_list()->head();
+	return new_cond->argument_list()->head();
       default:
 	/*
-          Item_cond_and do not need fix_fields for execution, its parameters
-          are fixed or do not need fix_fields, too
+	  Item_cond_and do not need fix_fields for execution, its parameters
+	  are fixed or do not need fix_fields, too
 	*/
-        new_cond->quick_fix_field();
-        new_cond->used_tables_cache=
-          ((Item_cond_and*) cond)->used_tables_cache &
-          tables;
-        return new_cond;
+	new_cond->quick_fix_field();
+	new_cond->used_tables_cache=
+	  ((Item_cond_and*) cond)->used_tables_cache &
+	  tables;
+	return new_cond;
       }
     }
     else
-    {                                          // Or list
+    {						// Or list
       Item_cond_or *new_cond=new Item_cond_or;
       if (!new_cond)
-        return (Item*) NULL;
+	return (Item*) 0;			// OOM /* purecov: inspected */
       List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
       Item *item;
       while ((item=li++))
       {
-        Item *fix= make_cond_after_sjm(root_cond, item, tables, 0L);
-        if (!fix)
-          return (Item*) NULL;                 // Always true
-        new_cond->argument_list()->push_back(fix);
+	Item *fix= make_cond_after_sjm(root_cond, item, tables, 0L);
+	if (!fix)
+	  return (Item*) 0;			// Always true
+	new_cond->argument_list()->push_back(fix);
       }
       /*
-        Item_cond_and do not need fix_fields for execution, its parameters
-        are fixed or do not need fix_fields, too
+	Item_cond_and do not need fix_fields for execution, its parameters
+	are fixed or do not need fix_fields, too
       */
       new_cond->quick_fix_field();
       new_cond->used_tables_cache= ((Item_cond_or*) cond)->used_tables_cache;
@@ -19417,10 +19375,16 @@
     }
   }
 
-  if (cond->used_tables() & ~(tables | sjm_tables))
-    return (Item*) NULL;                       // Can't check this yet
-  if (cond->eq_cmp_result() == Item::COND_OK)
-    return cond;                               // Not a <comparison predicate>
+  /*
+    Because the following test takes a while and it can be done
+    table_count times, we mark each item that we have examined with the result
+    of the test
+  */
+
+  if (cond->marker == 3 || (cond->used_tables() & ~(tables | sjm_tables)))
+    return (Item*) 0;				// Can't check this yet
+  if (cond->marker == 2 || cond->eq_cmp_result() == Item::COND_OK)
+    return cond;				// Not boolean op
 
   /* 
     Remove equalities that are guaranteed to be true by use of 'ref' access
@@ -19433,14 +19397,17 @@
     if (left_item->type() == Item::FIELD_ITEM &&
 	test_if_ref(root_cond, (Item_field*) left_item,right_item))
     {
+      cond->marker=3;			// Checked when read
       return (Item*) 0;
     }
     if (right_item->type() == Item::FIELD_ITEM &&
 	test_if_ref(root_cond, (Item_field*) right_item,left_item))
     {
+      cond->marker=3;			// Checked when read
       return (Item*) 0;
     }
   }
+  cond->marker=2;
   return cond;
 }
 


Thread
bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3260) Bug#45174Bug#50019 Bug#52068Roy Lyseng7 Oct
  • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3260)Bug#45174 Bug#50019 Bug#52068Guilhem Bichot20 Oct
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3260)Bug#45174 Bug#50019 Bug#52068Guilhem Bichot21 Oct
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3260)Bug#45174 Bug#50019 Bug#52068Roy Lyseng21 Oct
      • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3260)Bug#45174 Bug#50019 Bug#52068Guilhem Bichot21 Oct
        • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3260)Bug#45174 Bug#50019 Bug#52068Roy Lyseng21 Oct