List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:October 25 2010 9:28am
Subject:Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265)
Bug#45174 Bug#50019 Bug#52068
View as plain text  
Hi Øystein,

thank you for the review.

On 22.10.10 15.41, Øystein Grøvlen wrote:
> Hi,
>
> Thanks for the patch. You have made a good job of understanding the
> mechanisms here and an equally good job of explaining what goes on. I
> learned much from this review.
>
> My main comments:
>
> - I am not very comfortable with the new get_first function, neither
> its name nor its location (see comments inline). I see that it is
> not easy to see what it the best compromise here within the limits
> of this spaghetti, but please consider whether there are other ways
> of doing this.

Please see answers to inline comments.
>
> - Making many different fixes in the same patch is a challenge for
> the reviewers since it is not obvious which changes leads to which
> changes in behavior (e.g., test results.) I understand when there
> is a lot of problems with a given test case, it is bit difficult to
> split things up, but I think you should consider a separate patch
> for the fix to the scanning of the materilization table (with test
> case for Bug#52068)

OK, separating out bug#52068.
>
> See inline for more comments.
>
>
> On 10/22/10 09:58 AM, Roy Lyseng wrote:
>  > #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
> revid:guilhem@stripped
>  >
>  > 3265 Roy Lyseng 2010-10-22
>  > Bug#45174: Incorrectly applied equality propagation caused wrong result
>  > on a query with a materialized semi-join.
>  > Bug#50019: Wrong result for IN-query with materialization.
>  > Bug#52068: Optimizer generates invalid semijoin materialization plan
>  >
>  > When a subquery is subject to a semijoin optimization, its tables
>  > are merged to the outer query and later are treated as regular tables.
>  > This allows a bunch of optimizations to be applied, equality
>  > propagation is among them. Equality propagation is done after query
>  > execution plan is chosen. It substitutes fields from tables being
>  > retrieved later for fields from tables being retrieved earlier.
>  > However, it can't be applied as is to any semijoin table.
>  > The semijoin materialization strategy differs from other semijoin
>  > strategies that the data from materialized semijoin tables isn't used
>  > directly but saved to a temporary table first.
>  > The materialization isn't isolated in a separate step, it is done
>  > inline within the nested loop execution.
>  > When it comes to fetching rows from the first table in the block of
>  > materialized semijoin tables, sub_select() function is called to
>  > materialize the result of the subquery and save it in the
>  > materialized table. Later, data from the materialized table is used
>  > as if they were regular table rows.
>  > Due to this we can't substitute fields that belong to the semi-join
>  > for fields from outer query and vice versa.
>  >
>
> The above explanation seems unecessary detailed/complex to me. I
> cannot see how doing materialization in a separate step would help
> here.

I will trim the text a little.
>
>  > Example: suppose we have a join order:
>  >
>  > ot1 ot2 SJ-Mat(it1 it2 it3) ot3
>  >
>  > and equality ot2.col = it1.col = it2.col
>  > If we're looking for best substitute for 'it2.col', we should pick
>  > it1.col and not ot2.col.
>  >
>  > For a field that is not in a materialized semijoin we can use any field,
>  > even those that are embedded in a materialized semijoin. This is because
>  > such fields are "copied back" to their original join-tab structures when
>  > the materialized temporary table is being read.
>  >
>  > Now we have added another Item_equal::get_first() function that accepts
>  > as a parameter a field being substituted and checks whether it belongs
>  > to a materialized semijoin.
>  > The field to substitute will be from the same materialized semijoin nest
>  > (if supplied field is within such nest), otherwise it will be the first
>  > field in the multiple equality.
>  >
>  > The new checks rely on the first_sj_inner_tab and first_sj_inner_tab
>  > fields of the join-tab. These fields are therefore set as soon as
>  > possible after the join strategy is fixed.
>
> Maybe you could mentioned that they before only were used for
> DuplicateWeedout strategy.

Ok
>
>  >
>  > Also fixed problem appearing in Bug#52068: When MaterializeScan
>  > semijoin strategy was used and there were one or more outer dependent
>  > tables before the semijoin tables, the scan over the materialized
>  > table was not properly reset for each row of the prefix outer tables.
>  >
>  > Also fixed problems with pushdown of SJM-aware predicates during
>  > make_join_select(): wrong predicates were sometimes generated,
>  > make_cond_after_sjm() was called at the wrong position, and
>  > make_cond_after_sjm() was never actually considering the pushed-down
>  > SJM predicates.
>  >
>  > mysql-test/include/subquery_sj.inc
>  > Bug#45174: Incorrectly applied equality propagation caused wrong result
>  > on a query with a materialized semi-join.
>  > Bug#50019: Wrong result for IN-query with materialization.
>  > Bug#52068: Optimizer generates invalid semijoin materialization plan
>  > Added new tests.
>  >
>  > mysql-test/r/subquery_mat_all.result
>  > mysql-test/r/subquery_sj_all.result
>  > mysql-test/r/subquery_sj_all_jcl6.result
>  > mysql-test/r/subquery_sj_all_jcl7.result
>  > mysql-test/r/subquery_sj_dupsweed.result
>  > mysql-test/r/subquery_sj_dupsweed_jcl6.result
>  > mysql-test/r/subquery_sj_dupsweed_jcl7.result
>  > mysql-test/r/subquery_sj_firstmatch.result
>  > mysql-test/r/subquery_sj_firstmatch_jcl6.result
>  > mysql-test/r/subquery_sj_firstmatch_jcl7.result
>  > mysql-test/r/subquery_sj_loosescan.result
>  > mysql-test/r/subquery_sj_loosescan_jcl6.result
>  > mysql-test/r/subquery_sj_loosescan_jcl7.result
>  > mysql-test/r/subquery_sj_mat.result
>  > mysql-test/r/subquery_sj_mat_jcl6.result
>  > mysql-test/r/subquery_sj_mat_jcl7.result
>  > mysql-test/r/subquery_sj_mat_nosj.result
>  > mysql-test/r/subquery_sj_none.result
>  > mysql-test/r/subquery_sj_none_jcl6.result
>  > mysql-test/r/subquery_sj_none_jcl7.result
>  > Bug#45174: Incorrectly applied equality propagation caused wrong result
>  > on a query with a materialized semi-join.
>  > Bug#50019: Wrong result for IN-query with materialization.
>  > Bug#52068: Optimizer generates invalid semijoin materialization plan
>  > Results for three new tests added.
>  > Some tests using semijoin materialization show that where clause
>  > has moved from the outer query into the materialized inner query.
>  > This is caused by the changed call to get_first() in
>  > eliminate_item_equal().
>  > Ex: select * from ot where a in(select b from it where b>0);
>  > The clause "b>0" is now evaluated on the inner query materialization.
>  > Performance-wise this is never worse when using MaterializeScan and
>  > usually better for MaterializeLookup. For the latter strategy, the
>
> I had to read the above sentence a few times before understanding the
> meaning of it. Never worse than what? I assume than before. Does
> usually better mean "usually better, but sometimes worse" or "usually
> better, and never worse"?

Rephrased.
>
>  > best possible solution is probably to evaluate the clause in both
>  > queries, this can be subject for a later feature development.
>  > Another test that applies the same condition to both the outer and
>  > the inner query is added, to show the plan for such types of queries.
>  > Earlier wrong semijoin materialization test results are corrected.
>  >
>  > sql/item.cc
>  > Bug#45174: Incorrectly applied equality propagation caused wrong result
>  > on a query with a materialized semi-join.
>  >
>  > Calling new get_first() function instead of old.
>  >
>  > sql/item_cmpfunc.cc
>  > Bug#45174: Incorrectly applied equality propagation caused wrong result
>  > on a query with a materialized semi-join.
>  >
>  > New function Item_equal::get_first() that accepts as argument
>  > a field being substituted.
>  >
>  > sql/item_cmpfunc.h
>  > Bug#45174: Incorrectly applied equality propagation caused wrong result
>  > on a query with a materialized semi-join.
>  >
>  > New function Item_equal::get_first() that accepts as argument
>  > a field being substituted.
>  >
>  > sql/sql_select.cc
>  > Bug#45174: Incorrectly applied equality propagation caused wrong result
>  > on a query with a materialized semi-join.
>  > Bug#50019: Wrong result for IN-query with materialization.
>  > Bug#52068: Optimizer generates invalid semijoin materialization plan
>  >
>  > Setting fields first_sj_inner_tab and last_sj_inner_tab moved from
>  > setup_semijoin_dups_elimination() to get_best_combination(), so they
>  > are set as early as possible after join order optimization.
>  >
>  > In make_join_select(), the test that determined when to pushdown
>  > SJM-specific predicates was wrong, in addition to improving the
>  > comments.
>  >
>  > The logic of eliminate_item_equal() has been simplified and
>  > adjusted so that it generates equalities that are useful also
>  > when the semijoin materialization strategy is being used.
>  > Some simplification was possible by taking advantage of the new
>  > Item_equal::get_first() function.
>  >
>  > In sub_select_sjm(), moved code that initializes the scan over the
>  > materialized table so that it is now performed for each scan of
>  > table, instead of only for the first scan.
>  >
>  > In make_cond_for_table_from_pred(), a number of comments has been
>  > added, and TAB characters are replaced by spaces.
>  >
>  > In make_cond_after_sjm(), make sure that it handles equalities
>  > generated for semijoin materialization (with marker=3).
>  > Added comments and removed TAB characters.
>  >
>  > modified:
>  > mysql-test/include/subquery_sj.inc
>  > mysql-test/r/subquery_mat_all.result
>  > mysql-test/r/subquery_sj_all.result
>  > mysql-test/r/subquery_sj_all_jcl6.result
>  > mysql-test/r/subquery_sj_all_jcl7.result
>  > mysql-test/r/subquery_sj_dupsweed.result
>  > mysql-test/r/subquery_sj_dupsweed_jcl6.result
>  > mysql-test/r/subquery_sj_dupsweed_jcl7.result
>  > mysql-test/r/subquery_sj_firstmatch.result
>  > mysql-test/r/subquery_sj_firstmatch_jcl6.result
>  > mysql-test/r/subquery_sj_firstmatch_jcl7.result
>  > mysql-test/r/subquery_sj_loosescan.result
>  > mysql-test/r/subquery_sj_loosescan_jcl6.result
>  > mysql-test/r/subquery_sj_loosescan_jcl7.result
>  > mysql-test/r/subquery_sj_mat.result
>  > mysql-test/r/subquery_sj_mat_jcl6.result
>  > mysql-test/r/subquery_sj_mat_jcl7.result
>  > mysql-test/r/subquery_sj_mat_nosj.result
>  > mysql-test/r/subquery_sj_none.result
>  > mysql-test/r/subquery_sj_none_jcl6.result
>  > mysql-test/r/subquery_sj_none_jcl7.result
>  > sql/item.cc
>  > sql/item_cmpfunc.cc
>  > sql/item_cmpfunc.h
>  > sql/sql_select.cc
>  > === modified file 'mysql-test/include/subquery_sj.inc'
>  > --- a/mysql-test/include/subquery_sj.inc 2010-10-13 13:27:36 +0000
>  > +++ b/mysql-test/include/subquery_sj.inc 2010-10-22 07:57:40 +0000
>  > @@ -393,6 +393,9 @@ insert into t3 select A.a + 10*B.a from
>  > explain select * from t3 where a in (select kp1 from t1 where kp1<20);
>  > select * from t3 where a in (select kp1 from t1 where kp1<20);
>  >
>  > +explain select * from t3 where a in (select kp1 from t1 where kp1<20) and
> a<20;
>  > +select * from t3 where a in (select kp1 from t1 where kp1<20) and a<20;
>  > +
>  > create table t4 (pk int primary key);
>  > insert into t4 select a from t3;
>  >
>  > @@ -637,11 +640,9 @@ drop table t0, t1,t2,t3;
>  >
>  >
>  > --echo
>  > ---echo Test that MaterializeLookup strategy for semijoin,
>  > +--echo Test that neither MaterializeLookup strategy for semijoin,
>  > --echo nor subquery materialization is used when BLOBs are involved
>  > --echo (except when arguments of some functions).
>  > ---echo Note: Due to Bug#52068, wrong may occur below if MaterializeScan
>  > ---echo strategy is used instead,
>  > --echo
>  > set @prefix_len = 6;
>  >
>  > @@ -3371,3 +3372,89 @@ WHERE (a1, a2) IN (
>  > DROP TABLE t1,t2a,t2b,t2c;
>  >
>  > --echo # End BUG#52329
>  > +
>  > +--echo #
>  > +--echo # Bug#45174: Incorrectly applied equality propagation caused wrong
>  > +--echo # result on a query with a materialized semi-join.
>  > +--echo #
>  > +
>  > +CREATE TABLE t1 (
>  > + varchar_nokey varchar(1) NOT NULL
>  > +);
>  > +
>  > +INSERT INTO t1 VALUES
>  > + ('v'), ('u'), ('n'), ('l'), ('h'), ('u'), ('n'), ('j'), ('k'),
>  > + ('e'), ('i'), ('u'), ('n'), ('b'), ('x'), (''), ('q'), ('u');
>  > +
>  > +CREATE TABLE t2 (
>  > + pk int NOT NULL,
>  > + varchar_key varchar(1) NOT NULL,
>  > + varchar_nokey varchar(1) NOT NULL,
>  > + PRIMARY KEY(pk),
>  > + KEY varchar_key(varchar_key)
>  > +);
>  > +
>  > +INSERT INTO t2 VALUES
>  > + (11,'m','m'), (12,'j','j'), (13,'z','z'), (14,'a','a'), (15,'',''),
>  > + (16,'e','e'), (17,'t','t'), (19,'b','b'), (20,'w','w'), (21,'m','m'),
>  > + (23,'',''), (24,'w','w'), (26,'e','e'), (27,'e','e'), (28,'p','p');
>  > +
>  > +let $query=
>  > +SELECT varchar_nokey
>  > +FROM t1
>  > +WHERE (varchar_nokey, varchar_nokey) IN (SELECT varchar_key, varchar_nokey
>  > + FROM t2
>  > + WHERE varchar_nokey< 'n' XOR pk);
>
> I feel the XOR part is an unecessary distraction here. AFAICT, you
> will get the same error with just "WHERE varchar_nokey < pk". Ref. new
> duplicate BUG#57421.

I am not getting the error without the XOR, so I prefer to keep it (even though 
the XOR is odd...)
>
> ...
>
>  > === modified file 'sql/item.cc'
>  > --- a/sql/item.cc 2010-10-15 10:32:50 +0000
>  > +++ b/sql/item.cc 2010-10-22 07:57:40 +0000
>  > @@ -5075,8 +5075,8 @@ Item *Item_field::replace_equal_field(uc
>  > return this;
>  > return const_item;
>  > }
>  > - Item_field *subst= item_equal->get_first();
>  > - if (subst&& field->table != subst->field->table&&
> !field->eq(subst->field))
>  > + Item_field *subst= item_equal->get_first(this);
>  > + if (field->table != subst->field->table&&
> !field->eq(subst->field))
>  > return subst;
>
> Since it is no longer checked that subst is not null, maybe an assert
> should be added?

Ok.
>
>  > }
>  > return this;
>  >
>  > === modified file 'sql/item_cmpfunc.cc'
>  > --- a/sql/item_cmpfunc.cc 2010-10-15 10:32:50 +0000
>  > +++ b/sql/item_cmpfunc.cc 2010-10-22 07:57:40 +0000
>  > @@ -5840,3 +5840,90 @@ void Item_equal::print(String *str, enum
>  > str->append(')');
>  > }
>  >
>  > +
>  > +/**
>  > + Get the first equal field of multiple equality, use when execution order
>  > + must be considered (see details below).
>  > +
>  > + @param field field to get substitution field for, which must be
>  > + present within the multiple equality itself.
>
> Actually, field is just used to identify the correct semijoin nest.
> It does not have to be present within the multiple equality itself to
> work correctly. Maybe it would be better to generalize this function
> to take JOIN_TAB* as parameter. That is, if the passed in JOIN_TAB is
> part of a materialization, return the first field from that semijoin
> nest. Otherwise, return first field.

I think it is a good thing that the interface of Item_equal only depends on Item 
objects. Making the interface know about join-tabs is not good, IMHO. It is 
sufficient that the implementation needs to deal with them ;)
>
>  > +
>  > + @retval Found first field in the multiple equality.
>  > +
>  > + @details Get the first field of multiple equality that is equal to the
>  > + given field. In order to make semijoin materialization strategy work
>
> The "that is equal to the given field" seems unecessary since all fields of
> the multiple equality should be equal to this field.

Ok.
>
>  > + correctly we can't propagate equal fields between a materialized semijoin
>  > + and the outer query (or any other semijoin) unconditionally.
>  > + Thus the field is returned according to following rules:
>  > +
>  > + 1) If the given field belongs to a materialized semijoin then the
>  > + first field in multiple equality which belong to the same semijoin
>  > + is returned.
>  > + 2) If the given field doesn't belong to a materialized semijoin then
>  > + the first field in the multiple equality is returned.
>  > +*/
>  > +
>  > +Item_field* Item_equal::get_first(const Item_field *field)
>
> I am neither comfortable with the name nor the location of this
> function. Since this function does something different from the
> existing get_first, I think it should have another name. I am also a
> bit concerned with making Item classes dependent on JOIN_TAB (except
> for a few methods of Item_subselect, this is the first time JOIN_TAB
> is used in Item classes) and join strategies. Maybe a free function
> in sql_select.cc that is friend of Item_equal is a better choice.
> (Like eliminate_item_equal() today.)

I have renamed it to get_subst_item().

Because the implementation of equality substitution is dependent on join order, 
I do not have a big concern about depending the implementation on join-tabs. 
However, if we we refactored materialized semijoin into using multiple multiple 
equalities for materialized semijoin, this function would no longer be needed.

I could let the implementation work in the context of sj-nest objects, but I 
will still have to go through join-tab pointer to get there. Having a table-list 
pointer inside Item_field could fix this, but I think this is refactoring food...
>
>
>  > +{
>  > + DBUG_ASSERT(field != NULL);
>  > +
>  > + const JOIN_TAB *field_tab= field->field->table->reginfo.join_tab;
>  > +
>  > + if (sj_is_materialize_strategy(field_tab->get_sj_strategy()))
>  > + {
>  > + /*
>  > + It's a field from a materialized semijoin. We can substitute it only
>  > + with a field from the same semijoin.
>  > +
>  > + Example: suppose we have a join order:
>  > +
>  > + ot1 ot2 SJM(it1 it2 it3) ot3
>  > +
>  > + and equality ot2.col = it1.col = it2.col
>  > +
>  > + If we're looking for best substitute for 'it2.col', we must pick it1.col
>  > + and not ot2.col. it2.col is evaluated while performing materialization,
>  > + when the outer tables are not available in the execution.
>  > + */
>  > + List_iterator<Item_field> it(fields);
>  > + Item_field *item;
>  > + const JOIN_TAB *first= field_tab->first_sj_inner_tab;
>  > + const JOIN_TAB *last= field_tab->last_sj_inner_tab;
>  > +
>  > + while ((item= it++))
>  > + {
>  > + if (item->field->table->reginfo.join_tab>= first&&
>  > + item->field->table->reginfo.join_tab<= last)
>  > + {
>  > + return item;
>  > + }
>  > + }
>  > + }
>  > + else
>  > + {
>  > + /*
>  > + The field is not in a materialized semijoin nest. We can return
>  > + the first field in the multiple equality.
>  > +
>  > + Example: suppose we have a join order with MaterializeLookup:
>  > +
>  > + ot1 ot2 SJM-Lookup(it1 it2)
>  > +
>  > + Here we should always pick the first field in the multiple equality,
>  > + as this will be present before all other dependent fields.
>  > +
>  > + Example: suppose we have a join order with MaterializeScan:
>  > +
>  > + SJM-Scan(it1 it2) ot1 ot2
>  > +
>  > + and equality ot2.col = ot1.col = it2.col.
>  > +
>  > + When looking for best substitute for 'ot2.col', we can pick it2.col,
>  > + because when we run the scan, column values from the inner materialized
>  > + tables will be copied back to the column buffers for it1 and it2.
>  > + */
>  > + return fields.head();
>  > + }
>  > + DBUG_ASSERT(FALSE); // Should never get here.
>  > + return NULL;
>  > +}
>  >
>  > === modified file 'sql/item_cmpfunc.h'
>  > --- a/sql/item_cmpfunc.h 2010-10-15 10:32:50 +0000
>  > +++ b/sql/item_cmpfunc.h 2010-10-22 07:57:40 +0000
>  > @@ -1637,7 +1637,13 @@ public:
>  > void add(Item_field *f);
>  > uint members();
>  > bool contains(Field *field);
>  > + /**
>  > + Get the first field of multiple equality, use for semantic checking.
>  > +
>  > + @retval First field in the multiple equality.
>  > + */
>  > Item_field* get_first() { return fields.head(); }
>  > + Item_field* get_first(const Item_field *field);
>  > void merge(Item_equal *item);
>  > void update_const();
>  > enum Functype functype() const { return MULT_EQUAL_FUNC; }
>  >
>  > === modified file 'sql/sql_select.cc'
>  > --- a/sql/sql_select.cc 2010-10-21 15:44:35 +0000
>  > +++ b/sql/sql_select.cc 2010-10-22 07:57:40 +0000
>  > @@ -1414,13 +1414,16 @@ bool might_do_join_buffering(uint join_c
>  > setup_sj_materialization() (todo: can't we move that to here also?)
>  > */
>  >
>  > -int setup_semijoin_dups_elimination(JOIN *join, ulonglong options,
>  > - uint no_jbuf_after)
>  > +bool setup_semijoin_dups_elimination(JOIN *join, ulonglong options,
>  > + uint no_jbuf_after)
>  > {
>  > uint tableno;
>  > THD *thd= join->thd;
>  > DBUG_ENTER("setup_semijoin_dups_elimination");
>  >
>  > + if (join->select_lex->sj_nests.is_empty())
>  > + DBUG_RETURN(FALSE);
>  > +
>  > for (tableno= join->const_tables ; tableno< join->tables; )
>  > {
>  > JOIN_TAB *tab=join->join_tab + tableno;
>  > @@ -1661,21 +1664,6 @@ int setup_semijoin_dups_elimination(JOIN
>  > break;
>  > }
>  > }
>  > - /*
>  > - Remember the first and last semijoin inner tables; this serves to tell
>  > - a JOIN_TAB's semijoin strategy (like in check_join_cache_usage()).
>  > - */
>  > - JOIN_TAB *last_sj_inner=
>  > - (pos->sj_strategy == SJ_OPT_DUPS_WEEDOUT) ?
>  > - /* Range may end with non-inner table so cannot set last_sj_inner_tab */
>  > - NULL : last_sj_tab;
>  > - for (JOIN_TAB *tab_in_range= tab;
>  > - tab_in_range<= last_sj_tab;
>  > - tab_in_range++)
>  > - {
>  > - tab_in_range->first_sj_inner_tab= tab;
>  > - tab_in_range->last_sj_inner_tab= last_sj_inner;
>  > - }
>  > }
>  > DBUG_RETURN(FALSE);
>  > }
>  > @@ -4253,14 +4241,18 @@ bool find_eq_ref_candidate(TABLE *table,
>  > using Materialization or LooseScan to execute it.
>  >
>  > RETURN
>  > - 0 - OK
>  > - 1 - Out of memory error
>  > + FALSE - OK
>  > + TRUE - Out of memory error
>  > */
>  >
>  > -int pull_out_semijoin_tables(JOIN *join)
>  > +bool pull_out_semijoin_tables(JOIN *join)
>  > {
>  > TABLE_LIST *sj_nest;
>  > DBUG_ENTER("pull_out_semijoin_tables");
>  > +
>  > + if (join->select_lex->sj_nests.is_empty())
>  > + DBUG_RETURN(FALSE);
>  > +
>  > List_iterator<TABLE_LIST> sj_list_it(join->select_lex->sj_nests);
>  >
>  > /* Try pulling out of the each of the semi-joins */
>  > @@ -4343,7 +4335,12 @@ int pull_out_semijoin_tables(JOIN *join)
>  > pointers.
>  > */
>  > child_li.remove();
>  > - upper_join_list->push_back(tbl);
>  > + if (upper_join_list->push_back(tbl))
>  > + {
>  > + if (arena)
>  > + join->thd->restore_active_arena(arena,&backup);
>  > + DBUG_RETURN(TRUE);
>  > + }
>  > tbl->join_list= upper_join_list;
>  > tbl->embedding= sj_nest->embedding;
>  > }
>  > @@ -4364,7 +4361,7 @@ int pull_out_semijoin_tables(JOIN *join)
>  > join->thd->restore_active_arena(arena,&backup);
>  > }
>  > }
>  > - DBUG_RETURN(0);
>  > + DBUG_RETURN(FALSE);
>  > }
>  >
>  >
>  > @@ -8426,15 +8423,18 @@ prev_record_reads(JOIN *join, uint idx,
>  >
>  > static bool fix_semijoin_strategies_for_picked_join_order(JOIN *join)
>  > {
>  > - uint tablenr;
>  > + uint tableno;
>
> I suggest declarin the above within each loop. It does not seem to be
> referred outside of the loops.

Good idea.
>
>  > table_map remaining_tables= 0;
>  > table_map handled_tabs= 0;
>  >
>  > DBUG_ENTER("fix_semijoin_strategies_for_picked_join_order");
>  >
>  > - for (tablenr= join->tables - 1 ; tablenr != join->const_tables - 1;
> tablenr--)
>  > + if (join->select_lex->sj_nests.is_empty())
>  > + DBUG_RETURN(FALSE);
>  > +
>  > + for (tableno= join->tables - 1; tableno != join->const_tables - 1;
> tableno--)
>  > {
>  > - POSITION *pos= join->best_positions + tablenr;
>  > + POSITION *pos= join->best_positions + tableno;
>  > JOIN_TAB *s= pos->table;
>  > TABLE_LIST *emb_sj_nest= s->emb_sj_nest;
>  > uint first;
>  > @@ -8471,7 +8471,7 @@ static bool fix_semijoin_strategies_for_
>  > */
>  > memcpy(pos - table_count + 1, emb_sj_nest->nested_join->sjm.positions,
>  > sizeof(POSITION) * table_count);
>  > - first= tablenr - table_count + 1;
>  > + first= tableno - table_count + 1;
>  > join->best_positions[first].n_sj_tables= table_count;
>  > join->best_positions[first].sj_strategy= SJ_OPT_MATERIALIZE_LOOKUP;
>  >
>  > @@ -8507,14 +8507,13 @@ static bool fix_semijoin_strategies_for_
>  > /* Add materialization record count*/
>  > prefix_rec_count *= mat_sj_nest->nested_join->sjm.expected_rowcount;
>  >
>  > - uint i;
>  > table_map rem_tables= remaining_tables;
>  > - for (i= tablenr; i != (first + table_count - 1); i--)
>  > + for (uint i= tableno; i != (first + table_count - 1); i--)
>  > rem_tables |= join->best_positions[i].table->table->map;
>  >
>  > POSITION dummy;
>  > join->cur_sj_inner_tables= 0;
>  > - for (i= first + table_count; i<= tablenr; i++)
>  > + for (uint i= first + table_count; i<= tableno; i++)
>  > {
>  > best_access_path(join, join->best_positions[i].table, rem_tables, i, FALSE,
>  > prefix_rec_count, join->best_positions + i,&dummy);
>  > @@ -8528,14 +8527,14 @@ static bool fix_semijoin_strategies_for_
>  > {
>  > first= pos->first_firstmatch_table;
>  > join->best_positions[first].sj_strategy= SJ_OPT_FIRST_MATCH;
>  > - join->best_positions[first].n_sj_tables= tablenr - first + 1;
>  > + join->best_positions[first].n_sj_tables= tableno - first + 1;
>  > POSITION dummy; // For loose scan paths
>  > double record_count= (first== join->const_tables)? 1.0:
>  > - join->best_positions[tablenr - 1].prefix_record_count;
>  > + join->best_positions[tableno - 1].prefix_record_count;
>  >
>  > table_map rem_tables= remaining_tables;
>  > - uint idx;
>  > - for (idx= first; idx<= tablenr; idx++)
>  > +
>  > + for (uint idx= first; idx<= tableno; idx++)
>  > {
>  > rem_tables |= join->best_positions[idx].table->table->map;
>  > }
>  > @@ -8544,7 +8543,7 @@ static bool fix_semijoin_strategies_for_
>  > join buffering
>  > */
>  > join->cur_sj_inner_tables= 0;
>  > - for (idx= first; idx<= tablenr; idx++)
>  > + for (uint idx= first; idx<= tableno; idx++)
>  > {
>  > if (join->best_positions[idx].use_join_buffer)
>  > {
>  > @@ -8562,18 +8561,18 @@ static bool fix_semijoin_strategies_for_
>  > POSITION *first_pos= join->best_positions + first;
>  > POSITION loose_scan_pos; // For loose scan paths
>  > double record_count= (first== join->const_tables)? 1.0:
>  > - join->best_positions[tablenr - 1].prefix_record_count;
>  > + join->best_positions[tableno - 1].prefix_record_count;
>  >
>  > table_map rem_tables= remaining_tables;
>  > - uint idx;
>  > - for (idx= first; idx<= tablenr; idx++)
>  > +
>  > + for (uint idx= first; idx<= tableno; idx++)
>  > rem_tables |= join->best_positions[idx].table->table->map;
>  > /*
>  > Re-run best_access_path to produce best access methods that do not use
>  > join buffering
>  > */
>  > join->cur_sj_inner_tables= 0;
>  > - for (idx= first; idx<= tablenr; idx++)
>  > + for (uint idx= first; idx<= tableno; idx++)
>  > {
>  > if (join->best_positions[idx].use_join_buffer || (idx == first))
>  > {
>  > @@ -8598,7 +8597,7 @@ static bool fix_semijoin_strategies_for_
>  > */
>  > first= pos->first_dupsweedout_table;
>  > join->best_positions[first].sj_strategy= SJ_OPT_DUPS_WEEDOUT;
>  > - join->best_positions[first].n_sj_tables= tablenr - first + 1;
>  > + join->best_positions[first].n_sj_tables= tableno - first + 1;
>  > }
>  >
>  > uint i_end= first + join->best_positions[first].n_sj_tables;
>  > @@ -8613,7 +8612,7 @@ static bool fix_semijoin_strategies_for_
>  > handled_tabs |= join->best_positions[i].table->table->map;
>  > }
>  >
>  > - if (tablenr != first)
>  > + if (tableno != first)
>  > pos->sj_strategy= SJ_OPT_NONE;
>  > remaining_tables |= s->table->map;
>  > }
>  > @@ -8650,16 +8649,13 @@ static bool fix_semijoin_strategies_for_
>  >
>  > static bool get_best_combination(JOIN *join)
>  > {
>  > - uint i,tablenr;
>  > table_map used_tables;
>  > - JOIN_TAB *join_tab,*j;
>  > KEYUSE *keyuse;
>  > - uint table_count;
>  > + const uint table_count= join->tables;
>  > THD *thd=join->thd;
>  > DBUG_ENTER("get_best_combination");
>  >
>  > - table_count=join->tables;
>  > - if (!(join->join_tab= join_tab= new (thd->mem_root)
> JOIN_TAB[table_count]))
>  > + if (!(join->join_tab= new (thd->mem_root) JOIN_TAB[table_count]))
>  > DBUG_RETURN(TRUE);
>  >
>  > join->full_join=0;
>  > @@ -8669,11 +8665,12 @@ static bool get_best_combination(JOIN *j
>  > if (fix_semijoin_strategies_for_picked_join_order(join))
>  > DBUG_RETURN(TRUE);
>  >
>  > - for (j=join_tab, tablenr=0 ; tablenr< table_count ; tablenr++,j++)
>  > + for (uint tableno= 0; tableno< table_count; tableno++)
>  > {
>  > + JOIN_TAB *j= join->join_tab + tableno;
>  > TABLE *form;
>  > - *j= *join->best_positions[tablenr].table;
>  > - form=join->all_tables[tablenr]=j->table;
>  > + *j= *join->best_positions[tableno].table;
>  > + form=join->all_tables[tableno]= j->table;
>  > used_tables|= form->map;
>  > form->reginfo.join_tab=j;
>  > if (!*j->on_expr_ref)
>  > @@ -8683,31 +8680,69 @@ static bool get_best_combination(JOIN *j
>  > if (j->type == JT_CONST)
>  > continue; // Handled in make_join_stat..
>  >
>  > -
>  > j->loosescan_match_tab= NULL; //non-nulls will be set later
>  > j->ref.key = -1;
>  > j->ref.key_parts=0;
>  >
>  > -
>  > if (j->type == JT_SYSTEM)
>  > continue;
>  >
>  > - if (j->keys.is_clear_all() || !(keyuse=
> join->best_positions[tablenr].key) ||
>  > - (join->best_positions[tablenr].sj_strategy == SJ_OPT_LOOSE_SCAN))
>  > + if (j->keys.is_clear_all() ||
>  > + !(keyuse= join->best_positions[tableno].key) ||
>
> IMHO, the above line should be indented one more character.

Right.
>
>  > + (join->best_positions[tableno].sj_strategy == SJ_OPT_LOOSE_SCAN))
>  > {
>  > j->type=JT_ALL;
>  > - j->index= join->best_positions[tablenr].loosescan_key;
>  > - if (tablenr != join->const_tables)
>  > + j->index= join->best_positions[tableno].loosescan_key;
>  > + if (tableno != join->const_tables)
>  > join->full_join=1;
>  > }
>  > else if (create_ref_for_key(join, j, keyuse, used_tables))
>  > DBUG_RETURN(TRUE); // Something went wrong
>  > }
>  >
>  > - for (i=0 ; i< table_count ; i++)
>  > -
> join->map2table[join->join_tab[i].table->tablenr]=join->join_tab+i;
>  > + for (uint tableno= 0; tableno< table_count; tableno++)
>  > + join->map2table[join->join_tab[tableno].table->tablenr]=
>  > + join->join_tab + tableno;
>  > +
>  > update_depend_map(join);
>  > - DBUG_RETURN(0);
>  > +
>
> I think the purpose of the loop below should be explained in a comment
> above the love. (Or the entire loop could be put into a separate
> function and explained in a function comment.)

Comment added. Separating into functions could be a job for refactoring of JOIN. 
I could do it now, but refactoring would be better in with regard to consistent 
naming, placement, etc...
>
>  > + for (uint tableno= join->const_tables; tableno< table_count; )
>  > + {
>  > + JOIN_TAB *tab= join->join_tab + tableno;
>  > + const POSITION *pos= join->best_positions + tableno;
>  > +
>  > + switch (pos->sj_strategy)
>  > + {
>  > + case SJ_OPT_NONE:
>  > + tableno++;
>  > + break;
>  > + case SJ_OPT_MATERIALIZE_LOOKUP:
>  > + case SJ_OPT_MATERIALIZE_SCAN:
>  > + case SJ_OPT_LOOSE_SCAN:
>  > + case SJ_OPT_DUPS_WEEDOUT:
>  > + case SJ_OPT_FIRST_MATCH:
>  > + /*
>  > + Remember the first and last semijoin inner tables; this serves to tell
>  > + a JOIN_TAB's semijoin strategy (like in check_join_cache_usage()).
>  > + */
>  > + JOIN_TAB *last_sj_tab= tab + pos->n_sj_tables - 1;
>  > + JOIN_TAB *last_sj_inner=
>  > + (pos->sj_strategy == SJ_OPT_DUPS_WEEDOUT) ?
>  > + /* Range may end with non-inner table so cannot set last_sj_inner_tab */
>  > + NULL : last_sj_tab;
>  > + for (JOIN_TAB *tab_in_range= tab;
>  > + tab_in_range<= last_sj_tab;
>  > + tab_in_range++)
>  > + {
>  > + tab_in_range->first_sj_inner_tab= tab;
>  > + tab_in_range->last_sj_inner_tab= last_sj_inner;
>  > + }
>  > + tableno+= pos->n_sj_tables;
>  > + break;
>  > + }
>  > + }
>  > +
>  > + DBUG_RETURN(FALSE);
>  > }
>  >
>  >
>  > @@ -9350,6 +9385,22 @@ static bool pushdown_on_conditions(JOIN*
>  > }
>  >
>  >
>  > +/**
>  > + Separates the predicates in a join condition and pushes them to the
>  > + join step where all involved tables are available in the join prefix.
>  > + ON clauses from JOIN expressions are also pushed to the most appropriate
> step.
>  > +
>  > + @param join Join object where predicates are pushed.
>  > +
>  > + @param cond Pointer to condition which may contain an arbitrary number of
>  > + predicates, combined using AND, OR and XOR items.
>  > + If NULL, equivalent to a predicate that returns TRUE for all
>  > + row combinations.
>  > +
>  > + @retval TRUE if condition makes query always false OR an error occurred.
>  > + @retval FALSE otherwise.
>  > +*/
>  > +
>  > static bool make_join_select(JOIN *join, Item *cond)
>  > {
>  > THD *thd= join->thd;
>  > @@ -9445,8 +9496,7 @@ static bool make_join_select(JOIN *join,
>  > - If we're looking at the first SJM table, reset used_tables
>  > to refer to only allowed tables
>  > */
>  > - if (tab->emb_sj_nest&&
>  > - tab->emb_sj_nest->sj_mat_exec&&
>  > + if (sj_is_materialize_strategy(tab->get_sj_strategy())&&
>  > !(used_tables& tab->emb_sj_nest->sj_inner_tables))
>  > {
>  > save_used_tables= used_tables;
>  > @@ -9689,32 +9739,38 @@ static bool make_join_select(JOIN *join,
>  >
>  > DBUG_ASSERT(save_used_tables ? tab->emb_sj_nest != NULL : TRUE);
>  >
>  > - if (save_used_tables&& !(used_tables&
>  > - ~(tab->emb_sj_nest->sj_inner_tables |
>  > - join->const_table_map | PSEUDO_TABLE_BITS)))
>  > + /*
>  > + 1. We are inside a materialized semijoin nest, and
>  > + 2. All inner tables of the nest are covered.
>  > + */
>  > + if (save_used_tables&& // 1
>  > + !(tab->emb_sj_nest->sj_inner_tables& ~used_tables)) // 2
>
> Would it not be an idea to create a subset function soon so that we do
> not have to twist our minds to understand this each time. This
> example shows that even an experienced MySQL developer had problems
> getting this right.

Yes agree. But I think that is better done as a refactoring task that handles 
all such cases within the optimizer. Perhaps converting table_map to a class 
that hides the implementation bit-map field and only exposes the functions would 
be a good idea.
>
>  > {
>  > /*
>  > - We have reached the end of semi join nest. That is, the join order
>  > - looks like this:
>  > + The join order now looks like this:
>  >
>  > - outer_tbl1 SJ-Materialize(inner_tbl1 ... inner_tblN) outer_tbl ...
>  > - ^
>  > - \-we're here
>  > - At this point, we need to produce two conditions
>  > - - A condition that can be checked when we have all of the sj-inner
>  > - tables (inner_tbl1 ... inner_tblN). This will be used while doing
>  > - materialization.
>  > - - A condition that can be checked when we have all of the tables
>  > - in the prefix (both inner and outer).
>  > + ot1 ... otI SJM(it1 ... itN) otI+1 ... otM
>  > + ^
>  > + \-we're here
>  > + At this point, we have generated a condition that can be checked
>  > + when we have all of the sj-inner tables (it1 ... itN).
>  > + This will be used while doing materialization.
>  > +
>  > + In addition, we need a condition that can be checked when we have
>  > + all of the tables in the prefix (both inner and outer).
>  > + This condition is only generated (and used) when we have an SJM-scan
>  > + operation. For SJM-lookup, the condition is completely fulfilled
>  > + through the lookup into the materialized table.
>  > + This constraint will last as long as we do not allow correlated
>  > + subqueries with materialized semijoin execution.
>  > */
>  > - tab->emb_sj_nest->sj_mat_exec->join_cond=
>  > - cond ?
>  > - make_cond_after_sjm(cond, cond, save_used_tables, used_tables):
>  > - NULL;
>  > + if (cond&& tab->emb_sj_nest->sj_mat_exec->is_scan)
>  > + tab->emb_sj_nest->sj_mat_exec->join_cond=
>  > + make_cond_after_sjm(cond, cond, save_used_tables, used_tables);
>  > +
>  > used_tables= save_used_tables | used_tables;
>  > save_used_tables= 0;
>  > }
>  > -
>  > }
>  > }
>  > DBUG_RETURN(0);
>  > @@ -10877,8 +10933,7 @@ make_join_readinfo(JOIN *join, ulonglong
>  > uint last_sjm_table= MAX_TABLES;
>  > DBUG_ENTER("make_join_readinfo");
>  >
>  > - if (!join->select_lex->sj_nests.is_empty()&&
>  > - setup_semijoin_dups_elimination(join, options, no_jbuf_after))
>  > + if (setup_semijoin_dups_elimination(join, options, no_jbuf_after))
>  > DBUG_RETURN(TRUE); /* purecov: inspected */
>  >
>  > for (i=join->const_tables ; i< join->tables ; i++)
>  > @@ -12527,17 +12582,18 @@ Item *eliminate_item_equal(Item *cond, C
>  > Item_equal *item_equal)
>  > {
>  > List<Item> eq_list;
>  > - Item_func_eq *eq_item= 0;
>  > + Item_func_eq *eq_item= NULL;
>  > if (((Item *) item_equal)->const_item()&&
> !item_equal->val_int())
>  > return new Item_int((longlong) 0,1);
>  > Item *item_const= item_equal->get_const();
>  > Item_equal_iterator it(*item_equal);
>  > Item *head;
>  > - if (item_const)
>  > - head= item_const;
>  > - else
>  > + if (!item_const)
>  > {
>  > - head= item_equal->get_first();
>  > + /*
>  > + If there is a const item, match all field items with the const item,
>  > + otherwise match the second and subsequent field items with the first one:
>  > + */
>  > it++;
>  > }
>  > Item_field *item_field;
>  > @@ -12548,7 +12604,7 @@ Item *eliminate_item_equal(Item *cond, C
>  > if (upper)
>  > {
>  > if (item_const&& upper->get_const())
>  > - item= 0;
>  > + item= NULL;
>  > else
>  > {
>  > Item_equal_iterator li(*item_equal);
>  > @@ -12563,57 +12619,42 @@ Item *eliminate_item_equal(Item *cond, C
>  > {
>  > if (eq_item)
>  > eq_list.push_back(eq_item);
>  > - /*
>  > - item_field might refer to a table that is within a semi-join
>  > - materialization nest. In that case, the join order looks like this:
>  >
>  > - outer_tbl1 outer_tbl2 SJM (inner_tbl1 inner_tbl2) outer_tbl3
>  > + /*
>  > + item_field may refer to a table that is within a semijoin
>
> I suggest prefixing item_field with @c, so that doxygen formats the
> name as a C++ identifier.

Ok.
>
>  > + materialization nest. In that case, the join order may look like:
>  >
>  > - We must not construct equalities like
>  > + ot1 ot2 SJM (it3 it4) ot5
>  >
>  > - outer_tbl1.col = inner_tbl1.col
>  > + If we have a multiple equality (ot1.c1, ot2.c2, it3.c3, it4.c4, ot5.c5),
>  > + we should generate the following equalities:
>  > + 1. ot1.c1 = ot2.c2
>  > + 2. ot1.c1 = it1.c3
>  > + 3. it1.c3 = it2.c4
>  > + 4. ot1.c1 = ot5.c5
>  > +
>  > + Equalities 1) and 4) are regular equalities between two outer tables.
>  > + Equality 2) is an equality that matches the outer query with a
>  > + materialized semijoin table. It is either performed as a lookup
>  > + into the materialized table (SJM-lookup), or as a condition on the
>  > + outer table (SJM-scan).
>  > + Equality 3) is evaluated during semijoin materialization.
>  > +
>  > + If there is a const item, match against this one.
>  > + Otherwise, match against the first field item in the multiple equality,
>  > + unless the item is within a materialized semijoin nest, where we match
>  > + against the first item within the SJM nest (if the item is not the first
>  > + item within the SJM nest), or match against the first item in the
>  > + list (if the item is the first one in the SJM list).
>  > + */
>
> The last part got a bit complex. The terminology is also a bit
> inconsistent. I think you use SJM nest and SJM list for the same
> thing, and multiple equality and list likewise.

SJM list was a typo.
>
> I also think you need to explain get_first(Item_field*) role in this.
> Otherwise, it is not easy to understand the connection between the
> description above and the code below.

Guilhem asked me to add a comment to the "head == item" test. Is that sufficient?
>
>  > + head= item_const ? item_const : item_equal->get_first(item);
>  > + if (head == item)
>  > + head= item_equal->get_first();
>  >
>  > - because they would get attached to inner_tbl1 and will get evaluated
>  > - during materialization phase, when we don't have current value of
>  > - outer_tbl1.col.
>  > - */
>  > - TABLE_LIST *emb_nest=
>  > - item_field->field->table->pos_in_table_list->embedding;
>  > - if (!item_const&& emb_nest&& emb_nest->sj_mat_exec)
>  > - {
>  > - /*
>  > - Find the first equal expression that refers to a table that is
>  > - within the semijoin nest. If we can't find it, do nothing
>  > - */
>  > - List_iterator<Item_field> fit(item_equal->fields);
>
> Since this function no longer refers to Item_equal::fields, it can be
> removed as friend of Item_equal.

Ok.
>
>  > - Item_field *head_in_sjm;
>  > - bool found= FALSE;
>  > - while ((head_in_sjm= fit++))
>  > - {
>  > - if (head_in_sjm->used_tables()& emb_nest->sj_inner_tables)
>  > - {
>  > - if (head_in_sjm == item_field)
>  > - {
>  > - /* This is the first table inside the semi-join*/
>  > - eq_item= new Item_func_eq(item_field, head);
>  > - /* Tell make_cond_for_table don't use this. */
>  > - eq_item->marker=3;
>  > - }
>  > - else
>  > - {
>  > - eq_item= new Item_func_eq(item_field, head_in_sjm);
>  > - found= TRUE;
>  > - }
>  > - break;
>  > - }
>  > - }
>  > - if (!found)
>  > - continue;
>  > - }
>  > - else
>  > - eq_item= new Item_func_eq(item_field, head);
>  > + eq_item= new Item_func_eq(item_field, head);
>
> I think it is a bit confusing that, AFAICT, both item and item_field
> is used to refer to the same item. I suggest that inside the
> item==item_field block that all references are to item_field.

Yes, good idea.
>
>  > if (!eq_item)
>  > - return 0;
>  > + return NULL;
>  > +
>  > eq_item->set_cmp_func();
>  > eq_item->quick_fix_field();
>  > }
>  > @@ -16908,28 +16949,27 @@ sub_select_sjm(JOIN *join, JOIN_TAB *joi
>  > Ok, materialization finished. Initialize the access to the temptable
>  > */
>  > sjm->materialized= TRUE;
>  > +
>  > join_tab->read_record.read_record= join_no_more_records;
>  > - if (sjm->is_scan)
>  > - {
>  > - /* Initialize full scan */
>  > - JOIN_TAB *last_tab= join_tab + (sjm->table_count - 1);
>  > - init_read_record(&last_tab->read_record, join->thd,
>  > - sjm->table, NULL, TRUE, TRUE, FALSE);
>  > -
>  > - DBUG_ASSERT(last_tab->read_record.read_record == rr_sequential);
>  > - last_tab->read_first_record= join_read_record_no_init;
>  > - last_tab->read_record.copy_field= sjm->copy_field;
>  > - last_tab->read_record.copy_field_end= sjm->copy_field +
>  > - sjm->table_cols.elements;
>  > - last_tab->read_record.read_record= rr_sequential_and_unpack;
>  > - }
>  > }
>  >
>  > if (sjm->is_scan)
>  > {
>  > - /* Do full scan of the materialized table */
>  > + /*
>  > + For each prefix partial record (if any), initialize and perform
>  > + a full scan of the materialized table.
>  > + */
>  > JOIN_TAB *last_tab= join_tab + (sjm->table_count - 1);
>  >
>  > + init_read_record(&last_tab->read_record, join->thd,
>  > + sjm->table, NULL, TRUE, TRUE, FALSE);
>  > +
>  > + last_tab->read_first_record= join_read_record_no_init;
>  > + last_tab->read_record.copy_field= sjm->copy_field;
>  > + last_tab->read_record.copy_field_end= sjm->copy_field +
>  > + sjm->table_cols.elements;
>  > + last_tab->read_record.read_record= rr_sequential_and_unpack;
>  > +
>
> Since this code does not deal anything with equality predicates, I
> would prefer if the above bug fix was separate patch.
>
> Anyhow, I am afraid that readers of this code, may be a bit puzzled
> that you set last_tab->read_first_record and
> last_tab->read_record.read_record for each scan. I doubt that which
> functions to use will change between scans. copy_field probably does
> not change either.

I tried this, but it appeared that setting several of the fields per scan was 
needed. So I defaulted to set them all...
>
>  > Item *save_cond= last_tab->select_cond;
>  > last_tab->set_select_cond(sjm->join_cond, __LINE__);
>  > rc= sub_select(join, last_tab, end_of_records);
>  > @@ -18840,21 +18880,23 @@ static bool replace_subcondition(JOIN *j
>  > }
>  >
>  >
>  > -/*
>  > +/**
>  > Extract a condition that can be checked after reading given table
>  >
>  > - SYNOPSIS
>  > - make_cond_for_table()
>  > - cond Condition to analyze
>  > - tables Tables for which "current field values" are available
>  > - used_table Table that we're extracting the condition for (may
>  > - also include PSEUDO_TABLE_BITS
>  > - exclude_expensive_cond Do not push expensive conditions
>  > + @param cond Condition to analyze
>  > + @param tables Tables for which "current field values" are available
>  > + @param used_table Table that we're extracting the condition for (may
>  > + also include PSEUDO_TABLE_BITS, and may be zero)
>  > + @param exclude_expensive_cond Do not push expensive conditions
>  >
>  > - DESCRIPTION
>  > + @retval<>NULL Generated condition
>  > + @retval = NULL Already checked, OR error
>  > +
>  > + @details
>  > Extract the condition that can be checked after reading the table
>  > specified in 'used_table', given that current-field values for tables
>  > specified in 'tables' bitmap are available.
>  > + If 'used_table' is 0, extract conditions for all tables in 'tables'.
>  >
>  > The function assumes that
>  > - Constant parts of the condition has already been checked.
>  > @@ -18865,12 +18907,12 @@ static bool replace_subcondition(JOIN *j
>  > guaranteed to be true by employed 'ref' access methods (the code that
>  > does this is located at the end, search down for "EQ_FUNC").
>  >
>  > -
>  > - SEE ALSO
>  > + @see
>  > make_cond_for_info_schema uses similar algorithm
>  >
>  > - RETURN
>  > - Extracted condition
>  > + @note
>  > + Make sure to keep the implementations of make_cond_for_table() and
>  > + make_cond_after_sjm() synchronized.
>  > */
>  >
>  > static Item *
>  > @@ -18880,26 +18922,32 @@ make_cond_for_table(Item *cond, table_ma
>  > return make_cond_for_table_from_pred(cond, cond, tables, used_table,
>  > exclude_expensive_cond);
>  > }
>  > -
>  > +
>  > static Item *
>  > make_cond_for_table_from_pred(Item *root_cond, Item *cond,
>  > table_map tables, table_map used_table,
>  > bool exclude_expensive_cond)
>  > {
>  > - if (used_table&& !(cond->used_tables()& used_table)&&
>  > - /*
>  > - Exclude constant conditions not checked at optimization time if
>  > + /*
>  > + Ignore this condition if
>  > + 1. We are extracting conditions for a specific table, and
>  > + 2. that table is not referenced by the condition, and
>  > + 3. exclude constant conditions not checked at optimization time if
>  > the table we are pushing conditions to is the first one.
>  > As a result, such conditions are not considered as already checked
>  > and will be checked at execution time, attached to the first table.
>  > -
>  > + */
>  > + if (used_table&& // 1
>  > + !(cond->used_tables()& used_table)&& // 2
>  > + /*
>  > psergey: TODO: "used_table& 1" doesn't make sense in nearly any
>  > context. Look at setup_table_map(), table bits reflect the order
>  > the tables were encountered by the parser. Check what we should
>  > replace this condition with.
>  > */
>  > - !((used_table& 1)&& cond->is_expensive()))
>  > - return (Item*) 0; // Already checked
>  > + !((used_table& 1)&& cond->is_expensive())) // 3
>  > + return NULL;
>  > +
>  > if (cond->type() == Item::COND_ITEM)
>  > {
>  > if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC)
>  > @@ -18907,48 +18955,48 @@ make_cond_for_table_from_pred(Item *root
>  > /* Create new top level AND item */
>  > Item_cond_and *new_cond=new Item_cond_and;
>  > if (!new_cond)
>  > - return (Item*) 0; // OOM /* purecov: inspected */
>  > + return NULL;
>  > List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
>  > Item *item;
>  > while ((item=li++))
>  > {
>  > - Item *fix=make_cond_for_table_from_pred(root_cond, item,
>  > + Item *fix=make_cond_for_table_from_pred(root_cond, item,
>
> Since you are changing the above line anyway, maybe you should change
> to the MySQL-standard assignment format, too. (There are several
> cases of this.)

Sure, done.
>
>
>  > tables, used_table,
>  > exclude_expensive_cond);
>  > - if (fix)
>  > - new_cond->argument_list()->push_back(fix);
>  > + if (fix)
>  > + new_cond->argument_list()->push_back(fix);
>  > }
>  > switch (new_cond->argument_list()->elements) {
>  > case 0:
>  > - return (Item*) 0; // Always true
>  > + return NULL; // Always true
>  > case 1:
>  > - return new_cond->argument_list()->head();
>  > + return new_cond->argument_list()->head();
>  > default:
>  > - /*
>  > - Item_cond_and do not need fix_fields for execution, its parameters
>  > - are fixed or do not need fix_fields, too
>  > - */
>  > - new_cond->quick_fix_field();
>  > - new_cond->used_tables_cache=
>  > - ((Item_cond_and*) cond)->used_tables_cache&
>  > - tables;
>  > - return new_cond;
>  > + /*
>  > + Item_cond_and do not need fix_fields for execution, its parameters
>  > + are fixed or do not need fix_fields, too
>  > + */
>  > + new_cond->quick_fix_field();
>  > + new_cond->used_tables_cache=
>  > + ((Item_cond_and*) cond)->used_tables_cache&
>  > + tables;
>
> I suggest merging the to lines above.

Done.
>
>
>  > + return new_cond;
>  > }
>  > }
>  > else
>  > - { // Or list
>  > + { // Or list
>  > Item_cond_or *new_cond=new Item_cond_or;
>  > if (!new_cond)
>  > - return (Item*) 0; // OOM /* purecov: inspected */
>  > + return NULL;
>  > List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
>  > Item *item;
>  > while ((item=li++))
>  > {
>  > - Item *fix=make_cond_for_table_from_pred(root_cond, item,
>  > + Item *fix=make_cond_for_table_from_pred(root_cond, item,
>  > tables, 0L,
>  > exclude_expensive_cond);
>  > if (!fix)
>  > - return (Item*) 0; // Always true
>  > + return NULL; // Always true
>  > new_cond->argument_list()->push_back(fix);
>  > }
>  > /*
>  > @@ -18963,20 +19011,26 @@ make_cond_for_table_from_pred(Item *root
>  > }
>  >
>  > /*
>  > - Because the following test takes a while and it can be done
>  > - table_count times, we mark each item that we have examined with the result
>  > - of the test
>  > - */
>  > + Omit this condition if
>  > + 1. It has been marked as omittable before, or
>  > + 2. Some tables referred by the condition are not available, or
>  > + 3. We are extracting conditions for any tables, the condition is
>  > + considered 'expensive', and we want to delay evaluation of such
>  > + conditions to the execution phase.
>  > + */
>  > + if (cond->marker == 3 || // 1
>  > + (cond->used_tables()& ~tables) || // 2
>  > + (!used_table&& exclude_expensive_cond&&
> cond->is_expensive())) // 3
>  > + return NULL;
>  >
>  > - if (cond->marker == 3 || (cond->used_tables()& ~tables) ||
>  > - /*
>  > - When extracting constant conditions, treat expensive conditions as
>  > - non-constant, so that they are not evaluated at optimization time.
>  > - */
>  > - (!used_table&& exclude_expensive_cond&&
> cond->is_expensive()))
>  > - return (Item*) 0; // Can't check this yet
>  > - if (cond->marker == 2 || cond->eq_cmp_result() == Item::COND_OK)
>  > - return cond; // Not boolean op
>  > + /*
>  > + Extract this condition if
>  > + 1. It has already been marked as applicable, or
>  > + 2. It is not a<comparison predicate>
> (=,<,>,<=,>=,<=>)
>  > + */
>  > + if (cond->marker == 2 || // 1
>  > + cond->eq_cmp_result() == Item::COND_OK) // 2
>  > + return cond;
>  >
>  > /*
>  > Remove equalities that are guaranteed to be true by use of 'ref' access
>  > @@ -19004,19 +19058,19 @@ make_cond_for_table_from_pred(Item *root
>  > Item *left_item= ((Item_func*) cond)->arguments()[0]->real_item();
>  > Item *right_item= ((Item_func*) cond)->arguments()[1]->real_item();
>  > if (left_item->type() == Item::FIELD_ITEM&&
>  > - test_if_ref(root_cond, (Item_field*) left_item,right_item))
>  > + test_if_ref(root_cond, (Item_field*) left_item,right_item))
>  > {
>  > - cond->marker=3; // Checked when read
>  > - return (Item*) 0;
>  > + cond->marker=3; // Condition can be omitted
>  > + return NULL;
>  > }
>  > if (right_item->type() == Item::FIELD_ITEM&&
>  > - test_if_ref(root_cond, (Item_field*) right_item,left_item))
>  > + test_if_ref(root_cond, (Item_field*) right_item,left_item))
>  > {
>  > - cond->marker=3; // Checked when read
>  > - return (Item*) 0;
>  > + cond->marker=3; // Condition can be omitted
>  > + return NULL;
>  > }
>
> I suggest merging the above two if-statements. No need to duplicate
> this code. If readability is a concern, it can be achieved with
> white-space.

Ok
>
> Even better, make a small inline function so that you can write
>
> if (x(root_cond, right_item, left_item) ||
> x(root_cond, left_item, right_item))
>
>
>  > }
>  > - cond->marker=2;
>  > + cond->marker=2; // Mark condition as applicable
>  > return cond;
>  > }
>  >
>  > @@ -19030,24 +19084,35 @@ make_cond_for_table_from_pred(Item *root
>  > @param sjm_tables Tables within the semi-join nest (the inner part).
>  >
>  > @retval<>NULL Generated condition
>  > - @retval = NULL Already checked, or error
>  > + @retval = NULL Already checked, OR error
>  >
>  > @details
>  > A regular semi-join materialization is always non-correlated, ie
>  > the subquery is always resolved by performing a lookup generated in
>  > - create_subquery_equalities, hence this function should never produce
>  > + create_subquery_equalities, hence this function never needs to produce
>  > any condition for regular semi-join materialization.
>  > For a scan semi-join materialization, this function may return a condition
>  > - to be checked.
>  > + to be checked, when there are outer tables before the SJM tables in the
>  > + join prefix.
>  > +
>  > + @note
>  > + Make sure to keep the implementations of make_cond_for_table() and
>  > + make_cond_after_sjm() synchronized.
>  > */
>  >
>  > static Item *
>  > make_cond_after_sjm(Item *root_cond, Item *cond, table_map tables,
>  > table_map sjm_tables)
>  > {
>  > + /*
>  > + We can only test conditions that cover tables from the join prefix
>  > + and tables from the semijoin nest. Other conditions will be handled
>  > + by make_cond_for_table().
>  > + */
>  > if ((!(cond->used_tables()& ~tables) ||
>  > !(cond->used_tables()& ~sjm_tables)))
>  > - return (Item*) 0; // Already checked
>  > + return NULL;
>  > +
>  > if (cond->type() == Item::COND_ITEM)
>  > {
>  > if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC)
>  > @@ -19055,49 +19120,49 @@ make_cond_after_sjm(Item *root_cond, Ite
>  > /* Create new top level AND item */
>  > Item_cond_and *new_cond=new Item_cond_and;
>  > if (!new_cond)
>  > - return (Item*) 0; // OOM /* purecov: inspected */
>  > + return NULL;
>  > List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
>  > Item *item;
>  > while ((item=li++))
>  > {
>  > - Item *fix=make_cond_after_sjm(root_cond, item, tables, sjm_tables);
>  > - if (fix)
>  > - new_cond->argument_list()->push_back(fix);
>  > + Item *fix=make_cond_after_sjm(root_cond, item, tables, sjm_tables);
>  > + if (fix)
>  > + new_cond->argument_list()->push_back(fix);
>  > }
>  > switch (new_cond->argument_list()->elements) {
>  > case 0:
>  > - return (Item*) 0; // Always true
>  > + return NULL; // Always true
>  > case 1:
>  > - return new_cond->argument_list()->head();
>  > + return new_cond->argument_list()->head();
>  > default:
>  > /*
>  > - Item_cond_and do not need fix_fields for execution, its parameters
>  > - are fixed or do not need fix_fields, too
>  > + Item_cond_and do not need fix_fields for execution, its parameters
>  > + are fixed or do not need fix_fields, too
>  > */
>  > - new_cond->quick_fix_field();
>  > - new_cond->used_tables_cache=
>  > - ((Item_cond_and*) cond)->used_tables_cache&
>  > - tables;
>  > - return new_cond;
>  > + new_cond->quick_fix_field();
>  > + new_cond->used_tables_cache=
>  > + ((Item_cond_and*) cond)->used_tables_cache&
>  > + tables;
>
> Likewise.

Done.
>
>  > + return new_cond;
>  > }
>  > }
>  > else
>  > - { // Or list
>  > + { // Or list
>  > Item_cond_or *new_cond=new Item_cond_or;
>  > if (!new_cond)
>  > - return (Item*) 0; // OOM /* purecov: inspected */
>  > + return NULL;
>  > List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
>  > Item *item;
>  > while ((item=li++))
>  > {
>  > - Item *fix= make_cond_after_sjm(root_cond, item, tables, 0L);
>  > - if (!fix)
>  > - return (Item*) 0; // Always true
>  > - new_cond->argument_list()->push_back(fix);
>  > + Item *fix= make_cond_after_sjm(root_cond, item, tables, 0L);
>  > + if (!fix)
>  > + return NULL; // Always true
>  > + new_cond->argument_list()->push_back(fix);
>  > }
>  > /*
>  > - Item_cond_and do not need fix_fields for execution, its parameters
>  > - are fixed or do not need fix_fields, too
>  > + Item_cond_and do not need fix_fields for execution, its parameters
>  > + are fixed or do not need fix_fields, too
>  > */
>  > new_cond->quick_fix_field();
>  > new_cond->used_tables_cache= ((Item_cond_or*) cond)->used_tables_cache;
>  > @@ -19107,15 +19172,22 @@ make_cond_after_sjm(Item *root_cond, Ite
>  > }
>  >
>  > /*
>  > - Because the following test takes a while and it can be done
>  > - table_count times, we mark each item that we have examined with the result
>  > - of the test
>  > + Omit this condition if
>  > + 1. It has been marked as omittable before, or
>  > + 2. Some tables referred by the condition are not available.
>  > */
>  > + if (cond->marker == 3 || // 1
>  > + cond->used_tables()& ~(tables | sjm_tables)) // 2
>  > + return NULL;
>  >
>  > - if (cond->marker == 3 || (cond->used_tables()& ~(tables |
> sjm_tables)))
>  > - return (Item*) 0; // Can't check this yet
>  > - if (cond->marker == 2 || cond->eq_cmp_result() == Item::COND_OK)
>  > - return cond; // Not boolean op
>  > + /*
>  > + Extract this condition if
>  > + 1. It has already been marked as applicable, or
>  > + 2. It is not a<comparison predicate>
> (=,<,>,<=,>=,<=>)
>  > + */
>  > + if (cond->marker == 2 || // 1
>  > + cond->eq_cmp_result() == Item::COND_OK) // 2
>  > + return cond;
>  >
>  > /*
>  > Remove equalities that are guaranteed to be true by use of 'ref' access
>  > @@ -19128,17 +19200,17 @@ make_cond_after_sjm(Item *root_cond, Ite
>  > if (left_item->type() == Item::FIELD_ITEM&&
>  > test_if_ref(root_cond, (Item_field*) left_item,right_item))
>  > {
>  > - cond->marker=3; // Checked when read
>  > - return (Item*) 0;
>  > + cond->marker=3; // Condition can be omitted
>  > + return NULL;
>  > }
>  > if (right_item->type() == Item::FIELD_ITEM&&
>  > test_if_ref(root_cond, (Item_field*) right_item,left_item))
>  > {
>  > - cond->marker=3; // Checked when read
>  > - return (Item*) 0;
>  > + cond->marker=3; // Condition can be omitted
>  > + return NULL;
>
> Likewise.

Done.
>
>  > }
>  > }
>  > - cond->marker=2;
>  > + cond->marker=2; // Mark condition as applicable
>  > return cond;
>  > }
>  >
>  >
>  >
>  >
>  >
>  >
>

Thanks,
Roy
Thread
bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265) Bug#45174Bug#50019 Bug#52068Roy Lyseng22 Oct
  • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265)Bug#45174 Bug#50019 Bug#52068Guilhem Bichot22 Oct
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265)Bug#45174 Bug#50019 Bug#52068Roy Lyseng22 Oct
  • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265)Bug#45174 Bug#50019 Bug#52068Øystein Grøvlen22 Oct
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3265)Bug#45174 Bug#50019 Bug#52068Roy Lyseng25 Oct