Hi Øystein,
thank you for the review.
On 22.10.10 15.41, Øystein Grøvlen wrote:
> Hi,
>
> Thanks for the patch. You have made a good job of understanding the
> mechanisms here and an equally good job of explaining what goes on. I
> learned much from this review.
>
> My main comments:
>
> - I am not very comfortable with the new get_first function, neither
> its name nor its location (see comments inline). I see that it is
> not easy to see what it the best compromise here within the limits
> of this spaghetti, but please consider whether there are other ways
> of doing this.
Please see answers to inline comments.
>
> - Making many different fixes in the same patch is a challenge for
> the reviewers since it is not obvious which changes leads to which
> changes in behavior (e.g., test results.) I understand when there
> is a lot of problems with a given test case, it is bit difficult to
> split things up, but I think you should consider a separate patch
> for the fix to the scanning of the materilization table (with test
> case for Bug#52068)
OK, separating out bug#52068.
>
> See inline for more comments.
>
>
> On 10/22/10 09:58 AM, Roy Lyseng wrote:
> > #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
> revid:guilhem@stripped
> >
> > 3265 Roy Lyseng 2010-10-22
> > Bug#45174: Incorrectly applied equality propagation caused wrong result
> > on a query with a materialized semi-join.
> > Bug#50019: Wrong result for IN-query with materialization.
> > Bug#52068: Optimizer generates invalid semijoin materialization plan
> >
> > When a subquery is subject to a semijoin optimization, its tables
> > are merged to the outer query and later are treated as regular tables.
> > This allows a bunch of optimizations to be applied, equality
> > propagation is among them. Equality propagation is done after query
> > execution plan is chosen. It substitutes fields from tables being
> > retrieved later for fields from tables being retrieved earlier.
> > However, it can't be applied as is to any semijoin table.
> > The semijoin materialization strategy differs from other semijoin
> > strategies that the data from materialized semijoin tables isn't used
> > directly but saved to a temporary table first.
> > The materialization isn't isolated in a separate step, it is done
> > inline within the nested loop execution.
> > When it comes to fetching rows from the first table in the block of
> > materialized semijoin tables, sub_select() function is called to
> > materialize the result of the subquery and save it in the
> > materialized table. Later, data from the materialized table is used
> > as if they were regular table rows.
> > Due to this we can't substitute fields that belong to the semi-join
> > for fields from outer query and vice versa.
> >
>
> The above explanation seems unecessary detailed/complex to me. I
> cannot see how doing materialization in a separate step would help
> here.
I will trim the text a little.
>
> > Example: suppose we have a join order:
> >
> > ot1 ot2 SJ-Mat(it1 it2 it3) ot3
> >
> > and equality ot2.col = it1.col = it2.col
> > If we're looking for best substitute for 'it2.col', we should pick
> > it1.col and not ot2.col.
> >
> > For a field that is not in a materialized semijoin we can use any field,
> > even those that are embedded in a materialized semijoin. This is because
> > such fields are "copied back" to their original join-tab structures when
> > the materialized temporary table is being read.
> >
> > Now we have added another Item_equal::get_first() function that accepts
> > as a parameter a field being substituted and checks whether it belongs
> > to a materialized semijoin.
> > The field to substitute will be from the same materialized semijoin nest
> > (if supplied field is within such nest), otherwise it will be the first
> > field in the multiple equality.
> >
> > The new checks rely on the first_sj_inner_tab and first_sj_inner_tab
> > fields of the join-tab. These fields are therefore set as soon as
> > possible after the join strategy is fixed.
>
> Maybe you could mentioned that they before only were used for
> DuplicateWeedout strategy.
Ok
>
> >
> > Also fixed problem appearing in Bug#52068: When MaterializeScan
> > semijoin strategy was used and there were one or more outer dependent
> > tables before the semijoin tables, the scan over the materialized
> > table was not properly reset for each row of the prefix outer tables.
> >
> > Also fixed problems with pushdown of SJM-aware predicates during
> > make_join_select(): wrong predicates were sometimes generated,
> > make_cond_after_sjm() was called at the wrong position, and
> > make_cond_after_sjm() was never actually considering the pushed-down
> > SJM predicates.
> >
> > mysql-test/include/subquery_sj.inc
> > Bug#45174: Incorrectly applied equality propagation caused wrong result
> > on a query with a materialized semi-join.
> > Bug#50019: Wrong result for IN-query with materialization.
> > Bug#52068: Optimizer generates invalid semijoin materialization plan
> > Added new tests.
> >
> > mysql-test/r/subquery_mat_all.result
> > mysql-test/r/subquery_sj_all.result
> > mysql-test/r/subquery_sj_all_jcl6.result
> > mysql-test/r/subquery_sj_all_jcl7.result
> > mysql-test/r/subquery_sj_dupsweed.result
> > mysql-test/r/subquery_sj_dupsweed_jcl6.result
> > mysql-test/r/subquery_sj_dupsweed_jcl7.result
> > mysql-test/r/subquery_sj_firstmatch.result
> > mysql-test/r/subquery_sj_firstmatch_jcl6.result
> > mysql-test/r/subquery_sj_firstmatch_jcl7.result
> > mysql-test/r/subquery_sj_loosescan.result
> > mysql-test/r/subquery_sj_loosescan_jcl6.result
> > mysql-test/r/subquery_sj_loosescan_jcl7.result
> > mysql-test/r/subquery_sj_mat.result
> > mysql-test/r/subquery_sj_mat_jcl6.result
> > mysql-test/r/subquery_sj_mat_jcl7.result
> > mysql-test/r/subquery_sj_mat_nosj.result
> > mysql-test/r/subquery_sj_none.result
> > mysql-test/r/subquery_sj_none_jcl6.result
> > mysql-test/r/subquery_sj_none_jcl7.result
> > Bug#45174: Incorrectly applied equality propagation caused wrong result
> > on a query with a materialized semi-join.
> > Bug#50019: Wrong result for IN-query with materialization.
> > Bug#52068: Optimizer generates invalid semijoin materialization plan
> > Results for three new tests added.
> > Some tests using semijoin materialization show that where clause
> > has moved from the outer query into the materialized inner query.
> > This is caused by the changed call to get_first() in
> > eliminate_item_equal().
> > Ex: select * from ot where a in(select b from it where b>0);
> > The clause "b>0" is now evaluated on the inner query materialization.
> > Performance-wise this is never worse when using MaterializeScan and
> > usually better for MaterializeLookup. For the latter strategy, the
>
> I had to read the above sentence a few times before understanding the
> meaning of it. Never worse than what? I assume than before. Does
> usually better mean "usually better, but sometimes worse" or "usually
> better, and never worse"?
Rephrased.
>
> > best possible solution is probably to evaluate the clause in both
> > queries, this can be subject for a later feature development.
> > Another test that applies the same condition to both the outer and
> > the inner query is added, to show the plan for such types of queries.
> > Earlier wrong semijoin materialization test results are corrected.
> >
> > sql/item.cc
> > Bug#45174: Incorrectly applied equality propagation caused wrong result
> > on a query with a materialized semi-join.
> >
> > Calling new get_first() function instead of old.
> >
> > sql/item_cmpfunc.cc
> > Bug#45174: Incorrectly applied equality propagation caused wrong result
> > on a query with a materialized semi-join.
> >
> > New function Item_equal::get_first() that accepts as argument
> > a field being substituted.
> >
> > sql/item_cmpfunc.h
> > Bug#45174: Incorrectly applied equality propagation caused wrong result
> > on a query with a materialized semi-join.
> >
> > New function Item_equal::get_first() that accepts as argument
> > a field being substituted.
> >
> > sql/sql_select.cc
> > Bug#45174: Incorrectly applied equality propagation caused wrong result
> > on a query with a materialized semi-join.
> > Bug#50019: Wrong result for IN-query with materialization.
> > Bug#52068: Optimizer generates invalid semijoin materialization plan
> >
> > Setting fields first_sj_inner_tab and last_sj_inner_tab moved from
> > setup_semijoin_dups_elimination() to get_best_combination(), so they
> > are set as early as possible after join order optimization.
> >
> > In make_join_select(), the test that determined when to pushdown
> > SJM-specific predicates was wrong, in addition to improving the
> > comments.
> >
> > The logic of eliminate_item_equal() has been simplified and
> > adjusted so that it generates equalities that are useful also
> > when the semijoin materialization strategy is being used.
> > Some simplification was possible by taking advantage of the new
> > Item_equal::get_first() function.
> >
> > In sub_select_sjm(), moved code that initializes the scan over the
> > materialized table so that it is now performed for each scan of
> > table, instead of only for the first scan.
> >
> > In make_cond_for_table_from_pred(), a number of comments has been
> > added, and TAB characters are replaced by spaces.
> >
> > In make_cond_after_sjm(), make sure that it handles equalities
> > generated for semijoin materialization (with marker=3).
> > Added comments and removed TAB characters.
> >
> > modified:
> > mysql-test/include/subquery_sj.inc
> > mysql-test/r/subquery_mat_all.result
> > mysql-test/r/subquery_sj_all.result
> > mysql-test/r/subquery_sj_all_jcl6.result
> > mysql-test/r/subquery_sj_all_jcl7.result
> > mysql-test/r/subquery_sj_dupsweed.result
> > mysql-test/r/subquery_sj_dupsweed_jcl6.result
> > mysql-test/r/subquery_sj_dupsweed_jcl7.result
> > mysql-test/r/subquery_sj_firstmatch.result
> > mysql-test/r/subquery_sj_firstmatch_jcl6.result
> > mysql-test/r/subquery_sj_firstmatch_jcl7.result
> > mysql-test/r/subquery_sj_loosescan.result
> > mysql-test/r/subquery_sj_loosescan_jcl6.result
> > mysql-test/r/subquery_sj_loosescan_jcl7.result
> > mysql-test/r/subquery_sj_mat.result
> > mysql-test/r/subquery_sj_mat_jcl6.result
> > mysql-test/r/subquery_sj_mat_jcl7.result
> > mysql-test/r/subquery_sj_mat_nosj.result
> > mysql-test/r/subquery_sj_none.result
> > mysql-test/r/subquery_sj_none_jcl6.result
> > mysql-test/r/subquery_sj_none_jcl7.result
> > sql/item.cc
> > sql/item_cmpfunc.cc
> > sql/item_cmpfunc.h
> > sql/sql_select.cc
> > === modified file 'mysql-test/include/subquery_sj.inc'
> > --- a/mysql-test/include/subquery_sj.inc 2010-10-13 13:27:36 +0000
> > +++ b/mysql-test/include/subquery_sj.inc 2010-10-22 07:57:40 +0000
> > @@ -393,6 +393,9 @@ insert into t3 select A.a + 10*B.a from
> > explain select * from t3 where a in (select kp1 from t1 where kp1<20);
> > select * from t3 where a in (select kp1 from t1 where kp1<20);
> >
> > +explain select * from t3 where a in (select kp1 from t1 where kp1<20) and
> a<20;
> > +select * from t3 where a in (select kp1 from t1 where kp1<20) and a<20;
> > +
> > create table t4 (pk int primary key);
> > insert into t4 select a from t3;
> >
> > @@ -637,11 +640,9 @@ drop table t0, t1,t2,t3;
> >
> >
> > --echo
> > ---echo Test that MaterializeLookup strategy for semijoin,
> > +--echo Test that neither MaterializeLookup strategy for semijoin,
> > --echo nor subquery materialization is used when BLOBs are involved
> > --echo (except when arguments of some functions).
> > ---echo Note: Due to Bug#52068, wrong may occur below if MaterializeScan
> > ---echo strategy is used instead,
> > --echo
> > set @prefix_len = 6;
> >
> > @@ -3371,3 +3372,89 @@ WHERE (a1, a2) IN (
> > DROP TABLE t1,t2a,t2b,t2c;
> >
> > --echo # End BUG#52329
> > +
> > +--echo #
> > +--echo # Bug#45174: Incorrectly applied equality propagation caused wrong
> > +--echo # result on a query with a materialized semi-join.
> > +--echo #
> > +
> > +CREATE TABLE t1 (
> > + varchar_nokey varchar(1) NOT NULL
> > +);
> > +
> > +INSERT INTO t1 VALUES
> > + ('v'), ('u'), ('n'), ('l'), ('h'), ('u'), ('n'), ('j'), ('k'),
> > + ('e'), ('i'), ('u'), ('n'), ('b'), ('x'), (''), ('q'), ('u');
> > +
> > +CREATE TABLE t2 (
> > + pk int NOT NULL,
> > + varchar_key varchar(1) NOT NULL,
> > + varchar_nokey varchar(1) NOT NULL,
> > + PRIMARY KEY(pk),
> > + KEY varchar_key(varchar_key)
> > +);
> > +
> > +INSERT INTO t2 VALUES
> > + (11,'m','m'), (12,'j','j'), (13,'z','z'), (14,'a','a'), (15,'',''),
> > + (16,'e','e'), (17,'t','t'), (19,'b','b'), (20,'w','w'), (21,'m','m'),
> > + (23,'',''), (24,'w','w'), (26,'e','e'), (27,'e','e'), (28,'p','p');
> > +
> > +let $query=
> > +SELECT varchar_nokey
> > +FROM t1
> > +WHERE (varchar_nokey, varchar_nokey) IN (SELECT varchar_key, varchar_nokey
> > + FROM t2
> > + WHERE varchar_nokey< 'n' XOR pk);
>
> I feel the XOR part is an unecessary distraction here. AFAICT, you
> will get the same error with just "WHERE varchar_nokey < pk". Ref. new
> duplicate BUG#57421.
I am not getting the error without the XOR, so I prefer to keep it (even though
the XOR is odd...)
>
> ...
>
> > === modified file 'sql/item.cc'
> > --- a/sql/item.cc 2010-10-15 10:32:50 +0000
> > +++ b/sql/item.cc 2010-10-22 07:57:40 +0000
> > @@ -5075,8 +5075,8 @@ Item *Item_field::replace_equal_field(uc
> > return this;
> > return const_item;
> > }
> > - Item_field *subst= item_equal->get_first();
> > - if (subst&& field->table != subst->field->table&&
> !field->eq(subst->field))
> > + Item_field *subst= item_equal->get_first(this);
> > + if (field->table != subst->field->table&&
> !field->eq(subst->field))
> > return subst;
>
> Since it is no longer checked that subst is not null, maybe an assert
> should be added?
Ok.
>
> > }
> > return this;
> >
> > === modified file 'sql/item_cmpfunc.cc'
> > --- a/sql/item_cmpfunc.cc 2010-10-15 10:32:50 +0000
> > +++ b/sql/item_cmpfunc.cc 2010-10-22 07:57:40 +0000
> > @@ -5840,3 +5840,90 @@ void Item_equal::print(String *str, enum
> > str->append(')');
> > }
> >
> > +
> > +/**
> > + Get the first equal field of multiple equality, use when execution order
> > + must be considered (see details below).
> > +
> > + @param field field to get substitution field for, which must be
> > + present within the multiple equality itself.
>
> Actually, field is just used to identify the correct semijoin nest.
> It does not have to be present within the multiple equality itself to
> work correctly. Maybe it would be better to generalize this function
> to take JOIN_TAB* as parameter. That is, if the passed in JOIN_TAB is
> part of a materialization, return the first field from that semijoin
> nest. Otherwise, return first field.
I think it is a good thing that the interface of Item_equal only depends on Item
objects. Making the interface know about join-tabs is not good, IMHO. It is
sufficient that the implementation needs to deal with them ;)
>
> > +
> > + @retval Found first field in the multiple equality.
> > +
> > + @details Get the first field of multiple equality that is equal to the
> > + given field. In order to make semijoin materialization strategy work
>
> The "that is equal to the given field" seems unecessary since all fields of
> the multiple equality should be equal to this field.
Ok.
>
> > + correctly we can't propagate equal fields between a materialized semijoin
> > + and the outer query (or any other semijoin) unconditionally.
> > + Thus the field is returned according to following rules:
> > +
> > + 1) If the given field belongs to a materialized semijoin then the
> > + first field in multiple equality which belong to the same semijoin
> > + is returned.
> > + 2) If the given field doesn't belong to a materialized semijoin then
> > + the first field in the multiple equality is returned.
> > +*/
> > +
> > +Item_field* Item_equal::get_first(const Item_field *field)
>
> I am neither comfortable with the name nor the location of this
> function. Since this function does something different from the
> existing get_first, I think it should have another name. I am also a
> bit concerned with making Item classes dependent on JOIN_TAB (except
> for a few methods of Item_subselect, this is the first time JOIN_TAB
> is used in Item classes) and join strategies. Maybe a free function
> in sql_select.cc that is friend of Item_equal is a better choice.
> (Like eliminate_item_equal() today.)
I have renamed it to get_subst_item().
Because the implementation of equality substitution is dependent on join order,
I do not have a big concern about depending the implementation on join-tabs.
However, if we we refactored materialized semijoin into using multiple multiple
equalities for materialized semijoin, this function would no longer be needed.
I could let the implementation work in the context of sj-nest objects, but I
will still have to go through join-tab pointer to get there. Having a table-list
pointer inside Item_field could fix this, but I think this is refactoring food...
>
>
> > +{
> > + DBUG_ASSERT(field != NULL);
> > +
> > + const JOIN_TAB *field_tab= field->field->table->reginfo.join_tab;
> > +
> > + if (sj_is_materialize_strategy(field_tab->get_sj_strategy()))
> > + {
> > + /*
> > + It's a field from a materialized semijoin. We can substitute it only
> > + with a field from the same semijoin.
> > +
> > + Example: suppose we have a join order:
> > +
> > + ot1 ot2 SJM(it1 it2 it3) ot3
> > +
> > + and equality ot2.col = it1.col = it2.col
> > +
> > + If we're looking for best substitute for 'it2.col', we must pick it1.col
> > + and not ot2.col. it2.col is evaluated while performing materialization,
> > + when the outer tables are not available in the execution.
> > + */
> > + List_iterator<Item_field> it(fields);
> > + Item_field *item;
> > + const JOIN_TAB *first= field_tab->first_sj_inner_tab;
> > + const JOIN_TAB *last= field_tab->last_sj_inner_tab;
> > +
> > + while ((item= it++))
> > + {
> > + if (item->field->table->reginfo.join_tab>= first&&
> > + item->field->table->reginfo.join_tab<= last)
> > + {
> > + return item;
> > + }
> > + }
> > + }
> > + else
> > + {
> > + /*
> > + The field is not in a materialized semijoin nest. We can return
> > + the first field in the multiple equality.
> > +
> > + Example: suppose we have a join order with MaterializeLookup:
> > +
> > + ot1 ot2 SJM-Lookup(it1 it2)
> > +
> > + Here we should always pick the first field in the multiple equality,
> > + as this will be present before all other dependent fields.
> > +
> > + Example: suppose we have a join order with MaterializeScan:
> > +
> > + SJM-Scan(it1 it2) ot1 ot2
> > +
> > + and equality ot2.col = ot1.col = it2.col.
> > +
> > + When looking for best substitute for 'ot2.col', we can pick it2.col,
> > + because when we run the scan, column values from the inner materialized
> > + tables will be copied back to the column buffers for it1 and it2.
> > + */
> > + return fields.head();
> > + }
> > + DBUG_ASSERT(FALSE); // Should never get here.
> > + return NULL;
> > +}
> >
> > === modified file 'sql/item_cmpfunc.h'
> > --- a/sql/item_cmpfunc.h 2010-10-15 10:32:50 +0000
> > +++ b/sql/item_cmpfunc.h 2010-10-22 07:57:40 +0000
> > @@ -1637,7 +1637,13 @@ public:
> > void add(Item_field *f);
> > uint members();
> > bool contains(Field *field);
> > + /**
> > + Get the first field of multiple equality, use for semantic checking.
> > +
> > + @retval First field in the multiple equality.
> > + */
> > Item_field* get_first() { return fields.head(); }
> > + Item_field* get_first(const Item_field *field);
> > void merge(Item_equal *item);
> > void update_const();
> > enum Functype functype() const { return MULT_EQUAL_FUNC; }
> >
> > === modified file 'sql/sql_select.cc'
> > --- a/sql/sql_select.cc 2010-10-21 15:44:35 +0000
> > +++ b/sql/sql_select.cc 2010-10-22 07:57:40 +0000
> > @@ -1414,13 +1414,16 @@ bool might_do_join_buffering(uint join_c
> > setup_sj_materialization() (todo: can't we move that to here also?)
> > */
> >
> > -int setup_semijoin_dups_elimination(JOIN *join, ulonglong options,
> > - uint no_jbuf_after)
> > +bool setup_semijoin_dups_elimination(JOIN *join, ulonglong options,
> > + uint no_jbuf_after)
> > {
> > uint tableno;
> > THD *thd= join->thd;
> > DBUG_ENTER("setup_semijoin_dups_elimination");
> >
> > + if (join->select_lex->sj_nests.is_empty())
> > + DBUG_RETURN(FALSE);
> > +
> > for (tableno= join->const_tables ; tableno< join->tables; )
> > {
> > JOIN_TAB *tab=join->join_tab + tableno;
> > @@ -1661,21 +1664,6 @@ int setup_semijoin_dups_elimination(JOIN
> > break;
> > }
> > }
> > - /*
> > - Remember the first and last semijoin inner tables; this serves to tell
> > - a JOIN_TAB's semijoin strategy (like in check_join_cache_usage()).
> > - */
> > - JOIN_TAB *last_sj_inner=
> > - (pos->sj_strategy == SJ_OPT_DUPS_WEEDOUT) ?
> > - /* Range may end with non-inner table so cannot set last_sj_inner_tab */
> > - NULL : last_sj_tab;
> > - for (JOIN_TAB *tab_in_range= tab;
> > - tab_in_range<= last_sj_tab;
> > - tab_in_range++)
> > - {
> > - tab_in_range->first_sj_inner_tab= tab;
> > - tab_in_range->last_sj_inner_tab= last_sj_inner;
> > - }
> > }
> > DBUG_RETURN(FALSE);
> > }
> > @@ -4253,14 +4241,18 @@ bool find_eq_ref_candidate(TABLE *table,
> > using Materialization or LooseScan to execute it.
> >
> > RETURN
> > - 0 - OK
> > - 1 - Out of memory error
> > + FALSE - OK
> > + TRUE - Out of memory error
> > */
> >
> > -int pull_out_semijoin_tables(JOIN *join)
> > +bool pull_out_semijoin_tables(JOIN *join)
> > {
> > TABLE_LIST *sj_nest;
> > DBUG_ENTER("pull_out_semijoin_tables");
> > +
> > + if (join->select_lex->sj_nests.is_empty())
> > + DBUG_RETURN(FALSE);
> > +
> > List_iterator<TABLE_LIST> sj_list_it(join->select_lex->sj_nests);
> >
> > /* Try pulling out of the each of the semi-joins */
> > @@ -4343,7 +4335,12 @@ int pull_out_semijoin_tables(JOIN *join)
> > pointers.
> > */
> > child_li.remove();
> > - upper_join_list->push_back(tbl);
> > + if (upper_join_list->push_back(tbl))
> > + {
> > + if (arena)
> > + join->thd->restore_active_arena(arena,&backup);
> > + DBUG_RETURN(TRUE);
> > + }
> > tbl->join_list= upper_join_list;
> > tbl->embedding= sj_nest->embedding;
> > }
> > @@ -4364,7 +4361,7 @@ int pull_out_semijoin_tables(JOIN *join)
> > join->thd->restore_active_arena(arena,&backup);
> > }
> > }
> > - DBUG_RETURN(0);
> > + DBUG_RETURN(FALSE);
> > }
> >
> >
> > @@ -8426,15 +8423,18 @@ prev_record_reads(JOIN *join, uint idx,
> >
> > static bool fix_semijoin_strategies_for_picked_join_order(JOIN *join)
> > {
> > - uint tablenr;
> > + uint tableno;
>
> I suggest declarin the above within each loop. It does not seem to be
> referred outside of the loops.
Good idea.
>
> > table_map remaining_tables= 0;
> > table_map handled_tabs= 0;
> >
> > DBUG_ENTER("fix_semijoin_strategies_for_picked_join_order");
> >
> > - for (tablenr= join->tables - 1 ; tablenr != join->const_tables - 1;
> tablenr--)
> > + if (join->select_lex->sj_nests.is_empty())
> > + DBUG_RETURN(FALSE);
> > +
> > + for (tableno= join->tables - 1; tableno != join->const_tables - 1;
> tableno--)
> > {
> > - POSITION *pos= join->best_positions + tablenr;
> > + POSITION *pos= join->best_positions + tableno;
> > JOIN_TAB *s= pos->table;
> > TABLE_LIST *emb_sj_nest= s->emb_sj_nest;
> > uint first;
> > @@ -8471,7 +8471,7 @@ static bool fix_semijoin_strategies_for_
> > */
> > memcpy(pos - table_count + 1, emb_sj_nest->nested_join->sjm.positions,
> > sizeof(POSITION) * table_count);
> > - first= tablenr - table_count + 1;
> > + first= tableno - table_count + 1;
> > join->best_positions[first].n_sj_tables= table_count;
> > join->best_positions[first].sj_strategy= SJ_OPT_MATERIALIZE_LOOKUP;
> >
> > @@ -8507,14 +8507,13 @@ static bool fix_semijoin_strategies_for_
> > /* Add materialization record count*/
> > prefix_rec_count *= mat_sj_nest->nested_join->sjm.expected_rowcount;
> >
> > - uint i;
> > table_map rem_tables= remaining_tables;
> > - for (i= tablenr; i != (first + table_count - 1); i--)
> > + for (uint i= tableno; i != (first + table_count - 1); i--)
> > rem_tables |= join->best_positions[i].table->table->map;
> >
> > POSITION dummy;
> > join->cur_sj_inner_tables= 0;
> > - for (i= first + table_count; i<= tablenr; i++)
> > + for (uint i= first + table_count; i<= tableno; i++)
> > {
> > best_access_path(join, join->best_positions[i].table, rem_tables, i, FALSE,
> > prefix_rec_count, join->best_positions + i,&dummy);
> > @@ -8528,14 +8527,14 @@ static bool fix_semijoin_strategies_for_
> > {
> > first= pos->first_firstmatch_table;
> > join->best_positions[first].sj_strategy= SJ_OPT_FIRST_MATCH;
> > - join->best_positions[first].n_sj_tables= tablenr - first + 1;
> > + join->best_positions[first].n_sj_tables= tableno - first + 1;
> > POSITION dummy; // For loose scan paths
> > double record_count= (first== join->const_tables)? 1.0:
> > - join->best_positions[tablenr - 1].prefix_record_count;
> > + join->best_positions[tableno - 1].prefix_record_count;
> >
> > table_map rem_tables= remaining_tables;
> > - uint idx;
> > - for (idx= first; idx<= tablenr; idx++)
> > +
> > + for (uint idx= first; idx<= tableno; idx++)
> > {
> > rem_tables |= join->best_positions[idx].table->table->map;
> > }
> > @@ -8544,7 +8543,7 @@ static bool fix_semijoin_strategies_for_
> > join buffering
> > */
> > join->cur_sj_inner_tables= 0;
> > - for (idx= first; idx<= tablenr; idx++)
> > + for (uint idx= first; idx<= tableno; idx++)
> > {
> > if (join->best_positions[idx].use_join_buffer)
> > {
> > @@ -8562,18 +8561,18 @@ static bool fix_semijoin_strategies_for_
> > POSITION *first_pos= join->best_positions + first;
> > POSITION loose_scan_pos; // For loose scan paths
> > double record_count= (first== join->const_tables)? 1.0:
> > - join->best_positions[tablenr - 1].prefix_record_count;
> > + join->best_positions[tableno - 1].prefix_record_count;
> >
> > table_map rem_tables= remaining_tables;
> > - uint idx;
> > - for (idx= first; idx<= tablenr; idx++)
> > +
> > + for (uint idx= first; idx<= tableno; idx++)
> > rem_tables |= join->best_positions[idx].table->table->map;
> > /*
> > Re-run best_access_path to produce best access methods that do not use
> > join buffering
> > */
> > join->cur_sj_inner_tables= 0;
> > - for (idx= first; idx<= tablenr; idx++)
> > + for (uint idx= first; idx<= tableno; idx++)
> > {
> > if (join->best_positions[idx].use_join_buffer || (idx == first))
> > {
> > @@ -8598,7 +8597,7 @@ static bool fix_semijoin_strategies_for_
> > */
> > first= pos->first_dupsweedout_table;
> > join->best_positions[first].sj_strategy= SJ_OPT_DUPS_WEEDOUT;
> > - join->best_positions[first].n_sj_tables= tablenr - first + 1;
> > + join->best_positions[first].n_sj_tables= tableno - first + 1;
> > }
> >
> > uint i_end= first + join->best_positions[first].n_sj_tables;
> > @@ -8613,7 +8612,7 @@ static bool fix_semijoin_strategies_for_
> > handled_tabs |= join->best_positions[i].table->table->map;
> > }
> >
> > - if (tablenr != first)
> > + if (tableno != first)
> > pos->sj_strategy= SJ_OPT_NONE;
> > remaining_tables |= s->table->map;
> > }
> > @@ -8650,16 +8649,13 @@ static bool fix_semijoin_strategies_for_
> >
> > static bool get_best_combination(JOIN *join)
> > {
> > - uint i,tablenr;
> > table_map used_tables;
> > - JOIN_TAB *join_tab,*j;
> > KEYUSE *keyuse;
> > - uint table_count;
> > + const uint table_count= join->tables;
> > THD *thd=join->thd;
> > DBUG_ENTER("get_best_combination");
> >
> > - table_count=join->tables;
> > - if (!(join->join_tab= join_tab= new (thd->mem_root)
> JOIN_TAB[table_count]))
> > + if (!(join->join_tab= new (thd->mem_root) JOIN_TAB[table_count]))
> > DBUG_RETURN(TRUE);
> >
> > join->full_join=0;
> > @@ -8669,11 +8665,12 @@ static bool get_best_combination(JOIN *j
> > if (fix_semijoin_strategies_for_picked_join_order(join))
> > DBUG_RETURN(TRUE);
> >
> > - for (j=join_tab, tablenr=0 ; tablenr< table_count ; tablenr++,j++)
> > + for (uint tableno= 0; tableno< table_count; tableno++)
> > {
> > + JOIN_TAB *j= join->join_tab + tableno;
> > TABLE *form;
> > - *j= *join->best_positions[tablenr].table;
> > - form=join->all_tables[tablenr]=j->table;
> > + *j= *join->best_positions[tableno].table;
> > + form=join->all_tables[tableno]= j->table;
> > used_tables|= form->map;
> > form->reginfo.join_tab=j;
> > if (!*j->on_expr_ref)
> > @@ -8683,31 +8680,69 @@ static bool get_best_combination(JOIN *j
> > if (j->type == JT_CONST)
> > continue; // Handled in make_join_stat..
> >
> > -
> > j->loosescan_match_tab= NULL; //non-nulls will be set later
> > j->ref.key = -1;
> > j->ref.key_parts=0;
> >
> > -
> > if (j->type == JT_SYSTEM)
> > continue;
> >
> > - if (j->keys.is_clear_all() || !(keyuse=
> join->best_positions[tablenr].key) ||
> > - (join->best_positions[tablenr].sj_strategy == SJ_OPT_LOOSE_SCAN))
> > + if (j->keys.is_clear_all() ||
> > + !(keyuse= join->best_positions[tableno].key) ||
>
> IMHO, the above line should be indented one more character.
Right.
>
> > + (join->best_positions[tableno].sj_strategy == SJ_OPT_LOOSE_SCAN))
> > {
> > j->type=JT_ALL;
> > - j->index= join->best_positions[tablenr].loosescan_key;
> > - if (tablenr != join->const_tables)
> > + j->index= join->best_positions[tableno].loosescan_key;
> > + if (tableno != join->const_tables)
> > join->full_join=1;
> > }
> > else if (create_ref_for_key(join, j, keyuse, used_tables))
> > DBUG_RETURN(TRUE); // Something went wrong
> > }
> >
> > - for (i=0 ; i< table_count ; i++)
> > -
> join->map2table[join->join_tab[i].table->tablenr]=join->join_tab+i;
> > + for (uint tableno= 0; tableno< table_count; tableno++)
> > + join->map2table[join->join_tab[tableno].table->tablenr]=
> > + join->join_tab + tableno;
> > +
> > update_depend_map(join);
> > - DBUG_RETURN(0);
> > +
>
> I think the purpose of the loop below should be explained in a comment
> above the love. (Or the entire loop could be put into a separate
> function and explained in a function comment.)
Comment added. Separating into functions could be a job for refactoring of JOIN.
I could do it now, but refactoring would be better in with regard to consistent
naming, placement, etc...
>
> > + for (uint tableno= join->const_tables; tableno< table_count; )
> > + {
> > + JOIN_TAB *tab= join->join_tab + tableno;
> > + const POSITION *pos= join->best_positions + tableno;
> > +
> > + switch (pos->sj_strategy)
> > + {
> > + case SJ_OPT_NONE:
> > + tableno++;
> > + break;
> > + case SJ_OPT_MATERIALIZE_LOOKUP:
> > + case SJ_OPT_MATERIALIZE_SCAN:
> > + case SJ_OPT_LOOSE_SCAN:
> > + case SJ_OPT_DUPS_WEEDOUT:
> > + case SJ_OPT_FIRST_MATCH:
> > + /*
> > + Remember the first and last semijoin inner tables; this serves to tell
> > + a JOIN_TAB's semijoin strategy (like in check_join_cache_usage()).
> > + */
> > + JOIN_TAB *last_sj_tab= tab + pos->n_sj_tables - 1;
> > + JOIN_TAB *last_sj_inner=
> > + (pos->sj_strategy == SJ_OPT_DUPS_WEEDOUT) ?
> > + /* Range may end with non-inner table so cannot set last_sj_inner_tab */
> > + NULL : last_sj_tab;
> > + for (JOIN_TAB *tab_in_range= tab;
> > + tab_in_range<= last_sj_tab;
> > + tab_in_range++)
> > + {
> > + tab_in_range->first_sj_inner_tab= tab;
> > + tab_in_range->last_sj_inner_tab= last_sj_inner;
> > + }
> > + tableno+= pos->n_sj_tables;
> > + break;
> > + }
> > + }
> > +
> > + DBUG_RETURN(FALSE);
> > }
> >
> >
> > @@ -9350,6 +9385,22 @@ static bool pushdown_on_conditions(JOIN*
> > }
> >
> >
> > +/**
> > + Separates the predicates in a join condition and pushes them to the
> > + join step where all involved tables are available in the join prefix.
> > + ON clauses from JOIN expressions are also pushed to the most appropriate
> step.
> > +
> > + @param join Join object where predicates are pushed.
> > +
> > + @param cond Pointer to condition which may contain an arbitrary number of
> > + predicates, combined using AND, OR and XOR items.
> > + If NULL, equivalent to a predicate that returns TRUE for all
> > + row combinations.
> > +
> > + @retval TRUE if condition makes query always false OR an error occurred.
> > + @retval FALSE otherwise.
> > +*/
> > +
> > static bool make_join_select(JOIN *join, Item *cond)
> > {
> > THD *thd= join->thd;
> > @@ -9445,8 +9496,7 @@ static bool make_join_select(JOIN *join,
> > - If we're looking at the first SJM table, reset used_tables
> > to refer to only allowed tables
> > */
> > - if (tab->emb_sj_nest&&
> > - tab->emb_sj_nest->sj_mat_exec&&
> > + if (sj_is_materialize_strategy(tab->get_sj_strategy())&&
> > !(used_tables& tab->emb_sj_nest->sj_inner_tables))
> > {
> > save_used_tables= used_tables;
> > @@ -9689,32 +9739,38 @@ static bool make_join_select(JOIN *join,
> >
> > DBUG_ASSERT(save_used_tables ? tab->emb_sj_nest != NULL : TRUE);
> >
> > - if (save_used_tables&& !(used_tables&
> > - ~(tab->emb_sj_nest->sj_inner_tables |
> > - join->const_table_map | PSEUDO_TABLE_BITS)))
> > + /*
> > + 1. We are inside a materialized semijoin nest, and
> > + 2. All inner tables of the nest are covered.
> > + */
> > + if (save_used_tables&& // 1
> > + !(tab->emb_sj_nest->sj_inner_tables& ~used_tables)) // 2
>
> Would it not be an idea to create a subset function soon so that we do
> not have to twist our minds to understand this each time. This
> example shows that even an experienced MySQL developer had problems
> getting this right.
Yes agree. But I think that is better done as a refactoring task that handles
all such cases within the optimizer. Perhaps converting table_map to a class
that hides the implementation bit-map field and only exposes the functions would
be a good idea.
>
> > {
> > /*
> > - We have reached the end of semi join nest. That is, the join order
> > - looks like this:
> > + The join order now looks like this:
> >
> > - outer_tbl1 SJ-Materialize(inner_tbl1 ... inner_tblN) outer_tbl ...
> > - ^
> > - \-we're here
> > - At this point, we need to produce two conditions
> > - - A condition that can be checked when we have all of the sj-inner
> > - tables (inner_tbl1 ... inner_tblN). This will be used while doing
> > - materialization.
> > - - A condition that can be checked when we have all of the tables
> > - in the prefix (both inner and outer).
> > + ot1 ... otI SJM(it1 ... itN) otI+1 ... otM
> > + ^
> > + \-we're here
> > + At this point, we have generated a condition that can be checked
> > + when we have all of the sj-inner tables (it1 ... itN).
> > + This will be used while doing materialization.
> > +
> > + In addition, we need a condition that can be checked when we have
> > + all of the tables in the prefix (both inner and outer).
> > + This condition is only generated (and used) when we have an SJM-scan
> > + operation. For SJM-lookup, the condition is completely fulfilled
> > + through the lookup into the materialized table.
> > + This constraint will last as long as we do not allow correlated
> > + subqueries with materialized semijoin execution.
> > */
> > - tab->emb_sj_nest->sj_mat_exec->join_cond=
> > - cond ?
> > - make_cond_after_sjm(cond, cond, save_used_tables, used_tables):
> > - NULL;
> > + if (cond&& tab->emb_sj_nest->sj_mat_exec->is_scan)
> > + tab->emb_sj_nest->sj_mat_exec->join_cond=
> > + make_cond_after_sjm(cond, cond, save_used_tables, used_tables);
> > +
> > used_tables= save_used_tables | used_tables;
> > save_used_tables= 0;
> > }
> > -
> > }
> > }
> > DBUG_RETURN(0);
> > @@ -10877,8 +10933,7 @@ make_join_readinfo(JOIN *join, ulonglong
> > uint last_sjm_table= MAX_TABLES;
> > DBUG_ENTER("make_join_readinfo");
> >
> > - if (!join->select_lex->sj_nests.is_empty()&&
> > - setup_semijoin_dups_elimination(join, options, no_jbuf_after))
> > + if (setup_semijoin_dups_elimination(join, options, no_jbuf_after))
> > DBUG_RETURN(TRUE); /* purecov: inspected */
> >
> > for (i=join->const_tables ; i< join->tables ; i++)
> > @@ -12527,17 +12582,18 @@ Item *eliminate_item_equal(Item *cond, C
> > Item_equal *item_equal)
> > {
> > List<Item> eq_list;
> > - Item_func_eq *eq_item= 0;
> > + Item_func_eq *eq_item= NULL;
> > if (((Item *) item_equal)->const_item()&&
> !item_equal->val_int())
> > return new Item_int((longlong) 0,1);
> > Item *item_const= item_equal->get_const();
> > Item_equal_iterator it(*item_equal);
> > Item *head;
> > - if (item_const)
> > - head= item_const;
> > - else
> > + if (!item_const)
> > {
> > - head= item_equal->get_first();
> > + /*
> > + If there is a const item, match all field items with the const item,
> > + otherwise match the second and subsequent field items with the first one:
> > + */
> > it++;
> > }
> > Item_field *item_field;
> > @@ -12548,7 +12604,7 @@ Item *eliminate_item_equal(Item *cond, C
> > if (upper)
> > {
> > if (item_const&& upper->get_const())
> > - item= 0;
> > + item= NULL;
> > else
> > {
> > Item_equal_iterator li(*item_equal);
> > @@ -12563,57 +12619,42 @@ Item *eliminate_item_equal(Item *cond, C
> > {
> > if (eq_item)
> > eq_list.push_back(eq_item);
> > - /*
> > - item_field might refer to a table that is within a semi-join
> > - materialization nest. In that case, the join order looks like this:
> >
> > - outer_tbl1 outer_tbl2 SJM (inner_tbl1 inner_tbl2) outer_tbl3
> > + /*
> > + item_field may refer to a table that is within a semijoin
>
> I suggest prefixing item_field with @c, so that doxygen formats the
> name as a C++ identifier.
Ok.
>
> > + materialization nest. In that case, the join order may look like:
> >
> > - We must not construct equalities like
> > + ot1 ot2 SJM (it3 it4) ot5
> >
> > - outer_tbl1.col = inner_tbl1.col
> > + If we have a multiple equality (ot1.c1, ot2.c2, it3.c3, it4.c4, ot5.c5),
> > + we should generate the following equalities:
> > + 1. ot1.c1 = ot2.c2
> > + 2. ot1.c1 = it1.c3
> > + 3. it1.c3 = it2.c4
> > + 4. ot1.c1 = ot5.c5
> > +
> > + Equalities 1) and 4) are regular equalities between two outer tables.
> > + Equality 2) is an equality that matches the outer query with a
> > + materialized semijoin table. It is either performed as a lookup
> > + into the materialized table (SJM-lookup), or as a condition on the
> > + outer table (SJM-scan).
> > + Equality 3) is evaluated during semijoin materialization.
> > +
> > + If there is a const item, match against this one.
> > + Otherwise, match against the first field item in the multiple equality,
> > + unless the item is within a materialized semijoin nest, where we match
> > + against the first item within the SJM nest (if the item is not the first
> > + item within the SJM nest), or match against the first item in the
> > + list (if the item is the first one in the SJM list).
> > + */
>
> The last part got a bit complex. The terminology is also a bit
> inconsistent. I think you use SJM nest and SJM list for the same
> thing, and multiple equality and list likewise.
SJM list was a typo.
>
> I also think you need to explain get_first(Item_field*) role in this.
> Otherwise, it is not easy to understand the connection between the
> description above and the code below.
Guilhem asked me to add a comment to the "head == item" test. Is that sufficient?
>
> > + head= item_const ? item_const : item_equal->get_first(item);
> > + if (head == item)
> > + head= item_equal->get_first();
> >
> > - because they would get attached to inner_tbl1 and will get evaluated
> > - during materialization phase, when we don't have current value of
> > - outer_tbl1.col.
> > - */
> > - TABLE_LIST *emb_nest=
> > - item_field->field->table->pos_in_table_list->embedding;
> > - if (!item_const&& emb_nest&& emb_nest->sj_mat_exec)
> > - {
> > - /*
> > - Find the first equal expression that refers to a table that is
> > - within the semijoin nest. If we can't find it, do nothing
> > - */
> > - List_iterator<Item_field> fit(item_equal->fields);
>
> Since this function no longer refers to Item_equal::fields, it can be
> removed as friend of Item_equal.
Ok.
>
> > - Item_field *head_in_sjm;
> > - bool found= FALSE;
> > - while ((head_in_sjm= fit++))
> > - {
> > - if (head_in_sjm->used_tables()& emb_nest->sj_inner_tables)
> > - {
> > - if (head_in_sjm == item_field)
> > - {
> > - /* This is the first table inside the semi-join*/
> > - eq_item= new Item_func_eq(item_field, head);
> > - /* Tell make_cond_for_table don't use this. */
> > - eq_item->marker=3;
> > - }
> > - else
> > - {
> > - eq_item= new Item_func_eq(item_field, head_in_sjm);
> > - found= TRUE;
> > - }
> > - break;
> > - }
> > - }
> > - if (!found)
> > - continue;
> > - }
> > - else
> > - eq_item= new Item_func_eq(item_field, head);
> > + eq_item= new Item_func_eq(item_field, head);
>
> I think it is a bit confusing that, AFAICT, both item and item_field
> is used to refer to the same item. I suggest that inside the
> item==item_field block that all references are to item_field.
Yes, good idea.
>
> > if (!eq_item)
> > - return 0;
> > + return NULL;
> > +
> > eq_item->set_cmp_func();
> > eq_item->quick_fix_field();
> > }
> > @@ -16908,28 +16949,27 @@ sub_select_sjm(JOIN *join, JOIN_TAB *joi
> > Ok, materialization finished. Initialize the access to the temptable
> > */
> > sjm->materialized= TRUE;
> > +
> > join_tab->read_record.read_record= join_no_more_records;
> > - if (sjm->is_scan)
> > - {
> > - /* Initialize full scan */
> > - JOIN_TAB *last_tab= join_tab + (sjm->table_count - 1);
> > - init_read_record(&last_tab->read_record, join->thd,
> > - sjm->table, NULL, TRUE, TRUE, FALSE);
> > -
> > - DBUG_ASSERT(last_tab->read_record.read_record == rr_sequential);
> > - last_tab->read_first_record= join_read_record_no_init;
> > - last_tab->read_record.copy_field= sjm->copy_field;
> > - last_tab->read_record.copy_field_end= sjm->copy_field +
> > - sjm->table_cols.elements;
> > - last_tab->read_record.read_record= rr_sequential_and_unpack;
> > - }
> > }
> >
> > if (sjm->is_scan)
> > {
> > - /* Do full scan of the materialized table */
> > + /*
> > + For each prefix partial record (if any), initialize and perform
> > + a full scan of the materialized table.
> > + */
> > JOIN_TAB *last_tab= join_tab + (sjm->table_count - 1);
> >
> > + init_read_record(&last_tab->read_record, join->thd,
> > + sjm->table, NULL, TRUE, TRUE, FALSE);
> > +
> > + last_tab->read_first_record= join_read_record_no_init;
> > + last_tab->read_record.copy_field= sjm->copy_field;
> > + last_tab->read_record.copy_field_end= sjm->copy_field +
> > + sjm->table_cols.elements;
> > + last_tab->read_record.read_record= rr_sequential_and_unpack;
> > +
>
> Since this code does not deal anything with equality predicates, I
> would prefer if the above bug fix was separate patch.
>
> Anyhow, I am afraid that readers of this code, may be a bit puzzled
> that you set last_tab->read_first_record and
> last_tab->read_record.read_record for each scan. I doubt that which
> functions to use will change between scans. copy_field probably does
> not change either.
I tried this, but it appeared that setting several of the fields per scan was
needed. So I defaulted to set them all...
>
> > Item *save_cond= last_tab->select_cond;
> > last_tab->set_select_cond(sjm->join_cond, __LINE__);
> > rc= sub_select(join, last_tab, end_of_records);
> > @@ -18840,21 +18880,23 @@ static bool replace_subcondition(JOIN *j
> > }
> >
> >
> > -/*
> > +/**
> > Extract a condition that can be checked after reading given table
> >
> > - SYNOPSIS
> > - make_cond_for_table()
> > - cond Condition to analyze
> > - tables Tables for which "current field values" are available
> > - used_table Table that we're extracting the condition for (may
> > - also include PSEUDO_TABLE_BITS
> > - exclude_expensive_cond Do not push expensive conditions
> > + @param cond Condition to analyze
> > + @param tables Tables for which "current field values" are available
> > + @param used_table Table that we're extracting the condition for (may
> > + also include PSEUDO_TABLE_BITS, and may be zero)
> > + @param exclude_expensive_cond Do not push expensive conditions
> >
> > - DESCRIPTION
> > + @retval<>NULL Generated condition
> > + @retval = NULL Already checked, OR error
> > +
> > + @details
> > Extract the condition that can be checked after reading the table
> > specified in 'used_table', given that current-field values for tables
> > specified in 'tables' bitmap are available.
> > + If 'used_table' is 0, extract conditions for all tables in 'tables'.
> >
> > The function assumes that
> > - Constant parts of the condition has already been checked.
> > @@ -18865,12 +18907,12 @@ static bool replace_subcondition(JOIN *j
> > guaranteed to be true by employed 'ref' access methods (the code that
> > does this is located at the end, search down for "EQ_FUNC").
> >
> > -
> > - SEE ALSO
> > + @see
> > make_cond_for_info_schema uses similar algorithm
> >
> > - RETURN
> > - Extracted condition
> > + @note
> > + Make sure to keep the implementations of make_cond_for_table() and
> > + make_cond_after_sjm() synchronized.
> > */
> >
> > static Item *
> > @@ -18880,26 +18922,32 @@ make_cond_for_table(Item *cond, table_ma
> > return make_cond_for_table_from_pred(cond, cond, tables, used_table,
> > exclude_expensive_cond);
> > }
> > -
> > +
> > static Item *
> > make_cond_for_table_from_pred(Item *root_cond, Item *cond,
> > table_map tables, table_map used_table,
> > bool exclude_expensive_cond)
> > {
> > - if (used_table&& !(cond->used_tables()& used_table)&&
> > - /*
> > - Exclude constant conditions not checked at optimization time if
> > + /*
> > + Ignore this condition if
> > + 1. We are extracting conditions for a specific table, and
> > + 2. that table is not referenced by the condition, and
> > + 3. exclude constant conditions not checked at optimization time if
> > the table we are pushing conditions to is the first one.
> > As a result, such conditions are not considered as already checked
> > and will be checked at execution time, attached to the first table.
> > -
> > + */
> > + if (used_table&& // 1
> > + !(cond->used_tables()& used_table)&& // 2
> > + /*
> > psergey: TODO: "used_table& 1" doesn't make sense in nearly any
> > context. Look at setup_table_map(), table bits reflect the order
> > the tables were encountered by the parser. Check what we should
> > replace this condition with.
> > */
> > - !((used_table& 1)&& cond->is_expensive()))
> > - return (Item*) 0; // Already checked
> > + !((used_table& 1)&& cond->is_expensive())) // 3
> > + return NULL;
> > +
> > if (cond->type() == Item::COND_ITEM)
> > {
> > if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC)
> > @@ -18907,48 +18955,48 @@ make_cond_for_table_from_pred(Item *root
> > /* Create new top level AND item */
> > Item_cond_and *new_cond=new Item_cond_and;
> > if (!new_cond)
> > - return (Item*) 0; // OOM /* purecov: inspected */
> > + return NULL;
> > List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
> > Item *item;
> > while ((item=li++))
> > {
> > - Item *fix=make_cond_for_table_from_pred(root_cond, item,
> > + Item *fix=make_cond_for_table_from_pred(root_cond, item,
>
> Since you are changing the above line anyway, maybe you should change
> to the MySQL-standard assignment format, too. (There are several
> cases of this.)
Sure, done.
>
>
> > tables, used_table,
> > exclude_expensive_cond);
> > - if (fix)
> > - new_cond->argument_list()->push_back(fix);
> > + if (fix)
> > + new_cond->argument_list()->push_back(fix);
> > }
> > switch (new_cond->argument_list()->elements) {
> > case 0:
> > - return (Item*) 0; // Always true
> > + return NULL; // Always true
> > case 1:
> > - return new_cond->argument_list()->head();
> > + return new_cond->argument_list()->head();
> > default:
> > - /*
> > - Item_cond_and do not need fix_fields for execution, its parameters
> > - are fixed or do not need fix_fields, too
> > - */
> > - new_cond->quick_fix_field();
> > - new_cond->used_tables_cache=
> > - ((Item_cond_and*) cond)->used_tables_cache&
> > - tables;
> > - return new_cond;
> > + /*
> > + Item_cond_and do not need fix_fields for execution, its parameters
> > + are fixed or do not need fix_fields, too
> > + */
> > + new_cond->quick_fix_field();
> > + new_cond->used_tables_cache=
> > + ((Item_cond_and*) cond)->used_tables_cache&
> > + tables;
>
> I suggest merging the to lines above.
Done.
>
>
> > + return new_cond;
> > }
> > }
> > else
> > - { // Or list
> > + { // Or list
> > Item_cond_or *new_cond=new Item_cond_or;
> > if (!new_cond)
> > - return (Item*) 0; // OOM /* purecov: inspected */
> > + return NULL;
> > List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
> > Item *item;
> > while ((item=li++))
> > {
> > - Item *fix=make_cond_for_table_from_pred(root_cond, item,
> > + Item *fix=make_cond_for_table_from_pred(root_cond, item,
> > tables, 0L,
> > exclude_expensive_cond);
> > if (!fix)
> > - return (Item*) 0; // Always true
> > + return NULL; // Always true
> > new_cond->argument_list()->push_back(fix);
> > }
> > /*
> > @@ -18963,20 +19011,26 @@ make_cond_for_table_from_pred(Item *root
> > }
> >
> > /*
> > - Because the following test takes a while and it can be done
> > - table_count times, we mark each item that we have examined with the result
> > - of the test
> > - */
> > + Omit this condition if
> > + 1. It has been marked as omittable before, or
> > + 2. Some tables referred by the condition are not available, or
> > + 3. We are extracting conditions for any tables, the condition is
> > + considered 'expensive', and we want to delay evaluation of such
> > + conditions to the execution phase.
> > + */
> > + if (cond->marker == 3 || // 1
> > + (cond->used_tables()& ~tables) || // 2
> > + (!used_table&& exclude_expensive_cond&&
> cond->is_expensive())) // 3
> > + return NULL;
> >
> > - if (cond->marker == 3 || (cond->used_tables()& ~tables) ||
> > - /*
> > - When extracting constant conditions, treat expensive conditions as
> > - non-constant, so that they are not evaluated at optimization time.
> > - */
> > - (!used_table&& exclude_expensive_cond&&
> cond->is_expensive()))
> > - return (Item*) 0; // Can't check this yet
> > - if (cond->marker == 2 || cond->eq_cmp_result() == Item::COND_OK)
> > - return cond; // Not boolean op
> > + /*
> > + Extract this condition if
> > + 1. It has already been marked as applicable, or
> > + 2. It is not a<comparison predicate>
> (=,<,>,<=,>=,<=>)
> > + */
> > + if (cond->marker == 2 || // 1
> > + cond->eq_cmp_result() == Item::COND_OK) // 2
> > + return cond;
> >
> > /*
> > Remove equalities that are guaranteed to be true by use of 'ref' access
> > @@ -19004,19 +19058,19 @@ make_cond_for_table_from_pred(Item *root
> > Item *left_item= ((Item_func*) cond)->arguments()[0]->real_item();
> > Item *right_item= ((Item_func*) cond)->arguments()[1]->real_item();
> > if (left_item->type() == Item::FIELD_ITEM&&
> > - test_if_ref(root_cond, (Item_field*) left_item,right_item))
> > + test_if_ref(root_cond, (Item_field*) left_item,right_item))
> > {
> > - cond->marker=3; // Checked when read
> > - return (Item*) 0;
> > + cond->marker=3; // Condition can be omitted
> > + return NULL;
> > }
> > if (right_item->type() == Item::FIELD_ITEM&&
> > - test_if_ref(root_cond, (Item_field*) right_item,left_item))
> > + test_if_ref(root_cond, (Item_field*) right_item,left_item))
> > {
> > - cond->marker=3; // Checked when read
> > - return (Item*) 0;
> > + cond->marker=3; // Condition can be omitted
> > + return NULL;
> > }
>
> I suggest merging the above two if-statements. No need to duplicate
> this code. If readability is a concern, it can be achieved with
> white-space.
Ok
>
> Even better, make a small inline function so that you can write
>
> if (x(root_cond, right_item, left_item) ||
> x(root_cond, left_item, right_item))
>
>
> > }
> > - cond->marker=2;
> > + cond->marker=2; // Mark condition as applicable
> > return cond;
> > }
> >
> > @@ -19030,24 +19084,35 @@ make_cond_for_table_from_pred(Item *root
> > @param sjm_tables Tables within the semi-join nest (the inner part).
> >
> > @retval<>NULL Generated condition
> > - @retval = NULL Already checked, or error
> > + @retval = NULL Already checked, OR error
> >
> > @details
> > A regular semi-join materialization is always non-correlated, ie
> > the subquery is always resolved by performing a lookup generated in
> > - create_subquery_equalities, hence this function should never produce
> > + create_subquery_equalities, hence this function never needs to produce
> > any condition for regular semi-join materialization.
> > For a scan semi-join materialization, this function may return a condition
> > - to be checked.
> > + to be checked, when there are outer tables before the SJM tables in the
> > + join prefix.
> > +
> > + @note
> > + Make sure to keep the implementations of make_cond_for_table() and
> > + make_cond_after_sjm() synchronized.
> > */
> >
> > static Item *
> > make_cond_after_sjm(Item *root_cond, Item *cond, table_map tables,
> > table_map sjm_tables)
> > {
> > + /*
> > + We can only test conditions that cover tables from the join prefix
> > + and tables from the semijoin nest. Other conditions will be handled
> > + by make_cond_for_table().
> > + */
> > if ((!(cond->used_tables()& ~tables) ||
> > !(cond->used_tables()& ~sjm_tables)))
> > - return (Item*) 0; // Already checked
> > + return NULL;
> > +
> > if (cond->type() == Item::COND_ITEM)
> > {
> > if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC)
> > @@ -19055,49 +19120,49 @@ make_cond_after_sjm(Item *root_cond, Ite
> > /* Create new top level AND item */
> > Item_cond_and *new_cond=new Item_cond_and;
> > if (!new_cond)
> > - return (Item*) 0; // OOM /* purecov: inspected */
> > + return NULL;
> > List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
> > Item *item;
> > while ((item=li++))
> > {
> > - Item *fix=make_cond_after_sjm(root_cond, item, tables, sjm_tables);
> > - if (fix)
> > - new_cond->argument_list()->push_back(fix);
> > + Item *fix=make_cond_after_sjm(root_cond, item, tables, sjm_tables);
> > + if (fix)
> > + new_cond->argument_list()->push_back(fix);
> > }
> > switch (new_cond->argument_list()->elements) {
> > case 0:
> > - return (Item*) 0; // Always true
> > + return NULL; // Always true
> > case 1:
> > - return new_cond->argument_list()->head();
> > + return new_cond->argument_list()->head();
> > default:
> > /*
> > - Item_cond_and do not need fix_fields for execution, its parameters
> > - are fixed or do not need fix_fields, too
> > + Item_cond_and do not need fix_fields for execution, its parameters
> > + are fixed or do not need fix_fields, too
> > */
> > - new_cond->quick_fix_field();
> > - new_cond->used_tables_cache=
> > - ((Item_cond_and*) cond)->used_tables_cache&
> > - tables;
> > - return new_cond;
> > + new_cond->quick_fix_field();
> > + new_cond->used_tables_cache=
> > + ((Item_cond_and*) cond)->used_tables_cache&
> > + tables;
>
> Likewise.
Done.
>
> > + return new_cond;
> > }
> > }
> > else
> > - { // Or list
> > + { // Or list
> > Item_cond_or *new_cond=new Item_cond_or;
> > if (!new_cond)
> > - return (Item*) 0; // OOM /* purecov: inspected */
> > + return NULL;
> > List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
> > Item *item;
> > while ((item=li++))
> > {
> > - Item *fix= make_cond_after_sjm(root_cond, item, tables, 0L);
> > - if (!fix)
> > - return (Item*) 0; // Always true
> > - new_cond->argument_list()->push_back(fix);
> > + Item *fix= make_cond_after_sjm(root_cond, item, tables, 0L);
> > + if (!fix)
> > + return NULL; // Always true
> > + new_cond->argument_list()->push_back(fix);
> > }
> > /*
> > - Item_cond_and do not need fix_fields for execution, its parameters
> > - are fixed or do not need fix_fields, too
> > + Item_cond_and do not need fix_fields for execution, its parameters
> > + are fixed or do not need fix_fields, too
> > */
> > new_cond->quick_fix_field();
> > new_cond->used_tables_cache= ((Item_cond_or*) cond)->used_tables_cache;
> > @@ -19107,15 +19172,22 @@ make_cond_after_sjm(Item *root_cond, Ite
> > }
> >
> > /*
> > - Because the following test takes a while and it can be done
> > - table_count times, we mark each item that we have examined with the result
> > - of the test
> > + Omit this condition if
> > + 1. It has been marked as omittable before, or
> > + 2. Some tables referred by the condition are not available.
> > */
> > + if (cond->marker == 3 || // 1
> > + cond->used_tables()& ~(tables | sjm_tables)) // 2
> > + return NULL;
> >
> > - if (cond->marker == 3 || (cond->used_tables()& ~(tables |
> sjm_tables)))
> > - return (Item*) 0; // Can't check this yet
> > - if (cond->marker == 2 || cond->eq_cmp_result() == Item::COND_OK)
> > - return cond; // Not boolean op
> > + /*
> > + Extract this condition if
> > + 1. It has already been marked as applicable, or
> > + 2. It is not a<comparison predicate>
> (=,<,>,<=,>=,<=>)
> > + */
> > + if (cond->marker == 2 || // 1
> > + cond->eq_cmp_result() == Item::COND_OK) // 2
> > + return cond;
> >
> > /*
> > Remove equalities that are guaranteed to be true by use of 'ref' access
> > @@ -19128,17 +19200,17 @@ make_cond_after_sjm(Item *root_cond, Ite
> > if (left_item->type() == Item::FIELD_ITEM&&
> > test_if_ref(root_cond, (Item_field*) left_item,right_item))
> > {
> > - cond->marker=3; // Checked when read
> > - return (Item*) 0;
> > + cond->marker=3; // Condition can be omitted
> > + return NULL;
> > }
> > if (right_item->type() == Item::FIELD_ITEM&&
> > test_if_ref(root_cond, (Item_field*) right_item,left_item))
> > {
> > - cond->marker=3; // Checked when read
> > - return (Item*) 0;
> > + cond->marker=3; // Condition can be omitted
> > + return NULL;
>
> Likewise.
Done.
>
> > }
> > }
> > - cond->marker=2;
> > + cond->marker=2; // Mark condition as applicable
> > return cond;
> > }
> >
> >
> >
> >
> >
> >
>
Thanks,
Roy