On 08.04.11 11.56, Øystein Grøvlen wrote:
> Hi Roy,
>
> I think the patch looks good. By encapsulating this part of the
> optimizer in a separate class, it should be easier to understand what
> belongs together. See in-line for more specific comments.
>
> I also wonder whether you have resolved with Ole-John the issue he
> raised in an earlier email about conflicts with bug#58225 / bug#41740.
We are aware of it, and it will be resolved by one of us having to "adapt".
Because this is a prerequisite for further semijoin work, I cannot afford to
wait for the approval of Ole John's fix, unfortunately.
>
> Also, you are introducing a new variant "semi join" that I do not
> think exists in the code comments before. AFAIK, semi is a prefix
> (expect in special cases like the semis of a sporting tournament), and
> it should be either "semi-join" or just "semijoin".
I did it to align the terminology with inner join and outer join, as in "outer
join nest" and "inner join nest". I am not so troubled by the fact that it is
not proper English, and there are quite a few references to "semi join" in the
google space. However, a reference book such as "Database systems - the complete
book" talks about semijoin operator - but they also refer to outer join operators.
Anyway, it is probably better to revert to the term semijoin.
>
> On 04/ 6/11 01:57 PM, Roy Lyseng wrote:
> > #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
> revid:tor.didriksen@stripped
> >
> > 3358 Roy Lyseng 2011-04-06
> > Bug#11822517: Refactor call interface of choose_plan()
> >
> > Part 1 - create class Optimize_table_order.
> >
> > This patch creates a class that is used to optimize the join order of
> > tables in a query. The main purpose of the refactoring is to simplify
> > function interfaces and to encapsulate information needed to perform
> > optimization in a dedicated class.
> >
> > The following existing functions are added to the class:
> > choose_table_order() - renamed from choose_plan()
> > check_interleaving_with_nj()
> > advance_sj_state()
> > backout_nj_sj_state()
> > optimize_straight_join()
> > greedy_search()
> > best_extension_by_limited_search()
> > optimize_wo_join_buffering()
> > determine_search_depth()
> >
> > sql/sql_select.cc
> > New class definition, see above for details.
> >
> > sql/sql_select.h
> > Added all_table_map (map of tables in query) to JOIN class.
> > cur_embedding_map has been moved to class Optimize_table_order.
> > Commented const_table_map, found_const_table_map and outer_join.
> >
> > modified:
> > sql/sql_select.cc
> > sql/sql_select.h
> > === modified file 'sql/sql_select.cc'
> > --- a/sql/sql_select.cc 2011-03-29 07:30:44 +0000
> > +++ b/sql/sql_select.cc 2011-04-06 11:56:55 +0000
>
> ...
>
> > @@ -279,6 +260,94 @@ TABLE *create_duplicate_weedout_tmp_tabl
> > Item_equal *find_item_equal(COND_EQUAL *cond_equal, Field *field,
> > bool *inherited_fl);
> >
> > +/**
> > + This class determines the optimal join order for tables within
> > + a basic query block, ie a query specification clause, possibly extended
> > + with semi joined tables from embedded subqueries.
> > +
> > + This class takes as prerequisite a join class where all dependencies among
> > + tables have been sorted out, all possible access paths have been
> > + sorted out, and all statistics information has been filled in.
> > +
> > + The class has a sole public function that will calculate the most
> > + optimal plan based on the inputs and the environment, such as prune level
> > + and greedy optimizer search depth. For more information, see the
> > + function headers for the private functions greedy_search() and
> > + best_extension_by_limited_search().
> > +*/
> > +
> > +class Optimize_table_order
> > +{
> > +public:
> > + Optimize_table_order(THD *thd, JOIN *join, TABLE_LIST *sjm_nest)
>
> Using same name for parameters as for member variables is not common
> in C++ (in fact, I thought it was not allowed) and Sun compiler
> complains
>
> "Warning: join hides Optimize_table_order::join."
>
> Since things seems to work, I guess the behavior is a expected, but I
> suggest that you change the name of the parameters.
AFAIU, this is acceptable and intuitive in the initializer list, but not in the
constructor body. I'd wish
>
> You do not need to pass thd as a separate argument, you can get it
> from join->thd. I am not sure it is a good idea to open up for using
> another thd than join->thd.
I am preparing for the bright future when thd is not a member of class JOIN.
Please advice if this is a mandatory comment.
>
> > + : search_depth(determine_search_depth(join,
> > + thd->variables.optimizer_search_depth)),
> > + prune_level(thd->variables.optimizer_prune_level),
> > + thd(thd), join(join),
> > + cur_embedding_map(0)
> > + {
> > + join->emb_sjm_nest= sjm_nest;
> > + }
> > + ~Optimize_table_order()
> > + {
> > + join->emb_sjm_nest= NULL;
> > + }
> > + /**
> > + Entry point to table join order optimization.
> > + For further description, see class header and private function headers.
> > +
> > + @return false if successful, true if error
> > + */
> > + bool choose_table_order();
> > +
> > +private:
> > + const uint search_depth; // Maximum search depth to apply in greedy search
> > + const uint prune_level; // pruning heuristics to be applied
> > + // (0 = EXHAUSTIVE, 1 = PRUNE_BY_TIME_OR_ROWS)
> > + THD * const thd; // Pointer to current THD
> > + JOIN * const join; // Pointer to the current plan being developed
>
> I suggest writing "*const" instead of "* const". Stroustrup calls
> this a "*const" modifier, so I think it should be regarded as a single
> unit.
'*const' wins over '* const' 43 to 18 in trunk, so I guess this is reasonable.
>
> > + /**
> > + Bitmap of nested joins embedding the position at the end of the current
> > + partial join.
> > + */
>
> I do not understand what "embedding the position at the end" means.
Neither do I... I'll try to rephrase it.
>
> > + nested_join_map cur_embedding_map;
> > + /**
> > + @todo: Add remaining Join optimization state members here,
> > + ie emb_sjm_nest, positions, cur_sj_inner_tables.
> > + This is not yet possible because (some of them) are accessed through
> > + best_access_path(), which is currently not part of this class.
> > + */
> > +
> > + bool check_interleaving_with_nj(JOIN_TAB *next_tab);
> > + void advance_sj_state(table_map remaining_tables,
> > + const JOIN_TAB *tab, uint idx,
> > + double *current_record_count,
> > + double *current_read_time,
> > + POSITION *loose_scan_pos);
> > + void backout_nj_sj_state(const table_map remaining_tables,
> > + const JOIN_TAB *tab);
> > + void optimize_straight_join(table_map join_tables);
> > + bool greedy_search(table_map remaining_tables);
> > + bool best_extension_by_limited_search(table_map remaining_tables,
> > + uint idx,
> > + double record_count,
> > + double read_time,
> > + uint current_search_depth);
> > + void optimize_wo_join_buffering(uint first_tab, uint last_tab,
> > + table_map last_remaining_tables,
> > + bool first_alt, uint no_jbuf_before,
> > + double *reopt_rec_count, double *reopt_cost);
> > +
> > + static uint determine_search_depth(JOIN* join, uint search_depth);
>
> I assume this is static in order to be able to call it from the
> initialization list. I think I would prefer, making this a non-static
> member function, remove the join parameter, and moving the
> initialization to the body of the constructor.
>
> > + /**
> > + @todo: Add the static functions join_tab_cmp(), join_tab_cmp_straight() and
> > + join_tab_cmp_embedded_first() as static private member functions here.
> > + This is currently not possible because they are fed to my_qsort2(),
> > + which requires C linkage.
> > + @todo: Add best_access_path() as private member function within this class.
> > + Currently not possible because it is called from outside this class.
>
> Why not make it a public member?
Because it is called in a phase of the optimization when there is no natural
scope of an object of this type. Which means that a new object needs to be
constructed, and that is not natural to me.
>
> > + */
> > +};
> >
> > /**
> > This handles SELECT with and without UNION
> > @@ -1953,8 +2022,7 @@ JOIN::optimize()
> >
> > /* Calculate how to do the join */
> > THD_STAGE_INFO(thd, stage_statistics);
> > - if (make_join_statistics(this, select_lex->leaf_tables, conds,&keyuse)
> ||
> > - thd->is_fatal_error)
> > + if (make_join_statistics(this, select_lex->leaf_tables,
> conds,&keyuse))
>
> The reason for the removable of the is_fatal_error check should be
> mentioned in the commit comments.
Yes, will do.
>
> > {
> > DBUG_PRINT("error",("Error: make_join_statistics() failed"));
> > DBUG_RETURN(1);
> > @@ -4625,9 +4693,10 @@ make_join_statistics(JOIN *join, TABLE_L
> > {
> > int error;
> > TABLE *table;
> > + THD * const thd= join->thd;
>
> Another case where I think it should be "*const"
>
> > TABLE_LIST *tables= tables_arg;
> > uint i,table_count,const_count,key;
> > - table_map found_const_table_map, all_table_map, found_ref, refs;
> > + table_map found_ref, refs;
> > TABLE **table_vector;
> > JOIN_TAB *stat,*stat_end,*s,**stat_ref;
> > Key_use *keyuse, *start_keyuse;
> > @@ -4636,25 +4705,33 @@ make_join_statistics(JOIN *join, TABLE_L
> > JOIN_TAB *stat_vector[MAX_TABLES+1];
> > DBUG_ENTER("make_join_statistics");
> >
> > + /*
> > + Conceptually, this initialization belongs immediately before the call
> > + to table_order.choose_table_order(), but due to the use of the error label,
> > + it must be placed before all 'goto error' statements.
> > + */
>
> Why?
Because th compiler protested...
>
> > + Optimize_table_order table_order(thd, join, NULL);
> > +
> > table_count=join->tables;
> > - stat= new (join->thd->mem_root) JOIN_TAB[table_count];
> > - stat_ref=(JOIN_TAB**) join->thd->alloc(sizeof(JOIN_TAB*)*MAX_TABLES);
> > - table_vector=(TABLE**)
> join->thd->alloc(sizeof(TABLE*)*(table_count*2));
> > + stat= new (thd->mem_root) JOIN_TAB[table_count];
> > + stat_ref=(JOIN_TAB**) thd->alloc(sizeof(JOIN_TAB*)*MAX_TABLES);
> > + table_vector=(TABLE**) thd->alloc(sizeof(TABLE*)*(table_count*2));
>
> Missing space after '='
Ok
>
> > if (!stat || !stat_ref || !table_vector)
> > - DBUG_RETURN(1); // Eom /* purecov: inspected */
> > + DBUG_RETURN(true);
> >
> > if (!(join->positions=
> > - new (join->thd->mem_root) POSITION[table_count+1]))
> > - DBUG_RETURN(TRUE);
> > + new (thd->mem_root) POSITION[table_count+1]))
> > + DBUG_RETURN(true);
> >
> > if (!(join->best_positions=
> > - new (join->thd->mem_root) POSITION[table_count+1]))
> > - DBUG_RETURN(TRUE);
> > + new (thd->mem_root) POSITION[table_count+1]))
> > + DBUG_RETURN(true);
> >
> > join->best_ref=stat_vector;
> >
> > stat_end=stat+table_count;
> > - found_const_table_map= all_table_map=0;
> > + join->found_const_table_map=0;
>
> Likewise.
Ok
>
> > + join->all_table_map= 0;
> > const_count=0;
> >
> > for (s= stat, i= 0;
>
> ...
Ok.
>
> > @@ -6945,17 +7008,25 @@ public:
> > 'join->positions' of length 'idx'. The chosen access method for 's' and its
> > cost are stored in 'join->positions[idx]'.
> >
> > + Notice also that when calculating a plan for a materialized semi join nest,
> > + this function must not consider key references between tables inside the
> > + semi join nest and those outside of it. The way to control this is to add
> > + the set of outside tables to the @c remaining_tables argument.
> > +
> > @param join pointer to the structure providing all context info
> > for the query
> > @param s the table to be joined by the function
> > @param thd thread for the connection that submitted the query
> > - @param remaining_tables set of tables not included into the partial plan yet
> > + @param remaining_tables set of tables not included in the partial plan yet.
> > + additionally, when calculating a plan for a
> > + materialized semi join nest, this must also contain
> > + all tables outside of the semi join nest.
>
> "Additionally" should be capitalized, and there is several occurences
> of "semi join" here.
Ok.
>
> > @param idx the length of the partial plan
> > @param disable_jbuf TRUE<=> Don't use join buffering
> > @param record_count estimate for the number of records returned by the
> > partial plan
> > - @param pos OUT Table access plan
> > - @param loose_scan_pos OUT Table plan that uses loosescan, or set cost to
> > + @param[out] pos Table access plan
> > + @param[out] loose_scan_pos Table plan that uses loosescan, or set cost to
> > DBL_MAX if not possible.
> >
> > @return
> > @@ -7537,33 +7608,37 @@ best_access_path(JOIN *join,
> >
> >
> > /**
> > - Selects and invokes a search strategy for an optimal query plan.
> > + Selects and invokes a search strategy for an optimal query join order.
> >
> > The function checks user-configurable parameters that control the search
> > strategy for an optimal plan, selects the search method and then invokes
> > it. Each specific optimization procedure stores the final optimal plan in
> > the array 'join->best_positions', and the cost of the plan in
> > 'join->best_read'.
> > + The function can be invoked to produce a plan for all tables in the query
> > + (in this case, the const tables are usually filtered out), or it can be
> > + invoked to produce a plan for a materialization of a semi join nest.
> > + Set a non-NULL emb_sjm_nest pointer when producing a plan for a semi join
> > + nest to be materialized and a NULL pointer when producing a full query plan.
> >
> > - @param join pointer to the structure providing all context info for
> > - the query
> > - @param join_tables set of the tables in the query
> > -
> > - @retval
> > - FALSE ok
> > - @retval
> > - TRUE Fatal error
> > + @return false if successful, true if error
> > */
> >
> > -static bool
> > -choose_plan(JOIN *join, table_map join_tables)
> > +bool Optimize_table_order::choose_table_order()
> > {
> > - uint search_depth= join->thd->variables.optimizer_search_depth;
> > - uint prune_level= join->thd->variables.optimizer_prune_level;
> > - bool straight_join= test(join->select_options& SELECT_STRAIGHT_JOIN);
> > - DBUG_ENTER("choose_plan");
> > + const bool straight_join= test(join->select_options&
> SELECT_STRAIGHT_JOIN);
> > + table_map join_tables;
>
> I would like a comment describing what join_tables represent.
Ok.
>
> > + DBUG_ENTER("Optimize_table_order::choose_table_order");
> > +
> > + /* Are there any tables to optimize? */
> > + if (join->const_tables == join->tables)
> > + {
> > + memcpy(join->best_positions, join->positions,
> > + sizeof(POSITION) * join->const_tables);
> > + join->best_read= 1.0;
> > + DBUG_RETURN(false);
> > + }
> >
> > - join->cur_embedding_map= 0;
> > reset_nj_counters(join->join_list);
> > qsort2_cmp jtab_sort_func;
> >
> > @@ -7573,6 +7648,12 @@ choose_plan(JOIN *join, table_map join_t
> > tables from this semi-join as first
> > */
> > jtab_sort_func= join_tab_cmp_embedded_first;
> > + join_tables= join->all_table_map& ~join->const_table_map;
> > + /*
> > + Inner tables of semi join nest are not allowed to be identified as
> > + const tables.
> > + */
> > + DBUG_ASSERT((join_tables& join->const_table_map) == 0);
>
> I do not think this assert is meaningful. Given the assignement
> above, it should always be true.
It will become meaningful when the second patch is added ;)
>
> > }
> > else
> > {
> > @@ -7585,6 +7666,7 @@ choose_plan(JOIN *join, table_map join_t
> > records accessed.
> > */
> > jtab_sort_func= straight_join ? join_tab_cmp_straight : join_tab_cmp;
> > + join_tables= join->all_table_map& ~join->const_table_map;
>
> There is no need to have the same assignment in both if and else part.
> I suggest initializing join_tables where it is declared and making it
> const.
It is a preparation for the second patch.
>
> > }
> > my_qsort2(join->best_ref + join->const_tables,
> > join->tables - join->const_tables, sizeof(JOIN_TAB*),
>
> ...
>
> > @@ -7980,48 +8056,43 @@ semijoin_order_allows_materialization(co
> > In the future, 'greedy_search' might be extended to support other
> > implementations of 'best_extension', e.g. some simpler quadratic procedure.
> >
> > - @param join pointer to the structure providing all context info
> > - for the query
> > + @par
> > + @c search_depth from Optimize_table_order controls the exhaustiveness
> > + of the search, and @c prune_levelcontrols the pruning heuristics that
>
> Missing space btw prune_level and controls.
>
> > + should be applied during search.
> > +
> > @param remaining_tables set of tables not included into the partial plan yet
> > - @param search_depth controlls the exhaustiveness of the search
> > - @param prune_level the pruning heuristics that should be applied during
> > - search
> >
> > - @retval
> > - FALSE ok
> > - @retval
> > - TRUE Fatal error
> > + @return false if successful, true if error
> > */
> >
> > -static bool
> > -greedy_search(JOIN *join,
> > - table_map remaining_tables,
> > - uint search_depth,
> > - uint prune_level)
> > +bool Optimize_table_order::greedy_search(table_map remaining_tables)
>
> ...
>
> > @@ -8278,8 +8337,8 @@ best_extension_by_limited_search(JOIN
> > table_map allowed_tables= ~(table_map)0;
> > if (join->emb_sjm_nest)
> > allowed_tables= join->emb_sjm_nest->sj_inner_tables;
> > -
> > - bool has_sj= !join->select_lex->sj_nests.is_empty();
> > + const bool has_sj=
> > + !(join->select_lex->sj_nests.is_empty() || join->emb_sjm_nest);
>
> Where did the second part come from? Will this give a change in behavior?
>
> ...
I can document it.
>
> > @@ -13878,10 +13935,10 @@ void optimize_wo_join_buffering(JOIN *jo
> > outer_fanout= 1.0;
> > }
> >
> > - for (i= first_tab; i<= last_tab; i++)
> > + for (uint i= first_tab; i<= last_tab; i++)
> > reopt_remaining_tables |= join->positions[i].table->table->map;
>
> I guess an uint iterator is necessary to avoid having to cast this,
> but why are this passed as uint in the first place? Something we
> should look into changing some day.
Yes, some day :)
>
> ...
>
> --
> Øystein