List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:April 8 2011 2:11pm
Subject:Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517
View as plain text  
On 08.04.11 11.56, Øystein Grøvlen wrote:
> Hi Roy,
>
> I think the patch looks good. By encapsulating this part of the
> optimizer in a separate class, it should be easier to understand what
> belongs together. See in-line for more specific comments.
>
> I also wonder whether you have resolved with Ole-John the issue he
> raised in an earlier email about conflicts with bug#58225 / bug#41740.

We are aware of it, and it will be resolved by one of us having to "adapt". 
Because this is a prerequisite for further semijoin work, I cannot afford to 
wait for the approval of Ole John's fix, unfortunately.
>
> Also, you are introducing a new variant "semi join" that I do not
> think exists in the code comments before. AFAIK, semi is a prefix
> (expect in special cases like the semis of a sporting tournament), and
> it should be either "semi-join" or just "semijoin".

I did it to align the terminology with inner join and outer join, as in "outer 
join nest" and "inner join nest". I am not so troubled by the fact that it is 
not proper English, and there are quite a few references to "semi join" in the 
google space. However, a reference book such as "Database systems - the complete 
book" talks about semijoin operator - but they also refer to outer join operators.

Anyway, it is probably better to revert to the term semijoin.
>
> On 04/ 6/11 01:57 PM, Roy Lyseng wrote:
>  > #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
> revid:tor.didriksen@stripped
>  >
>  > 3358 Roy Lyseng 2011-04-06
>  > Bug#11822517: Refactor call interface of choose_plan()
>  >
>  > Part 1 - create class Optimize_table_order.
>  >
>  > This patch creates a class that is used to optimize the join order of
>  > tables in a query. The main purpose of the refactoring is to simplify
>  > function interfaces and to encapsulate information needed to perform
>  > optimization in a dedicated class.
>  >
>  > The following existing functions are added to the class:
>  > choose_table_order() - renamed from choose_plan()
>  > check_interleaving_with_nj()
>  > advance_sj_state()
>  > backout_nj_sj_state()
>  > optimize_straight_join()
>  > greedy_search()
>  > best_extension_by_limited_search()
>  > optimize_wo_join_buffering()
>  > determine_search_depth()
>  >
>  > sql/sql_select.cc
>  > New class definition, see above for details.
>  >
>  > sql/sql_select.h
>  > Added all_table_map (map of tables in query) to JOIN class.
>  > cur_embedding_map has been moved to class Optimize_table_order.
>  > Commented const_table_map, found_const_table_map and outer_join.
>  >
>  > modified:
>  > sql/sql_select.cc
>  > sql/sql_select.h
>  > === modified file 'sql/sql_select.cc'
>  > --- a/sql/sql_select.cc 2011-03-29 07:30:44 +0000
>  > +++ b/sql/sql_select.cc 2011-04-06 11:56:55 +0000
>
> ...
>
>  > @@ -279,6 +260,94 @@ TABLE *create_duplicate_weedout_tmp_tabl
>  > Item_equal *find_item_equal(COND_EQUAL *cond_equal, Field *field,
>  > bool *inherited_fl);
>  >
>  > +/**
>  > + This class determines the optimal join order for tables within
>  > + a basic query block, ie a query specification clause, possibly extended
>  > + with semi joined tables from embedded subqueries.
>  > +
>  > + This class takes as prerequisite a join class where all dependencies among
>  > + tables have been sorted out, all possible access paths have been
>  > + sorted out, and all statistics information has been filled in.
>  > +
>  > + The class has a sole public function that will calculate the most
>  > + optimal plan based on the inputs and the environment, such as prune level
>  > + and greedy optimizer search depth. For more information, see the
>  > + function headers for the private functions greedy_search() and
>  > + best_extension_by_limited_search().
>  > +*/
>  > +
>  > +class Optimize_table_order
>  > +{
>  > +public:
>  > + Optimize_table_order(THD *thd, JOIN *join, TABLE_LIST *sjm_nest)
>
> Using same name for parameters as for member variables is not common
> in C++ (in fact, I thought it was not allowed) and Sun compiler
> complains
>
> "Warning: join hides Optimize_table_order::join."
>
> Since things seems to work, I guess the behavior is a expected, but I
> suggest that you change the name of the parameters.

AFAIU, this is acceptable and intuitive in the initializer list, but not in the 
constructor body. I'd wish
>
> You do not need to pass thd as a separate argument, you can get it
> from join->thd. I am not sure it is a good idea to open up for using
> another thd than join->thd.

I am preparing for the bright future when thd is not a member of class JOIN. 
Please advice if this is a mandatory comment.
>
>  > + : search_depth(determine_search_depth(join,
>  > + thd->variables.optimizer_search_depth)),
>  > + prune_level(thd->variables.optimizer_prune_level),
>  > + thd(thd), join(join),
>  > + cur_embedding_map(0)
>  > + {
>  > + join->emb_sjm_nest= sjm_nest;
>  > + }
>  > + ~Optimize_table_order()
>  > + {
>  > + join->emb_sjm_nest= NULL;
>  > + }
>  > + /**
>  > + Entry point to table join order optimization.
>  > + For further description, see class header and private function headers.
>  > +
>  > + @return false if successful, true if error
>  > + */
>  > + bool choose_table_order();
>  > +
>  > +private:
>  > + const uint search_depth; // Maximum search depth to apply in greedy search
>  > + const uint prune_level; // pruning heuristics to be applied
>  > + // (0 = EXHAUSTIVE, 1 = PRUNE_BY_TIME_OR_ROWS)
>  > + THD * const thd; // Pointer to current THD
>  > + JOIN * const join; // Pointer to the current plan being developed
>
> I suggest writing "*const" instead of "* const". Stroustrup calls
> this a "*const" modifier, so I think it should be regarded as a single
> unit.

'*const' wins over '* const' 43 to 18 in trunk, so I guess this is reasonable.
>
>  > + /**
>  > + Bitmap of nested joins embedding the position at the end of the current
>  > + partial join.
>  > + */
>
> I do not understand what "embedding the position at the end" means.

Neither do I... I'll try to rephrase it.
>
>  > + nested_join_map cur_embedding_map;
>  > + /**
>  > + @todo: Add remaining Join optimization state members here,
>  > + ie emb_sjm_nest, positions, cur_sj_inner_tables.
>  > + This is not yet possible because (some of them) are accessed through
>  > + best_access_path(), which is currently not part of this class.
>  > + */
>  > +
>  > + bool check_interleaving_with_nj(JOIN_TAB *next_tab);
>  > + void advance_sj_state(table_map remaining_tables,
>  > + const JOIN_TAB *tab, uint idx,
>  > + double *current_record_count,
>  > + double *current_read_time,
>  > + POSITION *loose_scan_pos);
>  > + void backout_nj_sj_state(const table_map remaining_tables,
>  > + const JOIN_TAB *tab);
>  > + void optimize_straight_join(table_map join_tables);
>  > + bool greedy_search(table_map remaining_tables);
>  > + bool best_extension_by_limited_search(table_map remaining_tables,
>  > + uint idx,
>  > + double record_count,
>  > + double read_time,
>  > + uint current_search_depth);
>  > + void optimize_wo_join_buffering(uint first_tab, uint last_tab,
>  > + table_map last_remaining_tables,
>  > + bool first_alt, uint no_jbuf_before,
>  > + double *reopt_rec_count, double *reopt_cost);
>  > +
>  > + static uint determine_search_depth(JOIN* join, uint search_depth);
>
> I assume this is static in order to be able to call it from the
> initialization list. I think I would prefer, making this a non-static
> member function, remove the join parameter, and moving the
> initialization to the body of the constructor.
>
>  > + /**
>  > + @todo: Add the static functions join_tab_cmp(), join_tab_cmp_straight() and
>  > + join_tab_cmp_embedded_first() as static private member functions here.
>  > + This is currently not possible because they are fed to my_qsort2(),
>  > + which requires C linkage.
>  > + @todo: Add best_access_path() as private member function within this class.
>  > + Currently not possible because it is called from outside this class.
>
> Why not make it a public member?

Because it is called in a phase of the optimization when there is no natural 
scope of an object of this type. Which means that a new object needs to be 
constructed, and that is not natural to me.
>
>  > + */
>  > +};
>  >
>  > /**
>  > This handles SELECT with and without UNION
>  > @@ -1953,8 +2022,7 @@ JOIN::optimize()
>  >
>  > /* Calculate how to do the join */
>  > THD_STAGE_INFO(thd, stage_statistics);
>  > - if (make_join_statistics(this, select_lex->leaf_tables, conds,&keyuse)
> ||
>  > - thd->is_fatal_error)
>  > + if (make_join_statistics(this, select_lex->leaf_tables,
> conds,&keyuse))
>
> The reason for the removable of the is_fatal_error check should be
> mentioned in the commit comments.

Yes, will do.
>
>  > {
>  > DBUG_PRINT("error",("Error: make_join_statistics() failed"));
>  > DBUG_RETURN(1);
>  > @@ -4625,9 +4693,10 @@ make_join_statistics(JOIN *join, TABLE_L
>  > {
>  > int error;
>  > TABLE *table;
>  > + THD * const thd= join->thd;
>
> Another case where I think it should be "*const"
>
>  > TABLE_LIST *tables= tables_arg;
>  > uint i,table_count,const_count,key;
>  > - table_map found_const_table_map, all_table_map, found_ref, refs;
>  > + table_map found_ref, refs;
>  > TABLE **table_vector;
>  > JOIN_TAB *stat,*stat_end,*s,**stat_ref;
>  > Key_use *keyuse, *start_keyuse;
>  > @@ -4636,25 +4705,33 @@ make_join_statistics(JOIN *join, TABLE_L
>  > JOIN_TAB *stat_vector[MAX_TABLES+1];
>  > DBUG_ENTER("make_join_statistics");
>  >
>  > + /*
>  > + Conceptually, this initialization belongs immediately before the call
>  > + to table_order.choose_table_order(), but due to the use of the error label,
>  > + it must be placed before all 'goto error' statements.
>  > + */
>
> Why?

Because th compiler protested...
>
>  > + Optimize_table_order table_order(thd, join, NULL);
>  > +
>  > table_count=join->tables;
>  > - stat= new (join->thd->mem_root) JOIN_TAB[table_count];
>  > - stat_ref=(JOIN_TAB**) join->thd->alloc(sizeof(JOIN_TAB*)*MAX_TABLES);
>  > - table_vector=(TABLE**)
> join->thd->alloc(sizeof(TABLE*)*(table_count*2));
>  > + stat= new (thd->mem_root) JOIN_TAB[table_count];
>  > + stat_ref=(JOIN_TAB**) thd->alloc(sizeof(JOIN_TAB*)*MAX_TABLES);
>  > + table_vector=(TABLE**) thd->alloc(sizeof(TABLE*)*(table_count*2));
>
> Missing space after '='

Ok
>
>  > if (!stat || !stat_ref || !table_vector)
>  > - DBUG_RETURN(1); // Eom /* purecov: inspected */
>  > + DBUG_RETURN(true);
>  >
>  > if (!(join->positions=
>  > - new (join->thd->mem_root) POSITION[table_count+1]))
>  > - DBUG_RETURN(TRUE);
>  > + new (thd->mem_root) POSITION[table_count+1]))
>  > + DBUG_RETURN(true);
>  >
>  > if (!(join->best_positions=
>  > - new (join->thd->mem_root) POSITION[table_count+1]))
>  > - DBUG_RETURN(TRUE);
>  > + new (thd->mem_root) POSITION[table_count+1]))
>  > + DBUG_RETURN(true);
>  >
>  > join->best_ref=stat_vector;
>  >
>  > stat_end=stat+table_count;
>  > - found_const_table_map= all_table_map=0;
>  > + join->found_const_table_map=0;
>
> Likewise.

Ok
>
>  > + join->all_table_map= 0;
>  > const_count=0;
>  >
>  > for (s= stat, i= 0;
>
> ...

Ok.
>
>  > @@ -6945,17 +7008,25 @@ public:
>  > 'join->positions' of length 'idx'. The chosen access method for 's' and its
>  > cost are stored in 'join->positions[idx]'.
>  >
>  > + Notice also that when calculating a plan for a materialized semi join nest,
>  > + this function must not consider key references between tables inside the
>  > + semi join nest and those outside of it. The way to control this is to add
>  > + the set of outside tables to the @c remaining_tables argument.
>  > +
>  > @param join pointer to the structure providing all context info
>  > for the query
>  > @param s the table to be joined by the function
>  > @param thd thread for the connection that submitted the query
>  > - @param remaining_tables set of tables not included into the partial plan yet
>  > + @param remaining_tables set of tables not included in the partial plan yet.
>  > + additionally, when calculating a plan for a
>  > + materialized semi join nest, this must also contain
>  > + all tables outside of the semi join nest.
>
> "Additionally" should be capitalized, and there is several occurences
> of "semi join" here.

Ok.
>
>  > @param idx the length of the partial plan
>  > @param disable_jbuf TRUE<=> Don't use join buffering
>  > @param record_count estimate for the number of records returned by the
>  > partial plan
>  > - @param pos OUT Table access plan
>  > - @param loose_scan_pos OUT Table plan that uses loosescan, or set cost to
>  > + @param[out] pos Table access plan
>  > + @param[out] loose_scan_pos Table plan that uses loosescan, or set cost to
>  > DBL_MAX if not possible.
>  >
>  > @return
>  > @@ -7537,33 +7608,37 @@ best_access_path(JOIN *join,
>  >
>  >
>  > /**
>  > - Selects and invokes a search strategy for an optimal query plan.
>  > + Selects and invokes a search strategy for an optimal query join order.
>  >
>  > The function checks user-configurable parameters that control the search
>  > strategy for an optimal plan, selects the search method and then invokes
>  > it. Each specific optimization procedure stores the final optimal plan in
>  > the array 'join->best_positions', and the cost of the plan in
>  > 'join->best_read'.
>  > + The function can be invoked to produce a plan for all tables in the query
>  > + (in this case, the const tables are usually filtered out), or it can be
>  > + invoked to produce a plan for a materialization of a semi join nest.
>  > + Set a non-NULL emb_sjm_nest pointer when producing a plan for a semi join
>  > + nest to be materialized and a NULL pointer when producing a full query plan.
>  >
>  > - @param join pointer to the structure providing all context info for
>  > - the query
>  > - @param join_tables set of the tables in the query
>  > -
>  > - @retval
>  > - FALSE ok
>  > - @retval
>  > - TRUE Fatal error
>  > + @return false if successful, true if error
>  > */
>  >
>  > -static bool
>  > -choose_plan(JOIN *join, table_map join_tables)
>  > +bool Optimize_table_order::choose_table_order()
>  > {
>  > - uint search_depth= join->thd->variables.optimizer_search_depth;
>  > - uint prune_level= join->thd->variables.optimizer_prune_level;
>  > - bool straight_join= test(join->select_options& SELECT_STRAIGHT_JOIN);
>  > - DBUG_ENTER("choose_plan");
>  > + const bool straight_join= test(join->select_options&
> SELECT_STRAIGHT_JOIN);
>  > + table_map join_tables;
>
> I would like a comment describing what join_tables represent.

Ok.
>
>  > + DBUG_ENTER("Optimize_table_order::choose_table_order");
>  > +
>  > + /* Are there any tables to optimize? */
>  > + if (join->const_tables == join->tables)
>  > + {
>  > + memcpy(join->best_positions, join->positions,
>  > + sizeof(POSITION) * join->const_tables);
>  > + join->best_read= 1.0;
>  > + DBUG_RETURN(false);
>  > + }
>  >
>  > - join->cur_embedding_map= 0;
>  > reset_nj_counters(join->join_list);
>  > qsort2_cmp jtab_sort_func;
>  >
>  > @@ -7573,6 +7648,12 @@ choose_plan(JOIN *join, table_map join_t
>  > tables from this semi-join as first
>  > */
>  > jtab_sort_func= join_tab_cmp_embedded_first;
>  > + join_tables= join->all_table_map& ~join->const_table_map;
>  > + /*
>  > + Inner tables of semi join nest are not allowed to be identified as
>  > + const tables.
>  > + */
>  > + DBUG_ASSERT((join_tables& join->const_table_map) == 0);
>
> I do not think this assert is meaningful. Given the assignement
> above, it should always be true.

It will become meaningful when the second patch is added ;)
>
>  > }
>  > else
>  > {
>  > @@ -7585,6 +7666,7 @@ choose_plan(JOIN *join, table_map join_t
>  > records accessed.
>  > */
>  > jtab_sort_func= straight_join ? join_tab_cmp_straight : join_tab_cmp;
>  > + join_tables= join->all_table_map& ~join->const_table_map;
>
> There is no need to have the same assignment in both if and else part.
> I suggest initializing join_tables where it is declared and making it
> const.

It is a preparation for the second patch.
>
>  > }
>  > my_qsort2(join->best_ref + join->const_tables,
>  > join->tables - join->const_tables, sizeof(JOIN_TAB*),
>
> ...
>
>  > @@ -7980,48 +8056,43 @@ semijoin_order_allows_materialization(co
>  > In the future, 'greedy_search' might be extended to support other
>  > implementations of 'best_extension', e.g. some simpler quadratic procedure.
>  >
>  > - @param join pointer to the structure providing all context info
>  > - for the query
>  > + @par
>  > + @c search_depth from Optimize_table_order controls the exhaustiveness
>  > + of the search, and @c prune_levelcontrols the pruning heuristics that
>
> Missing space btw prune_level and controls.
>
>  > + should be applied during search.
>  > +
>  > @param remaining_tables set of tables not included into the partial plan yet
>  > - @param search_depth controlls the exhaustiveness of the search
>  > - @param prune_level the pruning heuristics that should be applied during
>  > - search
>  >
>  > - @retval
>  > - FALSE ok
>  > - @retval
>  > - TRUE Fatal error
>  > + @return false if successful, true if error
>  > */
>  >
>  > -static bool
>  > -greedy_search(JOIN *join,
>  > - table_map remaining_tables,
>  > - uint search_depth,
>  > - uint prune_level)
>  > +bool Optimize_table_order::greedy_search(table_map remaining_tables)
>
> ...
>
>  > @@ -8278,8 +8337,8 @@ best_extension_by_limited_search(JOIN
>  > table_map allowed_tables= ~(table_map)0;
>  > if (join->emb_sjm_nest)
>  > allowed_tables= join->emb_sjm_nest->sj_inner_tables;
>  > -
>  > - bool has_sj= !join->select_lex->sj_nests.is_empty();
>  > + const bool has_sj=
>  > + !(join->select_lex->sj_nests.is_empty() || join->emb_sjm_nest);
>
> Where did the second part come from? Will this give a change in behavior?
>
> ...

I can document it.
>
>  > @@ -13878,10 +13935,10 @@ void optimize_wo_join_buffering(JOIN *jo
>  > outer_fanout= 1.0;
>  > }
>  >
>  > - for (i= first_tab; i<= last_tab; i++)
>  > + for (uint i= first_tab; i<= last_tab; i++)
>  > reopt_remaining_tables |= join->positions[i].table->table->map;
>
> I guess an uint iterator is necessary to avoid having to cast this,
> but why are this passed as uint in the first place? Something we
> should look into changing some day.

Yes, some day :)
>
> ...
>
> --
> Øystein

Thread
bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng6 Apr
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr
      • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen11 Apr
        • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr
          • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen11 Apr
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Guilhem Bichot8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr