List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:April 11 2011 7:22am
Subject:Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517
View as plain text  
Hi Øystein,

thank you for reviewing. There are a couple of questions for you below.

On 08.04.11 11.56, Øystein Grøvlen wrote:
> Hi Roy,
>
> I think the patch looks good. By encapsulating this part of the
> optimizer in a separate class, it should be easier to understand what
> belongs together. See in-line for more specific comments.
>
> I also wonder whether you have resolved with Ole-John the issue he
> raised in an earlier email about conflicts with bug#58225 / bug#41740.

We are aware of it, and it will be resolved by one of us having to "adapt". 
Because this is a prerequisite for further semijoin work, I cannot afford to 
wait for the approval of Ole John's fix, unfortunately.
>
> Also, you are introducing a new variant "semi join" that I do not
> think exists in the code comments before. AFAIK, semi is a prefix
> (expect in special cases like the semis of a sporting tournament), and
> it should be either "semi-join" or just "semijoin".

I did it to align the terminology with inner join and outer join, as in "outer 
join nest" and "inner join nest". I am not so troubled by the fact that it is 
not proper English, and there are quite a few references to "semi join" in the 
google space. However, a reference book such as "Database systems - the complete 
book" talks about semijoin operator - but they also refer to outerjoin operators.

Anyway, it is probably better to revert to the term semi-join, which is used in 
a lot of the code, and in 6.0 documentation.
>
> On 04/ 6/11 01:57 PM, Roy Lyseng wrote:
>  > #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
> revid:tor.didriksen@stripped
>  >
>  > 3358 Roy Lyseng 2011-04-06
>  > Bug#11822517: Refactor call interface of choose_plan()
>  >
>  > Part 1 - create class Optimize_table_order.
>  >
>  > This patch creates a class that is used to optimize the join order of
>  > tables in a query. The main purpose of the refactoring is to simplify
>  > function interfaces and to encapsulate information needed to perform
>  > optimization in a dedicated class.
>  >
>  > The following existing functions are added to the class:
>  > choose_table_order() - renamed from choose_plan()
>  > check_interleaving_with_nj()
>  > advance_sj_state()
>  > backout_nj_sj_state()
>  > optimize_straight_join()
>  > greedy_search()
>  > best_extension_by_limited_search()
>  > optimize_wo_join_buffering()
>  > determine_search_depth()
>  >
>  > sql/sql_select.cc
>  > New class definition, see above for details.
>  >
>  > sql/sql_select.h
>  > Added all_table_map (map of tables in query) to JOIN class.
>  > cur_embedding_map has been moved to class Optimize_table_order.
>  > Commented const_table_map, found_const_table_map and outer_join.
>  >
>  > modified:
>  > sql/sql_select.cc
>  > sql/sql_select.h
>  > === modified file 'sql/sql_select.cc'
>  > --- a/sql/sql_select.cc 2011-03-29 07:30:44 +0000
>  > +++ b/sql/sql_select.cc 2011-04-06 11:56:55 +0000
>
> ...
>
>  > @@ -279,6 +260,94 @@ TABLE *create_duplicate_weedout_tmp_tabl
>  > Item_equal *find_item_equal(COND_EQUAL *cond_equal, Field *field,
>  > bool *inherited_fl);
>  >
>  > +/**
>  > + This class determines the optimal join order for tables within
>  > + a basic query block, ie a query specification clause, possibly extended
>  > + with semi joined tables from embedded subqueries.
>  > +
>  > + This class takes as prerequisite a join class where all dependencies among
>  > + tables have been sorted out, all possible access paths have been
>  > + sorted out, and all statistics information has been filled in.
>  > +
>  > + The class has a sole public function that will calculate the most
>  > + optimal plan based on the inputs and the environment, such as prune level
>  > + and greedy optimizer search depth. For more information, see the
>  > + function headers for the private functions greedy_search() and
>  > + best_extension_by_limited_search().
>  > +*/
>  > +
>  > +class Optimize_table_order
>  > +{
>  > +public:
>  > + Optimize_table_order(THD *thd, JOIN *join, TABLE_LIST *sjm_nest)
>
> Using same name for parameters as for member variables is not common
> in C++ (in fact, I thought it was not allowed) and Sun compiler
> complains
>
> "Warning: join hides Optimize_table_order::join."
>
> Since things seems to work, I guess the behavior is a expected, but I
> suggest that you change the name of the parameters.

AFAIU, this is acceptable and intuitive in the initializer list, but not in the 
constructor body. I'd wish the compiler would warn about member hiding, though. 
In this case it is the reference to join in the constructor body that causes the 
problem, and the easy way out is to prefix it with "this->" rather than having 
to invent an argument name prefix or suffix.
>
> You do not need to pass thd as a separate argument, you can get it
> from join->thd. I am not sure it is a good idea to open up for using
> another thd than join->thd.

I am preparing for the bright future when thd is not a member of class JOIN. 
Please advice if this is a mandatory comment.
>
>  > + : search_depth(determine_search_depth(join,
>  > + thd->variables.optimizer_search_depth)),
>  > + prune_level(thd->variables.optimizer_prune_level),
>  > + thd(thd), join(join),
>  > + cur_embedding_map(0)
>  > + {
>  > + join->emb_sjm_nest= sjm_nest;
>  > + }
>  > + ~Optimize_table_order()
>  > + {
>  > + join->emb_sjm_nest= NULL;
>  > + }
>  > + /**
>  > + Entry point to table join order optimization.
>  > + For further description, see class header and private function headers.
>  > +
>  > + @return false if successful, true if error
>  > + */
>  > + bool choose_table_order();
>  > +
>  > +private:
>  > + const uint search_depth; // Maximum search depth to apply in greedy search
>  > + const uint prune_level; // pruning heuristics to be applied
>  > + // (0 = EXHAUSTIVE, 1 = PRUNE_BY_TIME_OR_ROWS)
>  > + THD * const thd; // Pointer to current THD
>  > + JOIN * const join; // Pointer to the current plan being developed
>
> I suggest writing "*const" instead of "* const". Stroustrup calls
> this a "*const" modifier, so I think it should be regarded as a single
> unit.

'*const' wins over '* const' 43 to 18 in trunk, so I guess this is reasonable.
>
>  > + /**
>  > + Bitmap of nested joins embedding the position at the end of the current
>  > + partial join.
>  > + */
>
> I do not understand what "embedding the position at the end" means.

I'll try to rephrase it.
>
>  > + nested_join_map cur_embedding_map;
>  > + /**
>  > + @todo: Add remaining Join optimization state members here,
>  > + ie emb_sjm_nest, positions, cur_sj_inner_tables.
>  > + This is not yet possible because (some of them) are accessed through
>  > + best_access_path(), which is currently not part of this class.
>  > + */
>  > +
>  > + bool check_interleaving_with_nj(JOIN_TAB *next_tab);
>  > + void advance_sj_state(table_map remaining_tables,
>  > + const JOIN_TAB *tab, uint idx,
>  > + double *current_record_count,
>  > + double *current_read_time,
>  > + POSITION *loose_scan_pos);
>  > + void backout_nj_sj_state(const table_map remaining_tables,
>  > + const JOIN_TAB *tab);
>  > + void optimize_straight_join(table_map join_tables);
>  > + bool greedy_search(table_map remaining_tables);
>  > + bool best_extension_by_limited_search(table_map remaining_tables,
>  > + uint idx,
>  > + double record_count,
>  > + double read_time,
>  > + uint current_search_depth);
>  > + void optimize_wo_join_buffering(uint first_tab, uint last_tab,
>  > + table_map last_remaining_tables,
>  > + bool first_alt, uint no_jbuf_before,
>  > + double *reopt_rec_count, double *reopt_cost);
>  > +
>  > + static uint determine_search_depth(JOIN* join, uint search_depth);
>
> I assume this is static in order to be able to call it from the
> initialization list. I think I would prefer, making this a non-static
> member function, remove the join parameter, and moving the
> initialization to the body of the constructor.

It means that search_depth will no longer be const in the class. Is that Ok?
>
>  > + /**
>  > + @todo: Add the static functions join_tab_cmp(), join_tab_cmp_straight() and
>  > + join_tab_cmp_embedded_first() as static private member functions here.
>  > + This is currently not possible because they are fed to my_qsort2(),
>  > + which requires C linkage.
>  > + @todo: Add best_access_path() as private member function within this class.
>  > + Currently not possible because it is called from outside this class.
>
> Why not make it a public member?

Because it is called in a phase of the optimization when there is no natural 
scope of an object of this type. Which means that a new object needs to be 
constructed, and I think we can do better than that in a later refactoring.
>
>  > + */
>  > +};
>  >
>  > /**
>  > This handles SELECT with and without UNION
>  > @@ -1953,8 +2022,7 @@ JOIN::optimize()
>  >
>  > /* Calculate how to do the join */
>  > THD_STAGE_INFO(thd, stage_statistics);
>  > - if (make_join_statistics(this, select_lex->leaf_tables, conds,&keyuse)
> ||
>  > - thd->is_fatal_error)
>  > + if (make_join_statistics(this, select_lex->leaf_tables,
> conds,&keyuse))
>
> The reason for the removable of the is_fatal_error check should be
> mentioned in the commit comments.

Yes, will do.
>
>  > {
>  > DBUG_PRINT("error",("Error: make_join_statistics() failed"));
>  > DBUG_RETURN(1);
>  > @@ -4625,9 +4693,10 @@ make_join_statistics(JOIN *join, TABLE_L
>  > {
>  > int error;
>  > TABLE *table;
>  > + THD * const thd= join->thd;
>
> Another case where I think it should be "*const"
>
>  > TABLE_LIST *tables= tables_arg;
>  > uint i,table_count,const_count,key;
>  > - table_map found_const_table_map, all_table_map, found_ref, refs;
>  > + table_map found_ref, refs;
>  > TABLE **table_vector;
>  > JOIN_TAB *stat,*stat_end,*s,**stat_ref;
>  > Key_use *keyuse, *start_keyuse;
>  > @@ -4636,25 +4705,33 @@ make_join_statistics(JOIN *join, TABLE_L
>  > JOIN_TAB *stat_vector[MAX_TABLES+1];
>  > DBUG_ENTER("make_join_statistics");
>  >
>  > + /*
>  > + Conceptually, this initialization belongs immediately before the call
>  > + to table_order.choose_table_order(), but due to the use of the error label,
>  > + it must be placed before all 'goto error' statements.
>  > + */
>
> Why?

I have re-implemented according to Guilhem's suggestion, so there is no longer 
need for the early initialization.
>
>  > + Optimize_table_order table_order(thd, join, NULL);
>  > +
>  > table_count=join->tables;
>  > - stat= new (join->thd->mem_root) JOIN_TAB[table_count];
>  > - stat_ref=(JOIN_TAB**) join->thd->alloc(sizeof(JOIN_TAB*)*MAX_TABLES);
>  > - table_vector=(TABLE**)
> join->thd->alloc(sizeof(TABLE*)*(table_count*2));
>  > + stat= new (thd->mem_root) JOIN_TAB[table_count];
>  > + stat_ref=(JOIN_TAB**) thd->alloc(sizeof(JOIN_TAB*)*MAX_TABLES);
>  > + table_vector=(TABLE**) thd->alloc(sizeof(TABLE*)*(table_count*2));
>
> Missing space after '='

Ok
>
>  > if (!stat || !stat_ref || !table_vector)
>  > - DBUG_RETURN(1); // Eom /* purecov: inspected */
>  > + DBUG_RETURN(true);
>  >
>  > if (!(join->positions=
>  > - new (join->thd->mem_root) POSITION[table_count+1]))
>  > - DBUG_RETURN(TRUE);
>  > + new (thd->mem_root) POSITION[table_count+1]))
>  > + DBUG_RETURN(true);
>  >
>  > if (!(join->best_positions=
>  > - new (join->thd->mem_root) POSITION[table_count+1]))
>  > - DBUG_RETURN(TRUE);
>  > + new (thd->mem_root) POSITION[table_count+1]))
>  > + DBUG_RETURN(true);
>  >
>  > join->best_ref=stat_vector;
>  >
>  > stat_end=stat+table_count;
>  > - found_const_table_map= all_table_map=0;
>  > + join->found_const_table_map=0;
>
> Likewise.

Ok
>
>  > + join->all_table_map= 0;
>  > const_count=0;
>  >
>  > for (s= stat, i= 0;
>
> ...

Ok.
>
>  > @@ -6945,17 +7008,25 @@ public:
>  > 'join->positions' of length 'idx'. The chosen access method for 's' and its
>  > cost are stored in 'join->positions[idx]'.
>  >
>  > + Notice also that when calculating a plan for a materialized semi join nest,
>  > + this function must not consider key references between tables inside the
>  > + semi join nest and those outside of it. The way to control this is to add
>  > + the set of outside tables to the @c remaining_tables argument.
>  > +
>  > @param join pointer to the structure providing all context info
>  > for the query
>  > @param s the table to be joined by the function
>  > @param thd thread for the connection that submitted the query
>  > - @param remaining_tables set of tables not included into the partial plan yet
>  > + @param remaining_tables set of tables not included in the partial plan yet.
>  > + additionally, when calculating a plan for a
>  > + materialized semi join nest, this must also contain
>  > + all tables outside of the semi join nest.
>
> "Additionally" should be capitalized, and there is several occurences
> of "semi join" here.

Ok.
>
>  > @param idx the length of the partial plan
>  > @param disable_jbuf TRUE<=> Don't use join buffering
>  > @param record_count estimate for the number of records returned by the
>  > partial plan
>  > - @param pos OUT Table access plan
>  > - @param loose_scan_pos OUT Table plan that uses loosescan, or set cost to
>  > + @param[out] pos Table access plan
>  > + @param[out] loose_scan_pos Table plan that uses loosescan, or set cost to
>  > DBL_MAX if not possible.
>  >
>  > @return
>  > @@ -7537,33 +7608,37 @@ best_access_path(JOIN *join,
>  >
>  >
>  > /**
>  > - Selects and invokes a search strategy for an optimal query plan.
>  > + Selects and invokes a search strategy for an optimal query join order.
>  >
>  > The function checks user-configurable parameters that control the search
>  > strategy for an optimal plan, selects the search method and then invokes
>  > it. Each specific optimization procedure stores the final optimal plan in
>  > the array 'join->best_positions', and the cost of the plan in
>  > 'join->best_read'.
>  > + The function can be invoked to produce a plan for all tables in the query
>  > + (in this case, the const tables are usually filtered out), or it can be
>  > + invoked to produce a plan for a materialization of a semi join nest.
>  > + Set a non-NULL emb_sjm_nest pointer when producing a plan for a semi join
>  > + nest to be materialized and a NULL pointer when producing a full query plan.
>  >
>  > - @param join pointer to the structure providing all context info for
>  > - the query
>  > - @param join_tables set of the tables in the query
>  > -
>  > - @retval
>  > - FALSE ok
>  > - @retval
>  > - TRUE Fatal error
>  > + @return false if successful, true if error
>  > */
>  >
>  > -static bool
>  > -choose_plan(JOIN *join, table_map join_tables)
>  > +bool Optimize_table_order::choose_table_order()
>  > {
>  > - uint search_depth= join->thd->variables.optimizer_search_depth;
>  > - uint prune_level= join->thd->variables.optimizer_prune_level;
>  > - bool straight_join= test(join->select_options& SELECT_STRAIGHT_JOIN);
>  > - DBUG_ENTER("choose_plan");
>  > + const bool straight_join= test(join->select_options&
> SELECT_STRAIGHT_JOIN);
>  > + table_map join_tables;
>
> I would like a comment describing what join_tables represent.

Ok.
>
>  > + DBUG_ENTER("Optimize_table_order::choose_table_order");
>  > +
>  > + /* Are there any tables to optimize? */
>  > + if (join->const_tables == join->tables)
>  > + {
>  > + memcpy(join->best_positions, join->positions,
>  > + sizeof(POSITION) * join->const_tables);
>  > + join->best_read= 1.0;
>  > + DBUG_RETURN(false);
>  > + }
>  >
>  > - join->cur_embedding_map= 0;
>  > reset_nj_counters(join->join_list);
>  > qsort2_cmp jtab_sort_func;
>  >
>  > @@ -7573,6 +7648,12 @@ choose_plan(JOIN *join, table_map join_t
>  > tables from this semi-join as first
>  > */
>  > jtab_sort_func= join_tab_cmp_embedded_first;
>  > + join_tables= join->all_table_map& ~join->const_table_map;
>  > + /*
>  > + Inner tables of semi join nest are not allowed to be identified as
>  > + const tables.
>  > + */
>  > + DBUG_ASSERT((join_tables& join->const_table_map) == 0);
>
> I do not think this assert is meaningful. Given the assignement
> above, it should always be true.

It will become meaningful when the second patch is added ;)
>
>  > }
>  > else
>  > {
>  > @@ -7585,6 +7666,7 @@ choose_plan(JOIN *join, table_map join_t
>  > records accessed.
>  > */
>  > jtab_sort_func= straight_join ? join_tab_cmp_straight : join_tab_cmp;
>  > + join_tables= join->all_table_map& ~join->const_table_map;
>
> There is no need to have the same assignment in both if and else part.
> I suggest initializing join_tables where it is declared and making it
> const.

It is a preparation for the second patch.
>
>  > }
>  > my_qsort2(join->best_ref + join->const_tables,
>  > join->tables - join->const_tables, sizeof(JOIN_TAB*),
>
> ...
>
>  > @@ -7980,48 +8056,43 @@ semijoin_order_allows_materialization(co
>  > In the future, 'greedy_search' might be extended to support other
>  > implementations of 'best_extension', e.g. some simpler quadratic procedure.
>  >
>  > - @param join pointer to the structure providing all context info
>  > - for the query
>  > + @par
>  > + @c search_depth from Optimize_table_order controls the exhaustiveness
>  > + of the search, and @c prune_levelcontrols the pruning heuristics that
>
> Missing space btw prune_level and controls.
>
>  > + should be applied during search.
>  > +
>  > @param remaining_tables set of tables not included into the partial plan yet
>  > - @param search_depth controlls the exhaustiveness of the search
>  > - @param prune_level the pruning heuristics that should be applied during
>  > - search
>  >
>  > - @retval
>  > - FALSE ok
>  > - @retval
>  > - TRUE Fatal error
>  > + @return false if successful, true if error
>  > */
>  >
>  > -static bool
>  > -greedy_search(JOIN *join,
>  > - table_map remaining_tables,
>  > - uint search_depth,
>  > - uint prune_level)
>  > +bool Optimize_table_order::greedy_search(table_map remaining_tables)
>
> ...
>
>  > @@ -8278,8 +8337,8 @@ best_extension_by_limited_search(JOIN
>  > table_map allowed_tables= ~(table_map)0;
>  > if (join->emb_sjm_nest)
>  > allowed_tables= join->emb_sjm_nest->sj_inner_tables;
>  > -
>  > - bool has_sj= !join->select_lex->sj_nests.is_empty();
>  > + const bool has_sj=
>  > + !(join->select_lex->sj_nests.is_empty() || join->emb_sjm_nest);
>
> Where did the second part come from? Will this give a change in behavior?
>
> ...

I can document it.
>
>  > @@ -13878,10 +13935,10 @@ void optimize_wo_join_buffering(JOIN *jo
>  > outer_fanout= 1.0;
>  > }
>  >
>  > - for (i= first_tab; i<= last_tab; i++)
>  > + for (uint i= first_tab; i<= last_tab; i++)
>  > reopt_remaining_tables |= join->positions[i].table->table->map;
>
> I guess an uint iterator is necessary to avoid having to cast this,
> but why are this passed as uint in the first place? Something we
> should look into changing some day.

Yes, some day. Here I just follow the current convention of using uint in this 
block of code.

Thanks,
Roy
Thread
bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng6 Apr
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr
      • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen11 Apr
        • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr
          • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen11 Apr
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Guilhem Bichot8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr