List:Commits« Previous MessageNext Message »
From:Øystein Grøvlen Date:April 8 2011 9:56am
Subject:Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517
View as plain text  
Hi Roy,

I think the patch looks good.  By encapsulating this part of the
optimizer in a separate class, it should be easier to understand what
belongs together.  See in-line for more specific comments.

I also wonder whether you have resolved with Ole-John the issue he
raised in an earlier email about conflicts with bug#58225 / bug#41740.

Also, you are introducing a new variant "semi join" that I do not
think exists in the code comments before.  AFAIK, semi is a prefix
(expect in special cases like the semis of a sporting tournament), and
it should be either "semi-join" or just "semijoin".

On 04/ 6/11 01:57 PM, Roy Lyseng wrote:
 > #At file:///home/rl136806/mysql/repo/mysql-work5/ based on 
revid:tor.didriksen@stripped
 >
 >   3358 Roy Lyseng	2011-04-06
 >        Bug#11822517: Refactor call interface of choose_plan()
 >
 >        Part 1 - create class Optimize_table_order.
 >
 >        This patch creates a class that is used to optimize the join 
order of
 >        tables in a query. The main purpose of the refactoring is to 
simplify
 >        function interfaces and to encapsulate information needed to 
perform
 >        optimization in a dedicated class.
 >
 >        The following existing functions are added to the class:
 >          choose_table_order() - renamed from choose_plan()
 >          check_interleaving_with_nj()
 >          advance_sj_state()
 >          backout_nj_sj_state()
 >          optimize_straight_join()
 >          greedy_search()
 >          best_extension_by_limited_search()
 >          optimize_wo_join_buffering()
 >          determine_search_depth()
 >
 >        sql/sql_select.cc
 >          New class definition, see above for details.
 >
 >        sql/sql_select.h
 >          Added all_table_map (map of tables in query) to JOIN class.
 >          cur_embedding_map has been moved to class Optimize_table_order.
 >          Commented const_table_map, found_const_table_map and outer_join.
 >
 >      modified:
 >        sql/sql_select.cc
 >        sql/sql_select.h
 > === modified file 'sql/sql_select.cc'
 > --- a/sql/sql_select.cc	2011-03-29 07:30:44 +0000
 > +++ b/sql/sql_select.cc	2011-04-06 11:56:55 +0000

...

 > @@ -279,6 +260,94 @@ TABLE *create_duplicate_weedout_tmp_tabl
 >   Item_equal *find_item_equal(COND_EQUAL *cond_equal, Field *field,
 >                               bool *inherited_fl);
 >
 > +/**
 > +  This class determines the optimal join order for tables within
 > +  a basic query block, ie a query specification clause, possibly 
extended
 > +  with semi joined tables from embedded subqueries.
 > +
 > +  This class takes as prerequisite a join class where all 
dependencies among
 > +  tables have been sorted out, all possible access paths have been
 > +  sorted out, and all statistics information has been filled in.
 > +
 > +  The class has a sole public function that will calculate the most
 > +  optimal plan based on the inputs and the environment, such as 
prune level
 > +  and greedy optimizer search depth. For more information, see the
 > +  function headers for the private functions greedy_search() and
 > +  best_extension_by_limited_search().
 > +*/
 > +
 > +class Optimize_table_order
 > +{
 > +public:
 > +  Optimize_table_order(THD *thd, JOIN *join, TABLE_LIST *sjm_nest)

Using same name for parameters as for member variables is not common
in C++ (in fact, I thought it was not allowed) and Sun compiler
complains

     "Warning: join hides Optimize_table_order::join."

Since things seems to work, I guess the behavior is a expected, but I
suggest that you change the name of the parameters.

You do not need to pass thd as a separate argument, you can get it
from join->thd.  I am not sure it is a good idea to open up for using
another thd than join->thd.

 > +  : search_depth(determine_search_depth(join,
 > + 
thd->variables.optimizer_search_depth)),
 > +    prune_level(thd->variables.optimizer_prune_level),
 > +    thd(thd), join(join),
 > +    cur_embedding_map(0)
 > +  {
 > +    join->emb_sjm_nest= sjm_nest;
 > +  }
 > +  ~Optimize_table_order()
 > +  {
 > +    join->emb_sjm_nest= NULL;
 > +  }
 > +  /**
 > +    Entry point to table join order optimization.
 > +    For further description, see class header and private function 
headers.
 > +
 > +    @return false if successful, true if error
 > +  */
 > +  bool choose_table_order();
 > +
 > +private:
 > +  const uint search_depth;   // Maximum search depth to apply in 
greedy search
 > +  const uint prune_level;    // pruning heuristics to be applied
 > +                             // (0 = EXHAUSTIVE, 1 = 
PRUNE_BY_TIME_OR_ROWS)
 > +  THD * const thd;           // Pointer to current THD
 > +  JOIN * const join;         // Pointer to the current plan being 
developed

I suggest writing "*const" instead of "* const".  Stroustrup calls
this a "*const" modifier, so I think it should be regarded as a single
unit.

 > +  /**
 > +    Bitmap of nested joins embedding the position at the end of the 
current
 > +    partial join.
 > +  */

I do not understand what "embedding the position at the end" means.

 > +  nested_join_map cur_embedding_map;
 > +  /**
 > +    @todo: Add remaining Join optimization state members here,
 > +    ie emb_sjm_nest, positions, cur_sj_inner_tables.
 > +    This is not yet possible because (some of them) are accessed through
 > +    best_access_path(), which is currently not part of this class.
 > +  */
 > +
 > +  bool check_interleaving_with_nj(JOIN_TAB *next_tab);
 > +  void advance_sj_state(table_map remaining_tables,
 > +                        const JOIN_TAB *tab, uint idx,
 > +                        double *current_record_count,
 > +                        double *current_read_time,
 > +                        POSITION *loose_scan_pos);
 > +  void backout_nj_sj_state(const table_map remaining_tables,
 > +                           const JOIN_TAB *tab);
 > +  void optimize_straight_join(table_map join_tables);
 > +  bool greedy_search(table_map remaining_tables);
 > +  bool best_extension_by_limited_search(table_map remaining_tables,
 > +                                        uint idx,
 > +                                        double record_count,
 > +                                        double read_time,
 > +                                        uint current_search_depth);
 > +  void optimize_wo_join_buffering(uint first_tab, uint last_tab,
 > +                                  table_map last_remaining_tables,
 > +                                  bool first_alt, uint no_jbuf_before,
 > +                                  double *reopt_rec_count, double 
*reopt_cost);
 > +
 > +  static uint determine_search_depth(JOIN* join, uint search_depth);

I assume this is static in order to be able to call it from the
initialization list.  I think I would prefer, making this a non-static
member function, remove the join parameter, and moving the
initialization to the body of the constructor.

 > +  /**
 > +    @todo: Add the static functions join_tab_cmp(), 
join_tab_cmp_straight() and
 > +    join_tab_cmp_embedded_first() as static private member functions 
here.
 > +    This is currently not possible because they are fed to my_qsort2(),
 > +    which requires C linkage.
 > +    @todo: Add best_access_path() as private member function within 
this class.
 > +    Currently not possible because it is called from outside this class.

Why not make it a public member?

 > +  */
 > +};
 >
 >   /**
 >     This handles SELECT with and without UNION
 > @@ -1953,8 +2022,7 @@ JOIN::optimize()
 >
 >     /* Calculate how to do the join */
 >     THD_STAGE_INFO(thd, stage_statistics);
 > -  if (make_join_statistics(this, select_lex->leaf_tables, 
conds,&keyuse) ||
 > -      thd->is_fatal_error)
 > +  if (make_join_statistics(this, select_lex->leaf_tables, 
conds,&keyuse))

The reason for the removable of the is_fatal_error check should be
mentioned in the commit comments.

 >     {
 >       DBUG_PRINT("error",("Error: make_join_statistics() failed"));
 >       DBUG_RETURN(1);
 > @@ -4625,9 +4693,10 @@ make_join_statistics(JOIN *join, TABLE_L
 >   {
 >     int error;
 >     TABLE *table;
 > +  THD * const thd= join->thd;

Another case where I think it should be "*const"

 >     TABLE_LIST *tables= tables_arg;
 >     uint i,table_count,const_count,key;
 > -  table_map found_const_table_map, all_table_map, found_ref, refs;
 > +  table_map found_ref, refs;
 >     TABLE **table_vector;
 >     JOIN_TAB *stat,*stat_end,*s,**stat_ref;
 >     Key_use *keyuse, *start_keyuse;
 > @@ -4636,25 +4705,33 @@ make_join_statistics(JOIN *join, TABLE_L
 >     JOIN_TAB *stat_vector[MAX_TABLES+1];
 >     DBUG_ENTER("make_join_statistics");
 >
 > +  /*
 > +    Conceptually, this initialization belongs immediately before the 
call
 > +    to table_order.choose_table_order(), but due to the use of the 
error label,
 > +    it must be placed before all 'goto error' statements.
 > +  */

Why?

 > +  Optimize_table_order table_order(thd, join, NULL);
 > +
 >     table_count=join->tables;
 > -  stat= new (join->thd->mem_root) JOIN_TAB[table_count];
 > -  stat_ref=(JOIN_TAB**) join->thd->alloc(sizeof(JOIN_TAB*)*MAX_TABLES);
 > -  table_vector=(TABLE**) 
join->thd->alloc(sizeof(TABLE*)*(table_count*2));
 > +  stat= new (thd->mem_root) JOIN_TAB[table_count];
 > +  stat_ref=(JOIN_TAB**) thd->alloc(sizeof(JOIN_TAB*)*MAX_TABLES);
 > +  table_vector=(TABLE**) thd->alloc(sizeof(TABLE*)*(table_count*2));

Missing space after '='

 >     if (!stat || !stat_ref || !table_vector)
 > -    DBUG_RETURN(1);				// Eom /* purecov: inspected */
 > +    DBUG_RETURN(true);
 >
 >     if (!(join->positions=
 > -        new (join->thd->mem_root) POSITION[table_count+1]))
 > -    DBUG_RETURN(TRUE);
 > +        new (thd->mem_root) POSITION[table_count+1]))
 > +    DBUG_RETURN(true);
 >
 >     if (!(join->best_positions=
 > -        new (join->thd->mem_root) POSITION[table_count+1]))
 > -    DBUG_RETURN(TRUE);
 > +        new (thd->mem_root) POSITION[table_count+1]))
 > +    DBUG_RETURN(true);
 >
 >     join->best_ref=stat_vector;
 >
 >     stat_end=stat+table_count;
 > -  found_const_table_map= all_table_map=0;
 > +  join->found_const_table_map=0;

Likewise.

 > +  join->all_table_map= 0;
 >     const_count=0;
 >
 >     for (s= stat, i= 0;

...

 > @@ -6945,17 +7008,25 @@ public:
 >     'join->positions' of length 'idx'. The chosen access method for 
's' and its
 >     cost are stored in 'join->positions[idx]'.
 >
 > +  Notice also that when calculating a plan for a materialized semi 
join nest,
 > +  this function must not consider key references between tables 
inside the
 > +  semi join nest and those outside of it. The way to control this is 
to add
 > +  the set of outside tables to the @c remaining_tables argument.
 > +
 >     @param join             pointer to the structure providing all 
context info
 >                             for the query
 >     @param s                the table to be joined by the function
 >     @param thd              thread for the connection that submitted 
the query
 > -  @param remaining_tables set of tables not included into the 
partial plan yet
 > +  @param remaining_tables set of tables not included in the partial 
plan yet.
 > +                          additionally, when calculating a plan for a
 > +                          materialized semi join nest, this must 
also contain
 > +                          all tables outside of the semi join nest.

"Additionally" should be capitalized, and there is several occurences
of "semi join" here.

 >     @param idx              the length of the partial plan
 >     @param disable_jbuf     TRUE<=>  Don't use join buffering
 >     @param record_count     estimate for the number of records 
returned by the
 >                             partial plan
 > -  @param pos              OUT Table access plan
 > -  @param loose_scan_pos   OUT Table plan that uses loosescan, or set 
cost to
 > +  @param[out] pos         Table access plan
 > +  @param[out] loose_scan_pos  Table plan that uses loosescan, or set 
cost to
 >                                 DBL_MAX if not possible.
 >
 >     @return
 > @@ -7537,33 +7608,37 @@ best_access_path(JOIN      *join,
 >
 >
 >   /**
 > -  Selects and invokes a search strategy for an optimal query plan.
 > +  Selects and invokes a search strategy for an optimal query join order.
 >
 >     The function checks user-configurable parameters that control the 
search
 >     strategy for an optimal plan, selects the search method and then 
invokes
 >     it. Each specific optimization procedure stores the final optimal 
plan in
 >     the array 'join->best_positions', and the cost of the plan in
 >     'join->best_read'.
 > +  The function can be invoked to produce a plan for all tables in 
the query
 > +  (in this case, the const tables are usually filtered out), or it 
can be
 > +  invoked to produce a plan for a materialization of a semi join nest.
 > +  Set a non-NULL emb_sjm_nest pointer when producing a plan for a 
semi join
 > +  nest to be materialized and a NULL pointer when producing a full 
query plan.
 >
 > -  @param join         pointer to the structure providing all context 
info for
 > -                      the query
 > -  @param join_tables  set of the tables in the query
 > -
 > -  @retval
 > -    FALSE       ok
 > -  @retval
 > -    TRUE        Fatal error
 > +  @return false if successful, true if error
 >   */
 >
 > -static bool
 > -choose_plan(JOIN *join, table_map join_tables)
 > +bool Optimize_table_order::choose_table_order()
 >   {
 > -  uint search_depth= join->thd->variables.optimizer_search_depth;
 > -  uint prune_level=  join->thd->variables.optimizer_prune_level;
 > -  bool straight_join= test(join->select_options&  SELECT_STRAIGHT_JOIN);
 > -  DBUG_ENTER("choose_plan");
 > +  const bool straight_join= test(join->select_options& 
SELECT_STRAIGHT_JOIN);
 > +  table_map join_tables;

I would like a comment describing what join_tables represent.

 > +  DBUG_ENTER("Optimize_table_order::choose_table_order");
 > +
 > +  /* Are there any tables to optimize? */
 > +  if (join->const_tables == join->tables)
 > +  {
 > +    memcpy(join->best_positions, join->positions,
 > +	   sizeof(POSITION) * join->const_tables);
 > +    join->best_read= 1.0;
 > +    DBUG_RETURN(false);
 > +  }
 >
 > -  join->cur_embedding_map= 0;
 >     reset_nj_counters(join->join_list);
 >     qsort2_cmp jtab_sort_func;
 >
 > @@ -7573,6 +7648,12 @@ choose_plan(JOIN *join, table_map join_t
 >          tables from this semi-join as first
 >       */
 >       jtab_sort_func= join_tab_cmp_embedded_first;
 > +    join_tables= join->all_table_map&  ~join->const_table_map;
 > +    /*
 > +      Inner tables of semi join nest are not allowed to be identified as
 > +      const tables.
 > +    */
 > +    DBUG_ASSERT((join_tables&  join->const_table_map) == 0);

I do not think this assert is meaningful.  Given the assignement
above, it should always be true.

 >     }
 >     else
 >     {
 > @@ -7585,6 +7666,7 @@ choose_plan(JOIN *join, table_map join_t
 >           records accessed.
 >       */
 >       jtab_sort_func= straight_join ? join_tab_cmp_straight : 
join_tab_cmp;
 > +    join_tables= join->all_table_map&  ~join->const_table_map;

There is no need to have the same assignment in both if and else part.
I suggest initializing join_tables where it is declared and making it
const.

 >     }
 >     my_qsort2(join->best_ref + join->const_tables,
 >               join->tables - join->const_tables, sizeof(JOIN_TAB*),

...

 > @@ -7980,48 +8056,43 @@ semijoin_order_allows_materialization(co
 >       In the future, 'greedy_search' might be extended to support other
 >       implementations of 'best_extension', e.g. some simpler 
quadratic procedure.
 >
 > -  @param join             pointer to the structure providing all 
context info
 > -                          for the query
 > +  @par
 > +    @c search_depth from Optimize_table_order controls the 
exhaustiveness
 > +    of the search, and @c prune_levelcontrols the pruning heuristics 
that

Missing space btw prune_level and controls.

 > +    should be applied during search.
 > +
 >     @param remaining_tables set of tables not included into the 
partial plan yet
 > -  @param search_depth     controlls the exhaustiveness of the search
 > -  @param prune_level      the pruning heuristics that should be 
applied during
 > -                          search
 >
 > -  @retval
 > -    FALSE       ok
 > -  @retval
 > -    TRUE        Fatal error
 > +  @return false if successful, true if error
 >   */
 >
 > -static bool
 > -greedy_search(JOIN      *join,
 > -              table_map remaining_tables,
 > -              uint      search_depth,
 > -              uint      prune_level)
 > +bool Optimize_table_order::greedy_search(table_map remaining_tables)

...

 > @@ -8278,8 +8337,8 @@ best_extension_by_limited_search(JOIN
 >     table_map allowed_tables= ~(table_map)0;
 >     if (join->emb_sjm_nest)
 >       allowed_tables= join->emb_sjm_nest->sj_inner_tables;
 > -
 > -  bool has_sj= !join->select_lex->sj_nests.is_empty();
 > +  const bool has_sj=
 > +    !(join->select_lex->sj_nests.is_empty() || join->emb_sjm_nest);

Where did the second part come from?  Will this give a change in behavior?

...

 > @@ -13878,10 +13935,10 @@ void optimize_wo_join_buffering(JOIN *jo
 >       outer_fanout= 1.0;
 >     }
 >
 > -  for (i= first_tab; i<= last_tab; i++)
 > +  for (uint i= first_tab; i<= last_tab; i++)
 >       reopt_remaining_tables |= join->positions[i].table->table->map;

I guess an uint iterator is necessary to avoid having to cast this,
but why are this passed as uint in the first place?  Something we
should look into changing some day.

...

--
Øystein
Thread
bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng6 Apr
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr
      • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen11 Apr
        • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr
          • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen11 Apr
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Guilhem Bichot8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr