List:Commits« Previous MessageNext Message »
From:Øystein Grøvlen Date:April 11 2011 8:05am
Subject:Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517
View as plain text  
On 04/11/11 09:22 AM, Roy Lyseng wrote:
> Hi Øystein,
>
> thank you for reviewing. There are a couple of questions for you below.

Thanks for your replies. Looks good to me.  I will approve the patch. 
See in-line for some more comments.

>
> On 08.04.11 11.56, Øystein Grøvlen wrote:
>> Hi Roy,
>>
>> I think the patch looks good. By encapsulating this part of the
>> optimizer in a separate class, it should be easier to understand what
>> belongs together. See in-line for more specific comments.
>>
>> I also wonder whether you have resolved with Ole-John the issue he
>> raised in an earlier email about conflicts with bug#58225 / bug#41740.
>
> We are aware of it, and it will be resolved by one of us having to
> "adapt". Because this is a prerequisite for further semijoin work, I
> cannot afford to wait for the approval of Ole John's fix, unfortunately.

OK.

>>
>> Also, you are introducing a new variant "semi join" that I do not
>> think exists in the code comments before. AFAIK, semi is a prefix
>> (expect in special cases like the semis of a sporting tournament), and
>> it should be either "semi-join" or just "semijoin".
>
> I did it to align the terminology with inner join and outer join, as in
> "outer join nest" and "inner join nest". I am not so troubled by the
> fact that it is not proper English, and there are quite a few references
> to "semi join" in the google space. However, a reference book such as
> "Database systems - the complete book" talks about semijoin operator -
> but they also refer to outerjoin operators.
>
> Anyway, it is probably better to revert to the term semi-join, which is
> used in a lot of the code, and in 6.0 documentation.

Good.

>>
>> On 04/ 6/11 01:57 PM, Roy Lyseng wrote:
>> > #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
>> revid:tor.didriksen@stripped
>> >
>> > 3358 Roy Lyseng 2011-04-06
>> > Bug#11822517: Refactor call interface of choose_plan()
>> >
>> > Part 1 - create class Optimize_table_order.
>> >
>> > This patch creates a class that is used to optimize the join order of
>> > tables in a query. The main purpose of the refactoring is to simplify
>> > function interfaces and to encapsulate information needed to perform
>> > optimization in a dedicated class.
>> >
>> > The following existing functions are added to the class:
>> > choose_table_order() - renamed from choose_plan()
>> > check_interleaving_with_nj()
>> > advance_sj_state()
>> > backout_nj_sj_state()
>> > optimize_straight_join()
>> > greedy_search()
>> > best_extension_by_limited_search()
>> > optimize_wo_join_buffering()
>> > determine_search_depth()
>> >
>> > sql/sql_select.cc
>> > New class definition, see above for details.
>> >
>> > sql/sql_select.h
>> > Added all_table_map (map of tables in query) to JOIN class.
>> > cur_embedding_map has been moved to class Optimize_table_order.
>> > Commented const_table_map, found_const_table_map and outer_join.
>> >
>> > modified:
>> > sql/sql_select.cc
>> > sql/sql_select.h
>> > === modified file 'sql/sql_select.cc'
>> > --- a/sql/sql_select.cc 2011-03-29 07:30:44 +0000
>> > +++ b/sql/sql_select.cc 2011-04-06 11:56:55 +0000
>>
>> ...
>>
>> > @@ -279,6 +260,94 @@ TABLE *create_duplicate_weedout_tmp_tabl
>> > Item_equal *find_item_equal(COND_EQUAL *cond_equal, Field *field,
>> > bool *inherited_fl);
>> >
>> > +/**
>> > + This class determines the optimal join order for tables within
>> > + a basic query block, ie a query specification clause, possibly
>> extended
>> > + with semi joined tables from embedded subqueries.
>> > +
>> > + This class takes as prerequisite a join class where all
>> dependencies among
>> > + tables have been sorted out, all possible access paths have been
>> > + sorted out, and all statistics information has been filled in.
>> > +
>> > + The class has a sole public function that will calculate the most
>> > + optimal plan based on the inputs and the environment, such as
>> prune level
>> > + and greedy optimizer search depth. For more information, see the
>> > + function headers for the private functions greedy_search() and
>> > + best_extension_by_limited_search().
>> > +*/
>> > +
>> > +class Optimize_table_order
>> > +{
>> > +public:
>> > + Optimize_table_order(THD *thd, JOIN *join, TABLE_LIST *sjm_nest)
>>
>> Using same name for parameters as for member variables is not common
>> in C++ (in fact, I thought it was not allowed) and Sun compiler
>> complains
>>
>> "Warning: join hides Optimize_table_order::join."
>>
>> Since things seems to work, I guess the behavior is a expected, but I
>> suggest that you change the name of the parameters.
>
> AFAIU, this is acceptable and intuitive in the initializer list, but not
> in the constructor body. I'd wish the compiler would warn about member
> hiding, though. In this case it is the reference to join in the
> constructor body that causes the problem, and the easy way out is to
> prefix it with "this->" rather than having to invent an argument name
> prefix or suffix.

OK.

>>
>> You do not need to pass thd as a separate argument, you can get it
>> from join->thd. I am not sure it is a good idea to open up for using
>> another thd than join->thd.
>
> I am preparing for the bright future when thd is not a member of class
> JOIN. Please advice if this is a mandatory comment.

I do not really see the point in preparing for that.  There will be so 
much that need fixing anyway at that point. Anyhow, I will not insist on 
anything here.

>>
>> > + : search_depth(determine_search_depth(join,
>> > + thd->variables.optimizer_search_depth)),
>> > + prune_level(thd->variables.optimizer_prune_level),
>> > + thd(thd), join(join),
>> > + cur_embedding_map(0)
>> > + {
>> > + join->emb_sjm_nest= sjm_nest;
>> > + }
>> > + ~Optimize_table_order()
>> > + {
>> > + join->emb_sjm_nest= NULL;
>> > + }
>> > + /**
>> > + Entry point to table join order optimization.
>> > + For further description, see class header and private function
>> headers.
>> > +
>> > + @return false if successful, true if error
>> > + */
>> > + bool choose_table_order();
>> > +
>> > +private:
>> > + const uint search_depth; // Maximum search depth to apply in
>> greedy search
>> > + const uint prune_level; // pruning heuristics to be applied
>> > + // (0 = EXHAUSTIVE, 1 = PRUNE_BY_TIME_OR_ROWS)
>> > + THD * const thd; // Pointer to current THD
>> > + JOIN * const join; // Pointer to the current plan being developed
>>
>> I suggest writing "*const" instead of "* const". Stroustrup calls
>> this a "*const" modifier, so I think it should be regarded as a single
>> unit.
>
> '*const' wins over '* const' 43 to 18 in trunk, so I guess this is
> reasonable.
>>
>> > + /**
>> > + Bitmap of nested joins embedding the position at the end of the
>> current
>> > + partial join.
>> > + */
>>
>> I do not understand what "embedding the position at the end" means.
>
> I'll try to rephrase it.

Good.

>>
>> > + nested_join_map cur_embedding_map;
>> > + /**
>> > + @todo: Add remaining Join optimization state members here,
>> > + ie emb_sjm_nest, positions, cur_sj_inner_tables.
>> > + This is not yet possible because (some of them) are accessed through
>> > + best_access_path(), which is currently not part of this class.
>> > + */
>> > +
>> > + bool check_interleaving_with_nj(JOIN_TAB *next_tab);
>> > + void advance_sj_state(table_map remaining_tables,
>> > + const JOIN_TAB *tab, uint idx,
>> > + double *current_record_count,
>> > + double *current_read_time,
>> > + POSITION *loose_scan_pos);
>> > + void backout_nj_sj_state(const table_map remaining_tables,
>> > + const JOIN_TAB *tab);
>> > + void optimize_straight_join(table_map join_tables);
>> > + bool greedy_search(table_map remaining_tables);
>> > + bool best_extension_by_limited_search(table_map remaining_tables,
>> > + uint idx,
>> > + double record_count,
>> > + double read_time,
>> > + uint current_search_depth);
>> > + void optimize_wo_join_buffering(uint first_tab, uint last_tab,
>> > + table_map last_remaining_tables,
>> > + bool first_alt, uint no_jbuf_before,
>> > + double *reopt_rec_count, double *reopt_cost);
>> > +
>> > + static uint determine_search_depth(JOIN* join, uint search_depth);
>>
>> I assume this is static in order to be able to call it from the
>> initialization list. I think I would prefer, making this a non-static
>> member function, remove the join parameter, and moving the
>> initialization to the body of the constructor.
>
> It means that search_depth will no longer be const in the class. Is that
> Ok?

Good point.
It is OK with me, but I will leave it to you to decide.

>>
>> > + /**
>> > + @todo: Add the static functions join_tab_cmp(),
>> join_tab_cmp_straight() and
>> > + join_tab_cmp_embedded_first() as static private member functions
>> here.
>> > + This is currently not possible because they are fed to my_qsort2(),
>> > + which requires C linkage.
>> > + @todo: Add best_access_path() as private member function within
>> this class.
>> > + Currently not possible because it is called from outside this class.
>>
>> Why not make it a public member?
>
> Because it is called in a phase of the optimization when there is no
> natural scope of an object of this type. Which means that a new object
> needs to be constructed, and I think we can do better than that in a
> later refactoring.

Good point.

>>
>> > + */
>> > +};
>> >
>> > /**
>> > This handles SELECT with and without UNION
>> > @@ -1953,8 +2022,7 @@ JOIN::optimize()
>> >
>> > /* Calculate how to do the join */
>> > THD_STAGE_INFO(thd, stage_statistics);
>> > - if (make_join_statistics(this, select_lex->leaf_tables,
>> conds,&keyuse) ||
>> > - thd->is_fatal_error)
>> > + if (make_join_statistics(this, select_lex->leaf_tables,
>> conds,&keyuse))
>>
>> The reason for the removable of the is_fatal_error check should be
>> mentioned in the commit comments.
>
> Yes, will do.

Good.

>>
>> > {
>> > DBUG_PRINT("error",("Error: make_join_statistics() failed"));
>> > DBUG_RETURN(1);
>> > @@ -4625,9 +4693,10 @@ make_join_statistics(JOIN *join, TABLE_L
>> > {
>> > int error;
>> > TABLE *table;
>> > + THD * const thd= join->thd;
>>
>> Another case where I think it should be "*const"
>>
>> > TABLE_LIST *tables= tables_arg;
>> > uint i,table_count,const_count,key;
>> > - table_map found_const_table_map, all_table_map, found_ref, refs;
>> > + table_map found_ref, refs;
>> > TABLE **table_vector;
>> > JOIN_TAB *stat,*stat_end,*s,**stat_ref;
>> > Key_use *keyuse, *start_keyuse;
>> > @@ -4636,25 +4705,33 @@ make_join_statistics(JOIN *join, TABLE_L
>> > JOIN_TAB *stat_vector[MAX_TABLES+1];
>> > DBUG_ENTER("make_join_statistics");
>> >
>> > + /*
>> > + Conceptually, this initialization belongs immediately before the call
>> > + to table_order.choose_table_order(), but due to the use of the
>> error label,
>> > + it must be placed before all 'goto error' statements.
>> > + */
>>
>> Why?
>
> I have re-implemented according to Guilhem's suggestion, so there is no
> longer need for the early initialization.

Good.

>>
>> > + Optimize_table_order table_order(thd, join, NULL);
>> > +
>> > table_count=join->tables;
>> > - stat= new (join->thd->mem_root) JOIN_TAB[table_count];
>> > - stat_ref=(JOIN_TAB**)
> join->thd->alloc(sizeof(JOIN_TAB*)*MAX_TABLES);
>> > - table_vector=(TABLE**)
>> join->thd->alloc(sizeof(TABLE*)*(table_count*2));
>> > + stat= new (thd->mem_root) JOIN_TAB[table_count];
>> > + stat_ref=(JOIN_TAB**) thd->alloc(sizeof(JOIN_TAB*)*MAX_TABLES);
>> > + table_vector=(TABLE**) thd->alloc(sizeof(TABLE*)*(table_count*2));
>>
>> Missing space after '='
>
> Ok
>>
>> > if (!stat || !stat_ref || !table_vector)
>> > - DBUG_RETURN(1); // Eom /* purecov: inspected */
>> > + DBUG_RETURN(true);
>> >
>> > if (!(join->positions=
>> > - new (join->thd->mem_root) POSITION[table_count+1]))
>> > - DBUG_RETURN(TRUE);
>> > + new (thd->mem_root) POSITION[table_count+1]))
>> > + DBUG_RETURN(true);
>> >
>> > if (!(join->best_positions=
>> > - new (join->thd->mem_root) POSITION[table_count+1]))
>> > - DBUG_RETURN(TRUE);
>> > + new (thd->mem_root) POSITION[table_count+1]))
>> > + DBUG_RETURN(true);
>> >
>> > join->best_ref=stat_vector;
>> >
>> > stat_end=stat+table_count;
>> > - found_const_table_map= all_table_map=0;
>> > + join->found_const_table_map=0;
>>
>> Likewise.
>
> Ok
>>
>> > + join->all_table_map= 0;
>> > const_count=0;
>> >
>> > for (s= stat, i= 0;
>>
>> ...
>
> Ok.
>>
>> > @@ -6945,17 +7008,25 @@ public:
>> > 'join->positions' of length 'idx'. The chosen access method for 's'
>> and its
>> > cost are stored in 'join->positions[idx]'.
>> >
>> > + Notice also that when calculating a plan for a materialized semi
>> join nest,
>> > + this function must not consider key references between tables
>> inside the
>> > + semi join nest and those outside of it. The way to control this is
>> to add
>> > + the set of outside tables to the @c remaining_tables argument.
>> > +
>> > @param join pointer to the structure providing all context info
>> > for the query
>> > @param s the table to be joined by the function
>> > @param thd thread for the connection that submitted the query
>> > - @param remaining_tables set of tables not included into the
>> partial plan yet
>> > + @param remaining_tables set of tables not included in the partial
>> plan yet.
>> > + additionally, when calculating a plan for a
>> > + materialized semi join nest, this must also contain
>> > + all tables outside of the semi join nest.
>>
>> "Additionally" should be capitalized, and there is several occurences
>> of "semi join" here.
>
> Ok.
>>
>> > @param idx the length of the partial plan
>> > @param disable_jbuf TRUE<=> Don't use join buffering
>> > @param record_count estimate for the number of records returned by the
>> > partial plan
>> > - @param pos OUT Table access plan
>> > - @param loose_scan_pos OUT Table plan that uses loosescan, or set
>> cost to
>> > + @param[out] pos Table access plan
>> > + @param[out] loose_scan_pos Table plan that uses loosescan, or set
>> cost to
>> > DBL_MAX if not possible.
>> >
>> > @return
>> > @@ -7537,33 +7608,37 @@ best_access_path(JOIN *join,
>> >
>> >
>> > /**
>> > - Selects and invokes a search strategy for an optimal query plan.
>> > + Selects and invokes a search strategy for an optimal query join
>> order.
>> >
>> > The function checks user-configurable parameters that control the
>> search
>> > strategy for an optimal plan, selects the search method and then
>> invokes
>> > it. Each specific optimization procedure stores the final optimal
>> plan in
>> > the array 'join->best_positions', and the cost of the plan in
>> > 'join->best_read'.
>> > + The function can be invoked to produce a plan for all tables in
>> the query
>> > + (in this case, the const tables are usually filtered out), or it
>> can be
>> > + invoked to produce a plan for a materialization of a semi join nest.
>> > + Set a non-NULL emb_sjm_nest pointer when producing a plan for a
>> semi join
>> > + nest to be materialized and a NULL pointer when producing a full
>> query plan.
>> >
>> > - @param join pointer to the structure providing all context info for
>> > - the query
>> > - @param join_tables set of the tables in the query
>> > -
>> > - @retval
>> > - FALSE ok
>> > - @retval
>> > - TRUE Fatal error
>> > + @return false if successful, true if error
>> > */
>> >
>> > -static bool
>> > -choose_plan(JOIN *join, table_map join_tables)
>> > +bool Optimize_table_order::choose_table_order()
>> > {
>> > - uint search_depth= join->thd->variables.optimizer_search_depth;
>> > - uint prune_level= join->thd->variables.optimizer_prune_level;
>> > - bool straight_join= test(join->select_options&
> SELECT_STRAIGHT_JOIN);
>> > - DBUG_ENTER("choose_plan");
>> > + const bool straight_join= test(join->select_options&
>> SELECT_STRAIGHT_JOIN);
>> > + table_map join_tables;
>>
>> I would like a comment describing what join_tables represent.
>
> Ok.
>>
>> > + DBUG_ENTER("Optimize_table_order::choose_table_order");
>> > +
>> > + /* Are there any tables to optimize? */
>> > + if (join->const_tables == join->tables)
>> > + {
>> > + memcpy(join->best_positions, join->positions,
>> > + sizeof(POSITION) * join->const_tables);
>> > + join->best_read= 1.0;
>> > + DBUG_RETURN(false);
>> > + }
>> >
>> > - join->cur_embedding_map= 0;
>> > reset_nj_counters(join->join_list);
>> > qsort2_cmp jtab_sort_func;
>> >
>> > @@ -7573,6 +7648,12 @@ choose_plan(JOIN *join, table_map join_t
>> > tables from this semi-join as first
>> > */
>> > jtab_sort_func= join_tab_cmp_embedded_first;
>> > + join_tables= join->all_table_map& ~join->const_table_map;
>> > + /*
>> > + Inner tables of semi join nest are not allowed to be identified as
>> > + const tables.
>> > + */
>> > + DBUG_ASSERT((join_tables& join->const_table_map) == 0);
>>
>> I do not think this assert is meaningful. Given the assignement
>> above, it should always be true.
>
> It will become meaningful when the second patch is added ;)

I think that means that the assert really belongs to the second patch.

>>
>> > }
>> > else
>> > {
>> > @@ -7585,6 +7666,7 @@ choose_plan(JOIN *join, table_map join_t
>> > records accessed.
>> > */
>> > jtab_sort_func= straight_join ? join_tab_cmp_straight : join_tab_cmp;
>> > + join_tables= join->all_table_map& ~join->const_table_map;
>>
>> There is no need to have the same assignment in both if and else part.
>> I suggest initializing join_tables where it is declared and making it
>> const.
>
> It is a preparation for the second patch.

OK.

>>
>> > }
>> > my_qsort2(join->best_ref + join->const_tables,
>> > join->tables - join->const_tables, sizeof(JOIN_TAB*),
>>
>> ...
>>
>> > @@ -7980,48 +8056,43 @@ semijoin_order_allows_materialization(co
>> > In the future, 'greedy_search' might be extended to support other
>> > implementations of 'best_extension', e.g. some simpler quadratic
>> procedure.
>> >
>> > - @param join pointer to the structure providing all context info
>> > - for the query
>> > + @par
>> > + @c search_depth from Optimize_table_order controls the exhaustiveness
>> > + of the search, and @c prune_levelcontrols the pruning heuristics that
>>
>> Missing space btw prune_level and controls.
>>
>> > + should be applied during search.
>> > +
>> > @param remaining_tables set of tables not included into the partial
>> plan yet
>> > - @param search_depth controlls the exhaustiveness of the search
>> > - @param prune_level the pruning heuristics that should be applied
>> during
>> > - search
>> >
>> > - @retval
>> > - FALSE ok
>> > - @retval
>> > - TRUE Fatal error
>> > + @return false if successful, true if error
>> > */
>> >
>> > -static bool
>> > -greedy_search(JOIN *join,
>> > - table_map remaining_tables,
>> > - uint search_depth,
>> > - uint prune_level)
>> > +bool Optimize_table_order::greedy_search(table_map remaining_tables)
>>
>> ...
>>
>> > @@ -8278,8 +8337,8 @@ best_extension_by_limited_search(JOIN
>> > table_map allowed_tables= ~(table_map)0;
>> > if (join->emb_sjm_nest)
>> > allowed_tables= join->emb_sjm_nest->sj_inner_tables;
>> > -
>> > - bool has_sj= !join->select_lex->sj_nests.is_empty();
>> > + const bool has_sj=
>> > + !(join->select_lex->sj_nests.is_empty() || join->emb_sjm_nest);
>>
>> Where did the second part come from? Will this give a change in behavior?
>>
>> ...
>
> I can document it.

Good.

>>
>> > @@ -13878,10 +13935,10 @@ void optimize_wo_join_buffering(JOIN *jo
>> > outer_fanout= 1.0;
>> > }
>> >
>> > - for (i= first_tab; i<= last_tab; i++)
>> > + for (uint i= first_tab; i<= last_tab; i++)
>> > reopt_remaining_tables |= join->positions[i].table->table->map;
>>
>> I guess an uint iterator is necessary to avoid having to cast this,
>> but why are this passed as uint in the first place? Something we
>> should look into changing some day.
>
> Yes, some day. Here I just follow the current convention of using uint
> in this block of code.

OK.

-- 
Øystein
Thread
bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng6 Apr
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr
      • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen11 Apr
        • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr
          • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Øystein Grøvlen11 Apr
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Guilhem Bichot8 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3358) Bug#11822517Roy Lyseng11 Apr