From: Guilhem Bichot Date: March 17 2011 2:54pm Subject: Re: bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517 List-Archive: http://lists.mysql.com/commits/133236 Message-Id: <4D8220AC.4050300@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit Hello Roy, Looks good, but I'm still nit-picking: Roy Lyseng a écrit, Le 15.03.2011 14:14: > #At file:///home/rl136806/mysql/repo/mysql-work5/ based on revid:roy.lyseng@stripped > > 3348 Roy Lyseng 2011-03-15 > Bug#11822517: Refactor call interface of choose_plan() > > The main purposes of this patch is > 1. Make the interface of choose_plan() and greedy_search() simpler. > 2. Prepare for a simpler test for dependencies when doing > semi join materialization together with outer join (WL#5561). > The fix involves replacing the join_tables argument to choose_plan() > with a join nest pointer, which is NULL when the entire query will > be optimized. Thus, the set of relevant tables is instead calculated > inside this function. > We also clarify that we need to pass as remaining_tables to > best_access_path() not only the tables remaining to be optimized, > but also all tables outside of the current semi join nest. > > sql/sql_select.cc > optimize_semijoin_nests() is now taking the map of all query tables > from the passed JOIN object instead of an argument. > > best_access_path() - meaning of argument remaining_tables is changed so > that it actually is interpreted as all tables not added to the current > partial plan. I don't see that this meaning is changed: in best_access_path(), remaining_tables still means the same as before. I think the meaning has changed only in best_extension...() ? In best_extension...(), remaining_tables is now smaller (less tables) than before, which is why: - you can eliminate allowed_tables - you add "back" 'excluded_tables' when calling best_access_path(), so that best_access_path() gets the same tables as before. Isn't it? > choose_plan() - argument join_tables is replaced with sjm_nest - > pointer to semi join nest to generate a plan for. The tables to include > in the plan is calculated inside the function instead. > > greedy_search() - now calling best_access_path() with all tables not > included in the current partial plan. s/best_access_path/best_extension.../ ? > Not calling advance_sj_state() when calculating plan for a materialized > semi join nest. > > sql/sql_select.h > Added all_table_map (map of tables in query) to JOIN object. > Commented const_table_map, found_const_table_map and outer_join > > modified: > sql/sql_select.cc > sql/sql_select.h > === modified file 'sql/sql_select.cc' > --- a/sql/sql_select.cc 2011-03-03 09:43:14 +0000 > +++ b/sql/sql_select.cc 2011-03-15 13:13:54 +0000 > @@ -6927,11 +6925,18 @@ public: > 'join->positions' of length 'idx'. The chosen access method for 's' and its > cost are stored in 'join->positions[idx]'. > > + Notice also that when calculating a plan for a materialized semi join nest, > + this function must not consider key references between tables inside the > + semi join nest and those outside of it. The way to control this is to add > + the set of outside tables to the @c remaining_tables argument. > + > @param join pointer to the structure providing all context info > for the query > @param s the table to be joined by the function > @param thd thread for the connection that submitted the query > - @param remaining_tables set of tables not included into the partial plan yet > + @param remaining_tables set of tables not included into the partial plan yet. > + also tables outside of the join nest when calculating "join nest" -> "semi join nest" I suggest "set of tables not included into the partial plan yet; additionally, when calculating a materialized semi join nest, this must also contain tables outside of the semi join nest" > + a materialized semi join nest. > @param idx the length of the partial plan > @param disable_jbuf TRUE<=> Don't use join buffering > @param record_count estimate for the number of records returned by the > @@ -7526,10 +7531,15 @@ best_access_path(JOIN *join, > it. Each specific optimization procedure stores the final optimal plan in > the array 'join->best_positions', and the cost of the plan in > 'join->best_read'. > - > - @param join pointer to the structure providing all context info for > - the query > - @param join_tables set of the tables in the query > + The function can be invoked to produce a plan for all tables in the query > + (in this case, the const tables are usually filtered out), or it can be > + invoked to produce a plan for a materialization of a semi join nest. > + Call with non-NULL sjm_nest pointer when producing a plan for a semi join > + nest to be materialized and a NULL pointer when producing a full query plan. > + > + @param join pointer to structure providing all context info for query > + @param sjm_nest semi join nest that we are producing plan for, or NULL > + which means produce plan for full query. > > @retval > FALSE ok > @@ -7537,24 +7547,31 @@ best_access_path(JOIN *join, > TRUE Fatal error > */ > > -static bool > -choose_plan(JOIN *join, table_map join_tables) > +static bool choose_plan(JOIN *join, TABLE_LIST *sjm_nest) > { > uint search_depth= join->thd->variables.optimizer_search_depth; > uint prune_level= join->thd->variables.optimizer_prune_level; > bool straight_join= test(join->select_options & SELECT_STRAIGHT_JOIN); > + table_map join_tables; > DBUG_ENTER("choose_plan"); > > + join->emb_sjm_nest= sjm_nest; This is unrelated to your patch, but join->emb_sjm_nest is used only for reading, so it could be made "pointer to const"; same for "n_tables" in the greedy_search() function. I can send you a patch incremental over yours, if you are interested. I would also make the "emb_sjm_nest" parameter of choose_plan() a pointer to const. And I could make a few "n_tables" variables constant too. > join->cur_embedding_map= 0; > reset_nj_counters(join->join_list); > qsort2_cmp jtab_sort_func; > > - if (join->emb_sjm_nest) > + if (sjm_nest) > { > /* We're optimizing semi-join materialization nest, so put the > tables from this semi-join as first > */ > jtab_sort_func= join_tab_cmp_embedded_first; > + join_tables= sjm_nest->sj_inner_tables; > + /* > + Inner tables of semi join nest are not allowed to be identified as > + const tables. > + */ > + DBUG_ASSERT(join_tables == (join_tables & ~join->const_table_map)); DBUG_ASSERT((join_tables & join->const_table_map) == 0); is simpler to read, to me. But I guess you put it this way to be symmetric with the "else" branch, so ok. > } > else > { > @@ -7567,6 +7584,7 @@ choose_plan(JOIN *join, table_map join_t > records accessed. > */ > jtab_sort_func= straight_join ? join_tab_cmp_straight : join_tab_cmp; > + join_tables= join->all_table_map & ~join->const_table_map; > } > my_qsort2(join->best_ref + join->const_tables, > join->tables - join->const_tables, sizeof(JOIN_TAB*), > @@ -7594,6 +7612,9 @@ choose_plan(JOIN *join, table_map join_t > */ > if (join->thd->lex->is_single_level_stmt()) > join->thd->status_var.last_query_cost= join->best_read; > + > + join->emb_sjm_nest= NULL; If we left the function with DBUG_RETURN(TRUE) we forget to restore join->emb_sj_nest. I don't know whether this is a real problem, as we will return an error anyway. > DBUG_RETURN(FALSE); > } > > @@ -7986,16 +8007,14 @@ greedy_search(JOIN *join, > DBUG_ENTER("greedy_search"); > > /* number of tables that remain to be optimized */ > - n_tables= size_remain= my_count_bits(remaining_tables & > - (join->emb_sjm_nest? > - join->emb_sjm_nest->sj_inner_tables : > - ~(table_map)0)); > + n_tables= size_remain= my_count_bits(remaining_tables); > > do { > /* Find the extension of the current QEP with the lowest cost */ > join->best_read= DBL_MAX; > - if (best_extension_by_limited_search(join, remaining_tables, idx, record_count, > - read_time, search_depth, prune_level)) > + if (best_extension_by_limited_search(join, remaining_tables, idx, > + record_count, read_time, > + search_depth, prune_level)) > DBUG_RETURN(TRUE); > /* > 'best_read < DBL_MAX' means that optimizer managed to find > @@ -8242,18 +8261,25 @@ best_extension_by_limited_search(JOIN > > DBUG_EXECUTE("opt", print_plan(join, idx, record_count, read_time, read_time, > "part_plan");); > + /* > + When calculating a plan for a materialized semi join nest, > + best_access_plan() needs to know not only the remaining tables within the > + join nest instead of "join nest", "semi join nest" would be consistent with the rest of the sentence. > +, but also all tables outside of this nest, because there may > + be key references between the semi join nest and the outside tables > + that should not be considered when materializing the semi join nest. > + */ > + const table_map excluded_tables= > + join->emb_sjm_nest ? > + join->all_table_map & ~join->emb_sjm_nest->sj_inner_tables : I continue to think that this 'excluded_tables' should be a parameter to best_extension_...(), on par with 'remaining_tables'. 'remaining_tables' is set by the caller using a formula which involves sj_inner_tables; this is precisely so that best_extension...() does not look into sj_inner_tables... I am not asking that best_access_path() gets a new parameter; only best_extension...(), greedy_search() and optimize_straight_join() would get a new parameter. > + 0; > > - table_map allowed_tables= ~(table_map)0; > - if (join->emb_sjm_nest) > - allowed_tables= join->emb_sjm_nest->sj_inner_tables; > - > - bool has_sj= !join->select_lex->sj_nests.is_empty(); > + const bool has_sj= > + !(join->select_lex->sj_nests.is_empty() || join->emb_sjm_nest); I define 'has_sj' like this: "whether the join nest we're optimizing contains a semi join nest". If join->emb_sjm_nest is non-NULL, then this semi join nest doesn't contain a semi join nest, for sure, as you explain in some comments. If join->emb_sjm_nest is NULL, we're optimizing the complete query, and its semi join nests are _then_ listed in join->select_lex->sj_nests. So I'd prefer if join->emb_sj_nest was tested first instead of second, when computing 'has_sj'. The idea is that if join->emb_sj_nest is non-NULL then we shouldn't even consult join->select_lex->sj_nests because it is unrelated to the semi join nests of the *to-be-optimized join nest*. I know it may sound like bikeshed painting... > for (JOIN_TAB **pos= join->best_ref + idx ; (s= *pos) ; pos++) > { > table_map real_table_bit= s->table->map; > if ((remaining_tables & real_table_bit) && > - (allowed_tables & real_table_bit) && > !(remaining_tables & s->dependent) && > (!idx || !check_interleaving_with_nj(s))) > { > @@ -8262,7 +8288,8 @@ best_extension_by_limited_search(JOIN > > /* Find the best access method from 's' to the current partial plan */ > POSITION loose_scan_pos; > - best_access_path(join, s, remaining_tables, idx, FALSE, record_count, > + best_access_path(join, s, excluded_tables | remaining_tables, > + idx, FALSE, record_count, nowadays we are allowed to use 'false' and 'FALSE', it's up to you. > join->positions + idx, &loose_scan_pos); > > /* Compute the cost of extending the plan with 's' */ > @@ -8277,6 +8304,8 @@ best_extension_by_limited_search(JOIN > cost (takes 9% of time in a 20-table plan search), hence the if() > above, which is also more efficient than the same if() inside > advance_sj_state() would be. > + Besides, never call advance_sj_state() when calculating the plan > + for a materialized semi join nest. > */ > advance_sj_state(join, remaining_tables, s, idx, > ¤t_record_count, ¤t_read_time, > @@ -8333,7 +8362,7 @@ best_extension_by_limited_search(JOIN > } > } > > - if ( (search_depth > 1) && (remaining_tables & ~real_table_bit) & allowed_tables ) > + if (search_depth > 1 && (remaining_tables & ~real_table_bit)) > { /* Recursively expand the current partial plan */ > swap_variables(JOIN_TAB*, join->best_ref[idx], *pos); > if (best_extension_by_limited_search(join, > @@ -13932,6 +13961,15 @@ void advance_sj_state(JOIN *join, table_ > TABLE_LIST *emb_sj_nest= new_join_tab->emb_sj_nest; > POSITION *pos= join->positions + idx; > > + /* > + Semi join nests cannot be nested, hence we never need to advance the > + semi join state of a materialized semi join query. > + In fact, doing this may cause undesirable effects because all tables > + within a semi join nest have emb_sj_nest |= NULL, which triggers several "|=" ? > + of the actions inside this function. > + */ > + DBUG_ASSERT(join->emb_sjm_nest == NULL); > + > /* Add this table to the join prefix */ > remaining_tables &= ~new_join_tab->table->map; > > > === modified file 'sql/sql_select.h' > --- a/sql/sql_select.h 2011-03-01 14:57:53 +0000 > +++ b/sql/sql_select.h 2011-03-15 13:13:54 +0000 > @@ -1640,11 +1640,10 @@ public: > bool first_record,full_join, no_field_update; > bool group; /**< If query contains GROUP BY clause */ > bool do_send_rows; > - table_map const_table_map,found_const_table_map; > - /* > - Bitmap of all inner tables from outer joins > - */ > - table_map outer_join; > + table_map all_table_map; /**< Set of tables contained in query */ > + table_map const_table_map; /**< Set of tables found to be const */ > + table_map found_const_table_map; /**< Tables that are const and non-empty */ > + table_map outer_join; /**< Bitmap of all inner tables from outer joins */ > /* Number of records produced after join + group operation */ > ha_rows send_records; > ha_rows found_records,examined_rows,row_limit; I'd put a comment near emb_sjm_nest to say that this may be non-NULL only when inside choose_plan().