List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:March 17 2011 2:54pm
Subject:Re: bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517
View as plain text  
Hello Roy,

Looks good, but I'm still nit-picking:

Roy Lyseng a écrit, Le 15.03.2011 14:14:
> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
> revid:roy.lyseng@stripped
> 
>  3348 Roy Lyseng	2011-03-15
>       Bug#11822517: Refactor call interface of choose_plan()
>       
>         The main purposes of this patch is
>          1. Make the interface of choose_plan() and greedy_search() simpler.
>          2. Prepare for a simpler test for dependencies when doing
>             semi join materialization together with outer join (WL#5561).
>          The fix involves replacing the join_tables argument to choose_plan()
>          with a join nest pointer, which is NULL when the entire query will
>          be optimized. Thus, the set of relevant tables is instead calculated
>          inside this function. 
>          We also clarify that we need to pass as remaining_tables to
>          best_access_path() not only the tables remaining to be optimized,
>          but also all tables outside of the current semi join nest.
>       
>       sql/sql_select.cc
>         optimize_semijoin_nests() is now taking the map of all query tables
>         from the passed JOIN object instead of an argument.
>       
>         best_access_path() - meaning of argument remaining_tables is changed so
>         that it actually is interpreted as all tables not added to the current
>         partial plan.

I don't see that this meaning is changed: in best_access_path(), 
remaining_tables still means the same as before. I think the meaning has 
changed only in best_extension...() ? In best_extension...(), 
remaining_tables is now smaller (less tables) than before, which is why:
- you can eliminate allowed_tables
- you add "back" 'excluded_tables' when calling best_access_path(), so 
that best_access_path() gets the same tables as before.
Isn't it?

>         choose_plan() - argument join_tables is replaced with sjm_nest -
>         pointer to semi join nest to generate a plan for. The tables to include
>         in the plan is calculated inside the function instead.
>       
>         greedy_search() - now calling best_access_path() with all tables not
>         included in the current partial plan.

s/best_access_path/best_extension.../  ?

>         Not calling advance_sj_state() when calculating plan for a materialized
>         semi join nest.
>       
>       sql/sql_select.h
>         Added all_table_map (map of tables in query) to JOIN object.
>         Commented const_table_map, found_const_table_map and outer_join
> 
>     modified:
>       sql/sql_select.cc
>       sql/sql_select.h
> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc	2011-03-03 09:43:14 +0000
> +++ b/sql/sql_select.cc	2011-03-15 13:13:54 +0000

> @@ -6927,11 +6925,18 @@ public:
>    'join->positions' of length 'idx'. The chosen access method for 's' and its
>    cost are stored in 'join->positions[idx]'.
>  
> +  Notice also that when calculating a plan for a materialized semi join nest,
> +  this function must not consider key references between tables inside the
> +  semi join nest and those outside of it. The way to control this is to add
> +  the set of outside tables to the @c remaining_tables argument.
> +
>    @param join             pointer to the structure providing all context info
>                            for the query
>    @param s                the table to be joined by the function
>    @param thd              thread for the connection that submitted the query
> -  @param remaining_tables set of tables not included into the partial plan yet
> +  @param remaining_tables set of tables not included into the partial plan yet.
> +                          also tables outside of the join nest when calculating

"join nest" -> "semi join nest"

I suggest "set of tables not included into the partial plan yet;
additionally, when calculating a materialized semi join nest, this must
also contain tables outside of the semi join nest"

> +                          a materialized semi join nest.
>    @param idx              the length of the partial plan
>    @param disable_jbuf     TRUE<=> Don't use join buffering
>    @param record_count     estimate for the number of records returned by the
> @@ -7526,10 +7531,15 @@ best_access_path(JOIN      *join,
>    it. Each specific optimization procedure stores the final optimal plan in
>    the array 'join->best_positions', and the cost of the plan in
>    'join->best_read'.
> -
> -  @param join         pointer to the structure providing all context info for
> -                      the query
> -  @param join_tables  set of the tables in the query
> +  The function can be invoked to produce a plan for all tables in the query
> +  (in this case, the const tables are usually filtered out), or it can be
> +  invoked to produce a plan for a materialization of a semi join nest.
> +  Call with non-NULL sjm_nest pointer when producing a plan for a semi join
> +  nest to be materialized and a NULL pointer when producing a full query plan.
> +
> +  @param join      pointer to structure providing all context info for query
> +  @param sjm_nest  semi join nest that we are producing plan for, or NULL
> +                   which means produce plan for full query.
>  
>    @retval
>      FALSE       ok
> @@ -7537,24 +7547,31 @@ best_access_path(JOIN      *join,
>      TRUE        Fatal error
>  */
>  
> -static bool
> -choose_plan(JOIN *join, table_map join_tables)
> +static bool choose_plan(JOIN *join, TABLE_LIST *sjm_nest)
>  {
>    uint search_depth= join->thd->variables.optimizer_search_depth;
>    uint prune_level=  join->thd->variables.optimizer_prune_level;
>    bool straight_join= test(join->select_options & SELECT_STRAIGHT_JOIN);
> +  table_map join_tables;
>    DBUG_ENTER("choose_plan");
>  
> +  join->emb_sjm_nest= sjm_nest;

This is unrelated to your patch, but join->emb_sjm_nest is used only for
reading, so it could be made "pointer to const"; same for "n_tables" in
the greedy_search() function. I can send you a patch incremental over 
yours, if you are interested. I would also make the "emb_sjm_nest"
parameter of choose_plan() a pointer to const. And I could make a few 
"n_tables" variables constant too.

>    join->cur_embedding_map= 0;
>    reset_nj_counters(join->join_list);
>    qsort2_cmp jtab_sort_func;
>  
> -  if (join->emb_sjm_nest)
> +  if (sjm_nest)
>    {
>      /* We're optimizing semi-join materialization nest, so put the 
>         tables from this semi-join as first
>      */
>      jtab_sort_func= join_tab_cmp_embedded_first;
> +    join_tables= sjm_nest->sj_inner_tables;
> +    /*
> +      Inner tables of semi join nest are not allowed to be identified as
> +      const tables.
> +    */
> +    DBUG_ASSERT(join_tables == (join_tables & ~join->const_table_map));

DBUG_ASSERT((join_tables & join->const_table_map) == 0);
is simpler to read, to me.
But I guess you put it this way to be symmetric with the "else" branch,
so ok.

>    }
>    else
>    {
> @@ -7567,6 +7584,7 @@ choose_plan(JOIN *join, table_map join_t
>          records accessed.
>      */
>      jtab_sort_func= straight_join ? join_tab_cmp_straight : join_tab_cmp;
> +    join_tables= join->all_table_map & ~join->const_table_map;
>    }
>    my_qsort2(join->best_ref + join->const_tables,
>              join->tables - join->const_tables, sizeof(JOIN_TAB*),
> @@ -7594,6 +7612,9 @@ choose_plan(JOIN *join, table_map join_t
>    */
>    if (join->thd->lex->is_single_level_stmt())
>      join->thd->status_var.last_query_cost= join->best_read;
> +
> +  join->emb_sjm_nest= NULL;

If we left the function with DBUG_RETURN(TRUE) we forget to restore
join->emb_sj_nest. I don't know whether this is a real problem, as we
will return an error anyway.

>    DBUG_RETURN(FALSE);
>  }
>  
> @@ -7986,16 +8007,14 @@ greedy_search(JOIN      *join,
>    DBUG_ENTER("greedy_search");
>  
>    /* number of tables that remain to be optimized */
> -  n_tables= size_remain= my_count_bits(remaining_tables &
> -                                       (join->emb_sjm_nest? 
> -                                         join->emb_sjm_nest->sj_inner_tables
> :
> -                                         ~(table_map)0));
> +  n_tables= size_remain= my_count_bits(remaining_tables);
>  
>    do {
>      /* Find the extension of the current QEP with the lowest cost */
>      join->best_read= DBL_MAX;
> -    if (best_extension_by_limited_search(join, remaining_tables, idx, record_count,
> -                                         read_time, search_depth, prune_level))
> +    if (best_extension_by_limited_search(join, remaining_tables, idx,
> +                                         record_count, read_time,
> +                                         search_depth, prune_level))
>        DBUG_RETURN(TRUE);
>      /*
>        'best_read < DBL_MAX' means that optimizer managed to find
> @@ -8242,18 +8261,25 @@ best_extension_by_limited_search(JOIN   
>  
>    DBUG_EXECUTE("opt", print_plan(join, idx, record_count, read_time, read_time,
>                                  "part_plan"););
> +  /*
> +    When calculating a plan for a materialized semi join nest,
> +    best_access_plan() needs to know not only the remaining tables within the
> +    join nest

instead of "join nest", "semi join nest" would be consistent with the
rest of the sentence.

 > +, but also all tables outside of this nest, because there may
> +    be key references between the semi join nest and the outside tables
> +    that should not be considered when materializing the semi join nest.
> +  */
> +  const table_map excluded_tables=
> +    join->emb_sjm_nest ?
> +      join->all_table_map & ~join->emb_sjm_nest->sj_inner_tables :

I continue to think that this 'excluded_tables' should be a parameter to 
best_extension_...(), on par with 'remaining_tables'.
'remaining_tables' is set by the caller using a formula which involves 
sj_inner_tables; this is precisely so that best_extension...() does not 
look into sj_inner_tables...
I am not asking that best_access_path() gets a new parameter; only 
best_extension...(), greedy_search() and optimize_straight_join() would 
get a new parameter.

> +      0;
>  
> -  table_map allowed_tables= ~(table_map)0;
> -  if (join->emb_sjm_nest)
> -    allowed_tables= join->emb_sjm_nest->sj_inner_tables;
> -
> -  bool has_sj= !join->select_lex->sj_nests.is_empty();
> +  const bool has_sj=
> +    !(join->select_lex->sj_nests.is_empty() || join->emb_sjm_nest);

I define 'has_sj' like this:
"whether the join nest we're optimizing contains a semi join nest".
If join->emb_sjm_nest is non-NULL, then this semi join nest doesn't
contain a semi join nest, for sure, as you explain in some comments.
If join->emb_sjm_nest is NULL, we're optimizing the complete query, and
its semi join nests are _then_ listed in join->select_lex->sj_nests.
So I'd prefer if join->emb_sj_nest was tested first instead of second, 
when computing 'has_sj'.
The idea is that if join->emb_sj_nest is non-NULL then we shouldn't even
consult join->select_lex->sj_nests because it is unrelated to the semi
join nests of the *to-be-optimized join nest*.
I know it may sound like bikeshed painting...

>    for (JOIN_TAB **pos= join->best_ref + idx ; (s= *pos) ; pos++)
>    {
>      table_map real_table_bit= s->table->map;
>      if ((remaining_tables & real_table_bit) && 
> -        (allowed_tables & real_table_bit) &&
>          !(remaining_tables & s->dependent) && 
>          (!idx || !check_interleaving_with_nj(s)))
>      {
> @@ -8262,7 +8288,8 @@ best_extension_by_limited_search(JOIN   
>  
>        /* Find the best access method from 's' to the current partial plan */
>        POSITION loose_scan_pos;
> -      best_access_path(join, s, remaining_tables, idx, FALSE, record_count, 
> +      best_access_path(join, s, excluded_tables | remaining_tables,
> +                       idx, FALSE, record_count, 

nowadays we are allowed to use 'false' and 'FALSE', it's up to you.

>                         join->positions + idx, &loose_scan_pos);
>  
>        /* Compute the cost of extending the plan with 's' */
> @@ -8277,6 +8304,8 @@ best_extension_by_limited_search(JOIN   
>            cost (takes 9% of time in a 20-table plan search), hence the if()
>            above, which is also more efficient than the same if() inside
>            advance_sj_state() would be.
> +          Besides, never call advance_sj_state() when calculating the plan
> +          for a materialized semi join nest.
>          */
>          advance_sj_state(join, remaining_tables, s, idx,
>                           &current_record_count, &current_read_time,
> @@ -8333,7 +8362,7 @@ best_extension_by_limited_search(JOIN   
>          }
>        }
>  
> -      if ( (search_depth > 1) && (remaining_tables & ~real_table_bit)
> & allowed_tables )
> +      if (search_depth > 1 && (remaining_tables & ~real_table_bit))
>        { /* Recursively expand the current partial plan */
>          swap_variables(JOIN_TAB*, join->best_ref[idx], *pos);
>          if (best_extension_by_limited_search(join,
> @@ -13932,6 +13961,15 @@ void advance_sj_state(JOIN *join, table_
>    TABLE_LIST *emb_sj_nest= new_join_tab->emb_sj_nest;
>    POSITION *pos= join->positions + idx;
>  
> +  /*
> +    Semi join nests cannot be nested, hence we never need to advance the
> +    semi join state of a materialized semi join query.
> +    In fact, doing this may cause undesirable effects because all tables
> +    within a semi join nest have emb_sj_nest |= NULL, which triggers several

"|=" ?

> +    of the actions inside this function.
> +  */
> +  DBUG_ASSERT(join->emb_sjm_nest == NULL);
> +
>    /* Add this table to the join prefix */
>    remaining_tables &= ~new_join_tab->table->map;
>  
> 
> === modified file 'sql/sql_select.h'
> --- a/sql/sql_select.h	2011-03-01 14:57:53 +0000
> +++ b/sql/sql_select.h	2011-03-15 13:13:54 +0000
> @@ -1640,11 +1640,10 @@ public:
>    bool     first_record,full_join, no_field_update;
>    bool	   group;          /**< If query contains GROUP BY clause */
>    bool	   do_send_rows;
> -  table_map const_table_map,found_const_table_map;
> -  /*
> -     Bitmap of all inner tables from outer joins
> -  */
> -  table_map outer_join;
> +  table_map all_table_map;   /**< Set of tables contained in query */
> +  table_map const_table_map; /**< Set of tables found to be const */
> +  table_map found_const_table_map; /**< Tables that are const and non-empty */
> +  table_map outer_join;      /**< Bitmap of all inner tables from outer joins */
>    /* Number of records produced after join + group operation */
>    ha_rows  send_records;
>    ha_rows found_records,examined_rows,row_limit;

I'd put a comment near emb_sjm_nest to say that this may be non-NULL
only when inside choose_plan().
Thread
bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517Roy Lyseng15 Mar
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517Guilhem Bichot17 Mar
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517Ole John Aske17 Mar
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517Roy Lyseng21 Mar
      • Re: bzr commit into mysql-trunk branch (roy.lyseng:3348) Bug#11822517Guilhem Bichot21 Mar