From: Roy Lyseng Date: June 27 2011 7:00am Subject: Re: bzr commit into mysql-trunk branch (roy.lyseng:3374) Bug#11752543 List-Archive: http://lists.mysql.com/commits/139879 Message-Id: <4E082A80.1040208@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit Hi Øystein, thank you for approval. On 10.06.11 11.58, Øystein Grøvlen wrote: > Hi Roy, > > Thanks for patch. Looks good, and I will approve it. See below for > some comments that you may consider. (Prefixed with ØG) > > === modified file 'sql/sql_select.cc' > --- sql/sql_select.cc 2011-05-18 13:12:02 +0000 > +++ sql/sql_select.cc 2011-05-19 10:56:42 +0000 > > ... > > @@ -4429,32 +4433,35 @@ > } > > > -/* > - Pull tables out of semi-join nests, if possible > - > - SYNOPSIS > - pull_out_semijoin_tables() > - join The join where to do the semi-join flattening > - > - DESCRIPTION > - Try to pull tables out of semi-join nests. > - > +/** > + Pull const tables out of semi-join nests > > ØG: This use of the term "const table" seems to be in conflict with > existing usage, and will probably create confusion. I suggest > "Pull tables out of semi-join nests based on functional > dependencies" instead. Done. > > + > + @param join The join where to do the semi-join table pullout > + > + @return False if successful, true if error (Out of memory) > + > + @details > + Pull const tables out of semi-join nests based on functional dependencies, > > ØG: Please, remove "const" above. Done. > > + ie. if a table is accessed via eq_ref(outer_tables). > + The function may be called several times, the caller is responsible > + for setting up proper key information that this function acts upon. > + > PRECONDITIONS > When this function is called, the join may have several semi-join nests > but it is guaranteed that one semi-join nest does not contain another. > - > - ACTION > - A table can be pulled out of the semi-join nest if > - - It is a constant table, or > - - It is accessed via eq_ref(outer_tables) > + For functionally dependent tables to be pulled out, key information must > + have been calculated (see update_ref_and_keys()). > > POSTCONDITIONS > - * Semi-join nests' TABLE_LIST::sj_inner_tables is updated accordingly > - > - This operation is (and should be) performed at each PS execution since > - tables may become/cease to be constant across PS reexecutions. > + * Tables that were pulled out are removed from the semi-join nest they > + belonged to and added to the parent join nest. > + * For these tables, the used_tables and not_null_tables fields of > + the semi-join nest they belonged to will be adjusted. > + The semi-join nest is also marked as correlated, and > + sj_corr_tables and sj_depends_on are adjusted if necessary. > + * Semi-join nests' sj_inner_tables is set equal to used_tables > > - NOTE > + NOTE > Table pullout may make uncorrelated subquery correlated. Consider this > example: > > @@ -4466,13 +4473,9 @@ > make the subquery (i.e. its semi-join nest) correlated. > Making the subquery (i.e. its semi-join nest) correlated prevents us from > using Materialization or LooseScan to execute it. > - > - RETURN > - FALSE - OK > - TRUE - Out of memory error > */ > > -bool pull_out_semijoin_tables(JOIN *join) > +static bool pull_out_semijoin_tables(JOIN *join) > { > TABLE_LIST *sj_nest; > DBUG_ENTER("pull_out_semijoin_tables"); > @@ -4484,29 +4487,14 @@ > > /* Try pulling out of the each of the semi-joins */ > > ØG: While you are changing this part of the code, I suggest you remove > "the" from comment above? Done. > > while ((sj_nest= sj_list_it++)) > - { > - /* Action #1: Mark the constant tables to be pulled out */ > + { > table_map pulled_tables= 0; > - > List_iterator child_li(sj_nest->nested_join->join_list); > TABLE_LIST *tbl; > - while ((tbl= child_li++)) > - { > - if (tbl->table) > - { > - if (tbl->table->map & join->const_table_map) > - { > - pulled_tables |= tbl->table->map; > - DBUG_PRINT("info", ("Table %s pulled out (reason: constant)", > - tbl->table->alias)); > - } > - } > - } > - > /* > - Action #2: Find which tables we can pull out based on > - update_ref_and_keys() data. Note that pulling one table out can allow > - us to pull out some other tables too. > + Find which tables we can pull out based on key dependency data. > + Note that pulling one table out can allow us to pull out some > + other tables too. > */ > bool pulled_a_table; > do > @@ -4538,23 +4526,25 @@ > > child_li.rewind(); > /* > - Action #3: Move the pulled out TABLE_LIST elements to the parents. > + Move the pulled out TABLE_LIST elements to the parents. > */ > - table_map inner_tables= sj_nest->nested_join->used_tables & > - ~pulled_tables; > - /* Record the bitmap of inner tables */ > - sj_nest->sj_inner_tables= inner_tables; > + sj_nest->nested_join->used_tables&= ~pulled_tables; > + sj_nest->nested_join->not_null_tables&= ~pulled_tables; > + > + /* sj_inner_tables is a copy of nested_join->used_tables */ > + sj_nest->sj_inner_tables= sj_nest->nested_join->used_tables; > + > if (pulled_tables) > { > - List *upper_join_list= (sj_nest->embedding != NULL)? > - (&sj_nest->embedding->nested_join->join_list): > - (&join->select_lex->top_join_list); > + List *upper_join_list= (sj_nest->embedding != NULL) ? > + &sj_nest->embedding->nested_join->join_list : > + &join->select_lex->top_join_list; > Query_arena *arena, backup; > arena= join->thd->activate_stmt_arena_if_needed(&backup); > while ((tbl= child_li++)) > { > if (tbl->table && > - !(inner_tables & tbl->table->map)) > + !(sj_nest->nested_join->used_tables & tbl->table->map)) > > ØG: Any particular reason you prefer to use > sj_nest->nested_join->used_tables over sj_nest->sj_inner_tables > here? I have a followup-patch from WL5266 that intends to remove sj_inner_tables, as there is no need to have both this and nested_join->used_tables. However, the most convenient thing was to delay this patch to after this bug was fixed, but then I did not know it would take this long time to close the bug... The patch: http://lists.mysql.com/commits/110609 (It's probably not applicable as-is). > > { > /* > Pull the table up in the same way as simplify_joins() does: > @@ -4574,7 +4564,7 @@ > } > > /* Remove the sj-nest itself if we've removed everything from it */ > - if (!inner_tables) > + if (!sj_nest->nested_join->used_tables) > > ØG: And here? See above. > > { > List_iterator li(*upper_join_list); > /* Find the sj_nest in the list. */ > > @@ -4700,19 +4690,50 @@ > return len; > } > > - > /** > - Calculate the best possible join and initialize the join structure. > - > - @retval > - 0 ok > - @retval > - 1 Fatal error > + Calculate best possible join order and initialize the join structure. > + > + @param join Join object that is populated with statistics data > + @param tables_arg List of tables that is referenced by this query > + @param conds Where condition of query > + @param keyuse_array[out] Populated with key_use information > + @param first_optimization True if first optimization of this query > + > + @return true if success, false if error > + > + @details > + Here is an overview of the logic of this function: > + > + - Initialize JOIN data structures and setup basic dependencies between tables. > + > + - Update dependencies based on join information. > + > + - Make key descriptions (update_ref_and_keys()). > + > + - Pull out semi-join tables based on table dependencies. > + > + - Extract tables with zero or one rows as const tables. > + > + - Read contents of const tables, substitute columns from these tables with > + actual data. Also keep track of empty tables vs. one-row tables. > + > + - After const table extraction based on row count, more tables may > + have become functionally dependent. Extract these as const tables. > + > + - Add new sargable predicates based on retrieved const values. > + > + - Calculate number of rows to be retrieved from each table. > + > + - Calculate cost of potential semi-join materializations. > + > + - Calculate best possible join order based on available statistics. > + > + - Fill in remaining information for the generated join order. > */ > > static bool > make_join_statistics(JOIN *join, TABLE_LIST *tables_arg, Item *conds, > - Key_use_array *keyuse_array) > + Key_use_array *keyuse_array, bool first_optimization) > { > int error; > TABLE *table; > @@ -4747,10 +4768,15 @@ > join->best_ref= stat_vector; > > stat_end= stat+table_count; > + join->const_table_map= 0; > join->found_const_table_map= 0; > join->all_table_map= 0; > const_count= 0; > > + /* > + Initialize data structures for tables to be joined. > + Initialize dependencies between tables. > + */ > for (s= stat, i= 0; > tables; > s++, tables= tables->next_leaf, i++) > @@ -4777,31 +4803,10 @@ > table->quick_condition_rows= table->file->stats.records; > > s->on_expr_ref= &tables->on_expr; > - if (*s->on_expr_ref) > - { > - /* s is the only inner table of an outer join */ > -#ifdef WITH_PARTITION_STORAGE_ENGINE > - if ((!table->file->stats.records || table->no_partitions_used) && > - !tables->in_outer_join_nest()) > -#else > - if (!table->file->stats.records && !tables->in_outer_join_nest()) > -#endif > - { // Empty table > - s->dependent= 0; // Ignore LEFT JOIN depend. > - set_position(join, const_count++, s, NULL); > - continue; > - } > - outer_join|= table->map; > - s->embedding_map= 0; > - for (TABLE_LIST *embedding= tables->embedding; > - embedding; > - embedding= embedding->embedding) > - s->embedding_map|= embedding->nested_join->nj_map; > - continue; > - } > + > if (tables->in_outer_join_nest()) > { > - /* s belongs to a nested join, maybe to several embedded joins */ > + /* s belongs to a nested join, maybe to several embedding joins */ > s->embedding_map= 0; > for (TABLE_LIST *embedding= tables->embedding; > embedding; > @@ -4812,20 +4817,16 @@ > s->dependent|= embedding->dep_tables; > outer_join|= nested_join->used_tables; > } > - continue; > } > -#ifdef WITH_PARTITION_STORAGE_ENGINE > - const bool no_partitions_used= table->no_partitions_used; > -#else > - const bool no_partitions_used= FALSE; > -#endif > - if ((table->s->system || table->file->stats.records <= 1 || > - no_partitions_used) && > - !s->dependent && > - (table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT) && > - !table->fulltext_searched && !join->no_const_tables) > + else if (*s->on_expr_ref) > { > - set_position(join, const_count++, s, NULL); > + /* s is the only inner table of an outer join */ > + outer_join|= table->map; > + s->embedding_map= 0; > + for (TABLE_LIST *embedding= tables->embedding; > + embedding; > + embedding= embedding->embedding) > + s->embedding_map|= embedding->nested_join->nj_map; > } > } > stat_vector[i]=0; > @@ -4834,6 +4835,7 @@ > if (join->outer_join) > { > /* > + Complete the dependency analysis. > Build transitive closure for relation 'to be dependent on'. > This will speed up the plan search for many cases with outer joins, > as well as allow us to catch illegal cross references. > @@ -4893,8 +4895,102 @@ > ~outer_join, join->select_lex, &sargables)) > goto error; > > - /* Read tables with 0 or 1 rows (system tables) */ > - join->const_table_map= 0; > + /* > + Pull out semi-join tables based on dependencies. Dependencies are valid > + throughout the lifetime of a query, so this operation can be performed > + on the first optimization only. > + */ > + if (first_optimization) > + { > + if (pull_out_semijoin_tables(join)) > + DBUG_RETURN(true); > + } > + > + /* > + Extract const tables based on row counts, must be done for each execution. > + Tables containing exactly zero or one rows are marked as const, but > + notice the additional constraints checked below. > + Tables that are extracted have their rows read before actual execution > + starts and are placed in the beginning of the join_tab array. > + Thus, they do not take part in join order optimization process, > + which can significantly reduce the optimization time. > + The data read from these tables can also be regarded as "constant" > + throughout query execution, hence the column values can be used for > + additional constant propagation and extraction of const tables based > + on eq-ref properties. > + */ > + enum enum_const_table_extraction > + { > + extract_no_table= 0, > + extract_empty_table= 1, > + extract_const_table= 2 > + }; > + > + for (i= 0, s= stat; i < table_count; i++, s++) > + { > + TABLE *const table= s->table; > + TABLE_LIST *const tables= table->pos_in_table_list; > + enum enum_const_table_extraction extract_method= extract_const_table; > + > +#ifdef WITH_PARTITION_STORAGE_ENGINE > + const bool no_partitions_used= table->no_partitions_used; > +#else > + const bool no_partitions_used= false; > +#endif > + > + if (tables->in_outer_join_nest()) > + { > + /* > + Table belongs to a nested join, no candidate for const table extraction. > + */ > + extract_method= extract_no_table; > + } > + else if (tables->embedding && tables->embedding->sj_on_expr) > + { > + /* > + Table belongs to a semi-join. > + We do not currently pull out const tables from semi-join nests. > + */ > + extract_method= extract_no_table; > + } > + else if (*s->on_expr_ref) > + { > + /* s is the only inner table of an outer join, extract empty tables */ > + extract_method= extract_empty_table; > + } > + switch (extract_method) > + { > + case extract_no_table: > + break; > + > + case extract_empty_table: > + if ((table->file->stats.records == 0 || > + no_partitions_used) && > + (table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT) && > + !join->no_const_tables) > + set_position(join, const_count++, s, NULL); > + break; > + > + case extract_const_table: > + /* > + Extract tables with zero or one rows, but do not extract tables that > + 1. are dependent upon other tables, or > + 2. have no exact statistics, or > + 3. are full-text searched, or > + 4. the join object does not allow const table elimination. > + */ > + if ((table->s->system || > + table->file->stats.records <= 1 || > + no_partitions_used) && > + !s->dependent && // 1 > + (table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT) && // 2 > + !table->fulltext_searched && // 3 > + !join->no_const_tables) // 4 > + set_position(join, const_count++, s, NULL); > + break; > > ØG: I am bit concerned that large parts of this if statement has to > be repeated twice. Is it not some way that this code could be > shared? It also seems a bit strange that it is considered that an > explanation is only needed the second time. Explanation added for the first branch as well. I took out the test on join_no_const_tables. Unfortunately, it does introduce a goto, but it will be possible to remove it again if we rafactor this function into multiple ones. > > + } > + } > + /* Read const tables (tables matching no more than 1 rows) */ > > for (POSITION *p_pos=join->positions, *p_end=p_pos+const_count; > p_pos < p_end ; > @@ -5015,14 +5111,19 @@ > } while (keyuse->table == table && keyuse->key == key); > > /* > + Extract const tables with proper key dependencies. > + Exclude tables that > + 1. are full-text searched, or > + 2. are part of nested outer join. > + > TODO (low priority): currently we ignore the const tables that > are within a semi-join nest which is within an outer join nest. > The effect of this is that we don't do const substitution for > such tables. > > ØG: Is the above TODO meaningful anymore? We have decided to ignore > all const tables within semi-join nests so why should we be concerned > about a sub-class of those? I suggest deleting this. Agree, done. > > */ > if (eq_part.is_prefix(table->key_info[key].key_parts) && > - !table->fulltext_searched && > - !table->pos_in_table_list->in_outer_join_nest()) > + !table->fulltext_searched && // 1 > + !table->pos_in_table_list->in_outer_join_nest()) // 2 > { > if (table->key_info[key].flags & HA_NOSAME) > { > @@ -5081,6 +5182,8 @@ > > for (s=stat ; s < stat_end ; s++) > { > + TABLE_LIST *const tl= s->table->pos_in_table_list; > + > if (s->type == JT_SYSTEM || s->type == JT_CONST) > { > /* Only one matching row */ > @@ -5113,9 +5216,8 @@ > Do range analysis if we're on the inner side of a semi-join (3). > */ > if (!s->const_keys.is_clear_all() && // (1) > - (!s->table->pos_in_table_list->embedding || // (2) > - (s->table->pos_in_table_list->embedding && // (3) > - s->table->pos_in_table_list->embedding->sj_on_expr))) // (3) > + (!tl->embedding || // (2) > + (tl->embedding && tl->embedding->sj_on_expr))) // (3) > { > ha_rows records; > SQL_SELECT *select; > @@ -5133,7 +5235,13 @@ > s->quick=select->quick; > s->needed_reg=select->needed_reg; > select->quick=0; > - if (records == 0 && s->table->reginfo.impossible_range) > + /* > + Check for "impossible range", make sure that semi-joined tables > + are not accounted for. > > ØG: What do you mean by not accounted for, and why is this necessary? > Please, improve comment. Problem is that this is yet another "const table pullout" based on data volume, which we no longer support. Comment updated. > > + */ > + if (records == 0 && > + s->table->reginfo.impossible_range && > + (!(tl->embedding && tl->embedding->sj_on_expr))) > { > /* > Impossible WHERE or ON expression > @@ -5161,18 +5269,14 @@ > delete select; > } > } > - > - if (pull_out_semijoin_tables(join)) > - DBUG_RETURN(TRUE); > - > /* > - Set pointer to embedding semijoin nest for all semijoined tables. > - Note that this must be done for every table inside all semijoin nests, > - even for tables within outer join nests embedded in semijoin nests. > - A table can never be part of multiple semijoin nests, hence no > + Set pointer to embedding semi-join nest for all semi-joined tables. > + Note that this must be done for every table inside all semi-join nests, > + even for tables within outer join nests embedded in semi-join nests. > + A table can never be part of multiple semi-join nests, hence no > ambiguities can ever occur. > Note also that the pointer is not set for TABLE_LIST objects that > - are outer join nests within semijoin nests. > + are outer join nests within semi-join nests. > */ > for (s= stat; s < stat_end; s++) > { Thanks, Roy