List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:June 27 2011 7:00am
Subject:Re: bzr commit into mysql-trunk branch (roy.lyseng:3374) Bug#11752543
View as plain text  
Hi Øystein,

thank you for approval.

On 10.06.11 11.58, Øystein Grøvlen wrote:
> Hi Roy,
>
> Thanks for patch. Looks good, and I will approve it. See below for
> some comments that you may consider. (Prefixed with ØG)
>
> === modified file 'sql/sql_select.cc'
> --- sql/sql_select.cc 2011-05-18 13:12:02 +0000
> +++ sql/sql_select.cc 2011-05-19 10:56:42 +0000
>
> ...
>
> @@ -4429,32 +4433,35 @@
> }
>
>
> -/*
> - Pull tables out of semi-join nests, if possible
> -
> - SYNOPSIS
> - pull_out_semijoin_tables()
> - join The join where to do the semi-join flattening
> -
> - DESCRIPTION
> - Try to pull tables out of semi-join nests.
> -
> +/**
> + Pull const tables out of semi-join nests
>
> ØG: This use of the term "const table" seems to be in conflict with
> existing usage, and will probably create confusion. I suggest
> "Pull tables out of semi-join nests based on functional
> dependencies" instead.

Done.
>
> +
> + @param join The join where to do the semi-join table pullout
> +
> + @return False if successful, true if error (Out of memory)
> +
> + @details
> + Pull const tables out of semi-join nests based on functional dependencies,
>
> ØG: Please, remove "const" above.

Done.

>
> + ie. if a table is accessed via eq_ref(outer_tables).
> + The function may be called several times, the caller is responsible
> + for setting up proper key information that this function acts upon.
> +
> PRECONDITIONS
> When this function is called, the join may have several semi-join nests
> but it is guaranteed that one semi-join nest does not contain another.
> -
> - ACTION
> - A table can be pulled out of the semi-join nest if
> - - It is a constant table, or
> - - It is accessed via eq_ref(outer_tables)
> + For functionally dependent tables to be pulled out, key information must
> + have been calculated (see update_ref_and_keys()).
>
> POSTCONDITIONS
> - * Semi-join nests' TABLE_LIST::sj_inner_tables is updated accordingly
> -
> - This operation is (and should be) performed at each PS execution since
> - tables may become/cease to be constant across PS reexecutions.
> + * Tables that were pulled out are removed from the semi-join nest they
> + belonged to and added to the parent join nest.
> + * For these tables, the used_tables and not_null_tables fields of
> + the semi-join nest they belonged to will be adjusted.
> + The semi-join nest is also marked as correlated, and
> + sj_corr_tables and sj_depends_on are adjusted if necessary.
> + * Semi-join nests' sj_inner_tables is set equal to used_tables
>
> - NOTE
> + NOTE
> Table pullout may make uncorrelated subquery correlated. Consider this
> example:
>
> @@ -4466,13 +4473,9 @@
> make the subquery (i.e. its semi-join nest) correlated.
> Making the subquery (i.e. its semi-join nest) correlated prevents us from
> using Materialization or LooseScan to execute it.
> -
> - RETURN
> - FALSE - OK
> - TRUE - Out of memory error
> */
>
> -bool pull_out_semijoin_tables(JOIN *join)
> +static bool pull_out_semijoin_tables(JOIN *join)
> {
> TABLE_LIST *sj_nest;
> DBUG_ENTER("pull_out_semijoin_tables");
> @@ -4484,29 +4487,14 @@
>
> /* Try pulling out of the each of the semi-joins */
>
> ØG: While you are changing this part of the code, I suggest you remove
> "the" from comment above?

Done.
>
> while ((sj_nest= sj_list_it++))
> - {
> - /* Action #1: Mark the constant tables to be pulled out */
> + {
> table_map pulled_tables= 0;
> -
> List_iterator<TABLE_LIST> child_li(sj_nest->nested_join->join_list);
> TABLE_LIST *tbl;
> - while ((tbl= child_li++))
> - {
> - if (tbl->table)
> - {
> - if (tbl->table->map & join->const_table_map)
> - {
> - pulled_tables |= tbl->table->map;
> - DBUG_PRINT("info", ("Table %s pulled out (reason: constant)",
> - tbl->table->alias));
> - }
> - }
> - }
> -
> /*
> - Action #2: Find which tables we can pull out based on
> - update_ref_and_keys() data. Note that pulling one table out can allow
> - us to pull out some other tables too.
> + Find which tables we can pull out based on key dependency data.
> + Note that pulling one table out can allow us to pull out some
> + other tables too.
> */
> bool pulled_a_table;
> do
> @@ -4538,23 +4526,25 @@
>
> child_li.rewind();
> /*
> - Action #3: Move the pulled out TABLE_LIST elements to the parents.
> + Move the pulled out TABLE_LIST elements to the parents.
> */
> - table_map inner_tables= sj_nest->nested_join->used_tables &
> - ~pulled_tables;
> - /* Record the bitmap of inner tables */
> - sj_nest->sj_inner_tables= inner_tables;
> + sj_nest->nested_join->used_tables&= ~pulled_tables;
> + sj_nest->nested_join->not_null_tables&= ~pulled_tables;
> +
> + /* sj_inner_tables is a copy of nested_join->used_tables */
> + sj_nest->sj_inner_tables= sj_nest->nested_join->used_tables;
> +
> if (pulled_tables)
> {
> - List<TABLE_LIST> *upper_join_list= (sj_nest->embedding != NULL)?
> - (&sj_nest->embedding->nested_join->join_list):
> - (&join->select_lex->top_join_list);
> + List<TABLE_LIST> *upper_join_list= (sj_nest->embedding != NULL) ?
> + &sj_nest->embedding->nested_join->join_list :
> + &join->select_lex->top_join_list;
> Query_arena *arena, backup;
> arena= join->thd->activate_stmt_arena_if_needed(&backup);
> while ((tbl= child_li++))
> {
> if (tbl->table &&
> - !(inner_tables & tbl->table->map))
> + !(sj_nest->nested_join->used_tables & tbl->table->map))
>
> ØG: Any particular reason you prefer to use
> sj_nest->nested_join->used_tables over sj_nest->sj_inner_tables
> here?

I have a followup-patch from WL5266 that intends to remove sj_inner_tables, as 
there is no need to have both this and nested_join->used_tables. However, the 
most convenient thing was to delay this patch to after this bug was fixed, but 
then I did not know it would take this long time to close the bug...

The patch: http://lists.mysql.com/commits/110609

(It's probably not applicable as-is).

>
> {
> /*
> Pull the table up in the same way as simplify_joins() does:
> @@ -4574,7 +4564,7 @@
> }
>
> /* Remove the sj-nest itself if we've removed everything from it */
> - if (!inner_tables)
> + if (!sj_nest->nested_join->used_tables)
>
> ØG: And here?

See above.

>
> {
> List_iterator<TABLE_LIST> li(*upper_join_list);
> /* Find the sj_nest in the list. */
>
> @@ -4700,19 +4690,50 @@
> return len;
> }
>
> -
> /**
> - Calculate the best possible join and initialize the join structure.
> -
> - @retval
> - 0 ok
> - @retval
> - 1 Fatal error
> + Calculate best possible join order and initialize the join structure.
> +
> + @param join Join object that is populated with statistics data
> + @param tables_arg List of tables that is referenced by this query
> + @param conds Where condition of query
> + @param keyuse_array[out] Populated with key_use information
> + @param first_optimization True if first optimization of this query
> +
> + @return true if success, false if error
> +
> + @details
> + Here is an overview of the logic of this function:
> +
> + - Initialize JOIN data structures and setup basic dependencies between tables.
> +
> + - Update dependencies based on join information.
> +
> + - Make key descriptions (update_ref_and_keys()).
> +
> + - Pull out semi-join tables based on table dependencies.
> +
> + - Extract tables with zero or one rows as const tables.
> +
> + - Read contents of const tables, substitute columns from these tables with
> + actual data. Also keep track of empty tables vs. one-row tables.
> +
> + - After const table extraction based on row count, more tables may
> + have become functionally dependent. Extract these as const tables.
> +
> + - Add new sargable predicates based on retrieved const values.
> +
> + - Calculate number of rows to be retrieved from each table.
> +
> + - Calculate cost of potential semi-join materializations.
> +
> + - Calculate best possible join order based on available statistics.
> +
> + - Fill in remaining information for the generated join order.
> */
>
> static bool
> make_join_statistics(JOIN *join, TABLE_LIST *tables_arg, Item *conds,
> - Key_use_array *keyuse_array)
> + Key_use_array *keyuse_array, bool first_optimization)
> {
> int error;
> TABLE *table;
> @@ -4747,10 +4768,15 @@
> join->best_ref= stat_vector;
>
> stat_end= stat+table_count;
> + join->const_table_map= 0;
> join->found_const_table_map= 0;
> join->all_table_map= 0;
> const_count= 0;
>
> + /*
> + Initialize data structures for tables to be joined.
> + Initialize dependencies between tables.
> + */
> for (s= stat, i= 0;
> tables;
> s++, tables= tables->next_leaf, i++)
> @@ -4777,31 +4803,10 @@
> table->quick_condition_rows= table->file->stats.records;
>
> s->on_expr_ref= &tables->on_expr;
> - if (*s->on_expr_ref)
> - {
> - /* s is the only inner table of an outer join */
> -#ifdef WITH_PARTITION_STORAGE_ENGINE
> - if ((!table->file->stats.records || table->no_partitions_used) &&
> - !tables->in_outer_join_nest())
> -#else
> - if (!table->file->stats.records && !tables->in_outer_join_nest())
> -#endif
> - { // Empty table
> - s->dependent= 0; // Ignore LEFT JOIN depend.
> - set_position(join, const_count++, s, NULL);
> - continue;
> - }
> - outer_join|= table->map;
> - s->embedding_map= 0;
> - for (TABLE_LIST *embedding= tables->embedding;
> - embedding;
> - embedding= embedding->embedding)
> - s->embedding_map|= embedding->nested_join->nj_map;
> - continue;
> - }
> +
> if (tables->in_outer_join_nest())
> {
> - /* s belongs to a nested join, maybe to several embedded joins */
> + /* s belongs to a nested join, maybe to several embedding joins */
> s->embedding_map= 0;
> for (TABLE_LIST *embedding= tables->embedding;
> embedding;
> @@ -4812,20 +4817,16 @@
> s->dependent|= embedding->dep_tables;
> outer_join|= nested_join->used_tables;
> }
> - continue;
> }
> -#ifdef WITH_PARTITION_STORAGE_ENGINE
> - const bool no_partitions_used= table->no_partitions_used;
> -#else
> - const bool no_partitions_used= FALSE;
> -#endif
> - if ((table->s->system || table->file->stats.records <= 1 ||
> - no_partitions_used) &&
> - !s->dependent &&
> - (table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT) &&
> - !table->fulltext_searched && !join->no_const_tables)
> + else if (*s->on_expr_ref)
> {
> - set_position(join, const_count++, s, NULL);
> + /* s is the only inner table of an outer join */
> + outer_join|= table->map;
> + s->embedding_map= 0;
> + for (TABLE_LIST *embedding= tables->embedding;
> + embedding;
> + embedding= embedding->embedding)
> + s->embedding_map|= embedding->nested_join->nj_map;
> }
> }
> stat_vector[i]=0;
> @@ -4834,6 +4835,7 @@
> if (join->outer_join)
> {
> /*
> + Complete the dependency analysis.
> Build transitive closure for relation 'to be dependent on'.
> This will speed up the plan search for many cases with outer joins,
> as well as allow us to catch illegal cross references.
> @@ -4893,8 +4895,102 @@
> ~outer_join, join->select_lex, &sargables))
> goto error;
>
> - /* Read tables with 0 or 1 rows (system tables) */
> - join->const_table_map= 0;
> + /*
> + Pull out semi-join tables based on dependencies. Dependencies are valid
> + throughout the lifetime of a query, so this operation can be performed
> + on the first optimization only.
> + */
> + if (first_optimization)
> + {
> + if (pull_out_semijoin_tables(join))
> + DBUG_RETURN(true);
> + }
> +
> + /*
> + Extract const tables based on row counts, must be done for each execution.
> + Tables containing exactly zero or one rows are marked as const, but
> + notice the additional constraints checked below.
> + Tables that are extracted have their rows read before actual execution
> + starts and are placed in the beginning of the join_tab array.
> + Thus, they do not take part in join order optimization process,
> + which can significantly reduce the optimization time.
> + The data read from these tables can also be regarded as "constant"
> + throughout query execution, hence the column values can be used for
> + additional constant propagation and extraction of const tables based
> + on eq-ref properties.
> + */
> + enum enum_const_table_extraction
> + {
> + extract_no_table= 0,
> + extract_empty_table= 1,
> + extract_const_table= 2
> + };
> +
> + for (i= 0, s= stat; i < table_count; i++, s++)
> + {
> + TABLE *const table= s->table;
> + TABLE_LIST *const tables= table->pos_in_table_list;
> + enum enum_const_table_extraction extract_method= extract_const_table;
> +
> +#ifdef WITH_PARTITION_STORAGE_ENGINE
> + const bool no_partitions_used= table->no_partitions_used;
> +#else
> + const bool no_partitions_used= false;
> +#endif
> +
> + if (tables->in_outer_join_nest())
> + {
> + /*
> + Table belongs to a nested join, no candidate for const table extraction.
> + */
> + extract_method= extract_no_table;
> + }
> + else if (tables->embedding && tables->embedding->sj_on_expr)
> + {
> + /*
> + Table belongs to a semi-join.
> + We do not currently pull out const tables from semi-join nests.
> + */
> + extract_method= extract_no_table;
> + }
> + else if (*s->on_expr_ref)
> + {
> + /* s is the only inner table of an outer join, extract empty tables */
> + extract_method= extract_empty_table;
> + }
> + switch (extract_method)
> + {
> + case extract_no_table:
> + break;
> +
> + case extract_empty_table:
> + if ((table->file->stats.records == 0 ||
> + no_partitions_used) &&
> + (table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT) &&
> + !join->no_const_tables)
> + set_position(join, const_count++, s, NULL);
> + break;
> +
> + case extract_const_table:
> + /*
> + Extract tables with zero or one rows, but do not extract tables that
> + 1. are dependent upon other tables, or
> + 2. have no exact statistics, or
> + 3. are full-text searched, or
> + 4. the join object does not allow const table elimination.
> + */
> + if ((table->s->system ||
> + table->file->stats.records <= 1 ||
> + no_partitions_used) &&
> + !s->dependent && // 1
> + (table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT) && //
> 2
> + !table->fulltext_searched && // 3
> + !join->no_const_tables) // 4
> + set_position(join, const_count++, s, NULL);
> + break;
>
> ØG: I am bit concerned that large parts of this if statement has to
> be repeated twice. Is it not some way that this code could be
> shared? It also seems a bit strange that it is considered that an
> explanation is only needed the second time.

Explanation added for the first branch as well.

I took out the test on join_no_const_tables. Unfortunately, it does introduce a 
goto, but it will be possible to remove it again if we rafactor this function 
into multiple ones.
>
> + }
> + }
> + /* Read const tables (tables matching no more than 1 rows) */
>
> for (POSITION *p_pos=join->positions, *p_end=p_pos+const_count;
> p_pos < p_end ;
> @@ -5015,14 +5111,19 @@
> } while (keyuse->table == table && keyuse->key == key);
>
> /*
> + Extract const tables with proper key dependencies.
> + Exclude tables that
> + 1. are full-text searched, or
> + 2. are part of nested outer join.
> +
> TODO (low priority): currently we ignore the const tables that
> are within a semi-join nest which is within an outer join nest.
> The effect of this is that we don't do const substitution for
> such tables.
>
> ØG: Is the above TODO meaningful anymore? We have decided to ignore
> all const tables within semi-join nests so why should we be concerned
> about a sub-class of those? I suggest deleting this.

Agree, done.
>
> */
> if (eq_part.is_prefix(table->key_info[key].key_parts) &&
> - !table->fulltext_searched &&
> - !table->pos_in_table_list->in_outer_join_nest())
> + !table->fulltext_searched && // 1
> + !table->pos_in_table_list->in_outer_join_nest()) // 2
> {
> if (table->key_info[key].flags & HA_NOSAME)
> {
> @@ -5081,6 +5182,8 @@
>
> for (s=stat ; s < stat_end ; s++)
> {
> + TABLE_LIST *const tl= s->table->pos_in_table_list;
> +
> if (s->type == JT_SYSTEM || s->type == JT_CONST)
> {
> /* Only one matching row */
> @@ -5113,9 +5216,8 @@
> Do range analysis if we're on the inner side of a semi-join (3).
> */
> if (!s->const_keys.is_clear_all() && // (1)
> - (!s->table->pos_in_table_list->embedding || // (2)
> - (s->table->pos_in_table_list->embedding && // (3)
> - s->table->pos_in_table_list->embedding->sj_on_expr))) // (3)
> + (!tl->embedding || // (2)
> + (tl->embedding && tl->embedding->sj_on_expr))) // (3)
> {
> ha_rows records;
> SQL_SELECT *select;
> @@ -5133,7 +5235,13 @@
> s->quick=select->quick;
> s->needed_reg=select->needed_reg;
> select->quick=0;
> - if (records == 0 && s->table->reginfo.impossible_range)
> + /*
> + Check for "impossible range", make sure that semi-joined tables
> + are not accounted for.
>
> ØG: What do you mean by not accounted for, and why is this necessary?
> Please, improve comment.

Problem is that this is yet another "const table pullout" based on data volume, 
which we no longer support. Comment updated.

>
> + */
> + if (records == 0 &&
> + s->table->reginfo.impossible_range &&
> + (!(tl->embedding && tl->embedding->sj_on_expr)))
> {
> /*
> Impossible WHERE or ON expression
> @@ -5161,18 +5269,14 @@
> delete select;
> }
> }
> -
> - if (pull_out_semijoin_tables(join))
> - DBUG_RETURN(TRUE);
> -
> /*
> - Set pointer to embedding semijoin nest for all semijoined tables.
> - Note that this must be done for every table inside all semijoin nests,
> - even for tables within outer join nests embedded in semijoin nests.
> - A table can never be part of multiple semijoin nests, hence no
> + Set pointer to embedding semi-join nest for all semi-joined tables.
> + Note that this must be done for every table inside all semi-join nests,
> + even for tables within outer join nests embedded in semi-join nests.
> + A table can never be part of multiple semi-join nests, hence no
> ambiguities can ever occur.
> Note also that the pointer is not set for TABLE_LIST objects that
> - are outer join nests within semijoin nests.
> + are outer join nests within semi-join nests.
> */
> for (s= stat; s < stat_end; s++)
> {

Thanks,
Roy
Thread
bzr commit into mysql-trunk branch (roy.lyseng:3374) Bug#11752543Roy Lyseng19 May
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3374) Bug#11752543Øystein Grøvlen10 Jun
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3374) Bug#11752543Roy Lyseng27 Jun