List:Commits« Previous MessageNext Message »
From:Øystein Grøvlen Date:June 10 2011 9:58am
Subject:Re: bzr commit into mysql-trunk branch (roy.lyseng:3374) Bug#11752543
View as plain text  
Hi Roy,

Thanks for patch.  Looks good, and I will approve it.  See below for
some comments that you may consider. (Prefixed with ØG)

=== modified file 'sql/sql_select.cc'
--- sql/sql_select.cc	2011-05-18 13:12:02 +0000
+++ sql/sql_select.cc	2011-05-19 10:56:42 +0000

...

@@ -4429,32 +4433,35 @@
  }


-/*
-  Pull tables out of semi-join nests, if possible
-
-  SYNOPSIS
-    pull_out_semijoin_tables()
-      join  The join where to do the semi-join flattening
-
-  DESCRIPTION
-    Try to pull tables out of semi-join nests.
-
+/**
+  Pull const tables out of semi-join nests

ØG: This use of the term "const table" seems to be in conflict with
     existing usage, and will probably create confusion.  I suggest
     "Pull tables out of semi-join nests based on functional
     dependencies" instead.

+
+  @param join  The join where to do the semi-join table pullout
+
+  @return False if successful, true if error (Out of memory)
+
+  @details
+    Pull const tables out of semi-join nests based on functional 
dependencies,

ØG: Please, remove "const" above.

+    ie. if a table is accessed via eq_ref(outer_tables).
+    The function may be called several times, the caller is responsible
+    for setting up proper key information that this function acts upon.
+
      PRECONDITIONS
      When this function is called, the join may have several semi-join 
nests
      but it is guaranteed that one semi-join nest does not contain another.
-
-    ACTION
-    A table can be pulled out of the semi-join nest if
-     - It is a constant table, or
-     - It is accessed via eq_ref(outer_tables)
+    For functionally dependent tables to be pulled out, key information 
must
+    have been calculated (see update_ref_and_keys()).

      POSTCONDITIONS
-     * Semi-join nests' TABLE_LIST::sj_inner_tables is updated accordingly
-
-    This operation is (and should be) performed at each PS execution since
-    tables may become/cease to be constant across PS reexecutions.
+     * Tables that were pulled out are removed from the semi-join nest they
+       belonged to and added to the parent join nest.
+     * For these tables, the used_tables and not_null_tables fields of
+       the semi-join nest they belonged to will be adjusted.
+       The semi-join nest is also marked as correlated, and
+       sj_corr_tables and sj_depends_on are adjusted if necessary.
+     * Semi-join nests' sj_inner_tables is set equal to used_tables

-  NOTE
+    NOTE
      Table pullout may make uncorrelated subquery correlated. Consider this
      example:

@@ -4466,13 +4473,9 @@
      make the subquery (i.e. its semi-join nest) correlated.
      Making the subquery (i.e. its semi-join nest) correlated prevents 
us from
      using Materialization or LooseScan to execute it.
-
-  RETURN
-    FALSE - OK
-    TRUE  - Out of memory error
  */

-bool pull_out_semijoin_tables(JOIN *join)
+static bool pull_out_semijoin_tables(JOIN *join)
  {
    TABLE_LIST *sj_nest;
    DBUG_ENTER("pull_out_semijoin_tables");
@@ -4484,29 +4487,14 @@

    /* Try pulling out of the each of the semi-joins */

ØG: While you are changing this part of the code, I suggest you remove
     "the" from comment above?

    while ((sj_nest= sj_list_it++))
-  {
-    /* Action #1: Mark the constant tables to be pulled out */
+  {
      table_map pulled_tables= 0;
-
      List_iterator<TABLE_LIST> child_li(sj_nest->nested_join->join_list);
      TABLE_LIST *tbl;
-    while ((tbl= child_li++))
-    {
-      if (tbl->table)
-      {
-        if (tbl->table->map & join->const_table_map)
-        {
-          pulled_tables |= tbl->table->map;
-          DBUG_PRINT("info", ("Table %s pulled out (reason: constant)",
-                              tbl->table->alias));
-        }
-      }
-    }
-
      /*
-      Action #2: Find which tables we can pull out based on
-      update_ref_and_keys() data. Note that pulling one table out can allow
-      us to pull out some other tables too.
+      Find which tables we can pull out based on key dependency data.
+      Note that pulling one table out can allow us to pull out some
+      other tables too.
      */
      bool pulled_a_table;
      do
@@ -4538,23 +4526,25 @@

      child_li.rewind();
      /*
-      Action #3: Move the pulled out TABLE_LIST elements to the parents.
+      Move the pulled out TABLE_LIST elements to the parents.
      */
-    table_map inner_tables= sj_nest->nested_join->used_tables &
-                            ~pulled_tables;
-    /* Record the bitmap of inner tables */
-    sj_nest->sj_inner_tables= inner_tables;
+    sj_nest->nested_join->used_tables&= ~pulled_tables;
+    sj_nest->nested_join->not_null_tables&= ~pulled_tables;
+
+    /* sj_inner_tables is a copy of nested_join->used_tables */
+    sj_nest->sj_inner_tables= sj_nest->nested_join->used_tables;
+
      if (pulled_tables)
      {
-      List<TABLE_LIST> *upper_join_list= (sj_nest->embedding != NULL)?
- 
(&sj_nest->embedding->nested_join->join_list):
- 
(&join->select_lex->top_join_list);
+      List<TABLE_LIST> *upper_join_list= (sj_nest->embedding != NULL) ?
+          &sj_nest->embedding->nested_join->join_list :
+          &join->select_lex->top_join_list;
        Query_arena *arena, backup;
        arena= join->thd->activate_stmt_arena_if_needed(&backup);
        while ((tbl= child_li++))
        {
          if (tbl->table &&
-            !(inner_tables & tbl->table->map))
+            !(sj_nest->nested_join->used_tables & tbl->table->map))

ØG: Any particular reason you prefer to use
     sj_nest->nested_join->used_tables over sj_nest->sj_inner_tables
     here?

          {
            /*
              Pull the table up in the same way as simplify_joins() does:
@@ -4574,7 +4564,7 @@
        }

        /* Remove the sj-nest itself if we've removed everything from it */
-      if (!inner_tables)
+      if (!sj_nest->nested_join->used_tables)

ØG: And here?

        {
          List_iterator<TABLE_LIST> li(*upper_join_list);
          /* Find the sj_nest in the list. */

@@ -4700,19 +4690,50 @@
    return len;
  }

-
  /**
-  Calculate the best possible join and initialize the join structure.
-
-  @retval
-    0	ok
-  @retval
-    1	Fatal error
+  Calculate best possible join order and initialize the join structure.
+
+  @param  join          Join object that is populated with statistics data
+  @param  tables_arg    List of tables that is referenced by this query
+  @param  conds         Where condition of query
+  @param  keyuse_array[out] Populated with key_use information
+  @param  first_optimization True if first optimization of this query
+
+  @return true if success, false if error
+
+  @details
+  Here is an overview of the logic of this function:
+
+  - Initialize JOIN data structures and setup basic dependencies 
between tables.
+
+  - Update dependencies based on join information.
+
+  - Make key descriptions (update_ref_and_keys()).
+
+  - Pull out semi-join tables based on table dependencies.
+
+  - Extract tables with zero or one rows as const tables.
+
+  - Read contents of const tables, substitute columns from these tables 
with
+    actual data. Also keep track of empty tables vs. one-row tables.
+
+  - After const table extraction based on row count, more tables may
+    have become functionally dependent. Extract these as const tables.
+
+  - Add new sargable predicates based on retrieved const values.
+
+  - Calculate number of rows to be retrieved from each table.
+
+  - Calculate cost of potential semi-join materializations.
+
+  - Calculate best possible join order based on available statistics.
+
+  - Fill in remaining information for the generated join order.
  */

  static bool
  make_join_statistics(JOIN *join, TABLE_LIST *tables_arg, Item *conds,
-                     Key_use_array *keyuse_array)
+                     Key_use_array *keyuse_array, bool first_optimization)
  {
    int error;
    TABLE *table;
@@ -4747,10 +4768,15 @@
    join->best_ref= stat_vector;

    stat_end= stat+table_count;
+  join->const_table_map= 0;
    join->found_const_table_map= 0;
    join->all_table_map= 0;
    const_count= 0;

+  /*
+    Initialize data structures for tables to be joined.
+    Initialize dependencies between tables.
+  */
    for (s= stat, i= 0;
         tables;
         s++, tables= tables->next_leaf, i++)
@@ -4777,31 +4803,10 @@
      table->quick_condition_rows= table->file->stats.records;

      s->on_expr_ref= &tables->on_expr;
-    if (*s->on_expr_ref)
-    {
-      /* s is the only inner table of an outer join */
-#ifdef WITH_PARTITION_STORAGE_ENGINE
-      if ((!table->file->stats.records || table->no_partitions_used) &&
-          !tables->in_outer_join_nest())
-#else
-      if (!table->file->stats.records && !tables->in_outer_join_nest())
-#endif
-      {						// Empty table
-        s->dependent= 0;                        // Ignore LEFT JOIN depend.
-	set_position(join, const_count++, s, NULL);
-	continue;
-      }
-      outer_join|= table->map;
-      s->embedding_map= 0;
-      for (TABLE_LIST *embedding= tables->embedding;
-           embedding;
-           embedding= embedding->embedding)
-        s->embedding_map|= embedding->nested_join->nj_map;
-      continue;
-    }
+
      if (tables->in_outer_join_nest())
      {
-      /* s belongs to a nested join, maybe to several embedded joins */
+      /* s belongs to a nested join, maybe to several embedding joins */
        s->embedding_map= 0;
        for (TABLE_LIST *embedding= tables->embedding;
             embedding;
@@ -4812,20 +4817,16 @@
          s->dependent|= embedding->dep_tables;
          outer_join|= nested_join->used_tables;
        }
-      continue;
      }
-#ifdef WITH_PARTITION_STORAGE_ENGINE
-    const bool no_partitions_used= table->no_partitions_used;
-#else
-    const bool no_partitions_used= FALSE;
-#endif
-    if ((table->s->system || table->file->stats.records <= 1 ||
-         no_partitions_used) &&
-	!s->dependent &&
-	(table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT) &&
-        !table->fulltext_searched && !join->no_const_tables)
+    else if (*s->on_expr_ref)
      {
-      set_position(join, const_count++, s, NULL);
+      /* s is the only inner table of an outer join */
+      outer_join|= table->map;
+      s->embedding_map= 0;
+      for (TABLE_LIST *embedding= tables->embedding;
+           embedding;
+           embedding= embedding->embedding)
+        s->embedding_map|= embedding->nested_join->nj_map;
      }
    }
    stat_vector[i]=0;
@@ -4834,6 +4835,7 @@
    if (join->outer_join)
    {
      /*
+       Complete the dependency analysis.
         Build transitive closure for relation 'to be dependent on'.
         This will speed up the plan search for many cases with outer joins,
         as well as allow us to catch illegal cross references.
@@ -4893,8 +4895,102 @@
                              ~outer_join, join->select_lex, &sargables))
        goto error;

-  /* Read tables with 0 or 1 rows (system tables) */
-  join->const_table_map= 0;
+  /*
+    Pull out semi-join tables based on dependencies. Dependencies are valid
+    throughout the lifetime of a query, so this operation can be performed
+    on the first optimization only.
+  */
+  if (first_optimization)
+  {
+    if (pull_out_semijoin_tables(join))
+      DBUG_RETURN(true);
+  }
+
+  /*
+    Extract const tables based on row counts, must be done for each 
execution.
+    Tables containing exactly zero or one rows are marked as const, but
+    notice the additional constraints checked below.
+    Tables that are extracted have their rows read before actual execution
+    starts and are placed in the beginning of the join_tab array.
+    Thus, they do not take part in join order optimization process,
+    which can significantly reduce the optimization time.
+    The data read from these tables can also be regarded as "constant"
+    throughout query execution, hence the column values can be used for
+    additional constant propagation and extraction of const tables based
+    on eq-ref properties.
+  */
+  enum enum_const_table_extraction
+  {
+     extract_no_table=    0,
+     extract_empty_table= 1,
+     extract_const_table= 2
+  };
+
+  for (i= 0, s= stat; i < table_count; i++, s++)
+  {
+    TABLE      *const table= s->table;
+    TABLE_LIST *const tables= table->pos_in_table_list;
+    enum enum_const_table_extraction extract_method= extract_const_table;
+
+#ifdef WITH_PARTITION_STORAGE_ENGINE
+    const bool no_partitions_used= table->no_partitions_used;
+#else
+    const bool no_partitions_used= false;
+#endif
+
+    if (tables->in_outer_join_nest())
+    {
+      /*
+        Table belongs to a nested join, no candidate for const table 
extraction.
+      */
+      extract_method= extract_no_table;
+    }
+    else if (tables->embedding && tables->embedding->sj_on_expr)
+    {
+      /*
+        Table belongs to a semi-join.
+        We do not currently pull out const tables from semi-join nests.
+      */
+      extract_method= extract_no_table;
+    }
+    else if (*s->on_expr_ref)
+    {
+      /* s is the only inner table of an outer join, extract empty 
tables */
+      extract_method= extract_empty_table;
+    }
+    switch (extract_method)
+    {
+    case extract_no_table:
+      break;
+
+    case extract_empty_table:
+      if ((table->file->stats.records == 0 ||
+           no_partitions_used) &&
+          (table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT)
&&
+          !join->no_const_tables)
+        set_position(join, const_count++, s, NULL);
+      break;
+
+    case extract_const_table:
+      /*
+        Extract tables with zero or one rows, but do not extract tables 
that
+         1. are dependent upon other tables, or
+         2. have no exact statistics, or
+         3. are full-text searched, or
+         4. the join object does not allow const table elimination.
+      */
+      if ((table->s->system ||
+           table->file->stats.records <= 1 ||
+           no_partitions_used) &&
+          !s->dependent && 
   // 1
+          (table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT) 
&& // 2
+          !table->fulltext_searched && 
   // 3
+          !join->no_const_tables) 
   // 4
+        set_position(join, const_count++, s, NULL);
+      break;

ØG:  I am bit concerned that large parts of this if statement has to
      be repeated twice.  Is it not some way that this code could be
      shared?  It also seems a bit strange that it is considered that an
      explanation is only needed the second time.

+    }
+  }
+  /* Read const tables (tables matching no more than 1 rows) */

    for (POSITION *p_pos=join->positions, *p_end=p_pos+const_count;
         p_pos < p_end ;
@@ -5015,14 +5111,19 @@
  	  } while (keyuse->table == table && keyuse->key == key);

            /*
+            Extract const tables with proper key dependencies.
+            Exclude tables that
+             1. are full-text searched, or
+             2. are part of nested outer join.
+
              TODO (low priority): currently we ignore the const tables that
              are within a semi-join nest which is within an outer join 
nest.
              The effect of this is that we don't do const substitution for
              such tables.

ØG: Is the above TODO meaningful anymore?  We have decided to ignore
     all const tables within semi-join nests so why should we be concerned
     about a sub-class of those?  I suggest deleting this.

            */
  	  if (eq_part.is_prefix(table->key_info[key].key_parts) &&
-              !table->fulltext_searched &&
-              !table->pos_in_table_list->in_outer_join_nest())
+              !table->fulltext_searched &&                           // 1
+              !table->pos_in_table_list->in_outer_join_nest())       // 2
  	  {
              if (table->key_info[key].flags & HA_NOSAME)
              {
@@ -5081,6 +5182,8 @@

    for (s=stat ; s < stat_end ; s++)
    {
+    TABLE_LIST *const tl= s->table->pos_in_table_list;
+
      if (s->type == JT_SYSTEM || s->type == JT_CONST)
      {
        /* Only one matching row */
@@ -5113,9 +5216,8 @@
        Do range analysis if we're on the inner side of a semi-join (3).
      */
      if (!s->const_keys.is_clear_all() &&                        // (1)
-        (!s->table->pos_in_table_list->embedding ||             // (2)
-         (s->table->pos_in_table_list->embedding &&             // (3)
-          s->table->pos_in_table_list->embedding->sj_on_expr))) // (3)
+        (!tl->embedding ||                                      // (2)
+         (tl->embedding && tl->embedding->sj_on_expr)))         // (3)
      {
        ha_rows records;
        SQL_SELECT *select;
@@ -5133,7 +5235,13 @@
        s->quick=select->quick;
        s->needed_reg=select->needed_reg;
        select->quick=0;
-      if (records == 0 && s->table->reginfo.impossible_range)
+      /*
+        Check for "impossible range", make sure that semi-joined tables
+        are not accounted for.

ØG: What do you mean by not accounted for, and why is this necessary?
     Please, improve comment.

+      */
+      if (records == 0 &&
+          s->table->reginfo.impossible_range &&
+          (!(tl->embedding && tl->embedding->sj_on_expr)))
        {
  	/*
  	  Impossible WHERE or ON expression
@@ -5161,18 +5269,14 @@
        delete select;
      }
    }
-
-  if (pull_out_semijoin_tables(join))
-    DBUG_RETURN(TRUE);
-
    /*
-    Set pointer to embedding semijoin nest for all semijoined tables.
-    Note that this must be done for every table inside all semijoin nests,
-    even for tables within outer join nests embedded in semijoin nests.
-    A table can never be part of multiple semijoin nests, hence no
+    Set pointer to embedding semi-join nest for all semi-joined tables.
+    Note that this must be done for every table inside all semi-join nests,
+    even for tables within outer join nests embedded in semi-join nests.
+    A table can never be part of multiple semi-join nests, hence no
      ambiguities can ever occur.
      Note also that the pointer is not set for TABLE_LIST objects that
-    are outer join nests within semijoin nests.
+    are outer join nests within semi-join nests.
    */
    for (s= stat; s < stat_end; s++)
    {

--
Øystein

Thread
bzr commit into mysql-trunk branch (roy.lyseng:3374) Bug#11752543Roy Lyseng19 May
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3374) Bug#11752543Øystein Grøvlen10 Jun
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3374) Bug#11752543Roy Lyseng27 Jun