List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:February 7 2011 1:36pm
Subject:Re: bzr commit into mysql-trunk branch (olav.sandstaa:3559) Bug#59696
View as plain text  
Hi Olav,

thank you for a clever solution to an ugly bug.

As far as I understand the code, the fix is correct. I just wonder if you should 
add an assert on the transform of an Item_ref that will trigger when anything 
but the original object is returned. The reason for this is that I think that 
Item_ref objects can be used as helper objects to encapsulate other item 
objects, in order to prevent having to perform multiple updates if an item is 
being replaced. E.g, if an item is used both in WHERE and HAVING, the HAVING 
item is encapsulated in an Item_ref, so that substituting the original item does 
not have a consequence within the HAVING clause, only within the Item_ref.

Some minor comments and suggestions below.

On 27.01.11 16.51, Olav Sandstaa wrote:
> #At file:///export/home/tmp/mysql2/opt-bug59696/ based on
> revid:anitha.gopi@stripped
>
>   3559 Olav Sandstaa	2011-01-27
>        Fix for Bug#59696 Optimizer fails to move WHERE condition on JOIN column
>                          when joining with a view
>
>        When optimizing a JOIN of two tables the optimizer tries to move WHERE
>        conditions on the join column(s) to the first table in the JOIN in
>        order to reduce the number of records it has to read. If the second
>        operand of the JOIN is a view that is identical to the underlying
>        table the optimizer should be able to perform the same optimization.
>        In this case it fails to move the WHERE condition on the join column
>        to the first table.
>
>        The cause for this is that when the second operand is a view then the
>        WHERE clauses is stored using Item_ref object that references the real
suggest rewrite to:
"conditions contain Item_ref objects that reference the real item field objects.
>        item objects. The existing code fails to evaluate these since it only
>        evaluates the Item_ref object and fails to look at the real
>        objects. As a result it fails to detect that the WHERE clause on the
WHERE clause -> equals predicate
>        second join operand can be substituted by a corresponding WHERE clause
>        on the first operand.
ditto here
>
>        There are two places where we fail to handle Item_ref correctly and that
>        causes the optimizer to fail to do this optimization:
>
>        1. In optimize_cond() we call build_equal_items() which again calls
>        build_equal_items_for_cond() where we in Item_func::compile() fail to
>        evaluate the Item::equal_fields_propagator() for the condition on the
>        view operand. As a result the field used in the condition for the view
>        is not considered as an equal field to the join column of the first
>        table when this later is evaluated in make_join_statistics().
>
>        2. Later in JOIN::optimize() we call substitute_for_best_equal_field()
>        which uses Item_func::transform() to select the best condition of the
>        tables. Item_func::transform() fails to evaluate the different
>        conditions that are equal due to these being stored in a Item_ref
>        object.
>
>        The fix for these problems is to implement compile() and transform()
>        methods for the Item_ref class that will do the proper evaluation of
>        the compile() and transform on the objects the Item_ref object has the
>        reference to. This will ensure that when Item_func::compile() and
>        Item_func::transform() evaluates Item_ref objects both the Item_ref
>        object and the object it references get evaluated.
Please try to simplify these two sentences.
>       @ mysql-test/r/join.result
>          Test case for Bug#59696 Optimizer fails to move WHERE condition
>          on JOIN column when joining with a view.
>       @ mysql-test/r/subquery_sj_none.result
>          Change in explain/query plan due to the fix for Bug#59696 "Optimizer
>          fails to move WHERE condition on JOIN column when joining with a view"
>
>          The change in query plan for the following query:
>
>            select * from t2 where (b,a) in (select a,b from t1 where a=3);
>
>          where the table t2 is defined as:
>
>            create table t2 as select * from t1;
>
>          is caused by that when the optimizer optimizes the subquery it will
>          try to optimize the conditions (in optimize_cond()). One of the things
>          it does it to call build_equal_items_for_cond() for the following
>          condition:
>
>            (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`)
>
>          When this is "compiled" in build_equal_items_for_cond() both the
>          <cache>(`test`.`t2`.`b`) and `test`.`t1`.`a` are stored in Item_ref
>          objects. So with the existing code the real items does not get
>          "compiled" and are not considered for alternatives. So the result of
>          the "compilation" is:
>
>            (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`)
>
>          With the patch applied both of the real items get evaluated and the
>          code is able to find that `test`.`t1`.`a` can be replaced by the
>          constant "3". So the result is from the "compilation" of this condition is
typo: remove first "is"
>
>            <cache>(`test`.`t2`.`b`) = 3
>
>          Which causes the execution plan for the subquery to change as it has.

Thank you for the detailed explanation.

>       @ mysql-test/t/join.test
>          Test case for Bug#59696 Optimizer fails to move WHERE condition
>          on JOIN column when joining with a view.
>       @ sql/item.cc
>          Implement transform() and compile() methods for the Item_ref class.
methods -> functions. Also one line below and four lines below.
>          These methods will then do transform/compile for both the
>          Item_ref object and on the object it has a reference to.
>       @ sql/item.h
>          Implement transform() and compile() methods for the Item_ref class.
>
>      modified:
>        mysql-test/r/join.result
>        mysql-test/r/subquery_sj_none.result
>        mysql-test/t/join.test
>        sql/item.cc
>        sql/item.h
> === modified file 'mysql-test/r/join.result'
>
> === modified file 'mysql-test/r/join.result'
> --- a/mysql-test/r/join.result	2010-12-16 17:38:26 +0000
> +++ b/mysql-test/r/join.result	2011-01-27 15:51:29 +0000
> @@ -1222,3 +1222,40 @@
>   DEALLOCATE PREPARE stmt;
>   DROP TABLE t1;
>   End of 5.1 tests
> +#
> +# Bug #59696 Optimizer fails to move WHERE condition on JOIN column
> +#            when joining with a view
> +#
> +CREATE TABLE t1 (
> +c1 INTEGER NOT NULL
> +);
> +INSERT INTO t1 VALUES (1),(2),(3);
> +CREATE TABLE t2 (
> +pk INTEGER NOT NULL,
> +c1 INTEGER NOT NULL,
> +PRIMARY KEY (pk)
> +);
> +INSERT INTO t2 VALUES (1,4),(3,5),(2,6);
> +EXPLAIN SELECT t2.pk, t2.c1 FROM t2, t1
> +WHERE t2.pk = t1.c1 AND t2.pk>= 2;
> +id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
> +1	SIMPLE	t1	ALL	NULL	NULL	NULL	NULL	3	Using where
> +1	SIMPLE	t2	eq_ref	PRIMARY	PRIMARY	4	test.t1.c1	1	
> +SELECT t2.pk, t2.c1 FROM t2, t1
> +WHERE t2.pk = t1.c1 AND t2.pk>= 2;
> +pk	c1
> +2	6
> +3	5
Proposal: Do
let $query= SELECT ...;
eval explain $query;
eval $query;

so that you need to specify the query only once.
> +CREATE VIEW v_t2 AS SELECT * FROM t2;
> +EXPLAIN SELECT v_t2.pk, v_t2.c1 FROM v_t2, t1
> +WHERE v_t2.pk = t1.c1 AND v_t2.pk>= 2;
> +id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
> +1	SIMPLE	t1	ALL	NULL	NULL	NULL	NULL	3	Using where
> +1	SIMPLE	t2	eq_ref	PRIMARY	PRIMARY	4	test.t1.c1	1	
> +SELECT v_t2.pk, v_t2.c1 FROM v_t2, t1
> +WHERE v_t2.pk = t1.c1 AND v_t2.pk>= 2;
> +pk	c1
> +2	6
> +3	5
> +DROP VIEW v_t2;
> +DROP TABLE t1, t2;
>
> === modified file 'mysql-test/r/subquery_sj_none.result'
> --- a/mysql-test/r/subquery_sj_none.result	2011-01-13 10:48:28 +0000
> +++ b/mysql-test/r/subquery_sj_none.result	2011-01-27 15:51:29 +0000
> @@ -2543,7 +2543,7 @@
>   explain select * from t2 where (b,a) in (select a,b from t1 where a=3);
>   id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
>   1	PRIMARY	t2	ALL	NULL	NULL	NULL	NULL	100	Using where
> -2	DEPENDENT SUBQUERY	t1	index_subquery	a	a	10	func,func	1	Using index; Using where
> +2	DEPENDENT SUBQUERY	t1	ref	a	a	10	const,func	1	Using where; Using index
>   drop table t1,t2;
>   create table t1 (a int, b int);
>   insert into t1 select a,a from t0;
>
> === modified file 'mysql-test/t/join.test'
> --- a/mysql-test/t/join.test	2010-12-16 17:38:26 +0000
> +++ b/mysql-test/t/join.test	2011-01-27 15:51:29 +0000
> @@ -934,3 +934,43 @@
>   DROP TABLE t1;
>
>   --echo End of 5.1 tests
> +
> +--echo #
> +--echo # Bug #59696 Optimizer fails to move WHERE condition on JOIN column
> +--echo #            when joining with a view
> +--echo #
> +
> +CREATE TABLE t1 (
> +  c1 INTEGER NOT NULL
> +);
> +
> +INSERT INTO t1 VALUES (1),(2),(3);
> +
> +CREATE TABLE t2 (
> +  pk INTEGER NOT NULL,
> +  c1 INTEGER NOT NULL,
> +  PRIMARY KEY (pk)
> +);
> +
> +INSERT INTO t2 VALUES (1,4),(3,5),(2,6);
> +
> +let $query=
> +SELECT t2.pk, t2.c1 FROM t2, t1
> +WHERE t2.pk = t1.c1 AND t2.pk>= 2;
> +
> +eval EXPLAIN $query;
> +eval $query;
> +
> +# Create a view on one of the tables. The same query plan should
> +# be used when joining with this view as with the underlying table.
> +CREATE VIEW v_t2 AS SELECT * FROM t2;
> +
> +let $query=
> +SELECT v_t2.pk, v_t2.c1 FROM v_t2, t1
> +WHERE v_t2.pk = t1.c1 AND v_t2.pk>= 2;
> +
> +eval EXPLAIN $query;
> +eval $query;
> +
> +DROP VIEW v_t2;
> +DROP TABLE t1, t2;
>
> === modified file 'sql/item.cc'
> --- a/sql/item.cc	2011-01-18 10:32:38 +0000
> +++ b/sql/item.cc	2011-01-27 15:51:29 +0000
> @@ -6518,6 +6518,85 @@
>   }
>
>
> +/**
> +  Transform an Item_ref object with a transformer callback function.
> +
> +  The function first applies the transform method to the item
method -> function
> +  referenced by this Item_reg object. If this returns a new item the
type: Item_reg -> Item_ref
> +  old item is substituted for a new one. After this the transformer
> +  is applied to the Item_ref object.

I think that you can remove references to error handling in the comment without 
loss of clarity.
> +
> +  @param transformer   the transformer callback function to be applied to
> +                       the nodes of the tree of the object
> +  @param argument      parameter to be passed to the transformer
> +
> +  @return Item returned as the result of transformation of the Item_ref object
> +    @retval !NULL The transformation was successful
> +    @retval NULL  Out of memory error
> +*/
> +
> +Item* Item_ref::transform(Item_transformer transformer, uchar *arg)
> +{
> +  DBUG_ASSERT(!current_thd->is_stmt_prepare());
> +  DBUG_ASSERT((*ref) != NULL);
> +
> +  /* Transform the object we are referencing. */
> +  Item *new_item= (*ref)->transform(transformer, arg);
> +  if (!new_item)
> +    return NULL;
> +
> +  /*
> +    THD::change_item_tree() should be called only if the tree was
> +    really transformed, i.e. when a new item has been created.
> +    Otherwise we'll be allocating a lot of unnecessary memory for
> +    change records at each execution.
> +  */
Proposal:
/* Record the new item in the 'ref' pointer, in a manner safe for prepared 
execution. */

IMHO, the argumentation for how this is called should be documented with 
change_item_tree() and not where it is called.
> +  if (*ref != new_item)
> +    current_thd->change_item_tree(ref, new_item);
> +
> +  /* Transform the item ref object. */
> +  return (this->*transformer)(arg);

Consider asserting that the return value from the transform is the same as 
"this", as suggested above.
> +}
> +
> +
> +/**
> +  Compile an Item_ref object with a processor and a transformer
> +  callback functions.
functions -> function.
> +
> +  First the function applies the analyzer to the Item_ref object. Then
> +  if the analizer succeeeds we first applies the compile method to the
> +  object the Item_ref object is referencing.  If this returns a new
> +  item the old item is substituted for a new one.  After this the
> +  transformer is applied to the Item_ref object itself.

I think that you can omit error handling details here too.
> +
> +  @param analyzer      the analyzer callback function to be applied to the
> +                       nodes of the tree of the object
> +  @param[in,out] arg_p parameter to be passed to the processor
> +  @param transformer   the transformer callback function to be applied to the
> +                       nodes of the tree of the object
> +  @param arg_t         parameter to be passed to the transformer
> +
> +  @return Item returned as the result of transformation of the Item_ref object
> +*/
> +
> +Item* Item_ref::compile(Item_analyzer analyzer, uchar **arg_p,
> +                        Item_transformer transformer, uchar *arg_t)
> +{
> +  /* Analyze this Item object. */
> +  if (!(this->*analyzer)(arg_p))
> +    return NULL;
> +
> +  /* Compile the Item we are referencing. */
> +  DBUG_ASSERT((*ref) != NULL);
> +  Item *new_item= (*ref)->compile(analyzer, arg_p, transformer, arg_t);
> +  if (new_item&&  *ref != new_item)
> +    current_thd->change_item_tree(ref, new_item);
> +
> +  /* Transform this Item object. */
> +  return (this->*transformer)(arg_t);
> +}

Considering the detailed explanation in the Doxygen header, I think that the 
code comments are redundant.
> +
> +
>   void Item_ref::print(String *str, enum_query_type query_type)
>   {
>     if (ref)
>
> === modified file 'sql/item.h'
> --- a/sql/item.h	2011-01-10 16:37:47 +0000
> +++ b/sql/item.h	2011-01-27 15:51:29 +0000
> @@ -2620,6 +2620,9 @@
>       return (*ref)->walk(processor, walk_subquery, arg) ||
>              (this->*processor)(arg);
>     }
> +  virtual Item* transform(Item_transformer, uchar *arg);
> +  virtual Item* compile(Item_analyzer analyzer, uchar **arg_p,
> +                        Item_transformer transformer, uchar *arg_t);
>     virtual void print(String *str, enum_query_type query_type);
>     bool result_as_longlong()
>     {
>
>
>
>
Thanks,
Roy

Thread
bzr commit into mysql-trunk branch (olav.sandstaa:3559) Bug#59696Olav Sandstaa27 Jan
  • Re: bzr commit into mysql-trunk branch (olav.sandstaa:3559) Bug#59696Roy Lyseng7 Feb
    • Re: bzr commit into mysql-trunk branch (olav.sandstaa:3559) Bug#59696Olav Sandstaa9 Feb
      • Re: bzr commit into mysql-trunk branch (olav.sandstaa:3559) Bug#59696Roy Lyseng9 Feb