Hi Roy,
Thanks for the review of this patch.
On 07/02/2011 14:36, Roy Lyseng wrote:
> Hi Olav,
>
> thank you for a clever solution to an ugly bug.
>
> As far as I understand the code, the fix is correct. I just wonder if
> you should add an assert on the transform of an Item_ref that will
> trigger when anything but the original object is returned. The reason
> for this is that I think that Item_ref objects can be used as helper
> objects to encapsulate other item objects, in order to prevent having
> to perform multiple updates if an item is being replaced. E.g, if an
> item is used both in WHERE and HAVING, the HAVING item is encapsulated
> in an Item_ref, so that substituting the original item does not have a
> consequence within the HAVING clause, only within the Item_ref.
I agree that this is a good idea and in the updated patch I have added
this assert. The test suite passes with it.
> Some minor comments and suggestions below.
I have committed an updated version of the patch. This fixes most of
your suggestions and is available here:
http://lists.mysql.com/commits/130859
See inline for some responses to your comments.
Olav
>
> On 27.01.11 16.51, Olav Sandstaa wrote:
>> #Atfile:///export/home/tmp/mysql2/opt-bug59696/ based
> onrevid:anitha.gopi@stripped
>>
>> 3559 Olav Sandstaa 2011-01-27
>> Fix for Bug#59696 Optimizer fails to move WHERE condition on JOIN column
>> when joining with a view
>>
>> When optimizing a JOIN of two tables the optimizer tries to move WHERE
>> conditions on the join column(s) to the first table in the JOIN in
>> order to reduce the number of records it has to read. If the second
>> operand of the JOIN is a view that is identical to the underlying
>> table the optimizer should be able to perform the same optimization.
>> In this case it fails to move the WHERE condition on the join column
>> to the first table.
>>
>> The cause for this is that when the second operand is a view then the
>> WHERE clauses is stored using Item_ref object that references the real
> suggest rewrite to:
> "conditions contain Item_ref objects that reference the real item
> field objects.
Done.
>> item objects. The existing code fails to evaluate these since it only
>> evaluates the Item_ref object and fails to look at the real
>> objects. As a result it fails to detect that the WHERE clause on the
> WHERE clause -> equals predicate
Done.
>> second join operand can be substituted by a corresponding WHERE clause
>> on the first operand.
> ditto here
Done.
>>
>> There are two places where we fail to handle Item_ref correctly and that
>> causes the optimizer to fail to do this optimization:
>>
>> 1. In optimize_cond() we call build_equal_items() which again calls
>> build_equal_items_for_cond() where we in Item_func::compile() fail to
>> evaluate the Item::equal_fields_propagator() for the condition on the
>> view operand. As a result the field used in the condition for the view
>> is not considered as an equal field to the join column of the first
>> table when this later is evaluated in make_join_statistics().
>>
>> 2. Later in JOIN::optimize() we call substitute_for_best_equal_field()
>> which uses Item_func::transform() to select the best condition of the
>> tables. Item_func::transform() fails to evaluate the different
>> conditions that are equal due to these being stored in a Item_ref
>> object.
>>
>> The fix for these problems is to implement compile() and transform()
>> methods for the Item_ref class that will do the proper evaluation of
>> the compile() and transform on the objects the Item_ref object has the
>> reference to. This will ensure that when Item_func::compile() and
>> Item_func::transform() evaluates Item_ref objects both the Item_ref
>> object and the object it references get evaluated.
> Please try to simplify these two sentences.
Hopefully easier to read now.
>> @ mysql-test/r/join.result
>> Test case for Bug#59696 Optimizer fails to move WHERE condition
>> on JOIN column when joining with a view.
>> @ mysql-test/r/subquery_sj_none.result
>> Change in explain/query plan due to the fix for Bug#59696 "Optimizer
>> fails to move WHERE condition on JOIN column when joining with a view"
>>
>> The change in query plan for the following query:
>>
>> select * from t2 where (b,a) in (select a,b from t1 where a=3);
>>
>> where the table t2 is defined as:
>>
>> create table t2 as select * from t1;
>>
>> is caused by that when the optimizer optimizes the subquery it will
>> try to optimize the conditions (in optimize_cond()). One of the things
>> it does it to call build_equal_items_for_cond() for the following
>> condition:
>>
>> (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`)
>>
>> When this is "compiled" in build_equal_items_for_cond() both the
>> <cache>(`test`.`t2`.`b`) and `test`.`t1`.`a` are stored in
> Item_ref
>> objects. So with the existing code the real items does not get
>> "compiled" and are not considered for alternatives. So the result of
>> the "compilation" is:
>>
>> (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`)
>>
>> With the patch applied both of the real items get evaluated and the
>> code is able to find that `test`.`t1`.`a` can be replaced by the
>> constant "3". So the result is from the "compilation" of this condition
> is
> typo: remove first "is"
Done.
>>
>> <cache>(`test`.`t2`.`b`) = 3
>>
>> Which causes the execution plan for the subquery to change as it has.
>
> Thank you for the detailed explanation.
Your welcome :-)
>
>> @ mysql-test/t/join.test
>> Test case for Bug#59696 Optimizer fails to move WHERE condition
>> on JOIN column when joining with a view.
>> @ sql/item.cc
>> Implement transform() and compile() methods for the Item_ref class.
> methods -> functions. Also one line below and four lines below.
Done, done and done.
>> These methods will then do transform/compile for both the
>> Item_ref object and on the object it has a reference to.
>> @ sql/item.h
>> Implement transform() and compile() methods for the Item_ref class.
>>
>> modified:
>> mysql-test/r/join.result
>> mysql-test/r/subquery_sj_none.result
>> mysql-test/t/join.test
>> sql/item.cc
>> sql/item.h
>> === modified file 'mysql-test/r/join.result'
>>
>> === modified file 'mysql-test/r/join.result'
>> --- a/mysql-test/r/join.result 2010-12-16 17:38:26 +0000
>> +++ b/mysql-test/r/join.result 2011-01-27 15:51:29 +0000
>> @@ -1222,3 +1222,40 @@
>> DEALLOCATE PREPARE stmt;
>> DROP TABLE t1;
>> End of 5.1 tests
>> +#
>> +# Bug #59696 Optimizer fails to move WHERE condition on JOIN column
>> +# when joining with a view
>> +#
>> +CREATE TABLE t1 (
>> +c1 INTEGER NOT NULL
>> +);
>> +INSERT INTO t1 VALUES (1),(2),(3);
>> +CREATE TABLE t2 (
>> +pk INTEGER NOT NULL,
>> +c1 INTEGER NOT NULL,
>> +PRIMARY KEY (pk)
>> +);
>> +INSERT INTO t2 VALUES (1,4),(3,5),(2,6);
>> +EXPLAIN SELECT t2.pk, t2.c1 FROM t2, t1
>> +WHERE t2.pk = t1.c1 AND t2.pk>= 2;
>> +id select_type table type possible_keys key key_len ref rows Extra
>> +1 SIMPLE t1 ALL NULL NULL NULL NULL 3 Using where
>> +1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.c1 1
>> +SELECT t2.pk, t2.c1 FROM t2, t1
>> +WHERE t2.pk = t1.c1 AND t2.pk>= 2;
>> +pk c1
>> +2 6
>> +3 5
> Proposal: Do
> let $query= SELECT ...;
> eval explain $query;
> eval $query;
>
> so that you need to specify the query only once.
Not done (or rather: it was already done in the original patch for the
test file - I do not know how to get that into the result file :-) )
>> +CREATE VIEW v_t2 AS SELECT * FROM t2;
>> +EXPLAIN SELECT v_t2.pk, v_t2.c1 FROM v_t2, t1
>> +WHERE v_t2.pk = t1.c1 AND v_t2.pk>= 2;
>> +id select_type table type possible_keys key key_len ref rows Extra
>> +1 SIMPLE t1 ALL NULL NULL NULL NULL 3 Using where
>> +1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.c1 1
>> +SELECT v_t2.pk, v_t2.c1 FROM v_t2, t1
>> +WHERE v_t2.pk = t1.c1 AND v_t2.pk>= 2;
>> +pk c1
>> +2 6
>> +3 5
>> +DROP VIEW v_t2;
>> +DROP TABLE t1, t2;
>>
>> === modified file 'mysql-test/r/subquery_sj_none.result'
>> --- a/mysql-test/r/subquery_sj_none.result 2011-01-13 10:48:28 +0000
>> +++ b/mysql-test/r/subquery_sj_none.result 2011-01-27 15:51:29 +0000
>> @@ -2543,7 +2543,7 @@
>> explain select * from t2 where (b,a) in (select a,b from t1 where a=3);
>> id select_type table type possible_keys key key_len ref rows Extra
>> 1 PRIMARY t2 ALL NULL NULL NULL NULL 100 Using where
>> -2 DEPENDENT SUBQUERY t1 index_subquery a a 10 func,func 1 Using index; Using
> where
>> +2 DEPENDENT SUBQUERY t1 ref a a 10 const,func 1 Using where; Using index
>> drop table t1,t2;
>> create table t1 (a int, b int);
>> insert into t1 select a,a from t0;
>>
>> === modified file 'mysql-test/t/join.test'
>> --- a/mysql-test/t/join.test 2010-12-16 17:38:26 +0000
>> +++ b/mysql-test/t/join.test 2011-01-27 15:51:29 +0000
>> @@ -934,3 +934,43 @@
>> DROP TABLE t1;
>>
>> --echo End of 5.1 tests
>> +
>> +--echo #
>> +--echo # Bug #59696 Optimizer fails to move WHERE condition on JOIN column
>> +--echo # when joining with a view
>> +--echo #
>> +
>> +CREATE TABLE t1 (
>> + c1 INTEGER NOT NULL
>> +);
>> +
>> +INSERT INTO t1 VALUES (1),(2),(3);
>> +
>> +CREATE TABLE t2 (
>> + pk INTEGER NOT NULL,
>> + c1 INTEGER NOT NULL,
>> + PRIMARY KEY (pk)
>> +);
>> +
>> +INSERT INTO t2 VALUES (1,4),(3,5),(2,6);
>> +
>> +let $query=
>> +SELECT t2.pk, t2.c1 FROM t2, t1
>> +WHERE t2.pk = t1.c1 AND t2.pk>= 2;
>> +
>> +eval EXPLAIN $query;
>> +eval $query;
>> +
>> +# Create a view on one of the tables. The same query plan should
>> +# be used when joining with this view as with the underlying table.
>> +CREATE VIEW v_t2 AS SELECT * FROM t2;
>> +
>> +let $query=
>> +SELECT v_t2.pk, v_t2.c1 FROM v_t2, t1
>> +WHERE v_t2.pk = t1.c1 AND v_t2.pk>= 2;
>> +
>> +eval EXPLAIN $query;
>> +eval $query;
>> +
>> +DROP VIEW v_t2;
>> +DROP TABLE t1, t2;
>>
>> === modified file 'sql/item.cc'
>> --- a/sql/item.cc 2011-01-18 10:32:38 +0000
>> +++ b/sql/item.cc 2011-01-27 15:51:29 +0000
>> @@ -6518,6 +6518,85 @@
>> }
>>
>>
>> +/**
>> + Transform an Item_ref object with a transformer callback function.
>> +
>> + The function first applies the transform method to the item
> method -> function
Done.
>> + referenced by this Item_reg object. If this returns a new item the
> type: Item_reg -> Item_ref
Done.
>> + old item is substituted for a new one. After this the transformer
>> + is applied to the Item_ref object.
>
> I think that you can remove references to error handling in the
> comment without loss of clarity.
Not done. I do not consider what I wrote to be related to error handling.
>> +
>> + @param transformer the transformer callback function to be applied to
>> + the nodes of the tree of the object
>> + @param argument parameter to be passed to the transformer
>> +
>> + @return Item returned as the result of transformation of the Item_ref object
>> + @retval !NULL The transformation was successful
>> + @retval NULL Out of memory error
>> +*/
>> +
>> +Item* Item_ref::transform(Item_transformer transformer, uchar *arg)
>> +{
>> + DBUG_ASSERT(!current_thd->is_stmt_prepare());
>> + DBUG_ASSERT((*ref) != NULL);
>> +
>> + /* Transform the object we are referencing. */
>> + Item *new_item= (*ref)->transform(transformer, arg);
>> + if (!new_item)
>> + return NULL;
>> +
>> + /*
>> + THD::change_item_tree() should be called only if the tree was
>> + really transformed, i.e. when a new item has been created.
>> + Otherwise we'll be allocating a lot of unnecessary memory for
>> + change records at each execution.
>> + */
> Proposal:
> /* Record the new item in the 'ref' pointer, in a manner safe for
> prepared execution. */
>
> IMHO, the argumentation for how this is called should be documented
> with change_item_tree() and not where it is called.
Done (what I had in the original patch was copied from a "standard"
comment that I found in about 7 other implementations of the
Item::transform() member function :-) )
>> + if (*ref != new_item)
>> + current_thd->change_item_tree(ref, new_item);
>> +
>> + /* Transform the item ref object. */
>> + return (this->*transformer)(arg);
>
> Consider asserting that the return value from the transform is the
> same as "this", as suggested above.
Done.
>> +}
>> +
>> +
>> +/**
>> + Compile an Item_ref object with a processor and a transformer
>> + callback functions.
> functions -> function.
Done.
>> +
>> + First the function applies the analyzer to the Item_ref object. Then
>> + if the analizer succeeeds we first applies the compile method to the
>> + object the Item_ref object is referencing. If this returns a new
>> + item the old item is substituted for a new one. After this the
>> + transformer is applied to the Item_ref object itself.
>
> I think that you can omit error handling details here too.
Done (I think).
>> +
>> + @param analyzer the analyzer callback function to be applied to the
>> + nodes of the tree of the object
>> + @param[in,out] arg_p parameter to be passed to the processor
>> + @param transformer the transformer callback function to be applied to the
>> + nodes of the tree of the object
>> + @param arg_t parameter to be passed to the transformer
>> +
>> + @return Item returned as the result of transformation of the Item_ref object
>> +*/
>> +
>> +Item* Item_ref::compile(Item_analyzer analyzer, uchar **arg_p,
>> + Item_transformer transformer, uchar *arg_t)
>> +{
>> + /* Analyze this Item object. */
>> + if (!(this->*analyzer)(arg_p))
>> + return NULL;
>> +
>> + /* Compile the Item we are referencing. */
>> + DBUG_ASSERT((*ref) != NULL);
>> + Item *new_item= (*ref)->compile(analyzer, arg_p, transformer, arg_t);
>> + if (new_item&& *ref != new_item)
>> + current_thd->change_item_tree(ref, new_item);
>> +
>> + /* Transform this Item object. */
>> + return (this->*transformer)(arg_t);
>> +}
>
> Considering the detailed explanation in the Doxygen header, I think
> that the code comments are redundant.
Done.
>> +
>> +
>> void Item_ref::print(String *str, enum_query_type query_type)
>> {
>> if (ref)
>>
>> === modified file 'sql/item.h'
>> --- a/sql/item.h 2011-01-10 16:37:47 +0000
>> +++ b/sql/item.h 2011-01-27 15:51:29 +0000
>> @@ -2620,6 +2620,9 @@
>> return (*ref)->walk(processor, walk_subquery, arg) ||
>> (this->*processor)(arg);
>> }
>> + virtual Item* transform(Item_transformer, uchar *arg);
>> + virtual Item* compile(Item_analyzer analyzer, uchar **arg_p,
>> + Item_transformer transformer, uchar *arg_t);
>> virtual void print(String *str, enum_query_type query_type);
>> bool result_as_longlong()
>> {
>>
>>
>>
>>
> Thanks,
> Roy