Hello,
Roy Lyseng a écrit, Le 07.10.2010 17:13:
> Hi Guilhem,
>
> On 02.10.10 17.17, Guilhem Bichot wrote:
>> Hello Roy,
>>
>> Thanks for a clearly commented patch.
>
> And thanks for a thorough review.
>
>>
>> Roy Lyseng a écrit, Le 29.09.2010 16:34:
>>> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
>>> revid:tor.didriksen@stripped
>>>
>>> 3253 Roy Lyseng 2010-09-29
>>> Bug#45174: Incorrectly applied equality propagation caused wrong result
>>> on a query with a materialized semi-join.
>>> Bug#50019: Wrong result for IN-query with materialization.
>> ...
>>> mysql-test/r/subquery_sj_none_jcl7.result
>>> Bug#45174: Incorrectly applied equality propagation caused wrong result
>>> on a query with a materialized semi-join.
>>> Bug#50019: Wrong result for IN-query with materialization.
>>> Results for two new tests added.
>>> Some tests using semijoin materialization show that where clause
>>> has moved from the outer query into the materialized inner query.
>>> This is caused by the changed call to get_first() in
>>> eliminate_item_equal().
>>> Ex: select * from ot where a in(select b from it where b>0);
>>> The clause "b>0" is now evaluated on the inner query materialization.
>>> Performance-wise this is never worse when using MaterializeScan and
>>> usually better for MaterializeLookup. For the latter strategy, the
>>> best possible solution is probably to evaluate the clause in both
>>> queries, this can be subject for a later feature development.
>>
>>> === modified file 'mysql-test/r/subquery_sj_mat.result'
>>> --- a/mysql-test/r/subquery_sj_mat.result 2010-09-20 14:06:02 +0000
>>> +++ b/mysql-test/r/subquery_sj_mat.result 2010-09-29 14:33:39 +0000
>>> @@ -2176,7 +2176,7 @@ create table t3 (a int);
>>> insert into t3 select A.a + 10*B.a from t0 A, t0 B;
>>> explain select * from t3 where a in (select kp1 from t1 where kp1<20);
>>> id select_type table type possible_keys key key_len ref rows Extra
>>> -1 PRIMARY t3 ALL NULL NULL NULL NULL 100 Using where
>>> +1 PRIMARY t3 ALL NULL NULL NULL NULL 100
>>> 1 PRIMARY t1 range kp1 kp1 5 NULL 48 Using where; Using index;
>>> Materialize
>>
>> This one looks worrying, performance-wise. The disappearing "Using
>> where" was
>> probably a condition "a<20"; having it attached to t3 allowed to do
>> less lookups
>> in the materialized table.
>>
>> Looking at "handler%" values in "show status" says that after the
>> patch, the
>> SELECT used 105 Handler_read_key, instead of 25 before. Other
>> statistics are
>> unchanged. Handler_write stays at 40: there are 40 row writes into the
>> tmp table
>> (logical, there are 40 rows having kp1<20 in t1)
>> Thus: after the patch, t3's rows are not filtered anymore with a<20,
>> so the
>> number of key lookups into the materialized table grows
>> (handler_read_key).
>> Handler_write is unchanged so nothing has changed for t1. Also
>> confirmed by the
>> unchanged EXPLAIN row for t1 ("range" and "using where").
>>
>> The commit comment says
>>
>> "Some tests using semijoin materialization show that where clause
>> has moved from the outer query into the materialized inner query."
>>
>> but I think here it's not a move, only a loss.
>
> As far as I can see, it is a move. The inner Where in the previous case
> seemed to be just a True predicate. However, index analysis occurs
> before equality propagation, so the "Using index" clause made the query
> perform well.
Ooook. Before equality propagation, when we still have the original
condition kp1<20, we decide to do range access on the inner table...
After equality propagation the condition is moved outside, but the range
access fortunately remains.
> I created a new table without the index and reran the query. The result
> was a materialized table with 1000 rows instead of 40 rows.
>
> What worries me is that in the existing code, the Where clause is moved
> into the outer query part unconditionally, meaning that if there is no
> index on the inner tables, it will be impossible to apply a filter
> predicate on the inner part. After this fix, a predicate on the outer
> query will remain on the outer query and a predicate on the inner query
> will remain on the inner query.
ok, got it. It's indeed a good change from this point of view.
> We could develop a feature to copy the inner query predicate to the
> outer query (or vice versa) in the case of a materialized semijoin, but
> now it is at least possible for the user to specify these constraints
> manually.
right
>>> @@ -2277,7 +2277,7 @@ insert into t3 select A.a + 10*B.a, 'fil
>>> explain select * from t3 where a in (select a from t2) and (a > 5 or
>>> a < 10);
>>> id select_type table type possible_keys key key_len ref rows Extra
>>> 1 PRIMARY t2 ALL NULL NULL NULL NULL 2 Using where; Materialize; Scan
>>> -1 PRIMARY t3 ref a a 5 test.t2.a 1
>>> +1 PRIMARY t3 ref a a 5 test.t2.a 1 Using where
>>
>> This one is worrying too.
>> Before the patch, the condition "(a>5 or a<10) and a is not null" is
>> pushed to
>> the tmp table ("t2"), and thus nothing needs to be pushed to t3.
>> After the patch, the condition "a is not null" is pushed to the tmp
>> table, and
>> the condition "a>5 or a<10" is pushed to t3. We can see that
>> potentially this
>> means writing more rows to the tmp table, again this is only a loss.
>> This isn't visible in "show status like 'handler%'", because "a>5 or
>> a<10" is
>> true for all rows of t2 (which are '1' and '2') (so pushing or not
>> pushing the
>> condition changes nothing); but if I change this condition to "a>1",
>> then I see
>> the tmp table contains 1 row before the patch and 2 after the patch:
>> more writes
>> to the tmp table, and thus more reads when scanning it.
...
>
> I'll still have to analyze this.
Looks like it does not happen in the latest patch. Good!