List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:October 5 2010 11:50am
Subject:Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253)
Bug#45174 Bug#50019
View as plain text  
On 02.10.10 17.17, Guilhem Bichot wrote:
> Hello Roy,
>
> Thanks for a clearly commented patch.
>
>> === modified file 'mysql-test/r/subquery_sj_mat.result'
>> --- a/mysql-test/r/subquery_sj_mat.result 2010-09-20 14:06:02 +0000
>> +++ b/mysql-test/r/subquery_sj_mat.result 2010-09-29 14:33:39 +0000
>> @@ -2176,7 +2176,7 @@ create table t3 (a int);
>> insert into t3 select A.a + 10*B.a from t0 A, t0 B;
>> explain select * from t3 where a in (select kp1 from t1 where kp1<20);
>> id select_type table type possible_keys key key_len ref rows Extra
>> -1 PRIMARY t3 ALL NULL NULL NULL NULL 100 Using where
>> +1 PRIMARY t3 ALL NULL NULL NULL NULL 100
>> 1 PRIMARY t1 range kp1 kp1 5 NULL 48 Using where; Using index; Materialize
>
> This one looks worrying, performance-wise. The disappearing "Using where" was
> probably a condition "a<20"; having it attached to t3 allowed to do less lookups
> in the materialized table.
>
> Looking at "handler%" values in "show status" says that after the patch, the
> SELECT used 105 Handler_read_key, instead of 25 before. Other statistics are
> unchanged. Handler_write stays at 40: there are 40 row writes into the tmp table
> (logical, there are 40 rows having kp1<20 in t1)
> Thus: after the patch, t3's rows are not filtered anymore with a<20, so the
> number of key lookups into the materialized table grows (handler_read_key).
> Handler_write is unchanged so nothing has changed for t1. Also confirmed by the
> unchanged EXPLAIN row for t1 ("range" and "using where").
>
> The commit comment says
>
> "Some tests using semijoin materialization show that where clause
> has moved from the outer query into the materialized inner query."
>
> but I think here it's not a move, only a loss.

I am a bit concerned about this, sometimes there will be a loss, and sometimes a 
gain.

Consider:
select * from ot where a in (select b from it where b<20);

It is correct that the optimizer may freely move the predicate "b<20" into the 
outer query, but the price to pay is increased cost when materializing the 
table. Thus, the number of rows being filtered away in the inner vs. the outer 
query will tell whether there is a loss or gain.

However, an intelligent programmer will instead write this query:

select * from ot where a in (select b from it where b<20) and a<20;

(An intelligent programmer will assume a not-so-intelligent optimizer).

But the optimizer will actually move both the predicates into the outer query 
(just validate with EXPLAIN EXTENDED).

Thus, I would say that we should keep the predicate within the query it is 
defined in, if the materialized semijoin algorithm is chosen.

The best option would however be to generate a duplicate predicate so that it 
can be attached to both queries.

Thanks,
Roy



Thread
bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253) Bug#45174Bug#50019Roy Lyseng29 Sep
  • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253)Bug#45174 Bug#50019Guilhem Bichot2 Oct
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253)Bug#45174 Bug#50019Roy Lyseng5 Oct
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253)Bug#45174 Bug#50019Roy Lyseng7 Oct
      • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253)Bug#45174 Bug#50019Guilhem Bichot20 Oct