List:Commits« Previous MessageNext Message »
From:Martin Hansson Date:August 30 2010 9:33am
Subject:Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3478)
Bug#51070
View as plain text  
  Guilhem Bichot skrev 2010-08-25 19.50:
> Hello Martin,
>
> Martin Hansson a écrit, Le 06.08.2010 15:26:
>> #At file:///data0/martin/bzr/bug51070/5.1bt-guilhem/ based on 
>> revid:bjorn.munch@stripped
>>
>>  3478 Martin Hansson    2010-08-06
>>       Bug#51070: Query with a NOT IN subquery predicate returns a 
>> wrong result set
>>             The EXISTS transformation has additional swithces to 
>> catch most corner cases
>>       that appear when translating an IN predicate (into EXISTS). 
>> Guarded conditions
>>       are used which are deactivated when a NULL value is seen in the 
>> outer
>>       expression's row. When the inner query block supplies NULL 
>> values, however,
>>       they are filtered out because no distinction is made between 
>> the guarded
>>       conditions; guarded NOT x IS NULL conditions in the HAVING 
>> clause that filter
>>       out NULL values cannot be de-activated in isolation from those 
>> that match
>>       values or from the outer expression or NULL's.
>>             The above problem is handled by making the guarded 
>> conditions remember whether
>>       they have rejected a NULL value or not, and index access 
>> methods are taking
>>       this into account as well.             The bug consisted of 
>>             1) Not resetting the property for every nested loop 
>> iteration on the inner
>>         query's result
>>             2) Not propagating the NULL result properly from inner 
>> query to IN optimizer.
>>             3) A hack that aimed to fix #2 by returning NULL when 
>> false was actually
>>          the result. This caused failures when #2 was properly fixed.
>>
>>     modified:
>>       mysql-test/r/subselect4.result
>>       mysql-test/t/subselect4.test
>>       sql/item_cmpfunc.cc
>>       sql/item_subselect.cc
>
>> === modified file 'mysql-test/t/subselect4.test'
>> --- a/mysql-test/t/subselect4.test    2010-08-05 10:42:14 +0000
>> +++ b/mysql-test/t/subselect4.test    2010-08-06 13:26:42 +0000
>> @@ -74,6 +74,67 @@ CREATE VIEW v1 AS SELECT 1 LIKE ( 1 IN (
>>  CREATE VIEW v2 AS SELECT 1 LIKE '%' ESCAPE ( 1 IN ( SELECT 1 ) );
>>  DROP VIEW v1, v2;
>>
>> +--echo # +--echo # Bug#51070: Query with a NOT IN subquery predicate 
>> returns a wrong
>> +--echo # result set
>> +--echo # +CREATE TABLE t1 ( a INT, b INT );
>> +INSERT INTO t1 VALUES ( 1, NULL ), ( 2, NULL );
>> +
>> +CREATE TABLE t2 ( c INT, d INT );
>> +INSERT INTO t2 VALUES ( NULL, 3 ), ( NULL, 4 );
>> +
>> +CREATE TABLE t3 ( e INT, f INT );
>> +INSERT INTO t3 VALUES ( NULL, NULL ), ( NULL, NULL );
>> +
>> +CREATE TABLE t4 ( a INT );
>> +INSERT INTO t4 VALUES (1), (2), (3);
>> +
>> +CREATE TABLE t5 ( a INT );
>> +INSERT INTO t5 VALUES (NULL), (2);
>> +
>> +--replace_column 1 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x
>> +EXPLAIN
>> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT c, d FROM t2 );
>> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT c, d FROM t2 );
>> +
>> +EXPLAIN
>> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT c, d FROM t2 ) IS NULL;
>
> for the query above you only do EXPLAIN, you don't run the SELECT.
And you want me to, I presume. :-) Done.
>
>> +SELECT * FROM t1 WHERE ( a, b ) IN ( SELECT c, d FROM t2 ) IS NULL;
>> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT c, d FROM t2 ) IS 
>> UNKNOWN;
>> +SELECT * FROM t1 WHERE (( a, b ) NOT IN ( SELECT c, d FROM t2 )) IS 
>> UNKNOWN;
>
> as you add (), is it that
> ( a, b ) NOT IN ( SELECT c, d FROM t2 ) IS UNKNOWN
> stands for
> ( a, b ) NOT IN (( SELECT c, d FROM t2 ) IS UNKNOWN)
> ? Mmmm no that wouldn't be parsable, (c,d) can't be compared to UNKNOWN.
> Then maybe the added () are "just for fun"?
When I made the switch from my solution to your solution I didn't change 
the test case. As you can see in the previous patch I removed the 
special handling for top_level_item. So I wanted to test both the case 
when the predicate was used on the top level in the WHERE as well as 
when it was nested in a NULL sensitive context. This is not necessary 
anymore, but I sleep better with more testing than less.
>
>> +
>> +SELECT * FROM t1 WHERE 1 = 1 AND ( a, b ) NOT IN ( SELECT c, d FROM 
>> t2 );
>> +
>> +--replace_column 1 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x
>> +EXPLAIN
>> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT e, f FROM t3 );
>> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT e, f FROM t3 );
>> +
>> +--replace_column 1 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x
>> +EXPLAIN
>> +SELECT * FROM t2 WHERE ( c, d ) NOT IN ( SELECT a, b FROM t1 );
>> +SELECT * FROM t2 WHERE ( c, d ) NOT IN ( SELECT a, b FROM t1 );
>> +
>> +--replace_column 1 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x
>> +EXPLAIN
>> +SELECT * FROM t3 WHERE ( e, f ) NOT IN ( SELECT c, d FROM t2 );
>> +SELECT * FROM t3 WHERE ( e, f ) NOT IN ( SELECT c, d FROM t2 );
>> +
>> +--replace_column 1 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x
>> +EXPLAIN
>> +SELECT * FROM t2 WHERE ( c, d ) NOT IN ( SELECT e, f FROM t3 );
>> +SELECT * FROM t2 WHERE ( c, d ) NOT IN ( SELECT e, f FROM t3 );
>> +
>> +SELECT * FROM t1 WHERE ( a, b ) NOT IN +  ( SELECT c, d FROM t2 
>> WHERE c = 1 AND c <> 1 );
>> +
>> +SELECT * FROM t1 WHERE b NOT IN ( SELECT c FROM t2 WHERE c = 1 );
>> +
>> +SELECT * FROM t1 WHERE NULL NOT IN ( SELECT c FROM t2 WHERE c = 1 
>> AND c <> 1 );
>> +
>> +DROP TABLE t1, t2, t3, t4, t5;
>> +
>>
>>  --echo #
>>  --echo # End of 5.1 tests.
>>
>> === modified file 'sql/item_cmpfunc.cc'
>> --- a/sql/item_cmpfunc.cc    2010-08-05 10:42:14 +0000
>> +++ b/sql/item_cmpfunc.cc    2010-08-06 13:26:42 +0000
>> @@ -1751,6 +1751,34 @@ bool Item_in_optimizer::fix_fields(THD *
>>  }
>>
>>
>> +/**
>> +   The implementation of optimized <outer expression> [NOT] IN 
>> <subquery>
>> +   predicates. The implementation works as follows.
>> +
>> +   For the current value of the outer expression
>> +   +   - If it contains only NULL values, the original (before 
>> rewrite by the
>> +     Item_in_subselect rewrite methods) inner subquery is correlated 
>> and it
>
> correlated or _not_ correlated?
Oops. Fixed.
>
>> +     was previously executed, there is no need to re-execute it, and 
>> the
>> +     previous return value is returned.
>> +
>> +   - If it contains NULL values, check if there is a partial match for
>> +     the inner query block by evaluating the query. The evaulation will
>
> evaulation
fixed.
>
>> +     proceed according to special rules set up elsewhere. These 
>> rules include:
>> +
>> +     - The HAVING NOT <column> IS NULL conditions added by the 
>> aforementioned
>
> here I would write <inner_column> instead of <column>
fixed.
>
>> +       rewrite methods will detect whether they evaulated a NULL 
>> value and if
>
> evaulated
fixed.
>
>> +       so, will cause the subquery to evaluate to NULL. See
>> +       Item_in_subselect::val_bool().
>> +     +     - If there is an eligible index for executing the 
>> subquery, the special
>> +       access method "Full scan on NULL key" is employed which 
>> ensures that
>> +       the inner query will detect if there are NULL values 
>> resulting from the
>> +       inner query.
>> +
>> +   - If it contains no NULL values, the call is forwarded to the 
>> inner query
>> +     block.
>> + */
>>  longlong Item_in_optimizer::val_int()
>>  {
>>    bool tmp;
>
>> @@ -1818,8 +1846,11 @@ longlong Item_in_optimizer::val_int()
>>        else        {
>>          /* The subquery has to be evaluated */
>> -        (void) args[1]->val_bool_result();
>> -        null_value= !item_subs->engine->no_rows();
>> +        (void) item_subs->val_bool_result();
>> +        if (item_subs->engine->no_rows())
>> +          null_value= item_subs->null_value;
>> +        else
>> +          null_value= TRUE;
>
> This is correct but I think we need a long and clear comment to 
> explain the logic of this if/else (don't hesitate to write a long 
> example in the comment, that's the only way I understand it, and I 
> think future readers would benefit from it).
Done. Please check it for stupid mistakes and typos :*)
>
>>          if (all_left_cols_null)
>>            result_for_null_param= null_value;
>>        }
>
> The code itself is ok to push. Tests' results seem ok (I guess they 
> would fail without the code fix?).
That is your job to make sure ;-)
> If QA had a RQG grammar about partial matches, or could quickly write 
> one, it would be good to run it... it's hard to be 100% sure about 
> fixes to this subselect code.
Good idea. Who is doing that stuff nowadays? Actually I think most of 
this code should be removed and replaced with semijoin, flattening, or 
materialization because they are all superior. IMHO there's no need to 
have an inferior solution hanging about that is near impossible to 
maintain and just as buggy. I'll ask the team about it.

Best Regards

Martin
Thread
Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3478)Bug#51070Guilhem Bichot25 Aug
  • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3478)Bug#51070Roy Lyseng25 Aug
  • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3478)Bug#51070Martin Hansson30 Aug
    • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3478)Bug#51070Guilhem Bichot31 Aug
      • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3478)Bug#51070Roy Lyseng31 Aug