List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:August 31 2010 9:37am
Subject:Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3496)
Bug#51070
View as plain text  
Hello,

Martin Hansson a écrit, Le 30.08.2010 11:54:
> #At file:///data0/martin/bzr/bug51070/5.1bt/ based on
> revid:ramil@stripped
> 
>  3496 Martin Hansson	2010-08-30
>       Bug#51070: Query with a NOT IN subquery predicate returns a wrong result set
>       
>       The EXISTS transformation has additional swithces to catch most corner cases

swithces

"most" worries me: after your patch, are they still known corner cases 
where IN->EXISTS is broken?

>       that appear when transforming an IN predicate into EXISTS. Guarded conditions
>       are used which are deactivated when a NULL value is seen in the outer
>       expression's row. When the inner query block supplies NULL values, however,
>       they are filtered out because no distinction is made between the guarded
>       conditions; guarded NOT x IS NULL conditions in the HAVING clause that filter
>       out NULL values cannot be de-activated in isolation from those that match
>       values or from the outer expression or NULL's.

yes but is this the problem? This wish to de-activate one in isolation 
from the other is what was implemented by your first patch, but you 
switched to another solution since. Is there a real, logical problem in 
not being able to de-activate one in isolation from the other? You were 
able to fix the bug without de-activating one in isolation from the other.

>       
>       The above problem is handled by making the guarded conditions remember whether
>       they have rejected a NULL value or not

even before the patch the guarded conditions remembered this 
(Item_is_not_null_test), but Item_in_optimizer failed to collect what 
they remembered.

>, and index access methods are taking
>       this into account as well. 
>       
>       The bug consisted of 
>       
>       1) Not resetting the property for every nested loop iteration on the inner
>          query's result
>       
>       2) Not propagating the NULL result properly from inner query to IN optimizer.
>       
>       3) A hack that may or may not have been needed at some point. According to a
>          comment it was aimed to fix #2 by returning NULL when FALSE was actually
>          the result. This caused failures when #2 was properly fixed. The hack is
>          now removed.
> 
>     modified:
>       mysql-test/r/subselect4.result
>       mysql-test/t/subselect4.test
>       sql/item_cmpfunc.cc
>       sql/item_subselect.cc
> === modified file 'mysql-test/r/subselect4.result'
> --- a/mysql-test/r/subselect4.result	2010-08-05 10:42:14 +0000
> +++ b/mysql-test/r/subselect4.result	2010-08-30 09:54:44 +0000
> @@ -77,6 +77,92 @@ Note	1249	Select 2 was reduced during op
>  CREATE VIEW v1 AS SELECT 1 LIKE ( 1 IN ( SELECT 1 ) );
>  CREATE VIEW v2 AS SELECT 1 LIKE '%' ESCAPE ( 1 IN ( SELECT 1 ) );
>  DROP VIEW v1, v2;
> +# 
> +# Bug#51070: Query with a NOT IN subquery predicate returns a wrong
> +# result set
> +# 

test results are correct, I verified.
I guess they were incorrect without the code patch.

> === modified file 'sql/item_cmpfunc.cc'
> --- a/sql/item_cmpfunc.cc	2010-08-05 10:42:14 +0000
> +++ b/sql/item_cmpfunc.cc	2010-08-30 09:54:44 +0000
> @@ -1751,6 +1751,69 @@ bool Item_in_optimizer::fix_fields(THD *
>  }
>  
>  
> +/**
> +   The implementation of optimized <outer expression> [NOT] IN
> <subquery>
> +   predicates. The implementation works as follows.
> +
> +   For the current value of the outer expression
> +   
> +   - If it contains only NULL values, the original (before rewrite by the
> +     Item_in_subselect rewrite methods) inner subquery is non-correlated and
> +     was previously executed, there is no need to re-execute it, and the
> +     previous return value is returned.
> +
> +   - If it contains NULL values, check if there is a partial match for the
> +     inner query block by evaluating it. For clarity we repeat here the
> +     transformation previously performed on the sub-query. The expression
> +
> +     ( oc_1, ..., oc_n ) 
> +     <in predicate> 
> +     ( SELECT ic_1, ..., ic_n
> +       FROM <table>
> +       WHERE <inner where> 
> +     )
> +
> +     was transformed into
> +
> +     ( oc_1, ..., oc_n ) 
> +     <in predicate> 
> +     ( SELECT ic_1, ..., ic_n 
> +       FROM <table> 
> +       WHERE <inner where> AND ... ( ic_k = oc_k OR ic IS NULL ) 

ic IS NULL -> ic_k IS NULL

> +       HAVING ... NOT ic_k IS NULL
> +     )
> +
> +     The evaluation will now proceed according to special rules set up
> +     elsewhere. These rules include:
> +
> +     - The HAVING NOT <inner column> IS NULL conditions added by the
> +       aforementioned rewrite methods will detect whether they evaluated (and
> +       rejected) a NULL value and if so, will cause the subquery to evaluate
> +       to NULL. See Item_in_subselect::val_bool().

and I would add "and see Item_is_not_null_test::val_int()". Because 
Item_is_not_null_test::val_int() sets owner->was_null when it sees a 
NULL, and 'owner' is Item_in_subselect, and then 
Item_in_subselect::val_bool() reads its was_null and sets its null_value 
accordingly.

> +     - The added WHERE and HAVING conditions are present only for those inner
> +       columns that correspond to outer column that are not NULL at the moment.
> +     
> +     - If there is an eligible index for executing the subquery, the special
> +       access method "Full scan on NULL key" is employed which ensures that
> +       the inner query will detect if there are NULL values resulting from the
> +       inner query. This access method will quietly resort to table scan if it
> +       needs to find NULL values as well.
> +
> +     - Under these conditions, the sub-query need only be evaluated in order to
> +       find out whether it produced any rows.
> +     
> +       - If it did, we know that there was a partial match since there are
> +         NULL values in the outer row expression.
> +
> +       - If it did not, the result is FALSE or UNKNOWN. If at least one of the
> +         HAVING sub-predicates rejected a NULL value corresponding to an outer
> +         non-NULL, and hence the inner query block returns UNKNONW upon

UNKNOWN

> +         evaluation, there was a partial match and the result is UNKNOWN.
> +
> +   - If it contains no NULL values, the call is forwarded to the inner query
> +     block.
> + */

Thank you for this very good comment. It will enlighten future developers.

Ok to push.

-- 
Mr. Guilhem Bichot <guilhem.bichot@stripped>
Oracle / MySQL / Optimizer team, Lead Software Engineer
Bordeaux, France
www.oracle.com / www.mysql.com
Thread
bzr commit into mysql-5.1-bugteam branch (martin.hansson:3496) Bug#51070Martin Hansson30 Aug
  • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3496)Bug#51070Guilhem Bichot31 Aug
    • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3496)Bug#51070Martin Hansson31 Aug