Hello,
Martin Hansson a écrit, Le 30.08.2010 11:54:
> #At file:///data0/martin/bzr/bug51070/5.1bt/ based on
> revid:ramil@stripped
>
> 3496 Martin Hansson 2010-08-30
> Bug#51070: Query with a NOT IN subquery predicate returns a wrong result set
>
> The EXISTS transformation has additional swithces to catch most corner cases
swithces
"most" worries me: after your patch, are they still known corner cases
where IN->EXISTS is broken?
> that appear when transforming an IN predicate into EXISTS. Guarded conditions
> are used which are deactivated when a NULL value is seen in the outer
> expression's row. When the inner query block supplies NULL values, however,
> they are filtered out because no distinction is made between the guarded
> conditions; guarded NOT x IS NULL conditions in the HAVING clause that filter
> out NULL values cannot be de-activated in isolation from those that match
> values or from the outer expression or NULL's.
yes but is this the problem? This wish to de-activate one in isolation
from the other is what was implemented by your first patch, but you
switched to another solution since. Is there a real, logical problem in
not being able to de-activate one in isolation from the other? You were
able to fix the bug without de-activating one in isolation from the other.
>
> The above problem is handled by making the guarded conditions remember whether
> they have rejected a NULL value or not
even before the patch the guarded conditions remembered this
(Item_is_not_null_test), but Item_in_optimizer failed to collect what
they remembered.
>, and index access methods are taking
> this into account as well.
>
> The bug consisted of
>
> 1) Not resetting the property for every nested loop iteration on the inner
> query's result
>
> 2) Not propagating the NULL result properly from inner query to IN optimizer.
>
> 3) A hack that may or may not have been needed at some point. According to a
> comment it was aimed to fix #2 by returning NULL when FALSE was actually
> the result. This caused failures when #2 was properly fixed. The hack is
> now removed.
>
> modified:
> mysql-test/r/subselect4.result
> mysql-test/t/subselect4.test
> sql/item_cmpfunc.cc
> sql/item_subselect.cc
> === modified file 'mysql-test/r/subselect4.result'
> --- a/mysql-test/r/subselect4.result 2010-08-05 10:42:14 +0000
> +++ b/mysql-test/r/subselect4.result 2010-08-30 09:54:44 +0000
> @@ -77,6 +77,92 @@ Note 1249 Select 2 was reduced during op
> CREATE VIEW v1 AS SELECT 1 LIKE ( 1 IN ( SELECT 1 ) );
> CREATE VIEW v2 AS SELECT 1 LIKE '%' ESCAPE ( 1 IN ( SELECT 1 ) );
> DROP VIEW v1, v2;
> +#
> +# Bug#51070: Query with a NOT IN subquery predicate returns a wrong
> +# result set
> +#
test results are correct, I verified.
I guess they were incorrect without the code patch.
> === modified file 'sql/item_cmpfunc.cc'
> --- a/sql/item_cmpfunc.cc 2010-08-05 10:42:14 +0000
> +++ b/sql/item_cmpfunc.cc 2010-08-30 09:54:44 +0000
> @@ -1751,6 +1751,69 @@ bool Item_in_optimizer::fix_fields(THD *
> }
>
>
> +/**
> + The implementation of optimized <outer expression> [NOT] IN
> <subquery>
> + predicates. The implementation works as follows.
> +
> + For the current value of the outer expression
> +
> + - If it contains only NULL values, the original (before rewrite by the
> + Item_in_subselect rewrite methods) inner subquery is non-correlated and
> + was previously executed, there is no need to re-execute it, and the
> + previous return value is returned.
> +
> + - If it contains NULL values, check if there is a partial match for the
> + inner query block by evaluating it. For clarity we repeat here the
> + transformation previously performed on the sub-query. The expression
> +
> + ( oc_1, ..., oc_n )
> + <in predicate>
> + ( SELECT ic_1, ..., ic_n
> + FROM <table>
> + WHERE <inner where>
> + )
> +
> + was transformed into
> +
> + ( oc_1, ..., oc_n )
> + <in predicate>
> + ( SELECT ic_1, ..., ic_n
> + FROM <table>
> + WHERE <inner where> AND ... ( ic_k = oc_k OR ic IS NULL )
ic IS NULL -> ic_k IS NULL
> + HAVING ... NOT ic_k IS NULL
> + )
> +
> + The evaluation will now proceed according to special rules set up
> + elsewhere. These rules include:
> +
> + - The HAVING NOT <inner column> IS NULL conditions added by the
> + aforementioned rewrite methods will detect whether they evaluated (and
> + rejected) a NULL value and if so, will cause the subquery to evaluate
> + to NULL. See Item_in_subselect::val_bool().
and I would add "and see Item_is_not_null_test::val_int()". Because
Item_is_not_null_test::val_int() sets owner->was_null when it sees a
NULL, and 'owner' is Item_in_subselect, and then
Item_in_subselect::val_bool() reads its was_null and sets its null_value
accordingly.
> + - The added WHERE and HAVING conditions are present only for those inner
> + columns that correspond to outer column that are not NULL at the moment.
> +
> + - If there is an eligible index for executing the subquery, the special
> + access method "Full scan on NULL key" is employed which ensures that
> + the inner query will detect if there are NULL values resulting from the
> + inner query. This access method will quietly resort to table scan if it
> + needs to find NULL values as well.
> +
> + - Under these conditions, the sub-query need only be evaluated in order to
> + find out whether it produced any rows.
> +
> + - If it did, we know that there was a partial match since there are
> + NULL values in the outer row expression.
> +
> + - If it did not, the result is FALSE or UNKNOWN. If at least one of the
> + HAVING sub-predicates rejected a NULL value corresponding to an outer
> + non-NULL, and hence the inner query block returns UNKNONW upon
UNKNOWN
> + evaluation, there was a partial match and the result is UNKNOWN.
> +
> + - If it contains no NULL values, the call is forwarded to the inner query
> + block.
> + */
Thank you for this very good comment. It will enlighten future developers.
Ok to push.
--
Mr. Guilhem Bichot <guilhem.bichot@stripped>
Oracle / MySQL / Optimizer team, Lead Software Engineer
Bordeaux, France
www.oracle.com / www.mysql.com