Hello Martin,
Martin Hansson a écrit, Le 06.08.2010 15:26:
> #At file:///data0/martin/bzr/bug51070/5.1bt-guilhem/ based on
> revid:bjorn.munch@stripped
>
> 3478 Martin Hansson 2010-08-06
> Bug#51070: Query with a NOT IN subquery predicate returns a wrong result set
>
> The EXISTS transformation has additional swithces to catch most corner cases
> that appear when translating an IN predicate (into EXISTS). Guarded conditions
> are used which are deactivated when a NULL value is seen in the outer
> expression's row. When the inner query block supplies NULL values, however,
> they are filtered out because no distinction is made between the guarded
> conditions; guarded NOT x IS NULL conditions in the HAVING clause that filter
> out NULL values cannot be de-activated in isolation from those that match
> values or from the outer expression or NULL's.
>
> The above problem is handled by making the guarded conditions remember whether
> they have rejected a NULL value or not, and index access methods are taking
> this into account as well.
>
> The bug consisted of
>
> 1) Not resetting the property for every nested loop iteration on the inner
> query's result
>
> 2) Not propagating the NULL result properly from inner query to IN optimizer.
>
> 3) A hack that aimed to fix #2 by returning NULL when false was actually
> the result. This caused failures when #2 was properly fixed.
>
> modified:
> mysql-test/r/subselect4.result
> mysql-test/t/subselect4.test
> sql/item_cmpfunc.cc
> sql/item_subselect.cc
> === modified file 'mysql-test/t/subselect4.test'
> --- a/mysql-test/t/subselect4.test 2010-08-05 10:42:14 +0000
> +++ b/mysql-test/t/subselect4.test 2010-08-06 13:26:42 +0000
> @@ -74,6 +74,67 @@ CREATE VIEW v1 AS SELECT 1 LIKE ( 1 IN (
> CREATE VIEW v2 AS SELECT 1 LIKE '%' ESCAPE ( 1 IN ( SELECT 1 ) );
> DROP VIEW v1, v2;
>
> +--echo #
> +--echo # Bug#51070: Query with a NOT IN subquery predicate returns a wrong
> +--echo # result set
> +--echo #
> +CREATE TABLE t1 ( a INT, b INT );
> +INSERT INTO t1 VALUES ( 1, NULL ), ( 2, NULL );
> +
> +CREATE TABLE t2 ( c INT, d INT );
> +INSERT INTO t2 VALUES ( NULL, 3 ), ( NULL, 4 );
> +
> +CREATE TABLE t3 ( e INT, f INT );
> +INSERT INTO t3 VALUES ( NULL, NULL ), ( NULL, NULL );
> +
> +CREATE TABLE t4 ( a INT );
> +INSERT INTO t4 VALUES (1), (2), (3);
> +
> +CREATE TABLE t5 ( a INT );
> +INSERT INTO t5 VALUES (NULL), (2);
> +
> +--replace_column 1 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x
> +EXPLAIN
> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT c, d FROM t2 );
> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT c, d FROM t2 );
> +
> +EXPLAIN
> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT c, d FROM t2 ) IS NULL;
for the query above you only do EXPLAIN, you don't run the SELECT.
> +SELECT * FROM t1 WHERE ( a, b ) IN ( SELECT c, d FROM t2 ) IS NULL;
> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT c, d FROM t2 ) IS UNKNOWN;
> +SELECT * FROM t1 WHERE (( a, b ) NOT IN ( SELECT c, d FROM t2 )) IS UNKNOWN;
as you add (), is it that
( a, b ) NOT IN ( SELECT c, d FROM t2 ) IS UNKNOWN
stands for
( a, b ) NOT IN (( SELECT c, d FROM t2 ) IS UNKNOWN)
? Mmmm no that wouldn't be parsable, (c,d) can't be compared to UNKNOWN.
Then maybe the added () are "just for fun"?
> +
> +SELECT * FROM t1 WHERE 1 = 1 AND ( a, b ) NOT IN ( SELECT c, d FROM t2 );
> +
> +--replace_column 1 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x
> +EXPLAIN
> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT e, f FROM t3 );
> +SELECT * FROM t1 WHERE ( a, b ) NOT IN ( SELECT e, f FROM t3 );
> +
> +--replace_column 1 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x
> +EXPLAIN
> +SELECT * FROM t2 WHERE ( c, d ) NOT IN ( SELECT a, b FROM t1 );
> +SELECT * FROM t2 WHERE ( c, d ) NOT IN ( SELECT a, b FROM t1 );
> +
> +--replace_column 1 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x
> +EXPLAIN
> +SELECT * FROM t3 WHERE ( e, f ) NOT IN ( SELECT c, d FROM t2 );
> +SELECT * FROM t3 WHERE ( e, f ) NOT IN ( SELECT c, d FROM t2 );
> +
> +--replace_column 1 x 3 x 4 x 5 x 6 x 7 x 8 x 9 x 10 x
> +EXPLAIN
> +SELECT * FROM t2 WHERE ( c, d ) NOT IN ( SELECT e, f FROM t3 );
> +SELECT * FROM t2 WHERE ( c, d ) NOT IN ( SELECT e, f FROM t3 );
> +
> +SELECT * FROM t1 WHERE ( a, b ) NOT IN
> + ( SELECT c, d FROM t2 WHERE c = 1 AND c <> 1 );
> +
> +SELECT * FROM t1 WHERE b NOT IN ( SELECT c FROM t2 WHERE c = 1 );
> +
> +SELECT * FROM t1 WHERE NULL NOT IN ( SELECT c FROM t2 WHERE c = 1 AND c <> 1
> );
> +
> +DROP TABLE t1, t2, t3, t4, t5;
> +
>
> --echo #
> --echo # End of 5.1 tests.
>
> === modified file 'sql/item_cmpfunc.cc'
> --- a/sql/item_cmpfunc.cc 2010-08-05 10:42:14 +0000
> +++ b/sql/item_cmpfunc.cc 2010-08-06 13:26:42 +0000
> @@ -1751,6 +1751,34 @@ bool Item_in_optimizer::fix_fields(THD *
> }
>
>
> +/**
> + The implementation of optimized <outer expression> [NOT] IN
> <subquery>
> + predicates. The implementation works as follows.
> +
> + For the current value of the outer expression
> +
> + - If it contains only NULL values, the original (before rewrite by the
> + Item_in_subselect rewrite methods) inner subquery is correlated and it
correlated or _not_ correlated?
> + was previously executed, there is no need to re-execute it, and the
> + previous return value is returned.
> +
> + - If it contains NULL values, check if there is a partial match for
> + the inner query block by evaluating the query. The evaulation will
evaulation
> + proceed according to special rules set up elsewhere. These rules include:
> +
> + - The HAVING NOT <column> IS NULL conditions added by the aforementioned
here I would write <inner_column> instead of <column>
> + rewrite methods will detect whether they evaulated a NULL value and if
evaulated
> + so, will cause the subquery to evaluate to NULL. See
> + Item_in_subselect::val_bool().
> +
> + - If there is an eligible index for executing the subquery, the special
> + access method "Full scan on NULL key" is employed which ensures that
> + the inner query will detect if there are NULL values resulting from the
> + inner query.
> +
> + - If it contains no NULL values, the call is forwarded to the inner query
> + block.
> + */
> longlong Item_in_optimizer::val_int()
> {
> bool tmp;
> @@ -1818,8 +1846,11 @@ longlong Item_in_optimizer::val_int()
> else
> {
> /* The subquery has to be evaluated */
> - (void) args[1]->val_bool_result();
> - null_value= !item_subs->engine->no_rows();
> + (void) item_subs->val_bool_result();
> + if (item_subs->engine->no_rows())
> + null_value= item_subs->null_value;
> + else
> + null_value= TRUE;
This is correct but I think we need a long and clear comment to explain
the logic of this if/else (don't hesitate to write a long example in the
comment, that's the only way I understand it, and I think future readers
would benefit from it).
> if (all_left_cols_null)
> result_for_null_param= null_value;
> }
The code itself is ok to push. Tests' results seem ok (I guess they
would fail without the code fix?).
If QA had a RQG grammar about partial matches, or could quickly write
one, it would be good to run it... it's hard to be 100% sure about fixes
to this subselect code.
--
Mr. Guilhem Bichot <guilhem.bichot@stripped>
Oracle / MySQL / Optimizer team, Lead Software Engineer
Bordeaux, France
www.oracle.com / www.mysql.com