Guilhem Bichot wrote:
> Hello,
>
> Martin Hansson a écrit, Le 30.08.2010 11:54:
>> #At file:///data0/martin/bzr/bug51070/5.1bt/ based on
>> revid:ramil@stripped
>>
>> 3496 Martin Hansson 2010-08-30
>> Bug#51070: Query with a NOT IN subquery predicate returns a
>> wrong result set
>> The EXISTS transformation has additional swithces to
>> catch most corner cases
>
> swithces
>
> "most" worries me: after your patch, are they still known corner cases
> where IN->EXISTS is broken?
Hmm, I guess my sarcasm didn't come across right. Changed to "the known
corner cases".
>
>> that appear when transforming an IN predicate into EXISTS.
>> Guarded conditions
>> are used which are deactivated when a NULL value is seen in the
>> outer
>> expression's row. When the inner query block supplies NULL
>> values, however,
>> they are filtered out because no distinction is made between
>> the guarded
>> conditions; guarded NOT x IS NULL conditions in the HAVING
>> clause that filter
>> out NULL values cannot be de-activated in isolation from those
>> that match
>> values or from the outer expression or NULL's.
>
> yes but is this the problem? This wish to de-activate one in isolation
> from the other is what was implemented by your first patch,
No. If this is a horizontal division, mine was vertical. They are still
de-activateable on a per-column bases and always were. I separated the
HAVING from the WHERE.
> but you switched to another solution since. Is there a real, logical
> problem in not being able to de-activate one in isolation from the
> other? You were able to fix the bug without de-activating one in
> isolation from the other.
>
>> The above problem is handled by making the guarded
>> conditions remember whether
>> they have rejected a NULL value or not
>
> even before the patch the guarded conditions remembered this
> (Item_is_not_null_test), but Item_in_optimizer failed to collect what
> they remembered.
Yes. I am describing the workings of the transformation in order for a
reader to understand what I did. I realize I should add a line at the
bottom to say what I actually did. Now done.
>
>> , and index access methods are taking
>> this into account as well. The bug consisted of
>> 1) Not resetting the property for every nested loop
>> iteration on the inner
>> query's result
>> 2) Not propagating the NULL result properly from inner
>> query to IN optimizer.
>> 3) A hack that may or may not have been needed at some
>> point. According to a
>> comment it was aimed to fix #2 by returning NULL when FALSE
>> was actually
>> the result. This caused failures when #2 was properly fixed.
>> The hack is
>> now removed.
>>
>> modified:
>> mysql-test/r/subselect4.result
>> mysql-test/t/subselect4.test
>> sql/item_cmpfunc.cc
>> sql/item_subselect.cc
>> === modified file 'mysql-test/r/subselect4.result'
>> --- a/mysql-test/r/subselect4.result 2010-08-05 10:42:14 +0000
>> +++ b/mysql-test/r/subselect4.result 2010-08-30 09:54:44 +0000
>> @@ -77,6 +77,92 @@ Note 1249 Select 2 was reduced during op
>> CREATE VIEW v1 AS SELECT 1 LIKE ( 1 IN ( SELECT 1 ) );
>> CREATE VIEW v2 AS SELECT 1 LIKE '%' ESCAPE ( 1 IN ( SELECT 1 ) );
>> DROP VIEW v1, v2;
>> +# +# Bug#51070: Query with a NOT IN subquery predicate returns a wrong
>> +# result set
>> +#
>
> test results are correct, I verified.
> I guess they were incorrect without the code patch.
>
>> === modified file 'sql/item_cmpfunc.cc'
>> --- a/sql/item_cmpfunc.cc 2010-08-05 10:42:14 +0000
>> +++ b/sql/item_cmpfunc.cc 2010-08-30 09:54:44 +0000
>> @@ -1751,6 +1751,69 @@ bool Item_in_optimizer::fix_fields(THD *
>> }
>>
>>
>> +/**
>> + The implementation of optimized <outer expression> [NOT] IN
>> <subquery>
>> + predicates. The implementation works as follows.
>> +
>> + For the current value of the outer expression
>> + + - If it contains only NULL values, the original (before
>> rewrite by the
>> + Item_in_subselect rewrite methods) inner subquery is
>> non-correlated and
>> + was previously executed, there is no need to re-execute it, and
>> the
>> + previous return value is returned.
>> +
>> + - If it contains NULL values, check if there is a partial match
>> for the
>> + inner query block by evaluating it. For clarity we repeat here the
>> + transformation previously performed on the sub-query. The
>> expression
>> +
>> + ( oc_1, ..., oc_n ) + <in predicate> + ( SELECT ic_1,
>> ..., ic_n
>> + FROM <table>
>> + WHERE <inner where> + )
>> +
>> + was transformed into
>> +
>> + ( oc_1, ..., oc_n ) + <in predicate> + ( SELECT ic_1,
>> ..., ic_n + FROM <table> + WHERE <inner where> AND ... (
>
>> ic_k = oc_k OR ic IS NULL )
>
> ic IS NULL -> ic_k IS NULL
>
>> + HAVING ... NOT ic_k IS NULL
>> + )
>> +
>> + The evaluation will now proceed according to special rules set up
>> + elsewhere. These rules include:
>> +
>> + - The HAVING NOT <inner column> IS NULL conditions added by the
>> + aforementioned rewrite methods will detect whether they
>> evaluated (and
>> + rejected) a NULL value and if so, will cause the subquery to
>> evaluate
>> + to NULL. See Item_in_subselect::val_bool().
>
> and I would add "and see Item_is_not_null_test::val_int()". Because
> Item_is_not_null_test::val_int() sets owner->was_null when it sees a
> NULL, and 'owner' is Item_in_subselect, and then
> Item_in_subselect::val_bool() reads its was_null and sets its
> null_value accordingly.
Correct. I added them to an @see section below.
>
>> + - The added WHERE and HAVING conditions are present only for
>> those inner
>> + columns that correspond to outer column that are not NULL at
>> the moment.
>> + + - If there is an eligible index for executing the
>> subquery, the special
>> + access method "Full scan on NULL key" is employed which
>> ensures that
>> + the inner query will detect if there are NULL values
>> resulting from the
>> + inner query. This access method will quietly resort to table
>> scan if it
>> + needs to find NULL values as well.
>> +
>> + - Under these conditions, the sub-query need only be evaluated
>> in order to
>> + find out whether it produced any rows.
>> + + - If it did, we know that there was a partial match
>> since there are
>> + NULL values in the outer row expression.
>> +
>> + - If it did not, the result is FALSE or UNKNOWN. If at least
>> one of the
>> + HAVING sub-predicates rejected a NULL value corresponding
>> to an outer
>> + non-NULL, and hence the inner query block returns UNKNONW upon
>
> UNKNOWN
Man, what is with me these days?! ;-)
>
>> + evaluation, there was a partial match and the result is
>> UNKNOWN.
>> +
>> + - If it contains no NULL values, the call is forwarded to the
>> inner query
>> + block.
>> + */
>
> Thank you for this very good comment. It will enlighten future
> developers.
Yeah, I don't wish for anyone else to go through what I had to... :-(
Best Regards
Martin