List:Commits« Previous MessageNext Message »
From:Martin Hansson Date:August 31 2010 11:46am
Subject:Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3496)
Bug#51070
View as plain text  
Guilhem Bichot wrote:
> Hello,
>
> Martin Hansson a écrit, Le 30.08.2010 11:54:
>> #At file:///data0/martin/bzr/bug51070/5.1bt/ based on 
>> revid:ramil@stripped
>>
>>  3496 Martin Hansson    2010-08-30
>>       Bug#51070: Query with a NOT IN subquery predicate returns a 
>> wrong result set
>>             The EXISTS transformation has additional swithces to 
>> catch most corner cases
>
> swithces
>
> "most" worries me: after your patch, are they still known corner cases 
> where IN->EXISTS is broken?
Hmm, I guess my sarcasm didn't come across right. Changed to "the known 
corner cases".
>
>>       that appear when transforming an IN predicate into EXISTS. 
>> Guarded conditions
>>       are used which are deactivated when a NULL value is seen in the 
>> outer
>>       expression's row. When the inner query block supplies NULL 
>> values, however,
>>       they are filtered out because no distinction is made between 
>> the guarded
>>       conditions; guarded NOT x IS NULL conditions in the HAVING 
>> clause that filter
>>       out NULL values cannot be de-activated in isolation from those 
>> that match
>>       values or from the outer expression or NULL's.
>
> yes but is this the problem? This wish to de-activate one in isolation 
> from the other is what was implemented by your first patch, 
No. If this is a horizontal division, mine was vertical. They are still 
de-activateable on  a per-column bases and always  were. I separated the 
HAVING from the WHERE.
> but you switched to another solution since. Is there a real, logical 
> problem in not being able to de-activate one in isolation from the 
> other? You were able to fix the bug without de-activating one in 
> isolation from the other.
>
>>             The above problem is handled by making the guarded 
>> conditions remember whether
>>       they have rejected a NULL value or not
>
> even before the patch the guarded conditions remembered this 
> (Item_is_not_null_test), but Item_in_optimizer failed to collect what 
> they remembered.
Yes. I am describing the workings of the transformation in order for a 
reader to understand what I did. I realize I should add a line at the 
bottom to say what I actually did. Now done.
>
>> , and index access methods are taking
>>       this into account as well.             The bug consisted of 
>>             1) Not resetting the property for every nested loop 
>> iteration on the inner
>>          query's result
>>             2) Not propagating the NULL result properly from inner 
>> query to IN optimizer.
>>             3) A hack that may or may not have been needed at some 
>> point. According to a
>>          comment it was aimed to fix #2 by returning NULL when FALSE 
>> was actually
>>          the result. This caused failures when #2 was properly fixed. 
>> The hack is
>>          now removed.
>>
>>     modified:
>>       mysql-test/r/subselect4.result
>>       mysql-test/t/subselect4.test
>>       sql/item_cmpfunc.cc
>>       sql/item_subselect.cc
>> === modified file 'mysql-test/r/subselect4.result'
>> --- a/mysql-test/r/subselect4.result    2010-08-05 10:42:14 +0000
>> +++ b/mysql-test/r/subselect4.result    2010-08-30 09:54:44 +0000
>> @@ -77,6 +77,92 @@ Note    1249    Select 2 was reduced during op
>>  CREATE VIEW v1 AS SELECT 1 LIKE ( 1 IN ( SELECT 1 ) );
>>  CREATE VIEW v2 AS SELECT 1 LIKE '%' ESCAPE ( 1 IN ( SELECT 1 ) );
>>  DROP VIEW v1, v2;
>> +# +# Bug#51070: Query with a NOT IN subquery predicate returns a wrong
>> +# result set
>> +# 
>
> test results are correct, I verified.
> I guess they were incorrect without the code patch.
>
>> === modified file 'sql/item_cmpfunc.cc'
>> --- a/sql/item_cmpfunc.cc    2010-08-05 10:42:14 +0000
>> +++ b/sql/item_cmpfunc.cc    2010-08-30 09:54:44 +0000
>> @@ -1751,6 +1751,69 @@ bool Item_in_optimizer::fix_fields(THD *
>>  }
>>  
>>  
>> +/**
>> +   The implementation of optimized <outer expression> [NOT] IN 
>> <subquery>
>> +   predicates. The implementation works as follows.
>> +
>> +   For the current value of the outer expression
>> +   +   - If it contains only NULL values, the original (before 
>> rewrite by the
>> +     Item_in_subselect rewrite methods) inner subquery is 
>> non-correlated and
>> +     was previously executed, there is no need to re-execute it, and 
>> the
>> +     previous return value is returned.
>> +
>> +   - If it contains NULL values, check if there is a partial match 
>> for the
>> +     inner query block by evaluating it. For clarity we repeat here the
>> +     transformation previously performed on the sub-query. The 
>> expression
>> +
>> +     ( oc_1, ..., oc_n ) +     <in predicate> +     ( SELECT ic_1, 
>> ..., ic_n
>> +       FROM <table>
>> +       WHERE <inner where> +     )
>> +
>> +     was transformed into
>> +
>> +     ( oc_1, ..., oc_n ) +     <in predicate> +     ( SELECT ic_1, 
>> ..., ic_n +       FROM <table> +       WHERE <inner where> AND ... (
> 
>> ic_k = oc_k OR ic IS NULL ) 
>
> ic IS NULL -> ic_k IS NULL
>
>> +       HAVING ... NOT ic_k IS NULL
>> +     )
>> +
>> +     The evaluation will now proceed according to special rules set up
>> +     elsewhere. These rules include:
>> +
>> +     - The HAVING NOT <inner column> IS NULL conditions added by the
>> +       aforementioned rewrite methods will detect whether they 
>> evaluated (and
>> +       rejected) a NULL value and if so, will cause the subquery to 
>> evaluate
>> +       to NULL. See Item_in_subselect::val_bool().
>
> and I would add "and see Item_is_not_null_test::val_int()". Because 
> Item_is_not_null_test::val_int() sets owner->was_null when it sees a 
> NULL, and 'owner' is Item_in_subselect, and then 
> Item_in_subselect::val_bool() reads its was_null and sets its 
> null_value accordingly.
Correct. I added them to an @see section below.
>
>> +     - The added WHERE and HAVING conditions are present only for 
>> those inner
>> +       columns that correspond to outer column that are not NULL at 
>> the moment.
>> +     +     - If there is an eligible index for executing the 
>> subquery, the special
>> +       access method "Full scan on NULL key" is employed which 
>> ensures that
>> +       the inner query will detect if there are NULL values 
>> resulting from the
>> +       inner query. This access method will quietly resort to table 
>> scan if it
>> +       needs to find NULL values as well.
>> +
>> +     - Under these conditions, the sub-query need only be evaluated 
>> in order to
>> +       find out whether it produced any rows.
>> +     +       - If it did, we know that there was a partial match 
>> since there are
>> +         NULL values in the outer row expression.
>> +
>> +       - If it did not, the result is FALSE or UNKNOWN. If at least 
>> one of the
>> +         HAVING sub-predicates rejected a NULL value corresponding 
>> to an outer
>> +         non-NULL, and hence the inner query block returns UNKNONW upon
>
> UNKNOWN
Man, what is with me these days?! ;-)
>
>> +         evaluation, there was a partial match and the result is 
>> UNKNOWN.
>> +
>> +   - If it contains no NULL values, the call is forwarded to the 
>> inner query
>> +     block.
>> + */
>
> Thank you for this very good comment. It will enlighten future 
> developers. 
Yeah, I don't wish for anyone else to go through what I had to... :-(

Best Regards

Martin
Thread
bzr commit into mysql-5.1-bugteam branch (martin.hansson:3496) Bug#51070Martin Hansson30 Aug
  • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3496)Bug#51070Guilhem Bichot31 Aug
    • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3496)Bug#51070Martin Hansson31 Aug