From: Martin Hansson Date: August 31 2010 11:46am Subject: Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3496) Bug#51070 List-Archive: http://lists.mysql.com/commits/117216 Message-Id: <4C7CEB89.5040100@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit Guilhem Bichot wrote: > Hello, > > Martin Hansson a écrit, Le 30.08.2010 11:54: >> #At file:///data0/martin/bzr/bug51070/5.1bt/ based on >> revid:ramil@stripped >> >> 3496 Martin Hansson 2010-08-30 >> Bug#51070: Query with a NOT IN subquery predicate returns a >> wrong result set >> The EXISTS transformation has additional swithces to >> catch most corner cases > > swithces > > "most" worries me: after your patch, are they still known corner cases > where IN->EXISTS is broken? Hmm, I guess my sarcasm didn't come across right. Changed to "the known corner cases". > >> that appear when transforming an IN predicate into EXISTS. >> Guarded conditions >> are used which are deactivated when a NULL value is seen in the >> outer >> expression's row. When the inner query block supplies NULL >> values, however, >> they are filtered out because no distinction is made between >> the guarded >> conditions; guarded NOT x IS NULL conditions in the HAVING >> clause that filter >> out NULL values cannot be de-activated in isolation from those >> that match >> values or from the outer expression or NULL's. > > yes but is this the problem? This wish to de-activate one in isolation > from the other is what was implemented by your first patch, No. If this is a horizontal division, mine was vertical. They are still de-activateable on a per-column bases and always were. I separated the HAVING from the WHERE. > but you switched to another solution since. Is there a real, logical > problem in not being able to de-activate one in isolation from the > other? You were able to fix the bug without de-activating one in > isolation from the other. > >> The above problem is handled by making the guarded >> conditions remember whether >> they have rejected a NULL value or not > > even before the patch the guarded conditions remembered this > (Item_is_not_null_test), but Item_in_optimizer failed to collect what > they remembered. Yes. I am describing the workings of the transformation in order for a reader to understand what I did. I realize I should add a line at the bottom to say what I actually did. Now done. > >> , and index access methods are taking >> this into account as well. The bug consisted of >> 1) Not resetting the property for every nested loop >> iteration on the inner >> query's result >> 2) Not propagating the NULL result properly from inner >> query to IN optimizer. >> 3) A hack that may or may not have been needed at some >> point. According to a >> comment it was aimed to fix #2 by returning NULL when FALSE >> was actually >> the result. This caused failures when #2 was properly fixed. >> The hack is >> now removed. >> >> modified: >> mysql-test/r/subselect4.result >> mysql-test/t/subselect4.test >> sql/item_cmpfunc.cc >> sql/item_subselect.cc >> === modified file 'mysql-test/r/subselect4.result' >> --- a/mysql-test/r/subselect4.result 2010-08-05 10:42:14 +0000 >> +++ b/mysql-test/r/subselect4.result 2010-08-30 09:54:44 +0000 >> @@ -77,6 +77,92 @@ Note 1249 Select 2 was reduced during op >> CREATE VIEW v1 AS SELECT 1 LIKE ( 1 IN ( SELECT 1 ) ); >> CREATE VIEW v2 AS SELECT 1 LIKE '%' ESCAPE ( 1 IN ( SELECT 1 ) ); >> DROP VIEW v1, v2; >> +# +# Bug#51070: Query with a NOT IN subquery predicate returns a wrong >> +# result set >> +# > > test results are correct, I verified. > I guess they were incorrect without the code patch. > >> === modified file 'sql/item_cmpfunc.cc' >> --- a/sql/item_cmpfunc.cc 2010-08-05 10:42:14 +0000 >> +++ b/sql/item_cmpfunc.cc 2010-08-30 09:54:44 +0000 >> @@ -1751,6 +1751,69 @@ bool Item_in_optimizer::fix_fields(THD * >> } >> >> >> +/** >> + The implementation of optimized [NOT] IN >> >> + predicates. The implementation works as follows. >> + >> + For the current value of the outer expression >> + + - If it contains only NULL values, the original (before >> rewrite by the >> + Item_in_subselect rewrite methods) inner subquery is >> non-correlated and >> + was previously executed, there is no need to re-execute it, and >> the >> + previous return value is returned. >> + >> + - If it contains NULL values, check if there is a partial match >> for the >> + inner query block by evaluating it. For clarity we repeat here the >> + transformation previously performed on the sub-query. The >> expression >> + >> + ( oc_1, ..., oc_n ) + + ( SELECT ic_1, >> ..., ic_n >> + FROM >> + WHERE + ) >> + >> + was transformed into >> + >> + ( oc_1, ..., oc_n ) + + ( SELECT ic_1, >> ..., ic_n + FROM
+ WHERE AND ... ( >> ic_k = oc_k OR ic IS NULL ) > > ic IS NULL -> ic_k IS NULL > >> + HAVING ... NOT ic_k IS NULL >> + ) >> + >> + The evaluation will now proceed according to special rules set up >> + elsewhere. These rules include: >> + >> + - The HAVING NOT IS NULL conditions added by the >> + aforementioned rewrite methods will detect whether they >> evaluated (and >> + rejected) a NULL value and if so, will cause the subquery to >> evaluate >> + to NULL. See Item_in_subselect::val_bool(). > > and I would add "and see Item_is_not_null_test::val_int()". Because > Item_is_not_null_test::val_int() sets owner->was_null when it sees a > NULL, and 'owner' is Item_in_subselect, and then > Item_in_subselect::val_bool() reads its was_null and sets its > null_value accordingly. Correct. I added them to an @see section below. > >> + - The added WHERE and HAVING conditions are present only for >> those inner >> + columns that correspond to outer column that are not NULL at >> the moment. >> + + - If there is an eligible index for executing the >> subquery, the special >> + access method "Full scan on NULL key" is employed which >> ensures that >> + the inner query will detect if there are NULL values >> resulting from the >> + inner query. This access method will quietly resort to table >> scan if it >> + needs to find NULL values as well. >> + >> + - Under these conditions, the sub-query need only be evaluated >> in order to >> + find out whether it produced any rows. >> + + - If it did, we know that there was a partial match >> since there are >> + NULL values in the outer row expression. >> + >> + - If it did not, the result is FALSE or UNKNOWN. If at least >> one of the >> + HAVING sub-predicates rejected a NULL value corresponding >> to an outer >> + non-NULL, and hence the inner query block returns UNKNONW upon > > UNKNOWN Man, what is with me these days?! ;-) > >> + evaluation, there was a partial match and the result is >> UNKNOWN. >> + >> + - If it contains no NULL values, the call is forwarded to the >> inner query >> + block. >> + */ > > Thank you for this very good comment. It will enlighten future > developers. Yeah, I don't wish for anyone else to go through what I had to... :-( Best Regards Martin