From: Roy Lyseng Date: May 2 2011 10:21am Subject: Re: bzr commit into mysql-trunk branch (roy.lyseng:3781) Bug#56881 Bug#11764086 List-Archive: http://lists.mysql.com/commits/136509 Message-Id: <4DBE85C1.4060301@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit Hi Guilhem, thank you for the review. On 30.04.11 14.50, Guilhem Bichot wrote: > Hello Roy, > > Roy Lyseng a écrit, Le 17.03.2011 13:49: >> #At file:///home/rl136806/mysql/repo/mysql-trunk/ based on >> revid:tor.didriksen@stripped >> >> 3781 Roy Lyseng 2011-03-17 >> Bug#11764086 Bug#56881: Null left operand to NOT IN in WHERE clause >> behaves differently than real NULL >> The query that triggers this problem is an outer join query with >> a NOT IN subquery predicate in the WHERE clause. >> One of the tables in the outer join contains only one row, and because >> it is a MyISAM table, const table optimization is applied to it. >> Now the problem: not_null_tables() for the NOT IN predicate reports >> that we can ignore NULL-complemented rows for table child, hence the >> query is converted from a left join to an inner join. This leads >> the query executor to omit the row with id 2 from the evaluation. >> As for why the not_null_tables() result for NOT IN is wrong: >> NOT IN may return TRUE for a row, even if the left-hand expression >> to NOT IN is NULL (as in NULL NOT IN ()). >> This is what happens when the query executor evaluates the row with >> id 2 from the parent table: There is no corresponding row in the >> child table, a NULL-complemented row should be added, and the >> predicate NULL NOT IN ( is evaluated (and it should >> return TRUE). >> The solution to the problem is to implement not_null_tables() >> for Item_in_optimizer. If the Item_in_subselect member is "top level", >> meaning that the original query is an IN predicate, return the >> accumulated not-null information for the object, otherwise >> (the predicate is NOT IN) return an empty set. > > clear comment, saved me time, thanks > >> === modified file 'sql/item_cmpfunc.cc' >> --- a/sql/item_cmpfunc.cc 2011-03-17 09:47:50 +0000 >> +++ b/sql/item_cmpfunc.cc 2011-03-17 12:49:15 +0000 >> @@ -1794,6 +1794,22 @@ void Item_in_optimizer::fix_after_pullou >> const_item_cache&= args[1]->const_item(); >> } >> >> + >> +/** >> + For a NOT IN subquery predicate (for which the Item_in_subselect member is >> + not a \"top level\" item), null values passed from outer tables must be >> + considered in the evaluation, hence return an empty set of tables. >> +*/ >> + >> +table_map Item_in_optimizer::not_null_tables() const >> +{ >> + if (((Item_in_subselect*)args[1])->is_top_level_item()) > > Nitpick: please use static_cast Done. > >> + return not_null_tables_cache; > > Item_bool_func is the direct parent class of Item_in_optimizer, so I'd change > the "return" above to: > > return Item_bool_func::not_null_tables(); > > So that if one day Item_bool_func::not_null_tables() is changed, this is taken > into account here. Good idea. > >> + >> + return 0; > > Nitpick: I'd put comments in a different place: the function's comment normally > describes the function's purpose. This purpose is described already near the > declaration of Item::not_null_tables(). > What is not trivial is the logic inside, which, to me, should go into the body, > close to where the logic is coded, like: > > if (((Item_in_subselect*)args[1])->is_top_level_item()) > { > return not_null_tables_cache; > } > /* > For a NOT IN subquery predicate (for which the Item_in_subselect > member is not a \"top level\" item), null values passed from outer > tables must be considered in the evaluation, hence return an empty set > of tables. > */ > return 0; > > In other words, someone reading the doxygen output for class Item_in_optimizer > should probably not have to read the comment about "For a NOT IN etc". Done. > > I wonder whether we can replace "return 0" with some finer logic. I mean, what > we want is to express that, for "NOT IN", the outer expression's NULLness does > not imply that the item evaluates to false/null. This is specifically about the > outer expression; so instead of 0, we could return > "Item_bool_func::not_null_tables() minus the tables used in the outer > expression", like: > Item_bool_func::not_null_tables() & ~(args[0]->not_null_tables()); > I tested this and subquery_none still passes (which proves nothing). Yes, I think this is more correct than my original fix. > > There is also something frightening to note: not_null_tables_cache is not, as > the name suggests, a cache of the return value of not_null_tables(). Depending > on the class, not_null_tables() may return 0, not_null_tables_cache, or some mix > (as in the patch). So I suggest adding a comment near the declaration of > Item_func::not_null_tables_cache, like > /// union of values returned by not_null_tables() for each argument item It's not necessarily a union of values. What about simply: /// Value used in calculation of result of not_null_tables() Thanks, Roy