thank you for the review.
On 30.04.11 14.50, Guilhem Bichot wrote:
> Hello Roy,
> Roy Lyseng a écrit, Le 17.03.2011 13:49:
>> #At file:///home/rl136806/mysql/repo/mysql-trunk/ based on
>> 3781 Roy Lyseng 2011-03-17
>> Bug#11764086 Bug#56881: Null left operand to NOT IN in WHERE clause
>> behaves differently than real NULL
>> The query that triggers this problem is an outer join query with
>> a NOT IN subquery predicate in the WHERE clause.
>> One of the tables in the outer join contains only one row, and because
>> it is a MyISAM table, const table optimization is applied to it.
>> Now the problem: not_null_tables() for the NOT IN predicate reports
>> that we can ignore NULL-complemented rows for table child, hence the
>> query is converted from a left join to an inner join. This leads
>> the query executor to omit the row with id 2 from the evaluation.
>> As for why the not_null_tables() result for NOT IN is wrong:
>> NOT IN may return TRUE for a row, even if the left-hand expression
>> to NOT IN is NULL (as in NULL NOT IN (<empty subquery>)).
>> This is what happens when the query executor evaluates the row with
>> id 2 from the parent table: There is no corresponding row in the
>> child table, a NULL-complemented row should be added, and the
>> predicate NULL NOT IN (<empty subquery> is evaluated (and it should
>> return TRUE).
>> The solution to the problem is to implement not_null_tables()
>> for Item_in_optimizer. If the Item_in_subselect member is "top level",
>> meaning that the original query is an IN predicate, return the
>> accumulated not-null information for the object, otherwise
>> (the predicate is NOT IN) return an empty set.
> clear comment, saved me time, thanks
>> === modified file 'sql/item_cmpfunc.cc'
>> --- a/sql/item_cmpfunc.cc 2011-03-17 09:47:50 +0000
>> +++ b/sql/item_cmpfunc.cc 2011-03-17 12:49:15 +0000
>> @@ -1794,6 +1794,22 @@ void Item_in_optimizer::fix_after_pullou
>> const_item_cache&= args->const_item();
>> + For a NOT IN subquery predicate (for which the Item_in_subselect member is
>> + not a \"top level\" item), null values passed from outer tables must be
>> + considered in the evaluation, hence return an empty set of tables.
>> +table_map Item_in_optimizer::not_null_tables() const
>> + if (((Item_in_subselect*)args)->is_top_level_item())
> Nitpick: please use static_cast
>> + return not_null_tables_cache;
> Item_bool_func is the direct parent class of Item_in_optimizer, so I'd change
> the "return" above to:
> return Item_bool_func::not_null_tables();
> So that if one day Item_bool_func::not_null_tables() is changed, this is taken
> into account here.
>> + return 0;
> Nitpick: I'd put comments in a different place: the function's comment normally
> describes the function's purpose. This purpose is described already near the
> declaration of Item::not_null_tables().
> What is not trivial is the logic inside, which, to me, should go into the body,
> close to where the logic is coded, like:
> if (((Item_in_subselect*)args)->is_top_level_item())
> return not_null_tables_cache;
> For a NOT IN subquery predicate (for which the Item_in_subselect
> member is not a \"top level\" item), null values passed from outer
> tables must be considered in the evaluation, hence return an empty set
> of tables.
> return 0;
> In other words, someone reading the doxygen output for class Item_in_optimizer
> should probably not have to read the comment about "For a NOT IN etc".
> I wonder whether we can replace "return 0" with some finer logic. I mean, what
> we want is to express that, for "NOT IN", the outer expression's NULLness does
> not imply that the item evaluates to false/null. This is specifically about the
> outer expression; so instead of 0, we could return
> "Item_bool_func::not_null_tables() minus the tables used in the outer
> expression", like:
> Item_bool_func::not_null_tables() & ~(args->not_null_tables());
> I tested this and subquery_none still passes (which proves nothing).
Yes, I think this is more correct than my original fix.
> There is also something frightening to note: not_null_tables_cache is not, as
> the name suggests, a cache of the return value of not_null_tables(). Depending
> on the class, not_null_tables() may return 0, not_null_tables_cache, or some mix
> (as in the patch). So I suggest adding a comment near the declaration of
> Item_func::not_null_tables_cache, like
> /// union of values returned by not_null_tables() for each argument item
It's not necessarily a union of values. What about simply:
/// Value used in calculation of result of not_null_tables()