Roy Lyseng a écrit, Le 17.03.2011 13:49:
> #At file:///home/rl136806/mysql/repo/mysql-trunk/ based on
> 3781 Roy Lyseng 2011-03-17
> Bug#11764086 Bug#56881: Null left operand to NOT IN in WHERE clause
> behaves differently than real NULL
> The query that triggers this problem is an outer join query with
> a NOT IN subquery predicate in the WHERE clause.
> One of the tables in the outer join contains only one row, and because
> it is a MyISAM table, const table optimization is applied to it.
> Now the problem: not_null_tables() for the NOT IN predicate reports
> that we can ignore NULL-complemented rows for table child, hence the
> query is converted from a left join to an inner join. This leads
> the query executor to omit the row with id 2 from the evaluation.
> As for why the not_null_tables() result for NOT IN is wrong:
> NOT IN may return TRUE for a row, even if the left-hand expression
> to NOT IN is NULL (as in NULL NOT IN (<empty subquery>)).
> This is what happens when the query executor evaluates the row with
> id 2 from the parent table: There is no corresponding row in the
> child table, a NULL-complemented row should be added, and the
> predicate NULL NOT IN (<empty subquery> is evaluated (and it should
> return TRUE).
> The solution to the problem is to implement not_null_tables()
> for Item_in_optimizer. If the Item_in_subselect member is "top level",
> meaning that the original query is an IN predicate, return the
> accumulated not-null information for the object, otherwise
> (the predicate is NOT IN) return an empty set.
clear comment, saved me time, thanks
> === modified file 'sql/item_cmpfunc.cc'
> --- a/sql/item_cmpfunc.cc 2011-03-17 09:47:50 +0000
> +++ b/sql/item_cmpfunc.cc 2011-03-17 12:49:15 +0000
> @@ -1794,6 +1794,22 @@ void Item_in_optimizer::fix_after_pullou
> const_item_cache&= args->const_item();
> + For a NOT IN subquery predicate (for which the Item_in_subselect member is
> + not a \"top level\" item), null values passed from outer tables must be
> + considered in the evaluation, hence return an empty set of tables.
> +table_map Item_in_optimizer::not_null_tables() const
> + if (((Item_in_subselect*)args)->is_top_level_item())
Nitpick: please use static_cast
> + return not_null_tables_cache;
Item_bool_func is the direct parent class of Item_in_optimizer, so I'd
change the "return" above to:
So that if one day Item_bool_func::not_null_tables() is changed, this is
taken into account here.
> + return 0;
Nitpick: I'd put comments in a different place: the function's comment
normally describes the function's purpose. This purpose is described
already near the declaration of Item::not_null_tables().
What is not trivial is the logic inside, which, to me, should go into
the body, close to where the logic is coded, like:
For a NOT IN subquery predicate (for which the Item_in_subselect
member is not a \"top level\" item), null values passed from outer
tables must be considered in the evaluation, hence return an empty set
In other words, someone reading the doxygen output for class
Item_in_optimizer should probably not have to read the comment about
"For a NOT IN etc".
I wonder whether we can replace "return 0" with some finer logic. I
mean, what we want is to express that, for "NOT IN", the outer
expression's NULLness does not imply that the item evaluates to
false/null. This is specifically about the outer expression; so instead
of 0, we could return "Item_bool_func::not_null_tables() minus the
tables used in the outer expression", like:
Item_bool_func::not_null_tables() & ~(args->not_null_tables());
I tested this and subquery_none still passes (which proves nothing).
There is also something frightening to note: not_null_tables_cache is
not, as the name suggests, a cache of the return value of
not_null_tables(). Depending on the class, not_null_tables() may return
0, not_null_tables_cache, or some mix (as in the patch). So I suggest
adding a comment near the declaration of
/// union of values returned by not_null_tables() for each argument item