Hello Roy,
Roy Lyseng a écrit, Le 17.03.2011 13:49:
> #At file:///home/rl136806/mysql/repo/mysql-trunk/ based on
> revid:tor.didriksen@stripped
>
> 3781 Roy Lyseng 2011-03-17
> Bug#11764086 Bug#56881: Null left operand to NOT IN in WHERE clause
> behaves differently than real NULL
>
> The query that triggers this problem is an outer join query with
> a NOT IN subquery predicate in the WHERE clause.
> One of the tables in the outer join contains only one row, and because
> it is a MyISAM table, const table optimization is applied to it.
>
> Now the problem: not_null_tables() for the NOT IN predicate reports
> that we can ignore NULL-complemented rows for table child, hence the
> query is converted from a left join to an inner join. This leads
> the query executor to omit the row with id 2 from the evaluation.
>
> As for why the not_null_tables() result for NOT IN is wrong:
> NOT IN may return TRUE for a row, even if the left-hand expression
> to NOT IN is NULL (as in NULL NOT IN (<empty subquery>)).
> This is what happens when the query executor evaluates the row with
> id 2 from the parent table: There is no corresponding row in the
> child table, a NULL-complemented row should be added, and the
> predicate NULL NOT IN (<empty subquery> is evaluated (and it should
> return TRUE).
>
> The solution to the problem is to implement not_null_tables()
> for Item_in_optimizer. If the Item_in_subselect member is "top level",
> meaning that the original query is an IN predicate, return the
> accumulated not-null information for the object, otherwise
> (the predicate is NOT IN) return an empty set.
clear comment, saved me time, thanks
> === modified file 'sql/item_cmpfunc.cc'
> --- a/sql/item_cmpfunc.cc 2011-03-17 09:47:50 +0000
> +++ b/sql/item_cmpfunc.cc 2011-03-17 12:49:15 +0000
> @@ -1794,6 +1794,22 @@ void Item_in_optimizer::fix_after_pullou
> const_item_cache&= args[1]->const_item();
> }
>
> +
> +/**
> + For a NOT IN subquery predicate (for which the Item_in_subselect member is
> + not a \"top level\" item), null values passed from outer tables must be
> + considered in the evaluation, hence return an empty set of tables.
> +*/
> +
> +table_map Item_in_optimizer::not_null_tables() const
> +{
> + if (((Item_in_subselect*)args[1])->is_top_level_item())
Nitpick: please use static_cast
> + return not_null_tables_cache;
Item_bool_func is the direct parent class of Item_in_optimizer, so I'd
change the "return" above to:
return Item_bool_func::not_null_tables();
So that if one day Item_bool_func::not_null_tables() is changed, this is
taken into account here.
> +
> + return 0;
Nitpick: I'd put comments in a different place: the function's comment
normally describes the function's purpose. This purpose is described
already near the declaration of Item::not_null_tables().
What is not trivial is the logic inside, which, to me, should go into
the body, close to where the logic is coded, like:
if (((Item_in_subselect*)args[1])->is_top_level_item())
{
return not_null_tables_cache;
}
/*
For a NOT IN subquery predicate (for which the Item_in_subselect
member is not a \"top level\" item), null values passed from outer
tables must be considered in the evaluation, hence return an empty set
of tables.
*/
return 0;
In other words, someone reading the doxygen output for class
Item_in_optimizer should probably not have to read the comment about
"For a NOT IN etc".
I wonder whether we can replace "return 0" with some finer logic. I
mean, what we want is to express that, for "NOT IN", the outer
expression's NULLness does not imply that the item evaluates to
false/null. This is specifically about the outer expression; so instead
of 0, we could return "Item_bool_func::not_null_tables() minus the
tables used in the outer expression", like:
Item_bool_func::not_null_tables() & ~(args[0]->not_null_tables());
I tested this and subquery_none still passes (which proves nothing).
There is also something frightening to note: not_null_tables_cache is
not, as the name suggests, a cache of the return value of
not_null_tables(). Depending on the class, not_null_tables() may return
0, not_null_tables_cache, or some mix (as in the patch). So I suggest
adding a comment near the declaration of
Item_func::not_null_tables_cache, like
/// union of values returned by not_null_tables() for each argument item