List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:April 30 2011 12:50pm
Subject:Re: bzr commit into mysql-trunk branch (roy.lyseng:3781) Bug#56881
Bug#11764086
View as plain text  
Hello Roy,

Roy Lyseng a écrit, Le 17.03.2011 13:49:
> #At file:///home/rl136806/mysql/repo/mysql-trunk/ based on
> revid:tor.didriksen@stripped
> 
>  3781 Roy Lyseng	2011-03-17
>       Bug#11764086 Bug#56881: Null left operand to NOT IN in WHERE clause
>       behaves differently than real NULL
>       
>       The query that triggers this problem is an outer join query with
>       a NOT IN subquery predicate in the WHERE clause.
>       One of the tables in the outer join contains only one row, and because
>       it is a MyISAM table, const table optimization is applied to it.
>       
>       Now the problem: not_null_tables() for the NOT IN predicate reports
>       that we can ignore NULL-complemented rows for table child, hence the
>       query is converted from a left join to an inner join. This leads
>       the query executor to omit the row with id 2 from the evaluation.
>       
>       As for why the not_null_tables() result for NOT IN is wrong:
>       NOT IN may return TRUE for a row, even if the left-hand expression
>       to NOT IN is NULL (as in NULL NOT IN (<empty subquery>)).
>       This is what happens when the query executor evaluates the row with
>       id 2 from the parent table: There is no corresponding row in the
>       child table, a NULL-complemented row should be added, and the
>       predicate NULL NOT IN (<empty subquery> is evaluated (and it should
>       return TRUE).
>       
>       The solution to the problem is to implement not_null_tables()
>       for Item_in_optimizer. If the Item_in_subselect member is "top level",
>       meaning that the original query is an IN predicate, return the
>       accumulated not-null information for the object, otherwise
>       (the predicate is NOT IN) return an empty set.

clear comment, saved me time, thanks

> === modified file 'sql/item_cmpfunc.cc'
> --- a/sql/item_cmpfunc.cc	2011-03-17 09:47:50 +0000
> +++ b/sql/item_cmpfunc.cc	2011-03-17 12:49:15 +0000
> @@ -1794,6 +1794,22 @@ void Item_in_optimizer::fix_after_pullou
>    const_item_cache&= args[1]->const_item();
>  }
>  
> +
> +/**
> +  For a NOT IN subquery predicate (for which the Item_in_subselect member is
> +  not a \"top level\" item), null values passed from outer tables must be
> +  considered in the evaluation, hence return an empty set of tables.
> +*/
> +
> +table_map Item_in_optimizer::not_null_tables() const
> +{
> +  if (((Item_in_subselect*)args[1])->is_top_level_item())

Nitpick: please use static_cast

> +    return not_null_tables_cache;

Item_bool_func is the direct parent class of Item_in_optimizer, so I'd 
change the "return" above to:

        return Item_bool_func::not_null_tables();

So that if one day Item_bool_func::not_null_tables() is changed, this is 
taken into account here.

> +
> +  return 0;

Nitpick: I'd put comments in a different place: the function's comment 
normally describes the function's purpose. This purpose is described 
already near the declaration of Item::not_null_tables().
What is not trivial is the logic inside, which, to me, should go into 
the body, close to where the logic is coded, like:

if (((Item_in_subselect*)args[1])->is_top_level_item())
{
   return not_null_tables_cache;
}
/*
   For a NOT IN subquery predicate (for which the Item_in_subselect
   member is not a \"top level\" item), null values passed from outer
   tables must be considered in the evaluation, hence return an empty set
   of tables.
*/
return 0;

In other words, someone reading the doxygen output for class 
Item_in_optimizer should probably not have to read the comment about 
"For a NOT IN etc".

I wonder whether we can replace "return 0" with some finer logic. I 
mean, what we want is to express that, for "NOT IN", the outer 
expression's NULLness does not imply that the item evaluates to 
false/null. This is specifically about the outer expression; so instead 
of 0, we could return "Item_bool_func::not_null_tables() minus the 
tables used in the outer expression", like:
Item_bool_func::not_null_tables() & ~(args[0]->not_null_tables());
I tested this and subquery_none still passes (which proves nothing).

There is also something frightening to note: not_null_tables_cache is 
not, as the name suggests, a cache of the return value of 
not_null_tables(). Depending on the class, not_null_tables() may return 
0, not_null_tables_cache, or some mix (as in the patch). So I suggest 
adding a comment near the declaration of 
Item_func::not_null_tables_cache, like
/// union of values returned by not_null_tables() for each argument item

Thread
bzr commit into mysql-trunk branch (roy.lyseng:3781) Bug#56881 Bug#11764086Roy Lyseng17 Mar
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3781) Bug#56881Bug#11764086Guilhem Bichot30 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3781) Bug#56881Bug#11764086Roy Lyseng2 May
      • Re: bzr commit into mysql-trunk branch (roy.lyseng:3781) Bug#56881Bug#11764086Guilhem Bichot2 May