List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:May 2 2011 10:21am
Subject:Re: bzr commit into mysql-trunk branch (roy.lyseng:3781) Bug#56881
Bug#11764086
View as plain text  
Hi Guilhem,

thank you for the review.

On 30.04.11 14.50, Guilhem Bichot wrote:
> Hello Roy,
>
> Roy Lyseng a écrit, Le 17.03.2011 13:49:
>> #At file:///home/rl136806/mysql/repo/mysql-trunk/ based on
>> revid:tor.didriksen@stripped
>>
>> 3781 Roy Lyseng 2011-03-17
>> Bug#11764086 Bug#56881: Null left operand to NOT IN in WHERE clause
>> behaves differently than real NULL
>> The query that triggers this problem is an outer join query with
>> a NOT IN subquery predicate in the WHERE clause.
>> One of the tables in the outer join contains only one row, and because
>> it is a MyISAM table, const table optimization is applied to it.
>> Now the problem: not_null_tables() for the NOT IN predicate reports
>> that we can ignore NULL-complemented rows for table child, hence the
>> query is converted from a left join to an inner join. This leads
>> the query executor to omit the row with id 2 from the evaluation.
>> As for why the not_null_tables() result for NOT IN is wrong:
>> NOT IN may return TRUE for a row, even if the left-hand expression
>> to NOT IN is NULL (as in NULL NOT IN (<empty subquery>)).
>> This is what happens when the query executor evaluates the row with
>> id 2 from the parent table: There is no corresponding row in the
>> child table, a NULL-complemented row should be added, and the
>> predicate NULL NOT IN (<empty subquery> is evaluated (and it should
>> return TRUE).
>> The solution to the problem is to implement not_null_tables()
>> for Item_in_optimizer. If the Item_in_subselect member is "top level",
>> meaning that the original query is an IN predicate, return the
>> accumulated not-null information for the object, otherwise
>> (the predicate is NOT IN) return an empty set.
>
> clear comment, saved me time, thanks
>
>> === modified file 'sql/item_cmpfunc.cc'
>> --- a/sql/item_cmpfunc.cc 2011-03-17 09:47:50 +0000
>> +++ b/sql/item_cmpfunc.cc 2011-03-17 12:49:15 +0000
>> @@ -1794,6 +1794,22 @@ void Item_in_optimizer::fix_after_pullou
>> const_item_cache&= args[1]->const_item();
>> }
>>
>> +
>> +/**
>> + For a NOT IN subquery predicate (for which the Item_in_subselect member is
>> + not a \"top level\" item), null values passed from outer tables must be
>> + considered in the evaluation, hence return an empty set of tables.
>> +*/
>> +
>> +table_map Item_in_optimizer::not_null_tables() const
>> +{
>> + if (((Item_in_subselect*)args[1])->is_top_level_item())
>
> Nitpick: please use static_cast

Done.

>
>> + return not_null_tables_cache;
>
> Item_bool_func is the direct parent class of Item_in_optimizer, so I'd change
> the "return" above to:
>
> return Item_bool_func::not_null_tables();
>
> So that if one day Item_bool_func::not_null_tables() is changed, this is taken
> into account here.

Good idea.
>
>> +
>> + return 0;
>
> Nitpick: I'd put comments in a different place: the function's comment normally
> describes the function's purpose. This purpose is described already near the
> declaration of Item::not_null_tables().
> What is not trivial is the logic inside, which, to me, should go into the body,
> close to where the logic is coded, like:
>
> if (((Item_in_subselect*)args[1])->is_top_level_item())
> {
> return not_null_tables_cache;
> }
> /*
> For a NOT IN subquery predicate (for which the Item_in_subselect
> member is not a \"top level\" item), null values passed from outer
> tables must be considered in the evaluation, hence return an empty set
> of tables.
> */
> return 0;
>
> In other words, someone reading the doxygen output for class Item_in_optimizer
> should probably not have to read the comment about "For a NOT IN etc".

Done.
>
> I wonder whether we can replace "return 0" with some finer logic. I mean, what
> we want is to express that, for "NOT IN", the outer expression's NULLness does
> not imply that the item evaluates to false/null. This is specifically about the
> outer expression; so instead of 0, we could return
> "Item_bool_func::not_null_tables() minus the tables used in the outer
> expression", like:
> Item_bool_func::not_null_tables() & ~(args[0]->not_null_tables());
> I tested this and subquery_none still passes (which proves nothing).

Yes, I think this is more correct than my original fix.

>
> There is also something frightening to note: not_null_tables_cache is not, as
> the name suggests, a cache of the return value of not_null_tables(). Depending
> on the class, not_null_tables() may return 0, not_null_tables_cache, or some mix
> (as in the patch). So I suggest adding a comment near the declaration of
> Item_func::not_null_tables_cache, like
> /// union of values returned by not_null_tables() for each argument item

It's not necessarily a union of values. What about simply:

/// Value used in calculation of result of not_null_tables()

Thanks,
Roy
Thread
bzr commit into mysql-trunk branch (roy.lyseng:3781) Bug#56881 Bug#11764086Roy Lyseng17 Mar
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3781) Bug#56881Bug#11764086Guilhem Bichot30 Apr
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3781) Bug#56881Bug#11764086Roy Lyseng2 May
      • Re: bzr commit into mysql-trunk branch (roy.lyseng:3781) Bug#56881Bug#11764086Guilhem Bichot2 May