Hello,
Ole John Aske a écrit, Le 11.01.2011 10:00:
> #At file:///net/fimafeng09/export/home/tmp/oleja/mysql/mysql-5.1/ based on
> revid:nirbhay.choubey@stripped
>
> 3534 Ole John Aske 2011-01-11
> Fix for bug#58490, 'Incorrect result in multi level OUTER JOIN in combination
> with IS NULL'
>
> As this bug is a duplicate of bug#49322, it also include tescases
> covering this bugreport
>
> Qualifying an OUTER JOIN with the condition 'WHERE <column> IS NULL',
> where <column> is declared as 'NOT NULL' causes the
> 'not_exists_optimize' to be enabled by the optimizer.
>
> In evaluate_join_record() the 'not_exists_optimize' caused
> 'NESTED_LOOP_NO_MORE_ROWS' to be returned immediately
> when a matching row was found.
>
> However, as the 'not_exists_optimize' is derived from
> 'JOIN_TAB::select_cond', the usual rules for condition guards
> also applies for 'not_exist_optimize'. It is therefore incorrect
> to check 'not_exists_optimize' without ensuring that all guards
> protecting it is 'open'.
good explanation
> This fix uses the fact that 'not_exists_optimize' is derived from
> a 'is_null' condition term in 'tab->select_cond'. Futhrermore,
> 'is_null' will evaluate to 'false' for any 'non-null' rows
> once all guards protecting the is_null is open.
>
> We can use this knowledge as an implicit guard check for the
> 'not_exists_optimize' by moving 'if (...not_exists_optimize)'
> inside the handling of 'select_cond==false'. It will then
> not take effect before its guards are open.
>
> We also add an assert which requires that a
> 'not_exists_optimize' always comes together with
> a select_cond. (containing 'is_null').
>
> === modified file 'mysql-test/r/join_outer.result'
> --- a/mysql-test/r/join_outer.result 2010-10-29 08:23:06 +0000
> +++ b/mysql-test/r/join_outer.result 2011-01-11 09:00:00 +0000
> @@ -1427,4 +1427,146 @@ WHERE t1.f1 = 4 AND t2.f1 IS NOT NULL AN
> GROUP BY t2.f1, t2.f2;
> f1 f1 f2
> DROP TABLE t1,t2;
> +#
> +# Bug#58490: Incorrect result in multi level OUTER JOIN
> +# in combination with IS NULL
> +#
> +CREATE TABLE t1 (i INT NOT NULL);
> +INSERT INTO t1 VALUES (0), (2),(3),(4);
> +CREATE TABLE t2 (i INT NOT NULL);
> +INSERT INTO t2 VALUES (0),(1), (3),(4);
> +CREATE TABLE t3 (i INT NOT NULL);
> +INSERT INTO t3 VALUES (0),(1),(2), (4);
> +CREATE TABLE t4 (i INT NOT NULL);
> +INSERT INTO t4 VALUES (0),(1),(2),(3) ;
> +SELECT * FROM
> +t1 LEFT JOIN
> +( t2 LEFT JOIN
> +( t3 LEFT JOIN
> +t4
> +ON t4.i = t3.i
> +)
> +ON t3.i = t2.i
> +)
> +ON t2.i = t1.i
I didn't check correctness of all queries' results. I hope you did.
> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc 2010-12-28 23:47:05 +0000
> +++ b/sql/sql_select.cc 2011-01-11 09:00:00 +0000
> @@ -11500,17 +11500,40 @@ evaluate_join_record(JOIN *join, JOIN_TA
> first_unmatched->found= 1;
> for (JOIN_TAB *tab= first_unmatched; tab <= join_tab; tab++)
> {
> - if (tab->table->reginfo.not_exists_optimize)
> - return NESTED_LOOP_NO_MORE_ROWS;
> /* Check all predicates that has just been activated. */
> /*
> Actually all predicates non-guarded by first_unmatched->found
> will be re-evaluated again. It could be fixed, but, probably,
> it's not worth doing now.
> */
> + /*
> + not_exists_optimize has been created from a
> + select_cond containing 'is_null'. This 'is_null'
> + condition is still present on any 'tab' with
> + 'not_exists_optimize'. Furthermore, the usual rules
> + for condition guards also applies for
> + 'not_exists_optimize' -> When 'is_null==false' we
> + know all cond. guards are open and we can apply
> + the 'not_exists_optimize'.
> + */
> + DBUG_ASSERT(!(tab->table->reginfo.not_exists_optimize &&
> + !tab->select_cond));
This assertion is probably true.
However, imagine we have
select * from t1 left join t2 on 1 where t2.a is null and t1.a=t2.a;
where t2.a is declared NOT NULL and has an index.
The query above is converted to inner join, which ruins my
demonstration, but if it were not converted it would be processed like this:
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL |
NULL | 1 | Using where |
| 1 | SIMPLE | t2 | index | NULL | a | 4 |
NULL | 1 | Using where; Using index; Not exists |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------------------+
and the conditions attached to tables (tab->select_cond) would be:
- for t1:
"isnull(`test`.`t1`.`a`)"
- for t2:
"(trigcond_if(found_match(t2), ((`test`.`t2`.`a` = `test`.`t1`.`a`) and
(trigcond_if(is_not_null_compl(t2), 1, true) and
trigcond_if(is_not_null_compl(t2), 1, true))), true) and
trigcond_if(is_not_null_compl(t2), 1, true))"
We can see that nothing in t2's condition is about NOT NULL, though t2
as "not exists". NOT NULL was moved to t1. The assertion would still
pass because it is not 0 due to t2.a=t1.a.
What I want to show here is that it's not so straightforward that "this
'is_null' condition is still present on any 'tab' with
'not_exists_optimize'": it could have been moved to another earlier
table by the optimizer, as in t1 above.
But as it seems practically impossible to create a case where the
assertion fails, let's keep it.
> +
> if (tab->select_cond && !tab->select_cond->val_int())
> {
> /* The condition attached to table tab is false */
> +
> + if (tab->table->reginfo.not_exists_optimize)
> + {
This new code follows from "When 'is_null==false' we
know all cond. guards are open and we can apply
the 'not_exists_optimize'."
But what we know is that "tab->select_cond->val_int() is false".
Without much reasoning, tab->select_cond could be made of
cond1 AND trigcond(tab->found, tab.field IS NULL)
and cond1 could be false, and tab->found could be false, and thus,
tab->select_cond could be false while IS NULL - and thus
"not_exists_optimize" - would not be applicable yet. "is_null" would not
be false yet.
But then:
- cond1 would be _newly_ false; otherwise even the first evaluation of
tab->select_cond->val_int() at start of evaluate_join_record(), in this
stack frame or in upper frames for upper tables, would have been false,
so we wouldn't come into the while() loop above.
- This value change would come from one or more "found" guard having
changed to "true" (not_null_compl guards are very probably not playing a
role here, there are for ON conditions).
- Last, cond1 and IS NULL are probably guarded by _the_same_ "found"
guard, because they are probably pieces of the WHERE and all WHERE
pieces pertaining to a table probably get the same "found" guard. So if
cond1 is enabled it's probably true that IS NULL is enabled too, so "not
exists" is applicable...
Given the number of "probably", I believe it will take a long time
before we see this new code break (if this happens).
So, ok to push. If possible, please try to explain in a comment why
tab->select_cond->val_int() being false implies that the _IS_NULL_ piece
of it is enabled.
For what it's worth, I am not so happy with how not_exists lives a life
so separate from cond guards and the Item implementing NOT NULL. Makes
it hard to be 100% sure of when it's applicable. And when later we
change something in how the optimizer handles NOT NULL, we will have to
remember to rethink "not exists" :-(
> + /*
> + When not_exists_optimize is set: No need to further
> + explore more records of 'tab' for this partial result.
> + Any found 'tab' matches are known to evaluate to 'false'.
> + Returning .._NO_MORE_ROWS will skip rem. 'tab' records.
> + */
> + return NESTED_LOOP_NO_MORE_ROWS;
> + }
> +
> if (tab == join_tab)
> found= 0;
> else
>
>
>
> ------------------------------------------------------------------------
>
>
--
Mr. Guilhem Bichot <guilhem.bichot@stripped>
Oracle / MySQL / Optimizer team, Lead Software Engineer
Bordeaux, France
www.oracle.com / www.mysql.com