List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:December 16 2010 9:58pm
Subject:Re: bzr commit into mysql-5.1 branch (ole.john.aske:3477) Bug#58490
View as plain text  
Hello,

Ole John Aske a écrit, Le 02.12.2010 13:09:
> #At file:///net/fimafeng09/export/home/tmp/oleja/mysql/mysql-5.1/ based on
> revid:georgi.kodinov@stripped
> 
>  3477 Ole John Aske	2010-12-02
>       Updated fix for bug#58490, 'Incorrect result in multi level OUTER JOIN in
> combination with IS NULL'
>       
>       After some more testing, and discussion with Roy L. which has also tested my
> fix on the next-mr branch
>       the DBUG_ASSERT(rc==NESTED_LOOP_OK) a line ~11553 was removed.
>       This was replaced by 'if (rc == NESTED_LOOP_NO_MORE_ROWS) found= false;'.
>       
>       Furthermore testcases for the duplicated bug#49332 has been added.
>       
>       
>       --- original description ---
>       Qualifying an OUTER JOIN with '<column> IS NULL' where <column> is
> declared as 'NOT NULL' 
>       causes the 'not_exists_optimize' to be enabled.
>             
>       In evaluate_join_record() the 'not_exists_optimize' caused
> 'NESTED_LOOP_NO_MORE_ROWS' to be 
>       returned immediately when a matching row was found.
>             
>       This happened before all 'first_unmatched->found' had been properly
> evaluated for all 
>       embedding outer joins. This may cause incorrect calls to 
>       evaluate_null_complemented_join_record() after we had returned back to
> sub_select().
>             
>       This fix ensures that evaluate_join_record() completes the itteration of the
> 'first_unmatched-loop',
>       evaluates all conditions for the unmatched JOIN_TAB's and set
> 'first_unmatched->found' properly before
>       possibly returning NESTED_LOOP_NO_MORE_ROWS iff 'not_exists_optimize' was in
> effect.


> === modified file 'mysql-test/t/join_outer.test'
> --- a/mysql-test/t/join_outer.test	2010-10-29 08:23:06 +0000
> +++ b/mysql-test/t/join_outer.test	2010-12-02 12:09:47 +0000
> +## Bug#49332 & bug#58490 are duplicates. However, we include testcases

number should be 49322

> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc	2010-10-29 08:23:06 +0000
> +++ b/sql/sql_select.cc	2010-12-02 12:09:47 +0000
> @@ -11473,6 +11473,7 @@ evaluate_join_record(JOIN *join, JOIN_TA
>        return NESTED_LOOP_ERROR;
>    }
>  
> +  enum enum_nested_loop_state rc= NESTED_LOOP_OK;
>    if (!select_cond || select_cond_result)
>    {
>      /*
> @@ -11496,7 +11497,8 @@ evaluate_join_record(JOIN *join, JOIN_TA
>        for (JOIN_TAB *tab= first_unmatched; tab <= join_tab; tab++)
>        {
>          if (tab->table->reginfo.not_exists_optimize)
> -          return NESTED_LOOP_NO_MORE_ROWS;
> +          rc= NESTED_LOOP_NO_MORE_ROWS;
> +
>          /* Check all predicates that has just been activated. */
>          /*
>            Actually all predicates non-guarded by first_unmatched->found
> @@ -11515,7 +11517,7 @@ evaluate_join_record(JOIN *join, JOIN_TA
>                not to the last table of the current nest level.
>              */
>              join->return_tab= tab;
> -            return NESTED_LOOP_OK;
> +            return rc;
>            }
>          }
>        }
> @@ -11530,6 +11532,16 @@ evaluate_join_record(JOIN *join, JOIN_TA
>      }
>  
>      /*
> +     Setting NESTED_LOOP_NO_MORE_ROWS (if not_exists_optimize) 
> +     also implies a 'not found' condition. However we could not
> +     set this inside the  loop above as it would prematurely
> +     have terminated the 'first_unmatched' / 'first_unmatched->found'
> +     calculations above.
> +    */
> +    if (rc == NESTED_LOOP_NO_MORE_ROWS)
> +      found= false;
> +
> +    /*
>        It was not just a return to lower loop level when one
>        of the newly activated predicates is evaluated as false
>        (See above join->return_tab= tab).
> @@ -11541,7 +11553,6 @@ evaluate_join_record(JOIN *join, JOIN_TA
>  
>      if (found)
>      {
> -      enum enum_nested_loop_state rc;
>        /* A match from join_tab is found for the current partial join. */
>        rc= (*join_tab->next_select)(join, join_tab+1, 0);
>        if (rc != NESTED_LOOP_OK && rc != NESTED_LOOP_NO_MORE_ROWS)
> @@ -11569,7 +11580,7 @@ evaluate_join_record(JOIN *join, JOIN_TA
>      join->thd->row_count++;
>      join_tab->read_record.unlock_row(join_tab);
>    }
> -  return NESTED_LOOP_OK;
> +  return rc;
>  }

After spending days analyzing the code of outer joins, I am 90% sure
that your patch is correct, so ok to push. Maybe some more documentation
would be needed though.
Let's take the 4-table testcase of BUG#58490. In it, this condition:

"(((trigcond_if(found_match(t4), trigcond_if(found_match(t3),
trigcond_if(found_match(t2), isnull(t4.i), true), true), true)"

is attached to table t4 (together with some AND-ed conditions which are
irrelevant; I pasted only what is about "t4.i IS NULL").

trigcond_if(a,b,c) is defined as "if a is true then return b else return
c". It is Item_func_trig_cond.
"found_match" is the "found" member of JOIN_TAB. "not_null_compl" is the
"not_null_compl" member of JOIN_TAB.

Assume we didn't have the "not exists" optimization, i.e. we remove:
         if (tab->table->reginfo.not_exists_optimize)
           DBUG_RETURN(NESTED_LOOP_NO_MORE_ROWS);
from code. I'm using 5.1-bugteam.

Then it's the long condition above which makes sure to eliminate rows
not matching "t4.i IS NULL".
We see that this condition gets enabled only when we have
"j->found==true" for j = t2 and t3 and t4.
We go into the while() loop:
"while (join_tab->first_unmatched && found)".
- First iteration of the loop:
t4->first_unmatched is t4, t4->found is set to true, t3->found and
t2->found are still false, condition on t4 is still disabled. We set
t4->first_unmatched to t3 and loop
- Second iteration:
t4->first_unmatched is t3, t3->found is set to true, t4->found is true,
t2->found is still false, condition on t4 is still disabled. We set
t4->first_unmatched to t2 and loop
- Third iteration:
t4->first_unmatched is t2, t2->found is set to true, t4->found and
t3->found are true, condition on t4 is enabled and false (because t4.i
is not null): we come into
         if (tab->select_cond && !tab->select_cond->val_int())
         {
           /* The condition attached to table tab is false */
           if (tab == join_tab)
             found= 0;    //<<<<<<<<<<<< here
then we set t4->first_unmatched to NULL, and we break out of the while()
(both because of this NULL and of found==0). So the function,
(evaluate_join_record(t4)) returns NESTED_LOOP_OK, and its caller goes
to the next row of t4.
What I want to say is: first we set the j->found for all tables, then we
evaluate the IS NULL condition and terminate the loop. Before j->found
is set for all tables, IS NULL is ignored, because it is inapplicable.
The unpatched code for "not exists" rather "evaluates the
IS NULL condition first" (it tests "reginfo.not_exists_optimize" first
and then returns immediately). That is, it evaluates IS NULL when it's
not yet applicable (because j->found is not true for all tables), hence
the bug.
Now, let's look at the code after your patch.
It tests "reginfo.not_exists_optimize" when IS NULL is not yet
applicable (which is still fishy), but it only uses the result to set rc 
to NESTED_LOOP_NO_MORE_ROWS. So far this does no harm (rc is not yet used).
The only possible harm can start at:

     /*
      Setting NESTED_LOOP_NO_MORE_ROWS (if not_exists_optimize)
      also implies a 'not found' condition. However we could not
      set this inside the  loop above as it would prematurely
      have terminated the 'first_unmatched' / 'first_unmatched->found'
      calculations above.
     */
     if (rc == NESTED_LOOP_NO_MORE_ROWS)
       found= false;

If we come to this if() above, it means we left the while() due to a 
false while() condition. This implies that either
join->first_unmatched is NULL
or
found==0.
If join->first_unmatched is NULL, it means we went up to the top (to the
first inner table of the topmost join nest containing the table with
not_exists_optimize==true); this trip has surely set j->found for all
tables and thus evaluated the IS NULL condition of t4 (the one with
trigcond_if), and as that condition evaluated to false (because the
column is NOT NULL as guaranteed by not_exists_optimize==true), "found"
has been set to 0 (tab==join_tab in the for() loop).
So my impression is that the above if() could be replaced with:
     if (rc == NESTED_LOOP_NO_MORE_ROWS)
       DBUG_ASSERT(!found);
or
   DBUG_ASSERT(rc != NESTED_LOOP_NO_MORE_ROWS || !found);

Then we come to the rest of the function; "found" being false, it boils
down to returning rc, i.e. NESTED_LOOP_NO_MORE_ROWS. Without the
not_exists optimization, we would have returned NESTED_LOOP_OK.
Is it sure that returning NESTED_LOOP_NO_MORE_ROWS is correct? I think
so: if instead we return NESTED_LOOP_OK, sub_select(t4) will read the
next row from t4 and call evaluate_join_record() on it. In that 
evaluate_join_record() call, j->found is true for all tables, so in 
join_tab->select_cond we have the enabled trigcond_if IS NULL condition, 
which is false (as again the value is NOT NULL).
So it sounds ok to just return NESTED_LOOP_NO_MORE_ROWS.
To sum up, after your patch, the first row does undergo the normal IS 
NULL test, so "not exists" brings nothing for the first row, and then 
"not exists" kicks in to make sure that we don't go into other rows of 
the same table. This makes a less invasive optimization as before your 
patch.

Thus I think your patch implements correct logic. I just would like 
those changes:
- using the DBUG_ASSERT() instead of setting found to 0.
- rephrasing/adding comments trying to incorporate some observations 
from above. Maybe a comment like this
  if (tab->table->reginfo.not_exists_optimize)
  {
/*
There is a WHERE ... IS NULL condition which this tab cannot possibly 
satisfy. At this stage this condition may not be enabled yet, due to 
trig_cond conditions on the 'found' members of JOIN_TABs and those 
members not all being true now. So just mark that we should skip 
evaluation of next rows of 'tab', after this row's complete 
evaluation is finished.
    */
    rc= NESTED_LOOP_NO_MORE_ROWS;
  }

and in the assertion:
     if (rc == NESTED_LOOP_NO_MORE_ROWS)
     {
/*
We come here due to "not exists" optimization.
'found' cannot be true, or we would have left the while() loop due to 
join_first_unmatched being NULL, which would imply that all 
JOIN_TAB::found trigger variables for IS NULL have been set to true, so 
condition IS NULL for last inner table has been evaluated, thus to 
'false', which set 'found' to 'false'.
In other words, either the IS NULL condition has been evaluated to 
'false' or it couldn't be evaluated because some other condition has 
been evaluated to false. In any case the row is not a match.
*/
       DBUG_ASSERT(!found);
     }
I understand that we shouldn't add too many comments for something, like 
"not exists", which  is just an optimization, so maybe you will not want 
to add them all and I can understand that. However, I think that without 
comments, it will be hard to understand, a year or two from now, why 
this assertion is correct. So maybe you can find the right balance. Or 
maybe just put a short comment with a reference to a long comment:
  if (tab->table->reginfo.not_exists_optimize)
  {
    // see NotExists in this file
    rc= NESTED_LOOP_NO_MORE_ROWS;
  }

and in the assertion:
     if (rc == NESTED_LOOP_NO_MORE_ROWS)
     {
      // see NotExists in this file
       DBUG_ASSERT(!found);
     }
and put the big comments in text at the end of the file:
/*
   NotExists optimization.
   Turned on when:
   - WHERE clause contains 'field IS NULL' (possibly AND-ed with other 
conditions)
   - 'field' belongs to a table which is alone on the inner side of an 
outer join (see BUG#58928)
   Execution:
   - comments about 'if (tab->table->reginfo.not_exists_optimize)'
   and about the assertion.
*/
Maybe this "relocated big comment" is the best way. I find that this has 
more chances to be read later than commit comments, which disappear over 
time in the bzr history.

Regarding how to merge it into trunk. After writing the lines above, I 
realize that your original patch had the DBUG_ASSERT, but you had to 
remove it because it fired in trunk. I know why. In 5.1 and 5.5, 
evaluate_join_record() and evaluate_null_complemented_record() are twin 
functions with duplicate code. In 5.6 I made the latter call the former: 
the latter just prepares the NULL-complemented row, and passes it to the 
former for evaluation of conditions. The problem is that when evaluating 
the NULL-complemented row, the former may go into
         if (tab->table->reginfo.not_exists_optimize)
           rc= NESTED_LOOP_NO_MORE_ROWS;
This happens in the testcase of BUG#58490, when we have found no match 
for t3 and decide to build a row (row from t1, row from t2, NULL, NULL): 
  evaluate_null_complemented_record(t3) builds the row, then 
evaluate_join_record(t4) evaluates it: join_tab is t4, 
t4->first_unmatched is t2 (because t3 is considered as having a match, 
more precisely because any WHERE can now be applied to t3). We enter:
     while (join_tab->first_unmatched && found)
     {
       JOIN_TAB *first_unmatched= join_tab->first_unmatched;
       first_unmatched->found= 1;
       for (JOIN_TAB *tab= first_unmatched; tab <= join_tab; tab++)
       {
         if (tab->table->reginfo.not_exists_optimize)
           rc= NESTED_LOOP_NO_MORE_ROWS;
which sets rc. Then we evaluate, in this and further iterations of the 
while(), the ON conditions on t2 and t1, which are satisfied, so "found" 
stays true.
That's how we can have both "found" true and rc==NESTED_LOOP_NO_MORE_ROWS.
I think it is my bug, and this code should be:
         if (tab->table->reginfo.not_exists_optimize &&
tab->not_null_compl)
           rc= NESTED_LOOP_NO_MORE_ROWS;
which is, fortunately, what Roy proposes in his '"radical" rewrite' 
email (just read his proposal to see whether he had suggested this too, 
and yes! gives more confidence when people reach the same conclusion 
separately :-)

But even with this correction, subquery_mat_all.test fails in 5.6:
@@ -273,8 +273,6 @@
  (a1, a2) in (select c1, c2 from t3i
  where (c1, c2) in (select b1, b2 from t2i where b2 > '0'));
  a1	a2
-1 - 01	2 - 01
-1 - 02	2 - 02
  explain extended
  select * from t1
  where (a1, a2) in (select b1, b2 from t2
@@ -366,7 +364,6 @@
  where (c1, c2) in (select b1, b2 from t2i where b2 > '0')));
  a1	a2
  1 - 02	2 - 02
-1 - 01	2 - 01
  explain extended
  select * from t1
  where (a1, a2) in (select * from t1 where a1 > '0' UNION select * from 
t2 where b1 < '9') and

Even if I put in the new patch's code:
     if (rc==NESTED_LOOP_NO_MORE_ROWS)
       found=0;
which would be incorrect I believe (for 5.6), the test still fails. And 
I don't know why.

So my suggestion is: keep the assert in 5.1 and 5.5 and close the bug; 
file a new bug for 5.6 and assign it to me (or any other admin way you 
like).

If I don't forget, I will later do a change which will save a useless 
test: we have this code in evaluate_join_record():

   bool select_cond_result= TRUE;

   if (error > 0 || (join->thd->is_error()))     // Fatal error
     return NESTED_LOOP_ERROR;
   if (error < 0)
     return NESTED_LOOP_NO_MORE_ROWS;
   if (join->thd->killed)			// Aborted by user
   {
     join->thd->send_kill_message();
     return NESTED_LOOP_KILLED;               /* purecov: inspected */
   }
   DBUG_PRINT("info", ("select cond 0x%lx", (ulong)select_cond));

   if (select_cond)
   {
     select_cond_result= test(select_cond->val_int());

     /* check for errors evaluating the condition */
     if (join->thd->is_error())
       return NESTED_LOOP_ERROR;
   }

   if (!select_cond || select_cond_result)
   {
      ...

I claim that the test above:
   if (!select_cond || select_cond_result)
can be replaced with
   if (select_cond_result)
because:
a) if (select_cond_result) is true then (!select_cond || 
select_cond_result) is true too
b) if (!select_cond || select_cond_result) is true, then either:
b1) (select_cond_result) is true, or
b2) (!select_cond) is true, which means that the if(select_cond) was not 
entered, so select_cond_result is still at its initial value, which is TRUE.
Thread
bzr commit into mysql-5.1 branch (ole.john.aske:3477) Bug#58490Ole John Aske2 Dec
  • Re: bzr commit into mysql-5.1 branch (ole.john.aske:3477) Bug#58490Guilhem Bichot16 Dec
    • Re: bzr commit into mysql-5.1 branch (ole.john.aske:3477) Bug#58490Roy Lyseng17 Dec