List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:August 11 2010 3:26pm
Subject:Re: bzr commit into mysql-next-mr-opt-backporting branch (guilhem:3208)
Bug#54437
View as plain text  
Hello again,

Roy Lyseng a écrit, Le 11.08.2010 16:25:
>>> It does not seem that any of the tests invoke the FirstMatch 
>>> strategy. It
>>> would be good to have coverage for that situation (but not required).
>>
>> good idea; will try and let you know.

Done: by adding two more "1" rows into t2, I made the optimizer prefer 
strategies in the following best-to-worse order: 
materialization-firstmatch-dupsweedout. So subquery_sj_firstmatch now 
uses a firstmatch plan for the query and exhibits the bug as desired (8 
rows instead of 2, before the code fix). subquery_sj_dupsweed uses dups 
weedout and exhibits the bug too (no change).

>>>> === modified file 'sql/sql_select.cc'
>>>> --- a/sql/sql_select.cc 2010-06-17 09:31:07 +0000
>>>> +++ b/sql/sql_select.cc 2010-06-28 20:43:32 +0000
>>>> @@ -16992,7 +16992,15 @@ evaluate_join_record(JOIN *join, JOIN_TA
>>>> for (JOIN_TAB *tab= first_unmatched; tab<= join_tab; tab++)
>>>> {
>>>> if (tab->table->reginfo.not_exists_optimize)
>>>> + {
>>>> + /*
>>>> + We have a match in 'tab' but wanted none: no need to explore this
>>>
>>> Suggest rewrite; "We have a match in 'tab' but wanted none: " ->
>>> "We have a match in 'tab' that causes this condition to evaluate to 
>>> FALSE: "
>>
>> What condition would "this condition" mean?
>> not_exists_optimize means that the optimizer has determined, long ago, 
>> that
>> there is a condition so that if there is a matching row in "tab", the 
>> row of
>> tab-1 should not be returned. Example:
>> select * from t1 left join t2 on <condition>
>> where <blah> and t2.a is null;
>> (assuming t2.a is declared as NOT NULL). In this example, if a 
>> matching row is
>> found in t2, t2.a is surely not null, so this matching row is 
>> rejected, and as
>> there is a matching row there will be no null-complemented rows, so 
>> this row of
>> t1 cannot be in the result set.
>> So, I find that at the place where I write the comment, the user 
>> cannot easily
>> see what condition I would be talking about.
>> What do you think?
> 
> I was not completely sure about what to write, so I defaulted to "this 
> condition". AFAIU, it is the condition attached to the current join_tab 
> (which defaults to TRUE if pointer is NULL), but I am not sure on how to 
> express that so it will be understood.

I propose: "it was determined earlier that if a matching row is found, 
the outer row should be excluded from the result set: no need to explore 
etc".

>>>> + record and other records of 'tab', so we return "no records". But
>>>> + as we set 'found' above, evaluate_join_record() at the upper level
>>>> + will not yield a NULL-complemented record.
>>>> + */
>>>> DBUG_RETURN(NESTED_LOOP_NO_MORE_ROWS);
>>>> + }
>>>> /* Check all predicates that has just been activated. */
>>>> /*
>>>> Actually all predicates non-guarded by first_unmatched->found
>>>> @@ -17165,38 +17173,23 @@ evaluate_null_complemented_join_record(J
>>>
>>> Suggestion: Take the first comment in the code body and create a doxygen
>>> parameter comment out of it.
>>
>> I could not find the comment in question: which "first comment" did 
>> you mean?
>> could you please paste it here?
> 
> I meant this:
>   /*
>     The table join_tab is the first inner table of a outer join operation
>     and no matches has been found for the current outer row.
>   */

thanks, done.

> 
> The first part of this should go as the parameter documentation. The 
> second part is just repeating the function comment.

so I removed the second part.

> 
>>
>>>> }
>>>> join_tab--;
>>>
>>> Suggestion: The above line would be more explicit with
>>> join_tab= last_inner_tab;
>>
>> done
>>
>>>> /*
>>>> - The row complemented by nulls might be the first row
>>>> - of embedding outer joins.
>>>> - If so, perform the same actions as in the code
>>>> - for the first regular outer join row above.
>>>> + From the point of view of the rest of execution, this record matches
>>>> + (it has been built and satisfies conditions, no need to do more 
>>>> evaluation
>>>> + on it). See similar code in evaluate_join_record().
>>>> */
>>>> - for ( ; ; )
>>>> - {
>>>> - JOIN_TAB *first_unmatched= join_tab->first_unmatched;
>>>> - if ((first_unmatched= first_unmatched->first_upper)&&
>>>> - first_unmatched->last_inner != join_tab)
>>>> - first_unmatched= 0;
>>>> - join_tab->first_unmatched= first_unmatched;
>>>> - if (!first_unmatched)
>>>> - break;
>>>> - first_unmatched->found= 1;
>>>> - for (JOIN_TAB *tab= first_unmatched; tab<= join_tab; tab++)
>>>> - {
>>>> - if (tab->select_cond&& !tab->select_cond->val_int())
>>>> - {
>>>> - join->return_tab= tab;
>>>> - DBUG_RETURN(NESTED_LOOP_OK);
>>>> - }
>>>> - }
>>>> - }
>>>> + JOIN_TAB *first_unmatched= join_tab->first_unmatched;
>>>> + if ((first_unmatched= first_unmatched->first_upper)&&
>>>> + first_unmatched->last_inner != join_tab)
>>> > + first_unmatched= 0;
>>>
>>> Suggestion: Rewrite the above four lines like this:
>>> JOIN_TAB *first_unmatched= join_tab->first_unmatched->first_upper;
>>> if (first_unmatched && first_unmatched->last_inner != join_tab)
>>> first_unmatched= NULL;
>>
>> done, like this:
>> JOIN_TAB *first_unmatched= join_tab->first_unmatched->first_upper;
>> if (first_unmatched != NULL &&
>> first_unmatched->last_inner != join_tab)
>> first_unmatched= NULL;
>> join_tab->first_unmatched= first_unmatched;
>>
>>> If possible, please document what this code block does...
>>
>> Jorgen also asked for documentation of this same block. In my patch it 
>> has
>> documentation:
>> /*
>>  From the point of view of the rest of execution, this record matches
>> (it has been built and satisfies conditions, no need to do more 
>> evaluation
>> on it). See similar code in evaluate_join_record().
>> */
>> and in evaluate_join_record() it has comment:
>> /*
>> Check whether join_tab is not the last inner table
>> for another embedding outer join.
>> */
>> So somehow, the first comment suggest to look at the second comment.
>> Do you mean I should copy-paste the comment "check whether etc" into
>> evaluate_null_etc() ?
> 
> I think that "check whether..." can be derived from the code. But what 
> does the assignment to first_unmatched really mean? This goes beyond 
> your fix, so do not take it as a requirement.

first_unmatched is described in the "IMPLEMENTATION" part of the 
function comment of sub_select().
What this code block is doing? consider this easy sample query:
SELECT
...
FROM
t1
LEFT JOIN
(
   (
     t2
     LEFT JOIN
     (t3, t4)
     ON t3.a=1 AND t2.b=t4.b
   ),
   (
     t5
     LEFT JOIN
     (
       (t6, t7)
       LEFT JOIN
       t8
       ON t7.b=t8.b AND t6.b < 10
     )
     ON t6.b >= 2 AND t5.b=t7.b
   )
   ON (t3.b=2 OR t3.c IS NULL) AND (t6.b=2 OR t6.c IS NULL) AND
      (t1.b=t5.b OR t3.c IS NULL OR t6.c IS NULL or t8.c IS NULL) AND
      (t1.a != 2)

The query plan is t1,t2,t3,t4,t5,t7,t6,t8. When we are in 
evaluate_join_record() for join_tab==t4 (I'm using a tree without my 
patch), and come to just before the start of this code:
       }
       /*
         Check whether join_tab is not the last inner table
         for another embedding outer join.
       */
       if ((first_unmatched= first_unmatched->first_upper) &&
           first_unmatched->last_inner != join_tab)
         first_unmatched= 0;
first_unmatched is t3 and first_unmatched->first_upper is t2 
(understandable from the comment of sub_select()):  so far we didn't 
have a match in (t3,t4) (for the join to t2), so the first_unmatched of 
t4 was t3, now we have a match, so we look up (first_upper) as it's time 
to re-evaluate ON/WHERE conditions which were so far disabled: do we 
have a match in t2 now (for the join to t1)? Well, t2's last_inner is 
t8, which is not t4, so we go into first_unmatched=0: t4's 
first_unmatched is now 0.
Why t2's last inner is t8? I really wonder, to me t8 is in a different 
nest, joined with t2's nest via an _inner_ - not outer - join, so I 
cannot see why t8 would matter to t2.
Even if it were correct, why should t4's first_unmatched be 0? I wonder.
How did I come to the simple query above? I put a DBUG_ASSERT(0) in the 
"first_unmatched=0" block, and run the "optimizer tests" (group*.test, 
join*.test select*.test subquery*.test to name a few), and it fired in a 
single query of a single test (join_nested.test).
So it seems to be rarely executed. And if I comment out 
"first_unmatched=0", join_nested.test still passes.
Sounds fishy? Oh yes.
However I believe that my patch is correct, because before my patch, the 
loop in evaluate_null...() looked like:

   for ( ; ; )
   {
     JOIN_TAB *first_unmatched= join_tab->first_unmatched;
     if ((first_unmatched= first_unmatched->first_upper) &&
         first_unmatched->last_inner != join_tab)
       first_unmatched= 0;
     join_tab->first_unmatched= first_unmatched;
     if (!first_unmatched)
       break;
     first_unmatched->found= 1;
     for (JOIN_TAB *tab= first_unmatched; tab <= join_tab; tab++)
     {
       if (tab->select_cond && !tab->select_cond->val_int())
       {
         join->return_tab= tab;
         DBUG_RETURN(NESTED_LOOP_OK);
       }
     }
   }

In my patch I only keep the first piece of the first iteration (up to "= 
1;" excluded) and then I cascade into evaluate_join_record() which does 
the rest of the first iteration and the whole next iterations:

       JOIN_TAB *first_unmatched= join_tab->first_unmatched;
       first_unmatched->found= 1;
       for (JOIN_TAB *tab= first_unmatched; tab <= join_tab; tab++)
       {
         if (tab->table->reginfo.not_exists_optimize)
           DBUG_RETURN(NESTED_LOOP_NO_MORE_ROWS);
         if (tab->select_cond && !tab->select_cond->val_int())
         {
           /* The condition attached to table tab is false */
           if (tab == join_tab)
             found= 0;
           else
           {
             join->return_tab= tab;
             DBUG_RETURN(NESTED_LOOP_OK);
           }
         }
       }
       if ((first_unmatched= first_unmatched->first_upper) &&
           first_unmatched->last_inner != join_tab)
         first_unmatched= 0;
       join_tab->first_unmatched= first_unmatched;

That is, even though I cannot explain exactly what the logic is, I have 
the impression that I preserved it.

Thread
bzr commit into mysql-next-mr-opt-backporting branch (guilhem:3208) Bug#54437Guilhem Bichot28 Jun
Re: bzr commit into mysql-next-mr-opt-backporting branch(guilhem:3208) Bug#54437Jørgen Løland30 Jun
  • Re: bzr commit into mysql-next-mr-opt-backporting branch(guilhem:3208) Bug#54437Guilhem Bichot30 Jun
  • Re: bzr commit into mysql-next-mr-opt-backporting branch(guilhem:3208) Bug#54437Guilhem Bichot1 Jul
    • Re: bzr commit into mysql-next-mr-opt-backporting branch(guilhem:3208) Bug#54437Guilhem Bichot7 Jul
      • Re: bzr commit into mysql-next-mr-opt-backporting branch (guilhem:3208)Bug#54437Guilhem Bichot12 Aug
  • Re: bzr commit into mysql-next-mr-opt-backporting branch (guilhem:3208)Bug#54437Jorgen Loland9 Aug
Re: bzr commit into mysql-next-mr-opt-backporting branch (guilhem:3208)Bug#54437Roy Lyseng9 Aug
  • Re: bzr commit into mysql-next-mr-opt-backporting branch (guilhem:3208)Bug#54437Guilhem Bichot11 Aug
    • Re: bzr commit into mysql-next-mr-opt-backporting branch (guilhem:3208)Bug#54437Roy Lyseng11 Aug
      • Re: bzr commit into mysql-next-mr-opt-backporting branch (guilhem:3208)Bug#54437Guilhem Bichot11 Aug
        • Re: bzr commit into mysql-next-mr-opt-backporting branch (guilhem:3208)Bug#54437Guilhem Bichot12 Aug