| List: | Commits | « Previous MessageNext Message » | |
| From: | Guilhem Bichot | Date: | August 11 2010 3:26pm |
| Subject: | Re: bzr commit into mysql-next-mr-opt-backporting branch (guilhem:3208) Bug#54437 | ||
| View as plain text | |||
Hello again, Roy Lyseng a écrit, Le 11.08.2010 16:25: >>> It does not seem that any of the tests invoke the FirstMatch >>> strategy. It >>> would be good to have coverage for that situation (but not required). >> >> good idea; will try and let you know. Done: by adding two more "1" rows into t2, I made the optimizer prefer strategies in the following best-to-worse order: materialization-firstmatch-dupsweedout. So subquery_sj_firstmatch now uses a firstmatch plan for the query and exhibits the bug as desired (8 rows instead of 2, before the code fix). subquery_sj_dupsweed uses dups weedout and exhibits the bug too (no change). >>>> === modified file 'sql/sql_select.cc' >>>> --- a/sql/sql_select.cc 2010-06-17 09:31:07 +0000 >>>> +++ b/sql/sql_select.cc 2010-06-28 20:43:32 +0000 >>>> @@ -16992,7 +16992,15 @@ evaluate_join_record(JOIN *join, JOIN_TA >>>> for (JOIN_TAB *tab= first_unmatched; tab<= join_tab; tab++) >>>> { >>>> if (tab->table->reginfo.not_exists_optimize) >>>> + { >>>> + /* >>>> + We have a match in 'tab' but wanted none: no need to explore this >>> >>> Suggest rewrite; "We have a match in 'tab' but wanted none: " -> >>> "We have a match in 'tab' that causes this condition to evaluate to >>> FALSE: " >> >> What condition would "this condition" mean? >> not_exists_optimize means that the optimizer has determined, long ago, >> that >> there is a condition so that if there is a matching row in "tab", the >> row of >> tab-1 should not be returned. Example: >> select * from t1 left join t2 on <condition> >> where <blah> and t2.a is null; >> (assuming t2.a is declared as NOT NULL). In this example, if a >> matching row is >> found in t2, t2.a is surely not null, so this matching row is >> rejected, and as >> there is a matching row there will be no null-complemented rows, so >> this row of >> t1 cannot be in the result set. >> So, I find that at the place where I write the comment, the user >> cannot easily >> see what condition I would be talking about. >> What do you think? > > I was not completely sure about what to write, so I defaulted to "this > condition". AFAIU, it is the condition attached to the current join_tab > (which defaults to TRUE if pointer is NULL), but I am not sure on how to > express that so it will be understood. I propose: "it was determined earlier that if a matching row is found, the outer row should be excluded from the result set: no need to explore etc". >>>> + record and other records of 'tab', so we return "no records". But >>>> + as we set 'found' above, evaluate_join_record() at the upper level >>>> + will not yield a NULL-complemented record. >>>> + */ >>>> DBUG_RETURN(NESTED_LOOP_NO_MORE_ROWS); >>>> + } >>>> /* Check all predicates that has just been activated. */ >>>> /* >>>> Actually all predicates non-guarded by first_unmatched->found >>>> @@ -17165,38 +17173,23 @@ evaluate_null_complemented_join_record(J >>> >>> Suggestion: Take the first comment in the code body and create a doxygen >>> parameter comment out of it. >> >> I could not find the comment in question: which "first comment" did >> you mean? >> could you please paste it here? > > I meant this: > /* > The table join_tab is the first inner table of a outer join operation > and no matches has been found for the current outer row. > */ thanks, done. > > The first part of this should go as the parameter documentation. The > second part is just repeating the function comment. so I removed the second part. > >> >>>> } >>>> join_tab--; >>> >>> Suggestion: The above line would be more explicit with >>> join_tab= last_inner_tab; >> >> done >> >>>> /* >>>> - The row complemented by nulls might be the first row >>>> - of embedding outer joins. >>>> - If so, perform the same actions as in the code >>>> - for the first regular outer join row above. >>>> + From the point of view of the rest of execution, this record matches >>>> + (it has been built and satisfies conditions, no need to do more >>>> evaluation >>>> + on it). See similar code in evaluate_join_record(). >>>> */ >>>> - for ( ; ; ) >>>> - { >>>> - JOIN_TAB *first_unmatched= join_tab->first_unmatched; >>>> - if ((first_unmatched= first_unmatched->first_upper)&& >>>> - first_unmatched->last_inner != join_tab) >>>> - first_unmatched= 0; >>>> - join_tab->first_unmatched= first_unmatched; >>>> - if (!first_unmatched) >>>> - break; >>>> - first_unmatched->found= 1; >>>> - for (JOIN_TAB *tab= first_unmatched; tab<= join_tab; tab++) >>>> - { >>>> - if (tab->select_cond&& !tab->select_cond->val_int()) >>>> - { >>>> - join->return_tab= tab; >>>> - DBUG_RETURN(NESTED_LOOP_OK); >>>> - } >>>> - } >>>> - } >>>> + JOIN_TAB *first_unmatched= join_tab->first_unmatched; >>>> + if ((first_unmatched= first_unmatched->first_upper)&& >>>> + first_unmatched->last_inner != join_tab) >>> > + first_unmatched= 0; >>> >>> Suggestion: Rewrite the above four lines like this: >>> JOIN_TAB *first_unmatched= join_tab->first_unmatched->first_upper; >>> if (first_unmatched && first_unmatched->last_inner != join_tab) >>> first_unmatched= NULL; >> >> done, like this: >> JOIN_TAB *first_unmatched= join_tab->first_unmatched->first_upper; >> if (first_unmatched != NULL && >> first_unmatched->last_inner != join_tab) >> first_unmatched= NULL; >> join_tab->first_unmatched= first_unmatched; >> >>> If possible, please document what this code block does... >> >> Jorgen also asked for documentation of this same block. In my patch it >> has >> documentation: >> /* >> From the point of view of the rest of execution, this record matches >> (it has been built and satisfies conditions, no need to do more >> evaluation >> on it). See similar code in evaluate_join_record(). >> */ >> and in evaluate_join_record() it has comment: >> /* >> Check whether join_tab is not the last inner table >> for another embedding outer join. >> */ >> So somehow, the first comment suggest to look at the second comment. >> Do you mean I should copy-paste the comment "check whether etc" into >> evaluate_null_etc() ? > > I think that "check whether..." can be derived from the code. But what > does the assignment to first_unmatched really mean? This goes beyond > your fix, so do not take it as a requirement. first_unmatched is described in the "IMPLEMENTATION" part of the function comment of sub_select(). What this code block is doing? consider this easy sample query: SELECT ... FROM t1 LEFT JOIN ( ( t2 LEFT JOIN (t3, t4) ON t3.a=1 AND t2.b=t4.b ), ( t5 LEFT JOIN ( (t6, t7) LEFT JOIN t8 ON t7.b=t8.b AND t6.b < 10 ) ON t6.b >= 2 AND t5.b=t7.b ) ON (t3.b=2 OR t3.c IS NULL) AND (t6.b=2 OR t6.c IS NULL) AND (t1.b=t5.b OR t3.c IS NULL OR t6.c IS NULL or t8.c IS NULL) AND (t1.a != 2) The query plan is t1,t2,t3,t4,t5,t7,t6,t8. When we are in evaluate_join_record() for join_tab==t4 (I'm using a tree without my patch), and come to just before the start of this code: } /* Check whether join_tab is not the last inner table for another embedding outer join. */ if ((first_unmatched= first_unmatched->first_upper) && first_unmatched->last_inner != join_tab) first_unmatched= 0; first_unmatched is t3 and first_unmatched->first_upper is t2 (understandable from the comment of sub_select()): so far we didn't have a match in (t3,t4) (for the join to t2), so the first_unmatched of t4 was t3, now we have a match, so we look up (first_upper) as it's time to re-evaluate ON/WHERE conditions which were so far disabled: do we have a match in t2 now (for the join to t1)? Well, t2's last_inner is t8, which is not t4, so we go into first_unmatched=0: t4's first_unmatched is now 0. Why t2's last inner is t8? I really wonder, to me t8 is in a different nest, joined with t2's nest via an _inner_ - not outer - join, so I cannot see why t8 would matter to t2. Even if it were correct, why should t4's first_unmatched be 0? I wonder. How did I come to the simple query above? I put a DBUG_ASSERT(0) in the "first_unmatched=0" block, and run the "optimizer tests" (group*.test, join*.test select*.test subquery*.test to name a few), and it fired in a single query of a single test (join_nested.test). So it seems to be rarely executed. And if I comment out "first_unmatched=0", join_nested.test still passes. Sounds fishy? Oh yes. However I believe that my patch is correct, because before my patch, the loop in evaluate_null...() looked like: for ( ; ; ) { JOIN_TAB *first_unmatched= join_tab->first_unmatched; if ((first_unmatched= first_unmatched->first_upper) && first_unmatched->last_inner != join_tab) first_unmatched= 0; join_tab->first_unmatched= first_unmatched; if (!first_unmatched) break; first_unmatched->found= 1; for (JOIN_TAB *tab= first_unmatched; tab <= join_tab; tab++) { if (tab->select_cond && !tab->select_cond->val_int()) { join->return_tab= tab; DBUG_RETURN(NESTED_LOOP_OK); } } } In my patch I only keep the first piece of the first iteration (up to "= 1;" excluded) and then I cascade into evaluate_join_record() which does the rest of the first iteration and the whole next iterations: JOIN_TAB *first_unmatched= join_tab->first_unmatched; first_unmatched->found= 1; for (JOIN_TAB *tab= first_unmatched; tab <= join_tab; tab++) { if (tab->table->reginfo.not_exists_optimize) DBUG_RETURN(NESTED_LOOP_NO_MORE_ROWS); if (tab->select_cond && !tab->select_cond->val_int()) { /* The condition attached to table tab is false */ if (tab == join_tab) found= 0; else { join->return_tab= tab; DBUG_RETURN(NESTED_LOOP_OK); } } } if ((first_unmatched= first_unmatched->first_upper) && first_unmatched->last_inner != join_tab) first_unmatched= 0; join_tab->first_unmatched= first_unmatched; That is, even though I cannot explain exactly what the logic is, I have the impression that I preserved it.
