|List:||Commits||« Previous MessageNext Message »|
|From:||Guilhem Bichot||Date:||July 7 2010 7:33am|
|Subject:||Re: bzr commit into mysql-next-mr-opt-backporting branch|
|View as plain text|
Hello, Guilhem Bichot a écrit, Le 01.07.2010 16:34: > Hello Jorgen, > > Jørgen Løland a écrit, Le 30.06.2010 15:03: >> Guilhem, >> >> I'm doing my best to understand the consequences of removing the >> for(;;) loop and tried a few queries. Here's what I found so far: >> >> #Your query >> select t2.a from t2 left join t3 on t2.a=t3.a; >> a >> 1 >> 1 >> select * from t1 where t1.a in (select t2.a from t2 left join t3 on >> t2.a=t3.a); >> a >> 1 >> 1 >> >> #Modified query with inner cross-join - should be same result >> select t2.a from t2 left join (t2 as t2inner,t3) on t2.a=t3.a; >> a >> 1 >> 1 >> select * from t1 where t1.a in (select t2.a from t2 left join (t2 as >> t2inner,t3) on t2.a=t3.a); >> a >> 1 >> >> Do you agree that the result of the last query misses a '1' row? > > A follow-up on this. > I analyzed how this incorrect result occurs, and it seems to be the same > conditions as BUG#49129. It is specific of join buffering. I find those > ingredients in common with BUG#49129 (see analysis left by Sergey > Petrunia in the report): > - join buffering (though not the same level: block-nested-loop instead > of BKA, but this is probably just due to the access type being "table > scan" instead of "ref": BKA requires "ref") > - when it decides about what tables to include in duplicates weedout, it > has a wrong idea of what table will do join buffering (it thinks that > only t1 will do buffering; in fact, as decided later and seen in > EXPLAIN, only t2 does join buffering) > - so it forgets to include one table (t1) > - and then the usual effect of join buffering, which does not retrieve > rows in the same order as not-buffered nested-loop join, takes care of > making a wrong result as "expectable". > > In more detail, please see in WL#3741 this sentence in paragraph 2.2: > "NL-execution guarantee that this row combination won't be encountered > anywhere below the point (*)". We apply this technique of 2.2, because > we wrongly believe that there won't be join buffering for t2, t2inner, > t3. In fact there will be join buffering for t2, so it won't be > NL-execution but block-nested-loop execution, the guarantee does not > hold, we should not apply the technique of this paragraph 2.2 but rather > of paragraph 2.3, which is about including "t1" in the duplicates > weedout tmp table. > > I tried forcing mysql to include "t1" in the duplicates weedout tmp > table but then I got a crash inside the MEMORY engine code (which > handles the tmp table), I need to research this more. > > My impression is that the bug which you found is not related to my patch > (the result is 2 excess rows before my patch, 1 missing row after my > patch). Before my patch, the execution steps of duplicates weedout were > forgotten, letting duplicate rows pass. After my patch the steps are > duly executed, but exhibit a duplicates weedout bug, which I think is a > duplicate of BUG#49129. > Another indication: I reverted my patch, and coded the "safe and easy" > alternative instead (theoretically less risky: copy-paste of the > duplicates weedout code (do_sj_dups_weedout() call) from > evaluate_join_record() to evaluate_null_complemented_record(). I got the > same bad result (missing row). That suggests that this bad result is not > due to my patch but to duplicates weedout itself. Confirmation. After forcing Duplicates weedout to include table t1, or, which is equivalent, after applying the patch from Sergey Petrunia for BUG#49129, I get a segfault in the MEMORY engine. This is because we try to insert a malformed record into the duplicates weedout temporary table. Which is because the tables "included" in this table are t1,t2inner,t3 (note: t2 isn't). t2inner and t3 should not be included; there is a test for this in sj_table_is_included(): if the table is in a semijoin nest (i.e. join_tab->emb_sj_nest!=NULL), don't include it; but for t2inner and t3, join_tab->emb_sj_nest is NULL. Why? because: - they are in a nested join - we have a semijoin nest which contains t2 and the nested join above - the loop which sets emb_sj_nest, in pull_out_semijoin_tables(), stays on first level: it sets it for t2, then looks at the nested join object, which has tbl->table==NULL, so doesn't set emb_sj_nest for the nested join, and the loop doesn't dive into the nested join. So nested join tables don't get their emb_sj_nest properly set, they are left as "not in a semijoin nest". After fixing this (with a simple recursive function to go down into the nested join and set emb_sj_nest), finally (phew) I get good results for your testcase.