From: Roy Lyseng Date: June 10 2011 1:07pm Subject: Re: [Fwd: bzr commit into mysql-trunk branch (guilhem.bichot:3379) Bug#12546542] List-Archive: http://lists.mysql.com/commits/139065 Message-Id: <4DF2171B.10403@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi Guilhem, thank you for fixing this problem. The bugfix is approved, I have just some minor issues on the test case, and a couple of code style suggestions. Really nice explanation that made it possible for me to review without understanding join buffering code 100% :) On 31.05.11 14.29, Guilhem Bichot wrote: > > > -------- Message original -------- > Sujet: bzr commit into mysql-trunk branch (guilhem.bichot:3379) Bug#12546542 > Date: Tue, 31 May 2011 12:10:20 -0000 > De: Guilhem Bichot > Pour :: mysql_optimizer_bugs_ww@stripped > > #At file:///home/mysql_src/bzrrepos_new/mysql-next-mr-opt-backporting/ based on > revid:tor.didriksen@stripped > > 3379 Guilhem Bichot 2011-05-31 > Fix for Bug#12546542 "MISSING ROW WHEN USING OPTIMIZER_JOIN_CACHE_LEVEL>=3" > Join cache code forgot to reset JOIN_TAB::first_unmatched in some cases. > @ sql/sql_join_cache.cc > Without this fix, the final SELECT in main.subquery_sj_mat_nosj > main.subquery_sj_none_jcl6 main.subquery_sj_none_jcl7 and main.subquery_sj_none > shows only "v", misses "we". > > In the test's scenario, we had > "SELECT FROM t1 WHERE t1's field IN (SELECT ... FROM t2 LEFT JOIN t3 ...)". > When semijoin=off, IN is converted to EXISTS. This conversion adds an implicit > LIMIT=1 clause to the now-dependent subquery (finding one row suffices > to calculate the value of EXISTS()). > At optimizer_join_cache_level>=3, join buffering applies to outer joins, > so t3 does join buffering (block nested loop join). > We have two rows in t1 ('v' and 'we'); we evaluate the subquery twice, > one for each row of t1. > The first evaluation, for t1's row 'v', leads > to generating a NULL-complemented row of t2 and t3. This generation > happens in JOIN_CACHE::join_records() which calls join_null_complements(), like > this: > > /* Prepare for generation of null complementing extensions */ > for (tab= join_tab->first_inner; tab <= join_tab->last_inner; tab++) > tab->first_unmatched= join_tab->first_inner; > } > } > if (join_tab->first_unmatched) > { > ... > rc= join_null_complements(skip_last); > if (rc != NESTED_LOOP_OK && rc != NESTED_LOOP_NO_MORE_ROWS) > goto finish; > } > .... > if (outer_join_first_inner) > { > /* > All null complemented rows have been already generated for all > outer records from join buffer. Restore the state of the > first_unmatched values to 0 to avoid another null complementing. > */ > for (tab= join_tab->first_inner; tab <= join_tab->last_inner; tab++) > tab->first_unmatched= 0; > } > finish: > some irrelevant cleanup code; > > The return value 'rc' of join_null_complements() is here > NESTED_LOOP_QUERY_LIMIT because the subquery had LIMIT=1: > one row has been found in the subquery's result, no need > to look for more rows (see end_send()). So we "goto finish", and thus > forget to reset join_tab->first_unmatched to NULL. > > Then we have the second evaluation of the subquery, for t1's row 'we'. > Because of the non-reset join_tab->first_unmatched, in the subquery's > evaluation, in JOIN_CACHE::join_records(), we don't even try > to find a match for t2's row in t3 (missing a row this way), > we directly jump to NULL-complementing. > The fix is the following: > as JOIN_CACHE::join_records() sets first_unmatched and resets it, > make it reset it in all cases: move the "reset" code to the "finish" label. > > modified: > mysql-test/include/subquery_sj.inc > mysql-test/r/subquery_sj_all.result > mysql-test/r/subquery_sj_all_jcl6.result > mysql-test/r/subquery_sj_all_jcl7.result > mysql-test/r/subquery_sj_dupsweed.result > mysql-test/r/subquery_sj_dupsweed_jcl6.result > mysql-test/r/subquery_sj_dupsweed_jcl7.result > mysql-test/r/subquery_sj_firstmatch.result > mysql-test/r/subquery_sj_firstmatch_jcl6.result > mysql-test/r/subquery_sj_firstmatch_jcl7.result > mysql-test/r/subquery_sj_loosescan.result > mysql-test/r/subquery_sj_loosescan_jcl6.result > mysql-test/r/subquery_sj_loosescan_jcl7.result > mysql-test/r/subquery_sj_mat.result > mysql-test/r/subquery_sj_mat_jcl6.result > mysql-test/r/subquery_sj_mat_jcl7.result > mysql-test/r/subquery_sj_mat_nosj.result > mysql-test/r/subquery_sj_none.result > mysql-test/r/subquery_sj_none_jcl6.result > mysql-test/r/subquery_sj_none_jcl7.result > sql/sql_join_cache.cc > === modified file 'mysql-test/include/subquery_sj.inc' > --- a/mysql-test/include/subquery_sj.inc 2011-03-17 08:00:10 +0000 > +++ b/mysql-test/include/subquery_sj.inc 2011-05-31 12:10:01 +0000 > @@ -3540,3 +3540,27 @@ ORDER BY a; > > DROP TABLE t1; > DROP VIEW v1; > + > +--echo # > +--echo # Bug#12546542 MISSING ROW WHEN USING OPTIMIZER_JOIN_CACHE_LEVEL>=3 > +--echo # > +CREATE TABLE t1 ( > + f2 varchar(1024) CHARACTER SET utf8 DEFAULT NULL > +) ENGINE=InnoDB DEFAULT CHARSET=latin1; > +INSERT INTO t1 VALUES ('v'),('we'); > +CREATE TABLE t2 ( > + col_varchar_1024_utf8 varchar(1024) CHARACTER SET utf8 DEFAULT NULL, > + col_int_key int(11) DEFAULT NULL, > + col_int int(11) DEFAULT NULL > +) ENGINE=InnoDB DEFAULT CHARSET=latin1; > +INSERT INTO t2 VALUES ('we',4,NULL),('v',1305673728,6); > +CREATE TABLE t3 ( > + col_int_key int(11) DEFAULT NULL, > + col_int int(11) DEFAULT NULL > +) ENGINE=InnoDB DEFAULT CHARSET=latin1; > +INSERT INTO t3 VALUES (4,4); In the above table definitions, I would delete engine references, character set references and integer format specification to make the case as simple as possible. > + > +SELECT * FROM t1 WHERE f2 IN (SELECT a1.col_varchar_1024_utf8 AS f2 FROM t2 > +AS a1 LEFT JOIN t3 AS a2 ON a1.col_int_key = a2.col_int_key WHERE a1.col_int > +BETWEEN 1 AND 10 OR a2.col_int IS NOT NULL); For better readability, you might format the query as follows: SELECT * FROM t1 WHERE f2 IN (SELECT a1.col_varchar_1024_utf8 AS f2 FROM t2 AS a1 LEFT JOIN t3 AS a2 ON a1.col_int_key = a2.col_int_key WHERE a1.col_int BETWEEN 1 AND 10 OR a2.col_int IS NOT NULL); > +DROP TABLE t1,t2,t3; ... > === modified file 'sql/sql_join_cache.cc' > --- a/sql/sql_join_cache.cc 2011-04-26 08:49:10 +0000 > +++ b/sql/sql_join_cache.cc 2011-05-31 12:10:01 +0000 > @@ -1697,28 +1697,28 @@ enum_nested_loop_state JOIN_CACHE::join_ > if (rc != NESTED_LOOP_OK && rc != NESTED_LOOP_NO_MORE_ROWS) > goto finish; > } > - if (outer_join_first_inner) > - { > - /* > - All null complemented rows have been already generated for all > - outer records from join buffer. Restore the state of the > - first_unmatched values to 0 to avoid another null complementing. > - */ > - for (tab= join_tab->first_inner; tab <= join_tab->last_inner; tab++) > - tab->first_unmatched= 0; > - } > - > + > if (skip_last) > { > DBUG_ASSERT(!is_key_access()); > /* > Restore the last record from the join buffer to generate > - all extentions for it. > + all extensions for it. > */ > get_record(); > } > > finish: > + if (outer_join_first_inner) > + { > + /* > + All null complemented rows have been already generated for all > + outer records from join buffer. Restore the state of the > + first_unmatched values to 0 to avoid another null complementing. > + */ > + for (tab= join_tab->first_inner; tab <= join_tab->last_inner; tab++) > + tab->first_unmatched= 0; You might replace 0 with NULL here. > + } > restore_last_record(); > reset(TRUE); TRUE -> true ? > DBUG_RETURN(rc); Not about your code, but a reasonable modest code refactoring could be to define the "tab" variable inside the two loops where it is used. Thanks, Roy