List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:June 10 2011 1:07pm
Subject:Re: [Fwd: bzr commit into mysql-trunk branch (guilhem.bichot:3379)
Bug#12546542]
View as plain text  
Hi Guilhem,

thank you for fixing this problem.

The bugfix is approved, I have just some minor issues on the test case, and a 
couple of code style suggestions.

Really nice explanation that made it possible for me to review without 
understanding join buffering code 100% :)

On 31.05.11 14.29, Guilhem Bichot wrote:
>
>
> -------- Message original --------
> Sujet: bzr commit into mysql-trunk branch (guilhem.bichot:3379) Bug#12546542
> Date: Tue, 31 May 2011 12:10:20 -0000
> De: Guilhem Bichot <guilhem.bichot@stripped>
> Pour :: mysql_optimizer_bugs_ww@stripped
>
> #At file:///home/mysql_src/bzrrepos_new/mysql-next-mr-opt-backporting/ based on
> revid:tor.didriksen@stripped
>
> 3379 Guilhem Bichot 2011-05-31
> Fix for Bug#12546542 "MISSING ROW WHEN USING OPTIMIZER_JOIN_CACHE_LEVEL>=3"
> Join cache code forgot to reset JOIN_TAB::first_unmatched in some cases.
> @ sql/sql_join_cache.cc
> Without this fix, the final SELECT in main.subquery_sj_mat_nosj
> main.subquery_sj_none_jcl6 main.subquery_sj_none_jcl7 and main.subquery_sj_none
> shows only "v", misses "we".
>
> In the test's scenario, we had
> "SELECT FROM t1 WHERE t1's field IN (SELECT ... FROM t2 LEFT JOIN t3 ...)".
> When semijoin=off, IN is converted to EXISTS. This conversion adds an implicit
> LIMIT=1 clause to the now-dependent subquery (finding one row suffices
> to calculate the value of EXISTS()).
> At optimizer_join_cache_level>=3, join buffering applies to outer joins,
> so t3 does join buffering (block nested loop join).
> We have two rows in t1 ('v' and 'we'); we evaluate the subquery twice,
> one for each row of t1.
> The first evaluation, for t1's row 'v', leads
> to generating a NULL-complemented row of t2 and t3. This generation
> happens in JOIN_CACHE::join_records() which calls join_null_complements(), like
> this:
>
> /* Prepare for generation of null complementing extensions */
> for (tab= join_tab->first_inner; tab <= join_tab->last_inner; tab++)
> tab->first_unmatched= join_tab->first_inner;
> }
> }
> if (join_tab->first_unmatched)
> {
> ...
> rc= join_null_complements(skip_last);
> if (rc != NESTED_LOOP_OK && rc != NESTED_LOOP_NO_MORE_ROWS)
> goto finish;
> }
> ....
> if (outer_join_first_inner)
> {
> /*
> All null complemented rows have been already generated for all
> outer records from join buffer. Restore the state of the
> first_unmatched values to 0 to avoid another null complementing.
> */
> for (tab= join_tab->first_inner; tab <= join_tab->last_inner; tab++)
> tab->first_unmatched= 0;
> }
> finish:
> some irrelevant cleanup code;
>
> The return value 'rc' of join_null_complements() is here
> NESTED_LOOP_QUERY_LIMIT because the subquery had LIMIT=1:
> one row has been found in the subquery's result, no need
> to look for more rows (see end_send()). So we "goto finish", and thus
> forget to reset join_tab->first_unmatched to NULL.
>
> Then we have the second evaluation of the subquery, for t1's row 'we'.
> Because of the non-reset join_tab->first_unmatched, in the subquery's
> evaluation, in JOIN_CACHE::join_records(), we don't even try
> to find a match for t2's row in t3 (missing a row this way),
> we directly jump to NULL-complementing.
> The fix is the following:
> as JOIN_CACHE::join_records() sets first_unmatched and resets it,
> make it reset it in all cases: move the "reset" code to the "finish" label.
>
> modified:
> mysql-test/include/subquery_sj.inc
> mysql-test/r/subquery_sj_all.result
> mysql-test/r/subquery_sj_all_jcl6.result
> mysql-test/r/subquery_sj_all_jcl7.result
> mysql-test/r/subquery_sj_dupsweed.result
> mysql-test/r/subquery_sj_dupsweed_jcl6.result
> mysql-test/r/subquery_sj_dupsweed_jcl7.result
> mysql-test/r/subquery_sj_firstmatch.result
> mysql-test/r/subquery_sj_firstmatch_jcl6.result
> mysql-test/r/subquery_sj_firstmatch_jcl7.result
> mysql-test/r/subquery_sj_loosescan.result
> mysql-test/r/subquery_sj_loosescan_jcl6.result
> mysql-test/r/subquery_sj_loosescan_jcl7.result
> mysql-test/r/subquery_sj_mat.result
> mysql-test/r/subquery_sj_mat_jcl6.result
> mysql-test/r/subquery_sj_mat_jcl7.result
> mysql-test/r/subquery_sj_mat_nosj.result
> mysql-test/r/subquery_sj_none.result
> mysql-test/r/subquery_sj_none_jcl6.result
> mysql-test/r/subquery_sj_none_jcl7.result
> sql/sql_join_cache.cc
> === modified file 'mysql-test/include/subquery_sj.inc'
> --- a/mysql-test/include/subquery_sj.inc 2011-03-17 08:00:10 +0000
> +++ b/mysql-test/include/subquery_sj.inc 2011-05-31 12:10:01 +0000
> @@ -3540,3 +3540,27 @@ ORDER BY a;
>
> DROP TABLE t1;
> DROP VIEW v1;
> +
> +--echo #
> +--echo # Bug#12546542 MISSING ROW WHEN USING OPTIMIZER_JOIN_CACHE_LEVEL>=3
> +--echo #
> +CREATE TABLE t1 (
> + f2 varchar(1024) CHARACTER SET utf8 DEFAULT NULL
> +) ENGINE=InnoDB DEFAULT CHARSET=latin1;
> +INSERT INTO t1 VALUES ('v'),('we');
> +CREATE TABLE t2 (
> + col_varchar_1024_utf8 varchar(1024) CHARACTER SET utf8 DEFAULT NULL,
> + col_int_key int(11) DEFAULT NULL,
> + col_int int(11) DEFAULT NULL
> +) ENGINE=InnoDB DEFAULT CHARSET=latin1;
> +INSERT INTO t2 VALUES ('we',4,NULL),('v',1305673728,6);
> +CREATE TABLE t3 (
> + col_int_key int(11) DEFAULT NULL,
> + col_int int(11) DEFAULT NULL
> +) ENGINE=InnoDB DEFAULT CHARSET=latin1;
> +INSERT INTO t3 VALUES (4,4);

In the above table definitions, I would delete engine references, character set 
references and integer format specification to make the case as simple as possible.
> +
> +SELECT * FROM t1 WHERE f2 IN (SELECT a1.col_varchar_1024_utf8 AS f2 FROM t2
> +AS a1 LEFT JOIN t3 AS a2 ON a1.col_int_key = a2.col_int_key WHERE a1.col_int
> +BETWEEN 1 AND 10 OR a2.col_int IS NOT NULL);

For better readability, you might format the query as follows:

SELECT *
FROM t1
WHERE f2 IN (SELECT a1.col_varchar_1024_utf8 AS f2
              FROM t2 AS a1 LEFT JOIN t3 AS a2 ON a1.col_int_key = a2.col_int_key
              WHERE a1.col_int BETWEEN 1 AND 10 OR a2.col_int IS NOT NULL);

> +DROP TABLE t1,t2,t3;

...

> === modified file 'sql/sql_join_cache.cc'
> --- a/sql/sql_join_cache.cc 2011-04-26 08:49:10 +0000
> +++ b/sql/sql_join_cache.cc 2011-05-31 12:10:01 +0000
> @@ -1697,28 +1697,28 @@ enum_nested_loop_state JOIN_CACHE::join_
> if (rc != NESTED_LOOP_OK && rc != NESTED_LOOP_NO_MORE_ROWS)
> goto finish;
> }
> - if (outer_join_first_inner)
> - {
> - /*
> - All null complemented rows have been already generated for all
> - outer records from join buffer. Restore the state of the
> - first_unmatched values to 0 to avoid another null complementing.
> - */
> - for (tab= join_tab->first_inner; tab <= join_tab->last_inner; tab++)
> - tab->first_unmatched= 0;
> - }
> -
> +
> if (skip_last)
> {
> DBUG_ASSERT(!is_key_access());
> /*
> Restore the last record from the join buffer to generate
> - all extentions for it.
> + all extensions for it.
> */
> get_record();
> }
>
> finish:
> + if (outer_join_first_inner)
> + {
> + /*
> + All null complemented rows have been already generated for all
> + outer records from join buffer. Restore the state of the
> + first_unmatched values to 0 to avoid another null complementing.
> + */
> + for (tab= join_tab->first_inner; tab <= join_tab->last_inner; tab++)
> + tab->first_unmatched= 0;

You might replace 0 with NULL here.
> + }
> restore_last_record();
> reset(TRUE);

TRUE -> true ?

> DBUG_RETURN(rc);

Not about your code, but a reasonable modest code refactoring could be to define 
the "tab" variable inside the two loops where it is used.

Thanks,
Roy
Thread
Re: [Fwd: bzr commit into mysql-trunk branch (guilhem.bichot:3379)Bug#12546542]Roy Lyseng10 Jun
  • Re: [Fwd: bzr commit into mysql-trunk branch (guilhem.bichot:3379)Bug#12546542]Guilhem Bichot11 Jun