From: Roy Lyseng Date: March 23 2011 9:49am Subject: Re: bzr commit into mysql-trunk branch (olav.sandstaa:3306) Bug#11873324 List-Archive: http://lists.mysql.com/commits/133596 Message-Id: <4D89C244.3030301@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi Olav, thank you for a patch with a nice encapsulation solution. The patch is approved, I have just a few nitpicking comments below. On 22.03.11 11.09, Olav Sandstaa wrote: > #At file:///export/home/tmp/mysql2/bug11873324-fix/ based on revid:vinay.fisrekar@stripped > > 3306 Olav Sandstaa 2011-03-22 > Fix for Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > > This bug causes NOT IN subqueries to in some cases return too few > records. This can happen when we have a subquery where a table is nitpicking: records -> rows > normally access using ref access and we have pushed an index condition > down to the storage engine based on this but in some cases the access > method is changed during execution to use table scan instead of ref > access. Queries like this are marked with 'Full scan on NULL key' in > the explain output. > > Details about how this situation occurs. > > 1. Index condition pushdown: Since the optimizer decides that the > access method for one of the tables in the subquery should be ref > access this is candidate for using index condition pushdown. The ICP > will push part of the table's condition down to the storage engine. As > part of this it will also remove this part from the table's condition > that should be evaluated by the server. > > 2. Next looking at what 'Full scan on NULL key' means for these queries: > > This is a NOT IN subquery. So for every record produced by the outer > query we will execute the subquery and see if this particular value > is produced by the subquery. The important parts relevant for this > case is found in the following two functions: > > A. longlong Item_in_optimizer::val_int() (called from > evaluate_join_record via a few other Item::val... functions) > > This functions checks if the value to check for is NULL. If it is NULL then > the function does (among other things): > > 1. Turn off the predicates that are based on column compares for > which the left part is currently NULL. This is done by setting > the subquery's "cond guards" to false. > > 2. Executes the subquery (see next step) > > 3. Turn all predicates back on. This is done by setting the > subquery's "cond guards" back to true. > > B. The execution of the subquery will eventually call the following function: > > subselect_single_select_engine::exec() > > In this function we have the following relevant actions: > > 1. We check if the subquery has "guarded conditions" and if so we do > a detailed check for all join tabs of the subquery to see if any of > the "guarded conditions" are "false" (see step 1 above). > > 2. If any of the "guarded conditions" are false we do as commented in the > code: > > /* > For at least one of the pushed predicates the following is true: > We should not apply optimizations based on the condition that was > pushed down into the subquery. Those optimizations are ref[_or_null] > acceses. Change them to be full table scans. > */ > > So we basically changes from using ref access to table scan. > > Summary for how this produces missing results: > > 1. When optimizing the subquery we push and index condition down to InnoDB > based on that the table should be accessed using ref access. > > 2. If the outer query produces (in this case) an integer to be checked > for "NOT IN subquery", the subquery is executed using ref access > (and everything is fine). > > 3. If the outer query produces a NULL value the execution of the > subquery is changed to use table scan. In this case the pushed index > condition will no longer be evaluated by InnoDB and the subquery will > produce more records than it should and the NOT IN will evaluate to false > (when it should evaluate to true) and the query will produce fewer > result records than it should (which is not fine). > > The fix for this is to add a check in push_index_cond() to determine > if the jointab has "guarded conditions". If it has then do not push > any index conditions down to the storage engine. > @ mysql-test/include/icp_tests.inc > Test case for Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > @ mysql-test/r/innodb_icp.result > Test case for Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > @ mysql-test/r/innodb_icp_none.result > Test case for Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > @ mysql-test/r/myisam_icp.result > Test case for Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > @ mysql-test/r/myisam_icp_none.result > Test case for Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > @ mysql-test/r/subquery_nomat_nosj.result > Change in plan after fix for Bug#11873324. After this fix we > will no longer use index condition pushdown for subqueries that > can do a 'Full scan on NULL key' (ie. a subquery that > can change from using ref access to do a full table scan during > execution). > @ sql/sql_select.cc > Fix for Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > > Some NOT IN (and some IN) subqueries that have "guarded conditions" I think that "and some IN" is too unspecific. I propose that you specify "IN queries for which is_top_level_item() returns false". This would e.g. happen for an IN predicate under XOR (because IS TRUE is not propagated past XOR), or an IN (subquery) IS FALSE (which is equivalent to NOT IN). If you feel for it, you may add extra test cases for these types of queries. > can be evaluated using ref access when the outer query produces a row > that is non-NULL. If the outer query produces a NULL value and need to > check that this is NOT IN the result from the subquery the code for > executing the subquery changes from using ref access to use table > scan. In this case any index conditions pushed for this table will no > longer be evaluated. > > To avoid that this situation occurs we check in push_index_cond() that > the jointab does not have any "guarded conditions". If it has we do not > try to push any index conditions for this table. > @ sql/sql_select.h > Add methods to TABLE_REF and JOIN_TAB to check if they contains > guarded conditions. > > modified: > mysql-test/include/icp_tests.inc > mysql-test/r/innodb_icp.result > mysql-test/r/innodb_icp_none.result > mysql-test/r/myisam_icp.result > mysql-test/r/myisam_icp_none.result > mysql-test/r/subquery_nomat_nosj.result > sql/sql_select.cc > sql/sql_select.h > === modified file 'mysql-test/include/icp_tests.inc' > > === modified file 'mysql-test/include/icp_tests.inc' > --- a/mysql-test/include/icp_tests.inc 2011-03-17 11:48:04 +0000 > +++ b/mysql-test/include/icp_tests.inc 2011-03-22 10:09:26 +0000 > @@ -838,3 +838,43 @@ > SELECT col999 FROM t1 WHERE col1000 = "3" AND col1003<=> sysdate(); > > DROP TABLE t1; > + > +--echo # > +--echo # Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > +--echo # > + > +CREATE TABLE t1 ( > + pk INTEGER NOT NULL, > + i1 INTEGER, > + PRIMARY KEY (pk), > + KEY col_int_key (i1) > +); > + > +INSERT INTO t1 VALUES (14,NULL), (18,133); > + > +CREATE TABLE t2 ( > + pk INTEGER NOT NULL, > + i1 INTEGER, > + c1 VARCHAR(1), > + PRIMARY KEY (pk), > + KEY col_int_key (i1) > +); > + > +INSERT INTO t2 VALUES (1,7,'f'); > + > +let query= > +SELECT t1.i1 > +FROM t1 > +WHERE t1.i1 NOT IN > +( SELECT STRAIGHT_JOIN SUBQUERY_t1.pk > + FROM t1 AS SUBQUERY_t1 > + JOIN t2 AS SUBQUERY_t2 > + ON SUBQUERY_t2.i1 = SUBQUERY_t1.pk > + WHERE SUBQUERY_t1.i1> 0 > + OR SUBQUERY_t2.c1 = 'a' > +); > + > +eval EXPLAIN $query; > +eval $query; > + > +DROP TABLE t1,t2; > > === modified file 'mysql-test/r/innodb_icp.result' > --- a/mysql-test/r/innodb_icp.result 2011-03-17 11:48:04 +0000 > +++ b/mysql-test/r/innodb_icp.result 2011-03-22 10:09:26 +0000 > @@ -776,5 +776,51 @@ > SELECT col999 FROM t1 WHERE col1000 = "3" AND col1003<=> sysdate(); > col999 > DROP TABLE t1; > +# > +# Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > +# > +CREATE TABLE t1 ( > +pk INTEGER NOT NULL, > +i1 INTEGER, > +PRIMARY KEY (pk), > +KEY col_int_key (i1) > +); > +INSERT INTO t1 VALUES (14,NULL), (18,133); > +CREATE TABLE t2 ( > +pk INTEGER NOT NULL, > +i1 INTEGER, > +c1 VARCHAR(1), > +PRIMARY KEY (pk), > +KEY col_int_key (i1) > +); > +INSERT INTO t2 VALUES (1,7,'f'); > +EXPLAIN SELECT t1.i1 > +FROM t1 > +WHERE t1.i1 NOT IN > +( SELECT STRAIGHT_JOIN SUBQUERY_t1.pk > +FROM t1 AS SUBQUERY_t1 > +JOIN t2 AS SUBQUERY_t2 > +ON SUBQUERY_t2.i1 = SUBQUERY_t1.pk > +WHERE SUBQUERY_t1.i1> 0 > +OR SUBQUERY_t2.c1 = 'a' > +); > +id select_type table type possible_keys key key_len ref rows Extra > +1 PRIMARY t1 index NULL col_int_key 5 NULL 2 Using where; Using index > +2 DEPENDENT SUBQUERY SUBQUERY_t1 eq_ref PRIMARY,col_int_key PRIMARY 4 func 1 Using where; Full scan on NULL key > +2 DEPENDENT SUBQUERY SUBQUERY_t2 ref col_int_key col_int_key 5 func 1 Using where; Full scan on NULL key > +SELECT t1.i1 > +FROM t1 > +WHERE t1.i1 NOT IN > +( SELECT STRAIGHT_JOIN SUBQUERY_t1.pk > +FROM t1 AS SUBQUERY_t1 > +JOIN t2 AS SUBQUERY_t2 > +ON SUBQUERY_t2.i1 = SUBQUERY_t1.pk > +WHERE SUBQUERY_t1.i1> 0 > +OR SUBQUERY_t2.c1 = 'a' > +); > +i1 > +NULL > +133 > +DROP TABLE t1,t2; > set default_storage_engine= @save_storage_engine; > set optimizer_switch=default; > > === modified file 'mysql-test/r/innodb_icp_none.result' > --- a/mysql-test/r/innodb_icp_none.result 2011-03-17 11:48:04 +0000 > +++ b/mysql-test/r/innodb_icp_none.result 2011-03-22 10:09:26 +0000 > @@ -775,5 +775,51 @@ > SELECT col999 FROM t1 WHERE col1000 = "3" AND col1003<=> sysdate(); > col999 > DROP TABLE t1; > +# > +# Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > +# > +CREATE TABLE t1 ( > +pk INTEGER NOT NULL, > +i1 INTEGER, > +PRIMARY KEY (pk), > +KEY col_int_key (i1) > +); > +INSERT INTO t1 VALUES (14,NULL), (18,133); > +CREATE TABLE t2 ( > +pk INTEGER NOT NULL, > +i1 INTEGER, > +c1 VARCHAR(1), > +PRIMARY KEY (pk), > +KEY col_int_key (i1) > +); > +INSERT INTO t2 VALUES (1,7,'f'); > +EXPLAIN SELECT t1.i1 > +FROM t1 > +WHERE t1.i1 NOT IN > +( SELECT STRAIGHT_JOIN SUBQUERY_t1.pk > +FROM t1 AS SUBQUERY_t1 > +JOIN t2 AS SUBQUERY_t2 > +ON SUBQUERY_t2.i1 = SUBQUERY_t1.pk > +WHERE SUBQUERY_t1.i1> 0 > +OR SUBQUERY_t2.c1 = 'a' > +); > +id select_type table type possible_keys key key_len ref rows Extra > +1 PRIMARY t1 index NULL col_int_key 5 NULL 2 Using where; Using index > +2 DEPENDENT SUBQUERY SUBQUERY_t1 eq_ref PRIMARY,col_int_key PRIMARY 4 func 1 Using where; Full scan on NULL key > +2 DEPENDENT SUBQUERY SUBQUERY_t2 ref col_int_key col_int_key 5 func 1 Using where; Full scan on NULL key > +SELECT t1.i1 > +FROM t1 > +WHERE t1.i1 NOT IN > +( SELECT STRAIGHT_JOIN SUBQUERY_t1.pk > +FROM t1 AS SUBQUERY_t1 > +JOIN t2 AS SUBQUERY_t2 > +ON SUBQUERY_t2.i1 = SUBQUERY_t1.pk > +WHERE SUBQUERY_t1.i1> 0 > +OR SUBQUERY_t2.c1 = 'a' > +); > +i1 > +NULL > +133 > +DROP TABLE t1,t2; > set default_storage_engine= @save_storage_engine; > set optimizer_switch=default; > > === modified file 'mysql-test/r/myisam_icp.result' > --- a/mysql-test/r/myisam_icp.result 2011-03-17 11:48:04 +0000 > +++ b/mysql-test/r/myisam_icp.result 2011-03-22 10:09:26 +0000 > @@ -774,4 +774,50 @@ > SELECT col999 FROM t1 WHERE col1000 = "3" AND col1003<=> sysdate(); > col999 > DROP TABLE t1; > +# > +# Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > +# > +CREATE TABLE t1 ( > +pk INTEGER NOT NULL, > +i1 INTEGER, > +PRIMARY KEY (pk), > +KEY col_int_key (i1) > +); > +INSERT INTO t1 VALUES (14,NULL), (18,133); > +CREATE TABLE t2 ( > +pk INTEGER NOT NULL, > +i1 INTEGER, > +c1 VARCHAR(1), > +PRIMARY KEY (pk), > +KEY col_int_key (i1) > +); > +INSERT INTO t2 VALUES (1,7,'f'); > +EXPLAIN SELECT t1.i1 > +FROM t1 > +WHERE t1.i1 NOT IN > +( SELECT STRAIGHT_JOIN SUBQUERY_t1.pk > +FROM t1 AS SUBQUERY_t1 > +JOIN t2 AS SUBQUERY_t2 > +ON SUBQUERY_t2.i1 = SUBQUERY_t1.pk > +WHERE SUBQUERY_t1.i1> 0 > +OR SUBQUERY_t2.c1 = 'a' > +); > +id select_type table type possible_keys key key_len ref rows Extra > +1 PRIMARY t1 index NULL col_int_key 5 NULL 2 Using where; Using index > +2 DEPENDENT SUBQUERY SUBQUERY_t1 eq_ref PRIMARY,col_int_key PRIMARY 4 func 1 Using where; Full scan on NULL key > +2 DEPENDENT SUBQUERY SUBQUERY_t2 ALL col_int_key NULL NULL NULL 1 Using where; Using join buffer (BNL, incremental buffers) > +SELECT t1.i1 > +FROM t1 > +WHERE t1.i1 NOT IN > +( SELECT STRAIGHT_JOIN SUBQUERY_t1.pk > +FROM t1 AS SUBQUERY_t1 > +JOIN t2 AS SUBQUERY_t2 > +ON SUBQUERY_t2.i1 = SUBQUERY_t1.pk > +WHERE SUBQUERY_t1.i1> 0 > +OR SUBQUERY_t2.c1 = 'a' > +); > +i1 > +NULL > +133 > +DROP TABLE t1,t2; > set optimizer_switch=default; > > === modified file 'mysql-test/r/myisam_icp_none.result' > --- a/mysql-test/r/myisam_icp_none.result 2011-03-17 11:48:04 +0000 > +++ b/mysql-test/r/myisam_icp_none.result 2011-03-22 10:09:26 +0000 > @@ -773,4 +773,50 @@ > SELECT col999 FROM t1 WHERE col1000 = "3" AND col1003<=> sysdate(); > col999 > DROP TABLE t1; > +# > +# Bug#11873324 "WRONG RESULT WITH ICP AND STRAIGHT_JOIN" > +# > +CREATE TABLE t1 ( > +pk INTEGER NOT NULL, > +i1 INTEGER, > +PRIMARY KEY (pk), > +KEY col_int_key (i1) > +); > +INSERT INTO t1 VALUES (14,NULL), (18,133); > +CREATE TABLE t2 ( > +pk INTEGER NOT NULL, > +i1 INTEGER, > +c1 VARCHAR(1), > +PRIMARY KEY (pk), > +KEY col_int_key (i1) > +); > +INSERT INTO t2 VALUES (1,7,'f'); > +EXPLAIN SELECT t1.i1 > +FROM t1 > +WHERE t1.i1 NOT IN > +( SELECT STRAIGHT_JOIN SUBQUERY_t1.pk > +FROM t1 AS SUBQUERY_t1 > +JOIN t2 AS SUBQUERY_t2 > +ON SUBQUERY_t2.i1 = SUBQUERY_t1.pk > +WHERE SUBQUERY_t1.i1> 0 > +OR SUBQUERY_t2.c1 = 'a' > +); > +id select_type table type possible_keys key key_len ref rows Extra > +1 PRIMARY t1 index NULL col_int_key 5 NULL 2 Using where; Using index > +2 DEPENDENT SUBQUERY SUBQUERY_t1 eq_ref PRIMARY,col_int_key PRIMARY 4 func 1 Using where; Full scan on NULL key > +2 DEPENDENT SUBQUERY SUBQUERY_t2 ALL col_int_key NULL NULL NULL 1 Using where; Using join buffer (BNL, incremental buffers) > +SELECT t1.i1 > +FROM t1 > +WHERE t1.i1 NOT IN > +( SELECT STRAIGHT_JOIN SUBQUERY_t1.pk > +FROM t1 AS SUBQUERY_t1 > +JOIN t2 AS SUBQUERY_t2 > +ON SUBQUERY_t2.i1 = SUBQUERY_t1.pk > +WHERE SUBQUERY_t1.i1> 0 > +OR SUBQUERY_t2.c1 = 'a' > +); > +i1 > +NULL > +133 > +DROP TABLE t1,t2; > set optimizer_switch=default; > > === modified file 'mysql-test/r/subquery_nomat_nosj.result' > --- a/mysql-test/r/subquery_nomat_nosj.result 2011-02-14 11:21:26 +0000 > +++ b/mysql-test/r/subquery_nomat_nosj.result 2011-03-22 10:09:26 +0000 > @@ -4923,7 +4923,7 @@ > id select_type table type possible_keys key key_len ref rows Extra > 1 PRIMARY t1 ALL NULL NULL NULL NULL 4 Using where > 2 DEPENDENT SUBQUERY t2 eq_ref PRIMARY PRIMARY 4 func 1 Using where; Using index; Full scan on NULL key > -2 DEPENDENT SUBQUERY t3 eq_ref PRIMARY PRIMARY 4 func 1 Using index condition; Using where; Full scan on NULL key > +2 DEPENDENT SUBQUERY t3 eq_ref PRIMARY PRIMARY 4 func 1 Using where; Full scan on NULL key > SELECT * FROM t1 > WHERE t1.id NOT IN (SELECT t2.id FROM t2,t3 > WHERE t3.name='xxx' AND t2.id=t3.id); > > === modified file 'sql/sql_select.cc' > --- a/sql/sql_select.cc 2011-03-17 12:45:19 +0000 > +++ b/sql/sql_select.cc 2011-03-22 10:09:26 +0000 > @@ -10295,13 +10295,17 @@ > condition might then also be applied by the storage engine > when doing the update part and result in either not finding > the record to update or updating the wrong record. > + 4. The jointab is not part of a subquery that has guarded conditions > + that can be turned on or off during execution of a 'Full scan on NULL > + key'. > */ > if (tab->select_cond&& > tab->table->file->index_flags(keyno, 0, 1)& > HA_DO_INDEX_COND_PUSHDOWN&& > tab->join->thd->optimizer_switch_flag(OPTIMIZER_SWITCH_INDEX_CONDITION_PUSHDOWN)&& > tab->join->thd->lex->sql_command != SQLCOM_UPDATE_MULTI&& > - tab->join->thd->lex->sql_command != SQLCOM_DELETE_MULTI) > + tab->join->thd->lex->sql_command != SQLCOM_DELETE_MULTI&& > + !tab->have_guarded_conds()) > { > DBUG_EXECUTE("where", print_where(tab->select_cond, "full cond", > QT_ORDINARY);); > > === modified file 'sql/sql_select.h' > --- a/sql/sql_select.h 2011-03-17 09:43:28 +0000 > +++ b/sql/sql_select.h 2011-03-22 10:09:26 +0000 > @@ -179,6 +179,26 @@ > return FALSE; > } > > + > + /** > + Check if there are triggered/guarded conditions that might be > + 'switched off' by the subquery code when executing 'Full scan on > + NULL' subqueries. > + > + @return true if there are guarded conditions, false otherwise > + */ > + > + bool have_guarded_conds() const > + { > + DBUG_ASSERT(key_parts == 0 || cond_guards != NULL); > + > + for (uint i = 0; i< key_parts; i++) > + { > + if (cond_guards[i]) > + return true; > + } > + return false; > + } > } TABLE_REF; > > > @@ -431,6 +451,20 @@ > return tmp_select_cond; > } > uint get_sj_strategy() const; > + > + > + /** > + Check if there are triggered/guarded conditions that might be > + 'switched off' by the subquery code when executing 'Full scan on > + NULL' subqueries. > + > + @return true if there are guarded conditions, false otherwise > + */ > + > + bool have_guarded_conds() const > + { > + return ref.have_guarded_conds(); > + } > } JOIN_TAB; nitpicking: I think that it would be better English to call the functions has_guarded_conds(). Thanks, Roy