List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:May 14 2010 11:10am
Subject:bzr commit into mysql-6.0-codebase-bugfixing branch (guilhem:3863) Bug#53298
View as plain text  
#At file:///home/mysql_src/bzrrepos/mysql-6.0-codebase-bugfixing2/ based on revid:alik@ibmvm-20100504122520-kzqe0rufxb054x9s

 3863 Guilhem Bichot	2010-05-14
      Fix for BUG#53298 "wrong result with semijoin (no semijoin strategy chosen)".
     @ mysql-test/r/subselect3.result
        Without the fix, we were comparing the cost of first-considered plan
        "t11,t12,t21,t22" (with materialization scan for (t11,t12)) with the
        cost of partial plan "t11,t12,t22" (inner joins). Yes, inner joins,
        because when we considered that partial plan, cur_dups_producing_tables
        was 0 (had not been reset, which is the bug): this said that no
        semijoin strategy was needed so inner join was used.
        "t11,t12,t22" was then pruned by the heuristic of
        optimizer_prune_level=1, because it had a higher record count.
        With the fix, when considering "t11,t12,t22" we correctly realize that
        this needs a semijoin strategy, choose materialization scan, and go
        down to "t11,t12,t22,t21", which is less costly than
        "t11,t12,t21,t22" (11.986 vs 12). Hence the plan change.
     @ mysql-test/r/subselect4.result
        before the code fix, the final select would return two rows (1 and 1).
     @ mysql-test/r/subselect_sj2.result
        Before the code fix, EXPLAIN showed a wrong plan: adding two Australian towns to table t1, each of them
        having a population greater than 5MB, led the SELECT to return Australia twice (duplicate);
        after this fix, Australia would be returned once.
     @ mysql-test/t/subselect4.test
        test for bug
     @ sql/sql_select.cc
        In the bug's scenario, t3 and t4 are inner tables of the semijoin. t4 is pulled out
        (functionally dependent), but t3 is not. As we thus have one inner table left,
        we need a semijoin strategy; without semijoin strategy, the semijoin just becomes
        a plain join, leading to duplicate rows. Why didn't we have a semijoin strategy:
        - plan search would first choose this order: t4,t3,t1,t2, with duplicate weedout
        as semijoin strategy. As this handles the semijoin fully, join->cur_dups_producing_tables
        is 0 (as a result of choosing duplicates weedout in advance_sj_state() for plan's last
        table t2).
        - then plan search evaluates plan t4,t3,t2,t1: it first computes its cost, then in
        avdance_sj_state(), as join->cur_dups_producing_tables is still 0 (the bug),
        it believes that semijoin inner tables have been handled already, so doesn't feel
        obliged to choose a semijoin strategy, so this plan has no strategy, and as its
        cost is lower than that of t4,t3,t1,t2, it is chosen. A plan without a strategy,
        thus duplicates.
        The fix: move JOIN::cur_dups_producing_tables to POSITION (which tracks the
        optimization state at the partial plan's stage); as the function
        comment of advance_sj_state() says, "no undo is necessary" then. When
        we are about to look at the first table, we don't have duplicate-producing
        tables yet (=0); when we go to table N+1, we inherit the same duplicate-
        producing tables (=pos[-1].etc), and we add more tables if we hit a
        semijoin nest, and we remove tables when we pick a semijoin strategy.
     @ sql/sql_select.h
        move JOIN::cur_dups_producing_tables to POSITION

    modified:
      mysql-test/r/subselect3.result
      mysql-test/r/subselect3_jcl6.result
      mysql-test/r/subselect4.result
      mysql-test/r/subselect_sj2.result
      mysql-test/r/subselect_sj2_jcl6.result
      mysql-test/t/subselect4.test
      sql/sql_select.cc
      sql/sql_select.h
=== modified file 'mysql-test/r/subselect3.result'
--- a/mysql-test/r/subselect3.result	2010-04-28 19:42:00 +0000
+++ b/mysql-test/r/subselect3.result	2010-05-14 11:10:33 +0000
@@ -1063,8 +1063,8 @@ t22.a in (select t12.a from t11, t12 whe
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
 1	PRIMARY	t11	ALL	NULL	NULL	NULL	NULL	8	Using where; Using temporary; Using filesort; Start materialize; Scan
 1	PRIMARY	t12	ALL	NULL	NULL	NULL	NULL	8	Using where; End materialize; Using join buffer
-1	PRIMARY	t21	ALL	NULL	NULL	NULL	NULL	26	Using where; Using join buffer
 1	PRIMARY	t22	ALL	NULL	NULL	NULL	NULL	26	Using where; Using join buffer
+1	PRIMARY	t21	ALL	NULL	NULL	NULL	NULL	26	Using where; Using join buffer
 select t21.* from t21,t22 where t21.a = t22.a and 
 t22.a in (select t12.a from t11, t12 where t11.a in(255,256) and t11.a = t12.a and t11.c is null) and t22.c is null order by t21.a;
 a	b	c

=== modified file 'mysql-test/r/subselect3_jcl6.result'
--- a/mysql-test/r/subselect3_jcl6.result	2010-04-28 19:42:00 +0000
+++ b/mysql-test/r/subselect3_jcl6.result	2010-05-14 11:10:33 +0000
@@ -1067,8 +1067,8 @@ t22.a in (select t12.a from t11, t12 whe
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
 1	PRIMARY	t11	ALL	NULL	NULL	NULL	NULL	8	Using where; Using temporary; Using filesort; Start materialize; Scan
 1	PRIMARY	t12	ALL	NULL	NULL	NULL	NULL	8	Using where; End materialize; Using join buffer
-1	PRIMARY	t21	ALL	NULL	NULL	NULL	NULL	26	Using where; Using join buffer
 1	PRIMARY	t22	ALL	NULL	NULL	NULL	NULL	26	Using where; Using join buffer
+1	PRIMARY	t21	ALL	NULL	NULL	NULL	NULL	26	Using where; Using join buffer
 select t21.* from t21,t22 where t21.a = t22.a and 
 t22.a in (select t12.a from t11, t12 where t11.a in(255,256) and t11.a = t12.a and t11.c is null) and t22.c is null order by t21.a;
 a	b	c

=== modified file 'mysql-test/r/subselect4.result'
--- a/mysql-test/r/subselect4.result	2010-04-28 14:55:48 +0000
+++ b/mysql-test/r/subselect4.result	2010-05-14 11:10:33 +0000
@@ -500,3 +500,28 @@ SUBQUERY1_t1.col_varchar) ) ) 
 ;
 pk
 drop table t1;
+#
+# BUG#53298 "wrong result with semijoin (no semijoin strategy chosen)"
+#
+create table t1 (uid int, fid int);
+insert into t1 values (1,1), (3,1);
+create table t2 (uid int, name varchar(128));
+insert into t2 values (1, "A"), (2, "B");
+create table t3 (uid int, fid int, index(uid));
+insert into t3 values (1,3), (1,3);
+create table t4 (uid int);
+insert into t4 values (3);
+explain select t2.uid from t2, t1 
+where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
+and t2.uid=t1.fid;
+id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
+1	PRIMARY	t4	system	NULL	NULL	NULL	NULL	1	
+1	PRIMARY	t3	ref	uid	uid	5	const	1	Using where; Materialize; Scan
+1	PRIMARY	t2	ALL	NULL	NULL	NULL	NULL	2	Using join buffer
+1	PRIMARY	t1	ALL	NULL	NULL	NULL	NULL	2	Using where; Using join buffer
+select t2.uid from t2, t1 
+where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
+and t2.uid=t1.fid;
+uid
+1
+drop table t1,t2,t3,t4;

=== modified file 'mysql-test/r/subselect_sj2.result'
--- a/mysql-test/r/subselect_sj2.result	2010-01-20 10:11:29 +0000
+++ b/mysql-test/r/subselect_sj2.result	2010-05-14 11:10:33 +0000
@@ -303,9 +303,9 @@ t2.Code IN (SELECT Country FROM t3 
 WHERE Language='English' AND Percentage > 10 AND
 t2.Population > 100000);
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	PRIMARY	t1	range	Population,Country	Population	4	NULL	1	Using index condition; Using MRR
-1	PRIMARY	t3	eq_ref	PRIMARY,Percentage	PRIMARY	33	test.t1.Country,const	1	Using index condition; Using where
-1	PRIMARY	t2	eq_ref	PRIMARY,Population	PRIMARY	3	test.t3.Country	1	Using index condition; Using where
+1	PRIMARY	t1	range	Population,Country	Population	4	NULL	1	Using index condition; Using MRR; Materialize; Scan
+1	PRIMARY	t2	eq_ref	PRIMARY,Population	PRIMARY	3	test.t1.Country	1	Using where
+1	PRIMARY	t3	eq_ref	PRIMARY,Percentage	PRIMARY	33	test.t2.Code,const	1	Using index condition; Using where
 DROP TABLE t1,t2,t3;
 CREATE TABLE t1 (
 Code char(3) NOT NULL DEFAULT '',

=== modified file 'mysql-test/r/subselect_sj2_jcl6.result'
--- a/mysql-test/r/subselect_sj2_jcl6.result	2010-01-20 10:11:29 +0000
+++ b/mysql-test/r/subselect_sj2_jcl6.result	2010-05-14 11:10:33 +0000
@@ -307,9 +307,9 @@ t2.Code IN (SELECT Country FROM t3 
 WHERE Language='English' AND Percentage > 10 AND
 t2.Population > 100000);
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	PRIMARY	t1	range	Population,Country	Population	4	NULL	1	Using index condition; Using MRR
-1	PRIMARY	t3	eq_ref	PRIMARY,Percentage	PRIMARY	33	test.t1.Country,const	1	Using index condition; Using where; Using join buffer
-1	PRIMARY	t2	eq_ref	PRIMARY,Population	PRIMARY	3	test.t3.Country	1	Using index condition(BKA); Using where; Using join buffer
+1	PRIMARY	t1	range	Population,Country	Population	4	NULL	1	Using index condition; Using MRR; Materialize; Scan
+1	PRIMARY	t2	eq_ref	PRIMARY,Population	PRIMARY	3	test.t1.Country	1	Using where; Using join buffer
+1	PRIMARY	t3	eq_ref	PRIMARY,Percentage	PRIMARY	33	test.t2.Code,const	1	Using index condition(BKA); Using where; Using join buffer
 DROP TABLE t1,t2,t3;
 CREATE TABLE t1 (
 Code char(3) NOT NULL DEFAULT '',

=== modified file 'mysql-test/t/subselect4.test'
--- a/mysql-test/t/subselect4.test	2010-04-28 14:55:48 +0000
+++ b/mysql-test/t/subselect4.test	2010-05-14 11:10:33 +0000
@@ -465,3 +465,31 @@ WHERE ( 1, 2 ) IN ( SELECT SUBQUERY1_t1.
 ;
 
 drop table t1;
+
+--echo #
+--echo # BUG#53298 "wrong result with semijoin (no semijoin strategy chosen)"
+--echo #
+
+create table t1 (uid int, fid int);
+insert into t1 values (1,1), (3,1);
+
+create table t2 (uid int, name varchar(128));
+insert into t2 values (1, "A"), (2, "B");
+
+create table t3 (uid int, fid int, index(uid));
+insert into t3 values (1,3), (1,3);
+
+create table t4 (uid int);
+insert into t4 values (3);
+
+explain select t2.uid from t2, t1 
+  where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
+        and t2.uid=t1.fid;
+
+--sorted_result
+select t2.uid from t2, t1 
+  where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
+        and t2.uid=t1.fid;
+
+drop table t1,t2,t3,t4;
+

=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc	2010-04-29 12:15:06 +0000
+++ b/sql/sql_select.cc	2010-05-14 11:10:33 +0000
@@ -7010,7 +7010,6 @@ choose_plan(JOIN *join, table_map join_t
   DBUG_ENTER("choose_plan");
 
   join->cur_embedding_map= 0;
-  join->cur_dups_producing_tables= 0;
   reset_nj_counters(join->join_list);
   qsort2_cmp jtab_sort_func;
 
@@ -13335,6 +13334,7 @@ void advance_sj_state(JOIN *join, table_
   /* Initialize the state or copy it from prev. tables */
   if (idx == join->const_tables)
   {
+    pos->cur_dups_producing_tables= 0;
     pos->first_firstmatch_table= MAX_TABLES;
     pos->first_loosescan_table= MAX_TABLES; 
     pos->dupsweedout_tables= 0;
@@ -13343,6 +13343,8 @@ void advance_sj_state(JOIN *join, table_
   }
   else
   {
+    pos->cur_dups_producing_tables= pos[-1].cur_dups_producing_tables;
+
     // FirstMatch
     pos->first_firstmatch_table=
       (pos[-1].sj_strategy == SJ_OPT_FIRST_MATCH) ?
@@ -13520,14 +13522,14 @@ void advance_sj_state(JOIN *join, table_
   if ((emb_sj_nest= new_join_tab->emb_sj_nest))
   {
     join->cur_sj_inner_tables |= emb_sj_nest->sj_inner_tables;
-    join->cur_dups_producing_tables |= emb_sj_nest->sj_inner_tables;
+    pos->cur_dups_producing_tables |= emb_sj_nest->sj_inner_tables;
 
     /* Remove the sj_nest if all of its SJ-inner tables are in cur_table_map */
     if (!(remaining_tables &
           emb_sj_nest->sj_inner_tables & ~new_join_tab->table->map))
       join->cur_sj_inner_tables &= ~emb_sj_nest->sj_inner_tables;
   }
-  join->cur_dups_producing_tables &= ~handled_by_fm_or_ls;
+  pos->cur_dups_producing_tables &= ~handled_by_fm_or_ls;
 
   /* 4. MaterializeLookup and MaterializeScan strategy handler */
   const int sjm_strategy=
@@ -13581,7 +13583,7 @@ void advance_sj_state(JOIN *join, table_
     mat_read_time += mat_info->materialization_cost.total_cost() +
                      prefix_rec_count * mat_info->lookup_cost.total_cost();
 
-    if (mat_read_time < *current_read_time || join->cur_dups_producing_tables)
+    if (mat_read_time < *current_read_time || pos->cur_dups_producing_tables)
     {
       /*
         NOTE: When we pick to use SJM[-Scan] we don't memcpy its POSITION
@@ -13592,7 +13594,7 @@ void advance_sj_state(JOIN *join, table_
       pos->sj_strategy= SJ_OPT_MATERIALIZE_LOOKUP;
       *current_read_time=    mat_read_time;
       *current_record_count= prefix_rec_count;
-      join->cur_dups_producing_tables&=
+      pos->cur_dups_producing_tables &=
         ~new_join_tab->emb_sj_nest->sj_inner_tables;
     }
   }
@@ -13648,12 +13650,12 @@ void advance_sj_state(JOIN *join, table_
       comparing cost without semi-join duplicate removal with cost with
       duplicate removal is not an apples-to-apples comparison.
     */
-    if (prefix_cost < *current_read_time || join->cur_dups_producing_tables)
+    if (prefix_cost < *current_read_time || pos->cur_dups_producing_tables)
     {
       pos->sj_strategy= SJ_OPT_MATERIALIZE_SCAN;
       *current_read_time=    prefix_cost;
       *current_record_count= prefix_rec_count;
-      join->cur_dups_producing_tables&= ~mat_nest->sj_inner_tables;
+      pos->cur_dups_producing_tables &= ~mat_nest->sj_inner_tables;
 
     }
   }
@@ -13758,12 +13760,12 @@ void advance_sj_state(JOIN *join, table_
         to consider (it needs "the most" tables in the prefix) and we can't
         leave duplicate-producing tables not handled by any strategy.
       */
-      if (dups_cost < *current_read_time || join->cur_dups_producing_tables)
+      if (dups_cost < *current_read_time || pos->cur_dups_producing_tables)
       {
         pos->sj_strategy= SJ_OPT_DUPS_WEEDOUT;
         *current_read_time= dups_cost;
         *current_record_count= prefix_rec_count * sj_outer_fanout;
-        join->cur_dups_producing_tables &= ~dups_removed_fanout;
+        pos->cur_dups_producing_tables &= ~dups_removed_fanout;
       }
     }
   }

=== modified file 'sql/sql_select.h'
--- a/sql/sql_select.h	2010-04-16 14:20:09 +0000
+++ b/sql/sql_select.h	2010-05-14 11:10:33 +0000
@@ -1296,6 +1296,13 @@ typedef struct st_position
   */
   uint n_sj_tables;
 
+  /**
+    Bitmap of semi-join inner tables that are in the join prefix and for
+    which there's no provision yet for how to eliminate semi-join duplicates
+    which they produce.
+  */
+  table_map cur_dups_producing_tables;
+
 /* LooseScan strategy members */
 
   /* The first (i.e. driving) table we're doing loose scan for */
@@ -1332,7 +1339,6 @@ typedef struct st_position
    */
   table_map firstmatch_need_tables;
 
-
 /* Duplicate Weedout strategy */
   /* The first table that the strategy will need to handle */
   uint  first_dupsweedout_table;
@@ -1522,13 +1528,6 @@ public:
     nests that have their tables both in and outside of the join prefix.
   */
   table_map cur_sj_inner_tables;
-  
-  /*
-    Bitmap of semi-join inner tables that are in the join prefix and for
-    which there's no provision for how to eliminate semi-join duplicates
-    they produce.
-  */
-  table_map cur_dups_producing_tables;
 
   /* We also maintain a stack of join optimization states in * join->positions[] */
 /******* Join optimization state members end *******/


Attachment: [text/bzr-bundle] bzr/guilhem@mysql.com-20100514111033-8wv6cvp50q2gibfa.bundle
Thread
bzr commit into mysql-6.0-codebase-bugfixing branch (guilhem:3863) Bug#53298Guilhem Bichot14 May
  • Re: bzr commit into mysql-6.0-codebase-bugfixing branch (guilhem:3863)Bug#53298Olav Sandstaa19 May
  • Re: bzr commit into mysql-6.0-codebase-bugfixing branch (guilhem:3863)Bug#53298Roy Lyseng19 May