Thanks for the patch, Roy.
I have two main questions to the patch:
- Why does it lead to plan changes in the loop of subselect_sj?
I thought the agreement was that behavior should only change when
prepare is used.
- Would it not be better to split pull_out_semijoin into two
different functions, one for functional dependencies and one for
const tables? Seems a bit strange to call same function twice to
do quite different things.
See more comments inline.
On 06/ 4/10 05:04 PM, Roy Lyseng wrote:
> #At file:///home/rl136806/mysql/repo/mysql-work4/ based on
revid:olav@stripped
>
> 3185 Roy Lyseng 2010-06-04
> Bug#43768: Prepared query with nested subqueries core dump on
second execution
>
> The problem here is that pullout of semijoin tables is
attempted for
> each execution, but because those tables are not "pushed back"
again
> after each execution, the pullout fails on second attempt.
>
> The solution chosen here is to pullout only those semijoin
tables that are
> functionally dependent upon the outer tables. This pullout
operation
> can be performed only once, and, unlike the current procedure,
is not
> dependent upon the data volume of the involved tables.
> We still pullout tables based on volume if this is a direct
execution, though.
I think more people would immediatedly understand what you were
discussing here if you referred to "const tables" instead of "based on
volume."
>
> The practical implication of this is as follows:
>
> - Only outer tables containing zero or one rows will now be
extracted
> as "const tables". Thus, such tables from a semijoin nest
are no
> longer accessed during optimization, and some (rare)
optimizations
> are no longer possible.
>
> - In the majority of cases, there is no performance impact.
Often,
> the new strategy chosen is Materialization, meaning that
the row
> of these table is accessed only once and saved in local memory.
>
> - Const table analysis now has to be done in two phases:
> 1) Pullout tables based on dependencies. Both outer and
inner tables
> may apply, and
> 2) Pullout outer tables based on data volume.
>
> In order to implement the latter point above, and assure that
pullout
> of semijoin tables occurs only once, make_join_statistics()
has been
> restructured slightly. The conditional logic within the
function has also
> been enhanced for better readability.
>
> The logic of make_join_statistics() is now as follows:
>
> 1. Initialize JOIN data structures
> (major part of first loop in existing implementation).
>
> 2. Update dependencies based on join information
> (the Warshall algorithm).
>
> 3. Make key descriptions (update_ref_and_keys()).
>
> 4. Pull out semijoin tables, called only once.
>
> 5. Extract tables with zero or one rows as const tables
> (consider outer tables only, no semijoin tables).
>
> 6. Extract dependent tables as const tables.
> (consider outer tables only, no semijoin tables).
>
> 7. The remaining parts of the function.
>
> mysql-test/r/select_found.result
> Possible keys changed from "PRIMARY,kid" to "kid".
>
Is this a good change? Is it a necessary change?
> mysql-test/r/subselect4.result
> Added test case for bug#43768
>
> mysql-test/r/subselect_sj.result
> A number of plan changes because of extensive testing of
semi-join
> tables with 0, 1 and 2 rows.
>
> mysql-test/r/subselect_sj_jcl6.result
> Ditto.
>
> mysql-test/t/subselect4.test
> Added test case for bug#43768
Since it it related to semijoin, I would prefer if you added your test
case to subselect_sj instead. Will also less likely cause conflict
with my test changes (subselect4 will disappear, subselect_sj will be
renamed)
>
> mysql-test/t/subselect_sj.test
> bug#46744 now becomes a duplicate of bug#50489, and test
case is moved.
>
> mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
> Test for bug#46744 moved here.
Does this mean that Bug#46744 is no longer fixed?
>
> sql/sql_select.cc
> pull_out_semijoin_tables()
> join_tab->emb_sj_nest is no longer updated. Comments updated.
> make_join_statistics()
> Removed const table pullout from first loop.
> Simplified testing based on inner/outer/semi-join properties.
> Calls pull_out_semijoin_tables() just after dependency
analysis.
> Then, added loop that performs pullout based on data volume
> but excludes semijoined tables in prepared statements.
> Second call of pull_out_semijoin_tables() is needed after
this.
> Simplified testing based on inner/outer/semi-join properties.
> Assured that semijoin tables are no longer pulled out
based on volume.
>
The last sentence is no longer true.
I also suggest you add a sentence about that you have changed where
emb_sj_nest is set.
> added:
> mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
> modified:
> mysql-test/r/select_found.result
> mysql-test/r/subselect4.result
> mysql-test/r/subselect_sj.result
> mysql-test/r/subselect_sj_jcl6.result
> mysql-test/t/subselect4.test
> mysql-test/t/subselect_sj.test
> sql/sql_select.cc
> === modified file 'mysql-test/r/select_found.result'
> --- a/mysql-test/r/select_found.result 2007-08-02 19:45:56 +0000
> +++ b/mysql-test/r/select_found.result 2010-06-04 15:03:03 +0000
> @@ -83,7 +83,7 @@ UNIQUE KEY e_n (email,name)
> );
> EXPLAIN SELECT SQL_CALC_FOUND_ROWS DISTINCT email FROM t2 LEFT JOIN
t1 ON kid = t2.id WHERE t1.id IS NULL LIMIT 10;
> id select_type table type possible_keys key key_len ref rows Extra
> -1 SIMPLE t1 system PRIMARY,kid NULL NULL NULL 0 const row not found
> +1 SIMPLE t1 system kid NULL NULL NULL 0 const row not found
> 1 SIMPLE t2 index NULL e_n 104 NULL 10
> SELECT SQL_CALC_FOUND_ROWS DISTINCT email FROM t2 LEFT JOIN t1 ON
kid = t2.id WHERE t1.id IS NULL LIMIT 10;
> email
>
> === modified file 'mysql-test/r/subselect4.result'
> --- a/mysql-test/r/subselect4.result 2010-05-19 14:44:18 +0000
> +++ b/mysql-test/r/subselect4.result 2010-06-04 15:03:03 +0000
> @@ -500,6 +500,100 @@ SUBQUERY1_t1.col_varchar) ) )
> pk
> drop table t1;
> #
> +# Bug#43768 Prepared query with nested subqueries core dump on
second execution
> +#
> +CREATE TABLE t1 (
> +id INT PRIMARY KEY,
> +partner_id VARCHAR(35)
> +);
> +INSERT INTO t1 VALUES
> +(1, 'partner1'), (2, 'partner2'),
> +(3, 'partner3'), (4, 'partner4');
> +CREATE TABLE t2 (
> +id INT NOT NULL,
> +t1_line_id INT,
> +article_id VARCHAR(20),
> +PRIMARY KEY(id, t1_line_id)
> +);
> +INSERT INTO t2 VALUES
> +(1, 1, 'sup'), (2, 1, 'sup'),
> +(2, 2, 'sup'), (2, 3, 'sup'),
> +(2, 4, 'imp'), (3, 1, 'sup'),
> +(4, 1, 'sup');
> +CREATE TABLE t3 (
> +user_id VARCHAR(50),
> +article_id VARCHAR(20) NOT NULL,
> +PRIMARY KEY(user_id)
> +);
> +INSERT INTO t3 VALUES('nicke', 'imp');
> +EXPLAIN
> +SELECT t1.partner_id
> +FROM t1
> +WHERE t1.id IN (
> +SELECT t2.id
> +FROM t2
> +WHERE article_id IN (
> +SELECT article_id FROM t3
> +WHERE user_id = 'nicke'
> + )
> +);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 PRIMARY t3 system PRIMARY NULL NULL NULL 1
> +1 PRIMARY t1 ALL PRIMARY NULL NULL NULL 4
> +1 PRIMARY t2 ref PRIMARY PRIMARY 4 test.t1.id 1 Using where;
FirstMatch(t1)
> +SELECT t1.partner_id
> +FROM t1
> +WHERE t1.id IN (
> +SELECT t2.id
> +FROM t2
> +WHERE article_id IN (
> +SELECT article_id FROM t3
> +WHERE user_id = 'nicke'
> + )
> +);
> +partner_id
> +partner2
> +PREPARE stmt FROM
> +'EXPLAIN SELECT t1.partner_id
> +FROM t1
> +WHERE t1.id IN (
> + SELECT t2.id
> + FROM t2
> + WHERE article_id IN (
> + SELECT article_id FROM t3
> + WHERE user_id = \'nicke\'
> + )
> + )';
> +EXECUTE stmt;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 PRIMARY t3 system PRIMARY NULL NULL NULL 1
> +1 PRIMARY t1 ALL PRIMARY NULL NULL NULL 4
> +1 PRIMARY t2 ref PRIMARY PRIMARY 4 test.t1.id 1 Using where;
FirstMatch(t1)
> +EXECUTE stmt;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t3 system PRIMARY NULL NULL NULL 1
> +1 SIMPLE t1 ALL PRIMARY NULL NULL NULL 4
> +1 SIMPLE t2 ref PRIMARY PRIMARY 4 test.t1.id 1 Using where;
FirstMatch(t1)
> +PREPARE stmt FROM
> +'SELECT t1.partner_id
> +FROM t1
> +WHERE t1.id IN (
> + SELECT t2.id
> + FROM t2
> + WHERE article_id IN (
> + SELECT article_id FROM t3
> + WHERE user_id = \'nicke\'
> + )
> + )';
> +EXECUTE stmt;
> +partner_id
> +partner2
> +EXECUTE stmt;
> +partner_id
> +partner2
> +DROP TABLE t1,t2,t3;
> +# End of Bug#43768
> +#
> # BUG#53298 "wrong result with semijoin (no semijoin strategy chosen)"
> #
> create table t1 (uid int, fid int);
>
> === modified file 'mysql-test/r/subselect_sj.result'
> --- a/mysql-test/r/subselect_sj.result 2010-05-18 11:01:08 +0000
> +++ b/mysql-test/r/subselect_sj.result 2010-06-04 15:03:03 +0000
> @@ -333,7 +333,9 @@ i
> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> id select_type table type possible_keys key key_len ref rows Extra
> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
noticed after reading const tables
> +1 PRIMARY t1 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t3 ALL NULL NULL NULL NULL 2 Using where; FirstMatch(t2)
> SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> i
> @@ -363,7 +365,9 @@ i
> EXPLAIN SELECT * FROM t1 WHERE (11) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> id select_type table type possible_keys key key_len ref rows Extra
> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
noticed after reading const tables
> +1 PRIMARY t1 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t3 ALL NULL NULL NULL NULL 2 Using where; FirstMatch(t2)
> SELECT * FROM t1 WHERE (11) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> i
> @@ -1320,14 +1324,18 @@ SELECT (SELECT COUNT(*) from t1),
> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
> id select_type table type possible_keys key key_len ref rows Extra
> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
noticed after reading const tables
> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t3 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t1 ALL NULL NULL NULL NULL 2 Using where
> SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
> i
> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> id select_type table type possible_keys key key_len ref rows Extra
> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
noticed after reading const tables
> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t3 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t1 ALL NULL NULL NULL NULL 2 Using where
> SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> i
> @@ -2363,34 +2371,6 @@ int_key
> 7
> DROP TABLE t0, t1, t2;
> # End of bug#46550
> -#
> -# Bug #46744 Crash in optimize_semijoin_nests on empty view
> -# with limit and procedure.
> -#
> -DROP TABLE IF EXISTS t1, t2;
> -DROP VIEW IF EXISTS v1;
> -DROP PROCEDURE IF EXISTS p1;
> -CREATE TABLE t1 ( f1 int );
> -CREATE TABLE t2 ( f1 int );
> -insert into t2 values (5), (7);
> -CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
> -create procedure p1()
> -select COUNT(*)
> -FROM v1 WHERE f1 IN
> -(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
> -SET SESSION optimizer_switch = 'semijoin=on';
> -CALL p1();
> -COUNT(*)
> -0
> -SET SESSION optimizer_switch = 'semijoin=off';
> -CALL p1();
> -COUNT(*)
> -0
> -drop table t1, t2;
> -drop view v1;
> -drop procedure p1;
> -set SESSION optimizer_switch='default';
> -# End of bug#46744
>
> Bug #48073 Subquery on char columns from view crashes Mysql
>
>
> === modified file 'mysql-test/r/subselect_sj_jcl6.result'
> --- a/mysql-test/r/subselect_sj_jcl6.result 2010-05-18 11:01:08 +0000
> +++ b/mysql-test/r/subselect_sj_jcl6.result 2010-06-04 15:03:03 +0000
> @@ -337,7 +337,9 @@ i
> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> id select_type table type possible_keys key key_len ref rows Extra
> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
noticed after reading const tables
> +1 PRIMARY t1 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t3 ALL NULL NULL NULL NULL 2 Using where; FirstMatch(t2)
> SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> i
> @@ -367,7 +369,9 @@ i
> EXPLAIN SELECT * FROM t1 WHERE (11) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> id select_type table type possible_keys key key_len ref rows Extra
> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
noticed after reading const tables
> +1 PRIMARY t1 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t3 ALL NULL NULL NULL NULL 2 Using where; FirstMatch(t2)
> SELECT * FROM t1 WHERE (11) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> i
> @@ -1324,14 +1328,18 @@ SELECT (SELECT COUNT(*) from t1),
> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
> id select_type table type possible_keys key key_len ref rows Extra
> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
noticed after reading const tables
> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t3 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t1 ALL NULL NULL NULL NULL 2 Using where
> SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
> i
> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> id select_type table type possible_keys key key_len ref rows Extra
> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
noticed after reading const tables
> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t3 system NULL NULL NULL NULL 0 const row not found
> +1 PRIMARY t1 ALL NULL NULL NULL NULL 2 Using where
> SELECT * FROM t1 WHERE (t1.i) IN
> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
> i
> @@ -2367,34 +2375,6 @@ int_key
> 7
> DROP TABLE t0, t1, t2;
> # End of bug#46550
> -#
> -# Bug #46744 Crash in optimize_semijoin_nests on empty view
> -# with limit and procedure.
> -#
> -DROP TABLE IF EXISTS t1, t2;
> -DROP VIEW IF EXISTS v1;
> -DROP PROCEDURE IF EXISTS p1;
> -CREATE TABLE t1 ( f1 int );
> -CREATE TABLE t2 ( f1 int );
> -insert into t2 values (5), (7);
> -CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
> -create procedure p1()
> -select COUNT(*)
> -FROM v1 WHERE f1 IN
> -(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
> -SET SESSION optimizer_switch = 'semijoin=on';
> -CALL p1();
> -COUNT(*)
> -0
> -SET SESSION optimizer_switch = 'semijoin=off';
> -CALL p1();
> -COUNT(*)
> -0
> -drop table t1, t2;
> -drop view v1;
> -drop procedure p1;
> -set SESSION optimizer_switch='default';
> -# End of bug#46744
>
> Bug #48073 Subquery on char columns from view crashes Mysql
>
>
> === added file 'mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test'
> --- a/mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
1970-01-01 00:00:00 +0000
> +++ b/mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
2010-06-04 15:03:03 +0000
> @@ -0,0 +1,35 @@
> +--echo #
> +--echo # Bug #46744 Crash in optimize_semijoin_nests on empty view
> +--echo # with limit and procedure.
> +--echo #
> +
> +--disable_warnings
> +DROP TABLE IF EXISTS t1, t2;
> +DROP VIEW IF EXISTS v1;
> +DROP PROCEDURE IF EXISTS p1;
> +--enable_warnings
> +
> +CREATE TABLE t1 ( f1 int );
> +CREATE TABLE t2 ( f1 int );
> +
> +insert into t2 values (5), (7);
> +
> +CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
> +
> +create procedure p1()
> +select COUNT(*)
> +FROM v1 WHERE f1 IN
> +(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
> +
> +SET SESSION optimizer_switch = 'semijoin=on';
> +CALL p1();
> +SET SESSION optimizer_switch = 'semijoin=off';
> +CALL p1();
> +
> +drop table t1, t2;
> +drop view v1;
> +drop procedure p1;
> +
> +set SESSION optimizer_switch='default';
> +
> +--echo # End of bug#46744
>
> === modified file 'mysql-test/t/subselect4.test'
> --- a/mysql-test/t/subselect4.test 2010-05-19 14:44:18 +0000
> +++ b/mysql-test/t/subselect4.test 2010-06-04 15:03:03 +0000
> @@ -467,6 +467,95 @@ WHERE ( 1, 2 ) IN ( SELECT SUBQUERY1_t1.
> drop table t1;
>
> --echo #
> +--echo # Bug#43768 Prepared query with nested subqueries core dump
on second execution
> +--echo #
> +
> +CREATE TABLE t1 (
> + id INT PRIMARY KEY,
> + partner_id VARCHAR(35)
> +);
> +
> +INSERT INTO t1 VALUES
> + (1, 'partner1'), (2, 'partner2'),
> + (3, 'partner3'), (4, 'partner4');
> +
> +CREATE TABLE t2 (
> + id INT NOT NULL,
> + t1_line_id INT,
> + article_id VARCHAR(20),
> + PRIMARY KEY(id, t1_line_id)
> +);
> +
> +INSERT INTO t2 VALUES
> + (1, 1, 'sup'), (2, 1, 'sup'),
> + (2, 2, 'sup'), (2, 3, 'sup'),
> + (2, 4, 'imp'), (3, 1, 'sup'),
> + (4, 1, 'sup');
> +
> +CREATE TABLE t3 (
> + user_id VARCHAR(50),
> + article_id VARCHAR(20) NOT NULL,
> + PRIMARY KEY(user_id)
> +);
> +
> +INSERT INTO t3 VALUES('nicke', 'imp');
> +
> +EXPLAIN
> +SELECT t1.partner_id
> +FROM t1
> +WHERE t1.id IN (
> + SELECT t2.id
> + FROM t2
> + WHERE article_id IN (
> + SELECT article_id FROM t3
> + WHERE user_id = 'nicke'
> + )
> + );
> +
> +SELECT t1.partner_id
> +FROM t1
> +WHERE t1.id IN (
> + SELECT t2.id
> + FROM t2
> + WHERE article_id IN (
> + SELECT article_id FROM t3
> + WHERE user_id = 'nicke'
> + )
> + );
> +
> +PREPARE stmt FROM
> +'EXPLAIN SELECT t1.partner_id
> +FROM t1
> +WHERE t1.id IN (
> + SELECT t2.id
> + FROM t2
> + WHERE article_id IN (
> + SELECT article_id FROM t3
> + WHERE user_id = \'nicke\'
> + )
> + )';
> +EXECUTE stmt;
> +EXECUTE stmt;
> +
> +PREPARE stmt FROM
> +'SELECT t1.partner_id
> +FROM t1
> +WHERE t1.id IN (
> + SELECT t2.id
> + FROM t2
> + WHERE article_id IN (
> + SELECT article_id FROM t3
> + WHERE user_id = \'nicke\'
> + )
> + )';
> +EXECUTE stmt;
> +EXECUTE stmt;
> +
> +DROP TABLE t1,t2,t3;
> +
> +--echo # End of Bug#43768
> +
> +--echo #
> --echo # BUG#53298 "wrong result with semijoin (no semijoin
strategy chosen)"
> --echo #
>
>
> === modified file 'mysql-test/t/subselect_sj.test'
> --- a/mysql-test/t/subselect_sj.test 2010-05-18 11:01:08 +0000
> +++ b/mysql-test/t/subselect_sj.test 2010-06-04 15:03:03 +0000
> @@ -645,43 +645,6 @@ DROP TABLE t0, t1, t2;
>
> --echo # End of bug#46550
>
> ---echo #
> ---echo # Bug #46744 Crash in optimize_semijoin_nests on empty view
> ---echo # with limit and procedure.
> ---echo #
> -
> ---disable_warnings
> -DROP TABLE IF EXISTS t1, t2;
> -DROP VIEW IF EXISTS v1;
> -DROP PROCEDURE IF EXISTS p1;
> ---enable_warnings
> -
> -CREATE TABLE t1 ( f1 int );
> -CREATE TABLE t2 ( f1 int );
> -
> -insert into t2 values (5), (7);
> -
> -CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
> -
> -create procedure p1()
> -select COUNT(*)
> -FROM v1 WHERE f1 IN
> -(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
> -
> -SET SESSION optimizer_switch = 'semijoin=on';
> -CALL p1();
> -SET SESSION optimizer_switch = 'semijoin=off';
> -CALL p1();
> -
> -drop table t1, t2;
> -drop view v1;
> -drop procedure p1;
> -
> -set SESSION optimizer_switch='default';
> -
> ---echo # End of bug#46744
> -
> -
> --echo
> --echo Bug #48073 Subquery on char columns from view crashes Mysql
> --echo
>
> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc 2010-05-25 15:24:05 +0000
> +++ b/sql/sql_select.cc 2010-06-04 15:03:03 +0000
> @@ -65,7 +65,8 @@ struct st_sargable_param;
>
> static void optimize_keyuse(JOIN *join, DYNAMIC_ARRAY *keyuse_array);
> static bool make_join_statistics(JOIN *join, TABLE_LIST *leaves,
COND *conds,
> - DYNAMIC_ARRAY *keyuse);
> + DYNAMIC_ARRAY *keyuse,
> + bool first_optimization);
> static bool optimize_semijoin_nests(JOIN *join, table_map
all_table_map);
> static bool update_ref_and_keys(THD *thd, DYNAMIC_ARRAY *keyuse,
> JOIN_TAB *join_tab,
> @@ -1513,6 +1514,8 @@ JOIN::optimize()
> if (optimized)
> DBUG_RETURN(0);
> optimized= 1;
> + bool first_optimization= select_lex->first_cond_optimization;
> + select_lex->first_cond_optimization= FALSE;
I do not see how introducing this local variable improves anything.
>
> thd_proc_info(thd, "optimizing");
>
> @@ -1555,7 +1558,7 @@ JOIN::optimize()
> }
> #endif
> SELECT_LEX *sel= thd->lex->current_select;
> - if (sel->first_cond_optimization)
> + if (first_optimization)
> {
> /*
> The following code will allocate the new items in a permanent
> @@ -1568,8 +1571,6 @@ JOIN::optimize()
> else
> thd->set_n_backup_active_arena(arena,&backup);
>
> - sel->first_cond_optimization= 0;
> -
> /* Convert all outer joins to inner joins if possible */
> conds= simplify_joins(this, join_list, conds, TRUE, FALSE);
> build_bitmap_for_nested_joins(join_list, 0);
> @@ -1712,7 +1713,8 @@ JOIN::optimize()
>
> /* Calculate how to do the join */
> thd_proc_info(thd, "statistics");
> - if (make_join_statistics(this, select_lex->leaf_tables,
conds,&keyuse) ||
> + if (make_join_statistics(this, select_lex->leaf_tables, conds,
> +&keyuse, first_optimization) ||
> thd->is_fatal_error)
> {
> DBUG_PRINT("error",("Error: make_join_statistics() failed"));
> @@ -3930,20 +3932,28 @@ bool find_eq_ref_candidate(TABLE *table,
> PRECONDITIONS
> When this function is called, the join may have several
semi-join nests
> but it is guaranteed that one semi-join nest does not contain
another.
> + For functionally dependent tables to be pulled out, key
information must
> + have been calculated.
> + For const tables to be pulled out, join->const_table_map must be
set.
>
> ACTION
> A table can be pulled out of the semi-join nest if
> - - It is a constant table, or
> - - It is accessed via eq_ref(outer_tables)
> + - It is accessed via eq_ref(outer_tables), or
> + - It contains zero or one rows (identified as const table)
>
> POSTCONDITIONS
> - * Tables that were pulled out have JOIN_TAB::emb_sj_nest == NULL
> - * Tables that were not pulled out have JOIN_TAB::emb_sj_nest
pointing
> - to semi-join nest they are in.
> - * Semi-join nests' TABLE_LIST::sj_inner_tables is updated
accordingly
> -
> - This operation is (and should be) performed at each PS execution
since
> - tables may become/cease to be constant across PS reexecutions.
> + * Tables that were pulled out are removed from the semijoin
nest they
> + belonged to and added to the parent join nest.
> + * For these tables, the used_tables and not_null_tables fields of
> + the semijoin nest they belonged to will be adjusted.
> + The semijoin nest is also marked as correlated, and
> + sj_corr_tables and sj_depends_on are adjusted if necessary.
> + * Semi-join nests' sj_inner_tables is set equal to used_tables
> +
> + For queries that can be executed multiple times, only functionally
> + dependent tables can be pulled out.
> + For queries that can be executed only once, tables that have one
> + or zero rows can also be pulled out.
>
> NOTE
> Table pullout may make uncorrelated subquery correlated.
Consider this
> @@ -3971,17 +3981,19 @@ int pull_out_semijoin_tables(JOIN *join)
>
> /* Try pulling out of the each of the semi-joins */
> while ((sj_nest= sj_list_it++))
> - {
> - /* Action #1: Mark the constant tables to be pulled out */
> - table_map pulled_tables= 0;
> -
> + {
Seems like you have made some unnecessary white-space change here.
> List_iterator<TABLE_LIST>
child_li(sj_nest->nested_join->join_list);
> TABLE_LIST *tbl;
> + table_map pulled_tables= 0;
> +
> + /*
> + Add tables that have been marked as const to the set of tables
> + to be pulled out.
> + */
> while ((tbl= child_li++))
> {
> if (tbl->table)
> {
> - tbl->table->reginfo.join_tab->emb_sj_nest= sj_nest;
> if (tbl->table->map& join->const_table_map)
> {
> pulled_tables |= tbl->table->map;
> @@ -3990,11 +4002,11 @@ int pull_out_semijoin_tables(JOIN *join)
> }
> }
> }
> -
> +
> /*
> - Action #2: Find which tables we can pull out based on
> - update_ref_and_keys() data. Note that pulling one table out
can allow
> - us to pull out some other tables too.
> + Find which tables we can pull out based on key dependency data.
> + Note that pulling one table out can allow us to pull out some
> + other tables too.
> */
> bool pulled_a_table;
> do
> @@ -4027,32 +4039,27 @@ int pull_out_semijoin_tables(JOIN *join)
>
> child_li.rewind();
> /*
> - Action #3: Move the pulled out TABLE_LIST elements to the parents.
> + Move the pulled out TABLE_LIST elements to the parents.
> */
> - table_map inner_tables= sj_nest->nested_join->used_tables&
> - ~pulled_tables;
> - /* Record the bitmap of inner tables */
> - sj_nest->sj_inner_tables= inner_tables;
> + sj_nest->nested_join->used_tables&= ~pulled_tables;
> + sj_nest->nested_join->not_null_tables&= ~pulled_tables;
> +
> + /* sj_inner_tables is a copy of nested_join->used_tables */
> + sj_nest->sj_inner_tables= sj_nest->nested_join->used_tables;
> +
> if (pulled_tables)
> {
> - List<TABLE_LIST> *upper_join_list= (sj_nest->embedding != NULL)?
> -
(&sj_nest->embedding->nested_join->join_list):
> -
(&join->select_lex->top_join_list);
> + List<TABLE_LIST> *upper_join_list= (sj_nest->embedding != NULL) ?
> +&sj_nest->embedding->nested_join->join_list :
> +&join->select_lex->top_join_list;
> Query_arena *arena, backup;
> arena= join->thd->activate_stmt_arena_if_needed(&backup);
> while ((tbl= child_li++))
> {
> if (tbl->table)
> {
> - if (inner_tables& tbl->table->map)
> + if (!(sj_nest->nested_join->used_tables&
tbl->table->map))
> {
> - /* This table is not pulled out */
> - tbl->table->reginfo.join_tab->emb_sj_nest= sj_nest;
> - }
> - else
> - {
> - /* This table has been pulled out of the semi-join nest */
> - tbl->table->reginfo.join_tab->emb_sj_nest= NULL;
> /*
> Pull the table up in the same way as simplify_joins()
does:
> update join_list and embedding pointers but keep
next[_local]
> @@ -4067,7 +4074,7 @@ int pull_out_semijoin_tables(JOIN *join)
> }
>
> /* Remove the sj-nest itself if we've removed everything from
it */
> - if (!inner_tables)
> + if (!sj_nest->nested_join->used_tables)
> {
> List_iterator<TABLE_LIST> li(*upper_join_list);
> /* Find the sj_nest in the list. */
> @@ -4188,9 +4195,37 @@ static uint get_tmp_table_rec_length(Lis
> return len;
> }
>
> -
> /**
> - Calculate the best possible join and initialize the join structure.
> + Calculate best possible join order and initialize the join structure.
> +
> + @details
> + Here is an overview of the logic of this function:
> +
> + - Initialize JOIN data structures and setup basic dependencies
between tables.
> +
> + - Update dependencies based on join information.
> +
> + - Make key descriptions (update_ref_and_keys()).
> +
> + - Pull out semijoin tables based on table dependencies.
> +
> + - Extract tables with zero or one rows as const tables.
> +
> + - Read contents of const tables, substitute columns from these
tables with
> + actual data. Also keep track of empty tables vs. one-row tables.
> +
> + - Extract tables that have become dependent after const table
extraction
> + as const tables.
This seems a bit strange. Why should one extract tables as const
tables when one has already done const table extraction?
> +
> + - Add new sargable predicates based on retrieved const values.
> +
> + - Calculate number of rows to be retrieved from each table.
> +
> + - Calculate cost of potential semijoin materializations.
> +
> + - Calculate best possible join order based on available statistics.
> +
> + - Fill in remaining information for the generated join order.
>
> @retval
> 0 ok
> @@ -4200,7 +4235,7 @@ static uint get_tmp_table_rec_length(Lis
>
> static bool
> make_join_statistics(JOIN *join, TABLE_LIST *tables_arg, COND *conds,
> - DYNAMIC_ARRAY *keyuse_array)
> + DYNAMIC_ARRAY *keyuse_array, bool first_optimization)
> {
> int error;
> TABLE *table;
> @@ -4224,11 +4259,16 @@ make_join_statistics(JOIN *join, TABLE_L
> DBUG_RETURN(1); // Eom /* purecov: inspected */
>
> join->best_ref=stat_vector;
> + join->const_table_map= 0;
>
> stat_end=stat+table_count;
> found_const_table_map= all_table_map=0;
> const_count=0;
>
> + /*
> + Initialize data structures for tables to be joined.
> + Initialize dependencies between tables.
> + */
> for (s= stat, i= 0;
> tables;
> s++, tables= tables->next_leaf, i++)
> @@ -4261,31 +4301,10 @@ make_join_statistics(JOIN *join, TABLE_L
> table->quick_condition_rows= table->file->stats.records;
>
> s->on_expr_ref=&tables->on_expr;
> - if (*s->on_expr_ref)
> - {
> - /* s is the only inner table of an outer join */
> -#ifdef WITH_PARTITION_STORAGE_ENGINE
> - if ((!table->file->stats.records ||
table->no_partitions_used)&&
> - !tables->in_outer_join_nest())
> -#else
> - if (!table->file->stats.records&&
!tables->in_outer_join_nest())
> -#endif
> - { // Empty table
> - s->dependent= 0; // Ignore LEFT JOIN
depend.
> - set_position(join,const_count++,s,(KEYUSE*) 0);
> - continue;
> - }
> - outer_join|= table->map;
> - s->embedding_map= 0;
> - for (TABLE_LIST *embedding= tables->embedding;
> - embedding;
> - embedding= embedding->embedding)
> - s->embedding_map|= embedding->nested_join->nj_map;
> - continue;
> - }
> +
> if (tables->in_outer_join_nest())
> {
> - /* s belongs to a nested join, maybe to several embedded joins */
> + /* s belongs to a nested join, maybe to several embedding joins */
> s->embedding_map= 0;
> for (TABLE_LIST *embedding= tables->embedding;
> embedding;
> @@ -4296,20 +4315,16 @@ make_join_statistics(JOIN *join, TABLE_L
> s->dependent|= embedding->dep_tables;
> outer_join|= nested_join->used_tables;
> }
> - continue;
> }
> -#ifdef WITH_PARTITION_STORAGE_ENGINE
> - const bool no_partitions_used= table->no_partitions_used;
> -#else
> - const bool no_partitions_used= FALSE;
> -#endif
> - if ((table->s->system || table->file->stats.records<= 1 ||
> - no_partitions_used)&&
> - !s->dependent&&
> - (table->file->ha_table_flags()& HA_STATS_RECORDS_IS_EXACT)&&
> - !table->fulltext_searched&& !join->no_const_tables)
> + else if (*s->on_expr_ref)
> {
> - set_position(join,const_count++,s,(KEYUSE*) 0);
> + /* s is the only inner table of an outer join */
> + outer_join|= table->map;
> + s->embedding_map= 0;
> + for (TABLE_LIST *embedding= tables->embedding;
> + embedding;
> + embedding= embedding->embedding)
> + s->embedding_map|= embedding->nested_join->nj_map;
This is very similar to what happens right above. Would it be
possible to merge the two cases?
> }
> }
> stat_vector[i]=0;
> @@ -4318,6 +4333,7 @@ make_join_statistics(JOIN *join, TABLE_L
> if (join->outer_join)
> {
> /*
> + Complete the dependency analysis.
> Build transitive closure for relation 'to be dependent on'.
> This will speed up the plan search for many cases with outer
joins,
> as well as allow us to catch illegal cross references/
> @@ -4355,8 +4371,84 @@ make_join_statistics(JOIN *join, TABLE_L
> ~outer_join, join->select_lex,&sargables))
> goto error;
>
> - /* Read tables with 0 or 1 rows (system tables) */
> - join->const_table_map= 0;
> + /*
> + Pull out semijoin tables based on dependencies. Dependencies are
valid
> + throughout the lifetime of a query, so this operation need to be
performed
> + on the first optimization only.
> + */
> + if (first_optimization)
> + {
> + if (pull_out_semijoin_tables(join))
> + DBUG_RETURN(TRUE);
> + }
Is the test on first_optimization only for efficiency reasons, or is
not pull_out_semijoin_tables() idempotent? If not, would it be
possible to make it so?
> +
> + /*
> + Pull out tables based on data volumes, needs to be done for each
execution.
Are you mixing pullout and general const table extraction/elimination
here? AFAICT pullout is usually connected to semijoin nests, but the
below code handles all tables.
> + Const tables containing exactly zero or one rows are pulled out, but
> + notice the additional constraints checked below.
> + Tables that are pulled out have their rows read before actual
execution
> + starts. Thus, they do not take part in join order optimization
process,
> + which can significantly reduce the optimization time.
> + The data read from these tables can also be regarded as "constant"
> + throughout query execution, hence the column values can be used for
> + additional constant propagation.
> + */
> + for (i= 0, s= stat; i< table_count; i++, s++)
> + {
> + TABLE *table= s->table;
> + TABLE_LIST *tables= table->pos_in_table_list;
> +
> +#ifdef WITH_PARTITION_STORAGE_ENGINE
> + const bool no_partitions_used= table->no_partitions_used;
> +#else
> + const bool no_partitions_used= FALSE;
> +#endif
> +
> + if (tables->in_outer_join_nest())
> + {
> + /*
> + Table belongs to a nested join, no candidate for const table
extraction.
> + */
> + }
> + else if (tables->embedding&&
tables->embedding->sj_on_expr&&
> + !join->thd->stmt_arena->is_conventional())
> + {
> + /*
> + Table belongs to a semi-join and this is a PS/SP.
> + No candidate for const table extraction.
> + */
> + }
I think you need to explain why it is not a candidate for const table
extraction (or maybe the term should be pull-out here?). Any reason
why you here prefer to separate conditions into several if-statements
with empty blocks over the usual style of a big logical expression
with numbered conditions (like below)?
> + else if (*s->on_expr_ref)
> + {
> + /* s is the only inner table of an outer join */
> + if (!table->file->stats.records || no_partitions_used)
> + { // Empty table
> + set_position(join,const_count++,s,(KEYUSE*) 0);
> + }
> + }
> + else
> + {
> + /*
> + Extract const tables (tables with only zero or one rows).
> + Exclude tables that
> + 1. are dependent upon other tables, or
> + 2. have no exact statistics (usually transactional tables), or
Is this a general property of transactional storage engine, or one
specific to InnoDB?
> + 3. are full-text searched, or
> + 4. the join object does not allow const table elimination.
> + */
> + if ((table->s->system ||
> + table->file->stats.records<= 1 ||
> + no_partitions_used)&&
> + !s->dependent&&
// 1
> + (table->file->ha_table_flags()&
HA_STATS_RECORDS_IS_EXACT)&& // 2
> + !table->fulltext_searched&&
// 3
> + !join->no_const_tables)
// 4
> + {
> + set_position(join,const_count++,s,(KEYUSE*) 0);
> + }
> + }
> + }
> + /* Read tables with 0 or 1 rows (const tables) */
>
> for (POSITION *p_pos=join->positions, *p_end=p_pos+const_count;
> p_pos< p_end ;
> @@ -4478,14 +4570,14 @@ make_join_statistics(JOIN *join, TABLE_L
> } while (keyuse->table == table&& keyuse->key == key);
>
> /*
> - TODO (low priority): currently we ignore the const
tables that
> - are within a semi-join nest which is within an outer
join nest.
> - The effect of this is that we don't do const
substitution for
> - such tables.
Why have you removed this comment?
> + Extract const tables with proper key dependencies.
> + Exclude tables that
> + 1. are full-text searched, or
> + 2. are part of a semi-join nest or part of a nested
outer join.
Item 2 does not match the code. in_outer_join_nest() does not generally
return true for tables in a semi-join nest, only if the semi-join nest
is embedded in an outer join nest.
> */
> if (eq_part.is_prefix(table->key_info[key].key_parts)&&
> - !table->fulltext_searched&&
> - !table->pos_in_table_list->in_outer_join_nest())
> + !table->fulltext_searched&&
// 1
> + !table->pos_in_table_list->in_outer_join_nest())
// 2
> {
> if (table->key_info[key].flags& HA_NOSAME)
> {
> @@ -4544,6 +4636,8 @@ make_join_statistics(JOIN *join, TABLE_L
>
> for (s=stat ; s< stat_end ; s++)
> {
> + TABLE_LIST *tl= s->table->pos_in_table_list;
> +
> if (s->type == JT_SYSTEM || s->type == JT_CONST)
> {
> /* Only one matching row */
> @@ -4576,9 +4670,8 @@ make_join_statistics(JOIN *join, TABLE_L
> Do range analysis if we're on the inner side of a semi-join (3).
> */
> if (!s->const_keys.is_clear_all()&& // (1)
> - (!s->table->pos_in_table_list->embedding || // (2)
> - (s->table->pos_in_table_list->embedding&& //
(3)
> - s->table->pos_in_table_list->embedding->sj_on_expr))) // (3)
> + (!tl->embedding || // (2)
> + (tl->embedding&& tl->embedding->sj_on_expr))) //
(3)
> {
> ha_rows records;
> SQL_SELECT *select;
> @@ -4593,8 +4686,11 @@ make_join_statistics(JOIN *join, TABLE_L
> s->quick=select->quick;
> s->needed_reg=select->needed_reg;
> select->quick=0;
> - if (records == 0&&
s->table->reginfo.impossible_range&&
> - (s->table->file->ha_table_flags()&
HA_STATS_RECORDS_IS_EXACT))
> + if (records == 0&&
> + s->table->reginfo.impossible_range&&
> + (s->table->file->ha_table_flags()&
HA_STATS_RECORDS_IS_EXACT)&&
> + (join->thd->stmt_arena->is_conventional() ||
> + !(tl->embedding&& tl->embedding->sj_on_expr)))
Please, add an explanation for the added conditions. (I would also
appreciate a space before each "&&".)
> {
> /*
> Impossible WHERE or ON expression
> @@ -4623,8 +4719,27 @@ make_join_statistics(JOIN *join, TABLE_L
> }
> }
>
> - if (pull_out_semijoin_tables(join))
> - DBUG_RETURN(TRUE);
> + /*
> + Pull out tables based on data volumes. Data volumes may vary
from one
> + execution to the next, so this needs to be done for every
optimization.
> + */
> + if (join->thd->stmt_arena->is_conventional())
> + {
> + if (pull_out_semijoin_tables(join))
> + DBUG_RETURN(TRUE);
> + }
Please, add a comment on the reason for the is_conventional() test.
> +
> + /*
> + Set embedding semijoin pointer. This must be done after pullout of
> + semijoin tables, but it must also be done for each execution,
hence it
> + cannot be done in pull_out_semijoin_tables().
> + */
The above comment does not make sense anymore since you now are
calling pull_out_semijoin_tables on every execution.
> + for (s= stat; s< stat_end; i++, s++)
> + {
> + TABLE_LIST *tables= s->table->pos_in_table_list;
> + if (tables->embedding&& tables->embedding->sj_on_expr)
> + s->emb_sj_nest= tables->embedding;
> + }
>
> join->join_tab=stat;
> join->map2table=stat_ref;
>
>
>
>
>
--
Øystein