List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:June 15 2010 9:33am
Subject:Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3185)
Bug#43768
View as plain text  
Hi Øystein,

thank you for the review comments!

Øystein Grøvlen wrote:
> 
> Thanks for the patch, Roy.
> 
> I have two main questions to the patch:
> 
>   - Why does it lead to plan changes in the loop of subselect_sj?
>     I thought the agreement was that behavior should only change when
>     prepare is used.

It seems to happen because previously, the outer join dependency between t2 and 
t3 was removed when t2 was empty in a right join (and t3 in a left join). In 
practice this has no effect, as table t1 is empty in any case.
> 
>   - Would it not be better to split pull_out_semijoin into two
>     different functions, one for functional dependencies and one for
>     const tables?  Seems a bit strange to call same function twice to
>     do quite different things.

It does only one thing (pullout of tables), but it does it based on two 
criteria. It will always do eq-ref-based pullout, but if the caller mandates it 
to do volume-based table pullout as well (by setting const_table_map), that may 
also trigger more eq-ref-based pullouts.
> 
> See more comments inline.
> 
> 
> On 06/ 4/10 05:04 PM, Roy Lyseng wrote:
>  > #At file:///home/rl136806/mysql/repo/mysql-work4/ based on 
> revid:olav@stripped
>  >
>  >   3185 Roy Lyseng    2010-06-04
>  >        Bug#43768: Prepared query with nested subqueries core dump on 
> second execution
>  >
>  >        The problem here is that pullout of semijoin tables is 
> attempted for
>  >        each execution, but because those tables are not "pushed back" 
> again
>  >        after each execution, the pullout fails on second attempt.
>  >
>  >        The solution chosen here is to pullout only those semijoin 
> tables that are
>  >        functionally dependent upon the outer tables. This pullout 
> operation
>  >        can be performed only once, and, unlike the current procedure, 
> is not
>  >        dependent upon the data volume of the involved tables.
>  >        We still pullout tables based on volume if this is a direct 
> execution, though.
> 
> I think more people would immediatedly understand what you were
> discussing here if you referred to "const tables" instead of "based on
> volume."

Trying to rephrase this:
We still pullout tables that are classified as const tables based on volume if 
this is a direct execution, though.
> 
>  >
>  >        The practical implication of this is as follows:
>  >
>  >         - Only outer tables containing zero or one rows will now be 
> extracted
>  >           as "const tables". Thus, such tables from a semijoin nest 
> are no
>  >           longer accessed during optimization, and some (rare) 
> optimizations
>  >           are no longer possible.
>  >
>  >         - In the majority of cases, there is no performance impact. 
> Often,
>  >           the new strategy chosen is Materialization, meaning that 
> the row
>  >           of these table is accessed only once and saved in local 
> memory.
>  >
>  >         - Const table analysis now has to be done in two phases:
>  >           1) Pullout tables based on dependencies. Both outer and 
> inner tables
>  >           may apply, and
>  >           2) Pullout outer tables based on data volume.
>  >
>  >        In order to implement the latter point above, and assure that 
> pullout
>  >        of semijoin tables occurs only once, make_join_statistics() 
> has been
>  >        restructured slightly. The conditional logic within the 
> function has also
>  >        been enhanced for better readability.
>  >
>  >        The logic of make_join_statistics() is now as follows:
>  >
>  >        1. Initialize JOIN data structures
>  >           (major part of first loop in existing implementation).
>  >
>  >        2. Update dependencies based on join information
>  >           (the Warshall algorithm).
>  >
>  >        3. Make key descriptions (update_ref_and_keys()).
>  >
>  >        4. Pull out semijoin tables, called only once.
>  >
>  >        5. Extract tables with zero or one rows as const tables
>  >           (consider outer tables only, no semijoin tables).
>  >
>  >        6. Extract dependent tables as const tables.
>  >           (consider outer tables only, no semijoin tables).
>  >
>  >        7. The remaining parts of the function.
>  >
>  >        mysql-test/r/select_found.result
>  >          Possible keys changed from "PRIMARY,kid" to "kid".
>  >
> 
> Is this a good change?  Is it a necessary change?

It happened because the tests for table extraction are now done in a slightly 
different order. Because "const row not found" is reported, key analysis stops 
earlier than before, but since the table will not be accessed during execution 
anyhow, the possible key set does not matter. It could be a mistaken candidate 
key because of the test "t1.id IS NULL". If I replace with "t1.kid IS NULL", the 
candidate key PRIMARY is no longer reported.
> 
> 
>  >        mysql-test/r/subselect4.result
>  >          Added test case for bug#43768
>  >
>  >        mysql-test/r/subselect_sj.result
>  >          A number of plan changes because of extensive testing of 
> semi-join
>  >          tables with 0, 1 and 2 rows.
>  >
>  >        mysql-test/r/subselect_sj_jcl6.result
>  >          Ditto.
>  >
>  >        mysql-test/t/subselect4.test
>  >          Added test case for bug#43768
> 
> Since it it related to semijoin, I would prefer if you added your test
> case to subselect_sj instead.  Will also less likely cause conflict
> with my test changes (subselect4 will disappear, subselect_sj will be
> renamed)

OK.
> 
>  >
>  >        mysql-test/t/subselect_sj.test
>  >          bug#46744 now becomes a duplicate of bug#50489, and test 
> case is moved.
>  >
>  >        mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
>  >          Test for bug#46744 moved here.
> 
> Does this mean that Bug#46744 is no longer fixed?

It becomes a duplicate of bug#50489.
> 
>  >
>  >        sql/sql_select.cc
>  >          pull_out_semijoin_tables()
>  >            join_tab->emb_sj_nest is no longer updated. Comments updated.
>  >          make_join_statistics()
>  >            Removed const table pullout from first loop.
>  >            Simplified testing based on inner/outer/semi-join properties.
>  >            Calls pull_out_semijoin_tables() just after dependency 
> analysis.
>  >            Then, added loop that performs pullout based on data volume
>  >            but excludes semijoined tables in prepared statements.
>  >            Second call of pull_out_semijoin_tables() is needed after 
> this.
>  >            Simplified testing based on inner/outer/semi-join properties.
>  >            Assured that semijoin tables are no longer pulled out 
> based on volume.
>  >
> 
> The last sentence is no longer true.

Rephrased.
> 
> I also suggest you add a sentence about that you have changed where
> emb_sj_nest is set.

Done.
> 
>  >      added:
>  >        mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
>  >      modified:
>  >        mysql-test/r/select_found.result
>  >        mysql-test/r/subselect4.result
>  >        mysql-test/r/subselect_sj.result
>  >        mysql-test/r/subselect_sj_jcl6.result
>  >        mysql-test/t/subselect4.test
>  >        mysql-test/t/subselect_sj.test
>  >        sql/sql_select.cc
>  > === modified file 'mysql-test/r/select_found.result'
>  > --- a/mysql-test/r/select_found.result    2007-08-02 19:45:56 +0000
>  > +++ b/mysql-test/r/select_found.result    2010-06-04 15:03:03 +0000
>  > @@ -83,7 +83,7 @@ UNIQUE KEY e_n (email,name)
>  >   );
>  >   EXPLAIN SELECT SQL_CALC_FOUND_ROWS DISTINCT email FROM t2 LEFT JOIN 
> t1  ON kid = t2.id WHERE t1.id IS NULL LIMIT 10;
>  >   id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > -1    SIMPLE    t1    system    PRIMARY,kid    NULL    NULL    
> NULL    0    const row not found
>  > +1    SIMPLE    t1    system    kid    NULL    NULL    NULL    0    
> const row not found
>  >   1    SIMPLE    t2    index    NULL    e_n    104    NULL    10   
>  >   SELECT SQL_CALC_FOUND_ROWS DISTINCT email FROM t2 LEFT JOIN t1  ON 
> kid = t2.id WHERE t1.id IS NULL LIMIT 10;
>  >   email
>  >
>  > === modified file 'mysql-test/r/subselect4.result'
>  > --- a/mysql-test/r/subselect4.result    2010-05-19 14:44:18 +0000
>  > +++ b/mysql-test/r/subselect4.result    2010-06-04 15:03:03 +0000
>  > @@ -500,6 +500,100 @@ SUBQUERY1_t1.col_varchar) ) )
>  >   pk
>  >   drop table t1;
>  >   #
>  > +# Bug#43768 Prepared query with nested subqueries core dump on 
> second execution
>  > +#
>  > +CREATE TABLE t1 (
>  > +id INT PRIMARY KEY,
>  > +partner_id VARCHAR(35)
>  > +);
>  > +INSERT INTO t1 VALUES
>  > +(1, 'partner1'), (2, 'partner2'),
>  > +(3, 'partner3'), (4, 'partner4');
>  > +CREATE TABLE t2 (
>  > +id INT NOT NULL,
>  > +t1_line_id INT,
>  > +article_id VARCHAR(20),
>  > +PRIMARY KEY(id, t1_line_id)
>  > +);
>  > +INSERT INTO t2 VALUES
>  > +(1, 1, 'sup'), (2, 1, 'sup'),
>  > +(2, 2, 'sup'), (2, 3, 'sup'),
>  > +(2, 4, 'imp'), (3, 1, 'sup'),
>  > +(4, 1, 'sup');
>  > +CREATE TABLE t3 (
>  > +user_id VARCHAR(50),
>  > +article_id VARCHAR(20) NOT NULL,
>  > +PRIMARY KEY(user_id)
>  > +);
>  > +INSERT INTO t3 VALUES('nicke', 'imp');
>  > +EXPLAIN
>  > +SELECT t1.partner_id
>  > +FROM t1
>  > +WHERE t1.id IN (
>  > +SELECT t2.id
>  > +FROM t2
>  > +WHERE article_id IN (
>  > +SELECT article_id FROM t3
>  > +WHERE user_id = 'nicke'
>  > +    )
>  > +);
>  > +id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > +1    PRIMARY    t3    system    PRIMARY    NULL    NULL    NULL    1   
>  > +1    PRIMARY    t1    ALL    PRIMARY    NULL    NULL    NULL    4   
>  > +1    PRIMARY    t2    ref    PRIMARY    PRIMARY    4    
> test.t1.id    1    Using where; FirstMatch(t1)
>  > +SELECT t1.partner_id
>  > +FROM t1
>  > +WHERE t1.id IN (
>  > +SELECT t2.id
>  > +FROM t2
>  > +WHERE article_id IN (
>  > +SELECT article_id FROM t3
>  > +WHERE user_id = 'nicke'
>  > +    )
>  > +);
>  > +partner_id
>  > +partner2
>  > +PREPARE stmt FROM
>  > +'EXPLAIN SELECT t1.partner_id
>  > +FROM t1
>  > +WHERE t1.id IN (
>  > +    SELECT t2.id
>  > +    FROM t2
>  > +    WHERE article_id IN (
>  > +      SELECT article_id FROM t3
>  > +      WHERE user_id = \'nicke\'
>  > +    )
>  > +  )';
>  > +EXECUTE stmt;
>  > +id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > +1    PRIMARY    t3    system    PRIMARY    NULL    NULL    NULL    1   
>  > +1    PRIMARY    t1    ALL    PRIMARY    NULL    NULL    NULL    4   
>  > +1    PRIMARY    t2    ref    PRIMARY    PRIMARY    4    
> test.t1.id    1    Using where; FirstMatch(t1)
>  > +EXECUTE stmt;
>  > +id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > +1    SIMPLE    t3    system    PRIMARY    NULL    NULL    NULL    1   
>  > +1    SIMPLE    t1    ALL    PRIMARY    NULL    NULL    NULL    4   
>  > +1    SIMPLE    t2    ref    PRIMARY    PRIMARY    4    test.t1.id    
> 1    Using where; FirstMatch(t1)
>  > +PREPARE stmt FROM
>  > +'SELECT t1.partner_id
>  > +FROM t1
>  > +WHERE t1.id IN (
>  > +    SELECT t2.id
>  > +    FROM t2
>  > +    WHERE article_id IN (
>  > +      SELECT article_id FROM t3
>  > +      WHERE user_id = \'nicke\'
>  > +    )
>  > +  )';
>  > +EXECUTE stmt;
>  > +partner_id
>  > +partner2
>  > +EXECUTE stmt;
>  > +partner_id
>  > +partner2
>  > +DROP TABLE t1,t2,t3;
>  > +# End of Bug#43768
>  > +#
>  >   # BUG#53298 "wrong result with semijoin (no semijoin strategy chosen)"
>  >   #
>  >   create table t1 (uid int, fid int);
>  >
>  > === modified file 'mysql-test/r/subselect_sj.result'
>  > --- a/mysql-test/r/subselect_sj.result    2010-05-18 11:01:08 +0000
>  > +++ b/mysql-test/r/subselect_sj.result    2010-06-04 15:03:03 +0000
>  > @@ -333,7 +333,9 @@ i
>  >   EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > -1    PRIMARY    NULL    NULL    NULL    NULL    NULL    NULL    
> NULL    Impossible WHERE noticed after reading const tables
>  > +1    PRIMARY    t1    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t2    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t3    ALL    NULL    NULL    NULL    NULL    2    
> Using where; FirstMatch(t2)
>  >   SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   i
>  > @@ -363,7 +365,9 @@ i
>  >   EXPLAIN SELECT * FROM t1 WHERE (11) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > -1    PRIMARY    NULL    NULL    NULL    NULL    NULL    NULL    
> NULL    Impossible WHERE noticed after reading const tables
>  > +1    PRIMARY    t1    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t2    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t3    ALL    NULL    NULL    NULL    NULL    2    
> Using where; FirstMatch(t2)
>  >   SELECT * FROM t1 WHERE (11) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   i
>  > @@ -1320,14 +1324,18 @@ SELECT (SELECT COUNT(*) from t1),
>  >   EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
>  >   id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > -1    PRIMARY    NULL    NULL    NULL    NULL    NULL    NULL    
> NULL    Impossible WHERE noticed after reading const tables
>  > +1    PRIMARY    t2    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t3    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t1    ALL    NULL    NULL    NULL    NULL    2    
> Using where
>  >   SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
>  >   i
>  >   EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > -1    PRIMARY    NULL    NULL    NULL    NULL    NULL    NULL    
> NULL    Impossible WHERE noticed after reading const tables
>  > +1    PRIMARY    t2    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t3    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t1    ALL    NULL    NULL    NULL    NULL    2    
> Using where
>  >   SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   i
>  > @@ -2363,34 +2371,6 @@ int_key
>  >   7
>  >   DROP TABLE t0, t1, t2;
>  >   # End of bug#46550
>  > -#
>  > -# Bug #46744 Crash in optimize_semijoin_nests on empty view
>  > -# with limit and procedure.
>  > -#
>  > -DROP TABLE IF EXISTS t1, t2;
>  > -DROP VIEW IF EXISTS v1;
>  > -DROP PROCEDURE IF EXISTS p1;
>  > -CREATE TABLE t1 ( f1 int );
>  > -CREATE TABLE t2 ( f1 int );
>  > -insert into t2 values (5), (7);
>  > -CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
>  > -create procedure p1()
>  > -select COUNT(*)
>  > -FROM v1 WHERE f1 IN
>  > -(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
>  > -SET SESSION optimizer_switch = 'semijoin=on';
>  > -CALL p1();
>  > -COUNT(*)
>  > -0
>  > -SET SESSION optimizer_switch = 'semijoin=off';
>  > -CALL p1();
>  > -COUNT(*)
>  > -0
>  > -drop table t1, t2;
>  > -drop view v1;
>  > -drop procedure p1;
>  > -set SESSION optimizer_switch='default';
>  > -# End of bug#46744
>  >
>  >   Bug #48073 Subquery on char columns from view crashes Mysql
>  >
>  >
>  > === modified file 'mysql-test/r/subselect_sj_jcl6.result'
>  > --- a/mysql-test/r/subselect_sj_jcl6.result    2010-05-18 11:01:08 +0000
>  > +++ b/mysql-test/r/subselect_sj_jcl6.result    2010-06-04 15:03:03 +0000
>  > @@ -337,7 +337,9 @@ i
>  >   EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > -1    PRIMARY    NULL    NULL    NULL    NULL    NULL    NULL    
> NULL    Impossible WHERE noticed after reading const tables
>  > +1    PRIMARY    t1    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t2    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t3    ALL    NULL    NULL    NULL    NULL    2    
> Using where; FirstMatch(t2)
>  >   SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   i
>  > @@ -367,7 +369,9 @@ i
>  >   EXPLAIN SELECT * FROM t1 WHERE (11) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > -1    PRIMARY    NULL    NULL    NULL    NULL    NULL    NULL    
> NULL    Impossible WHERE noticed after reading const tables
>  > +1    PRIMARY    t1    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t2    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t3    ALL    NULL    NULL    NULL    NULL    2    
> Using where; FirstMatch(t2)
>  >   SELECT * FROM t1 WHERE (11) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   i
>  > @@ -1324,14 +1328,18 @@ SELECT (SELECT COUNT(*) from t1),
>  >   EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
>  >   id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > -1    PRIMARY    NULL    NULL    NULL    NULL    NULL    NULL    
> NULL    Impossible WHERE noticed after reading const tables
>  > +1    PRIMARY    t2    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t3    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t1    ALL    NULL    NULL    NULL    NULL    2    
> Using where
>  >   SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
>  >   i
>  >   EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   id    select_type    table    type    possible_keys    key    
> key_len    ref    rows    Extra
>  > -1    PRIMARY    NULL    NULL    NULL    NULL    NULL    NULL    
> NULL    Impossible WHERE noticed after reading const tables
>  > +1    PRIMARY    t2    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t3    system    NULL    NULL    NULL    NULL    0    
> const row not found
>  > +1    PRIMARY    t1    ALL    NULL    NULL    NULL    NULL    2    
> Using where
>  >   SELECT * FROM t1 WHERE (t1.i) IN
>  >   (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>  >   i
>  > @@ -2367,34 +2375,6 @@ int_key
>  >   7
>  >   DROP TABLE t0, t1, t2;
>  >   # End of bug#46550
>  > -#
>  > -# Bug #46744 Crash in optimize_semijoin_nests on empty view
>  > -# with limit and procedure.
>  > -#
>  > -DROP TABLE IF EXISTS t1, t2;
>  > -DROP VIEW IF EXISTS v1;
>  > -DROP PROCEDURE IF EXISTS p1;
>  > -CREATE TABLE t1 ( f1 int );
>  > -CREATE TABLE t2 ( f1 int );
>  > -insert into t2 values (5), (7);
>  > -CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
>  > -create procedure p1()
>  > -select COUNT(*)
>  > -FROM v1 WHERE f1 IN
>  > -(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
>  > -SET SESSION optimizer_switch = 'semijoin=on';
>  > -CALL p1();
>  > -COUNT(*)
>  > -0
>  > -SET SESSION optimizer_switch = 'semijoin=off';
>  > -CALL p1();
>  > -COUNT(*)
>  > -0
>  > -drop table t1, t2;
>  > -drop view v1;
>  > -drop procedure p1;
>  > -set SESSION optimizer_switch='default';
>  > -# End of bug#46744
>  >
>  >   Bug #48073 Subquery on char columns from view crashes Mysql
>  >
>  >
>  > === added file 'mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test'
>  > --- a/mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test 
> 1970-01-01 00:00:00 +0000
>  > +++ b/mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test 
> 2010-06-04 15:03:03 +0000
>  > @@ -0,0 +1,35 @@
>  > +--echo #
>  > +--echo # Bug #46744 Crash in optimize_semijoin_nests on empty view
>  > +--echo # with limit and procedure.
>  > +--echo #
>  > +
>  > +--disable_warnings
>  > +DROP TABLE IF EXISTS t1, t2;
>  > +DROP VIEW IF EXISTS v1;
>  > +DROP PROCEDURE IF EXISTS p1;
>  > +--enable_warnings
>  > +
>  > +CREATE TABLE t1 ( f1 int );
>  > +CREATE TABLE t2 ( f1 int );
>  > +
>  > +insert into t2 values (5), (7);
>  > +
>  > +CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
>  > +
>  > +create procedure p1()
>  > +select COUNT(*)
>  > +FROM v1 WHERE f1 IN
>  > +(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
>  > +
>  > +SET SESSION optimizer_switch = 'semijoin=on';
>  > +CALL p1();
>  > +SET SESSION optimizer_switch = 'semijoin=off';
>  > +CALL p1();
>  > +
>  > +drop table t1, t2;
>  > +drop view v1;
>  > +drop procedure p1;
>  > +
>  > +set SESSION optimizer_switch='default';
>  > +
>  > +--echo # End of bug#46744
>  >
>  > === modified file 'mysql-test/t/subselect4.test'
>  > --- a/mysql-test/t/subselect4.test    2010-05-19 14:44:18 +0000
>  > +++ b/mysql-test/t/subselect4.test    2010-06-04 15:03:03 +0000
>  > @@ -467,6 +467,95 @@ WHERE ( 1, 2 ) IN ( SELECT SUBQUERY1_t1.
>  >   drop table t1;
>  >
>  >   --echo #
>  > +--echo # Bug#43768 Prepared query with nested subqueries core dump 
> on second execution
>  > +--echo #
>  > +
>  > +CREATE TABLE t1 (
>  > +  id INT PRIMARY KEY,
>  > +  partner_id VARCHAR(35)
>  > +);
>  > +
>  > +INSERT INTO t1 VALUES
>  > + (1, 'partner1'), (2, 'partner2'),
>  > + (3, 'partner3'), (4, 'partner4');
>  > +
>  > +CREATE TABLE t2 (
>  > +  id INT NOT NULL,
>  > +  t1_line_id INT,
>  > +  article_id VARCHAR(20),
>  > +  PRIMARY KEY(id, t1_line_id)
>  > +);
>  > +
>  > +INSERT INTO t2 VALUES
>  > + (1, 1, 'sup'), (2, 1, 'sup'),
>  > + (2, 2, 'sup'), (2, 3, 'sup'),
>  > + (2, 4, 'imp'), (3, 1, 'sup'),
>  > + (4, 1, 'sup');
>  > +
>  > +CREATE TABLE t3 (
>  > +  user_id VARCHAR(50),
>  > +  article_id VARCHAR(20) NOT NULL,
>  > +  PRIMARY KEY(user_id)
>  > +);
>  > +
>  > +INSERT INTO t3 VALUES('nicke', 'imp');
>  > +
>  > +EXPLAIN
>  > +SELECT t1.partner_id
>  > +FROM t1
>  > +WHERE t1.id IN (
>  > +    SELECT t2.id
>  > +    FROM t2
>  > +    WHERE article_id IN (
>  > +      SELECT article_id FROM t3
>  > +      WHERE user_id = 'nicke'
>  > +    )
>  > +  );
>  > +
>  > +SELECT t1.partner_id
>  > +FROM t1
>  > +WHERE t1.id IN (
>  > +    SELECT t2.id
>  > +    FROM t2
>  > +    WHERE article_id IN (
>  > +      SELECT article_id FROM t3
>  > +      WHERE user_id = 'nicke'
>  > +    )
>  > +  );
>  > +
>  > +PREPARE stmt FROM
>  > +'EXPLAIN SELECT t1.partner_id
>  > +FROM t1
>  > +WHERE t1.id IN (
>  > +    SELECT t2.id
>  > +    FROM t2
>  > +    WHERE article_id IN (
>  > +      SELECT article_id FROM t3
>  > +      WHERE user_id = \'nicke\'
>  > +    )
>  > +  )';
>  > +EXECUTE stmt;
>  > +EXECUTE stmt;
>  > +
>  > +PREPARE stmt FROM
>  > +'SELECT t1.partner_id
>  > +FROM t1
>  > +WHERE t1.id IN (
>  > +    SELECT t2.id
>  > +    FROM t2
>  > +    WHERE article_id IN (
>  > +      SELECT article_id FROM t3
>  > +      WHERE user_id = \'nicke\'
>  > +    )
>  > +  )';
>  > +EXECUTE stmt;
>  > +EXECUTE stmt;
>  > +
>  > +DROP TABLE t1,t2,t3;
>  > +
>  > +--echo # End of Bug#43768
>  > +
>  > +--echo #
>  >   --echo # BUG#53298 "wrong result with semijoin (no semijoin 
> strategy chosen)"
>  >   --echo #
>  >
>  >
>  > === modified file 'mysql-test/t/subselect_sj.test'
>  > --- a/mysql-test/t/subselect_sj.test    2010-05-18 11:01:08 +0000
>  > +++ b/mysql-test/t/subselect_sj.test    2010-06-04 15:03:03 +0000
>  > @@ -645,43 +645,6 @@ DROP TABLE t0, t1, t2;
>  >
>  >   --echo # End of bug#46550
>  >
>  > ---echo #
>  > ---echo # Bug #46744 Crash in optimize_semijoin_nests on empty view
>  > ---echo # with limit and procedure.
>  > ---echo #
>  > -
>  > ---disable_warnings
>  > -DROP TABLE IF EXISTS t1, t2;
>  > -DROP VIEW IF EXISTS v1;
>  > -DROP PROCEDURE IF EXISTS p1;
>  > ---enable_warnings
>  > -
>  > -CREATE TABLE t1 ( f1 int );
>  > -CREATE TABLE t2 ( f1 int );
>  > -
>  > -insert into t2 values (5), (7);
>  > -
>  > -CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
>  > -
>  > -create procedure p1()
>  > -select COUNT(*)
>  > -FROM v1 WHERE f1 IN
>  > -(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
>  > -
>  > -SET SESSION optimizer_switch = 'semijoin=on';
>  > -CALL p1();
>  > -SET SESSION optimizer_switch = 'semijoin=off';
>  > -CALL p1();
>  > -
>  > -drop table t1, t2;
>  > -drop view v1;
>  > -drop procedure p1;
>  > -
>  > -set SESSION optimizer_switch='default';
>  > -
>  > ---echo # End of bug#46744
>  > -
>  > -
>  >   --echo
>  >   --echo Bug #48073 Subquery on char columns from view crashes Mysql
>  >   --echo
>  >
>  > === modified file 'sql/sql_select.cc'
>  > --- a/sql/sql_select.cc    2010-05-25 15:24:05 +0000
>  > +++ b/sql/sql_select.cc    2010-06-04 15:03:03 +0000
>  > @@ -65,7 +65,8 @@ struct st_sargable_param;
>  >
>  >   static void optimize_keyuse(JOIN *join, DYNAMIC_ARRAY *keyuse_array);
>  >   static bool make_join_statistics(JOIN *join, TABLE_LIST *leaves, 
> COND *conds,
>  > -                 DYNAMIC_ARRAY *keyuse);
>  > +                 DYNAMIC_ARRAY *keyuse,
>  > +                                 bool first_optimization);
>  >   static bool optimize_semijoin_nests(JOIN *join, table_map 
> all_table_map);
>  >   static bool update_ref_and_keys(THD *thd, DYNAMIC_ARRAY *keyuse,
>  >                                   JOIN_TAB *join_tab,
>  > @@ -1513,6 +1514,8 @@ JOIN::optimize()
>  >     if (optimized)
>  >       DBUG_RETURN(0);
>  >     optimized= 1;
>  > +  bool first_optimization= select_lex->first_cond_optimization;
>  > +  select_lex->first_cond_optimization= FALSE;
> 
> I do not see how introducing this local variable improves anything.

It means that I can keep it throughout the execution of this function,
and clear select_lex->first_cond_optimization immediately, and not worry
about how the function is exited.

"const" added in front of bool first_optimization.
> 
>  >
>  >     thd_proc_info(thd, "optimizing");
>  >
>  > @@ -1555,7 +1558,7 @@ JOIN::optimize()
>  >     }
>  >   #endif
>  >     SELECT_LEX *sel= thd->lex->current_select;
>  > -  if (sel->first_cond_optimization)
>  > +  if (first_optimization)
>  >     {
>  >       /*
>  >         The following code will allocate the new items in a permanent
>  > @@ -1568,8 +1571,6 @@ JOIN::optimize()
>  >       else
>  >         thd->set_n_backup_active_arena(arena,&backup);
>  >
>  > -    sel->first_cond_optimization= 0;
>  > -
>  >       /* Convert all outer joins to inner joins if possible */
>  >       conds= simplify_joins(this, join_list, conds, TRUE, FALSE);
>  >       build_bitmap_for_nested_joins(join_list, 0);
>  > @@ -1712,7 +1713,8 @@ JOIN::optimize()
>  >
>  >     /* Calculate how to do the join */
>  >     thd_proc_info(thd, "statistics");
>  > -  if (make_join_statistics(this, select_lex->leaf_tables, 
> conds,&keyuse) ||
>  > +  if (make_join_statistics(this, select_lex->leaf_tables, conds,
>  > +&keyuse, first_optimization) ||
>  >         thd->is_fatal_error)
>  >     {
>  >       DBUG_PRINT("error",("Error: make_join_statistics() failed"));
>  > @@ -3930,20 +3932,28 @@ bool find_eq_ref_candidate(TABLE *table,
>  >       PRECONDITIONS
>  >       When this function is called, the join may have several 
> semi-join nests
>  >       but it is guaranteed that one semi-join nest does not contain 
> another.
>  > +    For functionally dependent tables to be pulled out, key 
> information must
>  > +    have been calculated.
>  > +    For const tables to be pulled out, join->const_table_map must be 
> set.
>  >
>  >       ACTION
>  >       A table can be pulled out of the semi-join nest if
>  > -     - It is a constant table, or
>  > -     - It is accessed via eq_ref(outer_tables)
>  > +     - It is accessed via eq_ref(outer_tables), or
>  > +     - It contains zero or one rows (identified as const table)
>  >
>  >       POSTCONDITIONS
>  > -     * Tables that were pulled out have JOIN_TAB::emb_sj_nest == NULL
>  > -     * Tables that were not pulled out have JOIN_TAB::emb_sj_nest 
> pointing
>  > -       to semi-join nest they are in.
>  > -     * Semi-join nests' TABLE_LIST::sj_inner_tables is updated 
> accordingly
>  > -
>  > -    This operation is (and should be) performed at each PS execution 
> since
>  > -    tables may become/cease to be constant across PS reexecutions.
>  > +     * Tables that were pulled out are removed from the semijoin 
> nest they
>  > +       belonged to and added to the parent join nest.
>  > +     * For these tables, the used_tables and not_null_tables fields of
>  > +       the semijoin nest they belonged to will be adjusted.
>  > +       The semijoin nest is also marked as correlated, and
>  > +       sj_corr_tables and sj_depends_on are adjusted if necessary.
>  > +     * Semi-join nests' sj_inner_tables is set equal to used_tables
>  > +
>  > +    For queries that can be executed multiple times, only functionally
>  > +    dependent tables can be pulled out.
>  > +    For queries that can be executed only once, tables that have one
>  > +    or zero rows can also be pulled out.
>  >
>  >     NOTE
>  >       Table pullout may make uncorrelated subquery correlated. 
> Consider this
>  > @@ -3971,17 +3981,19 @@ int pull_out_semijoin_tables(JOIN *join)
>  >
>  >     /* Try pulling out of the each of the semi-joins */
>  >     while ((sj_nest= sj_list_it++))
>  > -  {
>  > -    /* Action #1: Mark the constant tables to be pulled out */
>  > -    table_map pulled_tables= 0;
>  > -
>  > +  {
> 
> Seems like you have made some unnecessary white-space change here.

Fixed. But while changing the comment to be aligned with the new patch, I also 
moved it to a more "logical" place (ie after the general declarations).
> 
>  >       List_iterator<TABLE_LIST> 
> child_li(sj_nest->nested_join->join_list);
>  >       TABLE_LIST *tbl;
>  > +    table_map pulled_tables= 0;
>  > +
>  > +    /*
>  > +      Add tables that have been marked as const to the set of tables
>  > +      to be pulled out.
>  > +    */
>  >       while ((tbl= child_li++))
>  >       {
>  >         if (tbl->table)
>  >         {
>  > -        tbl->table->reginfo.join_tab->emb_sj_nest= sj_nest;
>  >           if (tbl->table->map&  join->const_table_map)
>  >           {
>  >             pulled_tables |= tbl->table->map;
>  > @@ -3990,11 +4002,11 @@ int pull_out_semijoin_tables(JOIN *join)
>  >           }
>  >         }
>  >       }
>  > -
>  > +
>  >       /*
>  > -      Action #2: Find which tables we can pull out based on
>  > -      update_ref_and_keys() data. Note that pulling one table out 
> can allow
>  > -      us to pull out some other tables too.
>  > +      Find which tables we can pull out based on key dependency data.
>  > +      Note that pulling one table out can allow us to pull out some
>  > +      other tables too.
>  >       */
>  >       bool pulled_a_table;
>  >       do
>  > @@ -4027,32 +4039,27 @@ int pull_out_semijoin_tables(JOIN *join)
>  >
>  >       child_li.rewind();
>  >       /*
>  > -      Action #3: Move the pulled out TABLE_LIST elements to the 
> parents.
>  > +      Move the pulled out TABLE_LIST elements to the parents.
>  >       */
>  > -    table_map inner_tables= sj_nest->nested_join->used_tables&
>  > -                            ~pulled_tables;
>  > -    /* Record the bitmap of inner tables */
>  > -    sj_nest->sj_inner_tables= inner_tables;
>  > +    sj_nest->nested_join->used_tables&= ~pulled_tables;
>  > +    sj_nest->nested_join->not_null_tables&= ~pulled_tables;
>  > +
>  > +    /* sj_inner_tables is a copy of nested_join->used_tables */
>  > +    sj_nest->sj_inner_tables= sj_nest->nested_join->used_tables;
>  > +
>  >       if (pulled_tables)
>  >       {
>  > -      List<TABLE_LIST>  *upper_join_list= (sj_nest->embedding !=
> NULL)?
>  > - (&sj_nest->embedding->nested_join->join_list):
>  > - (&join->select_lex->top_join_list);
>  > +      List<TABLE_LIST>  *upper_join_list= (sj_nest->embedding != 
> NULL) ?
>  > +&sj_nest->embedding->nested_join->join_list :
>  > +&join->select_lex->top_join_list;
>  >         Query_arena *arena, backup;
>  >         arena= join->thd->activate_stmt_arena_if_needed(&backup);
>  >         while ((tbl= child_li++))
>  >         {
>  >           if (tbl->table)
>  >           {
>  > -          if (inner_tables&  tbl->table->map)
>  > +          if (!(sj_nest->nested_join->used_tables& 
> tbl->table->map))
>  >             {
>  > -            /* This table is not pulled out */
>  > -            tbl->table->reginfo.join_tab->emb_sj_nest= sj_nest;
>  > -          }
>  > -          else
>  > -          {
>  > -            /* This table has been pulled out of the semi-join nest */
>  > -            tbl->table->reginfo.join_tab->emb_sj_nest= NULL;
>  >               /*
>  >                 Pull the table up in the same way as simplify_joins() 
> does:
>  >                 update join_list and embedding pointers but keep 
> next[_local]
>  > @@ -4067,7 +4074,7 @@ int pull_out_semijoin_tables(JOIN *join)
>  >         }
>  >
>  >         /* Remove the sj-nest itself if we've removed everything from 
> it */
>  > -      if (!inner_tables)
>  > +      if (!sj_nest->nested_join->used_tables)
>  >         {
>  >           List_iterator<TABLE_LIST>  li(*upper_join_list);
>  >           /* Find the sj_nest in the list. */
>  > @@ -4188,9 +4195,37 @@ static uint get_tmp_table_rec_length(Lis
>  >     return len;
>  >   }
>  >
>  > -
>  >   /**
>  > -  Calculate the best possible join and initialize the join structure.
>  > +  Calculate best possible join order and initialize the join structure.
>  > +
>  > +  @details
>  > +  Here is an overview of the logic of this function:
>  > +
>  > +  - Initialize JOIN data structures and setup basic dependencies 
> between tables.
>  > +
>  > +  - Update dependencies based on join information.
>  > +
>  > +  - Make key descriptions (update_ref_and_keys()).
>  > +
>  > +  - Pull out semijoin tables based on table dependencies.
>  > +
>  > +  - Extract tables with zero or one rows as const tables.
>  > +
>  > +  - Read contents of const tables, substitute columns from these 
> tables with
>  > +    actual data. Also keep track of empty tables vs. one-row tables.
>  > +
>  > +  - Extract tables that have become dependent after const table 
> extraction
>  > +    as const tables.
> 
> This seems a bit strange.  Why should one extract tables as const
> tables when one has already done const table extraction?
> 
>  > +
>  > +  - Add new sargable predicates based on retrieved const values.
>  > +
>  > +  - Calculate number of rows to be retrieved from each table.
>  > +
>  > +  - Calculate cost of potential semijoin materializations.
>  > +
>  > +  - Calculate best possible join order based on available statistics.
>  > +
>  > +  - Fill in remaining information for the generated join order.
>  >
>  >     @retval
>  >       0    ok
>  > @@ -4200,7 +4235,7 @@ static uint get_tmp_table_rec_length(Lis
>  >
>  >   static bool
>  >   make_join_statistics(JOIN *join, TABLE_LIST *tables_arg, COND *conds,
>  > -             DYNAMIC_ARRAY *keyuse_array)
>  > +             DYNAMIC_ARRAY *keyuse_array, bool first_optimization)
>  >   {
>  >     int error;
>  >     TABLE *table;
>  > @@ -4224,11 +4259,16 @@ make_join_statistics(JOIN *join, TABLE_L
>  >       DBUG_RETURN(1);                // Eom /* purecov: inspected */
>  >
>  >     join->best_ref=stat_vector;
>  > +  join->const_table_map= 0;
>  >
>  >     stat_end=stat+table_count;
>  >     found_const_table_map= all_table_map=0;
>  >     const_count=0;
>  >
>  > +  /*
>  > +    Initialize data structures for tables to be joined.
>  > +    Initialize dependencies between tables.
>  > +  */
>  >     for (s= stat, i= 0;
>  >          tables;
>  >          s++, tables= tables->next_leaf, i++)
>  > @@ -4261,31 +4301,10 @@ make_join_statistics(JOIN *join, TABLE_L
>  >       table->quick_condition_rows= table->file->stats.records;
>  >
>  >       s->on_expr_ref=&tables->on_expr;
>  > -    if (*s->on_expr_ref)
>  > -    {
>  > -      /* s is the only inner table of an outer join */
>  > -#ifdef WITH_PARTITION_STORAGE_ENGINE
>  > -      if ((!table->file->stats.records ||
> table->no_partitions_used)&&
>  > -          !tables->in_outer_join_nest())
>  > -#else
>  > -      if (!table->file->stats.records&& 
> !tables->in_outer_join_nest())
>  > -#endif
>  > -      {                        // Empty table
>  > -        s->dependent= 0;                        // Ignore LEFT JOIN 
> depend.
>  > -    set_position(join,const_count++,s,(KEYUSE*) 0);
>  > -    continue;
>  > -      }
>  > -      outer_join|= table->map;
>  > -      s->embedding_map= 0;
>  > -      for (TABLE_LIST *embedding= tables->embedding;
>  > -           embedding;
>  > -           embedding= embedding->embedding)
>  > -        s->embedding_map|= embedding->nested_join->nj_map;
>  > -      continue;
>  > -    }
>  > +
>  >       if (tables->in_outer_join_nest())
>  >       {
>  > -      /* s belongs to a nested join, maybe to several embedded joins */
>  > +      /* s belongs to a nested join, maybe to several embedding 
> joins */
>  >         s->embedding_map= 0;
>  >         for (TABLE_LIST *embedding= tables->embedding;
>  >              embedding;
>  > @@ -4296,20 +4315,16 @@ make_join_statistics(JOIN *join, TABLE_L
>  >           s->dependent|= embedding->dep_tables;
>  >           outer_join|= nested_join->used_tables;
>  >         }
>  > -      continue;
>  >       }
>  > -#ifdef WITH_PARTITION_STORAGE_ENGINE
>  > -    const bool no_partitions_used= table->no_partitions_used;
>  > -#else
>  > -    const bool no_partitions_used= FALSE;
>  > -#endif
>  > -    if ((table->s->system || table->file->stats.records<= 1 ||
>  > -         no_partitions_used)&&
>  > -    !s->dependent&&
>  > -    (table->file->ha_table_flags()& 
> HA_STATS_RECORDS_IS_EXACT)&&
>  > -        !table->fulltext_searched&&  !join->no_const_tables)
>  > +    else if (*s->on_expr_ref)
>  >       {
>  > -      set_position(join,const_count++,s,(KEYUSE*) 0);
>  > +      /* s is the only inner table of an outer join */
>  > +      outer_join|= table->map;
>  > +      s->embedding_map= 0;
>  > +      for (TABLE_LIST *embedding= tables->embedding;
>  > +           embedding;
>  > +           embedding= embedding->embedding)
>  > +        s->embedding_map|= embedding->nested_join->nj_map;
> 
> This is very similar to what happens right above.  Would it be
> possible to merge the two cases?

I tried, but I think the added complexity in the tests will not justify it...
> 
>  >       }
>  >     }
>  >     stat_vector[i]=0;
>  > @@ -4318,6 +4333,7 @@ make_join_statistics(JOIN *join, TABLE_L
>  >     if (join->outer_join)
>  >     {
>  >       /*
>  > +       Complete the dependency analysis.
>  >          Build transitive closure for relation 'to be dependent on'.
>  >          This will speed up the plan search for many cases with outer 
> joins,
>  >          as well as allow us to catch illegal cross references/
>  > @@ -4355,8 +4371,84 @@ make_join_statistics(JOIN *join, TABLE_L
>  >                               ~outer_join,
> join->select_lex,&sargables))
>  >         goto error;
>  >
>  > -  /* Read tables with 0 or 1 rows (system tables) */
>  > -  join->const_table_map= 0;
>  > +  /*
>  > +    Pull out semijoin tables based on dependencies. Dependencies are 
> valid
>  > +    throughout the lifetime of a query, so this operation need to be 
> performed
>  > +    on the first optimization only.
>  > +  */
>  > +  if (first_optimization)
>  > +  {
>  > +    if (pull_out_semijoin_tables(join))
>  > +      DBUG_RETURN(TRUE);
>  > +  }
> 
> Is the test on first_optimization only for efficiency reasons, or is
> not pull_out_semijoin_tables() idempotent?  If not, would it be
> possible to make it so?

pull_out_semijoin_tables() is indeed idempotent. In this case, I think the if 
test is part of the logic and clarifies the code, and it is slightly more 
efficient than walking through the semijoin nests to find out there is nothing 
to do.
> 
>  > +
>  > +  /*
>  > +    Pull out tables based on data volumes, needs to be done for each 
> execution.
> 
> Are you mixing pullout and general const table extraction/elimination
> here?  AFAICT pullout is usually connected to semijoin nests, but the
> below code handles all tables.

Yes, I did... Fixed.
> 
> 
>  > +    Const tables containing exactly zero or one rows are pulled out, 
> but
>  > +    notice the additional constraints checked below.
>  > +    Tables that are pulled out have their rows read before actual 
> execution
>  > +    starts. Thus, they do not take part in join order optimization 
> process,
>  > +    which can significantly reduce the optimization time.
>  > +    The data read from these tables can also be regarded as "constant"
>  > +    throughout query execution, hence the column values can be used for
>  > +    additional constant propagation.
>  > +  */
>  > +  for (i= 0, s= stat; i<  table_count; i++, s++)
>  > +  {
>  > +    TABLE      *table= s->table;
>  > +    TABLE_LIST *tables= table->pos_in_table_list;
>  > +
>  > +#ifdef WITH_PARTITION_STORAGE_ENGINE
>  > +    const bool no_partitions_used= table->no_partitions_used;
>  > +#else
>  > +    const bool no_partitions_used= FALSE;
>  > +#endif
>  > +
>  > +    if (tables->in_outer_join_nest())
>  > +    {
>  > +      /*
>  > +        Table belongs to a nested join, no candidate for const table 
> extraction.
>  > +      */
>  > +    }
>  > +    else if (tables->embedding&& 
> tables->embedding->sj_on_expr&&
>  > +             !join->thd->stmt_arena->is_conventional())
>  > +    {
>  > +      /*
>  > +        Table belongs to a semi-join and this is a PS/SP.
>  > +        No candidate for const table extraction.
>  > +      */
>  > +    }
> 
> I think you need to explain why it is not a candidate for const table
> extraction (or maybe the term should be pull-out here?).

Rephrased.

   Any reason
> why you here prefer to separate conditions into several if-statements
> with empty blocks over the usual style of a big logical expression
> with numbered conditions (like below)?

Several reasons, actually:
  - I tried to keep the same structure for testing nesting types as in
    the loop above.
  - The order of the tests is significant (trial and err needed...)
  - I think this structure expresses the logic pretty well.
    (The is_conventional() test is an exception...)
> 
>  > +    else if (*s->on_expr_ref)
>  > +    {
>  > +      /* s is the only inner table of an outer join */
>  > +      if (!table->file->stats.records || no_partitions_used)
>  > +      {                        // Empty table
>  > +        set_position(join,const_count++,s,(KEYUSE*) 0);
>  > +      }
>  > +    }
>  > +    else
>  > +    {
>  > +      /*
>  > +        Extract const tables (tables with only zero or one rows).
>  > +        Exclude tables that
>  > +         1. are dependent upon other tables, or
>  > +         2. have no exact statistics (usually transactional tables), or
> 
> Is this a general property of transactional storage engine, or one
> specific to InnoDB?

One may imagine transactional storage engines that keep exact row counts, e.g. 
by using MVCC.
> 
>  > +         3. are full-text searched, or
>  > +         4. the join object does not allow const table elimination.
>  > +      */
>  > +      if ((table->s->system ||
>  > +           table->file->stats.records<= 1 ||
>  > +           no_partitions_used)&&
>  > +          !s->dependent&&      // 1
>  > +          (table->file->ha_table_flags()& 
> HA_STATS_RECORDS_IS_EXACT)&&  // 2
>  > +          !table->fulltext_searched&&      // 3
>  > +          !join->no_const_tables)      // 4
>  > +      {
>  > +        set_position(join,const_count++,s,(KEYUSE*) 0);
>  > +      }
>  > +    }
>  > +  }
>  > +  /* Read tables with 0 or 1 rows (const tables) */
>  >
>  >     for (POSITION *p_pos=join->positions, *p_end=p_pos+const_count;
>  >          p_pos<  p_end ;
>  > @@ -4478,14 +4570,14 @@ make_join_statistics(JOIN *join, TABLE_L
>  >         } while (keyuse->table == table&&  keyuse->key == key);
>  >
>  >             /*
>  > -            TODO (low priority): currently we ignore the const 
> tables that
>  > -            are within a semi-join nest which is within an outer 
> join nest.
>  > -            The effect of this is that we don't do const 
> substitution for
>  > -            such tables.
> 
> Why have you removed this comment?

Moved back again. Probably a reminisence of first patch, which was more 
aggressive at cutting out const table elimination.
> 
>  > +            Extract const tables with proper key dependencies.
>  > +            Exclude tables that
>  > +             1. are full-text searched, or
>  > +             2. are part of a semi-join nest or part of a nested 
> outer join.
> 
> Item 2 does not match the code. in_outer_join_nest() does not generally
> return true for tables in a semi-join nest, only if the semi-join nest
> is embedded in an outer join nest.

Corrected.
> 
>  >             */
>  >         if (eq_part.is_prefix(table->key_info[key].key_parts)&&
>  > -              !table->fulltext_searched&&
>  > -              !table->pos_in_table_list->in_outer_join_nest())
>  > +              !table->fulltext_searched&&  // 1
>  > +              !table->pos_in_table_list->in_outer_join_nest())  // 2
>  >         {
>  >               if (table->key_info[key].flags&  HA_NOSAME)
>  >               {
>  > @@ -4544,6 +4636,8 @@ make_join_statistics(JOIN *join, TABLE_L
>  >
>  >     for (s=stat ; s<  stat_end ; s++)
>  >     {
>  > +    TABLE_LIST *tl= s->table->pos_in_table_list;
>  > +
>  >       if (s->type == JT_SYSTEM || s->type == JT_CONST)
>  >       {
>  >         /* Only one matching row */
>  > @@ -4576,9 +4670,8 @@ make_join_statistics(JOIN *join, TABLE_L
>  >         Do range analysis if we're on the inner side of a semi-join (3).
>  >       */
>  >       if (!s->const_keys.is_clear_all()&&                         //
> (1)
>  > -        (!s->table->pos_in_table_list->embedding ||             //
> (2)
>  > -         (s->table->pos_in_table_list->embedding&&           
>   // (3)
>  > -          s->table->pos_in_table_list->embedding->sj_on_expr))) //
> (3)
>  > +        (!tl->embedding ||                                      // (2)
>  > +         (tl->embedding&&  tl->embedding->sj_on_expr)))      
>   // (3)
>  >       {
>  >         ha_rows records;
>  >         SQL_SELECT *select;
>  > @@ -4593,8 +4686,11 @@ make_join_statistics(JOIN *join, TABLE_L
>  >         s->quick=select->quick;
>  >         s->needed_reg=select->needed_reg;
>  >         select->quick=0;
>  > -      if (records == 0&& 
> s->table->reginfo.impossible_range&&
>  > -          (s->table->file->ha_table_flags()& 
> HA_STATS_RECORDS_IS_EXACT))
>  > +      if (records == 0&&
>  > +          s->table->reginfo.impossible_range&&
>  > +          (s->table->file->ha_table_flags()& 
> HA_STATS_RECORDS_IS_EXACT)&&
>  > +          (join->thd->stmt_arena->is_conventional() ||
>  > +           !(tl->embedding&&  tl->embedding->sj_on_expr)))
> 
> Please, add an explanation for the added conditions.  (I would also
> appreciate a space before each "&&".)

Explanation added. (Just for the added condition, though. I am not 100% sure
how to explain the other parts of the condition.)

The weird spacing I blame bzr and/or mailer for.
> 
>  >         {
>  >       /*
>  >         Impossible WHERE or ON expression
>  > @@ -4623,8 +4719,27 @@ make_join_statistics(JOIN *join, TABLE_L
>  >       }
>  >     }
>  >
>  > -  if (pull_out_semijoin_tables(join))
>  > -    DBUG_RETURN(TRUE);
>  > +  /*
>  > +    Pull out tables based on data volumes. Data volumes may vary 
> from one
>  > +    execution to the next, so this needs to be done for every 
> optimization.
>  > +  */
>  > +  if (join->thd->stmt_arena->is_conventional())
>  > +  {
>  > +    if (pull_out_semijoin_tables(join))
>  > +      DBUG_RETURN(TRUE);
>  > +  }
> 
> Please, add a comment on the reason for the is_conventional() test.

Comment is changed.
> 
>  > +
>  > +  /*
>  > +    Set embedding semijoin pointer. This must be done after pullout of
>  > +    semijoin tables, but it must also be done for each execution, 
> hence it
>  > +    cannot be done in pull_out_semijoin_tables().
>  > +  */
> 
> The above comment does not make sense anymore since you now are
> calling pull_out_semijoin_tables on every execution.

pull_out_semijoin_tables() is only called for the first execution of a prepared 
statement, so the comment is correct. It may be the above comment that confused 
you, I hope it explains things better now.
> 
> 
>  > +  for (s= stat; s<  stat_end; i++, s++)
>  > +  {
>  > +    TABLE_LIST *tables= s->table->pos_in_table_list;
>  > +    if (tables->embedding&&  tables->embedding->sj_on_expr)
>  > +      s->emb_sj_nest= tables->embedding;
>  > +  }
>  >
>  >     join->join_tab=stat;
>  >     join->map2table=stat_ref;
>  >
>  >
>  >
>  >
>  >
> 
> 

Roy
Thread
bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3185) Bug#43768Roy Lyseng4 Jun
  • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3185)Bug#43768Øystein Grøvlen14 Jun
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3185)Bug#43768Roy Lyseng15 Jun
  • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3185)Bug#43768Evgeny Potemkin16 Jun
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3185)Bug#43768Roy Lyseng6 Jul
      • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3185)Bug#43768Evgeny Potemkin14 Jul
        • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3185)Bug#43768Roy Lyseng8 Sep