From: Roy Lyseng
Date: September 8 2010 12:11pm
Subject: Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3185)
Bug#43768
List-Archive: http://lists.mysql.com/commits/117770
Message-Id: <4C877D73.3060502@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Evgeny Potemkin wrote:
> Hi Roy,
>
> On 07/06/10 18:06, Roy Lyseng wrote:
>> Hi Evegn,
>>
>> thanks for reviewing.
>>
>> Mail was taken by my commit mail filter, as it was not addressed only
>> to commits.
>>
>> Evgeny Potemkin wrote:
>>> Hi Roy,
>>>
>>> One general note: the "const table" is a well-established term among
>>> mysql
>>> developers, so, please refer to it when you're speaking about some
>>> operation
>>> on tables "based on data volumes" it'll make your comments more clear
>>> to others.
>>
>> I am just trying to use consistent terminology here. It is also fairly
>> compliant
>> with the MySQL manual:
>>
>> system: The table has only one row (= system table). This is a special
>> case of
>> the const join type.
>>
>> const: The table has at most one matching row, which is read at the
>> start of the
>> query
>>
>> Thus, there are two types of "const" table eliminations: based on ref
>> and based
>> on volume (row count). I can rephrase with "row count" instead of
>> "data volume",
>> this is probably more compliant.
> Ok, row count is better. One can think of data volume as of size in
> bytes, for example, which is wrong.
>>>
>>> Also, please see other comments inline.
>>>
>>> Regards, Evgen.
>>>
>>> On 06/04/10 19:04, Roy Lyseng wrote:
>>>> #At file:///home/rl136806/mysql/repo/mysql-work4/ based on
>>>> revid:olav@stripped
>>>>
>>>> 3185 Roy Lyseng 2010-06-04
>>>> Bug#43768: Prepared query with nested subqueries core dump on second
>>>> execution
>>>>
>>>> The problem here is that pullout of semijoin tables is attempted for
>>>> each execution, but because those tables are not "pushed back" again
>>>> after each execution, the pullout fails on second attempt.
>>>>
>>>> The solution chosen here is to pullout only those semijoin tables
>>>> that are
>>>> functionally dependent upon the outer tables. This pullout operation
>>>> can be performed only once, and, unlike the current procedure, is not
>>>> dependent upon the data volume of the involved tables.
>>>> We still pullout tables based on volume if this is a direct
>>>> execution, though.
>>>>
>>>> The practical implication of this is as follows:
>>>>
>>>> - Only outer tables containing zero or one rows will now be extracted
>>>> as "const tables". Thus, such tables from a semijoin nest are no
>>>> longer accessed during optimization, and some (rare) optimizations
>>>> are no longer possible.
>>>>
>>>> - In the majority of cases, there is no performance impact. Often,
>>>> the new strategy chosen is Materialization, meaning that the row
>>>> of these table is accessed only once and saved in local memory.
>>>>
>>>> - Const table analysis now has to be done in two phases:
>>>> 1) Pullout tables based on dependencies. Both outer and inner tables
>>>> may apply, and
>>>> 2) Pullout outer tables based on data volume.
>>>>
>>>> In order to implement the latter point above, and assure that pullout
>>>> of semijoin tables occurs only once, make_join_statistics() has been
>>>> restructured slightly. The conditional logic within the function has
>>>> also
>>>> been enhanced for better readability.
>>>>
>>>> The logic of make_join_statistics() is now as follows:
>>>>
>>>> 1. Initialize JOIN data structures
>>>> (major part of first loop in existing implementation).
>>>>
>>>> 2. Update dependencies based on join information
>>>> (the Warshall algorithm).
>>>>
>>>> 3. Make key descriptions (update_ref_and_keys()).
>>>>
>>>> 4. Pull out semijoin tables, called only once.
>>>>
>>>> 5. Extract tables with zero or one rows as const tables
>>>> (consider outer tables only, no semijoin tables).
>>>>
>>>> 6. Extract dependent tables as const tables.
>>>> (consider outer tables only, no semijoin tables).
>>> Prior to your fix const tables were read earlier. It's not clear from
>>> this
>>> comment why it's needed to move this to a later stage.
>>
>> This is just a description of the steps, it does not describe the
>> logic changes.
>> The justification is placed earlier in the commit header.
> Ok, then, please, clarify it in the commit header.
> This restructing is rather a big change, and it would be much better to
> document it well.
I have tried to explain it better in the new commit header.
>>>>
>>>> 7. The remaining parts of the function.
>>>>
>>>> mysql-test/r/select_found.result
>>>> Possible keys changed from "PRIMARY,kid" to "kid".
>>>>
>>>> mysql-test/r/subselect4.result
>>>> Added test case for bug#43768
>>>>
>>>> mysql-test/r/subselect_sj.result
>>>> A number of plan changes because of extensive testing of semi-join
>>>> tables with 0, 1 and 2 rows.
>>>>
>>>> mysql-test/r/subselect_sj_jcl6.result
>>>> Ditto.
>>>>
>>>> mysql-test/t/subselect4.test
>>>> Added test case for bug#43768
>>>>
>>>> mysql-test/t/subselect_sj.test
>>>> bug#46744 now becomes a duplicate of bug#50489, and test case is moved.
>>>>
>>>> mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
>>>> Test for bug#46744 moved here.
>>>>
>>>> sql/sql_select.cc
>>>> pull_out_semijoin_tables()
>>>> join_tab->emb_sj_nest is no longer updated. Comments updated.
>>>> make_join_statistics()
>>>> Removed const table pullout from first loop.
>>>> Simplified testing based on inner/outer/semi-join properties.
>>>> Calls pull_out_semijoin_tables() just after dependency analysis.
>>>> Then, added loop that performs pullout based on data volume
>>>> but excludes semijoined tables in prepared statements.
>>>> Second call of pull_out_semijoin_tables() is needed after this.
>>>> Simplified testing based on inner/outer/semi-join properties.
>>>> Assured that semijoin tables are no longer pulled out based on volume.
>>>>
>>>> added:
>>>> mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
>>>> modified:
>>>> mysql-test/r/select_found.result
>>>> mysql-test/r/subselect4.result
>>>> mysql-test/r/subselect_sj.result
>>>> mysql-test/r/subselect_sj_jcl6.result
>>>> mysql-test/t/subselect4.test
>>>> mysql-test/t/subselect_sj.test
>>>> sql/sql_select.cc
>>>> === modified file 'mysql-test/r/select_found.result'
>>>> --- a/mysql-test/r/select_found.result 2007-08-02 19:45:56 +0000
>>>> +++ b/mysql-test/r/select_found.result 2010-06-04 15:03:03 +0000
>>>> @@ -83,7 +83,7 @@ UNIQUE KEY e_n (email,name)
>>>> );
>>>> EXPLAIN SELECT SQL_CALC_FOUND_ROWS DISTINCT email FROM t2 LEFT JOIN
>>>> t1 ON kid
>>>> = t2.id WHERE t1.id IS NULL LIMIT 10;
>>>> id select_type table type possible_keys key key_len ref rows Extra
>>>> -1 SIMPLE t1 system PRIMARY,kid NULL NULL NULL 0 const row not found
>>>> +1 SIMPLE t1 system kid NULL NULL NULL 0 const row not found
>>>> 1 SIMPLE t2 index NULL e_n 104 NULL 10 SELECT SQL_CALC_FOUND_ROWS
>>>> DISTINCT
>>>> email FROM t2 LEFT JOIN t1 ON kid = t2.id WHERE t1.id IS NULL LIMIT 10;
>>>> email
>>>>
>>>> === modified file 'mysql-test/r/subselect4.result'
>>>> --- a/mysql-test/r/subselect4.result 2010-05-19 14:44:18 +0000
>>>> +++ b/mysql-test/r/subselect4.result 2010-06-04 15:03:03 +0000
>>>> @@ -500,6 +500,100 @@ SUBQUERY1_t1.col_varchar) ) )
>>>> pk
>>>> drop table t1;
>>>> #
>>>> +# Bug#43768 Prepared query with nested subqueries core dump on
>>>> second execution
>>>> +#
>>>> +CREATE TABLE t1 (
>>>> +id INT PRIMARY KEY,
>>>> +partner_id VARCHAR(35)
>>>> +);
>>>> +INSERT INTO t1 VALUES
>>>> +(1, 'partner1'), (2, 'partner2'),
>>>> +(3, 'partner3'), (4, 'partner4');
>>>> +CREATE TABLE t2 (
>>>> +id INT NOT NULL,
>>>> +t1_line_id INT,
>>>> +article_id VARCHAR(20),
>>>> +PRIMARY KEY(id, t1_line_id)
>>>> +);
>>>> +INSERT INTO t2 VALUES
>>>> +(1, 1, 'sup'), (2, 1, 'sup'),
>>>> +(2, 2, 'sup'), (2, 3, 'sup'),
>>>> +(2, 4, 'imp'), (3, 1, 'sup'),
>>>> +(4, 1, 'sup');
>>>> +CREATE TABLE t3 (
>>>> +user_id VARCHAR(50),
>>>> +article_id VARCHAR(20) NOT NULL,
>>>> +PRIMARY KEY(user_id)
>>>> +);
>>>> +INSERT INTO t3 VALUES('nicke', 'imp');
>>>> +EXPLAIN
>>>> +SELECT t1.partner_id
>>>> +FROM t1
>>>> +WHERE t1.id IN (
>>>> +SELECT t2.id
>>>> +FROM t2
>>>> +WHERE article_id IN (
>>>> +SELECT article_id FROM t3
>>>> +WHERE user_id = 'nicke'
>>>> + )
>>>> +);
>>>> +id select_type table type possible_keys key key_len ref rows Extra
>>>> +1 PRIMARY t3 system PRIMARY NULL NULL NULL 1 +1 PRIMARY t1 ALL
>>>> PRIMARY NULL
>>>> NULL NULL 4 +1 PRIMARY t2 ref PRIMARY PRIMARY 4 test.t1.id 1 Using
>>>> where;
>>>> FirstMatch(t1)
>>>> +SELECT t1.partner_id
>>>> +FROM t1
>>>> +WHERE t1.id IN (
>>>> +SELECT t2.id
>>>> +FROM t2
>>>> +WHERE article_id IN (
>>>> +SELECT article_id FROM t3
>>>> +WHERE user_id = 'nicke'
>>>> + )
>>>> +);
>>>> +partner_id
>>>> +partner2
>>>> +PREPARE stmt FROM
>>>> +'EXPLAIN SELECT t1.partner_id
>>>> +FROM t1
>>>> +WHERE t1.id IN (
>>>> + SELECT t2.id
>>>> + FROM t2
>>>> + WHERE article_id IN (
>>>> + SELECT article_id FROM t3
>>>> + WHERE user_id = \'nicke\'
>>>> + )
>>>> + )';
>>>> +EXECUTE stmt;
>>>> +id select_type table type possible_keys key key_len ref rows Extra
>>>> +1 PRIMARY t3 system PRIMARY NULL NULL NULL 1 +1 PRIMARY t1 ALL
>>>> PRIMARY NULL
>>>> NULL NULL 4 +1 PRIMARY t2 ref PRIMARY PRIMARY 4 test.t1.id 1 Using
>>>> where;
>>>> FirstMatch(t1)
>>> The idea of having prepared and non-prepared EXPLAIN was to show that
>>> const
>>> table pullout works for non-prepared explain and doesn't for prepared
>>> one.
>>> I.e. EXPLAIN output should differ, but in this example they are the
>>> same.
>>> Could you check and correct this?
>>
>> The problem is that EXPLAIN behaves as if this is a non-prepared
>> statement,
>> regardless of the type of statement. This is an inherent feature of
>> EXPLAIN, it
>> is also the reason that you can run with --ps-protocol without changes
>> to result
>> files :)
>>
>>>> +EXECUTE stmt;
>>>> +id select_type table type possible_keys key key_len ref rows Extra
>>>> +1 SIMPLE t3 system PRIMARY NULL NULL NULL 1 +1 SIMPLE t1 ALL
>>>> PRIMARY NULL
>>>> NULL NULL 4 +1 SIMPLE t2 ref PRIMARY PRIMARY 4 test.t1.id 1 Using
>>>> where;
>>>> FirstMatch(t1)
>>>> +PREPARE stmt FROM
>>>> +'SELECT t1.partner_id
>>>> +FROM t1
>>>> +WHERE t1.id IN (
>>>> + SELECT t2.id
>>>> + FROM t2
>>>> + WHERE article_id IN (
>>>> + SELECT article_id FROM t3
>>>> + WHERE user_id = \'nicke\'
>>>> + )
>>>> + )';
>>>> +EXECUTE stmt;
>>>> +partner_id
>>>> +partner2
>>>> +EXECUTE stmt;
>>>> +partner_id
>>>> +partner2
>>>> +DROP TABLE t1,t2,t3;
>>>> +# End of Bug#43768
>>>> +#
>>>> # BUG#53298 "wrong result with semijoin (no semijoin strategy chosen)"
>>>> #
>>>> create table t1 (uid int, fid int);
>>>>
>>>> === modified file 'mysql-test/r/subselect_sj.result'
>>>> --- a/mysql-test/r/subselect_sj.result 2010-05-18 11:01:08 +0000
>>>> +++ b/mysql-test/r/subselect_sj.result 2010-06-04 15:03:03 +0000
>>>> @@ -333,7 +333,9 @@ i
>>>> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> id select_type table type possible_keys key key_len ref rows Extra
>>>> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
>>>> noticed after
>>>> reading const tables
>>>> +1 PRIMARY t1 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t3 ALL NULL NULL NULL NULL 2 Using where; FirstMatch(t2)
>>>> SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> i
>>>> @@ -363,7 +365,9 @@ i
>>>> EXPLAIN SELECT * FROM t1 WHERE (11) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> id select_type table type possible_keys key key_len ref rows Extra
>>>> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
>>>> noticed after
>>>> reading const tables
>>>> +1 PRIMARY t1 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t3 ALL NULL NULL NULL NULL 2 Using where; FirstMatch(t2)
>>>> SELECT * FROM t1 WHERE (11) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> i
>>>> @@ -1320,14 +1324,18 @@ SELECT (SELECT COUNT(*) from t1),
>>>> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
>>>> id select_type table type possible_keys key key_len ref rows Extra
>>>> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
>>>> noticed after
>>>> reading const tables
>>>> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t3 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t1 ALL NULL NULL NULL NULL 2 Using where
>>>> SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
>>>> i
>>>> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> id select_type table type possible_keys key key_len ref rows Extra
>>>> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
>>>> noticed after
>>>> reading const tables
>>>> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t3 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t1 ALL NULL NULL NULL NULL 2 Using where
>>>> SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> i
>>>> @@ -2363,34 +2371,6 @@ int_key
>>>> 7
>>>> DROP TABLE t0, t1, t2;
>>>> # End of bug#46550
>>>> -#
>>>> -# Bug #46744 Crash in optimize_semijoin_nests on empty view
>>>> -# with limit and procedure.
>>>> -#
>>>> -DROP TABLE IF EXISTS t1, t2;
>>>> -DROP VIEW IF EXISTS v1;
>>>> -DROP PROCEDURE IF EXISTS p1;
>>>> -CREATE TABLE t1 ( f1 int );
>>>> -CREATE TABLE t2 ( f1 int );
>>>> -insert into t2 values (5), (7);
>>>> -CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
>>>> -create procedure p1()
>>>> -select COUNT(*)
>>>> -FROM v1 WHERE f1 IN
>>>> -(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
>>>> -SET SESSION optimizer_switch = 'semijoin=on';
>>>> -CALL p1();
>>>> -COUNT(*)
>>>> -0
>>>> -SET SESSION optimizer_switch = 'semijoin=off';
>>>> -CALL p1();
>>>> -COUNT(*)
>>>> -0
>>>> -drop table t1, t2;
>>>> -drop view v1;
>>>> -drop procedure p1;
>>>> -set SESSION optimizer_switch='default';
>>>> -# End of bug#46744
>>>>
>>>> Bug #48073 Subquery on char columns from view crashes Mysql
>>>>
>>>>
>>>> === modified file 'mysql-test/r/subselect_sj_jcl6.result'
>>>> --- a/mysql-test/r/subselect_sj_jcl6.result 2010-05-18 11:01:08 +0000
>>>> +++ b/mysql-test/r/subselect_sj_jcl6.result 2010-06-04 15:03:03 +0000
>>>> @@ -337,7 +337,9 @@ i
>>>> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> id select_type table type possible_keys key key_len ref rows Extra
>>>> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
>>>> noticed after
>>>> reading const tables
>>>> +1 PRIMARY t1 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t3 ALL NULL NULL NULL NULL 2 Using where; FirstMatch(t2)
>>>> SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> i
>>>> @@ -367,7 +369,9 @@ i
>>>> EXPLAIN SELECT * FROM t1 WHERE (11) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> id select_type table type possible_keys key key_len ref rows Extra
>>>> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
>>>> noticed after
>>>> reading const tables
>>>> +1 PRIMARY t1 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t3 ALL NULL NULL NULL NULL 2 Using where; FirstMatch(t2)
>>>> SELECT * FROM t1 WHERE (11) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> i
>>>> @@ -1324,14 +1328,18 @@ SELECT (SELECT COUNT(*) from t1),
>>>> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
>>>> id select_type table type possible_keys key key_len ref rows Extra
>>>> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
>>>> noticed after
>>>> reading const tables
>>>> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t3 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t1 ALL NULL NULL NULL NULL 2 Using where
>>>> SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 LEFT JOIN t3 ON t2.i=t3.i);
>>>> i
>>>> EXPLAIN SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> id select_type table type possible_keys key key_len ref rows Extra
>>>> -1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE
>>>> noticed after
>>>> reading const tables
>>>> +1 PRIMARY t2 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t3 system NULL NULL NULL NULL 0 const row not found
>>>> +1 PRIMARY t1 ALL NULL NULL NULL NULL 2 Using where
>>>> SELECT * FROM t1 WHERE (t1.i) IN
>>>> (SELECT t3.i FROM t2 RIGHT JOIN t3 ON t2.i=t3.i);
>>>> i
>>>> @@ -2367,34 +2375,6 @@ int_key
>>>> 7
>>>> DROP TABLE t0, t1, t2;
>>>> # End of bug#46550
>>>> -#
>>>> -# Bug #46744 Crash in optimize_semijoin_nests on empty view
>>>> -# with limit and procedure.
>>>> -#
>>>> -DROP TABLE IF EXISTS t1, t2;
>>>> -DROP VIEW IF EXISTS v1;
>>>> -DROP PROCEDURE IF EXISTS p1;
>>>> -CREATE TABLE t1 ( f1 int );
>>>> -CREATE TABLE t2 ( f1 int );
>>>> -insert into t2 values (5), (7);
>>>> -CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
>>>> -create procedure p1()
>>>> -select COUNT(*)
>>>> -FROM v1 WHERE f1 IN
>>>> -(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
>>>> -SET SESSION optimizer_switch = 'semijoin=on';
>>>> -CALL p1();
>>>> -COUNT(*)
>>>> -0
>>>> -SET SESSION optimizer_switch = 'semijoin=off';
>>>> -CALL p1();
>>>> -COUNT(*)
>>>> -0
>>>> -drop table t1, t2;
>>>> -drop view v1;
>>>> -drop procedure p1;
>>>> -set SESSION optimizer_switch='default';
>>>> -# End of bug#46744
>>>>
>>>> Bug #48073 Subquery on char columns from view crashes Mysql
>>>>
>>>>
>>>> === added file
>>>> 'mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test'
>>>> --- a/mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
>>>> 1970-01-01
>>>> 00:00:00 +0000
>>>> +++ b/mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
>>>> 2010-06-04
>>>> 15:03:03 +0000
>>>> @@ -0,0 +1,35 @@
>>>> +--echo #
>>>> +--echo # Bug #46744 Crash in optimize_semijoin_nests on empty view
>>>> +--echo # with limit and procedure.
>>>> +--echo #
>>>> +
>>>> +--disable_warnings
>>>> +DROP TABLE IF EXISTS t1, t2;
>>>> +DROP VIEW IF EXISTS v1;
>>>> +DROP PROCEDURE IF EXISTS p1;
>>>> +--enable_warnings
>>>> +
>>>> +CREATE TABLE t1 ( f1 int );
>>>> +CREATE TABLE t2 ( f1 int );
>>>> +
>>>> +insert into t2 values (5), (7);
>>>> +
>>>> +CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
>>>> +
>>>> +create procedure p1()
>>>> +select COUNT(*)
>>>> +FROM v1 WHERE f1 IN
>>>> +(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
>>>> +
>>>> +SET SESSION optimizer_switch = 'semijoin=on';
>>>> +CALL p1();
>>>> +SET SESSION optimizer_switch = 'semijoin=off';
>>>> +CALL p1();
>>>> +
>>>> +drop table t1, t2;
>>>> +drop view v1;
>>>> +drop procedure p1;
>>>> +
>>>> +set SESSION optimizer_switch='default';
>>>> +
>>>> +--echo # End of bug#46744
>>>>
>>>> === modified file 'mysql-test/t/subselect4.test'
>>>> --- a/mysql-test/t/subselect4.test 2010-05-19 14:44:18 +0000
>>>> +++ b/mysql-test/t/subselect4.test 2010-06-04 15:03:03 +0000
>>>> @@ -467,6 +467,95 @@ WHERE ( 1, 2 ) IN ( SELECT SUBQUERY1_t1.
>>>> drop table t1;
>>>>
>>>> --echo #
>>>> +--echo # Bug#43768 Prepared query with nested subqueries core dump
>>>> on second
>>>> execution
>>>> +--echo #
>>>> +
>>>> +CREATE TABLE t1 (
>>>> + id INT PRIMARY KEY,
>>>> + partner_id VARCHAR(35)
>>>> +);
>>>> +
>>>> +INSERT INTO t1 VALUES
>>>> + (1, 'partner1'), (2, 'partner2'),
>>>> + (3, 'partner3'), (4, 'partner4');
>>>> +
>>>> +CREATE TABLE t2 (
>>>> + id INT NOT NULL,
>>>> + t1_line_id INT,
>>>> + article_id VARCHAR(20),
>>>> + PRIMARY KEY(id, t1_line_id)
>>>> +);
>>>> +
>>>> +INSERT INTO t2 VALUES
>>>> + (1, 1, 'sup'), (2, 1, 'sup'),
>>>> + (2, 2, 'sup'), (2, 3, 'sup'),
>>>> + (2, 4, 'imp'), (3, 1, 'sup'),
>>>> + (4, 1, 'sup');
>>>> +
>>>> +CREATE TABLE t3 (
>>>> + user_id VARCHAR(50),
>>>> + article_id VARCHAR(20) NOT NULL,
>>>> + PRIMARY KEY(user_id)
>>>> +);
>>>> +
>>>> +INSERT INTO t3 VALUES('nicke', 'imp');
>>>> +
>>>> +EXPLAIN
>>>> +SELECT t1.partner_id
>>>> +FROM t1
>>>> +WHERE t1.id IN (
>>>> + SELECT t2.id
>>>> + FROM t2
>>>> + WHERE article_id IN (
>>>> + SELECT article_id FROM t3
>>>> + WHERE user_id = 'nicke'
>>>> + )
>>>> + );
>>>> +
>>>> +SELECT t1.partner_id
>>>> +FROM t1
>>>> +WHERE t1.id IN (
>>>> + SELECT t2.id
>>>> + FROM t2
>>>> + WHERE article_id IN (
>>>> + SELECT article_id FROM t3
>>>> + WHERE user_id = 'nicke'
>>>> + )
>>>> + );
>>>> +
>>>> +PREPARE stmt FROM
>>>> +'EXPLAIN SELECT t1.partner_id
>>>> +FROM t1
>>>> +WHERE t1.id IN (
>>>> + SELECT t2.id
>>>> + FROM t2
>>>> + WHERE article_id IN (
>>>> + SELECT article_id FROM t3
>>>> + WHERE user_id = \'nicke\'
>>>> + )
>>>> + )';
>>>> +EXECUTE stmt;
>>>> +EXECUTE stmt;
>>>> +
>>>> +PREPARE stmt FROM
>>>> +'SELECT t1.partner_id
>>>> +FROM t1
>>>> +WHERE t1.id IN (
>>>> + SELECT t2.id
>>>> + FROM t2
>>>> + WHERE article_id IN (
>>>> + SELECT article_id FROM t3
>>>> + WHERE user_id = \'nicke\'
>>>> + )
>>>> + )';
>>>> +EXECUTE stmt;
>>>> +EXECUTE stmt;
>>>> +
>>>> +DROP TABLE t1,t2,t3;
>>>> +
>>>> +--echo # End of Bug#43768
>>>> +
>>>> +--echo #
>>>> --echo # BUG#53298 "wrong result with semijoin (no semijoin strategy
>>>> chosen)"
>>>> --echo #
>>>>
>>>>
>>>> === modified file 'mysql-test/t/subselect_sj.test'
>>>> --- a/mysql-test/t/subselect_sj.test 2010-05-18 11:01:08 +0000
>>>> +++ b/mysql-test/t/subselect_sj.test 2010-06-04 15:03:03 +0000
>>>> @@ -645,43 +645,6 @@ DROP TABLE t0, t1, t2;
>>>>
>>>> --echo # End of bug#46550
>>>>
>>>> ---echo #
>>>> ---echo # Bug #46744 Crash in optimize_semijoin_nests on empty view
>>>> ---echo # with limit and procedure.
>>>> ---echo #
>>>> -
>>>> ---disable_warnings
>>>> -DROP TABLE IF EXISTS t1, t2;
>>>> -DROP VIEW IF EXISTS v1;
>>>> -DROP PROCEDURE IF EXISTS p1;
>>>> ---enable_warnings
>>>> -
>>>> -CREATE TABLE t1 ( f1 int );
>>>> -CREATE TABLE t2 ( f1 int );
>>>> -
>>>> -insert into t2 values (5), (7);
>>>> -
>>>> -CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LIMIT 2;
>>>> -
>>>> -create procedure p1()
>>>> -select COUNT(*)
>>>> -FROM v1 WHERE f1 IN
>>>> -(SELECT f1 FROM t2 WHERE f1 = ANY (SELECT f1 FROM v1));
>>>> -
>>>> -SET SESSION optimizer_switch = 'semijoin=on';
>>>> -CALL p1();
>>>> -SET SESSION optimizer_switch = 'semijoin=off';
>>>> -CALL p1();
>>>> -
>>>> -drop table t1, t2;
>>>> -drop view v1;
>>>> -drop procedure p1;
>>>> -
>>>> -set SESSION optimizer_switch='default';
>>>> -
>>>> ---echo # End of bug#46744
>>>> -
>>>> -
>>>> --echo
>>>> --echo Bug #48073 Subquery on char columns from view crashes Mysql
>>>> --echo
>>>>
>>>> === modified file 'sql/sql_select.cc'
>>>> --- a/sql/sql_select.cc 2010-05-25 15:24:05 +0000
>>>> +++ b/sql/sql_select.cc 2010-06-04 15:03:03 +0000
>>>> @@ -65,7 +65,8 @@ struct st_sargable_param;
>>>>
>>>> static void optimize_keyuse(JOIN *join, DYNAMIC_ARRAY *keyuse_array);
>>>> static bool make_join_statistics(JOIN *join, TABLE_LIST *leaves,
>>>> COND *conds,
>>>> - DYNAMIC_ARRAY *keyuse);
>>>> + DYNAMIC_ARRAY *keyuse,
>>>> + bool first_optimization);
>>>> static bool optimize_semijoin_nests(JOIN *join, table_map
>>>> all_table_map);
>>>> static bool update_ref_and_keys(THD *thd, DYNAMIC_ARRAY *keyuse,
>>>> JOIN_TAB *join_tab,
>>>> @@ -1513,6 +1514,8 @@ JOIN::optimize()
>>>> if (optimized)
>>>> DBUG_RETURN(0);
>>>> optimized= 1;
>>>> + bool first_optimization= select_lex->first_cond_optimization;
>>>> + select_lex->first_cond_optimization= FALSE;
>>>>
>>>> thd_proc_info(thd, "optimizing");
>>>>
>>>> @@ -1555,7 +1558,7 @@ JOIN::optimize()
>>>> }
>>>> #endif
>>>> SELECT_LEX *sel= thd->lex->current_select;
>>>> - if (sel->first_cond_optimization)
>>>> + if (first_optimization)
>>>> {
>>>> /*
>>>> The following code will allocate the new items in a permanent
>>>> @@ -1568,8 +1571,6 @@ JOIN::optimize()
>>>> else
>>>> thd->set_n_backup_active_arena(arena,&backup);
>>>>
>>>> - sel->first_cond_optimization= 0;
>>>> -
>>>> /* Convert all outer joins to inner joins if possible */
>>>> conds= simplify_joins(this, join_list, conds, TRUE, FALSE);
>>>> build_bitmap_for_nested_joins(join_list, 0);
>>>> @@ -1712,7 +1713,8 @@ JOIN::optimize()
>>>>
>>>> /* Calculate how to do the join */
>>>> thd_proc_info(thd, "statistics");
>>>> - if (make_join_statistics(this, select_lex->leaf_tables,
>>>> conds,&keyuse) ||
>>>> + if (make_join_statistics(this, select_lex->leaf_tables, conds,
>>>> +&keyuse, first_optimization) ||
>>>> thd->is_fatal_error)
>>>> {
>>>> DBUG_PRINT("error",("Error: make_join_statistics() failed"));
>>>> @@ -3930,20 +3932,28 @@ bool find_eq_ref_candidate(TABLE *table,
>>>> PRECONDITIONS
>>>> When this function is called, the join may have several semi-join nests
>>>> but it is guaranteed that one semi-join nest does not contain another.
>>>> + For functionally dependent tables to be pulled out, key
>>>> information must
>>>> + have been calculated.
>>> It's not clean what info must be calculated: info on keys, or some
>>> "key info",
>>> and what it is in the latter case.
>>
>> I'll add in a reference to update_ref_and_keys().
>>
>>>> + For const tables to be pulled out, join->const_table_map must be set.
>>>>
>>>> ACTION
>>>> A table can be pulled out of the semi-join nest if
>>>> - - It is a constant table, or
>>>> - - It is accessed via eq_ref(outer_tables)
>>>> + - It is accessed via eq_ref(outer_tables), or
>>>> + - It contains zero or one rows (identified as const table)
>>>>
>>>> POSTCONDITIONS
>>>> - * Tables that were pulled out have JOIN_TAB::emb_sj_nest == NULL
>>>> - * Tables that were not pulled out have JOIN_TAB::emb_sj_nest pointing
>>>> - to semi-join nest they are in.
>>>> - * Semi-join nests' TABLE_LIST::sj_inner_tables is updated accordingly
>>>> -
>>>> - This operation is (and should be) performed at each PS execution
>>>> since
>>>> - tables may become/cease to be constant across PS reexecutions.
>>>> + * Tables that were pulled out are removed from the semijoin nest they
>>>> + belonged to and added to the parent join nest.
>>>> + * For these tables, the used_tables and not_null_tables fields of
>>>> + the semijoin nest they belonged to will be adjusted.
>>>> + The semijoin nest is also marked as correlated, and
>>>> + sj_corr_tables and sj_depends_on are adjusted if necessary.
>>>> + * Semi-join nests' sj_inner_tables is set equal to used_tables
>>>> +
>>>> + For queries that can be executed multiple times, only functionally
>>>> + dependent tables can be pulled out.
>>>> + For queries that can be executed only once, tables that have one
>>>> + or zero rows can also be pulled out.
>>>>
>>>> NOTE
>>>> Table pullout may make uncorrelated subquery correlated. Consider this
>>>> @@ -3971,17 +3981,19 @@ int pull_out_semijoin_tables(JOIN *join)
>>>>
>>>> /* Try pulling out of the each of the semi-joins */
>>>> while ((sj_nest= sj_list_it++))
>>>> - {
>>>> - /* Action #1: Mark the constant tables to be pulled out */
>>>> - table_map pulled_tables= 0;
>>>> -
>>>> + {
>>>> List_iterator child_li(sj_nest->nested_join->join_list);
>>>> TABLE_LIST *tbl;
>>>> + table_map pulled_tables= 0;
>>>> +
>>>> + /*
>>>> + Add tables that have been marked as const to the set of tables
>>>> + to be pulled out.
>>>> + */
>>>> while ((tbl= child_li++))
>>>> {
>>>> if (tbl->table)
>>>> {
>>>> - tbl->table->reginfo.join_tab->emb_sj_nest= sj_nest;
>>>> if (tbl->table->map& join->const_table_map)
>>>> {
>>>> pulled_tables |= tbl->table->map;
>>>> @@ -3990,11 +4002,11 @@ int pull_out_semijoin_tables(JOIN *join)
>>>> }
>>>> }
>>>> }
>>>> -
>>>> +
>>>> /*
>>>> - Action #2: Find which tables we can pull out based on
>>>> - update_ref_and_keys() data. Note that pulling one table out can allow
>>>> - us to pull out some other tables too.
>>>> + Find which tables we can pull out based on key dependency data.
>>>> + Note that pulling one table out can allow us to pull out some
>>>> + other tables too.
>>>> */
>>>> bool pulled_a_table;
>>>> do
>>>> @@ -4027,32 +4039,27 @@ int pull_out_semijoin_tables(JOIN *join)
>>>>
>>>> child_li.rewind();
>>>> /*
>>>> - Action #3: Move the pulled out TABLE_LIST elements to the parents.
>>>> + Move the pulled out TABLE_LIST elements to the parents.
>>>> */
>>>> - table_map inner_tables= sj_nest->nested_join->used_tables&
>>>> - ~pulled_tables;
>>>> - /* Record the bitmap of inner tables */
>>>> - sj_nest->sj_inner_tables= inner_tables;
>>>> + sj_nest->nested_join->used_tables&= ~pulled_tables;
>>>> + sj_nest->nested_join->not_null_tables&= ~pulled_tables;
>>>> +
>>>> + /* sj_inner_tables is a copy of nested_join->used_tables */
>>>> + sj_nest->sj_inner_tables= sj_nest->nested_join->used_tables;
>>>> +
>>>> if (pulled_tables)
>>>> {
>>>> - List *upper_join_list= (sj_nest->embedding != NULL)?
>>>> - (&sj_nest->embedding->nested_join->join_list):
>>>> - (&join->select_lex->top_join_list);
>>>> + List *upper_join_list= (sj_nest->embedding != NULL) ?
>>>> +&sj_nest->embedding->nested_join->join_list :
>>>> +&join->select_lex->top_join_list;
>>>> Query_arena *arena, backup;
>>>> arena= join->thd->activate_stmt_arena_if_needed(&backup);
>>>> while ((tbl= child_li++))
>>>> {
>>>> if (tbl->table)
>>>> {
>>>> - if (inner_tables& tbl->table->map)
>>>> + if (!(sj_nest->nested_join->used_tables& tbl->table->map))
>>>> {
>>>> - /* This table is not pulled out */
>>>> - tbl->table->reginfo.join_tab->emb_sj_nest= sj_nest;
>>>> - }
>>>> - else
>>>> - {
>>>> - /* This table has been pulled out of the semi-join nest */
>>>> - tbl->table->reginfo.join_tab->emb_sj_nest= NULL;
>>>> /*
>>>> Pull the table up in the same way as simplify_joins() does:
>>>> update join_list and embedding pointers but keep next[_local]
>>>> @@ -4067,7 +4074,7 @@ int pull_out_semijoin_tables(JOIN *join)
>>>> }
>>>>
>>>> /* Remove the sj-nest itself if we've removed everything from it */
>>>> - if (!inner_tables)
>>>> + if (!sj_nest->nested_join->used_tables)
>>>> {
>>>> List_iterator li(*upper_join_list);
>>>> /* Find the sj_nest in the list. */
>>>> @@ -4188,9 +4195,37 @@ static uint get_tmp_table_rec_length(Lis
>>>> return len;
>>>> }
>>>>
>>>> -
>>>> /**
>>>> - Calculate the best possible join and initialize the join structure.
>>>> + Calculate best possible join order and initialize the join structure.
>>>> +
>>>> + @details
>>>> + Here is an overview of the logic of this function:
>>>> +
>>>> + - Initialize JOIN data structures and setup basic dependencies
>>>> between tables.
>>>> +
>>>> + - Update dependencies based on join information.
>>>> +
>>>> + - Make key descriptions (update_ref_and_keys()).
>>>> +
>>>> + - Pull out semijoin tables based on table dependencies.
>>>> +
>>>> + - Extract tables with zero or one rows as const tables.
>>>> +
>>>> + - Read contents of const tables, substitute columns from these
>>>> tables with
>>>> + actual data. Also keep track of empty tables vs. one-row tables.
>>>> +
>>>> + - Extract tables that have become dependent after const table
>>>> extraction
>>>> + as const tables.
>>> Could you, please, re-phrase this sentence?
>>
>> This was obviously needing some re-phrasing, yes :)
>>
>>>> +
>>>> + - Add new sargable predicates based on retrieved const values.
>>>> +
>>>> + - Calculate number of rows to be retrieved from each table.
>>>> +
>>>> + - Calculate cost of potential semijoin materializations.
>>>> +
>>>> + - Calculate best possible join order based on available statistics.
>>>> +
>>>> + - Fill in remaining information for the generated join order.
>>>>
>>>> @retval
>>>> 0 ok
>>>> @@ -4200,7 +4235,7 @@ static uint get_tmp_table_rec_length(Lis
>>>>
>>>> static bool
>>>> make_join_statistics(JOIN *join, TABLE_LIST *tables_arg, COND *conds,
>>>> - DYNAMIC_ARRAY *keyuse_array)
>>>> + DYNAMIC_ARRAY *keyuse_array, bool first_optimization)
>>>> {
>>>> int error;
>>>> TABLE *table;
>>>> @@ -4224,11 +4259,16 @@ make_join_statistics(JOIN *join, TABLE_L
>>>> DBUG_RETURN(1); // Eom /* purecov: inspected */
>>>>
>>>> join->best_ref=stat_vector;
>>>> + join->const_table_map= 0;
>>>>
>>>> stat_end=stat+table_count;
>>>> found_const_table_map= all_table_map=0;
>>>> const_count=0;
>>>>
>>>> + /*
>>>> + Initialize data structures for tables to be joined.
>>>> + Initialize dependencies between tables.
>>>> + */
>>>> for (s= stat, i= 0;
>>>> tables;
>>>> s++, tables= tables->next_leaf, i++)
>>>> @@ -4261,31 +4301,10 @@ make_join_statistics(JOIN *join, TABLE_L
>>>> table->quick_condition_rows= table->file->stats.records;
>>>>
>>>> s->on_expr_ref=&tables->on_expr;
>>>> - if (*s->on_expr_ref)
>>>> - {
>>>> - /* s is the only inner table of an outer join */
>>>> -#ifdef WITH_PARTITION_STORAGE_ENGINE
>>>> - if ((!table->file->stats.records || table->no_partitions_used)&&
>>>> - !tables->in_outer_join_nest())
>>>> -#else
>>>> - if (!table->file->stats.records&& !tables->in_outer_join_nest())
>>>> -#endif
>>>> - { // Empty table
>>>> - s->dependent= 0; // Ignore LEFT JOIN depend.
>>>> - set_position(join,const_count++,s,(KEYUSE*) 0);
>>>> - continue;
>>>> - }
>>>> - outer_join|= table->map;
>>>> - s->embedding_map= 0;
>>>> - for (TABLE_LIST *embedding= tables->embedding;
>>>> - embedding;
>>>> - embedding= embedding->embedding)
>>>> - s->embedding_map|= embedding->nested_join->nj_map;
>>>> - continue;
>>>> - }
>>>> +
>>>> if (tables->in_outer_join_nest())
>>>> {
>>>> - /* s belongs to a nested join, maybe to several embedded joins */
>>>> + /* s belongs to a nested join, maybe to several embedding joins */
>>>> s->embedding_map= 0;
>>>> for (TABLE_LIST *embedding= tables->embedding;
>>>> embedding;
>>>> @@ -4296,20 +4315,16 @@ make_join_statistics(JOIN *join, TABLE_L
>>>> s->dependent|= embedding->dep_tables;
>>>> outer_join|= nested_join->used_tables;
>>>> }
>>>> - continue;
>>>> }
>>>> -#ifdef WITH_PARTITION_STORAGE_ENGINE
>>>> - const bool no_partitions_used= table->no_partitions_used;
>>>> -#else
>>>> - const bool no_partitions_used= FALSE;
>>>> -#endif
>>>> - if ((table->s->system || table->file->stats.records<= 1 ||
>>>> - no_partitions_used)&&
>>>> - !s->dependent&&
>>>> - (table->file->ha_table_flags()& HA_STATS_RECORDS_IS_EXACT)&&
>>>> - !table->fulltext_searched&& !join->no_const_tables)
>>>> + else if (*s->on_expr_ref)
>>>> {
>>>> - set_position(join,const_count++,s,(KEYUSE*) 0);
>>>> + /* s is the only inner table of an outer join */
>>>> + outer_join|= table->map;
>>>> + s->embedding_map= 0;
>>>> + for (TABLE_LIST *embedding= tables->embedding;
>>>> + embedding;
>>>> + embedding= embedding->embedding)
>>>> + s->embedding_map|= embedding->nested_join->nj_map;
>>>> }
>>>> }
>>>> stat_vector[i]=0;
>>>> @@ -4318,6 +4333,7 @@ make_join_statistics(JOIN *join, TABLE_L
>>>> if (join->outer_join)
>>>> {
>>>> /*
>>>> + Complete the dependency analysis.
>>>> Build transitive closure for relation 'to be dependent on'.
>>>> This will speed up the plan search for many cases with outer joins,
>>>> as well as allow us to catch illegal cross references/
>>>> @@ -4355,8 +4371,84 @@ make_join_statistics(JOIN *join, TABLE_L
>>>> ~outer_join, join->select_lex,&sargables))
>>>> goto error;
>>>>
>>>> - /* Read tables with 0 or 1 rows (system tables) */
>>>> - join->const_table_map= 0;
>>>> + /*
>>>> + Pull out semijoin tables based on dependencies. Dependencies are
>>>> valid
>>>> + throughout the lifetime of a query, so this operation need to be
>>>> performed
>>>> + on the first optimization only.
>>>> + */
>>>> + if (first_optimization)
>>>> + {
>>>> + if (pull_out_semijoin_tables(join))
>>>> + DBUG_RETURN(TRUE);
>>>> + }
>>>> +
>>>> + /*
>>>> + Pull out tables based on data volumes, needs to be done for each
>>>> execution.
>>>> + Const tables containing exactly zero or one rows are pulled out, but
>>>> + notice the additional constraints checked below.
>>>> + Tables that are pulled out have their rows read before actual
>>>> execution
>>>> + starts. Thus, they do not take part in join order optimization
>>>> process,
>>>> + which can significantly reduce the optimization time.
>>>> + The data read from these tables can also be regarded as "constant"
>>>> + throughout query execution, hence the column values can be used for
>>>> + additional constant propagation.
>>>> + */
>>>> + for (i= 0, s= stat; i< table_count; i++, s++)
>>>> + {
>>>> + TABLE *table= s->table;
>>>> + TABLE_LIST *tables= table->pos_in_table_list;
>>>> +
>>>> +#ifdef WITH_PARTITION_STORAGE_ENGINE
>>>> + const bool no_partitions_used= table->no_partitions_used;
>>>> +#else
>>>> + const bool no_partitions_used= FALSE;
>>>> +#endif
>>>> +
>>>> + if (tables->in_outer_join_nest())
>>>> + {
>>>> + /*
>>>> + Table belongs to a nested join, no candidate for const table
>>>> extraction.
>>>> + */
>>>> + }
>>>> + else if (tables->embedding&& tables->embedding->sj_on_expr&&
>>>> + !join->thd->stmt_arena->is_conventional())
>>>> + {
>>>> + /*
>>>> + Table belongs to a semi-join and this is a PS/SP.
>>>> + No candidate for const table extraction.
>>>> + */
>>>> + }
>>> AFAIK we don't use such construction. And IMO it's better to avoid it,
>>> otherwise we could end up with lots of "if (smth) { /* a comment */ }".
>>> Beside that, in this case empty "if" is effectively the "continue"
>>> statement.
>>> but with continue I don't have to keep track on the closing brace and
>>> check
>>> whether there is something after this "if" block.
>>> Thus I strongly suggest you to rewrite it into something like following:
>>> /*
>>> skip table that:
>>> 1) case #1
>>> 2) case #2
>>> */
>>> if (case #1 || //(1)
>>> case #2) // (2)
>>> continue;
>>> if ...
>>>
>>> This construction is more consistent with the rest of the optimizer
>>> code.
>>
>> That may be, but IMHO a pure if-then-else construct is simpler to read
>> than a
>> combination of if-then-else and continue statements. I deliberately
>> did this to
>> improve readability.
> Yes, I understood your intention from our conversation, but still we
> don't use such code style. So, please, change it, for sake of uniformity
> with the rest of the code.
I am not happy with this suggestion. Using continue is dangerous as it makes the
if-else construct dependent upon the surrounding for loop. I appreciate using
continue to skip code based on prerequisites in simple functions, but I consider
using continue in this function dangerous and error-prone. You should also note
that the structure of this if-test is almost identical to another if-test in the
same function. This is also done to ease later maintenance.
I also do not see a rule in the coding guide that forbids or even advice against
the use of empty if blocks. I think that this is one of the most complex
functions in the optimizer, so I think this kind of defensive programming is
justified here.
>>>
>>>> + else if (*s->on_expr_ref)
>>>> + {
>>>> + /* s is the only inner table of an outer join */
>>>> + if (!table->file->stats.records || no_partitions_used)
>>>> + { // Empty table
>>>> + set_position(join,const_count++,s,(KEYUSE*) 0);
>>>> + }
>>>> + }
>>>> + else
>>>> + {
>>>> + /*
>>>> + Extract const tables (tables with only zero or one rows).
>>>> + Exclude tables that
>>>> + 1. are dependent upon other tables, or
>>>> + 2. have no exact statistics (usually transactional tables), or
>>>> + 3. are full-text searched, or
>>>> + 4. the join object does not allow const table elimination.
>>>> + */
>>>> + if ((table->s->system ||
>>>> + table->file->stats.records<= 1 ||
>>>> + no_partitions_used)&&
>>>> + !s->dependent&& // 1
>>>> + (table->file->ha_table_flags()& HA_STATS_RECORDS_IS_EXACT)&& // 2
>>>> + !table->fulltext_searched&& // 3
>>>> + !join->no_const_tables) // 4
>>>> + {
>>>> + set_position(join,const_count++,s,(KEYUSE*) 0);
>>>> + }
>>>> + }
>>>> + }
>>>> + /* Read tables with 0 or 1 rows (const tables) */
>>>>
>>>> for (POSITION *p_pos=join->positions, *p_end=p_pos+const_count;
>>>> p_pos< p_end ;
>>>> @@ -4478,14 +4570,14 @@ make_join_statistics(JOIN *join, TABLE_L
>>>> } while (keyuse->table == table&& keyuse->key == key);
>>>>
>>>> /*
>>>> - TODO (low priority): currently we ignore the const tables that
>>>> - are within a semi-join nest which is within an outer join nest.
>>>> - The effect of this is that we don't do const substitution for
>>>> - such tables.
>>>> + Extract const tables with proper key dependencies.
>>>> + Exclude tables that
>>>> + 1. are full-text searched, or
>>>> + 2. are part of a semi-join nest or part of a nested outer join.
>>>> */
>>>> if (eq_part.is_prefix(table->key_info[key].key_parts)&&
>>>> - !table->fulltext_searched&&
>>>> - !table->pos_in_table_list->in_outer_join_nest())
>>>> + !table->fulltext_searched&& // 1
>>>> + !table->pos_in_table_list->in_outer_join_nest()) // 2
>>>> {
>>>> if (table->key_info[key].flags& HA_NOSAME)
>>>> {
>>>> @@ -4544,6 +4636,8 @@ make_join_statistics(JOIN *join, TABLE_L
>>>>
>>>> for (s=stat ; s< stat_end ; s++)
>>>> {
>>>> + TABLE_LIST *tl= s->table->pos_in_table_list;
>>>> +
>>>> if (s->type == JT_SYSTEM || s->type == JT_CONST)
>>>> {
>>>> /* Only one matching row */
>>>> @@ -4576,9 +4670,8 @@ make_join_statistics(JOIN *join, TABLE_L
>>>> Do range analysis if we're on the inner side of a semi-join (3).
>>>> */
>>>> if (!s->const_keys.is_clear_all()&& // (1)
>>>> - (!s->table->pos_in_table_list->embedding || // (2)
>>>> - (s->table->pos_in_table_list->embedding&& // (3)
>>>> - s->table->pos_in_table_list->embedding->sj_on_expr))) // (3)
>>>> + (!tl->embedding || // (2)
>>>> + (tl->embedding&& tl->embedding->sj_on_expr))) // (3)
>>>> {
>>>> ha_rows records;
>>>> SQL_SELECT *select;
>>>> @@ -4593,8 +4686,11 @@ make_join_statistics(JOIN *join, TABLE_L
>>>> s->quick=select->quick;
>>>> s->needed_reg=select->needed_reg;
>>>> select->quick=0;
>>>> - if (records == 0&& s->table->reginfo.impossible_range&&
>>>> - (s->table->file->ha_table_flags()& HA_STATS_RECORDS_IS_EXACT))
>>>> + if (records == 0&&
>>>> + s->table->reginfo.impossible_range&&
>>>> + (s->table->file->ha_table_flags()& HA_STATS_RECORDS_IS_EXACT)&&
>>>> + (join->thd->stmt_arena->is_conventional() ||
>>>> + !(tl->embedding&& tl->embedding->sj_on_expr)))
>>>> {
>>>> /*
>>>> Impossible WHERE or ON expression
>>>> @@ -4623,8 +4719,27 @@ make_join_statistics(JOIN *join, TABLE_L
>>>> }
>>>> }
>>>>
>>>> - if (pull_out_semijoin_tables(join))
>>>> - DBUG_RETURN(TRUE);
>>>> + /*
>>>> + Pull out tables based on data volumes. Data volumes may vary from one
>>>> + execution to the next, so this needs to be done for every
>>>> optimization.
> I'm not sure this comment is correct. Because you write "from one
> execution to another" it seems like you're talking about multiple
> executions of a query in PS, but from the code I see that const table
> pullout is actually isn't done for PS at all. Please, correct the comment.
Please have a look at the comment in the new commit. I hope that it is more
correct now.
>>>> + */
>>>> + if (join->thd->stmt_arena->is_conventional())
>>>> + {
>>>> + if (pull_out_semijoin_tables(join))
>>>> + DBUG_RETURN(TRUE);
>>>> + }
>>> I see two problem with this approach:
>>> 1) You're calling this function for the second time but in this case
>>> inside
>>> different 'if' and it's not clear why is that and what's the difference
>>> between these calls. I see the comment but the call itself is
>>> essentially the
>>> same and thus confusing.
>>
>> The comment is there to make it clear what the function will do.
>>
>>> 2) In each call only (roughly) 2/3 of function is working. On the
>>> first call
>>> the const table pullout part is skipped, on the second - functionally
>>> dependent tables are already pulled out.
>>> Thus I suggest you to slightly re-design it:
>>> extract the pullout part and make it pull out tables from a given list,
>>> add functions which will build const tables list and functionally
>>> dependent
>>> tables list and call the pulling out function.
>>> This is pretty straightforward and does the same at the less price.
>>
>> This is not entirely true: The row-count based pullout may cause
>> additional
>> tables to become functionally dependent. Thus, in the second case both
>> parts are
>> needed.
> Ok, then the fact that for non-PS queries pull_out_semijoin_tables is
> called twice and pull out is actually done in 3 steps (pullout func.
> dependent tables, const tables, func dependent tables again) should be
> clearly documented in the description of the pull_out_semijoin_tables
> function.
I have updated the comments of pull_out_semijoin_tables() and
make_join_statistics() to show that pullout may be done in multiple phases.
> Also, please, add the TODO section with the note that doing pull out in
> 3 steps is actually a deficiency of the current design and ideally
> should be fixed and done only in 2.
I do not agree that this is a deficiency. It is a consequence of the refactoring
that splits the pullout into two phases.
There are numerous situations where splitting a complex task into 2 or more
stages decreases complexity, while still retaining good performance. E.g, it is
often possible to perform parsing and semantic analysis in one stage, but real
products usually prefer to perform these as two separate tasks.
>
>>
>> Thus, the cost of going through both loops in both cases is almost
>> negligible.
>>>
>>>> +
>>>> + /*
>>>> + Set embedding semijoin pointer. This must be done after pullout of
>>>> + semijoin tables, but it must also be done for each execution,
>>>> hence it
>>>> + cannot be done in pull_out_semijoin_tables().
>>>> + */
>>>> + for (s= stat; s< stat_end; i++, s++)
>>>> + {
>>>> + TABLE_LIST *tables= s->table->pos_in_table_list;
>>>> + if (tables->embedding&& tables->embedding->sj_on_expr)
>>>> + s->emb_sj_nest= tables->embedding;
>>>> + }
>>>>
>>>> join->join_tab=stat;
>>>> join->map2table=stat_ref;
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>> Thanks,
>> Roy
>>
> Regards, Evgen.
>
Thanks,
Roy