From: Roy Lyseng Date: June 22 2011 2:59pm Subject: Re: bzr commit into mysql-trunk branch (guilhem.bichot:3386) Bug#12668294 List-Archive: http://lists.mysql.com/commits/139691 Message-Id: <4E02036A.204@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi Guilhem, this bugfix is approved, I have just a comment comment below. On 20.06.11 16.30, Guilhem Bichot wrote: > #At file:///home/mysql_src/bzrrepos_new/mysql-next-mr-opt-backporting/ based on revid:guilhem.bichot@stripped > > 3386 Guilhem Bichot 2011-06-20 > Fix for Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW INSTEAD OF NULL WHEN MATERIALIZATION ON > Reminder of rules, when a join has no rows: > a) "SELECT MIN(field) FROM a join;" > should return a NULL row. > b) "SELECT MIN(field) FROM a join GROUP BY field;" > should return no row. > Summary of bug: send_row_on_empty_set() missed the case of when GROUP > BY has been optimized away, treating (b) like (a). > @ mysql-test/r/subquery_mat.result > A NULL row was wrongly sent, without the code bugfix. > @ mysql-test/r/subquery_mat_all.result > Here there was no bug. > @ mysql-test/r/subquery_mat_none.result > Here there was no bug > @ sql/sql_select.h > Details of what happened are below. > > When semijoin is off and materialization is off. > ================================================ > In resolve_subquery(), IN->EXISTS is done, subquery remains > non-correlated. > Constant tables t1 and t3 are discovered in make_join_statistics(). > Subquery, non-correlated, is evaluated in make_join_select(), here: > if (const_cond&& !const_cond->val_int()) > It is found empty, so the WHERE clause is recognized as impossible, > JOIN::optimize() terminates right after make_join_select(), > JOIN::exec() calls return_zero_rows(). send_row_on_empty_set() sees > that we have GROUP BY (group_list!=NULL) so returns "false" and > return_zero_rows() correctly sends an empty result. > > When semijoin is on and materialization is off. > =============================================== > Semijoin transformation is not done because there is a LEFT > JOIN. IN->EXISTS is done, we are back to the case above. > > When semijoin is on and materialization is on. > ============================================== > Semijoin transformation is not done because there is a LEFT > JOIN. Materialization is not done (indeed > flatten_subqueries() doesn't try to fallback to > materialization - looks like a bug). > IN->EXISTS is done, we are back to the case above. > > When semijoin is off and materialization is on (the bug). > ========================================================= > In resolve_subquery(), subquery is scheduled for materialization. > Constant tables t1 and t3 are discovered in make_join_statistics(). > The (top) query's optimization then progresses while *not* caring for > the subquery; "not caring" is a requirement of subquery > materialization: see for example: > > else if (cond->const_item()&& !cond->is_expensive()) > /* > DontEvaluateMaterializedSubqueryTooEarly: > TODO: > Excluding all expensive functions is too restritive we should exclude only > materialized IN subquery predicates because they can't yet be evaluated > here (they need additional initialization that is done later on). > > The proper way to exclude the subqueries would be to walk the cond tree > and check for materialized subqueries there. > */ > { > *cond_value= eval_const_cond(cond) ? Item::COND_TRUE : Item::COND_FALSE; > > which postpones evaluation of the subquery because it's > to-be-materialized. So make_join_select() doesn't evaluate the > subquery this time. Thus, query's optimization is not shortcut. > The query's execution, in do_select(), sees that all tables of the > LEFT JOIN (i.e. t1, t3) are constant, here : > " if (join->tables == join->const_tables)" > so we have 0 or one row in the result, then do_select() > evaluates the WHERE clause here: > " if (!join->conds || join->conds->val_int())" > which evaluates the subquery. Which is found empty so we go into > else if (join->send_row_on_empty_set()) > But this time, send_row_on_empty_set() returns "true". It's because > it thinks there is no GROUP BY. It thinks so because the top query's > optimization, as it was not shortcut early this time, had reached: > group_list= remove_const(this, (old_group_list= group_list), conds, > rollup.state == ROLLUP::STATE_NONE, > &simple_group); > This remove_const(), which knew that the GROUP BY column (field3) is > from a constant table (t1), was able to optimize away GROUP BY: it > thus set group_list to NULL. Making send_row_on_empty_set() go wrong: > then NULL was wrongly sent. > > The fix: make send_row_on_empty_set() aware of group_optimized_away, a > variable introduced in > sp1r-holyfoot/hf@stripped/hfmain.(none)-20070731054604-56491 > for similar-looking problems. > > modified: > mysql-test/include/subquery_mat.inc > mysql-test/r/subquery_mat.result > mysql-test/r/subquery_mat_all.result > mysql-test/r/subquery_mat_none.result > sql/sql_select.h > === modified file 'mysql-test/include/subquery_mat.inc' > --- a/mysql-test/include/subquery_mat.inc 2011-04-01 11:06:48 +0000 > +++ b/mysql-test/include/subquery_mat.inc 2011-06-20 14:29:59 +0000 > @@ -882,3 +882,32 @@ DROP TABLE t1, t2; > DROP VIEW v3; > > --echo # End Bug#11852644 > + > +--echo > +--echo # Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW > +--echo # INSTEAD OF NULL WHEN MATERIALIZATION ON > +--echo > + > +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY; > +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY; > +INSERT INTO t2 VALUES (8),(7); > +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY; > +INSERT INTO t3 VALUES (7); > + > +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3 > +FROM t3 > + LEFT JOIN t1 > + ON t1.col_int_nokey > +WHERE (194, 200) IN ( > + SELECT SQ4_alias1.col_int_nokey, > + SQ4_alias2.col_int_nokey > + FROM t2 AS SQ4_alias1 > + JOIN > + t2 AS SQ4_alias2 > + ON SQ4_alias2.col_int_nokey = 5 > + ) > +GROUP BY field3 ; t1.col_int_nokey in the SELECT list makes the query non-deterministic (even though this is not a big issue for a single-row query). Is it possible to remove that column and still have a viable test case? > + > +DROP TABLE t1; > +DROP TABLE t2; > +DROP TABLE t3; > > === modified file 'mysql-test/r/subquery_mat.result' > --- a/mysql-test/r/subquery_mat.result 2011-04-01 11:06:48 +0000 > +++ b/mysql-test/r/subquery_mat.result 2011-06-20 14:29:59 +0000 > @@ -1152,4 +1152,30 @@ r > DROP TABLE t1, t2; > DROP VIEW v3; > # End Bug#11852644 > + > +# Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW > +# INSTEAD OF NULL WHEN MATERIALIZATION ON > + > +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY; > +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY; > +INSERT INTO t2 VALUES (8),(7); > +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY; > +INSERT INTO t3 VALUES (7); > +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3 > +FROM t3 > +LEFT JOIN t1 > +ON t1.col_int_nokey > +WHERE (194, 200) IN ( > +SELECT SQ4_alias1.col_int_nokey, > +SQ4_alias2.col_int_nokey > +FROM t2 AS SQ4_alias1 > +JOIN > +t2 AS SQ4_alias2 > +ON SQ4_alias2.col_int_nokey = 5 > +) > +GROUP BY field3 ; > +MIN(t3.col_int_nokey) field3 > +DROP TABLE t1; > +DROP TABLE t2; > +DROP TABLE t3; > set optimizer_switch=default; > > === modified file 'mysql-test/r/subquery_mat_all.result' > --- a/mysql-test/r/subquery_mat_all.result 2011-04-01 11:06:48 +0000 > +++ b/mysql-test/r/subquery_mat_all.result 2011-06-20 14:29:59 +0000 > @@ -1151,4 +1151,30 @@ r > DROP TABLE t1, t2; > DROP VIEW v3; > # End Bug#11852644 > + > +# Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW > +# INSTEAD OF NULL WHEN MATERIALIZATION ON > + > +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY; > +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY; > +INSERT INTO t2 VALUES (8),(7); > +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY; > +INSERT INTO t3 VALUES (7); > +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3 > +FROM t3 > +LEFT JOIN t1 > +ON t1.col_int_nokey > +WHERE (194, 200) IN ( > +SELECT SQ4_alias1.col_int_nokey, > +SQ4_alias2.col_int_nokey > +FROM t2 AS SQ4_alias1 > +JOIN > +t2 AS SQ4_alias2 > +ON SQ4_alias2.col_int_nokey = 5 > +) > +GROUP BY field3 ; > +MIN(t3.col_int_nokey) field3 > +DROP TABLE t1; > +DROP TABLE t2; > +DROP TABLE t3; > set optimizer_switch=default; > > === modified file 'mysql-test/r/subquery_mat_none.result' > --- a/mysql-test/r/subquery_mat_none.result 2011-04-01 11:06:48 +0000 > +++ b/mysql-test/r/subquery_mat_none.result 2011-06-20 14:29:59 +0000 > @@ -1150,4 +1150,30 @@ r > DROP TABLE t1, t2; > DROP VIEW v3; > # End Bug#11852644 > + > +# Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW > +# INSTEAD OF NULL WHEN MATERIALIZATION ON > + > +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY; > +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY; > +INSERT INTO t2 VALUES (8),(7); > +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY; > +INSERT INTO t3 VALUES (7); > +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3 > +FROM t3 > +LEFT JOIN t1 > +ON t1.col_int_nokey > +WHERE (194, 200) IN ( > +SELECT SQ4_alias1.col_int_nokey, > +SQ4_alias2.col_int_nokey > +FROM t2 AS SQ4_alias1 > +JOIN > +t2 AS SQ4_alias2 > +ON SQ4_alias2.col_int_nokey = 5 > +) > +GROUP BY field3 ; > +MIN(t3.col_int_nokey) field3 > +DROP TABLE t1; > +DROP TABLE t2; > +DROP TABLE t3; > set optimizer_switch=default; > > === modified file 'sql/sql_select.h' > --- a/sql/sql_select.h 2011-06-17 12:45:22 +0000 > +++ b/sql/sql_select.h 2011-06-20 14:29:59 +0000 > @@ -2029,10 +2029,16 @@ public: > void clear(); > bool save_join_tab(); > bool init_save_join_tab(); > + /** > + If there is an aggregate function (sum_func_count!=0), and no GROUP BY, > + and the HAVING clause evaluates to "true", we should send a row even if > + the join contains no rows. This is what the standard requires. > + */ Suggestion: Structure the comment as follows: /** Send a row even if the join produced no rows if: - there is an aggregate function (sum_func_count!=0), and - the query is not grouped, and - a possible HAVING clause evaluates to TRUE. */ I am not completely sure about the standards compliance, but I am unable to find a test case that causes an error. In SQL-2008, there is this syntax: "GROUP BY ()" which creates one group with 0 or more rows in it. An aggregate function without an explicit GROUP BY implies GROUP BY (). A HAVING clause without a preceding GROUP BY also implies GROUP BY (). MySQL does not support GROUP BY (), but the construct is merely a formalized version of the implicit grouping found in earlier SQL standards. This will cause the select statement: SELECT 1 FROM t WHERE HAVING COUNT(*) = 0; to report a row, and indeed it does. However, the execution does not reach JOIN::send_row_on_empty_set(), so I am not quite sure how that row is generated... Beware also the MySQL extension that HAVING without a set function is equivalent to a WHERE... > bool send_row_on_empty_set() > { > return (do_send_rows&& tmp_table_param.sum_func_count != 0&& > - !group_list&& select_lex->having_value != Item::COND_FALSE); > + !(group_list != NULL || group_optimized_away)&& > + select_lex->having_value != Item::COND_FALSE); > } > bool change_result(select_result *result); > bool is_top_level_join() const Thanks, Roy