From: Roy Lyseng
Date: June 22 2011 2:59pm
Subject: Re: bzr commit into mysql-trunk branch (guilhem.bichot:3386) Bug#12668294
List-Archive: http://lists.mysql.com/commits/139691
Message-Id: <4E02036A.204@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Hi Guilhem,
this bugfix is approved, I have just a comment comment below.
On 20.06.11 16.30, Guilhem Bichot wrote:
> #At file:///home/mysql_src/bzrrepos_new/mysql-next-mr-opt-backporting/ based on revid:guilhem.bichot@stripped
>
> 3386 Guilhem Bichot 2011-06-20
> Fix for Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW INSTEAD OF NULL WHEN MATERIALIZATION ON
> Reminder of rules, when a join has no rows:
> a) "SELECT MIN(field) FROM a join;"
> should return a NULL row.
> b) "SELECT MIN(field) FROM a join GROUP BY field;"
> should return no row.
> Summary of bug: send_row_on_empty_set() missed the case of when GROUP
> BY has been optimized away, treating (b) like (a).
> @ mysql-test/r/subquery_mat.result
> A NULL row was wrongly sent, without the code bugfix.
> @ mysql-test/r/subquery_mat_all.result
> Here there was no bug.
> @ mysql-test/r/subquery_mat_none.result
> Here there was no bug
> @ sql/sql_select.h
> Details of what happened are below.
>
> When semijoin is off and materialization is off.
> ================================================
> In resolve_subquery(), IN->EXISTS is done, subquery remains
> non-correlated.
> Constant tables t1 and t3 are discovered in make_join_statistics().
> Subquery, non-correlated, is evaluated in make_join_select(), here:
> if (const_cond&& !const_cond->val_int())
> It is found empty, so the WHERE clause is recognized as impossible,
> JOIN::optimize() terminates right after make_join_select(),
> JOIN::exec() calls return_zero_rows(). send_row_on_empty_set() sees
> that we have GROUP BY (group_list!=NULL) so returns "false" and
> return_zero_rows() correctly sends an empty result.
>
> When semijoin is on and materialization is off.
> ===============================================
> Semijoin transformation is not done because there is a LEFT
> JOIN. IN->EXISTS is done, we are back to the case above.
>
> When semijoin is on and materialization is on.
> ==============================================
> Semijoin transformation is not done because there is a LEFT
> JOIN. Materialization is not done (indeed
> flatten_subqueries() doesn't try to fallback to
> materialization - looks like a bug).
> IN->EXISTS is done, we are back to the case above.
>
> When semijoin is off and materialization is on (the bug).
> =========================================================
> In resolve_subquery(), subquery is scheduled for materialization.
> Constant tables t1 and t3 are discovered in make_join_statistics().
> The (top) query's optimization then progresses while *not* caring for
> the subquery; "not caring" is a requirement of subquery
> materialization: see for example:
>
> else if (cond->const_item()&& !cond->is_expensive())
> /*
> DontEvaluateMaterializedSubqueryTooEarly:
> TODO:
> Excluding all expensive functions is too restritive we should exclude only
> materialized IN subquery predicates because they can't yet be evaluated
> here (they need additional initialization that is done later on).
>
> The proper way to exclude the subqueries would be to walk the cond tree
> and check for materialized subqueries there.
> */
> {
> *cond_value= eval_const_cond(cond) ? Item::COND_TRUE : Item::COND_FALSE;
>
> which postpones evaluation of the subquery because it's
> to-be-materialized. So make_join_select() doesn't evaluate the
> subquery this time. Thus, query's optimization is not shortcut.
> The query's execution, in do_select(), sees that all tables of the
> LEFT JOIN (i.e. t1, t3) are constant, here :
> " if (join->tables == join->const_tables)"
> so we have 0 or one row in the result, then do_select()
> evaluates the WHERE clause here:
> " if (!join->conds || join->conds->val_int())"
> which evaluates the subquery. Which is found empty so we go into
> else if (join->send_row_on_empty_set())
> But this time, send_row_on_empty_set() returns "true". It's because
> it thinks there is no GROUP BY. It thinks so because the top query's
> optimization, as it was not shortcut early this time, had reached:
> group_list= remove_const(this, (old_group_list= group_list), conds,
> rollup.state == ROLLUP::STATE_NONE,
> &simple_group);
> This remove_const(), which knew that the GROUP BY column (field3) is
> from a constant table (t1), was able to optimize away GROUP BY: it
> thus set group_list to NULL. Making send_row_on_empty_set() go wrong:
> then NULL was wrongly sent.
>
> The fix: make send_row_on_empty_set() aware of group_optimized_away, a
> variable introduced in
> sp1r-holyfoot/hf@stripped/hfmain.(none)-20070731054604-56491
> for similar-looking problems.
>
> modified:
> mysql-test/include/subquery_mat.inc
> mysql-test/r/subquery_mat.result
> mysql-test/r/subquery_mat_all.result
> mysql-test/r/subquery_mat_none.result
> sql/sql_select.h
> === modified file 'mysql-test/include/subquery_mat.inc'
> --- a/mysql-test/include/subquery_mat.inc 2011-04-01 11:06:48 +0000
> +++ b/mysql-test/include/subquery_mat.inc 2011-06-20 14:29:59 +0000
> @@ -882,3 +882,32 @@ DROP TABLE t1, t2;
> DROP VIEW v3;
>
> --echo # End Bug#11852644
> +
> +--echo
> +--echo # Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW
> +--echo # INSTEAD OF NULL WHEN MATERIALIZATION ON
> +--echo
> +
> +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY;
> +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t2 VALUES (8),(7);
> +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t3 VALUES (7);
> +
> +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3
> +FROM t3
> + LEFT JOIN t1
> + ON t1.col_int_nokey
> +WHERE (194, 200) IN (
> + SELECT SQ4_alias1.col_int_nokey,
> + SQ4_alias2.col_int_nokey
> + FROM t2 AS SQ4_alias1
> + JOIN
> + t2 AS SQ4_alias2
> + ON SQ4_alias2.col_int_nokey = 5
> + )
> +GROUP BY field3 ;
t1.col_int_nokey in the SELECT list makes the query non-deterministic (even though this is not a big issue for a single-row query). Is it possible to remove that column and still have a viable test case?
> +
> +DROP TABLE t1;
> +DROP TABLE t2;
> +DROP TABLE t3;
>
> === modified file 'mysql-test/r/subquery_mat.result'
> --- a/mysql-test/r/subquery_mat.result 2011-04-01 11:06:48 +0000
> +++ b/mysql-test/r/subquery_mat.result 2011-06-20 14:29:59 +0000
> @@ -1152,4 +1152,30 @@ r
> DROP TABLE t1, t2;
> DROP VIEW v3;
> # End Bug#11852644
> +
> +# Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW
> +# INSTEAD OF NULL WHEN MATERIALIZATION ON
> +
> +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY;
> +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t2 VALUES (8),(7);
> +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t3 VALUES (7);
> +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3
> +FROM t3
> +LEFT JOIN t1
> +ON t1.col_int_nokey
> +WHERE (194, 200) IN (
> +SELECT SQ4_alias1.col_int_nokey,
> +SQ4_alias2.col_int_nokey
> +FROM t2 AS SQ4_alias1
> +JOIN
> +t2 AS SQ4_alias2
> +ON SQ4_alias2.col_int_nokey = 5
> +)
> +GROUP BY field3 ;
> +MIN(t3.col_int_nokey) field3
> +DROP TABLE t1;
> +DROP TABLE t2;
> +DROP TABLE t3;
> set optimizer_switch=default;
>
> === modified file 'mysql-test/r/subquery_mat_all.result'
> --- a/mysql-test/r/subquery_mat_all.result 2011-04-01 11:06:48 +0000
> +++ b/mysql-test/r/subquery_mat_all.result 2011-06-20 14:29:59 +0000
> @@ -1151,4 +1151,30 @@ r
> DROP TABLE t1, t2;
> DROP VIEW v3;
> # End Bug#11852644
> +
> +# Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW
> +# INSTEAD OF NULL WHEN MATERIALIZATION ON
> +
> +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY;
> +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t2 VALUES (8),(7);
> +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t3 VALUES (7);
> +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3
> +FROM t3
> +LEFT JOIN t1
> +ON t1.col_int_nokey
> +WHERE (194, 200) IN (
> +SELECT SQ4_alias1.col_int_nokey,
> +SQ4_alias2.col_int_nokey
> +FROM t2 AS SQ4_alias1
> +JOIN
> +t2 AS SQ4_alias2
> +ON SQ4_alias2.col_int_nokey = 5
> +)
> +GROUP BY field3 ;
> +MIN(t3.col_int_nokey) field3
> +DROP TABLE t1;
> +DROP TABLE t2;
> +DROP TABLE t3;
> set optimizer_switch=default;
>
> === modified file 'mysql-test/r/subquery_mat_none.result'
> --- a/mysql-test/r/subquery_mat_none.result 2011-04-01 11:06:48 +0000
> +++ b/mysql-test/r/subquery_mat_none.result 2011-06-20 14:29:59 +0000
> @@ -1150,4 +1150,30 @@ r
> DROP TABLE t1, t2;
> DROP VIEW v3;
> # End Bug#11852644
> +
> +# Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW
> +# INSTEAD OF NULL WHEN MATERIALIZATION ON
> +
> +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY;
> +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t2 VALUES (8),(7);
> +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t3 VALUES (7);
> +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3
> +FROM t3
> +LEFT JOIN t1
> +ON t1.col_int_nokey
> +WHERE (194, 200) IN (
> +SELECT SQ4_alias1.col_int_nokey,
> +SQ4_alias2.col_int_nokey
> +FROM t2 AS SQ4_alias1
> +JOIN
> +t2 AS SQ4_alias2
> +ON SQ4_alias2.col_int_nokey = 5
> +)
> +GROUP BY field3 ;
> +MIN(t3.col_int_nokey) field3
> +DROP TABLE t1;
> +DROP TABLE t2;
> +DROP TABLE t3;
> set optimizer_switch=default;
>
> === modified file 'sql/sql_select.h'
> --- a/sql/sql_select.h 2011-06-17 12:45:22 +0000
> +++ b/sql/sql_select.h 2011-06-20 14:29:59 +0000
> @@ -2029,10 +2029,16 @@ public:
> void clear();
> bool save_join_tab();
> bool init_save_join_tab();
> + /**
> + If there is an aggregate function (sum_func_count!=0), and no GROUP BY,
> + and the HAVING clause evaluates to "true", we should send a row even if
> + the join contains no rows. This is what the standard requires.
> + */
Suggestion: Structure the comment as follows:
/**
Send a row even if the join produced no rows if:
- there is an aggregate function (sum_func_count!=0), and
- the query is not grouped, and
- a possible HAVING clause evaluates to TRUE.
*/
I am not completely sure about the standards compliance, but I am unable to find
a test case that causes an error. In SQL-2008, there is this syntax: "GROUP BY
()" which creates one group with 0 or more rows in it. An aggregate function
without an explicit GROUP BY implies GROUP BY (). A HAVING clause without a
preceding GROUP BY also implies GROUP BY ().
MySQL does not support GROUP BY (), but the construct is merely a formalized
version of the implicit grouping found in earlier SQL standards.
This will cause the select statement:
SELECT 1 FROM t WHERE HAVING COUNT(*) = 0;
to report a row, and indeed it does. However, the execution does not reach
JOIN::send_row_on_empty_set(), so I am not quite sure how that row is generated...
Beware also the MySQL extension that HAVING without a set function is equivalent
to a WHERE...
> bool send_row_on_empty_set()
> {
> return (do_send_rows&& tmp_table_param.sum_func_count != 0&&
> - !group_list&& select_lex->having_value != Item::COND_FALSE);
> + !(group_list != NULL || group_optimized_away)&&
> + select_lex->having_value != Item::COND_FALSE);
> }
> bool change_result(select_result *result);
> bool is_top_level_join() const
Thanks,
Roy