List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:June 22 2011 2:59pm
Subject:Re: bzr commit into mysql-trunk branch (guilhem.bichot:3386) Bug#12668294
View as plain text  
Hi Guilhem,

this bugfix is approved, I have just a comment comment below.

On 20.06.11 16.30, Guilhem Bichot wrote:
> #At file:///home/mysql_src/bzrrepos_new/mysql-next-mr-opt-backporting/ based on
> revid:guilhem.bichot@stripped
>
>   3386 Guilhem Bichot	2011-06-20
>        Fix for Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW INSTEAD OF
> NULL WHEN MATERIALIZATION ON
>        Reminder of rules, when a join has no rows:
>        a) "SELECT MIN(field) FROM a join;"
>        should return a NULL row.
>        b) "SELECT MIN(field) FROM a join GROUP BY field;"
>        should return no row.
>        Summary of bug: send_row_on_empty_set() missed the case of when GROUP
>        BY has been optimized away, treating (b) like (a).
>       @ mysql-test/r/subquery_mat.result
>          A NULL row was wrongly sent, without the code bugfix.
>       @ mysql-test/r/subquery_mat_all.result
>          Here there was no bug.
>       @ mysql-test/r/subquery_mat_none.result
>          Here there was no bug
>       @ sql/sql_select.h
>          Details of what happened are below.
>
>          When semijoin is off and materialization is off.
>          ================================================
>          In resolve_subquery(), IN->EXISTS is done, subquery remains
>          non-correlated.
>          Constant tables t1 and t3 are discovered in make_join_statistics().
>          Subquery, non-correlated, is evaluated in make_join_select(), here:
>             if (const_cond&&  !const_cond->val_int())
>          It is found empty, so the WHERE clause is recognized as impossible,
>          JOIN::optimize() terminates right after make_join_select(),
>          JOIN::exec() calls return_zero_rows(). send_row_on_empty_set() sees
>          that we have GROUP BY (group_list!=NULL) so returns "false" and
>          return_zero_rows() correctly sends an empty result.
>
>          When semijoin is on and materialization is off.
>          ===============================================
>          Semijoin transformation is not done because there is a LEFT
>          JOIN. IN->EXISTS is done, we are back to the case above.
>
>          When semijoin is on and materialization is on.
>          ==============================================
>          Semijoin transformation is not done because there is a LEFT
>          JOIN. Materialization is not done (indeed
>          flatten_subqueries() doesn't try to fallback to
>          materialization - looks like a bug).
>          IN->EXISTS is done, we are back to the case above.
>
>          When semijoin is off and materialization is on (the bug).
>          =========================================================
>          In resolve_subquery(), subquery is scheduled for materialization.
>          Constant tables t1 and t3 are discovered in make_join_statistics().
>          The (top) query's optimization then progresses while *not* caring for
>          the subquery; "not caring" is a requirement of subquery
>          materialization: see for example:
>          <code>
>            else if (cond->const_item()&&  !cond->is_expensive())
>            /*
>              DontEvaluateMaterializedSubqueryTooEarly:
>              TODO:
>              Excluding all expensive functions is too restritive we should exclude
> only
>              materialized IN subquery predicates because they can't yet be evaluated
>              here (they need additional initialization that is done later on).
>
>              The proper way to exclude the subqueries would be to walk the cond tree
>              and check for materialized subqueries there.
>            */
>            {
>              *cond_value= eval_const_cond(cond) ? Item::COND_TRUE :
> Item::COND_FALSE;
>          </code>
>          which postpones evaluation of the subquery because it's
>          to-be-materialized. So make_join_select() doesn't evaluate the
>          subquery this time. Thus, query's optimization is not shortcut.
>          The query's execution, in do_select(), sees that all tables of the
>          LEFT JOIN (i.e. t1, t3) are constant, here :
>          "  if (join->tables == join->const_tables)"
>          so we have 0 or one row in the result, then do_select()
>          evaluates the WHERE clause here:
>          "    if (!join->conds || join->conds->val_int())"
>          which evaluates the subquery. Which is found empty so we go into
>              else if (join->send_row_on_empty_set())
>          But this time, send_row_on_empty_set() returns "true". It's because
>          it thinks there is no GROUP BY. It thinks so because the top query's
>          optimization, as it was not shortcut early this time, had reached:
>              group_list= remove_const(this, (old_group_list= group_list), conds,
>                                       rollup.state == ROLLUP::STATE_NONE,
>          			&simple_group);
>          This remove_const(), which knew that the GROUP BY column (field3) is
>          from a constant table (t1), was able to optimize away GROUP BY: it
>          thus set group_list to NULL. Making send_row_on_empty_set() go wrong:
>          then NULL was wrongly sent.
>
>          The fix: make send_row_on_empty_set() aware of group_optimized_away, a
>          variable introduced in
>          sp1r-holyfoot/hf@stripped/hfmain.(none)-20070731054604-56491
>          for similar-looking problems.
>
>      modified:
>        mysql-test/include/subquery_mat.inc
>        mysql-test/r/subquery_mat.result
>        mysql-test/r/subquery_mat_all.result
>        mysql-test/r/subquery_mat_none.result
>        sql/sql_select.h
> === modified file 'mysql-test/include/subquery_mat.inc'
> --- a/mysql-test/include/subquery_mat.inc	2011-04-01 11:06:48 +0000
> +++ b/mysql-test/include/subquery_mat.inc	2011-06-20 14:29:59 +0000
> @@ -882,3 +882,32 @@ DROP TABLE t1, t2;
>   DROP VIEW v3;
>
>   --echo # End Bug#11852644
> +
> +--echo
> +--echo # Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW
> +--echo # INSTEAD OF NULL WHEN MATERIALIZATION ON
> +--echo
> +
> +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY;
> +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t2 VALUES (8),(7);
> +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t3 VALUES (7);
> +
> +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3
> +FROM t3
> +     LEFT JOIN t1
> +     ON t1.col_int_nokey
> +WHERE (194, 200) IN (
> +                     SELECT SQ4_alias1.col_int_nokey,
> +                     SQ4_alias2.col_int_nokey
> +                     FROM t2 AS SQ4_alias1
> +                          JOIN
> +                          t2 AS SQ4_alias2
> +                          ON SQ4_alias2.col_int_nokey = 5
> +                    )
> +GROUP BY field3 ;

t1.col_int_nokey in the SELECT list makes the query non-deterministic (even though this is
not a big issue for a single-row query). Is it possible to remove that column and still
have a viable test case?


> +
> +DROP TABLE t1;
> +DROP TABLE t2;
> +DROP TABLE t3;
>
> === modified file 'mysql-test/r/subquery_mat.result'
> --- a/mysql-test/r/subquery_mat.result	2011-04-01 11:06:48 +0000
> +++ b/mysql-test/r/subquery_mat.result	2011-06-20 14:29:59 +0000
> @@ -1152,4 +1152,30 @@ r
>   DROP TABLE t1, t2;
>   DROP VIEW v3;
>   # End Bug#11852644
> +
> +# Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW
> +# INSTEAD OF NULL WHEN MATERIALIZATION ON
> +
> +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY;
> +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t2 VALUES (8),(7);
> +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t3 VALUES (7);
> +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3
> +FROM t3
> +LEFT JOIN t1
> +ON t1.col_int_nokey
> +WHERE (194, 200) IN (
> +SELECT SQ4_alias1.col_int_nokey,
> +SQ4_alias2.col_int_nokey
> +FROM t2 AS SQ4_alias1
> +JOIN
> +t2 AS SQ4_alias2
> +ON SQ4_alias2.col_int_nokey = 5
> +)
> +GROUP BY field3 ;
> +MIN(t3.col_int_nokey)	field3
> +DROP TABLE t1;
> +DROP TABLE t2;
> +DROP TABLE t3;
>   set optimizer_switch=default;
>
> === modified file 'mysql-test/r/subquery_mat_all.result'
> --- a/mysql-test/r/subquery_mat_all.result	2011-04-01 11:06:48 +0000
> +++ b/mysql-test/r/subquery_mat_all.result	2011-06-20 14:29:59 +0000
> @@ -1151,4 +1151,30 @@ r
>   DROP TABLE t1, t2;
>   DROP VIEW v3;
>   # End Bug#11852644
> +
> +# Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW
> +# INSTEAD OF NULL WHEN MATERIALIZATION ON
> +
> +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY;
> +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t2 VALUES (8),(7);
> +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t3 VALUES (7);
> +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3
> +FROM t3
> +LEFT JOIN t1
> +ON t1.col_int_nokey
> +WHERE (194, 200) IN (
> +SELECT SQ4_alias1.col_int_nokey,
> +SQ4_alias2.col_int_nokey
> +FROM t2 AS SQ4_alias1
> +JOIN
> +t2 AS SQ4_alias2
> +ON SQ4_alias2.col_int_nokey = 5
> +)
> +GROUP BY field3 ;
> +MIN(t3.col_int_nokey)	field3
> +DROP TABLE t1;
> +DROP TABLE t2;
> +DROP TABLE t3;
>   set optimizer_switch=default;
>
> === modified file 'mysql-test/r/subquery_mat_none.result'
> --- a/mysql-test/r/subquery_mat_none.result	2011-04-01 11:06:48 +0000
> +++ b/mysql-test/r/subquery_mat_none.result	2011-06-20 14:29:59 +0000
> @@ -1150,4 +1150,30 @@ r
>   DROP TABLE t1, t2;
>   DROP VIEW v3;
>   # End Bug#11852644
> +
> +# Bug#12668294 - GROUP BY ON EMPTY RESULT GIVES EMPTY ROW
> +# INSTEAD OF NULL WHEN MATERIALIZATION ON
> +
> +CREATE TABLE t1 (col_int_nokey INT) ENGINE=MEMORY;
> +CREATE TABLE t2 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t2 VALUES (8),(7);
> +CREATE TABLE t3 (col_int_nokey INT) ENGINE=MEMORY;
> +INSERT INTO t3 VALUES (7);
> +SELECT MIN(t3.col_int_nokey),t1.col_int_nokey AS field3
> +FROM t3
> +LEFT JOIN t1
> +ON t1.col_int_nokey
> +WHERE (194, 200) IN (
> +SELECT SQ4_alias1.col_int_nokey,
> +SQ4_alias2.col_int_nokey
> +FROM t2 AS SQ4_alias1
> +JOIN
> +t2 AS SQ4_alias2
> +ON SQ4_alias2.col_int_nokey = 5
> +)
> +GROUP BY field3 ;
> +MIN(t3.col_int_nokey)	field3
> +DROP TABLE t1;
> +DROP TABLE t2;
> +DROP TABLE t3;
>   set optimizer_switch=default;
>
> === modified file 'sql/sql_select.h'
> --- a/sql/sql_select.h	2011-06-17 12:45:22 +0000
> +++ b/sql/sql_select.h	2011-06-20 14:29:59 +0000
> @@ -2029,10 +2029,16 @@ public:
>     void clear();
>     bool save_join_tab();
>     bool init_save_join_tab();
> +  /**
> +     If there is an aggregate function (sum_func_count!=0), and no GROUP BY,
> +     and the HAVING clause evaluates to "true", we should send a row even if
> +     the join contains no rows. This is what the standard requires.
> +   */
Suggestion: Structure the comment as follows:

/**
   Send a row even if the join produced no rows if:
    -  there is an aggregate function (sum_func_count!=0), and
    - the query is not grouped, and
    - a possible HAVING clause evaluates to TRUE.
*/

I am not completely sure about the standards compliance, but I am unable to find 
a test case that causes an error. In SQL-2008, there is this syntax: "GROUP BY 
()" which creates one group with 0 or more rows in it. An aggregate function 
without an explicit GROUP BY implies GROUP BY (). A HAVING clause without a 
preceding GROUP BY also implies GROUP BY ().

MySQL does not support GROUP BY (), but the construct is merely a formalized 
version of the implicit grouping found in earlier SQL standards.

This will cause the select statement:
SELECT 1 FROM t WHERE <impossible-where> HAVING COUNT(*) = 0;
to report a row, and indeed it does. However, the execution does not reach 
JOIN::send_row_on_empty_set(), so I am not quite sure how that row is generated...

Beware also the MySQL extension that HAVING without a set function is equivalent 
to a WHERE...

>     bool send_row_on_empty_set()
>     {
>       return (do_send_rows&&  tmp_table_param.sum_func_count != 0&&
> -	    !group_list&&  select_lex->having_value != Item::COND_FALSE);
> +	    !(group_list != NULL || group_optimized_away)&&
> +            select_lex->having_value != Item::COND_FALSE);
>     }
>     bool change_result(select_result *result);
>     bool is_top_level_join() const

Thanks,
Roy
Thread
bzr commit into mysql-trunk branch (guilhem.bichot:3386) Bug#12668294Guilhem Bichot20 Jun
  • Re: bzr commit into mysql-trunk branch (guilhem.bichot:3386) Bug#12668294Roy Lyseng23 Jun
    • Re: bzr commit into mysql-trunk branch (guilhem.bichot:3386) Bug#12668294Guilhem Bichot25 Jun
  • Re: bzr commit into mysql-trunk branch (guilhem.bichot:3386) Bug#12668294Jorgen Loland23 Jun
    • Re: bzr commit into mysql-trunk branch (guilhem.bichot:3386) Bug#12668294Guilhem Bichot25 Jun
Re: bzr commit into mysql-trunk branch (guilhem.bichot:3386) Bug#12668294Roy Lyseng25 Jun