Hi Martin,
thank you for fixing this problem.
The fix is approved, but please fix the comments before pushing.
On 09.02.11 10.58, Martin Hansson wrote:
> #At file:///data0/martin/bzrroot/bug59839/5.1-commit/ based on
> revid:dmitry.shulga@stripped
>
> 3587 Martin Hansson 2011-02-09
> Bug#59839: Aggregation followed by subquery yields wrong result
>
> The loop that was looping over subqueries' references to outer field used a
> local boolean variable to tell whether the field was grouped or not. But the
> implementor failed to reset the variable after each iteration. Thus a field
> that was not directly aggregated appeared to be.
>
> Fixed by resetting the variable upon each new iteration.
>
> modified:
> mysql-test/r/group_by.result
> mysql-test/t/group_by.test
> sql/sql_select.cc
> === modified file 'mysql-test/r/group_by.result'
> --- a/mysql-test/r/group_by.result 2010-10-29 08:23:06 +0000
> +++ b/mysql-test/r/group_by.result 2011-02-09 09:58:43 +0000
> @@ -1855,4 +1855,40 @@ ON 1 WHERE t2.f1> 1 GROUP BY t2.f1;
> COUNT(*)
> 2
> DROP TABLE t1;
> +#
> +# Bug#59839: Aggregation followed by subquery yields wrong result
> +#
> +CREATE TABLE t1 (
> +a INT,
> +b INT,
> +c INT,
> +KEY (a, b)
> +);
> +INSERT INTO t1 VALUES
> +( 1, 1, 1 ),
> +( 1, 2, 2 ),
> +( 1, 3, 3 ),
> +( 1, 4, 6 ),
> +( 1, 5, 5 ),
> +( 1, 9, 13 ),
> +( 2, 1, 6 ),
> +( 2, 2, 7 ),
> +( 2, 3, 8 );
> +EXPLAIN
> +SELECT a, AVG(t1.b),
> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c,
> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c
> +FROM t1 GROUP BY a;
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 PRIMARY t1 index NULL a 10 NULL 9 Using index
> +3 DEPENDENT SUBQUERY t12 ref a a 10 func,func 2 Using where
> +2 DEPENDENT SUBQUERY t11 ref a a 10 func,func 2 Using where
> +SELECT a, AVG(t1.b),
> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c,
> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c
> +FROM t1 GROUP BY a;
> +a AVG(t1.b) t11c t12c
> +1 4.0000 6 6
> +2 2.0000 7 7
> +DROP TABLE t1;
> # End of 5.1 tests
>
> === modified file 'mysql-test/t/group_by.test'
> --- a/mysql-test/t/group_by.test 2010-10-29 08:23:06 +0000
> +++ b/mysql-test/t/group_by.test 2011-02-09 09:58:43 +0000
> @@ -1247,4 +1247,41 @@ ON 1 WHERE t2.f1> 1 GROUP BY t2.f1;
>
> DROP TABLE t1;
>
> +--echo #
> +--echo # Bug#59839: Aggregation followed by subquery yields wrong result
> +--echo #
> +
> +CREATE TABLE t1 (
> + a INT,
> + b INT,
> + c INT,
> + KEY (a, b)
> +);
> +
> +INSERT INTO t1 VALUES
> + ( 1, 1, 1 ),
> + ( 1, 2, 2 ),
> + ( 1, 3, 3 ),
> + ( 1, 4, 6 ),
> + ( 1, 5, 5 ),
> + ( 1, 9, 13 ),
> +
> + ( 2, 1, 6 ),
> + ( 2, 2, 7 ),
> + ( 2, 3, 8 );
> +
> +EXPLAIN
> +SELECT a, AVG(t1.b),
> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c,
> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c
> +FROM t1 GROUP BY a;
> +
> +SELECT a, AVG(t1.b),
> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c,
> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c
> +FROM t1 GROUP BY a;
I'd suggest that you write something like:
set $query=SELECT ...;
eval EXPLAIN $query;
eval $query;
That way you have to write complex query strings only once.
> +
> +DROP TABLE t1;
> +
> +
> --echo # End of 5.1 tests
>
> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc 2011-02-08 11:52:33 +0000
> +++ b/sql/sql_select.cc 2011-02-09 09:58:43 +0000
> @@ -281,46 +281,54 @@ bool handle_select(THD *thd, LEX *lex, s
> /*
/* -> /** (for Doxygen)
> Fix fields referenced from inner selects.
>
> - SYNOPSIS
> - fix_inner_refs()
> - thd Thread handle
> - all_fields List of all fields used in select
> - select Current select
> - ref_pointer_array Array of references to Items used in current select
> - group_list GROUP BY list (is NULL by default)
> + @param thd Thread handle
> + @param all_fields List of all fields used in select
> + @param select Current select
> + @param ref_pointer_array Array of references to Items used in current select
> + @param group_list GROUP BY list (is NULL by default)
>
> - DESCRIPTION
> - The function serves 3 purposes - adds fields referenced from inner
> - selects to the current select list, resolves which class to use
> - to access referenced item (Item_ref of Item_direct_ref) and fixes
> - references (Item_ref objects) to these fields.
> + @details
> + The function serves 3 purposes
> +
> + - adds fields referenced from inner query blocks to the current select list
>
> - If a field isn't already in the select list and the ref_pointer_array
> + - resolves which class to use to access referenced item (Item_ref or
resolves -> decides
access referenced item -> reference the items
> + Item_direct_ref)
> +
> + - fixes references (Item_ref objects) to these fields.
fields -> items
> +
> + If a field isn't already on the select list and the ref_pointer_array
> is provided then it is added to the all_fields list and the pointer to
> it is saved in the ref_pointer_array.
>
> The class to access the outer field is determined by the following rules:
> - 1. If the outer field isn't used under an aggregate function
> - then the Item_ref class should be used.
> - 2. If the outer field is used under an aggregate function and this
> - function is aggregated in the select where the outer field was
> - resolved or in some more inner select then the Item_direct_ref
> - class should be used.
> - Also it should be used if we are grouping by a subquery containing
> - the outer field.
> +
> + -#. If the outer field isn't used under an aggregate function then the
> + Item_ref class should be used.
> +
> + -#. If the outer field is used under an aggregate function and this
> + function is, in turn, aggregated in the query block where the outer
> + field was resolved or some query nested therein, then the
> + Item_direct_ref class should be used. Also it should be used if we are
> + grouping by a subquery containing the outer field.
> +
> The resolution is done here and not at the fix_fields() stage as
> - it can be done only after sum functions are fixed and pulled up to
> - selects where they are have to be aggregated.
> + it can be done only after aggregate functions are fixed and pulled up to
> + selects where they are to be aggregated.
> +
> When the class is chosen it substitutes the original field in the
> Item_outer_ref object.
>
> After this we proceed with fixing references (Item_outer_ref objects) to
> this field from inner subqueries.
>
> - RETURN
> - TRUE an error occured
> - FALSE ok
> -*/
> + @return Status
> + @retval true An error occured.
> + @retval false OK.
This return convention is so common, that I would advice you to write:
@return true if an error occurred, false if success.
> +
> + @todo This comment must be restructured. It is jumping back and forth
> + between abstraction levels and it uses inconsistent terminology.
Hey! There is actual documentation here! You can either remove the @todo, fix
the comment, or pinpoint the jumping and use of inconsistent terminology.
I guess the inconsistet terminology issues are: field vs. item. select vs query
block (I prefer query specification or query expression, they are SQL official
terms),
> + */
>
> bool
> fix_inner_refs(THD *thd, List<Item> &all_fields, SELECT_LEX *select,
> @@ -328,11 +336,11 @@ fix_inner_refs(THD *thd, List<Item> &all
> {
> Item_outer_ref *ref;
> bool res= FALSE;
The local variable res is unused and can be removed.
> - bool direct_ref= FALSE;
>
> List_iterator<Item_outer_ref> ref_it(select->inner_refs_list);
> while ((ref= ref_it++))
> {
> + bool direct_ref= false;
> Item *item= ref->outer_ref;
> Item **item_ref= ref->ref;
> Item_ref *new_ref;
Thanks,
Roy