From: Roy Lyseng Date: February 15 2011 12:12pm Subject: Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839 List-Archive: http://lists.mysql.com/commits/131316 Message-Id: <4D5A6DC9.60007@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi Martin, thank you for fixing this problem. The fix is approved, but please fix the comments before pushing. On 09.02.11 10.58, Martin Hansson wrote: > #At file:///data0/martin/bzrroot/bug59839/5.1-commit/ based on revid:dmitry.shulga@stripped > > 3587 Martin Hansson 2011-02-09 > Bug#59839: Aggregation followed by subquery yields wrong result > > The loop that was looping over subqueries' references to outer field used a > local boolean variable to tell whether the field was grouped or not. But the > implementor failed to reset the variable after each iteration. Thus a field > that was not directly aggregated appeared to be. > > Fixed by resetting the variable upon each new iteration. > > modified: > mysql-test/r/group_by.result > mysql-test/t/group_by.test > sql/sql_select.cc > === modified file 'mysql-test/r/group_by.result' > --- a/mysql-test/r/group_by.result 2010-10-29 08:23:06 +0000 > +++ b/mysql-test/r/group_by.result 2011-02-09 09:58:43 +0000 > @@ -1855,4 +1855,40 @@ ON 1 WHERE t2.f1> 1 GROUP BY t2.f1; > COUNT(*) > 2 > DROP TABLE t1; > +# > +# Bug#59839: Aggregation followed by subquery yields wrong result > +# > +CREATE TABLE t1 ( > +a INT, > +b INT, > +c INT, > +KEY (a, b) > +); > +INSERT INTO t1 VALUES > +( 1, 1, 1 ), > +( 1, 2, 2 ), > +( 1, 3, 3 ), > +( 1, 4, 6 ), > +( 1, 5, 5 ), > +( 1, 9, 13 ), > +( 2, 1, 6 ), > +( 2, 2, 7 ), > +( 2, 3, 8 ); > +EXPLAIN > +SELECT a, AVG(t1.b), > +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c, > +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c > +FROM t1 GROUP BY a; > +id select_type table type possible_keys key key_len ref rows Extra > +1 PRIMARY t1 index NULL a 10 NULL 9 Using index > +3 DEPENDENT SUBQUERY t12 ref a a 10 func,func 2 Using where > +2 DEPENDENT SUBQUERY t11 ref a a 10 func,func 2 Using where > +SELECT a, AVG(t1.b), > +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c, > +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c > +FROM t1 GROUP BY a; > +a AVG(t1.b) t11c t12c > +1 4.0000 6 6 > +2 2.0000 7 7 > +DROP TABLE t1; > # End of 5.1 tests > > === modified file 'mysql-test/t/group_by.test' > --- a/mysql-test/t/group_by.test 2010-10-29 08:23:06 +0000 > +++ b/mysql-test/t/group_by.test 2011-02-09 09:58:43 +0000 > @@ -1247,4 +1247,41 @@ ON 1 WHERE t2.f1> 1 GROUP BY t2.f1; > > DROP TABLE t1; > > +--echo # > +--echo # Bug#59839: Aggregation followed by subquery yields wrong result > +--echo # > + > +CREATE TABLE t1 ( > + a INT, > + b INT, > + c INT, > + KEY (a, b) > +); > + > +INSERT INTO t1 VALUES > + ( 1, 1, 1 ), > + ( 1, 2, 2 ), > + ( 1, 3, 3 ), > + ( 1, 4, 6 ), > + ( 1, 5, 5 ), > + ( 1, 9, 13 ), > + > + ( 2, 1, 6 ), > + ( 2, 2, 7 ), > + ( 2, 3, 8 ); > + > +EXPLAIN > +SELECT a, AVG(t1.b), > +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c, > +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c > +FROM t1 GROUP BY a; > + > +SELECT a, AVG(t1.b), > +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c, > +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c > +FROM t1 GROUP BY a; I'd suggest that you write something like: set $query=SELECT ...; eval EXPLAIN $query; eval $query; That way you have to write complex query strings only once. > + > +DROP TABLE t1; > + > + > --echo # End of 5.1 tests > > === modified file 'sql/sql_select.cc' > --- a/sql/sql_select.cc 2011-02-08 11:52:33 +0000 > +++ b/sql/sql_select.cc 2011-02-09 09:58:43 +0000 > @@ -281,46 +281,54 @@ bool handle_select(THD *thd, LEX *lex, s > /* /* -> /** (for Doxygen) > Fix fields referenced from inner selects. > > - SYNOPSIS > - fix_inner_refs() > - thd Thread handle > - all_fields List of all fields used in select > - select Current select > - ref_pointer_array Array of references to Items used in current select > - group_list GROUP BY list (is NULL by default) > + @param thd Thread handle > + @param all_fields List of all fields used in select > + @param select Current select > + @param ref_pointer_array Array of references to Items used in current select > + @param group_list GROUP BY list (is NULL by default) > > - DESCRIPTION > - The function serves 3 purposes - adds fields referenced from inner > - selects to the current select list, resolves which class to use > - to access referenced item (Item_ref of Item_direct_ref) and fixes > - references (Item_ref objects) to these fields. > + @details > + The function serves 3 purposes > + > + - adds fields referenced from inner query blocks to the current select list > > - If a field isn't already in the select list and the ref_pointer_array > + - resolves which class to use to access referenced item (Item_ref or resolves -> decides access referenced item -> reference the items > + Item_direct_ref) > + > + - fixes references (Item_ref objects) to these fields. fields -> items > + > + If a field isn't already on the select list and the ref_pointer_array > is provided then it is added to the all_fields list and the pointer to > it is saved in the ref_pointer_array. > > The class to access the outer field is determined by the following rules: > - 1. If the outer field isn't used under an aggregate function > - then the Item_ref class should be used. > - 2. If the outer field is used under an aggregate function and this > - function is aggregated in the select where the outer field was > - resolved or in some more inner select then the Item_direct_ref > - class should be used. > - Also it should be used if we are grouping by a subquery containing > - the outer field. > + > + -#. If the outer field isn't used under an aggregate function then the > + Item_ref class should be used. > + > + -#. If the outer field is used under an aggregate function and this > + function is, in turn, aggregated in the query block where the outer > + field was resolved or some query nested therein, then the > + Item_direct_ref class should be used. Also it should be used if we are > + grouping by a subquery containing the outer field. > + > The resolution is done here and not at the fix_fields() stage as > - it can be done only after sum functions are fixed and pulled up to > - selects where they are have to be aggregated. > + it can be done only after aggregate functions are fixed and pulled up to > + selects where they are to be aggregated. > + > When the class is chosen it substitutes the original field in the > Item_outer_ref object. > > After this we proceed with fixing references (Item_outer_ref objects) to > this field from inner subqueries. > > - RETURN > - TRUE an error occured > - FALSE ok > -*/ > + @return Status > + @retval true An error occured. > + @retval false OK. This return convention is so common, that I would advice you to write: @return true if an error occurred, false if success. > + > + @todo This comment must be restructured. It is jumping back and forth > + between abstraction levels and it uses inconsistent terminology. Hey! There is actual documentation here! You can either remove the @todo, fix the comment, or pinpoint the jumping and use of inconsistent terminology. I guess the inconsistet terminology issues are: field vs. item. select vs query block (I prefer query specification or query expression, they are SQL official terms), > + */ > > bool > fix_inner_refs(THD *thd, List &all_fields, SELECT_LEX *select, > @@ -328,11 +336,11 @@ fix_inner_refs(THD *thd, List &all > { > Item_outer_ref *ref; > bool res= FALSE; The local variable res is unused and can be removed. > - bool direct_ref= FALSE; > > List_iterator ref_it(select->inner_refs_list); > while ((ref= ref_it++)) > { > + bool direct_ref= false; > Item *item= ref->outer_ref; > Item **item_ref= ref->ref; > Item_ref *new_ref; Thanks, Roy