From: Roy Lyseng Date: February 15 2011 2:07pm Subject: Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839 List-Archive: http://lists.mysql.com/commits/131339 Message-Id: <4D5A88B7.5010304@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit On 15.02.11 14.45, Martin Hansson wrote: > Hi, > new patch coming up! > > Roy Lyseng skrev 2011-02-15 13.12: >> Hi Martin, >> >> thank you for fixing this problem. > My motives are purely selfish, this bug obstructs the implementation of WL#5801 > (loose index scan on joins) so I need it out of the way. ;) >> >> The fix is approved, but please fix the comments before pushing. >> >> On 09.02.11 10.58, Martin Hansson wrote: >>> #At file:///data0/martin/bzrroot/bug59839/5.1-commit/ based on >>> revid:dmitry.shulga@stripped >>> >>> 3587 Martin Hansson 2011-02-09 >>> Bug#59839: Aggregation followed by subquery yields wrong result >>> >>> The loop that was looping over subqueries' references to outer field used a >>> local boolean variable to tell whether the field was grouped or not. But the >>> implementor failed to reset the variable after each iteration. Thus a field >>> that was not directly aggregated appeared to be. >>> >>> Fixed by resetting the variable upon each new iteration. >>> >>> modified: >>> mysql-test/r/group_by.result >>> mysql-test/t/group_by.test >>> sql/sql_select.cc >>> === modified file 'mysql-test/r/group_by.result' >>> --- a/mysql-test/r/group_by.result 2010-10-29 08:23:06 +0000 >>> +++ b/mysql-test/r/group_by.result 2011-02-09 09:58:43 +0000 >>> @@ -1855,4 +1855,40 @@ ON 1 WHERE t2.f1> 1 GROUP BY t2.f1; >>> COUNT(*) >>> 2 >>> DROP TABLE t1; >>> +# >>> +# Bug#59839: Aggregation followed by subquery yields wrong result >>> +# >>> +CREATE TABLE t1 ( >>> +a INT, >>> +b INT, >>> +c INT, >>> +KEY (a, b) >>> +); >>> +INSERT INTO t1 VALUES >>> +( 1, 1, 1 ), >>> +( 1, 2, 2 ), >>> +( 1, 3, 3 ), >>> +( 1, 4, 6 ), >>> +( 1, 5, 5 ), >>> +( 1, 9, 13 ), >>> +( 2, 1, 6 ), >>> +( 2, 2, 7 ), >>> +( 2, 3, 8 ); >>> +EXPLAIN >>> +SELECT a, AVG(t1.b), >>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c, >>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c >>> +FROM t1 GROUP BY a; >>> +id select_type table type possible_keys key key_len ref rows Extra >>> +1 PRIMARY t1 index NULL a 10 NULL 9 Using index >>> +3 DEPENDENT SUBQUERY t12 ref a a 10 func,func 2 Using where >>> +2 DEPENDENT SUBQUERY t11 ref a a 10 func,func 2 Using where >>> +SELECT a, AVG(t1.b), >>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c, >>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c >>> +FROM t1 GROUP BY a; >>> +a AVG(t1.b) t11c t12c >>> +1 4.0000 6 6 >>> +2 2.0000 7 7 >>> +DROP TABLE t1; >>> # End of 5.1 tests >>> >>> === modified file 'mysql-test/t/group_by.test' >>> --- a/mysql-test/t/group_by.test 2010-10-29 08:23:06 +0000 >>> +++ b/mysql-test/t/group_by.test 2011-02-09 09:58:43 +0000 >>> @@ -1247,4 +1247,41 @@ ON 1 WHERE t2.f1> 1 GROUP BY t2.f1; >>> >>> DROP TABLE t1; >>> >>> +--echo # >>> +--echo # Bug#59839: Aggregation followed by subquery yields wrong result >>> +--echo # >>> + >>> +CREATE TABLE t1 ( >>> + a INT, >>> + b INT, >>> + c INT, >>> + KEY (a, b) >>> +); >>> + >>> +INSERT INTO t1 VALUES >>> + ( 1, 1, 1 ), >>> + ( 1, 2, 2 ), >>> + ( 1, 3, 3 ), >>> + ( 1, 4, 6 ), >>> + ( 1, 5, 5 ), >>> + ( 1, 9, 13 ), >>> + >>> + ( 2, 1, 6 ), >>> + ( 2, 2, 7 ), >>> + ( 2, 3, 8 ); >>> + >>> +EXPLAIN >>> +SELECT a, AVG(t1.b), >>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c, >>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c >>> +FROM t1 GROUP BY a; >>> + >>> +SELECT a, AVG(t1.b), >>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS t11c, >>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c >>> +FROM t1 GROUP BY a; >> >> I'd suggest that you write something like: >> set $query=SELECT ...; >> eval EXPLAIN $query; >> eval $query; > Well personally I don't like that style and haven't seen anybody else using it. ...but you did not challenge my argument :) Anyway, this was just a suggestion. > If anything, we should adopt the new explain mode in mysql_test_run. I'll wait to see how that mode turns out. Is it maintainable? >> >> That way you have to write complex query strings only once. >>> + >>> +DROP TABLE t1; >>> + >>> + >>> --echo # End of 5.1 tests >>> >>> === modified file 'sql/sql_select.cc' >>> --- a/sql/sql_select.cc 2011-02-08 11:52:33 +0000 >>> +++ b/sql/sql_select.cc 2011-02-09 09:58:43 +0000 >>> @@ -281,46 +281,54 @@ bool handle_select(THD *thd, LEX *lex, s >>> /* >> >> /* -> /** (for Doxygen) > done >>> Fix fields referenced from inner selects. >>> >>> - SYNOPSIS >>> - fix_inner_refs() >>> - thd Thread handle >>> - all_fields List of all fields used in select >>> - select Current select >>> - ref_pointer_array Array of references to Items used in current select >>> - group_list GROUP BY list (is NULL by default) >>> + @param thd Thread handle >>> + @param all_fields List of all fields used in select >>> + @param select Current select >>> + @param ref_pointer_array Array of references to Items used in current select >>> + @param group_list GROUP BY list (is NULL by default) >>> >>> - DESCRIPTION >>> - The function serves 3 purposes - adds fields referenced from inner >>> - selects to the current select list, resolves which class to use >>> - to access referenced item (Item_ref of Item_direct_ref) and fixes >>> - references (Item_ref objects) to these fields. >>> + @details >>> + The function serves 3 purposes >>> + >>> + - adds fields referenced from inner query blocks to the current select list >>> >>> - If a field isn't already in the select list and the ref_pointer_array >>> + - resolves which class to use to access referenced item (Item_ref or >> resolves -> decides > You're right! >> access referenced item -> reference the items > ok >>> + Item_direct_ref) >>> + >>> + - fixes references (Item_ref objects) to these fields. >> fields -> items > Not sure about that one. Can you prove it isn't always fields? Nope, this was more of a wild guess :D >>> + >>> + If a field isn't already on the select list and the ref_pointer_array >>> is provided then it is added to the all_fields list and the pointer to >>> it is saved in the ref_pointer_array. >>> >>> The class to access the outer field is determined by the following rules: >>> - 1. If the outer field isn't used under an aggregate function >>> - then the Item_ref class should be used. >>> - 2. If the outer field is used under an aggregate function and this >>> - function is aggregated in the select where the outer field was >>> - resolved or in some more inner select then the Item_direct_ref >>> - class should be used. >>> - Also it should be used if we are grouping by a subquery containing >>> - the outer field. >>> + >>> + -#. If the outer field isn't used under an aggregate function then the >>> + Item_ref class should be used. >>> + >>> + -#. If the outer field is used under an aggregate function and this >>> + function is, in turn, aggregated in the query block where the outer >>> + field was resolved or some query nested therein, then the >>> + Item_direct_ref class should be used. Also it should be used if we are >>> + grouping by a subquery containing the outer field. >>> + >>> The resolution is done here and not at the fix_fields() stage as >>> - it can be done only after sum functions are fixed and pulled up to >>> - selects where they are have to be aggregated. >>> + it can be done only after aggregate functions are fixed and pulled up to >>> + selects where they are to be aggregated. >>> + >>> When the class is chosen it substitutes the original field in the >>> Item_outer_ref object. >>> >>> After this we proceed with fixing references (Item_outer_ref objects) to >>> this field from inner subqueries. >>> >>> - RETURN >>> - TRUE an error occured >>> - FALSE ok >>> -*/ >>> + @return Status >>> + @retval true An error occured. >>> + @retval false OK. >> This return convention is so common, that I would advice you to write: >> @return true if an error occurred, false if success. > Us humans can understand that just fine, but poor Doxygen can't. Personally, I do not care that much about what Doxygen might think... >>> + >>> + @todo This comment must be restructured. It is jumping back and forth >>> + between abstraction levels and it uses inconsistent terminology. >> >> Hey! There is actual documentation here! You can either remove the @todo, fix >> the comment, or pinpoint the jumping and use of inconsistent terminology. > My apologies, didn't mean to offend anyone. I realize that this function has > been gradually edited from its initial appearance. In order to make amends I'll > try to do all three! > > It looks like the original author was using the term "Sum" instead of aggregate. > And someone (perhaps you) had been kind enough to change it to "aggregate" in > most places, so I fixed the rest of them. > > So I try leaving it in a little better state than I found it. But I think making > the comment 100% correct is beyond a bug fix. > > In my defense I'll add that this line was not meant to go in the patch. It was > more like a memento for me that kinda slipped by. >> >> I guess the inconsistet terminology issues are: field vs. item. select vs >> query block (I prefer query specification or query expression, they are SQL >> official terms), > Yes, and sum versus aggregate, fix versus resolve, and what does "pull up" mean? > > And the comment is jumping back and forth between SQL terminology, high-level > design and class layout and list structures, etc. Nevertheless, the description is almost spot-on. > > But it isn't as simple as search-and-replace either. For instance I suspect that > "fix" means different things in different sentences. It goes without saying. You cannot fix an inconsistent description by applying regular principles. >>> + */ >>> >>> bool >>> fix_inner_refs(THD *thd, List &all_fields, SELECT_LEX *select, >>> @@ -328,11 +336,11 @@ fix_inner_refs(THD *thd, List &all >>> { >>> Item_outer_ref *ref; >>> bool res= FALSE; >> The local variable res is unused and can be removed. > Not the bug strictly speaking, but since it gives me an excuse to change FALSE > to false I'll add it in. :-) This is the kind of minor, local refactoring I think that we can apply as part of regular bug fixing. It has (almost) zero risk, and is not easy to confuse with the actual bug fix. >>> - bool direct_ref= FALSE; >>> >>> List_iterator ref_it(select->inner_refs_list); >>> while ((ref= ref_it++)) >>> { >>> + bool direct_ref= false; >>> Item *item= ref->outer_ref; >>> Item **item_ref= ref->ref; >>> Item_ref *new_ref; >> Thanks, >> Roy > Thank you for the review! > > Martin