List:Commits« Previous MessageNext Message »
From:Martin Hansson Date:February 15 2011 1:45pm
Subject:Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839
View as plain text  
Hi,
new patch coming up!

Roy Lyseng skrev 2011-02-15 13.12:
> Hi Martin,
>
> thank you for fixing this problem.
My motives are purely selfish, this bug obstructs the implementation of 
WL#5801 (loose index scan on joins) so I need it out of the way. ;)
>
> The fix is approved, but please fix the comments before pushing.
>
> On 09.02.11 10.58, Martin Hansson wrote:
>> #At file:///data0/martin/bzrroot/bug59839/5.1-commit/ based on 
>> revid:dmitry.shulga@stripped
>>
>>   3587 Martin Hansson    2011-02-09
>>        Bug#59839: Aggregation followed by subquery yields wrong result
>>
>>        The loop that was looping over subqueries' references to outer 
>> field used a
>>        local boolean variable to tell whether the field was grouped 
>> or not. But the
>>        implementor failed to reset the variable after each iteration. 
>> Thus a field
>>        that was not directly aggregated appeared to be.
>>
>>        Fixed by resetting the variable upon each new iteration.
>>
>>      modified:
>>        mysql-test/r/group_by.result
>>        mysql-test/t/group_by.test
>>        sql/sql_select.cc
>> === modified file 'mysql-test/r/group_by.result'
>> --- a/mysql-test/r/group_by.result    2010-10-29 08:23:06 +0000
>> +++ b/mysql-test/r/group_by.result    2011-02-09 09:58:43 +0000
>> @@ -1855,4 +1855,40 @@ ON 1 WHERE t2.f1>  1 GROUP BY t2.f1;
>>   COUNT(*)
>>   2
>>   DROP TABLE t1;
>> +#
>> +# Bug#59839: Aggregation followed by subquery yields wrong result
>> +#
>> +CREATE TABLE t1 (
>> +a INT,
>> +b INT,
>> +c INT,
>> +KEY (a, b)
>> +);
>> +INSERT INTO t1 VALUES
>> +( 1, 1,  1 ),
>> +( 1, 2,  2 ),
>> +( 1, 3,  3 ),
>> +( 1, 4,  6 ),
>> +( 1, 5,  5 ),
>> +( 1, 9, 13 ),
>> +( 2, 1,  6 ),
>> +( 2, 2,  7 ),
>> +( 2, 3,  8 );
>> +EXPLAIN
>> +SELECT a, AVG(t1.b),
>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) 
>> AS t11c,
>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) 
>> AS t12c
>> +FROM t1 GROUP BY a;
>> +id    select_type    table    type    possible_keys    key    
>> key_len    ref    rows    Extra
>> +1    PRIMARY    t1    index    NULL    a    10    NULL    9    Using 
>> index
>> +3    DEPENDENT SUBQUERY    t12    ref    a    a    10    
>> func,func    2    Using where
>> +2    DEPENDENT SUBQUERY    t11    ref    a    a    10    
>> func,func    2    Using where
>> +SELECT a, AVG(t1.b),
>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) 
>> AS t11c,
>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) 
>> AS t12c
>> +FROM t1 GROUP BY a;
>> +a    AVG(t1.b)    t11c    t12c
>> +1    4.0000    6    6
>> +2    2.0000    7    7
>> +DROP TABLE t1;
>>   # End of 5.1 tests
>>
>> === modified file 'mysql-test/t/group_by.test'
>> --- a/mysql-test/t/group_by.test    2010-10-29 08:23:06 +0000
>> +++ b/mysql-test/t/group_by.test    2011-02-09 09:58:43 +0000
>> @@ -1247,4 +1247,41 @@ ON 1 WHERE t2.f1>  1 GROUP BY t2.f1;
>>
>>   DROP TABLE t1;
>>
>> +--echo #
>> +--echo # Bug#59839: Aggregation followed by subquery yields wrong 
>> result
>> +--echo #
>> +
>> +CREATE TABLE t1 (
>> +  a INT,
>> +  b INT,
>> +  c INT,
>> +  KEY (a, b)
>> +);
>> +
>> +INSERT INTO t1 VALUES
>> +  ( 1, 1,  1 ),
>> +  ( 1, 2,  2 ),
>> +  ( 1, 3,  3 ),
>> +  ( 1, 4,  6 ),
>> +  ( 1, 5,  5 ),
>> +  ( 1, 9, 13 ),
>> +
>> +  ( 2, 1,  6 ),
>> +  ( 2, 2,  7 ),
>> +  ( 2, 3,  8 );
>> +
>> +EXPLAIN
>> +SELECT a, AVG(t1.b),
>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) 
>> AS t11c,
>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) 
>> AS t12c
>> +FROM t1 GROUP BY a;
>> +
>> +SELECT a, AVG(t1.b),
>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) 
>> AS t11c,
>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) 
>> AS t12c
>> +FROM t1 GROUP BY a;
>
> I'd suggest that you write something like:
> set $query=SELECT ...;
> eval EXPLAIN $query;
> eval $query;
Well personally I don't like that style and haven't seen anybody else 
using it. If anything, we should adopt the new explain mode in 
mysql_test_run.
>
> That way you have to write complex query strings only once.
>> +
>> +DROP TABLE t1;
>> +
>> +
>>   --echo # End of 5.1 tests
>>
>> === modified file 'sql/sql_select.cc'
>> --- a/sql/sql_select.cc    2011-02-08 11:52:33 +0000
>> +++ b/sql/sql_select.cc    2011-02-09 09:58:43 +0000
>> @@ -281,46 +281,54 @@ bool handle_select(THD *thd, LEX *lex, s
>>   /*
>
> /* -> /** (for Doxygen)
done
>>     Fix fields referenced from inner selects.
>>
>> -  SYNOPSIS
>> -    fix_inner_refs()
>> -    thd               Thread handle
>> -    all_fields        List of all fields used in select
>> -    select            Current select
>> -    ref_pointer_array Array of references to Items used in current 
>> select
>> -    group_list        GROUP BY list (is NULL by default)
>> +  @param thd               Thread handle
>> +  @param all_fields        List of all fields used in select
>> +  @param select            Current select
>> +  @param ref_pointer_array Array of references to Items used in 
>> current select
>> +  @param group_list        GROUP BY list (is NULL by default)
>>
>> -  DESCRIPTION
>> -    The function serves 3 purposes - adds fields referenced from inner
>> -    selects to the current select list, resolves which class to use
>> -    to access referenced item (Item_ref of Item_direct_ref) and fixes
>> -    references (Item_ref objects) to these fields.
>> +  @details
>> +    The function serves 3 purposes
>> +
>> +    - adds fields referenced from inner query blocks to the current 
>> select list
>>
>> -    If a field isn't already in the select list and the 
>> ref_pointer_array
>> +    - resolves which class to use to access referenced item 
>> (Item_ref or
> resolves -> decides
You're right!
> access referenced item -> reference the items
ok
>> +      Item_direct_ref)
>> +
>> +    - fixes references (Item_ref objects) to these fields.
> fields -> items
Not sure about that one. Can you prove it isn't always fields?
>> +
>> +    If a field isn't already on the select list and the 
>> ref_pointer_array
>>       is provided then it is added to the all_fields list and the 
>> pointer to
>>       it is saved in the ref_pointer_array.
>>
>>       The class to access the outer field is determined by the 
>> following rules:
>> -    1. If the outer field isn't used under an aggregate function
>> -      then the Item_ref class should be used.
>> -    2. If the outer field is used under an aggregate function and this
>> -      function is aggregated in the select where the outer field was
>> -      resolved or in some more inner select then the Item_direct_ref
>> -      class should be used.
>> -      Also it should be used if we are grouping by a subquery 
>> containing
>> -      the outer field.
>> +
>> +    -#. If the outer field isn't used under an aggregate function 
>> then the
>> +        Item_ref class should be used.
>> +
>> +    -#. If the outer field is used under an aggregate function and this
>> +        function is, in turn, aggregated in the query block where 
>> the outer
>> +        field was resolved or some query nested therein, then the
>> +        Item_direct_ref class should be used. Also it should be used 
>> if we are
>> +        grouping by a subquery containing the outer field.
>> +
>>       The resolution is done here and not at the fix_fields() stage as
>> -    it can be done only after sum functions are fixed and pulled up to
>> -    selects where they are have to be aggregated.
>> +    it can be done only after aggregate functions are fixed and 
>> pulled up to
>> +    selects where they are to be aggregated.
>> +
>>       When the class is chosen it substitutes the original field in the
>>       Item_outer_ref object.
>>
>>       After this we proceed with fixing references (Item_outer_ref 
>> objects) to
>>       this field from inner subqueries.
>>
>> -  RETURN
>> -    TRUE  an error occured
>> -    FALSE ok
>> -*/
>> +  @return Status
>> +  @retval true An error occured.
>> +  @retval false OK.
> This return convention is so common, that I would advice you to write:
>   @return true if an error occurred, false if success.
Us humans can understand that just fine, but poor Doxygen can't.
>> +
>> +  @todo This comment must be restructured. It is jumping back and forth
>> +        between abstraction levels and it uses inconsistent 
>> terminology.
>
> Hey! There is actual documentation here! You can either remove the 
> @todo, fix the comment, or pinpoint the jumping and use of 
> inconsistent terminology.
My apologies, didn't mean to offend anyone. I realize that this function 
has been gradually edited from its initial appearance. In order to make 
amends I'll try to do all three!

It looks like the original author was using the term "Sum" instead of 
aggregate. And someone (perhaps you) had been kind enough to change it 
to "aggregate" in most places, so I fixed the rest of them.

So I try leaving it in a little better state than I found it. But I 
think making the comment 100% correct is beyond a bug fix.

In my defense I'll add that this line was not meant to go in the patch. 
It was more like a memento for me that kinda slipped by.
>
> I guess the inconsistet terminology issues are: field vs. item. select 
> vs query block (I prefer query specification or query expression, they 
> are SQL official terms),
Yes, and sum versus aggregate, fix versus resolve, and what does "pull 
up" mean?

And the comment is jumping back and forth between SQL terminology, 
high-level design and class layout and list structures, etc.

But it isn't as simple as search-and-replace either. For instance I 
suspect that "fix" means different things in different sentences.
>> + */
>>
>>   bool
>>   fix_inner_refs(THD *thd, List<Item> &all_fields, SELECT_LEX *select,
>> @@ -328,11 +336,11 @@ fix_inner_refs(THD *thd, List<Item> &all
>>   {
>>     Item_outer_ref *ref;
>>     bool res= FALSE;
> The local variable res is unused and can be removed.
Not the bug strictly speaking, but since it gives me an excuse to change 
FALSE to false I'll add it in. :-)
>> -  bool direct_ref= FALSE;
>>
>>     List_iterator<Item_outer_ref>  ref_it(select->inner_refs_list);
>>     while ((ref= ref_it++))
>>     {
>> +    bool direct_ref= false;
>>       Item *item= ref->outer_ref;
>>       Item **item_ref= ref->ref;
>>       Item_ref *new_ref;
> Thanks,
> Roy
Thank you for the review!

Martin
Thread
bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Martin Hansson9 Feb
Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Jorgen Loland9 Feb
  • Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Martin Hansson9 Feb
Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Roy Lyseng15 Feb
  • Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Martin Hansson15 Feb
    • Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Roy Lyseng15 Feb
      • Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Martin Hansson15 Feb