List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:February 15 2011 2:07pm
Subject:Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839
View as plain text  
On 15.02.11 14.45, Martin Hansson wrote:
> Hi,
> new patch coming up!
>
> Roy Lyseng skrev 2011-02-15 13.12:
>> Hi Martin,
>>
>> thank you for fixing this problem.
> My motives are purely selfish, this bug obstructs the implementation of WL#5801
> (loose index scan on joins) so I need it out of the way. ;)
>>
>> The fix is approved, but please fix the comments before pushing.
>>
>> On 09.02.11 10.58, Martin Hansson wrote:
>>> #At file:///data0/martin/bzrroot/bug59839/5.1-commit/ based on
>>> revid:dmitry.shulga@stripped
>>>
>>> 3587 Martin Hansson 2011-02-09
>>> Bug#59839: Aggregation followed by subquery yields wrong result
>>>
>>> The loop that was looping over subqueries' references to outer field used a
>>> local boolean variable to tell whether the field was grouped or not. But the
>>> implementor failed to reset the variable after each iteration. Thus a field
>>> that was not directly aggregated appeared to be.
>>>
>>> Fixed by resetting the variable upon each new iteration.
>>>
>>> modified:
>>> mysql-test/r/group_by.result
>>> mysql-test/t/group_by.test
>>> sql/sql_select.cc
>>> === modified file 'mysql-test/r/group_by.result'
>>> --- a/mysql-test/r/group_by.result 2010-10-29 08:23:06 +0000
>>> +++ b/mysql-test/r/group_by.result 2011-02-09 09:58:43 +0000
>>> @@ -1855,4 +1855,40 @@ ON 1 WHERE t2.f1> 1 GROUP BY t2.f1;
>>> COUNT(*)
>>> 2
>>> DROP TABLE t1;
>>> +#
>>> +# Bug#59839: Aggregation followed by subquery yields wrong result
>>> +#
>>> +CREATE TABLE t1 (
>>> +a INT,
>>> +b INT,
>>> +c INT,
>>> +KEY (a, b)
>>> +);
>>> +INSERT INTO t1 VALUES
>>> +( 1, 1, 1 ),
>>> +( 1, 2, 2 ),
>>> +( 1, 3, 3 ),
>>> +( 1, 4, 6 ),
>>> +( 1, 5, 5 ),
>>> +( 1, 9, 13 ),
>>> +( 2, 1, 6 ),
>>> +( 2, 2, 7 ),
>>> +( 2, 3, 8 );
>>> +EXPLAIN
>>> +SELECT a, AVG(t1.b),
>>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS
> t11c,
>>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c
>>> +FROM t1 GROUP BY a;
>>> +id select_type table type possible_keys key key_len ref rows Extra
>>> +1 PRIMARY t1 index NULL a 10 NULL 9 Using index
>>> +3 DEPENDENT SUBQUERY t12 ref a a 10 func,func 2 Using where
>>> +2 DEPENDENT SUBQUERY t11 ref a a 10 func,func 2 Using where
>>> +SELECT a, AVG(t1.b),
>>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS
> t11c,
>>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c
>>> +FROM t1 GROUP BY a;
>>> +a AVG(t1.b) t11c t12c
>>> +1 4.0000 6 6
>>> +2 2.0000 7 7
>>> +DROP TABLE t1;
>>> # End of 5.1 tests
>>>
>>> === modified file 'mysql-test/t/group_by.test'
>>> --- a/mysql-test/t/group_by.test 2010-10-29 08:23:06 +0000
>>> +++ b/mysql-test/t/group_by.test 2011-02-09 09:58:43 +0000
>>> @@ -1247,4 +1247,41 @@ ON 1 WHERE t2.f1> 1 GROUP BY t2.f1;
>>>
>>> DROP TABLE t1;
>>>
>>> +--echo #
>>> +--echo # Bug#59839: Aggregation followed by subquery yields wrong result
>>> +--echo #
>>> +
>>> +CREATE TABLE t1 (
>>> + a INT,
>>> + b INT,
>>> + c INT,
>>> + KEY (a, b)
>>> +);
>>> +
>>> +INSERT INTO t1 VALUES
>>> + ( 1, 1, 1 ),
>>> + ( 1, 2, 2 ),
>>> + ( 1, 3, 3 ),
>>> + ( 1, 4, 6 ),
>>> + ( 1, 5, 5 ),
>>> + ( 1, 9, 13 ),
>>> +
>>> + ( 2, 1, 6 ),
>>> + ( 2, 2, 7 ),
>>> + ( 2, 3, 8 );
>>> +
>>> +EXPLAIN
>>> +SELECT a, AVG(t1.b),
>>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS
> t11c,
>>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c
>>> +FROM t1 GROUP BY a;
>>> +
>>> +SELECT a, AVG(t1.b),
>>> +(SELECT t11.c FROM t1 t11 WHERE t11.a = t1.a AND t11.b = AVG(t1.b)) AS
> t11c,
>>> +(SELECT t12.c FROM t1 t12 WHERE t12.a = t1.a AND t12.b = AVG(t1.b)) AS t12c
>>> +FROM t1 GROUP BY a;
>>
>> I'd suggest that you write something like:
>> set $query=SELECT ...;
>> eval EXPLAIN $query;
>> eval $query;
> Well personally I don't like that style and haven't seen anybody else using it.

...but you did not challenge my argument :)

Anyway, this was just a suggestion.

> If anything, we should adopt the new explain mode in mysql_test_run.

I'll wait to see how that mode turns out. Is it maintainable?
>>
>> That way you have to write complex query strings only once.
>>> +
>>> +DROP TABLE t1;
>>> +
>>> +
>>> --echo # End of 5.1 tests
>>>
>>> === modified file 'sql/sql_select.cc'
>>> --- a/sql/sql_select.cc 2011-02-08 11:52:33 +0000
>>> +++ b/sql/sql_select.cc 2011-02-09 09:58:43 +0000
>>> @@ -281,46 +281,54 @@ bool handle_select(THD *thd, LEX *lex, s
>>> /*
>>
>> /* -> /** (for Doxygen)
> done
>>> Fix fields referenced from inner selects.
>>>
>>> - SYNOPSIS
>>> - fix_inner_refs()
>>> - thd Thread handle
>>> - all_fields List of all fields used in select
>>> - select Current select
>>> - ref_pointer_array Array of references to Items used in current select
>>> - group_list GROUP BY list (is NULL by default)
>>> + @param thd Thread handle
>>> + @param all_fields List of all fields used in select
>>> + @param select Current select
>>> + @param ref_pointer_array Array of references to Items used in current
> select
>>> + @param group_list GROUP BY list (is NULL by default)
>>>
>>> - DESCRIPTION
>>> - The function serves 3 purposes - adds fields referenced from inner
>>> - selects to the current select list, resolves which class to use
>>> - to access referenced item (Item_ref of Item_direct_ref) and fixes
>>> - references (Item_ref objects) to these fields.
>>> + @details
>>> + The function serves 3 purposes
>>> +
>>> + - adds fields referenced from inner query blocks to the current select
> list
>>>
>>> - If a field isn't already in the select list and the ref_pointer_array
>>> + - resolves which class to use to access referenced item (Item_ref or
>> resolves -> decides
> You're right!
>> access referenced item -> reference the items
> ok
>>> + Item_direct_ref)
>>> +
>>> + - fixes references (Item_ref objects) to these fields.
>> fields -> items
> Not sure about that one. Can you prove it isn't always fields?

Nope, this was more of a wild guess :D

>>> +
>>> + If a field isn't already on the select list and the ref_pointer_array
>>> is provided then it is added to the all_fields list and the pointer to
>>> it is saved in the ref_pointer_array.
>>>
>>> The class to access the outer field is determined by the following rules:
>>> - 1. If the outer field isn't used under an aggregate function
>>> - then the Item_ref class should be used.
>>> - 2. If the outer field is used under an aggregate function and this
>>> - function is aggregated in the select where the outer field was
>>> - resolved or in some more inner select then the Item_direct_ref
>>> - class should be used.
>>> - Also it should be used if we are grouping by a subquery containing
>>> - the outer field.
>>> +
>>> + -#. If the outer field isn't used under an aggregate function then the
>>> + Item_ref class should be used.
>>> +
>>> + -#. If the outer field is used under an aggregate function and this
>>> + function is, in turn, aggregated in the query block where the outer
>>> + field was resolved or some query nested therein, then the
>>> + Item_direct_ref class should be used. Also it should be used if we are
>>> + grouping by a subquery containing the outer field.
>>> +
>>> The resolution is done here and not at the fix_fields() stage as
>>> - it can be done only after sum functions are fixed and pulled up to
>>> - selects where they are have to be aggregated.
>>> + it can be done only after aggregate functions are fixed and pulled up to
>>> + selects where they are to be aggregated.
>>> +
>>> When the class is chosen it substitutes the original field in the
>>> Item_outer_ref object.
>>>
>>> After this we proceed with fixing references (Item_outer_ref objects) to
>>> this field from inner subqueries.
>>>
>>> - RETURN
>>> - TRUE an error occured
>>> - FALSE ok
>>> -*/
>>> + @return Status
>>> + @retval true An error occured.
>>> + @retval false OK.
>> This return convention is so common, that I would advice you to write:
>> @return true if an error occurred, false if success.
> Us humans can understand that just fine, but poor Doxygen can't.

Personally, I do not care that much about what Doxygen might think...
>>> +
>>> + @todo This comment must be restructured. It is jumping back and forth
>>> + between abstraction levels and it uses inconsistent terminology.
>>
>> Hey! There is actual documentation here! You can either remove the @todo, fix
>> the comment, or pinpoint the jumping and use of inconsistent terminology.
> My apologies, didn't mean to offend anyone. I realize that this function has
> been gradually edited from its initial appearance. In order to make amends I'll
> try to do all three!
>
> It looks like the original author was using the term "Sum" instead of aggregate.
> And someone (perhaps you) had been kind enough to change it to "aggregate" in
> most places, so I fixed the rest of them.
>
> So I try leaving it in a little better state than I found it. But I think making
> the comment 100% correct is beyond a bug fix.
>
> In my defense I'll add that this line was not meant to go in the patch. It was
> more like a memento for me that kinda slipped by.
>>
>> I guess the inconsistet terminology issues are: field vs. item. select vs
>> query block (I prefer query specification or query expression, they are SQL
>> official terms),
> Yes, and sum versus aggregate, fix versus resolve, and what does "pull up" mean?
>
> And the comment is jumping back and forth between SQL terminology, high-level
> design and class layout and list structures, etc.

Nevertheless, the description is almost spot-on.
>
> But it isn't as simple as search-and-replace either. For instance I suspect that
> "fix" means different things in different sentences.

It goes without saying. You cannot fix an inconsistent description by applying 
regular principles.
>>> + */
>>>
>>> bool
>>> fix_inner_refs(THD *thd, List<Item> &all_fields, SELECT_LEX
> *select,
>>> @@ -328,11 +336,11 @@ fix_inner_refs(THD *thd, List<Item> &all
>>> {
>>> Item_outer_ref *ref;
>>> bool res= FALSE;
>> The local variable res is unused and can be removed.
> Not the bug strictly speaking, but since it gives me an excuse to change FALSE
> to false I'll add it in. :-)

This is the kind of minor, local refactoring I think that we can apply as part 
of regular bug fixing. It has (almost) zero risk, and is not easy to confuse 
with the actual bug fix.

>>> - bool direct_ref= FALSE;
>>>
>>> List_iterator<Item_outer_ref> ref_it(select->inner_refs_list);
>>> while ((ref= ref_it++))
>>> {
>>> + bool direct_ref= false;
>>> Item *item= ref->outer_ref;
>>> Item **item_ref= ref->ref;
>>> Item_ref *new_ref;
>> Thanks,
>> Roy
> Thank you for the review!
>
> Martin

Thread
bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Martin Hansson9 Feb
Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Jorgen Loland9 Feb
  • Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Martin Hansson9 Feb
Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Roy Lyseng15 Feb
  • Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Martin Hansson15 Feb
    • Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Roy Lyseng15 Feb
      • Re: bzr commit into mysql-5.1 branch (martin.hansson:3587) Bug#59839Martin Hansson15 Feb