List:Commits« Previous MessageNext Message »
From:Evgeny Potemkin Date:December 22 2009 1:08pm
Subject:Re: bzr commit into mysql-5.1-bugteam branch (gshchepa:3201) Bug#45640
View as plain text  
Hi Gleb,

Gleb Shchepa wrote:
> #At file:///work/45640-5.1/ based on
> revid:jorgen.loland@stripped
> 
>  3201 Gleb Shchepa	2009-11-14
>       Bug #45640: optimizer bug produces wrong results
>       
>       Grouping by a subquery in a query with a distinct aggregate
>       function lead to a wrong result (wrong and unordered
>       grouping values).
>       
>       There are two related problems:
>       
>       1) The query like this:
>       
>          SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c
>          FROM t1 GROUP BY aa
>       
>       returned wrong result, because the outer reference "t1.a"
>       in the subquery was substituted with the Item_ref item.
>       
>       The Item_ref item obtains data from the result_field object
>       that refreshes once after the end of each group. This data
>       is not applicable to filesort since filesort() doesn't care
>       about groups (and doesn't update result_field objects with
>       copy_fields() and so on). Also that data is not applicable
>       to group separation algorithm: end_send_group() checks every
>       record with test_if_group_changed() that evaluates Item_ref
>       items, but it refreshes those Item_ref-s only after the end
>       of group, that is a vicious circle and the grouped column
>       values in the output are shifted.
>       
>       Fix: if 
>              a) we grouping by a subquery and 
>              b) that subquery has outer references to FROM list
>                 of the grouping query,
>            then we substitute these outer references with
>            Item_direct_ref like references under aggregate
>            functions: Item_direct_ref obtains data directly
>            from the current record.
>       
>       
>       2) The query with a non-trivial grouping expression like:
>       
>          SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c
>          FROM t1 GROUP BY aa+0
>       
>       also returned wrong result, since JOIN::exec() substitutes
>       references to top-level aliases in SELECT list with Item_copy
>       caching items. Item_copy items have same refreshing policy
>       as Item_ref items, so the whole groping expression with
>       Item_copy inside returns wrong result in filesort() and
>       end_send_group().
>       
>       Fix: include aliased items into GROUP BY item tree instead
>            of Item_ref references to them.
>      @ mysql-test/r/group_by.result
>         Test case for bug #45640
>      @ mysql-test/t/group_by.test
>         Test case for bug #45640
>      @ sql/item.cc
>         Bug #45640: optimizer bug produces wrong results
>         
>         - Item_field::fix_fields() has been modified to resolve
>           aliases in GROUP BY item trees into aliased items instead
>           of Item_ref items.
>         
>         - Item_outer_ref::walk() has been overloaded to implement
>           Item_outer_ref tree search algorithm.
>      @ sql/item.h
>         Bug #45640: optimizer bug produces wrong results
>         
>         Item::find_outer_ref_processor() helper has been introduced
>         and Item_outer_ref::walk() has been overloaded to implement
>         Item_outer_ref tree search algorithm.
>      @ sql/mysql_priv.h
>         Bug #45640: optimizer bug produces wrong results
>         
>         fix_inner_refs() has been modified to accept group_list
>         parameter.
>      @ sql/sql_lex.cc
>         Bug #45640: optimizer bug produces wrong results
>         
>         Initialization of st_select_lex::group_fix_field has
>         been added.
>      @ sql/sql_lex.h
>         Bug #45640: optimizer bug produces wrong results
>         
>         The st_select_lex::group_fix_field field has been introduced
>         to control alias resolution in Itef_fied::fix_fields.
>      @ sql/sql_select.cc
>         Bug #45640: optimizer bug produces wrong results
>         
>         - The fix_inner_refs function has been modified to treat
>           subquery outer references like outer fields under aggregate
>           functions, if they are included in GROUP BY item tree.
>         
>         - The find_order_in_list function has been modified to
>           fix Item_field alias fields included in the GROUP BY item
>           trees in a special manner.
> 
>     modified:
>       mysql-test/r/group_by.result
>       mysql-test/t/group_by.test
>       sql/item.cc
>       sql/item.h
>       sql/mysql_priv.h
>       sql/sql_lex.cc
>       sql/sql_lex.h
>       sql/sql_select.cc
> === modified file 'mysql-test/r/group_by.result'
> --- a/mysql-test/r/group_by.result	2009-02-26 17:17:06 +0000
> +++ b/mysql-test/r/group_by.result	2009-11-14 14:11:54 +0000
> @@ -1703,3 +1703,76 @@ COUNT(i)
>  1
>  DROP TABLE t1;
>  SET @@sql_mode = @old_sql_mode;
> +#
> +# Bug #45640: optmizer bug produces wrong results
> +#
> +CREATE TABLE t1 (a INT, b INT);
> +INSERT INTO t1 VALUES (4, 40), (1, 10), (2, 20), (2, 20), (3, 30);
> +# should return 4 ordered records:
> +SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) FROM t1 GROUP BY aa;
> +aa	COUNT(DISTINCT b)
> +1	1
> +2	1
> +3	1
> +4	1
> +SELECT (SELECT (SELECT t1.a)) aa, COUNT(DISTINCT b) FROM t1 GROUP BY aa;
> +aa	COUNT(DISTINCT b)
> +1	1
> +2	1
> +3	1
> +4	1
> +SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) FROM t1 GROUP BY aa+0;
> +aa	COUNT(DISTINCT b)
> +1	1
> +2	1
> +3	1
> +4	1
> +# should return the same result in a reverse order:
> +SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) FROM t1 GROUP BY -aa;
> +aa	COUNT(DISTINCT b)
> +4	1
> +3	1
> +2	1
> +1	1
> +# execution plan should not use temporary table:
> +EXPLAIN EXTENDED
> +SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) FROM t1 GROUP BY aa+0;
> +id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
> +1	PRIMARY	t1	ALL	NULL	NULL	NULL	NULL	5	100.00	Using filesort
> +2	DEPENDENT SUBQUERY	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	No tables used
> +Warnings:
> +Note	1276	Field or reference 'test.t1.a' of SELECT #2 was resolved in SELECT #1
> +Note	1003	select (select `test`.`t1`.`a` AS `a`) AS `aa`,count(distinct
> `test`.`t1`.`b`) AS `COUNT(DISTINCT b)` from `test`.`t1` group by ((select `test`.`t1`.`a`
> AS `a`) + 0)
> +EXPLAIN EXTENDED
> +SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) FROM t1 GROUP BY -aa;
> +id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
> +1	PRIMARY	t1	ALL	NULL	NULL	NULL	NULL	5	100.00	Using filesort
> +2	DEPENDENT SUBQUERY	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	No tables used
> +Warnings:
> +Note	1276	Field or reference 'test.t1.a' of SELECT #2 was resolved in SELECT #1
> +Note	1003	select (select `test`.`t1`.`a` AS `a`) AS `aa`,count(distinct
> `test`.`t1`.`b`) AS `COUNT(DISTINCT b)` from `test`.`t1` group by -((select
> `test`.`t1`.`a` AS `a`))
> +# should return only one record
> +SELECT (SELECT tt.a FROM t1 tt LIMIT 1) aa, COUNT(DISTINCT b) FROM t1
> +GROUP BY aa;
> +aa	COUNT(DISTINCT b)
> +4	4
> +CREATE TABLE t2 SELECT DISTINCT a FROM t1;
> +# originally reported queries (1st two columns of next two query
> +# results should be same):
> +SELECT (SELECT t2.a FROM t2 WHERE t2.a = t1.a) aa, b, COUNT(DISTINCT b)
> +FROM t1 GROUP BY aa, b;
> +aa	b	COUNT(DISTINCT b)
> +1	10	1
> +2	20	1
> +3	30	1
> +4	40	1
> +SELECT (SELECT t2.a FROM t2 WHERE t2.a = t1.a) aa, b, COUNT(         b)
> +FROM t1 GROUP BY aa, b;
> +aa	b	COUNT(         b)
> +1	10	1
> +2	20	2
> +3	30	1
> +4	40	1
> +DROP TABLE t1, t2;
> +#
> +# End of 5.1 tests
> 
> === modified file 'mysql-test/t/group_by.test'
> --- a/mysql-test/t/group_by.test	2009-02-26 17:17:06 +0000
> +++ b/mysql-test/t/group_by.test	2009-11-14 14:11:54 +0000
> @@ -1158,3 +1158,46 @@ SELECT COUNT(i) FROM t1 WHERE i > 1;
>  DROP TABLE t1;
>  SET @@sql_mode = @old_sql_mode;
>  
> +--echo #
> +--echo # Bug #45640: optimizer bug produces wrong results
> +--echo #
> +
> +CREATE TABLE t1 (a INT, b INT);
> +INSERT INTO t1 VALUES (4, 40), (1, 10), (2, 20), (2, 20), (3, 30);
> +
> +--echo # should return 4 ordered records:
> +SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) FROM t1 GROUP BY aa;
> +
> +SELECT (SELECT (SELECT t1.a)) aa, COUNT(DISTINCT b) FROM t1 GROUP BY aa;
> +
> +SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) FROM t1 GROUP BY aa+0;
> +
> +--echo # should return the same result in a reverse order:
> +SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) FROM t1 GROUP BY -aa;
> +
> +--echo # execution plan should not use temporary table:
> +EXPLAIN EXTENDED
> +SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) FROM t1 GROUP BY aa+0;
> +
> +EXPLAIN EXTENDED
> +SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) FROM t1 GROUP BY -aa;
> +
> +--echo # should return only one record
> +SELECT (SELECT tt.a FROM t1 tt LIMIT 1) aa, COUNT(DISTINCT b) FROM t1
> +  GROUP BY aa;
> +
> +CREATE TABLE t2 SELECT DISTINCT a FROM t1;
> +
> +--echo # originally reported queries (1st two columns of next two query
> +--echo # results should be same):
> +
> +SELECT (SELECT t2.a FROM t2 WHERE t2.a = t1.a) aa, b, COUNT(DISTINCT b)
> +  FROM t1 GROUP BY aa, b;
> +SELECT (SELECT t2.a FROM t2 WHERE t2.a = t1.a) aa, b, COUNT(         b)
> +  FROM t1 GROUP BY aa, b;
> +
> +DROP TABLE t1, t2;
> +
> +--echo #
> +
> +--echo # End of 5.1 tests
> 
> === modified file 'sql/item.cc'
> --- a/sql/item.cc	2009-11-06 19:42:24 +0000
> +++ b/sql/item.cc	2009-11-14 14:11:54 +0000
> @@ -4300,17 +4300,34 @@ bool Item_field::fix_fields(THD *thd, It
>                It's not an Item_field in the select list so we must make a new
>                Item_ref to point to the Item in the select list and replace the
>                Item_field created by the parser with the new Item_ref.
> +
> +              NOTE: If we are fixing an alias reference inside ORDER/GROUP BY
> +              item tree, then we use new Item_ref as an intermediate value
> +              to resolve referenced item only.
> +              In this case the new Item_ref item is unused further (and it's
> +              memory-leaked - this is OK at this phase).
It's not leaked, it will be freed at the end of the statement. I'ts just unused.
>              */
>              Item_ref *rf= new Item_ref(context, db_name,table_name,field_name);
>              if (!rf)
>                return 1;
> -            thd->change_item_tree(reference, rf);
> +
> +            bool save_group_fix_field=
> thd->lex->current_select->group_fix_field;
>              /*
> -              Because Item_ref never substitutes itself with other items 
> -              in Item_ref::fix_fields(), we can safely use the original 
> -              pointer to it even after fix_fields()
> -             */
> -            return rf->fix_fields(thd, reference) ||  rf->check_cols(1);
> +              No need for recursive resolving of aliases.
> +            */
> +            thd->lex->current_select->group_fix_field= 0;
> +
> +            bool ret= rf->fix_fields(thd, (Item **) &rf) ||
> rf->check_cols(1);
> +            thd->lex->current_select->group_fix_field=
> save_group_fix_field;
> +            if (ret)
> +              return TRUE;
> +
> +            if (save_group_fix_field && alias_name_used)
> +              thd->change_item_tree(reference, *rf->ref);
> +            else
> +              thd->change_item_tree(reference, rf);
> +
> +            return FALSE;
>            }
>          }
>        }
> @@ -6375,6 +6392,22 @@ bool Item_outer_ref::fix_fields(THD *thd
>  }
>  
>  
> +bool Item_outer_ref::walk(Item_processor processor, bool walk_subquery,
> +                          uchar *arg)
> +{ 
> +  if (processor == &Item::find_outer_ref_processor && this == (void *)
> arg)
> +  {
> +    /*
> +      Work-around depth-first walk() algorithm and terminate the search there
> +      if this is an Item_outer_ref item we are seeking for.
> +    */
> +    return TRUE;
> +  }
> +
> +  return (*ref)->walk(processor, walk_subquery, arg); 
> +}
There is no need in such tweaking. The Item_outer_ref object is created for
fields only, thus short-cutting it here wouldn't give us any practical benefit.
> +
> +
>  /**
>    Compare two view column references for equality.
>  
> 
> === modified file 'sql/item.h'
> --- a/sql/item.h	2009-11-06 19:42:24 +0000
> +++ b/sql/item.h	2009-11-14 14:11:54 +0000
> @@ -894,6 +894,7 @@ public:
>    virtual bool change_context_processor(uchar *context) { return 0; }
>    virtual bool reset_query_id_processor(uchar *query_id_arg) { return 0; }
>    virtual bool is_expensive_processor(uchar *arg) { return 0; }
> +  virtual bool find_outer_ref_processor(uchar *arg) { return 0; }
It's better to make processor return this == arg, than tweaking
Item_outer_ref::walk.
>    virtual bool register_field_in_read_map(uchar *arg) { return 0; }
>    /*
>      Check if a partition function is allowed
> @@ -2371,6 +2372,7 @@ public:
>      return (*ref)->const_item() ? 0 : OUTER_REF_TABLE_BIT;
>    }
>    virtual Ref_Type ref_type() { return OUTER_REF; }
> +  bool walk(Item_processor processor, bool walk_subquery, uchar *arg);
>  };
>  
>  
> 
> === modified file 'sql/mysql_priv.h'
> --- a/sql/mysql_priv.h	2009-10-27 13:20:34 +0000
> +++ b/sql/mysql_priv.h	2009-11-14 14:11:54 +0000
> @@ -1188,7 +1188,7 @@ int setup_group(THD *thd, Item **ref_poi
>  		List<Item> &fields, List<Item> &all_fields, ORDER *order,
>  		bool *hidden_group_fields);
>  bool fix_inner_refs(THD *thd, List<Item> &all_fields, SELECT_LEX *select,
> -                   Item **ref_pointer_array);
> +                   Item **ref_pointer_array, ORDER *group_list= NULL);
>  
>  bool handle_select(THD *thd, LEX *lex, select_result *result,
>                     ulong setup_tables_done_option);
> 
> === modified file 'sql/sql_lex.cc'
> --- a/sql/sql_lex.cc	2009-09-17 15:25:52 +0000
> +++ b/sql/sql_lex.cc	2009-11-14 14:11:54 +0000
> @@ -1592,6 +1592,7 @@ void st_select_lex::init_query()
>    having= prep_having= where= prep_where= 0;
>    olap= UNSPECIFIED_OLAP_TYPE;
>    having_fix_field= 0;
> +  group_fix_field= 0;
>    context.select_lex= this;
>    context.init();
>    /*
> 
> === modified file 'sql/sql_lex.h'
> --- a/sql/sql_lex.h	2009-09-28 12:41:10 +0000
> +++ b/sql/sql_lex.h	2009-11-14 14:11:54 +0000
> @@ -647,6 +647,8 @@ public:
>    bool  braces;   	/* SELECT ... UNION (SELECT ... ) <- this braces */
>    /* TRUE when having fix field called in processing of this SELECT */
>    bool having_fix_field;
> +  /* TRUE when GROUP BY fix field called in processing of this SELECT */
> +  bool group_fix_field;
>    /* List of references to fields referenced from inner selects */
>    List<Item_outer_ref> inner_refs_list;
>    /* Number of Item_sum-derived objects in this SELECT */
> 
> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc	2009-11-13 11:22:39 +0000
> +++ b/sql/sql_select.cc	2009-11-14 14:11:54 +0000
> @@ -287,6 +287,7 @@ bool handle_select(THD *thd, LEX *lex, s
>      all_fields        List of all fields used in select
>      select            Current select
>      ref_pointer_array Array of references to Items used in current select
> +    group_list        GROUP BY list
Please, add a note that the group_list is NULL by default.
>  
>    DESCRIPTION
>      The function serves 3 purposes - adds fields referenced from inner
> @@ -305,6 +306,8 @@ bool handle_select(THD *thd, LEX *lex, s
>        function is aggregated in the select where the outer field was
>        resolved or in some more inner select then the Item_direct_ref
>        class should be used.
> +      Also it should be used if we are grouping by a subquery containing
> +      the outer field.
>      The resolution is done here and not at the fix_fields() stage as
>      it can be done only after sum functions are fixed and pulled up to
>      selects where they are have to be aggregated.
> @@ -321,7 +324,7 @@ bool handle_select(THD *thd, LEX *lex, s
>  
>  bool
>  fix_inner_refs(THD *thd, List<Item> &all_fields, SELECT_LEX *select,
> -                 Item **ref_pointer_array)
> +                 Item **ref_pointer_array, ORDER *group_list /*= NULL*/)
Please, remove /*= NULL */ as it's confusing.
>  {
>    Item_outer_ref *ref;
>    bool res= FALSE;
> @@ -371,6 +374,22 @@ fix_inner_refs(THD *thd, List<Item> &all
>          }
>        }
>      }
> +    else
> +    {
> +      /*
> +        Check if GROUP BY item trees contain the outer ref:
> +        in this case we have to use Item_direct_ref insted of Item_ref.
*instead
> +      */
> +      for (ORDER *group= group_list; group; group= group->next)
> +      {
> +        if ((*group->item)->walk(&Item::find_outer_ref_processor, TRUE,
> +                                 (uchar *) ref))
> +        {
> +          direct_ref= TRUE;
> +          break;
> +        }
> +      }
> +    }
>      new_ref= direct_ref ?
>                new Item_direct_ref(ref->context, item_ref, ref->table_name,
>                            ref->field_name, ref->alias_name_used) :
> @@ -564,7 +583,8 @@ JOIN::prepare(Item ***rref_pointer_array
>    }
>  
>    if (select_lex->inner_refs_list.elements &&
> -      fix_inner_refs(thd, all_fields, select_lex, ref_pointer_array))
> +      fix_inner_refs(thd, all_fields, select_lex, ref_pointer_array,
> +                     group_list))
>      DBUG_RETURN(-1);
>  
>    if (group_list)
> @@ -14506,11 +14526,29 @@ find_order_in_list(THD *thd, Item **ref_
>  
>      We check order_item->fixed because Item_func_group_concat can put
>      arguments for which fix_fields already was called.
> +    
> +    group_fix_field= TRUE is to resolve aliases from the SELECT list
> +    without creating of Item_ref-s: JOIN::exec() wraps aliased items
> +    in SELECT list with Item_copy items. To re-evaluate such a tree
> +    that includes Item_copy items we have to refresh Item_copy caches,
> +    but:
> +      - filesort() never refresh Item_copy items,
> +      - end_send_group() checks every record for group boundary by the
> +        test_if_group_changed function that obtain data from these
> +        Item_copy items, but the copy_fields function that
> +        refreshes Item copy items is called after group boundaries only -
> +        that is a vicious circle.
> +    So we prevent inclusion of Item_copy items.
>    */
> -  if (!order_item->fixed &&
> +  bool save_group_fix_field= thd->lex->current_select->group_fix_field;
> +  if (is_group_field)
> +    thd->lex->current_select->group_fix_field= TRUE;
> +  bool ret= (!order_item->fixed &&
>        (order_item->fix_fields(thd, order->item) ||
>         (order_item= *order->item)->check_cols(1) ||
> -       thd->is_fatal_error))
> +       thd->is_fatal_error));
> +  thd->lex->current_select->group_fix_field= save_group_fix_field;
> +  if (ret)
>      return TRUE; /* Wrong field. */
>  
>    uint el= all_fields.elements;
> 
> 
> 
> ------------------------------------------------------------------------
> 
> 


Thread
bzr commit into mysql-5.1-bugteam branch (gshchepa:3201) Bug#45640Gleb Shchepa14 Nov
  • Re: bzr commit into mysql-5.1-bugteam branch (gshchepa:3201) Bug#45640Evgeny Potemkin22 Dec