List:Commits« Previous MessageNext Message »
From:Øystein Grøvlen Date:February 10 2011 10:51am
Subject:Re: bzr commit into mysql-trunk branch (roy.lyseng:3338) Bug#59833
View as plain text  
Hi Roy,

Thanks for taking the time to understand the reasoning behind the
suspicious statement in make_cond_for_table_from_pred(), for explaining
it so well, and for cleaning this up.  Good to see that it also
corrects many misplaced predicates in query plans.

However, I wonder whether the best solution would be to pull this
check out of make_cond_for_table_from_pred().  It assumes a particular
usage of this function, that is not common to all places it is used.
I think it would be better if one could to extract the constant
expensive parts of the condition and add that to the first table
condition explicitly, instead of using this special trick. However, I
see that this is not trivial.

Another question, does the patch assume that the expensive condition
is a constant condition?  I am not quite sure, but I think it should
work OK even if it is non-constant.

More comments inline.

On 02/ 4/11 12:29 PM, Roy Lyseng wrote:
 > #At file:///home/rl136806/mysql/repo/mysql-work5/ based on 
revid:jorgen.loland@stripped
 >
 >   3338 Roy Lyseng	2011-02-04
 >        Bug#59833: materialization=on/off leads to different result 
set when using IN

...

 > === modified file 'mysql-test/r/subquery_all.result'
 > --- a/mysql-test/r/subquery_all.result	2011-02-02 15:05:14 +0000
 > +++ b/mysql-test/r/subquery_all.result	2011-02-04 11:28:14 +0000
 > @@ -4454,7 +4454,7 @@ from t2;
 >   id	select_type	table	type	possible_keys	key	key_len	ref	rows 
filtered	Extra
 >   1	PRIMARY	t2	ALL	NULL	NULL	NULL	NULL	2	100.00	
 >   2	DEPENDENT SUBQUERY	t1	ref_or_null	a	a	5	func	2	100.00	Using 
where; Full scan on NULL key
 > -2	DEPENDENT SUBQUERY	t4	ALL	NULL	NULL	NULL	NULL	100	100.00	Using 
where; Using join buffer (BNL, incremental buffers)
 > +2	DEPENDENT SUBQUERY	t4	ALL	NULL	NULL	NULL	NULL	100	100.00	Using 
join buffer (BNL, incremental buffers)
 >   Warnings:
 >   Note	1276	Field or reference 'test.t2.oref' of SELECT #2 was 
resolved in SELECT #1
 >   Note	1003	select `test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS 
`b`,`test`.`t2`.`oref` AS 
`oref`,<in_optimizer>((`test`.`t2`.`a`,`test`.`t2`.`b`),<exists>(select 
`test`.`t1`.`a`,`test`.`t1`.`b` from `test`.`t1` join `test`.`t4` where 
((`test`.`t1`.`c` = `test`.`t2`.`oref`) and 
trigcond(((<cache>(`test`.`t2`.`a`) = `test`.`t1`.`a`) or 
isnull(`test`.`t1`.`a`))) and trigcond(((<cache>(`test`.`t2`.`b`) = 
`test`.`t1`.`b`) or isnull(`test`.`t1`.`b`)))) having 
(trigcond(<is_not_null_test>(`test`.`t1`.`a`)) and 
trigcond(<is_not_null_test>(`test`.`t1`.`b`))))) AS `Z` from `test`.`t2`
 > @@ -5126,7 +5126,7 @@ FROM t3 WHERE 1 = 0 GROUP BY 1;
 >   id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
 >   1	PRIMARY	NULL	NULL	NULL	NULL	NULL	NULL	NULL	Impossible WHERE
 >   2	DEPENDENT SUBQUERY	t1	index	NULL	PRIMARY	4	NULL	2	Using index
 > -2	DEPENDENT SUBQUERY	t2	index	b	b	5	NULL	2	Using where; Using index; 
Using join buffer (BNL, incremental buffers)
 > +2	DEPENDENT SUBQUERY	t2	ALL	b	NULL	NULL	NULL	2	Range checked for 
each record (index map: 0x2)

IIUC, the above means that the index will be considered used on a
row-to-row basis. I guess that is OK.  (It seems to me that type=ALL
is a bit misleading here, but that is not your fault.)

...

 > === modified file 'mysql-test/r/subquery_sj_mat_nosj.result'
 > --- a/mysql-test/r/subquery_sj_mat_nosj.result	2011-01-27 11:38:22 +0000
 > +++ b/mysql-test/r/subquery_sj_mat_nosj.result	2011-02-04 11:28:14 +0000
 > @@ -5297,7 +5297,7 @@ where t1.uid in (select t4.uid from t4,
 >   and t2.uid=t1.fid;
 >   id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
 >   1	PRIMARY	t1	ALL	NULL	NULL	NULL	NULL	11	Using where
 > -1	PRIMARY	t2	eq_ref	PRIMARY	PRIMARY	4	test.t1.fid	1	Using where
 > +1	PRIMARY	t2	eq_ref	PRIMARY	PRIMARY	4	test.t1.fid	1	
 >   2	SUBQUERY	t3	ref	uid	uid	5	const	4	Using where
 >   2	SUBQUERY	t4	eq_ref	PRIMARY	PRIMARY	4	test.t3.fid	1	Using index
 >   select name from t2, t1

The above query plan seems a bit strange to me, but I am not sure it
is related to your work.  The query in question is

select name from t2, t1
where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
and t2.uid=t1.fid;

Given the join order, I do not quite understand what condition can be
checked on t1 and what condition was previously checked on t2.  It
seems all conditions are related to either t3 or t4.

 > === modified file 'sql/sql_select.cc'
 > --- a/sql/sql_select.cc	2011-02-02 13:48:49 +0000
 > +++ b/sql/sql_select.cc	2011-02-04 11:28:14 +0000
 > @@ -9505,12 +9505,16 @@ static bool pushdown_on_conditions(JOIN*
 >       */
 >       Item *on_expr= *first_inner_tab->on_expr_ref;
 >
 > -    table_map used_tables= (join->const_table_map |
 > -                            OUTER_REF_TABLE_BIT | RAND_TABLE_BIT);
 > +    table_map used_tables= (table_map)0;
 > +
 >       for (JOIN_TAB *join_tab= join->join_tab+join->const_tables;
 >            join_tab<= last_tab ; join_tab++)
 >       {
 >         table_map current_map= join_tab->table->map;
 > +      if (join_tab == join->join_tab+join->const_tables)

Does not coding standard require space around '+'?

 > +        current_map|= join->const_table_map | OUTER_REF_TABLE_BIT;
 > +      if (join_tab == last_tab)
 > +        current_map|= RAND_TABLE_BIT;
 >         used_tables|= current_map;
 >         Item *tmp_cond= make_cond_for_table(on_expr, used_tables, 
current_map, 0);

I found it a bit confusing at first that the mapping between actual
and formal parameters here is:
   used_tables -> tables
   current_map -> used_table

I think we should consider renaming the parameters of
make_cond_for_table_from_pred(), but I guess that should probably be
done in separate patch.

 >         if (!tmp_cond)
 > @@ -9567,7 +9571,6 @@ static bool make_join_select(JOIN *join,
 >     DBUG_ENTER("make_join_select");
 >     {
 >       add_not_null_conds(join);
 > -    table_map used_tables;
 >       /*
 >         Step #1: Extract constant condition
 >          - Extract and check the constant part of the WHERE
 > @@ -9634,22 +9637,34 @@ static bool make_join_select(JOIN *join,
 >       /*
 >         Step #2: Extract WHERE/ON parts
 >       */
 > +    table_map used_tables= 0;
 >       table_map save_used_tables= 0;
 > -    used_tables= join->const_table_map | OUTER_REF_TABLE_BIT | 
RAND_TABLE_BIT;
 > -    JOIN_TAB *tab;
 > -    table_map current_map;
 >       for (uint i=join->const_tables ; i<  join->tables ; i++)
 >       {
 > -      tab= join->join_tab+i;
 > +      JOIN_TAB *tab= join->join_tab+i;
 >         /*
 >           first_inner is the X in queries like:
 >           SELECT * FROM t1 LEFT OUTER JOIN (t2 JOIN t3) ON X
 >         */
 >         JOIN_TAB *first_inner_tab= tab->first_inner;
 > -      current_map= tab->table->map;
 >         bool use_quick_range=0;
 >         Item *tmp;
 >
 > +      /*
 > +        Calculate used table information added at this stage.
 > +        The current table is always added. Const tables are assumed 
to be
 > +        available together with the first table in the join order.
 > +        All outer references are available, so these may be 
evaluated together
 > +        with the first table.
 > +	Random expressions must be added to the last table's condition.
 > +	It solves problem with queries like SELECT * FROM t1 WHERE rand()>  0.5
 > +      */
 > +      table_map current_map= tab->table->map;
 > +      if (i == join->const_tables)
 > +	current_map|= OUTER_REF_TABLE_BIT | join->const_table_map;
 > +      if (i == join->tables-1)
 > +	current_map|= RAND_TABLE_BIT;
 > +

I agree with Olav's comments about being consistent about the order
operator arguments (and anout TAB).  There is a bit of duplication of
code here between pushdown_on_condition() and make_join_select(), but I
see that it is not trivial to avoid that.

 >         /*
 >           Tables that are within SJ-Materialization nests cannot have 
their
 >           conditions referring to preceding non-const tables.
 > @@ -9660,17 +9675,10 @@ static bool make_join_select(JOIN *join,
 >             !(used_tables&  tab->emb_sj_nest->sj_inner_tables))
 >         {
 >           save_used_tables= used_tables;
 > -        used_tables= join->const_table_map | OUTER_REF_TABLE_BIT |
 > -                     RAND_TABLE_BIT;
 > +        used_tables= OUTER_REF_TABLE_BIT | join->const_table_map;
 >         }
 >
 > -      /*
 > -	Following force including random expression in last table condition.
 > -	It solve problem with select like SELECT * FROM t1 WHERE rand()>  0.5
 > -      */
 > -      if (i == join->tables-1)
 > -	current_map|= OUTER_REF_TABLE_BIT | RAND_TABLE_BIT;
 > -      used_tables|=current_map;
 > +      used_tables|= current_map;
 >
 >         if (tab->type == JT_REF&&  tab->quick&&
 >   	(uint) tab->ref.key == tab->quick->index&&
 > @@ -19144,10 +19152,25 @@ static bool replace_subcondition(JOIN *j
 >       specified in 'tables' bitmap are available.
 >       If 'used_table' is 0, extract conditions for all tables in 
'tables'.
 >
 > -    The function assumes that
 > -      - Constant parts of the condition has already been checked.
 > -      - Condition that could be checked for tables in 'tables' has 
already
 > -        been checked.

I think intention of the last sentence above should be kept in some way. 
  I guess we now could say: "Condition that could be checked for tables 
in 'tables', not including tables in used_table, has already been 
checked." Or something like that.

I guess the description of used_table needs to be updated since it 
implies that the map contains a single table (in addition to 
PSEUDO_TABLE_BITS).  However, it may also contain addtional const tables.

 > +    This function can be used to extract conditions relevant for a table
 > +    in a join order. It will ensure that all conditions are attached to
 > +    the first table in the join order where all necessary fields are
 > +    available, and it will also ensure that a given condition is 
attached
 > +    to only one table.

IIUC, correct this function does not attach condition to tables, it
just extracts conditions relevant to a table.  The actual attaching
Hence, I think it is a bit misleading to say that this function will
ensure that conditions are attached to a table.

Starting a new paragraph here, makes it less clear what "this" below
refers too.

 > +
 > +    To accomplish this, first initialize "tables" to the empty

In doxygen comments, it is common to use @c to prefix identifiers.

 > +    set. Then loop over all tables in the join order, set 
"used_table" to

I suggest comma after "Then"

 > +    the bit representing the current table, accumulate "used_table" 
into the
 > +    "tables" set, and call this function. To ensure correct handling of
 > +    const expressions and outer references, add the const table map and
 > +    OUTER_REF_TABLE_BIT to "used_table" for the first table. To ensure
 > +    that random expressions are evaluated for the final table, add
 > +    RAND to "used_table" for the final table.
 > +
 > +    The function assumes that constant, inexpensive parts of the 
condition
 > +    have already been checked. Constant, expensive parts will be 
attached
 > +    to the first table in the join order, provided that the above call
 > +    sequence is followed.
 >
 >       The function takes into account that some parts of the 
condition are
 >       guaranteed to be true by employed 'ref' access methods (the 
code that
 > @@ -19176,20 +19199,13 @@ make_cond_for_table_from_pred(Item *root
 >       Ignore this condition if
 >        1. We are extracting conditions for a specific table, and
 >        2. that table is not referenced by the condition, and
 > -     3. exclude constant conditions not checked at optimization time if
 > -        the table we are pushing conditions to is the first one.
 > -        As a result, such conditions are not considered as already 
checked
 > -        and will be checked at execution time, attached to the first 
table.
 > +     3. this is a constant condition not checked at optimization 
time and
 > +        this is not the first table we are extracting conditions for.
 > +       (Assuming that used_table == tables for the first table.)

The above description is not equal to the condition below.  The way it
is written, it sounds like the condition is only ignored if it is a
"constant condition not checked at optimization time".  However, that
is not the case.  I think it better describes the logic if the "and"
at the end of item 2, is replaced with "but not if" and the "not" in
"not the first table" is removed.

In addition, I do not think a expensive condition always will be a
constant condition.  AFAIK, calls to user-defined functions will also
be considered expensive.

 >     */
 >     if (used_table&& 
  // 1
 >         !(cond->used_tables()&  used_table)&& 
  // 2
 > -      /*
 > -        psergey: TODO: "used_table&  1" doesn't make sense in nearly any
 > -        context. Look at setup_table_map(), table bits reflect the order
 > -        the tables were encountered by the parser. Check what we should
 > -        replace this condition with.
 > -      */
 > -      !((used_table&  1)&&  cond->is_expensive()))                 
// 3
 > +      !(used_table == tables&&  cond->is_expensive()))              // 3

I think order of AND-arguments should be swapped to better match the
textual description.

-- 
Øystein
Thread
bzr commit into mysql-trunk branch (roy.lyseng:3338) Bug#59833Roy Lyseng4 Feb
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3338) Bug#59833Olav Sandstaa6 Feb
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3338) Bug#59833Roy Lyseng11 Feb
  • Re: bzr commit into mysql-trunk branch (roy.lyseng:3338) Bug#59833Øystein Grøvlen10 Feb
    • Re: bzr commit into mysql-trunk branch (roy.lyseng:3338) Bug#59833Roy Lyseng14 Feb