List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:October 7 2010 3:13pm
Subject:Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253)
Bug#45174 Bug#50019
View as plain text  
Hi Guilhem,

On 02.10.10 17.17, Guilhem Bichot wrote:
> Hello Roy,
>
> Thanks for a clearly commented patch.

And thanks for a thorough review.

>
> Roy Lyseng a écrit, Le 29.09.2010 16:34:
>> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
>> revid:tor.didriksen@stripped
>>
>> 3253 Roy Lyseng 2010-09-29
>> Bug#45174: Incorrectly applied equality propagation caused wrong result
>> on a query with a materialized semi-join.
>> Bug#50019: Wrong result for IN-query with materialization.
> ...
>> mysql-test/r/subquery_sj_none_jcl7.result
>> Bug#45174: Incorrectly applied equality propagation caused wrong result
>> on a query with a materialized semi-join.
>> Bug#50019: Wrong result for IN-query with materialization.
>> Results for two new tests added.
>> Some tests using semijoin materialization show that where clause
>> has moved from the outer query into the materialized inner query.
>> This is caused by the changed call to get_first() in
>> eliminate_item_equal().
>> Ex: select * from ot where a in(select b from it where b>0);
>> The clause "b>0" is now evaluated on the inner query materialization.
>> Performance-wise this is never worse when using MaterializeScan and
>> usually better for MaterializeLookup. For the latter strategy, the
>> best possible solution is probably to evaluate the clause in both
>> queries, this can be subject for a later feature development.
>
>> === modified file 'mysql-test/r/subquery_sj_mat.result'
>> --- a/mysql-test/r/subquery_sj_mat.result 2010-09-20 14:06:02 +0000
>> +++ b/mysql-test/r/subquery_sj_mat.result 2010-09-29 14:33:39 +0000
>> @@ -2176,7 +2176,7 @@ create table t3 (a int);
>> insert into t3 select A.a + 10*B.a from t0 A, t0 B;
>> explain select * from t3 where a in (select kp1 from t1 where kp1<20);
>> id select_type table type possible_keys key key_len ref rows Extra
>> -1 PRIMARY t3 ALL NULL NULL NULL NULL 100 Using where
>> +1 PRIMARY t3 ALL NULL NULL NULL NULL 100
>> 1 PRIMARY t1 range kp1 kp1 5 NULL 48 Using where; Using index; Materialize
>
> This one looks worrying, performance-wise. The disappearing "Using where" was
> probably a condition "a<20"; having it attached to t3 allowed to do less lookups
> in the materialized table.
>
> Looking at "handler%" values in "show status" says that after the patch, the
> SELECT used 105 Handler_read_key, instead of 25 before. Other statistics are
> unchanged. Handler_write stays at 40: there are 40 row writes into the tmp table
> (logical, there are 40 rows having kp1<20 in t1)
> Thus: after the patch, t3's rows are not filtered anymore with a<20, so the
> number of key lookups into the materialized table grows (handler_read_key).
> Handler_write is unchanged so nothing has changed for t1. Also confirmed by the
> unchanged EXPLAIN row for t1 ("range" and "using where").
>
> The commit comment says
>
> "Some tests using semijoin materialization show that where clause
> has moved from the outer query into the materialized inner query."
>
> but I think here it's not a move, only a loss.

As far as I can see, it is a move. The inner Where in the previous case seemed 
to be just a True predicate. However, index analysis occurs before equality 
propagation, so the "Using index" clause made the query perform well.

I created a new table without the index and reran the query. The result was a 
materialized table with 1000 rows instead of 40 rows.

What worries me is that in the existing code, the Where clause is moved into the 
outer query part unconditionally, meaning that if there is no index on the inner 
tables, it will be impossible to apply a filter predicate on the inner part. 
After this fix, a predicate on the outer query will remain on the outer query 
and a predicate on the inner query will remain on the inner query.

We could develop a feature to copy the inner query predicate to the outer query 
(or vice versa) in the case of a materialized semijoin, but now it is at least 
possible for the user to specify these constraints manually.
>
>> @@ -2277,7 +2277,7 @@ insert into t3 select A.a + 10*B.a, 'fil
>> explain select * from t3 where a in (select a from t2) and (a > 5 or a <
> 10);
>> id select_type table type possible_keys key key_len ref rows Extra
>> 1 PRIMARY t2 ALL NULL NULL NULL NULL 2 Using where; Materialize; Scan
>> -1 PRIMARY t3 ref a a 5 test.t2.a 1
>> +1 PRIMARY t3 ref a a 5 test.t2.a 1 Using where
>
> This one is worrying too.
> Before the patch, the condition "(a>5 or a<10) and a is not null" is pushed to
> the tmp table ("t2"), and thus nothing needs to be pushed to t3.
> After the patch, the condition "a is not null" is pushed to the tmp table, and
> the condition "a>5 or a<10" is pushed to t3. We can see that potentially this
> means writing more rows to the tmp table, again this is only a loss.
> This isn't visible in "show status like 'handler%'", because "a>5 or a<10" is
> true for all rows of t2 (which are '1' and '2') (so pushing or not pushing the
> condition changes nothing); but if I change this condition to "a>1", then I see
> the tmp table contains 1 row before the patch and 2 after the patch: more writes
> to the tmp table, and thus more reads when scanning it.
>
> I am unable to explain why the new Item_equal::get_first(field) causes this, but
> my impression is that even though your patch looks correct, it must cause
> something wrong somewhere when in interaction with some other piece of code. In
> the end there is a potential performance degradation. So I would vote for a
> detailed investigation of this, maybe the patch is correct but needs a bit more
> changes in some other interacting places.
>
> Note how "using where" is partial information: above, the condition pushed to
> the tmp table shrunk after the patch, but there was still something pushed, so
> there was still "using where", giving the impression of no change. What I do is
> break in the EXPLAIN code, where it prints "using where", and print
> tab->select->cond by doing:
>
> p tab->table->alias (i.e. what table we're looking at)
> call print_where(tab->select->cond, "", QT_ORDINARY)
>
> This prints to gdb's window. If using --debug, it prints to the debug trace,
> which needs to be explicitely flushed afterwards with
>
> call fflush(DBUG_FILE)
>
> And if --debug, gdb isn't needed, the condition are printed in make_join_select().
>
> Or the optimizer trace tree can be used, opt trace coontains pushed conditions.

I'll still have to analyze this.
>
>> === modified file 'sql/item_cmpfunc.cc'
>> --- a/sql/item_cmpfunc.cc 2010-08-19 12:54:22 +0000
>> +++ b/sql/item_cmpfunc.cc 2010-09-29 14:33:39 +0000
>> @@ -5736,3 +5736,106 @@ void Item_equal::print(String *str, enum
>> str->append(')');
>> }
>>
>> +
>> +/*
>> + @brief Get the first field of multiple equality.
>> +
>> + @retval First field in the multiple equality.
>> +*/
>
> For doxygen to pick those nice comments up, /** needs to be used.
> @brief is normally unneeded as the suggest doxyfile (which lives somewhere on
> forge.mysql.com it this hasn't changed has "autobrief" on.
> I'm not sure that keeping @brief is a problem though.

Will change.
>
>> +
>> +Item_field* Item_equal::get_first()
>> +{
>> + return fields.head();
>> +}
>> +
>> +/*
>> + @brief Get the first equal field of multiple equality.
>> +
>> + @param[in] field the field to get equal field to, which must be
>> + present within the multiple equality itself.
>
> [in] is implicit, but you can keep it.

Deleted.
>
>> + @retval Found first field in the multiple equality.
>> +
>> + @details Get the first field of multiple equality that is equal to the
>> + given field. In order to make semijoin materialization strategy work
>> + correctly we can't propagate equal fields between a materialized semijoin
>> + and the outer query (or any other semijoin).
>> + Thus the field is returned according to following rules:
>> +
>> + 1) If the given field belongs to a materialized semijoin then the
>> + first field in multiple equality which belong to the same semijoin
>> + is returned.
>> + 2) If the given field doesn't belong to a materialized semijoin then
>> + the first field in the multiple equality that doesn't belong to a
>> + materialized semijoin is returned.
>> +*/
>
> There is, in eliminate_item_equal(), some very resembling concern of
> "sj-mat vs equality choice", starting with "item_field might refer to a
> table that is within a semi-join". Interestingly, this piece uses
> get_first(void). I wonder whether we can unify this and the case you are
> dealing with.
> Also other pieces seem to be talking about the issue of "there is no
> such thing as a 'current row' with sj-mat": in
> setup_sj_materialization(), two pieces, one starting with "We'll be
> doing full scan" and one starting with "Tricks with Item_equal". I'm not sure
> those two are exactly your problem though.

I have changed the implementation of get_first() according to your suggestion. 
It looks correct to me.

Please have a look at the new fix and see if it addresses your concerns.
>
>> +Item_field* Item_equal::get_first(Item_field *field)
>
> Looking only at signatures and names:
> get_first(void)
> get_first(Item_field *field)
> it's not obvious to guess which of the two variants should be used.
> I would have loved to delete the first, but Item_equal::fix_length_and_dec()
> uses it...
> Maybe eliminate_item_equal() can be made to use get_first(field) instead
> of get_first(void)...

It seems that it can :)

> get_first(field) is really about "give me the first, but with tweaks for
> sj-mat" (as your comments explain well), it's hard to find a nice name
> for this. get_first_substitutable() ?

I tried to extend the comments for the two functions, instead of inventing new 
names...

> Or, should we delete get_first(void), keep only get_first(Item_field *field)
> which, if 'field' is NULL, would return the list's head?

Having one function was Evgen's solution, but I do not see any practical 
differences between the two. So the reason I chose this one is that it is more 
"C++ compliant".
>
>> +{
>> + List_iterator<Item_field> it(fields);
>> + Item_field *item;
>> + JOIN_TAB *field_tab;
>
> field_tab can be pointer to const

Ok.
>
>> +
>> + DBUG_ASSERT(field != NULL);
>> + /*
>> + Of all equal fields, return the first one we can use. Normally, this is the
>> + field which belongs to the table that is the first in the join order.
>> +
>> + There is one exception to this: When semijoin materialization strategy is
>> + used, and the given field belongs to a table within the semijoin nest, we
>> + must pick the first field in the semijoin nest.
>> +
>> + Example: suppose we have a join order:
>> +
>> + ot1 ot2 SJ-Mat(it1 it2 it3) ot3
>> +
>> + and equality ot2.col = it1.col = it2.col
>> + If we're looking for best substitute for 'it2.col', we should pick it1.col
>> + and not ot2.col.
>
> here you could explain why (what's the concrete problem if we don't do
> this: temp table gets too few rows because etc).

Yes, and it is because the match will occur during materialization, when the row 
from ot2 is not present.
>
>> + */
>> +
>> + field_tab= field->field->table->reginfo.join_tab;
>> + if (sj_is_materialize_strategy(field_tab->get_sj_strategy()))
>> + {
>> + /*
>> + It's a field from a materialized semijoin. We can substitute it only
>> + with a field from the same semijoin.
>> + */
>> + JOIN_TAB *first= field_tab->first_sj_inner_tab;
>> + JOIN_TAB *last= field_tab->last_sj_inner_tab;
>
> first and last can be pointers to const

Thanks
>
>> + while ((item= it++))
>> + {
>> + if (item->field->table->reginfo.join_tab >= first &&
>> + item->field->table->reginfo.join_tab <= last)
>> + {
>> + return item;
>> + }
>> + }
>> + }
>> + else
>> + {
>> + /*
>> + The field is not in a materialized semijoin nest. We must return
>> + the first field that's not embedded in a materialized semijoin nest.
>> + Example: suppose we have a join order:
>> +
>> + SJ-Mat(it1 it2) ot1 ot2
>> +
>> + and equality ot2.col = ot1.col = it2.col
>> + If we're looking for best substitute for 'ot2.col', we should pick ot1.col
>> + and not it2.col, because when we run a join between ot1 and ot2
>> + execution of SJ-Mat(...) has already finished and we can't rely on the
>> + value of it*.*.
>> + */
>> + while ((item= it++))
>> + {
>> + field_tab= item->field->table->reginfo.join_tab;
>> + if (!sj_is_materialize_strategy(field_tab->get_sj_strategy()))
>> + return item;
>
> This "else" branch should be tested; if I change
>  > + if (!sj_is_materialize_strategy(field_tab->get_sj_strategy()))
>  > + return item;
> to just
>  > + return item;
>
> tests pass with the exception that some plans change (getting back to what they
> were before the patch).
> I don't understand this "else" branch; to me the logic when SJ-Mat is before ot1
> and ot2, is:
> - the table is materialized
> - if sj-mat-scan: a full scan is started on the temp table; so at that
> moment, we do have a current row of the temp table, which could serve as
> a source for the equalities in ot1/ot2 (that is, I don't see that there
> is a problem to solve here)
> - if sj-mat-lookup: a lookup is done, again that yields a row.
>
> On the other hand, I understand the other case (the one before "else"):
> we cannot use the values of the current row of ot1 and ot2, it would
> restrict the temp table's content.

Fixed according to your suggestion.
>
>> === modified file 'sql/item_cmpfunc.h'
>> --- a/sql/item_cmpfunc.h 2010-08-12 00:26:10 +0000
>> +++ b/sql/item_cmpfunc.h 2010-09-29 14:33:39 +0000
>> @@ -1634,7 +1634,8 @@ public:
>> void add(Item_field *f);
>> uint members();
>> bool contains(Field *field);
>> - Item_field* get_first() { return fields.head(); }
>> + Item_field* get_first();
>> + Item_field* get_first(Item_field *field);
>
> those two functions could be "const".

Tried, but getting commpile errors.
>
>> === modified file 'sql/sql_select.cc'
>> --- a/sql/sql_select.cc 2010-09-28 12:35:50 +0000
>> +++ b/sql/sql_select.cc 2010-09-29 14:33:39 +0000
>> @@ -4239,14 +4227,18 @@ bool find_eq_ref_candidate(TABLE *table,
>> using Materialization or LooseScan to execute it.
>> RETURN - 0 - OK
>> - 1 - Out of memory error
>> + FALSE - OK
>> + TRUE - Out of memory error
>
> actually it can only return FALSE, so should be made "void".

Not any more :) There was a missing error check.
>
>> */
>>
>> -int pull_out_semijoin_tables(JOIN *join)
>> +bool pull_out_semijoin_tables(JOIN *join)
>> {
>
>> @@ -8663,31 +8656,69 @@ static bool get_best_combination(JOIN *j
>> if (j->type == JT_CONST)
>> continue; // Handled in make_join_stat..
>>
>> -
>> j->loosescan_match_tab= NULL; //non-nulls will be set later
>> j->ref.key = -1;
>> j->ref.key_parts=0;
>>
>> -
>> if (j->type == JT_SYSTEM)
>> continue;
>> - if (j->keys.is_clear_all() || !(keyuse=
> join->best_positions[tablenr].key)
>> || - (join->best_positions[tablenr].sj_strategy == SJ_OPT_LOOSE_SCAN))
>> + if (j->keys.is_clear_all() ||
>> + !(keyuse= join->best_positions[tableno].key) || +
>> (join->best_positions[tableno].sj_strategy == SJ_OPT_LOOSE_SCAN))
>> {
>> j->type=JT_ALL;
>> - j->index= join->best_positions[tablenr].loosescan_key;
>> - if (tablenr != join->const_tables)
>> + j->index= join->best_positions[tableno].loosescan_key;
>> + if (tableno != join->const_tables)
>> join->full_join=1;
>> }
>> else if (create_ref_for_key(join, j, keyuse, used_tables))
>> DBUG_RETURN(TRUE); // Something went wrong
>> }
>>
>> - for (i=0 ; i < table_count ; i++)
>> -
> join->map2table[join->join_tab[i].table->tablenr]=join->join_tab+i;
>> + for (uint tableno= 0; tableno < table_count; tableno++)
>> + join->map2table[join->join_tab[tableno].table->tablenr]=
>> + join->join_tab + tableno;
>> +
>> update_depend_map(join);
>> - DBUG_RETURN(0);
>> +
>> + for (uint tableno= join->const_tables; tableno < table_count; )
>> + {
>> + JOIN_TAB *tab= join->join_tab + tableno;
>> + POSITION *pos= join->best_positions + tableno;
>
> "pos" could be pointer to const.

Thanks.
>
>> +
>> + switch (pos->sj_strategy)
>> + {
>> + case SJ_OPT_NONE:
>> + tableno++;
>> + break;
>

Regards,
Roy
Thread
bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253) Bug#45174Bug#50019Roy Lyseng29 Sep
  • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253)Bug#45174 Bug#50019Guilhem Bichot2 Oct
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253)Bug#45174 Bug#50019Roy Lyseng5 Oct
    • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253)Bug#45174 Bug#50019Roy Lyseng7 Oct
      • Re: bzr commit into mysql-next-mr-bugfixing branch (roy.lyseng:3253)Bug#45174 Bug#50019Guilhem Bichot20 Oct