Hello Evgeny,
Evgeny Potemkin a écrit, Le 05.11.2009 14:20:
> Hi Guilhem,
>
> Please, see my comments inline. Let me know if you don't have any new
> comments on this fix and I'll re-commit the patch.
>
> Regards, Evgen.
> Guilhem Bichot wrote:
>> Hello Evgeny,
>>
>> Evgeny Potemkin a écrit, Le 13.10.2009 11:38:
>>> #At file:///work/bzrroot/45174-bug-azalea/ based on
>>> revid:alik@stripped
>>>
>>> 2814 Evgeny Potemkin 2009-10-13
>>> Bug#45174: Incorrectly applied equality propagation caused
>>> wrong result
>>> on a query with a materialized semi-join.
>> I have a more minimal testcase, which I used to try to understand the
>> problem, and which is like your one above but with less rows. If you
>> are interested, here are the INSERTs:
>> INSERT INTO CC VALUES
>>
> (19,'b','b'),(20,'w','w'),(21,'m','m'),(23,'',''),(24,'w','w'),(26,'e','e'),(27,'e','e'),(28,'p','p');
>
>>
>> INSERT INTO C VALUES ('b'),('u');
>> Then the bug manifests - the SELECT returns 'b' instead of 0 rows.
>>
>> Some debugging info which may be interesting: without the patch,
>> when we enter make_join_select(), "cond" contains the XOR which now has
>> a column from C as one argument (due to equality propagation), then the
>> XOR is lost (I don't clearly see how): the condition attached to CC's
>> JOIN_TAB is only that its varchar columns should be equal.
>> With your patch, when we enter make_join_select(), "cond" contains the
>> XOR which has only columns from CC as arguments; then the condition
>> attached to CC's JOIN_TAB includes the XOR, as is correct.
> ???
Well, this was just to share some debugging info which I had about this bug.
>>> === modified file 'sql/item_cmpfunc.cc'
>>> --- a/sql/item_cmpfunc.cc 2009-06-09 16:53:34 +0000
>>> +++ b/sql/item_cmpfunc.cc 2009-10-13 09:38:46 +0000
>>> @@ -5376,7 +5376,7 @@ longlong Item_equal::val_int()
>>>
>>> void Item_equal::fix_length_and_dec()
>>> {
>>> - Item *item= get_first();
>>> + Item *item= get_first(NULL);
>>> eval_item= cmp_item::get_comparator(item->result_type(),
>>> item->collation.collation);
>>> }
>>> @@ -5439,3 +5439,115 @@ void Item_equal::print(String *str, enum
>>> str->append(')');
>>> }
>>>
>>> +
>>> +/*
>>> + @brief Get the first equal field of multiple equality.
>>> + @param[in] field the field to get equal field to
>>> +
>>> + @details Get the first field of multiple equality that is equal to
>>> the
>>> + given field. In order to make semi-join materialization strategy work
>>> + correctly we can't propagate equal fields from upper select to a
>>> + materialized semi-join.
>>
>> My impression is that there is already code with a similar goal: it is
>> in eliminate_item_equal(), starting with this comment:
>> /*
>> item_field might refer to a table that is within a semi-join
>> materialization nest. In that case, the join order looks like
>> this:
>>
>> outer_tbl1 outer_tbl2 SJM (inner_tbl1 inner_tbl2) outer_tbl3
>>
>> We must not construct equalities like
>>
>> outer_tbl1.col = inner_tbl1.col
>>
>> because they would get attached to inner_tbl1 and will get
>> evaluated
>> during materialization phase, when we don't have current value of
>> outer_tbl1.col.
>> */
>>
>> This comment seems to be about our testcase, doesn't it?
>> If this code has the goal to fix our testcase, maybe it just doesn't
>> work and should be fixed?
>> I worry about adding a new solution instead of repairing the original
>> solution.
>> I may also be very wrong, because in spite of my efforts, I don't
>> understand this code well. Maybe this existing code has nothing to do
>> with our testcase.
> eliminate_item_equal is used to construct equalities from item_equal.
> but it doesn't work for '<','>',etc. Thus it's a partial fix for the
> problem.
> In our case we have (f1,f2) IN (select f11,f22 WHERE f11<const...)
> transformed to semi-join and having f1=f11 and f2=f22 added to WHERE.
> Without this fix f1 is allowed to replace f11 in f11<const while it
> shouldn't
> and eliminate_item_equal can't help us here since '<' isn't an equality.
> The comment itself is correct and describes a case similar to ours but
> for equalities.
In that case, shouldn't we have a solution similar to
eliminate_item_equal() but for < ?
I wonder about having eliminate_item_equal() for == and something else
(the patch) for < . Is it as uniform as we can do?
Again forgive my questions given my ignorance.
>>> +Item_field* Item_equal::get_first(Item_field *field)
>>> +{
>>> + List_iterator<Item_field> it(fields);
>>> + Item_field *item;
>>> + JOIN_TAB *field_tab;
>>> +
>>> + if (!field)
>>> + return fields.head();
>>> + /*
>>> + Of all equal fields, return the first one we can use. Normally,
>>> this is the
>>> + field which belongs to the table that is the first in the join
>>> order.
>>> +
>>> + There is one exception to this: When semi-join materialization
>>> strategy is
>>> + used, and the given field belongs to a table within the
>>> semi-join nest, we
>>> + must pick the first field in the semi-join nest.
>>> +
>>> + Example: suppose we have a join order:
>>> +
>>> + ot1 ot2 SJ-Mat(it1 it2 it3) ot3
>>> +
>>> + and equality ot2.col = it1.col = it2.col
>>> + If we're looking for best substitute for 'it2.col', we should
>>> pick it1.col
>>> + and not ot2.col.
>>
>> Please add an explanation of why we should not pick ot2.col.
> Added.
Ok, I'll check it in the commit.
>>> + }
>>> + else
>>> + {
>>> + /*
>>> + The field is not in SJ-Materialization nest. We must return
>>> the first
>>> + field that's not embedded in a SJ-Materialization nest.
>>> + Example: suppose we have a join order:
>>> +
>>> + SJ-Mat(it1 it2) ot1 ot2
>>> +
>>> + and equality ot2.col = ot1.col = it2.col
>>> + If we're looking for best substitute for 'ot2.col', we should
>>> pick ot1.col
>>> + and not it2.col, because when we run a join between ot1 and ot2
>>> + execution of SJ-Mat(...) has already finished and we can't
>>> rely on the
>>> + value of it*.*.
>>> + */
>>> + while ((item= it++))
>>> + {
>>
>> Here I'm not sure this new code is properly covered by tests.
>> I executed only t/subselect*.test, so maybe I missed something. In all
>> of those tests, when we came into this branch, what was returned was
>> identical to fields.head() (i.e. the while() stopped at first
>> iteration), so at least with subselect*.test there is no difference
>> between this code and a simple "return fields.head()".
>> Could you please commit a testcase where below we stop at the Nth
>> iteration where N>=2? Or just point to a current test where this happens.
> I think there is no such testcase yet, I'll commit one. This part came
> up after discussion with SergeyP.
Ok, I'll check it in the commit.
>>> === modified file 'sql/sql_select.h'
>>> --- a/sql/sql_select.h 2009-05-07 20:48:24 +0000
>>> +++ b/sql/sql_select.h 2009-10-13 09:38:46 +0000
>>> @@ -274,6 +274,13 @@ typedef struct st_join_table
>>> /* NestedOuterJoins: Bitmap of nested joins this table is part of */
>>> nested_join_map embedding_map;
>>>
>>> + /*
>>> + Semi-join strategy to be used for this join table. This is a
>>> copy of
>>> + POSITION::sj_strategy field. This field is set up by the
>>> + fix_semijion_strategies_for_picked_join_order.
>>
>> s/jion/join
>>
>> Are we sure that the read of this new member by
>> Item_equal::get_first()
>> is always preceded by a set in
>> fix_semijoin_strategies_for_picked_join_order()?
> Yes, fix_semijoin_strategies_for_picked_join_order is called at the end
> of make_join_statistics and Item_equal elimination/substitution is done
> afterwards.
>> To enforce this rule, I suggest:
>> * initialize it to an impossible value (-1?) when constructing the
>> JOIN_TAB
>> * Add DBUG_ASSERT(sj_strategy != -1) in Item_equal::get_first().
>> Regarding what place is "constructing the JOIN_TAB": I see the
>> constructor for JOIN_TAB is empty, so I suspect that there must already
>> exist some function which initializes JOIN_TABs
>> (JOIN::make_simple_join()?), where we could add initialization of
>> sj_strategy to -1.
>
> fix_semijoin_strategies_for_picked_join_order is the best place for that.
> There is a loop which goes over all join_tabs and fixes sj info for them,
> including setting correct pos->sj_strategy and later this value is copied
> to corresponding join_tab.
> However it's not done for const tables, I'll add code for this.
Thanks (I guess you mean you'll add the init-to-minus-1 and the assert).
I'll check it in the commit mail.
>>
>>> + */
>>> + uint sj_strategy;
>>> +
>>> void cleanup();
>>> inline bool is_using_loose_index_scan()
>>> {
>>
>> It is not your fault *at all*, but it is quite frustrating to be
>> reviewer of some patch to some code which I don't understand enough,
>> even though I try with gdb. I'm not used to doing reviews "in
>> blindness". I guess it's going to be this way with Optimizer bugs for
>> a long time so I have to get used to it :-( Still quite worrying, though.
>>
>> Also, I'd be interested in your reply on Timour's first comment in his
>> review email.
> I haven't replied with an email but just committed a new patch with
> objections addressed.
I wonder about this comment he made (found in
http://lists.mysql.com/commits/85548):
"Then why this limitation applies
only to materialized semi-joins? Did you analyze all other semi-join
strategies
with respect to the same problem? Is it clear if and why with all other
semi-join
strategies it is possible to execute equalities that relate outer with inner
tables?"
--
Mr. Guilhem Bichot <guilhem@stripped>
Sun Microsystems / MySQL, Lead Software Engineer
Bordeaux, France
www.sun.com / www.mysql.com