Hello Evgeny,
Evgeny Potemkin a écrit, Le 13.10.2009 11:38:
> #At file:///work/bzrroot/45174-bug-azalea/ based on
> revid:alik@stripped
>
> 2814 Evgeny Potemkin 2009-10-13
> Bug#45174: Incorrectly applied equality propagation caused wrong result
> on a query with a materialized semi-join.
>
> When a subquery is a subject to a semi-join optimization its tables are
> merged to the upper query and later they treated as usual tables.
> This allows a bunch of optimizations to be applied, equality
> propagation is among them. Equality propagation is done after query execution
> plan is chosen. It substitutes fields from tables being retrieved later for
> fields from tables being retrieved earlier. However it can't be applied as is
> to any semi-join table.
> The semi-join materialization strategy differs from other semi-join
> strategies that the data from materialized semi-join tables isn't used
> directly but saved to a temporary table first. The materialization isn't
> isolated is a separate step, it is done inline within the nested loop
> execution.
> When it comes to fetch rows from the first table in
> the block of materialized semi-join tables they are isolated and the
> sub_select function is called to materialize result and save it in the
> semi-join result table. Materialization is done once and later data from the
> semi-join result table is used.
> Due to this we can't substitute fields that belong to the semi-join
> for fields from outer query and vice versa.
>
> Example: suppose we have a join order:
>
> ot1 ot2 SJ-Mat(it1 it2 it3) ot3
>
> and equality ot2.col = it1.col = it2.col
> If we're looking for best substitute for 'it2.col', we should pick it1.col
> and not ot2.col.
>
> For a field that is not in a materialized semi-join we must pick a field
> that's not embedded in a materialized semi-join.
>
> Example: suppose we have a join order:
>
> SJ-Mat(it1 it2) ot1 ot2
>
> and equality ot2.col = ot1.col = it2.col
> If we're looking for best substitute for 'ot2.col', we should pick ot1.col
> and not it2.col, because when we run a join between ot1 and ot2
> execution of SJ-Mat(...) has already finished and we can't rely on the value
> of it*.*.
>
> Now the Item_equal::get_first function accepts as a parameter a field being
> substituted and checks whether it belongs to a materialized semi-join.
> Depending on the check result a field to substitute for or NULL is returned.
>
> The sj_strategy field is added to the st_join_table structure. It's a copy of
> the
> POSITION::sj_strategy field and is used to easy checks.
> @ mysql-test/r/subselect_sj.result
> A test case added for the bug#45174.
> @ mysql-test/r/subselect_sj_jcl6.result
> A test case added for the bug#45174.
> @ mysql-test/t/subselect_sj.test
> A test case added for the bug#45174.
> @ sql/item.cc
> Bug#45174: Incorrectly applied equality propagation caused wrong result
> on a query with a materialized semi-join.
> Now the Item_equal::get_first function accepts as a parameter a field being
> substituted.
> @ sql/item_cmpfunc.cc
> Bug#45174: Incorrectly applied equality propagation caused wrong result
> on a query with a materialized semi-join.
>
> Now the Item_equal::get_first function accepts a field being substituted and
> checks whether it belongs to a materialized semi-join. Depending on the
> check
> result a field to substitute for or NULL is returned.
> @ sql/item_cmpfunc.h
> Bug#45174: Incorrectly applied equality propagation caused wrong result
> on a query with a materialized semi-join.
>
> Now the Item_equal::get_first function accepts as a parameter a field being
> substituted.
> @ sql/sql_select.cc
> Bug#45174: Incorrectly applied equality propagation caused wrong result
> on a query with a materialized semi-join.
> The is_sj_materialization_strategy method is added to the JOIN_TAB class to
> check whether JOIN_TAB belongs to a materialized semi-join.
> @ sql/sql_select.h
> Bug#45174: Incorrectly applied equality propagation caused wrong result
> on a query with a materialized semi-join.
>
> The sj_strategy field is added to the st_join_table structure. It's a copy of
> the
> POSITION::sj_strategy field and is used to easy checks.
>
> modified:
> mysql-test/r/subselect_sj.result
> mysql-test/r/subselect_sj_jcl6.result
> mysql-test/t/subselect_sj.test
> sql/item.cc
> sql/item_cmpfunc.cc
> sql/item_cmpfunc.h
> sql/sql_select.cc
> sql/sql_select.h
> === modified file 'mysql-test/t/subselect_sj.test'
> --- a/mysql-test/t/subselect_sj.test 2009-03-19 17:03:58 +0000
> +++ b/mysql-test/t/subselect_sj.test 2009-10-13 09:38:46 +0000
> @@ -216,4 +216,42 @@ WHERE
> HAVING X > '2012-12-12';
> drop table t1, t2;
>
> +--echo #
> +--echo # Bug#45174: Incorrectly applied equality propagation caused wrong
> +--echo # result on a query with a materialized semi-join.
> +--echo #
> +
> +CREATE TABLE `CC` (
> + `pk` int(11) NOT NULL AUTO_INCREMENT,
> + `varchar_key` varchar(1) NOT NULL,
> + `varchar_nokey` varchar(1) NOT NULL,
> + PRIMARY KEY (`pk`),
> + KEY `varchar_key` (`varchar_key`)
> +);
> +
> +INSERT INTO `CC` VALUES
> (11,'m','m'),(12,'j','j'),(13,'z','z'),(14,'a','a'),(15,'',''),(16,'e','e'),(17,'t','t'),(19,'b','b'),(20,'w','w'),(21,'m','m'),(23,'',''),(24,'w','w'),(26,'e','e'),(27,'e','e'),(28,'p','p');
> +
> +CREATE TABLE `C` (
> + `varchar_nokey` varchar(1) NOT NULL
> +);
> +
> +INSERT INTO `C` VALUES
> ('v'),('u'),('n'),('l'),('h'),('u'),('n'),('j'),('k'),('e'),('i'),('u'),('n'),('b'),('x'),(''),('q'),('u');
You could name the tables t1/t2. This is more standard in tests, and
also has the advantage of being lowercase: as I found out (the hard
way), on platforms with case-insensitive filesystems like Windows and
Mac OS X, queries displayed by EXPLAIN EXTENDED use lowercase names. So
if you use C/CC, on Unix your result file will contain C/CC in EXPLAIN
EXTENDED, and c/cc on Windows, so test will fail somewhere.
I have a more minimal testcase, which I used to try to understand the
problem, and which is like your one above but with less rows. If you are
interested, here are the INSERTs:
INSERT INTO CC VALUES
(19,'b','b'),(20,'w','w'),(21,'m','m'),(23,'',''),(24,'w','w'),(26,'e','e'),(27,'e','e'),(28,'p','p');
INSERT INTO C VALUES ('b'),('u');
Then the bug manifests - the SELECT returns 'b' instead of 0 rows.
Some debugging info which may be interesting: without the patch,
when we enter make_join_select(), "cond" contains the XOR which now has
a column from C as one argument (due to equality propagation), then the
XOR is lost (I don't clearly see how): the condition attached to CC's
JOIN_TAB is only that its varchar columns should be equal.
With your patch, when we enter make_join_select(), "cond" contains the
XOR which has only columns from CC as arguments; then the condition
attached to CC's JOIN_TAB includes the XOR, as is correct.
> === modified file 'sql/item.cc'
> --- a/sql/item.cc 2009-06-09 16:53:34 +0000
> +++ b/sql/item.cc 2009-10-13 09:38:46 +0000
> @@ -4883,7 +4883,7 @@ Item *Item_field::replace_equal_field(uc
> return this;
> return const_item;
> }
> - Item_field *subst= item_equal->get_first();
> + Item_field *subst= item_equal->get_first(this);
In some of the 3 places where get_first() is used, the new argument is
"this", in other places it's NULL.
I don't know what each of the 3 places is for, could you please explain?
> if (subst && field->table != subst->field->table &&
> !field->eq(subst->field))
> return subst;
> }
>
> === modified file 'sql/item_cmpfunc.cc'
> --- a/sql/item_cmpfunc.cc 2009-06-09 16:53:34 +0000
> +++ b/sql/item_cmpfunc.cc 2009-10-13 09:38:46 +0000
> @@ -5376,7 +5376,7 @@ longlong Item_equal::val_int()
>
> void Item_equal::fix_length_and_dec()
> {
> - Item *item= get_first();
> + Item *item= get_first(NULL);
> eval_item= cmp_item::get_comparator(item->result_type(),
> item->collation.collation);
> }
> @@ -5439,3 +5439,115 @@ void Item_equal::print(String *str, enum
> str->append(')');
> }
>
> +
> +/*
> + @brief Get the first equal field of multiple equality.
> + @param[in] field the field to get equal field to
> +
> + @details Get the first field of multiple equality that is equal to the
> + given field. In order to make semi-join materialization strategy work
> + correctly we can't propagate equal fields from upper select to a
> + materialized semi-join.
My impression is that there is already code with a similar goal: it is
in eliminate_item_equal(), starting with this comment:
/*
item_field might refer to a table that is within a semi-join
materialization nest. In that case, the join order looks like this:
outer_tbl1 outer_tbl2 SJM (inner_tbl1 inner_tbl2) outer_tbl3
We must not construct equalities like
outer_tbl1.col = inner_tbl1.col
because they would get attached to inner_tbl1 and will get
evaluated
during materialization phase, when we don't have current value of
outer_tbl1.col.
*/
This comment seems to be about our testcase, doesn't it?
If this code has the goal to fix our testcase, maybe it just doesn't
work and should be fixed?
I worry about adding a new solution instead of repairing the original
solution.
I may also be very wrong, because in spite of my efforts, I don't
understand this code well. Maybe this existing code has nothing to do
with our testcase.
> + Thus the fields is returned according to following rules:
> +
> + 1) If the given field belongs to a semi-join then the first field in
> + multiple equality which belong to the same semi-join is returned.
> + Otherwise NULL is returned.
> + 2) If the given field doesn't belong to a semi-join then
> + the first field in the multiple equality that doesn't belong to any
> + semi-join is returned.
> + If all fields in the equality are belong to semi-join(s) then NULL
> + is returned.
> + 3) If no field is given then the first field in the multiple equality
> + is returned without regarding whether it belongs to a semi-join or not.
> +
> + @retval Found first field in the multiple equality.
> + @retval 0 if no field found.
Detail: please write NULL above, instead of 0, because it's a pointer.
> +*/
> +
> +Item_field* Item_equal::get_first(Item_field *field)
> +{
> + List_iterator<Item_field> it(fields);
> + Item_field *item;
> + JOIN_TAB *field_tab;
> +
> + if (!field)
> + return fields.head();
> + /*
> + Of all equal fields, return the first one we can use. Normally, this is the
> + field which belongs to the table that is the first in the join order.
> +
> + There is one exception to this: When semi-join materialization strategy is
> + used, and the given field belongs to a table within the semi-join nest, we
> + must pick the first field in the semi-join nest.
> +
> + Example: suppose we have a join order:
> +
> + ot1 ot2 SJ-Mat(it1 it2 it3) ot3
> +
> + and equality ot2.col = it1.col = it2.col
> + If we're looking for best substitute for 'it2.col', we should pick it1.col
> + and not ot2.col.
Please add an explanation of why we should not pick ot2.col.
> + */
> +
> + field_tab= field->field->table->reginfo.join_tab;
> + if (field_tab->sj_strategy == SJ_OPT_MATERIALIZE ||
> + field_tab->sj_strategy == SJ_OPT_MATERIALIZE_SCAN)
> + {
I see that in the test this branch is taken, which is good.
> + /*
> + It's a field from an materialized semi-join. We can substitute it only
> + for a field from the same semi-join.
> + */
> + JOIN_TAB *first;
> + JOIN *join= field_tab->join;
> + uint tab_idx= field_tab - field_tab->join->join_tab;
> + /* Find first table of this semi-join. */
> + for (int i=tab_idx; i >= join->const_tables; i--)
> + {
> + if (join->best_positions[i].sj_strategy == SJ_OPT_MATERIALIZE ||
> + join->best_positions[i].sj_strategy == SJ_OPT_MATERIALIZE_SCAN)
> + first= join->join_tab + i;
> + else
> + // Found first tab that doesn't belong to current SJ.
> + break;
Detail: I think the recommended syntax is:
else
{
/* Found first .. */
break;
}
> + }
> + /* Find an item to substitute for. */
> + while ((item= it++))
> + {
> + if (item->field->table->reginfo.join_tab >= first)
> + {
> + /*
> + If we found given field then return NULL to avoid unnecessary
> + substitution.
> + */
> + return (item != field) ? item : NULL;
> + }
> + }
> + }
> + else
> + {
> + /*
> + The field is not in SJ-Materialization nest. We must return the first
> + field that's not embedded in a SJ-Materialization nest.
> + Example: suppose we have a join order:
> +
> + SJ-Mat(it1 it2) ot1 ot2
> +
> + and equality ot2.col = ot1.col = it2.col
> + If we're looking for best substitute for 'ot2.col', we should pick ot1.col
> + and not it2.col, because when we run a join between ot1 and ot2
> + execution of SJ-Mat(...) has already finished and we can't rely on the
> + value of it*.*.
> + */
> + while ((item= it++))
> + {
Here I'm not sure this new code is properly covered by tests.
I executed only t/subselect*.test, so maybe I missed something. In all
of those tests, when we came into this branch, what was returned was
identical to fields.head() (i.e. the while() stopped at first
iteration), so at least with subselect*.test there is no difference
between this code and a simple "return fields.head()".
Could you please commit a testcase where below we stop at the Nth
iteration where N>=2? Or just point to a current test where this happens.
> + field_tab= item->field->table->reginfo.join_tab;
> + if (!(field_tab->sj_strategy == SJ_OPT_MATERIALIZE ||
> + field_tab->sj_strategy == SJ_OPT_MATERIALIZE_SCAN))
> + return item;
> + }
> + }
> + // Shouldn't get here.
Detail: If I remember correctly, when there is only a comment on the
line it should use /* */. Or you can keep // and move it to the right of
DBUG_ASSERT(0).
> + DBUG_ASSERT(0);
> + return NULL;
> +}
>
> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc 2009-06-30 08:03:05 +0000
> +++ b/sql/sql_select.cc 2009-10-13 09:38:46 +0000
> @@ -7911,6 +7911,7 @@ static void fix_semijoin_strategies_for_
> if (tablenr != first)
> pos->sj_strategy= SJ_OPT_NONE;
> remaining_tables |= s->table->map;
> + s->sj_strategy= pos->sj_strategy;
> }
> }
>
> @@ -11706,7 +11707,7 @@ Item *eliminate_item_equal(COND *cond, C
> head= item_const;
> else
> {
> - head= item_equal->get_first();
> + head= item_equal->get_first(NULL);
> it++;
> }
> Item_field *item_field;
>
> === modified file 'sql/sql_select.h'
> --- a/sql/sql_select.h 2009-05-07 20:48:24 +0000
> +++ b/sql/sql_select.h 2009-10-13 09:38:46 +0000
> @@ -274,6 +274,13 @@ typedef struct st_join_table
> /* NestedOuterJoins: Bitmap of nested joins this table is part of */
> nested_join_map embedding_map;
>
> + /*
> + Semi-join strategy to be used for this join table. This is a copy of
> + POSITION::sj_strategy field. This field is set up by the
> + fix_semijion_strategies_for_picked_join_order.
s/jion/join
Are we sure that the read of this new member by
Item_equal::get_first()
is always preceded by a set in
fix_semijoin_strategies_for_picked_join_order()?
To enforce this rule, I suggest:
* initialize it to an impossible value (-1?) when constructing the JOIN_TAB
* Add DBUG_ASSERT(sj_strategy != -1) in Item_equal::get_first().
Regarding what place is "constructing the JOIN_TAB": I see the
constructor for JOIN_TAB is empty, so I suspect that there must already
exist some function which initializes JOIN_TABs
(JOIN::make_simple_join()?), where we could add initialization of
sj_strategy to -1.
> + */
> + uint sj_strategy;
> +
> void cleanup();
> inline bool is_using_loose_index_scan()
> {
It is not your fault *at all*, but it is quite frustrating to be
reviewer of some patch to some code which I don't understand enough,
even though I try with gdb. I'm not used to doing reviews "in
blindness". I guess it's going to be this way with Optimizer bugs for a
long time so I have to get used to it :-( Still quite worrying, though.
Also, I'd be interested in your reply on Timour's first comment in his
review email.
--
Mr. Guilhem Bichot <guilhem@stripped>
Sun Microsystems / MySQL, Lead Software Engineer
Bordeaux, France
www.sun.com / www.mysql.com