Hello,
After some night-time thinking, a bit less work for you:
Guilhem Bichot a écrit, Le 20.10.2010 23:13:
> Roy Lyseng a écrit, Le 07.10.2010 17:19:
>> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
>> revid:tor.didriksen@stripped
>>
>> 3260 Roy Lyseng 2010-10-07
>> Bug#45174: Incorrectly applied equality propagation caused wrong
>> result
>> on a query with a materialized semi-join.
>> Bug#50019: Wrong result for IN-query with materialization.
>> Bug#52068: Optimizer generates invalid semijoin materialization
> I looked at the changes in pushed conditions introduced by the patch (by
> "pushed", I mean conditions which we attach to each table, evaluated
> after a row has been read from the table; not index/engine condition
> pushdown). It will be more understandable for you, if you use the same
> method as I did: I prepared a playground for this. I have prepared a
> custom version of the optimizer trace tree (you can get it from tree
> mysql-next-mr-guilhem at usual machine). It has your patch applied, both
> the code and test file changes. As for the result files, I ran
> ./mtr --record t/subquery_sj*.test --opt-trace-protocol
> which generates the optimizer trace of each SELECT or EXPLAIN SELECT
> (good work Jorgen). So the result files show conditions "after your patch".
> If you then revert only the code piece of your patch by applying the
> patch attached to this mail, you can re-run tests and see the
> differences; the differences in the optimizer traces will show what
> condition disappears/appears/moves. Remember: the "result" file is
> after-your-patch; the "reject" file is before-your-patch. The
> interesting test is subquery_sj_mat. For browsing differences, I warmly
> recommend a GUI diff tool, which shows line numbers and can do text
> searches (I use kdiff3). Please ignore the differences of records'
> counts in the trace, they are probably due to InnoDB's "random"
> statistics. Conditions generated by make_cond_after_sjm() are named
> "attached_after_semijoin_materialization" in the optimizer trace.
>
> Below I go through some noteworthy changes of subquery_sj_mat which I
> observed by the procedure above.
>
> 1) I see a good change, the last query of the test for BUG#52068 (line
> 109510 of the result file): after your patch we have
> make_cond_after_sjm() which produces "it2.a=ot1.a" which is then
> attached to the mat-scan-SJM, so tested before we reach the last table
> in plan (ot4). That is a win.
>
> 2) Another win, the last query of the test for BUG#45174 (line 108683 of
> the result file): your patch moves a condition from after-SJM to
> inside-SJM (from "attached_after_semijoin_materialization" to
> "attached"-to-t2)
>
> REQ 3) This time make_cond_after_sjm() produces a condition only after
> your patch: at line 2359 of the result file:
> "select * from t2 where t2.a in (select a from t1)"
> mat-lookup is used, which, I guess, looks up value of t2.a in the
> materialized table, but make_cond_after_sjm() produces
> "t1.a=t2.a"
> which is think is a superfluous condition to check (as index lookup
> should already guarantee this?).
> There are other examples, in the queries which follow the one above
> (just scroll down in the diff). Unfortunately it seems to be often such
> unneeded conditions (repeating what the index lookup already
> implements), for mat-lookup, and added by the patch.
> Somehow it makes the questionable comment ("a regular semi-join etc") be
> wrong more often.
> On the one hand, I don't see this as a big problem: I'm not sure this is
> a real speed penalty to check this, even for every row.
> On the other hand, I cannot explain why we see this change. Could you
> please have a look, to be sure?
_Maybe_ it is because
- the optimizer can, at the end of make_cond_for_table_from_pred(),
delete a condition of the form x=y if it notices that "ref" access on
index "y" with value "x" is already decided.
- but is _not_ aware of the "ref" access which will be done in the
mat-lookup temp table?
Could that explain that the superfluous equality condition is not removed?
If yes, then it's normal, I don't think we need to make the optimizer
more aware, and we can close this item.
> REQ 4) at line 65381 of the result file: query
> explain select *
> from t0 where a in
> (select t2.a+t3.a from t1 left join (t2 join t3) on t2.a=t1.a and
> t3.a=t1.a);
> is worrying: before the patch it used to have a condition attached to
> t3, i.e. evaluated inside materialization, which had the potential to
> make the materialized table smaller. The patch moves this condition out
> into a make_cond_after_sjm() condition (so, the materialized table will
> be bigger). I don't understand what can explain the move, and if it has
> to be. I don't think it can be be explained the same way as the "kp1<20"
> move which we discussed in a previous mail. I don't see that equality
> propagation plays a direct role here. Could you please investigate this?
Found it. The condition which used to be pushed inside materialization
involved the outer table, which is a bug (same as BUG#45174), so it's
normal that your patch moves it to after the materialized table.
If you want to build the tree I prepared, you can do it with
cd bzr tree
mkdir build
cd build
cmake .. -DOPTIMIZER_TRACE=1 -DWITH_DEBUG=1