On 26.05.11 10.55, Øystein Grøvlen wrote:
> Hi Roy,
>
> See inline for some comments on the changes in test results:
>
> On 12/05/2011 17:26, Roy Lyseng wrote:
> > #At file:///home/rl136806/mysql/repo/mysql-work0/ based on
> revid:roy.lyseng@stripped
> >
> > 3371 Roy Lyseng 2011-05-12
> > Bug#12316645: Wrong cost calculation with optimizer_join_cache_level settings
> >
> > best_access_path() did not take properly into account the setting of
> > optimizer_join_cache_level when calculating the access path that gives
> > the smallest cost to access a given table.
> >
> > There are three changes applied in this bug fix:
> > - best_access_path() properly accounts for join cache level 0, ie
> > do not use join caching.
> > - best_access_path() does account for join cache level greater than 3
> > when a table is outer-joined to the current join prefix.
> > - best_access_path() does account for join cache level greater than 3
> > when a table is semi-joined to the current join prefix.
> >
> > Static properties for setting join buffering has been centralized to
> > a new function set_join_buffer_properties().
> >
> > Further on, setting of join cache level is changed.
> > check_join_cache_usage() is renamed to setup_join_cache().
> >
> > Cost calculation for joins with join buffering is better than it used
> > to be, but it is still not fair. Sometimes, cost is calculated using
> > join buffering, but the decision has to be reverted later.
> > Cost calculation for BKA join strategy is also not considered properly
> > (but this patch changes nothing in this area).
> >
>
> ...
>
> > === modified file 'mysql-test/r/join_cache_jcl4.result'
> > --- a/mysql-test/r/join_cache_jcl4.result 2011-04-28 11:55:16 +0000
> > +++ b/mysql-test/r/join_cache_jcl4.result 2011-05-12 15:25:46 +0000
> > @@ -1449,8 +1449,8 @@ explain
> > select t1.a, count(t2.p) as count
> > from t1 left join t2 on t1.a=t2.a and t2.p % 2 = 1 group by t1.a;
> > id select_type table type possible_keys key key_len ref rows Extra
> > -1 SIMPLE t1 index NULL PRIMARY 4 NULL 4 Using index
> > -1 SIMPLE t2 ref i_a i_a 5 test.t1.a 2 Using where
> > +1 SIMPLE t1 index NULL PRIMARY 4 NULL 8 Using index; Using temporary; Using
> filesort
> > +1 SIMPLE t2 ALL i_a NULL NULL NULL 10 Using where; Using join buffer (BNL,
> incremental buffers)
>
> Hm. The text in the result file just above this diff is
> The following must not show "using join cache":
>
> Looking at the original bug report (Bug#40136) and fix, it seems the
> intention is to prevent BKA in this case. So it is probably the
> comment that is wrong. I also notice your patch changes the for this
> query from BKA to BNL for higher values of join cache level. Join
> caching appeared with the fix for Bug#42955, so maybe it is not longer
> true that BKA cannot be used?
It looks so, yes. I will delete that line from the test case.
>
> I also looked at handler request for the above query. The change does
> not look like an improvement in this case. On the contrary, both
> random and sequential accesses increases:
>
> show status like 'Handler_read%';
> Variable_name Value
> Handler_read_first 1
> -Handler_read_key 8
> +Handler_read_key 10
> Handler_read_last 0
> -Handler_read_next 18
> +Handler_read_next 8
> Handler_read_prev 0
> -Handler_read_rnd 0
> -Handler_read_rnd_next 0
> +Handler_read_rnd 8
> +Handler_read_rnd_next 20
The cost model is far from perfect...
Without join buffering, the cost for table t2 is 16.0 for the lookups (8 lookups
multiplied by 2 for each?) and 16 x 0.2 for the compare cost, giving a total
cost of 22.8175 before the sort stage. The row count estimate is 16 and a sort
cost of 16 is added to get a total cost of 38.81. I assume that the sort stage
is removed later in optimization.
With join buffering, the cost for table t2 is 2.422 for the table scan and 64 x
0.2 for the compare cost, giving a total cost of 18.84. The row count estimate
is 64, which means that after sorting the total cost estimate is 82.84.
So, the reason for the discrepancy here is:
- Sorting is not considered when comparing costs after reading table t2.
- If it were considered, it would still have been wrong because the sort stage
is later deleted when not using join buffering.
- Row count estimates are wrong, giving a huge sort cost for BNL (which however
is not considered in the total cost estimate).
I do not think there is anything I can do about this, except noting the above
points for the cost model refactoring project...
>
> ...
>
>
> > === modified file 'mysql-test/r/join_nested.result'
> > --- a/mysql-test/r/join_nested.result 2010-12-06 13:12:51 +0000
> > +++ b/mysql-test/r/join_nested.result 2011-05-12 15:25:46 +0000
>
> ...
>
> > @@ -1729,11 +1729,11 @@ LEFT JOIN
> > (t5 JOIN t4 ON t5.carrier_id = t4.id)
> > ON t4.carrier = t1.carrier;
> > id select_type table type possible_keys key key_len ref rows Extra
> > -1 SIMPLE t2 index package_id package_id 5 NULL 45 Using where; Using index
> > -1 SIMPLE t1 eq_ref PRIMARY PRIMARY 4 test.t2.package_id 1
> > +1 SIMPLE t1 ALL PRIMARY NULL NULL NULL 36
> > +1 SIMPLE t3 ref package_id package_id 5 test.t1.id 1 Using where; Using index
> > +1 SIMPLE t2 ref package_id package_id 5 test.t3.package_id 5 Using where;
> Using index
> > 1 SIMPLE t4 eq_ref PRIMARY,id PRIMARY 2 test.t1.carrier 1
> > 1 SIMPLE t5 ref carrier_id carrier_id 5 test.t4.id 22 Using index
> > -1 SIMPLE t3 ref package_id package_id 5 test.t1.id 1 Using where; Using index
>
> The handler statistics indicates that the old plan was probably better:
>
> show status like 'Handler_read%';
> Variable_name Value
> -Handler_read_first 1
> -Handler_read_key 56
> +Handler_read_first 0
> +Handler_read_key 74
> Handler_read_last 0
> -Handler_read_next 51
> +Handler_read_next 38
> Handler_read_prev 0
> Handler_read_rnd 0
> -Handler_read_rnd_next 0
> +Handler_read_rnd_next 37
Current join order: t2 - t1 - t4 - t5 - t3
New join order: t1 - t3 - t2 - t4 - t5
The reason why the current join order is no longer select has to do with t4. The
access for this table used to be an eq_ref, but now a table scan and join
buffering appears to be cheaper. The problem is that the fanout increases from 1
to 5, hence the subsequent table accesses become very expensive and the plan is
eventually pruned.
It is probably correct that join buffering is cheaper than eq_ref for this
table, but the problem is the wrong fanout estimate. best_access_path() should
have used the fanout estimate given by the eq_ref access method (ie 1) when
calculating the cost. Nevertheless, setup_join_buffering() will later cancel
join buffering for t4, because the second table in the outer join nest (t5) does
not allow join buffering.
>
> ...
>
> > === modified file 'mysql-test/r/select_all.result'
> > --- a/mysql-test/r/select_all.result 2011-01-13 10:48:28 +0000
> > +++ b/mysql-test/r/select_all.result 2011-05-12 15:25:46 +0000
> > @@ -2398,7 +2398,7 @@ INSERT INTO t2 VALUES ('one'),('two'),('
> > EXPLAIN SELECT * FROM t1 LEFT JOIN t2 USE INDEX (a) ON t1.a=t2.a;
> > id select_type table type possible_keys key key_len ref rows Extra
> > 1 SIMPLE t1 ALL NULL NULL NULL NULL 5
> > -1 SIMPLE t2 ref a a 23 test.t1.a 2 Using where
> > +1 SIMPLE t2 ALL a NULL NULL NULL 5 Using where; Using join buffer (BNL,
> incremental buffers)
>
> The above query is probably no longer testing what it was intended
> to test. ("Test case for bug 7520: a wrong cost of the index for a
> BLOB field").
We should probably have more tests with join_cache_level=0 for improved coverage.
Thanks,
Roy