List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:February 21 2011 9:29am
Subject:Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724
View as plain text  
On 16.02.11 16.31, Martin Hansson wrote:
> Roy Lyseng wrote:
>> Hi Martin,
>>
>> thank you for this patch. I have now revisited all code, and AFAIK things look
>> good. I have some minor suggestions that you need to consider, though.
> There is a risk of moving in circles. There are items in the review that we have
> either discussed already and that I - perhaps mistakenly - thought we were done
> with. We need to decide which parts are agreed upon and we have to continue
> previous discussions unless resolved. I will hold the next commit until I fully
> understand all that must be done to this one, so that you don't have to go over
> every consecutive patch all over again.

Ok.
>>
>>
>>
>> greedy_optimizer.result shows some anomalies, as discussed on IRC. Apparently,
>> there are some shortcuts inserted, even though there are dependencies that
>> should prevent them. Please investigate these issues.
> Yes, the very embarrassing reason is that we should not iterate through table
> "map", but the join_tab array. The "map" is the order in which tables are
> opened. Usually that's the parse order and our perfect little tests usually list
> tables in the desired plan order.
>
> You'll see how it's done in the new patch.
>>
>>> ...
>>
>> You could generally use this syntax to make the tests easier to maintain
>> consistently, when you need to both execute and explain a query:
>>
>> let $query=SELECT ...;
>> eval EXPLAIN $query;
>> eval $query;
> I will consider it for future work. But the test cases have stayed the same for
> many consecutive commits and I don't want to start changing them now.
>>> ...

I do not understand the argument - but ... Ok.
>>
>> record -> row.
>>> +EXPLAIN +EXPLAIN
>>> +SELECT * FROM t1 JOIN t2key ON a = b JOIN t3 ON a = c WHERE d> 0;
>>> +SELECT * FROM t1 JOIN t2key ON a = b JOIN t3 ON a = c WHERE d> 0;
>>
>> Perhaps you should re-run these tests without join caching to see the
> difference.
> As I said in previous e-mails I don't see the reason. I have other similar - and
> more suitable - tests that do that.

Ok.

>>> ...
>>> +EXPLAIN
>>> +SELECT * FROM t1 JOIN t2 USING(a) JOIN t3 ON t2.a = t3.b JOIN t4 ON t1.b =
>>> t4.b;
>>> +SELECT * FROM t1 JOIN t2 USING(a) JOIN t3 ON t2.a = t3.b JOIN t4 ON t1.b =
>>> t4.b;
>>> +
>>> +SHOW STATUS LIKE 'handler_read_%';
>>
>> You could add another predicate that does not perform any "real" filtering,
>> but makes shortcutting impossible, and compare handler statistics for the two
>> queries.
> That would be great for demonstrating one aspect of the benefits of the
> optimization (number of rows read from handler). But it does not belong in a
> regression test.
>>>
>>> +DROP TABLE t1, t2, t3;
>>> +--echo # Test 2c
>>> +--echo # Test that we don't erroneously take a short-cut at the end of a
>>> table scan.
>> Can you explain this in more detail?
> I think it's rather clear. But maybe adding something like "I.e. an "end of
> records" state is not sufficient for taking a short-cut. It must be that all
> tuples were rejected by the filtering condition.". Something like that? Feel
> free to edit.
Test that we don't erroneously take a short-cut when the last row of the scan 
evaluates to false, even if there are matching rows earlier in the scan.
>>>
>>> +--echo # Test of internal representation of pushdown conditions.
>>> +--echo # The join conditions will be encoded in ref access, while the WHERE
>>> +--echo # expression remains in the pushdown condition.
>>
>> WHERE expression -> WHERE condition.
>>
>> When you mention pushdown conditions, I immediately think about ECP or ICP. I
>> think that we need a better term... Can table condition be better?
> Other similar code in the optimizer uses "pushdown condition" in abundance. I
> can't see why such a small worklog as this one should be the avant garde of a
> terminology shift. Also, I think the term pushdown condition existed before ICP.

Of course it did. But ICP/ECP has cluttered things up, and I do not think 
"pushdown condition" is a good term. But of course this is just one of a number 
of weird terminology uses in MySQL...
>>> ...
>>> +DROP TABLE t1, t2, t3, t4;
>>> +--echo # Test 4. Test of execution step.
>>> +--echo # In this test we are testing that a short-cut is properly separated
>>> from
>>> +--echo # the case of a normal 'end of records' state on the last table in
>>> the plan.
>>> +--echo # This particular case gets hit only when
>>> +--echo # - The optimizer reorders the tables
>>> +--echo # - We have a SELECT<table>.*
>>
>> I did not get this...
> Then you'll be happy to see this whole test is gone in the next patch. :-)
>> ...
>>>
>>> +--echo # Should not use short-cuts.
>>> +EXPLAIN
>>> +SELECT * FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t2.b = t3.a AND t1.c =
> t4.a;
>>> +SELECT * FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t2.b = t3.a AND t1.c =
> t4.a;
>>> +
>>> +--echo # Should not use short-cuts.
>>> +EXPLAIN
>>> +SELECT * FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t2.b = t3.a AND t1.c =
> t4.a;
>>> +SELECT * FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t2.b = t3.a AND t1.c =
> t4.a;
>>> +
>>> +SHOW STATUS LIKE 'handler_read_%';
>>
>> Please rerun without join buffering and expose the difference.
> I don't understand what would I use the result for. Do other optimizations have
> these kinds of tests?

IMHO it would be a good regression test for join shortcutting.
>> ...
>>>
>>> +--echo #
>>> +--echo # Test 6. Test of short-cuts in conjunction with outer join.
>>> +--echo #
>>> +--echo # Test 6a. Test of outer join detection.
>>> +--echo # In this case the execution order is t1, t2, t3, t4.
>>> +--echo # t2, t3 and t4 will form an 'outer join nest' with which t1 is left
>>> +--echo # outer joined. But (t3, t2) is left joined with t4. Since t4 is
> just
>>> one
>>> +--echo # table, however, the 'outer join nest' of t4 is only implicitly
>>> +--echo # represented in the query plan. The only way to detect it is by
> looking
>>> +--echo # at t4's first_inner pointer, which would be null for the last
> inner
>>> table
>>> +--echo # in an outer join nest. But since it points to itseld, t4 is in its
> own
>>> +--echo # nest. But there is no NESTED_JOIN to represent it, since it
> follows
>>> the
>>> +--echo # 'normal' left-deep tree representation.
>>
>> It's probably overkill to explain internal data structures in a test file...
> Yes, but having it there was very helpful to me.
>>
>> "Test that combines outer join operation with one table and outer join
>> operation with multiple inner tables" might just do...
> Absolutely, but now that it's there I see no reason to take information away.
>>> ....
>>> +EXPLAIN
>>> +SELECT * FROM t1
>>> + LEFT JOIN
>>> + (t2 JOIN t3 ON b = c JOIN t4 ON b = d JOIN t5 ON d = e)
>>> + ON a = b;
>>> +SELECT * FROM t1
>>> + LEFT JOIN
>>> + (t2 JOIN t3 ON b = c JOIN t4 ON b = d JOIN t5 ON d = e)
>>> + ON a = b;
>>
>> What would happen with "a=d" instead of "a=b" ?
> Not much. The shortcut would still be illegal. The point of this test is a
> short-cut from t4 to t2, i.e. both tables are at non-top level outer join nest.
>>> ...
>>> + This is the set of tables directly referenced by this Item.
>>
>> Please remove "directly" from this sentence, I think it can be misunderstood.
> ok.
>>> ...
>>> + Updates a short-cut from from one plan node to another. The function
>>
>> typo: from from
> You must be reviewing an old patch. I fixed this after your previous review.
>>
>>> ...
>>> +static bool update_star_dependency(JOIN *join, JOIN_TAB *join_tab,
>>> + JOIN_TAB *dependent)
>>> +{
>>> + if (dependent == NULL)
>>> + return false;
>> I think that this statement can be omitted. I did DBUG_ASSERT(dependent) w/o
>> crash.
> Not applicable anymore, this function is no more. :-)
>>> ...
>>> + table_map mask= 1<< table_index;
>>
>> Pls make this computation 64-bit safe: table_map mask= (table_map)1 <<
>> table_index
>>> + while (mask> 0) {
>> -> while (mask != 0)
> Ditto.
>>> ...
>>> + if(join->join_tab == NULL)
>>> + return;
>>
>> Space before parenthesis, please.
> oops... :-)
>>> ...
>>> join_tab->table->null_row=0;
>>> if (end_of_records)
>>> {
>>> + join->return_tab= join_tab;
>> Maybe move this ahead of the if and delete the following similar assignment?
> Good idea. I think this came from a bazaar pull.
>>> ...
>>> if (found)
>>> {
>>> + ++join_tab->produced_rows;
>>
>> I think this should go into the *last* "if (found)" block in the function.
>> Otherwise you need to document why it is here.
> I'm not very familiar with this code. Can you explain why that is more correct?

Because this is the place where all matching has been done, and where the row is 
pushed to the next level.
>
> Are you saying I would not need to document if I put it in the other place?

It goes without saying that you can increment produced_rows there (I think).
>
> The best would probably be if we have a test case that fails unless it is in the
> right place. Do you think such a test can be constructed?

No.

>>> ...
>>>
>>> + /**
>>> + The number of rows produced at this plan node with a common prefix
>>> + produced by the preceding plan nodes.
>>> +
>>> + The counter is reset each time a partial row has been produced (and
>>> + pushdown conditions are true) for the current table that corresponds to
>>> + the partial row produced for all previous tables in the current join.
>>> + */
>>> + uint produced_rows;
>> Should be ulonglong. The last sentence of the comment was difficult to
>> understand.
> Good point, but in that case it should be ha_rows. Sentence changed.
>>> +
>>> + st_join_table *shortcut;
>>
>> This is probably worthy a comment as well?
> Done.
>>> ....
>>> + static bool are_inner_joined(JOIN_TAB *a, JOIN_TAB *b)
>>> + {
>>> + DBUG_ASSERT(a< b);
>>> + return
>>> + a->table->pos_in_table_list->embedding ==
>>> + b->table->pos_in_table_list->embedding&&
>>> + (b->on_expr_ref == NULL || *b->on_expr_ref == NULL);
>>> + }
>> I think that you can safely delete the part "b->on_expr_ref == NULL".
> I'd rather not, as said in earlier e-mails. If you disagree, please reply to
> that e-mail so we get the whole context.

If you can find even one such use of on_expr_ref in existing code, I would allow 
it. This will just confuse the use of that field.
>
> Thank you for the review!
>
> Martin
>

Thanks,
Roy
Thread
Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Roy Lyseng11 Feb
  • Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Martin Hansson11 Feb
    • Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Roy Lyseng11 Feb
  • Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Martin Hansson16 Feb
    • Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Roy Lyseng21 Feb
      • Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Martin Hansson16 Mar