List:Commits« Previous MessageNext Message »
From:Martin Hansson Date:February 16 2011 3:31pm
Subject:Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724
View as plain text  
Roy Lyseng wrote:
> Hi Martin,
>
> thank you for this patch. I have now revisited all code, and AFAIK 
> things look good. I have some minor suggestions that you need to 
> consider, though.
There is a risk of moving in circles. There are items in the review that 
we have either discussed already and that I - perhaps mistakenly - 
thought we were done with. We need to decide which parts are agreed upon 
and we have to continue previous discussions unless resolved. I will 
hold the next commit until I fully understand all that must be done to 
this one, so that you don't have to go over every consecutive patch all 
over again.
>
>
>
> greedy_optimizer.result shows some anomalies, as discussed on IRC. 
> Apparently, there are some shortcuts inserted, even though there are 
> dependencies that should prevent them. Please investigate these issues.
Yes, the very embarrassing reason is that we should not iterate through 
table "map", but the join_tab array. The "map" is the order in which 
tables are opened. Usually that's the parse order and our perfect little 
tests usually list tables in the desired plan order.

You'll see how it's done in the new patch.
>
>> ...
>
> You could generally use this syntax to make the tests easier to 
> maintain consistently, when you need to both execute and explain a query:
>
> let $query=SELECT ...;
> eval EXPLAIN $query;
> eval $query;
I will consider it for future work. But the test cases have stayed the 
same for many consecutive commits and I don't want to start changing 
them now.
>> ...
>
> record -> row.
>> +EXPLAIN 
>> +EXPLAIN
>> +SELECT * FROM t1 JOIN t2key ON a = b JOIN t3 ON a = c WHERE d>  0;
>> +SELECT * FROM t1 JOIN t2key ON a = b JOIN t3 ON a = c WHERE d>  0;
>
> Perhaps you should re-run these tests without join caching to see the 
> difference.
As I said in previous e-mails I don't see the reason. I have other 
similar - and more suitable - tests that do that.
>> ...
>> +EXPLAIN
>> +SELECT * FROM t1 JOIN t2 USING(a) JOIN t3 ON t2.a = t3.b JOIN t4 ON 
>> t1.b = t4.b;
>> +SELECT * FROM t1 JOIN t2 USING(a) JOIN t3 ON t2.a = t3.b JOIN t4 ON 
>> t1.b = t4.b;
>> +
>> +SHOW STATUS LIKE 'handler_read_%';
>
> You could add another predicate that does not perform any "real" 
> filtering, but makes shortcutting impossible, and compare handler 
> statistics for the two queries.
That would be great for demonstrating one aspect of the benefits of the 
optimization (number of rows read from handler). But it does not belong 
in a regression test.
>>
>> +DROP TABLE t1, t2, t3;
>> +--echo # Test 2c
>> +--echo # Test that we don't erroneously take a short-cut at the end 
>> of a table scan.
> Can you explain this in more detail?
I think it's rather clear. But maybe adding something like "I.e. an "end 
of records" state is not sufficient for taking a short-cut. It must be 
that all tuples were rejected by the filtering condition.". Something 
like that? Feel free to edit.
>>
>> +--echo # Test of internal representation of pushdown conditions.
>> +--echo # The join conditions will be encoded in ref access, while 
>> the WHERE
>> +--echo # expression remains in the pushdown condition.
>
> WHERE expression -> WHERE condition.
>
> When you mention pushdown conditions, I immediately think about ECP or 
> ICP. I think that we need a better term... Can table condition be better?
Other similar code in the optimizer uses "pushdown condition" in 
abundance. I can't see why such a small worklog as this one should be 
the avant garde of a terminology shift. Also, I think the term pushdown 
condition existed before ICP.
>> ...
>> +DROP TABLE t1, t2, t3, t4;
>> +--echo # Test 4. Test of execution step.
>> +--echo # In this test we are testing that a short-cut is properly 
>> separated from
>> +--echo # the case of a normal 'end of records' state on the last 
>> table in the plan.
>> +--echo # This particular case gets hit only when
>> +--echo # - The optimizer reorders the tables
>> +--echo # - We have a SELECT<table>.*
>
> I did not get this...
Then you'll be happy to see this whole test is gone in the next patch. :-)
> ...
>>
>> +--echo # Should not use short-cuts.
>> +EXPLAIN
>> +SELECT * FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t2.b = t3.a AND 
>> t1.c = t4.a;
>> +SELECT * FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t2.b = t3.a AND 
>> t1.c = t4.a;
>> +
>> +--echo # Should not use short-cuts.
>> +EXPLAIN
>> +SELECT * FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t2.b = t3.a AND 
>> t1.c = t4.a;
>> +SELECT * FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t2.b = t3.a AND 
>> t1.c = t4.a;
>> +
>> +SHOW STATUS LIKE 'handler_read_%';
>
> Please rerun without join buffering and expose the difference.
I don't understand what would I use the result for. Do other 
optimizations have these kinds of tests?
> ...
>>
>> +--echo #
>> +--echo # Test 6. Test of short-cuts in conjunction with outer join.
>> +--echo #
>> +--echo # Test 6a. Test of outer join detection.
>> +--echo # In this case the execution order is t1, t2, t3, t4.
>> +--echo # t2, t3 and t4 will form an 'outer join nest' with which t1 
>> is left
>> +--echo # outer joined. But (t3, t2) is left joined with t4. Since t4 
>> is just one
>> +--echo # table, however, the 'outer join nest' of t4 is only implicitly
>> +--echo # represented in the query plan. The only way to detect it is 
>> by looking
>> +--echo # at t4's first_inner pointer, which would be null for the 
>> last inner table
>> +--echo # in an outer join nest. But since it points to itseld, t4 is 
>> in its own
>> +--echo # nest. But there is no NESTED_JOIN to represent it, since it 
>> follows the
>> +--echo # 'normal' left-deep tree representation.
>
> It's probably overkill to explain internal data structures in a test 
> file...
Yes, but having it there was very helpful to me.
>
> "Test that combines outer join operation with one table and outer join 
> operation with multiple inner tables" might just do...
Absolutely, but now that it's there I see no reason to take information 
away.
>> ....
>> +EXPLAIN
>> +SELECT * FROM t1
>> +         LEFT JOIN
>> +         (t2 JOIN t3 ON b = c JOIN t4 ON b = d JOIN t5 ON d = e)
>> +         ON a = b;
>> +SELECT * FROM t1
>> +         LEFT JOIN
>> +         (t2 JOIN t3 ON b = c JOIN t4 ON b = d JOIN t5 ON d = e)
>> +         ON a = b;
>
> What would happen with "a=d" instead of "a=b" ?
Not much. The shortcut would still be illegal. The point of this test is 
a short-cut from t4 to t2, i.e. both tables are at non-top level outer 
join nest.
>> ...
>> +     This is the set of tables directly referenced by this Item.
>
> Please remove  "directly" from this sentence, I think it can be 
> misunderstood.
ok.
>> ...
>> +   Updates a short-cut from from one plan node to another. The function
>
> typo: from from
You must be reviewing an old patch. I fixed this after your previous review.
>
>> ...
>> +static bool update_star_dependency(JOIN *join, JOIN_TAB *join_tab,
>> +                                   JOIN_TAB *dependent)
>> +{
>> +  if (dependent == NULL)
>> +    return false;
> I think that this statement can be omitted. I did 
> DBUG_ASSERT(dependent) w/o crash.
Not applicable anymore, this function is no more. :-)
>> ...
>> +  table_map mask= 1<<  table_index;
>
> Pls make this computation 64-bit safe: table_map mask= (table_map)1 << 
> table_index
>> +  while (mask>  0) {
> -> while (mask != 0)
Ditto.
>> ...
>> +  if(join->join_tab == NULL)
>> +    return;
>
> Space before parenthesis, please.
oops... :-)
>> ...
>>     join_tab->table->null_row=0;
>>     if (end_of_records)
>>     {
>> +    join->return_tab= join_tab;
> Maybe move this ahead of the if and delete the following similar 
> assignment?
Good idea. I think this came from a bazaar pull.
>> ...
>>     if (found)
>>     {
>> +    ++join_tab->produced_rows;
>
> I think this should go into the *last* "if (found)" block in the 
> function. Otherwise you need to document why it is here.
I'm not very familiar with this code. Can you explain why that is more 
correct?

Are you saying I would not need to document if I put it in the other place?

The best would probably be if we have a test case that fails unless it 
is in the right place. Do you think such a test can be constructed?
>> ...
>>
>> +  /**
>> +     The number of rows produced at this plan node with a common prefix
>> +     produced by the preceding plan nodes.
>> +
>> +     The counter is reset each time a partial row has been produced 
>> (and
>> +     pushdown conditions are true) for the current table that 
>> corresponds to
>> +     the partial row produced for all previous tables in the current 
>> join.
>> +  */
>> +  uint produced_rows;
> Should be ulonglong. The last sentence of the comment was difficult to 
> understand.
Good point, but in that case it should be ha_rows. Sentence changed.
>> +
>> +  st_join_table *shortcut;
>
> This is probably worthy a comment as well?
Done.
>> ....
>> +  static bool are_inner_joined(JOIN_TAB *a, JOIN_TAB *b)
>> +  {
>> +    DBUG_ASSERT(a<  b);
>> +    return
>> +      a->table->pos_in_table_list->embedding ==
>> +      b->table->pos_in_table_list->embedding&&
>> +      (b->on_expr_ref == NULL || *b->on_expr_ref == NULL);
>> +  }
> I think that you can safely delete the part "b->on_expr_ref == NULL".
I'd rather not, as said in earlier e-mails. If you disagree, please 
reply to that e-mail so we get the whole context.

Thank you for the review!

Martin
Thread
Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Roy Lyseng11 Feb
  • Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Martin Hansson11 Feb
    • Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Roy Lyseng11 Feb
  • Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Martin Hansson16 Feb
    • Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Roy Lyseng21 Feb
      • Re: bzr commit into mysql-trunk branch (martin.hansson:3272) WL#3724Martin Hansson16 Mar