List:Commits« Previous MessageNext Message »
From:Martin Hansson Date:September 28 2010 6:59pm
Subject:Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)
WL#3724
View as plain text  
  Roy Lyseng skrev 2010-09-27 16.08:
>  Hi Martin,
>
> thanks for working on this!
>
> Some early comments to the implementation.
>>
>>
>> === modified file 'sql/item.h'
>> --- a/sql/item.h    2010-09-07 19:07:18 +0000
>> +++ b/sql/item.h    2010-09-20 13:03:54 +0000
>> @@ -860,7 +860,16 @@ public:
>>     virtual bool val_bool_result() { return val_bool(); }
>>     virtual bool is_null_result() { return is_null(); }
>>
>> -  /* bit map of tables used by item */
>> +  /**
>> +     Conceptually, this is the set of tables directly referenced by 
>> this Item.
>> +
>> +     For an Item_field, the set contains simply the table to which 
>> the field
>> +     belongs.
>> +
>> +     For pushdown conditions, used_tables is the set tables 
>> referenced by the
>> +     pushdown condition recursively, *including* the table to which the
>> +     condition is pushed.
>> +  */
>
> Please only keep the first line here, and remove the word "Conceptually".
> The two other comments might belong with the respective implementations.
For the middle sentence, agreed.
> BTW, a pushdown condition is not a particular class of Items, right?
No, it isn't. That's why I put that comment there.
>
>>     virtual table_map used_tables() const { return (table_map) 0L; }
>>     /*
>>       Return table map of tables that can't be NULL tables (tables 
>> that are
>>
>> === modified file 'sql/records.h'
>> --- a/sql/records.h    2010-07-13 17:29:44 +0000
>> +++ b/sql/records.h    2010-09-20 13:03:54 +0000
>> @@ -38,6 +38,34 @@ class SQL_SELECT;
>>       ...
>>     }
>>     end_read_record();
>> +
>> +  The READ_RECORD could be called a record reader, and acts as
>
> I am not interested in what it "could" be called, I am interested in 
> what it is... :)
Actually, I mean it *should* be called RecordReader. :-)

Furthermore, READ_RECORD is a completely pointless name. It isn't a 
record. As a method name, readRecord would make sense because its a 
verb. I frequently translate our bogus names in my mind to something 
meaningful. When I see 'Item', I read 'Expression', TABLE_LIST becomes 
TableReference, etc.
>> +
>> +  - An abstraction layer beneath the query execution engine. The 
>> only read
>> +    operation the execution engine has to know of is reading the next
>> +    record. Beneath the abstraction layer is the storage engine API, 
>> which is
>> +    invoked differently for table/index scans and key lookups, and 
>> the join
>> +    buffer.
>
> An abstraction layer between the query execution engine and the 
> storage engine API.
done.
>
>
>> +
>> +  - An executable sub-program, resulting from the optimizer. It is 
>> set up with a
>> +    reference to a handler and a function pointer to a read function 
>> during
>> +    query optimization.
>> +
>> +    The abstraction is incomplete, however. For instace, reading the 
>> first
>> +    record from a handler is done through a function pointer in the 
>> JOIN_TAB
>> +    rather than in its READ_RECORD.
>> +
>> +    Sometimes the READ_RECORD is initialized during query execution.
>> +
>> +    - It is reinitialized during buffered nested loops join, before 
>> the buffer
>> +      is flushed.
>> +
>> +    - For many table accesses, it is initialized right before the 
>> first record
>> +      is read.
>> +
>> +    The READ_RECORD design can be viewed as a strategy pattern, 
>> where the
>> +    abstract strategy is the READ_RECORD struct and the concrete 
>> strategy is
>> +    determined by the values of the function pointer 
>> READ_RECORD::read_record.
>>   */
>
> I am not sure that it is a good idea to describe how and where a 
> particular class is used with the definition of that class. 
> Documenting the recommended usage pattern can be a great idea, but 
> IMHO the pecularies should instead be documented where they are being 
> used.
I need to know what exactly is referred to before I can comment.
>
>>
>>   class Copy_field;
>>
>> === modified file 'sql/sql_select.cc'
>> --- a/sql/sql_select.cc    2010-09-10 05:14:00 +0000
>> +++ b/sql/sql_select.cc    2010-09-20 13:03:54 +0000
>> @@ -28,6 +28,9 @@
>>   #pragma implementation                // gcc: Class implementation
>>   #endif
>>
>> +#include "sql_set.h"
>> +#include "sql_shortcut.h"
>> +
>>   #include "sql_priv.h"
>>   #include "unireg.h"
>>   #include "sql_select.h"
>> @@ -3488,6 +3491,8 @@ mysql_select(THD *thd, Item ***rref_poin
>>     if (thd->is_error())
>>       goto err;
>>
>> +  Star_join_shortcut::setup_shortcuts(join);
>> +
>
> IMHO setting up shortcuts is part of optimization.
Do you mean this should be done earlier? Granted, it should be done 
during cost-based optimization but my current proposal is to do that as 
another worklog.
>
>>     join->exec();
>>
>>     if (thd->lex->describe&  DESCRIBE_EXTENDED)
>> @@ -16956,6 +16961,7 @@ sub_select_cache(JOIN *join, JOIN_TAB *j
>>     DBUG_ASSERT(cache != NULL);
>>
>>     cache->reset_join(join);
>> +  join->return_tab= join_tab;
>>
>>     DBUG_ENTER("sub_select_cache");
>>
>> @@ -17128,8 +17134,13 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>     DBUG_ENTER("sub_select");
>>
>>     join_tab->table->null_row=0;
>> +  Execution_state state;
>> +  state.set_return_tab(join->return_tab);
>> +
>
> You are instantiating an Execution_state object both in sub_select() 
> and in evaluate_join_record().
>
> First, why do you need two state objects for one table in a join?
The state object is the interface between my optimization and the rest 
of query execution. It is an extensible and reusable interface of the 
kind everybody should strive to write, always. If you want my opinion. :-)

Ideally, the same state object should be passed-by-reference as a 
paremeter from sub_select to evaluate_join_record, but it seems a bit 
silly since the same information is passed as loose parameters anyway. 
It would be simple to put the little status int's inside the state 
object but I wanted someone to see the code before spending the effort.
>
> Second, I think of a join-tab as the state object for a table in a 
> join, so I would rather have this information inside the join-tab 
> object. It also occurs to me that some information is duplicated, 
> which is a potential efficiency and consistency problem.

> (Actually, the join-tab is both a plan operator and a state object, 
See, there's the name problem again. :-) I read "PlanNode" when I see 
"JOIN_TAB". There is no need IMHO to clutter up the structure even more 
with data that could go on the stack instead. If more people had 
realized this a JOIN_TAB would be half the size it is now. But it's not 
too late to turn around.
> but we leave that problem to a refactoring project...)
This has been said for 10 years now. But refactoring must be a 
continuous effort or it will lose momentum. I take a very pragmatic 
approach in this and refactor exactly what I need to in order to reach a 
stable implementation of a feature. If it becomes a performance hit, 
let's optimize it.
>
>>     if (end_of_records)
>>     {
>> +    join->return_tab= join_tab;
>> +    state.set_return_tab(join_tab);
>>       enum_nested_loop_state nls=
>>         (*join_tab->next_select)(join,join_tab+1,end_of_records);
>>       DBUG_RETURN(nls);
>> @@ -17144,6 +17155,7 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>     }
>>
>>     join->return_tab= join_tab;
>> +  state.set_return_tab(join_tab);
>>     join_tab->not_null_compl= TRUE;
>>
>>     if (join_tab->last_inner)
>> @@ -17159,11 +17171,15 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>     join->thd->warning_info->reset_current_row_for_warning();
>>
>>     error= (*join_tab->read_first_record)(join_tab);
>> +  state.set_read_error(error);
>> +  join_tab->notify_row_read(&state);
>>
>>     if (join_tab->keep_current_rowid)
>>      
> join_tab->table->file->position(join_tab->table->record[0]);
>> -
>> +
>>     rc= evaluate_join_record(join, join_tab, error);
>> +
>> +  join->return_tab= state.get_return_tab();
>>
>>     /*
>>       Note: psergey has added the 2nd part of the following 
>> condition; the
>> @@ -17172,11 +17188,16 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>     while (rc == NESTED_LOOP_OK&&  join->return_tab>= join_tab)
>>     {
>>       error= info->read_record(info);
>> +    state.set_read_error(error);
>> +    join_tab->notify_row_read(&state);
>>
>>       if (join_tab->keep_current_rowid)
>>        
> join_tab->table->file->position(join_tab->table->record[0]);
>> -
>> +
>>       rc= evaluate_join_record(join, join_tab, error);
>> +
>> +    if (state.get_return_tab()<  join->return_tab)
>> +      join->return_tab= state.get_return_tab();
>>     }
>>
>>     if (rc == NESTED_LOOP_NO_MORE_ROWS&&
>> @@ -17342,14 +17363,23 @@ evaluate_join_record(JOIN *join, JOIN_TA
>>     }
>>     DBUG_PRINT("info", ("select cond 0x%lx", (ulong)select_cond));
>>
>> +  Execution_state state;
>> +  state.set_return_tab(join->return_tab);
>> +  state.set_read_error(error);
>>     if (select_cond)
>>     {
>>       found= test(select_cond->val_int());
>> +    state.set_evaluation(found);
>>
>>       /* check for errors evaluating the condition */
>>       if (join->thd->is_error())
>>         DBUG_RETURN(NESTED_LOOP_ERROR);
>>     }
>> +  else
>> +    state.set_evaluation(TRUE);
>> +
>> +  join_tab->notify_row_evaluated(&state);
>> +
>>     if (found)
>>     {
>>       /*
>> @@ -17414,7 +17444,7 @@ evaluate_join_record(JOIN *join, JOIN_TA
>>         join_tab->first_unmatched= first_unmatched;
>>       }
>>
>> -    JOIN_TAB *return_tab= join->return_tab;
>> +    JOIN_TAB *return_tab= state.get_return_tab();
>>
>>       if (join_tab->check_weed_out_table&&  found)
>>       {
>> @@ -22764,6 +22794,7 @@ void select_describe(JOIN *join, bool ne
>>               extra.append(STRING_WITH_LEN(", incremental buffers)"));
>>           }
>>
>> +        tab->notify_explain_extra_column_ready(join,&extra);
>>           /* Skip initial "; "*/
>>           const char *str= extra.ptr();
>>           uint32 len= extra.length();
>>
>> === modified file 'sql/sql_select.h'
>> --- a/sql/sql_select.h    2010-09-06 17:56:05 +0000
>> +++ b/sql/sql_select.h    2010-09-20 13:03:54 +0000
>> @@ -179,6 +179,57 @@ inline bool sj_is_materialize_strategy(u
>>     return strategy>= SJ_OPT_MATERIALIZE_LOOKUP;
>>   }
>>
>> +
>> +/**
>> +   Part of the IExecution_listener interface. An execution listener is
>> +   notified by the plan node upon various events in the execiton 
>> plan. A
>> +   listener may alter the state and thus change the course of the query
>> +   execution by altering the execution state. The query execution 
>> engine may
>> +   always overrride the decision, however.
>
> The last sentence does not make sense to me. 
I misspelled 'override'. ;-)

In practice what it means is that the Execution_state does not 
manipulate the plan (the JOIN_TAB) directly. This is the design. It only 
updates its own member (m_return_tab) which sub_select() picks up later. 
I wanted to save the option (in the design) for sub_select not to pick 
it up.
> And "various events" is just loose talk. I want exact specifications, 
> please :)
I was intentionally vague here so that this documentation wouldn't drift 
out of sync if one adds another event to the design. Extensibility in 
mind :-)
>> +*/
>> +class Execution_state {
>> +  int m_read_error;
>
> What does "m_read_error" has to do with execution? I thought a read 
> error would cause immediate exit, and nothing that we need to track on 
> a table basis.
What do you mean by 'immediate exit' in this context? m_read_error is 
used to store the value returned by join_tab->read_first_record(). I 
believe negative values mean 'no more rows' and positive ones mean 
'serious error' (see evaluate_join_record). But I see no proper spec 
anywhere.
>
>> +  bool m_evaluation;
>
> What is this?
It's filled in by evaluate_join_record(), whether it accepts or rejects 
a tuple. It's not commented because it's a private member, but perhaps I 
should comment the getXXX and setXXX methods in Execution_state?
>
>> +  struct st_join_table *m_return_tab;
>> +
>> +public:
>> +
>> +  Execution_state() : m_read_error(0), m_evaluation(FALSE), 
>> m_return_tab(NULL)
>> +  {}
>> +
>> +  int get_read_error() { return m_read_error; }
>> +  void set_read_error(int read_error) { m_read_error = read_error; }
>> +
>> +  int get_evaluation() { return m_evaluation; }
>> +  void set_evaluation(bool evaluation) { m_evaluation= evaluation; }
>> +
>> +  struct st_join_table *get_return_tab() { return m_return_tab; }
>> +
>> +  /**
>> +     Signals to query execution that a short-cut be taken in the query
>> +     execution, and no further tuples should be read from tables 
>> appearing
>> +     after return_tab in the query execution plan.
>> +
>> +     @param return_tab The latest plan node where query execution 
>> should be
>> +     resumed.
>> +  */
>> +  void set_return_tab(struct st_join_table *return_tab) {
>> +    m_return_tab = return_tab;
>> +  }
>> +};
>> +
>> +
>> +class IExecution_listener {
>> +public:
>> +  virtual void row_read(struct st_join_table *join_tab,
>> +                        Execution_state *state) {}
>> +  virtual void row_evaluated(struct st_join_table *join_tab,
>> +                             Execution_state *state) {}
>> +  virtual void explain_extra_column_ready(JOIN *join, String *extra) {}
>> +  virtual ~IExecution_listener() {}
>> +};
>
> I am a bit concerned about the efficiency of your implementation. You 
> are performing quite a number of checks that generally seem unneeded. 
I am concerned, too. :-) For the record, this is totally unoptimized 
code. For my first commit I just wanted to program the design as 
straightforwardly and obviously as possible, and then optimize once the 
design is communicated and approved.

If I do any checks that  are unnecessary I would welcome if you pointed 
them out. We should make them DBUG_ASSERT's in that case.

What I am concerned about is not the virtual method calls. These are as 
efficient as function pointers. But it might be a problem to traverse a 
list of listeners. Do you have any suggestions?
>
> Is it possible to implement a more "lightweight" solution 
What would it be characterized by? More like in-lining the code in 
sub_select and evaluate_join_record?
> that is more in line with the code style currently being used in the 
> executioner?
I must say the executioner exhibits a great lack of code style. I can't 
really see how it could be a role model.

>
>> +
>> +
>>   /**
>>       Bits describing quick select type
>>   */
>> @@ -186,11 +237,25 @@ enum quick_type { QS_NONE, QS_RANGE, QS_
>>
>>   typedef struct st_join_table : public Sql_alloc
>>   {
>> +private:
>> +  List<IExecution_listener>  listeners;
>> +public:
>>     st_join_table();
>>
>>     TABLE        *table;
>>     KEYUSE    *keyuse;            /**<  pointer to first used key */
>>     SQL_SELECT    *select;
>> +  /**
>> +     This field corresponds somewhat to a pushdown condition. It may
>> +     contain more than just those conditions pushed to the actual 
>> table,
>> +     however.
>
> Where is a "pushdown condition" defined? 
Section 1.1 of the HLS. Fifth bullet.
> And please do not use the word "somewhat", but describe exactly what 
> the field is.
The contents are most likely partially undefined.
>> +
>> +     @note For the first table in the plan there is usually a 
>> select_cond
>> +     attached but it is not initialized.
>
> You mean it is NULL? I guess this is unnecessary information.
No, not NULL. I mean the object exists but is not properly initialized. 
Most of its members are undefined. That is, the job that you would 
expect a constructor to do is not done.
>> +
>> +     @note Not all pushdown conditions are found in this field. @c 
>> ref access
>> +     conditions are removed from here and are encoded in JOIN_TAB::ref.
>
> The basic idea is that this field specifies the remaining conditions 
> to be evaluated on the currently available rows, after you have 
> navigated to that set of rows. However, this excludes outer join 
> conditions that are located in the join nest structures.
Right.
>
>> +  */
>>     Item        *select_cond;
>>     QUICK_SELECT_I *quick;
>>     Item           **on_expr_ref;   /**<  pointer to the associated 
>> on expression   */
>> @@ -385,6 +450,32 @@ typedef struct st_join_table : public Sq
>>       return tmp_select_cond;
>>     }
>>     uint get_sj_strategy() const;
>> +  void add_execution_listener(IExecution_listener *listener)
>> +  {
>> +    listeners.push_front(listener);
>> +  }
>> +  void notify_row_read(Execution_state *state)
>> +  {
>> +    IExecution_listener* listener;
>> +    for (List_iterator<IExecution_listener>  it(listeners);
>> +         (listener= it++);)
>> +      listener->row_read(this, state);
>> +  }
>> +  void notify_row_evaluated(Execution_state *state)
>> +  {
>> +    IExecution_listener* listener;
>> +    for (List_iterator<IExecution_listener>  it(listeners);
>> +         (listener= it++);)
>> +      listener->row_evaluated(this, state);
>> +
>> +  }
>> +  void notify_explain_extra_column_ready(JOIN *join, String *extra)
>> +  {
>> +    IExecution_listener* listener;
>> +    for (List_iterator<IExecution_listener>  it(listeners);
>> +         (listener= it++);)
>> +      listener->explain_extra_column_ready(join, extra);
>> +  }
>>   } JOIN_TAB;
>>
>>
>> @@ -1761,11 +1852,12 @@ public:
>>     List<TABLE_LIST>  *join_list;       ///<  list of joined tables 
>> in reverse order
>>     COND_EQUAL *cond_equal;
>>     /*
>> +    Interface for taking short-cuts through join execution.
>
> I do not think this sentence adds information.
My bad. This should have been a Doxygen comment. Doxygen puts the first 
sentence of a comment in a special 'brief' section so one should strive 
to summarize what the comment says in the first sentence.

http://www.stack.nl/~dimitri/doxygen/commands.html#cmdbrief
>
>>       Join tab to return to. Points to an element of join->join_tab 
>> array, or to
>>       join->join_tab[-1].
>>       This is used at execution stage to shortcut join enumeration. 
>> Currently
>>       shortcutting is done to handle outer joins or handle semi-joins 
>> with
>> -    FirstMatch strategy.
>> +    FirstMatch strategy. It is also used to optimize execution of 
>> star joins.
>>     */
>>     JOIN_TAB *return_tab;
>>     Item **ref_pointer_array; ///<used pointer reference for this select
>> @@ -1933,6 +2025,46 @@ public:
>>               ((group || tmp_table_param.sum_func_count)&&  
>> !group_list)) ?
>>                 NULL : join_tab+const_tables;
>>     }
>> +  /**
>> +     Returns true if JOIN_TAB's a and b participate on the same 
>> nesting level
>> +     within a chain of inner joined tables.
>> +
>> +     @note The optimizer execution plan is considered, and outer 
>> joins may
>> +     have been rewritten into inner joins.
>> +   */
>> +  static bool are_inner_joined(JOIN_TAB *a, JOIN_TAB *b)
>> +  {
>> +
>> +    if (a>  b) {
>> +      JOIN_TAB *c= a;
>> +      a= b;
>> +      b= c;
>> +    }
>> +
>> +    if (a->last_inner != NULL&&  // a is in an outer join nest, then
>> +        a->last_inner == b->last_inner&&  // a must be at least @
> 
>> b's nest level
>> +        b->first_inner == NULL) // and b must not be deeper nested
>> +      return TRUE;
>> +
>> +    if (a->last_inner == NULL&&  // a is not an outer join nest
>> +        b->last_inner == NULL&&  b->first_inner == NULL) //
> neither 
>> is b
>> +      return TRUE;
>> +
>> +    if (b->last_inner == NULL&&  // b is not first in a nested inner
> 
>> join
>> +        b->first_inner != NULL&&  b->first_inner ==
> a->first_inner) 
>> // but a is
>> +      return TRUE;
>> +
>> +    return FALSE;
>> +  }
>
> I have not analyzed this algorithm in detail :) 
It's not an algorithm per se. Just a view over of the last/first _inner 
pointers.

> Based on our IRC discussion, do you think it is possible to simplify 
> the implementation?
Do you mean implement it in terms of  is_inner_table_of_outer_join() et 
al? I guess I could try...
>
>> +
>> +  static bool contains_join_buffering(JOIN_TAB *a, JOIN_TAB *b)
>> +  {
>> +    for (JOIN_TAB *join_tab= a; join_tab<  b; join_tab++)
>> +      if (join_tab->next_select == sub_select_cache)
>> +        return TRUE;
>> +    return FALSE;
>
> Is it possible to use the use_join_cache flag instead? Relying on 
> generated "code"
> seems dangerous...
Good point. Will look into that.
>
>> +  }
>> +
>>   private:
>>     /**
>>       TRUE if the query contains an aggregate function but has no GROUP
>>
>> === added file 'sql/sql_set.h'
>> --- a/sql/sql_set.h    1970-01-01 00:00:00 +0000
>> +++ b/sql/sql_set.h    2010-09-20 13:03:54 +0000
>> @@ -0,0 +1,112 @@
>> +#ifndef INCLUDES_MYSQL_SQL_SET_H
>> +#define INCLUDES_MYSQL_SQL_SET_H
>> +/* Copyright (c) 2000, 2010 Oracle and/or its affiliates. All rights 
>> reserved.
>> +
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; version 2 of the License.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program; if not, write to the Free Software
>> +   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 
>> 02110-1301  USA */
>> +
>> +#include "my_global.h"
>> +
>> +template<class T>  class Bitmap_set_iterator;
>> +template<class T>  class Bitmap_set_reverse_iterator;
>> +
>> +/**
>> +   A set implemented as a bitmap. Due to this implementation, 
>> operations that
>> +   normally are quite expensive, typically having O(n log n) 
>> complexity, are
>> +   extremely cheap. A set consists of a bitmap and an array of 
>> pointers to
>> +   elements, the backing store. Each n:th bit in the bitmap 
>> represents the
>> +   member at position n in the array, where the first member has 
>> position 0. A
>> +   set consisting of this element is represented by the bitmap 1.
>> +
>> +   For obvious reasons, it cannot be allowed to perform a union 
>> operation on
>> +   two sets that have different backing stores, even if their type 
>> is the
>> +   same. Attempts at such operations will result in failed assertions.
>> + */
>
> AFAIU, the immediate gain of this class is that you get type-safety 
> for the operations
> on the backingstore elements, and that you avoid doing the indexing into
> the backingstore array.
The idea is to use a simple iterator interface to access the data and to 
hide the fact that it is represented by a bitmap. It annoys the hell out 
of me that the code is full of 'sets' that are just int32's, table_map's 
whatever. And don't get me started on the 'magic' bits that people shove 
in there just because the representation is wide open. Pick your 
favorite among PARAM_TABLE_BIT, OUTER_REF_TABLE_BIT, RAND_TABLE_BIT...

You can see this great benefit in action in set_star_dependency(). If I 
didn't have an iterator I would have to slide a bitmap over an integer, 
dereference pointers and whatnot.
>
> The downside is more complexity 
IMHO This class *removes* complexity. Have you seen how many bizarre 
bitwise operations are done in the code? And how many bugs they cause?

Do you really not see any expressive advantage of "if (set_difference(a, 
b, c).is_empty())" over "if (((a & ~b) & ~c))"? The compiled code looks 
exactly the same.

> (yet another bitmap type, 
Yes, but one to rule them all. This type can encompass all the other 
types and hide *all* of them behind a consistent interface that lets you 
switch representation. It could be represented as uint32, long int, 
short, or indeed anything.
> templates)...
Templates are part of the C++ language. Granted, they may have had a bad 
reputation 15 years ago but there is no reason for treating them like 
second-class citizens anymore.
>> +template<class T>  class Bitmap_set {
>> +
>> +private:
>> +  ulonglong m_map;
>> +  T** m_backing_store;
>> +
>> +public:
>> +  Bitmap_set(ulonglong map, T** backing_store) :
>> +    m_map(map), m_backing_store(backing_store) {}
>> +
>> +    bool is_empty() { return m_map == 0; }
>> +
>> +    static Bitmap_set<T>  set_union(const Bitmap_set<T>&  a,
>> +                                   const Bitmap_set<T>&  b) {
>> +      DBUG_ASSERT(a.m_backing_store == b.m_backing_store);
>> +      Bitmap_set<T>  result(a.m_map | b.m_map, a.m_backing_store);
>> +      return result;
>> +    }
>> +
>> +    friend class Bitmap_set_iterator<T>;
>> +    friend class Bitmap_set_reverse_iterator<T>;
>> +};
>> +
>> +template<class T>  class Bitmap_set_iterator {
>> +private:
>> +  const Bitmap_set<T>&  m_set;
>> +  uint m_current_element;
>> +
>> +public:
>> +  Bitmap_set_iterator(const Bitmap_set<T>&  set) :
>> +    m_set(set), m_current_element(0) {}
>> +
>> +  inline T* operator++(int) {
>> +    ulonglong mask= 1<<  m_current_element;
>> +    while ((mask&  m_set.m_map) == 0)
>> +    {
>> +      if (mask == 0 || mask>  m_set.m_map)
>> +        return NULL;
>> +      mask<<= 1;
>> +      ++m_current_element;
>> +    }
>> +    T* element= m_set.m_backing_store[m_current_element++];
>> +    return element;
>> +  }
>> +
>> +};
>> +
>> +template<class T>  class Bitmap_set_reverse_iterator {
>> +private:
>> +  const Bitmap_set<T>&  m_set;
>> +  int m_current_element;
>> +
>> +public:
>> +  Bitmap_set_reverse_iterator(const Bitmap_set<T>&  set) :
>> +    m_set(set), m_current_element(0)
>> +  {
>> +    ulonglong map= set.m_map;
>> +    while ((map>>= 1) != 0)
>> +      ++m_current_element;
>> +  }
>> +
>> +  inline T* operator++(int) {
>> +    if (m_current_element == -1)
>> +      return NULL;
>> +    ulonglong mask= 1<<  m_current_element;
>> +    while ((mask&  m_set.m_map) == 0)
>> +    {
>> +      if (mask == 0)
>> +        return NULL;
>> +      mask>>= 1;
>> +      --m_current_element;
>> +    }
>> +    T* element= m_set.m_backing_store[m_current_element--];
>> +    return element;
>> +  }
>> +
>> +};
>> +#endif // INCLUDES_MYSQL_SQL_SET_H
>>
>> === added file 'sql/sql_shortcut.h'
>> --- a/sql/sql_shortcut.h    1970-01-01 00:00:00 +0000
>> +++ b/sql/sql_shortcut.h    2010-09-20 13:03:54 +0000
>> @@ -0,0 +1,208 @@
>> +#ifndef INCLUDES_MYSQL_SQL_SHORTCUT_H
>> +#define INCLUDES_MYSQL_SQL_SHORTCUT_H
>> +
>> +/* Copyright (c) 2000, 2010, Oracle and/or its affiliates. All 
>> rights reserved.
>> +
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; version 2 of the License.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program; if not, write to the Free Software
>> +   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  
>> 02111-1307  USA */
>> +
>> +#include "sql_select.h"
>> +
>> +class Star_join_shortcut : public IExecution_listener {
>> +
>> +private:
>> +
>> +  JOIN_TAB *m_destination;
>> +  bool m_current_prefix_successful;
>
> What is meant by "success" here?
Comment added.
>
>> +
>> +Star_join_shortcut(JOIN_TAB *destination) :
>> +  m_destination(destination), m_current_prefix_successful(FALSE) {
>> +    DBUG_ASSERT(m_destination != NULL);
>> +  }
>> +
>> +public:
>> +
>> +  /**
>> +     Pre-compute short-cuts for the join short-cut optimization.
>> +
>> +     This function adorns the query exeution plan with star join 
>> short-cuts
>> +     where applicable. Short-cuts are triggered through the
>> +     IExecution_listener interface. The algorithm is implemented as 
>> a nested
>> +     loop, where the outer loop walks the nodes (JOIN_TAB's) in the 
>> query
>> +     execution plan, and the inner loop iterates through all tables 
>> that this
>> +     table depends upon. Valid short-cuts have the following property.
>> +
>> +     - The short-cut has a minimum length of 2. If a table is 
>> dependent on the
>> +       table directly preceding it, the short-cut would be of length 
>> 1, and it
>> +       would indicate that there are no valid short-cuts from this 
>> table.
>> +
>> +     - The short-cut does not cross an outer join boundary. In this 
>> case the
>> +       optimization is not applicable, because a valid result would 
>> be missed.
>> +
>> +     - The short-cut does not cross a node using join buffers. 
>> Short-cuts are
>> +       not compatible with join buffering.
>> +
>> +     - If there are several applicable shortcuts by the above 
>> requirements, the
>> +       shortest one is chosen.
>> +
>> +     @param join The query execution plan. The plan is assumed to be 
>> complete
>> +     and ready for execution when this method is invoked.
>> + */
>> +  static void setup_shortcuts(JOIN *join)
>> +  {
>> +    if(!join->join_tab)
>> +      return;
>> +
>> +    for (uint i= 2; i<  join->tables; i++)
>> +      set_star_dependency(join,&join->join_tab[i]);
>> +  }
>> +
>> +  void row_read(JOIN_TAB *join_tab, Execution_state *state)
>> +  {
>> +    if (state->get_read_error() != 0)
>> +    {
>> +      if (!m_current_prefix_successful)
>> +        handle_read_failure(join_tab, state);
>> +      m_current_prefix_successful= FALSE;
>> +    }
>> +  }
>> +
>> +
>> +  /**
>> +     If a row has passed the push-down condition test for the 
>> current prefix
>> +     of the partial result tuple, we record that here. When there 
>> are no more
>> +     rows left, it is allowed to take a short-cut if now rows passed 
>> the test.
>> +   */
>> +  void row_evaluated(JOIN_TAB *join_tab, Execution_state *state) {
>> +    if (state->get_evaluation())
>> +      m_current_prefix_successful= TRUE;
>> +  }
>> +
>> +  /**
>> +     Appends information about the short-cut optimization to EXPLAIN 
>> output, if
>> +     used for the current plan node.
>> +
>> +     @param join     The execution plan.
>> +     @param join_tab The plan node to be annotated.
>> +     @param extra    The current content of the 'extra' column in 
>> EXPLAIN.
>> +  */
>> +  void explain_extra_column_ready(JOIN *join, String *extra)
>> +  {
>> +    extra->append(STRING_WITH_LEN("; StarJoin("));
>> +    extra->append(m_destination->table->alias);
>> +    extra->append(STRING_WITH_LEN(")"));
>> +  }
>> +
>> +private:
>> +
>> + /**
>> +    Sets the short-cut from from one plan node to another, if one is 
>> found.
>> +
>> +     @param join The execution plan.
>> +
>> +     @param join_tab The plan node from which a short-cut 
>> willpotentially be
>> +     taken.
>> +   */
>> +  static void set_star_dependency(JOIN *join, JOIN_TAB *join_tab)
>> +  {
>> +    table_map where_clause_tables_map;
>> +    if (join_tab->select_cond != NULL)
>> +    {
>> +      Item *where_condition= join_tab->select_cond;
>> +      where_condition->update_used_tables();
>
> Do you need to update used tables here?
I gather you have not tried removing it. ;-) Yes it is needed.
>
>> +      where_clause_tables_map=
>> +        where_condition->used_tables()&  ~PSEUDO_TABLE_BITS;
>> +    }
>> +    else
>> +      where_clause_tables_map= (table_map)0L;
>> +
>> +    table_map ref_access_tables_map=
>> +      join_tab->ref.depend_map&  ~PSEUDO_TABLE_BITS;
>> +
>> +    Bitmap_set<JOIN_TAB>
>> +      where_clause_tables(where_clause_tables_map, join->map2table);
>> +
>> +    Bitmap_set<JOIN_TAB>
>> +      ref_access_tables(ref_access_tables_map, join->map2table);
>> +
>> +    Bitmap_set<JOIN_TAB>  all_dependency_tables=
>> +      Bitmap_set<JOIN_TAB>::set_union(where_clause_tables, 
>> ref_access_tables);
>
> all_dependency_tables ->  dependent_tables ?
That would be 'tables_depended_upon' in that case.
>
>
>> +
>> +    if (all_dependency_tables.is_empty())
>> +      return;
>> +
>> +    for(Bitmap_set_reverse_iterator<JOIN_TAB>  
>> it(all_dependency_tables);
>> +        update_star_dependency(join, join_tab, it++);) ;
>> +  }
>> +
>> +
>> +  /**
>> +     Updates a short-cut from from one plan node to another. The method
>> +     assumes that it is invoked on plan nodes in reverse order in 
>> which they
>> +     appear in the query execution plan.
>
> -> Order from the most inner to the most outer table?
What is meant by inner and outer in this case?
>
>> +
>> +     @param join The execution plan.
>> +
>> +     @param join_tab The plan node from which a short-cut may 
>> potentially be
>> +     taken.
>> +
>> +     @param dependent A node that access to join_tab depends on.
>> +
>> +     @retval TRUE There may be other short-cuts possible early in 
>> the plan.
>> +     @retval FALSE There are no earlier short-cuts available. Either 
>> the only
>> +     possible one was found and completely initialized, or there are 
>> none.
>> +   */
>> +  static bool update_star_dependency(JOIN *join, JOIN_TAB *join_tab,
>> +                                     JOIN_TAB *dependency)
>> +  {
>> +    if (dependency == NULL)
>> +      return FALSE;
>> +
>> +    if (join_tab<= dependency)
>> +      return TRUE;
>> +
>> +    if (dependency == join_tab - 1)
>> +    {
>> +      DBUG_PRINT("info", ("Invalidated a join short-cut from %s to %s",
>> +                          join_tab->table->alias,
>> +                          dependency->table->alias));
>> +      return FALSE;
>> +    }
>> +    else if (join->are_inner_joined(dependency, join_tab)&&
>> +             !join->contains_join_buffering(dependency, join_tab))
>> +    {
>> +      DBUG_PRINT("info", ("Found a join short-cut from %s to %s",
>> +                          join_tab->table->alias,
>> +                          dependency->table->alias));
>> +
>> +      join_tab->add_execution_listener(new 
>> Star_join_shortcut(dependency));
>> +
>> +      return FALSE;
>> +    }
>> +    return TRUE;
>> +  }
>> +
>> +private:
>> +
>> +  void handle_read_failure(JOIN_TAB *join_tab, Execution_state 
>> *state) {
>> +    if (m_destination != NULL)
>> +    {
>> +      state->set_return_tab(m_destination);
>> +      DBUG_PRINT("info", ("Execution short-cut to '%s'",
>> +                          m_destination->table != NULL ?
>> +                          m_destination->table->alias : ""));
>> +    }
>> +  }
>> +};
>> +
>> +#endif // INCLUDES_MYSQL_SQL_SHORTCUT_H
>>
>> === modified file 'sql/structs.h'
>> --- a/sql/structs.h    2010-07-13 17:29:44 +0000
>> +++ b/sql/structs.h    2010-09-20 13:03:54 +0000
>> @@ -119,6 +119,27 @@ struct st_join_table;
>>   typedef struct st_reginfo {        /* Extra info about reg */
>>     struct st_join_table *join_tab;    /* Used by SELECT() */
>>     enum thr_lock_type lock_type;        /* How database is used */
>> +  /**
>> +     This is an outer join optimization. The idea is to avoid
>> +     NULL-complementing a row for the inner table(s). The bespoke 
>> case is when
>> +     we have a query such as
>> +
>> +     @code
>> +
>> +     SELECT ...
>> +     FROM t1 LEFT OUTER JOIN (t2, t3) ON<condition>
>> +     WHERE t2.non_nullable_column IS NULL
>> +
>> +     @endcode
>> +
>> +     In this case not_exists_optimize would be set for t2.
>> +
>> +     Once the query execution reaches the place where<condition>  
>> holds for
>> +     the current result tuple, we know that there cannot be a row 
>> where the
>> +     WHERE condition holds. There is no need produce a 
>> NULL-complemented row.
>> +
>> +     @see update_ref_and_keys
>
> These comments are repeated at two places in the code. 
Please show me where, I can't find it.


Thank you for the comprehensive review!

Best Regards

Martin

Thread
bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218) WL#3724Martin Hansson20 Sep
Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Roy Lyseng27 Sep
  • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Martin Hansson28 Sep
    • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Roy Lyseng29 Sep
      • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Martin Hansson29 Sep
        • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Roy Lyseng29 Sep