List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:September 29 2010 6:56am
Subject:Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)
WL#3724
View as plain text  
On 28.09.10 20.59, Martin Hansson wrote:
> Roy Lyseng skrev 2010-09-27 16.08:
>> Hi Martin,
>>
>> thanks for working on this!
>>
>> Some early comments to the implementation.
>>>
>>>
>>> === modified file 'sql/item.h'
>>> --- a/sql/item.h 2010-09-07 19:07:18 +0000
>>> +++ b/sql/item.h 2010-09-20 13:03:54 +0000
>>> @@ -860,7 +860,16 @@ public:
>>> virtual bool val_bool_result() { return val_bool(); }
>>> virtual bool is_null_result() { return is_null(); }
>>>
>>> - /* bit map of tables used by item */
>>> + /**
>>> + Conceptually, this is the set of tables directly referenced by this Item.
>>> +
>>> + For an Item_field, the set contains simply the table to which the field
>>> + belongs.
>>> +
>>> + For pushdown conditions, used_tables is the set tables referenced by the
>>> + pushdown condition recursively, *including* the table to which the
>>> + condition is pushed.
>>> + */
>>
>> Please only keep the first line here, and remove the word "Conceptually".
>> The two other comments might belong with the respective implementations.
> For the middle sentence, agreed.
>> BTW, a pushdown condition is not a particular class of Items, right?
> No, it isn't. That's why I put that comment there.

Then it is better to put the comment on the field(s) that implement pushdown 
conditions, most likely the "select_cond" field of the join-tab.

>>
>>> virtual table_map used_tables() const { return (table_map) 0L; }
>>> /*
>>> Return table map of tables that can't be NULL tables (tables that are
>>>
>>> === modified file 'sql/records.h'
>>> --- a/sql/records.h 2010-07-13 17:29:44 +0000
>>> +++ b/sql/records.h 2010-09-20 13:03:54 +0000
>>> @@ -38,6 +38,34 @@ class SQL_SELECT;
>>> ...
>>> }
>>> end_read_record();
>>> +
>>> + The READ_RECORD could be called a record reader, and acts as
>>
>> I am not interested in what it "could" be called, I am interested in what it
>> is... :)
> Actually, I mean it *should* be called RecordReader. :-)
>
> Furthermore, READ_RECORD is a completely pointless name. It isn't a record. As a
> method name, readRecord would make sense because its a verb. I frequently
> translate our bogus names in my mind to something meaningful. When I see 'Item',
> I read 'Expression', TABLE_LIST becomes TableReference, etc.

Nevertheless, please do not try to express your opinions on the code in the 
comments that you write. If you want to document existing structs, please use as 
consistent language as possible. I am really afraid that trying to document what 
a struct "should" be called only confuses developers that come after you and try 
to understand the structure of the system.

So, the documentation should be like this:

"READ_RECORD is an interface that is used to retrieve records one by one..."

I agree with you about the naming, but writing notes for refactoring should go 
in a different place.

>>> +
>>> + - An abstraction layer beneath the query execution engine. The only read
>>> + operation the execution engine has to know of is reading the next
>>> + record. Beneath the abstraction layer is the storage engine API, which is
>>> + invoked differently for table/index scans and key lookups, and the join
>>> + buffer.
>>
>> An abstraction layer between the query execution engine and the storage engine
>> API.
> done.
>>
>>
>>> +
>>> + - An executable sub-program, resulting from the optimizer. It is set up
> with a
>>> + reference to a handler and a function pointer to a read function during
>>> + query optimization.
>>> +
>>> + The abstraction is incomplete, however. For instace, reading the first
>>> + record from a handler is done through a function pointer in the JOIN_TAB
>>> + rather than in its READ_RECORD.
>>> +
>>> + Sometimes the READ_RECORD is initialized during query execution.
>>> +
>>> + - It is reinitialized during buffered nested loops join, before the buffer
>>> + is flushed.
>>> +
>>> + - For many table accesses, it is initialized right before the first record
>>> + is read.
>>> +
>>> + The READ_RECORD design can be viewed as a strategy pattern, where the
>>> + abstract strategy is the READ_RECORD struct and the concrete strategy is
>>> + determined by the values of the function pointer READ_RECORD::read_record.
>>> */
>>
>> I am not sure that it is a good idea to describe how and where a particular
>> class is used with the definition of that class. Documenting the recommended
>> usage pattern can be a great idea, but IMHO the pecularies should instead be
>> documented where they are being used.
> I need to know what exactly is referred to before I can comment.

I mean the paragraph starting with "Sometimes" and the two following paragraphs.
>>
>>>
>>> class Copy_field;
>>>
>>> === modified file 'sql/sql_select.cc'
>>> --- a/sql/sql_select.cc 2010-09-10 05:14:00 +0000
>>> +++ b/sql/sql_select.cc 2010-09-20 13:03:54 +0000
>>> @@ -28,6 +28,9 @@
>>> #pragma implementation // gcc: Class implementation
>>> #endif
>>>
>>> +#include "sql_set.h"
>>> +#include "sql_shortcut.h"
>>> +
>>> #include "sql_priv.h"
>>> #include "unireg.h"
>>> #include "sql_select.h"
>>> @@ -3488,6 +3491,8 @@ mysql_select(THD *thd, Item ***rref_poin
>>> if (thd->is_error())
>>> goto err;
>>>
>>> + Star_join_shortcut::setup_shortcuts(join);
>>> +
>>
>> IMHO setting up shortcuts is part of optimization.
> Do you mean this should be done earlier? Granted, it should be done during
> cost-based optimization but my current proposal is to do that as another worklog.

I think this code belongs at the end of JOIN::optimize(). This function contains 
a combination of rule-based and cost-based optimizations, so shortcutting 
definitely fits in.

Besides, I am not sure that doing cost-based choice for join shortcutting is 
worthwhile. The cost of the optimization is virtually zero (if the 
implementation is sufficiently lightweight), and a successful shortcut will 
always reduce execution cost.
>>
>>> join->exec();
>>>
>>> if (thd->lex->describe& DESCRIBE_EXTENDED)
>>> @@ -16956,6 +16961,7 @@ sub_select_cache(JOIN *join, JOIN_TAB *j
>>> DBUG_ASSERT(cache != NULL);
>>>
>>> cache->reset_join(join);
>>> + join->return_tab= join_tab;
>>>
>>> DBUG_ENTER("sub_select_cache");
>>>
>>> @@ -17128,8 +17134,13 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>> DBUG_ENTER("sub_select");
>>>
>>> join_tab->table->null_row=0;
>>> + Execution_state state;
>>> + state.set_return_tab(join->return_tab);
>>> +
>>
>> You are instantiating an Execution_state object both in sub_select() and in
>> evaluate_join_record().
>>
>> First, why do you need two state objects for one table in a join?
> The state object is the interface between my optimization and the rest of query
> execution. It is an extensible and reusable interface of the kind everybody
> should strive to write, always. If you want my opinion. :-)

My opinion is that join execution should first and foremost be a highly 
optimized program. I do not care about extensible and reusable interfaces, 
unless there is a plan to extend the use of them. Now I think they are plain 
overkill.
>
> Ideally, the same state object should be passed-by-reference as a paremeter from
> sub_select to evaluate_join_record, but it seems a bit silly since the same
> information is passed as loose parameters anyway. It would be simple to put the
> little status int's inside the state object but I wanted someone to see the code
> before spending the effort.
>>
>> Second, I think of a join-tab as the state object for a table in a join, so I
>> would rather have this information inside the join-tab object. It also occurs
>> to me that some information is duplicated, which is a potential efficiency and
>> consistency problem.
>
>> (Actually, the join-tab is both a plan operator and a state object,
> See, there's the name problem again. :-) I read "PlanNode" when I see
> "JOIN_TAB". There is no need IMHO to clutter up the structure even more with
> data that could go on the stack instead. If more people had realized this a
> JOIN_TAB would be half the size it is now. But it's not too late to turn around.

Actually, I do not agree with you. Using the stack can be an option, but when 
bouncing between levels of the join continuously, it is perhaps better to have 
the data in more persistent structures. Another issue is that the join-tabs are 
not reentrant, and that they contain a lot of baggage that is only useful during 
optimization.

My suggestion is that we have one optimizer "join-tab", one execution join-tab 
with read-only information, and one execution "join-tab" with state information.

I also think that we might replace the current recursive functions with an 
interpreter that this moving up and down a stack of state objects.

>> but we leave that problem to a refactoring project...)
> This has been said for 10 years now. But refactoring must be a continuous effort
> or it will lose momentum. I take a very pragmatic approach in this and refactor
> exactly what I need to in order to reach a stable implementation of a feature.
> If it becomes a performance hit, let's optimize it.

I just say: Stick with the current design until we have a clear plan on how to 
refactor execution. Refactoring without a plan is worse than no refactoring.

And I do not accept a solution that considers performance as an afterthought.

It seems to me that the classes you propose can be replaced with the following 
bits and pieces:

  - A field in the join-tab that is NULL or contains the shortcut target.

  - Code that sets return_tab equal to the shortcut at the start of a scan,
    if shortcut target is non-NULL.

  - Code that nulls return_tab when the scan delivers at least one matching row,
    if shortcut target is non-NULL.

That would be one additional test when initiating a row scan, and one additional 
test when a row is accepted.

>>
>>> if (end_of_records)
>>> {
>>> + join->return_tab= join_tab;
>>> + state.set_return_tab(join_tab);
>>> enum_nested_loop_state nls=
>>> (*join_tab->next_select)(join,join_tab+1,end_of_records);
>>> DBUG_RETURN(nls);
>>> @@ -17144,6 +17155,7 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>> }
>>>
>>> join->return_tab= join_tab;
>>> + state.set_return_tab(join_tab);
>>> join_tab->not_null_compl= TRUE;
>>>
>>> if (join_tab->last_inner)
>>> @@ -17159,11 +17171,15 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>> join->thd->warning_info->reset_current_row_for_warning();
>>>
>>> error= (*join_tab->read_first_record)(join_tab);
>>> + state.set_read_error(error);
>>> + join_tab->notify_row_read(&state);
>>>
>>> if (join_tab->keep_current_rowid)
>>> join_tab->table->file->position(join_tab->table->record[0]);
>>> -
>>> +
>>> rc= evaluate_join_record(join, join_tab, error);
>>> +
>>> + join->return_tab= state.get_return_tab();
>>>
>>> /*
>>> Note: psergey has added the 2nd part of the following condition; the
>>> @@ -17172,11 +17188,16 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>> while (rc == NESTED_LOOP_OK&& join->return_tab>= join_tab)
>>> {
>>> error= info->read_record(info);
>>> + state.set_read_error(error);
>>> + join_tab->notify_row_read(&state);
>>>
>>> if (join_tab->keep_current_rowid)
>>> join_tab->table->file->position(join_tab->table->record[0]);
>>> -
>>> +
>>> rc= evaluate_join_record(join, join_tab, error);
>>> +
>>> + if (state.get_return_tab()< join->return_tab)
>>> + join->return_tab= state.get_return_tab();
>>> }
>>>
>>> if (rc == NESTED_LOOP_NO_MORE_ROWS&&
>>> @@ -17342,14 +17363,23 @@ evaluate_join_record(JOIN *join, JOIN_TA
>>> }
>>> DBUG_PRINT("info", ("select cond 0x%lx", (ulong)select_cond));
>>>
>>> + Execution_state state;
>>> + state.set_return_tab(join->return_tab);
>>> + state.set_read_error(error);
>>> if (select_cond)
>>> {
>>> found= test(select_cond->val_int());
>>> + state.set_evaluation(found);
>>>
>>> /* check for errors evaluating the condition */
>>> if (join->thd->is_error())
>>> DBUG_RETURN(NESTED_LOOP_ERROR);
>>> }
>>> + else
>>> + state.set_evaluation(TRUE);
>>> +
>>> + join_tab->notify_row_evaluated(&state);
>>> +
>>> if (found)
>>> {
>>> /*
>>> @@ -17414,7 +17444,7 @@ evaluate_join_record(JOIN *join, JOIN_TA
>>> join_tab->first_unmatched= first_unmatched;
>>> }
>>>
>>> - JOIN_TAB *return_tab= join->return_tab;
>>> + JOIN_TAB *return_tab= state.get_return_tab();
>>>
>>> if (join_tab->check_weed_out_table&& found)
>>> {
>>> @@ -22764,6 +22794,7 @@ void select_describe(JOIN *join, bool ne
>>> extra.append(STRING_WITH_LEN(", incremental buffers)"));
>>> }
>>>
>>> + tab->notify_explain_extra_column_ready(join,&extra);
>>> /* Skip initial "; "*/
>>> const char *str= extra.ptr();
>>> uint32 len= extra.length();
>>>
>>> === modified file 'sql/sql_select.h'
>>> --- a/sql/sql_select.h 2010-09-06 17:56:05 +0000
>>> +++ b/sql/sql_select.h 2010-09-20 13:03:54 +0000
>>> @@ -179,6 +179,57 @@ inline bool sj_is_materialize_strategy(u
>>> return strategy>= SJ_OPT_MATERIALIZE_LOOKUP;
>>> }
>>>
>>> +
>>> +/**
>>> + Part of the IExecution_listener interface. An execution listener is
>>> + notified by the plan node upon various events in the execiton plan. A
>>> + listener may alter the state and thus change the course of the query
>>> + execution by altering the execution state. The query execution engine may
>>> + always overrride the decision, however.
>>
>> The last sentence does not make sense to me.
> I misspelled 'override'. ;-)
>
> In practice what it means is that the Execution_state does not manipulate the
> plan (the JOIN_TAB) directly. This is the design. It only updates its own member
> (m_return_tab) which sub_select() picks up later. I wanted to save the option
> (in the design) for sub_select not to pick it up.
>> And "various events" is just loose talk. I want exact specifications, please :)
> I was intentionally vague here so that this documentation wouldn't drift out of
> sync if one adds another event to the design. Extensibility in mind :-)

IMHO being vague is a cardinal sin when documenting code. The sole purpose of 
documentation is to enhance understanding of the code.
>>> +*/
>>> +class Execution_state {
>>> + int m_read_error;
>>
>> What does "m_read_error" has to do with execution? I thought a read error
>> would cause immediate exit, and nothing that we need to track on a table basis.
> What do you mean by 'immediate exit' in this context? m_read_error is used to
> store the value returned by join_tab->read_first_record(). I believe negative
> values mean 'no more rows' and positive ones mean 'serious error' (see
> evaluate_join_record). But I see no proper spec anywhere.

Well, there is a confusion in the code that it mixes error codes and EOF state 
for a scan. I thought you would not make the same mistake ;)

What I mean with "immediate exit" is that you do not attempt any recovery when 
an error occurs, you just clean up what is possible and exit execution 
immediately. Thus, the m_read_error mostly contains a scan state, so it is 
clearly mis-named.
>>
>>> + bool m_evaluation;
>>
>> What is this?
> It's filled in by evaluate_join_record(), whether it accepts or rejects a tuple.
> It's not commented because it's a private member, but perhaps I should comment
> the getXXX and setXXX methods in Execution_state?

"accepted" would be a better name, then. Besides, it seems to be the same as the 
"bool found" within the join-tab, so it is duplicated information.

And why is not comments justified for private members?
>>
>>> + struct st_join_table *m_return_tab;
>>> +
>>> +public:
>>> +
>>> + Execution_state() : m_read_error(0), m_evaluation(FALSE),
> m_return_tab(NULL)
>>> + {}
>>> +
>>> + int get_read_error() { return m_read_error; }
>>> + void set_read_error(int read_error) { m_read_error = read_error; }
>>> +
>>> + int get_evaluation() { return m_evaluation; }
>>> + void set_evaluation(bool evaluation) { m_evaluation= evaluation; }
>>> +
>>> + struct st_join_table *get_return_tab() { return m_return_tab; }
>>> +
>>> + /**
>>> + Signals to query execution that a short-cut be taken in the query
>>> + execution, and no further tuples should be read from tables appearing
>>> + after return_tab in the query execution plan.
>>> +
>>> + @param return_tab The latest plan node where query execution should be
>>> + resumed.
>>> + */
>>> + void set_return_tab(struct st_join_table *return_tab) {
>>> + m_return_tab = return_tab;
>>> + }
>>> +};
>>> +
>>> +
>>> +class IExecution_listener {
>>> +public:
>>> + virtual void row_read(struct st_join_table *join_tab,
>>> + Execution_state *state) {}
>>> + virtual void row_evaluated(struct st_join_table *join_tab,
>>> + Execution_state *state) {}
>>> + virtual void explain_extra_column_ready(JOIN *join, String *extra) {}
>>> + virtual ~IExecution_listener() {}
>>> +};
>>
>> I am a bit concerned about the efficiency of your implementation. You are
>> performing quite a number of checks that generally seem unneeded.
> I am concerned, too. :-) For the record, this is totally unoptimized code. For
> my first commit I just wanted to program the design as straightforwardly and
> obviously as possible, and then optimize once the design is communicated and
> approved.
>
> If I do any checks that are unnecessary I would welcome if you pointed them out.
> We should make them DBUG_ASSERT's in that case.
>
> What I am concerned about is not the virtual method calls. These are as
> efficient as function pointers. But it might be a problem to traverse a list of
> listeners. Do you have any suggestions?

See above. I do not think there is anything currently that justifies the list 
implementation.
>>
>> Is it possible to implement a more "lightweight" solution
> What would it be characterized by? More like in-lining the code in sub_select
> and evaluate_join_record?

See above.

>> that is more in line with the code style currently being used in the
> executioner?
> I must say the executioner exhibits a great lack of code style. I can't really
> see how it could be a role model.

I did not ask that you look upon it as a role model, nevertheless I do ask you 
to follow the code style. There's probably a lot that can be said about the 
style, but at least it encourages efficiency.
>
>>
>>> +
>>> +
>>> /**
>>> Bits describing quick select type
>>> */
>>> @@ -186,11 +237,25 @@ enum quick_type { QS_NONE, QS_RANGE, QS_
>>>
>>> typedef struct st_join_table : public Sql_alloc
>>> {
>>> +private:
>>> + List<IExecution_listener> listeners;
>>> +public:
>>> st_join_table();
>>>
>>> TABLE *table;
>>> KEYUSE *keyuse; /**< pointer to first used key */
>>> SQL_SELECT *select;
>>> + /**
>>> + This field corresponds somewhat to a pushdown condition. It may
>>> + contain more than just those conditions pushed to the actual table,
>>> + however.
>>
>> Where is a "pushdown condition" defined?
> Section 1.1 of the HLS. Fifth bullet.

Ah. OK. Actually, it is easy to confuse "pushdown condition" with a condition 
that is pushed down to the storage engine level ;) Is "Table condition" better?

>> And please do not use the word "somewhat", but describe exactly what the field
>> is.
> The contents are most likely partially undefined.

It is still not useful information, please delete it.

>>> +
>>> + @note For the first table in the plan there is usually a select_cond
>>> + attached but it is not initialized.
>>
>> You mean it is NULL? I guess this is unnecessary information.
> No, not NULL. I mean the object exists but is not properly initialized. Most of
> its members are undefined. That is, the job that you would expect a constructor
> to do is not done.

Still, this is confusing information. If there is a particular problem, you 
should document that (or fix it).
>>> +
>>> + @note Not all pushdown conditions are found in this field. @c ref access
>>> + conditions are removed from here and are encoded in JOIN_TAB::ref.
>>
>> The basic idea is that this field specifies the remaining conditions to be
>> evaluated on the currently available rows, after you have navigated to that
>> set of rows. However, this excludes outer join conditions that are located in
>> the join nest structures.
> Right.
>>
>>> + */
>>> Item *select_cond;
>>> QUICK_SELECT_I *quick;
>>> Item **on_expr_ref; /**< pointer to the associated on expression */
>>> @@ -385,6 +450,32 @@ typedef struct st_join_table : public Sq
>>> return tmp_select_cond;
>>> }
>>> uint get_sj_strategy() const;
>>> + void add_execution_listener(IExecution_listener *listener)
>>> + {
>>> + listeners.push_front(listener);
>>> + }
>>> + void notify_row_read(Execution_state *state)
>>> + {
>>> + IExecution_listener* listener;
>>> + for (List_iterator<IExecution_listener> it(listeners);
>>> + (listener= it++);)
>>> + listener->row_read(this, state);
>>> + }
>>> + void notify_row_evaluated(Execution_state *state)
>>> + {
>>> + IExecution_listener* listener;
>>> + for (List_iterator<IExecution_listener> it(listeners);
>>> + (listener= it++);)
>>> + listener->row_evaluated(this, state);
>>> +
>>> + }
>>> + void notify_explain_extra_column_ready(JOIN *join, String *extra)
>>> + {
>>> + IExecution_listener* listener;
>>> + for (List_iterator<IExecution_listener> it(listeners);
>>> + (listener= it++);)
>>> + listener->explain_extra_column_ready(join, extra);
>>> + }
>>> } JOIN_TAB;
>>>
>>>
>>> @@ -1761,11 +1852,12 @@ public:
>>> List<TABLE_LIST> *join_list; ///< list of joined tables in reverse
> order
>>> COND_EQUAL *cond_equal;
>>> /*
>>> + Interface for taking short-cuts through join execution.
>>
>> I do not think this sentence adds information.
> My bad. This should have been a Doxygen comment. Doxygen puts the first sentence
> of a comment in a special 'brief' section so one should strive to summarize what
> the comment says in the first sentence.

And it is a field, not an interface.
>
> http://www.stack.nl/~dimitri/doxygen/commands.html#cmdbrief
>>
>>> Join tab to return to. Points to an element of join->join_tab array, or
> to
>>> join->join_tab[-1].
>>> This is used at execution stage to shortcut join enumeration. Currently
>>> shortcutting is done to handle outer joins or handle semi-joins with
>>> - FirstMatch strategy.
>>> + FirstMatch strategy. It is also used to optimize execution of star joins.
>>> */
>>> JOIN_TAB *return_tab;
>>> Item **ref_pointer_array; ///<used pointer reference for this select
>>> @@ -1933,6 +2025,46 @@ public:
>>> ((group || tmp_table_param.sum_func_count)&& !group_list)) ?
>>> NULL : join_tab+const_tables;
>>> }
>>> + /**
>>> + Returns true if JOIN_TAB's a and b participate on the same nesting level
>>> + within a chain of inner joined tables.
>>> +
>>> + @note The optimizer execution plan is considered, and outer joins may
>>> + have been rewritten into inner joins.
>>> + */
>>> + static bool are_inner_joined(JOIN_TAB *a, JOIN_TAB *b)
>>> + {
>>> +
>>> + if (a> b) {
>>> + JOIN_TAB *c= a;
>>> + a= b;
>>> + b= c;
>>> + }
>>> +
>>> + if (a->last_inner != NULL&& // a is in an outer join nest, then
>>> + a->last_inner == b->last_inner&& // a must be at least @ b's
> nest level
>>> + b->first_inner == NULL) // and b must not be deeper nested
>>> + return TRUE;
>>> +
>>> + if (a->last_inner == NULL&& // a is not an outer join nest
>>> + b->last_inner == NULL&& b->first_inner == NULL) // neither is
> b
>>> + return TRUE;
>>> +
>>> + if (b->last_inner == NULL&& // b is not first in a nested inner
> join
>>> + b->first_inner != NULL&& b->first_inner == a->first_inner)
> // but a is
>>> + return TRUE;
>>> +
>>> + return FALSE;
>>> + }
>>
>> I have not analyzed this algorithm in detail :)
> It's not an algorithm per se. Just a view over of the last/first _inner pointers.
>
>> Based on our IRC discussion, do you think it is possible to simplify the
>> implementation?
> Do you mean implement it in terms of is_inner_table_of_outer_join() et al? I
> guess I could try...

Or just use embedding and exclude outer-joined tables?
>>
>>> +
>>> + static bool contains_join_buffering(JOIN_TAB *a, JOIN_TAB *b)
>>> + {
>>> + for (JOIN_TAB *join_tab= a; join_tab< b; join_tab++)
>>> + if (join_tab->next_select == sub_select_cache)
>>> + return TRUE;
>>> + return FALSE;
>>
>> Is it possible to use the use_join_cache flag instead? Relying on generated
>> "code"
>> seems dangerous...
> Good point. Will look into that.
>>
>>> + }
>>> +
>>> private:
>>> /**
>>> TRUE if the query contains an aggregate function but has no GROUP
>>>
>>> === added file 'sql/sql_set.h'
>>> --- a/sql/sql_set.h 1970-01-01 00:00:00 +0000
>>> +++ b/sql/sql_set.h 2010-09-20 13:03:54 +0000
>>> @@ -0,0 +1,112 @@
>>> +#ifndef INCLUDES_MYSQL_SQL_SET_H
>>> +#define INCLUDES_MYSQL_SQL_SET_H
>>> +/* Copyright (c) 2000, 2010 Oracle and/or its affiliates. All rights
> reserved.
>>> +
>>> + This program is free software; you can redistribute it and/or modify
>>> + it under the terms of the GNU General Public License as published by
>>> + the Free Software Foundation; version 2 of the License.
>>> +
>>> + This program is distributed in the hope that it will be useful,
>>> + but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>> + GNU General Public License for more details.
>>> +
>>> + You should have received a copy of the GNU General Public License
>>> + along with this program; if not, write to the Free Software
>>> + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> */
>>> +
>>> +#include "my_global.h"
>>> +
>>> +template<class T> class Bitmap_set_iterator;
>>> +template<class T> class Bitmap_set_reverse_iterator;
>>> +
>>> +/**
>>> + A set implemented as a bitmap. Due to this implementation, operations that
>>> + normally are quite expensive, typically having O(n log n) complexity, are
>>> + extremely cheap. A set consists of a bitmap and an array of pointers to
>>> + elements, the backing store. Each n:th bit in the bitmap represents the
>>> + member at position n in the array, where the first member has position 0.
> A
>>> + set consisting of this element is represented by the bitmap 1.
>>> +
>>> + For obvious reasons, it cannot be allowed to perform a union operation on
>>> + two sets that have different backing stores, even if their type is the
>>> + same. Attempts at such operations will result in failed assertions.
>>> + */
>>
>> AFAIU, the immediate gain of this class is that you get type-safety for the
>> operations
>> on the backingstore elements, and that you avoid doing the indexing into
>> the backingstore array.
> The idea is to use a simple iterator interface to access the data and to hide
> the fact that it is represented by a bitmap. It annoys the hell out of me that
> the code is full of 'sets' that are just int32's, table_map's whatever. And
> don't get me started on the 'magic' bits that people shove in there just because
> the representation is wide open. Pick your favorite among PARAM_TABLE_BIT,
> OUTER_REF_TABLE_BIT, RAND_TABLE_BIT...
>
> You can see this great benefit in action in set_star_dependency(). If I didn't
> have an iterator I would have to slide a bitmap over an integer, dereference
> pointers and whatnot.

64-bit bitmaps already exist and iterators over bitmaps already exist. The novel 
part is the backingstore which avoid an array indexing.
>>
>> The downside is more complexity
> IMHO This class *removes* complexity. Have you seen how many bizarre bitwise
> operations are done in the code? And how many bugs they cause?

Well, if you want to extend beyond 61 tables in a join, you now need to extend 
two bitset implementations.
>
> Do you really not see any expressive advantage of "if (set_difference(a, b,
> c).is_empty())" over "if (((a & ~b) & ~c))"? The compiled code looks exactly
> the
> same.

Yes, to some extent I do. I have tried the same myself, but after all I decided 
the pain was bigger than the gain. And I have come to understand the bitmap 
tweaks pretty well by now. It's not so much the bitmaps, but the incredibly 
complex 10-part and/or-expressions that make the code hard to read and maintain.
>
>> (yet another bitmap type,
> Yes, but one to rule them all. This type can encompass all the other types and
> hide *all* of them behind a consistent interface that lets you switch
> representation. It could be represented as uint32, long int, short, or indeed
> anything.
>> templates)...
> Templates are part of the C++ language. Granted, they may have had a bad
> reputation 15 years ago but there is no reason for treating them like
> second-class citizens anymore.

They have a place, but IMHO they should be used sparingly.

>>> +template<class T> class Bitmap_set {
>>> +
>>> +private:
>>> + ulonglong m_map;
>>> + T** m_backing_store;
>>> +
>>> +public:
>>> + Bitmap_set(ulonglong map, T** backing_store) :
>>> + m_map(map), m_backing_store(backing_store) {}
>>> +
>>> + bool is_empty() { return m_map == 0; }
>>> +
>>> + static Bitmap_set<T> set_union(const Bitmap_set<T>& a,
>>> + const Bitmap_set<T>& b) {
>>> + DBUG_ASSERT(a.m_backing_store == b.m_backing_store);
>>> + Bitmap_set<T> result(a.m_map | b.m_map, a.m_backing_store);
>>> + return result;
>>> + }
>>> +
>>> + friend class Bitmap_set_iterator<T>;
>>> + friend class Bitmap_set_reverse_iterator<T>;
>>> +};
>>> +
>>> +template<class T> class Bitmap_set_iterator {
>>> +private:
>>> + const Bitmap_set<T>& m_set;
>>> + uint m_current_element;
>>> +
>>> +public:
>>> + Bitmap_set_iterator(const Bitmap_set<T>& set) :
>>> + m_set(set), m_current_element(0) {}
>>> +
>>> + inline T* operator++(int) {
>>> + ulonglong mask= 1<< m_current_element;
>>> + while ((mask& m_set.m_map) == 0)
>>> + {
>>> + if (mask == 0 || mask> m_set.m_map)
>>> + return NULL;
>>> + mask<<= 1;
>>> + ++m_current_element;
>>> + }
>>> + T* element= m_set.m_backing_store[m_current_element++];
>>> + return element;
>>> + }
>>> +
>>> +};
>>> +
>>> +template<class T> class Bitmap_set_reverse_iterator {
>>> +private:
>>> + const Bitmap_set<T>& m_set;
>>> + int m_current_element;
>>> +
>>> +public:
>>> + Bitmap_set_reverse_iterator(const Bitmap_set<T>& set) :
>>> + m_set(set), m_current_element(0)
>>> + {
>>> + ulonglong map= set.m_map;
>>> + while ((map>>= 1) != 0)
>>> + ++m_current_element;
>>> + }
>>> +
>>> + inline T* operator++(int) {
>>> + if (m_current_element == -1)
>>> + return NULL;
>>> + ulonglong mask= 1<< m_current_element;
>>> + while ((mask& m_set.m_map) == 0)
>>> + {
>>> + if (mask == 0)
>>> + return NULL;
>>> + mask>>= 1;
>>> + --m_current_element;
>>> + }
>>> + T* element= m_set.m_backing_store[m_current_element--];
>>> + return element;
>>> + }
>>> +
>>> +};
>>> +#endif // INCLUDES_MYSQL_SQL_SET_H
>>>
>>> === added file 'sql/sql_shortcut.h'
>>> --- a/sql/sql_shortcut.h 1970-01-01 00:00:00 +0000
>>> +++ b/sql/sql_shortcut.h 2010-09-20 13:03:54 +0000
>>> @@ -0,0 +1,208 @@
>>> +#ifndef INCLUDES_MYSQL_SQL_SHORTCUT_H
>>> +#define INCLUDES_MYSQL_SQL_SHORTCUT_H
>>> +
>>> +/* Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights
> reserved.
>>> +
>>> + This program is free software; you can redistribute it and/or modify
>>> + it under the terms of the GNU General Public License as published by
>>> + the Free Software Foundation; version 2 of the License.
>>> +
>>> + This program is distributed in the hope that it will be useful,
>>> + but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>> + GNU General Public License for more details.
>>> +
>>> + You should have received a copy of the GNU General Public License
>>> + along with this program; if not, write to the Free Software
>>> + Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
>>> +
>>> +#include "sql_select.h"
>>> +
>>> +class Star_join_shortcut : public IExecution_listener {
>>> +
>>> +private:
>>> +
>>> + JOIN_TAB *m_destination;
>>> + bool m_current_prefix_successful;
>>
>> What is meant by "success" here?
> Comment added.
>>
>>> +
>>> +Star_join_shortcut(JOIN_TAB *destination) :
>>> + m_destination(destination), m_current_prefix_successful(FALSE) {
>>> + DBUG_ASSERT(m_destination != NULL);
>>> + }
>>> +
>>> +public:
>>> +
>>> + /**
>>> + Pre-compute short-cuts for the join short-cut optimization.
>>> +
>>> + This function adorns the query exeution plan with star join short-cuts
>>> + where applicable. Short-cuts are triggered through the
>>> + IExecution_listener interface. The algorithm is implemented as a nested
>>> + loop, where the outer loop walks the nodes (JOIN_TAB's) in the query
>>> + execution plan, and the inner loop iterates through all tables that this
>>> + table depends upon. Valid short-cuts have the following property.
>>> +
>>> + - The short-cut has a minimum length of 2. If a table is dependent on the
>>> + table directly preceding it, the short-cut would be of length 1, and it
>>> + would indicate that there are no valid short-cuts from this table.
>>> +
>>> + - The short-cut does not cross an outer join boundary. In this case the
>>> + optimization is not applicable, because a valid result would be missed.
>>> +
>>> + - The short-cut does not cross a node using join buffers. Short-cuts are
>>> + not compatible with join buffering.
>>> +
>>> + - If there are several applicable shortcuts by the above requirements, the
>>> + shortest one is chosen.
>>> +
>>> + @param join The query execution plan. The plan is assumed to be complete
>>> + and ready for execution when this method is invoked.
>>> + */
>>> + static void setup_shortcuts(JOIN *join)
>>> + {
>>> + if(!join->join_tab)
>>> + return;
>>> +
>>> + for (uint i= 2; i< join->tables; i++)
>>> + set_star_dependency(join,&join->join_tab[i]);
>>> + }
>>> +
>>> + void row_read(JOIN_TAB *join_tab, Execution_state *state)
>>> + {
>>> + if (state->get_read_error() != 0)
>>> + {
>>> + if (!m_current_prefix_successful)
>>> + handle_read_failure(join_tab, state);
>>> + m_current_prefix_successful= FALSE;
>>> + }
>>> + }
>>> +
>>> +
>>> + /**
>>> + If a row has passed the push-down condition test for the current prefix
>>> + of the partial result tuple, we record that here. When there are no more
>>> + rows left, it is allowed to take a short-cut if now rows passed the test.
>>> + */
>>> + void row_evaluated(JOIN_TAB *join_tab, Execution_state *state) {
>>> + if (state->get_evaluation())
>>> + m_current_prefix_successful= TRUE;
>>> + }
>>> +
>>> + /**
>>> + Appends information about the short-cut optimization to EXPLAIN output, if
>>> + used for the current plan node.
>>> +
>>> + @param join The execution plan.
>>> + @param join_tab The plan node to be annotated.
>>> + @param extra The current content of the 'extra' column in EXPLAIN.
>>> + */
>>> + void explain_extra_column_ready(JOIN *join, String *extra)
>>> + {
>>> + extra->append(STRING_WITH_LEN("; StarJoin("));
>>> + extra->append(m_destination->table->alias);
>>> + extra->append(STRING_WITH_LEN(")"));
>>> + }
>>> +
>>> +private:
>>> +
>>> + /**
>>> + Sets the short-cut from from one plan node to another, if one is found.
>>> +
>>> + @param join The execution plan.
>>> +
>>> + @param join_tab The plan node from which a short-cut willpotentially be
>>> + taken.
>>> + */
>>> + static void set_star_dependency(JOIN *join, JOIN_TAB *join_tab)
>>> + {
>>> + table_map where_clause_tables_map;
>>> + if (join_tab->select_cond != NULL)
>>> + {
>>> + Item *where_condition= join_tab->select_cond;
>>> + where_condition->update_used_tables();
>>
>> Do you need to update used tables here?
> I gather you have not tried removing it. ;-) Yes it is needed.

I think that if used tables information for the table conditions needs updating 
after optimization is "complete", it is indicative of an error earlier in the code.

Compensating with update_used_tables() is perhaps wrong?
>>
>>> + where_clause_tables_map=
>>> + where_condition->used_tables()& ~PSEUDO_TABLE_BITS;
>>> + }
>>> + else
>>> + where_clause_tables_map= (table_map)0L;
>>> +
>>> + table_map ref_access_tables_map=
>>> + join_tab->ref.depend_map& ~PSEUDO_TABLE_BITS;
>>> +
>>> + Bitmap_set<JOIN_TAB>
>>> + where_clause_tables(where_clause_tables_map, join->map2table);
>>> +
>>> + Bitmap_set<JOIN_TAB>
>>> + ref_access_tables(ref_access_tables_map, join->map2table);
>>> +
>>> + Bitmap_set<JOIN_TAB> all_dependency_tables=
>>> + Bitmap_set<JOIN_TAB>::set_union(where_clause_tables,
> ref_access_tables);
>>
>> all_dependency_tables -> dependent_tables ?
> That would be 'tables_depended_upon' in that case.

OK
>>
>>
>>> +
>>> + if (all_dependency_tables.is_empty())
>>> + return;
>>> +
>>> + for(Bitmap_set_reverse_iterator<JOIN_TAB> it(all_dependency_tables);
>>> + update_star_dependency(join, join_tab, it++);) ;
>>> + }
>>> +
>>> +
>>> + /**
>>> + Updates a short-cut from from one plan node to another. The method
>>> + assumes that it is invoked on plan nodes in reverse order in which they
>>> + appear in the query execution plan.
>>
>> -> Order from the most inner to the most outer table?
> What is meant by inner and outer in this case?

For nested-loop join, it is common to start with the "most outer" table, until 
you reach the "most inner" table.
>>
>>> +
>>> + @param join The execution plan.
>>> +
>>> + @param join_tab The plan node from which a short-cut may potentially be
>>> + taken.
>>> +
>>> + @param dependent A node that access to join_tab depends on.
>>> +
>>> + @retval TRUE There may be other short-cuts possible early in the plan.
>>> + @retval FALSE There are no earlier short-cuts available. Either the only
>>> + possible one was found and completely initialized, or there are none.
>>> + */
>>> + static bool update_star_dependency(JOIN *join, JOIN_TAB *join_tab,
>>> + JOIN_TAB *dependency)
>>> + {
>>> + if (dependency == NULL)
>>> + return FALSE;
>>> +
>>> + if (join_tab<= dependency)
>>> + return TRUE;
>>> +
>>> + if (dependency == join_tab - 1)
>>> + {
>>> + DBUG_PRINT("info", ("Invalidated a join short-cut from %s to %s",
>>> + join_tab->table->alias,
>>> + dependency->table->alias));
>>> + return FALSE;
>>> + }
>>> + else if (join->are_inner_joined(dependency, join_tab)&&
>>> + !join->contains_join_buffering(dependency, join_tab))
>>> + {
>>> + DBUG_PRINT("info", ("Found a join short-cut from %s to %s",
>>> + join_tab->table->alias,
>>> + dependency->table->alias));
>>> +
>>> + join_tab->add_execution_listener(new Star_join_shortcut(dependency));
>>> +
>>> + return FALSE;
>>> + }
>>> + return TRUE;
>>> + }
>>> +
>>> +private:
>>> +
>>> + void handle_read_failure(JOIN_TAB *join_tab, Execution_state *state) {
>>> + if (m_destination != NULL)
>>> + {
>>> + state->set_return_tab(m_destination);
>>> + DBUG_PRINT("info", ("Execution short-cut to '%s'",
>>> + m_destination->table != NULL ?
>>> + m_destination->table->alias : ""));
>>> + }
>>> + }
>>> +};
>>> +
>>> +#endif // INCLUDES_MYSQL_SQL_SHORTCUT_H
>>>
>>> === modified file 'sql/structs.h'
>>> --- a/sql/structs.h 2010-07-13 17:29:44 +0000
>>> +++ b/sql/structs.h 2010-09-20 13:03:54 +0000
>>> @@ -119,6 +119,27 @@ struct st_join_table;
>>> typedef struct st_reginfo { /* Extra info about reg */
>>> struct st_join_table *join_tab; /* Used by SELECT() */
>>> enum thr_lock_type lock_type; /* How database is used */
>>> + /**
>>> + This is an outer join optimization. The idea is to avoid
>>> + NULL-complementing a row for the inner table(s). The bespoke case is when
>>> + we have a query such as
>>> +
>>> + @code
>>> +
>>> + SELECT ...
>>> + FROM t1 LEFT OUTER JOIN (t2, t3) ON<condition>
>>> + WHERE t2.non_nullable_column IS NULL
>>> +
>>> + @endcode
>>> +
>>> + In this case not_exists_optimize would be set for t2.
>>> +
>>> + Once the query execution reaches the place where<condition> holds
> for
>>> + the current result tuple, we know that there cannot be a row where the
>>> + WHERE condition holds. There is no need produce a NULL-complemented row.
>>> +
>>> + @see update_ref_and_keys
>>
>> These comments are repeated at two places in the code.
> Please show me where, I can't find it.

In sql_select.cc,
At around line 6112, there is documentation for when this flag is set.
At around line 17420, there is documentation on the use of the flag.
>
>
> Thank you for the comprehensive review!
>
> Best Regards
>
> Martin
>

Roy
Thread
bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218) WL#3724Martin Hansson20 Sep
Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Roy Lyseng27 Sep
  • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Martin Hansson28 Sep
    • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Roy Lyseng29 Sep
      • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Martin Hansson29 Sep
        • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Roy Lyseng29 Sep