List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:September 29 2010 1:31pm
Subject:Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)
WL#3724
View as plain text  
On 29.09.10 13.59, Martin Hansson wrote:
> Hi Roy,
> while I do believe that the 'call and response' method is an efficient means of
> communication, we may need quick summary as well. My comments and naming of
> fields are very important subjects. But I believe a discussion of whether those
> fields and classes should be there in the first place should take precedence.
>
> So I shamelessly ask right up front, what should go and what should stay?
>
> - The Star_join_shortcut class - Personally I see a merit in every optimization
> being contained in its own class, all other thing being equal. If you disagree,
> we should start the discussion here.

I envision the executor as one big "engine", and the optimizations tuning that 
engine in various ways. But multiple optimizations may use the same tuning 
knobs, in no way will we want each optimization to be represented by its own 
optimizer class.
>
> - The listener pattern - Hardly any point in talking about unless the above is
> agreed upon.

Yepp...
>
> - The Execution_state class - Not a necessity either, but I can't see any
> performance problem with it.

I just see that class as an unnecessary abstraction, and we should design a 
"better" executor model before decorating with new abstractions.
>
> - Listeners list - will be gone in the next commit.
>
> Some history about the list of listeners:
>
> A possible extension to the optimization is the addition of a loose scan
> strategy. It does the following. When a shortcut has been taken back to its
> destination plan node, let's call it t1, we might be reading several rows that
> will yield the same result: We traverse through all tables until we take a
> shortcut again. And again for every row of t1 until we reach a row in t1 with a
> different value. A loose scan strategy will use the handler facility to skip to
> the next unique value in an index. (It only works for index scan, obviously).
>
> The loose scan could also be implemented as a subclass of IExecution_listener.
> So in essence a plan node which is both the origin and destination of a
> short-cut would have two listeners: A short-cut and a loose scan. Hence the
> list. For semijoin we already have a fake loose scan (it is a tight scan with a
> quick reject) but alas it is too tightly coupled to semijoin to reuse. If I
> implemented a reusable loose scan, it could be used in both cases. I quickly
> realized time would not permit this, however.
>
> That's the origin of the list.

Ok.
>
> Roy Lyseng wrote:
>> On 28.09.10 20.59, Martin Hansson wrote:
>>> Roy Lyseng skrev 2010-09-27 16.08:
>>>> Hi Martin,
>>>>
>>>> thanks for working on this!
>>>>
>>>> Some early comments to the implementation.
>>>>>
>>>>>
>>>>> === modified file 'sql/item.h'
>>>>> --- a/sql/item.h 2010-09-07 19:07:18 +0000
>>>>> +++ b/sql/item.h 2010-09-20 13:03:54 +0000
>>>>> @@ -860,7 +860,16 @@ public:
>>>>> virtual bool val_bool_result() { return val_bool(); }
>>>>> virtual bool is_null_result() { return is_null(); }
>>>>>
>>>>> - /* bit map of tables used by item */
>>>>> + /**
>>>>> + Conceptually, this is the set of tables directly referenced by this
> Item.
>>>>> +
>>>>> + For an Item_field, the set contains simply the table to which the
> field
>>>>> + belongs.
>>>>> +
>>>>> + For pushdown conditions, used_tables is the set tables referenced
> by the
>>>>> + pushdown condition recursively, *including* the table to which the
>>>>> + condition is pushed.
>>>>> + */
>>>>
>>>> Please only keep the first line here, and remove the word
> "Conceptually".
>>>> The two other comments might belong with the respective implementations.
>>> For the middle sentence, agreed.
>>>> BTW, a pushdown condition is not a particular class of Items, right?
>>> No, it isn't. That's why I put that comment there.
>>
>> Then it is better to put the comment on the field(s) that implement pushdown
>> conditions, most likely the "select_cond" field of the join-tab.
> Done.
>>
>>>>
>>>>> virtual table_map used_tables() const { return (table_map) 0L; }
>>>>> /*
>>>>> Return table map of tables that can't be NULL tables (tables that
> are
>>>>>
>>>>> === modified file 'sql/records.h'
>>>>> --- a/sql/records.h 2010-07-13 17:29:44 +0000
>>>>> +++ b/sql/records.h 2010-09-20 13:03:54 +0000
>>>>> @@ -38,6 +38,34 @@ class SQL_SELECT;
>>>>> ...
>>>>> }
>>>>> end_read_record();
>>>>> +
>>>>> + The READ_RECORD could be called a record reader, and acts as
>>>>
>>>> I am not interested in what it "could" be called, I am interested in what
> it
>>>> is... :)
>>> Actually, I mean it *should* be called RecordReader. :-)
>>>
>>> Furthermore, READ_RECORD is a completely pointless name. It isn't a record.
> As a
>>> method name, readRecord would make sense because its a verb. I frequently
>>> translate our bogus names in my mind to something meaningful. When I see
> 'Item',
>>> I read 'Expression', TABLE_LIST becomes TableReference, etc.
>>
>> Nevertheless, please do not try to express your opinions on the code in the
>> comments that you write. If you want to document existing structs, please use
>> as consistent language as possible. I am really afraid that trying to document
>> what a struct "should" be called only confuses developers that come after you
>> and try to understand the structure of the system.
>>
>> So, the documentation should be like this:
>>
>> "READ_RECORD is an interface that is used to retrieve records one by one..."
>>
>> I agree with you about the naming, but writing notes for refactoring should go
>> in a different place.
> Done.
>>
>>>>> +
>>>>> + - An abstraction layer beneath the query execution engine. The only
> read
>>>>> + operation the execution engine has to know of is reading the next
>>>>> + record. Beneath the abstraction layer is the storage engine API,
> which is
>>>>> + invoked differently for table/index scans and key lookups, and the
> join
>>>>> + buffer.
>>>>
>>>> An abstraction layer between the query execution engine and the storage
> engine
>>>> API.
>>> done.
>>>>
>>>>
>>>>> +
>>>>> + - An executable sub-program, resulting from the optimizer. It is
> set up
>>>>> with a
>>>>> + reference to a handler and a function pointer to a read function
> during
>>>>> + query optimization.
>>>>> +
>>>>> + The abstraction is incomplete, however. For instace, reading the
> first
>>>>> + record from a handler is done through a function pointer in the
> JOIN_TAB
>>>>> + rather than in its READ_RECORD.
>>>>> +
>>>>> + Sometimes the READ_RECORD is initialized during query execution.
>>>>> +
>>>>> + - It is reinitialized during buffered nested loops join, before the
> buffer
>>>>> + is flushed.
>>>>> +
>>>>> + - For many table accesses, it is initialized right before the first
> record
>>>>> + is read.
>>>>> +
>>>>> + The READ_RECORD design can be viewed as a strategy pattern, where
> the
>>>>> + abstract strategy is the READ_RECORD struct and the concrete
> strategy is
>>>>> + determined by the values of the function pointer
> READ_RECORD::read_record.
>>>>> */
>>>>
>>>> I am not sure that it is a good idea to describe how and where a
> particular
>>>> class is used with the definition of that class. Documenting the
> recommended
>>>> usage pattern can be a great idea, but IMHO the pecularies should instead
> be
>>>> documented where they are being used.
>>> I need to know what exactly is referred to before I can comment.
>>
>> I mean the paragraph starting with "Sometimes" and the two following paragraphs.
> Okay. Any ideas where I should put this information instead?

Not really. But you could document the regular use pattern for the class and 
document the irregular ones where they are used, if at all possible.

>>>>
>>>>>
>>>>> class Copy_field;
>>>>>
>>>>> === modified file 'sql/sql_select.cc'
>>>>> --- a/sql/sql_select.cc 2010-09-10 05:14:00 +0000
>>>>> +++ b/sql/sql_select.cc 2010-09-20 13:03:54 +0000
>>>>> @@ -28,6 +28,9 @@
>>>>> #pragma implementation // gcc: Class implementation
>>>>> #endif
>>>>>
>>>>> +#include "sql_set.h"
>>>>> +#include "sql_shortcut.h"
>>>>> +
>>>>> #include "sql_priv.h"
>>>>> #include "unireg.h"
>>>>> #include "sql_select.h"
>>>>> @@ -3488,6 +3491,8 @@ mysql_select(THD *thd, Item ***rref_poin
>>>>> if (thd->is_error())
>>>>> goto err;
>>>>>
>>>>> + Star_join_shortcut::setup_shortcuts(join);
>>>>> +
>>>>
>>>> IMHO setting up shortcuts is part of optimization.
>>> Do you mean this should be done earlier? Granted, it should be done during
>>> cost-based optimization but my current proposal is to do that as another
>>> worklog.
>>
>> I think this code belongs at the end of JOIN::optimize(). This function
>> contains a combination of rule-based and cost-based optimizations, so
>> shortcutting definitely fits in.
> Done. I put it in JOIN::optimize right before that code branches on whether it's
> EXPLAIN or not.
>>
>> Besides, I am not sure that doing cost-based choice for join shortcutting is
>> worthwhile. The cost of the optimization is virtually zero (if the
>> implementation is sufficiently lightweight), and a successful shortcut will
>> always reduce execution cost.
> Reduction is the whole point. The cost-based optimizer can then deliberately
> choose a join order that will give us a short-cut if it's cheaper, whereas now
> it cannot. With proper histograms we can get a very detailed picture of how much
> a short-cut plan will save compared to a plan without it.

Right.

>>
>>>>
>>>>> join->exec();
>>>>>
>>>>> if (thd->lex->describe& DESCRIBE_EXTENDED)
>>>>> @@ -16956,6 +16961,7 @@ sub_select_cache(JOIN *join, JOIN_TAB *j
>>>>> DBUG_ASSERT(cache != NULL);
>>>>>
>>>>> cache->reset_join(join);
>>>>> + join->return_tab= join_tab;
>>>>>
>>>>> DBUG_ENTER("sub_select_cache");
>>>>>
>>>>> @@ -17128,8 +17134,13 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>>>> DBUG_ENTER("sub_select");
>>>>>
>>>>> join_tab->table->null_row=0;
>>>>> + Execution_state state;
>>>>> + state.set_return_tab(join->return_tab);
>>>>> +
>>>>
>>>> You are instantiating an Execution_state object both in sub_select() and
> in
>>>> evaluate_join_record().
>>>>
>>>> First, why do you need two state objects for one table in a join?
>>> The state object is the interface between my optimization and the rest of
> query
>>> execution. It is an extensible and reusable interface of the kind everybody
>>> should strive to write, always. If you want my opinion. :-)
>>
>> My opinion is that join execution should first and foremost be a highly
>> optimized program.
> I agree. But optimization should be based on fact and our intuition may not be
> reliable when it comes to what is optimal or not.

Definitely possible. So replacing the executor must be based on benchmarking and 
not subjective meanings.

>> I do not care about extensible and reusable interfaces, unless there is a plan
>> to extend the use of them. Now I think they are plain overkill.
> They are indeed. See top post for the big picture on this.
>>>
>>> Ideally, the same state object should be passed-by-reference as a paremeter
> from
>>> sub_select to evaluate_join_record, but it seems a bit silly since the same
>>> information is passed as loose parameters anyway. It would be simple to put
> the
>>> little status int's inside the state object but I wanted someone to see the
> code
>>> before spending the effort.
>>>>
>>>> Second, I think of a join-tab as the state object for a table in a join,
> so I
>>>> would rather have this information inside the join-tab object. It also
> occurs
>>>> to me that some information is duplicated, which is a potential
> efficiency and
>>>> consistency problem.
>>>
>>>> (Actually, the join-tab is both a plan operator and a state object,
>>> See, there's the name problem again. :-) I read "PlanNode" when I see
>>> "JOIN_TAB". There is no need IMHO to clutter up the structure even more with
>>> data that could go on the stack instead. If more people had realized this a
>>> JOIN_TAB would be half the size it is now. But it's not too late to turn
> around.
>>
>> Actually, I do not agree with you. Using the stack can be an option, but when
>> bouncing between levels of the join continuously, it is perhaps better to have
>> the data in more persistent structures.
> Let's use my worklog as an example. What state information is needed? Two
> things: First, whether the current prefix of a result tuple has yet produced an
> extension with the fields from the next table. Second, whether the pushdown
> conditions rejected that tuple. 'Current' is the operative word here. I can't
> see how these data have any validity except in the current iteration of the
> nested loop. Storing them persistently will give rise to problems such as when
> to invalidate the data, where it is best done etc. If it's on the stack, it is
> always safe to use and always current.

I would first see if we can re-use existing data items in new ways before we 
introduce new data items.
>
> Slightly off topic, we are constantly fixing bugs that relate to such transient
> information that has to be restored on recursive ascent but is forgotten. I
> personally feel it would be very irresponsible of me to provide such fertile
> breeding ground for bugs.
>
> That said, there is information that is more persistent than this.
> JOIN_TAB::return_tab being the most prominent example. This field is valid in a
> broader context than the current partial tuple, it is indeed used for the
> bouncing between levels of the join. I refrain from using this on the stack.
>
> I find this distinction quite meaningful, what do you think?

Definitely, I like pragmatism :)

>> Another issue is that the join-tabs are not reentrant, and that they contain a
>> lot of baggage that is only useful during optimization.
> Are re-entrant JOIN_TAB's desirable? IMHO the whole execution plan has to be
> reentrant in that case. That would implicate making the whole Item tree
> re-entrant first.

Why would the Item tree have to come first? There's no dependency there, except 
that both tasks must have been carried out before the executor is truly reentrant.
>>
>> My suggestion is that we have one optimizer "join-tab", one execution join-tab
>> with read-only information, and one execution "join-tab" with state information.
> Sounds reasonable, but I'm afraid we are drifting out of the scope of this work.
>>
>> I also think that we might replace the current recursive functions with an
>> interpreter that this moving up and down a stack of state objects.
> I think this discussion is best left for other venues. There are probably more
> people interested in reading such a discussion that those that open a code
> review email for a worklog. :-)

I was just expressing my thoughts, not trying to start a discussion here.
>>
>>>> but we leave that problem to a refactoring project...)
>>> This has been said for 10 years now. But refactoring must be a continuous
> effort
>>> or it will lose momentum. I take a very pragmatic approach in this and
> refactor
>>> exactly what I need to in order to reach a stable implementation of a
> feature.
>>> If it becomes a performance hit, let's optimize it.
>>
>> I just say: Stick with the current design until we have a clear plan on how to
>> refactor execution. Refactoring without a plan is worse than no refactoring.
> Hmm, that sounds like the start of a flame war. :-) Again I'll take the
> pragmatic approach. The plan with my refactorings was to implement the worklog
> rather than hack it in place. That's the plan, the whole plan and nothing but
> the plan. You obviously think I went overboard with it. Fine, let's take it from
> there.
>>
>> And I do not accept a solution that considers performance as an afterthought.
>
>>
>> It seems to me that the classes you propose can be replaced with the following
>> bits and pieces:
>>
>> - A field in the join-tab that is NULL or contains the shortcut target.
> This field is already there and I use it. JOIN_TAB::return_tab. I could remove
> the intermediate Execution_state::m_return_tab. That would save us 8 bytes and
> maybe 10 machine instructions per iteration.

This is in addition to the return_tab field, because we must have a source for 
the setting of return_tab.
>>
>> - Code that sets return_tab equal to the shortcut at the start of a scan,
>> if shortcut target is non-NULL.
>>
>> - Code that nulls return_tab when the scan delivers at least one matching row,
>> if shortcut target is non-NULL.
> This could be done, but how it would be done is more important.
>
> - in-lined in sub_select or evaluate_join_record

This is the current code style...

> - as method(s) of JOIN_TAB
> - keeping the listener pattern but optimizing it
> - keeping the code in a separate class but removing the listener pattern
>
> Or something else?
>
>>
>> That would be one additional test when initiating a row scan, and one
>> additional test when a row is accepted.
> Ok, now I understand what you meant my 'addtional test'.
>>
>>>>
>>>>> if (end_of_records)
>>>>> {
>>>>> + join->return_tab= join_tab;
>>>>> + state.set_return_tab(join_tab);
>>>>> enum_nested_loop_state nls=
>>>>> (*join_tab->next_select)(join,join_tab+1,end_of_records);
>>>>> DBUG_RETURN(nls);
>>>>> @@ -17144,6 +17155,7 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>>>> }
>>>>>
>>>>> join->return_tab= join_tab;
>>>>> + state.set_return_tab(join_tab);
>>>>> join_tab->not_null_compl= TRUE;
>>>>>
>>>>> if (join_tab->last_inner)
>>>>> @@ -17159,11 +17171,15 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>>>> join->thd->warning_info->reset_current_row_for_warning();
>>>>>
>>>>> error= (*join_tab->read_first_record)(join_tab);
>>>>> + state.set_read_error(error);
>>>>> + join_tab->notify_row_read(&state);
>>>>>
>>>>> if (join_tab->keep_current_rowid)
>>>>>
> join_tab->table->file->position(join_tab->table->record[0]);
>>>>> -
>>>>> +
>>>>> rc= evaluate_join_record(join, join_tab, error);
>>>>> +
>>>>> + join->return_tab= state.get_return_tab();
>>>>>
>>>>> /*
>>>>> Note: psergey has added the 2nd part of the following condition; the
>>>>> @@ -17172,11 +17188,16 @@ sub_select(JOIN *join,JOIN_TAB *join_tab
>>>>> while (rc == NESTED_LOOP_OK&& join->return_tab>=
> join_tab)
>>>>> {
>>>>> error= info->read_record(info);
>>>>> + state.set_read_error(error);
>>>>> + join_tab->notify_row_read(&state);
>>>>>
>>>>> if (join_tab->keep_current_rowid)
>>>>>
> join_tab->table->file->position(join_tab->table->record[0]);
>>>>> -
>>>>> +
>>>>> rc= evaluate_join_record(join, join_tab, error);
>>>>> +
>>>>> + if (state.get_return_tab()< join->return_tab)
>>>>> + join->return_tab= state.get_return_tab();
>>>>> }
>>>>>
>>>>> if (rc == NESTED_LOOP_NO_MORE_ROWS&&
>>>>> @@ -17342,14 +17363,23 @@ evaluate_join_record(JOIN *join, JOIN_TA
>>>>> }
>>>>> DBUG_PRINT("info", ("select cond 0x%lx", (ulong)select_cond));
>>>>>
>>>>> + Execution_state state;
>>>>> + state.set_return_tab(join->return_tab);
>>>>> + state.set_read_error(error);
>>>>> if (select_cond)
>>>>> {
>>>>> found= test(select_cond->val_int());
>>>>> + state.set_evaluation(found);
>>>>>
>>>>> /* check for errors evaluating the condition */
>>>>> if (join->thd->is_error())
>>>>> DBUG_RETURN(NESTED_LOOP_ERROR);
>>>>> }
>>>>> + else
>>>>> + state.set_evaluation(TRUE);
>>>>> +
>>>>> + join_tab->notify_row_evaluated(&state);
>>>>> +
>>>>> if (found)
>>>>> {
>>>>> /*
>>>>> @@ -17414,7 +17444,7 @@ evaluate_join_record(JOIN *join, JOIN_TA
>>>>> join_tab->first_unmatched= first_unmatched;
>>>>> }
>>>>>
>>>>> - JOIN_TAB *return_tab= join->return_tab;
>>>>> + JOIN_TAB *return_tab= state.get_return_tab();
>>>>>
>>>>> if (join_tab->check_weed_out_table&& found)
>>>>> {
>>>>> @@ -22764,6 +22794,7 @@ void select_describe(JOIN *join, bool ne
>>>>> extra.append(STRING_WITH_LEN(", incremental buffers)"));
>>>>> }
>>>>>
>>>>> + tab->notify_explain_extra_column_ready(join,&extra);
>>>>> /* Skip initial "; "*/
>>>>> const char *str= extra.ptr();
>>>>> uint32 len= extra.length();
>>>>>
>>>>> === modified file 'sql/sql_select.h'
>>>>> --- a/sql/sql_select.h 2010-09-06 17:56:05 +0000
>>>>> +++ b/sql/sql_select.h 2010-09-20 13:03:54 +0000
>>>>> @@ -179,6 +179,57 @@ inline bool sj_is_materialize_strategy(u
>>>>> return strategy>= SJ_OPT_MATERIALIZE_LOOKUP;
>>>>> }
>>>>>
>>>>> +
>>>>> +/**
>>>>> + Part of the IExecution_listener interface. An execution listener
> is
>>>>> + notified by the plan node upon various events in the execiton plan.
> A
>>>>> + listener may alter the state and thus change the course of the
> query
>>>>> + execution by altering the execution state. The query execution
> engine may
>>>>> + always overrride the decision, however.
>>>>
>>>> The last sentence does not make sense to me.
>>> I misspelled 'override'. ;-)
>>>
>>> In practice what it means is that the Execution_state does not manipulate
> the
>>> plan (the JOIN_TAB) directly. This is the design. It only updates its own
> member
>>> (m_return_tab) which sub_select() picks up later. I wanted to save the
> option
>>> (in the design) for sub_select not to pick it up.
>>>> And "various events" is just loose talk. I want exact specifications,
> please :)
>>> I was intentionally vague here so that this documentation wouldn't drift out
> of
>>> sync if one adds another event to the design. Extensibility in mind :-)
>>
>> IMHO being vague is a cardinal sin when documenting code. The sole purpose of
>> documentation is to enhance understanding of the code.
>
>>>>> +*/
>>>>> +class Execution_state {
>>>>> + int m_read_error;
>>>>
>>>> What does "m_read_error" has to do with execution? I thought a read
> error
>>>> would cause immediate exit, and nothing that we need to track on a table
> basis.
>>> What do you mean by 'immediate exit' in this context? m_read_error is used
> to
>>> store the value returned by join_tab->read_first_record(). I believe
> negative
>>> values mean 'no more rows' and positive ones mean 'serious error' (see
>>> evaluate_join_record). But I see no proper spec anywhere.
>>
>> Well, there is a confusion in the code that it mixes error codes and EOF state
>> for a scan. I thought you would not make the same mistake ;)
>>
>> What I mean with "immediate exit" is that you do not attempt any recovery when
>> an error occurs, you just clean up what is possible and exit execution
>> immediately. Thus, the m_read_error mostly contains a scan state, so it is
>> clearly mis-named.
>>>>
>>>>> + bool m_evaluation;
>>>>
>>>> What is this?
>>> It's filled in by evaluate_join_record(), whether it accepts or rejects a
> tuple.
>>> It's not commented because it's a private member, but perhaps I should
> comment
>>> the getXXX and setXXX methods in Execution_state?
>>
>> "accepted" would be a better name, then. Besides, it seems to be the same as
>> the "bool found" within the join-tab, so it is duplicated information.
>>
>> And why is not comments justified for private members?
> Doxygen comments don't show up for private members, I think.

I don't care, I usually don't use Doxygen while reading the code...

>>>>
>>>>> + struct st_join_table *m_return_tab;
>>>>> +
>>>>> +public:
>>>>> +
>>>>> + Execution_state() : m_read_error(0), m_evaluation(FALSE),
> m_return_tab(NULL)
>>>>> + {}
>>>>> +
>>>>> + int get_read_error() { return m_read_error; }
>>>>> + void set_read_error(int read_error) { m_read_error = read_error; }
>>>>> +
>>>>> + int get_evaluation() { return m_evaluation; }
>>>>> + void set_evaluation(bool evaluation) { m_evaluation= evaluation; }
>>>>> +
>>>>> + struct st_join_table *get_return_tab() { return m_return_tab; }
>>>>> +
>>>>> + /**
>>>>> + Signals to query execution that a short-cut be taken in the query
>>>>> + execution, and no further tuples should be read from tables
> appearing
>>>>> + after return_tab in the query execution plan.
>>>>> +
>>>>> + @param return_tab The latest plan node where query execution should
> be
>>>>> + resumed.
>>>>> + */
>>>>> + void set_return_tab(struct st_join_table *return_tab) {
>>>>> + m_return_tab = return_tab;
>>>>> + }
>>>>> +};
>>>>> +
>>>>> +
>>>>> +class IExecution_listener {
>>>>> +public:
>>>>> + virtual void row_read(struct st_join_table *join_tab,
>>>>> + Execution_state *state) {}
>>>>> + virtual void row_evaluated(struct st_join_table *join_tab,
>>>>> + Execution_state *state) {}
>>>>> + virtual void explain_extra_column_ready(JOIN *join, String *extra)
> {}
>>>>> + virtual ~IExecution_listener() {}
>>>>> +};
>>>>
>>>> I am a bit concerned about the efficiency of your implementation. You
> are
>>>> performing quite a number of checks that generally seem unneeded.
>>> I am concerned, too. :-) For the record, this is totally unoptimized code.
> For
>>> my first commit I just wanted to program the design as straightforwardly and
>>> obviously as possible, and then optimize once the design is communicated and
>>> approved.
>>>
>>> If I do any checks that are unnecessary I would welcome if you pointed them
> out.
>>> We should make them DBUG_ASSERT's in that case.
>>>
>>> What I am concerned about is not the virtual method calls. These are as
>>> efficient as function pointers. But it might be a problem to traverse a list
> of
>>> listeners. Do you have any suggestions?
>>
>> See above. I do not think there is anything currently that justifies the list
>> implementation.
>>>>
>>>> Is it possible to implement a more "lightweight" solution
>>> What would it be characterized by? More like in-lining the code in
> sub_select
>>> and evaluate_join_record?
>>
>> See above.
>>
>>>> that is more in line with the code style currently being used in the
>>>> executioner?
>>> I must say the executioner exhibits a great lack of code style. I can't
> really
>>> see how it could be a role model.
>>
>> I did not ask that you look upon it as a role model, nevertheless I do ask you
>> to follow the code style. There's probably a lot that can be said about the
>> style, but at least it encourages efficiency.
> My opinions on this won't affect anything. Off the record I think what it does
> is drown you in so many details you wouldn't discover a performance problem.

I have a big concern with all the if-tests that are necessary at runtime. 
Generating code that does the job unconditionally is my ultimate dream.
>>>
>>>>
>>>>> +
>>>>> +
>>>>> /**
>>>>> Bits describing quick select type
>>>>> */
>>>>> @@ -186,11 +237,25 @@ enum quick_type { QS_NONE, QS_RANGE, QS_
>>>>>
>>>>> typedef struct st_join_table : public Sql_alloc
>>>>> {
>>>>> +private:
>>>>> + List<IExecution_listener> listeners;
>>>>> +public:
>>>>> st_join_table();
>>>>>
>>>>> TABLE *table;
>>>>> KEYUSE *keyuse; /**< pointer to first used key */
>>>>> SQL_SELECT *select;
>>>>> + /**
>>>>> + This field corresponds somewhat to a pushdown condition. It may
>>>>> + contain more than just those conditions pushed to the actual
> table,
>>>>> + however.
>>>>
>>>> Where is a "pushdown condition" defined?
>>> Section 1.1 of the HLS. Fifth bullet.
>>
>> Ah. OK. Actually, it is easy to confuse "pushdown condition" with a condition
>> that is pushed down to the storage engine level ;) Is "Table condition" better?
> Pushdown condition is (was) a fairly established term in the old optimizer team.
> You will find it in many code comments and worklogs. In fact I copied my
> definition from one of Timour's worklogs.
>>
>>>> And please do not use the word "somewhat", but describe exactly what the
> field
>>>> is.
>>> The contents are most likely partially undefined.
>>
>> It is still not useful information, please delete it.
> Done.
>>
>>>>> +
>>>>> + @note For the first table in the plan there is usually a
> select_cond
>>>>> + attached but it is not initialized.
>>>>
>>>> You mean it is NULL? I guess this is unnecessary information.
>>> No, not NULL. I mean the object exists but is not properly initialized. Most
> of
>>> its members are undefined. That is, the job that you would expect a
> constructor
>>> to do is not done.
>>
>> Still, this is confusing information. If there is a particular problem, you
>> should document that (or fix it).
> Personally, I would be grateful to have that comment there when I was working on
> this. But I removed it now.
>>>>> +
>>>>> + @note Not all pushdown conditions are found in this field. @c ref
> access
>>>>> + conditions are removed from here and are encoded in JOIN_TAB::ref.
>>>>
>>>> The basic idea is that this field specifies the remaining conditions to
> be
>>>> evaluated on the currently available rows, after you have navigated to
> that
>>>> set of rows. However, this excludes outer join conditions that are
> located in
>>>> the join nest structures.
>>> Right.
>>>>
>>>>> + */
>>>>> Item *select_cond;
>>>>> QUICK_SELECT_I *quick;
>>>>> Item **on_expr_ref; /**< pointer to the associated on expression
> */
>>>>> @@ -385,6 +450,32 @@ typedef struct st_join_table : public Sq
>>>>> return tmp_select_cond;
>>>>> }
>>>>> uint get_sj_strategy() const;
>>>>> + void add_execution_listener(IExecution_listener *listener)
>>>>> + {
>>>>> + listeners.push_front(listener);
>>>>> + }
>>>>> + void notify_row_read(Execution_state *state)
>>>>> + {
>>>>> + IExecution_listener* listener;
>>>>> + for (List_iterator<IExecution_listener> it(listeners);
>>>>> + (listener= it++);)
>>>>> + listener->row_read(this, state);
>>>>> + }
>>>>> + void notify_row_evaluated(Execution_state *state)
>>>>> + {
>>>>> + IExecution_listener* listener;
>>>>> + for (List_iterator<IExecution_listener> it(listeners);
>>>>> + (listener= it++);)
>>>>> + listener->row_evaluated(this, state);
>>>>> +
>>>>> + }
>>>>> + void notify_explain_extra_column_ready(JOIN *join, String *extra)
>>>>> + {
>>>>> + IExecution_listener* listener;
>>>>> + for (List_iterator<IExecution_listener> it(listeners);
>>>>> + (listener= it++);)
>>>>> + listener->explain_extra_column_ready(join, extra);
>>>>> + }
>>>>> } JOIN_TAB;
>>>>>
>>>>>
>>>>> @@ -1761,11 +1852,12 @@ public:
>>>>> List<TABLE_LIST> *join_list; ///< list of joined tables in
> reverse order
>>>>> COND_EQUAL *cond_equal;
>>>>> /*
>>>>> + Interface for taking short-cuts through join execution.
>>>>
>>>> I do not think this sentence adds information.
>>> My bad. This should have been a Doxygen comment. Doxygen puts the first
> sentence
>>> of a comment in a special 'brief' section so one should strive to summarize
> what
>>> the comment says in the first sentence.
>>
>> And it is a field, not an interface.
> Doxygen doesn't differentiate between the two. They both have a 'brief' and a
> 'details' section.
>>>
>>> http://www.stack.nl/~dimitri/doxygen/commands.html#cmdbrief
>>>>
>>>>> Join tab to return to. Points to an element of join->join_tab
> array, or to
>>>>> join->join_tab[-1].
>>>>> This is used at execution stage to shortcut join enumeration.
> Currently
>>>>> shortcutting is done to handle outer joins or handle semi-joins with
>>>>> - FirstMatch strategy.
>>>>> + FirstMatch strategy. It is also used to optimize execution of star
> joins.
>>>>> */
>>>>> JOIN_TAB *return_tab;
>>>>> Item **ref_pointer_array; ///<used pointer reference for this
> select
>>>>> @@ -1933,6 +2025,46 @@ public:
>>>>> ((group || tmp_table_param.sum_func_count)&& !group_list)) ?
>>>>> NULL : join_tab+const_tables;
>>>>> }
>>>>> + /**
>>>>> + Returns true if JOIN_TAB's a and b participate on the same nesting
> level
>>>>> + within a chain of inner joined tables.
>>>>> +
>>>>> + @note The optimizer execution plan is considered, and outer joins
> may
>>>>> + have been rewritten into inner joins.
>>>>> + */
>>>>> + static bool are_inner_joined(JOIN_TAB *a, JOIN_TAB *b)
>>>>> + {
>>>>> +
>>>>> + if (a> b) {
>>>>> + JOIN_TAB *c= a;
>>>>> + a= b;
>>>>> + b= c;
>>>>> + }
>>>>> +
>>>>> + if (a->last_inner != NULL&& // a is in an outer join
> nest, then
>>>>> + a->last_inner == b->last_inner&& // a must be at
> least @ b's nest level
>>>>> + b->first_inner == NULL) // and b must not be deeper nested
>>>>> + return TRUE;
>>>>> +
>>>>> + if (a->last_inner == NULL&& // a is not an outer join
> nest
>>>>> + b->last_inner == NULL&& b->first_inner == NULL) //
> neither is b
>>>>> + return TRUE;
>>>>> +
>>>>> + if (b->last_inner == NULL&& // b is not first in a
> nested inner join
>>>>> + b->first_inner != NULL&& b->first_inner ==
> a->first_inner) // but a is
>>>>> + return TRUE;
>>>>> +
>>>>> + return FALSE;
>>>>> + }
>>>>
>>>> I have not analyzed this algorithm in detail :)
>>> It's not an algorithm per se. Just a view over of the last/first _inner
>>> pointers.
>>>
>>>> Based on our IRC discussion, do you think it is possible to simplify the
>>>> implementation?
>>> Do you mean implement it in terms of is_inner_table_of_outer_join() et al? I
>>> guess I could try...
>>
>> Or just use embedding and exclude outer-joined tables?
> I don't know if that is easier and frankly I don't see the point. The
> information has been computed once, why recompute it? The only reason would be
> that you would want to introduce short-cut discovery as early as in the
> cost-based optimization loop but you appear to be opposed to that too.

I just want the simplest possible way to calculate the property, because that 
makes later maintenance easier.
>>>>
>>>>> +
>>>>> + static bool contains_join_buffering(JOIN_TAB *a, JOIN_TAB *b)
>>>>> + {
>>>>> + for (JOIN_TAB *join_tab= a; join_tab< b; join_tab++)
>>>>> + if (join_tab->next_select == sub_select_cache)
>>>>> + return TRUE;
>>>>> + return FALSE;
>>>>
>>>> Is it possible to use the use_join_cache flag instead? Relying on
> generated
>>>> "code"
>>>> seems dangerous...
>>> Good point. Will look into that.
>>>>
>>>>> + }
>>>>> +
>>>>> private:
>>>>> /**
>>>>> TRUE if the query contains an aggregate function but has no GROUP
>>>>>
>>>>> === added file 'sql/sql_set.h'
>>>>> --- a/sql/sql_set.h 1970-01-01 00:00:00 +0000
>>>>> +++ b/sql/sql_set.h 2010-09-20 13:03:54 +0000
>>>>> @@ -0,0 +1,112 @@
>>>>> +#ifndef INCLUDES_MYSQL_SQL_SET_H
>>>>> +#define INCLUDES_MYSQL_SQL_SET_H
>>>>> +/* Copyright (c) 2000, 2010 Oracle and/or its affiliates. All
> rights
>>>>> reserved.
>>>>> +
>>>>> + This program is free software; you can redistribute it and/or
> modify
>>>>> + it under the terms of the GNU General Public License as published
> by
>>>>> + the Free Software Foundation; version 2 of the License.
>>>>> +
>>>>> + This program is distributed in the hope that it will be useful,
>>>>> + but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>>>> + GNU General Public License for more details.
>>>>> +
>>>>> + You should have received a copy of the GNU General Public License
>>>>> + along with this program; if not, write to the Free Software
>>>>> + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
> 02110-1301 USA */
>>>>> +
>>>>> +#include "my_global.h"
>>>>> +
>>>>> +template<class T> class Bitmap_set_iterator;
>>>>> +template<class T> class Bitmap_set_reverse_iterator;
>>>>> +
>>>>> +/**
>>>>> + A set implemented as a bitmap. Due to this implementation,
> operations that
>>>>> + normally are quite expensive, typically having O(n log n)
> complexity, are
>>>>> + extremely cheap. A set consists of a bitmap and an array of
> pointers to
>>>>> + elements, the backing store. Each n:th bit in the bitmap represents
> the
>>>>> + member at position n in the array, where the first member has
> position 0. A
>>>>> + set consisting of this element is represented by the bitmap 1.
>>>>> +
>>>>> + For obvious reasons, it cannot be allowed to perform a union
> operation on
>>>>> + two sets that have different backing stores, even if their type is
> the
>>>>> + same. Attempts at such operations will result in failed
> assertions.
>>>>> + */
>>>>
>>>> AFAIU, the immediate gain of this class is that you get type-safety for
> the
>>>> operations
>>>> on the backingstore elements, and that you avoid doing the indexing into
>>>> the backingstore array.
>>> The idea is to use a simple iterator interface to access the data and to
> hide
>>> the fact that it is represented by a bitmap. It annoys the hell out of me
> that
>>> the code is full of 'sets' that are just int32's, table_map's whatever. And
>>> don't get me started on the 'magic' bits that people shove in there just
> because
>>> the representation is wide open. Pick your favorite among PARAM_TABLE_BIT,
>>> OUTER_REF_TABLE_BIT, RAND_TABLE_BIT...
>>>
>>> You can see this great benefit in action in set_star_dependency(). If I
> didn't
>>> have an iterator I would have to slide a bitmap over an integer, dereference
>>> pointers and whatnot.
>>
>> 64-bit bitmaps already exist and iterators over bitmaps already exist. The
>> novel part is the backingstore which avoid an array indexing.
> Are you talking about Table_map_iterator? That is an interesting hack. ;-) What
> the heck are they doing, looks like they divide the bitmap in chunks of four
> bits somehow. Hilarious, made my day. :-)
>>>>
>>>> The downside is more complexity
>>> IMHO This class *removes* complexity. Have you seen how many bizarre bitwise
>>> operations are done in the code? And how many bugs they cause?
>>
>> Well, if you want to extend beyond 61 tables in a join, you now need to extend
>> two bitset implementations.
> And redefine a handful of magic macros.
>>>
>>> Do you really not see any expressive advantage of "if (set_difference(a, b,
>>> c).is_empty())" over "if (((a & ~b) & ~c))"? The compiled code looks
> exactly the
>>> same.
>>
>> Yes, to some extent I do. I have tried the same myself, but after all I
>> decided the pain was bigger than the gain. And I have come to understand the
>> bitmap tweaks pretty well by now. It's not so much the bitmaps, but the
>> incredibly complex 10-part and/or-expressions that make the code hard to read
>> and maintain.
> Yes, those easily cost half a day every time I run into them. Usually it's a
> fairly basic operation that is desired but the primitives are lacking. Why don't
> you point me to one of those and I'll show you how slick it can get with a
> Bitmap_set? :-)

I am pretty sure that you can convince somebody that reworking the table_map 
typedef into a class with inlined functions can be a good thing, and they will 
approve your code :)

>>>
>>>> (yet another bitmap type,
>>> Yes, but one to rule them all. This type can encompass all the other types
> and
>>> hide *all* of them behind a consistent interface that lets you switch
>>> representation. It could be represented as uint32, long int, short, or
> indeed
>>> anything.
>>>> templates)...
>>> Templates are part of the C++ language. Granted, they may have had a bad
>>> reputation 15 years ago but there is no reason for treating them like
>>> second-class citizens anymore.
>>
>> They have a place, but IMHO they should be used sparingly.
> I agree. Container types are the only place we use them afaiu.
>>
>>>>> +template<class T> class Bitmap_set {
>>>>> +
>>>>> +private:
>>>>> + ulonglong m_map;
>>>>> + T** m_backing_store;
>>>>> +
>>>>> +public:
>>>>> + Bitmap_set(ulonglong map, T** backing_store) :
>>>>> + m_map(map), m_backing_store(backing_store) {}
>>>>> +
>>>>> + bool is_empty() { return m_map == 0; }
>>>>> +
>>>>> + static Bitmap_set<T> set_union(const Bitmap_set<T>&
> a,
>>>>> + const Bitmap_set<T>& b) {
>>>>> + DBUG_ASSERT(a.m_backing_store == b.m_backing_store);
>>>>> + Bitmap_set<T> result(a.m_map | b.m_map, a.m_backing_store);
>>>>> + return result;
>>>>> + }
>>>>> +
>>>>> + friend class Bitmap_set_iterator<T>;
>>>>> + friend class Bitmap_set_reverse_iterator<T>;
>>>>> +};
>>>>> +
>>>>> +template<class T> class Bitmap_set_iterator {
>>>>> +private:
>>>>> + const Bitmap_set<T>& m_set;
>>>>> + uint m_current_element;
>>>>> +
>>>>> +public:
>>>>> + Bitmap_set_iterator(const Bitmap_set<T>& set) :
>>>>> + m_set(set), m_current_element(0) {}
>>>>> +
>>>>> + inline T* operator++(int) {
>>>>> + ulonglong mask= 1<< m_current_element;
>>>>> + while ((mask& m_set.m_map) == 0)
>>>>> + {
>>>>> + if (mask == 0 || mask> m_set.m_map)
>>>>> + return NULL;
>>>>> + mask<<= 1;
>>>>> + ++m_current_element;
>>>>> + }
>>>>> + T* element= m_set.m_backing_store[m_current_element++];
>>>>> + return element;
>>>>> + }
>>>>> +
>>>>> +};
>>>>> +
>>>>> +template<class T> class Bitmap_set_reverse_iterator {
>>>>> +private:
>>>>> + const Bitmap_set<T>& m_set;
>>>>> + int m_current_element;
>>>>> +
>>>>> +public:
>>>>> + Bitmap_set_reverse_iterator(const Bitmap_set<T>& set) :
>>>>> + m_set(set), m_current_element(0)
>>>>> + {
>>>>> + ulonglong map= set.m_map;
>>>>> + while ((map>>= 1) != 0)
>>>>> + ++m_current_element;
>>>>> + }
>>>>> +
>>>>> + inline T* operator++(int) {
>>>>> + if (m_current_element == -1)
>>>>> + return NULL;
>>>>> + ulonglong mask= 1<< m_current_element;
>>>>> + while ((mask& m_set.m_map) == 0)
>>>>> + {
>>>>> + if (mask == 0)
>>>>> + return NULL;
>>>>> + mask>>= 1;
>>>>> + --m_current_element;
>>>>> + }
>>>>> + T* element= m_set.m_backing_store[m_current_element--];
>>>>> + return element;
>>>>> + }
>>>>> +
>>>>> +};
>>>>> +#endif // INCLUDES_MYSQL_SQL_SET_H
>>>>>
>>>>> === added file 'sql/sql_shortcut.h'
>>>>> --- a/sql/sql_shortcut.h 1970-01-01 00:00:00 +0000
>>>>> +++ b/sql/sql_shortcut.h 2010-09-20 13:03:54 +0000
>>>>> @@ -0,0 +1,208 @@
>>>>> +#ifndef INCLUDES_MYSQL_SQL_SHORTCUT_H
>>>>> +#define INCLUDES_MYSQL_SQL_SHORTCUT_H
>>>>> +
>>>>> +/* Copyright (c) 2000, 2010, Oracle and/or its affiliates. All
> rights
>>>>> reserved.
>>>>> +
>>>>> + This program is free software; you can redistribute it and/or
> modify
>>>>> + it under the terms of the GNU General Public License as published
> by
>>>>> + the Free Software Foundation; version 2 of the License.
>>>>> +
>>>>> + This program is distributed in the hope that it will be useful,
>>>>> + but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>>>> + GNU General Public License for more details.
>>>>> +
>>>>> + You should have received a copy of the GNU General Public License
>>>>> + along with this program; if not, write to the Free Software
>>>>> + Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
> USA */
>>>>> +
>>>>> +#include "sql_select.h"
>>>>> +
>>>>> +class Star_join_shortcut : public IExecution_listener {
>>>>> +
>>>>> +private:
>>>>> +
>>>>> + JOIN_TAB *m_destination;
>>>>> + bool m_current_prefix_successful;
>>>>
>>>> What is meant by "success" here?
>>> Comment added.
>>>>
>>>>> +
>>>>> +Star_join_shortcut(JOIN_TAB *destination) :
>>>>> + m_destination(destination), m_current_prefix_successful(FALSE) {
>>>>> + DBUG_ASSERT(m_destination != NULL);
>>>>> + }
>>>>> +
>>>>> +public:
>>>>> +
>>>>> + /**
>>>>> + Pre-compute short-cuts for the join short-cut optimization.
>>>>> +
>>>>> + This function adorns the query exeution plan with star join
> short-cuts
>>>>> + where applicable. Short-cuts are triggered through the
>>>>> + IExecution_listener interface. The algorithm is implemented as a
> nested
>>>>> + loop, where the outer loop walks the nodes (JOIN_TAB's) in the
> query
>>>>> + execution plan, and the inner loop iterates through all tables that
> this
>>>>> + table depends upon. Valid short-cuts have the following property.
>>>>> +
>>>>> + - The short-cut has a minimum length of 2. If a table is dependent
> on the
>>>>> + table directly preceding it, the short-cut would be of length 1,
> and it
>>>>> + would indicate that there are no valid short-cuts from this table.
>>>>> +
>>>>> + - The short-cut does not cross an outer join boundary. In this case
> the
>>>>> + optimization is not applicable, because a valid result would be
> missed.
>>>>> +
>>>>> + - The short-cut does not cross a node using join buffers.
> Short-cuts are
>>>>> + not compatible with join buffering.
>>>>> +
>>>>> + - If there are several applicable shortcuts by the above
> requirements, the
>>>>> + shortest one is chosen.
>>>>> +
>>>>> + @param join The query execution plan. The plan is assumed to be
> complete
>>>>> + and ready for execution when this method is invoked.
>>>>> + */
>>>>> + static void setup_shortcuts(JOIN *join)
>>>>> + {
>>>>> + if(!join->join_tab)
>>>>> + return;
>>>>> +
>>>>> + for (uint i= 2; i< join->tables; i++)
>>>>> + set_star_dependency(join,&join->join_tab[i]);
>>>>> + }
>>>>> +
>>>>> + void row_read(JOIN_TAB *join_tab, Execution_state *state)
>>>>> + {
>>>>> + if (state->get_read_error() != 0)
>>>>> + {
>>>>> + if (!m_current_prefix_successful)
>>>>> + handle_read_failure(join_tab, state);
>>>>> + m_current_prefix_successful= FALSE;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> +
>>>>> + /**
>>>>> + If a row has passed the push-down condition test for the current
> prefix
>>>>> + of the partial result tuple, we record that here. When there are no
> more
>>>>> + rows left, it is allowed to take a short-cut if now rows passed the
> test.
>>>>> + */
>>>>> + void row_evaluated(JOIN_TAB *join_tab, Execution_state *state) {
>>>>> + if (state->get_evaluation())
>>>>> + m_current_prefix_successful= TRUE;
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + Appends information about the short-cut optimization to EXPLAIN
> output, if
>>>>> + used for the current plan node.
>>>>> +
>>>>> + @param join The execution plan.
>>>>> + @param join_tab The plan node to be annotated.
>>>>> + @param extra The current content of the 'extra' column in EXPLAIN.
>>>>> + */
>>>>> + void explain_extra_column_ready(JOIN *join, String *extra)
>>>>> + {
>>>>> + extra->append(STRING_WITH_LEN("; StarJoin("));
>>>>> + extra->append(m_destination->table->alias);
>>>>> + extra->append(STRING_WITH_LEN(")"));
>>>>> + }
>>>>> +
>>>>> +private:
>>>>> +
>>>>> + /**
>>>>> + Sets the short-cut from from one plan node to another, if one is
> found.
>>>>> +
>>>>> + @param join The execution plan.
>>>>> +
>>>>> + @param join_tab The plan node from which a short-cut
> willpotentially be
>>>>> + taken.
>>>>> + */
>>>>> + static void set_star_dependency(JOIN *join, JOIN_TAB *join_tab)
>>>>> + {
>>>>> + table_map where_clause_tables_map;
>>>>> + if (join_tab->select_cond != NULL)
>>>>> + {
>>>>> + Item *where_condition= join_tab->select_cond;
>>>>> + where_condition->update_used_tables();
>>>>
>>>> Do you need to update used tables here?
>>> I gather you have not tried removing it. ;-) Yes it is needed.
>>
>> I think that if used tables information for the table conditions needs
>> updating after optimization is "complete", it is indicative of an error
>> earlier in the code.
>
>>
>> Compensating with update_used_tables() is perhaps wrong?
> Please see the implementation. It is in fact implemented in such a way that you
> have to call update_used_tables() before you can use it. Otherwise it uses some
> 'caching' mechanism. Only if you yourself can guarantee that it's not changed is
> it safe to optimize your code by not calling it. Premature optimization at its
> worst.

Damn.

BTW, I'd like to remove all the asserts in Item::fix_fields() stating
DBUG_ASSERT(!fixed), and then let the classes handle calls to already fixed 
objects themselves...

>>>>
>>>>> + where_clause_tables_map=
>>>>> + where_condition->used_tables()& ~PSEUDO_TABLE_BITS;
>>>>> + }
>>>>> + else
>>>>> + where_clause_tables_map= (table_map)0L;
>>>>> +
>>>>> + table_map ref_access_tables_map=
>>>>> + join_tab->ref.depend_map& ~PSEUDO_TABLE_BITS;
>>>>> +
>>>>> + Bitmap_set<JOIN_TAB>
>>>>> + where_clause_tables(where_clause_tables_map, join->map2table);
>>>>> +
>>>>> + Bitmap_set<JOIN_TAB>
>>>>> + ref_access_tables(ref_access_tables_map, join->map2table);
>>>>> +
>>>>> + Bitmap_set<JOIN_TAB> all_dependency_tables=
>>>>> + Bitmap_set<JOIN_TAB>::set_union(where_clause_tables,
> ref_access_tables);
>>>>
>>>> all_dependency_tables -> dependent_tables ?
>>> That would be 'tables_depended_upon' in that case.
>>
>> OK
> Do you want me to change it?

Yes

>>>>
>>>>
>>>>> +
>>>>> + if (all_dependency_tables.is_empty())
>>>>> + return;
>>>>> +
>>>>> + for(Bitmap_set_reverse_iterator<JOIN_TAB>
> it(all_dependency_tables);
>>>>> + update_star_dependency(join, join_tab, it++);) ;
>>>>> + }
>>>>> +
>>>>> +
>>>>> + /**
>>>>> + Updates a short-cut from from one plan node to another. The method
>>>>> + assumes that it is invoked on plan nodes in reverse order in which
> they
>>>>> + appear in the query execution plan.
>>>>
>>>> -> Order from the most inner to the most outer table?
>>> What is meant by inner and outer in this case?
>>
>> For nested-loop join, it is common to start with the "most outer" table, until
>> you reach the "most inner" table.
> Yes, but tables at the same nesting depth are also ordered according to cost.

I thought you referred to the query execution plan, where costs have been 
considered, and "same nesting depth" thus has been made irrelevant?
>>>>
>>>>> +
>>>>> + @param join The execution plan.
>>>>> +
>>>>> + @param join_tab The plan node from which a short-cut may
> potentially be
>>>>> + taken.
>>>>> +
>>>>> + @param dependent A node that access to join_tab depends on.
>>>>> +
>>>>> + @retval TRUE There may be other short-cuts possible early in the
> plan.
>>>>> + @retval FALSE There are no earlier short-cuts available. Either the
> only
>>>>> + possible one was found and completely initialized, or there are
> none.
>>>>> + */
>>>>> + static bool update_star_dependency(JOIN *join, JOIN_TAB *join_tab,
>>>>> + JOIN_TAB *dependency)
>>>>> + {
>>>>> + if (dependency == NULL)
>>>>> + return FALSE;
>>>>> +
>>>>> + if (join_tab<= dependency)
>>>>> + return TRUE;
>>>>> +
>>>>> + if (dependency == join_tab - 1)
>>>>> + {
>>>>> + DBUG_PRINT("info", ("Invalidated a join short-cut from %s to %s",
>>>>> + join_tab->table->alias,
>>>>> + dependency->table->alias));
>>>>> + return FALSE;
>>>>> + }
>>>>> + else if (join->are_inner_joined(dependency, join_tab)&&
>>>>> + !join->contains_join_buffering(dependency, join_tab))
>>>>> + {
>>>>> + DBUG_PRINT("info", ("Found a join short-cut from %s to %s",
>>>>> + join_tab->table->alias,
>>>>> + dependency->table->alias));
>>>>> +
>>>>> + join_tab->add_execution_listener(new
> Star_join_shortcut(dependency));
>>>>> +
>>>>> + return FALSE;
>>>>> + }
>>>>> + return TRUE;
>>>>> + }
>>>>> +
>>>>> +private:
>>>>> +
>>>>> + void handle_read_failure(JOIN_TAB *join_tab, Execution_state
> *state) {
>>>>> + if (m_destination != NULL)
>>>>> + {
>>>>> + state->set_return_tab(m_destination);
>>>>> + DBUG_PRINT("info", ("Execution short-cut to '%s'",
>>>>> + m_destination->table != NULL ?
>>>>> + m_destination->table->alias : ""));
>>>>> + }
>>>>> + }
>>>>> +};
>>>>> +
>>>>> +#endif // INCLUDES_MYSQL_SQL_SHORTCUT_H
>>>>>
>>>>> === modified file 'sql/structs.h'
>>>>> --- a/sql/structs.h 2010-07-13 17:29:44 +0000
>>>>> +++ b/sql/structs.h 2010-09-20 13:03:54 +0000
>>>>> @@ -119,6 +119,27 @@ struct st_join_table;
>>>>> typedef struct st_reginfo { /* Extra info about reg */
>>>>> struct st_join_table *join_tab; /* Used by SELECT() */
>>>>> enum thr_lock_type lock_type; /* How database is used */
>>>>> + /**
>>>>> + This is an outer join optimization. The idea is to avoid
>>>>> + NULL-complementing a row for the inner table(s). The bespoke case
> is when
>>>>> + we have a query such as
>>>>> +
>>>>> + @code
>>>>> +
>>>>> + SELECT ...
>>>>> + FROM t1 LEFT OUTER JOIN (t2, t3) ON<condition>
>>>>> + WHERE t2.non_nullable_column IS NULL
>>>>> +
>>>>> + @endcode
>>>>> +
>>>>> + In this case not_exists_optimize would be set for t2.
>>>>> +
>>>>> + Once the query execution reaches the place where<condition>
> holds for
>>>>> + the current result tuple, we know that there cannot be a row where
> the
>>>>> + WHERE condition holds. There is no need produce a NULL-complemented
> row.
>>>>> +
>>>>> + @see update_ref_and_keys
>>>>
>>>> These comments are repeated at two places in the code.
>>> Please show me where, I can't find it.
>>
>> In sql_select.cc,
>> At around line 6112, there is documentation for when this flag is set.
>> At around line 17420, there is documentation on the use of the flag.
> Ah, this was not there in the tree where I added the comment at first.
>
> I am appalled by the following comment:
>
> /*
> When not_exists_optimizer is set and a matching row is found, the
> outer row should be excluded from the result set: no need to
> explore this record and other records of 'tab', so we return "no
> records". But as we set 'found' above, evaluate_join_record() at
> the upper level will not yield a NULL-complemented record.
> */
>
> Do you see how the implementor intentionally skews the definition of 'found'?
>  From this moment on 'found' does not mean 'found' anymore. At the very least,
> the name should have been changed to 'do_not_null_complete'.

As long as the use of "found" is documented, I see nothing wrong with it. Better 
than creating another field for an obscure purpose..
>
> I think this comments stays. Comments in the code do not show up in Doxygen
>
>
> Best Regards
>
> Martin
>

Thanks,
Roy
Thread
bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218) WL#3724Martin Hansson20 Sep
Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Roy Lyseng27 Sep
  • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Martin Hansson28 Sep
    • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Roy Lyseng29 Sep
      • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Martin Hansson29 Sep
        • Re: bzr commit into mysql-next-mr-opt-team branch (martin.hansson:3218)WL#3724Roy Lyseng29 Sep