Hello,
Roy Lyseng a écrit, Le 26.04.2010 20:44:
> Hi Guilhem,
>
> thanks for commenting.
>
> Guilhem Bichot wrote:
>> Hello Roy,
>>
>> Roy Lyseng a écrit, Le 26.04.2010 12:54:
>>> #At file:///home/rl136806/mysql/repo/mysql-6.0-work1/ based on
>>> revid:alfranio.correia@stripped
>>>
>>> 3836 Roy Lyseng 2010-04-26
>>> WL#5347: Refactor optimizer function at_sjmat_pos()
>>
>>> @@ -7241,58 +7270,79 @@ optimize_straight_join(JOIN *join, table
>>> }
>>>
>>>
>>> -/*
>>> - Check if the last tables of the partial join order allow to use
>>> - sj-materialization strategy for them
>>> +/**
>>> + Check whether a semijoin materialization strategy is allowed for
>>> + the current (semi)join table order.
>>>
>>> - SYNOPSIS
>>> - at_sjmat_pos()
>>> - join - remaining_tables
>>> - tab the last table's join tab
>>> - idx last table's index
>>> - loose_scan OUT TRUE <=> use LooseScan
>>> + @param join Join object
>>> + @param remaining_tables Tables that have not yet been added to
>>> the join plan
>>> + @param tab Join tab of the table being considered
>>> + @param idx Index of table with join tab "tab"
>>> +
>>> + @retval SJ_OPT_NONE - Materialization not applicable
>>> + @retval SJ_OPT_MATERIALIZE_LOOKUP - Materialization with lookup
>>> applicable
>>> + @retval SJ_OPT_MATERIALIZE_SCAN - Materialization with scan
>>> applicable
>>>
>>> - RETURN
>>> - TRUE Yes, can apply sj-materialization
>>> - FALSE No, some of the requirements are not met
>>> + @details
>>> + The function checks applicability of both MaterializeLookup and
>>> + MaterializeScan strategies.
>>> + No checking is made until "tab" is pointing to the last inner table
>>> + of a semijoin nest that can be executed using materialization -
>>> + for all other cases SJ_OPT_NONE is returned.
>>> + If both MaterializeLookup and MaterializeScan are applicable
>>> + (ie if there are no correlated outer tables),
>>
>> I don't understand the equivalence ("i.e. if there are" etc)
>
> I mean that both strategies are applicable only when there are no
> correlated outer tables, ie all outer tables are non-correlated. If
> there are correlated tables in the suffix, MaterializeScan must be used.
> If there are correlated tables only in the prefix, then
> MaterializeLookup must be used.
From the text it's not clear: it says that MatScan can handle:
+ MaterializeScan strategy
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ (ot|nt)* [ it (it)* ] (ot|nt)*
+ +------+ +==========+ +-----+
+ (1) (2) (3)
+
This picture accepts the case where the suffix is only "nt", doesn't it?
But then, this particular MatScan is like a MatLookup...
I wonder what's the difference between MatScan and MatLookup when there
are no "ot" in the suffix?
> That leaves the case of no correlated
> outer tables either places, which could, at least in theory, be executed
> by both strategies. The materialized table would degenerate to a boolean
> scalar, then.
yes
>>> + MaterializeLookup is returned.
>>> + This is a purely heuristic decision.
>>
>> Is this decision something which is introduced by the patch?
>
> No, but the decision is moved into this function. In the existing
> version, the caller could make the decision itself. It would be easy to
> revert this however, e.g by implementing the return value as a set of
> applicable strategies, if somebody wanted it.
no, it's ok, thanks
>>> + MaterializeScan allows correlated outer tables both before and after
>>> + the materialized inner tables. In this case, the materialized
>>> table must
>>> + contain keys from the prefix outer tables followed by keys from the
>>> + suffix outer tables, called prefix-key and suffix-key, respectively.
>>
>> What do you mean by "keys" here... I think the materialized tables has
>> a single index.
>
> I should state that as key columns.
ok
>>> + First, a scan is set up over the prefix outer tables. Then a range
>>> scan
>>> + is done over the prefix-key of the materialized table.
>>> + Finally, the suffix-key from the materialized table is used for a
>>> scan
>>> + over the suffix outer tables.
>>
>> mmmm the suffix outer tables are not always scanned, if they have an
>> index then we will do a index lookup on them (searching for a value
>> which we found when scanning the materialized table).
>
> I thought that I could use "scan" as a common term for table scan and
> index scan/lookup. Do you want me to be more specific?
in fact, it's hard to be accurate. Even this:
"First, a scan is set up over the prefix outer tables"
can be misleading: if the prefix outer tables are preceded by other
tables, then the prefix outer tables may be accessed via "ref" or
"unique" index accesses (no scan) based on rows found in preceding other
tables.
Maybe this:
"first, data is read from the prefix outer tables, then matches are
located in the materialized table, then matches are located in the
suffix outer tables. If there are no prefix outer tables, the
materialized table is scanned."
I used "data" instead of "rows" because the optimizer might be able to
find all data from the indices (so, not even read "rows").
Maybe this makes a too vague sentence though.
>> Is the comment a description of how things work now?
>> I was surprised to see that materialization-scan is expected work with
>> the inner table not first, so I tested it and filed BUG#53172. It may
>> or not be related to BUG#52068 - semijoin materialization has a bit
>> too many bugs it seems :-(
>
> If there are so many bugs here, we could perhaps disable the "mixed
> scan" method. It is not very significant, I think... Øystein has a bit
> more information about this strategy.
Yes, let's see when we start attacking those bugs. All you did is
describing in greater detail than the existing comments, that's already
good reading for us all.
>>> -SJ_MATERIALIZATION_INFO *
>>> -at_sjmat_pos(const JOIN *join, table_map remaining_tables, const
>>> JOIN_TAB *tab,
>>> - uint idx, bool *loose_scan)
>>> +static int
Good that you made it static.
>>> +semijoin_order_allows_materialization(const JOIN *join,
>>> + table_map remaining_tables,
>>> + const JOIN_TAB *tab, uint idx)
>>> {
>>> /*
>>> Check if 1. We're in a semi-join nest that can be run with
>>> SJ-materialization
>>> - 2. All the tables correlated through the IN subquery are in the
>>> prefix
>>> + 2. All the tables from the subquery are in the prefix
>>> */
>>> - TABLE_LIST *emb_sj_nest= tab->emb_sj_nest;
>>> - table_map suffix= remaining_tables & ~tab->table->map;
>>> - if (emb_sj_nest && emb_sj_nest->sj_mat_info &&
>>> - !(suffix & emb_sj_nest->sj_inner_tables))
>>> + const TABLE_LIST *emb_sj_nest= tab->emb_sj_nest;
>>> + if (!emb_sj_nest ||
>>> + !emb_sj_nest->sj_mat_info ||
>>> + (remaining_tables & emb_sj_nest->sj_inner_tables))
>>
>> The old code would use ~tab->table->map here...?
>
> If you go back to advance_sj_state(), you see this code line:
>
> remaining_tables &= ~new_join_tab->table->map;
>
> so, suffix is identical to remaining_tables, and there is no need for it.
ok. Though then the property "tab is not part of remaining_tables"
becomes an assumption of semijoin_order_allows_materialization(), I'd
DBUG_ASSERT() it.
>>> + return SJ_OPT_NONE;
>>> +
>>> + /* + Walk back and check if all immediately preceding tables
>>> are from
>>> + this semi-join.
>>> + */
>>> + const uint n_tables= my_count_bits(emb_sj_nest->sj_inner_tables);
>>> + for (uint i= 1; i < n_tables ; i++)
>>> {
>>> - /* - Walk back and check if all immediately preceding
>>> tables are from
>>> - this semi-join.
>>> - */
>>> - uint n_tables= my_count_bits(tab->emb_sj_nest->sj_inner_tables);
>>> - for (uint i= 1; i < n_tables ; i++)
>>> - {
>>> - if (join->positions[idx - i].table->emb_sj_nest !=
>>> tab->emb_sj_nest)
>>> - return NULL;
>>> - }
>>> - *loose_scan= test(remaining_tables & ~tab->table->map &
>>> - (emb_sj_nest->sj_inner_tables |
>>> -
>>> emb_sj_nest->nested_join->sj_depends_on));
>>> - if (*loose_scan &&
> !emb_sj_nest->sj_subq_pred->sjm_scan_allowed)
>>> - return NULL;
>>> - else
>>> - return emb_sj_nest->sj_mat_info;
>>> + if (join->positions[idx - i].table->emb_sj_nest != emb_sj_nest)
>>> + return SJ_OPT_NONE;
>>> }
>>> - return NULL;
>>> -}
>>>
>>> + /*
>>> + Must use MaterializeScan strategy if there are outer correlated
>>> tables
>>> + among the remaining tables, otherwise use MaterializeLookup.
>>> + */
>>> + if (remaining_tables & emb_sj_nest->nested_join->sj_depends_on)
>>> + {
>>> + if (!emb_sj_nest->sj_subq_pred->sjm_scan_allowed)
>>
>> There seems to be code changes too (I don't see ~tab->table->map,
>> emb_sj_nest->sj_inner_tables). They may well be fine, but I'd really
>> need an explanation, please. I don't fully understand what all those
>> bitmaps are about and why they are not needed in fact...
>
> The table->map thing is moot, see above.
>
> We have also tested above that if
> "remaining_tables & emb_sj_nest->sj_inner_tables",
> we have returned SJ_OPT_NONE.
>
> This is the reason why I could simplify the code.
>
> It's a bit difficult to annotate every code change with a reason. I
> cannot write that into the comment section, because it should document
> the existing code. I could write it into the commit comments, but that
> means that I need to copy the diffs into the commit section. So
> refactoring code actually means harder work than writing the initial
> code :(
Maybe (an idea for a next time) make a commit with verbose comments
inside the code like "To reviewer: in the above line I could eliminate
variable X because Y", and when it's time to push, re-commit without
comments...?
Ok to push. Thanks!