MySQL Lists are EOL. Please join:

List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:April 26 2010 8:06pm
Subject:Re: bzr commit into mysql-6.0-codebase-bugfixing branch
(roy.lyseng:3836) WL#5347
View as plain text  
Hello,

Roy Lyseng a écrit, Le 26.04.2010 20:44:
> Hi Guilhem,
> 
> thanks for commenting.
> 
> Guilhem Bichot wrote:
>> Hello Roy,
>>
>> Roy Lyseng a écrit, Le 26.04.2010 12:54:
>>> #At file:///home/rl136806/mysql/repo/mysql-6.0-work1/ based on 
>>> revid:alfranio.correia@stripped
>>>
>>>  3836 Roy Lyseng    2010-04-26
>>>       WL#5347: Refactor optimizer function at_sjmat_pos()
>>
>>> @@ -7241,58 +7270,79 @@ optimize_straight_join(JOIN *join, table
>>>  }
>>>  
>>>  
>>> -/*
>>> -  Check if the last tables of the partial join order allow to use
>>> -  sj-materialization strategy for them
>>> +/**
>>> +  Check whether a semijoin materialization strategy is allowed for
>>> +  the current (semi)join table order.
>>>  
>>> -  SYNOPSIS
>>> -    at_sjmat_pos()
>>> -      join              -      remaining_tables
>>> -      tab                the last table's join tab
>>> -      idx                last table's index
>>> -      loose_scan    OUT  TRUE <=> use LooseScan
>>> +  @param join              Join object
>>> +  @param remaining_tables  Tables that have not yet been added to 
>>> the join plan
>>> +  @param tab               Join tab of the table being considered
>>> +  @param idx               Index of table with join tab "tab"
>>> +
>>> +  @retval SJ_OPT_NONE               - Materialization not applicable
>>> +  @retval SJ_OPT_MATERIALIZE_LOOKUP - Materialization with lookup 
>>> applicable
>>> +  @retval SJ_OPT_MATERIALIZE_SCAN   - Materialization with scan 
>>> applicable
>>>  
>>> -  RETURN
>>> -    TRUE   Yes, can apply sj-materialization
>>> -    FALSE  No, some of the requirements are not met
>>> +  @details
>>> +  The function checks applicability of both MaterializeLookup and
>>> +  MaterializeScan strategies.
>>> +  No checking is made until "tab" is pointing to the last inner table
>>> +  of a semijoin nest that can be executed using materialization -
>>> +  for all other cases SJ_OPT_NONE is returned.
>>> +  If both MaterializeLookup and MaterializeScan are applicable
>>> +  (ie if there are no correlated outer tables),
>>
>> I don't understand the equivalence ("i.e. if there are" etc)
> 
> I mean that both strategies are applicable only when there are no 
> correlated outer tables, ie all outer tables are non-correlated. If 
> there are correlated tables in the suffix, MaterializeScan must be used. 
> If there are correlated tables only in the prefix, then 
> MaterializeLookup must be used.

 From the text it's not clear: it says that MatScan can handle:

+    MaterializeScan strategy
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+     (ot|nt)*  [ it (it)* ]  (ot|nt)*
+     +------+  +==========+  +-----+
+        (1)         (2)         (3)
+

This picture accepts the case where the suffix is only "nt", doesn't it?
But then, this particular MatScan is like a MatLookup...
I wonder what's the difference between MatScan and MatLookup when there 
are no "ot" in the suffix?

> That leaves the case of no correlated 
> outer tables either places, which could, at least in theory, be executed 
> by both strategies. The materialized table would degenerate to a boolean 
> scalar, then.

yes

>>> +  MaterializeLookup is returned.
>>> +  This is a purely heuristic decision.
>>
>> Is this decision something which is introduced by the patch?
> 
> No, but the decision is moved into this function. In the existing 
> version, the caller could make the decision itself. It would be easy to 
> revert this however, e.g by implementing the return value as a set of 
> applicable strategies, if somebody wanted it.

no, it's ok, thanks

>>> +  MaterializeScan allows correlated outer tables both before and after
>>> +  the materialized inner tables. In this case, the materialized 
>>> table must
>>> +  contain keys from the prefix outer tables followed by keys from the
>>> +  suffix outer tables, called prefix-key and suffix-key, respectively.
>>
>> What do you mean by "keys" here... I think the materialized tables has 
>> a single index.
> 
> I should state that as key columns.

ok

>>> +  First, a scan is set up over the prefix outer tables. Then a range 
>>> scan
>>> +  is done over the prefix-key of the materialized table.
>>> +  Finally, the suffix-key from the materialized table is used for a 
>>> scan
>>> +  over the suffix outer tables.
>>
>> mmmm the suffix outer tables are not always scanned, if they have an 
>> index then we will do a index lookup on them (searching for a value 
>> which we found when scanning the materialized table).
> 
> I thought that I could use "scan" as a common term for table scan and 
> index scan/lookup. Do you want me to be more specific?

in fact, it's hard to be accurate. Even this:
"First, a scan is set up over the prefix outer tables"
can be misleading: if the prefix outer tables are preceded by other 
tables, then the prefix outer tables may be accessed via "ref" or 
"unique" index accesses (no scan) based on rows found in preceding other 
tables.
Maybe this:
"first, data is read from the prefix outer tables, then matches are 
located in the materialized table, then matches are located in the 
suffix outer tables. If there are no prefix outer tables, the 
materialized table is scanned."
I used "data" instead of "rows" because the optimizer might be able to 
find all data from the indices (so, not even read "rows").
Maybe this makes a too vague sentence though.

>> Is the comment a description of how things work now?
>> I was surprised to see that materialization-scan is expected work with 
>> the inner table not first, so I tested it and filed BUG#53172. It may 
>> or not be related to BUG#52068 - semijoin materialization has a bit 
>> too many bugs it seems :-(
> 
> If there are so many bugs here, we could perhaps disable the "mixed 
> scan" method. It is not very significant, I think... Øystein has a bit 
> more information about this strategy.

Yes, let's see when we start attacking those bugs. All you did is 
describing in greater detail than the existing comments, that's already 
good reading for us all.

>>> -SJ_MATERIALIZATION_INFO *
>>> -at_sjmat_pos(const JOIN *join, table_map remaining_tables, const 
>>> JOIN_TAB *tab,
>>> -             uint idx, bool *loose_scan)
>>> +static int

Good that you made it static.

>>> +semijoin_order_allows_materialization(const JOIN *join,
>>> +                                      table_map remaining_tables,
>>> +                                      const JOIN_TAB *tab, uint idx)
>>>  {
>>>    /*
>>>     Check if      1. We're in a semi-join nest that can be run with 
>>> SJ-materialization
>>> -    2. All the tables correlated through the IN subquery are in the 
>>> prefix
>>> +    2. All the tables from the subquery are in the prefix
>>>    */
>>> -  TABLE_LIST *emb_sj_nest= tab->emb_sj_nest;
>>> -  table_map suffix= remaining_tables & ~tab->table->map;
>>> -  if (emb_sj_nest && emb_sj_nest->sj_mat_info &&
>>> -      !(suffix & emb_sj_nest->sj_inner_tables))
>>> +  const TABLE_LIST *emb_sj_nest= tab->emb_sj_nest;
>>> +  if (!emb_sj_nest ||
>>> +      !emb_sj_nest->sj_mat_info ||
>>> +      (remaining_tables & emb_sj_nest->sj_inner_tables))
>>
>> The old code would use ~tab->table->map here...?
> 
> If you go back to advance_sj_state(), you see this code line:
> 
> remaining_tables &= ~new_join_tab->table->map;
> 
> so, suffix is identical to remaining_tables, and there is no need for it.

ok. Though then the property "tab is not part of remaining_tables" 
becomes an assumption of semijoin_order_allows_materialization(), I'd 
DBUG_ASSERT() it.

>>> +    return SJ_OPT_NONE;
>>> +
>>> +  /* +    Walk back and check if all immediately preceding tables 
>>> are from
>>> +    this semi-join.
>>> +  */
>>> +  const uint n_tables= my_count_bits(emb_sj_nest->sj_inner_tables);
>>> +  for (uint i= 1; i < n_tables ; i++)
>>>    {
>>> -    /* -      Walk back and check if all immediately preceding 
>>> tables are from
>>> -      this semi-join.
>>> -    */
>>> -    uint n_tables= my_count_bits(tab->emb_sj_nest->sj_inner_tables);
>>> -    for (uint i= 1; i < n_tables ; i++)
>>> -    {
>>> -      if (join->positions[idx - i].table->emb_sj_nest != 
>>> tab->emb_sj_nest)
>>> -        return NULL;
>>> -    }
>>> -    *loose_scan= test(remaining_tables & ~tab->table->map &
>>> -                             (emb_sj_nest->sj_inner_tables |
>>> -                              
>>> emb_sj_nest->nested_join->sj_depends_on));
>>> -    if (*loose_scan &&
> !emb_sj_nest->sj_subq_pred->sjm_scan_allowed)
>>> -      return NULL;
>>> -    else
>>> -      return emb_sj_nest->sj_mat_info;
>>> +    if (join->positions[idx - i].table->emb_sj_nest != emb_sj_nest)
>>> +      return SJ_OPT_NONE;
>>>    }
>>> -  return NULL;
>>> -}
>>>  
>>> +  /*
>>> +    Must use MaterializeScan strategy if there are outer correlated 
>>> tables
>>> +    among the remaining tables, otherwise use MaterializeLookup.
>>> +  */
>>> +  if (remaining_tables & emb_sj_nest->nested_join->sj_depends_on)
>>> +  {
>>> +    if (!emb_sj_nest->sj_subq_pred->sjm_scan_allowed)
>>
>> There seems to be code changes too (I don't see ~tab->table->map, 
>> emb_sj_nest->sj_inner_tables). They may well be fine, but I'd really 
>> need an explanation, please. I don't fully understand what all those 
>> bitmaps are about and why they are not needed in fact...
> 
> The table->map thing is moot, see above.
> 
> We have also tested above that if
> "remaining_tables & emb_sj_nest->sj_inner_tables",
> we have returned SJ_OPT_NONE.
> 
> This is the reason why I could simplify the code.
> 
> It's a bit difficult to annotate every code change with a reason. I 
> cannot write that into the comment section, because it should document 
> the existing code. I could write it into the commit comments, but that 
> means that I need to copy the diffs into the commit section. So 
> refactoring code actually means harder work than writing the initial 
> code :(

Maybe (an idea for a next time) make a commit with verbose comments 
inside the code like "To reviewer: in the above line I could eliminate 
variable X because Y", and when it's time to push, re-commit without 
comments...?

Ok to push. Thanks!

Thread
bzr commit into mysql-6.0-codebase-bugfixing branch (roy.lyseng:3836) WL#5347Roy Lyseng26 Apr
  • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Guilhem Bichot26 Apr
    • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Roy Lyseng26 Apr
      • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Guilhem Bichot26 Apr
        • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Roy Lyseng28 Apr
      • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Øystein Grøvlen27 Apr
        • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Roy Lyseng27 Apr
  • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Øystein Grøvlen27 Apr
    • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Roy Lyseng28 Apr