MySQL Lists are EOL. Please join:

List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:April 28 2010 11:29am
Subject:Re: bzr commit into mysql-6.0-codebase-bugfixing branch
(roy.lyseng:3836) WL#5347
View as plain text  
Guilhem,

thank you for another round of useful comments.

Guilhem Bichot wrote:
> Hello,
> 
> Roy Lyseng a écrit, Le 26.04.2010 20:44:
>> Hi Guilhem,
>>
>> thanks for commenting.
>>
>> Guilhem Bichot wrote:
>>> Hello Roy,
>>>
>>> Roy Lyseng a écrit, Le 26.04.2010 12:54:
>>>> #At file:///home/rl136806/mysql/repo/mysql-6.0-work1/ based on 
>>>> revid:alfranio.correia@stripped
>>>>
>>>>  3836 Roy Lyseng    2010-04-26
>>>>       WL#5347: Refactor optimizer function at_sjmat_pos()
>>>
>>>> @@ -7241,58 +7270,79 @@ optimize_straight_join(JOIN *join, table
>>>>  }
>>>>  
>>>>  
>>>> -/*
>>>> -  Check if the last tables of the partial join order allow to use
>>>> -  sj-materialization strategy for them
>>>> +/**
>>>> +  Check whether a semijoin materialization strategy is allowed for
>>>> +  the current (semi)join table order.
>>>>  
>>>> -  SYNOPSIS
>>>> -    at_sjmat_pos()
>>>> -      join              -      remaining_tables
>>>> -      tab                the last table's join tab
>>>> -      idx                last table's index
>>>> -      loose_scan    OUT  TRUE <=> use LooseScan
>>>> +  @param join              Join object
>>>> +  @param remaining_tables  Tables that have not yet been added to 
>>>> the join plan
>>>> +  @param tab               Join tab of the table being considered
>>>> +  @param idx               Index of table with join tab "tab"
>>>> +
>>>> +  @retval SJ_OPT_NONE               - Materialization not applicable
>>>> +  @retval SJ_OPT_MATERIALIZE_LOOKUP - Materialization with lookup 
>>>> applicable
>>>> +  @retval SJ_OPT_MATERIALIZE_SCAN   - Materialization with scan 
>>>> applicable
>>>>  
>>>> -  RETURN
>>>> -    TRUE   Yes, can apply sj-materialization
>>>> -    FALSE  No, some of the requirements are not met
>>>> +  @details
>>>> +  The function checks applicability of both MaterializeLookup and
>>>> +  MaterializeScan strategies.
>>>> +  No checking is made until "tab" is pointing to the last inner table
>>>> +  of a semijoin nest that can be executed using materialization -
>>>> +  for all other cases SJ_OPT_NONE is returned.
>>>> +  If both MaterializeLookup and MaterializeScan are applicable
>>>> +  (ie if there are no correlated outer tables),
>>>
>>> I don't understand the equivalence ("i.e. if there are" etc)
>>
>> I mean that both strategies are applicable only when there are no 
>> correlated outer tables, ie all outer tables are non-correlated. If 
>> there are correlated tables in the suffix, MaterializeScan must be 
>> used. If there are correlated tables only in the prefix, then 
>> MaterializeLookup must be used.
> 
>  From the text it's not clear: it says that MatScan can handle:
> 
> +    MaterializeScan strategy
> +    ~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +     (ot|nt)*  [ it (it)* ]  (ot|nt)*
> +     +------+  +==========+  +-----+
> +        (1)         (2)         (3)
> +
> 
> This picture accepts the case where the suffix is only "nt", doesn't it?
> But then, this particular MatScan is like a MatLookup...
> I wonder what's the difference between MatScan and MatLookup when there 
> are no "ot" in the suffix?

It is quite likely that the two materialization strategies can be consolidated 
into one at some point in time. But that will be a separate WL...
> 
>> That leaves the case of no correlated outer tables either places, 
>> which could, at least in theory, be executed by both strategies. The 
>> materialized table would degenerate to a boolean scalar, then.
> 
> yes
> 
>>>> +  MaterializeLookup is returned.
>>>> +  This is a purely heuristic decision.
>>>
>>> Is this decision something which is introduced by the patch?
>>
>> No, but the decision is moved into this function. In the existing 
>> version, the caller could make the decision itself. It would be easy 
>> to revert this however, e.g by implementing the return value as a set 
>> of applicable strategies, if somebody wanted it.
> 
> no, it's ok, thanks
> 
>>>> +  MaterializeScan allows correlated outer tables both before and after
>>>> +  the materialized inner tables. In this case, the materialized 
>>>> table must
>>>> +  contain keys from the prefix outer tables followed by keys from the
>>>> +  suffix outer tables, called prefix-key and suffix-key, respectively.
>>>
>>> What do you mean by "keys" here... I think the materialized tables 
>>> has a single index.
>>
>> I should state that as key columns.
> 
> ok
> 
>>>> +  First, a scan is set up over the prefix outer tables. Then a 
>>>> range scan
>>>> +  is done over the prefix-key of the materialized table.
>>>> +  Finally, the suffix-key from the materialized table is used for a 
>>>> scan
>>>> +  over the suffix outer tables.
>>>
>>> mmmm the suffix outer tables are not always scanned, if they have an 
>>> index then we will do a index lookup on them (searching for a value 
>>> which we found when scanning the materialized table).
>>
>> I thought that I could use "scan" as a common term for table scan and 
>> index scan/lookup. Do you want me to be more specific?
> 
> in fact, it's hard to be accurate. Even this:
> "First, a scan is set up over the prefix outer tables"
> can be misleading: if the prefix outer tables are preceded by other 
> tables, then the prefix outer tables may be accessed via "ref" or 
> "unique" index accesses (no scan) based on rows found in preceding other 
> tables.
> Maybe this:
> "first, data is read from the prefix outer tables, then matches are 
> located in the materialized table, then matches are located in the 
> suffix outer tables. If there are no prefix outer tables, the 
> materialized table is scanned."
> I used "data" instead of "rows" because the optimizer might be able to 
> find all data from the indices (so, not even read "rows").
> Maybe this makes a too vague sentence though.

Thanks, I tried to work out a new text based on your suggestion. I object to 
using "data", though, because I think it is too general. I prefer "rows", or 
"data rows" and "index rows" when I need to make the distinction.
> 
>>> Is the comment a description of how things work now?
>>> I was surprised to see that materialization-scan is expected work 
>>> with the inner table not first, so I tested it and filed BUG#53172. 
>>> It may or not be related to BUG#52068 - semijoin materialization has 
>>> a bit too many bugs it seems :-(
>>
>> If there are so many bugs here, we could perhaps disable the "mixed 
>> scan" method. It is not very significant, I think... Øystein has a bit 
>> more information about this strategy.
> 
> Yes, let's see when we start attacking those bugs. All you did is 
> describing in greater detail than the existing comments, that's already 
> good reading for us all.
> 
>>>> -SJ_MATERIALIZATION_INFO *
>>>> -at_sjmat_pos(const JOIN *join, table_map remaining_tables, const 
>>>> JOIN_TAB *tab,
>>>> -             uint idx, bool *loose_scan)
>>>> +static int
> 
> Good that you made it static.
> 
>>>> +semijoin_order_allows_materialization(const JOIN *join,
>>>> +                                      table_map remaining_tables,
>>>> +                                      const JOIN_TAB *tab, uint idx)
>>>>  {
>>>>    /*
>>>>     Check if      1. We're in a semi-join nest that can be run with 
>>>> SJ-materialization
>>>> -    2. All the tables correlated through the IN subquery are in the 
>>>> prefix
>>>> +    2. All the tables from the subquery are in the prefix
>>>>    */
>>>> -  TABLE_LIST *emb_sj_nest= tab->emb_sj_nest;
>>>> -  table_map suffix= remaining_tables & ~tab->table->map;
>>>> -  if (emb_sj_nest && emb_sj_nest->sj_mat_info &&
>>>> -      !(suffix & emb_sj_nest->sj_inner_tables))
>>>> +  const TABLE_LIST *emb_sj_nest= tab->emb_sj_nest;
>>>> +  if (!emb_sj_nest ||
>>>> +      !emb_sj_nest->sj_mat_info ||
>>>> +      (remaining_tables & emb_sj_nest->sj_inner_tables))
>>>
>>> The old code would use ~tab->table->map here...?
>>
>> If you go back to advance_sj_state(), you see this code line:
>>
>> remaining_tables &= ~new_join_tab->table->map;
>>
>> so, suffix is identical to remaining_tables, and there is no need for it.
> 
> ok. Though then the property "tab is not part of remaining_tables" 
> becomes an assumption of semijoin_order_allows_materialization(), I'd 
> DBUG_ASSERT() it.

Done.
> 
>>>> +    return SJ_OPT_NONE;
>>>> +
>>>> +  /* +    Walk back and check if all immediately preceding tables 
>>>> are from
>>>> +    this semi-join.
>>>> +  */
>>>> +  const uint n_tables= my_count_bits(emb_sj_nest->sj_inner_tables);
>>>> +  for (uint i= 1; i < n_tables ; i++)
>>>>    {
>>>> -    /* -      Walk back and check if all immediately preceding 
>>>> tables are from
>>>> -      this semi-join.
>>>> -    */
>>>> -    uint n_tables=
> my_count_bits(tab->emb_sj_nest->sj_inner_tables);
>>>> -    for (uint i= 1; i < n_tables ; i++)
>>>> -    {
>>>> -      if (join->positions[idx - i].table->emb_sj_nest != 
>>>> tab->emb_sj_nest)
>>>> -        return NULL;
>>>> -    }
>>>> -    *loose_scan= test(remaining_tables & ~tab->table->map
> &
>>>> -                             (emb_sj_nest->sj_inner_tables |
>>>> -                              
>>>> emb_sj_nest->nested_join->sj_depends_on));
>>>> -    if (*loose_scan &&
> !emb_sj_nest->sj_subq_pred->sjm_scan_allowed)
>>>> -      return NULL;
>>>> -    else
>>>> -      return emb_sj_nest->sj_mat_info;
>>>> +    if (join->positions[idx - i].table->emb_sj_nest !=
> emb_sj_nest)
>>>> +      return SJ_OPT_NONE;
>>>>    }
>>>> -  return NULL;
>>>> -}
>>>>  
>>>> +  /*
>>>> +    Must use MaterializeScan strategy if there are outer correlated 
>>>> tables
>>>> +    among the remaining tables, otherwise use MaterializeLookup.
>>>> +  */
>>>> +  if (remaining_tables &
> emb_sj_nest->nested_join->sj_depends_on)
>>>> +  {
>>>> +    if (!emb_sj_nest->sj_subq_pred->sjm_scan_allowed)
>>>
>>> There seems to be code changes too (I don't see ~tab->table->map, 
>>> emb_sj_nest->sj_inner_tables). They may well be fine, but I'd really 
>>> need an explanation, please. I don't fully understand what all those 
>>> bitmaps are about and why they are not needed in fact...
>>
>> The table->map thing is moot, see above.
>>
>> We have also tested above that if
>> "remaining_tables & emb_sj_nest->sj_inner_tables",
>> we have returned SJ_OPT_NONE.
>>
>> This is the reason why I could simplify the code.
>>
>> It's a bit difficult to annotate every code change with a reason. I 
>> cannot write that into the comment section, because it should document 
>> the existing code. I could write it into the commit comments, but that 
>> means that I need to copy the diffs into the commit section. So 
>> refactoring code actually means harder work than writing the initial 
>> code :(
> 
> Maybe (an idea for a next time) make a commit with verbose comments 
> inside the code like "To reviewer: in the above line I could eliminate 
> variable X because Y", and when it's time to push, re-commit without 
> comments...?

Good idea...
> 
> Ok to push. Thanks!
> 
> 
Thanks,
Roy
Thread
bzr commit into mysql-6.0-codebase-bugfixing branch (roy.lyseng:3836) WL#5347Roy Lyseng26 Apr
  • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Guilhem Bichot26 Apr
    • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Roy Lyseng26 Apr
      • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Guilhem Bichot26 Apr
        • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Roy Lyseng28 Apr
      • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Øystein Grøvlen27 Apr
        • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Roy Lyseng27 Apr
  • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Øystein Grøvlen27 Apr
    • Re: bzr commit into mysql-6.0-codebase-bugfixing branch(roy.lyseng:3836) WL#5347Roy Lyseng28 Apr