Hello,
Roy Lyseng a écrit, Le 12.08.2010 10:32:
>> Minor comments below, nothing critical, ok to push anyway.
>>
>> Roy Lyseng a écrit, Le 27.07.2010 10:24:
>>> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
>>> revid:epotemkin@stripped
>>>
>>> 3219 Roy Lyseng 2010-07-27
>>> Bug#50489: another segfault in fix_semijoin_strategies...
>>> === modified file 'sql/sql_select.cc'
>>> --- a/sql/sql_select.cc 2010-07-26 11:34:07 +0000
>>> +++ b/sql/sql_select.cc 2010-07-27 08:23:50 +0000
>>
>>> @@ -1040,8 +1039,7 @@ bool subquery_types_allow_materializatio
>>> */
>>>
>>> static -bool semijoin_types_allow_materialization(TABLE_LIST *sj_nest,
>>> - bool *sjm_scan_allowed)
>>> +bool semijoin_types_allow_materialization(TABLE_LIST *sj_nest)
>>> {
>>> DBUG_ENTER("semijoin_types_allow_materialization");
>>>
>>> @@ -1051,7 +1049,7 @@ bool semijoin_types_allow_materializatio
>>> List_iterator<Item> it1(sj_nest->nested_join->sj_outer_exprs);
>>> List_iterator<Item> it2(sj_nest->nested_join->sj_inner_exprs);
>>>
>>> - *sjm_scan_allowed= FALSE;
>>> + sj_nest->nested_join->sjm.scan_allowed= FALSE;
>>>
>>> bool all_are_fields= TRUE;
>>> Item *outer, *inner;
>>> @@ -1062,7 +1060,7 @@ bool semijoin_types_allow_materializatio
>>> if (!types_allow_materialization(outer, inner))
>>> DBUG_RETURN(FALSE);
>>> }
>>> - *sjm_scan_allowed= all_are_fields;
>>> + sj_nest->nested_join->sjm.scan_allowed= all_are_fields;
>>
>> In the patch there are many occurences of sj_nest->nested_join->sjm, I
>> wonder if
>> those repeated indirections are a problem or not (i.e. this sjm should be
>> locally cached in a variable or not).
>> Same question for optimizer_semijoin_nests(),
>> semijoin_order_allows_materialization()...
>
> The "sjm." should not cause an indirection, so that should not be a
> problem.
>
> As for "sj_nest->nested_join", I hope that we can sometime refactor
> TABLE_LIST into an object hierarchy. Then, there would be a "nested
> join" derived class, and the object pointed to by nested_join would go
> into this class.
>
> If you want encouragement, just look into the comment titled "Table
> reference in the FROM clause" in table.h, and see all the non-systematic
> testing that must be applied for various table types ;)
ok, ok...
> Besides, a repeated indirection of something that is in first-level
> cache should be very fast anyway.
>
> So I see the point that any unnecessary overhead should be eliminated,
> but in this case I prefer to keep this code and leave any optimization
> to a later refactoring.
ok
>>> DBUG_PRINT("info",("semijoin_types_allow_materialization: ok,
>>> allowed"));
>>> DBUG_RETURN(TRUE);
>>> }
>>> @@ -1362,7 +1360,7 @@ static bool sj_table_is_included(JOIN *j
>>> It may be possible to consolidate the materialization strategies into
>>> one.
>>> The choice between the strategies is made by the join optimizer (see
>>> - advance_sj_state() and
>>> fix_semijoin_strategies_for_picked_join_order()).
>>> + advance_sj_state() and fix_semijoin_strategies()).
>>> This function sets up all fields/structures/etc needed for execution
>>> except
>>> for setup/initialization of semi-join materialization which is done in
>>> setup_sj_materialization() (todo: can't we move that to here also?)
>>> @@ -4873,6 +4871,17 @@ static bool optimize_semijoin_nests(JOIN
>>> DBUG_ENTER("optimize_semijoin_nests");
>>> List_iterator<TABLE_LIST>
> sj_list_it(join->select_lex->sj_nests);
>>> TABLE_LIST *sj_nest;
>>> +
>>> + /*
>>> + Perform necessary cleanup in case semijoin nests have been used in
>>> prior
>>> + executions.
>>> + */
>>
>> ideally cleanup should be done by the prior executions when they end
>> (that's
>> supposed to be the "clean house" principle in MySQL code: don't leave
>> it to next
>> inhabitants to clean up after you). I don't know whether it's safely
>> possible.
>
> I like that principle too... In this case, I just went for the simple
> solution, as I wanted a quick fix to see if the solution worked :)
>
> Will investigate more...
thanks!
>>
>>> {
>>> JOIN_TAB *tab= join->best_positions[i].table;
>>> join->map2table[tab->table->tablenr]= tab;
>>> @@ -4955,10 +4954,16 @@ static bool optimize_semijoin_nests(JOIN
>>> double rows= 1.0;
>>> while ((tableno = tm_it.next_bit()) != Table_map_iterator::BITMAP_END)
>>> rows *= join->map2table[tableno]->table->quick_condition_rows;
>>> - sjm->rows= min(sjm->rows, rows);
>>> + sj_nest->nested_join->sjm.rows=
> min(sj_nest->nested_join->sjm.rows,
>>> + rows);
>>> }
>>> - memcpy(sjm->positions, join->best_positions + join->const_tables,
> -
>>> sizeof(POSITION) * n_tables);
>>> + if (!(sj_nest->nested_join->sjm.positions=
>>> + (st_position*)join->thd->alloc(sizeof(st_position)*n_tables)))
>>
>> st_position or POSITION? Further down the patch uses POSITION in the
>> reverse
>> memcpy().
>
> It is the curse of having two names for all structs. As I prefer the
> lowercase name generally (neither of the names adhere to the coding
> standard), I would like to keep st_position here.
that's ok.
>>> -static void fix_semijoin_strategies_for_picked_join_order(JOIN *join)
>>> +static bool fix_semijoin_strategies(JOIN *join)
>>
>> The old name was clear about when it happens. "Fix" is so much used in
>> MySQL, to
>> me it just means "let me correct this stuff"; even advance_sj_state()
>> could be
>> called fix_semijoin_strategies.
>
> I did the change because I thought "for_picked_join_order" was redundant
> and it made the name become unnecessary long. It is also clearly
> documented in the header that it must be called after the join order is
> fixed, and there is no real sense in fixing the strategies before the
> join order is correct.
>
> But as you did not like it, I will revert this part of the patch.
thanks
>>
>>> + SJ_MATERIALIZATION_INFO* sjm;
>>> + if (!(sjm= new SJ_MATERIALIZATION_INFO))
>>> + DBUG_RETURN(TRUE);
>>
>> I find "sjm" is confusing: here it's used for a
>> SJ_MATERIALIZATION_INFO object,
>> whereas in structure st_semijoin_mat "sjm" is a st_semijoin_mat; and
>> one goal of
>> the patch is to separate both structs.
>
> Yes, standalone names must have more expressive names...
>
> How about sjm_exec? (sjm_info is just as bad as sjm...)
yes, sjm_exec is good. Maybe sjm_opt for the optimization-related
counterpart?
>>> else if (sjm_strategy == SJ_OPT_MATERIALIZE_LOOKUP)
>>> {
>>> COST_VECT prefix_cost; - SJ_MATERIALIZATION_INFO *mat_info=
>>> emb_sj_nest->sj_mat_info;
>>> - signed int first_tab= (int)idx - mat_info->tables;
>>> + signed int first_tab=
>>
>> this is not your code, but I wonder whether 'signed' is useful here.
>
> I inserted DBUG_ASSERT(first_tab >= 0); and got verification that
> first_tab may become -1.
right; it should not be unsigned, and plain "int" would do.
>>> @@ -13714,11 +13735,11 @@ void advance_sj_state(JOIN *join, table_
>>> {
>>> TABLE_LIST *mat_nest=
>>> join->positions[pos->sjm_scan_last_inner].table->emb_sj_nest;
>>> - SJ_MATERIALIZATION_INFO *mat_info= mat_nest->sj_mat_info;
>>> + uint tables= my_count_bits(mat_nest->sj_inner_tables);
>>
>> several my_count_bits() calls added over the patch, maybe the number
>> of tables
>> should be cached?
>
> It's almost always sj_inner_tables. It is probably a good idea to add an
> "uint table_count" to the same struct. But sj_inner_tables is part of
> TABLE_LIST, and this should really be part of nested_join, as several
> TABLE_LIST objects implicitly refer a single table.
>
> I think that this is better left to a future refactoring.
ok