Roy Lyseng a écrit, Le 12.08.2010 10:32:
>> Minor comments below, nothing critical, ok to push anyway.
>> Roy Lyseng a écrit, Le 27.07.2010 10:24:
>>> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
>>> 3219 Roy Lyseng 2010-07-27
>>> Bug#50489: another segfault in fix_semijoin_strategies...
>>> === modified file 'sql/sql_select.cc'
>>> --- a/sql/sql_select.cc 2010-07-26 11:34:07 +0000
>>> +++ b/sql/sql_select.cc 2010-07-27 08:23:50 +0000
>>> @@ -1040,8 +1039,7 @@ bool subquery_types_allow_materializatio
>>> static -bool semijoin_types_allow_materialization(TABLE_LIST *sj_nest,
>>> - bool *sjm_scan_allowed)
>>> +bool semijoin_types_allow_materialization(TABLE_LIST *sj_nest)
>>> @@ -1051,7 +1049,7 @@ bool semijoin_types_allow_materializatio
>>> List_iterator<Item> it1(sj_nest->nested_join->sj_outer_exprs);
>>> List_iterator<Item> it2(sj_nest->nested_join->sj_inner_exprs);
>>> - *sjm_scan_allowed= FALSE;
>>> + sj_nest->nested_join->sjm.scan_allowed= FALSE;
>>> bool all_are_fields= TRUE;
>>> Item *outer, *inner;
>>> @@ -1062,7 +1060,7 @@ bool semijoin_types_allow_materializatio
>>> if (!types_allow_materialization(outer, inner))
>>> - *sjm_scan_allowed= all_are_fields;
>>> + sj_nest->nested_join->sjm.scan_allowed= all_are_fields;
>> In the patch there are many occurences of sj_nest->nested_join->sjm, I
>> wonder if
>> those repeated indirections are a problem or not (i.e. this sjm should be
>> locally cached in a variable or not).
>> Same question for optimizer_semijoin_nests(),
> The "sjm." should not cause an indirection, so that should not be a
> As for "sj_nest->nested_join", I hope that we can sometime refactor
> TABLE_LIST into an object hierarchy. Then, there would be a "nested
> join" derived class, and the object pointed to by nested_join would go
> into this class.
> If you want encouragement, just look into the comment titled "Table
> reference in the FROM clause" in table.h, and see all the non-systematic
> testing that must be applied for various table types ;)
> Besides, a repeated indirection of something that is in first-level
> cache should be very fast anyway.
> So I see the point that any unnecessary overhead should be eliminated,
> but in this case I prefer to keep this code and leave any optimization
> to a later refactoring.
>>> DBUG_PRINT("info",("semijoin_types_allow_materialization: ok,
>>> @@ -1362,7 +1360,7 @@ static bool sj_table_is_included(JOIN *j
>>> It may be possible to consolidate the materialization strategies into
>>> The choice between the strategies is made by the join optimizer (see
>>> - advance_sj_state() and
>>> + advance_sj_state() and fix_semijoin_strategies()).
>>> This function sets up all fields/structures/etc needed for execution
>>> for setup/initialization of semi-join materialization which is done in
>>> setup_sj_materialization() (todo: can't we move that to here also?)
>>> @@ -4873,6 +4871,17 @@ static bool optimize_semijoin_nests(JOIN
>>> TABLE_LIST *sj_nest;
>>> + /*
>>> + Perform necessary cleanup in case semijoin nests have been used in
>>> + executions.
>>> + */
>> ideally cleanup should be done by the prior executions when they end
>> supposed to be the "clean house" principle in MySQL code: don't leave
>> it to next
>> inhabitants to clean up after you). I don't know whether it's safely
> I like that principle too... In this case, I just went for the simple
> solution, as I wanted a quick fix to see if the solution worked :)
> Will investigate more...
>>> JOIN_TAB *tab= join->best_positions[i].table;
>>> join->map2table[tab->table->tablenr]= tab;
>>> @@ -4955,10 +4954,16 @@ static bool optimize_semijoin_nests(JOIN
>>> double rows= 1.0;
>>> while ((tableno = tm_it.next_bit()) != Table_map_iterator::BITMAP_END)
>>> rows *= join->map2table[tableno]->table->quick_condition_rows;
>>> - sjm->rows= min(sjm->rows, rows);
>>> + sj_nest->nested_join->sjm.rows=
>>> + rows);
>>> - memcpy(sjm->positions, join->best_positions + join->const_tables,
>>> sizeof(POSITION) * n_tables);
>>> + if (!(sj_nest->nested_join->sjm.positions=
>>> + (st_position*)join->thd->alloc(sizeof(st_position)*n_tables)))
>> st_position or POSITION? Further down the patch uses POSITION in the
> It is the curse of having two names for all structs. As I prefer the
> lowercase name generally (neither of the names adhere to the coding
> standard), I would like to keep st_position here.
>>> -static void fix_semijoin_strategies_for_picked_join_order(JOIN *join)
>>> +static bool fix_semijoin_strategies(JOIN *join)
>> The old name was clear about when it happens. "Fix" is so much used in
>> MySQL, to
>> me it just means "let me correct this stuff"; even advance_sj_state()
>> could be
>> called fix_semijoin_strategies.
> I did the change because I thought "for_picked_join_order" was redundant
> and it made the name become unnecessary long. It is also clearly
> documented in the header that it must be called after the join order is
> fixed, and there is no real sense in fixing the strategies before the
> join order is correct.
> But as you did not like it, I will revert this part of the patch.
>>> + SJ_MATERIALIZATION_INFO* sjm;
>>> + if (!(sjm= new SJ_MATERIALIZATION_INFO))
>>> + DBUG_RETURN(TRUE);
>> I find "sjm" is confusing: here it's used for a
>> SJ_MATERIALIZATION_INFO object,
>> whereas in structure st_semijoin_mat "sjm" is a st_semijoin_mat; and
>> one goal of
>> the patch is to separate both structs.
> Yes, standalone names must have more expressive names...
> How about sjm_exec? (sjm_info is just as bad as sjm...)
yes, sjm_exec is good. Maybe sjm_opt for the optimization-related
>>> else if (sjm_strategy == SJ_OPT_MATERIALIZE_LOOKUP)
>>> COST_VECT prefix_cost; - SJ_MATERIALIZATION_INFO *mat_info=
>>> - signed int first_tab= (int)idx - mat_info->tables;
>>> + signed int first_tab=
>> this is not your code, but I wonder whether 'signed' is useful here.
> I inserted DBUG_ASSERT(first_tab >= 0); and got verification that
> first_tab may become -1.
right; it should not be unsigned, and plain "int" would do.
>>> @@ -13714,11 +13735,11 @@ void advance_sj_state(JOIN *join, table_
>>> TABLE_LIST *mat_nest=
>>> - SJ_MATERIALIZATION_INFO *mat_info= mat_nest->sj_mat_info;
>>> + uint tables= my_count_bits(mat_nest->sj_inner_tables);
>> several my_count_bits() calls added over the patch, maybe the number
>> of tables
>> should be cached?
> It's almost always sj_inner_tables. It is probably a good idea to add an
> "uint table_count" to the same struct. But sj_inner_tables is part of
> TABLE_LIST, and this should really be part of nested_join, as several
> TABLE_LIST objects implicitly refer a single table.
> I think that this is better left to a future refactoring.