List:Commits« Previous MessageNext Message »
From:Roy Lyseng Date:August 12 2010 8:32am
Subject:Re: bzr commit into mysql-next-mr-opt-backporting branch (roy.lyseng:3219)
Bug#50489
View as plain text  
Hi Guilhem,

thank you for reviewing!

On 09.08.10 11.13, Guilhem Bichot wrote:
> Hello,
>
> Minor comments below, nothing critical, ok to push anyway.
>
> Roy Lyseng a écrit, Le 27.07.2010 10:24:
>> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
>> revid:epotemkin@stripped
>>
>> 3219 Roy Lyseng 2010-07-27
>> Bug#50489: another segfault in fix_semijoin_strategies...
>
>
>> === modified file 'sql/sql_class.h'
>> --- a/sql/sql_class.h 2010-07-23 18:14:59 +0000
>> +++ b/sql/sql_class.h 2010-07-27 08:23:50 +0000
>> @@ -3360,43 +3360,16 @@ struct st_table_ref;
>>
>>
>> /*
>
> /** will make doxygen pick this comment up.

Yepp.
>
>> - Optimizer and executor structure for the materialized semi-join info. This
>> - structure contains
>> + Executor structure for the materialized semi-join info, which contains
>> - The sj-materialization temporary table
>> - Members needed to make index lookup or a full scan of the temptable.
>> */
>> class SJ_MATERIALIZATION_INFO : public Sql_alloc
>
> The is_sj_scan member could optionally be renamed to is_scan, as "sj" is obvious
> in the context of this object.
> And sjm_table_cols could optionally be renamed to table_cols.

and sjm_table_param -> table_param.
>
>> === modified file 'sql/sql_select.cc'
>> --- a/sql/sql_select.cc 2010-07-26 11:34:07 +0000
>> +++ b/sql/sql_select.cc 2010-07-27 08:23:50 +0000
>
>> @@ -1040,8 +1039,7 @@ bool subquery_types_allow_materializatio
>> */
>>
>> static -bool semijoin_types_allow_materialization(TABLE_LIST *sj_nest,
>> - bool *sjm_scan_allowed)
>> +bool semijoin_types_allow_materialization(TABLE_LIST *sj_nest)
>> {
>> DBUG_ENTER("semijoin_types_allow_materialization");
>>
>> @@ -1051,7 +1049,7 @@ bool semijoin_types_allow_materializatio
>> List_iterator<Item> it1(sj_nest->nested_join->sj_outer_exprs);
>> List_iterator<Item> it2(sj_nest->nested_join->sj_inner_exprs);
>>
>> - *sjm_scan_allowed= FALSE;
>> + sj_nest->nested_join->sjm.scan_allowed= FALSE;
>>
>> bool all_are_fields= TRUE;
>> Item *outer, *inner;
>> @@ -1062,7 +1060,7 @@ bool semijoin_types_allow_materializatio
>> if (!types_allow_materialization(outer, inner))
>> DBUG_RETURN(FALSE);
>> }
>> - *sjm_scan_allowed= all_are_fields;
>> + sj_nest->nested_join->sjm.scan_allowed= all_are_fields;
>
> In the patch there are many occurences of sj_nest->nested_join->sjm, I wonder
> if
> those repeated indirections are a problem or not (i.e. this sjm should be
> locally cached in a variable or not).
> Same question for optimizer_semijoin_nests(),
> semijoin_order_allows_materialization()...

The "sjm." should not cause an indirection, so that should not be a problem.

As for "sj_nest->nested_join", I hope that we can sometime refactor TABLE_LIST 
into an object hierarchy. Then, there would be a "nested join" derived class, 
and the object pointed to by nested_join would go into this class.

If you want encouragement, just look into the comment titled "Table reference in 
the FROM clause" in table.h, and see all the non-systematic testing that must be 
applied for various table types ;)

Besides, a repeated indirection of something that is in first-level cache should 
be very fast anyway.

So I see the point that any unnecessary overhead should be eliminated, but in 
this case I prefer to keep this code and leave any optimization to a later 
refactoring.
>
>> DBUG_PRINT("info",("semijoin_types_allow_materialization: ok, allowed"));
>> DBUG_RETURN(TRUE);
>> }
>> @@ -1362,7 +1360,7 @@ static bool sj_table_is_included(JOIN *j
>> It may be possible to consolidate the materialization strategies into one.
>> The choice between the strategies is made by the join optimizer (see
>> - advance_sj_state() and fix_semijoin_strategies_for_picked_join_order()).
>> + advance_sj_state() and fix_semijoin_strategies()).
>> This function sets up all fields/structures/etc needed for execution except
>> for setup/initialization of semi-join materialization which is done in
>> setup_sj_materialization() (todo: can't we move that to here also?)
>> @@ -4873,6 +4871,17 @@ static bool optimize_semijoin_nests(JOIN
>> DBUG_ENTER("optimize_semijoin_nests");
>> List_iterator<TABLE_LIST> sj_list_it(join->select_lex->sj_nests);
>> TABLE_LIST *sj_nest;
>> +
>> + /*
>> + Perform necessary cleanup in case semijoin nests have been used in prior
>> + executions.
>> + */
>
> ideally cleanup should be done by the prior executions when they end (that's
> supposed to be the "clean house" principle in MySQL code: don't leave it to next
> inhabitants to clean up after you). I don't know whether it's safely possible.

I like that principle too... In this case, I just went for the simple solution, 
as I wanted a quick fix to see if the solution worked :)

Will investigate more...
>
>> + while ((sj_nest= sj_list_it++))
>> + {
>> + sj_nest->nested_join->sjm.positions= NULL;
>> + sj_nest->sj_mat_info= NULL;
>> + }
>> + sj_list_it.rewind();
>> /*
>> The statement may have been executed with 'semijoin=on' earlier.
>> We need to verify that 'semijoin=on' still holds.
>> @@ -4888,14 +4897,12 @@ static bool optimize_semijoin_nests(JOIN
>> Try semijoin materialization if the semijoin is classified as
>> non-trivially-correlated.
>> */ - sj_nest->sj_mat_info= NULL;
>> if (sj_nest->nested_join->sj_corr_tables)
>> continue;
>> /*
>> Check whether data types allow execution with materialization.
>> */
>> - bool sjm_scan_allowed;
>> - if (semijoin_types_allow_materialization(sj_nest, &sjm_scan_allowed))
>> + if (semijoin_types_allow_materialization(sj_nest))
>> {
>> join->emb_sjm_nest= sj_nest;
>> if (choose_plan(join, all_table_map & ~join->const_table_map))
>> @@ -4905,21 +4912,13 @@ static bool optimize_semijoin_nests(JOIN
>> save it.
>> */
>> uint n_tables= my_count_bits(sj_nest->sj_inner_tables);
>> - SJ_MATERIALIZATION_INFO* sjm;
>> - if (!(sjm= new SJ_MATERIALIZATION_INFO) ||
>> - !(sjm->positions= (POSITION*)join->thd->alloc(sizeof(POSITION)*
>> - n_tables)))
>> - DBUG_RETURN(TRUE); /* purecov: inspected */
>> - sjm->tables= n_tables;
>> - sjm->is_used= FALSE;
>> - sjm->sjm_scan_allowed= sjm_scan_allowed;
>> -
>> double subjoin_out_rows, subjoin_read_time;
>> get_partial_join_cost(join, n_tables,
>> &subjoin_read_time, &subjoin_out_rows);
>>
>> - sjm->materialization_cost.convert_from_cost(subjoin_read_time);
>> - sjm->rows= subjoin_out_rows;
>> + sj_nest->nested_join->sjm.materialization_cost
>> + .convert_from_cost(subjoin_read_time);
>> + sj_nest->nested_join->sjm.rows= subjoin_out_rows;
>>
>> List<Item> &right_expr_list=
>> sj_nest->sj_subq_pred->unit->first_select()->item_list;
>> @@ -4939,7 +4938,7 @@ static bool optimize_semijoin_nests(JOIN
>> nest (the payoff is probably less here?)
>> */
>> {
>> - for (uint i=0 ; i < join->const_tables + sjm->tables ; i++)
>> + for (uint i=0 ; i < join->const_tables + n_tables ; i++)
>
> This uses n_tables quite far after its initialization; to make sure that no-one
> adds code which changes n_tables meanwhile, I would declare it const.

Good idea.
>
>> {
>> JOIN_TAB *tab= join->best_positions[i].table;
>> join->map2table[tab->table->tablenr]= tab;
>> @@ -4955,10 +4954,16 @@ static bool optimize_semijoin_nests(JOIN
>> double rows= 1.0;
>> while ((tableno = tm_it.next_bit()) != Table_map_iterator::BITMAP_END)
>> rows *= join->map2table[tableno]->table->quick_condition_rows;
>> - sjm->rows= min(sjm->rows, rows);
>> + sj_nest->nested_join->sjm.rows=
> min(sj_nest->nested_join->sjm.rows,
>> + rows);
>> }
>> - memcpy(sjm->positions, join->best_positions + join->const_tables, -
>> sizeof(POSITION) * n_tables);
>> + if (!(sj_nest->nested_join->sjm.positions=
>> + (st_position*)join->thd->alloc(sizeof(st_position)*n_tables)))
>
> st_position or POSITION? Further down the patch uses POSITION in the reverse
> memcpy().

It is the curse of having two names for all structs. As I prefer the lowercase 
name generally (neither of the names adhere to the coding standard), I would 
like to keep st_position here.

A brief grep found 68 occurrences of POSITION and 85 occurrences of st_position.

Or I could replace both with Join_position ;)
>
>> + DBUG_RETURN(TRUE);
>> +
>> + memcpy(sj_nest->nested_join->sjm.positions,
>> + join->best_positions + join->const_tables, + sizeof(st_position) *
> n_tables);
>>
>> /*
>> Calculate temporary table parameters and usage costs
>> @@ -4974,19 +4979,19 @@ static bool optimize_semijoin_nests(JOIN
>> Let materialization cost include the cost to write the data into the
>> temporary table:
>> */ - sjm->materialization_cost.add_io(subjoin_out_rows, lookup_cost);
>> + sj_nest->nested_join->sjm.materialization_cost
>> + .add_io(subjoin_out_rows, lookup_cost);
>> /*
>> Set the cost to do a full scan of the temptable (will need this to consider
>> doing sjm-scan):
>> */ - sjm->scan_cost.zero();
>> - if (sjm->rows > 0.0)
>> - sjm->scan_cost.add_io(sjm->rows, lookup_cost);
>> -
>> - sjm->lookup_cost.convert_from_cost(lookup_cost);
>> - sj_nest->sj_mat_info= sjm;
>> - DBUG_EXECUTE("opt", print_sjm(sjm););
>> + sj_nest->nested_join->sjm.scan_cost.zero();
>> + if (sj_nest->nested_join->sjm.rows > 0.0)
>> + sj_nest->nested_join->sjm.scan_cost
>> + .add_io(sj_nest->nested_join->sjm.rows, lookup_cost);
>> +
>> + sj_nest->nested_join->sjm.lookup_cost.convert_from_cost(lookup_cost);
>> }
>> }
>> }
>> @@ -7504,7 +7509,7 @@ semijoin_order_allows_materialization(co
>> */
>> const TABLE_LIST *emb_sj_nest= tab->emb_sj_nest;
>> if (!emb_sj_nest ||
>> - !emb_sj_nest->sj_mat_info ||
>> + !emb_sj_nest->nested_join->sjm.positions ||
>> (remaining_tables & emb_sj_nest->sj_inner_tables))
>> return SJ_OPT_NONE;
>>
>> @@ -7525,7 +7530,7 @@ semijoin_order_allows_materialization(co
>> */
>> if (remaining_tables & emb_sj_nest->nested_join->sj_depends_on)
>> {
>> - if (emb_sj_nest->sj_mat_info->sjm_scan_allowed)
>> + if (emb_sj_nest->nested_join->sjm.scan_allowed)
>> return SJ_OPT_MATERIALIZE_SCAN;
>> return SJ_OPT_NONE;
>> }
>> @@ -8184,7 +8189,7 @@ prev_record_reads(JOIN *join, uint idx, Fix semi-join
>> strategies for the picked join order
>>
>> SYNOPSIS
>> - fix_semijoin_strategies_for_picked_join_order()
>> + fix_semijoin_strategies()
>> join The join with the picked join order
>>
>> DESCRIPTION
>> @@ -8222,16 +8227,20 @@ prev_record_reads(JOIN *join, uint idx, join and
>> semi-join order from left to right.
>> */
>> -static void fix_semijoin_strategies_for_picked_join_order(JOIN *join)
>> +static bool fix_semijoin_strategies(JOIN *join)
>
> The old name was clear about when it happens. "Fix" is so much used in MySQL, to
> me it just means "let me correct this stuff"; even advance_sj_state() could be
> called fix_semijoin_strategies.

I did the change because I thought "for_picked_join_order" was redundant and it 
made the name become unnecessary long. It is also clearly documented in the 
header that it must be called after the join order is fixed, and there is no 
real sense in fixing the strategies before the join order is correct.

But as you did not like it, I will revert this part of the patch.
>
>> {
>> uint table_count=join->tables;
>> uint tablenr;
>> table_map remaining_tables= 0;
>> table_map handled_tabs= 0;
>> +
>> + DBUG_ENTER("fix_semijoin_strategies");
>> +
>> for (tablenr= table_count - 1 ; tablenr != join->const_tables - 1; tablenr--)
>> {
>> POSITION *pos= join->best_positions + tablenr;
>> JOIN_TAB *s= pos->table;
>> + TABLE_LIST *emb_sj_nest= s->emb_sj_nest;
>> uint first;
>> LINT_INIT(first); // Set by every branch except SJ_OPT_NONE which doesn't use it
>>
>> @@ -8243,8 +8252,12 @@ static void fix_semijoin_strategies_for_
>>
>> if (pos->sj_strategy == SJ_OPT_MATERIALIZE_LOOKUP)
>> {
>> - SJ_MATERIALIZATION_INFO *sjm= s->emb_sj_nest->sj_mat_info;
>> - sjm->is_used= TRUE;
>> + uint tables= my_count_bits(emb_sj_nest->sj_inner_tables);
>
> I would declare 'tables' const

Absolutely.
>
>> + SJ_MATERIALIZATION_INFO* sjm;
>> + if (!(sjm= new SJ_MATERIALIZATION_INFO))
>> + DBUG_RETURN(TRUE);
>
> I find "sjm" is confusing: here it's used for a SJ_MATERIALIZATION_INFO object,
> whereas in structure st_semijoin_mat "sjm" is a st_semijoin_mat; and one goal of
> the patch is to separate both structs.

Yes, standalone names must have more expressive names...

How about sjm_exec? (sjm_info is just as bad as sjm...)
>
>> + s->emb_sj_nest->sj_mat_info= sjm;
>> + sjm->tables= tables;
>> sjm->is_sj_scan= FALSE;
>> /*
>> This memcpy() copies a partial QEP produced by
>> @@ -8261,23 +8274,31 @@ static void fix_semijoin_strategies_for_
>> setting it to SJ_OPT_NONE). But until then, pos->sj_strategy should
>> not be read.
>> */
>> - memcpy(pos - sjm->tables + 1, sjm->positions, - sizeof(POSITION) *
>> sjm->tables);
>> - first= tablenr - sjm->tables + 1;
>> - join->best_positions[first].n_sj_tables= sjm->tables;
>> + memcpy(pos - tables + 1, emb_sj_nest->nested_join->sjm.positions, +
>> sizeof(POSITION) * tables);
>> + first= tablenr - tables + 1;
>> + join->best_positions[first].n_sj_tables= tables;
>> join->best_positions[first].sj_strategy= SJ_OPT_MATERIALIZE_LOOKUP;
>> +
>> + DBUG_EXECUTE("opt", print_sjm(emb_sj_nest););
>> }
>> else if (pos->sj_strategy == SJ_OPT_MATERIALIZE_SCAN)
>> {
>> POSITION *first_inner= join->best_positions + pos->sjm_scan_last_inner;
>> - SJ_MATERIALIZATION_INFO *sjm=
> first_inner->table->emb_sj_nest->sj_mat_info;
>> - sjm->is_used= TRUE;
>> + TABLE_LIST *mat_nest= first_inner->table->emb_sj_nest;
>> + uint tables= my_count_bits(mat_nest->sj_inner_tables);
>> + SJ_MATERIALIZATION_INFO* sjm;
>> + if (!(sjm= new SJ_MATERIALIZATION_INFO))
>> + DBUG_RETURN(TRUE);
>> + mat_nest->sj_mat_info= sjm;
>> + sjm->tables= tables;
>> sjm->is_sj_scan= TRUE;
>> - first= pos->sjm_scan_last_inner - sjm->tables + 1;
>> + first= pos->sjm_scan_last_inner - tables + 1;
>> memcpy(join->best_positions + first, // stale semijoin strategy here too
>> - sjm->positions, sizeof(POSITION) * sjm->tables);
>> + mat_nest->nested_join->sjm.positions,
>> + sizeof(POSITION) * tables);
>> join->best_positions[first].sj_strategy= SJ_OPT_MATERIALIZE_SCAN;
>> - join->best_positions[first].n_sj_tables= sjm->tables;
>> + join->best_positions[first].n_sj_tables= tables;
>> /* Do what advance_sj_state did: re-run best_access_path for every table
>> in the [last_inner_table + 1; pos..) range
>> @@ -8290,22 +8311,24 @@ static void fix_semijoin_strategies_for_
>> prefix_rec_count= join->best_positions[first-1].prefix_record_count;
>> /* Add materialization record count*/
>> - prefix_rec_count *= sjm->rows;
>> + prefix_rec_count *= mat_nest->nested_join->sjm.rows;
>> uint i;
>> table_map rem_tables= remaining_tables;
>> - for (i= tablenr; i != (first + sjm->tables - 1); i--)
>> + for (i= tablenr; i != (first + tables - 1); i--)
>> rem_tables |= join->best_positions[i].table->table->map;
>>
>> POSITION dummy;
>> join->cur_sj_inner_tables= 0;
>> - for (i= first + sjm->tables; i <= tablenr; i++)
>> + for (i= first + tables; i <= tablenr; i++)
>> {
>> best_access_path(join, join->best_positions[i].table, rem_tables, i, FALSE,
>> prefix_rec_count, join->best_positions + i, &dummy);
>> prefix_rec_count *= join->best_positions[i].records_read;
>> rem_tables &= ~join->best_positions[i].table->table->map;
>> }
>> +
>> + DBUG_EXECUTE("opt", print_sjm(emb_sj_nest););
>> }
>> else if (pos->sj_strategy == SJ_OPT_FIRST_MATCH)
>> {
>> @@ -8400,6 +8423,8 @@ static void fix_semijoin_strategies_for_
>> pos->sj_strategy= SJ_OPT_NONE;
>> remaining_tables |= s->table->map;
>> }
>> +
>> + DBUG_RETURN(FALSE);
>> }
>>
>>
>> @@ -8413,8 +8438,7 @@ static void fix_semijoin_strategies_for_
>>
>> DESCRIPTION
>> Setup join structures according the picked join order
>> - - finalize semi-join strategy choices (see
>> - fix_semijoin_strategies_for_picked_join_order)
>> + - finalize semi-join strategy choices (see fix_semijoin_strategies)
>> - create join->join_tab array and put there the JOIN_TABs in the join order
>> - create data structures describing ref access methods.
>>
>> @@ -8441,7 +8465,8 @@ static bool get_best_combination(JOIN *j
>>
>> used_tables= OUTER_REF_TABLE_BIT; // Outer row is already read
>>
>> - fix_semijoin_strategies_for_picked_join_order(join);
>> + if (fix_semijoin_strategies(join))
>> + DBUG_RETURN(TRUE);
>>
>> for (j=join_tab, tablenr=0 ; tablenr < table_count ; tablenr++,j++)
>> {
>> @@ -9219,8 +9244,8 @@ static bool make_join_select(JOIN *join,
>> - If we're looking at the first SJM table, reset used_tables
>> to refer to only allowed tables
>> */
>> - if (tab->emb_sj_nest && tab->emb_sj_nest->sj_mat_info
> && -
>> tab->emb_sj_nest->sj_mat_info->is_used &&
>> + if (tab->emb_sj_nest &&
>> + tab->emb_sj_nest->sj_mat_info && !(used_tables &
>> tab->emb_sj_nest->sj_inner_tables))
>> {
>> save_used_tables= used_tables;
>> @@ -12358,8 +12383,7 @@ Item *eliminate_item_equal(Item *cond, C
>> */
>> TABLE_LIST *emb_nest=
> item_field->field->table->pos_in_table_list->embedding;
>> - if (!item_const && emb_nest && emb_nest->sj_mat_info
> &&
>> - emb_nest->sj_mat_info->is_used)
>> + if (!item_const && emb_nest && emb_nest->sj_mat_info)
>> {
>> /* Find the first equal expression that refers to a table that is
>> @@ -13436,7 +13460,7 @@ void advance_sj_state(JOIN *join, table_
>> double *current_record_count, double *current_read_time, POSITION
>> *loose_scan_pos)
>> {
>> - TABLE_LIST *emb_sj_nest;
>> + TABLE_LIST *emb_sj_nest= new_join_tab->emb_sj_nest;
>> POSITION *pos= join->positions + idx;
>> remaining_tables &= ~new_join_tab->table->map;
>>
>> @@ -13484,13 +13508,11 @@ void advance_sj_state(JOIN *join, table_
>> table_map handled_by_fm_or_ls= 0;
>> /* FirstMatch Strategy */
>> - if (new_join_tab->emb_sj_nest &&
>> + if (emb_sj_nest &&
>> join->thd->optimizer_switch_flag(OPTIMIZER_SWITCH_FIRSTMATCH))
>> {
>> - const table_map outer_corr_tables=
>> - new_join_tab->emb_sj_nest->nested_join->sj_depends_on;
>> - const table_map sj_inner_tables=
>> - new_join_tab->emb_sj_nest->sj_inner_tables;
>> + const table_map outer_corr_tables=
> emb_sj_nest->nested_join->sj_depends_on;
>> + const table_map sj_inner_tables= emb_sj_nest->sj_inner_tables;
>> /* Enter condition:
>> 1. The next join tab belongs to semi-join nest
>> @@ -13571,7 +13593,7 @@ void advance_sj_state(JOIN *join, table_
>> */
>> if ((pos->first_loosescan_table != MAX_TABLES) && // (1)
>> (first->table->emb_sj_nest->sj_inner_tables & remaining_tables)
> && //(2)
>> - new_join_tab->emb_sj_nest != first->table->emb_sj_nest) //(2)
>> + emb_sj_nest != first->table->emb_sj_nest) //(2)
>> {
>> pos->first_loosescan_table= MAX_TABLES;
>> }
>> @@ -13583,9 +13605,8 @@ void advance_sj_state(JOIN *join, table_
>> if (loose_scan_pos->read_time != DBL_MAX)
>> {
>> pos->first_loosescan_table= idx;
>> - pos->loosescan_need_tables=
>> - new_join_tab->emb_sj_nest->sj_inner_tables | -
>> new_join_tab->emb_sj_nest->nested_join->sj_depends_on;
>> + pos->loosescan_need_tables= emb_sj_nest->sj_inner_tables |
>> + emb_sj_nest->nested_join->sj_depends_on;
>> }
>> if ((pos->first_loosescan_table != MAX_TABLES) && @@ -13629,7 +13650,7
> @@ void
>> advance_sj_state(JOIN *join, table_
>> Update join->cur_sj_inner_tables (Used by FirstMatch in this function and
>> LooseScan detector in best_access_path)
>> */
>> - if ((emb_sj_nest= new_join_tab->emb_sj_nest))
>> + if (emb_sj_nest)
>> {
>> join->cur_sj_inner_tables |= emb_sj_nest->sj_inner_tables;
>> pos->dups_producing_tables |= emb_sj_nest->sj_inner_tables;
>> @@ -13666,16 +13687,16 @@ void advance_sj_state(JOIN *join, table_
>> we reach the point #2.
>> */
>> pos->sjm_scan_need_tables=
>> - new_join_tab->emb_sj_nest->sj_inner_tables | -
>> new_join_tab->emb_sj_nest->nested_join->sj_depends_on |
>> - new_join_tab->emb_sj_nest->nested_join->sj_corr_tables;
>> + emb_sj_nest->sj_inner_tables | +
> emb_sj_nest->nested_join->sj_depends_on |
>> + emb_sj_nest->nested_join->sj_corr_tables;
>> pos->sjm_scan_last_inner= idx;
>> }
>> else if (sjm_strategy == SJ_OPT_MATERIALIZE_LOOKUP)
>> {
>> COST_VECT prefix_cost; - SJ_MATERIALIZATION_INFO *mat_info=
>> emb_sj_nest->sj_mat_info;
>> - signed int first_tab= (int)idx - mat_info->tables;
>> + signed int first_tab=
>
> this is not your code, but I wonder whether 'signed' is useful here.

I inserted DBUG_ASSERT(first_tab >= 0); and got verification that first_tab may 
become -1.
>
>> + (int)idx - my_count_bits(emb_sj_nest->sj_inner_tables);
>> double prefix_rec_count;
>> if (first_tab < (int)join->const_tables)
>> {
>> @@ -13689,8 +13710,9 @@ void advance_sj_state(JOIN *join, table_
>> }
>>
>> double mat_read_time= prefix_cost.total_cost();
>> - mat_read_time += mat_info->materialization_cost.total_cost() +
>> - prefix_rec_count * mat_info->lookup_cost.total_cost();
>> + mat_read_time +=
>> + emb_sj_nest->nested_join->sjm.materialization_cost.total_cost() +
>> + prefix_rec_count *
> emb_sj_nest->nested_join->sjm.lookup_cost.total_cost();
>>
>> if (mat_read_time < *current_read_time || pos->dups_producing_tables)
>> {
>> @@ -13703,8 +13725,7 @@ void advance_sj_state(JOIN *join, table_
>> pos->sj_strategy= SJ_OPT_MATERIALIZE_LOOKUP;
>> *current_read_time= mat_read_time;
>> *current_record_count= prefix_rec_count;
>> - pos->dups_producing_tables &=
>> - ~new_join_tab->emb_sj_nest->sj_inner_tables;
>> + pos->dups_producing_tables &= ~emb_sj_nest->sj_inner_tables;
>> }
>> }
>> @@ -13714,11 +13735,11 @@ void advance_sj_state(JOIN *join, table_
>> {
>> TABLE_LIST *mat_nest=
>> join->positions[pos->sjm_scan_last_inner].table->emb_sj_nest;
>> - SJ_MATERIALIZATION_INFO *mat_info= mat_nest->sj_mat_info;
>> + uint tables= my_count_bits(mat_nest->sj_inner_tables);
>
> several my_count_bits() calls added over the patch, maybe the number of tables
> should be cached?

It's almost always sj_inner_tables. It is probably a good idea to add an "uint 
table_count" to the same struct. But sj_inner_tables is part of TABLE_LIST, and 
this should really be part of nested_join, as several TABLE_LIST objects 
implicitly refer a single table.

I think that this is better left to a future refactoring.
>
>>
>> double prefix_cost;
>> double prefix_rec_count;
>> - int first_tab= pos->sjm_scan_last_inner + 1 - mat_info->tables;
>> + int first_tab= pos->sjm_scan_last_inner + 1 - tables;
>> /* Get the prefix cost */
>> if (first_tab == (int)join->const_tables)
>> {
>> @@ -13732,18 +13753,19 @@ void advance_sj_state(JOIN *join, table_
>> }
>>
>> /* Add materialization cost */
>> - prefix_cost += mat_info->materialization_cost.total_cost() +
>> - prefix_rec_count * mat_info->scan_cost.total_cost();
>> - prefix_rec_count *= mat_info->rows;
>> + prefix_cost+=
>> + mat_nest->nested_join->sjm.materialization_cost.total_cost() +
>> + prefix_rec_count * mat_nest->nested_join->sjm.scan_cost.total_cost();
>> + prefix_rec_count*= mat_nest->nested_join->sjm.rows;
>> uint i;
>> table_map rem_tables= remaining_tables;
>> - for (i= idx; i != (first_tab + mat_info->tables - 1); i--)
>> + for (i= idx; i != (first_tab + tables - 1); i--)
>> rem_tables |= join->positions[i].table->table->map;
>>
>> POSITION curpos, dummy;
>> /* Need to re-run best-access-path as we prefix_rec_count has changed */
>> - for (i= first_tab + mat_info->tables; i <= idx; i++)
>> + for (i= first_tab + tables; i <= idx; i++)
>> {
>> best_access_path(join, join->positions[i].table, rem_tables, i, FALSE,
>> prefix_rec_count, &curpos, &dummy);
>> @@ -13775,14 +13797,13 @@ void advance_sj_state(JOIN *join, table_
>> Duplicate weedout can be applied after all ON-correlated and correlated */
>> - TABLE_LIST *nest;
>> - if ((nest= new_join_tab->emb_sj_nest))
>> + if (emb_sj_nest)
>> {
>> if (!pos->dupsweedout_tables)
>> pos->first_dupsweedout_table= idx;
>>
>> - pos->dupsweedout_tables |= nest->sj_inner_tables |
>> - nest->nested_join->sj_depends_on;
>> + pos->dupsweedout_tables|= emb_sj_nest->sj_inner_tables |
>> + emb_sj_nest->nested_join->sj_depends_on;
>> }
>>
>> if (pos->dupsweedout_tables &&
>> === modified file 'sql/sql_test.cc'
>> --- a/sql/sql_test.cc 2010-07-13 17:29:44 +0000
>> +++ b/sql/sql_test.cc 2010-07-27 08:23:50 +0000
>> @@ -377,21 +377,22 @@ print_plan(JOIN* join, uint idx, double }
>>
>>
>> -void print_sjm(SJ_MATERIALIZATION_INFO *sjm)
>> +void print_sjm(TABLE_LIST *emb_sj_nest)
>> {
>> DBUG_LOCK_FILE;
>> + SJ_MATERIALIZATION_INFO *sjm= emb_sj_nest->sj_mat_info;
>> fprintf(DBUG_FILE, "\nsemi-join nest{\n");
>> fprintf(DBUG_FILE, " tables { \n");
>> for (uint i= 0;i < sjm->tables; i++)
>> {
>> fprintf(DBUG_FILE, " %s%s\n", - sjm->positions[i].table->table->alias,
>> + emb_sj_nest->nested_join->sjm.positions[i].table->table->alias,
>> (i == sjm->tables -1)? "": ",");
>> }
>> fprintf(DBUG_FILE, " }\n");
>> fprintf(DBUG_FILE, " materialize_cost= %g\n",
>> - sjm->materialization_cost.total_cost());
>> - fprintf(DBUG_FILE, " rows= %g\n", sjm->rows);
>> + emb_sj_nest->nested_join->sjm.materialization_cost.total_cost());
>> + fprintf(DBUG_FILE, " rows= %g\n", emb_sj_nest->nested_join->sjm.rows);
>> fprintf(DBUG_FILE, "}\n");
>> DBUG_UNLOCK_FILE;
>> }
>>
>> === modified file 'sql/sql_test.h'
>> --- a/sql/sql_test.h 2010-07-13 17:29:44 +0000
>> +++ b/sql/sql_test.h 2010-07-27 08:23:50 +0000
>> @@ -30,7 +30,7 @@ void TEST_join(JOIN *join);
>> void print_plan(JOIN* join,uint idx, double record_count, double read_time,
>> double current_read_time, const char *info);
>> void dump_TABLE_LIST_graph(SELECT_LEX *select_lex, TABLE_LIST* tl);
>> -void print_sjm(SJ_MATERIALIZATION_INFO *sjm);
>> +void print_sjm(TABLE_LIST *emb_sj_nest);
>> void print_keyuse_array(DYNAMIC_ARRAY *keyuse_array);
>> #endif
>> void mysql_print_status();
>>
>> === modified file 'sql/table.h'
>> --- a/sql/table.h 2010-07-23 18:14:59 +0000
>> +++ b/sql/table.h 2010-07-27 08:23:50 +0000
>> @@ -1939,6 +1939,21 @@ public:
>> Natural_join_column *get_natural_column_ref();
>> };
>>
>> +typedef struct st_semijoin_mat
>
> It would be good to have a doxygen comment for this structure, to say that this
> serves for the optimization stage...

Done.
>
>> +{
>> + /* Optimal join order calculated for inner tables of this semijoin op. */
>> + struct st_position *positions;
>> + /* True if data types allow the MaterializeScan semijoin strategy */
>> + bool scan_allowed;
>> + /* Expected #rows in the materialized table */
>> + double rows;
>> + /* Materialization cost - execute sub-join and write rows to temp.table */
>> + COST_VECT materialization_cost;
>> + /* Cost to make one lookup in the temptable */
>> + COST_VECT lookup_cost;
>> + /* Cost of scanning the materialized table */
>> + COST_VECT scan_cost;
>> +} st_semijoin_mat;
>>
>> typedef struct st_nested_join
>> {
>> @@ -1971,6 +1986,7 @@ typedef struct st_nested_join
>> of the semi-join, respectively.
>> */
>> List<Item> sj_outer_exprs, sj_inner_exprs;
>> + st_semijoin_mat sjm;
>> /**
>> True if this join nest node is completely covered by the query execution
>> plan. This means two things.
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>>
>
>

Thanks,
Roy
Thread
bzr commit into mysql-next-mr-opt-backporting branch (roy.lyseng:3219)Bug#50489Roy Lyseng27 Jul
  • Re: bzr commit into mysql-next-mr-opt-backporting branch (roy.lyseng:3219)Bug#50489Guilhem Bichot27 Jul
  • Re: bzr commit into mysql-next-mr-opt-backporting branch (roy.lyseng:3219)Bug#50489Guilhem Bichot9 Aug
    • Re: bzr commit into mysql-next-mr-opt-backporting branch (roy.lyseng:3219)Bug#50489Roy Lyseng12 Aug
      • Re: bzr commit into mysql-next-mr-opt-backporting branch (roy.lyseng:3219)Bug#50489Guilhem Bichot12 Aug
        • Re: bzr commit into mysql-next-mr-opt-backporting branch (roy.lyseng:3219)Bug#50489Roy Lyseng12 Aug