Hello Roy,
Roy Lyseng a écrit, Le 29.04.2011 16:09:
> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
> revid:tor.didriksen@stripped
>
> 3367 Roy Lyseng 2011-04-29
> Bug#12407753: Time to compare a row is missing in cost calculation of
> semi-join
>
> Stage 1: Change cost factors to be multiplicative
>
> Basically, this patch changes the constant values TIME_FOR_COMPARE
> and TIME_FOR_COMPARE_ROWID to be the inverse of their previous values
> and gives them names that are more similar to other cost factors.
> As a consequence, all cost calculations involving these constants
> are changed from divisions to multiplications.
> @@ -1235,7 +1235,7 @@ bool check_if_pq_applicable(Sort_param *
> */
> const double pq_cpu_cost=
> (PQ_slowness * num_rows + param->max_keys_per_buffer) *
> - log((double) param->max_keys_per_buffer) / TIME_FOR_COMPARE_ROWID;
> + log((double) param->max_keys_per_buffer) * ROWID_COMPARE_COST;
log() takes a double argument (according to "man log" on my Linux) so
the explicit cast of param->max_keys_per_buffer is maybe not needed.
Same for other "log((double)x)" calls in the patch.
I say "*maybe* not needed", because I haven't checked what the compiler
thinks of my theories.
> const double pq_io_cost=
> param->max_rows * table->file->scan_time() / 2.0;
> const double pq_cost= pq_cpu_cost + pq_io_cost;
>
> === modified file 'sql/filesort_utils.cc'
> --- a/sql/filesort_utils.cc 2010-12-17 09:41:21 +0000
> +++ b/sql/filesort_utils.cc 2011-04-29 14:09:10 +0000
> @@ -26,8 +26,7 @@ double get_merge_cost(ha_rows num_elemen
> {
> return
> 2.0 * ((double) num_elements * elem_size) / IO_SIZE
> - + (double) num_elements * log((double) num_buffers) /
> - (TIME_FOR_COMPARE_ROWID * M_LN2);
> + + num_elements * log((double) num_buffers) * ROWID_COMPARE_COST / M_LN2;
here I guess you removed the cast to double because in "integer*double"
the integer is automatically cast to double IIRC.
> === modified file 'sql/sql_const.h'
> --- a/sql/sql_const.h 2010-12-17 09:41:21 +0000
> +++ b/sql/sql_const.h 2011-04-29 14:09:10 +0000
> @@ -158,16 +158,16 @@
>
> /**
> The following is used to decide if MySQL should use table scanning
> - instead of reading with keys. The number says how many evaluation of the
> - WHERE clause is comparable to reading one extra row from a table.
> + instead of reading with keys. The number says how costly evaluation of the
> + filter condition for a row is compared to reading one extra row from a table.
> */
> -#define TIME_FOR_COMPARE 5.0 // 5 compares == one read
> +#define JOIN_COMPARE_COST 0.20
and this has type "double" per the C++ standard (talking to myself here).
> @@ -7582,18 +7582,18 @@ best_access_path(JOIN *join,
> we read the table (see flush_cached_records for details). Here we
> take into account cost to read and skip these records.
> */
> - tmp+= (s->records - rnd_records)/(double) TIME_FOR_COMPARE;
> + tmp+= (s->records - rnd_records) * JOIN_COMPARE_COST;
> }
> }
>
> /*
> We estimate the cost of evaluating WHERE clause for found records
> - as record_count * rnd_records / TIME_FOR_COMPARE. This cost plus
> + as record_count * rnd_records * JOIN_COMPARE_COST. This cost plus
> tmp give us total cost of using TABLE SCAN
> */
> if (best == DBL_MAX ||
> - (tmp + record_count/(double) TIME_FOR_COMPARE*rnd_records <
> - best + record_count/(double) TIME_FOR_COMPARE*records))
> + (tmp + (record_count * JOIN_COMPARE_COST * rnd_records) <
> + best + (record_count * JOIN_COMPARE_COST * records)))
the added () are not needed, as you know * has priority over +. I don't
find that they add clarity. But keep them if you like.
Ok to push. I like reviewing those kind of patches, they don't exercise
the brain too much :-))