From: Roy Lyseng Date: May 2 2011 7:28am Subject: Re: bzr commit into mysql-trunk branch (roy.lyseng:3367) Bug#12407753 List-Archive: http://lists.mysql.com/commits/136480 Message-Id: <4DBE5D2C.8060901@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit Hi Guilhem, thanks for the review! On 29.04.11 18.56, Guilhem Bichot wrote: > Hello Roy, > > Roy Lyseng a écrit, Le 29.04.2011 16:09: >> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on >> revid:tor.didriksen@stripped >> >> 3367 Roy Lyseng 2011-04-29 >> Bug#12407753: Time to compare a row is missing in cost calculation of semi-join >> Stage 1: Change cost factors to be multiplicative Basically, this patch >> changes the constant values TIME_FOR_COMPARE >> and TIME_FOR_COMPARE_ROWID to be the inverse of their previous values >> and gives them names that are more similar to other cost factors. >> As a consequence, all cost calculations involving these constants >> are changed from divisions to multiplications. > >> @@ -1235,7 +1235,7 @@ bool check_if_pq_applicable(Sort_param * >> */ >> const double pq_cpu_cost= (PQ_slowness * num_rows + param->max_keys_per_buffer) * >> - log((double) param->max_keys_per_buffer) / TIME_FOR_COMPARE_ROWID; >> + log((double) param->max_keys_per_buffer) * ROWID_COMPARE_COST; > > log() takes a double argument (according to "man log" on my Linux) so the > explicit cast of param->max_keys_per_buffer is maybe not needed. > Same for other "log((double)x)" calls in the patch. > I say "*maybe* not needed", because I haven't checked what the compiler thinks > of my theories. I agree, but there are some issues with log() (or maybe some other math function) that caused a broken build a few months ago, so I put in an explicit cast for safety. Jørgen had a problem with this in bug#30597. > >> const double pq_io_cost= >> param->max_rows * table->file->scan_time() / 2.0; >> const double pq_cost= pq_cpu_cost + pq_io_cost; >> >> === modified file 'sql/filesort_utils.cc' >> --- a/sql/filesort_utils.cc 2010-12-17 09:41:21 +0000 >> +++ b/sql/filesort_utils.cc 2011-04-29 14:09:10 +0000 >> @@ -26,8 +26,7 @@ double get_merge_cost(ha_rows num_elemen >> { >> return 2.0 * ((double) num_elements * elem_size) / IO_SIZE >> - + (double) num_elements * log((double) num_buffers) / >> - (TIME_FOR_COMPARE_ROWID * M_LN2); >> + + num_elements * log((double) num_buffers) * ROWID_COMPARE_COST / M_LN2; > > here I guess you removed the cast to double because in "integer*double" the > integer is automatically cast to double IIRC. Yep. > >> === modified file 'sql/sql_const.h' >> --- a/sql/sql_const.h 2010-12-17 09:41:21 +0000 >> +++ b/sql/sql_const.h 2011-04-29 14:09:10 +0000 >> @@ -158,16 +158,16 @@ >> >> /** >> The following is used to decide if MySQL should use table scanning >> - instead of reading with keys. The number says how many evaluation of the >> - WHERE clause is comparable to reading one extra row from a table. >> + instead of reading with keys. The number says how costly evaluation of the >> + filter condition for a row is compared to reading one extra row from a table. >> */ >> -#define TIME_FOR_COMPARE 5.0 // 5 compares == one read >> +#define JOIN_COMPARE_COST 0.20 > > and this has type "double" per the C++ standard (talking to myself here). Yep again. > >> @@ -7582,18 +7582,18 @@ best_access_path(JOIN *join, >> we read the table (see flush_cached_records for details). Here we >> take into account cost to read and skip these records. >> */ >> - tmp+= (s->records - rnd_records)/(double) TIME_FOR_COMPARE; >> + tmp+= (s->records - rnd_records) * JOIN_COMPARE_COST; >> } >> } >> >> /* >> We estimate the cost of evaluating WHERE clause for found records >> - as record_count * rnd_records / TIME_FOR_COMPARE. This cost plus >> + as record_count * rnd_records * JOIN_COMPARE_COST. This cost plus >> tmp give us total cost of using TABLE SCAN >> */ >> if (best == DBL_MAX || >> - (tmp + record_count/(double) TIME_FOR_COMPARE*rnd_records < >> - best + record_count/(double) TIME_FOR_COMPARE*records)) >> + (tmp + (record_count * JOIN_COMPARE_COST * rnd_records) < >> + best + (record_count * JOIN_COMPARE_COST * records))) > > the added () are not needed, as you know * has priority over +. I don't find > that they add clarity. But keep them if you like. I prefer to keep them, they do not hurt... > > Ok to push. I like reviewing those kind of patches, they don't exercise the > brain too much :-)) Maybe I should do more Friday night commits then :) Thanks, Roy