List:Commits« Previous MessageNext Message »
From:Jorgen Loland Date:August 27 2010 12:42pm
Subject:Re: bzr commit into mysql-next-mr-opt-backporting branch (jorgen.loland:3228)
Bug#49129
View as plain text  
Guilhem, Roy

Thank you both for the review. I have done some improvements in my latest patch, 
most noticeably:

  * Added bool extend_duplicate_generating_range() to use in if and ASSERT as 
suggested
  * Added comment on why rowid is needed for join buffering. I added this to the 
code instead of the commit comment so that it's easily accessible for later. I 
had to dig a lot into WLs and code for this, but I think I got it right.
  * Changed tbl_no and tbl names as suggested.
  * Fixed to doxygen @todo's

Two questions remain:

   Why is copy->field NULL for duplicate weedout
   Sergeys todo on counting rowids

I do not have a good answer to these :(

On 08/23/2010 12:07 PM, Guilhem Bichot wrote:
> Hello Jorgen,
>
> I'm not reviewer for this bug, but as I had spent a bit of time on it
> I'm casting thoughts anyway:
>
> Jorgen Loland a écrit, Le 13.08.2010 14:49:
>> #At
>> file:///export/home/jl208045/mysql/mysql-next-mr-opt-backporting-49129/ based
>> on revid:guilhem.bichot@stripped
>>
>> 3228 Jorgen Loland 2010-08-13
>> Bug#49129 - Wrong result with IN-subquery with join_cache_level=6 and
>> firstmatch=off
>> Patch based on contribution from Sergey Petrunia.
>> Consider the query:
>> SELECT * FROM t0 WHERE t0.a IN (
>> SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);
>> With join cache level 6, this query only returns the first tuple
>> from t0 that has a match in the subquery. Consider the relevant part
>> of EXPLAIN:
>> t0 | Using where |
>> t1 | Start temporary; Using join buffer |
>> t2 | Using where; End temporary; Using join buffer |
>> When the optimizer decides to use join buffers, temporary tables
>> created for duplicate weedout should extend to the first table
>> after const tables. I.e., Start temporary should be printed for
>> t0 above.
>
> here we could explain why the temporary table should extend to the first
> table after const tables. I think it's for the following reason.
> If join buffering is not used, we have a usual nested loop join. It can
> be like this:
> * read a first record from t0
> ** read a first record from t1
> *** read a first record from t2
> *** store (t1 || t2) ( || stands for concatenation of records) into tmp
> table
> *** read a second record from t2
> *** store (t1 || t2) into tmp table; to not produce duplicates, we need
> to recognize that this is about the same first record of t1 as in the
> previous iteration; this works as we have stored the rowid of t1's
> record in the tmp table, so that it can be eliminated as duplicate.
> Or it can be like this:
> * read a first record from t0
> ** read a first record from t1
> *** read a first record from t2
> *** store (t1 || t2) into tmp table
> ** no more records in t2, read a second record from t1
> *** read a first record from t2
> *** store (t1 || t2) into tmp table; to not produce duplicates, we need
> to recognize that this is about the same first record of t2 as in the
> previous iteration; this works as we have stored the rowid of t2's
> record in the tmp table, so that it can be eliminated as duplicate.
> Anyway, when we get to the second record of t0, we will never have a
> chance to produce duplicates of what we produced for the first record of
> t0 (we will never get back to this first record of t0, this is nested
> loop join): thus t0's record does not need to be stored in the tmp table.
> Now, if using join buffering, for example block nested loop join.
> We read records from t0, we buffer many of them (because t1 does join
> buffering), then we read records of t1, we also buffer many of them
> (because t2 does join buffering), then it's time to flush all those
> buffers and produce a result set. Flushing happens in the t2-t1 order:
> we read a record from table t2, and for this record, we join it with all
> buffered records from t1 and all buffered records from t0. So you get
> this order of output records:
> first-row-of-t2 || first-row-of-t1 || first-row-of-t0
> first-row-of-t2 || first-row-of-t1 || second-row-of-t0
> etc
> If we don't store t0 into the tmp table, the two first rows reduce to:
> first-row-of-t2 || first-row-of-t1
> first-row-of-t2 || first-row-of-t1
> which are then seen as duplicates, so the second-row-of-t0 will be
> wrongly excluded from the result. On the opposite, if we store t0 into
> the tmp table, the two first are correctly non-duplicate (t0's rowid
> helps distinguishing them).
>
>> The reason for the bug is that setup_semijoin_dups_elimination()
>> is called before the final decision is made in
>> check_join_cache_usage() on whether or not to use join buffering.
>> In this case, use_join_buffer==false for t1 and t2 during
>> setup_semijoin_dups_elimination(), and the range of tables to
>> buffer is therefore not extended to t0.
>> Since check_join_cache_usage() needs to know if duplicate weedout
>> is used, so moving setup_semijoin_dups_elimination() from before
>> check_join_cache_usage() to after it is not possible. The temporary
>> fix of this patch is to use a rough estimate of
>> whether join buffering will be used in
>> setup_semijoin_dups_elimination(). This rough test covers more cases
>> than actually end up with join buffering, and in these cases we now
>> extend the temporary table to store rowids for more tables than
>> strictly required, i.e., the first table up to the start of the
>> semijoin. A proper (but much more costly to do) fix would be to merge
>> the loops of setup_semijoin_dups_elimination() and
>> make_join_readinfo() (which calls check_join_cache_usage()).
>
> good comment
>
>> === modified file 'sql/sql_join_cache.cc'
>> --- a/sql/sql_join_cache.cc 2010-07-23 17:51:11 +0000
>> +++ b/sql/sql_join_cache.cc 2010-08-13 12:49:50 +0000
>> @@ -612,7 +612,12 @@ int JOIN_CACHE_BKA::init()
>> copy_end= cache->field_descr+cache->fields;
>> for (copy= cache->field_descr+cache->flag_fields; copy < copy_end;
>> copy++)
>> {
>> - if (copy->field->table == tab->table &&
>> + /*
>> + (1) - when we store rowids for DuplicateWeedout, they have
>> + copy->field==NULL
>
> do you know why?
>
>> + */
>> + if (copy->field && // (1)
>> + copy->field->table == tab->table &&
>> bitmap_is_set(key_read_set, copy->field->field_index))
>> {
>> *copy_ptr++= copy;
>> === modified file 'sql/sql_select.cc'
>> --- a/sql/sql_select.cc 2010-08-12 09:02:11 +0000
>> +++ b/sql/sql_select.cc 2010-08-13 12:49:50 +0000
>> @@ -1430,10 +1430,38 @@ int setup_semijoin_dups_elimination(JOIN
>> forwards, but do not destroy other duplicate elimination methods.
>> */
>> uint first_table= i;
>> + uint join_cache_level= join->thd->variables.optimizer_join_cache_level;
>> for (uint j= i; j < i + pos->n_sj_tables; j++)
>> {
>> - if (join->best_positions[j].use_join_buffer && j <=
> no_jbuf_after)
>> + /*
>> + The final decision on whether or not join buffering will
>> + be used is done in check_join_cache_usage(), which is
>> + called from make_join_readinfo()'s main loop.
>> + check_join_cache_usage() needs to know if duplicate
>> + weedout is used, so moving
>> + setup_semijoin_dups_elimination() from before the main
>> + loop to after it is not possible. I.e.,
>> + join->best_positions[j].use_join_buffer is not
>> + trustworthy at this point.
>> +
>> + TODO: merge make_join_readinfo() and
>
> for doxygen, you could use /** @todo */
>
>> + setup_semijoin_dups_elimination() loops and change the
>> + following 'if' to
>> +
>> + "if (join->best_positions[j].use_join_buffer && + j <=
> no_jbuf_after)".
>> +
>> + For now, use a rough criteria:
>> + */
>> + JOIN_TAB *sj_tab=join->join_tab + j; + if (j != join->const_tables
>> && + sj_tab->use_quick != 2 && + j <= no_jbuf_after
> &&
>> + ((sj_tab->type == JT_ALL && join_cache_level != 0) ||
>> + (join_cache_level > 4 && (tab->type == JT_REF || + tab->type
> ==
>> JT_EQ_REF))))
>
> I would like to protect against the case of a developer, in the future,
> who would relax the conditions in check_join_cache_usage() and would
> forget to update the if() above. Imagine that person would make
> check_join_cache_usage() use BKA for some more types than
> JT_REF/JT_EQ_REF and forget to update the if() above: the bug would be
> re-introduced.
> To prevent this, a suggestion:
> - put the if() condition into a function, named like
> bool might_do_join_buffering() const (or a better name :-)
> - in check_join_cache_usage(), each time we decide to use a join buffer
> (JOIN_CACHE_(BNL|BKA|BKA_UNIQUE)), do
> DBUG_ASSERT(might_do_join_buffering());
> (or add this assertion into the constructor of JOIN_CACHE, if
> technically possible).
> This would ensure that the if() above is always a superset of
> check_join_cache_usage()'s decisions.
>
>> {
>> + /* Looks like we'll be using join buffer */
>
> I would change to "it might be that we'll be using join buffer" to
> insist on the uncertainty.
>
>> first_table= join->const_tables;
>> break;
>> }
>> @@ -8080,6 +8108,10 @@ void calc_used_field_length(THD *thd, JO
>> (join_tab->table->s->reclength- rec_length));
>> rec_length+=(uint) max(4,blob_length);
>> }
>> + /*
>> + psergey-todo: why we don't count here rowid that we might need to store
>> + when using DuplicateElimination?
>> + */
>
> How about a different type of todo-tag (/** @todo */)? If you have time:
> maybe this todo asks an interesting question, maybe it's possible to
> create a testcase which shows a real bug? Or it's possible to muse a bit
> and find that this todo is actually a non-problem? Maybe that would be
> significant work, though?
>
>> join_tab->used_fields=fields;
>> join_tab->used_fieldlength=rec_length;
>> join_tab->used_blobs=blobs;

-- 
Jørgen Løland | Senior Software Engineer | +47 73842138
Oracle MySQL
Trondheim, Norway
Thread
bzr commit into mysql-next-mr-opt-backporting branch (jorgen.loland:3228)Bug#49129Jorgen Loland13 Aug
  • Re: bzr commit into mysql-next-mr-opt-backporting branch (jorgen.loland:3228)Bug#49129Jorgen Loland13 Aug
    • Re: bzr commit into mysql-next-mr-opt-backporting branch (jorgen.loland:3228)Bug#49129Roy Lyseng26 Aug
  • Re: bzr commit into mysql-next-mr-opt-backporting branch (jorgen.loland:3228)Bug#49129Guilhem Bichot23 Aug
    • Re: bzr commit into mysql-next-mr-opt-backporting branch (jorgen.loland:3228)Bug#49129Jorgen Loland27 Aug