Hi Guilhem,
thank you for the new commit.
The patch is approved by me.
Please see some inline minor comments, which you are free to ignore.
On 01.04.11 11.09, Guilhem Bichot wrote:
> #At file:///home/mysql_src/bzrrepos_new/59651/ based on
> revid:roy.lyseng@stripped
>
> 3348 Guilhem Bichot 2011-04-01
> Fix for BUG#11766522 "59651: ASSERTION `TABLE_REF->HAS_RECORD' FAILED WITH
> JOIN_CACHE_LEVEL=3":
> during execution, access to last-in-plan tables spoiled TABLE::status of first
> tables.
> @ mysql-test/include/join_cache.inc
> test for bug. Without the code fix, join_cache_jcl3 would hit an assertion
> failure in join_read_key2.
> @ mysql-test/r/join_cache_jcl1.result
> correct result (32 rows)
> @ mysql-test/r/join_cache_jcl2.result
> correct result (32 rows)
> @ mysql-test/r/join_cache_jcl3.result
> correct result (32 rows)
> @ mysql-test/r/join_cache_jcl4.result
> correct result (32 rows)
> @ mysql-test/r/join_cache_jcl5.result
> correct result (32 rows)
> @ mysql-test/r/join_cache_jcl6.result
> correct result (32 rows)
> @ mysql-test/r/join_cache_jcl7.result
> correct result (32 rows)
> @ mysql-test/r/join_cache_jcl8.result
> correct result (32 rows)
> @ sql/handler.h
> Removing the assignments to table->status
> from ha_innodb.cc, ha_myisam.cc, ha_heap.cc causes test failures,
> which shows that engines must set table->status.
> So tell this brutal truth, in comments: engines must set table->status.
> All those which I could check, do so (including PBXT).
> Question for reviewers: most of the affected handler functions
> have ha_* non-virtual wrappers (ha_index_next_same(), etc);
> we could, in a follow-up patch:
> - remove the requirement from the engine (don't set table->status
> in their functions anymore)
> - set table->status in the ha_* wrappers.
> - make STATUS_ * flags invisible to engines
> - change them to an enum, and replace the magic value "0"
> with an enum value.
> @ sql/sql_join_cache.cc
> Deleting broken code which caused the bug. It was broken for two reasons.
> First, it affected JOIN_TABs located _before_ 'join_tab' in the plan;
> there is no reason why reading rows from 'join_tab' should affect
> TABLE::status of previous JOIN_TABs.
> Second, the save/restore logic was not recursion-safe; consider:
> int save_x, x; // global variables
> f()
> {
> save_x= x; // save
> f();
> x= save_x; // restore
> }
> the inner call to f() will overwrite the value of save_x,
> in the end x will not get its original value back.
> That is what caused the assertion failure. In the bug's testcase,
> the last tables of the plan are t7, t8, t6. t7 is doing ref access,
> t8 and t6 are doing join buffering. During execution,
> at some point, no row is found in t7 so t7's table->status is set to
> STATUS_NOT_FOUND // because no record found
> | STATUS_NULL_ROW // because NULL-complemented (LEFT JOIN)
> Then we continue into t8: at some point the join buffer of t8 is full,
> so t8 cannot accumulate more partial records sent by t7, so
> join_records() is called for t8, to generate records (of t8 joined
> with its buffer), in order to be able to later empty t8's buffer.
> Those records are generated, which backs up t7's table->status
> into t7's tab->status, and changes table->status to 0.
> Generated records are passed to t6. t6's buffer gets full. So
> join_records() is called for t6. Which backs up t7's table->status
> (0) into t7's tab->status. From then on, things are wrong, as when
> the two join_records() finally end, t7's table->status is not
> restored to its original value (STATUS_NOT_FOUND|STATUS_NULL_ROW)
> but to 0 ("save_x" issue above). After that, more rows come from tables
> before t7, so we have another ref access to t7 i.e. a call to
> join_read_key2(),
> which has this code:
> if (cmp_buffer_with_ref(tab->join->thd, table, table_ref) ||
> (table->status& (STATUS_GARBAGE | STATUS_NULL_ROW)))
> {
> ...
> }
> else if (table->status == 0)
> {
> DBUG_ASSERT(table_ref->has_record);
> table_ref->use_count++;
> }
> The "else" branch is expected to be for when:
> - the lookup value is the same as the previous lookup value (see
> cmp_buffer_with_ref())
> - the previous lookup did find a record (i.e. table->status==0)
> - in which case there should be a record ready for reusal
> (which is asserted with table_ref->has_record) and a new lookup can be
> avoided.
> But in our case, though the lookup value is the same, there
> is no record; if t7's table->status was correct it would still
> have STATUS_NULL_ROW and we wouldn't enter the "else" block.
> Because t7's table->status is wrongly 0, we enter the "else"
> and the assertion fires, because there is no record actually.
>
> This code dates from 2000 at the latest (when code was imported into
> bitkeeper).
>
> It is unlikely that this causes a real bug in 5.1 or 5.5 because:
> - those older versions do join buffering only for inner join
> - in an inner join, if the "ref" access finds no rows, we don't
> go into the "next" tables, so have no chance to corrupt table->status
> of the table doing "ref" access.
> @ sql/sql_select.cc
> unused flag (was never set)
> @ sql/sql_select.h
> JOIN_TAB::status not needed anymore
> @ sql/structs.h
> Best-effort documentation of STATUS_ flags, to the extent of what I
> understood.
> Deleting STATUS_PARENT and STATUS_NOT_READ: they were never set, and this is
> sure:
> Deleting STATUS_NO_RECORD.
> I temporarily changed TABLE::status to an enum so that any
> table->status= 8; /* using 8 instead of STATUS_NOT_READ */
> would be caught by the compiler.
>
> modified:
> mysql-test/include/join_cache.inc
> mysql-test/r/join_cache_jcl1.result
> mysql-test/r/join_cache_jcl2.result
> mysql-test/r/join_cache_jcl3.result
> mysql-test/r/join_cache_jcl4.result
> mysql-test/r/join_cache_jcl5.result
> mysql-test/r/join_cache_jcl6.result
> mysql-test/r/join_cache_jcl7.result
> mysql-test/r/join_cache_jcl8.result
> sql/handler.h
> sql/item_cmpfunc.cc
> sql/sql_base.h
> sql/sql_join_cache.cc
> sql/sql_select.cc
> sql/sql_select.h
> sql/structs.h
> sql/table.cc
>
<Long test section clipped away>
> === modified file 'sql/handler.h'
> --- a/sql/handler.h 2011-01-26 21:12:56 +0000
> +++ b/sql/handler.h 2011-04-01 09:09:15 +0000
> @@ -1799,9 +1799,12 @@ public:
> }
> /**
> @brief
> - Positions an index cursor to the index specified in the handle. Fetches the
> - row if available. If the key value is null, begin at the first key of the
> - index.
> + Positions an index cursor to the index specified in the handle
> + ('active_index'). Fetches the row if available. If the key value is null,
> + begin at the first key of the index.
> + To indicate success (found a record), function returns 0 _and_
> + sets table->status to 0. Otherwise, returns non-zero and sets
> + table->status to STATUS_NOT_FOUND.
> */
> virtual int index_read_map(uchar * buf, const uchar * key,
> key_part_map keypart_map,
> @@ -1813,27 +1816,34 @@ public:
> protected:
> /**
> @brief
> - Positions an index cursor to the index specified in the handle. Fetches the
> - row if available. If the key value is null, begin at the first key of the
> - index.
> + Positions an index cursor to the index specified in parameter. Fetches
nitpick: C++ uses arguments, not parameters.
> + the row if available. If the key value is null, begin at the first key of
> + the index.
> + @returns as index_read_map().
> */
> virtual int index_read_idx_map(uchar * buf, uint index, const uchar * key,
> key_part_map keypart_map,
> enum ha_rkey_function find_flag);
> + /// @returns as index_read_map().
> virtual int index_next(uchar * buf)
> { return HA_ERR_WRONG_COMMAND; }
> + /// @returns as index_read_map().
> virtual int index_prev(uchar * buf)
> { return HA_ERR_WRONG_COMMAND; }
> + /// @returns as index_read_map().
> virtual int index_first(uchar * buf)
> { return HA_ERR_WRONG_COMMAND; }
> + /// @returns as index_read_map().
> virtual int index_last(uchar * buf)
> { return HA_ERR_WRONG_COMMAND; }
Proposal: Add a blank line between the definitions above.
> public:
> + /// @returns as index_read_map().
> virtual int index_next_same(uchar *buf, const uchar *key, uint keylen);
> /**
> @brief
> The following functions works like index_read, but it find the last
> row with the current key value or prefix.
> + @returns as index_read_map().
> */
> virtual int index_read_last_map(uchar * buf, const uchar * key,
> key_part_map keypart_map)
> @@ -1853,7 +1863,9 @@ public:
> { return NULL; }
> virtual int ft_read(uchar *buf) { return HA_ERR_WRONG_COMMAND; }
> protected:
> + /// @returns as index_read_map().
> virtual int rnd_next(uchar *buf)=0;
> + /// @returns as index_read_map().
> virtual int rnd_pos(uchar * buf, uchar *pos)=0;
> public:
> /**
>
> === modified file 'sql/item_cmpfunc.cc'
> --- a/sql/item_cmpfunc.cc 2011-02-08 15:54:12 +0000
> +++ b/sql/item_cmpfunc.cc 2011-04-01 09:09:15 +0000
> @@ -430,7 +430,8 @@ static bool convert_constant_item(THD *t
> */
> bool save_field_value= (field_item->depended_from&&
> (field_item->const_item() ||
> - !(field->table->status&
> STATUS_NO_RECORD)));
> + !(field->table->status&
> + (STATUS_GARBAGE | STATUS_NOT_FOUND))));
> if (save_field_value)
> orig_field_val= field->val_int();
> if (!(*item)->is_null()&& !(*item)->save_in_field(field, 1))
>
> === modified file 'sql/sql_base.h'
> --- a/sql/sql_base.h 2010-11-23 22:37:59 +0000
> +++ b/sql/sql_base.h 2011-04-01 09:09:15 +0000
> @@ -323,7 +323,7 @@ inline void setup_table_map(TABLE *table
> table->used_fields= 0;
> table->const_table= 0;
> table->null_row= 0;
> - table->status= STATUS_NO_RECORD;
> + table->status= STATUS_GARBAGE | STATUS_NOT_FOUND;
> table->maybe_null= table_list->outer_join;
> TABLE_LIST *embedding= table_list->embedding;
> while (!table->maybe_null&& embedding)
>
> === modified file 'sql/sql_join_cache.cc'
> --- a/sql/sql_join_cache.cc 2010-09-02 07:34:10 +0000
> +++ b/sql/sql_join_cache.cc 2011-04-01 09:09:15 +0000
> @@ -1761,7 +1761,6 @@ enum_nested_loop_state JOIN_CACHE_BNL::j
> {
> uint cnt;
> int error;
> - JOIN_TAB *tab;
> READ_RECORD *info;
> enum_nested_loop_state rc= NESTED_LOOP_OK;
> SQL_SELECT *select= join_tab->cache_select;
> @@ -1788,18 +1787,9 @@ enum_nested_loop_state JOIN_CACHE_BNL::j
> join_tab->select->quick= 0;
> }
>
> - for (tab= join->join_tab; tab != join_tab ; tab++)
> - {
> - tab->status= tab->table->status;
> - tab->table->status= 0;
> - }
> -
> /* Start retrieving all records of the joined table */
> if ((error= join_init_read_record(join_tab)))
> - {
> - rc= error< 0 ? NESTED_LOOP_NO_MORE_ROWS: NESTED_LOOP_ERROR;
> - goto finish;
> - }
> + return error< 0 ? NESTED_LOOP_NO_MORE_ROWS: NESTED_LOOP_ERROR;
>
> info=&join_tab->read_record;
> do
> @@ -1811,8 +1801,7 @@ enum_nested_loop_state JOIN_CACHE_BNL::j
> {
> /* The user has aborted the execution of the query */
> join->thd->send_kill_message();
> - rc= NESTED_LOOP_KILLED;
> - goto finish;
> + return NESTED_LOOP_KILLED;
> }
>
> /*
> @@ -1826,10 +1815,7 @@ enum_nested_loop_state JOIN_CACHE_BNL::j
>
> (!select->skip_record(join->thd,&skip_record)&&
> !skip_record));
> if (select&& join->thd->is_error())
> - {
> - rc= NESTED_LOOP_ERROR;
> - goto finish;
> - }
> + return NESTED_LOOP_ERROR;
> if (consider_record)
> {
> /* Prepare to read records from the join buffer */
> @@ -1847,7 +1833,7 @@ enum_nested_loop_state JOIN_CACHE_BNL::j
> get_record();
> rc= generate_full_extensions(get_curr_rec());
> if (rc != NESTED_LOOP_OK&& rc != NESTED_LOOP_NO_MORE_ROWS)
> - goto finish;
> + return rc;
> }
> }
> }
> @@ -1856,9 +1842,6 @@ enum_nested_loop_state JOIN_CACHE_BNL::j
>
> if (error> 0) // Fatal error
> rc= NESTED_LOOP_ERROR;
> -finish:
> - for (tab= join->join_tab; tab != join_tab ; tab++)
> - tab->table->status= tab->status;
> return rc;
> }
>
> @@ -2299,7 +2282,7 @@ enum_nested_loop_state JOIN_CACHE_BKA::j
>
> rc= init_join_matching_records(&seq_funcs, records);
> if (rc != NESTED_LOOP_OK)
> - goto finish;
> + return rc;
>
> while (!(error= file->multi_range_read_next((char **)&rec_ptr)))
> {
> @@ -2307,8 +2290,7 @@ enum_nested_loop_state JOIN_CACHE_BKA::j
> {
> /* The user has aborted the execution of the query */
> join->thd->send_kill_message();
> - rc= NESTED_LOOP_KILLED;
> - goto finish;
> + return NESTED_LOOP_KILLED;
> }
> if (join_tab->keep_current_rowid)
> join_tab->table->file->position(join_tab->table->record[0]);
> @@ -2323,14 +2305,13 @@ enum_nested_loop_state JOIN_CACHE_BKA::j
> get_record_by_pos(rec_ptr);
> rc= generate_full_extensions(rec_ptr);
> if (rc != NESTED_LOOP_OK&& rc != NESTED_LOOP_NO_MORE_ROWS)
> - goto finish;
> + return rc;
> }
> }
>
> if (error> 0&& error != HA_ERR_END_OF_FILE)
> return NESTED_LOOP_ERROR;
> -finish:
> - return end_join_matching_records(rc);
> + return rc;
> }
>
>
> @@ -2375,12 +2356,6 @@ JOIN_CACHE_BKA::init_join_matching_recor
> /* Dynamic range access is never used with BKA */
> DBUG_ASSERT(join_tab->use_quick != 2);
>
> - for (JOIN_TAB *tab =join->join_tab; tab != join_tab ; tab++)
> - {
> - tab->status= tab->table->status;
> - tab->table->status= 0;
> - }
> -
> init_mrr_buff();
>
> /*
> @@ -2398,31 +2373,6 @@ JOIN_CACHE_BKA::init_join_matching_recor
>
>
> /*
> - Finish searching for records that match records from the join buffer
> -
> - SYNOPSIS
> - end_join_matching_records()
> - rc return code passed by the join_matching_records function
> -
> - DESCRIPTION
> - This function perform final actions on searching for all matches for
> - the records from the join buffer and building all full join extensions
> - of the records with these matches.
> -
> - RETURN
> - return code rc passed to the function as a parameter
> -*/
> -
> -enum_nested_loop_state
> -JOIN_CACHE_BKA::end_join_matching_records(enum_nested_loop_state rc)
> -{
> - for (JOIN_TAB *tab=join->join_tab; tab != join_tab ; tab++)
> - tab->table->status= tab->status;
> - return rc;
> -}
> -
> -
> -/*
> Get the key built over the next record from BKA join buffer
>
> SYNOPSIS
> @@ -3192,7 +3142,7 @@ JOIN_CACHE_BKA_UNIQUE::join_matching_rec
>
> rc= init_join_matching_records(&seq_funcs, key_entries);
> if (rc != NESTED_LOOP_OK)
> - goto finish;
> + return rc;
>
> while (!(error= file->multi_range_read_next((char **)&key_chain_ptr)))
> {
> @@ -3227,8 +3177,7 @@ JOIN_CACHE_BKA_UNIQUE::join_matching_rec
> {
> /* The user has aborted the execution of the query */
> join->thd->send_kill_message();
> - rc= NESTED_LOOP_KILLED;
> - goto finish;
> + return NESTED_LOOP_KILLED;
> }
> /*
> If only the first match is needed and it has been already found
> @@ -3241,7 +3190,7 @@ JOIN_CACHE_BKA_UNIQUE::join_matching_rec
> get_record_by_pos(rec_ptr);
> rc= generate_full_extensions(rec_ptr);
> if (rc != NESTED_LOOP_OK&& rc != NESTED_LOOP_NO_MORE_ROWS)
> - goto finish;
> + return rc;
> }
> }
> while (next_rec_ref_ptr != last_rec_ref_ptr);
> @@ -3249,8 +3198,7 @@ JOIN_CACHE_BKA_UNIQUE::join_matching_rec
>
> if (error> 0&& error != HA_ERR_END_OF_FILE)
> return NESTED_LOOP_ERROR;
> -finish:
> - return end_join_matching_records(rc);
> + return rc;
> }
>
>
>
> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc 2011-03-03 09:43:14 +0000
> +++ b/sql/sql_select.cc 2011-04-01 09:09:15 +0000
> @@ -11168,7 +11168,7 @@ make_join_readinfo(JOIN *join, ulonglong
> */
> tab->sorted= sorted;
> sorted= 0; // only first must be sorted
> - table->status=STATUS_NO_RECORD;
> + table->status= STATUS_GARBAGE | STATUS_NOT_FOUND;
> pick_table_access_method (tab);
>
> if (tab->loosescan_match_tab)
> @@ -17960,7 +17960,7 @@ join_read_const_table(JOIN_TAB *tab, POS
> TABLE *table=tab->table;
> table->const_table=1;
> table->null_row=0;
> - table->status=STATUS_NO_RECORD;
> + table->status= STATUS_GARBAGE | STATUS_NOT_FOUND;
>
> if (tab->type == JT_SYSTEM)
> {
> @@ -18154,7 +18154,7 @@ join_read_key2(JOIN_TAB *tab, TABLE *tab
> indices on NOT NULL columns (see create_ref_for_key()).
> */
> if (cmp_buffer_with_ref(tab->join->thd, table, table_ref) ||
> - (table->status& (STATUS_GARBAGE | STATUS_NO_PARENT |
> STATUS_NULL_ROW)))
> + (table->status& (STATUS_GARBAGE | STATUS_NULL_ROW)))
> {
> if (table_ref->key_err)
> {
>
> === modified file 'sql/sql_select.h'
> --- a/sql/sql_select.h 2011-03-01 14:57:53 +0000
> +++ b/sql/sql_select.h 2011-04-01 09:09:15 +0000
> @@ -296,7 +296,6 @@ typedef struct st_join_table : public Sq
>
> table_map dependent,key_dependent;
> uint index;
> - uint status; ///< Save status for cache
> uint used_fields,used_fieldlength,used_blobs;
> uint used_null_fields;
> uint used_rowid_fields;
> @@ -470,7 +469,6 @@ st_join_table::st_join_table()
> dependent(0),
> key_dependent(0),
> index(0),
> - status(0),
> used_fields(0),
> used_fieldlength(0),
> used_blobs(0),
> @@ -1032,9 +1030,6 @@ protected:
> enum_nested_loop_state init_join_matching_records(RANGE_SEQ_IF *seq_funcs,
> uint ranges);
>
> - /* Finish searching for records that match records from the join buffer */
> - enum_nested_loop_state end_join_matching_records(enum_nested_loop_state rc);
> -
> public:
>
> /*
>
> === modified file 'sql/structs.h'
> --- a/sql/structs.h 2010-10-21 09:49:16 +0000
> +++ b/sql/structs.h 2011-04-01 09:09:15 +0000
> @@ -231,13 +231,20 @@ typedef struct user_conn {
> #define REG_SKIP_DUP 256
>
> /* Bits in form->status */
> -#define STATUS_NO_RECORD (1+2) /* Record isn't usably */
> #define STATUS_GARBAGE 1
> -#define STATUS_NOT_FOUND 2 /* No record in database when needed */
> -#define STATUS_NO_PARENT 4 /* Parent record wasn't found */
> -#define STATUS_NOT_READ 8 /* Record isn't read */
> -#define STATUS_UPDATED 16 /* Record is updated by formula */
> -#define STATUS_NULL_ROW 32 /* table->null_row is set */
> +/**
> + Means we were searching for a row and didn't find it. This is used by
> + storage engines (@see handler::index_read_map()) and the Server layer.
> +*/
> +#define STATUS_NOT_FOUND 2
> +/// Reserved for use by multi-table update. Means the row has been updated.
> +#define STATUS_UPDATED 16
> +/**
> + Means that table->null_row is set. This is an artificial NULL-filled row
> + (one example: in outer join, if no match has been found in inner table).
> +*/
> +#define STATUS_NULL_ROW 32
> +/// Reserved for use by multi-table delete. Means the row has been deleted.
> #define STATUS_DELETED 64
One problem here is that the value zero is a distinct value, but there is no
documentation. But I do accept your excuse for not extending the documentation
further.
>
> /*
>
> === modified file 'sql/table.cc'
> --- a/sql/table.cc 2011-02-09 15:17:14 +0000
> +++ b/sql/table.cc 2011-04-01 09:09:15 +0000
> @@ -3301,7 +3301,7 @@ void TABLE::init(THD *thd, TABLE_LIST *t
> force_index= 0;
> force_index_order= 0;
> force_index_group= 0;
> - status= STATUS_NO_RECORD;
> + status= STATUS_GARBAGE | STATUS_NOT_FOUND;
BTW, I replaced all "status= STATUS_GARBAGE | STATUS_NOT_FOUND" assigments with
"status= STATUS_GARBAGE", and there are no test failures after that... But that
is not a proof that it is a safe change, of course.
> insert_values= 0;
> fulltext_searched= 0;
> file->ft_handler= 0;
Thanks,
Roy