Hello Roy,
Roy Lyseng a écrit, Le 27.07.2010 10:24:
> #At file:///home/rl136806/mysql/repo/mysql-work5/ based on
> revid:epotemkin@stripped
>
> 3219 Roy Lyseng 2010-07-27
> Bug#50489: another segfault in fix_semijoin_strategies...
>
> commit mail test
I built a tree with the same revision as when I filed the bug report,
was able to reproduce the Valgrind error.
==397== Invalid read of size 4
==397== at 0x7B390B:
_ZL16advance_sj_stateP4JOINyPK13st_join_tablejPdS4_P11st_position
(sql_select.cc:13385)
==397== by 0x7B5A42:
_ZL32best_extension_by_limited_searchP4JOINyjddjj (sql_select.cc:7572)
==397== by 0x7B608C: _ZL13greedy_searchP4JOINyjj (sql_select.cc:7299)
==397== by 0x7B67DF: _ZL11choose_planP4JOINy (sql_select.cc:6910)
==397== by 0x7BAF2C:
_ZL20make_join_statisticsP4JOINP10TABLE_LISTP4ItemP16st_dynamic_array
(sql_select.cc:4527)
==397== by 0x7BE111: JOIN::optimize() (sql_select.cc:1621)
==397== by 0x7C2B1E: mysql_select(THD*, Item***, TABLE_LIST*,
unsigned, List<Item>&, Item*, unsigned, st_order*, st_order*, Item*,
st_order*, unsigned long long, select_result*, st_select_lex_unit*,
st_select_lex*) (sql_select.cc:3123)
==397== by 0x7C85D2: handle_select(THD*, LEX*, select_result*,
unsigned long) (sql_select.cc:304)
==397== by 0x717C37: _ZL21execute_sqlcom_selectP3THDP10TABLE_LIST
(sql_parse.cc:4940)
==397== by 0x719FB2: mysql_execute_command(THD*) (sql_parse.cc:2165)
==397== by 0x923D6C: sp_instr_stmt::exec_core(THD*, unsigned*)
(sp_head.cc:2922)
==397== by 0x923F63: sp_lex_keeper::reset_lex_and_exec_core(THD*,
unsigned*, bool, sp_instr*) (sp_head.cc:2745)
==397== by 0x92A2C7: sp_instr_stmt::execute(THD*, unsigned*)
(sp_head.cc:2860)
==397== by 0x926342: sp_head::execute(THD*) (sp_head.cc:1246)
==397== by 0x9271B9: sp_head::execute_procedure(THD*, List<Item>*)
(sp_head.cc:1988)
==397== by 0x72054E: mysql_execute_command(THD*) (sql_parse.cc:4398)
==397== Address 0xb706e70 is 40 bytes inside a block of size 464 free'd
==397== at 0x4C252AF: free (vg_replace_malloc.c:323)
==397== by 0xB8F51C: my_no_flags_free (my_malloc.c:66)
==397== by 0xBBE83B: free_root (my_alloc.c:349)
==397== by 0x9263BF: sp_head::execute(THD*) (sp_head.cc:1263)
==397== by 0x9271B9: sp_head::execute_procedure(THD*, List<Item>*)
(sp_head.cc:1988)
==397== by 0x72054E: mysql_execute_command(THD*) (sql_parse.cc:4398)
==397== by 0x72206C: mysql_parse(THD*, char const*, unsigned, char
const**) (sql_parse.cc:5973)
==397== by 0x722C5A: dispatch_command(enum_server_command, THD*,
char*, unsigned) (sql_parse.cc:1090)
==397== by 0x72413A: do_command(THD*) (sql_parse.cc:774)
==397== by 0x711104: do_handle_one_connection(THD*) (sql_connect.cc:1173)
==397== by 0x7111C9: handle_one_connection (sql_connect.cc:1113)
==397== by 0xC4AC33: pfs_spawn_thread(void*) (pfs.cc:1011)
==397== by 0x503B3E9: start_thread (in /lib/libpthread-2.8.90.so)
The line doing wrong access is:
/* This is SJ-Materialization with lookups */
COST_VECT prefix_cost;
signed int first_tab= (int)idx - mat_info->tables; // here
because mat_info is bad data.
This is bad data because:
1) at first EXECUTE (when materialization is on): mat_info is created by
optimize_semijoin_nests(): the "new" operator allocates in the execution
MEM_ROOT, which is later freed in sp_head::execute() at sp_head.cc:1263:
free_root(&execute_mem_root, MYF(0));
2) at second EXECUTE (when materialization has just been turned off):
mat_info is not created by optimize_semijoin_nests(); the freed pointer
thus stays, and is used by advance_sj_state().
One could say that:
a) stage (2) is correct to not create mat_info, as it's not doing
materialization
b) stage (1) is incorrect because it leaves a dangling pointer behind;
it should cleanup after itself.
Such patch:
=== modified file 'sql/sql_select.cc'
--- sql/sql_select.cc 2010-01-18 19:21:20 +0000
+++ sql/sql_select.cc 2010-07-27 20:44:08 +0000
@@ -10634,6 +10634,8 @@
table->reginfo.join_tab= 0;
}
end_read_record(&read_record);
+ if (emb_sj_nest)
+ emb_sj_nest->sj_mat_info= NULL;
}
which adds a proper reset of sj_mat_info to JOIN_TAB::cleanup(), solves
the bug's testcase. It has some logic: no reference to memory allocated
in the execution MEM_ROOT is left.
I suggest we discuss this further on IRC.