List:Commits« Previous MessageNext Message »
From:marko.makela Date:March 30 2011 12:09pm
Subject:bzr push into mysql-5.1-innodb branch (marko.makela:3718 to 3719)
Bug#11877216
View as plain text  
 3719 Marko Mäkelä	2011-03-30
      Bug#11877216 InnoDB too eager to commit suicide on a busy server
      
      sync_array_print_long_waits(): Return the longest waiting thread ID
      and the longest waited-for lock. Only if those remain unchanged
      between calls in srv_error_monitor_thread(), increment
      fatal_cnt. Otherwise, reset fatal_cnt.
      
      Background: There is a built-in watchdog in InnoDB whose purpose is to
      kill the server when some thread is stuck waiting for a mutex or
      rw-lock. Before this fix, the logic was flawed.
      
      The function sync_array_print_long_waits() returns TRUE if it finds a
      lock wait that exceeds 10 minutes (srv_fatal_semaphore_wait_threshold).
      The function srv_error_monitor_thread() will kill the server if this
      happens 10 times in a row (fatal_cnt reaches 10), checked every 30
      seconds. This is wrong, because this situation does not mean that the
      server is hung. If the server is very busy for a little over 15
      minutes, it will be killed.
      
      Consider this example. Thread T1 is waiting for mutex M. Some time
      later, threads T2..Tn start waiting for the same mutex M. If T1 keeps
      waiting for 600 seconds, fatal_cnt will be incremented to 1. So far,
      so good. Now, if M is granted to T1, the server was obviously not
      stuck. But, T2..Tn keeps waiting, and their wait time will be longer
      than 600 seconds. If 5 minutes later, some Tn has still been waiting
      for more than 10 minutes for the mutex M, the server can be killed,
      even though it is not stuck.
      
      rb:622 approved by Jimmy Yang

    modified:
      storage/innobase/include/sync0arr.h
      storage/innobase/srv/srv0srv.c
      storage/innobase/sync/sync0arr.c
      storage/innodb_plugin/ChangeLog
      storage/innodb_plugin/include/sync0arr.h
      storage/innodb_plugin/srv/srv0srv.c
      storage/innodb_plugin/sync/sync0arr.c
 3718 Vasil Dimov	2011-03-28
      Store the '\0'-terminated query in row->trx_query
      
      This problem was introduced in
      marko.makela@stripped
      and is probably the reason for the following valgrind warning:
      
      from http://bugs.mysql.com/52691 , http://bugs.mysql.com/file.php?id=16880 :
      
      Version: '5.6.3-m5-valgrind-max-debug'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
      ==14947== Thread 18:
      ==14947== Conditional jump or move depends on uninitialised value(s)
      ==14947==    at 0x4A06318: __GI_strlen (mc_replace_strmem.c:284)
      ==14947==    by 0x9F3D7A: fill_innodb_trx_from_cache(trx_i_s_cache_struct*, THD*, TABLE*) (i_s.cc:591)
      ==14947==    by 0x9F4D7D: trx_i_s_common_fill_table(THD*, TABLE_LIST*, Item*) (i_s.cc:1238)
      ==14947==    by 0x7689F3: get_schema_tables_result(JOIN*, enum_schema_table_state) (sql_show.cc:6745)
      ==14947==    by 0x715A75: JOIN::exec() (sql_select.cc:2861)
      ==14947==    by 0x7185BD: mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) (sql_select.cc:3609)
      ==14947==    by 0x70E823: handle_select(THD*, LEX*, select_result*, unsigned long) (sql_select.cc:319)
      ==14947==    by 0x6F2305: execute_sqlcom_select(THD*, TABLE_LIST*) (sql_parse.cc:4557)
      ==14947==    by 0x6EAED4: mysql_execute_command(THD*) (sql_parse.cc:2135)
      ==14947==    by 0x6F44C9: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5597)
      ==14947==    by 0x6E864B: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1093)
      ==14947==    by 0x6E785E: do_command(THD*) (sql_parse.cc:815)
      ==14947==    by 0x6C18DD: do_handle_one_connection(THD*) (sql_connect.cc:771)
      ==14947==    by 0x6C146E: handle_one_connection (sql_connect.cc:707)
      ==14947==    by 0x30E1807760: start_thread (pthread_create.c:301)
      ==14947==    by 0x35EA670F: ???
      ==14947==  Uninitialised value was created by a heap allocation
      ==14947==    at 0x4A0515D: malloc (vg_replace_malloc.c:195)
      ==14947==    by 0xB4B948: mem_area_alloc (mem0pool.c:385)
      ==14947==    by 0xB4A27C: mem_heap_create_block (mem0mem.c:333)
      ==14947==    by 0xB4A530: mem_heap_add_block (mem0mem.c:446)
      ==14947==    by 0xB0D2A4: mem_heap_alloc (mem0mem.ic:186)
      ==14947==    by 0xB0D9C2: ha_storage_put_memlim (ha0storage.c:118)
      ==14947==    by 0xA479D8: fill_trx_row (trx0i_s.c:521)
      ==14947==    by 0xA490E9: fetch_data_into_cache (trx0i_s.c:1319)
      ==14947==    by 0xA491BA: trx_i_s_possibly_fetch_data_into_cache (trx0i_s.c:1352)
      ==14947==    by 0x9F4CE7: trx_i_s_common_fill_table(THD*, TABLE_LIST*, Item*) (i_s.cc:1221)
      ==14947==    by 0x7689F3: get_schema_tables_result(JOIN*, enum_schema_table_state) (sql_show.cc:6745)
      ==14947==    by 0x715A75: JOIN::exec() (sql_select.cc:2861)
      ==14947==    by 0x7185BD: mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) (sql_select.cc:3609)
      ==14947==    by 0x70E823: handle_select(THD*, LEX*, select_result*, unsigned long) (sql_select.cc:319)
      ==14947==    by 0x6F2305: execute_sqlcom_select(THD*, TABLE_LIST*) (sql_parse.cc:4557)
      ==14947==    by 0x6EAED4: mysql_execute_command(THD*) (sql_parse.cc:2135)
      ==14947==    by 0x6F44C9: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5597)
      ==14947==    by 0x6E864B: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1093)
      ==14947==    by 0x6E785E: do_command(THD*) (sql_parse.cc:815)
      ==14947==    by 0x6C18DD: do_handle_one_connection(THD*) (sql_connect.cc:771)
      ==14947==    by 0x6C146E: handle_one_connection (sql_connect.cc:707)
      ==14947==    by 0x30E1807760: start_thread (pthread_create.c:301)
      ==14947==    by 0x35EA670F: ???
      
      
      (gdb) bt
      #0  0x0000000004a06318 in _vgrZU_libcZdsoZa___GI_strlen (str=0x3026bfa0 "insert into `blobtest` set `data`='pkefxxpkalpabzgrczlxefkreqljeqbvzrcnhvhsjsfnvxzjsltfuincffigdkmhvvcmnseluzgbtedrfmxvnrdmzesbinjgwvharkpgjplrlnqudfidbqwgbykupycxzyikzqincnsjrxgncqzlgyqwjdbjulztgsffxpjgymsnntdibvklwqylmwhsmdskmllxuwafabdjnwlyofknwuixiyrgnplmerfdewgizkdhznitesfqepsqbbwkdepkmjoseyxjofmmjaqdipwopfrwidmhqbtovdslvayxcnpewzhppeetblccppniamezibuoinvlxkafpcmozawtplfpepxwlwhymsuraezcwvjqzwogsozodlsfzjiyrcaljjhqwdrcjawvelhefzzaexvcbyorlcyupqwgjuamiqpiputtndjwcsuyzdfhuxswuowhrzdvriwrxqmcqthvzzzvivbabbnhdbtcfdtgssvmirrcddnytnctcvqplwytxxzxelldhwahalzxvgynaiwjyezhxqhlsqudngekocfvlbqprxqhyhwbaomgqiwkpfguohuvlnhtrsszgacxhhzeppyqwfwabiqzgyzkperiidyunrykopysvlcxwhrcboetjltawdjergalsfvaxncmzoznryumrjmncvhvxqvqhhbznnifkguuiffmlrbmgwtzvnuwlaguixqadkupfhasbbxnwkrvsfhrqanfmvjtzfqodtutkjlxfcogtsjywrdgmzgszjtsmimaelsveayqrwviqwwefeziuaqsqpauxpnzhaxjtkdfvvodniwezskbxfxszyniyzkzxngcfwgjlyrlskmrzxqnptwlilsxybuguafxxkvryyjrnkhhcmxuusitaflaiuxjhyfnzkahlgmaszujqmfdhyppdnpweqanmvzgjfyzjolbmprhnuuxextcaxzicfvsuochprmlf"...) at mc_replace_strmem.c:284
      #1  0x00000000009f3d7b in fill_innodb_trx_from_cache (cache=0x1462440, thd=0x2a495000, table=0x2a422500) at /home/sbester/build/bzr/mysql-trunk/storage/innobase/handler/i_s.cc:591
      #2  0x00000000009f4d7e in trx_i_s_common_fill_table (thd=0x2a495000, tables=0x2a4c3ec0) at /home/sbester/build/bzr/mysql-trunk/storage/innobase/handler/i_s.cc:1238
      #3  0x00000000007689f4 in get_schema_tables_result (join=0x30f90c40, executed_place=PROCESSED_BY_JOIN_EXEC) at /home/sbester/build/bzr/mysql-trunk/sql/sql_show.cc:6745
      #4  0x0000000000715a76 in JOIN::exec (this=0x30f90c40) at /home/sbester/build/bzr/mysql-trunk/sql/sql_select.cc:2861
      #5  0x00000000007185be in mysql_select (thd=0x2a495000, rref_pointer_array=0x2a497590, tables=0x2a4c3ec0, wild_num=1, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2684619520, result=0x30319720, unit=0x2a496d28, select_lex=0x2a497378) at /home/sbester/build/bzr/mysql-trunk/sql/sql_select.cc:3609
      #6  0x000000000070e824 in handle_select (thd=0x2a495000, lex=0x2a496c78, result=0x30319720, setup_tables_done_option=0) at /home/sbester/build/bzr/mysql-trunk/sql/sql_select.cc:319
      #7  0x00000000006f2306 in execute_sqlcom_select (thd=0x2a495000, all_tables=0x2a4c3ec0) at /home/sbester/build/bzr/mysql-trunk/sql/sql_parse.cc:4557
      #8  0x00000000006eaed5 in mysql_execute_command (thd=0x2a495000) at /home/sbester/build/bzr/mysql-trunk/sql/sql_parse.cc:2135
      #9  0x00000000006f44ca in mysql_parse (thd=0x2a495000, rawbuf=0x30d80060 "select * from innodb_trx", length=24, parser_state=0x35ea5540) at /home/sbester/build/bzr/mysql-trunk/sql/sql_parse.cc:5597
      #10 0x00000000006e864c in dispatch_command (command=COM_QUERY, thd=0x2a495000, packet=0x30bb4e31 "select * from innodb_trx", packet_length=24) at /home/sbester/build/bzr/mysql-trunk/sql/sql_parse.cc:1093
      #11 0x00000000006e785f in do_command (thd=0x2a495000) at /home/sbester/build/bzr/mysql-trunk/sql/sql_parse.cc:815
      #12 0x00000000006c18de in do_handle_one_connection (thd_arg=0x2a495000) at /home/sbester/build/bzr/mysql-trunk/sql/sql_connect.cc:771
      #13 0x00000000006c146f in handle_one_connection (arg=0x2a495000) at /home/sbester/build/bzr/mysql-trunk/sql/sql_connect.cc:707
      #14 0x00000030e1807761 in start_thread (arg=0x35ea6710) at pthread_create.c:301
      #15 0x00000030e14e14ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
      (gdb) frame 1
      #1  0x00000000009f3d7b in fill_innodb_trx_from_cache (cache=0x1462440, thd=0x2a495000, table=0x2a422500) at /home/sbester/build/bzr/mysql-trunk/storage/innobase/handler/i_s.cc:591
      591                                     row->trx_query_cs);
      (gdb) list
      586                     if (row->trx_query) {
      587                             /* store will do appropriate character set
      588                             conversion check */
      589                             fields[IDX_TRX_QUERY]->store(
      590                                     row->trx_query, strlen(row->trx_query),
      591                                     row->trx_query_cs);
      592                             fields[IDX_TRX_QUERY]->set_notnull();
      593                     } else {
      594                             fields[IDX_TRX_QUERY]->set_null();
      595                     }

    modified:
      storage/innodb_plugin/trx/trx0i_s.c
=== modified file 'storage/innobase/include/sync0arr.h'
--- a/storage/innobase/include/sync0arr.h	revid:vasil.dimov@stripped-odusry0rkbeenhd1
+++ b/storage/innobase/include/sync0arr.h	revid:marko.makela@strippedpqg52bf
@@ -93,10 +93,13 @@ sync_arr_wake_threads_if_sema_free(void)
 Prints warnings of long semaphore waits to stderr. */
 
 ibool
-sync_array_print_long_waits(void);
-/*=============================*/
-			/* out: TRUE if fatal semaphore wait threshold
-			was exceeded */
+sync_array_print_long_waits(
+/*========================*/
+				/* out: TRUE if fatal semaphore wait threshold
+				was exceeded */
+	os_thread_id_t*	waiter,	/* out: longest waiting thread */
+	const void**	sema)	/* out: longest-waited-for semaphore */
+	__attribute__((nonnull));
 /************************************************************************
 Validates the integrity of the wait array. Checks
 that the number of reserved cells equals the count variable. */

=== modified file 'storage/innobase/srv/srv0srv.c'
--- a/storage/innobase/srv/srv0srv.c	revid:vasil.dimov@oracle.com-20110328083412-odusry0rkbeenhd1
+++ b/storage/innobase/srv/srv0srv.c	revid:marko.makela@stripped0330112558-5qleclsijpqg52bf
@@ -2180,9 +2180,15 @@ srv_error_monitor_thread(
 			os_thread_create */
 {
 	/* number of successive fatal timeouts observed */
-	ulint	fatal_cnt	= 0;
-	dulint	old_lsn;
-	dulint	new_lsn;
+	ulint		fatal_cnt	= 0;
+	dulint		old_lsn;
+	dulint		new_lsn;
+	/* longest waiting thread for a semaphore */
+	os_thread_id_t	waiter		= os_thread_get_curr_id();
+	os_thread_id_t	old_waiter	= waiter;
+	/* the semaphore that is being waited for */
+	const void*	sema		= NULL;
+	const void*	old_sema	= NULL;
 
 	old_lsn = srv_start_lsn;
 
@@ -2224,10 +2230,11 @@ loop:
 	/* In case mutex_exit is not a memory barrier, it is
 	theoretically possible some threads are left waiting though
 	the semaphore is already released. Wake up those threads: */
-	
+
 	sync_arr_wake_threads_if_sema_free();
 
-	if (sync_array_print_long_waits()) {
+	if (sync_array_print_long_waits(&waiter, &sema)
+	    && sema == old_sema && os_thread_eq(waiter, old_waiter)) {
 		fatal_cnt++;
 		if (fatal_cnt > 10) {
 
@@ -2242,6 +2249,8 @@ loop:
 		}
 	} else {
 		fatal_cnt = 0;
+		old_waiter = waiter;
+		old_sema = sema;
 	}
 
 	/* Flush stderr so that a database user gets the output

=== modified file 'storage/innobase/sync/sync0arr.c'
--- a/storage/innobase/sync/sync0arr.c	revid:vasil.dimov@stripped2-odusry0rkbeenhd1
+++ b/storage/innobase/sync/sync0arr.c	revid:marko.makela@strippedg52bf
@@ -916,10 +916,12 @@ sync_arr_wake_threads_if_sema_free(void)
 Prints warnings of long semaphore waits to stderr. */
 
 ibool
-sync_array_print_long_waits(void)
-/*=============================*/
-			/* out: TRUE if fatal semaphore wait threshold
-			was exceeded */
+sync_array_print_long_waits(
+/*========================*/
+				/* out: TRUE if fatal semaphore wait threshold
+				was exceeded */
+	os_thread_id_t*	waiter,	/* out: longest waiting thread */
+	const void**	sema)	/* out: longest-waited-for semaphore */
 {
 	sync_cell_t*	cell;
 	ibool		old_val;
@@ -927,24 +929,40 @@ sync_array_print_long_waits(void)
 	ulint		i;
 	ulint		fatal_timeout = srv_fatal_semaphore_wait_threshold;
 	ibool		fatal = FALSE;
+	double		longest_diff = 0;
 
 	for (i = 0; i < sync_primary_wait_array->n_cells; i++) {
 
+		double	diff;
+		void*	wait_object;
+
 		cell = sync_array_get_nth_cell(sync_primary_wait_array, i);
 
-		if (cell->wait_object != NULL && cell->waiting
-		    && difftime(time(NULL), cell->reservation_time) > 240) {
+		wait_object = cell->wait_object;
+
+		if (wait_object == NULL || !cell->waiting) {
+
+			continue;
+		}
+
+		diff = difftime(time(NULL), cell->reservation_time);
+
+		if (diff > 240) {
 			fputs("InnoDB: Warning: a long semaphore wait:\n",
 			      stderr);
 			sync_array_cell_print(stderr, cell);
 			noticed = TRUE;
 		}
 
-		if (cell->wait_object != NULL && cell->waiting
-		    && difftime(time(NULL), cell->reservation_time)
-		    > fatal_timeout) {
+		if (diff > fatal_timeout) {
 			fatal = TRUE;
 		}
+
+		if (diff > longest_diff) {
+			longest_diff = diff;
+			*sema = wait_object;
+			*waiter = cell->thread;
+		}
 	}
 
 	if (noticed) {

=== modified file 'storage/innodb_plugin/ChangeLog'
--- a/storage/innodb_plugin/ChangeLog	revid:vasil.dimov@strippedsry0rkbeenhd1
+++ b/storage/innodb_plugin/ChangeLog	revid:marko.makela@stripped
@@ -1,3 +1,8 @@
+2011-03-30	The InnoDB Team
+
+	* srv/srv0srv.c, sync/sync0arr.h, sync/sync0arr.c:
+	Fix Bug#11877216 InnoDB too eager to commit suicide on a busy server
+
 2011-03-15	The InnoDB Team
 
 	* btr/btr0cur.c, page/page0zip.c:

=== modified file 'storage/innodb_plugin/include/sync0arr.h'
--- a/storage/innodb_plugin/include/sync0arr.h	revid:vasil.dimov@strippedkbeenhd1
+++ b/storage/innodb_plugin/include/sync0arr.h	revid:marko.makela@stripped2bf
@@ -115,8 +115,11 @@ Prints warnings of long semaphore waits
 @return	TRUE if fatal semaphore wait threshold was exceeded */
 UNIV_INTERN
 ibool
-sync_array_print_long_waits(void);
-/*=============================*/
+sync_array_print_long_waits(
+/*========================*/
+	os_thread_id_t*	waiter,	/*!< out: longest waiting thread */
+	const void**	sema)	/*!< out: longest-waited-for semaphore */
+	__attribute__((nonnull));
 /********************************************************************//**
 Validates the integrity of the wait array. Checks
 that the number of reserved cells equals the count variable. */

=== modified file 'storage/innodb_plugin/srv/srv0srv.c'
--- a/storage/innodb_plugin/srv/srv0srv.c	revid:vasil.dimov@oracle.com-20110328083412-odusry0rkbeenhd1
+++ b/storage/innodb_plugin/srv/srv0srv.c	revid:marko.makela@stripped-20110330112558-5qleclsijpqg52bf
@@ -2236,6 +2236,12 @@ srv_error_monitor_thread(
 	ulint		fatal_cnt	= 0;
 	ib_uint64_t	old_lsn;
 	ib_uint64_t	new_lsn;
+	/* longest waiting thread for a semaphore */
+	os_thread_id_t	waiter		= os_thread_get_curr_id();
+	os_thread_id_t	old_waiter	= waiter;
+	/* the semaphore that is being waited for */
+	const void*	sema		= NULL;
+	const void*	old_sema	= NULL;
 
 	old_lsn = srv_start_lsn;
 
@@ -2284,7 +2290,8 @@ loop:
 
 	sync_arr_wake_threads_if_sema_free();
 
-	if (sync_array_print_long_waits()) {
+	if (sync_array_print_long_waits(&waiter, &sema)
+	    && sema == old_sema && os_thread_eq(waiter, old_waiter)) {
 		fatal_cnt++;
 		if (fatal_cnt > 10) {
 
@@ -2299,6 +2306,8 @@ loop:
 		}
 	} else {
 		fatal_cnt = 0;
+		old_waiter = waiter;
+		old_sema = sema;
 	}
 
 	/* Flush stderr so that a database user gets the output

=== modified file 'storage/innodb_plugin/sync/sync0arr.c'
--- a/storage/innodb_plugin/sync/sync0arr.c	revid:vasil.dimov@strippedd1
+++ b/storage/innodb_plugin/sync/sync0arr.c	revid:marko.makela@stripped
@@ -914,8 +914,10 @@ Prints warnings of long semaphore waits
 @return	TRUE if fatal semaphore wait threshold was exceeded */
 UNIV_INTERN
 ibool
-sync_array_print_long_waits(void)
-/*=============================*/
+sync_array_print_long_waits(
+/*========================*/
+	os_thread_id_t*	waiter,	/*!< out: longest waiting thread */
+	const void**	sema)	/*!< out: longest-waited-for semaphore */
 {
 	sync_cell_t*	cell;
 	ibool		old_val;
@@ -923,24 +925,40 @@ sync_array_print_long_waits(void)
 	ulint		i;
 	ulint		fatal_timeout = srv_fatal_semaphore_wait_threshold;
 	ibool		fatal = FALSE;
+	double		longest_diff = 0;
 
 	for (i = 0; i < sync_primary_wait_array->n_cells; i++) {
 
+		double	diff;
+		void*	wait_object;
+
 		cell = sync_array_get_nth_cell(sync_primary_wait_array, i);
 
-		if (cell->wait_object != NULL && cell->waiting
-		    && difftime(time(NULL), cell->reservation_time) > 240) {
+		wait_object = cell->wait_object;
+
+		if (wait_object == NULL || !cell->waiting) {
+
+			continue;
+		}
+
+		diff = difftime(time(NULL), cell->reservation_time);
+
+		if (diff > 240) {
 			fputs("InnoDB: Warning: a long semaphore wait:\n",
 			      stderr);
 			sync_array_cell_print(stderr, cell);
 			noticed = TRUE;
 		}
 
-		if (cell->wait_object != NULL && cell->waiting
-		    && difftime(time(NULL), cell->reservation_time)
-		    > fatal_timeout) {
+		if (diff > fatal_timeout) {
 			fatal = TRUE;
 		}
+
+		if (diff > longest_diff) {
+			longest_diff = diff;
+			*sema = wait_object;
+			*waiter = cell->thread;
+		}
 	}
 
 	if (noticed) {

Attachment: [text/bzr-bundle] bzr/marko.makela@oracle.com-20110330112558-5qleclsijpqg52bf.bundle
Thread
bzr push into mysql-5.1-innodb branch (marko.makela:3718 to 3719)Bug#11877216marko.makela30 Mar