Hi Luis,
Nice work. Patch not approved yet.
See some requests in what follows:
1 - Check why the rpl_temporary_errors are failing as follow:
#0 0x006b7422 in __kernel_vsyscall ()
#1 0x004a1e93 in __pthread_kill (threadid=3077912256, signo=6) at
../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:64
#2 0x0871f299 in my_write_core (sig=6) at stacktrace.c:329
#3 0x082bd594 in handle_segfault (sig=6) at mysqld.cc:2571
#4 <signal handler called>
#5 0x006b7422 in __kernel_vsyscall ()
#6 0x0073d4d1 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#7 0x00740932 in *__GI_abort () at abort.c:92
#8 0x00773ee5 in __libc_message (do_abort=2, fmt=0x837578 "*** glibc detected *** %s: %s:
0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#9 0x0077dff1 in malloc_printerr (action=<value optimized out>, str=0x6 <Address
0x6 out of bounds>, ptr=0xa4e6c48) at malloc.c:6217
#10 0x0077f6f2 in _int_free (av=<value optimized out>, p=<value optimized
out>) at malloc.c:4750
#11 0x007827cd in *__GI___libc_free (mem=0xa4e6c48) at malloc.c:3716
#12 0x0872ee3e in FreeList (linkp=0x0) at dbug.c:1736
#13 0x0872e946 in FreeState (cs=0xa4e43f0, state=0x8a49020, free_state=0) at dbug.c:1475
#14 0x0872ebcb in _db_end_ () at dbug.c:1542
#15 0x086fd4cb in my_end (infoflag=3) at my_init.c:220
#16 0x082c0d28 in main (argc=10, argv=0xbfa7c364) at mysqld.cc:4579
Notice that I did not check if this is due to your patch.
2 - Please, fix the function find_uniq_filename(char *name)
which is not checking if anything bad is happening.
3 - I did not follow which functions can fail in the
find_uniq_filename(char *name) and how they fail. So, verify
if it is necessary to introduce an error in the diagnostic area.
DBUG_EXECUTE_IF("error_unique_log_filename", DBUG_RETURN(1););
4 - Please, remember to remove the following code from >= 5.5
because it is not true after WL#2687:
+ if (error)
+ {
+ /**
+ Be conservative... There are possible lost events (eg,
+ failing to log the Execute_load_query_log_event
+ on a LOAD DATA while using a non-transactional
+ table)!
+ */
+ Incident incident= INCIDENT_LOST_EVENTS;
+ Incident_log_event ev(current_thd, incident);
+ /**
+ We give it a shot and try to write an incident event anyway
+ to the current log.
+ */
+ (void) ev.write(&log_file);
+
+ goto end;
+ }
6 - Please, check the error code of the following function and augment the
test case to take this error into account:
purge_logs_before_date(purge_time);
7 - Add doxygen
8 - Have you checked with the runtime team if they are ok with the following change:
+++ sql/handler.cc 2010-04-28 22:22:25 +0000
@@ -1065,6 +1065,8 @@
1 transaction was rolled back
@retval
2 error during commit, data may be inconsistent
+ @retval
+ 3 error during unlog
9 - Please, print the content of the index file for the following changes:
--- mysql-test/suite/binlog/t/binlog_index.test 2009-12-08 16:03:19 +0000
+++ mysql-test/suite/binlog/t/binlog_index.test 2010-04-28 22:22:25 +0000
@@ -205,11 +205,11 @@
--echo # This should put the server in unsafe state and stop
--echo # accepting any command. If we inject a fault at this
--echo # point and continue the execution the server crashes.
---echo # Besides the flush command does not report an error.
--echo #
--echo # fault_injection_registering_index
----> Print it here
SET SESSION debug="+d,fault_injection_registering_index";
+-- error ER_CANT_OPEN_FILE
flush logs;
----> Print it here
--source include/restart_mysqld.inc
@@ -221,6 +221,7 @@
--echo # fault_injection_updating_index
----> Print it here
SET SESSION debug="+d,fault_injection_updating_index";
+-- error ER_CANT_OPEN_FILE
flush logs;
----> Print it here
--source include/restart_mysqld.inc
10 - I did not carefully check your test case to see if it covers
all the changes. I will do so in the next round. However, why don't
you combine it with binlog_index.test or vice-versa?
Cheers.
Luis Soares wrote:
> #At file:///home/lsoares/Workspace/bzr/work/bugfixing/46166/mysql-5.1-bugteam/ based
> on revid:kristofer.pettersson@stripped
>
> 3356 Luis Soares 2010-04-26
> BUG#46166: MYSQL_BIN_LOG::new_file_impl is not propagating error
> when generating new name.
>
> If find_uniq_filename returns an error, then this error is not
> being propagated upwards, and execution does not report error
> to the user (although a entry in the error log is generated).
>
> Additionally, some more errors were ignored in new_file_impl:
> - when writing the rotate event
> - when reopening the index and binary log file
>
> This patch addresses this by propagating the error up in the
> execution stack. Furthermore, when rotation of the binary log
> fails, an incident event is written, because there may be a
> chance that some changes for a given statement, were not properly
> logged. For example, in SBR, LOAD DATA INFILE statement requires
> more than one event to be logged, should rotation fail while
> logging part of the LOAD DATA events, then the logged data would
> become inconsistent with the data in the storage engine.
> @ mysql-test/suite/binlog/t/binlog_index.test
> The error on open of index and binary log on new_file_impl is now
> caught. Thence the user will get an error message. We need to
> accomodate this change in the test case for the failing FLUSH LOGS.
> @ mysql-test/suite/rpl/t/rpl_binlog_errors-master.opt
> Sets max_binlog_size to 4096.
> @ mysql-test/suite/rpl/t/rpl_binlog_errors.test
> Added a test case for asserting that the error is found and
> reported.
> @ sql/handler.cc
> Catching error now returned by unlog (in ha_commit_trans) and
> returning it.
> @ sql/log.cc
> Propagating errors from new_file_impl upwards. The errors that
> new_file_impl catches now are:
> - error on generate_new_name
> - error on writing the rotate event
> - error when opening the index or the binary log file.
> @ sql/log.h
> Changing declaration of:
> - rotate_and_purge
> - new_file
> - new_file_without_locking
> - new_file_impl
> - unlog
>
> They now return int instead of void.
> @ sql/rpl_injector.cc
> Changes to catch the return from rotate_and_purge.
> @ sql/slave.cc
> Changes to catch the return values for new_file and rotate_relay_log.
> @ sql/slave.h
> Changes to rotate_relay_log declaration (now returns int
> instead of void).
> @ sql/sql_load.cc
> In SBR, some logging of LOAD DATA events goes through
> IO_CACHE_CALLBACK invocation at mf_iocache.c:_my_b_get. The
> IO_CACHE implementation is ignoring the return value for from
> these callbacks (pre_read and post_read), so we need to find
> out at the end of the execution if the error is set or not
> in THD.
> @ sql/sql_parse.cc
> Catching the rotate_relay_log and rotate_and_purge return values.
>
> added:
> mysql-test/suite/rpl/r/rpl_binlog_errors.result
> mysql-test/suite/rpl/t/rpl_binlog_errors-master.opt
> mysql-test/suite/rpl/t/rpl_binlog_errors.test
> modified:
> mysql-test/suite/binlog/r/binlog_index.result
> mysql-test/suite/binlog/t/binlog_index.test
> sql/handler.cc
> sql/log.cc
> sql/log.h
> sql/rpl_injector.cc
> sql/slave.cc
> sql/slave.h
> sql/sql_load.cc
> sql/sql_parse.cc