From: Kristian Nielsen Date: May 4 2010 2:32pm Subject: error handling for MySQL binary log List-Archive: http://lists.mysql.com/internals/37894 Message-Id: <878w7zenzq.fsf@knielsen-hq.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Hi, While working on the binlog code in sql/log.cc, I discovered what I think is a number of problems with the error handling in the code that writes events to the binary log. At this point I do not have much in the way of concrete suggestions for how to improve this (sorry!), but I thought it might be useful to at least mention the issues, rather than just ignore it and let it be forgotten until someone else hits the same issues (and maybe someone can correct me if I misunderstood the code). Suppose that mysqld gets an error writing to the binlog file. Eg. I tested with an error insert like this in sql/log.cc MYSQL_BIN_LOG::write_cache(): } /* Write data to the binary log file */ + DBUG_EXECUTE_IF("fail_binlog_write_1", return ER_ERROR_ON_WRITE;); if (my_b_write(&log_file, cache->read_pos, length)) return ER_ERROR_ON_WRITE; cache->read_pos=cache->read_end; // Mark buffer used up With this, I found a number of problems/bugs in the code: 1. A debug built server crashes with an assertion in net_end_statement() because the binlog code causes the statement to fail without setting an error status with my_error() or similar. The reason is this statement: if ((write_error= write_cache(cache, false, false))) goto err; write_cache does not set any error status with my_error(), and nor does the code at the err: label, due to assigning to write_error (write_error seems to be a flag that disables further error reporting, to prevent reporting write errors multiple times). So the statement fails without setting any error code. Also, this is the only place that write_error can ever be cleared, which seems wrong. My guess is that the write_error stuff is some old code that is no longer fully working... 2. MYSQL_BIN_LOG::write(THD *, IO_CACHE *, Log_event *_event, bool) and other places in the code reports errors with sql_print_error() instead of my_error() or similar. Again this causes the assertion in net_end_statement(), as the current SQL statement fails but no error status is set. 3. If writing the binlog fails for an InnoDB transaction, what happens is that first innobase_prepare() is called, which logs the prepare_commit_mutex, then binlogging is tried and fails, then innobase_rollback() is called to roll back the transaction due to the failure. But innobase_rollback() is missing code to unlock the prepare_commit_mutex, so we get an assertion about locking an already locked mutex in the next transaction. Eg. innobase_rollback() needs code like this: if (trx->active_trans == 2) { pthread_mutex_unlock(&prepare_commit_mutex); } 4. When MYSQL_BIN_LOG::write_cache() bails out due to error, it does not get to this code near the end: cache->read_pos=cache->read_end; // Mark buffer used up Because of this, binlog_trx_data::empty() will return true, and binlog_trx_data::reset() will not call binlog_trx_data::truncate() to re-initialise the transaction cache. This means that after an error, binlogging for subsequent transactions in the same session will just silently be ignored, as the transaction cache is in READ_CACHE mode rather than WRITE_CACHE. It seems the logic in is a bit flawed, and it is kind of by accident that the transaction cache ends up with empty() returning false when write_cache() succeeds. Unless I am missing something, it seems that currently the server is not really able to handle/recover from a failure to write the binary log... Hope this helps, - Kristian.