List:Internals« Previous MessageNext Message »
From:Kristian Nielsen Date:May 4 2010 2:32pm
Subject:error handling for MySQL binary log
View as plain text  
Hi,

While working on the binlog code in sql/log.cc, I discovered what I think is a
number of problems with the error handling in the code that writes events to
the binary log. At this point I do not have much in the way of concrete
suggestions for how to improve this (sorry!), but I thought it might be useful
to at least mention the issues, rather than just ignore it and let it be
forgotten until someone else hits the same issues (and maybe someone can
correct me if I misunderstood the code).

Suppose that mysqld gets an error writing to the binlog file. Eg. I tested
with an error insert like this in sql/log.cc MYSQL_BIN_LOG::write_cache():

     }
 
     /* Write data to the binary log file */
+    DBUG_EXECUTE_IF("fail_binlog_write_1", return ER_ERROR_ON_WRITE;);
     if (my_b_write(&log_file, cache->read_pos, length))
       return ER_ERROR_ON_WRITE;
     cache->read_pos=cache->read_end;		// Mark buffer used up

With this, I found a number of problems/bugs in the code:


1. A debug built server crashes with an assertion in net_end_statement()
because the binlog code causes the statement to fail without setting an
error status with my_error() or similar. The reason is this statement:

      if ((write_error= write_cache(cache, false, false)))
         goto err;

write_cache does not set any error status with my_error(), and nor does the
code at the err: label, due to assigning to write_error (write_error seems to
be a flag that disables further error reporting, to prevent reporting write
errors multiple times). So the statement fails without setting any error code.

Also, this is the only place that write_error can ever be cleared, which seems
wrong. My guess is that the write_error stuff is some old code that is no
longer fully working...


2. MYSQL_BIN_LOG::write(THD *, IO_CACHE *, Log_event *_event, bool) and other
places in the code reports errors with sql_print_error() instead of my_error()
or similar. Again this causes the assertion in net_end_statement(), as the
current SQL statement fails but no error status is set.


3. If writing the binlog fails for an InnoDB transaction, what happens is that
first innobase_prepare() is called, which logs the prepare_commit_mutex, then
binlogging is tried and fails, then innobase_rollback() is called to roll back
the transaction due to the failure. But innobase_rollback() is missing code to
unlock the prepare_commit_mutex, so we get an assertion about locking an
already locked mutex in the next transaction.

Eg. innobase_rollback() needs code like this:

    if (trx->active_trans == 2) {

            pthread_mutex_unlock(&prepare_commit_mutex);
    }


4. When MYSQL_BIN_LOG::write_cache() bails out due to error, it does not get
to this code near the end:

    cache->read_pos=cache->read_end;		// Mark buffer used up

Because of this, binlog_trx_data::empty() will return true, and
binlog_trx_data::reset() will not call binlog_trx_data::truncate() to
re-initialise the transaction cache. This means that after an error,
binlogging for subsequent transactions in the same session will just silently
be ignored, as the transaction cache is in READ_CACHE mode rather than
WRITE_CACHE.

It seems the logic in is a bit flawed, and it is kind of by accident that the
transaction cache ends up with empty() returning false when write_cache()
succeeds.


Unless I am missing something, it seems that currently the server is not
really able to handle/recover from a failure to write the binary log...

Hope this helps,

 - Kristian.
Thread
error handling for MySQL binary logKristian Nielsen4 May
  • Re: error handling for MySQL binary logMARK CALLAGHAN4 May
  • Re: error handling for MySQL binary logStewart Smith5 May
  • Re: error handling for MySQL binary logHe Zhenxing28 May