From: MARK CALLAGHAN Date: May 4 2010 4:13pm Subject: Re: error handling for MySQL binary log List-Archive: http://lists.mysql.com/internals/37895 Message-Id: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Update function declarations with "__attribute__((warn_unused_result)" and you might find lots of interesting things. http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Function-Attributes.html#index-= g_t_0040code_007bwarn_005funused_005fresult_007d-attribute-2445 See these for examples: http://bugs.mysql.com/bug.php?id=3D25058 http://bugs.mysql.com/bug.php?id=3D40492 http://bugs.mysql.com/bug.php?id=3D37148 One of the problems was that callers to mysql_bin_log.write() did not check for an error. That appears to have been fixed. I assume a lot more work remains as there are only two references to "warn_unused_result" in 5.1 source and MySQL still generates a lot of warnings during compilation. There was a bug a few years ago in the replication code that caused a lot of downtime for us from code that did not check the return value for a function that failed when unable to allocate memory. We messed up by getting the server to run out of memory. But the silent corruption that followed was a serious pain for us and one of the bugs is still open -- http://bugs.mysql.com/bug.php?id=3D40492. On Tue, May 4, 2010 at 7:32 AM, Kristian Nielsen wrote: > Hi, > > While working on the binlog code in sql/log.cc, I discovered what I think= is a > number of problems with the error handling in the code that writes events= to > the binary log. At this point I do not have much in the way of concrete > suggestions for how to improve this (sorry!), but I thought it might be u= seful > to at least mention the issues, rather than just ignore it and let it be > forgotten until someone else hits the same issues (and maybe someone can > correct me if I misunderstood the code). > > Suppose that mysqld gets an error writing to the binlog file. Eg. I teste= d > with an error insert like this in sql/log.cc MYSQL_BIN_LOG::write_cache()= : > > =A0 =A0 } > > =A0 =A0 /* Write data to the binary log file */ > + =A0 =A0DBUG_EXECUTE_IF("fail_binlog_write_1", return ER_ERROR_ON_WRITE;= ); > =A0 =A0 if (my_b_write(&log_file, cache->read_pos, length)) > =A0 =A0 =A0 return ER_ERROR_ON_WRITE; > =A0 =A0 cache->read_pos=3Dcache->read_end; =A0 =A0 =A0 =A0 =A0 // Mark bu= ffer used up > > With this, I found a number of problems/bugs in the code: > > > 1. A debug built server crashes with an assertion in net_end_statement() > because the binlog code causes the statement to fail without setting an > error status with my_error() or similar. The reason is this statement: > > =A0 =A0 =A0if ((write_error=3D write_cache(cache, false, false))) > =A0 =A0 =A0 =A0 goto err; > > write_cache does not set any error status with my_error(), and nor does t= he > code at the err: label, due to assigning to write_error (write_error seem= s to > be a flag that disables further error reporting, to prevent reporting wri= te > errors multiple times). So the statement fails without setting any error = code. > > Also, this is the only place that write_error can ever be cleared, which = seems > wrong. My guess is that the write_error stuff is some old code that is no > longer fully working... > > > 2. MYSQL_BIN_LOG::write(THD *, IO_CACHE *, Log_event *_event, bool) and o= ther > places in the code reports errors with sql_print_error() instead of my_er= ror() > or similar. Again this causes the assertion in net_end_statement(), as th= e > current SQL statement fails but no error status is set. > > > 3. If writing the binlog fails for an InnoDB transaction, what happens is= that > first innobase_prepare() is called, which logs the prepare_commit_mutex, = then > binlogging is tried and fails, then innobase_rollback() is called to roll= back > the transaction due to the failure. But innobase_rollback() is missing co= de to > unlock the prepare_commit_mutex, so we get an assertion about locking an > already locked mutex in the next transaction. > > Eg. innobase_rollback() needs code like this: > > =A0 =A0if (trx->active_trans =3D=3D 2) { > > =A0 =A0 =A0 =A0 =A0 =A0pthread_mutex_unlock(&prepare_commit_mutex); > =A0 =A0} > > > 4. When MYSQL_BIN_LOG::write_cache() bails out due to error, it does not = get > to this code near the end: > > =A0 =A0cache->read_pos=3Dcache->read_end; =A0 =A0 =A0 =A0 =A0 =A0// Mark = buffer used up > > Because of this, binlog_trx_data::empty() will return true, and > binlog_trx_data::reset() will not call binlog_trx_data::truncate() to > re-initialise the transaction cache. This means that after an error, > binlogging for subsequent transactions in the same session will just sile= ntly > be ignored, as the transaction cache is in READ_CACHE mode rather than > WRITE_CACHE. > > It seems the logic in is a bit flawed, and it is kind of by accident that= the > transaction cache ends up with empty() returning false when write_cache() > succeeds. > > > Unless I am missing something, it seems that currently the server is not > really able to handle/recover from a failure to write the binary log... > > Hope this helps, > > =A0- Kristian. > > -- > MySQL Internals Mailing List > For list archives: http://lists.mysql.com/internals > To unsubscribe: =A0 =A0http://lists.mysql.com/internals?unsub=3Dmdcallag@= gmail.com > > --=20 Mark Callaghan mdcallag@stripped