Hi Sven!
I have some minor comments, but after than the patch is approved. You
can just handle the items below, and then push.
Just my few cents,
Mats Kindahl
Sven Sandberg wrote:
> Below is the list of changes that have just been committed into a local
> 5.1 repository of sven. When sven does a push these changes will
> be propagated to the main repository and, within 24 hours after the
> push, to the public repository.
> For information on how to access the public repository
> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>
> ChangeSet@stripped, 2007-12-13 19:56:17+01:00, sven@riska.(none) +21 -0
> BUG#26395: if crash during autocommit update to transactional table on master,
> slave fails
> Now, every transaction (including autocommit transactions) start with
> a BEGIN and end with a COMMIT/ROLLBACK in the binlog.
> Added a test case, and updated lots of test case result files.
>
> mysql-test/r/multi_update.result@stripped, 2007-12-13 19:56:12+01:00, sven@riska.(none)
> +2 -2
> Updated result file
>
> mysql-test/r/sp_trans_log.result@stripped, 2007-12-13 19:56:12+01:00, sven@riska.(none)
> +1 -0
> Updated result file
>
> mysql-test/suite/binlog/r/binlog_innodb.result@stripped, 2007-12-13 19:56:12+01:00,
> sven@riska.(none) +1 -0
> Updated result file
>
> mysql-test/suite/binlog/r/binlog_multi_engine.result@stripped, 2007-12-13
> 19:56:12+01:00, sven@riska.(none) +8 -0
> Updated result file
>
> mysql-test/suite/binlog/r/binlog_stm_blackhole.result@stripped, 2007-12-13
> 19:56:12+01:00, sven@riska.(none) +16 -0
> Updated result file
>
> mysql-test/suite/binlog/r/binlog_stm_mix_innodb_myisam.result@stripped, 2007-12-13
> 19:56:12+01:00, sven@riska.(none) +15 -0
> Updated result file
>
> mysql-test/suite/ndb/r/ndb_binlog_format.result@stripped, 2007-12-13 19:56:12+01:00,
> sven@riska.(none) +4 -0
> Updated result file
>
> mysql-test/suite/rpl/r/rpl_innodb_mixed_dml.result@stripped, 2007-12-13 19:56:12+01:00,
> sven@riska.(none) +71 -0
> Updated result file
>
> mysql-test/suite/rpl/r/rpl_row_charset_innodb.result@stripped, 2007-12-13
> 19:56:12+01:00, sven@riska.(none) +12 -0
> Updated result file
>
> mysql-test/suite/rpl/r/rpl_row_create_table.result@stripped, 2007-12-13 19:56:12+01:00,
> sven@riska.(none) +20 -19
> Updated result file
>
> mysql-test/suite/rpl/r/rpl_row_log_innodb.result@stripped, 2007-12-13 19:56:12+01:00,
> sven@riska.(none) +16 -8
> Updated result file
>
> mysql-test/suite/rpl/r/rpl_switch_stm_row_mixed.result@stripped, 2007-12-13
> 19:56:12+01:00, sven@riska.(none) +12 -0
> Updated result file
>
> mysql-test/suite/rpl/r/rpl_truncate_3innodb.result@stripped, 2007-12-13 19:56:12+01:00,
> sven@riska.(none) +12 -0
> Updated result file
>
> mysql-test/suite/rpl/t/rpl_row_create_table.test@stripped, 2007-12-13 19:56:12+01:00,
> sven@riska.(none) +4 -4
> Updated result file
>
> mysql-test/suite/rpl_ndb/r/rpl_ndb_stm_innodb.result@stripped, 2007-12-13
> 19:56:12+01:00, sven@riska.(none) +2 -2
> Updated result file
>
> mysql-test/suite/rpl_ndb/r/rpl_ndb_transaction.result@stripped, 2007-12-13
> 19:56:13+01:00, sven@riska.(none) +101 -0
> Results for new test case
>
> mysql-test/suite/rpl_ndb/r/rpl_ndb_transaction.result@stripped, 2007-12-13
> 19:56:13+01:00, sven@riska.(none) +0 -0
>
> mysql-test/suite/rpl_ndb/t/rpl_ndb_transaction-master.opt@stripped, 2007-12-13
> 19:56:13+01:00, sven@riska.(none) +1 -0
> Options for new test case
>
> mysql-test/suite/rpl_ndb/t/rpl_ndb_transaction-master.opt@stripped, 2007-12-13
> 19:56:13+01:00, sven@riska.(none) +0 -0
>
> mysql-test/suite/rpl_ndb/t/rpl_ndb_transaction-slave.opt@stripped, 2007-12-13
> 19:56:13+01:00, sven@riska.(none) +1 -0
> Options for new test case
>
> mysql-test/suite/rpl_ndb/t/rpl_ndb_transaction-slave.opt@stripped, 2007-12-13
> 19:56:13+01:00, sven@riska.(none) +0 -0
>
> mysql-test/suite/rpl_ndb/t/rpl_ndb_transaction.test@stripped, 2007-12-13 19:56:13+01:00,
> sven@riska.(none) +123 -0
> Added new test case.
>
> mysql-test/suite/rpl_ndb/t/rpl_ndb_transaction.test@stripped, 2007-12-13 19:56:13+01:00,
> sven@riska.(none) +0 -0
>
> sql/log.cc@stripped, 2007-12-13 19:56:13+01:00, sven@riska.(none) +65 -29
> - Always write BEGIN and COMMIT around statements, even in autocommit
> mode.
> - Added comments for binlog_commit and binlog_rollback.
>
> sql/log_event.cc@stripped, 2007-12-13 19:56:13+01:00, sven@riska.(none) +1 -0
> Added debug trigger to avoid writing xid events to the binlog.
>
[snip]
> diff -Nrup a/mysql-test/suite/rpl_ndb/t/rpl_ndb_transaction.test
> b/mysql-test/suite/rpl_ndb/t/rpl_ndb_transaction.test
> --- /dev/null Wed Dec 31 16:00:00 196900
> +++ b/mysql-test/suite/rpl_ndb/t/rpl_ndb_transaction.test 2007-12-13 19:56:13 +01:00
> @@ -0,0 +1,123 @@
> +# Tests that transactions are replicated correctly, with various
> +# combinations of non-transactional and transactional non-XA tables.
> +# Also tests that an XA transaction where the master crashes just
> +# before writing the XID log event is executed correctly. See below
> +# for implementation details.
> +#
> +# Written by Sven.
>
Skip your name, for the same reasons as in the previous patch (you don't
want to be associated with any changes done to this file later.)
> +
> +source include/ndb_master-slave.inc;
> +source include/have_ndb.inc;
> +
> +
> +CREATE TABLE tmyisam (a int) ENGINE = MYISAM;
> +CREATE TABLE tinnodb (a int) ENGINE = INNODB;
> +CREATE TABLE tndb (a int) ENGINE = NDB;
> +
> +SHOW CREATE TABLE tmyisam;
> +SHOW CREATE TABLE tinnodb;
> +SHOW CREATE TABLE tndb;
> +
> +
> +--echo ==== Test 1: Non-XA Engines ====
> +# Test that everything works fine with non-XA engines. We just try
> +# all ways to do transactions involving ndb and/or myisam, with
> +# rollback or commit.
> +
> +--echo --- on master ---
> +
> +SET AUTOCOMMIT = 1;
> +
> +INSERT INTO tndb VALUES (1);
> +INSERT INTO tmyisam VALUES (1);
> +
> +BEGIN;
> +INSERT INTO tndb VALUES (2);
> +INSERT INTO tndb VALUES (3);
> +COMMIT;
> +
> +BEGIN;
> +INSERT INTO tmyisam VALUES (2);
> +INSERT INTO tmyisam VALUES (3);
> +COMMIT;
> +
> +BEGIN;
> +INSERT INTO tndb VALUES (4);
> +INSERT INTO tmyisam VALUES (4);
> +COMMIT;
> +
> +BEGIN;
> +INSERT INTO tndb VALUES (5);
> +INSERT INTO tndb VALUES (6);
> +ROLLBACK;
> +
> +BEGIN;
> +INSERT INTO tmyisam VALUES (5);
> +INSERT INTO tmyisam VALUES (6);
> +--warning 1196
> +ROLLBACK;
> +
> +BEGIN;
> +INSERT INTO tndb VALUES (7);
> +INSERT INTO tmyisam VALUES (7);
> +--warning 1196
> +ROLLBACK;
> +
> +SELECT * FROM tndb ORDER BY a;
> +SELECT * FROM tmyisam ORDER BY a;
> +
> +--echo --- on slave ---
> +--sync_slave_with_master
> +SELECT * FROM tndb ORDER BY a;
> +SELECT * FROM tmyisam ORDER BY a;
> +
> +
> +--echo ==== Test 2: Master crash before writing XID event on XA engine ====
> +# We now want to test the following scenario, to verify that BUG#26395
> +# has been fixed:
> +
> +# "master and slave have a transactional table that uses XA. Master
> +# has AUTOCOMMIT on and executes a statement (in this case an
> +# INSERT). Master crashes just before writing the XID event."
> +
> +# In this scenario, master will roll back, so slave should not execute
> +# the statement, and slave should roll back later when master is
> +# restarted.
> +
> +# However, we the master to be alive so that we are sure it replicates
> +# the statement to the slave. So in the test case, we must therefore
> +# not crash the master. Instead, we fake the crash by just not writing
> +# the XID event to the binlog. This is done by the
> +# --debug=d,do_not_write_xid flag in the .opt file.
> +
> +# So, unlike if the master had crashed, the master *will* execute the
> +# statement. But the slave should not execute it. Hence, after the
> +# first test is executed, the expected result on master is a table
> +# with one row, and on slave a table with no rows.
> +
> +# To simulate the slave correctly, we wait until everything up to the
> +# XID is replicated. We cannot sync_slave_with_master, because that
> +# would wait for the transaction to end. Instead, we wait for
> +# "sufficiently long time". Then we stop the slave.
> +
> +# Note: since this puts the master binlog in an inconsistent state,
> +# this should be the last test of the file.
>
It is kind of a hack. We run the risk of this test failing in PB when
the "sufficiently long time" is not sufficient.
> +
> +--echo --- on master ---
> +--connection master
> +
> +INSERT INTO tinnodb VALUES (1);
> +SELECT * FROM tinnodb ORDER BY a;
> +
> +--echo --- on slave ---
> +--connection slave
> +--sleep 3
> +STOP SLAVE;
> +# the following statement should show that nothing has been replicated
> +SELECT * FROM tinnodb ORDER BY a;
>
Please show the status of the slave as well by including
"include/show_slave_status.inc". This will show that the slave has
executed all the events up to the non-existent XID event. This will give
us a result mismatch of the "sufficiently long time" is not sufficient.
The fact that it's hard to test these kinds of scenarios is a separate
problem.
> +
> +
> +# clean up
> +DROP TABLE tmyisam;
> +DROP TABLE tinnodb;
> +DROP TABLE tndb;
> diff -Nrup a/sql/log.cc b/sql/log.cc
> --- a/sql/log.cc 2007-11-14 11:01:42 +01:00
> +++ b/sql/log.cc 2007-12-13 19:56:13 +01:00
> @@ -1402,6 +1402,13 @@ static int binlog_prepare(handlerton *ht
> return 0;
> }
>
> +/**
> + This function is called when a transaction involving a transactional
> + table is committed.
> +
> + It has the responsibility to flush the transaction cache to the
> + binlog file. See handlerton::commit for more details.
> +*/
>
Add parameters and return value to the Doxygen comment.
Use @see handlerton::commit to reference another symbol in Doxygen.
> static int binlog_commit(handlerton *hton, THD *thd, bool all)
> {
> DBUG_ENTER("binlog_commit");
> @@ -1414,7 +1421,15 @@ static int binlog_commit(handlerton *hto
> trx_data->reset();
> DBUG_RETURN(0);
> }
> - if (all)
> + /*
> + Write commit event if at least one of the following holds:
> + - the user sends an explicit COMMIT; or
> + - (we are not inside a BEGIN) and (the autocommit flag is on).
>
Strange use of parentheses. This is plain English, not "Attempto
Controlled English".
Minor issue, fix it if you feel inclined.
> + However, if the user has not sent an explicit COMMIT, and we are
> + either inside a BEGIN or run with autocommit off, then this is not
> + the end of a transaction and we should not write a commit event.
> + */
> + if (all || !(thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN)))
> {
> Query_log_event qev(thd, STRING_WITH_LEN("COMMIT"), TRUE, FALSE);
> qev.error_code= 0; // see comment in MYSQL_LOG::write(THD, IO_CACHE)
> @@ -1428,6 +1443,15 @@ static int binlog_commit(handlerton *hto
> }
> }
>
> +/**
> + This function is called when a transaction involving a transactional
> + table is rolled back.
> +
> + It has the responsibility to flush the transaction cache to the
> + binlog file. However, if the transaction does not involve
> + non-transactional tables, nothing needs to be logged. See
> + handlerton::rollback for more details.
> +*/
>
Add parameters and return value to the Doxygen comment. Use @see for
referencing other symbols in Doxygen.
> static int binlog_rollback(handlerton *hton, THD *thd, bool all)
> {
> DBUG_ENTER("binlog_rollback");
> @@ -3967,9 +3991,11 @@ int MYSQL_BIN_LOG::write_cache(IO_CACHE
> IMPLEMENTATION
> - To support transaction over replication, we wrap the transaction
> with BEGIN/COMMIT or BEGIN/ROLLBACK in the binary log.
> - We want to write a BEGIN/ROLLBACK block when a non-transactional table
> - was updated in a transaction which was rolled back. This is to ensure
> - that the same updates are run on the slave.
> + If a transaction that only involves transactional tables is
> + rolled back, we do not binlog it. However, we write a
> + BEGIN/ROLLBACK block when a non-transactional table was updated
> + in a transaction which was rolled back. This is to ensure that
> + the same updates are run on the slave.
> */
>
> bool MYSQL_BIN_LOG::write(THD *thd, IO_CACHE *cache, Log_event *commit_event)
> @@ -3990,32 +4016,42 @@ bool MYSQL_BIN_LOG::write(THD *thd, IO_C
> if (my_b_tell(cache) > 0)
> {
> /*
> - Log "BEGIN" at the beginning of the transaction.
> - which may contain more than 1 SQL statement.
> + Log "BEGIN" at the beginning of every transaction. Here, a
> + transaction is either a BEGIN..COMMIT block or a single
> + statement in autocommit mode.
> */
> - if (thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN))
> - {
> - Query_log_event qinfo(thd, STRING_WITH_LEN("BEGIN"), TRUE, FALSE);
> - /*
> - Imagine this is rollback due to net timeout, after all statements of
> - the transaction succeeded. Then we want a zero-error code in BEGIN.
> - In other words, if there was a really serious error code it's already
> - in the statement's events, there is no need to put it also in this
> - internally generated event, and as this event is generated late it
> - would lead to false alarms.
> - This is safer than thd->clear_error() against kills at shutdown.
> - */
> - qinfo.error_code= 0;
> - /*
> - Now this Query_log_event has artificial log_pos 0. It must be adjusted
> - to reflect the real position in the log. Not doing it would confuse the
> - slave: it would prevent this one from knowing where he is in the
> - master's binlog, which would result in wrong positions being shown to
> - the user, MASTER_POS_WAIT undue waiting etc.
> - */
> - if (qinfo.write(&log_file))
> - goto err;
> - }
> + Query_log_event qinfo(thd, STRING_WITH_LEN("BEGIN"), TRUE, FALSE);
> + /*
> + Imagine this is rollback due to net timeout, after all
> + statements of the transaction succeeded. Then we want a
> + zero-error code in BEGIN. In other words, if there was a
> + really serious error code it's already in the statement's
> + events, there is no need to put it also in this internally
> + generated event, and as this event is generated late it would
> + lead to false alarms.
> +
> + This is safer than thd->clear_error() against kills at shutdown.
> + */
> + qinfo.error_code= 0;
>
Yes, BEGIN cannot fail. Maybe you should clear the error code before
constructing the BEGIN event instead, but I really have no strong
opinion on this. It seems to be just a copy and paste of the code above,
so just leave it as it is.
> + /*
> + Now this Query_log_event has artificial log_pos 0. It must be
> + adjusted to reflect the real position in the log. Not doing it
> + would confuse the slave: it would prevent this one from
> + knowing where he is in the master's binlog, which would result
> + in wrong positions being shown to the user, MASTER_POS_WAIT
> + undue waiting etc.
> + */
> + if (qinfo.write(&log_file))
> + goto err;
> +
> + DBUG_EXECUTE_IF("crash_before_writing_xid",
> + {
> + if ((write_error= write_cache(cache, false, true)))
> + DBUG_PRINT("error", ("error writing binlog cache: %d",
> + write_error));
> + DBUG_PRINT("warning", ("crashing before writing xid"));
>
The symbols in the DBUG_PRINT() should not be "warning" or "error". Use
"info" instead.
> + abort();
> + });
>
> if ((write_error= write_cache(cache, false, false)))
> goto err;
> diff -Nrup a/sql/log_event.cc b/sql/log_event.cc
> --- a/sql/log_event.cc 2007-11-28 12:11:13 +01:00
> +++ b/sql/log_event.cc 2007-12-13 19:56:13 +01:00
> @@ -4265,6 +4265,7 @@ Xid_log_event(const char* buf,
> #ifndef MYSQL_CLIENT
> bool Xid_log_event::write(IO_CACHE* file)
> {
> + DBUG_EXECUTE_IF("do_not_write_xid", return 0;);
> return write_header(file, sizeof(xid)) ||
> my_b_safe_write(file, (uchar*) &xid, sizeof(xid));
> }
>
>
--
Mats Kindahl
Lead Software Developer
Replication Team
MySQL AB, www.mysql.com