Andrei Elkin wrote:
> Mats, hello.
>
>
> I have agreed with many ideas this patch implemented.
> As this work is big enough so that it's naturally for me to have some
> questions.
>
>
>> Below is the list of changes that have just been committed into a local
>> 5.1 repository of mats. When mats does a push these changes
>> will be propagated to the main repository and, within 24 hours after the
>> push, to the public repository.
>> For information on how to access the public repository
>> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>>
>> ChangeSet@stripped, 2008-03-25 10:39:27+01:00, mats@mats-laptop.(none) +29 -0
>> BUG#29020 (Event results not correctly replicated to slave in RBR):
>>
>>
>
> Could you please state what is the problem of the bug.
> My version: it's lack of atomicity in binlogging the Table map and subjected
> to the event ordinary Rows_log_events.
>
OK.
>
>> Bug is fixed by writing non-transactional events to the transaction cache and
>> flushing the cache to the binary log at statement commit. To mimic the
> behavior
>> of normal statement-based replication, we flush the transaction cache in row-
>> based mode when there is no committed statements in the transaction cache,
>> which means we are committing the first one. This means that it will be
> written
>> to the binary log as a "mini-transaction" with just the rows for the
> statement.
>>
>
>
> As we talked about limitations on #rep yestarday I think you need to
> add here that the bug scenario will be possible if the server is
> compliled w/o transactional support.
>
OK. I'll mention it.
> I think you need to mention some facts about user visible behavior
> like possible appearence of redundant COMMIT/ROLLBACK in binlog
> (discussed further).
>
Well... I think not. This is the result of the patch Sven did and is not
really related to this patch (at least not very much).
[snip]
>> --- a/mysql-test/suite/rpl/r/rpl_switch_stm_row_mixed.result 2007-12-14 14:40:43
> +01:00
>> +++ b/mysql-test/suite/rpl/r/rpl_switch_stm_row_mixed.result 2008-03-25 10:39:23
> +01:00
>> @@ -425,673 +425,5 @@ INSERT INTO t13 VALUES (USER());
>> INSERT INTO t13 VALUES (my_user());
>> INSERT INTO t13 VALUES (CURRENT_USER());
>> INSERT INTO t13 VALUES (my_current_user());
>>
>
> I really appreciate that you are removing big results of show-binlog-events.
>
> Still, I think it's necessary to explain why `show binlog event' is safe to
> remove. Do we have any risk to loosing something in that dropped piece of
> results?
>
Well... the main reason is that we should not check the binlog in
replication tests, we should check that replication works. We have a
much smaller suite for checking that items go into the binary log in a
way that we expect, but if replication works correctly, it does not
matter what is in the binary log.
I can add a note that it was removed because we shouldn't rely on the
contents of the binary log in replication tests.
>
>
>> -show binlog events from <binlog_start>;
>> -Log_name Pos Event_type Server_id End_log_pos Info
>> -master-bin.000001 # Query # # drop database if exists mysqltest1
>> -master-bin.000001 # Query # # create database mysqltest1
>> -master-bin.000001 # Query # # use `mysqltest1`; CREATE TABLE t1 (a
> varchar(100))
>> -master-bin.000001 # Table_map # # table_id: # (mysqltest1.t1)
>>
> ...
>
>> -master-bin.000001 # Table_map # # table_id: # (mysqltest1.t11)
>> -master-bin.000001 # Write_rows # # table_id: # flags: STMT_END_F
>> -master-bin.000001 # Query # # use `mysqltest1`; INSERT INTO t11 VALUES('Careful
> With That Axe, Eugene')
>> -master-bin.000001 # Query # # use `mysqltest1`; DROP TABLE IF EXISTS t12
>> -master-bin.000001 # Query # # use `mysqltest1`; CREATE TABLE t12 (data LONG)
>> -master-bin.000001 # Table_map # # table_id: # (mysqltest1.t12)
>> -master-bin.000001 # Write_rows # # table_id: # flags: STMT_END_F
>> -master-bin.000001 # Query # # use `mysqltest1`; CREATE
> DEFINER=`root`@`localhost` FUNCTION my_user()
>>
>
>
>
>> -RETURNS CHAR(64)
>> -BEGIN
>> -DECLARE user CHAR(64);
>> -SELECT USER() INTO user;
>> -RETURN user;
>> -END
>> -master-bin.000001 # Query # # use `mysqltest1`; CREATE
> DEFINER=`root`@`localhost` FUNCTION my_current_user()
>> -RETURNS CHAR(64)
>> -BEGIN
>> -DECLARE user CHAR(64);
>> -SELECT CURRENT_USER() INTO user;
>> -RETURN user;
>> -END
>> -master-bin.000001 # Query # # use `mysqltest1`; DROP TABLE IF EXISTS t13
>> -master-bin.000001 # Query # # use `mysqltest1`; CREATE TABLE t13 (data
> CHAR(64))
>> -master-bin.000001 # Table_map # # table_id: # (mysqltest1.t13)
>> -master-bin.000001 # Write_rows # # table_id: # flags: STMT_END_F
>> -master-bin.000001 # Table_map # # table_id: # (mysqltest1.t13)
>> -master-bin.000001 # Write_rows # # table_id: # flags: STMT_END_F
>> -master-bin.000001 # Table_map # # table_id: # (mysqltest1.t13)
>> -master-bin.000001 # Write_rows # # table_id: # flags: STMT_END_F
>> -master-bin.000001 # Table_map # # table_id: # (mysqltest1.t13)
>> -master-bin.000001 # Write_rows # # table_id: # flags: STMT_END_F
>> drop database mysqltest1;
>> set global binlog_format =@my_binlog_format;
>>
>>
[snip]
> changes in the following hunk are safe while it's safe to remove
> invisible commit (discussed further).
>
>
>> @@ -1373,26 +1389,20 @@ binlog_end_trans(THD *thd, binlog_trx_da
>> inside a stored function.
>> */
>> thd->binlog_flush_pending_rows_event(TRUE);
>> +
>> + error= mysql_bin_log.write(thd, &trx_data->trans_log, end_ev);
>> + trx_data->reset();
>> +
>> /*
>> - We write the transaction cache to the binary log if either we're
>> - committing the entire transaction, or if we are doing an
>> - autocommit outside a transaction.
>> - */
>> - if (all || !(thd->options & (OPTION_BEGIN | OPTION_NOT_AUTOCOMMIT)))
>> + We need to step the table map version after writing the
>> + transaction cache to disk.
>> + */
>> + mysql_bin_log.update_table_map_version();
>> + statistic_increment(binlog_cache_use, &LOCK_status);
>> + if (trans_log->disk_writes != 0)
>> {
>> - error= mysql_bin_log.write(thd, &trx_data->trans_log, end_ev);
>> - trx_data->reset();
>> - /*
>> - We need to step the table map version after writing the
>> - transaction cache to disk.
>> - */
>> - mysql_bin_log.update_table_map_version();
>> - statistic_increment(binlog_cache_use, &LOCK_status);
>> - if (trans_log->disk_writes != 0)
>> - {
>> - statistic_increment(binlog_cache_disk_use, &LOCK_status);
>> - trans_log->disk_writes= 0;
>> - }
>> + statistic_increment(binlog_cache_disk_use, &LOCK_status);
>> + trans_log->disk_writes= 0;
>> }
>> }
>> else
>>
[snip]
> binlog_commit()'s argument
>
> @param all true if this is the last statement before a COMMIT
> statement;
>
> false
>
> if either this is a statement in a
> transaction but not the last, or if this is a statement
> not inside a BEGIN block and autocommit is on.
>
> the false's meaning has extended with your patch. Could you please update the
> comments?
>
Well... actually, it changed with the introduction of patch for
BUG#12713, but I will update the comment.
>> @@ -1458,26 +1470,63 @@ static int binlog_commit(handlerton *hto
>> trx_data->reset();
>> DBUG_RETURN(0);
>> }
>> +
>> /*
>> - Write commit event if at least one of the following holds:
>> - - the user sends an explicit COMMIT; or
>> - - the autocommit flag is on, and we are not inside a BEGIN.
>> - However, if the user has not sent an explicit COMMIT, and we are
>> - either inside a BEGIN or run with autocommit off, then this is not
>> - the end of a transaction and we should not write a commit event.
>> + Decision table for committing a transaction:
>> +
>>
>
> I like the idea of the table.
>
> I was asking yesterday and you agreed to make the table desciption
> more verbose.
> Besides from what you kindly explained - the columns correspond to
> combinations of the five paraments we need to clarify/define some of
> them like Real transaction.
>
>
>> + ============================= = = = = = = = = = = = = = = = =
>> + Real transaction N N N N N N N N N N N N N N N N
>> + Statement in cache N N N N N N N N Y Y Y Y Y Y Y Y
>> + In transaction N N N N Y Y Y Y N N N N Y Y Y Y
>> + Stmt modified non-trans N N Y Y N N Y Y N N Y Y N N Y Y
>> + All modified non-trans N Y N Y N Y N Y N Y N Y N Y N Y
>> +
>> + Action: (C)ommit/(A)ccumulate C C C C A C C C - - - - A A A A
>> + ============================= = = = = = = = = = = = = = = = =
>> +
>> +
>> + ============================= = = = = = = = = = = = = = = = =
>> + Real transaction Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
>> + Statement in cache N N N N N N N N Y Y Y Y Y Y Y Y
>> + In transaction N N N N Y Y Y Y N N N N Y Y Y Y
>> + Stmt modified non-trans N N Y Y N N Y Y N N Y Y N N Y Y
>> + All modified non-trans N Y N Y N Y N Y N Y N Y N Y N Y
>> +
>> + (C)ommit/(A)ccumulate/(-) - - - - C C C C - - - - C C C C
>> + ============================= = = = = = = = = = = = = = = = =
>> +
>>
>
> Without the definition a reader may get confused even more.
> E.g consider the 7th column of the 2nd part of the table.
>
> Y
> N
> Y
> Y
> N
>
> C
>
> Stmt modified non-trans Y constradicts to All modified non-trans N
> while the combination is benign.
>
Well, we are committing a statement, which means that the flag for the
transaction has not been updated. Basically, this code is executed just
before the state is changed from non-committed to committed.
>
>
>
>> + In other words, we commit the transaction if and only if:
>> + - The keep log option is set
>>
>
> first,
> i think there should be explicit
>
> OR
>
> in between of the items
>
OK.
> second,
>
> the original logics do not mention (thd->options & OPTION_KEEP_LOG)
> in the following if arglist.
> What is the reason on add it?
>
It was there from the beginning in both binlog_rollback() and
binlog_savepoint_rollback(), so I have just added that as a condition.
The option is used to force the statement to be written to the binary
log in the case of a rollback, so it may be partially redundant (it's
main purpose seems to be to ensure that DROP TEMPORARY TABLE and CREATE
TEMPORARY TABLE goes to the binary log). I think we should go over it's
usage, but that is beyond this patch.
>
>
>
>> + - We are in a transaction and:
>> + - A full transaction is committed,
>> + - A non-transactional statement is committed and there is
>> + no statement cached, or
>>
>
> OR (as above)
>
>
>> + - We are not in a transaction and committing a statement
>>
>
> the last condition is actually what your fixes add to the original
> logics: binlog_commit() can be called with all == false and
> in_transaction == false and with the very intention to sink the trans
> cache into the binlog.
>
>
>
>> +
>> + Otherwise, we accumulate the statement
>> */
>> - if (all || !(thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN)))
>> + ulonglong const in_transaction=
>> + thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN);
>> + DBUG_PRINT("debug",
>> + ("all: %d, empty: %s, in_transaction: %s,
> all.modified_non_trans_table: %s, stmt.modified_non_trans_table: %s",
>> + all,
>> + YESNO(trx_data->empty()),
>> + YESNO(in_transaction),
>> + YESNO(thd->transaction.all.modified_non_trans_table),
>> + YESNO(thd->transaction.stmt.modified_non_trans_table)));
>>
>
>
>> + if ((thd->options & OPTION_KEEP_LOG) ||
>>
>
> so I am not sure about the first case (thd->options &
> OPTION_KEEP_LOG) and need your clarification.
>
>
>> + in_transaction &&
>> + (all ||
>> + (!trx_data->at_least_one_stmt &&
>> + thd->transaction.stmt.modified_non_trans_table)) ||
>>
>
> Previously binlog_commit called binlog_end_trans(thd, trx_data, &qev)
> having as warrants to do that either
>
> all or ! in_transaction
>
> Now i see that in the case
>
> in_transaction is true,
> all is false,
> the current statement has made what the identifier says -
> thd->transaction.stmt.modified_non_trans_table
> and the trans cache is empty
>
> binlog_commit() is going to inject COMMIT into the binlog.
>
> And what is a use case scenario for such combination?
>
Yes, most of the logic has been moved away from binlog_end_trans() in
favor of doing the actual checking in the binlog_commit() et.al. This in
order to simplify the logic and allow such things as what you just
mention to actually be visible. The scenario is:
CREATE TABLE t1 (a INT) ENGINE=MyISAM;
CREATE TABLE t2 (a INT) ENGINE=InnoDB;
BEGIN;
INSERT INTO t1 VALUES (1),(2);
INSERT INTO t2 VALUES (1),(2);
COMMIT;
When doing that on a server, I get the following binary log, which means
that statement-based and row-based behave identically.
mysql> show binlog events;
+-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos |
Info |
+-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
| master-bin.000001 | 4 | Format_desc | 1 | 106 |
Server ver: 5.1.24-rc-debug-log, Binlog ver: 4 |
| master-bin.000001 | 106 | Query | 1 | 206 | use
`test`; create table t1 (a int) engine=myisam |
| master-bin.000001 | 206 | Query | 1 | 306 | use
`test`; create table t2 (a int) engine=innodb |
| master-bin.000001 | 306 | Query | 1 | 374 | use
`test`; BEGIN |
| master-bin.000001 | 374 | Table_map | 1 | 415 |
table_id: 16 (test.t1) |
| master-bin.000001 | 415 | Write_rows | 1 | 454 |
table_id: 16 flags: STMT_END_F |
| master-bin.000001 | 454 | Query | 1 | 523 | use
`test`; COMMIT |
| master-bin.000001 | 523 | Query | 1 | 591 | use
`test`; BEGIN |
| master-bin.000001 | 591 | Table_map | 1 | 632 |
table_id: 17 (test.t2) |
| master-bin.000001 | 632 | Write_rows | 1 | 671 |
table_id: 17 flags: STMT_END_F |
| master-bin.000001 | 671 | Table_map | 1 | 712 |
table_id: 16 (test.t1) |
| master-bin.000001 | 712 | Write_rows | 1 | 751 |
table_id: 16 flags: STMT_END_F |
| master-bin.000001 | 751 | Xid | 1 | 778 |
COMMIT /* xid=7 */ |
| master-bin.000001 | 778 | Query | 1 | 869 | use
`test`; insert into t1 values(1),(2) |
| master-bin.000001 | 869 | Query | 1 | 937 | use
`test`; BEGIN |
| master-bin.000001 | 937 | Query | 1 | 1028 | use
`test`; insert into t2 values(1),(2) |
| master-bin.000001 | 1028 | Query | 1 | 1119 | use
`test`; insert into t1 values(3),(4) |
| master-bin.000001 | 1119 | Xid | 1 | 1146 |
COMMIT /* xid=15 */ |
+-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
18 rows in set (0.00 sec)
> Previously the first two items
> in_transaction is true,
> all is false,
> meant an ordirary dml query not commit is about to end and therefore
> nothing to commit yet neither to binlog.
>
That holds only for transactional engines, not for non-transactional
engines.
> Btw that piece of code seems to work (effectively i.e with actual
> writing into binlog) only if there is no xa-support.
> Besides to clarify I am asking you to check how no-xa-support-in-server
> is going to cooperate with the current patch.
>
It will work fine: XA is basically used for row-based replication
involving InnoDB tables. The culprit is statement-based replication for
non-transactional tables, which do not register the transaction at all.
>
>
>> + !in_transaction && !all)
>>
>
> and this is the new rule for committing which I am okay with.
>
It was actually there before, just implicitly.
>
>> {
>> Query_log_event qev(thd, STRING_WITH_LEN("COMMIT"), TRUE, FALSE);
>> qev.error_code= 0; // see comment in MYSQL_LOG::write(THD, IO_CACHE)
>> int error= binlog_end_trans(thd, trx_data, &qev, all);
>> DBUG_RETURN(error);
>> }
>>
>
>
>> - else
>> - {
>> - int error= binlog_end_trans(thd, trx_data, &invisible_commit, all);
>> - DBUG_RETURN(error);
>> - }
>>
>
> invisible_commit seemed to be a dead piece of code. Again, as was
> suggested xa-off server testing can change this preliminary
> conclusion.
>
It's dead after Sven's fix.
> If it proves to be dead indeed then you need to remove the global object and
> the class as well.
>
True.
> From another hand, naturally redundant COMMIT and ROLLBACK completing events on
> myisam table in the binlog is something we'd rather to consider how to
> avoid. And this invisible commit could serve for such purpose.
>
Problem is that they cannot basically be removed because the slave has
to know where the transaction started and where it ended. Consider the
case of replicating form a non-transactional engine (not writing
COMMIT), to a transactional engine when AUTOCOMMIT is false. This will
make all statements sent by the master appear as one gigantic
transaction on the slave, and we do not want that.
Long-term, we will add flags to the events instead to mark BEGIN, COMMIT
and ROLLBACK and remove the separate queries. This will allow the binlog
with the BEGIN and COMMIT to work as before, while eliminating it for
new binary logs.
> But I am okay with the current approach having the redundant queries
> in binlog in this case.
>
>
>
>> + DBUG_RETURN(0);
>> }
>>
>>
>
> It's worth to mention a confusing in my opinion description of `all'
> for begin_rollback. Should not it be `ROLLBACK' on place of `COMMIT'?!
> And actully all=true can be a reaction to force rolling back not the
> query.
>
>
> @param all true if this is the last statement before a COMMIT
> statement; false if either this is a statement in a
> transaction but not the last, or if this is a statement
> not inside a BEGIN block and autocommit is on.
>
> If so, could you please fix that?
>
OK.
> Also, as in the commit case the description of `all' needs updating.
> Could you please state explicitly that all == false and
>
OK.
>
>> /**
>> @@ -1509,21 +1558,36 @@ static int binlog_rollback(handlerton *h
>> DBUG_RETURN(0);
>> }
>>
>> - /*
>> - Update the binary log with a BEGIN/ROLLBACK block if we have
>> - cached some queries and we updated some non-transactional
>> - table. Such cases should be rare (updating a
>> - non-transactional table inside a transaction...)
>> - */
>> - if (unlikely(thd->transaction.all.modified_non_trans_table ||
>> - (thd->options & OPTION_KEEP_LOG)))
>> + DBUG_PRINT("debug", ("all: %s, all.modified_non_trans_table: %s,
> stmt.modified_non_trans_table: %s",
>> + YESNO(all),
>> + YESNO(thd->transaction.all.modified_non_trans_table),
>> +
> YESNO(thd->transaction.stmt.modified_non_trans_table)));
>> + if (all && thd->transaction.all.modified_non_trans_table ||
>> + !all && thd->transaction.stmt.modified_non_trans_table ||
>> + (thd->options & OPTION_KEEP_LOG))
>> {
>> + /*
>> + We write the transaction cache with a rollback last if we have
>> + modified any non-transactional table. We do this even if we are
>> + committing a single statement that has modified a
>> + non-transactional table since it can have modified a
>> + transactional table in that statement as well, which needs to be
>> + rolled back on the slave.
>> + */
>> Query_log_event qev(thd, STRING_WITH_LEN("ROLLBACK"), TRUE, FALSE);
>> qev.error_code= 0; // see comment in MYSQL_LOG::write(THD, IO_CACHE)
>> error= binlog_end_trans(thd, trx_data, &qev, all);
>> }
>> - else
>> + else if (all && !thd->transaction.all.modified_non_trans_table ||
>> + !all && !thd->transaction.stmt.modified_non_trans_table)
>> + {
>> + /*
>> + If we have modified only transactional tables, we can truncate
>> + the transaction cache without writing anything to the binary
>> + log.
>> + */
>> error= binlog_end_trans(thd, trx_data, 0, all);
>> + }
>> DBUG_RETURN(error);
>> }
>>
>> diff -Nrup a/sql/log_event.cc b/sql/log_event.cc
>> --- a/sql/log_event.cc 2008-02-28 18:55:42 +01:00
>> +++ b/sql/log_event.cc 2008-03-25 10:39:24 +01:00
>> @@ -7054,7 +7054,7 @@ int Table_map_log_event::save_field_meta
>> #if !defined(MYSQL_CLIENT)
>> Table_map_log_event::Table_map_log_event(THD *thd, TABLE *tbl, ulong tid,
>> bool is_transactional, uint16 flags)
>> - : Log_event(thd, 0, is_transactional),
>> + : Log_event(thd, 0, true),
>> m_table(tbl),
>> m_dbnam(tbl->s->db.str),
>> m_dblen(m_dbnam ? tbl->s->db.length : 0),
>> diff -Nrup a/sql/sql_insert.cc b/sql/sql_insert.cc
>> --- a/sql/sql_insert.cc 2008-02-19 13:45:16 +01:00
>> +++ b/sql/sql_insert.cc 2008-03-25 10:39:25 +01:00
>> @@ -2426,6 +2426,7 @@ pthread_handler_t handle_delayed_insert(
>> */
>> di->table->file->ha_release_auto_increment();
>> mysql_unlock_tables(thd, lock);
>>
>
>
>> + ha_autocommit_or_rollback(thd, 0);
>>
>
> that's clear why.
> Still would be better to leave comments what the func is doing here.
> (actually it's just in order to call binlog_commit() eventually, is
> not it?)
>
With the change introduced by Kostja, all work should be terminated with
an autocommit. This is mainly for replication, but it makes sense to
commit the work done when no more work is coming in because the delayed
statement ended.
>
>
>> di->group_count=0;
>> pthread_mutex_lock(&di->mutex);
>> }
>> @@ -3677,13 +3678,10 @@ void select_create::abort()
>> DBUG_ENTER("select_create::abort");
>>
>> /*
>> - Disable binlog, because we "roll back" partial inserts in ::abort
>> - by removing the table, even for non-transactional tables.
>> - */
>> - tmp_disable_binlog(thd);
>> - /*
>> In select_insert::abort() we roll back the statement, including
>> - truncating the transaction cache of the binary log.
>> + truncating the transaction cache of the binary log. To do this, we
>> + pretend that the statement is transactional, even though it might
>> + be the case that it was not.
>>
>> We roll back the statement prior to deleting the table and prior
>> to releasing the lock on the table, since there might be potential
>> @@ -3694,7 +3692,9 @@ void select_create::abort()
>> of the table succeeded or not, since we need to reset the binary
>> log state.
>> */
>> + tmp_disable_binlog(thd);
>> select_insert::abort();
>> + thd->transaction.stmt.modified_non_trans_table= FALSE;
>> reenable_binlog(thd);
>>
>>
>> diff -Nrup a/sql/sql_load.cc b/sql/sql_load.cc
>> --- a/sql/sql_load.cc 2008-02-19 13:45:17 +01:00
>> +++ b/sql/sql_load.cc 2008-03-25 10:39:25 +01:00
>> @@ -513,6 +513,7 @@ bool mysql_load(THD *thd,sql_exchange *e
>> err:
>> DBUG_ASSERT(transactional_table || !(info.copied || info.deleted) ||
>> thd->transaction.stmt.modified_non_trans_table);
>>
>
>
>> + ha_autocommit_or_rollback(thd, error);
>>
>
> Why do we need this reincarnation?
> is anything wrong with relying on dispatch_command calls the function?
>
Oops... seems to be the remains of me trying to fix the problems in the
load function. I tried removing it, and it seems to work anyway. I'll do
a more through test.
/Matz
>
> regards,
>
> Andrei
>
>
>
--
Mats Kindahl
Lead Software Developer
Replication Team
MySQL AB, www.mysql.com