Mats, hello.
Thanks for acknowledging my earlier comments.
This mail explains some of my questions.
There remain two outstanding of them.
>
>> I think you need to mention some facts about user visible behavior
>> like possible appearence of redundant COMMIT/ROLLBACK in binlog
>> (discussed further).
>>
>
> Well... I think not. This is the result of the patch Sven did and is
> not really related to this patch (at least not very much).
>
Please do not confuse with autocommit-ON commit that patch made. I was
rather about the content of binlog after making e.g an insert into a
non-ta table.
After your current patch it gains more BEGIN/COMMIT braces:
mysql> reset master; show create table tm; insert into tm values (4001); show binlog
events\G
Query OK, 0 rows affected (0.00 sec)
+-------+--------------------------------------------------------------------------------------------------------+
| Table | Create Table
|
+-------+--------------------------------------------------------------------------------------------------------+
| tm | CREATE TABLE `tm` (
`a` int(11) NOT NULL,
PRIMARY KEY (`a`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Query OK, 1 row affected (0.01 sec)
*************************** 1. row ***************************
Log_name: mysql1000-bin.000001
Pos: 4
Event_type: Format_desc
Server_id: 1
End_log_pos: 106
Info: Server ver: 5.1.24-rc-debug-log, Binlog ver: 4
*************************** 2. row ***************************
Log_name: mysql1000-bin.000001
Pos: 106
Event_type: Query
Server_id: 1
End_log_pos: 174
Info: use `test`; BEGIN
*************************** 3. row ***************************
Log_name: mysql1000-bin.000001
Pos: 174
Event_type: Table_map
Server_id: 1
End_log_pos: 215
Info: table_id: 16 (test.tm)
*************************** 4. row ***************************
Log_name: mysql1000-bin.000001
Pos: 215
Event_type: Write_rows
Server_id: 1
End_log_pos: 249
Info: table_id: 16 flags: STMT_END_F
*************************** 5. row ***************************
Log_name: mysql1000-bin.000001
Pos: 249
Event_type: Query
Server_id: 1
End_log_pos: 318
Info: use `test`; COMMIT
5 rows in set (0.01 sec)
And you actually are talking on that issue further along the text.
>
> [snip]
>
>>> --- a/mysql-test/suite/rpl/r/rpl_switch_stm_row_mixed.result 2007-12-14
> 14:40:43 +01:00
>>> +++ b/mysql-test/suite/rpl/r/rpl_switch_stm_row_mixed.result 2008-03-25
> 10:39:23 +01:00
>>> @@ -425,673 +425,5 @@ INSERT INTO t13 VALUES (USER());
>>> INSERT INTO t13 VALUES (my_user());
>>> INSERT INTO t13 VALUES (CURRENT_USER());
>>> INSERT INTO t13 VALUES (my_current_user());
>>>
>>
>> I really appreciate that you are removing big results of show-binlog-events.
>>
>> Still, I think it's necessary to explain why `show binlog event' is safe to
>> remove. Do we have any risk to loosing something in that dropped piece of
>> results?
>>
>
> Well... the main reason is that we should not check the binlog in
> replication tests, we should check that replication works. We have a
> much smaller suite for checking that items go into the binary log in a
> way that we expect, but if replication works correctly, it does not
> matter what is in the binary log.
>
> I can add a note that it was removed because we shouldn't rely on the
> contents of the binary log in replication tests.
I agree that checking binlog content might be sometimes far from the perfect
testing. Still in this particular BUG#28086 related test of
rpl_switch_stm_row_mixed `show binlog events' is the *only* check,
we may not remove it.
I think we can achieve the aim of shortening the results with
exploiting a wonderful JasonH's improvement:
starting the test for BUG#28086 with
+let $binlog_start= query_get_value(SHOW MASTER STATUS, Position, 1);
doing its logics
so that the final report
source include/show_binlog_events.inc;
stays but will result in events only after $binlog_start.
I think that's a good compromise.
>> binlog_commit()'s argument
>>
>> @param all true if this is the last statement before a COMMIT
>> statement; false
>> if either this is a statement in a
>> transaction but not the last, or if this is a statement
>> not inside a BEGIN block and autocommit is on.
>>
>> the false's meaning has extended with your patch. Could you please update the
>> comments?
>>
>
> Well... actually, it changed with the introduction of patch for
> BUG#12713, but I will update the comment.
I thought post-bug#12713 server executed binlog_commit() same way as
the pre- version.
That's your patch forces committing even though `all' being false.
>
>>> @@ -1458,26 +1470,63 @@ static int binlog_commit(handlerton *hto
>>> trx_data->reset();
>>> DBUG_RETURN(0);
>>> }
>>> +
>>> /*
>>> - Write commit event if at least one of the following holds:
>>> - - the user sends an explicit COMMIT; or
>>> - - the autocommit flag is on, and we are not inside a BEGIN.
>>> - However, if the user has not sent an explicit COMMIT, and we are
>>> - either inside a BEGIN or run with autocommit off, then this is not
>>> - the end of a transaction and we should not write a commit event.
>>> + Decision table for committing a transaction:
>>> +
>>>
>>
>> I like the idea of the table.
>>
>> I was asking yesterday and you agreed to make the table desciption
>> more verbose.
>> Besides from what you kindly explained - the columns correspond to
>> combinations of the five paraments we need to clarify/define some of
>> them like Real transaction.
>>
>>
>>> + ============================= = = = = = = = = = = = = = = = =
>>> + Real transaction N N N N N N N N N N N N N N N N
>>> + Statement in cache N N N N N N N N Y Y Y Y Y Y Y Y
>>> + In transaction N N N N Y Y Y Y N N N N Y Y Y Y
>>> + Stmt modified non-trans N N Y Y N N Y Y N N Y Y N N Y Y
>>> + All modified non-trans N Y N Y N Y N Y N Y N Y N Y N Y
>>> +
>>> + Action: (C)ommit/(A)ccumulate C C C C A C C C - - - - A A A A
>>> + ============================= = = = = = = = = = = = = = = = =
>>> +
>>> +
>>> + ============================= = = = = = = = = = = = = = = = =
>>> + Real transaction Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
>>> + Statement in cache N N N N N N N N Y Y Y Y Y Y Y Y
>>> + In transaction N N N N Y Y Y Y N N N N Y Y Y Y
>>> + Stmt modified non-trans N N Y Y N N Y Y N N Y Y N N Y Y
>>> + All modified non-trans N Y N Y N Y N Y N Y N Y N Y N Y
>>> +
>>> + (C)ommit/(A)ccumulate/(-) - - - - C C C C - - - - C C C C
>>> + ============================= = = = = = = = = = = = = = = = =
>>> +
>>>
>>
>> Without the definition a reader may get confused even more.
>> E.g consider the 7th column of the 2nd part of the table.
>>
>> Y
>> N
>> Y
>> Y
>> N
>>
>> C
>>
>> Stmt modified non-trans Y constradicts to All modified non-trans N
>> while the combination is benign.
>>
>
> Well, we are committing a statement, which means that the flag for the
> transaction has not been updated. Basically, this code is executed
> just before the state is changed from non-committed to committed.
>
Considering how merging of `Stmt modified non-trans' to
`All modified non-trans' works, and it works as merging right away,
the column above can not exist; it should have `-' at the bottom.
>>
>>
>>> + In other words, we commit the transaction if and only if:
>>> + - The keep log option is set
>>>
>>
>> first,
>> i think there should be explicit
>>
>> OR
>>
>> in between of the items
>>
>
> OK.
>
>> second,
>>
>> the original logics do not mention (thd->options & OPTION_KEEP_LOG)
>> in the following if arglist.
>> What is the reason on add it?
>>
>
> It was there from the beginning in both binlog_rollback() and
> binlog_savepoint_rollback(), so I have just added that as a
> condition. The option is used to force the statement to be written to
> the binary log in the case of a rollback, so it may be partially
> redundant (it's main purpose seems to be to ensure that DROP TEMPORARY
> TABLE and CREATE TEMPORARY TABLE goes to the binary log). I think we
> should go over it's usage, but that is beyond this patch.
>
But that's binlog_commit() that i have meant, and it did no have the
option referred.
>>
>>
>>
>>> + - We are in a transaction and:
>>> + - A full transaction is committed,
>>> + - A non-transactional statement is committed and there is
>>> + no statement cached, or
>>>
>>
>> OR (as above)
actually there was or, but i'd rather to place the word on a separate
line.
>>
>>
>>> + - We are not in a transaction and committing a statement
>>>
>>
>> the last condition is actually what your fixes add to the original
>> logics: binlog_commit() can be called with all == false and
>> in_transaction == false and with the very intention to sink the trans
>> cache into the binlog.
>>
>>
>>
>>> +
>>> + Otherwise, we accumulate the statement
>>> */
>>> - if (all || !(thd->options & (OPTION_NOT_AUTOCOMMIT |
> OPTION_BEGIN)))
>>> + ulonglong const in_transaction=
>>> + thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN);
>>> + DBUG_PRINT("debug",
>>> + ("all: %d, empty: %s, in_transaction: %s,
> all.modified_non_trans_table: %s, stmt.modified_non_trans_table: %s",
>>> + all,
>>> + YESNO(trx_data->empty()),
>>> + YESNO(in_transaction),
>>> + YESNO(thd->transaction.all.modified_non_trans_table),
>>> + YESNO(thd->transaction.stmt.modified_non_trans_table)));
>>>
>>
>>
>>> + if ((thd->options & OPTION_KEEP_LOG) ||
>>>
>>
>> so I am not sure about the first case (thd->options &
>> OPTION_KEEP_LOG) and need your clarification.
Remains open question whether it's needed. Notice, it's
binlog_commit() code.
>>
>>
>>> + in_transaction &&
>>> + (all
>>> + ||
>>> + (!trx_data->at_least_one_stmt &&
>>> + thd->transaction.stmt.modified_non_trans_table)) ||
>>>
>>
>> Previously binlog_commit called binlog_end_trans(thd, trx_data, &qev)
>> having as warrants to do that either
>> all or ! in_transaction
>>
>> Now i see that in the case
>>
>> in_transaction is true,
>> all is false,
>> the current statement has made what the identifier says -
>> thd->transaction.stmt.modified_non_trans_table
>> and the trans cache is empty
>>
>> binlog_commit() is going to inject COMMIT into the binlog.
>>
>> And what is a use case scenario for such combination?
>>
> Yes, most of the logic has been moved away from binlog_end_trans() in
> favor of doing the actual checking in the binlog_commit() et.al. This
> in order to simplify the logic and allow such things as what you just
> mention to actually be visible. The scenario is:
>
> CREATE TABLE t1 (a INT) ENGINE=MyISAM;
> CREATE TABLE t2 (a INT) ENGINE=InnoDB;
>
> BEGIN;
> INSERT INTO t1 VALUES (1),(2);
> INSERT INTO t2 VALUES (1),(2);
> COMMIT;
>
> When doing that on a server, I get the following binary log, which
> means that statement-based and row-based behave identically.
>
> mysql> show binlog events;
>
> +-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
> | Log_name | Pos | Event_type | Server_id | End_log_pos |
> Info |
>
> +-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
> | master-bin.000001 | 4 | Format_desc | 1 | 106 |
> Server ver: 5.1.24-rc-debug-log, Binlog ver: 4 |
> | master-bin.000001 | 106 | Query | 1 | 206 |
> use `test`; create table t1 (a int) engine=myisam |
> | master-bin.000001 | 206 | Query | 1 | 306 |
> use `test`; create table t2 (a int) engine=innodb |
> | master-bin.000001 | 306 | Query | 1 | 374 |
> use `test`; BEGIN |
> | master-bin.000001 | 374 | Table_map | 1 | 415 |
> table_id: 16 (test.t1) |
> | master-bin.000001 | 415 | Write_rows | 1 | 454 |
> table_id: 16 flags: STMT_END_F |
> | master-bin.000001 | 454 | Query | 1 | 523 |
> use `test`; COMMIT |
> | master-bin.000001 | 523 | Query | 1 | 591 |
> use `test`; BEGIN |
> | master-bin.000001 | 591 | Table_map | 1 | 632 |
> table_id: 17 (test.t2) |
> | master-bin.000001 | 632 | Write_rows | 1 | 671 |
> table_id: 17 flags: STMT_END_F |
> | master-bin.000001 | 671 | Table_map | 1 | 712 |
> table_id: 16 (test.t1) |
> | master-bin.000001 | 712 | Write_rows | 1 | 751 |
> table_id: 16 flags: STMT_END_F |
> | master-bin.000001 | 751 | Xid | 1 | 778 |
> COMMIT /* xid=7 */ |
> | master-bin.000001 | 778 | Query | 1 | 869 |
> use `test`; insert into t1 values(1),(2) |
> | master-bin.000001 | 869 | Query | 1 | 937 |
> use `test`; BEGIN |
> | master-bin.000001 | 937 | Query | 1 | 1028 |
> use `test`; insert into t2 values(1),(2) |
> | master-bin.000001 | 1028 | Query | 1 | 1119 |
> use `test`; insert into t1 values(3),(4) |
> | master-bin.000001 | 1119 | Xid | 1 | 1146 |
> COMMIT /* xid=15 */ |
>
> +-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
> 18 rows in set (0.00 sec)
I think you mean that if the first query after BEGIN changes on a
non-ta table the query binlogged ahead of BEGIN with binlog format
stmt, correct?
But there was no special reason behind that policy. It has remained for
statement format as a consequence of binlogging algirthm which starts
filling the trans cache at first appearence of a transactional table.
With your fixes, I believe, we could do not copy that statement tricky
behaviour and to put the first non-ta query row-events into the cache
to appear in binlog after BEGIN.
Getting back to that if
+ if ((thd->options & OPTION_KEEP_LOG) ||
+ in_transaction &&
+ (all ||
+ (!trx_data->at_least_one_stmt &&
+ thd->transaction.stmt.modified_non_trans_table)) ||
+ !in_transaction && !all)
Why should not it be just as it was before:
- if (all || !in_transaction)
?
You principal use case of ac=ON query on a not-ta table with row-based
format satisfy.
All can be false for format.
Then in_transaction == false and the rows binlog.
Yes, there will be "difference" in the order of binlogging comparing
with stmt format. However, indeed that would be correction of the order of
binlogging.
>
>> Previously the first two items
>> in_transaction is true,
>> all is false,
>> meant an ordirary dml query not commit is about to end and therefore
>> nothing to commit yet neither to binlog.
>>
>
> That holds only for transactional engines, not for non-transactional
> engines.
>
>> Btw that piece of code seems to work (effectively i.e with actual
>> writing into binlog) only if there is no xa-support.
>> Besides to clarify I am asking you to check how no-xa-support-in-server
>> is going to cooperate with the current patch.
>>
>
> It will work fine: XA is basically used for row-based replication
> involving InnoDB tables. The culprit is statement-based replication
> for non-transactional tables, which do not register the transaction at
> all.
>
>>
>>
>>> + !in_transaction && !all)
>>>
>>
>> and this is the new rule for committing which I am okay with.
>>
>
> It was actually there before, just implicitly.
>
>>
>>> {
>>> Query_log_event qev(thd, STRING_WITH_LEN("COMMIT"), TRUE, FALSE);
>>> qev.error_code= 0; // see comment in MYSQL_LOG::write(THD, IO_CACHE)
>>> int error= binlog_end_trans(thd, trx_data, &qev, all);
>>> DBUG_RETURN(error);
>>> }
>>>
>>
>>
>>> - else
>>> - {
>>> - int error= binlog_end_trans(thd, trx_data, &invisible_commit, all);
>>> - DBUG_RETURN(error);
>>> - }
>>>
>>
>> invisible_commit seemed to be a dead piece of code. Again, as was
>> suggested xa-off server testing can change this preliminary
>> conclusion.
>>
>
> It's dead after Sven's fix.
>
I pasted earlier my log to show that your changes were meant.
>> If it proves to be dead indeed then you need to remove the global object and
>> the class as well.
>>
>
> True.
>
>> From another hand, naturally redundant COMMIT and ROLLBACK completing events on
>> myisam table in the binlog is something we'd rather to consider how to
>> avoid. And this invisible commit could serve for such purpose.
>>
>
> Problem is that they cannot basically be removed because the slave has
> to know where the transaction started and where it ended. Consider the
> case of replicating form a non-transactional engine (not writing
> COMMIT), to a transactional engine when AUTOCOMMIT is false. This will
> make all statements sent by the master appear as one gigantic
> transaction on the slave, and we do not want that.
>
> Long-term, we will add flags to the events instead to mark BEGIN,
> COMMIT and ROLLBACK and remove the separate queries. This will allow
> the binlog with the BEGIN and COMMIT to work as before, while
> eliminating it for new binary logs.
>
>> But I am okay with the current approach having the redundant queries
>> in binlog in this case.
>>
>>
>>
>>> + DBUG_RETURN(0);
>>> }
>>>
>>
>> It's worth to mention a confusing in my opinion description of `all'
>> for begin_rollback. Should not it be `ROLLBACK' on place of `COMMIT'?!
>> And actully all=true can be a reaction to force rolling back not the
>> query.
>>
>>
>> @param all true if this is the last statement before a COMMIT
>> statement; false if either this is a statement in a
>> transaction but not the last, or if this is a statement
>> not inside a BEGIN block and autocommit is on.
>>
>> If so, could you please fix that?
>>
>
> OK.
>
>> Also, as in the commit case the description of `all' needs updating.
>> Could you please state explicitly that all == false and
>
> OK.
>
>>
>>> /**
>>> @@ -1509,21 +1558,36 @@ static int binlog_rollback(handlerton *h
>>> DBUG_RETURN(0);
>>> }
>>> - /*
>>> - Update the binary log with a BEGIN/ROLLBACK block if we have
>>> - cached some queries and we updated some non-transactional
>>> - table. Such cases should be rare (updating a
>>> - non-transactional table inside a transaction...)
>>> - */
>>> - if (unlikely(thd->transaction.all.modified_non_trans_table || -
>>> (thd->options & OPTION_KEEP_LOG)))
>>> + DBUG_PRINT("debug", ("all: %s, all.modified_non_trans_table: %s,
> stmt.modified_non_trans_table: %s",
>>> + YESNO(all),
>>> +
> YESNO(thd->transaction.all.modified_non_trans_table),
>>> +
> YESNO(thd->transaction.stmt.modified_non_trans_table)));
>>> + if (all && thd->transaction.all.modified_non_trans_table ||
>>> + !all && thd->transaction.stmt.modified_non_trans_table ||
>>> + (thd->options & OPTION_KEEP_LOG))
>>> {
>>> + /*
>>> + We write the transaction cache with a rollback last if we have
>>> + modified any non-transactional table. We do this even if we are
>>> + committing a single statement that has modified a
>>> + non-transactional table since it can have modified a
>>> + transactional table in that statement as well, which needs to be
>>> + rolled back on the slave.
>>> + */
>>> Query_log_event qev(thd, STRING_WITH_LEN("ROLLBACK"), TRUE, FALSE);
>>> qev.error_code= 0; // see comment in MYSQL_LOG::write(THD, IO_CACHE)
>>> error= binlog_end_trans(thd, trx_data, &qev, all);
>>> }
>>> - else
>>> + else if (all && !thd->transaction.all.modified_non_trans_table
> ||
>>> + !all &&
> !thd->transaction.stmt.modified_non_trans_table)
>>> + {
>>> + /*
>>> + If we have modified only transactional tables, we can truncate
>>> + the transaction cache without writing anything to the binary
>>> + log.
>>> + */
>>> error= binlog_end_trans(thd, trx_data, 0, all);
>>> + }
>>> DBUG_RETURN(error);
>>> }
>>> diff -Nrup a/sql/log_event.cc b/sql/log_event.cc
>>> --- a/sql/log_event.cc 2008-02-28 18:55:42 +01:00
>>> +++ b/sql/log_event.cc 2008-03-25 10:39:24 +01:00
>>> @@ -7054,7 +7054,7 @@ int Table_map_log_event::save_field_meta
>>> #if !defined(MYSQL_CLIENT)
>>> Table_map_log_event::Table_map_log_event(THD *thd, TABLE *tbl, ulong tid,
>>> bool is_transactional, uint16
> flags)
>>> - : Log_event(thd, 0, is_transactional),
>>> + : Log_event(thd, 0, true),
>>> m_table(tbl),
>>> m_dbnam(tbl->s->db.str),
>>> m_dblen(m_dbnam ? tbl->s->db.length : 0),
>>> diff -Nrup a/sql/sql_insert.cc b/sql/sql_insert.cc
>>> --- a/sql/sql_insert.cc 2008-02-19 13:45:16 +01:00
>>> +++ b/sql/sql_insert.cc 2008-03-25 10:39:25 +01:00
>>> @@ -2426,6 +2426,7 @@ pthread_handler_t handle_delayed_insert(
>>> */
>>> di->table->file->ha_release_auto_increment();
>>> mysql_unlock_tables(thd, lock);
>>>
>>
>>
>>> + ha_autocommit_or_rollback(thd, 0);
>>>
>>
>> that's clear why.
>> Still would be better to leave comments what the func is doing here.
>> (actually it's just in order to call binlog_commit() eventually, is
>> not it?)
>>
>
> With the change introduced by Kostja, all work should be terminated
> with an autocommit. This is mainly for replication, but it makes sense
> to commit the work done when no more work is coming in because the
> delayed statement ended.
That can make delayed insert delayed ever more :)
Thanks for clarifying. Comments for this added code do not seem to be
mandatory.
cheers,
Andrei