List:Commits« Previous MessageNext Message »
From:Mats Kindahl Date:March 28 2008 7:58am
Subject:Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020
View as plain text  
Andrei Elkin wrote:
> Mats, hello.
>
> Thanks for acknowledging my earlier comments.
>
> This mail explains some of my questions.
> There remain two outstanding of them.
>
>   
>>> I think you need to mention some facts about user visible behavior
>>> like possible appearence of redundant COMMIT/ROLLBACK in binlog
>>> (discussed further).
>>>   
>>>       
>> Well... I think not. This is the result of the patch Sven did and is
>> not really related to this patch (at least not very much).
>>
>>     
>
> Please do not confuse with autocommit-ON commit that patch made. I was
> rather about the content of binlog after making e.g an insert into a
> non-ta table.
>   

Hmmm... I was actually thinking of the patch for BUG#29288, but I see 
that is not pushed to the team tree.

> After your current patch it gains more BEGIN/COMMIT braces:
>   

Yes, and this is intentional.

>    mysql> reset master; show create table tm; insert into tm values (4001); show
> binlog events\G
>    Query OK, 0 rows affected (0.00 sec)
>
>   
> +-------+--------------------------------------------------------------------------------------------------------+
>    | Table | Create Table                                                            
>                               |
>   
> +-------+--------------------------------------------------------------------------------------------------------+
>    | tm    | CREATE TABLE `tm` (
>    `a` int(11) NOT NULL,
>    PRIMARY KEY (`a`)
>    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 | 
>   
> +-------+--------------------------------------------------------------------------------------------------------+
>    1 row in set (0.00 sec)
>
>    Query OK, 1 row affected (0.01 sec)
>
>    *************************** 1. row ***************************
>    Log_name: mysql1000-bin.000001
>    Pos: 4
>    Event_type: Format_desc
>    Server_id: 1
>    End_log_pos: 106
>    Info: Server ver: 5.1.24-rc-debug-log, Binlog ver: 4
>    *************************** 2. row ***************************
>    Log_name: mysql1000-bin.000001
>    Pos: 106
>    Event_type: Query
>    Server_id: 1
>    End_log_pos: 174
>    Info: use `test`; BEGIN
>    *************************** 3. row ***************************
>    Log_name: mysql1000-bin.000001
>    Pos: 174
>    Event_type: Table_map
>    Server_id: 1
>    End_log_pos: 215
>    Info: table_id: 16 (test.tm)
>    *************************** 4. row ***************************
>    Log_name: mysql1000-bin.000001
>    Pos: 215
>    Event_type: Write_rows
>    Server_id: 1
>    End_log_pos: 249
>    Info: table_id: 16 flags: STMT_END_F
>    *************************** 5. row ***************************
>    Log_name: mysql1000-bin.000001
>    Pos: 249
>    Event_type: Query
>    Server_id: 1
>    End_log_pos: 318
>    Info: use `test`; COMMIT
>    5 rows in set (0.01 sec)
>
> And you actually are talking on that issue further along the text.
>
>
>   
>> [snip]
>>
>>     
>>>> --- a/mysql-test/suite/rpl/r/rpl_switch_stm_row_mixed.result	2007-12-14
> 14:40:43 +01:00
>>>> +++ b/mysql-test/suite/rpl/r/rpl_switch_stm_row_mixed.result	2008-03-25
> 10:39:23 +01:00
>>>> @@ -425,673 +425,5 @@ INSERT INTO t13 VALUES (USER());
>>>>  INSERT INTO t13 VALUES (my_user());
>>>>  INSERT INTO t13 VALUES (CURRENT_USER());
>>>>  INSERT INTO t13 VALUES (my_current_user());
>>>>     
>>>>         
>>> I really appreciate that you are removing big results of show-binlog-events.
>>>
>>> Still, I think it's necessary to explain why `show binlog event' is safe to
>>> remove. Do we have any risk to loosing something in that dropped piece of
>>> results?
>>>   
>>>       
>> Well... the main reason is that we should not check the binlog in
>> replication tests, we should check that replication works. We have a
>> much smaller suite for checking that items go into the binary log in a
>> way that we expect, but if replication works correctly, it does not
>> matter what is in the binary log.
>>
>> I can add a note that it was removed because we shouldn't rely on the
>> contents of the binary log in replication tests.
>>     
>
> I agree that checking binlog content might be sometimes far from the perfect
> testing. Still in this particular BUG#28086 related test of
> rpl_switch_stm_row_mixed `show binlog events' is the *only* check,
> we may not remove it.
>
> I think we can achieve the aim of shortening the results with
> exploiting a wonderful JasonH's improvement:
>
> starting the test for BUG#28086 with
>
> +let $binlog_start= query_get_value(SHOW MASTER STATUS, Position, 1);
>
> doing its logics
>
> so that the final report
>
> source include/show_binlog_events.inc;
>
> stays but will result in events only after $binlog_start.
>
> I think that's a good compromise.
>   

Yes, I agree. It seems to do the trick.

>
>   
>>>  binlog_commit()'s argument
>>>
>>>   @param all   true if this is the last statement before a COMMIT
>>>                statement;                               false
>>> if either this is a statement in a
>>>                transaction but not the last, or if this is a statement
>>>                not inside a BEGIN block and autocommit is on.
>>>
>>> the false's meaning has extended with your patch. Could you please update
> the
>>> comments?
>>>   
>>>       
>> Well... actually, it changed with the introduction of patch for
>> BUG#12713, but I will update the comment.
>>     
>
> I thought post-bug#12713 server executed binlog_commit() same way as
> the pre- version.
> That's your patch forces committing even though `all' being false.
>   

No, but AIUI, with the introduction of the patch for BUG#12713, every 
statement is terminated with a statement commit, and every real 
transaction is terminated with a transaction commit. Previously, the 
last statement commit could be skipped, but it shouldn't be skipped any 
more.

What do you mean by "forces committing even though `all' being false"? 
Of course we can commit a statement (and even regardless of whether it 
is part of a transaction or not).

>>>> @@ -1458,26 +1470,63 @@ static int binlog_commit(handlerton *hto
>>>>      trx_data->reset();
>>>>      DBUG_RETURN(0);
>>>>    }
>>>> +
>>>>    /*
>>>> -    Write commit event if at least one of the following holds:
>>>> -     - the user sends an explicit COMMIT; or
>>>> -     - the autocommit flag is on, and we are not inside a BEGIN.
>>>> -    However, if the user has not sent an explicit COMMIT, and we are
>>>> -    either inside a BEGIN or run with autocommit off, then this is not
>>>> -    the end of a transaction and we should not write a commit event.
>>>> +    Decision table for committing a transaction:
>>>> +
>>>>     
>>>>         
>>> I like the idea of the table.
>>>
>>> I was asking yesterday and you agreed to make the table desciption
>>> more verbose.
>>> Besides from what you kindly explained - the columns correspond to
>>> combinations of the five paraments we need to clarify/define some of
>>> them like Real transaction.
>>>
>>>   
>>>       
>>>> +    =============================  = = = = = = = = = = = = = = = =
>>>> +    Real transaction               N N N N N N N N N N N N N N N N
>>>> +    Statement in cache             N N N N N N N N Y Y Y Y Y Y Y Y
>>>> +    In transaction                 N N N N Y Y Y Y N N N N Y Y Y Y
>>>> +    Stmt modified non-trans        N N Y Y N N Y Y N N Y Y N N Y Y
>>>> +    All modified non-trans         N Y N Y N Y N Y N Y N Y N Y N Y
>>>> +
>>>> +    Action: (C)ommit/(A)ccumulate  C C C C A C C C - - - - A A A A
>>>> +    =============================  = = = = = = = = = = = = = = = =
>>>> +
>>>> +
>>>> +    =============================  = = = = = = = = = = = = = = = =
>>>> +    Real transaction               Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
>>>> +    Statement in cache             N N N N N N N N Y Y Y Y Y Y Y Y
>>>> +    In transaction                 N N N N Y Y Y Y N N N N Y Y Y Y
>>>> +    Stmt modified non-trans        N N Y Y N N Y Y N N Y Y N N Y Y
>>>> +    All modified non-trans         N Y N Y N Y N Y N Y N Y N Y N Y
>>>> +
>>>> +    (C)ommit/(A)ccumulate/(-)      - - - - C C C C - - - - C C C C
>>>> +    =============================  = = = = = = = = = = = = = = = =
>>>> +
>>>>     
>>>>         
>>> Without the definition a reader may get confused even more.
>>> E.g consider the 7th column of the 2nd part of the table.
>>>
>>> Y
>>> N
>>> Y
>>> Y
>>> N
>>>
>>> C
>>>
>>> Stmt modified non-trans Y constradicts to All modified non-trans N
>>> while the combination is benign.
>>>   
>>>       
>
>   
>> Well, we are committing a statement, which means that the flag for the
>> transaction has not been updated. Basically, this code is executed
>> just before the state is changed from non-committed to committed.
>>
>>     
>
> Considering how merging of `Stmt modified non-trans' to 
> `All modified non-trans' works, and it works as merging right away,
> the column above can not exist; it should have `-' at the bottom.
>   

OK. I've looked closer at the code, and I agree.

>
>   
>>>   
>>>       
>>>> +    In other words, we commit the transaction if and only if:
>>>> +     - The keep log option is set
>>>>     
>>>>         
>>> first,
>>> i think there should be explicit
>>>
>>>         OR
>>>
>>> in between of the items
>>>   
>>>       
>> OK.
>>
>>     
>>> second,
>>>
>>>       
>
>
>
>   
>>> the original logics do not mention  (thd->options & OPTION_KEEP_LOG)
>>> in the following if arglist.
>>> What is the reason on add it?
>>>   
>>>       
>> It was there from the beginning in both binlog_rollback() and
>> binlog_savepoint_rollback(), so I have just added that as a
>> condition. The option is used to force the statement to be written to
>> the binary log in the case of a rollback, so it may be partially
>> redundant (it's main purpose seems to be to ensure that DROP TEMPORARY
>> TABLE and CREATE TEMPORARY TABLE goes to the binary log). I think we
>> should go over it's usage, but that is beyond this patch.
>>
>>     
>
> But that's binlog_commit() that i have meant, and it did no have the
> option referred.
>   

True. As I mentioned on IRC, I tried not using it and semantically there 
is no difference. There is a difference in the binary log in that the 
BEGIN is logged either as "use test; BEGIN" or as just "BEGIN". Safer to 
keep it the way it was though, so I'll change it to the original use of 
OPTION_KEEP_LOG (i.e., not use it).

>   
>>>   
>>>       
>>>> +     - We are in a transaction and:
>>>> +       - A full transaction is committed,
>>>> +       - A non-transactional statement is committed and there is
>>>> +         no statement cached, or
>>>>     
>>>>         
>>>         OR (as above)
>>>       
>
> actually there was or, but i'd rather to place the word on a separate
> line.
>   

Yes, I realized that you meant to clarify the OR between the lines to 
avoid confusion.

>   
>>>   
>>>       
>>>> +     - We are not in a transaction and committing a statement
>>>>     
>>>>         
>>> the last condition is actually what your fixes add to the original
>>> logics: binlog_commit() can be called with all == false and
>>> in_transaction == false and with the very intention to sink the trans
>>> cache into the binlog.
>>>
>>>
>>>   
>>>       
>>>> +
>>>> +    Otherwise, we accumulate the statement
>>>>    */
>>>> -  if (all || !(thd->options & (OPTION_NOT_AUTOCOMMIT |
> OPTION_BEGIN)))
>>>> +  ulonglong const in_transaction=
>>>> +    thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN);
>>>> +  DBUG_PRINT("debug",
>>>> +             ("all: %d, empty: %s, in_transaction: %s,
> all.modified_non_trans_table: %s, stmt.modified_non_trans_table: %s",
>>>> +              all,
>>>> +              YESNO(trx_data->empty()),
>>>> +              YESNO(in_transaction),
>>>> +              YESNO(thd->transaction.all.modified_non_trans_table),
>>>> +             
> YESNO(thd->transaction.stmt.modified_non_trans_table)));
>>>>     
>>>>         
>>>   
>>>       
>>>> +  if ((thd->options & OPTION_KEEP_LOG) ||
>>>>     
>>>>         
>
>   
>>> so I am not sure about the first case (thd->options &
>>> OPTION_KEEP_LOG) and need your clarification.
>>>       
>
> Remains open question whether it's needed. Notice, it's
> binlog_commit() code.
>   

See above.

>>>   
>>>       
>>>> +      in_transaction &&
>>>> +      (all
>>>> +       ||
>>>> +       (!trx_data->at_least_one_stmt &&
>>>> +        thd->transaction.stmt.modified_non_trans_table)) ||
>>>>     
>>>>         
>>> Previously binlog_commit called binlog_end_trans(thd, trx_data, &qev)
>>> having as warrants to do that either
>>>    all or ! in_transaction
>>>
>>> Now i see that in the case
>>>
>>>    in_transaction is true,
>>>    all is false,
>>>    the current statement has made what the identifier says - 
>>>    thd->transaction.stmt.modified_non_trans_table
>>>    and the trans cache is empty 
>>>
>>> binlog_commit() is going to inject COMMIT into the binlog.
>>>
>>> And what is a use case scenario for such combination?
>>>   
>>>       
>
>
>   
>> Yes, most of the logic has been moved away from binlog_end_trans() in
>> favor of doing the actual checking in the binlog_commit() et.al. This
>> in order to simplify the logic and allow such things as what you just
>> mention to actually be visible. The scenario is:
>>
>> CREATE TABLE t1 (a INT) ENGINE=MyISAM;
>> CREATE TABLE t2 (a INT) ENGINE=InnoDB;
>>
>> BEGIN;
>> INSERT INTO t1 VALUES (1),(2);
>> INSERT INTO t2 VALUES (1),(2);
>> COMMIT;
>>
>> When doing that on a server, I get the following binary log, which
>> means that statement-based and row-based behave identically.
>>
>> mysql> show binlog events;
>>
> +-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
>> | Log_name          | Pos  | Event_type  | Server_id | End_log_pos |
>> Info                                              |
>>
> +-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
>> | master-bin.000001 |    4 | Format_desc |         1 |         106 |
>> Server ver: 5.1.24-rc-debug-log, Binlog ver: 4    |
>> | master-bin.000001 |  106 | Query       |         1 |         206 |
>> use `test`; create table t1 (a int) engine=myisam |
>> | master-bin.000001 |  206 | Query       |         1 |         306 |
>> use `test`; create table t2 (a int) engine=innodb |
>> | master-bin.000001 |  306 | Query       |         1 |         374 |
>> use `test`; BEGIN                                 |
>> | master-bin.000001 |  374 | Table_map   |         1 |         415 |
>> table_id: 16 (test.t1)                            |
>> | master-bin.000001 |  415 | Write_rows  |         1 |         454 |
>> table_id: 16 flags: STMT_END_F                    |
>> | master-bin.000001 |  454 | Query       |         1 |         523 |
>> use `test`; COMMIT                                |
>> | master-bin.000001 |  523 | Query       |         1 |         591 |
>> use `test`; BEGIN                                 |
>> | master-bin.000001 |  591 | Table_map   |         1 |         632 |
>> table_id: 17 (test.t2)                            |
>> | master-bin.000001 |  632 | Write_rows  |         1 |         671 |
>> table_id: 17 flags: STMT_END_F                    |
>> | master-bin.000001 |  671 | Table_map   |         1 |         712 |
>> table_id: 16 (test.t1)                            |
>> | master-bin.000001 |  712 | Write_rows  |         1 |         751 |
>> table_id: 16 flags: STMT_END_F                    |
>> | master-bin.000001 |  751 | Xid         |         1 |         778 |
>> COMMIT /* xid=7 */                                |
>> | master-bin.000001 |  778 | Query       |         1 |         869 |
>> use `test`; insert into t1 values(1),(2)          |
>> | master-bin.000001 |  869 | Query       |         1 |         937 |
>> use `test`; BEGIN                                 |
>> | master-bin.000001 |  937 | Query       |         1 |        1028 |
>> use `test`; insert into t2 values(1),(2)          |
>> | master-bin.000001 | 1028 | Query       |         1 |        1119 |
>> use `test`; insert into t1 values(3),(4)          |
>> | master-bin.000001 | 1119 | Xid         |         1 |        1146 |
>> COMMIT /* xid=15 */                               |
>>
> +-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
>> 18 rows in set (0.00 sec)
>>     
>
> I think you mean that if the first query after BEGIN changes on a
> non-ta table the query binlogged ahead of BEGIN with binlog format
> stmt, correct?
>   

Yes.

> But there was no special reason behind that policy. It has remained for
> statement format as a consequence of binlogging algirthm which starts
> filling the trans cache at first appearence of a transactional table.
>   

The reason for this policy is that there is no obvious reason to hold 
not send that change over to the slave since it has already occurred. 
The following scenario would fail if we did not do that (both threads 
are on the master):

T-1 T-2
W(t) |
| W(t)
| |
| C
R(t)
W(n)
|
C

This scenario would move the T-2 W(t) ahead of the T-1 W(t), and the 
only reason that we do *not* write all non-transactional writes directly 
to the binary log is because it would potentially cause the slave to 
stop hard with a syntax error.

> With your fixes, I believe, we could do not copy that statement tricky
> behaviour and to put the first non-ta query row-events into the cache
> to appear in binlog after BEGIN.
>
> Getting back to that if
>
> +  if ((thd->options & OPTION_KEEP_LOG) ||
> +      in_transaction &&
> +      (all ||
> +       (!trx_data->at_least_one_stmt &&
> +        thd->transaction.stmt.modified_non_trans_table)) ||
> +      !in_transaction && !all)
>
> Why should not it be just as it was before:
>
> -  if (all || !in_transaction)
>
> ?
>   

Because that would break the case I give above.

> You principal use case of ac=ON query on a not-ta table with row-based
> format satisfy.
> All can be false for format.
> Then in_transaction == false and the rows binlog.
>
> Yes, there will be "difference"  in the order of binlogging comparing
> with stmt format. However, indeed that would be correction of the order of
> binlogging.
>   

The AUTO_COMMIT=ON case has to do with writing BEGIN/COMMIT for each 
transactional unit.

Just my few cents,
Mats Kindahl

-- 
Mats Kindahl
Lead Software Developer
Replication Team
MySQL AB, www.mysql.com


Thread
bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl25 Mar
  • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Andrei Elkin26 Mar
    • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl27 Mar
      • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Andrei Elkin27 Mar
        • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl28 Mar
          • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Andrei Elkin28 Mar
            • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl28 Mar
  • RE: bk commit into 5.1 tree (mats:1.2559) BUG#29020Chuck Bell26 Mar
    • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl27 Mar
  • RE: bk commit into 5.1 tree (mats:1.2559) BUG#29020Chuck Bell26 Mar
    • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl27 Mar