List:Commits« Previous MessageNext Message »
From:Andrei Elkin Date:March 27 2008 4:37pm
Subject:Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020
View as plain text  
Mats, hello.

Thanks for acknowledging my earlier comments.

This mail explains some of my questions.
There remain two outstanding of them.

>
>> I think you need to mention some facts about user visible behavior
>> like possible appearence of redundant COMMIT/ROLLBACK in binlog
>> (discussed further).
>>   
>
> Well... I think not. This is the result of the patch Sven did and is
> not really related to this patch (at least not very much).
>

Please do not confuse with autocommit-ON commit that patch made. I was
rather about the content of binlog after making e.g an insert into a
non-ta table.
After your current patch it gains more BEGIN/COMMIT braces:

   mysql> reset master; show create table tm; insert into tm values (4001); show binlog
events\G
   Query OK, 0 rows affected (0.00 sec)

  
+-------+--------------------------------------------------------------------------------------------------------+
   | Table | Create Table                                                                 
                         |
  
+-------+--------------------------------------------------------------------------------------------------------+
   | tm    | CREATE TABLE `tm` (
   `a` int(11) NOT NULL,
   PRIMARY KEY (`a`)
   ) ENGINE=MyISAM DEFAULT CHARSET=latin1 | 
  
+-------+--------------------------------------------------------------------------------------------------------+
   1 row in set (0.00 sec)

   Query OK, 1 row affected (0.01 sec)

   *************************** 1. row ***************************
   Log_name: mysql1000-bin.000001
   Pos: 4
   Event_type: Format_desc
   Server_id: 1
   End_log_pos: 106
   Info: Server ver: 5.1.24-rc-debug-log, Binlog ver: 4
   *************************** 2. row ***************************
   Log_name: mysql1000-bin.000001
   Pos: 106
   Event_type: Query
   Server_id: 1
   End_log_pos: 174
   Info: use `test`; BEGIN
   *************************** 3. row ***************************
   Log_name: mysql1000-bin.000001
   Pos: 174
   Event_type: Table_map
   Server_id: 1
   End_log_pos: 215
   Info: table_id: 16 (test.tm)
   *************************** 4. row ***************************
   Log_name: mysql1000-bin.000001
   Pos: 215
   Event_type: Write_rows
   Server_id: 1
   End_log_pos: 249
   Info: table_id: 16 flags: STMT_END_F
   *************************** 5. row ***************************
   Log_name: mysql1000-bin.000001
   Pos: 249
   Event_type: Query
   Server_id: 1
   End_log_pos: 318
   Info: use `test`; COMMIT
   5 rows in set (0.01 sec)

And you actually are talking on that issue further along the text.


>
> [snip]
>
>>> --- a/mysql-test/suite/rpl/r/rpl_switch_stm_row_mixed.result	2007-12-14
> 14:40:43 +01:00
>>> +++ b/mysql-test/suite/rpl/r/rpl_switch_stm_row_mixed.result	2008-03-25
> 10:39:23 +01:00
>>> @@ -425,673 +425,5 @@ INSERT INTO t13 VALUES (USER());
>>>  INSERT INTO t13 VALUES (my_user());
>>>  INSERT INTO t13 VALUES (CURRENT_USER());
>>>  INSERT INTO t13 VALUES (my_current_user());
>>>     
>>
>> I really appreciate that you are removing big results of show-binlog-events.
>>
>> Still, I think it's necessary to explain why `show binlog event' is safe to
>> remove. Do we have any risk to loosing something in that dropped piece of
>> results?
>>   
>
> Well... the main reason is that we should not check the binlog in
> replication tests, we should check that replication works. We have a
> much smaller suite for checking that items go into the binary log in a
> way that we expect, but if replication works correctly, it does not
> matter what is in the binary log.
>
> I can add a note that it was removed because we shouldn't rely on the
> contents of the binary log in replication tests.

I agree that checking binlog content might be sometimes far from the perfect
testing. Still in this particular BUG#28086 related test of
rpl_switch_stm_row_mixed `show binlog events' is the *only* check,
we may not remove it.

I think we can achieve the aim of shortening the results with
exploiting a wonderful JasonH's improvement:

starting the test for BUG#28086 with

+let $binlog_start= query_get_value(SHOW MASTER STATUS, Position, 1);

doing its logics

so that the final report

source include/show_binlog_events.inc;

stays but will result in events only after $binlog_start.

I think that's a good compromise.


>>  binlog_commit()'s argument
>>
>>   @param all   true if this is the last statement before a COMMIT
>>                statement;                               false
>> if either this is a statement in a
>>                transaction but not the last, or if this is a statement
>>                not inside a BEGIN block and autocommit is on.
>>
>> the false's meaning has extended with your patch. Could you please update the
>> comments?
>>   
>
> Well... actually, it changed with the introduction of patch for
> BUG#12713, but I will update the comment.

I thought post-bug#12713 server executed binlog_commit() same way as
the pre- version.
That's your patch forces committing even though `all' being false.

>
>>> @@ -1458,26 +1470,63 @@ static int binlog_commit(handlerton *hto
>>>      trx_data->reset();
>>>      DBUG_RETURN(0);
>>>    }
>>> +
>>>    /*
>>> -    Write commit event if at least one of the following holds:
>>> -     - the user sends an explicit COMMIT; or
>>> -     - the autocommit flag is on, and we are not inside a BEGIN.
>>> -    However, if the user has not sent an explicit COMMIT, and we are
>>> -    either inside a BEGIN or run with autocommit off, then this is not
>>> -    the end of a transaction and we should not write a commit event.
>>> +    Decision table for committing a transaction:
>>> +
>>>     
>>
>> I like the idea of the table.
>>
>> I was asking yesterday and you agreed to make the table desciption
>> more verbose.
>> Besides from what you kindly explained - the columns correspond to
>> combinations of the five paraments we need to clarify/define some of
>> them like Real transaction.
>>
>>   
>>> +    =============================  = = = = = = = = = = = = = = = =
>>> +    Real transaction               N N N N N N N N N N N N N N N N
>>> +    Statement in cache             N N N N N N N N Y Y Y Y Y Y Y Y
>>> +    In transaction                 N N N N Y Y Y Y N N N N Y Y Y Y
>>> +    Stmt modified non-trans        N N Y Y N N Y Y N N Y Y N N Y Y
>>> +    All modified non-trans         N Y N Y N Y N Y N Y N Y N Y N Y
>>> +
>>> +    Action: (C)ommit/(A)ccumulate  C C C C A C C C - - - - A A A A
>>> +    =============================  = = = = = = = = = = = = = = = =
>>> +
>>> +
>>> +    =============================  = = = = = = = = = = = = = = = =
>>> +    Real transaction               Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
>>> +    Statement in cache             N N N N N N N N Y Y Y Y Y Y Y Y
>>> +    In transaction                 N N N N Y Y Y Y N N N N Y Y Y Y
>>> +    Stmt modified non-trans        N N Y Y N N Y Y N N Y Y N N Y Y
>>> +    All modified non-trans         N Y N Y N Y N Y N Y N Y N Y N Y
>>> +
>>> +    (C)ommit/(A)ccumulate/(-)      - - - - C C C C - - - - C C C C
>>> +    =============================  = = = = = = = = = = = = = = = =
>>> +
>>>     
>>
>> Without the definition a reader may get confused even more.
>> E.g consider the 7th column of the 2nd part of the table.
>>
>> Y
>> N
>> Y
>> Y
>> N
>>
>> C
>>
>> Stmt modified non-trans Y constradicts to All modified non-trans N
>> while the combination is benign.
>>   
>

> Well, we are committing a statement, which means that the flag for the
> transaction has not been updated. Basically, this code is executed
> just before the state is changed from non-committed to committed.
>

Considering how merging of `Stmt modified non-trans' to 
`All modified non-trans' works, and it works as merging right away,
the column above can not exist; it should have `-' at the bottom.


>>
>>   
>>> +    In other words, we commit the transaction if and only if:
>>> +     - The keep log option is set
>>>     
>>
>> first,
>> i think there should be explicit
>>
>>         OR
>>
>> in between of the items
>>   
>
> OK.
>
>> second,
>>



>> the original logics do not mention  (thd->options & OPTION_KEEP_LOG)
>> in the following if arglist.
>> What is the reason on add it?
>>   
>
> It was there from the beginning in both binlog_rollback() and
> binlog_savepoint_rollback(), so I have just added that as a
> condition. The option is used to force the statement to be written to
> the binary log in the case of a rollback, so it may be partially
> redundant (it's main purpose seems to be to ensure that DROP TEMPORARY
> TABLE and CREATE TEMPORARY TABLE goes to the binary log). I think we
> should go over it's usage, but that is beyond this patch.
>

But that's binlog_commit() that i have meant, and it did no have the
option referred.

>>
>>
>>   
>>> +     - We are in a transaction and:
>>> +       - A full transaction is committed,
>>> +       - A non-transactional statement is committed and there is
>>> +         no statement cached, or
>>>     
>>
>>         OR (as above)

actually there was or, but i'd rather to place the word on a separate
line.

>>
>>   
>>> +     - We are not in a transaction and committing a statement
>>>     
>>
>> the last condition is actually what your fixes add to the original
>> logics: binlog_commit() can be called with all == false and
>> in_transaction == false and with the very intention to sink the trans
>> cache into the binlog.
>>
>>
>>   
>>> +
>>> +    Otherwise, we accumulate the statement
>>>    */
>>> -  if (all || !(thd->options & (OPTION_NOT_AUTOCOMMIT |
> OPTION_BEGIN)))
>>> +  ulonglong const in_transaction=
>>> +    thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN);
>>> +  DBUG_PRINT("debug",
>>> +             ("all: %d, empty: %s, in_transaction: %s,
> all.modified_non_trans_table: %s, stmt.modified_non_trans_table: %s",
>>> +              all,
>>> +              YESNO(trx_data->empty()),
>>> +              YESNO(in_transaction),
>>> +              YESNO(thd->transaction.all.modified_non_trans_table),
>>> +              YESNO(thd->transaction.stmt.modified_non_trans_table)));
>>>     
>>
>>   
>>> +  if ((thd->options & OPTION_KEEP_LOG) ||
>>>     
>>

>> so I am not sure about the first case (thd->options &
>> OPTION_KEEP_LOG) and need your clarification.

Remains open question whether it's needed. Notice, it's
binlog_commit() code.


>>
>>   
>>> +      in_transaction &&
>>> +      (all
>>> +       ||
>>> +       (!trx_data->at_least_one_stmt &&
>>> +        thd->transaction.stmt.modified_non_trans_table)) ||
>>>     
>>
>> Previously binlog_commit called binlog_end_trans(thd, trx_data, &qev)
>> having as warrants to do that either
>>    all or ! in_transaction
>>
>> Now i see that in the case
>>
>>    in_transaction is true,
>>    all is false,
>>    the current statement has made what the identifier says - 
>>    thd->transaction.stmt.modified_non_trans_table
>>    and the trans cache is empty 
>>
>> binlog_commit() is going to inject COMMIT into the binlog.
>>
>> And what is a use case scenario for such combination?
>>   


> Yes, most of the logic has been moved away from binlog_end_trans() in
> favor of doing the actual checking in the binlog_commit() et.al. This
> in order to simplify the logic and allow such things as what you just
> mention to actually be visible. The scenario is:
>
> CREATE TABLE t1 (a INT) ENGINE=MyISAM;
> CREATE TABLE t2 (a INT) ENGINE=InnoDB;
>
> BEGIN;
> INSERT INTO t1 VALUES (1),(2);
> INSERT INTO t2 VALUES (1),(2);
> COMMIT;
>
> When doing that on a server, I get the following binary log, which
> means that statement-based and row-based behave identically.
>
> mysql> show binlog events;
>
> +-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
> | Log_name          | Pos  | Event_type  | Server_id | End_log_pos |
> Info                                              |
>
> +-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
> | master-bin.000001 |    4 | Format_desc |         1 |         106 |
> Server ver: 5.1.24-rc-debug-log, Binlog ver: 4    |
> | master-bin.000001 |  106 | Query       |         1 |         206 |
> use `test`; create table t1 (a int) engine=myisam |
> | master-bin.000001 |  206 | Query       |         1 |         306 |
> use `test`; create table t2 (a int) engine=innodb |
> | master-bin.000001 |  306 | Query       |         1 |         374 |
> use `test`; BEGIN                                 |
> | master-bin.000001 |  374 | Table_map   |         1 |         415 |
> table_id: 16 (test.t1)                            |
> | master-bin.000001 |  415 | Write_rows  |         1 |         454 |
> table_id: 16 flags: STMT_END_F                    |
> | master-bin.000001 |  454 | Query       |         1 |         523 |
> use `test`; COMMIT                                |
> | master-bin.000001 |  523 | Query       |         1 |         591 |
> use `test`; BEGIN                                 |
> | master-bin.000001 |  591 | Table_map   |         1 |         632 |
> table_id: 17 (test.t2)                            |
> | master-bin.000001 |  632 | Write_rows  |         1 |         671 |
> table_id: 17 flags: STMT_END_F                    |
> | master-bin.000001 |  671 | Table_map   |         1 |         712 |
> table_id: 16 (test.t1)                            |
> | master-bin.000001 |  712 | Write_rows  |         1 |         751 |
> table_id: 16 flags: STMT_END_F                    |
> | master-bin.000001 |  751 | Xid         |         1 |         778 |
> COMMIT /* xid=7 */                                |
> | master-bin.000001 |  778 | Query       |         1 |         869 |
> use `test`; insert into t1 values(1),(2)          |
> | master-bin.000001 |  869 | Query       |         1 |         937 |
> use `test`; BEGIN                                 |
> | master-bin.000001 |  937 | Query       |         1 |        1028 |
> use `test`; insert into t2 values(1),(2)          |
> | master-bin.000001 | 1028 | Query       |         1 |        1119 |
> use `test`; insert into t1 values(3),(4)          |
> | master-bin.000001 | 1119 | Xid         |         1 |        1146 |
> COMMIT /* xid=15 */                               |
>
> +-------------------+------+-------------+-----------+-------------+---------------------------------------------------+
> 18 rows in set (0.00 sec)

I think you mean that if the first query after BEGIN changes on a
non-ta table the query binlogged ahead of BEGIN with binlog format
stmt, correct?

But there was no special reason behind that policy. It has remained for
statement format as a consequence of binlogging algirthm which starts
filling the trans cache at first appearence of a transactional table.

With your fixes, I believe, we could do not copy that statement tricky
behaviour and to put the first non-ta query row-events into the cache
to appear in binlog after BEGIN.

Getting back to that if

+  if ((thd->options & OPTION_KEEP_LOG) ||
+      in_transaction &&
+      (all ||
+       (!trx_data->at_least_one_stmt &&
+        thd->transaction.stmt.modified_non_trans_table)) ||
+      !in_transaction && !all)

Why should not it be just as it was before:

-  if (all || !in_transaction)

?

You principal use case of ac=ON query on a not-ta table with row-based
format satisfy.
All can be false for format.
Then in_transaction == false and the rows binlog.

Yes, there will be "difference"  in the order of binlogging comparing
with stmt format. However, indeed that would be correction of the order of
binlogging.



>
>> Previously the first two items
>>    in_transaction is true,
>>    all is false,
>> meant an ordirary dml query not commit is about to end and therefore
>> nothing to commit yet neither to binlog.
>>   
>
> That holds only for transactional engines, not for non-transactional
> engines.
>
>> Btw that piece of code seems to work (effectively i.e with actual
>> writing into binlog) only if there is no xa-support.
>> Besides to clarify I am asking you to check how no-xa-support-in-server
>> is going to cooperate with the current patch.
>>   
>
> It will work fine: XA is basically used for row-based replication
> involving InnoDB tables. The culprit is statement-based replication
> for non-transactional tables, which do not register the transaction at
> all.
>
>>
>>   
>>> +      !in_transaction && !all)
>>>     
>>
>> and this is the new rule for committing which I am okay with.
>>   
>
> It was actually there before, just implicitly.
>
>>   
>>>    {
>>>      Query_log_event qev(thd, STRING_WITH_LEN("COMMIT"), TRUE, FALSE);
>>>      qev.error_code= 0; // see comment in MYSQL_LOG::write(THD, IO_CACHE)
>>>      int error= binlog_end_trans(thd, trx_data, &qev, all);
>>>      DBUG_RETURN(error);
>>>    }
>>>     
>>
>>   
>>> -  else
>>> -  {
>>> -    int error= binlog_end_trans(thd, trx_data, &invisible_commit, all);
>>> -    DBUG_RETURN(error);
>>> -  }
>>>     
>>
>> invisible_commit seemed to be a dead piece of code. Again, as was
>> suggested xa-off server testing can change this preliminary
>> conclusion.
>>   
>
> It's dead after Sven's fix.
>

I pasted earlier my log to show that your changes were meant.

>> If it proves to be dead indeed then you need to remove the global object and
>> the class as well.
>>   
>
> True.
>
>> From another hand, naturally redundant COMMIT and ROLLBACK completing events on
>> myisam table in the binlog is something we'd rather to consider how to
>> avoid. And this invisible commit could serve for such purpose.
>>   
>
> Problem is that they cannot basically be removed because the slave has
> to know where the transaction started and where it ended. Consider the
> case of replicating form a non-transactional engine (not writing
> COMMIT), to a transactional engine when AUTOCOMMIT is false. This will
> make all statements sent by the master appear as one gigantic
> transaction on the slave, and we do not want that.
>
> Long-term, we will add flags to the events instead to mark BEGIN,
> COMMIT and ROLLBACK and remove the separate queries. This will allow
> the binlog with the BEGIN and COMMIT to work as before, while
> eliminating it for new binary logs.


>
>> But I am okay with the current approach having the redundant queries
>> in binlog in this case.
>>
>>
>>   
>>> +  DBUG_RETURN(0);
>>>  }
>>>      
>>
>> It's worth to mention a confusing in my opinion description of `all'
>> for begin_rollback. Should not it be `ROLLBACK' on place of `COMMIT'?!
>> And actully all=true can be a reaction to force rolling back not the
>> query.
>>
>>
>>   @param all   true if this is the last statement before a COMMIT
>>                statement; false if either this is a statement in a
>>                transaction but not the last, or if this is a statement
>>                not inside a BEGIN block and autocommit is on.
>>
>> If so, could you please fix that?
>>   
>
> OK.
>
>> Also, as in the commit case the description of `all' needs updating.
>> Could you please state explicitly that all == false and   
>
> OK.
>
>>   
>>>  /**
>>> @@ -1509,21 +1558,36 @@ static int binlog_rollback(handlerton *h
>>>      DBUG_RETURN(0);
>>>    }
>>>  -  /*
>>> -    Update the binary log with a BEGIN/ROLLBACK block if we have
>>> -    cached some queries and we updated some non-transactional
>>> -    table. Such cases should be rare (updating a
>>> -    non-transactional table inside a transaction...)
>>> -  */
>>> -  if (unlikely(thd->transaction.all.modified_non_trans_table || -
>>> (thd->options & OPTION_KEEP_LOG)))
>>> +  DBUG_PRINT("debug", ("all: %s, all.modified_non_trans_table: %s,
> stmt.modified_non_trans_table: %s",
>>> +                       YESNO(all),
>>> +                      
> YESNO(thd->transaction.all.modified_non_trans_table),
>>> +                      
> YESNO(thd->transaction.stmt.modified_non_trans_table)));
>>> +  if (all && thd->transaction.all.modified_non_trans_table ||
>>> +      !all && thd->transaction.stmt.modified_non_trans_table ||
>>> +      (thd->options & OPTION_KEEP_LOG))
>>>    {
>>> +    /*
>>> +      We write the transaction cache with a rollback last if we have
>>> +      modified any non-transactional table. We do this even if we are
>>> +      committing a single statement that has modified a
>>> +      non-transactional table since it can have modified a
>>> +      transactional table in that statement as well, which needs to be
>>> +      rolled back on the slave.
>>> +    */
>>>      Query_log_event qev(thd, STRING_WITH_LEN("ROLLBACK"), TRUE, FALSE);
>>>      qev.error_code= 0; // see comment in MYSQL_LOG::write(THD, IO_CACHE)
>>>      error= binlog_end_trans(thd, trx_data, &qev, all);
>>>    }
>>> -  else
>>> +  else if (all && !thd->transaction.all.modified_non_trans_table
> ||
>>> +           !all &&
> !thd->transaction.stmt.modified_non_trans_table)
>>> +  {
>>> +    /*
>>> +      If we have modified only transactional tables, we can truncate
>>> +      the transaction cache without writing anything to the binary
>>> +      log.
>>> +     */
>>>      error= binlog_end_trans(thd, trx_data, 0, all);
>>> +  }
>>>    DBUG_RETURN(error);
>>>  }
>>>  diff -Nrup a/sql/log_event.cc b/sql/log_event.cc
>>> --- a/sql/log_event.cc	2008-02-28 18:55:42 +01:00
>>> +++ b/sql/log_event.cc	2008-03-25 10:39:24 +01:00
>>> @@ -7054,7 +7054,7 @@ int Table_map_log_event::save_field_meta
>>>  #if !defined(MYSQL_CLIENT)
>>>  Table_map_log_event::Table_map_log_event(THD *thd, TABLE *tbl, ulong tid,
>>>                                           bool is_transactional, uint16
> flags)
>>> -  : Log_event(thd, 0, is_transactional),
>>> +  : Log_event(thd, 0, true),
>>>      m_table(tbl),
>>>      m_dbnam(tbl->s->db.str),
>>>      m_dblen(m_dbnam ? tbl->s->db.length : 0),
>>> diff -Nrup a/sql/sql_insert.cc b/sql/sql_insert.cc
>>> --- a/sql/sql_insert.cc	2008-02-19 13:45:16 +01:00
>>> +++ b/sql/sql_insert.cc	2008-03-25 10:39:25 +01:00
>>> @@ -2426,6 +2426,7 @@ pthread_handler_t handle_delayed_insert(
>>>        */
>>>        di->table->file->ha_release_auto_increment();
>>>        mysql_unlock_tables(thd, lock);
>>>     
>>
>>   
>>> +      ha_autocommit_or_rollback(thd, 0);
>>>     
>>
>> that's clear why.
>> Still would be better to leave comments what the func is doing here.
>> (actually it's just in order to call  binlog_commit() eventually, is
>> not it?)
>>   
>
> With the change introduced by Kostja, all work should be terminated
> with an autocommit. This is mainly for replication, but it makes sense
> to commit the work done when no more work is coming in because the
> delayed statement ended.

That can make delayed insert delayed ever more :)
Thanks for clarifying. Comments for this added code do not seem to be
mandatory.


cheers,

Andrei
Thread
bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl25 Mar
  • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Andrei Elkin26 Mar
    • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl27 Mar
      • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Andrei Elkin27 Mar
        • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl28 Mar
          • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Andrei Elkin28 Mar
            • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl28 Mar
  • RE: bk commit into 5.1 tree (mats:1.2559) BUG#29020Chuck Bell26 Mar
    • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl27 Mar
  • RE: bk commit into 5.1 tree (mats:1.2559) BUG#29020Chuck Bell26 Mar
    • Re: bk commit into 5.1 tree (mats:1.2559) BUG#29020Mats Kindahl27 Mar