List:Internals« Previous MessageNext Message »
From:Mats Kindahl Date:May 20 2009 9:28am
Subject:Re: Feedback needed for WL#2687
View as plain text  

Michael Widenius wrote:
> Hi!
>>>>>> "Mats" == Mats Kindahl <mats@stripped> writes:
> <cut>
> Alfranio> Mixing transactional and non-transactional tables is not supported by
> the
> Alfranio> current code. In a nutshell, non-transactional operations are logged
> upon
> Alfranio> commit or rollback thus generating invalid sequence of operations.
> Alfranio> For instance, assume the following statements and sessions
> (connections):
> Alfranio> con0: CREATE TABLE test (a int);
> Alfranio> con0: INSERT INTO test VALUES(1);
> Alfranio> con1: START TRANSACTION;
> Alfranio> con2: START TRANSACTION;
> Alfranio> con1: UPDATE test SET a = 10;
> Alfranio> con2: DELETE FROM test;
> Alfranio> con2: COMMIT;
> Alfranio> con1: COMMIT;
>>> <cut>
>>> The above assumption wasn't the case for MySQL 5.0-GA (but was changed
>>> recently without any good reason).
> Mats> Not so. There has been a change in the behavior regarding non-transactional
> Mats> changes that were *first* in the transaction, but this behavior was already
> Mats> present for non-transactional changes that were inside a transaction.
> What exactly is the recent change ?
> I am a bit confused here as a customer's told me how the change
> in 5.0 had broken the data on their slaves and when discussing this
> got to belive that we before wrote all non-transactional statements
> directly to the binary log.

Well, we did in one particular case: when a "non-transactional" statement was
first in a transaction.

> Looking at the code and the comments in 4.1 it looks like we wrote
> them directly only of the cache was empty.


>>> MySQL used to always log changes to non-transactional tables at once
>>> and only buffer transactional changes.  This is something that MySQL
>>> users have known and taken into account when writing their
>>> applications.
> Mats> Non-transactional changes were written to the binary log only when they
> were
> Mats> first in the transaction. Also, the notion of when a statement was
> transactional
> Mats> or not was decided on very loose grounds.
> Define loose grounds.

Attached to each statement is a flag that says "contain non-transactional
changes". This flag is set if there is a non-transactional change done in the
statement (and this flag is propagated to the "all" transaction flag, to mark a
real transaction in a similar way). The purpose of this flag is to decide if the
cache needs to be written to the binary log on rollback.

However, the logic used this flag to deem a statement as non-transactional as
well, which means that it failed if there were transactional changes in the
statement as well (hence, the statement was committed even though it contained
transactional changes).

The only case where it is "safe" to write the statement is when there has been
no transactional changes done at all because, conceptually, the transaction has
not really started.

>>> Now things seams to work that everything is non-transactional tables
>>> are logged at once until we change a non-transactional table in which
>>> case everything is logged at commit.
> Mats> I assume that you mean "transactional" instead of "non-transactional" in
> the
> Mats> second case.
> Yes.
>>> The other change that was recently introduced, probably at the same
>>> time, is that in case of ROLLBACK we send the whole statement to the
>>> binary log, but with a ROLLBACK at end.  This will crash all setups
>>> where you have transactional tables on the master and
>>> non-transactional tables on the slave.
> Mats> I assume that you means "the whole transaction" above. I don't know if you
> Mats> consider that recent, but the code for this is in 4.1.
> When testing this, it looked as if we would got ROLLBACK in more cases
> than before in 5.1, but I may be mistaken.

Yes, that is probably the case, but it is not related to this work.

The difference is that in 5.0, tables in triggers were not considered when
deciding whether there were any changes to non-transactional tables, so a
trigger updating a non-transactional table was happily rolled back by *not*
sending the statement over to the slave, which resulted in an inconsistency.

> This was however not what
> the customer was complaining about, this was just an observation while
> examining the logs.

It would be interesting to see the *real* example of the sequence that they have
instead of just examples of what does not work. I suspect that the problem is
caused by several unrelated changes in 5.1, and not by a single "fix."

>>> The changed behavior in MySQL 5.0-GA have caused some concerns among
>>> MySQL users as it has caused their data to be inconsistent on their
>>> slaves.  I got several requests at the MySQL users conference what
>>> could be done to get back the old behavior.
> Mats> As mentioned, this behavior was in 4.1, and I don't know what
> inconsistencies
> Mats> you are referring to, so it is hard for me to answer that.
> It goes back to me understanding the exact nature of the change in
> MySQL 5.0 that breaks customers applications.

I think it is several unrelated changes that together cause the problem. I would
like to see the actual case instead of just a suggestion for a solution since I
fear that we may end up fixing the wrong problem if we don't have a concrete case.

> Mats> I don't question that inconsistencies are there, but basically, statement
> based
> Mats> replication cannot be made to behave in a consistent manner when mixing
> Mats> non-transactional and transactional tables except in some controlled and
> limited
> Mats> circumstances. This is what we are striving to change.
> Yes, I understand that.
> However doing changes like this in a GA release is not something one
> should do.
> The correct way, I assume to fix this, is to go over to row level
> logging at once when there is mix of transactional followed by
> non-transactional tables under one transaction.

Yes, and that is what this worklog is about.


> Mats> ... and in the following scenario, you will have inconsistency on the slave
> Mats> because n2 is logged before the transaction:
> Mats> T1: BEGIN;
> Mats> T1: INSERT into t1 values (1);
> Mats> T1: INSERT INTO n2 select * from t1;
> Mats> T1: COMMIT;
> That is correct, but that is something our customers knows about in
> 5.0/5.1.

Yes, but the existing implementation for the above problem makes *this* case be
logged at commit as a single transaction:

T1: INSERT into t1 values (1);
T1: INSERT INTO n2 values (1);

Which is likely to *not* be something that they expect.

> There is nothing that says which of the above versions is better as
> both can break things.
> There is however one really serious problem with not logging anything
> until commit:
> If you are ONLY using non-transactional tables, this will cause a
> serious problem if you are using BEGIN/COMMIT in your code!

Exactly, but this has been the case since 4.1, so it is nothing new. This was
one of the first things that I spotted out when I started with MySQL.

Mats Kindahl
Senior Software Engineer
Database Technology Group
Sun Microsystems
Feedback needed for WL#2687Alfranio Correia13 May
  • Re: Feedback needed for WL#2687Mats Kindahl13 May
    • Re: Feedback needed for WL#2687Alfranio Correia13 May
  • Re: Feedback needed for WL#2687He Zhenxing14 May
    • Re: Feedback needed for WL#2687He Zhenxing14 May
    • Re: Feedback needed for WL#2687Alfranio Correia15 May
      • Re: Feedback needed for WL#2687He Zhenxing18 May
  • re: Feedback needed for WL#2687Michael Widenius14 May