List:Replication« Previous MessageNext Message »
From:Rick James Date:April 23 2008 8:19pm
Subject:RE: innodb replication
View as plain text  
Consider this timeline:

1. BEGIN
2. do some stuff in transaction
3. FLUSH TABLES WITH READ LOCK
4. take snapshot
5. UNLOCK TABLES
6. do some stuff in transaction
7. COMMIT

I understand that #3 and #6 won't be written to the replication stream until #7.  What
about the updating of local things for #2?  Won't the snapshot see #1-2, realize there is
no COMMIT, and ROLLBACK the stuff from #2?

Later, if I bring up the snapshot:

91. Bring up Mysql on the snapshot
92. Stuff from #2 has not been committed, so it is rolled back?

So, effectively the snapshot was at state #1.

And the problem with 4.1 was that some stuff from #2 (or other things) might not have been
flushed to disk before #4.  Right?

> -----Original Message-----
> From: Jeremy Cole [mailto:jeremy.cole@stripped] On 
> Behalf Of Jeremy Cole
> Sent: Wednesday, April 23, 2008 12:18 PM
> To: Rick James
> Cc: Eric Bergen; augusto@stripped; replication@stripped
> Subject: Re: innodb replication
> 
> Hi Rick,
> 
> No, it doesn't wait.  Quite the opposite, FLUSH TABLES WITH READ LOCK 
> causes any subsequent COMMITs to block waiting for the lock to be 
> released.  (This is true of 5.0, in 4.1, this doesn't happen, and 
> consequently this method can't be used to get a 
> replicate-able snapshot.)
> 
> Regards,
> 
> Jeremy
> 
> Rick James wrote:
> > Will FLUSH wait for any transactions to COMMIT (or ROLLBACK)?
> > 
> >> -----Original Message-----
> >> From: Jeremy Cole [mailto:jeremy.cole@stripped] On 
> >> Behalf Of Jeremy Cole
> >> Sent: Wednesday, April 23, 2008 11:20 AM
> >> To: Rick James
> >> Cc: Eric Bergen; augusto@stripped; replication@stripped
> >> Subject: Re: innodb replication
> >>
> >> Hi Rick,
> >>
> >> Sure, but why not?  Both states (pre-transaction and 
> >> post-transaction) 
> >> are valid, consistent states.  It's exactly as though you had 
> >> taken the 
> >> snapshot a bit earlier than you actually did.
> >>
> >> In fact, the only reason to use FLUSH TABLES WITH READ 
> LOCK in this 
> >> context is so that you can get the binary log positions 
> >> aligned with it. 
> >>   Strictly, you don't need it.
> >>
> >> Regards,
> >>
> >> Jeremy
> >>
> >> Rick James wrote:
> >>> Consistent state, yes.  But what about mid-transaction?  
> >> Might it be that the snapshot will have half the transaction 
> >> and have to ROLLBACK, while the live machine (after UNLOCK) 
> >> will COMMIT the transaction? 
> >>>> -----Original Message-----
> >>>> From: eric.bergen@stripped 
> >>>> [mailto:eric.bergen@stripped] On Behalf Of Eric Bergen
> >>>> Sent: Tuesday, April 22, 2008 7:12 PM
> >>>> To: augusto@stripped
> >>>> Cc: replication@stripped
> >>>> Subject: Re: innodb replication
> >>>>
> >>>> Innodb always keeps it's files in a consistent state on 
> disk. Power
> >>>> failure and snapshots of the system won't result in a 
> corrupt data
> >>>> file. You can find more details on how innodb achieves this here:
> >>>> 
> http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/
> >>>>
> >>>> More replies inline.
> >>>>
> >>>> On Tue, Apr 22, 2008 at 5:22 PM, Augusto Bott 
> >>>> <augusto@stripped> wrote:
> >>>>> Hi Mark, Eric!
> >>>>>
> >>>>>  The last comments on this thread got me thinking about 
> all those
> >>>>>  backup chalenges... since a FLUSH TABLES won't stop 
> >> internal InnoDB
> >>>>>  threads from working and changing the datafiles in the 
> >>>> background, how
> >>>>>  can we be absolutely sure that a filesystem snapshot will 
> >>>> be actually
> >>>>>  consistent when it's taken? What if when we're taking that 
> >>>> snapshot in
> >>>>>  the precise moment when InnoDB threads are working and 
> >>>> modifying data?
> >>>>>  How can we be sure that the snapshot will not be taken on 
> >>>> the precise
> >>>>>  moment there's a write in place?
> >>>>>
> >>>>>  I must acknowledge under this scenario, taking a 
> >> 'instant' snapshot
> >>>>>  from the filesystem will greatly reduce the probability of 
> >>>> corruption
> >>>>>  but... any ideas on how to get it 100% safe? (hm... 
> >>>> perhaps we should
> >>>>>  discuss this on internals@ ?)
> >>>> See the notes above on why the snapshot is guaranteed not to 
> >>>> be corrupt.
> >>>>
> >>>>>  On the other hand, we're talking about warm backups on a 
> >>>> slave, right?
> >>>>>  So... what if we stop replication, issued a flush tables, 
> >>>> waited for a
> >>>>>  moment until IO operations stop, and then start copying 
> >>>> the datafiles
> >>>>>  or taking that snapshot?
> >>>> You may as well just shutdown the slave. It's safer than 
> trying to
> >>>> determine if innodb isn't going to write something.
> >>>>
> >>>>>  Anyway... I don't recall having experienced corruption 
> >>>> when restoring
> >>>>>  from a warm backup (ever), but that may be that I just 
> >>>> "got lucky" or
> >>>>>  didn't look into the data deep enough to actually find 
> >> any problems
> >>>>>  :-)
> >>>>>
> >>>>>  Cheers!
> >>>>>
> >>>>>  --
> >>>>>  Augusto Bott
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>  On Mon, Apr 21, 2008 at 8:27 PM, Mark Callaghan 
> >>>> <mcallaghan@stripped> wrote:
> >>>>>  >
> >>>>>  >
> >>>>>  >
> >>>>>  > On Mon, Apr 21, 2008 at 5:14 PM, Augusto Bott 
> >>>> <augusto@stripped> wrote:
> >>>>>  > > Hi Rick, Eric, Marcus!
> >>>>>  > >
> >>>>>  > > I must correct some of the things that have been said
> 
> >>>> on this thread.
> >>>>>  > > I'm not commenting on rumors, just the facts from the
> 
> >>>> documentation
> >>>>>  > > (and a few years of experience with MySQL) :-)
> >>>>>  > >
> >>>>>  > > If you're running your MySQL server with
> >>>>>  > > innodb_flush_log_at_trx_commit=1 (that's the
> default), 
> >>>> issuing FLUSH
> >>>>>  > > TABLES WITH READ LOCK will indeed prevent 
> modification to the
> >>>>>  > > datafiles while the global read lock is held since
> all 
> >>>> committed
> >>>>>  > > transactions (so far) will be for sure on disk. The 
> >>>> global read lock
> >>>>>  > > stays in place until UNLOCK TABLES is issued or until
> 
> >>>> that connection
> >>>>>  > > is closed/timed out. This makes it safe to copy the 
> >>>> datafiles for a
> >>>>>  > > backup - the result is indeed a consistent backup.
> >>>>>  >
> >>>>>  > FLUSH TABLES WITH READ LOCK should allow for a snapshot 
> >>>> (LVM, ZFS) to be
> >>>>>  > used to copy all files -- as long as they are on the 
> >>>> same filesystem. But I
> >>>>>  > do not think it is safe to copy InnoDB files without a 
> >>>> snapshot as the
> >>>>>  > background IO threads (purge, insert buffer, page 
> >>>> writer) can do IO during
> >>>>>  > FLUSH TABLES WITH READ LOCK. I have not seen any code 
> >>>> that makes the InnoDB
> >>>>>  > background IO threads wait when there is a global read
> lock.
> >>>>>  >
> >>>>>  > >
> >>>>>  > >
> >>>>>  > >
> >>>>>  > >
> >>>>>  > >
> >>>>>  > > This operation can be called a 'warm backup'. A hot 
> >>>> backup is what
> >>>>>  > > "InnoDB Hot Backup" does: it does not lock anything 
> >>>> while it's running
> >>>>>  > > and transactions can start and commit while it's 
> >>>> running (the tool
> >>>>>  > > requires a brief global read lock at the beginning, 
> >>>> when the backup is
> >>>>>  > > starting) and takes a full consistent backup (you
> must 
> >>>> license it to
> >>>>>  > > use it). A cold backup would be shutting down MySQL,
> 
> >>>> copying the
> >>>>>  > > datafiles and firing it up again (also a safe move).
> >>>>>  > >
> >>>>>  > > The "caveat" when you make a warm backup is that the
> 
> >>>> datafiles will be
> >>>>>  > > marked 'dirty' and on the next startup, the logs will
> 
> >>>> be replayed to
> >>>>>  > > bring the database to a consistent point, since it 
> >>>> "wasn't closed
> >>>>>  > > properly" (to the very same point when you issued the
> 
> >>>> FLUSH TABLES
> >>>>>  > > WITH READ LOCK statement).
> >>>>>  > >
> >>>>>  > > --
> >>>>>  > > Augusto Bott
> >>>>>  > >
> >>>>>  > >
> >>>>>  > >
> >>>>>  > >
> >>>>>  > >
> >>>>>  > > On Sun, Apr 20, 2008 at 5:46 PM, Eric Bergen 
> >>>> <ebergen@stripped>
> >>>>>  > wrote:
> >>>>>  > > > Hi,
> >>>>>  > > >
> >>>>>  > > >  The flush tables with read lock issue has to do
> 
> >>>> with copying the files
> >>>>>  > > >  off the disk, not with using mysqldump's sql.
> The 
> >>>> issue is that flush
> >>>>>  > > >  tables with read lock is enough to prevent
> clients 
> >>>> from modifying
> >>>>>  > > >  innodb but it doesn't make innodb "hold still"
> . It 
> >>>> still has
> >>>>>  > > >  background threads modifying the data files. If
> you 
> >>>> did a flush tables
> >>>>>  > > >  with read lock; then copied the data files they
> 
> >>>> will almost certainly
> >>>>>  > > >  be corrupted because different parts of the
> files 
> >>>> are copied at
> >>>>>  > > >  different times.
> >>>>>  > > >
> >>>>>  > > >  If you use mysqldump --master-data it will turn
> on 
> >>>> --lock-all-tables
> >>>>>  > > >  automatically giving you a consistent snapshot
> 
> >>>> across all storage
> >>>>>  > > >  engines. The downside of this is that mysqldump
> has 
> >>>> to hold a read
> >>>>>  > > >  lock the entire time it's dumping data. If
> you're 
> >>>> only using innodb
> >>>>>  > > >  you can specify --master-data
> --single-transaction 
> >>>> mysqldump will
> >>>>>  > > >  issue a flush tables with read lock long enough
> to 
> >>>> copy the master
> >>>>>  > > >  data and start a transaction. With a
> transaction 
> >>>> started innodb will
> >>>>>  > > >  only return rows that were committed before the
> 
> >>>> transaction was
> >>>>>  > > >  started. Other transactions can proceed and
> innodb 
> >>>> won't see the rows.
> >>>>>  > > >
> >>>>>  > > >
> >>>>>  > > >
> >>>>>  > > >  On Fri, Apr 18, 2008 at 4:14 PM, Rick James 
> >>>> <rjames@stripped>
> >>>>>  > wrote:
> >>>>>  > > >  > Rumor has it that InnoDB used to still
> have some 
> >>>> things in RAM after
> >>>>>  > the  "FLUSH TABLES WITH READ LOCK", thereby disallowing 
> >>>> it as a safe way to
> >>>>>  > dump InnoDB tables.  Another rumor has it that that has 
> >>>> been fixed.  [Sorry
> >>>>>  > for lack of details; hope someone will respond with
> details.]
> >>>>>  > > >  >
> >>>>>  > > >  >  I prefer to stop mysql, copy the
> necessary 
> >>>> directories, then restart
> >>>>>  > mysql.  An improvement on that is to use Snapshots (eg 
> >>>> LVM on Linux).  They
> >>>>>  > make that extremely fast, hence very practical.  (The 
> >>>> copy can be done from
> >>>>>  > the snapshot _after_ restarting mysql.)
> >>>>>  > > >  >
> >>>>>  > > >  >
> >>>>>  > > >  >
> >>>>>  > > >  >  > -----Original Message-----
> >>>>>  > > >  >  > From: Marcus Bointon 
> >> [mailto:marcus@stripped]
> >>>>>  > > >  >  > Sent: Friday, April 18, 2008 2:42
> AM
> >>>>>  > > >  >  > To: replication@stripped
> >>>>>  > > >  >  > Subject: Re: innodb replication
> >>>>>  > > >  >  >
> >>>>>  > > >  >  > On 18 Apr 2008, at 10:27, Ed W
> wrote:
> >>>>>  > > >  >  >
> >>>>>  > > >  >  > >> I´m going to set up a
> innodb database 
> >>>> replication, and I
> >>>>>  > > >  >  > have some
> >>>>>  > > >  >  > >> doubts
> >>>>>  > > >  >  > >> about the backing up the
> master database step.
> >>>>>  > > >  >  > >>
> >>>>>  > > >  >  > >> Normally, I do a "FLUSH
> TABLES WITH READ 
> >>>> LOCK", backup the data
> >>>>>  > > >  >  > >> files (or
> >>>>>  > > >  >  > >> use mysqldump), get the
> current log 
> >>>> coordinates with "SHOW
> >>>>>  > MASTER
> >>>>>  > > >  >  > >> STATUS"
> >>>>>  > > >  >  > >> and unlock the tables.
> Using exactly the 
> >>>> same steps work for
> >>>>>  > innodb
> >>>>>  > > >  >  > >> databases?
> >>>>>  > > >  >  > >>
> >>>>>  > > >  >  > >> According to some texts, it
> seems not, but 
> >>>> I haven´t found
> >>>>>  > > >  >  > a clear
> >>>>>  > > >  >  > >> cookbook.
> >>>>>  > > >  >  > >>
> >>>>>  > > >  >  > >>
> >>>>>  > > >  >  > >> Just doing a "mysqldump
> --master-data 
> >>>> --single-transaction" is
> >>>>>  > > >  >  > >> enough to
> >>>>>  > > >  >  > >> replace the above steps?
> >>>>>  > > >  >  > >>
> >>>>>  > > >  >  > >>
> >>>>>  > > >  >  > >
> >>>>>  > > >  >  > > How would you put these two
> things together 
> >>>> in the case of
> >>>>>  > > >  >  > a server
> >>>>>  > > >  >  > > using mixed database table
> types for a clean 
> >>>> backup that
> >>>>>  > > >  >  > can be used
> >>>>>  > > >  >  > > to restart synchronisation?
> >>>>>  > > >  >  >
> >>>>>  > > >  >  >
> >>>>>  > > >  >  > You can't. Only InnoDB supports 
> >>>> single-transaction, so though
> >>>>>  > > >  >  > it would
> >>>>>  > > >  >  > still generate a dump file for other
> table 
> >>>> types, you'd have no
> >>>>>  > > >  >  > assurance that they are clean
> because 
> writes could be
> >>>>>  > > >  >  > happening as you
> >>>>>  > > >  >  > do it. This is one of the advantages
> of 
> using innodb.
> >>>>>  > > >  >  >
> >>>>>  > > >  >  > You can of course do separate dumps
> for other 
> >>>> DBs, and because
> >>>>>  > that
> >>>>>  > > >  >  > wouldn't need to include the innodb
> tables, it 
> >>>> would at least
> >>>>>  > reduce
> >>>>>  > > >  >  > the time that your tables are
> locked.
> >>>>>  > > >  >  >
> >>>>>  > > >  >  > You could also look at maatkit's
> parallel 
> >>>> dump/load scripts.
> >>>>>  > > >  >  >
> >>>>>  > > >  >  > Marcus
> >>>>>  > > >  >  > --
> >>>>>  > > >  >  > Marcus Bointon
> >>>>>  > > >  >  > Synchromedia Limited: Creators of 
> >>>> http://www.smartmessages.net/
> >>>>>  > > >  >  > UK resellers of info@hand CRM
> solutions
> >>>>>  > > >  >  > marcus@stripped | 
> >>>> http://www.synchromedia.co.uk/
> >>>>>  > > >  >  >
> >>>>>  > > >  >  >
> >>>>>  > > >  >  >
> >>>>>  > > >  >  > --
> >>>>>  > > >  >  > MySQL Replication Mailing List
> >>>>>  > > >  >  > For list archives: 
> http://lists.mysql.com/replication
> >>>>>  > > >  >  > To unsubscribe:
> >>>>>  > > >  >  > 
> >>>> http://lists.mysql.com/replication?unsub=1
> >>>>>  > > >  >
> >>>>>  > > >  >
> >>>>>  > > >  > >
> >>>>>  > > >  >  >
> >>>>>  > > >  >
> >>>>>  > > >  >  --
> >>>>>  > > >  >  MySQL Replication Mailing List
> >>>>>  > > >  >  For list archives:
> http://lists.mysql.com/replication
> >>>>>  > > >  >  To unsubscribe:
> >>>>>  > 
> http://lists.mysql.com/replication?unsub=1
> >>>>>  > > >  >
> >>>>>  > > >  >
> >>>>>  > > >
> >>>>>  > > >
> >>>>>  > > >
> >>>>>  > > >  --
> >>>>>  > > >  Eric Bergen
> >>>>>  > > >  eric.bergen@stripped
> >>>>>  > > >  http://www.provenscaling.com
> >>>>>  > > >
> >>>>>  > > >
> >>>>>  > > >
> >>>>>  > > >  --
> >>>>>  > > >  MySQL Replication Mailing List
> >>>>>  > > >  For list archives:
> http://lists.mysql.com/replication
> >>>>>  > > >  To unsubscribe:
> >>>>>  > 
> http://lists.mysql.com/replication?unsub=1
> >>>>>  > >
> >>>>>  > >
> >>>>>  > >
> >>>>>  > >
> >>>>>  > >
> >>>>>  > >
> >>>>>  > > >
> >>>>>  > > >
> >>>>>  > >
> >>>>>  > > --
> >>>>>  > > MySQL Replication Mailing List
> >>>>>  > > For list archives:
> http://lists.mysql.com/replication
> >>>>>  > > To unsubscribe:
> >>>>>  > 
> http://lists.mysql.com/replication?unsub=1
> >>>>>  > >
> >>>>>  > >
> >>>>>  >
> >>>>>  >
> >>>>>  >
> >>>>>  > --
> >>>>>  > Mark Callaghan
> >>>>>  > mcallaghan@stripped
> >>>>>
> >>>>>  --
> >>>>>
> >>>>> MySQL Replication Mailing List
> >>>>>  For list archives: http://lists.mysql.com/replication
> >>>>>  To unsubscribe:    
> >>>> http://lists.mysql.com/replication?unsub=1
> >>>>
> >>>> -- 
> >>>> high performance mysql consulting.
> >>>> http://provenscaling.com
> >>>>
> >>>> -- 
> >>>> MySQL Replication Mailing List
> >>>> For list archives: http://lists.mysql.com/replication
> >>>> To unsubscribe:    
> >>>> http://lists.mysql.com/replication?unsub=1
> >>>>
> >>>>
> >> -- 
> >> high performance mysql consulting
> >> www.provenscaling.com
> >>
> > 
> 
> -- 
> high performance mysql consulting
> www.provenscaling.com
> 
Thread
innodb replicationEdson Noboru Yamada23 Mar
  • Re: innodb replicationMarcus Bointon24 Mar
  • Re: innodb replicationEd W18 Apr
    • Re: innodb replicationMarcus Bointon18 Apr
      • Re: innodb replicationEd W18 Apr
        • Re: innodb replicationMarcus Bointon18 Apr
      • RE: innodb replicationRick James19 Apr
        • Re: innodb replicationEric Bergen20 Apr
          • Re: innodb replicationAugusto Bott22 Apr
            • Re: innodb replicationMark Callaghan22 Apr
              • Re: innodb replicationAugusto Bott22 Apr
                • Re: innodb replicationEric Bergen22 Apr
              • Re: innodb replicationAugusto Bott23 Apr
                • Re: innodb replicationEric Bergen23 Apr
                  • RE: innodb replicationRick James23 Apr
                    • Re: innodb replicationMarcus Bointon23 Apr
                    • Re: innodb replicationJeremy Cole23 Apr
                      • RE: innodb replicationRick James23 Apr
                        • Re: innodb replicationJeremy Cole23 Apr
                          • RE: innodb replicationRick James23 Apr
                            • Re: innodb replicationMarcus Bointon23 Apr
                              • Re: innodb replicationEric Bergen24 Apr
                                • Re: innodb replicationMoon's Father21 May