List: General Discussion
From: Howard Hart
Date: October 23 2011 9:56pm
Subject: RE: 5.1.51 Database Replica Slows Down Suddenly, Lags For Days, and
Recovers Without Intervention
One cause of heavy replication lag we noticed was a misbehaving application blasting
updates (and commits) onto the master's InnoDB tables from multiple clients. Since slave
replication is single-threaded, it couldn't keep up I/O-wise, while the master seemed to
show reasonably low load throughout.

The temporary fix was to set innodb_flush_log_at_trx_commit = 2, so that the log file is
flushed to disk only about once per second instead of at every commit. As a result, the
lag went from 5,000 seconds behind and climbing to 0 in literally seconds, and the slave
load dropped way below 1 again.

The catch (there's always one, of course) is that if the server crashes at the
operating-system or power level, you could lose up to about 1 second's worth of committed
transactions.
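
To make the reasoning above concrete, here is a rough back-of-the-envelope model in
Python. It is only a sketch: the function name and the commit rates are hypothetical
illustrations, not measurements from this system. It assumes the single replication
thread is limited to a fixed number of commits per second and shows how lag drifts
when the master commits faster or slower than that.

```python
def lag_after(seconds, master_commits_per_s, replica_commits_per_s, start_lag=0.0):
    """Estimate replication lag (in seconds) after `seconds` of wall-clock time.

    Hypothetical model: every wall-clock second the master produces one second
    of work, and the replica replays replica/master seconds of that work.
    """
    drift = 1.0 - replica_commits_per_s / master_commits_per_s
    # Lag cannot go below zero: once caught up, the replica simply idles.
    return max(0.0, start_lag + drift * seconds)


# Assumed numbers: master commits 400/s, replica is fsync-bound at 200/s.
growing = lag_after(3600, 400, 200)
# Assumed numbers: without a flush per commit the replica applies 20000/s.
recovered = lag_after(120, 400, 20000, start_lag=5000)
```

With these assumed rates the lag grows by half a second for every second of wall-clock
time while the replica flushes on each commit, and a large backlog drains within a
couple of minutes once it no longer does.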

Howard
________________________________________
From: Claudio Nanni [claudio.nanni@stripped]
Sent: Sunday, October 23, 2011 2:27 PM
To: Tyler Poland
Cc: mysql@stripped
Subject: Re: 5.1.51 Database Replica Slows Down Suddenly, Lags For Days, and Recovers
Without Intervention

Luis,

Very hard to tackle.
In my experience, excluding bottlenecks external to MySQL (hardware, OS,
etc.), the usual 'suspects' are shared resources 'guarded' by a single mutex,
like the query cache or the key cache.
Since you do not use MyISAM, it cannot be the key cache. Since you use Percona,
the query cache is disabled by default.
You should go a bit lower level and trace the system calls with one of the
tools you surely know, to see if there are waits on the semaphores.

I would also like to point out that the 'seconds behind master' value reported
by the slave is not reliable.
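
One common workaround, sketched here in Python under stated assumptions, is a
heartbeat: a job on the master writes the current UTC time into a replicated table
at a fixed interval, and the lag is the replica's clock minus the newest heartbeat
it has replayed. The function name is hypothetical, reading the heartbeat row from
the database is left out, and the result is only meaningful if both clocks are
synchronized.

```python
from datetime import datetime, timezone


def heartbeat_lag(last_heartbeat_utc, now_utc=None):
    """Return replica lag in seconds from the newest replicated heartbeat.

    `last_heartbeat_utc` is the timezone-aware timestamp the master wrote and
    the replica has already replayed (hypothetical input; fetching it from a
    heartbeat table is not shown here).
    """
    now = now_utc or datetime.now(timezone.utc)
    # Clamp at zero so small clock skew never reports a negative lag.
    return max(0.0, (now - last_heartbeat_utc).total_seconds())


# Example with fixed timestamps so the result does not depend on the clock.
beat = datetime(2011, 10, 23, 12, 0, 0, tzinfo=timezone.utc)
seen = datetime(2011, 10, 23, 12, 0, 42, tzinfo=timezone.utc)
lag = heartbeat_lag(beat, seen)
```

Unlike the slave's own counter, this number keeps growing while the replication
thread is stalled, because it depends only on the last heartbeat actually applied.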

Good luck!

Claudio

2011/10/23 Tyler Poland <tpoland@stripped>

> Luis,
>
> How large is your database?  Have you checked for an increase in write
> activity on the master leading up to this? Are you running a backup against
> the replica?
>
> Thank you,
> Tyler
>
> Sent from my Droid Bionic
> On Oct 23, 2011 5:40 AM, "Luis Motta Campos" <luismottacampos@stripped>
> wrote:
>
> > Fellow DBAs and MySQL Users
> >
> > [apologies for possible duplicates - I've posted this to
> > percona-discussion@stripped also]
> >
> > I've been hunting an issue with my database cluster for several months
> > now without much success. Maybe I'm overlooking something here.
> >
> > I've been observing the database slowing down and lagging behind for
> > thousands of seconds (sometimes over the course of several days) even
> > without any query load besides replication itself.
> >
> > I am running Percona MySQL 5.1.51 (InnoDB plug-in version 1.12) on Dell
> > R710 (6 x 3.5 inch 15K RPM disks in RAID10; 24GB RAM; 2x Quad-core Intel
> > processors) running Debian Lenny. MySQL data, binary logs, relay logs,
> > innodb log files are on separated partitions from each other, on a RAID
> > system separated from the operating system disks.
> >
> > Default Storage Engine is InnoDB, and the usual InnoDB memory structures
> > are stable and look healthy.
> >
> > I have about 500 (read) queries per second on average, and about 10% of
> > this as writes on the master.
> >
> > I've been observing something that looks like between 6 and 10 pending
> > reads per second uniformly on my cacti graphs.
> >
> > The issue is characterized by the server suddenly slowing down writes
> > without any previous warning or change, and lagging behind for several
> > thousand seconds (triggering all sorts of alerts on my monitoring
> > system). I don't observe extra CPU activity, just a reduced disk access
> > rate (from about 5-6MB/s to 500KB/s) and replication lagging. I could
> > correlate it neither with InnoDB hashing activity, nor with long-running
> > queries, nor with background read/write thread activity.
> >
> > I don't have any clue what is causing this behavior, and I'm unable to
> > reproduce it under controlled conditions. I've observed the issue both on
> > servers with and without workload (apart from the usual replication load).
> > I am sure no changes were applied to the server or to the cluster.
> >
> > I'm looking forward to suggestions and theories on the issue - all ideas
> > are welcome.
> > Thank you for your time and attention,
> > Kind regards,
> > --
> > Luis Motta Campos
> > is a DBA, Foodie, and Photographer
> >



--
Claudio