List: General Discussion
From: Luis Motta Campos
Date: October 23 2011 9:40am
Subject: 5.1.51 Database Replica Slows Down Suddenly, Lags For Days, and Recovers Without Intervention

Fellow DBAs and MySQL Users

[apologies for any duplicates - I've also posted this to
percona-discussion@stripped]

I've been hunting an issue with my database cluster for several months now without much
success. Maybe I'm overlooking something here.

I've been observing the database slowing down and lagging behind for thousands of seconds
(sometimes over the course of several days) even without any query load besides
replication itself.

I am running Percona MySQL 5.1.51 (InnoDB plug-in version 1.12) on Dell R710 (6 x 3.5 inch
15K RPM disks in RAID10; 24GB RAM; 2x Quad-core Intel processors) running Debian Lenny.
MySQL data, binary logs, relay logs, and InnoDB log files are each on their own partition,
on a RAID array separate from the operating system disks.

Default Storage Engine is InnoDB, and the usual InnoDB memory structures are stable and
look healthy.

I have about 500 (read) queries per second on average, and about 10% of this as writes on
the master.

My Cacti graphs show what looks like a steady 6 to 10 pending reads per second.

The issue is characterized by the server suddenly slowing down writes without any previous
warning or change, and lagging behind for several thousand seconds (triggering all sorts
of alerts on my monitoring system). I don't observe extra CPU activity, just a reduced
disk access rate (from about 5-6MB/s to 500KB/s) and replication lagging. I could
correlate it neither with InnoDB hashing activity, nor with long-running queries, nor with
background read/write thread activity.
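
A minimal sketch of how the episodes described above could be picked out of
monitoring samples, so that lag can be lined up against disk throughput. The
Sample record, the 1000-second threshold, and the function names are
illustrative assumptions and are not taken from the setup described here; only
the Seconds_Behind_Master field of SHOW SLAVE STATUS is a real MySQL name.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    """One monitoring sample (hypothetical record layout)."""
    ts: int                      # Unix timestamp of the sample
    lag_seconds: Optional[int]   # Seconds_Behind_Master, None if unknown
    disk_kb_per_s: float         # disk throughput at that moment


def parse_seconds_behind_master(status_text: str) -> Optional[int]:
    # Pull Seconds_Behind_Master out of the vertical output of
    # SHOW SLAVE STATUS; MySQL prints NULL when the value is unknown.
    for line in status_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip() == "Seconds_Behind_Master":
            value = value.strip()
            return None if value == "NULL" else int(value)
    return None


def _summarize(run: List[Sample]) -> Tuple[int, int, float]:
    # Start time, end time, and mean disk throughput of one lag episode.
    mean_kb = sum(s.disk_kb_per_s for s in run) / len(run)
    return (run[0].ts, run[-1].ts, mean_kb)


def find_lag_episodes(samples: List[Sample],
                      lag_threshold: int = 1000) -> List[Tuple[int, int, float]]:
    # Group consecutive samples whose lag exceeds the threshold into episodes.
    episodes = []
    run: List[Sample] = []
    for s in samples:
        if s.lag_seconds is not None and s.lag_seconds > lag_threshold:
            run.append(s)
        elif run:
            episodes.append(_summarize(run))
            run = []
    if run:
        episodes.append(_summarize(run))
    return episodes
```

Comparing the mean throughput inside each episode with the throughput outside
it is one way to check whether the drop from 5-6MB/s to 500KB/s coincides with
every lag event or only with some of them.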

I don't have any clues of what is causing this behavior, and I'm unable to reproduce it
under controlled conditions. I've observed the issue both on servers with and without
workload (apart from the usual replication load). I am sure no changes were applied to
the server or to the cluster.

I'm looking forward to suggestions and theories on the issue - all ideas are welcome.
Thank you for your time and attention,
Kind regards,
--
Luis Motta Campos
is a DBA, Foodie, and Photographer
