From: Luis Motta Campos
Date: October 24 2011 6:10am
Subject: Re: 5.1.51 Database Replica Slows Down Suddenly, Lags For Days, and Recovers Without Intervention
List-Archive: http://lists.mysql.com/mysql/226161
Message-Id: <32C7C785-A054-4B8B-8569-388C1E65B27E@yahoo.co.uk>

Claudio,

Thank you for your interest.

I will wait for the issue to happen again and see what information I
can get with strace. This is indeed something I hadn't thought of
trying yet.

I'll keep you all posted on this.

Many thanks for the new approaches and fresh ideas.

Kind regards,
--
Luis Motta Campos

On 23 Oct 2011, at 23:27, Claudio Nanni wrote:

> Luis,
>
> Very hard to tackle.
> In my experience, excluding bottlenecks external to MySQL (hardware,
> OS, etc.), the 'suspects' are the shared resources 'guarded' by single
> mutexes, like the query cache or the key cache.
> Since you do not use MyISAM, it cannot be the key cache. Since you use
> Percona, the query cache is disabled by default.
> You should go a bit lower level and capture the system calls with one
> of the tools you surely know, to see if there are waits on semaphores.
>
> I would also like to point out that the 'seconds behind master'
> reported by the slave is not reliable.
>
> Good luck!
>
> Claudio
>
> 2011/10/23 Tyler Poland
>
>> Luis,
>>
>> How large is your database? Have you checked for an increase in write
>> activity on the master leading up to this? Are you running a backup
>> against the replica?
>>
>> Thank you,
>> Tyler
>>
>> Sent from my Droid Bionic
>> On Oct 23, 2011 5:40 AM, "Luis Motta Campos" wrote:
>>
>>> Fellow DBAs and MySQL Users,
>>>
>>> [apologies for any duplicates - I've posted this to
>>> percona-discussion@stripped also]
>>>
>>> I've been hunting an issue with my database cluster for several
>>> months now without much success. Maybe I'm overlooking something here.
>>>
>>> I've been observing the database slowing down and lagging behind by
>>> thousands of seconds (sometimes over the course of several days) even
>>> without any query load besides replication itself.
>>>
>>> I am running Percona MySQL 5.1.51 (InnoDB plug-in version 1.12) on
>>> Dell R710 servers (6 x 3.5 inch 15K RPM disks in RAID10; 24GB RAM;
>>> 2x quad-core Intel processors) running Debian Lenny. MySQL data,
>>> binary logs, relay logs, and InnoDB log files are on separate
>>> partitions, on a RAID array separate from the operating system disks.
>>>
>>> The default storage engine is InnoDB, and the usual InnoDB memory
>>> structures are stable and look healthy.
>>>
>>> I have about 500 (read) queries per second on average, and about 10%
>>> of that as writes on the master.
>>>
>>> My Cacti graphs show what looks like a uniform 6 to 10 pending reads
>>> per second.
>>>
>>> The issue is characterized by the server suddenly slowing down writes
>>> without any previous warning or change, and lagging behind by several
>>> thousand seconds (triggering all sorts of alerts on my monitoring
>>> system). I don't observe extra CPU activity, just a reduced disk
>>> access rate (from about 5-6MB/s to 500KB/s) and replication lag. I
>>> could not correlate it with InnoDB hashing activity, long-running
>>> queries, or background read/write thread activity.
>>>
>>> I don't have any clue what is causing this behavior, and I'm unable
>>> to reproduce it under controlled conditions. I've observed the issue
>>> both on servers with and without workload (apart from the usual
>>> replication load). I am sure no changes were applied to the server or
>>> to the cluster.
>>>
>>> I'm looking forward to suggestions and theories on the issue - all
>>> ideas are welcome.
>>> Thank you for your time and attention,
>>> Kind regards,
>>> --
>>> Luis Motta Campos
>>> is a DBA, Foodie, and Photographer
>>>
>>> --
>>> MySQL General Mailing List
>>> For list archives: http://lists.mysql.com/mysql
>>> To unsubscribe:
>>> http://lists.mysql.com/mysql?unsub=tpoland@stripped
>>
>
> --
> Claudio
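Claudio's suggestion to capture mysqld's system calls and look for waits on semaphores could be followed up roughly as sketched here. This is a minimal sketch, not a tool from the thread: it assumes strace was attached with something like `strace -f -T -p "$(pidof mysqld)" -o trace.txt` (no `-t`/`-tt` timestamps), so each completed call ends with its elapsed time in `<...>` and each line may start with a thread id. The script name and the `summarize`/`report` helpers are hypothetical. On Linux, contended pthread mutex and condition-variable waits generally show up as `futex` calls, so a large share of time in `futex` during a lag episode would point toward the mutex contention Claudio describes.

```python
import re
import sys
from collections import defaultdict

# Assumed strace -f -T line shapes (no timestamps):
#   [pid 4242] futex(0x7f00, FUTEX_WAIT_PRIVATE, 2, NULL) = 0 <0.500000>
#   4242 futex(...) = 0 <0.500000>            (thread id prefix when using -o)
#   [pid 4242] <... futex resumed> ) = 0 <0.500000>
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"                             # optional thread id
    r"(?:<\.\.\.\s+(?P<resumed>\w+)\s+resumed>|(?P<name>\w+)\()"  # syscall name
    r".*<(?P<secs>\d+\.\d+)>\s*$"                                  # elapsed seconds
)


def summarize(lines):
    """Return {syscall: (calls, total_seconds)} from strace -f -T output."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            # "<unfinished ...>" halves, signals and exit notices carry no
            # elapsed time; an interrupted call is counted on its resumed line.
            continue
        name = m.group("resumed") or m.group("name")
        totals[name][0] += 1
        totals[name][1] += float(m.group("secs"))
    return {name: (calls, secs) for name, (calls, secs) in totals.items()}


def report(path):
    """Print syscalls sorted by total time spent in them."""
    with open(path) as f:
        stats = summarize(f)
    for name, (calls, secs) in sorted(
        stats.items(), key=lambda kv: kv[1][1], reverse=True
    ):
        print(f"{name:<16}{calls:>10}{secs:>14.6f}")


if __name__ == "__main__":
    # Usage (hypothetical script name): python strace_summary.py trace.txt
    for path in sys.argv[1:]:
        report(path)
```

Because `-f` traces every mysqld thread, the summed times can far exceed the wall-clock length of the capture; compare a capture taken during a lag episode with one taken while the replica is healthy rather than reading the absolute numbers. strace also slows the traced process noticeably, so short captures are safer on a production replica.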