List: General Discussion
From: Christopher E. Brown
Date: October 30 2007 2:44am
Subject: Re: Problem with *very* slow replication
On Thu, 25 Oct 2007, Bob@stripped wrote:

> Not sure that I get the whole picture.
> We have been running replication since about 4.0 and we have been through 
> several upgrades and are now at 5.0.27.
> The 'show slave status' always gives us an accurate reflection of where it is 
> at which is usually 0 seconds behind.
> Occasionally, it falls behind if the master is really busy (>2200 q/s with 
> about 70% being updates/deletes/inserts).
> At those times the slave tops out at about 1200 q/s of which most are db mods 
> of some kind and some selects since we have reports running against the 
> replica and it will fall behind temporarily.
> Can you send show slave status and show master status as well and typical 
> mytop outputs for master and slave?
> That might let me be able to provide more help.
> Bob

Unfortunately I had to tear down replication, as it was causing problems 
with the master.  (The master will not delete binlogs that a slave is 
still loading, so when the slave is 40 files behind, disk space gets 
short.)

CPU load was near zero on both systems (98% idle or better).

Disk load is minimal.

The slave is always up to date with relay file processing and reporting 
zero seconds behind.

In short, everything looks fine.

What happens is that the master -> slave binlog feed runs very slowly (no 
more than about 10 writes/sec).

So, after a few days the slave is still reporting zero seconds behind, and 
it is zero seconds behind the relay log.

The problem is that while the master is currently writing binlog 650, the 
slave is actually zero seconds behind the feed, but the binlog feed has 
fallen 20 - 30 files behind (our binlog rolls at 256M).
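
For illustration, a minimal Python sketch of how that gap could be
estimated from the two status outputs.  It assumes the usual numeric
binlog suffix (the "mysql-bin." file names in the usage line are made up),
that every rolled file is close to the configured maximum size (256M
here), and that the inputs are File/Position from SHOW MASTER STATUS on
the master and Master_Log_File/Read_Master_Log_Pos from SHOW SLAVE STATUS
on the slave.  The function names are hypothetical, not part of any MySQL
tool.

```python
import re

# Assumption: binlogs roll at 256M, as described in the message.
MAX_BINLOG_SIZE = 256 * 1024 * 1024


def binlog_index(name):
    """Return the numeric suffix of a binlog name such as 'mysql-bin.000650'."""
    match = re.search(r"\.(\d+)$", name)
    if match is None:
        raise ValueError(f"not a binlog file name: {name!r}")
    return int(match.group(1))


def feed_backlog(master_file, master_pos, slave_file, slave_pos,
                 max_binlog_size=MAX_BINLOG_SIZE):
    """Estimate how far the slave I/O thread is behind the master.

    master_file, master_pos: File and Position from SHOW MASTER STATUS.
    slave_file, slave_pos: Master_Log_File and Read_Master_Log_Pos from
    SHOW SLAVE STATUS.
    Returns (files_behind, approximate_bytes_behind).  The byte count is
    only an estimate because it treats every rolled file as full size.
    """
    files_behind = binlog_index(master_file) - binlog_index(slave_file)
    bytes_behind = files_behind * max_binlog_size + master_pos - slave_pos
    return files_behind, bytes_behind


# Hypothetical values: master on file 650, feed still reading file 625.
files, approx_bytes = feed_backlog("mysql-bin.000650", 1000,
                                   "mysql-bin.000625", 500)
print(files, "files,", approx_bytes // (1024 * 1024), "MB behind (approx)")
```

Seconds_Behind_Master only compares the slave SQL thread with the slave
I/O thread, so it stays at zero in this situation; comparing the file
names and positions is what exposes the lag in the feed itself.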

Since there is no load issue, I expect there is a timing or trigger issue 
with the master side proc doing the binlog dump, or the slave side 
receiving it.

I can stop/start replication and/or reload both servers, and the problem 
still holds.

I see the replication restart, with the slave running zero seconds behind 
the relay log, the binlog feed starts up right where it left off but the 
feed only runs at about 10 writes a second.

Are you running native threads or LinuxThreads?  This smells like a 
threading issue to me (we are running FreeBSD 6.2 with native threading 
and 5.1.19).

The exact same setup was pre-built on Linux systems (2.6.x Slackware) 
before being built out on the production systems (FreeBSD 6.2).

During the testing 1000 writes/sec were no problem (small/simple table, 
fits in memory).  When I forced a backlog of approx 2GB by shutting down 
the slave, on restart the binlog -> relay log feed ran at over 25MB/sec 
until caught up.
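
As a rough check on those test figures, a small Python sketch of the
catch-up arithmetic.  The function name and the optional
master_write_rate parameter are hypothetical; the only inputs taken from
the test above are the approx 2GB backlog and the 25MB/sec feed rate.

```python
def catch_up_seconds(backlog_bytes, feed_rate, master_write_rate=0.0):
    """Seconds needed for the binlog feed to drain a backlog.

    backlog_bytes: bytes of binlog the slave has not yet fetched.
    feed_rate: bytes/sec the binlog -> relay log feed can move.
    master_write_rate: bytes/sec of new binlog the master keeps producing
    (hypothetical parameter; 0 means the master is idle).
    Returns None when the feed cannot outpace the master, i.e. the
    backlog never shrinks.
    """
    net_rate = feed_rate - master_write_rate
    if net_rate <= 0:
        return None
    return backlog_bytes / net_rate


MB = 1024 * 1024
GB = 1024 * MB

# 2GB backlog at 25MB/sec with an idle master: well under two minutes.
print(catch_up_seconds(2 * GB, 25 * MB))
```

A feed that moves data more slowly than the master produces it gives
None here, which matches the behaviour seen on the FreeBSD systems: the
backlog only grows.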