List:General Discussion« Previous MessageNext Message »
From:Devananda Date:July 20 2005 7:45pm
Subject:Re: database problems.
View as plain text  
Hi Chris,

I have run into this myself before as well, where the partition housing 
the bin log filled up. In our case, there weren't failed queries on the 
master; the integrity of data on the master was fine, and the slave was 
simply out of date, but not error-full. However, even in that case (and 
it sounds like your case is more extreme than ours was), we had to 
rebuild the slaves. Without *any* log of what queries were run on the 
master, we were unable to bring the slave up to date.

Unless you have some other log (mysql's general query log, or if your 
application logs all queries somewhere independent of mysql), you won't 
be able to restore replication. Even if you do, parsing through such a 
query log and figuring out what the slaves have/have not run will be 
difficult and error-prone.

I don't understand what would cause DELETE queries to fail; that sounds 
like it could have caused corruption of your tables on the master 
itself, particularly if you are not using a transactional database 
engine. Really, I don't know, I've never seen that error myself.

As per recreating the slaves and restarting replication fresh, there are 
some good methods of doing that. If you are using the InnoDB storage 
engine, then you can easily create a new slave w/o interrupting service 
on the master. See http://www.innodb.com/index.php for commercial 
software, and see http://dev.mysql.com/doc/mysql/en/mysqldump.html and 
http://dev.mysql.com/doc/mysql/en/backup.html for an explanation of how 
to use mysqldump utility program to do the same thing if you are running 
mysql 4.1.8 or newer.

One final point, I highly recommend creating a monitoring program that 
watches disk usage and the status of replication between your master and 
slave servers, and removes old binary log files before the disk fills 
up. I wrote a perl script that does this simply by executing "FLUSH 
MASTER LOGS TO 'filename'", and removes only the oldest binary log file 
if the disk is above 80% capacity. That way, this dreadful sort of 
problem should never come up again.


Good luck!
Devananda vdv



Chris Knipe wrote:

> Hi all,
> 
> The moral of the story, is don't run out of disk space, but it's a bit to
> late for that now.
> 
> A quick scenario.... One master server, two backups replicating from the
> master. Our data and bin logs are on two different partitions, and the
> partition holding the bin logs, ran out of disk space.  We saw allot of
> errors in the mysql log on the master, stating that DELETE queries failed
> because it was unable to write this to the bin log.
> 
> Question... Why would only DELETE fail?  If it cannot write to the bin log
> because it is out of disk space, shouldn't INSERT / UPDATE also fail?
> 
> Now, our slaves are going completely crazy right now.  The data is beyond
> inconsistent, and we're desperately trying to figure out a way to restore
> the replication, without having to manually execute a good couple of 
> million
> of DELETE queries on two seperate slaves, OR to take new snapshots from the
> master and redo the replication setup.  It would SEEM to us that the bin 
> log
> has gotten corrupted some time during the lack of disk space.
> 
> Thus, I want to know now...
> - Generally, our slaves are missing ALLOT of DELETE queries, and the slave
> is now failing because it is getting duplicate records.
> - Running the slave with skip-errors untill it is up to date, is not a
> option.  We NEED the DELETE queries to execute, because certain rows are
> DELETED and then RE-INSERTED with new values.  Yes, I know we should use
> update, I'm just a administrator, not a programmer / developer.  This is
> something that the developers needs to take up.
> - *IF* push comes to pull and we need to re-setup the slaves and
> replication, is there a way to take a snapshot from the master, WITHOUT
> having to shut down the database, OR lock the tables for long periods of
> time (We are talking about a DB that executes a good 20 queries per second
> on a slow day).
> - Can replication be 're-started' from the CURRENT bin-log position on the
> master, and if that has been done, can the 'missing' gaps in the two bin 
> log
> positions (place of failure and place of current position) be manually /
> semi automatically replicated?
> 
> I hope there is someone with some wise ideas.... I can use allot of them
> right now.
> 
> Thanks,
> Chris.
> 
> 
Thread
database problems.Chris Knipe20 Jul
  • Re: database problems.Devananda20 Jul