I have run into this myself before as well, where the partition housing
the bin log filled up. In our case, there weren't failed queries on the
master; the integrity of data on the master was fine, and the slave was
simply out of date, but not error-full. However, even in that case (and
it sounds like your case is more extreme than ours was), we had to
rebuild the slaves. Without *any* log of what queries were run on the
master, we were unable to bring the slave up to date.
Unless you have some other log (mysql's general query log, or if your
application logs all queries somewhere independent of mysql), you won't
be able to restore replication. Even if you do, parsing through such a
query log and figuring out what the slaves have/have not run will be
difficult and error-prone.
I don't understand what would cause DELETE queries to fail; that sounds
like it could have caused corruption of your tables on the master
itself, particularly if you are not using a transactional database
engine. Really, I don't know, I've never seen that error myself.
As per recreating the slaves and restarting replication fresh, there are
some good methods of doing that. If you are using the InnoDB storage
engine, then you can easily create a new slave w/o interrupting service
on the master. See http://www.innodb.com/index.php for commercial
software, and see http://dev.mysql.com/doc/mysql/en/mysqldump.html and
http://dev.mysql.com/doc/mysql/en/backup.html for an explanation of how
to use mysqldump utility program to do the same thing if you are running
mysql 4.1.8 or newer.
One final point, I highly recommend creating a monitoring program that
watches disk usage and the status of replication between your master and
slave servers, and removes old binary log files before the disk fills
up. I wrote a perl script that does this simply by executing "FLUSH
MASTER LOGS TO 'filename'", and removes only the oldest binary log file
if the disk is above 80% capacity. That way, this dreadful sort of
problem should never come up again.
Chris Knipe wrote:
> Hi all,
> The moral of the story, is don't run out of disk space, but it's a bit to
> late for that now.
> A quick scenario.... One master server, two backups replicating from the
> master. Our data and bin logs are on two different partitions, and the
> partition holding the bin logs, ran out of disk space. We saw allot of
> errors in the mysql log on the master, stating that DELETE queries failed
> because it was unable to write this to the bin log.
> Question... Why would only DELETE fail? If it cannot write to the bin log
> because it is out of disk space, shouldn't INSERT / UPDATE also fail?
> Now, our slaves are going completely crazy right now. The data is beyond
> inconsistent, and we're desperately trying to figure out a way to restore
> the replication, without having to manually execute a good couple of
> of DELETE queries on two seperate slaves, OR to take new snapshots from the
> master and redo the replication setup. It would SEEM to us that the bin
> has gotten corrupted some time during the lack of disk space.
> Thus, I want to know now...
> - Generally, our slaves are missing ALLOT of DELETE queries, and the slave
> is now failing because it is getting duplicate records.
> - Running the slave with skip-errors untill it is up to date, is not a
> option. We NEED the DELETE queries to execute, because certain rows are
> DELETED and then RE-INSERTED with new values. Yes, I know we should use
> update, I'm just a administrator, not a programmer / developer. This is
> something that the developers needs to take up.
> - *IF* push comes to pull and we need to re-setup the slaves and
> replication, is there a way to take a snapshot from the master, WITHOUT
> having to shut down the database, OR lock the tables for long periods of
> time (We are talking about a DB that executes a good 20 queries per second
> on a slow day).
> - Can replication be 're-started' from the CURRENT bin-log position on the
> master, and if that has been done, can the 'missing' gaps in the two bin
> positions (place of failure and place of current position) be manually /
> semi automatically replicated?
> I hope there is someone with some wise ideas.... I can use allot of them
> right now.