The moral of the story is: don't run out of disk space. But it's a bit too
late for that now.
A quick scenario: one master server, with two backup slaves replicating from
it. Our data and bin logs are on two different partitions, and the
partition holding the bin logs ran out of disk space. We saw a lot of
errors in the MySQL log on the master, stating that DELETE queries failed
because the server was unable to write them to the bin log.
Question: why would only DELETE fail? If the server cannot write to the bin
log because it is out of disk space, shouldn't INSERT and UPDATE fail as well?
Our slaves are now going completely crazy. The data is beyond
inconsistent, and we're desperately trying to find a way to restore
replication without having to manually execute a couple of million
DELETE queries on two separate slaves, OR to take new snapshots from the
master and redo the replication setup. It would SEEM to us that the bin log
got corrupted at some point while the disk was full.
So, here is where we stand and what I want to know:
- In general, our slaves are missing A LOT of DELETE queries, and the slaves
are now failing on duplicate records.
- Running the slaves with slave-skip-errors until they are up to date is not
an option. We NEED the DELETE queries to execute, because certain rows are
DELETED and then RE-INSERTED with new values. Yes, I know we should use
UPDATE; I'm just an administrator, not a programmer / developer. This is
something the developers need to take up.
- *IF* push comes to shove and we need to set up the slaves and
replication again, is there a way to take a snapshot from the master WITHOUT
having to shut down the database OR lock the tables for long periods of
time? (We are talking about a DB that executes a good 20 queries per second
on a slow day.)
- Can replication be 're-started' from the CURRENT bin log position on the
master? And if so, can the 'missing' gap between the two bin log positions
(the point of failure and the current position) be replicated manually or
semi-automatically?
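To make that last question concrete, the semi-automatic replay I have in mind would be something like this rough Python sketch. It only builds a `mysqlbinlog` command covering the events between the failure position and the position replication was restarted from, and pipes the output into the `mysql` client on a slave. It assumes mysqlbinlog applies `--start-position` to the first file named and `--stop-position` to the last one, so the gap can span several bin logs. The file names, positions and host name (`master-bin.000042`, `slave1`, etc.) are hypothetical placeholders; the real values would have to come from the master's log and `SHOW MASTER STATUS`.

```python
import shlex
from typing import List


def binlog_gap_command(binlog_files: List[str], start_pos: int, stop_pos: int) -> List[str]:
    """Build a mysqlbinlog command that dumps the events between two positions.

    --start-position applies to the first file listed and --stop-position
    to the last one, so the gap may span several bin log files.
    """
    if not binlog_files:
        raise ValueError("need at least one bin log file")
    # The first event in a bin log sits at byte offset 4 (after the magic header).
    if start_pos < 4:
        raise ValueError("start position must be >= 4")
    if len(binlog_files) == 1 and stop_pos <= start_pos:
        raise ValueError("stop position must be after start position")
    return [
        "mysqlbinlog",
        f"--start-position={start_pos}",
        f"--stop-position={stop_pos}",
        *binlog_files,
    ]


def replay_pipeline(cmd: List[str], slave_host: str) -> str:
    """Return a shell pipeline that feeds the dumped SQL into mysql on a slave."""
    return f"{shlex.join(cmd)} | mysql -h {shlex.quote(slave_host)}"


if __name__ == "__main__":
    # Hypothetical files, positions and host: substitute the real ones.
    cmd = binlog_gap_command(["master-bin.000042", "master-bin.000043"], 1234, 5678)
    print(replay_pipeline(cmd, "slave1"))
```

It deliberately just prints the pipeline so it can be checked before anything touches a slave. And of course it can only replay events that actually made it into the bin log; whatever was never written there (or sits in a corrupted stretch) can't be recovered this way.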
I hope someone out there has some wise ideas. I could use a lot of them!