2013/2/3 Larry Martell <larry.martell@stripped>
> We also ended up dropping the database and restoring from dumps.
> However all recent dumps ended up having a similar corruption and we
> were still getting the same errors. We had to go back to an October
> dump before it would come up cleanly. And our db is fairly large, and
> it takes around 4 hours to load a dump. We were working on this Friday
> from 8:30 AM until 4AM Saturday before we got it going. And now we're
> trying to recall all the alters we did since then, and reload all the
> data since then, most of which is in files we can import. The only
> thing we don't have is the manual updates done. All in all a total
> disaster and something that will make us rethink our procedures.
> Perhaps we'll look at replication, although I don't know if that would
> have helped in this case.
I am sorry to read this. I hope you guys recovered everything already.
I would like to suggest something though.
From my point of view, it is always good to back up just the schemas
(without data) in addition to the regular data backups; that is to say,
do both. If something like this happens, you can always diff the schema
dumps and get the schemas recovered in a matter of minutes.
Generally, schema-only dumps are pretty light and won't use any
significant disk space.
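
To make the schema-only idea concrete, here is a minimal sketch in Python.
It assumes the dumps are taken with mysqldump --no-data and that credentials
come from an option file; the function names and the "backup" user are just
placeholders of mine, not anything from your setup.

```python
import difflib


def schema_dump_command(database, user="backup"):
    """Build a mysqldump argv list that dumps definitions only, no rows.

    The user name is a hypothetical placeholder; the password is assumed
    to come from an option file such as ~/.my.cnf.
    """
    return [
        "mysqldump",
        "--no-data",    # table definitions only, no row data
        "--routines",   # also dump stored procedures and functions
        "--user=" + user,
        database,
    ]


def schema_diff(old_sql, new_sql):
    """Return unified-diff lines between two schema dumps given as text."""
    return list(
        difflib.unified_diff(
            old_sql.splitlines(),
            new_sql.splitlines(),
            fromfile="old_schema.sql",
            tofile="new_schema.sql",
            lineterm="",
        )
    )
```

The command list can be handed to subprocess.run with stdout redirected to a
dated file. Running schema_diff on an old and a recent dump then shows which
columns or tables were added in between, which is most of what you need to
rebuild the missing ALTERs.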
About the replication solution: I would strongly recommend using it if
possible in your scenario.
Clearly, replication by itself won't prevent data loss caused by a bad
statement (an UPDATE without a WHERE clause, a DELETE FROM without one,
etc.), because the statement is replicated too. However, if you're
thinking of having a dedicated slave for backups, you might want to use
pt-slave-delay (
http://www.percona.com/doc/percona-toolkit/2.1/pt-slave-delay.html) so
that your slave stays XX minutes/hours behind the master, which gives you
time to stop it before bad statements such as the ones I described are
applied.
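
As a rough illustration of the delayed slave, the sketch below only builds
a pt-slave-delay command line in Python. The host name is hypothetical, and
I am assuming the tool's --delay and --interval options and the h=HOST DSN
form as described in the linked documentation; please check them against
the version you install.

```python
import re

# Accepts values such as "30", "45s", "15m", "2h" or "1d".
_TIME_VALUE = re.compile(r"^\d+[smhd]?$")


def slave_delay_command(slave_host, delay="1h", interval="1m"):
    """Build an argv list that keeps a slave `delay` behind its master.

    slave_host is a hypothetical host name; interval is how often the
    tool checks whether to stop or start the slave's SQL thread.
    """
    for name, value in (("delay", delay), ("interval", interval)):
        if not _TIME_VALUE.match(value):
            raise ValueError("bad %s value: %r" % (name, value))
    return [
        "pt-slave-delay",
        "--delay", delay,
        "--interval", interval,
        "h=" + slave_host,
    ]
```

For example, slave_delay_command("backup-slave", delay="2h") gives a command
you could pass to subprocess.run or start from an init script. A longer delay
buys more time to react to a bad statement, but it also means more
catching up if you ever need to promote that slave.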
Anyway, as I was saying, if it's possible to have a server acting just as
a backup slave, that would help you recover faster from corruption
caused by HW problems. It would be a matter of promoting it to master,
which generally takes minutes.
Hope you guys fixed everything already!