List:Replication« Previous MessageNext Message »
From:Jesse Date:July 22 2008 4:13pm
Subject:Re: Best way to recover from Replication Failure
View as plain text  
> About your problem: are you sure that your data dump was actually in
> sync with the master log position you used to initiate the
> replication? Are you sure that you don't have an application that
> accidentally writes to the slave?

I don't know what caused the problem to begin with.  I was too concerned 
about getting replication restarted, so I simply did a dump and restore on 
the slave (which I did improperly, but I've got it working now).  I've got 
it all re-set and running again, but next time this occurs, I'm going to 
take it much more slowly.  The perticular master/slave setup I was using IS 
a production web site/server, but it's not one that's too busy (and I do 
daily backups).  It could go for a day or more without any transactions this 
time of year, so I'm about 99.99% sure that eveything was in sync.  I did do 
a FLUSH TABLES WITH READ LOCK before doing the dump, and I kept the client 
open the whole time to ensure that everything was locked while I did the 
dump and restore on the slave.  My assumption is that the "READ LOCK" keeps 
all changes locked down until I either close the client or perform a UNLOCK 
TABLES, is that correct?

I'm 100% positive that I don't have a client writing to the slave.  The 
slave database isn't being used for anything at all.  The only transaction 
that occur on that machine are when I log on to check the slave status.  In 
getting this set up as well, I did a few GRANT commands, but that is really 
it.  I definitely didn't do anything with the database that was affected.

> As a solution, you could try resetting the slave and starting the
> replication anew with either your previous dump and master position or
> with a new one or alternatively you could try resetting both the
> master and slave. I've performed each of those and they all worked the
> way I intended, although I wouldn't recommend completely resetting the
> master if this is a production system or if you have multiple slaves.

I hadn't considered that. It hadn't really been that long since I initiated 
replication, and I still had the original dump file, but I think I didn't 
save the position.  I would need to keep track of the log position at the 
time I initiated that dump, I suppose.  I'll start logging that so next time 
it happens, I'll try that. That way I can simply work with the slave only 
without disturbing the master.  I did the complete reset on the master/slave 
the other day, and everything seems to be fine.

> Here's a step-by-step guide to resetting the slave and starting the
> replication with a new dump and master position:
> - On the slave, run STOP SLAVE to stop all slave activity, then run
> RESET SLAVE to make the slave forget its replication position and
> delete its relay log amongst others.
> - Make sure that the collation-server and default-collation
> configuration options are the same on both the master and the slave.

I don't know what these settings are.  I've checked My.INI and my.cnf  for 
"collation", and those are nowhere to be found??

> - On the master, run FLUSH TABLES WITH READ LOCK and leave that
> session open. If you close it, the tables will be unlocked and your
> data dump and log position will be out of sync.
> - Now, while having that session still opened, make a dump of the data
> you want to replicate using mysqldump or whatever tool fits your
> purpose.
> - When it's finished, run SHOW MASTER STATUS, save the output
> somewhere, free the read lock with UNLOCK TABLES and just to be safe
> close the session.
> - Import your dump on the slave and make sure that the data imported
> successfully. Then issue a full CHANGE MASTER TO statement using the
> values from the statement above.
>     MASTER_HOST='master_host_name',
>     MASTER_USER='replication_user_name',
>     MASTER_PASSWORD='replication_password',
>     MASTER_LOG_FILE='recorded_log_file_name',
>     MASTER_LOG_POS=recorded_log_position;
> I've experienced issues when using only MASTER_LOG_POS and
> MASTER_LOG_FILE, so it's probably a good idea to always include
> hostname, user and password when you want to make sure that the master
> data is changed.

That's good to know, as I usually only do MASTER_LOG_FILE and 
MASTER_LOG_POS.  I'll start using the whole thing.

> - Start the slave with START SLAVE, then run SHOW SLAVE STATUS to see
> whether everything worked as expected.
> In case you really want to reset the master too you can run RESET
> MASTER before resetting the slave. You might also want to read section
> 12.6 in the manual before doing that.
> Hope that helps,

It does, you mentioned several things I didn't know.  I appreciate the help.


Best way to recover from Replication FailureJesse22 Jul
  • Re: Best way to recover from Replication FailureMichael Bunzel22 Jul
  • Re: Best way to recover from Replication FailureJesse22 Jul
    • Re: Best way to recover from Replication FailureMarcus Bointon22 Jul
    • Re: Best way to recover from Replication FailureMichael Bunzel22 Jul
  • Re: Best way to recover from Replication FailureJesse23 Jul