List:Replication« Previous MessageNext Message »
From:Joe Hammerman Date:July 27 2009 11:10pm
Subject:RE: master-master replication broken (conflicts) - what now?
View as plain text  
Here is the documentation I wrote regarding the process for my company. In your case, I
would STOP SLAVE on both servers to insure that you don't affect the primary Master while
you are restoring the secondary.

Do you have the filesystems which host the datadir running on Logical Volumes? If not,
you'll have to run mysqldump, and the outage duration will be increased. Also remember
that when you install all the databases from the primary master to the secondary master,
you won't have the replication user the primary master uses as a user on the secondary
db.... if that makes any sense? I would also insure that you do not have the log slave
updates flag enabled, that you have different server id's configured, and the that
auto-increment-offset values differ between the masters.

Here is a brief overview of the process:

-Lock tables on Master 1
-Record log & log pos from master
-Create an LVM snapshot of the datadir partition
-Transfer the database files to Master 2
-Bring up Master 2 as a slave
-Lock tables on Master 2
-Record log & log pos on Master 2
-Unlock tables on Master 2
-Start Slave replication on Master 1

Connect to the Master 1 MySQL database with an admin account. Run
>FLUSH TABLES WITH READ LOCK;
>SHOW MASTER STATUS;

Record this data!

Exit the MySQL shell. Run
#/usr/sbin/lvmcreate -L16G -s -n dbbackup <path/to/LVM/datadir/device>

Reconnect to the database and run:
>UNLOCK TABLES;
>GRANT REPLICATION SLAVE ON . ON '<repuser>'@'<Master 2>' IDENTIFIED BY
> '<reppass>';

Return to the shell and run

#mkdir /mnt/dbbak
#mount <path/to/snapshot/device> /mnt/dbbak

If you are using an XFS filesystem, you will need to run mount with the -onouuid,ro
options!

Now cd to the dbbak directory.

run
#tar cvzf - ./* | ssh <username>@<Master 2 URL> "cat > /<target
datadir>/dbbak.tgz"

After this completes, turn off the mysql process on master 2. umount the the /mnt/dbbak
directory and run lvremove against the dbbak device. Unpack the tarball, and bring the db
on Master 2 back up. Connect to the db and run

>STOP SLAVE;
>CHANGE MASTER TO MASTER_HOST='<URL of Master 1>', MASTER_USER="<repuser>",
> MASTER_PASSWORD="<reppass>", master_log_file="<value recorded from SHOW MASTER
> STATUS on Master 1>", master_log_pos=<value recorded from SHOW MASTER STATUS on
> Master 1>;
>START SLAVE;
>SHOW SLAVE STATUS\G

You should see Slave_IO_Running=Yes and Slave_SQL_Running=Yes. After Seconds_Behind_Master
reaches zero, we are ready to set up the replication on Master 1. Run the Grant statement
from above, with the appropriate values for the Master 1 replication user. On Master 2
run

>FLUSH TABLES WITH READ LOCK;
>SHOW MASTER STATUS;

Now, on Master 1, we can run

>CHANGE MASTER TO TO MASTER_HOST='<URL of Master 2>',
> MASTER_USER="<repuser>", MASTER_PASSWORD="<reppass>",
> master_log_file="<value recorded from SHOW MASTER STATUS on Master 2>",
> master_log_pos=<value recorded from SHOW MASTER STATUS on Master 2>;
>START SLAVE;
>SHOW SLAVE STATUS\G

Look for the output noted above.

On Master 2

>UNLOCK TABLES;

You should see both slaves 0 seconds behind master, with no errors. This can be tested by
creating and dropping a test database on each master, verifying that the changes are
reflected on the paired machine.

Finally, we need to add the machines to V.I.P., and bring up the pulse daemon to enable
failover.
Run

#/sbin/modprobe bonding

Edit the /etc/sysconfig/network-scripts/ifcfg-eth0 and
/etc/sysconfig/network-scripts/ifcfg-bond0 so that they are similar to the following

ETH0:
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=static

BOND0:
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.11.8.22
NETMASK=255.255.255.0
NETWORK=10.11.8.0

Restart the network daemon, restart the pulse daemon, and verify that the V.I.P. is
functional. A sample lvs.cf file is attached.

You are done!

-----Original Message-----
From: mangoo@stripped [mailto:mangoo@stripped] 
Sent: Monday, July 27, 2009 4:04 PM
To: replication@stripped
Subject: master-master replication broken (conflicts) - what now?

Due to some problems with failover scripts, I have a broken master-master installation:
some entries were inserted on one master, some were inserted
to the other master.

For now, I stopped one master to make the setup at least usable (yes, I know I lost some
inserts this way - but it's OK, they are not important).


I'd like to remove the whole database on the "broken" master (the one which is currently
disabled) and start the replication from scratch.

How should I proceed? I know I better start with backup, but what should I do next?

I wouldn't like to remove (drop) the database on the "broken" master just to find out that
the whole database was cleared on the "good" master as
well...

Could anyone describe the procedure briefly, or point me to the documentation?


-- 
MySQL Replication Mailing List
For list archives: http://lists.mysql.com/replication
To unsubscribe:    http://lists.mysql.com/replication?unsub=1

Thread
master-master replication broken (conflicts) - what now?mangoo28 Jul
  • RE: master-master replication broken (conflicts) - what now?Joe Hammerman28 Jul
    • Re: master-master replication broken (conflicts) - what now?Tomasz Chmielewski28 Jul
      • Re: master-master replication broken (conflicts) - what now?Marcus Bointon28 Jul
Re: master-master replication broken (conflicts) - what now?Marcus Bointon29 Jul