List:General Discussion« Previous MessageNext Message »
From:Baron Schwartz Date:May 9 2007 12:38pm
Subject:Re: duplicating a replicated slave environment
View as plain text  
Hi Hank,

(CCing the list again so others can see...)

Hank wrote:
> Thanks for your reply.  Two things -- I start/stop the sql-thread once 
> daily
> as a backup strategy.  This slave has no application readers or 
> writers.  If
> anything should go wrong with the master, I have one full day to either
> correct the problem, or restore the master from the slave in case of
> disaster.  I have another server that does the same thing except on a
> two-day schedule, so in case something goes wrong with the master and the
> first slave, and I don't get to it in time, I also have a snapshot from two
> days ago. (these servers do other things, and aren't dedicated mysql rep
> slaves).

Makes sense to me.

> Second, in your suggestion you say to stop the sql-thread -- I'm assuming
> you mean stop the IO thread and capture the master log file position --
> since that's the only time I'll be able to get the master bin log position
> AND file in the 'show slave status' command on Slave A.

There are actually three file/position markers here:

             Master_Log_File: usa-bin.000267
         Read_Master_Log_Pos: 850675858
              Relay_Log_File: nepal-relay-bin.000117
               Relay_Log_Pos: 850675993
       Relay_Master_Log_File: usa-bin.000267
         Exec_Master_Log_Pos: 850675858

Relay_Master_Log_File and Exec_Master_Log_Pos are the slave's coordinates relative to 
the master.  This shows you the position, on the master, to which the slave has 
replayed changes.  The I/O thread is independent of this as you know, so all you need 
to do is make sure these files haven't been deleted from the master, and you can start 
the second slave from here.  It will re-fetch logs from the master as needed.

These values always confuse me -- I always have to refer to the manual page: 
http://dev.mysql.com/doc/refman/5.0/en/show-slave-status.html.  The most relevant parts 
for you, I think, are

"Relay_Master_Log_File

The name of the master binary log file containing the most recent event executed by the 
SQL thread.

Exec_Master_Log_Pos

The position of the last event executed by the SQL thread from the master's binary log 
(Relay_Master_Log_File). (Relay_Master_Log_File, Exec_Master_Log_Pos) in the master's 
binary log corresponds to (Relay_Log_File, Relay_Log_Pos) in the relay log."

> What I think I'll end up doing is stopping both IO+SQL threads when I know
> Slave A is complete (up to date) with the master, then capture the master
> position, clone the database files, and restart both slaves.  This way, I
> won't have to mess with the relay logs on Slave A.

This will work, too.  I only wrote the above in case you need to start another slave 
from a point when the first slave isn't caught up in the SQL thread.  Maybe it's too 
many options, and I'm just confusing you ;-)  I bet a good diagram would help.  Maybe 
I'll make one and submit it for the manual.

> 
> And I'll try out your checksum tool.  I've been wanting something like that
> -- I've pretty much relied on record counts between my master and slaves to
> check consistency, and I know that can be misleading.

Cool.  I hope it helps.

Baron

> 
> thanks,
> -Hank
> 
> On 5/8/07, Baron Schwartz <baron@stripped> wrote:
>>
>> Hi,
>>
>> Hank wrote:
>> > Hello All,
>> >
>> > I have a 4.1.14 mysql database master and slave set up.
>> >
>> > For this slave #1, I have the IO thread running constantly, and the SQL
>> > thread running once a day to update all pending updates from the master
>> > (then I shut it off).  So for most of the day, this database is static
>> > (except for the collecting relay logs).
>> >
>> > I have a new machine to be another slave of the same master (slave #2).
>> >
>> > I can not shut down or lock the master in order to copy the master
>> database
>> > to the slave #2 (it is 44GB total, and would take over an hour to 
>> copy).
>> >
>> > I have copied the (static) database from Slave #1 to Slave #2.  How can
>> I
>> > now configure Slave #2 to process the pending relay-logs and bring 
>> it up
>> to
>> > date?
>> >
>> > Obviously I would need to copy (and rename?) the relay logs, but what
>> about
>> > the master.info and relay-log.info files?
>> >
>> > Or in other words, can I use the show slave status information on Slave
>> #1
>> > to setup Slave #2 in the CHANGE MASTER TO command?
>>
>> This shouldn't be too hard to do.  I'm curious why you don't leave the 
>> SQL
>> thread running, but I guess that's off-topic.  Anyway, what you need 
>> to do
>> is
>> stop Slave A's SQL thread, look at Slave A's status, clone B from A, and
>> then
>> start Slave B from the master *at Slave A's Exec_Master file and
>> position*.
>>
>> Slave B should then ask the master for whatever binlogs it needs,
>> beginning at
>> the point corresponding to where it was cloned from.  If you want, you 
>> can
>> copy
>> the binlogs from Slave A over to it, but this is probably trickier. Not
>> that
>> there's anything wrong with doing this, but there might be more ways to
>> make a
>> mistake by looking at the wrong parameter, etc.
>>
>> For the future, if you are running on Linux, one of my favorite things to
>> do is
>> put the MySQL data, temp files, and logs on LVM.  This way you can 
>> take an
>> instantaneous snapshot of the data and copy it at your leisure.  You 
>> don't
>> have
>> to be down for an hour.  For some storage engines, you don't even have to
>> shut
>> MySQL down; you can just do FLUSH TABLES WITH READ LOCK.  Maybe that's
>> still not
>> possible for you, though.  Just a suggestion.
>>
>> There is always some danger in cloning a slave from another slave.  What
>> if
>> Slave A has somehow gotten different data from the master?  If you are
>> curious
>> whether Slave A and B are really in sync with the master,
>> mysql-table-checksum
>> (a tool I wrote) might help you:
>> http://sourceforge.net/projects/mysqltoolkit/
>> It's probably easiest for you to use the --replicate option to get a
>> consistent
>> checksum.  (If you try it, let me know if that works well for you).
>>
>> Cheers
>> Baron
>>
> 

-- 
Baron Schwartz
http://www.xaprb.com/
Thread
duplicating a replicated slave environmentHank8 May
  • Re: duplicating a replicated slave environmentBaron Schwartz9 May
Re: duplicating a replicated slave environmentBaron Schwartz9 May