List:General Discussion« Previous MessageNext Message »
From:Shawn Green (MySQL) Date:August 27 2010 1:01am
Subject:Re: master-slave replication sync problems.
View as plain text  
Hello List,

On 8/26/2010 3:00 PM, Daevid Vincent wrote:
> ssh to the slave
> mysql -uroot -pPASSWORD -P3306 -hlocalhost
> 
> show slave status\G
> 
> If the Slave IO is NOT Running, but SQL is, then simply try to restart the
> slave...
> 
> *************************** 1. row ***************************
>              Slave_IO_State:
>                 Master_Host: 10.10.10.45
>                 Master_User: slave
>                 Master_Port: 3306
>                           ...
>            Slave_IO_Running: No
>           Slave_SQL_Running: Yes
> 
> start slave; show slave status\G
> 
> Otherwise if it's a replication issue such as this, then you must skip over
> the bad SQL queries one at a time till the slave syncs.
> 
> *************************** 1. row ***************************
>              Slave_IO_State: Waiting for master to send event
>                 Master_Host: 10.10.10.41
>                 Master_User: slave
>                 Master_Port: 3306
>                           ...
>            Slave_IO_Running: Yes
>           Slave_SQL_Running: No
>             Replicate_Do_DB: agis_core_2008
>         Replicate_Ignore_DB:
>          Replicate_Do_Table:
>      Replicate_Ignore_Table:
>     Replicate_Wild_Do_Table:
> Replicate_Wild_Ignore_Table:
>                  Last_Errno: 1061
>                  Last_Error: Error 'Duplicate key name 'id_operator'' on
> query. ....
> 
> The SQL statement will give you an idea of where the master and slave went
> askew. If these are recent commands you did you can guess as to how much to
> increment the SKIP_COUNTER below, otherwise, you just have to do it one at
> a time until they sync again.
> 
> mysql> stop slave; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; start slave; show
> slave status\G
> 
> Repeat the above statements over and over until you see two YES rows.
> 
> *************************** 1. row ***************************
> 
>            Slave_IO_Running: Yes
>           Slave_SQL_Running: Yes
> 
>> -----Original Message-----
>> From: Norman Khine [mailto:norman@stripped] 
>> Sent: Thursday, August 26, 2010 6:05 AM
>> To: mysql@stripped
>> Subject: master-slave replication sync problems.
>>
>> hello,
>> i have a working master-slave replication, the problem i find is that
>> if i restart the MASTER there is a difference in the MASTER_LOG_FILE
>> and MASTER_LOG_POS on the SLAVE.
>>
>> what is the correct way to keep the two slaves in sync even after i
>> restart the server. the way i do it now is to:
>>
>> [MASTER]
>> mysql> show master status;
>>
>> +------------------+----------+----------------------+--------
>> ------------------+
>> | mysql-bin.000010 | 13405429 | upgrade,tracker,bugs |
>> mysql,information_schema |
>> +------------------+----------+----------------------+--------
>> ------------------+
>> 1 row in set (0.00 sec)
>>
>> [SLAVE]
>> mysql> stop slave;
>> mysql> change master to MASTER_HOST='master.domain.com',
>> MASTER_USER='replicator', MASTER_PASSWORD='xxx',
>> MASTER_LOG_FILE='mysql-bin.000010', MASTER_LOG_POS=13301215;
>> mysql> start slave;
>>
>> is this correct or is there a better way to do this?
>>
>> thanks
> 
> 

To me, it appears that many of you are not fully versed in the theory of 
operations for how MySQL replication actually functions. Granted, there 
are two formats for replication (ROW and STATEMENT) but the general 
process remains the same. Here is a nutshell summary of the process.

** on the master **

m1) The MySQL master is instructed to change some data.

m2) The data is changed and the results are committed to disk. If you 
rollback the changes before you commit them, then there is nothing to 
replicate. Only the InnoDB engine supports this type of rollback.

m3) The change committed in step 2 is written to the binary log

(repeat from step m1 until the Master is shutdown)


** on the slave - the IO thread **
(assuming that the slave is already configured with compatible data, a 
starting position, and the proper credentials to act as a slave)

o1) The SLAVE IO thread requests information from the master's binary 
logs. This information is identified by a file name and a byte offset 
from the start of that file

o2) The SLAVE IO thread copies all available information from the 
master's binary logs into a local copy of those logs known as the relay 
logs.

(repeat from o1 until the SLAVE IO thread is stopped(by error or by 
command) or the slave is shutdown)

** on the slave - the SQL thread **

s1) Once an unapplied change has been completely buffered into the relay 
logs, the SLAVE SQL thread attempts to apply the change to the slave's 
data.

s2) If LOG SLAVE UPDATES is enabled, copy the applied change (using the 
correct format) into the slave's binary log.

(repeat from s1 until the SLAVE SQL thread is stopped (by error or by 
command) or the slave is shutdown)

**
As you can tell by this very simplified process description, there is no 
attempt to rectify one dataset to the other. Replication operates under 
the principle that if you perform identical changes to identical sets of 
data, you will end up with identical end results.

Various replication "filters" can omit certain changes from either being 
replicated to or processed by the slave instance. Use these WITH EXTREME 
CAUTION as they can very easily create situations where the master and 
slave datasets diverge to the point that a rebuild of the slave data is 
the only effective repair.

**
When you get a replication error, such as DUPLICATE KEY, it is the 
administrator's responsibility to figure out why and fix it. Maybe some 
user on the slave added an extra row to a table? Maybe an INSERT ... 
SELECT added more rows on the slave than it did on the master? Maybe a 
DELETE on the master removed less rows than it did on the slave?

Simply skipping those problems (and others) without investigation or 
correction may allow the two datasets (master and slave) to diverge even 
more. The human administrator is required to make a judgment call on 
which version of the conflicting row is the "correct" version. Is is the 
one on the table or the change coming in from the binary log? Maybe the 
data on the slave is "correct" but it has been offset by a few rows 
inserted a long time ago.

It's your responsibility to understand and appropriately respond to the 
errors not just repeat scripted actions until the problems disappear 
temporarily.
-- 
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc.
Office: Blountville, TN
Thread
master-slave replication sync problems.Norman Khine26 Aug
  • Re: master-slave replication sync problems.Ananda Kumar26 Aug
  • Re: master-slave replication sync problems.a.smith26 Aug
    • Re: master-slave replication sync problems.Ananda Kumar26 Aug
    • Re: master-slave replication sync problems.Norman Khine26 Aug
      • Re: master-slave replication sync problems.Ananda Kumar26 Aug
      • Re: master-slave replication sync problems.a.smith26 Aug
        • Re: master-slave replication sync problems.Ananda Kumar26 Aug
          • Re: master-slave replication sync problems.a.smith26 Aug
  • Re: master-slave replication sync problems.jitendra ranjan26 Aug
    • Re: master-slave replication sync problems.a.smith26 Aug
  • RE: master-slave replication sync problems.Daevid Vincent26 Aug
    • Re: master-slave replication sync problems.MySQL)27 Aug
  • Re: master-slave replication sync problems.Todd Lyons1 Sep