Hello List,
We're experiencing strange replication failures on some SQL actions,
so far we haven't been able to reproduce this failures in a clear
scenario.
Our setup is as following:
We have 2 Nodes (rnd-dev1, qa-dev1), both are running Red Hat 5 el,
kernel version 2.6.18-53.1.4.el5PAE.
MySQL version is 5.0.45. The machines are in the same netmask and no
firewall is configured between them.
Replication is configured with master-master replication.
my.cnf of both nodes:
rnd-dev1 :
"""
[mysqld]
# binary logs are what makes replication tic with master-slave(s) setup.
log-bin
expire_logs_days=14
# relay logs are used by slave to store statements before execution.
relay-log=ml-relay-log
relay-log-index=ml-relay-log-index
# Unique on each node !
server-id=115
auto_increment_increment=2
auto_increment_offset=2
# General system logs
log=/var/log/COMPANY/MySQL.log
log-error=/var/log/COMPANY/MySQL_error.log
log-slow-queries=/var/log/COMPANY/MySQL_slow-queries.log
log-warnings=2
# mysql
replicate-wild-ignore-table=mysql.%
# scheme1
replicate-wild-do-table=scheme1.%
# scheme2
replicate-wild-do-table=scheme2.tbl1
replicate-wild-do-table=scheme2.tbl2
replicate-wild-ignore-table=scheme2.tbl3
replicate-wild-ignore-table=scheme2.tbl4
replicate-wild-ignore-table=scheme2.tbl5
replicate-wild-ignore-table=scheme2.tbl6
replicate-wild-ignore-table=scheme2.tbl7
"""
qa-dev1:
"""
[mysqld]
# binary logs are what makes replication tic with master-slave(s) setup.
log-bin
expire_logs_days=14
# relay logs are used by slave to store statements before execution.
relay-log=ml-relay-log
relay-log-index=ml-relay-log-index
# Unique on each node !
server-id=114
auto_increment_increment=2
auto_increment_offset=1
# General system logs
log=/var/log/COMPANY/MySQL.log
log-error=/var/log/COMPANY/MySQL_error.log
log-slow-queries=/var/log/COMPANY/MySQL_slow-queries.log
log-warnings=2
# mysql
replicate-wild-ignore-table=mysql.%
# scheme1
replicate-wild-do-table=scheme1.%
# scheme2
replicate-wild-do-table=scheme2.tbl1
replicate-wild-do-table=scheme2.tbl2
replicate-wild-ignore-table=scheme2.tbl3
replicate-wild-ignore-table=scheme2.tbl4
replicate-wild-ignore-table=scheme2.tbl5
replicate-wild-ignore-table=scheme2.tbl6
replicate-wild-ignore-table=scheme2.tbl7
"""
Under normal conditions we have replication working fine, tested under
load of INSERT, UPDATE and co.
We suspect that it breaks when we do a procedure that involves
dropping of temporary tables. This process of ours does
1. DROP tmpTblA
2. CREATE tmpTblA
3. INSERT ~380 records per second into tmpTblA during ~60 seconds.
4. SELECT INTO tmpTblB small fragments from tmpTblA
5. SELECT FROM tmpTblB INSERT INTO tblA
The differences between this procedure and normal operation being in
the fact here DROP TABLE and CREATE TABLE commands are issued.
We notice the failure because a DROP TABLE statement replicated from
rnd-dev1 (Where it executed OK) to qa-dev1 where this table was not
created so the DROP TABLE failed and replication stopped.
Running this procedure again does not reproduce the replication failure.
The difference in the bin-log before and after the procedure is ~20mb:
Before:
-rw-rw---- 1 mysql mysql 270139448 Jan 30 14:04 rnd-dev1-bin.000047
After:
-rw-rw---- 1 mysql mysql 291102580 Jan 30 15:26 rnd-dev1-bin.000047
Trying to debug this further I'm suspecting that for some reason some
commands or "command blocks" are not being replicated, which leads me
to think that we notice this now only because here a DROP is issued
which causes a replication error while in other times command were
just skipped.
--
Cheers,
Maxim Veksler
"Free as in Freedom" - Do u GNU ?