On Friday 15 December 2000 12:27, Aaron Ingram wrote:
>When replicating a query having an error code, the slave will fail if that
>same error is not encountered. Assuming there's some reason to log failed
>queries, why should the slave try to execute them? If they were not
>actually written to the master, even attempting the command on the slave
>seems incorrect.
Here is an example:
create table foo(n int not null primary key);
insert into foo values (1);
insert into foo values (2),(3),(1);
The last query will return a duplicate-key error, but rows 2 and 3 have
already been inserted by the time it hits the duplicate, so it modifies the
table and needs to be replicated.
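A toy model makes the point concrete. It is a hypothetical sketch, not MySQL code, and it assumes MyISAM-style behaviour: a multi-row insert stops at the first duplicate key, and nothing is rolled back. The statement therefore both fails and changes the data.

```cpp
#include <cassert>   // for the sanity checks in the usage note
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical result of one multi-row INSERT against a non-transactional table.
struct InsertResult {
  bool duplicate_key_error;   // true if the statement stopped on a duplicate
  std::size_t rows_inserted;  // rows that remain in the table despite the error
};

// The table is modelled as the set of values of its primary key n.
// Rows are inserted in order. The first duplicate aborts the statement,
// but the rows written before it stay because there is no rollback.
InsertResult multi_row_insert(std::set<int>& table, const std::vector<int>& rows) {
  InsertResult result{false, 0};
  for (int n : rows) {
    if (!table.insert(n).second) {  // key already present: report and stop
      result.duplicate_key_error = true;
      break;
    }
    ++result.rows_inserted;
  }
  return result;
}
```

For a table that already holds 1, `multi_row_insert(table, {2, 3, 1})` reports the error with `rows_inserted == 2`. By contrast, `{1, 2, 3}` fails on its first row and inserts nothing.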
>
>-Aaron
>
>-----Original Message-----
>From: Aaron Ingram
>Sent: Sunday, December 10, 2000 10:20 PM
>To: 'mysql@stripped'
>Subject: Replication fails when expecting error
>
>
>I've run across the following replication error on the slave:
> 001209 2:08:36 Slave: did not get the expected error running query from master - expected: 'Got an error writing communication packets', got 'no error'
> 001209 2:08:36 Slave: error running query 'delete from table_name'
> 001209 2:08:36 Error running query, slave aborted. Fix the problem, and re-start the slave thread with mysqladmin start-slave
>I've confirmed the "Got an error writing communication packets" error
>appears on the master. However, whatever problem caused that error on the
>master is not occurring on the slave. Hence the slave failure. Should a
>slave fail when encountering a mismatched error code of this type?
>
>That aside, even if I can stop the communication problem from recurring on
>the master, I still have the existing log events to deal with. How do I
>work around this problem? I really would like to avoid restarting the log
>from a fresh dump&load, especially since there's no guarantee I can stop
>this error from happening again.
>
>I'm running MySQL 3.23.28 on RedHat Linux 6.1.
You have found a rather rare bug that would be nearly impossible to reproduce
at will. The query on the master actually succeeded. However, the client
dropped the connection while the thread was sending it the OK packet, and
that set the error code in the thread structure. The error code was then
written to the binary log along with the query. Here is a patch that writes
the logs before sending the OK:
--- 1.28/sql/sql_delete.cc Fri Dec 8 08:04:53 2000
+++ edited/sql/sql_delete.cc Sat Dec 16 09:54:11 2000
@@ -106,13 +106,13 @@
}
if (!error)
{
- send_ok(&thd->net); // This should return record count
mysql_update_log.write(thd,thd->query,thd->query_length);
if (mysql_bin_log.is_open())
{
Query_log_event qinfo(thd, thd->query);
mysql_bin_log.write(&qinfo);
}
+ send_ok(&thd->net); // This should return record count
}
DBUG_RETURN(error ? -1 : 0);
}
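Here is a simplified model of why the reordering helps. It assumes, as the slave's message suggests, that the logged query carries whatever error code the thread holds at the moment it is written. Every name here (Thd, LoggedQuery, kNetWriteError and the functions) is a hypothetical stand-in, not the server's real structures.

```cpp
#include <cassert>  // for the sanity checks below
#include <string>

// Illustrative value only, standing in for the
// "Got an error writing communication packets" error.
constexpr int kNetWriteError = 1;

struct Thd {
  int last_errno = 0;        // the error code kept in the thread structure
  bool client_gone = false;  // the client dropped the connection
};

struct LoggedQuery {
  std::string query;
  int error_code;  // the error the slave will expect to reproduce
};

// Sending the OK fails and records an error if the client is already gone.
void send_ok(Thd& thd) {
  if (thd.client_gone) thd.last_errno = kNetWriteError;
}

// Logging copies the thread's current error code into the event.
LoggedQuery log_query(const Thd& thd, const std::string& q) {
  return LoggedQuery{q, thd.last_errno};
}

// Old order: reply first, then log, so a failed reply leaks into the log.
LoggedQuery delete_old_order(Thd& thd, const std::string& q) {
  send_ok(thd);
  return log_query(thd, q);
}

// Patched order: log first, then reply, so the logged error code stays 0.
LoggedQuery delete_new_order(Thd& thd, const std::string& q) {
  LoggedQuery ev = log_query(thd, q);
  send_ok(thd);
  return ev;
}
```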
Resuming replication is a rather tricky task. First, run this on the slave:
SHOW SLAVE STATUS;
It shows the name of the master log and the position in it. Then, on the
master, dump the log starting from that position:
od -c -j offset_on_the_slave /path/to/datadir/binlog_name
Count the bytes and try to guess where the next log entry starts. It would be
roughly 100 bytes ahead of the current position, and the 5th byte of the
entry (4 bytes from its start) will most likely be 0x02, the code for a query
log event. For the exact offset, read the event size from the header of the
entry at the current position (the failed query). Here is the header format:
offset  size  meaning
0       4     timestamp
4       1     event type code (0x02 for a query)
5       4     originating server id
9       4     event size (header + data)
All integers are little-endian, and offsets and sizes are in bytes.
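The following sketch decodes that header from raw bytes. It assumes exactly the 13-byte layout in the table (4 + 1 + 4 + 4) and nothing else. EventHeader, read_le32 and parse_header are hypothetical names.

```cpp
#include <cassert>  // for the sanity checks below
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::size_t kHeaderLen = 13;      // 4 + 1 + 4 + 4 bytes, per the table
constexpr std::uint8_t kQueryEvent = 0x02;  // event type code of a query

struct EventHeader {
  std::uint32_t timestamp;
  std::uint8_t type_code;
  std::uint32_t server_id;
  std::uint32_t event_size;  // header plus data
};

// Reads a little-endian unsigned 32-bit integer starting at p.
std::uint32_t read_le32(const std::uint8_t* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Decodes the header at `offset` in a buffer holding binlog bytes.
// Returns std::nullopt if fewer than 13 bytes are available there.
std::optional<EventHeader> parse_header(const std::vector<std::uint8_t>& buf,
                                        std::size_t offset) {
  if (offset > buf.size() || buf.size() - offset < kHeaderLen) return std::nullopt;
  const std::uint8_t* p = buf.data() + offset;
  return EventHeader{read_le32(p), p[4], read_le32(p + 5), read_le32(p + 9)};
}
```

Checking `type_code == kQueryEvent` on the decoded header is the programmatic version of looking for 0x02 in the `od -c` output.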
And here is the code that writes it (the ultimate reference :-)):
int Log_event::write_header(IO_CACHE* file)
{
  // make sure to change this when the header gets bigger
  char buf[LOG_EVENT_HEADER_LEN];
  char* pos = buf;
  int4store(pos, when);            // timestamp
  pos += 4;
  *pos++ = get_type_code();        // event type code
  int4store(pos, server_id);       // originating server id
  pos += 4;
  long tmp=get_data_size() + LOG_EVENT_HEADER_LEN;
  int4store(pos, tmp);             // event size: data plus header
  pos += 4;
  return (my_b_write(file, (byte*) buf, (uint) (pos - buf)));
}
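The same layout can be produced outside the server with a portable sketch. Here store_le32 stands in for the int4store macro and make_header for write_header. Both are hypothetical helpers, and the 13-byte length assumes the header format described here.

```cpp
#include <cassert>  // for the sanity checks below
#include <cstdint>
#include <vector>

// Portable stand-in for int4store(): appends v as 4 little-endian bytes.
void store_le32(std::vector<std::uint8_t>& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Mirrors the field order of Log_event::write_header(). data_size plays the
// role of get_data_size(). Returns the 13 header bytes.
std::vector<std::uint8_t> make_header(std::uint32_t when, std::uint8_t type_code,
                                      std::uint32_t server_id, std::uint32_t data_size) {
  constexpr std::uint32_t kHeaderLen = 13;  // 4 + 1 + 4 + 4
  std::vector<std::uint8_t> buf;
  buf.reserve(kHeaderLen);
  store_le32(buf, when);                    // timestamp
  buf.push_back(type_code);                 // event type code
  store_le32(buf, server_id);               // originating server id
  store_le32(buf, data_size + kHeaderLen);  // event size counts the header too
  return buf;
}
```

Because the stored size includes the header, adding it to an event's starting offset lands directly on the next event.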
So after you have figured out the event size of the trouble query, add it to
the current slave offset and do:
mysqlbinlog -j new_offset /path/to/datadir/binlog_name | head -1
If the next query is printed in plain text, you got the offset right. If not,
check your arithmetic and try again.
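The offset arithmetic can be sketched as follows. The sketch assumes that the master's binlog has been read into memory and that the slave's offset points at the start of the failed event. next_event_offset and looks_like_query_event are hypothetical helpers. The 0x02 check is only the "most likely" heuristic, because the next event need not be a query.

```cpp
#include <cassert>  // for the sanity checks below
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::size_t kHeaderLen = 13;  // timestamp, type, server id, size

// Little-endian 32-bit read, matching the binlog header layout.
std::uint32_t le32(const std::uint8_t* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// new_offset = current slave offset + event size (bytes 9..12 of the header).
// Returns std::nullopt if the header is truncated or the size is implausible.
std::optional<std::size_t> next_event_offset(const std::vector<std::uint8_t>& log,
                                             std::size_t offset) {
  if (offset > log.size() || log.size() - offset < kHeaderLen) return std::nullopt;
  std::size_t size = le32(log.data() + offset + 9);
  if (size < kHeaderLen) return std::nullopt;  // an event includes its own header
  return offset + size;
}

// Heuristic check: is the type byte (offset + 4) the query event code 0x02?
bool looks_like_query_event(const std::vector<std::uint8_t>& log, std::size_t offset) {
  return offset + 4 < log.size() && log[offset + 4] == 0x02;
}
```

Either way, mysqlbinlog remains the real test: the sketch only saves you from doing the little-endian arithmetic by hand.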
Once you have the right offset, run this on the slave:
CHANGE MASTER TO MASTER_LOG_POS=new_offset;
SLAVE START;
SHOW SLAVE STATUS;
Replication should now resume, and the data on the slave should be
consistent. The slave had already executed the DELETE we skipped by
adjusting the offset before it aborted.
In 3.23.30, I will change the slave code to print the next offset when a
query error occurs. If something really terrible happens, you will then be
able to skip the trouble query without doing the binlog magic.
--
MySQL Development Team
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Sasha Pachev <sasha@stripped>
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, http://www.mysql.com/
/_/ /_/\_, /___/\___\_\___/ Provo, Utah, USA
<___/