
List: General Discussion
From: Jeremy Zawodny
Date: April 30 2002 7:08am
Subject: Re: 4.0.2 Replication still buggy...
On Fri, Apr 26, 2002 at 01:28:59PM -0600, Sasha Pachev wrote:


Sasha, here's an update with some context preserved for your
sanity. :-)

> > I recently wiped my 4.0.2 slave clean and installed the latest
> > 4.0.2, built from the BK tree.  Then I synced it up with a nearby
> > slave running 3.23.47 (using rsync after I had flushed the tables
> > on the other slave and run a "SLAVE STOP").
> > 
> > I started it up and it ran for about a day before it ran into a
> > duplicate key error.  The 3.23.47 slave hasn't hit the duplicate
> > key error, nor have any of our other slaves.  So it is a 4.0.2 bug
> > of some sort.


>   * use mysqlbinlog to check the current relay log in the SQL thread
> (you can see it in SHOW SLAVE STATUS) at the current position and
> one event prior to see if this is the same query. You will probably
> want to do mysqlbinlog log-name > log.sql, search for the current
> position in a text editor, and then scroll back one entry
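
The "search for the position, then scroll back one entry" step quoted above can also be scripted. What follows is only a rough sketch, not anyone's actual tooling: it assumes the text dump produced by mysqlbinlog marks the start of each event with a "# at <offset>" comment line, and the function names are made up for illustration.

```python
import re

# Assumption: every event in the mysqlbinlog text dump begins with a
# "# at <offset>" comment line giving its position in the log.
AT_LINE = re.compile(r"^# at (\d+)[ \t]*$", re.MULTILINE)


def split_events(dump_text):
    """Return a list of (offset, text) pairs, one per event, in file order."""
    marks = list(AT_LINE.finditer(dump_text))
    events = []
    for i, mark in enumerate(marks):
        # An event's text runs up to the next "# at" marker (or end of dump).
        end = marks[i + 1].start() if i + 1 < len(marks) else len(dump_text)
        events.append((int(mark.group(1)), dump_text[mark.end():end].strip()))
    return events


def event_and_previous(dump_text, position):
    """Return (previous, current) for the event that starts at `position`.

    Each item is an (offset, text) pair; `previous` is None when the
    requested event is the first one in the dump.
    """
    events = split_events(dump_text)
    for i, (offset, _text) in enumerate(events):
        if offset == position:
            previous = events[i - 1] if i > 0 else None
            return previous, events[i]
    raise ValueError("no event starts at offset %d" % position)
```

To use it, read the log.sql file into a string and pass the position reported by SHOW SLAVE STATUS; comparing the two returned texts shows whether the same query appears twice in a row.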

I spent all of Friday on that--literally.  And you have the patch I
made for mysqlbinlog to make the job easier.  After hours of digging
thru logs on the master, the 4.0.2 slave, and a 3.23.47 slave, I've
concluded that they're all getting and attempting to execute exactly
the same queries.  I found no evidence of duplication or missing
records in the relay log or binary log.

The problem is something a bit more mysterious in 4.0.2.  I don't yet
know if it is replication specific or not.

But the good news is that it *seems* to be repeatable.  After hacking
on this for quite some time on Friday, I just let the slave sit.  Then
on Monday, I did:

  * slave stop
  * set sql_slave_skip_counter = 1
  * slave start

and let it run like crazy.  After a couple hours went by, I found
that the slave had stopped again.  It had the same error on the same
table on the same basic query!
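
For repeated attempts, the stop / skip / start sequence listed above could be wrapped in a small helper. This is a minimal sketch under assumptions: `cursor` stands for any object with an `execute(sql)` method (for example a cursor from a Python DB-API driver), the helper name is hypothetical, and the statements are copied from the list as written, so their exact spelling may need adjusting for a given server version.

```python
# The three statements from the steps above, in the order they were run.
SKIP_ONE_EVENT = (
    "SLAVE STOP",
    "SET SQL_SLAVE_SKIP_COUNTER = 1",
    "SLAVE START",
)


def skip_one_event(cursor):
    """Stop the slave, skip one event, and start the slave again.

    `cursor` is assumed to expose execute(sql); nothing else is required.
    """
    for statement in SKIP_ONE_EVENT:
        cursor.execute(statement)
```

Keeping the statements in a single tuple makes it easy to log exactly what was sent each time the slave stops on the same error.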

So I'm working to reproduce it under more controlled circumstances.
I'd rather not submit a bug report that requires 2GB of relay log
files to test. :-)

As a side note, do you recall my complaints about how it can take a
long time to get the "mysql> " prompt back (on FreeBSD) after
executing the "slave stop; set ... ; slave start;" sequence?

This time I watched in another window to see what MySQL was doing
while I waited for my prompt to come back.  To my surprise, the binary
log on the slave was updating, so it was clearly replaying queries
from the relay log.  But it still took another 20 seconds or so to get
that prompt back.

Strange.  It feels like there's a lock it's trying to obtain
unnecessarily or something.

Anyway, that's where things stand.

There's *some bug* in 4.0.2, but I'm no longer convinced that it's a
replication bug.

Jeremy D. Zawodny, <jzawodn@stripped>
Technical Yahoo - Yahoo Finance
Desk: (408) 349-7878   Fax: (408) 349-5454   Cell: (408) 685-5936

MySQL 3.23.47-max: up 81 days, processed 2,110,710,335 queries (299/sec. avg)