Hi!
>>>>> "Simon" == Simon Cocking <simon@stripped> writes:
Simon> [ Crossposted to the internals mailing list ]
Simon> Hi Sasha,
Simon> Good news -- I've isolated the cause of my replication failures. While
Simon> discussing the problem with a colleague, I realised that the
Simon> replication error messages were occurring in 30 sec intervals almost
Simon> 100% of the time. Digging through the documentation revealed the
Simon> net_read_timeout and net_write_timeout variables. I now have these set
Simon> to 600 secs each, and that fixed it.
Simon> However, in my mind these variables represent something of a design
Simon> flaw. Essentially, any query which took longer than net_read_timeout
Simon> seconds to be transmitted over a WAN link would cause replication to
Simon> fail, with no obvious reason. Now, when all replicated servers are
Simon> located on a fast LAN this problem never surfaces, but surely MySQL's
Simon> design should allow for replication over large distances? If I
Simon> understand it correctly, even a client-server connection over a slow
Simon> WAN link would consistently fail in the face of a large query.
The above variables exist to ensure that a packet doesn't take too
long to be read or transmitted. If you have slow connections and want
to support them, you should increase the values of these variables.
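For example, a minimal sketch of the option file, assuming a server
version that accepts plain variable assignments in my.cnf (older
versions may need the set-variable form instead). The 600 second
values are simply the ones Simon reports using; pick whatever suits
your link:

```ini
[mysqld]
# Assumed values, taken from Simon's report: allow up to 600 seconds
# for reading a packet from a slow connection
net_read_timeout = 600
# Likewise allow up to 600 seconds for writing a packet
net_write_timeout = 600
```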
Simon> Initially, my reading of the net_read_timeout variable led me to
Simon> believe that this value represented the amount of time that a query
Simon> connection would be *idle* before it was timed out and closed.
That is what the wait_timeout variable is for.
Simon> However, it actually represents the *total* amount of time a remote
Simon> query can take. This makes it impossible to set MySQL up to perform
Simon> with 100% reliability when components are located at either end of a
Simon> (potentially slow) network.
With the current variables you can:
- Ensure that no send takes longer than a certain total time.
- Ensure that no read takes longer than a certain total time.
- Ensure that the connection is not idle for too long.
What timeout value have we missed?
Regards,
Monty