Here are my comments about your patch:
* Davi Arnaut <davi.arnaut@stripped> [11/05/31 15:21]:
> # At a local mysql-trunk repository of davi
> 3134 Davi Arnaut 2011-05-31
> Bug#11762221 - 54790: Use of non-blocking mode for sockets limits performance
> Bug#11758972 - 51244: wait_timeout fails on OpenSolaris
> The problem was that a optimization for the case when the server
> uses alarms for timeouts could cause a slowdown when socket
> timeouts are used instead. In case alarms are used for timeouts,
> a non-blocking read is attempted first in order to avoid the
> cost of setting up a alarm and if this non-blocking read fails,
> the socket mode is changed to blocking and a alarm is armed.
> If socket timeout is used, there is no point in attempting a
> non-blocking read first as the timeout will be automatically
> armed by the OS. Yet the server would attempt a non-blocking
> read first and later switch the socket to blocking mode. This
> could inadvertently impact performance as switching the blocking
> mode of a socket requires at least two calls into the kernel
> on Linux, apart from problems inherited by the scalability
> of fcntl(2).
> The solution is to remove alarm based timeouts from the
> protocol layer and push timeout handling down to the virtual
> I/O layer. This approach allows the handling of socket timeouts
> on a platform-specific basis. The blocking mode of the socket
> is no longer exported and VIO read and write operations either
> complete or fail with a error or timeout.
> On Linux, the MSG_DONTWAIT flag is used to enable non-blocking
> send and receive operations. If the operation would block,
> poll() is used to wait for readiness or until a timeout occurs.
> This strategy avoids the need to set the socket timeout and
> blocking mode twice per query.
> On Windows, as before, the timeout is set on a per-socket
> fashion. In all remaining operating systems, the socket is set
> to non-blocking mode and poll() is used to wait for readiness
> or until a timeout occurs.
> In order to cleanup the code after the removal of alarm based
> timeouts, the low level packet reading loop is unrolled into
> two specific sequences: reading the packet header and the
> payload. This make error handling easier down the road.
> In conclusion, benchmarks have shown that these changes do not
> introduce any performance hits and actually slightly improves
> the server throughput for higher numbers of threads.
> - Incompatible changes:
> A timeout is now always applied to a individual receive or
> send I/O operation. In contrast, a alarm based timeout was
> applied to an entire send or receive packet operation. That
> is, before this patch the timeout was really a time limit
> for sending or reading one packet.
> Building and running MySQL on POSIX systems now requires
> support for poll() and O_NONBLOCK. These should be available
> in any modern POSIX system. In other words, except for Windows,
> legacy (non-POSIX) systems which only support O_NDELAY and
> select() are no longer supported.
> O Windows, the default value for MYSQL_OPT_CONNECT_TIMEOUT
> is no longer 20 seconds. The default value now is no timeout
> (infinite), the same as in all other platforms.
> Packets bigger than the maximum allowed packet size are no
> longer skipped. Before this patch, if a application sent a
> packet bigger than the maximum allowed packet size, or if
> the server failed to allocate a buffer sufficiently large
> to hold the packet, the server would keep reading the packet
> until its end. Now the session is simply disconnected if the
> server cannot handle such large packets.
Please don't forget to notify the Connectors Team about this
> The client socket buffer is no longer cleared (drained)
> before sending commands to the server. Before this patch,
> any data left in the socket buffer would be drained (removed)
> before a command was sent to the server, in order to work
> around bugs where the server would violate the protocol and
> send more data. The only check left is a debug-only assertion
> to ensure that the socket buffer is empty.
> @ include/violite.h
> Introduce a vio_io_wait method which can be used to wait
> for I/O events on a VIO. The supported I/O events are read
> and write. The notification of error conditions is implicit.
> Extend vio_reset, which was SSL-specific, to be able to
> rebind a new socket to an already initialized Vio object.
> Remove vio_is_blocking. The blocking mode is no longer
> exported. Also, remove fcntl_mode as the blocking mode is
> no longer buffered. It is now a product of the timeout.
> Add prototype for a wrapper vio_timeout() function, which
> invokes the underlying timeout method of the VIO. Add to
> VIO two member variables to hold the read and write timeout
> Rename vio_was_interrupted to vio_was_timeout.
> Remove vio_poll_read, which is now superseded vio_io_wait.
> Introduce vio_connect to avoid code duplication.
I think in this version of your patch vio_connect is called
vio_socket_connect. Please update file comment.
> @ sql-common/client.c
> Look into last_errno to check whether the socket was
> interrupted due to a timeout.
> Add a couple of wrappers to retrieve the connect_timeout
> option value.
> Use vio_io_wait instead of vio_poll_read.
> The result of a failed socket operation is in socket_errno,
> which is WSAGetLastError on Windows.
> Replace my_connect() and wait_for_data() with vio_connect()
> to avoid code duplication. The polling inside wait_for_data()
> is somewhat equivalent to wait_for_io(). Also, this can be
> considered a bug fix as wait_for_data() would incorrectly
> use POLLIN (there is data to read) to wait (poll) for the
> connection to be established. This problem probably never
> surfaced because the OS network layer timeout is usually
> lower then our default.
> @ vio/viopipe.c
> Simplify the pipe transport code and adjust to handle the
> new timeout interface.
Maybe it makes sense also to mention that most of pipe transport code
was moved here from viosocket.c ?
> @ vio/viosocket.c
> Use vio_ssl_errno to retrieve the error number if the VIO
> type is SSL.
I think the above sentence is no longer relevant in the current version
of your patch. Please remove it.
> Add the vio_socket_io_wait() helper which wraps vio_io_wait()
> in order to avoid code duplication in vio_write()/vio_read().
> The function selects a appropriate timeout (based on the passed
> event) and translates the return value.
> Re-implement vio_read()/vio_write() as simple event loops. The
> loop consists of retrying the I/O operation until it succeeds
> or fails with a non-recoverable error. If it fails with error
> set to EAGAIN or EWOULDBLOCK, vio_io_wait() is used to wait
> for the socket to become ready. On Linux, the MSG_DONTWAIT
> flag is used to get the same effect as if the socket is in
> non-blocking mode.
> Add vio_set_blocking() which tweaks the blocking mode of
> a socket.
> Add vio_win32_timeout() which sets the timeout of a socket
> on Windows.
There is no vio_win32_timeout() in this version of your patch.
Please remove this sentence.
> Add vio_socket_timeout() which implements the logic around
> when a socket should be switched to non-blocking mode. Except
> on Windows, the socket is put into non-blocking mode if a
> timeout is set.
> vio_should_retry now indicates whether a I/O operation was
> interrupted by a temporary failure (interrupted by a signal)
> and should be retried later. vio_was_timeout indicates
> whether a I/O operation timed out.
> Remove socket_poll_read, now superseded by vio_io_wait.
> Implement vio_io_wait() using select() on Windows. Otherwise,
> use poll(). vio_io_wait() will loop until either a timeout
> occurs, or the socket becomes ready. In the presence of
> spurious wake-ups, the wait is resumed if possible.
> Move code related to the transport types named pipe and
> shared memory to their own files.
> Introduce vio_connect to avoid code duplication. The
> function implements a timed connect by setting the socket
> to non-blocking and polling until the connection is either
> established or fails.
Same comment about vio_socket_connect().
> === modified file 'extra/yassl/include/openssl/ssl.h'
> --- a/extra/yassl/include/openssl/ssl.h 2011-03-29 08:01:07 +0000
> +++ b/extra/yassl/include/openssl/ssl.h 2011-05-31 11:12:00 +0000
> @@ -539,13 +539,23 @@ void MD5_Final(unsigned char*, MD5_CTX*)
> #define SSL_DEFAULT_CIPHER_LIST "" /* default all */
> -/* yaSSL adds */
> +/* yaSSL extensions */
> int SSL_set_compression(SSL*); /* turn on yaSSL zlib compression */
> char *yaSSL_ASN1_TIME_to_string(ASN1_TIME *time, char *buf, size_t len);
> +#include "transport_types.h"
> + Set functions for yaSSL to use in order to send and receive data.
> + These hooks are offered in order to enable non-blocking I/O. If
> + blocking I/O is used, yaSSL defaults to using send() and recv().
As discussed on IRC the above sentence should be: "If the hooks are
not set, yaSSL defaults ...".
Otherwise I am OK with your patch and think that it can be pushed
into trunk once the above minor issues are addressed. Great work!
Dmitry Lenev, Software Developer
Oracle Development SPB/MySQL, www.mysql.com
Are you MySQL certified? http://www.mysql.com/certification