From: Dmitry Lenev Date: May 31 2011 1:44pm Subject: Re: bzr commit into mysql-trunk branch (davi:3134) Bug#11758972 Bug#11762221 List-Archive: http://lists.mysql.com/commits/138455 Message-Id: <20110531134433.GC2583@bandersnatch> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Hello Davi! Here are my comments about your patch: * Davi Arnaut [11/05/31 15:21]: > # At a local mysql-trunk repository of davi > > 3134 Davi Arnaut 2011-05-31 > Bug#11762221 - 54790: Use of non-blocking mode for sockets limits performance > Bug#11758972 - 51244: wait_timeout fails on OpenSolaris > > The problem was that a optimization for the case when the server > uses alarms for timeouts could cause a slowdown when socket > timeouts are used instead. In case alarms are used for timeouts, > a non-blocking read is attempted first in order to avoid the > cost of setting up a alarm and if this non-blocking read fails, > the socket mode is changed to blocking and a alarm is armed. > > If socket timeout is used, there is no point in attempting a > non-blocking read first as the timeout will be automatically > armed by the OS. Yet the server would attempt a non-blocking > read first and later switch the socket to blocking mode. This > could inadvertently impact performance as switching the blocking > mode of a socket requires at least two calls into the kernel > on Linux, apart from problems inherited by the scalability > of fcntl(2). > > The solution is to remove alarm based timeouts from the > protocol layer and push timeout handling down to the virtual > I/O layer. This approach allows the handling of socket timeouts > on a platform-specific basis. The blocking mode of the socket > is no longer exported and VIO read and write operations either > complete or fail with a error or timeout. > > On Linux, the MSG_DONTWAIT flag is used to enable non-blocking > send and receive operations. If the operation would block, > poll() is used to wait for readiness or until a timeout occurs. > This strategy avoids the need to set the socket timeout and > blocking mode twice per query. > > On Windows, as before, the timeout is set on a per-socket > fashion. In all remaining operating systems, the socket is set > to non-blocking mode and poll() is used to wait for readiness > or until a timeout occurs. > > In order to cleanup the code after the removal of alarm based > timeouts, the low level packet reading loop is unrolled into > two specific sequences: reading the packet header and the > payload. This make error handling easier down the road. ^^^^^ Typo: makes. > > In conclusion, benchmarks have shown that these changes do not > introduce any performance hits and actually slightly improves > the server throughput for higher numbers of threads. > > - Incompatible changes: > > A timeout is now always applied to a individual receive or > send I/O operation. In contrast, a alarm based timeout was > applied to an entire send or receive packet operation. That > is, before this patch the timeout was really a time limit > for sending or reading one packet. > > Building and running MySQL on POSIX systems now requires > support for poll() and O_NONBLOCK. These should be available > in any modern POSIX system. In other words, except for Windows, > legacy (non-POSIX) systems which only support O_NDELAY and > select() are no longer supported. > > O Windows, the default value for MYSQL_OPT_CONNECT_TIMEOUT ^^ Typo: On > is no longer 20 seconds. The default value now is no timeout > (infinite), the same as in all other platforms. > > Packets bigger than the maximum allowed packet size are no > longer skipped. Before this patch, if a application sent a > packet bigger than the maximum allowed packet size, or if > the server failed to allocate a buffer sufficiently large > to hold the packet, the server would keep reading the packet > until its end. Now the session is simply disconnected if the > server cannot handle such large packets. Please don't forget to notify the Connectors Team about this incompatible change. > The client socket buffer is no longer cleared (drained) > before sending commands to the server. Before this patch, > any data left in the socket buffer would be drained (removed) > before a command was sent to the server, in order to work > around bugs where the server would violate the protocol and > send more data. The only check left is a debug-only assertion > to ensure that the socket buffer is empty. ... > @ include/violite.h > Introduce a vio_io_wait method which can be used to wait > for I/O events on a VIO. The supported I/O events are read > and write. The notification of error conditions is implicit. > > Extend vio_reset, which was SSL-specific, to be able to > rebind a new socket to an already initialized Vio object. > > Remove vio_is_blocking. The blocking mode is no longer > exported. Also, remove fcntl_mode as the blocking mode is > no longer buffered. It is now a product of the timeout. > > Add prototype for a wrapper vio_timeout() function, which > invokes the underlying timeout method of the VIO. Add to > VIO two member variables to hold the read and write timeout > values. > > Rename vio_was_interrupted to vio_was_timeout. > > Remove vio_poll_read, which is now superseded vio_io_wait. > > Introduce vio_connect to avoid code duplication. I think in this version of your patch vio_connect is called vio_socket_connect. Please update file comment. > @ sql-common/client.c > Look into last_errno to check whether the socket was > interrupted due to a timeout. > > Add a couple of wrappers to retrieve the connect_timeout > option value. > > Use vio_io_wait instead of vio_poll_read. > > The result of a failed socket operation is in socket_errno, > which is WSAGetLastError on Windows. > > Replace my_connect() and wait_for_data() with vio_connect() Same here. > to avoid code duplication. The polling inside wait_for_data() > is somewhat equivalent to wait_for_io(). Also, this can be > considered a bug fix as wait_for_data() would incorrectly > use POLLIN (there is data to read) to wait (poll) for the > connection to be established. This problem probably never > surfaced because the OS network layer timeout is usually > lower then our default. > @ vio/viopipe.c > Simplify the pipe transport code and adjust to handle the > new timeout interface. Maybe it makes sense also to mention that most of pipe transport code was moved here from viosocket.c ? > @ vio/viosocket.c > Use vio_ssl_errno to retrieve the error number if the VIO > type is SSL. > I think the above sentence is no longer relevant in the current version of your patch. Please remove it. > Add the vio_socket_io_wait() helper which wraps vio_io_wait() > in order to avoid code duplication in vio_write()/vio_read(). > The function selects a appropriate timeout (based on the passed ^^ Typo: an > event) and translates the return value. > > Re-implement vio_read()/vio_write() as simple event loops. The > loop consists of retrying the I/O operation until it succeeds > or fails with a non-recoverable error. If it fails with error > set to EAGAIN or EWOULDBLOCK, vio_io_wait() is used to wait > for the socket to become ready. On Linux, the MSG_DONTWAIT > flag is used to get the same effect as if the socket is in > non-blocking mode. > > Add vio_set_blocking() which tweaks the blocking mode of > a socket. > > Add vio_win32_timeout() which sets the timeout of a socket > on Windows. > There is no vio_win32_timeout() in this version of your patch. Please remove this sentence. > Add vio_socket_timeout() which implements the logic around > when a socket should be switched to non-blocking mode. Except > on Windows, the socket is put into non-blocking mode if a > timeout is set. > > vio_should_retry now indicates whether a I/O operation was > interrupted by a temporary failure (interrupted by a signal) > and should be retried later. vio_was_timeout indicates > whether a I/O operation timed out. > > Remove socket_poll_read, now superseded by vio_io_wait. > > Implement vio_io_wait() using select() on Windows. Otherwise, > use poll(). vio_io_wait() will loop until either a timeout > occurs, or the socket becomes ready. In the presence of > spurious wake-ups, the wait is resumed if possible. > > Move code related to the transport types named pipe and > shared memory to their own files. > > Introduce vio_connect to avoid code duplication. The > function implements a timed connect by setting the socket > to non-blocking and polling until the connection is either > established or fails. Same comment about vio_socket_connect(). ... > === modified file 'extra/yassl/include/openssl/ssl.h' > --- a/extra/yassl/include/openssl/ssl.h 2011-03-29 08:01:07 +0000 > +++ b/extra/yassl/include/openssl/ssl.h 2011-05-31 11:12:00 +0000 > @@ -539,13 +539,23 @@ void MD5_Final(unsigned char*, MD5_CTX*) > #define SSL_DEFAULT_CIPHER_LIST "" /* default all */ > > > -/* yaSSL adds */ > +/* yaSSL extensions */ > int SSL_set_compression(SSL*); /* turn on yaSSL zlib compression */ > char *yaSSL_ASN1_TIME_to_string(ASN1_TIME *time, char *buf, size_t len); > > +#include "transport_types.h" > > +/* > + Set functions for yaSSL to use in order to send and receive data. > > + These hooks are offered in order to enable non-blocking I/O. If > + blocking I/O is used, yaSSL defaults to using send() and recv(). As discussed on IRC the above sentence should be: "If the hooks are not set, yaSSL defaults ...". ... Otherwise I am OK with your patch and think that it can be pushed into trunk once the above minor issues are addressed. Great work! -- Dmitry Lenev, Software Developer Oracle Development SPB/MySQL, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification