From: Vladislav Vaintroub Date: November 19 2010 8:01am Subject: RE: bzr commit into mysql-5.5-runtime branch (davi:3186) Bug#54790 List-Archive: http://lists.mysql.com/commits/124368 Message-Id: <007801cb87bf$f6aedf70$e40c9e50$@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit > -----Original Message----- > From: Davi Arnaut [mailto:davi.arnaut@stripped] > Sent: Friday, November 19, 2010 3:15 AM > To: Vladislav Vaintroub > Cc: commits@stripped > Subject: Re: bzr commit into mysql-5.5-runtime branch (davi:3186) Bug#54790 > > On 11/18/10 11:37 PM, Vladislav Vaintroub wrote: > > > > > >> -----Original Message----- From: Davi Arnaut > >> [mailto:davi.arnaut@stripped] Sent: Thursday, November 18, 2010 > >> 5:44 PM To: Vladislav Vaintroub Cc: commits@stripped > >> Subject: Re: bzr commit into mysql-5.5-runtime branch (davi:3186) > >> Bug#54790 > > > > > >> I'll add a explanation. Essentially, it allows for a single, common > >> and shared implementation. If Windows XP supported WSAPoll(), I > >> would have gone for poll even there. > >> > > Ok.. Looking forward to fixing "kill" command bugs, I hope we'll > > not repeat libevent mistakes and introduce monsters like socketpairs > > to signal events (just because select() or WSAPoll() would not work > > That's roughly the plan, at least for systems where socketpairs and > alike (eventfd actually) aren't monsters. I'm afraid of monsters :-O > > > with pipes) . Old-fashioned WaitForSingle/MultipleObjects or a > > little more fancy asynchronous IO could be a better tool for the > > job :) But I digress, this has nothing to do with this patch. > > Have you seen what a monster is I/O cancellation on Windows? =:O > I guess it's a good time to walk over what I've been thinking lately. > One thing to keep in mind is that buffers given to vio_read/vio_write > are owned by their callers. When those two functions are left, there > can't be any dangling reference to those buffers. > In the case of asynchronous I/O and given that we need to support > Windows XP, it seems the only option to support timeout and notification > (cancellation) is to actually close the socket on either cancellation or > timeout. The problem is that this creates a fancy discrepancy that I'm > not really sure is worth the trouble. Closing the socket is one possibility, yes. IT seems to be is a way to get it work on XP and actually earlier, maybe NT4.0 even.. See below. > > The alternative which I think is more sane and that shouldn't suck too > much performance wise is to use WSAEventSelect. But there is a catch. > The authors of the said function decided to make FD_READ events special > in the sense that they are level triggered, while FD_WRITE events are > edge triggered. Which leaves us with the choice to either disable > FD_READ events after each wait or to just let them pending and use > WSAEnumNetworkEvents after each wake up. Dunno, async io does not seem t be that complicated. Buffers are not going anywhere, we would not leave he functions with an incomplete io, would be too bad .. Anyway my point is that It should be easy to write a vio_read/vio_write , with timeouts and cancellation, with implementation a lot sense of Linux-optimized MSG_DONTWAIT, but not using poll(), or its numerous incomplete Windows equivalents. This would use asynchronous IO, even if to user it appears as normal blocking IO. We talked before about this, I just guess I could formulate the idea bette, and understand it myself... Consider idea for vio_read(), with timeouts and cancel functionality: start read operation asynchronously, and then waits for io completion or cancel event (real windows event object which can be signaled from outside) or for timeout. If something bad happens, and IO is in progress CancelIo . I encourage you to find a bug in this logic, Icannot see yet how it could be bad. The "pro" argument is that should be faster than anything else, especially if IO does not block. The code would be along the lines... /* Start asynchronous read */ if (WSARecv( vio->sd, buffers, 1, &recvcnt, 0, vio->overlapped) == 0) { /* completed immediately, return result. */ return recvcnt; } if(WSAGetLastError() != WSA_IO_PENDING) /* Something bad happened */ return some_error; /* IO is pending, now wait */ BOOL do_cancel_io; HANDLE handles[] ={vio->cancel_event, vio->overlapped->hEvent}; switch (WaitForMultipleObject(handles, timeout)) /* This is poll , really :) */ { case 0: /* cancel_event signaled, other thread wants to our IO */ do_cancel_io = TRUE; break; case 1: /* IO completed */ do_cancel_io= FALSE; /* retrieve number of bytes received or error */ if( GetOverlappedResult(vio->sd, &recvcnt, vio->overlapped, FALSE)) return recvcnt; break; case WAIT_TIMEOUT: do_cancel_io = TRUE; break; } if (do_cancel_io) CancelIo(socket); /*in the same thread that initiated IO, I can do CancelIo() */ return some_error; The cancel functionality would be just int vio_cancel_io(Vio *io) { return SetEvent(io->cancel_event)?0:-1; } That's it , it can do read, timeout , can be interrupted. It sucks somewhat , with 2 extra events per vio (cancel_event and one in overlapped). On Vista, canceling from outside can be done more efficient using CancelIoEx. Extra bonus point for using vio->sd in WaitForMultipleObjects, instead of , vio->overlapped->hEvent (yes, this surprisingly should work too, overlapped with hEvent=NULL should signal the file handle an one could wait for file handle, but I never had a chance to test in practice) . So we actually may end up with WaitForSingleObject in an optimized version.