List: General Discussion
From: Michael Widenius
Date: October 20 1999 2:58am
Subject: Problem with many repeated remote connections.
>>>>> "clewis" == clewis  <clewis@stripped> writes:

>> Description:
clewis> 	We have some code which will run for a while before being unable
clewis> 	to connect to a remote server.  When the client is unable to
clewis> 	connect, mysql_error() returns an empty string.
clewis> 	The code wants to retrieve information about all users.  Instead
clewis> 	of retrieving all information, we iterate over all users by
clewis> 	connecting to the database, retrieving a single row, then
clewis> 	disconnecting.  The attached code is a very simplified version
clewis> 	of this, written for testing purposes.  For testing purposes,
clewis> 	I added a section to re-try the connection after a 5 second
clewis> 	sleep if the mysql_connect() fails.  We are rewriting our
clewis> 	code to work around this (see Fix section), but I am concerned
clewis> 	about the root cause.  I believe this is a problem in the client,
clewis> 	although I can't prove it.  If it is the client, then I can work
clewis> 	around it with no worries.  If the problem is in the server, then
clewis> 	our server will be having more problems in the near future as
clewis> 	the load increases.

>> How-To-Repeat:
clewis> 	The setup:  We have two development machines, named hermes and
clewis> 	matrix.  Matrix is the big machine, a PII/350, 128 Meg Ram running
clewis> 	RedHat 5.2.  Hermes is an AMD-K6/350, 32 Meg Ram running RedHat 6.0.
clewis> 	Matrix has MySQL 3.21.33c client/server.  Hermes has MySQL 3.22.27
clewis> 	client/server.  Matrix is the build machine, and all code
clewis> 	that I compiled was built on matrix.  Both machines are on a
clewis> 	100baseT subnet.  We originally saw the problem on the beefy
clewis> 	servers, (Source distribution) but everything was successfully
clewis> 	reproduced on the smaller development machines.  The web server
clewis> 	is a Dual PII/450 with 512 Meg RAM and the MySQL server machine
clewis> 	is a Quad XeonIII/450 with 2 Gig RAM.

clewis> 	Testing configurations:
clewis> 	The C code was executed 4 different ways.
clewis> 	1) Running on matrix, connecting to MySQL on matrix.
clewis> 	2) Running on matrix, connecting to MySQL on hermes.
clewis> 	3) Running on hermes, connecting to MySQL on hermes.
clewis> 	4) Running on hermes, connecting to MySQL on matrix.

clewis> 	The first 3 cases work fine.  The 4th case always results in a
clewis> 	"Can't connect to server" error.  The error usually occurs in the 
clewis> 	4000 +/- 100th connection attempt, but not always.  Most of the
clewis> 	time, the connection would be re-attempted after a sleep, and the
clewis> 	re-attempt would fail.  Occasionally the re-attempt would work, but
clewis> 	on the next iteration both the 1st & 2nd attempt would fail.  Rarely,
clewis> 	we would see a string of 1st connections failing, but 2nd attempts
clewis> 	going through, with both attempts eventually failing (This is
clewis> 	the example that I've provided below).  I've never seen test
clewis> 	case 4 complete all 15000 connections successfully.  We tried
clewis> 	upgrading MySQL on both machines to 3.22.27, but it was worse.
clewis> 	With both machines running 3.22.27 (RPM Distribution), we were
clewis> 	only able to get 1023 repeated connections before we couldn't
clewis> 	connect, with the 2nd attempt always failing.  I can't remember
clewis> 	if this happened in all four test cases, or just test cases
clewis> 	2 and 4.


The problem is that Linux has a delay between the moment you call
close() on a TCP/IP socket and the moment the socket is actually freed.
As there is only room for a finite number of TCP/IP slots, you will
run out of them after enough connections in quick succession.

I have mailed about this problem a couple of times to different
mailing lists and I have even talked with Alan Cox, but I have never
been able to resolve this properly.

Note that Linux 2.0 doesn't have this problem;  I have only seen this 
with 2.2 kernels!

Here is a simple program you can use to check this:

After compiling it you should run it as follows:

./server_client 'your host name'

If you have a problem with the TCP/IP connections, it should stop
after about 2000-4000 connections.
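
A test of this kind can be sketched in C roughly as follows.  This is
not the original program, only a minimal sketch under assumptions: the
function names start_listener and connect_loop, the SERVER_CLIENT_MAIN
guard and the limit of 15000 attempts are made up for illustration.  It
opens a listener on a port chosen by the kernel, then repeatedly
connects to it through the given host name, accepts the connection and
closes both ends, counting how many connections complete before the
first failure.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a TCP listener on a kernel-chosen port.
   Returns the listening fd (port stored in *port_out) or -1 on error. */
int start_listener(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd;

    assert(port_out != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;                  /* 0 = let the kernel pick a port */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *) &addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Connect to host:port, accept on listen_fd, close both ends; repeat.
   Returns the number of connections completed before the first failure. */
long connect_loop(const char *host, unsigned short port,
                  int listen_fd, long attempts)
{
    struct addrinfo hints, *res;
    char service[8];
    long done = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    snprintf(service, sizeof(service), "%u", (unsigned) port);
    if (getaddrinfo(host, service, &hints, &res) != 0)
        return 0;

    while (done < attempts) {
        int c, s;

        c = socket(AF_INET, SOCK_STREAM, 0);
        if (c < 0) {
            perror("socket");
            break;
        }
        if (connect(c, res->ai_addr, res->ai_addrlen) < 0) {
            perror("connect");          /* this is where the test stops */
            close(c);
            break;
        }
        s = accept(listen_fd, NULL, NULL);
        if (s < 0) {
            perror("accept");
            close(c);
            break;
        }
        close(s);
        close(c);
        done++;
    }
    freeaddrinfo(res);
    return done;
}

#ifdef SERVER_CLIENT_MAIN
int main(int argc, char **argv)
{
    unsigned short port;
    int fd;
    long n;

    if (argc < 2) {
        fprintf(stderr, "usage: %s host\n", argv[0]);
        return 2;
    }
    fd = start_listener(&port);
    if (fd < 0) {
        perror("listener");
        return 1;
    }
    n = connect_loop(argv[1], port, fd, 15000);
    printf("%ld connections completed\n", n);
    close(fd);
    return n == 15000 ? 0 : 1;
}
#endif
```

Compiled with the guard defined (for example with -DSERVER_CLIENT_MAIN),
the sketch takes the host name as its only argument and prints how many
connections went through before the first error.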


I just tested this on my Linux 2.2.12 kernel and it WORKED!
(It didn't when I last tested it a couple of months ago with an
earlier 2.2 kernel.)

In other words, this is a server problem that may be fixed by
upgrading to a newer kernel!

