Java threaded socket connection timeouts - java

I have to make simultaneous TCP socket connections every x seconds to multiple machines in order to get something like a status update packet.
I use a Callable class: each instance is submitted as a future task that connects to one machine, sends a query packet, and receives a reply, which is returned to the main thread that creates all the Callable objects.
My socket connection class is:
public class ClientConnect implements Callable<String> {
Connection con = null;
Statement st = null;
ResultSet rs = null;
String hostipp, hostnamee;
ClientConnect(String hostname, String hostip) {
hostnamee=hostname;
hostipp = hostip;
}
@Override
public String call() throws Exception {
return GetData();
}
private String GetData() {
Socket so = new Socket();
SocketAddress sa = null;
PrintWriter out = null;
BufferedReader in = null;
try {
sa = new InetSocketAddress(InetAddress.getByName(hostipp), 2223);
} catch (UnknownHostException e1) {
e1.printStackTrace();
}
try {
so.connect(sa, 10000);
out = new PrintWriter(so.getOutputStream(), true);
out.println("\1IDC_UPDATE\1");
in = new BufferedReader(new InputStreamReader(so.getInputStream()));
String [] response = in.readLine().split("\1");
out.close();in.close();so.close(); so = null;
try{
Integer.parseInt(response[2]);
} catch(NumberFormatException e) {
System.out.println("Number format exception");
return hostnamee + "|-1" ;
}
return hostnamee + "|" + response[2];
} catch (IOException e) {
try {
if(out!=null)out.close();
if(in!=null)in.close();
so.close();so = null;
return hostnamee + "|-1" ;
} catch (IOException e1) {
// TODO Auto-generated catch block
return hostnamee + "|-1" ;
}
}
}
}
And this is the way I create a pool of threads in my main class:
private void StartThreadPool()
{
ExecutorService pool = Executors.newFixedThreadPool(30);
List<Future<String>> list = new ArrayList<Future<String>>();
for (Map.Entry<String, String> entry : pc_nameip.entrySet())
{
Callable<String> worker = new ClientConnect(entry.getKey(),entry.getValue());
Future<String> submit = pool.submit(worker);
list.add(submit);
}
for (Future<String> future : list) {
try {
String threadresult;
threadresult = future.get();
//........ PROCESS DATA HERE!..........//
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
}
}
The pc_nameip map contains (hostname, hostip) values, and for every entry I create a ClientConnect object.
My problem is that when my list of machines contains, let's say, 10 PCs (most of which are not alive), I get a lot of timeout exceptions (on the alive PCs) even though my timeout limit is set to 10 seconds.
If I force the list to contain a single working PC, I have no problem.
The timeouts are pretty random, no clue what's causing them.
All machines are on a local network; the remote servers were also written by me (in C/C++) and have been working in another setup for more than 2 years without any problems.
Am I missing something, or could it be an OS network restriction problem?
I am testing this code on Windows XP SP3. Thanks in advance!
UPDATE:
After creating two new server machines, and keeping one that was getting a lot of timeouts, i have the following results :
For 100 thread runs over 20 minutes :
NEW_SERVER1 : 99 successful connections/ 1 timeouts
NEW_SERVER2 : 94 successful connections/ 6 timeouts
OLD_SERVER : 57 successful connections/ 43 timeouts
Other info :
- I experienced a JRE crash (EXCEPTION_ACCESS_VIOLATION (0xc0000005)) once and had to restart the application.
- I noticed that while the app was running, my network connection was struggling as I was browsing the internet. I have no idea if this is expected, but I think having at most 15 threads is not that much.
So, first of all, my old servers had some kind of problem. No idea what that was, since my new servers were created from the same OS image.
Secondly, although the timeout percentage has dropped dramatically, I still think it is uncommon to get even one timeout in a small LAN like ours. But this could be a problem in the server application.
Finally, my point of view is that, apart from the old server's problem (I still cannot believe I lost so much time with that!), there must be either a server app bug or a JDK-related bug (since I experienced that JRE crash).
P.S. I use Eclipse as my IDE and my JRE is the latest.
If any of the above ring any bells to you, please comment.
Thank you.
-----EDIT-----
Could it be that PrintWriter and/or BufferedReader are not actually thread safe?
----NEW EDIT 09 Sep 2013----
After re-reading all the comments, and thanks to @Gray and his comment:
When you run multiple servers does the first couple work and the rest of them timeout? Might be interesting to put a small sleep in your fork loop (like 10 or 100ms) to see if it works that way.
I rearranged the tree list of the hosts/IPs and got some really strange results.
It seems that if an alive host is placed at the top of the tree list, and is thus the first to start a socket connection, it has no problem connecting and receiving packets without any delay or timeout.
On the contrary, if an alive host is placed at the bottom of the list, with several dead hosts before it, it takes too long to connect, and with my previous timeout of 10 seconds it failed to connect. But after changing the timeout to 60 seconds (thanks to @EJP) I realised that no timeouts are occurring!
It just takes too long to connect (more than 20 seconds on some occasions).
Something is blocking new socket connections, and it isn't that the hosts or the network are too busy to respond.
I have some debug data here, if you would like to take a look :
http://pastebin.com/2m8jDwKL

You could simply check for availability before you connect to the socket. There is an answer that provides a somewhat hackish workaround: https://stackoverflow.com/a/10145643/1809463
Process p1 = java.lang.Runtime.getRuntime().exec("ping -c 1 " + ip);
int returnVal = p1.waitFor();
boolean reachable = (returnVal==0);
by jayunit100
ping is available on both Unix and Windows, but the flags differ: -c 1 is the Unix form, while Windows uses -n 1 for the count.
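As a rough sketch of that ping check, assuming the ping binary is on the PATH: the command is built from the os.name system property so the right count flag is used on each platform. PingCheck, pingCommand and isReachable are hypothetical names, not part of the linked answer. Keep in mind that a host which drops ICMP will look unreachable even if its TCP port is open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

final class PingCheck {
    // Builds a one-packet ping command: "-n 1" on Windows, "-c 1" elsewhere.
    static List<String> pingCommand(String osName, String ip) {
        String countFlag = osName.toLowerCase().startsWith("windows") ? "-n" : "-c";
        return Arrays.asList("ping", countFlag, "1", ip);
    }

    // Runs ping once and reports whether it exited with status 0.
    static boolean isReachable(String ip) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(pingCommand(System.getProperty("os.name"), ip));
        pb.redirectErrorStream(true);
        Process p = pb.start();
        // Drain the output so the child process cannot block on a full pipe.
        try (InputStream in = p.getInputStream()) {
            while (in.read() != -1) {
                // discard
            }
        }
        return p.waitFor() == 0;
    }
}
```

Passing the arguments as a list (instead of one concatenated string) also avoids problems if the address ever comes from untrusted input.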

My problem is that when my list of machines contains lets say 10 pcs (which most of them are not alive), i get a lot of timeout exceptions (in alive pcs) even though my timeout limit is set to 10 seconds.
So as I understand the problem, if you have (for example) 10 PCs in your map and 1 is alive and the other 9 are not online, all 10 connections time out. If you just put the 1 alive PC in the map, it shows up as fine.
This points to some sort of concurrency problem but I can't see it. I would have thought that there was some sort of shared data that was not being locked or something. I see your test code is using Statement and ResultSet. Maybe there is a database connection that is being shared without locking or something? Can you try just returning the result string and printing it out?
Less likely is some sort of network or firewall configuration but the idea that one failed connection would cause another to fail is just strange. Maybe try running your program on one of the servers or from another computer?
If I try your test code, it seems to work fine. Here's the source code for my test class. It has no problems contacting a combination of online and offline hosts.
Lastly some quick comments about your code:
You should close the streams, readers, and sockets in a finally block. Check my test class for a better pattern there.
You should return a small Result class instead of passing back a String that then has to be parsed.
Hope this helps.
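To make those two comments concrete, here is a minimal sketch (not the test class mentioned above). It assumes the reply format implied by the question's code, i.e. fields separated by the \1 character with the number in field index 2. StatusResult, StatusClient, parse and query are hypothetical names. try-with-resources plays the role of the finally block: closing the socket also closes its streams.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical result type replacing the "hostname|value" string.
final class StatusResult {
    final String hostname;
    final boolean ok;   // false when the host could not be queried or the reply was malformed
    final int value;    // only meaningful when ok is true

    StatusResult(String hostname, boolean ok, int value) {
        this.hostname = hostname;
        this.ok = ok;
        this.value = value;
    }
}

final class StatusClient {
    static final String SEP = "\u0001"; // the same character as "\1" in the question

    // Parses one reply line; assumes the number sits in field index 2.
    static StatusResult parse(String hostname, String line) {
        if (line == null) {
            return new StatusResult(hostname, false, -1);
        }
        String[] fields = line.split(SEP);
        if (fields.length < 3) {
            return new StatusResult(hostname, false, -1);
        }
        try {
            return new StatusResult(hostname, true, Integer.parseInt(fields[2]));
        } catch (NumberFormatException e) {
            return new StatusResult(hostname, false, -1);
        }
    }

    // Connects, sends the query and reads one line; the socket is always closed.
    static StatusResult query(String hostname, String ip, int port, int timeoutMillis) {
        try (Socket so = new Socket()) {
            so.connect(new InetSocketAddress(ip, port), timeoutMillis);
            so.setSoTimeout(timeoutMillis); // also bound the wait for the reply
            PrintWriter out = new PrintWriter(so.getOutputStream(), true);
            BufferedReader in = new BufferedReader(new InputStreamReader(so.getInputStream()));
            out.println(SEP + "IDC_UPDATE" + SEP);
            return parse(hostname, in.readLine());
        } catch (IOException e) {
            return new StatusResult(hostname, false, -1);
        }
    }
}
```

With this shape the Callable becomes Callable<StatusResult>, and the main thread no longer has to split strings on "|".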

After a lot of reading and experimentation I will have to answer my own question (if I am allowed to, of course).
Java just can't handle multiple concurrent socket connections without adding a big performance overhead, at least on a Core2Duo/4GB RAM/Windows XP machine.
Creating multiple concurrent socket connections to remote hosts (using, of course, the code I posted) creates some kind of resource bottleneck or blocking situation, which I am still not able to identify.
If you try to connect to 20 hosts simultaneously, and a lot of them are disconnected, then you cannot guarantee a "fast" connection to the alive ones.
You will get connected, but it could be after 20-25 seconds, meaning that you'll have to set the socket timeout to something like 60 seconds (not acceptable for my application).
If an alive host is lucky enough to start its connection attempt first (bearing in mind that the concurrency is not absolute; the for loop is still sequential), then it will probably get connected very fast and get a response.
If it is unlucky, the socket.connect() method will block for some time, depending on how many hosts before it will eventually time out.
After adding a small sleep (100 ms) between the pool.submit(worker) calls I realised that it makes some difference: I get to connect faster to the "unlucky" hosts. But if the list of dead hosts grows, the results are almost the same.
If I edit my host list and place a previously "unlucky" host at the top (before the dead hosts), all problems disappear...
So, for some reason the socket.connect() method creates a form of bottleneck when there are many hosts to connect to and they are not alive. Be it a JVM problem, an OS limitation, or bad coding on my side, I have no clue...
I will try a different coding approach and hopefully tomorrow I will post some feedback.
p.s. This answer made me think of my problem :
https://stackoverflow.com/a/4351360/2025271
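One part of this can be changed independently of whatever is delaying connect(): the loop in the question calls future.get() in submission order, so a reply from a fast host is not processed until every earlier future has finished. A minimal sketch using java.util.concurrent.ExecutorCompletionService hands results back in completion order instead. It does not make connect() itself any faster. CompletionOrder and runAll are hypothetical names, and one thread per task is an assumption that only suits a few dozen hosts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CompletionOrder {
    // Runs all tasks concurrently and returns their results in the order they finish.
    static List<String> runAll(List<Callable<String>> tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            CompletionService<String> done = new ExecutorCompletionService<>(pool);
            for (Callable<String> task : tasks) {
                done.submit(task);
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                try {
                    // take() blocks until the next task, whichever it is, has finished
                    results.add(done.take().get());
                } catch (ExecutionException e) {
                    results.add("failed: " + e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown(); // release the threads once this round is over
        }
    }
}
```

The shutdown() in the finally block matters when the round is repeated every x seconds; otherwise each round would leave another idle pool behind.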

Related

Winsock "connect" hangs. Visual studio reports possible deadlock

I have this code. (Used it in other old project of mine, worked wonderfully)
SOCKET Connect(char * host, int port){
struct sockaddr_in sin = {0};
struct hostent * entry = 0;
SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if(s == INVALID_SOCKET){
return INVALID_SOCKET;
}
entry = gethostbyname(host);
if(entry == 0){
closesocket(s);
return INVALID_SOCKET;
}
sin.sin_addr = *((LPIN_ADDR)*entry->h_addr_list);
sin.sin_family = AF_INET;
sin.sin_port = htons(port);
// The process becomes deadlocked after this line
if( connect(s,(const LPSOCKADDR)&sin,sizeof(SOCKADDR)) == SOCKET_ERROR){
closesocket(s);
return INVALID_SOCKET;
}
return s;
}
I started this morning working on a Delphi project using TTcpClient and Indy's TIdTcpClient wrappers and I noticed the process did not make any connections rather it just hung after calling connect. I then switched to C/C++ and tried with this code which does the same thing. After it hangs, there's no way to kill it (unless when it's being debugged where I had to exit the debugger). TaskManager, Process Explorer didn't do shit.
There are no threads or loops or whatever that may cause it to hang just this code and another function that writes to the socket after it connects.
When debugging with Visual Studio, after some time there's a message (below)
Even Wireshark doesn't show anything at all. Restarted my computer and still the same problem.
So has anyone ever had this problem before?
Used compilers
Visual Studio 2010
Pelles-C
Delphi 7
OS : Windows 7 64 bit, Ultimate
Winsock Version: 2.2
Update:
So I thought I would get away with it and switched to Java, only to find the same problem after a couple of runs. What the hell is wrong here? The Java version takes around 2 minutes to connect, even on localhost. This simple code takes ~2 minutes, during which java.exe can't be killed either.
long startTime = System.currentTimeMillis(), endTime;
Socket clientSock = new Socket("localhost",80); // running Apache on localhost
endTime = System.currentTimeMillis();
Log("Connection time " + (endTime - startTime) + " ms");
clientSock.close();
run:
Connection time 125088 ms
As for Java I did some searches and this problem was a bug in version 1 of the JDK but the change log showed it had been patched. But then again this happens in the underlying winsock library. WHY ? This program connects instantly and it also uses winsock: http://flatassembler.net/examples/quetannon.zip
So now I have to re-write 976 lines of JAVA in assembly just because of this? Help me out here people.
Since you are encountering the same problem in multiple wrappers that all ultimately delegate to Winsock, it's safe to assume that this is an OS issue, not a coding issue. Something on your system has hosed your Winsock installation, or the OS is having networking problems in general, especially since a simple OS reboot did not clear the issue. Try using Windows' command-line netsh tool to reset both the TCP and Winsock subsystems, the command-line ipconfig tool to flush the DNS cache, reboot, and see if the problem continues.
On the coding side, you should implement a timeout on the connect() to avoid further deadlocks. There are two ways to do that:
Put the socket into non-blocking mode and then call select() if connect() returns a WSAEWOULDBLOCK error. If select() times out, close the socket.
Leave the socket in blocking mode and use a separate thread to manage the timeout. Call connect() in the thread, or run your timeout logic in the thread, it does not really matter, but if the timeout elapses while connect() is still running then you can close the socket, aborting connect(). This is the approach that TIdTCPClient uses.
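Since the question later moves to Java, here is a sketch of the first approach translated to java.nio: a non-blocking SocketChannel plus a Selector standing in for select(). This is an illustration under those assumptions, not Winsock code; NonBlockingConnect and canConnect are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

final class NonBlockingConnect {
    // Returns true if a TCP connection could be established within timeoutMillis.
    // timeoutMillis must be positive: Selector.select(0) would wait indefinitely.
    static boolean canConnect(String host, int port, long timeoutMillis) {
        try (Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            channel.configureBlocking(false);
            if (channel.connect(new InetSocketAddress(host, port))) {
                return true; // connected immediately, which can happen on loopback
            }
            channel.register(selector, SelectionKey.OP_CONNECT);
            if (selector.select(timeoutMillis) == 0) {
                return false; // nothing became ready: treat as a timeout
            }
            return channel.finishConnect(); // throws if the connection was refused
        } catch (IOException | UnresolvedAddressException e) {
            return false;
        }
    }
}
```

Both the channel and the selector are closed when the try block exits, so a timed-out attempt does not leave a half-open socket behind.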
Ok. For the JAVA part at least I solved it by using the following code based on the answer here Java Socket creation takes more time.
So basically the default timeout value is (possibly) huge. What I did was set a 3 second timeout; once the timeout exception is thrown, the next call works instantly.
private static final int CONNECT_TIMEOUT = 3000; // 3 seconds
private static String lastException; // message of the most recent failed attempt
private static Socket AttemptConnection(String host, int port) {
Socket temp = new Socket();
try {
temp.connect(new InetSocketAddress(host, port), CONNECT_TIMEOUT);
return temp;
} catch (Exception ex) {
lastException = ex.getMessage();
// close the failed socket so it is not leaked
try { temp.close(); } catch (IOException ignored) {}
return null;
}
}
And somewhere in your code (at least in my app)
while ( (clientSock = AttemptConnection("localhost",80)) == null ){
Log("Attempting connection. Last exception: " + lastException);
try{Thread.sleep(2500);}catch(Exception ex){} /* This is necessary in my application */
}
So looking at this, I think the fix for all the socket implementations (Java, Delphi, etc.) is to set a small timeout value and then connect again.
EDIT:
The root of the problem was found: I have a HIPS program (COMODO Firewall) running on my laptop. If COMODO's cmdagent.exe is active, it shows me an alert for the outgoing connection, which I can accept or deny. If not, it silently denies the connection, and something then becomes deadlocked at a low level. I was worried my PC was effed up.

android socket gets stuck after connection

I am trying to make an app that scans all the IPs in a range for a specific open port (5050) and, if it is open, writes a message to the log.
Here's the code:
public void run(){
for(int i=0;i<256;i++)
{
Log.d("NetworkScanner","attemping to contact 192.168.1."+i);
try {
Socket socket=new Socket(searchIP+i,5050);
possibleClients.add(searchIP+i); // record the full address, not just the prefix
socket.close();
Log.d("NetworkScanner"," 192.168.1."+i+" YEAAAHHH");
} catch (UnknownHostException e) {
Log.d("NetworkScanner"," 192.168.1."+i+" unavailable");
} catch (IOException e) {
e.printStackTrace();
}
}
}
EDIT: here's a new problem: even if a host is found online without the port open, the scanning process (the for loop) is stuck for a long time before moving on to the next one. Scanning each host also takes considerable time!
Phew, the final solution was to create a Socket with the default constructor, build an InetSocketAddress for the host, and then call the socket's connect(SocketAddress, timeout) method with a timeout in milliseconds (approx. 300 ms). That scans each IP in 300 ms or less (less than 200 ms may give errors), and scanning in parallel with multiple threads makes it as fast as 5 seconds to scan all the IPs in the range.
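A minimal sketch of that final approach in plain Java, for anyone landing here: one short connect timeout per host, with the probes run in parallel on a fixed pool. PortScanner, isOpen and scan are hypothetical names, and the thread count is an assumption to tune for the device.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PortScanner {
    // True if host:port accepts a TCP connection within timeoutMillis.
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    // Probes all hosts in parallel and returns those with the port open, in input order.
    static List<String> scan(List<String> hosts, final int port, final int timeoutMillis, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> probes = new ArrayList<>();
            for (final String host : hosts) {
                probes.add(() -> isOpen(host, port, timeoutMillis));
            }
            // invokeAll waits for every probe and keeps the futures in task order
            List<Future<Boolean>> results = pool.invokeAll(probes);
            List<String> open = new ArrayList<>();
            for (int i = 0; i < hosts.size(); i++) {
                try {
                    if (results.get(i).get()) {
                        open.add(hosts.get(i));
                    }
                } catch (ExecutionException e) {
                    // a probe that failed unexpectedly is treated as closed
                }
            }
            return open;
        } finally {
            pool.shutdown();
        }
    }
}
```

For the scan in the question, the host list would be "192.168.1." + i for i from 0 to 255, with port 5050 and a timeout of roughly 300 ms. On Android this has to run off the main thread.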
You are breaking out of the loop when no Exception is thrown.
You need to remove break;
To address your new problem:
Of course it's slow. What did you expect? You are trying to establish a connection to each IP in your subnet, which takes time. It seems you are only trying to figure out what devices are available on the network, so you might be able to decrease the time a little by looking at this answer. He is using the built-in isReachable method, which accepts a timeout value. It will still take some time, but not that much.
Remove the "break;"...it stops the iteration.

Java sockets: best way to retry upon Connection Refused exception?

Right now I'm doing this:
while (true) {
try {
SocketAddress sockaddr = new InetSocketAddress(ivDestIP, ivDestPort);
downloadSock = new Socket();
downloadSock.connect(sockaddr);
this.oos = new ObjectOutputStream(downloadSock.getOutputStream());
this.ois = new ObjectInputStream(downloadSock.getInputStream());
break;
} catch (Exception e) {}
}
downloadSock.connect(sockaddr) will throw a ConnectException (connection refused) if the remote host is not listening on the socket. I'm running my code in a separate thread, so I'm not worried about blocking. Given this, is my method of retrying appropriate, or is there a better way?
Thanks!
It's OK to attempt to connect to a remote server in a loop, and it is actually very common, but make sure that there's a Thread.sleep(ms) in each iteration, or the server host may think that you are attempting a DoS attack.
In this case I usually use a progressively longer sleep period each request.
It could be that the server is almost up, so you just want to try again in a second. But if that request fails, wait 2 seconds, but if that one fails, wait 4, etc.
It may be that you want to cap the amount of waiting to 30 seconds or a minute or something like that. It's probably wise to define the maximum number of tries so you don't just wait indefinitely.
Something like this might calculate your next delay in seconds:
seconds_to_wait = Math.min(60, Math.pow(2, try_num));
Your method will hammer the server with connection requests one after the other. You should include a Thread.sleep() call in your catch block (so it will only be executed if you actually need to wait) in order to wait a couple of seconds before you try again.
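Putting the suggestions above together, a minimal sketch: sleep only after a failed attempt, double the delay each time, cap it at 60 seconds, and give up after a maximum number of tries. RetryingConnector, delaySeconds and connect are hypothetical names; the 60-second cap follows the formula quoted above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class RetryingConnector {
    // Delay before the next attempt: 1, 2, 4, ... seconds, capped at 60.
    static long delaySeconds(int tryNum) {
        return (long) Math.min(60, Math.pow(2, tryNum));
    }

    // Tries to connect up to maxTries times, backing off between failures.
    static Socket connect(String host, int port, int maxTries)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int tryNum = 0; tryNum < maxTries; tryNum++) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port));
                return socket; // the caller owns the socket and must close it
            } catch (IOException e) {
                last = e;
                socket.close();
                if (tryNum < maxTries - 1) {
                    Thread.sleep(delaySeconds(tryNum) * 1000); // wait only after a failure
                }
            }
        }
        throw last != null ? last : new IOException("maxTries must be positive");
    }
}
```

Catching IOException rather than Exception keeps programming errors (for example a NullPointerException) from being silently retried forever.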

Java TCP/IP Server Closing Connections Improperly

I've created an MMO for the Android phone and use a Java server with TCP/IP sockets. Everything generally works fine, but after about a day of clients logging on and off my network becomes extremely laggy -- even if there aren't clients connected. NETSTAT shows no lingering connections, but there is obviously something terribly wrong going on.
If I do a full reboot everything magically is fine again, but this isn't a tenable solution for the long-term. This is what my disconnect method looks like (on both ends):
public final void disconnect()
{
Alive = false;
Log.write("Disconnecting " + _socket.getRemoteSocketAddress());
try
{
_socket.shutdownInput();
}
catch (final Exception e)
{
Log.write(e);
}
try
{
_socket.shutdownOutput();
}
catch (final Exception e)
{
Log.write(e);
}
try
{
_input.close();
}
catch (final Exception e)
{
Log.write(e);
}
try
{
_output.close();
}
catch (final Exception e)
{
Log.write(e);
}
try
{
_socket.close();
}
catch (final Exception e)
{
Log.write(e);
}
}
_input and _output are BufferedInputStream and BufferedOutputStream spawned from the socket. According to documentation calling shutdownInput() and shutdownOutput() shouldn't be necessary, but I'm throwing everything I possibly can at this.
I instantiate the sockets with default settings -- I'm not touching soLinger, KeepAlive, noDelay or anything like that. I do not have any timeouts set on send/receive. I've tried using WireShark but it reveals nothing unusual, just like NETSTAT.
I'm pretty desperate for answers on this. I've put a lot of effort into this project and am frustrated with what appears to be a serious hidden flaw in Java's default TCP implementation.
Get rid of shutdownInput() and shutdownOutput() and all the closes except the close for the BufferedOutputStream, and a subsequent close on the socket itself in a finally block as a belt & braces. You are shutting down and closing everything else before the output stream, which prevents it from flushing. Closing the output stream flushes it and closes the socket. That's all you need.
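A minimal sketch of what that disconnect would look like, assuming the output is a BufferedOutputStream wrapped around socket.getOutputStream() as in the question. ClientConnection and isClosed are hypothetical names, and logging is reduced to System.err here.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.net.Socket;

final class ClientConnection {
    private final Socket socket;
    private final BufferedOutputStream output;

    ClientConnection(Socket socket) throws IOException {
        this.socket = socket;
        this.output = new BufferedOutputStream(socket.getOutputStream());
    }

    void disconnect() {
        try {
            // Flushes any buffered bytes, then closes the socket's output stream,
            // which in turn closes the socket.
            output.close();
        } catch (IOException e) {
            System.err.println("close failed: " + e);
        } finally {
            try {
                socket.close(); // belt and braces: harmless if already closed
            } catch (IOException ignored) {
                // nothing more can be done here
            }
        }
    }

    boolean isClosed() {
        return socket.isClosed();
    }
}
```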
OP here, unable to comment on original post.
Restarting the server process does not appear to resolve the issue. The network remains very "laggy" even several minutes after shutting down the server entirely.
By "laggy" I mean the connection becomes extremely slow with both up and down traffic. Trying to load websites, or upload to my FTP, is painfully slow like I'm on a 14.4k modem (I'm on a 15mbs fiber). Internet Speed Tests don't even work when it is in this state -- I get an error about not finding the file, when the websites eventually load up.
All of this instantly clears up after a reboot, and only after a reboot.
I modified my disconnect method as EJP suggested, but the problem persists.
Server runs on a Windows 7 installation, latest version of Java / Java SDK. The server has 16 GB of RAM, although it's possible I'm not allocating it properly for the JVM to use fully. No stray threads or processes appear to be present. I'll see what JVISUALVM says.
Nothing unusual in JVISUALVM -- 10mb heap, 50% CPU use, 3160 objects (expected), 27 live threads out of 437 started. Server has been running for about 18 hours; loading up CNN's front page takes about a minute, and the normal speed test I use (first hit googling Speed Test) won't even load the page. NETSTAT shows no lingering connections. Ran all up to date antivirus. Server has run 24/7 in the past without any issues -- it is only when I started running this Java server on it that this started to happen.

BindException/Too many file open while using HttpClient under load

I have got 1000 dedicated Java threads where each thread polls a corresponding url every one second.
public class Poller {
public static Node poll(Node node) {
GetMethod method = null;
try {
HttpClient client = new HttpClient(new SimpleHttpConnectionManager(true));
......
} catch (IOException ex) {
ex.printStackTrace();
} finally {
method.releaseConnection();
}
}
}
The threads are run every one second:
for (int i=0; i <1000; i++) {
MyThread thread = threads.get(i); // threads is a static field
if(thread.isAlive()) {
// If the previous thread is still running, let it run.
} else {
thread.start();
}
}
The problem is if I run the job every one second I get random exceptions like these:
java.net.BindException: Address already in use
INFO httpclient.HttpMethodDirector: I/O exception (java.net.BindException) caught when processing request: Address already in use
INFO httpclient.HttpMethodDirector: Retrying request
But if I run the job every 2 seconds or more, everything runs fine.
I even tried shutting down the instance of SimpleHttpConnectionManager() using shutDown() with no effect.
If I do netstat, I see thousands of TCP connections in TIME_WAIT state, which means they have been closed and are being cleaned up.
So to limit the no of connections, I tried using a single instance of HttpClient and use it like this:
public class MyHttpClientFactory {
private static MyHttpClientFactory instance = new MyHttpClientFactory();
private MultiThreadedHttpConnectionManager connectionManager;
private HttpClient client;
private MyHttpClientFactory() {
init();
}
public static MyHttpClientFactory getInstance() {
return instance;
}
public void init() {
connectionManager = new MultiThreadedHttpConnectionManager();
HttpConnectionManagerParams managerParams = new HttpConnectionManagerParams();
managerParams.setMaxTotalConnections(1000);
connectionManager.setParams(managerParams);
client = new HttpClient(connectionManager);
}
public HttpClient getHttpClient() {
if (client != null) {
return client;
} else {
init();
return client;
}
}
}
However after running for exactly 2 hours, it starts throwing 'too many open files' and eventually cannot do anything at all.
ERROR java.net.SocketException: Too many open files
INFO httpclient.HttpMethodDirector: I/O exception (java.net.SocketException) caught when processing request: Too many open files
INFO httpclient.HttpMethodDirector: Retrying request
I should be able to increase the no of connections allowed and make it work, but I would just be prolonging the evil. Any idea what is the best practise to use HttpClient in a situation like above?
Btw, I am still on HttpClient3.1.
This happened to us a few months back. First, double check to make sure you really are calling releaseConnection() every time. But even then, the OS doesn't actually reclaim the TCP connections all at once. The solution is to use the Apache HTTP Client's MultiThreadedHttpConnectionManager. This pools and reuses the connections.
See http://hc.apache.org/httpclient-3.x/performance.html for more performance tips.
Update: Whoops, I didn't read the lower code sample. If you're doing releaseConnection() and using MultiThreadedHttpConnectionManager, consider whether your OS limit on open files per process is set high enough. We had that problem too, and needed to extend the limit a bit.
There is nothing wrong with the first error. You have just depleted the available ephemeral ports. Each TCP connection can stay in the TIME_WAIT state for 2 minutes. You generate 2000/second. Sooner or later, the socket can't find any unused local port and you get that error. TIME_WAIT is designed exactly for this purpose. Without it, your system might hijack a previous connection.
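The arithmetic behind that can be sketched as a rough steady-state estimate: sockets in TIME_WAIT is about new connections per second times the TIME_WAIT duration. The figures below are assumptions taken from this thread (1000 polls per second from the question, each on a fresh connection, and the 2 minutes mentioned above); 65535 is only the ceiling on port numbers, and the real ephemeral range is smaller and OS-dependent. TimeWaitEstimate and its methods are hypothetical names.

```java
final class TimeWaitEstimate {
    // Rough steady-state count of sockets sitting in TIME_WAIT.
    static long socketsInTimeWait(long connectionsPerSecond, long timeWaitSeconds) {
        return connectionsPerSecond * timeWaitSeconds;
    }

    // True if that count exceeds the number of local ports assumed to be available.
    static boolean exhaustsPorts(long connectionsPerSecond, long timeWaitSeconds, long availablePorts) {
        return socketsInTimeWait(connectionsPerSecond, timeWaitSeconds) > availablePorts;
    }
}
```

With 1000 connections per second and 120 seconds of TIME_WAIT, the estimate is 120,000 sockets, well above any possible port range, which is why reusing pooled connections matters more than tuning limits.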
The second error means you have too many sockets open. On some systems, there is a limit of 1K open files. Maybe you just hit that limit due to lingering sockets and other open files. On Linux, you can change this limit using
ulimit -n 2048
But that's limited by a system-wide max value.
As sudo or root edit the /etc/security/limits.conf file. At the end of the file just above “# End of File” enter the following values:
* soft nofile 65535
* hard nofile 65535
This raises the soft and hard limits on open files to 65535 per process; it does not make them unlimited.
