BindException / too many open files while using HttpClient under load - Java

I have 1,000 dedicated Java threads, where each thread polls a corresponding URL every second.
public class Poller {
    public static Node poll(Node node) {
        GetMethod method = null;
        try {
            HttpClient client = new HttpClient(new SimpleHttpConnectionManager(true));
            ......
        } catch (IOException ex) {
            ex.printStackTrace();
        } finally {
            if (method != null) { // guard against an NPE when the method was never created
                method.releaseConnection();
            }
        }
    }
}
The threads are run every second:
for (int i = 0; i < 1000; i++) {
    MyThread thread = threads.get(i); // threads is a static field
    if (thread.isAlive()) {
        // If the previous thread is still running, let it run.
    } else {
        thread.start();
    }
}
The problem is that if I run the job every second, I get random exceptions like these:
java.net.BindException: Address already in use
INFO httpclient.HttpMethodDirector: I/O exception (java.net.BindException) caught when processing request: Address already in use
INFO httpclient.HttpMethodDirector: Retrying request
But if I run the job every 2 seconds or more, everything runs fine.
I even tried shutting down the SimpleHttpConnectionManager instance using shutDown(), with no effect.
If I do netstat, I see thousands of TCP connections in the TIME_WAIT state, which means they have been closed and are still clearing up.
So, to limit the number of connections, I tried using a single instance of HttpClient, like this:
public class HttpClientFactory {
    private static HttpClientFactory instance = new HttpClientFactory();
    private MultiThreadedHttpConnectionManager connectionManager;
    private HttpClient client;

    private HttpClientFactory() {
        init();
    }

    public static HttpClientFactory getInstance() {
        return instance;
    }

    public void init() {
        connectionManager = new MultiThreadedHttpConnectionManager();
        HttpConnectionManagerParams managerParams = new HttpConnectionManagerParams();
        managerParams.setMaxTotalConnections(1000);
        connectionManager.setParams(managerParams);
        client = new HttpClient(connectionManager);
    }

    public HttpClient getHttpClient() {
        if (client != null) {
            return client;
        } else {
            init();
            return client;
        }
    }
}
However, after running for exactly 2 hours, it starts throwing 'too many open files' and eventually cannot do anything at all.
ERROR java.net.SocketException: Too many open files
INFO httpclient.HttpMethodDirector: I/O exception (java.net.SocketException) caught when processing request: Too many open files
INFO httpclient.HttpMethodDirector: Retrying request
I should be able to increase the number of connections allowed and make it work, but I would just be prolonging the evil. Any idea what the best practice is for using HttpClient in a situation like the above?
By the way, I am still on HttpClient 3.1.

This happened to us a few months back. First, double-check that you really are calling releaseConnection() every time. But even then, the OS doesn't actually reclaim the TCP connections all at once. The solution is to use Apache HttpClient's MultiThreadedHttpConnectionManager, which pools and reuses the connections.
See http://hc.apache.org/httpclient-3.x/performance.html for more performance tips.
Update: Whoops, I didn't read the lower code sample. If you're calling releaseConnection() and using MultiThreadedHttpConnectionManager, consider whether your OS limit on open files per process is set high enough. We had that problem too, and needed to extend the limit a bit.
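For reference, a minimal sketch of the pooled setup recommended above, using the HttpClient 3.1 API (the class name and pool limits here are illustrative, not from the original post):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.methods.GetMethod;

public class PooledPoller {
    private static final MultiThreadedHttpConnectionManager MANAGER =
            new MultiThreadedHttpConnectionManager();
    private static final HttpClient CLIENT;

    static {
        // Bound the pool so 1,000 pollers reuse sockets instead of opening new ones.
        MANAGER.getParams().setMaxTotalConnections(1000);
        MANAGER.getParams().setDefaultMaxConnectionsPerHost(2);
        CLIENT = new HttpClient(MANAGER);
    }

    public static String poll(String url) {
        GetMethod method = new GetMethod(url);
        try {
            CLIENT.executeMethod(method);
            return method.getResponseBodyAsString();
        } catch (java.io.IOException ex) {
            return null;
        } finally {
            method.releaseConnection(); // returns the connection to the pool for reuse
        }
    }
}

With every poller sharing this one client, sockets get reused instead of being opened and abandoned into TIME_WAIT once per second.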

There is nothing wrong with the first error: you have simply depleted the available ephemeral ports. Each TCP connection can stay in the TIME_WAIT state for 2 minutes, and you generate about 1,000 new connections per second, which works out to up to 120,000 sockets parked in TIME_WAIT, while a typical default ephemeral port range offers only around 28,000 ports (e.g. 32768-60999 on stock Linux). Sooner or later a new socket can't find any unused local port, and you get that error. TIME_WAIT was designed exactly for this purpose: without it, your system might hijack a previous connection.
The second error means you have too many open sockets. On some systems there is a limit of 1,024 open files per process; maybe you just hit that limit due to lingering sockets and other open files. On Linux, you can change this limit using
ulimit -n 2048
But that's capped by a system-wide maximum.

As root (or via sudo), edit the /etc/security/limits.conf file. At the end of the file, just above “# End of File”, enter the following values:
* soft nofile 65535
* hard nofile 65535
This raises the per-process limit on open files to 65,535 (not literally unlimited, but far above the typical default of 1,024).

Related

How to prevent 'too many open files' errors from CLOSE_WAIT connections

My program is fetching some images from a MinIO server via their Java SDK.
The issue is that even after inputStream.close() the connections remain open from the Java code. I can see them with lsof -p <PID>.
After a while they disappear, but sometimes not fast enough, and the Java server throws 'too many open files' errors.
Is there something like a garbage collector that removes the connections at the operating-system level?
How can I prevent these 'too many open files' errors?
Just in case, here is the code:
public static byte[] getImageByImageBinaryId(String imagId) throws IOException {
    InputStream object = null;
    try {
        object = getMinioClientClient().getObject(ServerProperties.MINIO_BUCKET_NAME, imagId);
        return IOUtils.toByteArray(object);
    } catch (Exception e) {
        log.error(e);
    } finally {
        IOUtils.closeQuietly(object);
    }
    return null;
}
Internally, minio-java uses OkHttp to make HTTP calls. OkHttp, like many HTTP clients, uses a connection pool internally to speed up repeated calls to the same host. If you need connections not to persist, you can pass your own OkHttp client, with your own pooling config, to one of the available constructors, but I do not recommend it.
MinIO should probably expose a close method to clean up these resources, but their expected use case probably involves clients living for the whole life of your application.
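If you do go that route, here is a hedged sketch (it assumes a minio-java version whose builder accepts a custom OkHttpClient; the pool settings are illustrative):

import java.util.concurrent.TimeUnit;

import io.minio.MinioClient;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

public class MinioClientProvider {
    public static MinioClient create(String endpoint, String accessKey, String secretKey) {
        // Keep at most 5 idle connections and evict them after 30 seconds of inactivity,
        // so idle sockets don't linger in lsof and count against the open-file limit.
        OkHttpClient http = new OkHttpClient.Builder()
                .connectionPool(new ConnectionPool(5, 30, TimeUnit.SECONDS))
                .build();
        return MinioClient.builder()
                .endpoint(endpoint)
                .credentials(accessKey, secretKey)
                .httpClient(http)
                .build();
    }
}

A shorter idle timeout makes lingering sockets disappear sooner, at the cost of re-establishing connections more often.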

Why adding socket read timeout doesn't help for socketread0 [duplicate]

Performing millions of HTTP requests with different Java libraries gives me threads hung on:
java.net.SocketInputStream.socketRead0()
which is a native function.
I tried to set up Apache HttpClient and RequestConfig to have timeouts on (I hope) everything that is possible, but still I have (probably infinite) hangs on socketRead0. How do I get rid of them?
The hang ratio is about ~1 per 10,000 requests (to 10,000 different hosts), and it can last practically forever (I've confirmed a thread was still hung after 10 hours).
JDK 1.8 on Windows 7.
My HttpClient factory:
SocketConfig socketConfig = SocketConfig.custom()
        .setSoKeepAlive(false)
        .setSoLinger(1)
        .setSoReuseAddress(true)
        .setSoTimeout(5000)
        .setTcpNoDelay(true).build();
HttpClientBuilder builder = HttpClientBuilder.create();
builder.disableAutomaticRetries();
builder.disableContentCompression();
builder.disableCookieManagement();
builder.disableRedirectHandling();
builder.setConnectionReuseStrategy(new NoConnectionReuseStrategy());
builder.setDefaultSocketConfig(socketConfig);
return builder.build(); // note: returning HttpClientBuilder.create().build() here would discard the configuration above
My RequestConfig factory:
HttpGet request = new HttpGet(url);
RequestConfig config = RequestConfig.custom()
        .setCircularRedirectsAllowed(false)
        .setConnectionRequestTimeout(8000)
        .setConnectTimeout(4000)
        .setMaxRedirects(1)
        .setRedirectsEnabled(true)
        .setSocketTimeout(5000)
        .setStaleConnectionCheckEnabled(true).build();
request.setConfig(config);
return request; // note: returning new HttpGet(url) here would discard the config just set
OpenJDK socketRead0 source
Note: actually I have a "trick": I can schedule .getConnectionManager().shutdown() in another thread, with cancellation of the Future if the request finished properly, but it is deprecated, and it also kills the whole HttpClient, not only that single request.
Though this question mentions Windows, I have the same problem on Linux. It appears there is a flaw in the way the JVM implements blocking socket timeouts:
https://bugs.openjdk.java.net/browse/JDK-8049846
https://bugs.openjdk.java.net/browse/JDK-8075484
To summarize, the timeout for blocking sockets is implemented by calling poll on Linux (and select on Windows) to determine that data is available before calling recv. However, at least on Linux, both methods can spuriously indicate that data is available when it is not, leading to recv blocking indefinitely.
From poll(2) man page BUGS section:
See the discussion of spurious readiness notifications under the BUGS section of select(2).
From select(2) man page BUGS section:
Under Linux, select() may report a socket file descriptor as "ready
for reading", while nevertheless a subsequent read blocks. This could
for example happen when data has arrived but upon examination has
wrong checksum and is discarded. There may be other circumstances
in which a file descriptor is spuriously reported as ready. Thus it
may be safer to use O_NONBLOCK on sockets that should not block.
The Apache HTTP Client code is a bit hard to follow, but it appears that connection expiration is only set for HTTP keep-alive connections (which you've disabled) and is indefinite unless the server specifies otherwise. Therefore, as pointed out by oleg, the Connection eviction policy approach won't work in your case and can't be relied upon in general.
As Clint said, you should consider a non-blocking HTTP client, or (seeing that you are using Apache HttpClient) implement multithreaded request execution to prevent possible hangs of the main application thread (this does not solve the problem, but it is better than having to restart your app because it froze). Anyway, you set the setStaleConnectionCheckEnabled property, but the stale connection check is not 100% reliable; from the Apache HttpClient tutorial:
One of the major shortcomings of the classic blocking I/O model is
that the network socket can react to I/O events only when blocked in
an I/O operation. When a connection is released back to the manager,
it can be kept alive however it is unable to monitor the status of the
socket and react to any I/O events. If the connection gets closed on
the server side, the client side connection is unable to detect the
change in the connection state (and react appropriately by closing the
socket on its end).
HttpClient tries to mitigate the problem by testing whether the
connection is 'stale', that is no longer valid because it was closed
on the server side, prior to using the connection for executing an
HTTP request. The stale connection check is not 100% reliable and adds
10 to 30 ms overhead to each request execution.
The Apache HttpComponents crew recommends the implementation of a Connection eviction policy
The only feasible solution that does not involve a one thread per
socket model for idle connections is a dedicated monitor thread used
to evict connections that are considered expired due to a long period
of inactivity. The monitor thread can periodically call
ClientConnectionManager#closeExpiredConnections() method to close all
expired connections and evict closed connections from the pool. It can
also optionally call ClientConnectionManager#closeIdleConnections()
method to close all connections that have been idle over a given
period of time.
Take a look at the sample code of the Connection eviction policy section and try to implement it in your application along with multithreaded request execution; I think implementing both mechanisms will prevent your undesired hangs.
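For concreteness, a minimal sketch of such an eviction monitor against an HttpClient 4.x PoolingHttpClientConnectionManager (the sleep period and idle threshold are illustrative):

import java.util.concurrent.TimeUnit;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class ConnectionEvictor extends Thread {
    private final PoolingHttpClientConnectionManager manager;
    private volatile boolean shutdown;

    public ConnectionEvictor(PoolingHttpClientConnectionManager manager) {
        this.manager = manager;
        setDaemon(true);
    }

    @Override
    public void run() {
        while (!shutdown) {
            try {
                Thread.sleep(5000);
                manager.closeExpiredConnections();                  // drop connections past their keep-alive
                manager.closeIdleConnections(30, TimeUnit.SECONDS); // drop connections idle too long
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public void shutdown() {
        shutdown = true;
        interrupt();
    }
}

Started once at application startup (new ConnectionEvictor(manager).start()), this keeps the pool from accumulating half-dead connections.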
You should consider a non-blocking HTTP client like Grizzly or Netty, which do not have blocking operations that can hang a thread.
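If you want to stay within HttpComponents, Apache HttpAsyncClient is one such non-blocking option; a minimal sketch (the URL is a placeholder):

import java.util.concurrent.Future;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
import org.apache.http.impl.nio.client.HttpAsyncClients;

public class AsyncGetExample {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpAsyncClient client = HttpAsyncClients.createDefault()) {
            client.start();
            Future<HttpResponse> future = client.execute(new HttpGet("http://example.com/"), null);
            HttpResponse response = future.get(); // block here, in your own code, not inside a socket read
            System.out.println(response.getStatusLine());
        }
    }
}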
I have more than 50 machines that make about 200k requests/day/machine. They are running Amazon Linux AMI 2017.03. I previously had jdk1.8.0_102; now I have jdk1.8.0_131. I am using both Apache HttpClient and OkHttp as scraping libraries.
Each machine was running 50 threads, and sometimes the threads get lost. After profiling with the YourKit Java profiler I got:
ScraperThread42 State: RUNNABLE CPU usage on sample: 0ms
java.net.SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) SocketInputStream.java (native)
java.net.SocketInputStream.socketRead(FileDescriptor, byte[], int, int, int) SocketInputStream.java:116
java.net.SocketInputStream.read(byte[], int, int, int) SocketInputStream.java:171
java.net.SocketInputStream.read(byte[], int, int) SocketInputStream.java:141
okio.Okio$2.read(Buffer, long) Okio.java:139
okio.AsyncTimeout$2.read(Buffer, long) AsyncTimeout.java:211
okio.RealBufferedSource.indexOf(byte, long) RealBufferedSource.java:306
okio.RealBufferedSource.indexOf(byte) RealBufferedSource.java:300
okio.RealBufferedSource.readUtf8LineStrict() RealBufferedSource.java:196
okhttp3.internal.http1.Http1Codec.readResponse() Http1Codec.java:191
okhttp3.internal.connection.RealConnection.createTunnel(int, int, Request, HttpUrl) RealConnection.java:303
okhttp3.internal.connection.RealConnection.buildTunneledConnection(int, int, int, ConnectionSpecSelector) RealConnection.java:156
okhttp3.internal.connection.RealConnection.connect(int, int, int, List, boolean) RealConnection.java:112
okhttp3.internal.connection.StreamAllocation.findConnection(int, int, int, boolean) StreamAllocation.java:193
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(int, int, int, boolean, boolean) StreamAllocation.java:129
okhttp3.internal.connection.StreamAllocation.newStream(OkHttpClient, boolean) StreamAllocation.java:98
okhttp3.internal.connection.ConnectInterceptor.intercept(Interceptor$Chain) ConnectInterceptor.java:42
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RealInterceptorChain.proceed(Request) RealInterceptorChain.java:67
okhttp3.internal.http.BridgeInterceptor.intercept(Interceptor$Chain) BridgeInterceptor.java:93
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(Interceptor$Chain) RetryAndFollowUpInterceptor.java:124
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RealInterceptorChain.proceed(Request) RealInterceptorChain.java:67
okhttp3.RealCall.getResponseWithInterceptorChain() RealCall.java:198
okhttp3.RealCall.execute() RealCall.java:83
I found out that they have a fix for this
https://bugs.openjdk.java.net/browse/JDK-8172578
in JDK 8u152 (early access). I have installed it on one of our machines. Now I am waiting to see some good results.
Given that no one else has responded so far, here is my take:
Your timeout setting looks perfectly OK to me. The reason why certain requests appear to be permanently blocked in a java.net.SocketInputStream#socketRead0() call is likely to be a combination of misbehaving servers and your local configuration. The socket timeout defines a maximum period of inactivity between two consecutive I/O read operations (or, in other words, two consecutive incoming packets). Your socket timeout setting is 5,000 milliseconds. As long as the opposite endpoint keeps on sending a packet every 4,999 milliseconds for a chunk-encoded message, the request will never time out and will end up spending most of its time blocked in java.net.SocketInputStream#socketRead0(). You can find out whether or not this is the case by running HttpClient with wire logging turned on.
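If it helps, one way to turn wire logging on from code, assuming the stock commons-logging/SimpleLog setup described in the HttpClient logging docs (the properties must be set before any HttpClient class is loaded):

public class WireLoggingBootstrap {
    public static void enable() {
        // Route commons-logging to SimpleLog and enable wire-level output for HttpClient 4.x.
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http", "DEBUG");
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http.wire", "DEBUG");
    }
}

With the wire log enabled you can see whether the server really does trickle packets just inside the socket-timeout window.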
For the blocking Apache HTTP Client, I found the best solution is to call getConnectionManager().shutdown().
So, in a high-reliability solution, I just schedule the shutdown in another thread, and in case the request does not complete, I shut it down from that other thread.
I bumped into the same issue using the Apache Commons HttpClient.
There's a pretty simple workaround (which doesn't require shutting the connection manager down):
In order to reproduce it, one needs to execute the request from the question in a new thread, paying attention to the details:
run the request in a separate thread, close the request and release its connection from a different thread, and interrupt the hanging thread
don't run EntityUtils.consumeQuietly(response.getEntity()) in the finally block (because it hangs on a 'dead' connection)
First, add the interface
interface RequestDisposer {
    void dispose();
}
Execute an HTTP request in a new thread
final AtomicReference<RequestDisposer> requestDisposer = new AtomicReference<>(null);
final Thread thread = new Thread(() -> {
    final HttpGet request = new HttpGet("http://my.url");
    final RequestDisposer disposer = () -> {
        request.abort();
        request.releaseConnection();
    };
    requestDisposer.set(disposer);
    try (final CloseableHttpResponse response = httpClient.execute(request)) {
        ...
    } finally {
        disposer.dispose();
    }
});
thread.start();
Call dispose() in the main thread to close the hanging connection:
requestDisposer.get().dispose(); // better check if it's not null first
thread.interrupt();
thread.join();
That fixed the issue for me.
My stacktrace looked like this:
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:253)
at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
For whom it might be interesting: it is easily reproducible; interrupt the thread without aborting the request and releasing the connection (the ratio is about 1/100).
Windows 10, version 10.0.
JDK 8u151, x64.
I feel that all these answers are way too specific.
We have to note that this is probably a real JVM bug. It should be possible to get the file descriptor and close it. All this timeout talk is too high-level: you don't want a timeout to the extent that the connection fails; what you want is the ability to hard-break this stuck thread and stop or interrupt it.
The way the JVM should implement the SocketInputStream.socketRead function is to set some internal default timeout, even as low as 1 second, and when it expires, immediately loop back into socketRead0. While that is happening, Thread.interrupt and Thread.stop can take effect.
The even better way of doing this, of course, is not to do any blocking wait at all, but instead to use the select(2) system call with a list of file descriptors, and when any one has data available, perform the read operation on it.
Just look all over the internet at all these people having trouble with threads stuck in java.net.SocketInputStream#socketRead0; it's the most popular topic about java.net.SocketInputStream, hands down!
So, while the bug is not fixed, I wonder about the dirtiest trick I can come up with to break out of this situation: something like connecting via the debugger interface to get to the stack frame of the socketRead call, grabbing the FileDescriptor, digging into it to get the int fd number, and then making a native close(2) call on that fd.
Do we have a chance to do that? (Don't tell me "it's not good practice") -- if so, let's do it!
I faced the same issue today. Based on @Sergei Voitovich's answer, I've tried to make it work while still using Apache HttpClient.
Since I am using Java 8, it's simpler to make a timeout abort the connection.
Here's a draft of the implementation:
private HttpResponse executeRequest(Request request) {
    InterruptibleRequestExecution requestExecution = new InterruptibleRequestExecution(request, executor);
    ExecutorService executorService = Executors.newSingleThreadExecutor();
    try {
        return executorService.submit(requestExecution).get(<your timeout in milliseconds>, TimeUnit.MILLISECONDS);
    } catch (TimeoutException | ExecutionException e) {
        // Your request timed out; you can throw an exception here if you want
        throw new UsefulExceptionForYourApplication(e);
    } catch (InterruptedException e) {
        // Always remember to restore the interrupt flag after catching InterruptedException
        Thread.currentThread().interrupt();
        throw new UsefulExceptionForYourApplication(e);
    } finally {
        // Force-stop the single-thread pool created by Executors.newSingleThreadExecutor() and
        // abort the pending request inside the thread. If the request is hanging in socketRead0,
        // it will stop and the thread will be terminated.
        forceStopIdleThreadsAndRequests(requestExecution, executorService);
    }
}

private void forceStopIdleThreadsAndRequests(InterruptibleRequestExecution execution,
                                             ExecutorService executorService) {
    execution.abortRequest();
    executorService.shutdownNow();
}
The code above will create a new Thread to execute the request using org.apache.http.client.fluent.Executor. Timeout can be easily configured.
The execution of the thread is defined in InterruptibleRequestExecution which you can see below.
private static class InterruptibleRequestExecution implements Callable<HttpResponse> {
    private final Request request;
    private final Executor executor;
    private final RequestDisposer disposer;

    public InterruptibleRequestExecution(Request request, Executor executor) {
        this.request = request;
        this.executor = executor;
        this.disposer = request::abort;
    }

    @Override
    public HttpResponse call() {
        try {
            return executor.execute(request).returnResponse();
        } catch (IOException e) {
            throw new UsefulExceptionForYourApplication(e);
        } finally {
            disposer.dispose();
        }
    }

    public void abortRequest() {
        disposer.dispose();
    }

    @FunctionalInterface
    interface RequestDisposer {
        void dispose();
    }
}
The results are really good. We've had times when some connections were hanging in socketRead0 for 7 hours! Now it never exceeds the defined timeout, and it's working in production with millions of requests per day without any problems.

java: Good socket timeout for LAN connections?

I have a server (a Java app running on my laptop) and a client (a Java app running on my Android smartphone).
I'm trying to automatically find the IP address of the server from my client.
Right now I just loop over all IPs in the same LAN (192.168.1.0 > 192.168.1.255), and if the server (which is listening on a custom port) accepts the connection, then I have found the IP.
The problem is, if I set the connection timeout to less than 200 ms, most of the time the client can't find the server.
So the question is: how can I implement a better (faster) way to find the server IP?
I have tried the Java InetAddress.isReachable() method, but the server always seems unreachable...
And if there isn't a better way, what do you think is a good timeout value for local (LAN) socket connections?
Just for others... I just found a very good way to find the server IP in less than half a second!
Here is my solution:
String partialIp = "192.168.1.";
int port = 123;
AtomicInteger counter = new AtomicInteger(); // shared counter; a plain int++ would race across the pool threads
volatile boolean found;
volatile String ip;

Runnable tryNextIp = new Runnable() {
    @Override
    public void run() {
        int myIp = counter.getAndIncrement();
        String targetIpTemp = partialIp + myIp;
        Socket socketTemp = new Socket();
        try {
            socketTemp.connect(new InetSocketAddress(targetIpTemp, port), 6000);
            socketTemp.close();
            ip = targetIpTemp;
            found = true;
        } catch (IOException e) {
            try {
                socketTemp.close();
            } catch (IOException e1) {}
        }
    }
};

String findIp() {
    counter.set(0);
    found = false;
    ExecutorService executorService = Executors.newFixedThreadPool(256);
    for (int i = 0; i < 256; i++) {
        if (found)
            break;
        executorService.execute(tryNextIp);
    }
    executorService.shutdown();
    try {
        while (!found && !executorService.isTerminated())
            executorService.awaitTermination(200, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {}
    if (found)
        return ip;
    else
        return null;
}
A good timeout value is the time you're willing to wait for your server to reply, given typical network conditions and server response times. You need to pick a reasonable value, independently of your application here -- it is up to you to decide that if the server does not respond in X amount of time, then it is safe to assume it is not there.
To speed up your client, consider creating multiple threads to query multiple servers at once. Executors.newFixedThreadPool() will make this trivial for you.
However, you may want to consider other alternatives that don't require a full network scan; for example:
Just let the user/administrator specify the IP address (Why do you need to discover the server IP? Do you not know what machine you set up your server on? Why not just configure the server to have a static LAN IP?)
If you truly do need service discovery, technologies like NSD/Zeroconf/Bonjour allow for service advertising and discovery.
Even something very basic may be suitable to your needs, e.g. send a broadcast UDP packet from the client and let the server respond, or have the server periodically broadcast announcements (a rough sketch follows below).
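A rough sketch of the UDP broadcast idea (the port number and payload strings are invented for illustration):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpDiscovery {
    static final int DISCOVERY_PORT = 8888; // hypothetical port

    // Client: broadcast a probe and wait briefly for the server's unicast reply.
    public static InetAddress discoverServer(int timeoutMs) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true);
            socket.setSoTimeout(timeoutMs);
            byte[] probe = "WHO_IS_SERVER".getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(probe, probe.length,
                    InetAddress.getByName("255.255.255.255"), DISCOVERY_PORT));
            DatagramPacket reply = new DatagramPacket(new byte[64], 64);
            socket.receive(reply);     // throws SocketTimeoutException if nobody answers
            return reply.getAddress(); // the server's LAN IP
        }
    }

    // Server: answer every probe so clients learn our address.
    public static void serveForever() throws Exception {
        try (DatagramSocket socket = new DatagramSocket(DISCOVERY_PORT)) {
            byte[] buf = new byte[64];
            while (true) {
                DatagramPacket probe = new DatagramPacket(buf, buf.length);
                socket.receive(probe);
                byte[] ack = "I_AM_SERVER".getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(ack, ack.length,
                        probe.getAddress(), probe.getPort()));
            }
        }
    }
}

The client learns the server's address from the reply packet itself, so no port scan is needed.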
What the socket timeout should be depends entirely on the expected service time of the request. Naively you could find the average service time and use double that for the timeout. If you want to get more accurate, you would need to plot the statistical distribution of service times, determine the standard deviation, and use the average plus three or even four times the standard deviation as the timeout, to make sure you don't get false-positive timeouts but you do detect failures within a reasonable time. Ultimately it depends on just how trigger-happy you want to be.
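As a toy illustration of that rule of thumb (the sample service times are invented):

public class TimeoutEstimator {
    // Compute mean + k * stddev over observed service times (milliseconds).
    public static long estimateTimeoutMs(long[] serviceTimesMs, double k) {
        double mean = 0;
        for (long t : serviceTimesMs) mean += t;
        mean /= serviceTimesMs.length;
        double variance = 0;
        for (long t : serviceTimesMs) variance += (t - mean) * (t - mean);
        variance /= serviceTimesMs.length;
        return Math.round(mean + k * Math.sqrt(variance));
    }

    public static void main(String[] args) {
        long[] samples = {40, 55, 38, 62, 45, 50, 41, 58}; // invented sample of service times
        System.out.println(estimateTimeoutMs(samples, 4)); // mean ~49 ms plus 4 sigma, ~82 ms
    }
}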

Java threaded socket connection timeouts

I have to make simultaneous TCP socket connections every x seconds to multiple machines, in order to get something like a status update packet.
I use a Callable thread class, which creates a future task that connects to each machine, sends a query packet, and receives a reply which is returned to the main thread that creates all the callable objects.
My socket connection class is :
public class ClientConnect implements Callable<String> {
    Connection con = null;
    Statement st = null;
    ResultSet rs = null;
    String hostipp, hostnamee;

    ClientConnect(String hostname, String hostip) {
        hostnamee = hostname;
        hostipp = hostip;
    }

    @Override
    public String call() throws Exception {
        return GetData();
    }

    private String GetData() {
        Socket so = new Socket();
        SocketAddress sa = null;
        PrintWriter out = null;
        BufferedReader in = null;
        try {
            sa = new InetSocketAddress(InetAddress.getByName(hostipp), 2223);
        } catch (UnknownHostException e1) {
            e1.printStackTrace();
        }
        try {
            so.connect(sa, 10000);
            out = new PrintWriter(so.getOutputStream(), true);
            out.println("\1IDC_UPDATE\1");
            in = new BufferedReader(new InputStreamReader(so.getInputStream()));
            String[] response = in.readLine().split("\1");
            out.close(); in.close(); so.close(); so = null;
            try {
                Integer.parseInt(response[2]);
            } catch (NumberFormatException e) {
                System.out.println("Number format exception");
                return hostnamee + "|-1";
            }
            return hostnamee + "|" + response[2];
        } catch (IOException e) {
            try {
                if (out != null) out.close();
                if (in != null) in.close();
                so.close(); so = null;
                return hostnamee + "|-1";
            } catch (IOException e1) {
                // TODO Auto-generated catch block
                return hostnamee + "|-1";
            }
        }
    }
}
And this is the way I create a pool of threads in my main class:
private void StartThreadPool() {
    ExecutorService pool = Executors.newFixedThreadPool(30);
    List<Future<String>> list = new ArrayList<Future<String>>();
    for (Map.Entry<String, String> entry : pc_nameip.entrySet()) {
        Callable<String> worker = new ClientConnect(entry.getKey(), entry.getValue());
        Future<String> submit = pool.submit(worker);
        list.add(submit);
    }
    for (Future<String> future : list) {
        try {
            String threadresult;
            threadresult = future.get();
            //........ PROCESS DATA HERE!..........//
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        }
    }
}
The pc_nameip map contains (hostname, hostip) values, and for every entry I create a ClientConnect thread object.
My problem is that when my list of machines contains, let's say, 10 PCs (most of which are not alive), I get a lot of timeout exceptions (on the alive PCs) even though my timeout limit is set to 10 seconds.
If I force the list to contain a single working PC, I have no problem.
The timeouts are pretty random; no clue what's causing them.
All machines are in a local network; the remote servers are also written by me (in C/C++) and have been working in another setup for more than 2 years without any problems.
Am I missing something, or could it be an OS network restriction problem?
I am testing this code on Windows XP SP3. Thanks in advance!
UPDATE:
After creating two new server machines, and keeping one that was getting a lot of timeouts, I have the following results.
For 100 thread runs over 20 minutes:
NEW_SERVER1 : 99 successful connections/ 1 timeouts
NEW_SERVER2 : 94 successful connections/ 6 timeouts
OLD_SERVER : 57 successful connections/ 43 timeouts
Other info:
- I experienced a JRE crash (EXCEPTION_ACCESS_VIOLATION (0xc0000005)) once and had to restart the application.
- I noticed that while the app was running, my network connection was struggling as I was browsing the internet. I have no idea if this is expected, but I think having at most 15 threads is not that much.
So, first of all, my old servers had some kind of problem. No idea what it was, since my new servers were created from the same OS image.
Secondly, although the timeout percentage has dropped dramatically, I still think it is uncommon to get even one timeout in a small LAN like ours. But this could be a problem in the server application part.
Finally, my point of view is that, apart from the old servers' problem (I still cannot believe I lost so much time with that!), there must be either a server app bug or a JDK-related bug (since I experienced that JRE crash).
p.s. I use Eclipse as my IDE and my JRE is the latest.
If any of the above rings any bells to you, please comment.
Thank you.
-----EDIT-----
Could it be that PrintWriter and/or BufferedReader are not actually thread-safe?!
----NEW EDIT 09 Sep 2013----
After re-reading all the comments, and thanks to @Gray and his comment:
When you run multiple servers does the first couple work and the rest of them timeout? Might be interesting to put a small sleep in your fork loop (like 10 or 100ms) to see if it works that way.
I rearranged the list of hosts/IPs and got some really strange results.
It seems that if an alive host is placed at the top of the list, and is thus the first to start a socket connection, it has no problem connecting and receiving packets, without any delay or timeout.
On the contrary, if an alive host is placed at the bottom of the list, with several dead hosts before it, it just takes too long to connect, and with my previous timeout of 10 secs it failed to connect. But after changing the timeout to 60 seconds (thanks to @EJP) I realised that no timeouts are occurring!
It just takes too long to connect (more than 20 seconds on some occasions).
Something is blocking new socket connections, and it isn't that the hosts or the network are too busy to respond.
I have some debug data here, if you would like to take a look:
http://pastebin.com/2m8jDwKL
You could simply check for availability before you connect to the socket. There is an answer that provides some kind of hackish workaround: https://stackoverflow.com/a/10145643/1809463
Process p1 = java.lang.Runtime.getRuntime().exec("ping -c 1 " + ip);
int returnVal = p1.waitFor();
boolean reachable = (returnVal==0);
by jayunit100
It should work on Unix and Windows, since ping is a common program.
My problem is that when my list of machines contains lets say 10 pcs (which most of them are not alive), i get a lot of timeout exceptions (in alive pcs) even though my timeout limit is set to 10 seconds.
So, as I understand the problem, if you have (for example) 10 PCs in your map, and 1 is alive while the other 9 are not online, all 10 connections time out. If you put just the 1 alive PC in the map, it shows up as fine.
This points to some sort of concurrency problem, but I can't see it. I would have thought that there was some sort of shared data that was not being locked or something. I see your test code is using Statement and ResultSet. Maybe there is a database connection that is being shared without locking or something? Can you try just returning the result string and printing it out?
Less likely is some sort of network or firewall configuration, but the idea that one failed connection would cause another to fail is just strange. Maybe try running your program on one of the servers, or from another computer?
If I try your test code, it seems to work fine. Here's the source code for my test class. It has no problems contacting a combination of online and offline hosts.
Lastly some quick comments about your code:
You should close the streams, readers, and sockets in a finally block. Check my test class for a better pattern there.
You should return a small Result class instead of passing back a String that has to be parsed (see the sketch below).
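For example, a small hypothetical result type (the name and fields are invented) that ClientConnect could return instead of the delimited string:

public final class PollResult {
    private final String hostname;
    private final int statusCode; // -1 signals a failed poll, matching the original convention

    public PollResult(String hostname, int statusCode) {
        this.hostname = hostname;
        this.statusCode = statusCode;
    }

    public String getHostname() { return hostname; }
    public int getStatusCode()  { return statusCode; }
    public boolean isFailure()  { return statusCode == -1; }
}

ClientConnect would then implement Callable<PollResult>, and the main loop could branch on isFailure() without any string parsing.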
Hope this helps.
After a lot of reading and experimentation, I will have to answer my own question (if I am allowed to, of course).
Java just can't handle many concurrent socket connections without adding a big performance overhead, at least on a Core2Duo / 4 GB RAM / Windows XP machine.
Creating multiple concurrent socket connections to remote hosts (using, of course, the code I posted) creates some kind of resource bottleneck, or blocking situation, which I am still not aware of.
If you try to connect to 20 hosts simultaneously, and a lot of them are disconnected, then you cannot guarantee a "fast" connection to the alive ones.
You will get connected, but it could be after 20-25 seconds, meaning that you would have to set the socket timeout to something like 60 seconds (not acceptable for my application).
If an alive host is lucky enough to start its connection attempt first (keeping in mind that concurrency is not absolute; the for loop still has sequentiality), then it will probably connect very fast and get a response.
If it is unlucky, the socket.connect() method will block for some time, depending on how many hosts before it eventually time out.
After adding a small sleep between the pool.submit(worker) calls (100 ms), I realised that it makes some difference: I get to connect faster to the "unlucky" hosts. But if the list of dead hosts grows, the results are almost the same.
If I edit my host list and place a previously "unlucky" host at the top (before the dead hosts), all problems disappear...
So, for some reason the socket.connect() method creates a form of bottleneck when many of the hosts to connect to are not alive. Whether it is a JVM problem, an OS limitation, or bad coding on my side, I have no clue...
I will try a different coding approach, and hopefully tomorrow I will post some feedback.
p.s. This answer made me think about my problem:
https://stackoverflow.com/a/4351360/2025271
