How to limit Jersey 2 connections - java

I am using code that is similar to the code in this question.
A copy of the code from that question follows, with the timeout properties commented out, as they are in my code.
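import javax.ws.rs.ProcessingException;
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.client.WebTarget;
import javax.ws.rs.core.MediaType;

// Wrapper class added so the snippet compiles; the class name is arbitrary.
public class JerseyClientLoop
{
    public static void main(String[] args)
    {
        Client client = ClientBuilder.newClient();
        //client.property(ClientProperties.CONNECT_TIMEOUT, 1000);
        //client.property(ClientProperties.READ_TIMEOUT, 1000);
        WebTarget target = client.target("http://1.2.3.4:8080");
        target = target.queryParam("paramname", "paramvalue");
        target = target.queryParam("paramname2", "paramvalue2");
        try
        {
            for (int i = 0; i < 50; i++)
            {
                // get(String.class) reads the entity as a String; a bare get() returns a Response
                String responseMsg = target.request(MediaType.APPLICATION_XML).get(String.class);
                System.out.println("responseMsg: " + responseMsg);
            }
        }
        catch (ProcessingException pe)
        {
            pe.printStackTrace();
        }
    }
}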
I modified the original code slightly by adding a for-loop. The idea is that Jersey should open only one connection, not 50.
The problem that I have is that the daemon with which I communicate reports that I create a new connection with each call.
How can I have just one connection and then use that for each communication transaction?
At worst, I could close the connection after each call, but that seems silly: creating and closing a connection carries a lot of overhead, on the daemon if nowhere else.
The daemon reports "connection allowed" in its terminal window (CentOS 7, though that should not matter). I usually run the client from my Windows 7 desktop, using Java 8 with Eclipse Luna. Quite frequently the daemon says "maximum number of connections reached" and then proceeds to misbehave.

I have not fully tested this yet, but the answer is in this other Stack Overflow question.
I have to use an ApacheConnectorProvider object.
The Jersey help documentation, section 5.5, states:
In a simple environment, setting the property before creating the first target is sufficient, but in complex
environments (such as application servers), where some poolable connections might exist before your
application even bootstraps, this approach is not 100% reliable and we recommend using a different client
transport connector, such as Apache Connector. These limitations have to be considered especially when
invoking CORS (Cross Origin Resource Sharing) requests.
I am doing cross-origin resource sharing, so the simple method that I used is not reliable. Using the Apache Connector in my small applet worked: I was able to run a for-loop with 500 iterations and had no issues. I just have to try the real code now.
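For readers who want to see what switching connectors might look like, here is a minimal sketch rather than the answerer's actual code. It assumes the jersey-apache-connector module (and the Apache HttpClient 4.x it depends on) is on the classpath. The class name, the pool size of 1, and the reuse of the question's placeholder URL are assumptions made for illustration.

```java
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.client.WebTarget;
import javax.ws.rs.core.MediaType;

import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.glassfish.jersey.apache.connector.ApacheClientProperties;
import org.glassfish.jersey.apache.connector.ApacheConnectorProvider;
import org.glassfish.jersey.client.ClientConfig;

public class PooledJerseyClient {   // hypothetical class name

    public static void main(String[] args) {
        // Assumption: cap the pool at one connection so every request reuses it.
        PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
        pool.setMaxTotal(1);
        pool.setDefaultMaxPerRoute(1);

        // Swap the default HttpURLConnection transport for the Apache connector.
        ClientConfig config = new ClientConfig();
        config.connectorProvider(new ApacheConnectorProvider());
        config.property(ApacheClientProperties.CONNECTION_MANAGER, pool);

        Client client = ClientBuilder.newClient(config);
        try {
            WebTarget target = client.target("http://1.2.3.4:8080")
                    .queryParam("paramname", "paramvalue");
            for (int i = 0; i < 500; i++) {
                // Reading the entity into a String consumes the response, which lets
                // the connector return the connection to the pool for the next call.
                String responseMsg = target.request(MediaType.APPLICATION_XML).get(String.class);
                System.out.println("responseMsg: " + responseMsg);
            }
        } finally {
            client.close();
        }
    }
}
```

Whether the daemon then sees a single connection also depends on the server honoring keep-alive, so it is worth checking the daemon's own log after making the change.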

Related

httpclient Connection reset [duplicate]

I'm creating a (well behaved) web spider and I notice that some servers are causing Apache HttpClient to give me a SocketException -- specifically:
java.net.SocketException: Connection reset
The code that causes this is:
// Execute the request
HttpResponse response;
try {
    response = httpclient.execute(httpget); //httpclient is of type HttpClient
} catch (NullPointerException e) {
    return;//deep down in apache http sometimes throws a null pointer...
}
For most servers it's just fine. But for others, it immediately throws a SocketException.
Example of site that causes immediate SocketException: http://www.bhphotovideo.com/
Works great (as do most websites): http://www.google.com/
Now, as you can see, www.bhphotovideo.com loads fine in a web browser. It also loads fine when I don't use Apache's HTTP Client. (Code like this:)
HttpURLConnection c = (HttpURLConnection)url.openConnection();
BufferedInputStream in = new BufferedInputStream(c.getInputStream());
Reader r = new InputStreamReader(in);
int i;
while ((i = r.read()) != -1) {
    source.append((char) i);
}
So, why don't I just use this code instead? Well there are some key features in Apache's HTTP Client that I need to use.
Does anyone know what causes some servers to cause this exception?
Research so far:
Problem occurs on my local Mac dev machines AND an AWS EC2 Instance, so it's not a local firewall.
It seems the error isn't caused by the remote machine, because the exception doesn't say "by peer".
This Stack Overflow question seems relevant: java.net.SocketException: Connection reset. But the answers don't show why this would happen only from Apache HTTP Client and not other approaches.
Bonus question: I'm doing a fair amount of crawling with this system. Is there generally a better Java class for this other than Apache HTTP Client? I've found a number of issues (such as the NullPointerException I have to catch in the code above). It seems that HTTPClient is very picky about server communications -- more picky than I'd like for a crawler that can't just break when a server doesn't behave.
Thanks all!
Solution
Honestly, I don't have a perfect solution, but it works, so that's good enough for me.
As pointed out by oleg below, Bixo has created a crawler that customizes HttpClient to be more forgiving to servers. To "get around" the issue more than fix it, I just used SimpleHttpFetcher provided by Bixo here:
(link removed - SO thinks I'm a spammer, so you'll have to google it yourself)
SimpleHttpFetcher fetch = new SimpleHttpFetcher(new UserAgent("botname","contact@yourcompany.com","ENTER URL"));
try {
    FetchedResult result = fetch.fetch("ENTER URL");
    System.out.println(new String(result.getContent()));
} catch (BaseFetchException e) {
    e.printStackTrace();
}
The down side to this solution is that there are a lot of dependencies for Bixo -- so this may not be a good work around for everyone. However, you can always just work through their use of DefaultHttpClient and see how they instantiated it to get it to work. I decided to use the whole class because it handles some things for me, like automatic redirect following (and reporting the final destination url) that are helpful.
Thanks for the help all.
Edit: TinyBixo
Hi all. So, I loved how Bixo worked, but didn't like that it had so many dependencies (including all of Hadoop). So, I created a vastly simplified Bixo, without all the dependencies. If you're running into the problems above, I would recommend using it (and feel free to make pull requests if you'd like to update it!)
It's available here: https://github.com/juliuss/TinyBixo
First, to answer your question:
The connection reset was caused by a problem on the server side. Most likely the server failed to parse the request or was unable to process it and dropped the connection as a result without returning a valid response. There is likely something in the HTTP requests generated by HttpClient that causes server side logic to fail, probably due to a server side bug. Just because the error message does not say 'by peer' does not mean the connection reset took place on the client side.
A few remarks:
(1) Several popular web crawlers such as bixo http://openbixo.org/ use HttpClient without major issues, but pretty much all of them had to tweak HttpClient's behavior to make it more lenient about common HTTP protocol violations. By default HttpClient is rather strict about HTTP protocol compliance.
(2) Why did you not report the NPE problem, or any other problem you have been experiencing, to the HttpClient project?
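The point that a "Connection reset" without "by peer" can still originate on the remote side can be checked with plain sockets. The sketch below is an illustration using only the JDK, not HttpClient. It assumes the common platform behavior that closing a socket with SO_LINGER set to 0 sends a TCP RST rather than a FIN. The class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

public class ConnectionResetDemo {   // hypothetical class name

    /** Returns true if the client's read fails with a SocketException after the server aborts. */
    public static boolean clientSeesReset() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {

            // The "server" drops the connection without sending any response.
            Socket accepted = server.accept();
            accepted.setSoLinger(true, 0);   // assumption: linger 0 makes close() abortive (RST)
            accepted.close();

            try {
                client.getInputStream().read();
                return false;                // a clean end of stream, no reset observed
            } catch (SocketException e) {
                // The exact message text depends on the JDK and platform.
                System.out.println("client saw: " + e.getMessage());
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reset caused by the remote side: " + clientSeesReset());
    }
}
```

The client here does nothing wrong, yet it is the one that gets the exception, which matches the answer's reading of the crawler's stack trace.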
These two settings will sometimes help:
client.getParams().setParameter("http.socket.timeout", new Integer(0));
client.getParams().setParameter("http.connection.stalecheck", new Boolean(true));
The first sets the socket timeout to infinite; the second enables the stale-connection check before a connection is reused.
Try getting a network trace using Wireshark, and augment that with log4j logging of the HttpClient. That should show why the connection is being reset.

Tomcat websocket client frame

I have the following code that is executed in java as the clientendpoint of a websocket
protected void dequeue() throws InterruptedException, IOException
{
    ByteBuffer bbuf;
    System.out.println("start");
    while((bbuf = messageQueue.take()).get(0) != 0)
    {
        bbuf.position(bbuf.limit());
        if(bbuf.get(0)== 0)
            System.out.println("here");
        bbuf.flip();
        for(Session session : sessionList)
        {
            //Thread.sleep(10000);
            if(!session.isOpen())
                break;
            session.getBasicRemote().sendBinary(bbuf);
        }
    }
    System.out.println("end");
}
The code works fine when the commented-out Thread.sleep() is put back. Without it, the writes to the websocket sometimes work, and other times @OnClose is called after the first message is written, with the following reason:
CloseReason: code [1002], reason [The client frame set the reserved bits to [7] which was not supported by this endpoint]
The [7] is sometimes a different value, such as 1 or 2. I have not been able to find out why this happens. Does anyone have any insight? Of note: I am using Tomcat 7.0.53 to host the server side of the websocket, and it uses HTTPS instead of HTTP.
This is due to the following bug in Tomcat: https://issues.apache.org/bugzilla/show_bug.cgi?id=57318#c1. Updating to Apache Tomcat 7.0.57 fixes the issue. An earlier release later than 7.0.53 may also fix it, but that has not been tested.

Rather mysterious SocketException with Java 1.6 on CentOS 4

I have a JUnit test of a JAX-RS web service. The test launches embedded tomcat, and then talks to it via the Apache CXF JAX-RS client.
Consider this backtrace:
Caused by: java.net.SocketException: Socket Closed
at java.net.PlainSocketImpl.getOption(PlainSocketImpl.java:286)
at java.net.Socket.getSoTimeout(Socket.java:1032)
at sun.net.www.http.HttpClient.available(HttpClient.java:356)
at sun.net.www.http.HttpClient.New(HttpClient.java:273)
at sun.net.www.http.HttpClient.New(HttpClient.java:310)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:987)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:923)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:841)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1031)
This fails only on CentOS 4.8. The same unit test (which launches an embedded tomcat and then talks to a web service in it) works just fine on a wide variety of other systems. Note the extreme oddity of this backtrace: HttpURLConnection has called HttpClient to get a new connection, and that latter class has apparently closed its own socket before the connection has been returned to where any code of mine could get to it.
Further, the test has friends that do the same server setup of the same service and talk to it without issues.
Even further, the following incantation (slightly abbreviated) is a workaround:
@Before
public void pingServiceToWorkAroundCentos() {
    try {
        /* ... code to make a connection to the service and close it ... */
    } catch (Throwable t) {
        // do nothing
    }
}
In other words, if I arrange for an extra throwaway connection before running each of the test cases, that uses up whatever this problem is.
What could this be?
Since there is only a backtrace and no code here, I am assuming that there is some sort of race condition or bug in which the socket is closed by another thread while the current thread is attempting to get the OutputStream.
Looking at the source for the JDK I see this...
public Object getOption(int opt) throws SocketException {
    if (isClosedOrPending()) {
        throw new SocketException("Socket Closed");
    }
    ... snip ...
The isClosedOrPending method checks whether the internal FD is null or whether a close is pending, i.e. close has been called on the socket.
Good luck tracking it down.
Nothing mysterious about it. You have closed the socket and then continued to use it.
Closing either the input or the output stream of the socket closes the other stream and the socket.
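The claim about streams closing the socket is easy to verify with the JDK alone. The following is a minimal sketch on the loopback interface. It does not reproduce the CentOS failure, and the class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

public class ClosedSocketDemo {   // hypothetical class name

    /** Returns true if closing the output stream closed the socket and further use is rejected. */
    public static boolean closingStreamClosesSocket() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {

            client.getOutputStream().close();   // closes the whole socket, not just the stream

            if (!client.isClosed()) {
                return false;
            }
            try {
                client.getSoTimeout();          // the same call that fails in the backtrace
                return false;
            } catch (SocketException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("socket closed via its stream: " + closingStreamClosesSocket());
    }
}
```

This only shows that a closed socket produces this kind of exception. It does not say who closed the socket in the original test, which is the part the answers disagree on.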
I am pretty sure this is a JDK bug.
HttpClient was modified in a recent commit:
http://hg.openjdk.java.net/jdk7u/jdk7u/jdk/diff/e6dc1d9bc70b/src/share/classes/sun/net/www/http/HttpClient.java
The getSoTimeout() call needs to be in a try/catch block. For now, unfortunately, the only real option is to downgrade the JDK.
Looks similar to an issue we ran into where the httpclient pooled connections were kept alive longer than the corresponding server side connections in tomcat. Basically this results in stale connections in the httpclient connection pool. When httpclient tries to use these, they basically fail. I believe httpclient actually recovers from this using the standard retry handler.
The solution is to double check your timeout settings client and serverside and your retry policy.

How can I ensure that my HttpClient 4.1 does not leak sockets?

My server uses data from an internal web service to construct its response, on a per request basis. I'm using Apache HttpClient 4.1 to make the requests. Each initial request will result in about 30 requests to the web service. Of these, 4 - 8 will end up with sockets stuck in CLOSE_WAIT, which never get released. Eventually these stuck sockets exceed my ulimit and my process runs out of file descriptors.
I don't want to just raise my ulimit (1024), because that will just mask the problem.
The reason I've moved to HttpClient is that java.net.HttpURLConnection was behaving the same way.
I have tried moving to a SingleClientConnManager per request, and calling client.getConnectionManager().shutdown() on it, but sockets still end up stuck.
Should I be trying to solve this so that I end up with 0 open sockets while there are no running requests, or should I be concentrating on request persistence and pooling?
For clarity I'm including some details which may be relevant:
OS: Ubuntu 10.10
JRE: 1.6.0_22
Language: Scala 2.8
Sample code:
val cleaner = Executors.newScheduledThreadPool(1)
private val client = {
  val ssl_ctx = SSLContext.getInstance("TLS")
  val managers = Array[TrustManager](TrustingTrustManager)
  ssl_ctx.init(null, managers, new java.security.SecureRandom())
  val sslSf = new org.apache.http.conn.ssl.SSLSocketFactory(ssl_ctx, SSLSocketFactory.ALLOW_ALL_HOSTNAME_VERIFIER)
  val schemeRegistry = new SchemeRegistry()
  schemeRegistry.register(new Scheme("https", 443, sslSf))
  val connection = new ThreadSafeClientConnManager(schemeRegistry)
  object clean extends Runnable{
    override def run = {
      connection.closeExpiredConnections
      connection.closeIdleConnections(30, SECONDS)
    }
  }
  cleaner.scheduleAtFixedRate(clean,10,10,SECONDS)
  val httpClient = new DefaultHttpClient(connection)
  httpClient.getCredentialsProvider().setCredentials(new AuthScope(AuthScope.ANY), new UsernamePasswordCredentials(username,password))
  httpClient
}
val get = new HttpGet(uri)
val entity = client.execute(get).getEntity
val stream = entity.getContent
val justForTheExample = IOUtils.toString(stream)
stream.close()
Test: netstat -a | grep {myInternalWebServiceName} | grep CLOSE_WAIT
(Lists sockets for my process that are in CLOSE_WAIT state)
Post comment discussion:
This code now demonstrates correct usage.
One needs to pro-actively evict expired / idle connections from the connection pool, as in the blocking I/O model connections cannot react to I/O events unless they are being read from / written to. For details see
http://hc.apache.org/httpcomponents-client-dev/tutorial/html/connmgmt.html#d4e631
I've marked oleg's answer as correct, as it highlights an important usage point about HttpClient's connection pooling.
To answer my specific original question, though, which was "Should I be trying to solve for 0 unused sockets or trying to maximize pooling?": maximize pooling.
Now that the pooling solution is in place and working correctly the application throughput has increased by about 150%. I attribute this to not having to renegotiate SSL and multiple handshakes, instead reusing persistent connections in accordance with HTTP 1.1.
It is definitely worth working to utilize pooling as intended, rather than trying to hack around with calling ThreadSafeClientConnManager.shutdown() after each request etcetera. If, on the other hand, you were calling arbitrary hosts and not reusing routes the way I am you might easily find that it becomes necessary to do that sort of hackery, as the JVM might surprise you with the long life of CLOSE_WAIT designated sockets if you're not garbage collecting very often.
I had the same issue and solved it using the suggestion found here. The author touches on some TCP basics:
When a TCP connection is about to close, its finalization is negotiated by both parties. Think of it as breaking a contract in a civilized manner. Both parties sign the paper and it’s all good. In geek talk, this is done via the FIN/ACK messages. Party A sends a FIN message to indicate it wants to close the socket. Party B sends an ACK saying it received the message and is considering the demand. Party B then cleans up and sends a FIN to Party A. Party A responds with the ACK and everyone walks away.
The problem comes in when B doesn’t send its FIN. A is kinda stuck waiting for it. It has initiated its finalization sequence and is waiting for the other party to do the same.
He then mentions RFC 2616, 14.10 to suggest setting up an http header to solve this issue:
postMethod.addHeader("Connection", "close");
Honestly, I don't really know the implications of setting this header. But it did stop CLOSE_WAIT from happening on my unit tests.
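The FIN/ACK exchange quoted above can be observed from Java with plain sockets. This is a sketch using only the JDK, with made-up class and method names. Party B learns that A has sent its FIN when read() returns -1, and B's socket then sits in CLOSE_WAIT until B closes it as well.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class HalfCloseDemo {   // hypothetical class name

    /** Returns what read() reports on side B after side A has closed; -1 means A's FIN arrived. */
    public static int readAfterPeerClose() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket partyA = new Socket(loopback, server.getLocalPort());
             Socket partyB = server.accept()) {

            partyA.close();   // A sends its FIN

            // From here until partyB is closed, B's socket is in CLOSE_WAIT.
            int result = partyB.getInputStream().read();

            // try-with-resources closes partyB on exit, which sends B's FIN
            // and lets the operating system release the socket.
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read() after peer close: " + readAfterPeerClose());
    }
}
```

In the pooling scenario from the question, the HTTP client plays the role of B: an idle pooled connection that the server has already closed stays in CLOSE_WAIT until something on the client side closes it, which is what evicting expired and idle connections achieves.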
