I'm running a Java microservice issuing many REST client requests using the Apache CXF implementation of the JAX-RS Client class. While repeatedly issuing 10 concurrent requests, I see 350+ map entries inside sun.net.www.http.KeepAliveCache consuming a significant amount of heap memory. This is memory I'd like to free.
In this example I'm issuing the same request to the same host (target) all the time, and I don't understand why that many entries are kept in the cache. Wouldn't roughly 10 be enough? I'm also unable to figure out how connections / client instances / ... can be re-used.
In my code I only create a single instance of the Client class. This is then used with client.target(...).request().get().
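For reference, the pattern looks roughly like this sketch (the URL is a placeholder; this assumes the standard JAX-RS 2.x client API):

    import javax.ws.rs.client.Client;
    import javax.ws.rs.client.ClientBuilder;
    import javax.ws.rs.core.Response;

    public class ClientUsage {
        public static void main(String[] args) {
            Client client = ClientBuilder.newClient(); // single shared instance
            Response response = client.target("https://example.com/api") // placeholder URL
                    .request()
                    .get();
            try {
                String body = response.readEntity(String.class);
                System.out.println(body);
            } finally {
                response.close(); // releases the underlying connection
            }
        }
    }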
I'd be happy to get some hints on how to use a connection manager with CXF - assuming this would help with my problem.
As a side note, a lot of CPU time is spent in SSLSocketImpl.startHandshake. If this can be avoided (it is the same host all the time!), I'd be happy to get some hints on that as well.
EDIT: I close the response objects using a try-finally very soon after reading the entity out of it. I never close the Client instance, though.
EDIT 2: Re-using the same Client instance is not a good idea, as Apache CXF does not provide a thread-safe implementation. As I don't see a performance hit, I'll just create a new Client whenever I need an instance. This is not related to the problem outlined here, though.
EDIT 3: Same issue with Jersey client.
Related
I'd like to know if HttpClient is thread-safe, i.e. whether it can be used by two threads without problems.
Reading the class code, it's not clear to me, as some fields are thread-safe and some are not.
There have been questions about this that were left unanswered:
Is Jetty HttpClient(9.3.8.v20160314) thread-safe and can be used repeatedly?
And some questions seem to say it is:
How to isolate Jetty HttpClient for multiple users?
The design of Jetty's HttpClient is for it to be treated the same as a Web Browser with multiple tabs open.
It is, by design, multi-threaded.
You are encouraged to only have 1 HttpClient instance started and use it repeatedly for all requests you want to make to as many servers as you want.
The Request object you create for each request is unique for that request only and cannot be reused.
There is also an active ConnectionPool maintained for each destination, so that a minimum number of open connections is kept. This is especially useful if you are talking to a secure SSL/TLS server, as the initial handshake to establish the connection is often the slowest part of the request/response exchange.
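As a rough sketch of that intended usage (the URL is a placeholder; the SslContextFactory argument is what enables https in Jetty 9.x):

    import org.eclipse.jetty.client.HttpClient;
    import org.eclipse.jetty.client.api.ContentResponse;
    import org.eclipse.jetty.util.ssl.SslContextFactory;

    public class JettySharedClient {
        public static void main(String[] args) throws Exception {
            // One HttpClient for the whole application, started once.
            HttpClient httpClient = new HttpClient(new SslContextFactory());
            httpClient.start();
            try {
                // A fresh Request object per request; these are not reusable.
                ContentResponse response = httpClient
                        .newRequest("https://example.com/api") // placeholder URL
                        .send();
                System.out.println(response.getStatus());
            } finally {
                httpClient.stop();
            }
        }
    }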
I am in the process of building a client-server application and I would really appreciate some advice on how to design the server-database connection part.
Let's say the basic idea is the following:
Client authenticates himself on the server.
Client sends a request to server.
Server stores client's request to the local database.
In terms of Java Objects we have
Client Object
Server Object
Database Object
So when a client connects to the server, a session is created between them through which all the data is exchanged. Now what bothers me is whether I should create a database object/connection for each client session or whether I should create one database object that will handle all requests.
Thus the two concepts are
Create one database object that handles all client requests
For each client-server session create a database object that is used exclusively for the client.
Going with option 1, I guess that all methods would have to be synchronized in order to prevent one client thread from overwriting the variables of another. However, synchronizing will be time-consuming in the case of many concurrent requests, as each request will be queued until the running one is completed.
Going with option 2 seems a more appropriate solution, but creating a database object for every client-server session consumes memory, and creating a database connection for each client could again lead to problems when the number of concurrently connected users is big.
These are just my thoughts, so please add any comments that may help with the decision.
Thank you
Option 3: use a connection pool. Every time you want to connect to the database, you get a connection from the pool. When you're done with it, you close the connection to give it back to the pool.
That way, you can
have several clients accessing the database concurrently (your option 1 doesn't allow that)
have a reasonable number of connections open and avoid bringing the database to its knees or running out of available connections (your option 2 doesn't allow that)
avoid opening new database connections all the time (your option 2 doesn't allow that). Opening a connection is a costly operation.
Basically all server apps use this strategy. All Java EE servers come with a connection pool. You can also use one in Java SE applications, by using a pool as a library (HikariCP, the Tomcat connection pool, etc.).
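As an illustration, a minimal sketch using HikariCP (the JDBC URL, credentials and table are placeholders):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import javax.sql.DataSource;
    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class PoolExample {
        // One pool for the whole server, created at startup.
        private static final DataSource POOL = createPool();

        private static DataSource createPool() {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:postgresql://localhost:5432/app"); // placeholder URL
            config.setUsername("app");      // placeholder credentials
            config.setPassword("secret");
            config.setMaximumPoolSize(10);  // cap on concurrent DB connections
            return new HikariDataSource(config);
        }

        static void storeRequest(String payload) throws Exception {
            // Borrow a connection; try-with-resources returns it to the pool.
            try (Connection conn = POOL.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO requests (payload) VALUES (?)")) {
                ps.setString(1, payload);
                ps.executeUpdate();
            }
        }
    }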
I would suggest a third option: database connection pooling. This way you create a specified number of connections and hand out the first free connection as soon as one becomes available. This gives you the best of both worlds - there will almost always be free connections available quickly, and you keep the number of connections to the database at a reasonable level. There are plenty of out-of-the-box Java connection pooling solutions, so have a look online.
Just use connection pooling and go with option 2. There are quite a few - C3P0, BoneCP, DBCP. I prefer BoneCP.
Neither option is a good solution.
Problem with Option 1:
You already stated the problems with synchronization when there are multiple threads. But apart from that there are many other problems, like transaction management (when are you going to commit your connection?) and security (all clients could see each other's uncommitted values), just to state a few.
Problem with Option 2:
Two of the biggest problems with this are:
Creating a new connection each and every time takes a lot of time, so performance will become an issue.
Database connections are extremely expensive resources and should be used in limited numbers. If you start creating DB connections for every client, you will soon run out of them, although most of the connections would not be actively used. You will also see your application's performance drop.
The Connection Pooling Option
That is why almost all client-server applications go with the connection pooling solution. You have a set of connections in the pool which are obtained and released appropriately. Almost all Java frameworks have sophisticated connection pooling solutions.
If you are not using any JDBC framework (most use Spring JDBC/Hibernate), read the following article:
http://docs.oracle.com/javase/jndi/tutorial/ldap/connect/pool.html
If you are using any of the popular Java Frameworks like Spring, I would suggest you use Connection Pooling provided by the framework.
I am writing a benchmarking tool to run against a web application. The problem that I am facing is that the first request to the server always takes significantly longer than subsequent requests.
I have experienced this problem with the Apache HTTP client 3.x, 4.x and the Google HTTP client. The Apache HTTP client 4.x shows the biggest difference (the first request takes about seven times longer than subsequent ones); for the Google client and 3.x it is about three times longer.
My tool has to be able to benchmark simultaneous requests using threads. I cannot use one instance of, e.g., HttpClient and call it from all the threads, since this throws a concurrency exception. Therefore, I have to use an individual instance in each thread, which will only execute a single request. This changes the overall results dramatically.
I do not understand this behavior. I do not think that it is due to a caching mechanism on the server because a) the webapp under consideration does not employ any caching (to my knowledge) and b) this effect is also visible when first requesting www.hostxy.com and afterwards www.hostxy.com/my/webapp.
I use System.nanoTime() immediately before and after calling client.execute(get) or get.execute(), respectively.
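For context, a stripped-down version of the measurement loop might look like this (a sketch assuming Apache HttpClient 4.3+; the URL is a placeholder):

    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    public class TimingProbe {
        public static void main(String[] args) throws Exception {
            try (CloseableHttpClient client = HttpClients.createDefault()) {
                HttpGet get = new HttpGet("http://www.hostxy.com/my/webapp"); // placeholder URL
                for (int i = 0; i < 5; i++) {
                    long start = System.nanoTime();
                    try (CloseableHttpResponse response = client.execute(get)) {
                        EntityUtils.consume(response.getEntity()); // read the full body
                    }
                    System.out.printf("request %d: %.1f ms%n",
                            i, (System.nanoTime() - start) / 1_000_000.0);
                }
            }
        }
    }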
Does anyone have an idea where this behavior stems from? Do these httpclients themselves do any caching? I would be very grateful for any hints.
Read this: http://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html for connection pooling.
Your first connection probably takes the longest because it is a keep-alive (Connection: keep-alive) connection, so the following requests can reuse that connection once it has been established. This is just a guess.
Are you hitting a JSP for the first time after server start? If the server flushes its working directory on each start, then on the first hit the JSPs are compiled, which takes a long time.
Also done on the first transaction: if the transaction uses a CA cert trust store, it will be loaded and cached.
You should also read about caching:
http://hc.apache.org/httpcomponents-client-ga/tutorial/html/caching.html
If your problem is that the first HTTP request to a specific host is significantly slower, maybe the cause of this symptom is on the server, while you are concerned about the client.
If the "specific host" that you are calling is an Google App Engine application (or any other Cloud Plattform), it is normal that the first call to that application make you wait a little more. That is because Google put it under a dormant state upon inactivity.
After a recent call (that usually takes longer to respond), the subsequent calls have faster responses, because the server instances are all awake.
Take a look on this:
Google App Engine Application Extremely slow
I hope it helps you, good luck!
I have a multithreaded java program that runs on Amazon's EC2. It queries and fetches data items from a vendor via HttpPost and HttpGet, using a org.apache.http.impl.client.DefaultHttpClient. Concurrently, it pushes the retrieved data items into S3 using AWS's Java SDK.
After a few days of running, I get the symptoms that normally come with http connection leaks:
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection
at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:417)
at org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:391)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
Since both AWS and my requests to the data vendor use Http connections, I am not quite sure where exactly I forget to HttpEntity.consume(), or S3ObjectInputStream.close() (unless it is yet something else...).
So here is my question: are there ways to monitor org.apache.http.impl.conn.tsccm.ConnPoolByRoute so that at least I can detect when I am starting to leak connections/entities not properly consumed/http streams not closed? (I have a feeling it happens only under certain conditions, e.g. when certain exceptions are being thrown, by-passing the logic in my code that consumes HttpEntities, closes streams, etc.) Any idea on how to diagnose what eventually causes all my http connections to fail with that ConnectionPoolTimeoutException would be most welcome. I don't feel like waiting 4+ days between attempts to fix the root cause of the problem.
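For reference, the consumption pattern I am trying to enforce looks like this (a sketch assuming HttpClient 4.1+, where EntityUtils.consume is available):

    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.util.EntityUtils;

    final class LeakSafeFetch {
        static String fetch(HttpClient client, String uri) throws Exception {
            HttpResponse response = client.execute(new HttpGet(uri));
            HttpEntity entity = response.getEntity();
            try {
                return EntityUtils.toString(entity);
            } finally {
                // Runs even when parsing throws, so the connection
                // always goes back to the pool.
                EntityUtils.consume(entity);
            }
        }
    }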
If you're using the PoolingClientConnectionManager note there are the methods getTotalStats() and getStats(final HttpRoute route) which will give you a PoolStats object with the data you're looking to monitor.
Just fetch the ConnectionManager from your httpclient:
PoolingClientConnectionManager poolManager = (PoolingClientConnectionManager) httpClient.getConnectionManager();
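For example, a small helper that logs the totals (a steadily climbing leased count while no requests are in flight is the classic sign of a leak):

    import org.apache.http.impl.conn.PoolingClientConnectionManager;
    import org.apache.http.pool.PoolStats;

    final class PoolMonitor {
        // Call periodically, e.g. from a scheduled task.
        static void logPoolStats(PoolingClientConnectionManager poolManager) {
            PoolStats stats = poolManager.getTotalStats();
            System.out.printf("leased=%d pending=%d available=%d max=%d%n",
                    stats.getLeased(), stats.getPending(),
                    stats.getAvailable(), stats.getMax());
        }
    }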
If you can access the org.apache.http.impl.conn.tsccm.ConnPoolByRoute, then set its connTTL to a low enough value so that its WaitingThreadAborter will eventually terminate a connection. It will show a nice stack trace there. The other option is to use CGLIB or some other bytecode manipulation framework to create a proxy class wrapping org.apache.http.impl.conn.tsccm.ConnPoolByRoute. Depending on your environment it might not be that easy to set up, but it's a rather valuable tool for debugging issues like yours. (And yes, if you happen to use Spring or just plain aspects, the setup will be super easy :) )
My web app is running on 64-bit Java 6.0.23, Tomcat 6.0.29 (with Apache Portable Runtime 1.4.2), on Linux (CentOS). Tomcat's JAVA_OPTS includes -Xincgc, which is supposed to help prevent long garbage collections.
The app is under heavy load and has intermittent failures, and I'd like to troubleshoot it.
Here is the symptom: Very intermittently, an HTTP client will send an HTTP request to the web app and get an empty response back.
The app doesn't use a database, so it's definitely not a problem with JDBC connections. So I figure the problem is perhaps one of the following: memory (perhaps long garbage collections), running out of threads, or running out of file descriptors.
I used javamelody to view the number of threads being used, and it seems that maxThreads is set high enough that we are not running out of threads. Similarly, we have the number of available file descriptors set to a very high number.
The app does use a lot of memory. Does it seem like memory is probably the culprit here, or is there something else that I might be overlooking?
I guess my main confusion, though, is why garbage collections would cause HTTP requests to fail. Intuitively, I would guess that a long garbage collection might cause an HTTP request to take a long time to run, but I would not guess that a long garbage collection would cause an HTTP request to fail.
Additional info in response to Jon Skeet's comments...
The client is definitely not timing out. The empty response happens fairly quickly. When it fails, there is no data and no HTTP headers.
I very much doubt that garbage collection is responsible for the issue.
You really really need to find out exactly what this "empty response" consists of:
Does the server just drop the connection?
Does the client perhaps time out?
Does the server give a valid HTTP response but with no data?
Each of these could suggest very different ways of finding out what's going on. Determining the failure mode should be your primary concern, IMO. Until you know that, it's complete guesswork.