We start the Jetty Server in Java and have a pretty straightforward process for that: the API gets called, the server starts. There is no magic happening while starting Jetty; it's a static process that shouldn't depend on anything. However, we experience issues on some startups, where Jetty tries to open (presumably endlessly many) Connectors and maxes out the ThreadPool's size. OK runs have exactly one Server Connector, as I would expect with a 4-core CPU and no Server Connector count set.
We also tried to work around the problem by setting the number of Server Connectors (to one), but Jetty would still ramp up the count and fail to start because there were not enough threads available.
Even more curious: another API user (a different application) has never had this issue. This has all been tested on the same machine, same OS, often even on the same day.
We use Jetty 9.4.38. This is what the Exception says:
could not subscribe connector
java.lang.IllegalStateException: Insufficient configured threads: required=200 < max=200 for QueuedThreadPool[qtp370296980]#16124894{STARTED,8<=144<=200,i=0,r=-1,q=0}[ReservedThreadExecutor#3825f21{s=0/6,p=0}]
at org.eclipse.jetty.util.thread.ThreadPoolBudget.check(ThreadPoolBudget.java:165)
at org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseTo(ThreadPoolBudget.java:141)
at org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseFrom(ThreadPoolBudget.java:191)
at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:320)
at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81)
at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:234)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
The ThreadPoolBudget being reported indicates that you have some combination of technologies and/or components that together require a thread load of 200 threads, but your max threads is exactly 200.
The output from INFO logging of the named logger org.eclipse.jetty.util.thread.ThreadPoolBudget will report the lease / budget of the various components you have.
Output will have a format indicating ...
INFO: <component> requires <num> threads from <thread-pool>
That will tell you what components are requesting that amount of threads.
The <component> will include the hashcode / object identity of the component. If you see the same one repeated (e.g. Foo#1234 and Foo#789 would be different object instances, but Bar#575 and Bar#575 would be the same object instance), then you have a bad initialization of your Server (likely not using the LifeCycle properly).
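If the budget log shows the same connector (or another component) being leased more than once, it usually helps to build the Server explicitly and let it own its components, started exactly once through the LifeCycle. A minimal sketch of an embedded Jetty 9.4 setup, with illustrative port and pool sizes:

import org.eclipse.jetty.server.Connector;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class EmbeddedJetty
{
    public static void main(String[] args) throws Exception
    {
        // Size the pool explicitly so the ThreadPoolBudget has headroom
        // above what the connector leases for acceptors/selectors.
        QueuedThreadPool threadPool = new QueuedThreadPool(400, 8);

        Server server = new Server(threadPool);

        // One connector, created once and handed to the server.
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);
        server.setConnectors(new Connector[]{connector});

        server.start();
        server.join();
    }
}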
Related
I implemented a HttpSessionListener which counts the active sessions of our JSF WebApp. That works fine.
Is there a way to find out how many connections are allowed? Our JSF web app is running on several Tomcats with different settings.
What I need is a Java method that returns the maximum allowed connections of the environment, so I can show a warning if the connections are about to run out.
Any ideas?
Thanks a lot
Feels like you should rather configure this in your monitoring environment, e.g. monitoring JMX.
The maximum number of connections (something completely different from number of sessions) can be configured in server.xml - check maxConnections:
The maximum number of connections that the server will accept and
process at any given time. When this number has been reached, the
server will accept, but not process, one further connection. This
additional connection will be blocked until the number of connections being
processed falls below maxConnections at which point the server will
start accepting and processing new connections again. Note that once
the limit has been reached, the operating system may still accept
connections based on the acceptCount setting. The default value is
8192.
For NIO/NIO2 only, setting the value to -1, will disable the
maxConnections feature and connections will not be counted.
What you see here is that it's not only dependent on Tomcat, but also on your operating system. And, of course, on your application: it might fold a lot earlier than that number.
I consider a system that only monitors the number of connections, and not the other resources, to not be well tuned. If you figure out that your maximum configured connections are saturated but you could technically "survive" twice as many: who configured it? And if your page load time exceeds the acceptable threshold at half of the configured connections: who cares how many connections are still available?
And if a reverse proxy comes in, that could queue some additional connections as well...
To come back to your original question: JMX might be a solution (find the relevant bean). Or, if you want to go homegrown, a servlet filter can keep track of the number of currently handled requests.
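For the homegrown route, here is a minimal sketch of such a filter (the class name is made up for illustration); register it in web.xml or via @WebFilter and compare the counter against whatever limit you configured:

import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import javax.servlet.*;

// Counts the requests that are currently being processed by this web app.
public class InFlightRequestFilter implements Filter {

    private static final AtomicInteger IN_FLIGHT = new AtomicInteger();

    public static int currentInFlight() {
        return IN_FLIGHT.get();
    }

    @Override
    public void init(FilterConfig filterConfig) { }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        IN_FLIGHT.incrementAndGet();
        try {
            chain.doFilter(req, res);
        } finally {
            // Decrement even if the request threw, so the counter stays accurate.
            IN_FLIGHT.decrementAndGet();
        }
    }

    @Override
    public void destroy() { }
}

For the JMX route, Tomcat exposes its connectors under Catalina:type=ThreadPool; a rough sketch of reading the busy/max thread counts (attribute names can vary a bit between Tomcat versions, so verify against yours):

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ConnectorStats {
    public static void print() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : mbs.queryNames(new ObjectName("Catalina:type=ThreadPool,*"), null)) {
            System.out.println(name
                    + " busy=" + mbs.getAttribute(name, "currentThreadsBusy")
                    + " max=" + mbs.getAttribute(name, "maxThreads"));
        }
    }
}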
I am working on an enterprise Java application which already has a lot of tools/frameworks in it, such as Struts, JAX-RS and Spring MVC. It contains UIs and REST endpoints bundled together in a .war file.
The project is evolving and we are getting rid of older tools, striving to stick with only Spring MVC/WebFlux.
The application performs searches over millions of XML/JSON records, and recently the search engine was switched from MarkLogic to Elasticsearch.
What we have noticed is that in production, with not that heavy usage (up to 1.7k rpm on 2-4 application nodes), response times on some of the endpoints increase over time.
Elasticsearch has room to grow and does not show any signs of heavy load.
So currently we have to restart/replace the prod instances every week or two, once the average response time goes over 3 seconds instead of the regular 200-300 millis.
I tried to get CPU and heap flame graphs using async-profiler, but the load profile changes on every measurement because we have a bunch of features available, so I cannot really compare how the graphs change over time.
Can you advise me on some tactics/approaches for finding the proper place in the code?
Found the issue. It is related to connection pooling (which in turn fills up Tomcat's thread pool).
What we have noticed is that, over time, the number of active Tomcat threads was growing together with the response times:
On the image you can also see that the server was restarted on May 9th.
I was able to get a heap dump before the server was restarted, and after some digging I found an interesting repeated piece in the thread dump:
Thread xxx
at sun.misc.Unsafe.park(ZJ)V (Native Method)
at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()V (AbstractQueuedSynchronizer.java:2039)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(Ljava/lang/Object;Ljava/lang/Object;JLjava/util/concurrent/TimeUnit;Ljava/util/concurrent/Future;)Lorg/apache/http/pool/PoolEntry; (AbstractConnPool.java:377)
at org.apache.http.pool.AbstractConnPool.access$200(Lorg/apache/http/pool/AbstractConnPool;Ljava/lang/Object;Ljava/lang/Object;JLjava/util/concurrent/TimeUnit;Ljava/util/concurrent/Future;)Lorg/apache/http/pool/PoolEntry; (AbstractConnPool.java:67)
at org.apache.http.pool.AbstractConnPool$2.get(JLjava/util/concurrent/TimeUnit;)Lorg/apache/http/pool/PoolEntry; (AbstractConnPool.java:243)
at org.apache.http.pool.AbstractConnPool$2.get(JLjava/util/concurrent/TimeUnit;)Ljava/lang/Object; (AbstractConnPool.java:191)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(Ljava/util/concurrent/Future;JLjava/util/concurrent/TimeUnit;)Lorg/apache/http/HttpClientConnection; (PoolingHttpClientConnectionManager.java:282)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(JLjava/util/concurrent/TimeUnit;)Lorg/apache/http/HttpClientConnection; (PoolingHttpClientConnectionManager.java:269)
at org.apache.http.impl.execchain.MainClientExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (MainClientExec.java:191)
at org.apache.http.impl.execchain.ProtocolExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(Lorg/apache/http/HttpHost;Lorg/apache/http/HttpRequest;Lorg/apache/http/protocol/HttpContext;)Lorg/apache/http/client/methods/CloseableHttpResponse; (InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(Lorg/apache/http/client/methods/HttpUriRequest;Lorg/apache/http/protocol/HttpContext;)Lorg/apache/http/client/methods/CloseableHttpResponse; (CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(Lorg/apache/http/client/methods/HttpUriRequest;)Lorg/apache/http/client/methods/CloseableHttpResponse; (CloseableHttpClient.java:108)
at io.searchbox.client.http.JestHttpClient.executeRequest(Lorg/apache/http/client/methods/HttpUriRequest;)Lorg/apache/http/client/methods/CloseableHttpResponse; (JestHttpClient.java:136)
at io.searchbox.client.http.JestHttpClient.execute(Lio/searchbox/action/Action;Lorg/apache/http/client/config/RequestConfig;)Lio/searchbox/client/JestResult; (JestHttpClient.java:70)
at io.searchbox.client.http.JestHttpClient.execute(Lio/searchbox/action/Action;)Lio/searchbox/client/JestResult; (JestHttpClient.java:63)
...
In our case we are using the Jest library to talk to Elasticsearch.
Internally it uses the Apache HTTP Client and the Apache HTTP Async Client.
As you can see from the thread dump, this thread was waiting for an available connection from the HTTP client's connection pool. And there were more threads with exactly the same stack.
What I also discovered is that we set maxTotal (maximum total number of connections) to 20 and defaultMaxPerRoute (maximum connections per route) to 2:
By default the pool allows only 20 concurrent connections in total and two concurrent connections per a unique route. The two connection limit is due to the requirements of the HTTP specification. However, in practical terms this can often be too restrictive.
See Connection pools description.
So the fix I made was to increase those values to 50 and 40, respectively.
I would still prefer to have these parameters unbounded, growing with usage, but for now we will stick to these values.
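For reference, a sketch of where those limits live when the client is built through Jest's factory. The builder method names below follow Jest's HttpClientConfig API (double-check them against your Jest version), and the Elasticsearch URL is just a placeholder:

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;

public class JestClientSetup {
    public static JestClient build() {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig.Builder("http://elasticsearch:9200")
                .multiThreaded(true)
                .maxTotalConnection(50)                 // pool-wide cap, default is 20
                .defaultMaxTotalConnectionPerRoute(40)  // per-route cap, default is 2
                .build());
        return factory.getObject();
    }
}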
I need to measure my web application's performance and load test it. The application currently has a Tomcat configuration with a maximum of 25 threads, and there are two such servers.
Does it mean that I should do load testing for 50 concurrent requests?
And what happens when there are more requests; do they go to the thread waiting queue in Tomcat?
If they go to a thread wait queue, can I test the application with more than 50 requests?
Tomcat can work in 2 modes:
BIO (blocking I/O), where one thread can serve at most one connection
NIO (non-blocking I/O), where one thread can serve many more connections
Most probably your application is using the latter; check out the Understanding the Tomcat NIO Connector and How to Configure It guide for an overview. On the other hand, even with the BIO connector the application might still be able to operate fast enough to serve more than 50 users.
In both cases, you should treat your backend as a "black box" (imagine you don't know anything about the configuration) and focus on testing non-functional requirements.
The essential performance testing types you should be considering are:
Load Testing: check how your system behaves when the anticipated number of concurrent users is using it.
Soak Testing: the same, but with a longer test duration, i.e. overnight or over a weekend. This way you will be able to see whether there are memory leaks, how log rotation works, whether the application cleans up after itself so it won't run out of disk space, etc.
Stress Testing: the process of identifying the boundaries of your application, i.e. start with 1 virtual user and increase the load until the application response time exceeds reasonable boundaries or errors start occurring.
See Why ‘Normal’ Load Testing Isn’t Enough for more information.
We're using Glassfish 3.0.1 and experiencing very long response times: on the order of 5 minutes for 25% of our POST/PUT requests. By the time the response comes back, the front-facing load balancer has timed out.
My theory is that the requests are queuing up and waiting for an available thread.
The reason I think this is that the access logs reveal the requests themselves take only a few seconds to complete, yet the times at which they are executed are five minutes later than I'd expect.
Does anyone have any advice for debugging what is going on with the thread pools? Or what the optimum settings should be for them?
Is it necessary to take thread dumps periodically, or will a one-off dump be sufficient?
At first glance, this seems to have very little to do with the threadpools themselves. Without knowing much about the rest of your network setup, here are some things I would check:
Is there a dead/nonresponsive node in the load balancer pool? This can cause all requests to be tried against this node until they fail due to timeout before being redirected to the other node.
Is there some issue with initial connections between the load balancer and the Glassfish server? This can be slow or incorrect DNS lookups (though the server should cache results), a missing proxy, or some other network-related problem.
Have you checked that the clocks are synchronized between the machines? This could cause the logs to get out of sync. 5min is a pretty strange timeout period.
If all these come up empty, you may simply have an impedance mismatch between the load balancer and the web server and you may need to add webservers to handle the load. The load balancer should be able to give you plenty of stats on the traffic coming in and how it's stacking up.
Usually you get this behaviour if you have not configured enough worker threads in your server. Default values range from 15 to 100 threads in common web servers. However, if your application blocks the server's worker threads (e.g. by waiting for queries), the defaults are frequently way too low.
You can increase the number of workers up to 1000 without problems (make sure you are on 64-bit). Also check the number of worker threads (sometimes referred to as 'max concurrent/open requests') of any in-between server (e.g. a proxy or an Apache forwarding via mod_proxy).
Another common pitfall is your software sending requests to itself (e.g. trying to reroute or forward a request) while blocking an incoming request.
Taking thread dumps is the best way to debug what is going on with the thread pools. Please take 3-4 thread dumps one after another, with a 1-2 second gap between each dump.
From the thread dumps you can identify the worker threads by their names. Find the long-running threads by comparing the multiple thread dumps.
You may use the TDA tool (http://java.net/projects/tda/downloads/download/tda-bin-2.2.zip) for analyzing the thread dumps.
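If you cannot easily run jstack against the process, here is a minimal sketch of capturing a dump from inside the JVM with the standard ThreadMXBean API (call it a few times with a short pause in between, as suggested above):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    public static void dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // true/true also reports locked monitors and ownable synchronizers.
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            // Note: ThreadInfo.toString() truncates very deep stacks;
            // format getStackTrace() yourself if you need full depth.
            System.out.print(info);
        }
    }
}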
I created a web service, both client and server, and I want to do performance testing on it. I tried JMeter with a sample test plan. Up to 3000 requests JBoss processed them, but above 3000 some of the requests were not processed (in the sense of "Can't open connection: Connection refused"). Where do I have to make changes to handle more than 10000 requests at the same time? Is it a JBoss issue or system throughput?
JMeter config: 300 threads, 1 sec ramp-up and 10 loops.
System (server config): Windows 7, 4 GB RAM
Where do I have to make changes to handle more than 10000 requests at the same time?
10 thousand concurrent requests in Tomcat (which I believe is used inside JBoss) is quite a lot. In a typical setup (with the blocking IO connector) you need one thread per HTTP connection, which is way too much for an ordinary JVM: on a 64-bit server machine one thread needs about 1 MiB of stack by default (check out the -Xss parameter), so 10,000 threads would need roughly 10 GiB for stacks alone, and you only have 4 GiB of RAM.
Moreover, the number of context switches will kill your performance. You would need hundreds of cores to effectively handle all these connections. And if your requests are I/O- or database-bound, you will see a bottleneck elsewhere.
That being said, you need a different approach: either try non-blocking I/O or asynchronous servlets (available since Servlet 3.0), or... scale out. By default Tomcat can handle 100-200 concurrent connections (a reasonable default) and a similar number of connections is queued; everything above that is rejected, which is probably what you are experiencing.
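If you go the asynchronous servlet route, here is a minimal sketch of the idea (the servlet path, the pool size and the simulated work are placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/slow", asyncSupported = true)
public class SlowServlet extends HttpServlet {

    // The work runs on this small application pool instead of holding a container thread.
    private final ExecutorService executor = Executors.newFixedThreadPool(10);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();
        executor.submit(() -> {
            try {
                Thread.sleep(2000); // stand-in for slow I/O or a backend call
                ctx.getResponse().getWriter().println("done");
            } catch (Exception e) {
                // real code should log this
            } finally {
                ctx.complete(); // hands the response back to the container
            }
        });
    }
}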
See also
Advanced IO and Tomcat
Asynchronous Support in Servlet 3.0
There are two common problems that I can think of.
First, if you run JBoss on Linux as a normal user, you can run into 'Too many open files' if you did not edit the limits.conf file. See https://community.jboss.org/thread/155699. Each open socket counts as an 'open file' on Linux, so the OS could block your connections because of this.
Second, the maximum thread pool size for incoming connections is 200 by default. This limits the number of concurrent requests, i.e. requests that are in progress at the same time. If you have JMeter running 300 threads, the JBoss connector thread pool should be larger. In JBoss 6 you can find this in jboss-web.sar/server.xml; look for 'maxThreads' in the Connector element: http://docs.jboss.org/jbossweb/latest/config/http.html.
200 is the recommended maximum for a single-core CPU. Above that, the context switches start to add too much overhead, as Tomasz says. So for production use, only increase it to 400 on a dual core, 800 on a quad core, etc.