Tomcat performance issue with many simultaneous connections and scaling - java

I am running a Tomcat 7.0.55 instance with a Spring REST service behind it on an Ubuntu 14.04 LTS server. I am doing performance tests with Gatling. I have created a simulation using a front-end application that accesses the REST backend.
My config is:
Total RAM: 512MB, 1 CPU, JVM options: -Xms128m -Xmx312m -XX:PermSize=64m -XX:MaxPermSize=128m
The environment may not look powerful, but as long as I stay below the limit of ~700 users (I process 90k requests in 7 minutes), all requests are processed successfully and very quickly.
I start having issues when there are too many connections at the same time. The failing scenario is around 120k requests in 7 minutes, with roughly 800 concurrent users in play. Up to 600-700 users everything goes fine, but beyond that limit I start getting exceptions:
java.util.concurrent.TimeoutException: Request timed out to /xxx.xxx.xxx.xxx:8080 of 60000 ms
at com.ning.http.client.providers.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43) [async-http-client-1.8.12.jar:na]
at com.ning.http.client.providers.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:43) [async-http-client-1.8.12.jar:na]
at org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:556) [netty-3.9.2.Final.jar:na]
at org.jboss.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:632) [netty-3.9.2.Final.jar:na]
at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:369) [netty-3.9.2.Final.jar:na]
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.9.2.Final.jar:na]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_55]
12:00:50.809 [WARN ] c.e.e.g.h.a.GatlingAsyncHandlerActor - Request 'request_47'
failed : GatlingAsyncHandlerActor timed out
I thought this could be related to the small JVM. However, when I upgraded the environment to:
Total RAM: 2GB, 2CPUs, JVM options: -Xms1024m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m
I still get very similar results. The difference in failed requests is insignificant.
I've been playing with the Tomcat connector settings with no effect. The current Tomcat settings are:
<Connector port="8080" protocol="org.apache.coyote.http11.Http11Protocol"
           enableLookups="false"
           maxThreads="400" minSpareThreads="60" maxSpareThreads="200"
           maxConnections="8092"
           connectionTimeout="20000" keepAliveTimeout="10000"
           redirectPort="8443" />
Manipulating the number of threads, connections, and keepAliveTimeout didn't help at all in getting 800 concurrent users to work without timeouts. I was planning to scale the app to handle at least 2k concurrent users, but so far vertical scaling and upgrading the environment has given me no results. I also do not see any memory issues in jvisualvm. The OS shouldn't be a limit; the ulimits are set to either unlimited or high values. The DB is not a bottleneck, as the REST service uses internal caches.
It seems like Tomcat is not able to handle more than 800 connected users in my case. Do you have any ideas of how these issues could be addressed? I would like to be able to scale up to at least 2k users and keep the failure rate as low as possible. I will appreciate any thoughts and tips on how to work this out. If you need more details, please leave a comment.
Cheers
Adam

Did you increase the open-file limit? Every connection consumes an open file descriptor.
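For example (a sketch; "tomcat" here stands for whichever user runs the server): check the current soft limit with ulimit -n, and raise it permanently by adding lines like these to /etc/security/limits.conf:
tomcat  soft  nofile  65536
tomcat  hard  nofile  65536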

You are probably hitting the limit on TCP connections given that you are creating so many in such a short time. By default Linux waits a while before cleaning up connections. After the test fails, run netstat -ant | grep WAIT | wc -l and see if you are close to 60,000. If so, that indicates you can do some tuning of the TCP stack. Try changing the following sysctl settings:
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_fin_timeout = 5
You can also try some other settings mentioned in this ServerFault question.
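For example, to apply those values on the fly and keep them across reboots (a sketch, assuming root/sudo access on a stock Ubuntu install):
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=15
sudo sysctl -w net.ipv4.tcp_keepalive_probes=5
sudo sysctl -w net.ipv4.tcp_fin_timeout=5
# add the same three keys to /etc/sysctl.conf, then reload:
sudo sysctl -p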

Related

How can we manage our memory more effectively with RunDeck 3.4.1 and Tomcat 9 on Windows Server 2016 to prevent locking and crashing?

We have a Windows Server 2016 VM with 8192 MB RAM and six cores, running RunDeck 3.4.1 with Tomcat 9 from the rundeck.war file. Lately we've been seeing a couple of issues crop up. First, RunDeck keeps user login sessions open well past the 30-minute idle limit in Tomcat. Second, RunDeck does not respond, or is extremely sluggish, when the standby memory leaves less than 400 MB of 'free memory', as if it never gets access to the standby cache or queue, or its priority is so low it can't get access to it. When a job fails, the problem gets even worse, but it also happens on successful jobs. This is causing our server to become unresponsive multiple times a day, and the only way so far to free it is to manually release sessions in Tomcat and/or reboot the server completely. In the RunDeck profile I have set the JVM options to export RDECK_JVM="$RDECK_JVM -Xmx2048m -Xms512m -XX:MaxMetaspaceSize=512m -server".
According to the official documentation, those parameters (Xmx, Xms, and MaxMetaspaceSize) need to be defined in the setenv.bat file; take a look there.
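A minimal sketch of what that file might contain, reusing the flags from the question (the file lives in Tomcat's bin directory and is created if it does not exist):
rem %CATALINA_BASE%\bin\setenv.bat
set "CATALINA_OPTS=%CATALINA_OPTS% -Xms512m -Xmx2048m -XX:MaxMetaspaceSize=512m -server"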

Tomcat 8 - POST and PUT requests slow when deployed on RHEL

I have developed a REST API using the Spring Framework. When I deploy it in Tomcat 8 on RHEL, the response times for POST and PUT requests are very high compared to the deployment on my local machine (Windows 8.1): on the RHEL server it takes 7-9 seconds, whereas on the local machine it is less than 200 milliseconds.
The RAM and CPU of the RHEL server are 4 times those of the local machine. Default Tomcat configurations are used on both Windows and RHEL. Network latency is ruled out because GET requests take more or less the same time as on the local machine, whereas the time to first byte is much higher for POST and PUT requests.
I even tried profiling the remote JVM using VisualVM; there are no major hotspots in my custom code.
I was able to reproduce the same issue on other RHEL servers. Is there any Tomcat setting which could help fix this performance issue?
The profiling log you have posted tells us very little. It shows the following:
The blocking queue is blocking, which is normal, because that is its purpose. It means there is nothing to take from it.
Tomcat is waiting for a connection on the socket, which is also normal.
You do not specify the physical/hardware setup of the RHEL server. The operating system might not be the only factor, and you still cannot eliminate network latency: if you have a SAN, the SAN may have latency of its own, and if the RHEL machine uses a SAN with replication rather than a local SSD, you may experience network latency there.
I am more inclined to first check the disk I/O than to focus on the operating system. If the server is shared, there might be other processes occupying the disk.
You say that latency is ruled out because the GET requests take the same time. That is not enough to rule it out: it only covers the latency between the client and the application server, not the latency between your app server machine and your SAN, disk, or whatever storage is there.
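One way to check the disk I/O on the RHEL box while the slow POST/PUT requests are running (a sketch; iostat is part of the sysstat package, which may need to be installed first):
iostat -x 2        # extended per-device statistics every 2 seconds; watch the await and %util columns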

JBoss unable to handle more than 3000 requests

I created a web service, both client and server, and wanted to do some performance testing. I used JMeter with a sample test plan to execute it. Up to 3000 requests JBoss processed everything, but beyond 3000 some of the requests are not processed (in the sense of "Can't open connection: Connection refused"). Where do I have to make changes to handle more than 10000 requests at the same time? Is it a JBoss issue or system throughput?
JMeter config: 300 threads, 1 second ramp-up and 10 loops.
System (server config): Windows 7, 4 GB RAM
Where do I have to make changes to handle more than 10000 requests at the same time
Ten thousand concurrent requests in Tomcat (which I believe is embedded in JBoss) is quite a lot. In the typical setup (with the blocking I/O connector) you need one thread per HTTP connection, which is way too much for an ordinary JVM: on a 64-bit server machine one thread needs about 1 MiB of stack (check the -Xss parameter), so 10,000 threads would need roughly 10 GiB for thread stacks alone, and you only have 4 GiB.
Moreover, the number of context switches will kill your performance; you would need hundreds of cores to handle all these connections effectively. And if your requests are I/O- or database-bound, you'll see a bottleneck elsewhere.
That being said, you need a different approach: either try non-blocking I/O or asynchronous servlets (available since Servlet 3.0), or scale out; a minimal async-servlet sketch follows the links below. By default Tomcat handles 100-200 concurrent connections (a reasonable default) and queues a similar number; everything above that is rejected, which is probably what you are experiencing.
See also
Advanced IO and Tomcat
Asynchronous Support in Servlet 3.0
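A minimal sketch of an asynchronous servlet under Servlet 3.0 (the class name, URL pattern, and pool size are made up for illustration; the idea is just to hand slow work to a separate executor so the container thread is freed):
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.servlet.AsyncContext;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// asyncSupported=true lets the container thread go back to the pool
// while the slow work runs on our own executor.
@WebServlet(urlPatterns = "/slow", asyncSupported = true)
public class SlowResourceServlet extends HttpServlet {

    private final ExecutorService workers = Executors.newFixedThreadPool(50);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        final AsyncContext ctx = req.startAsync();  // detach from the container thread
        ctx.setTimeout(30000);                      // fail fast instead of hanging forever
        workers.submit(new Runnable() {
            @Override
            public void run() {
                try {
                    // ... slow, I/O- or database-bound work goes here ...
                    ctx.getResponse().getWriter().write("done");
                } catch (IOException e) {
                    // log the failure; the context is still completed below
                } finally {
                    ctx.complete();                 // hand the response back to the container
                }
            }
        });
    }
}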
There are two common problems that I can think of.
First, if you run JBoss on Linux as a normal user, you can run into 'Too many open files', if you did not edit the limits.conf file. See https://community.jboss.org/thread/155699. Each open socket counts as an 'open file' for Linux, so the OS could block your connections because of this.
Second, the maximum thread-pool size for incoming connections is 200 by default. This limits the number of concurrent requests, i.e. requests that are in progress at the same time. If you have JMeter running 300 threads, the JBoss connector thread pool should be larger. You can find this in JBoss 6 in jboss-web.sar/server.xml; look for 'maxThreads' on the connector element (a sketch follows below): http://docs.jboss.org/jbossweb/latest/config/http.html
200 is the recommended maximum for a single core CPU. Above that, the context switches start to give too much overhead, like Tomasz says. So for production use, only increase to 400 on a dual core, 800 on a quad core, etc.
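For illustration, a connector sized for the 300 JMeter threads might look like this (a sketch; the port and timeout values are just examples):
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="400"
           acceptCount="300"
           redirectPort="8443" />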

Tomcat Stress Test Timeout

I'm currently investigating issues on the following system:
3.2 GHz 8-core machine, 24 GB ram
Debian 6.0.2
ulimit -n 4096
ulimit -Sn 4096
ulimit -Hn 65535
Tomcat 6.0.28
-Xmx20g
MySQL 5.0.51a (through hibernate and a few manual JDBC queries)
also pretty much room for caching
I'm testing the most common requests to the server with 2000 requests per minute, remotely. The testing tool is the latest JMeter. The average response time is around 65 ms, the minimum 35 ms and the maximum 4000 ms (in rare cases, but there is a reason for it).
As far as I can tell from htop, the system specs are sufficient for at least 3 times more requests per minute (avg. CPU: 25%, RAM: 5 of 22 GB). The server itself is accessible the whole time (I ping it constantly while running the test).
An important fact is that each request results in 3 additional requests to the local Tomcat, where the second finally gets the required data and the last one is for statistics:
jMeter(1) -> RESTeasy-Service(2) -> ?-Service(2) -> Data-Service(2) -(new Thread)> Statistic-Service(2)
(1) is my JMeter test server, which is remote from (2), the Tomcat server. Yes, the architecture might be a little weird, but that's not my fault. ^^
I switched the thread management to a pooled executor in server.xml, setting max threads to 1000 (up from the default 200) and idle threads to 10 (up from 4). What I noticed is that the number of concurrent threads practically never decreases; instead it seems to rise steadily up to Tomcat's maximum. htop reports 160 threads while Tomcat is stopped and about 460 when it is freshly started (the services seem to start a few). After a few hours (sometimes less) of hitting the server with 2000 requests per minute, htop says there are 1400 tasks. This seems to be the point when I start to get timeouts in JMeter. As this is extremely time-consuming I did not watch it a thousand times, so I can't guarantee this is the cause, but that's pretty much what happens.
Primary questions:
The math tells me that the concurrently used thread count should never exceed about 600 (34 requests/second * 4 requests * 4 seconds = 544; even less in practice, but an estimate of 600 should be fine). As far as I understand thread pooling, unused threads should be released and stopped when they have been idle for too long. Is there still a way I could end up with a thousand idling(?) threads, and is this OK?
Could a thread started manually in one of the request processors prevent the Tomcat threads from being released?
Shouldn't there be a log message telling me that Tomcat could not create/fetch a thread for a request?
Any other ideas? I've been working on this for far too long, and Tomcat exhausting its thread pool now seems the only valid explanation for these weird timeouts. But maybe somebody has another hint.
Thanks in advance especially if you can finally save me from this...
After hours and days of head-scratching I found that the timeouts happen when Tomcat reaches its thread limit while we're in the middle of those 3 local connection openings. I guess that once it reaches the limit, one thread waits for another one to open, which will not happen while the previous ones don't close. In German I'd call that a Teufelskreis (a vicious circle). ^^
Anyway, the solution was to raise max threads to a ridiculously high number:
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-" maxThreads="10000" minSpareThreads="10"/>
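For the Executor to actually be used, the Connector has to reference it by name (a sketch, assuming the default HTTP connector):
<Connector executor="tomcatThreadPool" port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />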
I know this should not be the way to go, but unfortunately we all know here that our architecture is somewhat impractical and nobody has the time to change it.
Hope it helps somebody. =)
I guess this issue requires an understanding of the underlying HTTP/1.1 keep-alive connections.
If you are using it for a REST web service, you probably want to set the maxKeepAliveRequests parameter in your connector configuration to 1.
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
maxKeepAliveRequests="1"
redirectPort="8443" />
This setting can be found in your $CATALINA_HOME/conf/server.xml.

Tomcat not recovering from excess traffic

When my Tomcat (6.0.20) maxThreads limit is reached, I get the expected error:
Maximum number of threads (XXX) created for connector with address null and port 80
Then requests start hanging in the queue and eventually time out. So far, so good.
The problem is that when the load goes down, the server does not recover and is forever paralysed, instead of coming back to life.
Any hints?
Consider switching to NIO; then you don't need to worry about the technical requirement of one thread per connection. Without NIO the limit is about 5K threads (5K HTTP connections), and then it blows up like that. With NIO, Java can manage multiple connections with a single thread, so the limit is much higher; the boundary is practically the available heap memory, and with about 2 GB you can go up to 20K connections.
Configuring Tomcat to use NIO is as simple as changing the protocol attribute of the <Connector> element in /conf/server.xml to "org.apache.coyote.http11.Http11NioProtocol".
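For example (a sketch, leaving the other attributes at their defaults):
<Connector port="8080" protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000"
           redirectPort="8443" />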
I think this may be a bug in Tomcat; according to this issue:
https://issues.apache.org/bugzilla/show_bug.cgi?id=48843
it should be fixed in Tomcat 6.0.27 and 5.5.30.
