Tomcat - scaling on a single server

Tomcat - scaling on a single server - java

I have an embedded Tomcat application running on an Amazon EC2 instance. The site was getting an increased amount of traffic, so I upgraded to a much larger instance. However, with the same amount of traffic and with a much larger server, the slow down was still there. I increased the maxThreads and the xmx/xms, but that didn't help much.
The server resources used are small on both the web server and the database server (RDS) (less than 10% memory and less than 20% CPU).
Is there anything that can be done to speed up Tomcat? Or should I bite the bullet and use multiple Tomcat instances and a load balancer?
EDIT: Just to clarify, nothing has changed in the application, just the traffic increased (almost doubled). My assumption was that (more than) doubling the resources (web server and db) should be adequate. I guess it's not that simple.

It's better you don't do anything right now. You obviously don't know what's actually slowing down the application, that's why you were surprised when you upgraded the server and it had no effect.
Instead of randomly doing things like a monkey with a typewriter, hoping that something will help, profile your application (and run load testing against it) and see what are the "heaviest" actions. Then decide how to fix it, whether it's with code optimization, architectural changes, load balancing or any other solutions.
Don't guess, know.

stepanian, even though I'm not sure exactly what you mean by 'slow down', I had a similar case a few months back, when a huge spike of traffic (some viral campaign) caused our tomcat to behave really weird, even though the bottleneck was NOT on the disk, cpu or memory.
In our case, changing the Connector configuration basically solved the problem, especially understanding the:
acceptCount
acceptorThreadCount
maxConnections
maxThreads
Hope this helps! : )

Related

Cluster gets into state where members restart repeatedly and clients cannot update the data in the cluster

We've been using Hazelcast for a number of years but I'm new to the group.
We have a cluster formed by a dedicated Java application (it's sole purpose is to provide the cluster). It's using the 3.8.2 jars and running JDK 1.8.0_192 on Linux (Centos 7).
The cluster manages relatively static data (ie. a few updates a day/week). Although an update may involve changing a 2MB chunk of data. We're using the default sharding config with 271 shards across 6 cluster members. There are between 40 and 80 clients. Each client connection should be long-lived and stable.
"Occasionally" we get into a situation where the Java app that's providing the cluster repeatedly restarts and any client that attempts to write to the cluster is unable to do so. We've had issues in the past where the cluster app runs out of memory due to limits on the JVM command line. We've previously increased these and (to the best of my knowledge) the process restarts are no longer caused by OutOfMemory exceptions.
I'm aware we're running a very old version and many people will suggest simply updating. This is work we will carry out but we're attempting to diagnose the existing issue with the system we have in front of us.
What I'm looking for here is any suggestions regarding types of investigation to carry out, queries to run (either periodically when the system is healthy or during the time when it is in this failed state).
We use tools such as: netstat, tcpdump, wireshark and top regularly (I'm sure there are more) when diagnosing issues such as this but have been unable to establish a convincing root cause of this issue.
Any help greatly appreciated.
Thanks,
Dave
As per the problem description.
Our only way to resolve the issue is to bounce the cluster completely - ie. stop all the members and then restart the cluster.
Ideally we'd have a system to remained stable and could recover from whatever "event" causes the issue we're seeing.
This may involve config or code changes.

Updating entries the size of 2MBs has many consequences - large serialization/deserialization costs, fat packets in the network, cost of accommodating those chunks in JVM heap etc. An ideal entry size is < 30-40KB.
To your immediate problem, start with GC diagnosis. You can use jstat to investigate memory usage patterns. If you are running into lot of full GCs and/or back-to-back full GCs then you will need to adjust heap settings. Also check the network bandwidth, which is usually the prime suspect in the cases of fat packets traveling through the network.
All of the above are just band-aid solutions, you should really look to break your entries down to smaller entries.

Weird Memory Usage by Tomcat

Its a vague question. So please feel free to ask for any specific data.
We have a tomcat server running with two web-service's. One tomcat built using spring. We use mysql for 90% of tasks and mongo for caching of jsons (10%). The other web-service is written using grails. Both the services are medium sized codebases (About 35k lines of code each)
The computation only happens when there is an HTTP request (No batch processing). With about 2000 database hits per request (I know its humongous. We are working on it). The request rate is about 30 req/min. For one particular request, there is Image processing which is quite memory expensive. No JNI anywhere
We have found a weird behavior. Last night, I can confirm that there was no request to the server for about 12 hours. But when I look at the memory consumption, it is very confusing:
Without any requests, the memory keeps jumping from 500Mb to 1.2Gb (700 Mb jump is worrysome). There is no computation on server side as mentioned. I am not sure if its a memory leak:
The memory usage comes down. (Things would have been way easier if the memory didnt come down).
This behavior is reproducable with caches based on SoftReference or so. With full gc's. But I am not using them anywhere (Not sure if something else is using it)
What else can be the reason. is it a cause to worry?
PS: We have had Our of Memory Crashes (Not errors but JVM crash) quite frequently very recently.

This is actually normal behavior. You're just seeing garbage collection occur.

Running bottleneck tests for servers

I'm attempting to detect a bottleneck in our servers, and I'm having a hard time deciding where to start. The symptoms from the machine before it crashes are dropped connections (time outs, this is what happens when a client sees when a response takes too long, possibly indicating that a processor couldn't be allocated by the server side code, and the request couldn't be handled) and out of memory issues. The latter error code is actually given by the JVM in an error log, but I have a hard time believing that both the RAM and the lack of available CPUs are turning out to be the bottlenecks at the same time. (In the days of the previous crashes, it has consistently crashed in the manner described above.) We have our own in house server code, and I wouldn't classify it as analogous to Apache or any other code I've seen. (Sorry if this makes giving advice that much more difficult.)
I'd like to take some time to create a somewhat controlled test locally. I'm running the server, and I'm creating a program that will request different things from the local server. What's a good way to monitor RAM/CPU? I'm currently using Java's VisualVM, but monitors stop responding when I hammer it with some of the tests.
Any ideas out there would be greatly appreciated. Like I mentioned, I'm trying to grab as much useful data, to help me further troubleshoot. In general, when a bottleneck issue like this arises, what are some general strategies to take? The live servers are all running on Windows Server 2008. The version of Java is 7.03. My local box is running Windows 7, with Java 7.03 as well. I don't want to make too many assumptions, but I think it's reasonable to assume that Server 2008 and Windows 7 are pretty similar. (The OS architecture is the same.) Aside from that, my local box has identical hardware to our servers.

For Windows, you need to:
1) Establish a "performance baseline"
2) Try to reproduce the bottleneck, and compare the behavior under stress with the baseline
3) Identify the cause of the bottleneck, and correct it.
These links will help:
http://technet.microsoft.com/en-us/library/cc781394%28v=ws.10%29.aspx
http://msdn.microsoft.com/en-us/library/windows/hardware/gg463394.aspx
You should also look at these TCP/IP registry settings:
Increase the range of ephemeral ports
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, MaxUserPort
Consider disabling auto-tuning:
http://geekswithblogs.net/cajunmcse/archive/2010/08/20/windows-2008-exchange-and-tcpip-auto-tuning.aspx

Performance drop after 5 days running web application, how to spot the bottleneck?

I've developed a web application using the following tech stack:
Java
Mysql
Scala
Play Framework
DavMail integration (for calender and exchange server)
Javamail
Akka actors
On the first days, the application runs smoothly and without lags. But after 5 days or so, the application gets really slow! And now I have no clue how to profile this, since I have huge dependencies and it's hard to reproduce this kind of thing. I have looked into the memory and it seems that everything its okay.
Any pointers on the matter?

Try using VisualVM - you can monitor gc behaviour, memory usage, heap, threads, cpu usage etc. You can use it to connect to a remote VM.

`visualvm˙ is also a great tool for such purposes, you can connect to a remote JVM as well and see what's inside.
I suggest you doing this:
take a snapshot of the application running since few hours and since 5 days
compare thread counts
compare object counts, search for increasing numbers
see if your program spends more time in particular methods on the 5th day than on the 1str one
check for disk space, maybe you are running out of it

jconsole comes with the JDK and is an easy tool to spot bottlenecks. Connect it to your server, look into memory usage, GC times, take a look at how many threads are alive because it could be that the server creates many threads and they never exit.

I agree with tulskiy. On top of that you could also use JMeter if the investigations you will have made with jconsole are unconclusive.
The probable causes of the performances degradation are threads (that are created but never exit) and also memory leaks: if you allocate more and more memory, before having the OutOfMemoryError, you may encounter some performances degradation (happened to me a few weeks ago).

To eliminate your database you can monitor slow queries (and/or queries that are not using an index) using the slow query log
see: http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
I would hazard a guess that you have a missing index, and it has only become apparent as your data volumes have increased.

Yet another profiler is Yourkit.
It is commercial, but with trial period (two weeks).
Actually, I've firstly tried VisualVM as #axel22 suggested, but our remote server was ssh'ed and we had problems with connecting via VisualVM (not saying that it is impossible, I've just surrendered after a few hours).

You might just want to try the 'play status' command, which will list web app state (threads, jobs, etc). This might give you a hint on what's going on.

So guys, in this specific case, I was running play in Developer mode, which makes the compiler works every now and then.
After changing to production mode, everything was lightning fast and no more problems anymore. But thanks for all the help.

Java Application Server Performance

I've got a somewhat dated Java EE application running on Sun Application Server 8.1 (aka SJSAS, precursor to Glassfish). With 500+ simultaneous users the application becomes unacceptably slow and I'm trying to assist in identifying where most of the execution time is spent and what can be done to speed it up. So far, we've been experimenting and measuring with LoadRunner, the app server logs, Oracle statpack, snoop, adjusting the app server acceptor and session (worker) threads, adjusting Hibernate batch size and join fetch use, etc but after some initial gains we're struggling to improve matters more.
Ok, with that introduction to the problem, here's the real question: If you had a slow Java EE application running on a box whose CPU and memory use never went above 20% and while running with 500+ users you showed two things: 1) that requesting even static files within the same app server JVM process was exceedingly slow, and 2) that requesting a static file outside of the app server JVM process but on the same box was fast, what would you investigate?
My thoughts initially jumped to the application server threads, both acceptor and session threads, thinking that even requests for static files were being queued, waiting for an available thread, and if the CPU/memory weren't really taxed then more threads were in order. But then we upped both the acceptor and session threads substantially and there was no improvement.
Clarification Edits:
1) Static files should be served by a web server rather than an app server. I am using the fact that in our case this (unfortunately) is not the configuration so that I can see the app server performance for files that it doesn't execute -- therefore excluding any database performance costs, etc.
2) I don't think there is a proxy between the requesters and the app server but even if there was it doesn't seem to be overloaded because static files requested from the same application server machine but outside of the application's JVM instance return immediately.
3) The JVM heap size (Xmx) is set to 1GB.
Thanks for any help!

SunONE itself is a pain in the ass. I have a very same problem, and you know what? A simple redeploy of the same application to Weblogic reduced the memory consumption and CPU consumption by about 30%.
SunONE is a reference implementation server, and shouldn't be used for production (don't know about Glassfish).
I know, this answer doesn't really helps, but I've noticed considerable pauses even in a very simple operations, such as getting a bean instance from a pool.
May be, trying to deploy JBoss or Weblogic on the same machine would give you a hint?
P.S. You shouldn't serve static content from under application server (though I do it too sometimes, when CPU is abundant).
P.P.S. 500 concurrent users is quite high a load, I'd definetely put SunONE behind a caching proxy or Apache which serves static content.

After using a Sun performance monitoring tool we found that the garbage collector was running every couple seconds and that only about 100MB out of the 1GB heap was being used. So we tried adding the following JVM options and, so far, this new configuration as greatly improved performance.
-XX:+DisableExplicitGC -XX:+AggressiveHeap
See http://java.sun.com/docs/performance/appserver/AppServerPerfFaq.html
Our lesson: don't leave JVM option tuning and garbage collection adjustments to the end. If you're having performance trouble, look at these settings early in your troubleshooting process.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.