Tomcat consumes very high CPU - Java

This is my situation and I do not know what I can check next to solve my problem.
I have a Java web application running on Tomcat on a Linux server.
The application is very slow.
The top command shows that the CPU load for the Java process is very high: it reaches more than 1000 percent.
The dstat command shows that the disk write rate is much higher than the read rate.
And I cannot restart the application :(
What can I do now?

Well, unless you can restart something, you can't fix anything.
You've got to analyse what is going on. Do we know it's the app that's at fault? [You don't say what else is deployed into the server.] But supposing it is known to be at fault, you need to look at it in some detail.
Busy disk writes are a bit suggestive: is it possible that lots of diagnostic trace is being output? Or is it possible there's a memory leak and the machine is paging?
There are many performance analysis tools out there; you may need to get into some detailed analysis.
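A process at 1000% CPU usually has a handful of threads doing most of the burning, so a useful first step is finding which ones. From outside the JVM, `top -H -p <pid>` shows per-thread CPU, and the thread ids it lists match the hexadecimal `nid=` values in a jstack dump. The sketch below does the same from inside the JVM with the standard `ThreadMXBean`; it assumes you have some way to run code in the process (for example an admin page, which is hypothetical here), and the class name `HotThreads` is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Samples per-thread CPU time twice and reports the threads that used the most CPU in between. */
final class HotThreads {

    /** CPU time (nanoseconds) one thread consumed during the sampling window. */
    static final class Sample {
        final long threadId;
        final String name;
        final long cpuNanos;

        Sample(long threadId, String name, long cpuNanos) {
            this.threadId = threadId;
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    static List<Sample> sample(long windowMillis, int top) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time is not supported by this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }

        // First snapshot: CPU time per live thread id (-1 means the thread already died).
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            long t = mx.getThreadCpuTime(id);
            if (t >= 0) before.put(id, t);
        }

        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        }

        // Second snapshot: the delta is the CPU each thread burned during the window.
        List<Sample> result = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            ThreadInfo info = mx.getThreadInfo(id);
            if (start == null || end < 0 || info == null) continue; // started or died mid-window
            result.add(new Sample(id, info.getThreadName(), end - start));
        }
        result.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));
        return new ArrayList<>(result.subList(0, Math.min(top, result.size())));
    }

    public static void main(String[] args) {
        for (Sample s : sample(1000, 5)) {
            System.out.printf("%-40s %8.1f ms CPU%n", s.name, s.cpuNanos / 1e6);
        }
    }
}
```

Sampling a window, rather than reading total CPU time once, matters: long-lived threads accumulate CPU over days, and only the delta tells you who is busy right now.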


Heavy resource utilization - WebLogic Server

I have a server with 4 CPUs and 16 GB of RAM.
There is a WebLogic Admin Server, 2 managed servers and a Tomcat server running on this Ubuntu machine.
The resource utilization explodes at times, which is very unusual. This has never happened before, and I think it has something to do with the Java parameters that I used.
Have a look at this:
WebLogic cluster:
Admin Server: qaas-01
Managed Servers: qams-01, qams-02
In the image below, you can see that the Java processes associated with the above are multiplying and consuming too much memory.
I figured out that this is more generic and not specific to WebLogic.
A lot of processes are behaving the same way.
In the picture below, it's Apache Tomcat and the Jenkins slave process that are replicating and consuming memory.
Can anyone help me identify the real issue?
This question is quite broad, so start by looking into why it may be happening. Also post your JVM flags, and mention anything you changed that may be causing this.
First you need to figure out what is taking up your CPU time.
Use the WebLogic administration console to generate a thread dump and see what is going on. You may need to sit and watch the CPU so you can run it when it spikes. You can also force a thread dump using jstack. To get a Java stack trace you may need to sudo and execute jstack as the user running the server; otherwise you get an OS thread dump, which may not be as useful. Read about jstack.
If the above does not give enough information about why the CPU spiked, and since this is Ubuntu, you can run:
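If attaching jstack keeps failing, `kill -3 <pid>` makes HotSpot print a thread dump to the server's stdout, and a dump can also be built from inside the JVM with the standard `Thread.getAllStackTraces()`. The sketch below is a minimal jstack-like text dump under the assumption that you can trigger code inside the server (how you expose it is up to you); the class name `ThreadDump` is illustrative, and the output format is not jstack's.

```java
import java.util.Map;

/** Builds a jstack-like text dump of all live Java threads from inside the JVM. */
final class ThreadDump {

    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Header line: name, daemon flag and current state (RUNNABLE, BLOCKED, WAITING, ...).
            sb.append('"').append(t.getName()).append("\" daemon=").append(t.isDaemon())
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Taking two or three of these during a spike and comparing them is usually more informative than a single dump.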
timeout 20 strace -cvf -p {SERVER PID HERE} -o strace_digest.txt
This will run strace for 20 seconds and report on which OS calls are being made most frequently. This can give you a hint as to what is going on.
Enable and check the garbage collection log to see how often GC runs; the JVM may not have enough memory. See if there is a correlation between GC runs and the CPU spikes.
I don't think there is a definitive way to solve a CPU spike by looking at top, but the above is a start to get you debugging.
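The GC log itself is enabled with JVM flags (for example -verbose:gc, or -Xlog:gc* on JDK 9 and later), which requires a restart. If you can run code in the server, the standard `GarbageCollectorMXBean` counters give the same "how often and how long" signal without one. The sketch below is an assumption-laden helper, not a WebLogic feature: it prints GC counts and accumulated GC time per interval so the timestamps can be lined up against CPU graphs. `GcWatch` is a hypothetical name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reports how many GCs ran, and how long they took, between successive tick() calls. */
final class GcWatch {
    private long lastCount = totalCount();
    private long lastTimeMs = totalTimeMs();

    /** Sum of collection counts over all collectors the JVM reports. */
    static long totalCount() {
        long n = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            n += Math.max(0, gc.getCollectionCount()); // -1 means "undefined" for this collector
        }
        return n;
    }

    /** Sum of accumulated collection time in milliseconds over all collectors. */
    static long totalTimeMs() {
        long ms = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            ms += Math.max(0, gc.getCollectionTime());
        }
        return ms;
    }

    /** Returns a one-line summary of GC activity since the previous call. */
    String tick() {
        long count = totalCount();
        long timeMs = totalTimeMs();
        String line = String.format("gc: %d collections, %d ms since last tick",
                count - lastCount, timeMs - lastTimeMs);
        lastCount = count;
        lastTimeMs = timeMs;
        return line;
    }

    public static void main(String[] args) throws InterruptedException {
        GcWatch watch = new GcWatch();
        while (true) {
            Thread.sleep(5_000);
            // Line these timestamps up against CPU graphs to look for a correlation.
            System.out.println(System.currentTimeMillis() + " " + watch.tick());
        }
    }
}
```

If most of each interval's wall-clock time shows up as GC time during a spike, the heap is likely too small or something is leaking.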

Getting "java.lang.OutOfMemoryError: unable to create new native thread" on a VPS although I have enough memory

I just can't figure out why I get this error. It is not always shown, but once it appears, my application refuses to accept connections (it can't create new socket threads, nor the other threads I create in my Java application; for some of them I use a ThreadPool).
top and htop show that ~900 MB of 2048 MB is used.
There is also enough heap memory, about 200 MB free.
cat /proc/sys/kernel/threads-max outputs:
Also, everything worked fine a few days ago. It's a multiplayer online game, and we had over 200 users online (~500 threads in total). But now, even with 80 users online (~200 threads), after 10 minutes or a few hours my application somehow breaks with this OutOfMemoryError. When that happens I restart my application, and again it works only for a short period of time.
I am curious whether the JVM can act strangely on a VPS, since other VPSes on the same physical machine also use the JVM. Is that even possible?
Is there some sort of limit imposed by the provider that is not visible to me?
Or is this some sort of attack on the server?
I should also mention that when this error occurs, munin sometimes fails to log data for about 10 minutes. Looking at the graph images, there is just white space, as if munin were not working at all. And again, htop reports about 1 GB of free memory at that time.
It might also be the case that I somehow introduced a bug in my application and started getting this error after an update. But even so, where do I begin debugging?
Try increasing the stack size (-Xss).
You seem to host your app on a remote VPS. Are you sure the server, not your development box, has sufficient RAM? People very often confuse their own machine with the remote machine.
If Bash is running out of memory too, it is obviously a system memory issue, not an app memory issue. Post the results of free -m and ulimit -a on the remote machine to get more data.
If you suspect your provider of using trojanized htop, free and ulimit binaries, you can test the real available memory with a simple C program that allocates 70-80% of your available RAM with malloc and fills it with random bytes, in no more than 10 lines of ANSI C. You can compile it statically on your own box to avoid any crooked libc, and then transfer it with scp. That being said, I have heard rumors of VPS providers giving less than promised, but I have never encountered it.
Well, moving from a VPS to a dedicated server solved my problem.
Additionally, I found this:
this might be exactly the case, because on the VPS I had, the value of "privvmpages" was really too low. It seems there really is some weird JVM behaviour on VPSes.
As I already wrote in the comments, even other programs (ls, top, htop, less) were at times unable to start, although enough memory was available.
And the provider really did make some changes to their system.
Also, thank you everyone for the very fast replies and for helping me solve this mystery.
You should try the JRockit VM. It works perfectly on my OpenVZ VPS and consumes much less memory than the Sun/Oracle JVM.

Performance drop after 5 days running web application, how to spot the bottleneck?

I've developed a web application using the following tech stack:
Play Framework
DavMail integration (for calendar and Exchange Server)
Akka actors
During the first days, the application runs smoothly and without lags. But after 5 days or so, it gets really slow! I have no clue how to profile this, since I have huge dependencies and this kind of thing is hard to reproduce. I have looked at the memory and everything seems to be okay.
Any pointers on the matter?
Try using VisualVM: you can monitor GC behaviour, memory usage, heap, threads, CPU usage, etc. You can use it to connect to a remote VM.
VisualVM is also a great tool for such purposes; you can connect to a remote JVM as well and see what's inside.
I suggest doing this:
take a snapshot of the application after it has been running for a few hours, and again after 5 days
compare thread counts
compare object counts, search for increasing numbers
see if your program spends more time in particular methods on the 5th day than on the 1st one
check for disk space, maybe you are running out of it
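The comparison in the list above can be partly automated. This sketch records a few JVM-wide counters from the standard management beans and prints how a later snapshot differs from an earlier one; it does not count objects per class (a heap histogram such as `jmap -histo <pid>` does that). `JvmSnapshot` is a hypothetical name, and heap usage swings with every GC, so only compare heap numbers taken right after a full GC.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

/** A few JVM-wide counters worth comparing between an "hours in" and a "day 5" snapshot. */
final class JvmSnapshot {
    final int liveThreads;
    final int peakThreads;
    final int loadedClasses;
    final long heapUsedBytes;
    final long nonHeapUsedBytes;

    private JvmSnapshot(int liveThreads, int peakThreads, int loadedClasses,
                        long heapUsedBytes, long nonHeapUsedBytes) {
        this.liveThreads = liveThreads;
        this.peakThreads = peakThreads;
        this.loadedClasses = loadedClasses;
        this.heapUsedBytes = heapUsedBytes;
        this.nonHeapUsedBytes = nonHeapUsedBytes;
    }

    static JvmSnapshot take() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return new JvmSnapshot(
                threads.getThreadCount(),
                threads.getPeakThreadCount(),
                ManagementFactory.getClassLoadingMXBean().getLoadedClassCount(),
                memory.getHeapMemoryUsage().getUsed(),
                memory.getNonHeapMemoryUsage().getUsed());
    }

    /** Describes how a later snapshot differs; numbers that only ever grow hint at a leak. */
    String diff(JvmSnapshot later) {
        return String.format(
                "threads %+d, peak threads %+d, loaded classes %+d, heap %+d KB, non-heap %+d KB",
                later.liveThreads - liveThreads,
                later.peakThreads - peakThreads,
                later.loadedClasses - loadedClasses,
                (later.heapUsedBytes - heapUsedBytes) / 1024,
                (later.nonHeapUsedBytes - nonHeapUsedBytes) / 1024);
    }
}
```

A thread count that climbs steadily from day 1 to day 5 points at threads that are created but never exit; steadily growing loaded classes can point at repeated class loading.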
jconsole comes with the JDK and is an easy tool for spotting bottlenecks. Connect it to your server and look at memory usage, GC times, and how many threads are alive, because it could be that the server creates many threads that never exit.
I agree with tulskiy. On top of that, you could also use JMeter if the investigations you make with jconsole are inconclusive.
The probable causes of the performance degradation are threads (that are created but never exit) and memory leaks: if you allocate more and more memory, you may see some performance degradation before you get the OutOfMemoryError (it happened to me a few weeks ago).
To eliminate your database as a cause, you can monitor slow queries (and/or queries that are not using an index) using the slow query log.
I would hazard a guess that you have a missing index, and it has only become apparent as your data volumes have increased.
Yet another profiler is YourKit.
It is commercial, but it has a trial period (two weeks).
Actually, I first tried VisualVM as axel22 suggested, but our remote server was only reachable over SSH and we had problems connecting with VisualVM (not saying that it is impossible; I just surrendered after a few hours).
You might just want to try the 'play status' command, which lists the web app's state (threads, jobs, etc.). This might give you a hint about what's going on.
So guys, in this specific case, I was running Play in development mode, which makes the compiler run every now and then.
After changing to production mode, everything was lightning fast, with no more problems. But thanks for all the help.

How to determine why is Java app slow

We have a Java ERP type of application. Communication between server and client is via RMI. In peak hours there can be up to 250 users logged in, and about 20 of them working at the same time. This means that about 20 threads are live at any given time in peak hours.
The server can run for hours without any problems, but all of a sudden response times get higher and higher. Response times can reach minutes.
We are running on Windows 2008 R2 with Sun's JDK 1.6.0_16. We have been using perfmon and Process Explorer to see what is going on. The only thing we find odd is that when the server starts to slow down, the number of handles the java.exe process has open is around 3500. I'm not saying that this is the actual problem.
I'm just curious if there are some guidelines I should follow to be able to pinpoint the problem. What tools should I use? ....
Can you access the log configuration of this application?
If you can, you should change the log level to "DEBUG". Tracing the DEBUG logs of a request could give you useful information about the contention point.
If you can't, profiler tools can help you:
VisualVM (Free, and good product)
Eclipse TPTP (Free, but more complicated than VisualVM)
JProbe (not Free but very powerful. It is my favorite Java profiler, but it is expensive)
If the application has been developed with JMX control points, you can plug in a JMX viewer to get information...
If you want to stress the application to trigger the problem (to verify whether it is a load problem), you can use stress tools like JMeter.
Sounds like the garbage collection cannot keep up and starts "halt-the-world" collecting for some reason.
Attach jvisualvm (shipped with the JDK) when starting, and look at the collected data when the performance drops.
The problem you're describing is quite typical, but general as well. Causes can range from memory leaks and resource contention to bad GC policies and heap/PermGen space allocation. To pinpoint the exact problems in your application, you need to profile it (I am aware of tools like YourKit and JProfiler). If you profile your application wisely, a few application cycles should reveal the problems; otherwise, profiling itself isn't very easy.
In a similar situation, I wrote some simple profiling code myself. Basically, I used a ThreadLocal holding a "StopWatch" (based on a LinkedHashMap), and then inserted code like this at various points of the application: watch.time("OperationX");
then, after the thread finishes a task, I'd call watch.logTime(); and the class would write a log line that looks like this: [DEBUG] StopWatch time:Stuff=0, AnotherEvent=102, OperationX=150
After this, I wrote a simple parser that generates CSV from this log (per code path). The best thing you can do is to create a histogram (easily done in Excel). Averages, medians and even modes can fool you. I highly recommend creating a histogram.
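A minimal sketch of such a stopwatch, following the description above (a ThreadLocal holding a LinkedHashMap so events keep their insertion order). The answer does not say what each number measures; this sketch assumes each `time(...)` call records the milliseconds elapsed since the previous mark, and the class layout is illustrative rather than the answerer's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Per-thread stopwatch: a ThreadLocal holding a LinkedHashMap of event name to elapsed ms. */
final class StopWatch {
    private static final ThreadLocal<StopWatch> CURRENT = ThreadLocal.withInitial(StopWatch::new);

    private final Map<String, Long> events = new LinkedHashMap<>(); // keeps insertion order
    private long lastMark = System.nanoTime();

    /** Returns this thread's stopwatch, creating a fresh one if needed. */
    static StopWatch get() {
        return CURRENT.get();
    }

    /** Records the milliseconds since the previous mark (assumption) under the given event name. */
    void time(String event) {
        long now = System.nanoTime();
        events.merge(event, (now - lastMark) / 1_000_000, Long::sum); // repeated events accumulate
        lastMark = now;
    }

    /** Formats the recorded events as one log line and resets this thread's stopwatch. */
    String logTime() {
        StringJoiner line = new StringJoiner(", ", "[DEBUG] StopWatch time:", "");
        events.forEach((name, ms) -> line.add(name + "=" + ms));
        CURRENT.remove(); // important on pooled threads, or timings leak into the next task
        return line.toString();
    }
}
```

Usage: call `StopWatch.get().time("OperationX")` at each checkpoint and hand `StopWatch.get().logTime()` to your logger at the end of the task. Resetting in `logTime()` matters because application servers reuse pool threads across requests.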
Together with the histogram, you can create line graphs using the average/median/mode (whichever represents the data best; you can determine this from the histogram).
This way, you can be 100% sure exactly which operation is taking time. If you can't determine the culprit, binary search is your friend (make the events more fine-grained).
It might sound really primitive, but it works. Also, if you make a library out of it, you can use it in any project. It's also cool because you can easily turn it on in production as well.
Aside from the GC that others have mentioned, try taking thread dumps every 5-10 seconds for about 30 seconds during your slowdown. It could be that DB calls, a web service, or some other dependency becomes slow. If you look at the thread dumps, you will see threads which don't appear to move, and you can narrow down your culprit that way.
From the GC standpoint, do you monitor your CPU usage during these times? If GC is running frequently, you will see a jump in your overall CPU usage.
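To see why the histogram comes first, here is a small sketch that buckets timings and computes the mean and median. The timing values are made up for illustration (mostly ~12-16 ms with two ~1 s outliers); `TimingStats` is a hypothetical name, and it assumes non-negative millisecond samples.

```java
import java.util.Arrays;
import java.util.TreeMap;

/** Buckets operation timings (ms) into a histogram and computes mean and median. */
final class TimingStats {

    /** Maps each bucket's lower bound to the number of samples in [bound, bound + width). */
    static TreeMap<Long, Integer> histogram(long[] samplesMs, long bucketWidthMs) {
        TreeMap<Long, Integer> buckets = new TreeMap<>();
        for (long s : samplesMs) {
            long lower = (s / bucketWidthMs) * bucketWidthMs; // valid for non-negative samples
            buckets.merge(lower, 1, Integer::sum);
        }
        return buckets;
    }

    static double mean(long[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(Double.NaN);
    }

    static double median(long[] samplesMs) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        if (n == 0) return Double.NaN;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical timings: mostly fast calls plus a slow tail.
        long[] t = {12, 15, 11, 14, 13, 950, 12, 16, 980, 14};
        histogram(t, 100).forEach((lo, n) ->
                System.out.printf("%5d-%-5d ms %s%n", lo, lo + 99, "#".repeat(n)));
        System.out.printf("mean=%.1f median=%.1f%n", mean(t), median(t));
    }
}
```

With this data the mean is 203.7 ms, a value no real call took, and the median is 14 ms, which hides the slow calls entirely; only the histogram shows the two clusters (8 calls under 100 ms, 2 calls in the 900 ms bucket).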
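The "threads which don't appear to move" check can be scripted. This sketch takes several dumps with the standard `ThreadMXBean.dumpAllThreads` and keeps threads whose stack was identical in every dump. It assumes only RUNNABLE and BLOCKED threads are interesting (a thread waiting on a slow socket read typically shows as RUNNABLE, one stuck on a lock as BLOCKED), because idle pool threads parked in WAITING would otherwise dominate the report. `StuckThreadFinder` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Reports RUNNABLE/BLOCKED threads whose stack was identical in every one of several dumps. */
final class StuckThreadFinder {

    static List<String> findUnchanged(int dumps, long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, ThreadInfo> first = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && isCandidate(info)) first.put(info.getThreadId(), info);
        }
        Set<Long> stuck = new HashSet<>(first.keySet());

        for (int i = 1; i < dumps && !stuck.isEmpty(); i++) {
            sleepQuietly(intervalMillis);
            Set<Long> sameAsFirst = new HashSet<>();
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                if (info == null || !isCandidate(info)) continue;
                ThreadInfo earlier = first.get(info.getThreadId());
                if (earlier != null && Arrays.equals(earlier.getStackTrace(), info.getStackTrace())) {
                    sameAsFirst.add(info.getThreadId());
                }
            }
            stuck.retainAll(sameAsFirst); // keep only threads that still haven't moved
        }

        List<String> report = new ArrayList<>();
        for (long id : stuck) {
            ThreadInfo info = first.get(id);
            report.add(info.getThreadName() + " " + info.getThreadState() + " at " + info.getStackTrace()[0]);
        }
        return report;
    }

    /** Idle pool threads sit in WAITING/TIMED_WAITING; busy or lock-blocked ones are more interesting. */
    private static boolean isCandidate(ThreadInfo info) {
        Thread.State s = info.getThreadState();
        return info.getStackTrace().length > 0
                && (s == Thread.State.RUNNABLE || s == Thread.State.BLOCKED);
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Roughly the suggestion above: a dump every 5 seconds for about 30 seconds.
        findUnchanged(7, 5_000).forEach(System.out::println);
    }
}
```

The top frame printed for each stuck thread is usually enough to tell a slow JDBC call from lock contention.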
If only this was a Solaris box, prstat would be your friend.
For acute issues like this a quick jstack <pid> should quickly point out the problem area. Probably no need to get all fancy on it.
If I had to guess, I'd say HotSpot jumped in and tightly optimised some badly written code. NetBeans grinds to a halt where it uses a WeakHashMap with newly created objects to cache file data. When optimised, the entries can be removed from the map straight after being added. Obviously, if the cache is being relied upon, a lot of file activity follows. You probably won't see the drive light up, because it'll all be cached by the OS.
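A minimal illustration of the cache pitfall described above, under the assumption that the cache key is a freshly created object nothing else holds on to. A WeakHashMap entry whose key is only weakly reachable may disappear at the next GC, so the cache silently stops caching. The sketch contrasts it with a size-bounded LRU built on the standard `LinkedHashMap.removeEldestEntry` hook; that alternative is a swapped-in technique for illustration, not what NetBeans does, and `CacheDemo` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.WeakHashMap;

final class CacheDemo {

    /** Size-bounded LRU cache: entries stay until evicted by size, not by GC timing. */
    static <K, V> Map<K, V> lru(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) { // true = iterate in access order
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<Object, String> weak = new WeakHashMap<>();
        weak.put(new Object(), "file data"); // nothing else references this key
        System.gc();                          // only a hint; collection is not guaranteed
        System.out.println("weak cache entries after GC: " + weak.size()); // may well be 0

        Map<String, String> recent = lru(2);
        recent.put("a.txt", "A");
        recent.put("b.txt", "B");
        recent.get("a.txt");                  // touch a.txt so b.txt becomes the eldest
        recent.put("c.txt", "C");             // evicts b.txt
        System.out.println("LRU cache keys: " + recent.keySet());
    }
}
```

Whether the weak entry vanishes depends on when the collector runs, which is exactly why this kind of bug shows up only after hours of load.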

My JBoss server hits 100% SYS CPU on Linux; what can cause this?

We've been debugging this JBoss server problem for quite a while. After about 10 hours of work, the server goes into 100% CPU panic attacks and just stalls. During this time you cannot run any new programs, so you can't even run kill -QUIT to get a stack trace. These high 100% SYS CPU loads last 10-20 seconds and repeat every few minutes.
We have been working on this for quite a while. We suspect it has something to do with the GC, but cannot confirm it with a smaller program. We are running on 32-bit i386, RHEL5 and Java 1.5.0_10 using -client and ParNew GC.
Here's what we have tried so far:
We limited the CPU affinity so we can actually use the server when the high load hits. With strace we see an endless loop of SIGSEGV followed by sigreturn.
We tried to reproduce this with a Java program. It's true that SYS CPU% climbs high with WeakHashMap or when accessing null pointers. The problem was that fillInStackTrace took a lot of user CPU%, and that's why we never reached 100% SYS CPU.
We know that after 10 hours of stress, GC goes crazy and full GC sometimes takes 5 seconds. So we assume it has something to do with memory.
jstack during that period showed all threads as blocked. pstack during that time showed a MarkSweep stack trace occasionally, so we can't be sure about this either. Sending SIGQUIT yielded nothing: Java dumped the stack trace AFTER the SYS% load period was over.
We're now trying to reproduce this problem with a small fragment of code so we can ask Sun.
If you know what's causing it, please let us know. We're open to ideas and we are clueless, any idea is welcome :)
Thanks for your time.
Thanks to everybody for helping out.
Eventually we upgraded (only half of the Java servers) to JDK 1.6 and the problem disappeared. Just don't use :)
We managed to reproduce these problems just by accessing null pointers (it boosts SYS instead of user CPU, and kills the entire Linux box).
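A rough sketch of the kind of reproduction described: a tight loop that dereferences null and swallows the exception. It relies on an assumption about HotSpot internals: compiled code often implements null checks implicitly, letting the hardware fault (SIGSEGV) and having the JVM's signal handler turn it into a NullPointerException, which would match the SIGSEGV/sigreturn loop seen in strace. Newer JVMs may recompile such code with explicit null checks after repeated traps, so this may not reproduce the SYS load on a modern JDK. `NullDerefLoop` is a hypothetical name.

```java
/** Hypothetical stress loop: repeatedly dereferences null and swallows the exception. */
final class NullDerefLoop {

    private static String maybeNull(int i) {
        return (i & 1) == 0 ? null : "x"; // every other call returns null
    }

    /** Runs for the given time and returns how many NullPointerExceptions were caught. */
    static long run(long millis) {
        long caught = 0;
        long end = System.nanoTime() + millis * 1_000_000L;
        int i = 0;
        while (System.nanoTime() < end) {
            String s = maybeNull(i++);
            try {
                s.length(); // throws NullPointerException when s is null
            } catch (NullPointerException e) {
                caught++;
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        System.out.println("NPEs caught: " + run(20_000));
    }
}
```

Running it while watching top (user vs system CPU) and attaching strace as described earlier shows whether your JVM takes the signal-based path.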
Again, thanks to everyone.
If you're certain that GC is the problem (and it does sound like it based on your description), then adding the -XX:+HeapDumpOnOutOfMemoryError flag to your JBoss settings might help (in JBOSS_HOME/bin/run.conf).
You can read more about this flag here. It was originally added in Java 6, but was later back-ported to Java 1.5.0_07.
Basically, you will get a "dump file" if an OutOfMemoryError occurs, which you can then open in various profiling tools. We've had good luck with the Eclipse Memory Analyzer.
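On JVMs that ship `com.sun.management.HotSpotDiagnosticMXBean` (HotSpot, JDK 6+; the `getPlatformMXBean(Class)` lookup below needs JDK 7+), HeapDumpOnOutOfMemoryError is a "manageable" flag, so it can be switched on in a running server without editing run.conf or restarting. From the command line, `jinfo -flag +HeapDumpOnOutOfMemoryError <pid>` does the same. This sketch assumes you can execute code inside the JVM; `HeapDumpOnOome` is a hypothetical name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Turns on -XX:+HeapDumpOnOutOfMemoryError in the running JVM and reports the new value. */
final class HeapDumpOnOome {

    static String enable() {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // setVMOption only works for "manageable" flags; others throw IllegalArgumentException.
        hs.setVMOption("HeapDumpOnOutOfMemoryError", "true");
        VMOption opt = hs.getVMOption("HeapDumpOnOutOfMemoryError");
        return opt.getName() + "=" + opt.getValue() + " (origin " + opt.getOrigin() + ")";
    }

    public static void main(String[] args) {
        System.out.println(enable());
    }
}
```

The setting is lost on restart, so still add the flag to the startup options once you can.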
This won't give you any "free" answers, but if you truly have a memory leak, then this will help you find it.
Have you tried profiling the application? There are some good profilers that can run on production servers. Those should tell you whether GC is running into trouble, and with which objects.
I had a similar issue with JBoss (JBoss 4, Linux 2.6) last year. I think in the end it did turn out to be related to an application bug, but it was definitely very hard to figure out. I would keep trying to send a 'kill -3' to the process, to get some kind of stack trace and figure out what is blocking. Maybe add logging statements to see if you can figure out what is setting it off. You can use 'lsof' to figure out what files it has open; this will tell you if there is a leak of some resource other than memory.
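Alongside lsof, the JVM can report its own descriptor usage on Unix-like systems through `com.sun.management.UnixOperatingSystemMXBean`. Logging this number periodically shows a file or socket leak as a steadily rising count long before the limit is hit. The sketch returns null where the bean is not available (for example on Windows, where the handle count from Process Explorer is the closest equivalent); `FdCount` is a hypothetical name.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Reads the JVM's open and maximum file descriptor counts where the platform exposes them. */
final class FdCount {

    /** Returns {open, max}, or null if this JVM/OS does not provide the Unix bean. */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] fds = openAndMax();
        System.out.println(fds == null ? "not available on this platform"
                : "open fds: " + fds[0] + " / max " + fds[1]);
    }
}
```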
Also, why are you running JBoss with -client instead of -server? (Not that I think it will help in this case, just a general question).
You could try adding the command-line option -verbose:gc, which should print GC events and heap sizes to stdout. Pipe stdout to a file and see if the high CPU times line up with a major GC.
I remember having similar issues with JBoss on Windows. Periodically the CPU would go to 100%, and the memory usage reported by Windows would suddenly drop to something like 2.5 MB, far less than JBoss could possibly run in, and after a few seconds build itself back up, as if the entire server came down and restarted itself. I eventually tracked my issue down to a prepared statement cache in Apache Commons that never expired.
If it does seem to be a memory issue, then you can start taking periodic heap dumps and comparing the two, or use something like JProbe Memory profiler to track everything.
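Periodic heap dumps can be taken with `jmap -dump:live,format=b,file=<file> <pid>`, or from inside the JVM through `HotSpotDiagnosticMXBean.dumpHeap`, as in the sketch below. It assumes a HotSpot JVM (JDK 7+ for `getPlatformMXBean(Class)`); `HeapDumper` and the `heap-<millis>.hprof` naming are hypothetical. Two dumps taken hours apart can then be compared in a tool such as the Eclipse Memory Analyzer.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Writes HPROF heap dumps of the current JVM. */
final class HeapDumper {

    /**
     * Writes a heap dump into dir and returns its path. With live=true only reachable objects are
     * dumped. The target file must not exist yet; some JDK versions also require the .hprof suffix.
     */
    static Path dump(Path dir, boolean live) {
        Path file = dir.resolve("heap-" + System.currentTimeMillis() + ".hprof");
        try {
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class)
                    .dumpHeap(file.toString(), live);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    public static void main(String[] args) {
        // e.g. run once early on and once when things get slow, then compare the two dumps
        System.out.println("wrote " + dump(Paths.get(args.length > 0 ? args[0] : "."), true));
    }
}
```

Dumps of a large heap take time and pause the application while they are written, so schedule them with care on a production server.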
