I have a server with 4 CPUs and 16 GB of RAM.
A WebLogic Admin Server, two managed servers, and a Tomcat server are running on this Ubuntu machine.
Resource utilization explodes at times, which is very unusual. This has never happened before, and I think it has something to do with the Java parameters I used.
Have a look at this:
Weblogic Cluster:
Admin Server : qaas-01
Managed Servers : qams-01, qams-02
In the image below you can see that the Java processes associated with the servers above are multiplying and consuming too much memory.
I figured out that this is more general and not specific to WebLogic.
A lot of processes are behaving the same way.
In the picture below it is the Apache Tomcat and Jenkins slave processes that are replicating and consuming memory.
Can anyone help me identify the real issue?
This question is quite broad, so start by looking into why it may be happening. Also post your JVM flags, and mention whether you changed anything that may be causing this.
First you need to figure out what is taking up your CPU time.
Check the WebLogic admin console to generate a stack trace and see what is going on. You may need to sit and watch the CPU so you can capture it when it spikes. You can also force a stack trace using jstack. To get a Java stack trace you may need to sudo and run it as the user that owns the server process; otherwise you get an OS-level thread dump, which may not be as useful. Read about jstack.
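For example, a rough sketch (the PID placeholder and the weblogic account name are assumptions; use whichever user actually owns the JVM):
# Find the PID of the managed server JVM
ps -ef | grep '[j]ava'
# Take a thread dump as the owning user (account name here is just an assumption)
sudo -u weblogic jstack <PID> > qams-01_threads.txt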
If the above does not give enough info as to why the CPU spiked, then since this is Ubuntu you can run:
timeout 20 strace -cvf -p {SERVER PID HERE} -o strace_digest.txt
This will run strace for 20 seconds and report on which OS calls are being made most frequently. This can give you a hint as to what is going on.
Enable and check the garbage collection log and see how often GC runs; the JVM may not have enough memory. See if there is a correlation between GC runs and the CPU spikes.
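On a pre-Java 9 HotSpot JVM (which these servers most likely run), GC logging can be enabled with flags along these lines; the log path is only an example, added to the server's start arguments:
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xloggc:/tmp/qams-01_gc.log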
I don't think there is a definitive way to help you solve CPU spike by looking at top, but above is a start to get you debugging.
Related
We are having trouble with Tomcat 5.5, which stops at night on our production servers (Linux CentOS 4.8), and we have no idea why it stops...
There is nothing in Tomcat's catalina.out or in any application log.
We tried different things to find why the server stops:
configure Tomcat to be able to generate a core dump
instrument the System.exit() method with Javassist to find out whether it was called
add a shutdown hook to the JVM (with Runtime.getRuntime().addShutdownHook())
None of them worked: we have no core dump, and neither System.exit() nor the shutdown hook was called.
My conclusions are:
The JVM is not terminated properly but crashes without any log.
Any idea or log to read to find why Tomcat stops?
1) Make sure you know where stderr is redirected and check if anything got printed there.
2) Check the memory limits on Tomcat and how much free memory the system has. Review the Linux system logs under /var/log to see if anything suspicious happened at that time. For example, the kernel can kill a process (almost) without a trace if the system is running low on memory.
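For example, to check whether the kernel's OOM killer was involved (the log file name is the CentOS default):
# Look for OOM-killer activity around the time Tomcat disappeared
dmesg | grep -iE 'out of memory|killed process'
sudo grep -iE 'out of memory|killed process' /var/log/messages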
We've run 5.5 in production for years and never had any unexplained shutdowns, FWIW.
This worked for me.
As suggested in other answers, I checked the system logs in /var/log/messages but got permission denied. So I used the dmesg command instead and found this in the logs:
"Out of memory: Kill process 14606 (java) score 106 or sacrifice child".
In the output I also noticed that free swap was 0 K, and I ran top to confirm it. So somehow there was high memory usage, which caused the OS to kill my Tomcat process.
After spending hours, I finally found the reason.
ps -ef | grep tomcat showed that there were several Tomcat processes running for the same application. It seems that earlier Tomcat shutdowns had not completed successfully, and for some reason the processes were not killed even after the shutdown, which was causing the high memory usage.
So I killed all the running Tomcat processes with kill, and the swap memory was freed.
Started Tomcat again, and it worked fine. :)
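A sketch of that clean-up (the bracket in the grep pattern just keeps grep from matching itself):
# List leftover Tomcat JVMs; the PID is in the second column of the output
ps -ef | grep '[t]omcat'
# Terminate each stale instance; use kill -9 only if the plain kill is ignored
kill <PID>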
Tomcat 7 has an option to run under a security manager that prevents calls such as System.exit(): http://ci.apache.org/projects/tomcat/tomcat7/docs/security-manager-howto.html
Maybe there's a similar option for version 5.5; check the documentation.
There are also options to keep the output on the same console that you use to start Tomcat. On Unix-based systems this output is redirected to the logs; on Windows it stays on the console if not redirected.
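For example, with the standard Tomcat scripts (assuming a default layout):
# Foreground mode: output stays on the console you started Tomcat from
$CATALINA_HOME/bin/catalina.sh run
# Normal background start on Unix: stdout/stderr end up in logs/catalina.out
$CATALINA_HOME/bin/startup.sh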
Most probably there is a stack overflow error; this is typical Tomcat behavior when that happens. For example, you're trying to serialize beans with cyclic dependencies to JSON or XML without handling the cycles.
Every time I have had this issue (several times), it has been exactly this. All other kinds of stops are usually logged properly (OutOfMemoryError, etc.).
This type of stop leaves no trace anywhere.
I just can't figure out why I get this error. It does not always appear, but once it does, my application refuses to accept connections: it can't create new socket threads, nor the other threads I create in my Java application (for some of them I use a thread pool).
top and htop show me that ~900 MB of 2048 MB is used,
and there is also enough heap memory, about 200 MB free.
cat /proc/sys/kernel/threads-max outputs:
1196032
Also, everything worked fine a few days ago. It's a multiplayer online game, and we had over 200 users online (~500 threads in total). But now, even with 80 users online (~200 threads), after 10 minutes or a few hours my application somehow breaks with this OutOfMemoryError. When that happens I restart my application, and again it works only for a short period of time.
I am very curious whether the JVM can act strangely on a VPS, since other VPSes on the same physical machine also use the JVM. Is that even possible?
Is there some sort of limit set by the provider that is not visible to me?
Or is there some sort of server attack?
I should also mention that around the time this error occurs, munin sometimes fails to log data for about 10 minutes; looking at the graph images there is just white space, as if munin were not working at all. And again, htop tells me there is about 1 GB of memory free at that time.
It might also be the case that I somehow introduced a bug in my application and started getting this error after an update. But even so, where do I begin debugging?
Try increasing the thread stack size (-Xss).
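A minimal sketch of passing that flag (game-server.jar is a hypothetical name; pick a value that suits your thread count):
# Example only: launch with an explicit 1 MB stack per thread
java -Xss1m -jar game-server.jar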
You seem to host your app on some remote VPS server. Are you sure the server, not your development box, has sufficient RAM? People very often confuse their own machine with the remote machine.
If Bash is running out of memory too, it is obviously a system memory issue, not an application memory issue. Post the results of free -m and ulimit -a on the remote machine to get more data.
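That is, run these on the VPS itself:
free -m     # overall RAM and swap usage, in megabytes
ulimit -a   # per-user limits, including max user processes and stack size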
If you suspect your provider of using a trojanized htop, free, or ulimit, you can test the real available memory with a simple C program in which you malloc 70-80% of your available RAM and write random bytes into it, in no more than 10 lines of ANSI C. You can compile it statically on your own box to avoid any crooked libc and then transfer it with scp. That said, I have heard rumors of VPS providers giving less than promised, but I have never encountered it.
Well, moving from a VPS to a dedicated server solved my problem.
Additionally, I found this:
https://serverfault.com/questions/168080/java-vm-problem-in-openvz
This might be exactly the case, because the VPS I had used a really low value for "privvmpages". It seems there really is some weird JVM behaviour on a VPS.
As I already wrote in the comments, at times even other programs (ls, top, htop, less) were not able to start, although enough memory was available/free.
And the provider really did make some changes to their system.
Thank you everyone for the very fast replies and for helping me solve this mystery.
You should try the JRockit VM; it works perfectly on my OpenVZ VPS and consumes much less memory than the Sun/Oracle JVM.
This is my situation and I do not know what I can check next to solve my problem.
I have a Java web application running on Tomcat on a Linux server.
The application is very slow.
The top command shows that the CPU load for the Java process is very high; it reaches more than 1000 percent.
The dstat command shows that the disk write rate is much higher than the read rate.
And I cannot restart the application :(
What can I do now?
Well, unless you can restart something, you can't fix anything.
You've got to analyse what is going on. Do we know it's the app that's at fault? (You don't say what else is deployed on the server.) But supposing it is known to be at fault, you need to look at it in some detail.
The busy disk writes are a bit suggestive: is it possible that a lot of diagnostic trace is being written? Or is it possible there's a memory leak and you're getting paging?
There are many performance analysis tools out there, you may need to get into some detailed analysis.
I have a production web application (Struts, iBatis, Hibernate) running in Tomcat that hangs while serving requests after 6-7 days of running, but runs fine again after I take a thread dump.
I have a hard time figuring out why that is the case.
I was just wondering whether anyone else has ever encountered something similar.
Maybe this will help you find the cause of your problem.
I enabled JMX on Tomcat
(set these optional VM arguments when starting Tomcat):
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=30188 (whatever port you want jmx to run on for tc)
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
I then wrote a little app that monitors memory usage (via JMX) and notifies me if memory usage goes over, say, 80%.
That way I know as soon as something starts to go wrong, and I can get a histogram of in-memory objects (see http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html for how to get one).
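For example (the PID is the Tomcat JVM's, e.g. from jps):
# Class histogram of the live heap: instance counts and bytes per class
jmap -histo:live <PID> | head -n 30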
In the end it turned out that one of my EJB QL queries was causing a huge amount of memory to be used.
Hope it helps in some way...
First of all, try to reproduce this in a test environment. You can use JMeter to stress the app. You can start Tomcat with -verbose:gc and -XX:+PrintGCDetails, which will give you more insight into what happens while GC runs. Then, when the site is not responding, take a thread dump, and if that unblocks the site, have a look at the GC details for more info.
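One way to pass those flags to Tomcat, as a sketch (the log path is only an example; export this in the shell that runs catalina.sh, or in bin/setenv.sh on newer Tomcats):
# GC details go to a separate file instead of cluttering catalina.out
export CATALINA_OPTS="$CATALINA_OPTS -verbose:gc -XX:+PrintGCDetails -Xloggc:/tmp/tomcat_gc.log"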
We've been debugging this JBoss server problem for quite a while. After about 10 hours of work, the server goes into 100% CPU panic attacks and just stalls. During this time you cannot run any new programs, so you can't even run kill -QUIT to get a stack trace. These high 100% SYS CPU loads last 10-20 seconds and repeat every few minutes.
We have been working on this for quite a while. We suspect it has something to do with GC, but cannot confirm it with a smaller program. We are running on 32-bit i386, RHEL5, and Java 1.5.0_10 using -client and the ParNew GC.
Here's what we have tried so far:
We limited the CPU affinity so we can actually use the server when the high load hits. With strace we see an endless loop of SIGSEGV followed by the signal return.
We tried to reproduce this with a Java program. It's true that SYS CPU% climbs high with WeakHashMap or when accessing null pointers. The problem was that fillInStackTrace took a lot of user CPU%, which is why we never reached 100% SYS CPU.
We know that after 10 hours of stress, GC goes crazy and full GC sometimes takes 5 seconds. So we assume it has something to do with memory.
jstack during that period showed all threads as blocked. pstack during that time occasionally showed a MarkSweep stack trace, so we can't be sure about this either. Sending SIGQUIT yielded nothing: Java dumped the stack trace AFTER the SYS% load period was over.
We're now trying to reproduce this problem with a small fragment of code so we can ask Sun.
If you know what's causing it, please let us know. We're open to ideas; we are clueless, and any idea is welcome :)
Thanks for your time.
Thanks to everybody for helping out.
Eventually we upgraded (only half of the Java servers) to JDK 1.6 and the problem disappeared. Just don't use 1.5.0_10 :)
We managed to reproduce these problems just by accessing null pointers (this boosts SYS instead of USR and kills the entire Linux box).
Again, thanks to everyone.
If you're certain that GC is the problem (and it does sound like it based on your description), then adding the -XX:+HeapDumpOnOutOfMemoryError flag to your JBoss settings might help (in JBOSS_HOME/bin/run.conf).
You can read more about this flag here. It was originally added in Java 6, but was later back-ported to Java 1.5.0_07.
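A sketch of what that looks like in run.conf (your existing JAVA_OPTS line will differ):
# JBOSS_HOME/bin/run.conf -- append the flag to whatever options are already set
JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"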
Basically, you will get a "dump file" if an OutOfMemoryError occurs, which you can then open in various profiling tools. We've had good luck with the Eclipse Memory Analyzer.
This won't give you any "free" answers, but if you truly have a memory leak, then this will help you find it.
Have you tried profiling the application? There are some good profiling tools that can run on production servers. They should tell you whether GC is running into trouble and with which objects.
I had a similar issue with JBoss (JBoss 4, Linux 2.6) last year. I think in the end it did turn out to be related to an application bug, but it was definitely very hard to figure out. I would keep trying to send a 'kill -3' to the process, to get some kind of stack trace and figure out what is blocking. Maybe add logging statements to see if you can figure out what is setting it off. You can use 'lsof' to figure out what files it has open; this will tell you if there is a leak of some resource other than memory.
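As a sketch of both checks (use the JBoss JVM's PID):
# SIGQUIT makes the JVM print a thread dump to its console/stdout log
kill -3 <PID>
# Count open file descriptors to spot a leak of files, sockets, etc.
lsof -p <PID> | wc -l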
Also, why are you running JBoss with -client instead of -server? (Not that I think it will help in this case, just a general question).
You could try adding the command-line option -verbose:gc, which should print GC activity and heap sizes to stdout. Pipe stdout to a file and see whether the high-CPU times line up with a major GC.
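For a JBoss started from run.sh, that could look roughly like this (the log path is an example):
# Add -verbose:gc to JAVA_OPTS (e.g. in run.conf), then capture the console output
./run.sh > /tmp/jboss_console.log 2>&1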
I remember having similar issues with JBoss on Windows. Periodically the CPU would go to 100%, the memory usage reported by Windows would suddenly drop to something like 2.5 MB (far too little to run JBoss), and after a few seconds it would build itself back up, as if the entire server had come down and restarted itself. I eventually tracked my issue down to a prepared-statement cache in Apache Commons that never expired.
If it does seem to be a memory issue, you can start taking periodic heap dumps and comparing them, or use something like the JProbe memory profiler to track everything.