I am running a Java process with -Xmx2000m. The host OS is Linux (CentOS), JDK 1.6 update 22. Lately I have been seeing strange behavior in the process: it becomes totally unresponsive for no apparent reason, with no logs, no errors, nothing. I am using jconsole to monitor the process; heap and perm gen are not full, and threads and loaded classes are not leaking.
Explanation anyone?
I doubt anyone can give you an explanation since there are lots of possible reasons and not nearly enough information. However, I suggest that you jstack the process once it's hung to figure out what the threads are doing, and take it from there. It sounds like a deadlock or thrashing of some sort.
Do a thread dump. If you have access to the foreground process on Linux, use Ctrl-\. Or use jstack to dump the stacks remotely. Or you can poke it through JMX via jconsole at MBeans/java.lang/Threading/Operations/dumpAllThreads.
Without knowing more about your app, it's hard to speculate about the cause. Presumably your threads are either a) blocked or b) exited. If they are blocked, they could be waiting on I/O from a database or another operation, or they could be waiting on a lock or monitor (deadlocked). If a deadlock exists, the thread dump will tell you which threads are deadlocked and on which locks, and (in Java 6) annotate the stack with where the locks were taken. You can also search for deadlocks with the JMX method, available through jconsole at MBeans/java.lang/Threading/Operations/find[Monitor]DeadlockedThreads().
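If attaching jstack or jconsole isn't convenient, the same information is available programmatically through java.lang.management. Here is a minimal sketch (the class name is mine, and it assumes Java 6 for dumpAllThreads): run it inside the JVM you want to inspect, for example from a diagnostic thread or admin hook, or wire the same MBean calls over a remote JMX connection.

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ThreadDumper {
        public static void dump() {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();

            // Full thread dump, including locked monitors and synchronizers (Java 6+)
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                System.out.print(info);
            }

            // Deadlock check: returns null when no threads are deadlocked
            long[] deadlocked = threads.findDeadlockedThreads();
            if (deadlocked != null) {
                System.out.println("Deadlocked threads:");
                for (ThreadInfo info : threads.getThreadInfo(deadlocked, true, true)) {
                    System.out.print(info);
                }
            }
        }

        public static void main(String[] args) {
            dump();
        }
    }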
Or your threads may have received unhandled exceptions and exited. Check out Thread's UncaughtExceptionHandler or (better) use Executors from java.util.concurrent.
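As a rough illustration of the uncaught-exception angle, this sketch (the handler body is just an example; replace System.err with your logging framework) registers a default handler so a thread dying from an unhandled exception leaves a trace in your logs rather than quietly disappearing:

    public class ExceptionLogging {
        public static void main(String[] args) throws Exception {
            // Register once, early in application startup
            Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
                public void uncaughtException(Thread t, Throwable e) {
                    System.err.println("Thread " + t.getName() + " died with: " + e);
                    e.printStackTrace();
                }
            });

            // Demo: this thread's RuntimeException is now reported through the handler
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    throw new RuntimeException("boom");
                }
            }, "worker-1");
            worker.start();
            worker.join();
        }
    }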
And finally, the other classic source of pauses in Java is GC. Run with -verbose:gc and other GC flags to see if it's doing a full GC collection. You can also turn this on dynamically in jconsole by flipping the flag at MBeans/java.lang/Memory/Attributes/Verbose.
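The same Verbose attribute can also be flipped from code, if that is easier than clicking through jconsole. A minimal sketch (assuming you can add a line to the application or expose it through some admin hook):

    import java.lang.management.ManagementFactory;

    public class VerboseGcSwitch {
        public static void main(String[] args) {
            // Equivalent to setting MBeans/java.lang/Memory/Attributes/Verbose in jconsole:
            // the JVM starts printing -verbose:gc style lines for each collection.
            ManagementFactory.getMemoryMXBean().setVerbose(true);
        }
    }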
Agree with aix, but would like to add a couple of recommendations.
1. Check your system. Run top to see whether the system itself is healthy: CPU is not at 100% and memory is available. If not, fix that first.
2. The application may freeze as a result of a deadlock. Check for this.
OK, here are some updates I wanted to share:
There is an incompatibility between NPTL (Linux's Native POSIX Thread Library) and the Java 1.6+ JVM. A random bug causes the JVM to hang and eat up 100% CPU.
To work around it, set LD_ASSUME_KERNEL=2.4.1 before running the JVM (export LD_ASSUME_KERNEL=2.4.1). This disables NPTL: problem solved!
But for compatibility reasons, I'm still looking for a solution that works with NPTL.
Threads can be traced using jvisualvm and jconsole, and deadlocks can be avoided too. Note that there are several network services, each with its own thread pool, and they all become unreachable.
Here is a jvisualvm screenshot of the process right before the crash:
http://www.jadyounan.com/wp-content/uploads/2010/12/process.png
Could you elaborate more on what you are doing? 2000 MB of heap is rather a lot.
Related
If a Java process hangs due to a bug in JNI (for example a deadlock in native code), can it result in the entire JVM being blocked, i.e. all processes and threads getting blocked?
"due to a bug in JNI": Yes. If you call into native code, a bug there can easily bring down the entire JVM (or block everything).
No. The thread or threads in deadlock will remain blocked, but other threads can run independently in other programs or even in the same program. Obviously deadlocks should be avoided whenever possible, but the affected threads will only be the threads in deadlock and any and all threads waiting on the completion of those threads.
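A minimal sketch of that point (all names are made up): threads t1 and t2 deadlock on each other, while the unrelated worker thread keeps printing happily. Only the two deadlocked threads are stuck.

    public class DeadlockDemo {
        private static final Object LOCK_A = new Object();
        private static final Object LOCK_B = new Object();

        public static void main(String[] args) {
            new Thread(new Runnable() {
                public void run() { lockBoth(LOCK_A, LOCK_B); }
            }, "t1").start();
            new Thread(new Runnable() {
                public void run() { lockBoth(LOCK_B, LOCK_A); }
            }, "t2").start();
            new Thread(new Runnable() {
                public void run() {
                    // Unaffected by the deadlock: keeps running indefinitely
                    while (true) {
                        System.out.println("worker still running");
                        try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
                    }
                }
            }, "worker").start();
        }

        // Acquires the two locks in the given order; opposite orders deadlock
        private static void lockBoth(Object first, Object second) {
            synchronized (first) {
                try { Thread.sleep(100); } catch (InterruptedException e) { }
                synchronized (second) {
                    System.out.println("got both locks");
                }
            }
        }
    }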
Normally every application has its own JVM instance running, so you could not crash other applications by crashing your current JVM. However, some applications share one JVM, a web server for instance.
Another scenario would be if it crashes something at the operating-system level; then everything related to that will go down.
Is it possible to kill a thread that is in state RUNNING in a non-programmatic way?
I know that the top command in *nix can show threads. Can I kill a thread at the OS level?
I'd like to know if there is a way to link a thread to a process so I can kill only that specific thread and not the application.
We had a bug in our code that kept a thread in state RUNNING in a synchronized method. The thread kept the lock on the object "hanging" the application.
The bug is fixed, but I wonder if it is possible.
The short answer is "maybe, but you should not and most of the time it won't work either".
The long answer is:
"Maybe..."
Some JVM implementations map Java threads to OS threads and some do not. If the JVM maps to native OS threads, you might be able to kill such a thread with a process tool that the OS provides (like kill on *nix). If the JVM uses green threads, meaning it doesn't map a Java thread to an OS-level thread, then you are basically out of luck with OS-level tools; luckily only very few JVM implementations do this. An approach that can be used regardless of how the JVM organizes its threads is the Java debugger. This article describes the procedure: http://www.rhcedan.com/2010/06/22/killing-a-java-thread/.
"but you should not do it"
Killing a thread at the OS level will almost certainly leave the JVM in an undefined state (read "the JVM might crash, or delete all files on your disk, or do whatever it fricking pleases to do"). Even when going the debugger way, only a very small number of Java applications (read "no application made on this planet") will properly handle an outside application killing one of their threads. As a result these applications will be put in an undefined state (read "the application might crash, or delete all files on your disk, or do whatever it fricking pleases to do").
"and most of the time it won't work either"
If the thread is really stuck on some blocked I/O etc., then killing it won't work; it will simply not respond. If a program is stuck, it's usually better to kill the whole program, find the issue, and fix it rather than killing a single thread.
For all your doubts on killing a thread, refer to this:
http://download.oracle.com/javase/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html
On Linux, there is a tkill(int tid, int sig) system call, similar to kill.
On Windows, Process Explorer can do it from the GUI; I don't know whether there is anything with a CLI.
I have an app server process that's constantly at 100% CPU. By constantly I mean hours, or even days.
I know how to generate a heap/thread dump, but I'm looking for more dynamic information. I would like to know what is using so much CPU in there. There are tens (or probably 100+) threads. I know what those threads are, but I need to know which of them are using my CPU so much.
How can I obtain this information?
Use a profiler. There is one included in VisualVM which comes with the Oracle JDK.
An advanced commercial one (trial licenses available) is YourKit.
By creating thread dumps. You can use jstack to connect to a running Java process and get a thread dump. If you take two or more thread dumps over a period of time, you can, by comparing them, figure out which threads are actively using CPU. Typically the threads in the RUNNABLE state are the ones you need to focus on.
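In addition to comparing dumps by hand, per-thread CPU time is available through ThreadMXBean, which makes the busy threads stand out directly. A minimal sketch (the class name is mine; run it inside the target JVM, e.g. from a watchdog thread, or adapt it to a remote JMX connection):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class CpuHogs {
        public static void printThreadCpuTimes() {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            if (!threads.isThreadCpuTimeSupported()) {
                System.out.println("Per-thread CPU time not supported on this JVM");
                return;
            }
            for (long id : threads.getAllThreadIds()) {
                ThreadInfo info = threads.getThreadInfo(id);
                // getThreadCpuTime returns -1 if the thread has died or measurement is disabled
                long nanos = threads.getThreadCpuTime(id);
                if (info != null && nanos >= 0) {
                    System.out.println(info.getThreadName() + ": " + (nanos / 1000000L) + " ms CPU");
                }
            }
        }

        public static void main(String[] args) {
            printThreadCpuTimes();
        }
    }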
I personally use YourKit for this.
VisualVM also has some profiling capabilities, but I haven't used them.
On Linux, try kill -3 <pid>: it will make the JVM print a thread dump. You can analyze this to see what is happening in the Java process.
I have a Java program for doing a set of scientific calculations across multiple processors by breaking it into pieces and running each piece in a different thread. The problem is trivially partitionable so there's no contention or communication between the threads. The only common data they access are some shared static caches that don't need to have their access synchronized, and some data files on the hard drive. The threads are also continuously writing to the disk, but to separate files.
My problem is that sometimes when I run the program I get very good speed, and sometimes when I run the exact same thing it runs very slowly. If I see it running slowly and ctrl-C and restart it, it will usually start running fast again. It seems to set itself into either slow mode or fast mode early on in the run and never switches between modes.
I have hooked it up to jconsole and it doesn't seem to be a memory problem. When I have caught it running slowly, I've tried connecting a profiler to it but the profiler won't connect. I've tried running with -Xprof but the dumps between a slow run and fast run don't seem to be much different. I have tried using different garbage collectors and different sizings of the various parts of the memory space, also.
My machine is a Mac Pro with a striped RAID partition. The CPU usage never drops off whether it's running slowly or quickly, which you would expect if threads were spending too much time blocking on reads from the disk, so I don't think it is a disk read problem.
My question is, what types of problems with my code could cause this? Or could this be an OS problem? I haven't been able to duplicate it on a Windows machine, but I don't have a Windows machine with a similar RAID setup.
You might have threads that have gone into an endless loop.
Try connecting with VisualVM and use the Thread monitor.
https://visualvm.dev.java.net
You may have to connect before the problem occurs.
I second that you should be doing this with a profiler, looking at the threads view: how many threads there are, what states they are in, etc. It might be an odd race condition happening every now and then. It could also be the case that instrumenting the classes with profiler hooks (which causes a slowdown) sorts the race condition out, and you will see no slowdown with the profiler attached :/
Please have a look at this post, or rather its answer, where a cache contention problem is mentioned.
Are you spawning the same number of threads each time? Is that number less than or equal to the number of hardware threads available on your platform? That number can be checked, or estimated with fair accuracy; see the snippet below.
Please post any findings!
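For the point about thread counts above, the number of hardware threads the JVM sees can be read directly (a trivial sketch; the class name is mine):

    public class CoreCount {
        public static void main(String[] args) {
            // A sensible upper bound for CPU-bound worker threads
            System.out.println(Runtime.getRuntime().availableProcessors() + " hardware threads available");
        }
    }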
Do you have a tool to measure CPU temperature? The OS might be throttling the CPU to deal with temperature issues.
Is it possible that your program is sometimes being paged to disk? In that case, you will need to look at the memory usage of the operating system as a whole, rather than just your program. I know from experience there is a huge difference in runtime performance when memory is being continually paged to disk and back.
I don't know much about OS X, but on Linux the "free" command is useful for this purpose.
Another issue that might cause this slowdown is log files. I've seen logging code that slowed the system down incrementally as the log files grew. It's possible that your threads are synchronizing on a log file which is growing in size; when you restart your program, a new log file is used.
I've been running Tomcat 5.5 with Java 1.4 for a while now with a huge webapp. Most of the time it runs fine, but sometimes it will just hang, with no exception generated and no apparent way of getting it to run again other than restarting Tomcat. The Tomcat instance is allowed a gigabyte of memory on the heap, but rarely exceeds 300 MB. Has anyone else run into this issue, and is there a solution for it?
For clarification: I determined how much memory it is using via Task Manager and via Eclipse (I've also tried running it outside of Eclipse, but eventually get the same problem, though it takes a little longer). With Eclipse, I look at the memory allocated via its little (optional) memory pane and the amount allocated to javaw.exe via Task Manager. I use the Sysdeo(?) Tomcat plugin for Eclipse.
For any JVM process, force a thread dump. On Windows, this can be done with Ctrl-Break, I believe, in the console window.
In *nix, it is almost always "kill -3 jvm-pid".
This may show if you have threads waiting on db connection pool/thread pool, etc.
Another thing to check is how many connections you currently have to the JVM: either use netstat or a Sysinternals utility such as tcpconn/TCPView (google it).
Also, try running with the -verbose:gc JVM flag; for Sun's JVM, run it like "java -verbose:gc". This will show your garbage collections. If it is collecting a lot (full collections especially) then you probably have a memory leak. Full collections are costly, especially on a large heap like that.
How are you determining that only 300 MB are being used?
It sounds like you're hitting a deadlock.
If you can reproduce it in a dev environment then try attaching a debugger once it's happened. Take a look at your threads and see if you have any deadlocks.
If you can't get a debugger to attach you should be able to generate a thread dump, as Dustin pointed out.
Try increasing the logging sensitivity for the Tomcat application server.
http://tomcat.apache.org/tomcat-5.5-doc/logging.html
You can increase the sensitivity to FINEST or ALL for most of them for a few days and see if that helps you catch anything.
I agree with creating multiple thread dumps and viewing them, for example through this: Thread Dump Analyzer