We've got a Spring webapp (on Tomcat 7) that has grown a bit over time and is very slow to shut down, which hurts the turnaround of our continuous delivery.
My suspicion is that some bean is blocking (or taking very long) in its @PreDestroy method.
So far I've ruled out threads or thread pools that aren't shut down correctly: I gave distinct names to every pool, thread and timer and made sure each one is either a daemon thread or is shut down properly.
Has anybody ever solved a situation like this and can give me a hint on how to cope with it?
BTW: killing the Tomcat process is not an option; we really need a clean shutdown for our production system.
Profiling would be the nuclear option. It's probably easy to get a picture of what's happening just from thread dumps, especially if threads are simply blocked, since that state will be long-lived. If you take two dumps separated by a few seconds and they show the same or similar output for one or more threads, then those threads are probably the bottleneck. You can get a thread dump using jstack or "kill -3" (on a sensible operating system).
And if you're on Windows: selecting the Java console window and hitting Ctrl+Pause will dump the threads to that window; just hit Enter to resume execution.
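If attaching jstack on the build machine is awkward, the same kind of dump can also be taken programmatically via ThreadMXBean. A minimal sketch (the class name and the 5-second interval are just illustrative) that you could call from a shutdown hook to capture two dumps for comparison:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    // Dumps all thread stacks to stderr; ThreadInfo.toString() includes the
    // thread state, held locks and a (truncated) stack trace.
    static void dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.err.print(info);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        dumpAllThreads();
        Thread.sleep(5000);   // a few seconds apart, as suggested above
        dumpAllThreads();     // threads that look the same in both dumps are the suspects
    }
}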
Related
TL;DR: Is there a foolproof (!) way I can detect from my master JVM that my slave JVM spawned via 2 intermediate scripts has experienced an OutOfMemory error on Linux?
Long version:
I'm running a sort of application launcher. Basically it receives some input and reacts by spawning a slave Java application to process said input. This happens via a Python script (to correctly handle remote kill commands), which in turn calls a bash script (generated by Gradle, which sets up the classpath) to actually spawn the slave.
The slave contains a worker thread and a monitor thread that makes callbacks to a remote host for status updates. If status updates fail to occur for a set amount of time, the slave gets killed by the launcher. The reason for it not responding CAN be an OutOfMemoryError, but it can also be something else. I need to differentiate an OutOfMemoryError in the slave from some other error which caused it to stop working.
I don't just want to monitor memory usage and declare "ok, that's enough" once it reaches, say, 90%. It may very well be that the GC succeeds in cleaning up enough for the workload to finish. I only want to know if it failed to clean up and the JVM died because not enough memory could be freed.
What I have tried:
Use the -XX:OnOutOfMemoryError flag as a JVM option for the slave, pointing it at a script which creates an empty flag file. The launcher then checks for the existence of the flag file if the slave died. This worked like a charm on Windows, but did not work at all on Unix, because there is a funky bug which causes the command executed on OutOfMemoryError to require the same amount of memory as the slave's Xmx. See https://bugs.openjdk.java.net/browse/JDK-8027434 for the bug. => Solution discarded because the slave needs the entire memory of the machine.
try {
    longWork();
} catch (OutOfMemoryError e) {
    createOomFlagFile();
    System.exit(100);
}
This does work in some cases. However, there are also cases where it does not: the monitor thread simply stops sending status updates, no exception occurs and no OOM flag file gets created. I know from SSHing onto the machine, though, that Java is eating all the memory available on the system and the whole system is slow.
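One more thing worth trying (an assumption on my part, since I can't see the slave's code): if the OutOfMemoryError is thrown on the monitor thread or any thread other than the one running longWork(), the try/catch above never sees it. A JVM-wide default handler covers that case; the flag file path below is purely illustrative:

import java.io.File;
import java.io.IOException;

public class OomGuard {

    // Stand-in for the flag-file routine the launcher already checks for.
    static void createOomFlagFile() {
        try {
            new File("/tmp/slave-oom.flag").createNewFile(); // illustrative path
        } catch (IOException ignored) {
            // best effort: even this may fail once memory is exhausted
        }
    }

    // Call this early in the slave's main().
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            @Override
            public void uncaughtException(Thread t, Throwable e) {
                if (e instanceof OutOfMemoryError) {
                    createOomFlagFile();
                    // halt() skips shutdown hooks, which might themselves need memory
                    Runtime.getRuntime().halt(100);
                }
            }
        });
    }
}

It is still not foolproof (the handler itself needs a little memory to run), but it covers the cases where the error surfaces on a thread you don't control.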
Is there some (elegant) foolproof way to detect this which I am missing?
You shouldn't wait for the OutOfMemoryError. My suggestion is that you track memory consumption from the master application via the Java management beans (JMX) and issue warnings when memory consumption gets critical. I've never done that myself, so I can't be more precise about how to do it, but maybe you can work it out or others here can provide a solution.
Edit: this is the relevant MXBean: http://docs.oracle.com/javase/7/docs/api/java/lang/management/MemoryMXBean.html
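I haven't wired this up across processes myself either, but roughly it could look like the sketch below. It assumes the slave is started with the usual com.sun.management.jmxremote.* system properties and listens on port 9010; the URL, port and 90% threshold are all illustrative assumptions:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SlaveMemoryWatcher {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Proxy for the slave's MemoryMXBean, read over the remote JMX connection
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            MemoryUsage heap = memory.getHeapMemoryUsage();
            double used = (double) heap.getUsed() / heap.getMax();
            if (used > 0.9) {
                System.out.println("Warning: slave heap is " + (int) (used * 100) + "% full");
            }
        } finally {
            connector.close();
        }
    }
}

As the question points out, a high-water-mark warning alone can't distinguish "GC will still cope" from "about to die", so this is a complement to, not a replacement for, an on-OOM flag.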
Is it possible to kill a thread that is in state RUNNING in a non-programmatic way?
I know that the top command on *nix can show threads. Can I kill a specific thread at the OS level?
I'd like to know if there is a way to link a thread to a process, so I can kill only that specific thread and not the whole application.
We had a bug in our code that kept a thread in state RUNNING inside a synchronized method. The thread kept the lock on the object, "hanging" the application.
The bug is fixed, but I wonder whether killing such a thread is possible.
The short answer is "maybe, but you should not and most of the time it won't work either".
The long answer is:
"Maybe..."
Some JVM implementations map Java threads to OS threads and some do not. If the JVM maps each Java thread to a native OS thread, you might be able to kill that thread with a process tool the OS provides (like kill on *nix). If the JVM uses green threads, meaning it doesn't map a Java thread to an OS-level thread, then you are basically out of luck with OS-level tools; luckily only very few JVM implementations do this. An approach that works regardless of how the JVM organizes its threads is to use the Java debugger. This article describes the procedure: http://www.rhcedan.com/2010/06/22/killing-a-java-thread/.
"but you should not do it"
Killing a thread at the OS level will almost certainly leave the JVM in an undefined state (read "the JVM might crash, or delete all files on your disk, or do whatever it fricking pleases"). Even when going the debugger way, only a very small number of Java applications (read "no application made on this planet") will properly handle an outside application killing one of their threads. As a result those applications will be put in an undefined state (read "the application might crash, or delete all files on your disk, or do whatever it fricking pleases").
"and most of the time it won't work either"
If the thread is really stuck in some blocked I/O, killing it won't work; it will simply not respond. If a program is stuck, it's usually better to kill the whole program, find the issue and fix it, rather than to kill a single thread.
For all your doubts on killing a thread, refer to this:
http://download.oracle.com/javase/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html
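That page essentially says: don't kill threads from the outside, make them cooperate. For completeness, a generic sketch of the recommended interruption pattern (not tied to the asker's code):

public class CancellableWorker implements Runnable {

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                doUnitOfWork();
            } catch (InterruptedException e) {
                // a blocking call was interrupted: restore the flag so the loop exits
                Thread.currentThread().interrupt();
            }
        }
    }

    // Placeholder for real, interruptible work; keep units short so
    // interruption is noticed promptly.
    private void doUnitOfWork() throws InterruptedException {
        Thread.sleep(100);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(new CancellableWorker(), "cancellable-worker");
        worker.start();
        Thread.sleep(500);
        worker.interrupt();   // ask the worker to stop; it cannot be forced
        worker.join();
    }
}

Note that this wouldn't have helped with the original bug either: a thread spinning inside a synchronized method only stops if its own code checks the interrupt flag.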
On Linux, there is a tkill(int tid, int sig) system call, similar to kill.
On Windows, Process Explorer can do it from the GUI; I don't know of anything with a CLI.
I am running a Java process with -Xmx2000m; the host OS is Linux (CentOS), JDK 1.6 update 22. Lately I have been experiencing weird behavior in the process: it becomes totally unresponsive with no apparent reason, no logs, no errors, nothing. I am using jconsole to monitor the process; heap and perm memory are not full, and threads and loaded classes are not leaking.
Explanation anyone?
I doubt anyone can give you an explanation, since there are lots of possible causes and not nearly enough information. However, I suggest you jstack the process once it's hung to figure out what the threads are doing, and take it from there. It sounds like a deadlock or thrashing of some sort.
Do a thread dump. If you have access to the foreground process on Linux, use ctrl-\. Or use jstack to dump stack remotely. Or you can actually poke it through JMX via jconsole at MBeans/java.lang/Threading/Operations/dumpAllThreads.
Without knowing more about your app, it's hard to speculate about the cause. Presumably your threads are either a) blocked or b) exited. If they are blocked, they could be waiting for I/O on a database or other operation OR they could be waiting on a lock or monitor (deadlocked). If a deadlock exists, the thread dump will tell you which threads are deadlocked, which lock, and (in Java 6) annotate the stack with where locks have been taken. You can also search for deadlocks with the JMX method, available through jconsole at MBeans/java.lang/Threading/Operations/find[Monitor]DeadlockedThreads().
Or your threads may have received unhandled exceptions and exited. Check out Thread's uncaughtExceptionHandlers or (better) use Executors in java.util.concurrent.
And finally, the other classic source of pauses in Java is GC. Run with -verbose:gc and other GC flags to see whether full GCs are happening. You can also turn this on dynamically in jconsole by flipping the flag at MBeans/java.lang/Memory/Attributes/Verbose.
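The JMX deadlock check mentioned above can also be done in code if clicking through jconsole is inconvenient; a minimal sketch against the local VM (where and how you trigger it is up to you):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() (Java 6+) also covers java.util.concurrent locks;
        // findMonitorDeadlockedThreads() only covers synchronized monitors.
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No deadlocked threads found");
            return;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.println(info.getThreadName() + " is blocked on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}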
Agree with aix, but I would like to add a couple of recommendations.
1. Check your system. Run top to see whether the system itself is healthy, the CPU is not at 100% and memory is available. If not, fix that first.
2. The application may freeze as a result of a deadlock. Check for this.
Ok here are some updates I wanted to share:
There is an incompatibility between NPTL (Linux's Native POSIX Thread Library) and the Java 1.6+ JVM. A random bug causes the JVM to hang and eat up 100% CPU.
To work around it, set LD_ASSUME_KERNEL=2.4.1 before running the JVM (export LD_ASSUME_KERNEL=2.4.1). This disables NPTL: problem solved!
But for compatibility reasons, I'm still looking for a solution that works with NPTL.
Threads can be traced using jvisualvm and jconsole, and deadlocks can be avoided too. Note that there are several network services, each with its own thread pool, and they all become unreachable.
Check the jvisualvm of the process right before the crash.
http://www.jadyounan.com/wp-content/uploads/2010/12/process.png
Could you elaborate more on what you are doing? An Xmx of 2000m is rather a lot.
How can I suspend the execution of a JVM for a configurable amount of time, similar to what happens during a full, serial garbage collection? I want to test some edge cases, but it's difficult to create the exact pauses I need by manually generating garbage plus System.gc().
I'm trying to 'freeze' all the threads in the application, to simulate a situation where an application becomes unresponsive, and then resumes execution. The application is part of a cluster, and I'm trying to debug some leave / join / re-join problems.
The JVM can be suspended from the command line with jdb:
jdb -attach 8787
Initializing jdb ...
> suspend
All threads suspended.
> resume
All threads resumed.
> exit
But this requires the JVM to have been started with -Xdebug ...
There is also a universal way to suspend and resume any process on Linux/Unix:
kill -SIGSTOP PID
kill -SIGCONT PID
See also How to suspend/resume a process in Windows?
When debugging from Eclipse you can suspend all threads; I guess other debuggers allow that too. So you'd need to start your server with the JVM debug options and connect a remote debugger to it.
In my case (running JBoss) I modify the startup script by adding this line:
set JAVA_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n %JAVA_OPTS%
Then in Eclipse: Run -> Debug Configurations -> Remote Java Application -> New
Normally you just need to provide the hostname and the port.
Use a Virtual Machine and suspend its execution outright.
I think you probably want something less invasive, but one approach is to have a controlling thread obtain a lock and have all the other threads also try to obtain that lock; the controlling thread then sleeps for the time you need before releasing it. A slightly less clumsy version of this is to use a semaphore with one permit per thread and have the controlling thread acquire all of them before the other threads each try to acquire a permit.
As I said, that's pretty invasive; you'd have to hack it into your code for each thread.
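For what it's worth, a rough sketch of the semaphore variant (the class and method names are made up for the example; every worker has to call checkpoint() at a safe point in its loop):

import java.util.concurrent.Semaphore;

public class FreezePoint {
    private final Semaphore permits;
    private final int workerCount;

    public FreezePoint(int workerCount) {
        this.workerCount = workerCount;
        this.permits = new Semaphore(workerCount, true); // fair, so the controller isn't starved
    }

    // Workers call this regularly; normally it passes straight through.
    public void checkpoint() throws InterruptedException {
        permits.acquire();
        permits.release();
    }

    // The controlling thread drains every permit, sleeps, then hands them back,
    // so all workers pile up in checkpoint() for roughly pauseMillis.
    public void freezeFor(long pauseMillis) throws InterruptedException {
        permits.acquire(workerCount);
        try {
            Thread.sleep(pauseMillis);
        } finally {
            permits.release(workerCount);
        }
    }
}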
Another approach would be to run your code under a debugger and manually suspend each thread.
I have a web application running in a JBoss application server (but it is not JBoss-specific, so assume it could just as well be Tomcat or any other server). Now I have the problem that one thread seems to be in a deadlock situation: it uses 100% CPU all the time. I have started the server with the debug port enabled and I can connect Eclipse to it. But the problem is: there are a lot of threads running. How can I find the right one? I know the process id (from the Linux top command) but I think this will not help. Do I really have to open each thread separately and check what it is currently doing? Or is there a way to filter the threads for "most active" or something like that in Eclipse?
You can try and generate a thread dump (CTRL+Break as shown in this thread).
Or you could attach a JConsole to the remote session (so leaving Eclipse aside for now), monitor the threads and generate a thread dump.
Screenshot: http://www.jroller.com/dumpster/resource/tdajconsole.png
It seems you need to narrow things down to the code that has the bug: first identify which thread is eating the CPU, then see which code that thread is executing, and at that point you can remote debug.
I would suggest using something like JProfiler, jvisualvm, jconsole or a similar tool. Any of these will give you some insight into what the thread is doing and should let you sort the threads by CPU cycles used, so you can find the offending thread quickly.
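If you'd rather script it than click through a profiler, per-thread CPU time is also available programmatically from inside the affected VM (e.g. via a throwaway diagnostic servlet, which is an assumption about your setup). A rough sketch, sampling for five seconds:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BusiestThreadFinder {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            // CPU time in nanoseconds; -1 if the thread has died or measurement is disabled
            before[i] = threads.getThreadCpuTime(ids[i]);
        }

        Thread.sleep(5000); // sampling interval, chosen arbitrarily

        long worstId = -1;
        long worstDelta = -1;
        for (int i = 0; i < ids.length; i++) {
            long after = threads.getThreadCpuTime(ids[i]);
            if (before[i] >= 0 && after >= 0 && after - before[i] > worstDelta) {
                worstDelta = after - before[i];
                worstId = ids[i];
            }
        }
        if (worstId >= 0) {
            ThreadInfo info = threads.getThreadInfo(worstId);
            if (info != null) {
                System.out.println("Busiest thread: " + info.getThreadName()
                        + " used " + (worstDelta / 1000000L) + " ms of CPU in 5 s");
            }
        }
    }
}

Once you know the thread's name you can set a breakpoint in the code it runs, or match it against the thread list in the Eclipse Debug view.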