Zombie threads eating my brainz (J2EE, Tomcat, Hibernate, Quartz) - java

It is Hallowe'en after all.
Here's the problem: I'm maintaining some old-ish J2EE code, using Quartz, in which I'm running out of threads. jconsole tells me that there are just short of 60K threads when it goes pear-shaped, of which about 100 (!!) are actually running. Intuition and some googling (see also here) suggest that what's happening is something (I'm betting Quartz) is creating unmanaged threads that never get cleaned up.
Several subquestions:
Is there a tool I can easily use to trace thread creation, so I can be certain the issue is really Quartz?
Almost everything I've found about similar problems references WebLogic; is this a false lead for Tomcat?
Does anyone have a known solution?
It's been years since I did J2EE, so I wouldn't be too surprised if this is something that can be solved simply.
Update: the thread count is clearly increasing without bound; see this plot from jconsole.

Try to increase the logging level of org.quartz.simpl.SimpleThreadPool to debug to get more information.
If that does not work, try a logging listener. Quartz has a JobListener interface, which is specified in its tutorial. A listener can help you trace job execution. Maybe jobs just don't finish and get deadlocked.
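For illustration, a minimal listener along those lines might look like the sketch below (the class name and log messages are mine, not from the Quartz tutorial). Register it with the scheduler, e.g. via addGlobalJobListener() in Quartz 1.x or via the ListenerManager in 2.x.
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.JobListener;

// Logs the start and end of every job, so jobs that never finish
// (and keep their worker thread busy) become visible in the log.
public class TracingJobListener implements JobListener {

    public String getName() {
        return "tracingJobListener";
    }

    public void jobToBeExecuted(JobExecutionContext context) {
        System.out.println("Starting " + context.getJobDetail()
                + " on thread " + Thread.currentThread().getName());
    }

    public void jobExecutionVetoed(JobExecutionContext context) {
        System.out.println("Vetoed " + context.getJobDetail());
    }

    public void jobWasExecuted(JobExecutionContext context, JobExecutionException exception) {
        System.out.println("Finished " + context.getJobDetail()
                + (exception != null ? " with error: " + exception.getMessage() : ""));
    }
}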
Configure org.quartz.threadPool.threadCount to put an explicit bound on the Quartz worker pool, so you stop running out of threads.
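A rough sketch combining the two suggestions above (the property keys are Quartz's standard ones; the concrete values are illustrative, and the logging call assumes Log4j 1.x is the backend, which may not match your setup):
import java.util.Properties;

import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.quartz.Scheduler;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzSetup {

    public static Scheduler createScheduler() throws Exception {
        // Turn up logging for Quartz's thread pool. With a log4j.properties
        // file this would be the line:
        //   log4j.logger.org.quartz.simpl.SimpleThreadPool=DEBUG
        Logger.getLogger("org.quartz.simpl.SimpleThreadPool").setLevel(Level.DEBUG);

        // Explicitly bound the worker pool instead of relying on defaults.
        Properties props = new Properties();
        props.setProperty("org.quartz.scheduler.instanceName", "BoundedScheduler"); // illustrative name
        props.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
        props.setProperty("org.quartz.threadPool.threadCount", "10"); // illustrative value; tune for your load
        return new StdSchedulerFactory(props).getScheduler();
    }
}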
update:
Also, you might want to take a thread dump and look at the thread stats. VisualVM has a plugin called TDA, or you can use the Thread Dump Analyzer directly.
Just in case, check the Quartz version to see whether it has a known bug.

Have you had a look with jvisualvm? It gives some more information.
Also, get stack traces to see what the threads are actually waiting on. You might have an aha-feeling right there.
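If you want those stack traces from inside the application itself (say, from a diagnostic servlet) rather than via an external tool, a small sketch like this works on any Java 5+ JVM (the class and method names are mine):
import java.util.Map;

// Prints the name, state, and stack trace of every live thread in this JVM.
public class StackTraceDump {

    public static void dumpAll() {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            System.out.println(thread.getName() + " [" + thread.getState() + "]");
            for (StackTraceElement frame : entry.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}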

Related

How can we take historical thread dumps in WebLogic servers

A Java application process running on a WebLogic server was reported to have had some stuck threads in the past.
Can we get information on those stuck threads from a past time?
If those threads were marked as stuck by WebLogic, they were logged in the WebLogic server's log file. The stack trace is also logged. Have a look at these log files to see where your threads were stuck.
No, you can't. Gone is gone, end of story.
The only thing you can do is figure out whether you can somehow detect such situations while they occur, then collect thread dumps and log the results (a small sketch follows below).
Worst case, you have to give the customers/operators clear instructions on how to gather dumps themselves and make them available to you.
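As a starting point for such detection, here is a minimal sketch (names and interval are mine) of a watchdog that logs a full thread dump periodically; start it as a daemon thread from the application's startup code and redirect the output to a log file in practice:
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Periodically writes a thread dump to stdout so that stuck-thread
// situations are captured while they happen. Requires Java 6+ for
// ThreadMXBean.dumpAllThreads().
public class ThreadDumpWatchdog implements Runnable {

    public void run() {
        ThreadMXBean threadMx = ManagementFactory.getThreadMXBean();
        while (!Thread.currentThread().isInterrupted()) {
            for (ThreadInfo info : threadMx.dumpAllThreads(true, true)) {
                System.out.println(info); // thread state, locks, and (part of) the stack trace
            }
            try {
                Thread.sleep(60000L); // once a minute; adjust as needed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}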

Critical guava memory leak - Workaround needed

Is there any way to workaround the Google Guava r15 memory leak (link to the bug report) in the cache component?
(Without relying that the application server might clean things up and/or considering that the web application will never be restarted/redeployed)
I guess you don't need to care about it. The Tomcat message says
Threads are going to be renewed over time to try and avoid a probable memory leak.
IIUIC it means that once all the old threads are gone, so are all the references to the old version of your classes.
Details: the reason for the thread pooling is the big cost of thread creation. The pooling itself is hacky, as you get a thread that was previously doing something else, and threads are not stateless. Thread creation is only really expensive when you need a lot of threads and never recycle them. There's nothing wrong with renewing all threads every few minutes, so I hoped Tomcat's workaround would solve it perfectly. But that's not the case.
EDIT
I'm afraid I misunderstood something. The linked bug says
It seems that web applications which are using guava cache might face a memory leak.
After several redeployments, the application container crashes or stalls with an OutOfMemoryError.
I thought Tomcat could solve it easily, but for whatever reason it doesn't. So I'm afraid you have to clean the ThreadLocals yourself. This is possible via reflection; the fields concerned are Thread.threadLocals and possibly inheritableThreadLocals. It's a bad hack, and the harder part is making it happen at a moment when nothing can go wrong, i.e., when no application is loaded.
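A rough sketch of that hack (the field names are Sun/Oracle JDK internals and may not exist or be accessible on other JVM versions or vendors, so treat it as a last resort and test it on your exact JVM):
import java.lang.reflect.Field;

// Nulls out the ThreadLocal maps of all live threads via reflection.
// This is the "bad hack" described above: run it only at a safe moment,
// e.g. from a ServletContextListener's contextDestroyed().
public class ThreadLocalCleaner {

    public static void clearAllThreadLocals() throws Exception {
        Field threadLocals = Thread.class.getDeclaredField("threadLocals");
        Field inheritableThreadLocals = Thread.class.getDeclaredField("inheritableThreadLocals");
        threadLocals.setAccessible(true);
        inheritableThreadLocals.setAccessible(true);
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            threadLocals.set(thread, null);
            inheritableThreadLocals.set(thread, null);
        }
    }
}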
EDIT 2 and 3
I guess it's safe to do something like
Striped64.threadHashCode = new ThreadHashCode();
as the contained data is only needed for performance under heavy contention and gets recreated upon use. But according to MRalwasser's comment, it won't help at all, as live threads will still refer to the old value. So there seems to be no way.
As ThreadLocal works by storing data with the threads (rather than using a real Map<Thread, Something>), you'd have to go through all threads and remove the references there. Fooling around with other threads' private fields is a terrible idea, as they are not thread-safe, and also because of visibility issues.
Another thing that might or might not work is my proposal on the issue page. It's just a 20-line patch. Or simply wait; the issue was assigned yesterday.
EDIT 4
Thread locals which don't get used can't cause any problems. AFAIK the only use of this ThreadLocal is in the cache stats. So avoid both CacheBuilder.recordStats and Cache.stats, and Striped64 won't get loaded.
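In other words (a sketch with illustrative names and sizes), build and use the cache without ever touching the stats API:
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class StatsFreeCacheHolder {

    // No recordStats() here, and no stats() calls anywhere else: per the
    // reasoning above, the Striped64-based counters then never get loaded.
    private final Cache<String, Object> cache = CacheBuilder.newBuilder()
            .maximumSize(10000) // illustrative
            .build();

    public Object get(String key) {
        return cache.getIfPresent(key);
    }

    public void put(String key, Object value) {
        cache.put(key, value);
    }
}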
EDIT 5
It looks like it's gonna get fixed finally. From the issue:
Doug fixed this upstream for us and we patched it back into Guava:
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/Striped64.java?revision=1.9
At first glance, his change seems to be identical to mine.
EDIT 6
Finally, this has been marked as fixed and Guava 18.0-rc1 has been announced. It's just sad it took that long, given that the change is the same as mine (9 months ago).
You can use the servlet listener ClassLoaderLeakPreventor (https://github.com/mjiderhamn/classloader-leak-prevention/), which also clears ThreadLocals on undeploy/stop. It also has fixes/workarounds for other common leaks.
Seems to be a drawback of ThreadLocals. You'll get the same problem every time you put an application-level class into a ThreadLocal.
The only workaround is to restart the server on deploy, I guess. I think it's a known issue with Java applications. Are you sure this is the only place that prevents the classloader from being unloaded?

Standalone Java App dies after a few days

We have a Java App that connects via RMI to another Java app.
There are multiple instances of this app running at the same time, and after a few days an instance just stops processing... the CPU is at 0, and I have an extra thread listening on a specific port that helps to shut down the app.
I can connect to that port, but the app doesn't do anything.
We're using Log4j for logging and nothing is written, so there aren't any exceptions being thrown.
We also use c3p0 for the DB connections.
Anyone have ideas?
Thanks,
I would suggest starting with a thread dump of the affected application.
You need to see what is going on on a thread-by-thread basis. It could be that you have a long-running thread, or some other process that is blocking other work from being done.
Since you are running Linux, you can get your thread dump with the following command:
kill -3 <pid>
If you need help reading the output, please post it in your original question.
If nothing is shown from the thread dump, other alternatives can be looked at.
Hum... I would suggest using JMeter to stress the application and taking note of anything weird that might be happening (such as memory leaks, deadlocks and the like). Also review the code for any exceptions that might interrupt the program (or System.exit() calls). Finally, if other people have access to the machine, it makes sense to check whether the process was killed manually somehow.

Java process is hanging for no apparent reason

I am running a Java process with -Xmx2000m; the host OS is Linux (CentOS), JDK 1.6 update 22. Lately I have been experiencing weird behavior in the process: it becomes totally unresponsive for no apparent reason, no logs, no errors, nothing. I am using jconsole to monitor the process: heap and perm memory are not full, and threads and loaded classes are not leaking.
Explanation anyone?
I doubt anyone can give you an explanation since there are lots of possible reasons and not nearly enough information. However, I suggest that you jstack the process once it's hung to figure out what the threads are doing, and take it from there. It sounds like a deadlock or thrashing of some sort.
Do a thread dump. If you have access to the foreground process on Linux, use ctrl-\. Or use jstack to dump stack remotely. Or you can actually poke it through JMX via jconsole at MBeans/java.lang/Threading/Operations/dumpAllThreads.
Without knowing more about your app, it's hard to speculate about the cause. Presumably your threads have either a) blocked or b) exited. If they are blocked, they could be waiting for I/O on a database or some other operation, OR they could be waiting on a lock or monitor (deadlocked). If a deadlock exists, the thread dump will tell you which threads are deadlocked and on which lock, and (in Java 6) it will annotate the stack with where locks have been taken. You can also search for deadlocks with the JMX method, available through jconsole at MBeans/java.lang/Threading/Operations/find[Monitor]DeadlockedThreads().
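The programmatic equivalent of that JMX operation looks roughly like this (a sketch; run it inside the affected JVM, e.g. from a diagnostic endpoint, and on Java 5 use findMonitorDeadlockedThreads() instead):
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Reports threads that are deadlocked on monitors or ownable synchronizers.
public class DeadlockCheck {

    public static void report() {
        ThreadMXBean threadMx = ManagementFactory.getThreadMXBean();
        long[] deadlockedIds = threadMx.findDeadlockedThreads(); // null when no deadlock exists
        if (deadlockedIds == null) {
            System.out.println("No deadlocked threads found");
            return;
        }
        for (ThreadInfo info : threadMx.getThreadInfo(deadlockedIds, true, true)) {
            System.out.println(info); // shows the lock each thread is waiting for and its owner
        }
    }
}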
Or your threads may have received unhandled exceptions and exited. Check out Thread's uncaughtExceptionHandlers or (better) use Executors in java.util.concurrent.
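For the uncaught-exception case, installing a JVM-wide default handler is a few lines (the logging here is just a placeholder for whatever logger you use):
// Installs a default handler so that threads dying from unhandled exceptions
// leave a trace in the log instead of disappearing silently.
public class UncaughtExceptionLogging {

    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(Thread thread, Throwable error) {
                System.err.println("Thread " + thread.getName() + " died with uncaught exception:");
                error.printStackTrace();
            }
        });
    }
}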
And finally, the other classic source of pauses in Java is GC. Run with -verbose:gc and other GC flags to see if it's doing a full GC collection. You can also turn this on dynamically in jconsole by flipping the flag at MBeans/java.lang/Memory/Attributes/Verbose.
Agree with aix, but would like to add a couple of recommendations.
1. Check your system. Run top to see whether the system itself is healthy, the CPU is not at 100%, and memory is available. If not, fix this.
2. The application may freeze as a result of a deadlock. Check this.
Ok here are some updates I wanted to share:
There is an incompatibility between NPTL (Linux's Native POSIX Thread Library) and the Java 1.6+ JVM. A random bug causes the JVM to hang and eat up 100% CPU.
To work around it, set LD_ASSUME_KERNEL=2.4.1 before running the JVM (export LD_ASSUME_KERNEL=2.4.1). This disables NPTL: problem solved!
But for compatibility reasons, I'm still looking for a solution that keeps NPTL.
Threads can be traced using jvisualvm and jconsole, and deadlocks can be avoided too. Note that there are several network services each with separate thread pools, and they all become unreachable.
Check the jvisualvm view of the process right before the crash: http://www.jadyounan.com/wp-content/uploads/2010/12/process.png
Could you elaborate more on what you are doing? 2000m for memory is rather a lot.

How to find problematic thread in Eclipse remote debugger?

I have a web application running in a JBoss application server (but it is not JBoss-specific, so we could also assume it is Tomcat or any other server). Now I have the problem that one thread seems to be in a deadlock situation. It uses 100% CPU all the time. I have started the server with the debug port enabled, and I can connect Eclipse to it. But the problem is: there are a lot of threads running. How can I find the right thread? I know the process id (from the Linux "top" command), but I think this will not help. Do I really have to open each thread separately and check what it is currently doing? Or is there a way to filter the threads for "most active" or something like that in Eclipse?
You can try and generate a thread dump (CTRL+Break as shown in this thread).
Or you could attach a JConsole to the remote session (so leaving Eclipse aside for now), monitor the threads and generate a thread dump.
(screenshot: http://www.jroller.com/dumpster/resource/tdajconsole.png)
It seems you need to narrow things down to the code that has the bug: first identify which thread is eating the CPU, then which code is being executed by that thread, and at that point you can remote debug.
I would suggest using something like JProfiler, jvisualvm, jconsole or something similar. Using one of these tools will allow you to get some insight into what the thread is doing, and they let you sort the threads by CPU cycles used so you can find the offending thread quickly.
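As a rough in-JVM alternative to those tools (class and method names are mine; thread CPU time measurement must be supported and enabled on the JVM), you can rank threads by consumed CPU time via ThreadMXBean:
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Lists live threads sorted by total CPU time, busiest first, so the thread
// burning 100% CPU stands out; must run inside the JVM under investigation.
public class CpuHogFinder {

    private static class ThreadCpu {
        final long threadId;
        final long cpuNanos;
        ThreadCpu(long threadId, long cpuNanos) {
            this.threadId = threadId;
            this.cpuNanos = cpuNanos;
        }
    }

    public static void report() {
        ThreadMXBean threadMx = ManagementFactory.getThreadMXBean();
        List<ThreadCpu> threads = new ArrayList<ThreadCpu>();
        for (long id : threadMx.getAllThreadIds()) {
            long cpuNanos = threadMx.getThreadCpuTime(id); // -1 if unsupported or the thread has died
            if (cpuNanos > 0) {
                threads.add(new ThreadCpu(id, cpuNanos));
            }
        }
        Collections.sort(threads, new Comparator<ThreadCpu>() {
            public int compare(ThreadCpu a, ThreadCpu b) {
                // descending by CPU time
                return b.cpuNanos < a.cpuNanos ? -1 : (b.cpuNanos == a.cpuNanos ? 0 : 1);
            }
        });
        for (ThreadCpu t : threads) {
            ThreadInfo info = threadMx.getThreadInfo(t.threadId);
            if (info != null) { // the thread may have died in the meantime
                System.out.println(info.getThreadName() + ": " + (t.cpuNanos / 1000000L) + " ms CPU");
            }
        }
    }
}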
