I have a long-running Java program on a remote Ubuntu server, where I have root access. After some time, the usage on some CPU cores goes up to 100%. The logs show nothing suspicious and the application still works, but with reduced throughput.
How can I debug the JVM so that I can find out the cause of this, while it's still running?
One option is VisualVM, which has been bundled with the JDK since Java 6 (update 7). I have found it useful in some situations in the past.
You can connect to local or remote applications.
To connect to a remote app, run jstatd on your remote server, and then run VisualVM locally and enter your server's IP address. You should be provided with a list of running Java applications including the one you wish to explore. If you have any trouble listing your application, good documentation is available at the VisualVM website.
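For reference, jstatd usually refuses to start without a security policy granting it permissions. A minimal sketch for older, pre-module JDKs (the policy file name is arbitrary, and newer module-based JDKs need a different codebase) is a file jstatd.all.policy containing:
grant codebase "file:${java.home}/../lib/tools.jar" {
    permission java.security.AllPermission;
};
started with:
jstatd -J-Djava.security.policy=jstatd.all.policy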
Connect to the process with jvisualvm
This tool will allow you to connect to the running process and view all of the threads and their state. It may show you which thread is the culprit simply by which one is awake all the time, and you can take a thread dump to see the stack trace of each thread and find out what it is doing.
It's a very powerful tool for just this kind of debugging. It is distributed only with the JDK, so a plain JVM runtime is not enough; be sure to install the same JDK version as the JVM you are inspecting.
You will need to have your X display forwarded for this to work.
If you want to see the stack traces on Linux, just issue kill -SIGQUIT <java-program-pid>; the JVM prints a thread dump to its standard output. That is one way to see where the code is executing.
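If you would rather capture the dump in a file than dig it out of the process's stdout, the JDK's jstack tool can write it directly (the output path is just an example):
jstack -l <java-program-pid> > /tmp/threads.txt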
Related
How do I troubleshoot/optimize CPU usage in a Spring Boot application? Are the allocated resources sufficient for an application with a total user base of around 300k? The application isn't heavy at all; it just calls third-party APIs, does the necessary checks, and returns the response.
How can I identify the exact code that is using more resources than it normally should? I read somewhere that one way is to track the process ID from the top command, take a thread dump, and look up the thread whose hexadecimal ID matches the one consuming the most CPU. This wasn't easily achievable, as some of the suggested commands didn't work. I would appreciate any help or suggestions.
Thanks in advance.
[Screenshots: htop output while the CPU is high, and htop when the application is behaving normally]
The process of collecting thread stacks is no different for a Spring Boot app; before a Boot app is containerized, it is still just a jar. If you suspect that your application is what is actually contributing to the high CPU, run your jar, attach a profiler to it, and trace the code that is contributing to the high CPU under load. If you cannot do that, take a thread dump of the running jar/Java process and use any free or open-source tool to analyze the trace. The same approach applies to a containerized application as well.
Follow these steps to take a thread dump of a Java/Boot app running inside a Docker container:
docker exec -it <containerName> jstack <java-pid> > someFile.txt
Take multiple snapshots for better visibility and comparison.
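If you are not sure which PID to pass to jstack, jps inside the container should list the running Java processes (assuming the image ships a full JDK rather than just a JRE):
docker exec -it <containerName> jps -l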
If you have not added the JMX options to the JVM command line, do that first:
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=10000
-Dcom.sun.management.jmxremote.rmi.port=10000
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
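Put together, the launch command might look something like this (the jar name is only a placeholder):
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=10000 \
     -Dcom.sun.management.jmxremote.rmi.port=10000 \
     -Dcom.sun.management.jmxremote.local.only=false \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar your-app.jar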
Then, on your local machine, start "jmc" (Java Mission Control) from your JDK bin folder and connect to your Spring Boot server.
You will then be able to see all the threads and enable both CPU load and thread lock monitoring on all active threads.
Be aware, though, that the above opens up the JVM for unauthenticated access, so keep the port safe.
Next, if your JVM misbehaves, send it a "kill -3" (that is SIGQUIT, not SIGHUP), which tells the JVM to print a thread dump to its standard output. If you want a heap dump that the Eclipse MAT plugin can analyze, take one with jmap, or start the JVM with -XX:+HeapDumpOnOutOfMemoryError so one is written automatically if it dies of an OutOfMemoryError.
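For example, a heap dump that MAT can open could be taken with something like this (the pid and output path are placeholders):
jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>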
Another way is to install Jolokia on your server, which exposes the same information over HTTP/JSON as an alternative way to retrieve it.
In Java profiling, it seems like all (free) roads nowadays lead to the VisualVM profiler included with JDK6. It looks like a fine program, and everyone touts how you can "attach it to a running process" as a major feature. The problem is, that seems to be the only way to use it on a local process. I want to be able to start my program in the profiler, and track its entire execution.
I have tried using the -Xrunjdwp option described in how to profile application startup with visualvm, but between the two transport methods (shared memory and server), neither is useful for me. VisualVM doesn't seem to have any integration with the former, and VisualVM refuses to connect to localhost or 127.0.0.1, so the latter is no good either. I also tried adding a simple read of System.in to my program to pause execution, but in that case VisualVM blocks until the read completes, and doesn't allow you to start profiling until after execution is under way. I have also tried looking into the Eclipse plugin, but the website is full of dead links and the launcher just crashes with a NullPointerException when I try to use it (this may no longer be accurate).
Coming from C, this doesn't seem like a particularly difficult task to me. Am I just missing something or is this really an impossible request? I'm open to any kinds of suggestions, including using a different (also free) profiler, and I'm not averse to the command line.
Consider using HPROF and opening the data file with a tool like HPjmeter - or just reading the resulting text file in your favorite editor.
Command used: javac -J-agentlib:hprof=heap=sites Hello.java
SITES BEGIN (ordered by live bytes) Fri Oct 22 11:52:24 2004
          percent          live          alloc'ed  stack class
 rank   self  accum     bytes objs     bytes  objs trace name
1 44.73% 44.73% 1161280 14516 1161280 14516 302032 java.util.zip.ZipEntry
2 8.95% 53.67% 232256 14516 232256 14516 302033 com.sun.tools.javac.util.List
3 5.06% 58.74% 131504 2 131504 2 301029 com.sun.tools.javac.util.Name[]
4 5.05% 63.79% 131088 1 131088 1 301030 byte[]
5 5.05% 68.84% 131072 1 131072 1 301710 byte[]
HPROF is capable of presenting CPU usage, heap allocation statistics, and monitor contention profiles. In addition, it can also report complete heap dumps and the states of all the monitors and threads in the Java virtual machine.
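Since the question is about profiling execution rather than the heap, note that HPROF can also sample CPU. A run along these lines (MyApp is a placeholder, and the interval/depth values are just examples) writes the hottest stack traces to java.hprof.txt on exit:
java -agentlib:hprof=cpu=samples,interval=20,depth=10 MyApp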
The best way to solve this problem without modifying your application is to not use VisualVM at all. As far as other free options are concerned, you could use either Eclipse TPTP or the NetBeans profiler, or whatever comes with your IDE.
If you can modify your application to suspend its state while you set up the profiler in VisualVM, it is quite possible to do so using the VisualVM Eclipse plugin. I'm not sure why you are getting the NullPointerException, since it appears to work on my workstation. You'll need to configure the plugin by providing the path to the jvisualvm binary and the path to the JDK; this is done in the VisualVM configuration dialog at Windows -> Preferences -> Run/Debug -> Launching -> VisualVM Configuration.
You'll also need to configure your application to start with the VisualVM launcher, instead of the default JDT launcher.
All application launches from Eclipse will now result in VisualVM tracking the new local JVM automatically, provided that VisualVM is already running. If you do not have VisualVM running, the plugin will launch VisualVM, but it will also continue running the application.
It follows that having the application halt in the main() method before performing any processing is quite useful. But that is not the main reason for suspending the application: VisualVM and its Eclipse plugin apparently cannot start the CPU or memory profilers automatically, so these profilers have to be started manually, which is why the application needs to be suspended.
Additionally, it is worth noting that adding the flags -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=y to the JVM startup will not help you suspend the application and set up the profilers in the case of VisualVM. The flags are meant for tools that actually connect to the JVM's open port using the JDWP protocol. VisualVM does not use this protocol, so you would have to connect to the application using JDB or a remote debugger; but that would not resolve the profiler-configuration problem, as VisualVM (at least as of Java 6 update 26) does not allow you to configure the profilers on a suspended process, since it simply does not display the Profiler tab.
This is now possible with the startup profiler plugin to VisualVM.
The advice about -Xrunjdwp is incorrect. It just enables the debugger, and with suspend=y it waits for a debugger to attach. Since VisualVM is not a debugger, it does not help you. However, inserting a read of System.in or a Thread.sleep() call will pause the startup and allow VisualVM to attach to your application. Be sure to read Profiling with VisualVM 1 and Profiling with VisualVM 2 to better understand the profiler settings. Note also that instead of profiling, you can use the 'Sampler' tab in VisualVM, which is more suitable for profiling an entire Java program execution. As others mentioned, you can also use the NetBeans Profiler, which directly supports profiling of application startup.
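For illustration, a minimal sketch of that pause-at-startup idea (the class name and runApplication method are placeholders for your real entry point) could look like this:

public class Main {
    public static void main(String[] args) throws Exception {
        // Give yourself time to attach VisualVM and start the profiler/sampler
        // before any real work begins (remove this once you are done profiling).
        System.out.println("Attach the profiler now, then press Enter to continue...");
        System.in.read();

        runApplication(); // placeholder for the program's real logic
    }

    private static void runApplication() {
        // ... actual program logic ...
    }
}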
I've got two remote servers, both running recent CentOS, both running recent Tomcat6, recent JDK6, and VisualVM 1.3.2.
ssh -X forwarding works on one server - I can start up VisualVM from that machine, it port forwards and runs fine - I see all the JVM processes running on that remote machine as 'local' in VVM.
ssh -X forwarding on the second machine - then running VisualVM - brings up an X window with VVM in it, but it just shows one 'local' process - the VisualVM itself - and the lower right corner has a bouncing progress bar that says "computing description", and it never ends.
I can't find anything about this anywhere - anyone ever hit this? How do I get past this?
I experienced a similar issue - VisualVM hanging on "computing description" and not displaying any local JVMs other than itself. I used jps to find all the JVMs running on the system and jstack to get the stacks of all of those JVMs, including JVisualVM itself. What I found was that JVisualVM was trying to create an RMI connection to one of the target VMs, and that VM was hanging on the RMI connection attempt. In my case, the reason it hung was that I had previously attached the JVisualVM profiler to that JVM, but JVisualVM then died with a PermGen OOM. Parts of the profiler were still running in the target JVM but were hanging due to the missing profiler frontend; thus any attempt at class loading would hang, which caused the incoming RMI connection from the new JVisualVM instance to hang as well. Restarting the affected JVM resolved the issue.
Without any thread dumps, I can't say whether your issue was anything like mine or not; but if anyone gets this problem again, collecting thread dumps is a good idea. Whatever the root cause is, restarting all JVMs on your box (e.g. reboot) has a reasonable chance of solving it.
If your JVM is paused on a debugging breakpoint then this will cause VisualVM to hang.
I was facing the same issue - then I came across this post: https://github.com/oracle/visualvm/issues/82. I killed all the JVM/JDK sessions on my machine, restarted VisualVM, waited a bit, and there you go, it's not hanging anymore.
In short, VisualVM hangs when you switch between IPs.
I was using VisualVM in combination with IntelliJ. For me, VisualVM hung because I was on a VPN connection. The issue resolved after I turned off the VPN.
I have a Tomcat running as a Windows Service, and those are known not to work well with jstack. jconsole is working well, on the other hand, and I can see stacks of individual threads (I'm connecting to "localhost:port" to access it).
How can I use jconsole or a similar tool to dump all the thread stacks into a file? (similar to jstack)
You can use the ThreadMXBean management interface.
This FullThreadDump class demonstrates the capability to get a full thread dump and also detect deadlock remotely using JMX.
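If you want a self-contained starting point, a rough sketch along those lines (the host, port, and output file are placeholders, and the target JVM must have the JMX remote options enabled) could look like this:

import java.io.PrintWriter;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteThreadDump {
    public static void main(String[] args) throws Exception {
        // Point this at the JMX port your jconsole already connects to.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    connection, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);

            // Dump all threads, including locked monitors and synchronizers.
            ThreadInfo[] infos = threads.dumpAllThreads(true, true);
            PrintWriter out = new PrintWriter("thread-dump.txt");
            try {
                for (ThreadInfo info : infos) {
                    out.println("\"" + info.getThreadName() + "\" " + info.getThreadState());
                    for (StackTraceElement frame : info.getStackTrace()) {
                        out.println("    at " + frame);
                    }
                    out.println();
                }
            } finally {
                out.close();
            }
        } finally {
            connector.close();
        }
    }
}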
Nowadays you can use the jvisualvm tool to connect to your remote JVM through JMX and create a thread dump. I don't know if this was available back when the question was asked.
Here's another code sample that will write a stack dump to a file:
http://pastebin.com/zwcKC0hz
We use this over JMX to give us an approximation of the stack dump you get when you make a JMX request or if the process detects high, unexpected load.
It would be helpful to take a flight recording to get a deeper view of the JVM's behavior, especially focusing on the Hot Methods.
Usually, a recording of half an hour is enough. To trigger a recording, you must be logged in to the machine and issue the following commands:
If using Java HotSpot 1.8.x:
$JAVA_HOME/bin/jcmd <pid> VM.unlock_commercial_features
$JAVA_HOME/bin/jcmd <pid> JFR.start duration=1800s settings=profile filename=/tmp/recording.jfr
If using Java HotSpot 1.7.x:
Edit your $HOME/conf/wrapper.conf file by adding the following parameters on JVM startup:
wrapper.java.additional.<n>=-XX:+UnlockCommercialFeatures
wrapper.java.additional.<n>=-XX:+FlightRecorder
(replace <n> with the next available positional number)
Then, have your instances restarted. Once done, issue the following command :
$JAVA_HOME/bin/jcmd <pid> JFR.start duration=1800s settings=profile filename=/tmp/recording.jfr
The flight recording will produce a file at /tmp/recording.jfr when it finishes.
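While the recording is running, you can check its status with JFR.check, for example (the pid is a placeholder):
$JAVA_HOME/bin/jcmd <pid> JFR.check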
We have a curious problem with our java processes dying.
The application doesn't print a stack trace or write anything to the logs; the process just randomly dies. It's a heavily used application, but the problem only appears about once a month.
We're currently looking into using Process Monitor but any other suggestions would be welcome.
Edit:
It's a distributed Java application, running on Weblogic with an in-house web framework (Yes, this is a terrible idea, but it's been running for eight years), connecting to Oracle.
Out of Memory?
Our logs would catch java.lang.OutOfMemoryError, per Brian Agnew's suggestion.
Write crashes to a log? I don't think Java ever gets the chance; the death is happening at the process level rather than Java exiting on its own.
Can you wrap it in some shell script that captures the log files (stdout/stderr) and the exit code (which should give some indication as to how it died)? On JVM exit you can also capture machine-level stats using WMI.
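Since this is a Windows environment, a minimal batch-file sketch of that idea (file and jar names are placeholders, and it assumes you can launch the server from a script rather than only as a service) could be:
rem wrapper sketch: capture stdout/stderr and record the exit code
java -jar your-app.jar > app-stdout.log 2>&1
echo Exited with code %ERRORLEVEL% at %DATE% %TIME% >> app-exit.log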
If the VM itself is crashing, it'll leave behind an hs_err_pid<pid>.log file that contains stack traces and machine-level debug info. You can then use that to diagnose the VM issue. See this blog entry for further information.
If the problem is related to the app's behaviour, it may be worth looking at JConsole, although from your description of the issue, this sounds much more like a low level VM issue.
(I assume you're on the latest VM for your Java version number etc.)
You can use a Linux NAGIOS Server to monitor the health of your Windows machines and services! Have a look at: nagios-monitoring-windows.
If you have such problems with your Java app, you should test and debug it! Applications shouldn't die without a trace! Look for log files. Which vendor is the app from, or is it self-written? Try enabling a more verbose Log4j/logger debug level. Monitor your system with Cacti etc. to narrow down the possible causes of such a crash. Talk to the software vendor.
Is enough memory available? Maybe the app runs out of memory? Is it a standalone Java process or a Java process inside a Tomcat/JBoss server?
Have you written down the crash times in a log? Do they happen at irregular intervals, or do they recur at roughly regular intervals?
VisualVM is a new tool which makes monitoring Java applications easier:
https://visualvm.dev.java.net/description.html
"VisualVM is a tool that provides detailed information about Java applications while they are running. It provides an intuitive graphical user interface that allows you to easily see information about multiple Java applications."