How to know the reason for the JVM crashing with a segfault? - java

We are seeing the JVM crash at times with a segfault. The only error we see in the logs is below.
Can anyone suggest something by looking at the error trace below?
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fef7f1d3eb0, pid=42623, tid=0x00007feea62c8700
#
# JRE version: OpenJDK Runtime Environment (8.0_222-b10) (build 1.8.0_222-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 62683 C2 org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.writeObject0(Ljava/lang/Object;)V (331 bytes) # 0x00007fef7f1d3eb0 [0x00007fef7f1d3e00+0xb0]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hsperfdata_pvappuser/hs_err_pid42623.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
While trying to understand the reason for this crash with the Oracle JVM troubleshooting docs (https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/crashes001.html), this looks like the case described in 5.1.2, "Crash in Compiled Code", as the problematic frame is a Java frame (it has a "J").
We could not get much further than that, and we are also not sure when it occurs; the only probable pattern is that it happens after the JVM has been running for 5-6 days, so usually on a Friday.
We are using the openjdk-8 ("1.8.0_232") distribution provided by Red Hat, running on RHEL 6.10.
We would appreciate any leads on tracing this error.

The current stack frame has writeObject0 as the last called method. By naming convention, native methods' names often end with 0, so check whether that method is indeed native.
If it is, it is probably written in C, an old, memory-unsafe language whose programs tend to crash in an uncontrolled way; this often leads to SIGSEGV.
In this case, however, that method is written in Java.
As the error message says, read hs_err_pid42623.log for further details. In that file you will find the register contents and a few machine instructions around the code that crashed.
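To verify this kind of hunch, a small reflection sketch (class and method names here are just illustrative stand-ins, not from the crash above) can report whether a given method is declared native:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeCheck {
    /** Returns true if the class declares a method with this name that is native. */
    public static boolean isNativeMethod(Class<?> cls, String name) {
        for (Method m : cls.getDeclaredMethods()) {
            if (m.getName().equals(name) && Modifier.isNative(m.getModifiers())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Object.hashCode() is declared native; String.length() is plain Java.
        System.out.println(isNativeMethod(Object.class, "hashCode")); // true
        System.out.println(isNativeMethod(String.class, "length"));   // false
    }
}
```

Running the same check against the Ignite class from the crash log would confirm that writeObject0 is not native there.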

Related

Set ulimit when debugging with Eclipse

Sometimes my code crashes, leaving me with an error file but no core dump because the latter is disabled.
The error file suggests setting ulimit -c unlimited to allow core dumps.
If I were running the code from a console, it would be no problem to set ulimit before starting the Java application. But the error seems to be much more frequent when debugging in Eclipse than when running standalone. (As a standalone it crashed this way only once in hundreds of running hours; under the debugger it crashed 4 times in the last two months, and a few times before that.)
Is there a way to tell Eclipse to set ulimit before launching a debug session?
And will a core dump help me find what causes the crash?
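For what it's worth, one workaround (a sketch, not a verified Eclipse recipe) is to point the Eclipse launch configuration at a small wrapper script that raises the core-dump limit before exec'ing the real java binary:

```shell
#!/bin/sh
# Hypothetical wrapper: raise the core-dump limit for this process
# and its children, then replace ourselves with the real JVM.
ulimit -c unlimited
exec "${JAVA_HOME:-/usr}/bin/java" "$@"
```

Because `ulimit` applies per process and is inherited by children, the JVM started via `exec` runs with core dumps enabled.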
For completeness, I am working on macOS and my error files all start like this (only the pid changes between them):
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGILL (0x4) at pc=0x00007fffa08c144e, pid=617, tid=0x0000000000000307
#
# JRE version: Java(TM) SE Runtime Environment (8.0_181-b13) (build 1.8.0_181-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.181-b13 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C [AppKit+0x3a544e] -[NSApplication _crashOnException:]+0x6d
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
I suspect (though I have nothing to prove it except the moment when it crashes) that the crashes are related to the handling of BufferedImage in the Java code, but then again, the error log says the error happened outside the JVM.

JVM crash on hadoop reducer

I am running Java code on Hadoop, but encountered this error:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f2ffe7e1904, pid=31718, tid=139843231057664
#
# JRE version: Java(TM) SE Runtime Environment (8.0_72-b15) (build 1.8.0_72-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.72-b15 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x813904] PhaseIdealLoop::build_loop_late_post(Node*)+0x144
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /hadoop/nm-local-dir/usercache/ihradmin/appcache/application_1479451766852_3736/container_1479451766852_3736_01_000144/hs_err_pid31718.log
#
# Compiler replay data is saved as:
# /hadoop/nm-local-dir/usercache/ihradmin/appcache/application_1479451766852_3736/container_1479451766852_3736_01_000144/replay_pid31718.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
When I go to the node manager, all the logs are aggregated since yarn.log-aggregation-enable is true, but hs_err_pid31718.log and replay_pid31718.log cannot be found.
Normally: 1) the JVM crashes after the reducer has been running for several minutes; 2) sometimes the automatic retry of the reducer succeeds; 3) some reducers succeed without any failure.
The Hadoop version is 2.6.0 and Java is Java 8. This is not a new environment; we have lots of jobs running on the cluster.
My questions:
Can I find hs_err_pid31718.log anywhere after YARN aggregates the logs and removes the folder? Or is there a setting to keep all the local logs, so I can check hs_err_pid31718.log while YARN is aggregating logs?
What are the common steps to narrow down the scope of the investigation? Since the JVM crashed, I cannot see any exception in the code. I have tried the -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp arguments, but no heap dump is written on the host failing the reduce tasks.
Thanks for any suggestion.
Answers
Use -XX:ErrorFile=<your preferred location>/hs_err_pid<pid>.log to write the hs_err file to a location of your choosing.
The crash is due to JDK bug JDK-6675699, which was fixed in JDK 9; backports are available for JDK 8 from update 74 onwards.
You are using JDK 8 update 72.
Kindly upgrade to a more recent version to avoid this crash.
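For example, a launch command combining these flags might look like this (the paths and jar name are placeholders; HotSpot expands %p to the process id):

```shell
# Redirect the fatal-error log and heap dumps to a directory that
# survives YARN's cleanup of the container working directory.
java -XX:ErrorFile=/var/log/jvm/hs_err_pid%p.log \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/log/jvm \
     -jar my-mapreduce-job.jar
```

Note that -XX:+HeapDumpOnOutOfMemoryError only fires on an OutOfMemoryError; it does not produce a dump for a SIGSEGV crash, which is why no heap dump appeared in /tmp.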

How to prove that the error is not my Java agent or how to debug it?

One of our clients has contacted me complaining that he's getting the following JRE crash while using our Java agent.
According to the error (below), the crash is in native code, since the problematic frame is categorized as 'C'.
I've done some googling, and there seem to be some open bugs around quite similar issues when using Java agents. See the following links:
https://bugs.openjdk.java.net/browse/JDK-8094079
https://bugs.openjdk.java.net/browse/JDK-8041920
The issue is that the customer is reluctant to upgrade the JDK, since he mentions that he has other Java agents which run without any issue.
Any suggestions on how to solve this issue?
For completeness, please see the error that he sent:
cat /opt/somecompany/apps/some-product-platform/some-product-name/hs_err_pid6697.log | grep sealights
7fe19d9b5000-7fe19d9d7000 r--s 00401000 ca:01 3539943 /opt/somecompany/apps/some-product-platform/some-product-name/sealights/sl-test-listener.jar
jvm_args: -javaagent:/opt/somecompany/apps/some-product-platform/hawtio/jolokia-jvm.jar=config=/opt/somecompany/apps/some-product-platform/some-product-name/conf/jolokia-agent.properties -javaagent:/opt/somecompany/apps/some-product-platform/some-product-name/agent/newrelic.jar -DNEWS_product_HOME=/opt/somecompany/apps/some-product-platform/some-product-name -Dsl.environmentName=Functional Tests DEV-INT -Dsl.customerId=myCustomer -Dsl.appName=ABB-product-name -Dsl.server=https://my-server.com -Dsl.includes=com.somecompany.* -javaagent:/opt/somecompany/apps/some-product-platform/some-product-name/sealights/sl-test-listener.jar -Dlog.dir=/opt/somecompany/apps/some-product-platform/logs -Dlog.threshold=debug
java_class_path (initial): some-product-name.jar:/opt/somecompany/apps/some-product-platform/hawtio/jolokia-jvm.jar:/opt/somecompany/apps/some-product-platform/some-product-name/agent/newrelic.jar:/opt/somecompany/apps/some-product-platform/some-product-name/sealights/sl-test-listener.jar
JVM crash message:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000000000055, pid=6697, tid=140604865455872
#
# JRE version: Java(TM) SE Runtime Environment (8.0_20-b26) (build 1.8.0_20-b26)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C 0x0000000000000055
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /opt/somecompany/apps/some-product-platform/some-product-name/hs_err_pid6697.log
Compiled method (c1) 565518 13823 1 sun.invoke.util.ValueConversions::identity (2 bytes)
total in heap [0x00007fe1a76857d0,0x00007fe1a7685a48] = 632
relocation [0x00007fe1a76858f8,0x00007fe1a7685918] = 32
main code [0x00007fe1a7685920,0x00007fe1a7685980] = 96
stub code [0x00007fe1a7685980,0x00007fe1a7685a10] = 144
metadata [0x00007fe1a7685a10,0x00007fe1a7685a18] = 8
scopes data [0x00007fe1a7685a18,0x00007fe1a7685a20] = 8
scopes pcs [0x00007fe1a7685a20,0x00007fe1a7685a40] = 32
dependencies [0x00007fe1a7685a40,0x00007fe1a7685a48] = 8
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
After asking for further details in the chat discussion, I assume that this error is related to the use of the ASM AdviceAdapter in combination with other agents in the same application. If another agent expects a specific structure of a class, using the adapter might reorder local-variable indices of the class in question, which the other agent does not anticipate. As the VM skips verification of built-in classes, this results in a hard crash.
The solution would be to change the instrumentation to not use the AdviceAdapter.

Java Virtual Machine crashes Multiple times

This is the entire error message:
http://pastebin.com/bDgye0rt
The error log is too big for me to attach here. I am not very familiar with how the JVM works in the background or which registers it makes use of. I am hoping someone can look at this error log and explain to me what it means.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007ffd18cbcafe, pid=29906, tid=140725158119168
#
# JRE version: 6.0_27-b27
# Java VM: OpenJDK 64-Bit Server VM (20.0-b12 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea6 1.12.6
# Distribution: Debian GNU/Linux 7.1 (wheezy), package 6b27-1.12.6-1~deb7u1
# Problematic frame:
# C [libresolv.so.2+0x7afe] __libc_res_nquery+0x19e
#
# If you would like to submit a bug report, please include
# instructions how to reproduce the bug and visit:
# http://icedtea.classpath.org/bugzilla
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
I can't understand why and how it happens. It keeps happening all the time.
PS:
I have the error logs of another 3 crashes that happened on my machine, which I can share if you think it's necessary.
EDIT
I have around 8 logs, and all of them make clear that the last call before the program crashes is
java.net.Inet6AddressImpl.lookupAllHostAddr
I do not believe the input is much related, because there is a static list of inputs that are parsed serially, and the error occurs at different times (sometimes 1 hour after the program has started, sometimes 6 hours; it appears to be random).
This is an issue related to glibc and IPv6.
It should be safe to add -Djava.net.preferIPv4Stack=true to ask the JVM to prefer the IPv4 stack, which avoids this issue.
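If the command line cannot be changed, the same property can also be set programmatically, as long as it happens before the first networking call initializes the stack (the class name below is illustrative):

```java
import java.net.InetAddress;

public class PreferIpv4Demo {
    /** Sets the IPv4 preference, then resolves the loopback address. */
    public static String lookupLoopback() throws Exception {
        // Must be set before any java.net class triggers network-stack
        // initialization; on a real deployment the flag is normally
        // passed on the command line instead.
        System.setProperty("java.net.preferIPv4Stack", "true");
        return InetAddress.getByName("127.0.0.1").getHostAddress();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookupLoopback()); // prints 127.0.0.1
    }
}
```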
