A quick answer would be "without crying" of course :).
I have a really strange problem with my Java application (J2SE 1.7) on a 32-bit Windows 7 system. I have run into all of the following cases:
Sometimes it runs out of Java heap memory (so I can log it and recover from it).
Sometimes it crashes in native code and I get the hs_err_pidxxxx.log file, so I can analyze what is going on.
Sometimes it crashes in native code and I get no hs_err file, but I do get a popup saying that Java has stopped working; I can see the exception in the Windows event log and even debug part of the process with Visual Studio.
Sometimes it crashes and I have nothing (no hs_err, no popup, nothing...). It just ends, as if there were a System.exit() or a native exit() call.
So my questions are:
How can I be sure this is a native exit call, given that I don't have all the code of the native libraries I am using?
Can this strange behaviour be produced by some other means?
Finally, how can I debug and track down which library is the root cause?
how can I be sure this is a native exit call as I don't have all the code of native libraries I am using ?
The only way I know to be sure would be to wrap each call to a native library with logging commands, so that you log before each call and after each return. After your program has crashed, if the log has an enter message but no return message, then that library call is suspect.
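The enter/return logging described above could be sketched roughly like this. It is only an illustration: the class name NativeCallLog, the label "demo.abs" and the use of Math.abs as a stand-in for a real native call are all made up for the example, and the lambda syntax needs Java 8 or later (on Java 7 an anonymous class would do the same job).

```java
import java.io.PrintStream;
import java.util.function.Supplier;

/** Brackets a call with ENTER/RETURN lines so an unfinished call is visible in the log. */
final class NativeCallLog {
    private final PrintStream out;

    NativeCallLog(PrintStream out) {
        this.out = out;
    }

    /** Runs the call, logging before it starts and after it returns or throws. */
    <T> T traced(String label, Supplier<T> call) {
        mark("ENTER", label);
        boolean returned = false;
        try {
            T result = call.get();
            returned = true;
            return result;
        } finally {
            // Never reached if the process dies inside call.get(); that gap is the clue.
            mark(returned ? "RETURN" : "THROW", label);
        }
    }

    private void mark(String event, String label) {
        out.println(Thread.currentThread().getName() + " " + event + " " + label);
        out.flush(); // push the line out before the risky call runs
    }

    public static void main(String[] args) {
        NativeCallLog log = new NativeCallLog(System.err);
        // Math.abs is only a placeholder for a call into a native library.
        int value = log.traced("demo.abs", () -> Math.abs(-3));
        System.err.println("value=" + value);
    }
}
```

The thread name is included because, with several threads calling into native code, an ENTER line without its RETURN only means something if it can be matched to the right thread.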
Is it possible to have this strange behaviour produced by another means ?
Yes, there are any number of other strange ways this can happen.
Using up memory or some other resource might be one explanation.
Finally how to debug and track which lib can be the root cause ?
The logging described above should find this too, provided the messages say which library is being called. You can monitor the application in jconsole to see whether it is using up tons of memory or threads. Disable anything that can be disabled, so you can rule it out as part of the problem. If the problem goes away, enable things one at a time until the problem returns.
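If attaching jconsole is not practical (for example on an unattended machine), roughly the same numbers can be written to the log from inside the application. The following is a minimal sketch using the standard java.lang.management beans; the class name ResourceSnapshot and the output format are invented for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Logs heap, non-heap and thread figures similar to what jconsole displays. */
final class ResourceSnapshot {
    private ResourceSnapshot() {}

    /** One line of current figures; heapMax is -1 when the maximum is undefined. */
    static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        return "heapUsed=" + heap.getUsed()
                + " heapMax=" + heap.getMax()
                + " nonHeapUsed=" + nonHeap.getUsed()
                + " threads=" + threads;
    }

    /** Starts a daemon thread that prints a snapshot every periodMillis milliseconds. */
    static Thread startLogging(final long periodMillis) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    System.err.println(describe());
                    Thread.sleep(periodMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly when interrupted
            }
        }, "resource-snapshot");
        t.setDaemon(true); // must not keep the JVM alive on its own
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        startLogging(1000);
        Thread.sleep(3500); // let a few snapshots print, then exit
    }
}
```

The last snapshot before the process vanished then shows whether memory or thread counts were climbing. Note that these beans only cover what the JVM itself tracks; memory allocated directly by native libraries does not appear in them.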
How can I be sure this is a native exit call as I don't have all the code of native libraries I am using?
Debug it.
Is it possible to have this strange behaviour produced by another mean?
Hard to tell... Could be threading, could be memory leak, ...
Finally how to debug and track which lib can be the root cause?
Run Java with
-XX:+CreateMinidumpOnCrash
and you'll get a crash dump that you can analyze. Or use
-XX:+UseOSErrorReporting
to let Windows handle the crash. Depending on what you have installed, Windows will then, for example, offer to attach a debugger, or it may show a "Send to Microsoft" error report.
Related
My Java application has started to crash regularly with a SIGSEGV and a dump of stack data and a load of information in a text file.
I have debugged C programs in gdb and I have debugged Java code from my IDE. I'm not sure how to approach C-like crashes in a running Java program.
I'm assuming I'm not looking at a JVM bug here. Other Java programs run just fine, and the JVM from Sun is probably more stable than my code. However, I have no idea how I could even cause segfaults with Java code. There definitely is enough memory available, and when I last checked in the profiler, heap usage was around 50% with occasional spikes around 80%. Are there any startup parameters I could investigate? What is a good checklist when approaching a bug like this?
Though I'm not so far able to reliably reproduce the event, it does not seem to occur entirely at random either, so testing is not completely impossible.
ETA: Some of the gory details
(I'm looking for a general approach, since the actual problem might be very specific. Still, there's some info I already collected and that may be of some value.)
A while ago, I had similar-looking trouble after upgrading my CI server (see here for more details), but that fix (setting -XX:MaxPermSize) did not help this time.
Further investigation revealed that in the crash log files the thread marked as "current thread" is never one of mine, but either one called "VMThread" or one called "GCTaskThread". If it's the latter, it is additionally marked with the comment "(exited)"; if it's the former, the GCTaskThread is not in the list. This makes me suppose that the problem might be around the end of a GC operation.
I'm assuming I'm not looking at a JVM bug here. Other Java programs run just fine, and the JVM from Sun is probably more stable than my code.
I don't think you should make that assumption. Without using JNI, you should not be able to write Java code that causes a SIGSEGV (although we know it happens). My point is, when it happens, it is either a bug in the JVM (not unheard of) or a bug in some JNI code. If you don't have any JNI in your own code, that doesn't mean that you aren't using some library that is, so look for that. When I have seen this kind of problem before, it was in an image manipulation library. If the culprit isn't in your own JNI code, you probably won't be able to 'fix' the bug, but you may still be able to work around it.
First, you should get an alternate JVM on the same platform and try to reproduce it. You can try one of these alternatives.
If you cannot reproduce it, it likely is a JVM bug. From that, you can either mandate a particular JVM or search the bug database, using what you know about how to reproduce it, and maybe get suggested workarounds. (Even if you can reproduce it, many JVM implementations are just tweaks on Oracle's Hotspot implementation, so it might still be a JVM bug.)
If you can reproduce it with an alternative JVM, the fault might be that you have some JNI bug. Look at what libraries you are using and what native calls they might be making. Sometimes there are alternative "pure Java" configurations or jar files for the same library or alternative libraries that do almost the same thing.
Good luck!
The following will almost certainly be useless unless you have native code. However, here goes.
Start java program in java debugger, with breakpoint well before possible sigsegv.
Use the ps command to obtain the processid of java.
gdb /usr/lib/jvm/sun-java6/bin/java processid
make sure that the gdb 'handle' command is set to stop on SIGSEGV
continue in the java debugger from the breakpoint.
wait for explosion.
Use gdb to investigate
If you've really managed to make the JVM take a sigsegv without any native code of your own, you are very unlikely to make any sense of what you will see next, and the best you can do is push a test case onto a bug report.
I found a good list at http://www.oracle.com/technetwork/java/javase/crashes-137240.html. As I'm getting the crashes during GC, I'll try switching between garbage collectors.
I tried switching between the serial and the parallel GC (the latter being the default on a 64-bit Linux server), this only changed the error message accordingly.
Reducing the max heap size from 16G to 10G after a fresh analysis in the profiler (which gave me a heap usage flattening out at 8G) did lead to a significantly lower "Virtual Memory" footprint (16G instead of 60), but I don't even know what that means, and The Internet says, it doesn't matter.
Currently, the JVM is running in client mode (using the -client startup option thus overriding the default of -server). So far, there's no crash, but the performance impact seems rather large.
If you have a corefile you could try running jstack on it, which would give you something a little more comprehensible - see http://download.oracle.com/javase/6/docs/technotes/tools/share/jstack.html, although if it's a bug in the gc thread it may not be all that helpful.
Check whether a crash in C code is what brought down the Java process. Use valgrind to look for invalid memory accesses, and also cross-check the stack size.
A project I am currently involved in uses JavaCv/OpenCv for face detection. Since the OpenCv occasionally throws an error, and the propagation of OpenCv/C++ errors to Java Exceptions isn't fully functional yet, this means the Java main-loop crashes with no way to recover.
However, the code gives mostly accurate results, and since we're running it on a large database I baked a quick Batch-script around the execution to keep it going, and the Java code internally manages an id, to make sure it continues from just after where it crashed.
:RETRY
java -Xmx1024m -jar Main.jar
IF ERRORLEVEL 1 GOTO RETRY
EXIT 0
However, occasionally I get a Runtime Error pop-up, as follows:
Microsoft Visual C++ Runtime Library
Runtime Error!
Program: C:\Windows\System32\java.exe
This application has requested the runtime to end in an unusual way.
Please contact the application's support team for more information.
At which point the code execution halts until the pop-up is clicked, which is really annoying, as it means my code can't run without me babysitting it.
I found this question, which basically asks the same thing. There is an accepted solution for that question, but since I'm not directly working with C++, I don't see how I can implement this.
Is there a Batch-level solution to this problem? Is there a Java/JavaCv-level solution to catch the C++ errors coming from OpenCv? Any other solution?
Interesting question.
Java.exe depends on one or more Visual C++ DLLs (such as MSVCRT.DLL, msvcr90.dll etc.). Probably the code in the JAR file is leading Java.exe into this error: Java.exe must be calling some CRT function that raises the exception, hence the Runtime Error.
Your best bet is to launch the process, let the error pop up, then start Process Explorer and look at the call stack. Nevertheless, solving this issue is most probably out of your control. Maybe the latest version of Java will help.
I've built a RCP-based application, and one of my users running on Windows XP, Sun JVM 1.6.0_12 had a full application crash. After the app was running for two days (and this is not a new version or anything), he got the nice gray JVM force exit box, with exit code=1073807364.
He was away from the machine at the time, and the only thing I can find near that time in the application logs was some communication with the database (SQL Server by way of Hibernate). There's no hs_ files or anything similar as far as I can tell. Web searching found a bunch of crash reports with that exit code in a variety of applications, but I didn't see any fundamental explanation of what causes it.
Can anyone tell me what causes it? Is there additional information likely to have been dumped that could prove useful?
From what I can tell, this error code (0x40010004) arises in all sorts of situations, with (as you noted) no obvious common thread.
However this page says "0x40010004" means "the task is running"! So, I would surmise that the correct way to interpret it is as saying "this task has exited in a way that prevented it setting a proper exit code".
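For anyone wondering how the decimal exit code from the question maps to the hexadecimal value discussed here, it is a plain base conversion. A small sketch (the class name ExitCodeHex is made up):

```java
/** Converts a process exit code to the hexadecimal form used in Windows status codes. */
final class ExitCodeHex {
    private ExitCodeHex() {}

    /** Formats the 32-bit exit code as 0x followed by eight hex digits. */
    static String toHex(int exitCode) {
        // %X prints negative ints as their unsigned 32-bit pattern.
        return String.format("0x%08X", exitCode);
    }

    public static void main(String[] args) {
        System.out.println(toHex(1073807364)); // prints 0x40010004
    }
}
```

The same helper also works for exit codes that Java reports as negative numbers: -1073741819, for instance, comes out as 0xC0000005.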
I don't know if this will help, but I would try looking in the Windows Event logs to see if the problem is being reported there.
I am working on a Java product. A client claims that the application crashes after an arbitrary amount of time. Since it is a crash, we can't find any information in our logs.
Are there any tools or methods to find out the reasons for such issues?
Can we do anything on the code side to get more information about such program crashes?
Can we enable a "DEBUG" mode for the JVM? If so, where can I find the JVM log files/crash dumps?
Are there any known procedures for dealing with this sort of issue?
If you ran into this problem, what would be your procedure for troubleshooting it?
I find it hard to believe there's no output from the JVM when it crashes. Start by taking a long, hard look at your run scripts to see whether you are simply discarding output. If the JVM ends because of an uncaught exception, it prints the exception's stack trace to stderr. If it crashes hard (heap corruption etc.), it will also print something to the console. Your in-application logging is useful, but you should be logging any output that goes to stdout and stderr as well (you don't say which platform your app is running on, but this basically applies to all of them).
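If the run script cannot easily be changed, one way to make sure stdout and stderr are kept is to start the application from a tiny launcher that redirects both streams to files and records the exit code. This is only a sketch under assumptions: the class name CapturingLauncher and the log file names app.stdout.log and app.stderr.log are invented, and the command to run is whatever you pass on the command line.

```java
import java.io.File;
import java.io.IOException;
import java.lang.ProcessBuilder.Redirect;
import java.util.Arrays;
import java.util.List;

/** Starts a child process with stdout and stderr appended to log files. */
final class CapturingLauncher {
    private CapturingLauncher() {}

    /** Builds (but does not start) the process with both streams redirected. */
    static ProcessBuilder prepare(List<String> command, File stdoutLog, File stderrLog) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectOutput(Redirect.appendTo(stdoutLog)); // keep output across restarts
        pb.redirectError(Redirect.appendTo(stderrLog));
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: CapturingLauncher <command> [args...]");
            return;
        }
        Process child = prepare(Arrays.asList(args),
                new File("app.stdout.log"), new File("app.stderr.log")).start();
        int exit = child.waitFor();
        // The exit code is often the only trace left by a silent crash.
        System.err.println("child exited with code " + exit);
    }
}
```

It would be invoked with the normal command line as arguments, for example the java executable followed by -jar and the application jar.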
Aside from that, there's a whole host of non-standard options you can pass to define the location of error files and the like, see Java HotSpot VM Options.
I would set your application logging to more verbose levels or tweak the JVM as pointed out before, but if you want more options, you can try JVisualVM to watch for anything weird (memory/thread/gc/jmx operations) and, as a last resort, I would search for hs_err_pid*.log files.
These files contain information about the state of the JVM at the moment of the hard crash (memory violations and so on).
Here is an example:
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x6d741e3a, pid=1572, tid=1364
#
# Java VM: Java HotSpot(TM) Client VM (1.5.0_11-b03 mixed mode)
# Problematic frame:
# V [jvm.dll+0x1e3a]
#
--------------- T H R E A D ---------------
Current thread (0x00a85c78): VMThread [id=1364]
siginfo: ExceptionCode=0xc0000005, reading address 0x00000054
Registers:
EAX=0x00000050, EBX=0x00990000, ECX=0x0847b9f8, EDX=0x00000050
ESP=0x0ab0f660, EBP=0x0ab0f684, ESI=0x0847b9f8, EDI=0x0847b9f8
EIP=0x6d741e3a, EFLAGS=0x00010216
After a crash, you don't have logs during the crash itself, but you still have all your logs before your actual crash. That should give you a lot of information, if your logs are detailed enough.
In Java, you combine the two phases:
logging in the code can be very detailed, using levels (fatal, error, warning, info, debug)
logging can be configured in production to output only what is relevant (even as specific as a single class's logs at debug level, while the rest stays at error level), so as to keep decent performance and log files of an acceptable size.
Using the power of logging, you should be able to narrow your focus little by little. Note that, if your application has too few logs, you should start ASAP adding some more (at the appropriate logging levels of course). Example process:
activate error level for all the application, see what you get
activate warning level for one module, see what you get
deactivate the previous, activate info level for one package, see what you get
deactivate the previous, activate debug level for one class, see what you get
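As an illustration of the last step, here is how "error level everywhere, debug level for one class" might look with java.util.logging, which ships with the JDK. This is a sketch, not the only way to do it: the logger names com.example.app and com.example.app.imaging.Decoder are hypothetical, and java.util.logging has no fatal/error/debug levels, so SEVERE and FINE are used as the nearest equivalents. Other logging frameworks have their own configuration for the same idea.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Keeps the application at error level while one suspect class logs at debug level. */
final class LogNarrowing {
    // Held in static fields because java.util.logging only references loggers weakly.
    private static final Logger APP = Logger.getLogger("com.example.app");
    private static final Logger SUSPECT = Logger.getLogger("com.example.app.imaging.Decoder");

    private LogNarrowing() {}

    static void configure() {
        APP.setLevel(Level.SEVERE);   // everything below the application root: errors only
        SUSPECT.setLevel(Level.FINE); // the one class under suspicion: debug detail
        // Handlers filter as well; the default console handler stops at INFO.
        for (Handler handler : Logger.getLogger("").getHandlers()) {
            handler.setLevel(Level.ALL);
        }
    }

    public static void main(String[] args) {
        configure();
        Logger.getLogger("com.example.app.other").warning("suppressed: below SEVERE");
        SUSPECT.fine("printed: this logger is at FINE");
    }
}
```

Moving to the next suspect is then a matter of changing which logger name gets the FINE level.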
First you should work out whether it is the JVM that crashes or your application itself. If the JVM crashes, the java process writes a crash dump to the file system, named something like hs_err_pidXXX.log. If you find one of these files in the directory where java was started, you should look the error up on the official bug site at Sun.
If your application crashes, you should extend your logging infrastructure (as KLE mentioned). Using a shutdown hook to print out that the application is shutting down (normally) is also quite handy. See here for the API reference.
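A shutdown hook of the kind mentioned above might look like the following sketch. The class name ExitMarker and the message text are made up. The hook also dumps the stack of every live thread, on the assumption that a thread which called System.exit() is still waiting for the hooks to finish and so appears in the dump.

```java
import java.io.PrintStream;
import java.util.Map;

/** Registers a shutdown hook that records an orderly JVM shutdown. */
final class ExitMarker {
    private ExitMarker() {}

    /** Installs the hook and returns it, so that callers can remove it again. */
    static Thread install(final PrintStream out) {
        Thread hook = new Thread(() -> {
            out.println("shutdown hook running: the JVM is shutting down in an orderly way");
            // A thread that called System.exit() should be visible in these stacks.
            for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
                out.println("thread " + entry.getKey().getName());
                for (StackTraceElement frame : entry.getValue()) {
                    out.println("    at " + frame);
                }
            }
            out.flush();
        }, "exit-marker");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        install(System.err);
        System.exit(3); // the hook's output appears before the process ends
    }
}
```

Shutdown hooks are not guaranteed to run when the virtual machine aborts, so if the process disappears and this marker is missing from the log, that points towards an abnormal termination rather than an ordinary exit.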
If this problem occurs only with that client, ask them if they run the application on more than one machine. If yes, does the problem occur on all of them?
If the problem occurs only on one machine, I'd suspect faulty hardware, most likely RAM. This can be diagnosed with a tool like memtest.
I've personally witnessed only two instances of recurring JVM crashes. In both cases, the problem was faulty RAM.
A few options that will help to diagnose memory issues:
The JVM option -XX:+HeapDumpOnOutOfMemoryError will create a heap dump if the VM exits due to memory exhaustion. You can analyse the dump using something like Eclipse MAT to determine the cause of the problem.
Also -verbose:gc will provide detailed garbage collection stats, and adding -Xloggc:<file> will redirect this to a file.
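When a client runs the application through their own scripts, it can be worth checking that these options actually reached the JVM. A rough sketch using the standard RuntimeMXBean (the class name StartupFlags and the prefix-matching helper are invented for the example):

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Reports which expected JVM options are absent from the running JVM's arguments. */
final class StartupFlags {
    private StartupFlags() {}

    /** Returns every wanted prefix that no actual argument starts with. */
    static List<String> missing(List<String> actual, String... wantedPrefixes) {
        List<String> result = new ArrayList<>();
        for (String wanted : wantedPrefixes) {
            boolean found = false;
            for (String arg : actual) {
                if (arg.startsWith(wanted)) { // prefix match covers -Xloggc:<file>
                    found = true;
                    break;
                }
            }
            if (!found) {
                result.add(wanted);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.err.println("JVM arguments: " + actual);
        System.err.println("Missing: "
                + missing(actual, "-XX:+HeapDumpOnOutOfMemoryError", "-verbose:gc", "-Xloggc:"));
    }
}
```

Logging this once at startup makes it obvious from the client's log file whether a missing heap dump is due to a missing option.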
If you're using JNI (or any libraries that use JNI), it's easy to crash the JVM so that it leaves no traces at all. As far as I know, the only way to debug this kind of problem is to step through the native stuff with a debugger.
In addition to all of the other suggestions, check your codebase for calls to System.exit().
When a Java VM crashes with an EXCEPTION_ACCESS_VIOLATION and produces an hs_err_pidXXX.log file, what does that indicate? The error itself is basically a null pointer exception. Is it always caused by a bug in the JVM, or are there other causes like malfunctioning hardware or software conflicts?
Edit: there is a native component, this is an SWT application on win32.
Most of the time this is a bug in the VM.
But it can be caused by any native code (e.g. JNI calls).
The hs_err_pidXXX.log file should contain some information about where the problem happened.
You can also check the "Heap" section inside the file. Many VM bugs are caused by garbage collection (especially in older VMs). This section should show you whether the garbage collector was running at the time of the crash. It also shows whether some sections of the heap are full (the percentage numbers).
The VM is also much more likely to crash in a low memory situation than otherwise.
Answer found!
I had the same error and noticed that others who had posted the contents of the pid log file were running 64-bit Windows, just like me. At the end of the log file, it included the PATH setting. There I could see that C:\Windows\SysWOW64 was incorrectly listed ahead of %SystemRoot%\system32. Once I corrected that, the exception disappeared.
First thing you should do is upgrade your JVM to the latest you can.
Can you repeat the issue? Or does it seem to happen randomly? We recently had a problem where our JVM was crashing all over the place, at random times. Turns out it was a hardware problem. We put the drives in a new server and it completely went away.
Bottom line, the JVM should never crash. As the poster above mentioned, if you're not doing any JNI, then my gut feeling is that you have a hardware problem.
The cause of the problem will be documented in the hs_err* file, if you know what to look for. Take a look, and if it still isn't clear, consider posting the first 5 or 10 lines of the stack trace and other pertinent info (don't post the whole thing, there's tons of info in there that won't help - but you have to figure out which 1% is important :-) )
Are you using a Browser widget and executing JavaScript in the Browser widget? If so, then there are bugs in some versions of SWT that cause the JVM to crash in native code, in various Windows libraries.
Two examples (that I opened) are bug 217306 and bug 127960. These two bug reports are not the only bug reports of the JVM crashing in SWT, however.
If you aren't using the Browser widget then these suggestions won't help you. In that case, you can search for a list of SWT bugs causing a JVM crash. If none of those are your issue, then I highly recommend that you open a bug report with SWT.
I have the same problem with a JNLP application that I have been using for a long time and is pretty reliable. The problem started immediately after I upgraded from Windows 7 to Windows 10. According to my investigation, it is most likely a bug in Win 10.
The following is not a solution, but an ugly workaround. In the jre/bin directory there is javaws.exe. Once I right-clicked it, opened Properties > Compatibility and ticked "Run this program as an administrator", the JNLP app started to work.
Please, be aware that this approach could cause security issues and use it only if you have no other option and 100% know what you are doing.