The application I am working on suddenly crashed with
java.io.IOException: ... Too many open files
As I understand it, this means that files are being opened but not closed.
The stack trace, of course, only appears after the fact and can at best hint at what happened before the error occurred.
What would be an intelligent way to search the code base for this issue, which only seems to occur when the app is under heavy load?
Use lsof -p <pid> to check which file descriptors the process holds and what might be leaking.
Use ulimit -n to see the limit on open file descriptors for a single process.
Check every I/O resource in your project: is it released in time? Note that File, Process and Socket (and HTTP connections) are all I/O resources; see the sketch after this list.
Sometimes, too many threads will cause this problem too.
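Regarding the point about releasing I/O resources, a minimal sketch of the try-with-resources pattern (Java 7+); the class name and file path are just examples, not from the original question:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadFirstLine {
    static String firstLine(String path) throws IOException {
        // The try-with-resources block closes the reader (and its file
        // descriptor) even if readLine() throws, so the descriptor cannot leak.
        try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }
}

Under heavy load, a single code path that skips close() in an error branch is enough to exhaust the descriptor limit, which is why the leak only shows up under stress.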
I think the best way is to use a tool specifically designed for the purpose, such as this one:
This little Java agent is a tool that keeps track of where/when/who opened files in your JVM. You can have the agent trace these operations to find out about the access pattern or handle leaks, and dump the list of currently open files and where/when/who opened them.
In addition, upon "too many open files" exception, this agent will dump the list, allowing you to find out where a large number of file descriptors are in use.
I seem to remember YourKit also having some facilities around this, but can't find any specific information at the moment.
What OS? If it's Linux, there is information under /proc that should help (on a Mac, lsof serves a similar purpose). On Windows, use Process Explorer.
As for searching the code base, perhaps look for code that catches or throws IOException - I think I/O methods that already catch or throw it have a high likelihood of needing a close() call.
Have you tried attaching to the running process using jvisualvm (Java 5.0 and later, in the JDK bin directory)? You can open the running process and take a heap dump (which, if you have an older JDK, you will need to analyze using Eclipse, IntelliJ, NetBeans, et al.).
In JDK 7 the heap dump button is under the "Monitor" tab. Taking one creates a heap dump tab with a "Classes" sub-tab where you can check whether any classes that open files exist in unusually high numbers. Another very useful feature is heap dump comparison: take a reference heap dump, let your app run a bit, then take another and compare the two (the link to compare is on the "[heapdump]" tab you get when you take one). There is also a JVM flag (-XX:+HeapDumpOnOutOfMemoryError) for taking a heap dump on a crash or OOM exception; you can go down that route if comparing heap dumps does not point to an obvious culprit class. Also, the "Instances" sub-tab in the heap dump diff shows what was allocated in the time between the two dumps, which may also help.
jvisualvm is an awesome tool that does not get enough mentions.
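If you would rather capture a heap dump from inside the application instead of clicking the button in jvisualvm, here is a minimal sketch using the HotSpot-specific HotSpotDiagnosticMXBean (assuming a HotSpot/OpenJDK-style JVM; HeapDumper and the output path are my own names for the example):

import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    // Writes an .hprof file that jvisualvm, Eclipse MAT, etc. can open.
    public static void dumpHeap(String outputFile) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(outputFile, true); // true = only live objects (forces a GC first)
    }
}

Calling HeapDumper.dumpHeap("before-crash.hprof") at a suspicious point lets you diff dumps without attaching a GUI to the process.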
Related
I have a HotSpot JVM heap dump that I would like to analyze. The VM ran with -Xmx31g, and the heap dump file is 48 GB.
I won't even try jhat, as it requires about five times the heap memory (that would be 240 GB in my case) and is awfully slow.
Eclipse MAT crashes with an ArrayIndexOutOfBoundsException after analyzing the heap dump for several hours.
What other tools are available for that task? A suite of command line tools would be best, consisting of one program that transforms the heap dump into efficient data structures for analysis, combined with several other tools that work on the pre-structured data.
Normally, what I use is ParseHeapDump.sh, included with Eclipse Memory Analyzer and described here, and I run it on one of our more beefed-up servers (download and copy over the Linux .zip distro, unzip there). The shell script needs fewer resources than parsing the heap from the GUI, plus you can run it on your beefy server with more resources (you can allocate more by adding something like -vmargs -Xmx40g -XX:-UseGCOverheadLimit to the end of the last line of the script).
For instance, the last line of that file might look like this after modification
./MemoryAnalyzer -consolelog -application org.eclipse.mat.api.parse "$#" -vmargs -Xmx40g -XX:-UseGCOverheadLimit
Run it like ./path/to/ParseHeapDump.sh ../today_heap_dump/jvm.hprof
After that succeeds, it creates a number of "index" files next to the .hprof file.
After creating the indices, I try to generate reports from them, scp those reports to my local machine, and see if I can find the culprit just by that (just the reports, not the indices). Here's a tutorial on creating the reports.
Example report:
./ParseHeapDump.sh ../today_heap_dump/jvm.hprof org.eclipse.mat.api:suspects
Other report options:
org.eclipse.mat.api:overview and org.eclipse.mat.api:top_components
If those reports are not enough and I need to dig further (e.g. via OQL), I scp the indices as well as the hprof file to my local machine and then open the heap dump (with the indices in the same directory as the heap dump) in my Eclipse MAT GUI. From there, it does not need too much memory to run.
EDIT:
I'd just like to add two notes:
As far as I know, only the generation of the indices is the memory-intensive part of Eclipse MAT. Once you have the indices, most of your processing in Eclipse MAT does not need that much memory.
Doing this via a shell script means I can run it on a headless server (and I normally do, because those are usually the most powerful ones). And if you have a server that can generate a heap dump of that size, chances are you have another server out there that can process that much of a heap dump as well.
First step: increase the amount of RAM you are allocating to MAT. By default it's not very much and it can't open large files.
When using MAT on a Mac (OS X), the MemoryAnalyzer.ini file is in MemoryAnalyzer.app/Contents/MacOS. Making adjustments to that file did not "take" for me. Instead, you can create a modified startup command/shell script based on the content of that file and run it from that directory. In my case I wanted a 20 GB heap:
./MemoryAnalyzer -vmargs -Xmx20g -XX:-UseGCOverheadLimit ... other params as desired
Just run this command/script from the Contents/MacOS directory via a terminal to start the GUI with more RAM available.
I suggest trying YourKit. It usually needs a little less memory than the heap dump size (it indexes the dump and uses that information to retrieve what you want).
The accepted answer to this related question should provide a good start for you (if you have access to the running process, it generates live jmap histograms instead of heap dumps; it's very fast):
Method for finding memory leak in large Java heap dumps
Most other heap analysers (I use IBM's http://www.alphaworks.ibm.com/tech/heapanalyzer) require at least somewhat more RAM than the heap itself if you're expecting a nice GUI tool.
Other than that, many developers use alternative approaches, like live stack analysis to get an idea of what's going on.
Although I must question why your heaps are so large. The effect on allocation and garbage collection must be massive. I'd bet a large percentage of what's in your heap should actually be stored in a database or a persistent cache, etc.
This person (http://blog.ragozin.info/2015/02/programatic-heapdump-analysis.html)
wrote a custom "heap analyzer" that just exposes a "query style" interface over the heap dump file, instead of actually loading the file into memory:
https://github.com/aragozin/heaplib
Though I don't know whether that "query language" is better than the Eclipse OQL mentioned in the accepted answer here.
The latest snapshot build of Eclipse Memory Analyzer has a facility to randomly discard a certain percentage of objects to reduce memory consumption and allow the remaining objects to be analyzed. See Bug 563960 and the nightly snapshot build to test this facility before it is included in the next release of MAT. Update: it is now included in released version 1.11.0.
A not-so-well-known tool, http://dr-brenschede.de/bheapsampler/, works well for large heaps. It works by sampling, so it doesn't have to read the whole dump, though it is a bit finicky.
This is not a command-line solution, but I like these tools:
Copy the heap dump to a server large enough to host it. It may well be possible to use the original server.
Enter the server via ssh -X to run the graphical tool remotely, and use jvisualvm from the Java binary directory to load the .hprof file of the heap dump.
The tool does not load the complete heap dump into memory at once, but loads parts as they are required. Of course, if you look around enough in the file, the required memory will eventually reach the size of the heap dump.
I came across an interesting tool called JXray. It offers a limited evaluation/trial license. I found it very useful for tracking down memory leaks; you may give it a shot.
Try JProfiler; it works well for analyzing large .hprof files. I have tried it with a file of around 22 GB.
https://www.ej-technologies.com/products/jprofiler/overview.html
$499 per developer license, but it has a free 10-day evaluation.
When the problem can be "easily" reproduced, one alternative not mentioned yet is to take heap dumps before memory grows that big (e.g., jmap -dump:format=b,file=heap.bin <pid>).
In many cases you will already get an idea of what's going on without waiting for an OOM.
In addition, MAT provides a feature to compare different snapshots, which can come in handy (see https://stackoverflow.com/a/55926302/898154 for instructions and a description).
At work, we are running into an OOM issue that is difficult to reproduce. Or, more accurately, it is very easy to reproduce on one system, making that system unusable, but difficult to reproduce anywhere else given the same inputs.
The application runs as a service using a service wrapper. We did manage to get the configuration changed to launch it with the option of writing a heap dump file on OOM, but unfortunately the dumps were truncated, most likely because the service wrapper timed out and killed the process while it was writing the file. This is readily apparent, since the max heap is set to 1 GB and the hprof files are as small as 700 MB, which is too small to be the entire heap at the moment of the OOM.
It would take a lot of jumping through hoops to additionally configure the wrapper to give the Java process more time to write out the heap, but we are pursuing that using these two options:
wrapper.jvm_exit.timeout=600
wrapper.shutdown.timeout=600
The question is: is there anything useful I can do with the truncated hprof files I have? Eclipse MAT chokes on them. jhat appears to load them, but then shows only 3 instances of java.lang.Object of size 0 and nothing else. I tried YourKit and it couldn't write its oids file.
It seems to me like these files should have some useful, accessible information in them. Is there a tool that can read what's there?
Thank you for your time!
The best option I have come across to date for analyzing the dump file is a text editor such as vim.
Use JProfiler (https://www.ej-technologies.com/products/jprofiler/overview.html). It's not free, but it has a trial period.
The live memory and CPU views are your best bet for isolating your issues. It generally runs reasonably well even on large dumps.
We hit a strange issue on one of our customers' servers, where Java runs into "Too many open files".
Checking the descriptors via lsof produces a large list of "sock" descriptors with "can't identify protocol".
I suspect it happens because sockets are kept open for too long, but since our thread dump contains a lot of threads, I have no clear idea which one is the culprit.
Is there any good method to detect which threads exactly open these sockets?
Thanks.
Is there any good method to detect which threads exactly open these sockets?
Not the threads per se.
One approach is to run the application using a profiler. This could well find the problem even if you cannot exactly reproduce the customer's problem. (@SyBer reports that the YourKit profiler has specific support for finding socket leaks; see the comment.)
A second approach is to tweak your test platform by using ulimit to REDUCE the number of open files allowed. This may make it easier to reproduce the "too many files open" scenario in your test environment.
Finally, I'd recommend grepping your codebase to find all the places where socket objects are created. Then examine them all to make sure they correctly use try/finally blocks so that the sockets are always closed; a sketch of the pattern follows below.
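For illustration, a minimal sketch of the pattern to look for (plain try/finally; on Java 7+ a try-with-resources block does the same thing more concisely). The host, port, and the work done on the socket are placeholders, not from the original question:

import java.io.IOException;
import java.net.Socket;

public class SocketUse {
    static void talkTo(String host, int port) throws IOException {
        Socket socket = new Socket(host, port);
        try {
            // ... read from / write to the socket here ...
        } finally {
            // Runs even if the work above throws, so the descriptor is released.
            socket.close();
        }
    }
}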
Start with
netstat -anp | grep $YOUR_PROCESS_ID - for Linux (the -p flag shows the owning PID)
netstat -ano | find "$YOUR_PROCESS_ID" - for Windows
At least you will see whether the connections really exist.
Did you try using ulimit to increase the number of open files allowed? Also, it's possible that you're not closing your sockets properly, so you have a leak.
The only "good" way to detect leaking sockets is either a very verbose log or a profiler. Do a memory dump and analyse the objects; a sketch of the logging approach follows below.
Valgrind will identify file descriptor leaks if you pass --track-fds=yes. Valgrind generates short stack traces at the "acquisition" point of the resources it tracks. Once you have located the source lines where the leaks occur, you can combine this with the return value of pthread_self in your logging system (I'm sure you are using one!), or place breakpoints in gdb.
Likely you are neglecting to close() sockets that you are finished with. This needs to be done even when the peer initiates the shutdown.
What tools are available to view the output of the built-in JVM profiler? For example, I'm starting my JVM with:
-agentlib:hprof=cpu=times,thread=y,cutoff=0,format=a,file=someFile.hprof.txt
This generates output in the hprof ("JAVA PROFILE 1.0.1") format.
I have had success in the past using HPjmeter to view these output files in a reasonable way. However, for whatever reason the files that are generated using the current version of the Sun JVM fail to load in the current version of HPjmeter:
java.lang.NullPointerException
at com.hp.jmeter.f.jb.a(Unknown Source)
at com.hp.jmeter.f.a.a(Unknown Source)
at com.hp.c.a.j.z.run(Unknown Source)
Exception in thread "HPeprofDataFileReaderThread" java.lang.AssertionError: null pointer exception from loader
at com.hp.jmeter.f.a.a(Unknown Source)
at com.hp.c.a.j.z.run(Unknown Source)
(Why would they obfuscate the bytecode for a free product?!)
Two questions arise from this:
Does anyone know the cause of this HPjmeter error? (EDIT: Yes--see below)
What other tools exist to read hprof files? And why are there none from Sun (are there)?
I know the Eclipse TPTP and other tools can monitor JVMTI data on the fly, but I need a solution that can process the generated hprof files after the fact, since the deployed machine only has a JRE (not a JDK) installed.
EDIT: A very helpful HPjmeter developer replied to my question on an HP ITRC forum indicating that heap=dump needs to be included in the -agentlib options temporarily until a bug in HPjmeter is fixed. This information makes HPjmeter viable again, but I will still leave the question open to see if anyone knows of any other tools.
EDIT: As of version 4.0.00 of HPjmeter (available 05/2009) this bug has been fixed.
YourKit Java Profiler is able to read hprof snapshots (I am not sure whether only for memory profiling or for CPU as well). It is not free, but it is by far the best Java profiler I have ever used. It presents the results in a clear, intuitive way and performs well on large data sets. The documentation is also pretty good.
For viewing and analyzing the output of hprof=samples or hprof=cpu I have used PerfAnal with good results. The GUI is a bit spartan, but very useful.
PerfAnal is a free download (GPL, originally an example project in the book Java Programming on Linux).
See this article:
http://www.oracle.com/technetwork/articles/javase/perfanal-137231.html
for more information and the download.
Normally you can just run
java -jar PerfAnal.jar hprof.java.txt
You may need to fiddle with -Xmx for large hprof files.
I am not 100% sure it'll work (it sounds like it will), and I am not sure it'll show things in the format you want... but have you thought about VisualVM?
I believe it'll open up the resulting file.
I have been using Eclipse Memory Analyzer successfully to analyze various performance problems. First of all, install the tool in Eclipse as described on the project web page.
After that, you can create a dump file given the pid of the JVM to be analyzed:
jmap -dump:format=b,file=<filename>.hprof <jvm_pid>
Then just import the .hprof file into Eclipse. It has some automatic reports that try to point out the possible problems (for me they usually do not work).
Edit:
Answering the comment: you are right, it is more like a leak finder for Java. For performance problems, I have played with JRat on small projects. It shows the time consumed per method, the number of times a method is called, the call hierarchy, etc. The only problem is that, as far as I know, it does not support .hprof files. To use it, you need to run your program with an extra VM argument:
-javaagent:<path>/shiftone-jrat.jar
This will generate a directory with the profile captured by the tool. Then, execute
java -jar shiftone-jrat.jar
and open the trace. Even though it is a simple tool, I think it can be useful.
We have a Java program that requires a large amount of heap space - we start it with (among other command-line arguments) the argument -Xmx1500m, which specifies a maximum heap of 1500 MB. When this program is started on a Windows XP box that has been freshly rebooted, it starts and runs without issues. But if the program has run several times, the computer has been up for a while, etc., then when it tries to start I get this error:
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
I suspect that Windows itself is suffering from memory fragmentation, but I don't know how to confirm this suspicion. At the time this happens, Task Manager and Sysinternals procexp report 2000 MB of free memory. I have looked at this question related to internal fragmentation.
So the first question is: how do I confirm my suspicion?
The second question is: if my suspicion is correct, does anyone know of any tools to solve this problem? I've looked around quite a bit, but I haven't found anything that helps, other than periodic reboots of the machine.
PS - changing operating systems is also not currently a viable option.
Agreed with Torlack: a lot of this is because other DLLs get loaded at certain addresses, breaking up the amount of memory the VM can get in one big contiguous chunk.
You can do some work on WinXP, if you have more than 3 GB of memory, to get some of the Windows stuff moved around; look up PAE here:
http://www.microsoft.com/whdc/system/platform/server/PAE/PAEdrv.mspx
Your best bet, if you really need more than 1.2 GB of memory for your Java app, is to look at 64-bit Windows, Linux or OS X. If you're using any kind of native libraries with your app, you'll have to recompile them for 64-bit, but that is going to be a lot easier than trying to rebase DLLs and such to maximize the memory you can get on 32-bit Windows.
Another option would be to split your program up into multiple VMs and have them communicate with each other via RMI or messaging or something similar; that way each VM can hold some subset of the memory you need (a minimal sketch is below). Without knowing what your app does, I'm not sure whether this will help in any way, though...
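For illustration, a minimal sketch of the RMI variant (CacheService, CacheServer and the registry port are made-up names for the example, not anything from your app):

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// One JVM holds a slice of the data and exposes it remotely.
interface CacheService extends Remote {
    byte[] lookup(String key) throws RemoteException;
}

class CacheServiceImpl implements CacheService {
    public byte[] lookup(String key) throws RemoteException {
        return null; // a real implementation would hold part of the large data set
    }
}

public class CacheServer {
    public static void main(String[] args) throws Exception {
        CacheService stub =
                (CacheService) UnicastRemoteObject.exportObject(new CacheServiceImpl(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("cache", stub); // other JVMs look this up and call lookup() remotely
    }
}

A client JVM would call LocateRegistry.getRegistry(host), look up "cache", and invoke lookup() on the returned stub as if it were a local object, so each JVM only needs the heap for its own slice.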
Unless you are running out of page file space, this issue isn't that the computer is running out of memory. The whole point of virtual memory is to allow processes to use more virtual memory than is physically available.
Not knowing how the JVM handles the heap, it is a bit hard to say exactly what the problem is, but one common issue is that there isn't enough contiguous free address space in your process to allow the heap to be extended. Why this would be a problem after the machine has been running a while is a bit puzzling.
I've been working on a similar problem at work. I have found that running the program under WinDbg and using the "!address" and "!address -summary" commands has been invaluable in tracking down why a process's virtual address space has become fragmented. You can also try running the program after a reboot and using "!address" to take a picture of the address space, then do the same when the program no longer starts. This might clue you in on the problem; maybe something as simple as an extra DLL getting loaded is the cause.
I suspect that the problem is Windows memory fragmentation. There is another question here on Stack Overflow called Java Maximum Memory on Windows XP that mentions using Process Explorer to look at where DLLs are mapped into memory, and then addressing the problem by rebasing the DLLs so that they load into memory in a more compact way.
Using Minimem (http://minimem.kerkia.net/) for that application might fix your problem. However, I'm not sure this is the answer you are looking for. I hope it helps.
Maybe you should consider starting the program once, reserving the memory, and not ending the VM after each run. Look into different GC options and release your objects.
Use vmmap from Microsoft's Sysinternals tools to view the fragmentation of the virtual address space and to identify what's breaking up the space.