Parsing hprof memory dump - java

I am working on an Android application and I have a memory dump (i.e. a .hprof file) that I captured from my application.
I want to investigate a certain container of the application.
Just like an ArrayList, the container has a growth policy that increases its capacity whenever the current capacity is filled, to guarantee amortized constant-time appends.
My hypothesis is that every instance of the container carries a significant amount of allocated but unused capacity.
I want to automate the investigation of this. Is there a way to write a script that parses the .hprof file and returns the ratio of these unused slots to total_entries?
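To make the metric concrete, the computation I would like to automate over all instances is roughly the following sketch. The (size, capacity) pairs would have to come from whatever hprof parser ends up being used (for example Eclipse MAT's API or OQL - just possibilities, not a requirement); that extraction step is exactly what I'm asking about.

public class SlackRatio {

    // sizes[i]      = number of entries actually stored in instance i
    // capacities[i] = length of the backing array of instance i
    // Returns the ratio of allocated-but-unused slots to total entries.
    static double unusedToEntriesRatio(int[] sizes, int[] capacities) {
        long unused = 0, entries = 0;
        for (int i = 0; i < sizes.length; i++) {
            unused += Math.max(0, capacities[i] - sizes[i]);
            entries += sizes[i];
        }
        return entries == 0 ? Double.NaN : (double) unused / entries;
    }

    public static void main(String[] args) {
        // Hypothetical instances: 10 entries in a 16-slot array, 3 entries in a 12-slot array.
        System.out.println(unusedToEntriesRatio(new int[] {10, 3}, new int[] {16, 12}));
    }
}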
Thank you!

Related

Java OutOfMemoryError, but very few Live Objects seen in JFR?

I have some code that throws an OutOfMemoryError.
I set the JVM to dump on OOM and I opened the dump in Java Flight Recorder.
When inspecting the Live Objects in JFR, I see very few objects (less than 60).
How can I find out the largest object(s) being held in memory and not collectable at the moment the OOM was triggered?
Objects are sampled, so there is no way you can be sure to see the largest object before OOM.
That said, 60 samples are usually sufficient to find a memory leak, at least if the application has been running for some time and the leak is not negligible in size.
Samples that happen at the beginning are typically singletons and static objects that exist for the whole duration of the application. Samples that happen at the end are typically short-lived objects that are about to be garbage collected. In JMC you can click in "the middle" of the timeline on top to find memory leak candidates. Then you can look at the stack trace and the path to the GC root and see if you spot something suspicious.
You can also use the command line tool and do:
$ jfr print --events OldObjectSample --stack-depth 64 recording.jfr
It will list the samples in chronological order. It may be easier to look at individual samples than at an aggregate. The command line approach is described in detail here.
You can't do this in an automated way (like memory analyzer tools do it with heap dumps) due to the nature of data being collected.
It is totally fine that you can see only a handful of objects. The reason is how the low-overhead sampling works - on every new TLAB allocation, JFR steps in and samples a few objects from the old TLAB. Therefore you don't get all objects recorded, only a representative sample of the objects being allocated. This should be enough to give you the proportions of objects in the heap. Also, all of the objects reported are live at the point the recording was dumped.
If you think you are getting too few samples to come to a proper conclusion, it might be that your heap is small relative to the TLAB size, and you may want to reduce the TLAB size. This is not advisable in a production environment, as an improper TLAB setting can reduce application performance.
If you had "Memory Leak Detection" set to "Object types + Allocation Stack Traces + Path to GC Root" in the profiling configuration during the recording, you can trace where live objects go in the code after being created, and you can reconstruct a representative dominator tree that way.
If you care about objects that are large in themselves (and not about objects retaining most of the heap), you can find objects larger than a TLAB by looking at the "TLAB Allocations" page, in the "Total Allocations Outside TLAB" column. This data is collected only if the profiling configuration had "Memory Profiling" set to "Object Allocation and Promotion".
By profiling configuration I mean the file that you specify with the settings option when you start recording with JFR. This file can be created using the JMC application's "Flight Recording Template Manager".
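If you prefer to drive the recording from code rather than from JMC, the jdk.jfr API can load such a settings file or one of the built-in templates. A minimal sketch (the "profile" template name and the output path are just examples):

import java.nio.file.Path;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class JfrFromCode {
    public static void main(String[] args) throws Exception {
        // "profile" is a built-in settings template; a custom .jfc created with
        // JMC's Template Manager can be loaded with Configuration.create(Path) instead.
        Configuration config = Configuration.getConfiguration("profile");
        try (Recording recording = new Recording(config)) {
            recording.start();
            // ... run the workload you want to profile ...
            recording.dump(Path.of("recording.jfr")); // example output path
        }
    }
}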

Managing JAVA Heap size during storing data from Oracle Database

I need to store in memory a huge amount of data by looping over a result set (Oracle database).
I carried out some tests and, using a profiler, I noticed that there is a considerable difference between the heap size and the used heap (i.e. my data). Here is an example.
I have already looked at the available JVM arguments for setting the right heap size, but the problem is that I don't know in advance how many bytes the data will occupy (since the amount of data can vary from one test to another).
By observing the graph in the image, the problem seems to be the memory "peaks" during the execution.
Could these peaks be related to the number of fetched rows (or, in general, to the extracted data)?
Is there a way to avoid this effect, by keeping memory almost constant (so that the heap size doesn't increase excessively)?
Thanks
By looking at your memory chart, it seems much of the data is of a temporary nature and can be removed from heap at some point. The final ratio of used heap vs. its total size says it all.
It seems the time to live of the temporary data (e.g. buffered data from an Oracle ResultSet) is too high, or the eden space is too small, so data is being moved from the eden and/or survivor space to the old generation, where it is collected once the JVM detects the need to run the GC on the old generation. This probably happens when you iterate over your ResultSet and the Oracle driver needs to fetch the next chunk of data from the database, which can be fairly large.
At this point I should go into a little more detail about the Oracle ResultSet buffer. It is basically just a chunk of bytes on the heap. Depending on the column type, the data is stored in a different form than what you read from the ResultSet. Take java.sql.Timestamp, for instance: inside the buffer it is stored as an oracle.sql.TIMESTAMP or even just plain bytes. This means that whenever you extract a java.sql.Timestamp from a ResultSet, another object has to be allocated, and that object is most likely the "final" object you actually want to keep.
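Since that buffer is sized by the JDBC fetch size, one practical lever (if the peaks correlate with each fetch) is to reduce the number of rows buffered per round trip. A sketch using the standard JDBC API; the query and column names are made up:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class FetchSizeExample {
    static void readRows(Connection connection) throws Exception {
        try (PreparedStatement stmt =
                     connection.prepareStatement("SELECT id, created_at FROM events")) { // made-up query
            // The driver buffers roughly this many rows per round trip; smaller values
            // mean smaller temporary buffers on the heap at the cost of more round trips.
            stmt.setFetchSize(50);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    long id = rs.getLong("id");
                    Timestamp createdAt = rs.getTimestamp("created_at");
                    // process the row immediately instead of collecting everything in memory
                }
            }
        }
    }
}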
I suggest tuning the JVM's GC to your needs. Maybe you can figure out which data is being constantly collected. Try adjusting the eden size so the JVM doesn't need to promote too much to the old generation. You can also adjust how much new space the JVM allocates on demand and how it shrinks the heap when it detects a gap between usage and allocated size.
You can find a list of JVM options here.
Of course you can limit the memory, but there is not much benefit in doing this. If you do so, garbage collection will have to happen more often, which will result in a slower execution of your program.
This is simply the way garbage collection works in Java. If you have enough memory GC will not be called. That gives your application more resources (CPU time).
Also, to optimize memory consumption you should check your algorithms and see whether you can reuse some objects instead of creating new ones, because new objects are exactly what makes the blue line go up. See the flyweight and other similar patterns, which are used to control memory consumption.
Could these peaks be related to the number of fetched rows (or, in general, to the extracted data)?
I presume you are referring to the blue peaks.
The blue area represents the memory used at any given point in time, and the peaks represent points at which the garbage collector runs. As you can see, the line slopes up at an angle to each peak, and then falls vertically. This is normal behavior.
You will also notice that heights of the peaks and troughs are trending upward. This is most likely the effect of your application's in-memory data structure growing.
Is there a way to avoid this effect, by keeping memory almost constant (so that the heap size doesn't increase excessively)?
Basically, no there isn't. If the blue line wasn't jagged, or the peaks were shallower and closer together, that would mean that the GC is running more frequently ... which would be bad for performance.
Basically, if you are building a big data structure in memory, you need enough memory to represent it, PLUS a bunch of extra space for temporary objects and to give the garbage collector room to do what it needs to do.
If your concern is that your application is using too much memory overall, then you need to optimize the in-memory data structure you are building, and check to see you don't have any (other) memory leaks.
If your concern is that you can't predict how big the Java heap needs to be, then consider running the SQL query as a COUNT first, and then start / restart the Java application with a heap size estimate based on the count.
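A sketch of that idea (the table name and the per-row byte estimate are placeholders you would calibrate against a profiler run):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class HeapEstimate {
    static final long BYTES_PER_ROW = 200; // assumed average retained size per row

    // Run the COUNT first, then let your launcher script turn the result into an -Xmx value.
    static long estimateHeapBytes(Connection connection) throws Exception {
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM my_table")) { // made-up table
            rs.next();
            long rows = rs.getLong(1);
            return (long) (rows * BYTES_PER_ROW * 1.5); // headroom for temporary objects and GC
        }
    }
}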

How to avoid Java Heap Space Exception in java

I'm reading a huge volume of data from IO and I need to store it as key-value pairs in a Map or a properties file; only then can I use that data for generating reports. But when I store this huge amount of data in a Map or properties file, a heap space exception occurs. If I use SQLite instead, retrieval takes a very long time. Is there a different way to achieve this? Please suggest one.
Java Heap Space Important Points
Java heap memory is part of the memory allocated to the JVM by the operating system.
Whenever we create objects, they are created inside the heap in Java.
Java heap space is divided into regions, or generations, for the sake of garbage collection: the Young (New) Generation, the Old (Tenured) Generation, and the Permanent Generation (replaced by Metaspace since Java 8). The permanent generation is garbage collected during a full GC in the HotSpot JVM.
You can increase or change the size of the Java heap space by using the JVM command line options -Xms, -Xmx and -Xmn. Don't forget to append "m" or "g" to the size to indicate megabytes or gigabytes.
For example, you can set the Java heap size to 256 MB by executing: java -Xmx256m javaClassName (your program's class name).
You can use either JConsole or Runtime.maxMemory(), Runtime.totalMemory() and Runtime.freeMemory() to query the heap size programmatically in Java (see the snippet after this list).
You can use the jmap command to take a heap dump in Java and jhat to analyze that heap dump.
The Java heap space is different from the stack, which is used to store the call hierarchy and local variables.
The Java garbage collector is responsible for reclaiming memory from dead objects and returning it to the Java heap space.
Don't panic when you get java.lang.OutOfMemoryError; sometimes it's just a matter of increasing the heap size, but if it is recurrent, look for a memory leak in Java.
Use a profiler and a heap dump analyzer tool to understand the Java heap space and how much memory is allocated to each object.
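As mentioned above, the Runtime methods let you query the heap from inside the program. A small self-contained example:

public class HeapStats {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();     // the -Xmx limit (or the JVM's default)
        long total = rt.totalMemory(); // heap currently reserved from the OS
        long free = rt.freeMemory();   // unused part of the reserved heap
        long used = total - free;
        System.out.printf("used=%d MB, committed=%d MB, max=%d MB%n",
                used / (1024 * 1024), total / (1024 * 1024), max / (1024 * 1024));
    }
}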
Reference link for more details:
https://docs.oracle.com/cd/E19159-01/819-3681/abeii/index.html
https://docs.oracle.com/cd/E40520_01/integrator.311/integratoretl_users/src/ti_troubleshoot_memory_errors.html
You need to do a rough estimate of the memory needed for your map. How many keys and values are there? How large are the keys and values? For example, if the keys are longs and the values are strings 40 characters long on average, the absolute minimum for 2 billion key-value pairs is (40 + 8) * 2E9 bytes - approximately 100 GB. Of course, the real requirement is larger than this minimum estimate - as much as two times larger, depending on the nature of the keys and values.
If the estimated amount of memory is beyond reasonable (and 100 GB is beyond reasonable unless you have lots of money), you need to figure out a way to partition your processing. Read in a large chunk of data, run some algorithm on it to reduce it to some small result, then do the same for all the other chunks one by one, making sure not to keep the old chunk around while you process the new one. Finally, look at the results from all chunks and compute the final result. For a better description of this approach, look up "map-reduce".
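The shape of that chunked approach, as a sketch - Row, Summary and readNextChunk() are hypothetical stand-ins for your own record type, reduction and IO layer:

import java.util.ArrayList;
import java.util.List;

public class ChunkedProcessing {
    record Row(String key, long value) {}   // hypothetical input record
    record Summary(long count, long sum) {} // small per-chunk result

    static Summary reduceChunk(List<Row> chunk) {
        long sum = 0;
        for (Row r : chunk) sum += r.value();
        return new Summary(chunk.size(), sum);
    }

    static Summary combine(List<Summary> summaries) {
        long count = 0, sum = 0;
        for (Summary s : summaries) { count += s.count(); sum += s.sum(); }
        return new Summary(count, sum);
    }

    public static void main(String[] args) {
        List<Summary> summaries = new ArrayList<>();
        // Each chunk must fit in memory; the previous chunk is dropped before the next is read.
        for (List<Row> chunk; (chunk = readNextChunk()) != null; ) {
            summaries.add(reduceChunk(chunk));
        }
        System.out.println(combine(summaries));
    }

    static List<Row> readNextChunk() {
        return null; // placeholder: wire this to the real data source
    }
}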
If the estimated amount of memory is somewhat reasonable (say, 8 GB, and you have a 16 GB machine), use a 64-bit JVM, set the maximum heap memory using the -Xmx switch, and make sure you use the most efficient data structures, such as Trove maps.
Good luck!
Increasing the heap size is one option, but there is an alternative: store the data off the heap by using memory-mapped files in Java.
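A minimal sketch of the memory-mapped approach using java.nio (the file name and region size are just examples); the mapped region lives outside the Java heap, so it does not count against -Xmx:

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OffHeapStore {
    public static void main(String[] args) throws Exception {
        long size = 1L << 20; // 1 MB region, just for illustration
        try (FileChannel channel = FileChannel.open(Path.of("data.bin"), // example file name
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // A single mapping is limited to 2 GB because ByteBuffer uses int indexes.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buffer.putLong(0, 42L);         // write a value at offset 0
            long value = buffer.getLong(0); // read it back without allocating on the heap
            System.out.println(value);
        }
    }
}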

How can I correct a "Java heap space - Out of Memory" error

I'm currently running my system against a rather large dataset and am getting the error 'Out of memory: Java heap space'.
Is there any way to get around this, or is it just a case of the dataset being too large to be used?
In general, you can either
give it more memory e.g. increase the maximum heap size, but don't give it more than about 90% of main memory. BTW the default is 25% of main memory up to 32GB.
optimise the code so that it uses less memory, e.g. use a memory profiler. You can use a more efficient data structure or load portions of data into memory at a time.
break up the data so it only works on a portion at a time.
If it's not the dataset that's eating up memory, it could be that you are not freeing up objects once they are inactive.
This is typically due to keeping references to very large objects, or to lots of objects lying around long after they are no longer needed. These are most likely references held in static variables, but they can also be references to large temporary variables (e.g., large StringBuilder objects) within methods that are still active.
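To illustrate the pattern, a sketch of the kind of lingering reference meant here (the cache and the report builder are hypothetical):

import java.util.ArrayList;
import java.util.List;

public class LingeringReferences {
    // A static collection like this keeps every report it ever saw reachable,
    // so the GC can never reclaim them - heap usage only grows.
    static final List<StringBuilder> CACHE = new ArrayList<>(); // hypothetical cache

    static String buildReport(Iterable<String> rows) {
        StringBuilder report = new StringBuilder();
        for (String row : rows) {
            report.append(row).append('\n');
        }
        CACHE.add(report);        // the leak: drop this line, or evict entries when done
        return report.toString(); // only the result needs to outlive the method
    }
}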

MAT space vs. TaskManager space

After searching the web for a while, I decided to ask you for help with my problem.
My program should analyze log files, which are really big - from about 100 MB up to 2 GB. I want to read the files using NIO classes like FileChannel.
I don't want to keep the files in memory; I want to process the lines immediately. The code works.
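For context, a minimal sketch of the kind of streaming read I mean (not my exact code; process() is a placeholder for the actual analysis):

import java.io.BufferedReader;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LogScanner {
    static void scan(Path logFile) throws Exception {
        // Open via FileChannel but stream line by line; only the current line
        // (plus a small read buffer) is ever held on the heap.
        try (FileChannel channel = FileChannel.open(logFile, StandardOpenOption.READ);
             BufferedReader reader = new BufferedReader(Channels.newReader(channel, "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                process(line); // analyze immediately, keep only aggregated results
            }
        }
    }

    static void process(String line) {
        // placeholder for the actual analysis
    }
}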
Now my problem: I analyzed the memory usage with the Eclipse MAT plugin and it says about 18 MB of data is retained (that fits). But the Task Manager in Windows says that about 180 MB are used by the JVM.
Can you tell me WHY this is?
I don't want to keep the data read with the FileChannel; I just want to process it. I close the channel afterwards - I thought all of that data would be released then?
I hope you can help me understand the difference between the used space shown in MAT and the used space shown in the Task Manager.
MAT will only show objects that are actively referenced by your program. The JVM uses more memory than that:
Its own code
Non-object data (classes, compiled bytecode, etc.)
Heap space that is not currently in use, but has already been allocated.
The last case is probably the biggest contributor. Depending on how much physical memory a computer has, the JVM sets a default maximum size for its heap. To improve performance it will keep using up to that amount of memory with minimal garbage collection activity. That means that objects that are no longer referenced remain in memory, rather than being garbage collected immediately, thus increasing the total amount of memory used.
As a result, the JVM will generally not release memory it has allocated for its heap back to the system. This shows up as an inordinate amount of used memory in the OS monitoring utilities.
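You can see the same gap from inside the JVM by comparing used and committed heap; used is closer to what MAT reports, while committed is closer to (though still smaller than) what the Task Manager shows, since the process also holds non-heap memory such as metaspace, code cache, thread stacks and the JVM itself. A small example using the standard management API:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapVsProcess {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // "used"      - what live (and not-yet-collected) objects currently occupy
        // "committed" - what the JVM has actually claimed from the OS for the heap
        System.out.printf("used=%d MB, committed=%d MB, max=%d MB%n",
                heap.getUsed() / (1024 * 1024),
                heap.getCommitted() / (1024 * 1024),
                heap.getMax() / (1024 * 1024));
    }
}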
Applications with high object allocation/deallocation rates fare worse - I have an application that uses 1.8 GB of memory while actually requiring less than 100 MB. Reducing the maximum heap size to 120 MB, though, increases the execution time by almost a full order of magnitude.
