After searching the web for a while, I decided to ask you for help with my problem.
My program should analyze log files, which are really big: from about 100 MB up to 2 GB. I want to read the files using NIO classes like FileChannel.
I don't want to save the files in memory, but I want to process the lines immediately. The code works.
Now my problem: I analyzed the memory usage with the Eclipse MAT plugin, and it says about 18 MB of data is retained (that fits). But the Windows Task Manager says that the JVM is using about 180 MB.
Can you tell me WHY this is?
I don't want to keep the data I read with the FileChannel; I just want to process it. I close the channel afterwards. I thought all the data would be freed then?
I hope you can help me understand the difference between the memory usage shown in MAT and the usage shown in Task Manager.
MAT will only show objects that are actively referenced by your program. The JVM uses more memory than that:
Its own code
Non-object data (classes, compiled bytecode, etc.)
Heap space that is not currently in use, but has already been allocated.
The last case is probably the most significant one. Depending on how much physical memory there is on a computer, the JVM will set a default maximum size for its heap. To improve performance, it will keep using up to that amount of memory with minimal garbage collection activity. That means that objects that are no longer referenced will remain in memory, rather than being garbage collected immediately, thus increasing the total amount of memory used.
As a result, the JVM will generally not return memory it has allocated as part of its heap to the system. This will show up as an inordinate amount of used memory in the OS monitoring utilities.
The effect is more pronounced in applications with high object allocation/deallocation rates. I have an application that uses 1.8 GB of memory while actually requiring less than 100 MB. Reducing the maximum heap size to 120 MB, though, increases the execution time by almost a full order of magnitude.
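To see the gap between "in use" and "already allocated" from inside the program, something like the following sketch might help. It only uses the standard Runtime API; the class name HeapSnapshot is made up for illustration. Note that Task Manager will typically report even more than the committed heap, since it also counts the JVM's own code, class data and thread stacks.

```java
// Hypothetical helper: captures how much heap is used, committed and allowed.
final class HeapSnapshot {
    final long usedBytes;      // heap currently occupied by objects (live or not yet collected)
    final long committedBytes; // heap the JVM has currently obtained from the OS
    final long maxBytes;       // upper limit the heap may grow to (what -Xmx controls)

    private HeapSnapshot(long usedBytes, long committedBytes, long maxBytes) {
        this.usedBytes = usedBytes;
        this.committedBytes = committedBytes;
        this.maxBytes = maxBytes;
    }

    static HeapSnapshot take() {
        Runtime rt = Runtime.getRuntime();
        long free = rt.freeMemory();       // unused part of the committed heap
        long committed = rt.totalMemory(); // committed heap
        long used = Math.max(0, committed - free);
        return new HeapSnapshot(used, committed, rt.maxMemory());
    }

    public static void main(String[] args) {
        HeapSnapshot s = take();
        long mb = 1024 * 1024;
        System.out.printf("used=%d MB, committed=%d MB, max=%d MB%n",
                s.usedBytes / mb, s.committedBytes / mb, s.maxBytes / mb);
    }
}
```

The "used" figure is roughly what a heap analyzer sums up (plus garbage that has not been collected yet), while "committed" is closer to the heap's share of what the OS reports.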
Related
I have some code that throws an OutOfMemoryError.
I set the JVM to dump on OOM and I opened the dump in Java Flight Recorder.
When inspecting the Live Objects in JFR, I see very few objects (fewer than 60).
How can I find out the largest object(s) being held in memory and noncollectable at the moment the OOM was triggered?
Objects are sampled, so there is no way you can be sure to see the largest object before OOM.
That said, 60 samples are usually sufficient to find a memory leak, at least if the application has been running for some time and the leak is not negligible in size.
Samples that happen at the beginning are typically singletons and static objects that you have for the whole duration of the application. Samples that happen at the end are typically short-lived objects that are about to be garbage collected. In JMC you can click in "the middle" of the timeline at the top to find memory leak candidates. Then you can look at the stack trace and the path to the GC root and see if anything looks suspicious.
You can also use the command line tool and do:
$ jfr print --events OldObjectSample --stack-depth 64 recording.jfr
It will list the samples in chronological order. It may be easier to inspect each sample individually than to look at an aggregate. The command line approach is described in detail here
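If you would rather go through the samples programmatically, the jdk.jfr.consumer API (JDK 11+) can read the same recording. Below is a rough sketch that counts OldObjectSample events per sampled class. The class name OldObjectSampleSummary is made up, and the field names "object" and "type" are my assumption about the event layout, so the code checks for them before use.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedObject;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper: summarizes old-object samples in a .jfr file.
final class OldObjectSampleSummary {

    /** Counts jdk.OldObjectSample events per sampled class name. */
    static Map<String, Integer> countByType(Path recording) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (RecordingFile file = new RecordingFile(recording)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                if (!"jdk.OldObjectSample".equals(event.getEventType().getName())) {
                    continue; // not an old-object sample
                }
                counts.merge(sampledTypeName(event), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Assumed layout: the event has an "object" field whose "type" is the sampled class.
    private static String sampledTypeName(RecordedEvent event) {
        if (!event.hasField("object")) {
            return "<unknown>";
        }
        RecordedObject sampled = event.getValue("object");
        if (sampled == null || !sampled.hasField("type")) {
            return "<unknown>";
        }
        RecordedClass type = sampled.getClass("type");
        return type == null ? "<unknown>" : type.getName();
    }

    public static void main(String[] args) throws IOException {
        countByType(Path.of(args[0]))
                .forEach((type, n) -> System.out.println(n + "\t" + type));
    }
}
```

Since these are samples, the counts only hint at which types tend to survive; they are not a full histogram of the heap.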
You can't do this in an automated way (like memory analyzer tools do it with heap dumps) due to the nature of data being collected.
It is totally fine that you can see only a handful of objects. The reason is how low-overhead sampling works: on every new TLAB allocation, JFR steps in and takes a few objects from the old TLAB. Therefore you don't get all objects recorded, only a representative sample of the objects being allocated. This should be enough to give you a ratio of objects in the heap. Also, all of the objects reported are live at the point of the recording dump.
If you think that you get too few samples to come to a proper conclusion, it might be that your heap is small relative to the TLAB size, and you might want to reduce the TLAB size. This is not advisable in a production environment, as an improper TLAB setting can reduce application performance.
If "Memory Leak Detection" was set to "Object types + Allocation Stack Traces + Path to GC Root" in the profiling configuration during the recording, you can trace where live objects go in the code after being created, and you can reconstruct a representative dominator tree that way.
If you care about objects that are large themselves (rather than objects retaining most of the heap), you can find objects that are larger than a TLAB by looking at the "Total Allocations Outside TLAB" column on the "TLAB Allocations" page. This data is collected only if the profiling configuration had "Memory Profiling" set to "Object Allocation and Promotion".
By profiling configuration I mean the file that you specify with the settings option when you start a recording with JFR. This file can be created using the "Flight Recording Template Manager" in the JMC application.
I need to store in memory a huge amount of data by looping over a result set (Oracle database).
I carried out some tests and, by using a profiler, I noticed that there is a considerable difference between the heap size and the used heap (i.e. my data). Here is an example.
I already saw the available JVM arguments in order to set the right heap size, but the problem is that I don't know in advance how many bytes data will occupy (since the amount of data can vary from one test to another).
By observing the graph in the image, the problem seems to be the memory "peaks" during the execution.
Could these peaks be related to the number of fetched rows (or in general to the extracted data)?
Is there a way to avoid this effect, by keeping memory almost constant (so that the heap size doesn't increase excessively)?
Thanks
By looking at your memory chart, it seems much of the data is of a temporary nature and can be removed from heap at some point. The final ratio of used heap vs. its total size says it all.
It seems like the time to live of the temporary data (e.g. buffered data from an Oracle ResultSet) is too high or the eden space is too small. As a result, data is moved from the eden and/or survivor space to the old generation, where it is only collected once the JVM detects the need to run the GC on the old generation. This possibly happens when you iterate over your ResultSet and the Oracle driver needs to fetch the next chunk of data from the database, which can be fairly large.
At this point I should go a little bit into detail about the Oracle ResultSet buffer. It's basically just a chunk of bytes on the heap. Depending on the column data it is stored as something different than you'd read from the ResultSet. Take a java.sql.Timestamp for instance. Inside the buffer it's stored as an oracle.sql.TIMESTAMP or even just plain bytes. This means that whenever you extract a java.sql.Timestamp from a ResultSet there's the need for another object to be allocated. And this object is most likely the "final" object you want to keep eventually.
I suggest tuning the JVM's GC to your needs. Maybe you can figure out which data is being constantly collected. Try adjusting the eden size so the JVM doesn't need to promote too much to the old generation. You can also adjust how much new space the JVM allocates on demand and how it shrinks when detecting a gap in usage and allocated size.
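Before changing any sizes it may help to look at how the heap is currently split up. The sketch below lists the heap memory pools through the standard java.lang.management API; the class name HeapPools is made up, and the pool names you get (eden, survivor, old generation and so on) depend on the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports used/committed size per heap memory pool.
final class HeapPools {

    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(String.format("%s: used=%d KB, committed=%d KB",
                    pool.getName(), usage.getUsed() / 1024, usage.getCommitted() / 1024));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```

Calling this periodically while iterating over the ResultSet should show whether the old generation keeps filling up with data that is later thrown away.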
You can find a list of JVM options here.
Of course you can limit the memory, but there is not much benefit in doing this. If you do so, garbage collection will have to happen more often, which will result in a slower execution of your program.
This is simply the way garbage collection works in Java. If you have enough memory, the GC will rarely need to run. That gives your application more resources (CPU time).
Also, to optimize memory consumption you should check your algorithms and see if you can reuse some objects instead of creating new ones, because the new objects are exactly what makes the blue line go up. See the flyweight pattern and other similar patterns that are used to control memory consumption.
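As a rough illustration of the reuse idea, here is a minimal flyweight-style pool for values that repeat a lot across rows (status codes, country names and the like). LabelPool is a made-up name, and whether this helps depends on how repetitive your data actually is.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical flyweight-style pool: equal strings share one instance.
final class LabelPool {
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the pooled instance equal to value, adding it on first sight. */
    String canonical(String value) {
        String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    int size() {
        return pool.size();
    }
}
```

Storing canonical(rs.getString(...)) instead of the raw value means a million rows with the same label keep one String alive rather than a million. The pool itself is never cleared in this sketch, so it only makes sense for a bounded set of distinct values.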
Could these peaks be related to the number of fetched rows (or in general to the extracted data)?
I presume you are referring to the blue peaks.
The blue area represents the memory used at any given point in time, and the peaks represent points at which the garbage collector runs. As you can see, the line slopes up at an angle to each peak, and then falls vertically. This is normal behavior.
You will also notice that heights of the peaks and troughs are trending upward. This is most likely the effect of your application's in-memory data structure growing.
Is there a way to avoid this effect, by keeping memory almost constant (so that the heap size doesn't increase excessively)?
Basically, no there isn't. If the blue line wasn't jagged, or the peaks were shallower and closer together, that would mean that the GC is running more frequently ... which would be bad for performance.
Basically, if you are building a big data structure in memory, you need enough memory to represent it, PLUS a bunch of extra space for temporary objects and to give the garbage collector room to do what it needs to do.
If your concern is that your application is using too much memory overall, then you need to optimize the in-memory data structure you are building, and check to see you don't have any (other) memory leaks.
If your concern is that you can't predict how big the Java heap needs to be, then consider running the SQL query as a COUNT first, and then start / restart the Java application with a heap size estimate based on the count.
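The estimate itself is simple arithmetic. Here is a sketch, assuming you already have the row count from the COUNT query and a measured average size per row; both inputs and the headroom factor are placeholders you would have to determine for your own data, and HeapEstimate is a made-up name.

```java
// Hypothetical helper: turns a row count into a heap size suggestion.
final class HeapEstimate {

    /**
     * rowCount       result of the COUNT query
     * bytesPerRow    measured average in-memory size of one row (assumption)
     * headroomFactor extra room for temporary objects and the GC, e.g. 1.5 (assumption)
     */
    static long estimateHeapBytes(long rowCount, long bytesPerRow, double headroomFactor) {
        if (rowCount < 0 || bytesPerRow < 0 || headroomFactor < 1.0) {
            throw new IllegalArgumentException("invalid estimate inputs");
        }
        long dataBytes = Math.multiplyExact(rowCount, bytesPerRow);
        return (long) Math.ceil(dataBytes * headroomFactor);
    }

    /** Formats the estimate as an -Xmx option, rounded up to whole megabytes. */
    static String toXmxFlag(long heapBytes) {
        long megabytes = (heapBytes + (1L << 20) - 1) >> 20;
        return "-Xmx" + megabytes + "m";
    }

    public static void main(String[] args) {
        long bytes = estimateHeapBytes(1_000_000L, 200L, 1.5);
        System.out.println(toXmxFlag(bytes));
    }
}
```

A launcher script could run the COUNT, compute the flag and then start the real application with it.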
I am getting:
java.lang.OutOfMemoryError : Java heap space
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2894)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:407)
at java.lang.StringBuilder.append(StringBuilder.java:136)
Ultimately you always have a finite maximum heap to use, no matter what platform you are running on. On 32-bit Windows this is around 2 GB (not specifically heap, but the total amount of memory per process). Java just happens to make the default smaller (presumably so that the programmer can't create programs that have runaway memory allocation without running into this problem and having to examine exactly what they are doing).
Given this, there are several approaches you could take to either determine what amount of memory you need or to reduce the amount of memory you are using. One common mistake in garbage-collected languages such as Java or C# is to keep references to objects that you are no longer using, or to allocate many objects when you could reuse them instead. As long as objects have a reference to them, they will continue to use heap space, as the garbage collector will not delete them.
In this case you can use a Java memory profiler to determine which methods in your program are allocating large numbers of objects, and then determine if there is a way to make sure they are no longer referenced, or to not allocate them in the first place. One option which I have used in the past is "JMP" http://www.khelekore.org/jmp/.
If you determine that you are allocating these objects for a reason and you need to keep around references (depending on what you are doing this might be the case), you will just need to increase the max heap size when you start the program. However, once you do the memory profiling and understand how your objects are getting allocated you should have a better idea about how much memory you need.
In general, if you can't guarantee that your program will run in some finite amount of memory (perhaps depending on input size), you will always run into this problem. Only after exhausting all of this will you need to look into caching objects out to disk etc. At this point you should have a very good reason to say "I need X GB of memory" for something that you can't work around by improving your algorithms or memory allocation patterns. This will usually only be the case for algorithms operating on large datasets (like a database or some scientific analysis program), and then techniques like caching and memory-mapped IO become useful.
The OutOfMemoryError is usually caused by the VM not having enough memory to run your project. Did you run it directly from the command line, or did you use an IDE?
For example, try running your program with the -Xmx1G option, which sets the maximum heap size to 1 GB; you can of course adjust it as needed. The G stands for gigabytes and m for megabytes.
You should give the heap a bigger size for it to work.
I'm currently running my system against a rather large dataset and am getting the error. 'Out of memory. Java Heap Space'.
Is there any way to get around this, or is it just a case of the dataset being too large to be used?
In general, you can either
give it more memory, e.g. increase the maximum heap size, but don't give it more than about 90% of main memory. BTW, the default is 25% of main memory, up to 32 GB.
optimise the code so that it uses less memory, e.g. with the help of a memory profiler. You can use a more efficient data structure or load only portions of the data into memory at a time.
break up the data so it only works on a portion at a time.
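For the last two options, the usual pattern is to stream the input instead of loading it all. A minimal sketch, assuming the dataset is a text file and the work can be done one line at a time (StreamingCount and the search string are made up for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example: processes a large file one line at a time.
final class StreamingCount {

    /** Counts the lines containing needle without holding the whole file in memory. */
    static long countLinesContaining(Path file, String needle) throws IOException {
        long matches = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) {
                    matches++; // only the running total is kept; the line becomes garbage
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLinesContaining(Path.of(args[0]), args[1]));
    }
}
```

Memory use then depends on the longest line and the size of the result, not on the size of the dataset.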
If it's not the dataset that's eating up memory, it could be that you are not freeing up objects once they are inactive.
This is typically due to keeping references to very large objects, or to lots of objects lying around long after they are no longer needed. These are most likely references held in static variables, but they can also be references to large temporary variables (e.g., large StringBuilder objects) within methods that are still active.
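For the StringBuilder case in particular, one possible fix is to write each piece to its destination as soon as it is produced instead of collecting everything in one builder first. A sketch under the assumption that the output can go to a Writer (a file, a socket, etc.); ReportWriter is a made-up name.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical example: streams rows out instead of appending them to one big StringBuilder.
final class ReportWriter {

    /** Writes each row followed by a newline and returns the number of rows written. */
    static int writeRows(Iterable<String> rows, Writer out) throws IOException {
        int written = 0;
        for (String row : rows) {
            out.write(row);  // the row can be collected once it has been written
            out.write('\n');
            written++;
        }
        out.flush();
        return written;
    }
}
```

With a BufferedWriter from Files.newBufferedWriter as the target, only the writer's buffer is held in memory at any time, no matter how many rows pass through.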
I have one probably dumb question. I am currently testing the CSP solvers choco and jacop. When I profile the app (graph colouring, about 3000 nodes), I don't fully understand the results.
The used heap space reported by the profiler is about 1 GB of memory. The sum of all objects created is less than 100 MB. Where are the other 900 MB of RAM?
I think that method calls (the solvers probably use massive backtracking) are allocated on the stack, so that should not be the problem. When I reduce the maximum memory using the -Xmx parameter, the app fails with an exception:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
So it seems that the rest isn't unused, uncollected memory (because in that case the GC would deallocate it and would not fail).
Thanks for your help.
Can you get a map of the heap? Most likely it's fragmented, so those 100 MB of objects are spread out across the entire memory space. The memory needed is a function of both the allocated objects and how fast they're being allocated and then de-referenced. That error means the memory area is too small for the workload, the garbage collector is consuming a lot of CPU managing it, and it went beyond the allowed threshold.
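To check how much time the collector is actually taking, the standard GarbageCollectorMXBean can be polled from inside the application. A rough sketch follows; GcActivity is a made-up name, and the fraction is only an approximation, since collection time is reported per collector and not all of it necessarily pauses the application.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates how much of the JVM's uptime went into GC.
final class GcActivity {

    /** Sums the accumulated collection time of all collectors, in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Approximate share of the uptime spent collecting. */
    static double gcTimeFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalCollectionMillis() / uptime;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms (%.1f%% of uptime)%n",
                totalCollectionMillis(), gcTimeFraction() * 100);
    }
}
```

If the fraction climbs steeply as you lower -Xmx, that matches the explanation above: the heap is too small for the rate at which the solver allocates and drops objects.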
Amir Afghani was probably correct in his comment. The classes (objects) in NetBeans 6.9.1 are probably filtered somehow (or the profiler is bogus), because when I took a heap dump with Java VisualVM and analyzed it, it showed me very different numbers (which in sum matched the used heap).
Thanks for your replies.