I'm reading a huge volume of data from IO and I need to store it as key-value pairs in a Map or a properties file; only then can I use that data for generating reports. But when I store this huge data in a Map or Properties file, I get a heap memory exception. If I use SQLite instead, it takes a very long time to retrieve the data. Is there a different way to achieve this? Please suggest.
Java Heap Space Important Points
Java heap memory is part of the memory allocated to the JVM by the operating system.
Whenever we create objects, they are created inside the heap in Java.
Java heap space is divided into three regions or generations for the sake of garbage collection, called the new generation, the old (tenured) generation, and the permanent generation (perm space). The permanent generation is garbage collected during a full GC in the HotSpot JVM.
You can change the size of the Java heap space using the JVM command-line options -Xms, -Xmx and -Xmn. Don't forget to append "M" or "G" after the size to indicate megabytes or gigabytes.
For example, you can set the maximum Java heap size to 256 MB by executing the following command: java -Xmx256m javaClassName (your program's class name).
You can use JConsole, or Runtime.maxMemory(), Runtime.totalMemory() and Runtime.freeMemory(), to query the heap size programmatically in Java.
You can use the "jmap" command to take a heap dump in Java and "jhat" to analyze that heap dump.
The Java heap space is different from the stack, which is used to store the call hierarchy and local variables.
The Java garbage collector is responsible for reclaiming memory from dead objects and returning it to the Java heap space.
Don't panic when you get java.lang.OutOfMemoryError; sometimes it's just a matter of increasing the heap size, but if it's recurrent, look for a memory leak in Java.
Use a profiler and a heap dump analyzer tool to understand Java heap space usage and how much memory is allocated to each object.
Reference links for more details:
https://docs.oracle.com/cd/E19159-01/819-3681/abeii/index.html
https://docs.oracle.com/cd/E40520_01/integrator.311/integratoretl_users/src/ti_troubleshoot_memory_errors.html
You need to do a rough estimate of the memory needed for your map. How many keys and values are there? How large are the keys and values? For example, if the keys are longs and the values are strings 40 characters long on average, the absolute minimum for 2 billion key-value pairs is (40 + 8) * 2E9 bytes, approximately 100 GB. Of course, the real requirement is larger than this minimum estimate, as much as two times larger depending on the nature of the keys and values.
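For what it's worth, here is the same back-of-the-envelope arithmetic in code; the counts are the ones from the example above, so adjust them to your data:
// Rough lower bound for a map of long keys to ~40-byte values, 2 billion entries.
// This ignores per-object and per-entry overhead, which is why the real need is larger.
long pairs = 2_000_000_000L;
long bytesPerKey = 8;                      // a long
long bytesPerValue = 40;                   // ~40 characters (Java chars take 2 bytes each before Java 9)
long minimumBytes = pairs * (bytesPerKey + bytesPerValue);
System.out.printf("Absolute minimum: ~%.0f GB%n", minimumBytes / 1e9);   // prints ~96 GB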
If the estimated amount of memory is beyond reasonable (100 GB is beyond reasonable unless you have lots of money), you need to figure out a way to partition your processing. Read in a large chunk of data, then run some algorithm on it to reduce it to a small size. Do the same for all the other chunks one by one, making sure not to keep the old chunk around while you process the new one. Finally, look at the results from all the chunks and compute the final result. For a better description of this approach, look up "map-reduce".
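A minimal sketch of the simplest form of this idea, where each "chunk" is a single line folded into a small running aggregate; the file name and the key,amount CSV layout are made up for illustration:
// Fold each line into a small per-key total instead of keeping every record in memory.
Map<String, Long> totals = new HashMap<>();          // stays small: one entry per distinct key
try (BufferedReader in = Files.newBufferedReader(Paths.get("huge-input.csv"))) {
    String line;
    while ((line = in.readLine()) != null) {
        String[] parts = line.split(",");            // e.g. "key,amount"
        totals.merge(parts[0], Long.parseLong(parts[1]), Long::sum);
    }
}
// totals now holds the reduced result; the raw lines were never all on the heap at once.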
If the estimated amount of memory is somewhat reasonable (say, 8 GB, and you have a 16 GB machine), use a 64-bit JVM, set the maximum heap size with the -Xmx switch, and make sure you use the most efficient data structures, such as Trove maps.
Good luck!
Increasing the heap size is one option, but there is an alternative: store the data off-heap by using memory-mapped files in Java.
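As a minimal sketch (the file name is a placeholder), a read-only mapping keeps the bytes outside the Java heap and lets the operating system page them in on demand:
// Map the file instead of reading it into heap arrays. Note that a single mapping
// is limited to 2 GB, so larger files have to be mapped in slices.
try (RandomAccessFile raf = new RandomAccessFile("data.bin", "r");
     FileChannel channel = raf.getChannel()) {
    MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    while (buf.hasRemaining()) {
        byte b = buf.get();                  // bytes come from the mapped region, not from heap arrays
        // ... process b ...
    }
}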
I need to store in memory a huge amount of data by looping over a result set (Oracle database).
I carried out some tests and, by using a profiler, I noticed that there is a considerable difference between the heap size and the used heap (i.e. my data). Here is an example.
I already saw the available JVM arguments in order to set the right heap size, but the problem is that I don't know in advance how many bytes data will occupy (since the amount of data can vary from one test to another).
By observing the graph in the image, the problem seems to be the memory "peaks" during the execution.
Could these peaks be related to the number of fetched rows (or in general to the extracted data)?
Is there a way to avoid this effect, by keeping memory almost constant (so that the heap size doesn't increase excessively)?
Thanks
By looking at your memory chart, it seems much of the data is of a temporary nature and can be removed from heap at some point. The final ratio of used heap vs. its total size says it all.
It seems like the temporary data's time to live (e.g. buffered data from an Oracle ResultSet) is too high, or the eden space is too small, and thus data is being moved from the eden and/or survivor space to the old generation space, where it is collected once the JVM detects the need to run the GC on the old generation. This possibly happens when you iterate over your ResultSet and the Oracle driver needs to fetch the next chunk of data from the database, which can be fairly large.
At this point I should go a little bit into detail about the Oracle ResultSet buffer. It's basically just a chunk of bytes on the heap. Depending on the column data it is stored as something different than you'd read from the ResultSet. Take a java.sql.Timestamp for instance. Inside the buffer it's stored as an oracle.sql.TIMESTAMP or even just plain bytes. This means that whenever you extract a java.sql.Timestamp from a ResultSet there's the need for another object to be allocated. And this object is most likely the "final" object you want to keep eventually.
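If the driver's row buffer turns out to be the culprit, one standard JDBC knob worth experimenting with is the fetch size, which controls how many rows the driver buffers per round trip; the query, the open connection and the process() consumer below are only placeholders:
// A smaller fetch size means a smaller driver-side row buffer on the heap,
// at the price of more database round trips (Oracle's driver defaults to 10 rows).
try (PreparedStatement ps = connection.prepareStatement("SELECT id, created_at FROM big_table")) {
    ps.setFetchSize(50);                                  // tune: larger values buffer more rows per trip
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            process(rs.getLong(1), rs.getTimestamp(2));   // handle each row, keep nothing extra
        }
    }
}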
I suggest tuning the JVM's GC to your needs. Maybe you can figure out which data is being constantly collected. Try adjusting the eden size so the JVM doesn't need to promote too much to the old generation. You can also adjust how much new space the JVM allocates on demand and how it shrinks it when it detects a gap between usage and allocated size.
You can find a list of JVM options here.
Of course you can limit the memory, but there is not much benefit in doing this. If you do so, garbage collection will have to happen more often, which will result in a slower execution of your program.
This is simply the way garbage collection works in Java. If you have enough memory GC will not be called. That gives your application more resources (CPU time).
Also, to optimize memory consumption, you should check your algorithms and see if you can reuse some objects instead of creating new ones, because new objects are exactly what makes the blue line go up. See the flyweight and other similar patterns, which are used to control memory consumption.
Could these peaks be related to the number of fetched rows (or in general to the extracted data)?
I presume you are referring to the blue peaks.
The blue area represents the memory used at any given point in time, and the peaks represent points at which the garbage collector runs. As you can see, the line slopes up at an angle to each peak, and then falls vertically. This is normal behavior.
You will also notice that heights of the peaks and troughs are trending upward. This is most likely the effect of your application's in-memory data structure growing.
Is there a way to avoid this effect, by keeping memory almost constant (so that the heap size doesn't increase excessively)?
Basically, no there isn't. If the blue line wasn't jagged, or the peaks were shallower and closer together, that would mean that the GC is running more frequently ... which would be bad for performance.
Basically, if you are building a big data structure in memory, you need enough memory to represent it, PLUS a bunch of extra space for temporary objects and to give the garbage collector room to do what it needs to do.
If your concern is that your application is using too much memory overall, then you need to optimize the in-memory data structure you are building, and check to see you don't have any (other) memory leaks.
If your concern is that you can't predict how big the Java heap needs to be, then consider running the SQL query as a COUNT first, and then start / restart the Java application with a heap size estimate based on the count.
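A rough sketch of that two-step approach: run the COUNT, turn it into a heap estimate from an application-specific per-row guess, and start the worker JVM with a matching -Xmx. The table name, the 200-bytes-per-row figure and the ReportWorker class are all made up:
// Step 1: ask the database how many rows the real run will have to hold.
long rows;
try (Statement st = connection.createStatement();
     ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM big_table")) {
    rs.next();
    rows = rs.getLong(1);
}
// Step 2: estimate the heap (with headroom for temporaries and GC) and launch the worker.
long heapMb = Math.max(256, rows * 200 * 2 / (1024 * 1024));   // ~200 bytes per row, 2x headroom
new ProcessBuilder("java", "-Xmx" + heapMb + "m", "com.example.ReportWorker")
        .inheritIO()
        .start();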
I am getting:
java.lang.OutOfMemoryError : Java heap space
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2894)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:407)
at java.lang.StringBuilder.append(StringBuilder.java:136)
Ultimately you always have a finite maximum amount of heap to use, no matter what platform you are running on. On 32-bit Windows this is around 2 GB (not specifically heap, but the total amount of memory per process). It just happens that Java makes the default smaller (presumably so that the programmer can't create programs with runaway memory allocation without running into this problem and having to examine exactly what they are doing).
Given that, there are several approaches you can take to either determine how much memory you need or to reduce the amount of memory you are using. One common mistake with garbage-collected languages such as Java or C# is to keep references to objects that you no longer use, or to allocate many objects when you could reuse them instead. As long as objects have a reference to them they will continue to use heap space, because the garbage collector will not delete them.
In this case you can use a Java memory profiler to determine what methods in your program are allocating large number of objects and then determine if there is a way to make sure they are no longer referenced, or to not allocate them in the first place. One option which I have used in the past is "JMP" http://www.khelekore.org/jmp/.
If you determine that you are allocating these objects for a reason and you need to keep around references (depending on what you are doing this might be the case), you will just need to increase the max heap size when you start the program. However, once you do the memory profiling and understand how your objects are getting allocated you should have a better idea about how much memory you need.
In general if you can't guarantee that your program will run in some finite amount of memory (perhaps depending on input size) you will always run into this problem. Only after exhausting all of this will you need to look into caching objects out to disk etc. At this point you should have a very good reason to say "I need Xgb of memory" for something and you can't work around it by improving your algorithms or memory allocation patterns. Generally this will only usually be the case for algorithms operating on large datasets (like a database or some scientific analysis program) and then techniques like caching and memory mapped IO become useful.
The OutOfMemoryError is usually caused by the VM not having enough memory to run your project. Did you run it directly from the command line or did you use an IDE?
For example, try running your program with the -Xmx1G option, which allocates 1 GB of heap memory to your program; you can of course adjust it to your needs. The G is for gigabytes and the m is for megabytes.
You should give the heap a bigger size for it to work.
After searching the web for a while, I decided to ask for help with my problem.
My program should analyze log files, which are really big - they range from about 100 MB up to 2 GB. I want to read the files using NIO classes like FileChannel.
I don't want to keep the files in memory; I want to process the lines immediately. The code works.
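For reference, here is a minimal sketch of the kind of read loop I mean; the file name and the handle() step are just placeholders, and each line becomes garbage as soon as it has been handled:
// Read through a FileChannel but process lines one at a time instead of storing them.
try (FileChannel channel = FileChannel.open(Paths.get("server.log"), StandardOpenOption.READ);
     BufferedReader reader = new BufferedReader(Channels.newReader(channel, "UTF-8"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        handle(line);                        // placeholder for the per-line analysis
    }
}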
Now my problem: I analyzed the memory usage with the Eclipse MAT plugin and it says about 18 MB of data is held (that fits). But the Task Manager in Windows says that about 180 MB are used by the JVM.
Can you tell me WHY this is?
I don't want to keep the data read with the FileChannel, I just want to process it. I am closing the channel afterwards - I thought all of that data would be freed then?
I hope you can help me understand the difference between the used space shown in MAT and the used space shown in Task Manager.
MAT will only show objects that are actively referenced by your program. The JVM uses more memory than that:
Its own code
Non-object data (classes, compiled bytecode, etc.)
Heap space that is not currently in use, but has already been allocated.
The last case is probably the biggest contributor. Depending on how much physical memory the computer has, the JVM sets a default maximum size for its heap. To improve performance, it will keep using up to that amount of memory with minimal garbage collection activity. That means objects that are no longer referenced remain in memory rather than being garbage collected immediately, increasing the total amount of memory used.
As a result, the JVM will generally not free any memory it has allocated as part of its heap back to the system. This will show-up as an inordinate amount of used memory in the OS monitoring utilities.
Applications with high object allocation/de-allocation rates will be worse - I have an application that uses 1.8GB of memory, while actually requiring less than 100MB. Reducing the maximum heap size to 120 MB, though, increases the execution time by almost a full order of magnitude.
I have an application that, basically, creates a new byte array (less than 1 KB), stores some data in it for a few seconds (generally less than 1 minute, but some data is kept for up to 1 hour), writes it to disk, and then the data becomes garbage. Approximately 400 packets per second are created. I have read some articles saying not to worry about the GC, especially for quickly created and released memory (on Java 6).
However, long GC runs are causing problems in my application.
I set some GC parameters (a bigger -Xmx and the parallel GC); this decreased the Full GC time, but not enough yet. I have two ideas:
Should I focus on GC parameters or create a byte-array memory pool mechanism? Which one is better?
The frequency of GCs depends on the object size, but the cost (the clean-up time) depends more on the number of objects. I suspect the longer-lived arrays are being copied between the spaces until they end up in the old generation and are finally discarded there. Cleaning the old gen is relatively expensive.
I suggest you try using a ByteBuffer to store the data. These are like byte[] but have a movable position and limit, and direct byte buffers used with NIO can be slightly more efficient. Pre-allocating your buffers can also help (though it can waste virtual memory).
BTW: direct byte buffers use little heap space, as their memory lives in native ("C") space.
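A small sketch of the reuse idea for the packet case above; the loop condition, fillPacket() and the target channel are hypothetical:
// One direct buffer, allocated once and reused: its storage is off-heap, and no new
// byte[] is created per packet, so there is nothing for the GC to copy between generations.
ByteBuffer packet = ByteBuffer.allocateDirect(1024);
while (hasMoreData()) {                      // hypothetical loop condition
    packet.clear();
    fillPacket(packet);                      // hypothetical: put up to 1 KB of packet data
    packet.flip();
    fileChannel.write(packet);               // write straight from the off-heap buffer
}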
I suggest you do some analysis into why GC is not working well enough for you. You can use jmap to dump out the heap and then use jhat or Eclipse Memory Analyser to see what objects are living in it. You might find that you are holding on to references that you no longer need.
The GC is very clever and you could actually make things worse by trying to outsmart it with your own memory management code. Try tuning the parameters and maybe you can try out the new G1 Garbage Collector too.
Also, remember, that GC loves short-lived, immutable objects.
Use a profiler to identify the code snippet.
Try using WeakReferences.
Suggest a GC algorithm to the VM:
-Xgc:parallel
Set a big heap and shared memory:
-XX:+UseISM -XX:+AggressiveHeap
Set the following for garbage collection:
-XX:SurvivorRatio=8
This may help
http://download.oracle.com/docs/cd/E12840_01/wls/docs103/perform/JVMTuning.html#wp1130305
I am trying to insert about 50,000 objects (and therefore 50,000 keys) into a java.util.HashMap<java.awt.Point, Segment>. However, I keep getting an OutOfMemoryError. (Segment is my own class - very lightweight - one String field and 3 int fields.)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:508)
at java.util.HashMap.addEntry(HashMap.java:799)
at java.util.HashMap.put(HashMap.java:431)
at bus.tools.UpdateMap.putSegment(UpdateMap.java:168)
This seems quite ridiculous since I see that there is plenty of memory available on the machine - both in free RAM and HD space for virtual memory.
Is it possible Java is running with some stringent memory requirements? Can I increase these?
Is there some weird limitation with HashMap? Am I going to have to implement my own? Are there any other classes worth looking at?
(I am running Java 5 under OS X 10.5 on an Intel machine with 2GB RAM.)
You can increase the maximum heap size by passing -Xmx128m (where 128 is the number of megabytes) to java. I can't remember the default size, but it strikes me that it was something rather small.
You can programmatically check how much memory is available by using the Runtime class.
// Get current size of heap in bytes
long heapSize = Runtime.getRuntime().totalMemory();
// Get maximum size of heap in bytes. The heap cannot grow beyond this size.
// Any attempt to grow beyond it will result in an OutOfMemoryError.
long heapMaxSize = Runtime.getRuntime().maxMemory();
// Get amount of free memory within the heap in bytes. This size will increase
// after garbage collection and decrease as new objects are created.
long heapFreeSize = Runtime.getRuntime().freeMemory();
(Example from Java Developers Almanac)
This is also partially addressed in Frequently Asked Questions About the Java HotSpot VM, and in the Java 6 GC Tuning page.
Some people are suggesting changing the parameters of the HashMap to tighten up the memory requirements. I would suggest measuring instead of guessing; it might be something else causing the OOME. In particular, I'd suggest using either the NetBeans Profiler or VisualVM (which comes with Java 6, but I see you're stuck with Java 5).
Another thing to try if you know the number of objects beforehand is to use the HashMap(int capacity, float loadFactor) constructor instead of the default no-arg one, which uses defaults of (16, 0.75). If the number of elements in your HashMap exceeds (capacity * loadFactor), then the underlying array in the HashMap will be resized to the next power of 2 and the table will be rehashed. That array also requires a contiguous area of memory, so, for example, if you're doubling from a 32768- to a 65536-entry array you'll need a 256 kB chunk of memory free. To avoid the extra allocation and rehashing penalties, just use a larger hash table from the start. It will also decrease the chance that you won't have a contiguous area of memory large enough to fit the map.
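For the 50,000-entry map in the question, for example, something like this sizes the table once up front (Segment is the asker's own class; 0.75f is just the default load factor written out):
// Pick a capacity so that 50,000 entries never exceed capacity * loadFactor; HashMap rounds
// the capacity up to a power of two (131,072 here), and the table is never rehashed.
int expectedEntries = 50000;
float loadFactor = 0.75f;
int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);   // 66,667
Map<Point, Segment> segments = new HashMap<Point, Segment>(initialCapacity, loadFactor);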
Map implementations are usually backed by arrays. Arrays are fixed-size blocks of memory. The HashMap implementation starts by storing data in one of these arrays at a given capacity, say 100 objects.
If the array fills up and you keep adding objects, the map needs to secretly increase its array size. Since arrays are fixed size, it does this by creating an entirely new, slightly larger array in memory alongside the current one. This is referred to as growing the array. All the items from the old array are then copied into the new array, and the old array is dereferenced in the hope that it will be garbage collected and the memory freed at some point.
Usually the code that increases the capacity of the map by copying items into a larger array is the cause of such a problem. There are "dumb" implementations and smarter ones that use a growth factor or load factor to determine the size of the new array based on the size of the old one. Some implementations hide these parameters and some do not, so you cannot always set them. The problem is that when you cannot set them, the implementation chooses some default growth factor, like 2, so the new array is twice the size of the old. Now your supposedly 50k map has a backing array of 100k.
Look to see if you can raise the load factor (to 1.0 or higher). This causes more hash map collisions, which hurts performance, but you are hitting a memory bottleneck and need to make that trade.
Use this constructor:
(http://java.sun.com/javase/6/docs/api/java/util/HashMap.html#HashMap(int, float))
You probably need to set the flag -Xmx512m or some larger number when starting java. I think 64mb is the default.
Edited to add:
After you figure out how much memory your objects are actually using with a profiler, you may want to look into weak references or soft references to make sure you're not accidentally holding some of your memory hostage from the garbage collector when you're no longer using them.
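A tiny illustration of the soft-reference idea (the loadLargeBlob() step is hypothetical): the GC is allowed to clear the value under memory pressure, and soft references are guaranteed to be cleared before an OutOfMemoryError is thrown:
// Cache a large, recomputable value behind a java.lang.ref.SoftReference so the GC
// can reclaim it when the heap gets tight instead of the JVM running out of memory.
SoftReference<byte[]> cache = new SoftReference<byte[]>(loadLargeBlob());
byte[] blob = cache.get();                   // null if the GC has already cleared it
if (blob == null) {
    blob = loadLargeBlob();                  // reload or recompute on demand
    cache = new SoftReference<byte[]>(blob);
}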
Also might want to take a look at this:
http://java.sun.com/docs/hotspot/gc/
Implicit in these answers is that Java has a fixed maximum size for its memory and doesn't grow beyond the configured maximum heap size. This is unlike, say, C, where it's constrained only by the machine on which it's being run.
By default, the JVM uses a limited heap space. The limit is JVM implementation-dependent, and it's not clear which JVM you are using. On OSes other than Windows, a 32-bit Sun JVM on a machine with 2 GB or more will use a default maximum heap size of 1/4 of the physical memory, or 512 MB in your case. However, the default for a "client" mode JVM is only a 64 MB maximum heap size, which may be what you've run into. Other vendors' JVMs may select different defaults.
Of course, you can specify the heap limit explicitly with the -Xmx<NN>m option to java, where <NN> is the number of megabytes for the heap.
As a rough guess, your hash table should only be using about 16 Mb, so there must be some other large objects on the heap. If you could use a Comparable key in a TreeMap, that would save some memory.
See "Ergonomics in the 5.0 JVM" for more details.
The Java heap space is limited by default, but that still sounds extreme (though how big are your 50000 segments?)
I suspect that you have some other problem, like the arrays in the map growing too big because everything gets assigned to the same "slot" (which also affects performance, of course). However, that seems unlikely if your points are uniformly distributed.
I'm wondering, though, why you're using a HashMap rather than a TreeMap. Even though points are two-dimensional, you could subclass them with a compare function and then do log(n) lookups.
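Since java.awt.Point itself isn't Comparable, one way to follow that suggestion without subclassing is a TreeMap with an x-then-y comparator; this is only a sketch, and whether it actually saves memory here is worth measuring:
// Order points by x, then by y, so they can serve as TreeMap keys with log(n) lookups.
Comparator<Point> byXThenY = new Comparator<Point>() {
    public int compare(Point a, Point b) {
        if (a.x != b.x) return a.x < b.x ? -1 : 1;
        if (a.y != b.y) return a.y < b.y ? -1 : 1;
        return 0;
    }
};
Map<Point, Segment> segments = new TreeMap<Point, Segment>(byXThenY);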
Random thought: the hash buckets associated with HashMap are not particularly memory efficient. You may want to try out TreeMap as an alternative and see if it still provides sufficient performance.