I am trying to read a text file and store every line in an ArrayList.
However, the text file is very long (about 2,000,000 lines) and a java.lang.OutOfMemoryError occurs.
How do I know when the ArrayList is full, and how can I automatically create another ArrayList to store the remaining data?
Sorry for my poor English.
Thanks for your help.
2 million lines is nowhere near the maximum size of a Java collection (Integer.MAX_VALUE, roughly 2 billion indexes).
You are more likely hitting a heap space OutOfMemoryError. You can do either of the following:
Increase your JVM maximum heap memory allocation.
java -Xmx4g
4g = 4GB.
The default maximum heap size is half of the physical memory up to a physical memory size of 192 megabytes and otherwise one fourth of the physical memory up to a physical memory size of 1 gigabyte.
http://www.oracle.com/technetwork/java/javase/6u18-142093.html
As konsolas recommends, read the file line by line, write each processed line out to another file, and let the variable go out of scope instead of keeping everything in memory.
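A minimal sketch of that streaming approach (the file names here are only placeholders):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StreamLines {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("input.txt"));
             BufferedWriter writer = Files.newBufferedWriter(Paths.get("output.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // process the line here, then write the result out
                writer.write(line);
                writer.newLine();
            }
        } // only one line is held in memory at a time
    }
}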
Hope it helps!
This depends on what you are planning to do with the file. You're definitely not going to be able to store all of it in memory, as shown by your error.
Depending on what you're trying to do with the file, processing it in blocks and then saving it would be a better idea. For example:
Read the first 1000 lines of the file
Process these lines/save into another file, etc.
Read the next 1000 lines
etc.
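As an illustration, here is a rough sketch of that batch approach (the batch size, the file name and the processBatch method are placeholders):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ChunkedProcessing {
    private static final int BATCH_SIZE = 1000;

    public static void main(String[] args) throws IOException {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("huge.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == BATCH_SIZE) {
                    processBatch(batch);   // e.g. save the results to another file
                    batch.clear();         // drop the processed lines before reading the next block
                }
            }
            if (!batch.isEmpty()) {
                processBatch(batch);       // handle the final partial batch
            }
        }
    }

    private static void processBatch(List<String> lines) {
        // placeholder: do the per-block work here
    }
}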
An ArrayList can theoretically hold 2,147,483,647 items. (max int)
As the other answers suggested, your problem is because you run out of memory before your ArrayList is full. If you still don't have enough memory after increasing the heap space size, BigArrayList will solve your problems. It functions like a normal ArrayList and automatically handles swapping data between disk and memory. Note that the library currently supports a limited number of operations, which may or may not be sufficient for you.
Related
I'm reading a huge volume of data from IO and need to store it as key-value pairs in a Map or a properties file; only then can I use that data to generate reports. But when I store this huge data in a Map or Properties file, a heap memory exception occurs. If I use SQLite instead, retrieving the data takes a very long time. Is there a different way to achieve this? Please suggest.
Java Heap Space Important Points
Java Heap Memory is part of memory allocated to JVM by Operating System.
Whenever we create objects, they are created inside the heap in Java.
Java heap space is divided into regions or generations for the sake of garbage collection, called the New (Young) Generation, the Old (Tenured) Generation, and the Perm Space. The permanent generation is garbage collected during a full GC in the HotSpot JVM.
You can increase or change the size of the Java heap space by using the JVM command line options -Xms, -Xmx and -Xmn. Don't forget to append "m" or "g" to the size to indicate megabytes or gigabytes.
For example, you can set the Java heap size to 256MB by executing the command java -Xmx256m javaClassName (your program's class name).
You can use either JConsole or Runtime.getRuntime().maxMemory(), totalMemory() and freeMemory() to query the heap size programmatically in Java.
You can use the "jmap" command to take a heap dump in Java and "jhat" to analyze that heap dump (see the example commands after this list).
Java heap space is different from the stack, which is used to store the call hierarchy and local variables.
The Java garbage collector is responsible for reclaiming memory from dead objects and returning it to the Java heap space.
Don't panic when you get java.lang.OutOfMemoryError; sometimes it's just a matter of increasing the heap size, but if it's recurrent then look for a memory leak in Java.
Use a profiler and a heap dump analyzer tool to understand Java heap space and how much memory is allocated to each object.
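For example, taking and browsing a heap dump might look like this (the process id 12345 and the file name are placeholders; note that jhat was removed in newer JDKs, where Eclipse MAT or VisualVM can open the same dump instead):

jmap -dump:format=b,file=heapdump.hprof 12345
jhat heapdump.hprof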
Reference link for more details:
https://docs.oracle.com/cd/E19159-01/819-3681/abeii/index.html
https://docs.oracle.com/cd/E40520_01/integrator.311/integratoretl_users/src/ti_troubleshoot_memory_errors.html
You need to do a rough estimate of the memory needed for your map. How many keys and values? How large are the keys and values? For example, if the keys are longs and the values are strings 40 characters long on average, the absolute minimum for 2 billion key-value pairs is (40 + 8) * 2E9 bytes, approximately 100 GB. Of course, the real requirement is larger than this minimum estimate, as much as two times larger depending on the nature of the keys and values.
If the estimated amount of memory is beyond reasonable (100 GB is beyond reasonable unless you have lots of money), you need to figure out a way to partition your processing. Read in a large chunk of data, then run some algorithm on it to reduce it to some small result. Then do the same for all other chunks one by one, making sure not to keep the old chunk around while you process the new one. Finally, look at the results from all chunks and compute the final result. For a better description of this approach, look up "map-reduce".
If the estimated amount of memory is somewhat reasonable (say, 8 GB, and you have a 16 GB machine), use a 64-bit JVM, set the maximum heap memory using the -Xmx switch, and make sure you use the most efficient data structures, such as Trove maps.
Good luck!
Increasing the heap size is one option, but there is an alternative: store the data off-heap by using memory-mapped files in Java. You can refer to this.
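As a rough illustration, a minimal sketch of memory-mapping a file with standard NIO (the file name is a placeholder; note that a single MappedByteBuffer is limited to about 2 GB, so very large files need several mappings):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedFileExample {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get("data.bin"), StandardOpenOption.READ)) {
            // Map the file read-only; its contents live outside the Java heap.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long nonZero = 0;
            while (buffer.hasRemaining()) {
                if (buffer.get() != 0) {   // placeholder for real parsing
                    nonZero++;
                }
            }
            System.out.println("non-zero bytes: " + nonZero);
        }
    }
}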
My problem in short:
I have a machine with 500 GB of RAM and no swap (more than enough): the top command shows 500 GB of free RAM.
I have a 20 GB file containing triplets (stringOfTypeX, stringOfTypeY, double val). The meaning is that for one string of type X, the file has on average 20-30 lines, each containing that string of type X plus one (different) string of type Y and the associated double value.
I want to load the file in an in-memory index HashMap < StringOfTypeX, TreeMap < StringOfTypeY, val > >
I wrote a Java program using BufferedReader.readLine()
in this program, the hashmap is initialized in the constructor using an initCapacity of 2 times the expected number of distinct strings of type X (the expected number of keys)
I ran the program using: java -jar XXX.jar -Xms500G -Xmx500G -XX:-UseGCOverheadLimit
the program seems to process file lines slower and slower: at first it processes 2M lines per minute, but with each chunk of 2M lines it gets slower. After 16M lines it has almost stopped, and eventually it throws a java.lang.OutOfMemoryError (GC overhead limit exceeded)
before it throws that error, the top command shows that it consumes 6% of the 500 GB of RAM (and this value stays constant; the program doesn't consume more RAM than this for the rest of its lifetime)
I've read every internet thread I could find on this. Nothing seems to work. I guess the GC starts doing a lot of work, but I don't understand why, given that I tried to give the hashmap enough RAM up front. Anyway, it seems that the JVM cannot be forced to pre-allocate a big amount of RAM, no matter what command line args I give. If this is true, what is the real use of the -Xmx and -Xms params?
Anyone has any ideas? Many thanks !!
Update:
my JVM is 64-bit
6.1% of the 515 GB of RAM is ~ 32GB. Seems that JVM is not allowing the usage of more than 32 GB. Following this post I tried to disable the use of compressed pointers using the flag -XX:-UseCompressedOops . However, nothing changed. The limit is still 32GB.
no swap is done at any point in time (checked using top)
running with -Xms400G -Xmx400G doesn't solve the issue
It is fairly common to mis-diagnose these sorts of problem.
500 GB should be more than enough, assuming you have more than 500 GB of main memory; swap will not do.
A 20 GB file is likely to have a significant expansion ratio if you store Strings: e.g. a 16-character String will use about 80 bytes of memory, and a Double uses around 24 bytes in a 64-bit JVM, not the 8 bytes you might expect.
HashMap and TreeMap use about 24 extra bytes per entry.
Using readLine() and doubling the capacity is fine. Actually expected-size * 4/3 is enough, though it always uses the next power of 2.
Setting -Xms does preallocate the memory specified (or almost that number; it is often out by 1% for no apparent reason).
2 M lines per minute is pretty slow. It suggests your overhead is already very high. I would be looking for something closer to 1 million lines per second.
16 million entries is nothing compared with the size of your JVM. My guess is you have started to swap and the error you see is because the GC is taking too long, not because the heap is too full.
How much free main memory do you have? e.g. in top what do you see after the application dies.
Problem solved:
java -jar XXX.jar -Xms500G -Xmx500G -XX:-UseGCOverheadLimit is not correct. The JVM options must be specified before -jar, otherwise they are treated as arguments to main. The correct command line is java -Xms500G -Xmx500G -XX:-UseGCOverheadLimit -jar XXX.jar args[0] args[1] ... .
Sorry for this and thanks for your answers!
You say you have 500 GB of RAM. You shouldn't set -Xmx to the full 500 GB, because that is only the heap size; the VM itself has some memory overhead too. So it is not advisable to give all of the memory to the heap.
I would recommend profiling your application, for example with JVisualVM, or taking a heap dump to see what is really in memory. Maybe something is not being cleaned up.
I was looking at the following article:
Increase heap size in Java
Now I have a program that needs about 5GB of memory, and even after doing what the article says (increasing the heap size with -Xmx5g in the arguments field), I am still getting
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
My system is Windows 7 (64 bit) with 8GB RAM. Am I doing something wrong? If yes, how shall I proceed to get 5GB of heap memory or it is just not feasible for my system to handle?
Note: I have to do calculations with a 2D matrix of size 25K*25K with all non-zero values, so I cannot use a sparse matrix either.
OutOfMemoryError is thrown when the JVM does not have enough memory for the objects being allocated. If you defined a heap of 5G, this almost certainly means you have some kind of memory leak. For example, I can write very simple code that will cause an OutOfMemoryError in any environment:
List<String> list = new LinkedList<>();   // requires java.util.List and java.util.LinkedList
while (true) {
    list.add("a");   // strings are added forever and the list is never cleared
}
Run this code and wait a few seconds; an OutOfMemoryError will be thrown. This is because I keep adding strings to the list and never clear it.
I believe that something similar happens with your application.
I understand that it is not as trivial as my example, so you will probably have to use a profiler to debug it and understand the reason for your memory leak.
EDIT:
I've just seen that you are working with a 25K*25K matrix. That means you have 625M cells. You have not mentioned the type of the matrix, but if it is int, which occupies 4 bytes, you need 625M*4 = 2500M = 2.5G of memory, so 5G should be enough.
Please try to analyze what else happens in your program and where your memory is spent.
5G/(25K*25K) ~ 8 bytes.
Generously assuming that your program does not use memory except for that matrix, each matrix element must take no more than 8 bytes.
You should calculate at least approximate memory requirements to check whether it is even possible to handle a problem of this size on your hardware. For example, if you need a 2D array of M x N double values, then you need at least 8*M*N bytes of memory.
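As a quick back-of-the-envelope check for the 25K x 25K matrix from the question (assuming double values; with int it halves to roughly 2.5 GB):

public class MatrixMemoryEstimate {
    public static void main(String[] args) {
        long rows = 25000, cols = 25000;
        long bytes = 8L * rows * cols;   // 8 bytes per double element
        System.out.printf("~%.2f GB just for the raw matrix data%n", bytes / 1e9);
        // prints ~5.00 GB, so a 5 GB heap leaves almost no room for anything else
    }
}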
After searching the web for a while, I decided to ask you for help with my problem.
My program should analyze log files, which are really big, about 100 MB up to 2 GB. I want to read the files using NIO classes like FileChannel.
I don't want to keep the files in memory; I want to process the lines immediately. The code works.
Now my problem: I analyzed the memory usage with the Eclipse MAT plugin and it says about 18 MB of data is retained (that fits). But the Task Manager in Windows says that about 180 MB are used by the JVM.
Can you tell me WHY this is?
I don't want to keep the data read with the FileChannel, I just want to process it. I close the channel afterwards - I thought all that data would be released then?
I hope you can help me understand the difference between the used space shown in MAT and the used space shown in Task Manager.
MAT will only show objects that are actively referenced by your program. The JVM uses more memory than that:
Its own code
Non-object data (classes, compiled bytecode, etc.)
Heap space that is not currently in use, but has already been allocated.
The last case is probably the biggest contributor. Depending on how much physical memory a computer has, the JVM sets a default maximum size for its heap. To improve performance, it will keep using up to that amount of memory with minimal garbage collection activity. That means objects that are no longer referenced remain in memory rather than being garbage collected immediately, thus increasing the total amount of memory used.
As a result, the JVM will generally not release memory it has allocated for its heap back to the system. This shows up as an inordinate amount of used memory in OS monitoring utilities.
Applications with high object allocation/deallocation rates are worse: I have an application that uses 1.8 GB of memory while actually requiring less than 100 MB. Reducing the maximum heap size to 120 MB, though, increases the execution time by almost a full order of magnitude.
I am trying to insert about 50,000 objects (and therefore 50,000 keys) into a java.util.HashMap<java.awt.Point, Segment>. However, I keep getting an OutOfMemory exception. (Segment is my own class, very lightweight: one String field and 3 int fields.)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:508)
at java.util.HashMap.addEntry(HashMap.java:799)
at java.util.HashMap.put(HashMap.java:431)
at bus.tools.UpdateMap.putSegment(UpdateMap.java:168)
This seems quite ridiculous since I see that there is plenty of memory available on the machine - both in free RAM and HD space for virtual memory.
Is it possible Java is running with some stringent memory requirements? Can I increase these?
Is there some weird limitation with HashMap? Am I going to have to implement my own? Are there any other classes worth looking at?
(I am running Java 5 under OS X 10.5 on an Intel machine with 2GB RAM.)
You can increase the maximum heap size by passing -Xmx128m (where 128 is the number of megabytes) to java. I can't remember the default size, but it strikes me that it was something rather small.
You can programmatically check how much memory is available by using the Runtime class.
// Get current size of heap in bytes
long heapSize = Runtime.getRuntime().totalMemory();
// Get maximum size of heap in bytes. The heap cannot grow beyond this size.
// Any attempt will result in an OutOfMemoryError.
long heapMaxSize = Runtime.getRuntime().maxMemory();
// Get amount of free memory within the heap in bytes. This size will increase
// after garbage collection and decrease as new objects are created.
long heapFreeSize = Runtime.getRuntime().freeMemory();
(Example from Java Developers Almanac)
This is also partially addressed in Frequently Asked Questions About the Java HotSpot VM, and in the Java 6 GC Tuning page.
Some people are suggesting changing the parameters of the HashMap to tighten up the memory requirements. I would suggest measuring instead of guessing; it might be something else causing the OOME. In particular, I'd suggest using either the NetBeans Profiler or VisualVM (which comes with Java 6, but I see you're stuck with Java 5).
Another thing to try, if you know the number of objects beforehand, is to use the HashMap(int initialCapacity, float loadFactor) constructor instead of the default no-arg one, which uses defaults of (16, 0.75). If the number of elements in your HashMap exceeds (capacity * loadFactor), then the underlying array in the HashMap will be resized to the next power of 2 and the table will be rehashed. This array also requires a contiguous area of memory, so for example if you're doubling from a 32768- to a 65536-entry array you'll need a 256 kB chunk of free memory. To avoid the extra allocation and rehashing penalties, just use a larger hash table from the start. It will also decrease the chance that you won't have a contiguous area of memory large enough to fit the map.
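For example, a small sketch of presizing for roughly 50,000 entries (the Segment class is stubbed out here to keep the example self-contained):

import java.awt.Point;
import java.util.HashMap;
import java.util.Map;

public class PresizedMapExample {
    // Minimal stand-in for the asker's Segment class (one String field, 3 int fields).
    static class Segment {
        String name;
        int a, b, c;
    }

    public static void main(String[] args) {
        // 50,000 / 0.75 is about 66,667, which HashMap rounds up to the next power of two (131,072),
        // so no resize or rehash happens while inserting the 50,000 entries.
        Map<Point, Segment> map = new HashMap<>(66667, 0.75f);
        for (int i = 0; i < 50000; i++) {
            map.put(new Point(i, i), new Segment());
        }
        System.out.println("entries: " + map.size());
    }
}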
The implementations are backed by arrays usually. Arrays are fixed size blocks of memory. The hashmap implementation starts by storing data in one of these arrays at a given capacity, say 100 objects.
If it fills up the array and you keep adding objects the map needs to secretly increase its array size. Since arrays are fixed, it does this by creating an entirely new array, in memory, along with the current array, that is slightly larger. This is referred to as growing the array. Then all the items from the old array are copied into the new array and the old array is dereferenced with the hope it will be garbage collected and the memory freed at some point.
Usually the code that increases the capacity of the map by copying items into a larger array is the cause of such a problem. There are "dumb" implementations and smart ones that use a growth factor and a load factor to determine the size of the new array based on the size of the old array. Some implementations hide these parameters and some do not, so you cannot always set them. The problem is that when you cannot set them, the implementation picks some default growth factor, like 2, so the new array is twice the size of the old one. Now your supposedly 50k map has a backing array of 100k slots.
See whether you can raise the load factor towards 1.0 instead. A denser table causes more hash map collisions, which hurts performance, but you are hitting a memory bottleneck and need to trade some speed for space.
Use this constructor:
(http://java.sun.com/javase/6/docs/api/java/util/HashMap.html#HashMap(int, float))
You probably need to set the flag -Xmx512m or some larger number when starting java. I think 64mb is the default.
Edited to add:
After you figure out how much memory your objects are actually using with a profiler, you may want to look into weak references or soft references to make sure you're not accidentally holding some of your memory hostage from the garbage collector when you're no longer using them.
Also might want to take a look at this:
http://java.sun.com/docs/hotspot/gc/
Implicit in these answers is that Java has a fixed maximum size for its memory and doesn't grow beyond the configured maximum heap size. This is unlike, say, C, where it's constrained only by the machine on which it's being run.
By default, the JVM uses a limited heap space. The limit is JVM implementation-dependent, and it's not clear which JVM you are using. On OSes other than Windows, a 32-bit Sun JVM on a machine with 2 GB or more will use a default maximum heap size of 1/4 of the physical memory, or 512 MB in your case. However, the default for a "client" mode JVM is only a 64 MB maximum heap size, which may be what you've run into. Other vendors' JVMs may select different defaults.
Of course, you can specify the heap limit explicitly with the -Xmx<NN>m option to java, where <NN> is the number of megabytes for the heap.
As a rough guess, your hash table should only be using about 16 Mb, so there must be some other large objects on the heap. If you could use a Comparable key in a TreeMap, that would save some memory.
See "Ergonomics in the 5.0 JVM" for more details.
The Java heap space is limited by default, but that still sounds extreme (though how big are your 50000 segments?)
I suspect that you have some other problem, like the buckets in the map growing too big because everything gets assigned to the same "slot" (which also affects performance, of course). However, that seems unlikely if your points are uniformly distributed.
I'm wondering, though, why you're using a HashMap rather than a TreeMap? Even though points are two-dimensional, you could compare them with a custom compare function and then do log(n) lookups.
Random thought: The hash buckets associated with HashMap are not particularly memory efficient. You may want to try out TreeMap as an alternative and see if it still provides sufficient performance.
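As a rough sketch of that TreeMap alternative (using Java 8 Comparator helpers, ordering points by x and then y; Segment is stubbed out here, and on older Java you would write the Comparator by hand):

import java.awt.Point;
import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapAlternative {
    // Minimal stand-in for the asker's Segment class.
    static class Segment {
        String name;
        int a, b, c;
    }

    public static void main(String[] args) {
        // Order points by x, then by y, so the TreeMap can keep them in a balanced tree
        // instead of a hash table with per-bucket overhead.
        Comparator<Point> byXThenY = Comparator.comparingInt((Point p) -> p.x)
                                               .thenComparingInt(p -> p.y);
        TreeMap<Point, Segment> map = new TreeMap<>(byXThenY);

        map.put(new Point(3, 4), new Segment());
        Segment s = map.get(new Point(3, 4));   // O(log n) lookup
        System.out.println(s != null);          // prints true
    }
}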