How does memory allocation of an ArrayList work? - java

As far as I know, when we are creating an ArrayList:
ArrayList<String> list = new ArrayList<String>(SIZE);
The JVM reserves a contiguous block of memory for it. As we add new elements to the list and the number of elements reaches 75% of SIZE, it reserves a new, contiguous block of memory and copies all of the elements over.
Our list is getting bigger and bigger. We are adding new objects and the list has to be rebuilt once again.
What happens now?
The JVM looks for a contiguous segment of memory but does not find enough space.
The Garbage Collector can try to remove some unused references and defragment memory. What happens if the JVM is still not able to reserve space for the new instance of the list after this process?
Does it create a new one using the largest available segment? Which exception will be thrown?
I read this question Java: How ArrayList manages memory and one of the answers is:
A reference doesn't consume much space, but some space is used nonetheless. When the array gets bigger, it can become a problem. We also cannot forget that there are other things which use memory space.

If the JVM is not able to allocate the requested amount of memory, it'll throw an
OutOfMemoryError
That's it. JVM memory allocation actually has only two possible outcomes:
The application is given the requested amount of memory.
The JVM throws an OutOfMemoryError.
There is no intermediate option, like only some of the memory being allocated.
It has nothing to do with ArrayList; it's a JVM matter. If you are asking whether ArrayList somehow manages this situation in a special way, then the answer is "No, it does not." It just tries to allocate the amount of memory it needs and lets the JVM worry about the rest.
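A minimal sketch of this all-or-nothing behaviour (the array size is just an illustration; pick one larger than your heap):
public class AllocateOrDie {
    public static void main(String[] args) {
        try {
            // request roughly 4 GB as one contiguous allocation
            int[] big = new int[Integer.MAX_VALUE / 2];
            System.out.println("allocated " + big.length + " ints");
        } catch (OutOfMemoryError e) {
            // the JVM could not satisfy the request; nothing was partially allocated
            System.out.println("allocation failed: " + e);
        }
    }
}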

In Java, the references to the objects are stored in contiguous memory. The actual objects can live in non-contiguous locations. So, for example, if your array holds 10 objects, the JVM only needs to reserve contiguous memory for the object references, not for the objects themselves. If each reference takes a byte (an approximation, not the real value) but each object takes up a KB, and you have an array of 10 elements, the JVM will try to reserve a contiguous block of only 1 B * 10, i.e. 10 B. The objects can reside in 10 different memory locations totalling 10 KB. Remember that both the contiguous and non-contiguous memory spaces come out of the memory allocated to the JVM.
When it needs to resize the array, the JVM tries to find a contiguous block of the new length. So if you want to resize the array from 10 to 20 elements, it will try to reserve a contiguous space of 20 B (using the example above). If it finds this space, it copies the references from the old array to the new array. If it does not find this space, it tries to run a GC. If it still does not find the space, it throws an OutOfMemoryError.
Therefore, any time you resize the array, the JVM needs to find contiguous memory to store the references of the newly sized array. So if you want to extend the array to a size of, say, 1000 elements, with one byte per reference the JVM will try to find a contiguous block of 1000 * 1 B, which is about 1 KB.
If it finds this memory, it copies the references over and marks the older contiguous block for GC, to be reclaimed whenever GC runs next.
If it is not able to find the memory, it tries to run a GC, and if it still does not find a contiguous block, it throws an OutOfMemoryError.
This is the code in ArrayList which does the resizing.
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/ArrayList.java#ArrayList.ensureCapacity%28int%29
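In outline, the JDK 6 code behind that link grows the backing array like this (paraphrased, not the verbatim source):
public void ensureCapacity(int minCapacity) {
    modCount++;
    int oldCapacity = elementData.length;
    if (minCapacity > oldCapacity) {
        // grow by roughly 1.5x, but at least to the requested capacity
        int newCapacity = (oldCapacity * 3) / 2 + 1;
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;
        // this allocation is where an OutOfMemoryError would surface
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
}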

This will throw an OutOfMemoryError as soon as there is not enough heap space to allocate the new array.
Garbage collection will always be done before this error is thrown. This will compact memory and eliminate all the arrays of smaller sizes that are no longer used. But there is no way to get around the fact that the old array, the new array, and all the contained objects need to all be in memory at once in order for the old contents to be copied into the new list.
So, if your memory limit is 10 MB, and the array takes up 2 MB and is being sized up to 3 MB, and the strings take up 6 MB, then OOM will be thrown even though after this operation you will only have 3 + 6 = 9 MB in memory. One way to avoid this, if you want to run really close to memory limits with a huge array, is to size the array to the full size to begin with so that it never needs to resize.
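A quick sketch of that pre-sizing approach (the capacity value is illustrative; use your expected final size):
// allocating the full capacity up front means the backing array is created once
// and never has to coexist with a larger copy of itself during a resize
List<String> list = new ArrayList<String>(10_000_000);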

I assume it will run out of memory, since there will be no space the JVM can use to extend the array.

The very first thing I want to correct is: when we are adding new elements into our list and the number of elements reaches 100% of SIZE, it reserves a new, contiguous part of memory and copies all of the elements.
New Size of ArrayList will be:
NewSize of ArrayList = (CurrentSize * 3/2) + 1
But going this way is never recommended; if we have an idea of how many objects need to be stored, then we can use the following constructor of ArrayList:
ArrayList<String> ar = new ArrayList<String>(initialCapacity);
If our JVM cannot reserve enough contiguous space on the heap for the ArrayList's backing array, at runtime we'll get:
Runtime error: OutOfMemoryError

Related

Java memory optimized [Key:Long, Value:Long] store of very large size (500M) for concurrent read-access

I have a use-case where I need to store key-value pairs of approx. 500 million entries in a single VM of size 8 GB. Key and value are of type Long. The key is auto-incremented, starting from 1, 2, 3, and so on.
I build this Map[K-V] structure only once, at the start of the program, as an exclusive operation. Once it is built, it is used only for lookup; no update or delete is performed on this structure.
I have tried this with java.util.HashMap, but as expected it consumes a lot of memory and the program gives an OOM "Heap usage exceeds" error.
I need some guidance on the following, which would help in reducing the memory footprint; I am OK with some degradation in access performance.
What other alternatives (from the Java collections or other libraries) can be tried here?
What is a recommended way to measure the memory footprint of this Map, for comparison purposes?
Just use a long[] or long[][].
500 million ascending keys is less than 2^31. And if you go over 2^31, use a long[][] where the first dimension is small and the second one is large.
(When the key type is an integer, you only need a complicated "map" data structure if the key space is sparse.)
The space wastage in a 1D array is insignificant. Every Java array has a 12-byte header, and the object size is rounded up to a multiple of 8 bytes. So a 500-million-entry long[] will take so close to 500 million x 8 bytes == 4 billion bytes that the overhead doesn't matter.
However, a JVM typically cannot allocate a single object that takes up the entire available heap space. If virtual address space is at a premium, it would be advisable to use a 2-D array; e.g. new long[4][125_000_000]. This makes the lookups slightly more complicated, but you will most likely reduce the memory footprint by doing this.
If you don't know beforehand the number of keys to expect, you could do the same thing with a combination of arrays and ArrayList objects. But an ArrayList has the problem that if you don't set an (accurate) capacity, the memory utilization is liable to be suboptimal. And if you populate an ArrayList by appending to it, the instantaneous memory demand for the append can be as much as 3 times the list's current space usage.
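A minimal sketch of the chunked 2-D approach (the class name and chunk size are illustrative, not from the question):
// dense Long -> Long lookup backed by a chunked long[][], so no single
// allocation has to span the whole heap
public final class DenseLongStore {
    private static final int CHUNK = 125_000_000; // illustrative chunk size

    private final long[][] chunks;

    public DenseLongStore(long capacity) {
        int n = (int) ((capacity + CHUNK - 1) / CHUNK);
        chunks = new long[n][];
        long remaining = capacity;
        for (int i = 0; i < n; i++) {
            chunks[i] = new long[(int) Math.min(CHUNK, remaining)];
            remaining -= chunks[i].length;
        }
    }

    // keys start at 1 in the question, so subtract 1 to get the index
    public void put(long key, long value) {
        long idx = key - 1;
        chunks[(int) (idx / CHUNK)][(int) (idx % CHUNK)] = value;
    }

    public long get(long key) {
        long idx = key - 1;
        return chunks[(int) (idx / CHUNK)][(int) (idx % CHUNK)];
    }
}
For 500 million entries this costs about 4 GB of arrays and nothing per entry beyond the 8 bytes of payload.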
There is no reason for using a Map in your case.
If you just have a start index and further indices are just constant increments, just use a List:
List<Long> data = new ArrayList<>(510_000_000); // capacity should ideally never be reached; if it is, the array behind the ArrayList is reallocated and the allocated memory roughly doubles
data.add(1337L); // insert as often as you want
long value = data.get(1 - 1); // your indices start at 1, so subtract one when reading
If you don't even add more elements and know the size from the start, an array will be even better:
long[] data = new long[510_000_000]; // capacity should surely not be reached; otherwise you would need to create a new array and copy all the data
int currentIndex = 0;
data[currentIndex++] = 1337L; // insert, as long as currentIndex is smaller than the array length
long value = data[1 - 1]; // your indices start at 1, so subtract one when reading
Note that you should check the index (currentIndex) before inserting so that it is smaller than the array length.
When iterating, use currentIndex as the length instead of .length.
Create an array with the size you need and, whenever you need to access it, use arr[i-1] (-1 because your indices start at 1 instead of zero).
If you "just" have 500 million entries, you will not reach the integer limit and a simple array will be fine.
If you need more entries and have sufficient memory, use an array of arrays.
The memory footprint of using an array this big is the memory footprint of the data and a bit more.
However, if you don't know the size, you should use a higher length/capacity than you may need. If you use an ArrayList, the memory footprint is doubled (temporarily tripled) whenever the capacity is reached, because it needs to allocate a bigger array.
A Map would need an object for each entry, plus an array of buckets for all those objects, which would greatly increase the memory footprint. The growth of the memory footprint (using a HashMap) is even worse than with ArrayLists, as the underlying array is reallocated even before the Map is completely filled up.
But consider saving the data to the HDD/SSD if you need to store that much. In most cases, this works much better. You can use RandomAccessFile in order to access the data on the HDD/SSD at any point.
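A minimal sketch of that file-backed idea (the class name and layout are illustrative): each value lives at byte offset (key - 1) * 8, since a long is 8 bytes.
import java.io.IOException;
import java.io.RandomAccessFile;

public final class FileLongStore implements AutoCloseable {
    private final RandomAccessFile file;

    public FileLongStore(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    public void put(long key, long value) throws IOException {
        file.seek((key - 1) * 8); // keys start at 1
        file.writeLong(value);
    }

    public long get(long key) throws IOException {
        file.seek((key - 1) * 8);
        return file.readLong();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}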

Will ThreadMXBean#getThreadAllocatedBytes return size of allocated memory or objects?

I want to employ com.sun.management.ThreadMXBean to do something like this:
long before = threadMxBean.getThreadAllocatedBytes(currentThreadId);
seriousBusiness(); // some calls here
long after = threadMxBean.getThreadAllocatedBytes(currentThreadId);
long allocDiff = after - before; // use for stats or whatever
The question is, what does this method actually return: amount of new memory allocated at the moment of method invocation or the size of allocated objects? To be clear on what difference I mean:
1) Say I allocated a huge array in my seriousBusiness() call, so a new memory region was allocated for this purpose and getThreadAllocatedBytes gets incremented by the corresponding value.
2) Some time passed, there was a GC run, that unused array was collected and the memory region is free now.
3) I do some calls again (in the same thread), JVM sees that it doesn't need to allocate new memory and reuses that memory region for the new purpose, which results in getThreadAllocatedBytes value not growing.
I might not be precise on how JVM memory management works, but the question should be clear as is.
Also, if the first assumption is correct (just new memory allocations count), what would be the right way to do per-thread object allocations / memory footprint measurements?
UPD. I tried to check for myself: http://pastebin.com/ECQpz8g4 (the sleeps in the code are there to let me connect to the JVM with JMC).
TL;DR: allocate a huge int array, then GC it, then allocate some new objects and check the allocated bytes.
It appears that GC actually ran, and while memory was certainly allocated and subsequently freed, I got this output:
665328 // before everything started
4295684088 // 4 GiB allocated
4295684296 // did GC (for certain! (really?))
5812441672 // allocated a long string, took new memory
So, I'm just waiting for someone with expertise on JVM memory to tell me if I'm right or where am I wrong.
Referencing the ThreadMXBean documentation, it reports the number of bytes allocated on the heap on behalf of the target thread, and it is implied that this is equivalent to the size of the objects allocated. There is a caveat though:
The returned value is an approximation because some Java virtual machine implementations may use object allocation mechanisms that result in a delay between the time an object is allocated and the time its size is recorded.
Therefore, I would assume that reclamation of heap space has no effect on the reported values, since it is only reporting the absolute number of bytes allocated, so if you allocate 100 bytes, then 80 bytes are reclaimed, and then you allocate another 100 bytes, the reported (delta) value at the conclusion of these events would be 200 bytes, although the net allocation was only 120.
ThreadMXBean.getThreadAllocatedBytes returns the cumulative amount of heap memory allocated by the given thread from the beginning, i.e. this is a monotonically incrementing counter. It is roughly the total size of allocated objects.
EDIT
There is no 'thread ownership' of the allocated heap memory in HotSpot JVM. Once the memory is allocated it is shared among all threads. So, 'per-thread usage' does not really make sense; when JVM allocates an object in heap, it does not know whether this memory has been used before and by whom.
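A runnable sketch of the counter's monotonic behaviour (HotSpot-specific; the cast to com.sun.management.ThreadMXBean is what exposes getThreadAllocatedBytes):
import java.lang.management.ManagementFactory;

public class AllocCounterDemo {
    public static void main(String[] args) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();

        long before = bean.getThreadAllocatedBytes(tid);
        byte[] garbage = new byte[10_000_000]; // ~10 MB, immediately dropped
        garbage = null;
        System.gc(); // freeing the array does not decrease the counter

        long after = bean.getThreadAllocatedBytes(tid);
        System.out.println("allocated: " + (after - before) + " bytes");
    }
}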

In Java, how to allocate given amount of memory, and hold it until program exit?

I am doing some experiments on memory. The first problem I met is how to allocate a given amount of memory at runtime, say 500 MB. I need the program's process to hold it until the program exits.
I guess there may be several ways to achieve this? I prefer a simple but practical one.
Well, Java hides memory management from you, so there are two answers to your question:
Create the data structures of this size that you are going to need, and hold a reference to them in some thread until the program exits, because once there is no reference to data on the heap from an active thread, it becomes garbage-collectable. On a 32-bit system, 500 MB is roughly enough for an int array of 125 million cells, or 125 int arrays of a million cells each.
If you just want the memory allocated and available, but not filled up, then start the virtual machine with -Xms512m. This makes the VM allocate 512 MB of memory for your program on startup, but it stays empty (just allocated) until you need it (see point 1). -Xmx sets the maximum memory your program may allocate.
public static void main( String[] args ) {
    final byte[] x = new byte[500*1024];        // 500 KB
    final byte[] y = new byte[500*1024*1024];   // 500 MB
    // ... (rest of the program, with x and y still referenced)
    System.out.println( x.length + y.length );
}
jmalloc lets you do it, but I wouldn't recommend it unless you're truly an expert. You're giving up something that's central to Java - garbage collection. You might as well be writing C.
Java NIO allocates byte buffers off heap this way. I think this is where Oracle is going for memory mapping JARs and getting rid of perm gen, too.
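A minimal sketch of that NIO route (the class name is illustrative): ByteBuffer.allocateDirect reserves memory outside the Java heap and holds it as long as the buffer is referenced.
import java.nio.ByteBuffer;

public class DirectAllocDemo {
    public static void main(String[] args) throws InterruptedException {
        // 500 MB off-heap; released only when the buffer is garbage collected
        ByteBuffer buffer = ByteBuffer.allocateDirect(500 * 1024 * 1024);
        System.out.println("capacity: " + buffer.capacity());
        Thread.currentThread().join(); // block forever, holding the reference
    }
}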

Java throwing out of memory exception before it's really out of memory?

I wish to make a large int array that very nearly fills all of the memory available to the JVM. Take this code, for instance:
final int numBuffers = (int) ((runtime.freeMemory() - 200000L) / (BUFFER_SIZE));
System.out.println(runtime.freeMemory());
System.out.println(numBuffers*(BUFFER_SIZE/4)*4);
buffers = new int[numBuffers*(BUFFER_SIZE / 4)];
When run with a heap size of 10M, this throws an OutOfMemoryError, despite the output from the printlns being:
9487176
9273344
I realise the array is going to have some overhead, but not 200k, surely? Why does Java fail to allocate memory for something it claims to have enough space for? I have to set the constant that is subtracted to something around 4M before Java will run this (by which time the printlns look more like:
9487176
5472256
)
Even more bewilderingly, if I replace buffers with a 2D array:
buffers = new int[numBuffers][BUFFER_SIZE / 4];
Then it runs without complaint using the 200k subtraction shown above, even though the number of integers being stored is the same in both arrays (and wouldn't the overhead of a 2D array be larger than that of a 1D array, since it has all those references to the inner arrays to store?).
Any ideas?
The VM will divide the heap memory into different areas (mainly for the garbage collector), so you will run out of memory when you attempt to allocate a single object of nearly the entire heap size.
Also, some memory will already have been used up by the JRE. 200k is nothing with today's memory sizes, and a 10M heap is almost unrealistically small for most applications.
The actual overhead of an array is relatively small; on a 32-bit VM it's 12 bytes IIRC (plus what gets wasted if the size is less than the minimal granularity, which is AFAIK 8 bytes). So in the worst case you have something like 19 bytes of overhead per array.
Note that Java has no 2D (multi-dimensional) arrays, it implements this internally as an array of arrays.
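A small illustration of that point: each "row" is an independent heap object, and the rows need not even be the same length.
int[][] grid = new int[3][];   // one outer array of references
grid[0] = new int[10];         // three separate inner arrays,
grid[1] = new int[20];         // allocated independently and
grid[2] = new int[30];         // not necessarily adjacent in memory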
In the 2D case, you are allocating more, smaller objects. The memory manager objects to the single large object taking up most of the heap. Why this is objectionable is a detail of the garbage collection scheme; it's probably because something like it can move the smaller objects between generations, while the heap won't accommodate moving the single large object around.
This might be due to memory fragmentation and the JVM's inability to allocate an array of that size given the current heap.
Imagine your heap is 10 x long:
xxxxxxxxxx
Then, you allocate an object 0 somewhere. This makes your heap look like:
xxxxxxx0xx
Now, you can no longer allocate those 10 x spaces. You cannot even allocate 8 xs, despite the fact that 9 xs are free.
The fact is that an array of arrays does not suffer from the same problem because it's not contiguous.
EDIT: Please note that the above is a very simplistic view of the problem. When in need of space in the heap, Java's garbage collector will try to collect as much memory as it can and, if really, really necessary, try to compact the heap. However, some objects might not be movable or collectible, creating heap fragmentation and putting you in the above situation.
There are also many other factors that you have to consider, some of which include: memory leaks either in the VM (not very likely) or your application (also not likely for a simple scenario), unreliability of using Runtime.freeMemory() (the GC might run right after the call and the available free memory could change), implementation details of each particular JVM, etc.
The point is, as a rule of thumb, don't always expect to have the full amount of Runtime.freeMemory() available to your application.
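A small sketch of why Runtime.freeMemory() alone is misleading: it is measured against the currently committed heap (totalMemory), not the -Xmx limit, and is only a snapshot.
public class FreeMemoryDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently in use
        long headroom = rt.maxMemory() - used;          // free relative to -Xmx
        System.out.println("approximate headroom: " + headroom + " bytes");
    }
}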

Heap: Survivor Space

I wrote a sample Java application which allocates memory and then runs forever.
Why is the memory used by the survivor space 0 KB?
List<String> stringlist = new ArrayList<String>();
while (true) {
    stringlist.add("test");
    if (stringlist.size() >= 5000000)
        break;
}
while (true)
    for (String s : stringlist);
Because "test" is a String literal it will end up in permanent memory not heap.
Memory size of objects you create is 5000000 + 4*2 ~ 5MB which will easily fit into Eden space.
Modify
stringlist.add("test");
to
stringlist.add(new String("test"));
and you will get an extra 5000000 * 4 * 2 bytes ≈ 38 MB (five million Strings, each with four 2-byte characters), which most probably will still fit into Eden. You can either increase the list size or the String length to make sure you have survivors.
"test" is a String literal and, regardless of how it’s stored (this has changed during the Java development), the important point here is, that it is a single object.
Recall the Java Language Specification:
…a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern
So there are no new Strings created within your loop as "test" always refers to the same String instance. The only heap change occurs when the ArrayList’s internal capacity is exhausted.
The memory finally required for the ArrayList’s internal array depends on the size of an object reference, usually it’s 5000000*4 bytes for 32Bit JVMs and 64Bit JVMs with compressed oops and 5000000*8 bytes for 64Bit JVMs without compressed oops.
The interesting point here is described on www.kdgregory.com:
if your object is large enough, it will be created directly in the tenured generation. User-defined objects won't (shouldn't!) have anywhere near the number of members needed to trigger this behavior, but arrays will: with JDK 1.5, arrays larger than a half megabyte go directly into the tenured generation.
This harmonizes with these words found on oracle.com:
If survivor spaces are too small, copying collection overflows directly into the tenured generation.
which gives another reason why larger arrays might not show up in the survivor space. So it depends on the exact configuration whether they do not appear because they were copied from the Eden space to the Tenured Generation or because they were created in the Tenured Generation in the first place. But the result of not showing up in the survivor space is the same.
So when the ArrayList is created with its default capacity of 10, the internal array is smaller than this threshold, and so are the next ones created on each capacity enlargement. However, by the time a new array exceeds this threshold, all the old ones are garbage and hence won't show up as "survivors".
So at the end of the first loop you have only one remaining array which has a size exceeding the threshold by far and hence bypassed the Survivor space. Your second loop does not add anything to the memory management. It creates temporary Iterators but these never “survive”.
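A sketch tracing that growth sequence, using the JDK 6 formula quoted earlier, 4-byte references, and the half-megabyte threshold mentioned above (all three are assumptions carried over from this page, not measured values):
public class GrowthTrace {
    public static void main(String[] args) {
        int capacity = 10; // ArrayList default capacity
        while (capacity < 5_000_000) {
            capacity = (capacity * 3) / 2 + 1;  // JDK 6 growth formula
            long bytes = (long) capacity * 4;   // 4-byte references
            System.out.println(capacity + " slots ~ " + bytes + " bytes"
                    + (bytes > 512 * 1024 ? "  -> large enough to go straight to tenured" : ""));
        }
    }
}
Only the last few arrays in this trace cross the threshold, and by then every earlier array is already garbage, which matches the empty survivor space.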
