How is "Size" calculated in JVisualVM's Heap Dump - java

I'm running Java Visual VM to analyze performance on a Mule app to reduce the amount of memory used. When I look at the heap dump, I see that char[] has a size of over 37 MB, String has a size of a bit over 28 MB.
What I'm not clear on is how the size column accounts for the amount of memory used. In particular, since a String is an abstraction of a char[], I'm wondering if this means that some of that 37 MB of char arrays is also present within the 28 MB of Strings, or if they are allocated separately.
On top of that, I also have a class that I suspect is hogging a great deal of memory and contains several strings, but according to my heap dump, this class only uses 6.5% of the total memory in the heap.
So I guess my question is... If I were to make my custom class more efficient by using fewer String objects, would I see a reduction in the amount of memory reported to be used by Strings and Char[]s, or just for that specific class?
Thanks!

Holger's comment is all I needed...
"The sizes only include the memory for an object itself, not any referenced object (and arrays are objects)."
This alone gives me a much better idea of how I can optimize.

Related

Why are JVM memory parameters usually in multiples of 256?

I have seen almost all of the JVM memory parameters usually in multiples of 256 or a round binary value - e.g. 256m, 512m, 1024m etc. I know, it may be linked to the fact that physical memory (RAM) is usually a binary number, such as 256 MB, 1 GB etc.
My question is: does it really help memory management in any way if JVM memory is set to a value that is a multiple of 256, or any binary value for that matter? Does it hurt to set the JVM memory to a round decimal value, such as 1000m instead of 1024m? I have never seen any JVM using a value that is considered round in decimal terms.
The OS would allocate the mentioned memory to JVM when it is launched, so I guess, it is more of a question for the JVM whether it can manage a round decimal memory size (e.g. 1000 MB) efficiently or would there be any shortcomings.
EDIT: I know, we CAN use decimal values for JVM memory, but my question is SHOULD we use decimal values?
EDIT2: For opinions/guesses about JVM being equally efficient of handling every memory size, please share any relevant links that you used for coming to that conclusion. I have seen enough WAR on this topic among fellow developers, but I haven't seen much of concrete reasoning to back either - decimal or binary value.
It is not necessary to use a power of 2 for the JVM memory parameters. It is just common practice in memory allocation to double the value if the old one isn't enough.
If you increase your assigned memory value in 1 MB steps, you will have to adjust the value several (hundred) times before the configuration matches your requirements. So it is just more convenient to double the old value.
This relies on the fact that memory is a cheap resource these days.
EDIT:
As you already mentioned it is possible to assign values like 1000 MB or 381 MB. The JVM can handle every memory size that is big enough to host the permGenSpace, the stack and the heap.
It does not matter. There is no special treatment of rounded values.
You can specify the memory size within 1-byte accuracy - JVM itself will round the size up to the value it is comfortable with. E.g. the heap size is rounded to the 2 MB boundary. See my other answer: https://stackoverflow.com/a/24228242/3448419
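One way to see what the JVM does with a non-binary setting is to start it with, for example, -Xmx1000m and print the limit it reports. This is only a minimal sketch: HeapLimitCheck is a hypothetical class name, and the reported figure may differ from the flag because of rounding and because some collectors leave part of the heap out of this number.

```java
// Hypothetical helper: reports the maximum heap size of the running JVM.
final class HeapLimitCheck {

    /** Returns the maximum amount of heap the JVM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long bytes = maxHeapBytes();
        // Compare this with the -Xmx value passed on the command line.
        System.out.println("max heap: " + bytes + " bytes ("
                + bytes / (1024.0 * 1024.0) + " MiB)");
    }
}
```

Running it once with -Xmx1000m and once with -Xmx1024m shows whether the two settings lead to different effective limits on a given JVM.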
There is no real requirement that those values be a multiple of 2. It is just a convention. You can use whatever values you like there.
-Xms1303m -Xmx2303m -XX:MaxPermSize=256m // my configs
I think it is mainly a way of thinking, something like: I have a 1 GB memory, I will give JVM a half, which is 512 MB.
It is just a way to ensure the size that you specify (as argument) fits to the actual allocated memory, because the machine allocates the memory in blocks of size power of 2.

How does memory allocation of an ArrayList work?

As far as I know, when we are creating an ArrayList:
ArrayList<String> list = new ArrayList<String>(SIZE);
The JVM reserves for it a contiguous part of memory. When we are adding new elements into our list, when number of elements reaches 75% of SIZE it reserves a new, contiguous part of memory and copies all of the elements.
Our list is getting bigger and bigger. We are adding new objects and the list has to be rebuilt once again.
What happens now?
The JVM is looking for a contiguous segment of memory, but it does not find enough space.
The Garbage Collector can try to remove some unused references and defragment memory. What happens, if the JVM is not able to reserve space for new instance of list after this process?
Does it create a new one, using maximal possible segment? Which Exception will be thrown?
I read this question Java: How ArrayList manages memory and one of the answers is:
A reference doesn't consume much space, but some space is used anyhow. When the array gets bigger, that could become a problem. We also cannot forget that other things use memory as well.
If JVM is not able to allocate requested amount of memory it'll throw
OutOfMemoryError
That's it. Actually JVM memory allocation has only two possible outcomes:
Application is given requested amount of memory.
JVM throws OutOfMemoryError.
There are no intermediate outcomes, such as only part of the requested memory being allocated.
It has nothing to do with ArrayList; it's a JVM issue. If you are asking whether ArrayList somehow handles this situation in a special way, then the answer is "No, it does not." It just tries to allocate the amount of memory it needs and lets the JVM take care of the rest.
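The all-or-nothing behaviour can be observed with a small sketch. AllocationOutcome and tryAllocate are hypothetical names; the huge request is assumed to fail on a typical HotSpot JVM, either because the heap is too small or because the length exceeds the VM's array size limit.

```java
// Hypothetical demo: an allocation either succeeds completely or throws OutOfMemoryError.
final class AllocationOutcome {

    /** Tries to allocate a long[] of the given length; returns true if it succeeded. */
    static boolean tryAllocate(int length) {
        try {
            long[] block = new long[length];
            return block.length == length;
        } catch (OutOfMemoryError e) {
            // No partially sized array is handed back; the request simply fails.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("small request succeeded: " + tryAllocate(1024));
        System.out.println("huge request succeeded:  " + tryAllocate(Integer.MAX_VALUE));
    }
}
```

Catching OutOfMemoryError is done here only to make the outcome visible; it is not meant as a recommendation for production code.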
In Java, the references to the objects are stored in contiguous memory. The actual objects can be scattered across non-contiguous locations. So, for example, if your array holds 10 objects, the JVM only needs to reserve contiguous memory for the object references, not for the objects. If each reference takes 1 byte (an illustrative figure, not the real size) and each object takes 1 KB, then for an array of 10 elements the JVM will try to reserve contiguous memory of only 10 * 1 B, i.e. 10 B. The objects can reside in 10 different memory locations totalling 10 KB. Remember that both the contiguous and the non-contiguous space come out of the same heap.
When it needs to resize the array, the JVM tries to find a contiguous block for the new length. So if you want to resize the array from 10 to 20 elements, it will try to reserve a contiguous space of 20 B (using the above example). If it finds this space, it copies the references from the old array to the new array. If it does not find this space, it tries to run a GC. If it still does not find the space, it throws an OutOfMemoryError.
Therefore, whenever you resize the array, the JVM needs to find contiguous memory to store the references of the newly sized array. So if you want to extend the array to, say, 1000 elements, and each reference is 1 byte, the JVM will try to find contiguous memory of 1000 * 1 B, which is about 1 KB.
If it finds this memory, it copies the references and leaves the older contiguous block to be reclaimed the next time the GC runs.
If it is not able to find the memory, it tries to run a GC, and if it still does not find the contiguous memory, it throws an OutOfMemoryError.
This is the code in ArrayList which does the resizing.
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/ArrayList.java#ArrayList.ensureCapacity%28int%29
This will throw an OutOfMemoryError as soon as there is not enough heap space to allocate the new array.
Garbage collection will always be done before this error is thrown. This will compact memory and eliminate all the arrays of smaller sizes that are no longer used. But there is no way to get around the fact that the old array, the new array, and all the contained objects need to all be in memory at once in order for the old contents to be copied into the new list.
So, if your memory limit is 10 MB, and the array takes up 2 MB and is being sized up to 3 MB, and the strings take up 6 MB, then OOM will be thrown even though after this operation you will only have 3 + 6 = 9 MB in memory. One way to avoid this, if you want to run really close to memory limits with a huge array, is to size the array to the full size to begin with so that it never needs to resize.
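Sizing the list up front, as suggested above, could look like the following sketch. PresizedList and build are hypothetical names, and the element count is assumed to be known in advance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: allocate the backing array once so no resize copy is needed.
final class PresizedList {

    /** Builds a list of n strings whose backing array is sized once, up front. */
    static List<String> build(int n) {
        // With capacity n, adding n elements never triggers a resize,
        // so the old and new arrays never have to coexist in memory.
        List<String> list = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            list.add("item-" + i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> list = build(1000);
        System.out.println("elements stored: " + list.size());
    }
}
```

For a list that already exists, ArrayList.ensureCapacity(int) serves the same purpose.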
I assume it will run out of memory, since there will be no space left to use in the case where the JVM cannot extend the array.
The first thing I want to correct: the list does not grow at 75% of SIZE. Only when the number of elements reaches 100% of SIZE does it reserve a new, contiguous part of memory and copy all of the elements.
New Size of ArrayList will be:
NewSize of ArrayList = (CurrentSize * 3/2) + 1
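To make the formula concrete, here is a small sketch that applies it repeatedly. Java6Growth and nextCapacity are hypothetical names, the starting capacity of 10 is assumed to be the default, and newer JDK versions may use a different growth rule.

```java
// Hypothetical demo of the growth formula quoted above: (current * 3 / 2) + 1.
final class Java6Growth {

    /** Applies the formula with integer arithmetic (overflow for huge sizes is ignored). */
    static int nextCapacity(int currentCapacity) {
        return (currentCapacity * 3) / 2 + 1;
    }

    public static void main(String[] args) {
        int capacity = 10; // assumed default initial capacity
        StringBuilder steps = new StringBuilder().append(capacity);
        for (int i = 0; i < 4; i++) {
            capacity = nextCapacity(capacity);
            steps.append(" -> ").append(capacity);
        }
        // prints "10 -> 16 -> 25 -> 38 -> 58"
        System.out.println(steps);
    }
}
```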
But relying on this growth is not recommended. If we know roughly how many objects need to be stored, we can use the following constructor of ArrayList:
ArrayList<String> ar = new ArrayList<String>(initialCapacity);
If the JVM cannot allocate enough contiguous space on the heap for the ArrayList, at runtime we'll get
Runtime error: OutOfMemoryError

Alternatives to Java string interning

Since Java's default string interning has got a lot of bad press, I am looking for an alternative.
Can you suggest an API which is a good alternative to Java string interning? My application uses Java 6. My requirement is mainly to avoid duplicate strings via interning.
Regarding the bad press:
String intern is implemented via a native method. And the C implementation uses a fixed size of some 1k entries and scales very poorly for large number of strings.
Java 6 stores interned strings in Perm gen. And therefore are not GC'd and possibly lead to perm gen errors. I know this is fixed in java 7 but I can't upgrade to java 7.
Why do I need to use interning?
My application is a server app with heap size of 10-20G for different deployments.
During profiling we have figured out that hundreds of thousands of strings are duplicates and we can significantly improve the memory usage by avoiding storing duplicate strings.
Memory has been a bottleneck for us and therefore we are targeting it rather than doing any premature optimization.
String intern is implemented via a native method. And the C implementation uses a fixed size of some 1k entries and scales very poorly for large number of strings.
It scales poorly for many thousand Strings.
Java 6 stores interned strings in Perm gen. And therefore are not GC'd
It will be cleaned up when the perm gen is cleaned up which is not often but it can mean you reach the maximum of this space if you don't increase it.
My application is a server app with heap size of 10-20G for different deployments.
I suggest you consider using off heap memory. I have 500 GB in off heap memory and about 1 GB in heap in one application. It isn't useful in all cases but worth considering.
During profiling we have figured out that hundreds of thousands of strings are duplicates and we can significantly improve the memory usage by avoiding storing duplicate strings.
For this I have used a simple array of String. This is very light weight and you can control the upper bound of Strings stored easily.
Here is an example of generic interner.
class Interner<T> {
    private final T[] cache;

    @SuppressWarnings("unchecked")
    public Interner(int primeSize) {
        cache = (T[]) new Object[primeSize];
    }

    public T intern(T t) {
        // Map the hash code to a slot; the remainder is always smaller than the length.
        int hash = Math.abs(t.hashCode() % cache.length);
        T t2 = cache[hash];
        if (t2 != null && t.equals(t2))
            return t2;
        // Miss or collision: the new value replaces whatever was in the slot.
        cache[hash] = t;
        return t;
    }
}
An interesting property of this cache is that it doesn't matter that it's not thread safe.
For extra speed you can use a power of 2 size and a bit mask, but it's more complicated and may not work very well depending on how your hashCodes are calculated.
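The power of 2 variant mentioned above might look like the following sketch. MaskedInterner is a hypothetical name, and folding the high bits of the hash code into the low bits is an added assumption to reduce the risk of poor hash codes clustering in a few slots; it is not part of the original answer.

```java
// Hypothetical variant of the interner: power-of-2 table indexed with a bit mask.
final class MaskedInterner<T> {
    private final T[] cache;
    private final int mask;

    @SuppressWarnings("unchecked")
    MaskedInterner(int powerOfTwoSize) {
        if (powerOfTwoSize <= 0 || Integer.bitCount(powerOfTwoSize) != 1) {
            throw new IllegalArgumentException("size must be a positive power of 2");
        }
        cache = (T[]) new Object[powerOfTwoSize];
        mask = powerOfTwoSize - 1;
    }

    T intern(T t) {
        int h = t.hashCode();
        h ^= (h >>> 16);      // assumption: mix high bits into the low bits
        int index = h & mask; // always within [0, cache.length)
        T cached = cache[index];
        if (cached != null && cached.equals(t)) {
            return cached;
        }
        cache[index] = t;     // miss or collision: overwrite the slot
        return t;
    }

    public static void main(String[] args) {
        MaskedInterner<String> interner = new MaskedInterner<String>(1024);
        String first = interner.intern(new String("duplicate"));
        String second = interner.intern(new String("duplicate"));
        System.out.println(first == second); // prints "true"
    }
}
```

The mask replaces the remainder and Math.abs calls, which is where the speed gain would come from; whether it helps in practice depends on the hash codes involved.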

Java throwing out of memory exception before it's really out of memory?

I wish to make a large int array that very nearly fills all of the memory available to the JVM. Take this code, for instance:
final int numBuffers = (int) ((runtime.freeMemory() - 200000L) / (BUFFER_SIZE));
System.out.println(runtime.freeMemory());
System.out.println(numBuffers*(BUFFER_SIZE/4)*4);
buffers = new int[numBuffers*(BUFFER_SIZE / 4)];
When run with a heap size of 10M, this throws an OutOfMemoryError, despite the output from the printlns being:
9487176
9273344
I realise the array is going to have some overheads, but not 200k, surely? Why does java fail to allocate memory for something it claims to have enough space for? I have to set that constant that is subtracted to something around 4M before Java will run this (By which time the printlns are looking more like:
9487176
5472256
)
Even more bewilderingly, if I replace buffers with a 2D array:
buffers = new int[numBuffers][BUFFER_SIZE / 4];
Then it runs without complaint using the 200k subtraction shown above - even though the amount of integers being stored is the same in both arrays (And wouldn't the overheads on a 2D array be larger than that of a 1D array, since it's got all those references to other arrays to store).
Any ideas?
The VM will divide the heap memory into different areas (mainly for the garbage collector), so you will run out of memory when you attempt to allocate a single object of nearly the entire heap size.
Also, some memory will already have been used up by the JRE. 200k is nothing with today's memory sizes, and a 10M heap is almost unrealistically small for most applications.
The actual overhead of an array is relatively small: on a 32-bit VM it's 12 bytes IIRC (plus what gets wasted if the size is not a multiple of the minimal granularity, which is AFAIK 8 bytes). So in the worst case you have something like 19 bytes of overhead per array.
Note that Java has no 2D (multi-dimensional) arrays, it implements this internally as an array of arrays.
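A short sketch makes the array-of-arrays structure visible. ArrayOfArrays and makeGrid are hypothetical names used only for illustration.

```java
// Hypothetical demo: a "2D" array is an outer array holding references to row arrays.
final class ArrayOfArrays {

    /** Allocates one outer array plus `rows` independent row arrays. */
    static int[][] makeGrid(int rows, int cols) {
        return new int[rows][cols];
    }

    public static void main(String[] args) {
        int[][] grid = makeGrid(3, 4);
        // Each row is its own object, so the rows need not be adjacent in memory.
        System.out.println("rows are distinct objects: " + (grid[0] != grid[1]));
        // A row can even be replaced by an array of a different length.
        grid[1] = new int[7];
        System.out.println("row lengths: " + grid[0].length + ", " + grid[1].length);
    }
}
```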
In the 2D case, you are allocating more, smaller objects. The memory manager is objecting to the single large object taking up most of the heap. Why this is objectionable is a detail of the garbage collection scheme: probably the collector can move the smaller objects between generations, but the heap cannot accommodate moving the single large object around.
This might be due to memory fragmentation and the JVM's inability to allocate an array of that size given the current heap.
Imagine your heap is 10 x long:
xxxxxxxxxx
Then, you allocate an object 0 somewhere. This makes your heap look like:
xxxxxxx0xx
Now, you can no longer allocate those 10 x spaces. You can not even allocate 8 xs, despite the fact that available memory is 9 xs.
The fact is that an array of arrays does not suffer from the same problem because it's not contiguous.
EDIT: Please note that the above is a very simplistic view of the problem. When in need of space in the heap, Java's garbage collector will try to collect as much memory as it can and, if really, really necessary, try to compact the heap. However, some objects might not be movable or collectible, creating heap fragmentation and putting you in the above situation.
There are also many other factors that you have to consider, some of which include: memory leaks either in the VM (not very likely) or your application (also not likely for a simple scenario), unreliability of using Runtime.freeMemory() (the GC might run right after the call and the available free memory could change), implementation details of each particular JVM, etc.
The point is, as a rule of thumb, don't always expect to have the full amount of Runtime.freeMemory() available to your application.
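For a rough picture of the headroom, the three Runtime figures can be combined as in this sketch. HeapHeadroom and estimateAvailableBytes are hypothetical names, and the result is only an upper-bound estimate: it says nothing about whether a single contiguous block of that size can actually be allocated.

```java
// Hypothetical helper: combines Runtime figures into a rough headroom estimate.
final class HeapHeadroom {

    /** Upper-bound estimate: maximum heap minus what is currently in use. */
    static long estimateAvailableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("free:      " + rt.freeMemory());
        System.out.println("total:     " + rt.totalMemory());
        System.out.println("max:       " + rt.maxMemory());
        System.out.println("estimate:  " + estimateAvailableBytes());
    }
}
```

The values are snapshots and can change between calls, so leaving a generous safety margin is still advisable.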

How many elements can a Stack store in Java?

Is there a maximum number of elements that can be stored in a stack? Is the only limitation the amount of storage available to the system?
For clarity, I'm referring to java.util.Stack.
If you are talking about java.util.Stack, then the limit is Integer.MAX_VALUE, which is about 2 billion. However, if you let it grow naturally, you get an exception if you add more than about 1.3 billion (10 * 2^27) elements, as it will try to grow the underlying array to a size larger than is allowed.
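The arithmetic behind that figure can be checked with a short sketch. It assumes an initial capacity of 10 and a backing array that doubles each time it fills up; StackGrowthLimit and lastDoublingCapacity are hypothetical names.

```java
// Hypothetical demo: how far can a capacity grow by doubling before it exceeds int range?
final class StackGrowthLimit {

    /** Largest capacity reachable from `initial` by doubling without passing Integer.MAX_VALUE. */
    static long lastDoublingCapacity(int initial) {
        long capacity = initial; // long, so the doubled value cannot overflow in the comparison
        while (capacity * 2 <= Integer.MAX_VALUE) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        // prints "1342177280"
        System.out.println(lastDoublingCapacity(10));
    }
}
```

The next doubling would ask for more than Integer.MAX_VALUE elements, which no Java array can hold.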
IMHO Stack is a legacy class that was superseded in Java 1.2 (1998). I don't suggest you use it.
The storage capability is normally limited by memory available, either heap memory for stack data structures or stack memory for the call stack.
