I was asked in an interview to calculate the memory usage for HashMap and how much estimated memory it will consume if you have 2 million items in it.
For example:
Map<String, List<String>> mp = new HashMap<String, List<String>>();
The mapping is like this.
key value
----- ---------------------------
abc ['hello','how']
abz ['hello','how','are','you']
How would I estimate the memory usage of this HashMap Object in Java?
The short answer
To find out how large an object is, I would use a profiler. In YourKit, for example, you can search for the object and then have it calculate its deep size. This will give you a fair idea of how much memory the object would use if it were standalone, and is a conservative size for the object.
The quibbles
If parts of the object are re-used in other structures, e.g. String literals, you won't free this much memory by discarding it. In fact, discarding one reference to the HashMap might not free any memory at all.
What about Serialisation?
Serialising the object is one approach to getting an estimate, but it can be wildly off, as the serialisation overhead and encoding differ between memory and a byte stream. How much memory is used depends on the JVM (and whether it's using 32-bit or 64-bit references), but the serialisation format is always the same.
e.g.
In Sun/Oracle's JVM, an Integer can take 16 bytes for the header, 4 bytes for the number and 4 bytes of padding (objects are 8-byte aligned in memory), for a total of 24 bytes. However, if you serialise one Integer, it takes 81 bytes; serialise two Integers and they take 91 bytes. I.e. the size of the first Integer is inflated, and the second Integer takes less than what is used in memory.
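You can observe this yourself with a small sketch (a hedged example, not from the original answer) that serialises Integers into a byte array and prints the running size; the first write includes the stream header and class descriptors, the second mostly reuses them:
import java.io.*;

class SerializedSizeDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Integer.valueOf(1));  // stream header + class descriptors + value
            out.flush();
            System.out.println("one Integer: " + bytes.size() + " bytes");
            out.writeObject(Integer.valueOf(2));  // descriptors are shared with the first
            out.flush();
            System.out.println("two Integers: " + bytes.size() + " bytes");
        }
    }
}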
String is a much more complex example. In the Sun/Oracle JVM, it contains 3 int values and a char[] reference. So you might assume it uses a 16-byte header plus 3 * 4 bytes for the ints, 4 bytes for the char[] reference, 16 bytes for the overhead of the char[], and then two bytes per char, aligned to an 8-byte boundary...
What flags can change the size?
If you have 64-bit references, the char[] reference is 8 bytes long, resulting in 4 bytes of padding. On a 64-bit JVM, you can use -XX:+UseCompressedOops to get 32-bit references. (So looking at the JVM bit size alone doesn't tell you the size of its references.)
If you have -XX:+UseCompressedStrings, the JVM will use a byte[] instead of a char[] when it can. This can slow down your application slightly but can improve your memory consumption dramatically. When a byte[] is used, the memory consumed is 1 byte per char. Note: for a 4-char String, as in the example, the size used is the same due to the 8-byte boundary.
What do you mean by "size"?
As has been pointed out, HashMap and List are more complex, as many, if not all, of the Strings can be reused, possibly String literals. What you mean by "size" depends on how the map is used. I.e. how much memory would the structure use alone? How much would be freed if the structure were discarded? How much memory would be used if you copied the structure? These questions can have different answers.
What can you do without a profiler?
If you can determine that the likely conservative size is small enough, the exact size doesn't matter. The conservative case is likely to be the one where you construct every String and entry from scratch. (I only say "likely" because a HashMap can have capacity for 1 billion entries even though it is empty, and a String with a single char can be a sub-string of a String with 2 billion characters.)
You can perform a System.gc(), take the free memory, create the objects, perform another System.gc() and see how much the free memory has dropped. You may need to create the object many times and take an average; repeat this exercise several times and it can give you a fair idea.
(BTW: while System.gc() is only a hint, the Sun/Oracle JVM will perform a full GC every time by default.)
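A minimal sketch of that measurement loop (hedged: the class name is illustrative, and the created objects must stay reachable between the two readings or the second GC may reclaim them):
import java.util.*;

public class MeasureAverage {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        int count = 100_000;                  // average over many instances
        Object[] hold = new Object[count];    // keep the objects reachable
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();
        for (int i = 0; i < count; i++) {
            hold[i] = new HashMap<String, List<String>>();  // object under test
        }
        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("approx bytes per object: " + (after - before) / count);
        System.out.println(hold.length);      // use 'hold' so it stays live
    }
}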
I think that the question should be clarified because there is a difference between the size of the HashMap and the size of HashMap + the objects contained by the HashMap.
If you consider the size of the HashMap itself, then in the example you provided, the HashMap stores one reference to the String "abc" and one reference to the List. The number of elements in the list does not matter: only the reference to the list is stored in the value.
In a 32-bit JVM, one Map entry has 4 bytes for the "abc" reference + 4 bytes for the List reference + 4 bytes for the "hashcode" int field of the entry + 4 bytes for its "next" field.
You also add 4*(X-1) bytes of references, where "X" is the number of empty buckets that the HashMap created when you called the constructor new HashMap<String,List<String>>(). According to http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html, the default is 16.
There are also loadFactor, modCount, threshold and size, which are all primitive int fields (16 more bytes), plus the object header (8 bytes).
So in the end, the size of your above HashMap for one entry would be (4 + 4 + 4 + 4) + (4*15) + 16 + 8 = 100 bytes.
This is an approximation based on data owned by the HashMap itself. I think the interviewer was probably interested in seeing whether you were aware of the way HashMap works (the fact, for example, that the default constructor creates an array of 16 buckets for Map entries, and that the sizes of the objects stored in the HashMap do not affect the HashMap's own size, since it only stores references).
HashMaps are so widely used that, under certain circumstances, it is worth using the constructors that take an initial capacity and load factor.
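For example (a hedged sketch; the capacity value is illustrative, not from the answer above): presizing for 2 million entries keeps the table from being rehashed as it grows.
import java.util.*;

class PresizedMap {
    public static void main(String[] args) {
        // With the default load factor of 0.75, a capacity of 4,000,000 keeps
        // 2,000,000 entries below the resize threshold, so no rehashing occurs.
        Map<String, List<String>> map = new HashMap<>(4_000_000, 0.75f);
        map.put("abc", Arrays.asList("hello", "how"));
        map.put("abz", Arrays.asList("hello", "how", "are", "you"));
        System.out.println(map.size());
    }
}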
Summary:
memory = hashmap_array_size*bucket_size
       + n*chained_list_node_size
       + sum(key string sizes)
       + sum(list_string_size(string_list) for each hashmap List<String> value)
       = 254 MB (theoretical in-interview estimate)
Test program total-memory-used size for 2 million sample entries (see below):
       = 640 MB
(I recommend a simple test program like this for a quick true-total-size estimate.)
A minimal estimate (actual implementation probably has a bit more overhead):
Assumed data structure:
Bucket: (Pointer to String key, Pointer to hash-chain-list first-node)
Chained List Node: (Pointer to List<String> value, Next-pointer)
(HashMap is a chained hash - each bucket has a list/tree of values)
(as of Java 8, the list switches to a tree after 8 items)
List<String> instance: (Pointer to first node)
List<String> Node: (Pointer to String value, Next-pointer)
Assumption to simplify this estimate: zero collisions, so each bucket has at most 1 value (ask the interviewer if this is OK, to give a rough initial answer)
Assumption: 64-bit JVM so 64-bit pointers so pointer_size=8 bytes
Assumption: HashMap underlying array is 50% full (by default, at 75% full, the hashmap is rehashed with double the size), so hashmap_array_size = 2*n
memory = hashmap_array_size*bucket_size
+ n*chained_list_node_size
+ sum(key string sizes)
+ sum(list_string_size(string_list) for each hashmap List<String> value)
So:
memory = (n*2)*(8*2)                                              [buckets]
       + n*(8*2)                                                  [chain nodes]
       + n*(2 length_field + 3 string_length)                     [keys]
       + n*(8 + 3*(8*2) + 3*(2 length_field + 4 string_length))   [values]
       = 2000000*(2*8*2 + 8*2 + (2+3) + (8 + 3*8*2 + 3*(2+4)))
       = 2000000*127
       = 254000000
       = 254 MB
n = number of items in the hash map
bucket_size = pointer_size*2
chained_list_node_size = pointer_size*2
list_string_size(list) = pointer_size
                       + list.size()*list_string_node_size
                       + sum(string value sizes in this List<String> list)
list_string_node_size = pointer_size*2
String size in bytes = length_field_size + string_characters
(UTF-8 is 1 byte per ASCII character)
(length_field_size = size of the length integer, assumed to be 2)
Assume all keys are length 3.
(we have to assume something to calculate space used)
so: sum(key string sizes) = (2 length_field + 3 string_length)*n
Assume all value string-lists are length 3 and each string is of length 4. So:
sum(list_string_size(string_list) for each hashmap List<String> value)
= n*(8 + 3*(8*2) + 3*(2 length_field + 4 string_length))
A simple test program would give a better real answer:
import java.util.*;

class TempTest {
    public static void main(String[] args) {
        HashMap<String, List<String>> map = new HashMap<>();
        System.gc();
        printMemory();
        for (int i = 0; i < 2000000; ++i) {
            map.put(String.valueOf(i), Arrays.asList(String.valueOf(i), String.valueOf(i) + "b", String.valueOf(i) + "c"));
        }
        System.gc();
        printMemory();
    }

    private static void printMemory() {
        Runtime runtime = Runtime.getRuntime();
        long totalMemory = runtime.totalMemory();
        long freeMemory = runtime.freeMemory();
        System.out.println("Memory: Used=" + (totalMemory - freeMemory) + " Total=" + totalMemory + " Free=" + freeMemory);
    }
}
For me, this took 640MB (after.Used - before.Used).
You can't know in advance without knowing what all the strings are, how many items are in each list, and whether the strings are all unique references.
The only way to know for sure is to serialize the whole thing to a byte array (or a temp file) and see exactly how many bytes that takes.
Related
I was solving combination sum IV on leetcode (#377), which reads:
"Given an integer array with all positive numbers and no duplicates, find the number of possible combinations that add up to a positive integer target."
I solved it in Java using a top down recursive approach with a memoization array:
public int combinationSum4(int[] nums, int target) {
    int[] memo = new int[target + 1];
    for (int i = 1; i < target + 1; i++) {
        memo[i] = -1;
    }
    memo[0] = 1;
    return topDownCalc(nums, target, memo);
}

public static int topDownCalc(int[] nums, int target, int[] memo) {
    if (memo[target] >= 0) {
        return memo[target];
    }
    int tot = 0;
    for (int num : nums) {
        if (target - num >= 0) {
            tot += topDownCalc(nums, target - num, memo);
        }
    }
    memo[target] = tot;
    return tot;
}
Then I figured I was wasting time by initializing the entire memo array and could just use a Map instead (which would also save space / memory). So I rewrote the code as follows:
public int combinationSum4(int[] nums, int target) {
    Map<Integer, Integer> memo = new HashMap<Integer, Integer>();
    memo.put(0, 1);
    return topDownMapCalc(nums, target, memo);
}

public static int topDownMapCalc(int[] nums, int target, Map<Integer, Integer> memo) {
    if (memo.containsKey(target)) {
        return memo.get(target);
    }
    int tot = 0;
    for (int num : nums) {
        if (target - num >= 0) {
            tot += topDownMapCalc(nums, target - num, memo);
        }
    }
    memo.put(target, tot);
    return tot;
}
I am confused, though, because after submitting the second version of my code, LeetCode said it was slower and used more space than the first version. How does the HashMap use more space and run slower than an array whose values all had to be initialized and whose length is greater than the HashMap's size?
Then I figured I was wasting time by initializing the entire memo array
You could have stored 'answer + 1' instead, so that the default value (0) can serve as a placeholder for 'not calculated yet', saving that initialization. Not that it is expensive. Let's dig into cache pages.
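A hedged sketch of that '+1' idea, reusing the question's method shapes (not tested against LeetCode):
public int combinationSum4(int[] nums, int target) {
    int[] memo = new int[target + 1];  // all zeros mean "not computed yet"
    memo[0] = 1 + 1;                   // base case value 1, stored shifted by one
    return topDownCalc(nums, target, memo);
}

public static int topDownCalc(int[] nums, int target, int[] memo) {
    if (memo[target] != 0) {
        return memo[target] - 1;       // undo the +1 shift on the way out
    }
    int tot = 0;
    for (int num : nums) {
        if (target - num >= 0) {
            tot += topDownCalc(nums, target - num, memo);
        }
    }
    memo[target] = tot + 1;            // shift by one when storing
    return tot;
}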
Cache pages
CPUs are complex beasts. They don't operate on memory directly; not anymore. They literally cannot do it; the chip's calculating parts are simply not hooked up to main memory at all. Instead, the CPU has caches, which come in set sizes (for example, 64k - a single cache node cannot hold more or less than precisely 64k, and that entire 64k is then considered to be a cached copy of some 64k segment of main memory). One such node is called a cache page.
The CPU can only operate on cache pages.
In Java, an int[] is a contiguous, fixed-size chunk of memory holding the data. In other words, int[] x = new int[1000] declares a single chunk of memory of 1000*4 = 4000 bytes (because ints are 4 bytes, and you reserved room for 1000 of them). That fits inside a single page. So, when you write your loop to initialize the values to -1, you are asking the CPU to loop through a single cache page and write some data to it. CPUs have pipelines and other speedup factors; this costs maybe 250 cycles.
Contrast that with the cost of fetching a cache page: the CPU will be twiddling its thumbs while it farms out the job of fetching some chunk of memory into a cache page to the memory controller. (That twiddling is good in a sense; the CPU can cool down some - on modern hardware, the CPU is often limited not by its raw speed but by the system's ability to wick away the thermal impact of running it - and it can also spend the time on other threads/processes.) Nevertheless, the wait takes on the order of 500 cycles or more. So it is still the case that writing 4000 contiguous bytes in a tight loop is faster than a single cache miss.
Thus, 'fill a 1000-large int array with -1s' is an extremely cheap operation.
Wrapper objects
Maps operate on objects, not ints, which is why you had to write Integer and not int. An Integer, in memory at least, is a much, much larger load on memory. It's an entire object containing an int field. Then your variable (or your map) holds a pointer to it.
So, an int[] x = new int[1000] takes 4000 bytes, plus some change for the object header (maybe add 12 bytes), and one reference (size depends on the VM, but say 8 bytes on 64-bit), for a grand total of 4020 bytes.
In contrast,
Integer[] x = new Integer[1000];
for (int i = 0; i < 1000; i++) x[i] = i;
is much, much larger. It's 1000 pointers (as large as 8 bytes per pointer, or as small as 4 - so, 4000 to 8000 bytes) to 1000 separate Integer objects. Each Integer object gets the object overhead (~12 bytes or more) plus one int field, generally word-aligned (so 64 bits, even though the value is only 32-bit, assuming a 64-bit VM on 64-bit hardware, which is going to be the case on anything modern), for another 20000 bytes. A grand total of something closer to 30000 bytes.
That is about 8x more memory required.
Then consider that the 'key' in your memoized array is implicit (it's the index into the array), whereas in the map the key needs separate storage, and it gets worse still: each key/value pair in your map occupies at least 12+12+8+8+8+8 = 56 bytes (two object headers and two word-aligned int fields for your key and value Integer objects, plus two pointers for the map to point at them). Your int[] does the same job in 4 bytes.
That gives a ratio of 56/4 = 14.
If your map contains only 1 in every 14 numbers, the map should be about as large as your array, because the map can do one thing your array can't: the array is as large as it has to be from the get-go, while the map only needs to store the entries actually required.
Still, for most 'interesting' inputs, the coverage factor of that map is going to be far north of 1/14 = 7.14%, so the map ends up larger.
The map also has its objects smeared out over memory which risks them being in more than one cache page: Large memory load + fragmentation = an easy road to having the CPU wait for multiple cache page fetches vs. being able to do all the work in one go, never having to wait for cache misses.
Can it be faster?
Yeah, probably - but with map occupancy rates at 10% or higher, the concept of using a map to save space is dubious. If you want to try, you'd need a map specifically designed to hold ints and nothing else. These do exist, such as Eclipse Collections' IntIntMap.
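A small hedged sketch of what that looks like (assumes the org.eclipse.collections:eclipse-collections dependency is on the classpath):
import org.eclipse.collections.impl.map.mutable.primitive.IntIntHashMap;

class PrimitiveMemoSketch {
    public static void main(String[] args) {
        IntIntHashMap memo = new IntIntHashMap();
        memo.put(0, 1);                   // stores raw ints: no boxing, no Entry objects
        int v = memo.getIfAbsent(5, -1);  // -1 doubles as "not computed yet"
        System.out.println(v);            // prints -1
    }
}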
But I bet in this case the simple array memoization strategy is just the winner, even if you use IntIntMap.
These things came to my mind first:
HashMap is what the name implies, a hash-based map. So whenever you put something into it or get something out of it, it has to hash the key, then find the target based on that hash.
The put() operation isn't just a walk in the park, either - you can check here to get an idea of what it does. It's definitely more than an array assignment.
In Java it doesn't work with primitives, so for each value you have to convert ints to Integers and vice versa. (As noted by others, there are int-specialized map alternatives available, but not in the standard library.)
And since you're not initializing it with a capacity, it might need to resize internally several times during your run - the default size for a HashMap is just 16 - which is definitely more expensive than the one-shot initialization you did with the array. Here's what each resize does.
It also works with Entry objects, one for each internal entry it's got, and all those objects take up space - plenty more than just having an array of integers.
So I wouldn't expect a HashMap to save you either space or time. Why would it?
I have a rather large dataset consisting of 2.3 GB worth of data spread over 160 million byte[] arrays with an average data length of 15 bytes. The value for each byte[] key is only an int, so nearly half the memory usage of the HashMap (which is over 6 GB) is made up of the 16-byte overhead of each byte array:
overhead = 8-byte header + 4-byte length, rounded up by the VM to 16 bytes.
So my overhead is 2.5 GB.
Does anybody know of a HashMap implementation that stores its (variable-length) byte[] keys in one single large byte array, so there would be no per-key overhead (apart from a 1-byte length field)?
I would rather not use an in-memory DB, as they usually have a performance overhead compared to the plain Trove TObjectIntHashMap I'm using, and I value CPU cycles even more than memory usage.
Thanks in advance
Since most personal computers have 16 GB these days, and servers often 32 to 128 GB or more, is there a real problem with a degree of bookkeeping overhead?
If we consider the alternative - the byte data concatenated into a single large array - we should think about what individual values would have to look like in order to reference a slice of that larger array.
Most generally you would start with:
public class ByteSlice {
protected byte[] array;
protected int offset;
protected int len;
}
However, that's 8 bytes of int fields + the size of a pointer (perhaps just 4 bytes) + the JVM object header (12 bytes on a 64-bit JVM). So perhaps 24 bytes total.
If we try and make this single-purpose & minimalist, we're still going to need 4 bytes for the offset.
public class DedicatedByteSlice {
    protected int offset;
    protected byte len;
    protected static byte[] data; // somebody else knows about the array
    protected static byte[] getArray() { return data; }
}
This is still 5 bytes of fields (probably padded to 8) + the JVM object header. Probably still 20 bytes total.
It seems that the cost of dereferencing with an offset & length, and having an object to track that, are not substantially less than the cost of storing the small array directly.
One further theoretical possibility -- de-structuring the Map Key so it is not an object
It is possible to conceive of destructuring the "length & offset" data so that it is no longer held in an object. It would then be passed as a set of scalar parameters, e.g. (offset, length), and - in a hashmap implementation - stored as arrays of the separate components (e.g. instead of a single Object[] key array).
However I expect it is extremely unlikely that any library offers an existing hashmap implementation for your (very particular) usecase.
If you were talking about the values it would probably be pointless, since Java offers neither multiple returns nor OUT parameters, which makes communication impractical without "boxing" the destructured data back into an object. Since here you are asking about map keys specifically, and these are passed as parameters but need not be returned, such an approach may be theoretically possible to consider.
[Extended]
Even given this it becomes tricky - for your use case the map API probably has to become asymmetrical for population vs. lookup: population has to be by (offset, len) to define keys, whereas practical lookup is likely still by concrete byte[] arrays.
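To make the idea concrete, here is an entirely hypothetical sketch (no existing library works this way, and all names are invented) of a fixed-capacity, open-addressing map whose keys live in one shared byte[]; population copies the key bytes into the pool, and lookup takes a concrete byte[]. It assumes the capacity is never exceeded and supports no resizing or deletion:
import java.util.Arrays;

class ByteSliceIntMap {
    private final byte[] data;    // all key bytes, concatenated
    private int dataSize;
    private final int[] offsets;  // parallel arrays: one slot per bucket
    private final int[] lengths;  // a length of -1 marks an empty bucket
    private final int[] values;

    ByteSliceIntMap(int capacity, int dataCapacity) {
        data = new byte[dataCapacity];
        offsets = new int[capacity];
        lengths = new int[capacity];
        values = new int[capacity];
        Arrays.fill(lengths, -1);
    }

    void put(byte[] key, int value) {
        int i = slot(key);
        if (lengths[i] < 0) {     // new key: copy its bytes into the pool
            System.arraycopy(key, 0, data, dataSize, key.length);
            offsets[i] = dataSize;
            lengths[i] = key.length;
            dataSize += key.length;
        }
        values[i] = value;
    }

    int get(byte[] key, int ifAbsent) {
        int i = slot(key);
        return lengths[i] < 0 ? ifAbsent : values[i];
    }

    private int slot(byte[] key) {
        int i = (Arrays.hashCode(key) & 0x7fffffff) % offsets.length;
        while (lengths[i] >= 0 && !matches(i, key)) {
            i = (i + 1) % offsets.length;  // linear probing
        }
        return i;
    }

    private boolean matches(int i, byte[] key) {
        if (lengths[i] != key.length) return false;
        for (int j = 0; j < key.length; j++) {
            if (data[offsets[i] + j] != key[j]) return false;
        }
        return true;
    }
}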
OTOH: Even quite old laptops have 16GB now. And your time to write this (times 4-10 to maintain) should be worth far more than the small cost of extra RAM.
So I created a very large array of byte[], something like new byte[50_000_000][16]. According to my math that's 800,000,000 bytes, which is 0.8 GB plus some overhead.
To my surprise, when I do
memoryBefore = runtime.totalMemory() - runtime.freeMemory()
it's using 1.8 GB of memory. When I point a profiler at it, I get this:
https://www.dropbox.com/s/el9vlav5ylg781z/Screen%20Shot%202015-08-30%20at%202.55.00%20pm.png?dl=0
I can see most byte[] are 24 bytes, not 16 as expected, and I see quite a few much larger byte[] of size 472 or more. Does anyone know what's going on here?
Thanks for reading
All objects have an overhead to maintain the object, information like "type" aka Class. See What is the memory consumption of an object in Java?.
Arrays also have a length, and the overhead might be bigger in 64-bit Java.
Since you are allocating 50,000,000 arrays of 16 bytes, you get 50_000_000 * (16 + overhead) + (overhead + 50_000_000 * pointerSize); the second part is the outer array of arrays. Assuming, say, a 16-byte array overhead (header plus length, padded) and 4-byte compressed references, that works out to 50,000,000 * 32 + 50,000,000 * 4 ≈ 1.8 GB, which matches what you observed.
Depending on your requirements, you might be able to improve this in one of two ways:
Flip the indices of your 2-dimensional array to byte[16][50_000_000]. This reduces the number of array objects from 50,000,001 to 17 and shrinks the outer array.
Use a single array byte[16 * 50_000_000] and do the offset logic yourself, as sketched below. This will keep each group of 16 bytes contiguous and eliminate almost all of the overhead.
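A hedged sketch of that offset logic (the record size and indices are illustrative, and the array is smaller than the question's 50 million records so it runs quickly):
class FlatArrayDemo {
    static final int RECORD = 16;   // bytes per logical record

    public static void main(String[] args) {
        byte[] flat = new byte[RECORD * 1_000_000]; // one contiguous block, one header
        int i = 12345;                              // logical record index
        flat[i * RECORD + 3] = (byte) 0x7F;         // write byte 3 of record i
        System.out.println(flat[i * RECORD + 3]);   // read it back: prints 127
    }
}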
I can see most byte[] are 24bytes not 16
An object in Java has a couple of words of header information, in addition to the space that holds the object's fields. In the case of an array, there is an additional word to hold the array's length field. The length actually needs 4 bytes ('cos length is an int), but the JVM is most likely aligning to an 8 byte boundary on your platform.
If you are seeing 16 byte arrays occupying 24 bytes, then that accounting of space is most likely including (just) the length word.
(Note that the actual space occupied by object / array headers is JVM and platform specific.)
... I see quite a few much larger byte[] of size 472 or more.
Those are unrelated. If some other part of your code is not creating them explicitly, they are most likely created by Java runtime libraries.
I'm trying to create a very large linked list as below, but it fails (run as a unit test in Maven, with the heap size already set via "set MAVEN_OPTS=-Xmx4096m"; I'm running on Windows).
The code fails after inserting about 6M (6,000,000) Long items into the list. Why? Considering that a long is 8 bytes, 6M Long variables should be just 48M bytes. Even if Java objects have some additional hidden fields, it shouldn't fail so early.
int N = 100000000;
try {
    LinkedList<Long> buffer = new LinkedList<Long>();
    for (int i = 0; i < N; ++i) {
        buffer.add((long) i);
        if (i % 1000000 == 0) {
            System.out.println("added " + i);
        }
    }
} catch (Exception e) {
    ...
}
Each entry in the list is a separate object, with a reference to the previous and next nodes, as well as the current value.
If we assume 8 bytes per reference, and an object overhead of 16 bytes, that means for each entry you've got:
An entry object: 40 bytes (3 references + overhead)
A Long object: 24 bytes (data + overhead)
So after 6000000 entries that would be about 384M... which you should still be okay with. (Depending on your JVM, I'd expect the reference size and per object overhead to be lower, too.)
I wonder whether your MAVEN_OPTS is either being set in the wrong place, or not used for JVM arguments for some reason. I've just tried running this on my Windows box (not as a unit test - just as a main method) and with the JVM default allocation, it fails after 6 million entries for me too. With -Xmx1024M it gets to 25 million entries, which suggests a smaller overhead than I've estimated above. (I'm on a 32-bit VM though.)
That certainly suggests your MAVEN_OPTS isn't doing what you want...
LinkedList is about the most memory-inefficient way to store a series of longs. You have the Long object, which is about 3x larger than a plain long, and you have the LinkedList entry, which is doubly linked, making it about 5x larger than a plain long.
I suggest you use a long[], or a wrapper like TLongArrayList, which uses almost exactly 8 bytes per long for a large collection. (It will be faster too.)
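A hedged sketch of the plain long[] alternative (run with about -Xmx1g; the class name is illustrative):
class LongArrayDemo {
    public static void main(String[] args) {
        // One contiguous ~800 MB block: 8 bytes per long plus a single array
        // header, with no per-element node or boxing overhead.
        long[] buffer = new long[100_000_000];
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = i;
        }
        System.out.println("last = " + buffer[buffer.length - 1]);
    }
}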
Don't forget the memory taken up by LinkedList$Entry.
Is the memory space consumed by one object with 100 attributes the same as that of 100 objects, with one attribute each?
How much memory is allocated for an object?
How much additional space is used when adding an attribute?
Mindprod points out that this is not a straightforward question to answer:
A JVM is free to store data any way it pleases internally, big or little endian, with any amount of padding or overhead, though primitives must behave as if they had the official sizes.
For example, the JVM or native compiler might decide to store a boolean[] in 64-bit long chunks like a BitSet. It does not have to tell you, so long as the program gives the same answers.
It might allocate some temporary Objects on the stack.
It may optimize some variables or method calls totally out of existence replacing them with constants.
It might version methods or loops, i.e. compile two versions of a method, each optimized for a certain situation, then decide up front which one to call.
Then of course the hardware and OS have multilayer caches, on chip-cache, SRAM cache, DRAM cache, ordinary RAM working set and backing store on disk. Your data may be duplicated at every cache level. All this complexity means you can only very roughly predict RAM consumption.
Measurement methods
You can use Instrumentation.getObjectSize() to obtain an estimate of the storage consumed by an object.
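A minimal sketch of wiring that up (the class name is illustrative; the jar needs a Premain-Class manifest entry and must be launched with -javaagent):
import java.lang.instrument.Instrumentation;

public class SizeAgent {
    private static volatile Instrumentation inst;

    // Called by the JVM before main() when started with -javaagent:sizeagent.jar
    public static void premain(String agentArgs, Instrumentation i) {
        inst = i;
    }

    // Shallow size of a single object, as reported by the JVM (not a deep walk)
    public static long sizeOf(Object o) {
        return inst.getObjectSize(o);
    }
}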
To visualize the actual object layout, footprint, and references, you can use the JOL (Java Object Layout) tool.
Object headers and Object references
In a modern 64-bit JDK, an object has a 12-byte header, padded to a multiple of 8 bytes, so the minimum object size is 16 bytes. For 32-bit JVMs, the overhead is 8 bytes, padded to a multiple of 4 bytes. (From Dmitry Spikhalskiy's answer, Jayen's answer, and JavaWorld.)
Typically, references are 4 bytes on 32-bit platforms, and also on 64-bit platforms with heaps up to -Xmx32G; above 32 GB they are 8 bytes. (See compressed object references.)
As a result, a 64-bit JVM would typically require 30-50% more heap space. (Should I use a 32- or a 64-bit JVM?, 2012, JDK 1.7)
Boxed types, arrays, and strings
Boxed wrappers have overhead compared to primitive types (from JavaWorld):
Integer: The 16-byte result is a little worse than I expected because an int value can fit into just 4 extra bytes. Using an Integer costs me a 300 percent memory overhead compared to when I can store the value as a primitive type
Long: 16 bytes also: Clearly, actual object size on the heap is subject to low-level memory alignment done by a particular JVM implementation for a particular CPU type. It looks like a Long is 8 bytes of Object overhead, plus 8 bytes more for the actual long value. In contrast, Integer had an unused 4-byte hole, most likely because the JVM I use forces object alignment on an 8-byte word boundary.
Other containers are costly too:
Multidimensional arrays: these offer another surprise.
Developers commonly employ constructs like int[dim1][dim2] in numerical and scientific computing.
In an int[dim1][dim2] array instance, every nested int[dim2] array is an Object in its own right. Each adds the usual 16-byte array overhead. When I don't need a triangular or ragged array, that represents pure overhead. The impact grows when array dimensions greatly differ.
For example, an int[128][2] instance takes 3,600 bytes. Compared to the 1,040 bytes an int[256] instance uses (which has the same capacity), 3,600 bytes represent a 246 percent overhead. In the extreme case of byte[256][1], the overhead factor is almost 19! Compare that to the C/C++ situation, in which the same syntax does not add any storage overhead.
String: a String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead.
For a nonempty String of size 10 characters or less, the added overhead cost relative to the useful payload (2 bytes for each char plus 4 bytes for the length) ranges from 100 to 400 percent.
Alignment
Consider this example object:
class X {                       // 8 bytes for reference to the class definition
    int a;                      // 4 bytes
    byte b;                     // 1 byte
    Integer c = new Integer(0); // 4 bytes for a reference
}
A naïve sum would suggest that an instance of X would use 17 bytes. However, due to alignment (also called padding), the JVM allocates the memory in multiples of 8 bytes, so instead of 17 bytes it would allocate 24 bytes.
It depends on the CPU architecture and the JDK. For a modern JDK on a 64-bit architecture, an object has a 12-byte header plus padding to a multiple of 8 bytes - so the minimum object size is 16 bytes. You can use a tool called Java Object Layout (JOL) to determine the size and get details about any entity's object layout and internal structure, or guess this information from the class reference. An example of the output for an Integer instance in my environment:
Running 64-bit HotSpot VM.
Using compressed oop with 3-bit shift.
Using compressed klass with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
java.lang.Integer object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 12 (object header) N/A
12 4 int Integer.value N/A
Instance size: 16 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
For Integer, the instance size is 16 bytes: the 4-byte int is placed right after the 12-byte header. And it doesn't need any additional padding, because 16 is already a multiple of 8 (the word size on a 64-bit architecture).
Code sample:
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.util.VMSupport;

public class JolExample {
    public static void main(String[] args) {
        System.out.println(VMSupport.vmDetails());
        System.out.println(ClassLayout.parseClass(Integer.class).toPrintable());
    }
}
If you use maven, to get JOL:
<dependency>
<groupId>org.openjdk.jol</groupId>
<artifactId>jol-core</artifactId>
<version>0.3.2</version>
</dependency>
Each object has a certain overhead for its associated monitor and type information, as well as the fields themselves. Beyond that, fields can be laid out pretty much however the JVM sees fit (I believe) - but as shown in another answer, at least some JVMs will pack fairly tightly. Consider a class like this:
public class SingleByte
{
private byte b;
}
vs
public class OneHundredBytes
{
private byte b00, b01, ..., b99;
}
On a 32-bit JVM, I'd expect 100 instances of SingleByte to take 1200 bytes (8 bytes of overhead + 4 bytes for the field due to padding/alignment). I'd expect one instance of OneHundredBytes to take 108 bytes - the overhead, and then 100 bytes, packed. It can certainly vary by JVM though - one implementation may decide not to pack the fields in OneHundredBytes, leading to it taking 408 bytes (= 8 bytes overhead + 4 * 100 aligned/padded bytes). On a 64 bit JVM the overhead may well be bigger too (not sure).
EDIT: See the comment below; apparently HotSpot pads to 8 byte boundaries instead of 32, so each instance of SingleByte would take 16 bytes.
Either way, the "single large object" will be at least as efficient as multiple small objects - for simple cases like this.
It appears that every object has an overhead of 16 bytes on 32-bit systems (and 24 bytes on 64-bit systems).
http://algs4.cs.princeton.edu/14analysis/ is a good source of information; one example among many good ones there is its table of typical memory requirements (figure omitted here).
http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf is also very informative.
The total used/free memory of a program can be obtained in the program via
java.lang.Runtime.getRuntime();
The runtime has several methods that relate to memory. The following example demonstrates its usage.
public class PerformanceTest {
    private static final long MEGABYTE = 1024L * 1024L;

    public static long bytesToMegabytes(long bytes) {
        return bytes / MEGABYTE;
    }

    public static void main(String[] args) {
        // I assume you will know how to create an object Person yourself...
        List<Person> list = new ArrayList<Person>();
        for (int i = 0; i <= 100_000; i++) {
            list.add(new Person("Jim", "Knopf"));
        }
        // Get the Java runtime
        Runtime runtime = Runtime.getRuntime();
        // Run the garbage collector
        runtime.gc();
        // Calculate the used memory
        long memory = runtime.totalMemory() - runtime.freeMemory();
        System.out.println("Used memory is bytes: " + memory);
        System.out.println("Used memory is megabytes: " + bytesToMegabytes(memory));
    }
}
Is the memory space consumed by one object with 100 attributes the same as that of 100 objects, with one attribute each?
No.
How much memory is allocated for an object?
The overhead is 8 bytes on 32-bit, 12 bytes on 64-bit; and then rounded up to a multiple of 4 bytes (32-bit) or 8 bytes (64-bit).
How much additional space is used when adding an attribute?
Attributes range from 1 byte (byte) to 8 bytes (long/double), but references are either 4 bytes or 8 bytes depending not on whether the JVM is 32-bit or 64-bit, but rather on whether -Xmx is below or above 32 GB: typical 64-bit JVMs have an optimisation called compressed oops (-XX:+UseCompressedOops) which compresses references to 4 bytes if the heap is below 32 GB.
No; each object carries its own bookkeeping overhead, so 100 objects with 1 attribute each will take up more memory.
This is a very broad question.
It depends on the class's variables (you may call them its state) and their memory usage in Java.
There is also some additional memory requirement for headers and references.
The heap memory used by a Java object includes:
memory for primitive fields, according to their size (see below for the sizes of primitive types);
memory for reference fields (4 bytes each);
an object header, consisting of a few bytes of "housekeeping" information.
Java objects also require some "housekeeping" information, such as the object's class, ID, and status flags - whether the object is currently reachable, currently synchronization-locked, and so on.
The Java object header size varies between 32-bit and 64-bit JVMs.
Although these are the main memory consumers, the JVM sometimes also requires additional space, e.g. for alignment padding.
Sizes of primitive types
boolean & byte -- 1
char & short -- 2
int & float -- 4
long & double -- 8
I've gotten very good results from the java.lang.instrument.Instrumentation approach mentioned in another answer. For good examples of its use, see the entry, Instrumentation Memory Counter from the JavaSpecialists' Newsletter and the java.sizeOf library on SourceForge.
In case it's useful to anyone, you can download from my web site a small Java agent for querying the memory usage of an object. It'll let you query "deep" memory usage as well.
No - 100 small objects need more information (memory) than one big one.
The rules about how much memory is consumed depend on the JVM implementation and the CPU architecture (32 bit versus 64 bit for example).
For the detailed rules for the Sun JVM, check my old blog.