What is the memory consumption of an object in Java?

Is the memory space consumed by one object with 100 attributes the same as that of 100 objects, with one attribute each?
How much memory is allocated for an object?
How much additional space is used when adding an attribute?

Mindprod points out that this is not a straightforward question to answer:
A JVM is free to store data any way it pleases internally, big or little endian, with any amount of padding or overhead, though primitives must behave as if they had the official sizes.
For example, the JVM or native compiler might decide to store a boolean[] in 64-bit long chunks like a BitSet. It does not have to tell you, so long as the program gives the same answers.
It might allocate some temporary Objects on the stack.
It may optimize some variables or method calls totally out of existence replacing them with constants.
It might version methods or loops, i.e. compile two versions of a method, each optimized for a certain situation, then decide up front which one to call.
Then of course the hardware and OS have multilayer caches: on-chip cache, SRAM cache, DRAM cache, the ordinary RAM working set, and backing store on disk. Your data may be duplicated at every cache level. All this complexity means you can only very roughly predict RAM consumption.
Measurement methods
You can use Instrumentation.getObjectSize() to obtain an estimate of the storage consumed by an object.
To visualize the actual object layout, footprint, and references, you can use the JOL (Java Object Layout) tool.
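For the Instrumentation route, here is a rough sketch of a measuring agent. The class name is illustrative; the jar needs a Premain-Class manifest entry and must be loaded with -javaagent.
import java.lang.instrument.Instrumentation;

// Illustrative agent class: register it via a Premain-Class manifest entry
// and start the JVM with -javaagent:<path-to-agent-jar>
public class ObjectSizeAgent {
    private static volatile Instrumentation instrumentation;

    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Shallow size estimate of a single object, in bytes (referenced objects not included).
    public static long sizeOf(Object o) {
        return instrumentation.getObjectSize(o);
    }
}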
Object headers and Object references
In a modern 64-bit JDK, an object has a 12-byte header, padded to a multiple of 8 bytes, so the minimum object size is 16 bytes. For 32-bit JVMs, the overhead is 8 bytes, padded to a multiple of 4 bytes. (From Dmitry Spikhalskiy's answer, Jayen's answer, and JavaWorld.)
Typically, references are 4 bytes on 32-bit platforms, and also on 64-bit platforms up to -Xmx32G; above 32 GB they are 8 bytes. (See compressed object references.)
As a result, a 64-bit JVM would typically require 30-50% more heap space. (Should I use a 32- or a 64-bit JVM?, 2012, JDK 1.7)
Boxed types, arrays, and strings
Boxed wrappers have overhead compared to primitive types (from JavaWorld):
Integer: The 16-byte result is a little worse than I expected because an int value can fit into just 4 extra bytes. Using an Integer costs me a 300 percent memory overhead compared to when I can store the value as a primitive type.
Long: 16 bytes also: Clearly, actual object size on the heap is subject to low-level memory alignment done by a particular JVM implementation for a particular CPU type. It looks like a Long is 8 bytes of Object overhead, plus 8 bytes more for the actual long value. In contrast, Integer had an unused 4-byte hole, most likely because the JVM I use forces object alignment on an 8-byte word boundary.
Other containers are costly too:
Multidimensional arrays: they offer another surprise.
Developers commonly employ constructs like int[dim1][dim2] in numerical and scientific computing.
In an int[dim1][dim2] array instance, every nested int[dim2] array is an Object in its own right. Each adds the usual 16-byte array overhead. When I don't need a triangular or ragged array, that represents pure overhead. The impact grows when array dimensions greatly differ.
For example, an int[128][2] instance takes 3,600 bytes. Compared to the 1,040 bytes an int[256] instance uses (which has the same capacity), 3,600 bytes represent a 246 percent overhead. In the extreme case of byte[256][1], the overhead factor is almost 19! Compare that to the C/C++ situation, in which the same syntax does not add any storage overhead.
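If you want to reproduce figures like these on your own JVM, one option is JOL's GraphLayout, which walks the object graph and reports the total footprint. This is only a sketch, assuming a recent jol-core dependency like the one shown further down; exact numbers will vary by JVM and heap settings.
import org.openjdk.jol.info.GraphLayout;

public class ArrayOverheadDemo {
    public static void main(String[] args) {
        // Same capacity (256 ints), very different footprint: the 2-D array
        // allocates 128 separate int[2] objects, each with its own header and length field.
        int[][] nested = new int[128][2];
        int[] flat = new int[256];

        System.out.println("int[128][2]: " + GraphLayout.parseInstance((Object) nested).totalSize() + " bytes");
        System.out.println("int[256]:    " + GraphLayout.parseInstance((Object) flat).totalSize() + " bytes");
    }
}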
String: a String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead.
For a nonempty String of 10 characters or less, the added overhead cost relative to the useful payload (2 bytes for each char plus 4 bytes for the length) ranges from 100 to 400 percent.
Alignment
Consider this example object:
class X {                       // 8 bytes for the reference to the class definition
    int a;                      // 4 bytes
    byte b;                     // 1 byte
    Integer c = new Integer(0); // 4 bytes for a reference
}
A naïve sum would suggest that an instance of X would use 17 bytes. However, due to alignment (also called padding), the JVM allocates the memory in multiples of 8 bytes, so instead of 17 bytes it would allocate 24 bytes.

It depends on the CPU architecture and the JDK. For a modern JDK on a 64-bit architecture, an object has a 12-byte header, padded to a multiple of 8 bytes, so the minimum object size is 16 bytes. You can use a tool called Java Object Layout (JOL) to determine the size and get details about the object layout and internal structure of any entity, or estimate this information from a class reference. Example output for an Integer instance in my environment:
Running 64-bit HotSpot VM.
Using compressed oop with 3-bit shift.
Using compressed klass with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
java.lang.Integer object internals:
OFFSET  SIZE  TYPE  DESCRIPTION      VALUE
     0    12        (object header)  N/A
    12     4  int   Integer.value    N/A
Instance size: 16 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
For Integer, the instance size is 16 bytes: the 4-byte int value is placed right after the 12-byte header, and it doesn't need any additional padding, because 16 is already a multiple of 8 (the object alignment on a 64-bit architecture).
Code sample:
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.util.VMSupport;

// Wrapper class added so the sample compiles as-is
public class JolExample {
    public static void main(String[] args) {
        System.out.println(VMSupport.vmDetails());
        System.out.println(ClassLayout.parseClass(Integer.class).toPrintable());
    }
}
If you use Maven, add this dependency to get JOL:
<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.3.2</version>
</dependency>

Each object has a certain overhead for its associated monitor and type information, as well as the fields themselves. Beyond that, fields can be laid out pretty much however the JVM sees fit (I believe) - but as shown in another answer, at least some JVMs will pack fairly tightly. Consider a class like this:
public class SingleByte
{
    private byte b;
}
vs
public class OneHundredBytes
{
    private byte b00, b01, ..., b99;
}
On a 32-bit JVM, I'd expect 100 instances of SingleByte to take 1200 bytes (8 bytes of overhead + 4 bytes for the field due to padding/alignment). I'd expect one instance of OneHundredBytes to take 108 bytes - the overhead, and then 100 bytes, packed. It can certainly vary by JVM though - one implementation may decide not to pack the fields in OneHundredBytes, leading to it taking 408 bytes (= 8 bytes overhead + 4 * 100 aligned/padded bytes). On a 64 bit JVM the overhead may well be bigger too (not sure).
EDIT: See the comment below; apparently HotSpot pads to 8-byte boundaries rather than 4-byte (32-bit) ones, so each instance of SingleByte would take 16 bytes.
Either way, the "single large object" will be at least as efficient as multiple small objects - for simple cases like this.
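To see what your own JVM actually does with those fields, you can print the layout with JOL (mentioned above). A minimal sketch, with a trimmed-down four-field class standing in for OneHundredBytes:
import org.openjdk.jol.info.ClassLayout;

public class PackingCheck {
    static class SingleByte {
        private byte b;
    }

    // Stand-in for OneHundredBytes: enough fields to see whether byte
    // fields are packed back to back or padded individually.
    static class FourBytes {
        private byte b0, b1, b2, b3;
    }

    public static void main(String[] args) {
        System.out.println(ClassLayout.parseClass(SingleByte.class).toPrintable());
        System.out.println(ClassLayout.parseClass(FourBytes.class).toPrintable());
    }
}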

It appears that every object has an overhead of 16 bytes on 32-bit systems (and 24 bytes on 64-bit systems).
http://algs4.cs.princeton.edu/14analysis/ is a good source of information, as is http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf.

The total used / free memory of a program can be obtained in the program via
java.lang.Runtime.getRuntime();
The Runtime object has several methods that relate to memory. The following code example demonstrates their use.
import java.util.ArrayList;
import java.util.List;

public class PerformanceTest {

    private static final long MEGABYTE = 1024L * 1024L;

    public static long bytesToMegabytes(long bytes) {
        return bytes / MEGABYTE;
    }

    public static void main(String[] args) {
        // I assume you will know how to create an object Person yourself...
        List<Person> list = new ArrayList<Person>();
        for (int i = 0; i <= 100_000; i++) {
            list.add(new Person("Jim", "Knopf"));
        }
        // Get the Java runtime
        Runtime runtime = Runtime.getRuntime();
        // Run the garbage collector
        runtime.gc();
        // Calculate the used memory
        long memory = runtime.totalMemory() - runtime.freeMemory();
        System.out.println("Used memory is bytes: " + memory);
        System.out.println("Used memory is megabytes: " + bytesToMegabytes(memory));
    }
}

Is the memory space consumed by one object with 100 attributes the same as that of 100 objects, with one attribute each?
No.
How much memory is allocated for an object?
The overhead is 8 bytes on 32-bit, 12 bytes on 64-bit; and then rounded up to a multiple of 4 bytes (32-bit) or 8 bytes (64-bit).
How much additional space is used when adding an attribute?
Attributes range from 1 byte (byte) to 8 bytes (long/double). References are either 4 or 8 bytes, depending not on whether the JVM is 32-bit or 64-bit, but on whether -Xmx is below or above 32 GB: typical 64-bit JVMs have an optimisation, -XX:+UseCompressedOops, which compresses references to 4 bytes if the heap is below 32 GB.
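On HotSpot you can check at runtime whether that optimisation is active. This is a sketch using the HotSpot-specific diagnostic MXBean, so it will not work on every JVM:
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class CompressedOopsCheck {
    public static void main(String[] args) {
        // HotSpot-specific: reports whether compressed references are in use.
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        System.out.println(hotspot.getVMOption("UseCompressedOops"));
    }
}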

No, each object carries its own overhead, so 100 objects with one attribute each will take up more memory.

This is a very broad question.
It depends on the class's variables (what you might call its state) and their memory usage in Java.
There is also some additional memory requirement for headers and references.
The heap memory used by a Java object includes:
memory for primitive fields, according to their size (see below for the sizes of primitive types);
memory for reference fields (4 bytes each);
an object header, consisting of a few bytes of "housekeeping" information.
Objects in Java also require some "housekeeping" information, such as the object's class, ID, and status flags, e.g. whether the object is currently reachable or currently synchronization-locked.
The Java object header size varies between 32-bit and 64-bit JVMs.
Although these are the main memory consumers, the JVM sometimes requires additional space as well, e.g. for code alignment.
Sizes of primitive types (in bytes)
boolean & byte -- 1
char & short -- 2
int & float -- 4
long & double -- 8

I've gotten very good results from the java.lang.instrument.Instrumentation approach mentioned in another answer. For good examples of its use, see the entry, Instrumentation Memory Counter from the JavaSpecialists' Newsletter and the java.sizeOf library on SourceForge.

In case it's useful to anyone, you can download from my web site a small Java agent for querying the memory usage of an object. It'll let you query "deep" memory usage as well.

No, 100 small objects need more information (memory) than one big one.

The rules about how much memory is consumed depend on the JVM implementation and the CPU architecture (32 bit versus 64 bit for example).
For the detailed rules for the Sun JVM, check my old blog.

Related

Java hashmap with single byte[] array for all keys

I have a rather large dataset consisting of 2.3 GB of data spread over 160 million byte[] arrays, with an average data length of 15 bytes. The value for each byte[] key is only an int, so memory usage of nearly half the hashmap (which is over 6 GB) is made up of the 16-byte overhead of each byte array:
overhead = 8-byte header + 4-byte length, rounded up by the VM to 16 bytes.
So my overhead is 2.5 GB.
Does anybody know of a hashmap implementation that stores its (variable length) byte[] keys in one single large byte array so there would be no overhead (apart from 1 byte length field)?
I would rather not use an in-memory DB, as they usually have a performance overhead compared to the plain Trove TObjectIntHashMap that I'm using, and I value CPU cycles even more than memory usage.
Since most personal computers have 16GB these days and servers often 32 - 128 GB or more, is there a real problem with there being a degree of bookkeeping overhead?
If we consider the alternative -- the byte data concatenated into a single large array -- we should think about what an individual value would have to look like in order to reference a slice of that larger array.
Most generally you would start with:
public class ByteSlice {
    protected byte[] array;
    protected int offset;
    protected int len;
}
However, that's 8 bytes + the size of a pointer (perhaps just 4 bytes?) + the JVM object header (12 bytes on a 64-bit JVM). So perhaps total 24 bytes.
If we try and make this single-purpose & minimalist, we're still going to need 4 bytes for the offset.
public class DedicatedByteSlice {
    protected int offset;
    protected byte len;

    protected static byte[] getArray() {
        // somebody else knows about the array
        return null; // placeholder
    }
}
This is still 5 bytes (probably padded to 8) + the JVM object header. Probably still total 20 bytes.
It seems that the cost of dereferencing with an offset & length, and having an object to track that, are not substantially less than the cost of storing the small array directly.
One further theoretical possibility -- de-structuring the Map Key so it is not an object
It is possible to conceive of destructuring the "length & offset" data so that it is no longer in an object. It would then be passed as a set of scalar parameters, e.g. (length, offset), and -- in a hashmap implementation -- would be stored by means of arrays of the separate components (e.g. instead of a single Object[] keyArray).
However I expect it is extremely unlikely that any library offers an existing hashmap implementation for your (very particular) usecase.
If you were talking about the values it would probably be pointless, since Java does not offer multiple returns or method OUT parameters; which makes communication impractical without "boxing" destructured data back to an object. Since here you are asking about Map Keys specifically, and these are passed as parameters but need not be returned, such an approach may be theoretically possible to consider.
[Extended]
Even given this it becomes tricky -- for your usecase the map API probably has to become asymmetrical for population vs lookup, as population has to be by (offset, len) to define keys; whereas practical lookup is likely still by concrete byte[] arrays.
OTOH: Even quite old laptops have 16GB now. And your time to write this (times 4-10 to maintain) should be worth far more than the small cost of extra RAM.

size of java byte[] in memory

So I created a very large array of byte[], something like byte[50,000,000][16]. According to my math that's 800,000,000 bytes, which is 0.8 GB plus some overhead.
To my surprise, when I do
memoryBefore = runtime.totalMemory() - runtime.freeMemory()
it's using 1.8 GB of memory. When I point a profiler at it I get this:
https://www.dropbox.com/s/el9vlav5ylg781z/Screen%20Shot%202015-08-30%20at%202.55.00%20pm.png?dl=0
I can see most byte[] are 24 bytes, not 16 as expected, and I see quite a few much larger byte[] of size 472 or more. Does anyone know what's going on here?
All objects have an overhead for maintaining the object: information like its "type", a.k.a. Class. See What is the memory consumption of an object in Java?.
Arrays also have a length, and the overhead might be bigger in 64-bit Java.
Since you are allocating 50,000,000 arrays of 16 bytes, you get 50_000_000 * (16 + overhead) + (overhead + 50_000_000 * pointerSize); the second part is the outer array of arrays.
Depending on your requirements, you might be able to improve this in one of two ways:
Flip the indices of your 2-dimensional array to byte[16][50_000_000]. This reduces the number of array objects from 50,000,001 to 17 and shrinks the outer array.
Use a single array byte[16 * 50_000_000] and do the offset logic yourself (see the sketch below). This will keep your 16 bytes contiguous and eliminate all per-row overhead.
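A minimal sketch of the second option; the record length of 16 and the class and method names are only illustrative:
public class PackedRecords {
    private static final int RECORD_LEN = 16;   // bytes per logical record
    private final byte[] data;

    public PackedRecords(int recordCount) {
        // One contiguous allocation: no per-record array header or padding.
        data = new byte[recordCount * RECORD_LEN];
    }

    // Copy the 16-byte record at the given index into dest.
    public void get(int index, byte[] dest) {
        System.arraycopy(data, index * RECORD_LEN, dest, 0, RECORD_LEN);
    }

    // Overwrite the 16-byte record at the given index with src.
    public void set(int index, byte[] src) {
        System.arraycopy(src, 0, data, index * RECORD_LEN, RECORD_LEN);
    }
}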
I can see most byte[] are 24 bytes, not 16
An object in Java has a couple of words of header information, in addition to the space that holds the object's fields. In the case of an array, there is an additional word to hold the array's length field. The length actually needs 4 bytes ('cos length is an int), but the JVM is most likely aligning to an 8 byte boundary on your platform.
If you are seeing 16 byte arrays occupying 24 bytes, then that accounting of space is most likely including (just) the length word.
(Note that the actual space occupied by object / array headers is JVM and platform specific.)
... I see quite a few much larger byte[] of size 472 or more.
Those are unrelated. If some other part of your code is not creating them explicitly, they are most likely created by Java runtime libraries.

Does using small datatypes reduce memory usage (From memory allocation not efficiency)?

Actually, my question is very similar to this one, but that post focuses on C# only. Recently I read an article that said Java will 'promote' some short types (like short) to 4 bytes in memory even if some bits are not used, so it can't reduce usage. (Is that true?)
So my question is how languages, especially C, C++, and Java (as Manish discussed Java in this post), handle memory allocation of small datatypes. References or any approaches to figure this out are preferred.
C/C++ uses only the specified amount of memory but aligns the data (by default) to an address that is a multiple of some value, typically 4 bytes for 32-bit applications or 8 bytes for 64-bit.
So for example if the data is aligned on a 4 or 8 byte boundary then a "char" uses only one byte. An array of 5 chars will use 5 bytes. But the data item that is allocated after the 5 byte char array is placed at an address that skips 3 bytes to keep it correctly aligned.
This is for performance on most processors. There are usually pragmas like "pack" and "align" that can be used to change the alignment or disable it.
In C and C++, different approaches may be taken depending on how you've requested the memory.
For T* p = (T*)malloc(n * sizeof(T)); or T* p = new T[n]; then the data will occupy sizeof(T)*n bytes of memory, so if sizeof(T) is reduced (e.g. to int16_t instead of int32_t) then that space is reduced accordingly. That said, heap allocations tend to have some overheads, so few large allocations are better than a great many allocations for individual data items or very small arrays, where the overheads may be much more significant than small differences in sizeof(T).
For structures, static and stack usage, padding is more significant than for large arrays, as the following data item might be of a different type with different alignment requirements, resulting in more padding.
At the other extreme, you can apply bitfields to effectively pack values into the minimum number of bits they need - very dense compression indeed, though you need to rely on compiler pragmas/attributes if you want explicit control - the Standard leaves unspecified when a bitfield might start in a new memory "word" (e.g. a 32-bit memory word for a 32-bit process, 64 for 64) or wrap across separate words, and where in the word the bits hold data vs. padding. Data types like C++ bitsets and vector<bool> may be more efficient than arrays of bool (which may well use an int for each element, but that's unspecified in the C++03 Standard).

Why is Java's String memory usage said to be high?

On this blog post, it's said that the minimum memory usage of a String is:
8 * (int) ((((no chars) * 2) + 45) / 8) bytes.
So for the String "Apple Computers", the minimum memory usage would be 72 bytes.
Even if I have 10,000 String objects of twice that length, the memory usage would be less than 2Mb, which isn't much at all. So does that mean I'm underestimating the amount of Strings present in an enterprise application, or is that formula wrong?
String storage in Java depends on how the string was obtained. The backing char array can be shared between multiple instances. If that isn't the case, you have the usual object overhead plus storage for one pointer and three ints which usually comes out to 16 bytes overhead. Then the backing array requires 2 bytes per char since chars are UTF-16 code units.
For "Apple Computers" where the backing array is not shared, the minimum cost is going to be
backing array for 16 chars -- 32B which aligns nicely on a word boundary.
pointer to array - 4 or 8B depending on the platform
three ints for the offset, length, and memoized hashcode - 12B
2 x object overhead - depends on the VM, but 8B is a good rule of thumb.
one int for the array length.
So roughly 72B of which the actual payload constitutes 44.4%. The payload constitutes more for longer strings.
In Java 7, some JDK implementations are doing away with backing-array sharing to avoid pinning large char[]s in memory. That allows them to do away with two of the three ints.
That changes the calculation to 64B for a string of length 16 of which the actual payload constitutes 50%.
Is it possible to save character data using less memory than a Java String? Yes.
Does it matter for "enterprise" applications (or even Android or J2ME applications, which have to get by on a lot less memory)? Almost never.
Premature optimization is the root...
Compared to the other data types that you have, it is definitely high. The other primitives use 32 bits, 64 bits, etc.
And given that String is immutable, every time you perform any operation on it, you end up creating a new String object, consuming even more memory.

Is an array of one boolean in Java smaller than a standalone variable?

My searches on SO have failed me, so if this is a duplicate, please redirect me.
With that out of the way, my question: I learned, from experience and browsing SO, that a Java boolean is stored as a 32-bit int if you declare it as a standalone value, but as an 8-bit byte if you declare it within an array. My question is as follows: which is more memory efficient? Does the metadata of the array make it bigger in memory than the alternative?
boolean oneVariable = false, oneArray[] = {false};
The array is an actual Object that comes with a memory penalty (I believe 12 bytes), so the primitive boolean is smaller.
The "meta data" of the array includes:
8 bytes (32-bit JVM) or 16 bytes (64-bit JVM) for object header
4 bytes (32 bits) for the length of the array
Add on the 1 necessary byte for the boolean data and you have 13 bytes (32 bit) or 21 bytes (64 bit) at a minimum.
However, objects are allocated memory in 8-byte multiples, so even though you only need 12 or 20 bytes of overhead + 1 byte for the boolean, you'll end up using 16 or 24 bytes of memory, respectively, for your array object.
In addition to the 16/24 bytes the object itself will take up, you'll need 4 bytes (32 bit) or 8 bytes (64 bit) for the memory address of the object, totaling 20 or 32 bytes of memory, respectively, to store your boolean in an array.
The size of a standalone variable is JVM dependent. Java does not specify the size of storage, and in fact Oracle says
This data type represents one bit of information, but its "size" isn't something that's precisely defined.
Older JVMs use a 32-bit stack cell to hold local variables, method arguments, and expression values, so a single boolean used as a variable would consume 4 bytes, making the array at least 5 times as expensive as a single boolean. This answer may be different if, for example, the boolean is a class variable, in which case it would just be a single byte added to the existing overhead. In newer JVMs a single boolean would only use 1 byte, but depending on its context and the 8-byte padding necessary to align memory addresses, it could still consume up to 8 bytes of heap space. It would still be smaller than the boolean array.
As user949300 mentioned, all objects carry a penalty that make them larger than primitives. For only a single boolean though, memory doesn't really matter. If you are storing a large number of booleans, consider using a BitSet. I believe under the hood it uses approximately 1 bit per boolean (plus some overhead).
This Java specialist article is a good source for understanding the memory usage.
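As a rough sketch of the BitSet suggestion above:
import java.util.BitSet;

public class FlagsDemo {
    public static void main(String[] args) {
        int n = 1_000_000;

        boolean[] asArray = new boolean[n];  // roughly 1 byte per flag, plus the array header
        BitSet asBits = new BitSet(n);       // roughly 1 bit per flag, plus a small object and a long[] header

        asArray[42] = true;
        asBits.set(42);

        System.out.println(asArray[42] + " " + asBits.get(42));
    }
}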
