Memory allocation: How much space does a reference occupy in Java?


In Java, we have written this code:
A a1;
a1 = new A();
How many bytes of memory are reserved when the compiler compiles the code:
A a1;

That's not specified by the Java standard and thus you should not worry about it.
Technically, references are usually as big as the machine's word size, i.e. 32 bits on a 32-bit machine and 64 bits on a 64-bit machine, though some 64-bit JVMs use a technique called compressed references ("compressed oops") to allow 32-bit references.
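On a HotSpot-based JVM you can ask the running VM whether compressed references are switched on. The sketch below assumes the `com.sun.management.HotSpotDiagnosticMXBean` interface is available (it is HotSpot-specific, not part of the Java standard) and falls back to "unknown" elsewhere; `CompressedOopsCheck` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Hedged sketch: reports the HotSpot "UseCompressedOops" flag if this JVM
// exposes it. Non-HotSpot or 32-bit JVMs may not know the option.
final class CompressedOopsCheck {
    static String compressedOops() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unknown"; // management interface not implemented here
            }
            return bean.getVMOption("UseCompressedOops").getValue(); // "true" or "false"
        } catch (IllegalArgumentException | LinkageError e) {
            // Option not recognized, or com.sun.management classes missing.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + compressedOops());
    }
}
```

Running it with `-XX:-UseCompressedOops` on a 64-bit HotSpot JVM should flip the answer to "false", which is one way to see that reference size is a VM choice rather than a language rule.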

One pointer's worth of memory is used on the stack. That should be 32 bits (4 bytes) unless your machine is in 64-bit mode.
Edit:
I see that some people are confused and think that the A object itself is allocated on the stack. That is not the case in Java: all objects are allocated on the heap (modulo JIT optimizations, of course). The line A a1; simply reserves room for the reference a1, which starts out unassigned: Java's definite-assignment rules stop you from reading it before it is assigned (only fields default to null). The reference itself is on the stack, though of course what it points to will be on the heap. The later call to new A() will allocate an A object on the heap, and the size of that allocation does depend on what's in A.
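A small illustration of the difference between a reference field and a local reference, using hypothetical classes `A`, `Holder` and `ReferenceDemo`: the field is default-initialized to null, while the local `a1` has no usable value until it is assigned, and the commented-out line is rejected by javac.

```java
// Hedged illustration: class names are hypothetical.
class A {
    int value = 42;
}

class Holder {
    A field; // instance field: default-initialized to null
}

final class ReferenceDemo {
    static boolean fieldStartsNull() {
        return new Holder().field == null;
    }

    static int localAfterAssignment() {
        A a1;              // declared, but definitely unassigned
        // System.out.println(a1); // compile error: variable a1 might not have been initialized
        a1 = new A();      // the A object is created here (conceptually on the heap)
        return a1.value;   // a1 now holds a reference to that object
    }
}
```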

That depends on the platform and the implementation. On a 32-bit platform, a 4-byte pointer is used behind the scenes to refer to object instances, regardless of the size of class A.
Edit:
The Java compiler does not reserve any memory for this, that's the runtime's (to be exact, the JIT's) responsibility.

A variable reference is a handle to an object on the heap, so it will take up a fixed amount of space (depending on the JVM implementation). However, for that line alone, the compiler may not reserve anything, since the variable has not been initialized yet. Initialization is checked statically by the compiler, so it knows when it needs the variable and may in fact allocate it only when it is first assigned.
If you had a method:
public static void method() {
A a1;
}
I would expect the compiler to optimize it out completely, as it can't do anything with it.
All that being said, in Java programming, you just don't worry about these things, they are determined by the JVM implementation and Java is not suitable for byte-level memory concerns. If you are counting bytes like that, you should be using C or some similarly close-to-the-metal language.

Was your question "How much space does a reference occupy in Java?"
If so, I'm not sure, sorry.
A a1;
All the above does is define a local variable on the execution stack so no heap memory is reserved.

Enough to store a reference to any A! :-)
Note that it's generally impossible to know exactly how many bytes a particular implementation will actually use for a particular allocation, even in low-level languages like C: malloc() itself is a function which obviously needs to maintain internal data structures. To avoid fragmentation, it usually allocates a 2^n-sized block of memory. And so on.
If you're concerned about how much memory is actually used, write a sample program, and run it through your profiler.

As has been mentioned, it will use either 32-bits or 64-bits, however if the reference is only placed in a register, it might not use any memory.

A reference variable occupies a number of bytes that depends solely on the architecture of the JVM (8 bytes for 64 bits and 4 for 32 bits).
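The 4-versus-8-byte difference adds up quickly in an array of references. The arithmetic below is a back-of-the-envelope sketch only: the 16-byte array header and 8-byte alignment are assumptions typical of a 64-bit HotSpot JVM, not values fixed by any specification, and the result counts only the references, not the objects they point to. `RefArrayEstimate` is a hypothetical helper.

```java
// Rough estimate of the memory taken by an array of n references.
// ASSUMED_ARRAY_HEADER and ASSUMED_ALIGNMENT are assumptions, not spec values.
final class RefArrayEstimate {
    static final long ASSUMED_ARRAY_HEADER = 16; // hypothetical header size in bytes
    static final long ASSUMED_ALIGNMENT = 8;     // hypothetical object alignment

    /** Estimated bytes for an array holding n references of refBytes each. */
    static long estimate(long n, long refBytes) {
        long raw = ASSUMED_ARRAY_HEADER + n * refBytes;
        // Round up to the next multiple of the alignment.
        return (raw + ASSUMED_ALIGNMENT - 1) / ASSUMED_ALIGNMENT * ASSUMED_ALIGNMENT;
    }
}
```

Under these assumptions, three 4-byte references need 16 + 12 = 28 bytes, rounded up to 32, while three 8-byte references need exactly 40.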

A a1; allocates on the stack, not the heap.
However, this is all up to implementation, and is not actually defined, as far as I know.

Even the amount of memory on the stack will depend on what is contained/defined in A.

null does not occupy any space in memory.
Simply put, an int occupies some bytes, just as a float occupies some space in memory.
But null occupies no space in memory.
Let me know if my answer is wrong.
Try a method like system.memorrysize() in Java.

Related

When is a Java variable created in memory?

I am wondering when Java creates variables at runtime (when a function is called). Please see the examples below and tell me whether they are one and the same.
void someFunction(boolean test) {
if (!test) {
return;
}
int[] myVar = new int[1000000];
// ...
}
void someFunction(boolean test) {
int[] myVar = new int[1000000];
if (!test) {
return;
}
// ...
}
I don't want to spend time allocating memory only for it to be deallocated moments later, so I need to know whether Java allocates the memory needed for a certain variable (or array) at the very beginning of the function, regardless of where the declaration happens, or only when execution reaches the point of declaration.
EDIT:
I'm terribly sorry for the confusion I'm causing. When I say variable, I mean object.

Probably at the point of method entry. It is a common compiler optimization to allocate a stack frame large enough to contain all locals. If that's so, it's pretty much a single subtraction to allocate space for them all. But you'd have to examine the bytecode to be sure.
However, in this:
int[] myVar = new int[1000000];
the 'variable' is a single reference, taking up 4 or 8 bytes. The object to which the variable refers is allocated at the point the initialization is encountered in execution, by the execution of the 'new' operator.
I suspect you need to read up on the distinction between variables and objects.
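To see that the array object only comes into existence when execution reaches `new`, the sketch below routes the allocation through a hypothetical counting helper (`allocate` and `allocationCount` are illustration-only names). It shows that the early-return path never runs the `new int[...]`; it says nothing about whether the JVM reserves the frame slot for `myVar` at method entry, which is the separate question discussed above.

```java
// Sketch: the array is created only if execution reaches the allocation.
final class LazyAllocationDemo {
    static int allocationCount = 0; // hypothetical counter for illustration

    static int[] allocate(int size) {
        allocationCount++;
        return new int[size];
    }

    // Mirrors the first version in the question: return before allocating.
    static void someFunction(boolean test) {
        if (!test) {
            return; // no array is ever created on this path
        }
        int[] myVar = allocate(1_000_000);
        myVar[0] = 1; // use the array so the example does something
    }
}
```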

In general, the Java compiler is a bit dumb in its compilation since it leaves the optimization up to the runtime JVM. As such, the compilation is mostly a straightforward translation of source code into bytecode.
https://godbolt.org/z/5xT3KchrW
In this case with the OpenJDK 17.0.0 compiler, the allocation of the array is done roughly at the same point in the bytecode as where the source code indicates.
However, the local variable holding the reference to the array is "allocated" at the time the method is called, as part of setting up its frame (a local variable slot, or a register). While the JVM uses a stack frame in a way very similar to C/C++, Android's Dalvik instead uses registers to hold local variables, so they aren't ever actually "allocated" in memory at all (a design decision due to Dalvik's focus on ARM, with its plentiful registers, rather than x86, which is infamous for its lack of registers).

Is an array in Java a virtually sequential memory data structure, or a physically sequential one?

I was trying to find the difference between the primitive Java 'array' and the 'List' data structure (like ArrayList), and found articles and Q&As like this one (Difference between List and Array). Many articles, including that link, point out that a Java primitive 'array' is 'sequential memory'. On this point, what exactly does sequential mean? Is it really sequential in physical memory, or sequential in virtual memory? My guess is sequential in virtual memory, because the OS assigns physical memory in general and the application (JVM) doesn't care about specific memory allocation. But I do not know the exact answer.

A Java array is sequential in virtual memory not necessarily in physical memory.
A user-space application (such as a JVM) has no say over whether the physical pages that make up its virtual address space are contiguous in memory. And in fact, it has no way of even knowing this in typical modern operating systems. This is all hidden from a user-space application by the machine's virtual memory hardware and (user-space) instruction set architecture.
Looking at the JVM spec is not going to be instructive on the physical memory issue. It is simply not relevant / out of scope.
The JVM spec doesn't mandate that arrays are contiguous in virtual memory. However, (hypothetical) array implementations that involved non-contiguous virtual memory would lead to expensive array operations, so you are unlikely to find a mainstream JVM that does this.
References:
JVM Spec 2.7 says:
"The Java Virtual Machine does not mandate any particular internal structure for objects."
Other parts of the spec imply that "objects" refers here to BOTH instances of classes AND arrays.
JVM Spec 2.4 talks about arrays, but it doesn't mention how they are represented in memory.
The difference between arrays and ArrayLists are at a higher level. Arrays have a fixed size. ArrayLists have a variable size. But under the hood, an ArrayList is implemented using a (single) array ... which can be reallocated (i.e. replaced) if the list grows too big.
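The "single array which can be replaced" idea can be sketched as a tiny growable list of ints. This is not the real `java.util.ArrayList` source: `IntArrayList` is a hypothetical class, and doubling the capacity is an assumption chosen for simplicity (ArrayList uses its own growth policy).

```java
import java.util.Arrays;

// Minimal sketch of an array-backed list. Doubling is an assumed growth factor.
final class IntArrayList {
    private int[] elements = new int[2];
    private int size = 0;

    void add(int value) {
        if (size == elements.length) {
            // Full: allocate a bigger array and copy the old contents over.
            // The old array becomes garbage once nothing refers to it.
            elements = Arrays.copyOf(elements, elements.length * 2);
        }
        elements[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        return elements[index];
    }

    int size() { return size; }

    int capacity() { return elements.length; }
}
```

Adding five elements grows the backing array from 2 to 4 to 8 slots; each growth step is a fresh contiguous array, which is why the list's elements stay sequential even though the list itself can change size.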

You would have to look at the JVM specs to see whether any such requirement is made (whether arrays need to be sequential memory or not), but for efficiency purposes it makes sense that an array would be allocated in a malloc-style way, as one block.
As for virtual vs. physical, everything (above the OS) works with virtual memory. The JVM isn't low level enough to have access to something the kernel does at Ring-0.
And lastly, why are you interested, are you writing your own JVM?

JVM gets virtual sequential memory from OS. Only at OS level it is possible to assign physical memory sequentially.
Also, it's important not to confuse sequential memory allocation with sequential access. Sequential access means that a group of elements is accessed in a predetermined, ordered sequence. A data structure is said to have sequential access if one can only visit the values it contains in one particular order. The canonical example is the linked list.
Sequential memory, by contrast, means allocating a contiguous block of memory (not necessarily physically contiguous, but virtually contiguous).
Besides the link you posted, some major differences between Array and ArrayList are:
Array is fixed in size; ArrayList is dynamic in size.
Array can store primitives; ArrayList can only store Objects (wrapper types for primitives).
You can use generics with ArrayList.
You use the add() method to insert an element into an ArrayList, while you simply use the assignment operator to store an element into an Array.
References: Java67 article, Wikipedia
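The differences listed above can be seen in a few lines. `ArrayVsArrayList` is a hypothetical class name; the calls are standard `java.util` APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Small demo of the listed differences between arrays and ArrayList.
final class ArrayVsArrayList {
    static int[] fixedArray() {
        int[] primitives = new int[3]; // fixed size, stores int values directly
        primitives[0] = 10;            // store with the assignment operator
        return primitives;             // unset slots hold the default 0
    }

    static List<Integer> growableList() {
        List<Integer> boxed = new ArrayList<>(); // generic type argument
        boxed.add(10);                           // autoboxed to Integer
        boxed.add(20);                           // grows as needed
        return boxed;
    }
}
```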

This article might be interesting and help explain your question.
Arrays are also objects in Java, so what an object looks like in memory also applies to an array.
To summarise:
class A {
int x;
int y;
}
public void m1() {
int i = 0;
m2();
}
public void m2() {
A a = new A();
}
When m1 is invoked, a new frame (Frame-1) is pushed onto the stack, and the local variable i is created in Frame-1.
Then m2 is invoked inside m1, and another new frame (Frame-2) is pushed onto the stack. In m2, an object of class A is created on the heap, and the reference variable a is put in Frame-2.
Physical memory locations are out of your hands and will be assigned by the OS
http://www.programcreek.com/2013/04/what-does-a-java-array-look-like-in-memory/

What does the compiler actually do when we declare a variable?

Let's take an example.
public class test {
public static void main(String args[]) {
int a=5,b=4;
int c=a+b;
int d=9;
System.out.println("ANSWER PLEASE..");
}
}
Now, when we execute this code, what does the OS do?
Does it first create a variable named a and allocate a memory address for it, and do similar things for b and c?
Now, what happens to d? Does the OS create a new memory address, or does it just refer to the address of c since the value is the same?

First of all, the compiler doesn't do much. It basically translates it into class-files / bytecode. In the bytecode there is a number called "max locals" which tells how many local variables are required to run the method.
The JVM on the other hand, which reads this information and runs the code makes sure memory is allocated on the stack frame to fit the required variables. How much it asks for is highly implementation dependent, and it may very well optimize the whole thing, and allocate fewer bytes than what's indicated by the code.

what happen to d. os creates a new memory address or it just reffer to the address of c as the value is same.
This is tagged Java. So let's keep the OS out of it. Everything happens in memory managed by the JVM, which is allocated in big chunks in advance.
You are talking about primitive data types. Those are easy, as they are stored by value, not as references to objects living elsewhere.
You are talking about local variables. Those are allocated on the running thread's call stack (not in heap memory). Since they are primitive here, too, this does not involve the heap at all.
In your case, there is memory allocated (on the stack), for the four integers. Each of them contains the value assigned to it, not a reference. Even if the same value is assigned to all of them, they exist separately. The memory is "freed" (not really freed, but no longer used by the thread) when the method returns.
If those were not integers, but Objects, you could "share pointers" (to objects on the heap), but integers are stored by value (four bytes each).
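The contrast between values and shared references can be made concrete. In the sketch below (`Counter` and `ValueVsReference` are hypothetical names), two int variables with the same value stay independent, while two reference variables pointing at one object observe each other's writes.

```java
// Sketch contrasting value semantics (int) with shared references (objects).
final class Counter {
    int count;
}

final class ValueVsReference {
    static int[] primitives() {
        int c = 9;
        int d = 9;   // same value as c, but a separate variable
        d = d + 1;   // changing d leaves c untouched
        return new int[] {c, d};
    }

    static int sharedReference() {
        Counter first = new Counter();
        Counter second = first; // copies the reference, not the object
        second.count = 5;       // both variables see the same object
        return first.count;
    }
}
```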

Are arrays of 'structs' theoretically possible in Java?

There are cases when one needs a memory-efficient way to store lots of objects. To do that in Java, you are forced to use several primitive arrays (see below why) or a big byte array, which produces a bit of CPU overhead for converting.
Example: you have a class Point { float x; float y;}. Now you want to store N points in an array which would take at least N * 8 bytes for the floats and N * 4 bytes for the reference on a 32bit JVM. So at least 1/3 is garbage (not counting in the normal object overhead here). But if you would store this in two float arrays all would be fine.
My question: Why does Java not optimize the memory usage for arrays of references? I mean why not directly embed the object in the array like it is done in C++?
E.g. marking the class Point final should be sufficient for the JVM to see the maximum length of the data for the Point class. Or where would this be against the specification? Also this would save a lot of memory when handling large n-dimensional matrices etc
Update:
I would like to know whether the JVM could theoretically optimize it (e.g. behind the scenes) and under which conditions, not whether I can somehow force the JVM to do it. I think the second point of the conclusion is the reason it cannot be done easily, if at all.
Conclusions what the JVM would need to know:
The class needs to be final to let the JVM guess the length of one array entry
The array needs to be read only. Of course you can change the values like Point p = arr[i]; p.setX(i) but you cannot write to the array via inlineArr[i] = new Point(). Or the JVM would have to introduce copy semantics which would be against the "Java way". See aroth's answer
How to initialize the array (calling the default constructor or leaving the members initialized to their default values)
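For comparison, the "two float arrays" workaround mentioned in the question can be wrapped so it still reads like an array of points. This is a hedged sketch: `PointArray` is a hypothetical name, and it trades the per-object references and headers for two parallel primitive arrays (roughly 4 * N bytes of float data each, plus two array headers).

```java
// Sketch of the parallel-arrays ("structure of arrays") layout for N points.
final class PointArray {
    private final float[] xs;
    private final float[] ys;

    PointArray(int n) {
        xs = new float[n]; // x coordinates, all initially 0.0f
        ys = new float[n]; // y coordinates, all initially 0.0f
    }

    int size() { return xs.length; }

    void set(int i, float x, float y) {
        xs[i] = x;
        ys[i] = y;
    }

    float x(int i) { return xs[i]; }

    float y(int i) { return ys[i]; }
}
```

Because `set` copies the coordinates in, this layout has exactly the copy semantics discussed in the answers below: there is no Point object whose later mutation would be reflected in the array.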

Java doesn't provide a way to do this because it's not a language-level choice to make. C, C++, and the like expose ways to do this because they are system-level programming languages where you are expected to know system-level features and make decisions based on the specific architecture that you are using.
In Java, you are targeting the JVM. The JVM doesn't specify whether or not this is permissible (I'm making an assumption that this is true; I haven't combed the JLS thoroughly to prove that I'm right here). The idea is that when you write Java code, you trust the JIT to make intelligent decisions. That is where the reference types could be folded into an array or the like. So the "Java way" here would be that you cannot specify if it happens or not, but if the JIT can make that optimization and improve performance it could and should.
I am not sure whether this optimization in particular is implemented, but I do know that similar ones are: for example, objects allocated with new are conceptually on the "heap", but if the JVM notices (through a technique called escape analysis) that the object is method-local it can allocate the fields of the object on the stack or even directly in CPU registers, removing the "heap allocation" overhead entirely with no language change.
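Here is the shape of code that escape analysis targets. In this sketch (`EscapeDemo` and `Vec` are hypothetical names) the `Vec` object never leaves `lengthSquared`, so a JIT with escape analysis, such as HotSpot's C2, may scalar-replace it and skip the heap allocation. Nothing in the code forces or detects that optimization; whether it happens depends on the JVM and on the method being compiled.

```java
// Illustrative only: Vec is method-local and never escapes lengthSquared().
final class EscapeDemo {
    private static final class Vec {
        final double x;
        final double y;

        Vec(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    static double lengthSquared(double x, double y) {
        Vec v = new Vec(x, y); // no reference to v is stored or returned
        return v.x * v.x + v.y * v.y;
    }
}
```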
Update for updated question
If the question is "can this be done at all", I think the answer is yes. There are a few corner cases (such as null pointers) but you should be able to work around them. For null references, the JVM could convince itself that there will never be null elements, or keep a bit vector as mentioned previously. Both of these techniques would likely be predicated on escape analysis showing that the array reference never leaves the method, as I can see the bookkeeping becoming tricky if you try to e.g. store it in an object field.

The scenario you describe might save on memory (though in practice I'm not sure it would even do that), but it probably would add a fair bit of computational overhead when actually placing an object into an array. Consider that when you do new Point() the object you create is dynamically allocated on the heap. So if you allocate 100 Point instances by calling new Point() there is no guarantee that their locations will be contiguous in memory (and in fact they will most likely not be allocated to a contiguous block of memory).
So how would a Point instance actually make it into the "compressed" array? It seems to me that Java would have to explicitly copy every field in Point into the contiguous block of memory that was allocated for the array. That could become costly for object types that have many fields. Not only that, but the original Point instance is still taking up space on the heap, as well as inside of the array. So unless it gets immediately garbage-collected (I suppose any references could be rewritten to point at the copy that was placed in the array, thereby theoretically allowing immediate garbage-collection of the original instance) you're actually using more storage than you would be if you had just stored the reference in the array.
Moreover, what if you have multiple "compressed" arrays and a mutable object type? Inserting an object into an array necessarily copies that object's fields into the array. So if you do something like:
Point p = new Point(0, 0);
Point[] compressedA = {p}; //assuming 'p' is "optimally" stored as {0,0}
Point[] compressedB = {p}; //assuming 'p' is "optimally" stored as {0,0}
compressedA[0].setX(5);
compressedB[0].setX(1);
System.out.println(p.x);
System.out.println(compressedA[0].x);
System.out.println(compressedB[0].x);
...you would get:
0
5
1
...even though logically there should only be a single instance of Point. Storing references avoids this kind of problem, and also means that in any case where a nontrivial object is being shared between multiple arrays your total storage usage is probably lower than it would be if each array stored a copy of all of that object's fields.
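The divergence described above can be reproduced in plain Java by simulating the copy explicitly. In this sketch (`MutablePoint` and `SharingDemo` are hypothetical names), `copy()` stands in for the field-by-field copy an "inlined" array would have to make; `withReferences` shows what Java actually does today.

```java
// Sketch: shared references versus simulated per-array copies.
final class MutablePoint {
    int x;
    int y;

    MutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    MutablePoint copy() { return new MutablePoint(x, y); }
}

final class SharingDemo {
    // Real Java semantics: both arrays hold the same reference.
    static int[] withReferences() {
        MutablePoint p = new MutablePoint(0, 0);
        MutablePoint[] a = {p};
        MutablePoint[] b = {p};
        a[0].x = 5;
        b[0].x = 1; // overwrites the same single object
        return new int[] {p.x, a[0].x, b[0].x};
    }

    // Simulated copy semantics: each array holds its own copy of the fields.
    static int[] withCopies() {
        MutablePoint p = new MutablePoint(0, 0);
        MutablePoint[] a = {p.copy()};
        MutablePoint[] b = {p.copy()};
        a[0].x = 5;
        b[0].x = 1;
        return new int[] {p.x, a[0].x, b[0].x};
    }
}
```

With references all three reads agree (1, 1, 1); with copies they diverge into 0, 5 and 1, matching the output above.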

Isn't this tantamount to providing trivial classes such as the following?
class Fixed {
float hiddenArr[];
Point pointArray(int position) {
return new Point(hiddenArr[position*2], hiddenArr[position*2+1]);
}
}
Also, it's possible to implement this without making the programmer explicitly state that they'd like it; the JVM is already aware of "value types" (POD types in C++); ones with only other plain-old-data types inside them. I believe HotSpot uses this information during stack elision, no reason it couldn't do it for arrays too?

Force memory allocation in Java

I am aware that memory allocation is not explicitly required in Java, as the JVM handles allocation behind the scenes. Even though I am not required to allocate memory, for the sake of testing a memory greedy application, how would I be able to hold objects of certain numbers of bytes?
The current solution is to instantiate arrays of the primitive 'byte'. If I want to hold 5 MB worth of objects, I create an array of bytes.
byte[] b = new byte[5000000];
Is there a better way to explicitly allocate memory in a Java JVM, if only for the sake of creating / releasing objects of known size for some unit tests?

There isn't really a better way of doing it. 'new' is the only way to explicitly occupy memory (except allocating stack by calling a method, for example).
byte[] b = new byte[MEM_SIZE];
is the most controllable way of doing it. It won't allocate exactly 5000000 bytes, thanks to object overhead, but it's pretty close.
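For tests that need to hold and later release a known amount of memory, the same `new byte[...]` approach can be wrapped in a helper. This is a sketch under assumptions: `MemoryHog` is a hypothetical class, the 1 MiB chunk size is an arbitrary choice (it avoids one huge array and the 2 GB limit of a single `byte[]`), and actual heap usage will be slightly higher because of each array's header.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: hold roughly totalBytes of heap in 1 MiB chunks, release on demand.
final class MemoryHog {
    private static final int CHUNK = 1 << 20; // assumed chunk size: 1 MiB
    private final List<byte[]> chunks = new ArrayList<>();

    void hold(long totalBytes) {
        long remaining = totalBytes;
        while (remaining > 0) {
            int size = (int) Math.min(CHUNK, remaining);
            chunks.add(new byte[size]); // stays reachable via the list
            remaining -= size;
        }
    }

    long heldBytes() {
        long sum = 0;
        for (byte[] chunk : chunks) {
            sum += chunk.length;
        }
        return sum;
    }

    void release() {
        chunks.clear(); // arrays become unreachable and eligible for GC
    }
}
```

Calling `release()` only makes the arrays eligible for collection; when the garbage collector actually reclaims them is still up to the JVM.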
