When is Java variable created in memory? - java

I am wondering when Java creates variables at runtime (when a function is called). Please see the examples below and tell me whether they are equivalent.
void someFunction(boolean test) {
if (!test) {
return;
}
int[] myVar = new int[1000000];
...
}
void someFunction(boolean test) {
int[] myVar = new int[1000000];
if (!test) {
return;
}
...
}
I wouldn't like to spend time allocating memory only for it to be deallocated moments later, so I need to know whether Java allocates the memory needed for a variable (or array) at the very beginning of the function, regardless of where the declaration appears, or whether it allocates the memory only when execution reaches the point of declaration.
EDIT:
I'm terribly sorry for confusion I'm causing. When I say variable I mean object.

Probably at the point of method entry. It is a common compiler optimization to allocate a stack frame large enough to contain all locals. If that's so, it's pretty much a single subtraction to allocate space for them all. But you'd have to examine the bytecode to be sure.
However, in this:
int[] myVar = new int[1000000];
the 'variable' is a single reference, taking up 4 or 8 bytes. The object to which the variable refers is allocated at the point the initialization is encountered in execution, by the execution of the 'new' operator.
I suspect you need to read up on the distinction between variables and objects.
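To make the variable/object distinction concrete, here is a minimal sketch. The class name, the allocate helper, and the counter are hypothetical and exist only for illustration. The counter records how often the 'new' expression is executed, which is what the source code controls; whether the JIT later optimizes the physical allocation away is a separate matter.

```java
class AllocationTiming {
    // Counts how many times the hypothetical helper below executed 'new'.
    static int allocations = 0;

    static int[] allocate(int size) {
        allocations++;
        return new int[size];
    }

    // Mirrors the first example: the check comes before the allocation.
    static void checkFirst(boolean test) {
        if (!test) {
            return; // returns before 'new' is ever executed
        }
        int[] myVar = allocate(1_000_000);
        myVar[0] = 1;
    }

    // Mirrors the second example: the allocation comes before the check.
    static void allocateFirst(boolean test) {
        int[] myVar = allocate(1_000_000);
        if (!test) {
            return; // the array was already created
        }
        myVar[0] = 1;
    }

    public static void main(String[] args) {
        checkFirst(false);
        System.out.println("after checkFirst: " + allocations);    // 0
        allocateFirst(false);
        System.out.println("after allocateFirst: " + allocations); // 1
    }
}
```

In both methods the slot for the reference myVar is part of the frame from method entry, but the array itself only comes into existence when execution reaches 'new'.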

In general, the Java compiler is a bit dumb in its compilation since it leaves the optimization up to the runtime JVM. As such, the compilation is mostly a straightforward translation of source code into bytecode.
https://godbolt.org/z/5xT3KchrW
In this case with the OpenJDK 17.0.0 compiler, the allocation of the array is done roughly at the same point in the bytecode as where the source code indicates.
However, the local variable that holds the reference to the array is "allocated" at the time the method is called. While the JVM uses a stack frame in a way very similar to C/C++, Android's Dalvik instead uses registers to hold local variables, so the variable isn't ever actually "allocated" in memory at all (a design decision due to the focus on ARM, with its plentiful registers, rather than x86, which is infamous for its lack of registers).

Related

How do GC work in middle of scope

I have code like:
public void foo()
{
Object x = new LongObject();
doSomething(x);
//More Code
// x is never used again
// does x = null help the GC??
Object x2 = new LongObject();
doSomething(x2);
}
I would like the memory allocated for x to be freed by the GC if needed. But I don't know whether setting it to null is necessary or whether the compiler does it.
In point of fact, the JIT does liveness analysis on references (which at bytecode level are stored as slots in the current frame). If a reference is never again read from, its slot can be reused, and the JIT will know that. It is completely possible for an object to be garbage collected while a variable that refers to it is still in lexical scope, so long as the compiler and JIT are able to prove that the variable will never again be dereferenced.
The point is: scope is a construct of the language, and specifies what a name like x means at any point in the text of the program code that it occurs. Lifetime is a property of objects, and the JIT and GC manage that -- often in non-obvious ways.
Remember that the JIT can recompile your code while it's running, and will optimize your code as it sees what happens when it executes. Unless you're really certain you know what you're doing, don't try to outsmart the JIT. Write code that is correct and let the JIT do its job, and only worry about it if you have evidence that the JIT hasn't done its job well enough.
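The difference between scope and lifetime can be probed, though not proven, with a weak reference. The following is only a sketch: the class and method names are made up, System.gc() is merely a hint and is used here purely for demonstration, and Reference.reachabilityFence requires Java 9 or later. The second method may return either value depending on the JVM, the collector, and whether the code is interpreted or compiled.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    // While a strong reference is provably live, the object cannot be collected,
    // so the weak reference must still see it.
    static boolean visibleWhileHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // only a hint; demonstration use only
        boolean visible = weak.get() != null;
        Reference.reachabilityFence(strong); // 'strong' is live at least up to here
        return visible;
    }

    // 'x' stays in lexical scope after its last use. Whether the object is
    // collected before the method returns is up to the runtime.
    static boolean collectedAfterLastUse() {
        Object x = new Object();
        WeakReference<Object> weak = new WeakReference<>(x);
        x.hashCode(); // last use of x
        System.gc();  // only a hint
        return weak.get() == null; // may be true or false
    }

    public static void main(String[] args) {
        System.out.println("visible while held: " + visibleWhileHeld());
        System.out.println("collected after last use: " + collectedAfterLastUse());
    }
}
```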
To answer your questions literally, the compiler (speaking of the source-to-bytecode compiler) never inserts null assignments, but still, assigning null to a variable is usually not necessary.
As this answer explains, scope is a compile-time thing and, formally, an object is eligible for garbage collection if it cannot “be accessed in any potential continuing computation from any live thread”. But there is no guarantee about which eligible objects will be identified by a particular implementation. As the linked answer also explains, JIT-compiled code will only keep references to objects which will be subsequently accessed. This may go even further than you expect, allowing garbage collection of objects that look like they are in use in the source code, as runtime optimization may transform the code and reduce actual memory accesses.
But in interpreted mode, the analysis will not go so far, and there might be object references in the current stack frame preventing the collection of the referent, even though the variable is not used afterwards or is even out of scope in the source code. There is no guarantee that switching from interpreted to compiled code while the method is executing is capable of getting rid of such dangling references. It’s even unlikely that the HotSpot optimizer considers compiling foo() when the actual heavy computation happens within doSomething.
Still, this is rarely an issue. Running interpreted happens only during the initialization or first time execution and even if these objects are large, there’s rarely a problem if such an object gets collected a bit later than it could. An average application consists of millions of objects.
However, if you ever think there could be an issue, you can easily fix this, without assigning null to the variable. Limit the scope:
public void foo()
{
{
Object x = new LongObject();
doSomething(x);
//More Code
}
{
Object x2 = new LongObject();
doSomething(x2);
}
}
Unlike assigning null, limiting the scope of variables to their actual use improves the source code quality, even in cases where it has no impact on the compiled code. While scope is purely a source code thing, it can have an impact on the bytecode though. In the code above, compilers will reuse the location of x within the stack frame to store x2, so no dangling reference to the first LongObject exists during the second doSomething execution.
As said, this is rarely needed for memory management, and improving source code quality should be driving your decisions, not attempts to help the garbage collector.

What happens to memory when stack is popped?

I have a function
public void f() {
int x = 72;
return;
}
So x is stored at possibly the address 0x9FFF.
When the function returns, what happens to the memory at that address? Is it still there? I.e is the value still 72? Or is it completely invalidated?
In C it is undefined behaviour.
In practice, if you were to try something like:
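#include <stdio.h>

int *ptr;
void bar(void);
void foo() {
bar();
printf("%d", *ptr); /* undefined behaviour: x no longer exists here */
}
void bar() {
int x = 72;
ptr = &x; /* address of a local escapes the function */
}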
Then it's likely that in most implementations of C, foo() would print 72. This is because although the address referenced by ptr is available for reallocation, it's not likely to have been re-allocated yet, and nothing has overwritten that memory. The longer your program continues to run, initialising more local variables, and calling malloc(), the more likely it is that this memory address will be re-used, and the value will change.
However there's nothing in the C specification that says this must be the case - an implementation could zero that address as soon as it goes out of scope, or make the runtime panic when you try to read it, or, well, anything -- that's what "undefined" means.
As a programmer you should take care to avoid doing this. A lot of the time the bugs it would cause would be glaring, but some of the time you'll cause intermittent bugs, which are the hardest kind to track down.
In Java, while it's possible that the memory still contains 72 after it goes out of scope, there is literally no way to access it, so it does not affect the programmer. The only way it could be accessed in Java would be if there were an "official" reference to it, in which case it would not be marked for garbage collection, and isn't really out of scope.
Both the Java programming language and the Java virtual machine do not define what happens to the memory of a stack frame after the frame is popped. This is a low-level implementation detail that is masked by the higher level abstractions. In fact, the Java language and JVM bytecode make it impossible by design to retrieve already-deleted values from the stack (unlike C/C++).
In practice however, stack frames in Java will behave like stack frames in C. Growing the stack will bump its pointer (usually downward) and allocate space to store variables. Shrinking the stack will usually bump the pointer up and simply leave the old values in memory to rot without overwriting them. If you have low-level access to the JVM's stack memory region, this is the behavior you should expect to see.
Note that it is impossible in Java to do a trick like C where you attempt to read uninitialized stack variables:
static boolean firstTime = true;
public void f() {
int x;
if (firstTime) {
x = 72;
firstTime = false;
} else {
// Compile error: Variable 'x' may not have been initialized
System.out.println(x);
}
}
Other stack behaviors are possible in JVM implementations. For example, as frames are popped it is possible to unmap the 4 KiB virtual memory pages back to the operating system, which will actually erase the old values. Also on machine architectures such as the Mill, stack memory is treated specially so that growing the stack will always return a region filled with zero bytes, which saves the work of actually loading old values from memory.
Local variables of primitive type in Java are placed on the stack (in the local variables array of a frame). A new frame is created each time a method is invoked:
public void foo() {
int x = 72; // 'x' will be stored in the array of local variables of the frame
}
A frame is destroyed when its method invocation completes. At this moment all local variables and partial results might still reside on the stack, but they are abandoned and no longer available.
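That every invocation gets its own frame, and therefore its own copy of each local variable, can be observed from within Java. A small sketch (the class and method names are illustrative only):

```java
class FrameDemo {
    static int countDown(int depth) {
        int x = depth;            // stored in this invocation's own frame
        if (depth > 0) {
            countDown(depth - 1); // the callee gets a fresh frame with its own x
        }
        return x;                 // still the value this invocation stored
    }

    public static void main(String[] args) {
        System.out.println(countDown(3)); // prints 3
    }
}
```

The nested calls store 2, 1, and 0 in their own x, yet the outermost invocation still returns 3. What happens to the abandoned frames' memory afterwards is not observable from Java code.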
I'm not looking at the spec offhand, but I'm guessing that this isn't technically defined.
I actually tried something like that in C++ once and it was, in fact, 72 (or whatever I put there before the function call returned) if I remember correctly, so the machine didn't actually go through and write 0 to that location or something.
Some of this is an implementation detail, too. I implemented it in MIPS assembly language as well (I'll include a code sample if I can dig it up). Basically, when I needed registers, I'd just "grow" the stack by however many local variables I needed, store whatever the current values in the registers I needed were (so I could restore them later), and re-use the register. If that's the implementation, then the value could actually contain the value of a local variable in the caller. I don't think that that's exactly what Java's doing, though.
TL;DR It's an implementation detail, but in C at least odds are it won't overwrite the value in memory until it actually needs it. Java is much harder to predict.

Free heap memory with null then GC

Suppose I have this code:
DataStructure hugeData = Data.readLotsOfStuff(); // like gigabytes
DataStructure processedData = processData(hugeData);
// now I don't need hugeData, so
hugeData = null;
System.gc();
Is explicitly freeing memory like this a good practice?
It is generally not good practice to care about those things. It makes the code harder to read (each additional line takes time to read), and in the case of System.gc() all kinds of bad runtime behaviour can result. It will confuse the garbage collector's internal predictions and typically trigger a very heavy, slow GC. (It is especially painful if your code is used in a larger application with concurrent uses.)
Nulling a local variable (hard reference) early has issues:
a) if your reference is only used up to a certain point and your code block is much larger, then likely it is too large.
b) it is good to make local variables final, you can better read the code if you know the variable does not change. And those final variables cannot be nulled, anyway.
c) Java (runtime) compilers and interpreters get smarter by the minute. Let the VM decide when to free references.
Having said all that: if you leave the scope/block of a variable inside a method, you cannot expect the referenced object to become unreachable. It looks good to have narrowly scoped variables, but for the problem of freeing the reference early it does not do much (unless a later scope re-uses the slot). If the method is long-running (maybe an endless loop), it might be good to null the reference, but only if you reference a very large object tree which would otherwise keep a substantial number of objects alive. Otherwise the additional assignment and code are not worth it.
method() {
if (...) {
byte[] ref=new byte[1*MB];
}
// B
while(true) { ... }
}
In this example the large 1MB buffer is not in scope of location 'B', but it is still on the runtime java stack (java VM has no concept of scopes, ref is kept in a local variable table slot till the method is left). Since the method will not terminate (soon), it might be acceptable to null it. Splitting up the method is the better idea.
And never call System.gc().
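Splitting up the method, as suggested above, might look like the following sketch. The names SplitMethodDemo, processWithBuffer, and longRunning are hypothetical, MB is defined here only so the example compiles, and a bounded loop stands in for the endless one.

```java
class SplitMethodDemo {
    static final int MB = 1024 * 1024;

    // The reference to the buffer lives only in this method's frame.
    // Once the method returns, the frame is gone and nothing in the
    // caller's frame refers to the 1 MB array.
    static int processWithBuffer() {
        byte[] buffer = new byte[1 * MB];
        buffer[0] = 42;
        return buffer[0];
    }

    // Stands in for the long-running method; it only keeps the small result.
    static long longRunning(int iterations) {
        int result = processWithBuffer();
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += result;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(longRunning(10)); // prints 420
    }
}
```

No null assignment is needed here: the buffer becomes unreachable as a side effect of ordinary method structure.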

what actually compiler does when we declare variable?

Let's take an example.
public class test {
public static void main(String args[]) {
int a=5,b=4;
int c=a+b;
int d=9;
System.out.println("ANSWER PLEASE..");
}
}
Now, when we execute this code, what does the OS do?
It first creates a variable named a and allocates a memory address for it, and does similar things for b and c.
Now what happens to d? Does the OS create a new memory address, or does it just refer to the address of c, as the value is the same?
First of all, the compiler doesn't do much. It basically translates it into class-files / bytecode. In the bytecode there is a number called "max locals" which tells how many local variables are required to run the method.
The JVM on the other hand, which reads this information and runs the code makes sure memory is allocated on the stack frame to fit the required variables. How much it asks for is highly implementation dependent, and it may very well optimize the whole thing, and allocate fewer bytes than what's indicated by the code.
What happens to d? Does the OS create a new memory address, or does it just refer to the address of c, as the value is the same?
This is tagged Java. So let's keep the OS out of it. Everything happens in memory managed by the JVM, which is allocated in big chunks in advance.
You are talking about primitive data types. Those are easy, as they are stored by value, not as references to objects living elsewhere.
You are talking about local variables. Those are allocated on the running thread's call stack (not in heap memory). Since they are primitive here, too, this does not involve the heap at all.
In your case, there is memory allocated (on the stack), for the four integers. Each of them contains the value assigned to it, not a reference. Even if the same value is assigned to all of them, they exist separately. The memory is "freed" (not really freed, but no longer used by the thread) when the method returns.
If those were not integers, but Objects, you could "share pointers" (to objects on the heap), but integers are stored by value (four bytes each).
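The difference between values stored directly and shared references can be seen in a short sketch (the class and variable names are illustrative only):

```java
import java.util.Arrays;

class ValueVsReference {
    static int[] demo() {
        int c = 9;
        int d = c;            // d gets its own copy of the value 9
        d++;                  // changes d only; c is still 9

        int[] first = {9};
        int[] second = first; // copies the reference; both name the same array
        second[0]++;          // visible through 'first' as well

        return new int[] {c, d, first[0], second[0]};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [9, 10, 10, 10]
    }
}
```

Two int variables holding the same value never alias each other; only references can point at shared storage.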

Are arrays of 'structs' theoretically possible in Java?

There are cases when one needs a memory-efficient way to store lots of objects. To do that in Java you are forced to use several primitive arrays (see below why) or a big byte array, which adds a bit of CPU overhead for conversion.
Example: you have a class Point { float x; float y;}. Now you want to store N points in an array, which would take at least N * 8 bytes for the floats and N * 4 bytes for the references on a 32-bit JVM. So at least 1/3 is garbage (not counting the normal object overhead here). But if you stored this in two float arrays, all would be fine.
My question: Why does Java not optimize the memory usage for arrays of references? I mean why not directly embed the object in the array like it is done in C++?
E.g. marking the class Point final should be sufficient for the JVM to see the maximum length of the data for the Point class. Or where would this be against the specification? Also this would save a lot of memory when handling large n-dimensional matrices etc
Update:
I would like to know whether the JVM could theoretically optimize it (e.g. behind the scenes) and under which conditions, not whether I can force the JVM somehow. I think the second point of the conclusion is the reason it cannot be done easily, if at all.
Conclusions what the JVM would need to know:
The class needs to be final to let the JVM guess the length of one array entry
The array needs to be read only. Of course you can change the values like Point p = arr[i]; p.setX(i) but you cannot write to the array via inlineArr[i] = new Point(). Or the JVM would have to introduce copy semantics which would be against the "Java way". See aroth's answer
How to initialize the array (calling the default constructor or leaving the members initialized to their default values)
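For reference, the two-float-array workaround mentioned in the question could be sketched as follows. PointStore and its methods are hypothetical names, not an existing API; the sketch stores N points in 2 * N floats with no per-point object or reference.

```java
class PointStore {
    // Parallel arrays: the point at index i is (xs[i], ys[i]).
    private final float[] xs;
    private final float[] ys;

    PointStore(int capacity) {
        xs = new float[capacity];
        ys = new float[capacity];
    }

    void set(int index, float x, float y) {
        xs[index] = x;
        ys[index] = y;
    }

    float getX(int index) {
        return xs[index];
    }

    float getY(int index) {
        return ys[index];
    }

    int size() {
        return xs.length;
    }

    public static void main(String[] args) {
        PointStore points = new PointStore(3);
        points.set(1, 2.5f, -1.0f);
        System.out.println(points.getX(1) + ", " + points.getY(1));
    }
}
```

The cost is that a point is no longer an object with identity: callers work with an index instead of a Point reference.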
Java doesn't provide a way to do this because it's not a language-level choice to make. C, C++, and the like expose ways to do this because they are system-level programming languages where you are expected to know system-level features and make decisions based on the specific architecture that you are using.
In Java, you are targeting the JVM. The JVM doesn't specify whether or not this is permissible (I'm making an assumption that this is true; I haven't combed the JLS thoroughly to prove that I'm right here). The idea is that when you write Java code, you trust the JIT to make intelligent decisions. That is where the reference types could be folded into an array or the like. So the "Java way" here would be that you cannot specify if it happens or not, but if the JIT can make that optimization and improve performance it could and should.
I am not sure whether this optimization in particular is implemented, but I do know that similar ones are: for example, objects allocated with new are conceptually on the "heap", but if the JVM notices (through a technique called escape analysis) that the object is method-local it can allocate the fields of the object on the stack or even directly in CPU registers, removing the "heap allocation" overhead entirely with no language change.
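As an illustration of the kind of code escape analysis targets, consider the sketch below. The names are hypothetical, and whether a given JVM actually removes the allocation depends on the implementation and on JIT heuristics; nothing in the code forces it.

```java
class EscapeDemo {
    static final class Point {
        final float x;
        final float y;

        Point(float x, float y) {
            this.x = x;
            this.y = y;
        }
    }

    static float lengthSquared(float x, float y) {
        // 'p' is never stored in a field, passed to another method, or returned,
        // so it does not escape this method. A JIT may replace it with two
        // plain float values instead of allocating a heap object.
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3f, 4f)); // prints 25.0
    }
}
```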
Update for updated question
If the question is "can this be done at all", I think the answer is yes. There are a few corner cases (such as null pointers) but you should be able to work around them. For null references, the JVM could convince itself that there will never be null elements, or keep a bit vector as mentioned previously. Both of these techniques would likely be predicated on escape analysis showing that the array reference never leaves the method, as I can see the bookkeeping becoming tricky if you try to e.g. store it in an object field.
The scenario you describe might save on memory (though in practice I'm not sure it would even do that), but it probably would add a fair bit of computational overhead when actually placing an object into an array. Consider that when you do new Point() the object you create is dynamically allocated on the heap. So if you allocate 100 Point instances by calling new Point() there is no guarantee that their locations will be contiguous in memory (and in fact they will most likely not be allocated to a contiguous block of memory).
So how would a Point instance actually make it into the "compressed" array? It seems to me that Java would have to explicitly copy every field in Point into the contiguous block of memory that was allocated for the array. That could become costly for object types that have many fields. Not only that, but the original Point instance is still taking up space on the heap, as well as inside of the array. So unless it gets immediately garbage-collected (I suppose any references could be rewritten to point at the copy that was placed in the array, thereby theoretically allowing immediate garbage-collection of the original instance) you're actually using more storage than you would be if you had just stored the reference in the array.
Moreover, what if you have multiple "compressed" arrays and a mutable object type? Inserting an object into an array necessarily copies that object's fields into the array. So if you do something like:
Point p = new Point(0, 0);
Point[] compressedA = {p}; //assuming 'p' is "optimally" stored as {0,0}
Point[] compressedB = {p}; //assuming 'p' is "optimally" stored as {0,0}
compressedA[0].setX(5);
compressedB[0].setX(1);
System.out.println(p.x);
System.out.println(compressedA[0].x);
System.out.println(compressedB[0].x);
...you would get:
0
5
1
...even though logically there should only be a single instance of Point. Storing references avoids this kind of problem, and also means that in any case where a nontrivial object is being shared between multiple arrays your total storage usage is probably lower than it would be if each array stored a copy of all of that object's fields.
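For comparison, this is how the same sequence behaves with Java's actual reference semantics. It is a runnable sketch; the Point class with an int x and a setX method is assumed from the snippet above.

```java
class SharedReferenceDemo {
    static class Point {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        void setX(int x) {
            this.x = x;
        }
    }

    static int[] run() {
        Point p = new Point(0, 0);
        Point[] a = {p}; // stores a reference to p, not a copy
        Point[] b = {p}; // stores the same reference again
        a[0].setX(5);
        b[0].setX(1);    // overwrites the 5, since it is the same object
        return new int[] {p.x, a[0].x, b[0].x};
    }

    public static void main(String[] args) {
        for (int value : run()) {
            System.out.println(value); // prints 1 three times
        }
    }
}
```

All three reads go through the same object, so they agree, which is the behaviour an inlined-copy array would break.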
Isn't this tantamount to providing trivial classes such as the following?
class Fixed {
float hiddenArr[];
Point pointArray(int position) {
return new Point(hiddenArr[position*2], hiddenArr[position*2+1]);
}
}
Also, it's possible to implement this without making the programmer explicitly state that they'd like it; the JVM is already aware of "value types" (POD types in C++); ones with only other plain-old-data types inside them. I believe HotSpot uses this information during stack elision, no reason it couldn't do it for arrays too?
