How to delete an (effectively) final array in Java - java

I want to delete remove the reference to a large array by nulling the reference after I used it. This gives me a compiler error however, since the parallel assignment to the array requires the array to be (effectivly) final (at least that is what I think the problem is...). How can I allow the garbage collection to remove the array?
double[][] arr = new double[n][n];
IntStream.range(0, n).parallel().forEach(i -> {
for(int j=0;j<i;j++) {
directDistances[i][j] = directDistances[j][i] = ...;
}
});
//Use arr here...
arr = null; //arr no longer needed.
//This gives the error "Local variable defined in an enclosing scope must be final or effectively final."

I want to delete remove the reference to a large array by nulling the reference after I used it
Don't.
All implementations that I am aware of in the JVM world, will scan thread stacks to find out reachable objects. This means that scope of the method has nothing to do with how long an Object is kept alive. In simpler words:
void yourMethod(){
byte [] bytes = ....
// use bytes array somehow
// stop using the byte array
// .... 10_000 lines of code
// done
}
immediately after the line // stop using the byte array, bytes IS eligible for garbage collection. It is not going to be eligible after the method ends. scope of the method (everything between { and }) does not influence how much bytes is going to stay alive. here is an example that proves this.

The array becomes eligible for garbage collection at the latest when the method returns - you don't need to set it to null.
If you have a long method and are concerned that the array is kept around for the rest of it, the solution is to write smaller methods. Dividing the functionality among smaller methods may also improve readability and reusability.
If you can't or don't want to write smaller methods, introducing separate blocks in the method may help. Local variable declarations are local to the block, so this "trick" also lets you re-use a variable name in different blocks in the method.
void largeMethod() {
first: {
final int number = 1;
}
second: {
final int number = 2;
}
}
Technically, the array becomes eligible for garbage collection after its last use, which can be in the middle of the method - before the variable goes out of scope. This is explicitly allowed by section 12.6.1 in the language specification:
Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.
While the specification allows this optimization, it does not require it. If you find that the optimization is not being made in a particular situation and you need a better guarantee, splitting the big method into smaller methods will help.

Use AtomicReference ar = new AtomicReference () ; ar. set(arr) ;
This will provide you with an effectively final array
Then use ar.set() and ar.get() methods to modify the array

Related

How to "safely publish" lazily-generated effectively-immutable array

Java's present memory model guarantees that if the only reference to an object "George" is stored into a final field of some other object "Joe", and neither George nor Joe have never been seen by any other thread, all operations upon George which were performed before the store will be seen by all threads as having been performed before the store. This works out very nicely in cases where it makes sense to store into a final field a reference to an object which will never be mutated after that.
Is there any efficient way of achieving such semantics in cases where an object of mutable type is supposed to be lazily created (sometime after the owning object's constructor has finished execution)? Consider the fairly simple class ArrayThing which encapsulates an immutable array, but it offers a method (three versions with the same nominal purpose) to return the sum of all elements prior to a specified one. For purposes of this example, assume that many instances will be constructed without ever using that method, but on instances where that method is used, it will be used a lot; consequently, it's not worthwhile to precompute the sums when every instance of ArrayThing is constructed, but it is worthwhile to cache them.
class ArrayThing {
final int[] mainArray;
ArrayThing(int[] initialContents) {
mainArray = (int[])initialContents.clone();
}
public int getElementAt(int index) {
return mainArray[index];
}
int[] makeNewSumsArray() {
int[] temp = new int[mainArray.length+1];
int sum=0;
for (int i=0; i<mainArray.length; i++) {
temp[i] = sum;
sum += mainArray[i];
}
temp[i] = sum;
return temp;
}
// Unsafe version (a thread could be seen as setting sumOfPrevElements1
// before it's seen as populating array).
int[] sumOfPrevElements1;
public int getSumOfElementsBefore_v1(int index) {
int[] localElements = sumOfPrevElements1;
if (localElements == null) {
localElements = makeNewSumsArray();
sumOfPrevElements1 = localElements;
}
return localElements[index];
}
static class Holder {
public final int[] it;
public Holder(int[] dat) { it = dat; }
}
// Safe version, but slower to read (adds another level of indirection
// but no thread can possibly see a write to sumOfPreviousElements2
// before the final field and the underlying array have been written.
Holder sumOfPrevElements2;
public int getSumOfElementsBefore_v2(int index) {
Holder localElements = sumOfPrevElements2;
if (localElements == null) {
localElements = new Holder(makeNewSumsArray());
sumOfPrevElements2 = localElements;
}
return localElements.it[index];
}
// Safe version, I think; but no penalty on reading speed.
// Before storing the reference to the new array, however, it
// creates a temporary object which is almost immediately
// discarded; that seems rather hokey.
int[] sumOfPrevElements3;
public int getSumOfElementsBefore_v3(int index) {
int[] localElements = sumOfPrevElements3;
if (localElements == null) {
localElements = (new Holder(makeNewSumsArray())).it;
sumOfPrevElements3 = localElements;
}
return localElements[index];
}
}
As with the String#hashCode() method, it is possible that two or more threads might see that a computation hasn't been performed, decide to perform it, and store the result. Since all threads would end up producing identical results, that wouldn't be an issue. With getSumOfElementsBefore_v1(), however, there is a different problem: Java could re-order program execution so the array reference gets written to sumOfPrevElements1 before all the elements of the array have been written. Another thread which called getSumOfElementsBefore() at that moment could see that the array wasn't null, and then proceed to read an array element which hadn't yet been written. Oops.
From what I understand, getSumOfElementsBefore_v2() fixes that problem, since storing a reference to the array in final field Holder#it would establish a "happens-after" relationship with regard to the array element writes. Unfortunately, that version of the code would need to create and maintain an extra heap object, and would require that every attempt to access the sum-of-elements array go through an extra level of indirection.
I think getSumOfElementsBefore_v3() would be cheaper but still safe. The JVM guarantees that all actions which were done to a new object before a reference is stored into a final field will be visible to all threads by the time any thread can see that reference. Thus, even if other threads don't use Holder#it directly, the fact that they are using a reference which was copied from that field would establish that they can't see the reference until after everything that was done before the store has actually happened.
Even though the latter method limits the overhead (versus the unsafe method) to the times when the new array is created (rather than adding overhead to every read), it still seems rather ugly to create a new object purely for the purpose of writing and reading back a final field. Making the array field volatile would achieve legitimate semantics, but would add memory-system overhead every time the field is read (a volatile qualifier would require that the code notice if the field has been written in another thread, but that's overkill for this application; what's necessary is merely that any thread which does see that the field has been written also see all writes which occurred to the array identify thereby before the reference was stored). Is there any way to achieve similar semantics without having to either create and abandon a superfluous temporary object, or add additional overhead every time the field is read??
Your third version does not work. The guarantees made for a properly constructed object stored in a final instance field apply to reads of that final field only. Since the other threads don’t read that final variable, there is no guaranty made.
Most notably, the fact that the initialization of the array has to be completed before the array reference is stored in the final Holder.it variable does not say anything about when the sumOfPrevElements3 variable will be written (as seen by other threads). In practice, a JVM might optimize away the entire Holder instance creation as it has no side-effects, thus the resulting code behaves like an ordinary unsafe publication of an int[] array.
For using the final field publication guaranty you have to publish the Holder instance containing the final field, there is no way around it.
But if that additional instance annoys you, you should really consider using a simple volatile variable. After all, you are making only assumptions about the cost of that volatile variable, in other words, thinking about premature optimization.
After all, detecting a change made by another thread doesn’t have to be expensive, e.g. on x86 it doesn’t even need an access to the main memory as it has cache coherence. It’s also possible that an optimizer detects that you never write to the variable again once it became non-null, then enabling almost all optimizations possible for ordinary fields once a non-null reference has been read.
So the conclusion is as always: measure, don’t guess. And start optimizing only once you found a real bottleneck.
I think your second and third examples do work (sort of, as you say the reference itself might not be noticed by another thread, which might re-assign the array. That's a lot of extra work!).
But those examples are based on a faulty premise: it is not true that a volatile field requires the reader to "notice" the change. In fact, volatile and final fields perform exactly the same operation. The read operation of a volatile or a final has no overhead on most CPU architectures. I believe on a write volatile has a tiny amount of extra overhead.
So I would just use volatile here, and not worry about your supposed "optimizations". The difference in speed, if any, is going to be extremely slight, and I'm talking like an extra 4 bytes written with a bus-lock, if that. And your "optimized" code is pretty god-awful to read.
As a minor pendant, it is not true that final fields require you to have the sole reference to an object to make it immutable and thread safe. The spec only requires you to prevent changes to the object. Having the sole reference to an object is one way to prevent changes, sure. But objects that are already immutable (like java.lang.String for example) can be shared without problems.
In summary: Premature Optimization is the Root of All Evil.. Loose the tricky nonsense and just write a simple array update with assignment to a volatile.
volatile int[] sumOfPrevElements;
public int getSumOfElementsBefore(int index) {
if( sumOfPrevElements != null ) return sumOfPrevElements[index];
sumOfPrevElements = makeNewSumsArray();
return sumOfPrevElements[index];
}

Do uninitialized primitive instance variables use memory?

In Java, does it cost memory to declare a class level instance variable without initializing it?
For example: Does int i; use any memory if I don't initialize it with i = 5;?
Details:
I have a huge super-class that many different (not different enough to have their own super classes) sub-classes extend. Some sub-classes don't use every single primitive declared by the super-class. Can I simply keep such primitives as uninitialized and only initialize them in necessary sub-classes to save memory?
All members defined in your classes have default values, even if you don't initialize them explicitly, so they do use memory.
For example, every int will be initialized by default to 0, and will occupy 4 bytes.
For class members :
int i;
is the same as :
int i = 0;
Here's what the JLS says about instance variables :
If a class T has a field a that is an instance variable, then a new instance variable a is created and initialized to a default value (§4.12.5) as part of each newly created object of class T or of any class that is a subclass of T (§8.1.4). The instance variable effectively ceases to exist when the object of which it is a field is no longer referenced, after any necessary finalization of the object (§12.6) has been completed.
Yes, memory allocates though you are not assigning any value to it.
int i;
That takes 32 bit memory (allocation). No matter you are using it or not.
Some sub-classes don't use every single primitive declared by the super-Class. Can I simply keep such primitives as uninitialized and only initialize them in necessary sub-classes to save memory?
Again, no matter where you initialized, the memory allocates.
Only thing you need to take care is, just find the unused primitives and remove them.
Edit:
Adding one more point that unlike primitive's references by default value is null, which carries a a memory of
4 bytes(32-bit)
8 bytes on (64-bit)
The original question talks about class level variables and the answer is that they do use space, but it's interesting to look at method scoped ones too.
Let's take a small example:
public class MemTest {
public void doSomething() {
long i = 0; // Line 3
if(System.currentTimeMillis() > 0) {
i = System.currentTimeMillis();
System.out.println(i);
}
System.out.println(i);
}
}
If we look at the bytecode generated:
L0
LINENUMBER 3 L0
LCONST_0
LSTORE 1
Ok, as expected we assign a value at line 3 in the code, now if we change line 3 to (and remove the second println due to a compiler error):
long i; // Line 3
... and check the bytecode then nothing is generated for line 3. So, the answer is that no memory is used at this point. In fact, the LSTORE occurs only on line 5 when we assign to the variable. So, declaring an unassigned method variable does not use any memory and, in fact, doesn't generate any bytecode. It's equivalent to making the declaration where you first assign to it.
Yes. In your class level variables will assign to its default value even if you don't initialize them.
In this case you int variables will assign to 0 and will occupied 4 bytes per each.
Neither the Java Language Specification nor the Java Virtual Machine Specification specifies the answer to this because it's an implementation detail. In fact, JVMS §2.7 specifically says:
Representation of Objects
The Java Virtual Machine does not mandate any particular internal structure for objects.
In theory, a conformant virtual machine could implement objects which have a lot of fields using set of bit flags to mark which fields have been set to non-default values. Initially no fields would be allocated, the flag bits would be all 0, and the object would be small. When a field is first set, the corresponding flag bit would be set to 1 and the object would be resized to make space for it. [The garbage collector already provides the necessary machinery for momentarily pausing running code in order to relocate live objects around the heap, which would be necessary for resizing them.]
In practice, this is not a good idea because even if it saves memory it is complicated and slow. Access to fields would require temporarily locking the object to prevent corruption due to multithreading; then reading the current flag bits; and if the field exists then counting the set bits to calculate the current offset of the wanted field relative to the base of the object; then reading the field; and finally unlocking the object.
So, no general-purpose Java virtual machine does anything like this. Some objects with an exorbitant number of fields might benefit from it, but even they couldn't rely on it, because they might need to run on the common virtual machines which don't do that.
A flat layout which allocates space for all fields when an object is first instantiated is simple and fast, so that is the standard. Programmers assume that objects are allocated that way and thus design their programs accordingly to best take advantage of it. Likewise, virtual machine designers optimize to make that usage fast.
Ultimately the flat layout of fields is a convention, not a rule, although you can rely on it anyway.
In Java, when you declare a class attribute such as String str;, you are declaring a reference to an object, but it is not pointing yet to any object unless you affect a value to it str=value;. But as you may guess, the reference, even without pointing to a memory place, consumes itself some memory.

how does garbage collection with respect to Strings in Java?

I am reading about memory management in JVM and that if an object has no more references to it, it is garbage collected.
lets say, I have a program
public test {
public static void main(String[ ] args) {
String name = "hello";
for (int i =0 ; i < 5; i++) {
System.out.println(i);
}
}
}
As you can see, the String name is not used anywhere, so its reference is kept through out and not garbage collected.
now I have,
String name = "hello"
String name2 = name.substring(1,4)//"ell"
here again, the reference for hello must be present always, and cannot be garbage collected, since name2 uses it.
so when do these String or any objects get garbage collected, which have references but are obsolete, i.e. no longer used in code?
I can see one scenario where trimming down an array causes memory leak and hence setting its reference to null is a good way to garbage collect those obsolete references.
I can see one scenario where trimming down an array causes memory leak
and hence setting its reference to null is a good way to garbage
collect those obsolete references.
Strings are reference types, so all the rules for reference types with respect to garbage collection apply to strings. The JVM may also do some optimizations on String literals but if you're worrying about these, then you're probably thinking too hard.
When does the JVM collect unreferenced objects?
The only answer that matters is: you can't tell and it needn't ever, but if it does you can't know when that will be. You should never write Java code around deterministic garbage collection. It is unnecessary and fraught with ugliness.
Speaking generally, if you confine your reference variables (including arrays or collections of reference types) to the narrowest possible scope, then you'll already have gone a long way toward not having to worry about memory leaks. Long-lived reference types will require some care and feeding.
"Trimming" arrays (unreferencing array elements by assigning null to them) is ONLY necessary in the special case where the array represents your own system for managing memory, eg. if you are making your own cache or queue of objects.
Because the JVM can't know that your array is "managing memory" it can't collect unused objects in it that are still referenced but are expired. In cases where an array represents your own system for managing memory, then you should assign null to array elements whose objects have expired (eg. popped off a queue; J. Bloch, Essential Java, 2nd Ed.).
Technically, the JVM is not required to garbage-collect objects ever. In practice, they usually come behind a little while after the last reference is gone and free up the memory.
First, be aware that constants are always going to be around. Even if you assign a new value to name, the system still has a copy of "hello" stored with the class that it will reuse every time you hit that initializer statement.
However, don't confuse using an object for some sort of calculation with keeping a reference to it forever. In your second example, while "hello" is in fact kept around, that's just because it's living in the constant pool; name2 doesn't have any sort of "hold" on it that keeps it in memory. The call to substring executes and finishes, and there's no eternal hold on name. (The actual implementation in the Oracle JVM shares the underlying char[], but that's implementation-dependent.)
Clearing out arrays is a good practice because it's common for them to be long-lived and reused. If the entire array gets garbage collected, the references it holds get erased (and their objects garbage collected if those were the last ones).
Every variable in Java has a scope: The piece of code during which the variable is defined. The scope of a local variable like name in your example is between the brackets {} it is in. Thus, the name variable will be defined when the thread reaches the String name = "hello"; declaration, and will be kept alive until the main method is finished (because then the brackets the variable was in are closed).
Strings are a different story though than other variables. Strings are cached internally and may not actually be made available for the garbage collector yet.

Is it faster to access final local variables than class variables in Java?

I've been looking at at some of the java primitive collections (trove, fastutil, hppc) and I've noticed a pattern that class variables are sometimes declared as final local variables. For example:
public void forEach(IntIntProcedure p) {
final boolean[] used = this.used;
final int[] key = this.key;
final int[] value = this.value;
for (int i = 0; i < used.length; i++) {
if (used[i]) {
p.apply(key[i],value[i]);
}
}
}
I've done some benchmarking, and it appears that it is slightly faster when doing this, but why is this the case? I'm trying to understand what Java would do differently if the first three lines of the function were commented out.
Note: This seems similiar to this question, but that was for c++ and doesn't address why they are declared final.
Accessing local variable or parameter is a single step operation: take a variable located at offset N on the stack. If you function has 2 arguments (simplified):
N = 0 - this
N = 1 - first argument
N = 2 - second argument
N = 3 - first local variable
N = 4 - second local variable
...
So when you access local variable, you have one memory access at fixed offset (N is known at compilation time). This is the bytecode for accessing first method argument (int):
iload 1 //N = 1
However when you access field, you are actually performing an extra step. First you are reading "local variable" this just to determine the current object address. Then you are loading a field (getfield) which has a fixed offset from this. So you perform two memory operations instead of one (or one extra). Bytecode:
aload 0 //N = 0: this reference
getfield total I //int total
So technically accessing local variables and parameters is faster than object fields. In practice, many other factors may affect performance (including various levels of CPU cache and JVM optimizations).
final is a different story. It is basically a hint for the compiler/JIT that this reference won't change so it can make some heavier optimizations. But this is much harder to track down, as a rule of thumb use final whenever possible.
The final keyword is a red herring here.
The performance difference comes because they are saying two different things.
public void forEach(IntIntProcedure p) {
final boolean[] used = this.used;
for (int i = 0; i < used.length; i++) {
...
}
}
is saying, "fetch a boolean array, and for each element of that array do something."
Without final boolean[] used, the function is saying "while the index is less than the length of the current value of the used field of the current object, fetch the current value of the used field of the current object and do something with the element at index i."
The JIT might have a much easier time proving loop bound invariants to eliminate excess bound checks and so on because it can much more easily determine what would cause the value of used to change. Even ignoring multiple threads, if p.apply could change the value of used then the JIT can't eliminate bounds checks or do other useful optimizations.
In the generated VM opcodes local variables are entries on the operand stack while field references must be moved to the stack via an instruction that retrieves the value through the object reference. I imagine the JIT can make the stack references register references more easily.
it tells the runtime (jit) that in the context of that method call, those 3 values will never change, so the runtime does not need to continually load the values from the member variable. this may give a slight speed improvement.
of course, as the jit gets smarter and can figure out these things on its own, these conventions become less useful.
note, i didn't make it clear that the speedup is more from using a local variable than the final part.
Such simple optimizations are already included in JVM runtime. If JVM does naive access to instance variables, our Java applications will be turtle slow.
Such manual tuning probably worthwhile for simpler JVMs though, e.g. Android.

SCJP Question to Figure When Object Gets Garbage Collected?

I can't figure out an SCJP questions even after getting the right answer:
From the following code(source: http://scjptest.com), we need to determine when is object referenced as myInt be eligible for garbage collection:
01.public void doStuff() {
02. Integer arr[] = new Integer[5];
03. for (int i = 0; i < arr.length; i++) {
04. Integer myInt = new Integer(i);
05. arr[i] = myInt;
06. }
07. System.out.println("end");
08.}
The answer states that it is eligible for GC on line 6. But I think the object is simply not eligable for GC until after line 7. Because, the object being referenced as myInt is also referred to as arr[i] as well. So, don't you think, since, after myInt going out of scope, arr[] still has a reference to it till line 8?
The reasoning for the SCJP answer is that at line 6 there are no remaining statements in the scope of arr that refer to it. Under normal circumstances, this would make the array and its elements eligible for garbage collection.
(The Java Language Spec (12.6.1) says this:
"A reachable object is any object that can be accessed in any potential continuing computation from any live thread. Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner. "
As you can see, the real definition of reachability is not actually based on scoping.)
There is another twist to this question ...
If they had assigned i to myInt, autoboxing will use Integer.valueOf(i), and that method would have recorded the Integer object in a static cache. This cache would have caused the object to remain reachable ...
However, the Integer instance is created using new, so caching does not occur. And the object is unreachable at line 6.
arr[i] = myInt creates a copy of the reference to new Integer(i), not a reference to myInt; therefore, myInt isn't strictly required to exist after that assignment.
Contrary to popular belief, Java object variables contain references to objects and not the object itself. When one object variable is assigned to another, the reference gets copied instead of the object. AFAIK GC is for an object and not a reference. We all know the GC claims an object when no reference to it exists.
In my opinion, the object referenced by myInt wont be available for collection until the function doStuff returns (line 8). object referenced by myInt gets stored in arr which is in scope till the function returns.
arr and myInt are last referenced on line 5. Since it's not referenced in line 7 I can see why line 6 is the stated answer.
From JLS §12.6.1:
A reachable object is any object that can be accessed in any potential continuing computation from any live thread. Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.
So, under this definition array referenced by arr can be considered unreachable after line 6, therfore its elements are unreachable as well.

Categories