I have read about sun.misc.Unsafe, but I get confused by the fact that, unlike in C/C++, we have to work out field offsets ourselves, and there's also the 32-bit VM vs the 64-bit VM, which may or may not have different pointer sizes depending on a particular VM setting being turned on or off (I'm also assuming all offsets to data are effectively pointer arithmetic, so pointer size would influence them).
Unfortunately, it seems everything ever written about how to use Unsafe stems from a single article (whichever happened to be first), and all the others copy-pasted from it to some degree. Not many of them exist, and some are unclear because their authors apparently were not native English speakers.
My question is:
How can I find the offset of a field, plus the pointer to the instance that owns that field (or a field of a field, or a field of a field of a field...), using Unsafe?
How can I use it to perform a memcpy to another pointer + offset memory address?
Considering the data may be several GB in size, and considering the heap offers no direct control over data alignment and may well be fragmented, because:
1) I don't think there's anything stopping the VM from allocating field1 at offset + 10 and field2 at offset sizeof(field1) + 32, is there?
2) I would also assume the GC moves big chunks of data around, so a field 1 GB in size could sometimes end up fragmented.
So is the memcpy operation as I described even possible?
If data is fragmented because of GC, of course the heap has a pointer to where the next chunk of data is, but using the simple process described above doesn't seem to cover that.
So must the data be off-heap for this to (maybe) work? If so, how do I allocate off-heap data using Unsafe, make such data work as a field of an instance, and of course free the allocated memory once done with it?
I encourage anyone who didn't quite understand the question to ask for any specifics they need to know.
I also urge people to refrain from answering if their whole idea is "put all the objects you need to copy in an array and use System.arraycopy". I know it's common practice in this wonderful forum to, instead of answering what's been asked, offer a complete alternate solution that in principle has nothing to do with the original question apart from the fact that it gets the same job done.
Best regards.
First a big warning: “Unsafe must die” http://blog.takipi.com/still-unsafe-the-major-bug-in-java-6-that-turned-into-a-java-9-feature/
Some prerequisites
static class DataHolder {
    int i1;
    int i2;
    int i3;
    DataHolder d1;
    DataHolder d2;

    public DataHolder(int i1, int i2, int i3, DataHolder dh) {
        this.i1 = i1;
        this.i2 = i2;
        this.i3 = i3;
        this.d1 = dh;
        this.d2 = this;
    }
}
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);
DataHolder dh1 = new DataHolder(11, 13, 17, null);
DataHolder dh2 = new DataHolder(23, 29, 31, dh1);
The basics
To get the offset of a field (i1), you can use the following code:
Field fi1 = DataHolder.class.getDeclaredField("i1");
long oi1 = unsafe.objectFieldOffset(fi1);
and to access the field value of instance dh1 you can write
System.out.println(unsafe.getInt(dh1, oi1)); // will print 11
You can use similar code to access an object reference (d1):
Field fd1 = DataHolder.class.getDeclaredField("d1");
long od1 = unsafe.objectFieldOffset(fd1);
and you can use it to get the reference to dh1 from dh2:
System.out.println(dh1 == unsafe.getObject(dh2, od1)); // will print true
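The original question also asked about reaching a "field of a field". There is no single call for that: you follow each reference hop with getObject, and only read the primitive at the last hop. A minimal self-contained sketch (the class name NestedFieldAccess and the helper method are mine, built on the same DataHolder setup as above):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class NestedFieldAccess {
    static class DataHolder {
        int i1; int i2; int i3;
        DataHolder d1; DataHolder d2;
        DataHolder(int i1, int i2, int i3, DataHolder dh) {
            this.i1 = i1; this.i2 = i2; this.i3 = i3;
            this.d1 = dh; this.d2 = this;
        }
    }

    // Reads dh2.d1.i2 purely through Unsafe: one getObject per reference hop,
    // then a getInt at the final field's offset.
    static int readNestedI2() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        DataHolder dh1 = new DataHolder(11, 13, 17, null);
        DataHolder dh2 = new DataHolder(23, 29, 31, dh1);

        long od1 = unsafe.objectFieldOffset(DataHolder.class.getDeclaredField("d1"));
        long oi2 = unsafe.objectFieldOffset(DataHolder.class.getDeclaredField("i2"));

        Object inner = unsafe.getObject(dh2, od1); // follow dh2.d1 -> dh1
        return unsafe.getInt(inner, oi2);          // read dh1.i2
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readNestedI2()); // prints 13
    }
}
```

Note that between the hops the reference stays a normal Object, so this part is GC-safe; it is only extracting a raw address (as shown further below) that is dangerous.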
Field ordering and alignment
To get the offsets of all declared fields of an object:
for (Field f : DataHolder.class.getDeclaredFields()) {
    if (!Modifier.isStatic(f.getModifiers())) {
        System.out.println(f.getName() + " " + unsafe.objectFieldOffset(f));
    }
}
In my tests it seems that the JVM reorders fields as it sees fit (i.e. adding a field can yield completely different offsets on the next run).
An object's address in native memory
It's important to understand that the following code will crash your JVM sooner or later, because the garbage collector moves your objects at random times, without you having any control over when and why it happens.
Also, it's important to understand that the following code depends on the JVM type (32-bit versus 64-bit) and on some start parameters for the JVM (namely, the usage of compressed oops on 64-bit JVMs).
On a 32-bit VM a reference to an object has the same size as an int. So what do you get if you call int addr = unsafe.getInt(dh2, od1); instead of unsafe.getObject(dh2, od1)? Could it be the native address of the object?
Let's try:
System.out.println(unsafe.getInt(null, unsafe.getInt(dh2, od1)+oi1));
will print out 11 as expected.
On a 64 bit VM without compressed oops (-XX:-UseCompressedOops), you will need to write
System.out.println(unsafe.getInt(null, unsafe.getLong(dh2, od1)+oi1));
On a 64-bit VM with compressed oops (-XX:+UseCompressedOops), things are a bit more complicated. This variant has 32-bit object references that are turned into 64-bit addresses by multiplying them by 8:
System.out.println(unsafe.getInt(null, 8L*(0xffffffffL&unsafe.getInt(dh2, od1))+oi1));
What is the problem with these accesses
The problem is the garbage collector together with this code. The garbage collector can move objects around as it pleases. Since the JVM knows about its object references (the local variables dh1 and dh2, the fields d1 and d2 of these objects), it can adjust these references accordingly, and your code will never notice.
By extracting object references into int/long variables you turn these object references into primitive values that happen to have the same bit-pattern as an object reference, but the Garbage Collector does not know that these were object references (they could have been generated by a random generator as well) and therefore does not adjust these values while moving objects around. So as soon as a Garbage Collection cycle is triggered your extracted addresses are no longer valid, and trying to access memory at these addresses might crash your JVM immediately (the good case) or you might trash your memory without noticing on the spot (the bad case).
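The off-heap part of the question goes otherwise unanswered above: memory from Unsafe.allocateMemory is invisible to the GC, so it is never moved and a memcpy-style copy between raw addresses is safe there; the price is that you must free it yourself. A minimal sketch (class and method names are mine; all Unsafe calls are the real API):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapCopy {

    // Allocates two off-heap blocks, writes an int into the first,
    // memcpy-s the whole block into the second, and reads the int back.
    static int copyRoundTrip() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        long bytes = 4096;
        long src = unsafe.allocateMemory(bytes); // off-heap: the GC never moves or frees it
        long dst = unsafe.allocateMemory(bytes);
        try {
            unsafe.setMemory(src, bytes, (byte) 0); // allocateMemory returns uninitialized memory
            unsafe.putInt(src + 100, 42);           // write an int at byte offset 100

            unsafe.copyMemory(src, dst, bytes);     // the actual "memcpy"

            return unsafe.getInt(dst + 100);
        } finally {
            unsafe.freeMemory(src); // no GC here: forgetting to free is a leak
            unsafe.freeMemory(dst);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copyRoundTrip()); // prints 42
    }
}
```

To make such a block "work as a field of an instance", the usual pattern is to store the returned address in a long field of a wrapper object and free it explicitly in a close() method, which is essentially what direct ByteBuffers do internally.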
Related
Does assigning an unused object reference to null in Java improve the garbage collection process in any measurable way?
My experience with Java (and C#) has taught me that it is often counterintuitive to try to outsmart the virtual machine or JIT compiler, but I've seen co-workers use this method, and I am curious whether this is a good practice to pick up or one of those voodoo programming superstitions?
Typically, no.
But like all things: it depends. The GC in Java these days is VERY good, and everything should be cleaned up very shortly after it is no longer reachable. For local variables, that is just after leaving the method; for fields, it is when the enclosing instance is no longer referenced.
You only need to explicitly null if you know it would remain referenced otherwise. For example an array which is kept around. You may want to null the individual elements of the array when they are no longer needed.
For example, this code from ArrayList:
public E remove(int index) {
    RangeCheck(index);

    modCount++;
    E oldValue = (E) elementData[index];

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // Let gc do its work

    return oldValue;
}
Also, explicitly nulling a reference will not cause the object to be collected any sooner than if it just went out of scope naturally, as long as no references remain.
Both:
void foo() {
Object o = new Object();
/// do stuff with o
}
and:
void foo() {
Object o = new Object();
/// do stuff with o
o = null;
}
Are functionally equivalent.
In my experience, more often than not, people null out references out of paranoia not out of necessity. Here is a quick guideline:
If object A references object B and you no longer need this reference and object A is not eligible for garbage collection then you should explicitly null out the field. There is no need to null out a field if the enclosing object is getting garbage collected anyway. Nulling out fields in a dispose() method is almost always useless.
There is no need to null out object references created in a method. They will get cleared automatically once the method terminates. The exception to this rule is if you're running in a very long method or some massive loop and you need to ensure that some references get cleared before the end of the method. Again, these cases are extremely rare.
I would say that the vast majority of the time you will not need to null out references. Trying to outsmart the garbage collector is useless. You will just end up with inefficient, unreadable code.
A good article on this is today's Coding Horror post.
The way GCs work is by looking for objects that do not have any pointers to them; the area of their search is the heap/stack and any other spaces they have. So if you set a variable to null, the actual object is no longer pointed to by anything, and hence could be GC'd.
But since the GC might not run at that exact instant, you might not actually be buying yourself anything. But if your method is fairly long (in terms of execution time) it might be worth it since you will be increasing your chances of GC collecting that object.
The problem can also be complicated with code optimizations, if you never use the variable after you set it to null, it would be a safe optimization to remove the line that sets the value to null (one less instruction to execute). So you might not actually be getting any improvement.
So in summary, yes it can help, but it will not be deterministic.
At least in java, it's not voodoo programming at all. When you create an object in java using something like
Foo bar = new Foo();
you do two things: first, you create a reference to an object, and second, you create the Foo object itself. As long as that reference or another one exists, the specific object can't be GC'd. However, when you assign null to that reference...
bar = null ;
and assuming nothing else has a reference to the object, it's freed and available for gc the next time the garbage collector passes by.
It depends.
Generally speaking, the shorter you keep references to your objects, the faster they'll get collected.
If your method takes, say, 2 seconds to execute and you don't need an object anymore after one second of method execution, it makes sense to clear any references to it. If the GC sees that after one second your object is still referenced, next time it might check it in a minute or so.
Anyway, setting all references to null by default is to me premature optimization, and nobody should do it except in specific rare cases where it measurably decreases memory consumption.
Explicitly setting a reference to null, instead of just letting the variable go out of scope, does not help the garbage collector, unless the object held is very large, in which case setting it to null as soon as you are done with it is a good idea.
Generally, setting references to null signals to the READER of the code that this object is completely done with and should not be a concern any more.
A similar effect can be achieved by introducing a narrower scope by putting in an extra set of braces
{
    int l;
    { // <- here
        String bigThing = ....;
        l = bigThing.length();
    } // <- and here
}
this allows the bigThing to be garbage collected right after leaving the nested braces.
public class JavaMemory {
    private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6);

    public void f() {
        {
            byte[] data = new byte[dataSize];
            //data = null;
        }
        byte[] data2 = new byte[dataSize];
    }

    public static void main(String[] args) {
        JavaMemory jmp = new JavaMemory();
        jmp.f();
    }
}
The above program throws an OutOfMemoryError. If you uncomment data = null;, the OutOfMemoryError goes away. In cases like this it is good practice to set the no-longer-used variable to null.
I was working on a video conferencing application one time and noticed a huge huge huge difference in performance when I took the time to null references as soon as I didn't need the object anymore. This was in 2003-2004 and I can only imagine the GC has gotten even smarter since. In my case I had hundreds of objects coming and going out of scope every second, so I noticed the GC when it kicked in periodically. However after I made it a point to null objects the GC stopped pausing my application.
So it depends on what you're doing...
Yes.
From "The Pragmatic Programmer" p.292:
By setting a reference to NULL you reduce the number of pointers to the object by one ... (which will allow the garbage collector to remove it)
I assume the OP is referring to things like this:
private void Blah()
{
    MyObj a;
    MyObj b;

    try {
        a = new MyObj();
        b = new MyObj();

        // do real work
    } finally {
        a = null;
        b = null;
    }
}
In this case, wouldn't the VM mark them for GC as soon as they leave scope anyway?
Or, from another perspective, would explicitly setting the items to null cause them to get GC'd sooner than if they just went out of scope? If so, the VM may spend time GC'ing objects when the memory isn't needed yet anyway, which would actually cause worse CPU performance, because it would be doing GC work earlier.
Even if nullifying the reference were marginally more efficient, would it be worth the ugliness of having to pepper your code with these nullifications? They would only be clutter and would obscure the intent of the code that contains them.
It's a rare codebase that has no better candidate for optimisation than trying to outsmart the garbage collector (rarer still are developers who succeed in outsmarting it). Your efforts will most likely be better spent elsewhere instead, ditching that crufty XML parser or finding some opportunity to cache computation. These optimisations will be easier to quantify and don't require you to dirty up your codebase with noise.
The Oracle docs point out "Assign null to Variables That Are No Longer Needed": https://docs.oracle.com/cd/E19159-01/819-3681/abebi/index.html
"It depends"
I do not know about Java, but in .NET (C#, VB.NET...) it is usually not required to assign null when you no longer require an object.
However, note that it is "usually not required".
By analyzing your code, the .NET compiler makes a good estimate of the lifetime of the variable... to accurately tell when the object is no longer being used. So if you write obj = null, it might actually look as if obj is still being used... in which case it is counterproductive to assign null.
There are a few cases where it might actually help to assign null. One example is when you have a huge body of code that runs for a long time, or a method running in a different thread, or some loop. In such cases it might help to assign null so that it is easy for the GC to know it is no longer being used.
There is no hard and fast rule for this. Going by the above, place null-assignments in your code and run a profiler to see if they help in any way. Most probably you will not see a benefit.
If it is .NET code you are trying to optimize, then my experience has been that taking good care with Dispose and Finalize methods is actually more beneficial than bothering about nulls.
Some references on the topic:
http://blogs.msdn.com/csharpfaq/archive/2004/03/26/97229.aspx
http://weblogs.asp.net/pwilson/archive/2004/02/20/77422.aspx
In the future execution of your program, the values of some data members will be used to compute an output visible outside the program. Others might or might not be used, depending on future (and impossible to predict) inputs to the program. Other data members might be guaranteed not to be used. All resources, including memory, allocated to that unused data are wasted. The job of the garbage collector (GC) is to eliminate that wasted memory. It would be disastrous for the GC to eliminate something that was needed, so the algorithm used might be conservative, retaining more than the strict minimum. It might use heuristic optimizations to improve its speed, at the cost of retaining some items that are not actually needed. There are many potential algorithms the GC might use. Therefore it is possible that changes you make to your program, which do not affect its correctness, might nevertheless affect the operation of the GC, either making it run faster to do the same job, or making it identify unused items sooner. So this kind of change, setting an unused object reference to null, in theory is not always voodoo.
Is it voodoo? There are reportedly parts of the Java library code that do this. The writers of that code are much better than average programmers and either know, or cooperate with, programmers who know details of the garbage collector implementations. So that suggests there is sometimes a benefit.
As you said, there are optimizations, i.e. the JVM knows the place where the variable was last used, and the object referenced by it can be GC'd right after that last point (while still executing in the current scope). So nulling out references in most cases does not help the GC.
But it can be useful to avoid the "nepotism" (or "floating garbage") problem (read more here or watch the video). The problem exists because the heap is split into Old and Young generations with different GC mechanisms applied: Minor GC (which is fast and happens often, to clean the Young gen) and Major GC (which causes a longer pause, to clean the Old gen). "Nepotism" does not allow garbage in the Young gen to be collected if it is referenced by garbage that was already tenured to the Old gen.
This is 'pathological' because ANY promoted node will result in the promotion of ALL following nodes until a GC resolves the issue.
To avoid nepotism it's a good idea to null out references from an object which is supposed to be removed. You can see this technique applied in JDK classes: LinkedList and LinkedHashMap
private E unlinkFirst(Node<E> f) {
    final E element = f.item;
    final Node<E> next = f.next;
    f.item = null;
    f.next = null; // help GC
    // ...
}
As we know, when memory is moved into the CPU's L caches it is moved in whole cache lines, hence the whole family of cache-striding performance optimizations...
Well, in Java, when we define an array, the JVM guarantees that memory for the elements is allocated sequentially. However, if we have an array of references, those references can point to random places in memory.
My question is: does Java allocate the actual objects' memory sequentially? What optimizations do we have under the hood for this?
For example, if we declare an int[], we can be confident the values are actually sequential in memory; but if we define a NewType (like a struct) that has two int fields in it, and declare a NewType[], will Java figure this out and keep the actual memory sequential, or not?
My question is does java allocate actual objects memory sequentially?
This is not guaranteed, but most of the time the OpenJDK/Oracle JVM does. Some of the times it doesn't are:
when you allocate a large object in tenured space,
your TLAB is full and you need to get another one.
However, within the TLAB, it just allocates sequentially in memory.
declare NewType[] will java figure out and keep actual memory sequentially or not?
Java doesn't figure out anything, nor does it go out of its way to allocate objects randomly in memory. In general, each new object will be placed immediately after the last one.
but if we define a NewType (like struct) that has two int fields in it, and declare NewType[] will java figure out and keep actual memory sequentially or not?
In this scenario java is not very cache-friendly because apart from primitive types java arrays are not packed data structures, they are arrays of references pointing to objects allocated elsewhere in memory.
I.e. there will be at least one level of indirection from the array to the object itself. This problem is often referred to as "pointer chasing".
I.e. usually the memory layout will look like this:
HlRRRRRRRRRRRRRRRRRRRRRRRRR0HR0iii0HR0iii0HR0iii0HR0iii0HR0iii0HR0iii0HR0iii0
Array | Obj | Obj | Obj | Obj | Obj | Obj | Obj |
H = object header
l = array length
R = reference
i = int
0 = various types of padding
You can use jol to inspect the memory layout of objects.
The JDK devs are working on value types as part of Project Valhalla, which will eventually allow packed arrays to exist; this may be needed as part of Project Panama, but it is still far off in the future.
In the meantime there are 3rd-party projects aiming to provide similar features:
https://github.com/ObjectLayout/ObjectLayout
https://github.com/RichardWarburton/packed-objects-experiments
Other projects either use off-heap storage (e.g. via sun.misc.Unsafe) or views on ByteBuffer / byte[] arrays to create packed, cache-friendly data structures at the expense of more complicated APIs.
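As an illustration of the ByteBuffer-view approach mentioned above, here is a minimal sketch (the PackedPoints class is hypothetical, not from any library): it stores (x, y) float pairs back to back in a single buffer, so there is no per-element object header and iteration walks contiguous memory instead of chasing references.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical packed structure: (x, y) float pairs laid out back to back,
// trading a flat API for a cache-friendly layout.
public class PackedPoints {
    private static final int STRIDE = 2 * Float.BYTES; // 8 bytes per point
    private final ByteBuffer buf;

    public PackedPoints(int capacity) {
        // allocateDirect = off-heap; ByteBuffer.allocate would be on-heap but equally packed
        buf = ByteBuffer.allocateDirect(capacity * STRIDE).order(ByteOrder.nativeOrder());
    }

    public void set(int i, float x, float y) {
        buf.putFloat(i * STRIDE, x);               // absolute put: no position bookkeeping
        buf.putFloat(i * STRIDE + Float.BYTES, y);
    }

    public float x(int i) { return buf.getFloat(i * STRIDE); }
    public float y(int i) { return buf.getFloat(i * STRIDE + Float.BYTES); }

    public static void main(String[] args) {
        PackedPoints pts = new PackedPoints(3);
        pts.set(0, 1.5f, 2.5f);
        pts.set(2, -4f, 8f);
        System.out.println(pts.x(0) + " " + pts.y(2)); // prints 1.5 8.0
    }
}
```

The "more complicated API" cost is visible here: callers get copies of field values rather than references to a Point object.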
There are cases when one needs a memory-efficient way to store lots of objects. To do that in Java you are forced to use several primitive arrays (see below why) or one big byte array, which costs a bit of CPU overhead for converting.
Example: you have a class Point { float x; float y; }. Now you want to store N points in an array, which would take at least N * 8 bytes for the floats and N * 4 bytes for the references on a 32-bit JVM. So at least 1/3 is garbage (not counting the normal object overhead here). But if you stored this in two float arrays, all would be fine.
My question: Why does Java not optimize the memory usage for arrays of references? I mean why not directly embed the object in the array like it is done in C++?
E.g. marking the class Point final should be sufficient for the JVM to know the maximum length of the data for the Point class. Or where would this go against the specification? Also, this would save a lot of memory when handling large n-dimensional matrices etc.
Update:
I would like to know whether the JVM could theoretically optimize it (e.g. behind the scenes) and under which conditions, not whether I can force the JVM somehow. I think the second point of the conclusion is the reason it cannot be done easily, if at all.
Conclusions what the JVM would need to know:
The class needs to be final to let the JVM guess the length of one array entry
The array needs to be read-only. Of course you can change the values, like Point p = arr[i]; p.setX(i), but you cannot write to the array via inlineArr[i] = new Point(). Otherwise the JVM would have to introduce copy semantics, which would be against the "Java way". See aroth's answer.
How to initialize the array (calling the default constructor or leaving the members initialized to their default values).
Java doesn't provide a way to do this because it's not a language-level choice to make. C, C++, and the like expose ways to do this because they are system-level programming languages where you are expected to know system-level features and make decisions based on the specific architecture that you are using.
In Java, you are targeting the JVM. The JVM doesn't specify whether or not this is permissible (I'm making an assumption that this is true; I haven't combed the JLS thoroughly to prove that I'm right here). The idea is that when you write Java code, you trust the JIT to make intelligent decisions. That is where the reference types could be folded into an array or the like. So the "Java way" here would be that you cannot specify if it happens or not, but if the JIT can make that optimization and improve performance it could and should.
I am not sure whether this optimization in particular is implemented, but I do know that similar ones are: for example, objects allocated with new are conceptually on the "heap", but if the JVM notices (through a technique called escape analysis) that the object is method-local it can allocate the fields of the object on the stack or even directly in CPU registers, removing the "heap allocation" overhead entirely with no language change.
Update for updated question
If the question is "can this be done at all", I think the answer is yes. There are a few corner cases (such as null pointers) but you should be able to work around them. For null references, the JVM could convince itself that there will never be null elements, or keep a bit vector as mentioned previously. Both of these techniques would likely be predicated on escape analysis showing that the array reference never leaves the method, as I can see the bookkeeping becoming tricky if you try to e.g. store it in an object field.
The scenario you describe might save on memory (though in practice I'm not sure it would even do that), but it probably would add a fair bit of computational overhead when actually placing an object into an array. Consider that when you do new Point() the object you create is dynamically allocated on the heap. So if you allocate 100 Point instances by calling new Point() there is no guarantee that their locations will be contiguous in memory (and in fact they will most likely not be allocated to a contiguous block of memory).
So how would a Point instance actually make it into the "compressed" array? It seems to me that Java would have to explicitly copy every field in Point into the contiguous block of memory that was allocated for the array. That could become costly for object types that have many fields. Not only that, but the original Point instance is still taking up space on the heap, as well as inside of the array. So unless it gets immediately garbage-collected (I suppose any references could be rewritten to point at the copy that was placed in the array, thereby theoretically allowing immediate garbage-collection of the original instance) you're actually using more storage than you would be if you had just stored the reference in the array.
Moreover, what if you have multiple "compressed" arrays and a mutable object type? Inserting an object into an array necessarily copies that object's fields into the array. So if you do something like:
Point p = new Point(0, 0);
Point[] compressedA = {p}; //assuming 'p' is "optimally" stored as {0,0}
Point[] compressedB = {p}; //assuming 'p' is "optimally" stored as {0,0}
compressedA[0].setX(5);
compressedB[0].setX(1);
System.out.println(p.x);
System.out.println(compressedA[0].x);
System.out.println(compressedB[0].x);
...you would get:
0
5
1
...even though logically there should only be a single instance of Point. Storing references avoids this kind of problem, and also means that in any case where a nontrivial object is being shared between multiple arrays your total storage usage is probably lower than it would be if each array stored a copy of all of that object's fields.
Isn't this tantamount to providing trivial classes such as the following?
class Fixed {
    float[] hiddenArr;

    Point pointArray(int position) {
        return new Point(hiddenArr[position*2], hiddenArr[position*2+1]);
    }
}
Also, it's possible to implement this without making the programmer explicitly state that they'd like it; the JVM is already aware of "value types" (POD types in C++); ones with only other plain-old-data types inside them. I believe HotSpot uses this information during stack elision, no reason it couldn't do it for arrays too?
We have a big class with 68 int members, 22 double members, and also 4 members that are objects.
e.g
class A {
    public int i1;
    public int i2;
    public int i3;
    ....
    public Order order1;
    public Order order2;
    ...
    public double..
}
1: Is the memory of i1, i2, i3 physically contiguous?
2: For class A, does it store pointers to order1 & order2, or does it store the contents of order1 & order2?
There is another class B which has a member that is an array of A, with 365 elements. So the memory for B could be very large. My concern is that if the size of B is too huge, we can get lots of L2 cache misses and degrade performance. We will mainly sum the value of i1, sum the value of i2, sum the value of i3, etc.
E.g. if we sum i1 over all 365 A's, the i1 fields of those 365 A's will not sit contiguously in memory. So we could get cache misses and poor performance.
I am thinking of keeping class B but removing class A, moving all the elements inside A into B, so we get:
class B {
    public int[] array_of_i1;
    public int[] array_of_i2;
    ..
}
In this way, when I calculate the sum of i1 or i2, all the i1 or i2 values sit together, so maybe we get a performance improvement?
As the class is huge, I'd like to look for your opinions before the change.
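The proposed refactoring can be sketched like this (the class name BSoA and the two-field subset are illustrative, not the asker's real class): B drops the array of A references and instead keeps one primitive array per field, so each sum scans a single contiguous int[].

```java
// Illustrative structure-of-arrays sketch: one primitive array per former
// field of A, so summing a field is a sequential scan of contiguous memory.
public class BSoA {
    final int[] i1;
    final int[] i2; // ...one array per former int/double field of A

    public BSoA(int days) {
        i1 = new int[days];
        i2 = new int[days];
    }

    public long sumI1() {
        long sum = 0;
        for (int v : i1) sum += v; // sequential scan: each cache line is fully used
        return sum;
    }

    public static void main(String[] args) {
        BSoA b = new BSoA(365);
        for (int d = 0; d < 365; d++) b.i1[d] = d;
        System.out.println(b.sumI1()); // prints 66430
    }
}
```

Whether this actually beats the array-of-objects layout should still be measured with a profiler, as the answers below stress.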
It's generally consecutive but it depends on which JVM you are using.
One complication is that the runtime in-memory structure of Java objects is not enforced by the virtual machine specification, which means that virtual machine providers can implement them as they please. The consequence is that you can write a class, and instances of that class in one VM can occupy a different amount of memory than instances of that same class when run in another VM.
As for the specific layout: in order to save some memory, the Sun VM doesn't lay out an object's attributes in the same order they are declared. Instead, the attributes are organized in memory in the following order:
doubles and longs
ints and floats
shorts and chars
booleans and bytes
references
(from http://www.codeinstructions.com/2008/12/java-objects-memory-structure.html)
He also includes how inherited classes are handled.
The JLS doesn't strongly specify the exact sizes of objects, so this can vary between JVM implementations (though you can infer some lower bounds, i.e. an integer must be at least 32 bits).
In Sun's JVM however, integers take 32 bits, doubles take 64 bits and object references take 32 bits (unless you're running on a 64-bit JVM and pointer compression is disabled). Then the object itself has a 2 word header, and the overall memory size is aligned to a multiple of 8 bytes.
So overall this object should take 8 * ceil((8 + 68 * 4 + 22 * 8 + 4 * 4) / 8) = 472 bytes, if I haven't forgotten to account for something (which is entirely possible), and if you're running on a 32-bit machine.
But - as stated above, you shouldn't really rely too strongly on this as it's not specified anywhere, and will vary between implementations and on different platforms. As always with performance-related metrics, the key is to write clean code, measure the impact (in this case use a profiler to look at memory usage, and execution time) and then optimise as required.
Performance only really matters from the macro perspective; worrying about L2 cache misses when designing your object model is really approaching it the wrong way round.
(And a class with 94 fields is almost certainly not a clean design, so you're right to consider refactoring it...)
Firstly, before you embark on any work, have you profiled your application? Are cache misses actually causing a bottleneck?
What are your performance requirements? (Note: 'as fast as possible' isn't a requirement.)
That would be implementation dependent.
Yes, it stores pointers. The objects will reside elsewhere.
In general, yes. But I don't think you necessarily want to depend on it. Java is the wrong language for that kind of low-level concern.
Pointers, but I'm not sure why that matters.
Profile before making significant changes for performance reasons. I think the second is cleaner though. Wouldn't you rather do a simple array loop for your summing?
Or you could change the structure to use a smaller class, keeping the stuff that runs in a tight loop together will tend to improve cache hits (iff that is your performance bottleneck).
Does assigning an unused object reference to null in Java improve the garbage collection process in any measurable way?
My experience with Java (and C#) has taught me that is often counter intuitive to try and outsmart the virtual machine or JIT compiler, but I've seen co-workers use this method and I am curious if this is a good practice to pick up or one of those voodoo programming superstitions?
Typically, no.
But like all things: it depends. The GC in Java these days is VERY good and everything should be cleaned up very shortly after it is no longer reachable. This is just after leaving a method for local variables, and when a class instance is no longer referenced for fields.
You only need to explicitly null if you know it would remain referenced otherwise. For example an array which is kept around. You may want to null the individual elements of the array when they are no longer needed.
For example, this code from ArrayList:
public E remove(int index) {
    RangeCheck(index);

    modCount++;
    E oldValue = (E) elementData[index];

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index + 1, elementData, index,
                         numMoved);
    elementData[--size] = null; // Let gc do its work

    return oldValue;
}
Also, explicitly nulling an object will not cause an object to be collected any sooner than if it just went out of scope naturally as long as no references remain.
Both:
void foo() {
    Object o = new Object();
    // do stuff with o
}

and:

void foo() {
    Object o = new Object();
    // do stuff with o
    o = null;
}

are functionally equivalent.
In my experience, more often than not, people null out references out of paranoia, not out of necessity. Here is a quick guideline:
If object A references object B and you no longer need this reference and object A is not eligible for garbage collection then you should explicitly null out the field. There is no need to null out a field if the enclosing object is getting garbage collected anyway. Nulling out fields in a dispose() method is almost always useless.
There is no need to null out object references created in a method. They will get cleared automatically once the method terminates. The exception to this rule is if you're running in a very long method or some massive loop and you need to ensure that some references get cleared before the end of the method. Again, these cases are extremely rare.
I would say that the vast majority of the time you will not need to null out references. Trying to outsmart the garbage collector is useless. You will just end up with inefficient, unreadable code.
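As an illustration of the first guideline, here is a hedged sketch (the class and field names are made up) of a long-lived object nulling out a large field it no longer needs: the enclosing object stays reachable, so the field would otherwise keep the big buffer alive.

```java
// Hypothetical sketch: the holder object outlives the large buffer's
// usefulness, so the field is nulled explicitly once the work is done.
public class ReportHolder {
    private byte[] rawData; // large intermediate data
    private int summary;    // small result we actually keep

    public ReportHolder(byte[] rawData) {
        this.rawData = rawData;
    }

    public void summarize() {
        int total = 0;
        for (byte b : rawData) {
            total += b;
        }
        summary = total;
        // The holder itself remains reachable, so drop the big reference;
        // otherwise rawData stays reachable as long as this object does.
        rawData = null;
    }

    public int getSummary() {
        return summary;
    }
}
```

If `ReportHolder` itself were about to become garbage, nulling the field would buy nothing, which is exactly the point of the guideline above.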
A good article on this topic is today's Coding Horror post.
The way GCs work is by looking for objects that do not have any pointers to them; the area of their search is the heap/stack and any other root spaces they have. So if you set a variable to null, the actual object is now not pointed to by anyone, and hence could be GC'd.
But since the GC might not run at that exact instant, you might not actually be buying yourself anything. But if your method is fairly long (in terms of execution time) it might be worth it since you will be increasing your chances of GC collecting that object.
The problem can also be complicated by code optimizations: if you never use the variable after you set it to null, it would be a safe optimization to remove the line that sets the value to null (one less instruction to execute). So you might not actually be getting any improvement.
So in summary, yes it can help, but it will not be deterministic.
At least in java, it's not voodoo programming at all. When you create an object in java using something like
Foo bar = new Foo();
you do two things: first, you create a reference to an object, and second, you create the Foo object itself. As long as that reference (or another one) exists, the specific object can't be GC'd. However, when you assign null to that reference...
bar = null;
and assuming nothing else has a reference to the object, it's freed and available for gc the next time the garbage collector passes by.
It depends.
Generally speaking, the shorter you keep references to your objects, the faster they'll get collected.
If your method takes, say, 2 seconds to execute and you don't need an object anymore after one second of method execution, it makes sense to clear any references to it. If the GC sees that your object is still referenced after one second, the next time it might check it in a minute or so.
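That situation can be sketched as follows, assuming a method whose remaining work takes much longer than the part that needs the large object (the `Thread.sleep` is a stand-in for the slow remainder; all names are illustrative):

```java
// Hypothetical sketch: clear the reference right after its last use so the
// large array is unreachable during the rest of a long-running method.
public class LongTask {
    static int run() {
        byte[] big = new byte[1024 * 1024]; // pretend this is huge
        int checksum = 0;
        for (byte b : big) {
            checksum += b;
        }
        big = null; // done with it; eligible for GC during the work below
        try {
            Thread.sleep(10); // stand-in for the slow remainder of the method
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return checksum;
    }
}
```

Note that, as another answer in this thread points out, a JIT may already treat the reference as dead after its last use, so the explicit null is only potentially useful.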
Anyway, setting all references to null by default is, to me, premature optimization, and nobody should do it except in specific rare cases where it measurably decreases memory consumption.
Explicitly setting a reference to null, instead of just letting the variable go out of scope, does not help the garbage collector, unless the object held is very large, in which case setting it to null as soon as you are done with it is a good idea.
Generally, setting references to null signals to the READER of the code that this object is completely done with and should not be a concern any more.
A similar effect can be achieved by introducing a narrower scope by putting in an extra set of braces
{
    int l;
    { // <- here
        String bigThing = ....;
        l = bigThing.length();
    } // <- and here
}
this allows the bigThing to be garbage collected right after leaving the nested braces.
public class JavaMemory {
    private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6);

    public void f() {
        {
            byte[] data = new byte[dataSize];
            //data = null;
        }
        byte[] data2 = new byte[dataSize];
    }

    public static void main(String[] args) {
        JavaMemory jmp = new JavaMemory();
        jmp.f();
    }
}
The above program throws an OutOfMemoryError. If you uncomment data = null;, the OutOfMemoryError goes away. It is always good practice to set an unused variable to null.
I was working on a video conferencing application one time and noticed a huge, huge, huge difference in performance when I took the time to null references as soon as I didn't need the object anymore. This was in 2003-2004, and I can only imagine the GC has gotten even smarter since. In my case I had hundreds of objects coming into and going out of scope every second, so I noticed the GC when it kicked in periodically. However, after I made it a point to null objects, the GC stopped pausing my application.
So it depends on what you're doing...
Yes.
From "The Pragmatic Programmer" p.292:
By setting a reference to NULL you reduce the number of pointers to the object by one ... (which will allow the garbage collector to remove it)
I assume the OP is referring to things like this:
private void Blah()
{
    MyObj a;
    MyObj b;

    try {
        a = new MyObj();
        b = new MyObj();
        // do real work
    } finally {
        a = null;
        b = null;
    }
}
In this case, wouldn't the VM mark them for GC as soon as they leave scope anyway?
Or, from another perspective, would explicitly setting the items to null cause them to get GC'd sooner than if they just went out of scope? If so, the VM may spend time GC'ing the objects when the memory isn't needed yet, which would actually cause worse CPU performance, because it would be doing GC work earlier than necessary.
Even if nullifying the reference were marginally more efficient, would it be worth the ugliness of having to pepper your code with these nullifications? They would only be clutter and would obscure the intent of the code that contains them.
It's a rare codebase that has no better candidate for optimisation than trying to outsmart the garbage collector (rarer still are the developers who succeed in outsmarting it). Your efforts will most likely be better spent elsewhere instead, ditching that crufty XML parser or finding some opportunity to cache computation. These optimisations will be easier to quantify and don't require you to dirty up your codebase with noise.
Oracle doc point out "Assign null to Variables That Are No Longer Needed" https://docs.oracle.com/cd/E19159-01/819-3681/abebi/index.html
"It depends"
I do not know about Java, but in .NET (C#, VB.NET...) it is usually not required to assign null when you no longer require an object.
However note that it is "usually not required".
By analyzing your code, the .NET compiler makes a good evaluation of the lifetime of a variable... to accurately tell when the object is no longer being used. So if you write obj = null, it might actually look as if obj is still being used... in this case it is counterproductive to assign null.
There are a few cases where it might actually help to assign null. One example is a huge method that runs for a long time, a method that is running in a different thread, or some loop. In such cases it might help to assign null so that it is easy for the GC to know it's not being used anymore.
There is no hard-and-fast rule for this. Going by the above, place null-assigns in your code and run a profiler to see if it helps in any way. Most probably you will not see a benefit.
If it is .net code you are trying to optimize, then my experience has been that taking good care with Dispose and Finalize methods is actually more beneficial than bothering about nulls.
Some references on the topic:
http://blogs.msdn.com/csharpfaq/archive/2004/03/26/97229.aspx
http://weblogs.asp.net/pwilson/archive/2004/02/20/77422.aspx
In the future execution of your program, the values of some data members will be used to compute an output visible externally to the program. Others might or might not be used, depending on future (and impossible-to-predict) inputs to the program. Other data members might be guaranteed not to be used. All resources, including memory, allocated to that unused data are wasted. The job of the garbage collector (GC) is to eliminate that wasted memory. It would be disastrous for the GC to eliminate something that was needed, so the algorithm used might be conservative, retaining more than the strict minimum. It might use heuristic optimizations to improve its speed, at the cost of retaining some items that are not actually needed. There are many potential algorithms the GC might use. Therefore it is possible that changes you make to your program, which do not affect its correctness, might nevertheless affect the operation of the GC, either making it run faster to do the same job or identifying unused items sooner. So this kind of change, setting an unused object reference to null, in theory is not always voodoo.
Is it voodoo? There are reportedly parts of the Java library code that do this. The writers of that code are much better than average programmers and either know, or cooperate with, programmers who know details of the garbage collector implementations. So that suggests there is sometimes a benefit.
As you said, there are optimizations, i.e. the JVM knows the place where the variable was last used, and the object referenced by it can be GC'ed right after that last point (while still executing in the current scope). So nulling out references in most cases does not help the GC.
But it can be useful to avoid the "nepotism" (or "floating garbage") problem (read more here or watch the video). The problem exists because the heap is split into Old and Young generations and different GC mechanisms are applied: Minor GC (which is fast and happens often, to clean the young gen) and Major GC (which causes a longer pause, to clean the Old gen). "Nepotism" prevents garbage in the Young gen from being collected if it is referenced by garbage that was already tenured to the Old gen.
This is 'pathological' because ANY promoted node will result in the promotion of ALL following nodes until a GC resolves the issue.
To avoid nepotism it's a good idea to null out references from an object which is supposed to be removed. You can see this technique applied in JDK classes: LinkedList and LinkedHashMap
private E unlinkFirst(Node<E> f) {
    final E element = f.item;
    final Node<E> next = f.next;
    f.item = null;
    f.next = null; // help GC
    // ...
}