Java: Clarification needed on API doc for Reference Objects


I get the gist of reference objects in Java, and the basic differences between soft, weak, and phantom reference objects.
However, I don't fully understand the following points from the API docs
From the API doc for WeakReference<T>:
"Weak reference objects, which do not prevent their referents from being made finalizable, finalized, and then reclaimed."
Now, the terms "finalizable", "finalized", and "reclaimed" (in bold in my original quote) haven't been explained anywhere in the API docs, so I wonder what they precisely mean, especially in relation to the more or less deprecated Object.finalize() method's notion of finalization.
From the API doc for Reference<T>:
public void clear(): "This method is invoked only by Java code; when the garbage collector clears references it does so directly, without invoking this method."
public boolean enqueue(): "This method is invoked only by Java code; when the garbage collector enqueues references it does so directly, without invoking this method."
Again, I don't know what is meant by "Java code" in the above two quotes: The JVM internal code to which I have no access? Or the JDK code to which I have read-only/browsing access? Or the end-user's own Java code?
The "directly, without invoking this method" part tells me that the JVM has no need to call these methods. On the other hand, the "only by Java code" part tells me that it is not the end-user's Java code but rather the JVM's (if it meant end-user code, then we'd be finding this phrase littered throughout the API docs for almost every method of every Java class!). So which interpretation is right, and who can call these methods?

"Weak reference objects, which do not prevent their referents from being made finalizable, finalized, and then reclaimed."
These are all stages in the Garbage Collection process. Objects first get marked as finalizable to say that there are no strong references to them. Then finalize() is called and they are marked as finalized, and then finally the memory is reclaimed.
public void clear(): "This method is invoked only by Java code; when the garbage collector clears references it does so directly, without invoking this method."
This is saying that when you as a programmer decide to clear a reference then the clear() method is used to do that, however if you were to subclass WeakReference and override the clear method you would NOT see the JVM calling that method when the object was removed.
The quote for enqueue is essentially saying the same thing. It is a warning that you cannot interact with the workings of the GC by overriding these methods.
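To make the point about overriding concrete, here is a minimal sketch. CountingWeakReference and ClearDemo are hypothetical names made up for illustration, and System.gc() is only a hint, so whether the second reference actually gets cleared depends on the JVM. What does not vary is that the collector never goes through the overridden clear().

```java
import java.lang.ref.WeakReference;

// Hypothetical subclass that counts how often clear() is invoked.
class CountingWeakReference<T> extends WeakReference<T> {
    int clearCalls = 0;

    CountingWeakReference(T referent) {
        super(referent);
    }

    @Override
    public void clear() {
        clearCalls++;   // runs only when Java code calls clear()
        super.clear();
    }
}

class ClearDemo {
    public static void main(String[] args) {
        Object referent = new Object();
        CountingWeakReference<Object> explicit = new CountingWeakReference<>(referent);
        explicit.clear();                        // an explicit call from Java code
        System.out.println(explicit.clearCalls); // 1
        System.out.println(explicit.get());      // null

        CountingWeakReference<Object> byGc = new CountingWeakReference<>(new Object());
        System.gc(); // only a hint; the collector may or may not run
        // Whether or not byGc.get() is null by now, the override was not used.
        System.out.println(byGc.clearCalls);     // 0
    }
}
```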

"finalizable, finalized, and then reclaimed." means garbage collected.
"only by java code" means called from your program itself (including the JDK) - i.e. you may have some code somewhere that calls ref.clear();. It also explains that the GC (i.e. the JVM) does effectively clear the reference but with a different mechanism that does not call the clear method. For example, if you override clear to be a no-op, the GC will still be able to "nullify" the reference.

Related

Is there a way to receive object, without having reference to it?

Suppose following code:
Object obj = new Object();
obj = null;
At this point, I don't have any reference to this object, but it's still on the heap, because garbage collection doesn't happen instantly. Is there a way to re-obtain a reference to this object before it is collected by the GC?
The only possible way that I have seen so far is to use Unsafe, which provides direct memory access, but I would need to know exactly where in memory the object is allocated. Also, there are Weak/SoftReference, but they are implemented by special GC behavior.
P.S. To predict questions like "Why do you need it?" - Because science is not about why, it's about why not! (c)

This is highly JVM implementation specific. In a naive implementation having memory allocation information associated with each object, you could find an object whose memory has not been freed yet, and it seems you are thinking in that direction.
However, sophisticated JVMs don’t work that way. Associating allocation information with each object would create a giant overhead, given that you may have millions of objects in your runtime. Not only regarding memory requirements, but also regarding the amount of work that has to be done to maintain this information when allocating or freeing an object.
So what makes a part of your heap memory an object? Only the reference you are holding to it. The garbage collector traverses existing references and within the objects found this way, it will find meta information (i.e. a pointer to class specific information) needed to understand how much memory belongs to the object and how to interpret the contained data (to traverse the sub-references, if any). Everything unreferenced is unused per se and might contain old objects or might have never been used at all, who knows. Once all references to an object are gone, there is no information left about the former existence of this object.
Getting to the point, there is no explicit freeing action. When the garbage collector has found surviving objects, they will be copied to a dedicated new place and their old place is considered to be free, regardless of how many objects there were before and how much memory each individual object occupied when it was alive.
When you search memory that is considered to be unused, you may find remnants of old objects, but without references to their starting points, it’s impossible to say whether a bit pattern that looks like an object really is a dead object or just a coincidence. Even if you managed to resurrect an object that way, it would have nothing to do with your original idea of being able to recover a reference because the gc hasn't run yet.
Note that all modifications to this ordinary life time work by holding another reference to the object. E.g., when the class defines a non-trivial finalize() method, the JVM has to add a reference to the queue of objects needing finalization. Similarly, soft, weak and phantom references encapsulate a reference to the object in question. Also a debugger may keep a reference to an object, once it has seen it.
But for your trivial code Object obj = new Object(); obj = null;, assuming there’s no breakpoint set in-between, there will be no additional reference and hence, no way of resurrecting the object. A JVM may even elide the entire allocation when optimizing the code at runtime. So then you wouldn’t even find remnants of the object in RAM when searching for it, as the object effectively never existed.
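As a rough illustration of "holding another reference", the sketch below keeps a WeakReference next to the strong one. ReobtainDemo and tryReobtain are hypothetical names. After obj = null the result of get() is not predictable: it yields the object if the collector has not cleared the reference yet, and null otherwise.

```java
import java.lang.ref.WeakReference;

class ReobtainDemo {
    // Returns the referent if the reference has not been cleared, else null.
    static <T> T tryReobtain(WeakReference<T> weak) {
        return weak.get();
    }

    public static void main(String[] args) {
        Object obj = new Object();
        WeakReference<Object> weak = new WeakReference<>(obj);
        obj = null; // the only strong reference is gone

        Object again = tryReobtain(weak); // same object, or null after a GC cycle
        System.out.println(again != null ? "re-obtained" : "already cleared");
    }
}
```

This only works because the weak reference was created while the object was still strongly reachable; it does not contradict the answer above.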

At this point, I don't have any reference to this object, but it's still on the heap, because garbage collection doesn't happen instantly.
It is undefined where it is, and it is also undefined whether or not garbage collection happens instantly.
Is there a way to re-obtain a reference to this object before it is collected by the GC?
You already had one and you threw it away. Just keep it.
I would need to know exactly where in memory the object is allocated.
There is nothing in standard Java that will tell you that, and no useful way you could make use of the information if you could get it.
Also, there is Weak/SoftReference, but they are implemented by special GC behavior.
I don't see how this affects your question, whatever it is.

When a PhantomReference/SoftReference/WeakReference is queued, how do you know what it referred to?

I haven't used PhantomReferences. There seem to be very few good examples of real-world use.
When a phantom shows up in your queue, how do you know which object it is/was? The get() method appears to be useless. According to the JavaDoc,
Because the referent of a phantom reference is always inaccessible, this method always returns null.
I think that unless your object is a singleton, you always want to use a subclass of PhantomReference, in which you place whatever mementos you need in order to understand what died.
Is this correct, or did I miss something?
Is this also true for SoftReferences?
For WeakReferences?
Links to relevant examples of usage would be great.

I think that unless your object is a singleton, you always want to use a subclass of PhantomReference, in which you place whatever mementos you need in order to understand what died.
You could also use a Map<Reference<?>, SomeMetadataClassOrInterface> to recover whatever metadata you need. Since ReferenceQueue<T> returns a Reference<T>, you either have to cast it to whatever subclass of PhantomReference you expect, or let a Map<> do it for you.
For what it's worth, it looks like using PhantomReferences puts some burden on you:
Unlike soft and weak references, phantom references are not automatically cleared by the garbage collector as they are enqueued. An object that is reachable via phantom references will remain so until all such references are cleared or themselves become unreachable.
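so you'd have to clear() the references yourself in order for memory to be reclaimed. (Why having to do so is more useful than letting the JVM do it for you is beyond me. Note that the quoted text describes older releases: since Java 9, phantom references are cleared automatically when they are enqueued, like soft and weak references.)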
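A minimal sketch of the subclass-with-mementos idea follows. LabeledPhantom, PhantomDemo and describe are hypothetical names. In real use the collector enqueues the reference; here enqueue() is called by hand purely so the demo behaves the same on every run.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical subclass that remembers a label for its referent.
class LabeledPhantom<T> extends PhantomReference<T> {
    final String label;

    LabeledPhantom(T referent, ReferenceQueue<? super T> queue, String label) {
        super(referent, queue);
        this.label = label;
    }
}

class PhantomDemo {
    // Reads the memento from a dequeued reference, or returns null if the
    // reference is not one of ours.
    static String describe(Reference<?> ref) {
        return (ref instanceof LabeledPhantom) ? ((LabeledPhantom<?>) ref).label : null;
    }

    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object resource = new Object();
        LabeledPhantom<Object> ref = new LabeledPhantom<>(resource, queue, "resource #1");

        ref.enqueue(); // manual enqueue, for a deterministic demo only

        Reference<?> dequeued = queue.poll();
        System.out.println(describe(dequeued)); // resource #1
    }
}
```

The Map<Reference<?>, ...> variant works the same way, except that the label lives in the map instead of in a field of the reference.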

Your question has caused me to look into it a little more, and I found this very well written explanation and examples of all the reference types. He even talks about some (tenuous) uses of phantom references.
http://weblogs.java.net/blog/2006/05/04/understanding-weak-references

Can someone explain the difference between Strong, Soft, Weak and Phantom references and the usage of it? [duplicate]

I have been trying to understand the difference between the different reference types, but the theory alone does not help me visualize how they differ.
Could anyone please briefly explain the different references?
An example for each would be even better.

Another good article on the topic:
Java Reference Objects or How I Learned to Stop Worrying and Love OutOfMemoryError, with nice diagrams
Extract:
As you might guess, adding three new optional states to the object life-cycle diagram makes for a mess.
Although the documentation indicates a logical progression from strongly reachable through soft, weak, and phantom, to reclaimed, the actual progression depends on what reference objects your program creates.
If you create a WeakReference but don't create a SoftReference, then an object progresses directly from strongly-reachable to weakly-reachable to finalized to collected.
[Figure: object life-cycle, with reference objects]
It's also important to remember that not all objects are attached to reference objects — in fact, very few of them should be.
A reference object is a layer of indirection: you go through the reference object to reach the referred object, and clearly you don't want that layer of indirection throughout your code.
Most programs, in fact, will use reference objects to access a relatively small number of the objects that the program creates.
References and Referents
A reference object provides a layer of indirection between your program code and some other object, called the referent.
Each reference object is constructed around its referent, and provides a get() method to access the referent. Once you create a reference, you cannot change its referent. Once the referent has been collected, the get() method returns null.
[Figure: relationships between application code, soft/weak reference, and referent]
Even more examples: Java Programming: References' Package
[Figure: reference types, cases 1 to 6]
Case 1: This is the regular case where Object is said to be strongly reachable.
Case 2: There are two paths to Object, so the strongest one is chosen, which is the one with the strong reference hence the object is strongly reachable.
Case 3: Once again there are two paths to the Object, the strongest one is the Weak Reference (since the other one is a Phantom Reference), so the object is said to be weakly reachable.
Case 4: There is only one path and the weakest link is a weak reference, so the object is weakly reachable.
Case 5: Only one path and the weakest link is the phantom reference hence the object is phantomly reachable.
Case 6: There are now two paths and the strongest path is the one with a soft reference, so the object is now said to be softly reachable.

An article explaining these types of references (including examples): Understanding Weak References - weblogs.java.net (From the Web Archive)

There's a really simple rule:
strongly-referenced objects are standard bits of code like Object a = new Object(). The referenced Objects are not garbage as long as the reference (a, above) is "reachable". Hence anything which has no reachable strong reference can be deemed garbage.
So then we look at the non-strong reference types:
weakly-referenced objects will probably get collected by the JVM as soon as they become eligible for GC (and the WeakReference cleared). A weak reference to a would look like new WeakReference<Object>(a). Weak references are useful when you want a cache whose data is only needed while the keys are strongly reachable elsewhere (e.g. HttpSessions).
softly-referenced objects will probably hang around in the JVM until it absolutely needs to recover memory. Soft references are useful for caches where the values are long-lived but can be collected if necessary.
I'm never too sure about phantom ones!
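Here is a small sketch of the soft-reference cache idea, assuming a hypothetical SoftCache class and a loader function that can recompute a value on demand. It is not thread-safe and never removes stale map entries, so treat it as an illustration only.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache whose values are held only through soft references.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();

    V get(K key, Function<K, V> loader) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Never cached, or cleared by the collector under memory pressure.
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }
}

class SoftCacheDemo {
    public static void main(String[] args) {
        SoftCache<String, String> cache = new SoftCache<>();
        String first = cache.get("k", k -> k + "!");
        String second = cache.get("k", k -> k + "!");
        // 'first' is still strongly held here, so the cached value was reused.
        System.out.println(first == second); // true
    }
}
```

Swapping SoftReference for WeakReference gives the weak variant, where a value can disappear as soon as nothing else holds it strongly.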

Why is the finalize() method in java.lang.Object "protected"?

Out of curiosity,
Why is the finalize() method's access modifier protected? Why can't it be public? Can someone explain the specific reason behind this?
Also, I came to know that the finalize() method is called only once. If I call it twice in my program, what happens internally? Will the garbage collector call it again?
private void dummyCall() {
    try {
        finalize();
        finalize();
    } catch (Throwable e) {
        e.printStackTrace(); // never reached: no exception is thrown
    }
}

I answer your question with another question:
Why shouldn't the finalize method be protected?
In general, you should try to keep things as private as possible. That's what encapsulation is all about. Otherwise, you could make everything public. finalize can't be private (since derived classes should be able to access it to be able to override it), so it should at least be protected, but why give out more access when it's not desirable?
After reading your comment more carefully, I guess I see your main point now. I think your point is since everything derives from java.lang.Object and consequently accesses its protected members, it wouldn't make any difference for it (or any method in java.lang.Object for that matter) to be public as opposed to protected. Personally, I'd count this as a design flaw in Java. This is indeed fixed in C#. The problem is not why finalize is protected. That's OK. The real issue is that you shouldn't be able to call protected methods in the base class through an object reference of the base class type. Eric Lippert has a blog entry discussing why allowing such kind of access to protected members is a bad idea which is further elaborated on Stack Overflow in this question.

Why is the finalize() method's access modifier protected? Why can't it be public?
It is not public because it shouldn't be invoked by anyone other than the JVM. However, it must be protected so that it can be overridden by subclasses that need to define behavior for it.
If I call it twice in my program, what happens internally?
You can call it all you want, it's just a method after all. However, much like public static void main(String [] args), it has special meaning to the JVM.
Will the garbage collector call it again?
Yes

finalize is meant to be called by the gc only and as such does not require public access
finalize is guaranteed to be called only once by the gc; calling it yourself will break this guarantee, as the gc won't know about it.
Any overriding class can make finalize public, which I believe is bad for the above reasons
finalize should not contain much code: an uncaught exception thrown by finalize is silently ignored and ends finalization of that object, and a slow or blocking finalize holds up the finalizer thread of the gc.
Rant against finalize()
Managing native resources, or any resource which requires dispose() or close() to be called, through finalize() may cause hard-to-find bugs, as the resources will only be released when the gc gets around to collecting the object (typically under memory pressure), if at all. You should release resources manually. Finalize should only be used for debugging resource leaks or for cases where managing resources manually is too much work.
finalize will be called in an additional thread of the gc and may cause problems with resource locking and so on.
the reference classes like WeakReference and ReferenceQueue are an alternative (rather complex) way to deal with cleanup and may have the same problems as finalize() for native resources.
Beware of errors in the above statements, I'm a bit tired :-)

Check out this link which discusses it.
Basically, it would make the most sense for it to be private, as it should only be called by the JVM (garbage collector). But in order to allow a subclass to call the parent finalize() method as part of its finalize(), it has to be protected.
(Edit - And just a general caution - use of the finalize() method is generally discouraged as there's no way of ensuring that it will ever be called. Although that doesn't mean that you'll never have occasion to use it - it's just rare.)

The part about finalize() being called only once applies only to the calls from the GC. You can imagine the object as having a hidden flag "finalize() was called by the GC", and the GC checking that flag to know what to do with the object. The flag is not impacted in any way by your own handmade calls to finalize().
On finalization, read this article from Hans Boehm (who is well-known for his work on garbage collection). This is an eye-opener about finalization; in particular, Boehm explains why finalization is necessarily asynchronous. A corollary is that while finalization is a powerful tool, it is very rarely the right tool for a given job.
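The "hidden flag" point can be checked with a tiny sketch: hand-made calls are ordinary method calls, so a counter in the overridden method simply goes up. FinalizeDemo is a hypothetical class, and finalize() is deprecated in current JDKs, hence the suppressed warnings.

```java
class FinalizeDemo {
    int finalizeCalls = 0;

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        finalizeCalls++; // no magic here, just a method body
    }

    @SuppressWarnings({"deprecation", "removal"})
    static int callTwice() {
        FinalizeDemo demo = new FinalizeDemo();
        demo.finalize(); // plain call number one
        demo.finalize(); // plain call number two
        return demo.finalizeCalls;
    }

    public static void main(String[] args) {
        System.out.println(callTwice()); // 2
    }
}
```

Neither call tells the GC anything, so if the object is later found unreachable, the GC can still run finalize() one more time on its own.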

It's not public (or default access) because it's meant to be called by the JVM internally when the object is garbage collected - it's not meant to be called by anything else. And it's not private because it's meant to be overridden and you can't override private methods.
If I call it twice in my program, what happens internally? Will the garbage collector call it again?
Probably yes, but it's hard to imagine a scenario where this would make any kind of sense - the point of finalize() is to do cleanup when an object is garbage collected. And it doesn't even do that well, so it's really something you should avoid altogether rather than experiment with.

finalize() is only used by the JVM to clean up resources when the object is collected. It's reasonable for a class to define what actions should be taken on collection, for which it may need to access super.finalize(). It doesn't really make sense for an outside process to call finalize(), since an outside process doesn't have control over when the object is collected.

Also, I came to know that the finalize() method is called only once. If I call it twice in my program, what happens internally?
You probably ask this under the impression of C++ destructors. In Java, the finalize() method doesn't do any magic (like clearing memory). It's supposed to be called by the garbage collector, but calling it does not trigger garbage collection.
I recommend that you read the corresponding chapter in Joshua Bloch's "Effective Java". It says that using finalizers is a bad practice and can cause performance and other issues, and that there are only a few cases where they should be used. The chapter begins with these words:
Finalizers are unpredictable, often dangerous, and generally unnecessary.

I think the reason why finalize is protected would be that maybe it's overridden by some classes in the JDK, and those overridden methods are called by JVM.

JNI memory management using the Invocation API

When I'm building a java object using JNI methods, in order to pass it in as a parameter to a java method I'm invoking using the JNI invocation API, how do I manage its memory?
Here's what I am working with:
I have a C object that has a destructor method that is more complex than free(). This C object is to be associated with a Java object, and once the application is finished with the Java object, I have no more need for the C object.
I am creating the Java object like so (error checking elided for clarity):
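c_object = c_object_create ();
class = (*env)->FindClass (env, "my/class/name"); /* JNI class names use '/' separators */
constructor = (*env)->GetMethodID (env, class, "<init>", "(J)V");
instance = (*env)->NewObject (env, class, constructor, (jlong) c_object);
/* object parameters are written as L<class>; in a method descriptor */
method = (*env)->GetMethodID (env, other_class, "doSomeWork", "(Lmy/class/name;)V");
/* note: CallVoidMethod expects an instance as receiver, not a jclass; for a static method use GetStaticMethodID and CallStaticVoidMethod */
(*env)->CallVoidMethod (env, other_class, method, instance);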
So, now that I'm done with instance, what do I do with it? Ideally, I'd like to leave the garbage collection up to the VM; when it's done with instance it would be fantastic if it also called c_object_destroy() on the pointer I provided to it. Is this possible?
A separate, but related question has to do with the scope of Java entities that I create in a method like this; do I have to manually release, say, class, constructor, or method above? The JNI doc is frustratingly vague (in my judgement) on the subject of proper memory management.

The JNI spec covers the issue of who "owns" Java objects created in JNI methods here. You need to distinguish between local and global references.
When the JVM makes a JNI call out to native code, it sets up a registry to keep track of all objects created during the call. Any object created during the native call (i.e. returned from a JNI interface function) is added to this registry. References to such objects are known as local references. When the native method returns to the JVM, all local references created during the native method call are destroyed. If you're making calls back into the JVM during a native method call, the local reference will still be alive when control returns back to the native method. If the JVM invoked from native code makes another call back into the native code, a new registry of local references is created, and the same rules apply.
(In fact, you can implement your own JVM executable (i.e. java.exe) using the JNI interface, by creating a JVM (thereby receiving a JNIEnv * pointer), looking up the class given on the command line, and invoking the main() method on it.)
All references returned from JNI interface methods are local. This means that under normal circumstances you do not need to manually deallocate references returned by JNI methods, since they are destroyed when returning to the JVM. Sometimes you still want to destroy them "prematurely", for example when you have lots of local references which you want to delete before returning to the JVM.
Global references are created (from local references) by using NewGlobalRef(). They are added to a special registry and have to be deallocated manually with DeleteGlobalRef(). Global references are only used for Java objects which the native code needs to hold a reference to across multiple JNI calls, for example if you have native code triggering events which should be propagated back to Java. In that case, the JNI code needs to store a reference to a Java object which is to receive the event.
Hope this clarifies the memory management issue a little bit.

There are a couple of strategies for reclaiming native resources (objects, file descriptors, etc.):
1. Invoke a JNI method during finalize() which frees the resource. Some people recommend against implementing finalize, and basically you can't really be sure that your native resource is ever freed. For resources such as memory this is probably not a problem, but if you have, for example, a file which needs to be flushed at a predictable time, finalize() is probably not a good idea.
2. Manually invoke a cleanup method. This is useful if you have a point in time where you know that the resource must be cleaned up. I used this method when I had a resource which had to be deallocated before unloading a DLL in the JNI code. In order to allow the DLL to later be reloaded, I had to be sure that the object was really deallocated before attempting to unload the DLL. Using only finalize(), I would not have had this guarantee. This can be combined with (1) to allow the resource to be deallocated either during finalize() or by the manually called cleanup method. (You probably need a canonical map of WeakReferences to track which objects need to have their cleanup method invoked.)
3. Supposedly the PhantomReference can be used to solve this problem as well, but I'm not sure exactly how such a solution would work.
Actually, I have to disagree with you on the JNI documentation. I find the JNI specification exceptionally clear on most of the important issues, even if the sections on managing local and global references could have been more elaborated.
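For what it's worth, the PhantomReference route mentioned in the list above can be sketched with java.lang.ref.Cleaner, which is built on phantom references. This assumes Java 9 or later. NativeHandle, Release and the FREED counter are hypothetical names; the counter stands in for a real JNI call that would destroy the C object behind the pointer.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

class NativeHandle implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Counts simulated native frees; a stand-in for a real JNI destroy call.
    static final AtomicInteger FREED = new AtomicInteger();

    // The cleanup action must not reference the NativeHandle itself,
    // otherwise the handle could never become phantom reachable.
    private static final class Release implements Runnable {
        private final long pointer;

        Release(long pointer) {
            this.pointer = pointer;
        }

        @Override
        public void run() {
            FREED.incrementAndGet(); // real code would free 'pointer' here
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeHandle(long pointer) {
        this.cleanable = CLEANER.register(this, new Release(pointer));
    }

    @Override
    public void close() {
        cleanable.clean(); // runs the action at most once
    }
}
```

Calling close() (or using try-with-resources) covers strategy 2; if the caller forgets, the Cleaner runs the same action some time after the handle becomes unreachable, which covers the safety-net role of strategy 1 without finalize().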

Re: "A separate, but related question"... you do not need to manually release jfieldID and jmethodID values; they are not object references. A jclass, like any other object reference you obtain, is a local reference: it is freed automatically when a native method returns to the JVM, and otherwise it should be released with DeleteLocalRef.

The GC would collect your instance, but it will not automatically release the non-Java heap memory allocated in the native code. You should have an explicit method in your class to release the c_object instance.
This is one of the cases where I'd recommend using a finalizer that checks whether c_object has been released and, if not, releases it and logs a message.
A useful technique is to create a Throwable instance in the Java class constructor and store it in a field (or just initialize the field inline). If the finalizer detects that the class has not been properly disposed it would print the stacktrace, pinpointing the allocation stack.
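A minimal sketch of the allocation-site trick, leaving out the safety net itself: TrackedResource and leakReport are hypothetical names, and whatever acts as the safety net (a finalizer as suggested above, or some other hook) would call leakReport() and log a non-null result.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class TrackedResource implements AutoCloseable {
    // Captured at construction time so a leak report can show who allocated it.
    private final Throwable allocationSite = new Throwable("allocated here");
    private boolean released = false;

    @Override
    public void close() {
        released = true; // real code would also release the native object
    }

    // Returns null when properly released, otherwise the allocation stack trace.
    String leakReport() {
        if (released) {
            return null;
        }
        StringWriter out = new StringWriter();
        allocationSite.printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }
}
```

Creating a Throwable per instance is not free, so this is usually something to enable while hunting leaks rather than to leave on permanently.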
A suggestion is to avoid doing straight JNI and go with gluegen or Swig (both generate code and can be statically linked).
