Garbage collection and asynchronous calls / Future objects - java

Below is sample code that uses the Future interface to make an asynchronous call. I need some clarification about the get() method.
Future<String> future = getAsyncString();
//do something ...
String msg = "";
if (validation)
    return;
else
    msg = future.get();
//do something else...
return;
The future variable is initialized in a method, so the variable will be soon cleared by the GC after the method's execution, as it is no longer used.
So if the code enters the if branch, what will be the state of the JVM? How is the JVM going to handle the wrapped result in case that no one is going to read it back? Does it affect the thread pool or the executor?

How is the JVM going to handle the wrapped result in case that no one is going to read it back?
Presumably you got the Future object from an Executor. For this executor to be able to set the result in the Future, it holds a reference to the Future. In other words, just because the method local reference to the object disappears as the call stack is popped, doesn't mean that the Future object (which is on the heap) is automatically eligible for garbage collection.
The async call is not cancelled or anything like that. The executor will perform the call, fill in the result, and presumably drop its reference to the Future object. At this point the object becomes unreachable and eligible for garbage collection.
If you're certain that your code doesn't keep a reference to the Future object (i.e. leaking it in the // do something... part) then you can be sure that the Future object is (eventually) collected by the GC. (The executor doesn't have any subtle memory leaks here.)
[...] so the variable will be soon cleared by the GC.
To be precise, the variable will be discarded as the call stack is popped. This will eventually cause the Future object to be unreachable and eligible for garbage collection. The object will however typically not be garbage collected immediately as the method returns.
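A minimal sketch of that behaviour, assuming a plain single-thread executor stands in for whatever sits behind getAsyncString() (the class and method names below are made up for illustration). The Future returned by submit is thrown away, yet the task still runs to completion:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class IgnoredFutureDemo {

    // Submits a task, ignores the returned Future, and reports whether the task still ran.
    static boolean taskRunsWithoutGet() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(1);
        try {
            // The returned Future is deliberately discarded, as in the "if (validation) return;" branch.
            executor.submit(() -> {
                done.countDown();
                return "result nobody reads";
            });
            // Wait for the task itself, not for the Future.
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("task ran: " + taskRunsWithoutGet());
    }
}
```

Nothing here inspects the Future; the latch only proves that the executor ran the task regardless of whether the caller kept a reference.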

How is the JVM going to handle the wrapped result in case that no one is going to read it back?
If nobody (I mean no part of the program) is going to read it back, then the GC will take care of it during garbage collection. But that does not mean getAsyncString() will not be executed completely; it will complete just as a normal method does.

My guess: a scheduled future will have some internal references from the thread pool's queues until the task completes, so it can't be collected by the GC before the task is complete.
Maybe there is an additional level of abstraction between the future and the executor, so that the future itself can be collected. But I'm sure that once a task is submitted, it will be run, whether or not a reference to the future was kept.

You are guaranteed that an object will not be GCed while you are in the scope in which a reference to it is defined, or while a reference to it exists somewhere else in the code.
This applies to all objects, and Future makes no difference here.
So, once your method ends and its call stack is cleared, your object will at some point become eligible for garbage collection, but certainly not while a reference to it still exists on the method's call stack.

Related

CompletableFuture Chain uncompleted -> Garbage Collector?

If I have one (or more) CompletableFuture that has not been started yet, and a few thenApplyAsync() or anyOf() calls chained onto it:
Will the Garbage Collector remove all of that?
If there is a join()/get() at the end of that chain -> same question: Will the Garbage Collector remove all of that?
Maybe more information about the context of the join() is needed.
That join() is the last command in a thread, and there are no side effects.
So is the thread still active in that case? (See: Java Thread Garbage collected or not)
Anyway, is it a good idea to push a poison pill down the chain if I'm sure (maybe in a try-catch-finally) that I will not start that CompletableFuture chain, or is that not necessary?
The question is prompted by something like this: https://bugs.openjdk.java.net/browse/JDK-8160402
A related question: When is the Thread-Executor signaled to schedule a new task? I think it is when the CompletableFuture moves on to the next chained CompletableFuture. So do I only have to worry about memory leaks and not thread leaks?
Edit: What do I mean by a not-started CompletableFuture?
I mean a var notStartedCompletableFuture = new CompletableFuture<Object>(); instead of a CompletableFuture.supplyAsync(....);
I can start the notStartedCompletableFuture in this way:
notStartedCompletableFuture.complete(new Object()); later in the program flow or from another thread.
Edit 2: A more detailed Example:
AtomicReference<CompletableFuture<Object>> outsideReference = new AtomicReference<>();
final var myOuterThread = new Thread(() ->
{
    final var A = new CompletableFuture<Object>();
    final var B = new CompletableFuture<Object>();
    final var C = A.thenApplyAsync((element) -> new Object());
    final var D = CompletableFuture.anyOf(A, C);
    A.complete(new Object());
    // throw new RuntimeException();
    // outsideReference.set(B);
    B.complete(new Object()); // Edit: this shouldn't be here; I will remove it in my next iteration
    D.join();
});
myOuterThread.start();
// The myOuterThread variable is not referenced anywhere else; it is effectively a local variable.
So in the normal case here in my example I don't have an outside reference. The CompletableFutures in the thread never have a chance of getting completed. Normally the GC can safely erase both the thread and the content in there, the CompletableFutures. But I don't think that this would happen?
If I abort this by throwing an exception -> the join() is never reached, then I think all would be erased by the GC?
If I give one of the CompletableFutures to the outside by that AtomicReference, there then could be a chance to unblock the join(). There should be no GC here until the unblock happens. BUT the myOuterThread waiting on that join() doesn't have to do anything more after the join(). So it could be an optimization to erase that thread before someone from outside completes B. But I think this would also not happen?!
One more question: how can I prove that behavior, i.e. whether threads are blocked waiting on a join() or have been returned to a thread pool, where the thread also "blocks"?
You seem to be struggling with different ways that CompletableFuture might leak, depending on how you created it. But it doesn't matter how, where, when or why it was created. The only thing that matters is whether or not it is still reachable.
Will the Garbage Collector remove all of that?
There are two places where we would expect there to be references to a CompletableFuture:
In the Runnable (or whatever) that would complete the future.
In any other code that would (at some point) attempt to get the eventual value from the future.
If you have a call thenApplyAsync() or anyOf() then the reference Runnable is in the arguments to that call. If the call can still happen, then the reference to the Runnable must still be reachable.
In your example:
var notStartedCompletableFuture = new CompletableFuture<Object>();
if the variable notStartedCompletableFuture is still accessible by some code that is still executing, then that CompletableFuture is reachable and won't be garbage collected.
On the other hand, if notStartedCompletableFuture is no longer accessible, and if the future is no longer reachable by some other path, then it won't be reachable at all ... and will be a candidate for garbage collection.
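As a small illustration of the reachable case, here is a sketch (class and method names are hypothetical) in which a manually created CompletableFuture is completed from another thread. The completer thread's lambda captures the future, so the future stays reachable at least until that thread has finished:

```java
import java.util.concurrent.CompletableFuture;

class ManualCompletionDemo {

    // Creates a "not started" future, completes it from a second thread, and returns the joined value.
    static String completeFromOtherThread() {
        CompletableFuture<String> notStarted = new CompletableFuture<>();

        // The lambda holds a reference to notStarted, and a live thread keeps its Runnable reachable.
        Thread completer = new Thread(() -> notStarted.complete("done"));
        completer.start();

        // Blocks until the completer thread has called complete(...).
        return notStarted.join();
    }

    public static void main(String[] args) {
        System.out.println(completeFromOtherThread());
    }
}
```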
If there is a join() / get() at the end of that chain -> same question: Will the Garbage Collector remove all of that?
That makes no difference. It is all based on reachability. (The only wrinkle is that a thread that is currently alive [1] is always reachable, irrespective of any other references to its Thread object. The same applies to its Runnable, and other objects reachable from the Runnable.)
But it is worth noting that if you call join() or get() on a thread / future that never terminates / completes, you will block the current thread, potentially for ever. And that is as bad as a thread leak.
1 - A thread is "alive" from when it is started to when it terminates.
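One way to avoid blocking forever on a future that is never completed is to bound the wait. The following is only a sketch (the class name and the 100 ms timeout are arbitrary choices for illustration), using the timed get(long, TimeUnit) overload:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedWaitDemo {

    // Waits on a future that nobody completes; returns true if the wait timed out instead of hanging.
    static boolean timesOut() {
        CompletableFuture<Object> neverCompleted = new CompletableFuture<>();
        try {
            neverCompleted.get(100, TimeUnit.MILLISECONDS);
            return false; // not expected: nothing ever completes this future
        } catch (TimeoutException e) {
            return true;  // the calling thread is released after the timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + timesOut());
    }
}
```

An untimed join() or get() in the same position would keep the thread, and everything reachable from it, alive indefinitely.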
When is the Thread-Executor signaled to schedule a new task?
It depends what you mean by "schedule". If you mean, when is the task submitted, the answer is when submit is called. If you mean, when is it actually run ... well it goes into the queue, and it runs when it gets to the head of the queue and a worker thread is free to execute it.
In the case of thenApplyAsync() and anyOf(), the tasks are submitted (i.e. the submit(...) call occurs) when the respective method call occurs. So for example if thenApplyAsync is being called on the result of a previous call, then that call must return first.
This is all a consequence of the basic properties of Java expression evaluation ... applied to the expression that you are using to construct the chain of stages.
In general you don't need try / finally or try with resources to clean up potential memory leaks.
All you need to do is to make sure that you don't keep references to the various futures, stages, etc in variables, data structures, etc that will remain accessible / reachable beyond the lifetime of your computation. If you do that ... those references are liable to be the source of the leaks.
Thread leaks should not be your concern. If your code is not creating threads explicitly, they are being managed by the executor service / pool.
If a thread calls join() or get() on a CompletableFuture that will never be completed, it will remain blocked forever (except if it gets interrupted), holding a reference to that future.
If that future is the root of a chain of descendant futures (+ tasks and executors), it will also keep a reference to those, which will also remain in memory (as well as all transitively referenced objects).
A future does not normally hold references to its “parent(s)” when created through the then*() methods, so they should normally be garbage collected if there are no other references – but pay attention to those, e.g. local variables in the calling thread, reference to a List<CompletableFuture<?>> used in a lambda after allOf() etc.
This answer only addresses the 3 follow-up questions in your "Edit 2".
So in the normal case here in my example i don't have a outside
reference.
I assume that you are referring to the version with the commented out statements.
The CompletableFutures in the thread have never a chance getting completed.
Incorrect. First, A is completed here:
A.complete(new Object());
Next B is completed here:
B.complete(new Object());
Then you call D.join(). Since D is an anyOf stage, this completes when either of A and C completes. A has already completed, so D.join() may not need to wait for C to complete. But since C applies the function asynchronously, it could complete immediately too.
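A reduced version of the example (without B, and with a hypothetical class name) shows that D.join() does return once A has been completed. Which of A or C supplies the value is left unchecked here, since C runs asynchronously:

```java
import java.util.concurrent.CompletableFuture;

class AnyOfDemo {

    // Rebuilds the A -> C -> D chain from the question and reports whether D.join() yields a value.
    static boolean joinReturns() {
        CompletableFuture<Object> a = new CompletableFuture<>();
        CompletableFuture<Object> c = a.thenApplyAsync(element -> new Object());
        CompletableFuture<Object> d = CompletableFuture.anyOf(a, c);

        a.complete(new Object());

        // D completes as soon as either A or C has completed, so this does not block forever.
        Object result = d.join();
        return result != null && d.isDone();
    }

    public static void main(String[] args) {
        System.out.println("D completed: " + joinReturns());
    }
}
```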
Normally the GC can safely erase both the thread and the content in there, the CompletableFutures. But I don't think so, that this would happen?
When D.join() returns, the thread terminates. At that point, its local variables (A, B, C, and D) will be unreachable.
If I abort this by throwing an exception -> the join() is never reached, then i think all would be erased by the GC?
A completes as before, but B, C and D don't.
However, the exception terminates the thread, so the local variables A, B, C, and D then become unreachable.
If I give one of the CompletableFutures to the outside by that AtomicReference, there then could be a chance to unblock the join().
Three points:
The AtomicReference is assigned B so the join() on D is not affected.
As we saw above, it doesn't matter whether a hypothetical join() on outsideReference.get() happens or not for the variables A, B, C, and D. Those variables become unreachable, whichever way the thread terminates.
However, you have now assigned a reference to one of the CompletableFuture objects to a variable which has a different lifetime to the thread. That may mean that that CompletableFuture object stays reachable after the thread has terminated.

The behavior of "Mark & Sweep" in Java, especially for Future object

I'm wondering about the lifetime of a Future object that is not bound to a named variable.
I learned that Java adopts mark & sweep style garbage collection.
In that case, any un-named object can be immediately deleted from heap. So I'm wondering if the Future might be swept out from memory even before the Runnable completes, or the memory might never be released.
Any information would be helpful, thanks!
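import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class Main {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(() -> { return true; }); // returned Future is not bound to a variable
        Thread.sleep(1000);
        executor.shutdown(); // let the worker thread exit so the JVM can terminate
    }
}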
I learned that Java adopts mark & sweep style garbage collection.
That is mostly incorrect. Most modern Java garbage collectors are NOT mark & sweep. They are mostly copying collectors that work by evacuating (copying) objects to a "to" space as they are marked. And most Java garbage collectors are also generational.
There is a lot of material published by Oracle about the Java garbage collectors and how they work. And there are good textbooks on the subject too.
In that case, any un-named object can be immediately deleted from heap.
Names have nothing to do with it. References are not names, and neither are variables. Java objects are deleted by the GC only if it finds that they are unreachable; i.e. if no code will ever be able to find them again [1]. Furthermore they are not deleted immediately, or even (necessarily) at the next GC run.
So I'm wondering if the Future might be swept out from memory even before the Runnable completes, or the memory might never be released.
(That's a Callable rather than a Runnable. A Runnable doesn't return anything.)
The answer is no it won't.
The life cycle is something like this:
1. You call submit, passing a Callable.
2. A Future is created. (With the standard thread-pool executors this is a FutureTask that wraps the Callable, not a CompletableFuture.)
3. The Future, together with the Callable, is added to the executor's queue.
4. The Future is returned to the caller. (In your case, the caller throws it away.)
5. At a later point, a worker thread takes the Future and the Callable from the queue and executes the Callable.
6. Then the worker thread stores the result in the Future.
7. Finally, something will typically call Future.get to obtain the result. (But not in your example.)
In order for step 6 to work, the Future must still be reachable. It won't be thrown away until all references are lost or discarded. Certainly not until after step 6 has completed.
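The same life cycle can be walked through by hand. This is a simplified sketch of what a standard thread-pool executor does inside submit, not its actual source: it wraps the Callable in a FutureTask and passes that to execute. The class name is hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

class SubmitLifecycleSketch {

    // Performs the wrap / queue / run / get sequence explicitly and returns the task's result.
    static boolean runThroughLifecycle() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Wrap the Callable in a Future; FutureTask is both the Future and the Runnable that gets queued.
            FutureTask<Boolean> future = new FutureTask<>(() -> true);

            // Hand it to the executor, which now holds its own reference until the task has run.
            executor.execute(future);

            // A worker thread runs the task and stores the result; get() waits for that and reads it.
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("result: " + runThroughLifecycle());
    }
}
```

Because the queued object and the Future are one and the same, the executor cannot run the task without also keeping the Future reachable.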
Bottom line: Java handles the Future just like it would any other (normal) object. Don't worry about it. If anything needs it, it won't disappear.
1 - Reachability is a bit more complicated when you consider finalization and Reference types. But the same general principle applies. If any code could still see an object, it won't be deleted.
"Submit" means to give something to someone, in this case you're giving a piece of code (in the form of a Callable) to the ExecutorService for later execution. In return, the method returns a Future object that will be updated with the result when it is done.
In order for the ExecutorService to update the Future object, it needs to hold on to the Future object too, together with the code reference (the Callable).
Therefore, the ExecutorService maintains references to both the Callable object and the Future object until the job has been completed. Those references make both objects reachable, preventing the objects from being garbage-collected.
Since your code discarded the returned Future object, the object will become eligible for GC as soon as the job completes, but not before that.
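This can be observed with a WeakReference. The sketch below (hypothetical class name; System.gc() is only a hint to the JVM) keeps nothing but a weak reference to the returned Future while the task is still running. The check is safe because an object that is strongly reachable through the executor can never have its weak reference cleared, whether or not a collection actually runs:

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DiscardedFutureDemo {

    // Returns true if the Future is still present while its task is running,
    // even though the caller holds only a weak reference to it.
    static boolean futureSurvivesWhileTaskRuns() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            Future<Boolean> future = executor.submit(() -> {
                started.countDown();
                release.await(); // keep the task running until the check is done
                return true;
            });
            WeakReference<Future<Boolean>> weak = new WeakReference<>(future);
            future = null; // drop the caller's only strong reference

            started.await();
            System.gc(); // a request, not a guarantee
            boolean stillThere = weak.get() != null;

            release.countDown();
            return stillThere;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("future kept alive by executor: " + futureSurvivesWhileTaskRuns());
    }
}
```

The sketch does not check when the Future is reclaimed after the task completes, because that timing is up to the collector.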

How does Java GC call finalize() method?

As far as I understand, GC starts with some set of initial objects (stack, static objects) and recursively traverses it building a graph of reachable objects. Then it marks the memory taken by these objects as occupied and assumes all the rest of the memory free.
But what if this 'free' memory contains an object with finalize method? GC has to call it, but I don't see how it can even know about objects that aren't reachable anymore.
I suppose GC can keep track of all 'finalizable' objects while they are alive. If so, does having finalizable objects make garbage collecting more expensive even when they are still alive?
Consider the Reference API.
It offers some references with special semantics to the GC, i.e Weak, Soft, and Phantom references. There’s simply another non-public type of special reference, for objects needing finalization.
Now, when the garbage collector traverses the object graph and encounters such a special reference object, it will not mark objects reachable through this reference as strongly reachable, but reachable with the special semantics. So if an object is only finalizer-reachable, the reference will be enqueued, so that one (or one of the) finalizer thread(s) can poll the queue and execute the finalize() method (it’s not the garbage collector itself calling this method).
In other words, the garbage collector never processes entirely unreachable objects here. To apply a special semantic to the reachability, the reference object must be reachable, so the referent can be reached through that reference. In case of finalizer-reachability, Finalizer.register is called when an object is created and it creates an instance of Finalizer in turn, a subclass of FinalReference, and right in its constructor, it calls an add() method which will insert the reference into a global linked list. So all these FinalReference instances are reachable through that list until an actual finalization happens.
Since this FinalReference will be created right on the instantiation of the object, if its class declares a non-trivial finalize() method, there is already some overhead due to having a finalization requirement, even if the object has not been collected yet.
The other issue is that an object processed by a finalizer thread is reachable by that thread and might even escape, depending on what the finalize() method does. But the next time, this object becomes unreachable, the special reference object does not exist anymore, so it can be treated like any other unreachable object.
This would only be a performance issue, if memory is very low and the next garbage collection had to be performed earlier to eventually reclaim that object. But this doesn’t happen in the reference implementation (aka “HotSpot” or “OpenJDK”). In fact, there could be an OutOfMemoryError while objects are pending in the finalizer queue, whose processing could make more memory reclaimable. There is no guarantee that finalization runs fast enough for your purposes. That’s why you should not rely on it.
But what if this 'free' memory contains an object with finalize
method? GC has to call it, but I don't see how it can even know about
objects that aren't reachable anymore.
Let's say we use CMS garbage collector. After it successfully marked all live objects in a first phase, it will then scan memory again and remove all dead objects. GC thread does not call finalize method directly for these objects.
During creation, they are wrapped and added to finalizer queue by JVM (see java.lang.ref.Finalizer.register(Object)). This queue is processed in another thread (java.lang.ref.Finalizer.FinalizerThread), finalize method will be called when there are no references to the object. More details are covered in this blog post.
If so, does having finalizable objects make garbage collecting more
expensive even when they are still alive?
As you can now see, most of the time it does not.
The finalise method is called when an object is about to get garbage collected. That means, when GC determines that the object is no longer being referenced, it can call the finalise method on it. It doesn't have to keep track of objects to be finalised.
According to javadoc, finalize
Called by the garbage collector on an object when garbage collection determines that there are no more references to the object.
So the decision is based on whether any references to the object remain (HotSpot determines this by tracing reachability rather than by reference counting).
Actually it is possible not to have this method called at all. So it may be not a good idea to use it as destructor.

when will finalize() be called on my class instance in this scenario?

I know that finalize() is called whenever a class instance is collected by the garbage collector. However, I am a little bit confused when passing an instance of a class to another thread via a queue.
Let's say this is a skeleton of Thread1:
for (int i = 0; i < 1000; i++) {
    Packet pkt = new Packet(); // instance of class
    pkt.id = i;
    thread2.queue.put(pkt);
}
Then, thread 2 will remove the packet from the queue and perform lengthy operations. Does this second thread get a "copy" of the packet, or is it by some form of reference? The importance is that, if it is by copy, the finalize() on the instance created in thread 1 can be called before thread 2 is done with the packet. If it is by reference, I am guaranteed that finalize() is only called once for the information in the packet.
This basic example may not show the importance, but I am storing a C-pointer (from JNI) in the packet to destroy some memory when I am done with the object. If it is passed by copy, the memory may get destroyed before the second thread is done with it. If it is passed by reference, then it should only be destroyed once the GC sees it is no longer in use by BOTH threads (my desired behavior). If this latter scenario is not guaranteed, I will not use finalize() and use something else but it will be more complex.
The second thread receives the same actual object instance. You're safe from premature finalization.
It receives a copy of the object reference, if you want to think of it that way.
In addition, finalize is not necessarily run when the garbage collector finds that the object has become garbage - the VM is free to run it at any later time, and to actually reclaim the memory some time after that. You really can't rely on when finalize will be run. However, since what you care about is knowing that finalize won't be called before the second thread finishes with the object, that's immaterial. But worth knowing!
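To see that the consumer really gets the same instance, here is a minimal sketch. It assumes a LinkedBlockingQueue as the queue and a stripped-down Packet class, both stand-ins for the types in the question:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SharedPacketDemo {

    // Stand-in for the question's Packet class.
    static class Packet {
        int id;
    }

    // Passes one Packet through a queue to another thread and checks object identity.
    static boolean consumerSeesSameInstance() {
        BlockingQueue<Packet> queue = new LinkedBlockingQueue<>();
        Packet sent = new Packet();
        sent.id = 42;

        Packet[] received = new Packet[1];
        Thread consumer = new Thread(() -> {
            try {
                received[0] = queue.take(); // takes the reference, not a copy of the object
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            queue.put(sent);
            consumer.join(); // join() makes the consumer's write to received[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return received[0] == sent; // identity comparison: same object on the heap
    }

    public static void main(String[] args) {
        System.out.println("same instance: " + consumerSeesSameInstance());
    }
}
```

As long as either thread (or the queue) still holds that reference, the Packet is reachable and cannot be finalized.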

What if a finalizer makes an object reachable?

In Java, finalize is called on an object (that overrides it) when it's about to be garbage collected, so when it's unreachable. But what if the finalizer makes the object reachable again, what happens then?
Then the object doesn't get garbage collected, basically. This is called object resurrection. Perform a search for that term, and you should get a bunch of interesting articles. As Jim mentioned, one important point is that the finalizer will only be run once.
The object will not be collected until it gets unreachable again.
According to the JavaDoc, finalize() will not be called again.
If you read the API description carefully, you'll see that the finalizer can make the object reachable again. The object won't be discarded until it is unreachable (again), but finalize() won't be called more than once.
Yeah, this is why you don't use finalizers (Well, one of the many reasons).
There is a reference collection that is made to do this stuff. I'll look it up and post it here in a sec, but I think it's PhantomReference.
Yep, PhantomReference:
Phantom reference objects, which are enqueued after the collector determines that their referents may otherwise be reclaimed. Phantom references are most often used for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism.
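On Java 9 and later, java.lang.ref.Cleaner offers this kind of cleanup without handling a reference queue by hand: it runs a registered action once the object becomes phantom reachable, or earlier if clean() is called explicitly. The following is a sketch only; the NativeHandle class and the counter standing in for "free the native memory" are invented for illustration:

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

class CleanerSketch {

    private static final Cleaner CLEANER = Cleaner.create();

    // Hypothetical wrapper around some resource that must be released exactly once.
    static class NativeHandle implements AutoCloseable {
        private final Cleaner.Cleanable cleanable;

        NativeHandle(AtomicInteger releaseCount) {
            // The action must not capture 'this', otherwise the handle could never become unreachable.
            this.cleanable = CLEANER.register(this, releaseCount::incrementAndGet);
        }

        @Override
        public void close() {
            cleanable.clean(); // runs the action at most once, even if called repeatedly
        }
    }

    // Closes a handle twice and returns how many times the release action actually ran.
    static int closeTwice() {
        AtomicInteger releaseCount = new AtomicInteger();
        NativeHandle handle = new NativeHandle(releaseCount);
        handle.close();
        handle.close();
        return releaseCount.get();
    }

    public static void main(String[] args) {
        System.out.println("release action ran " + closeTwice() + " time(s)");
    }
}
```

Unlike finalize(), the cleanup action never sees the object itself, so it cannot make the object reachable again.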
It actually does another pass to check and make sure there are no more references to the object. Since it will fail that test on its second pass, you'll end up not freeing the memory for the object.
Because finalize is only called a single time for any given object, the next time through when it has no references, it will just free the memory without calling finalize. Some good information here on finalization.
