Implementing Future interface for shared computation

I'm implementing the Future<Collection<Integer>> interface in order to share the result of some bulk computation among all threads in the application.
In fact, I intend to put an instance of a class implementing Future<Collection<Integer>> into an ApplicationScope object, so that any other thread that needs the result can just ask the object for the Future and call get() on it, thereby reusing the computation performed by another thread.
My question is about implementing the cancel method. For now, I would write something like this:
public class CustomerFutureImpl implements Future<Collection<Integer>> {
    private Thread computationThread;
    private boolean started;
    private boolean cancelled;
    private Collection<Integer> computationResult;

    @Override
    public boolean cancel(boolean mayInterruptIfRunning) {
        if (computationResult != null)
            return false; // already completed, nothing to cancel
        if (!started) {
            cancelled = true;
            return true;
        }
        if (mayInterruptIfRunning)
            computationThread.interrupt();
        return false; // placeholder: the draft does not handle a running computation yet
    }
    // The rest of the methods
}
But this implementation doesn't satisfy the documentation of Future, because a CancellationException must be thrown in every thread that is waiting for the result (that is, every thread that has called get()).
Should I add another field like private Collection<Thread> waitingForTheResultThreads; and then interrupt each thread in the collection, catch the InterruptedException and throw new CancellationException()?
The thing is that such a solution seems kind of weird to me... Not sure about that.

Generally you should avoid implementing Future directly at all. Concurrency code is very hard to get right, and frameworks for distributed execution - notably ExecutorService - will provide Future instances referencing the units of work you care about.
You may know that already and are intentionally creating a new similar service, but I feel it's important to call out that for the vast majority of use cases, you should not need to define your own Future implementation.
You might want to look at the concurrency tools Guava provides, in particular ListenableFuture, which is a sub-interface of Future that provides additional features.
Assuming that you really do want to define a custom Future type, use Guava's AbstractFuture implementation as a starting point, so that you don't have to reinvent the complex details you're running into.
To your specific question, if you look at the implementation of AbstractFuture.get(), you'll see that it's implemented with a while loop that looks for value to become non-null, at which time it calls getDoneValue() which either returns the value or raises a CancellationException. So essentially, each thread that is blocking on a call to Future.get() is polling the Future.value field every so often and raising a CancellationException if it detects that the Future has been cancelled. There's no need to keep track of a Collection<Thread> or anything of the sort, since each thread can inspect the state of the Future independently, and return or throw as needed.
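To make the last point concrete, here is a minimal sketch of a Future whose waiters check the shared state themselves. It is not Guava's implementation: it swaps AbstractFuture's internals for a CountDownLatch, and the class name SimpleSharedFuture and its complete method are hypothetical names chosen for illustration. It also assumes that cancellation never interrupts the computing thread.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical class for illustration only; not how Guava's AbstractFuture is written.
class SimpleSharedFuture<V> implements Future<V> {
    private enum State { PENDING, DONE, CANCELLED }

    private final AtomicReference<State> state = new AtomicReference<>(State.PENDING);
    private final CountDownLatch finished = new CountDownLatch(1);
    private volatile V value;

    // Called by the computing thread; returns false if the future was already settled.
    boolean complete(V result) {
        if (!state.compareAndSet(State.PENDING, State.DONE)) return false;
        value = result;
        finished.countDown(); // releases every thread blocked in get()
        return true;
    }

    @Override
    public boolean cancel(boolean mayInterruptIfRunning) {
        // Assumption: this sketch ignores mayInterruptIfRunning and never interrupts anyone.
        if (!state.compareAndSet(State.PENDING, State.CANCELLED)) return false;
        finished.countDown(); // waiters wake up and discover the cancellation themselves
        return true;
    }

    @Override
    public boolean isCancelled() { return state.get() == State.CANCELLED; }

    @Override
    public boolean isDone() { return state.get() != State.PENDING; }

    @Override
    public V get() throws InterruptedException {
        finished.await();
        return report();
    }

    @Override
    public V get(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        if (!finished.await(timeout, unit)) throw new TimeoutException();
        return report();
    }

    // Each waiting thread inspects the state on its own; no list of waiters is needed.
    private V report() {
        if (state.get() == State.CANCELLED) throw new CancellationException();
        return value;
    }
}
```

Every thread blocked in get() is released by the same latch and then decides for itself whether to return the value or throw CancellationException.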

Related

Thread vs Runnable vs CompletableFuture in Java multi threading

I am trying to implement multithreading in my Spring Boot app. I am a beginner at multithreading in Java, and after searching and reading articles on various pages, I need clarification on the following points:
As far as I see, I can use Thread, Runnable or CompletableFuture in order to implement multi threading in a Java app. CompletableFuture seems a newer and cleaner way, but Thread may have more advantages. So, should I stick to CompletableFuture or use all of them based on the scenario?
Basically I want to send 2 concurrent requests to the same service method by using CompletableFuture:
CompletableFuture<Integer> future1 = fetchAsync(1);
CompletableFuture<Integer> future2 = fetchAsync(2);
Integer result1 = future1.get();
Integer result2 = future2.get();
How can I send these requests concurrently and then return a result based on the following conditions:
if the first result is not null, return it and stop processing
if the first result is null, return the second result and stop processing
How can I do this? Should I use CompletableFuture.anyOf() for that?

CompletableFuture is a tool which sits on top of the Executor/ExecutorService abstraction, which has implementations dealing with Runnable and Thread. You usually have no reason to deal with Thread creation manually. If you find CompletableFuture unsuitable for a particular task you may try the other tools/abstractions first.
If you want to proceed with the first (in the sense of faster) non‑null result, you can use something like
CompletableFuture<Integer> future1 = fetchAsync(1);
CompletableFuture<Integer> future2 = fetchAsync(2);
Integer result = CompletableFuture.anyOf(future1, future2)
.thenCompose(i -> i != null?
CompletableFuture.completedFuture((Integer)i):
future1.thenCombine(future2, (a, b) -> a != null? a: b))
.join();
anyOf allows you to proceed with the first result, but regardless of its actual value. So to use the first non‑null result we need to chain another operation which will resort to thenCombine if the first result is null. This will only complete when both futures have been completed but at this point we already know that the faster result was null and the second is needed. The overall code will still result in null when both results were null.
Note that anyOf accepts arbitrarily typed futures and results in a CompletableFuture<Object>. Hence, i is of type Object and a type cast is needed. An alternative with full type safety would be
CompletableFuture<Integer> future1 = fetchAsync(1);
CompletableFuture<Integer> future2 = fetchAsync(2);
Integer result = future1.applyToEither(future2, Function.identity())
.thenCompose(i -> i != null?
CompletableFuture.completedFuture(i):
future1.thenCombine(future2, (a, b) -> a != null? a: b))
.join();
which requires us to specify a function which we do not need here, so this code resorts to Function.identity(). You could also just use i -> i to denote an identity function; that’s mostly a stylistic choice.
Note that most complications stem from the design that tries to avoid blocking threads by always chaining a dependent operation to be executed when the previous stage has been completed. The examples above follow this principle as the final join() call is only for demonstration purposes; you can easily remove it and return the future, if the caller expects a future rather than being blocked.
If you are going to perform the final blocking join() anyway, because you need the result value immediately, you can also use
Integer result = future1.applyToEither(future2, Function.identity()).join();
if(result == null) {
Integer a = future1.join(), b = future2.join();
result = a != null? a: b;
}
which might be easier to read and debug. This ease of use is the motivation behind the Virtual Threads feature (still upcoming when this was written, final since Java 21). When an action is running on a virtual thread, you don’t need to avoid blocking calls. So with this feature, if you still need to return a CompletableFuture without blocking your caller thread, you can use
CompletableFuture<Integer> resultFuture = future1.applyToEitherAsync(future2, r-> {
if(r != null) return r;
Integer a = future1.join(), b = future2.join();
return a != null? a: b;
}, Executors.newVirtualThreadPerTaskExecutor());
By requesting a virtual thread for the dependent action, we can use blocking join() calls within the function without hesitation which makes the code simpler, in fact, similar to the previous non-asynchronous variant.
In all cases, the code will provide the faster result if it is non‑null, without waiting for the completion of the second future. But it does not stop the evaluation of the unnecessary future. Stopping an already ongoing evaluation is not supported by CompletableFuture at all. You can call cancel(…) on it, but this will only set the completion state (result) of the future to “exceptionally completed with a CancellationException”.
So whether you call cancel or not, the already ongoing evaluation will continue in the background and only its final result will be ignored.
This might be acceptable for some operations. If not, you would have to change the implementation of fetchAsync significantly. You could use an ExecutorService directly and submit an operation to get a Future which supports cancellation with interruption.
But it also requires the operation’s code to be sensitive to interruption, to have an actual effect:
When calling blocking operations, use those methods that may abort and throw an InterruptedException, and do not catch-and-continue.
When performing a long-running, computationally intensive task, poll Thread.interrupted() occasionally and bail out when it returns true.
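As a rough sketch of the second point, the task below polls the interrupt flag and stops once Future.cancel(true) interrupts its thread. The class and method names are hypothetical, and the Math.sqrt call is only a placeholder for real work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; the "work" is a placeholder.
class InterruptibleTaskDemo {

    // Returns true if the worker noticed the interrupt and stopped within the timeout.
    static boolean runAndCancel() throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch stopped = new CountDownLatch(1);
        try {
            Future<?> future = executor.submit(() -> {
                started.countDown();
                // Thread.interrupted() returns true once and clears the flag,
                // which is fine here because the task ends immediately afterwards.
                while (!Thread.interrupted()) {
                    Math.sqrt(System.nanoTime()); // placeholder for one slice of real work
                }
                stopped.countDown();
            });
            started.await();     // make sure the task is actually running
            future.cancel(true); // interrupts the worker thread
            return stopped.await(5, TimeUnit.SECONDS);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stopped after cancel: " + runAndCancel());
    }
}
```

Without the polling loop condition, cancel(true) would still mark the Future as cancelled, but the worker would keep running.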

So, should I stick to CompletableFuture or use all of them based on the scenario?
Use the one that is most appropriate to the scenario. Obviously, we can't be more specific unless you explain the scenario.
There are various factors to take into account. For example:
Thread + Runnable doesn't have a natural way to wait for / return a result. (But it is not hard to implement.)
Repeatedly creating bare Thread objects is inefficient because thread creation is expensive. Thread pooling is better but you shouldn't implement a thread pool yourself.
Solutions that use an ExecutorService take care of thread pooling and allow you to use Callable and return a Future. But for a once-off async computation this might be over-kill.
Solutions that involve CompletableFuture allow you to compose and combine asynchronous tasks. But if you don't need to do that, using CompletableFuture may be overkill.
As you can see ... there is no single correct answer for all scenarios.
Should I use CompletableFuture.anyOf() for that?
No. The logic of your example requires that you must have the result for future1 to determine whether or not you need the result for future2. So the solution is something like this:
Integer i1 = future1.get();
if (i1 == null) {
return future2.get();
} else {
future2.cancel(true);
return i1;
}
Note that the above works with plain Future as well as CompletableFuture. If you were using CompletableFuture because you thought that anyOf was the solution, then you didn't need to do that. Calling ExecutorService.submit(Callable) will give you a Future ...
It will be more complicated if you need to deal with exceptions thrown by the tasks and/or timeouts. In the former case, you need to catch ExecutionException and then extract its cause to get the exception thrown by the task.
There is also the caveat that the second computation may ignore the interrupt and continue on regardless.
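A possible way to extend the snippet above with a timeout and with unwrapping of ExecutionException is sketched below. The class name, the method name, the single shared timeout value and the choice to rethrow as IllegalStateException are all assumptions made for the example.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration.
class FirstNonNullDemo {

    // Returns the first future's result if it is non-null, otherwise the second's.
    static Integer firstNonNull(Future<Integer> future1, Future<Integer> future2, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        try {
            Integer i1 = future1.get(timeoutMillis, TimeUnit.MILLISECONDS);
            if (i1 != null) {
                future2.cancel(true); // best effort; the task may ignore the interrupt
                return i1;
            }
            return future2.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // getCause() is the exception that the task itself threw.
            throw new IllegalStateException("task failed", e.getCause());
        }
    }
}
```

Because it only relies on the Future interface, the helper works with both ExecutorService.submit results and CompletableFuture instances.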

So, should I stick to CompletableFuture or use all of them based on the scenario?
Well, they all have different purposes and you'll probably use them all either directly or indirectly:
Thread represents a thread and while it can be subclassed in most cases you shouldn't do so. Many frameworks maintain thread pools, i.e. they spin up several threads that then can take tasks from a task pool. This is done to reduce the overhead that thread creation brings as well as to reduce the amount of contention (many threads and few cpu cores mean a lot of context switches so you'd normally try to have fewer threads that just work on one task after another).
Runnable was one of the first interfaces to represent tasks that a thread can work on. Another is Callable which has 2 major differences to Runnable: 1) it can return a value while Runnable has void and 2) it can throw checked exceptions. Depending on your case you can use either but since you want to get a result, you'll more likely use Callable.
CompletableFuture and Future are basically a way for cross-thread communication, i.e. you can use those to check whether the task is done already (non-blocking) or to wait for completion (blocking).
So in many cases it's like this:
you submit a Runnable or Callable to some executor
the executor maintains a pool of Threads to execute the tasks you submitted
the executor returns a Future (one implementation being CompletableFuture) for you to check on the status and results of the task without having to synchronize yourself.
However, there may be other cases where you directly provide a Runnable to a Thread or even subclass Thread but nowadays those are far less common.
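The usual flow described above could look roughly like this. It is only a sketch: the class name, the pool size and the trivial computations are placeholders.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class.
class ExecutorBasicsDemo {

    static int[] run() throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2); // the executor owns the threads
        try {
            // A Callable can return a value (and throw checked exceptions).
            Callable<Integer> task = () -> 6 * 7;
            Future<Integer> plain = pool.submit(task);

            // CompletableFuture can run on the same pool and chain a dependent step.
            CompletableFuture<Integer> composed =
                    CompletableFuture.supplyAsync(() -> 20, pool).thenApply(x -> x + 1);

            // get() blocks until each result is available.
            return new int[] { plain.get(), composed.get() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] results = run();
        System.out.println(results[0] + " " + results[1]); // prints "42 21"
    }
}
```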
How can I do this? Should I use CompletableFuture.anyOf() for that?
CompletableFuture.anyOf() wouldn't work since you'd not be able to determine which of the 2 you'd pass in was successful first.
Since you're interested in result1 first (which btw can't be null if the type is int) you basically want to do the following:
Integer result1 = future1.get(); //block until result 1 is ready
if( result1 != null ) {
return result1;
} else {
return future2.get(); //result1 was null so wait for result2 and return it
}
You don't want to call future2.get() right away, since that would block until both are done. Instead, you're interested in future1 first, so if that produces a result you don't have to wait for future2 to finish at all.
Note that the code above doesn't handle exceptional completions and there's also probably a more elegant way of composing the futures like you want but I don't remember it atm (if I do I'll add it as an edit).
Another note: you could call future2.cancel() if result1 isn't null but I'd suggest you first check whether cancelling would even work (e.g. you'd have a hard time really cancelling a webservice request) and what the results of interrupting the service would be. If it's ok to just let it complete and ignore the result that's probably the easier way to go.

Is a method thread-safe if it just calls another thread safe method in Java?

I wonder if a method is going to be thread safe if it just calls another thread safe method. I have an example like this. As the docs state, ConcurrentSkipListSet is thread-safe.
public class MessageHolder {
private Set<String> processedIds;
public MessageHolder() {
this.processedIds = new ConcurrentSkipListSet<>();
}
public void add (String id) {
processedIds.add(id);
}
public boolean contains (String id) {
return processedIds.contains(id);
}
public void remove (String id) {
processedIds.remove(id);
}
}
You may ask why I am not using ConcurrentSkipListSet directly. The reason is that I want to create an interface for the actions performed here and this example will be like an in memory version.

I think that you deserve some clarification on the comments and also some clarification on what causes a race condition.
On race conditions: the basic idea of all threading is that at any point of execution the thread you are on could be rescheduled to a later time, or another thread might be executing and accessing the same data in parallel.
For one, processedIds should be final, as mentioned before. YOUR CODE WILL NOT BE THREAD SAFE UNTIL YOU DO THIS. Even though ConcurrentSkipListSet is thread safe, that doesn't stop the variable processedIds from being reassigned by another thread. Also, Java is weird: the processedIds field must be marked final to guarantee that other threads see it initialized once the constructor completes. I found this Stack Overflow post that explains some of the issues with object construction in Java, for some more reading: Constructor synchronization in Java. Basically, don't try to make your constructor synchronized, but if you want to guarantee variable initialization in the constructor, which in this case you do, then mark your field as final.
To answer your question on whether this is thread safe, the only answer would be that it depends on the client usage you are expecting. The methods you provided are indeed thread safe for their intended purposes as written, but a client could use them and produce a race condition. Your intuition on whether or not it would be 100% necessary to use the synchronized keyword is correct. Yet, what the comments are alluding to is that not making these methods explicitly thread safe might have some dire consequences in the future for the maintainability and correctness of your code.
A client could still use the API you provide in an unsafe way that could result in a race condition, as mentioned in one of the comments. If you are providing an interface to a client you may not care about this... or you might care about this and want to provide a mechanism for the client to make multiple accesses to your class with guaranteed thread safety.
Overall, I would probably recommend that you mark your methods as synchronized, for a couple of reasons: 1) it makes it clear to the client that they are accessing methods that are thread safe, which is very important. I can easily imagine a situation where the client decides to use a lock when it isn't needed, to the detriment of performance. 2) Someone could change your methods to contain some different logic in the future that requires the synchronized keyword (this isn't so unlikely because you seem to already be in a threaded environment).
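To illustrate the kind of client-side race mentioned above: a caller that does "if not contains, then add" performs two separate calls, and another thread can slip in between them. One possible mechanism, sketched below under the assumption that clients mainly need to "claim" an id, is to expose the check and the insert as a single operation. The class name SafeMessageHolder and the method markProcessed are hypothetical and not part of the original code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical variant of MessageHolder for illustration.
class SafeMessageHolder {
    // final: the reference cannot be reassigned and is safely published.
    private final Set<String> processedIds = new ConcurrentSkipListSet<>();

    // Check and insert in one atomic step: returns true only for the
    // first caller that adds this id, false for everyone else.
    boolean markProcessed(String id) {
        return processedIds.add(id);
    }

    boolean contains(String id) {
        return processedIds.contains(id);
    }

    void remove(String id) {
        processedIds.remove(id);
    }
}
```

With this shape, several threads can call markProcessed for the same id and only one of them gets true, so no external lock is needed for that particular use.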

Java 8 CompletableFuture lazy computation control

I've got a question about CompletableFuture and its possible usage for lazy computations.
It seems like it is a great substitute for RunnableFuture for this task, since it is possible to easily create task chains and to have total control of each chain link. Still, I found that it is very hard to control when exactly the computation takes place.
If I just create a CompletableFuture with the supplyAsync method or something like that, it is OK. It waits patiently for me to call the get or join method to compute. But if I try to build an actual chain with thenCompose, handle or any other method, the evaluation starts immediately, which is very frustrating.
Of course, I can always place some blocker task at the start of the chain and release the block when I am ready to begin the calculation, but that seems like a rather ugly solution. Does anybody know how to control when a CompletableFuture actually runs?

CompletableFuture is a push-design, i.e. results are pushed down to dependent tasks as soon as they become available. This also means side-chains that are not in themselves consumed still get executed, which can have side-effects.
What you want is a pull-design where ancestors would only be pulled in as their data is consumed.
This would be a fundamentally different design because side-effects of non-consumed trees would never happen.
Of course with enough contortions CF could be made to do what you want, but you should look into the fork-join framework instead which allows you to only run the computations you depend on instead of pushing down results.

There's a conceptual difference between RunnableFuture and CompletableFuture that you're missing here.
RunnableFuture implementations take a task as input and hold onto it. It runs the task when you call the run method.
A CompletableFuture does not hold onto a task. It only knows about the result of a task. It has three states: complete, incomplete, and completed exceptionally (failed).
CompletableFuture.supplyAsync is a factory method that gives you an incomplete CompletableFuture. It also schedules a task which, when it completes, will pass its result to the CompletableFuture's complete method. In other words, the future that supplyAsync hands you doesn't know anything about the task, and can't control when the task runs.
To use a CompletableFuture in the way you describe, you would need to create a subclass:
public class RunnableCompletableFuture<T> extends CompletableFuture<T> implements RunnableFuture<T> {
private final Callable<T> task;
public RunnableCompletableFuture(Callable<T> task) {
this.task = task;
}
@Override
public void run() {
try {
complete(task.call());
} catch (Exception e) {
completeExceptionally(e);
}
}
}

A simple way of dealing with your problem is wrapping your CompletableFuture in something with a lazy nature. You could use a Supplier or even Java 8 Stream.
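A minimal sketch of the Supplier idea, assuming it is acceptable that nothing is scheduled until the supplier is asked for the future. The class name, the method name and the counter used to observe execution are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical demo class.
class LazyChainDemo {

    // Describes the chain without starting it; 'runs' counts how often the first stage executed.
    static Supplier<CompletableFuture<Integer>> lazyChain(AtomicInteger runs) {
        return () -> CompletableFuture
                .supplyAsync(() -> {
                    runs.incrementAndGet();
                    return 20;
                })
                .thenApply(x -> x + 1);
    }

    public static void main(String[] args) {
        AtomicInteger runs = new AtomicInteger();
        Supplier<CompletableFuture<Integer>> chain = lazyChain(runs);
        System.out.println(runs.get());         // prints 0: nothing has been scheduled yet
        System.out.println(chain.get().join()); // prints 21: the chain starts on get()
    }
}
```

Note that this plain Supplier starts a fresh chain on every get() call; caching the first future would need an extra memoizing wrapper.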

It is late, but how about using the constructor for the first CompletableFuture in the chain?
CompletableFuture<Object> cf = new CompletableFuture<>();
// compose the chain
cf.thenCompose(sometask_here);
// later starts the chain with
cf.complete(anInputObject);
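A runnable version of the same idea might look like this. The class name, the method name and the concrete stages (doubling, then adding one asynchronously) are made up for the example.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical demo class.
class DeferredStartDemo {

    // Attaches dependent stages to a head future that nobody has completed yet.
    static CompletableFuture<Integer> buildChain(CompletableFuture<Integer> head) {
        return head
                .thenApply(x -> x * 2)
                .thenCompose(x -> CompletableFuture.supplyAsync(() -> x + 1));
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> head = new CompletableFuture<>();
        CompletableFuture<Integer> tail = buildChain(head);
        System.out.println(tail.isDone()); // prints false: the chain is waiting for its input
        head.complete(20);                 // this starts the chain
        System.out.println(tail.join());   // prints 41
    }
}
```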

Java Optimization with Threads

I'm using a custom class Foo in Java as the key type in a HashMap. All the fields of Foo instances are immutable (they are declared final and private and are assigned values only in the constructor). Thus, the hashCode() of a given Foo object is also fixed, and for optimization purposes, I am calculating it in the constructor and simply returning that value in the hashCode() method.
Instances of Foo also have a value() method which returns a similar fixed value once the object has been instantiated. Currently I am also calculating it in the constructor and returning it in the method, but there is a difference between hashCode() and value(): hashCode() is called for the first time almost instantly after the object is created, but value() is called much later. I understand that having a separate Thread to calculate the hash-code would simply increase the run-time because of synchronization issues, but:
is this a good way to calculate value()? Would it improve run-time at all?
are simple Threads enough, or do I need to use pools etc.?
Note: this may seem like I'm optimizing the wrong parts of my program, but I've already worked on the 'correct' parts and brought the average run-time down from ~17 seconds to ~2 seconds. Edit: there will be upwards of 5000 Foo objects, and that's a conservative estimate.

It definitely sounds like deferred calculation is a good approach here - and yes, if you create a lot of these objects, a thread pool is the way to go.
As for value()'s return value until it's ready, I would stay away from returning invalid values, and instead either make it blocking (and add some isValueReady() helper) or make it instantly return a "future" - some object that offers the same isReady check and a blocking get method.
Also, never rely on "much later" - always make sure the value there is ready before using it.

I recommend creating a Future for value: create a static fixed thread pool and submit the value calculations to it. This way there's no risk that value will be accessed before it's available; the worst case is that whatever is accessing value will block on a Future.get call (or use the version with a timeout if e.g. deadlock is a concern).
Because Future.get throws checked exceptions, which can be a nuisance, you can wrap the get call in your class's getter method and wrap the checked exceptions in a RuntimeException:
class MyClass {
private static final ExecutorService executor = Executors.newFixedThreadPool(/* some value that makes sense */);
private final Future<Value> future;
public MyClass() {
future = executor.submit(/* Callable */);
}
public boolean isValueDone() {
return future.isDone();
}
public Value value() {
try {
return future.get();
} catch(InterruptedException|ExecutionException e) {
throw new RuntimeException(e);
}
}
}

Java - Handling Non-Blocking Calls

In my application I am using a third-party API. It is a non-blocking method which returns immediately. I have a collection of elements over which I have to invoke this method.
Now, my problem is that I have to find a way to wait until all the method executions have completed before doing my next operation. How can I handle this? I cannot modify the third-party API.
In short it looks like this
for(Object object: objects){
methodA(object); //this is a non-blocking call and returns immediately
}
// here I want to do my next task only after all the methodA calls completed execution

What you are asking for is impossible ... unless the third party API also includes some method that allows you to wait until one or more calls to methodA have completed.
Does it?
EDIT
As Kathy Stone notes, another possibility is that the API might have a callback mechanism, whereby a thread (behind the API) that is doing the work started by the methodA call "calls back" to your code. (There would need to be some other method in the API that allows you to register the callback object.)
There are other possibilities as well ... (some too horrible to mention) ... but they all entail the API being designed to support synchronization with the end of the tasks started by methodA.

As Stephen noted it is possible if you have some way of knowing that the method has completed. If you have some kind of callback or listener for this you could use something like a counting semaphore:
final Semaphore block = new Semaphore(0); // start with zero permits; Semaphore has no no-arg constructor
//HERE SOMETHING APPROPRIATE TO YOUR API
myAPI.registerListener(new APIListener(){
public void methodADone(){
block.release();
}
});
int permits = 0;
for(Object object: objects){
methodA(object); //this is a non-blocking call and returns immediately
permits++;
}
block.acquire(permits);
Of course you would need extra checking to make sure you are releasing permits for the correct object collections, depending on how your code is threaded and what mechanism the API provides to know the call has completed, but this is one approach that could be used.
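If the number of calls is known up front, a CountDownLatch is an alternative to the semaphore. The sketch below is self-contained only because it fakes the third-party API: this methodA, its callback parameter and the worker executor are hypothetical stand-ins, and a real API would need to offer some comparable completion callback.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class.
class WaitForCallbacksDemo {

    // Stand-in for the non-blocking third-party call: returns immediately
    // and invokes onDone from another thread when the work has finished.
    static void methodA(Object input, Runnable onDone, Executor worker) {
        worker.execute(() -> {
            // ... the real work on 'input' would happen here ...
            onDone.run();
        });
    }

    // Returns true if every call reported completion within the timeout.
    static boolean processAll(List<?> objects) throws InterruptedException {
        ExecutorService worker = Executors.newFixedThreadPool(4);
        CountDownLatch remaining = new CountDownLatch(objects.size());
        try {
            for (Object object : objects) {
                methodA(object, remaining::countDown, worker);
            }
            // Blocks until the count reaches zero or the timeout expires.
            return remaining.await(10, TimeUnit.SECONDS);
        } finally {
            worker.shutdown();
        }
    }
}
```

The next task can then run right after processAll returns true.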

How do you determine that a methodA() call has finished?
Does the method return a handle? Or does the object have a property that is set by the methodA() call? If so, collect them and loop with a sleep, checking all remaining handles or object properties and removing each one from the remaining set once it has completed.
The waiting code can look like:
while(!remaining.isEmpty()) {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
continue;
}
Iterator<HandleOrObjectWithProperty> i = remaining.iterator();
while (i.hasNext()) {
HandleOrObjectWithProperty result = i.next();
if (result.handleHasFinishedOrPropertyIsSet()) {
i.remove();
}
}
}
