ExecutorService.submit(Task) vs CompletableFuture.supplyAsync(Task, Executor) - java

To run some stuff in parallel or asynchronously I can use either an ExecutorService: <T> Future<T> submit(Runnable task, T result); or the CompletableFuture API: static <U> CompletableFuture<U> supplyAsync(Supplier<U> supplier, Executor executor);
(Let's assume I use the same Executor in both cases.)
Besides the return type (Future vs. CompletableFuture), are there any notable differences? When should I use which?
And what are the differences if I use the CompletableFuture API with default Executor (the method without executor)?

Besides the return type (Future vs. CompletableFuture), are there any notable differences? When should I use which?
It's rather simple, really. You use a Future when you want the calling thread to wait for the result of an async computation. An example of this is a parallel merge sort: sort the left half asynchronously, sort the right half synchronously, wait for the left half to complete (future.get()), then merge the results.
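The merge sort described above might be sketched roughly as follows. The class name FutureMergeSort and the helper merge are made up for illustration, and only the top-level split is parallelised, so treat it as a minimal sketch rather than a tuned implementation.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, not part of any library.
class FutureMergeSort {

    // Sorts the left half on the executor and the right half on the calling
    // thread, then blocks on the Future before merging.
    static int[] sort(int[] input, ExecutorService executor) {
        int mid = input.length / 2;
        int[] left = Arrays.copyOfRange(input, 0, mid);
        int[] right = Arrays.copyOfRange(input, mid, input.length);

        // Left half: sorted asynchronously.
        Future<int[]> leftFuture = executor.submit(() -> {
            Arrays.sort(left);
            return left;
        });

        // Right half: sorted synchronously while the left half is in flight.
        Arrays.sort(right);

        try {
            // The calling thread waits here until the left half is ready.
            return merge(leftFuture.get(), right);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Sorting the left half failed", e.getCause());
        }
    }

    // Standard two-way merge of two sorted arrays.
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            System.out.println(Arrays.toString(sort(new int[] {5, 3, 8, 1, 9, 2}, executor)));
        } finally {
            executor.shutdown();
        }
    }
}
```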
You use a CompletableFuture when you want some action executed with the result after completion, asynchronously from the calling thread. For instance: I want to do some computation asynchronously and, when it completes, write the results to some system. The requesting thread may then not need to wait on a result.
You can mimic the above example in a single Future executable, but the CompletableFuture offers a more fluent interface with better error handling.
It really depends on what you want to do.
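As a rough illustration of the "compute, then write the result somewhere" case, here is a minimal sketch. The class ComputeThenWrite, the method computeAndStore and the in-memory list standing in for "some system" are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example class; the "sink" list stands in for an external system.
class ComputeThenWrite {

    static CompletableFuture<Void> computeAndStore(int divisor, List<String> sink, Executor executor) {
        return CompletableFuture
                .supplyAsync(() -> 100 / divisor, executor)   // async computation (fails for 0)
                .thenApply(value -> "result=" + value)        // transform the result
                .exceptionally(error -> "failed")             // error handling in the same chain
                .thenAccept(sink::add);                       // write the outcome to the sink
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<String> sink = new CopyOnWriteArrayList<>();
        try {
            // join() is only here so the demo prints deterministically;
            // a real caller could return right after computeAndStore(...).
            computeAndStore(4, sink, executor).join();
            computeAndStore(0, sink, executor).join();
            System.out.println(sink);
        } finally {
            executor.shutdown();
        }
    }
}
```

The requesting thread does not need to call get() at all here; the chain itself carries both the follow-up action and the failure path.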
And what are the differences if I use the CompletableFuture API with default Executor (the method without executor)?
It will delegate to ForkJoinPool.commonPool(), whose default parallelism is one less than the number of available processors on your system. If you are doing something I/O-intensive (reading and writing to the file system), you should supply a separately defined thread pool instead.
If it's CPU intensive, using the commonPool makes most sense.
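The two variants could be compared with a sketch like the one below. The class ExecutorChoice, the thread name "io-worker" and the pool size are arbitrary choices for illustration; a suitable size for an I/O pool depends on the workload.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

// Hypothetical example class.
class ExecutorChoice {

    // No executor argument: runs on the default executor (normally the common pool).
    static CompletableFuture<String> onDefaultExecutor() {
        return CompletableFuture.supplyAsync(() -> Thread.currentThread().getName());
    }

    // Explicit executor: runs on the pool the caller passes in, e.g. one kept for blocking I/O.
    static CompletableFuture<String> onCustomExecutor(ExecutorService ioPool) {
        return CompletableFuture.supplyAsync(() -> Thread.currentThread().getName(), ioPool);
    }

    // A dedicated pool whose threads are named so they are easy to recognise.
    static ExecutorService newIoPool(int threads) {
        return Executors.newFixedThreadPool(threads, runnable -> {
            Thread t = new Thread(runnable, "io-worker");
            t.setDaemon(true);
            return t;
        });
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("default executor thread: " + onDefaultExecutor().join());
        ExecutorService ioPool = newIoPool(16);
        System.out.println("custom executor thread:  " + onCustomExecutor(ioPool).join());
        ioPool.shutdown();
    }
}
```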

CompletableFuture has rich features like chaining multiple futures, combining the futures, executing some action after future is executed (both synchronously as well as asynchronously), etc.
However, CompletableFuture is no different from Future in terms of raw performance. Even when we combine multiple instances of CompletableFuture (using .thenCombine, with .join at the end), the tasks start running as soon as they are submitted, not when .get is called; but once we do call .get or .join, the invoking thread is blocked until the result is ready. In that sense, I feel this is not better than Future in terms of performance.
Please let me know if I am missing some aspect of performance here.
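One way to check when the supplied tasks actually start is a small experiment like the following. The class EagerStartCheck and its method names are made up for the example; it uses a CountDownLatch to observe that both suppliers run without anyone calling get() or join(), and separately combines two futures with thenCombine.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical example class.
class EagerStartCheck {

    // Returns true if both suppliers ran although get()/join() was never called on them.
    static boolean suppliersRunWithoutGet() {
        CountDownLatch started = new CountDownLatch(2);
        CompletableFuture.supplyAsync(() -> { started.countDown(); return 20; });
        CompletableFuture.supplyAsync(() -> { started.countDown(); return 22; });
        try {
            // Wait on the latch only, never on the futures themselves.
            return started.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Combines two independent async results; join() blocks only the caller.
    static int combinedSum() {
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 20);
        CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> 22);
        return a.thenCombine(b, Integer::sum).join();
    }

    public static void main(String[] args) {
        System.out.println("ran without get(): " + suppliersRunWithoutGet());
        System.out.println("combined sum: " + combinedSum());
    }
}
```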

This clarified the difference between Future and CompletableFuture a bit more for me: Difference between Future and Promise
CompletableFuture is more like a promise.

Related

Asynchronous execution/operation with CompletableFuture in Java 8+

In Java, is calling get() method while looping on CompletableFuture instances as good as doing synchronous operations although CompletableFuture is used for async calls?
'get()' waits until the future is completed. If that's what you want, it's what you use. There's no general rule.
For example, you might be using a method that is inherently asynchronous, but in your particular use, you need to wait for it to complete. If so, then there's nothing wrong with waiting for it to complete!
You mention a loop. You might find it applicable to start all the tasks in the loop, collecting a list of futures, and then (outside the loop) wait for them all to complete. That way you're getting some parallelism.
But as a general rule: it depends.
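The "start everything in the loop, wait outside the loop" idea could look roughly like this. The class StartThenWait and the squaring task are placeholders; join() is used instead of get() only because it throws no checked exceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example class.
class StartThenWait {

    static List<Integer> squareAll(List<Integer> inputs, ExecutorService executor) {
        // First loop: start every task without waiting for any of them.
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        for (Integer n : inputs) {
            futures.add(CompletableFuture.supplyAsync(() -> n * n, executor));
        }
        // Second loop: only now wait; the tasks may already have run in parallel.
        List<Integer> results = new ArrayList<>();
        for (CompletableFuture<Integer> future : futures) {
            results.add(future.join());
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            System.out.println(squareAll(Arrays.asList(1, 2, 3, 4), executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

Calling join() inside the first loop instead would serialise the work, because each task would have to finish before the next one is even submitted.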

Is there a performance benefit of using CompletableFuture callbacks?

CompletableFuture lets you provide callbacks for async calls. You can create a long chain of callbacks where each async call triggers the next one on completion. This is deemed a better way to write async code than using Future, where you have to block the thread to get the result of the first computation before triggering the next one.
I can understand the argument that callback chains in CompletableFuture make for more readable code, but I'm wondering whether there's a performance benefit to this approach as well, or is it just syntactic sugar?
For example, consider the following code:
ExecutorService exec = Executors.newSingleThreadExecutor();
CompletableFuture.supplyAsync(this::findAccountNumber, exec)
.thenApply(this::calculateBalance)
.thenApply(this::notifyBalance)
.thenAccept((i)->notifyByEmail())
.join();
In this code, calculateBalance() can't start until findAccountNumber() finishes so essentially calculateBalance() is blocked on findAccountNumber() and so on for the next methods in the callback chain. How is it better than the following (performance-wise):
ExecutorService exec = Executors.newSingleThreadExecutor();
Future<Integer> accountNumberFuture = exec.submit(findAccountNumberCallable);
Integer accountNumber = accountNumberFuture.get();
Future<String> calculateBalanceFuture = exec.submit(calculateBalanceCallable(accountNumber));
....
....
In most cases, you won't notice a difference, but if you want to be able to have a lot of concurrent asynchronous calls waiting on something, you'll want to use CompletableFuture.
The reason is that if you simply call get() on a regular Future, the thread and all resources associated with it are blocked until the call returns. If you have many such calls, your thread pool might get exhausted, or, if you use a cached thread pool, you might cause lots of threads to be created.
With CompletableFuture, an object stored on the heap represents where the application should pick up next, as opposed to using the call stack. The author of the API has given a talk about this.
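To make the "object on the heap instead of a blocked thread" point concrete, here is a small sketch. The class PendingCallbacks is invented for the example: it registers continuations on many incomplete futures without parking any thread, then completes them later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class.
class PendingCallbacks {

    // Registers a continuation on each of `count` incomplete futures, then completes them.
    static int completeAll(int count) {
        AtomicInteger sum = new AtomicInteger();
        List<CompletableFuture<Integer>> sources = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            CompletableFuture<Integer> source = new CompletableFuture<>();
            // Nothing blocks here: the continuation is just an object attached to the future.
            source.thenApply(value -> value * 2).thenAccept(sum::addAndGet);
            sources.add(source);
        }
        // Later, whoever has the result completes the future; the continuation
        // then runs on the completing thread.
        for (CompletableFuture<Integer> source : sources) {
            source.complete(1);
        }
        return sum.get();
    }

    public static void main(String[] args) {
        System.out.println(completeAll(1000));
    }
}
```

With plain Future.get(), the same thousand pending results would each need a thread sitting in get() to react as soon as the value arrives.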

When to use non-async methods of CompletableFuture?

I (mostly) understand the three execution methods of CompletableFuture:
non-async (synchronous execution)
default async (asynchronous using the default executor)
custom async (asynchronous using a custom executor)
My question is: when should one favor the use of non-async methods?
What happens if you have a code block that invokes other methods that also return CompletableFutures? This might look cheap on the surface, but what happens if those methods also use non-async invocation? Doesn't this add up to one long non-async block that could get expensive?
Should one restrict the use of non-async execution to short, well-defined code-blocks that do not invoke other methods?
When should one favor the use of non-async methods?
The decision for continuations is no different than for the antecedent task itself. When do you choose to make an operation asynchronous (e.g., using a CompletableFuture) vs. writing purely synchronous code? The same guidance applies here.
If you are simply consuming the result or using the completion signal to kick off another asynchronous operation, then that itself is a cheap operation, and there is no reason not to use the synchronous completion methods.
On the other hand, if you are chaining together multiple long-running operations that would each be an async operation in their own right, then use the async completion methods.
If you're somewhere in between, trust your gut, or just go with the async completion methods. If you're not coordinating thousands of tasks, then you won't be adding a whole lot of overhead.
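A sketch of mixing the two kinds of completion methods, under the assumption that the first continuation is cheap and the second one is expensive enough to deserve its own executor. The class ContinuationChoice and the thread name "heavy-worker" are illustrative only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example class.
class ContinuationChoice {

    static CompletableFuture<String> pipeline(Executor heavyPool) {
        return CompletableFuture
                .supplyAsync(() -> 21)
                // Cheap step: non-async, runs on whichever thread completes the
                // previous stage (or on the caller if it is already complete).
                .thenApply(n -> n * 2)
                // Potentially long-running step: handed to an executor explicitly.
                .thenApplyAsync(
                        n -> n + " computed on " + Thread.currentThread().getName(),
                        heavyPool);
    }

    public static void main(String[] args) {
        ExecutorService heavyPool =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "heavy-worker"));
        try {
            System.out.println(pipeline(heavyPool).join());
        } finally {
            heavyPool.shutdown();
        }
    }
}
```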
Should one restrict the use of non-async execution to short, well-defined code-blocks that do not invoke other methods?
I would use them for operations that are not long-running. You don't need to restrict their use to trivially short and simple callbacks. But I think you have the right idea.
If you're using CompletableFuture, then you have decided that at least some operations in your code base necessitate async execution, but presumably not all operations are async. How did you decide which should be async and which should not? If you apply that same analysis to continuations, I think you'll be fine.
What happens if you have a code block that invokes other methods that also return CompletableFutures? This might look cheap on the surface, but what happens if those methods also use non-async invocation? Doesn't this add up to one long non-async block that could get expensive?
Returning a CompletableFuture generally signifies that the underlying operation is scheduled to occur asynchronously, so that should not be a problem. In most cases, I would expect the flow to look something like this:
1. You synchronously call an async method returning a CompletableFuture. It schedules some async operation to eventually provide a result. Your call returns almost immediately, with no blocking.
2. Upon completion, one or more continuations may be invoked. Some of those may invoke additional async operations. Those will call into methods that will schedule additional async operations, but as before, they return almost immediately.
3. Go to (2), or finish.
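The flow above might be sketched with thenCompose, which is the usual way to let a continuation kick off another async operation. The methods findUserId and loadLabel are hypothetical stand-ins for async methods that return a CompletableFuture.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example class; findUserId and loadLabel are made-up async methods.
class AsyncFlow {

    // Schedules the work and returns immediately.
    static CompletableFuture<Integer> findUserId(String name, Executor executor) {
        return CompletableFuture.supplyAsync(name::length, executor);
    }

    // Another async method, also returning immediately.
    static CompletableFuture<String> loadLabel(int id, Executor executor) {
        return CompletableFuture.supplyAsync(() -> "user#" + id, executor);
    }

    static CompletableFuture<String> flow(String name, Executor executor) {
        return findUserId(name, executor)                    // step 1: call an async method
                .thenCompose(id -> loadLabel(id, executor))  // step 2: continuation schedules more async work
                .thenApply(String::toUpperCase);             // cheap synchronous continuation, then finish
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            System.out.println(flow("alice", executor).join());
        } finally {
            executor.shutdown();
        }
    }
}
```

The lambda passed to thenCompose is itself non-async, yet it is cheap: it only schedules loadLabel and returns the resulting future.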

What are the differences between a Scala Future and a Java Future

Are there any conceptual, functional or mechanical differences between a Scala Future and a Java Future? Conceptually I can't see any differences as they both aim to provide a mechanism for asynchronous computation.
The main inconvenience of java.util.concurrent.Future is the fact that you can't get the value without blocking.
In fact, the only way to retrieve a value is the get method, which (quoting from the docs):
Waits if necessary for the computation to complete, and then retrieves its result.
With scala.concurrent.Future you get instead a real non-blocking computation, as you can attach callbacks for completion (success/failure) or simply map over it and chain multiple Futures together in a monadic fashion.
Long story short, scala's Future allows for asynchronous computation without blocking more threads than necessary.
Java Future: Both represent the result of an asynchronous computation, but Java's Future requires that you access the result via a blocking get method. Although you can call isDone to find out whether a Java Future has completed before calling get, thereby avoiding any blocking, you must wait until the Java Future has completed before proceeding with any computation that uses the result.
Scala Future: You can specify transformations on a Scala Future whether it has completed or not. Each transformation results in a new Future representing the asynchronous result of the original Future transformed by the function. This allows you to describe asynchronous computations as a series of transformations.
Scala's Futures often eliminate the need to reason about shared data and locks. When you invoke a Scala method, it performs a computation "while you wait" and returns a result. If that result is a Future, the Future represents another computation to be performed asynchronously, often by a completely different thread.
Example: Instead of blocking then continuing with another computation, you can just map the next computation onto the future.
The following future will complete after ten seconds:
val fut = Future { Thread.sleep(10000); 21 + 21 }
Mapping this future with a function that increments by one will yield another future.
val result = fut.map(x => x + 1)
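For comparison, a roughly analogous chain on the Java side (Java 8 and later) could be written with CompletableFuture, whose thenApply plays a role similar to map here. This is only a sketch: the class MapOverFuture is invented, and the delay is a parameter so that the example does not have to sleep for ten seconds.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical example class showing a Java counterpart to fut.map(...).
class MapOverFuture {

    static CompletableFuture<Integer> incremented(long delayMillis) {
        // Completes after the delay with 21 + 21.
        CompletableFuture<Integer> fut = CompletableFuture.supplyAsync(() -> {
            sleep(delayMillis);
            return 21 + 21;
        });
        // Attaching the transformation does not block; it yields another future.
        return fut.thenApply(x -> x + 1);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> result = incremented(1000);
        System.out.println("done right after chaining? " + result.isDone());
        System.out.println(result.join());
    }
}
```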

How do I know when all threads in an ExecutorService are finished?

I know that shutdown() and awaitTermination() exist. The problem is that the runnables in the pool need to be able to add an unknown number (can't use a countdownlatch) of other runnables to it and if I call shutdown() those tasks will be rejected. How can I know when they're done?
Work with Future rather than with Runnable. There's this Future#isDone method that may help you.
In case you don't have anything meaningful to return from the Callable, use Callable<Void> and Future<Void>.
Instead of submitting Runnable tasks to an Executor, you should use ForkJoinTask/ForkJoinPool. A ForkJoinTask runs inside a ForkJoinPool and can spawn an arbitrary number of (sub)tasks and wait for them to complete, without actually blocking the current thread. A ForkJoinTask is complete when all of its sub-tasks are done, so the entire computation is done when the initial (root) ForkJoinTask is complete.
See Oracle - The Java™ Tutorials - Fork/Join for details.
As all of your tasks are resultless (Runnable), you should subclass RecursiveAction (which is itself a subclass of ForkJoinTask). Implement the method compute(), and spawn an arbitrary number of new tasks there by either calling invoke(subtask), invokeAll(subtask1, subtask2, ...) or subtask.fork() followed by subtask.join().
The entire computation is executed as follows:
MyRecursiveAction task = new MyRecursiveAction(params);
ForkJoinPool pool = new ForkJoinPool(numberOfThreads);
pool.invoke(task); // will block until task is done
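What such a RecursiveAction subclass could look like is sketched below. The class SpawningAction, its depth parameter and the shared counter are assumptions for the example; each task spawns a number of children that is only known at run time, and the root task completes once all descendants are done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class: every task at depth d spawns d children at depth d - 1.
class SpawningAction extends RecursiveAction {
    private final int depth;
    private final AtomicInteger executed;

    SpawningAction(int depth, AtomicInteger executed) {
        this.depth = depth;
        this.executed = executed;
    }

    @Override
    protected void compute() {
        executed.incrementAndGet();      // the "work" done by this task
        if (depth == 0) {
            return;
        }
        List<SpawningAction> children = new ArrayList<>();
        for (int i = 0; i < depth; i++) {
            children.add(new SpawningAction(depth - 1, executed));
        }
        invokeAll(children);             // returns once every child and its descendants are done
    }

    // Runs the whole task tree and reports how many tasks were executed.
    static int runAll(int depth, int numberOfThreads) {
        AtomicInteger executed = new AtomicInteger();
        ForkJoinPool pool = new ForkJoinPool(numberOfThreads);
        try {
            pool.invoke(new SpawningAction(depth, executed)); // blocks until the root is done
        } finally {
            pool.shutdown();
        }
        return executed.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks executed: " + runAll(3, 4));
    }
}
```

No latch or task count is needed up front: completion of the root task is the signal that everything spawned beneath it has finished.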
Unfortunately, the advantages of Fork/Join come with some limitations, e.g.:
(...) Computations should ideally avoid synchronized methods or blocks, and
should minimize other blocking synchronization apart from joining
other tasks or using synchronizers such as Phasers that are advertised
to cooperate with fork/join scheduling. Subdividable tasks should also
not perform blocking I/O, and should ideally access variables that are
completely independent of those accessed by other running tasks. These
guidelines are loosely enforced by not permitting checked exceptions
such as IOExceptions to be thrown. (...)
For more detail see API docs of ForkJoinTask.
If you are able to use Guava Futures, you can use Futures.allAsList or Futures.successfulAsList. This allows you to wrap a number of Future instances that you got back from the ExecutorService into a single Future which you can then check to see if it is finished using isDone() (or just get(), for that matter, if you want to block until completion).
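If Guava is not available, a similar "many futures into one" wrapper can be approximated with the JDK alone, provided the tasks are exposed as CompletableFuture instances rather than plain Futures. The sketch below swaps in CompletableFuture.allOf for Guava's Futures.allAsList; the class AllOfExample and its helper allAsList are hypothetical and not part of Guava or the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical JDK-only stand-in for the Guava approach described above.
class AllOfExample {

    // Wraps a list of futures into one future that completes when all of them have.
    static <T> CompletableFuture<List<T>> allAsList(List<CompletableFuture<T>> futures) {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]));
        return all.thenApply(ignored -> {
            List<T> results = new ArrayList<>();
            for (CompletableFuture<T> future : futures) {
                results.add(future.join()); // already complete here, so this does not block
            }
            return results;
        });
    }

    public static void main(String[] args) {
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            final int value = i;
            futures.add(CompletableFuture.supplyAsync(() -> value));
        }
        CompletableFuture<List<Integer>> combined = allAsList(futures);
        System.out.println(combined.join());   // block until every task is finished
        System.out.println(combined.isDone());
    }
}
```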
