CompletableFuture lets you provide callbacks for async calls. You can create a long chain of callbacks where each async call triggers the next one on completion. This is considered a better way to write async code than using Future, where you have to block the thread to get the result of the first computation before triggering the next one.
I can understand the argument that callback chains in CompletableFuture can make code more readable, but I'm wondering whether there's a performance benefit to this approach as well, or is it just syntactic sugar?
For example, consider the following code:
ExecutorService exec = Executors.newSingleThreadExecutor();
CompletableFuture.supplyAsync(this::findAccountNumber, exec)
.thenApply(this::calculateBalance)
.thenApply(this::notifyBalance)
.thenAccept((i)->notifyByEmail())
.join();
In this code, calculateBalance() can't start until findAccountNumber() finishes so essentially calculateBalance() is blocked on findAccountNumber() and so on for the next methods in the callback chain. How is it better than the following (performance-wise):
ExecutorService exec = Executors.newSingleThreadExecutor();
Future<Integer> accountNumberFuture = exec.submit(findAccountNumberCallable);
Integer accountNumber = accountNumberFuture.get();
Future<String> calculateBalanceFuture = exec.submit(calculateBalanceCallable(accountNumber));
....
....
In most cases, you won't notice a difference, but if you want to be able to have a lot of concurrent asynchronous calls waiting on something, you'll want to use CompletableFuture.
The reason is that if you simply call get() on a regular Future the Thread and all resources associated with it become blocked until the call returns. If you have many calls your thread pool might get exhausted, or if you use a CachedThreadPool you might cause lots of threads to be created.
With CompletableFuture, an object is stored on the heap which represents where the application should pick up next, as opposed to using the call stack. The guy who built the API has a talk about it over here.
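To make the heap-versus-call-stack point concrete, here is a minimal sketch (the class and method names are made up for illustration). It registers callbacks on many pending futures without parking a single thread; the callbacks only run once each source future is completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class PendingCallbacksSketch {
    // Registers a callback on n pending futures, then completes them.
    // While the futures are pending, no thread is blocked waiting on them:
    // each continuation is just an object on the heap.
    static int sumAfterCompletion(int n) {
        List<CompletableFuture<Integer>> sources = new ArrayList<>();
        List<CompletableFuture<Integer>> doubled = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            CompletableFuture<Integer> source = new CompletableFuture<>();
            sources.add(source);
            doubled.add(source.thenApply(x -> x * 2)); // stored, not executed yet
        }
        // Simulate the async results arriving later.
        for (int i = 0; i < n; i++) {
            sources.get(i).complete(i);
        }
        int sum = 0;
        for (CompletableFuture<Integer> f : doubled) {
            sum += f.join(); // already complete, so this returns immediately
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAfterCompletion(1000));
    }
}
```

Doing the same with plain Future.get() would need one blocked thread per outstanding call.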
Context: I've read this SO thread discussing the differences between CompletableFuture and Thread.
But I'm trying to understand when should I use new Thread() instead of runAsync().
I understand that runAsync() is more efficient for short/one-time parallel tasks because my program might avoid the overhead of creating a brand new thread, but should I also consider it for long-running operations?
What are the factors that I should be aware of before choosing one over the other?
Thanks everyone.
The difference between using the low-level concurrency APIs (such as Thread) and others is not just about the kind of work that you want to get done, it's also about the programming model and also how easy they make it to configure the environment in which the task runs.
In general, it is advisable to use higher-level APIs, such as CompletableFuture instead of directly using Threads.
I understand that runAsync() is more efficient for short/one-time parallel tasks
Well, maybe, assuming you call runAsync, counting on it to use the fork-join pool. The runAsync method is overloaded with a method that allows one to specify the java.util.concurrent.Executor with which the asynchronous task is executed.
Using an ExecutorService for more control over the thread pool and using CompletableFuture.runAsync or CompletableFuture.supplyAsync with a specified executor service (or not) are both generally preferred to creating and running a Thread object directly.
There's nothing particularly for or against using the CompletableFuture API for long-running tasks. But the choice one makes to use Threads has other implications as well, among which:
The CompletableFuture gives a better API for programming reactively, without forcing us to write explicit synchronization code. (we don't get this when using threads directly)
The Future interface (which is implemented by CompletableFuture) gives other additional, obvious advantages.
So, in short: you can (and probably should, if the alternative being considered is the Thread API) use the CompletableFuture API for your long-running tasks. To better control thread pools, you can combine it with executor services.
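As a rough illustration of the difference in programming model, the sketch below computes the same value once with a raw Thread and once with CompletableFuture on an explicit executor. The expensive() method and the pool size are placeholders, not anything from the question.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadVersusCompletableFuture {
    // Hypothetical stand-in for a long-running computation.
    static int expensive() {
        return 6 * 7;
    }

    // Raw thread: we need our own holder for the result and an explicit join().
    static int withRawThread() {
        AtomicInteger holder = new AtomicInteger();
        Thread worker = new Thread(() -> holder.set(expensive()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holder.get();
    }

    // CompletableFuture: the result travels with the future, and the
    // executor decides which thread does the work.
    static int withCompletableFuture() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return CompletableFuture
                    .supplyAsync(ThreadVersusCompletableFuture::expensive, pool)
                    .join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(withRawThread() + " " + withCompletableFuture());
    }
}
```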
The main difference is that CompletableFuture runs your task on ForkJoinPool.commonPool() by default, whereas a thread you create and start yourself runs on its own, outside any thread pool. CompletableFuture also helps if you want to execute tasks in sequence but asynchronously; you can do it like this:
CompletableFuture.runAsync(() -> {
System.out.println("On first task");
System.out.println("Thread : " + Thread.currentThread());
}).thenRun(() -> {
System.out.println("On second task");
});
Output:
On first task
Thread : Thread[ForkJoinPool.commonPool-worker-1,5,main]
On second task
If you run the above code, you can see which pool CompletableFuture is using.
Note: the threads in ForkJoinPool.commonPool are daemon threads.
To run some stuff in parallel or asynchronously I can use either an ExecutorService: <T> Future<T> submit(Runnable task, T result); or the CompletableFuture API: static <U> CompletableFuture<U> supplyAsync(Supplier<U> supplier, Executor executor);
(Let's assume I use the same Executor in both cases.)
Besides the return type (Future vs. CompletableFuture), are there any notable differences? Or when should I use which?
And what are the differences if I use the CompletableFuture API with the default Executor (the method without an executor)?
Besides the return type (Future vs. CompletableFuture), are there any notable differences? Or when should I use which?
It's rather simple really. You use the Future when you want the executing thread to wait for async computation response. An example of this is with a parallel merge/sort. Sort left asynchronously, sort right synchronously, wait on left to complete (future.get()), merge results.
You use a CompletableFuture when you want some action executed with the result after completion, asynchronously from the executing thread. For instance: I want to do some computation asynchronously and, when it finishes, write the results to some system. The requesting thread may not need to wait on a result then.
You can mimic the above example in a single Future executable, but the CompletableFuture offers a more fluent interface with better error handling.
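As one illustration of the error-handling point, here is a small sketch using exceptionally() to supply a fallback when the async step throws. The parseOrDefault name and the parsing task are invented for the example.

```java
import java.util.concurrent.CompletableFuture;

class ErrorHandlingSketch {
    // Parses the text asynchronously; if parsing throws, the exceptionally
    // stage replaces the failure with the fallback value.
    static int parseOrDefault(String text, int fallback) {
        return CompletableFuture
                .supplyAsync(() -> Integer.parseInt(text))
                .exceptionally(ex -> fallback)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("12", -1));
        System.out.println(parseOrDefault("oops", -1));
    }
}
```

With a plain Future, the same failure would surface as an ExecutionException from get(), which the caller has to catch and unwrap.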
It really depends on what you want to do.
And what are the differences if I use the CompletableFuture API with the default Executor (the method without an executor)?
It will delegate to ForkJoinPool.commonPool(), whose parallelism defaults to one less than the number of available processors on your system. If you are doing something IO intensive (reading and writing to the file system), you should define the thread pool differently.
If it's CPU intensive, using the commonPool makes most sense.
CompletableFuture has rich features like chaining multiple futures, combining the futures, executing some action after future is executed (both synchronously as well as asynchronously), etc.
However, CompletableFuture is no different from Future in terms of performance. Even when we combine multiple instances of CompletableFuture (using .thenCombine, with .join at the end), none of them get executed unless we call the .get method, and during this time the invoking thread is blocked. I feel that in terms of performance this is not better than Future.
Please let me know if I am missing some aspect of performance here.
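One aspect worth checking here is whether the stages really wait for get(). As far as I know they do not: supplyAsync starts the work right away, and dependent stages run when their inputs complete. The sketch below (names are made up) never calls get() or join() on any future; it uses a CountDownLatch only so the demo can observe that the final callback ran.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CallbacksRunWithoutGet {
    static int combineWithoutGet() {
        AtomicInteger seen = new AtomicInteger(-1);
        CountDownLatch done = new CountDownLatch(1);

        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 20);
        CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> 22);

        // No get()/join() anywhere: the pipeline still executes.
        a.thenCombine(b, Integer::sum)
         .thenAccept(sum -> {
             seen.set(sum);
             done.countDown();
         });

        // The latch is only here so that this demo can report the outcome.
        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("callback did not run in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(combineWithoutGet());
    }
}
```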
This clarified for me the difference between Future and CompletableFuture a bit more: Difference between Future and Promise
CompletableFuture is more like a promise.
After doing lots of searching on Java, I really am very confused over the following questions:
Why would I choose an asynchronous method over a multi-threaded method?
Java futures are supposed to be non-blocking. What does non-blocking mean? Why call it non-blocking when the method to extract information from a Future--i.e., get()--is blocking and will simply halt the entire thread till the method is done processing? Perhaps a callback method that rings the church bell of completion when processing is complete?
How do I make a method async? What is the method signature?
public List<T> databaseQuery(String Query, String[] args){
String preparedQuery = QueryBaker(Query, args);
List<T> listOfNumbers = DB_Exec(preparedQuery); // time taking task
return listOfNumbers;
}
How would this fictional method become a non blocking method? Or if you want please provide a simple synchronous method and an asynchronous method version of it.
Why would I choose an asynchronous method over a multi-threaded method?
Asynchronous methods allow you to reduce the number of threads. Instead of tying up a thread in a blocking call, you can issue an asynchronous call and then be notified later when it completes. This frees up the thread to do other processing in the meantime.
It can be more convoluted to write asynchronous code, but the benefit is improved performance and memory utilization.
Java futures are supposed to be non-blocking. What does non-blocking mean? Why call it non-blocking when the method to extract information from a Future--i.e., get()--is blocking and will simply halt the entire thread till the method is done processing ? Perhaps a callback method that rings the church bell of completion when processing is complete?
Check out CompletableFuture, which was added in Java 8. It is a much more useful interface than Future. For one, it lets you chain all kinds of callbacks and transformations to futures. You can set up code that will run once the future completes. This is much better than blocking in a get() call, as you surmise.
For instance, given asynchronous read and write methods like so:
CompletableFuture<ByteBuffer> read();
CompletableFuture<Integer> write(ByteBuffer bytes);
You could read from a file and write to a socket like so:
file.read()
.thenCompose(bytes -> socket.write(bytes))
.thenAccept(count -> log.write("Wrote {} bytes to socket.", count))
.exceptionally(exception -> {
log.error("Input/output error.", exception);
return null;
});
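Since file, socket and log above are not defined, here is a self-contained sketch of the same thenCompose pattern. The read() and write() methods below are hypothetical stand-ins that fake the I/O in memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

class ComposeSketch {
    // Hypothetical async read: produces five bytes.
    static CompletableFuture<ByteBuffer> read() {
        return CompletableFuture.supplyAsync(
                () -> ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
    }

    // Hypothetical async write: reports how many bytes it "wrote".
    static CompletableFuture<Integer> write(ByteBuffer bytes) {
        return CompletableFuture.supplyAsync(bytes::remaining);
    }

    // thenCompose flattens the nested future returned by write(), so the
    // chain yields CompletableFuture<Integer> rather than a future of a future.
    static int copy() {
        return read()
                .thenCompose(ComposeSketch::write)
                .exceptionally(ex -> -1) // fallback value on any I/O error
                .join();
    }

    public static void main(String[] args) {
        System.out.println(copy());
    }
}
```

The join() at the end is only there so the demo can return a value; in real code you would usually keep chaining instead.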
How do I make a method async? What is the method signature?
You would have it return a future.
public CompletableFuture<List<T>> databaseQuery(String Query, String[] args);
It's then the responsibility of the method to perform the work in some other thread and avoid blocking the current thread. Sometimes you will have worker threads ready to go. If not, you could use the ForkJoinPool, which makes background processing super easy.
public CompletableFuture<List<T>> databaseQuery(String query, String[] args) {
    CompletableFuture<List<T>> future = new CompletableFuture<>();
    Executor executor = ForkJoinPool.commonPool();
    executor.execute(() -> {
        try {
            String preparedQuery = QueryBaker(query, args);
            List<T> list = DB_Exec(preparedQuery); // time taking task
            future.complete(list);
        } catch (Exception e) {
            future.completeExceptionally(e); // pass failures on to the caller
        }
    });
    return future; // returned immediately, usually still pending
}
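For comparison, a shorter sketch of the same idea with supplyAsync, which creates the future, runs the supplier on the common pool and completes the future for you (including on exceptions). The blockingQuery method is a made-up stand-in for DB_Exec.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class AsyncQuerySketch {
    // Hypothetical slow, blocking call standing in for DB_Exec.
    static List<String> blockingQuery(String query) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Arrays.asList(query + "#1", query + "#2");
    }

    // Async version: returns right away; the work happens on a pool thread.
    static CompletableFuture<List<String>> databaseQuery(String query) {
        return CompletableFuture.supplyAsync(() -> blockingQuery(query));
    }

    public static void main(String[] args) {
        databaseQuery("row")
                .thenAccept(rows -> System.out.println("Got " + rows))
                .join(); // only so the demo does not exit before the callback runs
    }
}
```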
why would I choose a Asynchronous method over a multi-threaded method
They sound like the same thing to me except asynchronous sounds like it will use one thread in the back ground.
Java futures is supposed to be non blocking ?
Non-blocking operations often use a Future, but the object itself is blocking, though only when you wait on it.
What does Non blocking mean?
The current thread doesn't wait/block.
Why call it non blocking when the method to extract information from a Future < some-object > i.e. get() is blocking
You called it non-blocking. Starting the operation in the background is non-blocking, but if you need the results, blocking is the easiest way to get this result.
and will simply halt the entire thread till the method is done processing ?
Correct, it will do that.
Perhaps a callback method that rings the church bell of completion when processing is complete ?
You can use a CompletableFuture, or you can just add to the task anything you want to do at the end. You only need to block on things which have to be done in the current thread.
You need to return a Future, and do something else while you wait, otherwise there is no point using a non-blocking operation, you may as well execute it in the current thread as it's simpler and more efficient.
You have the synchronous version already, the asynchronous version would look like
public Future<List<T>> databaseQuery(String Query, String[] args) {
return executor.submit(() -> {
String preparedQuery = QueryBaker(Query, args);
List<T> listOfNumbers = DB_Exec(preparedQuery); // time taking task
return listOfNumbers;
});
}
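To show the "do something else while you wait" part, here is a self-contained sketch. The arithmetic is just a placeholder for real work: one half runs on the executor while the calling thread does the other half, and get() is only called once there is nothing else left to do.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OverlapWithFuture {
    static int sumOfSquaresAndCubes(int n) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Background work: sum of squares 1..n.
            Future<Integer> squares = executor.submit(() -> {
                int s = 0;
                for (int i = 1; i <= n; i++) {
                    s += i * i;
                }
                return s;
            });

            // Meanwhile, the current thread does its own work: sum of cubes.
            int cubes = 0;
            for (int i = 1; i <= n; i++) {
                cubes += i * i * i;
            }

            // Block only now, when the result is actually needed.
            return squares.get() + cubes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquaresAndCubes(3));
    }
}
```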
I'm not a guru on multithreading but I'm gonna try to answer these questions for my sake as well
why would I choose a Asynchronous method over a multi-threaded method ? (My problem: I believe I read too much and now I am myself confused)
Multi-threading is working with multiple threads; there isn't much else to it. One interesting point is that on a single CPU core multiple threads cannot run truly in parallel, so the scheduler gives each thread small slices of time to create the illusion of parallelism. (On multi-core machines, threads can run truly in parallel.)
1. One example where multithreading would be useful is in real-time multiplayer games, where each thread corresponds to each user. User A would use thread A and User B would use thread B. Each thread could track each user's activity and data could be shared between each thread.
2. Another example would be waiting for a long HTTP call. Say you're designing a mobile app and the user clicks on download for a file of 5 gigabytes. If you don't use multithreading, the user would be stuck on that page without being able to perform any action until the HTTP call completes.
It's important to note that as a developer multithreading is only a way of designing code. It adds complexity and doesn't always have to be done.
Now for Async vs Sync, Blocking vs Non-blocking
These are some definitions I found from http://doc.akka.io/docs/akka/2.4.2/general/terminology.html
Asynchronous vs. Synchronous
A method call is considered synchronous if the caller cannot make progress until the method returns a value or throws an exception. On the other hand, an asynchronous call allows the caller to progress after a finite number of steps, and the completion of the method may be signalled via some additional mechanism (it might be a registered callback, a Future, or a message).
A synchronous API may use blocking to implement synchrony, but this is not a necessity. A very CPU intensive task might give a similar behavior as blocking. In general, it is preferred to use asynchronous APIs, as they guarantee that the system is able to progress. Actors are asynchronous by nature: an actor can progress after a message send without waiting for the actual delivery to happen.
Non-blocking vs. Blocking
We talk about blocking if the delay of one thread can indefinitely delay some of the other threads. A good example is a resource which can be used exclusively by one thread using mutual exclusion. If a thread holds on to the resource indefinitely (for example accidentally running an infinite loop) other threads waiting on the resource can not progress. In contrast, non-blocking means that no thread is able to indefinitely delay others.
Non-blocking operations are preferred to blocking ones, as the overall progress of the system is not trivially guaranteed when it contains blocking operations.
I find that async vs sync refers more to the intent of the call whereas blocking vs non-blocking refers to the result of the call. However, it wouldn't be wrong to say usually asynchronous goes with non-blocking and synchronous goes with blocking.
2> Java futures is supposed to be non blocking ? What does Non blocking mean? Why call it non blocking when the method to extract information from a Future < some-object > i.e. get() is blocking and will simply halt the entire thread till the method is done processing ? Perhaps a callback method that rings the church bell of completion when processing is complete ?
A non-blocking call does not block the thread that calls the method.
Futures were introduced in Java to represent the result of a call that may not have completed yet. Going back to the HTTP file example, say you call a method like the following:
Future<BigData> future = server.getBigFile(); // getBigFile would be an asynchronous method
System.out.println("This line prints immediately");
The method getBigFile would return immediately and proceed to the next line of code. You would later be able to retrieve the contents of the future (or be notified that the contents are ready). Libraries/Frameworks like Netty, AKKA, Play use Futures extensively.
How do I make a method Async? What is the method signature?
I would say it depends on what you want to do.
If you want to quickly build something, you would use high level functions like Futures, Actor models, etc. something which enables you to efficiently program in a multithreaded environment without making too many mistakes.
On the other hand if you just want to learn, I would say it's better to start with low level multithreading programming with mutexes, semaphores, etc.
Examples of code like this are easy to find on Google if you search for "java asynchronous example" together with any of the keywords I have mentioned.
Let me know if you have any other questions!
I know that shutdown() and awaitTermination() exist. The problem is that the runnables in the pool need to be able to add an unknown number (can't use a countdownlatch) of other runnables to it and if I call shutdown() those tasks will be rejected. How can I know when they're done?
Work with Future rather than with Runnable. There's this Future#isDone method that may help you.
In case you don't have anything meaningful to return from the Callable, use Callable<Void> and Future<Void>.
Instead of submitting Runnable tasks to an Executor, you should rather use ForkJoinTask/ForkJoinPool instead. A ForkJoinTask runs inside a ForkJoinPool and can spawn an arbitrary number of (sub)tasks and wait for them to complete, without actually blocking the current thread. A ForkJoinTask is complete when all of its sub-tasks are done, so the entire computation is done, when the initial (root) ForkJoinTask is complete.
See Oracle - The Java™ Tutorials - Fork/Join for details.
As all of your tasks are resultless (Runnable), you should subclass RecursiveAction (which is itself a subclass of ForkJoinTask). Implement the method compute(), and spawn an arbitrary number of new tasks there by either calling invoke(subtask), invokeAll(subtask1, subtask2, ...) or subtask.fork() followed by subtask.join().
The entire computation is executed as follows:
MyRecursiveAction task = new MyRecursiveAction(params);
ForkJoinPool pool = new ForkJoinPool(numberOfThreads);
pool.invoke(task); // will block until task is done
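A minimal sketch of such a RecursiveAction follows. The Visit class and the rule "a task at depth d spawns d children" are invented to stand in for an unknown number of follow-up tasks; an AtomicInteger counts how many tasks ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

class SpawningTasksSketch {
    static final class Visit extends RecursiveAction {
        private final int depth;
        private final AtomicInteger visited;

        Visit(int depth, AtomicInteger visited) {
            this.depth = depth;
            this.visited = visited;
        }

        @Override
        protected void compute() {
            visited.incrementAndGet();
            if (depth == 0) {
                return;
            }
            // Spawn a data-dependent number of subtasks (here: depth of them).
            List<Visit> children = new ArrayList<>();
            for (int i = 0; i < depth; i++) {
                children.add(new Visit(depth - 1, visited));
            }
            invokeAll(children); // returns once every child has completed
        }
    }

    static int countTasks(int depth) {
        AtomicInteger visited = new AtomicInteger();
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            pool.invoke(new Visit(depth, visited)); // blocks until the whole tree is done
        } finally {
            pool.shutdown();
        }
        return visited.get();
    }

    public static void main(String[] args) {
        System.out.println(countTasks(3));
    }
}
```

When pool.invoke returns, the root task and everything it spawned have finished, so no shutdown()/awaitTermination() trick is needed to detect completion.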
Unfortunately, the advantages of Fork/Join come with some limitations, e.g.:
(...) Computations should ideally avoid synchronized methods or blocks, and
should minimize other blocking synchronization apart from joining
other tasks or using synchronizers such as Phasers that are advertised
to cooperate with fork/join scheduling. Subdividable tasks should also
not perform blocking I/O, and should ideally access variables that are
completely independent of those accessed by other running tasks. These
guidelines are loosely enforced by not permitting checked exceptions
such as IOExceptions to be thrown. (...)
For more detail see API docs of ForkJoinTask.
If you are able to use Guava Futures, you can use Futures.allAsList or Futures.successfulAsList. This allows you to wrap a number of Future instances that you got back from the ExecutorService into a single Future which you can then check to see if it is finished using isDone() (or just get(), for that matter, if you want to block until completion).
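If Guava is not an option, a similar "wait for all of them" effect can be sketched with the JDK's own CompletableFuture.allOf. Note that this swaps in a different API than the Guava one named above, and it only covers a set of futures known up front. The squaring tasks are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AllOfSketch {
    static int runAll(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int value = i;
                futures.add(CompletableFuture.supplyAsync(() -> value * value, pool));
            }
            // A single future that completes when every input future has completed.
            CompletableFuture<Void> all =
                    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
            all.join(); // or poll all.isDone() instead of blocking

            int total = 0;
            for (CompletableFuture<Integer> f : futures) {
                total += f.join(); // all are complete at this point
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(4));
    }
}
```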
I have a long-running calculation that I have split up with Java's ForkJoinTask.
Java's FutureTask provides a template method done(). Overriding this method allows for "registering a completion handler".
Is it possible to register a completion handler for a ForkJoinTask?
I am asking because I don't want to have blocking threads in my application - but my application will have a blocking thread as soon as I retrieve the calculation result via calls to result = ForkJoinPool.invoke(myForkJoinTask) or result = ForkJoinPool.submit(myForkJoinTask).get().
I think you mean "lock free" programming http://en.wikipedia.org/wiki/Non-blocking_algorithm? While FutureTask.get() possibly blocks the current thread (and thus leaves an idling CPU) ForkJoinTask.get() (or join) tries to keep the CPU busy.
This works well if you are able to split your problem into many small pieces (ForkJoinTasks). If one ForkJoinTask is internally waiting for the result of another task that is not ready, it tries to pick up some other work (tasks) from its ForkJoinPool and executes that in the meantime.
As long as all your tasks are CPU bound, it works fine: all your CPUs are kept busy.
It does NOT work if any of your tasks waits for some external event (e.g. sending a REST call to the Mars rover). Also, the problem should form a DAG, or else you may get a deadlock. But as long as you only join tasks you forked earlier in the same task, it works well. Even better if you join the task you forked last.
So it is not too bad to call get() or join() within or between your tasks.
You mentioned a completion handler to solve the problem. If you are implementing the ForkJoinTask yourself you may have a look at RecursiveTask or even RecursiveAction. You will implement compute() and you may easily forward the result of each task to some collector at the end of your compute() function instead of returning it.
But you have to consider that your collector will be called concurrently! For adding values or counting completions, have a look at java.util.concurrent.atomic. Avoid using synchronized blocks; otherwise all your tasks have to wait on this single bottleneck and only one CPU keeps working.
I think propagating the results involves more problems than returning them (since FJPool handles this). In addition it becomes difficult to decide (and to communicate to the outside) at which point your final result is done.
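Another option, not mentioned above and offered only as a sketch, is to start the root task through CompletableFuture.supplyAsync with the ForkJoinPool as executor. The tasks still return their results the usual way, and a completion handler can be attached with thenAccept. SumTask and its threshold are invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicLong;

class ForkJoinCompletionHandler {
    // Sums the integers in [from, to) by splitting the range in half.
    static final class SumTask extends RecursiveTask<Long> {
        private final int from;
        private final int to;

        SumTask(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= 10_000) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += i;
                }
                return sum;
            }
            int mid = from + (to - from) / 2;
            SumTask left = new SumTask(from, mid);
            left.fork();
            long right = new SumTask(mid, to).compute();
            return right + left.join();
        }
    }

    // The supplier runs on a pool worker, so invoke() forks into that pool.
    static CompletableFuture<Long> sumAsync(ForkJoinPool pool, int n) {
        return CompletableFuture.supplyAsync(() -> new SumTask(1, n + 1).invoke(), pool);
    }

    static long demo(int n) {
        ForkJoinPool pool = new ForkJoinPool(4);
        AtomicLong handled = new AtomicLong(-1);
        try {
            // The completion handler: runs when the root task has finished.
            CompletableFuture<Void> done = sumAsync(pool, n).thenAccept(handled::set);
            done.join(); // only so this demo can return the handled value
        } finally {
            pool.shutdown();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(100_000));
    }
}
```

In an application the join() would be dropped; the caller just registers the handler and moves on.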