Thread vs CompletableFuture - java

What is the advantage of passing code directly to thread vs using CompletableFuture instead?
Thread thread = new Thread(() -> { /* do something */ });
thread.start();
VS
CompletableFuture<Void> cf1 =
    CompletableFuture.runAsync(() -> { /* do something */ });

CompletableFuture.runAsync(...) runs the Runnable in the ForkJoinPool, which is managed, while new Thread() creates a new thread which you have to manage.
What does "is managed" mean? The pool is pre-allocated and its threads are shared within the JVM. When a Runnable completes, the thread can be reused for other runnables. This makes better use of resources, especially since thread instantiation is an expensive operation: not only the object, but also some extra non-heap memory, the thread stack, has to be allocated.

@Gerald Mücke already mentioned the important difference:
CompletableFuture.runAsync(...) runs the Runnable in the ForkJoinPool, which is managed, while new Thread() creates a new thread which you have to manage.
CompletableFuture will use threads managed by a ThreadPool (default or customized).
However, I think the following two points should also be considered.
First
CompletableFuture has so many easily understandable methods to chain different asynchronous computations together, making it much easier to introduce asynchrony than directly using Thread.
CompletableFuture[] futures = IntStream.rangeClosed(0, LEN).boxed()
.map(i -> CompletableFuture.supplyAsync(() -> runStage1(i), EXECUTOR_SERVICE))
.map(future -> future.thenCompose(i -> CompletableFuture.supplyAsync(() -> runStage2(i), EXECUTOR_SERVICE)))
.toArray(size -> new CompletableFuture[size]);
CompletableFuture.allOf(futures).join();
Second
You should never forget to handle exceptions; with CompletableFuture, you can directly handle them like this:
completableFuture.handle((s, e) -> e != null ? e.toString() : s).join()
or take advantage of them this way to interrupt the computation:
completableFuture.completeExceptionally(new RuntimeException("Calculation failed!"));
while with a plain Thread you can easily run into serious problems: an exception thrown in the Runnable terminates that thread and, unless you have installed an UncaughtExceptionHandler, is only reported by the default handler.
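A minimal sketch of that contrast (illustrative, not from the original answer): the CompletableFuture captures the failure so you can recover from it, while the raw Thread needs an explicitly installed handler, otherwise the exception is only reported by the default uncaught-exception handler:
CompletableFuture<String> cf = CompletableFuture.supplyAsync(() -> {
    if (true) throw new IllegalStateException("Calculation failed!");
    return "ok";
});
// The exception is captured inside the future; handle() can turn it into a fallback value.
String outcome = cf.handle((value, ex) -> ex == null ? value : "recovered: " + ex.getMessage()).join();

Thread t = new Thread(() -> { throw new IllegalStateException("Calculation failed!"); });
// Without this handler the exception is merely printed by the default uncaught-exception handler and then lost.
t.setUncaughtExceptionHandler((thread, ex) -> System.err.println("in " + thread.getName() + ": " + ex));
t.start();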

CompletableFuture is a promise which uses the default ForkJoinPool (a thread pool sized according to the number of available CPU cores) unless you provide another thread pool. A thread pool holds a number of threads that are reused for the submitted tasks.
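If the common pool does not fit the workload (for example blocking I/O), a sketch of passing your own pool; the pool size of 16 is purely illustrative:
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService ioPool = Executors.newFixedThreadPool(16); // size chosen for illustration only

CompletableFuture<Void> cf = CompletableFuture.runAsync(
        () -> { /* do something blocking */ },
        ioPool); // this overload runs the Runnable on ioPool instead of the common ForkJoinPool

cf.join();
ioPool.shutdown();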

Related

Thread join vs ExecutorService.awaitTermination

I have a group of threads which all need to be executed in parallel and I must wait on all of them to complete.
Should I use plain old Thread or ExecutorService? For ExecutorService.awaitTermination I must give a certain time that I'm willing to wait, but for Thread.join I don't have to.
I don't do anything with the results that the threads give, so I don't need any futures.
EDIT:
ExecutorService es = Executors.newFixedThreadPool(kThreads);
List<Callable<Void>> calls = new LinkedList<>();
container.forEach(d ->
    calls.add(() -> { // creating a task for each element
        BufferedImage scaledBufferedImage = imageService.scale(...);
        imageService.transferToAWS(...);
        return null;
    })
);
es.invokeAll(calls); //executes each task
es.shutdown(); //ensure that no new tasks will be accepted
es.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS); //wait for all tasks to finish
return kLinksToTheScaledImages;
As you say, you don't need the Futures as such, but you can use them to await termination.
Since you're using Java 8, you can do this via
ExecutorService es = Executors.newFixedThreadPool(kThreads);
container.stream()
.map(d -> createRunnable(d)) // now you have Runnables
.map(es::submit) // now they're Futures
.forEach(Future::get); // get() will wait for futures to finish (note: get() throws checked exceptions that need handling, see the corrected version below)
EDIT:
I just realized that the stream will probably prevent the parallelism to begin with, so you need to collect them intermediately:
List<Future<?>> futures = container.stream()
.map(d -> createRunnable(d)) // now you have Runnables
.map(es::submit) // now they're Futures
.collect(Collectors.toList());
// Future::get throws checked exceptions, so wrap the call when using it in a lambda:
futures.forEach(f -> {
    try { f.get(); } catch (InterruptedException | ExecutionException e) { throw new RuntimeException(e); }
});
Actually, I fully understand that people are confused about having to wait without the futures returning a value. IMO, it would make more sense to have each future return the link of the upload it creates:
String scaleAndUpload() {
BufferedImage scaledBufferedImage = imageService.scale(...);
imageService.transferToAWS(...);
return linkToUploadedImage;
}
So you would get something like
List<Future<String>> futures = container.stream()
    .map(d -> (Callable<String>) () -> scaleAndUpload()) // now you have Callables
    .map(es::submit)                                      // now they're Futures
    .collect(Collectors.toList());
return futures.stream()
    .map(f -> {
        try { return f.get(); }                           // collect the link when the future is finished
        catch (InterruptedException | ExecutionException e) { throw new RuntimeException(e); }
    })
    .collect(Collectors.toList()); // create a list of them
Go for the ExecutorService. It gives you many other benefits on top of managing thread termination (e.g. capacity scaling and separation of the execution logic from the actual tasks).
With Executors you just do this.
ex.shutdown();
while (!ex.awaitTermination(1, TimeUnit.MINUTES)) {
}
Where ex is your ExecutorService. In the loop body you can log progress or check whether the tasks are still alive.
Should I use plain old Thread or ExecutorService?
Use ExecutorService
For ExecutorService.awaitTermination I must give a certain time that I'm willing to wait, but for Thread.join I don't have to.
Have a look at this post for how to properly shut down an ExecutorService:
How to properly shutdown java ExecutorService
A few more alternatives are covered in this post:
wait until all threads finish their work in java
You can do while (!executor.isTerminated()) {}. Then you don't have to say how long you are willing to wait, although note that this busy-waits; a non-busy-waiting shutdown sequence is sketched below.
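For reference, the shutdown sequence recommended in the ExecutorService javadoc looks roughly like this (es is your ExecutorService, and the 60-second timeouts are arbitrary):
es.shutdown();                      // stop accepting new tasks
try {
    if (!es.awaitTermination(60, TimeUnit.SECONDS)) { // wait for running tasks to finish
        es.shutdownNow();           // cancel tasks that are still running
        if (!es.awaitTermination(60, TimeUnit.SECONDS)) {
            System.err.println("Pool did not terminate");
        }
    }
} catch (InterruptedException ie) {
    es.shutdownNow();
    Thread.currentThread().interrupt(); // preserve the interrupt status
}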

If I'm dependent on a slow service, how do I avoid spending processor time waiting for it?

So I'm working with code a little like this:
public void callSlowService(List<Object> objectsToCallFor) {
    objectsToCallFor
        .parallelStream()
        .forEach(object -> slowService.call(object));
}
Where slowService.call takes ~100-500ms
The problem is, I can parallelize all I want, but at the end of the day I'm still locking a thread for 500ms just waiting around, and my CPUs have other things they could be doing for other threads.
Assuming that I can't change this other service whatsoever (I can't), is there some other design I could use on my side which frees my CPUs do other things while waiting for a response from slowService?
You can't.
However, you can mitigate the problem: use one dedicated thread that is fed through a BlockingQueue with the objects to call for, and that feeds another queue with the results as they arrive (sketched below).
You will still wait the required time for the results but at least your main processing will continue meanwhile.
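A rough sketch of that mitigation, reusing objectsToCallFor and slowService from the question and assuming slowService.call(...) returns a (non-null) result object; the queue types and names are illustrative:
BlockingQueue<Object> requests = new LinkedBlockingQueue<>(objectsToCallFor); // pre-load the calls to make
BlockingQueue<Object> results  = new LinkedBlockingQueue<>();

// One dedicated worker owns all the slow calls; the rest of the application never blocks on slowService.
Thread worker = new Thread(() -> {
    try {
        while (!Thread.currentThread().isInterrupted()) {
            Object request = requests.take();        // blocks cheaply until work is available
            results.put(slowService.call(request));  // hand each result back as soon as it arrives
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();          // lets the worker be shut down via interrupt()
    }
});
worker.start();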
If I understand you correctly, you want to make this method asynchronous. If you don't need the results and you want to skip error handling, you can launch the code in a new thread.
Like:
public void callSlowService(final List<Object> objectsToCallFor) {
new Thread(() -> {
objectsToCallFor
.parallelStream()
.forEach(object -> slowService.call(object));
}).start();
}
Another (better) way is using a thread pool. You should be careful with parallelStream(): it uses the shared ForkJoinPool, which can affect the performance of the rest of your application. It is safer to use the good old cached thread pool, or any other thread pool implementation, depending on the project's needs.
public ExecutorService executor = Executors.newCachedThreadPool();

public void callSlowService(final List<Object> objectsToCallFor) {
    for (Object obj : objectsToCallFor) {
        executor.submit(() -> slowService.call(obj)); // one task per object, so the calls run in parallel on the pool
    }
}
In fact, there are a few other, more elegant ways to achieve the same thing (one is sketched below), but this works as well.
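One of those alternatives, sketched with CompletableFuture on the same dedicated executor; the changed return type is an assumption, made so the caller can decide whether to wait:
public ExecutorService executor = Executors.newCachedThreadPool();

public CompletableFuture<Void> callSlowService(final List<Object> objectsToCallFor) {
    CompletableFuture<?>[] calls = objectsToCallFor.stream()
            .map(object -> CompletableFuture.runAsync(() -> slowService.call(object), executor))
            .toArray(CompletableFuture<?>[]::new);
    // allOf gives the caller a handle it can ignore, join on, or chain further work to.
    return CompletableFuture.allOf(calls);
}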

Java parallelstream not using optimal number of threads when using newCachedThreadPool()

I have made two separate implementations of parallel reads from database.
The first implementation uses an ExecutorService created with newCachedThreadPool() and Futures: I simply make a call that returns a future for each read case, and after I have made all the calls I call get() on them. This implementation works OK and is fast enough.
The second implementation uses parallel streams. When I submit the parallel stream call to the same ExecutorService pool, it works almost 5 times slower and it seems that it is not using as many threads as I would hope. When I instead submit it to ForkJoinPool pool = new ForkJoinPool(50), it works as fast as the previous implementation.
My question is:
Why do parallel streams under-utilize threads in newCachedThreadPool version?
Here is the code for the second implementation (I am not posting the first implementation, because that one works OK anyway):
private static final ExecutorService pool = Executors.newCachedThreadPool();
final List<AbstractMap.SimpleImmutableEntry<String, String>> simpleImmutableEntryStream =
personIdList.stream().flatMap(
personId -> movieIdList.stream().map(
movieId -> new AbstractMap.SimpleImmutableEntry<>(personId, movieId))).collect(Collectors.toList());
final Future<Map<String, List<Summary>>> futureMovieSummaryForPerson = pool.submit(() -> {
final Stream<Summary> summaryStream = simpleImmutableEntryStream.parallelStream().map(
inputPair -> {
return FeedbackDao.find(inputPair.getKey(), inputPair.getValue());
}).filter(Objects::nonNull);
return summaryStream.collect(Collectors.groupingBy(Summary::getPersonId));
});
This is related to how ForkJoinTask.fork is implemented: if the current thread is a worker of a ForkJoinPool, the new task is submitted to that same pool; otherwise it is pushed to the common pool, which is sized according to the number of processors on your machine. Here, the thread created by Executors.newCachedThreadPool() is not recognized as coming from a ForkJoinPool, so the common pool is used.
Here is how it is implemented, it should help you to better understand:
public final ForkJoinTask<V> fork() {
Thread t;
if ((t = Thread.currentThread()) instanceof ForkJoinWorkerThread)
((ForkJoinWorkerThread)t).workQueue.push(this);
else
ForkJoinPool.common.externalPush(this);
return this;
}
The thread created by Executors.newCachedThreadPool() is not of type ForkJoinWorkerThread, so fork() submits the new tasks to the common pool, whose size is not optimized for this workload.
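That is also why the workaround from the question works: when the parallel stream is started from inside a custom ForkJoinPool, fork() pushes the subtasks to that pool instead of the common pool. A sketch, reusing the variables from the question (the pool size of 50 mirrors the question):
ForkJoinPool customPool = new ForkJoinPool(50);

final Future<Map<String, List<Summary>>> futureMovieSummaryForPerson = customPool.submit(() -> {
    // This lambda runs on a ForkJoinWorkerThread of customPool, so the parallel
    // stream's fork() calls stay in customPool instead of the common pool.
    final Stream<Summary> summaryStream = simpleImmutableEntryStream.parallelStream()
            .map(inputPair -> FeedbackDao.find(inputPair.getKey(), inputPair.getValue()))
            .filter(Objects::nonNull);
    return summaryStream.collect(Collectors.groupingBy(Summary::getPersonId));
});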

CompletableFuture single task that continues with many parallel tasks

I have the following code:
return CompletableFuture.supplyAsync(() -> {
return foo; // some custom object
})
.thenAccept(foo -> {
// ??? need to spawn N async parallel jobs that works on 'foo'
});
In english: the first task creates the foo object asynchronously; and then I need to run N parallel processes on it.
Is there a better way to do this then:
...
CompletableFuture[] parallel = new CompletableFuture[N];
for (int i = 0; i < N; i++) {
parallel[i] = CompletableFuture.runAsync(() -> {
work(foo);
});
}
CompletableFuture.allOf(parallel).join();
...
I don't like this as one thread gets locked while waiting for N jobs to finish.
You can chain as many independent jobs as you like on a particular prerequisite job, e.g.
CompletableFuture<Foo> base=CompletableFuture.supplyAsync(() -> new Foo());
Collections.nCopies(N, base).forEach(f -> f.thenAcceptAsync(foo -> work(foo)));
will spawn N parallel jobs, invoking work(foo) concurrently, after the completion of the initial job which provides the Foo instance.
But keep in mind that the underlying framework will consider the number of available CPU cores to size the thread pool actually executing the parallel jobs, so if N > #cores, some of these jobs may run one after another.
If the work is I/O bound and you therefore want a higher number of parallel threads, you have to specify your own executor (see the sketch further below).
The nCopies/forEach is not necessary, a for loop would do as well, but it provides a hint of how to handle subsequent jobs that depend on the completion of all these parallel jobs:
CompletableFuture<Foo> base=CompletableFuture.supplyAsync(() -> new Foo());
CompletableFuture<Void> all = CompletableFuture.allOf(
Collections.nCopies(N, base).stream()
.map(f -> f.thenAcceptAsync(foo -> work(foo)))
.toArray(CompletableFuture<?>[]::new));
Now you can use all to check for the completion of all jobs or chain additional actions.
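For the I/O-bound case mentioned above, the only change is to pass your own executor to the ...Async methods; a sketch, where the fixed pool size of N is just an illustrative choice:
ExecutorService ioExecutor = Executors.newFixedThreadPool(N); // sized for I/O-bound work rather than #cores

CompletableFuture<Foo> base = CompletableFuture.supplyAsync(() -> new Foo(), ioExecutor);
CompletableFuture<Void> all = CompletableFuture.allOf(
    Collections.nCopies(N, base).stream()
        .map(f -> f.thenAcceptAsync(foo -> work(foo), ioExecutor)) // each dependent job also runs on ioExecutor
        .toArray(CompletableFuture<?>[]::new));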
Since CompletableFuture.allOf already returns another CompletableFuture<Void>, you can just chain another .thenAccept on it and extract the returned values from the already-completed CFs in the callback; that way you avoid blocking the current thread with join.
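A sketch of that suggestion, assuming the per-item job is changed to return a value (the Result type is hypothetical) instead of the void work(foo) used above; the join() calls inside the callback do not block because allOf guarantees everything has already completed:
CompletableFuture<Foo> base = CompletableFuture.supplyAsync(() -> new Foo());

List<CompletableFuture<Result>> stages = IntStream.range(0, N)
    .mapToObj(i -> base.thenApplyAsync(foo -> work(foo))) // assumes work(foo) now returns a Result
    .collect(Collectors.toList());

CompletableFuture<List<Result>> gathered = CompletableFuture
    .allOf(stages.toArray(new CompletableFuture<?>[0]))
    .thenApply(v -> stages.stream()
        .map(CompletableFuture::join) // already completed here, so join() returns immediately
        .collect(Collectors.toList()));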

How does CompletableFuture know that tasks are independent?

Imagine that we have the following dummy code:
CompletableFuture<BigInteger> cf1 = CompletableFuture.supplyAsync(() -> BigInteger.valueOf(2L));
CompletableFuture<BigInteger> cf2 = CompletableFuture.supplyAsync(() -> BigInteger.valueOf(3L));
cf1.thenCombine(cf2, (x, y) -> x.add(y)).thenAccept(System.out::println);
Does the JVM know that cf1 and cf2 run independent tasks in this case? And what changes if the tasks are dependent (for example, if they use one connection to the database)?
More general, how does CompletableFuture synchronize threads?
A CompletableFuture has no relation to any thread. It is just a holder for a result retrieved asynchronously with methods to operate on that result.
The static supplyAsync and runAsync methods are just helper methods. The javadoc of supplyAsync states
Returns a new CompletableFuture that is asynchronously completed by a
task running in the ForkJoinPool.commonPool() with the value obtained
by calling the given Supplier.
This is more or less equivalent to
Supplier<R> sup = ...;
CompletableFuture<R> future = new CompletableFuture<R>();
ForkJoinPool.commonPool().submit(() -> {
try {
R result = sup.get();
future.complete(result);
} catch (Throwable e) {
future.completeExceptionally(e);
}
});
return future;
The CompletableFuture is returned immediately, which even allows you to complete it yourself before the task submitted to the pool does.
More general, how does CompletableFuture synchronize threads?
It doesn't, since it doesn't know which threads are operating on it. This is further hinted at in the javadoc
Since (unlike FutureTask) this class has no direct control over the
computation that causes it to be completed, cancellation is treated as
just another form of exceptional completion. Method cancel has the
same effect as completeExceptionally(new CancellationException()).
Method isCompletedExceptionally() can be used to determine if a
CompletableFuture completed in any exceptional fashion.
CompletableFuture objects do not control processing.
I don't think that a CompletableFuture (CF) "synchronizes threads". It uses the executor you have provided or the common pool if you have not provided one.
When you call supplyAsync, the CF submits the various tasks to that pool, which in turn manages the underlying threads to execute them.
It doesn't know, nor does it try to synchronize anything. It is still the client's responsibility to properly synchronize access to mutable shared data.
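For example, if several independently completing stages write into the same collection, it is the caller who has to pick a thread-safe collection or add synchronization; a minimal sketch:
Queue<BigInteger> results = new ConcurrentLinkedQueue<>(); // a concurrent collection, because both stages write to it

CompletableFuture<Void> a = CompletableFuture.runAsync(() -> results.add(BigInteger.valueOf(2L)));
CompletableFuture<Void> b = CompletableFuture.runAsync(() -> results.add(BigInteger.valueOf(3L)));

CompletableFuture.allOf(a, b).join(); // joining establishes happens-before, so the read below sees both writes
System.out.println(results);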
