I have two questions:
1. What is the simplest canonical form for running a Callable as a task in Java 8, capturing and processing the result?
2. In the example below, what is the best/simplest/clearest way to hold the main process open until all the tasks have completed?
Here's the example I have so far -- is this the best approach in Java 8 or is there something more basic?
import java.util.*;
import java.util.concurrent.*;
import java.util.function.*;
public class SimpleTask implements Supplier<String> {
private SplittableRandom rand = new SplittableRandom();
final int id;
SimpleTask(int id) { this.id = id; }
@Override
public String get() {
try {
TimeUnit.MILLISECONDS.sleep(rand.nextInt(50, 300));
} catch(InterruptedException e) {
System.err.println("Interrupted");
}
return "Completed " + id + " on " +
Thread.currentThread().getName();
}
public static void main(String[] args) throws Exception {
for(int i = 0; i < 10; i++)
CompletableFuture.supplyAsync(new SimpleTask(i))
.thenAccept(System.out::println);
System.in.read(); // Or else program ends too soon
}
}
Is there a simpler and clearer Java-8 way to do this? And how do I eliminate the System.in.read() in favor of a better approach?
The canonical way to wait for the completion of multiple CompletableFuture instances is to create a new one that depends on all of them via CompletableFuture.allOf. You can use this new future to wait for its completion, or to schedule new follow-up actions just like with any other CompletableFuture:
CompletableFuture.allOf(
IntStream.range(0,10).mapToObj(SimpleTask::new)
.map(s -> CompletableFuture.supplyAsync(s).thenAccept(System.out::println))
.toArray(CompletableFuture<?>[]::new)
).join();
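If the results are needed after waiting, one option is to keep the futures in a list and read them once allOf has completed. The following is a minimal self-contained sketch; the names AllOfResults and runAll are illustrative, not from the original answer, and the task body is reduced to building a string:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class AllOfResults {
    /** Starts n tasks, waits for all of them via allOf, then collects their results in order. */
    static List<String> runAll(int n) {
        List<CompletableFuture<String>> futures = IntStream.range(0, n)
            .mapToObj(id -> CompletableFuture.supplyAsync(() -> "Completed " + id))
            .collect(Collectors.toList());
        // allOf yields a CompletableFuture<Void> that completes when every input future is done
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
            .thenApply(ignored -> futures.stream()
                .map(CompletableFuture::join) // all already complete, so join does not block here
                .collect(Collectors.toList()))
            .join(); // blocks the calling thread instead of System.in.read()
    }

    public static void main(String[] args) {
        runAll(10).forEach(System.out::println);
    }
}
```

If any task fails, allOf completes exceptionally and the final join() throws a CompletionException.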
Of course, it always gets simpler if you forego assigning a unique id to each task. Since your first question was about Callable, I’ll demonstrate how you can easily submit multiple similar tasks as Callables via an ExecutorService:
ExecutorService pool = Executors.newCachedThreadPool();
pool.invokeAll(Collections.nCopies(10, () -> {
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(
ThreadLocalRandom.current().nextInt(50, 300)));
final String s = "Completed on "+Thread.currentThread().getName();
System.out.println(s);
return s;
}));
pool.shutdown();
The executor service returned by Executors.newCachedThreadPool() is unshared and won’t stay alive forever, even if you forget to invoke shutdown(), but in that case it can take up to one minute before all threads are terminated, since idle threads are only discarded after 60 seconds.
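The invokeAll call above discards the List<Future<String>> it returns. A minimal sketch of capturing and processing those results could look like this (InvokeAllResults and collect are hypothetical names; the sleep is left out for brevity):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllResults {
    /** Submits n identical Callables and returns their results in submission order. */
    static List<String> collect(int n) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            Callable<String> task = () -> "Completed on " + Thread.currentThread().getName();
            // invokeAll blocks until every task is done and returns one Future per task
            List<Future<String>> futures = pool.invokeAll(Collections.nCopies(n, task));
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // does not block; a task failure surfaces as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown(); // idle threads end now instead of after the 60 s keep-alive
        }
    }

    public static void main(String[] args) throws Exception {
        collect(10).forEach(System.out::println);
    }
}
```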
Since your first question literally was: “What is the simplest canonical form for running a Callable as a task in Java 8, capturing and processing the result?”, the answer might be that the simplest form is still invoking its call() method directly, e.g.
Callable<String> c = () -> {
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(
ThreadLocalRandom.current().nextInt(50, 300)));
return "Completed on "+Thread.currentThread().getName();
};
String result = c.call();
System.out.println(result);
There’s no simpler way…
Consider collecting the futures into a list. Then you can use join() on each future to await their completion in the current thread:
List<CompletableFuture<Void>> futures = IntStream.range(0,10)
.mapToObj(id -> supplyAsync(new SimpleTask(id)).thenAccept(System.out::println))
.collect(toList());
futures.forEach(CompletableFuture::join);
I have a Stream<Item> which I'm mapping to a CompletableFuture<ItemResult>.
What I'd like to do is to know when all the futures are completed.
One might suggest:
Collect all the futures into an array and use CompletableFuture.allOf(). This is somewhat problematic since there could be hundreds of thousands of items.
Just continue with forEach(CompletableFuture::join). This is problematic too, as calling join inside forEach blocks the stream, so the processing becomes essentially serial rather than concurrent.
Inject a poisoned item at the end of the stream. This could work, but it's not that elegant in my view.
Check whether the executor queue is empty. This is quite limiting because I might use more than one executor in the future. Also, the queue can be momentarily empty.
Monitor the database instead and check the number of new items.
I feel like all the suggested solutions aren't good enough.
What is the appropriate way to monitor the futures?
Thanks
EDIT:
Another (vague) idea I had in mind is to use a counter and wait for it to go down to zero. But again, I'd need to check that it's not just momentarily 0.
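The counter idea can avoid the "momentarily 0" problem if the producer holds one count of its own and only releases it after the stream has been fully consumed. The sketch below illustrates that assumption; PendingCounter, track, finish and sumOf are hypothetical names, and no list or array of futures is built:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.IntStream;

class PendingCounter {
    // Starts at 1: that extra count belongs to the producer, so the counter
    // cannot reach 0 while the stream is still handing out futures.
    private final AtomicLong pending = new AtomicLong(1);
    private final CompletableFuture<Void> allDone = new CompletableFuture<>();

    /** Counts the future in, and counts it out once it completes (normally or exceptionally). */
    void track(CompletableFuture<?> future) {
        pending.incrementAndGet();
        future.whenComplete((result, error) -> release());
    }

    private void release() {
        if (pending.decrementAndGet() == 0) {
            allDone.complete(null);
        }
    }

    /** Call once, after the stream is exhausted; drops the producer's own count. */
    CompletableFuture<Void> finish() {
        release();
        return allDone;
    }

    /** Demo: sums 1..n through n futures that are never collected anywhere. */
    static long sumOf(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            PendingCounter counter = new PendingCounter();
            AtomicLong sum = new AtomicLong();
            IntStream.rangeClosed(1, n)
                .mapToObj(i -> CompletableFuture.supplyAsync(() -> i, pool))
                .forEach(f -> counter.track(f.thenAccept(sum::addAndGet)));
            counter.finish().join(); // a real zero: every tracked future has completed
            return sum.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOf(100_000)); // prints 5000050000
    }
}
```

Apart from work still queued in the executor, only the counter and one extra future are retained.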
Disclaimer: I'm not sure whether Phaser is the right tool here, and if yes, whether it's better to have one root with multiple children or to chain them like I'm proposing below, so feel free to correct me.
Here's one approach that uses Phaser.
A Phaser supports a limited number of parties (65535 in this implementation), so we need to create a new child Phaser when that limit is about to be reached:
private Phaser register(Phaser phaser) {
if (phaser.getRegisteredParties() < 65534) {
// warning: side-effect,
// conflicts with AtomicReference#updateAndGet recommendation,
// might not fit well if the Stream is parallel:
phaser.register();
return phaser;
} else {
return new Phaser(phaser, 1);
}
}
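The party limit is easy to observe directly. This small sketch (PhaserLimit is a hypothetical name) registers parties on a plain Phaser until it refuses. The Phaser Javadoc documents a maximum of 65535 parties for this implementation, which appears to be why register switches to a child Phaser at 65534: that leaves room for the child to register itself as one more party of its parent.

```java
import java.util.concurrent.Phaser;

class PhaserLimit {
    /** Registers parties one by one until the Phaser refuses; returns how many succeeded. */
    static int maxParties() {
        Phaser phaser = new Phaser();
        int registered = 0;
        try {
            while (true) {
                phaser.register();
                registered++;
            }
        } catch (IllegalStateException e) {
            return registered; // thrown once the party limit is exceeded
        }
    }

    public static void main(String[] args) {
        System.out.println("max parties: " + maxParties()); // prints "max parties: 65535"
    }
}
```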
Register each CompletableFuture against that Phaser chain, and deregister once done:
private void register(CompletableFuture<?> future, AtomicReference<Phaser> phaser) {
Phaser registeredPhaser = phaser.updateAndGet(this::register);
future
.thenRun(registeredPhaser::arriveAndDeregister)
.exceptionally(e -> {
// log e?
registeredPhaser.arriveAndDeregister();
return null;
});
}
Wait for all futures to be finished:
private <T> void await(Stream<CompletableFuture<T>> futures) {
Phaser rootPhaser = new Phaser(1);
AtomicReference<Phaser> phaser = new AtomicReference<>(rootPhaser);
futures.forEach(future -> register(future, phaser));
rootPhaser.arriveAndAwaitAdvance();
rootPhaser.arriveAndDeregister();
}
Example:
ExecutorService executor = Executors.newFixedThreadPool(500);
// creating fake stream with 500,000 futures:
Stream<CompletableFuture<Integer>> stream = IntStream
.rangeClosed(1, 500_000)
.mapToObj(i -> CompletableFuture.supplyAsync(() -> {
try {
TimeUnit.MILLISECONDS.sleep(10);
if (i % 50_000 == 0) {
System.out.println(Thread.currentThread().getName() + ": " + i);
}
return i;
} catch (InterruptedException e) {
throw new IllegalStateException(e);
}
}, executor));
// usage:
await(stream);
System.out.println("Done");
Outputs:
pool-1-thread-348: 50000
pool-1-thread-395: 100000
pool-1-thread-333: 150000
pool-1-thread-30: 200000
pool-1-thread-120: 250000
pool-1-thread-10: 300000
pool-1-thread-241: 350000
pool-1-thread-340: 400000
pool-1-thread-283: 450000
pool-1-thread-176: 500000
Done
Hi, I am somewhat new to Java.
I have a method that takes in a map, and for each key-value pair in the map it writes to a file.
I want to have a thread per key-value pair in the map running, so that I can create multiple files at the same time. I'm not sure what the proper way of doing this is, or how to use an executor service to get it done.
Here is a very simple example of what I'm trying to do. Instead of writing all the code for writing the file, I'm just using System.out.println in the example:
public class CityWriter
{
public static void main(String []args)
{
LinkedHashMap<Integer, ArrayList<City>> stateNumCitiesMap = new LinkedHashMap<Integer, ArrayList<City>>();
stateNumCitiesMap = retrieveStateCitiesMap();
int numOfThreadsToExecuteAtATime = 10;
ExecutorService executor = Executors.newFixedThreadPool(numOfThreadsToExecuteAtATime);
for(Integer key : stateNumCitiesMap.keySet()) //Could have up to 50 key,values in map
{
executor.execute(writeCitiesOfStateToFile(key, StateNumCitiesMap.get(key)));
}
executor.shutdown();
}
public LinkedHashMap<Integer, ArrayList<Cities>> writeCitiesOfStateToFile(int stateNum, List<City> citiesList)
{
for(City city : citiesList)
{
System.out.println(stateNum +" "+ city);
}
}
}//end of class
My problem is that it doesn't seem to be executing threads in parallel here. Also, I don't want to run more than 10 threads at a time, even though the for loop will call the executor 50 times.
Please let me know what the most efficient way to do this would be.
Actually, if I understood your question correctly, your code does exactly what you want (once all the syntax errors in your code snippet are fixed):
It does not spawn more than 10 threads, because Executors.newFixedThreadPool(10) specifies how many threads you want.
All your map entries are handed to the executor as jobs. The executor then runs them in parallel on its 10 threads (but never more than 10 jobs at once).
You can try this snippet out and check that several threads are doing the job in parallel:
public static void main(String[] args) {
Map<Integer, List<String>> stateNumCitiesMap = new LinkedHashMap<>();
for (int i = 0; i < 100; i++) {
stateNumCitiesMap.put(i, Collections.singletonList("ABC"));
}
ExecutorService executor = Executors.newFixedThreadPool(10);
for (Integer key : stateNumCitiesMap.keySet()) {
executor.execute(() -> writeCitiesOfStateToFile(key, stateNumCitiesMap.get(key)));
}
executor.shutdown();
}
public static void writeCitiesOfStateToFile(int stateNum, List<String> citiesList) {
for (String city : citiesList) {
System.out.println(stateNum + " " + Thread.currentThread().getName());
}
}
If you don't want to hand jobs to the executor one by one, you can pass a batch of them at once.
public static void main(String[] args) throws InterruptedException {
Map<Integer, List<String>> stateNumCitiesMap = new LinkedHashMap<>();
for (int i = 0; i < 100; i++) {
stateNumCitiesMap.put(i, Collections.singletonList("ABC"));
}
ExecutorService executor = Executors.newFixedThreadPool(10);
List<Callable<Void>> jobs = new ArrayList<>();
for (Integer key : stateNumCitiesMap.keySet()) {
jobs.add(() -> {
writeCitiesOfStateToFile(key, stateNumCitiesMap.get(key));
return null;
});
}
executor.invokeAll(jobs);
executor.shutdown();
}
public static void writeCitiesOfStateToFile(int stateNum, List<String> citiesList) {
for (String city : citiesList) {
System.out.println(stateNum + " " + Thread.currentThread().getName());
}
}
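To check the claim that a fixed pool of 10 never runs more than 10 jobs at once, one can count the running jobs with an AtomicInteger. This is a sketch for experimentation only: MaxConcurrency and peak are hypothetical names, and Thread.sleep(20) stands in for the file writing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class MaxConcurrency {
    /** Runs `jobs` short tasks on a fixed pool; returns the most jobs observed running at once. */
    static int peak(int poolSize, int jobs) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger highest = new AtomicInteger();
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < jobs; i++) {
            tasks.add(() -> {
                int now = running.incrementAndGet();
                highest.accumulateAndGet(now, Math::max); // remember the largest value seen
                Thread.sleep(20);                         // stand-in for writing a file
                running.decrementAndGet();
                return null;
            });
        }
        executor.invokeAll(tasks); // returns once all jobs have finished
        executor.shutdown();
        return highest.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrency: " + peak(10, 50));
    }
}
```

The reported peak cannot exceed the pool size; how close it gets to 10 depends on the machine and on scheduling.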
You can use the invokeAll method to run multiple tasks and even get their results. It will use only 10 threads for them even if there are 50 tasks, and the results are returned once all tasks have completed. Something like the following; treat it as a sketch (City and citiesList come from the question):
ExecutorService executor = Executors.newFixedThreadPool(10);
List<Callable<Integer>> tasksList = new ArrayList<>();
for (City city : citiesList) {
String fileName = city.toString();
tasksList.add(() -> {
// implement write to the file named fileName
return 0;
});
}
// blocks until every task has finished; at most 10 run at the same time
List<Future<Integer>> results = executor.invokeAll(tasksList);
executor.shutdown();
In Java you need to provide a Runnable for any code you want to run in a thread. You are not doing this: you call your method yourself and pass its return value to execute, but a Runnable is what the executor expects.
executor.execute(() -> your function )
is actually
executor.execute(new Runnable() {
@Override
public void run() {
// your code
}
});
Your method is not a Runnable; only the code inside a Runnable's run method is executed on the executor's thread.
The reason is that you hand the Runnable to the executor, and the executor later calls its run method on one of its threads.
From the Java docs:
The Runnable interface should be implemented by any class whose instances are intended to be executed by a thread. The class must define a method of no arguments called run.
This interface is designed to provide a common protocol for objects that wish to execute code while they are active. For example, Runnable is implemented by class Thread. Being active simply means that a thread has been started and has not yet been stopped.
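You can also make the method return a Runnable itself:
public static Runnable writeCitiesOfStateToFile(int stateNum, List<City> citiesList) {
// nothing is written yet; the executor calls the returned Runnable's run() later
return () -> citiesList.forEach(city -> System.out.println(stateNum + " " + city));
}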
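A self-contained version of this approach, under a few assumptions: cities are plain Strings instead of the question's City class, and a thread-safe queue stands in for the files. RunnableFactoryDemo and runAll are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RunnableFactoryDemo {
    // Builds the job but does not run it; the executor calls run() later on a pool thread.
    static Runnable writeCitiesOfStateToFile(int stateNum, List<String> cities, Queue<String> out) {
        return () -> cities.forEach(city -> out.add(stateNum + " " + city));
    }

    static Queue<String> runAll(Map<Integer, List<String>> stateNumCitiesMap) throws InterruptedException {
        Queue<String> out = new ConcurrentLinkedQueue<>(); // thread-safe stand-in for the files
        ExecutorService executor = Executors.newFixedThreadPool(10);
        stateNumCitiesMap.forEach((key, cities) ->
            executor.execute(writeCitiesOfStateToFile(key, cities, out)));
        executor.shutdown();                            // accept no new jobs, finish queued ones
        executor.awaitTermination(1, TimeUnit.MINUTES); // wait until they have actually finished
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, List<String>> map = new LinkedHashMap<>();
        map.put(1, Arrays.asList("A", "B"));
        map.put(2, Arrays.asList("C"));
        runAll(map).forEach(System.out::println);
    }
}
```

Calling awaitTermination after shutdown is what tells the caller that the queued Runnables are done; shutdown alone returns immediately.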
Executor#execute may be synchronous
You said:
it doesn't seem like it is executing threads in parallel here
You did not explain the reason for that perception.
But, FYI, that may indeed be the case. You called the execute method on your ExecutorService.
for(Integer key : stateNumCitiesMap.keySet()) //Could have up to 50 key,values in map
{
executor.execute(writeCitiesOfStateToFile(key, StateNumCitiesMap.get(key)));
}
That execute method is inherited from the Executor interface, the super-interface of ExecutorService. That interface, and its method, are documented as possibly running your task synchronously rather than asynchronously. To quote the Javadoc:
The command may execute in a new thread, in a pooled thread, or in the calling thread, at the discretion of the Executor implementation.
So you may indeed be seeing synchronous execution on the calling thread rather than asynchronous execution.
From my reading of the ExecutorService methods submit, invokeAll, and invokeAny, these seem to promise to always run asynchronously.
I do not believe such synchronous behavior is happening though, given your choice of ExecutorService implementation. Your call to Executors.newFixedThreadPool produces an object of type ThreadPoolExecutor. Looking briefly at the source code of that concrete class’ execute method, it appears to always work asynchronously (though I am not entirely sure).
Nevertheless, it would seem that we should not always assume async execution when using Executor#execute.
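The Executor Javadoc itself shows an executor that runs each task directly in the caller. The sketch below (SyncVsAsync and ranOn are hypothetical names) uses Runnable::run as such an executor and compares it with a fixed pool by recording which thread actually ran the task:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SyncVsAsync {
    /** Returns the name of the thread on which the executor actually ran the task. */
    static String ranOn(Executor executor) {
        CompletableFuture<String> name = new CompletableFuture<>();
        executor.execute(() -> name.complete(Thread.currentThread().getName()));
        return name.join(); // waits only if the task was handed to another thread
    }

    public static void main(String[] args) {
        Executor direct = Runnable::run; // a legal Executor that runs tasks in the caller
        ExecutorService pool = Executors.newFixedThreadPool(1);
        System.out.println("direct: " + ranOn(direct)); // prints "direct: main"
        System.out.println("pool:   " + ranOn(pool));   // prints a pool thread's name
        pool.shutdown();
    }
}
```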
I'm starting to be comfortable with Java CompletableFuture composition, having worked with JavaScript promises. Basically, the composition just schedules the chained commands on the indicated executor. But I'm unsure which thread is running when the composition is performed.
Let's say I have two executors, executor1 and executor2; for simplicity let's say they are separate thread pools. I schedule a CompletableFuture (to use a very loose description):
CompletableFuture<Foo> futureFoo = CompletableFuture.supplyAsync(this::getFoo, executor1);
Then when that is done I transform the Foo to Bar using the second executor:
CompletableFuture<Bar> futureBar = futureFoo.thenApplyAsync(this::fooToBar, executor2);
I understand that getFoo() will be called from a thread in the executor1 thread pool. I understand that fooToBar() will be called from a thread in the executor2 thread pool.
But what thread is used for the actual composition, i.e. after getFoo() finishes and futureFoo is complete, but before the fooToBar() command gets scheduled on executor2? In other words, what thread actually runs the code to schedule the second command on the second executor?
Is the scheduling performed as part of the same thread in executor1 that called getFoo()? If so, would this completable future composition be equivalent to my simply scheduling fooToBar() manually myself in the first command in the executor1 task?
This is intentionally unspecified. In practice, it will be handled by the same code that also handles the chained operations when the variants without the Async suffix are invoked and exhibits similar behavior.
So when we use the following test code
CompletableFuture.supplyAsync(() -> {
LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(1));
return "";
}, r -> new Thread(r, "A").start())
.thenAcceptAsync(s -> {}, r -> {
System.out.println("scheduled by " + Thread.currentThread());
new Thread(r, "B").start();
});
it will likely print
scheduled by Thread[A,5,main]
as the thread that completed the previous stage was used to schedule the depending action.
However when we use
CompletableFuture<String> first = CompletableFuture.supplyAsync(() -> "",
r -> new Thread(r, "A").start());
LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(1));
first.thenAcceptAsync(s -> {}, r -> {
System.out.println("scheduled by " + Thread.currentThread());
new Thread(r, "B").start();
});
it will likely print
scheduled by Thread[main,5,main]
as by the time the main thread invokes thenAcceptAsync, the first future is already completed and the main thread will schedule the action itself.
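The already-completed case can be reproduced without sleeping by starting from CompletableFuture.completedFuture and using an executor that records who calls execute. This is a sketch (WhoSchedules and schedulerOf are hypothetical names); on OpenJDK it reports the calling thread, but as stated above, the documentation does not promise which thread does the scheduling.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

class WhoSchedules {
    /** Returns the name of the thread that handed the dependent action to the executor. */
    static String schedulerOf(CompletableFuture<String> stage) {
        CompletableFuture<String> scheduledBy = new CompletableFuture<>();
        Executor recording = r -> {
            scheduledBy.complete(Thread.currentThread().getName()); // who called execute()
            r.run();                                                // then simply run it inline
        };
        stage.thenAcceptAsync(s -> { }, recording);
        return scheduledBy.join();
    }

    public static void main(String[] args) {
        // the stage is already complete when thenAcceptAsync is called
        System.out.println(schedulerOf(CompletableFuture.completedFuture("")));
    }
}
```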
But that is not the end of the story. When we use
CompletableFuture<String> first = CompletableFuture.supplyAsync(() -> {
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(5));
return "";
}, r -> new Thread(r, "A").start());
Set<String> s = ConcurrentHashMap.newKeySet();
Runnable submitter = () -> {
String n = Thread.currentThread().getName();
do {
for(int i = 0; i < 1000; i++)
first.thenAcceptAsync(x -> s.add(n+" "+Thread.currentThread().getName()),
Runnable::run);
} while(!first.isDone());
};
Thread b = new Thread(submitter, "B");
Thread c = new Thread(submitter, "C");
b.start();
c.start();
b.join();
c.join();
System.out.println(s);
It may print not only the combinations B A and C A from the first scenario and B B and C C from the second; on my machine it reproducibly also prints the combinations B C and C B, indicating that an action passed to thenAcceptAsync by one thread got submitted to the executor by the other thread, which was calling thenAcceptAsync with a different action at the same time.
This matches the scenarios for the thread evaluating the function passed to thenApply (without the Async suffix) described in this answer. As said at the beginning, that is what I expected, as both things are likely handled by the same code. But unlike the thread evaluating the function passed to thenApply, the thread invoking the execute method on the Executor is not even mentioned in the documentation. So in theory, another implementation could use an entirely different thread that neither calls a method on the future nor completes it.
At the end is a simple program that does what your code snippet does and lets you play with it.
The output confirms that the executor you supply is the one that runs the dependent stage once the stage it waits on has completed (unless you explicitly call complete early enough, in which case that happens in the thread calling complete). Note that get() on a Future blocks until the Future is finished.
Run it with an argument and there are two executors, executor1 and executor2; run it with no arguments and there is just one. The output is either this (same executor: the steps run sequentially as separate tasks in the same executor):
In thread Thread[main,5,main] - getFoo
In thread Thread[main,5,main] - getFooToBar
In thread Thread[pool-1-thread-1,5,main] - Supplying Foo
In thread Thread[pool-1-thread-1,5,main] - fooToBar
In thread Thread[main,5,main] - Completed
or this (two executors: the steps again run sequentially, but on different executors):
In thread Thread[main,5,main] - getFoo
In thread Thread[main,5,main] - getFooToBar
In thread Thread[pool-1-thread-1,5,main] - Supplying Foo
In thread Thread[pool-2-thread-1,5,main] - fooToBar
In thread Thread[main,5,main] - Completed
Remember: the code handed to the executors can start running immediately in another thread. In this example, getFoo() was called, and its Supplier submitted to executor1, before getFooToBar() had even been called to set up the fooToBar function.
Code follows:
package your.test;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.function.Supplier;
public class TestCompletableFuture {
private static void dumpWhichThread(final String msg) {
System.err.println("In thread " + Thread.currentThread().toString() + " - " + msg);
}
private static final class Foo {
final int i;
Foo(int i) {
this.i = i;
}
};
public static Supplier<Foo> getFoo() {
dumpWhichThread("getFoo");
return new Supplier<Foo>() {
@Override
public Foo get() {
dumpWhichThread("Supplying Foo");
return new Foo(10);
}
};
}
private static final class Bar {
final String j;
public Bar(final String j) {
this.j = j;
}
};
public static Function<Foo, Bar> getFooToBar() {
dumpWhichThread("getFooToBar");
return new Function<Foo, Bar>() {
@Override
public Bar apply(Foo t) {
dumpWhichThread("fooToBar");
return new Bar("" + t.i);
}
};
}
public static void main(final String args[]) throws InterruptedException, ExecutionException, TimeoutException {
final TestCompletableFuture obj = new TestCompletableFuture();
obj.running(args.length == 0);
}
private String running(final boolean sameExecutor) throws InterruptedException, ExecutionException, TimeoutException {
final java.util.concurrent.ExecutorService executor1 = Executors.newSingleThreadExecutor();
final java.util.concurrent.ExecutorService executor2 = sameExecutor ? executor1 : Executors.newSingleThreadExecutor();
CompletableFuture<Foo> futureFoo = CompletableFuture.supplyAsync(getFoo(), executor1);
CompletableFuture<Bar> futureBar = futureFoo.thenApplyAsync(getFooToBar(), executor2);
try {
// Try putting a complete here before the get ..
return futureBar.get(50, TimeUnit.SECONDS).j;
}
finally {
dumpWhichThread("Completed");
// shut the executors down; otherwise their non-daemon threads keep the JVM running
executor1.shutdown();
executor2.shutdown();
}
}
}
Which thread triggers the Bar stage to progress? In the above, it's the thread of executor1. In general, the thread completing the future (i.e. giving it a value) is what releases the stages depending on it. If you completed futureFoo immediately on the main thread, the main thread would be the one triggering it.
So you have to be careful with this. If you have N tasks all blocking on other futures' results but use only a single-threaded executor, the first one scheduled occupies that executor's only thread until it completes, and the work it waits for may never get to run. You can extrapolate to M threads and N futures: it can decay into M blocked threads that prevent the rest of the work from progressing.
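The blocking scenario can be reproduced with a task that waits on a second task submitted to the same pool. This sketch uses plain Future.get (which is interruptible, so shutdownNow can free the stuck thread) rather than CompletableFuture.join; StarvationDemo and outerFinishes are hypothetical names, and the 500 ms timeout is an arbitrary choice.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class StarvationDemo {
    /** Returns true if a task that waits on a second task from the same pool finishes in time. */
    static boolean outerFinishes(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Future<String> outer = pool.submit(() -> {
                Future<String> inner = pool.submit(() -> "inner");
                return inner.get(); // occupies a pool thread while waiting
            });
            outer.get(500, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // inner never got a thread: outer holds the only one
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupts the stuck outer task so the pool can terminate
        }
    }

    public static void main(String[] args) {
        System.out.println("1 thread:  " + outerFinishes(1)); // expected: false
        System.out.println("2 threads: " + outerFinishes(2)); // expected: true
    }
}
```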
There's a thread pool with a single thread that is used to perform tasks submitted by multiple threads. Each task actually consists of two parts: perform, which has a meaningful result, and cleanup, which takes quite some time but returns no meaningful result. At the moment the (obviously incorrect) implementation looks something like this. Is there an elegant way to ensure that another perform task will be executed only after the previous cleanup task?
public class Main {
private static class Worker {
int perform() {
return 1;
}
void cleanup() {
}
}
private static void perform() throws InterruptedException, ExecutionException {
ExecutorService pool = Executors.newFixedThreadPool(1);
Worker w = new Worker();
Future f = pool.submit(() -> w.perform());
pool.submit(w::cleanup);
int x = (int) f.get();
System.out.println(x);
}
}
Is there an elegant way to ensure that another perform task will be executed only after previous cleanup task?
The most obvious thing to do is to call cleanup() from perform() but I assume there is a reason why you aren't doing that.
You say that your solution is currently "obviously incorrect". Why? Because of race conditions? Then you could add a synchronized block:
synchronized (pool) {
Future f = pool.submit(() -> w.perform());
pool.submit(w::cleanup);
}
That would ensure that the cleanup() would come immediately after a perform(). If you are worried about the performance hit with the synchronized, don't be.
Another solution might be to use the ExecutorCompletionService class although I'm not sure how that would help with one thread. I've used it before when I had cleanup tasks running in another thread pool.
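If you are using Java 8, you can do this with CompletableFuture. Since cleanup() returns nothing, the second stage passes perform()'s result through:
int x = CompletableFuture.supplyAsync(() -> w.perform(), pool)
.thenApplyAsync(result -> { w.cleanup(); return result; }, pool)
.join();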
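A runnable sketch of the CompletableFuture variant, with a Worker that logs its calls so the order can be checked (PerformThenCleanup and performAndCleanUp are hypothetical names):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PerformThenCleanup {
    static final class Worker {
        final List<String> log = new CopyOnWriteArrayList<>(); // records the order of calls
        int perform() { log.add("perform"); return 1; }
        void cleanup() { log.add("cleanup"); }
    }

    /** Runs perform, then cleanup, on the pool; returns perform's result once both are done. */
    static int performAndCleanUp(Worker w, ExecutorService pool) {
        return CompletableFuture.supplyAsync(w::perform, pool)
            .thenApplyAsync(x -> { w.cleanup(); return x; }, pool) // starts only after perform
            .join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Worker w = new Worker();
        System.out.println(performAndCleanUp(w, pool)); // prints 1
        System.out.println(w.log);                      // prints [perform, cleanup]
        pool.shutdown();
    }
}
```

This orders one caller's perform and cleanup, and the caller only continues after cleanup; on its own it does not stop another thread's perform from being queued between them.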
I have a Callable object executed using an ExecutorService.
How can I return interim results from this Callable?
I know there is javax.swing.SwingWorker#publish(results) for Swing but I don't use Swing.
There are a couple of ways of doing this. You could do it with a callback or you could do it with a queue.
Here's an example of doing it with a callback:
public static interface Callback<T> {
public void on(T event);
}
Then, an implementation of the callback that does something with your in progress events:
final Callback<String> callback = new Callback<String>() {
public void on(String event) {
System.out.println(event);
}
};
Now you can use the callback in your pool:
Future<String> submit = pool.submit(new Callable<String>() {
public String call() throws Exception {
for(int i = 0; i < 10; i++) {
callback.on("process " + i);
}
return "done";
}
});
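Putting the callback pieces together into one runnable sketch (CallbackProgress and run are hypothetical names). The callback is invoked on the pool thread, so whatever it writes to must be thread-safe; a CopyOnWriteArrayList is used here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallbackProgress {
    interface Callback<T> {
        void on(T event);
    }

    /** Runs the Callable on a pool, forwarding each interim result to the callback. */
    static String run(Callback<String> callback) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> submit = pool.submit(() -> {
                for (int i = 0; i < 10; i++) {
                    callback.on("process " + i); // runs on the pool thread, not the caller's
                }
                return "done";
            });
            return submit.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> seen = new CopyOnWriteArrayList<>();
        String result = run(seen::add);
        System.out.println(seen.size() + " interim results, then " + result); // prints "10 interim results, then done"
    }
}
```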
It is not clear what an "interim result" really is. The interfaces used in the concurrency package simply do not define this, but assume methods that resemble more or less pure functions.
Hence, instead of this:
interim = compute something
finalresult = compute something else
do something like this:
interim = compute something
final1 = new Pair( interim, fork(new Future() { compute something else }) )
(Pseudocode, meant to convey the idea, not compilable code)
EDIT: The idea is: instead of running a single monolithic block of computations (that happens to reach a state where some "interim results" are available), break it up so that the first task returns the former "interim" result and, at the same time, forks a second task that computes the final result. Of course, a handle to this task must be delivered to the caller so that it can eventually get the final result. Usually, this is done with the Future interface.
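A minimal Java sketch of that idea, with the "compute something" steps replaced by trivial arithmetic. Partial, firstStep and InterimThenFinal are hypothetical names, and CompletableFuture serves as the Future handle:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class InterimThenFinal {
    /** Holder: an interim value plus a handle to the still-running rest of the work. */
    static final class Partial<I, F> {
        final I interim;
        final CompletableFuture<F> rest;
        Partial(I interim, CompletableFuture<F> rest) {
            this.interim = interim;
            this.rest = rest;
        }
    }

    /** First task: computes the interim result and forks the remaining work before returning. */
    static Partial<Integer, Integer> firstStep(int n, ExecutorService pool) {
        int interim = n * 2;                                       // "compute something"
        CompletableFuture<Integer> rest =
            CompletableFuture.supplyAsync(() -> interim + 1, pool); // "compute something else"
        return new Partial<>(interim, rest);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Partial<Integer, Integer> p =
            CompletableFuture.supplyAsync(() -> firstStep(21, pool), pool).join();
        System.out.println("interim: " + p.interim);     // prints "interim: 42"
        System.out.println("final:   " + p.rest.join()); // prints "final:   43"
        pool.shutdown();
    }
}
```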
You can pass, say, an AtomicInteger to your class (the one that will be submitted to the executor). Inside that class you increment its value, and from the calling thread you check its value.
Something like this:
public class LongComputation {
private final AtomicInteger progress;
public static void main(String[] args) throws InterruptedException,
ExecutionException {
AtomicInteger progress = new AtomicInteger(0);
LongComputation computation = new LongComputation(progress);
ExecutorService executor = Executors.newFixedThreadPool(2);
Future<Integer> result = executor.submit(() -> computation.compute());
executor.shutdown();
while (!result.isDone()) {
System.out.printf("Progress...%d%%%n", progress.intValue());
TimeUnit.MILLISECONDS.sleep(100);
}
System.out.printf("Result=%d%n", result.get());
}
public LongComputation(AtomicInteger progress) {
this.progress = progress;
}
public int compute() throws InterruptedException {
for (int i = 0; i < 100; i++) {
TimeUnit.MILLISECONDS.sleep(100);
progress.incrementAndGet();
}
return 1_000_000;
}
}
What you're looking for is java.util.concurrent.Future.
A Future represents the result of an asynchronous computation. Methods are provided to check if the computation is complete, to wait for its completion, and to retrieve the result of the computation. The result can only be retrieved using method get when the computation has completed, blocking if necessary until it is ready. Cancellation is performed by the cancel method. Additional methods are provided to determine if the task completed normally or was cancelled. Once a computation has completed, the computation cannot be cancelled. If you would like to use a Future for the sake of cancellability but not provide a usable result, you can declare types of the form Future<?> and return null as a result of the underlying task.
You would have to roll your own API with something like Observer/Observable if you want to push intermediate results. A simpler approach would be to just poll for the current state through some self-defined method.