Future.cancel(true) does not reliably cancel/interrupt thread - java

I am trying to execute several tasks in parallel with a CompletionService. The problems arise when I try to implement cancellation.
Here is a sketch of the code I use:
void startTasks(int numberOfTasks) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
    CompletionService<TaskResultType> completionService =
            new ExecutorCompletionService<TaskResultType>(executor);
    ConcurrentLinkedQueue<TaskResultType> results =
            new ConcurrentLinkedQueue<TaskResultType>();
    ArrayList<Future<TaskResultType>> futures = new ArrayList<Future<TaskResultType>>();

    for (int i = 0; i < numberOfTasks; i++) {
        TypeOfTask task = ... ;
        Future<TaskResultType> future = completionService.submit(task);
        futures.add(future);
    }

    boolean failed = false;
    Throwable cause = null;
    for (int i = 0; i < numberOfTasks; i++) {
        try {
            Future<TaskResultType> resultFuture = completionService.take();
            TaskResultType result = resultFuture.get();
            results.add(result);
        } catch (ExecutionException e) {
            failed = true;
            cause = e.getCause();
            /* cancel all other running tasks in case of failure in one task */
            for (Future<TaskResultType> future : futures) {
                future.cancel(true);
            }
        } catch (CancellationException e) {
            // consume (planned cancellation from calling future.cancel())
        }
    }
    executor.shutdown();
    // code to throw an exception using cause
}
The tasks implement Callable.
When I now throw an exception in one of the tasks, in most cases it works out fine: I immediately get the CancellationExceptions from the other tasks and the tasks finish immediately (let's call this case A). But sometimes (let's call this case B), some of the tasks finish first and only then throw the CancellationException. Future.cancel(true) returned true in both cases for all tasks (except the one with the initial ExecutionException, because that one had already completed).
I check the interrupted flag with Thread.currentThread().isInterrupted(): in the tasks that do complete (i.e. the tasks where the cancellation is unsuccessful), the interrupted flag is false.
All that seems like very strange behavior to me. Does anybody have an idea what the problem could be?
Update
The best idea I have so far is that somewhere deep within the code comprising the task (only some high-level code is my own) the interrupted status is consumed, e.g. by a caught InterruptedException that doesn't call Thread.currentThread().interrupt() to re-establish the status. The exact time the interrupted flag is set by Future.cancel() might vary slightly due to thread scheduling, which would explain the inconsistent behavior. Would a consumed interrupted status explain the behavior of case B?
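For illustration, a minimal sketch of the kind of interrupt-swallowing code I suspect (the sleep call is just a stand-in for any blocking operation deep inside the task):

try {
    Thread.sleep(1000); // any blocking call that throws InterruptedException
} catch (InterruptedException e) {
    // BAD: the interrupted status is now cleared and silently lost, so a
    // later Thread.currentThread().isInterrupted() returns false.
    // Correct handling would re-assert it:
    // Thread.currentThread().interrupt();
}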

But sometimes, some of the tasks finish first and then throw the CancellationException.
Could it be the case that you cancel a task while it is running normally? It may still run to completion (in this state you might think it returns a result), but for the CompletionService it is cancelled: the future is returned by take(), you call future.get(), and there is the CancellationException.
You could also have a look at Guava's Futures.allAsList, which seems to be doing a very similar thing:
Creates a new ListenableFuture whose value is a list containing the values of all its input futures, if all succeed. If any input fails, the returned future fails.
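A minimal sketch of how that might look here (assuming a tasks collection and Guava's ListeningExecutorService; TaskResultType is the question's type):

ListeningExecutorService service =
        MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(numberOfThreads));
List<ListenableFuture<TaskResultType>> futures = new ArrayList<>();
for (Callable<TaskResultType> task : tasks) {
    futures.add(service.submit(task));
}
// Fails as soon as any input fails; cancelling 'all' attempts to cancel the inputs.
ListenableFuture<List<TaskResultType>> all = Futures.allAsList(futures);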

But sometimes, some of the tasks finish first and then throw the CancellationException. Future.cancel(true) returned true in both cases for all tasks (except the one with the initial ExecutionException, because this one was already canceled).
If your program was not finishing (or taking a long time to finish after an exception is thrown) then I suspect your problem is that one of the tasks is doing IO or otherwise blocked and is not checking for Thread.currentThread().isInterrupted(). So even though you have canceled the Future and the thread is interrupted, this doesn't get detected.
However, it seems like the program is finishing. So I'm not sure what the error case is here. If you catch an exception you call future.cancel(true) on all of the futures in the list. The one that threw and the ones that have already finished should all return false since they can't be canceled. The ones that returned true from cancel should have been interrupted.
For example, if the second-to-last task throws an exception, then future.cancel(true) should return true only for the last task still running.
One thing to do is to remove the futures as they finish so you don't re-cancel already completed jobs. But all that might do is mask the issue you are seeing now:
Future<TaskResultType> resultFuture = completionService.take();
futures.remove(resultFuture);
Update:
It is highly possible that some code is swallowing the interrupt. It happens all of the time unfortunately. If some of the threads are not finishing immediately when they are canceled and are running to completion then this is probably what is happening.

Related

How to cancel Java 8 completable future?

I am playing with Java 8 completable futures. I have the following code:
CountDownLatch waitLatch = new CountDownLatch(1);
CompletableFuture<?> future = CompletableFuture.runAsync(() -> {
    try {
        System.out.println("Wait");
        waitLatch.await(); // cancel should interrupt
        System.out.println("Done");
    } catch (InterruptedException e) {
        System.out.println("Interrupted");
        throw new RuntimeException(e);
    }
});
sleep(10); // give it some time to start (ugly, but works)
future.cancel(true);
System.out.println("Cancel called");
assertTrue(future.isCancelled());
assertTrue(future.isDone());
sleep(100); // give it some time to finish
Using runAsync I schedule execution of a code that waits on a latch. Next I cancel the future, expecting an interrupted exception to be thrown inside. But it seems that the thread remains blocked on the await call and the InterruptedException is never thrown even though the future is canceled (assertions pass). An equivalent code using ExecutorService works as expected. Is it a bug in the CompletableFuture or in my example?
When you call CompletableFuture#cancel, you only stop the downstream part of the chain. The upstream part, i.e. whatever will eventually call complete(...) or completeExceptionally(...), doesn't get any signal that the result is no longer needed.
What are those 'upstream' and 'downstream' things?
Let's consider the following code:
CompletableFuture
        .supplyAsync(() -> "hello")              //1
        .thenApply(s -> s + " world!")           //2
        .thenAccept(s -> System.out.println(s)); //3
Here, the data flows from top to bottom: from being created by the supplier, through being modified by the function, to being consumed by println. The part above a particular step is called upstream, and the part below is downstream. E.g. steps 1 and 2 are upstream for step 3.
Here's what happens behind the scenes. This is not precise; rather, it's a convenient mental model of what's going on.
The supplier (step 1) is executed (inside the JVM's common ForkJoinPool).
The result of the supplier is then passed by complete(...) to the next CompletableFuture downstream.
Upon receiving the result, that CompletableFuture invokes the next step - a function (step 2) which takes in the previous step's result and returns something that will be passed further, to the downstream CompletableFuture's complete(...).
Upon receiving the step 2 result, the step 3 CompletableFuture invokes the consumer, System.out.println(s). After the consumer is finished, the downstream CompletableFuture will receive its value, (Void) null.
As we can see, each CompletableFuture in this chain has to know who is waiting downstream for the value to be passed to their complete(...) (or completeExceptionally(...)). But a CompletableFuture doesn't have to know anything about its upstream (or upstreams - there might be several).
Thus, calling cancel() upon step 3 doesn't abort steps 1 and 2, because there's no link from step 3 to step 2.
It is assumed that if you're using CompletableFuture, your steps are small enough that there's no harm if a couple of extra steps get executed.
If you want cancellation to be propagated upstream, you have two options:
Implement this yourself - create a dedicated CompletableFuture (name it like cancelled) which is checked after every step (something like step.applyToEither(cancelled, Function.identity())); see the sketch after this list
Use a reactive stack like RxJava 2, Project Reactor/Flux or Akka Streams
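A minimal sketch of the first option (the cancelled name and the chain are illustrative). Completing the dedicated future exceptionally makes every checkpoint, and thus the rest of the downstream chain, complete exceptionally; an in-flight step still runs, but its result is ignored:

CompletableFuture<String> cancelled = new CompletableFuture<>();

CompletableFuture<String> step1 = CompletableFuture.supplyAsync(() -> "hello");
step1.applyToEither(cancelled, Function.identity())   // checkpoint after step 1
     .thenApply(s -> s + " world!")
     .applyToEither(cancelled, Function.identity())   // checkpoint after step 2
     .thenAccept(System.out::println);

// To cancel, complete the dedicated future exceptionally:
cancelled.completeExceptionally(new CancellationException());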
Apparently, it's intentional. The Javadoc for the method CompletableFuture::cancel states:
[Parameters:] mayInterruptIfRunning - this value has no effect in this implementation because interrupts are not used to control processing.
Interestingly, the method ForkJoinTask::cancel uses almost the same wording for the parameter mayInterruptIfRunning.
I have a guess on this issue:
interruption is intended to be used with blocking operations, like sleep, wait or I/O operations,
but neither CompletableFuture nor ForkJoinTask are intended to be used with blocking operations.
Instead of blocking, a CompletableFuture should create a new CompletionStage, and CPU-bound tasks are a prerequisite for the fork-join model. So, using interruption with either of them would defeat their purpose. And on the other hand, it might increase complexity that isn't required if they are used as intended.
If you actually want to be able to cancel a task, then you have to use a Future itself (e.g. as returned by ExecutorService.submit(Callable<T>)), not a CompletableFuture. As pointed out in the answer by nosid, CompletableFuture completely ignores any call to cancel(true).
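For contrast, a minimal sketch (names illustrative) showing that the plain Future returned by an ExecutorService does interrupt its worker thread on cancel(true):

ExecutorService executor = Executors.newSingleThreadExecutor();
Future<String> future = executor.submit(() -> {
    Thread.sleep(10_000); // blocking call; throws InterruptedException when interrupted
    return "done";
});
future.cancel(true); // interrupts the worker thread, unlike CompletableFuture.cancel(true)
executor.shutdown();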
My suspicion is that the JDK team did not implement interruption because:
Interruption was always hacky, difficult for people to understand, and difficult to work with. The Java I/O system is not even interruptible, despite calls to InputStream.read() being blocking calls! (And the JDK team have no plans to make the standard I/O system interruptible again, like it was in the very early Java days.)
The JDK team have been trying very hard to phase out old broken APIs from the early Java days, such as Object.finalize(), Object.wait(), Thread.stop(), etc. I believe Thread.interrupt() is considered to be in the category of things that must be eventually deprecated and replaced. Therefore, newer APIs (like ForkJoinPool and CompletableFuture) are already not supporting it.
CompletableFuture was designed for building DAG-structured pipelines of operations, similar to the Java Stream API. It's very difficult to succinctly describe how interruption of one node of a dataflow DAG should affect execution in the rest of the DAG. (Should all concurrent tasks be canceled immediately, when any node is interrupted?)
I suspect the JDK team just didn't want to deal with getting interruption right, given the levels of internal complexity that the JDK and libraries have reached these days. (The internals of the lambda system -- ugh.)
One very hacky way around this would be to have each CompletableFuture task export a reference to its worker thread to an externally-visible AtomicReference<Thread>, so that the thread can be interrupted directly when needed from another, external thread. Or if you start all the tasks using your own ExecutorService, with your own thread pool, you can manually interrupt any or all the threads that were started, even if CompletableFuture refuses to trigger interruption via cancel(true). (Note though that CompletableFuture lambdas cannot throw checked exceptions, so if you have an interruptible wait in a CompletableFuture, you'll have to re-throw as an unchecked exception.)
More simply, you could just declare an AtomicReference<Boolean> cancel = new AtomicReference<>() in an external scope, and periodically check this flag from inside each CompletableFuture task's lambda.
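A minimal sketch of that flag-polling approach (an AtomicBoolean would work just as well; the loop body is a placeholder):

AtomicReference<Boolean> cancel = new AtomicReference<>(false);
CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
    for (int i = 0; i < 1_000_000; i++) {
        if (Boolean.TRUE.equals(cancel.get())) {
            throw new CancellationException("cancelled cooperatively");
        }
        // ... do one unit of work ...
    }
});
// From the outside, instead of future.cancel(true):
cancel.set(true);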
You could also try setting up a DAG of Future instances rather than a DAG of CompletableFuture instances; that way you can specify exactly how exceptions and interruption/cancellation in any one task should affect the other currently-running tasks. I show how to do this in the example code in my question here, and it works well, but it's a lot of boilerplate.
You need an alternative implementation of CompletionStage to accomplish true thread interruption. I've just released a small library that serves exactly this purpose - https://github.com/vsilaev/tascalate-concurrent
The call to waitLatch.await() will still block even if Future.cancel(..) is called. As mentioned by others, the CompletableFuture will not use interrupts to cancel the task.
According to the javadoc of CompletableFuture.cancel(..):
mayInterruptIfRunning this value has no effect in this implementation because interrupts are not used to control processing.
Even if the implementation did cause an interrupt, you would still need a blocking operation for the cancellation to take effect, or you would have to check the status via Thread.interrupted().
Instead of interrupting the Thread, which might not always be easy to do, you can add checkpoints to your operation where you can gracefully terminate the current task. This can be done in a loop over some elements that will be processed, or you can check the cancel status before each step of the operation and throw a CancellationException yourself.
The tricky part is to get a reference to the CompletableFuture within the task in order to call Future.isCancelled(). Here is an example of how it can be done:
public abstract class CancelableTask<T> {
    private CompletableFuture<T> task;

    private T run() {
        try {
            return compute();
        } catch (Throwable e) {
            task.completeExceptionally(e);
        }
        return null;
    }

    protected abstract T compute() throws Exception;

    protected boolean isCancelled() {
        Future<T> future = task;
        return future != null && future.isCancelled();
    }

    public Future<T> start() {
        synchronized (this) {
            if (task != null) throw new IllegalStateException("Task already started.");
            task = new CompletableFuture<>();
        }
        return task.completeAsync(this::run);
    }
}
Edit: Here is the improved CancelableTask version as a static factory:
public static <T> CompletableFuture<T> supplyAsync(Function<Future<T>, T> operation) {
    CompletableFuture<T> future = new CompletableFuture<>();
    return future.completeAsync(() -> operation.apply(future));
}
Here is the test method:
@Test
void testFuture() throws InterruptedException {
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch done = new CountDownLatch(1);
    AtomicInteger counter = new AtomicInteger();
    Future<Object> future = supplyAsync(task -> {
        started.countDown();
        while (!task.isCancelled()) {
            System.out.println("Count: " + counter.getAndIncrement());
        }
        System.out.println("Task cancelled");
        done.countDown();
        return null;
    });
    // wait until the task is started
    assertTrue(started.await(5, TimeUnit.SECONDS));
    future.cancel(true);
    System.out.println("Cancel called");
    assertTrue(future.isCancelled());
    assertTrue(future.isDone());
    assertTrue(done.await(5, TimeUnit.SECONDS));
}
If you really want to use interrupts in addition to the CompletableFuture, then you can pass a custom Executor to CompletableFuture.completeAsync(..) where you create your own Thread, override cancel(..) in the CompletableFuture and interrupt your Thread.
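A minimal sketch of that idea (illustrative, not taken from any library): spawn a dedicated thread for the task and override cancel(..) so that it interrupts the worker:

public static <T> CompletableFuture<T> interruptibleSupplyAsync(Supplier<T> supplier) {
    AtomicReference<Thread> worker = new AtomicReference<>();
    CompletableFuture<T> future = new CompletableFuture<T>() {
        @Override
        public boolean cancel(boolean mayInterruptIfRunning) {
            boolean cancelled = super.cancel(mayInterruptIfRunning);
            Thread t = worker.get();
            if (cancelled && mayInterruptIfRunning && t != null) {
                t.interrupt(); // actually interrupt the running task
            }
            return cancelled;
        }
    };
    Thread thread = new Thread(() -> {
        worker.set(Thread.currentThread());
        try {
            future.complete(supplier.get()); // ignored if already cancelled
        } catch (Throwable e) {
            future.completeExceptionally(e);
        }
    });
    thread.start();
    return future;
}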
The CancellationException is part of the internal ForkJoin cancel routine. The exception will come out when you retrieve the result of the future:
try {
    future.get();
} catch (Exception e) {
    System.out.println(e.toString());
}
It took a while to see this in a debugger. The JavaDoc is not that clear on what is happening or what you should expect.

How to notice an exception from the parent thread?

I'm feeding threads into an ExecutorService.
These threads are manipulating some data, and if there's a conflict, the data object throws an exception, which is caught by the conflicting thread, which in turn aborts and does not complete execution.
When this happens, the aborting thread needs to be put back in the queue and fed back into the executor.
How can I tell if an exception was thrown, from the parent thread?
When you submit() a task to the ExecutorService you get a Future as a result. When the execution has finished you can call get() on that future. This will return the result if applicable, or it will throw an ExecutionException if the original task threw one. If you want the real exception object you can call getCause().
Also note that you would be putting a Task back into the service; that task is run on a Thread which has not really terminated (it just caught the exception and is waiting for a new task).
Here is an example usage (you can use a Runnable if you don't care about the result).
Callable<String> myCallable = ...;
Future<String> future = myExecutor.submit(myCallable);
// Do something else until future.isDone() returns true.
try {
    String result = future.get();
} catch (ExecutionException e) {
    // Handle error, perhaps create a new Callable to submit.
}

Java BlockingQueue blocking on take(), with a slight twist

I have a situation where I have two blocking queues. Into the first I insert tasks to execute. When each task completes, it adds a task to the second queue, where the tasks are executed.
So my first queue is easy: I just check to make sure it's not empty and execute, else I interrupt():
public void run() {
    try {
        if (!taskQueue1.isEmpty()) {
            SomeTask task = taskQueue1.poll();
            doTask(task);
            taskQueue2.add(task);
        } else {
            Thread.currentThread().interrupt();
        }
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    }
}
For the second one I do the following, which, as you can tell, doesn't work:
public void run() {
    try {
        SomeTask2 task2 = taskQueue2.take();
        doTask(task2);
    } catch (InterruptedException ex) {
    }
    Thread.currentThread().interrupt();
}
How would you solve it so that the second BlockingQueue doesn't block on take(), yet finishes only when it knows there are no more items to be added? It would be good if the second thread could perhaps see the first blocking queue and, if that was empty and the second queue was also empty, interrupt itself.
I could also use a poison object, but would prefer something else.
NB: This isn't the exact code, just something I wrote up here.
You make it sound as though the thread processing the first queue knows that there are no more tasks coming as soon as its queue is drained. That sounds suspicious, but I'll take you at your word and propose a solution anyway.
Define an AtomicInteger visible to both threads. Initialize it to positive one.
Define the first thread's operation as follows:
Loop on Queue#poll().
If Queue#poll() returns null, call AtomicInteger#decrementAndGet() on the shared integer.
If AtomicInteger#decrementAndGet() returned zero, interrupt the second thread via Thread#interrupt(). (This handles the case where no items ever arrived.)
In either case, exit the loop.
Otherwise, process the extracted item, call AtomicInteger#incrementAndGet() on the shared integer, add the extracted item to the second thread's queue, and continue the loop.
Define the second thread's operation as follows:
Loop blocking on BlockingQueue#take().
If BlockingQueue#take() throws InterruptedException, catch the exception, call Thread.currentThread().interrupt(), and exit the loop.
Otherwise, process the extracted item.
Call AtomicInteger#decrementAndGet() on the shared integer.
If AtomicInteger#decrementAndGet() returned zero, exit the loop.
Otherwise, continue the loop.
Make sure you understand the idea before trying to write the actual code. The contract is that the second thread continues waiting on more items from its queue until the count of expected tasks reaches zero. At that point, the producing thread (the first one) will no longer push any new items into the second thread's queue, so the second thread knows that it's safe to stop servicing its queue.
The screwy case arises when no tasks ever arrive at the first thread's queue. Since the second thread only decrements and tests the count after it processes an item, if it never gets a chance to process any items, it won't ever consider stopping. We use thread interruption to handle that case, at the cost of another conditional branch in the first thread's loop termination steps. Fortunately, that branch will execute only once.
There are many designs that could work here. I merely described one that introduces only one additional entity, the shared atomic integer, but even then, it's fiddly. I think that using a poison pill would be much cleaner, though I do concede that neither Queue#add() nor BlockingQueue#put() accepts null as a valid element (due to Queue#poll()'s return value contract). It would otherwise be easy to use null as a poison pill.
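A minimal sketch of the design described above (fragments, not a complete class; SomeTask, doTask() and doTask2() stand in for the question's types):

// Shared state, visible to both threads:
final Queue<SomeTask> taskQueue1 = new ConcurrentLinkedQueue<>();
final BlockingQueue<SomeTask> taskQueue2 = new LinkedBlockingQueue<>();
final AtomicInteger count = new AtomicInteger(1); // initialized to positive one

// First thread's operation:
void runFirst(Thread secondThread) {
    for (;;) {
        SomeTask task = taskQueue1.poll();
        if (task == null) {
            // Drained: give up the initial count of one. Zero means no items
            // were ever handed off, so wake the consumer out of take().
            if (count.decrementAndGet() == 0) {
                secondThread.interrupt();
            }
            break;
        }
        doTask(task);
        count.incrementAndGet(); // one more item the consumer must see
        taskQueue2.add(task);
    }
}

// Second thread's operation:
void runSecond() {
    for (;;) {
        SomeTask task;
        try {
            task = taskQueue2.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
        }
        doTask2(task);
        if (count.decrementAndGet() == 0) {
            break; // producer is finished and every handed-off item is processed
        }
    }
}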
I can't figure out what you are actually trying to do here, but I can say that the interrupt() in your first run() method is either pointless or wrong.
If you are running the run() method in your own Thread object, then that thread is about to exit anyway, so there's no point interrupting it.
If you are running the run() method in an executor with a thread pool, then you most likely don't want to kill the thread or shut down the executor at all ... at that point. And if you do want to shutdown the executor, then you should call one of its shutdown methods.
For instance, here's a version that does what you seem to be doing, without all of the interrupt stuff and without thread creation/destruction churn.
public class TaskExecutor {
    private ExecutorService executor = new ThreadPoolExecutor(...);

    public void submitTask1(final SomeTask task) {
        executor.submit(new Runnable() {
            public void run() {
                doTask(task);
                submitTask2(task);
            }
        });
    }

    public void submitTask2(final SomeTask task) {
        executor.submit(new Runnable() {
            public void run() {
                doTask2(task);
            }
        });
    }

    public void shutdown() {
        executor.shutdown();
    }
}
If you want separate queuing for the tasks, simply create and use two different executors.

ThreadPoolExecutor's getActiveCount()

I have a ThreadPoolExecutor that seems to be lying to me when I call getActiveCount(). I haven't done a lot of multithreaded programming however, so perhaps I'm doing something incorrectly.
Here's my TPE
@Override
public void afterPropertiesSet() throws Exception {
    BlockingQueue<Runnable> workQueue;
    int maxQueueLength = threadPoolConfiguration.getMaximumQueueLength();
    if (maxQueueLength == 0) {
        workQueue = new LinkedBlockingQueue<Runnable>();
    } else {
        workQueue = new LinkedBlockingQueue<Runnable>(maxQueueLength);
    }
    pool = new ThreadPoolExecutor(
            threadPoolConfiguration.getCorePoolSize(),
            threadPoolConfiguration.getMaximumPoolSize(),
            threadPoolConfiguration.getKeepAliveTime(),
            TimeUnit.valueOf(threadPoolConfiguration.getTimeUnit()),
            workQueue,
            // Default thread factory creates normal-priority,
            // non-daemon threads.
            Executors.defaultThreadFactory(),
            // Run any rejected task directly in the calling thread.
            // In this way no records will be lost due to rejection;
            // however, no records will be added to the workQueue
            // while the calling thread is processing a Task, so set
            // your queue size appropriately.
            //
            // This also means MaxThreadCount+1 tasks may run
            // concurrently. If you REALLY want a max of MaxThreadCount
            // threads, don't use this.
            new ThreadPoolExecutor.CallerRunsPolicy());
}
In this class I also have a DAO that I pass into my Runnable (FooWorker), like so:
@Override
public void addTask(FooRecord record) {
    if (pool == null) {
        throw new FooException(ERROR_THREAD_POOL_CONFIGURATION_NOT_SET);
    }
    pool.execute(new FooWorker(context, calculator, dao, record));
}
FooWorker runs record (the only non-singleton) through a state machine via calculator then sends the transitions to the database via dao, like so:
public void run() {
    calculator.calculate(record);
    dao.save(record);
}
Once my main thread is done creating new tasks I try and wait to make sure all threads finished successfully:
while (pool.getActiveCount() > 0) {
    recordHandler.awaitTermination(terminationTimeout, terminationTimeoutUnit);
}
What I'm seeing from output logs (which are presumably unreliable due to the threading) is that getActiveCount() is returning zero too early, and the while() loop is exiting while my last threads are still printing output from calculator.
Note I've also tried calling pool.shutdown() then using awaitTermination but then the next time my job runs the pool is still shut down.
My only guess is that inside a thread, when I send data to the dao (since it's a singleton created by Spring in the main thread...), Java considers the thread inactive since (I assume) it's processing in/waiting on the main thread.
Intuitively, based only on what I'm seeing, that's my guess. But... Is that really what's happening? Is there a way to "do it right" without putting a manual incremented variable at the top of run() and a decremented at the end to track the number of threads?
If the answer is "don't pass in the dao", then wouldn't I have to "new" a DAO for every thread? My process is already a (beautiful, efficient) beast, but that would really suck.
As the JavaDoc of getActiveCount states, it's an approximate value: you should not base any major business logic decisions on this.
If you want to wait for all scheduled tasks to complete, then you should simply use
pool.shutdown();
pool.awaitTermination(terminationTimeout, terminationTimeoutUnit);
If you need to wait for a specific task to finish, you should use submit() instead of execute() and then check the Future object for completion (either using isDone() if you want to do it non-blocking or by simply calling get() which blocks until the task is done).
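For example, a minimal sketch of the Future-based variant (FooWorker and its constructor arguments as in the question):

List<Future<?>> futures = new ArrayList<>();
futures.add(pool.submit(new FooWorker(context, calculator, dao, record)));
// ... submit the rest ...
for (Future<?> f : futures) {
    try {
        f.get(); // blocks until this task has finished
    } catch (ExecutionException e) {
        // the task threw; inspect e.getCause()
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
    }
}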
The documentation suggests that the method getActiveCount() on ThreadPoolExecutor is not an exact number:
getActiveCount
public int getActiveCount()
Returns the approximate number of threads that are actively executing tasks.
Returns: the number of threads
Personally, when I am doing multithreaded work such as this, I use a variable that I increment as I add tasks, and decrement as I grab their output.
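One way that might look, applied to the question's addTask() (the AtomicLong and the finally-block placement are my assumptions, not the answerer's code):

private final AtomicLong inFlight = new AtomicLong();

public void addTask(final FooRecord record) {
    inFlight.incrementAndGet();
    pool.execute(new Runnable() {
        public void run() {
            try {
                calculator.calculate(record);
                dao.save(record);
            } finally {
                inFlight.decrementAndGet(); // always runs, even on failure
            }
        }
    });
}
// The main thread can then wait until inFlight.get() == 0.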

Java Thread - weird Thread.interrupted() and future.cancel(true) behaviour

I want to manage a list of Future objects returned by my TaskExecutor.
I have something like this:
List<Future<String>> list;

void process(ProcessThis processThis) {
    for (...) {
        Future<String> future = taskExecutor.submit(processThis);
        list.add(future);
    }
}

void removeFutures() {
    for (Future future : list) {
        assert future.cancel(true);
    }
}
ProcessThis is a task that implements Callable<String> and checks the Thread.interrupted() status:
public String call() {
    while (true) {
        if (Thread.interrupted()) {
            break;
        }
        doSomething();
    }
    return null; // some result would normally be returned here
}
Now the problem is that only a subset of the concurrent Threads returns true when Thread.interrupted() is called.
The assert in removeFutures() passes for every future that is removed (I checked isDone() and isCancelled() as well).
The number of Threads that are interrupted is random. Out of 15 running Threads, sometimes 13 are interrupted, sometimes 2...
I really don't understand where the issue is. If I call future.cancel(true) and it returns true, and I then check Thread.interrupted() (this is called only once), I expect it to return true as well.
Any idea what I'm missing?
I'm on build java 1.6.0_02-b05
Be aware that Thread.interrupted() returns the current interrupted status and then clears it, so all future invocations will return false. What you want is probably Thread.currentThread().isInterrupted().
Also be aware that future.cancel(true) will usually only return false if the task had already been completed or canceled. If it returns true, that is no guarantee that the task will actually be canceled.
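A minimal sketch of the difference:

Thread.currentThread().interrupt();
System.out.println(Thread.interrupted());                   // true, and clears the flag
System.out.println(Thread.interrupted());                   // false: the flag is gone

Thread.currentThread().interrupt();
System.out.println(Thread.currentThread().isInterrupted()); // true
System.out.println(Thread.currentThread().isInterrupted()); // still true: not cleared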
What is happening in doSomething()? It's possible that a RuntimeException is escaping somewhere due to the interrupt. Do you have an UncaughtExceptionHandler set? If not, you'll need to pass a ThreadFactory to the Executor that sets the exception handler and logs any missed exceptions.
At least, you should restore the interruption flag to make taskExecutor aware of the thread interruption:
public String call() {
    while (true) {
        if (Thread.interrupted()) {
            Thread.currentThread().interrupt();
            break;
        }
        doSomething();
    }
    return null; // some result would normally be returned here
}
A potential problem is that interrupts are often swallowed. Somewhere deep in doSomething() (or even in class loading), an interrupt may be caught by, say, wait() and then discarded by 'careless' code. Interrupts are evil, IMO.
It may be worth checking that all of your tasks are actually running at the time of the cancel.
