I'm using an ExecutorService to process thousands of small, independent tasks. Each task, on completion, stores its result (which is either true or false).
So, instead of processing all of the tasks, I want to shut down the thread pool prematurely once a task has found the answer!
It feels like I'm missing something very obvious here...
Consider using the invokeAny method. It returns as soon as any one of the tasks has completed successfully, and cancels the rest.
http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html#invokeAny(java.util.Collection)
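For instance, a minimal sketch (the pool size, the Input/inputs names, and the solves predicate are all illustrative). Since invokeAny returns on the first task that completes without throwing, a task can throw when its answer is false, so that only a true result ends the wait:
ExecutorService pool = Executors.newFixedThreadPool(8);
List<Callable<Boolean>> tasks = new ArrayList<>();
for (final Input input : inputs) {         // 'Input'/'inputs' are placeholders
    tasks.add(() -> {
        if (solves(input))                 // hypothetical predicate
            return true;
        throw new NoSuchElementException("not the answer");
    });
}
try {
    Boolean found = pool.invokeAny(tasks); // blocks until one task succeeds; cancels the rest
} catch (ExecutionException e) {
    // every task threw: no task found the answer
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
} finally {
    pool.shutdownNow();
}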
The desire you express reminds me of a Klein bottle, where there's little distinction maintained between what's "inside" and what's "outside." Here, the tasks submitted to the ExecutorService need to know that they must notify a latching gate outside the thread pool and shut it down when first transitioning from having seen no true task outcomes to having seen at least one.
I won't write the code for you, but I'll sketch the solution. It may help to define an interface on which the tasks must call when they complete:
interface TaskObserver
{
    void completed(boolean result);
}
Each task instance can be constructed with a reference to such a TaskObserver, on which the task body will call just before it completes and yields control back to the invoking ExecutorService. You could even write a base class to assist in participating in this protocol:
public abstract class ObservableTask implements Callable<Boolean>
{
    protected ObservableTask(TaskObserver observer)
    {
        if (null == observer)
            throw new NullPointerException();
        observer_ = observer;
    }

    public final Boolean call()
    {
        final boolean result = evaluate();
        observer_.completed(result);
        return result;
    }

    protected abstract boolean evaluate();

    private final TaskObserver observer_;
}
Alternatively, instead of using inheritance to define tasks, you could write a concrete class like this that accepts a reference to a Callable<Boolean> in its constructor in addition to the TaskObserver reference, and works through delegation instead.
Moving on, the implementation of TaskObserver will store an AtomicBoolean, initially set to false. The body of the completed(boolean) method must attempt to flip the AtomicBoolean from false to true if the result passed in is true. If that transition succeeds, shut down the ExecutorService and stop submitting any more tasks; any subsequent calls to the TaskObserver will come from tasks that had already been submitted and were too far along to comply with a cancellation request.
public void completed(boolean result)
{
    if (result &&
        latch_.compareAndSet(false, true))
    {
        // Set a flag to cease submitting new tasks.
        service_.shutdownNow();
        try
        {
            if (!service_.awaitTermination(timeoutMagnitude, timeoutUnit))
            {
                // Report a problem in shutting down the pool in a timely manner.
            }
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }
}
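To tie the pieces together, a minimal wiring sketch under the assumptions above (ShutdownObserver and DelegatingObservableTask are hypothetical names for the observer implementation and the delegating task variant described earlier; the pool size and task source are illustrative):
ExecutorService service = Executors.newFixedThreadPool(4);
AtomicBoolean latch = new AtomicBoolean(false);
TaskObserver observer = new ShutdownObserver(service, latch); // holds the completed() body above
for (Callable<Boolean> body : taskBodies) // hypothetical source of task bodies
{
    if (latch.get())
        break; // the "cease submitting new tasks" flag in action
    // A submit racing with shutdownNow() may still be rejected;
    // catch RejectedExecutionException here if that matters to you.
    service.submit(new DelegatingObservableTask(body, observer));
}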
If that's not enough of a push to get you started, please follow up with additional questions.
Related
I've got a question about CompletableFuture and its possible usage for lazy computations.
It seems like a great substitute for RunnableFuture for this task, since it is easy to create task chains and to have total control of each chain link. Still, I found that it is very hard to control when exactly the computation takes place.
If I just create a CompletableFuture with the supplyAsync method or something like that, it's OK. It waits patiently for me to call the get or join method before computing. But if I try to make an actual chain with thenCompose, handle or any other method, the evaluation starts immediately, which is very frustrating.
Of course, I can always place some blocker task at the start of the chain and release the block when I am ready to begin the calculation, but that seems like an ugly solution. Does anybody know how to control when a CompletableFuture actually runs?
CompletableFuture is a push-design, i.e. results are pushed down to dependent tasks as soon as they become available. This also means side-chains that are not in themselves consumed still get executed, which can have side-effects.
What you want is a pull-design where ancestors would only be pulled in as their data is consumed.
This would be a fundamentally different design because side-effects of non-consumed trees would never happen.
Of course with enough contortions CF could be made to do what you want, but you should look into the fork-join framework instead which allows you to only run the computations you depend on instead of pushing down results.
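A small illustration of the push behavior (the stage bodies are illustrative):
CompletableFuture<Integer> base = CompletableFuture.supplyAsync(() -> 1);
// This dependent stage runs as soon as 'base' completes, even though
// nothing ever calls get()/join() on it:
base.thenApply(x -> {
    System.out.println("side chain executed anyway");
    return x + 1;
});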
There's a conceptual difference between RunnableFuture and CompletableFuture that you're missing here.
RunnableFuture implementations take a task as input and hold onto it. It runs the task when you call the run method.
A CompletableFuture does not hold onto a task. It only knows about the result of a task. It has three states: incomplete, completed, and completed exceptionally (failed).
CompletableFuture.supplyAsync is a factory method that gives you an incomplete CompletableFuture. It also schedules a task which, when it completes, will pass its result to the CompletableFuture's complete method. In other words, the future that supplyAsync hands you doesn't know anything about the task, and can't control when the task runs.
To use a CompletableFuture in the way you describe, you would need to create a subclass:
public class RunnableCompletableFuture<T> extends CompletableFuture<T> implements RunnableFuture<T> {
    private final Callable<T> task;

    public RunnableCompletableFuture(Callable<T> task) {
        this.task = task;
    }

    @Override
    public void run() {
        try {
            complete(task.call());
        } catch (Exception e) {
            completeExceptionally(e);
        }
    }
}
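A quick usage sketch (the task body is illustrative):
RunnableCompletableFuture<String> f = new RunnableCompletableFuture<>(() -> "hello");
CompletableFuture<String> chain = f.thenApply(String::toUpperCase); // compose lazily
new Thread(f).start();            // nothing ran until this point
System.out.println(chain.join()); // prints HELLO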
A simple way of dealing with your problem is wrapping your CompletableFuture in something with a lazy nature. You could use a Supplier or even Java 8 Stream.
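For example, a minimal sketch of hiding the chain behind a Supplier (the chain itself is illustrative):
Supplier<CompletableFuture<String>> lazy =
        () -> CompletableFuture.supplyAsync(() -> "result")
                               .thenApply(String::toUpperCase);
// Nothing is scheduled until you ask for it:
CompletableFuture<String> cf = lazy.get();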
It's a bit late, but how about using the constructor for the first CompletableFuture in the chain?
CompletableFuture<Object> cf = new CompletableFuture<>();
// compose the chain (sometask_here stands in for your own function)
CompletableFuture<Object> chain = cf.thenCompose(sometask_here);
// later, start the chain with:
cf.complete(anInputObject);
I'm implementing the Future<Collection<Integer>> interface in order to share the result of some bulk computation among all threads in the application.
In fact, I intended to just put an instance of a class implementing Future<Collection<Integer>> into an ApplicationScope object, so that any other thread that needs the result can just ask for the Future from that object and call get() on it, thereby reusing the computation performed by some other thread.
My question is about implementing the cancel method. For now, I would write something like that:
public class CustomerFutureImpl implements Future<Collection<Integer>> {

    private Thread computationThread;
    private boolean started;
    private boolean cancelled;
    private Collection<Integer> computationResult;

    @Override
    public boolean cancel(boolean mayInterruptIfRunning) {
        if (computationResult != null)
            return false; // already completed; too late to cancel
        if (!started) {
            cancelled = true;
            return true;
        }
        if (mayInterruptIfRunning) {
            computationThread.interrupt();
            cancelled = true;
            return true;
        }
        return false; // running, and we may not interrupt it
    }

    // The rest of the methods
}
But this implementation doesn't satisfy the contract of Future, because a CancellationException needs to be thrown in any thread awaiting the result (i.e., any thread that has called the get() method).
Should I add one more field, like private Collection<Thread> waitingForTheResultThreads;, then interrupt each thread in the collection, catch the InterruptedException, and throw new CancellationException()?
The thing is that such a solution seems kind of weird to me... Not sure about that.
Generally you should avoid implementing Future directly at all. Concurrency code is very hard to get right, and frameworks for distributed execution - notably ExecutorService - will provide Future instances referencing the units of work you care about.
You may know that already and are intentionally creating a new similar service, but I feel it's important to call out that for the vast majority of use cases, you should not need to define your own Future implementation.
You might want to look at the concurrency tools Guava provides, in particular ListenableFuture, which is a sub-interface of Future that provides additional features.
Assuming that you really do want to define a custom Future type, use Guava's AbstractFuture implementation as a starting point, so that you don't have to reinvent the complex details you're running into.
To your specific question, if you look at the implementation of AbstractFuture.get(), you'll see that it's implemented with a while loop that looks for value to become non-null, at which time it calls getDoneValue() which either returns the value or raises a CancellationException. So essentially, each thread that is blocking on a call to Future.get() is polling the Future.value field every so often and raising a CancellationException if it detects that the Future has been cancelled. There's no need to keep track of a Collection<Thread> or anything of the sort, since each thread can inspect the state of the Future independently, and return or throw as needed.
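For instance, a minimal sketch built on Guava's AbstractFuture (the class name and the finish/fail methods are illustrative; set and setException are the protected completion hooks AbstractFuture provides):
import java.util.Collection;
import com.google.common.util.concurrent.AbstractFuture;

public class BulkComputationFuture extends AbstractFuture<Collection<Integer>> {
    // Called by the computing thread when the work is done:
    public void finish(Collection<Integer> result) {
        set(result); // completes the future and wakes every blocked get()
    }

    public void fail(Throwable t) {
        setException(t);
    }

    // cancel(boolean) is inherited; after cancellation, every blocked get()
    // throws CancellationException with no extra bookkeeping on your part.
}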
I have been using this pattern for a while, but I only recently came to think that it might not be OK to do this. Basically, I use some variant of this pattern:
public class SampleJavaAsync
{
    public SampleJavaAsync() { }

    private boolean completed;

    public void start()
    {
        new Thread(new Runnable() {
            @Override
            public void run() {
                //... do something on a different thread
                completed = true;
            }
        }).start();
    }

    public void update()
    {
        if (!completed) return;
        //... do something else
    }
}
*The user is responsible for making sure start is only called once. update is called wherever and whenever.
I've always assumed this is threadsafe in Java, because even though nothing is strictly synchronized, I only ever set completed to true. Once it has been observed to be true, it will not reset to false. It is initialized to false in the constructor, which is by definition thread safe (unless you do something stupid in it). So, is it thread safe to use unresettable flags in this way? (And if so, does it even provide any performance benefits?)
Thanks
Java: it's feasible for update() to not see the update to completed even though it has already happened. Unless you mark the field volatile, the JVM is permitted to do all sorts of things in the name of optimization, namely reordering reads and writes as it sees fit. You could feasibly hit a state where the thread running update() NEVER sees that completed has changed, because the unsynchronized write may be deferred, and the read may be hoisted out of the loop and cached.
You would at least run the risk of having inconsistency when it's first set, where, e.g. a call to update() on the same thread could see a different value than the same call from another thread, at the same time.
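The minimal fix is a one-word change:
private volatile boolean completed; // guarantees the write is visible to every reader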
Better explained:
http://jeremymanson.blogspot.com/2008/11/what-volatile-means-in-java.html
Or, if you're really curious about concurrency in Java, buy a copy of JCIP:
http://jcip.net.s3-website-us-east-1.amazonaws.com/
I am cycling millions of Strings through a LinkedBlockingQueue.
The reading thread should end its execution when there are no more items in the source.
I thought about putting a dummy value like "SHUTDOWN" in LinkedBlockingQueue.
The reader does this:
while (!(data = (String) MyLinkedBlockingQueue.take()).equals("SHUTDOWN")) {
    //read and live
}
Is it efficient to execute equals on every string? If not what can I use instead?
You are on the right track. This is the standard idiom for finishing processing of a BlockingQueue; it's called the "poison pill". I usually implement it with a special private static final instance, so you can compare by object identity and don't risk colliding with a real value, e.g.:
private static final String SHUTDOWN = new String("SHUTDOWN"); // use new String() so you don't get an interned value

public void readQueue() throws InterruptedException {
    String data;
    while ((data = (String) MyLinkedBlockingQueue.take()) != SHUTDOWN) {
        //read and live
    }
}

public void shutdownQueue() throws InterruptedException {
    MyLinkedBlockingQueue.put(SHUTDOWN);
}
You can also think of using poll() and ending the loop when it returns null.
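A sketch of that variant (the timeout keeps a momentarily empty queue from ending the loop while the producer is still running; the one-second value is illustrative):
String data;
while ((data = (String) MyLinkedBlockingQueue.poll(1, TimeUnit.SECONDS)) != null) {
    //read and live
}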
This could be implemented so that you don't have to check for the "poison pill" every time. Consider making use of a ThreadPoolExecutor that works on your LinkedBlockingQueue. When you want to shut down processing, call the shutdown() method on the executor object. From the documentation of that method:
Initiates an orderly shutdown in which previously submitted tasks are
executed, but no new tasks will be accepted. Invocation has no
additional effect if already shut down.
See this post if you're interested in shutting down processing immediately while tasks are still pending in the queue: With a Java ExecutorService, how do I complete actively executing tasks but halt the processing of waiting tasks?
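A minimal sketch of that approach (pool sizes, the timeout, and the process method are illustrative):
void processAll(List<String> items) throws InterruptedException {
    ExecutorService executor = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    for (final String item : items) {
        executor.execute(() -> process(item)); // 'process' is a hypothetical handler
    }
    executor.shutdown(); // previously submitted tasks still run; no pill needed
    executor.awaitTermination(1, TimeUnit.HOURS); // wait for the queue to drain
}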
I have a ThreadPoolExecutor that seems to be lying to me when I call getActiveCount(). I haven't done a lot of multithreaded programming however, so perhaps I'm doing something incorrectly.
Here's my TPE
@Override
public void afterPropertiesSet() throws Exception {
    BlockingQueue<Runnable> workQueue;
    int maxQueueLength = threadPoolConfiguration.getMaximumQueueLength();
    if (maxQueueLength == 0) {
        workQueue = new LinkedBlockingQueue<Runnable>();
    } else {
        workQueue = new LinkedBlockingQueue<Runnable>(maxQueueLength);
    }
    pool = new ThreadPoolExecutor(
            threadPoolConfiguration.getCorePoolSize(),
            threadPoolConfiguration.getMaximumPoolSize(),
            threadPoolConfiguration.getKeepAliveTime(),
            TimeUnit.valueOf(threadPoolConfiguration.getTimeUnit()),
            workQueue,
            // Default thread factory creates normal-priority,
            // non-daemon threads.
            Executors.defaultThreadFactory(),
            // Run any rejected task directly in the calling thread.
            // In this way no records will be lost due to rejection;
            // however, no records will be added to the workQueue
            // while the calling thread is processing a Task, so set
            // your queue-size appropriately.
            //
            // This also means MaxThreadCount+1 tasks may run
            // concurrently. If you REALLY want a max of MaxThreadCount
            // threads, don't use this.
            new ThreadPoolExecutor.CallerRunsPolicy());
}
In this class I also have a DAO that I pass into my Runnable (FooWorker), like so:
@Override
public void addTask(FooRecord record) {
    if (pool == null) {
        throw new FooException(ERROR_THREAD_POOL_CONFIGURATION_NOT_SET);
    }
    pool.execute(new FooWorker(context, calculator, dao, record));
}
FooWorker runs record (the only non-singleton) through a state machine via calculator then sends the transitions to the database via dao, like so:
public void run() {
    calculator.calculate(record);
    dao.save(record);
}
Once my main thread is done creating new tasks I try and wait to make sure all threads finished successfully:
while (pool.getActiveCount() > 0) {
    recordHandler.awaitTermination(terminationTimeout,
            terminationTimeoutUnit);
}
What I'm seeing from output logs (which are presumably unreliable due to the threading) is that getActiveCount() is returning zero too early, and the while() loop is exiting while my last threads are still printing output from calculator.
Note I've also tried calling pool.shutdown() then using awaitTermination but then the next time my job runs the pool is still shut down.
My only guess is that inside a thread, when I send data into the dao (since it's a singleton created by Spring in the main thread...), Java is considering the thread inactive since (I assume) it's processing in/waiting on the main thread.
Intuitively, based only on what I'm seeing, that's my guess. But... Is that really what's happening? Is there a way to "do it right" without putting a manual incremented variable at the top of run() and a decremented at the end to track the number of threads?
If the answer is "don't pass in the dao", then wouldn't I have to "new" a DAO for every thread? My process is already a (beautiful, efficient) beast, but that would really suck.
As the JavaDoc of getActiveCount states, it's an approximate value: you should not base any major business logic decisions on this.
If you want to wait for all scheduled tasks to complete, then you should simply use
pool.shutdown();
pool.awaitTermination(terminationTimeout, terminationTimeoutUnit);
If you need to wait for a specific task to finish, you should use submit() instead of execute() and then check the Future object for completion (either using isDone() if you want to do it non-blocking or by simply calling get() which blocks until the task is done).
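For example, using the names from the question (the list is just somewhere to keep the futures):
List<Future<?>> futures = new ArrayList<>();
futures.add(pool.submit(new FooWorker(context, calculator, dao, record)));
// ... submit the rest ...
for (Future<?> f : futures) {
    f.get(); // blocks until that task is done; rethrows a task failure as ExecutionException
}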
The documentation suggests that the method getActiveCount() on ThreadPoolExecutor does not return an exact number:
getActiveCount
public int getActiveCount()
Returns the approximate number of threads that are actively executing tasks.
Returns: the number of threads
Personally, when I am doing multithreaded work such as this, I use a variable that I increment as I add tasks, and decrement as I grab their output.
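A sketch of that manual bookkeeping, reusing the names from the question (an AtomicInteger stands in for the counter; a Phaser would work too):
private final AtomicInteger pending = new AtomicInteger();

public void addTask(FooRecord record) {
    pending.incrementAndGet();
    pool.execute(() -> {
        try {
            new FooWorker(context, calculator, dao, record).run();
        } finally {
            pending.decrementAndGet(); // count down even if the task throws
        }
    });
}

// The waiting side can then sleep/poll until pending.get() == 0.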