I'm working on a project that has a thread pool to which it submits tasks. Each task is a chain, so to speak. When a task executes, it does what it needs to do, then checks the result. Each task contains a map from results (an enum) to lists of additional tasks. These are called within the same thread, and the cycle repeats until there are no more tasks, at which point execution goes back up the chain, adding each result to a collection and returning that to the main thread. Quick-and-dirty example:
public abstract class MyCallable implements Callable<List<MyResponse>> {

    private Map<ResponseEnum, List<MyCallable>> callbacks;

    public List<MyResponse> call() {
        List<MyResponse> resp = new ArrayList<MyResponse>();
        try {
            // Run the process method and collect the result
            MyResponse response = process();
            List<MyCallable> next = callbacks.get(response.getResult());
            if (next != null && !next.isEmpty()) {
                // Run within the same thread, return results
                for (MyCallable m : next) {
                    resp.addAll(m.call());
                }
                return resp;
            } else {
                // No more responses, pass them back up the chain
                resp.add(response);
                return resp;
            }
        // Anything goes wrong, we catch it here and wrap it in a response
        } catch (Exception e) {
            resp.add(new MyExceptionResponse(e));
            return resp;
        }
    }

    // Implemented by all child classes, does the actual work
    public abstract MyResponse process() throws Exception;
}
Bear in mind that this is also a prototype that I have not yet really tested out, so I'm aware that this may not be perfect or necessarily completely feasible.
The concern I have is this: a task is added to the thread pool and begins execution. In the main thread, a Future is created and .get(N, TimeUnit) is called on it to retrieve the result. What if that task times out? We get a TimeoutException. Now, within a try/catch block I could cancel the Future, but is there any way for me to cancel the Future and still extract the results, at least as far as they go? Three tasks may have executed and returned results before the fourth stalled out. The try/catch in MyCallable should return a result and push it back up the chain if there's an exception (i.e., an InterruptedException when .cancel(true) is called), but is it possible for me to get that result?
Of course, if I'm going about this completely wrong in the first place, that would also be good to know. This is my first big foray into multithreading.
EDIT: Okay, with that in mind, a wrapper has been placed around the MyCallable class. The wrapper implements Callable and returns the collection. The collection is passed down the chain of MyCallable objects and the results added, so if the Future.get times out, we can retrieve the collection and get the partial results.
However, this brings up a potential race condition. If the current MyCallable being invoked is waiting for an external service, then the Future.cancel(true) operation will cause an InterruptedException within MyCallable. This is caught and the exception is wrapped in a response object and added to the collection. The thing is, if the main thread cancels the Future, synchronizes on the wrapper or the collection in the wrapper, and then gets the collection, will that create a race condition between the getting of the collection and the try/catch block in MyCallable adding the wrapped exception to the collection? Or will the main thread wait for the catching of the exception and then execute the next line?
At the point when you get your TimeoutException, the task submitted to the ExecutorService is merrily going on its way: it is only your waiting which has received the exception. That presumably means that the result map is still being populated.
What you could do is use a concurrent map and safely extract whatever results are present after the timeout has occurred.
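For example, here is a minimal sketch of that idea, reworked along the lines of the edit above: each step writes into a shared thread-safe collection as soon as it completes, so the main thread can read whatever is present after a timeout. The MyChainedCallable, ChainWrapper, and partialResults names are illustrative, not part of the original code, and a concurrent queue stands in for the concurrent map since the chain accumulates a flat list of responses.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical rework: each link in the chain adds to a shared
// collection as soon as it has a response, instead of building
// its own list and returning it only at the very end.
abstract class MyChainedCallable {
    private Map<ResponseEnum, List<MyChainedCallable>> callbacks;

    public void call(Queue<MyResponse> results) {
        try {
            MyResponse response = process();
            results.add(response); // visible to the main thread immediately
            List<MyChainedCallable> next = callbacks.get(response.getResult());
            if (next != null) {
                for (MyChainedCallable m : next) {
                    m.call(results);
                }
            }
        } catch (Exception e) {
            results.add(new MyExceptionResponse(e));
        }
    }

    public abstract MyResponse process() throws Exception;
}

// The wrapper is what actually gets submitted to the thread pool.
class ChainWrapper implements Callable<List<MyResponse>> {
    private final Queue<MyResponse> results = new ConcurrentLinkedQueue<>();
    private final MyChainedCallable root;

    ChainWrapper(MyChainedCallable root) {
        this.root = root;
    }

    @Override
    public List<MyResponse> call() {
        root.call(results);
        return new ArrayList<>(results);
    }

    // Safe to call from the main thread after Future.get(N, unit)
    // times out and the Future has been cancelled.
    public List<MyResponse> partialResults() {
        return new ArrayList<>(results);
    }
}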
Related
Below is the piece of code that submits a job. Let's say I have 3 threads running. How does the get method wait for and obtain the appropriate thread's results?
Future<T> result = threadPool.submit(new Callable<T>() {
    public T call() throws Exception {
        // do something
    }
});
anyType = result.get();
Or let's say Task A resulted in 1 and Task B resulted in 2. When it comes to the get method, what is the guarantee that it returns the correct values?
Your submitted task (in this case the Callable) is wrapped into the instance of the returned Future. In essence, the Future is directly related to the task it was created for, and not any other task.
Internally, when calling get, the future will attempt to acquire a lock that it shares in common with its wrapped task. Once acquired, it then queries the status of the task in order to determine what to do next:
Throw an exception if the Future was cancelled, or if the underlying task generated an exception
Otherwise, return the result that was generated by the task.
This is broadly how it works; there are several implementations of Future, and they all have different internal logic.
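To see the pairing concretely, here is a small self-contained sketch: each Future returns only the result of the task it wraps, so the values cannot get mixed up between tasks.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FuturePairing {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        Future<Integer> futureA = pool.submit(() -> 1); // Task A
        Future<Integer> futureB = pool.submit(() -> 2); // Task B

        // Each Future is bound to the task it was created for.
        System.out.println(futureA.get()); // always 1
        System.out.println(futureB.get()); // always 2

        pool.shutdown();
    }
}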
You're assuming there is a guarantee that they will receive the correct output. That guarantee comes from writing thread-safe code.
Often, to make an implementation thread-safe, you guard it with some sort of lock or flag. This indicates to other threads that they cannot use it until the lock is released.
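For example, a minimal sketch of that kind of guarding using Java's intrinsic locks (the Counter class here is illustrative):

public class Counter {
    private int value;

    // synchronized acts as the "flag": while one thread holds the
    // lock, any other thread calling these methods must wait.
    public synchronized void increment() {
        value++;
    }

    public synchronized int get() {
        return value;
    }
}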
I'm feeding threads into an ExecutorService.
These threads are manipulating some data, and if there's a conflict, the data object throws an exception, which is caught by the conflicting thread, which in turn aborts and does not complete execution.
When this happens, the aborting thread needs to be put back in the queue and fed back into the executor.
How can I tell if an exception was thrown, from the parent thread?
When you submit() a task to the ExecutorService you get a Future as a result. When the execution has finished you can call get() on that Future. This will return the result if applicable, or it will throw an ExecutionException if the original task threw one. If you want the real exception object you can call getCause().
Also note that you would be putting a task back into the service; the task is run on a thread which has not really terminated (it just caught the exception and is waiting for new work).
Here is an example usage (you can use Runnable if you don't care for the result).
Callable<String> myCallable = ...;
Future<String> future = myExecutor.submit(myCallable);
// Do something else until future.isDone() returns true.
try {
    String result = future.get();
} catch (ExecutionException e) {
    // Handle error, perhaps create a new Callable to submit.
    // e.getCause() gives the original exception thrown by the task.
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
I have a situation where I have 2 blocking queues. Into the first I insert some tasks to execute. When each task completes, it adds a task to the second queue, where it is executed in turn.
So my first queue is easy: I just check to make sure it's not empty and execute, else I interrupt():
public void run() {
    try {
        if (!taskQueue1.isEmpty()) {
            SomeTask task = taskQueue1.poll();
            doTask(task); // assumed to be able to throw InterruptedException
            taskQueue2.add(task);
        } else {
            Thread.currentThread().interrupt();
        }
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    }
}
The second one I do the following, which as you can tell, doesn't work:
public void run() {
    try {
        SomeTask2 task2 = taskQueue2.take();
        doTask(task2);
    } catch (InterruptedException ex) {
    }
    Thread.currentThread().interrupt();
}
How would you solve it so that the second BlockingQueue doesn't block forever on take(), yet finishes only when it knows there are no more items to be added? It would be good if the second thread could perhaps see the first blocking queue, and if that was empty and the second queue was also empty, it would interrupt itself.
I could also use a Poison object, but would prefer something else.
NB: This isn't the exact code, just something I wrote up here.
You make it sound as though the thread processing the first queue knows that there are no more tasks coming as soon as its queue is drained. That sounds suspicious, but I'll take you at your word and propose a solution anyway.
Define an AtomicInteger visible to both threads. Initialize it to positive one.
Define the first thread's operation as follows:
Loop on Queue#poll().
If Queue#poll() returns null, call AtomicInteger#decrementAndGet() on the shared integer.
If AtomicInteger#decrementAndGet() returned zero, interrupt the second thread via Thread#interrupt(). (This handles the case where no items ever arrived.)
In either case, exit the loop.
Otherwise, process the extracted item, call AtomicInteger#incrementAndGet() on the shared integer, add the extracted item to the second thread's queue, and continue the loop.
Define the second thread's operation as follows:
Loop blocking on BlockingQueue#take().
If BlockingQueue#take() throws InterruptedException, catch the exception, call Thread.currentThread().interrupt(), and exit the loop.
Otherwise, process the extracted item.
Call AtomicInteger#decrementAndGet() on the shared integer.
If AtomicInteger#decrementAndGet() returned zero, exit the loop.
Otherwise, continue the loop.
Make sure you understand the idea before trying to write the actual code. The contract is that the second thread continues waiting on more items from its queue until the count of expected tasks reaches zero. At that point, the producing thread (the first one) will no longer push any new items into the second thread's queue, so the second thread knows that it's safe to stop servicing its queue.
The screwy case arises when no tasks ever arrive at the first thread's queue. Since the second thread only decrements and tests the count after it processes an item, if it never gets a chance to process any items, it won't ever consider stopping. We use thread interruption to handle that case, at the cost of another conditional branch in the first thread's loop termination steps. Fortunately, that branch will execute only once.
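Here is a minimal sketch of that contract in code, with a hypothetical Task type standing in for the question's SomeTask and SomeTask2:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class TwoQueueHandoff {

    // Stand-in for the question's task types.
    interface Task {
        void run();
    }

    private final BlockingQueue<Task> queue1 = new LinkedBlockingQueue<>();
    private final BlockingQueue<Task> queue2 = new LinkedBlockingQueue<>();

    // Starts at one: the producer holds a "permit" until it finishes,
    // so the consumer cannot see zero while items may still arrive.
    private final AtomicInteger pending = new AtomicInteger(1);

    void runProducer(Thread consumerThread) {
        Task task;
        while ((task = queue1.poll()) != null) {
            task.run();                    // first-stage processing
            pending.incrementAndGet();     // announce one more item
            queue2.add(task);
        }
        // Queue drained. If this decrement reaches zero, the consumer
        // is (or will be) blocked in take() with nothing left to do,
        // so wake it via interruption.
        if (pending.decrementAndGet() == 0) {
            consumerThread.interrupt();
        }
    }

    void runConsumer() {
        try {
            while (true) {
                Task task = queue2.take();
                task.run();                // second-stage processing
                if (pending.decrementAndGet() == 0) {
                    break;                 // producer done, queue drained
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}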
There are many designs that could work here. I merely described one that introduces only one additional entity (the shared atomic integer), and even then it's fiddly. I think that using a poison pill would be much cleaner, though I do concede that neither Queue#add() nor BlockingQueue#put() accepts null as a valid element (due to Queue#poll()'s return value contract). It would otherwise be easy to use null as the poison pill.
I can't figure out what you are actually trying to do here, but I can say that the interrupt() in your first run() method is either pointless or wrong.
If you are running the run() method in your own Thread object, then that thread is about to exit anyway, so there's no point interrupting it.
If you are running the run() method in an executor with a thread pool, then you most likely don't want to kill the thread or shut down the executor at all ... at that point. And if you do want to shutdown the executor, then you should call one of its shutdown methods.
For instance, here's a version that does what you seem to be doing, without all of the interrupt stuff and without thread creation/destruction churn.
public class TaskExecutor {

    private ExecutorService executor = Executors.newFixedThreadPool(...);

    public void submitTask1(final SomeTask task) {
        executor.submit(new Runnable() {
            public void run() {
                doTask(task);
                submitTask2(task);
            }
        });
    }

    public void submitTask2(final SomeTask task) {
        executor.submit(new Runnable() {
            public void run() {
                doTask2(task);
            }
        });
    }

    public void shutdown() {
        executor.shutdown();
    }
}
If you want separate queuing for the tasks, simply create and use two different executors.
I am using several threads to do some heavy (and error-prone) processing on a large data set. I require all threads to finish execution, regardless of whether they throw an exception or terminate normally (no value is returned), before the program can continue. I am using a CountDownLatch to achieve this, and an ExecutorService to actually run the jobs. I want the worker threads (let's call them JobManagers for the sake of argument) to notify the latch even if they throw an exception. A JobManager can take anywhere between a second and an hour to complete, and may fail at any time. The idea is to invoke the "finalizer" method of JobManager if an exception is thrown.
Now, the ExecutorService likes to catch exceptions, or to conceal the true origin of the ones it does not. I have thought of a few ways around this, none of which is satisfactory:
1. Use ExecutorService#execute(Runnable r) rather than submit(Runnable r). I can do that since I do not care about the return value of the JobManager. I have provided a custom ThreadFactory, which attaches an UncaughtExceptionHandler to each newly created thread. The problem with this approach is that when UncaughtExceptionHandler#uncaughtException(Thread t, Throwable e) is invoked, t's Runnable is of type ThreadPoolExecutor$Worker, and not of type JobManager, which prevents me from invoking the "finalizer" method.
2. Use a custom ExecutorService and override the afterExecute(Runnable r, Throwable t) method. This suffers from the same problem as 1.
3. Wrap the whole JobManager#doWork() in a catch statement and use the return value to indicate whether an exception was thrown. I can then submit the jobs and use FutureTask#get() to decide if an exception was thrown. I do not like this solution because I feel return codes are the wrong tool when you have an elaborate exception mechanism. Moreover, get() will wait (unless interrupted), which means I cannot handle errors in other threads immediately.
4. Get rid of the CountDownLatch. Store all Futures in a list and repeatedly poll it until I am satisfied with the states. This might work, but feels like a dirty hack.
Any suggestions are greatly appreciated.
As far as I understand, you can use a simple try-finally block:
public class JobManager {
    public void doWork() {
        try {
            ...
        } finally {
            countDownLatch.countDown();
        }
    }
}
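On the waiting side it could look something like this (a sketch, assuming each JobManager is handed the latch at construction; the pool size and job count are illustrative):

int jobCount = ...; // however many JobManagers you have
CountDownLatch latch = new CountDownLatch(jobCount);
ExecutorService executor = Executors.newFixedThreadPool(4);

for (int i = 0; i < jobCount; i++) {
    final JobManager job = new JobManager(latch /*, ... */);
    executor.execute(new Runnable() {
        public void run() {
            job.doWork(); // counts down in its finally block, pass or fail
        }
    });
}

latch.await(); // returns once every job has counted down
executor.shutdown();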
I have a ThreadPoolExecutor that seems to be lying to me when I call getActiveCount(). I haven't done a lot of multithreaded programming however, so perhaps I'm doing something incorrectly.
Here's my TPE
@Override
public void afterPropertiesSet() throws Exception {
    BlockingQueue<Runnable> workQueue;
    int maxQueueLength = threadPoolConfiguration.getMaximumQueueLength();
    if (maxQueueLength == 0) {
        workQueue = new LinkedBlockingQueue<Runnable>();
    } else {
        workQueue = new LinkedBlockingQueue<Runnable>(maxQueueLength);
    }
    pool = new ThreadPoolExecutor(
            threadPoolConfiguration.getCorePoolSize(),
            threadPoolConfiguration.getMaximumPoolSize(),
            threadPoolConfiguration.getKeepAliveTime(),
            TimeUnit.valueOf(threadPoolConfiguration.getTimeUnit()),
            workQueue,
            // Default thread factory creates normal-priority,
            // non-daemon threads.
            Executors.defaultThreadFactory(),
            // Run any rejected task directly in the calling thread.
            // In this way no records will be lost due to rejection;
            // however, no records will be added to the workQueue
            // while the calling thread is processing a Task, so set
            // your queue size appropriately.
            //
            // This also means MaxThreadCount+1 tasks may run
            // concurrently. If you REALLY want a max of MaxThreadCount
            // threads, don't use this.
            new ThreadPoolExecutor.CallerRunsPolicy());
}
In this class I also have a DAO that I pass into my Runnable (FooWorker), like so:
@Override
public void addTask(FooRecord record) {
    if (pool == null) {
        throw new FooException(ERROR_THREAD_POOL_CONFIGURATION_NOT_SET);
    }
    pool.execute(new FooWorker(context, calculator, dao, record));
}
FooWorker runs record (the only non-singleton) through a state machine via calculator then sends the transitions to the database via dao, like so:
public void run() {
    calculator.calculate(record);
    dao.save(record);
}
Once my main thread is done creating new tasks I try and wait to make sure all threads finished successfully:
while (pool.getActiveCount() > 0) {
    recordHandler.awaitTermination(terminationTimeout,
            terminationTimeoutUnit);
}
What I'm seeing from output logs (which are presumably unreliable due to the threading) is that getActiveCount() is returning zero too early, and the while() loop is exiting while my last threads are still printing output from calculator.
Note I've also tried calling pool.shutdown() then using awaitTermination but then the next time my job runs the pool is still shut down.
My only guess is that inside a thread, when I send data into the dao (since it's a singleton created by Spring in the main thread...), java is considering the thread inactive since (I assume) it's processing in/waiting on the main thread.
Intuitively, based only on what I'm seeing, that's my guess. But... Is that really what's happening? Is there a way to "do it right" without putting a manual incremented variable at the top of run() and a decremented at the end to track the number of threads?
If the answer is "don't pass in the dao", then wouldn't I have to "new" a DAO for every thread? My process is already a (beautiful, efficient) beast, but that would really suck.
As the JavaDoc of getActiveCount states, it's an approximate value: you should not base any major business logic decisions on this.
If you want to wait for all scheduled tasks to complete, then you should simply use
pool.shutdown();
pool.awaitTermination(terminationTimeout, terminationTimeoutUnit);
If you need to wait for a specific task to finish, you should use submit() instead of execute() and then check the Future object for completion (either using isDone() if you want to do it non-blocking or by simply calling get() which blocks until the task is done).
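For instance, a sketch of the Future-based variant for the code above (blocking form; the names are the question's own):

Future<?> future = pool.submit(new FooWorker(context, calculator, dao, record));

try {
    future.get(); // blocks until this specific task has finished
} catch (ExecutionException e) {
    Throwable cause = e.getCause(); // whatever the worker's run() threw
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}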
The documentation suggests that the method getActiveCount() on ThreadPoolExecutor is not an exact number:
getActiveCount
public int getActiveCount()
Returns the approximate number of threads that are actively executing tasks.
Returns: the number of threads
Personally, when I am doing multithreaded work such as this, I use a counter that I increment as I add tasks and decrement as I collect their output.
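A sketch of that manual counting using java.util.concurrent.atomic.AtomicInteger, so the increments and decrements are thread-safe (the polling loop is illustrative; a CountDownLatch would avoid it):

final AtomicInteger inFlight = new AtomicInteger();

// When submitting: count up before handing the task to the pool.
inFlight.incrementAndGet();
pool.execute(new Runnable() {
    public void run() {
        try {
            new FooWorker(context, calculator, dao, record).run();
        } finally {
            inFlight.decrementAndGet(); // count down even on failure
        }
    }
});

// Later, in the main thread:
while (inFlight.get() > 0) {
    try {
        Thread.sleep(50);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
    }
}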