Java happens-before consistent thread view on two ConcurrentMaps

I have a Java class to handle a multithreaded subscription service. By implementing the Subscribable interface, tasks can be submitted to the service and periodically executed. A sketch of the code is shown below:
import java.util.concurrent.*;
public class Subscriptions {
private ConcurrentMap<Subscribable, Future<?>> futures = new ConcurrentHashMap<Subscribable, Future<?>>();
private ConcurrentMap<Subscribable, Integer> cacheFutures = new ConcurrentHashMap<Subscribable, Integer>();
private ScheduledExecutorService threads;
public Subscriptions() {
threads = Executors.newScheduledThreadPool(16);
}
public void subscribe(Subscribable subscription) {
Runnable runnable = getThread(subscription);
Future<?> future = threads.scheduleAtFixedRate(runnable, subscription.getInitialDelay(), subscription.getPeriod(), TimeUnit.SECONDS);
futures.put(subscription, future);
}
/*
* Only called from controller thread
*/
public void unsubscribe(Subscribable subscription) {
Future<?> future = futures.remove(subscription); //1. Might be removed by worker thread
if (future != null)
future.cancel(false);
else {
//3. Worker-thread view := cacheFutures.put() -> futures.remove()
//4. Controller-thread has seen futures.remove(), but has it seen cacheFutures.put()?
}
}
/*
* Only called from worker threads
*/
private void delay(Runnable runnable, Subscribable subscription, long delay) {
cacheFutures.put(subscription, 0); //2. Which is why it is cached first
Future<?> currentFuture = futures.remove(subscription);
if (currentFuture != null) {
currentFuture.cancel(false);
Future<?> future = threads.scheduleAtFixedRate(runnable, delay, subscription.getPeriod(), TimeUnit.SECONDS);
futures.put(subscription, future);
}
}
private Runnable getThread(Subscribable subscription) {
return new Runnable() {
public void run() {
//Do work...
boolean someCondition = true;
long someDelay = 100;
if (someCondition) {
delay(this, subscription, someDelay);
}
}
};
}
public interface Subscribable {
long getInitialDelay();
long getPeriod();
}
}
So the class makes it possible to:
Subscribe to new tasks
Unsubscribe from existing tasks
Delay a periodically executed task
Subscriptions are added/removed by an external controlling thread, but delays are requested only by the internal worker threads. This could happen if, for instance, a worker thread found no update since the last execution, or if the task only needs to execute between 00:00 and 23:00.
My problem is that a worker thread may call delay() and remove its future from the ConcurrentMap, and the controller thread may concurrently call unsubscribe(). Then if the controller thread checks the ConcurrentMap before the worker thread has put in a new future, the unsubscribe() call will be lost.
There are some possible solutions (perhaps not an exhaustive list):
Use a lock between the delay() and unsubscribe() methods
Same as above, but one lock per subscription
(preferred?) Use no locks, but "cache" removed futures in the delay() method
As for the third solution, since the worker-thread has established the happens-before relationship cacheFutures.put() -> futures.remove(), and the atomicity of ConcurrentMap makes the controller thread see futures.remove(), does it also see the same happens-before relationship as the worker thread? I.e. cacheFutures.put() -> futures.remove()? Or does the atomicity only hold for the futures map with updates to other variables being propagated later?
Any other comments are also welcome, especially concerning use of the volatile keyword. Should the cache-map be declared volatile? Thanks!

One lock per subscription would require you to maintain yet another map, and possibly thereby to introduce additional concurrency issues. I think that would be better avoided. The same applies even more so to caching removed subscriptions, plus that affords the added risk of unwanted resource retention (and note that it's not the Futures themselves that you would need to cache, but rather the Subscribables with which they are associated).
Either way, you will need some kind of synchronization / locking. For example, in your option (3) you need to prevent an unsubscribe() for a given subscription from happening between delay() caching that subscription and removing its Future. The only way you could avoid that without some form of locking would be if you could use just one Future per subscription, kept continuously in place from the time it is enrolled by subscribe() until it is removed by unsubscribe(). Doing so is not consistent with the ability to delay an already-scheduled subscription.
As for the third solution, since the worker-thread has established the happens-before relationship cacheFutures.put() -> futures.remove(), and the atomicity of ConcurrentMap makes the controller thread see futures.remove(), does it also see the same happens-before relationship as the worker thread?
Happens-before is a relationship between actions in an execution of a program. It is not specific to any one thread's view of the execution.
Or does the atomicity only hold for the futures map with updates to other variables being propagated later?
The controller thread will always see the cacheFutures.put() performed by an invocation of delay() occurring before the futures.remove() performed by that same invocation. I don't think that helps you, though.
Should the cache-map be declared volatile?
No. That would avail nothing, because although the contents of that map change, the map itself is always the same object, and the reference to it does not change.
You could consider having subscribe(), delay(), and unsubscribe() each synchronize on the Subscribable presented. That's not what I understood you to mean about having a lock per subscription, but it is similar. It would avoid the need for a separate data structure to maintain such locks. I guess you could also build locking methods into the Subscribable interface if you want to avoid explicit synchronization.
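For illustration, here is a minimal sketch of that idea, reusing the futures and threads fields from the question (it assumes each subscription is represented by a single Subscribable instance, so the monitor is unique per subscription):
public void unsubscribe(Subscribable subscription) {
    synchronized (subscription) { // same monitor as delay()
        Future<?> future = futures.remove(subscription);
        if (future != null)
            future.cancel(false);
    }
}
private void delay(Runnable runnable, Subscribable subscription, long delay) {
    synchronized (subscription) { // remove + re-put is now atomic with respect to unsubscribe()
        Future<?> currentFuture = futures.remove(subscription);
        if (currentFuture != null) {
            currentFuture.cancel(false);
            Future<?> future = threads.scheduleAtFixedRate(
                    runnable, delay, subscription.getPeriod(), TimeUnit.SECONDS);
            futures.put(subscription, future);
        }
    }
}
With both methods holding the subscription's monitor, an unsubscribe() can no longer interleave between the remove and the re-schedule in delay(), and the cacheFutures map becomes unnecessary.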

You have a ConcurrentMap but you aren't using it. Consider something along these lines:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
final class SO33555545
{
public static void main(String... argv)
throws InterruptedException
{
ScheduledExecutorService workers = Executors.newScheduledThreadPool(16);
Subscriptions sub = new Subscriptions(workers);
sub.subscribe(() -> System.out.println("Message received: A"));
sub.subscribe(() -> System.out.println("Message received: B"));
Thread.sleep(TimeUnit.SECONDS.toMillis(30));
workers.shutdown();
}
}
final class Subscriptions
{
private final ConcurrentMap<Subscribable, Task> tasks = new ConcurrentHashMap<>();
private final ScheduledExecutorService workers;
public Subscriptions(ScheduledExecutorService workers)
{
this.workers = workers;
}
void subscribe(Subscribable sub)
{
Task task = new Task(sub);
Task current = tasks.putIfAbsent(sub, task);
if (current != null)
throw new IllegalStateException("Already subscribed");
task.activate();
}
private Future<?> schedule(Subscribable sub)
{
Runnable task = () -> {
sub.invoke();
if (Math.random() < 0.25) {
System.out.println("Delaying...");
delay(sub, 5);
}
};
return workers.scheduleAtFixedRate(task, sub.getPeriod(), sub.getPeriod(), TimeUnit.SECONDS);
}
void unsubscribe(Subscribable sub)
{
Task task = tasks.remove(sub);
if (task != null)
task.cancel();
}
private void delay(Subscribable sub, long delay)
{
Task task = new Task(sub);
Task obsolete = tasks.replace(sub, task);
if (obsolete != null) {
obsolete.cancel();
task.activate();
}
}
private final class Task
{
private final FutureTask<Future<?>> future;
Task(Subscribable sub)
{
this.future = new FutureTask<>(() -> schedule(sub));
}
void activate()
{
future.run();
}
void cancel()
{
boolean interrupted = false;
while (true) {
try {
future.get().cancel(false);
break;
}
catch (ExecutionException ignore) {
ignore.printStackTrace(); /* Cancellation is unnecessary. */
break;
}
catch (InterruptedException ex) {
interrupted = true; /* Keep waiting... */
}
}
if (interrupted)
Thread.currentThread().interrupt(); /* Reset interrupt state. */
}
}
}
@FunctionalInterface
interface Subscribable
{
default long getPeriod()
{
return 4;
}
void invoke();
}

Related

Thread-safe FIFO queue with unique items and thread pool

I have to manage scheduled file replications in a system. The file replications are scheduled by users and I need to restrict the amount of system resources used during replication. The amount of time that each replication may take is not defined (i.e. a replication may be scheduled to run every 15 minutes and the previous run may still be running when the next run is due) and a replication should not be queued if it's already queued or running.
I have a scheduler that periodically checks for due file replications and, for each one, either (1) adds it to a blocking queue if it is neither queued nor running, or (2) drops it otherwise.
private final Object scheduledReplicationsLock = new Object();
private final BlockingQueue<Replication> replicationQueue = new LinkedBlockingQueue<>();
private final Set<Long> queuedReplicationIds = new HashSet<>();
private final Set<Long> runningReplicationIds = new HashSet<>();
public boolean add(Replication replication) {
    synchronized (scheduledReplicationsLock) {
        // If the replication job is either still executing or is already queued, do not add it.
        if (queuedReplicationIds.contains(replication.id) || runningReplicationIds.contains(replication.id)) {
            return false;
        }
        replicationQueue.add(replication);
        queuedReplicationIds.add(replication.id);
        return true;
    }
}
I also have a pool of threads that waits until there is a replication in the queue and executes it. Below is the main method of each thread in the thread pool:
public void run() {
    while (true) {
        Replication replication;
        synchronized (scheduledReplicationsLock) {
            try {
                // This blocks while holding the lock, which causes the deadlock described below.
                replication = replicationQueue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // Move the ID value out of the queued set and into the active set
            Long replicationId = replication.getId();
            queuedReplicationIds.remove(replicationId);
            runningReplicationIds.add(replicationId);
        }
        executeReplication(replication);
    }
}
This code gets into a deadlock because the first thread in the thread pool grabs scheduledReplicationsLock and prevents the scheduler from adding replications to the queue. Moving replicationQueue.take() out of the synchronized block would eliminate the deadlock, but then it's possible that an element is removed from the queue without the hash sets being atomically updated with it, which could cause a replication to be incorrectly dropped.
Should I use BlockingQueue.poll() and release the lock + sleep if the queue is empty instead of using BlockingQueue.take() ?
Fixes to the current solution or other solutions that meet the requirements are welcome.
wait / notify
Keeping your same control flow, instead of blocking on the BlockingQueue instance while holding the mutex lock, you can wait for notifications on the queue's monitor, forcing the worker thread to release the lock and return to the waiting pool.
Below is a reduced sample of your producer:
private final Queue<Replication> replicationQueue = new LinkedList<>();
private final Set<Long> runningReplicationIds = new HashSet<>();
public boolean add(Replication replication) {
    synchronized (replicationQueue) {
        // If the replication job is either still executing or is already queued, do not add it.
        if (replicationQueue.contains(replication) || runningReplicationIds.contains(replication.id)) {
            return false;
        } else {
            replicationQueue.add(replication);
            replicationQueue.notifyAll();
            return true;
        }
    }
}
The worker Runnable would then be updated as follows:
public void run() {
    while (true) {
        Replication replication;
        synchronized (replicationQueue) {
            // wait() must be called on the monitor we synchronize on (the queue),
            // and in a loop to guard against spurious wake-ups
            while (replicationQueue.isEmpty()) {
                try {
                    replicationQueue.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            replication = replicationQueue.poll();
            runningReplicationIds.add(replication.getId());
        }
        executeReplication(replication); // run outside the synchronized block so other workers can proceed
    }
}
BlockingQueue
Generally you are better off using the BlockingQueue to coordinate your producer and replicating worker pool.
The BlockingQueue is, as the name implies, blocking by nature and will cause the calling thread to block only if items cannot be pulled / pushed from / to the queue.
Note, though, that you will have to rework your running/enqueued state management, since you will only be synchronizing on the BlockingQueue itself and dropping the other constraints. Whether that is acceptable depends on your context.
This way, you drop all other mutexes and rely on the BlockingQueue as your synchronization state:
private final BlockingQueue<Replication> replicationQueue = new LinkedBlockingQueue<>();
public boolean add(Replication replication) throws InterruptedException {
    // Not sure if this is the proper invariant to check, as at some point the replication
    // would be neither queued nor running while still having been processed.
    if (replicationQueue.contains(replication)) {
        return false;
    }
    // Use `put` instead of `add`, as `put` blocks while waiting for free space.
    replicationQueue.put(replication);
    return true;
}
The workers will then take indefinitely from the BlockingQueue:
public void run() {
    while (true) {
        try {
            Replication replication = replicationQueue.take();
            executeReplication(replication);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
    }
}
You do not need any additional synchronization block if you are using a BlockingQueue.
Quote from docs (https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html)
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control.
Just use something like this:
public void run() {
    try {
        while (true) {
            Replication replication = replicationQueue.take(); // waits for the next element
            Long replicationId = replication.getId();
            queuedReplicationIds.remove(replicationId);
            runningReplicationIds.add(replicationId);
            executeReplication(replication);
        }
    } catch (InterruptedException ex) {
        // interrupted while waiting for the next element
        Thread.currentThread().interrupt();
    }
}
Look at the javadoc: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/LinkedBlockingQueue.html#take()
Or you can use BlockingQueue.poll() with a timeout.
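A minimal sketch of the poll-with-timeout variant (assuming the same replicationQueue and executeReplication as in the question):
public void run() {
    while (!Thread.currentThread().isInterrupted()) {
        try {
            // Wait up to one second for the next element; returns null on timeout.
            Replication replication = replicationQueue.poll(1, TimeUnit.SECONDS);
            if (replication != null) {
                executeReplication(replication);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag and exit
        }
    }
}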
UPD: After the discussion, I extended LinkedBlockingQueue with two ConcurrentHashSets and added an afterTake() method to remove processed Replications. You do not need any additional synchronization outside the queue. Just put a replication in one thread, take it in another, and call afterTake() when the replication has finished. You need to override the other methods if you want to use them.
package ru.everytag;
import io.vertx.core.impl.ConcurrentHashSet;
import java.util.concurrent.LinkedBlockingQueue;
public class TwoPhaseBlockingQueue<E> extends LinkedBlockingQueue<E> {
private ConcurrentHashSet<E> items = new ConcurrentHashSet<>();
private ConcurrentHashSet<E> taken = new ConcurrentHashSet<>();
@Override
public void put(E e) throws InterruptedException {
if (!items.contains(e)) {
items.add(e);
super.put(e);
}
}
@Override
public E take() throws InterruptedException {
    E item = super.take(); // must delegate to super; calling take() here would recurse forever
    taken.add(item);
    items.remove(item);
    return item;
}
public void afterTake(E e) {
if (taken.contains(e)) {
taken.remove(e);
} else if (items.contains(e)) {
throw new IllegalArgumentException("Element still in the queue");
}
}
}

How to release a semaphore and let any threads continue?

I want to create a semaphore that prevents a certain method from being executed more than once at a time.
If any other thread requests access, it should wait until the semaphore is released:
private Map<String, Semaphore> map;
public void test(String hash) throws InterruptedException {
    // hash identifies the long running work; runs with the same hash must not overlap
    if (map.containsKey(hash)) {
        map.get(hash).acquire(); // wait for release of the lock
        callLongRunningMethod();
    } else {
        Semaphore s = new Semaphore(1);
        map.put(hash, s);
        callLongRunningMethod();
        s.release(); // any number of registered threads should continue
        map.remove(hash);
    }
}
Question: how can I lock the semaphore with just one thread, but release it so that any number of threads can continue as soon as released?
Some clarifications:
Imagine the long running method is a transactional method: it looks into the database, and if no entry is found, a heavy XML request is sent and persisted to the DB. Further async processes might also be triggered, as this is supposed to be the "initial fetch" of the data. The method then returns the object from the DB. If the DB entry had existed, it would directly return the entity.
Now if multiple threads access the long running method at the same time, all of them would fetch the heavy XML (traffic, performance), and all of them would try to persist the same object into the DB (because the long running method is transactional), causing e.g. non-unique-constraint exceptions, plus all of them triggering the optional async threads.
When all but one thread are blocked, only the first is responsible for persisting the object. Then, when it has finished, all other threads will detect that the entry already exists in the DB and just serve that object.
As far as I understand, you don't need to use Semaphore here. Instead, you should use ReentrantReadWriteLock. Additionally, the test method is not thread safe.
The sample below is the implementation of your logic using RWL
private final ConcurrentMap<String, ReadWriteLock> map = new ConcurrentHashMap<>();
void test(String hash) {
ReadWriteLock rwl = new ReentrantReadWriteLock(false);
ReadWriteLock lock = map.putIfAbsent(hash, rwl);
if (lock == null) {
lock = rwl;
}
if (lock.writeLock().tryLock()) {
try {
compute();
map.remove(hash);
} finally {
lock.writeLock().unlock();
}
} else {
lock.readLock().lock();
try {
compute();
} finally {
lock.readLock().unlock();
}
}
}
In this code, the first successful thread acquires the write lock while the other threads wait for its release. After the write lock is released, all waiting threads proceed concurrently.
As far as I understand your need, you want to ensure that the task is executed by one single thread the first time, and then to allow several threads to execute it. If so, you should rely on a CountDownLatch. Here is how it could be implemented:
private final ConcurrentMap<String, CountDownLatch> map = new ConcurrentHashMap<>();
public void test(String hash) {
final CountDownLatch latch = new CountDownLatch(1);
final CountDownLatch previous = map.putIfAbsent(hash, latch);
if (previous == null) {
try {
callLongRunningMethod();
} finally {
map.remove(hash, latch);
latch.countDown();
}
} else {
try {
previous.await();
callLongRunningMethod();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
}
I think you could do that by using a very high permit number (higher than the number of threads you will ever have, e.g. 2000000).
Then, in the function that should run exclusively, you acquire the complete number of permits (acquire(2000000)), while in the other threads you acquire only a single permit.
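A minimal sketch of that approach, assuming the permit count is comfortably above any realistic number of threads (callLongRunningMethod is the method from the question):
private static final int PERMITS = 2_000_000;
private final Semaphore semaphore = new Semaphore(PERMITS);
public void runExclusive() throws InterruptedException {
    semaphore.acquire(PERMITS); // blocks until every other permit has been returned
    try {
        callLongRunningMethod();
    } finally {
        semaphore.release(PERMITS); // all waiting threads may now proceed
    }
}
public void runShared() throws InterruptedException {
    semaphore.acquire(); // one permit per waiting thread
    try {
        callLongRunningMethod();
    } finally {
        semaphore.release();
    }
}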
I think that the easiest way to do this would be using an ExecutorService and Future:
class ContainingClass {
private final ConcurrentHashMap<String, Future<?>> pending =
new ConcurrentHashMap<>();
private final ExecutorService executor;
ContainingClass(ExecutorService executor) {
this.executor = executor;
}
void test(String hash) {
Future<?> future = pending.computeIfAbsent(
    hash,
    h -> executor.submit(() -> longRunningMethod())); // computeIfAbsent takes a Function, not a Supplier
// Exception handling omitted for clarity.
try {
future.get(); // Block until LRM has finished.
} finally {
// Always remove: in case of exception, this allows
// the value to be computed again.
pending.values().remove(future);
}
}
}
Removing the future from the values is thread safe because computeIfAbsent and remove are atomic: either the computeIfAbsent is run before the remove, in which case the existing future is returned, and is immediately complete; or it is run after, and a new future is added, resulting in a new call to longRunningMethod.
Note that it removes the future from pending.values(), not from the pending directly: consider the following example:
Thread 1 and Thread 2 are run concurrently
Thread 1 completes, and removes the value.
Thread 3 is run, adding a new future to the map
Thread 2 completes, and tries to remove the value.
If the future were removed from the map by key, Thread 2 would remove Thread 3's future, which is a different instance from Thread 2's future.
This simplifies the longRunningMethod too, since it is no longer required to do the "check if I need to do anything" for the blocked threads: that the Future.get() has completed successfully in the blocking thread is sufficient to indicate that no additional work is needed.
I ended up with the following, using CountDownLatch:
private final ConcurrentMap<String, CountDownLatch> map = new ConcurrentHashMap<>();
public void run(String hash) throws InterruptedException {
    boolean active = false;
    CountDownLatch count = null;
    try {
        // note: containsKey/get is check-then-act; a concurrent remove between
        // the two calls would make get() return null
        if (map.containsKey(hash)) {
            count = map.get(hash);
            count.await(60, TimeUnit.SECONDS); // wait for release or timeout
        } else {
            count = new CountDownLatch(1);
            map.put(hash, count); // block any threads with same hash
            active = true;
        }
        runLongRunningTask();
    } finally {
        if (active) {
            count.countDown(); // release
            map.remove(hash, count);
        }
    }
}

How to wait for completion of multiple tasks in Java?

What is the proper way to implement concurrency in Java applications? I know about Threads and such, of course; I have been programming in Java for 10 years now, but haven't had too much experience with concurrency.
For example, I have to asynchronously load a few resources, and only after all have been loaded can I proceed and do more work. Needless to say, there is no guarantee of the order in which they will finish. How do I do this?
In JavaScript, I like using the jQuery.deferred infrastructure, to say
$.when(deferred1,deferred2,deferred3...)
.done(
function(){//here everything is done
...
});
But what do I do in Java?
You can achieve it in multiple ways.
1. ExecutorService invokeAll() API (see the sketch after this list)
Executes the given tasks, returning a list of Futures holding their status and results when all complete.
2. CountDownLatch
A synchronization aid that allows one or more threads to wait until a set of operations being performed in other threads completes.
A CountDownLatch is initialized with a given count. The await methods block until the current count reaches zero due to invocations of the countDown() method, after which all waiting threads are released and any subsequent invocations of await return immediately. This is a one-shot phenomenon -- the count cannot be reset. If you need a version that resets the count, consider using a CyclicBarrier.
3. ForkJoinPool or newWorkStealingPool() in Executors is another way
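A minimal sketch of option 1 (the task bodies are placeholders):
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.*;
class InvokeAllDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Callable<String>> loaders = Arrays.asList(
                () -> "resource A",
                () -> "resource B",
                () -> "resource C");
        // invokeAll blocks until every task has completed (or failed).
        List<Future<String>> results = pool.invokeAll(loaders);
        for (Future<String> f : results) {
            System.out.println(f.get()); // all futures are done, so get() does not block
        }
        pool.shutdown();
    }
}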
Have a look at related SE questions:
How to wait for a thread that spawns it's own thread?
Executors: How to synchronously wait until all tasks have finished if tasks are created recursively?
I would use parallel stream.
Stream.of(runnable1, runnable2, runnable3).parallel().forEach(r -> r.run());
// do something after all these are done.
If you need this to be asynchronous, then you might use a pool or Thread.
I have to asynchronously load a few resources,
You could collect these resources like this.
List<String> urls = ....
Map<String, String> map = urls.parallelStream()
.collect(Collectors.toMap(u -> u, u -> download(u)));
This will give you a mapping of all the resources once they have been downloaded concurrently. The concurrency will be the number of CPUs you have by default.
If I'm not using parallel streams or Spring MVC's TaskExecutor, I usually use a CountDownLatch. Instantiate it with the number of tasks and count it down once from each thread that completes its task. CountDownLatch.await() waits until the latch is at 0. Really useful.
Read more here: JavaDocs
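A minimal sketch of this pattern (the loading work is a placeholder):
import java.util.concurrent.CountDownLatch;
class LatchDemo {
    public static void main(String[] args) throws InterruptedException {
        int tasks = 3;
        CountDownLatch latch = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            new Thread(() -> {
                System.out.println("loaded resource " + id);
                latch.countDown(); // reduce the count by one
            }).start();
        }
        latch.await(); // blocks until the count reaches 0
        System.out.println("all resources loaded");
    }
}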
Personally, I would do something like this if I am using Java 8 or later.
// Retrieving instagram followers
CompletableFuture<Integer> instagramFollowers = CompletableFuture.supplyAsync(() -> {
// getInstaFollowers(userId);
return 0; // default value
});
// Retrieving twitter followers
CompletableFuture<Integer> twitterFollowers = CompletableFuture.supplyAsync(() -> {
// getTwFollowers(userId);
return 0; // default value
});
System.out.println("Calculating Total Followers...");
CompletableFuture<Integer> totalFollowers = instagramFollowers
.thenCombine(twitterFollowers, (instaFollowers, twFollowers) -> {
return instaFollowers + twFollowers; // can be replaced with method reference
});
System.out.println("Total followers: " + totalFollowers.get()); // blocks until both the above tasks are complete
I used supplyAsync() as I am returning some value (no. of followers in this case) from the tasks otherwise I could have used runAsync(). Both of these run the task in a separate thread.
Finally, I used thenCombine() to join both the CompletableFuture. You could also use thenCompose() to join two CompletableFuture if one depends on the other. But in this case, as both the tasks can be executed in parallel, I used thenCombine().
The methods getInstaFollowers(userId) and getTwFollowers(userId) are simple HTTP calls or something.
You can use a thread pool and Executors to do this:
https://docs.oracle.com/javase/tutorial/essential/concurrency/pools.html
This is an example where I use threads. It's a static ExecutorService with a fixed size of 50 threads.
public class ThreadPoolExecutor { // note: this name shadows java.util.concurrent.ThreadPoolExecutor
    private static final ExecutorService executorService = Executors.newFixedThreadPool(50,
            new ThreadFactoryBuilder().setNameFormat("thread-%d").build()); // ThreadFactoryBuilder is from Guava
private static ThreadPoolExecutor instance = new ThreadPoolExecutor();
public static ThreadPoolExecutor getInstance() {
return instance;
}
public <T> Future<? extends T> queueJob(Callable<? extends T> task) {
return executorService.submit(task);
}
public void shutdown() {
executorService.shutdown();
}
}
The business logic for the executor is used like this (you can use Callable or Runnable; Callable can return something, Runnable cannot):
public class MultipleExecutor implements Callable<ReturnType> {//your code}
And the invocation of the executor:
ThreadPoolExecutor threadPoolExecutor = ThreadPoolExecutor.getInstance();
List<Future<? extends ReturnType>> results = new LinkedList<>();
for (Type type : typeList) {
    Future<? extends ReturnType> future = threadPoolExecutor.queueJob(
            new MultipleExecutor(/* needed parameters */));
    results.add(future);
}
}
for (Future<? extends ReturnType> result : results) {
    try {
        ReturnType value = result.get(); // blocks until this thread's result is available
        if (value != null) {
            // use the value
        }
    } catch (InterruptedException | ExecutionException e) {
        logger.error(e, e);
    }
}
The same behaviour as with $.Deferred in jQuery can be achieved in Java 8 with a class called CompletableFuture. This class provides an API for working with promises. In order to create async code you can use one of its static creation methods like #runAsync or #supplyAsync, then apply some computation to the results with #thenApply.
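A minimal sketch of that analogy (the suppliers stand in for real async work):
import java.util.concurrent.CompletableFuture;
class DeferredDemo {
    public static void main(String[] args) {
        CompletableFuture<String> d1 = CompletableFuture.supplyAsync(() -> "one");
        CompletableFuture<String> d2 = CompletableFuture.supplyAsync(() -> "two");
        // allOf is the analogue of $.when(...): it completes when all inputs complete.
        CompletableFuture.allOf(d1, d2)
                .thenApply(v -> d1.join() + " " + d2.join()) // both are done here, join() does not block
                .thenAccept(System.out::println)
                .join(); // keep the JVM alive until the pipeline has run
    }
}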
I usually opt for an async notify-start, notify-progress, notify-end approach:
class Task extends Thread {
private ThreadLauncher parent;
public Task(ThreadLauncher parent) {
super();
this.parent = parent;
}
public void run() {
doStuff();
parent.notifyEnd(this);
}
public /*abstract*/ void doStuff() {
// ...
}
}
class ThreadLauncher {
public void stuff() {
for (int i=0; i<10; i++)
new Task(this).start();
}
public void notifyEnd(Task who) {
// ...
}
}

use FutureTask for concurrency

I have a service like:
class DemoService {
Result process(Input in) {
filter1(in);
if (filter2(in)) return...
filter3(in);
filter4(in);
filter5(in);
return ...
}
}
Now I want it faster and I found that some filters can start at the same time, while some filters must wait for others to finish. For example:
filter1--+
         |---filter3--+
filter2--+            |---filter5
         |---filter4--+
which means:
1. filter1 and filter2 can start at the same time, and so can filter3 and filter4
2. filter3 and filter4 must wait for filter2 to finish
One more thing:
if filter2 returns true, then the 'process' method returns immediately and ignores the following filters.
now my solution is using FutureTask:
// do filter's work in a FutureTask
for (Filter filter : filters) {
    FutureTask<RiskResult> futureTask = new FutureTask<RiskResult>(new CallableFilter(filter, context));
    context.putTask(filter.getId(), futureTask); // assumed registration step, so getTask() below can find the task
    executorService.execute(futureTask);
}
//when all FutureTask are submitted, wait for result
for(Filter filter : filters) {
if (filter.isReturnNeeded()) {
FutureTask<RiskResult> futureTask = context.getTask(filter.getId());
riskResult = futureTask.get();
if (canReturn(filter, riskResult)) {
returnOk = true;
return riskResult;
}
}
}
my CallableFilter:
public class CallableFilter implements Callable<RiskResult> {
private Filter filter;
private Context context;
@Override
public RiskResult call() throws Exception {
List<Filter> dependencies = filter.getDependentFilters();
if (dependencies != null && dependencies.size() > 0) {
//wait for its dependency filters to finish
for (Filter d : dependencies) {
FutureTask<RiskResult> futureTask = context.getTask(d.getId());
futureTask.get();
}
}
//do its own work
return filter.execute(context);
}
}
I want to know:
1. Is it a good idea to use FutureTask in this case? Is there a better solution?
2. What about the overhead of thread context switches?
Thanks!
In Java 8 you can use CompletableFuture to chain your filters after each other. Use the thenApply and thenCompose family of methods in order to add new asynchronous filters to the CompletableFuture - they will execute after the previous step is finished. thenCombine combines two independent CompletableFutures when both are finished. Use allOf to wait for the result of more than two CompletableFuture objects.
If you can't use Java 8, then the Guava ListenableFuture can do the same, see Listenable Future Explained. With Guava you can wait for multiple independently running filters to finish with Futures.allAsList - this also returns a ListenableFuture.
With both approaches the idea is that after you declare your future actions, their dependencies on each other, and their threads, you get back a single Future object, which encapsulates your end result.
EDIT: The early return could be implemented by explicitly completing the CompletableFuture with the complete() method or using a Guava SettableFuture (which implements ListenableFuture)
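A sketch of what that could look like for the filter graph from the question. The filter methods are placeholders, and the early return works by completing the result future from inside filter2's stage (a later complete() call is then a no-op); note that the remaining stages still run to completion in the background, and cancelling them would take extra work:
import java.util.concurrent.CompletableFuture;
class FilterPipeline {
    // Placeholder filters standing in for the real ones.
    static void filter1(String in) {}
    static boolean filter2(String in) { return false; }
    static void filter3(String in) {}
    static void filter4(String in) {}
    static void filter5(String in) {}
    static CompletableFuture<String> process(String in) {
        CompletableFuture<String> result = new CompletableFuture<>();
        CompletableFuture<Void> f1 = CompletableFuture.runAsync(() -> filter1(in));
        CompletableFuture<Void> f2 = CompletableFuture.runAsync(() -> {
            if (filter2(in)) {
                result.complete("early"); // early return: complete the result now
            }
        });
        // filter3 and filter4 start only after both filter1 and filter2 have finished.
        CompletableFuture<Void> gate = CompletableFuture.allOf(f1, f2);
        CompletableFuture<Void> f3 = gate.thenRunAsync(() -> filter3(in));
        CompletableFuture<Void> f4 = gate.thenRunAsync(() -> filter4(in));
        CompletableFuture.allOf(f3, f4)
                .thenRunAsync(() -> filter5(in))
                .thenRun(() -> result.complete("full")); // ignored if already completed early
        return result;
    }
}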
You can use a ForkJoinPool for parallelization, which is explicitly designed for that kind of parallel computation:
(...) Method join() and its variants are appropriate for use only when completion dependencies are acyclic; that is, the parallel computation can be described as a directed acyclic graph (DAG) (...)
(see ForkJoinTask)
The advantage of a ForkJoinPool is that every task can spawn new tasks and also wait for them to complete without actually blocking the executing thread (which otherwise might cause a deadlock if more tasks are waiting for others to complete than threads are available).
This is an example that should work so far, although it still has some limitations:
It ignores filter results
It does not prematurely finish execution if filter 2 returns true
Exception handling is not implemented
The main idea behind this code: Every filter is represented as Node that may depend on other nodes (= filters that must complete before this filter can execute). Dependent nodes are spawned as parallel tasks.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
public class Node<V> extends RecursiveTask<V> {
private static final short VISITED = 1;
private final Callable<V> callable;
private final Set<Node<V>> dependencies = new HashSet<>();
@SafeVarargs
public Node(Callable<V> callable, Node<V>... dependencies) {
this.callable = callable;
this.dependencies.addAll(Arrays.asList(dependencies));
}
public Set<Node<V>> getDependencies() {
return this.dependencies;
}
@Override
protected V compute() {
try {
// resolve dependencies first
for (Node<V> node : dependencies) {
if (node.tryMarkVisited()) {
node.fork(); // start node
}
}
// wait for ALL nodes to complete
for (Node<V> node : dependencies) {
node.join();
}
return callable.call();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}
public boolean tryMarkVisited() {
return compareAndSetForkJoinTaskTag((short) 0, VISITED);
}
}
Usage example:
public static void main(String[] args) {
Node<Void> filter1 = new Node<>(filter("filter1"));
Node<Void> filter2 = new Node<>(filter("filter2"));
Node<Void> filter3 = new Node<>(filter("filter3"), filter1, filter2);
Node<Void> filter4 = new Node<>(filter("filter4"), filter1, filter2);
Node<Void> filter5 = new Node<>(filter("filter5"), filter3, filter4);
Node<Void> root = new Node<>(() -> null, filter5);
ForkJoinPool.commonPool().invoke(root);
}
public static Callable<Void> filter(String name) {
return () -> {
System.out.println(Thread.currentThread().getName() + ": start " + name);
Thread.sleep(1000);
System.out.println(Thread.currentThread().getName() + ": end " + name);
return null;
};
}

How to make ThreadPoolExecutor's submit() method block if it is saturated?

I want to create a ThreadPoolExecutor such that when it has reached its maximum size and the queue is full, the submit() method blocks when trying to add new tasks. Do I need to implement a custom RejectedExecutionHandler for that or is there an existing way to do this using a standard Java library?
One of the possible solutions I've just found:
public class BoundedExecutor {
private final Executor exec;
private final Semaphore semaphore;
public BoundedExecutor(Executor exec, int bound) {
this.exec = exec;
this.semaphore = new Semaphore(bound);
}
public void submitTask(final Runnable command)
throws InterruptedException, RejectedExecutionException {
semaphore.acquire();
try {
exec.execute(new Runnable() {
public void run() {
try {
command.run();
} finally {
semaphore.release();
}
}
});
} catch (RejectedExecutionException e) {
semaphore.release();
throw e;
}
}
}
Are there any other solutions? I'd prefer something based on RejectedExecutionHandler since it seems like a standard way to handle such situations.
You can use a ThreadPoolExecutor and a BlockingQueue:
public class ImageManager {
    BlockingQueue<Runnable> blockingQueue = new ArrayBlockingQueue<Runnable>(blockQueueSize);
    RejectedExecutionHandler rejectedExecutionHandler = new ThreadPoolExecutor.CallerRunsPolicy();
    private ExecutorService executorService = new ThreadPoolExecutor(numOfThread, numOfThread,
            0L, TimeUnit.MILLISECONDS, blockingQueue, rejectedExecutionHandler);
    private void downloadThumbnail(String fileListPath) {
        executorService.submit(new YourRunnable()); // YourRunnable: placeholder for your task
    }
}
You should use the CallerRunsPolicy, which executes the rejected task in the calling thread. This way, it can't submit any new tasks to the executor until that task is done, at which point there will be some free pool threads or the process will repeat.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.CallerRunsPolicy.html
From the docs:
Rejected tasks
New tasks submitted in method execute(java.lang.Runnable) will be rejected when the Executor has been shut down, and also when the Executor uses finite bounds for both maximum threads and work queue capacity, and is saturated. In either case, the execute method invokes the RejectedExecutionHandler.rejectedExecution(java.lang.Runnable, java.util.concurrent.ThreadPoolExecutor) method of its RejectedExecutionHandler. Four predefined handler policies are provided:
In the default ThreadPoolExecutor.AbortPolicy, the handler throws a runtime RejectedExecutionException upon rejection.
In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute itself runs the task. This provides a simple feedback control mechanism that will slow down the rate that new tasks are submitted.
In ThreadPoolExecutor.DiscardPolicy, a task that cannot be executed is simply dropped.
In ThreadPoolExecutor.DiscardOldestPolicy, if the executor is not shut down, the task at the head of the work queue is dropped, and then execution is retried (which can fail again, causing this to be repeated.)
Also, make sure to use a bounded queue, such as ArrayBlockingQueue, when calling the ThreadPoolExecutor constructor. Otherwise, nothing will get rejected.
Edit: in response to your comment, set the size of the ArrayBlockingQueue to be equal to the max size of the thread pool and use the AbortPolicy.
Edit 2: Ok, I see what you're getting at. What about this: override the beforeExecute() method to check that getActiveCount() doesn't exceed getMaximumPoolSize(), and if it does, sleep and try again?
I know it is a hack, but in my opinion it is the cleanest hack among those offered here ;-)
Because ThreadPoolExecutor uses the blocking queue's "offer" instead of "put", let's override the behaviour of "offer" in the blocking queue:
class BlockingQueueHack<T> extends ArrayBlockingQueue<T> {
BlockingQueueHack(int size) {
super(size);
}
public boolean offer(T task) {
try {
this.put(task);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
return true;
}
}
ThreadPoolExecutor tp = new ThreadPoolExecutor(1, 2, 1, TimeUnit.MINUTES, new BlockingQueueHack<Runnable>(5));
I tested it and it seems to work.
Implementing some timeout policy is left as a reader's exercise.
Hibernate has a BlockPolicy that is simple and may do what you want:
See: Executors.java
/**
* A handler for rejected tasks that will have the caller block until
* space is available.
*/
public static class BlockPolicy implements RejectedExecutionHandler {
/**
* Creates a <tt>BlockPolicy</tt>.
*/
public BlockPolicy() { }
/**
* Puts the Runnable to the blocking queue, effectively blocking
* the delegating thread until space is available.
* @param r the runnable task requested to be executed
* @param e the executor attempting to execute this task
*/
public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
try {
e.getQueue().put( r );
}
catch (InterruptedException e1) {
log.error( "Work discarded, thread was interrupted while waiting for space to schedule: {}", r );
}
}
}
The BoundedExecutor answer quoted above from Java Concurrency in Practice only works correctly if you use an unbounded queue for the Executor, or the semaphore bound is no greater than the queue size. The semaphore is state shared between the submitting thread and the threads in the pool, making it possible to saturate the executor even if queue size < bound <= (queue size + pool size).
Using CallerRunsPolicy is only valid if your tasks don't run forever (otherwise your submitting thread will remain in rejectedExecution forever), and it is a bad idea if your tasks take a long time to run, because the submitting thread can't submit any new tasks or do anything else while it's running a task itself.
If that's not acceptable then I suggest checking the size of the executor's bounded queue before submitting a task. If the queue is full, then wait a short time before trying to submit again. The throughput will suffer, but I suggest it's a simpler solution than many of the other proposed solutions and you're guaranteed no tasks will get rejected.
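A minimal sketch of that check-and-wait approach (it assumes a single submitting thread; with several producers, another thread could fill the queue between the check and the execute() call):
import java.util.concurrent.ThreadPoolExecutor;
class QueuePollingSubmitter {
    static void submitBlocking(ThreadPoolExecutor pool, Runnable task)
            throws InterruptedException {
        while (pool.getQueue().remainingCapacity() == 0) {
            Thread.sleep(100); // back off while the bounded queue is full
        }
        pool.execute(task);
    }
}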
The following class wraps around a ThreadPoolExecutor and uses a Semaphore to block when the work queue is full:
public final class BlockingExecutor {
private final Executor executor;
private final Semaphore semaphore;
public BlockingExecutor(int queueSize, int corePoolSize, int maxPoolSize, int keepAliveTime, TimeUnit unit, ThreadFactory factory) {
BlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>();
this.executor = new ThreadPoolExecutor(corePoolSize, maxPoolSize, keepAliveTime, unit, queue, factory);
this.semaphore = new Semaphore(queueSize + maxPoolSize);
}
private void execImpl (final Runnable command) throws InterruptedException {
semaphore.acquire();
try {
executor.execute(new Runnable() {
@Override
public void run() {
try {
command.run();
} finally {
semaphore.release();
}
}
});
} catch (RejectedExecutionException e) {
// will never be thrown with an unbounded buffer (LinkedBlockingQueue)
semaphore.release();
throw e;
}
}
public void execute (Runnable command) throws InterruptedException {
execImpl(command);
}
}
This wrapper class is based on a solution given in the book Java Concurrency in Practice by Brian Goetz. The solution in the book only takes two constructor parameters: an Executor and a bound used for the semaphore. This is shown in the answer given by Fixpoint. There is a problem with that approach: it can get into a state where the pool threads are busy, the queue is full, but the semaphore has just released a permit (the semaphore.release() in the finally block). In this state, a new task can grab the just-released permit, but is rejected because the task queue is full. Of course this is not something you want; you want to block in this case.
To solve this, we must use an unbounded queue, as JCiP clearly mentions. The semaphore acts as a guard, giving the effect of a virtual queue size. This has the side effect that the unit can contain maxPoolSize + virtualQueueSize + maxPoolSize tasks. Why is that? Because of the semaphore.release() in the finally block. If all pool threads call this statement at the same time, then maxPoolSize permits are released, allowing the same number of tasks to enter the unit. If we were using a bounded queue, it would still be full, resulting in a rejected task. Now, because we know that this only occurs when a pool thread is almost done, this is not a problem. We know that the pool thread will not block, so a task will soon be taken from the queue.
You are able to use a bounded queue, though. Just make sure that its size equals virtualQueueSize + maxPoolSize. Greater sizes are useless, since the semaphore prevents more items from entering. Smaller sizes will result in rejected tasks. The chance of tasks getting rejected increases as the size decreases. For example, say you want a bounded executor with maxPoolSize=2 and virtualQueueSize=5. Then take a semaphore with 5+2=7 permits and an actual queue size of 5+2=7. The real number of tasks that can be in the unit is then 2+5+2=9. When the executor is full (5 tasks in queue, 2 in thread pool, so 0 permits available) and ALL pool threads release their permits, then exactly 2 permits can be taken by tasks coming in.
Now the solution from JCiP is somewhat cumbersome to use as it doesn't enforce all these constraints (unbounded queue, or bounded with those math restrictions, etc.). I think that this only serves as a good example to demonstrate how you can build new thread safe classes based on the parts that are already available, but not as a full-grown, reusable class. I don't think that the latter was the author's intention.
You can use a custom RejectedExecutionHandler like this:
ThreadPoolExecutor tp= new ThreadPoolExecutor(core_size, // core size
max_handlers, // max size
timeout_in_seconds, // idle timeout
TimeUnit.SECONDS, queue, new RejectedExecutionHandler() {
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
// This will block if the queue is full
try {
executor.getQueue().put(r);
} catch (InterruptedException e) {
System.err.println(e.getMessage());
}
}
});
I don't always like the CallerRunsPolicy, especially since it allows the rejected task to 'skip the queue' and get executed before tasks that were submitted earlier. Moreover, executing the task on the calling thread might take much longer than waiting for the first slot to become available.
I solved this problem using a custom RejectedExecutionHandler, which simply blocks the calling thread for a little while and then tries to submit the task again:
public class BlockWhenQueueFull implements RejectedExecutionHandler {
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
// The pool is full. Wait, then try again.
try {
long waitMs = 250;
Thread.sleep(waitMs);
} catch (InterruptedException interruptedException) {}
executor.execute(r);
}
}
This class can just be used in the thread-pool executor as a RejectedExecutionHandler like any other, for example:
executorPool = new ThreadPoolExecutor(1, 1, 10,
TimeUnit.SECONDS, new SynchronousQueue<Runnable>(),
new BlockWhenQueueFull());
The only downside I see is that the calling thread might get locked slightly longer than strictly necessary (up to 250ms). Moreover, since this executor is effectively being called recursively, very long waits for a thread to become available (hours) might result in a stack overflow.
Nevertheless, I personally like this method. It's compact, easy to understand, and works well.
Create your own blocking queue to be used by the Executor, with the blocking behavior you are looking for, while always returning available remaining capacity (ensuring the executor will not try to create more threads than its core pool, or trigger the rejection handler).
I believe this will get you the blocking behavior you are looking for. A rejection handler will never fit the bill, since that indicates the executor cannot perform the task. What I could envision there is that you get some form of 'busy waiting' in the handler. That is not what you want; you want a queue for the executor that blocks the caller...
To avoid issues with @FixPoint's solution, one could use ListeningExecutorService and release the semaphore in onSuccess and onFailure inside a FutureCallback.
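A minimal sketch of that suggestion, assuming Guava is on the classpath (the pool size and bound are illustrative):
import java.util.concurrent.*;
import com.google.common.util.concurrent.*;
class CallbackBoundedExecutor {
    private final ListeningExecutorService pool =
            MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(4));
    private final Semaphore semaphore = new Semaphore(10);
    public <T> ListenableFuture<T> submit(Callable<T> task) throws InterruptedException {
        semaphore.acquire();
        final ListenableFuture<T> future;
        try {
            future = pool.submit(task);
        } catch (RejectedExecutionException e) {
            semaphore.release(); // do not leak the permit if submission fails
            throw e;
        }
        // Release the permit whichever way the task ends.
        Futures.addCallback(future, new FutureCallback<T>() {
            @Override public void onSuccess(T result) { semaphore.release(); }
            @Override public void onFailure(Throwable t) { semaphore.release(); }
        }, MoreExecutors.directExecutor());
        return future;
    }
}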
Recently I found this question having the same problem. The OP does not say so explicitly, but we do not want to use the RejectedExecutionHandler which executes a task on the submitter's thread, because this will under-utilize the worker threads if this task is a long running one.
Reading all the answers and comments, in particular the flawed solution with the semaphore or using afterExecute I had a closer look at the code of the ThreadPoolExecutor to see if there is some way out. I was amazed to see that there are more than 2000 lines of (commented) code, some of which make me feel dizzy. Given the rather simple requirement I actually have --- one producer, several consumers, let the producer block when no consumers can take work --- I decided to roll my own solution. It is not an ExecutorService but just an Executor. And it does not adapt the number of threads to the work load, but holds a fixed number of threads only, which also fits my requirements. Here is the code. Feel free to rant about it :-)
package x;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
/**
* distributes {@code Runnable}s to a fixed number of threads. To keep the
* code lean, this is not an {@code ExecutorService}. In particular there is
* only very simple support to shut this executor down.
*/
public class ParallelExecutor implements Executor {
// other bounded queues work as well and are useful to buffer peak loads
private final BlockingQueue<Runnable> workQueue =
new SynchronousQueue<Runnable>();
private final Thread[] threads;
/*+**********************************************************************/
/**
* creates the requested number of threads and starts them to wait for
* incoming work
*/
public ParallelExecutor(int numThreads) {
this.threads = new Thread[numThreads];
for(int i=0; i<numThreads; i++) {
// could reuse the same Runner all over, but keep it simple
Thread t = new Thread(new Runner());
this.threads[i] = t;
t.start();
}
}
/*+**********************************************************************/
/**
* returns immediately without waiting for the task to be finished, but may
* block if all worker threads are busy.
*
* @throws RejectedExecutionException if we got interrupted while waiting
* for a free worker
*/
@Override
public void execute(Runnable task) {
try {
workQueue.put(task);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new RejectedExecutionException("interrupt while waiting for a free "
+ "worker.", e);
}
}
/*+**********************************************************************/
/**
* Interrupts all workers and joins them. Tasks susceptible to an interrupt
* will preempt their work. Blocks until the last thread surrendered.
*/
public void interruptAndJoinAll() throws InterruptedException {
for(Thread t : threads) {
t.interrupt();
}
for(Thread t : threads) {
t.join();
}
}
/*+**********************************************************************/
private final class Runner implements Runnable {
@Override
public void run() {
while (!Thread.currentThread().isInterrupted()) {
Runnable task;
try {
task = workQueue.take();
} catch (InterruptedException e) {
// canonical handling despite exiting right away
Thread.currentThread().interrupt();
return;
}
try {
task.run();
} catch (RuntimeException e) {
// production code to use a logging framework
e.printStackTrace();
}
}
}
}
}
I believe there is quite an elegant way to solve this problem, by using java.util.concurrent.Semaphore and delegating to the behavior of Executors.newFixedThreadPool.
The new executor service will only execute a new task when there is a thread available to do so. Blocking is managed by a Semaphore with a number of permits equal to the number of threads. When a task is finished, it returns a permit.
public class FixedThreadBlockingExecutorService extends AbstractExecutorService {
    private final ExecutorService executor;
    private final Semaphore blockExecution;
    public FixedThreadBlockingExecutorService(int nThreads) {
        this.executor = Executors.newFixedThreadPool(nThreads);
        blockExecution = new Semaphore(nThreads);
    }
    @Override
    public void shutdown() {
        executor.shutdown();
    }
    @Override
    public List<Runnable> shutdownNow() {
        return executor.shutdownNow();
    }
    @Override
    public boolean isShutdown() {
        return executor.isShutdown();
    }
    @Override
    public boolean isTerminated() {
        return executor.isTerminated();
    }
    @Override
    public boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
        return executor.awaitTermination(timeout, unit);
    }
    @Override
    public void execute(Runnable command) {
        blockExecution.acquireUninterruptibly(); // blocks until a thread is free
        executor.execute(() -> {
            try {
                command.run();
            } finally {
                blockExecution.release();
            }
        });
    }
}
I had the same need in the past: a kind of blocking queue with a fixed size for each client backed by a shared thread pool. I ended up writing my own kind of ThreadPoolExecutor:
UserThreadPoolExecutor
(blocking queue (per client) + threadpool (shared amongst all clients))
See: https://github.com/d4rxh4wx/UserThreadPoolExecutor
Each UserThreadPoolExecutor is given a maximum number of threads from a shared ThreadPoolExecutor
Each UserThreadPoolExecutor can:
submit a task to the shared thread pool executor if its quota is not reached. If its quota is reached, the job is queued (non-consumptive blocking, waiting for CPU). Once one of its submitted tasks is completed, the quota is decremented, allowing another waiting task to be submitted to the ThreadPoolExecutor
wait for the remaining tasks to complete
I found this rejection policy in the Elasticsearch client. It blocks the caller thread on the blocking queue. Code below:
static class ForceQueuePolicy implements XRejectedExecutionHandler
{
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor)
{
try
{
executor.getQueue().put(r);
}
catch (InterruptedException e)
{
//should never happen since we never wait
throw new EsRejectedExecutionException(e);
}
}
@Override
public long rejected()
{
return 0;
}
}
I recently had a need to achieve something similar, but on a ScheduledExecutorService.
I also had to ensure that I handle the delay being passed to the method, so that either the task is submitted to execute at the time the caller expects, or it fails with a RejectedExecutionException.
Other methods from ScheduledThreadPoolExecutor that execute or submit a task internally call #schedule, which will in turn invoke the overridden methods.
import java.util.concurrent.*;
public class BlockingScheduler extends ScheduledThreadPoolExecutor {
private final Semaphore maxQueueSize;
public BlockingScheduler(int corePoolSize,
ThreadFactory threadFactory,
int maxQueueSize) {
super(corePoolSize, threadFactory, new AbortPolicy());
this.maxQueueSize = new Semaphore(maxQueueSize);
}
@Override
public ScheduledFuture<?> schedule(Runnable command,
long delay,
TimeUnit unit) {
final long newDelayInMs = beforeSchedule(command, unit.toMillis(delay));
return super.schedule(command, newDelayInMs, TimeUnit.MILLISECONDS);
}
@Override
public <V> ScheduledFuture<V> schedule(Callable<V> callable,
long delay,
TimeUnit unit) {
final long newDelayInMs = beforeSchedule(callable, unit.toMillis(delay));
return super.schedule(callable, newDelayInMs, TimeUnit.MILLISECONDS);
}
@Override
public ScheduledFuture<?> scheduleAtFixedRate(Runnable command,
long initialDelay,
long period,
TimeUnit unit) {
final long newDelayInMs = beforeSchedule(command, unit.toMillis(initialDelay));
return super.scheduleAtFixedRate(command, newDelayInMs, unit.toMillis(period), TimeUnit.MILLISECONDS);
}
@Override
public ScheduledFuture<?> scheduleWithFixedDelay(Runnable command,
long initialDelay,
long period,
TimeUnit unit) {
final long newDelayInMs = beforeSchedule(command, unit.toMillis(initialDelay));
return super.scheduleWithFixedDelay(command, newDelayInMs, unit.toMillis(period), TimeUnit.MILLISECONDS);
}
@Override
protected void afterExecute(Runnable runnable, Throwable t) {
super.afterExecute(runnable, t);
try {
if (t == null && runnable instanceof Future<?>) {
try {
((Future<?>) runnable).get();
} catch (CancellationException | ExecutionException e) {
t = e;
} catch (InterruptedException ie) {
Thread.currentThread().interrupt(); // ignore/reset
}
}
if (t != null) {
System.err.println(t);
}
} finally {
releaseQueueUsage();
}
}
private long beforeSchedule(Runnable runnable, long delay) {
try {
return getQueuePermitAndModifiedDelay(delay);
} catch (InterruptedException e) {
getRejectedExecutionHandler().rejectedExecution(runnable, this);
return 0;
}
}
private long beforeSchedule(Callable callable, long delay) {
try {
return getQueuePermitAndModifiedDelay(delay);
} catch (InterruptedException e) {
getRejectedExecutionHandler().rejectedExecution(new FutureTask(callable), this);
return 0;
}
}
private long getQueuePermitAndModifiedDelay(long delay) throws InterruptedException {
    final long beforeAcquireTimeStamp = System.currentTimeMillis();
    // Note: the return value of tryAcquire is ignored here, so on timeout the
    // task is scheduled even though no permit was acquired.
    maxQueueSize.tryAcquire(delay, TimeUnit.MILLISECONDS);
    final long afterAcquireTimeStamp = System.currentTimeMillis();
    return afterAcquireTimeStamp - beforeAcquireTimeStamp;
}
private void releaseQueueUsage() {
maxQueueSize.release();
}
}
I have the code here and would appreciate any feedback.
https://github.com/AmitabhAwasthi/BlockingScheduler
