Good day.
I have a blocking issue with my web crawler project.
The logic is simple. The first Runnable downloads an HTML document, scans it for links, and creates a new Runnable for every link it finds. Each new Runnable in turn creates and executes new Runnable objects for the links on its own page.
The problem is that the ExecutorService never stops.
CrawlerTest.java
public class CrawlerTest {
public static void main(String[] args) throws InterruptedException {
new CrawlerService().crawlInternetResource("https://jsoup.org/");
}
}
CrawlerService.java
import java.io.IOException;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class CrawlerService {
private Set<String> uniqueUrls = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>(10000));
private ExecutorService executorService = Executors.newFixedThreadPool(8);
private String baseDomainUrl;
public void crawlInternetResource(String baseDomainUrl) throws InterruptedException {
this.baseDomainUrl = baseDomainUrl;
System.out.println("Start");
executorService.execute(new Crawler(baseDomainUrl)); // Submit the first task to scan the main page; this task submits further tasks.
executorService.awaitTermination(10, TimeUnit.MINUTES);
System.out.println("End");
}
private class Crawler implements Runnable { // Inner class that encapsulates thread and scan for links
private String urlToCrawl;
public Crawler(String urlToCrawl) {
this.urlToCrawl = urlToCrawl;
}
public void run() {
try {
findAllLinks();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
private void findAllLinks() throws InterruptedException {
/* Try to add the URL to the set; if it is new, scan the document
* and submit a new task for each link found */
if (uniqueUrls.add(urlToCrawl)) {
System.out.println(urlToCrawl);
Document htmlDocument = loadHtmlDocument(urlToCrawl);
Elements findedLinks = htmlDocument.select("a[href]");
for (Element link : findedLinks) {
String absLink = link.attr("abs:href");
if (absLink.contains(baseDomainUrl) && !absLink.contains("#")) { // Make sure we don't leave the domain
executorService.execute(new Crawler(absLink)); // Submit a new task for each link found
}
}
}
}
private Document loadHtmlDocument(String internetResourceUrl) {
Document document = null;
try {
document = Jsoup.connect(internetResourceUrl).ignoreHttpErrors(true).ignoreContentType(true)
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0")
.timeout(10000).get();
} catch (IOException e) {
System.out.println("Page load error");
e.printStackTrace();
}
return document;
}
}
}
The app needs about 20 seconds to scan jsoup.org for all unique links, but then it just waits the full 10 minutes in executorService.awaitTermination(10, TimeUnit.MINUTES);
After that the main thread is dead, but the executor is still running.
How can I make the ExecutorService work correctly?
I think the problem is that executorService.execute is invoked inside another task instead of in the main thread.
You are misusing awaitTermination. According to the Javadoc, you should call shutdown first:
Blocks until all tasks have completed execution after a shutdown request, or the timeout occurs, or the current thread is interrupted, whichever happens first.
To achieve your goal, I'd suggest using a CountDownLatch (or a latch that supports increments) to determine the exact moment when there are no tasks left, so that you can safely call shutdown.
I see your comment from earlier:
I can't use CountDownLatch because I don't know beforehand how many unique links I will collect from resource.
First off, vsminkov is spot on as to why awaitTermination will sit and wait for 10 minutes. I will offer an alternative solution.
Instead of a CountDownLatch, use a Phaser. You can register a party for each new task and then wait for all of them to complete.
Create a single Phaser, register each time a task is created for submission to the executor, and arrive each time a Runnable completes.
public void crawlInternetResource(String baseDomainUrl) {
this.baseDomainUrl = baseDomainUrl;
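Phaser phaser = new Phaser();
int phase = phaser.getPhase(); // read the phase before submitting, so a fast first task cannot advance it first
executorService.execute(new Crawler(phaser, baseDomainUrl));
phaser.awaitAdvance(phase); // awaitAdvance is a method of the Phaser, not of the int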
}
private class Crawler implements Runnable {
private final Phaser phaser;
private String urlToCrawl;
public Crawler(Phaser phaser, String urlToCrawl) {
this.urlToCrawl = urlToCrawl;
this.phaser = phaser;
phaser.register(); // register new task
}
public void run(){
...
phaser.arrive(); // may want to surround this in try/finally
}
}
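For reference, here is a minimal, self-contained sketch of the Phaser idea. It rests on a few assumptions of mine: the site is faked with an in-memory map (LINKS), the class and method names are made up, the calling thread is registered as a party itself, and each task calls arriveAndDeregister() in a finally block instead of a bare arrive(). Treat it as an illustration rather than a drop-in replacement for the crawler.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

final class PhaserCrawlSketch {
    // Hypothetical stand-in for real pages: page -> links found on it.
    static final Map<String, List<String>> LINKS = Map.of(
            "/", List.of("/a", "/b"),
            "/a", List.of("/b", "/c"),
            "/b", List.of("/"),
            "/c", List.<String>of());

    static Set<String> crawl(String start) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Phaser phaser = new Phaser(1); // one party for the calling thread
        submit(pool, phaser, visited, start);
        phaser.arriveAndAwaitAdvance(); // returns once every registered task has arrived
        pool.shutdown();                // safe now: no task is left to submit more work
        return visited;
    }

    private static void submit(ExecutorService pool, Phaser phaser,
                               Set<String> visited, String url) {
        phaser.register(); // register before the task is queued
        pool.execute(() -> {
            try {
                if (visited.add(url)) {
                    for (String link : LINKS.getOrDefault(url, List.<String>of())) {
                        submit(pool, phaser, visited, link);
                    }
                }
            } finally {
                phaser.arriveAndDeregister(); // children were registered before this point
            }
        });
    }
}
```

Because a task registers its children before it arrives, the phase cannot advance while any work is still queued or running.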
You are not calling shutdown.
This may work: keep an AtomicLong counter in CrawlerService and increment it before every new subtask is submitted to the executor service.
Modify your run() method to decrement this counter and, if it reaches 0, shut down the executor service:
public void run() {
try {
findAllLinks();
} catch (InterruptedException e) {
e.printStackTrace();
} finally {
//decrements counter
//If 0, shutdown executor from here or just notify CrawlerService who would be doing wait().
}
}
In the finally block, decrement the counter; when it reaches zero, shut down the executor or just notify CrawlerService. Zero means that this is the last task: no other task is running and none is pending in the queue, so no task will submit any new subtasks.
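A rough sketch of the counter approach could look like the following. The in-memory LINKS map, the class name and the pool size are placeholders I chose for the example; a real crawler would download and parse pages instead.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class CountingCrawlSketch {
    // Hypothetical stand-in for real pages: page -> links found on it.
    static final Map<String, List<String>> LINKS = Map.of(
            "/", List.of("/a", "/b"),
            "/a", List.of("/b", "/c"),
            "/b", List.of("/"),
            "/c", List.<String>of());

    private final Set<String> visited = ConcurrentHashMap.newKeySet();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final AtomicLong pending = new AtomicLong();

    private void submit(String url) {
        pending.incrementAndGet(); // count the task before it is handed to the pool
        pool.execute(() -> {
            try {
                if (visited.add(url)) {
                    for (String link : LINKS.getOrDefault(url, List.<String>of())) {
                        submit(link); // children are counted before this task is uncounted
                    }
                }
            } finally {
                if (pending.decrementAndGet() == 0) {
                    pool.shutdown(); // last task: nothing running, nothing queued
                }
            }
        });
    }

    static Set<String> crawl(String start) {
        CountingCrawlSketch crawler = new CountingCrawlSketch();
        crawler.submit(start);
        try {
            // The wait ends as soon as the last task has called shutdown().
            if (!crawler.pool.awaitTermination(1, TimeUnit.MINUTES)) {
                crawler.pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            crawler.pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return crawler.visited;
    }
}
```

The important detail is the ordering: the increment happens in the submitting thread before execute(), so the counter can only reach zero when the very last task finishes.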
How can I make the ExecutorService work correctly?
I think the problem is that executorService.execute is invoked inside another task instead of in the main thread.
No. The problem is not with ExecutorService. You are using the APIs incorrectly and hence not getting the right result.
You have to use three APIs in a certain order to get the right result.
1. shutdown
2. awaitTermination
3. shutdownNow
The recommended way, from the Oracle documentation page of ExecutorService:
void shutdownAndAwaitTermination(ExecutorService pool) {
pool.shutdown(); // Disable new tasks from being submitted
try {
// Wait a while for existing tasks to terminate
if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
pool.shutdownNow(); // Cancel currently executing tasks
// Wait a while for tasks to respond to being cancelled
if (!pool.awaitTermination(60, TimeUnit.SECONDS))
System.err.println("Pool did not terminate");
}
} catch (InterruptedException ie) {
// (Re-)Cancel if current thread also interrupted
pool.shutdownNow();
// Preserve interrupt status
Thread.currentThread().interrupt();
}
}
shutdown(): Initiates an orderly shutdown in which previously submitted tasks are executed, but no new tasks will be accepted.
shutdownNow():Attempts to stop all actively executing tasks, halts the processing of waiting tasks, and returns a list of the tasks that were awaiting execution.
awaitTermination():Blocks until all tasks have completed execution after a shutdown request, or the timeout occurs, or the current thread is interrupted, whichever happens first.
On a different note: If you want to wait for all tasks to complete, refer to this related SE question:
wait until all threads finish their work in java
I prefer using invokeAll() or a ForkJoinPool, which are better suited to your use case.
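To illustrate the ForkJoinPool suggestion, here is a small sketch in which each page is a RecursiveAction that forks one child action per link. The LINKS map and all class names are hypothetical. Keep in mind that ForkJoinPool is designed for CPU-bound work, so for tasks that block on network I/O this may or may not be a good fit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

final class ForkJoinCrawlSketch {
    // Hypothetical stand-in for real pages: page -> links found on it.
    static final Map<String, List<String>> LINKS = Map.of(
            "/", List.of("/a", "/b"),
            "/a", List.of("/b", "/c"),
            "/b", List.of("/"),
            "/c", List.<String>of());

    static final class CrawlAction extends RecursiveAction {
        private final String url;
        private final Set<String> visited;

        CrawlAction(String url, Set<String> visited) {
            this.url = url;
            this.visited = visited;
        }

        @Override
        protected void compute() {
            if (!visited.add(url)) {
                return; // already handled by another action
            }
            List<CrawlAction> children = new ArrayList<>();
            for (String link : LINKS.getOrDefault(url, List.<String>of())) {
                children.add(new CrawlAction(link, visited));
            }
            invokeAll(children); // forks the children and joins all of them
        }
    }

    static Set<String> crawl(String start) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            pool.invoke(new CrawlAction(start, visited)); // blocks until the whole tree is done
        } finally {
            pool.shutdown();
        }
        return visited;
    }
}
```

Here no counting is needed at all: invoke() returns only after the root action and everything it forked have completed.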
I have a java class to handle a multithreaded subscription service. By implementing the Subscribable interface, tasks can be submitted to the service and periodically executed. A sketch of the code is seen below:
import java.util.concurrent.*;
public class Subscribtions {
private ConcurrentMap<Subscribable, Future<?>> futures = new ConcurrentHashMap<Subscribable, Future<?>>();
private ConcurrentMap<Subscribable, Integer> cacheFutures = new ConcurrentHashMap<Subscribable, Integer>();
private ScheduledExecutorService threads;
public Subscribtions() {
threads = Executors.newScheduledThreadPool(16);
}
public void subscribe(Subscribable subscription) {
Runnable runnable = getThread(subscription);
Future<?> future = threads.scheduleAtFixedRate(runnable, subscription.getInitialDelay(), subscription.getPeriod(), TimeUnit.SECONDS);
futures.put(subscription, future);
}
/*
* Only called from controller thread
*/
public void unsubscribe(Subscribable subscription) {
Future<?> future = futures.remove(subscription); //1. Might be removed by worker thread
if (future != null)
future.cancel(false);
else {
//3. Worker-thread view := cacheFutures.put() -> futures.remove()
//4. Controller-thread has seen futures.remove(), but has it seen cacheFutures.put()?
}
}
/*
* Only called from worker threads
*/
private void delay(Runnable runnable, Subscribable subscription, long delay) {
cacheFutures.put(subscription, 0); //2. Which is why it is cached first
Future<?> currentFuture = futures.remove(subscription);
if (currentFuture != null) {
currentFuture.cancel(false);
Future<?> future = threads.scheduleAtFixedRate(runnable, delay, subscription.getPeriod(), TimeUnit.SECONDS);
futures.put(subscription, future);
}
}
private Runnable getThread(Subscribable subscription) {
return new Runnable() {
public void run() {
//Do work...
boolean someCondition = true;
long someDelay = 100;
if (someCondition) {
delay(this, subscription, someDelay);
}
}
};
}
public interface Subscribable {
long getInitialDelay();
long getPeriod();
}
}
So the class permits to:
Subscribe to new tasks
Unsubscribe from existing tasks
Delay a periodically executed task
Subscriptions are added/removed by an external controlling thread, but delays are incurred only by the internal worker threads. This could happen, if for instance a worker thread found no update from the last execution or e.g. if the thread only needs to execute from 00.00 - 23.00.
My problem is that a worker thread may call delay() and remove its future from the ConcurrentMap, and the controller thread may concurrently call unsubscribe(). Then if the controller thread checks the ConcurrentMap before the worker thread has put in a new future, the unsubscribe() call will be lost.
Some possible solutions (perhaps not an exhaustive list):
Use a lock between the delay() and unsubscribe() methods
Same as above, but one lock per subscription
(preferred?) Use no locks, but "cache" removed futures in the delay() method
As for the third solution, since the worker-thread has established the happens-before relationship cacheFutures.put() -> futures.remove(), and the atomicity of ConcurrentMap makes the controller thread see futures.remove(), does it also see the same happens-before relationship as the worker thread? I.e. cacheFutures.put() -> futures.remove()? Or does the atomicity only hold for the futures map with updates to other variables being propagated later?
Any other comments are also welcome, esp. considering use of the volatile keyword. Should the cache-map be declared volatile? thanks!
One lock per subscription would require you to maintain yet another map, and possibly thereby to introduce additional concurrency issues. I think that would be better avoided. The same applies even more so to caching removed subscriptions, plus that affords the added risk of unwanted resource retention (and note that it's not the Futures themselves that you would need to cache, but rather the Subscribables with which they are associated).
Either way, you will need some kind of synchronization / locking. For example, in your option (3) you need to avoid an unsubscribe() for a given subscription happening between delay() caching that subscription and removing its Future. The only way you could avoid that without some form of locking would be if you could use just one Future per subscription, kept continuously in place from the time it is enrolled by subscribe() until it is removed by unsubscribe(). Doing so is not consistent with the ability to delay an already-scheduled subscription.
As for the third solution, since the worker-thread has established the happens-before relationship cacheFutures.put() -> futures.remove(), and the atomicity of ConcurrentMap makes the controller thread see futures.remove(), does it also see the same happens-before relationship as the worker thread?
Happens-before is a relationship between actions in an execution of a program. It is not specific to any one thread's view of the execution.
Or does the atomicity only hold for the futures map with updates to other variables being propagated later?
The controller thread will always see the cacheFutures.put() performed by an invocation of delay() occurring before the futures.remove() performed by that same invocation. I don't think that helps you, though.
Should the cache-map be declared volatile?
No. That would avail nothing, because although the contents of that map change, the map itself is always the same object, and the reference to it does not change.
You could consider having subscribe(), delay(), and unsubscribe() each synchronize on the Subscribable presented. That's not what I understood you to mean about having a lock per subscription, but it is similar. It would avoid the need for a separate data structure to maintain such locks. I guess you could also build locking methods into the Subscribable interface if you want to avoid explicit synchronization.
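As a sketch of what synchronizing on the Subscribable could look like (my own simplified version of the class, not a tested replacement): the Runnable is passed to subscribe() explicitly, delay() reports whether it actually rescheduled anything, and it is assumed that no other code locks on the Subscribable objects.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class SynchronizedSubscriptionsSketch {
    interface Subscribable {
        long getInitialDelay();
        long getPeriod();
    }

    private final ConcurrentMap<Subscribable, ScheduledFuture<?>> futures = new ConcurrentHashMap<>();
    private final ScheduledExecutorService workers;

    SynchronizedSubscriptionsSketch(ScheduledExecutorService workers) {
        this.workers = workers;
    }

    void subscribe(Subscribable sub, Runnable work) {
        synchronized (sub) {
            futures.put(sub, workers.scheduleAtFixedRate(
                    work, sub.getInitialDelay(), sub.getPeriod(), TimeUnit.SECONDS));
        }
    }

    void unsubscribe(Subscribable sub) {
        synchronized (sub) {
            ScheduledFuture<?> future = futures.remove(sub);
            if (future != null) {
                future.cancel(false);
            }
        }
    }

    /** Returns false if the subscription was already removed, so nothing is rescheduled. */
    boolean delay(Runnable work, Subscribable sub, long delay) {
        synchronized (sub) {
            // remove + cancel + put happen as one unit with respect to unsubscribe()
            ScheduledFuture<?> current = futures.remove(sub);
            if (current == null) {
                return false;
            }
            current.cancel(false);
            futures.put(sub, workers.scheduleAtFixedRate(
                    work, delay, sub.getPeriod(), TimeUnit.SECONDS));
            return true;
        }
    }
}
```

With the lock held across the whole of delay(), an unsubscribe() for the same subscription runs either entirely before it (and delay() then does nothing) or entirely after it (and cancels the rescheduled future), so the call can no longer be lost.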
You have a ConcurrentMap but you aren't using it. Consider something along these lines:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
final class SO33555545
{
public static void main(String... argv)
throws InterruptedException
{
ScheduledExecutorService workers = Executors.newScheduledThreadPool(16);
Subscriptions sub = new Subscriptions(workers);
sub.subscribe(() -> System.out.println("Message received: A"));
sub.subscribe(() -> System.out.println("Message received: B"));
Thread.sleep(TimeUnit.SECONDS.toMillis(30));
workers.shutdown();
}
}
final class Subscriptions
{
private final ConcurrentMap<Subscribable, Task> tasks = new ConcurrentHashMap<>();
private final ScheduledExecutorService workers;
public Subscriptions(ScheduledExecutorService workers)
{
this.workers = workers;
}
void subscribe(Subscribable sub)
{
Task task = new Task(sub);
Task current = tasks.putIfAbsent(sub, task);
if (current != null)
throw new IllegalStateException("Already subscribed");
task.activate();
}
private Future<?> schedule(Subscribable sub)
{
Runnable task = () -> {
sub.invoke();
if (Math.random() < 0.25) {
System.out.println("Delaying...");
delay(sub, 5);
}
};
return workers.scheduleAtFixedRate(task, sub.getPeriod(), sub.getPeriod(), TimeUnit.SECONDS);
}
void unsubscribe(Subscribable sub)
{
Task task = tasks.remove(sub);
if (task != null)
task.cancel();
}
private void delay(Subscribable sub, long delay)
{
Task task = new Task(sub);
Task obsolete = tasks.replace(sub, task);
if (obsolete != null) {
obsolete.cancel();
task.activate();
}
}
private final class Task
{
private final FutureTask<Future<?>> future;
Task(Subscribable sub)
{
this.future = new FutureTask<>(() -> schedule(sub));
}
void activate()
{
future.run();
}
void cancel()
{
boolean interrupted = false;
while (true) {
try {
future.get().cancel(false);
break;
}
catch (ExecutionException ignore) {
ignore.printStackTrace(); /* Cancellation is unnecessary. */
break;
}
catch (InterruptedException ex) {
interrupted = true; /* Keep waiting... */
}
}
if (interrupted)
Thread.currentThread().interrupt(); /* Reset interrupt state. */
}
}
}
@FunctionalInterface
interface Subscribable
{
default long getPeriod()
{
return 4;
}
void invoke();
}
I am working on a project that has several bundles. As an example, suppose I have 5 bundles and each of them has a method named process.
Currently I call the process method of those 5 bundles sequentially, one by one, and then write to the database. That is what I want to change.
Here is what I am looking for:
I need to call the process method of all 5 bundles in parallel using multithreaded code and then write to the database. I am not sure what the right way to do that is. Should I have five threads, one per bundle? But then what happens if I have 50 bundles: will I have 50 threads?
I also want a timeout feature. If any bundle takes longer than a threshold we set, it should time out and log an error saying that the bundle took too long.
I hope the question is clear enough.
Below is the code I have so far, which calls the process method of each bundle sequentially.
public void callBundles(final Map<String, Object> eventData) {
final Map<String, String> outputs = (Map<String, String>)eventData.get(Constants.HOLDER);
for (final BundleRegistration.BundlesHolderEntry entry : BundleRegistration.getInstance()) {
// calling the process method of a bundle
final Map<String, String> response = entry.getPlugin().process(outputs);
// then write to the database.
System.out.println(response);
}
}
I am not sure what the best and most efficient way to do this is. I don't want to call the bundles sequentially, because in the future I may have more than 5 of them.
Can anyone provide an example of how to do this? I have tried, but what I came up with is not what I am looking for.
Any help will be appreciated. Thanks.
Update:-
This is what I came up with-
public void callBundles(final Map<String, Object> eventData) {
// One pool of five threads, shared by the plugin processors and the database writer
final ExecutorService executor = Executors.newFixedThreadPool(5);
final BlockingQueue<Map<String, String>> queue = new LinkedBlockingQueue<Map<String, String>>();
@SuppressWarnings("unchecked")
final Map<String, String> outputs = (Map<String, String>)eventData.get(Constants.EVENT_HOLDER);
for (final BundleRegistration.BundlesHolderEntry entry : BundleRegistration.getInstance()) {
executor.submit(new Runnable () {
public void run() {
final Map<String, String> response = entry.getPlugin().process(outputs);
// put the response map in the queue for the database to read
queue.offer(response);
}
});
}
Future<?> future = executor.submit(new Runnable () {
public void run() {
Map<String, String> map;
try {
while(true) {
// blocks until a map is available in the queue, or until interrupted
map = queue.take();
// write map to database
System.out.println(map);
}
} catch (InterruptedException ex) {
// IF we're catching InterruptedException then this means that future.cancel(true)
// was called, which means that the plugin processors are finished;
// process the rest of the queue and then exit
while((map = queue.poll()) != null) {
// write map to database
System.out.println(map);
}
}
}
});
// this interrupts the database thread, which sends it into its catch block
// where it processes the rest of the queue and exits
future.cancel(true); // interrupt database thread
// wait for the threads to finish
try {
executor.awaitTermination(5, TimeUnit.MINUTES);
} catch (InterruptedException e) {
//log error here
}
}
But I have not been able to add a timeout feature to this yet. Also, if I run the code above as it is, it does not work. Am I missing anything?
Can anybody help me with this?
This is a BASIC example, partially based on the solution presented in "ExecutorService that interrupts tasks after a timeout".
You will have to figure out the best way to implement this in your own code. Use it only as a guide!
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
public class ExecutorExample {
// This is used to "expire" long running tasks
protected static final ScheduledExecutorService EXPIRE_SERVICE = Executors.newScheduledThreadPool(1);
// This is used to manage the bundles and process them as required
protected static final ExecutorService BUNDLES_SERVICE = Executors.newFixedThreadPool(10);
public static void main(String[] args) {
// A list of the future tasks created by the BUNDLES_SERVICE.
// We need this so we can monitor the progress of the output
List<Future<String>> futureTasks = new ArrayList<>(100);
// This is a list of all the tasks that have either completed
// or been canceled...we want these so we can determine
// the results...
List<Future<String>> completedTasks = new ArrayList<>(100);
// Add all the Bundles to the BUNDLES_SERVICE
for (int index = 0; index < 100; index++) {
Bundle bundle = new Bundle();
// We need a reference to the future so we can cancel it if we
// need to
Future<String> futureBundle = BUNDLES_SERVICE.submit(bundle);
// Set this bundles future, see Bundle for details
bundle.setFuture(futureBundle);
// Add it to our monitor queue...
futureTasks.add(futureBundle);
}
// Basically we are going to move all completed/canceled bundles
// from the "active" to the completed list and wait until there
// are no more "active" tasks
while (futureTasks.size() > 0) {
try {
// Little bit of a pressure release...
Thread.sleep(1000);
} catch (InterruptedException ex) {
}
// Check all the bundles...
for (Future<String> future : futureTasks) {
// If it has completed or was cancelled, move it to the completed
// list. AFAIK, isDone will return true if isCancelled is true as well,
// but this illustrates the point
if (future.isCancelled() || future.isDone()) {
completedTasks.add(future);
}
}
// Remove all the completed tasks from the future tasks lists
futureTasks.removeAll(completedTasks);
// Some idea of progress...
System.out.println("Still have " + futureTasks.size() + " outstanding tasks...");
}
// Dump the results...
int index = 0;
for (Future<String> future : completedTasks) {
index++;
System.out.print("Task " + index);
if (future.isCancelled()) {
System.out.println(" was canceled");
} else if (future.isDone()) {
try {
System.out.println(" completed with " + future.get());
} catch (Exception ex) {
System.out.println(" failed because of " + ex.getMessage());
}
}
}
System.exit(0);
}
public static class ExpireBundle implements Runnable {
private final Future<?> futureBundle;
public ExpireBundle(Future<?> futureBundle) {
this.futureBundle = futureBundle;
}
@Override
public void run() {
futureBundle.cancel(true);
}
}
public static class Bundle implements Callable<String> {
private volatile Future<String> future;
@Override
public String call() throws Exception {
// This is the tricky bit. In order to cancel a task, we
// need to wait until it runs, but we also need its future...
// We could use another, single threaded queue to do the job
// but that's getting messy again and it won't provide the information
// we need back to the original calling thread that we are using
// to schedule and monitor the threads...
// We need to have a valid future before we can continue...
while (future == null) {
Thread.sleep(250);
}
// Schedule an expiry call for 5 seconds from NOW...this is important
// I originally thought about doing this when I schedule the original
// bundle, but that precluded the fact that some tasks would not
// have started yet...
EXPIRE_SERVICE.schedule(new ExpireBundle(future), 5, TimeUnit.SECONDS);
// Sleep for a random amount of time from 1-10 seconds
Thread.sleep((long) (Math.random() * 9000) + 1000);
return "Happy";
}
protected void setFuture(Future<String> future) {
this.future = future;
}
}
}
Also, I had thought of using ExecutorService#invokeAll to wait for the tasks to complete, but this precluded the ability to time out tasks. I don't like having to feed the Future into the Callable, but no other solution came to mind that would let me get the results from the submitted Future.
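For comparison, ExecutorService does have an invokeAll overload that takes a timeout. The sketch below uses it, with made-up names (InvokeAllTimeoutSketch, runAll) and plain strings standing in for bundle results. Note that this timeout bounds the whole batch, measured from the invokeAll call, rather than each task from the moment it starts, so it is not equivalent to the per-task expiry in the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class InvokeAllTimeoutSketch {
    /** Runs all bundles in parallel; bundles not finished within the timeout are cancelled. */
    static List<String> runAll(List<Callable<String>> bundles, long timeoutMillis) {
        int threads = Math.max(1, Math.min(bundles.size(), 10)); // cap the pool size
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> outcomes = new ArrayList<>();
        try {
            // Returns when all tasks are done or the timeout expires, whichever is first.
            List<Future<String>> futures =
                    pool.invokeAll(bundles, timeoutMillis, TimeUnit.MILLISECONDS);
            for (Future<String> future : futures) { // same order as the input list
                if (future.isCancelled()) {
                    outcomes.add("TIMED_OUT"); // log the slow bundle here
                    continue;
                }
                try {
                    outcomes.add(future.get());
                } catch (ExecutionException e) {
                    outcomes.add("FAILED: " + e.getCause());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return outcomes;
    }
}
```

Cancelled tasks are interrupted, so a bundle only stops early if its process method responds to interruption.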
I want to create a ThreadPoolExecutor such that when it has reached its maximum size and the queue is full, the submit() method blocks when trying to add new tasks. Do I need to implement a custom RejectedExecutionHandler for that or is there an existing way to do this using a standard Java library?
One of the possible solutions I've just found:
public class BoundedExecutor {
private final Executor exec;
private final Semaphore semaphore;
public BoundedExecutor(Executor exec, int bound) {
this.exec = exec;
this.semaphore = new Semaphore(bound);
}
public void submitTask(final Runnable command)
throws InterruptedException, RejectedExecutionException {
semaphore.acquire();
try {
exec.execute(new Runnable() {
public void run() {
try {
command.run();
} finally {
semaphore.release();
}
}
});
} catch (RejectedExecutionException e) {
semaphore.release();
throw e;
}
}
}
Are there any other solutions? I'd prefer something based on RejectedExecutionHandler since it seems like a standard way to handle such situations.
You can use a ThreadPoolExecutor with a bounded BlockingQueue:
public class ImageManager {
BlockingQueue<Runnable> blockingQueue = new ArrayBlockingQueue<Runnable>(blockQueueSize);
RejectedExecutionHandler rejectedExecutionHandler = new ThreadPoolExecutor.CallerRunsPolicy();
private ExecutorService executorService = new ThreadPoolExecutor(numOfThread, numOfThread,
0L, TimeUnit.MILLISECONDS, blockingQueue, rejectedExecutionHandler);
private void downloadThumbnail(String fileListPath) {
executorService.submit(new yourRunnable()); // yourRunnable stands for your own Runnable implementation
}
}
You should use the CallerRunsPolicy, which executes the rejected task in the calling thread. This way, it can't submit any new tasks to the executor until that task is done, at which point there will be some free pool threads or the process will repeat.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.CallerRunsPolicy.html
From the docs:
Rejected tasks
New tasks submitted in method execute(java.lang.Runnable) will be rejected when the Executor has been shut down, and also when the Executor uses finite bounds for both maximum threads and work queue capacity, and is saturated. In either case, the execute method invokes the RejectedExecutionHandler.rejectedExecution(java.lang.Runnable, java.util.concurrent.ThreadPoolExecutor) method of its RejectedExecutionHandler. Four predefined handler policies are provided:
In the default ThreadPoolExecutor.AbortPolicy, the handler throws a runtime RejectedExecutionException upon rejection.
In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute itself runs the task. This provides a simple feedback control mechanism that will slow down the rate that new tasks are submitted.
In ThreadPoolExecutor.DiscardPolicy, a task that cannot be executed is simply dropped.
In ThreadPoolExecutor.DiscardOldestPolicy, if the executor is not shut down, the task at the head of the work queue is dropped, and then execution is retried (which can fail again, causing this to be repeated.)
Also, make sure to use a bounded queue, such as ArrayBlockingQueue, when calling the ThreadPoolExecutor constructor. Otherwise, nothing will get rejected.
Edit: in response to your comment, set the size of the ArrayBlockingQueue to be equal to the max size of the thread pool and use the AbortPolicy.
Edit 2: Ok, I see what you're getting at. What about this: override the beforeExecute() method to check that getActiveCount() doesn't exceed getMaximumPoolSize(), and if it does, sleep and try again?
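To make the CallerRunsPolicy suggestion concrete, here is a small sketch with a bounded ArrayBlockingQueue. The pool and queue sizes, the 10 ms of simulated work and the class name are arbitrary choices for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CallerRunsSketch {
    /** Submits taskCount short tasks and returns how many of them actually ran. */
    static int run(int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(2),         // bounded, so the pool can saturate
                new ThreadPoolExecutor.CallerRunsPolicy());  // when saturated, the submitter runs the task
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(10); // simulated work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                completed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

No task is dropped: whenever both workers are busy and the queue is full, execute() runs the task in the submitting thread, which is what throttles the submission rate.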
I know it is a hack, but in my opinion it is the cleanest hack of those offered here ;-)
Because ThreadPoolExecutor calls the blocking queue's offer() rather than put(), let's override the behaviour of offer():
class BlockingQueueHack<T> extends ArrayBlockingQueue<T> {
BlockingQueueHack(int size) {
super(size);
}
public boolean offer(T task) {
try {
this.put(task);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
return true;
}
}
ThreadPoolExecutor tp = new ThreadPoolExecutor(1, 2, 1, TimeUnit.MINUTES, new BlockingQueueHack<Runnable>(5));
I tested it and it seems to work.
Implementing some timeout policy is left as a reader's exercise.
Hibernate has a BlockPolicy that is simple and may do what you want:
See: Executors.java
/**
* A handler for rejected tasks that will have the caller block until
* space is available.
*/
public static class BlockPolicy implements RejectedExecutionHandler {
/**
* Creates a <tt>BlockPolicy</tt>.
*/
public BlockPolicy() { }
/**
* Puts the Runnable to the blocking queue, effectively blocking
* the delegating thread until space is available.
* @param r the runnable task requested to be executed
* @param e the executor attempting to execute this task
*/
public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
try {
e.getQueue().put( r );
}
catch (InterruptedException e1) {
log.error( "Work discarded, thread was interrupted while waiting for space to schedule: {}", r );
}
}
}
The BoundedExecutor answer quoted above from Java Concurrency in Practice only works correctly if you use an unbounded queue for the Executor, or the semaphore bound is no greater than the queue size. The semaphore is state shared between the submitting thread and the threads in the pool, making it possible to saturate the executor even if queue size < bound <= (queue size + pool size).
Using CallerRunsPolicy only makes sense if your tasks are guaranteed to finish; if a task runs forever, your submitting thread will remain in rejectedExecution forever. It is also a bad idea if your tasks take a long time to run, because the submitting thread can't submit any new tasks or do anything else while it is running a task itself.
If that's not acceptable then I suggest checking the size of the executor's bounded queue before submitting a task. If the queue is full, then wait a short time before trying to submit again. The throughput will suffer, but I suggest it's a simpler solution than many of the other proposed solutions and you're guaranteed no tasks will get rejected.
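A rough sketch of the "check the queue, then submit" idea follows. It assumes a single submitting thread (with several submitters, the check and the execute() call could race and a task could still be rejected); the sizes and the 5 ms back-off are arbitrary.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PollingSubmitSketch {
    /** Submits taskCount short tasks from one thread and returns how many ran. */
    static int run(int taskCount) {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(4);
        ThreadPoolExecutor pool =
                new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS, queue);
        AtomicInteger completed = new AtomicInteger();
        try {
            for (int i = 0; i < taskCount; i++) {
                while (queue.remainingCapacity() == 0) {
                    Thread.sleep(5); // queue full: back off briefly, then check again
                }
                // Only this thread adds to the queue, so there is still room here.
                pool.execute(() -> {
                    try {
                        Thread.sleep(10); // simulated work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```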
The following class wraps around a ThreadPoolExecutor and uses a Semaphore to block when the work queue is full:
public final class BlockingExecutor {
private final Executor executor;
private final Semaphore semaphore;
public BlockingExecutor(int queueSize, int corePoolSize, int maxPoolSize, int keepAliveTime, TimeUnit unit, ThreadFactory factory) {
BlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>();
this.executor = new ThreadPoolExecutor(corePoolSize, maxPoolSize, keepAliveTime, unit, queue, factory);
this.semaphore = new Semaphore(queueSize + maxPoolSize);
}
private void execImpl (final Runnable command) throws InterruptedException {
semaphore.acquire();
try {
executor.execute(new Runnable() {
@Override
public void run() {
try {
command.run();
} finally {
semaphore.release();
}
}
});
} catch (RejectedExecutionException e) {
// will never be thrown with an unbounded buffer (LinkedBlockingQueue)
semaphore.release();
throw e;
}
}
public void execute (Runnable command) throws InterruptedException {
execImpl(command);
}
}
This wrapper class is based on a solution given in the book Java Concurrency in Practice by Brian Goetz. The solution in the book only takes two constructor parameters: an Executor and a bound used for the semaphore. This is shown in the answer given by Fixpoint. There is a problem with that approach: it can get in a state where the pool threads are busy, the queue is full, but the semaphore has just released a permit. (semaphore.release() in the finally block). In this state, a new task can grab the just released permit, but is rejected because the task queue is full. Of course this is not something you want; you want to block in this case.
To solve this, we must use an unbounded queue, as JCiP clearly mentions. The semaphore acts as a guard, giving the effect of a virtual queue size. This has the side effect that it is possible that the unit can contain maxPoolSize + virtualQueueSize + maxPoolSize tasks. Why is that? Because of the semaphore.release() in the finally block. If all pool threads call this statement at the same time, then maxPoolSize permits are released, allowing the same number of tasks to enter the unit. If we were using a bounded queue, it would still be full, resulting in a rejected task. Now, because we know that this only occurs when a pool thread is almost done, this is not a problem. We know that the pool thread will not block, so a task will soon be taken from the queue.
You are able to use a bounded queue though. Just make sure that its size equals virtualQueueSize + maxPoolSize. Larger sizes are useless, because the semaphore will not let more items in. Smaller sizes will result in rejected tasks, and the chance of a task getting rejected increases as the size decreases. For example, say you want a bounded executor with maxPoolSize=2 and virtualQueueSize=5. Then take a semaphore with 5+2=7 permits and an actual queue size of 5+2=7. The real number of tasks that can be in the unit is then 2+5+2=9. When the executor is full (5 tasks in queue, 2 in thread pool, so 0 permits available) and ALL pool threads release their permits, then exactly 2 permits can be taken by tasks coming in.
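As a rough sketch of the bounded-queue variant described above, something like the following should work. The class names (BoundedQueueDemo, BoundedQueueBlockingExecutor) and the runDemo helper are made up for illustration, and the sketch assumes corePoolSize equals maxPoolSize to keep it short. The queue capacity and the semaphore both use virtualQueueSize + poolSize, so with the numbers from the example that is 5 + 2 = 7.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; all names here are illustrative only.
class BoundedQueueDemo {

    static final class BoundedQueueBlockingExecutor {
        private final ThreadPoolExecutor executor;
        private final Semaphore semaphore;

        BoundedQueueBlockingExecutor(int poolSize, int virtualQueueSize) {
            // Assumption: core size == max size. Queue and semaphore share one bound.
            int bound = virtualQueueSize + poolSize; // 5 + 2 = 7 in the example
            this.executor = new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<Runnable>(bound));
            this.semaphore = new Semaphore(bound);
        }

        void execute(final Runnable command) throws InterruptedException {
            semaphore.acquire(); // blocks the caller once all permits are taken
            try {
                executor.execute(() -> {
                    try {
                        command.run();
                    } finally {
                        semaphore.release(); // frees a slot for a waiting caller
                    }
                });
            } catch (RejectedExecutionException e) {
                semaphore.release(); // only expected after shutdown
                throw e;
            }
        }

        void shutdownAndWait() throws InterruptedException {
            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    // Submits the given number of trivial tasks and returns how many completed.
    static int runDemo(int tasks) {
        BoundedQueueBlockingExecutor ex = new BoundedQueueBlockingExecutor(2, 5);
        AtomicInteger done = new AtomicInteger();
        try {
            for (int i = 0; i < tasks; i++) {
                ex.execute(done::incrementAndGet);
            }
            ex.shutdownAndWait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runDemo(100));
    }
}
```

Every queued task still holds its permit, so the queue can never hold more than 7 entries and a submission is not rejected for lack of space.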
Now the solution from JCiP is somewhat cumbersome to use as it doesn't enforce all these constraints (unbounded queue, or bounded with those math restrictions, etc.). I think that this only serves as a good example to demonstrate how you can build new thread safe classes based on the parts that are already available, but not as a full-grown, reusable class. I don't think that the latter was the author's intention.
You can use a custom RejectedExecutionHandler like this:
ThreadPoolExecutor tp= new ThreadPoolExecutor(core_size, // core size
max_handlers, // max size
timeout_in_seconds, // idle timeout
TimeUnit.SECONDS, queue, new RejectedExecutionHandler() {
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
// This will block if the queue is full
try {
executor.getQueue().put(r);
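} catch (InterruptedException e) {
// restore the interrupt flag so the caller can still see the interruption
Thread.currentThread().interrupt();
System.err.println(e.getMessage());
}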
}
});
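One thing to watch with a put-based handler: the executor also calls the handler when a task is submitted after shutdown(), and getQueue().put(r) would then park the task in a queue that may never be drained. Below is a minimal sketch of a guarded variant. BlockingPutPolicy, BlockingPutDemo and the helper methods are hypothetical names for illustration, not part of any library.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; all names here are illustrative only.
class BlockingPutDemo {

    static final class BlockingPutPolicy implements RejectedExecutionHandler {
        @Override
        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            if (executor.isShutdown()) {
                // Do not enqueue into a pool that will no longer pick up work.
                throw new RejectedExecutionException("executor has been shut down");
            }
            try {
                executor.getQueue().put(r); // blocks the submitter until there is space
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException("interrupted while waiting for queue space", e);
            }
        }
    }

    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(2), new BlockingPutPolicy());
    }

    // Submits trivial tasks from one thread and returns how many completed.
    static int runDemo(int tasks) {
        ThreadPoolExecutor pool = newPool();
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    // Returns true if a submission after shutdown() is rejected instead of queued.
    static boolean rejectsAfterShutdown() {
        ThreadPoolExecutor pool = newPool();
        pool.shutdown();
        try {
            pool.execute(() -> { });
            return false;
        } catch (RejectedExecutionException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runDemo(50));
        System.out.println("rejected after shutdown: " + rejectsAfterShutdown());
    }
}
```

The shutdown check is not atomic with the put, so a shutdown racing with a blocked submitter can still leave a task behind. The sketch also assumes the pool keeps at least one core thread alive to drain the queue.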
I don't always like the CallerRunsPolicy, especially since it allows the rejected task to 'skip the queue' and get executed before tasks that were submitted earlier. Moreover, executing the task on the calling thread might take much longer than waiting for the first slot to become available.
I solved this problem using a custom RejectedExecutionHandler, which simply blocks the calling thread for a little while and then tries to submit the task again:
public class BlockWhenQueueFull implements RejectedExecutionHandler {
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
// The pool is full. Wait, then try again.
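try {
long waitMs = 250;
Thread.sleep(waitMs);
} catch (InterruptedException interruptedException) {
// Restore the flag and give up rather than retrying with a pending interrupt.
Thread.currentThread().interrupt();
throw new RejectedExecutionException("Interrupted while waiting to resubmit", interruptedException);
}
executor.execute(r);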
}
}
This class can be used in the thread-pool executor as a RejectedExecutionHandler like any other, for example:
executorPool = new ThreadPoolExecutor(1, 1, 10,
TimeUnit.SECONDS, new SynchronousQueue<Runnable>(),
new BlockWhenQueueFull());
The only downside I see is that the calling thread might get locked slightly longer than strictly necessary (up to 250ms). Moreover, since this executor is effectively being called recursively, very long waits for a thread to become available (hours) might result in a stack overflow.
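If the recursion is a concern, one possible alternative is to wait inside a loop instead of calling execute again. Note that this swaps in a different technique: it hands the task to the work queue with a timed offer rather than resubmitting it. The sketch below uses hypothetical names (RetryOfferPolicy, RetryOfferDemo). It assumes the pool still has a live worker that will eventually poll the queue, which is the case when the rejection was caused by all workers being busy.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; all names here are illustrative only.
class RetryOfferDemo {

    static final class RetryOfferPolicy implements RejectedExecutionHandler {
        private final long waitMs;

        RetryOfferPolicy(long waitMs) {
            this.waitMs = waitMs;
        }

        @Override
        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            try {
                // Loop instead of recursing, so the stack depth stays constant.
                while (!executor.isShutdown()) {
                    if (executor.getQueue().offer(r, waitMs, TimeUnit.MILLISECONDS)) {
                        return; // a worker (or the queue) accepted the task
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException("interrupted while waiting for a free worker", e);
            }
            throw new RejectedExecutionException("executor has been shut down");
        }
    }

    // Submits short tasks to a single-thread pool and returns how many completed.
    static int runDemo(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 10, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(), new RetryOfferPolicy(250));
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(5); // simulate a little work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runDemo(20));
    }
}
```

With a SynchronousQueue the timed offer succeeds as soon as a worker comes back for more work, so the caller is released without waiting out the full 250 ms.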
Nevertheless, I personally like this method. It's compact, easy to understand, and works well.
Create your own blocking queue to be used by the Executor, with the blocking behavior you are looking for, while always returning available remaining capacity (ensuring the executor will not try to create more threads than its core pool, or trigger the rejection handler).
I believe this will get you the blocking behavior you are looking for. A rejection handler will never fit the bill, since that indicates the executor can not perform the task. What I could envision there is that you get some form of 'busy waiting' in the handler. That is not what you want, you want a queue for the executor that blocks the caller...
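A minimal sketch of such a queue, assuming it is enough to make offer() block by delegating to put(). BlockingOfferQueue and BlockingOfferDemo are hypothetical names, and the sketch does not touch remainingCapacity(). Because offer() no longer returns false in normal operation, the executor will never grow past its core pool size, so core and max are set to the same value here.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; all names here are illustrative only.
class BlockingOfferDemo {

    static final class BlockingOfferQueue<E> extends LinkedBlockingQueue<E> {
        private static final long serialVersionUID = 1L;

        BlockingOfferQueue(int capacity) {
            super(capacity);
        }

        // ThreadPoolExecutor.execute() enqueues via offer(); make it wait for space.
        @Override
        public boolean offer(E e) {
            try {
                put(e);
                return true;
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return false; // the executor then falls back to its rejection handling
            }
        }
    }

    // Submits trivial tasks and returns how many completed.
    static int runDemo(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS,
                new BlockingOfferQueue<Runnable>(3));
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet); // blocks while the queue holds 3 tasks
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runDemo(50));
    }
}
```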
To avoid the issues with @FixPoint's solution, one could use Guava's ListeningExecutorService and release the semaphore in onSuccess and onFailure inside a FutureCallback.
Recently I found this question having the same problem. The OP does not say so explicitly, but we do not want to use the RejectedExecutionHandler which executes a task on the submitter's thread, because this will under-utilize the worker threads if this task is a long running one.
Reading all the answers and comments, in particular the flawed solution with the semaphore or using afterExecute I had a closer look at the code of the ThreadPoolExecutor to see if there is some way out. I was amazed to see that there are more than 2000 lines of (commented) code, some of which make me feel dizzy. Given the rather simple requirement I actually have --- one producer, several consumers, let the producer block when no consumers can take work --- I decided to roll my own solution. It is not an ExecutorService but just an Executor. And it does not adapt the number of threads to the work load, but holds a fixed number of threads only, which also fits my requirements. Here is the code. Feel free to rant about it :-)
package x;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
/**
* distributes {@code Runnable}s to a fixed number of threads. To keep the
* code lean, this is not an {@code ExecutorService}. In particular there is
* only very simple support to shut this executor down.
*/
public class ParallelExecutor implements Executor {
// other bounded queues work as well and are useful to buffer peak loads
private final BlockingQueue<Runnable> workQueue =
new SynchronousQueue<Runnable>();
private final Thread[] threads;
/*+**********************************************************************/
/**
* creates the requested number of threads and starts them to wait for
* incoming work
*/
public ParallelExecutor(int numThreads) {
this.threads = new Thread[numThreads];
for(int i=0; i<numThreads; i++) {
// could reuse the same Runner all over, but keep it simple
Thread t = new Thread(new Runner());
this.threads[i] = t;
t.start();
}
}
/*+**********************************************************************/
/**
* returns immediately without waiting for the task to be finished, but may
* block if all worker threads are busy.
*
* @throws RejectedExecutionException if we got interrupted while waiting
* for a free worker
*/
@Override
public void execute(Runnable task) {
try {
workQueue.put(task);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new RejectedExecutionException("interrupt while waiting for a free "
+ "worker.", e);
}
}
/*+**********************************************************************/
/**
* Interrupts all workers and joins them. Tasks susceptible to an interrupt
* will preempt their work. Blocks until the last thread surrendered.
*/
public void interruptAndJoinAll() throws InterruptedException {
for(Thread t : threads) {
t.interrupt();
}
for(Thread t : threads) {
t.join();
}
}
/*+**********************************************************************/
private final class Runner implements Runnable {
@Override
public void run() {
while (!Thread.currentThread().isInterrupted()) {
Runnable task;
try {
task = workQueue.take();
} catch (InterruptedException e) {
// canonical handling despite exiting right away
Thread.currentThread().interrupt();
return;
}
try {
task.run();
} catch (RuntimeException e) {
// production code to use a logging framework
e.printStackTrace();
}
}
}
}
}
I believe there is a quite elegant way to solve this problem by using java.util.concurrent.Semaphore and delegating to the executor returned by Executors.newFixedThreadPool.
The new executor service will only execute new task when there is a thread to do so. Blocking is managed by Semaphore with number of permits equal to number of threads. When a task is finished it returns a permit.
public class FixedThreadBlockingExecutorService extends AbstractExecutorService {
private final ExecutorService executor;
private final Semaphore blockExecution;
public FixedThreadBlockingExecutorService(int nThreads) {
this.executor = Executors.newFixedThreadPool(nThreads);
blockExecution = new Semaphore(nThreads);
}
@Override
public void shutdown() {
executor.shutdown();
}
@Override
public List<Runnable> shutdownNow() {
return executor.shutdownNow();
}
@Override
public boolean isShutdown() {
return executor.isShutdown();
}
@Override
public boolean isTerminated() {
return executor.isTerminated();
}
@Override
public boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
return executor.awaitTermination(timeout, unit);
}
@Override
public void execute(Runnable command) {
blockExecution.acquireUninterruptibly();
executor.execute(() -> {
try {
command.run();
} finally {
blockExecution.release();
}
});
}
}
I had the same need in the past: a kind of blocking queue with a fixed size for each client backed by a shared thread pool. I ended up writing my own kind of ThreadPoolExecutor:
UserThreadPoolExecutor
(blocking queue (per client) + threadpool (shared amongst all clients))
See: https://github.com/d4rxh4wx/UserThreadPoolExecutor
Each UserThreadPoolExecutor is given a maximum number of threads from a shared ThreadPoolExecutor
Each UserThreadPoolExecutor can:
submit a task to the shared thread pool executor if its quota is not reached. If its quota is reached, the job is queued (non-consumptive blocking, waiting for CPU). Once one of its submitted tasks is completed, the quota usage is decremented, allowing another waiting task to be submitted to the ThreadPoolExecutor
wait for the remaining tasks to complete
I found this rejection policy in the Elasticsearch client. It blocks the caller thread on the blocking queue. The code is below:
static class ForceQueuePolicy implements XRejectedExecutionHandler
{
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor)
{
try
{
executor.getQueue().put(r);
}
catch (InterruptedException e)
{
//should never happen since we never wait
throw new EsRejectedExecutionException(e);
}
}
@Override
public long rejected()
{
return 0;
}
}
I recently had a need to achieve something similar, but on a ScheduledExecutorService.
I also had to handle the delay passed to the method, so that the task is either scheduled to execute at the time the caller expects or fails by throwing a RejectedExecutionException.
The other methods of ScheduledThreadPoolExecutor that execute or submit a task internally call #schedule, which in turn invokes the overridden methods.
import java.util.concurrent.*;
public class BlockingScheduler extends ScheduledThreadPoolExecutor {
private final Semaphore maxQueueSize;
public BlockingScheduler(int corePoolSize,
ThreadFactory threadFactory,
int maxQueueSize) {
super(corePoolSize, threadFactory, new AbortPolicy());
this.maxQueueSize = new Semaphore(maxQueueSize);
}
@Override
public ScheduledFuture<?> schedule(Runnable command,
long delay,
TimeUnit unit) {
final long newDelayInMs = beforeSchedule(command, unit.toMillis(delay));
return super.schedule(command, newDelayInMs, TimeUnit.MILLISECONDS);
}
@Override
public <V> ScheduledFuture<V> schedule(Callable<V> callable,
long delay,
TimeUnit unit) {
final long newDelayInMs = beforeSchedule(callable, unit.toMillis(delay));
return super.schedule(callable, newDelayInMs, TimeUnit.MILLISECONDS);
}
@Override
public ScheduledFuture<?> scheduleAtFixedRate(Runnable command,
long initialDelay,
long period,
TimeUnit unit) {
final long newDelayInMs = beforeSchedule(command, unit.toMillis(initialDelay));
return super.scheduleAtFixedRate(command, newDelayInMs, unit.toMillis(period), TimeUnit.MILLISECONDS);
}
@Override
public ScheduledFuture<?> scheduleWithFixedDelay(Runnable command,
long initialDelay,
long period,
TimeUnit unit) {
final long newDelayInMs = beforeSchedule(command, unit.toMillis(initialDelay));
return super.scheduleWithFixedDelay(command, newDelayInMs, unit.toMillis(period), TimeUnit.MILLISECONDS);
}
@Override
protected void afterExecute(Runnable runnable, Throwable t) {
super.afterExecute(runnable, t);
try {
if (t == null && runnable instanceof Future<?>) {
try {
((Future<?>) runnable).get();
} catch (CancellationException | ExecutionException e) {
t = e;
} catch (InterruptedException ie) {
Thread.currentThread().interrupt(); // ignore/reset
}
}
if (t != null) {
System.err.println(t);
}
} finally {
releaseQueueUsage();
}
}
private long beforeSchedule(Runnable runnable, long delay) {
try {
return getQueuePermitAndModifiedDelay(delay);
} catch (InterruptedException e) {
getRejectedExecutionHandler().rejectedExecution(runnable, this);
return 0;
}
}
private long beforeSchedule(Callable callable, long delay) {
try {
return getQueuePermitAndModifiedDelay(delay);
} catch (InterruptedException e) {
getRejectedExecutionHandler().rejectedExecution(new FutureTask(callable), this);
return 0;
}
}
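private long getQueuePermitAndModifiedDelay(long delay) throws InterruptedException {
final long beforeAcquireTimeStamp = System.currentTimeMillis();
// Wait at most 'delay' ms for a permit; reject if none became available in time.
if (!maxQueueSize.tryAcquire(delay, TimeUnit.MILLISECONDS)) {
throw new RejectedExecutionException("No queue permit available within " + delay + " ms");
}
final long waitedMs = System.currentTimeMillis() - beforeAcquireTimeStamp;
// Schedule with the remaining delay so the task still runs when the caller expects.
return Math.max(0, delay - waitedMs);
}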
private void releaseQueueUsage() {
maxQueueSize.release();
}
}
The code is available here; I would appreciate any feedback.
https://github.com/AmitabhAwasthi/BlockingScheduler