ThreadPoolExecutor on java doc - java

At http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html
you can read the description of the constructor parameters.
Specifically, in the "Core and maximum pool sizes" paragraph, it is written:
If there are more than corePoolSize but less than maximumPoolSize threads running, a new thread will be created only if the queue is full.
...
By setting maximumPoolSize to an essentially unbounded value such as Integer.MAX_VALUE, you allow the pool to accommodate an arbitrary number of concurrent tasks.
Now I can't understand what "only if the queue is full" in the first part means...
Will ThreadPoolExecutor wait until the queue is full, or will it simply create a new worker?
And suppose now that we have tasks that are not independent of one another: could using a ThreadPoolExecutor cause a deadlock? Suppose my first 10 tasks are producers and corePoolSize is 10; then the subsequent consumer tasks will go to the queue and won't run until the queue is full. If so, this behavior may cause a deadlock, because the first 10 producers could block waiting, suspending all 10 core threads.
When is the queue full?
I'm not sure I understood the documentation correctly, because Executors.newCachedThreadPool() seems to create a new worker until maxPoolSize is reached and only then send tasks to the queue.
I'm a little confused.
Thank you

When you construct the ThreadPoolExecutor, you pass in an instance of BlockingQueue<Runnable> called workQueue, to hold the tasks, and it is this queue that is being referred to.
In fact, the section of the docs called "Queuing" goes into more detail about the phrase you're confused about:
Any BlockingQueue may be used to transfer and hold submitted tasks. The use of this queue interacts with pool sizing:
If fewer than corePoolSize threads are running, the Executor always prefers adding a new thread rather than queuing.
If corePoolSize or more threads are running, the Executor always prefers queuing a request rather than adding a new thread.
If a request cannot be queued, a new thread is created unless this would exceed maximumPoolSize, in which case, the task will be rejected.
As for your second part, about inter-task dependencies - in this case I don't think it's a good idea to put them into an ExecutorService at all. The ExecutorService is good for running a self-contained bit of code at some point in the future, but by design it's not meant to be strongly deterministic about when this happens, other than "at some convenient point in the (hopefully near) future, after tasks that were previously queued have started."
Combine this lack of precision in timing with the hard ordering requirements that concurrent operation imposes, and you can see that putting a producer and a consumer that need to talk to each other into a general-purpose ExecutorService is a recipe for very annoying and confusing bugs.
Yes, I'm sure you could get it to work with sufficient tweaking of parameters. However, it wouldn't be clear why it worked, it wouldn't be clear what the dependencies were, and when (not if) it broke, it would be very hard to diagnose. (Harder than normal concurrency problems, I suspect). The bottom line is that an ExecutorService isn't designed to run Runnables with hard timing restrictions, so this could even be broken by a new release of Java, because it doesn't have to work like this.
I think you're asking the wrong question, by looking at the details when perhaps your concepts are a little shaky. Perhaps if you explained what you wanted to achieve there would be a better way to go about it.

To clarify your first quote - if the executor already has corePoolSize threads running, and all of those threads are busy when a new task is submitted, it will not create any more threads, but will enqueue the task until one of the threads becomes free. It will only create a new thread (up to the maxPoolSize limit) when the queue becomes full. If the queue is huge (i.e. bounded only by memory constraints) then no more than corePoolSize threads will ever be created.
Executors.newCachedThreadPool() will create an executor with a zero-capacity queue (a SynchronousQueue), so the queue is always full. This means that it will create a new thread (or re-use an idle one) as soon as you submit a task. As such, it's not a good demonstration of what the core/max/queue parameters are for.
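To make that interplay concrete, here is a minimal sketch; the pool sizes and queue capacity are chosen purely for illustration. With core = 2, max = 4 and a bounded queue of capacity 2, the first two tasks start core threads, the next two wait in the queue, the next two force extra threads up to the maximum, and the seventh is rejected:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizingDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 30, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));

        for (int i = 1; i <= 7; i++) {
            try {
                pool.execute(() -> {
                    try {
                        Thread.sleep(1000); // keep every thread busy
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                System.out.println("submitted task " + i
                        + ", pool size now " + pool.getPoolSize());
            } catch (RejectedExecutionException e) {
                // queue full and pool already at maximumPoolSize
                System.out.println("task " + i + " rejected");
            }
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}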

Related

Which blocking queue to use with ThreadPoolExecutor? Any advantages with fixed capacity LinkedBlockingDeque?

I see from the java docs -
ThreadPoolExecutor(int corePoolSize,
                   int maximumPoolSize,
                   long keepAliveTime,
                   TimeUnit unit,
                   BlockingQueue<Runnable> workQueue,
                   RejectedExecutionHandler handler)
Where -
workQueue – the queue to use for holding tasks before they are executed. This queue will hold only the Runnable tasks submitted by the execute method.
Now Java provides various types of blocking queues, and the Javadoc clearly says when to use which type of queue with a ThreadPoolExecutor:
Queuing
Any BlockingQueue may be used to transfer and hold submitted tasks. The use of this queue interacts with pool sizing:
If fewer than corePoolSize threads are running, the Executor always prefers adding a new thread rather than queuing.
If corePoolSize or more threads are running, the Executor always prefers queuing a request rather than adding a new thread.
If a request cannot be queued, a new thread is created unless this would exceed maximumPoolSize, in which case, the task will be rejected.
There are three general strategies for queuing:
Direct handoffs. A good default choice for a work queue is a SynchronousQueue that hands off tasks to threads without otherwise holding them. Here, an attempt to queue a task will fail if no threads are immediately available to run it, so a new thread will be constructed. This policy avoids lockups when handling sets of requests that might have internal dependencies. Direct handoffs generally require unbounded maximumPoolSizes to avoid rejection of new submitted tasks. This in turn admits the possibility of unbounded thread growth when commands continue to arrive on average faster than they can be processed.
Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.) This may be appropriate when each task is completely independent of others, so tasks cannot affect each others execution; for example, in a web page server. While this style of queuing can be useful in smoothing out transient bursts of requests, it admits the possibility of unbounded work queue growth when commands continue to arrive on average faster than they can be processed.
Bounded queues. A bounded queue (for example, an ArrayBlockingQueue) helps prevent resource exhaustion when used with finite maximumPoolSizes, but can be more difficult to tune and control. Queue sizes and maximum pool sizes may be traded off for each other: Using large queues and small pools minimizes CPU usage, OS resources, and context-switching overhead, but can lead to artificially low throughput. If tasks frequently block (for example if they are I/O bound), a system may be able to schedule time for more threads than you otherwise allow. Use of small queues generally requires larger pool sizes, which keeps CPUs busier but may encounter unacceptable scheduling overhead, which also decreases throughput.
Below is my Question -
I have seen code usages as below -
BlockingQueue<Runnable> workQueue = new LinkedBlockingDeque<>(90);
ExecutorService executorService = new ThreadPoolExecutor(1, 10, 30,
        TimeUnit.SECONDS, workQueue,
        new ThreadPoolExecutor.CallerRunsPolicy());
So, since the deque in the above code has a fixed capacity anyway, what advantage am I getting with LinkedBlockingDeque<>(90) when compared to the following?
LinkedBlockingQueue<>(90) - I just want to know about the deque's advantage over a queue in this case, not in general. How will the Executor benefit from a deque over a queue?
ArrayBlockingQueue<>(90) - (I see one can also specify fairness etc., but that is not of my current interest.) So why not just use an array over a deque (i.e. when using a deque of fixed capacity)?
LinkedBlockingQueue is an optionally-bounded blocking queue based on linked nodes. If you don't give it a capacity, it defaults to Integer.MAX_VALUE, i.e. it is effectively unbounded.
ArrayBlockingQueue is a bounded blocking queue in which a fixed-size array holds the elements.
In your case, there's no benefit anywhere. ArrayBlockingQueue may prove to be more efficient, as it uses a fixed-size array in a single memory span.
The difference between a Queue and a Deque is the access mechanism. A Queue is FIFO (first in, first out), while a Deque (double-ended queue) allows insertion and removal at both ends, so it can be used either FIFO or LIFO (last in, first out).
In FIFO, the first task inserted is the first one to be executed.
In LIFO, the last task inserted is the first one to be executed.
Consider the following: you want your tasks to be executed in the order they come in? Use FIFO. You want the most recently submitted task executed first? Use LIFO. Note, however, that ThreadPoolExecutor only uses the BlockingQueue interface of its work queue, so it always consumes tasks in FIFO order regardless of which implementation you pass, as the sketch below illustrates.
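Since the pool only ever calls BlockingQueue methods (offer, poll, take) on its work queue, the two constructions below behave identically from the executor's point of view; a minimal sketch:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueChoiceDemo {
    public static void main(String[] args) {
        // The pool sees only the BlockingQueue interface, so a capacity-90
        // deque and a capacity-90 array queue are interchangeable here;
        // FIFO ordering is preserved in both cases.
        ExecutorService withDeque = new ThreadPoolExecutor(1, 10, 30,
                TimeUnit.SECONDS, new LinkedBlockingDeque<>(90),
                new ThreadPoolExecutor.CallerRunsPolicy());
        ExecutorService withArray = new ThreadPoolExecutor(1, 10, 30,
                TimeUnit.SECONDS, new ArrayBlockingQueue<>(90),
                new ThreadPoolExecutor.CallerRunsPolicy());

        withDeque.shutdown();
        withArray.shutdown();
    }
}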
The main benefit is when you're using the thread pool to execute some kind of pipeline. As a rule of thumb, at each stage in a pipeline the queue is either almost always empty (the producers tend to be slower than the consumers) or almost always full (the producers tend to be faster).
If the producers are faster, and the application is meant to run indefinitely, then you need a fixed-size blocking queue to put "back pressure" on the producers. If there were no back pressure, the queue would keep growing until eventually something bad happened (e.g., the process runs out of memory, or the system breaks down because "tasks" spend too much time delayed in the queues).
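A minimal sketch of that back-pressure idea; the queue capacity of 10, the two consumer threads, and the CallerRunsPolicy are arbitrary illustrative choices, not a recommendation:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BackPressureDemo {
    public static void main(String[] args) throws InterruptedException {
        // Two slow consumers behind a small bounded queue. Once the queue is
        // full, CallerRunsPolicy makes the submitting (producer) thread run
        // the task itself, which throttles submission to the consumers' pace.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(2, 2,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(10),
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 100; i++) {
            final int id = i;
            pool.execute(() -> {
                try {
                    Thread.sleep(50); // simulate slow consumer work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("task " + id + " on "
                        + Thread.currentThread().getName());
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}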

What is the use case for unbounded queue in Java Executors?

The Executors factory methods in Java use an unbounded pending-task queue. For instance, Executors.newFixedThreadPool uses a new LinkedBlockingQueue, which has no limit on the number of tasks it accepts.
public static ExecutorService newFixedThreadPool(int nThreads) {
    return new ThreadPoolExecutor(nThreads, nThreads,
                                  0L, TimeUnit.MILLISECONDS,
                                  new LinkedBlockingQueue<Runnable>());
}
When a new task arrives and there is no thread available, it goes to the queue. Tasks can be added to the queue indefinitely, eventually causing an OutOfMemoryError.
What is the scenario for actually using this approach? Why didn't the Java creators use a bounded queue? I can't imagine a scenario where unbounded is better than bounded, but I may be missing something. Can someone provide a decent explanation? Best!
This is the default approach and the user can choose to change to a bounded queue.
Now maybe your question is: why is this the default?
It is actually harder to deal with bounded queues: what would you do if the queue is full? Drop the task and not accept it? Throw an exception and fail the entire process? Isn't that what would happen in the case of an OOM? All of these are decisions that need to be taken by a user who is accepting lots of long-running tasks, which is not the default Java user.
A use case for an unbounded queue could simply be when you only expect a small number of concurrent requests but you don't know exactly how many, or when you implement back pressure at a different stage of your application, for example by throttling your API requests.
You can reject tasks by using ArrayBlockingQueue (bounded blocking queue)
final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(100);
executorService = new ThreadPoolExecutor(n, n,
        0L, TimeUnit.MILLISECONDS,
        queue);
The code above is equivalent to Executors.newFixedThreadPool(n); however, instead of the default unlimited LinkedBlockingQueue we use an ArrayBlockingQueue with a fixed capacity of 100. This means that if 100 tasks are already queued (and n are being executed), a new task will be rejected with a RejectedExecutionException.
Tasks can be added to the queue indefinitely causing OutOfMemoryError
No. The queue is not really unbounded: for an "unbounded" LinkedBlockingQueue, the capacity is Integer.MAX_VALUE (2147483647). When there is not enough space, the RejectedExecutionHandler will handle newly arriving tasks. The default handler is AbortPolicy, which rejects new tasks outright by throwing a RejectedExecutionException.
I can't imagine a scenario where unbounded is better than bounded
Users might not care about the queue size, or they might just not want to limit the number of pending tasks.
If you do care about it, you can create a ThreadPoolExecutor with a custom constructor, as sketched below.
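For example, a minimal sketch of such a custom construction; the sizes and the DiscardOldestPolicy are placeholders, not recommendations:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CustomPool {
    public static ThreadPoolExecutor create() {
        // A bounded queue plus an explicit rejection policy makes behavior
        // under overload a deliberate decision instead of an eventual
        // OutOfMemoryError.
        return new ThreadPoolExecutor(
                4, 8,                           // core and maximum pool sizes
                60, TimeUnit.SECONDS,           // idle timeout for non-core threads
                new ArrayBlockingQueue<>(1000), // at most 1000 pending tasks
                new ThreadPoolExecutor.DiscardOldestPolicy()); // drop oldest on overflow
    }
}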
Since you're asking about the "use case", it's very simple: every time you have a lot of single tasks that you want finished eventually. Say you want to download hundreds of thousands of files? Create a download task for each, submit them to an ExecutorService, and wait for termination. The tasks will finish eventually since you don't add any more, so there's no reason for a limit. A minimal sketch of this pattern follows.
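In this sketch, downloadFile is a hypothetical placeholder for the real work, and the pool size and timeout are arbitrary:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchDownload {
    public static void downloadAll(List<String> urls) throws InterruptedException {
        // The unbounded queue is harmless here: the number of tasks is fixed
        // up front, so the queue can never grow without limit.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (String url : urls) {
            pool.submit(() -> downloadFile(url));
        }
        pool.shutdown();                          // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for the batch to drain
    }

    private static void downloadFile(String url) {
        System.out.println("downloading " + url); // placeholder for real download logic
    }
}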

ThreadPoolExecutor javadoc, three strategies for queuing and lockups

In the Oracle documentation for the ThreadPoolExecutor class it is written:
There are three general strategies for queuing:
Direct handoffs. A good default choice for a work queue is a SynchronousQueue that hands off tasks to threads without otherwise holding them. Here, an attempt to queue a task will fail if no threads are immediately available to run it, so a new thread will be constructed. This policy avoids lockups when handling sets of requests that might have internal dependencies. Direct handoffs generally require unbounded maximumPoolSizes to avoid rejection of new submitted tasks. This in turn admits the possibility of unbounded thread growth when commands continue to arrive on average faster than they can be processed.
Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.) This may be appropriate when each task is completely independent of others, so tasks cannot affect each others execution; for example, in a web page server. While this style of queuing can be useful in smoothing out transient bursts of requests, it admits the possibility of unbounded work queue growth when commands continue to arrive on average faster than they can be processed.
...
Why is the direct handoff strategy better at avoiding lockups than the unbounded queue strategy? Or do I understand it incorrectly?
Let's say you have corePoolSize = 1 and an unbounded queue. If the first task submits another task to the same pool and waits for its result, it will lock up indefinitely: the child task sits in the queue, but the only thread in the pool is busy waiting for it and will never pick it up. With a direct handoff, the child task cannot be queued, so a new thread is constructed to run it (provided maximumPoolSize allows it), and the lockup is avoided. A minimal sketch of the lockup follows this answer.
However, if a task is completely independent, there would be no reason to use direct handoff as far as preventing lockups is concerned.
This is just an example; internal dependency can mean a lot of different things.
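The sketch below assumes a fixed pool of one thread backed by an unbounded queue; this program never terminates, by design:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDeadlockDemo {
    public static void main(String[] args) throws Exception {
        // One thread, unbounded LinkedBlockingQueue under the hood.
        ExecutorService pool = Executors.newFixedThreadPool(1);

        Future<String> outer = pool.submit(() -> {
            // The child task goes into the queue, but the only thread in the
            // pool is right here, blocked waiting for it: a permanent lockup.
            Future<String> inner = pool.submit(() -> "never runs");
            return inner.get(); // deadlock
        });

        System.out.println(outer.get()); // never returns
    }
}

With a SynchronousQueue and a larger maximumPoolSize (a direct handoff), the child task would instead be handed to a newly constructed thread, and both tasks would complete.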

What is the exact behavior of ExecutorService.submit (in terms of queuing requests)?

I want to use an ExecutorService that uses a single thread, and I am now inserting requests (via submit) at a higher rate than that thread can deal with them. What happens?
I am specifically wondering:
Are there any guarantees on ordering - will tasks be executed in the exact same order?
Is there a (theoretical) limit on which the ExecutorService will start throwing away incoming requests?
Out of curiosity: what changes when the service is using a pool of threads?
(sure, I can assume that some queue might be used; and that the Oracle implementation just "does the right thing"; but I am actually wondering if there is a real "spec" somewhere that nails down the expected behavior)
If you created a fixed thread-pool ExecutorService with Executors.newFixedThreadPool(1); (or newSingleThreadExecutor()) then the Javadoc clearly specifies what happens.
Are there any guarantees on ordering - will tasks be executed in the exact same order?
A fixed thread pool uses a LinkedBlockingQueue to hold pending tasks. Such a queue implements a FIFO (first-in-first-out) strategy, so the order of execution is guaranteed.
Is there a (theoretical) limit on which the ExecutorService will start throwing away incoming requests?
Quoting the Javadoc:
If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available.
Every incoming request will be added to an unbounded queue, so there is no limit and no request will be rejected (the theoretical limit is Integer.MAX_VALUE).
Out of curiosity: what changes when the service is using a pool of threads?
If you mean "what changes if there is more than one thread in the fixed thread pool", then nothing. The queue will still have a FIFO nature and there will be no limit on it. Otherwise, it depends on how you create the thread pool.
I take it you are getting your ExecutorService via Executors.newSingleThreadExecutor()?
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
So:
Are there any guarantees on ordering - will tasks be executed in the exact same order?
Tasks are guaranteed to execute sequentially.
Is there a (theoretical) limit on which the ExecutorService will start throwing away incoming requests?
Operating off an unbounded queue. So as large as memory/the backing store of the queue will allow. Commonly Integer.MAX_VALUE.
Out of curiosity: what changes when the service is using a pool of threads?
Depends on how you create the ExecutorService. You could create it with a bounded queue if you wished, or with a queue that does not use FIFO ordering (such as a PriorityBlockingQueue). The documentation for ThreadPoolExecutor gives a good overview of the different options.
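A small sketch of the guarantees above; the single worker drains its unbounded queue in strict submission order even when tasks arrive faster than they can run (the task count and sleep time are arbitrary):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SubmissionOrderDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // Tasks pile up in the unbounded queue and are executed
        // strictly in FIFO (submission) order, one at a time.
        for (int i = 0; i < 5; i++) {
            final int id = i;
            executor.submit(() -> {
                System.out.println("running task " + id);
                try {
                    Thread.sleep(100); // simulate slow work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}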

ScheduledThreadPoolExecutor and corePoolSize 0?

I'd like to have a ScheduledThreadPoolExecutor which also stops the last thread if there is no work to do, and which creates threads (and keeps them alive for some time) when new tasks arrive. But once there is no more work to do, it should again discard all threads.
I naively created it as new ScheduledThreadPoolExecutor(0), but as a consequence, no thread is ever created and no scheduled task is ever executed.
Can anybody tell me whether I can achieve my goal without writing my own wrapper around the ScheduledThreadPoolExecutor?
Thanks in advance!
Actually you can do it, but it's non-obvious:
Create a new ScheduledThreadPoolExecutor
In the constructor, set the core threads to the maximum number of threads you want
Set the keepAliveTime of the executor
And at last, allow the core threads to time out
m_Executor = new ScheduledThreadPoolExecutor(16);
m_Executor.setKeepAliveTime(5, TimeUnit.SECONDS);
m_Executor.allowCoreThreadTimeOut(true);
This only works from Java 6 onwards, though, since allowCoreThreadTimeOut was added in Java 6.
I suspect that nothing provided in java.util.concurrent will do this for you, just because if you need a scheduled execution service, then you often have recurring tasks to perform. If you have a recurring task, then it usually makes more sense to just keep the same thread around and use it for the next recurrence of the task, rather than tearing down your thread and having to build a new one at the next recurrence.
Of course, a scheduled executor could be used for inserting delays between non-recurring tasks, or it could be used in cases where resources are so scarce and recurrence is so infrequent that it makes sense to tear down all your threads until new work arrives. So, I can see cases where your proposal would definitely make sense.
To implement this, I would consider trying to wrap a cached thread pool from Executors.newCachedThreadPool together with a single-threaded scheduled executor service (i.e. new ScheduledThreadPoolExecutor(1)). Tasks could be scheduled via the scheduled executor service, but the scheduled tasks would be wrapped in such a way that rather than having your single-threaded scheduled executor execute them, the single-threaded executor would hand them over to the cached thread pool for actual execution.
That compromise would give you a maximum of one thread running when there is absolutely no work to do, and it would give you as many threads as you need (within the limits of your system, of course) when there is lots of work to do.
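A minimal sketch of that compromise; HandoffScheduler is a made-up name for illustration, and note that the returned future completes when the handoff happens, not when the task itself finishes:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class HandoffScheduler {
    // The one thread that is always kept around; it only does the timing.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    // Grows on demand and discards threads that have been idle for 60 seconds.
    private final ExecutorService workers = Executors.newCachedThreadPool();

    public ScheduledFuture<?> schedule(Runnable task, long delay, TimeUnit unit) {
        // The scheduler thread merely hands the task over to the cached pool,
        // so long-running work never ties up the scheduling thread.
        return scheduler.schedule(() -> workers.execute(task), delay, unit);
    }

    public void shutdown() {
        scheduler.shutdown();
        workers.shutdown();
    }
}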
Reading the ThreadPoolExecutor javadocs might suggest that Alex V's solution is okay. However, doing so will result in unnecessarily creating and destroying threads, nothing like a cached thread pool. The ScheduledThreadPoolExecutor is not designed to work with a variable number of threads. Having looked at the source, I'm sure you'll end up spawning a new thread almost every time you submit a task. Joe's solution should work even if you are ONLY submitting delayed tasks.
PS. I'd monitor your threads to make sure you're not wasting resources in your current implementation.
This problem is a known bug in ScheduledThreadPoolExecutor (Bug ID 7091003) and has been fixed in Java 7u4. Though looking at the patch, the fix is that "at least one thread is started even if corePoolSize is 0."
