Java Concurrent Iteration: Divide and Conquer vs Runnable for each item

Java Concurrent Iteration: Divide and Conquer vs Runnable for each item - java

When I have hundreds of items to iterate through, and I have to do a computation-heavy operation to each one, I would take a "divide and conquer" approach. Essentially, I would take the processor count + 1, and divide those items into the same number of batches. And then I would execute each batch on a runnable in a cached thread pool. It seems to work well. My GUI task went from 20 seconds to 2 seconds, which is a much better experience for the user.
However, I was reading Brian Goetz' fine book on concurrency, and I noticed that for iterating through a list of items, he would take a totally different approach. He would kick off a Runnable for each item! Before, I always speculated this would be bad, especially on a cached thread pool which could create tons of threads. However each runnable would probably finish very quickly in the larger scope, and I understand the cached thread pool is very optimal for short tasks.
So which is the more accepted paradigm to iterate through computation-heavy items? Dividing into a fixed number of batches and giving each batch a runnable? Or kicking each item off in its own runnable? If the latter approach is optimal, is it okay to use a cached thread pool or is it better to use a bounded thread pool?

With batches you will always have to wait for the longest running batch (you are as fast as the slowest batch). "Divide and conquer" implies management overhead: doing administration for the dividing and monitoring the conquering.
Creating a task for each item is relative straightforward (no management), but you are right in that it may start hundreds of threads (unlikely, but it could happen) which will only slow things down (context switching) if the task does no/very few I/O and is mostly CPU intensive.
If the cached thread pool does not start hundreds of threads (see getLargestPoolSize), then by all means, use the cached thread pool. If too many threads are started then one alternative is to use a bounded thread pool. But a bounded thread pool needs some tuning/decisions: do you use an unbounded task queue or a bounded task queue with a CallerRunsPolicy for example?
On a side note: there is also the ForkJoinPool which is suitable for tasks that start sub-tasks.

Related

Why can't the core threads of a thread pool in Java be reused in the initial phase?

I recently had a question looking at the source code of ThreadPoolExecutor: If thread pool represents the reuse of existing threads to reduce the overhead of thread creation or destruction, why not reuse core threads in the initial phase? That is, when the current number of threads is less than the number of core threads, first check whether there are core threads that have completed the task, if so, reuse. Why not? Instead of creating a new thread before the number of core threads is reached, does this violate thread pool design principles?
The following is a partial comment on the addWorker() method in ThreadPoolExecutor
#param firstTask the task the new thread should run first (or null if none). Workers are created with an initial first task (in method execute()) to bypass queuing when there are fewer than corePoolSize threads (in which case we always start one), or when the queue is full (in which case we must bypass queue). Initially idle threads are usually created via prestartCoreThread or to replace other dying workers.

This was actually requested already: JDK-6452337. A core libraries developer has noted:
I like this idea, but ThreadPoolExecutor is already complicated enough.
Keep in mind that corePoolSize is an essential part of ThreadPoolExecutor and is saying how many workers are always active/idle at least. Reaching this number just naturally takes a very short time. You set corePoolSize according to your needs and it's expected that the workload will meet this number.
My assumption is that optimizing this "warm-up phase" – taking it for granted that this will actually increase efficiency – is not worth it. I can't quantify for you what additional complexity this optimization will bring, I'm not developing Java Core libraries, but I assume that it's not worth it.
You can think of it like that: The "warm-up phase" is constant while the thread pool will run for an undefined amount of time. In an ideal world, the initial phase actually should take no time at all, the workload should be there as you create the thread pool. So you are thinking about an optimization that optimizes something that is not the expected thread pool state.
The thread workers will have to be created at some point anyways. This optimization only delays the creation. Imagine you have a corePoolSize of 10, so there is the overhead of creating 10 threads at least. This overhead won't change if you do it later. Yes, resources are also taken later but here I'm asking if the thread pool is configured correctly in the first place: Is corePoolSize correct, does it meet the current workload?
Notice that ThreadPoolExecutor has methods like setCorePoolSize(int) and allowCoreThreadTimeOut(boolean) and more that allow you to configure the thread pool according to your needs.

one Thread Pool for my whole android application

like - network operation and bitmap manipulating an image loading and other kinds of work can I create a single TheadPoolExecuter for my whole application and execute on it.
if the answer is no -> why? and how to create thread pool for every single operation?
or if yes -> is performance problem occurs?
thanks in advance.

Both of approach have advantages and disadvantages.
In case of single thread pool (singleton implementation, I suppose):
➕ you have one entry point to submit background task
➕ it easily to implement and control life cycle
➖ if you have a lot of different quick tasks and some long running task, long running tasks may hold all thread in limited pool while user wait some quick action in UI
Different thread pools (one pool for one type of task):
➕ thread pool of long-running tasks can accumulate task while quick task can be executed in their own thread pool in-depend
➕ you know everything about tasks in your application - you can fine-tune pool size for every type of task, setup threads priority, initial stack size etc. with thread factory
➕ if you define thread group and thread name, it can help you in debug
➖ have different thread pools involve to hard control their life cycle
➖ this implementation will not give a lot of benefits in poor separation by tasks classes
Any case, you need some compromise and an assessment of the advantages

Talking teorically i think you can do that and according to oracle documentation should be improve your performance:
Thread pools address two different problems: they usually provide
improved performance when executing large numbers of asynchronous
tasks, due to reduced per-task invocation overhead, and they provide a
means of bounding and managing the resources, including threads,
consumed when executing a collection of tasks. Each ThreadPoolExecutor
also maintains some basic statistics, such as the number of completed
tasks.

How do I know if Fork and Join has enough pool size in Java?

I am trying to implement a divide-and-conquer solution to some large data. I use fork and join to break down things into threads. However I have a question regarding the fork mechanism: if I set my divide and conquer condition as:
#Override
protected SomeClass compute(){
if (list.size()<LIMIT){
//Do something here
...
}else{
//Divide the list and invoke sub-threads
SomeRecursiveTaskClass subWorker1 = new SomeRecursiveTaskClass(list.subList());
SomeRecursiveTaskClass subWorker2 = new SomeRecursiveTaskClass(list.subList());
invokeAll(subWorker1, subWorker2);
...
}
}
What will happen if there is not enough resource to invoke subWorker (e.g. not enough thread in pool)? Does Fork/Join framework maintains a pool size for available threads? Or should I add this condition into my divide-and-conquer logic?

Each ForkJoinPool has a configured target parallelism. This isn’t exactly matching the number of threads, i.e. if a worker thread is going to wait via a ManagedBlocker, the pool may start even more threads to compensate. The parallelism of the commonPool defaults to “number of CPU cores minus one”, so when incorporating the initiating non-pool thread as helper, the resulting parallelism will utilize all CPU cores.
When you submit more jobs than threads, they will be enqueued. Enqueuing a few jobs can help utilizing the threads, as not all jobs may run exactly the same time, so threads running out of work may steal jobs from other threads, but splitting the work too much may create an unnecessary overhead.
Therefore, you may use ForkJoinTask.getSurplusQueuedTaskCount() to get the current number of pending jobs that are unlikely to be stolen by other threads and split only when it is below a small threshold. As its documentation states:
This value may be useful for heuristic decisions about whether to fork other tasks. In many usages of ForkJoinTasks, at steady state, each worker should aim to maintain a small constant surplus (for example, 3) of tasks, and to process computations locally if this threshold is exceeded.
So this is the condition to decide whether to split your jobs further. Since this number reflects when idle threads steal your created jobs, it will cause balancing when the jobs have different CPU load. Also, it works the other way round, if the pool is shared (like the common pool) and threads are already busy, they will not pick up your jobs, the surplus count will stay high and you will automatically stop splitting then.

POC (Proof of concept) of ThreadPools with Executors

Can anybody explain with examples about why should we use Thread-pools.
I have know about use of threadpools with Executors theoretically.
I have gone through number of tutorials, but I didn't get any practically examples about why should we use Threadpools, it can be newFixedThreadPool or newCachedThreadPool or newSingleThreadExecutor
in terms of scalability and performance .
If anybody explain me with respect to performance and scalability with examples about it?

First off, check this description of thread pools that I wrote yesterday: Android Thread Pool to manage multiple bluetooth handeling threads? (ok, it was about android but it's the same for classic java).
The main use I always seem to find for using a threadpool is that is very nicely manages a very common problem: producer-consumer. In this pattern, someone needs to constantly send work items (the producer) to be processed by someone else (the consumers). The work items are obtained from some stream-like source, like a socket, a database, or a collection of disk files, and needs multiple workers in order to be processed efficiently. The main components identifiable here are:
the producer: a thread that keeps posting jobs
a queue where the jobs are posted
the consumers: worker threads that take jobs from the queue and execute them
In addition to this, synchronization needs to be employed to make all this work correctly, since reading and writing to the queue without synchronization can lead to corrupted and inconsistent data. Also, we need to make the system efficient, since the consumers should not waste CPU cycles when there is nothing to do.
Now this pattern is very common, but to implement it from scratch it takes a considerable effort, which is error prone and needs to be carefully reviewed.
The solution is the thread pool. It very conveniently manages the work queue, the consumer threads and all the synchronization needed. All you need to do is play the role of the producer and feed the pool with tasks!

I would start with a problem and only then try to find a solution for it.
If you start the way you have, you can have a solution looking for a problem to solve and you are likely to use it inappropriately.
If you can't think of a use for thread pools, don't use them. ;)
A common mistake people make is to assume that because they have lots of cpus now, they have to use them all as if this were a reason in itself. Its like saying I have lots of disk space, I must find a way to use all of it.
A good reason to use thread pools is to improve the performance of CPU bounds processes and the simplicity of IO bound processes (rather than using non-blocking IO with one thread)
If you have a busy CPU bound process which performs tasks which can be executed independently you have a good use case for a thread pool.
Note: Thread pool often has just one thread. There are specific static factories for these. If you want a simple background worker, this may be an option.
Note 2: A common mistake is to assume that a CPU bound tasks will run best on hundreds or thousands of threads. The optimial number of threads can be the number of core or cpus you have. Once all these are busy, you may find additional threads just add overhead.

Initializing a new thread (and its own stack) is a costly operation.
Thread pools are use to avoid this cost by reusing threads already created. Thus using thread pools you get better performance then creating new threads every time.
Also note that created threads might need to be "deleted" after they have been used, which increases the cost of garbage collection and the frequency it will happen (as the memory fills up faster).
This analysis is just from the performance point of view. I cannot think of an advantage of using thread pools in terms of scalability at the moment.

I googled "why use java thread pools" and found:
A thread pool offers a solution to both the problem of thread
life-cycle overhead and the problem of resource thrashing.
http://www.ibm.com/developerworks/library/j-jtp0730/index.html
and
The newCachedThreadPool method creates an executor with an expandable
thread pool. This executor is suitable for applications that launch
many short-lived tasks.
The newSingleThreadExecutor method creates an
executor that executes a single task at a time.
http://docs.oracle.com/javase/tutorial/essential/concurrency/pools.html

Recursively adding threads to a Java thread pool

I am working on a tutorial for my Java concurrency course. The objective is to use thread pools to compute prime numbers in parallel.
The design is based on the Sieve of Eratosthenes. It has an array of n bools, where n is the largest integer you are checking, and each element in the array represents one integer. True is prime, false is non prime, and the array is initially all true.
A thread pool is used with a fixed number of threads (we are supposed to experiment with the number of threads in the pool and observe the performance).
A thread is given a integer multiple to process. The thread then finds the first true element in the array that is not a multiple of thread's integer. The thread then creates a new thread on the thread pool which is given the found number.
After a new thread is formed, the existing thread then continues to set all multiples of it's integer in the array to false.
The main program thread starts the first thread with the integer '2', and then waits for all spawned threads to finish. It then spits out the prime numbers and the time taken to compute.
The issue I have is that the more threads there are in the thread pool, the slower it takes with 1 thread being the fastest. It should be getting faster not slower!
All the stuff on the internet about Java thread pools create n worker threads the main thread then wait for all threads to finish. The method I use is recursive as a worker can spawn more worker threads.
I would like to know what is going wrong, and if Java thread pools can be used recursively.

Your solution may run slower as threads are added for some of following problems:
Thread creation overheads: creating a thread is expensive.
Processor contention: if there are more threads than there are processors to execute them, some of threads will be suspended waiting for a free processor. The result is that the average processing rate for each thread drops. Also, the OS then needs to time-slice the threads, and that takes away time that would otherwise be used for "real" work.
Virtual memory contention: each thread needs memory for its stack. If your machine doesn't have enough physical memory for the workload, each new thread stack increases virtual memory contention which results in paging which slows things down
Cache contention: each thread will (presumably) be scanning a different section of the array, resulting in memory cache misses. This slows down memory accesses.
Lock contention: if your threads are all reading and updating a shared array and using synchronized and one lock object to control access to the array, you could be suffering from lock contention. If a single lock object is used, each thread will spend most of its time waiting to acquire the lock. The net result is that the computation is effectively serialized, and the overall processing rate drops to the rate of a single processor / thread.
The first four problems are inherent to multi-threading, and there are no real solutions ... apart from not creating too many threads and reusing the ones that you have already created. However, there are a number of ways to attack the lock contention problem. For example,
Recode the application so that each thread scans for multiple integers, but in its own section of the array. This will eliminate lock contention on the arrays, though you will then need a way to tell each thread what to do, and that needs to be designed with contention in mind.
Create an array of locks for different regions of the array, and have the threads pick the lock to used based on the region of the array they are operating on. You would still get contention, but on average you should get less contention.
Design and implement a lockless solution. This would entail DEEP UNDERSTANDING of the Java memory model. And it would be very difficult to prove / demonstrate that a lockless solution does not contain subtle concurrency flaws.
Finally, recursive creation of threads is probably a mistake, since it will make it harder to implement thread reuse and the anti-lock-contention measures.

How many processors are available on your system? If #threads > #processors, adding more threads is going to slow things down for a compute-bound task like this.
Remember no matter how many threads you start, they're still all sharing the same CPU(s). The more time the CPU spends switching between threads, the less time it can be doing actual work.
Also note that the cost of starting a thread is significant compared to the cost of checking a prime - you can probably do hundreds or thousands of multiplications in the time it takes to fire up 1 thread.

The key point of a thread pool is to keep a set of thread alive and re-use them to process tasks. Usually the pattern is to have a queue of tasks and randomly pick one thread from the pool to process it. If there is no free thread and the pool is full, just wait.
The problem you designed is not a good one to be solved by a thread pool, because you need threads to run in order. Correct me if I'm wrong here.
thread #1: set 2's multiple to false
thread #2: find 3, set 3's multiple to false
thread #3: find 5, set 5's multiple to false
thread #4: find 7, set 7's multiple to false
....
These threads need to be run in order and they're interleaving (how the runtime schedules them) matters.
For example, if thread #3 starts running before thread #1 sets "4" to false, it will find "4" and continue to reset 4's multiples. This ends up doing a lot of extra work, although the final result will be correct.

Restructure your program to create a fixed ThreadPoolExecutor in advance. Make sure you call ThreadPoolExecutor#prestartAllCoreThreads(). Have your main method submit a task for the integer 2. Each task will submit another task. Since you are using a thread pool, you won't be creating and terminating a bunch of threads, but instead allowing the same threads to take on new tasks as they become available. This will reduce on overall execution overhead.
You should discover that in this case the optimum number of threads is equal to the number of processors (P) on the machine. It is often the case that the optimum number of threads is P+1. This is because P+1 minimizes overhead from context switching while also minimizing loss from idle/blocking time.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.