ForkJoinPool scheduling vs ExecutorService

ForkJoinPool scheduling vs ExecutorService - java

I'm slightly confused by the internal scheduling mechanism of the ExecutorService and the ForkJoinPool.
I understand the ExecutorService scheduling is done this way.
A bunch of tasks are queued. Once a thread is available it will handle the first available task and so forth.
Meanwhile, a ForkJoinPool is presented as distinct because it uses a work-stealing algorithm. If I understand correctly it means a thread can steal some tasks from another thread.
Yet, I don't really understand the difference between the mechanism implemented in ExecutorService and in ForkJoinPool. From my understanding, both mechanisms should reduce the idle time of each thread as much as possible.
I would understand if in the case of an ExecutorService, each thread would have its own queue. Yet, it is not the case as the queue is shared by the different threads of the pool...
Any clarification would be more than welcome!

Suppose you have a very big array of ints and you want to add all of them. With an ExecutorService you might say: let's divide that array into chunks of let's say number of threads / 4. So if you have an array of 160 elements (and you have 4 CPUs), you create 160 / 4 / 4 = 10, so you would create 16 chunks each holding 10 ints. Create runnables/callables and submit those to an executor service (and of course think of a way to merge those results once they are done).
Now your hopes are that each of the CPUs will take 4 of those tasks and work on them. Now let's also suppose that some of the numbers are very complicated to add (of course not, but bear with me), it could turn out that 3 threads/CPUs are done with their work while one of them is busy only with the first chunk. No one wants that, of course, but could happen. The bad thing now is that you can't do anything about it.
What ForkJoinPool does instead is say provide me with how you want to split your task and the implementation for the minimal workload I have to do and I'll take care of the rest. In the Stream API this is done with Spliterators; mainly with two methods trySplit (that either returns null meaning nothing can be split more or a new Spliterator - meaning a new chunk) and forEachRemaning that will process elements once you can't split your task anymore. And this is where work stealing will help you.
You say how your chunks are computed (usually split in half) and what to do when you can't split anymore. ForkJoinPool will dispatch the first chunk to all threads and when some of them are free - they are done with their work, they can query other queues from other threads and see if they have work. If they notice that there are chunks in some other threads queues, they will take them, split them on their own and work on those. It can even turn out that they don't do the entire work on that chunks on their own - some other thread can now query this thread's queue and notice that there is still work to do and so on... This is far better as now, when those 3 threads are free they can pick up some other work to do - and all of them are busy.
This example is a bit simplified, but is not very far from reality. It's just that you need to have a lot more chunks than CPU's/threads for work stealing to work; thus usually trySplit has to have a smart implementation and you need lots of elements in the source of your stream.

Related

Best approach: tree set structure vs thread pool executor

Guys I'm in bit dilemma between Tree Set and Thread Pool Executor
Following is the scenario :
First Approach
I have to use structure which has tasks in it with priorities of each task.Now based on treeset constructor(with comparator interface)
I can compare task on priorities and based on that, tasks are ordered properly.
Now after that, tasks should processed in order of priority through iteration of tree set and execute each task one by one.
Second Approach
second approach is to do some sort of logic building and use core functionality of Thread pool executor and for this I had taken inspiration from this link and I had achieved my requirements with this approach also which will choose high priority task first and execute it first and same way it will execute all the tasks.
Now my confusion here is which one is best to use in term of performance costs, flexibility(increase/decrease threads) etc and why should I opt for it?
Any suggestions and answers are highly appreciated.

There are two different notions of priority embedded in your question:
starting priority: in which order tasks are submitted for execution, (point 1 of your first approach explanation)
runtime priority: in which order threads are considered for scheduling (point 3)
These two properties happen to be equal in your scenario, so the tree set will help you define both of them. The executor will help you enforce them, but you will need an ad-hoc tailored executor (based on thread pooling or not), to start your threads up with a specific priority. Basically, each time a task is pulled out of the priority queue, it should be associated with a thread set at the task's priority level. I assume that this is the feature that the executor implementation found in the article you link is providing, and thus what you do.
As for thread pools, from the documentation:
Using worker threads minimizes the overhead due to thread creation. Thread objects use a significant amount of memory, and in a large-scale application, allocating and deallocating many thread objects creates a significant memory management overhead.
Worker threads are threads managed by threadpools, and are conservatively recycled (as opposed to destroyed and recreated), to handle sequences of tasks. I Don't think it matters much with regard to priority handling, but it will optimise your usage of resources.
Regarding the implementation from the article, the code uses a simple blocking deque for handling incoming tasks, hence it's a plain fifo priority scheme. It doesn't reorder tasks.

Finally got the real winner out of this two. I should select for Thread pool Executor because of following reasons
Performance cost: Here if we see, using a resource maximum is main motive to get performance during heavy load.So if we use threads in this high time it will be providing high performance as an advantage of multi-threading .
Flexibility:Flexibility in terms of scalable use of resources i.e during low time we can reduce number of worker threads in thread pool executor architecture and vice versa.
Less number of iterations and minimal updates:If we maintain tree set every time, it will check with the help of comparator interface though it has complexity O(logn) but after that we have to fetch it and it will become a sequential flow of single source so we will not multi-threaded environment advantage.
Faster processing:With the help of threading architecture we can achieve faster output.
etc were the reasons which I pointed out during a heavy brain storming,googling and last but not the least stack Overflow searching. Thank you all for your humble support and huge appreciation to #didierc for getting me clear over it.

You can try DelayedQueue in ordinary threadpool.
ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(size, size, 0, TimeUnit.DAYS, new DelayQueue<>());
threadPoolExecutor.execute(runnable);
Runnable should be implements Comparable . So In this implementation , priority will taken care by delayedqueue.
This approach will be easier to implement.

Java Concurrent Iteration: Divide and Conquer vs Runnable for each item

When I have hundreds of items to iterate through, and I have to do a computation-heavy operation to each one, I would take a "divide and conquer" approach. Essentially, I would take the processor count + 1, and divide those items into the same number of batches. And then I would execute each batch on a runnable in a cached thread pool. It seems to work well. My GUI task went from 20 seconds to 2 seconds, which is a much better experience for the user.
However, I was reading Brian Goetz' fine book on concurrency, and I noticed that for iterating through a list of items, he would take a totally different approach. He would kick off a Runnable for each item! Before, I always speculated this would be bad, especially on a cached thread pool which could create tons of threads. However each runnable would probably finish very quickly in the larger scope, and I understand the cached thread pool is very optimal for short tasks.
So which is the more accepted paradigm to iterate through computation-heavy items? Dividing into a fixed number of batches and giving each batch a runnable? Or kicking each item off in its own runnable? If the latter approach is optimal, is it okay to use a cached thread pool or is it better to use a bounded thread pool?

With batches you will always have to wait for the longest running batch (you are as fast as the slowest batch). "Divide and conquer" implies management overhead: doing administration for the dividing and monitoring the conquering.
Creating a task for each item is relative straightforward (no management), but you are right in that it may start hundreds of threads (unlikely, but it could happen) which will only slow things down (context switching) if the task does no/very few I/O and is mostly CPU intensive.
If the cached thread pool does not start hundreds of threads (see getLargestPoolSize), then by all means, use the cached thread pool. If too many threads are started then one alternative is to use a bounded thread pool. But a bounded thread pool needs some tuning/decisions: do you use an unbounded task queue or a bounded task queue with a CallerRunsPolicy for example?
On a side note: there is also the ForkJoinPool which is suitable for tasks that start sub-tasks.

POC (Proof of concept) of ThreadPools with Executors

Can anybody explain with examples about why should we use Thread-pools.
I have know about use of threadpools with Executors theoretically.
I have gone through number of tutorials, but I didn't get any practically examples about why should we use Threadpools, it can be newFixedThreadPool or newCachedThreadPool or newSingleThreadExecutor
in terms of scalability and performance .
If anybody explain me with respect to performance and scalability with examples about it?

First off, check this description of thread pools that I wrote yesterday: Android Thread Pool to manage multiple bluetooth handeling threads? (ok, it was about android but it's the same for classic java).
The main use I always seem to find for using a threadpool is that is very nicely manages a very common problem: producer-consumer. In this pattern, someone needs to constantly send work items (the producer) to be processed by someone else (the consumers). The work items are obtained from some stream-like source, like a socket, a database, or a collection of disk files, and needs multiple workers in order to be processed efficiently. The main components identifiable here are:
the producer: a thread that keeps posting jobs
a queue where the jobs are posted
the consumers: worker threads that take jobs from the queue and execute them
In addition to this, synchronization needs to be employed to make all this work correctly, since reading and writing to the queue without synchronization can lead to corrupted and inconsistent data. Also, we need to make the system efficient, since the consumers should not waste CPU cycles when there is nothing to do.
Now this pattern is very common, but to implement it from scratch it takes a considerable effort, which is error prone and needs to be carefully reviewed.
The solution is the thread pool. It very conveniently manages the work queue, the consumer threads and all the synchronization needed. All you need to do is play the role of the producer and feed the pool with tasks!

I would start with a problem and only then try to find a solution for it.
If you start the way you have, you can have a solution looking for a problem to solve and you are likely to use it inappropriately.
If you can't think of a use for thread pools, don't use them. ;)
A common mistake people make is to assume that because they have lots of cpus now, they have to use them all as if this were a reason in itself. Its like saying I have lots of disk space, I must find a way to use all of it.
A good reason to use thread pools is to improve the performance of CPU bounds processes and the simplicity of IO bound processes (rather than using non-blocking IO with one thread)
If you have a busy CPU bound process which performs tasks which can be executed independently you have a good use case for a thread pool.
Note: Thread pool often has just one thread. There are specific static factories for these. If you want a simple background worker, this may be an option.
Note 2: A common mistake is to assume that a CPU bound tasks will run best on hundreds or thousands of threads. The optimial number of threads can be the number of core or cpus you have. Once all these are busy, you may find additional threads just add overhead.

Initializing a new thread (and its own stack) is a costly operation.
Thread pools are use to avoid this cost by reusing threads already created. Thus using thread pools you get better performance then creating new threads every time.
Also note that created threads might need to be "deleted" after they have been used, which increases the cost of garbage collection and the frequency it will happen (as the memory fills up faster).
This analysis is just from the performance point of view. I cannot think of an advantage of using thread pools in terms of scalability at the moment.

I googled "why use java thread pools" and found:
A thread pool offers a solution to both the problem of thread
life-cycle overhead and the problem of resource thrashing.
http://www.ibm.com/developerworks/library/j-jtp0730/index.html
and
The newCachedThreadPool method creates an executor with an expandable
thread pool. This executor is suitable for applications that launch
many short-lived tasks.
The newSingleThreadExecutor method creates an
executor that executes a single task at a time.
http://docs.oracle.com/javase/tutorial/essential/concurrency/pools.html

Spawning tons of threads without running out of memory

I have a multi-threaded application which creates hundreds of threads on the fly. When the JVM has less memory available than necessary to create the next Thread, it's unable to create more threads. Every thread lives for 1-3 minutes. Is there a way, if I create a thread and don't start it, the application can be made to automatically start it when it has resources, and otherwise wait until existing threads die?

You're responsible for checking your available memory before allocating more resources, if you're running close to your limit. One way to do this is to use the MemoryUsage class, or use one of:
Runtime.getRuntime().totalMemory()
Runtime.getRuntime().freeMemory()
...to see how much memory is available. To figure out how much is used, of course, you just subtract total from free. Then, in your app, simply set a MAX_MEMORY_USAGE value that, when your app has used that amount or more memory, it stops creating more threads until the amount of used memory has dropped back below this threshold. This way you're always running with the maximum number of threads, and not exceeding memory available.
Finally, instead of trying to create threads without starting them (because once you've created the Thread object, you're already taking up the memory), simply do one of the following:
Keep a queue of things that need to be done, and create a new thread for those things as memory becomes available
Use a "thread pool", let's say a max of 128 threads, as all your "workers". When a worker thread is done with a job, it simply checks the pending work queue to see if anything is waiting to be done, and if so, it removes that job from the queue and starts work.

I ran into a similar issue recently and I used the NotifyingBlockingThreadPoolExecutor solution described at this site:
http://today.java.net/pub/a/today/2008/10/23/creating-a-notifying-blocking-thread-pool-executor.html
The basic idea is that this NotifyingBlockingThreadPoolExecutor will execute tasks in parallel like the ThreadPoolExecutor, but if you try to add a task and there are no threads available, it will wait. It allowed me to keep the code with the simple "create all the tasks I need as soon as I need them" approach while avoiding huge overhead of waiting tasks instantiated all at once.
It's unclear from your question, but if you're using straight threads instead of Executors and Runnables, you should be learning about java.util.concurrent package and using that instead: http://docs.oracle.com/javase/tutorial/essential/concurrency/executors.html

Just write code to do exactly what you want. Your question describes a recipe for a solution, just implement that recipe. Also, you should give serious thought to re-architecting. You only need a thread for things you want to do concurrently and you can't usefully do hundreds of things concurrently.

This is an alternative, lower level solution Then the above mentioed NotifyingBlocking executor - it is probably not as ideal but will be simple to implement
If you want alot of threads on standby, then you ultimately need a mechanism for them to know when its okay to "come to life". This sounds like a case for semaphores.
Make sure that each thread allocates no unnecessary memory before it starts working. Then implement as follows :
1) create n threads on startup of the application, stored in a queue. You can Base this n on the result of Runtime.getMemory(...), rather than hard coding it.
2) also, creat a semaphore with n-k permits. Again, base this onthe amount of memory available.
3) now, have each of n-k threads periodically check if the semaphore has permits, calling Thread.sleep(...) in between checks, for example.
4) if a thread notices a permit, then update the semaphore, and acquire the permit.
If this satisfies your needs, you can go on to manage your threads using a more sophisticated polling or wait/lock mechanism later.

Parallel-processing in Java; advice needed i.e. on Runnanble/Callable interfaces

Assume that I have a set of objects that need to be analyzed in two different ways, both of which take relatively long time and involve IO-calls, I am trying to figure out how/if I could go about optimizing this part of my software, especially utilizing the multiple processors (the machine i am sitting on for ex is a 8-core i7 which almost never goes above 10% load during execution).
I am quite new to parallel-programming or multi-threading (not sure what the right term is), so I have read some of the prior questions, particularly paying attention to highly voted and informative answers. I am also in the process of going through the Oracle/Sun tutorial on concurrency.
Here's what I thought out so far;
A thread-safe collection holds the objects to be analyzed
As soon as there are objects in the collection (they come a couple at a time from a series of queries), a thread per object is started
Each specific thread takes care of the initial pre-analysis preparations; and then calls on the analyses.
The two analyses are implemented as Runnables/Callables, and thus called on by the thread when necessary.
And my questions are:
Is this a reasonable scheme, if not, how would you go about doing this?
In order to make sure things don't get out of hand, should I implement a ThreadManager or some thing of that sort, which starts and stops threads, and re-distributes them when they are complete? For example, if i have 256 objects to be analyzed, and 16 threads in total, the ThreadManager assigns the first finished thread to the 17th object to be analyzed etc.
Is there a dramatic difference between Runnable/Callable other than the fact that Callable can return a result? Otherwise should I try to implement my own interface, in that case why?
Thanks,

You could use a BlockingQueue implementation to hold your objects and spawn your threads from there. This interface is based on the producer-consumer principle. The put() method will block if your queue is full until there is some more space and the take() method will block if the queue is empty until there are some objects again in the queue.
An ExecutorService can help you manage your pool of threads.
If you are awaiting a result from your spawned threads then Callable interface is a good idea to use since you can start the computation earlier and work in your code assuming the results in Future-s. As far as the differencies with the Runnable interface, from the Callable javadoc:
The Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception.
Some general things you need to consider in your quest for java concurrency:
Visibility is not coming by defacto. volatile, AtomicReference and other objects in the java.util.concurrent.atomic package are your friends.
You need to carefully ensure atomicity of compound actions using synchronization and locks.

Your idea is basically sound. However, rather than creating threads directly, or indirectly through some kind of ThreadManager of your own design, use an Executor from Java's concurrency package. It does everything you need, and other people have already taken the time to write and debug it. An executor manages a queue of tasks, so you don't need to worry about providing the threadsafe queue yourself either.
There's no difference between Callable and Runnable except that the former returns a value. Executors will handle both, and ready them the same.
It's not clear to me whether you're planning to make the preparation step a separate task to the analyses, or fold it into one of them, with that task spawning the other analysis task halfway through. I can't think of any reason to strongly prefer one to the other, but it's a choice you should think about.

The Executors provides factory methods for creating thread pools. Specifically Executors#newFixedThreadPool(int nThreads) creates a thread pool with a fixed size that utilizes an unbounded queue. Also if a thread terminates due to a failure then a new thread will be replaced in its place. So in your specific example of 256 tasks and 16 threads you would call
// create pool
ExecutorService threadPool = Executors.newFixedThreadPool(16);
// submit task.
Runnable task = new Runnable(){};;
threadPool.submit(task);
The important question is determining the proper number of threads for you thread pool. See if this helps Efficient Number of Threads

Sounds reasonable, but it's not as trivial to implement as it may seem.
Maybe you should check the jsr166y project.
That's probably the easiest solution to your problem.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.