POC (Proof of concept) of ThreadPools with Executors - java

Can anybody explain with examples about why should we use Thread-pools.
I have know about use of threadpools with Executors theoretically.
I have gone through number of tutorials, but I didn't get any practically examples about why should we use Threadpools, it can be newFixedThreadPool or newCachedThreadPool or newSingleThreadExecutor
in terms of scalability and performance .
If anybody explain me with respect to performance and scalability with examples about it?

First off, check this description of thread pools that I wrote yesterday: Android Thread Pool to manage multiple bluetooth handeling threads? (ok, it was about android but it's the same for classic java).
The main use I always seem to find for using a threadpool is that is very nicely manages a very common problem: producer-consumer. In this pattern, someone needs to constantly send work items (the producer) to be processed by someone else (the consumers). The work items are obtained from some stream-like source, like a socket, a database, or a collection of disk files, and needs multiple workers in order to be processed efficiently. The main components identifiable here are:
the producer: a thread that keeps posting jobs
a queue where the jobs are posted
the consumers: worker threads that take jobs from the queue and execute them
In addition to this, synchronization needs to be employed to make all this work correctly, since reading and writing to the queue without synchronization can lead to corrupted and inconsistent data. Also, we need to make the system efficient, since the consumers should not waste CPU cycles when there is nothing to do.
Now this pattern is very common, but to implement it from scratch it takes a considerable effort, which is error prone and needs to be carefully reviewed.
The solution is the thread pool. It very conveniently manages the work queue, the consumer threads and all the synchronization needed. All you need to do is play the role of the producer and feed the pool with tasks!

I would start with a problem and only then try to find a solution for it.
If you start the way you have, you can have a solution looking for a problem to solve and you are likely to use it inappropriately.
If you can't think of a use for thread pools, don't use them. ;)
A common mistake people make is to assume that because they have lots of cpus now, they have to use them all as if this were a reason in itself. Its like saying I have lots of disk space, I must find a way to use all of it.
A good reason to use thread pools is to improve the performance of CPU bounds processes and the simplicity of IO bound processes (rather than using non-blocking IO with one thread)
If you have a busy CPU bound process which performs tasks which can be executed independently you have a good use case for a thread pool.
Note: Thread pool often has just one thread. There are specific static factories for these. If you want a simple background worker, this may be an option.
Note 2: A common mistake is to assume that a CPU bound tasks will run best on hundreds or thousands of threads. The optimial number of threads can be the number of core or cpus you have. Once all these are busy, you may find additional threads just add overhead.

Initializing a new thread (and its own stack) is a costly operation.
Thread pools are use to avoid this cost by reusing threads already created. Thus using thread pools you get better performance then creating new threads every time.
Also note that created threads might need to be "deleted" after they have been used, which increases the cost of garbage collection and the frequency it will happen (as the memory fills up faster).
This analysis is just from the performance point of view. I cannot think of an advantage of using thread pools in terms of scalability at the moment.

I googled "why use java thread pools" and found:
A thread pool offers a solution to both the problem of thread
life-cycle overhead and the problem of resource thrashing.
http://www.ibm.com/developerworks/library/j-jtp0730/index.html
and
The newCachedThreadPool method creates an executor with an expandable
thread pool. This executor is suitable for applications that launch
many short-lived tasks.
The newSingleThreadExecutor method creates an
executor that executes a single task at a time.
http://docs.oracle.com/javase/tutorial/essential/concurrency/pools.html

Related

Use Executors.newFixedThreadPool for only two threads?

For a particular action, application creates two threads (doing different tasks) and main thread doesn't wait for it. Again for some cases, it can be only one thread too.
If I move this one to Executors.newFixedThreadPool(), does it make any difference? I understand Executors are doing thread management. It will be good for multi-threading scenarios.
But I want to know does it makes any small difference at least when two threads are changed to use executors? Please help.
Thanks in advance.
This may results in better CPU utilization when u have a many threads and want to
execute few of them at a time, but if you have only two thread then I think it is
not beneficial to use Executors.
from docs.oracle
Thread pools address two different problems: they usually provide improved performance when executing large numbers of asynchronous tasks, due to reduced per-task invocation overhead, and they provide a means of bounding and managing the resources, including threads, consumed when executing a collection of tasks. Each ThreadPoolExecutor also maintains some basic statistics, such as the number of completed tasks.

Spawning tons of threads without running out of memory

I have a multi-threaded application which creates hundreds of threads on the fly. When the JVM has less memory available than necessary to create the next Thread, it's unable to create more threads. Every thread lives for 1-3 minutes. Is there a way, if I create a thread and don't start it, the application can be made to automatically start it when it has resources, and otherwise wait until existing threads die?
You're responsible for checking your available memory before allocating more resources, if you're running close to your limit. One way to do this is to use the MemoryUsage class, or use one of:
Runtime.getRuntime().totalMemory()
Runtime.getRuntime().freeMemory()
...to see how much memory is available. To figure out how much is used, of course, you just subtract total from free. Then, in your app, simply set a MAX_MEMORY_USAGE value that, when your app has used that amount or more memory, it stops creating more threads until the amount of used memory has dropped back below this threshold. This way you're always running with the maximum number of threads, and not exceeding memory available.
Finally, instead of trying to create threads without starting them (because once you've created the Thread object, you're already taking up the memory), simply do one of the following:
Keep a queue of things that need to be done, and create a new thread for those things as memory becomes available
Use a "thread pool", let's say a max of 128 threads, as all your "workers". When a worker thread is done with a job, it simply checks the pending work queue to see if anything is waiting to be done, and if so, it removes that job from the queue and starts work.
I ran into a similar issue recently and I used the NotifyingBlockingThreadPoolExecutor solution described at this site:
http://today.java.net/pub/a/today/2008/10/23/creating-a-notifying-blocking-thread-pool-executor.html
The basic idea is that this NotifyingBlockingThreadPoolExecutor will execute tasks in parallel like the ThreadPoolExecutor, but if you try to add a task and there are no threads available, it will wait. It allowed me to keep the code with the simple "create all the tasks I need as soon as I need them" approach while avoiding huge overhead of waiting tasks instantiated all at once.
It's unclear from your question, but if you're using straight threads instead of Executors and Runnables, you should be learning about java.util.concurrent package and using that instead: http://docs.oracle.com/javase/tutorial/essential/concurrency/executors.html
Just write code to do exactly what you want. Your question describes a recipe for a solution, just implement that recipe. Also, you should give serious thought to re-architecting. You only need a thread for things you want to do concurrently and you can't usefully do hundreds of things concurrently.
This is an alternative, lower level solution Then the above mentioed NotifyingBlocking executor - it is probably not as ideal but will be simple to implement
If you want alot of threads on standby, then you ultimately need a mechanism for them to know when its okay to "come to life". This sounds like a case for semaphores.
Make sure that each thread allocates no unnecessary memory before it starts working. Then implement as follows :
1) create n threads on startup of the application, stored in a queue. You can Base this n on the result of Runtime.getMemory(...), rather than hard coding it.
2) also, creat a semaphore with n-k permits. Again, base this onthe amount of memory available.
3) now, have each of n-k threads periodically check if the semaphore has permits, calling Thread.sleep(...) in between checks, for example.
4) if a thread notices a permit, then update the semaphore, and acquire the permit.
If this satisfies your needs, you can go on to manage your threads using a more sophisticated polling or wait/lock mechanism later.

Is Java's fork-and-join thread pool is good for executing IO bound task?

in my application, I have to solve a problem by executing many network-io bound task and sometime one io bound task and be divided into smaller io bound tasks. These tasks are currently getting executed using Java's standard threadpool mechanism. I am wondering whether I can move to fork-and-join framework? But the question is, is forkandjoin framework usually being used to solve io bound operations or CPU bound? I assume they are mostly for CPU bound operations cause fork-and-join framework makes use of work stealing technique to make use of multo core processors, but if I use it for IO bound tasks, will there be any adverse effect?
Fork-join is designed for compute-bound tasks so generally I'd say no. Fork-join does have an API (the ManagedBlocker api) to tell the FJ framework that your thread will be blocking for a while and not to line up new tasks but it's really designed for short waits (like obtaining a lock), not arbitrarily long waits for IO.
We have a system that uses fork-join and we shunt IO-bound tasks off to a separate executor pool. When data arrives it triggers tasks into the fork-join pool so that only cpu-bound work occurs there.
there does not seem to be a compelling advantage to fork-joins in this case.
there does not seem to be a signficant disadvantage either because you would not be driving some resource too hard.
all in all, i would stay with the thread pool until you have no other important development to do.
If you are trying to address the "I/O bound" aspect of your problem, I doubt that switching from standard threads to fork-and-join is going to improve things ... assuming that you've implemented the current thread-based solution properly. (And based on Alex Miller's answer, the switch could actually make things significantly worse.)
Or to put it another way, the way to make your I/O bound application go faster is to address the problems that make it I/O bound ... or increase your system's I/O bandwidth.

Difference between Thread and Threadpool

Can any one guide me with example about Thread and ThreadPool what is difference between them? which is best to use...? what are the drawback on its
Since a thread can only run once, you'd have to use a thread per task. However, creating and starting threads is somewhat expensive and can lead to a situation where too many threads are waiting for execution (don't remember the exact name for this right now) - which further reduces performance.
A thread pool is - as the name suggests - a pool of worker threads which are always running. Those threads then normally take tasks from a list, execute them, then try to take the next task. If there's no task, the thread will wait.
Using a thread pool has several advantages:
you don't have to create a thread per task
you normally have the optimal number of threads for your system (depending on the JVM too)
you can concentrate on writing tasks and use the thread pool to manage the infrastructure
Edit: Here are some quite good articles on concurrency in general: Sutter's Mill, look at the bottom for more links. Although they're primarily written for C/C++ the general concepts are the same, since it also describes the interdependence between concurrency solutions and hardware. A good article to understand concurrency performance issues is this article on drdobbs.com.
A thread pool is a collection of threads which are assigned to perform uniformed tasks.
The advantages of using thread pool pattern is that you can define how many threads is allowed to execute simultaneously. This is to avoid server crashing due to high CPU load or out of memory condition, e.g. the server's hardware capacity can support up to 100 requests per second only.
Database pooling has the similar concept with thread pool.
This pattern is widely used in most of the back-end servers' application process.
While a thread, is a unit which execute a task.

Creating Sub-Threads From a Thread in Java

In a java program, I spawned one thread other than the main thread, and then spawned another two threads from the original thread I created(two sub threads). In all the cases I used the Runnable interface to create threads. My question is, is there a better way of doing this? Does the performance degrade when you spawn threads recursively?
There is no such thing as a parent-child relationship between threads in Java. Once created, they have a life of their own.
Regarding performance, you may want to use an ExecutorService to control the number of threads created in your application. Too many threads will kill performance for sure. See the Executors class too.
The way you are creating threads is perfectly ok if it is only a few. Otherwise, executor services are the preferred method.
There is no problem with what you are doing, no performance degradation. If you had a more complicated program with a large number of threads, you could look for utility classes in java.util.concurrent.
There is no problem creating a few threads this way.
For slower operations like network IO it's even pretty good, you can have quit a lot of threads as they are mostly waiting.
For number crunching threads I'd use an ExecutorService retrieved using Executors.newFixedThreadPool with the Runtime.numberOfProcessors() or something like that as the amount of threads. If processing is CPU constrained more threads only make it less efficient.
Also, disk IO tends to be better serial than parallel.
If you have CPU or disk IO constrained processors you can also look at the producer-consumer pattern as described in the BlockingQueue javadocs. Your main thread (or threads) create processing or load tasks and dump these on a blocking queue. A fixed amount of worker threads process the items on the queue.

Categories