Guys, I'm in a bit of a dilemma between TreeSet and ThreadPoolExecutor.
The scenario is as follows:
First Approach
I have to use a structure that holds tasks along with a priority for each task. Using the TreeSet constructor that takes a Comparator,
I can compare tasks by priority so that they are ordered properly.
After that, the tasks should be processed in priority order by iterating over the tree set and executing each task one by one.
Second Approach
The second approach is to build some logic on top of the core functionality of ThreadPoolExecutor. I took inspiration for this from this link, and I achieved my requirements with this approach as well: it picks the highest-priority task and executes it first, and executes all the remaining tasks the same way.
My confusion is which one is best to use in terms of performance cost, flexibility (increasing or decreasing the number of threads) etc., and why I should opt for it.
Any suggestions and answers are highly appreciated.
There are two different notions of priority embedded in your question:
starting priority: in which order tasks are submitted for execution, (point 1 of your first approach explanation)
runtime priority: in which order threads are considered for scheduling (point 3)
These two properties happen to be equal in your scenario, so the tree set will help you define both of them. The executor will help you enforce them, but you will need an ad-hoc tailored executor (based on thread pooling or not), to start your threads up with a specific priority. Basically, each time a task is pulled out of the priority queue, it should be associated with a thread set at the task's priority level. I assume that this is the feature that the executor implementation found in the article you link is providing, and thus what you do.
As for thread pools, from the documentation:
Using worker threads minimizes the overhead due to thread creation. Thread objects use a significant amount of memory, and in a large-scale application, allocating and deallocating many thread objects creates a significant memory management overhead.
Worker threads are threads managed by thread pools, and are conservatively recycled (as opposed to destroyed and recreated) to handle sequences of tasks. I don't think it matters much with regard to priority handling, but it will optimise your usage of resources.
Regarding the implementation from the article, the code uses a simple blocking deque for handling incoming tasks, hence it's a plain FIFO scheme: it doesn't reorder tasks by priority.
I finally found the real winner of the two. I should go with ThreadPoolExecutor for the following reasons:
Performance cost: the main goal under heavy load is to make maximum use of the available resources. Running tasks on multiple threads during peak times gives higher throughput, which is the core advantage of multi-threading.
Flexibility: resources can be scaled with the load, i.e. during quiet periods we can reduce the number of worker threads in the thread pool executor, and increase it again when the load rises.
Fewer iterations and minimal updates: if we maintain a tree set, every insertion goes through the comparator. That costs only O(log n), but afterwards we still have to fetch the tasks one by one, which makes processing a sequential flow from a single source, so we lose the advantage of a multi-threaded environment.
Faster processing: with a threaded architecture we can get results faster.
These were the reasons I arrived at after heavy brainstorming, googling and, last but not least, searching Stack Overflow. Thank you all for your support, and huge appreciation to #didierc for making this clear to me.
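To make the flexibility point concrete, here is a minimal sketch of growing and shrinking a pool at runtime. The class name and the resize helper are made up for illustration; only the ThreadPoolExecutor setters are real API. The order of the two setter calls matters because the core size may never exceed the maximum size.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ResizablePoolSketch {

    // Pool that starts with the same core and maximum size.
    static ThreadPoolExecutor newPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 30L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Hypothetical helper: sets core and maximum size together.
    static void resize(ThreadPoolExecutor pool, int threads) {
        if (threads > pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(threads); // growing: raise the ceiling first
            pool.setCorePoolSize(threads);
        } else {
            pool.setCorePoolSize(threads);    // shrinking: lower the core first
            pool.setMaximumPoolSize(threads);
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool(2);
        resize(pool, 8); // heavy load
        System.out.println(pool.getCorePoolSize());
        resize(pool, 1); // quiet period
        System.out.println(pool.getCorePoolSize());
        pool.shutdown();
    }
}
```

Surplus threads are not killed mid-task when the pool shrinks; they go away once they become idle.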
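You can try a PriorityBlockingQueue in an ordinary thread pool. (DelayQueue only accepts elements that implement Delayed, so it cannot be passed as the executor's BlockingQueue<Runnable>.)
ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(size, size, 0, TimeUnit.DAYS, new PriorityBlockingQueue<Runnable>());
threadPoolExecutor.execute(runnable);
The Runnable should implement Comparable. In this implementation, priority is taken care of by the queue. Use execute() rather than submit(), because submit() wraps the task in a FutureTask, which is not Comparable.
This approach is easier to implement.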
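Here is a minimal runnable sketch of the "Comparable tasks in a pool" idea. It uses PriorityBlockingQueue, the java.util.concurrent queue that orders its elements via Comparable. PrioritizedTask is a hypothetical class, not something from the linked article, and the pool has a single worker only so that the ordering is easy to observe.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PriorityPoolSketch {

    // Hypothetical task type: a lower number means a higher priority.
    static final class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
        final int priority;
        final Runnable body;

        PrioritizedTask(int priority, Runnable body) {
            this.priority = priority;
            this.body = body;
        }

        @Override public void run() { body.run(); }

        @Override public int compareTo(PrioritizedTask other) {
            return Integer.compare(priority, other.priority);
        }
    }

    // Queues tasks with priorities 3, 1, 2 and returns the order in which they ran.
    static List<Integer> runDemo() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());
        List<Integer> order = new CopyOnWriteArrayList<>();
        CountDownLatch gate = new CountDownLatch(1);
        try {
            // Occupy the single worker so the following tasks pile up in the queue.
            pool.execute(new PrioritizedTask(0, () -> awaitQuietly(gate)));
            for (int p : new int[] {3, 1, 2}) {
                pool.execute(new PrioritizedTask(p, () -> order.add(p)));
            }
            gate.countDown(); // release the worker; queued tasks now run by priority
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return order;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints [1, 2, 3]
    }
}
```

Note that the queue only reorders tasks that are actually waiting. A task handed to an idle worker starts immediately, whatever its priority.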
Related
For work such as network operations, bitmap manipulation, image loading and other kinds of tasks, can I create a single ThreadPoolExecutor for my whole application and execute everything on it?
If the answer is no, why? And how should I create a thread pool for every single operation?
If yes, will it cause performance problems?
Thanks in advance.
Both approaches have advantages and disadvantages.
In the case of a single thread pool (a singleton implementation, I suppose):
➕ you have one entry point for submitting background tasks
➕ it is easy to implement and to control its life cycle
➖ if you have a lot of different quick tasks and some long-running tasks, the long-running tasks may occupy every thread in the limited pool while the user waits for some quick action in the UI
Different thread pools (one pool per type of task):
➕ the pool for long-running tasks can accumulate tasks while quick tasks are executed independently in their own pool
➕ you know everything about the tasks in your application, so you can fine-tune the pool size for every type of task and set thread priority, initial stack size, etc. with a thread factory
➕ if you define a thread group and thread names, it helps with debugging
➖ having several thread pools makes it harder to control their life cycles
➖ this implementation does not bring much benefit if tasks are poorly separated into classes
In any case, you need to weigh the advantages and find a compromise.
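As a rough illustration of the "one pool per type of task" option, here is a sketch with two pools whose threads are named and prioritised through a thread factory. The pool sizes, the prefixes and the named helper are assumptions chosen for the example, not recommendations.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class PoolsPerTaskTypeSketch {

    // Hypothetical factory: names threads "<prefix>-<n>" and sets their priority.
    static ThreadFactory named(String prefix, int priority) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setPriority(priority);
            t.setDaemon(true); // do not keep the JVM alive just for idle workers
            return t;
        };
    }

    // Quick UI-related work and long-running work get separate pools.
    static final ExecutorService QUICK =
            Executors.newFixedThreadPool(4, named("quick", Thread.NORM_PRIORITY));
    static final ExecutorService SLOW =
            Executors.newFixedThreadPool(2, named("slow", Thread.MIN_PRIORITY));

    // Runs a tiny task on the given pool and reports which thread executed it.
    static String threadNameOf(ExecutorService pool) {
        Callable<String> whoAmI = () -> Thread.currentThread().getName();
        try {
            return pool.submit(whoAmI).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(threadNameOf(QUICK));
        System.out.println(threadNameOf(SLOW));
    }
}
```

The thread names show up in stack traces and thread dumps, which is where the debugging benefit mentioned above comes from.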
Theoretically, I think you can do that, and according to the Oracle documentation it should improve your performance:
Thread pools address two different problems: they usually provide
improved performance when executing large numbers of asynchronous
tasks, due to reduced per-task invocation overhead, and they provide a
means of bounding and managing the resources, including threads,
consumed when executing a collection of tasks. Each ThreadPoolExecutor
also maintains some basic statistics, such as the number of completed
tasks.
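The statistics mentioned in the quote can be read straight off the executor. A small sketch, assuming a pool of two threads and no-op tasks (both arbitrary choices for the example):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolStatisticsSketch {

    // Runs `tasks` no-op tasks and returns the pool's completed-task counter.
    static long completedAfter(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> { });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // The counter is only approximate while tasks are still running,
        // so it is read here after the pool has terminated.
        return pool.getCompletedTaskCount();
    }

    public static void main(String[] args) {
        System.out.println(completedAfter(5));
    }
}
```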
For a particular action, the application creates two threads (doing different tasks) and the main thread doesn't wait for them. In some cases, it can be just one thread.
If I move this to Executors.newFixedThreadPool(), does it make any difference? I understand that executors handle thread management, which is good for multi-threading scenarios.
But I want to know whether it makes even a small difference when just two threads are changed to use executors. Please help.
Thanks in advance.
This may result in better CPU utilization when you have many threads and want to
execute only a few of them at a time, but if you have only two threads then I think
using Executors brings no benefit.
from docs.oracle
Thread pools address two different problems: they usually provide improved performance when executing large numbers of asynchronous tasks, due to reduced per-task invocation overhead, and they provide a means of bounding and managing the resources, including threads, consumed when executing a collection of tasks. Each ThreadPoolExecutor also maintains some basic statistics, such as the number of completed tasks.
Can anybody explain, with examples, why we should use thread pools?
I know about the use of thread pools with Executors in theory.
I have gone through a number of tutorials, but I didn't find any practical examples of why we should use thread pools, whether newFixedThreadPool, newCachedThreadPool or newSingleThreadExecutor,
in terms of scalability and performance.
Could anybody explain this, with examples, with respect to performance and scalability?
First off, check this description of thread pools that I wrote yesterday: Android Thread Pool to manage multiple bluetooth handeling threads? (ok, it was about android but it's the same for classic java).
The main use I always seem to find for a thread pool is that it very nicely manages a very common problem: producer-consumer. In this pattern, someone needs to constantly send work items (the producer) to be processed by someone else (the consumers). The work items are obtained from some stream-like source, like a socket, a database, or a collection of disk files, and need multiple workers in order to be processed efficiently. The main components identifiable here are:
the producer: a thread that keeps posting jobs
a queue where the jobs are posted
the consumers: worker threads that take jobs from the queue and execute them
In addition to this, synchronization needs to be employed to make all this work correctly, since reading and writing to the queue without synchronization can lead to corrupted and inconsistent data. Also, we need to make the system efficient, since the consumers should not waste CPU cycles when there is nothing to do.
Now this pattern is very common, but implementing it from scratch takes considerable effort, is error-prone and needs careful review.
The solution is the thread pool. It very conveniently manages the work queue, the consumer threads and all the synchronization needed. All you need to do is play the role of the producer and feed the pool with tasks!
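A minimal sketch of that division of labour: the calling thread plays the producer, and the pool supplies the queue, the consumer threads and the synchronization. The work done per item (adding up string lengths) is just a stand-in for real processing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ProducerConsumerSketch {

    // Feeds every work item to the pool and returns the combined result.
    static int processAll(List<String> workItems, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger totalLength = new AtomicInteger();
        for (String item : workItems) {
            // Producer side: post a job. The pool queues it internally.
            pool.execute(() -> totalLength.addAndGet(item.length()));
        }
        pool.shutdown(); // no more jobs; workers drain the queue and then exit
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return totalLength.get();
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a", "bb", "ccc"), 2)); // prints 6
    }
}
```

Idle workers block inside the pool's queue, so they do not burn CPU while there is nothing to do.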
I would start with a problem and only then try to find a solution for it.
If you start the way you have, you can have a solution looking for a problem to solve and you are likely to use it inappropriately.
If you can't think of a use for thread pools, don't use them. ;)
A common mistake people make is to assume that because they have lots of CPUs now, they have to use them all, as if this were a reason in itself. It's like saying "I have lots of disk space, I must find a way to use all of it."
A good reason to use thread pools is to improve the performance of CPU-bound processes and the simplicity of IO-bound processes (rather than using non-blocking IO with one thread).
If you have a busy CPU-bound process which performs tasks that can be executed independently, you have a good use case for a thread pool.
Note: A thread pool often has just one thread. There are specific static factories for this. If you want a simple background worker, this may be an option.
Note 2: A common mistake is to assume that CPU-bound tasks will run best on hundreds or thousands of threads. The optimal number of threads can be the number of cores or CPUs you have. Once all of these are busy, you may find additional threads just add overhead.
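A sketch of the CPU-bound case from Note 2: the pool is sized to the number of available processors and the work is split into independent chunks. Summing squares is only a placeholder for a real computation, and the chunking scheme is one arbitrary way to divide it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CpuBoundPoolSketch {

    // Computes 1^2 + 2^2 + ... + n^2 using one worker thread per core.
    static long sumOfSquares(int n) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> chunks = new ArrayList<>();
            int chunkSize = (n + threads - 1) / threads; // ceiling division
            for (int start = 1; start <= n; start += chunkSize) {
                final int from = start;
                final int to = Math.min(n, start + chunkSize - 1);
                chunks.add(() -> {
                    long sum = 0;
                    for (long i = from; i <= to; i++) {
                        sum += i * i;
                    }
                    return sum;
                });
            }
            long total = 0;
            for (Future<Long> part : pool.invokeAll(chunks)) {
                total += part.get(); // invokeAll returns once every chunk is done
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10)); // prints 385
    }
}
```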
Initializing a new thread (and its own stack) is a costly operation.
Thread pools are used to avoid this cost by reusing threads that have already been created. Thus, using thread pools you get better performance than creating new threads every time.
Also note that created threads might need to be "deleted" after they have been used, which increases the cost of garbage collection and the frequency it will happen (as the memory fills up faster).
This analysis is just from the performance point of view. I cannot think of an advantage of using thread pools in terms of scalability at the moment.
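The reuse is easy to observe. In this sketch (the task count and pool size are arbitrary), many tasks are submitted to a small fixed pool and the number of distinct worker threads that ran them is counted.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ThreadReuseSketch {

    // Runs `tasks` jobs on a pool of `poolSize` threads and
    // returns how many distinct threads actually executed them.
    static int distinctThreads(int tasks, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> names = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> names.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return names.size();
    }

    public static void main(String[] args) {
        // 100 tasks, but never more than 2 threads are created.
        System.out.println(distinctThreads(100, 2));
    }
}
```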
I googled "why use java thread pools" and found:
A thread pool offers a solution to both the problem of thread
life-cycle overhead and the problem of resource thrashing.
http://www.ibm.com/developerworks/library/j-jtp0730/index.html
and
The newCachedThreadPool method creates an executor with an expandable
thread pool. This executor is suitable for applications that launch
many short-lived tasks.
The newSingleThreadExecutor method creates an
executor that executes a single task at a time.
http://docs.oracle.com/javase/tutorial/essential/concurrency/pools.html
Assume that I have a set of objects that need to be analyzed in two different ways, both of which take a relatively long time and involve IO calls. I am trying to figure out how (or if) I could optimize this part of my software, especially by utilizing multiple processors (the machine I am sitting at, for example, is an 8-core i7 which almost never goes above 10% load during execution).
I am quite new to parallel-programming or multi-threading (not sure what the right term is), so I have read some of the prior questions, particularly paying attention to highly voted and informative answers. I am also in the process of going through the Oracle/Sun tutorial on concurrency.
Here's what I thought out so far;
A thread-safe collection holds the objects to be analyzed
As soon as there are objects in the collection (they come a couple at a time from a series of queries), a thread per object is started
Each specific thread takes care of the initial pre-analysis preparations; and then calls on the analyses.
The two analyses are implemented as Runnables/Callables, and thus called on by the thread when necessary.
And my questions are:
Is this a reasonable scheme, if not, how would you go about doing this?
In order to make sure things don't get out of hand, should I implement a ThreadManager or something of that sort, which starts and stops threads, and redistributes them when they are complete? For example, if I have 256 objects to be analyzed and 16 threads in total, the ThreadManager assigns the first finished thread to the 17th object to be analyzed, etc.
Is there a dramatic difference between Runnable/Callable other than the fact that Callable can return a result? Otherwise should I try to implement my own interface, in that case why?
Thanks,
You could use a BlockingQueue implementation to hold your objects and spawn your threads from there. This interface is based on the producer-consumer principle. The put() method will block if your queue is full until there is some more space and the take() method will block if the queue is empty until there are some objects again in the queue.
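A small sketch of the blocking behaviour of put() and take(), assuming an ArrayBlockingQueue of capacity 2 and a sentinel value (POISON, a made-up convention for this example) to tell the consumer that the producer is finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BlockingQueueSketch {

    private static final int POISON = -1; // hypothetical end-of-stream marker

    // Sends 1..count through a bounded queue and returns what the consumer received.
    static List<Integer> transfer(int count) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);
        List<Integer> received = new ArrayList<>();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    queue.put(i); // blocks while the queue is full
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            // take() blocks while the queue is empty
            for (int item = queue.take(); item != POISON; item = queue.take()) {
                received.add(item);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(transfer(5)); // prints [1, 2, 3, 4, 5]
    }
}
```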
An ExecutorService can help you manage your pool of threads.
If you are awaiting a result from your spawned threads, then the Callable interface is a good idea, since you can start the computation early and carry on in your code with the results held as Futures. As for the differences from the Runnable interface, from the Callable javadoc:
The Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception.
Some general things you need to consider in your quest for java concurrency:
Visibility is not guaranteed by default. volatile, AtomicReference and the other classes in the java.util.concurrent.atomic package are your friends.
You need to carefully ensure atomicity of compound actions using synchronization and locks.
Your idea is basically sound. However, rather than creating threads directly, or indirectly through some kind of ThreadManager of your own design, use an Executor from Java's concurrency package. It does everything you need, and other people have already taken the time to write and debug it. An executor manages a queue of tasks, so you don't need to worry about providing the threadsafe queue yourself either.
The only differences between Callable and Runnable are that the former returns a value and can throw a checked exception. Executors will handle both, and treat them the same.
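A short sketch of the two interfaces side by side, submitted to the same executor. The task bodies are placeholders. The Callable calls Thread.sleep without a try/catch, which a Runnable could not do because InterruptedException is a checked exception.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallableVsRunnableSketch {

    static String demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Runnable: no result, no checked exceptions.
            Runnable prepare = () -> { };
            // Callable: returns a value and may throw a checked exception.
            Callable<Integer> analysis = () -> {
                Thread.sleep(10);
                return 42;
            };
            Future<?> prepared = pool.submit(prepare);
            Future<Integer> analysed = pool.submit(analysis);
            // The Future of a Runnable yields null once the task has finished.
            return prepared.get() + " / " + analysed.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints null / 42
    }
}
```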
It's not clear to me whether you're planning to make the preparation step a separate task to the analyses, or fold it into one of them, with that task spawning the other analysis task halfway through. I can't think of any reason to strongly prefer one to the other, but it's a choice you should think about.
The Executors class provides factory methods for creating thread pools. Specifically, Executors#newFixedThreadPool(int nThreads) creates a thread pool with a fixed size that uses an unbounded queue. Also, if a thread terminates due to a failure, a new thread will take its place. So in your specific example of 256 tasks and 16 threads you would call
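// create a pool with 16 worker threads
ExecutorService threadPool = Executors.newFixedThreadPool(16);
// define and submit a task (repeat for each of the 256 objects)
Runnable task = new Runnable() {
    @Override
    public void run() { /* analysis work goes here */ }
};
threadPool.submit(task);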
The important question is determining the proper number of threads for your thread pool. See if this helps: Efficient Number of Threads
Sounds reasonable, but it's not as trivial to implement as it may seem.
Maybe you should check the jsr166y project.
That's probably the easiest solution to your problem.
I have a Java project where I need to run things in parallel. I do this with executors. The thing is, I need to use executors in a great many places. Should I favor passing a few executors around to do the work (forget about limiting the global number of threads for a moment) or is it preferable to create the executors where I need them?
What you really need to think about is controlling the number of Threads working off any Executors you create.
The number of threads you create off each executor will be a function of the frequency of arrival and expected duration (processing time) of each task being submitted. Having a queue per logical task type allows you to tune the executor for just that task, so that you don't have more threads than required, and you can always keep up with the expected task throughput.
If you have one monolithic Executor shared across all processing stages in your app it becomes much harder to tune.
SEDA is a typical concurrency pattern that reflects this principle of queue per processing stage.
In some instances it does make sense to have a shared executor, such as for infrequent, ad-hoc or low priority scheduled tasks.
There's no strict rule that will tell you how many executors should be used. One thing, though, can be recommended: use some dependency injection mechanism or framework to inject executor implementations. This will allow quick and easy replacement and configuration of the executors used.
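Even without a framework, plain constructor injection gives you most of this. A minimal sketch, where Notifier is a hypothetical component that needs to run work asynchronously:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

class InjectedExecutorSketch {

    // Hypothetical component: it receives its executor instead of creating one.
    static final class Notifier {
        private final Executor executor;

        Notifier(Executor executor) {
            this.executor = executor;
        }

        void notifyAsync(Runnable action) {
            executor.execute(action);
        }
    }

    // Wires the component with a same-thread executor, as a test might do.
    static int demo() {
        AtomicInteger calls = new AtomicInteger();
        Executor sameThread = Runnable::run; // runs each task in the caller's thread
        Notifier notifier = new Notifier(sameThread);
        notifier.notifyAsync(calls::incrementAndGet);
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 1
    }
}
```

In production the same class could be handed a shared or a dedicated thread pool without any change to its code.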