I have a ScheduledExecutorService that gets tasks for periodic execution:

scheduler = Executors.newScheduledThreadPool( /* what size? */ );

public void addTask(Runnable myTask, long delay, long interval) {
    ScheduledFuture<?> future = scheduler.scheduleAtFixedRate(myTask,
            delay,
            interval,
            TimeUnit.MILLISECONDS);
}
The number of tasks the scheduler gets depends solely on the user of my program. Normally, as far as I know, it is a good idea to make the thread pool size equal to the number of CPU threads, so that each core (or hardware thread) executes one task at a time, because this should give the best throughput. But what should I do if the tasks involve I/O (as they do in my program)? The tasks in my program grab data from a server on the internet and save it in a database, so most of the time they are waiting for data to come in (i.e. idle). What would be the best solution for this problem?
It really depends on the exact context:
How many tasks will be added? (You've said it's up to the user, but do you have any idea? Do you know this before you need to create the pool?)
How long does each of them take?
Will they be doing any intensive work?
If they're all saving to the same database, is there any concurrency issue there? (Perhaps you want to have several threads fetching from different servers and putting items in a queue, but only one thread actually storing data in the database?)
So long as you don't get "behind", how important is the performance anyway?
Ultimately I strongly suspect you'll need to benchmark this yourself - it's impossible to give general guidance without more information, and even with specific numbers it would be mostly guesswork. Hard data is much more useful :)
Note that the argument to newScheduledThreadPool only specifies the number of core threads to keep in the thread pool if threads are idle - so it's going to be doing a certain amount of balancing itself.
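One of the bullet points above suggests having several fetcher threads feed a queue drained by a single database writer. The sketch below illustrates that pattern under stated assumptions: the fetch and store steps are stand-ins (here just string manipulation and an in-memory list), and the class and method names are invented for illustration.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Several fetcher threads put items on a queue; a single writer thread
// drains it, so only one thread ever touches the database.
public class FetchStorePipeline {
    private static final String POISON = "\u0000EOF"; // sentinel to stop the writer
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public List<String> run(List<String> urls, int fetcherThreads) throws Exception {
        ExecutorService fetchers = Executors.newFixedThreadPool(fetcherThreads);
        List<String> stored = new CopyOnWriteArrayList<>();

        Thread writer = new Thread(() -> {
            try {
                for (String item; !(item = queue.take()).equals(POISON); ) {
                    stored.add(item); // stand-in for the single DB insert
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        for (String url : urls) {
            // stand-in for the HTTP fetch; real code would do I/O here
            fetchers.submit(() -> queue.add("data-from-" + url));
        }
        fetchers.shutdown();
        fetchers.awaitTermination(10, TimeUnit.SECONDS);
        queue.add(POISON); // tell the writer there is no more work
        writer.join();
        return stored;
    }
}
```

With this split, the fetcher pool can be sized generously (the threads are mostly idle on I/O) while the database sees strictly sequential inserts.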
Related
I am trying to implement a divide-and-conquer solution to some large data. I use fork and join to break down things into threads. However I have a question regarding the fork mechanism: if I set my divide and conquer condition as:
@Override
protected SomeClass compute() {
    if (list.size() < LIMIT) {
        // Do something here
        ...
    } else {
        // Divide the list and invoke sub-tasks
        SomeRecursiveTaskClass subWorker1 = new SomeRecursiveTaskClass(list.subList(0, list.size() / 2));
        SomeRecursiveTaskClass subWorker2 = new SomeRecursiveTaskClass(list.subList(list.size() / 2, list.size()));
        invokeAll(subWorker1, subWorker2);
        ...
    }
}
What will happen if there are not enough resources to invoke a subWorker (e.g. not enough threads in the pool)? Does the Fork/Join framework maintain a pool size for available threads? Or should I add this condition to my divide-and-conquer logic?
Each ForkJoinPool has a configured target parallelism. This isn’t exactly matching the number of threads, i.e. if a worker thread is going to wait via a ManagedBlocker, the pool may start even more threads to compensate. The parallelism of the commonPool defaults to “number of CPU cores minus one”, so when incorporating the initiating non-pool thread as helper, the resulting parallelism will utilize all CPU cores.
When you submit more jobs than there are threads, they will be enqueued. Enqueuing a few jobs can help utilize the threads, as not all jobs may take exactly the same time, so threads running out of work may steal jobs from other threads; but splitting the work too much may create unnecessary overhead.
Therefore, you may use ForkJoinTask.getSurplusQueuedTaskCount() to get the current number of pending jobs that are unlikely to be stolen by other threads and split only when it is below a small threshold. As its documentation states:
This value may be useful for heuristic decisions about whether to fork other tasks. In many usages of ForkJoinTasks, at steady state, each worker should aim to maintain a small constant surplus (for example, 3) of tasks, and to process computations locally if this threshold is exceeded.
So this is the condition for deciding whether to split your jobs further. Since this number reflects whether idle threads are stealing your created jobs, it will balance the load when the jobs have different CPU costs. It also works the other way round: if the pool is shared (like the common pool) and its threads are already busy, they will not pick up your jobs, the surplus count will stay high, and you will automatically stop splitting.
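A minimal sketch of that heuristic, applied to summing an array: instead of splitting down to a fixed size LIMIT, the task stops forking once enough of its queued subtasks remain unstolen. The threshold of 3 is the "small constant surplus" from the javadoc quoted above.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

// Splits work only while getSurplusQueuedTaskCount() stays below a threshold,
// so splitting adapts to how busy the pool currently is.
public class SurplusSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 3; // small constant surplus, per the javadoc
    private final long[] data;
    private final int from, to;

    public SurplusSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        // Split only while the range is divisible and few queued tasks remain unstolen.
        if (to - from > 1 && ForkJoinTask.getSurplusQueuedTaskCount() < THRESHOLD) {
            int mid = (from + to) >>> 1;
            SurplusSum left = new SurplusSum(data, from, mid);
            SurplusSum right = new SurplusSum(data, mid, to);
            left.fork();                          // queue one half for stealing...
            return right.compute() + left.join(); // ...and compute the other directly
        }
        long sum = 0;
        for (int i = from; i < to; i++) sum += data[i];
        return sum;
    }

    public static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SurplusSum(data, 0, data.length));
    }
}
```

Note that forking one half and computing the other directly (rather than forking both) keeps the current thread busy and halves the number of queued tasks.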
I have around 100,000 tasks that need to be done. I know they are CPU intensive, but each takes only a short time to execute (when the CPU is fast enough).
I use ExecutorService executor = Executors.newFixedThreadPool(8);
I choose 8 because my CPU has 8 cores.
Then to process my tasks, I loop through all of them:
for (Task task : tasks) {
    executor.submit(new Runnable() {
        @Override
        public void run() {
            // 1. Text analyzing
            // 2. Add result to a LinkedBlockingQueue
        }
    });
}
What I observed is that the first few thousand tasks are processed really fast. But after, say, 10k tasks have been processed, the speed becomes slower and slower...
I tried to understand, but failed to figure out why it gradually becomes slower. Since resources are freed when a task is done, I expected the processing speed to be stable.
Then I figured the problem might be the LinkedBlockingQueue that I use to store the results from the tasks. But LinkedBlockingQueue is supposed to provide good insertion performance.
Can someone give me some hints or suggestions what I may do wrong in this case?
Thank you.
The problem turned out to be performance degradation of the LinkedBlockingQueue: in my case the producers were adding data to the queue faster than the consumers could take it out.
Java performance problem with LinkedBlockingQueue
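A common mitigation for the producer/consumer imbalance described above is to give the queue a capacity, so that fast producers block (or fail fast) instead of letting the queue grow without bound. This is a minimal sketch; the capacity of 2 is arbitrary, chosen only to make the behavior visible.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A bounded LinkedBlockingQueue applies backpressure: offer() returns false
// when the queue is full, while put() would block the producer instead.
public class BoundedQueueDemo {
    public static boolean[] offerThree() {
        BlockingQueue<Integer> results = new LinkedBlockingQueue<>(2);
        return new boolean[] {
            results.offer(1), // accepted
            results.offer(2), // accepted, queue now full
            results.offer(3)  // rejected: capacity reached
        };
    }
}
```

In the scenario above, having the submitting loop call put() on a bounded queue would throttle the producers to the consumers' pace rather than accumulating an ever-growing backlog.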
I need to perform a task every few hours and I'm looking for most efficient solution for that. I thought about two approaches:
1) busy waiting
while (true) {
    doMyJob();
    Thread.sleep(2 * 60 * 60 * 1000); // 2 hours
}
2) executor scheduling:
executor.schedule(new MyJobTask(), 2, TimeUnit.HOURS);
...
class MyJobTask implements Runnable {
    public void run() {
        doMyJob();
        ...
        executor.schedule(new MyJobTask(), 2, TimeUnit.HOURS);
    }
}
Could you please advise which solution is more efficient, and in which situations each of them is preferable (if any)? Intuitively I would go for the second solution, but I couldn't find anything to back up my intuition. If you have other solutions, please share. The solution should also be memory efficient (that's my dilemma: do I need to create and keep a thread pool object just to do a simple job every two hours?).
None of the proposed solutions is really advisable inside an EE container (where you should avoid managing threads yourself), which you seem to be targeting according to the tags of your question.
Starting with Java EE 5 there is the timer service which according to my tests works quite nicely with longer timeouts like the 2 hours in your example. There is one point that you really shouldn't forget though - quoting from the aforementioned tutorial:
Timers are persistent. If the server is shut down (or even crashes), timers are saved and will become active again when the server is restarted. If a timer expires while the server is down, the container will call the @Timeout method when the server is restarted.
If for whatever reason this solution is not acceptable you should have a look at the Quartz Scheduler. Its possibilities exceed your requirements by far, but at least it gives you a ready to use solution whose compatibility with a wide range of application servers is guaranteed.
Both should have about the same efficiency, but I would suggest using ScheduledExecutorService:
Executors.newSingleThreadScheduledExecutor()
         .scheduleAtFixedRate(new MyJobTask(), 0, 2, TimeUnit.HOURS);
There are several reasons, detailed here: A better way to run code for a period of time
But importantly, ScheduledExecutorService allows you to use multiple threads, so tasks which take a long time don't necessarily back up your queue of tasks (the service can run two of your tasks simultaneously). Note, though, that if doMyJob throws an exception, scheduleAtFixedRate suppresses subsequent executions, so you should catch and handle exceptions inside run(); with the manual self-rescheduling approach, an exception would likewise prevent the next schedule() call from ever running.
I want to control the amount of time that each thread uses.
One thread does some processing and another processes data in the database, but the insertion is slower than processing because of the amount of generated data. I want to give more processor time to insert that data.
Is it possible to do this with threads? At the moment I'm putting a sleep in the thread doing the processing, but the insertion time varies from machine to machine. Is there another way I can do this? Does it involve using thread synchronization inside my program?
You can increase the priority of a thread using Thread.setPriority(...) but this is not ideal.
Perhaps you can use some form of blocking queue from the java.util.concurrent package to make one Thread wait while another Thread is doing something. For example, a SynchronousQueue can be used to send a message from one Thread to another Thread that it can now do something.
Another approach is to use Runnables instead of Threads, and submit the Runnables to an Executor, such as ThreadPoolExecutor. This executor will have the role of making sure Runnables are using a fair amount of time.
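The SynchronousQueue hand-off mentioned above can be sketched as follows. This is a minimal illustration, not a full solution: the "slow insert" is elided, and the class and method names are invented for the example.

```java
import java.util.concurrent.SynchronousQueue;

// The processing thread blocks in take() until the inserting thread
// signals, via put(), that it has finished and more work can proceed.
public class HandoffDemo {
    public static String handOff() throws InterruptedException {
        SynchronousQueue<String> signal = new SynchronousQueue<>();

        Thread inserter = new Thread(() -> {
            try {
                // ... finish the slow insert here, then release the other thread
                signal.put("insert-done");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        inserter.start();

        String msg = signal.take(); // processing thread waits here for the signal
        inserter.join();
        return msg;
    }
}
```

Because SynchronousQueue has no capacity, put() and take() rendezvous directly: neither thread proceeds until both have arrived, which is exactly the "you may go now" message described above.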
The first thing to mention is that thread priority doesn't per se mean "share of the CPU". There seems to be a lot of confusion about what thread priority actually means, partly because it actually means different things under different OS's. If you're working in Linux, it actually does mean something close to relative share of CPU. But under Windows, it definitely doesn't. So in case it's of any help, you may firstly want to look at some information I compiled a little while ago about thread priorities in Java, which explains what Thread Priorities Actually Mean on different systems.
The general answer to your question is that if you want a thread to take a particular share of CPU, it's better to do that explicitly and programmatically: periodically, for each "chunk" of processing, measure how much time elapsed (or how much CPU was used, which is not strictly the same thing), then sleep an appropriate amount of time so that the processing/sleep ratio comes to roughly the percentage of processing time you intended.
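The measure-then-sleep approach can be sketched like this; the class and method names are invented for the example, and targetShare is the fraction of wall time you want the thread to spend processing.

```java
// After each chunk of work, sleep long enough that processing time is
// roughly targetShare of total wall time:
//   elapsed / (elapsed + sleep) == targetShare.
public class CpuShareThrottle {
    public static long sleepMillisFor(long elapsedMillis, double targetShare) {
        // Solving for sleep: sleep = elapsed * (1 - share) / share.
        return Math.max(0, Math.round(elapsedMillis * (1.0 - targetShare) / targetShare));
    }

    public static void runThrottled(Runnable chunk, int chunks, double targetShare)
            throws InterruptedException {
        for (int i = 0; i < chunks; i++) {
            long start = System.nanoTime();
            chunk.run();
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            Thread.sleep(sleepMillisFor(elapsedMillis, targetShare));
        }
    }
}
```

For example, if a chunk took 30 ms and the target share is 25%, the thread sleeps 90 ms, so processing accounts for roughly a quarter of the wall time.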
However, I'm not sure that will actually help your task here.
As I understand, basically you have an insertion task which is the rate determining step. Under average circumstances, it's unlikely that the system is "deliberately dedicating less CPU than it can or needs to" to the thread running that insertion.
So there's probably more mileage in looking at that insertion task and seeing if programmatically you can change how that insertion task functions. For example: can you insert in larger batches? if the insertion process really is CPU bound for some reason (which I am suspicious of), can you multi-thread it? why does your application actually care about waiting for the insertion to finish, and can you change that dependency?
If the insertion is to a standard DB system, I wonder if that insertion is terribly CPU bound anyway?
One way would be to set the priority of the processing thread lower than the other. But beware: this is not recommended, as it won't keep your code platform independent. (Different thread priorities behave differently on different platforms.)
Another way would be to use a service where database thread would keep sending messages about its current status (probably some flag "aboutToOver").
Or use synchronization, say a binary semaphore. When the database thread is working, the other thread is blocked, and hence the DB thread gets all the resources. But then the processing thread is blocked in the meantime. This may actually be the best solution: the processing thread can perform, say, 3-4 tasks and then be blocked by the semaphore until it can get up and work again.
I'd like to have a ScheduledThreadPoolExecutor which also stops the last thread if there is no work to do, and creates (and keeps threads alive for some time) if there are new tasks. But once there is no more work to do, it should again discard all threads.
I naively created it as new ScheduledThreadPoolExecutor(0), but as a consequence no thread is ever created, nor is any scheduled task ever executed.
Can anybody tell me if I can achieve my goal without writing my own wrapper around the ScheduledThreadpoolExecutor?
Thanks in advance!
Actually you can do it, but it's non-obvious:
Create a new ScheduledThreadPoolExecutor
In the constructor, set the core threads to the maximum number of threads you want
Set the keepAliveTime of the executor
And at last, allow the core threads to time out
m_Executor = new ScheduledThreadPoolExecutor(16);
m_Executor.setKeepAliveTime(5, TimeUnit.SECONDS);
m_Executor.allowCoreThreadTimeOut(true);
This works only from Java 6 onward, though.
I suspect that nothing provided in java.util.concurrent will do this for you, just because if you need a scheduled execution service, then you often have recurring tasks to perform. If you have a recurring task, then it usually makes more sense to just keep the same thread around and use it for the next recurrence of the task, rather than tearing down your thread and having to build a new one at the next recurrence.
Of course, a scheduled executor could be used for inserting delays between non-recurring tasks, or it could be used in cases where resources are so scarce and recurrence is so infrequent that it makes sense to tear down all your threads until new work arrives. So, I can see cases where your proposal would definitely make sense.
To implement this, I would consider trying to wrap a cached thread pool from Executors.newCachedThreadPool together with a single-threaded scheduled executor service (i.e. new ScheduledThreadPoolExecutor(1)). Tasks could be scheduled via the scheduled executor service, but the scheduled tasks would be wrapped in such a way that rather than having your single-threaded scheduled executor execute them, the single-threaded executor would hand them over to the cached thread pool for actual execution.
That compromise would give you a maximum of one thread running when there is absolutely no work to do, and it would give you as many threads as you need (within the limits of your system, of course) when there is lots of work to do.
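A rough sketch of that compromise, under stated assumptions: the class name is invented, and only a minimal schedule method is shown. The single-threaded scheduler merely fires the trigger; the actual work runs on a cached thread pool, which shrinks back to zero threads when idle.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// One permanent scheduler thread; worker threads come and go with demand.
public class ScheduledCachedExecutor {
    private final ScheduledExecutorService trigger = new ScheduledThreadPoolExecutor(1);
    private final ExecutorService workers = Executors.newCachedThreadPool();

    public ScheduledFuture<?> schedule(Runnable task, long delay, TimeUnit unit) {
        // The scheduler thread only hands the task over to the cached pool,
        // so long-running tasks never block the timing of other schedules.
        return trigger.schedule(() -> workers.submit(task), delay, unit);
    }

    public void shutdown() {
        trigger.shutdown();
        workers.shutdown();
    }
}
```

Note that the returned ScheduledFuture completes when the hand-off happens, not when the task itself finishes; a fuller implementation would need to bridge the two futures if callers care about task completion.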
Reading the ThreadPoolExecutor javadocs might suggest that Alex V's solution is okay. However, doing so will result in unnecessarily creating and destroying threads, nothing like a cached thread pool. The ScheduledThreadPool is not designed to work with a variable number of threads. Having looked at the source, I'm sure you'll end up spawning a new thread almost every time you submit a task. Joe's solution should work even if you are ONLY submitting delayed tasks.
PS. I'd monitor your threads to make sure you're not wasting resources in your current implementation.
This problem is a known bug in ScheduledThreadPoolExecutor (Bug ID 7091003) and has been fixed in Java 7u4. Though looking at the patch, the fix is that "at least one thread is started even if corePoolSize is 0."