Execution in FixedThreadPool gets slower as time goes by - java

I have around 100,000 tasks that need to be done. I know they are CPU intensive, but each one takes only a short time to execute (when the CPU is fast enough).
I use ExecutorService executor = Executors.newFixedThreadPool(8);
I chose 8 because my CPU has 8 cores.
Then to process my tasks, I loop through all of them:
for (Task task : tasks) {
    executor.submit(new Runnable() {
        @Override
        public void run() {
            // 1. Text analyzing
            // 2. Add result to a LinkedBlockingQueue
        }
    });
}
What I observed is that the first few thousand tasks run really fast. But then, say after 10k tasks have been processed, the speed becomes slower and slower...
I have tried but failed to figure out why it gradually becomes slower. Since the resources are freed when a task is done, I expected the processing speed to be stable.
Then I suspected the problem might lie with the LinkedBlockingQueue that I use to store the results from the tasks. But LinkedBlockingQueue is supposed to offer good insertion performance.
Can someone give me some hints or suggestions about what I may be doing wrong in this case?
Thank you.

The problem turned out to be the performance degradation of the LinkedBlockingQueue: in my case the producers were adding data to the queue faster than the consumers could drain it.
Java performance problem with LinkedBlockingQueue
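If the consumers can't keep up, one common mitigation is to bound the queue so that producers block instead of letting pending results pile up on the heap. A minimal sketch of that idea, with the capacity and the string payload as placeholder assumptions:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedPipeline {
    public static void main(String[] args) throws InterruptedException {
        // Bounded queue: put() blocks producers once 10_000 results are pending,
        // throttling fast producers down to the consumers' pace.
        BlockingQueue<String> results = new LinkedBlockingQueue<>(10_000);
        ExecutorService executor = Executors.newFixedThreadPool(8);

        for (int i = 0; i < 100_000; i++) {
            final int id = i;
            executor.submit(() -> {
                try {
                    results.put("result-" + id); // stand-in for the text-analysis result
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Consumer drains at its own pace; producers wait instead of piling up.
        for (int i = 0; i < 100_000; i++) {
            results.take();
        }
        executor.shutdown();
    }
}

With a bounded queue, put() applies backpressure, so the queue (and the GC pressure it causes) can no longer grow without limit.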

Related

How do I know if Fork and Join has enough pool size in Java?

I am trying to implement a divide-and-conquer solution to some large data. I use fork and join to break down things into threads. However I have a question regarding the fork mechanism: if I set my divide and conquer condition as:
@Override
protected SomeClass compute() {
    if (list.size() < LIMIT) {
        // Do something here
        ...
    } else {
        // Divide the list and invoke sub-tasks
        SomeRecursiveTaskClass subWorker1 =
                new SomeRecursiveTaskClass(list.subList(0, list.size() / 2));
        SomeRecursiveTaskClass subWorker2 =
                new SomeRecursiveTaskClass(list.subList(list.size() / 2, list.size()));
        invokeAll(subWorker1, subWorker2);
        ...
    }
}
What will happen if there are not enough resources to invoke the subworkers (e.g. not enough threads in the pool)? Does the Fork/Join framework maintain a pool size for available threads? Or should I add this condition to my divide-and-conquer logic?
Each ForkJoinPool has a configured target parallelism. This doesn't exactly match the number of threads, i.e. if a worker thread is going to wait via a ManagedBlocker, the pool may start even more threads to compensate. The parallelism of the commonPool defaults to "number of CPU cores minus one", so when incorporating the initiating non-pool thread as a helper, the resulting parallelism will utilize all CPU cores.
When you submit more jobs than threads, they will be enqueued. Enqueuing a few jobs can help utilize the threads, as not all jobs may run for exactly the same time, so threads running out of work may steal jobs from other threads; but splitting the work too much may create unnecessary overhead.
Therefore, you may use ForkJoinTask.getSurplusQueuedTaskCount() to get the current number of pending jobs that are unlikely to be stolen by other threads and split only when it is below a small threshold. As its documentation states:
This value may be useful for heuristic decisions about whether to fork other tasks. In many usages of ForkJoinTasks, at steady state, each worker should aim to maintain a small constant surplus (for example, 3) of tasks, and to process computations locally if this threshold is exceeded.
So this is the condition to decide whether to split your jobs further. Since this number reflects whether idle threads are stealing your created jobs, it provides balancing when the jobs have different CPU loads. It also works the other way round: if the pool is shared (like the common pool) and its threads are already busy, they will not pick up your jobs, the surplus count will stay high, and you will automatically stop splitting.
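As a minimal sketch of such a surplus-aware split (the 1,000-element floor and the surplus limit of 3 are illustrative assumptions, not values mandated by the API):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

public class SurplusAwareSum extends RecursiveTask<Long> {
    private final long[] data;
    private final int from, to;

    SurplusAwareSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        // Split only while the chunk is big enough AND few of our forked
        // tasks are still waiting to be stolen by idle workers.
        if (to - from > 1_000 && ForkJoinTask.getSurplusQueuedTaskCount() < 3) {
            int mid = (from + to) >>> 1;
            SurplusAwareSum left = new SurplusAwareSum(data, from, mid);
            left.fork();                          // queue one half for stealing
            SurplusAwareSum right = new SurplusAwareSum(data, mid, to);
            return right.compute() + left.join(); // work on the other half locally
        }
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] data = LongStream.rangeClosed(1, 1_000_000).toArray();
        long sum = ForkJoinPool.commonPool().invoke(new SurplusAwareSum(data, 0, data.length));
        System.out.println(sum); // 500000500000
    }
}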

invokeAll takes longer to return than the sum of constituent futures

I am using the ExecutorService.invokeAll variant that takes a collection of callables and a timeout. I am timing the running time of each individual callable, and the sum of these individual running times is less than the timeout specified. I am making sure that my fixed-size thread pool has at least twice as many threads as the number of callables. I have also made sure that the underlying thread pool queue is usually empty, which means I don't have a fast-producer/slow-consumer problem. Still, I am seeing invokeAll time out a lot. Any insight is appreciated.
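For reference, a minimal sketch of the setup described above (the timeout, task count, and sleep are placeholders for the real workload); note that the timeout is measured in wall-clock time from the call, not as a sum of per-task running times:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class InvokeAllTiming {
    public static void main(String[] args) throws InterruptedException {
        List<Callable<Long>> callables = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            callables.add(() -> {
                long start = System.nanoTime();
                Thread.sleep(100);                // stand-in for the real work
                return System.nanoTime() - start; // individual running time
            });
        }
        // Twice as many threads as callables, as described in the question.
        ExecutorService pool = Executors.newFixedThreadPool(2 * callables.size());
        // invokeAll returns when every task has finished OR the wall-clock
        // timeout elapses, cancelling whatever is still running.
        List<Future<Long>> futures = pool.invokeAll(callables, 5, TimeUnit.SECONDS);
        for (Future<Long> f : futures) {
            System.out.println(f.isCancelled() ? "timed out" : "completed");
        }
        pool.shutdown();
    }
}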

Java Concurrent Iteration: Divide and Conquer vs Runnable for each item

When I have hundreds of items to iterate through, and I have to do a computation-heavy operation to each one, I would take a "divide and conquer" approach. Essentially, I would take the processor count + 1, and divide those items into the same number of batches. And then I would execute each batch on a runnable in a cached thread pool. It seems to work well. My GUI task went from 20 seconds to 2 seconds, which is a much better experience for the user.
However, I was reading Brian Goetz' fine book on concurrency, and I noticed that for iterating through a list of items, he would take a totally different approach. He would kick off a Runnable for each item! Before, I always speculated this would be bad, especially on a cached thread pool which could create tons of threads. However each runnable would probably finish very quickly in the larger scope, and I understand the cached thread pool is very optimal for short tasks.
So which is the more accepted paradigm to iterate through computation-heavy items? Dividing into a fixed number of batches and giving each batch a runnable? Or kicking each item off in its own runnable? If the latter approach is optimal, is it okay to use a cached thread pool or is it better to use a bounded thread pool?
With batches you will always have to wait for the longest running batch (you are as fast as the slowest batch). "Divide and conquer" implies management overhead: doing administration for the dividing and monitoring the conquering.
Creating a task for each item is relatively straightforward (no management), but you are right that it may start hundreds of threads (unlikely, but it could happen), which will only slow things down (context switching) if the tasks do little or no I/O and are mostly CPU intensive.
If the cached thread pool does not start hundreds of threads (see getLargestPoolSize), then by all means, use the cached thread pool. If too many threads are started then one alternative is to use a bounded thread pool. But a bounded thread pool needs some tuning/decisions: do you use an unbounded task queue or a bounded task queue with a CallerRunsPolicy for example?
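One possible bounded configuration, as a sketch (the pool size and queue capacity here are illustrative assumptions):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolExample {
    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor bounded = new ThreadPoolExecutor(
                threads, threads,                 // fixed worker count for CPU-bound work
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),    // bounded task queue
                new ThreadPoolExecutor.CallerRunsPolicy()); // full queue: submitter runs the task
        for (int i = 0; i < 1_000; i++) {
            final int item = i;
            bounded.execute(() -> process(item)); // never rejected; the caller is throttled instead
        }
        bounded.shutdown();
    }

    private static void process(int item) {
        // computation-heavy work goes here
    }
}

With CallerRunsPolicy, a full queue makes the submitting thread run the task itself, which throttles submission instead of rejecting work.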
On a side note: there is also the ForkJoinPool which is suitable for tasks that start sub-tasks.

Thread.sleep() VS Executor.scheduleWithFixedDelay()

Goal: Execute certain code every once in a while.
Question: In terms of performance, is there a significant difference between:
while (true) {
    execute();
    Thread.sleep(10 * 1000);
}
and
executor.scheduleWithFixedDelay(runnableWithoutSleep, 0, 10, TimeUnit.SECONDS);
?
Of course, the latter option is more kosher. Yet, I would like to know whether I should embark on an adventure called "Spend a few days refactoring legacy code to say goodbye to Thread.sleep()".
Update:
This code runs in super/mega/hyper high-load environment.
You're dealing with sleep times termed in tens of seconds. The possible savings by changing your sleep option here is likely nanoseconds or microseconds.
I'd prefer the latter style every time, but if you have the former and it's going to cost you a lot to change it, "improving performance" isn't a particularly good justification.
EDIT re: 8000 threads
8000 threads is an awful lot; I might move to the scheduled executor just so that you can control the amount of load put on your system. Your point about varying wakeup times is something to be aware of, although I would argue that the bigger risk is a stampede of threads all sleeping and then waking in close succession and competing for all the system resources.
I would spend the time to throw these all in a fixed thread pool scheduled executor. Only have as many running concurrently as you have available of the most limited resource (for example, # cores, or # IO paths) plus a few to pick up any slop. This will give you good throughput at the expense of latency.
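A sketch of that consolidation (the pool size, jobs, and period are assumptions for illustration):

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ConsolidatedScheduler {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // A few threads more than the most limited resource, as suggested above.
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(cores + 2);
        List<Runnable> jobs = List.of(          // stand-ins for the 8000 sleeping loops
                () -> System.out.println("job A"),
                () -> System.out.println("job B"));
        for (Runnable job : jobs) {
            // Replaces "execute(); Thread.sleep(10 * 1000);" per job.
            scheduler.scheduleWithFixedDelay(job, 0, 10, TimeUnit.SECONDS);
        }
    }
}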
With the Thread.sleep() method it will be very hard to control what is going on, and you will likely lose out on both throughput and latency.
If you need more detailed advice, you'll probably have to describe what you're trying to do in more detail.
Since you haven't mentioned the Java version, things might differ.
As I recall from the Java source code, the prime difference lies in how the two are written internally.
For Sun Java 1.6, if you use the second approach, the native code also brings the wait and notify calls into the system, so in a way it is more thread-efficient and CPU-friendly.
But then again you lose control and your code becomes more unpredictable - consider that you want to sleep for exactly 10 seconds.
So, if you want more predictability, you can surely go with option 1.
Also, on a side note: when you encounter things like this in legacy systems, 80% of the time there are now better ways of doing it, but the magic numbers are there for a reason (the other 20%), so change it at your own risk :)
There are different scenarios:
The Timer creates a queue of tasks that is continually updated. When the Timer is done, it may not be garbage collected immediately, so creating more Timers only adds more objects onto the heap. Thread.sleep() only pauses the thread, so the memory overhead is extremely low.
Timer/TimerTask also takes into account the execution time of your task, so it will be a bit more accurate. And it deals better with multithreading issues (such as avoiding deadlocks etc.).
If your thread gets an exception and is killed, that is a problem. But a TimerTask will take care of it: it will run irrespective of a failure in the previous run.
The advantage of TimerTask is that it expresses your intention much better (i.e. code readability), and it already has the cancel() feature implemented.
Reference is taken from here
You said you are running in a "mega... high-load environment", so if I understand you correctly you have many such threads simultaneously sleeping, as in your code example. It takes less CPU time to reuse a thread than to kill and create a new one, and the refactoring may allow you to reuse threads.
You can create a thread pool by using a ScheduledThreadPoolExecutor with a corePoolSize greater than 1. Then when you call scheduleWithFixedDelay on that thread pool, a thread will be reused if one is available.
This change may reduce CPU utilization as threads are being reused rather than destroyed and created, but the degree of reduction will depend on the tasks they're doing, the number of threads in the pool, etc. Memory usage will also go down if some of the tasks overlap, since there will be fewer threads sitting idle at once.

Threadpoolsize of ScheduledExecutorService

I have a ScheduledExecutorService that gets tasks for periodic execution:
scheduler = Executors.newScheduledThreadPool( /* what size? */ );

public ScheduledFuture<?> addTask(long delay, long interval) {
    return scheduler.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            // doing work here
        }
    }, delay, interval, TimeUnit.MILLISECONDS);
}
The number of tasks the scheduler gets depends solely on the user of my program. Normally it should be a good idea, AFAIK, to make the thread pool size equal to the number of CPU threads, so that each CPU (or hardware thread) executes one task at a time, because this should give the best throughput. But what should I do if the tasks involve I/O (as they do in my program)? The tasks in my program grab data from a server on the internet and save it in a DB. That means most of the time they are waiting for the data to come in (i.e. idle). So what would be the best solution to this problem?
It really depends on the exact context:
How many tasks will be added? (You've said it's up to the user, but do you have any idea? Do you know this before you need to create the pool?)
How long does each of them take?
Will they be doing any intensive work?
If they're all saving to the same database, is there any concurrency issue there? (Perhaps you want to have several threads fetching from different servers and putting items in a queue, but only one thread actually storing data in the database; see the sketch after this answer.)
So long as you don't get "behind", how important is the performance anyway?
Ultimately I strongly suspect you'll need to benchmark this yourself - it's impossible to give general guidance without more information, and even with specific numbers it would be mostly guesswork. Hard data is much more useful :)
Note that the argument to newScheduledThreadPool only specifies the number of core threads to keep in the thread pool if threads are idle - so it's going to be doing a certain amount of balancing itself.
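The fetch-and-queue split mentioned above might look roughly like this (fetchOneItem() and saveToDb() are hypothetical placeholders, not a real API):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class FetchAndStore {
    public static void main(String[] args) {
        BlockingQueue<String> results = new LinkedBlockingQueue<>(1_000); // bounded for backpressure
        ExecutorService fetchers = Executors.newFixedThreadPool(8);       // I/O-bound, so > #cores is fine
        ExecutorService writer = Executors.newSingleThreadExecutor();     // one thread owns the database

        for (int i = 0; i < 8; i++) {
            fetchers.submit(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        results.put(fetchOneItem()); // blocks when the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        writer.submit(() -> {
            try {
                while (true) {
                    saveToDb(results.take()); // only this thread touches the DB
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    private static String fetchOneItem() { return "data"; } // placeholder for the network fetch
    private static void saveToDb(String s) { }              // placeholder for the DB insert
}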
