Prevent a slow job from taking over a thread pool - Java

I have a system where currently every job has its own Runnable class and I predefine a fixed number of threads for each job.
My understanding is that this is bad practice, because:
You have to tailor the number of threads to the machine running the process.
Each thread can only take one type of job.
Would you agree that the current solution is wrong?
So I'd like to use something like Java's thread pool instead. But I was met with the argument that, by doing so, slow jobs would take over most of the thread pool, leaving no room for the other jobs, whereas with the current solution a fixed number of threads is assigned to the slow worker, so it can't starve the others.
(Notice that you can't know a priori whether a job will be "slow".)
How can a system be adaptive in the number of threads it uses, yet at the same time not be bottlenecked by its slowest job?

You could try measuring the time it takes for each job to complete (with a hand-made Timer class of sorts). Then you normalize this value by dividing it by the maximum time any job has taken. Finally, you multiply this number by a fixed factor that varies depending on how many threads you want running per job per second. This will be the requested number of threads this process should be using, and you can adjust it accordingly.
Edit: You can set minimum and maximum values that regulate how many threads a job is entitled to. You could alternatively reclaim threads from a very greedy job when another job enters the system.
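A rough sketch of that heuristic (every name and the scaling factor here are hypothetical and must be tuned against real measurements):

    // Sketch of the normalization heuristic described above. All names
    // and the scaling factor are hypothetical; tune them by measuring.
    public class ThreadBudget {
        static int requestedThreads(long jobDurationMs, long maxJobDurationMs,
                                    double threadsPerJobFactor,
                                    int minThreads, int maxThreads) {
            // Normalize this job's duration against the slowest job observed.
            double normalized = (double) jobDurationMs / maxJobDurationMs;
            // Scale, then clamp to the configured band (the "Edit" above).
            int requested = (int) Math.round(normalized * threadsPerJobFactor);
            return Math.max(minThreads, Math.min(maxThreads, requested));
        }
    }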
Hope that helps!

It's more of a business problem. Let's say I am a telecom operator. I bar my subscribers from making outgoing calls when they don't clear their dues. When they make a payment I clear a flag, and within a second the subscriber can make calls. But a lot of other activities go on in my system, like usage processing, billing, bill formatting, etc.
Now let's assume I have a system-wide common pool of threads and I start billing 50K subscribers. All my threads are now processing the relatively long-running billing jobs, and a huge queue is building up.
A poor customer now makes a payment and wants to make an urgent call, but I have no thread left in my pool to clear the flag. The customer has to wait an hour before he can make the call. That's an SLA breach.
What I should have done is create separate thread pools. If the call-unblocking jobs are infrequent and short, I can create a separate pool for them with a core size of maybe 5. For the billing jobs I'd rather create a pool with a core size of 25 and a max size of 30.
So I'll never exceed my system limits either way, because I know that even in the worst situation I won't have more than 30 billing threads.
This also makes debugging easier: if I use a different thread-name pattern for each pool and my system has issues, I can take a thread dump and quickly tell whether the billing or the payment side is the culprit.
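For illustration, two isolated pools with distinct thread-name prefixes might look like this (a sketch; the sizes follow the text above, but the name prefixes and the queue bound are my own choices):

    import java.util.concurrent.*;
    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch only: two isolated pools whose thread names make a thread
    // dump immediately show which workload is busy.
    final class TelecomPools {
        private static ThreadFactory named(String prefix) {
            AtomicInteger n = new AtomicInteger(1);
            return r -> new Thread(r, prefix + "-" + n.getAndIncrement());
        }

        // Small dedicated pool for the short, urgent unbarring jobs.
        static final ExecutorService unbarPool =
                Executors.newFixedThreadPool(5, named("unbar"));

        // Bounded pool for long-running billing jobs; the queue must be
        // bounded, or the pool would never grow past its core size of 25.
        static final ExecutorService billingPool =
                new ThreadPoolExecutor(25, 30, 60, TimeUnit.SECONDS,
                        new LinkedBlockingQueue<>(10_000), named("billing"));
    }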
So, I think the existing design is based on some business use case which you need to thoroughly understand before proposing a solution.

Recommended size of a thread pool where several related tasks are submitted at once

I am trying to make a program that will execute a variable number of possibly (but not certainly) computationally heavy tasks in parallel. These tasks (of Runnable type) will all be submitted at the same time and the thread pool should shut down once all these tasks are complete (in other words, the pool will only need to accept the initial tasks and nothing more).
In most of the answers that I found on this site, the question was about a server-based task (I am running my program on a decent desktop) or a pool that accepts tasks over irregular time intervals. In the questions that were not specific about the use, the answer was usually "it depends."
I have basically zero experience with threads, so I really do not know what is the optimal "thread count to task intensity" ratio.
For context, the program that I am working on deals with collections of matrices (represented by 3D arrays) where each matrix can contain up to 1000x1000 elements. One of the tasks may be to perform a convolution operation, and each task is an operation on one of the matrices in the collection.
Is there a recommendation for this specific type of problem?
The same answer you hear when that question gets asked for a server: don't make assumptions, run experiments.
Try to identify (worst case: guess) the typical hardware setup your users run your software on. Then make sure you have nicely automated performance testing. And then see what happens.
But the thing is: that won't help much. You see, when you run your own server, you are (hopefully) in control of the workload those machines are busy with. For a desktop setup, where remote users run your code on their boxes, you have zero insight into what else is running there. You might find that 16 threads are fine for 50% of your users, but the rest may be doing a lot of other things on their machines, and for them 16 is already way too much.
And that is the real crux: no matter what number you find "good to go" for a specific hardware configuration, you have no control over the other workloads.
From that point of view, I would be pretty conservative. For a CPU-intensive workload, "too many" threads isn't helpful anyway, so go with the number of CPUs, or better, the number of cores, as a starting point.
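For instance, a conservative starting point might look like this (just a sketch; the real number should come out of your experiments):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ConservativePool {
        public static void main(String[] args) {
            // One worker per logical core the JVM reports: a safe default
            // for CPU-bound work, to be refined by automated benchmarks.
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);
            System.out.println("Starting with " + cores + " worker threads");
            pool.shutdown();
        }
    }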
Beyond that, what might be really helpful here: add some sort of "data gathering" to your application. Meaning: have it call home regularly to tell you things like "this is the hardware I am running on, I am using X threads, and the other workload on the system is Y". That might help you build heuristics for the most important user setups. But be diligent about what data to collect: define the questions you want answered upfront, and then collect only the data you need to answer them.
If your workload is computationally intensive (CPU bound), you might want to look into ForkJoinPool, which implements work stealing:
A ForkJoinPool differs from other kinds of ExecutorService mainly by virtue of employing work-stealing: all threads in the pool attempt to find and execute tasks submitted to the pool and/or created by other active tasks (eventually blocking waiting for work if none exist). This enables efficient processing when most tasks spawn other subtasks (as do most ForkJoinTasks), as well as when many small tasks are submitted to the pool from external clients.
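A minimal sketch of that divide-and-conquer style applied to the matrix workload (convolve() and the array sizes are placeholders for your per-matrix operation and real data):

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    // Sketch: split the collection of matrices until each subtask holds
    // one matrix; idle workers steal the other half of every split.
    public class MatrixJobs extends RecursiveAction {
        private final double[][][] matrices;   // the collection of 2D matrices
        private final int from, to;

        MatrixJobs(double[][][] matrices, int from, int to) {
            this.matrices = matrices;
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from <= 1) {
                if (from < to) convolve(matrices[from]);   // leaf: one matrix
            } else {
                int mid = (from + to) / 2;                  // split in half and
                invokeAll(new MatrixJobs(matrices, from, mid),  // let workers
                          new MatrixJobs(matrices, mid, to));   // steal tasks
            }
        }

        static void convolve(double[][] m) { /* per-matrix operation here */ }

        public static void main(String[] args) {
            double[][][] matrices = new double[8][256][256];  // placeholder sizes
            // Pool defaults to one worker per core; invoke() blocks until done.
            new ForkJoinPool().invoke(new MatrixJobs(matrices, 0, matrices.length));
        }
    }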

How to decide on the ThreadPoolTaskExecutor pools and queue sizes?

This may be a more general question about how to decide on thread pool sizes, but let's use the Spring ThreadPoolTaskExecutor for this case. I have the following configuration for the pool core and max size and the queue capacity. I've already read about what all these settings mean - there is a good answer here.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@SpringBootApplication
@EnableAsync
public class MySpringBootApp {

    public static void main(String[] args) {
        ApplicationContext ctx = SpringApplication.run(MySpringBootApp.class, args);
    }

    @Bean
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(10);
        executor.setQueueCapacity(25);
        return executor;
    }
}
The above numbers look random to me and I want to understand how to set them up correctly based on my environment. I will outline the following constraints that I have:
the application will be running on a two-core CPU box
the executor will work on a task which usually takes about 1-2 seconds to finish
usually I expect 800 tasks/min to be submitted to my executor, spiking at 2,500/min
the task will construct some objects and make an HTTP call to Google Pub/Sub
Ideally I'd like to understand what other constraints I need to consider and based on them what will be a reasonable configuration for my pools and queue sizes.
Update: this answer got a few votes over the years, so I'm adding a shortened version for people who don't have the time to read my weird metaphor:
TL;DR answer:
The actual constraint is that a (logical) CPU core can only run a single thread at a time. Thus:
Number of threads: number of logical cores of your CPUs * 1 / (ratio of time your thread is runnable while doing your task)
So, if you have 8 logical cores on your machine, you can safely put 8 threads in your thread pool (well, remember to exclude the other threads that may be in use). Then ask yourself whether you can add more: benchmark the kind of task you intend to run on your thread pool. If you notice the threads are, on average, running only 50% of the time, that means your CPU is free to work on another thread the other 50% of its time, and you can add more threads.
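As a sketch, that rule of thumb reads (runnableRatio is a value you measure by benchmarking; the method name is made up):

    public class PoolMath {
        // Rule of thumb above: poolSize = logicalCores * (1 / runnableRatio),
        // where runnableRatio is the measured fraction of time a worker
        // thread is actually on-CPU for this task (1.0 = pure CPU work).
        static int poolSizeFor(double runnableRatio) {
            int cores = Runtime.getRuntime().availableProcessors();
            return (int) Math.ceil(cores / runnableRatio);
        }

        public static void main(String[] args) {
            // e.g. on 8 logical cores, threads runnable 50% of the time -> 16.
            System.out.println(poolSizeFor(0.5));
        }
    }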
Queue size: as many as you can afford to wait for.
The queue size is the number of items your thread pool will accept before rejecting them. It is business logic: it depends on what behavior you expect. Is there any point in accepting a billion tasks? When do you throw in the towel?
If one task takes one second to complete and you have 10 threads, then the 10,000th task in the queue will hopefully be done in 1,000 seconds. Is that acceptable?
The worst thing that can happen is clients timing out and re-submitting the same tasks before you can complete the first ones.
Original ELI12 answer:
It may not be the most accurate answer, but I'll try:
A simple approach is to be aware that your 2-core CPU will only work on two threads at the same time.
If you have a relatively modern Intel CPU and Hyper-Threading (aka HT, HTT, SMT) is turned on (via a BIOS setting), your operating system will see the number of available cores as double the number of physical cores in your CPU.
Either way, to detect from Java how many cores (or simultaneous, not-preempting-each-other threads) you can work with, just call int cores = Runtime.getRuntime().availableProcessors();
If you try to see your application as a workshop (an actual one):
A processor would be represented by an employee. It is the physical unit that will add value to a product.
A task would be a lump of raw material (plus some instructions list)
Your thread is a desk on which the employee can put the task on and work.
The queue size is the length of the conveyor belt that brings the raw materials to the desk.
Thus, your question becomes "How do I choose how many desks to have and how long my conveyor belt can be inside my factory, given an unchanging number of employees?".
For the how-many-desks (threads) part:
An employee can only work at one desk at a time, and you can only have a single employee per desk. Thus, the basic setup is to have at least as many desks as you have employees (to avoid leaving any employee (processor) without the possibility to work).
But, depending on your activity, you may afford more desks per employee:
If your employees are expected to put mail inside envelopes constantly, an operation that requires their full attention (in programming: sorting collections, creating objects, incrementing counters), having more desks wouldn't help, and may even be detrimental, because your employees would sometimes have to change desks (context switching, which takes some time), leaving the one they were working on to make progress on another.
But if your task is making pottery, and relies on your employee waiting for the clay to bake in an oven (read: waiting on an external resource, such as a file system or a web service), your employee can afford to go model clay at another desk and come back to the first one later.
Thus, you can afford more desks per employee as long as your tasks have a big enough working/waiting (running/blocked) ratio; the number of extra desks is how many tasks your employee can make progress on during the waiting time.
For the conveyor belt (queue) size part:
The queue size represents how many items you allow to be queued before rejecting any further tasks (by throwing an exception); it is the threshold at which you start to say "OK, I'm already overbooked and won't ever be able to comply".
First, I'd say your conveyor belt needs to fit inside the workshop, meaning the collection should be small enough to prevent out-of-memory errors (obviously).
After that, it is based on your company policy. Let's assume a task is added to the belt every time a client makes an order (another service calls your API). If callers don't care how long you take to comply and trust you with the execution, there's no point in limiting the size of the belt.
But if you can expect your client to get annoyed after waiting a month for their pottery, to leave you for a competitor, or to reorder, assuming the first order was lost and never bothering to check whether it was completed... then that first order was done for nothing, you won't get paid, and if your client places a new order whenever you're too slow to comply, you'll enter a feedback loop, because every new order slows down the whole process.
Thus, in that case, you should put up a sign telling your clients: "Sorry, we're overbooked; please don't place any new orders, as we won't be able to comply within an acceptable time range".
Then the queue size would be: acceptable time range / time to complete a task.
Concrete example: if your client service expects the tasks it submits to be completed in less than 100 seconds, then, knowing that every task takes 1-2 seconds, you should limit the queue to 50-100 tasks, because once 100 tasks are waiting in the queue, you're pretty sure the next one won't be completed in less than 100 seconds, so you should reject it to prevent the caller from waiting for nothing.
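Applied to the Spring executor from the question, that calculation might look like this (a sketch; the 100-second budget and the task time are the example figures above, not recommendations):

    import org.springframework.core.task.TaskExecutor;
    import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

    public class QueueSizing {
        // queueCapacity = acceptable waiting time / time per task (rule above).
        static TaskExecutor taskExecutor(long acceptableWaitSec, long taskSec) {
            ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
            executor.setCorePoolSize(5);
            executor.setMaxPoolSize(10);
            // e.g. a 100 s budget and 2 s tasks -> reject past 50 queued tasks,
            // so no caller waits for a task that cannot finish in time.
            executor.setQueueCapacity((int) (acceptableWaitSec / taskSec));
            executor.initialize();
            return executor;
        }
    }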

Synchronous multithreading in Java (Apache HTTPClient)

I am wondering how I would go about doing this. Say I load a list of 1,000 words, and for each word a thread is created that does a Google search on it. The problem here is obvious: I can't have 1,000 threads, can I? Keep in mind I am extremely new to threads and synchronization. So basically I am wondering how to go about using fewer threads. I assume I have to set the thread count to a fixed number and synchronize the threads. I was wondering how to do this with Apache HttpClient, using a GetThread and then running it. In run() I get the data from the web page, turn it into a String, and then check whether it contains a certain word.
Surely you can have as many threads as you want, but in general it is not recommended to use more threads than there are processing cores on your computer.
And don't forget that creating 1,000 internet sessions at once affects your networking. A single Google page is nearly 0.3 megabytes. Are you really going to download 300 megabytes of data at once?
By the way,
There is a funny thing about concurrency.
Some people say: "synchronization is like concurrency". It is not true.
Synchronization is the opposite of concurrency.
Concurrency is when lots of things happen in parallel.
Synchronization is when I am blocking you.
(Joshua Bloch)
Maybe you can look at this problem this way: you have 1,000 words, and for each word you are going to carry out a search. In other words, there are 1,000 tasks to be executed, and they are not related to each other, so there is no need for synchronization in this problem, as per the following definition from Wikipedia:
"In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data. Process synchronization refers to the idea that multiple processes are to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action. Data synchronization refers to the idea of keeping multiple copies of a dataset in coherence with one another, or to maintain data integrity."
So in this problem you do not have to synchronize the 1,000 processes that execute the word searches, since they can run independently and don't need to join forces. So it is not process synchronization.
It is not data synchronization either, since the data of each search is independent of the other 999 searches.
Hence, when Joshua says "synchronization is when I am blocking you", there is no blocking needed in this case.
Yes, all the tasks can be executed concurrently in different threads. Of course, your system may not have the resources to run 1,000 threads concurrently (read: at the same time). So you need concepts like pools, where a pool has a certain number of threads. Say it has 10 threads; then those 10 start independent searches on 10 words from your list. Whenever one of them is done with its task, it takes up the next word-search task available, and the process goes on.
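A minimal sketch of that pool of 10, using the JDK's own HttpClient as a stand-in for Apache HttpClient (the query URL, the word list, and the searched-for phrase are all placeholders):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class WordSearch {
        public static void main(String[] args) {
            List<String> words = List.of("alpha", "beta"); // stand-in for 1,000 words
            HttpClient client = HttpClient.newHttpClient(); // thread-safe, shared

            // Ten workers share all the tasks: each thread finishes one
            // search and immediately picks up the next queued word.
            ExecutorService pool = Executors.newFixedThreadPool(10);
            for (String word : words) {
                pool.submit(() -> {
                    HttpRequest request = HttpRequest.newBuilder(
                            URI.create("https://www.google.com/search?q=" + word)).build();
                    HttpResponse<String> response =
                            client.send(request, HttpResponse.BodyHandlers.ofString());
                    if (response.body().contains("certain word")) {
                        System.out.println(word + ": match");
                    }
                    return null; // Callable, so checked exceptions are allowed
                });
            }
            pool.shutdown();
        }
    }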

Java ExecutorService - sometimes slower than sequential processing?

I'm writing a simple utility that accepts a collection of Callable tasks and runs them in parallel. The hope is that the total time taken is little more than the time taken by the longest task. The utility also adds some error-handling logic: if any task fails, and the failure is something that can be treated as "retry-able" (e.g. a timeout, or a user-specified exception), then we run the task directly.
I've implemented this utility around an ExecutorService. There are two parts:
submit() all the Callable tasks to the ExecutorService, storing the Future objects.
in a for-loop, get() the result of each Future. In case of exceptions, do the "retry-able" logic.
I wrote some unit tests to ensure that using this utility is faster than running the tasks in sequence. For each test, I'd generate a certain number of Callables, each essentially performing a Thread.sleep() for a random amount of time within a bound. I experimented with different timeouts, different numbers of tasks, etc., and the utility seemed to outperform sequential execution.
But when I added it to the actual system which needs this kind of utility, I saw results that were very variable - sometimes the parallel execution was faster, sometimes it was slower, and sometimes it was faster, but still took a lot more time than the longest individual task.
Am I just doing it all wrong? I know ExecutorService has invokeAll() but that swallows the underlying exceptions. I also tried using a CompletionService to fetch task results in the order in which they completed, but it exhibited more or less the same behavior. I'm reading up now on latches and barriers - is this the right direction for solving this problem?
I wrote some unit tests to ensure that using this utility is faster than running the tasks in sequence. For each test, I'd generate a certain number of Callable's, each essentially performing a Thread.sleep() for a random amount of time within a bound
Yeah this is certainly not a fair test since it is using neither CPU nor IO. I certainly hope that parallel sleeps would run faster than serial. :-)
But when I added it to the actual system which needs this kind of utility, I saw results that were very variable
Right. Whether a threaded application runs faster than a serial one depends on a number of factors. In particular, IO-bound applications will not improve in performance, since they are bound by the IO channel and cannot really benefit from concurrent operations. The more processing the application needs, the larger the win from making it multi-threaded.
Am I just doing it all wrong?
Hard to know without more details. You might consider playing around with the number of threads running concurrently. If you have a ton of jobs to process, you should not be using Executors.newCachedThreadPool(), and should instead tune Executors.newFixedThreadPool(...) to the number of CPUs your architecture has.
You may also want to see whether you can isolate the IO operations to a few threads and the processing to others - like one input thread reading from a file and one output thread (or a couple) writing to the database. So multiple pools of different sizes may do better for different types of tasks than a single thread pool.
tried using a CompletionService to fetch task results in the order in which they completed
If you are retrying operations, using a CompletionService is exactly the way to go. As jobs finish and throw exceptions (or return failure), they can be restarted and put back into the thread pool immediately. I don't see any reason why your performance problems would be caused by this.
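A sketch of that resubmit-on-failure loop (the Outcome wrapper, isRetryable(), and the retry policy are all illustrative; a CompletionService alone does not report which Callable finished, hence the wrapping):

    import java.util.List;
    import java.util.concurrent.*;

    public class RetryingRunner {
        // Pair each task with its result so a failure can be resubmitted.
        record Outcome<T>(Callable<T> task, T value, Throwable error) {}

        static <T> void runAll(List<Callable<T>> tasks, ExecutorService pool)
                throws InterruptedException, ExecutionException {
            CompletionService<Outcome<T>> cs = new ExecutorCompletionService<>(pool);
            for (Callable<T> task : tasks) {
                cs.submit(wrap(task));
            }
            for (int remaining = tasks.size(); remaining > 0; remaining--) {
                Outcome<T> out = cs.take().get();  // next finished task, any order
                if (out.error() != null && isRetryable(out.error())) {
                    cs.submit(wrap(out.task()));   // straight back into the pool
                    remaining++;                   // one more completion to await
                }
            }
        }

        static <T> Callable<Outcome<T>> wrap(Callable<T> task) {
            return () -> {
                try {
                    return new Outcome<>(task, task.call(), null);
                } catch (Exception e) {
                    return new Outcome<>(task, null, e);
                }
            };
        }

        // Placeholder policy: treat timeouts as retry-able, as in the question.
        static boolean isRetryable(Throwable t) { return t instanceof TimeoutException; }
    }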
Multi-threaded programming doesn't come for free. It has an overhead, and that overhead can easily exceed the performance gain; it usually makes your code more complex, too.
Additional threads give access to more CPU power (assuming you have spare CPUs), but in general they won't make your HDD spin faster, give you more network bandwidth, or speed up anything else that is not CPU bound.
Multiple threads can help give you a greater share of an external resource.

Multiple SingleThreadExecutors for a given application...a good idea?

This question is about the fallout of using SingleThreadExecutor (JDK 1.6). Related questions have been asked and answered in this forum before, but I believe the situation I am facing is a bit different.
Various components of the application (let's call the components C1, C2, C3, etc.) generate (outbound) messages, mostly in response to (inbound) messages they receive from other components. These outbound messages are kept in queues, which are usually ArrayBlockingQueue instances - fairly standard practice, perhaps. However, the outbound messages must be processed in the order they are added. I guess the use of a SingleThreadExecutor is the obvious answer here. We end up having a 1:1 situation - one SingleThreadExecutor for one queue (which is dedicated to messages emanating from one component).
Now, the number of components (C1, C2, C3, ...) is unknown at any given moment. They will come into existence depending on the needs of the users (and will eventually be disposed of, too). We are talking about 200-300 such components at peak load. Following the 1:1 design principle stated above, we are going to arrange for 200 SingleThreadExecutors. This is the source of my query here.
I am uncomfortable with the thought of having to create so many SingleThreadExecutors. I would rather try to use a pool of SingleThreadExecutors, if that makes sense and is plausible (any ready-made, seen-before classes/patterns?). I have read many posts here recommending the use of SingleThreadExecutor, but what about a pool of them?
What do learned women and men here think? I would like to be directed, corrected or simply, admonished :-).
If your requirement is that the messages be processed in the order that they're posted, then you want one and only one SingleThreadExecutor. If you have multiple executors, then messages will be processed out-of-order across the set of executors.
If messages need only be processed in the order that they're received for a single producer, then it makes sense to have one executor per producer. If you try pooling executors, then you're going to have to put a lot of work into ensuring affinity between producer and executor.
Since you indicate that your producers will have defined lifetimes, one thing that you have to ensure is that you properly shut down your executors when they're done.
Messaging and batch jobs is something that has been solved time and time again. I suggest not attempting to solve it again. Instead, look into Quartz, which maintains thread pools, persisting tasks in a database etc. Or, maybe even better look into JMS/ActiveMQ. But, at the very least look into Quartz, if you have not already. Oh, and Spring makes working with Quartz so much easier...
I don't see any problem there. Essentially you have independent queues and each has to be drained sequentially, one thread for each is a natural design. Anything else you can come up with are essentially the same. As an example, when Java NIO first came out, frameworks were written trying to take advantage of it and get away from the thread-per-request model. In the end some authors admitted that to provide a good programming model they are just reimplementing threading all over again.
It's impossible to say whether 300 or even 3,000 threads will cause any issues without knowing more about your application. I strongly recommend that you profile your application before adding more complexity.
The first thing to check is that the number of concurrently running threads is not much higher than the number of cores available to run them. The more active threads you have, the more time is wasted managing those threads (context switches are expensive) and the less work gets done.
The easiest way to limit the number of running threads is to use a semaphore: acquire it before starting work and release it after the work is done.
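A sketch of that pattern (sizing the semaphore to the core count is just one reasonable default):

    import java.util.concurrent.Semaphore;

    public class BoundedWork {
        // Allow at most N units of work to run at once, regardless of how
        // many threads exist. N is illustrative; match it to your cores.
        private static final Semaphore PERMITS =
                new Semaphore(Runtime.getRuntime().availableProcessors());

        static void doWork(Runnable work) throws InterruptedException {
            PERMITS.acquire();          // block until a slot is free
            try {
                work.run();
            } finally {
                PERMITS.release();      // always free the slot, even on failure
            }
        }
    }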
Unfortunately, limiting the number of running threads may not be enough. While it may help, the overhead may still be too great if the time spent per context switch is a major part of the total cost of one unit of work. In this scenario, the most efficient way is often to have a fixed number of queues: each component gets a queue from a global pool of queues when it initializes, using an algorithm such as round-robin for queue selection.
If you are in one of those unfortunate cases where the most obvious solutions do not work, I would start with something relatively simple: one thread pool, one concurrent queue, a lock, a list of queues, and a temporary queue for each thread in the pool.
Posting work to the queue is simple: add the payload and the identity of the producer.
Processing is relatively straightforward as well. First you get the next item from the queue. Then you acquire the lock. While you hold the lock, you check whether any other thread is running a task for the same producer. If not, you register the current thread by adding a temporary queue to the list of queues; otherwise you add the task to the existing temporary queue. Finally, you release the lock. Now you either run the task, or poll for the next one and start over, depending on whether the current thread was registered to run tasks. After running a task, you take the lock again and see whether there is more work to be done in the temporary queue. If not, you remove the queue from the list; otherwise you take the next task. Finally, you release the lock and again choose whether to run the task or start over.
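A simpler, well-known shape for the same idea is a per-producer serial wrapper over one shared pool, much like the SerialExecutor sketch in the java.util.concurrent.Executor javadoc (illustrative, not a drop-in for the exact scheme above):

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.Executor;

    // Serializes tasks from one producer while running on a shared pool.
    // One instance per producer gives per-producer ordering without a
    // dedicated thread per producer.
    public class PerProducerExecutor implements Executor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor sharedPool;
        private Runnable active;

        public PerProducerExecutor(Executor sharedPool) {
            this.sharedPool = sharedPool;
        }

        @Override
        public synchronized void execute(Runnable task) {
            tasks.add(() -> {
                try {
                    task.run();
                } finally {
                    scheduleNext();     // only then may the next task start
                }
            });
            if (active == null) {
                scheduleNext();
            }
        }

        private synchronized void scheduleNext() {
            active = tasks.poll();
            if (active != null) {
                sharedPool.execute(active);
            }
        }
    }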
