Use only a subset of threads in an ExecutorService - java

In a typical Java application, one configures a global ExecutorService for managing a global thread pool. Let's say I configure a fixed thread pool of 100 threads:
ExecutorService threadPool = Executors.newFixedThreadPool(100);
Now let's say that I have a list of 1000 files to upload to a server, and for each upload I create a Callable that will handle the upload of that one file.
List<Callable<Void>> uploadTasks = new ArrayList<>();
// Fill the list with 1000 upload tasks
How can I limit the maximum number of concurrent uploads to, let's say, 5?
If I do
threadPool.invokeAll(uploadTasks);
I don't have control over how many threads my 1000 tasks will take. Potentially 100 uploads will run in parallel, but I only want a maximum of 5.
I would like to create some sort of sub-executor, which uses a subset of the threads of the parent executor.
I don't want to create a new, separate ExecutorService just for uploads, because I want to manage my thread pool globally.
Does any of you know how to do that, or whether an existing implementation exists? Ideally something like
ExecutorService uploadThreadPool = Executors.createSubExecutor(threadPool,5);
Many thanks,
Antoine

In a typical JAVA application, one configures a global ExecutorService for managing a global thread pool.
I'm not doing this, but maybe I'm atypical. :-) Back to your question:
Since a guard (possibly a Semaphore inside your Callables) will only clutter your global Executor with tasks waiting for each other, you'll have to either
use some external logic which ensures that only 5 jobs are running at any time, which could itself be a Callable submitted to your one Executor. This could be done by wrapping your upload jobs with some logic that pulls the next job from a queue (containing the URLs or whatever) once the current one has completed. Then you submit five of those "drain the upload queue" jobs to your Executor and call it done. But, atypical as I am, I'd just
use a separate Executor for the uploads. This also gives you the ability to name your threads appropriately (in the Executor's ThreadFactory), which might help with debugging and makes for nice thread dumps.
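The first option (a fixed number of "drain the queue" jobs submitted to the one shared pool) could look roughly like the sketch below. It assumes each upload can be wrapped as a plain Runnable; `DrainQueueExample` and `runLimited` are hypothetical names, not an existing JDK API, and the failure handling is an assumed policy. At most `maxConcurrent` pool threads are ever busy with uploads, while the rest of the pool stays free for other work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainQueueExample {

    // Runs every task on the shared pool, but never more than maxConcurrent at the same time.
    static void runLimited(ExecutorService pool, List<Runnable> tasks, int maxConcurrent)
            throws InterruptedException {
        Queue<Runnable> queue = new ConcurrentLinkedQueue<>(tasks);
        List<Callable<Void>> drainers = new ArrayList<>();
        for (int i = 0; i < maxConcurrent; i++) {
            drainers.add(() -> {
                Runnable next;
                // Each drainer pulls the next job as soon as its current one finishes.
                while ((next = queue.poll()) != null) {
                    try {
                        next.run();
                    } catch (RuntimeException e) {
                        // Assumed policy: report the failure and keep draining.
                        System.err.println("upload failed: " + e);
                    }
                }
                return null;
            });
        }
        // Blocks until every drainer, and therefore every task, has finished.
        pool.invokeAll(drainers);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService threadPool = Executors.newFixedThreadPool(100);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        List<Runnable> uploadTasks = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            uploadTasks.add(() -> {
                maxSeen.accumulateAndGet(running.incrementAndGet(), Math::max);
                // hypothetical: upload one file here
                running.decrementAndGet();
            });
        }
        runLimited(threadPool, uploadTasks, 5);
        threadPool.shutdown();
        System.out.println("max concurrent uploads: " + maxSeen.get());
    }
}
```

The drainers occupy five pool threads for as long as there is upload work, which is exactly the "subset of threads" the question asks for, without a second pool.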

Related

Can fixedThreadPool have less threads than it was assigned to?

I have Executors.newFixedThreadPool(/* nThreads= */ 2) executor service. I noticed that sometimes when I pass TWO tasks to the executor service, it runs only ONE task, while I expect it to run TWO tasks. Is that possible and why?
I have two tasks which communicate with each other. These two tasks are put inside fixed thread pool of size two because I want both tasks to be running at the same time.
Executors make sure the thread pool is reused as efficiently as possible, but they don't guarantee that all tasks are executed at once. I wonder whether you could instead use two threads coming from two thread pools that each have only one thread?
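If the two tasks really must be running at the same time (because they talk to each other), one way to check what is actually happening is to make each task wait at a CyclicBarrier for its partner, with a timeout so that a missing partner shows up as a failure instead of a hang. This is a minimal sketch; `BothRunning` and `ranTogether` are hypothetical names.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BothRunning {
    // Returns true only if both submitted tasks were active at the same time on the given pool.
    static boolean ranTogether(ExecutorService pool) throws InterruptedException, ExecutionException {
        CyclicBarrier barrier = new CyclicBarrier(2); // trips only when both tasks have arrived
        Callable<Boolean> task = () -> {
            try {
                barrier.await(5, TimeUnit.SECONDS);
                return true;  // the partner task was running concurrently
            } catch (TimeoutException | BrokenBarrierException e) {
                return false; // the partner never arrived in time
            }
        };
        Future<Boolean> first = pool.submit(task);
        Future<Boolean> second = pool.submit(task);
        return first.get() && second.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        System.out.println("ran together: " + ranTogether(pool));
        pool.shutdown();
    }
}
```

With Executors.newFixedThreadPool(1) the same check would return false after the 5-second timeout, because the second task can only start once the first one has given up and broken the barrier.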

Java ThreadPool concepts, and issues with controlling the number of actual threads

I am a newbie to Java concurrency and am a bit confused by several concepts and implementation issues here. Hope you guys can help.
Say, I have a list of tasks stored in a thread-safe list wrapper:
ListWrapper jobs = ....
'ListWrapper' has synchronized fetch/push/append functions, and this 'jobs' object will be shared by multiple worker threads.
And I have a worker 'Runnable' to execute the tasks:
public class Worker implements Runnable {
    private ListWrapper jobs;
    public Worker(ListWrapper l) {
        this.jobs = l;
    }
    public void run() {
        while (!jobs.isEmpty()) {
            // fetch an item from jobs and do something...
        }
    }
}
Now in the main function I execute the tasks:
int NTHREADS = 10;
ExecutorService service = Executors.newFixedThreadPool(NTHREADS);
// run threads..
int x = 3;
for (int i = 0; i < x; i++) {
    service.execute(new Worker(jobs));
}
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
Now my questions are:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or does it not matter which one I choose?
2) How is this approach different from simply using the Producer-Consumer pattern, i.e. creating a fixed number of 'stud' threads to execute the tasks (shown in the code below)?
Thread t1 = new Thread(new Worker(jobs));
Thread t2 = new Thread(new Worker(jobs));
...
t1.start();
t2.start();
...
t1.join();
t2.join();
...
Thank you very much!!
[[ There are some good answers here but I thought I'd add some more detail. ]]
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
No, not really. I suspect that the reason you weren't seeing 20 threads is that threads had already finished or had yet to be started. If you call new Thread(...).start() 20 times then you will get 20 threads started. However, if you check immediately none of them may have actually begun to run or if you check later they may have finished.
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or does it not matter which one I choose?
Quoting the Javadocs of Executors.newFixedThreadPool(...):
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks.
So changing the NTHREADS constant changes the number of threads running in the pool. Changing x changes the number of jobs that are executed by those threads. You could have 2 threads in the pool and submit 1000 jobs or you could have 1000 threads and only submit 1 job for them to work on.
By the way, after you have submitted all of your jobs, you should shut down the pool; this lets all of the threads stop once all of the submitted jobs have been run.
service.shutdown();
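To see the distinction between NTHREADS and x directly, one can submit the jobs straight to the pool (no ListWrapper), shut it down, and count how many distinct worker threads actually ran a job. A small sketch; `PoolSizeDemo` and `distinctWorkers` are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolSizeDemo {
    // Submits 'jobs' tasks to a pool of 'nThreads' threads and counts the distinct worker threads that ran them.
    static int distinctWorkers(int nThreads, int jobs) throws InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(nThreads);
        Set<String> workers = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < jobs; i++) {
            service.execute(() -> workers.add(Thread.currentThread().getName()));
        }
        service.shutdown();                            // accept no new jobs; queued ones still run
        service.awaitTermination(1, TimeUnit.MINUTES); // wait for the queue to drain
        return workers.size();
    }

    public static void main(String[] args) throws InterruptedException {
        int NTHREADS = 10;
        System.out.println("x=3:  " + distinctWorkers(NTHREADS, 3));
        System.out.println("x=20: " + distinctWorkers(NTHREADS, 20));
    }
}
```

With the ThreadPoolExecutor behind newFixedThreadPool this should report 3 and 10, matching what the question observed: a new thread is started per submitted job only until the pool size is reached, and further jobs wait in the queue.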
2) How is this approach different from simply using the Producer-Consumer pattern, i.e. creating a fixed number of 'stud' threads to execute the tasks (shown in the code below)?
It differs in that it does all of the heavy work for you.
You don't have to create a ListWrapper of the jobs since you get one inside of the ExecutorService. You just submit the jobs to the ExecutorService and it keeps track of them until the threads are available to run them.
You don't have to create any threads or worry about them throwing exceptions and dying because the ExecutorService starts/restarts the threads for you.
If you want your tasks to return information, you can use the submit(Callable) method and use the returned Future to get the results of the jobs. Etc., etc.
Doing this code yourself is going to be harder to get right, more code to maintain, and most likely will not perform as well as the code in the JDK that is battle tested and optimized.
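For example, returning results through submit(Callable) and Future might look like the minimal sketch below; squaring numbers stands in for real work, and `FutureResults` and `sumOfSquares` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureResults {
    // Submits one Callable per input and combines the values read back through the Futures.
    static int sumOfSquares(int[] inputs, int nThreads) throws InterruptedException, ExecutionException {
        ExecutorService service = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int n : inputs) {
                Callable<Integer> job = () -> n * n; // the job returns a value
                futures.add(service.submit(job));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // waits for that job; its exception would resurface as ExecutionException
            }
            return total;
        } finally {
            service.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(new int[] {1, 2, 3, 4}, 2)); // prints 30
    }
}
```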
You shouldn't create threads yourself when using a thread pool. Instead of a WorkerThread class, you should use a class that implements Runnable but is not a Thread. Passing a Thread object to the thread pool won't actually make that thread run. The object will be passed to a different, internal thread, which will simply execute the run method of your WorkerThread class.
The ExecutorService is simply incompatible with the way you want to write your program.
In the code you have right now, these WorkerThreads will stop working when your ListWrapper is empty. If you then add something to the list, nothing will happen. This is definitely not what you want.
You should get rid of ListWrapper and simply put your tasks directly into the thread pool. The thread pool already incorporates an internal list of jobs shared between the threads. You should just submit your jobs to the thread pool and it will handle them accordingly.
To answer your questions:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or does it not matter which one I choose?
NTHREADS; the thread pool will create the necessary number of threads.
2) How is this approach different from simply using the Producer-Consumer pattern, i.e. creating a fixed number of 'stud' threads to execute the tasks (shown in the code below)?
It's just that ExecutorService automates a lot of things for you. You can choose from many different thread pool implementations and substitute them easily; for instance, you can use a scheduled executor. You get extra functionality. Why reinvent the wheel?
For 1), NTHREADS is the maximum number of threads that the pool will ever run concurrently, but that doesn't mean there will always be that many running. It will only use as many as are needed, up to that max value, which in your case is 3.
As the docs say:
At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available.
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool-int-
As for 2), using Java's concurrent executors framework is preferred in new code. You get a lot of functionality for free, and it removes the need to handle all of the fiddly thread work yourself.
The number of threads passed into newFixedThreadPool is at most how many threads could be running executing your tasks. If you only have three tasks ever submitted I'd expect the ExecutorService to only create three threads.
To answer your questions:
You should use the number you pass into the constructor to control how many threads are going to be used to execute your tasks.
This differs because of the extra functionality the ExecutorService gives you, as well as the flexibility it gives you, for example when you need to change your ExecutorService type or the number of tasks you'll run (fewer lines of code to change).
All that is happening is the executor service is only creating as many threads as it needs. NTHREADS is effectively the maximum number of threads it'll create.
There is no point creating ten threads up front if it only has 3 tasks to complete, the other 7 will just be hanging around consuming resources.
If you submit more than NTHREADS number of tasks then it will process that number concurrently and the rest will wait on a queue until a thread becomes free.
This isn't any different from creating a fixed set of your own threads, except the thread management and scheduling is handled for you. The executor service also restarts threads if they are killed by rogue exceptions in your task which you'd otherwise have to code for.
See the Javadoc for Executors.newFixedThreadPool.

Java Multithreading - Starting a lot of threads and keeping them idle, vs lazy loading new threads?

I'm in a situation where I have 4 groups of 7 threads each. My CPU (Core i7) is supposed to be able to handle 8 threads, so I'm considering going through each group one at a time: running the 7 threads, then moving to the 2nd group and running its 7 threads, then the 3rd and 4th groups in the same way, and then starting back at the 1st group, until the user sends a stop command.
My question is, once each group of 7 threads has finished processing, should I keep those threads idle, or shut them down completely and restart a new group of 7 threads at the next iteration? Which method will be faster? This is for a very speed intensive app, so I need everything to happen as quickly as possible.
I will be using a FixedThreadPool to manage each group of 7 threads. So I could either just invokeAll() and then leave them alone (presumably to idle), or I could shutdown() each threadpool after the invokeAll() and start a new thread pool at the next iteration.
Which method will be faster?
My question is, once each group of 7 threads has finished one cycle of processing, should I keep those threads idle, or shut them down completely and restart a new group of 7 threads at the next cycle?
I would use a single ExecutorService thread-pool and reuse the same threads for all tasks. See the tutorial on the subject. A thread-pool is designed to execute any Runnable or Callable class so they are task agnostic. For example, you might have ParentResult and ChildResult classes. You can submit a Callable<ParentResult> to the thread-pool which will return a Future<ParentResult> and you can submit a Callable<ChildResult> to the same thread-pool which will return a Future<ChildResult>.
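A minimal sketch of that idea: one shared pool runs tasks with different result types side by side. `ParentResult` and `ChildResult` here are hypothetical placeholder classes standing in for whatever the tasks really produce.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MixedResults {
    // Hypothetical result types, standing in for whatever your tasks produce.
    static final class ParentResult {
        final String name;
        ParentResult(String name) { this.name = name; }
    }

    static final class ChildResult {
        final int count;
        ChildResult(int count) { this.count = count; }
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        // One shared pool; it does not care which type each task returns.
        ExecutorService pool = Executors.newFixedThreadPool(7);
        try {
            Callable<ParentResult> parentTask = () -> new ParentResult("parent");
            Callable<ChildResult> childTask = () -> new ChildResult(42);

            Future<ParentResult> parent = pool.submit(parentTask);
            Future<ChildResult> child = pool.submit(childTask);

            System.out.println(parent.get().name + " / " + child.get().count);
        } finally {
            pool.shutdown(); // only once the application no longer needs the pool
        }
    }
}
```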
The only reason why you'd want to have "groups of threads" is if each thread has some state that it must maintain, such as a database connection. Even then, many people use thread pools, since they do much of the concurrency heavy lifting for you.
If you do have to keep this state, then I would certainly not shut down the pools and restart them later. A dormant thread/pool takes no system resources aside from memory. The only reason you would ever do this is if you are forking hundreds of threads for the task, but at that point you should consider re-architecting your application.
You don't need to schedule your threads manually. Start all 28 threads at once; this will not be slower, and may well be faster.
When you say your processor has 8 threads, I think you mean it has 4 cores with hyperthreading. Java threads are not the same thing as your processor's hardware threads, so those 7 threads are of a different kind than your processor's.
On mainstream JVMs such as HotSpot, each Java thread is backed by an operating-system thread, and the OS schedules those threads across all available cores; the JVM is not limited to using 1 core.
As for your actual question, try testing different thread combinations to see which is fastest, which will give you a more accurate answer than arm-chair theorising.
I too prefer Alexei Kaigorodov's suggestion to start all 28 threads. But I suggest you replace newFixedThreadPool with the newer Executors API (since Java 8):
static ExecutorService newWorkStealingPool()
Creates a work-stealing thread pool using all available processors as its target parallelism level.
The above API returns an ExecutorService that is a ForkJoinPool.
Now you don't need to worry about the utilization of idle threads: Java takes care of better utilization of idle threads through its work-stealing mechanism.
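Applied to the question (28 tasks submitted together rather than 4 groups of 7 run in turn), a rough sketch might look like this. The doubling is placeholder work, and `WorkStealingDemo` and `runAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkStealingDemo {
    // Runs all taskCount tasks on one work-stealing pool and returns their results in submission order.
    static List<Integer> runAll(int taskCount) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newWorkStealingPool(); // parallelism = available processors
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                tasks.add(() -> id * 2); // placeholder for real CPU-bound work
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(tasks)) { // blocks until every task is done
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("results: " + runAll(28).size());
    }
}
```

invokeAll returns the Futures in the same order as the submitted tasks, so results line up with the task list.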
If you still need four different groups of FixedThreadPools, you can proceed with invokeAll. Don't shut down an ExecutorService to switch between multiple pools; use one ExecutorService effectively. If you want to poll the results of tasks instead of waiting in invokeAll, use CompletableFuture and poll it to learn the status of task execution.
static CompletableFuture<Void> runAsync(Runnable runnable, Executor executor)
Returns a new CompletableFuture that is asynchronously completed by a task running in the given executor after it runs the given action.
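A minimal sketch of that polling style, using runAsync with an explicit executor and allOf to combine the futures; `PollingDemo` and `runAndPoll` are hypothetical names, and the counter increment stands in for real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class PollingDemo {
    // Starts n actions on the given executor, then polls instead of blocking in invokeAll.
    static int runAndPoll(ExecutorService executor, int n) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            futures.add(CompletableFuture.runAsync(completed::incrementAndGet, executor));
        }
        CompletableFuture<Void> all =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]));
        while (!all.isDone()) { // note: isDone() is also true if a task failed
            // The caller could do other work here; this sketch just sleeps briefly.
            Thread.sleep(10);
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(7);
        System.out.println("completed: " + runAndPoll(pool, 28));
        pool.shutdown();
    }
}
```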

Multiple threads submitting actions to be done in order

A question on using threads in java (disclaimer - I am not very experienced with threads so please allow some leeway).
Overview:
I was wondering whether there was a way for multiple threads to add actions to a queue, which another thread would then carry out. The order does not really matter; what matters more is that the actions in the queue are handled one at a time.
Explanation:
I plan to host a small server (using servlets). I want each connection to a client to be handled by a separate thread (so far, OK). However, each of these threads/clients will be making changes to a single XML file, and the changes cannot be made at the same time.
Question:
Could I have each thread submit the changes to be made to a queue which another thread continuously manages? As I said, the order of the changes does not matter, just that they do not happen at the same time.
Also, please advise if this is not the best way to do this.
Thank you very much.
This is a reasonable approach. Use an unbounded BlockingQueue (e.g. a LinkedBlockingQueue): the thread performing IO on the XML file calls take on the queue to remove the next message (blocking if the queue is empty) and then processes the message to modify the XML file, while the threads submitting changes to the XML file call offer on the queue in order to add their messages to it. The BlockingQueue is thread-safe, so there's no need for your threads to perform synchronization on it.
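A sketch of that design, assuming each change can be described as a String message; the list of applied changes stands in for the real XML file, and `XmlUpdateQueue`, `submitChange`, `stopAndGet` and the STOP "poison pill" message are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class XmlUpdateQueue {
    private static final String STOP = "__STOP__"; // hypothetical "poison pill" that ends the writer

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // unbounded
    private final List<String> applied = new ArrayList<>(); // stands in for the XML file; writer thread only
    private final Thread writer = new Thread(this::drain, "xml-writer");

    void start() { writer.start(); }

    // Called from any request thread; offer does not block on an unbounded queue.
    void submitChange(String change) { queue.offer(change); }

    // Runs on the single writer thread: take() blocks while the queue is empty.
    private void drain() {
        try {
            while (true) {
                String change = queue.take();
                if (STOP.equals(change)) return;
                applied.add(change); // hypothetical: modify and save the XML file here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Queues the stop message behind any pending changes and waits for the writer to finish.
    List<String> stopAndGet() throws InterruptedException {
        queue.offer(STOP);
        writer.join();
        return applied;
    }

    public static void main(String[] args) throws InterruptedException {
        XmlUpdateQueue q = new XmlUpdateQueue();
        q.start();
        q.submitChange("add <item id=\"1\"/>");
        q.submitChange("remove <item id=\"2\"/>");
        System.out.println(q.stopAndGet());
    }
}
```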
You could have the threads submit tasks to an ExecutorService that has only one thread. Or you could have a lock that allows only one thread at a time to alter the file. The latter seems more natural, as the file is a shared resource; the queue is then the implied queue of threads waiting for the lock.
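The lock variant could be sketched as follows, with a ReentrantLock guarding every modification; the in-memory list stands in for the XML file, and `LockedXmlFile` and `applyChange` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class LockedXmlFile {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> contents = new ArrayList<>(); // stands in for the shared XML file

    // Any request thread may call this; the lock lets only one thread modify the file at a time.
    void applyChange(String change) {
        lock.lock();
        try {
            contents.add(change); // hypothetical: read, modify and write the XML file here
        } finally {
            lock.unlock(); // always released, even if the update throws
        }
    }

    int size() {
        lock.lock();
        try {
            return contents.size();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedXmlFile file = new LockedXmlFile();
        Thread[] clients = new Thread[4];
        for (int t = 0; t < clients.length; t++) {
            final int id = t;
            clients[t] = new Thread(() -> {
                for (int i = 0; i < 250; i++) file.applyChange("client " + id + " change " + i);
            });
            clients[t].start();
        }
        for (Thread c : clients) c.join();
        System.out.println("changes applied: " + file.size());
    }
}
```

Here the waiting request threads themselves form the queue, so no extra writer thread has to be started or stopped.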
The Executor interface provides the abstraction you need:
An object that executes submitted Runnable tasks. This interface provides a way of decoupling task submission from the mechanics of how each task will be run, including details of thread use, scheduling, etc. An Executor is normally used instead of explicitly creating threads.
A single-threaded executor service seems like exactly the right tool for the job. See Executors.newSingleThreadExecutor(), whose javadoc says:
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
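A minimal sketch of the single-threaded executor approach: every request thread submits its change, and the one worker thread applies them one at a time. The list stands in for the XML file, and `SingleWriter`, `submitChange` and `snapshotAndShutdown` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SingleWriter {
    // One worker thread: submitted changes run one at a time, in submission order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final List<String> fileContents = new ArrayList<>(); // only touched by the writer thread

    Future<?> submitChange(String change) {
        return writer.submit(() -> {
            fileContents.add(change); // hypothetical: apply 'change' to the XML file here
        });
    }

    // Runs a final task on the writer thread to copy the contents, then shuts the executor down.
    List<String> snapshotAndShutdown() throws InterruptedException, ExecutionException {
        Callable<List<String>> snapshot = () -> new ArrayList<>(fileContents);
        try {
            return writer.submit(snapshot).get(); // runs after every earlier change
        } finally {
            writer.shutdown(); // e.g. when the webapp is unloaded
        }
    }

    public static void main(String[] args) throws Exception {
        SingleWriter w = new SingleWriter();
        w.submitChange("add <item id=\"1\"/>");
        w.submitChange("remove <item id=\"2\"/>");
        System.out.println(w.snapshotAndShutdown());
    }
}
```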
Note that in a Java EE context, you need to consider how to terminate the worker thread when your webapp is unloaded. There are other questions here on SO that deal with this.

How do i avoid "cannot create native threads" while using ExecutorService?

I'm new to ExecutorService. Right now my scenario is: millions of data items are coming in, 365*24*7.
I have some process to be done on data coming in using threads.
ExecutorService es = Executors.newSingleThreadExecutor();
es.execute(new ComputeDTask(data));
I'm sending data to ComputeDTask for some execution.
How efficient is it to create a new ComputeDTask each time data comes in? That is, if data is received one million times, then a million ComputeDTask objects will be created.
The overhead of creating a thread is about 100 microseconds, i.e. if you do less than 100 microseconds of work, you will have more overhead than work done, and your program can be slower than a single-threaded one.
The overhead of submitting a task to an existing ExecutorService is about 2 microseconds, i.e. if the task takes less than 2 microseconds, you may have more overhead than real work done.
If you have a CPU-bound process, you need about the same number of threads as cores to keep all the cores busy while minimising overhead.
E.g. if you have 8 cores, I suggest you combine the work so that you have 8 threads with one task each in total. You can have more tasks than this, but you may find it takes longer to process.
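One way to follow that advice, assuming the incoming data can be collected into a batch first, is to split each batch into one contiguous chunk per core so that every thread gets a single, larger task. Summing numbers stands in for the real per-item computation, and `BatchPerCore` and `sumInBatches` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchPerCore {
    // Splits 'data' into one chunk per core, so each thread runs one larger task instead of many tiny ones.
    static long sumInBatches(long[] data) throws InterruptedException, ExecutionException {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, data.length);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) sum += data[i]; // stand-in for the real per-item work
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown(); // in a long-running service you would keep one pool instead
        }
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sumInBatches(data));
    }
}
```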
Of course you should shutdown your ExecutorService when you have finished with it. The reason you don't see this done in all examples is that it can be a good idea to create one ExecutorService which is used for the life of the application.
Apparently you are creating a whole new ExecutorService for each task and never shutting them down. This of course results in the thread leak that you are observing. The proper way to use the ExecutorService is to create a single instance that manages the thread pool for you. The executors are very flexible and powerful in the way they manage the threads.
For each incoming piece of data, create the task using new ComputeDTask(data) and pass it to a thread pool with, say, 100 threads, which can then execute the tasks with higher throughput.
ExecutorService es = Executors.newFixedThreadPool(100); // created once, shared for the life of the application
void onGetData(Data data) { // Data: hypothetical type of the incoming item
    es.execute(new ComputeDTask(data));
}
