Executor pool limit number of threads at a time - java

I have a situation in which I have to run some 10,000 threads. Obviously, one machine cannot run these many threads in parallel. Is there any way by which we can ask Thread pool to run some specific number of threads in the beginning and as soon as one thread finishes, the threads which are left can start their processing ?

Executors.newFixedThreadPool(nThreads) is most likely what you are looking for. Only as many threads run at one time as the number you specify; tasks beyond that wait in the pool's queue until a thread is free. And yes, one machine cannot run 10,000 threads truly in parallel, but it can run them concurrently. Depending on how resource-intensive each task is, it may be more efficient in your case to use Executors.newCachedThreadPool(), which creates threads as needed and reuses threads that have finished.

Using Executors.newFixedThreadPool(10000) with invokeAll can fail with an OutOfMemoryError, because that many threads get created. You can still use a fixed pool by submitting tasks to it one by one instead of invoking all tasks at the same time; I would say that is safer than invokeAll.
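A minimal sketch of that approach, assuming the 10,000 units of work can be expressed as small Runnable tasks. The class name FixedPoolDemo, the method runTasks, and the pool size of 16 are hypothetical choices for illustration, not values from the question:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedPoolDemo {
    // Submits taskCount trivial tasks to a pool of poolSize threads and
    // returns how many of them completed.
    static int runTasks(int taskCount, int poolSize) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            // Tasks beyond poolSize wait in the pool's internal queue.
            pool.submit(() -> { completed.incrementAndGet(); });
        }
        pool.shutdown(); // accept no new tasks; queued tasks still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10_000, 16) + " tasks completed");
    }
}
```

The number of tasks and the number of threads are independent here: only the second argument controls how many threads exist.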

For this use case you can use a ThreadPoolExecutor with a BlockingQueue. This tutorial explains it very well: http://howtodoinjava.com/core-java/multi-threading/how-to-use-blockingqueue-and-threadpoolexecutor-in-java/

It sounds like you want to run 10,000 tasks on a group of threads. A relatively simple approach is to create a List and add all the tasks to it, wrapping each one in a Runnable. Then create a class that takes the list in its constructor, pops a Runnable off the list, and runs it, repeating until the list is empty. Access to the list must be synchronized in some manner. The class exits when the list is empty. Start some number of threads using this class. They'll burn down the list and then stop. Your main thread can monitor the length of the list.
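The hand-rolled approach could look roughly like this. It is a sketch under assumptions: the names ListDrainDemo, Drainer, and drain are made up for illustration, and a ConcurrentLinkedQueue is swapped in for the manually synchronized List so that taking a task is atomic without extra locking:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class ListDrainDemo {
    // Worker that keeps taking tasks off the shared queue and running
    // them, and exits once the queue is empty.
    static final class Drainer implements Runnable {
        private final Queue<Runnable> tasks;

        Drainer(Queue<Runnable> tasks) {
            this.tasks = tasks;
        }

        @Override
        public void run() {
            Runnable task;
            // poll() removes and returns the head, or null when empty.
            while ((task = tasks.poll()) != null) {
                task.run();
            }
        }
    }

    // Fills the queue, starts threadCount workers, waits for them, and
    // returns the number of tasks that ran.
    static int drain(int taskCount, int threadCount) {
        AtomicInteger done = new AtomicInteger();
        Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(() -> { done.incrementAndGet(); });
        }
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(new Drainer(tasks));
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(drain(10_000, 8) + " tasks ran");
    }
}
```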

Related

run things in parallel with multithreading [duplicate]

Do I need to clean up Thread objects in Java?

In my Java application I have a Runnable such as:
this.runner = new Runnable() {
    @Override
    public void run() {
        // do something that takes roughly 5 seconds.
    }
};
I need to run this roughly every 30 seconds (although this can vary) in a separate thread. The nature of the code is such that I can run it and forget about it (whether it succeeds or fails). I do this as follows as a single line of code in my application:
(new Thread(this.runner)).start()
Now, this works fine. However, I'm wondering if there is any sort of cleanup I should be doing on each of the thread instances after they finish running? I am doing CPU profiling of this application in VisualVM and I can see that, over the course of 1 hour runtime, a lot of threads are being created. Is this concern valid or is everything OK?
N.B. The reason I start a new Thread instead of simply defining this.runner as a Thread, is that I sometimes need to run this.runner twice simultaneously (before the first run call has finished), and I can't do that if I defined this.runner as a Thread since a single Thread object can only be run again once the initial execution has finished.
Java objects that need to be "cleaned up" or "closed" after use conventionally implement the AutoCloseable interface. This makes it easy to do the clean up using try-with-resources. The Thread class does not implement AutoCloseable, and has no "close" or "dispose" method. So, you do not need to do any explicit clean up.
However
(new Thread(this.runner)).start()
is not guaranteed to immediately start computation of the Runnable. You might not care whether it succeeds or fails, but I guess you do care whether it runs at all. And you might want to limit the number of these tasks running concurrently. You might want only one to run at once, for example. So you might want to join() the thread (or, perhaps, join with a timeout). Joining the thread ensures that the thread has completed its computation before the current thread continues. Joining the thread with a timeout increases the chance that the thread starts its computation (because the current thread will be suspended, freeing a CPU that might run the other thread).
However, creating multiple threads to perform regular or frequent tasks is not recommended. You should instead submit tasks to a thread pool. That will enable you to control the maximum amount of concurrency, and can provide you with other benefits (such as prioritising different tasks), and amortises the expense of creating threads.
You can configure a thread pool to use a fixed-length (bounded) task queue and to make submitting threads execute the submitted tasks themselves when the queue is full. By doing that you can guarantee that tasks submitted to the thread pool are (eventually) executed. The documentation of ThreadPoolExecutor.execute(Runnable) says it
Executes the given task sometime in the future
which suggests that the implementation guarantees that it will eventually run all submitted tasks, even if you do not take those specific steps to ensure submitted tasks are executed.
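One way to get the "submitter runs the task when the queue is full" behaviour is ThreadPoolExecutor with a bounded ArrayBlockingQueue and the built-in CallerRunsPolicy rejection handler. The sketch below assumes that is the configuration the answer has in mind; the class name CallerRunsDemo and the sizes (2 threads, queue capacity 4) are arbitrary illustrative choices:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CallerRunsDemo {
    // Fixed number of threads, bounded queue; when the queue is full the
    // thread calling execute() runs the task itself.
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits taskCount trivial tasks and returns how many completed.
    static int runAll(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        ThreadPoolExecutor pool = newBoundedPool(2, 4);
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> { completed.incrementAndGet(); });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(1_000) + " tasks completed");
    }
}
```

A side effect of this policy is natural back-pressure: while the submitting thread is busy running a task, it cannot submit more.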
I recommend you look at the Concurrency API. There are numerous pre-defined methods for general use. With an ExecutorService you can call the shutdown method after submitting tasks to the executor: it stops accepting new tasks, lets previously submitted tasks execute, and then terminates the executor. Note that shutdown itself does not block; call awaitTermination if you need to wait for the tasks to finish.
For a short introduction:
https://www.baeldung.com/java-executor-service-tutorial
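As a small illustration of that shutdown behaviour, the following sketch (the class name ShutdownDemo and the method shutdownBehaviour are hypothetical) checks three things: a task submitted before shutdown() still runs, a task submitted afterwards is rejected by the default handler, and awaitTermination reports that the executor has terminated:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownDemo {
    // Returns true if the earlier task ran, the later task was rejected,
    // and the executor terminated within the timeout.
    static boolean shutdownBehaviour() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicBoolean earlierTaskRan = new AtomicBoolean(false);
        executor.execute(() -> earlierTaskRan.set(true));

        executor.shutdown(); // returns immediately, does not wait

        boolean rejected = false;
        try {
            executor.execute(() -> { });
        } catch (RejectedExecutionException expected) {
            rejected = true; // new work is refused after shutdown()
        }

        try {
            boolean terminated = executor.awaitTermination(10, TimeUnit.SECONDS);
            return terminated && rejected && earlierTaskRan.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("behaves as described: " + shutdownBehaviour());
    }
}
```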

Java ThreadPool concepts, and issues with controlling the number of actual threads

I am a newbie to Java concurrency and am a bit confused by several concepts and implementation issues here. Hope you guys can help.
Say, I have a list of tasks stored in a thread-safe list wrapper:
ListWrapper jobs = ....
'ListWrapper' has synchronized fetch/push/append functions, and this 'jobs' object will be shared by multiple worker threads.
And I have a worker 'Runnable' to execute the tasks:
public class Worker implements Runnable{
private ListWrapper jobs;
public Worker(ListWrapper l){
this.jobs=l;
}
public void run(){
while(! jobs.isEmpty()){
//fetch an item from jobs and do sth...
}
}
}
Now in the main function I execute the tasks:
int NTHREADS =10;
ExecutorService service= Executors.newFixedThreadPool(NTHREADS);
//run threads..
int x=3;
for(int i=0; i<x; i++){
service.execute(new Worker(jobs) );
}
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
Now my questions are:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
Thread t1= new Thread(new Worker(jobs));
Thread t2= new Thread(new Worker(jobs));
...
t1.join();
t2.join();
...
Thank you very much!!
[[ There are some good answers here but I thought I'd add some more detail. ]]
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
Not exactly. A fixed pool never runs more than NTHREADS threads, so with 'x=20' a ceiling of 10 is expected. Below that ceiling, what you observe also depends on timing: threads may already have finished or may not have started yet when you look. If you call new Thread(...).start() 20 times yourself, then you will get 20 threads started. However, if you check immediately, none of them may have actually begun to run, and if you check later they may have finished.
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
Quoting the Javadocs of Executors.newFixedThreadPool(...):
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks.
So changing the NTHREADS constant changes the number of threads running in the pool. Changing x changes the number of jobs that are executed by those threads. You could have 2 threads in the pool and submit 1000 jobs or you could have 1000 threads and only submit 1 job for them to work on.
Btw, after you have submitted all of your jobs, you should shut down the pool. It then stops accepting new jobs, and its threads exit once all of the submitted jobs have been run.
service.shutdown();
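To see the difference between the pool size and the number of jobs, here is a hedged sketch. It builds the pool with the ThreadPoolExecutor constructor, using arguments assumed to be equivalent to newFixedThreadPool(nThreads), so that getLargestPoolSize() is available; PoolSizeDemo and threadsCreated are names invented for this example:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolSizeDemo {
    // Submits taskCount empty jobs to a pool of nThreads and returns the
    // largest number of threads the pool ever held.
    static int threadsCreated(int nThreads, int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                nThreads, nThreads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        for (int i = 0; i < taskCount; i++) {
            // A new thread is started per job until nThreads exist;
            // after that, jobs are queued.
            pool.execute(() -> { });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("jobs did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) {
        System.out.println("x=3:  " + threadsCreated(10, 3) + " threads");
        System.out.println("x=20: " + threadsCreated(10, 20) + " threads");
    }
}
```

This matches the observation in the question: the thread count follows the number of jobs only until it reaches the pool size.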
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
It differs in that it does all of the heavy work for you.
You don't have to create a ListWrapper of the jobs since you get one inside of the ExecutorService. You just submit the jobs to the ExecutorService and it keeps track of them until the threads are available to run them.
You don't have to create any threads or worry about them throwing exceptions and dying because the ExecutorService starts/restarts the threads for you.
If you want your tasks to return information you can make use of the submit(Callable) method and use the Future to get the results of the jobs. Etc, etc..
Doing this code yourself is going to be harder to get right, more code to maintain, and most likely will not perform as well as the code in the JDK that is battle tested and optimized.
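For the submit(Callable) route, a minimal sketch might look like this. The class CallableDemo, the method sumOfSquares, and the pool size of 4 are illustrative assumptions, with squaring a number standing in for a real job:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallableDemo {
    // Computes 1^2 + 2^2 + ... + n^2, one job per term, and collects the
    // results through Futures.
    static int sumOfSquares(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int value = i;
                Callable<Integer> job = () -> value * value;
                futures.add(pool.submit(job));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // blocks until that job is done
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10));
    }
}
```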
You shouldn't create threads by yourself when using a threadpool. Instead of WorkerThread class you should use a class that implements Runnable but is not a thread. Passing a Thread object to the threadpool won't make the thread run actually. The object will be passed to a different internal thread, which will simply execute the run method of your WorkerThread class.
The ExecutorService is simply incompatible with the way you want to write your program.
In the code you have right now, these WorkerThreads will stop to work when your ListWrapper is empty. If you then add something to the list, nothing will happen. This is definitely not what you wanted.
You should get rid of ListWrapper and simply put your tasks directly into the threadpool. The threadpool already incorporates an internal list of jobs shared between the threads. You should just submit your jobs to the threadpool and it will handle them accordingly.
To answer your questions:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
NTHREADS, the threadpool will create the necessary number of threads.
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
It's just that ExecutorService automates a lot of things for you. You can choose from a lot of different implementations of threadpools and you can substitute them easily. You can use for instance a scheduled executor. You get extra functionality. Why reinvent the wheel?
For 1) NTHREADS is the maximum number of threads that the pool will ever run concurrently, but that doesn't mean there will always be that many running. It will only use as many as are needed, up to that maximum, which in your case is 3.
As the docs say:
At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool-int-
As for 2) using Java's concurrent executors framework is preferred for new code. You get a lot of functionality for free, and it removes the need to handle all of the fiddly thread work yourself.
The number of threads passed into newFixedThreadPool is at most how many threads could be running executing your tasks. If you only have three tasks ever submitted I'd expect the ExecutorService to only create three threads.
To answer your questions:
You should use the number you pass into the constructor to control how many threads are going to be used to execute your tasks.
This differs because of the extra functionality the ExecutorService gives you, as well as the flexibility it gives you, for example when you need to change your ExecutorService type or the number of tasks you'll run (fewer lines of code to change).
All that is happening is the executor service is only creating as many threads as it needs. NTHREADS is effectively the maximum number of threads it'll create.
There is no point creating ten threads up front if it only has 3 tasks to complete, the other 7 will just be hanging around consuming resources.
If you submit more than NTHREADS number of tasks then it will process that number concurrently and the rest will wait on a queue until a thread becomes free.
This isn't any different from creating a fixed set of your own threads, except that the thread management and scheduling is handled for you. The executor service also replaces threads if they are killed by rogue exceptions in your task, which you'd otherwise have to code for.
See: The Javadoc on Executors.newFixedThreadPool

Schedule periodic tasks in Java, avoid creating new threads until necessary (like CachedThreadPool)

I have a number of tasks that I would like to execute periodically at different rates for most tasks. Some of the tasks may be scheduled for simultaneous execution though. Also, a task may need to start executing while another is currently executing.
I would also like to customize each task by setting an object for it, on which the task will operate while it is being executed.
Usually, the tasks will execute in periods of 2 to 30 minutes and will take around 4-5 seconds, sometimes up to 30 seconds when they are executed.
I've found Executors.newSingleThreadScheduledExecutor(ThreadFactory) to be almost exactly what I want, except that it might cause me problems if a new task happens to be scheduled for execution while another is already executing. This is due to the fact that the Executor is backed by a single execution thread.
The alternative is to use Executors.newScheduledThreadPool(corePoolSize, ThreadFactory), but this requires me to create a number of threads in a pool. I would like to avoid creating threads until it is necessary, for instance if I have two or more tasks that happen to need parallel execution due to their colliding execution schedules.
For the case above, the Executors.newCachedThreadPool(ThreadFactory) appears to do what I want, but then I can't schedule my tasks. A combination of both cached and scheduled executors would be best I think, but I am unable to find something like that in Java.
What would be the best way to implement the above do you think?
Isn't ScheduledThreadPoolExecutor.ScheduledThreadPoolExecutor(int):
ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(0);
what you need? 0 is the corePoolSize:
corePoolSize - the number of threads to keep in the pool, even if they are idle, unless allowCoreThreadTimeOut is set
I guess you will not be able to do that with a ScheduledThreadPoolExecutor, because it uses a DelayedWorkQueue, whereas newCachedThreadPool creates a ThreadPoolExecutor with a SynchronousQueue as its work queue.
So you cannot change the implementation of ScheduledThreadPoolExecutor to act like that.
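One possible workaround, which is not taken from the answers above, is to combine the two executors by hand: a single scheduler thread does nothing but timing, and each firing hands the real work to a cached pool, which creates threads only when runs overlap. The sketch below assumes that design; ScheduledHandOffDemo and runsPeriodically are hypothetical names, and counting down a latch stands in for the real task:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ScheduledHandOffDemo {
    // Schedules a periodic hand-off to a cached pool and returns true if
    // the task ran at least expectedRuns times within the timeout.
    static boolean runsPeriodically(int expectedRuns, long periodMillis) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        ExecutorService workers = Executors.newCachedThreadPool();
        CountDownLatch latch = new CountDownLatch(expectedRuns);
        try {
            // The scheduler thread only triggers; the work itself runs on
            // a worker thread, so slow tasks do not delay other schedules.
            scheduler.scheduleAtFixedRate(
                    () -> workers.execute(latch::countDown),
                    0, periodMillis, TimeUnit.MILLISECONDS);
            return latch.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("ran 3 times: " + runsPeriodically(3, 50));
    }
}
```

The trade-off is that the cached pool is unbounded, so a task that runs longer than its period can pile up overlapping executions.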

Spawning tons of threads without running out of memory

I have a multi-threaded application which creates hundreds of threads on the fly. When the JVM has less memory available than necessary to create the next Thread, it's unable to create more threads. Every thread lives for 1-3 minutes. Is there a way, if I create a thread and don't start it, the application can be made to automatically start it when it has resources, and otherwise wait until existing threads die?
You're responsible for checking your available memory before allocating more resources, if you're running close to your limit. One way to do this is to use the MemoryUsage class, or use one of:
Runtime.getRuntime().totalMemory()
Runtime.getRuntime().freeMemory()
...to see how much memory is available. To figure out how much is used, of course, you just subtract free from total. Then, in your app, simply set a MAX_MEMORY_USAGE value such that, when your app has used that amount of memory or more, it stops creating more threads until the amount of used memory has dropped back below this threshold. This way you're always running with the maximum number of threads, and not exceeding the memory available.
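A rough sketch of that check follows. MemoryCheck, usedBytes, and canStartAnotherThread are hypothetical names, and the threshold is whatever MAX_MEMORY_USAGE value you choose. Keep in mind that these Runtime methods report heap usage, while thread stacks are typically allocated outside the Java heap, so this is only an approximate signal:

```java
class MemoryCheck {
    // Heap currently in use: total heap minus the free part of it.
    static long usedBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    // True while used heap is still below the caller-chosen threshold.
    static boolean canStartAnotherThread(long maxUsedBytes) {
        return usedBytes() < maxUsedBytes;
    }

    public static void main(String[] args) {
        long maxUsedBytes = 512L * 1024 * 1024; // example threshold only
        System.out.println("used bytes: " + usedBytes());
        System.out.println("may start thread: "
                + canStartAnotherThread(maxUsedBytes));
    }
}
```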
Finally, instead of trying to create threads without starting them (because once you've created the Thread object, you're already taking up the memory), simply do one of the following:
Keep a queue of things that need to be done, and create a new thread for those things as memory becomes available
Use a "thread pool", let's say a max of 128 threads, as all your "workers". When a worker thread is done with a job, it simply checks the pending work queue to see if anything is waiting to be done, and if so, it removes that job from the queue and starts work.
I ran into a similar issue recently and I used the NotifyingBlockingThreadPoolExecutor solution described at this site:
http://today.java.net/pub/a/today/2008/10/23/creating-a-notifying-blocking-thread-pool-executor.html
The basic idea is that this NotifyingBlockingThreadPoolExecutor will execute tasks in parallel like the ThreadPoolExecutor, but if you try to add a task and there are no threads available, it will wait. It allowed me to keep the code with the simple "create all the tasks I need as soon as I need them" approach while avoiding huge overhead of waiting tasks instantiated all at once.
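The following is not the NotifyingBlockingThreadPoolExecutor from that article, only a sketch of the same "submission waits when everything is busy" idea, built from a fixed pool and a Semaphore. The class name BlockingSubmitDemo and its methods are invented for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BlockingSubmitDemo {
    private final ExecutorService pool;
    private final Semaphore slots;

    BlockingSubmitDemo(int threads, int maxInFlight) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.slots = new Semaphore(maxInFlight);
    }

    // Blocks the caller until fewer than maxInFlight tasks are pending.
    void submit(Runnable task) throws InterruptedException {
        slots.acquire();
        try {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    slots.release(); // free the slot when the task ends
                }
            });
        } catch (RejectedExecutionException e) {
            slots.release();
            throw e;
        }
    }

    boolean shutdownAndWait() throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    // Pushes taskCount trivial tasks through and returns how many ran.
    static int runAll(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        BlockingSubmitDemo executor = new BlockingSubmitDemo(4, 8);
        try {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> { completed.incrementAndGet(); });
            }
            if (!executor.shutdownAndWait()) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(1_000) + " tasks completed");
    }
}
```

Because submit blocks, at most maxInFlight task objects are alive at once, which is the memory benefit described above.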
It's unclear from your question, but if you're using straight threads instead of Executors and Runnables, you should be learning about java.util.concurrent package and using that instead: http://docs.oracle.com/javase/tutorial/essential/concurrency/executors.html
Just write code to do exactly what you want. Your question describes a recipe for a solution, just implement that recipe. Also, you should give serious thought to re-architecting. You only need a thread for things you want to do concurrently and you can't usefully do hundreds of things concurrently.
This is an alternative, lower-level solution than the above-mentioned NotifyingBlocking executor. It is probably not as ideal, but it will be simple to implement.
If you want a lot of threads on standby, then you ultimately need a mechanism for them to know when it's okay to "come to life". This sounds like a case for semaphores.
Make sure that each thread allocates no unnecessary memory before it starts working. Then implement as follows:
1) Create n threads on startup of the application, stored in a queue. You can base n on the result of Runtime.getRuntime().freeMemory(), rather than hard-coding it.
2) Also create a semaphore with n-k permits. Again, base this on the amount of memory available.
3) Now have each thread periodically check whether the semaphore has permits, calling Thread.sleep(...) in between checks, for example.
4) If a thread notices a free permit, it acquires the permit (which updates the semaphore) and starts working.
If this satisfies your needs, you can go on to manage your threads using a more sophisticated polling or wait/lock mechanism later.
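The steps above can be sketched as follows, with two stated deviations: the threads use the blocking Semaphore.acquire() instead of polling with Thread.sleep, and a short sleep stands in for the real work. SemaphoreGateDemo and maxConcurrent are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreGateDemo {
    // Starts threadCount threads that may only work while holding one of
    // the permits, and returns the highest number seen working at once.
    static int maxConcurrent(int threadCount, int permits) {
        Semaphore gate = new Semaphore(permits);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    gate.acquire(); // wait until it is okay to start
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    Thread.sleep(5); // stand-in for the real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    gate.release(); // let a waiting thread come to life
                }
            });
            threads.add(t);
            t.start();
        }

        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + maxConcurrent(20, 4));
    }
}
```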
