Can I run background tasks in a ThreadPool? - java

I have an ExecutorService to execute my tasks concurrently. Most of these tasks are simple actions that require ~300ms to complete each. But a few of these tasks are background processing queues that take in new sub-tasks all the time and execute them in order. These background tasks will remain active as long as there are normal tasks running.
The ThreadPool is created through one of the Executors' factory methods (I don't know which yet) with a user-specified thread count. My fear is that the following situation might happen: there are fewer threads than there are background queues. At a given moment, all background queues are working, blocking all the threads of the ExecutorService. No normal tasks will thus be started and the program hangs forever.
Is there a possibility this might happen, and how can I avoid it? I'm considering interrupting the background tasks to make room for the normal ones.
The goal is to limit the number of threads in my application because Google said having a lot of threads is bad and having them idle for most of the time is bad too.
About 10000 tasks are going to be submitted in a very short amount of time at the beginning of the program execution. About 50 background task queues are needed, and they will spend most of their time waiting for a background job to do.

Don't mix long-running tasks with short-running tasks in the same ExecutorService.
Use two different ExecutorService instances, each with an appropriate pool size. Even if you set the pool size to 50 for the long-running background tasks, performance will not be optimal, because the machine has far fewer cores (2, 4, 8, etc.) than that.
I would create two separate ExecutorService instances, each initialized with Runtime.getRuntime().availableProcessors() / 2 threads.
Have a look at below posts for more details to effectively utilize available cores:
How to implement simple threading with a fixed number of worker threads
Dynamic Thread Pool
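A minimal sketch of the two-pool idea described above, assuming fixed-size pools. The names SeparatePools and runDemo are hypothetical and not from the original answer. One never-ending task stands in for a background queue; the short tasks still finish because they never compete with it for a thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SeparatePools {

    /** Submits shortTasks quick tasks while a long task occupies the other pool; returns how many completed. */
    public static int runDemo(int shortTasks) {
        // Assumption: half of the cores for each pool, but at least one thread.
        int half = Math.max(1, Runtime.getRuntime().availableProcessors() / 2);
        ExecutorService shortPool = Executors.newFixedThreadPool(half);
        ExecutorService longPool = Executors.newFixedThreadPool(half);
        AtomicInteger completed = new AtomicInteger();
        try {
            // Stand-in for a background queue: blocks until interrupted at shutdown.
            longPool.submit(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < shortTasks; i++) {
                futures.add(shortPool.submit(() -> { completed.incrementAndGet(); }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every short task
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            shortPool.shutdown();
            longPool.shutdownNow(); // interrupts the long-running task
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(100));
    }
}
```

The pool sizes here are only a starting point; tasks that mostly wait (as the background queues in the question do) may justify more threads than cores.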

You can have an unlimited number of threads; check out the cached thread pool (Executors.newCachedThreadPool()):
Creates a thread pool that creates new threads as needed, but will
reuse previously constructed threads when they are available. These
pools will typically improve the performance of programs that execute
many short-lived asynchronous tasks. Calls to execute will reuse
previously constructed threads if available. If no existing thread is
available, a new thread will be created and added to the pool. Threads
that have not been used for sixty seconds are terminated and removed
from the cache. Thus, a pool that remains idle for long enough will
not consume any resources. Note that pools with similar properties but
different details (for example, timeout parameters) may be created
using ThreadPoolExecutor constructors.
Another option is to create two different pools and reserve one for priority tasks.

The solution is that the background tasks stop instead of being idle when there is no work and get restarted if there are enough tasks again.
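import java.util.Objects;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicBoolean;

public class BackgroundQueue implements Runnable {
    private final ExecutorService service;
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean running = new AtomicBoolean(false);
    // volatile: written in submit(), read in awaitQueueTermination()
    private volatile Future<?> future;

    public BackgroundQueue(ExecutorService service) {
        this.service = Objects.requireNonNull(service);
        // Start with a Future that has already completed (returns null immediately)
        FutureTask<Void> f = new FutureTask<>(() -> null);
        f.run();
        future = f;
    }

    public void awaitQueueTermination() throws InterruptedException, ExecutionException {
        do {
            future.get();
        } while (!tasks.isEmpty() || running.get());
    }

    public synchronized void submit(Runnable task) {
        tasks.add(task);
        // Occupy a pool thread only if no worker is currently draining the queue
        if (running.compareAndSet(false, true))
            future = service.submit(this);
    }

    @Override
    public void run() {
        do {
            Runnable task;
            while ((task = tasks.poll()) != null) {
                task.run();
            }
            running.set(false);
            // A task added after the last poll() but before set(false) would
            // otherwise be stranded, so re-check and reclaim the flag if needed.
        } while (!tasks.isEmpty() && running.compareAndSet(false, true));
    }
}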

Related

Which pool to choose from Java Util concurrent

I have a bunch of tasks:
public class ProcessDay implements Runnable {
    @Override
    public void run() {
        List<ProcessHour> hr = // required hours
        // do some post actions
    }
}
public class ProcessHour implements Runnable {
    @Override
    public void run() {
        List<ProcessMinutes> mins = // required minutes
        // do some post actions
    }
}
ProcessSeconds, ProcessMonth, ... etc
And so forth. It would be convenient to use ForkJoinPool here, but it's not good from a performance standpoint, because ProcessXXX tasks are submitted to a cluster of machines and hence the method invocation itself is very short.
So for performance it's good to use Executors.newCachedThreadPool(). But is there a way to combine ForkJoinPool with cached thread pool semantics? I mean creating threads on demand and releasing them if not used.
Maybe there is a better approach to this? Can you suggest something?
I mean creating threads on demand and release them if not used.
That is how cached thread pool operates. It starts with 0 threads and creates a new one each time when there is a new task to process and all threads in the pool are busy. Thread is terminated if it was idle for 60 seconds.
Default cached thread pool is created using the following ThreadPoolExecutor constructor:
return new ThreadPoolExecutor(0,
        Integer.MAX_VALUE,
        60L,
        TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>());
It is created with a core size of 0 (no threads are kept alive while idle), an unlimited maximum number of threads, a 60-second timeout before an idle thread is terminated, and a queue implementation that doesn't store tasks and just hands them over from the pool to its threads. Such a pool is suitable for lots of short-lived tasks, which is probably your case. As you can see, it's also pretty easy to adjust its configuration to your needs by calling the ThreadPoolExecutor constructor directly.
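As one way of adjusting that configuration, here is a hedged sketch of a pool that still creates threads on demand and releases idle ones, but caps the thread count. It swaps the SynchronousQueue for an unbounded LinkedBlockingQueue and relies on allowCoreThreadTimeOut(true), so it is a different setup from the default cached pool, not a copy of it. BoundedCachedPool, create and runDemo are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedCachedPool {

    /** Pool that grows up to maxThreads and lets idle threads die after idleSeconds (must be > 0). */
    public static ThreadPoolExecutor create(int maxThreads, long idleSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,          // core == max, so extra tasks queue up
                idleSeconds, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true);       // idle core threads are released too
        return pool;
    }

    /** Runs taskCount trivial tasks; returns {completed tasks, largest pool size seen}. */
    public static int[] runDemo(int taskCount, int maxThreads) {
        ThreadPoolExecutor pool = create(maxThreads, 60L);
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(taskCount);
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                completed.incrementAndGet();
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return new int[] { completed.get(), pool.getLargestPoolSize() };
    }

    public static void main(String[] args) {
        int[] result = runDemo(50, 4);
        System.out.println(result[0] + " tasks, at most " + result[1] + " threads");
    }
}
```

Keeping the SynchronousQueue while lowering the maximum would instead cause tasks to be rejected once all threads are busy, which is why the queue is changed in this sketch.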

Execute action when ThreadPoolExecutor has no active workers

I have a cached thread pool where new tasks are spawned in rather unpredictable manner. These tasks don't generate any results (they are Runnables rather than Callables).
I would like to have an action to be executed whenever the pool has no active workers.
However I don't want to shutdown the pool (and obviously use awaitTermination) because I would have to reinitialize it again when a new task arrives (as it could arrive unpredictably, even during the shutdown).
I came up with the following possible approaches:
Have an extra thread (outside the pool) which is spawned whenever a new task is spawned AND the ThreadPoolExecutor had no active workers. It should then continually check getActiveCount() until it returns 0, and then execute the desired action.
Have some thread-safe queue (which one?), where the Future of every newly spawned task is added. Whenever there's at least one entry in the queue, spawn an extra thread (outside the pool) which waits until the queue is empty and executes the desired action.
Implement a PriorityBlockingQueue to use with the pool and assign the worker threads higher priority than to the thread (now from inside the pool) which executes the desired action.
My question:
I was wondering if there is some cleaner solution, which uses some nice synchronization object (like CountDownLatch, which however cannot be used here, because I don't know the number of tasks in advance) ?
If I were you, I would implement a decorator for your thread pool that keeps track of the scheduled tasks and slightly modifies the tasks that are run. This way, whenever a Runnable is scheduled, you can instead schedule another, decorated Runnable which is capable of tracking its own progress.
This decorator would look something like:
class RunnableDecorator implements Runnable {
    private final Runnable delegate;
    // this task counter must be increased on any
    // scheduling of a task by the thread pool
    private final AtomicInteger taskCounter;
    // Constructor omitted
    @Override
    public void run() {
        try {
            delegate.run();
        } finally {
            if (taskCounter.decrementAndGet() == 0) {
                // spawn idle action
            }
        }
    }
}
Of course, the thread pool has to increment the counter every time a task is scheduled. Thus, the logic for this must not be added to the Runnable but to the thread pool. Finally, it is up to you to decide whether to run the idle action in the same thread or to give the decorator a reference to the thread pool so that the action runs in a new thread. If you choose the latter, note that the completion of the idle action would then trigger another idle action. To avoid that, you could provide a method for raw (undecorated) scheduling. You could also apply the decoration in the pool's task queue, but that makes it harder to provide this sort of raw scheduling.
This approach is non-blocking and does not interfere with your code base too much. Note that the thread pool does not run the idle action when it is created, even though it is empty by definition at that point.
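The pool-side half of this idea (incrementing on scheduling, decrementing in the decorated task) could be sketched as a wrapper around any Executor. IdleAwareExecutor and finishOne are hypothetical names, and running the idle action on the thread that finishes the last task is just one of the choices discussed above.

```java
import java.util.Objects;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

public class IdleAwareExecutor implements Executor {
    private final Executor delegate;
    private final Runnable idleAction;
    private final AtomicInteger pending = new AtomicInteger();

    public IdleAwareExecutor(Executor delegate, Runnable idleAction) {
        this.delegate = Objects.requireNonNull(delegate);
        this.idleAction = Objects.requireNonNull(idleAction);
    }

    @Override
    public void execute(Runnable task) {
        Objects.requireNonNull(task);
        pending.incrementAndGet();          // count the task when it is scheduled
        try {
            delegate.execute(() -> {
                try {
                    task.run();
                } finally {
                    finishOne();            // count it down when it ends, even on failure
                }
            });
        } catch (RejectedExecutionException rejected) {
            finishOne();                    // the task never ran, so undo the increment
            throw rejected;
        }
    }

    // Runs the idle action on whichever thread brings the counter back to zero.
    private void finishOne() {
        if (pending.decrementAndGet() == 0) {
            idleAction.run();
        }
    }
}
```

It could be used as new IdleAwareExecutor(Executors.newCachedThreadPool(), () -> System.out.println("idle")). Because the idle action is called directly rather than scheduled through the wrapper, it does not trigger itself again.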
If you look at the source behind Executors.newCachedThreadPool(), you can see how it's created with a ThreadPoolExecutor. Using that, override the execute and afterExecute methods to add a counter. This way the increment and decrement logic is isolated in one location. Ex:
ExecutorService executor = new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>()) {
    private AtomicInteger counter = new AtomicInteger(0);
    @Override
    public void execute(Runnable r) {
        counter.incrementAndGet();
        super.execute(r);
    }
    @Override
    public void afterExecute(Runnable r, Throwable t) {
        if (counter.decrementAndGet() == 0) {
            // thread pool is idle - do something
        }
        super.afterExecute(r, t);
    }
};

Kind of load balanced thread pool in java

I am looking for a load balanced thread pool with no success so far. (Not sure whether load balancing is the correct wording).
Let me explain what I try to achieve.
Part 1:
I have jobs, each with 8 to 10 single tasks. On a 6-core CPU I let 8 threads work on these tasks in parallel, which seems to deliver the best performance. When one task is finished, another one can start. Once all ten tasks are finished, the complete job is done. Usually a job is done in 30 to 60 seconds.
Part two:
Sometimes, unfortunately, a job takes more than two hours. This is correct, given the amount of data that has to be calculated.
The bad thing is that no other job can start while job 1 is running (assuming that all tasks have the same duration), because it is using all threads.
My First idea:
Have 12 threads, allow up to three jobs in parallel.
BUT: that means the CPU is not fully utilized when there is only 1 job.
I am looking for a solution that gives full CPU power to job one when there is no other job. But when another job needs to be started while one is already running, I want the CPU power allocated to both jobs. And when a third or fourth job shows up, I want the CPU power allocated fairly to all four jobs.
I appreciate your answers...
thanks in advance
One possibility might be to use a standard ThreadPoolExecutor with a different kind of task queue
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TaskRunner {
    private static class PriorityRunnable implements Runnable,
            Comparable<PriorityRunnable> {
        private Runnable theRunnable;
        private int priority = 0;
        public PriorityRunnable(Runnable r, int priority) {
            this.theRunnable = r;
            this.priority = priority;
        }
        public int getPriority() {
            return priority;
        }
        public void run() {
            theRunnable.run();
        }
        public int compareTo(PriorityRunnable that) {
            // Integer.compare avoids the overflow that plain subtraction can cause
            return Integer.compare(this.priority, that.priority);
        }
    }
    private BlockingQueue<Runnable> taskQueue = new PriorityBlockingQueue<Runnable>();
    private ThreadPoolExecutor exec = new ThreadPoolExecutor(8, 8, 0L,
            TimeUnit.MILLISECONDS, taskQueue);
    public void runTasks(Runnable... tasks) {
        int priority = 0;
        // Start just behind the task currently at the head of the queue
        Runnable nextTask = taskQueue.peek();
        if (nextTask instanceof PriorityRunnable) {
            priority = ((PriorityRunnable) nextTask).getPriority() + 1;
        }
        for (Runnable t : tasks) {
            exec.execute(new PriorityRunnable(t, priority));
            priority += 100;
        }
    }
}
The idea here is that when you have a new job you call
taskRunner.runTasks(jobTask1, jobTask2, jobTask3);
and it will queue up the tasks in such a way that they interleave nicely with any existing tasks in the queue (if any). Suppose you have one job queued, whose tasks have priority numbers j1t1=3, j1t2=103, and j1t3=203. In the absence of other jobs, these tasks will execute one after the other as quickly as possible. But if you submit another job with three tasks of its own, these will be assigned priority numbers j2t1=4, j2t2=104 and j2t3=204, meaning the queue now looks like
j1t1, j2t1, j1t2, j2t2, etc.
This is not perfect however, because if all threads are currently working (on tasks from job 1) then the first task of job 2 can't start until one of the job 1 tasks is complete (unless there's some external way for you to detect this and interrupt and re-queue some of job 1's tasks). The easiest way to make things more fair would be to break down the longer-running tasks into smaller segments and queue those as separate tasks - you need to get to a point where each individual job involves more tasks than there are threads in the pool, so that some of the tasks will always start off in the queue rather than being assigned directly to threads (if there are idle threads then exec.execute() passes the task straight to a thread without going through the queue at all).
The easiest thing to do is to oversubscribe your CPU, as Kanaga suggests, but start 8 threads each. There may be some overhead from the competition, but if you get to a single job situation, it will fully utilize the CPU. The OS will handle giving time to each thread.
Your "first idea" would also work. The idle threads wouldn't take resources from 8 working threads if they aren't actually executing a task. This wouldn't distribute the cpu resources as evenly when there are multiple jobs running, though.
Do you have a setup where you can test these different pipelines to see how they're performing for you?
Since your machine has a 6-core CPU, I think it is better to have 6 worker threads for each job thread, so that whenever a job thread receives a new job, it starts up to six parallel workers on that single job. This ensures that the full CPU power is used when there is only one job at a time.
Also, please have a look at the fork/join framework in Java 7.
Also learn about newCachedThreadPool():
Java newCachedThreadPool() versus newFixedThreadPool

Controlling the number of threads and controlling object accessing in a Multi Threaded Web Crawler in Java

I built a web crawler but it is single threaded. Now I am extending it to work with multiple threads. I am not able to understand the following :
How many threads should I create? Should it be a fixed number or a dynamic one changing according to the length of the Queue holding the URIs? (Taking into consideration the available memory also)
I have created a new class for the thread through the Runnable Interface and I want each thread's run method to access an object I created in my Main class which is calling thread.start(). How should I access this object from each thread?
I am using NetBeans.
For the first question I guess in your situation it's best to use a dynamically adjusting thread pool like:
ExecutorService exec = Executors.newCachedThreadPool();
Creates a thread pool that creates new threads as needed, but will
reuse previously constructed threads when they are available. These
pools will typically improve the performance of programs that execute
many short-lived asynchronous tasks. Calls to execute will reuse
previously constructed threads if available. If no existing thread is
available, a new thread will be created and added to the pool. Threads
that have not been used for sixty seconds are terminated and removed
from the cache. Thus, a pool that remains idle for long enough will
not consume any resources.
For the second question, you can create a constructor and pass objects that way:
class ThreadTask implements Runnable {
    private Object obj;
    public ThreadTask(Object obj) {
        this.obj = obj;
    }
    public void run() {
    }
}
public static void main(String[] args) {
    Object obj = new Object();
    exec.submit(new ThreadTask(obj));
}
You're definitely going to want concurrency with a web crawler :)
And you're probably going to want to set up a thread pool so that you can reuse threads and not pay the cost of instantiating new threads with each task.
The thread pool options that you have are a FixedThreadPool and a CachedThreadPool. The benefits of each of these are explained in detail in the Java Concurrency Tutorial. The big drawback of the CachedThreadPool is that there's no limit on how many threads can be created; in the event that a very large number of threads are added to the pool, you might see some significant performance degradation or timeouts (if you have a socket timeout defined).
In either case, the best practice for setting up thread pools is through java.util.concurrent.Executors
It's just a matter of creating an ExecutorService by calling one of the following:
ExecutorService threadPool = Executors.newCachedThreadPool();
ExecutorService threadPool = Executors.newFixedThreadPool(500);
Once you have the threadpool, you can either invoke a single runnable (which doesn't return a response) or a callable (which does) by using the submit() method.
You can also run .invokeAll() if you're using callables to generate futures:
futures = threadPool.invokeAll(tasks,
        timeout,
        TimeUnit.MILLISECONDS);
And then get the results:
for (Future<?> f : futures) {
    someList.add(f.get());
}
If you want multiple threads to be able to modify the same object, you'll either need to use the synchronized keyword in the setters or use thread-safe data types.
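For the thread-safe data type route, a crawler's set of visited URLs is a typical shared object. The sketch below is illustrative only: SharedVisitedSet, runDemo and the example.com URLs are made up, and ConcurrentHashMap.newKeySet() is just one suitable thread-safe set. Every worker tries to claim the same URLs, and add() returns true for exactly one of them per URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedVisitedSet {

    /** Lets `threads` workers race to claim the same `urls` URLs; returns how many claims succeeded. */
    public static int runDemo(int threads, int urls) {
        // The shared object: handed to every task, safe for concurrent add().
        Set<String> visited = ConcurrentHashMap.newKeySet();
        AtomicInteger claimed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> workers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                workers.add(() -> {
                    for (int i = 0; i < urls; i++) {
                        // add() is atomic: only the first thread to add a URL gets true
                        if (visited.add("http://example.com/page" + i)) {
                            claimed.incrementAndGet();
                        }
                    }
                    return null;
                });
            }
            pool.invokeAll(workers); // blocks until all workers are done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return claimed.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(8, 1000)); // each URL is claimed exactly once
    }
}
```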
Hope this helps. Good luck!!
There is no single specific answer to this, but you can study the following:
For the 1st point, study ExecutorService and ThreadPoolExecutor.
For the 2nd point, study Callable and Future.

Executor in java

I was trying to run an ExecutorService created with newFixedThreadPool and I ran into problems.
I expected the program to finish in nanoseconds, but it hung. I found that I need to use a Semaphore along with it so that the items in the queue do not pile up.
Is there any way to find out whether all the threads of the pool are in use?
Basic code ...
static ExecutorService pool = Executors.newFixedThreadPool(4);
static Semaphore permits = new Semaphore(4);
try {
    permits.acquire();
    pool.execute(p); // Assuming p is runnable on large number of objects
    permits.release();
} catch ( InterruptedException ex ) {
}
This code hangs and I really don't know why. How can I tell whether the pool is currently waiting for all the threads to finish?
By default, if you submit more than 4 tasks to your pool then the extra tasks will be queued until a thread becomes available.
The blog you referenced in your comment uses the semaphore to limit the amount of work that can be queued at once, which won't be a problem for you until you have many thousands of tasks queued up and they start eating into the available memory. There's an easier way to do this, anyway - construct a ThreadPoolExecutor with a bounded queue.* But this isn't your problem.
If you want to know when a task completes, notice that ExecutorService.submit() returns a Future object which can be used to wait for the task's completion:
Future<?> f = pool.submit(p);
f.get();
System.out.println("task complete");
If you have several tasks and want to wait for all of them to complete, either store each Future in a list and then call get() on each in turn, or investigate ExecutorService.invokeAll() (which essentially does the same but in a single method call).
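A small sketch of the invokeAll() route, using made-up tasks that square a number (WaitForAll and sumOfSquares are hypothetical names). invokeAll() returns only after every task has finished, so each get() below returns immediately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WaitForAll {

    /** Computes 1^2 + 2^2 + ... + n^2, one pool task per term. */
    public static int sumOfSquares(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int x = i;
                tasks.add(() -> x * x);
            }
            int sum = 0;
            // invokeAll blocks until all tasks are complete
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                sum += f.get();
            }
            return sum;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown(); // let the worker threads exit so the JVM can terminate
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10)); // prints 385
    }
}
```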
You can also tell whether a task has completed or not:
Future<?> f = pool.submit(p);
while (!f.isDone()) {
    // do something else, task not complete
}
f.get();
Finally, note that even if your tasks are complete, your program may not exit (and thus appears to "hang") if you haven't called shutdown() on the thread pool; the reason is that the threads are still running, waiting to be given more work to do.
*Edit: sorry, I just re-read my answer and realised this part is incorrect - ThreadPoolExecutor offers tasks to the queue and rejects them if they aren't accepted, so a bounded queue has different semantics to the semaphore approach.
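The rejection behaviour mentioned in the edit can be observed with a deliberately tiny pool. This is only a sketch under assumed sizes (one thread, queue capacity one); BoundedQueueDemo and thirdTaskRejected are hypothetical names. The first task occupies the thread, the second fills the queue, and the third is rejected by the default handler instead of making the caller wait.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedQueueDemo {

    /** Returns true if the third submission is rejected once thread and queue are both full. */
    public static boolean thirdTaskRejected() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(1));
        CountDownLatch release = new CountDownLatch(1);
        // Each task blocks until the latch is released, keeping the pool saturated.
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(blocker); // runs on the single thread
            pool.execute(blocker); // waits in the queue
            try {
                pool.execute(blocker); // no thread, no queue slot left
                return false;
            } catch (RejectedExecutionException expected) {
                return true;
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("third task rejected: " + thirdTaskRejected());
    }
}
```

If rejection is not acceptable, passing a ThreadPoolExecutor.CallerRunsPolicy as the rejection handler makes the submitting thread run the task itself, which slows the producer down in a way that is closer to what the semaphore achieves.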
You do not need the Semaphore.
If it hangs, it is probably because the threads are blocked on a lock somewhere else.
Run the code in a debugger and, when it hangs, pause it and see what the threads are doing.
You could change to using a ThreadPoolExecutor. It contains a getActiveCount() method which returns an approximate count of the active threads. Why it is approximate I'm not sure.
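A minimal sketch of checking whether all pool threads are busy with getActiveCount(). The pool is constructed directly as a ThreadPoolExecutor so no cast is needed; ActiveCountDemo and activeWhenSaturated are hypothetical names. The latches exist only to hold the tasks in place while the count is read.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ActiveCountDemo {

    /** Saturates a pool of `threads` threads and returns getActiveCount() at that moment. */
    public static int activeWhenSaturated(int threads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(threads);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                started.countDown();       // signal that this task is running
                try {
                    release.await();       // stay busy until the count has been read
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            started.await();               // every thread is now inside a task
            return pool.getActiveCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int active = activeWhenSaturated(4);
        System.out.println("all threads busy: " + (active == 4));
    }
}
```

In a live application the value is only a snapshot, since tasks may start or finish while it is being computed.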
