Run parallel tasks in a long-running application - Java

I am building a long-running application, modeled as a service in a service-oriented architecture. Call it 'serviceA'. Whenever an API call is made to it, it performs an activity, call it 'activityA'.
activityA has an activity handler that has to perform 'n' tasks in parallel, after which it consolidates the results and returns them to the client that called the serviceA API.
I am planning to use the ExecutorService to achieve this parallelism.
There are 2 ways to go ahead with this:
1. Create the ExecutorService in singleton scope and have it as an attribute of the activity handler, so the same ExecutorService object is available throughout the lifetime of the service. When a new request comes in, the handler uses this ExecutorService object to submit the parallel tasks, then waits on the Future objects for a certain timeout. After all the parallel tasks complete, it consolidates and returns the activityA response.
2. Create a new ExecutorService object in the activity handler every time a request to activityA is received. Submit the parallel tasks to this object, wait on the Future results for a certain timeout, consolidate the results, call shutdown on the ExecutorService object, and return the activityA API response.
Which of the 2 approaches above should be followed? The major difference between the two is the lifetime of the ExecutorService object.
The service is expected to be called at a volume of ~15k transactions per second, in case that data helps with deciding between the 2 approaches.
The advantage of the 1st approach is that we will not have the overhead of creating and shutting down new ExecutorService objects and threads. But what happens when there is no Future result by the timeout? Does the thread automatically shut down? Is it available for any new request that comes to the ExecutorService thread pool? Or will it be in some waiting state and eat up memory, in which case we need to do something manually (and what)?
Also, is the timeout we pass to future.get() measured from the time we make the get call, or from the time we submitted the task to the executor service?
Please also let me know if either of the 2 ways is the obvious approach to this problem.
Thanks.

The first way looks like the obvious and correct way to solve this problem, especially with the given transaction volume. You certainly don't want to be restarting threads for every request.
The Future.get timeout doesn't affect the executing thread. It will continue to run the task until the task either completes or throws an exception; until then, that thread won't accept new tasks (but other threads in the same executor will). In this case you may want to cancel the task explicitly by invoking Future.cancel(true) to free the thread for new tasks. This requires the task itself to respond properly to interruption (instead of looping forever, for example, or waiting blocked on I/O). However, this would be the same for any threading approach, since interruption is the only safe way to terminate a thread anyway. To mitigate this issue you could use a dynamic pool of threads with a maximum number of running threads greater than n. This will allow new tasks to be processed while the stuck tasks are in the process of terminating.
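For illustration, here is a minimal sketch of the first approach; the class and method names (ActivityAHandler, handle) are invented for this example and are not from the question. A singleton pool is shared across requests, each request waits against a deadline, and stragglers are cancelled:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ActivityAHandler {
    // Singleton pool shared across all requests; size it for roughly n tasks per request at peak load.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(64);

    public List<String> handle(List<Callable<String>> tasks, long timeoutMillis) throws InterruptedException {
        List<Future<String>> futures = new ArrayList<>();
        for (Callable<String> task : tasks) {
            futures.add(POOL.submit(task));
        }

        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> results = new ArrayList<>();
        for (Future<String> future : futures) {
            long remaining = Math.max(0, deadline - System.currentTimeMillis());
            try {
                // The timeout here counts from this get() call, not from submission.
                results.add(future.get(remaining, TimeUnit.MILLISECONDS));
            } catch (TimeoutException | ExecutionException e) {
                // Free the worker for other requests; the task must respond to interruption.
                future.cancel(true);
            }
        }
        return results; // consolidate as needed
    }
}

At ~15k transactions per second the pool size is the main tuning knob: too small and requests queue up, while cancelled-but-uninterruptible tasks will temporarily occupy workers.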
The timeout is measured from the time you call get(), not from when the task was submitted.

Multiple CompletionService for one thread pool Java

I'm working on a Java server application with the following general architecture:
Clients make RPC requests to the server
The RPC server (gRPC) I believe has its own thread pool for handling requests
Requests are immediately inserted into Thread Pool 1 for more processing
A specific request type, which we'll call Request R, needs to run a few asynchronous tasks in parallel, judging the results to form a consensus that it will return to the client. These tasks are a bit longer running, so I use a separate Thread Pool 2 to handle them. Importantly, each Request R will need to run the same 2-3 asynchronous tasks. Thread Pool 2 therefore services ALL currently executing Request R's. However, a Request R should only be able to see and retrieve the asynchronous tasks that belong to it.
To achieve this, for every incoming Request R, while it's in Thread Pool 1, the handler will create a new CompletionService for the request, backed by Thread Pool 2. It will submit the 2-3 async tasks and retrieve the results. These should be strictly isolated from anything else that might be running in Thread Pool 2 on behalf of other requests.
My questions:
Firstly, is Java's CompletionService isolated? I couldn't find good documentation on this after checking the JavaDocs. In other words, if two or more CompletionServices are backed by the same thread pool, are any of them at risk of pulling a future belonging to another CompletionService?
Secondly, is it bad practice to be creating this many CompletionServices, one for each request? Is there a better way to handle this? Of course it would be a bad idea to create a new thread pool for each request, so is there a more canonical/correct way to isolate futures within a CompletionService, or is what I'm doing okay?
Thanks in advance for the help. Any pointers to helpful documentation or examples would be greatly appreciated.
Code, for reference, although trivial:
public static final ExecutorService THREAD_POOL_2 =
        new ThreadPoolExecutor(16, 64, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

// Gets created to handle a RequestR; RequestRHandler is run in Thread Pool 1
public class RequestRHandler {
    CompletionService<String> cs;

    RequestRHandler() {
        cs = new ExecutorCompletionService<>(THREAD_POOL_2);
    }

    String execute() throws InterruptedException, ExecutionException {
        cs.submit(asyncTask1);
        cs.submit(asyncTask2);
        cs.submit(asyncTask3);
        // Let's say asyncTask3 completes first
        Future<String> asyncTask3Result = cs.take();
        // asyncTask3's result indicates asyncTask1 & asyncTask2 results don't matter, cancel them
        // without checking their results.
        // Cancels all futures; I track all futures submitted within this request and cancel them,
        // so it shouldn't affect any other requests in the TP 2 pool
        cancelAllFutures(cs);
        return asyncTask3Result.get();
    }
}
Firstly, is Java's CompletionService isolated?
That's not guaranteed, as CompletionService is an interface, so the implementation decides that. But as the only implementation in the JDK is ExecutorCompletionService, I'd just say the answer is: yes. Every instance of ExecutorCompletionService internally has a BlockingQueue where the finished tasks are queued. Actually, when you call take on the service, it just passes the call through to the queue by calling take on it. Every submitted task is wrapped in another object, which puts the task into the queue when it is finished. So each instance manages its submitted tasks in isolation from other instances.
Secondly, is this bad practice to be creating this many CompletionServices for each request?
I'd say it's okay. A CompletionService is nothing but a rather thin wrapper around an executor. You have to live with the "overhead" (the internal BlockingQueue and the wrapper instances for the tasks), but it's small and you are probably gaining far more from it than it costs. One could ask whether you need one for just 2 to 3 tasks, but that depends on the tasks. At this point it's a question of whether a CompletionService is worth it in general, so that's up to you to decide, as it's out of scope of your question.
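To see that isolation in action, here is a small self-contained sketch (the class name and task values are made up for illustration): two ExecutorCompletionService instances share one pool, yet each take() only returns futures submitted through that same instance.

import java.util.concurrent.*;

public class CompletionServiceIsolationDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Two completion services backed by the same pool; each has its own internal completion queue.
        CompletionService<String> csA = new ExecutorCompletionService<>(pool);
        CompletionService<String> csB = new ExecutorCompletionService<>(pool);

        csA.submit(() -> "A1");
        csA.submit(() -> "A2");
        csB.submit(() -> "B1");

        // Only results submitted through csA can come out of csA.take().
        System.out.println(csA.take().get()); // "A1" or "A2"
        System.out.println(csA.take().get()); // the other of the two
        System.out.println(csB.take().get()); // "B1"

        pool.shutdown();
    }
}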

Do I need to clean up Thread objects in Java?

In my Java application I have a Runnable such as:
this.runner = new Runnable() {
    @Override
    public void run() {
        // do something that takes roughly 5 seconds.
    }
};
I need to run this roughly every 30 seconds (although this can vary) in a separate thread. The nature of the code is such that I can run it and forget about it (whether it succeeds or fails). I do this as follows as a single line of code in my application:
(new Thread(this.runner)).start()
Now, this works fine. However, I'm wondering if there is any sort of cleanup I should be doing on each of the thread instances after they finish running? I am doing CPU profiling of this application in VisualVM and I can see that, over the course of 1 hour runtime, a lot of threads are being created. Is this concern valid or is everything OK?
N.B. The reason I start a new Thread instead of simply defining this.runner as a Thread, is that I sometimes need to run this.runner twice simultaneously (before the first run call has finished), and I can't do that if I defined this.runner as a Thread since a single Thread object can only be run again once the initial execution has finished.
Java objects that need to be "cleaned up" or "closed" after use conventionally implement the AutoCloseable interface. This makes it easy to do the clean up using try-with-resources. The Thread class does not implement AutoCloseable, and has no "close" or "dispose" method. So, you do not need to do any explicit clean up.
However
(new Thread(this.runner)).start()
is not guaranteed to immediately start computation of the Runnable. You might not care whether it succeeds or fails, but I guess you do care whether it runs at all. And you might want to limit the number of these tasks running concurrently. You might want only one to run at once, for example. So you might want to join() the thread (or, perhaps, join with a timeout). Joining the thread will ensure that the thread completes its computation. Joining the thread with a timeout increases the chance that the thread starts its computation (because the current thread will be suspended, freeing a CPU that might run the other thread).
However, creating multiple threads to perform regular or frequent tasks is not recommended. You should instead submit tasks to a thread pool. That will enable you to control the maximum amount of concurrency, and can provide you with other benefits (such as prioritising different tasks), and amortises the expense of creating threads.
You can configure a thread pool to use a fixed length (bounded) task queue and to cause submitting threads to execute submitted tasks themselves when the queue is full. By doing that you can guarantee that tasks submitted to the thread pool are (eventually) executed. The documentation of ThreadPoolExecutor.execute(Runnable) says it
Executes the given task sometime in the future
which suggests that the implementation guarantees that it will eventually run all submitted tasks even if you do not take those specific measures to ensure submitted tasks are executed.
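For example, a sketch of such a pool wrapped around the runner from the question; the pool sizes and queue length below are illustrative, not recommendations:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RunnerScheduler {
    private final Runnable runner;

    // Bounded queue + CallerRunsPolicy: when the queue is full, the submitting
    // thread runs the task itself, so submitted tasks are never silently dropped.
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 4,                 // core and maximum pool size
            30, TimeUnit.SECONDS, // keep-alive for threads above the core size
            new ArrayBlockingQueue<>(10),
            new ThreadPoolExecutor.CallerRunsPolicy());

    public RunnerScheduler(Runnable runner) {
        this.runner = runner;
    }

    // Call this roughly every 30 seconds instead of (new Thread(this.runner)).start()
    public void runOnce() {
        pool.execute(runner);
    }
}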
I recommend you look at the Concurrency API. There are numerous pre-defined classes and methods for general use. With an ExecutorService you can call the shutdown method after submitting tasks; the executor then stops accepting new tasks, finishes executing the previously submitted tasks, and then terminates.
For a short introduction:
https://www.baeldung.com/java-executor-service-tutorial

Shutting down ExecutorService in a Spring Boot REST API

I am building a Spring Boot REST API application deployed on WebLogic 12c.
One of my requirements is to run some long running tasks on every incoming request.
An incoming REST request could result in multiple asynchronous task executions.
Since I don't care about the response, nor about any exceptions that will result from these tasks, I chose to use the ExecutorService with plain Runnables and not Callable or CompletableFuture.
ExecutorService executorService =
        Executors.newFixedThreadPool(2, new CustomizableThreadFactory("-abc-"));
Then, for the incoming request that I receive in the controller, I run two for loops and submit those tasks to the ExecutorService:
for (final String orderId : orderIds) {
    for (final String itemId : itemIds) {
        executorService.execute(new Runnable() {
            public void run() {
                try {
                    // call database operation
                } catch (Throwable t) {
                    logger.error("EXCEPTION with {} , {}", orderId, itemId);
                }
            }
        });
    } // for
} // for
My question is regarding shutting down the ExecutorService.
I am aware of the graceful shutdown (shutdown), the hybrid shutdown (awaitTermination), and the abrupt shutdown (shutdownNow).
What would be the preferred approach among the three for a REST API application?
Also, is there any limit on how many thread pools can be created, given that the number of ExecutorService thread pools created will be driven by the number of incoming requests?
We currently have similar requirements; this is a difficult problem to solve, as you want to use the right hammer, if you will. There are very heavyweight solutions for orchestrating long running processes, for example Spring Batch.
Firstly though, don't bother stopping and starting the ExecutorService. The whole point of that class is to take the burden of thread management off your hands, so you don't need to create and stop threads yourself. So you don't need to manage the manager.
But be careful with your approach. Without using queues or another load-balancing technique to smartly balance the long running processes across instances in your app, or without managing what happens when a thread dies, you may get into a world of trouble. In general I would say that nowadays it doesn't make much sense to interact directly with threads or thread pools, and to use higher-level solutions for this type of problem.
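One way to follow that advice in a Spring Boot application is to declare a single long-lived pool as a bean and let the container close it on shutdown. A rough sketch, reusing the thread factory from the question; the bean name and pool size are arbitrary:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.CustomizableThreadFactory;

@Configuration
public class ExecutorConfig {

    // One pool for the lifetime of the application; Spring calls shutdown()
    // on it when the context closes, because of destroyMethod.
    @Bean(destroyMethod = "shutdown")
    public ExecutorService taskExecutor() {
        return Executors.newFixedThreadPool(2, new CustomizableThreadFactory("-abc-"));
    }
}

The controller (or a service it delegates to) then injects this ExecutorService instead of constructing a new one per request.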
Shutting down with awaitTermination is usually a bit safer, while shutdownNow is more forceful. (Note that awaitTermination only blocks until tasks finish after a shutdown request; it does not itself stop the executor.) It's usually a good idea to use awaitTermination in a functional method, or even a Runnable, if you would like the executor to shut down as soon as possible, but only after it has completed everything it was created to do; in other words, when there are no active tasks that the executor is executing.
Ex.) (RxJava-style sketch)
ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
Observable.fromIterable(items).observeOn(Schedulers.from(executor)).flatMap(item -> {
    ... // this block represents a task that the executor will execute in a worker thread
}).subscribe(onNext ->
        logItem(onNext), throwable ->
        throwable.printStackTrace(), /* onComplete */ () ->
        executor.awaitTermination(60, TimeUnit.SECONDS)
);
... // you need to shutdown asap because these other methods below are also doing some computation/io-intensive stuff
Now, when the stream completes, awaitTermination is called, which returns immediately if the pool has already terminated, or waits up to 60 seconds for in-flight tasks to finish (pair it with shutdown, since awaitTermination itself only waits).
Worker threads above the core pool size cease to be active after the pool's keep-alive period of inactivity, which is commonly 60 seconds.
On the other hand, if you want tasks to stop executing as soon as (to give some examples) an exception is thrown, there was a breach in security, or another module/service has failed, you might want to use shutdownNow() to stop all tasks immediately without the option of waiting.
My advice for choosing between the two would be to use shutdownNow in your catch block if you do not want tasks to continue to be executed when there is an exception - i.e., there is no longer a reason to return the list of items to the client given that one of the items did not get added to the list.
Otherwise, I'd recommend using awaitTermination after your try-catch, set to one minute, to safely shut down the thread pool as soon as it has executed all the tasks you have given it. But only do that if you know the executor will not be responsible for executing any more tasks down the line.
The simple shutdown, if that is an option for you, is also a good method. shutdown rejects all incoming tasks but lets already submitted tasks finish executing, according to the Oracle docs.
If you're not sure when you need to close the executor, it might be a good idea to use a @PreDestroy method, so that the executor is shut down just before the destroy method is called on your bean:
@PreDestroy
private void cleanup() {
    executor.shutdown();
}
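If you also want a bounded wait before forcing termination, the standard shutdown sequence from the ExecutorService Javadoc can be folded into that @PreDestroy method. A sketch, assuming executor is the same field as above and picking an arbitrary 60-second timeout:

import java.util.concurrent.TimeUnit;
import javax.annotation.PreDestroy; // jakarta.annotation.PreDestroy on Spring Boot 3+

// inside the bean that owns the executor field
@PreDestroy
private void cleanup() {
    executor.shutdown();                 // stop accepting new tasks
    try {
        if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
            executor.shutdownNow();      // cancel tasks still running after the timeout
        }
    } catch (InterruptedException e) {
        executor.shutdownNow();
        Thread.currentThread().interrupt();
    }
}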

thread pool - make a new one per task, detect when a set of tasks is done

I am running concurrent tasks via ThreadPoolExecutors. Since I have 2-3 sets of tasks to do, for now I have a map of ThreadPoolExecutors and can send a set of tasks to one of them.
Now I want to know when a pool has completed all tasks assigned to it. The way it's organized, I know the list of tasks beforehand, so I send them to a newly constructed pool, then plan to start polling/tracking to know when all are done.
One way would be to have another pool with 1-2 threads that polls the other pools to know when their queues are empty. If a few scans show them as empty (with a one-second sleep between polls), assume they are done.
Another way would be to subclass ThreadPoolExecutor, keep track via the queue, and override afterExecute(Runnable r, Throwable t) so I can know exactly when each task is done; good for showing status and knowing when all are complete if everything is moving smoothly.
Is there an implementation of the second somewhere? It would be good to have an interface that listeners can implement and then add themselves to via the subclassed executor.
I am also looking for an implementation that will:
Ask a pool to shut down within a timeout,
If after the timeout the shutdown is not complete, call shutdownNow(),
And if this fails, get the thread factory and stop all threads in its group (this assumes that we set the factory and it uses a group or some other way to get a reference to all its threads).
Basically, as sure a way as we can get to clean up a pool, so that we can have this running in an app container. Some of the tasks call Selenium etc., so there can be hung threads.
The last resort would be to restart the container (Tomcat/JBoss), but we want that to be the last resort.
The question is: does anyone know of an open source implementation of this, or any code to start off with?
For your first question, you can use an ExecutorCompletionService. It will add all completed tasks into a queue, so with its blocking queue you can wait until all tasks have arrived at the queue.
Or create a subclass of FutureTask and override its done method to define the “after execute” action. Then submit instances of this class, wrapping your jobs, to the executor.
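A rough sketch of that second idea; the listener interface and class name are made up for illustration:

import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;

public class NotifyingFutureTask<V> extends FutureTask<V> {

    public interface CompletionListener {
        void onTaskDone(FutureTask<?> task);
    }

    private final CompletionListener listener;

    public NotifyingFutureTask(Callable<V> callable, CompletionListener listener) {
        super(callable);
        this.listener = listener;
    }

    @Override
    protected void done() {
        // Invoked when the task completes, fails, or is cancelled.
        listener.onTaskDone(this);
    }
}

// Usage: executor.execute(new NotifyingFutureTask<>(job, listener));
// Using execute (not submit) keeps the executor from wrapping the task in its own FutureTask.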
The second question has a straightforward solution. “shut down within a time out, and if after a time out the shut down is not complete then call shutdownNow()”:
executor.shutdown();
if (!executor.awaitTermination(timeout, timeUnit)) {
    executor.shutdownNow();
}
Stopping threads is something you shouldn’t do (Thread.stop is deprecated for a good reason). But you may invoke cancel(true) on your jobs. That could accelerate the termination if your tasks support interruption.
By the way, it looks very unnatural to me to have multiple ThreadPoolExecutors and to play around with shutting them down, instead of simply having one ThreadPoolExecutor for all jobs and letting that ThreadPoolExecutor manage the life cycle of all threads. That’s what the ThreadPoolExecutor is made for.
