How to manage ThreadPoolTaskExecutor to block some requests - java

We have an advanced HR system with a service that calculates employee attendance and leaves. It must behave as follows: when calculating for employees X, Y and Z, it opens 3 threads and calculates in parallel. But if a request to calculate data for employee X arrives before the previous calculation for X has ended, it must be postponed until the thread calculating data for employee X finishes.
ScheduleWeekAttendanceBean scheduleWeekAttendanceBean = null;
ThreadPoolExecutor threadPoolExecutor = employeeThreadPoolExecutorMap.get(employmentBean.id);
if (threadPoolExecutor == null || threadPoolExecutor.isTerminating() || threadPoolExecutor.isTerminated()) {
threadPoolExecutor = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(), new RejectedExecutionHandlerImpl());
employeeThreadPoolExecutorMap.put(employmentBean.id, threadPoolExecutor);
ThreadPoolTaskExecutorMonitorService threadPoolTaskExecutorMonitorService = new ThreadPoolTaskExecutorMonitorService(threadPoolExecutor, "#" + employmentBean.employeeId);
Thread thread = new Thread(threadPoolTaskExecutorMonitorService);
thread.start();
}
AttendanceBuilder attendanceBuilder = (AttendanceBuilder) AppContext.getBean("attendanceBuilder");
attendanceBuilder.initialize(employmentBean, selectedDate);
Future<ScheduleWeekAttendanceBean> future = threadPoolExecutor.submit(attendanceBuilder);
scheduleWeekAttendanceBean = future.get();
if (threadPoolExecutor.getActiveCount() == 0) {
employeeThreadPoolExecutorMap.remove(employmentBean.id);
threadPoolExecutor.shutdownNow();
}
return scheduleWeekAttendanceBean;
What happens here is that it processes the requests one by one. I need to implement this logic so that a request blocks only if the same employee already exists in the map.

Creating a separate ThreadPoolExecutor for each employee id is costly. Each ThreadPoolExecutor owns a Thread, which consumes a lot of memory, and this may lead to a fatal OutOfMemoryError. So I suggest using SerialExecutor instead, which is described in the documentation of java.util.concurrent.Executor. A SerialExecutor does not contain a thread; instead, it uses an external Executor. You can create a single Executor for all SerialExecutors and tune its configuration (number of threads). Since a SerialExecutor is small, you can keep all of them in the employeeThreadPoolExecutorMap permanently.
Another approach is to use Actors instead of Executors. An actor can be considered as a specialized Executor, designed to process specific tasks (messages). You can use Akka actors, or my Simple Actor.
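The SerialExecutor suggestion could be sketched roughly as follows. The SerialExecutor class is adapted from the example in the Executor Javadoc; PerEmployeeScheduler and PerEmployeeDemo are hypothetical names introduced only for illustration, and the employee id is assumed to be a long. Jobs for the same employee run strictly one after another, while jobs for different employees share the pool and run in parallel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Runs tasks one at a time, in submission order, on a shared backing executor
// (adapted from the SerialExecutor example in the Executor Javadoc).
class SerialExecutor implements Executor {
    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private final Executor backing;
    private Runnable active;

    SerialExecutor(Executor backing) { this.backing = backing; }

    @Override
    public synchronized void execute(Runnable r) {
        tasks.add(() -> {
            try { r.run(); } finally { scheduleNext(); }
        });
        if (active == null) scheduleNext();
    }

    private synchronized void scheduleNext() {
        if ((active = tasks.poll()) != null) backing.execute(active);
    }
}

// Hypothetical helper: one SerialExecutor per employee id, all sharing one pool.
class PerEmployeeScheduler {
    private final Map<Long, SerialExecutor> perEmployee = new ConcurrentHashMap<>();
    private final Executor shared;

    PerEmployeeScheduler(Executor shared) { this.shared = shared; }

    <T> Future<T> submit(long employeeId, Callable<T> job) {
        FutureTask<T> task = new FutureTask<>(job);
        perEmployee.computeIfAbsent(employeeId, id -> new SerialExecutor(shared)).execute(task);
        return task;
    }
}

class PerEmployeeDemo {
    // Submits three jobs for employee 1 and returns the order in which they ran.
    static List<Integer> runOrder() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            PerEmployeeScheduler scheduler = new PerEmployeeScheduler(pool);
            List<Integer> order = Collections.synchronizedList(new ArrayList<>());
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                final int n = i;
                futures.add(scheduler.submit(1L, () -> { order.add(n); return n; }));
            }
            for (Future<Integer> f : futures) f.get(); // wait for all three jobs
            return order;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In the original code, the attendanceBuilder would be the Callable passed to submit, and the caller would block on the returned Future just as it does today.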

Related

Is re-starting a Thread better than creating a new one?

I'm wondering whether there is any advantage to keeping the same threads over the course of the execution of an object, rather than instantiating new Thread objects each time. I have an object for which a single (frequently used) method is parallelized using local Thread variables, such that every time the method is called, new Threads (and Runnables) are instantiated. Because the method is called so frequently, a single execution may instantiate upwards of a hundred thousand Thread objects, even though there are never more than a few (~4-6) active at any given time.
Following is a cut down example of how this method is currently implemented, to give a sense of what I mean. For reference, n is of course the pre-determined number of threads to use, whereas this.dataStructure is a (thread-safe) Map which serves as the input to the computation, as well as being modified by the computation. There are other inputs involved, but as they are not relevant to this question, I've omitted their usage. I've also omitted exception handling for the same reason.
Runnable[] tasks = new Runnable[n];
Thread[] threads = new Thread[n];
ArrayBlockingQueue<MyObject> inputs = new ArrayBlockingQueue<>(this.dataStructure.size());
inputs.addAll(this.dataStructure.values());
for (int i = 0; i < n; i++) {
tasks[i] = () -> {
while (true) {
MyObject input = inputs.poll(1L, TimeUnit.MICROSECONDS);
if (input == null) return;
// run computations over this.dataStructure
}
};
threads[i] = new Thread(tasks[i]);
threads[i].start();
}
for (int i = 0; i < n; i++)
threads[i].join();
Because these Threads (and their runnables) always execute the same way using a single ArrayBlockingQueue as input, an alternative to this would be to just "refill the queue" every time the method is called and just re-start the same Threads. This is easily implemented, but I'm unsure as to whether it would make any difference one way or the other. I'm not too familiar with concurrency, so any help is appreciated.
PS.: If there is a more elegant way to handle the polling, that would also be helpful.
It is not possible to start a Thread more than once, but conceptually, the answer to your question is yes.
This is normally accomplished with a thread pool. A thread pool is a set of Threads which rarely actually terminate. Instead, an application passes its task to the thread pool, which picks a Thread in which to run it. The thread pool then decides whether the Thread should be terminated or reused after the task completes.
Java has some classes which make use of thread pools quite easy: ExecutorService and CompletableFuture.
ExecutorService usage typically looks like this:
ExecutorService executor = Executors.newCachedThreadPool();
for (int i = 0; i < n; i++) {
tasks[i] = () -> {
while (true) {
MyObject input = inputs.poll(1L, TimeUnit.MICROSECONDS);
if (input == null) return;
// run computations over this.dataStructure
}
};
executor.submit(tasks[i]);
}
// Doesn't interrupt or halt any tasks. Will wait for them all to finish
// before terminating its threads.
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
Executors has other methods which can create thread pools, like newFixedThreadPool() and newWorkStealingPool(). You can decide for yourself which one best suits your needs.
CompletableFuture use might look like this:
Runnable[] tasks = new Runnable[n];
CompletableFuture<?>[] futures = new CompletableFuture<?>[n];
for (int i = 0; i < n; i++) {
tasks[i] = () -> {
while (true) {
MyObject input = inputs.poll(1L, TimeUnit.MICROSECONDS);
if (input == null) return;
// run computations over this.dataStructure
}
};
futures[i] = CompletableFuture.runAsync(tasks[i]);
}
CompletableFuture.allOf(futures).get();
The disadvantage of CompletableFuture is that the tasks cannot be canceled or interrupted. (Calling cancel will mark the task as completing with an exception instead of completing successfully, but the task will not be interrupted.)
By definition, you cannot restart a thread. According to the documentation:
It is never legal to start a thread more than once. In particular, a thread may not be restarted once it has completed execution.
Nevertheless a thread is a valuable resource, and there are implementations to reuse threads. Have a look at the Java Tutorial about Executors.
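To make the reuse idea concrete, here is a minimal sketch in which the pool is created once and every call to the frequently used method borrows its threads. ReusedPoolWorker and its summing "computation" are hypothetical stand-ins for the asker's object and work. Because the queue is fully populated before the workers start, a plain non-blocking poll() is enough, which also avoids the timed poll from the question.

```java
import java.util.Collection;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical worker class: the pool is created once and reused by every call.
class ReusedPoolWorker {
    private final int n;
    private final ExecutorService pool;

    ReusedPoolWorker(int n) {
        this.n = n;
        this.pool = Executors.newFixedThreadPool(n);
    }

    // Stand-in for the real computation: sums the inputs using n pooled workers.
    long process(Collection<Integer> values) {
        Queue<Integer> inputs = new ConcurrentLinkedQueue<>(values);
        AtomicLong sum = new AtomicLong();
        CountDownLatch done = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            pool.execute(() -> {
                try {
                    Integer input;
                    // The queue is filled up front, so null means no work is left.
                    while ((input = inputs.poll()) != null) sum.addAndGet(input);
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // wait for this call's workers only; the pool stays alive
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum.get();
    }

    void close() { pool.shutdown(); }
}
```

The pool must be shut down explicitly (here via close()) when the owning object is no longer needed, otherwise its idle threads keep the JVM alive.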

Getting data created in one thread with another thread

I am supposed to create 2 threads. One reads data from a file and creates objects of class Merchandise. The file itself consists of over 10,000 lines, each of the form:
IdOfMerchandise Weight
The first thread creates Merchandise objects line by line and prints a message after every 200 objects. The problem I have is that I need a second thread, working at the same time as the first one, that gets these objects and sums up the overall weight, writing a report after every 100 added.
How can I use the second thread to get object data at the same time as the objects are created in the other thread? Is using a HashMap a good idea for storing newly created class objects with 2 variables?
When you pass data from one thread to another thread, you need a thread-safe data structure, and HashMap is not thread-safe. For thread-safe collections in Java, look at the java.util.concurrent package. One of the simplest ways to implement the producer-consumer pattern is with a LinkedBlockingQueue.
Here is a complete example with two threads, one producing objects, the other one consuming and printing something every 100 objects:
AtomicBoolean finished = new AtomicBoolean(false);
LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
Thread thread1 = new Thread(() -> {
for (int i = 0; i < 10000; i++) {
String createdObject = Integer.toString(i);
queue.offer(createdObject);
}
finished.set(true);
});
Thread thread2 = new Thread(() -> {
int count = 0;
while (!finished.get() || !queue.isEmpty()) {
try {
String object = queue.poll(100, TimeUnit.MILLISECONDS);
// poll() returns null on timeout, so only count real items
if (object != null && count++ % 100 == 0) {
System.out.println(object);
}
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
});
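thread1.start(); thread2.start(); // start() runs each body in its own thread; run() would execute them sequentially in the caller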
thread1.join(); thread2.join();
You may notice one thing - apart from the produced items, the threads also need to exchange other information - when the producer is finished. Again, you cannot safely exchange this information without synchronization. You can use AtomicBoolean as in the example, or a volatile field.
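Applied to the Merchandise task from the question, the same idea might look like the sketch below. It swaps the AtomicBoolean flag for a different technique, a sentinel object (often called a "poison pill") that the producer enqueues last, so the consumer can block on take() instead of polling with a timeout. The Merchandise fields, the WeightPipeline class, and the generated data (instead of file parsing) are assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class Merchandise {
    final String id;
    final double weight;
    Merchandise(String id, double weight) { this.id = id; this.weight = weight; }
}

class WeightPipeline {
    // Sentinel ("poison pill") that the producer enqueues last to signal completion.
    private static final Merchandise END = new Merchandise("END", 0);

    // Produces `count` items of weight 1.0 and returns the total seen by the consumer.
    static double run(int count) {
        BlockingQueue<Merchandise> queue = new LinkedBlockingQueue<>();
        double[] total = new double[1];

        Thread producer = new Thread(() -> {
            for (int i = 1; i <= count; i++) {
                queue.add(new Merchandise("id" + i, 1.0)); // real code would parse a file line
                if (i % 200 == 0) System.out.println("created " + i + " objects");
            }
            queue.add(END);
        });

        Thread consumer = new Thread(() -> {
            try {
                int seen = 0;
                for (Merchandise m = queue.take(); m != END; m = queue.take()) {
                    total[0] += m.weight;
                    if (++seen % 100 == 0) System.out.println("summed " + seen + ", total " + total[0]);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // join() makes the consumer's writes to total visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total[0];
    }
}
```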
This looks like a producer-consumer problem. The official Java Tutorial section on Guarded Blocks will help you understand and implement the concept.
The basic idea is that the consumer cannot consume until the producer has produced something.

How to use preexisting runnables, limiting the number of runnables to create?

Problem Statement:
I have 5000 ids that point to rows in a database [could be more than 5000].
Each Runnable retrieves a row from the database for a given id and performs some time-consuming tasks.
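import java.util.Properties;
import java.util.concurrent.Callable;

public class BORunnable implements Callable<Properties> {
    private String branchID;

    public BORunnable(String branchID) {
        this.branchID = branchID;
    }

    public void setBranchId(String branchID) {
        this.branchID = branchID;
    }

    @Override
    public Properties call() {
        Properties propObj = new Properties();
        // Get the branchID
        // Do some time consuming tasks. Merely takes 1 sec to complete
        return propObj;
    }
}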
I am going to submit these runnables to the executor service.
For that, I need to create and submit 5000 or even more runnables to the executor service. Creating this many runnables could, in my environment, throw an out-of-memory error.
[given that 5000 is just an example]
So I came up with an approach; I would be thankful for any alternative suggestions:
Created a thread pool of fixed size 10.
int corePoolSize = 10;
ThreadPoolExecutor executor = new ThreadPoolExecutor(corePoolSize,
corePoolSize + 5, 10, TimeUnit.SECONDS,
new LinkedBlockingQueue<Runnable>());
Collection<Future<Properties>> futuresCollection =
new LinkedList<Future<Properties>>();
Added all of the branchIDs to the branchIdQueue
Queue<String> branchIdQueue = new LinkedList<String>();
Collections.addAll(branchIdQueue, branchIDs);
I am trying to reuse runnables, so I create a batch of them.
Now I want this number of elements to be dequeued, creating a runnable for each:
int noOfElementsToDequeue = Math.min(corePoolSize, branchIdQueue.size());
ArrayList<BORunnable>runnablesList = dequeueAndSubmitRunnable(
branchIdQueue,noOfElementsToDequeue);
ArrayList<BORunnable> dequeueAndSubmitRunnable(Queue<String> branchIdQueue,
int noOfElementsToDequeue){
ArrayList<BORunnable> runnablesList= new ArrayList<BORunnable>();
for (int i = 0; i < noOfElementsToDequeue; i++) {
//Create this number of runnables
runnablesList.add(new BORunnable(branchIdQueue.remove()));
}
return runnablesList;
}
Submitting the retrieved runnables to the executor
for(BORunnable boRunnableObj:runnablesList){
futuresCollection.add(executor.submit(boRunnableObj));
}
If the queue is empty, I created the runnables I needed. if it's not, I want to reuse the runnable and submit to the executor.
Here I get number of runnables to be reused = the total count - current active count
[Approximate is enough for me]
int coreSize=executor.getCorePoolSize();
while(!branchIdQueue.isEmpty()){
//Total size - current active count
int runnablesToBeReused=coreSize-executor.getActiveCount();
if(runnablesToBeReused!=0){
ArrayList<String> branchIDsTobeReset = removeElementsFromQueue(
branchIdQueue,runnablesToBeReused);
ArrayList<BORunnable> boRunnableToBeReusedList =
getBORunnableToBeReused(boRunnableList,runnablesToBeReused);
for(BORunnable aRunnable:boRunnableList){
//aRunnable.set(branchIDSTobeRest.get(0));
}
}
}
My problem is:
I am not able to find out which Runnable has been released by the thread pool so that I could reuse it for submission.
Hence, I randomly take a few runnables and try to set the branchId, but then a race condition may occur. [I don't want to use volatile.]
Reusing the Runnables makes no sense as the problem is not the cost of creating or freeing the runnable instances. These come almost for free in Java.
What you want to do is limit the number of pending jobs, which is easy to achieve: just provide a limit to the queue you are passing to the executor service. That’s as easy as passing an int value (the limit) to the LinkedBlockingQueue’s constructor. Note that you can also use an ArrayBlockingQueue, as a LinkedBlockingQueue does not provide an advantage for bounded queue usage.
Once the queue has a limit, the executor will reject new jobs whenever the queue is full and all threads are busy. The only thing left to do is to provide an appropriate RejectedExecutionHandler to the executor. For example, CallerRunsPolicy is sufficient: the submitting thread runs the rejected job itself, which keeps it from creating more jobs while the threads are all busy and the queue is full.
After execution, the Runnables are subject to garbage collection.
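A minimal sketch of this configuration is shown below. The pool size of 10 comes from the question; the queue capacity of 20, the BoundedSubmission class, and the trivial job body are assumptions for illustration. Note that the list of Futures still grows with the number of ids, so if the results are large it may be worth consuming them as they complete rather than collecting them all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedSubmission {
    // Submits one job per id and returns how many jobs completed.
    static int processAll(List<String> branchIds) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                10, 10, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(20),                 // at most 20 pending jobs
                new ThreadPoolExecutor.CallerRunsPolicy());   // when full, the submitter runs the job itself
        List<Future<String>> futures = new ArrayList<>();
        try {
            for (String id : branchIds) {
                // Stand-in for the real BORunnable work.
                futures.add(executor.submit(() -> "processed " + id));
            }
            int completed = 0;
            for (Future<String> f : futures) {
                f.get(); // propagates any failure from the job
                completed++;
            }
            return completed;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```

With this setup, at most 10 running plus 20 queued job objects exist at any moment, regardless of how many ids there are in total.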

Recursive concurrency

I have the following function, in pseudo-code:
Result calc(Data data) {
if (data.isFinal()) {
return new Result(data); // This is the actual lengthy calculation
} else {
List<Result> results = new ArrayList<Result>();
for (int i=0; i<data.numOfSubTasks(); ++i) {
results.add(calc(data.subTask(i)));
}
return new Result(results); // merge all results in to a single result
}
}
I want to parallelize it, using a fixed number of threads.
My first attempt was:
ExecutorService executorService = Executors.newFixedThreadPool(numOfThreads);
Result calc(Data data) {
if (data.isFinal()) {
return new Result(data); // This is the actual lengthy calculation
} else {
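// results is written from several threads, so it must be thread-safe
List<Result> results = Collections.synchronizedList(new ArrayList<Result>());
List<Callable<Void>> callables = new ArrayList<Callable<Void>>();
for (int i=0; i<data.numOfSubTasks(); ++i) {
final int index = i; // anonymous classes can only capture effectively final locals
callables.add(new Callable<Void>() {
public Void call() {
results.add(calc(data.subTask(index)));
return null;
}
});
}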
executorService.invokeAll(callables); // wait for all sub-tasks to complete
return new Result(results); // merge all results in to a single result
}
}
However, this quickly got stuck in a deadlock, because, while the top recursion level waits for all threads to finish, the inner levels also wait for threads to become available...
How can I efficiently parallelize my program without deadlocks?
Your problem is a general design problem when using ThreadPoolExecutor for tasks with dependencies.
I see two options:
1) Make sure to submit tasks in a bottom-up order, so that you never have a running task that depends on a task which didn't start yet.
2) Use the "direct handoff" strategy (See ThreadPoolExecutor documentation):
ThreadPoolExecutor executor = new ThreadPoolExecutor(poolSize, poolSize, 0, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
executor.setRejectedExecutionHandler(new CallerRunsPolicy());
The idea is using a synchronous queue so that tasks never wait in a real queue. The rejection handler takes care of tasks which don't have an available thread to run on. With this particular handler, the submitter thread runs the rejected tasks.
This executor configuration guarantees that tasks are never rejected, and that you never have deadlocks due to inter-task dependencies.
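As a rough illustration of option 2, the sketch below runs a recursive computation on such an executor. HandoffRecursion is a hypothetical stand-in for the asker's Data/Result types: it simply counts the leaves of a full tree with a given depth and fan-out. Every submitted sub-task is either handed to an idle worker or executed immediately by the thread that submitted it, so no task waits in a queue behind a blocked parent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class HandoffRecursion {
    // Hypothetical stand-in for Data/Result: returns 1 for each leaf of a full
    // tree with the given depth and fan-out, and sums the children otherwise.
    static long calc(ExecutorService executor, int depth, int fanOut) throws Exception {
        if (depth == 0) return 1; // the "lengthy calculation" would go here
        List<Callable<Long>> subTasks = new ArrayList<>();
        for (int i = 0; i < fanOut; i++) {
            subTasks.add(() -> calc(executor, depth - 1, fanOut));
        }
        long sum = 0;
        // invokeAll blocks until all sub-tasks are done; rejected ones run in this thread.
        for (Future<Long> f : executor.invokeAll(subTasks)) sum += f.get();
        return sum;
    }

    static long countLeaves(int poolSize, int depth, int fanOut) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                poolSize, poolSize, 0, TimeUnit.SECONDS,
                new SynchronousQueue<>(),                    // no real queue: hand off or reject
                new ThreadPoolExecutor.CallerRunsPolicy());  // rejected tasks run in the submitter
        try {
            return calc(executor, depth, fanOut);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```

One trade-off of this design is that recursion depth on a single thread can grow when the pool is saturated, since the caller executes its own sub-tasks on its own stack.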
You should split your approach into two phases:
create the whole tree down to the point where data.isFinal() == true
recursively collect the results (only possible if the merging does not trigger other operations/calls)
To do that, you can use Futures to make the results asynchronous. This means all results of calc will be of type Future<Result>.
Immediately returning a Future will free the current thread and give space for the processing of others. With the collection of the Results (new Result(results)) you should wait for all results to be ready (ScatterGather-Pattern, you can use a semaphore to wait for all results). The collection itself will be walking a tree and checking (or waiting for the results to arrive) will happen in a single thread.
Overall you build a tree of Futures, that is used to collect the results and perform only the "expensive" operations in the threadpool.
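One possible way to realize such a tree of futures is with CompletableFuture, which is a substitution on my part: the answer above describes plain Futures and a semaphore, whereas this sketch lets allOf and thenApply do the gathering. FutureTree and the leaf-counting computation are hypothetical stand-ins for the asker's types. The tree is built without blocking, and the merge callbacks run only after all children have completed, so no pool thread ever waits on a child.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FutureTree {
    // Builds the tree of futures without blocking any thread.
    static CompletableFuture<Long> calcAsync(ExecutorService pool, int depth, int fanOut) {
        if (depth == 0) {
            return CompletableFuture.supplyAsync(() -> 1L, pool); // the expensive leaf work
        }
        List<CompletableFuture<Long>> children = new ArrayList<>();
        for (int i = 0; i < fanOut; i++) {
            children.add(calcAsync(pool, depth - 1, fanOut));
        }
        // Merge step: runs once every child has completed, so join() returns immediately.
        return CompletableFuture.allOf(children.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> children.stream().mapToLong(CompletableFuture::join).sum());
    }

    static long countLeaves(int poolSize, int depth, int fanOut) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            return calcAsync(pool, depth, fanOut).join(); // only the caller blocks, once, at the root
        } finally {
            pool.shutdown();
        }
    }
}
```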

Is adding tasks to BlockingQueue of ThreadPoolExecutor advisable?

The JavaDoc for ThreadPoolExecutor is unclear on whether it is acceptable to add tasks directly to the BlockingQueue backing the executor. The docs say calling executor.getQueue() is "intended primarily for debugging and monitoring".
I'm constructing a ThreadPoolExecutor with my own BlockingQueue. I retain a reference to the queue so I can add tasks to it directly. The same queue is returned by getQueue() so I assume the admonition in getQueue() applies to a reference to the backing queue acquired through my means.
Example
General pattern of the code is:
int n = ...; // number of threads
queue = new ArrayBlockingQueue<Runnable>(queueSize);
executor = new ThreadPoolExecutor(n, n, 1, TimeUnit.HOURS, queue);
executor.prestartAllCoreThreads();
// ...
while (...) {
Runnable job = ...;
queue.offer(job, 1, TimeUnit.HOURS);
}
while (jobsOutstanding.get() != 0) {
try {
Thread.sleep(...);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
executor.shutdownNow();
queue.offer() vs executor.execute()
As I understand it, the typical use is to add tasks via executor.execute(). The approach in my example above has the benefit of blocking on the queue whereas execute() fails immediately if the queue is full and rejects my task. I also like that submitting jobs interacts with a blocking queue; this feels more "pure" producer-consumer to me.
An implication of adding tasks to the queue directly: I must call prestartAllCoreThreads() otherwise no worker threads are running. Assuming no other interactions with the executor, nothing will be monitoring the queue (examination of ThreadPoolExecutor source confirms this). This also implies for direct enqueuing that the ThreadPoolExecutor must additionally be configured for > 0 core threads and mustn't be configured to allow core threads to timeout.
tl;dr
Given a ThreadPoolExecutor configured as follows:
core threads > 0
core threads aren't allowed to timeout
core threads are prestarted
hold a reference to the BlockingQueue backing the executor
Is it acceptable to add tasks directly to the queue instead of calling executor.execute()?
Related
This question ( producer/consumer work queues ) is similar, but doesn't specifically cover adding to the queue directly.
One trick is to implement a custom subclass of ArrayBlockingQueue and to override the offer() method to call your blocking version, then you can still use the normal code path.
queue = new ArrayBlockingQueue<Runnable>(queueSize) {
@Override public boolean offer(Runnable runnable) {
try {
return offer(runnable, 1, TimeUnit.HOURS);
} catch(InterruptedException e) {
// return interrupt status to caller
Thread.currentThread().interrupt();
}
return false;
}
};
(As you can probably guess, I think calling offer directly on the queue as your normal code path is probably a bad idea.)
If it were me, I would prefer using Executor#execute() over Queue#offer(), simply because I'm using everything else from java.util.concurrent already.
Your question is a good one, and it piqued my interest, so I took a look at the source for ThreadPoolExecutor#execute():
public void execute(Runnable command) {
if (command == null)
throw new NullPointerException();
if (poolSize >= corePoolSize || !addIfUnderCorePoolSize(command)) {
if (runState == RUNNING && workQueue.offer(command)) {
if (runState != RUNNING || poolSize == 0)
ensureQueuedTaskHandled(command);
}
else if (!addIfUnderMaximumPoolSize(command))
reject(command); // is shutdown or saturated
}
}
We can see that execute itself calls offer() on the work queue, but not before doing some nice, tasty pool manipulations if necessary. For that reason, I'd think that it'd be advisable to use execute(); not using it may (although I don't know for certain) cause the pool to operate in a non-optimal way. However, I don't think that using offer() will break the executor - it looks like tasks are pulled off the queue using the following (also from ThreadPoolExecutor):
Runnable getTask() {
for (;;) {
try {
int state = runState;
if (state > SHUTDOWN)
return null;
Runnable r;
if (state == SHUTDOWN) // Help drain queue
r = workQueue.poll();
else if (poolSize > corePoolSize || allowCoreThreadTimeOut)
r = workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS);
else
r = workQueue.take();
if (r != null)
return r;
if (workerCanExit()) {
if (runState >= SHUTDOWN) // Wake up others
interruptIdleWorkers();
return null;
}
// Else retry
} catch (InterruptedException ie) {
// On interruption, re-check runState
}
}
}
This getTask() method is just called from within a loop, so if the executor is not shutting down, it blocks until a new task is added to the queue (regardless of where it came from).
Note: Even though I've posted code snippets from source here, we can't rely on them for a definitive answer - we should only be coding to the API. We don't know how the implementation of execute() will change over time.
One can actually configure behavior of the pool when the queue is full, by specifying a RejectedExecutionHandler at instantiation. ThreadPoolExecutor defines four policies as inner classes, including AbortPolicy, DiscardOldestPolicy, DiscardPolicy, as well as my personal favorite, CallerRunsPolicy, which runs the new job in the controlling thread.
For example:
ThreadPoolExecutor threadPool = new ThreadPoolExecutor(
nproc, // core size
nproc, // max size
60, // idle timeout
TimeUnit.SECONDS,
new ArrayBlockingQueue<Runnable>(4096, true), // fair = true: blocked producer/consumer threads are served in FIFO order
new ThreadPoolExecutor.CallerRunsPolicy() ); // If we have to reject a task, run it in the calling thread.
The behavior desired in the question can be obtained using something like:
public class BlockingPolicy implements RejectedExecutionHandler {
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        try {
            executor.getQueue().put(r); // Self contained, no queue reference needed.
        } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
At some point the queue must be accessed. The best place to do so is in a self-contained RejectedExecutionHandler, which avoids any code duplication or potential bugs arising from direct manipulation of the queue at the scope of the pool object. Note that the handlers included in ThreadPoolExecutor themselves use getQueue().
It's a very important question if the queue you're using is a completely different implementation from the standard in-memory LinkedBlockingQueue or ArrayBlockingQueue.
For instance if you're implementing the producer-consumer pattern using several producers on different machines, and use a queuing mechanism based on a separate persistence subsystem (like Redis), then the question becomes relevant on its own, even if you don't want a blocking offer() like the OP.
So the point that prestartAllCoreThreads() has to be called (or prestartCoreThread() enough times) for the worker threads to be available and running is important enough to be stressed.
If required, we can also use a parking lot that separates main processing from rejected tasks:
final CountDownLatch taskCounter = new CountDownLatch(TASKCOUNT);
final List<Runnable> taskParking = new LinkedList<Runnable>();
BlockingQueue<Runnable> taskPool = new ArrayBlockingQueue<Runnable>(1);
RejectedExecutionHandler rejectionHandler = new RejectedExecutionHandler() {
@Override
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
System.err.println(Thread.currentThread().getName() + " -->rejection reported - adding to parking lot " + r);
taskCounter.countDown();
taskParking.add(r);
}
};
ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(5, 10, 1000, TimeUnit.SECONDS, taskPool, rejectionHandler);
for(int i=0 ; i<TASKCOUNT; i++){
//main
threadPoolExecutor.submit(getRandomTask());
}
taskCounter.await(TASKCOUNT * 5 , TimeUnit.SECONDS);
System.out.println("Checking the parking lot..." + taskParking);
while(taskParking.size() > 0){
Runnable r = taskParking.remove(0);
System.out.println("Running from parking lot..." + r);
if(taskParking.size() > LIMIT){
waitForSometime(...);
}
threadPoolExecutor.submit(r);
}
threadPoolExecutor.shutdown();
