Threadpool with persistent worker instances - java

I'm trying to queue up tasks in a thread pool to be executed as soon as a worker becomes free. I have found various examples of this, but in all cases the examples have been set up to use a new Worker instance for each job; I want persistent workers.
I'm trying to make an FTP backup tool. I have it working, but because of the limitations of a single connection it is slow. What I ideally want is a single connection for scanning directories and building up a file list, then four workers to download said files.
Here is an example of my FTP worker:
public class Worker implements Runnable {
protected FTPClient _ftp;
// Connection details
protected String _host = "";
protected String _user = "";
protected String _pass = "";
// worker status
protected boolean _working = false;
public Worker(String host, String user, String pass) {
this._host = host;
this._user = user;
this._pass = pass;
}
// Check if the worker is in use
public boolean inUse() {
return this._working;
}
@Override
public void run() {
this._ftp = new FTPClient();
this._connect();
}
// Download a file from the ftp server
public boolean download(String base, String path, String file) {
this._working = true;
boolean outcome = true;
//create directory if not exists
File pathDir = new File(base + path);
if (!pathDir.exists()) {
pathDir.mkdirs();
}
//download file
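// try-with-resources closes the stream even if the transfer fails
try (OutputStream output = new FileOutputStream(base + path + file)) {
// retrieveFile returns false when the server rejects the transfer
outcome = this._ftp.retrieveFile(file, output);
} catch (Exception e) {
outcome = false;
} finally {
this._working = false;
}
// return outside finally so that exceptions are not silently swallowed
return outcome;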
}
// Connect to the server
protected boolean _connect() {
try {
this._ftp.connect(this._host);
this._ftp.login(this._user, this._pass);
} catch (Exception e) {
return false;
}
return this._ftp.isConnected();
}
// Disconnect from the server
protected void _disconnect() {
try {
this._ftp.disconnect();
} catch (Exception e) { /* do nothing */ }
}
}
I want to be able to call Worker.download(...) for each task in a queue whenever a worker becomes available without having to create a new connection to the ftp server for each download.
Any help would be appreciated as I've never used threads before and I'm going round in circles at the moment.

"the examples have been set up to use a new Worker instance for each job; I want persistent workers."
This is a common question with a couple of different solutions. What you want is some context per thread, as opposed to per Runnable or Callable that would be submitted to an ExecutorService.
One option would be to have a ThreadLocal which would create your FTP instances. This is not optimal because there would be no easy way to shut down the FTP connection when the thread is terminated. You would then limit the number of connections by limiting the number of threads running in your thread pool.
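A minimal sketch of the ThreadLocal option, assuming a hypothetical FakeConnection class in place of the real FTPClient (the class, the counter, and the run helper are illustrative names, not part of the original code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadLocalConnectionDemo {
    // Hypothetical stand-in for an FTP connection; counts how many were opened.
    static final class FakeConnection {
        static final AtomicInteger opened = new AtomicInteger();
        final int id = opened.incrementAndGet();
    }

    // Each pool thread lazily creates its own connection on first use.
    static final ThreadLocal<FakeConnection> CONNECTION =
            ThreadLocal.withInitial(FakeConnection::new);

    // Runs `tasks` jobs on `threads` pool threads; returns how many connections were created.
    static int run(int threads, int tasks) {
        FakeConnection.opened.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> CONNECTION.get()); // reuses this thread's connection
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return FakeConnection.opened.get();
    }
}
```

However many tasks are submitted, the number of connections never exceeds the number of pool threads. The sketch also shows the drawback mentioned above: nothing in it gets a chance to close a connection when its thread ends.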
I think a better solution would be to use the ExecutorService only to fork your worker threads. For each worker, inject into them a BlockingQueue that they all use to dequeue and perform the tasks they need to do. This is separate from the queue used internally by the ExecutorService. You would then add the tasks to your queue and not to the ExecutorService itself.
// ArrayBlockingQueue has no no-arg constructor; the capacity of 100 is an arbitrary example
private static final BlockingQueue<FtpTask> taskQueue
= new ArrayBlockingQueue<FtpTask>(100);
So your task object would have something like:
public static class FtpTask {
String base;
String path;
String file;
}
Then the run() method in your Worker class would do something like:
public void run() {
// make our permanent ftp instance
this._ftp = new FTPClient();
// connect it for the life of this thread
this._connect();
try {
// loop getting tasks until we are interrupted
// could also use volatile boolean !shutdown
while (!Thread.currentThread().isInterrupted()) {
FtpTask task = taskQueue.take();
// if you are using a poison pill
if (task == SHUTDOWN_TASK) {
break;
}
// do the download here
download(task.base, task.path, task.file);
}
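} catch (InterruptedException e) {
// take() was interrupted: restore the flag and fall through to disconnect
Thread.currentThread().interrupt();
} finally {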
this._disconnect();
}
}
Again, you limit the number of connections by limiting the number of threads running in your thread-pool.
"What I ideally want is a single connection for scanning directories and building up a file list, then four workers to download said files."
I would have an Executors.newFixedThreadPool(5) and add one thread which does the scanning/building and 4 worker threads which do the downloading. The scanning thread would put to the BlockingQueue while the worker threads take from the same queue.
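The arrangement above (one scanner, several persistent workers, one shared queue, a poison pill to stop) can be sketched as follows. This is a sketch under assumptions: plain strings stand in for FtpTask, adding to a list stands in for download(...), and PersistentWorkerDemo, SHUTDOWN, and run are illustrative names.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PersistentWorkerDemo {
    // Sentinel ("poison pill") telling a worker to stop; compared by identity.
    static final String SHUTDOWN = new String("SHUTDOWN");

    // Feeds `files` to `workers` long-lived workers; returns the names they handled.
    static List<String> run(List<String> files, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> downloaded = new CopyOnWriteArrayList<>();
        // one extra thread for the scanner; a smaller pool would leave it stuck behind the workers
        ExecutorService pool = Executors.newFixedThreadPool(workers + 1);

        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                // a real worker would connect once here and keep the connection
                try {
                    while (true) {
                        String file = queue.take();
                        if (file == SHUTDOWN) {
                            break;
                        }
                        downloaded.add(file); // placeholder for download(...)
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // ...and disconnect once here
            });
        }

        // scanner: the single producer; one pill per worker after the last file
        pool.execute(() -> {
            queue.addAll(files);
            for (int i = 0; i < workers; i++) {
                queue.add(SHUTDOWN);
            }
        });

        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return downloaded;
    }
}
```

Because every worker consumes exactly one pill, the scanner has to enqueue as many pills as there are workers.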

I would suggest going for a ThreadPoolExecutor with the core pool size and maximum pool size set as per your requirements. Also use a LinkedBlockingQueue in this case, which will hold your tasks in FIFO order.
As soon as a thread (worker) becomes free, the next task will be picked from the queue and executed.
Check out the details of ThreadPoolExecutor. Let me know in case you get stuck anywhere in the implementation.
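A minimal sketch of that configuration, assuming four threads and a trivial counting task (FixedPoolDemo and run are illustrative names). Note that this variant reuses threads, but on its own it gives the tasks no per-worker state such as an open connection.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedPoolDemo {
    // Runs `tasks` trivial jobs and returns how many completed.
    static int run(int tasks) {
        // core == max: with an unbounded queue the pool never grows past the core size
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet); // queued FIFO until a thread is free
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```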

Related

Thread-safe FIFO queue with unique items and thread pool

I have to manage scheduled file replications in a system. The file replications are scheduled by users and I need to restrict the amount of system resources used during replication. The amount of time that each replication may take is not defined (i.e. a replication may be scheduled to run every 15 minutes and the previous run may still be running when the next run is due) and a replication should not be queued if it's already queued or running.
I have a scheduler that periodically checks for due file replications and, for each one, either (1) adds it to a blocking queue if it is neither queued nor running or (2) drops it otherwise.
private final Object scheduledReplicationsLock = new Object();
private final BlockingQueue<Replication> replicationQueue = new LinkedBlockingQueue<>();
private final Set<Long> queuedReplicationIds = new HashSet<>();
private final Set<Long> runningReplicationIds = new HashSet<>();
public boolean add(Replication replication) {
synchronized (scheduledReplicationsLock) {
// If the replication job is either still executing or is already queued, do not add it.
if (queuedReplicationIds.contains(replication.id) || runningReplicationIds.contains(replication.id)) {
return false;
}
replicationQueue.add(replication);
queuedReplicationIds.add(replication.id);
return true;
}
}
I also have a pool of threads that waits until there is a replication in the queue and executes it. Below is the main method of each thread in the thread pool:
public void run() {
while (true) {
Replication replication = null;
synchronized (scheduledReplicationsLock) {
// This will block until a replication job is ready to be run or the current thread is interrupted.
replication = replicationQueue.take();
// Move the ID value out of the queued set and into the active set
Long replicationId = replication.getId();
queuedReplicationIds.remove(replicationId);
runningReplicationIds.add(replicationId);
}
executeReplication(replication);
}
}
This code gets into a deadlock because the first thread in the thread pool acquires scheduledReplicationsLock and then blocks in take() while holding it, which prevents the scheduler from adding replications to the queue. Moving replicationQueue.take() out of the synchronized block will eliminate the deadlock, but then it's possible that an element is removed from the queue and the hash sets are not atomically updated with it, which could cause a replication to be incorrectly dropped.
Should I use BlockingQueue.poll() and release the lock + sleep if the queue is empty instead of using BlockingQueue.take() ?
Fixes to the current solution or other solutions that meet the requirements are welcome.
wait / notify
Keeping your same control flow, instead of blocking on the BlockingQueue instance while holding the mutex lock, you can wait on notifications for the scheduledReplicationsLock forcing the worker thread to release the lock and return to the waiting pool.
Here is a reduced sample of your producer:
private final LinkedList<Replication> replicationQueue = new LinkedList<>();
private final Set<Long> runningReplicationIds = new HashSet<>();
public boolean add(Replication replication) {
synchronized (replicationQueue) {
// If the replication job is either still executing or is already queued, do not add it.
if (replicationQueue.contains(replication) || runningReplicationIds.contains(replication.id)) {
return false;
} else {
replicationQueue.add(replication);
replicationQueue.notifyAll();
return true;
}
}
}
The worker Runnable would then be updated as follows:
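public void run() {
try {
while (true) {
Replication replication;
synchronized (replicationQueue) {
// wait() must be called on the monitor we hold; loop to guard against spurious wakeups
while (replicationQueue.isEmpty()) {
replicationQueue.wait();
}
replication = replicationQueue.remove(0);
runningReplicationIds.add(replication.getId());
}
try {
// run outside the lock so the producer is not blocked meanwhile
executeReplication(replication);
} finally {
synchronized (replicationQueue) {
runningReplicationIds.remove(replication.getId());
}
}
}
} catch (InterruptedException e) {
// interrupted while waiting: restore the flag and exit
Thread.currentThread().interrupt();
}
}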
BlockingQueue
Generally you are better off using the BlockingQueue to coordinate your producer and replicating worker pool.
The BlockingQueue is, as the name implies, blocking by nature and will cause the calling thread to block only if items cannot be pulled / pushed from / to the queue.
Note, however, that you will have to rethink your running / enqueued state management, because you will be synchronizing only on the contents of the BlockingQueue and dropping the other constraints. Whether that is acceptable depends on the context.
This way, you would drop all the other mutexes and rely on the BlockingQueue as your synchronization state:
private final BlockingQueue<Replication> replicationQueue = new LinkedBlockingQueue<>();
public boolean add(Replication replication) throws InterruptedException {
// not sure this is the proper invariant to check: once a replication has been taken from the queue it is no longer "queued", so it could be added again while it is still being processed
// also note that contains() followed by put() is not one atomic step
if (replicationQueue.contains(replication)) {
return false;
}
// use `put` instead of `add`: on a bounded queue it blocks waiting for free space
replicationQueue.put(replication);
return true;
}
The workers will then take indefinitely from the BlockingQueue:
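public void run() {
try {
while (true) {
Replication replication = replicationQueue.take();
executeReplication(replication);
}
} catch (InterruptedException e) {
// take() was interrupted: restore the flag and let the worker exit
Thread.currentThread().interrupt();
}
}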
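One way to keep the "neither queued nor running" invariant without an outer lock is to track pending ids in a concurrent set whose add() is an atomic test-and-set. This is a different technique from the code above, sketched under assumptions: ids are plain long values, and DedupingReplicationQueue, pending, and done are illustrative names.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class DedupingReplicationQueue {
    // ids that are either queued or currently running
    private final Set<Long> pending = ConcurrentHashMap.newKeySet();
    private final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();

    // Returns false if the id is already queued or running.
    boolean add(long id) {
        if (!pending.add(id)) { // atomic: only one caller can win for a given id
            return false;
        }
        queue.add(id);          // unbounded queue, so this does not fail for capacity
        return true;
    }

    // Blocks for the next id; the id stays in `pending` while it runs.
    long take() throws InterruptedException {
        return queue.take();
    }

    // Must be called when execution ends, ideally from a finally block.
    void done(long id) {
        pending.remove(id);
    }
}
```

A worker would loop on take(), run the replication, and call done(id) in a finally block. No lock is held while a worker is blocked in take(), so the scheduler is never locked out.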
You do not need any additional synchronized block if you are using a BlockingQueue.
Quote from docs (https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html)
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control.
Just use something like this:
public void run() {
try {
while (true) {
// the thread waits here for the next element in the queue
Replication replication = replicationQueue.take();
Long replicationId = replication.getId();
// assumes both id sets are thread-safe, e.g. created with ConcurrentHashMap.newKeySet()
queuedReplicationIds.remove(replicationId);
runningReplicationIds.add(replicationId);
executeReplication(replication);
}
} catch (InterruptedException ex) {
// interrupted while waiting for the next element
Thread.currentThread().interrupt();
}
}
See the Javadoc: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/LinkedBlockingQueue.html#take()
Or you can use BlockingQueue.poll() with a timeout.
UPD: After discussion, I extended LinkedBlockingQueue with two ConcurrentHashSets and added an afterTake() method to remove processed replications. You do not need any additional synchronization outside the queue. Just put the replication in from the first thread, take it in another, and call afterTake() when the replication has finished. You will need to override the other insertion and removal methods if you want to use them.
package ru.everytag;
import io.vertx.core.impl.ConcurrentHashSet;
import java.util.concurrent.LinkedBlockingQueue;
public class TwoPhaseBlockingQueue<E> extends LinkedBlockingQueue<E> {
private ConcurrentHashSet<E> items = new ConcurrentHashSet<>();
private ConcurrentHashSet<E> taken = new ConcurrentHashSet<>();
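@Override
public void put(E e) throws InterruptedException {
// add() returns false if the element is already tracked, so check and insert are one step
if (items.add(e)) {
super.put(e);
}
}
@Override
public E take() throws InterruptedException {
// delegate to the superclass; calling take() here would recurse forever
E item = super.take();
taken.add(item);
items.remove(item);
return item;
}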
public void afterTake(E e) {
if (taken.contains(e)) {
taken.remove(e);
} else if (items.contains(e)) {
throw new IllegalArgumentException("Element still in the queue");
}
}
}

Shutting Down Executor Service

I am working on an application, where I continuously read data from a Kafka topic. This data comes in String format which I then write to an xml file & store it on hard disk. Now, this data comes randomly and mostly it's supposed to come in bulk, in quick succession.
To write these files, I am using an Executor Service.
ExecutorService executor = Executors.newFixedThreadPool(4);
/*
called multiple times in quick succession
*/
public void writeContent(String message) {
try {
executor.execute(new FileWriterTask(message));
} catch(Exception e) {
executor.shutdownNow();
e.printStackTrace();
}
}
private class FileWriterTask implements Runnable{
String data;
FileWriterTask(String content){
this.data = content;
}
@Override
public void run() {
try {
String fileName = UUID.randomUUID().toString();
File file = new File("custom path" + fileName + ".xml");
FileUtils.writeStringToFile(file, data, Charset.forName("UTF-8"));
} catch (IOException e) {
e.printStackTrace();
}
}
}
I want to know when I should shut down my executor service. If my application were time-bound, I would have used awaitTermination on my executor instance, but my app is supposed to run continuously.
If in case of any exception, my whole app is killed, would it automatically shutdown my executor?
Or should I catch an unchecked exception and shutdown my executor, as I have done above in my code?
Can I choose not to explicitly shutdown my executor in my code? What are my options?
EDIT: Since my class was a @RestController class, I used the following way to shut down my executor service:
@PreDestroy
private void destroy() {
// check for null before calling into the executor, not after
if (executor != null) {
executor.shutdownNow();
System.out.println("executor.isShutdown() = " + executor.isShutdown());
System.out.println("executor.isTerminated() = " + executor.isTerminated());
}
}
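It is good practice to shut down your ExecutorService. There are two types of shutdown that you should be aware of: shutdown(), which stops accepting new tasks but lets already-submitted tasks finish, and shutdownNow(), which additionally attempts to stop the running tasks (typically by interrupting them) and returns the tasks that never started.
If you're running your application on an application server with Java EE, you can also use a ManagedExecutorService, which is managed by the framework and will be shut down automatically.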
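The two calls are commonly combined into a two-phase shutdown, along the lines of the pattern shown in the ExecutorService Javadoc. A minimal sketch, where ExecutorShutdown, shutdownGracefully, and the timeout parameter are illustrative names:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class ExecutorShutdown {
    // Returns true if the executor terminated within the allowed time.
    static boolean shutdownGracefully(ExecutorService executor, long timeoutSeconds) {
        executor.shutdown(); // phase 1: reject new tasks, let submitted ones finish
        try {
            if (!executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                executor.shutdownNow(); // phase 2: interrupt running tasks, drop queued ones
                return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        }
    }
}
```

In a long-running service, a helper like this could be called from a lifecycle hook such as a @PreDestroy method. The threads created by Executors.newFixedThreadPool are non-daemon by default, so a pool that is never shut down can keep the JVM alive after the rest of the application has finished.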

How to create pool of clients which can handle just one task at once

My application starts a couple of clients which communicate with Steam. There are two types of task I can ask of a client. The first is one where I don't care about blocking, for example asking a client about its friends. The second is a task of which I can submit only one to a client at a time, and I need to wait until the client has finished it asynchronously. I am not sure whether there is already a design pattern for this, but you can see what I have tried below. When I ask for the second kind of task, I remove the client from the queue and return it once the task is done. I don't know if this is a good solution, though, because I can 'lose' clients if I do something wrong.
@Component
public class SteamClientWrapper {
private Queue<DotaClientImpl> clients = new LinkedList<>();
private final Object clientLock = new Object();
public SteamClientWrapper() {
}
@PostConstruct
public void init() {
// starting clients here clients.add();
}
public DotaClientImpl getClient() {
return getClient(false);
}
public DotaClientImpl getClient(boolean freeLast) {
// use the same lock as postClient(); locking on a different object gives no mutual exclusion
synchronized (clientLock) {
if (!clients.isEmpty()) {
return freeLast ? clients.poll() : clients.peek();
}
}
return null;
}
public void postClient(DotaClientImpl client) {
if (client == null) {
return;
}
synchronized (clientLock) {
clients.offer(client);
clientLock.notify();
}
}
public void doSomethingBlocking() {
DotaClientImpl client = getClient(true);
client.doSomething();
}
}
Sounds like you could use Spring's ThreadPoolTaskExecutor to do that.
An Executor is basically what you tried to do - store tasks in a queue and process the next as soon the previous has finished.
Often this is used to run tasks in parallel, but it can also reduce overhead for serial processing.
A sample doing it this way would be on
https://dzone.com/articles/spring-and-threads-taskexecutor
To ensure only one client task runs at a time, simply set the configuration to
executor.setCorePoolSize(1);
executor.setMaxPoolSize(1);
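The same idea can be sketched with the plain JDK by giving each client its own single-threaded executor, so exclusive tasks for that client run one at a time and the client never has to be removed from a shared queue. SerialClient and its methods are hypothetical names, and this uses Executors.newSingleThreadExecutor rather than Spring's ThreadPoolTaskExecutor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical wrapper: serialises all exclusive tasks for one client.
class SerialClient<C> {
    private final C client;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    SerialClient(C client) {
        this.client = client;
    }

    // Queues the task; it runs only after earlier tasks for this client have finished.
    <R> Future<R> submit(Function<C, R> task) {
        return executor.submit(() -> task.apply(client));
    }

    // Lets queued tasks finish, then releases the worker thread.
    void close() {
        executor.shutdown();
    }
}
```

The caller waits on the returned Future when it needs the result. Since the client is never handed out, it cannot be 'lost' by a caller that forgets to return it.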

Is this a safe way to generate new threads and terminate them?

I have an application that receives alerts from other applications, usually once a minute or so but I need to be able to handle higher volume per minute. The interface I am using, and the Alert framework in general, requires that alerts may be processed asynchronously and can be stopped if they are being processed asynchronously. The stop method specifically is documented as stopping a thread. I wrote the code below to create an AlertRunner thread and then stop the thread. However, is this a proper way to handle terminating a thread? And will this code be able to scale easily (not to a ridiculous volume, but maybe an alert a second or multiple alerts at the same time)?
private AlertRunner alertRunner;
@Override
public void receive(Alert a) {
assert a != null;
alertRunner = new AlertRunner(a.getName());
alertRunner.start();
}
}
@Override
public void stop(boolean synchronous) {
if(!synchronous) {
if(alertRunner != null) {
// interrupt the runner thread; Thread.currentThread() would be the caller of stop()
alertRunner.interrupt();
}
}
}
class AlertRunner extends Thread {
private final String alertName;
public AlertRunner(String alertName) {
this.alertName = alertName;
}
@Override
public void run() {
try {
TimeUnit.SECONDS.sleep(5);
log.info("New alert received: " + alertName);
} catch (InterruptedException e) {
log.error("Thread interrupted: " + e.getMessage());
}
}
}
This code will not scale easily because a Thread is quite a 'heavy' object: it is expensive to create and expensive to start. It's much better to use an ExecutorService for your task. It will contain a limited number of threads that are ready to process your requests:
int threadPoolSize = 5;
ExecutorService executor = Executors.newFixedThreadPool(threadPoolSize);
public void receive(Alert a) {
assert a != null;
executor.submit(() -> {
// Do your work here
});
}
Here executor.submit() will handle your request in a separate thread. If all threads are busy, the request will wait in a queue, which prevents resource exhaustion. It also returns a Future that you can use to wait for the handling to complete, set a timeout, receive the result, cancel execution, and many other useful things.
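A minimal sketch of cancelling through the returned Future, assuming a 30-second sleep as a stand-in for slow alert handling (AlertFutureDemo and cancelInterruptsTask are illustrative names):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class AlertFutureDemo {
    // Submits a slow task, cancels it, and reports whether the task saw the interrupt.
    static boolean cancelInterruptsTask() {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch interrupted = new CountDownLatch(1);
        Future<?> future = executor.submit(() -> {
            started.countDown();
            try {
                TimeUnit.SECONDS.sleep(30); // stand-in for slow alert handling
            } catch (InterruptedException e) {
                interrupted.countDown();    // cancellation arrives as an interrupt
            }
        });
        try {
            started.await();
            future.cancel(true); // true = interrupt the worker thread if the task is running
            return interrupted.await(5, TimeUnit.SECONDS) && future.isCancelled();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Keeping the Future for each alert gives a stop method something specific to cancel, instead of interrupting whichever thread happens to call it.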

Using concurrent classes to process files in a directory in parallel

I am trying to figure out how to use the types from the java.util.concurrent package to parallelize processing of all the files in a directory.
I am familiar with the multiprocessing package in Python, which is very simple to use, so ideally I am looking for something similar:
public interface FictionalFunctor<T>{
void handle(T arg);
}
public class FictionalThreadPool {
public FictionalThreadPool(int threadCount){
...
}
public <T> FictionalThreadPoolMapResult<T> map(FictionalFunctor<T> functor, List<T> args){
// Executes the given functor on each and every arg from args in parallel. Returns, when
// all the parallel branches return.
// FictionalThreadPoolMapResult allows to abort the whole mapping process, at the least.
}
}
dir = getDirectoryToProcess();
pool = new FictionalThreadPool(10); // 10 threads in the pool
pool.map(new FictionalFunctor<File>(){
@Override
public void handle(File file){
// process the file
}
}, dir.listFiles());
I have a feeling that the types in java.util.concurrent allow me to do something similar, but I have absolutely no idea where to start.
Any ideas?
Thanks.
EDIT 1
Following the advice given in the answers, I have written something like this:
public void processAllFiles() throws IOException {
ExecutorService exec = Executors.newFixedThreadPool(6);
BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<Runnable>(5); // Figured we can keep the contents of 6 files simultaneously.
exec.submit(new MyCoordinator(exec, tasks));
for (File file : dir.listFiles(getMyFilter())) {
try {
tasks.add(new MyTask(file));
} catch (IOException exc) {
System.err.println(String.format("Failed to read %s - %s", file.getName(), exc.getMessage()));
}
}
}
public class MyTask implements Runnable {
private final byte[] m_buffer;
private final String m_name;
public MyTask(File file) throws IOException {
m_name = file.getName();
m_buffer = Files.toByteArray(file);
}
@Override
public void run() {
// Process the file contents
}
}
private class MyCoordinator implements Runnable {
private final ExecutorService m_exec;
private final BlockingQueue<Runnable> m_tasks;
public MyCoordinator(ExecutorService exec, BlockingQueue<Runnable> tasks) {
m_exec = exec;
m_tasks = tasks;
}
@Override
public void run() {
while (true) {
Runnable task = m_tasks.remove();
m_exec.submit(task);
}
}
}
How I thought the code works is:
The files are read one after another.
A file contents are saved in a dedicated MyTask instance.
A blocking queue with the capacity of 5 to hold the tasks. I count on the ability of the server to keep the contents of at most 6 files at one time - 5 in the queue and another fully initialized task waiting to enter the queue.
A special MyCoordinator task fetches the file tasks from the queue and dispatches them to the same pool.
OK, so there is a bug - more than 6 tasks can be created. Some will be submitted, even though all the pool threads are busy. I've planned to solve it later.
The problem is that it does not work at all. The MyCoordinator thread blocks on the first remove - this is fine. But it never unblocks, even though new tasks were placed in the queue. Can anyone tell me what am I doing wrong?
The thread pool you are looking for is the ExecutorService interface. You can create a fixed-size thread pool using Executors.newFixedThreadPool. This allows you to easily implement a producer-consumer pattern, with the pool encapsulating all the queue and worker functionality for you:
ExecutorService exec = Executors.newFixedThreadPool(10);
You can then submit tasks in the form of objects whose type implements Runnable (or Callable if you want to also get a result):
class ThreadTask implements Runnable {
public void run() {
// task code
}
}
...
exec.submit(new ThreadTask());
// alternatively, using an anonymous type
exec.submit(new Runnable() {
public void run() {
// task code
}
});
A big word of advice on processing multiple files in parallel: if you have a single mechanical disk holding the files it's wise to use a single thread to read them one-by-one and submit each file to a thread pool task as above, for processing. Do not do the actual reading in parallel as it will degrade performance.
A simpler solution than using ExecutorService is to implement your own producer-consumer scheme. Have a thread that creates tasks and submits them to a LinkedBlockingQueue or ArrayBlockingQueue, and have worker threads that check this queue to retrieve the tasks and run them. You may need a special kind of task, named ExitTask, that forces the workers to exit. So at the end of the jobs, if you have n workers, you need to add n ExitTasks to the queue.
Basically, what @Tudor said: use an ExecutorService. I wanted to expand on his code, though, and I always feel strange editing other people's posts. Here's a skeleton of what you would submit to the ExecutorService:
public class MyFileTask implements Runnable {
final File fileToProcess;
public MyFileTask(File file) {
fileToProcess = file;
}
public void run() {
// your code goes here, e.g.
handle(fileToProcess);
// if you prefer, implement Callable instead
}
}
See also my blog post here for some more details if you get stuck
Since processing Files often leads to IOExceptions, I'd prefer a Callable (which can throw a checked Exception) to a Runnable, but YMMV.
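A minimal sketch of the Callable route that also approximates the map-style call from the question, using ExecutorService.invokeAll. ParallelMap, ThrowingFunction, and map are illustrative names, not JDK types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelMap {
    // Like java.util.function.Function, but allowed to throw checked exceptions.
    interface ThrowingFunction<T, R> {
        R apply(T input) throws Exception;
    }

    // Applies fn to every input on a fixed pool; results come back in input order.
    static <T, R> List<R> map(int threads, List<T> inputs, ThrowingFunction<T, R> fn)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (T input : inputs) {
                tasks.add(() -> fn.apply(input)); // Callable.call() may throw
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every task has completed
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // a task failure surfaces as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

For the directory case, the inputs would be the File objects returned by dir.listFiles() and fn would hold the per-file processing. An IOException thrown by that code reaches the caller wrapped in an ExecutionException.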
