I'm looking for a simple object that will hold my work threads and I need it to not limit the number of threads, and not keep them alive longer than needed.
But I do need it to have a method similar to ExecutorService.shutdown()
(waiting for all the active threads to finish but not accepting any new ones).
So maybe a thread pool isn't what I need, as thread pools are meant to keep their threads alive; I would love a push in the right direction.
Further clarification of intent:
Each thread is an upload of a file, and I have another process that modifies files, but it waits for a file to have no active uploads by joining each of the threads. So when the threads are kept alive, it blocks that process. (Each thread adds itself to a list for a specific file on creation, so I only join() the threads that upload that specific file.)
One way to do what you want is to use a Callable with a Future that returns the File object of a completed upload. Then pass the Future into another Callable that checks Future.isDone() and spins until it returns true, and then do whatever you need to do to the file. Your use case is not unique and fits very neatly into the java.util.concurrent package's capabilities.
One interesting class is ExecutorCompletionService, which does exactly what you want: waiting for results, then proceeding with an additional calculation.
A CompletionService that uses a supplied Executor to execute tasks. This class arranges that submitted tasks are, upon completion, placed on a queue accessible using take. The class is lightweight enough to be suitable for transient use when processing groups of tasks.
Usage Examples: Suppose you have a set of solvers for a certain problem, each returning a value of some type Result, and would like to run them concurrently, processing the results of each of them that return a non-null value, in some method use(Result r). You could write this as:
void solve(Executor e, Collection<Callable<Result>> solvers)
        throws InterruptedException, ExecutionException {
    CompletionService<Result> ecs = new ExecutorCompletionService<Result>(e);
    for (Callable<Result> s : solvers) {
        ecs.submit(s);
    }
    int n = solvers.size();
    for (int i = 0; i < n; ++i) {
        Result r = ecs.take().get();
        if (r != null) {
            use(r);
        }
    }
}
You don't want an unbounded ExecutorService
You almost never want to allow unbounded thread pools, as they actually can limit the performance of your application if the number of threads gets out of hand.
Your domain is limited by disk or network I/O, or both, so a small thread pool would be sufficient. You are not going to want to try to read from hundreds or thousands of incoming connections with a thread per connection.
Part of your solution, if you are receiving more than a handful of concurrent uploads, is to investigate the java.nio package and read about non-blocking I/O as well.
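If it helps, here is a minimal sketch of that idea, assuming a fixed-size pool; the pool size and timeout are placeholders to tune, not recommendations. Submitted uploads run on a bounded number of threads, shutdown() stops new work from being accepted, and awaitTermination() waits for the in-flight uploads to finish.
// Sketch only: pool size and timeout are illustrative.
// (Assumes the usual java.util and java.util.concurrent imports.)
void uploadAll(List<Runnable> uploads) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (Runnable upload : uploads) {
        pool.submit(upload);
    }
    pool.shutdown();                              // stop accepting new uploads
    pool.awaitTermination(10, TimeUnit.MINUTES);  // wait for the active uploads to finish
}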
Is there a reason that you don't want to reuse threads? Seems to me that the simplest thing would be to use ExecutorService anyway and let it reuse threads.
I'm working on a Java server application with the general following architecture:
Clients make RPC requests to the server
The RPC server (gRPC) I believe has its own thread pool for handling requests
Requests are immediately inserted into Thread Pool 1 for more processing
A specific request type, we'll call it Request R, needs to run a few asynchronous tasks in parallel, judging the results to form a consensus that it will return to the client. These tasks are a bit longer running, so I use a separate Thread Pool 2 to handle them. Importantly, each Request R will need to run the same 2-3 asynchronous tasks. Thread Pool 2 therefore services ALL currently executing Request R's. However, a Request R should only be able to see and retrieve the asynchronous tasks that belong to it.
To achieve this, upon every incoming Request R, while it's in Thread Pool 1, it will create a new CompletionService for the request, backed by Thread Pool 2. It will submit 2-3 async tasks and retrieve the results. These should be strictly isolated from anything else that might be running in Thread Pool 2 on behalf of other requests.
My questions:
Firstly, is Java's CompletionService isolated? I couldn't find good documentation on this after checking the JavaDocs. In other words, if two or more CompletionServices are backed by the same thread pool, are any of them at risk of pulling a future belonging to another CompletionService?
Secondly, is it bad practice to be creating this many CompletionServices, one per request? Is there a better way to handle this? Of course it would be a bad idea to create a new thread pool for each request, so is there a more canonical/correct way to isolate futures within a CompletionService, or is what I'm doing okay?
Thanks in advance for the help. Any pointers to helpful documentation or examples would be greatly appreciated.
Code, for reference, although trivial:
public static final ExecutorService THREAD_POOL_2 =
        new ThreadPoolExecutor(16, 64, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

// Gets created to handle a RequestR; RequestRHandler is run in Thread Pool 1
public class RequestRHandler {
    CompletionService<String> cs;

    RequestRHandler() {
        cs = new ExecutorCompletionService<>(THREAD_POOL_2);
    }

    String execute() throws InterruptedException, ExecutionException {
        cs.submit(asyncTask1);
        cs.submit(asyncTask2);
        cs.submit(asyncTask3);
        // Let's say asyncTask3 completes first
        Future<String> asyncTask3Result = cs.take();
        // asyncTask3's result indicates the asyncTask1 & asyncTask2 results don't matter,
        // so cancel them without checking their results.
        // I track all futures submitted within this request and cancel only those,
        // so it shouldn't affect any other requests in the TP 2 pool.
        cancelAllFutures(cs);
        return asyncTask3Result.get();
    }
}
Firstly, is Java's CompletionService isolated?
That's not guaranteed, as it's an interface, so the implementation decides that. But as the only implementation is ExecutorCompletionService, I'd just say the answer is: yes. Every instance of ExecutorCompletionService internally has a BlockingQueue where the finished tasks are queued. Actually, when you call take on the service, it just passes the call to the queue by calling take on it. Every submitted task is wrapped by another object, which puts the task in the queue when it's finished. So each instance manages its submitted tasks in isolation from other instances.
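A small sketch to illustrate the isolation (pool size and tasks are invented for the example): each service's take() only ever hands back futures that were submitted through that same service, even though both share one executor.
// Sketch: two completion services sharing one executor stay isolated from each other.
static void demo() throws InterruptedException, ExecutionException {
    ExecutorService shared = Executors.newFixedThreadPool(8);
    CompletionService<String> csA = new ExecutorCompletionService<>(shared);
    CompletionService<String> csB = new ExecutorCompletionService<>(shared);

    csA.submit(() -> "belongs to A");
    csB.submit(() -> "belongs to B");

    System.out.println(csA.take().get()); // always "belongs to A"
    System.out.println(csB.take().get()); // always "belongs to B"

    shared.shutdown();
}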
Secondly, is this bad practice to be creating this many CompletionServices for each request?
I'd say it's okay. A CompletionService is nothing but a rather thin wrapper around an executor. You have to live with the "overhead" (an internal BlockingQueue and wrapper instances for the tasks), but it's small and you are probably gaining far more from it than it costs. One could ask whether you need one for just 2 to 3 tasks, but that depends on the tasks. At that point it becomes a question of whether a CompletionService is worth it in general, which is up to you to decide as it's out of scope of your question.
I have a service that handles the main entity, retrieves the first sub-entity associated with the main entity, then returns both. It also sets off a set of CompletableFuture chains to go out and retrieve any additional entities. Currently, I just take a prebuilt set of retrieval tasks, wrap them in async futures, and set them off with a CachedThreadPool. This is fine, but when 50+ users hit the server, the primary task (retrieving the main entity and the first sub-entity) is dramatically slowed by all of the async threads running.
I want to know if there is a way to make the asynchronous calls to run on a lower priority in order to make sure the primary call is handled quickly.
public CompletableFuture<Void> buildFutureTasks(P primaryEntity, List<List<S>> entityGroups)
{
    ExecutorService pool = Executors.newCachedThreadPool();
    CompletableFuture<Void> future = null;
    for (List<S> entityGroup : entityGroups)
    {
        if (future == null || future.isDone())
        {
            future = CompletableFuture.runAsync(() ->
                retrieveSubEntitiesForEntity(primaryEntity, entityGroup), pool);
        }
        else
        {
            future.thenRunAsync(() ->
                retrieveSubEntitiesForEntities(primaryEntity, entityGroup), pool);
        }
    }
    return future;
}
This is the fastest I've been able to make this run with 50+ users but it still dramatically slows down the more users I add.
As you most likely know already, there is a method Thread::setPriority. But as the JavaDoc says:
Every thread has a priority. Threads with higher priority are executed in preference to threads with lower priority.
So you can just provide a ThreadFactory when creating your cached ExecutorService.
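For example, a rough sketch of such a factory (the names here are made up for illustration), keeping in mind the scheduling caveat mentioned next:
// Sketch: a ThreadFactory that hands out low-priority worker threads.
ThreadFactory lowPriorityFactory = runnable -> {
    Thread t = new Thread(runnable);
    t.setPriority(Thread.MIN_PRIORITY); // only a hint to the scheduler
    return t;
};

ExecutorService background = Executors.newCachedThreadPool(lowPriorityFactory);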
The actual thread-scheduling details are VM-implementation specific, so you cannot really rely on this. I would consider using a fixed thread pool instead.
But I'm not sure that the actual problem is about thread scheduling and priority. First of all, a cached thread pool, according to the documentation:
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available.
In the case of 50+ users who can call buildFutureTasks, you cannot really control the number of threads created.
I would consider using a fixed thread pool, so you can control the number of threads, if you don't really need the SynchronousQueue that underlies cached thread pools.
Also consider using the same thread pool for all the tasks rather than creating one inside buildFutureTasks every time it is called.
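A simplified sketch of that last point (the pool size is a placeholder, and the chaining here is deliberately simpler than the method above, just to show the pool being created once and reused rather than per call):
// Created once and shared by every call, instead of a new cached pool per call.
private static final ExecutorService SUB_ENTITY_POOL = Executors.newFixedThreadPool(8);

public CompletableFuture<Void> buildFutureTasks(P primaryEntity, List<List<S>> entityGroups)
{
    CompletableFuture<Void> future = CompletableFuture.completedFuture(null);
    for (List<S> entityGroup : entityGroups)
    {
        // chain each retrieval on the shared pool
        future = future.thenRunAsync(() ->
            retrieveSubEntitiesForEntity(primaryEntity, entityGroup), SUB_ENTITY_POOL);
    }
    return future;
}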
I need help with my multithreading code.
I have a Callable class which returns a value. I have a cachedThreadPool to submit ~60,000 tasks. I collect all the Futures in a List. After the ExecutorService has shut down, I loop through the list of Futures and write the returned values using a BufferedWriter. Is this the correct way of implementation?
ExecutorService execService = Executors.newCachedThreadPool();
List<Future<ValidationDataObject<String, Boolean>>> futureList = new ArrayList<>();
for (int i = 0; i < emailArrayList.size(); i++) {
    String emailAddress = emailArrayList.get(i);
    ValidateEmail validateEmail = new ValidateEmail(emailAddress);
    Future<ValidationDataObject<String, Boolean>> future =
            execService.submit(validateEmail);
    futureList.add(future);
}
execService.shutdown();
for (Future<ValidationDataObject<String, Boolean>> future : futureList) {
    ValidationDataObject<String, Boolean> validationObject = future.get();
    bufferedWriter.write(validationObject.getEmailAddress() + "|"
            + validationObject.getIsValid());
    bufferedWriter.newLine();
    bufferedWriter.flush();
}
if (execService.isTerminated()) bufferedWriter.close();
Should I be using a synchronized block for the bufferedWriter? I am thinking it doesn't need to be synchronized because I am using the bufferedWriter from the main thread, right?
I have a cachedThreadPool to submit ~60,000 tasks.
Off the bat, a cached thread pool and 60k tasks is a red flag. A cached pool creates a new thread whenever no idle thread is available, so a fast burst of 60k submissions can spawn a huge number of threads, which I doubt you really want. You should use a fixed thread pool and vary the number of threads until you achieve a good balance of throughput without overwhelming your server. Maybe start with 2x the number of CPUs and then vary it depending on the server load.
You might also consider using a fixed-size queue, which will limit the number of outstanding tasks, although 60k tasks is fine unless those objects are heavy.
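A hedged sketch of that setup (all the numbers here are starting points to tune, not recommendations):
int cpus = Runtime.getRuntime().availableProcessors();

// Fixed number of workers, a bounded queue, and back-pressure on the submitter:
// when the queue is full, CallerRunsPolicy makes the submitting thread run the task itself.
ExecutorService execService = new ThreadPoolExecutor(
        2 * cpus, 2 * cpus,
        60, TimeUnit.SECONDS,
        new ArrayBlockingQueue<>(10_000),
        new ThreadPoolExecutor.CallerRunsPolicy());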
I collect all the Futures in a List. After the ExecutorService has shut down, I loop through the list of Futures and write the returned values using a bufferedWriter. Is this the correct way of implementation?
Yes, that's a good pattern. You don't show the writer being created but it is certainly fine for the main thread to own that.
Should I be using a synchronized block for the bufferedWriter? I am thinking it doesn't need to be synchronized because I am using the bufferedWriter from the main thread, right?
Right. No other threads are using it so that's fine. It is a very typical pattern to have a writer thread managing the output of a multi-thread application.
One final comment is that you might want to look at the ExecutorCompletionService, which allows you to process the tasks as they finish instead of having to wait for them in order. You might require the output to be in order, in which case this isn't helpful, but it's good technology to know about anyway.
Apart from the fact that executor.shutdown() will most likely not do what you believe it does (it simply stops the Executor from accepting new tasks; it will not wait for all tasks to terminate), your code looks fine.
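If you do want to block until everything has finished via the executor itself, the usual pattern looks roughly like this (the timeout is an example, and exception handling is omitted):
execService.shutdown();                                   // no new tasks accepted
if (!execService.awaitTermination(1, TimeUnit.HOURS)) {   // wait for the submitted tasks to finish
    execService.shutdownNow();                            // give up and interrupt whatever is left
}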
You are right, there is no need for synchronization with respect to the writer, as you access it only from a single thread.
There are things that can be improved, though. Firstly, you are not doing a lot of exception handling. Future.get() will throw an ExecutionException if the Callable hits an exception.
I'm not certain how large the deviations in execution time of your Callables are. Assuming there are notable deviations, look at the following case: say we submit Callables A, B and C, and receive FutA, FutB and FutC. Calling the get methods will block until the calculation behind the Future is finished. In your setting, you might be waiting for FutA to complete while FutB/FutC might already be finished and ready for writing. The worst case here is that processing of A delays writing for all 60,000 tasks.
I think I would go for another approach, where every Callable gets a reference to the same ConcurrentLinkedQueue and, instead of returning the result via a Future, writes the result into that queue. In this scenario, the ordering of the results does not depend on the ordering of the Callables but on the time the Callables finish execution. Whether or not this results in a speedup depends on your setting (especially the time to write a result and the deviation in execution times of the Callables).
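A rough sketch of that queue-based approach; the pool size is a placeholder and validate() is a stand-in that runs your ValidateEmail logic and formats the output line.
void validateAndWrite(List<String> emailArrayList, BufferedWriter bufferedWriter)
        throws IOException, InterruptedException {
    ConcurrentLinkedQueue<String> results = new ConcurrentLinkedQueue<>();
    ExecutorService pool = Executors.newFixedThreadPool(8); // size is illustrative

    for (String emailAddress : emailArrayList) {
        pool.submit(() -> results.add(validate(emailAddress))); // validate() stands in for your ValidateEmail call
    }
    pool.shutdown();

    // Drain the queue while the workers are still running, so one slow task
    // never delays writing the results that are already finished.
    while (!pool.isTerminated() || !results.isEmpty()) {
        String line = results.poll();
        if (line == null) {
            Thread.sleep(10); // nothing ready yet; avoid busy-spinning
        } else {
            bufferedWriter.write(line);
            bufferedWriter.newLine();
        }
    }
    bufferedWriter.close();
}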
I have some jobs that process images for my application. They take a lot of Heap memory. Thus I want to restrict the number of image processing tasks or queue them in some way.
I also use GPars to handle the image processing, but with my approach, sometimes too many worker threads are open concurrently.
How can I use a ThreadPoolExecutor in Grails to get this done right?
I think you can do this by using GParsExecutorsPool:
GParsExecutorsPool.withPool() {
    Closure longLastingCalculation = { calculate() }
    Closure fastCalculation = longLastingCalculation.async()  // create a new closure, which starts the original closure on a thread pool
    Future result = fastCalculation()                         // returns almost immediately
    // do stuff while the calculation runs ...
    println result.get()
}
For more details check this link:
Use of ThreadPool - the Java Executors' based concurrent collection processor
I have a scientific application which I usually run in parallel with xargs, but this scheme incurs repeated JVM start costs and neglects cached file I/O and the JIT compiler. I've already adapted the code to use a thread pool, but I'm stuck on how to save my output.
The program (i.e. one thread of the new program) reads two files, does some processing and then prints the result to standard output. Currently, I've dealt with output by having each thread add its result string to a BlockingQueue. Another thread takes from the queue and writes to a file, as long as a Boolean flag is true. Then I awaitTermination and set the flag to false, triggering the file to close and the program to exit.
My solution seems a little kludgey; what is the simplest and best way to accomplish this?
How should I write primary result data from many threads to a single file?
The answer doesn't need to be Java-specific if it is, for example, a broadly applicable method.
Update
I'm using "STOP" as the poison pill.
while (true) {
    String line = queue.take();
    if (line.equals("STOP")) {
        break;
    } else {
        output.write(line);
    }
}
output.close();
I manually start the queue-consuming thread, then add the jobs to the thread pool, wait for the jobs to finish and finally poison the queue and join the consumer thread.
That's really the way you want to do it: have the threads put their output on the queue and then have the writer drain it.
The only thing you might want to do to make things a little cleaner is, rather than checking a flag, simply put an "all done" token onto the queue that the writer can use to know that it's finished. That way no out-of-band signaling is necessary.
That's trivial to do: you can use a well-known string, an enum, or simply a shared object.
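For instance, a sketch using a shared sentinel object (the names and the output destination are made up for illustration):
// Sentinel compared by reference, so it can never collide with a real output line.
static final Object ALL_DONE = new Object();
static final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

// Writer thread
static final Runnable WRITER = () -> {
    try (PrintWriter output = new PrintWriter("results.txt")) {   // example destination
        while (true) {
            Object item = queue.take();
            if (item == ALL_DONE) {
                break;                                             // nothing more will ever arrive
            }
            output.println((String) item);
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
};

// Workers call queue.put(resultLine); the main thread calls queue.put(ALL_DONE)
// once the worker pool has finished.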
You could use an ExecutorService.
Submit a Callable that would perform the task and return the string after completion.
When submitting the Callable you get hold of a Future; store these references, e.g. in a List.
Then simply iterate through the Futures and get the Strings by calling Future#get.
This will block until the task is completed if it is not yet done; otherwise it returns the value immediately.
Example:
ExecutorService exec = Executors.newFixedThreadPool(10);
List<Future<String>> tasks = new ArrayList<Future<String>>();
tasks.add(exec.submit(new Callable<String>() {
    public String call() {
        // do stuff
        return yourString; // whatever result this task produces
    }
}));
// and so on for the other tasks
for (Future<String> task : tasks) {
    String result = task.get(); // blocks until this task has finished
    // write to output
}
Many threads processing, one thread writing, and a message queue between them is a good strategy. The issue that needs to be solved is knowing when all the work is finished. One way to do that is to count how many worker threads you started and then count how many responses you got. Something like this pseudocode:
int workers = 0
for each work item {
    workers++
    start the item's worker in a separate thread
}
while workers > 0 {
    take worker's response from a queue
    write response to file
    workers--
}
This approach also works if the workers can find more work items while they are executing. Just include any additional not-yet-processed work in the worker responses, and then increment the workers count and start worker threads as usual.
If each of the workers returns just one message, you can use Java's ExecutorService to execute Callable instances which return the result. ExecutorService's methods give access to Future instances from which you can get the result when the Callable has finished its work.
So you would first submit all the tasks to the ExecutorService and then loop over all the Futures and get their responses. That way you would write the responses in the order in which you check the futures, which can be different from the order in which they finish their work. If latency is not important, that shouldn't be a problem. Otherwise, a message queue (as mentioned above) might be more suitable.
It's not clear whether your output file has some defined order or if you just dump your data there. I assume it has no order.
I don't see why you need an extra thread for writing the output. Just synchronize the method that writes to the file and call it at the end of each thread.
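That is, something along these lines (a sketch; out is assumed to be your shared BufferedWriter field):
// Every worker calls this at the end of its run(); the lock on 'this'
// ensures only one thread writes at a time.
synchronized void writeResult(String line) throws IOException {
    out.write(line);
    out.newLine();
}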
If you have many threads writing to the same file the simplest thing to do is to write to that file in the task.
// The writer and pool initializations here are just examples; use whatever destination and size fit your setup.
final PrintWriter out = new PrintWriter(new FileWriter("results.txt"));
ExecutorService es = Executors.newFixedThreadPool(4);
for (int i = 0; i < tasks; i++)
    es.submit(new Runnable() {
        public void run() {
            performCalculations();
            // so only one thread can write to the file at a time
            synchronized (out) {
                writeResults(out);
            }
        }
    });
es.shutdown();
es.awaitTermination(1, TimeUnit.HOURS);
out.close();