Writing from Future of CachedThreadPool. Is my implementation incorrect?


I need help with my multithreading code.
I have a Callable class which returns a value. I use a cachedThreadPool to submit ~60,000 tasks, and I collect all the Futures in a List. After the ExecutorService has been shut down, I loop through the list of Futures and write the returned values using a BufferedWriter. Is this a correct way to implement it?
    ExecutorService execService = Executors.newCachedThreadPool();
    List<Future<ValidationDataObject<String, Boolean>>> futureList = new ArrayList<>();
    for (int i = 0; i < emailArrayList.size(); i++) {
        String emailAddress = emailArrayList.get(i);
        ValidateEmail validateEmail = new ValidateEmail(emailAddress);
        Future<ValidationDataObject<String, Boolean>> future =
                execService.submit(validateEmail);
        futureList.add(future);
    }
    execService.shutdown();
    for (Future<ValidationDataObject<String, Boolean>> future : futureList) {
        ValidationDataObject<String, Boolean> validationObject = future.get();
        bufferedWriter.write(validationObject.getEmailAddress() + "|"
                + validationObject.getIsValid());
        bufferedWriter.newLine();
        bufferedWriter.flush();
    }
    if (execService.isTerminated()) bufferedWriter.close();
Should I use a synchronized block for the bufferedWriter? I am thinking it doesn't need to be synchronized because I am only using the bufferedWriter from the main thread, right?

I have a cachedThreadPool to submit ~60,000 tasks.
Off the bat, a cached thread pool and 60k tasks is a red flag. A cached pool creates a new thread whenever no idle one is available, so this can start up to 60k threads, which I doubt you really want. You should use a fixed thread pool and vary the number of threads until you achieve a good balance between throughput and overwhelming your server. Maybe start with 2x the number of CPUs and then adjust it depending on the server load.
You might also consider using a bounded queue, which limits the number of outstanding tasks, although 60k queued tasks is fine unless those objects are heavy.
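The advice above can be sketched as follows. This is a minimal illustration, not the asker's code: the class name BoundedPoolSketch, the queue capacity of 1000, and the trivial task are assumptions, and CallerRunsPolicy is just one possible way to apply back-pressure when the bounded queue is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolSketch {

    // Hypothetical helper: a fixed number of threads plus a bounded work queue.
    // When the queue is full, CallerRunsPolicy makes the submitting thread run
    // the task itself, which naturally slows down submission.
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits taskCount trivial tasks and returns the sum of their results.
    static long runTasks(int taskCount) {
        // Starting point suggested above: 2x the number of CPUs (tune under load).
        int threads = 2 * Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor pool = newBoundedPool(threads, 1000);
        List<Future<Integer>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                final int n = i;
                futures.add(pool.submit(() -> n % 2)); // stand-in for real work
            }
            long sum = 0;
            for (Future<Integer> future : futures) {
                sum += future.get();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape the number of live threads never exceeds the configured size, no matter how many tasks are submitted.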
I collect all the Futures in a List. After the ExecutorService has been shut down, I loop through the list of Futures and write the returned values using a BufferedWriter. Is this a correct way to implement it?
Yes, that's a good pattern. You don't show the writer being created but it is certainly fine for the main thread to own that.
Should I use a synchronized block for the bufferedWriter? I am thinking it doesn't need to be synchronized because I am only using the bufferedWriter from the main thread, right?
Right. No other threads are using it so that's fine. It is a very typical pattern to have a writer thread managing the output of a multi-thread application.
One final comment is that you might want to look at the ExecutorCompletionService, which allows you to process the tasks as they finish instead of having to wait for them in submission order. You might require the output to be in order, in which case this isn't helpful, but it's good technology to know about anyway.
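A minimal sketch of that idea, assuming hypothetical tasks that simply sleep for a given time (the class and method names are illustrative only). Results are taken in the order the tasks finish, not the order they were submitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CompletionOrderSketch {

    // Runs one task per sleep duration (at least one is required) and returns
    // the task labels in the order in which the tasks completed.
    static List<String> finishOrder(long... sleepMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(sleepMillis.length);
        CompletionService<String> completion = new ExecutorCompletionService<>(pool);
        try {
            for (long millis : sleepMillis) {
                completion.submit(() -> {
                    Thread.sleep(millis); // stand-in for real work
                    return "slept " + millis;
                });
            }
            List<String> done = new ArrayList<>();
            for (int i = 0; i < sleepMillis.length; i++) {
                // take() blocks until the next finished task is available.
                done.add(completion.take().get());
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling finishOrder(400, 20) would normally return the 20 ms task first, even though it was submitted second.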

Apart from the fact that executor.shutdown() will most likely not do what you believe it does (it simply stops the executor from accepting new tasks; it does not wait for all tasks to terminate), your code looks fine.
You are right, there is no need for synchronization with respect to the writer, as you only access it from a single thread.
There are things that can be improved, though. Firstly, you are not doing a lot of exception handling. Future.get() will throw an ExecutionException if the Callable throws an exception.
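The exception-handling point can be sketched like this. The class FutureErrorSketch and the "failed: ..." fallback text are assumptions made for illustration; how a failed task should be reported in the real output file is a design decision for the asker.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FutureErrorSketch {

    // Returns the task's result, or a short description if the task failed.
    static String resultOrError(Callable<String> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(task);
            return future.get();
        } catch (ExecutionException e) {
            // The exception thrown inside call() is available as the cause.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }
}
```

Handling the failure per Future keeps one bad task from aborting the loop that writes the remaining results.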
I'm not certain how much the execution times of your Callables vary. Assuming the variation is notable, consider the following case: say we submit Callables A, B and C and receive FutA, FutB and FutC. Calling get() blocks until the calculation behind the Future is finished. In your setting, you might be waiting for FutA to complete while FutB and FutC are already finished and ready for writing. The worst case here is that the processing of A delays the writing for all 60000 tasks.
I think I would go for another approach, where every Callable gets a reference to the same ConcurrentLinkedQueue and, instead of returning the result via a Future, writes the result into that queue. In this scenario, the ordering of the results does not depend on the ordering of the Callables but on the time at which the Callables finish execution. Whether or not this results in a speedup depends on your setting (especially the time needed to write a result and the variation in execution times of the Callables).

Related

Reusing ThreadPoolExecutor vs Creating and Disposing Ad Hoc?

I am building a multithreaded process that has a couple stages, each stage iterating through an unknown number of objects (hundreds of thousands from a buffered query resultset or text file). Each stage will kick off a runnable or callable for each object, but all runnables/callables must complete before moving on to the next stage.
I do not want to use a latch or any kind of synchronizer because I don't want to hurt the throughput. I suspect the latch's internals will slow things down with the synchronized counter. I also don't want to use a list of futures with invokeAll() either because I want to start execution of runnables immediately as I iterate through them.
However, creating a ThreadPoolExecutor for each stage, looping through and submitting all the runnables, and then shutting it down for each stage seems to be a functional solution...
    public void runProcess() {
        ResultSet rs = someDbConnection.executeQuery(someSQL);
        ExecutorService stage1Executor = Executors.newFixedThreadPool(9);
        while (rs.next()) {
            //SUBMIT UNKNOWN # OF RUNNABLES FOR STAGE 1
        }
        rs.close();
        stage1Executor.shutdown();
        rs = someDbConnection.executeQuery(moreSQL);
        ExecutorService stage2Executor = Executors.newFixedThreadPool(9);
        while (rs.next()) {
            //SUBMIT UNKNOWN # OF RUNNABLES FOR STAGE 2
        }
        rs.close();
        stage2Executor.shutdown();
    }
However, I know that setting up threads, threadpools, and anything that involves concurrency is expensive to construct and destroy. Or maybe it is not that big of a deal and I'm just being overly cautious about performance, because concurrency has expensive overhead no matter what. Is there a more efficient way of doing this? Using some kind of wait-for-completion operation I don't know about?

If you destroy the thread-pool and re-init a new one it will likely cost you much more than using a CountDownLatch!
Further, calling stage1Executor.shutdown(); does not promise that all the current threads will finish their execution before the new ExecutorService is up and running. Even calling shutdownNow() cannot guarantee it! (and you probably wouldn't want to call shutdownNow() because you want your threads to finish executing).
Donald Knuth once said:
premature optimization is the root of all evil.
so even if you are not persuaded by me - better listen to him :)

Setting up and tearing down a handful of thread pools is negligible. Try it out in a loop in a test.
Using a countdown latch is fine, but it might just duplicate the work that ThreadPoolExecutor already does internally, and it couples your task to your execution framework. Not a fan of this approach.
As for the original code, ExecutorService has an awaitTermination method so you can wait until your work is done before moving to the next stage.
For my money, your pseudo code is fine. Just replace executor.shutdown() with shutdownAndAwaitTermination(ExecutorService). That method is not part of the API itself; it is the usage example given in the ExecutorService Javadoc, and its source is here: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html
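A minimal sketch of a stage that does not return until all of its tasks are done, using shutdown() followed by awaitTermination(). The class StagedPipelineSketch, the one-hour timeout, and the counter used as stand-in work are assumptions for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class StagedPipelineSketch {

    // Runs the given work `tasks` times on a fresh pool and returns only
    // after every task has finished.
    static void runStage(int tasks, Runnable work) {
        ExecutorService pool = Executors.newFixedThreadPool(9);
        for (int i = 0; i < tasks; i++) {
            pool.execute(work);
        }
        pool.shutdown(); // no new tasks accepted; already submitted tasks still run
        try {
            // Block until the stage has drained (timeout value is an assumption).
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow();
                throw new IllegalStateException("stage timed out");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Runs two stages back to back and reports the counter after each one.
    static int[] runTwoStages(int tasksPerStage) {
        AtomicInteger counter = new AtomicInteger();
        runStage(tasksPerStage, counter::incrementAndGet);
        int afterStage1 = counter.get();
        runStage(tasksPerStage, counter::incrementAndGet);
        return new int[] {afterStage1, counter.get()};
    }
}
```

Because runStage blocks in awaitTermination, stage 2 cannot start before stage 1 has completed all of its tasks.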

Multithreading help in Java

I'm new to Java, and I need some help working on this program. This is a small part of a large class project, and I must use multithreading.
Here's what I want to do algorithmically:
while (there is still input left, store chunk of input in <chunk>)
{
if there is not a free thread in my array then
wait until a thread finishes
else there is a free thread then
apply the free thread to <chunk> (which will do something to chunk and output it).
Note: The ordering of the chunks being output must be the same as input
}
So, the main things I don't know how to do:
How can I check whether or not there's a free thread in the array? I know that there is a function ThreadAlive, but it seems super inefficient to poll every single thread every time in my loop.
If there is no free thread, how can I wait until one has finished?
The ordering is important. How can I preserve the ordering in which the threads output? As in, the order of the output needs to match the order of the input. How can I guarantee this synchronization?
How do I even pass the chunk to my thread? Can I just use the Runnable interface to do this?
Any help with these four bullets is greatly appreciated. Since I'm a super noob, code samples would help significantly.
(side-note: Making an array of threads was just an idea of mine to handle the user defined number of threads. If you have a better way to handle this you're welcome to suggest it!)

Sounds like you basically have a producer/consumer model, which can be solved with an ExecutorService and a BlockingQueue. Here is a similar question with a similar answer:
producer/consumer work queues

As @altaiojok mentioned, you want to use an ExecutorService and BlockingQueue. The basic algorithm works like this:
    ExecutorService executor = Executors.newFixedThreadPool(...); // or newCachedThreadPool, etc.
    BlockingQueue<Future<?>> outputQueue = new LinkedBlockingQueue<Future<?>>();

    // To be run by an input processing thread
    void submitTasks() throws IOException {
        BufferedReader input = ... // assuming a file to read; this could be any input source (file, keyboard, network, etc.)
        String line = input.readLine();
        while (line != null) {
            // Futures are queued in submission order, so output order matches input order
            outputQueue.add(executor.submit(new YourCallableImplementation(line)));
            line = input.readLine();
        }
    }

    // To be run by a different output processing thread
    void processTaskOutput() {
        try {
            while (true) {
                Future<?> resultFuture = outputQueue.take();
                Object result = resultFuture.get(); // blocks until this task has finished
                // process the output (write to file, send to network, print to screen, etc.)
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            // the task itself threw; e.getCause() holds the original exception
        }
    }
I'll leave it to you to figure out how to implement Runnable to make the input and output thread as well as how to implement Callable for the tasks you need to process.

I would suggest using commons-pool which offers pooling of threads so you can easily limit the number of used threads and it also offers some other helper methods.
Concerning the ordering: have a look at the synchronized keyword.
And I would suggest to have a look at the java tutorial (the part about concurrency): http://download.oracle.com/javase/tutorial/essential/concurrency/index.html

Streams might come in handy:
List<Chunk> chunks = new ArrayList<>();
//....
Function<Chunk, String> toWeightInfo = (chunk) -> "weight = "+(chunk.size()*chunk.prio());
List<String> results = chunks.parallelStream()
.map(toWeightInfo)
.collect(Collectors.toList());
System.out.println(results);
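The parallel stream uses the system's common fork/join thread pool, which by default is sized according to the number of available logical CPUs, and processes your stuff in parallel. Because the source list is ordered and the results are collected with Collectors.toList(), the results come back in the same order as the input.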
The parallel streams API hides all the complexity of assigning free threads to jobs and optimizations like work-stealing away from you. Just give it something to chew on and it will work its magic.
If you need to use a thread pool of a custom size, please refer to the
Custom thread pool in Java 8 parallel stream question.
You might also have a look at this good Java 8 Stream Tutorial.
If your case is rather complex and you're streaming chunks into your program, and you've got multiple stages of work, where some must be serial and some can be parallel and some depend on each other, you might have a look at the Disruptor framework from LMAX.
Kind regards

Use ExecutorCompletionService and Future<T>. Together they provide a threadpool based task framework that takes care of all your concerns.
How can I check whether or not there's a free thread in the array? I know that there is a function ThreadAlive, but it seems super inefficient to poll every single thread every time in my loop.
You don't have to. The executor will do this for you in a (super) efficient manner. You just have to submit tasks to it and sit back.
If there is no free thread, how can I wait until one has finished?
Again, you really don't have to. This is taken care of by the executor.
The ordering is important. How can I preserve the ordering in which the threads output? As in, the order of the output needs to match the order of the input. How can I guarantee this synchronization?
This is a concern. If you want the processed output (of chunks, in your words) to arrive in the same order as the chunks appear in the initial array, you have to address a few points:
Is it just the order of arrival of the results that matters, or do the tasks themselves depend on being processed in order? If it is the former, it is easily done, but if it is the latter, then you have problems (which I think are very hard to start with, considering your admission of being new to Java, so I would just recommend more learning on your part before attempting this).
Assuming it is the former case, what you can do is this: submit the chunks to the executor in order, and each submission will give you a handle (called a Future<Result>) to the task's processed output. Store these handles in an ordered queue, and when you want the results, call get() on these Futures. Note that if some task in the middle of the order takes a long time to complete, then the results of the following tasks will also be delayed.
How do I even pass the chunk to my thread? Can I just use the Runnable interface to do this?
Create one Callable instance per chunk, with the chunk wrapped inside the instance. This represents the task that you will submit() to the ExecutorService.
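Putting the points above together, a minimal sketch might look like this. The ChunkTask class and its upper-casing "processing" are placeholders (assumptions) for whatever the real task does to a chunk; the chunk type is assumed to be a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OrderedChunksSketch {

    // Hypothetical task: wraps one chunk and returns its processed form.
    static final class ChunkTask implements Callable<String> {
        private final String chunk;

        ChunkTask(String chunk) {
            this.chunk = chunk;
        }

        @Override
        public String call() {
            return chunk.toUpperCase(Locale.ROOT); // stand-in for real processing
        }
    }

    // Processes chunks in parallel but returns results in input order.
    static List<String> processInOrder(List<String> chunks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit in input order and remember the Futures in that same order.
            List<Future<String>> futures = new ArrayList<>();
            for (String chunk : chunks) {
                futures.add(pool.submit(new ChunkTask(chunk)));
            }
            // Reading the Futures in list order restores the input order,
            // regardless of which task finished first.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool size is the user-defined number of threads; the executor decides which thread is free, so no array of threads or polling is needed.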

Simple asynchronous I/O: many threads, one file

I have a scientific application which I usually run in parallel with xargs, but this scheme incurs repeated JVM start costs and neglects cached file I/O and the JIT compiler. I've already adapted the code to use a thread pool, but I'm stuck on how to save my output.
The program (i.e. one thread of the new program) reads two files, does some processing and then prints the result to standard output. Currently, I've dealt with output by having each thread add its result string to a BlockingQueue. Another thread takes from the queue and writes to a file, as long as a Boolean flag is true. Then I awaitTermination and set the flag to false, triggering the file to close and the program to exit.
My solution seems a little kludgey; what is the simplest and best way to accomplish this?
How should I write primary result data from many threads to a single file?
The answer doesn't need to be Java-specific if it is, for example, a broadly applicable method.
Update
I'm using "STOP" as the poison pill.
    while (true) {
        String line = queue.take();
        if (line.equals("STOP")) {
            break;
        } else {
            output.write(line);
        }
    }
    output.close();
I manually start the queue-consuming thread, then add the jobs to the thread pool, wait for the jobs to finish and finally poison the queue and join the consumer thread.

That's really the way you want to do it, have the threads put their output to the queue and then have the writer exhaust it.
The only thing you might want to do to make things a little cleaner is, rather than checking a flag, simply put an "all done" token onto the queue that the writer can use to know that it's finished. That way there's no out-of-band signaling necessary.
That's trivial to do: you can use a well-known string, an enum, or simply a shared object.
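A minimal sketch of the "shared object" variant, assuming results are Strings. The class PoisonPillSketch, the DONE sentinel, and the in-memory list standing in for the output file are all illustrative assumptions. Comparing the sentinel by identity means a genuine result line can never be mistaken for the stop signal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class PoisonPillSketch {

    // Unique sentinel object; deliberately a fresh String instance so that
    // the identity check below cannot match any real result line.
    private static final String DONE = new String("DONE");

    static List<String> collect(int jobs) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> written = new ArrayList<>(); // stands in for the output file

        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String line = queue.take();
                    if (line == DONE) { // identity comparison on purpose
                        break;
                    }
                    written.add(line); // a real writer would call output.write(line)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < jobs; i++) {
            final int id = i;
            pool.execute(() -> queue.add("result " + id));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            queue.put(DONE); // poison the queue only after all workers are done
            writer.join();   // after join(), the writer's additions are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return written;
    }
}
```

The order of operations matches the update in the question: start the consumer, submit the jobs, wait for them, then poison the queue and join the consumer.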

You could use an ExecutorService.
Submit a Callable that would perform the task and return the string after completion.
When submitting the Callable you get hold of a Future; store these references, e.g. in a List.
Then simply iterate through the Futures and get the Strings by calling Future#get.
This will block until the task is completed if it has not finished yet; otherwise it returns the value immediately.
Example:
    ExecutorService exec = Executors.newFixedThreadPool(10);
    List<Future<String>> tasks = new ArrayList<Future<String>>();
    tasks.add(exec.submit(new Callable<String>() {
        public String call() {
            // do stuff
            return <yourString>;
        }
    }));
    // and so on for the other tasks
    for (Future<String> task : tasks) {
        String result = task.get(); // throws InterruptedException and ExecutionException
        // write to output
    }
    exec.shutdown(); // let the pool's threads exit once all results are written

Many threads processing, one thread writing and a message queue between them is a good strategy. The only issue left to solve is knowing when all work is finished. One way to do that is to count how many worker threads you started, and then count how many responses you got. Something like this pseudo code:
int workers = 0
for each work item {
workers++
start the item's worker in a separate thread
}
while workers > 0 {
take worker's response from a queue
write response to file
workers--
}
This approach also works if the workers can find more work items while they are executing. Just include any additional not-yet-processed work in the worker responses, and then increment the workers count and start workers threads as usual.
If each of the workers returns just one message, you can use Java's ExecutorService to execute Callable instances which return the result. ExecutorService's methods give access to Future instances from which you can get the result when the Callable has finished its work.
So you would first submit all the tasks to the ExecutorService and then loop over all the Futures and get their responses. That way you would write the responses in the order in which you check the futures, which can be different from the order in which they finish their work. If latency is not important, that shouldn't be a problem. Otherwise, a message queue (as mentioned above) might be more suitable.

It's not clear if your output file has some defined order or if you just dump your data there. I assume it has no order.
I don't see why you need an extra thread for writing to output. Just synchronize the method that writes to the file and call it at the end of each thread.

If you have many threads writing to the same file the simplest thing to do is to write to that file in the task.
    final PrintWriter out =
    ExecutorService es =
    for (int i = 0; i < tasks; i++)
        es.submit(new Runnable() {
            public void run() {
                performCalculations();
                // so only one thread can write to the file at a time.
                synchronized (out) {
                    writeResults(out);
                }
            }
        });
    es.shutdown();
    es.awaitTermination(1, TimeUnit.HOURS);
    out.close();

a "simple" thread pool in java

I'm looking for a simple object that will hold my work threads and I need it to not limit the number of threads, and not keep them alive longer than needed.
But I do need it to have a method similar to an ExecutorService.shutdown();
(Waiting for all the active threads to finish but not accepting any new ones)
so maybe a threadpool isn't what I need, so I would love a push in the right direction.
(as they are meant to keep the threads alive)
Further clarification of intent:
Each thread is an upload of a file, and I have another process that modifies files, but it waits until the file has no uploads in progress by joining each of the threads. So when the threads are kept alive, that process is blocked. (Each thread adds itself to a list for a specific file on creation, so I only join() threads that upload that specific file.)

One way to do what you want is to use a Callable with a Future that returns the File object of a completed upload. Then pass the Future into another Callable that checks Future.isDone() and spins until it returns true, and then does whatever you need to do to the file. Your use case is not unique and fits very neatly into the capabilities of the java.util.concurrent package.
One interesting class is ExecutorCompletionService, which does exactly what you want: it waits for results and then proceeds with an additional calculation.
A CompletionService that uses a supplied Executor to execute tasks. This class arranges that submitted tasks are, upon completion, placed on a queue accessible using take. The class is lightweight enough to be suitable for transient use when processing groups of tasks.
Usage Examples: Suppose you have a set of solvers for a certain problem, each returning a value of some type Result, and would like to run them concurrently, processing the results of each of them that return a non-null value, in some method use(Result r). You could write this as:
    void solve(Executor e, Collection<Callable<Result>> solvers)
            throws InterruptedException, ExecutionException
    {
        CompletionService<Result> ecs = new ExecutorCompletionService<Result>(e);
        for (Callable<Result> s : solvers) { ecs.submit(s); }
        int n = solvers.size();
        for (int i = 0; i < n; ++i)
        {
            Result r = ecs.take().get();
            if (r != null) { use(r); }
        }
    }
You don't want an unbounded ExecutorService
You almost never want to allow unbounded thread pools, as they actually can limit the performance of your application if the number of threads gets out of hand.
Your domain is limited by disk or network I/O or both, so a small thread pool would be sufficient. You are not going to want to try to read from hundreds or thousands of incoming connections with a thread per connection.
Part of your solution, if you are receiving more than a handful of concurrent uploads, is to investigate the java.nio package and read about non-blocking I/O as well.

Is there a reason that you don't want to reuse threads? Seems to me that the simplest thing would be to use ExecutorService anyway and let it reuse threads.

Concurrent and Blocking Queue in Java

I have the classic problem of a thread pushing events to the incoming queue of a second thread. Only this time, I am very interested in performance. What I want to achieve is:
I want concurrent access to the queue, with the producer pushing and the receiver popping.
When the queue is empty, I want the consumer to block on the queue, waiting for the producer.
My first idea was to use a LinkedBlockingQueue, but I soon realized that it is not concurrent and the performance suffered. On the other hand, I now use a ConcurrentLinkedQueue, but I am still paying the cost of wait() / notify() on each publication. Since the consumer does not block upon finding an empty queue, I have to synchronize and wait() on a lock. For its part, the producer has to get that lock and notify() upon every single publication. The overall result is that I am paying the cost of
synchronized (lock) {lock.notify()} in every single publication, even when not needed.
What I guess is needed here, is a queue that is both blocking and concurrent. I imagine a push() operation to work as in ConcurrentLinkedQueue, with an extra notify() to the object when the pushed element is the first in the list. Such a check I consider to already exist in the ConcurrentLinkedQueue, as pushing requires connecting with the next element. Thus, this would be much faster than synchronizing every time on the external lock.
Is something like this available/reasonable?

I think you can stick to java.util.concurrent.LinkedBlockingQueue regardless of your doubts. It is concurrent, though I have no idea about its performance. Another implementation of BlockingQueue may suit you better. There aren't too many of them, so run performance tests and measure.

Similar to this answer https://stackoverflow.com/a/1212515/1102730 but a bit different.. I ended up using an ExecutorService. You can instantiate one by using Executors.newSingleThreadExecutor(). I needed a concurrent queue for reading/writing BufferedImages to files, as well as atomicity with reads and writes. I only need a single thread because the file IO is orders of magnitude faster than the source, net IO. Also, I was more concerned about atomicity of actions and correctness than performance, but this approach can also be done with multiple threads in the pool to speed things up.
To get an image (Try-Catch-Finally omitted):
    Future<BufferedImage> futureImage = executorService.submit(new Callable<BufferedImage>() {
        @Override
        public BufferedImage call() throws Exception {
            ImageInputStream is = new FileImageInputStream(file);
            return ImageIO.read(is);
        }
    });
    image = futureImage.get();
To save an image (Try-Catch-Finally omitted):
    Future<Boolean> futureWrite = executorService.submit(new Callable<Boolean>() {
        @Override
        public Boolean call() throws Exception {
            FileOutputStream os = new FileOutputStream(file);
            return ImageIO.write(image, getFileFormat(), os);
        }
    });
    boolean wasWritten = futureWrite.get();
It's important to note that you should flush and close your streams in a finally block. I don't know about how it performs compared to other solutions, but it is pretty versatile.

I would suggest you look at Executors.newSingleThreadExecutor(). It will keep your tasks ordered for you, and if you submit Callables to the executor, you can get the blocking behavior you are looking for by calling get() on the returned Futures.

You can try LinkedTransferQueue from jsr166: http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166y/
It fulfills your requirements and has less overhead for offer/poll operations.
As far as I can see from the code, when the queue is not empty, it uses atomic operations for polling elements. When the queue is empty, it spins for some time and parks the thread if unsuccessful.
I think it can help in your case.

I use the ArrayBlockingQueue whenever I need to pass data from one thread to another, using the put and take methods (which block if the queue is full or empty, respectively).
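A minimal sketch of that hand-off, assuming the events are positive integers. The class HandoffSketch, the capacity of 16, and the use of -1 as an end marker are assumptions for illustration. No explicit wait()/notify() or external lock is needed; put and take do the blocking internally.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class HandoffSketch {

    // A producer thread pushes 1..count through a bounded queue; the calling
    // thread consumes the values and returns their sum.
    static int sumThroughQueue(int count) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    queue.put(i); // blocks while the queue is full
                }
                queue.put(-1); // end marker; assumes real values are positive
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int sum = 0;
        try {
            while (true) {
                int value = queue.take(); // blocks while the queue is empty
                if (value < 0) {
                    break;
                }
                sum += value;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

Whether this is fast enough for a given workload is something only a measurement can tell, as other answers here point out.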

Here is a list of classes implementing BlockingQueue.
I would recommend checking out SynchronousQueue.
Like @Rorick mentioned in his comment, I believe all of those implementations are concurrent. I think your concerns about LinkedBlockingQueue may be misplaced.
