I'm new to Java, and I need some help working on this program. This is a small part of a large class project, and I must use multithreading.
Here's what I want to do algorithmically:
while (there is still input left, store chunk of input in <chunk>)
{
if there is not a free thread in my array then
wait until a thread finishes
else there is a free thread then
apply the free thread to <chunk> (which will do something to chunk and output it).
Note: The ordering of the chunks being output must be the same as input
}
So, the main things I don't know how to do:
How can I check whether or not there's a free thread in the array? I know that there is a function ThreadAlive, but it seems super inefficient to poll every single thread every time in my loop.
If there is no free thread, how can I wait until one has finished?
The ordering is important. How can I preserve the ordering in which the threads output? As in, the order of the output needs to match the order of the input. How can I guarantee this synchronization?
How do I even pass the chunk to my thread? Can I just use the Runnable interface to do this?
Any help with these four bullets is greatly appreciated. Since I'm a super noob, code samples would help significantly.
(side-note: Making an array of threads was just an idea of mine to handle the user defined number of threads. If you have a better way to handle this you're welcome to suggest it!)
Sounds like you basically have a producer/consumer model and can be solved with an ExecutorService and BlockingQueue. Here is a similar question with a similar answer:
producer/consumer work queues
As #altaiojok mentioned, you want to use an ExecutorService and BlockingQueue. The basic algorithm works like this:
ExecutorService executor = Executors.newFixedThreadPool(...); //or newCachedThreadPool, etc...
BlockingQueue<Future<?>> outputQueue = new LinkedBlockingQueue<Future<?>>();
//To be run by an input processing thread
void submitTasks() {
BufferedReader input = ... //going to assume you have a file that you want to read, this could be any method of input (file, keyboard, network, etc...)
String line = input.readLine();
while(line != null) {
outputQueue.add(executor.submit(new YourCallableImplementation(line)));
line = input.readLine();
}
}
//To be run by a different output processing thread
void processTaskOutput() {
try {
while(true) {
Future<?> resultFuture = outputQueue.take();
? result = resultFuture.get();
//process the output (write to file, send to network, print to screen, etc...
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
I'll leave it to you to figure out how to implement Runnable to make the input and output thread as well as how to implement Callable for the tasks you need to process.
I would suggest using commons-pool which offers pooling of threads so you can easily limit the number of used threads and it also offers some other helper methods.
Concerning the ordering: have a look at the synchronize keyword.
And I would suggest to have a look at the java tutorial (the part about concurrency): http://download.oracle.com/javase/tutorial/essential/concurrency/index.html
Streams might come handy:
List<Chunk> chunks = new ArrayList<>();
//....
Function<Chunk, String> toWeightInfo = (chunk) -> "weight = "+(chunk.size()*chunk.prio());
List<String> results = chunks.parallelStream()
.map(toWeightInfo)
.collect(Collectors.toList());
System.out.println(results);
The parallel stream uses the System's default "fork/join" thread pool, which should be the size of available logical CPUs and processes your stuff in parallel. It also guarantees the same order of results.
The parallel streams API hides all the complexity of assigning free threads to jobs and optimizations like work-stealing away from you. Just give it something to chew on and it will work its magic.
If you need to use a thread pool of a custom size, please refer to the
Custom thread pool in Java 8 parallel stream question.
You might also have a look at this good Java 8 Stream Tutorial.
If your case is rather complex and you're streaming chunks into your program, and you've got multiple stages of work, where some must be serial and some can be parallel and some depend on each other, you might have a look at the Disruptor framework from LMAX.
Kind regards
Use ExecutorCompletionService and Future<T>. Together they provide a threadpool based task framework that takes care of all your concerns.
How can I check whether or not there's a free thread in the array? I know that there is a function ThreadAlive, but it seems super inefficient to poll every single thread every time in my loop.
You dont have to. The executor will do this for you in an (super)efficient manner.You just have to submit tasks to it and sit back.
If there is no free thread, how can I wait until one has finished?
Again , you really dont have to. This is taken care of by executor.
The ordering is important. How can I preserve the ordering in which the threads output? As in, the order of the output needs to match the order of the input. How can I guarantee this synchronization?
This is a concern. If you want the processed output ( of chunks, in your words ) to arrive in the same order as these chunks are present in the initial array, you have to address a few points :
Is it just the order of arrival of the results that matter , or is it that the tasks processing themselves have dependencies on the order ? If it is the former , it is much easily done, but if its the later , then you have problems. ( which I think are very hard things to start with considering your admission of being new to Java, so I would just recommend more learning on your part before attempting this. )
Assuming it is the former case , what you can do is this : Submit the chunks to the executor in some order , and each submission will give you a handle ( called a Future<Result> ) to the task processed output. Store these handles in a ordered queue, and when you want the results , call the get() on these Future(s). Note that if some task in the middle of the order takes long time to complete , then the results of the following tasks will also be delayed.
How do I even pass the chunk to my thread? Can I just use the Runnable interface to do this?
Create a Callable instance wrapping one chunk each into the instance. This represents your task that you will submit() to the ExecutorService.
Related
I need help with my multithreading code.
I have a callable class which returns a value. I have a cachedThreadPool to submit ~60,000 tasks. I collect all the Futures in a List. After the ExecutiveService has shutdown, I loop through the list of Futures, and write the returned values using a bufferedWriter. Is this correct way of implementation?
ExecutorService execService = Executors.newCachedThreadPool();
List<Future<ValidationDataObject<String, Boolean>>> futureList = new ArrayList<>();
for (int i = 0; i < emailArrayList.size(); i++) {
String emailAddress = emailArrayList.get(i);
ValidateEmail validateEmail = new ValidateEmail(emailAddress);
Future<ValidationDataObject<String, Boolean>> future =
execService.submit(validateEmail);
futureList.add(future);
}
execService.shutdown();
for (Future<ValidationDataObject<String, Boolean>> future: futureList) {
ValidationDataObject<String, Boolean> validationObject = future.get();
bufferedWriter.write(validationObject.getEmailAddress() + "|"
+ validationObject.getIsValid());
bufferedWriter.newLine();
bufferedWriter.flush();
}
if (execService.isTerminated()) bufferedWriter.close();
Should I using synchronized block for the bufferedWriter? I am thinking, It doesn't need to be synchronized because, I am using the bufferedWriter from the main Thread, right?
I have a cachedThreadPool to submit ~60,000 tasks.
Off the bat, a cached thread-pool and 60k tasks is a red flag. That is going to start 60k threads which I doubt you really want. You should use a fixed thread pool and vary the number of threads until you achieve a good balance of throughput versus overwhelming your server. Maybe start with 2x the number of CPUs and then vary it depending on the server load.
You might also might consider using a fixed size queue which will limit the number of tasks outstanding although 60k tasks is fine unless those objects are heavy.
I collect all the Futures in a List. After the ExecutiveService has shutdown, I loop through the list of Futures, and write the returned values using a bufferedWriter. Is this correct way of implementation?
Yes, that's a good pattern. You don't show the writer being created but it is certainly fine for the main thread to own that.
Should I using synchronized block for the bufferedWriter? I am thinking, It doesn't need to be synchronized because, I am using the bufferedWriter from the main Thread, right?
Right. No other threads are using it so that's fine. It is a very typical pattern to have a writer thread managing the output of a multi-thread application.
One final comment is that you might want to look at the ExecutionCompletionService which allows you to process the tasks as they finish instead of having to wait for them in order. You might require the output to be in order in which case this isn't helpful but it's good technology to know about anyway.
Apart from the fact, that executor.shutdown() will most likely not do, what you believe it to do (it simply stops the Executor from accepting new Tasks, it will not wait for all tasks to terminate), your code looks fine.
You are right, there is no need for synchronization with respect to the writer, as you access it only single threaded.
There are things, that can be improved, though. Firstly, you are not doing a lot of Exception handling. Future.get() will throw an ExecutionException, if the Callable hits an Exception.
I'm not certain, how large the deviations in execution-time of your Callables are. Assume, there are notable deviations look at the following case: Say we submit Callables A, B and C, you receive FutA, FutB and FutC. Calling the get methods will block until the calculation behind the Future is finished. In your setting, you might be waiting for FutA to complete, while FutB/FutC might already be finished and ready for writing. Worst case here is, that processing of A will delay writing for all 60000 tasks.
I think, I would go for another approach, where every Callable gets the reference to the same ConcurrentLinkedQueue and instead of returning the result via Future writes the result into that queue. In this scenario, the ordering of the result is not dependent on the ordering of the Callables but on the time, the Callables finish execution. Whether or not this results in a speedup depends on your setting (especially time to write result and deviation in execution times of the Callables).
and excuse the lack of knowledge on multithreaded apps, but I am new to the field.
Is there a pattern or common used methodology for monitoring the 'job completion' or 'job status' of worker threads from a monitor (a class that acts as a monitor)?
What I have currently done is create a list of workers and create one thread for each worker. After all threads have started i am looping over the worker list and 'checking their status' by making a call to a method.
At that time I couldn't come up with a different solution, but being new to the field, I don't know if this is the way to go, or if there are other solutions or patterns that I should study.
Depending on what you want, there are many ways that you can do this.
If you just want to wait until all the threads finish (i.e. all you care about is having everything finish before moving on), you can use Thread.join():
try {
for (Thread t: threadsIWaitOn)
t.join();
} catch (InterruptedException iex) {
/* ... handle error ...
}
If you want a more fine-grained control over the thread status and want to be able, at any time, to know what threads are doing, you can use the Thread.getState() function. This returns a Thread.State object that describes whether the thread is running, blocked, new, etc., and the Javadoc specifically says that it's designed for monitoring the state of a thread rather than trying to synchronize on it. This might be want you want to do.
If you want even more information than that - say, how to get a progress indicator for each thread that counts up from 0 to 100 as the thread progresses - then another option might be to create a Map from Threads to AtomicIntegers associating each thread with a counter, then pass the AtomicInteger into the constructor of each thread. That way, each thread can continuously increment the counters, and you can have another thread that continuously polls the progress.
In short, you have a lot of options based on what it is that you're trying to accomplish. Hopefully something in here helps out!
Use a ThreadPool and Executor, then you get a Future<> and you can poll for their completion and some more nice stuff, too. I can appreciate this book for you: Java Concurrency in Practice
Try to use any kind of synchronization. For example, wait on some kind of monitor/semaphore until job is done / whatever you need.
Suppose you need to deal with 2 threads, a Reader and a Processor.
Reader will read a portion of the stream data and will pass it to the Processor, that will do something.
The idea is to not stress the Reader with too much of data.
In the set up, i
// Processor will pick up data from pipeIn and will place the output in pipeOut
Thread p = new Thread(new Processor(pipeIn, pipeOut));
p.start();
// Reader will pick a bunch of bits from the InputStream and place it to pipeIn
Thread r = new Thread(new Reader(inputStream, pipeIn));
r.start();
Needless to say, neither pipe is null, when initialized.
I am thinking ... When Processor has been started it attempts to read from the pipeIn, in the following loop:
while (readingShouldContinue) {
Thread.sleep(1); // To avoid tight loop
byte[] justRead = readFrom.getDataCurrentlyInQueue();
writeDataToPipe(processData(justRead));
}
If there is no data to write, it will write nothing, should be no problem.
The Reader comes alive and picks up some data from a stream:
while ((in.read(buffer)) != -1) {
// Writes to what processor considers a pipeIn
writeTo.addDataToQueue(buffer);
}
In Pipe itself, i synchronize access to data.
public byte[] getDataCurrentlyInQueue() {
synchronized (q) {
byte[] a = q.peek();
q.clear();
return a;
}
}
I expect the 2 threads to run semi in parallel, interchanging activities between Reader and a Processor. What happens however is that
Reader reads all blocks up front
Processor treats everything as 1 single block
What am i missing please?
What am i missing please?
(First I should point out that you've left out some critical bits of the code and other information that is needed for a specific fact-based answer.)
I can think of a number of possible explanations:
There may simply be a bug in your application. There's not a lot of point guessing what that bug might be, but if you showed us more code ...
The OS thread scheduler will tend to let an active thread keep running until it blocks. If your processor has only one core (or if the OS only allows your application to use one core), then the second thread may starve ... long enough for the first one to finish.
Even if you have multiple cores, the OS thread scheduler may be slow to assign extra cores, especially if the 2nd thread starts and then immediately blocks.
It is possible that there is some "granularity" effect in the buffering that is causing work not to appear in the queue. (You could view this as a bug ... or as a tuning issue.)
It could simply be that you are not giving the application enough load for multi-threading to kick in.
Finally, I can't figure out the Thread.sleep stuff either. A properly written multi-threaded application does not use Thread.sleep for anything but long term delays; e.g. threads that do periodic house-keeping tasks in the background. If you use sleep instead of blocking, then 1) you risk making the application non-responsive, and 2) you may encourage the OS thread scheduler to give the thread fewer time slices. It could well be that this is the source of your trouble vis-a-vis thread starvation.
You reinvented parts of the java concurrent library. it would make things a lot easier if you modeled your threads with BlockingQueue instead of synchronizind things yourself.
Basically your producer would put chunks on the BlockingQueue und your consumer would while(true) loop over the queue and call get(). That way the producer would block/wait until there is a new chunk on the queue.
The reader is reading everything before its first time-slice. This means that the reading is finishing before the processor ever gets a chance to run.
Try increasing the amount of bytes that are being read, or slow down the reader somehow; maybe with a sleep() call every once in a while.
Btw. Don't poll. It is a horrendous waste of CPU cycles, and it doesn't scale at all.
Also use a synchronized queue and forget the manual locking. http://docs.oracle.com/javase/tutorial/collections/implementations/queue.html
When using multiple threads you need to determine whether you
have work which can be performed in parallel efficiently.
are not adding more overhead than the improvement you are likely to achieve
the OS, or some library is not already optimised to do what you are trying to do.
In your case, you have a good example of when not to use multi-threads. The OS is already tuned to read ahead and buffer data before you ask for it. The work the Reader does is relatively trivial. The overhead of creating new buffers, adding them to a queue and passing the data between threads is likely to be greater than the amount of work you are performing in parallel.
When you try to use multiple threads to do a task best done by a single thread, you will get strange profiling/tuning results.
+1 For a good question.
I have a scientific application which I usually run in parallel with xargs, but this scheme incurs repeated JVM start costs and neglects cached file I/O and the JIT compiler. I've already adapted the code to use a thread pool, but I'm stuck on how to save my output.
The program (i.e. one thread of the new program) reads two files, does some processing and then prints the result to standard output. Currently, I've dealt with output by having each thread add its result string to a BlockingQueue. Another thread takes from the queue and writes to a file, as long as a Boolean flag is true. Then I awaitTermination and set the flag to false, triggering the file to close and the program to exit.
My solution seems a little kludgey; what is the simplest and best way to accomplish this?
How should I write primary result data from many threads to a single file?
The answer doesn't need to be Java-specific if it is, for example, a broadly applicable method.
Update
I'm using "STOP" as the poison pill.
while (true) {
String line = queue.take();
if (line.equals("STOP")) {
break;
} else {
output.write(line);
}
}
output.close();
I manually start the queue-consuming thread, then add the jobs to the thread pool, wait for the jobs to finish and finally poison the queue and join the consumer thread.
That's really the way you want to do it, have the threads put their output to the queue and then have the writer exhaust it.
The only thing you might want to do to make things a little cleaner is rather than checking a flag, simply put an "all done" token on to the queue that the writer can use to know that it's finished. That way there's no out of band signaling necessary.
That's trivial to do, you can use an well known string, an enum, or simply a shared object.
You could use an ExecutorService.
Submit a Callable that would perform the task and return the string after completion.
When Submitting the Callable you get hold of a Future, store these references e.g. in a List.
Then simply iterate through the Futures and get the Strings by calling Future#get.
This will block until the task is completed if it not yet is, otherwise return the value immediately.
Example:
ExecutorService exec = Executors.newFixedThreadPool(10);
List<Future<String>> tasks = new ArrayList<Future<String>>();
tasks.add(exec.submit(new Callable<String> {
public String call() {
//do stuff
return <yourString>;
}
}));
//and so on for the other tasks
for (Future<String> task : tasks) {
String result = task.get();
//write to output
}
Many threads processing, one thread writing and a message queue between them is a good strategy. The issue that just needs to be solved, is knowing when all work is finished. One way to do that is to count how many worker threads you started, and then after that count how many responses you got. Something like this pseudo code:
int workers = 0
for each work item {
workers++
start the item's worker in a separate thread
}
while workers > 0 {
take worker's response from a queue
write response to file
workers--
}
This approach also works if the workers can find more work items while they are executing. Just include any additional not-yet-processed work in the worker responses, and then increment the workers count and start workers threads as usual.
If each of the workers returns just one message, you can use Java's ExecutorService to execute Callable instances which return the result. ExecutorService's methods give access to Future instances from which you can get the result when the Callable has finished its work.
So you would first submit all the tasks to the ExecutorService and then loop over all the Futures and get their responses. That way you would write the responses in the order in which you check the futures, which can be different from the order in which they finish their work. If latency is not important, that shouldn't be a problem. Otherwise, a message queue (as mentioned above) might be more suitable.
It's not clear if your output file has some defined order or if you just dump your data there. I assume it has no order.
I don't see why you need an extra thread for writing to output. Just synchronized the method that writes to file and call it at the end of each thread.
If you have many threads writing to the same file the simplest thing to do is to write to that file in the task.
final PrintWriter out =
ExecutorService es =
for(int i=0;i<tasks;i++)
es.submit(new Runnable() {
public void run() {
performCalculations();
// so only one thread can write to the file at a time.
synchornized(out) {
writeResults(out);
}
}
});
es.shutdown();
es.awaitTermination(1, TimeUnit.HOUR);
out.close();
I have the classic problem of a thread pushing events to the incoming queue of a second thread. Only this time, I am very interested about performance. What I want to achieve is:
I want concurrent access to the queue, the producer pushing, the receiver poping.
When the queue is empty, I want the consumer to block to the queue, waiting for the producer.
My first idea was to use a LinkedBlockingQueue, but I soon realized that it is not concurrent and the performance suffered. On the other hand, I now use a ConcurrentLinkedQueue, but still I am paying the cost of wait() / notify() on each publication. Since the consumer, upon finding an empty queue, does not block, I have to synchronize and wait() on a lock. On the other part, the producer has to get that lock and notify() upon every single publication. The overall result is that I am paying the cost of
sycnhronized (lock) {lock.notify()} in every single publication, even when not needed.
What I guess is needed here, is a queue that is both blocking and concurrent. I imagine a push() operation to work as in ConcurrentLinkedQueue, with an extra notify() to the object when the pushed element is the first in the list. Such a check I consider to already exist in the ConcurrentLinkedQueue, as pushing requires connecting with the next element. Thus, this would be much faster than synchronizing every time on the external lock.
Is something like this available/reasonable?
I think you can stick to java.util.concurrent.LinkedBlockingQueue regardless of your doubts. It is concurrent. Though, I have no idea about its performance. Probably, other implementation of BlockingQueue will suit you better. There's not too many of them, so make performance tests and measure.
Similar to this answer https://stackoverflow.com/a/1212515/1102730 but a bit different.. I ended up using an ExecutorService. You can instantiate one by using Executors.newSingleThreadExecutor(). I needed a concurrent queue for reading/writing BufferedImages to files, as well as atomicity with reads and writes. I only need a single thread because the file IO is orders of magnitude faster than the source, net IO. Also, I was more concerned about atomicity of actions and correctness than performance, but this approach can also be done with multiple threads in the pool to speed things up.
To get an image (Try-Catch-Finally omitted):
Future<BufferedImage> futureImage = executorService.submit(new Callable<BufferedImage>() {
#Override
public BufferedImage call() throws Exception {
ImageInputStream is = new FileImageInputStream(file);
return ImageIO.read(is);
}
})
image = futureImage.get();
To save an image (Try-Catch-Finally omitted):
Future<Boolean> futureWrite = executorService.submit(new Callable<Boolean>() {
#Override
public Boolean call() {
FileOutputStream os = new FileOutputStream(file);
return ImageIO.write(image, getFileFormat(), os);
}
});
boolean wasWritten = futureWrite.get();
It's important to note that you should flush and close your streams in a finally block. I don't know about how it performs compared to other solutions, but it is pretty versatile.
I would suggest you look at ThreadPoolExecutor newSingleThreadExecutor. It will handle keeping your tasks ordered for you, and if you submit Callables to your executor, you will be able to get the blocking behavior you are looking for as well.
You can try LinkedTransferQueue from jsr166: http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166y/
It fulfills your requirements and have less overhead for offer/poll operations.
As I can see from the code, when the queue is not empty, it uses atomic operations for polling elements. And when the queue is empty, it spins for some time and park the thread if unsuccessful.
I think it can help in your case.
I use the ArrayBlockingQueue whenever I need to pass data from one thread to another. Using the put and take methods (which will block if full/empty).
Here is a list of classes implementing BlockingQueue.
I would recommend checking out SynchronousQueue.
Like #Rorick mentioned in his comment, I believe all of those implementations are concurrent. I think your concerns with LinkedBlockingQueue may be out of place.