In Java, how do we do this:
Many threads are created, and they all write to a single file.
Since any thread can write to that file (by acquiring the lock), the order in which the threads write is lost.
The number of threads created will be more than 40. How can I write out the threads' output so that their sequence is preserved?
Thanks in advance.
From your description, the simplest solution is to have the threads save their results and have another thread write them in the order you intended, e.g.
int threads = Runtime.getRuntime().availableProcessors();
ExecutorService es = Executors.newFixedThreadPool(threads);
List<Future<String>> results = new ArrayList<>();
for (int i = 0; i < tasks; i++)
    results.add(es.submit(new Callable<String>() {
        public String call() {
            // task which returns a String
        }
    }));
try (PrintWriter fw = new PrintWriter("output.txt")) {
    for (Future<String> f : results)
        fw.println(f.get()); // get() blocks until that task has finished
}
es.shutdown(); // let the pool threads exit once all results are written
As you can see
The tasks can execute in any order, but the results are written to the file in expected order.
The number of tasks executed at once is limited to the actual number of logical CPUs.
Well, threads don't actually have an order, so you would need to create some kind of pool or queue for them programmatically.
I don't know whether Java 1.7 added anything to the java.util.concurrent classes that may help you with what you want.
But as for Java 1.6 and below, you would need a queue to manage the threads one by one, to maintain an order that the JVM won't maintain for you.
I would like to read a file line by line, do something slow with each line that can easily be done in parallel, and write the result to a file line by line. I don't care about the order of the output. The input and output are so big they don't fit in memory. I would like to be able to set a hard limit on the number of threads running at the same time as well as the number of lines in memory.
The library I use for file IO (Apache Commons CSV) does not seem to offer synchronised file access, so I don't think I can read from the same file or write to the same file from several threads at once. If that were possible I would create a ThreadPoolExecutor and feed it a task for each line, which would simply read the line, perform the calculation and write the result.
Instead, what I think I need is a single thread that does the parsing, a bounded queue for the parsed input lines, a thread pool with jobs that do the calculations, a bounded queue for the calculated output lines, and a single thread that does the writing. A producer, a lot of consumer-producers and a consumer if that makes sense.
What I have looks like this:
BlockingQueue<CSVRecord> inputQueue = new ArrayBlockingQueue<CSVRecord>(INPUT_QUEUE_SIZE);
BlockingQueue<String[]> outputQueue = new ArrayBlockingQueue<String[]>(OUTPUT_QUEUE_SIZE);
Thread parserThread = new Thread(() -> {
while (inputFileIterator.hasNext()) {
CSVRecord record = inputFileIterator.next();
inputQueue.put(record); // blocks if queue is full
}
});
// the job queue of the thread pool has to be bounded too, otherwise all
// the objects in the input queue will be given to jobs immediately and
// I'll run out of heap space
// source: https://stackoverflow.com/questions/2001086/how-to-make-threadpoolexecutors-submit-method-block-if-it-is-saturated
BlockingQueue<Runnable> jobQueue = new ArrayBlockingQueue<Runnable>(JOB_QUEUE_SIZE);
RejectedExecutionHandler rejectedExecutionHandler
= new ThreadPoolExecutor.CallerRunsPolicy();
ExecutorService executorService
= new ThreadPoolExecutor(
NUMBER_OF_THREADS,
NUMBER_OF_THREADS,
0L,
TimeUnit.MILLISECONDS,
jobQueue,
rejectedExecutionHandler
);
Thread consumerBossThread = new Thread(() -> {
while (!inputQueue.isEmpty() || parserThread.isAlive()) {
CSVRecord record = inputQueue.take(); // blocks if queue is empty
executorService.execute(() -> {
String[] array = this.doStuff(record);
outputQueue.put(array); // blocks if queue is full
});
}
// getting here that means that all CSV rows have been read and
// added to the processing queue
executorService.shutdown(); // do not accept any new tasks
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
// wait for existing tasks to finish
});
Thread writerThread = new Thread(() -> {
while (!outputQueue.isEmpty() || consumerBossThread.isAlive()) {
String[] outputRow = outputQueue.take(); // blocks if queue is empty
outputFileWriter.printRecord((Object[]) outputRow);
}
});
parserThread.start();
consumerBossThread.start();
writerThread.start();
// wait until writer thread has finished
writerThread.join();
I've left out the logging and exception handling so this looks a lot shorter than it is.
This solution works but I'm not happy with it. It seems hacky to have to create my own threads, check their isAlive(), create a Runnable within a Runnable, be forced to specify a timeout when I really just want to wait until all the workers have finished, etc. All in all it's a 100+ line method, or even several hundred lines of code if I make the Runnables their own classes, for what seems like a very basic pattern.
Is there a better solution? I'd like to make use of Java's libraries as much as possible, to help keep my code maintainable and in line with best practices. I would still like to know what it's doing under the hood, but I doubt that implementing all this myself is the best way to do it.
Update:
Better solution, after suggestions from the answers:
BlockingQueue<Runnable> jobQueue = new ArrayBlockingQueue<Runnable>(JOB_QUEUE_SIZE);
RejectedExecutionHandler rejectedExecutionHandler
= new ThreadPoolExecutor.CallerRunsPolicy();
ExecutorService executorService
= new ThreadPoolExecutor(
NUMBER_OF_THREADS,
NUMBER_OF_THREADS,
0L,
TimeUnit.MILLISECONDS,
jobQueue,
rejectedExecutionHandler
);
while (it.hasNext()) {
CSVRecord record = it.next();
executorService.execute(() -> {
String[] array = this.doStuff(record);
synchronized (writer) {
writer.printRecord((Object[]) array);
}
});
}
I would like to point out something first. I can think of three possible scenarios:
1.- For every line of the file, processing the line with the doStuff method takes longer than reading that line from disk and parsing it.
2.- For every line of the file, processing the line with the doStuff method takes no longer than reading that line and parsing it.
3.- A mix of the first and second scenarios within the same file.
Your solution should be good for the first scenario, but not for the second or third. Also, you're not modifying the queues in a synchronized way. Moreover, in scenarios like number 2 you waste CPU cycles whenever there is no data to send to the output, or no lines to put on the queue for doStuff, by spinning at:
while (!outputQueue.isEmpty() || consumerBossThread.isAlive()) {
Finally, regardless of which scenario you're in, I would suggest using monitor objects, which let you put specific threads to wait until another thread notifies them that a certain condition holds and they can resume. With monitor objects you won't waste CPU cycles.
For more information, see:
https://docs.oracle.com/javase/7/docs/api/javax/management/monitor/Monitor.html
EDIT: I've deleted the suggestion to use synchronized methods since, as you've pointed out, BlockingQueue's methods are (almost all) thread-safe and prevent race conditions.
Use ThreadPoolExecutor tied to a fixed size blocking queue and all of your complexity vanishes in a puff of JavaDoc.
Just have a single thread read the file and gorge the blocking queue, all the processing is done by the Executor.
Addenda:
You can either synchronize on your writer, or simply use yet another queue that the processors fill and your single writer thread consumes.
Synchronizing on the writer would most likely be the simplest way.
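For comparison, here is a rough sketch of the other option, the extra queue with a single writer thread. The class name QueueWriter, the POISON end marker, the queue size of 16 and the StringWriter standing in for the real output file are all assumptions made for illustration, not part of the original code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class QueueWriter {
    // Hypothetical end-of-input marker; compared by identity, never written.
    private static final String POISON = new String("<done>");

    static String process(int lineCount) throws InterruptedException {
        BlockingQueue<String> output = new ArrayBlockingQueue<>(16);
        StringWriter sink = new StringWriter(); // stands in for the output file

        // The only thread that ever touches the writer.
        Thread writer = new Thread(() -> {
            try (PrintWriter out = new PrintWriter(sink)) {
                String line;
                while ((line = output.take()) != POISON) { // take() blocks, no spinning
                    out.println(line);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < lineCount; i++) {
            final int n = i;
            pool.execute(() -> {
                try {
                    output.put("line " + n); // blocks while the queue is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        output.put(POISON); // all workers are done, tell the writer to stop
        writer.join();
        return sink.toString();
    }
}
```

The end marker removes the need to poll isAlive() on another thread: the writer simply blocks in take() until either a line or the marker arrives.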
Our application requires all worker threads to synchronize at a defined point. For this we use a CyclicBarrier, but it does not seem to scale well. With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
EDIT: Synchronization happens very frequently, in the order of 100k to 1M times.
If synchronization of many threads is "hard", would it help to build a synchronization tree? Thread 1 waits for 2 and 3, which in turn wait for 4+5 and 6+7, respectively, etc.; after finishing, threads 2 and 3 wait for thread 1, threads 4 and 5 wait for thread 2, etc.
1
| \
2   3
|\  |\
4 5 6 7
Would such a setup reduce synchronization overhead? I'd appreciate any advice.
See also this featured question: What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?
"With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)"
Honestly, there's your problem right there. Figure out a performance benchmark and prove that this is the problem, or risk spending hours / days solving the entirely wrong problem.
You are thinking about the problem in a subtly wrong way that tends to lead to very bad coding. You don't want to wait for threads, you want to wait for work to be completed.
Probably the most efficient way is a shared, waitable counter. When you make new work, increment the counter and signal the counter. When you complete work, decrement the counter. If there is no work to do, wait on the counter. If you drop the counter to zero, check if you can make new work.
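A minimal sketch of such a waitable counter, built on plain wait/notifyAll. The class and method names (WorkCounter, workAdded, workDone, awaitIdle) are made up for this example, and it only covers the simpler half of the idea: a coordinator waiting until all outstanding work is finished, not workers waiting for new work to appear.

```java
import java.util.concurrent.atomic.AtomicInteger;

class WorkCounter {
    private int pending; // guarded by this object's monitor

    // Call before handing a unit of work to a worker.
    synchronized void workAdded() {
        pending++;
    }

    // Call when a unit of work is finished.
    synchronized void workDone() {
        if (--pending == 0) {
            notifyAll(); // wake anyone waiting for the counter to reach zero
        }
    }

    // Blocks until all outstanding work has been completed.
    synchronized void awaitIdle() throws InterruptedException {
        while (pending > 0) {
            wait();
        }
    }

    // Small demonstration: starts `jobs` threads and waits for all of them.
    static int demo(int jobs) throws InterruptedException {
        WorkCounter counter = new WorkCounter();
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < jobs; i++) {
            counter.workAdded(); // count the job before its thread starts
            new Thread(() -> {
                completed.incrementAndGet(); // placeholder for real work
                counter.workDone();
            }).start();
        }
        counter.awaitIdle();
        return completed.get();
    }
}
```

Note that the waiter cares only about the amount of unfinished work, not about which thread did it.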
If I understand correctly, you're trying to break your solution up into parts and solve them separately, but concurrently, right? Then have your current thread wait for those tasks? You want to use something like a fork/join pattern.
List<CustomThread> threads = new ArrayList<CustomThread>();
for (Something something : somethings) {
threads.add(new CustomThread(something));
}
for (CustomThread thread : threads) {
thread.start();
}
for (CustomThread thread : threads) {
thread.join(); // Blocks until thread is complete
}
List<Result> results = new ArrayList<Result>();
for (CustomThread thread : threads) {
results.add(thread.getResult());
}
// do something with results.
In Java 7, there's even further support via a fork/join pool. See ForkJoinPool and its trail, and use Google to find one of many other tutorials.
You can recurse on this concept to get the tree you want, just have the threads you create generate more threads in the exact same way.
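To give an idea of what the recursive version looks like with ForkJoinPool, here is a small sketch that sums an array by splitting it in halves. SumTask, its THRESHOLD of 1,000 elements and the summing itself are stand-ins chosen for illustration; your real sub-jobs would go where the loop is.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cutoff for splitting
    private final long[] data;
    private final int from;
    private final int to; // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work directly.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                      // run the left half asynchronously
        long rightSum = right.compute();  // do the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task waits only for its own children, which gives the tree-shaped synchronization without managing any threads by hand.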
Edit: I was under the impression that you wouldn't be creating that many threads, so this is better for your scenario. The example won't be horribly short, but it goes along the same vein as the discussion you're having in the other answer, that you can wait on jobs, not threads.
First, you need a Callable for your sub-jobs that takes an Input and returns a Result:
public class SubJob implements Callable<Result> {
    private final Input input;
    public SubJob(Input input) {
        this.input = input;
    }
    public Result call() {
        // Actually process input here and return a result
        return JobWorker.processInput(input);
    }
}
Then to use it, create an ExecutorService with a fixed-size thread pool. This will limit the number of jobs you're running concurrently so you don't accidentally thread-bomb your system. Here's your main job:
public class MainJob extends Thread {
    // Adjust the pool to the appropriate number of concurrent
    // threads you want running at the same time
    private static final ExecutorService pool = Executors.newFixedThreadPool(30);
    private final List<Input> inputs;
    public MainJob(List<Input> inputs) {
        super("MainJob");
        this.inputs = new ArrayList<Input>(inputs);
    }
    public void run() {
        CompletionService<Result> compService = new ExecutorCompletionService<Result>(pool);
        List<Result> results = new ArrayList<Result>();
        int submittedJobs = inputs.size();
        for (Input input : inputs) {
            // Starts the job when a thread is available
            compService.submit(new SubJob(input));
        }
        try {
            for (int i = 0; i < submittedJobs; i++) {
                // take() blocks until a job is completed; get() unwraps its result
                results.add(compService.take().get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
            return;
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause()); // a sub-job failed
        }
        // Do something with results
    }
}
This will allow you to reuse threads instead of generating a bunch of new ones every time you want to run a job. The completion service will do the blocking while it waits for jobs to complete. Also note that the results list will be in order of completion.
You can also use Executors.newCachedThreadPool, which creates a pool with no upper limit (like using Integer.MAX_VALUE). It will reuse a thread if one is available and create a new one if all the threads in the pool are running a job. This may be desirable later if you start encountering deadlocks (because there are so many jobs waiting in the fixed thread pool that sub-jobs can't run and complete). This will at least limit the number of threads you're creating/destroying.
Lastly, you'll need to shut down the ExecutorService manually, perhaps via a shutdown hook, or the threads that it contains will not allow the JVM to terminate.
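Something along these lines is one way to do the shutdown. It is only a sketch: the helper name shutdownAndWait and the timeout handling are my own choices, while shutdown(), awaitTermination() and shutdownNow() are the standard ExecutorService methods.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class PoolShutdown {
    // Stops the pool: first politely, then forcefully if tasks do not
    // finish within the timeout. Returns true if the pool terminated.
    static boolean shutdownAndWait(ExecutorService pool, long timeoutSeconds) {
        pool.shutdown(); // reject new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt tasks that are still running
                return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

You could call this at the end of your main method, or from a hook registered with Runtime.getRuntime().addShutdownHook(...) if the application is stopped from outside.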
Hope that helps/makes sense.
If you have a generation task (like the example of processing columns of a matrix) then you may be stuck with a CyclicBarrier. That is to say, if every single piece of work for generation 1 must be done in order to process any work for generation 2, then the best you can do is to wait for that condition to be met.
If there are thousands of tasks in each generation, then it may be better to submit all of those tasks to an ExecutorService (ExecutorService.invokeAll) and simply wait for the results to return before proceeding to the next step. The advantage of doing this is eliminating context switching and wasted time/memory from allocating hundreds of threads when the physical CPU is bounded.
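As a rough illustration of the invokeAll approach, the sketch below runs a number of generations over an int array, with invokeAll acting as the barrier between generations. The Generations class and the per-cell rule (each cell becomes itself plus its left neighbour, wrapping around) are invented purely for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class Generations {
    static int[] run(int[] start, int generations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            int[] current = start.clone();
            for (int g = 0; g < generations; g++) {
                final int[] prev = current; // read-only during this generation
                List<Callable<Integer>> tasks = new ArrayList<>();
                for (int i = 0; i < prev.length; i++) {
                    final int idx = i;
                    // Placeholder rule: cell + left neighbour (wrapping around).
                    tasks.add(() -> prev[idx] + prev[(idx + prev.length - 1) % prev.length]);
                }
                // invokeAll returns only when every task of this generation is
                // done, and the futures are in the same order as the task list.
                List<Future<Integer>> futures = pool.invokeAll(tasks);
                int[] next = new int[prev.length];
                for (int i = 0; i < next.length; i++) {
                    next[i] = futures.get(i).get();
                }
                current = next;
            }
            return current;
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool threads are reused across generations, so the number of threads stays fixed no matter how many tasks a generation contains.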
If your tasks are not generational but instead more of a tree-like structure in which only a subset need to be complete before the next step can occur on that subset, then you might want to consider a ForkJoinPool and you don't need Java 7 to do that. You can get a reference implementation for Java 6. This would be found under whatever JSR introduced the ForkJoinPool library code.
I also have another answer which provides a rough implementation in Java 6:
public class Fib implements Callable<Integer> {
int n;
Executor exec;
Fib(final int n, final Executor exec) {
this.n = n;
this.exec = exec;
}
/**
* {@inheritDoc}
*/
@Override
public Integer call() throws Exception {
if (n == 0 || n == 1) {
return n;
}
//Divide the problem
final Fib n1 = new Fib(n - 1, exec);
final Fib n2 = new Fib(n - 2, exec);
//FutureTask only allows run to complete once
final FutureTask<Integer> n2Task = new FutureTask<Integer>(n2);
//Ask the Executor for help
exec.execute(n2Task);
//Do half the work ourselves
final int partialResult = n1.call();
//Do the other half of the work if the Executor hasn't
n2Task.run();
//Return the combined result
return partialResult + n2Task.get();
}
}
Keep in mind that if you have divided the tasks up too much and the unit of work being done by each thread is too small, there will be negative performance impacts. For example, the above code is a terribly slow way to solve Fibonacci.
I'm writing a download application for Android. The downloading part is now successfully implemented and it's working.
What I need is to download the file in parallel, in segments. To be more clear, if the user specifies 8 segments, I want to create 8 threads and do the downloading.
So how can I create 8 threads dynamically? Also, as I'm doing this for a phone, how can I keep memory consumption to a minimum?
I have not worked with threads before, so I hope you can help me with this. Thank you for your time! :)
The most efficient way to create a fixed number of threads is to use the ExecutorService:
ExecutorService exec = Executors.newFixedThreadPool(8);
It's basically a fixed-size thread pool that takes a lot of the management burden from the developer.
Edit: So your flow should be something like this:
First, define your thread task class (each thread will execute the call method of its own task):
class ThreadTask implements Callable<Object> {
public Object call() {
// execute download
...
return result;
}
}
If you want to pass any parameters to the tasks, put some private fields in the class above and pass them through a constructor. Also, you can return any type from call, just change the type in the implements Callable<...> part.
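For example, a task that receives its byte range through the constructor and returns a Long could look roughly like this. SegmentTask, its fields and the split helper are hypothetical names for this sketch, and the actual download is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical task describing one segment of a download.
class SegmentTask implements Callable<Long> {
    private final long startByte;
    private final long endByte; // inclusive

    SegmentTask(long startByte, long endByte) {
        this.startByte = startByte;
        this.endByte = endByte;
    }

    @Override
    public Long call() {
        // A real task would fetch bytes startByte..endByte here; this
        // sketch only reports how many bytes the segment covers.
        return endByte - startByte + 1;
    }

    // Splits totalBytes into `segments` contiguous ranges; the last
    // range also takes whatever remainder is left over.
    static List<SegmentTask> split(long totalBytes, int segments) {
        List<SegmentTask> tasks = new ArrayList<>();
        long chunk = totalBytes / segments;
        for (int i = 0; i < segments; i++) {
            long start = i * chunk;
            long end = (i == segments - 1) ? totalBytes - 1 : start + chunk - 1;
            tasks.add(new SegmentTask(start, end));
        }
        return tasks;
    }
}
```

Each task from split(...) can then be handed to exec.submit(...) exactly like the ThreadTask instances.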
When you want to fire off the threads, create the pool and submit the tasks:
ExecutorService exec = Executors.newFixedThreadPool(8);
List<Future<Object>> results = new ArrayList<Future<Object>>();
// submit tasks
for(int i = 0; i < 8; i++) {
results.add(exec.submit(new ThreadTask()));
}
...
// stop the pool from accepting new tasks
exec.shutdown();
// wait for results
for(Future<Object> result: results) {
Object obj = result.get();
}
Take a look at ExecutorService, in particular Executors.newFixedThreadPool(int i); this is an excellent way to handle threads in a system-friendly manner.
Please look at my following code....
private static final int NTHREDS = 10;
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
while(rs.next()){
webLink=rs.getString(1);
FirstName=rs.getString(2);
MiddleName=rs.getString(3);
Runnable worker = new MyRunnable(webLink,FirstName,MiddleName);// this interface has run method....
executor.execute(worker);
}
//added
public class MyRunnable implements Runnable {
MyRunnable(String webLink,String FirstName,String MiddleName){
** Assigning Values...***
}
@Override
public void run() {
long sum = 0;
**Calling method to crawl by passing those Values**
try {
Thread.sleep(200);
}
catch (InterruptedException e)
{
e.printStackTrace();
}
}
}
In this part, if the result set (rs) has 100 records, the executor creates 100 threads. I need to run this process within 10 threads. I need your help to know how to get control of the threads: if any thread has completed its task, it should process the next available task from the result set. Is it possible to achieve this using the executor framework?
Thanks...
vijay365
The code you've already posted does this. Your code will not immediately spawn 100 threads. It will spawn 10 threads that consume tasks from a queue containing your Runnables.
From the Executors.newFixedThreadPool Javadocs:
Creates a thread pool that reuses a
fixed set of threads operating off a
shared unbounded queue.
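If you want to convince yourself, a quick experiment like the following records which threads actually run the tasks. FixedPoolDemo and distinctThreads are names made up for this sketch; the tasks do nothing except note the name of the thread they run on.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class FixedPoolDemo {
    // Runs taskCount tasks on a pool of poolSize threads and returns the
    // number of distinct worker threads that were actually used.
    static int distinctThreads(int poolSize, int taskCount) throws InterruptedException {
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            // Extra tasks wait in the pool's queue until a thread is free.
            executor.execute(() -> names.add(Thread.currentThread().getName()));
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return names.size();
    }
}
```

Calling distinctThreads(10, 100) should never report more than 10 threads, however many tasks are submitted.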
Instead of using a static number of threads (10 in this case) you should determine the number dynamically:
final int NTHREADS = Runtime.getRuntime().availableProcessors();
Also, I don't get why you are calling Thread.sleep?
ResultSet is probably a JDBC query result.
This design is almost certain to be doomed to failure.
The JDBC interface implementations are not thread-safe.
ResultSets are scarce resources that should be closed in the same scope in which they were created. If you pass them around, you're asking for trouble.
Multi-threaded code is hard to write well and even harder to debug if incorrect.
You are almost certainly headed in the wrong direction with this design. I'd bet a large sum of money that you're guilty of premature optimization. You are hoping that multiple threads will make your code faster, but what will happen is ten threads time slicing on one CPU and taking the same time or longer. (Context switching takes time, too.)
A slightly better idea would be to load the ResultSet into an object or collection, close the ResultSet, and then do some multi-threaded processing on that returned object.
Try executor.submit(worker);
We have 1000 threads that hit a web service and time how long the call takes. We wish for each thread to return their own timing result to the main application, so that various statistics can be recorded.
Please note that various tools were considered for this, but for various reasons we need to write our own.
What would be the best way for each thread to return the timing - we have considered two options so far :-
1. Once a thread has its timing result, it calls a singleton that provides a synchronised method to write to the file. This ensures that each thread writes to the file in turn (although in an undetermined order, which is fine), and since the call is made after the thread has taken its timing results, being blocked waiting to write is not really an issue. When all threads have completed, the main application can then read the file to generate the statistics.
2. Using the Executor, Callable and Future interfaces
Which would be the best way, or are there any other better ways ?
Thanks very much in advance
Paul
Use the latter method.
Your workers implement Callable. You then submit them to a threadpool, and get a Future instance for each.
Then just call get() on the Futures to get the results of the calculations.
import java.util.*;
import java.util.concurrent.*;
public class WebServiceTester {
    public static class Tester implements Callable<Long> {
        public Long call() {
            long start = System.currentTimeMillis();
            //Do your test here
            long end = System.currentTimeMillis();
            return end - start; // elapsed time in milliseconds
        }
    }
    public static void main(String args[]) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1000);
        List<Future<Long>> futures = new ArrayList<Future<Long>>();
        for (int i = 0; i < 1000; i++) {
            futures.add(pool.submit(new Tester()));
        }
        // A List keeps every timing; a Set would drop duplicate values
        List<Long> results = new ArrayList<Long>();
        for (Future<Long> future : futures) {
            results.add(future.get());
        }
        pool.shutdown();
        //Manipulate results however you wish....
    }
}
Another possible solution I can think of would be to use a CountDownLatch (from the java concurrency packages), each thread decrementing it (flagging they are finished), then once all complete (and the CountDownLatch reaches 0) your main thread can happily go through them all, asking them what their time was.
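A rough sketch of the CountDownLatch variant is below. LatchTimings and measure are invented names, and Thread.sleep(5) is just a placeholder for the web service call being timed; each worker stores its own timing in its own array slot before counting down.

```java
import java.util.concurrent.CountDownLatch;

class LatchTimings {
    // Starts `workers` threads, waits for all of them, and returns the
    // elapsed time of each one in nanoseconds.
    static long[] measure(int workers) throws InterruptedException {
        long[] elapsedNanos = new long[workers];
        CountDownLatch done = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            final int slot = i;
            new Thread(() -> {
                long start = System.nanoTime();
                try {
                    Thread.sleep(5); // placeholder for the call being timed
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    elapsedNanos[slot] = System.nanoTime() - start;
                    done.countDown(); // flag this worker as finished
                }
            }).start();
        }
        done.await(); // returns once every worker has counted down
        return elapsedNanos;
    }
}
```

Because each write happens before the matching countDown(), the main thread can safely read all the timings once await() returns.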
The executor framework can be implemented here. The time processing can be done by the Callable object. The Future can help you identify if the thread has completed processing.
You could pass an ArrayBlockingQueue to the threads to report their results to. You could then have a file writing thread that takes from the queue to write to the file.