I wanted to try out ForkJoinPool in Java 8, so I wrote a small program that searches a given directory for all files whose names contain a specific keyword.
Program:
public class DirectoryService {
public static void main(String[] args) {
FileSearchRecursiveTask task = new FileSearchRecursiveTask("./DIR");
ForkJoinPool pool = (ForkJoinPool) Executors.newWorkStealingPool();
List<String> files = pool.invoke(task);
pool.shutdown();
System.out.println("Total no of files with hello" + files.size());
}
}
class FileSearchRecursiveTask extends RecursiveTask<List<String>> {
private String path;
public FileSearchRecursiveTask(String path) {
this.path = path;
}
@Override
protected List<String> compute() {
File mainDirectory = new File(path);
List<String> filteredFileList = new ArrayList<>();
List<FileSearchRecursiveTask> recursiveTasks = new ArrayList<>();
if(mainDirectory.isDirectory()) {
System.out.println(Thread.currentThread() + " - Directory is " + mainDirectory.getName());
if(mainDirectory.canRead()) {
File[] fileList = mainDirectory.listFiles();
for(File file : fileList) {
System.out.println(Thread.currentThread() + " - Looking into: " + file.getAbsolutePath());
if(file.isDirectory()) {
FileSearchRecursiveTask task = new FileSearchRecursiveTask(file.getAbsolutePath());
recursiveTasks.add(task);
task.fork();
} else {
if (file.getName().contains("hello")) {
System.out.println(file.getName());
filteredFileList.add(file.getName());
}
}
}
}
for(FileSearchRecursiveTask task : recursiveTasks) {
filteredFileList.addAll(task.join());
}
}
return filteredFileList;
}
}
This program works fine when the directory doesn't have too many sub-directories and files, but if it's really big then it throws an OutOfMemoryError.
My understanding is that the maximum number of threads (including compensation threads) is bounded, so why is there this error? Am I missing anything in my program?
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1486)
at java.util.concurrent.ForkJoinPool.tryCompensate(ForkJoinPool.java:2020)
at java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:2057)
at java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:390)
at java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:719)
at FileSearchRecursiveTask.compute(DirectoryService.java:51)
at FileSearchRecursiveTask.compute(DirectoryService.java:20)
at java.util.concurrent.RecursiveTask.exec(RecursiveTask.java:94)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.tryRemoveAndExec(ForkJoinPool.java:1107)
at java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:2046)
at java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:390)
at java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:719)
at FileSearchRecursiveTask.compute(DirectoryService.java:51)
at FileSearchRecursiveTask.compute(DirectoryService.java:20)
at java.util.concurrent.RecursiveTask.exec(RecursiveTask.java:94)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
You should not fork new tasks beyond all recognition. Basically, you should fork as long as there's a chance that another worker thread can pick up the forked job, and evaluate it locally otherwise. Then, once you have forked a task, don't call join() right afterwards. While the underlying framework will start compensation threads to ensure that your jobs proceed instead of all threads just being blocked waiting for a sub-task, this will create that large number of threads that may exceed the system's capabilities.
Here is a revised version of your code:
public class DirectoryService {
public static void main(String[] args) {
FileSearchRecursiveTask task = new FileSearchRecursiveTask(new File("./DIR"));
List<String> files = task.invoke();
System.out.println("Total no of files with hello " + files.size());
}
}
class FileSearchRecursiveTask extends RecursiveTask<List<String>> {
private static final int TARGET_SURPLUS = 3;
private File path;
public FileSearchRecursiveTask(File file) {
this.path = file;
}
@Override
protected List<String> compute() {
File directory = path;
if(directory.isDirectory() && directory.canRead()) {
System.out.println(Thread.currentThread() + " - Directory is " + directory.getName());
return scan(directory);
}
return Collections.emptyList();
}
private List<String> scan(File directory)
{
File[] fileList = directory.listFiles();
if(fileList == null || fileList.length == 0) return Collections.emptyList();
List<FileSearchRecursiveTask> recursiveTasks = new ArrayList<>();
List<String> filteredFileList = new ArrayList<>();
for(File file: fileList) {
System.out.println(Thread.currentThread() + " - Looking into: " + file.getAbsolutePath());
if(file.isDirectory())
{
if(getSurplusQueuedTaskCount() < TARGET_SURPLUS)
{
FileSearchRecursiveTask task = new FileSearchRecursiveTask(file);
recursiveTasks.add(task);
task.fork();
}
else filteredFileList.addAll(scan(file));
}
else if(file.getName().contains("hello")) {
filteredFileList.add(file.getAbsolutePath());
}
}
for(int ix = recursiveTasks.size() - 1; ix >= 0; ix--) {
FileSearchRecursiveTask task = recursiveTasks.get(ix);
if(task.tryUnfork()) task.complete(scan(task.path));
}
for(FileSearchRecursiveTask task: recursiveTasks) {
filteredFileList.addAll(task.join());
}
return filteredFileList;
}
}
The method doing the processing has been factored out into a method receiving the directory as a parameter, so we are able to use it locally for arbitrary directories not necessarily associated with a FileSearchRecursiveTask instance.
Then, the method uses getSurplusQueuedTaskCount() to determine the number of locally enqueued tasks which have not been picked up by other worker threads. Ensuring that there are some helps work balancing, but if this number exceeds the threshold, the processing is done locally without forking more jobs.
After the local processing, it iterates over the tasks and uses tryUnfork() to identify jobs which have not been stolen by other worker threads, and processes them locally. Iterating backwards, so as to start with the youngest jobs, raises the chances of finding some.
Only afterwards does it join() with all sub-jobs, which are by now either completed or currently being processed by another worker thread.
Note that I changed the initiating code to use the default (common) pool. This uses "number of CPU cores minus one" worker threads, plus the initiating thread, i.e. the main thread in this example.
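If you want to see what parallelism the common pool gives you on a particular machine, you can query it directly; a minimal sketch (the property shown in the comment is the standard JVM switch for overriding the default):
System.out.println("Common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
// The default can be overridden at JVM startup, e.g.:
// -Djava.util.concurrent.ForkJoinPool.common.parallelism=4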
Just a minor change is required.
You need to specify the parallelism for newWorkStealingPool as follows:
ForkJoinPool pool = (ForkJoinPool) Executors.newWorkStealingPool(5);
As per its documentation:
newWorkStealingPool(int parallelism) -> Creates a thread pool that maintains enough threads to support the given parallelism level, and may use multiple queues to reduce contention. The parallelism level corresponds to the maximum number of threads actively engaged in, or available to engage in, task processing. The actual number of threads may grow and shrink dynamically. A work-stealing pool makes no guarantees about the order in which submitted tasks are executed.
As observed in Java VisualVM, this parallelism allows the program to work within the specified memory, and it never goes out of memory.
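If you want to observe this yourself without a profiler, you can poll the pool from a daemon thread while the search runs; a rough sketch, with an arbitrary 500 ms interval:
ForkJoinPool pool = (ForkJoinPool) Executors.newWorkStealingPool(5);
// Watcher thread: prints the live worker count while the search is running
Thread monitor = new Thread(() -> {
    while (!pool.isShutdown()) {
        System.out.println("Pool size: " + pool.getPoolSize()
                + ", active threads: " + pool.getActiveThreadCount());
        try { Thread.sleep(500); } catch (InterruptedException e) { return; }
    }
});
monitor.setDaemon(true);
monitor.start();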
And, one more thing (not sure if it will have any effect):
Change the order in which fork is called and the task is added to the list. That is, change
FileSearchRecursiveTask task = new FileSearchRecursiveTask(file.getAbsolutePath());
recursiveTasks.add(task);
task.fork();
to
FileSearchRecursiveTask task = new FileSearchRecursiveTask(file.getAbsolutePath());
task.fork();
recursiveTasks.add(task);
package com.playground.concurrency;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
public class MyRunnable implements Runnable {
private String taskName;
public String getTaskName() {
return taskName;
}
public void setTaskName(String taskName) {
this.taskName = taskName;
}
private int processed = 0;
public MyRunnable(String name) {
this.taskName = name;
}
private boolean keepRunning = true;
public boolean isKeepRunning() {
return keepRunning;
}
public void setKeepRunning(boolean keepRunning) {
this.keepRunning = keepRunning;
}
private BlockingQueue<Integer> elements = new LinkedBlockingQueue<Integer>(10);
public BlockingQueue<Integer> getElements() {
return elements;
}
public void setElements(BlockingQueue<Integer> elements) {
this.elements = elements;
}
@Override
public void run() {
while (keepRunning || !elements.isEmpty()) {
try {
Integer element = elements.take();
Thread.sleep(10);
System.out.println(taskName +" :: "+elements.size());
System.out.println("Got :: " + element);
processed++;
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
System.out.println("Exiting thread");
}
public int getProcessed() {
return processed;
}
public void setProcessed(int processed) {
this.processed = processed;
}
}
package com.playground.concurrency.service;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.playground.concurrency.MyRunnable;
public class TestService {
public static void main(String[] args) throws InterruptedException {
int roundRobinIndex = 0;
int noOfProcess = 10;
List<MyRunnable> processes = new ArrayList<MyRunnable>();
for (int i = 0; i < noOfProcess; i++) {
processes.add(new MyRunnable("Task : " + i));
}
ExecutorService threadPoolExecutor = Executors.newFixedThreadPool(5);
for (MyRunnable process : processes) {
threadPoolExecutor.execute(process);
}
int totalMessages = 1000;
long start = System.currentTimeMillis();
for (int i = 1; i <= totalMessages; i++) {
processes.get(roundRobinIndex++).getElements().put(i);
if (roundRobinIndex == noOfProcess) {
roundRobinIndex = 0;
}
}
System.out.println("Done putting all the elements");
for (MyRunnable process : processes) {
process.setKeepRunning(false);
}
threadPoolExecutor.shutdown();
try {
threadPoolExecutor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
long totalProcessed = 0;
for (MyRunnable process : processes) {
System.out.println("task " + process.getTaskName() + " processd " + process.getProcessed());
totalProcessed += process.getProcessed();
}
long end = System.currentTimeMillis();
System.out.println("total time" + (end - start));
}
}
I have a simple task that reads elements from a LinkedBlockingQueue. I create multiple instances of this task and execute them via an ExecutorService. The program works as expected when noOfProcess and the thread pool size are the same (for example, noOfProcess=10 and thread pool size=10).
However, if noOfProcess=10 and the thread pool size is 5, then the main thread keeps waiting at the line below after processing a few items.
processes.get(roundRobinIndex++).getElements().put(i);
What am I doing wrong here?
Ah yes. The good old deadlock.
What happens is: you submit 10 tasks to the ExecutorService and then send jobs via .put(i). This blocks for Task 5 (the sixth task), as expected, when its queue is full. Now Task 5 is not currently being executed, and as a matter of fact never will be, since Tasks 0 to 4 are still clogging up your FixedThreadPool, blocked at .take() in the run() method waiting for new jobs from .put(i), which they will never get.
This error is a fundamental design flaw in your code, and there are myriad ways to fix it, one of which is increasing the thread pool size.
My suggestion is that you go back to the drawing board and rethink the structure in the main Method.
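For example, one possible restructuring (a sketch only, under the assumption that your tasks are interchangeable consumers): share a single BlockingQueue among all workers and signal completion with a poison-pill value, so no queue can ever fill up for a task that is never scheduled. The sentinel -1 and the queue capacity are arbitrary choices, and the enclosing method is assumed to declare throws InterruptedException:
BlockingQueue<Integer> shared = new LinkedBlockingQueue<>(10);
int poolSize = 5;
ExecutorService pool = Executors.newFixedThreadPool(poolSize);
final Integer POISON = -1; // assumed sentinel, not part of the original code
for (int t = 0; t < poolSize; t++) {
    pool.execute(() -> {
        try {
            while (true) {
                Integer element = shared.take();
                if (element.equals(POISON)) break; // clean shutdown signal
                // process element here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });
}
for (int i = 1; i <= 1000; i++) shared.put(i);
for (int t = 0; t < poolSize; t++) shared.put(POISON); // one pill per worker
pool.shutdown();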
And since you posted your code, have some tips:
1.:
Posting your entire code can be interpreted as a call to 'pls fix my code', and you are encouraged to omit all unnecessary details (like all those getters and setters). Maybe check https://stackoverflow.com/help/minimal-reproducible-example
2.:
Posting two classes in the same body made things kind of complicated. Split them next time.
3.: (nitpick)
processes.get(roundRobinIndex++).getElements().put(i);
Combining two operations like you did here is bad style, since it makes your code less readable for others. You could just have written:
processes.get(i % noOfProcess).getElements().put(i);
To fix the behavior, you need to do one of the following:
have enough Runnables, each with enough queue capacity, to take all 1,000 messages (for example: 100 Runnables with capacity 10 or more, or 10 Runnables with capacity 100 or more), or
have a thread pool that is large enough to accommodate all of your Runnables, so that each of them can start running.
Without one of those, the ExecutorService will not start the extra Runnables. The main thread will continue adding items to each queue, including the queues of non-running Runnables, until it encounters a queue that is full, at which point it blocks. With 10 Runnables and a thread pool size of 5, the first queue to fill up will be that of the 6th Runnable. It would be the same if you had just 6 Runnables. The significant point is that you have at least one more Runnable than you have room for in your thread pool.
From newFixedThreadPool() Javadoc:
If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available.
Consider a simpler example of 2 processes and thread pool size of 1. You'll be allowed to create the first process and submit it to the ExecutorService (so the ExecutorService will start and run it). The second process however, will not be allowed to run by the ExecutorService. Your main thread does not pay attention to this, however, and it will continue putting elements into the queue for the second process even though nothing is consuming it.
Your code is ok with noOfProcess=10 and thread pool size=5 – if you also change your queue size to 100, like this: new LinkedBlockingQueue<>(100).
You can observe this behavior – where the queue of a non-running Runnable fills up – if you change this line:
processes.get(roundRobinIndex++).getElements().put(i);
to this (which is logically the same code, but with object references saved so they can be used in the println() output):
MyRunnable runnable = processes.get(roundRobinIndex++);
BlockingQueue<Integer> elements = runnable.getElements();
System.out.println("attempt to put() for " + runnable.getTaskName() + " with " + elements.size() + " elements");
elements.put(i);
This question already has answers here: How to shutdown an ExecutorService? (3 answers)
I am trying to build a program that attempts to decrypt a file that has been encrypted using AES encryption, for a school project. I have a list of ~100,000 English words, and want to implement multithreading within the program to optimise the time taken to attempt decryption with every word in the file.
I am having a problem when trying to stop the rest of the dictionary being searched once the decryption completes successfully: "Attempting shutdown" is printed to the console, but it appears that the threads continue working through the rest of the dictionary before the executor stops allocating new threads.
In my main program, the threads are run using this method:
private void startThreads(){
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
System.out.println("Maximum threads inside pool " + executor.getMaximumPoolSize());
for (int i = 0; i < dict.size(); i++) {
String word = dict.get(i);
Grafter grafter = new Grafter("Grafter " + i,word);
grafter.registerWorkerListener(thread -> {
List results = thread.getResults();
for (Iterator iter = results.iterator(); iter.hasNext();) {
found = (boolean) iter.next();
if(found){
System.out.println("THE WORD HAS BEEN FOUND!! Attempting shutdown");
executor.shutdown();
}
}
});
// Start the worker thread
Thread thread = new Thread(grafter);
thread.start();
}
if(!executor.isShutdown()) {
executor.shutdown();
}
}
And the implementation of the 'grafter' runnable class is as follows:
public class Grafter implements Runnable{
private String NAME;
private final String WORD;
private List listeners = new ArrayList();
private List results;
public Grafter(String name, String word){
NAME = name;
WORD = word;
}
public String getName(){
return NAME;
}
@Override
public void run() {
if (tryToDecrypt(WORD) == true){
System.out.println("Thread: '" + NAME + "' successfully decrypted using word: \"" + WORD + "\".");
results = new ArrayList();
results.add(true);
// Work done, notify listeners
notifyListeners();
}else{
results = new ArrayList();
results.add(false);
// Work done, notify listeners
notifyListeners();
}
}
private void notifyListeners() {
for (Iterator iter = listeners.iterator(); iter.hasNext();) {
GrafterListener listener = (GrafterListener) iter.next();
listener.workDone(this);
}
}
public void registerWorkerListener(GrafterListener listener) {
listeners.add(listener);
}
public List getResults() {
return results;
}
private boolean tryToDecrypt(String word){
//Decryption performed, returning true if successfully decrypted,
//Returns false if not
}
}
The correct word is right at the top of the dictionary, so success is found early in the program's execution. However, there is a long pause (while the rest of the dictionary is worked through) before the program finishes.
I am looking for help with the positioning of executor.shutdown(), and with how to stop the remainder of the dictionary being processed after the decryption completes successfully.
Your main problem is that you're not actually submitting your Runnables to the executor, so calling shutdown on the executor has no effect on the threads you've spawned.
Instead of creating a new thread, do something like:
executor.submit(grafter);
This should get you most of the way, but if you want the service to shut down promptly and cleanly, there's a bit more you can do. The link provided in the comments by shmosel should help you with that.
As an aside, the way you're doing this isn't going to be very efficient. Essentially you're creating a new task for every word in your dictionary, which means you have a very large number of tasks (100K in your case), so the overhead of managing and scheduling all those tasks is likely to be a significant portion of the work performed by your program. Instead, you may want to break the list of words up into some number of sublists, each containing an equal number of words, and then have each Runnable process only its sublist.
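A rough sketch of that batching idea, assuming the dict list and a tryToDecrypt(String) helper are visible in scope (in your code it lives inside Grafter); shutdownNow() interrupts the remaining batches once one worker finds the word:
int batches = Runtime.getRuntime().availableProcessors();
int batchSize = (dict.size() + batches - 1) / batches;
ExecutorService executor = Executors.newFixedThreadPool(batches);
AtomicReference<String> winner = new AtomicReference<>(); // holds the found word, if any
for (int b = 0; b < batches; b++) {
    int from = Math.min(dict.size(), b * batchSize);
    int to = Math.min(dict.size(), (b + 1) * batchSize);
    List<String> sublist = dict.subList(from, to);
    executor.submit(() -> {
        for (String word : sublist) {
            if (Thread.currentThread().isInterrupted()) return; // another batch won
            if (tryToDecrypt(word)) {
                winner.set(word);
                executor.shutdownNow(); // interrupt the other batches
                return;
            }
        }
    });
}
executor.shutdown(); // no new tasks; all batches are already submitted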
I have made a function that takes a variable number of parameters. Each parameter represents a file.
I want every one of these files to be assigned its own thread, to count the words of the files as fast as possible. Currently I have something that seems to work; however, I find myself in trouble keeping track of the threads, as they are all just assigned to the same variable name t.
It would be nice to somehow number the threads: the first thread would be t1 and would be assigned to the first file, and so on.
for (File file : fileList) {
final File f = file;
Thread t = null;
ThreadGroup test = null;
Runnable r = new Runnable() {
public void run() {
Scanner fileScan;
try {
fileScan = new Scanner(f);
}
catch(FileNotFoundException e){
System.out.println("Something went wrong while accessing the file");
return;
}
int words = 0;
while (fileScan.hasNext()) {
words++;
fileScan.next();
}
System.out.println(f.getName() + ": " + words + " words");
System.out.println(Thread.activeCount() + ": ");
}
};
t = new Thread(r);
t.start();
}
The thread count goes up as it is supposed to when checking with Thread.activeCount(), but I have no clue how to ever contact the threads, as I have assigned them all to the variable t, which makes it hard to create yet another thread that waits for their output.
I hope my explanation clarified the problem :/
Edit:
The idea is that I will count the number of words in different files; every file needs to be assigned a thread of its own to speed things up. Other than that, I want one thread waiting for the output from all the other threads (meaning I will have to wait for them to finish, hence why I would appreciate being able to access the threads by name).
At the end, that last thread that has been waiting will use the collected data for its own actions before closing the program down.
In order to, for instance, wait for the threads to finish, you need to save references to the threads you create. You could do
List<Thread> threads = new ArrayList<>();
and then do threads.add(t); at the end of the loop.
After that you could wait for them to finish by doing
for (Thread t : threads) {
t.join();
}
What's problematic, however, is that there is no way for you to read the results of the threads you've started.
A much better approach is to use an ExecutorService (e.g. a ThreadPoolExecutor). This way you can submit Callable<Integer> instead of Runnable, and you'll be able to get the result of the word count.
Here's an outline to get you started:
ExecutorService service = Executors.newFixedThreadPool(numThreads);
List<Future<Integer>> results = new ArrayList<>();
for (File f : fileList) {
results.add(service.submit(new Callable<Integer>() {
// ...
}));
}
for (Future<Integer> result : results) {
System.out.println("Result: " + result.get());
}
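A possible completion of that outline, filling the Callable in with the Scanner-based counting from your loop; it assumes the enclosing method declares throws Exception, so result.get() can propagate its checked exceptions:
ExecutorService service = Executors.newFixedThreadPool(4); // pool size is an arbitrary choice
List<Future<Integer>> results = new ArrayList<>();
for (File file : fileList) {
    final File f = file;
    results.add(service.submit(new Callable<Integer>() {
        @Override
        public Integer call() throws FileNotFoundException {
            int words = 0;
            try (Scanner fileScan = new Scanner(f)) { // closes the Scanner automatically
                while (fileScan.hasNext()) {
                    fileScan.next();
                    words++;
                }
            }
            return words;
        }
    }));
}
for (Future<Integer> result : results) {
    System.out.println("Result: " + result.get()); // blocks until that file's count is ready
}
service.shutdown();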
I'm trying to learn multithreaded programming and I have some questions about the approach to take.
So, in my specific case I want to build a program that renames 1000 files, and I was thinking of creating a worker class:
public class Worker implements Runnable {
private List<File> files ;
public Worker(List<File> f){
files = f;
}
public void run(){
// read all files from list and rename them
}
}
and then in main class to do something like:
Worker w1 = new Worker(..list of 500 files...) ;
Worker w2 = new Worker(..list of the other 500 files...) ;
Thread t1 = new Thread(w1,"thread1");
Thread t2 = new Thread(w2,"thread2");
t1.start();
t2.start();
Running this brings me no concurrency issues, so I do not need synchronized code, but I'm not sure if this is the correct approach...?
Or should I create only one instance of Worker() and pass it the entire list of 1000 files, and then take care that no matter how many threads access the object, they won't get the same File from the list?
i.e :
Worker w1 = new Worker(..list of 1000 files...) ;
Thread t1 = new Thread(w1,"thread1");
Thread t2 = new Thread(w1,"thread2");
t1.start();
t2.start();
How should I proceed here ?
The first approach you described is the correct one. You need to create two Workers, as each worker will work on a different list of files.
Worker w1 = new Worker(..list of 500 files...) ; // First List
Worker w2 = new Worker(..list of the other 500 files...) ; // Second List
Thread t1 = new Thread(w1,"thread1");
Thread t2 = new Thread(w2,"thread2");
t1.start();
t2.start();
It's simple: here two different threads, each with a load of 500 files, will execute concurrently.
A more typical and scalable approach is one of the following:
create a collection (likely an array or list) of N threads to perform the work
use a thread pool, e.g. from Executors.newFixedThreadPool(N)
You may also wish to use a Producer-Consumer pattern in which the threads pull from a common task pool. This allows natural balancing of the work, instead of essentially hard-coding that one thread handles 500 tasks and the other thread the same number.
Consider, after all, what would happen if all of your larger files ended up in the bucket handled by the second thread: the first thread would be done and idle while the second thread had to do all of the heavy lifting.
The producer/consumer pooling approach is to dump all of the work (generated by the producers) into a task pool and then have the consumers (your worker threads) bite off small pieces (e.g. one file) at a time. This approach keeps both threads occupied for a similar duration.
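A minimal sketch of that pooling approach, under the assumption that the rename logic sits in a hypothetical renameFile(File) helper and allFiles holds all 1000 files; the queue is the shared task pool and each consumer bites off one file at a time:
BlockingQueue<File> taskPool = new LinkedBlockingQueue<>(allFiles); // all work, produced up front
int consumers = 2;
ExecutorService pool = Executors.newFixedThreadPool(consumers);
for (int c = 0; c < consumers; c++) {
    pool.execute(() -> {
        File file;
        while ((file = taskPool.poll()) != null) { // one file at a time, null when drained
            renameFile(file); // hypothetical helper holding the actual rename logic
        }
    });
}
pool.shutdown();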
In learning multi-threaded programming, one of the important insights is that a thread is not a task. By giving a thread a part of the list of items to process you are halfway there, but the next step will take you further: constructing the task in such a way that any number of threads can execute it. To do this, you will have to get familiar with the java.util.concurrent classes. These are useful tools to help construct such tasks.
The example below separates tasks from threads. It uses AtomicInteger to ensure each thread picks a unique task and it uses CountDownLatch to know when all work is done. The example also shows balancing: threads that execute tasks that complete faster, execute more tasks.
The example is by no means the only solution; there are other ways of doing this that could be faster, easier, better to maintain, etc.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
public class MultiRename implements Runnable {
public static void main(String[] args) {
final int numberOfFnames = 50;
MultiRenameParams params = new MultiRenameParams();
params.fnameList = new ArrayList<String>();
for (int i = 0; i < numberOfFnames; i++) {
params.fnameList.add("fname " + i);
}
params.fnameListIndex = new AtomicInteger();
final int numberOfThreads = 3;
params.allDone = new CountDownLatch(numberOfThreads);
ExecutorService tp = Executors.newCachedThreadPool();
System.out.println("Starting");
for (int i = 0; i < numberOfThreads; i++) {
tp.execute(new MultiRename(params, i));
}
try { params.allDone.await(); } catch (Exception e) {
e.printStackTrace();
}
tp.shutdownNow();
System.out.println("Finished");
}
private final MultiRenameParams params;
private final Random random = new Random();
// Just to show there are fast and slow tasks.
// Thread with lowest delay should get most tasks done.
private final int delay;
public MultiRename(MultiRenameParams params, int delay) {
this.params = params;
this.delay = delay;
}
@Override
public void run() {
final int maxIndex = params.fnameList.size();
int i = 0;
int count = 0;
while ((i = params.fnameListIndex.getAndIncrement()) < maxIndex) {
String fname = params.fnameList.get(i);
long sleepTimeMs = random.nextInt(10) + delay;
System.out.println(Thread.currentThread().getName() + " renaming " + fname + " for " + sleepTimeMs + " ms.");
try { Thread.sleep(sleepTimeMs); } catch (Exception e) {
e.printStackTrace();
break;
}
count++;
}
System.out.println(Thread.currentThread().getName() + " done, renamed " + count + " files.");
params.allDone.countDown();
}
static class MultiRenameParams {
List<String> fnameList;
AtomicInteger fnameListIndex;
CountDownLatch allDone;
}
}
Here is a brief of what I want to do. I have a scenario where:
A number of text files are generated dynamically on a daily basis, 0 to 8 per day.
The size of each file can range from small to big, depending on the day's data.
I need to run some checks (business checks) on them.
I plan to complete the task in minimum time, hence I am trying to write a parallel executor for performing checks on these files.
My idea is:
Store the n files in a concurrent collection (ConcurrentLinkedQueue).
Remove a file and spawn a thread that runs all checks on that file.
Since one file has no relation to another, I want to be able to process multiple files in parallel.
Store results in another concurrent collection (ConcurrentLinkedQueue), which is later converted to various html/pdf reports.
NOTE: the number of threads can be different from the number of files (I want the number of threads to be configurable; it is not the case that number of files = number of threads).
My understanding is that this way I should be able to complete the daily checks in minimum time.
I have my code like below. What confuses me is how to store all the threads' results in a single collection after each thread's completion; my gut feeling is that I am doing something funny (incorrect) in the way I am storing results.
Second question: I wanted to check if anyone foresees any other issues in the code snippet below.
Third question: this seems (to me) like a common use case; any pointers to design patterns or code snippets solving this?
Note: I am using JDK 6.
public class CheckExecutor {
// to store all results of all threads here , then this will be converted to html/pdf files
static ConcurrentLinkedQueue<Result> fileWiseResult = new ConcurrentLinkedQueue<Result>();
public static void main(String[] args) {
int numberOfThreads=n; // need keep it configurable
Collection<ABCCheck> checksToExecute; // will be populated from business logic; ABCCheck is an interface with a check() method and different implementations
ConcurrentLinkedQueue<File> fileQueue = new ConcurrentLinkedQueue<File>(); // list of files for 1 day , may vary from 0 to 8
int maxNumOfFiles = fileQueue.size();
ThreadGroup tg = new ThreadGroup ("Group");
// If there are more threads than files (rare, can be considered a corner case)
if (maxNumOfFiles < numberOfThreads) numberOfThreads=maxNumOfFiles;
// loop and start number of threads
for(int var=0;var<numberOfThreads;var++)
{
File currentFile = fileQueue.remove();
// execute all checks on 1 file using checksToExecute
ExecuteAllChecks checksToRun = new ExecuteAllChecks(); // business logic to populate checks
checksToRun.setchecksToExecute(checksToExecute);
checksToRun.setcheckResult(fileWiseResult); // when each check finishes want to store result here
new Thread (tg , checksToRun , "Threads for "+currentFile.getName()).start();
}
// To complete the task asap: want to start a new thread as soon as any current thread ends (diff files, diff sizes)
while(!fileQueue.isEmpty()) {
try {
Thread.sleep(10000); // Not sure if this will cause the main thread to sleep (I think it will pause the current thread); I want to pause the main thread
} catch (InterruptedException e) {
e.printStackTrace();
}
// check processing of how many files completed
if( (tg.activeCount()<numberOfThreads) && (fileQueue.size()>0) ) {
int numOfThreadsToStart = numberOfThreads - tg.activeCount();
for(int var1=0;var1<numOfThreadsToStart;var1++) {
File currentFile = fileQueue.remove();
ExecuteAllChecks checksToRun = new ExecuteAllChecks();
checksToRun.setchecksToExecute(checksToExecute);
checksToRun.setcheckResult(fileWiseResult); // when each check finishes want to store result here
new Thread (tg , checksToRun , "Threads for "+currentFile.getName()).start();
}
}
}
}
}
class ExecuteAllChecks implements Runnable {
private Collection<ABCCheck> checksToExecute;
private ConcurrentLinkedQueue<Result> checkResult; // not sure if it's correct; I want to store results of all threads here
public ConcurrentLinkedQueue<Result> getcheckResult() {
return checkResult;
}
// plan to instantiate the result collection globally and store result here
public void setcheckResult(ConcurrentLinkedQueue<Result> checkResult) {
this.checkResult = checkResult;
}
public Collection<ABCCheck> getchecksToExecute() {
return checksToExecute;
}
public void setchecksToExecute(Collection<ABCCheck> checksToExecute) {
this.checksToExecute = checksToExecute;
}
@Override
public void run() {
Result currentFileResult = new Result();
// TODO Auto-generated method stub
System.out.println("Execute All checks for 1 file");
// each check runs and calls setters on currentFileResult
checkResult.add(currentFileResult);
}
}
The actual implementation is very much influenced by the nature of the computations itself, but a somewhat general approach could be:
private final ExecutorService executor = Executors.newCachedThreadPool();
private final int taskCount = ...;
private void process() throws InterruptedException, ExecutionException {
Collection< Callable< Result > > tasks = new ArrayList<>( taskCount );
for( int i = 0; i < taskCount; i++ ) {
tasks.add( new Callable< Result >() {
@Override
public Result call() throws Exception {
// TODO implement your logic and return result
...
return result;
}
} );
}
List< Future< Result > > futures = executor.invokeAll( tasks );
List< Result > results = new ArrayList<>( taskCount );
for( Future< Result > future : futures ) {
results.add( future.get() );
}
}
I would also recommend using sensible timeouts on future.get() invocations, so that the executing thread does not get stuck.
Still, I also wouldn't recommend using a cached thread pool in production, as this pool grows whenever the current pool doesn't have enough capacity for all tasks; rather, use something like Executors.newFixedThreadPool( Runtime.getRuntime().availableProcessors() )
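A sketch combining both suggestions (the 30-second timeout is an arbitrary choice, and the enclosing method is assumed to declare the remaining checked exceptions):
ExecutorService executor = Executors.newFixedThreadPool( Runtime.getRuntime().availableProcessors() );
List< Future< Result > > futures = executor.invokeAll( tasks );
List< Result > results = new ArrayList<>( tasks.size() );
for( Future< Result > future : futures ) {
    try {
        // Bounded wait so one hung task cannot block the collecting thread forever
        results.add( future.get( 30, TimeUnit.SECONDS ) );
    } catch( TimeoutException e ) {
        future.cancel( true ); // give up on this task
    }
}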
If your actual task could be split into several smaller ones that are later joined, consider checking how that could be done efficiently using the Fork/Join framework.