Java main method finishes before the threads complete - java

I have a multithreaded program and I want to measure and print its execution time. When I run the code, the child threads take longer than the main thread, so main terminates first and the output is either missing or shows the wrong value.
Here is the code:
public static void main(String[] args) throws CorruptIndexException, IOException, LangDetectException, InterruptedException {
/* Initialization */
long startingTime = System.currentTimeMillis();
Indexer main = new Indexer(); // this class extends Thread
File file = new File(SITES_PATH);
main.addFiles(file);
/* Multithreading through ExecutorService */
ExecutorService es = Executors.newFixedThreadPool(4);
for (File f : main.queue) {
Indexer ind = new Indexer(main.writer, main.identificatore, f);
ind.join();
es.submit(ind);
}
es.shutdown();
/* log creation - code I want to execute when all the threads execution ended */
long executionTime = System.currentTimeMillis()-startingTime;
long minutes = TimeUnit.MILLISECONDS.toMinutes(executionTime);
long seconds = TimeUnit.MILLISECONDS.toSeconds(executionTime)%60;
String fileSize = sizeConversion(FileUtils.sizeOf(file));
Object[] array = {fileSize,minutes,seconds};
logger.info("{} indexed in {} minutes and {} seconds.",array);
}
I tried several solutions such as join(), wait() and notifyAll(), but none of them worked.
I found this Q&A on Stack Overflow which covers my problem, but join() is ignored, and if I add
es.awaitTermination(timeout, TimeUnit.SECONDS);
the executor service never actually executes the threads.
How can I run the multithreaded work only inside the ExecutorService block and have main finish last?

Given your use case, you might as well use the invokeAll method. From the Javadoc:
Executes the given tasks, returning a list of Futures holding their
status and results when all complete. Future.isDone() is true for each
element of the returned list. Note that a completed task could have
terminated either normally or by throwing an exception. The results of
this method are undefined if the given collection is modified while
this operation is in progress.
To use:
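final List<Callable<Object>> tasks = new ArrayList<Callable<Object>>();
for (final File f : main.queue) {
    // Indexer extends Thread, so it is a Runnable; invokeAll needs Callables, hence the adapter
    tasks.add(Executors.callable(new Indexer(main.writer, main.identificatore, f)));
}
final ExecutorService es = Executors.newFixedThreadPool(4);
// blocks until every task has completed
final List<Future<Object>> results = es.invokeAll(tasks);
es.shutdown();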
This will execute all supplied tasks and wait for them to finish processing before proceeding on your main thread. You will need to tweak the code to fit your particular needs, but you get the gist. A quick note, there is a variant of the invokeAll method that accepts timeout parameters. Use that variant if you want to wait up to a maximum amount of time before proceeding. And make sure to check the results collected after the invokeAll is done, in order to verify the status of the completed tasks.
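A minimal, self-contained sketch of the same idea, including the timing the question asks for. It assumes each task can be written as a Callable; the indexFile method and the file names are hypothetical stand-ins for the real indexing work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllTiming {

    // Hypothetical stand-in for the real per-file work; returns the name length.
    static int indexFile(String name) throws InterruptedException {
        Thread.sleep(50);
        return name.length();
    }

    // Runs one task per name and returns only after all of them have finished.
    static int indexAll(List<String> names) throws InterruptedException, ExecutionException {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (String name : names) {
            tasks.add(() -> indexFile(name));
        }
        ExecutorService es = Executors.newFixedThreadPool(4);
        try {
            int total = 0;
            // invokeAll blocks until every task has completed
            for (Future<Integer> result : es.invokeAll(tasks)) {
                total += result.get(); // rethrows a task failure as ExecutionException
            }
            return total;
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        int total = indexAll(List.of("a.html", "b.html", "c.html"));
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(total + " characters indexed in " + elapsed + " ms");
    }
}
```

Because the elapsed time is computed after invokeAll has returned, it covers the work of all tasks rather than just the time needed to submit them.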
Good luck.

The ExecutorService#submit() method returns a Future object, which can be used for waiting until the submitted task has completed.
The idea is that you collect all of these Futures, and then call get() on each of them. This ensures that all of the submitted tasks have completed before your main thread continues.
Something like this:
ExecutorService es = Executors.newFixedThreadPool(4);
List<Future<?>> futures = new ArrayList<Future<?>>();
for (File f : main.queue) {
    Indexer ind = new Indexer(main.writer, main.identificatore, f);
    // no ind.join() here: the pool runs the Indexer, it is never started as a thread of its own
    Future<?> future = es.submit(ind);
    futures.add(future);
}
// wait for all tasks to complete; get() throws ExecutionException if a task failed
for (Future<?> f : futures) {
    f.get();
}
es.shutdown();
// carry on working in main thread...

Related

ExecutorService.submit() vs ExecutorService.invokeXyz()

ExecutorService contains the following methods:
invokeAll(Collection<? extends Callable<T>> tasks)
invokeAny(Collection<? extends Callable<T>> tasks)
submit(Callable<T> task)
I am confused about the use of the terms submit vs. invoke. Does it mean that the invokeXyz() methods have the underlying thread pool invoke those tasks as soon as possible, while submit() does some kind of scheduling of the submitted tasks?
This answer says "if we want to wait for completion of all tasks, which have been submitted to ExecutorService". What does "wait for" refer to here?
Both invoke...() and submit() will execute their tasks immediately (assuming threads are available to run the tasks). The difference is that invoke...() will wait for the tasks running in separate threads to finish before returning a result, whereas submit() will return immediately, meaning the task it submitted may still be running in another thread.
In other words, the Future objects returned from invokeAll() are guaranteed to be in a state where Future.isDone() == true. The Future object returned from submit() can be in a state where Future.isDone() == false.
We can easily demonstrate the timing difference.
public static void main(String... args) throws InterruptedException {
    Callable<String> c1 = () -> { System.out.println("Hello "); return "Hello "; };
    Callable<String> c2 = () -> { System.out.println("World!"); return "World!"; };
    List<Callable<String>> callables = List.of(c1, c2);
    ExecutorService executor = Executors.newSingleThreadExecutor();
    System.out.println("Begin invokeAll...");
    List<Future<String>> allFutures = executor.invokeAll(callables);
    System.out.println("End invokeAll.\n");
    System.out.println("Begin submit...");
    // needs java.util.stream.Collectors
    List<Future<String>> submittedFutures = callables.stream().map(executor::submit).collect(Collectors.toList());
    System.out.println("End submit.");
    // without this the pool's worker thread keeps the JVM alive
    executor.shutdown();
}
And the result is that the callables print their Hello World message before invokeAll() returns, whereas with submit() they typically print it only after submit() has returned (the exact interleaving depends on thread scheduling).
/*
Begin invokeAll...
Hello
World!
End invokeAll.
Begin submit...
End submit.
Hello
World!
*/
You can play around with this code in an IDE by adding some sleep() time in c1 or c2 and watching as the terminal prints out. This should convince you that invoke...() does indeed wait for something to happen, but submit() does not.
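The isDone() difference can also be made deterministic instead of relying on timing. The sketch below is an illustration, not part of the original answer: it uses a CountDownLatch (the release name is made up) to hold a submitted task open, so the Future is certain to be unfinished right after submit() returns.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IsDoneDemo {

    // Returns {isDone right after submit(), isDone after get(), all done after invokeAll()}.
    static boolean[] observe() throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch release = new CountDownLatch(1);
            // This task cannot finish until the main thread opens the latch.
            Callable<String> blocked = () -> { release.await(); return "done"; };

            Future<String> submitted = executor.submit(blocked);
            boolean doneAfterSubmit = submitted.isDone(); // submit() returned, task still waiting

            release.countDown();
            submitted.get();                              // block until the task has finished
            boolean doneAfterGet = submitted.isDone();

            Callable<String> quick = () -> "quick";
            boolean allDone = true;
            for (Future<String> f : executor.invokeAll(List.of(quick, quick))) {
                allDone &= f.isDone();                    // invokeAll() only returns finished futures
            }
            return new boolean[] { doneAfterSubmit, doneAfterGet, allDone };
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = observe();
        System.out.println("after submit: " + r[0] + ", after get: " + r[1] + ", after invokeAll: " + r[2]);
    }
}
```

Running main prints "after submit: false, after get: true, after invokeAll: true".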

Improve Performance for reading file line by line and processing

I have a piece of Java code which does the following:
1. Opens a file with data in format {A,B,C}; each file has approx. 5000000 lines.
2. For each line in the file, calls a service that gives a column D and appends it to {A,B,C} as {A,B,C,D}.
3. Writes this entry into a chunkedwriter that eventually groups together 10000 lines to write back chunk to a remote location.
Right now the code is taking 32 hours to execute. This process would again get repeated across another file which hypothetically takes another 32 hours but we need these processes to run daily.
Step 2 is further complicated by the fact that sometimes the service does not have D but is designed to fetch D from its super data store so it throws a transient exception asking you to wait. We have retries to handle this so an entry could technically be retried 5 times with a max delay of 60000 millis. So we could be looking at 5000000 * 5 in worst case.
Each combination of {A,B,C} is unique, so the result D can't be cached and reused; a fresh request has to be made to get D every time.
I've tried adding threads like this:
temporaryFile = File.createTempFile(key, ".tmp");
Files.copy(stream, temporaryFile.toPath(),
StandardCopyOption.REPLACE_EXISTING);
reader = new BufferedReader(new InputStreamReader(new
FileInputStream(temporaryFile), StandardCharsets.UTF_8));
String entry;
while ((entry = reader.readLine()) != null) {
final String finalEntry = entry;
service.execute(() -> {
try {
processEntry(finalEntry);
} catch (Exception e) {
log.error("something");
}
});
count++;
}
Here processEntry method abstracts the implementation details explained above and threads are defined as
ExecutorService service = Executors.newFixedThreadPool(10);
The problem I'm having is the first set of threads spin up but the process doesn't wait until all threads finish their work and all 5000000 lines are complete. So the task that used to wait for completion for 32 hours now ends in <1min which messes up our system's state. Are there any alternative ways to do this? How can I make process wait on all threads completing?
Think about using an ExecutorCompletionService if you want to take tasks as they complete. It acts like a BlockingQueue that lets you poll for tasks as and when they finish.
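A minimal sketch of that approach. The processEntry method here is a hypothetical stand-in for the question's method of the same name (it just appends a column), and the input list replaces the file reader:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionServiceSketch {

    // Hypothetical stand-in for the question's processEntry: appends a column D.
    static String processEntry(String entry) {
        return entry + ",D";
    }

    // Submits one task per entry and collects the results in completion order.
    static List<String> processAll(List<String> entries) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        CompletionService<String> completion = new ExecutorCompletionService<>(pool);
        try {
            for (String entry : entries) {
                completion.submit(() -> processEntry(entry));
            }
            List<String> results = new ArrayList<>();
            // take() blocks until the next task has finished, whichever one that is
            for (int i = 0; i < entries.size(); i++) {
                results.add(completion.take().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAll(List.of("A1,B1,C1", "A2,B2,C2", "A3,B3,C3")));
    }
}
```

Calling take() exactly once per submitted task also gives the main thread a natural point at which it knows that everything has finished.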
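Another solution is to shut the executor down once all tasks have been submitted and then wait for it to terminate:
ExecutorService service = Executors.newFixedThreadPool(10);
// submit the tasks here, then stop accepting new ones
service.shutdown();
// spins until all submitted tasks have finished; awaitTermination() would block without burning CPU
while (!service.isTerminated()) {}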
One alternative is to use a latch to wait for all the tasks to complete before you shut down the executor on the main thread.
Initialize a CountDownLatch with 1.
After you exit the loop that submits the tasks, you call latch.await().
Each task you start needs a callback on the starting class to let it know when the task has finished.
Note that in the starting class the callback function has to be synchronized.
In the starting class you use this callback to keep count of the completed tasks.
Also inside the callback, when all tasks have completed, you call latch.countDown() so that the main thread can continue with, let's say, shutting down the executor and exiting.
This shows the main concept; it can be implemented with more detail and more control over the completed tasks if necessary.
It would be something like this:
public class StartingClass {
    CountDownLatch latch = new CountDownLatch(1);
    ExecutorService service = Executors.newFixedThreadPool(10);
    BufferedReader reader;
    Path stream;
    String key; // prefix for the temporary file, as in the question
    int count = 0;
    int completed = 0;
    boolean allSubmitted = false;

    public void runTheProcess() throws IOException, InterruptedException {
        File temporaryFile = File.createTempFile(key, ".tmp");
        Files.copy(stream, temporaryFile.toPath(),
                StandardCopyOption.REPLACE_EXISTING);
        reader = new BufferedReader(new InputStreamReader(new
                FileInputStream(temporaryFile), StandardCharsets.UTF_8));
        String entry;
        while ((entry = reader.readLine()) != null) {
            final String finalEntry = entry;
            taskSubmitted();
            service.execute(new Task(this, finalEntry));
        }
        submissionFinished();
        latch.await();
        service.shutdown();
    }

    public void processEntry(String entry) {
        // the per-line work goes here; not synchronized, so entries are processed in parallel
    }

    private synchronized void taskSubmitted() {
        count++;
    }

    // called once after the last line, so a task that finishes early cannot open the latch
    private synchronized void submissionFinished() {
        allSubmitted = true;
        if (completed == count) {
            latch.countDown();
        }
    }

    public synchronized void taskCompleted() {
        completed++;
        if (allSubmitted && completed == count) {
            latch.countDown();
        }
    }

    // This can be put in a different file.
    public static class Task implements Runnable {
        StartingClass startingClass;
        String finalEntry;

        public Task(StartingClass startingClass, String finalEntry) {
            this.startingClass = startingClass;
            this.finalEntry = finalEntry;
        }

        @Override
        public void run() {
            try {
                startingClass.processEntry(finalEntry);
            } catch (Exception e) {
                // log.error("something");
            } finally {
                // always report completion, otherwise a failed task would block latch.await() forever
                startingClass.taskCompleted();
            }
        }
    }
}
Note that you need to close the file. Also, the shutting down of the executor could be written to wait a few seconds before forcing a shutdown.
The problem I'm having is the first set of threads spin up but the process doesn't wait until all threads finish their work and all 5000000 lines are complete.
When you are running jobs using an ExecutorService they are added into the service and are run in the background. To wait for them to complete you need to wait for the service to terminate:
ExecutorService service = Executors.newFixedThreadPool(10);
// submit jobs to the service here
// after the last job has been submitted, we immediately shutdown the service
service.shutdown();
// then we can wait for it to terminate as the jobs run in the background
service.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
Also, if there is a crap-ton of lines in these files, I would recommend that you use a bounded queue for the jobs so that you don't blow out memory effectively caching all of the lines in the file. This only works if the files stay around and don't go away.
// this is the same as a newFixedThreadPool(10) but with a queue of 100
// (declared as ThreadPoolExecutor because ExecutorService has no setRejectedExecutionHandler)
ThreadPoolExecutor service = new ThreadPoolExecutor(10, 10,
        0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(100));
// set a rejected execution handler so we block the caller once the queue is full
service.setRejectedExecutionHandler(new RejectedExecutionHandler() {
    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        try {
            // blocks the submitting thread until there is room in the queue
            executor.getQueue().put(r);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
});
Write this entry into a chunkedwriter that eventually groups together 10000 lines to write back chunk to a remote location
As each A,B,C job finishes, if it needs to be processed in a second step, then I would also recommend looking into an ExecutorCompletionService, which allows you to chain different thread pools together so that as lines finish they immediately start on the second phase of the processing.
If instead this chunkedWriter is just a single thread then I'd recommend sharing a BlockingQueue<Result> and having the executor threads put to the queue once the lines are done and the chunkedWriter taking from the queue and doing the chunking and writing of the results. In this situation, indicating to the writer thread that it is done needs to be handled carefully -- maybe with some sort of END_RESULT constant put to the queue by the main thread waiting for the service to terminate.
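The single-writer variant could be sketched as follows. This is an illustration under assumptions: results are plain strings, the "processing" just appends a column, and collecting chunks into a list stands in for writing them to the remote location. END_RESULT is the sentinel mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class QueueWriterSketch {

    // Sentinel compared by identity, so it can never collide with a real result.
    static final String END_RESULT = new String("END");

    // Processes all lines in a pool and returns the chunks built by the writer thread.
    static List<List<String>> run(List<String> lines, int chunkSize) throws InterruptedException {
        BlockingQueue<String> results = new LinkedBlockingQueue<>();
        List<List<String>> chunks = new ArrayList<>(); // only the writer touches this until join()

        Thread writer = new Thread(() -> {
            List<String> chunk = new ArrayList<>();
            try {
                while (true) {
                    String result = results.take();
                    if (result == END_RESULT) break;
                    chunk.add(result);
                    if (chunk.size() == chunkSize) {
                        chunks.add(chunk);             // stand-in for the remote write
                        chunk = new ArrayList<>();
                    }
                }
                if (!chunk.isEmpty()) chunks.add(chunk); // flush the last partial chunk
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService service = Executors.newFixedThreadPool(10);
        for (String line : lines) {
            service.execute(() -> results.add(line + ",D")); // hypothetical processing
        }
        service.shutdown();
        service.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);

        results.put(END_RESULT); // every worker is done, so this is the last element
        writer.join();
        return chunks;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("a", "b", "c", "d", "e"), 2));
    }
}
```

Putting the sentinel on the queue only after awaitTermination has returned is what guarantees that no real result can arrive behind it.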

How to start threads at the same time when I use executors?

I have written the following code:
ExecutorService es = Executors.newCachedThreadPool();
long start= System.currentTimeMillis();
ArrayHolder arrayHolder = new ArrayHolderBySynchronized();
List<Runnable> runnables = new ArrayList<>();
for (int i = 0; i < readerThreadCount; i++) {
es.submit(new ArrayReader(arrayHolder));
}
for (int i = 0; i < writerThreadCount; i++) {
es.submit(new ArrayWriter(arrayHolder));
}
es.shutdown();
boolean finshed = es.awaitTermination(1, TimeUnit.MINUTES);
long finished= System.currentTimeMillis();
System.out.println(finished-start);
As I understand it, after executing the code:
es.submit(new ArrayReader(arrayHolder));
a new thread can begin executing.
I want the threads to be allowed to start only once I have submitted all tasks.
I know that I can achieve this with a CountDownLatch, but I want to know whether I can achieve it with ExecutorService alone.
You can use the invokeAll method. From the docs:
Executes the given tasks, returning a list of Futures holding their
status and results when all complete. Future.isDone is true for each
element of the returned list. Note that a completed task could have
terminated either normally or by throwing an exception. The results of
this method are undefined if the given collection is modified while
this operation is in progress.
<T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks)
throws InterruptedException;
UPDATE:
Actually, you can't rely on when task execution starts or on the order of tasks. Even once you have submitted all tasks, you can't be sure that they will be executed within some amount of time or before some event. It depends on thread scheduling, and you can't control its behaviour. Even if you submit all tasks at the same time, it does not imply that they will be executed at the same time.
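For completeness, here is a sketch of the latch-based "start gate" the question alludes to, used together with an ExecutorService. It is an illustration, not something ExecutorService offers by itself: every task parks on a shared CountDownLatch until the main thread has submitted them all. It only lines up the point where the tasks begin their real work; when each thread actually runs after that is still up to the scheduler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class StartGateSketch {

    // Submits taskCount tasks and returns how many had done any work before the gate opened.
    static int workedBeforeGate(int taskCount) throws Exception {
        ExecutorService es = Executors.newCachedThreadPool();
        CountDownLatch startGate = new CountDownLatch(1);
        AtomicInteger worked = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                futures.add(es.submit(() -> {
                    startGate.await();        // park until every task has been submitted
                    worked.incrementAndGet(); // stand-in for the real reader/writer work
                    return null;
                }));
            }
            int before = worked.get();        // nobody has passed the gate yet
            startGate.countDown();            // release all waiting tasks at once
            for (Future<?> f : futures) {
                f.get();                      // wait for every task to finish
            }
            return before;
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("tasks that worked before the gate opened: " + workedBeforeGate(8));
    }
}
```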

How to check if the threads have completed its task or not?

OK, I created a couple of threads to do some complex tasks. Now, how can I check whether each thread has completed successfully?
class BrokenTasks extends Thread {
public BrokenTasks(){
super();
}
public void run(){
//Some complex tasks related to Networking..
//Example would be fetching some data from the internet and it is not known when can it be finished
}
}
//In another class
BrokenTasks task1 = new BrokenTasks();
BrokenTasks task2 = new BrokenTasks();
BrokenTasks task3 = new BrokenTasks();
BrokenTasks task4 = new BrokenTasks();
task1.start();
.....
task4.start();
So how can I check whether all these tasks completed successfully from
i) the main program (main thread)
ii) each of the other threads? For example, checking from within task2 whether task1 has ended.
A good way to use threads is not to use them directly. Instead, make a thread pool. Then, in your POJO task encapsulation, have a field that is only set at the end of the computation.
Declare that field volatile (or guard it with synchronization) so that other threads are guaranteed to see the new value; without that there is no guarantee about when, or whether, another thread sees the status. Make sure nothing else overwrites it: give each task its own instance of work and status, and have other threads only poll it every 1-5 seconds, or register a listener that the worker calls on completion.
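A minimal sketch of such a task object, using the standard java.util.concurrent pool rather than any particular library. The FetchTask name and its empty body are hypothetical; the point is only the volatile status field that is set as the last step of run():

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class StatusFlagSketch {

    // A plain task object that records its own completion.
    static class FetchTask implements Runnable {
        private volatile boolean succeeded = false; // volatile: visible to polling threads

        @Override
        public void run() {
            // ... the real (hypothetical) network work would go here ...
            succeeded = true; // set only as the very last step
        }

        boolean hasSucceeded() {
            return succeeded;
        }
    }

    // Runs one task in a pool, waits for the pool to finish, and reports the task's status.
    static boolean runAndCheck() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        FetchTask task = new FetchTask();
        pool.execute(task);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return task.hasSucceeded();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("task succeeded: " + runAndCheck());
    }
}
```

If run() throws before reaching the last line, the flag stays false, which is what lets a caller tell "finished successfully" apart from "thread ended".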
A library I have used is my own
https://github.com/tgkprog/ddt/tree/master/DdtUtils/src/main/java/org/s2n/ddt/util/threads
To use : in server start or static block :
package org.s2n.ddt.util;
import org.apache.log4j.Logger;
import org.junit.Test;
import org.s2n.ddt.util.threads.PoolOptions;
import org.s2n.ddt.util.threads.DdtPools;
public class PoolTest {
private static final Logger logger = Logger.getLogger(PoolTest.class);
@Test
public void test() {
PoolOptions options = new PoolOptions();
options.setCoreThreads(2);
options.setMaxThreads(33);
DdtPools.initPool("a", options);
Do1 p = null;
for (int i = 0; i < 10; i++) {
p = new Do1();
DdtPools.offer("a", p);
}
LangUtils.sleep(3 + (int) (Math.random() * 3));
org.junit.Assert.assertNotNull(p);
org.junit.Assert.assertEquals(Do1.getLs(), 10);
}
}
class Do1 implements Runnable {
volatile static long l = 0;
public Do1() {
l++;
}
public void run() {
// LangUtils.sleep(1 + (int) (Math.random() * 3));
System.out.println("hi " + l);
}
public static long getLs() {
return l;
}
}
Things you should not do:
* Don't do things every 10-15 milliseconds
* Unless it is an academic exercise, do not make your own threads
* Don't make it more complex than it needs to be for 97% of cases
You can use Callable and ForkJoinPool for this task.
class BrokenTasks implements Callable<Object> {
    public BrokenTasks() {
        super();
    }
    public Object call() throws Exception {
        //Some complex tasks related to Networking..
        //Example would be fetching some data from the internet and it is not known when can it be finished
        return null; // return the result of the task here
    }
}
//In another class
BrokenTasks task1 = new BrokenTasks();
BrokenTasks task2 = new BrokenTasks();
BrokenTasks task3 = new BrokenTasks();
BrokenTasks task4 = new BrokenTasks();
ForkJoinPool pool = new ForkJoinPool(4);
Future<Object> result1 = pool.submit(task1);
Future<Object> result2 = pool.submit(task2);
Future<Object> result3 = pool.submit(task3);
Future<Object> result4 = pool.submit(task4);
// get() throws ExecutionException if the task itself failed
Object value4 = result4.get();//blocking call
Object value3 = result3.get();//blocking call
Object value2 = result2.get();//blocking call
Object value1 = result1.get();//blocking call
And don't forget to shut down the pool after that.
Classically you simply join on the threads you want to finish. Your thread does not proceed until join completes. For example:
// await all threads
task1.join();
task2.join();
task3.join();
task4.join();
// continue with main thread logic
(I probably would have put the tasks in a list for cleaner handling)
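A small sketch of the list-based variant, assuming nothing beyond java.lang.Thread. The counter is only there to make the effect observable and stands in for the real work of BrokenTasks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class JoinAllSketch {

    // Starts n threads, joins them all, and returns how many finished their work.
    static int startAndJoin(int n) throws InterruptedException {
        AtomicInteger finished = new AtomicInteger();
        List<Thread> tasks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> finished.incrementAndGet()); // stand-in for the real task
            tasks.add(t);
            t.start();
        }
        // await all threads
        for (Thread t : tasks) {
            t.join(); // returns once this thread has terminated
        }
        // continue with main thread logic
        return finished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(startAndJoin(4) + " threads finished");
    }
}
```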
If a thread has not completed its task, then it is still alive. So to test whether a thread has completed its task, you can use the isAlive() method.
There are two different questions here.
One is whether the thread is still working.
The other is whether the task is finished.
A thread is an expensive way to solve a problem: when we start a thread in Java, the VM has to store context information and deal with synchronization (such as locks). So we usually use a thread pool instead of raw threads. The benefit of a thread pool is that a few threads can handle many different tasks. That means a few threads stay alive while many tasks are finished.
Don’t find task status from a thread.
Thread is a worker, and tasks are jobs.
A thread may work on many different jobs one by one.
I don’t think we should ask a worker if he has finished a job. I’d rather ask the job if it is finished.
When I want to check if a job is finished, I use signals.
Use signals (synchronization aid)
Since JDK 1.5 there are several synchronization aids that work like signals.
CountDownLatch
This object provides a counter (it can be set only once and counted down many times). The counter allows one or more threads to wait until a set of operations being performed in other threads completes.
CyclicBarrier
This is another useful signal that allows a set of threads to all wait for each other to reach a common barrier point.
more tools
More tools could be found in JDK java.util.concurrent package.
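As an illustration of the barrier described above, here is a minimal CyclicBarrier sketch. The counters are hypothetical and exist only to show that the barrier action runs after every thread has arrived:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class BarrierSketch {

    // Returns how many threads had arrived when the barrier action ran.
    static int run(int parties) throws InterruptedException {
        AtomicInteger arrived = new AtomicInteger();
        AtomicInteger seenByAction = new AtomicInteger(-1);
        // The barrier action runs once, after the last party has called await().
        CyclicBarrier barrier = new CyclicBarrier(parties, () -> seenByAction.set(arrived.get()));

        Thread[] threads = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            threads[i] = new Thread(() -> {
                arrived.incrementAndGet(); // first phase of the work
                try {
                    barrier.await();       // wait until every thread has finished that phase
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return seenByAction.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads at the barrier: " + run(3));
    }
}
```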
You can use the Thread.isAlive method; see the API: "A thread is alive if it has been started and has not yet died". That is, in task2's run() you test task1.isAlive().
To see task1 from task2 you need to pass it as an argument to task2's constructor, or make the tasks fields instead of local variables.
You can use the following:
task1.join();
task2.join();
task3.join();
task4.join();
// and then check every thread by using the isAlive() method
e.g.: task1.isAlive();
If it returns false, the thread has finished running; otherwise it returns true.
Note that once join() has returned, isAlive() is always false, and neither call tells you whether the task succeeded.
I'm not sure of your exact needs, but some Java application frameworks have handy abstractions for dealing with individual units of work or "jobs". The Eclipse Rich Client Platform comes to mind with its Jobs API. Although it may be overkill.
For plain old Java, look at Future, Callable and Executor.

What is the best way to wait for the completion of all workers in a thread pool?

Imagine I have following code:
final ExecutorService threadPool = Executors.newFixedThreadPool(
NUMBER_OF_WORKERS);
for (int i=0; i < NUMBER_OF_WORKERS; i++)
{
final Worker worker = new BirthWorker(...);
threadPool.execute(worker);
}
Now I need a piece of code, which waits, until all workers have completed their work.
Options I'm aware of:
while (!threadPool.isTerminated()) {}
Modify the code like that:
final List futures = new ArrayList(NUMBER_OF_WORKERS);
final ExecutorService threadPool = Executors.newFixedThreadPool(NUMBER_OF_WORKERS);
for (int i=0; i < NUMBER_OF_WORKERS; i++)
{
final Worker worker = new Worker(...);
futures.add(threadPool.submit(worker));
}
for (final Future future : futures) {
future.get();
}
// When we arrive here, all workers are guaranteed to have completed their work.
What is the best practice to wait for the completion of all workers?
I would suggest you use a CountDownLatch (assuming this is a one-time activity). In its constructor you specify how many threads you want to wait for, and you share that instance across the threads. The waiting thread calls await() (either with a timeout or blocking indefinitely), and each worker thread calls countDown() when it is done.
Another option is to call join() on each thread to wait for its completion, if you have access to every thread you want to wait for.
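The CountDownLatch suggestion might look like this in practice. It is a sketch under assumptions: the lambda stands in for the question's Worker, and the counter only makes the result checkable.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class LatchSketch {

    // Runs the given number of workers in a pool and returns once all have counted down.
    static int runWorkers(int workers) throws InterruptedException {
        ExecutorService threadPool = Executors.newFixedThreadPool(workers);
        CountDownLatch done = new CountDownLatch(workers);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < workers; i++) {
            threadPool.execute(() -> {
                try {
                    completed.incrementAndGet(); // stand-in for the real work
                } finally {
                    done.countDown();            // count down even if the work throws
                }
            });
        }
        done.await();          // blocks until the count reaches zero
        threadPool.shutdown(); // the pool is no longer needed
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorkers(4) + " workers completed");
    }
}
```

Counting down in a finally block matters: if a worker threw before reaching countDown(), await() would never return.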
I would use ThreadPoolExecutor.invokeAll(Collection<? extends Callable<T>> tasks)
API: Executes the given tasks, returning a list of Futures holding their status and results when all complete
CountDownLatch, as stated above, would do the job well; just keep in mind that you want to shut down the executor after you're done:
final ExecutorService threadPool = Executors.newFixedThreadPool(
NUMBER_OF_WORKERS);
for (int i=0; i < NUMBER_OF_WORKERS; i++)
{
final Worker worker = new BirthWorker(...);
threadPool.execute(worker);
}
threadPool.shutdown();
Unless you shut it down, threadPool.isTerminated() will stay false, even when all the workers are done.
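That behaviour is easy to check. In the sketch below (empty tasks stand in for the workers) the futures are used only to be certain that all work is finished before isTerminated() is read for the first time:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ShutdownSketch {

    // Returns {isTerminated before shutdown, awaitTermination result, isTerminated afterwards}.
    static boolean[] observe() throws Exception {
        ExecutorService threadPool = Executors.newFixedThreadPool(4);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            futures.add(threadPool.submit(() -> { })); // empty stand-in for a worker
        }
        for (Future<?> f : futures) {
            f.get(); // every worker has finished after this loop
        }
        boolean beforeShutdown = threadPool.isTerminated(); // pool is idle but still running

        threadPool.shutdown();
        boolean awaited = threadPool.awaitTermination(1, TimeUnit.MINUTES);
        return new boolean[] { beforeShutdown, awaited, threadPool.isTerminated() };
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = observe();
        System.out.println("before shutdown: " + r[0] + ", awaited: " + r[1] + ", after: " + r[2]);
    }
}
```

Running main prints "before shutdown: false, awaited: true, after: true". Waiting with awaitTermination() after shutdown() also avoids the busy loop on isTerminated().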
