I have a piece of Java code that does the following:
1. Opens a file with data in the format {A,B,C}; each file has approximately 5,000,000 lines.
2. For each line in the file, calls a service that returns a column D and appends it to {A,B,C} as {A,B,C,D}.
3. Writes this entry into a chunked writer that groups 10,000 lines together and writes each chunk back to a remote location.
Right now the code takes 32 hours to execute. The process then has to be repeated for another file, which would presumably take another 32 hours, but we need these processes to run daily.
Step 2 is further complicated by the fact that sometimes the service does not have D yet. It is designed to fetch D from its super data store, so it throws a transient exception asking you to wait. We have retries to handle this, so an entry could be retried up to 5 times with a maximum delay of 60,000 ms. In the worst case we could be looking at 5,000,000 * 5 requests.
Each {A,B,C} combination is unique, so the result D can't be cached and reused; a fresh request has to be made for D every time.
I've tried adding threads like this:
temporaryFile = File.createTempFile(key, ".tmp");
Files.copy(stream, temporaryFile.toPath(),
        StandardCopyOption.REPLACE_EXISTING);
reader = new BufferedReader(new InputStreamReader(new
        FileInputStream(temporaryFile), StandardCharsets.UTF_8));
String entry;
while ((entry = reader.readLine()) != null) {
    final String finalEntry = entry;
    service.execute(() -> {
        try {
            processEntry(finalEntry);
        } catch (Exception e) {
            log.error("something");
        }
    });
    count++;
}
Here, the processEntry method encapsulates the implementation details explained above, and the thread pool is defined as:
ExecutorService service = Executors.newFixedThreadPool(10);
The problem I'm having is that the first set of threads spins up, but the process doesn't wait until all threads have finished their work and all 5,000,000 lines are complete. So the task that used to run for 32 hours now ends in under a minute, which messes up our system's state. Are there any alternative ways to do this? How can I make the process wait for all threads to complete?
If you want to take tasks as they complete, consider using an ExecutorCompletionService. It acts like a BlockingQueue of completed tasks that you can poll (or take from) as and when they finish.
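A minimal sketch of the completion-service approach, assuming each line can be wrapped in a Callable that returns the enriched line. The class name and `enrich` are hypothetical; `enrich` stands in for the real service call. The main thread calls take() once per submitted task, so it cannot move on until every task has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CompletionServiceSketch {

    // Hypothetical stand-in for the remote service call that produces column D.
    static String enrich(String line) {
        return line + ",D";
    }

    static List<String> processAll(List<String> lines, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            ExecutorCompletionService<String> ecs = new ExecutorCompletionService<>(pool);
            for (String line : lines) {
                ecs.submit(() -> enrich(line));
            }
            List<String> results = new ArrayList<>();
            // take() blocks until the next task finishes (in completion order),
            // so after lines.size() calls every submitted task is done.
            for (int i = 0; i < lines.size(); i++) {
                Future<String> done = ecs.take();
                results.add(done.get()); // rethrows the task's exception, if any
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Results come back in completion order, not file order, which is fine for a writer that only groups lines into chunks.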
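Another solution is to shut the executor down and then wait for it to terminate:
ExecutorService service = Executors.newFixedThreadPool(10);
// ... submit all the tasks here ...
service.shutdown();
// awaitTermination blocks instead of burning CPU like while (!service.isTerminated()) {}
while (!service.awaitTermination(1, TimeUnit.MINUTES)) {
    // still running; keep waiting
}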
One alternative is to use a latch to wait for all the tasks to complete before you shut down the executor on the main thread:
Initialize a CountDownLatch with 1.
After you exit the loop that submits the tasks, call latch.await().
In each task you start, call back into the starting class to let it know when the task has finished (in a finally block, so that a failing task is still counted).
Note that in the starting class the callback method has to be synchronized.
In the starting class, use this callback to count the completed tasks.
Also inside the callback, once all tasks have been submitted and have completed, call latch.countDown() so the main thread can continue with, say, shutting down the executor and exiting.
This shows the main concept; it can be implemented with more detail and more control over the completed tasks if necessary.
It would be something like this:
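import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.concurrent.*;
public class StartingClass {
    CountDownLatch latch = new CountDownLatch(1);
    ExecutorService service = Executors.newFixedThreadPool(10);
    Path stream;                  // the input file
    String key;                   // prefix for the temporary file
    int count = 0;                // tasks submitted, guarded by this
    int completed = 0;            // tasks finished, guarded by this
    boolean allSubmitted = false; // guarded by this
    public void runTheProcess() throws IOException, InterruptedException {
        File temporaryFile = File.createTempFile(key, ".tmp");
        Files.copy(stream, temporaryFile.toPath(),
                StandardCopyOption.REPLACE_EXISTING);
        // try-with-resources closes the reader when the loop ends
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new
                FileInputStream(temporaryFile), StandardCharsets.UTF_8))) {
            String entry;
            while ((entry = reader.readLine()) != null) {
                synchronized (this) {
                    count++;
                }
                service.execute(new Task(this, entry));
            }
        }
        synchronized (this) {
            allSubmitted = true;
            // covers the case where every task finished before the loop ended (or an empty file)
            if (completed == count) {
                latch.countDown();
            }
        }
        latch.await();
        service.shutdown();
    }
    // Deliberately NOT synchronized: a synchronized processEntry would let
    // only one thread call the service at a time.
    public void processEntry(String entry) {
    }
    public synchronized void taskCompleted() {
        completed++;
        // only release the main thread once every line has been submitted
        if (allSubmitted && completed == count) {
            latch.countDown();
        }
    }
    //This can be put in a different file.
    public static class Task implements Runnable {
        StartingClass startingClass;
        String finalEntry;
        public Task(StartingClass startingClass, String finalEntry) {
            this.startingClass = startingClass;
            this.finalEntry = finalEntry;
        }
        @Override
        public void run() {
            try {
                startingClass.processEntry(finalEntry);
            } catch (Exception e) {
                //log.error("something", e);
            } finally {
                // report completion even if processEntry threw, otherwise await() hangs
                startingClass.taskCompleted();
            }
        }
    }
}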
Note that you need to close the file (for example with try-with-resources). Also, shutting down the executor could be written to wait a few seconds before forcing a shutdown with shutdownNow().
The problem I'm having is the first set of threads spin up but the process doesn't wait until all threads finish their work and all 5000000 lines are complete.
When you run jobs using an ExecutorService, they are added to the service and run in the background. To wait for them to complete, you need to wait for the service to terminate:
ExecutorService service = Executors.newFixedThreadPool(10);
// submit jobs to the service here
// after the last job has been submitted, we immediately shutdown the service
service.shutdown();
// then we can wait for it to terminate as the jobs run in the background
service.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
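A self-contained sketch of this shutdown-then-await pattern applied to the line-reading loop from the question. `AwaitTerminationSketch` is a hypothetical name, the lambda body is a placeholder for processEntry, and the one-day timeout is an arbitrary upper bound.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AwaitTerminationSketch {

    // Reads every line, processes each one on the pool, and returns only after all are done.
    static int processAll(BufferedReader reader) throws IOException, InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(10);
        AtomicInteger processed = new AtomicInteger();
        String entry;
        while ((entry = reader.readLine()) != null) {
            final String line = entry;
            service.execute(() -> {
                // Hypothetical stand-in for processEntry(line): count non-blank lines.
                if (!line.trim().isEmpty()) {
                    processed.incrementAndGet();
                }
            });
        }
        // No new tasks are accepted from here on; already queued tasks still run.
        service.shutdown();
        // Block the calling thread until every queued task has finished.
        if (!service.awaitTermination(1, TimeUnit.DAYS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return processed.get();
    }
}
```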
Also, if there are a huge number of lines in these files, I would recommend using a bounded queue for the jobs, so that you don't run out of memory by effectively caching all of the lines of the file in the executor's queue. This only works if the files stay around while they are being processed.
// this is the same as a newFixedThreadPool(10) but with a queue of 100
ThreadPoolExecutor threadPool = new ThreadPoolExecutor(10, 10,
        0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(100));
// set a rejected execution handler so we block the caller once the queue is full
threadPool.setRejectedExecutionHandler(new RejectedExecutionHandler() {
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        try {
            // put() waits for space in the queue instead of rejecting the task
            executor.getQueue().put(r);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
});
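As a swapped-in alternative to the custom handler, the JDK ships ThreadPoolExecutor.CallerRunsPolicy: when the bounded queue is full, the submitting thread runs the task itself, which also slows down the reading loop. The sketch below is illustrative only; the class name and the 1 ms sleep (standing in for the service call) are assumptions.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoolSketch {

    static int run(int tasks) throws InterruptedException {
        // 10 threads and at most 100 waiting tasks; when the queue is full,
        // CallerRunsPolicy executes the task on the submitting thread.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(10, 10,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(100),
                new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(1); // simulated slow service call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.incrementAndGet();
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("pool did not finish in time");
        }
        return done.get();
    }
}
```

The trade-off: with CallerRunsPolicy the reader does some of the work instead of waiting, so a very slow task can briefly stall submission while pool threads sit idle; the blocking put() handler above avoids that.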
Write this entry into a chunkedwriter that eventually groups together 10000 lines to write back chunk to a remote location
As each {A,B,C} job finishes, if its result needs to be processed in a second step, I would also recommend looking into an ExecutorCompletionService, which lets you take each result as soon as it completes and hand it to a second thread pool, so finished lines immediately start the second phase of processing.
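A rough sketch of chaining two pools through two ExecutorCompletionService instances. The class name and the phase bodies (appending ",D" and wrapping in braces) are hypothetical placeholders for the service lookup and the second processing step, and the pool sizes are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TwoPhaseSketch {

    static List<String> run(List<String> lines) throws InterruptedException, ExecutionException {
        ExecutorService phase1 = Executors.newFixedThreadPool(4);
        ExecutorService phase2 = Executors.newFixedThreadPool(2);
        try {
            ExecutorCompletionService<String> enriched = new ExecutorCompletionService<>(phase1);
            ExecutorCompletionService<String> formatted = new ExecutorCompletionService<>(phase2);
            for (String line : lines) {
                enriched.submit(() -> line + ",D"); // phase 1: hypothetical service lookup
            }
            // As soon as any phase-1 result completes, hand it straight to phase 2.
            for (int i = 0; i < lines.size(); i++) {
                String withD = enriched.take().get();
                formatted.submit(() -> "{" + withD + "}"); // phase 2: hypothetical formatting
            }
            List<String> out = new ArrayList<>();
            for (int i = 0; i < lines.size(); i++) {
                out.add(formatted.take().get());
            }
            return out;
        } finally {
            phase1.shutdown();
            phase2.shutdown();
        }
    }
}
```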
If instead this chunkedWriter is just a single thread then I'd recommend sharing a BlockingQueue<Result> and having the executor threads put to the queue once the lines are done and the chunkedWriter taking from the queue and doing the chunking and writing of the results. In this situation, indicating to the writer thread that it is done needs to be handled carefully -- maybe with some sort of END_RESULT constant put to the queue by the main thread waiting for the service to terminate.
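A minimal sketch of that writer-thread design, assuming a hypothetical `ChunkedWriterSketch` class where adding a chunk to a list stands in for the remote write. The END_RESULT sentinel is compared by identity, and it is only put after the pool has terminated, so the writer is guaranteed to see it after every real result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ChunkedWriterSketch {

    // Sentinel compared by identity, so it can never collide with a real result.
    private static final String END_RESULT = new String("END_RESULT");

    static List<List<String>> run(List<String> lines, int chunkSize) throws InterruptedException {
        BlockingQueue<String> results = new LinkedBlockingQueue<>(1000);
        List<List<String>> writtenChunks = new ArrayList<>(); // stands in for the remote location

        Thread writer = new Thread(() -> {
            List<String> chunk = new ArrayList<>();
            try {
                while (true) {
                    String r = results.take();
                    if (r == END_RESULT) {
                        break;
                    }
                    chunk.add(r);
                    if (chunk.size() == chunkSize) {
                        writtenChunks.add(chunk); // hypothetical remote write
                        chunk = new ArrayList<>();
                    }
                }
                if (!chunk.isEmpty()) {
                    writtenChunks.add(chunk); // flush the last partial chunk
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService service = Executors.newFixedThreadPool(10);
        for (String line : lines) {
            service.execute(() -> {
                try {
                    results.put(line + ",D"); // hypothetical service call result
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        service.shutdown();
        if (!service.awaitTermination(1, TimeUnit.DAYS)) {
            throw new IllegalStateException("workers did not finish");
        }
        results.put(END_RESULT); // every worker is done, so this is the last element
        writer.join();           // join() also makes writtenChunks safely visible here
        return writtenChunks;
    }
}
```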
Related
I have 5 threads (5 instances of one Runnable class) starting at approximately the same time (using CyclicBarrier), and I need to stop them all as soon as one of them finishes.
Currently, I have a static volatile boolean field threadsOver that I set to true at the end of doSomething(), the method that run() calls.
private static final CyclicBarrier barrier = new CyclicBarrier(5);
private static volatile boolean threadsOver;
@Override
public void run() {
    try {
        /* waiting for all threads to have been initialised,
           so as to start them at the same time */
        barrier.await();
        doSomething();
    } catch (InterruptedException | BrokenBarrierException e) {
        e.printStackTrace();
    }
}
public void doSomething() {
    // while something AND if the threads are not over yet
    while (someCondition && !threadsOver) {
        // some lines of code
    }
    // if the threads are not over yet, it means I'm the first one to finish
    if (!threadsOver) {
        // so I'm telling the other threads to stop
        threadsOver = true;
    }
}
The problem with that code is that doSomething() executes too fast: by the time the first thread to finish sets threadsOver, the other threads have already run to completion as well.
I tried adding some delay in doSomething() using Thread.sleep(), which reduced the number of threads that still finished after the first one, but sometimes 2 or 3 threads still run all the way to the end.
How could I make sure that when one thread is finished, all of the others don't execute all the way to the end?
First, here is where I copied the code snippets from: https://www.baeldung.com/java-executor-service-tutorial.
As you have 5 tasks, each of which can produce the result, I prefer Callable, but a Runnable with a side effect is handled in the same way.
The almost simultaneous start, the Future aspect, and picking the first result can all be done with invokeAny:
Callable<Integer> callable1 = () -> {
    return 1*2*3*5*7/5;
};
List<Callable<Integer>> callables = List.of(callable1, callable2, ...);
ExecutorService executorService = Executors.newFixedThreadPool(5);
Integer result = executorService.invokeAny(callables);
executorService.shutdown();
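invokeAny() assigns a collection of tasks to an ExecutorService, causing each to run, and returns the result of one task that completed successfully (if any did; otherwise it throws an ExecutionException). Tasks that have not completed when it returns are cancelled, so your tasks should respond to interruption if they are to actually stop early.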
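A sketch of invokeAny applied to the question's "stop the others once one finishes" requirement. The class name, the `task` helper, and its step counts are hypothetical; the important part is that each task checks for interruption, because invokeAny stops the losing tasks by cancelling (interrupting) them, and code that ignores interruption keeps running.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FirstFinisherSketch {

    // Hypothetical task: performs `steps` small units of work, checking for
    // cancellation on each step so that invokeAny can stop it once another task wins.
    static Callable<Integer> task(int id, long steps) {
        return () -> {
            for (long i = 0; i < steps; i++) {
                if (Thread.currentThread().isInterrupted()) {
                    throw new InterruptedException("task " + id + " cancelled");
                }
                Thread.sleep(1); // simulated unit of work (also responds to interruption)
            }
            return id;
        };
    }

    static int firstResult() throws InterruptedException, ExecutionException {
        ExecutorService executorService = Executors.newFixedThreadPool(5);
        try {
            List<Callable<Integer>> callables = List.of(
                    task(1, 5_000), task(2, 5_000), task(3, 5), task(4, 5_000), task(5, 5_000));
            // Returns the first successful result and cancels the remaining tasks.
            return executorService.invokeAny(callables);
        } finally {
            executorService.shutdown();
        }
    }
}
```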
So my goal is to measure the performance of a Streaming Engine. It's basically a library to which I can send data packages. The idea is to generate data, put it into a queue, and let the Streaming Engine grab the data and process it.
I thought of implementing it like this: the Data Generator runs in a thread and generates data packages in an endless loop, with a Thread.sleep(X) at the end of each iteration. During the tests, the idea is to reduce this Thread.sleep(X) to see whether it has an impact on the Streaming Engine's performance. The Data Generator writes the created packages into a queue, namely a ConcurrentLinkedQueue, which is also a singleton.
In another thread I instantiate the Streaming Engine, which continuously removes packages from the queue by calling queue.remove(). This is done in an endless loop without any sleeping, because it should be done as fast as possible.
In my first attempt at implementing this I ran into a problem: the Data Generator does not seem to be able to put packages into the queue as fast as it should; it is too slow. My suspicion is that the endless loop in the Streaming Engine thread is eating up all the resources and therefore slows down everything else.
I would be grateful for advice on how to approach this issue, or for other design patterns that could solve it elegantly.
The requirements are: basically 2 threads running in parallel. One puts data into a queue; the other reads/removes from the queue. I want to measure the size of the queue regularly to know whether the engine reading/removing from the queue is fast enough to process the generated packages.
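You can use a BlockingQueue, for example an ArrayBlockingQueue. You can initialize it with a fixed capacity, so the number of queued items never exceeds that number. put() blocks the producer while the queue is full and take() blocks the consumer while it is empty, so neither thread spins. For example: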
// create queue, max size 100
final ArrayBlockingQueue<String> strings = new ArrayBlockingQueue<>(100);
final String stop = "STOP";
// start producing
Runnable producer = new Runnable() {
    @Override
    public void run() {
        try {
            for (int i = 0; i < 1000; i++) {
                strings.put(Integer.toHexString(i));
            }
            strings.put(stop);
        } catch (InterruptedException ignore) {
        }
    }
};
Thread producerThread = new Thread(producer);
producerThread.start();
// start monitoring
Runnable monitor = new Runnable() {
    @Override
    public void run() {
        try {
            while (true) {
                System.out.println("Queue size: " + strings.size());
                Thread.sleep(5);
            }
        } catch (InterruptedException ignore) {
        }
    }
};
Thread monitorThread = new Thread(monitor);
monitorThread.start();
// start consuming
Runnable consumer = new Runnable() {
    @Override
    public void run() {
        // loops until the stop marker is taken from the queue
        try {
            while (true) {
                String value = strings.take();
                if (value.equals(stop)) {
                    return;
                }
                System.out.println(value);
            }
        } catch (InterruptedException ignore) {
        }
    }
};
Thread consumerThread = new Thread(consumer);
consumerThread.start();
// wait for producer and consumer to finish
producerThread.join();
consumerThread.join();
// the monitor loops forever, so interrupt it
monitorThread.interrupt();
The third (monitor) thread tracks the size of the queue, which gives you an idea of which thread is outpacing the other.
Also, you can use the timed variants, offer(e, timeout, unit) for inserting and poll(timeout, unit) for removing, or the untimed offer(e) and poll(), which give you more control over what to do when the queue is full or empty. In the above example, put() waits until there is space for the next element, and take() waits until there is an element in the queue.
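A single-threaded sketch of the timed variants, with hypothetical class and helper names: offer(e, timeout, unit) returns false if no space frees up in time, and poll(timeout, unit) returns null if nothing arrives in time, so the caller decides what to do instead of blocking indefinitely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class TimedQueueSketch {

    // Timed insertion: returns how many items were accepted.
    static int fill(BlockingQueue<String> queue, int items) throws InterruptedException {
        int accepted = 0;
        for (int i = 0; i < items; i++) {
            // Waits up to 10 ms for space, then gives up instead of blocking forever.
            if (queue.offer("item-" + i, 10, TimeUnit.MILLISECONDS)) {
                accepted++;
            }
        }
        return accepted;
    }

    // Timed removal: stops once nothing arrives within the timeout.
    static List<String> drain(BlockingQueue<String> queue) throws InterruptedException {
        List<String> out = new ArrayList<>();
        String item;
        while ((item = queue.poll(10, TimeUnit.MILLISECONDS)) != null) {
            out.add(item);
        }
        return out;
    }
}
```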
We have a scheduled task that runs every 10 seconds and a thread pool with 3 threads that update a static shared map. Every 10 seconds the scheduled action prints this map.
The problem is that I want the scheduler to stop printing after the 3 threads have finished with the map. But here is the key: I don't want to stop the scheduler instantly; I want it to print first (the final version of the map) and then finish.
public class myClass implements ThreadListener {
    public static ArrayList<Pair<String, Integer>> wordOccurenceSet = new ArrayList<Pair<String, Integer>>();
    int numberOfThreads = 0;
    ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
    public void getAnswer(Collection<CharacterReader> characterReaders, Outputter outputter) {
        ExecutorService executor = Executors.newFixedThreadPool(characterReaders.size());
        OutputterWriteBatch scheduledThread = new OutputterWriteBatch(outputter, wordOccurenceSet);
        scheduler.scheduleAtFixedRate(scheduledThread, 10, 10, TimeUnit.SECONDS);
        for (CharacterReader characterReader : characterReaders) {
            NotifyingRunnable runnable = new CharacterReaderTask(characterReader, wordOccurenceSet);
            runnable.addListener(this);
            executor.execute(runnable);
        }
    }
    @Override
    public void notifyRunnableComplete(Runnable runnable) {
        numberOfThreads += 1;
        if (numberOfThreads == 3) {
            //All threads finished... What can I do to terminate after one more run?
        }
    }
}
The listener just gets notified when a thread finishes.
First of all, make the updates to numberOfThreads synchronized. You don't want the count to become corrupted when two reader threads finish concurrently: even for a primitive int, numberOfThreads += 1 is a read-modify-write rather than an atomic operation, so concurrent updates can be lost. A synchronized method (or an AtomicInteger) fixes that.
// 1. stop further periodic runs; an OutputterWriteBatch run already in progress may finish
scheduler.shutdown();
// 2. block and wait if OutputterWriteBatch was running at that moment
scheduler.awaitTermination(someReasonableTimeout, TimeUnit.SECONDS);
// 3. one more shot. A shut-down scheduler rejects new tasks
// (scheduler.schedule(...) would throw RejectedExecutionException),
// so run the final print directly on the current thread.
scheduledThread.run();
But this shouldn't be called directly from notifyRunnableComplete, of course. The listener method is called from your reader threads, so it would keep the last of the 3 threads from finishing in a timely manner. Instead, make a notification object that some other thread will wait() on (preferably the one that executed getAnswer()), notify() it when numberOfThreads reaches 3, and put the above code after the wait().
Also, when wait() unblocks, you should double-check that numberOfThreads really is 3 and, if not, go back to wait(). Search for "spurious wakeup" for an explanation of why this is needed.
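A minimal sketch of that hand-off with a guarded wait() loop. `FinalPrintSketch` and its members are hypothetical; the reader tasks are empty placeholders, a counter stands in for OutputterWriteBatch, and the 10 ms period is only there to keep the example fast. Because a shut-down scheduler rejects new tasks, the final print runs directly on the waiting thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FinalPrintSketch {

    private final Object lock = new Object();
    private int finishedReaders = 0; // guarded by lock
    final AtomicInteger prints = new AtomicInteger();

    // Called from each reader thread when it is done; it only notifies, never blocks.
    void notifyRunnableComplete() {
        synchronized (lock) {
            finishedReaders++;
            lock.notifyAll();
        }
    }

    void getAnswer(int readers) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        Runnable printer = prints::incrementAndGet; // stands in for OutputterWriteBatch
        scheduler.scheduleAtFixedRate(printer, 10, 10, TimeUnit.MILLISECONDS);

        ExecutorService executor = Executors.newFixedThreadPool(readers);
        for (int i = 0; i < readers; i++) {
            executor.execute(() -> {
                // ... read characters and update the shared map ...
                notifyRunnableComplete();
            });
        }
        executor.shutdown();

        synchronized (lock) {
            // A loop, not an if: guards against spurious wakeups.
            while (finishedReaders < readers) {
                lock.wait();
            }
        }
        scheduler.shutdown();                            // no further periodic runs
        scheduler.awaitTermination(1, TimeUnit.MINUTES); // let a run in progress finish
        printer.run();                                   // final print of the finished map
    }
}
```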
I have a multithreaded execution and I want to track and print out the execution time, but when I execute the code, the child threads take longer than the main thread, so the log output is either not visible or does not show the right value, since main finishes earlier.
Here is the code:
public static void main(String[] args) throws CorruptIndexException, IOException, LangDetectException, InterruptedException {
/* Initialization */
long startingTime = System.currentTimeMillis();
Indexer main = new Indexer(); // this class extends Thread
File file = new File(SITES_PATH);
main.addFiles(file);
/* Multithreading through ExecutorService */
ExecutorService es = Executors.newFixedThreadPool(4);
for (File f : main.queue) {
Indexer ind = new Indexer(main.writer, main.identificatore, f);
ind.join();
es.submit(ind);
}
es.shutdown();
/* log creation - code I want to execute when all the threads execution ended */
long executionTime = System.currentTimeMillis()-startingTime;
long minutes = TimeUnit.MILLISECONDS.toMinutes(executionTime);
long seconds = TimeUnit.MILLISECONDS.toSeconds(executionTime)%60;
String fileSize = sizeConversion(FileUtils.sizeOf(file));
Object[] array = {fileSize,minutes,seconds};
logger.info("{} indexed in {} minutes and {} seconds.",array);
}
I tried several solutions such as join(), wait() and notifyAll(), but none of them worked.
I found this Q&A on Stack Overflow that addresses my problem, but join() is ignored, and if I put
es.awaitTermination(timeout, TimeUnit.SECONDS);
the executor service never actually executes the threads.
How can I run the work in parallel only inside the ExecutorService block and finish the main execution at the end?
Given your use case, you might as well use the invokeAll method. From the Javadoc:
Executes the given tasks, returning a list of Futures holding their
status and results when all complete. Future.isDone() is true for each
element of the returned list. Note that a completed task could have
terminated either normally or by throwing an exception. The results of
this method are undefined if the given collection is modified while
this operation is in progress.
To use:
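// Indexer extends Thread, so it is a Runnable; Executors.callable adapts it for invokeAll
final Collection<Callable<Object>> tasks = new ArrayList<Callable<Object>>();
for (final File f : main.queue) {
    tasks.add(Executors.callable(new Indexer(main.writer, main.identificatore, f)));
}
final ExecutorService es = Executors.newFixedThreadPool(4);
final List<Future<Object>> results = es.invokeAll(tasks);
es.shutdown();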
This will execute all supplied tasks and wait for them to finish processing before proceeding on your main thread. You will need to tweak the code to fit your particular needs, but you get the gist. A quick note, there is a variant of the invokeAll method that accepts timeout parameters. Use that variant if you want to wait up to a maximum amount of time before proceeding. And make sure to check the results collected after the invokeAll is done, in order to verify the status of the completed tasks.
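A self-contained sketch of the same idea that also measures the elapsed time, as the question wants. `InvokeAllTimingSketch` is a hypothetical name; Executors.callable adapts plain Runnables (such as a Thread subclass like Indexer) into the Callables that invokeAll expects.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class InvokeAllTimingSketch {

    static long runAll(List<Runnable> jobs) throws InterruptedException, ExecutionException {
        long start = System.currentTimeMillis();
        Collection<Callable<Object>> tasks = new ArrayList<>();
        for (Runnable job : jobs) {
            tasks.add(Executors.callable(job)); // adapts a Runnable into a Callable<Object>
        }
        ExecutorService es = Executors.newFixedThreadPool(4);
        try {
            // Blocks until every task has completed, normally or exceptionally.
            List<Future<Object>> results = es.invokeAll(tasks);
            for (Future<Object> f : results) {
                f.get(); // rethrows (wrapped in ExecutionException) if a task failed
            }
        } finally {
            es.shutdown();
        }
        // Only reached once all tasks are done, so the timing covers all of them.
        long executionTime = System.currentTimeMillis() - start;
        System.out.printf("indexed in %d minutes and %d seconds%n",
                TimeUnit.MILLISECONDS.toMinutes(executionTime),
                TimeUnit.MILLISECONDS.toSeconds(executionTime) % 60);
        return executionTime;
    }
}
```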
Good luck.
The ExecutorService#submit() method returns a Future object, which can be used for waiting until the submitted task has completed.
The idea is that you collect all of these Futures, and then call get() on each of them. This ensures that all of the submitted tasks have completed before your main thread continues.
Something like this:
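ExecutorService es = Executors.newFixedThreadPool(4);
List<Future<?>> futures = new ArrayList<Future<?>>();
for (File f : main.queue) {
    Indexer ind = new Indexer(main.writer, main.identificatore, f);
    // no ind.join() here: the pool runs ind, so the Thread itself is never started
    Future<?> future = es.submit(ind);
    futures.add(future);
}
// wait for all tasks to complete; get() also rethrows a task's exception
// wrapped in an ExecutionException, which main must declare or catch
for (Future<?> f : futures) {
    f.get();
}
// shut down the thread pool, then carry on working in the main thread...
es.shutdown();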
I have a .csv file containing over 70 million lines; each line generates a Runnable that is executed by a thread pool. This Runnable inserts a record into MySQL.
What's more, I want to record a position in the csv file for the RandomAccessFile to seek to. The position is written to a file. I want to write this record when all the threads in the thread pool have finished, so ThreadPoolExecutor.shutdown() is invoked. But when more lines come, I need a thread pool again. How can I reuse the current thread pool instead of making a new one?
The code is as follows:
public static boolean processPage() throws Exception {
long pos = getPosition();
long start = System.currentTimeMillis();
raf.seek(pos);
if(pos==0)
raf.readLine();
for (int i = 0; i < PAGESIZE; i++) {
String lineStr = raf.readLine();
if (lineStr == null)
return false;
String[] line = lineStr.split(",");
final ExperienceLogDO log = CsvExperienceLog.generateLog(line);
//System.out.println("userId: "+log.getUserId()%512);
pool.execute(new Runnable(){
public void run(){
try {
experienceService.insertExperienceLog(log);
} catch (BaseException e) {
e.printStackTrace();
}
}
});
long end = System.currentTimeMillis();
}
BufferedWriter resultWriter = new BufferedWriter(
new OutputStreamWriter(new FileOutputStream(new File(
RESULT_FILENAME), true)));
resultWriter.write("\n");
resultWriter.write(String.valueOf(raf.getFilePointer()));
resultWriter.close();
long time = System.currentTimeMillis()-start;
System.out.println(time);
return true;
}
Thanks!
As stated in the documentation, you cannot reuse an ExecutorService that has been shut down. I'd recommend against any workarounds, since (a) they may not work as expected in all situations; and (b) you can achieve what you want using standard classes.
You must either
instantiate a new ExecutorService; or
not terminate the ExecutorService.
The first solution is easily implemented, so I won't detail it.
For the second, since you want to execute an action once all the submitted tasks have finished, you might take a look at ExecutorCompletionService and use it instead. It wraps an ExecutorService which will do the thread management, but the runnables will get wrapped into something that will tell the ExecutorCompletionService when they have finished, so it can report back to you:
ExecutorService executor = ...;
ExecutorCompletionService<Object> ecs = new ExecutorCompletionService<Object>(executor);
for (int i = 0; i < totalTasks; i++) {
    ... ecs.submit(...); ...
}
for (int i = 0; i < totalTasks; i++) {
    ecs.take();
}
The method take() on the ExecutorCompletionService class will block until a task has finished (either normally or abruptly). It will return a Future, so you can check the results if you wish.
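A sketch of keeping one pool alive across pages, assuming a hypothetical `PagedReuseSketch` whose `insert` method stands in for experienceService.insertExperienceLog and whose `savedPosition` field stands in for writing raf.getFilePointer() to the result file. A fresh ExecutorCompletionService is cheap to create per page, while the underlying pool is shut down only once, at the very end.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class PagedReuseSketch {

    private final ExecutorService pool = Executors.newFixedThreadPool(4); // created once
    final AtomicInteger inserted = new AtomicInteger();
    long savedPosition = -1;

    // Hypothetical stand-in for experienceService.insertExperienceLog(...)
    private void insert(String line) {
        inserted.incrementAndGet();
    }

    // Processes one page and returns only after every insert of that page has finished.
    void processPage(List<String> page, long positionAfterPage)
            throws InterruptedException, ExecutionException {
        ExecutorCompletionService<Void> ecs = new ExecutorCompletionService<>(pool);
        for (String line : page) {
            ecs.submit(() -> {
                insert(line);
                return null;
            });
        }
        for (int i = 0; i < page.size(); i++) {
            ecs.take().get(); // rethrows if an insert failed
        }
        savedPosition = positionAfterPage; // safe to record: the whole page is done
    }

    void close() {
        pool.shutdown(); // only once, after the last page
    }
}
```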
I hope this can help you, since I didn't completely understand your problem.
Create and group all tasks and submit them to the pool with invokeAll, which only returns once all tasks have completed (either normally or by throwing an exception).
After calling shutdown on an ExecutorService, no new tasks will be accepted. This means you have to create a new ExecutorService for each round of tasks.
However, Java 8 introduced ForkJoinPool.awaitQuiescence. If you can switch from a normal ExecutorService to a ForkJoinPool, you can use this method to wait until no more tasks are running in the pool without having to call shutdown. This means you can fill a ForkJoinPool with tasks, wait until it is empty (quiescent), then begin filling it with tasks again, and so on.
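A minimal sketch of that fill / wait / refill cycle, assuming the work per line can be submitted as a plain Runnable; the class name and the counter are hypothetical. awaitQuiescence returns true once the pool is quiescent and false if the timeout elapses first.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class QuiescenceSketch {

    static int runRounds(int rounds, int tasksPerRound) {
        ForkJoinPool pool = new ForkJoinPool(4);
        AtomicInteger done = new AtomicInteger();
        try {
            for (int r = 0; r < rounds; r++) {
                for (int i = 0; i < tasksPerRound; i++) {
                    pool.execute(done::incrementAndGet); // hypothetical per-line work
                }
                // Wait until no tasks are queued or running; the pool stays usable.
                if (!pool.awaitQuiescence(1, TimeUnit.MINUTES)) {
                    throw new IllegalStateException("round " + r + " timed out");
                }
                // ... record the file position for this round here ...
            }
        } finally {
            pool.shutdown();
        }
        return done.get();
    }
}
```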