I'm using Karate to test a REST API, and I'm now trying to run the feature files in parallel:
@CucumberOptions(tags = { "@someTest" })
public class ParallelTest {
@Test
public void testParallel() {
KarateStats stats = CucumberRunner.parallel(getClass(), 5,
"target/surefire-reports/cucumber-html-reports");
Assert.assertTrue(stats.getFailCount() == 0, "scenarios failed");
}
}
The test runs only 3 feature files in parallel instead of all 5.
I took this code from the CucumberRunner.parallel function:
CucumberRunner runner = new CucumberRunner(this.getClass());
List<FeatureFile> featureFiles = runner.getFeatureFiles();
I then tried loading my feature files with it. The list size is 3, which means the function did not load all the features.
Any idea why this is happening?
Note: all feature files are under the same package.
The parallel() function code:
public static KarateStats parallel(Class clazz, int threadCount, String reportDir) {
KarateStats stats = KarateStats.startTimer();
ExecutorService executor = Executors.newFixedThreadPool(threadCount);
CucumberRunner runner = new CucumberRunner(clazz);
List<FeatureFile> featureFiles = runner.getFeatureFiles();
List<Callable<KarateJunitFormatter>> callables = new ArrayList<>(featureFiles.size());
int count = featureFiles.size();
for (int i = 0; i < count; i++) {
int index = i + 1;
FeatureFile featureFile = featureFiles.get(i);
callables.add(() -> {
String threadName = Thread.currentThread().getName();
KarateJunitFormatter formatter = getFormatter(reportDir, featureFile);
logger.info(">>>> feature {} of {} on thread {}: {}", index, count, threadName, featureFile.feature.getPath());
runner.run(featureFile, formatter);
logger.info("<<<< feature {} of {} on thread {}: {}", index, count, threadName, featureFile.feature.getPath());
formatter.done();
return formatter;
});
}
try {
List<Future<KarateJunitFormatter>> futures = executor.invokeAll(callables);
stats.stopTimer();
for (Future<KarateJunitFormatter> future : futures) {
KarateJunitFormatter formatter = future.get();
stats.addToTestCount(formatter.getTestCount());
stats.addToFailCount(formatter.getFailCount());
stats.addToSkipCount(formatter.getSkipCount());
stats.addToTimeTaken(formatter.getTimeTaken());
if (formatter.isFail()) {
stats.addToFailedList(formatter.getFeaturePath());
}
}
stats.printStats(threadCount);
return stats;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Thanks :)
The simplest explanation is that the tags in the @CucumberOptions annotation are having an effect. Try commenting it out and running again. Otherwise, there is nothing more I can make out from the information you have provided.
I had existing non-parallel code that we recently made concurrent using an executor service. Adding concurrency lets us limit the number of requests sent to another API in our scenario: we call an external service in limited batches, wait for all the requests to complete, and then merge the responses before sending the final response.
I am stuck on how to add a unit test or mock test for such code, given that the private method is parallelized. I have added my code structure below to explain the situation.
This is what I am trying to test:
@Test
public void processRequest() {
...
}
Code
int MAX_BULK_SUBREQUEST_SIZE = 10;
public void processRequest() {
...
// call to private method
ResponseList responseList = sendRequest(requestList);
}
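private List<Response> sendRequest(List<..> requestList) {
// synchronized wrapper: worker threads append to this list concurrently
List<Response> responseList = Collections.synchronizedList(new ArrayList<>());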
ExecutorService executorService = Executors.newFixedThreadPool(10);
int numOfSubRequests = requestList.size();
for (int i = 0; i < numOfSubRequests; i += MAX_BULK_SUBREQUEST_SIZE) {
List<Request> requestChunk;
if (i + MAX_BULK_SUBREQUEST_SIZE >= numOfSubRequests) {
requestChunk = requestList.subList(i, numOfSubRequests);
} else {
requestChunk = requestList.subList(i, i + MAX_BULK_SUBREQUEST_SIZE);
}
// parallelization
executorService.submit(() -> {
Response responseChunk = null;
try {
responseChunk = callService(requestChunk); // private method
} catch (XYZException e) {
...
try {
throw new Exception("Internal Server Error");
} catch (Exception ex) {
...
}
}
responseList.add(responseChunk);
});
}
executorService.shutdown();
try {
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
} catch (InterruptedException e) {..}
return responseList;
}
private Response callService(..) {
// call to public method1
method1(..);
// call to public method2
method2(..);
}
I was able to do this with unit tests by adding a Mockito verify on how many times a method is called. If the code runs in parallel after chunking, the method is called once per chunk, so the call count equals the number of chunks processed.
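The call-counting idea can be sketched without a mocking library. The following is only an illustration under assumptions: ChunkedSender, its Function-based service parameter and countServiceCalls are hypothetical names that are not in the original code, and a hand-written counting stub stands in for the Mockito mock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for the class under test; all names are illustrative.
class ChunkedSender {
    static final int MAX_BULK_SUBREQUEST_SIZE = 10;

    // The external service call is injected so a test can replace it with a stub.
    private final Function<List<String>, String> service;

    ChunkedSender(Function<List<String>, String> service) {
        this.service = service;
    }

    List<String> sendRequest(List<String> requests) {
        List<String> responses = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < requests.size(); i += MAX_BULK_SUBREQUEST_SIZE) {
            int end = Math.min(i + MAX_BULK_SUBREQUEST_SIZE, requests.size());
            List<String> chunk = requests.subList(i, end);
            pool.execute(() -> responses.add(service.apply(chunk)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return responses;
    }

    // Runs the sender against a counting stub and reports how often the stub was called.
    static int countServiceCalls(int requestCount) {
        AtomicInteger calls = new AtomicInteger();
        ChunkedSender sender = new ChunkedSender(chunk -> {
            calls.incrementAndGet();
            return "ok:" + chunk.size();
        });
        List<String> requests = new ArrayList<>();
        for (int i = 0; i < requestCount; i++) {
            requests.add("r" + i);
        }
        sender.sendRequest(requests);
        return calls.get();
    }

    public static void main(String[] args) {
        // 25 requests with a chunk size of 10 give chunks of 10, 10 and 5.
        System.out.println("service calls: " + countServiceCalls(25));
    }
}
```

With Mockito the same check would typically be a verify with times(expectedChunks) on the mocked collaborator; the counter here plays that role.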
I tried to write code that counts the files of a certain type on my computer.
I tested both a single-threaded solution and a multi-threaded asynchronous solution, and the single-threaded one seems to be faster. Is anything wrong with my code? If not, why isn't the asynchronous version faster?
The code is below:
AsynchFileCounter - the asynchronous version.
ExtensionFilter - the file filter that lists only directories and files with the specified extension.
BasicFileCounter - the single-threaded version.
public class AsynchFileCounter {
public int countFiles(String path, String extension) throws InterruptedException, ExecutionException {
ExtensionFilter filter = new ExtensionFilter(extension, true);
File f = new File(path);
return countFilesRecursive(f, filter);
}
private int countFilesRecursive(File f, ExtensionFilter filter) throws InterruptedException, ExecutionException {
return CompletableFuture.supplyAsync(() -> f.listFiles(filter))
.thenApplyAsync(files -> {
int count = 0;
for (File file : files) {
if(file.isFile())
count++;
else
try {
count += countFilesRecursive(file, filter);
} catch (Exception e) {
e.printStackTrace();
}
}
return count;
}).get();
}
}
public class ExtensionFilter implements FileFilter {
private String extension;
private boolean allowDirectories;
public ExtensionFilter(String extension, boolean allowDirectories) {
if(extension.startsWith("."))
extension = extension.substring(1);
this.extension = extension;
this.allowDirectories = allowDirectories;
}
@Override
public boolean accept(File pathname) {
if(pathname.isFile() && pathname.getName().endsWith("." + extension))
return true;
if(allowDirectories) {
if(pathname.isDirectory())
return true;
}
return false;
}
}
public class BasicFileCounter {
public int countFiles(String path, String extension) {
ExtensionFilter filter = new ExtensionFilter(extension, true);
File f = new File(path);
return countFilesRecursive(f, filter);
}
private int countFilesRecursive(File f, ExtensionFilter filter) {
int count = 0;
File [] ar = f.listFiles(filter);
for (File file : ar) {
if(file.isFile())
count++;
else
count += countFilesRecursive(file, filter);
}
return count;
}
}
You have to spawn multiple asynchronous jobs and must not wait immediately for their completion:
public int countFiles(String path, String extension) {
ExtensionFilter filter = new ExtensionFilter(extension, true);
File f = new File(path);
return countFilesRecursive(f, filter).join();
}
private CompletableFuture<Integer> countFilesRecursive(File f, FileFilter filter) {
return CompletableFuture.supplyAsync(() -> f.listFiles(filter))
.thenCompose(files -> {
if(files == null) return CompletableFuture.completedFuture(0);
int count = 0;
CompletableFuture<Integer> fileCount = new CompletableFuture<>(), all=fileCount;
for (File file : files) {
if(file.isFile())
count++;
else
all = countFilesRecursive(file, filter).thenCombine(all, Integer::sum);
}
fileCount.complete(count);
return all;
});
}
Note that File.listFiles may return null.
This code will count all files of a directory immediately but launch a new asynchronous job for sub-directories. The results of the sub-directory jobs are combined via thenCombine, to sum their results. For simplification, we create another CompletableFuture, fileCount to represent the locally counted files. thenCompose returns a future which will be completed with the result of the future returned by the specified function, so the caller can use join() to wait for the final result of the entire operation.
For I/O operations, it may help to use a different thread pool, as the default ForkJoinPool is configured to utilize the CPU cores rather than the I/O bandwidth:
public int countFiles(String path, String extension) {
ExecutorService es = Executors.newFixedThreadPool(30);
ExtensionFilter filter = new ExtensionFilter(extension, true);
File f = new File(path);
int count = countFilesRecursive(f, filter, es).join();
es.shutdown();
return count;
}
private CompletableFuture<Integer> countFilesRecursive(File f,FileFilter filter,Executor e){
return CompletableFuture.supplyAsync(() -> f.listFiles(filter), e)
.thenCompose(files -> {
if(files == null) return CompletableFuture.completedFuture(0);
int count = 0;
CompletableFuture<Integer> fileCount = new CompletableFuture<>(), all=fileCount;
for (File file : files) {
if(file.isFile())
count++;
else
all = countFilesRecursive(file, filter,e).thenCombine(all,Integer::sum);
}
fileCount.complete(count);
return all;
});
}
There is no single best number of threads; it depends on the actual execution environment and should be determined by measuring and tuning. When the application is supposed to run in different environments, this should be a configurable parameter.
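One way to make the pool size configurable is a system property with a fallback. This is a minimal sketch: the class PoolConfig and the property name filecounter.threads are made up for illustration and are not part of the answer's code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolConfig {
    // Hypothetical property name, chosen only for this sketch.
    static final String THREADS_PROPERTY = "filecounter.threads";

    // Reads the pool size from -Dfilecounter.threads=N and falls back to the
    // default when the property is missing, malformed, or not positive.
    static int threadCount(int defaultValue) {
        Integer configured = Integer.getInteger(THREADS_PROPERTY);
        return (configured != null && configured > 0) ? configured : defaultValue;
    }

    public static void main(String[] args) {
        int size = threadCount(30);
        ExecutorService es = Executors.newFixedThreadPool(size);
        System.out.println("pool size: " + size);
        es.shutdown();
    }
}
```

Running with java -Dfilecounter.threads=8 PoolConfig would then size the pool at 8 without recompiling.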
But consider that you might be using the wrong tool for the job. An alternative is Fork/Join tasks, which can interact with the thread pool to determine its current saturation, so once all worker threads are busy, a task proceeds to scan locally with ordinary recursion rather than submitting more asynchronous jobs:
public int countFiles(String path, String extension) {
ExtensionFilter filter = new ExtensionFilter(extension, true);
File f = new File(path);
return POOL.invoke(new FileCountTask(f, filter));
}
private static final int TARGET_SURPLUS = 3, TARGET_PARALLELISM = 30;
private static final ForkJoinPool POOL = new ForkJoinPool(TARGET_PARALLELISM);
static final class FileCountTask extends RecursiveTask<Integer> {
private final File path;
private final FileFilter filter;
public FileCountTask(File file, FileFilter ff) {
this.path = file;
this.filter = ff;
}
@Override
protected Integer compute() {
return scan(path, filter);
}
private static int scan(File directory, FileFilter filter) {
File[] fileList = directory.listFiles(filter);
if(fileList == null || fileList.length == 0) return 0;
List<FileCountTask> recursiveTasks = new ArrayList<>();
int count = 0;
for(File file: fileList) {
if(file.isFile()) count++;
else {
if(getSurplusQueuedTaskCount() < TARGET_SURPLUS) {
FileCountTask task = new FileCountTask(file, filter);
recursiveTasks.add(task);
task.fork();
}
else count += scan(file, filter);
}
}
for(int ix = recursiveTasks.size() - 1; ix >= 0; ix--) {
FileCountTask task = recursiveTasks.get(ix);
if(task.tryUnfork()) task.complete(scan(task.path, task.filter));
}
for(FileCountTask task: recursiveTasks) {
count += task.join();
}
return count;
}
}
I figured it out. Since I am adding up the results in this line:
count += countFilesRecursive(file, filter);
and using get() to receive the result, I am actually waiting for each result instead of really parallelising the code.
This is my current code, which runs much faster than the single-threaded code. However, I could not figure out an elegant way of knowing when the parallel method is done.
How should I solve that?
Here's the ugly way I am using:
public class AsynchFileCounter {
private LongAdder count;
public int countFiles(String path, String extension) {
count = new LongAdder();
ExtensionFilter filter = new ExtensionFilter(extension, true);
File f = new File(path);
countFilesRecursive(f, filter);
// ******** The way I check whether The function is done **************** //
int prev = 0;
int cur = 0;
do {
prev = cur;
try {
Thread.sleep(50);
} catch (InterruptedException e) {}
cur = (int)count.sum();
} while(cur>prev);
// ******************************************************************** //
return count.intValue();
}
private void countFilesRecursive(File f, ExtensionFilter filter) {
CompletableFuture.supplyAsync(() -> f.listFiles(filter))
.thenAcceptAsync(files -> {
for (File file : files) {
if(file.isFile())
count.increment();
else
countFilesRecursive(file, filter);
}
});
}
}
I made some changes to the code:
I use AtomicInteger to count the files instead of LongAdder.
After reading Holger's answer, I decided to count the directories being processed. When that number drops to zero, the work is done, so I added a lock and a condition to let the main thread know when that happens.
I added a check for file.listFiles() returning null. I ran the code on Windows and it never did (an empty directory returned an empty array), but since it relies on native code, it might return null on other operating systems.
public class AsynchFileCounter {
private AtomicInteger count;
private AtomicInteger countDirectories;
private ReentrantLock lock;
private Condition noMoreDirectories;
public int countFiles(String path, String extension) {
count = new AtomicInteger();
countDirectories = new AtomicInteger();
lock = new ReentrantLock();
noMoreDirectories = lock.newCondition();
ExtensionFilter filter = new ExtensionFilter(extension, true);
File f = new File(path);
countFilesRecursive(f, filter);
lock.lock();
try {
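// Re-check the counter so a signal sent before we start waiting is not missed.
while (countDirectories.get() > 0) {
noMoreDirectories.await();
}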
} catch (InterruptedException e) {}
finally {
lock.unlock();
}
return count.intValue();
}
private void countFilesRecursive(File f, ExtensionFilter filter) {
countDirectories.getAndIncrement();
CompletableFuture.supplyAsync(() -> f.listFiles(filter))
.thenAcceptAsync(files -> countFiles(filter, files));
}
private void countFiles(ExtensionFilter filter, File[] files) {
if(files != null) {
for (File file : files) {
if(file.isFile())
count.incrementAndGet();
else
countFilesRecursive(file, filter);
}
}
int currentCount = countDirectories.decrementAndGet();
if(currentCount == 0) {
lock.lock();
try {
noMoreDirectories.signal();
}
finally {
lock.unlock();
}
}
}
}
I have the following work queue implementation, which I use to limit the number of threads in use. I first add a number of Runnable objects to the queue, and when I am ready to begin, I call begin(). From that point on I do not add any more to the queue.
public class WorkQueue {
private final int nThreads;
private final PoolWorker[] threads;
private final LinkedList queue;
Integer runCounter;
boolean hasBegun;
public WorkQueue(int nThreads) {
runCounter = 0;
this.nThreads = nThreads;
queue = new LinkedList();
threads = new PoolWorker[nThreads];
hasBegun = false;
for (int i = 0; i < nThreads; i++) {
threads[i] = new PoolWorker();
threads[i].start();
}
}
public boolean isQueueEmpty() {
synchronized (queue) {
if (queue.isEmpty() && runCounter == 0) {
return true;
} else {
return false;
}
}
}
public void begin() {
hasBegun = true;
synchronized (queue) {
queue.notify();
}
}
public void add(Runnable r) {
if (!hasBegun) {
synchronized (queue) {
queue.addLast(r);
runCounter++;
}
} else {
System.out.println("has begun executing. Cannot add more jobs ");
}
}
private class PoolWorker extends Thread {
public void run() {
Runnable r;
while (true) {
synchronized (queue) {
while (queue.isEmpty()) {
try {
queue.wait();
} catch (InterruptedException ignored) {
}
}
r = (Runnable) queue.removeFirst();
}
// If we don't catch RuntimeException,
// the pool could leak threads
try {
r.run();
synchronized (runCounter) {
runCounter--;
}
} catch (RuntimeException e) {
// You might want to log something here
}
}
}
}
}
This is a runnable I use to keep track of when all the jobs on the work queue have finished:
public class QueueWatcher implements Runnable {
private Thread t;
private String threadName;
private WorkQueue wq;
public QueueWatcher(WorkQueue wq) {
this.threadName = "QueueWatcher";
this.wq = wq;
}
@Override
public void run() {
while (true) {
if (wq.isQueueEmpty()) {
java.util.Date date = new java.util.Date();
System.out.println("Finishing and quiting at:" + date.toString());
System.exit(0);
break;
} else {
try {
Thread.sleep(1000);
} catch (InterruptedException ex) {
Logger.getLogger(PlaneGenerator.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
}
public void start() {
wq.begin();
System.out.println("Starting " + threadName);
if (t == null) {
t = new Thread(this, threadName);
t.setDaemon(false);
t.start();
}
}
}
This is how I use them:
WorkQueue wq = new WorkQueue(9); //Get same results regardless of 1,2,3,8,9
QueueWatcher qw = new QueueWatcher(wq);
SomeRunnable1 sm1 = new SomeRunnable1();
SomeRunnable2 sm2 = new SomeRunnable2();
SomeRunnable3 sm3 = new SomeRunnable3();
SomeRunnable4 sm4 = new SomeRunnable4();
SomeRunnable5 sm5 = new SomeRunnable5();
wq.add(sm1);
wq.add(sm2);
wq.add(sm3);
wq.add(sm4);
wq.add(sm5);
qw.start();
But regardless of how many threads I use, the result is always the same: it takes about 1m 10s to complete. This is about the same as the single-threaded version (when everything ran in main()).
Whether I set wq to 1, 2, 3 or up to 9 threads, it always takes between 1m 8s and 1m 10s. What is the problem? The jobs (the SomeRunnable classes) have nothing to do with each other and cannot block each other.
EDIT: Each of the runnables just reads some image files from the filesystem and creates new files in a separate directory. The new directory eventually contains about 400 output files.
EDIT: It seems that only one thread is ever doing work. I made the following changes:
I let the PoolWorker store an id:
PoolWorker(int id){
this.threadId = id;
}
Before running I print the id of the worker.
System.out.println(this.threadId + " got new task");
r.run();
In WorkQueue constructor when creating the poolworkers I do:
for (int i = 0; i < nThreads; i++) {
threads[i] = new PoolWorker(i);
threads[i].start();
}
But it seems that only thread 0 does any work, as the output is always:
0 got new task
Use queue.notifyAll() to start processing.
Currently you're using queue.notify(), which will only wake a single thread. (The big clue that pointed me to this was when you mentioned only a single thread was running.)
Also, synchronizing on Integer runCounter isn't doing what you think it's doing - runCounter++ is actually assigning a new value to the Integer each time, so you're synchronizing on a lot of different Integer objects.
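The effect can be seen in isolation. The sketch below uses a hypothetical CounterDemo class that is not part of the question's code: it first shows that incrementing a boxed Integer rebinds the variable to a different object, then shows an AtomicInteger as one possible replacement for the counter that needs no synchronized block at all.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative only.
class CounterDemo {
    // Returns true if counter++ left the variable pointing at a different object.
    static boolean incrementReplacesObject() {
        Integer counter = 0;
        Integer before = counter;
        counter++; // unboxes, adds 1, and boxes the result into another Integer
        return before != counter; // reference comparison, not a value comparison
    }

    // Starts one thread per job; each decrements the shared counter exactly once.
    // Returns the counter value after all threads have finished.
    static int remainingAfter(int jobs) {
        AtomicInteger remaining = new AtomicInteger(jobs);
        Thread[] workers = new Thread[jobs];
        for (int i = 0; i < jobs; i++) {
            workers[i] = new Thread(() -> remaining.decrementAndGet());
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return remaining.get();
    }

    public static void main(String[] args) {
        System.out.println("new object after ++: " + incrementReplacesObject());
        System.out.println("remaining jobs: " + remainingAfter(8));
    }
}
```

Because each thread in the original code may lock a different Integer instance, the synchronized (runCounter) block gives no mutual exclusion; a single final lock object or an atomic counter avoids that.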
On a side note, using raw threads and the wait/notify paradigm is complicated and error-prone even for the best programmers. That is why Java introduced the java.util.concurrent package, which provides thread-safe BlockingQueue implementations and Executors for easily managing multithreaded apps.
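As a rough sketch of that suggestion, the hand-written WorkQueue and QueueWatcher could be replaced by a fixed thread pool. ExecutorWorkQueue, runAll and demo are hypothetical names, and the trivial jobs stand in for the SomeRunnable classes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical replacement for WorkQueue + QueueWatcher; names are illustrative.
class ExecutorWorkQueue {
    // Runs every job on a fixed pool and blocks until all have finished.
    // Returns true if everything completed within the timeout.
    static boolean runAll(List<Runnable> jobs, int nThreads, long timeoutSeconds) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (Runnable job : jobs) {
            pool.execute(job);
        }
        pool.shutdown(); // accept no new jobs; already submitted ones still run
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            return false;
        }
    }

    // Five trivial jobs on three threads; returns how many jobs actually ran.
    static int demo() {
        AtomicInteger done = new AtomicInteger();
        List<Runnable> jobs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            jobs.add(() -> done.incrementAndGet());
        }
        runAll(jobs, 3, 60);
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("jobs completed: " + demo());
    }
}
```

The pool handles the wake-ups and the completion tracking, so there is no notify/notifyAll call to get wrong and no polling watcher thread.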
I have an rpt file from which I generate multiple reports in PDF format, using the Engine class from i-net Clear Reports. The process takes very long, as I have nearly 10000 reports to generate. Can I use multithreading or some other approach to speed up the process?
Any help on how this can be done would be appreciated.
My partial code:
//Loops
Engine eng = new Engine(Engine.EXPORT_PDF);
eng.setReportFile(rpt); //rpt is the report name
if (cn == null || cn.isClosed()) { // null check first, otherwise isClosed() can throw a NullPointerException
cn = ds.getConnection();
}
eng.setConnection(cn);
System.out.println(" After set connection");
eng.setPrompt(data[i], 0);
ReportProperties repprop = eng.getReportProperties();
repprop.setPaperOrient(ReportProperties.DEFAULT_PAPER_ORIENTATION, ReportProperties.PAPER_FANFOLD_US);
eng.execute();
System.out.println(" After excecute");
try {
PDFExportThread pdfExporter = new PDFExportThread(eng, sFileName, sFilePath);
pdfExporter.execute();
} catch (Exception e) {
e.printStackTrace();
}
PDFExportThread execute
public void execute() throws IOException {
FileOutputStream fos = null;
try {
String FileName = sFileName + "_" + (eng.getPageCount() - 1);
File file = new File(sFilePath + FileName + ".pdf");
if (!file.getParentFile().exists()) {
file.getParentFile().mkdirs();
}
if (!file.exists()) {
file.createNewFile();
}
fos = new FileOutputStream(file);
for (int k = 1; k <= eng.getPageCount(); k++) {
fos.write(eng.getPageData(k));
}
fos.flush();
fos.close();
} catch (Exception e) {
e.printStackTrace();
} finally {
if (fos != null) {
fos.close();
fos = null;
}
}
}
This is very basic code. A ThreadPoolExecutor with a fixed number of threads in the pool is the backbone.
Some considerations:
The thread pool size should be less than or equal to the DB connection pool size, and it should be a number that is reasonable for running Engines in parallel.
The main thread should wait for a sufficient time before killing all threads. I have put 1 hour as the wait time, but that's just an example.
You'll need to have proper exception handling.
From the API doc, I saw the stopAll and shutdown methods of the Engine class, so I'm invoking them as soon as our work is done. Again, that's just an example.
Hope this helps.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.sql.Connection;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
public class RunEngine {
public static void main(String[] args) throws Exception {
final String rpt = "/tmp/rpt/input/rpt-1.rpt";
final String sFilePath = "/tmp/rpt/output/";
final String sFileName = "pdfreport";
final Object[] data = new Object[10];
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(10);
for (int i = 0; i < data.length; i++) {
PDFExporterRunnable runnable = new PDFExporterRunnable(rpt, data[i], sFilePath, sFileName, i);
executor.execute(runnable);
}
executor.shutdown();
executor.awaitTermination(1L, TimeUnit.HOURS);
Engine.stopAll();
Engine.shutdown();
}
private static class PDFExporterRunnable implements Runnable {
private final String rpt;
private final Object data;
private final String sFilePath;
private final String sFileName;
private final int runIndex;
public PDFExporterRunnable(String rpt, Object data, String sFilePath,
String sFileName, int runIndex) {
this.rpt = rpt;
this.data = data;
this.sFilePath = sFilePath;
this.sFileName = sFileName;
this.runIndex = runIndex;
}
@Override
public void run() {
// Loops
Engine eng = new Engine(Engine.EXPORT_PDF);
eng.setReportFile(rpt); // rpt is the report name
Connection cn = null;
/*
* DB connection related code. Check and use.
*/
//if (cn == null || cn.isClosed()) {
//cn = ds.getConnection();
//}
eng.setConnection(cn);
System.out.println(" After set connection");
eng.setPrompt(data, 0);
ReportProperties repprop = eng.getReportProperties();
repprop.setPaperOrient(ReportProperties.DEFAULT_PAPER_ORIENTATION,
ReportProperties.PAPER_FANFOLD_US);
eng.execute();
System.out.println(" After excecute");
FileOutputStream fos = null;
try {
String FileName = sFileName + "_" + runIndex;
File file = new File(sFilePath + FileName + ".pdf");
if (!file.getParentFile().exists()) {
file.getParentFile().mkdirs();
}
if (!file.exists()) {
file.createNewFile();
}
fos = new FileOutputStream(file);
for (int k = 1; k <= eng.getPageCount(); k++) {
fos.write(eng.getPageData(k));
}
fos.flush();
fos.close();
} catch (Exception e) {
e.printStackTrace();
} finally {
if (fos != null) {
try {
fos.close();
} catch (IOException e) {
e.printStackTrace();
}
fos = null;
}
}
}
}
/*
* Dummy classes to avoid compilation errors.
*/
private static class ReportProperties {
public static final String PAPER_FANFOLD_US = null;
public static final String DEFAULT_PAPER_ORIENTATION = null;
public void setPaperOrient(String defaultPaperOrientation, String paperFanfoldUs) {
}
}
private static class Engine {
public static final int EXPORT_PDF = 1;
public Engine(int exportType) {
}
public static void shutdown() {
}
public static void stopAll() {
}
public void setPrompt(Object singleData, int i) {
}
public byte[] getPageData(int k) {
return null;
}
public int getPageCount() {
return 0;
}
public void execute() {
}
public ReportProperties getReportProperties() {
return null;
}
public void setConnection(Connection cn) {
}
public void setReportFile(String reportFile) {
}
}
}
I will offer this "answer" as a possible quick & dirty solution to get you started on a parallelization effort.
One way or another you're going to build a render farm.
I don't think there is a trivial way to do this in Java; I would love to have someone post an answer that shows how to parallelize your example in just a few lines of code. But until that happens this will hopefully help you make some progress.
You're going to have limited scaling in the same JVM instance.
But... let's see how far you get with that and see if it helps enough.
Design challenge #1: restarting.
You will probably want a place to keep the status of each of your reports, e.g. "units of work".
You want this in case you need to restart everything (maybe your server crashes) and you don't want to re-run all of the reports completed thus far.
There are lots of ways to do this: a database, or checking whether a "completed" marker file exists in your report folder. It is not sufficient for the *.pdf to exist, as it may be incomplete; for xyz_200.pdf you could create an empty xyz_200.done or xyz_200.err file to help with re-running any problem children. By the time you have coded up that file manipulation, checking and initialization logic, though, it may well have been easier to add a column to the database table that holds the list of work to be done.
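The marker-file variant could look roughly like the following. This is an untested-in-production sketch: ReportStatus and its methods are hypothetical names, and the temporary directory in roundTrip() only stands in for a real report folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for tracking finished reports with empty marker files.
class ReportStatus {
    // A report counts as complete only if its ".done" marker exists;
    // the PDF alone may be a half-written file from a crashed run.
    static boolean isComplete(Path reportDir, String reportName) {
        return Files.exists(reportDir.resolve(reportName + ".done"));
    }

    // Call this only after the PDF has been fully written and closed.
    static void markComplete(Path reportDir, String reportName) {
        try {
            Files.createDirectories(reportDir);
            Path marker = reportDir.resolve(reportName + ".done");
            if (!Files.exists(marker)) {
                Files.createFile(marker);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small self-check: a fresh report is incomplete until it has been marked.
    static boolean roundTrip() {
        try {
            Path dir = Files.createTempDirectory("reports");
            boolean before = isComplete(dir, "xyz_200");
            markComplete(dir, "xyz_200");
            boolean after = isComplete(dir, "xyz_200");
            return !before && after;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("marker round trip ok: " + roundTrip());
    }
}
```

On restart, the main loop would skip every unit of work for which isComplete returns true.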
Design consideration #2: maximizing throughput (avoiding overload).
You don't want to saturate your system by running one thousand reports in parallel.
Maybe 10.
Maybe 100.
Probably not 5,000.
You will need to do some sizing research and see what gets you near 80 to 90% system utilization.
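A fixed-size pool is one simple way to cap how many reports run at once while you experiment with sizing. In this sketch ThroughputLimit and peakConcurrency are hypothetical names, and a short sleep stands in for generating a report.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sizing experiment; the sleep is a placeholder for real report work.
class ThroughputLimit {
    // Submits `reports` fake jobs to a pool of `maxParallel` threads and
    // returns the highest number of jobs observed running at the same time.
    static int peakConcurrency(int maxParallel, int reports) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < reports; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // placeholder for rendering one report
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + peakConcurrency(3, 12));
    }
}
```

Whatever the number of queued reports, the peak never exceeds the pool size, so the pool size is the single knob to tune against system utilization.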
Design consideration #3: scaling across multiple servers
Overly complex, outside the scope of a Stack Exchange answer.
You'd have to spin up JVMs on multiple systems, each running something like the workers below, plus a report manager that can pull work items from a shared "queue" structure. Again, a database table is probably easier here than something file-based (or a network feed).
Sample Code
Caution: None of this code is well tested, it almost certainly has an abundance of typos, logic errors and poor design. Use at your own risk.
So anyway... I do want to give you the basic idea of a rudimentary task runner.
Replace your "// Loops" example in the question with code like the following:
main loop (original code example)
This is more or less doing what your example code did, modified to push most of the work into ReportWorker (new class, see below). Lots of stuff seems to be packed into your original question's example of "// Loop", so I'm not trying to reverse engineer that.
fwiw, it was unclear to me where "rpt" and "data[i]" are coming from so I hacked up some test data.
public class Main {
public static boolean complete( String data ) {
return false; // for testing nothing is complete.
}
public static void main(String args[] ) {
String data[] = new String[] {
"A",
"B",
"C",
"D",
"E" };
String rpt = "xyz";
// Loop
ReportManager reportMgr = new ReportManager(); // a new helper class (see below), it assigns/monitors work.
long startTime = System.currentTimeMillis();
for( int i = 0; i < data.length; ++i ) {
// complete is something you should write that knows if a report "unit of work"
// finished successfully.
if( !complete( data[i] ) ) {
reportMgr.assignWork( rpt, data[i] ); // so... where did values for your "rpt" variable come from?
}
}
reportMgr.waitForWorkToFinish(); // out of new work to assign, let's wait until everything in-flight complete.
long endTime = System.currentTimeMillis();
System.out.println("Done. Elapsed time = " + (endTime - startTime)/1000 +" seconds.");
}
}
ReportManager
This class is not thread safe. Just have your original loop keep calling assignWork() until you are out of reports to assign, then call waitForWorkToFinish() to wait until all work is done, as shown above. (fwiw, I don't think you could say any of the classes here are especially thread safe.)
public class ReportManager {
public int polling_delay = 500; // wait 0.5 seconds for testing.
//public int polling_delay = 60 * 1000; // wait 1 minute.
// not high throughput millions of reports / second, we'll run at a slower tempo.
public int nWorkers = 3; // just 3 for testing.
public int assignedCnt = 0;
public ReportWorker workers[];
public ReportManager() {
// initialize our manager.
workers = new ReportWorker[ nWorkers ];
for( int i = 0; i < nWorkers; ++i ) {
workers[i] = new ReportWorker( i );
System.out.println("Created worker #"+i);
}
}
private ReportWorker handleWorkerError( int i ) {
// something went wrong, update our "report" status as one of the reports failed.
System.out.println("handleWorkerError(): failure in "+workers[i]+", resetting worker.");
workers[i].teardown();
workers[i] = new ReportWorker( i ); // just replace everything.
return workers[i]; // the new worker will, incidentally, be available.
}
private ReportWorker handleWorkerComplete( int i ) {
// this unit of work was completed, update our "report" status tracker as success.
System.out.println("handleWorkerComplete(): success in "+workers[i]+", resetting worker.");
workers[i].teardown();
workers[i] = new ReportWorker( i ); // just replace everything.
return workers[i]; // the new worker will, incidentally, be available.
}
private int activeWorkerCount() {
int activeCnt = 0;
for( int i = 0; i < nWorkers; ++i ) {
ReportWorker worker = workers[i];
System.out.println("activeWorkerCount() i="+i+", checking worker="+worker);
if( worker.hasError() ) {
worker = handleWorkerError( i );
}
if( worker.isComplete() ) {
worker = handleWorkerComplete( i );
}
if( worker.isInitialized() || worker.isRunning() ) {
++activeCnt;
}
}
System.out.println("activeWorkerCount() activeCnt="+activeCnt);
return activeCnt;
}
private ReportWorker getAvailableWorker() {
// check each worker to see if anybody recently completed...
// This (rather lazily) creates completely new ReportWorker instances.
// You might want to try pooling (salvaging and reinitializing them)
// to see if that helps your performance.
System.out.println("\n-----");
ReportWorker firstAvailable = null;
for( int i = 0; i < nWorkers; ++i ) {
ReportWorker worker = workers[i];
System.out.println("getAvailableWorker(): i="+i+" worker="+worker);
if( worker.hasError() ) {
worker = handleWorkerError( i );
}
if( worker.isComplete() ) {
worker = handleWorkerComplete( i );
}
if( worker.isAvailable() && firstAvailable==null ) {
System.out.println("Apparently worker "+worker+" is 'available'");
firstAvailable = worker;
System.out.println("getAvailableWorker(): i="+i+" now firstAvailable = "+firstAvailable);
}
}
return firstAvailable; // May (or may not) be null.
}
public void assignWork( String rpt, String data ) {
ReportWorker worker = getAvailableWorker();
while( worker == null ) {
System.out.println("assignWork: No workers available, sleeping for "+polling_delay);
try { Thread.sleep( polling_delay ); }
catch( InterruptedException e ) { System.out.println("assignWork: sleep interrupted, ignoring exception "+e); }
// any workers available now?
worker = getAvailableWorker();
}
++assignedCnt;
worker.initialize( rpt, data ); // or whatever else you need.
System.out.println("assignment #"+assignedCnt+" given to "+worker);
Thread t = new Thread( worker );
t.start( ); // that is pretty much it, let it go.
}
public void waitForWorkToFinish() {
int active = activeWorkerCount();
while( active >= 1 ) {
System.out.println("waitForWorkToFinish(): #active workers="+active+", waiting...");
// wait a minute....
try { Thread.sleep( polling_delay ); }
catch( InterruptedException e ) { System.out.println("assignWork: sleep interrupted, ignoring exception "+e); }
active = activeWorkerCount();
}
}
}
ReportWorker
public class ReportWorker implements Runnable {
int test_delay = 10*1000; //sleep for 10 seconds.
// (actual code would be generating PDF output)
public enum StatusCodes { UNINITIALIZED,
INITIALIZED,
RUNNING,
COMPLETE,
ERROR };
int id = -1;
// volatile: written by the worker thread, polled by the manager thread.
volatile StatusCodes status = StatusCodes.UNINITIALIZED;
boolean initialized = false;
public String rpt = "";
public String data = "";
//Engine eng;
//PDFExportThread pdfExporter;
//DataSource_type cn;
public boolean isInitialized() { return initialized; }
public boolean isAvailable() { return status == StatusCodes.UNINITIALIZED; }
public boolean isRunning() { return status == StatusCodes.RUNNING; }
public boolean isComplete() { return status == StatusCodes.COMPLETE; }
public boolean hasError() { return status == StatusCodes.ERROR; }
public ReportWorker( int id ) {
this.id = id;
}
public String toString( ) {
return "ReportWorker."+id+"("+status+")/"+rpt+"/"+data;
}
// the example code doesn't make clear if there is a relationship between rpt & data[i].
public void initialize( String rpt, String data /* data[i] in original code */ ) {
try {
this.rpt = rpt;
this.data = data;
/* uncomment this part where you have the various classes available.
* I have it commented out for testing.
cn = ds.getConnection();
Engine eng = new Engine(Engine.EXPORT_PDF);
eng.setReportFile(rpt); //rpt is the report name
eng.setConnection(cn);
eng.setPrompt(data, 0);
ReportProperties repprop = eng.getReportProperties();
repprop.setPaperOrient(ReportProperties.DEFAULT_PAPER_ORIENTATION, ReportProperties.PAPER_FANFOLD_US);
*/
status = StatusCodes.INITIALIZED;
initialized = true; // want this true even if we're running.
} catch( Exception e ) {
status = StatusCodes.ERROR;
throw new RuntimeException("initialize(rpt="+rpt+", data="+data+")", e);
}
}
public void run() {
status = StatusCodes.RUNNING;
System.out.println("run().BEGIN: "+this);
try {
// delay for testing.
try { Thread.sleep( test_delay ); }
catch( InterruptedException e ) { System.out.println(this+".run(): test interrupted, ignoring "+e); }
/* uncomment this part where you have the various classes available.
* I have it commented out for testing.
eng.execute();
PDFExportThread pdfExporter = new PDFExportThread(eng, sFileName, sFilePath);
pdfExporter.execute();
*/
status = StatusCodes.COMPLETE;
System.out.println("run().END: "+this);
} catch( Exception e ) {
System.out.println("run().ERROR: "+this);
status = StatusCodes.ERROR;
throw new RuntimeException("run(rpt="+rpt+", data="+data+")", e);
}
}
public void teardown() {
if( ! isInitialized() || isRunning() ) {
System.out.println("Warning: ReportWorker.teardown() called but I am uninitialized or running.");
// should never happen, fatal enough to throw an exception?
}
/* commented out for testing.
try { cn.close(); }
catch( Exception e ) { System.out.println("Warning: ReportWorker.teardown() ignoring error on connection close: "+e); }
cn = null;
*/
// any need to close things on eng?
// any need to close things on pdfExporter?
}
}
I have a parent thread that sends messages to MQ, and it manages a ThreadPoolExecutor for worker threads which listen to MQ and write the messages to output files. The thread pool has size 5, so when I run my program I end up with 5 files of messages. Everything works fine up to this point. I now need to merge these 5 files in my parent thread.
How do I know when the ThreadPoolExecutor has finished processing so I can start merging the files?
public class ParentThread {
private MessageSender messageSender;
private MessageReciever messageReciever;
private Queue jmsQueue;
private Queue jmsReplyQueue;
ExecutorService exec = Executors.newFixedThreadPool(5);
public void sendMessages() {
System.out.println("Sending");
File xmlFile = new File("c:/filename.txt");
List<String> lines = null;
try {
lines = FileUtils.readLines(xmlFile, null);
} catch (IOException e) {
e.printStackTrace();
}
for (String line : lines){
messageSender.sendMessage(line, this.jmsQueue, this.jmsReplyQueue);
}
int count = 0;
while (count < 5) {
messageSender.sendMessage("STOP", this.jmsQueue, this.jmsReplyQueue);
count++;
}
}
public void listenMessages() {
long finishDate = new Date().getTime();
for (int i = 0; i < 5; i++) {
Worker worker = new Worker(i, this.messageReciever, this.jmsReplyQueue);
exec.execute(worker);
}
exec.shutdown();
if(exec.isTerminated()){ //PROBLEM is HERE. Control Never gets here.
long currenttime = new Date().getTime() - finishDate;
System.out.println("time taken: "+currenttime);
mergeFiles();
}
}
}
This is my worker class:
public class Worker implements Runnable {
private boolean stop = false;
private MessageReciever messageReciever;
private Queue jmsReplyQueue;
private int processId;
private int count = 0;
private String message;
private File outputFile;
private FileWriter outputFileWriter;
public Worker(int processId, MessageReciever messageReciever,
Queue jmsReplyQueue) {
this.processId = processId;
this.messageReciever = messageReciever;
this.jmsReplyQueue = jmsReplyQueue;
}
public void run() {
openOutputFile();
listenMessages();
}
private void listenMessages() {
while (!stop) {
String message = messageReciever.receiveMessage(null,this.jmsReplyQueue);
count++;
String s = "message: " + message + " Recieved by: "
+ processId + " Total recieved: " + count;
System.out.println(s);
writeOutputFile(s);
if (StringUtils.isNotEmpty(message) && message.equals("STOP")) {
stop = true;
}
}
}
private void openOutputFile() {
try {
outputFile = new File("C:/mahi/Test", "file." + processId);
outputFileWriter = new FileWriter(outputFile);
} catch (IOException e) {
System.out.println("Exception while opening file");
stop = true;
}
}
private void writeOutputFile(String message) {
try {
outputFileWriter.write(message);
outputFileWriter.flush();
} catch (IOException e) {
System.out.println("Exception while writing to file");
stop = true;
}
}
}
How will I know when the ThreadPool has finished processing so I can do my other clean up work?
Thanks
If your Worker class implements Callable instead of Runnable, you can tell when your tasks complete by using a Future object to check whether each task has returned a result (e.g. a boolean that tells you whether it finished execution or not).
Take a look at section "8. Futures and Callables" on the website below; it has exactly what you need, in my opinion:
http://www.vogella.com/articles/JavaConcurrency/article.html
Edit: Once all of the Futures indicate that their respective Callables have completed, it's safe to assume your executor has finished its work and can be shut down manually.
Something like this:
exec.shutdown();
// waiting for executors to finish their jobs
while (!exec.awaitTermination(50, TimeUnit.MILLISECONDS));
// perform clean up work
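The Callable/Future approach described above could look roughly like the following. This is only a minimal sketch under stated assumptions: FileWorker and runWorkers are hypothetical names that do not appear in the question, and Thread.sleep stands in for the real MQ receive/write loop. It relies on ExecutorService.invokeAll, which blocks until every submitted task has completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableWorkersDemo {

    // Hypothetical stand-in for the question's Worker: it returns the name
    // of the file it would have written once its work is done.
    static class FileWorker implements Callable<String> {
        private final int processId;

        FileWorker(int processId) {
            this.processId = processId;
        }

        @Override
        public String call() throws Exception {
            Thread.sleep(100); // placeholder for the real receive/write loop
            return "file." + processId;
        }
    }

    // Starts workerCount workers and returns their results after all have finished.
    public static List<String> runWorkers(int workerCount) throws Exception {
        ExecutorService exec = Executors.newFixedThreadPool(workerCount);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 0; i < workerCount; i++) {
                tasks.add(new FileWorker(i));
            }
            // invokeAll blocks until every task has completed or failed.
            List<Future<String>> futures = exec.invokeAll(tasks);
            List<String> finished = new ArrayList<>();
            for (Future<String> future : futures) {
                // get() rethrows a task failure wrapped in an ExecutionException.
                finished.add(future.get());
            }
            return finished;
        } finally {
            exec.shutdown(); // no new tasks; the pool threads exit once idle
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> files = runWorkers(5);
        System.out.println("All workers done, safe to merge: " + files);
    }
}
```

With this shape, the merge step can simply run after runWorkers returns, because by then every Future has already completed.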
You can use a thread to monitor the ThreadPoolExecutor, like this:
import java.util.concurrent.ThreadPoolExecutor;
public class MyMonitorThread implements Runnable {
private ThreadPoolExecutor executor;
private int seconds;
// volatile so that a shutdown() call from another thread is seen by run()
private volatile boolean run=true;
public MyMonitorThread(ThreadPoolExecutor executor, int delay)
{
this.executor = executor;
this.seconds=delay;
}
public void shutdown(){
this.run=false;
}
@Override
public void run()
{
while(run){
System.out.println(
String.format("[monitor] [%d/%d] Active: %d, Completed: %d, Task: %d, isShutdown: %s, isTerminated: %s",
this.executor.getPoolSize(),
this.executor.getCorePoolSize(),
this.executor.getActiveCount(),
this.executor.getCompletedTaskCount(),
this.executor.getTaskCount(),
this.executor.isShutdown(),
this.executor.isTerminated()));
try {
Thread.sleep(seconds*1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
And add
MyMonitorThread monitor = new MyMonitorThread(executorPool, 3);
Thread monitorThread = new Thread(monitor);
monitorThread.start();
to the class where your ThreadPoolExecutor is created.
It will print the ThreadPoolExecutor's state every 3 seconds.
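As a variation on the monitor idea, the polling loop can stop by itself once the pool reports isTerminated(), so nobody has to remember to call shutdown() on the monitor. The sketch below is an illustration only: MonitorUsageDemo and runWithMonitor are hypothetical names, the pool size of 2 and the short sleeps are arbitrary test values, and the monitor is inlined as a lambda rather than reusing MyMonitorThread.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class MonitorUsageDemo {

    // Submits taskCount short tasks, polls the pool's counters from a monitor
    // thread, and returns the completed-task count after the pool terminates.
    public static long runWithMonitor(int taskCount) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        // Monitor loop: ends on its own as soon as the pool has terminated.
        Thread monitor = new Thread(() -> {
            while (!pool.isTerminated()) {
                System.out.printf("[monitor] active=%d completed=%d total=%d%n",
                        pool.getActiveCount(),
                        pool.getCompletedTaskCount(),
                        pool.getTaskCount());
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop
                    return;
                }
            }
        });
        monitor.setDaemon(true); // never keeps the JVM alive by itself
        monitor.start();

        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(100); // placeholder for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown();                             // stop accepting new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS); // wait for queued tasks to finish
        monitor.join(1000);
        return pool.getCompletedTaskCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed tasks: " + runWithMonitor(6));
    }
}
```

The monitor only reports state; the actual "is it done yet" decision still comes from shutdown() followed by awaitTermination().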