Thread to process multiple rest calls - java

I am trying to process around 1000 files using the code below:
ExecutorService executor = Executors.newFixedThreadPool(threadPoolSize);
Runnable worker = null;
for (File file : files) {
if (file.isFile()) {
worker = new FileProcessThread(file, connectionVo);
executor.execute(worker);
file.deleteOnExit();
}
}
// shutdown() must come first: isTerminated() is never true before it is called
executor.shutdown();
while (!executor.isTerminated()) {
System.out.println("Still running");
}
System.out.println("Finished all threads");
This code creates multiple threads. Each thread makes several REST calls.
These REST APIs process the input file. Each thread also logs every transaction event that occurs during processing.
But the results of these threads are not consistent.
For a few threads it works perfectly: the thread picks up the file, logs the correct transactions, and moves the processed file to the proper directory.
But for some threads it behaves unpredictably, for example logging the file-process event of one thread under another thread.
Steps in each thread:
Create transaction - REST call
Log event in transaction for process start - REST call
Give the file to another module for file conversion - REST call, which internally creates one more thread that is synchronized
Once the file is processed, it is moved to another directory - done in the same code
I want consistent behavior from these threads. Any help will be appreciated.
Code inside run :
long transactionID = 0l;
long connectionId = connectionVo.getConnectionId();
try {
transactionID = beginTransaction.getTransactionId();
FileInputStream processedFileData;
processedFileData = new FileInputStream(file);
Response response = Service.postMessage(stream2file,
connectionId, 0, transactionID);
if (response.getStatus() != 200) {
writToDirectory(stream2file, userError, file.getName(), transactionID);
} else {
String userArchive = getUserArchive();
if (checkDirectory(userArchive, transactionID)) {
writToDirectory(stream2file, userArchive, file.getName(), transactionID);
}
}
file.delete();
} catch (FileNotFoundException e) {
}

I suggest you use Java 8 streams for the multi-threading, as the code is much cleaner.
files.parallelStream()
.filter(File::isFile)
.forEach(f -> new FileProcessThread(f, connectionVo).run());
Your task deletes the file when it finishes successfully.
This will pass each file to only one task.
BTW, don't call your tasks xxxThread unless they are actually a Thread, and avoid ever subclassing Thread.
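If you prefer to keep an ExecutorService, here is a minimal sketch of the usual pattern: submit the tasks, call shutdown(), then block in awaitTermination() rather than polling isTerminated(). It also keeps every per-file value in a final field or a local variable, on the assumption (not confirmed by the question) that the mixed-up log events come from state shared between tasks. FileTask, processAll and the "tx-" ids are hypothetical stand-ins for FileProcessThread and the real REST calls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PoolSketch {

    /** Hypothetical stand-in for FileProcessThread: a plain Runnable, not a Thread subclass. */
    static final class FileTask implements Runnable {
        private final String fileName;               // per-task state, never shared
        private final Map<String, String> eventLog;  // thread-safe sink shared by all tasks

        FileTask(String fileName, Map<String, String> eventLog) {
            this.fileName = fileName;
            this.eventLog = eventLog;
        }

        @Override
        public void run() {
            // Local variable: no other task can observe or overwrite this id.
            String transactionId = "tx-" + fileName;
            eventLog.put(fileName, transactionId);
        }
    }

    static Map<String, String> processAll(List<String> fileNames, int threads) {
        Map<String, String> eventLog = new ConcurrentHashMap<>();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (String name : fileNames) {
            executor.execute(new FileTask(name, eventLog));
        }
        executor.shutdown(); // accept no new tasks; submitted ones still run
        try {
            // Block until all tasks finish instead of spinning on isTerminated().
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return eventLog;
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a.txt", "b.txt", "c.txt"), 2));
    }
}
```

If the real task keeps its transaction id or connection data in a static or otherwise shared field, moving it into the task instance as above is the first thing to try.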

Related

Two Threads Executing Same Method

I am developing an API request and I'm using multithreading. In the output I'm getting the same request generated twice, by two threads. When I debugged, I saw that the two threads are calling the same method. I need help resolving this issue.
This is my pseudo code
public void run() {
logger.debug("Thread " + currentThread().getName() + " Running");
String message = "";
Connection connection = null;
InputStream fileinput = null;
Properties properties = new Properties();
try {
File file = new File("/home/sridhar.anirudh/eclipse-workspace/API/Change.properties");
fileinput = new FileInputStream(file);
properties.load(fileinput);
soapEndpointUrl = properties.getProperty("endpoint_url");
soapAction = properties.getProperty("soap_action");
} catch (Exception e) {
e.printStackTrace();
}
try {
connection = Database.getInstance().getConnection();
} catch (SQLException e1) {
logger.error("Failed To Get Connection " + e1.getMessage());
return;
}
if (CATEGORY.equalsIgnoreCase("fraudrestriction")) {
String soapResponse = callSoapWebServiceFraudRestriction(soapEndpointUrl, soapAction);
String response_status = "";
if (soapResponse.contains("<tns:Description>SUCCESS</tns:Description>") &&
soapResponse.contains("<tns:Code>ERR_000</tns:Code>")) {
response_status = "SUCCESS";
If you kick off two copies of the thread, they will both run, creating the effect you see.
You can create multiple worker threads, but you need to allocate the work between those workers such that each performs a subset of the total workload.
Since you're (seemingly) parsing and processing a file, and making a network service request in response to that file's contents, it's not clear how you intend to divide up the work. That's the key; to use multiple threads to improve throughput, you the programmer must devise a means of partitioning the work between those threads.
As an analogy, if you have one (human) worker working on a job, simply hiring a second worker won't get the job completed any faster unless the work is divided between those workers. That division is your problem. There's nothing magical about threads that can do this for you.
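As a rough illustration of what "dividing the work" can look like, the sketch below gives each worker a disjoint slice of a request list, so no request is handled twice. The class name, the request strings and the striping scheme are all hypothetical; the real division depends on what your requests are.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;

final class PartitionSketch {

    /**
     * Runs `workers` threads; worker w handles items w, w + workers, w + 2 * workers, ...
     * Returns how many times each item was handled (1 everywhere if the split is correct).
     */
    static int[] processOnce(List<String> requests, int workers) {
        AtomicIntegerArray handled = new AtomicIntegerArray(requests.size());
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int worker = w;
            Thread t = new Thread(() -> {
                for (int i = worker; i < requests.size(); i += workers) {
                    // Placeholder for the real work, e.g. one web service call per request.
                    handled.incrementAndGet(i);
                }
            }, "worker-" + worker);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        int[] counts = new int[requests.size()];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = handled.get(i);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> requests = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        System.out.println(Arrays.toString(processOnce(requests, 2)));
    }
}
```

Starting two threads that each loop over the whole list would instead handle every request twice, which is the effect described in the question.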

Reading stdout of nodejs from Java (using apache commons exec). Thread safe or not?

I'm trying to write a torrent streaming client in Java using webtorrent-cli, which runs on NodeJS. When installed as a node module, webtorrent-cli gives a nice webtorrent.cmd script which can be used to work with it. When download for a torrent starts, the cli updates the standard output each second with useful details like download speed, % of torrent downloaded, seeds available etc.
To observe such a "dynamic" stdout in Java (with commons exec), I am using the following snippet:
private static Thread processCreator() {
return new Thread(() -> {
try {
// Read stdout in a thread safe manner (hopefully)
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
PumpStreamHandler handler = new PumpStreamHandler(baos);
String command = getCommand();
CommandLine cmd = CommandLine.parse(command);
Executor cmdExecutor = new DefaultExecutor();
cmdExecutor.setStreamHandler(handler);
// Schedule a service to print the content of baos each second
final ScheduledExecutorService service = Executors.newSingleThreadScheduledExecutor();
service.scheduleAtFixedRate(() -> {
try {
// Read and reset atomically
synchronized (baos) {
System.out.println(baos.toString("UTF-8"));
// Resetting so that buffer size doesn't grow arbitrarily
baos.reset();
}
}
catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
}, 0, 1, TimeUnit.SECONDS);
cmdExecutor.execute(cmd);
// Let the remaining bytes be processed
Thread.sleep(1000);
// Shutdown
service.shutdown();
} catch (IOException | InterruptedException e) {
e.printStackTrace();
}
});
}
public static void main(String[] args) throws InterruptedException {
Thread process = processCreator();
process.start();
process.join();
}
I'm concerned about how the ByteArrayOutputStream is being written. The class itself is thread safe, but if the implementation writes to the buffer byte by byte, or in such a way that an "updated output" (from webtorrent-cli) is only partially written to the buffer by the time the scheduled service acquires the monitor and starts processing, then that's going to cause problems. In this case, because I'm just printing the content of the buffer, it won't be much trouble, I guess. But I have to process the output and extract a couple of details in the fixed-rate scheduled service.
I can think of a different way to achieve proper coordination: observe the completeness of an update by marking the event when the buffer receives the bytes that form the first line of webtorrent-cli's stdout, and mark the update as completed when the buffer receives the bytes that form the last line. Each update has identical first and last lines, or at least a few bytes at the beginning and end are identical. But that would be a bit more work than this.
My question is: can I be certain that each write to the buffer happens in a single atomic call to ByteArrayOutputStream.write(byte[], ...)?
I hope I've explained my question well enough. If you need more details, let me know in the comments. BTW, when the code above is run, the output suggests that coordination is being properly managed. But maybe I'm just lucky that the race condition has been avoided so far?
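One way to make the processing independent of how the writes are chunked is to hold back incomplete data and only hand complete units to the parser. The sketch below is a minimal version of that idea, assuming each unit ends with a newline (which may not hold for webtorrent-cli's actual output); LineAssembler is a hypothetical helper, not part of commons-exec.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class LineAssembler {
    // Bytes received so far that do not yet form a complete line.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    /** Accepts one chunk of output and returns only the lines completed by it. */
    synchronized List<String> accept(byte[] chunk) {
        List<String> lines = new ArrayList<>();
        for (byte b : chunk) {
            if (b == '\n') {
                // Decode only whole lines, so a multi-byte character split
                // across two chunks is never decoded in halves.
                lines.add(new String(pending.toByteArray(), StandardCharsets.UTF_8));
                pending.reset();
            } else {
                pending.write(b);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        LineAssembler assembler = new LineAssembler();
        System.out.println(assembler.accept("speed: 1".getBytes(StandardCharsets.UTF_8)));
        System.out.println(assembler.accept(" MB/s\npeers".getBytes(StandardCharsets.UTF_8)));
    }
}
```

With this shape the scheduled task never sees half an update, regardless of whether the stream handler copies the output in one write or several.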

Reading container logs with 'follow' flag, thread is blocked forever

I have a backend that serves user requests to read container logs (with the follow option). The following approach is used:
Future<?> f = threadPool.submit(() -> {
try {
while (logStream.hasNext()) {
LogMessage msg = logStream.next();
String text = StandardCharsets.UTF_8.decode(msg.content()).toString();
emitter.send(SseEmitter.event().data(text).name(msg.stream().name()));
}
emitter.complete();
} catch (Exception ex) {
emitter.completeWithError(ex);
}
});
Here threadPool is just an Executors.newCachedThreadPool() and emitter is Spring's SseEmitter.
The problem: when the user no longer wants to read logs, they just close the connection, but this thread keeps running (execution is blocked in logStream.hasNext(), which calls InputStream.read(..)).
As far as I understand, hasNext() will never return false (at least while the container is running properly), so this loop is endless and we need to stop it somehow. One possible solution I tried:
emitter.onCompletion(() -> {
f.cancel(true);
});
It was not successful: no InterruptedException was thrown.
Question: Is there any way to unblock the thread? Or is there another approach to this problem (i.e. some way to stop waiting for logs)?

How to read lines from a CSV to use in multiple threads

Suppose I have a CSV file with hundreds of lines, each holding two random keywords as cells. I'd like to Google search each pair and have the first result on the page printed to the console or stored in some array. For this example, I imagine I could do this one line at a time using something like the following:
CSVReader reader = new CSVReader(new FileReader(FILE_PATH));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
driver.get("http://google.com/");
driver.findElement(By.name("q")).click();
driver.findElement(By.name("q")).clear();
driver.findElement(By.name("q")).sendKeys(nextLine[0] + " " + nextLine[1]);
System.out.println(driver.findElement(By.xpath(XPATH_TO_1ST)).getText());
}
How would I go about having 5 or however many threads of chromedriver through selenium process the CSV file as fast as possible? I've been able to get 5 lines done at a time implementing Runnable on a class that does this and starting 5 threads, but I would like to know if there is a solution where as soon as one thread is complete, it processes the next available or unprocessed line, as opposed to waiting for the 5 searches to process, then going on to the next 5 lines. Would appreciate any suggested reading or tips on cracking this!
This is a pure Java response, rather than specifically a Selenium response.
You want to partition the data. A crude but effective partitioner can be made by reading a row from the CSV file and putting it in a Queue. Afterwards, run as many threads as you can profitably use to simply pull the next entry off of the queue and process it.
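A minimal sketch of that queue-based partitioner, using only the standard library. The rows are plain String arrays and the "search" is a placeholder string concatenation; QueueWorkers and processAll are hypothetical names. Because each worker pulls the next row as soon as it is free, nobody waits for a batch of five to finish.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

final class QueueWorkers {

    static List<String> processAll(List<String[]> rows, int workers) {
        // All rows are loaded up front; workers then compete for the next entry.
        Queue<String[]> queue = new ConcurrentLinkedQueue<>(rows);
        List<String> results = new CopyOnWriteArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                String[] row;
                // poll() returns null once the queue is empty, which ends this worker.
                while ((row = queue.poll()) != null) {
                    results.add(row[0] + " " + row[1]); // placeholder for the real search
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"red", "apple"},
                new String[] {"blue", "whale"},
                new String[] {"green", "tea"});
        System.out.println(processAll(rows, 2));
    }
}
```

The order of the results depends on thread scheduling, so carry the row index along if you need to match results back to input lines.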
If you want to run 5 (or more) threads at the same time, you need to start 5 instances of WebDriver, as it is not thread safe. As for updating the CSV, you would need to synchronize each thread's writes to prevent corrupting the file, or you could batch up updates at some threshold and write several lines at once.
See also "Can Selenium use multi threading in one browser?"
Update:
How about this? It ensures the web driver is not re-used between threads.
CSVReader reader = new CSVReader(new FileReader(FILE_PATH));
// number to do at same time
int concurrencyCount = 5;
ExecutorService executorService = Executors.newFixedThreadPool(concurrencyCount);
CompletionService<Boolean> completionService = new ExecutorCompletionService<Boolean>(executorService);
String[] nextLine;
// ensure we use a distinct WebDriver instance per thread
final LinkedBlockingQueue<WebDriver> webDrivers = new LinkedBlockingQueue<WebDriver>();
for (int i=0; i<concurrencyCount; i++) {
webDrivers.offer(new ChromeDriver());
}
int count = 0;
while ((nextLine = reader.readNext()) != null) {
final String [] line = nextLine;
completionService.submit(new Callable<Boolean>() {
public Boolean call() {
WebDriver driver = null;
try {
// take a webdriver from the queue to use (blocks until one is free)
driver = webDrivers.take();
driver.get("http://google.com/");
driver.findElement(By.name("q")).click();
driver.findElement(By.name("q")).clear();
driver.findElement(By.name("q")).sendKeys(line[0] + " " + line[1]);
System.out.println(line[1]);
// note: this assumes each row has a third cell to hold the result
line[2] = driver.findElement(By.xpath(XPATH_TO_1ST)).getText();
return true;
} catch (InterruptedException e) {
// restore the interrupt flag so the pool can see it
Thread.currentThread().interrupt();
return false;
} finally {
// always put the webdriver back on the queue, even if the search threw
if (driver != null) {
webDrivers.offer(driver);
}
}
}
});
count++;
}
boolean errors = false;
while(count-- > 0) {
Future<Boolean> resultFuture = completionService.take();
try {
// a task that returned false also counts as an error
if (!resultFuture.get()) errors = true;
} catch(Exception e) {
e.printStackTrace();
errors = true;
}
}
System.out.println("done, errors=" + errors);
for (WebDriver webDriver : webDrivers) {
// quit() ends the whole browser session; close() only closes the current window
webDriver.quit();
}
executorService.shutdown();
You can create Callable for each row and give it to the ExecutorService. It takes care of the execution of the tasks and manages the worker threads for you. Carefully choose the thread pool size for optimal execution time.
More information about thread pool size can be found here

java thread waiting for dead process to finish

I wrote a java class in order to perform multithreaded tasks, each task running an external process.
The process is in charge of converting ".chp" files into ".txt" files. It is written in C.
This process breaks at one point: it disappears from "top" in my terminal (probably due to a corrupted chp file). The problem is that the process in my Java thread does not return. The "process.waitFor()" seems to go on forever (at least until the 12 hours I specified for the ExecutorService).
Am I doing something wrong (not catching an exception?)?
I tried setting a class variable of type String in MyThread and putting an error message in place of throwing a new RuntimeException, then print the String at the end of the main, but the thread code doesn't reach to this point. It still gets stuck at the waitFor().
Shouldn't the process terminate once the C program has failed?
The program prints on the terminal (cf: MyThread):
A
B
C
main:
String pathToBin = "/path/to/bin";
List<MyThread> threadList = new ArrayList<MyThread>();
for (File f : folderList) {
File[] chpFilesInFolder = f.listFiles(new FilenameFilter() {
@Override
public boolean accept(File dir, String name) {
if (name.endsWith(".chp")){
return true;
}else{
return false;
}
}
});
File chpFile = writeChpFiles(chpFilesInFolder);
String[] cmd = {pathToBin, "--arg1", chpFile.getPath(), "--out-dir", outputFolder};
MyThread t = new MyThread(cmd, f, chpFilesInFolder);
threadList.add(t);
}
ExecutorService threadExecutor = Executors.newFixedThreadPool(4);
for(MyThread th : threadList){
threadExecutor.execute(th);
}
threadExecutor.shutdown();
try {
threadExecutor.awaitTermination(12, TimeUnit.HOURS);
} catch (InterruptedException e) {
e.printStackTrace();
}
MyThread:
class MyThread extends Thread{
private String[] cmd;
private File chpFolder;
private File[] chpFilesInFolder;
public MyThread(String[] cmd, File chpFolder, File[] chpFilesInFolder){
this.cmd = cmd;
this.chpFolder = chpFolder;
this.chpFilesInFolder = chpFilesInFolder;
}
@Override
public void run() {
Process process = null;
try{
System.err.println("A ");
ProcessBuilder procBuilder = new ProcessBuilder(cmd);
procBuilder.redirectErrorStream(true);
System.err.println("B");
process = procBuilder.start();
System.err.println("C");
process.waitFor();
System.err.println("D");
if(process.exitValue()!=0) System.err.println("ERROR !"+process.exitValue());
System.err.println("E");
}catch(IOException e){
e.printStackTrace();
}catch(InterruptedException e){
e.printStackTrace();
}catch(Throwable e){
e.printStackTrace();
}finally{
System.err.println("F");
if(process!=null) {try { process.destroy();} catch(Exception err) {err.printStackTrace();}}
}
File[] txtFilesInFolder = chpFolder.listFiles(new FilenameFilter() {
@Override
public boolean accept(File dir, String name) {
if (name.endsWith(".chp.txt")){
return true;
}else{
return false;
}
}
});
if (txtFilesInFolder.length==chpFilesInFolder.length){
for (File chp : chpFilesInFolder) {
chp.delete();
}
File logFile = new File(chpFolder, "apt-chp-to-txt.log");
if (logFile.exists()){
logFile.delete();
}
}else{
throw new RuntimeException("CHPs have not all been transformed to TXT in "+chpFolder.getAbsolutePath());
}
}
}
Is it possible that your C program is producing output on stdout? If so, you need to read it from Process.getInputStream() (the stream connected to the child's stdout), otherwise the child can block on a full pipe and Process.waitFor() never returns - see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4254231
Alternatively, call your C program through a shell script that redirects stdout.
You can use the jstack command to confirm that the thread is indeed blocked at Process.waitFor().
You could have the main thread wait for a reasonable amount of time and then call some method on the MyThread class to kill the started process, thus causing the thread to finish.
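The two suggestions above (consume the child's output, and give up after a reasonable time) can be combined. The following is a minimal sketch, assuming Java 8 or later for Process.waitFor(long, TimeUnit) and destroyForcibly(); ProcessRunner, run and the FAILED sentinel are hypothetical names, and the timeout value is up to you.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ProcessRunner {

    /** Sentinel returned when the process could not be started or had to be killed. */
    static final int FAILED = Integer.MIN_VALUE;

    /** Runs the command, collecting its output, and kills it if it exceeds the timeout. */
    static int run(List<String> command, long timeout, TimeUnit unit, StringBuffer output) {
        Process process = null;
        try {
            ProcessBuilder builder = new ProcessBuilder(command);
            builder.redirectErrorStream(true); // merge stderr into stdout: one stream to drain
            process = builder.start();
            final Process p = process;
            // Drain the output on its own thread so a full pipe cannot stall the child.
            Thread drainer = new Thread(() -> {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        output.append(line).append('\n');
                    }
                } catch (IOException ignored) {
                    // The stream is closed when the process is destroyed.
                }
            });
            drainer.setDaemon(true);
            drainer.start();
            if (!process.waitFor(timeout, unit)) { // bounded wait instead of waitFor()
                process.destroyForcibly();
                return FAILED;
            }
            drainer.join(1000); // let the drainer pick up the last lines
            return process.exitValue();
        } catch (IOException e) {
            return FAILED; // e.g. the binary does not exist
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            if (process != null) {
                process.destroyForcibly();
            }
            return FAILED;
        }
    }

    public static void main(String[] args) {
        StringBuffer out = new StringBuffer();
        int code = run(java.util.Arrays.asList("java", "-version"), 30, TimeUnit.SECONDS, out);
        System.out.println("exit=" + code);
        System.out.print(out);
    }
}
```

In MyThread, this would replace the bare process.waitFor() call, so a hung or silently blocked converter costs one timeout rather than the pool's full 12 hours.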
As often, I would suggest a more robust approach: use a messaging solution to make your C program interact with your Java application. That makes it easy and clean to avoid non-daemon threads waiting forever because your C application crashed. Many brokers now have a STOMP interface, which is convenient for any kind of application to invoke (just use any HTTP library), and broker configuration lets you restart unfinished jobs, set timeouts, and so on. Even though JMS does not support request/response directly, it is quite easy to implement that paradigm.
HTH
Jerome
If I understand correctly, your Java threads remain waiting after the C program crashes.
Make the spawned C process send heartbeats. You can do this simply by printing something to the console (or inserting a row in a table) and having the Java thread wake up every so often to check the heartbeat. If it's not there, assume the C process died and terminate the thread.
Launching external processes in Java can get a little bit tricky. I usually try to avoid them as you'll have to deal with different error codes and some terminal madness. I recommend you use specialized libraries such as commons-exec (http://commons.apache.org/exec/)
