I am developing a web application with a Java front end and shell scripts at the back end. The application analyzes many files: the Java program takes inputs from the user, such as which files to analyze and the date range to analyze them over. Let's assume the user asks for July 1-8; I then need to process the files for those 8 days, with about 100 files per day. My goal is to do this processing in parallel rather than sequentially. I have two ideas for this, and I wanted to share them with you and get your suggestions.
PLAN 1:
There is a Java program (the business layer) which invokes a shell script using ProcessBuilder. Could I split the date range given by the user, for instance (1-8), across 4 threads, where each thread handles two days: thread 1 takes (1-2), thread 2 takes (3-4), and so on? If I follow this approach, what are the pros and cons, and how do I coordinate among the threads?
PLAN 2:
Call the shell script from Java and, inside the shell script, spawn multiple processes: as above, process 1 handles days (1-2), process 2 handles (3-4), and so on. What are the pros and cons of this approach? Also, I am writing the processed output into a single file, so if I have multiple processes, how can I have that single file updated safely by all of them?
Links to any references related to my question would also be appreciated.
IMPORTANT:
As I said, I need to process hundreds of log files per day inside a shell script, and one of my requirements is to constantly update the front end on the status of the jobs in the shell script (i.e., day 1 has completed, day 2 has completed, and so on). I know I can echo from the shell script and then read the value from Java. But the problem is that if I echo inside the shell script's file-processing loop, my call terminates and I have to call back from Java again. Any ideas on how to make these updates happen?
First, I would suggest considering the first rule of optimization: do not optimize.
Then, if you really think you need to optimize it, I would pick the first approach and do as much as possible in Java.
One approach could be the following:
1) run all the processes with ProcessBuilder and create a List<Process>
2) Wrap each Process into a ShellScriptProcess and acquire a List<ShellScriptProcess>
class ShellScriptProcess implements Runnable {
    private final Process process;
    private volatile boolean finished = false;

    public ShellScriptProcess(Process process) {
        this.process = process;
    }

    public void run() {
        try {
            process.waitFor(); // blocks until the process terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        finished = true;
    }

    public boolean isFinished() {
        return finished;
    }
}
3) wait for processes to finish
boolean allFinished = false;
while (!allFinished) {
    allFinished = true;
    for (ShellScriptProcess sp : shellScriptProcesses) {
        if (sp.isFinished()) {
            // hurray, a process has finished, inform the UI
            // you want to do something smarter here though,
            // like removing the finished processes from the list
        } else {
            allFinished = false;
        }
    }
}
This is only a very rough solution, just to demonstrate the idea of how this could be accomplished. I didn't test the code, so it might contain syntax errors :) Hope this helps.
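On the status-update requirement from the question: an echo inside the script's loop does not by itself terminate the call; the Java side can keep reading the child's stdout line by line and treat each line as one progress event. A rough sketch, assuming a POSIX sh is available (the echoed messages are stand-ins for the real script's output):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class ProgressReader {

    // Reads every line the child process writes to stdout; each line
    // is one progress message (e.g. "day 1 completed").
    public static List<String> readProgress(ProcessBuilder pb) throws Exception {
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process process = pb.start();
        List<String> messages = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                messages.add(line); // update the UI here instead of collecting
            }
        }
        process.waitFor();
        return messages;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real analysis script: echoes one line per "day".
        ProcessBuilder pb = new ProcessBuilder(
            "sh", "-c", "echo day 1 completed; echo day 2 completed");
        for (String msg : readProgress(pb)) {
            System.out.println("status: " + msg);
        }
    }
}
```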
Related
I'm facing a problem here, and I'm thinking you guys might be able to help or point me toward the appropriate documentation.
But first, context:
I'm working on a C++ program that can call and run different Java executables with arguments. But I'm a complete noob at C++, and at coding in general; I started Java a couple of weeks back.
The purpose of this C++ program (among other things) is to intercept inputs and simulate other inputs that will then be read by already-running Java threads. The goal is for the Java side to be able to run a loop like this:
public class CalledByCpp extends JFrame {
    protected static KeyList listener = new KeyList(); // custom KeyListener
    // frame constructor goes here

    public static void main(String[] args) {
        // initialization of a bunch of things
        CalledByCpp frame = new CalledByCpp();
        frame.addKeyListener(listener);
        while (true) {
            frame.requestFocus();
        }
    }

    protected static class KeyList extends KeyAdapter {
        // handling of key presses goes here
    }
}
Does this sound possible? Or is there an easier method? And what's the event-creation method I'm looking for in C++?
Side note: both the C++ and Java programs would run on a Linux desktop (Debian) without any screen attached, so I assume it's safe for the Java frame to loop on requestFocus(). Right?
Another lead I had was to build a driver, but I have no idea what the difference is, how to do that, or even whether it's a lead worth investing time and effort into when this C++ program could act as a driver itself.
Thanks a lot!
You will probably want to give the input to Java through standard in. The 'event' to listen for could just be reading a line (or possibly multiple lines) from standard in; you would need some way of detecting when to stop reading and act. Since all threads in your Java program would receive this data, you would also need some way of detecting which thread the data was intended for. I would start your C++ program first and then launch Java from within it, set up so that you can write data into the Java process's standard in from the C++ program. In a UNIX environment there is a standard way to do this. From your C++ program, run something like the following (you will need to include the headers unistd.h, sys/wait.h, stdio.h, and stdlib.h):
int fd[2];
// For sharing data with the Java process
pipe(fd);
pid_t id = fork();
if (id < 0) { // system error
    printf("Failed to fork process");
    exit(1);
}
// parent process
else if (id > 0) { // id is the pid of the child (Java) process
    // optional: writes to standard out go to the Java process
    // can also use write(fd[1], <data>, <length>)
    dup2(fd[1], 1); // dup2(oldfd, newfd): stdout now points at the pipe's write end
    close(fd[0]);
    // do stuff and write to the Java process as needed
    // signal end of data to the Java process
    close(fd[1]);
    // wait for the Java process to exit and clean it up
    if (wait(NULL) < 0) { // system error
        exit(2);
    }
}
else { // child (Java) process, id == 0
    // read stdin from the parent (C++) process
    dup2(fd[0], 0); // stdin now points at the pipe's read end
    close(fd[1]);
    char* path = "/path/to/java";
    // Arguments to java; argv[0] is the program name by convention, and the list always ends with NULL
    char* argv[] = {"java", "Java_arg1", "Java_arg2", NULL};
    // Run java
    if (execv(path, argv) < 0) { // system error
        printf("Failed to execute java command");
        exit(1);
    }
}
So I've had the time to try out some quick code, using xdotool, which you can find on GitHub or via apt-get install xdotool:
//in the loop deciding which input to simulate
system("xdotool key T");
This simulates one press of the T key. No console pops up, the program can run in the background and generate inputs easily. They are actual system inputs, so I've been able to read them in Java.
I'm trying to execute a Spark program with spark-submit (specifically the GATK Spark tools, so the command is not literally spark-submit, but something similar). This program accepts only a single input, so I'm trying to write some Java code to accept multiple inputs.
In particular, I'm trying to execute a spark-submit for each input through the pipe function of JavaRDD:
JavaRDD<String> bashExec = ubams.map(ubam -> par1 + "|" + par2)
.pipe("/path/script.sh");
where par1 and par2 are parameters that will be passed to script.sh, which will handle them (splitting on "|") and use them to execute something similar to spark-submit.
Now, I don't expect a speedup compared to the execution of a single input, because I'm calling other Spark functions; I just want to distribute the workload of multiple inputs across different nodes and have execution time linear in the number of inputs.
For example, the GATK Spark tool took about 108 minutes with a single input; with my code I would expect two similar inputs to take about 216 minutes.
I noticed that the code "works", or rather that I get the usual output on my terminal. But after at least 15 hours the task hadn't completed and was still executing.
So I'm asking: is this approach (executing spark-submit via the pipe function) misguided, or are there other errors?
I hope I've been clear in explaining my issue.
P.S. I'm using a VM on Azure with 28 GB of memory and 4 execution threads.
Is it possible
Yes, it is technically possible. With a bit of caution it is even possible to create a new SparkContext in a worker thread, but
Is it (...) wise
No. You should never do something like this. There is a good reason Spark disallows nested parallelization in the first place. Anything that happens inside a task is a black box, so it cannot be accounted for during DAG computation and resource allocation. In the worst-case scenario, the job will simply deadlock, with the main job waiting for its tasks to finish and the tasks waiting for the main job to release required resources.
How to solve this? The problem is rather roughly outlined, so it's hard to give precise advice, but you can:
Use a driver-local loop to submit multiple jobs sequentially from a single application.
Use threading and in-application scheduling to submit multiple jobs concurrently from a single application.
Use independent orchestration tool to submit multiple independent applications, each handling one set of parameters.
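For the second option, the driver-side skeleton is ordinary Java threading. A minimal sketch, where `runJob` is a placeholder for whatever per-input Spark action you would actually call (real code would also want to configure Spark's scheduler appropriately):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class ConcurrentJobs {

    // Placeholder for a per-input Spark action (e.g. an aggregation or a write).
    static String runJob(String input) {
        return "processed " + input;
    }

    // Submits one job per input to a small pool and collects the results in order.
    public static List<String> runAll(List<String> inputs, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> futures = inputs.stream()
                .map(in -> pool.submit(() -> runJob(in)))
                .collect(Collectors.toList());
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that job finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(Arrays.asList("input1", "input2"), 2));
    }
}
```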
First, to briefly describe my problem: based on the Simbad simulator ( http://simbad.sourceforge.net/doc.php - not important for my question ), I want to build a system that deploys rovers to explore the environment. The idea is that these rovers will avoid obstacles in the environment, as well as other rovers. Let's call this the simulation.
The main elements in this simulation are, of course, the rovers, the environment, and a central station that controls the rovers and sends them commands. This will run on one thread.
What I would like, on another thread/process, is a listener. It will listen for commands typed at the keyboard and translate them into commands that the central station applies in my simulation.
For example, each rover might have an ID, and I might want to remove a rover based on its ID. Then I'd write something like "remove rover 1", and the listener running on the other thread would map this to a command, for example calling centralStation.removeRobot(id_of_robot).
What is the best way of implementing this? Basically I will have two threads, one running the simulation and one listening for commands, and the central station should be a shared resource? How do I make it a shared resource (write a main, initialize the central station, then start the other two threads to do their jobs)?
I was wondering what the best practice for this is, and how to make it as simple as possible.
Thank you :)
A simple solution is to put an appropriate data structure "between" your components.
For example, an instance of ConcurrentLinkedQueue. The idea: your "input" thread writes "command" objects into that queue, and the other thread checks the queue and, when it finds a new command, "applies" it to the simulation.
The important aspect: you really do not want the two threads operating on the same data directly.
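A minimal sketch of that queue-in-the-middle idea, with a plain String standing in for a real command object and the class names being mine:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class CommandQueueDemo {

    // Shared queue: the listener thread offers commands, the simulation polls them.
    public static final Queue<String> commands = new ConcurrentLinkedQueue<>();

    public static void main(String[] args) throws Exception {
        // Listener side (a stand-in for real keyboard input).
        Thread listener = new Thread(() -> commands.offer("remove rover 1"));
        listener.start();
        listener.join();

        // Simulation side: drain the queue once per simulation step.
        String cmd;
        while ((cmd = commands.poll()) != null) {
            // here you would map the text to a central-station call,
            // e.g. centralStation.removeRobot(1)
            System.out.println("applying: " + cmd);
        }
    }
}
```

Neither thread ever touches the other's data; the queue is the only shared object.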
How about a Java Exchanger, where the String is the rover ID/command that your listener transfers to the central station:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Exchanger.html
If I understand correctly, you want to start the system and, at runtime, pass a rover ID/command, processed via a listener (in a separate thread), to the central station (in another separate thread).
So here is how I might proceed with this:
In the main thread, start the simulator, create an Exchanger, and start two threads, one for the central station and another for the listener.
// pseudocode for main
main() {
    // start simulator (I am not sure what this thing is)
    Exchanger<String> exc = new Exchanger<String>();
    new CentralStationThread(exc).start();
    new CommandListenerThread(exc).start();
}
Now in CentralStationThread, one of the first things that you might want to do is rendezvous with the listener:
// pseudocode for run method of central station
public void run() {
    String roverIdToStop = exc.exchange(""); // blocks until the listener hands over a command
    // some code to trigger the rover stop
    // send in a replacement rover
}
And something similar in the CommandListenerThread, but not at the start:
// pseudocode for run method of listener
public void run() {
    // listen to the keyboard
    // String roverIdOrCommand = parse the command & make something out of it
    // when the command is ready to be sent to the central station, do the following:
    exc.exchange(roverIdOrCommand); // the return value can be ignored here
    // keep looking for further commands
}
I agree there might be several ways to achieve the same thing, but this is what came to my mind. Hope it helps!
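To make the rendezvous concrete, here is a small self-contained variant of the pseudocode above (class and method names are mine; real code also has to handle the InterruptedException that exchange() can throw):

```java
import java.util.concurrent.Exchanger;

public class ExchangerDemo {

    // Runs one exchange round: a listener thread hands `command` to a
    // central-station thread; returns what the central station received.
    public static String exchangeOnce(String command) throws InterruptedException {
        Exchanger<String> exc = new Exchanger<>();
        final String[] received = new String[1];

        Thread centralStation = new Thread(() -> {
            try {
                received[0] = exc.exchange("ack"); // blocks until the listener arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread listener = new Thread(() -> {
            try {
                exc.exchange(command); // hands the parsed command over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        centralStation.start();
        listener.start();
        centralStation.join();
        listener.join();
        return received[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("central station got: " + exchangeOnce("stop rover-1"));
    }
}
```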
I have a program that uses 4 input .txt files. When the program is started, certain data manipulation is done, data is grabbed from those 4 files, and the program creates 2 output .txt files at the end. Execution time is about 18 seconds. I would like to know the fastest way to structure this program in terms of execution time. This is how the program is written now:
First, I read and wrote the data sequentially, so my program logic had two parts: one loop (reading the necessary data and creating the file) for the first output file, and another loop (likewise) for the second output file. With this approach the second part had to wait for the first to finish, and total execution time was about 18 seconds.
Then I introduced threads: I used 2 threads (each running its own loop), so each loop runs in a separate thread. With that approach, I cut total execution time to about 9 seconds.
Now I'm asking whether I can reduce total execution time even more.
My code for the first thread looks like this (the code for the other thread is more or less similar):
Thread thread1 = new Thread() {
    public void run() {
        final List<Articles> article_list = ac.getFileValues("resources/articles.txt");
        String file_contents = "";
        String file1_data = "";
        for (int i = 0; i < article_list.size(); i++) {
            double price_art_local_val = cc.getPrice("resources/pricelist.txt", article_list.get(i).sifra);
            double[] art_all_shops = sc.getAmountInAllStores("resources/supply.txt", article_list.get(i).sifra);
            double total_value_art_all_shops_local = price_art_local_val * art_all_shops[0];
            double total_value_art_all_shops_foregin = total_value_art_all_shops_local / exchange_rate;
            file_contents = article_list.get(i).sifra+"\t"+article_list.get(i).naziv+"\t"+df.format(price_art_local_val)+"\t"+df.format(art_all_shops[0])+"\t"+article_list.get(i).jedinica_mjere+"\t"+df.format(total_value_art_all_shops_local)+"\t"+df.format(total_value_art_all_shops_foregin)+"\t"+df.format(art_all_shops[1])+"\n";
            System.out.print(file_contents);
            file1_data += file_contents;
        }
        if (!file1_data.isEmpty()) {
            save.saveFile("results/supply_value_articles.txt", file1_data);
        }
    }
};
The place I see further execution-time reduction is the methods cc.getPrice() and sc.getAmountInAllStores() in that piece of code. My view is that it would be good to have them run in separate threads, so that one method would not have to wait for the other. Am I on the right track?
So I presume that if I want to speed up the loop, I would need to run cc.getPrice() and sc.getAmountInAllStores() in separate threads. If that is not the right solution, I would like to know what to do.
Then, if this is the right solution, how can I achieve it? I do not know how to properly write code that uses another thread when that code is already inside a run() method; I am not even sure it can be done. Also, those methods return values which need to be stored in variables, so I would need the threads to return data to me, and I do not know how to do that properly. It seems it won't help to simply write the following (if I create one thread for cc.getPrice() and another for sc.getAmountInAllStores() and name them thread3 and thread4):
thread3.join();
thread4.join();
I would need a piece of code showing the appropriate solution.
If I am not on the right track and this can't be done (starting and using new threads inside a running thread), please instruct me what to do. I have read some Stack Overflow questions about BlockingQueue, but I think my situation calls for something else.
Please help with code examples if you can. Thank you very much; help appreciated.
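Edit: to show the shape of what I'm after, here is a sketch using Callable and Future, which seems to be the standard way to get values back from threads. The two methods are simplified stand-ins for my cc.getPrice() and sc.getAmountInAllStores() (which read files in the real program):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLookups {

    // Stand-ins for cc.getPrice(...) and sc.getAmountInAllStores(...).
    static double getPrice(String article) { return 10.0; }
    static double getAmount(String article) { return 3.0; }

    // Runs both lookups at the same time and combines their results.
    public static double totalValue(String article) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Double> price = pool.submit(() -> getPrice(article));
            Future<Double> amount = pool.submit(() -> getAmount(article));
            return price.get() * amount.get(); // get() waits for and returns each value
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(totalValue("A1")); // 30.0 with the stand-in values
    }
}
```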
I am creating a process P1 using Process P1 = Runtime.getRuntime().exec(...). My process P1 then creates other processes, say P2, P3, ...
Then I want to kill process P1 and all the processes created by P1, i.e. P2, P3, ...
P1.destroy() kills P1 only, not its sub-processes.
I also googled it and found it's a known Java bug:
http://bugs.sun.com/view_bug.do?bug_id=4770092
Does anyone have any ideas on how to do it?
Yes, it is a bug, but if you read the evaluation, the underlying problem is that it is next to impossible to implement "kill all the little children" on Windows.
The answer is that P1 needs to be responsible for doing its own tidy-up.
I had a similar issue where I started a PowerShell process which started a Ping process, and when I stopped my Java application the PowerShell process would die (I would use Process.destroy() to kill it), but the Ping process it created wouldn't.
After messing around with it, this method was able to do the trick:
private void stopProcess(Process process) {
    process.descendants().forEach(new Consumer<ProcessHandle>() {
        @Override
        public void accept(ProcessHandle t) {
            t.destroy();
        }
    });
    process.destroy();
}
It kills the given process and all of its sub-processes.
PS: you need Java 9 or later to use the Process.descendants() method.
Java does not expose any information about process grandchildren, with good reason: if your child process starts another process, then it is up to the child process to manage it.
I would suggest either
Refactoring your design so that your parent creates/controls all child processes, or
Using operating system commands to destroy processes, or
Using another mechanism of control like some form of Inter-Process Communication (there are plenty of Java libraries out there designed for this).
Props to @Giacomo for suggesting the IPC before me.
Are you writing the other processes' code, or are they something you cannot change?
If you can change them, I would consider modifying them so that they accept some kind of message (even through standard streams) and terminate nicely upon request, terminating their own children if they have any.
I don't find "destroying a process" particularly clean.
If it is a bug, as you say, then you must keep track of the process tree of child processes and kill all the child processes from the tree when you want to kill the parent process.
You need a tree data structure for that; if you have only a couple of processes, a list will do.
Because Runtime.exec() returns a Process instance, you can store the references in a collection and kill them later with Process.destroy().
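That bookkeeping could look roughly like this (the `sleep` command is just a stand-in child process, assuming a POSIX system; note this still will not reach grandchildren, per the discussion above):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ProcessRegistry {

    // Every process we start goes in here so it can be killed later.
    private final List<Process> started = new ArrayList<>();

    public Process launch(String... command) throws IOException {
        Process p = new ProcessBuilder(command).start();
        started.add(p);
        return p;
    }

    // Destroys everything we launched, in reverse start order.
    public void destroyAll() {
        for (int i = started.size() - 1; i >= 0; i--) {
            started.get(i).destroy();
        }
        started.clear();
    }

    public static void main(String[] args) throws Exception {
        ProcessRegistry registry = new ProcessRegistry();
        Process p = registry.launch("sleep", "60"); // placeholder child process
        registry.destroyAll();
        p.waitFor(); // destroy() is asynchronous; wait for the process to go away
        System.out.println("alive after destroyAll: " + p.isAlive());
    }
}
```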