Currently I am working on a Java web app built on the Spring Framework with an MVC structure. I have ~10 different Python scripts that all query an API, perform some data analysis and manipulation, and save the results. All of the Python scripts are in the same directory.
I would like to know the most efficient way to run these files from my service's controller. Browsing around, I've seen solutions similar to this:
String command = "cmd /c start python path\\to\\script\\script.py";
Process p = Runtime.getRuntime().exec(command);
However, since they are all in the same directory, I was wondering whether there is a more efficient way to run them. Perhaps a 'driver' script that calls each of the ~10 files, which I would then run from my Java code?
Any advice or suggestions would be appreciated. Let me know if any additional information is required.
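For what it's worth, a minimal sketch of that driver-on-the-Java-side idea, assuming python is on the PATH and the scripts sit in one directory (the directory name below is hypothetical):

import java.io.File;
import java.io.IOException;

public class PythonScriptRunner {
    public static void main(String[] args) throws IOException, InterruptedException {
        File scriptDir = new File("path/to/scripts"); // hypothetical directory
        File[] scripts = scriptDir.listFiles((dir, name) -> name.endsWith(".py"));
        if (scripts == null) return;
        for (File script : scripts) {
            // Run each script and wait for it to finish before starting the next
            ProcessBuilder pb = new ProcessBuilder("python", script.getAbsolutePath());
            pb.inheritIO(); // forward the script's stdout/stderr to this process
            int exitCode = pb.start().waitFor();
            System.out.println(script.getName() + " exited with " + exitCode);
        }
    }
}

If the scripts are independent of one another, you could also start all ten processes first and wait for them afterwards, rather than running them one at a time.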
Related
So I have been tasked with integrating a program called "lightSIDE" into a Hadoop job, and I'm having some trouble figuring out how to go about this.
So essentially, rather than a single JAR, lightSIDE comes as an entire directory, including XML files that are crucial to running it.
Up until now, the way the data scientists on my team have been using this program is by running a Python script that actually runs the executable, but this seems extremely inefficient, as it spins up a new JVM every time it gets called. That being said, I have no idea how else to handle this.
If you are writing your own MapReduce jobs, then it is possible to include all the JAR files as libraries and the XML files as resources.
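A minimal sketch of that setup, assuming the jars and XML files have already been uploaded to HDFS (the paths below are hypothetical):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class LightSideJobSetup {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lightside-job");
        // Put lightSIDE's jars on the classpath of every task
        job.addFileToClassPath(new Path("/libs/lightside.jar"));
        // Ship the XML resources; they show up in each task's working directory
        job.addCacheFile(new URI("/resources/feature-config.xml"));
        // ... then set the mapper/reducer and input/output paths as usual
    }
}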
I'm one of the maintainers for the LightSide Researcher's Workbench. LightSide also includes a tiny PredictionServer class to handle predictions on new instances over HTTP - you can see it here on BitBucket.
If you want to train new models instead, you could modify this server to do what you want, drawing clues from the side.recipe.Chef class.
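For illustration, calling such a server from Java might look like the sketch below. The host, port, path, and payload format are all assumptions on my part; check the PredictionServer source on BitBucket for the actual contract.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class PredictionClient {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; the real path and payload depend on PredictionServer
        URL url = new URL("http://localhost:8000/predict");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write("text of the new instance".getBytes(StandardCharsets.UTF_8));
        }
        try (Scanner in = new Scanner(conn.getInputStream(), "UTF-8")) {
            while (in.hasNextLine()) System.out.println(in.nextLine()); // predicted label
        }
    }
}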
I have an .exe file (I don't have the source files, so I won't be able to edit the program) that takes as a parameter the path to a file to process and produces results at the end. For example, in the console I run the program as follows: program.exe -file file_to_process [other_parameters]. I also have an executable JAR file that takes two parameters, file_to_process and a second file, plus [other_parameters]. In both cases I would like to split the input file into smaller parts and run the programs in parallel. Is there any way to do this efficiently with the Apache Spark Java framework? I'm new to parallel computation, and I have read about RDDs and the pipe operator, but I don't know if they would be a good fit in my case, because the program takes a path to a file.
I will be very grateful for some help or tips.
I have run into similar issues recently, and I have working code with Spark 2.1.0. The basic idea is this: put your exe, along with its dependencies such as DLLs, into HDFS or onto local storage and use addFile to register them with the driver, which will also copy them onto the worker executors. Then load your input as an RDD and use the mapPartitionsWithIndex function to save each partition to a local file and execute the exe against that partition using Process (use SparkFiles.get to obtain the path on the worker executor).
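A condensed sketch of this approach in Java (the HDFS paths are hypothetical, and the -file flag matches the question's program.exe):

import java.io.File;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.util.Collections;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ExePerPartition {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("exe-per-partition"));
        sc.addFile("hdfs:///tools/program.exe"); // copied to every executor
        JavaRDD<String> lines = sc.textFile("hdfs:///data/input.txt");
        JavaRDD<String> results = lines.mapPartitionsWithIndex((idx, it) -> {
            // 1. Dump this partition to a local temp file on the executor
            File part = Files.createTempFile("part-" + idx, ".txt").toFile();
            try (PrintWriter w = new PrintWriter(part, "UTF-8")) {
                while (it.hasNext()) w.println(it.next());
            }
            // 2. Run the exe against that local file and wait for it
            String exe = SparkFiles.get("program.exe");
            Process p = new ProcessBuilder(exe, "-file", part.getAbsolutePath()).start();
            int code = p.waitFor();
            return Collections.singletonList("partition " + idx + " exited " + code).iterator();
        }, true);
        results.saveAsTextFile("hdfs:///data/exit-codes");
        sc.stop();
    }
}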
Hope that helps.
I think the general answer is "no". Spark is a framework, and in general it provides very specific mechanisms for cluster configuration, shuffling its own data, reading big inputs (based on HDFS), monitoring task completion, retrying, and performing efficient computation. It is not well suited to a case where you have a program you can't touch that expects a file from the local filesystem.
I guess you could put your inputs on HDFS and then, since Spark accepts arbitrary Java/Scala code, use whatever language facilities you have to dump the data to a local file, launch a process (e.g., with ProcessBuilder), and then build some complex logic to monitor for completion (maybe based on the content of the output). The mapPartitions() Spark method would be the one best suited for this.
That said, I would not recommend it. It will be ugly and complex, it will require you to mess with permissions on the nodes and things like that, and it would not take good advantage of Spark's strengths.
Spark is well suited to your problem, though, especially if each line of your file can be processed independently. I would look to see if there is a way to get the program's code, a library that does the same thing, or whether the algorithm is trivial enough to re-implement.
Probably not the answer you were looking for though :-(
I have an embedded system with a Python interface. Currently the system uses a (system-local) XML file to persist data in case the system gets turned off, but normally the system runs the entire time. When the system starts, the XML file is read in and the information is stored in Python objects; the information is then used for processing. My aim is to edit this information remotely (over TCP/IP), even while the system is running. I would like to use Java to get this done, and I have been thinking about some way to share the objects. The problem is that I'm missing the keywords to find the right technologies. What I found is SOAP, but I think it is not the right thing for this case. Is that true? I'm grateful for any tips.
As I understand it, you are using an XML file to store the start-up configuration.
My assumptions about the interface between your Java and Python apps:
You want your Java application to retrieve objects over the Python interface,
process them locally, and send them back to the Python interface to reload the config?
So, depending on your circumstances, you can work out something with the following:
Jython
Pickle (if you have no restriction on startup config file format or can afford to do conversion)
https://pypi.python.org/pypi/Pyro4
Also you can get some ideas from here:
Sharing a complex object between Python processes?
You could have your Python application open an XML-RPC socket that clients can connect to. This lets an outside application execute an endpoint that manipulates your Python object values in some way. There are several good choices for Java XML-RPC libraries, including the amazing org.apache.xmlrpc library.
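A minimal sketch of the Java client side with Apache XML-RPC (the host, port, and method name below are hypothetical):

import java.net.URL;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

public class ConfigEditor {
    public static void main(String[] args) throws Exception {
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL(new URL("http://embedded-host:8000/RPC2")); // hypothetical host/port
        XmlRpcClient client = new XmlRpcClient();
        client.setConfig(config);
        // "set_value" is a hypothetical endpoint registered on the Python side
        Object result = client.execute("set_value", new Object[]{"sensor.threshold", 42});
        System.out.println("Server replied: " + result);
    }
}

On the Python side, the standard library's xmlrpc.server module (SimpleXMLRPCServer in Python 2) can register the matching function in a few lines.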
I have a Java application (runnable JAR) and VBScripts which I'm using to telnet to a remote machine and execute some commands. So, I first execute the .vbs files and then run my JAR (all in all, everything works fine).
But now I want to integrate the scripts and my Java JAR such that running the JAR first triggers the scripts, followed by the Java-related tasks.
A few things I've come across:
I cannot trigger VBScript from Java (javax.script - correct me if I'm wrong). So, possible options to rewrite the script in are:
JavaScript (I have no idea what my JavaScript file would contain so that, after reading it inside a Java class, I could write it to the socket output stream.)
PHP (I tried this using the Java bridge, but it gives an error saying CGI needs to be installed. I also believe it requires PHP to be installed on the host machine before executing my JAR, so I'm not going any further with this approach.)
Long story short, I don't want to create any dependencies. I'm looking for something where I can package any external lib with my JAR (if required) and use it to execute my scripts.
You can execute the VBScript as an external command. There are a lot of resources on the internet that explain how to do that; for instance, this link also explains how to start a VBScript from within Java. However, I do not know whether you need the script's output within Java. If so, you'll have to listen to the output stream of the created process. You should find an example of that as well at that link (using ProcessBuilder).
If you have the script packaged within your JAR, I fear you'll have to unpack it to a temporary folder and execute it there.
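A sketch of both steps together, unpacking a bundled .vbs to a temp file and running it with cscript (the resource path is hypothetical):

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class VbsLauncher {
    public static void main(String[] args) throws Exception {
        // Copy the script out of the jar into a temp file
        Path tmp = Files.createTempFile("telnet", ".vbs");
        try (InputStream in = VbsLauncher.class.getResourceAsStream("/scripts/telnet.vbs")) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        // cscript is the console host for VBScript on Windows
        Process p = new ProcessBuilder("cscript", "//NoLogo", tmp.toString()).start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) System.out.println(line);
        }
        System.out.println("exit code: " + p.waitFor());
    }
}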
The closest thing I have seen to VBScript as a JVM language is in the answer here:
Visual Basic or VBScript as Java Scripting Engine
Have you seen this wikipedia entry about JVM languages?
http://en.wikipedia.org/wiki/List_of_JVM_languages
Also, have you considered using Ant, driving it programmatically from Java?
Another option is to use groovy/Ant from Java.
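If you go the programmatic-Ant route, a minimal sketch looks like this (the build file path is hypothetical, and the targets themselves could wrap the cscript call):

import java.io.File;
import org.apache.tools.ant.DefaultLogger;
import org.apache.tools.ant.Project;
import org.apache.tools.ant.ProjectHelper;

public class AntRunner {
    public static void main(String[] args) {
        Project project = new Project();
        // Wire Ant's output to the console so target results are visible
        DefaultLogger logger = new DefaultLogger();
        logger.setOutputPrintStream(System.out);
        logger.setErrorPrintStream(System.err);
        logger.setMessageOutputLevel(Project.MSG_INFO);
        project.addBuildListener(logger);
        project.init();
        File buildFile = new File("build.xml"); // hypothetical build file
        ProjectHelper.configureProject(project, buildFile);
        project.executeTarget(project.getDefaultTarget());
    }
}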
I just got a requirement to create a small (I assume standalone) utility to hit some code in our web application, do some custom processing of files from the app, and then dump the files onto a shared drive. My question is: what is the best way to do this? Do I just create a small app, jar it up, and run it from the command line, or is there a better way?
Sorry, I didn't give enough detail. It's an old application, over 10 years old, so while it has been upgraded to JDK 1.6, most of the code uses the old collections, old loops, etc. There aren't any interfaces; it's very tightly coupled code that uses inheritance with lots of nested objects. The web app will do the processing. I think what they want is some code outside of the application code that will log in and then fire off the file-processing code. Prior to this, I had upgraded their version of Windward Reports in a separate branch, and they want to make sure that the processed files (contracts, forms, etc.) don't get altered greatly, as there are legal requirements on fonts and layouts. So this utility will go in, fire off the list of reports (a few thousand), and dump them to a share drive so they can be viewed, en masse, with another tool for comparison, based on rules you can automate with that commercial tool. I was thinking of creating a small class with a main method, jarring it up, and then, while the web server is running with my upgraded branch code, running the utility from the command line to fire it off.
There's not enough to go on here. How are the web app's functions exposed? If it's a REST interface, then wget/curl/Spring's RestTemplate are the way to go. If it's something like a JSF app, then you're going to need something like Selenium to imitate a browser. If the functionality is in a shared library (JAR), then the web never even comes into play.
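For the REST case, a minimal sketch with Spring's RestTemplate (the endpoint URL, report ID, and output path are all hypothetical):

import java.nio.file.Files;
import java.nio.file.Paths;
import org.springframework.web.client.RestTemplate;

public class ReportDumper {
    public static void main(String[] args) throws Exception {
        RestTemplate rest = new RestTemplate();
        // Hypothetical endpoint that runs the processing and returns the file
        byte[] report = rest.getForObject("http://localhost:8080/app/reports/{id}", byte[].class, 42);
        if (report != null) {
            Files.write(Paths.get("//fileserver/share/report-42.pdf"), report);
            System.out.println("wrote " + report.length + " bytes");
        }
    }
}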
Well, I was originally looking at creating a standalone utility JAR that I would run from the command line, connecting to the app with URLConnection, but I found there is already testing code built into the application that I can run from the command line as long as I deploy the new code alongside the existing code. The utility will dump the files to a shared drive, and then XTest can be run to compare the files. After reviewing the capabilities of XTest, it appears that it can handle the file comparison well.