I have the code below:
from subprocess import Popen, PIPE, STDOUT
p = Popen(['java', '-jar', 'action.jar'], stdin=PIPE, stdout=PIPE, stderr=STDOUT)
stdout1, stderr1 = p.communicate(input=sample_input1)
print "Result is", stdout1
p = Popen(['java', '-jar', 'action.jar'], stdin=PIPE, stdout=PIPE, stderr=STDOUT)
stdout2, stderr2 = p.communicate(input=sample_input2)
print "Result is", stdout2
Loading the JAR takes a lot of time and is very inefficient. Is there any way to avoid reloading it for the second run, i.e. to load it once at the beginning and keep using that instance instead of calling Popen(...) a second time? I tried simply removing the second Popen call, but then Python complains:
"ValueError: I/O operation on closed file".
Is there any solution to this? Thanks!
communicate() waits for the process to terminate, so that explains the error you're getting -- the second time you call it, the process isn't running any more.
It really depends on how that JAR was written, and the kind of input it expects. If it supports executing its action more than once based on input, and if you can reformat your input that way, then it would work. If the JAR does its thing once and terminates, there's not much you can do.
If you don't mind writing a bit of Java, you can add a wrapper around the classes in action.jar that takes both your sample inputs in turn and passes them to the code in the jar.
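A minimal sketch of such a wrapper, assuming a hypothetical entry class com.example.Action with a run(String) method (we don't know what action.jar actually exposes): the JVM loads once, and the Python side can then stream inputs through a single long-lived pipe, one per line.
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ActionLoop {
    public static void main(String[] args) throws Exception {
        com.example.Action action = new com.example.Action(); // assumed entry class
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(action.run(line)); // assumed method; one result per input line
            System.out.flush();
        }
    }
}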
You can save the cost of starting up the Java Virtual Machine by using a tool like Nailgun.
I am going through Java I/O and have just started with the standard input and output streams. Please look at the simple program given below:
import java.util.Scanner;

public class EchoStreams {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        System.out.println("Give us your input");
        String str = scanner.nextLine();
        System.out.println("Standard Output: " + str);
        System.err.println("Standard Error Output: " + str);
    }
}
The output varies when running this two or three times. Please find a couple of the outputs below:
Running for the first time:
Give us your input
my Name
Standard Error Output: my Name
Standard Output: my Name
Process finished with exit code 0
Running a second time with the same code:
Give us your input
my Name
Standard Output: my Name
Standard Error Output: my Name
Process finished with exit code 0
I would like to know why the position of the System.err output changes.
Your program will write first to System.out and then to System.err (and println will flush these streams as well), but there is no guarantee of the order/interleaving in which the two streams will appear in your console.
Since you are writing to them at practically the same time, you will get both combinations. I suppose you might even get half-line interleavings.
System.out and System.err write to different streams that are connected via different pipes to your command shell. The command shell will then read from these pipes and write the data to your console application. That will ultimately write to your screen.
There are a number of places where data written to one stream could "overtake" data written to the other one.
It could possibly occur in the JVM itself, since the Java specs make no guarantees about which stream gets written first. (In fact, this is unlikely if there is only one thread doing the writing. With current Java implementations, the behavior will probably be deterministic ... though unspecified.)
It could be happening in the OS, since there are no guarantees on the order of delivery of data written to two independent pipes.
It could be happening in the shell, since nothing in the shell specs places any priority on reading from the pipes.
In short, there are lots of areas where the behavior is unspecified.
It is also worth noting that the observed behavior is liable to depend on the version of Java you use, the OS and OS tools, your hardware, and the load on your system.
Finally, there is probably nothing you can do to guarantee that the observed interleaving (or lack of it) is consistent. If you need consistency, write your output to one stream or the other exclusively.
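For instance, a minimal sketch of the single-stream approach: redirecting System.err to the same PrintStream as System.out means both println calls travel down one pipe, so they arrive in call order.
public class SingleStream {
    public static void main(String[] args) {
        // Route "error" messages through the same stream as normal output:
        // one pipe means one ordering, determined purely by call order.
        System.setErr(System.out);
        System.out.println("Standard Output: hello");
        System.err.println("Standard Error Output: hello");
    }
}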
There is no guarantee of ordering between System.out and System.err; output on either stream can appear first, so the relative order of the two streams is not fixed.
I am asking this question particularly for an Expect implementation in Java. However, I would like to know general suggestions as well.
In Expect programming, is it possible to expect exactly what is prompted after spawning a new process?
For example, instead of expecting some pattern or a fixed string, isn't it better to just expect what is prompted? I feel this would be really helpful at times (especially when there's no conditional sending).
Consider the sample Java code here that uses the JSch and Expect4j libraries to ssh into a remote machine and execute a list of commands (ls, pwd, mkdir testdir) there.
My question here is: why is it necessary to specify a pattern for the prompt? Is it not possible to get the exact prompt from the Channel itself and expect that?
I've programmed in "expect" and in "java".
I think you misunderstand what "expect" basically does. It doesn't look for exact items prompted after spawning a new process.
An expect program basically consists of:
Something that reads the terminal
A set of patterns (typically regular expressions), coupled to a blocks of code.
So, when a new process is spawned, there's a loop that looks something like this:
while (terminal.hasMoreText()) {
    bufferedText += terminal.readInput();
    for (Pattern pattern : patterns) {
        if (pattern.matches(bufferedText)) {
            String match = pattern.getMatch(bufferedText);
            bufferedText.removeAllTextBefore(match);
            bufferedText.removeText(match);
            pattern.executeBlock();
        }
    }
}
Of course, this is a massive generalization. But it is close enough to illustrate that expect itself doesn't "exactly expect" anything after launching a process. The program provided to the expect interpreter (which primarily consists of patterns and blocks of code to execute when the patterns match) contains the items which the interpreter's loop will use to match the process's output.
This is why you see some pretty odd expect scripts. For example, nearly everyone "expects" "ogin:" instead of "Login:" because there's little consistency on whether the login prompt is upper or lower case.
You don't have to expect anything. You're free to just send commands immediately and indiscriminately.
It's considered good practice to only reply to specific prompts so that you don't accidentally ruin something by saying the wrong thing at the wrong time, but you're entirely free to ignore this.
The main consideration is that while your normal flow might be:
$ create-backup
$ mkdir latest
$ mv backup.tar.gz latest
With no expectations and just blindly writing input, you can end up with this:
$ create-backup
Disk full, cleanup started...
Largest file: precious-family-memories.tar (510MB)
[R]emove, [S]ave, [A]bort
Invalid input: m
Invalid input: k
Invalid input: d
Invalid input: i
Removing file...
$ latest
latest: command not found
$ mv backup.tar.gz latest
whereas a program that expects $ before continuing would just wait and eventually realize that things are not going according to plan.
A few commands are sensitive to timing (e.g. telnet), but other than that you can send commands whenever you want, with or without waiting for anything at all.
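To illustrate the "wait for $ before continuing" idea in plain Java, without any Expect library, here is a minimal sketch (the helper class is hypothetical, and a real implementation would also want a timeout):
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class PromptWaiter {
    // Block until the given prompt text shows up in the process output.
    static void waitFor(InputStream in, String prompt) throws IOException {
        StringBuilder buffer = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            buffer.append((char) c);
            if (buffer.indexOf(prompt) >= 0) {
                return; // prompt seen, safe to send the next command
            }
        }
        throw new IOException("stream ended before prompt: " + prompt);
    }

    // Send one command line and flush so it actually reaches the process.
    static void send(OutputStream out, String command) throws IOException {
        out.write((command + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }
}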
I'm trying to do sentiment analysis on a large corpus of tweets in a local MongoDB instance with Ruby on Rails 4, Ruby 2.1.2 and Mongoid ORM.
I've used the freely available https://loudelement-free-natural-language-processing-service.p.mashape.com API on Mashape.com; however, it starts timing out after pushing through a few hundred tweets in rapid-fire sequence -- clearly it isn't meant for going through tens of thousands of tweets, and that's understandable.
So next I thought I'd use the Stanford CoreNLP library promoted here: http://nlp.stanford.edu/sentiment/code.html
The default usage, in addition to using the library in Java 1.8 code, seems to be to use XML input and output files. For my use case this is annoying, given that I have tens of thousands of short tweets as opposed to long text files. I would want to use CoreNLP like a method, inside a tweets.each type of loop.
I guess one way would be to construct an XML file containing all of the tweets, get another XML file back out of the Java process, parse it and write the results back to the DB, but that feels alien to me and would be a lot of work.
So, I was happy to find on the site linked above a way to run CoreNLP from the command line, accepting text on stdin, so that I didn't have to start fiddling with the filesystem but could feed the text in directly. However, starting up the JVM separately for each tweet adds a huge overhead compared to using the loudelement free sentiment analysis API.
Now, the code I wrote is ugly and slow but it works. Still, I'm wondering if there's a better way to run the CoreNLP java program from within Ruby without having to start fiddling with the filesystem (creating temp files and giving them as params) or writing Java code?
Here's the code I'm using:
def self.mass_analyze_w_corenlp # batch run the method in multiple Ruby processes
todo = Tweet.all.exists(corenlp_sentiment: false).limit(5000).sort(follow_ratio: -1) # start with the "least spammy" tweets based on follow ratio
counter = 0
todo.each do |tweet|
counter += 1
fork {tweet.analyze_sentiment_w_corenlp} # run the analysis in a separate Ruby process
if counter >= 5 # when five concurrent processes are running, wait until they finish to preserve memory
Process.waitall
counter = 0
end
end
end
def analyze_sentiment_w_corenlp # run the sentiment analysis for each tweet object
text_to_be_analyzed = self.text.gsub("'", " ").gsub('"', ' ') # fetch the text field of the DB item and strip the quotes that would confuse the command line below
start = "echo '"
finish = "' | java -cp 'vendor/corenlp/*' -mx250m edu.stanford.nlp.sentiment.SentimentPipeline -stdin"
command_string = start+text_to_be_analyzed+finish # assemble the command for the command line usage below
output =`#{command_string}` # run the CoreNLP on the command line, equivalent to system('...')
to_db = output.gsub(/\s+/, "").downcase # since CoreNLP uses indentation, remove unnecessary whitespace
# output is one of "neutral", "positive", "negative" and so on
puts "Sentiment analysis successful, sentiment is: #{to_db} for tweet #{text_to_be_analyzed}."
self.corenlp_sentiment = to_db # insert result as a field to the object
self.save! # sentiment analysis done!
end
You can at least avoid the ugly and dangerous command line stuff by using IO.popen to open and communicate with the external process, for example:
input_string = "
foo
bar
baz
"
output_string =
IO.popen("grep 'foo'", 'r+') do |pipe|
pipe.write(input_string)
pipe.close_write
pipe.read
end
puts "grep said #{output_string.strip} but not bar"
EDIT: to avoid the overhead of reloading the Java program for each item, you can open the pipe around the todo.each loop and communicate with the process like this:
inputs = ['a', 'b', 'c', 'd']
IO.popen('cat', 'r+') do |pipe|
inputs.each do |s|
pipe.write(s + "\n")
out = pipe.readline
puts "cat said '#{out.strip}'"
end
end
That is, if the Java program supports such line-buffered "batch" input. If it doesn't, it should not be very difficult to modify it to do so.
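Such a modification could be as small as giving the Java side a main loop shaped like this sketch (analyze is a placeholder for the real sentiment call): one input line in, one result line out, so a long-lived IO.popen pipe can stream all tweets through a single JVM.
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class BatchMain {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(analyze(line)); // one result line per input line
            System.out.flush(); // flush so the caller's readline doesn't block
        }
    }

    static String analyze(String text) {
        return "neutral"; // placeholder for the real sentiment analysis
    }
}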
As suggested in the comments by #Qualtagh, I decided to use JRuby.
I first attempted to use Java with MongoDB directly (read from MongoDB, analyze with Java / CoreNLP, and write back to MongoDB), but the MongoDB Java Driver was more complex to use than the Mongoid ORM I use with Ruby, which is why I felt JRuby was more appropriate.
Doing a REST service in Java would have required me first to learn how to write a REST service in Java, which might have been easy, or might not. I didn't want to spend time figuring that out.
So the code I needed in order to run my analysis was:
def analyze_tweet_with_corenlp_jruby
require 'java'
require 'vendor/CoreNLPTest2.jar' # I made this Java JAR with IntelliJ IDEA that includes both CoreNLP and my initialization class
analyzer = com.me.Analyzer.new # this is the Java class I made for running the CoreNLP analysis, it initializes the CoreNLP with the correct annotations etc.
result = analyzer.analyzeTweet(self.text) # self.text is where the text-to-be-analyzed resides
self.corenlp_sentiment = result # adds the result into this field in the MongoDB model
self.save!
return "#{result}: #{self.text}" # for debugging purposes
end
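For reference, a rough sketch of what such an Analyzer class might look like. The class and method names match the ones used above; the CoreNLP calls are written from memory and the annotation class names vary between CoreNLP versions, so treat them as assumptions to check against your version:
package com.me;

import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class Analyzer {
    private final StanfordCoreNLP pipeline;

    public Analyzer() {
        // Build the pipeline once; this is the expensive step that used
        // to happen on every JVM start.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    public String analyzeTweet(String text) {
        Annotation annotation = pipeline.process(text);
        // Tweets are short; report the sentiment of the first sentence.
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            return sentence.get(SentimentCoreAnnotations.SentimentClass.class);
        }
        return "neutral";
    }
}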
I hope to execute a bash script and obtain its normal output or error message. I know that for C we have errno and perror to get the error number and the corresponding message. Is there an equivalent for Java? If not, how could we achieve the same task as in C?
If you start your process like this:
Process p = Runtime.getRuntime().exec("<command>");
Then the exit code is returned like this:
int error_code = p.waitFor();
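If you also want the script's output or error messages, not just the status code, here is a minimal sketch using ProcessBuilder (script.sh is a placeholder name):
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RunScript {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder("bash", "script.sh");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line); // the script's output and error text
        }
        int exitCode = p.waitFor(); // 0 on success, non-zero on failure
        System.out.println("exit code: " + exitCode);
    }
}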
You're confusing things.
The exit status from a process, as reported to bash, Java or whatever environment started it, is not the same thing as the values of errno that can be printed by perror(). The latter are errors for individual system calls, which are clearly at a much lower level.
The exit status of a process can be whatever the program chooses; there's no standard for it, since it's usually just pass/fail, and the detailed level is typically covered by the program itself.
See, for instance, the exit status of GNU grep:
Normally, the exit status is 0 if selected lines are found and 1 otherwise. But the exit status is 2 if an error occurred [...].
That's very high-level, and doesn't tell you what error occurred inside grep, just that, overall, it failed.
I have a C executable which I can run from Cygwin. I also want to run the same file from Java. The C program communicates via stdin and stdout; it mainly reads strings and outputs strings.
I think I can start the program with ProcessBuilder successfully. However, I cannot interact with the C program. To start the .exe I use ProcessBuilder, as follows:
Process cmd = new ProcessBuilder("path to exe").start();
The main method of my C program is here:
int main(int argc, char *argv[])
{
/* set command line or config file parms */
config(argc, argv);
/* read grammar, initialize parser, malloc space, etc */
init_parse(dir, dict_file, grammar_file, frames_file, priority_file);
/* for each utterance */
while( fgets(line, LINE_LEN-1, fp) ) {
/* assign word strings to slots in frames */
parse(line, gram);
/* print parses to buffer */
for(i= 0; i < num_parses; i++ )
print_parse(i, out_ptr, extract, gram);
/* clear parser temps */
reset(num_nets);
}
}
My goal is to send input and get output from Java.
If you only need standard input/output then you can get the appropriate streams quite easily using a ProcessBuilder or some form of Runtime.exec.
After that, just generate output and parse input, but beware: the input and output streams generally should be processed in different threads. Otherwise it is very easy to get a deadlock, since most programs won't expect stdin and stdout to be tied to a single process (e.g. the program's stdout fills your input buffer while you are still trying to write to its stdin; your write is blocked waiting for the program to read more, and it won't, since its write is blocked waiting for you to read more. Classic.)
Be careful with threads but have fun!
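As a minimal sketch of that two-thread pattern (the executable path and the input lines are placeholders): the main thread feeds the C program's stdin, one utterance per line to match its fgets loop, while a second thread drains its stdout so neither side can block the other.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class TalkToParser {
    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("path/to/parser.exe").start();
        // Drain the C program's stdout on its own thread so a full pipe
        // can never block the writes below.
        Thread reader = new Thread(() -> {
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = out.readLine()) != null) {
                    System.out.println("parser: " + line);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        reader.start();
        try (PrintWriter in = new PrintWriter(
                new OutputStreamWriter(p.getOutputStream()), true)) {
            in.println("first utterance");
            in.println("second utterance");
        } // closing stdin ends the C program's fgets loop
        p.waitFor();
        reader.join();
    }
}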
You need to start reading about JNI before going any further. Google is your friend here.
Frankly, your C main function is short. Why not just rewrite it in Java?
Another good library that allows easy access to native code is JNA. The Runtime class might also help you.