I have a Java program that does individual jobs, e.g. it takes in a file, does some processing on it, and creates a new file. To run it I have to type the following on the command line:
java -jar myprogram.jar -input myfile1.txt -output output/myfile1.txt
However, I wish to batch process a few thousand files, so I would like to increment the number at the end of the myfile part of the string. Once the first job has finished, the second job should start, and so on, rather than having thousands of instances of the Java program running at the same time.
Any help would be appreciated.
Jon
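I would use bash or something, but if you need to use Python, you can use subprocess.call to do this:
from subprocess import call
# call() waits for each job to finish before the next one starts.
# Adjust the range to the number of files; range(1, 1000) covers 1 to 999.
for i in range(1, 1000):
    # Each argument must be a separate list element.
    call(["java", "-jar", "myprogram.jar", "-input", "myfile%d.txt" % i, "-output", "output/myfile%d.txt" % i])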
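For a run of several thousand files it can help to record which jobs failed. The following is a minimal sketch rather than a definitive script. It assumes the jar is launched with java -jar and that the inputs are named myfile1.txt, myfile2.txt, and so on, as in the question. The helper names build_command and run_batch are made up for illustration.

```python
import subprocess


def build_command(i):
    """Return the argument list for job i (file naming taken from the question)."""
    name = "myfile%d.txt" % i
    return ["java", "-jar", "myprogram.jar", "-input", name, "-output", "output/" + name]


def run_batch(indices, runner=subprocess.call):
    """Run the jobs one after another; return the indices whose exit code was non-zero."""
    failed = []
    for i in indices:
        # runner blocks until the process exits, so only one JVM runs at a time.
        if runner(build_command(i)) != 0:
            failed.append(i)
    return failed


# Example call (assumes java and myprogram.jar are available):
# failed_jobs = run_batch(range(1, 1001))
```

The runner parameter exists only so the loop can be exercised without Java installed; by default it is subprocess.call.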
This is a perfect use for a bash script (if you're in a *nix environment) or a .bat file if you are in Windows. Bash example:
#!/bin/bash
# Runs the jobs one at a time; change the range to match your file count.
for i in {1..5}
do
    java -jar myprogram.jar -input "myfile$i.txt" -output "output/myfile$i.txt"
done
I would suggest modifying your Java program to process a whole directory. Instead of handing it individual files, pass it a directory; the program then processes every file in that directory and writes one output file per input, using some simple name-mapping scheme for the output names. That way you could use threads to handle several files at once, should you want more speed on multi-core boxes. It also keeps your overhead low, because only one JVM is running.
You don't have to modify your Java program to do this. You could write a new program that calls the existing code from inside the same JVM.
Related
I am wondering what the best course of action would be in order to get a java ".jar" file output into a python variable.
For example, let's say a user has a complicated java package (that perhaps the user didn't write and doesn't understand) which they can run in the command window / terminal with
java -jar FileProcessor.jar -i "input.txt" -o "output.txt"
Is there a way to call this in Python and get the contents of "output.txt" in a variable, similar to this method:
proc = subprocess.Popen(['java', '-jar', 'FileProcessor.jar', '-i', 'sample.txt', '-o'], stdout=subprocess.PIPE)
py_out = proc.stdout.read()
print(py_out)
The problem is clearly that the java file looks for the output file "output.txt" after -o.
Obviously I am open to other ideas, but as I understand my main options are:
subprocess call/Popen? using stdout to log the variable
Write a wrapper of the java package
Is there a better way to achieve this? The first method doesn't appear to be working as easily as many examples would show and I have no idea how difficult it would be to write a wrapper around a java package as I have never done so.
If your program FileProcessor.jar doesn't have a special case for writing to standard output, there's not much you can do (other than write to a temporary file and read it back, but that's cheating :).
Most nice commands (Unix commands for instance) either dump data to stdout when the -o option is omitted, or when -o option has - value (dash).
If you cannot modify your program, you can fool it into believing it is writing to a file when it is actually writing to the console:
Linux/UNIX:
get the current tty with the tty command, e.g. /dev/pts/2
pass that to your program
read standard output
Windows:
use CON as output file name
read standard output
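The temporary-file route mentioned above can be sketched in Python as follows. This is only an illustration under assumptions: the helper name capture_output_file and its make_command callback are hypothetical, and the jar's -i and -o flags are the ones given in the question.

```python
import os
import subprocess
import tempfile


def capture_output_file(make_command):
    """Run a command that writes to a file and return that file's contents.

    make_command is a hypothetical callback: it receives the temporary
    output path and returns the argument list to execute.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # the child process reopens the file by name
    try:
        subprocess.run(make_command(path), check=True)
        with open(path, "r") as handle:
            return handle.read()
    finally:
        os.remove(path)  # clean up even if the command fails


# Usage with the jar from the question:
# py_out = capture_output_file(
#     lambda out: ["java", "-jar", "FileProcessor.jar", "-i", "input.txt", "-o", out])
```

The file is removed afterwards, so the result lives only in the Python variable.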
I have a Java utility program that I want to execute as a command in cmd. I added its location to the PATH variable, but Java programs need to be executed using java -jar "...". How do I shorten that to just the program name, like mysql or netstat?
Update:
I neglected to mention that this Java program takes arguments of its own to handle its tasks, so the batch file would need to pass the arguments it receives on to the Java program. I'm not skilled enough in batch to know how to do this.
~Jacob
You could create a batch file or bash script (depending upon your OS) that calls the program with the proper java -jar commands, and simply name the batch (or bash) script whatever you would like to enter as the command. Place this in a directory that is in your PATH variable, and have at it.
Edit: Read this for info on how to parse command line parameters in batch scripts. Just take the parameters passed to the batch file, parse them, and pass them to your jar file with:
java -jar jarfile.jar param1 param2 ...
So for example, let's assume that your program takes two arguments. Your script could then be as follows:
java -jar jarfile.jar %1 %2
I am not an expert in batch files by any means, so there is probably a more proper way to do this. That being said, why overcomplicate things?
With Launch4J you can wrap a Java program in a standalone executable file. I'm not going to copy their (long) feature list here, but definite highlights are the numerous ways presented to customize the resulting exe, its small size, the fact that it's open source and its permissive license that allows commercial usage.
I am trying to do something using system exec in Java
Runtime.getRuntime().exec(command);
Surprisingly, everything related to paths, directories, and files is not working well.
I don't understand why, and I just want to know whether there are any alternatives.
The alternative is to use the ProcessBuilder class, which has a somewhat cleaner interface, but your main problem is probably related to how the OS processes command lines, and there isn't much Java can do to help you with that.
As noted above, cd is a shell builtin. i.e. it's not an executable. You can determine this using:
$ which cd
cd: shell built-in command
As it's not a standalone executable, Runtime.exec() won't be able to do anything with it.
You may be better off writing a shell script to do the shell-specific stuff (e.g. change the working directory) and then simply execute that shell script using Runtime.exec(). You can set PATH variables etc. within your script and leave Java to simply execute your script.
One thing that catches people out is that you have to consume your script's stdout/stderr (even if you throw it away). If you don't do this properly your process will likely block. See this SO answer for more details.
The exec() method can take three arguments. The third is the directory your subprocess should use as its working directory. That solves your "cd" problem, anyway.
I have a simple Java console application and would like to test its input / output automatically. The input is always only one line, but the output is sometimes more than one line.
How can I do this? (with a Linux shell / Python / Eclipse / Java)
You could use I/O redirection in Linux. For example, run your program like this:
java myProgram < input_file > output_file
This will run myProgram and feed input from input_file. All output will be written to a file called output_file.
Now create another file called expected_file, written by hand, that contains the exact output you expect for some input (specifically, the input you have in input_file).
Then you can use diff to compare the output_file and the expected_file:
diff output_file expected_file
This will output any differences between the two files. If there are no differences, nothing is printed. In other words, if diff prints anything, your program does not work correctly (or your test is wrong).
The final step is to link all these commands in some scripting language like Ruby (:)) or Bash (:().
This is the most straightforward way to do this sort of testing. If you need to write more tests, consider using a test framework such as JUnit.
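Since the question also mentions Python, the same redirect-and-compare idea can be sketched there. This is a minimal illustration, not a full test harness: the function names run_with_input and check_case are invented, and the class name myProgram is taken from the answer above.

```python
import subprocess


def run_with_input(argv, input_line):
    """Run argv, send one line on stdin, and return stdout split into lines."""
    result = subprocess.run(
        argv,
        input=input_line + "\n",
        stdout=subprocess.PIPE,
        text=True,   # exchange str instead of bytes
        check=True,  # raise if the program exits with a non-zero status
    )
    return result.stdout.splitlines()


def check_case(argv, input_line, expected_lines):
    """Return True when the program's output matches the expected lines exactly."""
    return run_with_input(argv, input_line) == expected_lines


# Usage (the input and expected lines here are placeholders):
# ok = check_case(["java", "myProgram"], "some input", ["first line", "second line"])
```

Comparing lists of lines plays the role of diff: any mismatch makes check_case return False.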
In Eclipse you can log your console output to a file using the run configuration settings: Run > Run Configurations > select your application > Common tab > in the 'Standard Input and Output' section, specify the file path.
You can run any Unix command repeatedly using the watch command. The command is re-run at a fixed interval until you terminate watch, either with CTRL+C or by killing the process.
$ watch -n 5 ls
By default the watch command uses a 2-second interval; you can change it with the -n option.
Or you could write a function like this in your .bashrc (from here)
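function run() {
    local number=$1
    local i
    shift
    # {1..$number} does not expand variables in bash, so use an arithmetic loop.
    for ((i = 0; i < number; i++)); do
        # "$@" is the remaining argument list, i.e. the command to run.
        "$@"
    done
}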
And use it like this
run 10 command
I have a Java program that uses ProcessBuilder to call the Unix sort command. When I run this code within my IDE (IntelliJ), it takes only about a second to sort 500,000 lines. When I package it into an executable jar and run that from the terminal, it takes about 10 seconds. When I run the sort command myself from the terminal, it takes 20 seconds!
Why is there such a vast difference in performance, and is there any way I can get the jar to execute with the same performance? The environment is OS X 10.6.8 and Java 1.6.0_26. The bottom of the sort man page says "sort 5.93 November 2004".
The command it is executing is:
sort -t' ' -k5,5f -k4,4f -k1,1n /path/to/input/file -o /path/to/output/file
Note that when I run sort from the terminal I need to manually escape the tab delimiter and use the argument -t$'\t' instead of the actual tab (which I can pass to ProcessBuilder).
Looking at ps, everything seems the same, except that when run from the IDE the sort command has a TTY of ?? instead of ttys000. From this question I don't think that should make a difference. Perhaps Bash is slowing me down? I am running out of ideas and want to close this 20x performance gap!
I'm going to venture two guesses:
perhaps you are invoking different versions of sort (do a which sort and use the full absolute path to recompare?)
perhaps you are using more complicated locale settings (leading to more complicated character set handling etc.)? Try
export LANG=C
sort -t' ' -k5,5f -k4,4f -k1,1n /input/file -o /output/file
to compare
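The locale comparison can also be timed from a script. A minimal sketch, with assumptions stated up front: it sets LC_ALL=C in a copy of the environment (LC_ALL takes precedence over LANG), and the helper names c_locale_env and time_command are invented for illustration.

```python
import os
import subprocess
import time


def c_locale_env(base=None):
    """Return a copy of the environment with the locale forced to C."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"  # overrides LANG and the other LC_* variables
    return env


def time_command(argv, env=None):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, env=env, check=True)
    return time.perf_counter() - start


# Usage with the sort arguments from the question (paths are placeholders):
# args = ["sort", "-t", "\t", "-k5,5f", "-k4,4f", "-k1,1n", "/input/file", "-o", "/output/file"]
# print(time_command(args), time_command(args, env=c_locale_env()))
```

Passing the arguments as a list avoids the shell, so the tab delimiter needs no escaping.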
Have a look at this project: http://code.google.com/p/externalsortinginjava/
It avoids the need to call an external sort entirely.