Suppose I have a java.lang.Process process object representing a sub-process I want to start from Java. I need to get both stdout and stderr output from the sub-process combined as a single String, and for the purpose of this question, I have chosen to store stdout first, followed by stderr. Based on my current understanding, I should be reading from them simultaneously. Sounds like a good task for CompletableFuture, I presume?
Hence, I have the following code snippets:
Getting the output
final CompletableFuture<String> output = fromStream(process.getInputStream()).thenCombine(
        fromStream(process.getErrorStream()),
        (stdout, stderr) -> Stream.concat(stdout, stderr).collect(Collectors.joining("\n")));
// to actually get the result, for example
System.out.println(output.get());
fromStream() helper method
public static CompletableFuture<Stream<String>> fromStream(final InputStream stream) {
    return CompletableFuture.supplyAsync(() -> {
        return new BufferedReader(new InputStreamReader(stream)).lines();
    });
}
Is there a better/nicer Java-8-way of doing this task? I understand there are the redirectOutput() and redirectError() methods from ProcessBuilder, but I don't suppose I can use them to redirect to just a String?
As pointed out in the comments, I missed out on the redirectErrorStream(boolean) method that allows me to pipe stderr to stdout internally, so there's only one stream to deal with. In this case, using a CompletableFuture is completely overkill (pun unintended...?) and I'll probably be better off without it.
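For reference, a minimal sketch of that single-stream approach (the class and method names are illustrative, and "java -version" is used only as an example command). One caveat: with redirectErrorStream(true) the two outputs arrive interleaved in the order the process writes them, not "all stdout, then all stderr":

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.stream.Collectors;

public class CombinedOutput {
    // Merge stderr into stdout before the process starts, so there is
    // only one stream to read and no risk of one pipe filling up while
    // we block on the other.
    public static String run(String... command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .start();
        String output;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            output = reader.lines().collect(Collectors.joining("\n"));
        }
        process.waitFor();
        return output;
    }
}
```

For example, run("java", "-version") captures the version banner even though the JVM writes it to stderr.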
Related
I have some code that looks like this (simplified pseudo-code):
[...]
// stream constructed of series of web service calls
Stream<InputStream> slowExternalSources = StreamSupport.stream(spliterator, false);
[...]
then this
public Stream<String> getLines(Stream<InputStream> slowExternalSources) {
    return slowExternalSources.flatMap(is ->
            new BufferedReader(new InputStreamReader(is)).lines()
                    .onClose(() -> {
                        try {
                            is.close();
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    }));
}
and later this
Stream<String> lineStream = getLines(slowExternalSources);
lineStream.parallel().forEach(line -> { /* ... do some fast CPU-intensive stuff here ... */ });
I've been struggling to make this code execute with some level of parallelisation.
Inspection with jps/jstack/jmc shows that all the InputStream reading occurs in the main thread, with no parallelism at all.
Possible culprits:
BufferedReader.lines() uses a Spliterator with parallel=false to construct the stream (source: see Java sources)
I think I read an article saying that flatMap does not interact well with parallel(), but I am not able to locate it right now.
How can I fix this code so that it runs in parallel?
I would like to retain the Java8 Streams if possible, to avoid rewriting existing code that expects a Stream.
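One workaround, sketched below under the assumption that it is acceptable to buffer each source's lines in memory: submit each slow read as its own async task so the I/O overlaps, then flatten the results back into a Stream. The class name ParallelRead is illustrative, not an existing API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelRead {
    // Start one async read per source; collect() is eager, so every read
    // is already in flight before we begin joining the results.
    public static Stream<String> getLines(Stream<InputStream> slowExternalSources) {
        List<CompletableFuture<List<String>>> pending = slowExternalSources
                .map(is -> CompletableFuture.supplyAsync(() -> readAll(is)))
                .collect(Collectors.toList());
        return pending.stream().flatMap(f -> f.join().stream());
    }

    private static List<String> readAll(InputStream is) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(is))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This keeps the Stream<String> return type, so downstream code (including .parallel() for the CPU work) is unchanged; the trade-off is that each source is read fully into a list.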
NOTE I added java.util.concurrent to the tags because I suspect it might be part of the answer, even though it's not part of the question.
Is there a way to start a process in Java? in .Net this is done with for example:
System.Diagnostics.Process.Start("processname");
Is there an equivalent in Java so I can then let the user find the application and then it would work for any OS?
http://www.rgagnon.com/javadetails/java-0014.html
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.file.Paths;
public class CmdExec {
    public static void main(String[] args) {
        try {
            // Pass the command as an array so "/A" is a separate argument
            // instead of being glued onto the executable path.
            Process p = Runtime.getRuntime().exec(new String[] {
                    Paths.get(System.getenv("windir"), "system32", "tree.com").toString(),
                    "/A"
            });
            try (BufferedReader input = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = input.readLine()) != null) {
                    System.out.println(line);
                }
            }
        } catch (Exception err) {
            err.printStackTrace();
        }
    }
}
You can get the local path using System properties or a similar approach.
http://download.oracle.com/javase/tutorial/essential/environment/sysprop.html
The Java Class Library represents external processes using the java.lang.Process class. Processes can be spawned using a java.lang.ProcessBuilder:
Process process = new ProcessBuilder("processname").start();
or the older interface exposed by the overloaded exec methods on the java.lang.Runtime class:
Process process = Runtime.getRuntime().exec("processname");
Both of these code snippets will spawn a new process, which usually executes asynchronously and can be interacted with through the resulting Process object. If you need to check that the process has finished (or wait for it to finish), don't forget to check that the exit value (exit code) returned by process.exitValue() or process.waitFor() is as expected (0 for most programs), since no exception is thrown if the process exits abnormally.
Also note that additional code is often necessary to handle the process's I/O correctly, as described in the documentation for the Process class (emphasis added):
By default, the created subprocess does not have its own terminal or console. All its standard I/O (i.e. stdin, stdout, stderr) operations will be redirected to the parent process, where they can be accessed via the streams obtained using the methods getOutputStream(), getInputStream(), and getErrorStream(). The parent process uses these streams to feed input to and get output from the subprocess. Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, or even deadlock.
One way to make sure that I/O is correctly handled and that the exit value indicates success is to use a library like jproc that deals with the intricacies of capturing stdout and stderr, and offers a simple synchronous interface to run external processes:
ProcResult result = new ProcBuilder("processname").run();
jproc is available via maven central:
<dependency>
<groupId>org.buildobjects</groupId>
<artifactId>jproc</artifactId>
<version>2.5.1</version>
</dependency>
See Runtime.exec() and the Process class. In its simplest form:
Process myProcess = Runtime.getRuntime().exec(command);
...
Note that you also need to read the process' output (eg: myProcess.getInputStream()) -- or the process will hang on some systems. This can be highly confusing the first time, and should be included in any introduction to these APIs. See James P.'s response for an example.
You might also want to look into the new ProcessBuilder class, which makes it easier to change environment variables and to invoke subprocesses:
Process myProcess = new ProcessBuilder(command, arg).start();
...
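As a small sketch of the environment handling (the variable name GREETING is just illustrative, and this assumes a Unix-like system where the printenv utility is available):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class EnvDemo {
    // ProcessBuilder.environment() returns a mutable copy of this
    // process's environment; changes to it are inherited by the child.
    public static String childSees() throws Exception {
        ProcessBuilder pb = new ProcessBuilder("printenv", "GREETING");
        pb.environment().put("GREETING", "hello from parent");
        Process p = pb.start();
        String line;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            line = r.readLine();
        }
        p.waitFor();
        return line;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childSees());
    }
}
```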
I want to execute a Linux terminal command such as ls from LuaJ, receive whatever output it returns, and show the names in my Java GUI. I searched and found solutions for plain Java, but none for LuaJ.
Is there any function to execute a terminal command from LuaJ?
There are multiple ways to do this, for one, you can implement it yourself in Java then link it to LuaJ.
LuaFunction command = new OneArgFunction()
{
    public LuaValue call(LuaValue cmd)
    {
        try
        {
            // exec(String[]) avoids the naive whitespace tokenization of exec(String)
            Process p = Runtime.getRuntime().exec(
                    new String[] { "/bin/sh", "-c", cmd.checkjstring() });
            // NOTE: for commands that produce a lot of output, drain
            // p.getInputStream() here, or the process may block on a full pipe.
            int returnCode = p.waitFor();
            return LuaValue.valueOf(returnCode);
        }
        catch (Exception e)
        {
            throw new LuaError(e);
        }
    }
};
globals.set("command", command);
Then in Lua:
local code = command("ls");
The problem with actually getting the output of a command is that there is no fix-all solution. For all the system knows, you could be calling a program that runs for two hours producing constant output, or one that requires input. If you know you will only run certain commands, you can make a "dirty" version of the above function that captures the output from the stream and returns it instead of the exit code; just don't use it on processes that don't return quickly. The other alternative is to create a class that wraps the process's input and output streams, return a coerced version of that class, and manage the I/O from Lua.
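If you go the "dirty" route, the stream-capturing part (LuaJ wiring aside) might look like the sketch below; in your OneArgFunction you would return LuaValue.valueOf(capture(cmd.checkjstring())) instead of the exit code. The class name is illustrative, and again: only use this for short-lived commands.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.stream.Collectors;

public class CaptureOutput {
    // Read everything the process writes (stderr merged into stdout) and
    // return it as one String. Unsafe for long-running or interactive
    // commands: the output is unbounded and nothing feeds stdin.
    public static String capture(String cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("/bin/sh", "-c", cmd)
                .redirectErrorStream(true)
                .start();
        String out;
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            out = br.lines().collect(Collectors.joining("\n"));
        }
        p.waitFor();
        return out;
    }
}
```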
Lua does have a function that's part of OsLib called execute(). If execute doesn't exist in your current environment, then in Java call:
globals.load(new OsLib());
before loading the Lua code. The os.execute() function returns the status code and does not expose the streams, so there is no way to get the output there. To get around this, you can modify the command to pipe its output to a temp file and open that file with the io library (new IoLib(), if it doesn't exist in the current environment).
The other option is to use io.openProcess, which also executes the command and returns a file to read the output from.
Resources:
http://luaj.org/luaj/3.0/api/org/luaj/vm2/lib/OsLib.html
http://luaj.org/luaj/3.0/api/org/luaj/vm2/lib/IoLib.html
I'm trying to execute a Spark Streaming example with Twitter as the source as follows:
public static void main(String... args) {
SparkConf conf = new SparkConf().setAppName("Spark_Streaming_Twitter").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaStreamingContext jssc = new JavaStreamingContext(sc, new Duration(2));
JavaSQLContext sqlCtx = new JavaSQLContext(sc);
String[] filters = new String[] {"soccer"};
JavaReceiverInputDStream<Status> receiverStream = TwitterUtils.createStream(jssc,filters);
jssc.start();
jssc.awaitTermination();
}
But I'm getting the following exception
Exception in thread "main" java.lang.AssertionError: assertion failed: No output streams registered, so nothing to execute
at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.streaming.DStreamGraph.validate(DStreamGraph.scala:158)
at org.apache.spark.streaming.StreamingContext.validate(StreamingContext.scala:416)
at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:437)
at org.apache.spark.streaming.api.java.JavaStreamingContext.start(JavaStreamingContext.scala:501)
at org.learning.spark.TwitterStreamSpark.main(TwitterStreamSpark.java:53)
Any suggestion how to fix this issue?
When an output operator is called, it triggers the computation of a stream. Without an output operator on a DStream, no computation is invoked. Basically, you will need to invoke one of the methods below on the stream:
print()
foreachRDD(func)
saveAsObjectFiles(prefix, [suffix])
saveAsTextFiles(prefix, [suffix])
saveAsHadoopFiles(prefix, [suffix])
http://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operations
You can also apply any transformations first and then the output operation, if required.
Exception in thread "main" java.lang.AssertionError: assertion failed: No output streams registered, so nothing to execute
TL;DR Use one of the available output operators like print, saveAsTextFiles or foreachRDD (or less often used saveAsObjectFiles or saveAsHadoopFiles).
In other words, you have to use an output operator between the following lines in your code:
JavaReceiverInputDStream<Status> receiverStream = TwitterUtils.createStream(jssc,filters);
// --> The output operator here <--
jssc.start();
Quoting the Spark official documentation's Output Operations on DStreams (highlighting mine):
Output operations allow DStream's data to be pushed out to external systems like a database or a file systems. Since the output operations actually allow the transformed data to be consumed by external systems, they trigger the actual execution of all the DStream transformations (similar to actions for RDDs).
The point is that without an output operator you have "no output streams registered, so nothing to execute".
As one commenter has noticed, you have to use an output transformation, e.g. print or foreachRDD, before starting the StreamingContext.
Internally, whenever you use one of the available output operators, e.g. print or foreachRDD, DStreamGraph is requested to add an output stream.
You can find the registration when a new ForEachDStream is created and registered afterwards (which is exactly to add it as an output stream).
The same assertion can also fail, misleadingly, when the real cause is that the window and slide durations are not multiples of the streaming input's batch duration. That mismatch normally only logs a warning: fix it, and the context stops failing :D
I have a script which executes a program several times, producing about 350 lines of output to both STDERR and STDOUT. Now, I need to execute the script in Java, thereby printing the output streams to their original destinations. So, basically, I execute the script from inside a Java class, maintaining the original behavior for the user.
The way I do this is inspired by suggestions like "Reading streams from java Runtime.exec" and, functionally, works fine.
Process p = Runtime.getRuntime().exec(cmdarray);
new Thread(new ProcessInputStreamHandler(p.getInputStream(), System.out)).start();
new Thread(new ProcessInputStreamHandler(p.getErrorStream(), System.err)).start();
return p.waitFor();
And the class ProcessInputStreamHandler:
class ProcessInputStreamHandler implements Runnable {
private BufferedReader in_reader;
private PrintStream out_stream;
public ProcessInputStreamHandler(final InputStream in_stream, final PrintStream out_stream) {
this.in_reader = new BufferedReader(new InputStreamReader(in_stream));
this.out_stream = out_stream;
}
@Override public void run() {
String line;
try {
while ((line = in_reader.readLine()) != null) {
out_stream.println(line);
}
} catch (Exception e) {throw new Error(e);}
out_stream.flush();
}
}
Now regarding my problem statement: While the execution of the script takes about 17 seconds, the "encapsulated" execution takes at least 21 seconds. Where do I lose these 4 or more seconds?
I already tried using a ProcessBuilder with redirection of STDERR to STDOUT, using POSIX vfork with libraries like https://github.com/axiak/java_posix_spawn, using a byte buffer instead of a BufferedReader... everything with no positive result at all.
Are there any suggestions? I understand that there will be some performance loss, but 4 seconds seems a bit much to me...
The fastest way for your task is to use Java 7 and
return new ProcessBuilder(cmdarray).inheritIO().start().waitFor();
If that doesn’t help, I think there’s nothing you can do as every other approach would add even more code to your runtime environment that has to be processed.
Don't know if it will improve performance or not, but you can try the NuProcess library which while also providing non-blocking (asynchronous) I/O will also use vfork on Linux, which does decrease process launch times (and memory overhead) quite a bit.