Reading external process Error Stream heavily impacts performance - java

I have a Java program that (among other things) reads the output of an external Python application through an InputStream.
Here is the code I use to read it:
InputStreamReader isr = new InputStreamReader(p.getInputStream()),
        isrError = new InputStreamReader(p.getErrorStream());
BufferedReader br = new BufferedReader(isr), brError = new BufferedReader(isrError);
new Thread() {
    @Override
    public void run() {
        try {
            while (brError.readLine() != null);
        } catch (Exception e) {
        }
    }
}.start();
while ((line = br.readLine()) != null) { // line is a previously declared String
    // do whatever with line
}
I create the thread to read the error stream too, because the Python application prints errors when something goes wrong (I can't edit it, it is third-party software), and for some reason the InputStream eventually blocks if I don't also read the ErrorStream.
Is there any way to make while (brError.readLine() != null); have less impact on performance?
Right now I am looking at performance with VisualVM. The Java software usually stays between 0-5% CPU usage, which is pretty nice, but around 60-65% of that usage comes from this loop in this thread, whose only function is to prevent the main loop from blocking. And I need to improve performance as much as possible (this is going into industrial lines, so using resources correctly is really important).
Thank you all.

For easier handling (if you don't need the contents while running), use redirectError(File) in ProcessBuilder.
ProcessBuilder pb = new ProcessBuilder("foo", "-bar");
pb.redirectError(new File("/tmp/errors.log"));
pb.start();
If you're seeing CPU spinning from while (brError.readLine() != null);, you should look at what the error stream is returning. Since readLine() is a blocking call, high CPU usage would mean that the error stream is pumping out a lot of lines.

You're needlessly converting the throw-away stream to characters, which may be a bit costly, especially with UTF-8 (and the platform default encoding is usually the wrong one for this anyway).
Drop the Reader, use BufferedInputStream for the throw-away stream.
However, for external processes, the redirection is surely superior as there's no processing in Java at all.
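If redirection isn't an option, a minimal sketch of the drain-without-decoding idea might look like this (assuming p is the Process from the question; the buffer size is arbitrary):
InputStream err = p.getErrorStream();
new Thread(() -> {
    // Drain raw bytes without decoding them into characters.
    try (BufferedInputStream bis = new BufferedInputStream(err)) {
        byte[] buf = new byte[8192];
        while (bis.read(buf) != -1) {
            // discard; we read only to keep the pipe from filling up
        }
    } catch (IOException e) {
        // process exited or stream closed; nothing left to drain
    }
}).start();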

Related

Does a Java InputStream help or hurt memory usage with large files?

I see some posts on StackOverflow that contradict each other, and I would like to get a definite answer.
I started with the assumption that using a Java InputStream would allow me to stream bytes out of a file, and thus save on memory, as I would not have to consume the whole file at once. And that is exactly what I read here:
Loading all bytes to memory is not a good practice. Consider returning the file and opening an input stream to read it, so your application won't crash when handling large files. – andrucz
Download file to stream instead of File
But then I used an InputStream to read a very large Microsoft Excel file (using the Apache POI library) and I ran into this error:
java.lang.outofmemory exception while reading excel file (xlsx) using POI
I got an OutOfMemory error.
And this crucial bit of advice saved me:
One thing that'll make a small difference is when opening the file to start with. If you have a file, then pass that in! Using an InputStream requires buffering of everything into memory, which eats up space. Since you don't need to do that buffering, don't!
I got rid of the InputStream and just used a bare java.io.File, and then the OutOfMemory error went away.
So using java.io.File is better than an InputStream, when it comes to memory use? That doesn't make any sense.
What is the real answer?
So you are saying that an InputStream would typically help?
It entirely depends on how the application (or library) *uses* the InputStream
With what kind of follow-up code? Could you offer an example of memory-efficient Java?
For example:
// Efficient use of memory
try (InputStream is = new FileInputStream(largeFileName);
     BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
    String line;
    while ((line = br.readLine()) != null) {
        // process one line
    }
}

// Inefficient use of memory
try (InputStream is = new FileInputStream(largeFileName);
     BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = br.readLine()) != null) {
        sb.append(line).append("\n");
    }
    String everything = sb.toString();
    // process the entire string
}

// Very inefficient use of memory
try (InputStream is = new FileInputStream(largeFileName);
     BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
    String everything = "";
    String line;
    while ((line = br.readLine()) != null) {
        everything += line + "\n";
    }
    // process the entire string
}
(Note that there are more efficient ways of reading a file into memory. The above examples are purely to illustrate the principles.)
The general principles here are:
avoid holding the entire file in memory, all at the same time
if you have to hold the entire file in memory, then be careful about how you "accumulate" the characters.
The posts that you linked to above:
The first one is not really about memory efficiency. Rather, it is talking about a limitation of the AWS client-side library. Apparently, the API doesn't provide an easy way to stream an object while reading it. You have to save the object to a file, then open the file as a stream. Whether that is memory efficient or not depends on what the application does with the stream; see above.
The second one is specific to the POI APIs. Apparently, the POI library itself reads the stream contents into memory if you give it a stream. That would be an implementation limitation of that particular library. (But there could be a good reason; e.g. maybe POI needs to be able to "seek" or "rewind" the stream.)
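To make the POI difference concrete, here is a sketch using Apache POI's WorkbookFactory (the file name is hypothetical): handing POI a File lets it work from disk, while handing it an InputStream forces it to buffer the entire contents in memory first.
// Backed by the file on disk; POI can seek/re-read as needed.
try (Workbook wb = WorkbookFactory.create(new File("large.xlsx"))) {
    // process the workbook
}

// Forces POI to buffer the whole stream contents in memory before parsing.
try (Workbook wb = WorkbookFactory.create(new FileInputStream("large.xlsx"))) {
    // process the workbook
}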

Error stream blocking when running external command with Java

Working on SEAndroid, I call Setools commands from my Java application.
It works perfectly with small SEAndroid policy and now I need to test my tool with real
SEAndroid policy. But unfortunately, I face a problem with an error stream.
Here my code I used to call external commands :
public static BufferedReader runCommand(final String[] args)
        throws IOException {
    BufferedReader stdInput = null;
    BufferedReader stdError = null;
    try {
        Process p = Runtime.getRuntime().exec(args);
        stdInput = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
        stdError = new BufferedReader(
                new InputStreamReader(p.getErrorStream()));
        // read any errors from the attempted command
        String s = null;
        StringBuilder err = new StringBuilder();
        while ((s = stdError.readLine()) != null) {
            err.append(s + "\n");
        }
        if (err.length() != 0) {
            throw new IOException(err.toString());
        }
        return stdInput;
    } finally {
        if (stdError != null) {
            stdError.close();
        }
    }
}
So, as you can see, I call an external command, read the error stream, and throw an exception if there are any errors; otherwise I return the InputStream, so I can parse it later.
With a real SEAndroid policy, the error stream seems to block (even if I read a single char) and I can't parse the result of the command. If I close the error stream without reading anything, the application works fine, but I want to handle errors if any.
If I type the command in a console, it works fine too.
In the first case (with a small SEAndroid policy), the output of the command is small (~350 lines).
In the second case (with a real SEAndroid policy), the output of the command is larger (>1500 lines).
Is it possible that the size of the output stream influences the error stream? The two streams are two distinct resources, aren't they?
Does the fact that I do not read the output stream immediately matter?
I fear that it's not a "programming" problem but more of a system problem...
Any suggestion?
Thanks in advance for your help=)
Edit:
I tried reading the output stream before the error stream, and it works. But I need to check the error stream before performing any parsing on the output stream, so the problem remains.
First, it's probably better to use the newer ProcessBuilder class as opposed to Runtime.exec(). If you want to go a step further, you can even use Apache commons-exec which takes care of stream handling and other things for you.
Next, as you've discovered, process control is a tricky thing in Java and you've run into one of its tricky issues. From the documentation for java's Process class:
The parent process uses these streams to feed input to and get output from the subprocess. Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock.
You need to have something consuming both (Error and Output) streams or you risk deadlock - these should each be read on their own threads. Using something like a StreamGobbler (google it, there are plenty out there) would be a good step, or you can roll your own if you're so inclined. It isn't too hard to get it right but if you're unfamiliar with multithreading you may want to look at someone else's implementation or go the Apache commons-exec route.
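A minimal home-rolled gobbler might look like this (a sketch only; the class name and the StringBuilder sink are illustrative, not any particular library's API):
// Consumes a stream on its own thread so the child process can never
// block on a full pipe buffer.
class StreamGobbler extends Thread {
    private final InputStream in;
    private final StringBuilder sink = new StringBuilder();

    StreamGobbler(InputStream in) {
        this.in = in;
    }

    @Override
    public void run() {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = r.readLine()) != null) {
                sink.append(line).append('\n');
            }
        } catch (IOException e) {
            // stream closed or process died; nothing more to consume
        }
    }

    String contents() throws InterruptedException {
        join(); // wait until the stream is fully drained
        return sink.toString();
    }
}
Used against the question's runCommand, you would start one gobbler per stream before waiting for the process (assuming the enclosing method declares throws IOException, InterruptedException):
Process p = Runtime.getRuntime().exec(args);
StreamGobbler out = new StreamGobbler(p.getInputStream());
StreamGobbler err = new StreamGobbler(p.getErrorStream());
out.start();
err.start();
p.waitFor();
if (err.contents().length() > 0) {
    throw new IOException(err.contents());
}
// out.contents() now holds the command's stdout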
The processing of output is so annoying that I wrote a little library called jproc that deals with the problem of consuming stdout and stderr. It can simply filter strings through external programs like this:
ProcBuilder.filter("x y z", "sed", "s/y/a/")
It also lets you specify a timeout for completion and converts non-zero exit codes into exceptions.

Why does java.lang.Process's readLine() behave differently for reading an InputStream on different boxes with the same OS

I tested this code (below) on several different Linux boxes (4+) and it worked fine. However, on one Linux box I ran into an issue with readLine() hanging for the error InputStream (errorStream). This stream should be empty, so I suspected that box was not writing a line terminator to the errorStream for the error. I changed my code to use read() instead of readLine()... but read() also hung.
I tried retrieving the input InputStream first, and that worked: there were no hangs with readLine()/read() on the error InputStream. I could not do this, though, since I needed to obtain possible errors first. Since this appeared to be a deadlock, I was able to resolve it by having each InputStream read from its own thread. Why did I only see this issue on one box? Is there a kernel setting or some other setting specific to this box that could have caused this?
ProcessBuilder processBuilder = new ProcessBuilder();
try
{
    Process processA = null;
    synchronized (processBuilder)
    {
        processBuilder.command("/bin/sh", "-c", " . /Home/SomeScript.ksh");
        processA = processBuilder.start();
    }
    inputStream = processA.getInputStream();
    reader = new BufferedReader(new InputStreamReader(inputStream));
    errorStream = processA.getErrorStream();
    errorReader = new BufferedReader(new InputStreamReader(errorStream));
    String driverError;
    while ((driverError = errorReader.readLine()) != null)
    {
        //some code
    }
Why did I only see this issue on one box?
Most likely because of something in the script that is being run ... and its interactions with its environment (e.g. files, environment variables, etc)
Is there a kernel setting or some other setting specific to this box that could have caused this?
It is possible but unlikely that it is a kernel setting. It might be "something else". Indeed, it has to be "something" outside of the Java application that is to blame, at least in part.
I suggest you do the following temporarily (at least):
ProcessBuilder processBuilder = new ProcessBuilder();
processBuilder.command("/bin/sh","-c"," . /Home/SomeScript.ksh");
processBuilder.redirectErrorStream(true);
processA = processBuilder.start();
inputStream = processA.getInputStream();
reader = new BufferedReader(new InputStreamReader(inputStream));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
System.out.println("Return code is " + processA.exitValue());
That way you can see what all of the output is.
There should not be a problem if the external process fails to put a newline at the end of the last line. The Java process will see an EOF on the input stream, and the BufferedReader will return what characters it has ... and return null on the next call.
Another possibility is that the external process is blocking because it is trying to read from its standard input.
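If that is the suspicion, one quick check (a sketch; error handling omitted) is to close the child's stdin right after starting it, so any read on its end fails fast instead of blocking:
// The "output stream" of the Process object feeds the child's stdin.
// Closing it makes the child see EOF instead of blocking on a read.
processA.getOutputStream().close();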
UPDATE
The redirectErrorStream also resolved the issue, but I need the error stream separate.
OK, so if it did (reliably) solve the problem, then that (most likely) means you have to read the external process's stdout and stderr streams in parallel. The simple way to do this is to create two threads to read and buffer the two streams separately. For example: Capturing stdout when calling Runtime.exec
(Your problem is due to the fact that pipes have a finite buffering capacity. The external process is most likely alternating between writing stuff to stdout and stderr. If it tries to write to one of the pipes when that pipe is "full", it will block. But if your application is reading all of the other pipe (to EOF) before it reads the blocked pipe, then everything will deadlock. The fact that the external process is stuck in the PIPE_W state is more evidence for this explanation.
One possible reason that you are seeing different behaviour on different systems is that the amount of buffering in a pipe is system dependent. But it could also be due to differences in what the external process is doing; e.g. its inputs.)
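A compact form of the two-thread approach might look like this (a sketch only; it assumes the enclosing method declares throws IOException, InterruptedException):
Process processA = processBuilder.start();

// Collect stderr on a background thread so neither pipe can fill and deadlock.
StringBuilder errBuf = new StringBuilder();
Thread errThread = new Thread(() -> {
    try (BufferedReader er = new BufferedReader(
            new InputStreamReader(processA.getErrorStream()))) {
        String errLine;
        while ((errLine = er.readLine()) != null) {
            errBuf.append(errLine).append('\n');
        }
    } catch (IOException e) {
        // stream closed; stop reading
    }
});
errThread.start();

// Meanwhile, read stdout on this thread.
try (BufferedReader or = new BufferedReader(
        new InputStreamReader(processA.getInputStream()))) {
    String outLine;
    while ((outLine = or.readLine()) != null) {
        // process outLine
    }
}
errThread.join(); // stderr is fully drained at this point
String errors = errBuf.toString();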
You are running OS-specific commands in a script; any one of them could be holding the error output. You could avoid this by discarding the errors, but that is unlikely to be a good idea.
I would check that the versions of the OS are the same and whether there are any significant differences in the commands you run in the script. If this doesn't help, take commands out of the script until it starts working. I assume an empty script doesn't do this.

Read output from external process

I am trying to run a .csh script and read its output into a StringBuffer.
The output sometimes comes back empty, although running the script from a console produces output. The same flow can sometimes return output and sometimes not, although nothing changes in the way the process starts (same script, path, args) and the script isn't changed either.
I'm not getting any exceptions thrown.
What might cause the output not to be read correctly/successfully?
the code segment is
public static String getOutpoutScript(Process p) {
    InputStream outpout = p.getInputStream();
    logger.info("Retrived script output stream");
    BufferedReader buf = new BufferedReader(new InputStreamReader(outpout));
    String line = "";
    StringBuffer write = new StringBuffer();
    try {
        while ((line = buf.readLine()) != null) {
            write.append(line);
        }
    } catch (IOException e) {
        // do something
    }
    return write.toString().trim();
}
Besides the fact that not closing the streams is bad practice, could this or something else in the code prevent output from being read correctly under some circumstances?
thanks,
If you launch it with ProcessBuilder, you can combine the error stream into the output stream. This way, if the program prints to stderr, you'll capture that too. Alternatively, you could just read both. Additionally, you may not want to use readLine(): you could be stuck for a while if the program does not print an end-of-line character at the end.
Maybe you must replace p.getInputStream() with p.getOutputStream()
Besides this, sometimes processes can block waiting on input, so you must read and write asynchronously. One possible solution is to use different threads, e.g. one thread reading, another writing, and one monitoring the process.
If there is an error, it will be written to getErrorStream() by default. If you have a problem, I would ensure you are reading this stream somewhere.
If the buffer for this stream fills, the child process will block, waiting for you to read it.
A simple way around these issues is to use ProcessBuilder.redirectErrorStream(true)
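For example (a sketch; the script path is hypothetical):
ProcessBuilder pb = new ProcessBuilder("/path/to/script.csh");
pb.redirectErrorStream(true); // stderr is merged into stdout
Process p = pb.start();
// p.getInputStream() now carries both streams, so getOutpoutScript(p)
// will also pick up anything the script prints to stderr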

Java - RandomAccessFile (Emulating the Linux tail function)

Java IO implementation of unix/linux "tail -f" has a similar problem, but the solution there is not viable for log files that generate about 50-100 lines per second.
I have an algorithm that emulates the tail functionality in Linux. For example,
File _logFile = new File("/tmp/myFile.txt");
long _filePtr = _logFile.length();
while (true)
{
    long length = _logFile.length();
    if (length < _filePtr)
    {
        // means file was truncated
    }
    else if (length > _filePtr)
    {
        // means something was added to the file
    }
    // we ignore length == _filePtr ... nothing was written to the file
}
My problem is when: "something was added to the file" (referring to the else if() statement).
else if (length > _filePtr)
{
    RandomAccessFile raf = new RandomAccessFile(_logFile, "r");
    raf.seek(_filePtr);
    while ((curLine = raf.readLine()) != null)
        myTextPane.append(curLine);
    _filePtr = raf.getFilePointer();
    raf.close();
}
The program blocks at while ((curLine = raf.readLine())... after 15 seconds of run-time! (Note that the program runs fine for the first 15 seconds.)
It appears that raf.readLine() never returns null; I believe this log file is being written so fast that we get into an "endless cat and mouse" loop.
What's the best way to emulate Linux's tail?
I would think that you would be best served by grabbing a block of bytes based on the file's length, then releasing the file and parsing a ByteArrayInputStream (instead of trying to read directly from the file).
So use RandomAccessFile#read(byte[]), and size the buffer using the returned file length. You won't always show the exact end of the file, but that is to be expected with this sort of polling algorithm.
As an aside, this algorithm is horrible - you are running IO operations in a crazy tight loop - the calls to File#length() will block, but not very much. Expect this routine to bring your app to its knees CPU-wise. I don't necessarily have a better solution for you (well - actually, I do - have the source application write to a stream instead of a file - but I recognize that isn't always feasible).
In addition to the above, you may want to introduce a polling delay (sleep the thread by 100ms each loop - it looks to me like you are displaying to a GUI - a 100ms delay won't hurt anyone, and will greatly improve the performance of the swing operations).
OK - final beef: you are adjusting a Swing component from what (I hope) is code not running on the EDT. Use SwingUtilities.invokeLater() to update your text pane.
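Putting those suggestions together, the polling loop might look something like this (a sketch only; it assumes the questioner's myTextPane with its append(String) method, uses the platform charset, and the enclosing method declares throws IOException, InterruptedException):
File logFile = new File("/tmp/myFile.txt");
long filePtr = logFile.length();
while (true) {
    long length = logFile.length();
    if (length > filePtr) {
        // Grab only the newly appended bytes, then release the file.
        try (RandomAccessFile raf = new RandomAccessFile(logFile, "r")) {
            raf.seek(filePtr);
            byte[] block = new byte[(int) (length - filePtr)];
            raf.readFully(block);
            filePtr = raf.getFilePointer();
            String text = new String(block); // platform charset; simplified
            // Touch Swing components only on the EDT.
            SwingUtilities.invokeLater(() -> myTextPane.append(text));
        }
    } else if (length < filePtr) {
        filePtr = length; // file was truncated; restart from the new end
    }
    Thread.sleep(100); // polling delay keeps CPU usage sane
}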
It appears I have found the problem and created a solution.
Under the else if statement:
while ((curLine = raf.readLine()) != null)
myTextPane.append(curLine);
This was the problem: the append(String) method of myTextPane (which is a class derived from JTextPane) invoked setCaretPosition() on every line appended, which IS BAD!!
That meant setCaretPosition() was being called 50-100 times per second, trying to "scroll down". This caused a blocking overhead in the interface.
A simple solution was to use a StringBuffer and append "curLine" to it until raf.readLine() returned null.
Then append the whole StringBuffer at once and voila ... no more blocking from setCaretPosition()!
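In code, the fix amounts to batching the lines (a sketch, reusing the questioner's raf and myTextPane):
StringBuffer batch = new StringBuffer();
String curLine;
while ((curLine = raf.readLine()) != null) {
    batch.append(curLine).append('\n');
}
// One append means setCaretPosition() fires once per poll,
// not 50-100 times per second.
myTextPane.append(batch.toString());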
Thanks to Kevin for pointing me in the right direction.
You could always exec the tail program:
BufferedReader in = new BufferedReader(new InputStreamReader(
        Runtime.getRuntime().exec("tail -F /tmp/myFile.txt").getInputStream()));
String line;
while ((line = in.readLine()) != null) {
    // process line
}
