Java idiom for "piping"

Is there a more concise/standard idiom (e.g., a JDK method) for "piping" an input to an output in Java than the following?
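public void pipe(Reader in, Writer out) throws IOException {
    CharBuffer buf = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
    while (in.read(buf) >= 0) {
        buf.flip();       // switch the buffer from filling to draining
        out.append(buf);  // CharBuffer is a CharSequence
        buf.clear();      // make the buffer ready for the next read
    }
}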
[EDIT] Please note the Reader and Writer are given. The correct answer will demonstrate how to take in and out and form a pipe (preferably with no more than 1 or 2 method calls). I will accept answers where in and out are an InputStream and an OutputStream (preferably with a conversion from/to Reader/Writer). I will not accept answers where either in or out is a subclass of Reader/InputStream or Writer/OutputStream.

IOUtils from the Apache Commons IO project has a number of utility methods that do exactly what you need.
IOUtils.copy(in, out) will perform a buffered copy of all input to the output. If there is more than one spot in your codebase that requires Stream or Reader/Writer handling, using IOUtils could be a good idea.
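For readers who would rather not add a dependency, the same one-call copy can be sketched with the JDK alone, assuming Java 10 or later for Reader.transferTo (InputStream.transferTo has existed since Java 9). The class name PipeSketch is made up for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper class, named here only for illustration.
final class PipeSketch {
    /** Copies every character from in to out and returns how many were copied. */
    static long pipe(Reader in, Writer out) throws IOException {
        // Reader.transferTo(Writer) reads until end of stream.
        // It closes neither the reader nor the writer.
        return in.transferTo(out);
    }
}
```

Neither side is closed by the call, so the caller stays responsible for flushing and closing out.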

Take a look at java.io.PipedInputStream and PipedOutputStream, or PipedReader/PipedWriter from the same package.
From the Documentation of PipedInputStream:
A piped input stream should be connected to a piped output stream; the piped input stream then provides whatever data bytes are written to the piped output stream. Typically, data is read from a PipedInputStream object by one thread and data is written to the corresponding PipedOutputStream by some other thread. Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread. The piped input stream contains a buffer, decoupling read operations from write operations, within limits. A pipe is said to be broken if a thread that was providing data bytes to the connected piped output stream is no longer alive.
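As a rough sketch of the two-thread usage the documentation describes, the following connects a PipedWriter to a PipedReader, writes from a producer thread, and reads on the calling thread. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the JDK.
final class PipedSketch {
    /** Sends message through a pipe from a second thread and returns what was read. */
    static String roundTrip(String message) throws IOException, InterruptedException {
        PipedWriter out = new PipedWriter();
        PipedReader in = new PipedReader(out); // connect the two ends

        Thread producer = new Thread(() -> {
            // Closing the writer signals end of stream to the reader.
            try (PipedWriter w = out) {
                w.write(message);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        StringBuilder received = new StringBuilder();
        try (PipedReader r = in) {
            int c;
            while ((c = r.read()) != -1) { // blocks until data or end of stream
                received.append((char) c);
            }
        }
        producer.join();
        return received.toString();
    }
}
```

Note that this connects two ends you create yourself; it does not by itself copy from an arbitrary given Reader to a given Writer.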

The only optimization available is through FileChannel in the NIO API, via its transfer methods for reads and writes (transferFrom and transferTo). The JVM can optimize such a call to move the data from a file to a destination Channel without first having to copy the data into user space. See this article for details.
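A minimal sketch of a file copy through FileChannel.transferTo, assuming both ends are files identified by Path. The class name and the choice of open options are illustrative; whether the transfer is actually optimized depends on the operating system.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class for illustration.
final class ChannelCopySketch {
    /** Copies source to target through FileChannel.transferTo; returns bytes copied. */
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel src = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = src.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += src.transferTo(position, size - position, dst);
            }
            return position;
        }
    }
}
```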

Related

Difference between methods to read a byte from TCP server?

I'm trying to read information sent by a client on Android using the TCP protocol. On my server I have this code:
InputStream input = clienteSocket.getInputStream();
int c = input.read();
c will contain the ASCII code that the client sent.
I can also get this by writing:
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
I would like to know what is the difference between both methods.

You're comparing apples and oranges here.
Your first example reads one byte from the stream, unbuffered, and returns the value of that byte. (Adding 'ASCII number' to that adds no actual information.)
Your second example sets up a buffered reader, which can read chars from the stream, buffered, but it doesn't actually read anything.
You could set up two further examples:
InputStream is = new BufferedInputStream(socket.getInputStream());
int c = is.read();
This reads a byte, with buffering.
Reader reader = new InputStreamReader(socket.getInputStream());
int c = reader.read();
This reads a char, with a little buffering: not as much as BufferedReader provides.
The realistic choices are between the two buffered versions, for efficiency reasons as outlined by @StephenC, and the choice between them is dictated by whether you want bytes or chars.
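To make the byte/char distinction concrete, here is a small sketch that reads the first unit from the same data both ways. It uses an in-memory ByteArrayInputStream in place of the socket and assumes UTF-8 as the charset; the class name is made up.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; a ByteArrayInputStream stands in for the socket stream.
final class ByteVsCharSketch {
    /** Reads one byte, with buffering. */
    static int firstByte(byte[] data) throws IOException {
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            return in.read();
        }
    }

    /** Reads one char, with buffering, decoding the bytes as UTF-8. */
    static int firstChar(byte[] data) throws IOException {
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            return reader.read();
        }
    }
}
```

For pure ASCII input the two results coincide. For a character such as U+00E9, which UTF-8 encodes as the two bytes 0xC3 0xA9, firstByte returns 0xC3 while firstChar returns 0xE9.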

The buffered approach is better because it (in most cases) reduces the number of syscalls that the JVM needs to make to the operating system. Since syscalls are relatively expensive, buffering generally gives you better performance.
In your specific example:
Each time you call read() on an unbuffered socket input stream, you do a syscall.
The first time you call read() (or another read operation) on a buffered input stream, it reads a number of bytes into an in-memory byte array. On the second, third, etc. calls to read(), the read will typically return a byte out of the in-memory buffer, without making a syscall.
In your example, the only case where using a buffered stream doesn't help would be if you are going to read only one byte from the socket, and then close it.
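The effect can be approximated without a socket. The sketch below wraps an in-memory stream in a hypothetical CountingInputStream that counts how often the underlying source is asked for data, which stands in for the syscall count. It is an illustration of the principle, not a benchmark.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; calls to the source stand in for syscalls.
final class BufferingSketch {
    /** Counts every read request that reaches the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads data one byte at a time; returns how many requests reached the source. */
    static int callsToSource(byte[] data, boolean buffered) throws IOException {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        try (InputStream in = buffered ? new BufferedInputStream(source) : source) {
            while (in.read() != -1) {
                // discard the byte; only the call count matters here
            }
        }
        return source.calls;
    }
}
```

Unbuffered, every single-byte read reaches the source. Buffered, the source is only asked for data when the in-memory buffer needs refilling.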
UPDATE
I didn't notice that you were comparing an unbuffered InputStream with a buffered Reader. As @EJP points out, this is "comparing apples and oranges". The functionality of the two versions is different. One reads bytes and the other reads characters.
(And if you don't understand that distinction ... and why it is an important distinction ... you would be advised to read the Java Tutorial lesson on Basic I/O. Particularly the sections on byte streams, character streams and buffered streams.)

InputStreamReader buffering

In the InputStreamReader class documentation it is declared that:
To enable the efficient conversion of bytes to characters, more bytes
may be read ahead from the underlying stream than are necessary to
satisfy the current read operation.
What does this statement mean?
Does the implementation buffer some input data?
If so, after using an InputStreamReader, the current position in the stream we are reading may not be what we expect...

You have read the documentation correctly, and correctly deduced that you cannot reliably go back to reading the underlying input stream and expect to know where you are in the stream.
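The read-ahead can be observed with a small experiment: read a single character through an InputStreamReader and then ask the underlying stream how many bytes it still holds. This is a sketch with made-up names; how far the reader reads ahead is an implementation detail, so the exact number should not be relied on.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration.
final class ReadAheadSketch {
    /** Reads one char via InputStreamReader; returns bytes left in the underlying stream. */
    static int bytesLeftAfterOneChar(byte[] data) throws IOException {
        ByteArrayInputStream underlying = new ByteArrayInputStream(data);
        Reader reader = new InputStreamReader(underlying, StandardCharsets.US_ASCII);
        reader.read();                 // ask for a single character only
        return underlying.available(); // what the reader has not pulled yet
    }
}
```

On common JDK implementations the result for a short input is smaller than data.length - 1, because the reader has pulled more than the one byte it needed.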

How to reliably read and write to a running external process from Java?

I would like to be able to spawn an external process from Java, and periodically write to its input and read the response as if it were the console. Much of the time, however, when I read the process' output, nothing is available. Is there a good practice to do this sort of thing (even avoiding it)?
Here's a stripped down example of what doesn't work:
import org.apache.commons.exec.*;
import java.io.*;
//...
CommandLine cl = CommandLine.parse("/usr/bin/awk {print($1-1)}");
System.out.println(cl.toStrings());
Process proc = new ProcessBuilder(cl.toStrings()).start();
OutputStream os = proc.getOutputStream(); // avoiding *Buffered* classes
InputStream is = proc.getInputStream(); // to lessen buffering complications
os.write(("4" + System.getProperty("line.separator")).getBytes());
os.flush(); // Doesn't seem to flush.
// os.close(); // uncommenting works, but I'd like to keep the process running
System.out.println("reading");
System.out.println(is.read()); // read even one byte? usually is.available() -> 0
Strangely, if I wrap the OutputStream in a BufferedWriter, then I can read from some processes (cat), but not others (awk, grep).

Generally, the approach taken is sound. A few things, though:
InputStream.read() is a blocking method. It waits for input and is not CPU-intensive, so structure your reading code around it rather than polling available().
More than one byte can be read at a time: use read(byte[] buffer, int offset, int len).
Wrapping the input stream in a BufferedReader (via an InputStreamReader) eases access, because it provides readLine(). BufferedInputStream buffers bytes but has no readLine() method.
Don't forget to call Process.waitFor().
Also, make sure that the external process writes to standard output (and not to standard error). Two possibilities here:
use Process.getErrorStream() (and treat it as another InputStream)
redirect standard error to standard output. Appending 2>&1 to the command line only works when the command is run through a shell; with ProcessBuilder, calling redirectErrorStream(true) before start() has the same effect.
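Putting these points together, a write-then-read exchange might be sketched as follows. It assumes a Unix-like system with cat on the PATH, since cat echoes each line back without holding it in a buffer of its own; a child that buffers its output when not attached to a terminal may still produce nothing until it exits. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; assumes "cat" is available on the PATH.
final class ProcessEchoSketch {
    /** Sends one line to a running cat process and returns the line it echoes back. */
    static String echoLine(String line) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder("cat");
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process proc = builder.start();

        try (BufferedWriter toChild = new BufferedWriter(
                     new OutputStreamWriter(proc.getOutputStream(), StandardCharsets.UTF_8));
             BufferedReader fromChild = new BufferedReader(
                     new InputStreamReader(proc.getInputStream(), StandardCharsets.UTF_8))) {
            toChild.write(line);
            toChild.newLine();
            toChild.flush();                     // push the line to the child now
            String reply = fromChild.readLine(); // blocks until the child answers
            toChild.close();                     // end of input lets cat exit
            proc.waitFor();
            return reply;
        }
    }
}
```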

flushing input stream: java

Is there a way to flush the input stream in Java, perhaps prior to closing it? This is in relation to repeatedly invoking the statements below while reading several files on disk:
InputStream fileStream = item.openStream();
fileStream.close();

InputStream cannot be flushed. Why do you want to do this?
OutputStream can be flushed as it implements the interface Flushable. IMHO, flushing only makes sense in scenarios where data is written (to force a write of buffered data). Please see the documentation of Flushable for all implementing classes.
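What flushing does on the output side can be seen with a BufferedOutputStream over an in-memory sink. This sketch assumes the data is smaller than the stream's internal buffer, so nothing reaches the sink until flush() is called. The names are made up.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical demo class for illustration.
final class FlushSketch {
    /** Returns { bytes in the sink before flush(), bytes in the sink after flush() }. */
    static int[] sizesAroundFlush(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink);
        out.write(data);          // small writes stay in the buffer
        int before = sink.size();
        out.flush();              // forces the buffered bytes into the sink
        int after = sink.size();
        return new int[] { before, after };
    }
}
```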

This is an old question but appears to be the only one of its kind, and I think there is a valid use case for "flushing" an input stream.
In Java, if you are using a BufferedReader or BufferedInputStream (which I believe is a common case), "flushing" the stream can be considered to be equivalent to discarding all data currently in the buffer -- i.e., flushing the buffer.
For an example of when this might be useful, consider implementing a REPL for a programming language such as Python or similar.
In this case you might use a BufferedReader on System.in. The user enters a (possibly large) expression and hits enter. At this point, the expression (or some part of it) is stored in the buffer of your Reader.
Now, if a syntax error occurs somewhere within the expression, it will be caught and displayed to the user. However, the remainder of the expression still resides in the input buffer.
When the REPL loop continues, it will begin reading just beyond the point where the syntax error occurred, in the middle of some erroneous expression. This is likely not desirable. Rather, it would be better to simply discard the remainder of the buffer and continue with a "fresh start."
In this sense, we can use the BufferedReader API method ready() to detect remaining buffered characters and read() to discard them. The documentation of ready() reads:
"Tells whether this stream is ready to be read. A buffered character stream is ready if the buffer is not empty, or if the underlying character stream is ready."
Then we can define a method to "flush" a BufferedReader as:
void flush(BufferedReader input) throws IOException
{
    while (input.ready())
        input.read();
}
which simply discards all remaining data until the buffer is empty. We then call flush() after handling a syntax error (by displaying to the user). When the REPL loop resumes you have an empty buffer and thus do not get a pile of meaningless errors caused by the "junk" left in the buffer.

Should I use DataInputStream or BufferedInputStream

I want to read each line from a text file and store them in an ArrayList (each line being one entry in the ArrayList).
So far I understand that a BufferedInputStream writes to the buffer and only does another read once the buffer is empty which minimises or at least reduces the amount of operating system operations.
Am I correct - do I make sense?
If the above is the case, in what situations would anyone want to use DataInputStream? And finally, which of the two should I be using and why, or does it not matter?

Use a normal InputStream (e.g. FileInputStream) wrapped in an InputStreamReader and then wrapped in a BufferedReader - then call readLine on the BufferedReader.
DataInputStream is good for reading primitives, length-prefixed strings etc.
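A minimal sketch of that wrapping, collecting every line into an ArrayList. The charset is an assumption (UTF-8 here), and the class name is made up. For a file, pass a FileInputStream as the argument.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration.
final class ReadLinesSketch {
    /** Reads all lines from in (decoded as UTF-8) into a list, one entry per line. */
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // InputStream -> InputStreamReader (bytes to chars) -> BufferedReader (readLine)
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // readLine() strips the line terminator
            }
        }
        return lines;
    }
}
```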

The two classes are not mutually exclusive - you can use both of them if your needs suit.
As you picked up, BufferedInputStream is about reading in blocks of data rather than a single byte at a time. Note that it has no readLine() method; line-oriented reading is provided by BufferedReader. BufferedInputStream is also used for peeking at data further in the stream and then rolling back to a previous part of the stream if required (see the mark() and reset() methods).
DataInputStream/DataOutputStream provide convenience methods for reading/writing certain data types. For example, they have methods to write/read a UTF String. If you were to do this yourself, you'd have to decide how to determine the end of the String (i.e. with a terminator byte or by specifying the length of the string).
This is different from BufferedReader's readLine() which, as the name suggests, only returns a single line. writeUTF()/readUTF() deal with Strings, and such a string can contain as many lines as it wants.
A buffered stream or reader is suitable for most text processing purposes. If you're doing something special like trying to serialize the fields of a class to a file, you'd want to use DataInput/OutputStream as it offers greater control of the data at a binary level.
Hope that helps.

You can always use both:
final InputStream inputStream = ...;
final BufferedInputStream bufferedInputStream =
new BufferedInputStream(inputStream);
final DataInputStream dataInputStream =
new DataInputStream(bufferedInputStream);
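With that stacking, a round trip of a primitive and a string could look like the sketch below. It writes to an in-memory buffer instead of a file, and the class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class for illustration.
final class DataRoundTripSketch {
    /** Writes an int and a string in binary form, reads them back, returns "number:text". */
    static String roundTrip(int number, String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(bytes))) {
            out.writeInt(number); // four bytes, big-endian
            out.writeUTF(text);   // length-prefixed, so no terminator is needed
        }

        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(bytes.toByteArray())))) {
            // Values must be read back in the order they were written.
            int n = in.readInt();
            String s = in.readUTF();
            return n + ":" + s;
        }
    }
}
```

Because writeUTF stores the length up front, the string may itself contain line breaks without confusing the reader.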

InputStream: Base class for reading bytes from a stream (network or file); provides the ability to read bytes from the stream and detect the end of the stream.
DataInputStream: Reads data directly as primitive data types.
BufferedInputStream: Reads data from the input stream and uses a buffer to speed up access to the data.

You should use DataInputStream in cases when you need to interpret the primitive types in a file written by a language other than Java in a platform-independent manner.

I would advocate using Apache Commons IO (formerly Jakarta Commons IO) and the readLines() method (of whatever variety).
It'll look after buffering/closing etc. and give you back a list of text lines. I'll happily roll my own input stream wrapping with buffering etc., but nine times out of ten the Commons IO stuff works fine and is sufficient/more concise/less error prone etc.

The differences are:
DataInputStream works with binary data, while BufferedReader works with character data.
All primitive data types can be handled by using the corresponding methods in the DataInputStream class, while only string data can be read from the BufferedReader class, and it needs to be parsed into the respective primitives.
DataInputStream is one of the filter streams, while BufferedReader is not.
DataInputStream consumes less memory because it is a binary stream, whereas BufferedReader consumes more memory because it is a character stream.
The data to be handled is limited in DataInputStream, whereas the number of characters to be handled has a wide scope in BufferedReader.
