Read all standard input into a Java byte array

What's the simplest way in modern Java (using only the standard libraries) to read all of standard input until EOF into a byte array, preferably without having to provide that array oneself? The stdin data is binary stuff and doesn't come from a file.
I.e. something like Ruby's
foo = $stdin.read
The only partial solution I could think of was along the lines of
byte[] buf = new byte[1000000];
int b;
int i = 0;
while (true) {
    b = System.in.read();
    if (b == -1)
        break;
    buf[i++] = (byte) b;
}
byte[] foo = Arrays.copyOfRange(buf, 0, i);
... but that seems bizarrely verbose even for Java, and it uses a fixed-size buffer.

I'd use Guava and its ByteStreams.toByteArray method:
byte[] data = ByteStreams.toByteArray(System.in);
Without using any 3rd party libraries, I'd use a ByteArrayOutputStream and a temporary buffer:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[32 * 1024];
int bytesRead;
while ((bytesRead = System.in.read(buffer)) > 0) {
    baos.write(buffer, 0, bytesRead);
}
byte[] bytes = baos.toByteArray();
... possibly encapsulating that in a method accepting an InputStream, which would then be basically equivalent to ByteStreams.toByteArray anyway...
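For what it's worth, since Java 9 the standard library has this built in: InputStream.readAllBytes() reads until EOF and returns the array, so no third-party dependency or hand-rolled loop is needed:
byte[] data = System.in.readAllBytes();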

If you're reading from a file, Files.readAllBytes is the way to do it.
Otherwise, I'd use a ByteBuffer:
ByteBuffer buf = ByteBuffer.allocate(1000000);
ReadableByteChannel channel = Channels.newChannel(System.in);
while (channel.read(buf) >= 0)
    ;
buf.flip();
byte[] bytes = Arrays.copyOf(buf.array(), buf.limit());
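This still caps the input at whatever you allocate up front. If that matters, here is a sketch (my addition, not part of the original answer) that doubles the buffer whenever it fills:
ReadableByteChannel channel = Channels.newChannel(System.in);
ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
while (channel.read(buf) >= 0) {
    if (!buf.hasRemaining()) {
        // buffer is full: double its capacity and carry the data over
        ByteBuffer bigger = ByteBuffer.allocate(buf.capacity() * 2);
        buf.flip();
        bigger.put(buf);
        buf = bigger;
    }
}
buf.flip();
byte[] bytes = Arrays.copyOf(buf.array(), buf.limit());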

Related

Reading InputStream bytes and writing to ByteArrayOutputStream

I have a code block that reads the mentioned number of bytes from an InputStream and returns a byte[] using ByteArrayOutputStream. When I write that byte[] to a file, the resulting file on the filesystem seems broken. Can anyone help me find the problem in the code block below?
public byte[] readWrite(long bytes, InputStream in) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    int maxReadBufferSize = 8 * 1024; // 8 KB
    long numReads = bytes / maxReadBufferSize;
    long numRemainingRead = bytes % maxReadBufferSize;
    for (int i = 0; i < numReads; i++) {
        byte bufr[] = new byte[maxReadBufferSize];
        int val = in.read(bufr, 0, bufr.length);
        if (val != -1) {
            bos.write(bufr);
        }
    }
    if (numRemainingRead > 0) {
        byte bufr[] = new byte[(int) numRemainingRead];
        int val = in.read(bufr, 0, bufr.length);
        if (val != -1) {
            bos.write(bufr);
        }
    }
    return bos.toByteArray();
}
My understanding of the problem statement
Read the requested number of bytes (bytes) from the given InputStream into a ByteArrayOutputStream.
Finally, return a byte array.
Key observations
A lot of work is done to make sure bytes are read in chunks of 8KB.
Also, the last remaining chunk of odd size is read separately.
A lot of work is also done to make sure we are reading from the correct offset.
My views
Unless we are reading a very large file (>10 MB), I don't see a valid reason for reading in chunks of 8 KB.
Let the Java libraries do all the hard work of maintaining offsets and making sure we don't read outside the limits.
E.g. we don't have to pass an offset: simply call inputStream.read(b) over and over, and up to b.length bytes are read on each call; note that a single read may return fewer bytes than requested, which is why the code below uses readNBytes. Similarly, we can simply write to the outputStream.
Code
public byte[] readWrite(long bytes, InputStream in) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte[] buffer = new byte[(int) bytes];
    // readNBytes (Java 9+) loops internally until 'bytes' bytes arrive or EOF,
    // unlike a bare read(), which may return fewer bytes than requested
    int total = in.readNBytes(buffer, 0, (int) bytes);
    bos.write(buffer, 0, total);
    return bos.toByteArray();
}

Inflate output to file performance improvement

I am following code similar to the below. Looking around at different implementations, it seems most people perform the same operation by copying one byte at a time. Is there possibly a faster way to handle inflating from a file and writing back out to a file?
public static String unzipString(InputStream in) {
    try {
        // readUBits and read(length) come from a custom stream wrapper,
        // not from plain java.io.InputStream
        int length = (int) in.readUBits(16);
        // Add extra byte to array when Inflater is set to true
        byte[] data = in.read(length);
        ByteArrayInputStream bin = new ByteArrayInputStream(data);
        InflaterInputStream inf = new InflaterInputStream(bin);
        FileOutputStream bout = new FileOutputStream(this.file);
        int b;
        while ((b = inf.read()) != -1) {
            bout.write(b);
        }
        bout.close();
    } catch (IOException io) {
        return null;
    }
}
Copying one byte at a time is always going to be a very slow way to process a file. I suggest you use a buffer of, say, 8 KB instead.
try (FileOutputStream fout = new FileOutputStream(this.file)) {
    byte[] bytes = new byte[8192];
    for (int len; (len = in.read(bytes)) != -1;)
        fout.write(bytes, 0, len);
}
BTW, to make it faster you could avoid copying the byte[] in the first place by giving InflaterInputStream an InputStream which wraps in but reads exactly length bytes.
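Such a wrapper might look like the sketch below (the class name BoundedInputStream is mine; only the two read methods the inflater actually uses are overridden):
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Reads at most 'limit' bytes from the underlying stream, then reports EOF.
class BoundedInputStream extends FilterInputStream {
    private long remaining;

    BoundedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1;
        int b = super.read();
        if (b != -1) remaining--;
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (remaining <= 0) return -1;
        int n = super.read(b, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }
}
InflaterInputStream could then read from new BoundedInputStream(in, length) directly, with no intermediate byte[].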

Trim Padding From ByteArrayOutputStream

I'm working with Amazon S3 and would like to upload an InputStream (which requires counting the number of bytes I'm sending).
public static boolean uploadDataTo(String bucketName, String key, String fileName, InputStream stream) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[1];
    try {
        while (stream.read(buffer) != -1) { // copy from stream to buffer
            out.write(buffer); // copy from buffer to byte array
        }
    } catch (Exception e) {
        UtilityFunctionsObject.writeLogException(null, e);
    }
    byte[] result = out.toByteArray(); // we needed all that just for length
    int bytes = result.length;
    IO.close(out);
    InputStream uploadStream = new ByteArrayInputStream(result);
    ....
}
I was told copying one byte at a time is highly inefficient (obviously so for large files). I can't make the buffer any bigger, because the last read would add padding to the ByteArrayOutputStream, which I can't strip out. I could strip it out of result, but how can I do that safely? If I use an 8 KB buffer, can I just strip out the rightmost bytes where buffer[i] == 0? Or is there a better way to do this? Thanks!
Using Java 7 on Windows 7 x64.
You can do something like this:
int read = 0;
while ((read = stream.read(buffer)) != -1) {
    out.write(buffer, 0, read);
}
stream.read() returns the number of bytes that have been written into buffer. You can pass this information to the len parameter of out.write(). So you make sure that you write only the bytes you have read from the stream.
Use Jakarta Commons IOUtils to copy from the input stream to the byte array stream in a single step. It will use an efficient buffer, and not write any excess bytes.
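With Commons IO on the classpath, that one-step copy looks like this (using the stream variable from the question):
byte[] result = IOUtils.toByteArray(stream); // org.apache.commons.io.IOUtils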
If you want efficiency you could process the file as you read it. I would replace uploadStream with stream and remove the rest of the code.
If you need some buffering you can do this
InputStream uploadStream = new BufferedInputStream(stream);
the default buffer size is 8 KB.
If you want the length, use File.length():
long length = new File(fileName).length();

Decoding characters in Java: why is it faster with a reader than using buffers?

I am trying several ways to decode the bytes of a file into characters.
Using java.io.Reader and Channels.newReader(...)
public static void decodeWithReader() throws Exception {
    FileInputStream fis = new FileInputStream(FILE);
    FileChannel channel = fis.getChannel();
    CharsetDecoder decoder = Charset.defaultCharset().newDecoder();
    Reader reader = Channels.newReader(channel, decoder, -1);
    final char[] buffer = new char[4096];
    for (;;) {
        if (-1 == reader.read(buffer)) {
            break;
        }
    }
    fis.close();
}
Using buffers and a decoder manually:
public static void readWithBuffers() throws Exception {
    FileInputStream fis = new FileInputStream(FILE);
    FileChannel channel = fis.getChannel();
    CharsetDecoder decoder = Charset.defaultCharset().newDecoder();
    final long fileLength = channel.size();
    long position = 0;
    final int bufferSize = 1024 * 1024; // 1 MB
    CharBuffer cbuf = CharBuffer.allocate(4096);
    while (position < fileLength) {
        MappedByteBuffer bbuf = channel.map(MapMode.READ_ONLY, position, Math.min(bufferSize, fileLength - position));
        for (;;) {
            CoderResult res = decoder.decode(bbuf, cbuf, false);
            if (CoderResult.OVERFLOW == res) {
                cbuf.clear();
            } else if (CoderResult.UNDERFLOW == res) {
                break;
            }
        }
        position += bbuf.position();
    }
    fis.close();
}
For a 200MB text file, the first approach consistently takes 300ms to complete. The second approach consistently takes 700ms. Do you have any idea why the reader approach is so much faster?
Can it run even faster with another implementation?
The benchmark is performed on Windows 7 with JDK 1.7.0_07.
For comparison, can you try:
public static void readWithBuffersISO_8859_1() throws Exception {
    FileInputStream fis = new FileInputStream(FILE);
    FileChannel channel = fis.getChannel();
    MappedByteBuffer bbuf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    while (bbuf.remaining() > 0) {
        char ch = (char) (bbuf.get() & 0xFF);
    }
    fis.close();
}
This assumes ISO-8859-1 encoding. If you want maximum speed, treating the text like a binary format can help, if it's an option.
As @EJP points out, you are changing a number of things at once, so you need to start with the simplest comparable example and see how much difference each element adds.
Here is a third implementation that does not use mapped buffers. Under the same conditions as before, it runs consistently in 220ms. The default charset on my machine being "windows-1252", if I force the simpler "ISO-8859-1" charset the decoding is even faster (about 150ms).
It looks like the use of native features like mapped buffers actually hurts performance (for this very use case). Also interesting: if I allocate direct buffers instead of heap buffers (look at the commented lines), the performance drops (a run then takes around 400ms).
So far the answer seems to be: to decode characters as fast as possible in Java (provided you can't enforce the use of a single charset), use a decoder manually, write the decode loop with heap buffers, and do not use mapped buffers or even native ones. I have to admit that I don't really know why it is so.
public static void readWithBuffers() throws Exception {
    FileInputStream fis = new FileInputStream(FILE);
    FileChannel channel = fis.getChannel();
    CharsetDecoder decoder = Charset.defaultCharset().newDecoder();
    // CharsetDecoder decoder = Charset.forName("ISO-8859-1").newDecoder();
    ByteBuffer bbuf = ByteBuffer.allocate(4096);
    // ByteBuffer bbuf = ByteBuffer.allocateDirect(4096);
    CharBuffer cbuf = CharBuffer.allocate(4096);
    // CharBuffer cbuf = ByteBuffer.allocateDirect(2 * 4096).asCharBuffer();
    for (;;) {
        if (-1 == channel.read(bbuf)) {
            // EOF: flip the buffer before draining the last undecoded bytes
            bbuf.flip();
            decoder.decode(bbuf, cbuf, true);
            decoder.flush(cbuf);
            break;
        }
        bbuf.flip();
        for (;;) {
            CoderResult res = decoder.decode(bbuf, cbuf, false);
            if (CoderResult.OVERFLOW == res) {
                cbuf.clear(); // cbuf is full; this benchmark just discards the chars
            } else if (CoderResult.UNDERFLOW == res) {
                bbuf.compact(); // keep any partial character bytes for the next read
                break;
            }
        }
    }
    fis.close();
}

Write an InputStream to an HttpServletResponse

I have an InputStream that I want written to an HttpServletResponse.
There's this approach, which takes too long due to the use of byte[]:
InputStream is = getInputStream();
int contentLength = getContentLength();
byte[] data = new byte[contentLength];
is.read(data);
// response here is the HttpServletResponse object
response.setContentLength(contentLength);
response.getOutputStream().write(data);
I was wondering what could possibly be the best way to do it, in terms of speed and efficiency.
Just write it in blocks instead of copying it entirely into Java's memory first. The basic example below writes it in blocks of 10 KB. This way you end up with a consistent memory usage of only 10 KB instead of the complete content length. Also, the end user will start getting parts of the content much sooner.
response.setContentLength(getContentLength());
byte[] buffer = new byte[10240];
try (
    InputStream input = getInputStream();
    OutputStream output = response.getOutputStream();
) {
    for (int length = 0; (length = input.read(buffer)) > 0;) {
        output.write(buffer, 0, length);
    }
}
As creme de la creme with regard to performance, you could use NIO Channels and a directly allocated ByteBuffer. Create the following utility/helper method in some custom utility class, e.g. Utils:
public static long stream(InputStream input, OutputStream output) throws IOException {
    try (
        ReadableByteChannel inputChannel = Channels.newChannel(input);
        WritableByteChannel outputChannel = Channels.newChannel(output);
    ) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(10240);
        long size = 0;
        while (inputChannel.read(buffer) != -1) {
            buffer.flip();
            size += outputChannel.write(buffer);
            buffer.clear();
        }
        return size;
    }
}
Which you then use as below:
response.setContentLength(getContentLength());
Utils.stream(getInputStream(), response.getOutputStream());
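On Java 9 and later the standard library covers this out of the box: InputStream.transferTo(OutputStream) copies the stream for you (with an internal heap buffer rather than a direct one) and returns the number of bytes transferred:
response.setContentLength(getContentLength());
long size = getInputStream().transferTo(response.getOutputStream());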
BufferedInputStream in = new BufferedInputStream(new FileInputStream(file));
BufferedOutputStream out = new BufferedOutputStream(response.getOutputStream());
byte[] buffer = new byte[1024 * 8];
int j;
while ((j = in.read(buffer)) != -1) {
    out.write(buffer, 0, j);
}
out.flush();
I think that is very close to the best way, but I would suggest the following change: use a fixed-size buffer (say 20 KB) and do the read/write in a loop.
For the loop, do something like:
byte[] buffer = new byte[20 * 1024];
OutputStream outputStream = response.getOutputStream();
while (true) {
    int readSize = is.read(buffer);
    if (readSize == -1)
        break;
    outputStream.write(buffer, 0, readSize);
}
PS: Your program will not always work as-is, because read doesn't always fill up the entire array you give it.
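A minimal sketch of how the single-read version from the question could be made reliable, looping until the declared content length has actually arrived (variable names follow the question's code):
byte[] data = new byte[contentLength];
int total = 0, n;
// a single read() may deliver fewer bytes than requested, so accumulate until done or EOF
while (total < contentLength && (n = is.read(data, total, contentLength - total)) != -1) {
    total += n;
}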
