I have an InputStream that I want written to a HttpServletResponse.
There's this approach, which takes too long due to the use of byte[]
InputStream is = getInputStream();
int contentLength = getContentLength();
byte[] data = new byte[contentLength];
is.read(data);
//response here is the HttpServletResponse object
response.setContentLength(contentLength);
response.getOutputStream().write(data);
I was wondering what could possibly be the best way to do it, in terms of speed and efficiency.
Just write it in blocks instead of copying it entirely into Java's memory first. The basic example below writes it in blocks of 10 KB. This way you end up with a consistent memory usage of only 10 KB instead of the complete content length. The end user will also start getting parts of the content much sooner.
response.setContentLength(getContentLength());
byte[] buffer = new byte[10240];
try (
InputStream input = getInputStream();
OutputStream output = response.getOutputStream();
) {
for (int length = 0; (length = input.read(buffer)) > 0;) {
output.write(buffer, 0, length);
}
}
As the crème de la crème with regard to performance, you could use NIO channels and a directly allocated ByteBuffer. Create the following utility/helper method in some custom utility class, e.g. Utils:
public static long stream(InputStream input, OutputStream output) throws IOException {
try (
ReadableByteChannel inputChannel = Channels.newChannel(input);
WritableByteChannel outputChannel = Channels.newChannel(output);
) {
ByteBuffer buffer = ByteBuffer.allocateDirect(10240);
long size = 0;
while (inputChannel.read(buffer) != -1) {
buffer.flip();
size += outputChannel.write(buffer);
buffer.clear();
}
return size;
}
}
Which you then use as below:
response.setContentLength(getContentLength());
Utils.stream(getInputStream(), response.getOutputStream());
// Copy the file to the response in 8 KB chunks through buffered streams.
BufferedInputStream in = new BufferedInputStream(new FileInputStream(file));
BufferedOutputStream out = new BufferedOutputStream(response.getOutputStream());
byte[] buffer = new byte[1024 * 8];
int j;
while ((j = in.read(buffer)) != -1) {
    out.write(buffer, 0, j);
}
out.flush();
in.close();
I think that is very close to the best way, but I would suggest the following change. Use a fixed-size buffer (say 20 KB) and then do the read/write in a loop.
For the loop, do something like:
byte[] buffer = new byte[20 * 1024];
OutputStream outputStream = response.getOutputStream();
while (true) {
    int readSize = is.read(buffer);
    if (readSize == -1)
        break;
    outputStream.write(buffer, 0, readSize);
}
P.S. Your program will not always work as is, because read() doesn't always fill up the entire array you give it.
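A minimal sketch of that fix, reusing the is and contentLength variables from the question:

byte[] data = new byte[contentLength];
int offset = 0;
while (offset < contentLength) {
    int n = is.read(data, offset, contentLength - offset);
    if (n == -1) {
        break; // the stream ended before contentLength bytes arrived
    }
    offset += n; // keep reading until the array is actually full
}

new DataInputStream(is).readFully(data) does essentially the same thing in a single call.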
I am following code similar to below. Looking around at different implementations, it seems that most people perform the same operations by doing the byte copy. Is there possibly a faster way to handle inflating from a file and writing back out to a file?
public String unzipString(InputStream in) {
    try {
        // readUBits() and read(length) come from the custom stream class used here
        int length = (int) in.readUBits(16);
        // Add extra byte to array when Inflater is set to true
        byte[] data = in.read(length);
        ByteArrayInputStream bin = new ByteArrayInputStream(data);
        InflaterInputStream iin = new InflaterInputStream(bin);
        FileOutputStream bout = new FileOutputStream(this.file);
        int b;
        while ((b = iin.read()) != -1) {
            bout.write(b); // copies one byte at a time
        }
        bout.close();
        return null; // the inflated data has been written to the file
    } catch (IOException io) {
        return null;
    }
}
Copying one byte at a time is always going to be a very slow way to process a file. I suggest you use a buffer of, say, 8 KB instead.
try (FileOutputStream fout = new FileOutputStream(this.file)) {
byte[] bytes = new byte[8192];
for (int len; (len = in.read(bytes)) != -1;)
fout.write(bytes, 0, len);
}
BTW, to make it faster you could avoid copying the byte[] in the first place by using an InputStream which wraps in but reads exactly length bytes.
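A minimal sketch of that idea, using a hypothetical BoundedInputStream wrapper (Apache Commons IO ships a similar class of the same name):

import java.io.IOException;
import java.io.InputStream;

// Exposes at most 'length' bytes of the underlying stream, so the
// InflaterInputStream can read from it directly, with no intermediate byte[].
class BoundedInputStream extends InputStream {
    private final InputStream in;
    private long remaining;

    BoundedInputStream(InputStream in, long length) {
        this.in = in;
        this.remaining = length;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1;
        int b = in.read();
        if (b != -1) remaining--;
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (remaining <= 0) return -1;
        int n = in.read(b, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }
}

The decompression would then read through new InflaterInputStream(new BoundedInputStream(in, length)), combined with the buffered copy loop above.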
I wanted to use Base64.java to encode and decode files. Encoder.wrap(OutputStream) and Decoder.wrap(InputStream) worked but ran slowly. So I used the following code.
public static void decodeFile(String inputFileName,
String outputFileName)
throws FileNotFoundException, IOException {
Base64.Decoder decoder = Base64.getDecoder();
InputStream in = new FileInputStream(inputFileName);
OutputStream out = new FileOutputStream(outputFileName);
byte[] inBuff = new byte[BUFF_SIZE]; //final int BUFF_SIZE = 1024;
byte[] outBuff = null;
while (in.read(inBuff) > 0) {
outBuff = decoder.decode(inBuff);
out.write(outBuff);
}
out.flush();
out.close();
in.close();
}
However, it always throws
Exception in thread "AWT-EventQueue-0" java.lang.IllegalArgumentException: Input byte array has wrong 4-byte ending unit
at java.util.Base64$Decoder.decode0(Base64.java:704)
at java.util.Base64$Decoder.decode(Base64.java:526)
at Base64Coder.JavaBase64FileCoder.decodeFile(JavaBase64FileCoder.java:69)
...
After I changed final int BUFF_SIZE = 1024; into final int BUFF_SIZE = 3 * 1024;, the code worked. Since BUFF_SIZE is also used to encode the file, I believe there was something wrong with the file as encoded (1024 % 3 = 1, which means padding is added in the middle of the file).
Also, as #Jon Skeet and #Tagir Valeev mentioned, I should not ignore the return value from InputStream.read(). So, I modified the code as below.
(However, I have to mention that the code does run much faster than using wrap(). I noticed the speed difference because I had coded and intensively used Base64.encodeFile()/decodeFile() long before JDK 8 was released. Now, my updated JDK 8 code runs as fast as my original code. So, I do not know what is going on with wrap()...)
public static void decodeFile(String inputFileName,
String outputFileName)
throws FileNotFoundException, IOException
{
Base64.Decoder decoder = Base64.getDecoder();
InputStream in = new FileInputStream(inputFileName);
OutputStream out = new FileOutputStream(outputFileName);
byte[] inBuff = new byte[BUFF_SIZE];
byte[] outBuff = null;
int bytesRead = 0;
while (true)
{
bytesRead = in.read(inBuff);
if (bytesRead == BUFF_SIZE)
{
outBuff = decoder.decode(inBuff);
}
else if (bytesRead > 0)
{
byte[] tempBuff = new byte[bytesRead];
System.arraycopy(inBuff, 0, tempBuff, 0, bytesRead);
outBuff = decoder.decode(tempBuff);
}
else
{
out.flush();
out.close();
in.close();
return;
}
out.write(outBuff);
}
}
Special thanks to #Jon Skeet and #Tagir Valeev.
I strongly suspect that the problem is that you're ignoring the return value from InputStream.read, other than to check for the end of the stream. So this:
while (in.read(inBuff) > 0) {
// This always decodes the *complete* buffer
outBuff = decoder.decode(inBuff);
out.write(outBuff);
}
should be
int bytesRead;
while ((bytesRead = in.read(inBuff)) > 0) {
    // Decode only the bytes actually read. java.util.Base64.Decoder has no
    // (byte[], offset, length) overload, so copy the filled slice first.
    outBuff = decoder.decode(Arrays.copyOf(inBuff, bytesRead));
    out.write(outBuff);
}
I wouldn't expect this to be any faster than using wrap though.
Try to use decode.wrap(new BufferedInputStream(new FileInputStream(inputFileName))). With buffering it should be at least as fast as your manually crafted version.
As for why your code doesn't work: that's because the last chunk is likely to be shorter than 1024 bytes, but you try to decode the whole byte[] array. See the #JonSkeet answer for details.
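For reference, a minimal sketch of a wrap()-based decodeFile (the name decodeFileWithWrap is just illustrative; the parameters match the original method):

public static void decodeFileWithWrap(String inputFileName, String outputFileName)
        throws IOException {
    Base64.Decoder decoder = Base64.getDecoder();
    try (InputStream in = decoder.wrap(new BufferedInputStream(new FileInputStream(inputFileName)));
         OutputStream out = new BufferedOutputStream(new FileOutputStream(outputFileName))) {
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            // bytes coming out of wrap() are already decoded; just copy them through
            out.write(buffer, 0, bytesRead);
        }
    }
}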
Well, I changed
"final int BUFF_SIZE = 1024;"
into
"final int BUFF_SIZE = 1024 * 3;"
It worked!
So, I guess probably there is something wrong with the padding... I mean, when encoding the file (since 1024 % 3 = 1), there must be padding. And that might raise problems when decoding...
You should record the number of bytes you have read. Besides that, you should make sure your buffer size is divisible by 3, because in Base64 every 3 bytes of input produce 4 characters of output (64 is 2^6, and 3*8 equals 4*6). By doing this, you can avoid padding problems (that way your output will not have a stray "=" ending in the middle).
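As a rough sketch of that rule (assuming in and out are the raw input stream and the stream the encoded output is written to), an encode loop could look like:

final int BUFF_SIZE = 3 * 1024; // a multiple of 3, so full chunks encode with no "=" padding
Base64.Encoder encoder = Base64.getEncoder();
byte[] inBuff = new byte[BUFF_SIZE];
int bytesRead;
while ((bytesRead = in.read(inBuff)) > 0) {
    // encode only the bytes actually read; padding can then only appear in the final chunk
    out.write(encoder.encode(Arrays.copyOf(inBuff, bytesRead)));
}

Note that a short read in the middle of the stream could still introduce padding, which is why each read() should ideally fill the whole buffer (as FileInputStream does in practice).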
I have a file with 3236000 bytes and I want to read 2936000 from start and write to an OutputStream
InputStream is = new FileInputStream(file1);
OutputStream os = new FileOutputStream(file2);
AFunctionToCopy(is,os,0,2936000); /* a function or sourcecode to write input stream 0to2936000 bytes */
I can read and write byte by byte, but it's too slow (I think) compared to buffered reading.
How can I copy it?
public static void copyStream(InputStream input, OutputStream output, long start, long end)
        throws IOException
{
    for (long i = 0; i < start; i++) input.read(); // dispose of the unwanted leading bytes
    byte[] buffer = new byte[1024];                // adjust if you want
    long remaining = end - start;                  // number of bytes still to copy
    int bytesRead;
    while (remaining > 0
            && (bytesRead = input.read(buffer, 0, (int) Math.min(buffer.length, remaining))) != -1) {
        output.write(buffer, 0, bytesRead);
        remaining -= bytesRead;
    }
}
should work for you.
If you have access to the Apache Commons library, you can use:
IOUtils.copyLarge(InputStream input, OutputStream output, long inputOffset, long length)
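Assuming commons-io is on the classpath, the call for the numbers in the question would look roughly like:

try (InputStream is = new FileInputStream(file1);
     OutputStream os = new FileOutputStream(file2)) {
    // copy the first 2936000 bytes (offset 0) of file1 into file2
    IOUtils.copyLarge(is, os, 0, 2936000);
}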
What's the simplest way in modern Java (using only the standard libraries) to read all of standard input until EOF into a byte array, preferably without having to provide that array oneself? The stdin data is binary stuff and doesn't come from a file.
I.e. something like Ruby's
foo = $stdin.read
The only partial solution I could think of was along the lines of
byte[] buf = new byte[1000000];
int b;
int i = 0;
while (true) {
b = System.in.read();
if (b == -1)
break;
buf[i++] = (byte) b;
}
byte[] foo = Arrays.copyOfRange(buf, 0, i);
... but that seems bizarrely verbose even for Java, and uses a fixed size buffer.
I'd use Guava and its ByteStreams.toByteArray method:
byte[] data = ByteStreams.toByteArray(System.in);
Without using any 3rd party libraries, I'd use a ByteArrayOutputStream and a temporary buffer:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[32 * 1024];
int bytesRead;
while ((bytesRead = System.in.read(buffer)) > 0) {
baos.write(buffer, 0, bytesRead);
}
byte[] bytes = baos.toByteArray();
... possibly encapsulating that in a method accepting an InputStream, which would then be basically equivalent to ByteStreams.toByteArray anyway...
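A minimal sketch of that encapsulation (the name readFully is just illustrative):

public static byte[] readFully(InputStream in) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buffer = new byte[32 * 1024];
    int bytesRead;
    while ((bytesRead = in.read(buffer)) > 0) {
        baos.write(buffer, 0, bytesRead);
    }
    return baos.toByteArray();
}

which you would then call as byte[] data = readFully(System.in);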
If you're reading from a file, Files.readAllBytes is the way to do it.
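For example (the path here is only a placeholder):

byte[] bytes = Files.readAllBytes(Paths.get("some-file.bin"));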
Otherwise, I'd use a ByteBuffer:
ByteBuffer buf = ByteBuffer.allocate(1000000); // must be large enough for the whole input, or read() will just keep returning 0
ReadableByteChannel channel = Channels.newChannel(System.in);
while (channel.read(buf) >= 0)
;
buf.flip();
byte[] bytes = Arrays.copyOf(buf.array(), buf.limit());
I am trying several ways to decode the bytes of a file into characters.
Using java.io.Reader and Channels.newReader(...)
public static void decodeWithReader() throws Exception {
FileInputStream fis = new FileInputStream(FILE);
FileChannel channel = fis.getChannel();
CharsetDecoder decoder = Charset.defaultCharset().newDecoder();
Reader reader = Channels.newReader(channel, decoder, -1);
final char[] buffer = new char[4096];
for(;;) {
if(-1 == reader.read(buffer)) {
break;
}
}
fis.close();
}
Using buffers and a decoder manually:
public static void readWithBuffers() throws Exception {
FileInputStream fis = new FileInputStream(FILE);
FileChannel channel = fis.getChannel();
CharsetDecoder decoder = Charset.defaultCharset().newDecoder();
final long fileLength = channel.size();
long position = 0;
final int bufferSize = 1024 * 1024; // 1MB
CharBuffer cbuf = CharBuffer.allocate(4096);
while(position < fileLength) {
MappedByteBuffer bbuf = channel.map(MapMode.READ_ONLY, position, Math.min(bufferSize, fileLength - position));
for(;;) {
CoderResult res = decoder.decode(bbuf, cbuf, false);
if(CoderResult.OVERFLOW == res) {
cbuf.clear();
} else if (CoderResult.UNDERFLOW == res) {
break;
}
}
position += bbuf.position();
}
fis.close();
}
For a 200MB text file, the first approach consistently takes 300ms to complete. The second approach consistently takes 700ms. Do you have any idea why the reader approach is so much faster?
Can it run even faster with another implementation?
The benchmark is performed on Windows 7, and JDK7_07.
For comparison, can you try the following?
public static void readWithBuffersISO_8859_1() throws Exception {
FileInputStream fis = new FileInputStream(FILE);
FileChannel channel = fis.getChannel();
MappedByteBuffer bbuf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
while(bbuf.remaining()>0) {
char ch = (char)(bbuf.get() & 0xFF);
}
fis.close();
}
This assumes ISO-8859-1. If you want maximum speed, treating the text like a binary format can help, if it's an option.
As @EJP points out, you are changing a number of things at once, so you need to start with the simplest comparable example and see how much difference each element adds.
Here is a third implementation that does not use mapped buffers. Under the same conditions as before, it runs consistently in 220ms. The default charset on my machine is "windows-1252"; if I force the simpler "ISO-8859-1" charset, the decoding is even faster (about 150ms).
It looks like the use of native features like mapped buffers actually hurts performance (for this very use case). Also interesting: if I allocate direct buffers instead of heap buffers (look at the commented lines), then performance is reduced (a run then takes around 400ms).
So far the answer seems to be: to decode characters as fast as possible in Java (provided you can't enforce the use of a single charset), use a decoder manually, write the decode loop with heap buffers, and do not use mapped buffers or even native ones. I have to admit that I don't really know why this is so.
public static void readWithBuffers() throws Exception {
    FileInputStream fis = new FileInputStream(FILE);
    FileChannel channel = fis.getChannel();
    CharsetDecoder decoder = Charset.defaultCharset().newDecoder();
    // CharsetDecoder decoder = Charset.forName("ISO-8859-1").newDecoder();
    ByteBuffer bbuf = ByteBuffer.allocate(4096);
    // ByteBuffer bbuf = ByteBuffer.allocateDirect(4096);
    CharBuffer cbuf = CharBuffer.allocate(4096);
    // CharBuffer cbuf = ByteBuffer.allocateDirect(2 * 4096).asCharBuffer();
    for (;;) {
        if (-1 == channel.read(bbuf)) {
            bbuf.flip();
            decoder.decode(bbuf, cbuf, true);
            decoder.flush(cbuf);
            break;
        }
        bbuf.flip();
        CoderResult res = decoder.decode(bbuf, cbuf, false);
        if (CoderResult.OVERFLOW == res) {
            cbuf.clear();   // the decoded chars are discarded in this benchmark
        }
        bbuf.compact();     // keep any undecoded bytes for the next read
    }
    fis.close();
}