Inflate output to file performance improvement - java

I am following code similar to below. Looking around at different implementations it seems that most people are performing the same operations by doing the byte copy. Is there possible a faster way to handle inflating from a file and printing back out to file?
public static String unzipString(InputStream in) {
try {
int length = (int) in.readUBits( 16 );
// Add extra byte to array when Inflater is set to true
byte[] data = in.read( length );
ByteArrayInputStream bin = new ByteArrayInputStream(input);
InflaterInputStream in = new InflaterInputStream(bin);
FileoutputStream bout = new FileoutputStream(this.file);
int b;
while ((b = in.read()) != -1) {
bout.write(b);
}
bout.close();
} catch (IOException io) {
return null;
}
}

copying one byte at a time is always going to be a very slow way to process a file. I suggest you use a buffer of say 8 KB instead.
try (FileOutputStream fout = new FileOutputStream(this.file)) {
byte[] bytes = new byte[8192];
for (int len; (len = in.read(bytes)) != -1;)
fout.write(b, 0, len);
}
BTW To make it faster you could avoid copying the byte[] in the first place with InputStream which wraps in but reads exactly length bytes.

Related

Reading InputStream bytes and writing to ByteArrayOutputStream

I have code block to read mentioned number of bytes from an InputStream and return a byte[] using ByteArrayOutputStream. When I'm writing that byte[] array to a file, resultant file on the filesystem seems broken. Can anyone help me find out problem in the below code block.
public byte[] readWrite(long bytes, InputStream in) throws IOException {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int maxReadBufferSize = 8 * 1024; //8KB
long numReads = bytes/maxReadBufferSize;
long numRemainingRead = bytes % maxReadBufferSize;
for(int i=0; i<numReads; i++) {
byte bufr[] = new byte[maxReadBufferSize];
int val = in.read(bufr, 0, bufr.length);
if(val != -1) {
bos.write(bufr);
}
}
if(numRemainingRead > 0) {
byte bufr[] = new byte[(int)numRemainingRead];
int val = in.read(bufr, 0, bufr.length);
if(val != -1) {
bos.write(bufr);
}
}
return bos.toByteArray();
}
My understanding of the problem statement
Read bytes number of bytes from the given InputStream in a ByteArrayOutputStream.
Finally, return a byte array.
Key observations
A lot of work is done to make sure bytes are read in chunks of 8KB.
Also, the last remaining chunk of odd size is read separately.
A lot of work is also done to make sure we are reading from the correct offset.
My views
Unless we are reading a very large file (>10MB) I don't see a valid reason for reading in chunks of 8KB.
Let Java libraries do all the hard work of maintaining offset and making sure we don't read outside limits.
Eg: We don't have to give offset, simply do inputStream.read(b) over and over, the next byte array of size b.length will be read. Similarly, we can simply write to outputStream.
Code
public byte[] readWrite(long bytes, InputStream in) throws IOException {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] buffer = new byte[(int)bytes];
is.read(buffer);
bos.write(buffer);
return bos.toByteArray();
}
References
About InputStreams
Byte Array to Human Readable Format

IllegalArgumentException using Java8 Base64 decoder

I wanted to use Base64.java to encode and decode files. Encode.wrap(InputStream) and decode.wrap(InputStream) worked but runned slowly. So I used following code.
public static void decodeFile(String inputFileName,
String outputFileName)
throws FileNotFoundException, IOException {
Base64.Decoder decoder = Base64.getDecoder();
InputStream in = new FileInputStream(inputFileName);
OutputStream out = new FileOutputStream(outputFileName);
byte[] inBuff = new byte[BUFF_SIZE]; //final int BUFF_SIZE = 1024;
byte[] outBuff = null;
while (in.read(inBuff) > 0) {
outBuff = decoder.decode(inBuff);
out.write(outBuff);
}
out.flush();
out.close();
in.close();
}
However, it always throws
Exception in thread "AWT-EventQueue-0" java.lang.IllegalArgumentException: Input byte array has wrong 4-byte ending unit
at java.util.Base64$Decoder.decode0(Base64.java:704)
at java.util.Base64$Decoder.decode(Base64.java:526)
at Base64Coder.JavaBase64FileCoder.decodeFile(JavaBase64FileCoder.java:69)
...
After I changed final int BUFF_SIZE = 1024; into final int BUFF_SIZE = 3*1024;, the code worked. Since "BUFF_SIZE" is also used to encode file, I believe there were something wrong with the file encoded (1024 % 3 = 1, which means paddings are added in the middle of the file).
Also, as #Jon Skeet and #Tagir Valeev mentioned, I should not ignore the return value from InputStream.read(). So, I modified the code as below.
(However, I have to mention that the code does run much faster than using wrap(). I noticed the speed difference because I had coded and intensively used Base64.encodeFile()/decodeFile() long before jdk8 was released. Now, my buffed jdk8 code runs as fast as my original code. So, I do not know what is going on with wrap()... )
public static void decodeFile(String inputFileName,
String outputFileName)
throws FileNotFoundException, IOException
{
Base64.Decoder decoder = Base64.getDecoder();
InputStream in = new FileInputStream(inputFileName);
OutputStream out = new FileOutputStream(outputFileName);
byte[] inBuff = new byte[BUFF_SIZE];
byte[] outBuff = null;
int bytesRead = 0;
while (true)
{
bytesRead = in.read(inBuff);
if (bytesRead == BUFF_SIZE)
{
outBuff = decoder.decode(inBuff);
}
else if (bytesRead > 0)
{
byte[] tempBuff = new byte[bytesRead];
System.arraycopy(inBuff, 0, tempBuff, 0, bytesRead);
outBuff = decoder.decode(tempBuff);
}
else
{
out.flush();
out.close();
in.close();
return;
}
out.write(outBuff);
}
}
Special thanks to #Jon Skeet and #Tagir Valeev.
I strongly suspect that the problem is that you're ignoring the return value from InputStream.read, other than to check for the end of the stream. So this:
while (in.read(inBuff) > 0) {
// This always decodes the *complete* buffer
outBuff = decoder.decode(inBuff);
out.write(outBuff);
}
should be
int bytesRead;
while ((bytesRead = in.read(inBuff)) > 0) {
outBuff = decoder.decode(inBuff, 0, bytesRead);
out.write(outBuff);
}
I wouldn't expect this to be any faster than using wrap though.
Try to use decode.wrap(new BufferedInputStream(new FileInputStream(inputFileName))). With buffering it should be at least as fast as your manually crafted version.
As for why your code doesn't work: that's because the last chunk is likely to be shorter than 1024 bytes, but you try to decode the whole byte[] array. See the #JonSkeet answer for details.
Well, I changed
"final int BUFF_SIZE = 1024;"
into
"final int BUFF_SIZE = 1024 * 3;"
It worked!
So, I guess probabaly there is something wrong with padding... I mean, when encoding the file, (since 1024 % 3 = 1) there must be paddings. And those might raise problems when decoding...
You should records the number of bytes you have read, beside this,
You should be sure that your buffer size is divisible for 3, cause in Base64, every 3 bytes have four output(64 is 2^6, and 3*8 equals 4*6), by doing this, you can avoid padding problems.( In this way your output will not have the wrong ending of "=")

Read all standard input into a Java byte array

What's the simplest way in modern Java (using only the standard libraries) to read all of standard input until EOF into a byte array, preferably without having to provide that array oneself? The stdin data is binary stuff and doesn't come from a file.
I.e. something like Ruby's
foo = $stdin.read
The only partial solution I could think of was along the lines of
byte[] buf = new byte[1000000];
int b;
int i = 0;
while (true) {
b = System.in.read();
if (b == -1)
break;
buf[i++] = (byte) b;
}
byte[] foo[i] = Arrays.copyOfRange(buf, 0, i);
... but that seems bizarrely verbose even for Java, and uses a fixed size buffer.
I'd use Guava and its ByteStreams.toByteArray method:
byte[] data = ByteStreams.toByteArray(System.in);
Without using any 3rd party libraries, I'd use a ByteArrayOutputStream and a temporary buffer:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[32 * 1024];
int bytesRead;
while ((bytesRead = System.in.read(buffer)) > 0) {
baos.write(buffer, 0, bytesRead);
}
byte[] bytes = baos.toByteArray();
... possibly encapsulating that in a method accepting an InputStream, which would then be basically equivalent to ByteStreams.toByteArray anyway...
If you're reading from a file, Files.readAllBytes is the way to do it.
Otherwise, I'd use a ByteBuffer:
ByteBuffer buf = ByteBuffer.allocate(1000000);
ReadableByteChannel channel = Channels.newChannel(System.in);
while (channel.read(buf) >= 0)
;
buf.flip();
byte[] bytes = Arrays.copyOf(buf.array(), buf.limit());

Write an InputStream to an HttpServletResponse

I have an InputStream that I want written to a HttpServletResponse.
There's this approach, which takes too long due to the use of byte[]
InputStream is = getInputStream();
int contentLength = getContentLength();
byte[] data = new byte[contentLength];
is.read(data);
//response here is the HttpServletResponse object
response.setContentLength(contentLength);
response.write(data);
I was wondering what could possibly be the best way to do it, in terms of speed and efficiency.
Just write in blocks instead of copying it entirely into Java's memory first. The below basic example writes it in blocks of 10KB. This way you end up with a consistent memory usage of only 10KB instead of the complete content length. Also the enduser will start getting parts of the content much sooner.
response.setContentLength(getContentLength());
byte[] buffer = new byte[10240];
try (
InputStream input = getInputStream();
OutputStream output = response.getOutputStream();
) {
for (int length = 0; (length = input.read(buffer)) > 0;) {
output.write(buffer, 0, length);
}
}
As creme de la creme with regard to performance, you could use NIO Channels and a directly allocated ByteBuffer. Create the following utility/helper method in some custom utility class, e.g. Utils:
public static long stream(InputStream input, OutputStream output) throws IOException {
try (
ReadableByteChannel inputChannel = Channels.newChannel(input);
WritableByteChannel outputChannel = Channels.newChannel(output);
) {
ByteBuffer buffer = ByteBuffer.allocateDirect(10240);
long size = 0;
while (inputChannel.read(buffer) != -1) {
buffer.flip();
size += outputChannel.write(buffer);
buffer.clear();
}
return size;
}
}
Which you then use as below:
response.setContentLength(getContentLength());
Utils.stream(getInputStream(), response.getOutputStream());
BufferedInputStream in = null;
BufferedOutputStream out = null;
OutputStream os;
os = new BufferedOutputStream(response.getOutputStream());
in = new BufferedInputStream(new FileInputStream(file));
out = new BufferedOutputStream(os);
byte[] buffer = new byte[1024 * 8];
int j = -1;
while ((j = in.read(buffer)) != -1) {
out.write(buffer, 0, j);
}
I think that is very close to the best way, but I would suggest the following change. Use a fixed size buffer(Say 20K) and then do the read/write in a loop.
For the loop do something like
byte[] buffer=new byte[20*1024];
outputStream=response.getOutputStream();
while(true) {
int readSize=is.read(buffer);
if(readSize==-1)
break;
outputStream.write(buffer,0,readSize);
}
ps: Your program will not always work as is, because read don't always fill up the entire array you give it.

Java InputStream reading problem

I have a Java class, where I'm reading data in via an InputStream
byte[] b = null;
try {
b = new byte[in.available()];
in.read(b);
} catch (IOException e) {
e.printStackTrace();
}
It works perfectly when I run my app from the IDE (Eclipse).
But when I export my project and it's packed in a JAR, the read command doesn't read all the data. How could I fix it?
This problem mostly occurs when the InputStream is a File (~10kb).
Thanks!
Usually I prefer using a fixed size buffer when reading from input stream. As evilone pointed out, using available() as buffer size might not be a good idea because, say, if you are reading a remote resource, then you might not know the available bytes in advance. You can read the javadoc of InputStream to get more insight.
Here is the code snippet I usually use for reading input stream:
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead = 0;
while ((bytesRead = in.read(buffer)) >= 0){
for (int i = 0; i < bytesRead; i++){
//Do whatever you need with the bytes here
}
}
The version of read() I'm using here will fill the given buffer as much as possible and
return number of bytes actually read. This means there is chance that your buffer may contain trailing garbage data, so it is very important to use bytes only up to bytesRead.
Note the line (bytesRead = in.read(buffer)) >= 0, there is nothing in the InputStream spec saying that read() cannot read 0 bytes. You may need to handle the case when read() reads 0 bytes as special case depending on your case. For local file I never experienced such case; however, when reading remote resources, I actually seen read() reads 0 bytes constantly resulting the above code into an infinite loop. I solved the infinite loop problem by counting the number of times I read 0 bytes, when the counter exceed a threshold I will throw exception. You may not encounter this problem, but just keep this in mind :)
I probably will stay away from creating new byte array for each read for performance reasons.
read() will return -1 when the InputStream is depleted. There is also a version of read which takes an array, this allows you to do chunked reads. It returns the number of bytes actually read or -1 when at the end of the InputStream. Combine this with a dynamic buffer such as ByteArrayOutputStream to get the following:
InputStream in = ...
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int read;
byte[] input = new byte[4096];
while ( -1 != ( read = in.read( input ) ) ) {
buffer.write( input, 0, read );
}
input = buffer.toByteArray()
This cuts down a lot on the number of methods you have to invoke and allows the ByteArrayOutputStream to grow its internal buffer faster.
File file = new File("/path/to/file");
try {
InputStream is = new FileInputStream(file);
byte[] bytes = IOUtils.toByteArray(is);
System.out.println("Byte array size: " + bytes.length);
} catch (IOException e) {
e.printStackTrace();
}
Below is a snippet of code that downloads a file (*. Png, *. Jpeg, *. Gif, ...) and write it in BufferedOutputStream that represents the HttpServletResponse.
BufferedInputStream inputStream = bo.getBufferedInputStream(imageFile);
try {
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int bytesRead = 0;
byte[] input = new byte[DefaultBufferSizeIndicator.getDefaultBufferSize()];
while (-1 != (bytesRead = inputStream.read(input))) {
buffer.write(input, 0, bytesRead);
}
input = buffer.toByteArray();
response.reset();
response.setBufferSize(DefaultBufferSizeIndicator.getDefaultBufferSize());
response.setContentType(mimeType);
// Here's the secret. Content-Length should equal the number of bytes read.
response.setHeader("Content-Length", String.valueOf(buffer.size()));
response.setHeader("Content-Disposition", "inline; filename=\"" + imageFile.getName() + "\"");
BufferedOutputStream outputStream = new BufferedOutputStream(response.getOutputStream(), DefaultBufferSizeIndicator.getDefaultBufferSize());
try {
outputStream.write(input, 0, buffer.size());
} finally {
ImageBO.close(outputStream);
}
} finally {
ImageBO.close(inputStream);
}
Hope this helps.

Categories