Setting Size of Java Deflater (and Inflater) Output Byte Buffer - java

I need to deflate one or more byte arrays and later inflate them back to normal size. I've looked over the example given in the API docs, and found some other examples.
After looking these examples over, I have two questions which may be unrelated, but they seem connected as I'm trying to understand this.
In the API documentation example, the output buffer for both the Inflater and Deflater is set at 1024 bytes. The example data is only a short sentence, so that is reasonable. But how would I know how big to make the output buffer? Or will Deflater (and Inflater) adjust the size of the output buffer as needed?
Instead of guessing at the size of a buffer, can I use ByteArrayOutputStream and wrap a DeflaterOutputStream around that? Since ByteArrayOutputStream changes the size of the byte array, it wouldn't be necessary to know the size of the output or guess at it, as it seems one would have to do in the API example.

1. In the API documentation example, the output buffer for both the Inflater and Deflater is set at 1024 bytes. The example data is only a short sentence, so that is reasonable. But how would I know how big to make the output buffer? Or will Deflater (and Inflater) adjust the size of the output buffer as needed?
In streams, buffers are just temporary space used before the data is passed on to the next stream. Changing the buffer size can affect performance but has little to do with the amount of data processed.
2. Instead of guessing at the size of a buffer, can I use ByteArrayOutputStream and wrap a DeflaterOutputStream around that? Since ByteArrayOutputStream changes the size of the byte array, it wouldn't be necessary to know the size of the output or guess at it, as it seems one would have to do in the API example.
You can do that, or you can send it directly to the stream you want the data to go to.
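For instance, here is a minimal sketch of that stream-based approach (assuming sourceData is the byte[] to compress and that holding the whole result in memory is acceptable):

// Compress: the ByteArrayOutputStream grows as needed, so no output-size guess is required.
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
try (DeflaterOutputStream deflaterOut = new DeflaterOutputStream(buffer)) {
    deflaterOut.write(sourceData); // closing the stream finishes the deflater
}
byte[] compressed = buffer.toByteArray();

// Decompress the same way with an InflaterInputStream; the 1024-byte array is only a scratch buffer.
ByteArrayOutputStream restored = new ByteArrayOutputStream();
try (InflaterInputStream inflaterIn = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
    byte[] chunk = new byte[1024];
    int len;
    while ((len = inflaterIn.read(chunk)) > 0) {
        restored.write(chunk, 0, len);
    }
}
byte[] original = restored.toByteArray();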

Here's an example of compressing and decompressing using byte arrays:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;
...
byte[] sourceData; // bytes to compress
String filename;   // where to write the compressed data
{
    // compress the data
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
    deflater.setInput(sourceData);
    deflater.finish();
    // allow some headroom: for incompressible input, deflate output can be slightly larger than the input
    byte[] compressed = new byte[sourceData.length + (sourceData.length >> 10) + 64];
    int compressedSize = deflater.deflate(compressed, 0, compressed.length, Deflater.FULL_FLUSH);
    deflater.end();
    // write the data
    OutputStream stream = new FileOutputStream(filename);
    stream.write(compressed, 0, compressedSize);
    stream.close();
}
{
    byte[] uncompressedData = new byte[1024]; // where to store the data; must be large enough for the original
    // read the data
    InputStream stream = new InflaterInputStream(new FileInputStream(filename));
    // note: read may not fill the array in one call; keep reading until it returns -1
    int len, offset = 0;
    while ((len = stream.read(uncompressedData, offset, uncompressedData.length - offset)) > 0) {
        offset += len;
    }
    stream.close();
}

Related

Reading large file in bytes by chunks with dynamic buffer size

I'm trying to read a large file by chunks and save them in an ArrayList of bytes.
My code, in short, looks like this:
public ArrayList<byte[]> packets = new ArrayList<>();
FileInputStream fis = new FileInputStream("random_text.txt");
byte[] buffer = new byte[512];
while (fis.read(buffer) > 0) {
    packets.add(buffer);
}
fis.close();
But it has a behavior that I don't know how to solve. For example, if a file has only the words "hello world", the chunk does not necessarily need to be 512 bytes long. In fact, I want each chunk to be a maximum of 512 bytes, not for all of them to necessarily have that size.
First of all, what you are doing is probably a bad idea. Storing a file's contents in memory like this is liable to be a waste of heap space ... and can lead to OutOfMemoryError exceptions and / or a requirement for an excessively large heap if you process large (enough) input files.
The second problem is that your code is wrong. You are repeatedly reading the data into the same byte array. Each time you do, it overwrites what was there before. So you will end up with a list containing many references to a single byte array ... containing just the last chunk of data that you read.
To solve the problem that you asked about¹, you will need to copy the chunk that you read to a new (smaller) byte array.
Something like this:
public ArrayList<byte[]> packets = new ArrayList<>();
try (FileInputStream fis = new FileInputStream("random_text.txt")) {
    byte[] buffer = new byte[512];
    int len;
    while ((len = fis.read(buffer)) > 0) {
        packets.add(Arrays.copyOf(buffer, len));
    }
}
Note that this also deals with the second problem I mentioned. It also fixes a potential resource leak by using try-with-resources syntax to manage closing the input stream.
A final issue: If this is really a text file that you are reading, you probably should be using a Reader to read it, and char[] or String to hold it.
But even if you do that there are some awkward edge cases if your text contains Unicode codepoints that are not in code plane 0. For example, emojis. The edge cases will involve code points that are represented as a surrogate pair AND the pair being split on a chunk boundary. Reading and storing the text as lines would avoid that.
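For the text-file case, a minimal sketch of the line-based alternative (assuming the file is UTF-8 encoded):

List<String> lines = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream("random_text.txt"), StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        lines.add(line); // each entry is a whole line, so surrogate pairs are never split
    }
}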
¹ - The issue here is not the "wasted" space. Unless you are reading and caching a large number of small files, any space wastage due to "short" chunks will be unimportant. The important issue is knowing which bytes in each byte[] are actually valid data.

How to improve GZIP performance

Currently I have the problem that this piece of code will be called >500k times. The size of the compressed byte[] is less than 1KB. Every time the method is called, all of the streams have to be created. So I am looking for a way to improve this code.
private byte[] unzip(byte[] data) throws IOException, DataFormatException {
    byte[] unzipData = new byte[4096];
    try (ByteArrayInputStream in = new ByteArrayInputStream(data);
         GZIPInputStream gzipIn = new GZIPInputStream(in);
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        int read = 0;
        while ((read = gzipIn.read(unzipData)) != -1) {
            out.write(unzipData, 0, read);
        }
        return out.toByteArray();
    }
}
I already tried to replace ByteArrayOutputStream with a ByteBuffer, but at the time of creation I don't know how many bytes I need to allocate.
Also, I tried to use Inflater but I stumbled across the problem described here.
Any other ideas on what I could do to improve the performance of this code?
UPDATE#1
Maybe this lib helps someone.
Also there is an open JDK-Bug.
Profile your application, to be sure that you're really spending optimizable time in this function. It doesn't matter how many times you call this function; if it doesn't account for a significant fraction of overall program execution time, then optimization is wasted.
Pre-size the ByteArrayOutputStream. The default buffer size is 32 bytes, and resizes require copying all existing bytes. If you know that your decoded arrays will be around 1k, use new ByteArrayOutputStream(2048).
Rather than reading a byte at a time, read a block at a time, using a pre-allocated byte[]. Beware that you must use the return value from read as an input to write. Better, use something like Jakarta Commons IOUtils.copy() to avoid mistakes.
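For example, the pre-sizing suggestion applied to the method from the question (the 2048 pre-size and the 4096-byte scratch buffer are guesses based on the ~1KB compressed payloads mentioned above):

private byte[] unzip(byte[] data) throws IOException {
    byte[] scratch = new byte[4096]; // block buffer reused within this call
    try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(data));
         ByteArrayOutputStream out = new ByteArrayOutputStream(2048)) { // pre-sized to avoid resizes
        int read;
        while ((read = gzipIn.read(scratch)) != -1) {
            out.write(scratch, 0, read); // write only the bytes actually read
        }
        return out.toByteArray();
    }
}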
I'm not sure if it applies in your case, but I've found incredible speed difference when comparing using the default buffer size of GZIPInputStream vs increasing to 65536.
Example, using a 500M input file:
new GZIPInputStream(new FileInputStream(path.toFile())) // takes 4 mins to process
vs
new GZIPInputStream(new FileInputStream(path.toFile()), 65536) // takes 10s
More details can be found here http://java-performance.info/java-io-bufferedinputstream-and-java-util-zip-gzipinputstream/
Both BufferedInputStream and GZIPInputStream have internal buffers. Default size for the former one is 8192 bytes and for the latter one is 512 bytes. Generally it is worth increasing any of these sizes to at least 65536.
You can use the Inflater class method reset() to reuse the Inflater object without having to recreate it each time. You will have a little bit of added programming to do in order to decode the gzip header and perform the integrity check with the gzip trailer. You would then use Inflater with the nowrap option to decompress the raw deflated data after the gzip header and before the trailer.
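As a rough sketch of that idea (assumptions: each input is a single-member gzip stream with none of the optional header fields FEXTRA, FNAME, FCOMMENT or FHCRC set, and the uncompressed size fits in the 4-byte ISIZE trailer field):

import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

class ReusableGunzip {
    private final Inflater inflater = new Inflater(true); // nowrap: raw deflate data
    private final CRC32 crc = new CRC32();

    byte[] unzip(byte[] gz) throws DataFormatException {
        if (gz.length < 18 || (gz[0] & 0xff) != 0x1f || (gz[1] & 0xff) != 0x8b) {
            throw new DataFormatException("not a gzip stream");
        }
        int headerLen = 10; // fixed gzip header only; optional fields are not handled here
        // ISIZE (last 4 bytes, little-endian) gives the uncompressed size, so the output buffer can be sized exactly
        int isize = (gz[gz.length - 4] & 0xff)
                | (gz[gz.length - 3] & 0xff) << 8
                | (gz[gz.length - 2] & 0xff) << 16
                | (gz[gz.length - 1] & 0xff) << 24;
        byte[] out = new byte[isize];
        inflater.reset(); // reuse the same Inflater instead of creating a new one per call
        inflater.setInput(gz, headerLen, gz.length - headerLen - 8); // skip header and 8-byte trailer
        int n = 0;
        while (!inflater.finished() && n < out.length) {
            int got = inflater.inflate(out, n, out.length - n);
            if (got == 0) break; // truncated input or wrong ISIZE
            n += got;
        }
        // integrity check against the CRC32 stored in the trailer
        crc.reset();
        crc.update(out, 0, n);
        long expected = (gz[gz.length - 8] & 0xffL)
                | (gz[gz.length - 7] & 0xffL) << 8
                | (gz[gz.length - 6] & 0xffL) << 16
                | (gz[gz.length - 5] & 0xffL) << 24;
        if (crc.getValue() != expected) {
            throw new DataFormatException("CRC mismatch");
        }
        return out;
    }
}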

store fixed bytes into byte array from input stream

I'm trying to learn Java and I came across this practice problem in which I have to create a URL extractor. I am able to stream data and print it. However, I'm not really familiar with the buffered reader, so I need help with creating a buffer of 100 bytes, copying 100 bytes of data from the stream into this byte array, processing that part, then taking the next chunk of 100 bytes from the stream, and so on....
The following is my code and any help would greatly be appreciated.
I know that what I want needs to be done inside the while loop. I think I need to create a byte array and then store the data into it. It is the "how" that I'm more interested in.
EDIT: I do not need a code sample for anything because I'm trying to learn. Only a description of how I can do this would suffice. Thanks a lot in advance.
Create a byte array (of the size you want) outside your while-loop (you can re-use it that way, so it's faster).
You can use a BufferedInputStream wrapped around your original InputStream instead of a Reader (as Readers can convert bytes to Strings, but we don't need that).
Then you can use the read(byte[]) method of BufferedInputStream to copy the next series of bytes into the array. You can then process the retrieved bytes the way you want.
See the API documentation as a reference of what read(byte[]) does.
As mentioned in the comments, a Reader (and its subclass BufferedReader) is used to read characters not bytes. You should instead use a BufferedInputStream to read into a byte array of the specified size:
public static void main(String[] args) throws IOException {
    String website = "thecakestory.com";
    Socket client = new Socket(InetAddress.getByName(website), 80);
    PrintWriter pw = new PrintWriter(client.getOutputStream());
    // send a minimal HTTP request; the header section must end with a blank line
    pw.print("GET /index.php HTTP/1.1\r\n");
    pw.print("Host: " + website + "\r\n");
    pw.print("\r\n");
    pw.flush();
    BufferedInputStream input = new BufferedInputStream(client.getInputStream());
    String x;
    int bytesRead;
    byte[] contents = new byte[100];
    // read up to 100 bytes at a time; bytesRead is how many were actually read
    while ((bytesRead = input.read(contents)) != -1) {
        x = new String(contents, 0, bytesRead);
        System.out.print(x);
    }
    client.close();
    pw.close();
}
Some useful links:
For an introduction to Java IO related stuff, see the Java tutorial page http://docs.oracle.com/javase/tutorial/essential/io/. This should be the starting point for learning about streams, readers, etc.
For the documentation of BufferedInputStream and BufferedReader, see their API reference:
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedInputStream.html
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html

What is the best way to write an int array (image data) to file

I have a large int array containing image data in the format ARGB (alpha, R, G and B channels, 1 byte each). I want to save it to file in onPause() to be able to reload it when the app is restarted. What do you think is the best way to do that?
I found the following methods:
Convert the int array to a byte array manually (see here) and then use a FileOutputStream to output the byte array.
Wrap the array into a java.nio.IntBuffer and then write the object to file using java.io.ObjectOutputStream.writeObject().
Write each element one at a time using java.io.ObjectOutputStream.writeInt().
All these methods seem quite wasteful so there is probably another, better way. Possibly even a way to use image compression to reduce the size of the file?
From my point of view, you can also use Android-specific storage:
Use a database/content provider for storing the image data
Use the out Bundle in the onSaveInstanceState method
If you still want to write to a file, you can use the following NIO-based code:
static void writeIntArray(int[] array) throws IOException {
    FileOutputStream fos = new FileOutputStream("out.file");
    try {
        ByteBuffer byteBuff = ByteBuffer.allocate((Integer.SIZE / Byte.SIZE) * array.length);
        IntBuffer intBuff = byteBuff.asIntBuffer();
        intBuff.put(array);
        intBuff.flip();
        FileChannel fc = fos.getChannel();
        fc.write(byteBuff);
    } finally {
        fos.close();
    }
}
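A possible read-back counterpart for the sketch above (assuming the same "out.file" and that the file contains nothing but the ints written by writeIntArray):

static int[] readIntArray() throws IOException {
    try (FileInputStream fis = new FileInputStream("out.file");
         FileChannel fc = fis.getChannel()) {
        ByteBuffer byteBuff = ByteBuffer.allocate((int) fc.size());
        while (byteBuff.hasRemaining() && fc.read(byteBuff) > 0) {
            // keep reading until the buffer is full or EOF
        }
        byteBuff.flip();
        IntBuffer intBuff = byteBuff.asIntBuffer();
        int[] array = new int[intBuff.remaining()];
        intBuff.get(array);
        return array;
    }
}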
None of those. Some of them don't even make sense.
DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
then call dos.writeInt() as many times as necessary, then close dos. The buffer will take away most of the pain.
Or else create an IntBuffer and use FileChannel.write(), but I've never been able to figure out how that works in the absence of an IntBuffer.asByteBuffer() method. Or else create a ByteBuffer, take it as an IntBuffer via asIntBuffer(), put the data in, then adjust the ByteBuffer's length, which is another missing piece of the API, and again use FileChannel.write(ByteBuffer).
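A minimal sketch of the DataOutputStream approach described above; the method names and the length prefix are illustrative additions, not part of the answer:

// Write the ARGB pixel array with a length prefix so it can be read back later.
static void writeArgb(int[] pixels, File file) throws IOException {
    try (DataOutputStream dos = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(file)))) {
        dos.writeInt(pixels.length);  // length prefix
        for (int p : pixels) {
            dos.writeInt(p);          // one 4-byte ARGB value per pixel
        }
    }
}

static int[] readArgb(File file) throws IOException {
    try (DataInputStream dis = new DataInputStream(
            new BufferedInputStream(new FileInputStream(file)))) {
        int[] pixels = new int[dis.readInt()];
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = dis.readInt();
        }
        return pixels;
    }
}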

How to get number of bytes?

How do I obtain the number of bytes before allocating the size of the byte array 'handsize' shown below, as the incoming byte array data are sent in 3 different sizes? Thanks.
BufferedInputStream bais = new BufferedInputStream(requestSocket.getInputStream());
DataInputStream datainput = new DataInputStream(bais);
// need to read the number of bytes here before proceeding
byte[] handsize = new byte[bytesize];
datainput.readFully(handsize);
You could use a ByteArrayOutputStream, then you wouldn't have to worry about it.
ByteArrayOutputStream out = new ByteArrayOutputStream();
//write data to output stream
byte[] bytes = out.toByteArray();
There's no way to know how many bytes of data are yet to be received on a socket--knowing this would be tantamount to clairvoyance. If you're using your own protocol for client/server communication, you could send the number of bytes of data as an integer, before sending the actual bytes themselves. Then the receiving side would know how many bytes to expect.
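For instance, a rough sketch of that length-prefix idea, reusing the names from the question (the sending side and the payload variable are illustrative):

// Sender: write the payload length first, then the payload itself.
DataOutputStream out = new DataOutputStream(requestSocket.getOutputStream());
out.writeInt(payload.length);
out.write(payload);
out.flush();

// Receiver: read the length, then exactly that many bytes.
DataInputStream datainput = new DataInputStream(
        new BufferedInputStream(requestSocket.getInputStream()));
int bytesize = datainput.readInt();
byte[] handsize = new byte[bytesize];
datainput.readFully(handsize);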
As has been pointed out, the problem as stated is impossible. But why do you need to know? Why not store the data in a variable size structure, like an ArrayList, as you read it? Or maybe even process it as you read?
