Byte array lengths with Deflater and Inflater - java

How should we understand the declared length of a byte array?
For instance, in this example the byte array is declared with a length of 100.
What happens if the data that has to be written to the byte array is longer than 100 bytes?
The same question applies to the result variable below. I don't understand how these lengths work, or how to choose a proper length for a byte array when you don't know in advance how big your data will be.
try {
    // Encode a String into bytes
    String inputString = "blahblahblah";
    byte[] input = inputString.getBytes("UTF-8");

    // Compress the bytes
    byte[] output = new byte[100];
    Deflater compresser = new Deflater();
    compresser.setInput(input);
    compresser.finish();
    int compressedDataLength = compresser.deflate(output);
    compresser.end();

    // Decompress the bytes
    Inflater decompresser = new Inflater();
    decompresser.setInput(output, 0, compressedDataLength);
    byte[] result = new byte[100];
    int resultLength = decompresser.inflate(result);
    decompresser.end();

    // Decode the bytes into a String
    String outputString = new String(result, 0, resultLength, "UTF-8");
} catch (java.io.UnsupportedEncodingException ex) {
    // handle
} catch (java.util.zip.DataFormatException ex) {
    // handle
}
Also in this example: the byte array used as input is actually called a buffer; how should we understand that?

When you call compresser.deflate(output), you cannot know the size needed for output unless you know how this method works. But that is not a problem, because output is meant to be a buffer.
So you should call deflate multiple times and append each chunk to something that can grow, such as an OutputStream, like this:
byte[] buffer = new byte[1024];
while (!deflater.finished()) {
    int count = deflater.deflate(buffer);
    outputStream.write(buffer, 0, count);
}
The same approach applies to inflating.
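Putting both directions together, a minimal self-contained round trip might look like this (a sketch: exception handling is omitted, and ByteArrayOutputStream is just one possible growable sink):

import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

byte[] input = "blahblahblah".getBytes("UTF-8");

// Compress: loop until the deflater has consumed all input and flushed everything
Deflater deflater = new Deflater();
deflater.setInput(input);
deflater.finish();
ByteArrayOutputStream compressedStream = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
while (!deflater.finished()) {
    int count = deflater.deflate(buffer);     // fills at most buffer.length bytes
    compressedStream.write(buffer, 0, count); // append only the bytes produced
}
deflater.end();
byte[] compressed = compressedStream.toByteArray();

// Decompress the same way; note inflate() throws the checked DataFormatException
Inflater inflater = new Inflater();
inflater.setInput(compressed);
ByteArrayOutputStream restoredStream = new ByteArrayOutputStream();
while (!inflater.finished()) {
    int count = inflater.inflate(buffer);
    restoredStream.write(buffer, 0, count);
}
inflater.end();
String restored = new String(restoredStream.toByteArray(), "UTF-8");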

By allocating a 100-byte array, the JVM guarantees the caller a buffer large enough to hold 100 bytes (8 bits each). The valid indices are 0 through 99; any attempt to access the array beyond that, e.g. array[100], throws an ArrayIndexOutOfBoundsException.
When the code is written as in your demo, the caller simply assumes the data length never exceeds 100 bytes.
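A short illustration of the bounds:

byte[] array = new byte[100]; // valid indices are 0 .. 99
array[99] = 1;                // fine: last valid index
array[100] = 1;               // throws ArrayIndexOutOfBoundsException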

Related

Base64 Encoded to Decoded File Conversion Problem

I am processing very large files (> 2Gig). Each input file is Base64 encoded, and I am outputting to new files after decoding. Depending on the buffer size (LARGE_BUF) and for a given input file, my input-to-output conversion either works fine, is missing one or more bytes, or throws an exception at the outputStream.write line (IllegalArgumentException: Last unit does not have enough bits). Here is the code snippet (I could not cut and paste, so it may not be perfect):
...
final int LARGE_BUF = 1024;
byte[] inBuf = new byte[LARGE_BUF];
try (InputStream inputStream = new FileInputStream(inFile);
     OutputStream outStream = new FileOutputStream(outFile)) {
    for (int len; (len = inputStream.read(inBuf)) > 0; ) {
        String out = new String(inBuf, 0, len);
        outStream.write(Base64.getMimeDecoder().decode(out.getBytes()));
    }
}
For instance, with my sample input file: if LARGE_BUF is 1024, the output file is 4 bytes too small; if 2*1024, I get the exception mentioned above; if 7*1024, it works correctly. Grateful for any ideas. Thank you.
First, you are converting bytes into a String, then immediately back into bytes. So, remove the use of String entirely.
Second, base64 encoding turns each sequence of three bytes into four bytes, so when decoding you need four bytes to properly reconstruct three bytes of original data. It is not safe to decode each arbitrarily read chunk of bytes on its own, because a chunk may or may not have a length that is an exact multiple of four.
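To make the boundary problem concrete, here is a small sketch (the input string and split position are arbitrary). A chunk whose length modulo 4 is 1 triggers exactly the reported exception; a chunk whose length modulo 4 is 2 or 3 decodes without complaint, but the next chunk then starts in the middle of a four-character group, which is where the silently missing bytes come from.

import java.util.Base64;

String encoded = Base64.getEncoder()
        .encodeToString("hello world".getBytes("UTF-8")); // "aGVsbG8gd29ybGQ="
// length % 4 == 1 leaves a single dangling character in the last unit:
Base64.getDecoder().decode(encoded.substring(0, 5));
// throws IllegalArgumentException (last unit does not have enough bits)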
Finally, Base64.Decoder has a wrap(InputStream) method which makes this considerably easier:
try (InputStream inputStream = Base64.getDecoder().wrap(
        new BufferedInputStream(
            Files.newInputStream(Paths.get(inFile))))) {
    Files.copy(inputStream, Paths.get(outFile));
}

How to read Serial data the same way in Processing as C#?

In C#, I use the SerialPort Read function like so:
byte[] buffer = new byte[100000];
int bytesRead = serial.Read(buffer, 0, 100000);
In Processing, I use readBytes like so:
byte[] buffer = new byte[100000];
int bytesRead = serial.readBytes(buffer);
In Processing, I get incorrect byte values when I loop over the buffer array filled by readBytes. When I use the plain read function instead, I get the proper values, but then I can't grab the data into a byte array. What am I doing wrong in the Processing version that leads to the wrong values in the buffer array?
I print out the data the same way in both versions:
for (int i = 0; i < bytesRead; i++) {
    println(buffer[i]);
}
(C# correct output and Processing incorrect output: listings omitted.)
Java bytes are signed, so any value over 127 overflows into the negative range.
A quick solution is to do
int anUnsignedByte = (int) aSignedByte & 0xff;
to each of your bytes.
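For example, with a raw byte value of 200 coming off the serial port:

byte aSignedByte = (byte) 200;            // Java stores this as -56
println(aSignedByte);                     // prints -56
int anUnsignedByte = aSignedByte & 0xff;  // mask off the sign extension
println(anUnsignedByte);                  // prints 200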

Inserting a string into a bytebuffer

I am trying to write a bunch of integers and a string into a byte buffer. Later this byte array will be written to the hard drive. Everything seems fine, except that when I write the string in the loop, only the last character ends up written. The parsing of the string appears correct, as I have checked it.
It seems to be related to the way I use the bbuf.put statement. Do I need to flush afterwards? And why does putInt work fine but not put?
// write the PCB from memory to the file system
private static void _tfs_write_pcb()
{
    int c;
    byte[] bytes = new byte[11];
    // get the bytes from the volume name
    try {
        bytes = constants.filename.getBytes("UTF-8"); // convert to byte format to pass to the function
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    ByteBuffer bbuf = ByteBuffer.allocate(bl_size);
    bbuf = bbuf.putInt(rt_dir_start);
    bbuf = bbuf.putInt(first_free_data_bl);
    bbuf = bbuf.putInt(num_bl_fat);
    bbuf = bbuf.putInt(bl_size);
    bbuf = bbuf.putInt(max_rt_entries);
    bbuf = bbuf.putInt(ft_copies);
    for (c = 0; c < vl_name.length(); c++) {
        System.out.println((char) bytes[c]);
        bbuf = bbuf.put(bytes[c]);
    }
    _tfs_write_block(1, bbuf.array());
}
ByteBuffer has a method for put'ting an array of byte. Is there a reason to put them one at a time? I note that put(byte) is abstract as well.
So the for loop simplifies to:
bbuf = bbuf.put(bytes, 0, bytes.length);
http://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffer.html#put-byte:A-
EDIT: The Javadoc specifies that put(byte[]) always begins at index 0 of the source array, so use the form put(byte[], offset, length) when you need control over the range.
public final ByteBuffer put(byte[] src)
Relative bulk put method (optional operation).
This method transfers the entire content of the given source byte array
into this buffer. An invocation of this method of the form dst.put(a)
behaves in exactly the same way as the invocation
dst.put(a, 0, a.length)
Of course, it really should not matter HOW you insert the String bytes. I am just suggesting discovery experimentation.
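For illustration, a stripped-down sketch of the bulk put (the field names and sizes from the question are replaced with placeholders, so treat the values as assumptions):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

byte[] bytes = "VOLUME01".getBytes(StandardCharsets.UTF_8); // placeholder volume name
ByteBuffer bbuf = ByteBuffer.allocate(256);                 // placeholder block size
bbuf.putInt(42);                    // the ints go in exactly as before
bbuf.put(bytes);                    // bulk put: the whole array in one call
// bbuf.put(bytes, 0, bytes.length) is the equivalent explicit form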

Base64 encode file by chunks

I want to split a file into multiple chunks (in this case, trying lengths of 300) and base64 encode it, since loading the entire file into memory gives a negative array size exception when base64 encoding it. I tried the following code:
int offset = 0;
bis = new BufferedInputStream(new FileInputStream(f));
while (offset + 300 <= f.length()) {
    byte[] temp = new byte[300];
    bis.skip(offset);
    bis.read(temp, 0, 300);
    offset += 300;
    System.out.println(Base64.encode(temp));
}
if (offset < f.length()) {
    byte[] temp = new byte[(int) f.length() - offset];
    bis.skip(offset);
    bis.read(temp, 0, temp.length);
    System.out.println(Base64.encode(temp));
}
At first it appears to work; however, at some point it switches to printing nothing but "AAAAAAAAA", filling the entire console, and the file is corrupted when decoded. What could be causing this?
skip() "Skips over and discards n bytes of data from the input stream", and read() returns "the number of bytes read".
So you read some bytes, skip some bytes, read some more, skip, and so on, eventually reaching EOF, at which point read() returns -1. You ignore that and use the content of temp, which now contains all zeros, and those are encoded as all A's.
Your code should be:
try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
    int len;
    byte[] temp = new byte[300];
    while ((len = in.read(temp)) > 0)
        System.out.println(Base64.encode(temp, 0, len));
}
This code reuses the single buffer allocated before the loop, so it will also cause much less garbage collection than your code.
If your Base64.encode doesn't have a 3-parameter version, do this:
try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
    int len;
    byte[] temp = new byte[300];
    while ((len = in.read(temp)) > 0) {
        byte[] data;
        if (len == temp.length)
            data = temp;
        else {
            data = new byte[len];
            System.arraycopy(temp, 0, data, 0, len);
        }
        System.out.println(Base64.encode(data));
    }
}
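Incidentally, the copy branch can be shortened with java.util.Arrays.copyOf, which allocates and copies in one call:

byte[] data = (len == temp.length) ? temp : Arrays.copyOf(temp, len);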
Be sure to use a buffer size that is a multiple of 3 for encoding and a multiple of 4 for decoding when using chunks of data.
300 satisfies both, so that size is already fine; this is just a note for anyone trying different buffer sizes.
Keep in mind that reading from a stream into a buffer can, in some circumstances, leave the buffer only partially filled even though the end of the stream has not been reached, for example when reading from a network stream and a timeout occurs.
You can account for that, but doing so would lead to considerably more complex code that would no longer be as instructive.
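As a side note: if java.util.Base64 (Java 8+) is an option instead of the Base64 class used above, its streaming encoder avoids the chunk bookkeeping entirely, mirroring the wrap-based decoding shown in an earlier answer. A minimal sketch, assuming f is the input file and out.b64 the destination:

import java.io.*;
import java.util.Base64;

try (OutputStream encoded = Base64.getEncoder().wrap(new FileOutputStream("out.b64"));
     InputStream in = new BufferedInputStream(new FileInputStream(f))) {
    byte[] temp = new byte[300];
    int len;
    while ((len = in.read(temp)) > 0)
        encoded.write(temp, 0, len); // the wrapper handles group alignment and padding
}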

Zlib compression is too big in size

I am completely new to Java; I have decided to learn it by doing a small project. I need to compress some string using zlib and write it to a file. However, the file turns out to be too big. Here is a code example:
String input = "yasar\0yasar"; // test input; input will have null characters in it
byte[] compressed = new byte[100]; // holds the compressed content
Deflater compresser = new Deflater();
compresser.setInput(input.getBytes());
compresser.finish();
compresser.deflate(compressed);
File test_file = new File(System.getProperty("user.dir"), "test_file");
try {
    if (!test_file.exists()) {
        test_file.createNewFile();
    }
    try (FileOutputStream fos = new FileOutputStream(test_file)) {
        fos.write(compressed);
    }
} catch (IOException e) {
    e.printStackTrace();
}
This writes a 1-kilobyte file, while the file should be at most 11 bytes (because the content is 11 bytes here). I think the problem is the way I initialize the byte array compressed with 100 bytes, but I don't know in advance how big the compressed data will be. What am I doing wrong here? How can I fix it?
If you don't want to write the whole array, but only the part of it that was filled by the Deflater, use OutputStream#write(byte[] array, int offset, int length).
Roughly like this:
String input = "yasar\0yasar"; // test input; input will have null characters in it
byte[] compressed = new byte[100]; // holds the compressed content
Deflater compresser = new Deflater();
compresser.setInput(input.getBytes());
compresser.finish();
int length = compresser.deflate(compressed);
File test_file = new File(System.getProperty("user.dir"), "test_file");
try {
    if (!test_file.exists()) {
        test_file.createNewFile();
    }
    try (FileOutputStream fos = new FileOutputStream(test_file)) {
        fos.write(compressed, 0, length); // write bytes 0 .. length-1 only
    }
} catch (IOException e) {
    e.printStackTrace();
}
You will probably still see about 1 kB in Windows, because the size shown there is either rounded (you wrote 100 bytes before) or refers to the size on the filesystem, which is at least one block (4 kB, IIRC). Right-click the file and check the size in its properties; that shows the actual number of bytes.
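To check the real size programmatically instead of in the file manager:

System.out.println(test_file.length() + " bytes"); // actual file length, not the on-disk block size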
If you don't know the size in advance, don't use Deflater directly; use a DeflaterOutputStream, which writes data of any length in compressed form.
try (OutputStream out = new DeflaterOutputStream(new FileOutputStream(test_file))) {
    out.write("hello!".getBytes());
}
The example above uses the default deflate settings, but you can pass a configured Deflater to the DeflaterOutputStream constructor to change the behavior.
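The reverse direction works the same way with an InflaterInputStream; a minimal sketch for reading the file back (buffer size chosen arbitrarily):

try (InputStream in = new InflaterInputStream(new FileInputStream(test_file));
     ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
    byte[] buf = new byte[512];
    int n;
    while ((n = in.read(buf)) > 0)
        bos.write(buf, 0, n);
    String restored = new String(bos.toByteArray(), "UTF-8");
}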
You write all 100 bytes of the compressed array to the file, but you should write only the bytes actually produced by the deflater:
int compressedsize = compresser.deflate(compressed);
fos.write(compressed, 0, compressedsize);
