Compression on Java NIO direct buffers - java

The gzip input/output streams don't operate on Java NIO direct buffers.
Is there any compression algorithm implementation out there that operates directly on direct buffers?
This way there would be no overhead of copying a direct buffer to a Java byte array for compression.

I don't mean to detract from your question, but is this really a good optimization point in your program? Have you verified with a profiler that you indeed have a problem? Your question as stated implies you have not done any research, but are merely guessing that you will have a performance or memory problem by allocating a byte[]. Since all the answers in this thread are likely to be hacks of some sort, you should really verify that you actually have a problem before fixing it.
Back to the question: if you want to compress the data "in place" in a ByteBuffer, the answer is no, there is no capability to do that built into Java.
If you allocated your buffer like the following:
byte[] bytes = getMyData();
ByteBuffer buf = ByteBuffer.wrap(bytes);
You can filter your byte[] through a ByteBufferInputStream as the previous answer suggested.
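Alternatively, a minimal sketch of the same idea (gzipHeapBuffer is just an illustrative name; it assumes an array-backed heap buffer, since hasArray() is false for direct buffers) hands the backing array straight to the standard java.util.zip streams:
static ByteBuffer gzipHeapBuffer(ByteBuffer buf) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
        // array()/arrayOffset() are only valid when buf.hasArray() is true
        gz.write(buf.array(), buf.arrayOffset() + buf.position(), buf.remaining());
    }
    return ByteBuffer.wrap(baos.toByteArray());
}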

Wow old question, but stumbled upon this today.
Probably some libs like zip4j can handle this, but you can get the job done with no external dependencies since Java 11:
If you are interested only in compressing data, you can just do:
void compress(ByteBuffer src, ByteBuffer dst) {
    var def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
        def.setInput(src);
        def.finish();
        def.deflate(dst, Deflater.SYNC_FLUSH);
        if (src.hasRemaining()) {
            throw new RuntimeException("dst too small");
        }
    } finally {
        def.end();
    }
}
Both src and dst will change positions, so you might have to flip them after compress returns.
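For example (buffer sizes here are illustrative guesses):
ByteBuffer src = ByteBuffer.allocateDirect(64 * 1024);
// ... fill src with data, then make it readable:
src.flip();
// leave some headroom: deflate can slightly expand incompressible input
ByteBuffer dst = ByteBuffer.allocateDirect(src.remaining() + 64);
compress(src, dst);
dst.flip(); // dst can now be read or written to a channel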
In order to recover compressed data:
void decompress(ByteBuffer src, ByteBuffer dst) throws DataFormatException {
    var inf = new Inflater(true);
    try {
        inf.setInput(src);
        inf.inflate(dst);
        if (src.hasRemaining()) {
            throw new RuntimeException("dst too small");
        }
    } finally {
        inf.end();
    }
}
Note that both methods expect (de)compression to happen in a single pass; however, we could use slightly modified versions in order to stream it:
void compress(ByteBuffer src, ByteBuffer dst, Consumer<ByteBuffer> sink) {
    var def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
        def.setInput(src);
        def.finish();
        int cmp;
        do {
            cmp = def.deflate(dst, Deflater.SYNC_FLUSH);
            if (cmp > 0) {
                sink.accept(dst.flip());
                dst.clear();
            }
        } while (cmp > 0);
    } finally {
        def.end();
    }
}
void decompress(ByteBuffer src, ByteBuffer dst, Consumer<ByteBuffer> sink) throws DataFormatException {
    var inf = new Inflater(true);
    try {
        inf.setInput(src);
        int dec;
        do {
            dec = inf.inflate(dst);
            if (dec > 0) {
                sink.accept(dst.flip());
                dst.clear();
            }
        } while (dec > 0);
    } finally {
        inf.end();
    }
}
Example:
void compressLargeFile() throws IOException {
    try (var in = FileChannel.open(Paths.get("large"));
         var out = FileChannel.open(Paths.get("large.zip"),
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
        var temp = ByteBuffer.allocateDirect(1024 * 1024);
        var start = 0L;
        var rem = in.size();
        while (rem > 0) {
            var mapped = Math.min(16 * 1024 * 1024, rem);
            var src = in.map(MapMode.READ_ONLY, start, mapped);
            compress(src, temp, (bb) -> {
                try {
                    out.write(bb);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            // advance the mapping window
            start += mapped;
            rem -= mapped;
        }
    }
}
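A counterpart sketch for reading the data back with the streaming decompress above ("large.out" is an illustrative name; this assumes the compressed file holds a single deflate stream, i.e. the input fit in one mapped chunk, and that the file fits in one mapping):
void decompressLargeFile() throws IOException, DataFormatException {
    try (var in = FileChannel.open(Paths.get("large.zip"));
         var out = FileChannel.open(Paths.get("large.out"),
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
        var src = in.map(MapMode.READ_ONLY, 0, in.size());
        var dst = ByteBuffer.allocateDirect(1024 * 1024);
        decompress(src, dst, bb -> {
            try {
                out.write(bb);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}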
If you want fully gzip-compliant data:
void zip(ByteBuffer src, ByteBuffer dst) {
    var u = src.remaining();
    var crc = new CRC32();
    crc.update(src.duplicate());
    writeHeader(dst);
    compress(src, dst);
    writeTrailer(crc, u, dst);
}
Where:
void writeHeader(ByteBuffer dst) {
    var header = new byte[] { (byte) 0x8b1f, (byte) (0x8b1f >> 8), Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, 0 };
    dst.put(header);
}
And:
void writeTrailer(CRC32 crc, int uncompressed, ByteBuffer dst) {
    if (dst.order() == ByteOrder.LITTLE_ENDIAN) {
        dst.putInt((int) crc.getValue());
        dst.putInt(uncompressed);
    } else {
        dst.putInt(Integer.reverseBytes((int) crc.getValue()));
        dst.putInt(Integer.reverseBytes(uncompressed));
    }
}
So gzip imposes 10 + 8 bytes of overhead.
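A rough sizing sketch for the destination buffer; the bound here is a conservative guess in the spirit of zlib's deflateBound, not an exact formula:
// deflate can slightly expand incompressible input, so reserve
// input size + ~0.1% + a small constant, plus the 18-byte gzip envelope
int bound = src.remaining() + src.remaining() / 1000 + 32 + 18;
ByteBuffer dst = ByteBuffer.allocateDirect(bound);
zip(src, dst);
dst.flip(); // ready to read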
In order to unzip a direct buffer into another, you can wrap the src buffer into an InputStream:
class ByteBufferInputStream extends InputStream {
    final ByteBuffer bb;

    public ByteBufferInputStream(ByteBuffer bb) {
        this.bb = bb;
    }

    @Override
    public int available() throws IOException {
        return bb.remaining();
    }

    @Override
    public int read() throws IOException {
        return bb.hasRemaining() ? bb.get() & 0xFF : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        var rem = bb.remaining();
        if (rem == 0) {
            return -1;
        }
        len = Math.min(rem, len);
        bb.get(b, off, len);
        return len;
    }

    @Override
    public long skip(long n) throws IOException {
        var rem = bb.remaining();
        if (n > rem) {
            bb.position(bb.limit());
            n = rem;
        } else {
            bb.position((int) (bb.position() + n));
        }
        return n;
    }
}
and use:
void unzip(ByteBuffer src, ByteBuffer dst) throws IOException {
    try (var is = new ByteBufferInputStream(src); var gis = new GZIPInputStream(is)) {
        var tmp = new byte[1024];
        var r = gis.read(tmp);
        if (r > 0) {
            do {
                dst.put(tmp, 0, r);
                r = gis.read(tmp);
            } while (r > 0);
        }
    }
}
Of course, this is not cool since we are copying data to a temporary array, but nevertheless it is sort of a roundtrip check, proving that the NIO-based gzip encoding writes valid data that standard IO-based consumers can read.
So, if we just ignore CRC consistency checks, we can drop the header/trailer (note that the fixed 10-byte skip assumes the minimal header written above, with no optional gzip fields):
void unzipNoCheck(ByteBuffer src, ByteBuffer dst) throws DataFormatException {
    src.position(src.position() + 10).limit(src.limit() - 8);
    decompress(src, dst);
}

If you are using ByteBuffers you can use some simple Input/OutputStream wrappers such as these:
public class ByteBufferInputStream extends InputStream {
    private ByteBuffer buffer = null;

    public ByteBufferInputStream(ByteBuffer b) {
        this.buffer = b;
    }

    @Override
    public int read() throws IOException {
        // signal end-of-stream instead of throwing BufferUnderflowException
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }
}

public class ByteBufferOutputStream extends OutputStream {
    private ByteBuffer buffer = null;

    public ByteBufferOutputStream(ByteBuffer b) {
        this.buffer = b;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.put((byte) (b & 0xFF));
    }
}
Test:
ByteBuffer buffer = ByteBuffer.allocate( 1000 );
ByteBufferOutputStream bufferOutput = new ByteBufferOutputStream( buffer );
GZIPOutputStream output = new GZIPOutputStream( bufferOutput );
output.write("stackexchange".getBytes());
output.close();
buffer.position( 0 );
byte[] result = new byte[ 1000 ];
ByteBufferInputStream bufferInput = new ByteBufferInputStream( buffer );
GZIPInputStream input = new GZIPInputStream( bufferInput );
input.read( result );
System.out.println( new String(result));
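Note that the single input.read(result) call is not guaranteed to fill the array; a slightly more careful read loop would be:
int off = 0, n;
while (off < result.length && (n = input.read(result, off, result.length - off)) != -1) {
    off += n;
}
System.out.println(new String(result, 0, off)); // avoids printing trailing zero bytes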

Related

Kotlin gzip uncompress fail

I tried to simplify my Java gzip uncompress code in Kotlin, but after the change it seems broken.
Here is the Java code:
public static byte[] uncompress(byte[] compressedBytes) {
    if (null == compressedBytes || compressedBytes.length == 0) {
        return null;
    }
    ByteArrayOutputStream out = null;
    ByteArrayInputStream in = null;
    GZIPInputStream gzipInputStream = null;
    try {
        out = new ByteArrayOutputStream();
        in = new ByteArrayInputStream(compressedBytes);
        gzipInputStream = new GZIPInputStream(in);
        byte[] buffer = new byte[256];
        int n = 0;
        while ((n = gzipInputStream.read(buffer)) >= 0) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    } catch (IOException ignore) {
    } finally {
        CloseableUtils.closeQuietly(gzipInputStream);
        CloseableUtils.closeQuietly(in);
        CloseableUtils.closeQuietly(out);
    }
    return null;
}
This is my Kotlin code:
payload = GZIPInputStream(payload.inputStream())
    .bufferedReader()
    .use { it.readText() }
    .toByteArray()
And I got this error.
com.google.protobuf.nano.InvalidProtocolBufferNanoException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either than the input has been truncated or that an embedded message misreported its own length.
It seems that the decompression process was interrupted by the reader?
The readText(charset: Charset = Charsets.UTF_8) function decodes the bytes using the UTF-8 character set, which is why it says "This could mean either than the input has been truncated": it probably tried to convert 8-bit values into Chars and build a String out of them.
Use readBytes() to get a ByteArray, which is represented the same as byte[] on the JVM platform.
Example:
GZIPInputStream(payload.inputStream())
    .bufferedReader()
    .use { it.readBytes() }
Edit:
For reading bytes you shouldn't be using the Reader; it is meant for reading text, decoded as UTF-8 by default, as defined in Kotlin's InputStream.bufferedReader:
public inline fun InputStream.bufferedReader(charset: Charset = Charsets.UTF_8): BufferedReader = reader(charset).buffered()
InputStream.readBytes() will read the bytes itself, with an 8 KB buffer:
public fun InputStream.readBytes(): ByteArray {
    val buffer = ByteArrayOutputStream(maxOf(DEFAULT_BUFFER_SIZE, this.available()))
    copyTo(buffer)
    return buffer.toByteArray()
}

// This copies with an 8 KB buffer automatically
// DEFAULT_BUFFER_SIZE = 8 * 1024
public fun InputStream.copyTo(out: OutputStream, bufferSize: Int = DEFAULT_BUFFER_SIZE): Long {
    var bytesCopied: Long = 0
    val buffer = ByteArray(bufferSize)
    var bytes = read(buffer)
    while (bytes >= 0) {
        out.write(buffer, 0, bytes)
        bytesCopied += bytes
        bytes = read(buffer)
    }
    return bytesCopied
}
So you just have to do:
GZIPInputStream(payload.inputStream()).use { it.readBytes() }
To unzip a zip archive file to a folder, you can use the following function:
fun File.unzip(unzipLocationRoot: File? = null) {
    val rootFolder = unzipLocationRoot
        ?: File(parentFile.absolutePath + File.separator + nameWithoutExtension)
    if (!rootFolder.exists()) {
        rootFolder.mkdirs()
    }
    ZipFile(this).use { zip ->
        zip
            .entries()
            .asSequence()
            .map {
                val outputFile = File(rootFolder.absolutePath + File.separator + it.name)
                ZipIO(it, outputFile)
            }
            .map {
                it.output.parentFile?.run {
                    if (!exists()) mkdirs()
                }
                it
            }
            .filter { !it.entry.isDirectory }
            .forEach { (entry, output) ->
                zip.getInputStream(entry).use { input ->
                    output.outputStream().use { output ->
                        input.copyTo(output)
                    }
                }
            }
    }
}
Pass the file as a parameter as follows:
val zipFile = File(your file directory, your file name)
zipFile.unzip()
Hope this helps 🙏🏼

Java 8 - Most effective way to merge List<byte[]> to byte[]

I have a library that returns some binary data as a list of byte arrays. Those byte[] need to be merged into an InputStream.
This is my current implementation:
public static InputStream foo(List<byte[]> binary) {
    byte[] streamArray = null;
    binary.forEach(bin -> {
        org.apache.commons.lang.ArrayUtils.addAll(streamArray, bin);
    });
    return new ByteArrayInputStream(streamArray);
}
but this is quite CPU intensive. Is there a better way?
Thanks for all the answers. I did a performance test. These are my results:
Function: 'NicolasFilotto' => 68,04 ms average on 100 calls
Function: 'NicolasFilottoEstSize' => 65,24 ms average on 100 calls
Function: 'NicolasFilottoSequenceInputStream' => 63,09 ms average on 100 calls
Function: 'Saka1029_1' => 63,06 ms average on 100 calls
Function: 'Saka1029_2' => 0,79 ms average on 100 calls
Function: 'Coco' => 541,60 ms average on 10 calls
I'm not sure if 'Saka1029_2' is measured correctly... note that a read() implementation returning the raw signed byte makes any 0xFF data byte look like the -1 end-of-stream marker and can end the read early, so the byte must be masked with & 0xFF (as in the listings below).
This is the execute function:
private static double execute(Callable<InputStream> funct, int times) throws Exception {
    List<Long> executions = new ArrayList<>(times);
    for (int idx = 0; idx < times; idx++) {
        BufferedReader br = null;
        long startTime = System.currentTimeMillis();
        InputStream is = funct.call();
        br = new BufferedReader(new InputStreamReader(is));
        String line = null;
        while ((line = br.readLine()) != null) {}
        executions.add(System.currentTimeMillis() - startTime);
    }
    return calculateAverage(executions);
}
Note that I read every input stream.
These are the implementations used:
public static class NicolasFilotto implements Callable<InputStream> {
    private final List<byte[]> binary;

    public NicolasFilotto(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        for (byte[] bytes : binary) {
            baos.write(bytes, 0, bytes.length);
        }
        return new ByteArrayInputStream(baos.toByteArray());
    }
}
public static class NicolasFilottoSequenceInputStream implements Callable<InputStream> {
    private final List<byte[]> binary;

    public NicolasFilottoSequenceInputStream(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        return new SequenceInputStream(
            Collections.enumeration(
                binary.stream().map(ByteArrayInputStream::new).collect(Collectors.toList())));
    }
}
public static class NicolasFilottoEstSize implements Callable<InputStream> {
    private final List<byte[]> binary;
    private final int lineSize;

    public NicolasFilottoEstSize(List<byte[]> binary, int lineSize) {
        this.binary = binary;
        this.lineSize = lineSize;
    }

    @Override
    public InputStream call() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(binary.size() * lineSize);
        for (byte[] bytes : binary) {
            baos.write(bytes, 0, bytes.length);
        }
        return new ByteArrayInputStream(baos.toByteArray());
    }
}
public static class Saka1029_1 implements Callable<InputStream> {
    private final List<byte[]> binary;

    public Saka1029_1(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        byte[] all = new byte[binary.stream().mapToInt(a -> a.length).sum()];
        int pos = 0;
        for (byte[] bin : binary) {
            int length = bin.length;
            System.arraycopy(bin, 0, all, pos, length);
            pos += length;
        }
        return new ByteArrayInputStream(all);
    }
}
public static class Saka1029_2 implements Callable<InputStream> {
    private final List<byte[]> binary;

    public Saka1029_2(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        int size = binary.size();
        return new InputStream() {
            int i = 0, j = 0;

            @Override
            public int read() throws IOException {
                if (i >= size) return -1;
                if (j >= binary.get(i).length) {
                    ++i;
                    j = 0;
                }
                if (i >= size) return -1;
                // mask to 0..255 so a data byte of -1 is not mistaken for end-of-stream
                return binary.get(i)[j++] & 0xFF;
            }
        };
    }
}
public static class Coco implements Callable<InputStream> {
    private final List<byte[]> binary;

    public Coco(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        byte[] streamArray = new byte[0];
        for (byte[] bin : binary) {
            streamArray = org.apache.commons.lang.ArrayUtils.addAll(streamArray, bin);
        }
        return new ByteArrayInputStream(streamArray);
    }
}
You could use a ByteArrayOutputStream to store the content of each byte array in your list. To make it efficient, the ByteArrayOutputStream instance should be created with an initial size that matches the target size as closely as possible, so if you know the size, or at least the average size, of the byte arrays, you should use it. The code would be:
public static InputStream foo(List<byte[]> binary) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream(ARRAY_SIZE * binary.size());
    for (byte[] bytes : binary) {
        baos.write(bytes, 0, bytes.length);
    }
    return new ByteArrayInputStream(baos.toByteArray());
}
Another approach would be to use SequenceInputStream in order to logically concatenate all the ByteArrayInputStream instances, each representing one element of your list, as follows:
public static InputStream foo(List<byte[]> binary) {
    return new SequenceInputStream(
        Collections.enumeration(
            binary.stream().map(ByteArrayInputStream::new).collect(Collectors.toList())
        )
    );
}
The interesting aspect of this approach is that you have no need to copy anything; you only create instances of ByteArrayInputStream that use each byte array as it is.
To avoid collecting the result as a List, which has a cost especially if your initial List is big, you can directly call iterator() as proposed by @Holger. We then simply need to convert the iterator into an Enumeration, which can be done with IteratorUtils.asEnumeration(iterator) from Apache Commons Collections. The final code would then be:
public static InputStream foo(List<byte[]> binary) {
    return new SequenceInputStream(
        IteratorUtils.asEnumeration(
            binary.stream().map(ByteArrayInputStream::new).iterator()
        )
    );
}
Try this.
public static InputStream foo(List<byte[]> binary) {
    byte[] all = new byte[binary.stream().mapToInt(a -> a.length).sum()];
    int pos = 0;
    for (byte[] bin : binary) {
        int length = bin.length;
        System.arraycopy(bin, 0, all, pos, length);
        pos += length;
    }
    return new ByteArrayInputStream(all);
}
Or
public static InputStream foo(List<byte[]> binary) {
    int size = binary.size();
    return new InputStream() {
        int i = 0, j = 0;

        @Override
        public int read() throws IOException {
            if (i >= size) return -1;
            if (j >= binary.get(i).length) {
                ++i;
                j = 0;
            }
            if (i >= size) return -1;
            // mask to 0..255 so a data byte of -1 is not mistaken for end-of-stream
            return binary.get(i)[j++] & 0xFF;
        }
    };
}

Read the newly appended file content to an InputStream in Java

I have a writer program that writes a huge serialized Java object (on the scale of 1 GB) into a binary file on local disk at a specific speed. Actually, the writer program (implemented in C) is a network receiver that receives the bytes of the serialized object from a remote server. The implementation of the writer is fixed.
Now, I want to implement a Java reader program that reads the file and deserializes it to a Java object. Since the file could be very large, it is beneficial to reduce the latency of deserializing the object. In particular, I want the Java reader to start reading/deserializing the object once the first byte has been written to the disk file, so that the reader can begin deserializing even before the entire serialized object has been written. The reader knows the size of the file ahead of time (before the first byte is written to the file).
I think what I need is something like a blocking file InputStream that blocks when it reaches end-of-file before it has read the expected number of bytes (the eventual size of the file). Thus, whenever new bytes have been written to the file, the reader's InputStream could keep reading the new content. However, FileInputStream in Java does not support this.
Probably, I also need a file listener that monitors the changes made to the file to achieve this.
I am wondering if there is any existing solution/library/package that achieves this. The question may be similar to some questions about monitoring log files.
The flow of the bytes is like this:
FileInputStream -> SequenceInputStream -> BufferedInputStream -> JavaSerializer
You need two threads: Thread1 to download from the server and write to a File, and Thread2 to read the File as it becomes available.
Both threads should share a single RandomAccessFile, so access to the OS file can be synchronized correctly. You could use a wrapper class like this:
public class ReadWriteFile {
    ReadWriteFile(File f, long size) throws IOException {
        _raf = new RandomAccessFile(f, "rw");
        _size = size;
        _writer = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                write(new byte[] {
                    (byte) b
                });
            }

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                if (len < 0)
                    throw new IllegalArgumentException();
                synchronized (_raf) {
                    _raf.seek(_nw);
                    _raf.write(b, off, len);
                    _nw += len;
                    _raf.notify();
                }
            }
        };
    }

    void close() throws IOException {
        _raf.close();
    }

    InputStream reader() {
        return new InputStream() {
            @Override
            public int read() throws IOException {
                if (_pos >= _size)
                    return -1;
                byte[] b = new byte[1];
                if (read(b, 0, 1) != 1)
                    throw new IOException();
                return b[0] & 255;
            }

            @Override
            public int read(byte[] buff, int off, int len) throws IOException {
                synchronized (_raf) {
                    while (true) {
                        if (_pos >= _size)
                            return -1;
                        if (_pos >= _nw) {
                            try {
                                _raf.wait();
                                continue;
                            } catch (InterruptedException ex) {
                                throw new IOException(ex);
                            }
                        }
                        _raf.seek(_pos);
                        len = (int) Math.min(len, _nw - _pos);
                        int nr = _raf.read(buff, off, len);
                        _pos += Math.max(0, nr);
                        return nr;
                    }
                }
            }

            private long _pos;
        };
    }

    OutputStream writer() {
        return _writer;
    }

    private final RandomAccessFile _raf;
    private final long _size;
    private final OutputStream _writer;
    private long _nw;
}
The following code shows how to use ReadWriteFile from two threads:
public static void main(String[] args) throws Exception {
    File f = new File("test.bin");
    final long size = 1024;
    final ReadWriteFile rwf = new ReadWriteFile(f, size);
    Thread t1 = new Thread("Writer") {
        public void run() {
            try {
                OutputStream w = new BufferedOutputStream(rwf.writer(), 16);
                for (int i = 0; i < size; i++) {
                    w.write(i);
                    sleep(1);
                }
                System.out.println("Write done");
                w.close();
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    };
    Thread t2 = new Thread("Reader") {
        public void run() {
            try {
                InputStream r = new BufferedInputStream(rwf.reader(), 13);
                for (int i = 0; i < size; i++) {
                    int b = r.read();
                    assert (b == (i & 255));
                }
                int eof = r.read();
                assert (eof == -1);
                r.close();
                System.out.println("Read done");
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    };
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    rwf.close();
}

Java: How to check that 2 binary files are same?

What is the easiest way to check (in a unit test) whether binary files A and B are equal?
Are third-party libraries fair game? Guava has Files.equal(File, File). There's no real reason to bother with hashing if you don't have to; it can only be less efficient.
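With Guava on the classpath the whole check is a one-liner (file1 and file2 being plain java.io.File values):
assertTrue(com.google.common.io.Files.equal(file1, file2));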
There's always just reading byte by byte from each file and comparing them as you go. MD5, SHA-1, etc. still have to read all the bytes, so computing the hash is extra work that you don't have to do.
if (file1.length() != file2.length()) {
    return false;
}
try (InputStream in1 = new BufferedInputStream(new FileInputStream(file1));
     InputStream in2 = new BufferedInputStream(new FileInputStream(file2))) {
    int value1, value2;
    do {
        // since we're buffered, read() isn't expensive
        value1 = in1.read();
        value2 = in2.read();
        if (value1 != value2) {
            return false;
        }
    } while (value1 >= 0);
    // since we already checked that the file sizes are equal:
    // if we're here we reached the end of both files without a mismatch
    return true;
}
With assertBinaryEquals.
public static void assertBinaryEquals(java.io.File expected, java.io.File actual)
http://junit-addons.sourceforge.net/junitx/framework/FileAssert.html
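Usage, with the junit-addons jar on the classpath (the file names are illustrative):
junitx.framework.FileAssert.assertBinaryEquals(
        new File("expected.bin"), new File("actual.bin"));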
Read the files in (small) blocks and compare them:
static boolean binaryDiff(File a, File b) throws IOException {
    if (a.length() != b.length()) {
        return false;
    }
    final int BLOCK_SIZE = 128;
    try (InputStream aStream = new FileInputStream(a);
         InputStream bStream = new FileInputStream(b)) {
        byte[] aBuffer = new byte[BLOCK_SIZE];
        byte[] bBuffer = new byte[BLOCK_SIZE];
        int aByteCount;
        do {
            aByteCount = aStream.read(aBuffer, 0, BLOCK_SIZE);
            // the file sizes were checked equal above, so both reads
            // are expected to return the same count
            bStream.read(bBuffer, 0, BLOCK_SIZE);
            if (!Arrays.equals(aBuffer, bBuffer)) {
                return false;
            }
        } while (aByteCount > 0);
        return true;
    }
}
If you want to avoid dependencies, you can do it quite nicely with Files.readAllBytes and Assert.assertArrayEquals:
Assert.assertArrayEquals("Binary files differ",
        Files.readAllBytes(Paths.get(expectedBinaryFile)),
        Files.readAllBytes(Paths.get(actualBinaryFile)));
Note: This will read the whole file so it might not be efficient with large files.
Since Java 12 you can also use the Files.mismatch method. It returns -1L if the files are the same.
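For example, mirroring the assertion style above (the surrounding test must declare throws IOException):
Assert.assertEquals(-1L,
        Files.mismatch(Paths.get(expectedBinaryFile), Paths.get(actualBinaryFile)));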
I had to do the same in a unit test too, so I used SHA-1 hashes; to spare the calculation of the hashes, I check whether the file sizes are equal first. Here is my attempt:
public class SHA1Compare {
    private static final int CHUNK_SIZE = 4096;

    public void assertEqualsSHA1(String expectedPath, String actualPath) throws IOException, NoSuchAlgorithmException {
        File expectedFile = new File(expectedPath);
        File actualFile = new File(actualPath);
        Assert.assertEquals(expectedFile.length(), actualFile.length());
        try (FileInputStream fisExpected = new FileInputStream(expectedFile);
             FileInputStream fisActual = new FileInputStream(actualFile)) {
            Assert.assertEquals(makeMessageDigest(fisExpected),
                    makeMessageDigest(fisActual));
        }
    }

    public String makeMessageDigest(InputStream is) throws NoSuchAlgorithmException, IOException {
        byte[] data = new byte[CHUNK_SIZE];
        MessageDigest md = MessageDigest.getInstance("SHA1");
        int bytesRead = 0;
        while (-1 != (bytesRead = is.read(data, 0, CHUNK_SIZE))) {
            md.update(data, 0, bytesRead);
        }
        return toHexString(md.digest());
    }

    private String toHexString(byte[] digest) {
        StringBuilder sha1HexString = new StringBuilder();
        for (int i = 0; i < digest.length; i++) {
            sha1HexString.append(String.format("%1$02x", Byte.valueOf(digest[i])));
        }
        return sha1HexString.toString();
    }
}

Byte array not fully consumed

I am writing an FLV parser in Java and have come up against an issue. The program successfully parses and groups together tags into packets and correctly identifies and assigns a byte array for each tag's body based upon the BodyLength flag in the header. However in my test files it successfully completes this but stops before the last 4 bytes.
The byte sequence left out in the first file is :
00 00 14 C3
And in the second:
00 00 01 46
Clearly it is an issue with the final 4 bytes of both files; however, I cannot spot the error in my logic. I suspect it might be:
while (in.available() != 0)
However, I also doubt this is the case, as the program successfully enters the loop for the final tag but then stops 4 bytes short. Any help is greatly appreciated. (I know proper exception handling is not yet taking place.)
Parser.java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.lang.reflect.Array;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.InputMismatchException;
/**
 * @author A
 *
 * Parser class for FLV files
 */
public class Parser {
    private static final int HEAD_SIZE = 9;
    private static final int TAG_HEAD_SIZE = 15;
    private static final byte[] FLVHEAD = { 0x46, 0x4C, 0x56 };
    private static final byte AUDIO = 0x08;
    private static final byte VIDEO = 0x09;
    private static final byte DATA = 0x12;
    private static final int TYPE_INDEX = 4;
    private File file;
    private FileInputStream in;
    private ArrayList<Packet> packets;
    private byte[] header = new byte[HEAD_SIZE];

    Parser() throws FileNotFoundException {
        throw new FileNotFoundException();
    }

    Parser(URI uri) {
        file = new File(uri);
        init();
    }

    Parser(File file) {
        this.file = file;
        init();
    }

    private void init() {
        packets = new ArrayList<Packet>();
    }

    public void parse() {
        boolean test = false;
        try {
            test = parseHeader();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        if (test) {
            System.out.println("Header Verified");
            // Add header packet to beginning of list & then null packet
            Packet p = new Packet(PTYPE.P_HEAD);
            p.setSize(header.length);
            p.setByteArr(header);
            packets.add(p);
            p = null;
            try {
                parseTags();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        } else {
            try {
                in.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            // throw FileNotFoundException because incorrect file
        }
    }

    private boolean parseHeader() throws FileNotFoundException, IOException {
        if (file == null)
            throw new FileNotFoundException();
        in = new FileInputStream(file);
        in.read(header, 0, 9);
        return Arrays.equals(FLVHEAD, Arrays.copyOf(header, FLVHEAD.length));
    }

    private void parseTags() throws IOException {
        if (file == null)
            throw new FileNotFoundException();
        byte[] tagHeader = new byte[TAG_HEAD_SIZE];
        Arrays.fill(tagHeader, (byte) 0x00);
        byte[] body;
        byte[] buf;
        PTYPE pt;
        int OFFSET = 0;
        while (in.available() != 0) {
            // Read first 5 - bytes, previous tag size + tag type
            in.read(tagHeader, 0, 5);
            if (tagHeader[TYPE_INDEX] == AUDIO) {
                pt = PTYPE.P_AUD;
            } else if (tagHeader[TYPE_INDEX] == VIDEO) {
                pt = PTYPE.P_VID;
            } else if (tagHeader[TYPE_INDEX] == DATA) {
                pt = PTYPE.P_DAT;
            } else {
                // Header should've been dealt with - if previous data types not
                // found then throw exception
                System.out.println("Unexpected header format: ");
                System.out.print(String.format("%02x\n", tagHeader[TYPE_INDEX]));
                System.out.println("Last Tag");
                packets.get(packets.size() - 1).diag();
                System.out.println("Number of tags found: " + packets.size());
                throw new InputMismatchException();
            }
            OFFSET = TYPE_INDEX;
            // Read body size - 3 bytes
            in.read(tagHeader, OFFSET + 1, 3);
            // Body size buffer array - padding for 1 0x00 bytes
            buf = new byte[4];
            Arrays.fill(buf, (byte) 0x00);
            // Fill size bytes
            buf[1] = tagHeader[++OFFSET];
            buf[2] = tagHeader[++OFFSET];
            buf[3] = tagHeader[++OFFSET];
            // Calculate body size
            int bSize = ByteBuffer.wrap(buf).order(ByteOrder.BIG_ENDIAN)
                    .getInt();
            // Initialise Array
            body = new byte[bSize];
            // Timestamp
            in.read(tagHeader, ++OFFSET, 3);
            Arrays.fill(buf, (byte) 0x00);
            // Fill size bytes
            buf[1] = tagHeader[OFFSET++];
            buf[2] = tagHeader[OFFSET++];
            buf[3] = tagHeader[OFFSET++];
            int milliseconds = ByteBuffer.wrap(buf).order(ByteOrder.BIG_ENDIAN)
                    .getInt();
            // Read padding
            in.read(tagHeader, OFFSET, 4);
            // Read body
            in.read(body, 0, bSize);
            // Diagnostics
            //printBytes(body);
            Packet p = new Packet(pt);
            p.setSize(tagHeader.length + body.length);
            p.setByteArr(concat(tagHeader, body));
            p.setMilli(milliseconds);
            packets.add(p);
            p = null;
            // Zero out for next iteration
            body = null;
            Arrays.fill(buf, (byte) 0x00);
            Arrays.fill(tagHeader, (byte) 0x00);
            milliseconds = 0;
            bSize = 0;
            OFFSET = 0;
        }
        in.close();
    }

    private byte[] concat(byte[] tagHeader, byte[] body) {
        int aLen = tagHeader.length;
        int bLen = body.length;
        byte[] C = (byte[]) Array.newInstance(tagHeader.getClass()
                .getComponentType(), aLen + bLen);
        System.arraycopy(tagHeader, 0, C, 0, aLen);
        System.arraycopy(body, 0, C, aLen, bLen);
        return C;
    }

    private void printBytes(byte[] b) {
        System.out.println("\n--------------------");
        for (int i = 0; i < b.length; i++) {
            System.out.print(String.format("%02x ", b[i]));
            if (((i % 8) == 0) && i != 0)
                System.out.println();
        }
    }
}
Packet.java
public class Packet {
    private PTYPE type = null;
    byte[] buf;
    int milliseconds;

    Packet(PTYPE t) {
        this.setType(t);
    }

    public void setSize(int s) {
        buf = new byte[s];
    }

    public PTYPE getType() {
        return type;
    }

    public void setType(PTYPE type) {
        if (this.type == null)
            this.type = type;
    }

    public void setByteArr(byte[] b) {
        this.buf = b;
    }

    public void setMilli(int milliseconds) {
        this.milliseconds = milliseconds;
    }

    public void diag() {
        System.out.println("|-- Tag Type: " + type);
        System.out.println("|-- Milliseconds: " + milliseconds);
        System.out.println("|-- Size: " + buf.length);
        System.out.println("|-- Bytes: ");
        for (int i = 0; i < buf.length; i++) {
            System.out.print(String.format("%02x ", buf[i]));
            if (((i % 8) == 0) && i != 0)
                System.out.println();
        }
        System.out.println();
    }
}
jFLV.java
import java.net.URISyntaxException;
public class jFLV {
    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        Parser p = null;
        try {
            p = new Parser(jFLV.class.getResource("sample.flv").toURI());
        } catch (URISyntaxException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }
        p.parse();
    }
}
PTYPE.java
public enum PTYPE {
    P_HEAD, P_VID, P_AUD, P_DAT
};
Both your use of available() and your call to read are broken. Admittedly I would have somewhat expected this to be okay for a FileInputStream (until you reach the end of the stream, at which point ignoring the return value for read could still be disastrous) but I personally assume that streams can always return partial data.
available() only tells you whether there's any data available right now. It's very rarely useful - just ignore it. If you want to read until the end of the stream, you should usually keep calling read until it returns -1. It's slightly tricky to combine that with "I'm trying to read the next block", admittedly. (It would be nice if InputStream had a peek() method, but it doesn't. You can wrap it in a BufferedInputStream and use mark/reset to test that at the start of each loop... ugly, but it should work.)
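A minimal sketch of that mark/reset peek (hasNextByte is an illustrative name):
static boolean hasNextByte(BufferedInputStream in) throws IOException {
    in.mark(1);             // remember the current position
    if (in.read() == -1) {  // try to read a single byte
        return false;       // genuine end of stream
    }
    in.reset();             // rewind so the byte can be re-read normally
    return true;
}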
Next, you're ignoring the result of InputStream.read (in multiple places). You should always use the result of this, rather than assuming it has read the amount of data you've asked for. You might want a couple of helper methods, e.g.
static byte[] readExactly(InputStream input, int size) throws IOException {
    byte[] data = new byte[size];
    readExactly(input, data);
    return data;
}

static void readExactly(InputStream input, byte[] data) throws IOException {
    int index = 0;
    while (index < data.length) {
        int bytesRead = input.read(data, index, data.length - index);
        if (bytesRead < 0) {
            throw new EOFException("Expected more data");
        }
        index += bytesRead; // advance past what this call actually read
    }
}
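Applied to the parser above, each fixed-size read then becomes, for example:
// instead of in.read(tagHeader, 0, 5), which may silently read fewer than 5 bytes:
byte[] firstFive = readExactly(in, 5);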
You should use one of the read methods instead of available, as available() "Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream."
It is not designed to check how long you can read.
