File length retrieval from unclosed FileOutputStream - java

A custom file format being devised requires robust file corruption checking, currently implemented via a SHA-2 checksum appended at the end of the file. Given the large size of the files involved, validating the SHA-2 checksum takes a while, so it has been decided to place the final file size near the start of the file in order to quickly filter out files with mismatched lengths.
Maybe I am just overthinking this, but is there any way the following example could fail?
File outputFile = ... // Output file
try(FileOutputStream fOut = new FileOutputStream(outputFile);
    FileChannel fChannel = fOut.getChannel()){
    fOut.write(MAGIC);       // byte array holding the magic header
    fOut.write(new byte[8]); // reserve space for the eventual file size
    //// [Bulk Writing to File] ////
    // Ensure all writing is complete
    fOut.flush();
    fOut.getFD().sync(); // Is this necessary?
    // Write final file size to file
    ByteBuffer finalFileSize = ByteBuffer.allocate(8);
    finalFileSize.order(ByteOrder.BIG_ENDIAN);
    finalFileSize.putLong(outputFile.length()); // Will this statement return an inaccurate file length?
    finalFileSize.flip(); // flip so the buffer's contents get written
    fChannel.position(MAGIC.length);
    fChannel.write(finalFileSize);
}catch(IOException ex){
    // Exception handling code... deletes the current file and starts again.
}
I am particularly worried about outputFile.length() returning an invalid length because the file stream is still open (some bytes could still be buffered in memory, or the file metadata might not yet be updated on certain platforms).
In my particular case, having the file length simply be unavailable (written as 0) is better than having it be invalid, since the corruption detection code ignores file lengths <= 0 and moves on to SHA-2 validation, but rejects positive file-length mismatches.
Is my implementation sufficient, or do I need to resort to writing a counting stream wrapper around the FileOutputStream to make sure the file length is correct?
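For reference, the counting wrapper fallback I have in mind would look roughly like this (an untested sketch; the class and method names are just placeholders):
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Counts every byte written through the stream so the final length
// does not depend on File.length() or on filesystem metadata.
class CountingOutputStream extends FilterOutputStream {
    private long count = 0;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }
}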

Related

Getting java.io.EOFException while reading a SQLite file from temp directory

I am seeing an EOFException while reading a SQLite file from a temp directory. The following is the code for reading the file. The exception is not seen every time; out of roughly 50K files it occurs 3 to 4 times.
public static byte[] decompressLzmaStream(InputStream inputStream, int size)
        throws CompressorException, IOException {
    if (size < 1) {
        size = 1024 * 100;
    }
    try (LZMACompressorInputStream lzmaInputStream =
                 new LZMACompressorInputStream(inputStream);
         ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(size)) {
        byte[] buffer = new byte[size];
        int length;
        while (-1 != (length = lzmaInputStream.read(buffer))) {
            byteArrayOutputStream.write(buffer, 0, length);
        }
        byteArrayOutputStream.flush();
        return byteArrayOutputStream.toByteArray();
    }
}
I am using the following dependency for the decompression
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.20</version>
</dependency>
The exception is thrown at the while (-1 != (length = lzmaInputStream.read(buffer))) { line. This is the stack trace:
java.io.EOFException: null
    at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:290)
    at org.tukaani.xz.rangecoder.RangeDecoderFromStream.normalize(Unknown Source)
    at org.tukaani.xz.rangecoder.RangeDecoder.decodeBit(Unknown Source)
    at org.tukaani.xz.lzma.LZMADecoder.decode(Unknown Source)
    at org.tukaani.xz.LZMAInputStream.read(Unknown Source)
    at org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream.read(LZMACompressorInputStream.java:62)
    at java.io.InputStream.read(InputStream.java:101)
Does anyone have any idea about the following constructors of commons-compress?
// I am using this constructor of LZMACompressorInputStream
public LZMACompressorInputStream(InputStream inputStream) throws IOException {
    this.in = new LZMAInputStream(this.countingStream = new CountingInputStream(inputStream), -1);
}
// This one was added in a later version of commons-compress; what is memoryLimitInKb?
public LZMACompressorInputStream(InputStream inputStream, int memoryLimitInKb) throws IOException {
    try {
        this.in = new LZMAInputStream(this.countingStream = new CountingInputStream(inputStream), memoryLimitInKb);
    } catch (MemoryLimitException var4) {
        throw new org.apache.commons.compress.MemoryLimitException((long) var4.getMemoryNeeded(), var4.getMemoryLimit(), var4);
    }
}
As I have read, for LZMA streams we need to pass the uncompressed size to the constructor, as discussed here --> https://issues.apache.org/jira/browse/COMPRESS-286?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=14109417#comment-14109417
An LZMA decoder needs to know when the compressed stream ends. If the uncompressed size was known during compression, the header of the stream (located at the beginning of the stream) will contain the uncompressed size. When the output of the decoder reaches this size, the decoder knows that the end of the stream has been reached. If the uncompressed size was not known during compression, the header will not contain the size. In this case the encoder explicitly terminates the stream with an end-of-stream marker, and the decoder relies on that marker.
Since LZMA streams are also used in container formats like 7z and xz, the LZMAOutputStream and LZMAInputStream classes also provide constructors for reading/writing streams without a header.
COMPRESS-286 is about decompressing a 7z archive that contains an entry with LZMA compression. A 7z archive contains LZMA streams without a header; the information that is usually stored in the header of the LZMA stream is stored separately from the stream.
Apache commons SevenZFile class for reading 7z archives creates LZMAInputStream objects with the following constructor:
LZMAInputStream(InputStream in, long uncompSize, byte propsByte, int dictSize)
The additional parameters of the constructor represent the information that is usually stored in the header at the beginning of the LZMA stream. The fix for COMPRESS-286 ensured that the uncompressed size (which was missing before) is also handed over to the LZMAInputStream.
LZMACompressorInputStream also makes use of LZMAInputStream, but it assumes that the compressed stream contains an explicit header, so it is not possible to hand this information over through its constructor.
The memoryLimitInKb parameter only limits the memory that is used for decompression and has nothing to do with the uncompressed size. The main contributor to the required memory is the selected size of the dictionary. This size is specified during compression and is also stored in the header of the stream; its maximum value is 4 GB. Usually the size of the dictionary is smaller than the uncompressed size, and a dictionary greater than the uncompressed size is an absolute waste of memory. A corrupted LZMA header can easily lead to an OOM error, and a manipulated stream even opens the door to denial-of-service attacks. Therefore it is wise to limit the maximum memory usage when you read an unverified LZMA stream.
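For illustration, the limit could be applied roughly like this (a sketch; the 100 MB cap is an arbitrary example, not a recommendation):
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream;

// Hypothetical helper: opens an LZMA stream with a decoder memory cap of
// about 100 MB, so a corrupted or malicious header fails fast with a
// MemoryLimitException instead of exhausting the heap.
public static LZMACompressorInputStream openWithMemoryLimit(InputStream in) throws IOException {
    return new LZMACompressorInputStream(in, 100 * 1024); // limit is given in KB
}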
To sum it up: since you are not reading a 7z archive with an LZMA-compressed entry, COMPRESS-286 has nothing to do with your issue. But the similar stack trace may be an indicator that something is wrong with the headers of your stream.
Ensure that the data is compressed with an instance of LZMACompressorOutputStream (it automatically selects the dictionary size and all other parameters, and ensures that a header is written). If you use LZMAOutputStream directly, make sure that you use an instance that actually writes a header.
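A minimal compression sketch along those lines (assuming commons-compress is on the classpath; the file name is just a placeholder):
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.commons.compress.compressors.lzma.LZMACompressorOutputStream;

public class CompressExample {
    public static void main(String[] args) throws IOException {
        // LZMACompressorOutputStream writes a standard .lzma header, so
        // LZMACompressorInputStream can later decode the stream correctly.
        try (OutputStream fileOut = new FileOutputStream("data.sqlite.lzma");
             LZMACompressorOutputStream lzmaOut = new LZMACompressorOutputStream(fileOut)) {
            lzmaOut.write("example payload".getBytes());
        }
    }
}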

How can I read a Base64 file that comes as a string?

I am currently developing a REST service which receives, in its request, a field containing a file in Base64 format (a string of "n" characters). Within the service logic I convert that character string into a File and save it to a predetermined path.
The problem is that when the file is large (3 MB) the service becomes slow and takes a long time to respond.
This is the code I am using:
String filename = "TEXT.DOCX";
BufferedOutputStream stream = null;
// The field base64file is the Base64 string that comes in the request
byte[] fileByteArray = java.util.Base64.getDecoder().decode(base64file);
// Validate the file size
if (1 * 1024 * 1024 < fileByteArray.length) {
    logger.info("The file [" + filename + "] is too large");
} else {
    stream = new BufferedOutputStream(new FileOutputStream(new File("C:\\" + filename)));
    stream.write(fileByteArray);
}
How can I avoid this problem, so that my service does not take so long to convert the string into a File?
Buffering does not improve your performance here, as all you are trying to do is write the file as fast as possible. Generally it looks fine; change your code to use the FileOutputStream directly and see if it improves things:
try (FileOutputStream stream = new FileOutputStream(path)) {
stream.write(bytes);
}
Alternatively you could also try using something like Apache Commons to do the task for you:
FileUtils.writeByteArrayToFile(new File(path), bytes);
Try the following; it also works for large files.
Path outPath = Paths.get(filename);
try (InputStream in = Base64.getDecoder().wrap(base64file)) {
    Files.copy(in, outPath);
}
This keeps only a small buffer in memory. Your code may be slow because it holds the entire decoded file in memory.
wrap takes an InputStream which you should provide, not the entire String.
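Put together, a sketch might look like this (base64file is the request's Base64 string, filename a hypothetical target path; the encoded string is still held in memory, but the decoded bytes are streamed straight to disk):
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper: decodes the Base64 payload as a stream, so the
// decoded file is never materialized as one large byte array in memory.
static void saveBase64ToFile(String base64file, String filename) throws IOException {
    Path outPath = Paths.get(filename);
    try (InputStream encoded = new ByteArrayInputStream(
                 base64file.getBytes(StandardCharsets.US_ASCII));
         InputStream in = Base64.getDecoder().wrap(encoded)) {
        Files.copy(in, outPath);
    }
}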
From a network point of view:
Both JSON and XML can support large amounts of data exchange, and 3 MB is not really huge. But there is a limit on how much the browser can handle (if this call comes from a user interface).
Also, a web server like Tomcat has a property that limits POST bodies to 2 MB by default (check maxPostSize: http://tomcat.apache.org/tomcat-6.0-doc/config/http.html#Common_Attributes).
You can also try chunking the request payload (although that shouldn't be required for a 3 MB file).
From an implementation point of view:
A write operation to your disk could be slow; it also depends on your OS.
If your file is really large, you can use the Java FileChannel class with a ByteBuffer.
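A minimal sketch of that approach (bytes being the decoded array and path a hypothetical target):
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Write the decoded bytes through a FileChannel; the loop is needed
// because a single write() call may not drain the whole buffer.
static void writeWithChannel(byte[] bytes, String path) throws IOException {
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    try (FileChannel channel = FileChannel.open(Paths.get(path),
            StandardOpenOption.CREATE, StandardOpenOption.WRITE,
            StandardOpenOption.TRUNCATE_EXISTING)) {
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }
}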
To find the cause of the slowness (network delay or code), check the performance with a simple standalone Java program and compare it against the web service call.

FileChannel.transferFrom to shift file content

I tried to use FileChannel.transferFrom to move some of a file's content to the beginning of the file.
try (RandomAccessFile rafNew = new RandomAccessFile(_fileName, "rw");
     RandomAccessFile rafOld = new RandomAccessFile(_fileName, "r")) {
    rafOld.seek(pos);
    rafOld.getChannel().transferTo(0, count, rafNew.getChannel());
} catch (IOException e) {
    throw new RuntimeException(e.getMessage());
}
The result is a file with strange repetitions of data. The example works if I first transfer the data to a buffer file and then from the buffer file back to the original file.
The Java Docs say nothing about the case where source and destination are the same file.
You are transferring 'count' bytes starting from zero from 'rafOld' to 'rafNew', which hasn't had any seeks done on it, so is also at position zero. So at best your code doesn't do what you said it does. The seek() you did on 'rafOld' doesn't affect the transferTo() operation. You should have removed it and written
transferTo(pos, count, rafNew.getChannel());
But there are still two problems with this:
If count > pos you will be overwriting the source region.
transferTo() must be called in a loop, as it isn't guaranteed to complete the entire transfer in a single call. It returns the number of bytes actually transferred.
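A loop along those lines (a sketch, reusing the question's rafOld/rafNew channels and pos/count variables) would address the second point:
// Keep calling transferTo() until all 'count' bytes have been moved;
// each call returns how many bytes it actually transferred, and the
// target channel's position advances by that amount.
long transferred = 0;
while (transferred < count) {
    transferred += rafOld.getChannel().transferTo(
            pos + transferred, count - transferred, rafNew.getChannel());
}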

Receiving files over socket

I am implementing a Direct Connect client using the NMDC protocol. I can connect to a hub and to other connected clients. I am trying to retrieve the file list from each client; I understand that in order to do that, one must download the file files.xml.bz2 from the other client. The protocol for downloading a file is as follows:
-> $ADCGET file <filename> <params>|
<- $ADCSND file <fileName> <params>|
<- (*** binary data is now transferred from client B to client A ***)
I am trying to create a file named files.xml.bz2 using the binary data received. Here's my code:
// filesize is provided through the $ADCSND response from the other client
byte[] data = new byte[filesize];
/*
Reading binary data from the socket input stream
*/
int read = 0;
for (int i = 0; read < filesize;) {
    int available = in2.available();
    int leftspace = filesize - read;
    if (available > 0) {
        in2.read(data, read, available > leftspace ? leftspace : available);
        ++i;
    }
    read += (available > leftspace ? leftspace : available) + 1;
}
/*
writing the bytes to an actual file
*/
ByteArrayInputStream f = new ByteArrayInputStream(data);
FileOutputStream file = new FileOutputStream("files.xml.bz2");
file.write(data);
file.close();
The file is created; however, the contents (files.xml) are not readable. Opening it in Firefox gives:
XML Parsing Error: not well-formed
Viewing the contents in the terminal shows only binary data. What am I doing wrong?
EDIT
I also tried decompressing the file using the BZip2 library from Apache Commons Compress.
ByteArrayInputStream f = new ByteArrayInputStream(data);
BZip2CompressorInputStream bzstream = new BZip2CompressorInputStream(f);
FileOutputStream xmlFile = new FileOutputStream("files.xml");
byte[] bytes = new byte[1024];
while ((bzstream.read(bytes)) != -1) {
    xmlFile.write(bytes);
}
xmlFile.close();
bzstream.close();
I get an error; here's the stack trace:
java.io.IOException: Stream is not in the BZip2 format
at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.init(BZip2CompressorInputStream.java:240)
at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.<init>(BZip2CompressorInputStream.java:132)
at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.<init>(BZip2CompressorInputStream.java:109)
at control.Controller$1.run(Controller.java:196)
Usual, typical misuse of available(). All you need to copy a stream in Java is as follows:
while ((count = in.read(buffer)) >= 0)
{
out.write(buffer, 0, count);
}
Use this with any buffer size greater than zero, preferably several kilobytes. You don't need a new buffer per iteration, and you don't need to know how much data is available to read without blocking, because you have to block; otherwise you're just smoking the CPU. But you do need to know how much data was actually read per iteration, and this is the first place where your code falls down.
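Applied to your variables (in2 and filesize from the question), a sketch of that pattern could look like this:
// Copy exactly 'filesize' bytes from the socket stream into the file,
// blocking on read() and honoring the count it returns on each call.
byte[] buffer = new byte[8192];
long remaining = filesize;
try (FileOutputStream out = new FileOutputStream("files.xml.bz2")) {
    int count;
    while (remaining > 0
            && (count = in2.read(buffer, 0, (int) Math.min(buffer.length, remaining))) >= 0) {
        out.write(buffer, 0, count);
        remaining -= count;
    }
}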
The error java.io.IOException: Stream is not in the BZip2 format is generated by the constructor of the class BZip2CompressorInputStream. I decided to scan the bytes, looking for the magic number, to make sure that the file was in bz2 format; it turns out that Java was right -- it wasn't in bz2 format.
Upon examining the source code of Jucy, I saw that the reason for this was a slight error in the command I sent to the other client; in essence, the error was caused by a mistake in my protocol implementation. The solution was:
Replace:
$ADCGET file files.xml.bz2 0 -1 ZL1|
With:
$ADCGET file files.xml.bz2 0 -1|
ZL1 specifies compression of the files being sent (not necessary).

Java transfer files via UDP. Compressed files are damaged?

I'm trying to transfer files with a DatagramSocket in Java. I'm reading the files in 4096-byte pieces. We are using ACKs, so all pieces arrive in the right order; we tried pdf, exe, jpg and a lot of other formats successfully, but iso, zip and 7z are not working, even though the received files have exactly the same size as the originals. Do you have any idea?
Reading the Parts:
byte[] b = new byte[FileTransferClient.PACKAGE_SIZE - 32];
FileInputStream read = new FileInputStream(file);
read.skip((part - 1) * (FileTransferClient.PACKAGE_SIZE - 32));
read.read(b);
content = b;
Writing the Parts:
stream = new FileOutputStream(new File(this.filePath));
stream.write(output);
...
stream.write(output);
stream.close();
(Sorry for the bad grammar, I'm German.)
Your write() method calls are assuming that the entire buffer was filled by receive(). You must use the length provided with the DatagramPacket:
datagramSocket.receive(packet);
stream.write(packet.getData(), packet.getOffset(), packet.getLength());
If there is overhead in the packet, e.g. a sequence number, which there should be, you will need to adjust the offset and length accordingly.
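For instance, a receive loop along these lines (a sketch; HEADER_SIZE stands for your per-packet overhead such as the sequence number, and moreToReceive() is a hypothetical end-of-transfer check) only writes the bytes each datagram actually carries:
// Write only the payload of each received datagram, skipping the
// per-packet overhead and respecting the actual received length.
byte[] buf = new byte[FileTransferClient.PACKAGE_SIZE];
DatagramPacket packet = new DatagramPacket(buf, buf.length);
try (FileOutputStream stream = new FileOutputStream(this.filePath)) {
    while (moreToReceive()) {
        datagramSocket.receive(packet);
        stream.write(packet.getData(),
                packet.getOffset() + HEADER_SIZE,
                packet.getLength() - HEADER_SIZE);
    }
}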
NB TCP will ensure 'everything gets transferred and is not damaged'.
