Incomplete file using RandomAccessFile in Java

I made a small program to download data and write it to a file.
Here is the code:
public void run()
{
    byte[] bytes = new byte[1024];
    int bytes_read;
    URLConnection urlc = null;
    RandomAccessFile raf = null;
    InputStream i = null;
    try
    {
        raf = new RandomAccessFile("file1", "rw");
    }
    catch(Exception e)
    {
        e.printStackTrace();
        return;
    }
    try
    {
        urlc = new URL(link).openConnection();
        i = urlc.getInputStream();
    }
    catch(Exception e)
    {
        e.printStackTrace();
        return;
    }
    while(canDownload())
    {
        try
        {
            bytes_read = i.read(bytes);
        }
        catch(Exception e)
        {
            e.printStackTrace();
            return;
        }
        if(bytes_read != -1)
        {
            try
            {
                raf.write(bytes, 0, bytes_read);
            }
            catch(Exception e)
            {
                e.printStackTrace();
                return;
            }
        }
        else
        {
            try
            {
                i.close();
                raf.close();
                return;
            }
            catch(Exception e)
            {
                e.printStackTrace();
                return;
            }
        }
    }
}
The problem is that when I download big files, a few bytes are missing at the end of the file.
I tried changing the byte array size to 2K, and the problem was solved. But when I downloaded a bigger file (500 MB), a few bytes were missing again.
I said "Ok, let's try 4K". I changed the byte array size to 4K. It worked!
Nice, but then I downloaded a 4 GB file and bytes were missing at the end again!
I said "Cool, let's try 8K". I changed the byte array size to 8K. Worked.
My first question is: why does this happen? (When I change the buffer size, the file doesn't get corrupted.)
Ok, in theory the corruption problem can be solved by making the byte array bigger.
But there's another problem: how can I measure the download speed (over a one-second interval) with big byte array sizes?
For example: let's say my download speed is 2 KB/s and the byte array size is 4K.
My second question is: how can I measure the speed (over a one-second interval) if the thread has to wait for the byte array to be full? My answer would be: change the byte array size to a smaller value. But then the file gets corrupted xD.
After trying to solve the problem myself, I spent 2 days searching the internet for a solution. Nothing.
Please, can you guys answer my two questions? Thanks =D
Edit
Code for canDownload():
synchronized private boolean canDownload()
{
    return can_download;
}

My advice is to use a proven library such as Apache Commons IO instead of trying to roll your own code. For your particular problem, take a look at the copyURLToFile(URL, File) method.
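A minimal sketch of how that call might look (the URL and target file name are placeholders standing in for the question's link field and "file1"; the overload with timeouts is optional and the values are just examples):

import java.io.File;
import java.io.IOException;
import java.net.URL;
import org.apache.commons.io.FileUtils;

public class Downloader {
    public static void main(String[] args) throws IOException {
        URL source = new URL("http://example.com/some/large/file"); // placeholder URL
        File target = new File("file1");                            // placeholder target
        // Copies the whole resource to the file and closes all streams internally.
        // The connect/read timeouts are in milliseconds.
        FileUtils.copyURLToFile(source, target, 10000, 10000);
    }
}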

I would:
Change the RandomAccessFile to a FileOutputStream.
Get rid of canDownload(), whatever it's for, and set a read timeout on the connection instead.
Simplify the copy loop to this:
while ((bytes_read = i.read(bytes)) > 0)
{
    out.write(bytes, 0, bytes_read);
}
out.close();
i.close();
with all the exception handling outside this loop.
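Put together, a sketch of the whole run() method along those lines (link and "file1" come from the question; the 8 KB buffer and 30-second read timeout are arbitrary example values):

public void run()
{
    byte[] bytes = new byte[8192];
    int bytes_read;
    try
    {
        URLConnection urlc = new URL(link).openConnection();
        urlc.setReadTimeout(30000); // give up if the server stops sending data
        try (InputStream in = urlc.getInputStream();
             FileOutputStream out = new FileOutputStream("file1"))
        {
            while ((bytes_read = in.read(bytes)) > 0)
            {
                out.write(bytes, 0, bytes_read);
            }
        } // both streams are closed (and flushed) here
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}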

I think you will find the problem is that you closed the underlying InputStream while the RandomAccessFile still had data in its write buffers. This will be why you are occasionally missing the last few bytes of data.
The race condition is between the JVM flushing the final write, and your call to i.close().
Removing the i.close() should fix the problem; it isn't necessary as the raf.close() closes the underlying stream anyway, but this way you give the RAF a chance to flush any outstanding buffers before it does so.

Related

OutOfMemoryException when reading bytes from file

I'm trying to make a hex dumping application and for that, I need to read the file's bytes. I'm using Apache Commons IO version 2.8.0 to do the hex dumping. This is the code I'm using:
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
    try {
        byte[] bytes = Files.readAllBytes(Paths.get(getPackageManager().getApplicationInfo("com.pixel.gun3d", 0).nativeLibraryDir.concat("/libil2cpp.so")));
        Log.e("MainActivity", dumpHex(bytes));
    } catch (PackageManager.NameNotFoundException | IOException e) {
        e.printStackTrace();
    }
}

private String dumpHex(byte[] bytes) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
        HexDump.dump(bytes, 0, out, 0);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return new String(out.toByteArray());
}
And the error I get is this: java.lang.OutOfMemoryError: Failed to allocate a 155189264 byte allocation with 25165824 free bytes and 94MB until OOM, max allowed footprint 328602112, growth limit 402653184
I looked it up and none of the suggestions I tried such as adding android:largeHeap="true" and android:hardwareAccelerated="false" to the manifest worked. Any help is appreciated <3
When reading a file, it is recommended to read it in small chunks to avoid this kind of error.
Reading a whole file's bytes into a byte array has limitations; for example, if the file size is over Integer.MAX_VALUE (or simply too big for the available heap, as here) you will get an OutOfMemoryError.
Try doing something similar to this:
byte[] buffer = new byte[1024];
FileInputStream fis = new FileInputStream(file);
int readBytesNum = fis.read(buffer);
while (readBytesNum > 0)
{
    //do what you need
    readBytesNum = fis.read(buffer);
}
fis.close();
This means you would need to change how HexDump is used so that the file is processed in partitions.
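A rough sketch of what that could look like with Commons IO's HexDump, streaming the dump straight to a file so only one small chunk is ever in memory (both paths are placeholders):

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import org.apache.commons.io.HexDump;

public class ChunkedHexDump {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream("libil2cpp.so"));   // placeholder input
             OutputStream out = new BufferedOutputStream(new FileOutputStream("dump.txt"))) { // placeholder output
            byte[] buffer = new byte[64 * 1024];
            long offset = 0; // printed at the start of each dump line
            int read;
            while ((read = in.read(buffer)) > 0) {
                // HexDump.dump writes the whole array it is given, so trim a short final chunk.
                byte[] chunk = (read == buffer.length) ? buffer : Arrays.copyOf(buffer, read);
                HexDump.dump(chunk, offset, out, 0);
                offset += read;
            }
        }
    }
}

Because the dump is written out as it is produced, the whole library never has to fit in the heap at once.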

Writing bytes into a file by blocks in java

I am currently writing an application whose result is a file composed of blocks of bytes that are processed one block at a time. The goal is to process one block, convert it into bytes, and write (append) that block of bytes to the file, then process the next block, and so on, until all of the bytes of all of the blocks are stored in the file. I have been trying the following piece of code:
try (ObjectOutputStream oos = new ObjectOutputStream(file)) {
    oos.writeObject(bytestobwritten);
    oos.flush();
    oos.close();
    stat = 1;
} catch (FileNotFoundException ex) {
    Logger.getLogger(Filer.class.getName()).log(Level.WARNING, "Error by writing block of bytes", ex);
} //end catch
The above code is inside the while loop that processes the bytes; the variable bytestobwritten contains the bytes of the current block.
The issue is that it is not appending all of the bytes: only the last block of bytes remains. I need all of them to be concatenated to make up the resulting bytes for the file.
Do you have any idea how to deal with this situation in Java? I will appreciate any help, thanks in advance.
I'm not sure I understand the problem you're trying to solve. But first, ditch ObjectOutputStream for now. We can use OutputStream (or DataOutputStream) to write bytes.
When you talk about writing blocks of bytes to a file, you have to answer the question: are all blocks the same size? That's really important, because if you write blocks with varying lengths you won't be able to read them back in; you won't know where a block begins and ends. You will need to know the size of the next block before you read it. A fixed block size changes the code, but fixed block sizes have the limitation that no one block can be bigger than the block size.
public void saveBlocks( List<Block> blocks ) throws IOException {
    DataOutputStream stream = new DataOutputStream( new FileOutputStream( "someFile.txt" ) );
    try {
        for( int i = 0; i < blocks.size(); i++ ) {
            byte[] buffer = createBuffer( blocks.get(i) );
            // save out the block size to the stream if we have varying block size
            stream.writeInt( buffer.length );
            // save the block, assumes buffer is the exact size of the block
            stream.write( buffer, 0, buffer.length );
        }
        stream.flush();
    } finally {
        stream.close();
    }
}
After reading part of your question, I wonder if you are just copying bytes between two streams, which makes this simpler; you don't really have to worry about blocks per se.
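If that is the case, a minimal sketch of a plain stream copy (both file names are placeholders) looks like this:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyBytes {
    public static void main(String[] args) throws IOException {
        // Open the destination once, outside any processing loop,
        // so earlier blocks are not overwritten by later ones.
        try (InputStream in = new FileInputStream("source.bin");
             OutputStream out = new FileOutputStream("result.bin")) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) > 0) {
                out.write(buffer, 0, read);
            }
        }
    }
}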

copying XML file from URL returns incomplete file

I am writing a small program to retrieve a large number of XML files. The program sort of works, but no matter which solution from Stack Overflow I use, every XML file I save locally is missing the end of the file. By "the end of the file" I mean approximately 5-10 lines of XML code. The files are of different lengths (~500-2500 lines) and the total length doesn't seem to have an effect on the size of the missing bit. Currently the code looks like this:
package plos;

import static org.apache.commons.io.FileUtils.copyURLToFile;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.logging.Level;
import java.util.logging.Logger;

public class PlosXMLfetcher {

    public PlosXMLfetcher(URL u, File f) {
        try {
            org.apache.commons.io.FileUtils.copyURLToFile(u, f);
        } catch (IOException ex) {
            Logger.getLogger(PlosXMLfetcher.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
I have tried using BufferedInputStream and ReadableByteChannel as well. I have tried running it in threads, and I have tried using read and readLine. Every solution returns an incomplete XML file.
In some of my tests (I can't remember which, sorry) I got a socket connection reset error - but the above code executes without any error messages.
I have manually downloaded some of the XML files as well, to check whether they are actually complete on the remote server - which they are.
I'm guessing that somewhere along the way a BufferedWriter or BufferedOutputStream has not had flush() called on it.
Why not write your own copy function to rule out FileUtils.copyURLToFile(u, f)?
public void copyURLToFile(URL u, File f) throws IOException {
    InputStream in = u.openStream();
    try {
        FileOutputStream out = new FileOutputStream(f);
        try {
            byte[] buffer = new byte[1024];
            int count;
            while ((count = in.read(buffer)) > 0) {
                out.write(buffer, 0, count);
            }
            out.flush();
        } finally {
            out.close();
        }
    } finally {
        in.close();
    }
}

IFS file copy using JT400 in code

I have this piece of code that copies files from IFS to a local drive, and I would like to ask for some suggestions on how to make it better.
public void CopyFile(AS400 system, String source, String destination){
    File destFile = new File(destination);
    IFSFile sourceFile = new IFSFile(system, source);
    if (!destFile.exists()){
        try {
            destFile.createNewFile();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    IFSFileInputStream in = null;
    OutputStream out = null;
    try {
        in = new IFSFileInputStream(sourceFile);
        out = new FileOutputStream(destFile);
        // Transfer bytes from in to out
        byte[] buf = new byte[1024];
        int len;
        while ((len = in.read(buf)) > 0) {
            out.write(buf, 0, len);
        }
    } catch (AS400SecurityException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if(in != null) {
                in.close();
            }
            if(out != null) {
                out.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    } // end try catch finally
} // end method
Where
source = full IFS path + filename and
destination = full local path + filename
I would like to ask some things regarding the following:
a. Performance considerations
Would this have a big impact in terms of CPU usage on the host AS400 system?
Would this have a big impact on the JVM used (in terms of memory usage)?
Would including this in a web app affect app server performance (would it be a heavy task or not)?
Would using this to copy multiple files (running it redundantly) be a big burden on all the resources involved?
b. Code Quality
Does my use of IFSFileInputStream suffice, or would a simple FileInputStream object do the job nicely?
AFAIK, I just need the AS400 object to make sure the source file referenced is a file on the IFS.
I am a noob at AS400 and IFS and would like to ask for an honest opinion from experienced users.
All in all it looks fine (without trying). It should not have a noticeable impact.
in.read() may return 0. Test for -1 instead.
Instead of manually buffering, just wrap in and out in BufferedInputStream/BufferedOutputStream respectively, read one byte at a time, and test it for -1 (see the sketch after this list).
try-catch is hard to get pretty. This will do, but you will later get more experience and learn how to do it somewhat better.
Do NOT swallow exceptions and print them. The code calling you will have no idea whether it went well or not.
When done with an AS400 object, use as400.disconnectAllServices().
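A minimal sketch along those lines (the class name is made up; exceptions are declared rather than swallowed, as suggested above):

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import com.ibm.as400.access.AS400;
import com.ibm.as400.access.AS400SecurityException;
import com.ibm.as400.access.IFSFile;
import com.ibm.as400.access.IFSFileInputStream;

public class IfsCopy {
    public void copyFile(AS400 system, String source, String destination)
            throws IOException, AS400SecurityException {
        try (InputStream in = new BufferedInputStream(
                     new IFSFileInputStream(new IFSFile(system, source)));
             OutputStream out = new BufferedOutputStream(
                     new FileOutputStream(destination))) {
            int b;
            while ((b = in.read()) != -1) { // -1 means end of stream; 0 does not
                out.write(b);
            }
        } finally {
            system.disconnectAllServices(); // release the host connections when done
        }
    }
}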
See IBM Help example code:
http://publib.boulder.ibm.com/infocenter/iadthelp/v7r1/index.jsp?topic=/com.ibm.etools.iseries.toolbox.doc/ifscopyfileexample.htm
Regards

java: decompress files into string too slow

Here is how I compressed the string into a file:
public static void compressRawText(File outFile, String src) {
    FileOutputStream fo = null;
    GZIPOutputStream gz = null;
    try {
        fo = new FileOutputStream(outFile);
        gz = new GZIPOutputStream(fo);
        gz.write(src.getBytes());
        gz.flush();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            gz.close();
            fo.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Here is how I decompressed it:
static int BUFFER_SIZE = 8 * 1024;
static int STRING_SIZE = 2 * 1024 * 1024;

public static String decompressRawText(File inFile) {
    InputStream in = null;
    InputStreamReader isr = null;
    StringBuilder sb = new StringBuilder(STRING_SIZE); // constant resizing is costly, so set the STRING_SIZE
    try {
        in = new FileInputStream(inFile);
        in = new BufferedInputStream(in, BUFFER_SIZE);
        in = new GZIPInputStream(in, BUFFER_SIZE);
        isr = new InputStreamReader(in);
        char[] cbuf = new char[BUFFER_SIZE];
        int length = 0;
        while ((length = isr.read(cbuf)) != -1) {
            sb.append(cbuf, 0, length);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            in.close();
        } catch (Exception e1) {
            e1.printStackTrace();
        }
    }
    return sb.toString();
}
The decompression seems to take forever. I have a feeling that I am doing too many redundant steps in the decompression bit. Any idea how I could speed it up?
EDIT: I have modified the code to the above based on the following recommendations:
1. I changed the pattern to simplify my code a bit, but if I can't use IOUtils, is it still OK to use this pattern?
2. I set the StringBuilder buffer to 2M, as suggested by entonio. Should I set it a little higher? Memory is still OK; I still have around 10M available according to the heap monitor in Eclipse.
3. I cut the BufferedReader and added a BufferedInputStream, but I am still not sure about the BUFFER_SIZE. Any suggestions?
The above modification has improved the time taken to loop over all my 30 2M files from almost 30 seconds to around 14, but I need to get it under 10. Is that even possible on Android? Basically, I need to process 60M of text in total; I have divided it into 30 files of 2M each, and before I start processing each string, I did the above timing just to loop over all the files and get the String in each file into memory. Since I don't have much experience, would it be better to use 60 files of 1M instead? Or is there any other improvement I should adopt? Thanks.
ALSO: Since physical IO is quite time consuming, and since the compressed versions of my files are all quite small (around 2K from 2M of text), is it possible for me to still do the above, but on a file that is already mapped to memory, possibly using Java NIO? Thanks.
The BufferedReader's only purpose is the readLine() method, which you don't use, so why not just read from the InputStreamReader? Also, maybe decreasing the buffer size would be helpful. And you should probably specify the encoding when both reading and writing, though that shouldn't have an impact on performance.
edit: more data
If you know the size of the string ahead of time, you should add a length parameter to decompressRawText and use it to initialise the StringBuilder. Otherwise it will be constantly resized in order to accommodate the result, and that's costly.
edit: clarification
2MB implies a lot of resizes. There is no harm if you specify a capacity higher than the length you end up with after reading (other than temporarily using more memory, of course).
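A minimal sketch of that suggestion (the extra expectedLength parameter is hypothetical and would be supplied by the caller; UTF-8 is an assumed encoding, and BUFFER_SIZE is the constant already defined in the question's class):

public static String decompressRawText(File inFile, int expectedLength) throws IOException {
    // Pre-size the builder so it never has to grow while appending.
    StringBuilder sb = new StringBuilder(expectedLength);
    try (Reader isr = new InputStreamReader(
            new GZIPInputStream(
                new BufferedInputStream(new FileInputStream(inFile), BUFFER_SIZE), BUFFER_SIZE),
            "UTF-8")) { // specify the encoding explicitly
        char[] cbuf = new char[BUFFER_SIZE];
        int length;
        while ((length = isr.read(cbuf)) != -1) {
            sb.append(cbuf, 0, length);
        }
    }
    return sb.toString();
}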
You should wrap the FileInputStream with a BufferedInputStream before wrapping with a GZipInputStream, rather than using a BufferedReader.
The reason is that, depending on implementation, any of the various input classes in your decoration hierarchy could decide to read on a byte-by-byte basis (and I'd say the InputStreamReader is most likely to do this). And that would translate into many read(2) calls once it gets to the FileInputStream.
Of course, this may just be superstition on my part. But, if you're running on Linux, you can always test with strace.
Edit: one nice pattern to follow when building up a bunch of stream delegates is to use a single InputStream variable. Then you only have one thing to close in your finally block (and can use Jakarta Commons IOUtils to avoid lots of nested try-catch-finally blocks).
InputStream in = null;
try
{
    in = new FileInputStream("foo");
    in = new BufferedInputStream(in);
    in = new GZIPInputStream(in);
    // do something with the stream
}
finally
{
    IOUtils.closeQuietly(in);
}
Add a BufferedInputStream between the FileInputStream and the GZIPInputStream.
Similarly when writing.
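For the writing side, that would mean something like this (a sketch of the question's compress method with a BufferedOutputStream inserted; the 8 KB buffer size and UTF-8 encoding are just example choices):

public static void compressRawText(File outFile, String src) throws IOException {
    OutputStream out = null;
    try {
        out = new FileOutputStream(outFile);
        out = new BufferedOutputStream(out, 8 * 1024); // buffer the raw file writes
        out = new GZIPOutputStream(out);
        out.write(src.getBytes("UTF-8")); // specify the encoding explicitly
    } finally {
        if (out != null) {
            out.close(); // closing the GZIP stream finishes the compressed data and flushes everything
        }
    }
}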
