String remoteFile2 = "/test/song.mp3";
File downloadFile2 = new File("D:/Downloads/song.mp3");
OutputStream outputStream2 = new BufferedOutputStream(new FileOutputStream(downloadFile2));
InputStream inputStream = ftpClient.retrieveFileStream(remoteFile2);
byte[] bytesArray = new byte[4096];
int bytesRead = -1;
while ((bytesRead = inputStream.read(bytesArray)) != -1) {
outputStream2.write(bytesArray, 0, bytesRead);
}
This is a sample file writing code in java,
byte[] bytesArray = new byte[4096];
In this line what exactly 4096 means, what is the possibility of changing this value?
When deal with stream, you often read bytes in chunk.
If you read / write byte one by one then there are lots of overhead (like init the array to store the byte, put the byte to stream, remember the current position in file... etc) for each byte.
So if you read a group of bytes, you still have those overhead but lesser (For example if you have 4000 bytes, you have 4000x overhead. But if you read 100 bytes per time, you have 4000/100 = 40x overhead only)
The length of chunk is often choosen to balance between the time to read/write the chunk and the size of chunk.
Its often set to 2k or 4k. Might be related with disk sector (512 bytes, 2048 bytes...)
Here 4096 is the buffer size. So whenever you loop is going on it first read 4096 bytes and after that it will go inside the loop.
Related
I'm trying to write a function which downloads a file at a specific URL. The function produces a corrupt file unless I make the buffer an array of size 1 (as it is in the code below).
The ternary statement above the buffer initialization (which I plan to use) along with hard-coded integer values other than 1 will manufacture a corrupted file.
Note: MAX_BUFFER_SIZE is a constant, defined as 8192 (2^13) in my code.
public static void downloadFile(String webPath, String localDir, String fileName) {
try {
File localFile;
FileOutputStream writableLocalFile;
InputStream stream;
url = new URL(webPath);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
int size = connection.getContentLength(); //File size in bytes
int read = 0; //Bytes read
localFile = new File(localDir);
//Ensure that directory exists, otherwise create it.
if (!localFile.exists())
localFile.mkdirs();
//Ensure that file exists, otherwise create it.
//Note that if we define the file path as we do below initially and call mkdirs() it will create a folder with the file name (I.e. test.exe). There may be a better alternative, revisit later.
localFile = new File(localDir + fileName);
if (!localFile.exists())
localFile.createNewFile();
writableLocalFile = new FileOutputStream(localFile);
stream = connection.getInputStream();
byte[] buffer;
int remaining;
while (read != size) {
remaining = size - read; //Bytes still to be read
//remaining > MAX_BUFFER_SIZE ? MAX_BUFFER_SIZE : remaining
buffer = new byte[1]; //Adjust buffer size according to remaining data (to be read).
read += stream.read(buffer); //Read buffer-size amount of bytes from the stream.
writableLocalFile.write(buffer, 0, buffer.length); //Args: Bytes to read, offset, number of bytes
}
System.out.println("Read " + read + " bytes.");
writableLocalFile.close();
stream.close();
} catch (Throwable t) {
t.printStackTrace();
}
}
The reason I've written it this way is so I may provide a real time progress bar to the user as they are downloading. I've removed it from the code to reduce clutter.
len = stream.read(buffer);
read += len;
writableLocalFile.write(buffer, 0, len);
You must not use buffer.length as the bytes read, you need to use the return value of the read call. Because it might return a short read and then your buffer contains junk (0 bytes or data from previous reads) after the read bytes.
And besides calculating the remaining and using dynamic buffers just go for 16k or something like that. The last read will be short, which is fine.
InputStream.read() may read number of bytes fewer than you requested. But you always append whole buffer to the file. You need to capture actual number of read bytes and append only those bytes to the file.
Additionally:
Watch for InputStream.read() to return -1 (EOF)
Server may return incorrect size. As such, the check read != size is dangerous. I would advise not to rely on the Content-Length HTTP field altogether. Instead, just keep reading from the input stream until you hit EOF.
I want to split a file into multiple chunks (in this case, trying lengths of 300) and base64 encode it, since loading the entire file to memory gives a negative array exception when base64 encoding it. I tried using the following code:
int offset = 0;
bis = new BufferedInputStream(new FileInputStream(f));
while(offset + 300 <= f.length()){
byte[] temp = new byte[300];
bis.skip(offset);
bis.read(temp, 0, 300);
offset += 300;
System.out.println(Base64.encode(temp));
}
if(offset < f.length()){
byte[] temp = new byte[(int) f.length() - offset];
bis.skip(offset);
bis.read(temp, 0, temp.length);
System.out.println(Base64.encode(temp));
}
At first it appears to be working, however, at one point it switches to just printing out "AAAAAAAAA" and fills up the entire console with it, and the new file is corrupted when decoded. What could be causing this error?
skip() "Skips over and discards n bytes of data from the input stream", and read() returns "the number of bytes read".
So, you read some bytes, skip some bytes, read some more, skip, .... eventually reaching EOF at which point read() returns -1, but you ignore that and use the content of temp which contains all 0's, that are then encoded to all A's.
Your code should be:
try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
int len;
byte[] temp = new byte[300];
while ((len = in.read(temp)) > 0)
System.out.println(Base64.encode(temp, 0, len));
}
This code reuses the single buffer allocated before the loop, so it will also cause much less garbage collection than your code.
If Base64.encode doesn't have a 3 parameter version, do this:
try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
int len;
byte[] temp = new byte[300];
while ((len = in.read(temp)) > 0) {
byte[] data;
if (len == temp.length)
data = temp;
else {
data = new byte[len];
System.arraycopy(temp, 0, data, 0, len);
}
System.out.println(Base64.encode(data));
}
}
Be sure to use a buffer size that is a multiple of 3 for encoding and a multiple of 4 for decoding when using chunks of data.
300 fulfills both, so that is already OK. Just as an info for those trying different buffer sizes.
Keep in mind, that reading from a stream into a buffer can in some cicumstances result in a buffer not being fully filled, even though the end of the stream was not yet reached. Might be possible when reading from an internet stream and a timeout occures.
You can heal that, but taking that into account would lead to much more complex coding, that would not be educational anymore.
I'm trying to make a file hexadecimal converter (input file -> output hex string of the file)
The code I came up with is
static String open2(String path) throws FileNotFoundException, IOException,OutOfMemoryError {
System.out.println("BEGIN LOADING FILE");
StringBuilder sb = new StringBuilder();
//sb.ensureCapacity(2147483648);
int size = 262144;
FileInputStream f = new FileInputStream(path);
FileChannel ch = f.getChannel( );
byte[] barray = new byte[size];
ByteBuffer bb = ByteBuffer.wrap( barray );
while (ch.read(bb) != -1)
{
//System.out.println(sb.capacity());
sb.append(bytesToHex(barray));
bb.clear();
}
System.out.println("FILE LOADED; BRING IT BACK");
return sb.toString();
}
I am sure that "path" is a valid filename.
The problem is with big files (>=
500mb), the compiler outputs a OutOfMemoryError: Java Heap Space on the StringBuilder.append.
To create this code I followed some tips from http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly but I got a doubt when I tried to force a space allocation for the StringBuilder sb: "2147483648 is too big for an int".
If I want to use this code even with very big files (let's say up to 2gb if I really have to stop somewhere) what's the better way to output a hexadecimal string conversion of the file in terms of speed?
I'm now working on copying the converted string into a file. Anyway I'm having problems of "writing the empty buffer on the file" after the eof of the original one.
static String open3(String path) throws FileNotFoundException, IOException {
System.out.println("BEGIN LOADING FILE (Hope this is the last change)");
FileWriter fos = new FileWriter("HEXTMP");
int size = 262144;
FileInputStream f = new FileInputStream(path);
FileChannel ch = f.getChannel( );
byte[] barray = new byte[size];
ByteBuffer bb = ByteBuffer.wrap( barray );
while (ch.read(bb) != -1)
{
fos.write(bytesToHex(barray));
bb.clear();
}
System.out.println("FILE LOADED; BRING IT BACK");
return "HEXTMP";
}
obviously the file HEXTMP created has a size multiple of 256k, but if the file is 257k it will be a 512 file with LOT of "000000" at the end.
I know I just have to create a last byte array with cut length.
(I used a file writer because i wanted to write the string of hex; otherwise it would have just copied the file as-is)
Why are you loading complete file?
You can load few bytes in buffer from input file, process bytes in buffer, then write processed bytes buffer to output file. Continue this till all bytes from input file are not processed.
FileInputStream fis = new FileInputStream("in file");
FileOutputStream fos = new FileOutputStream("out");
byte buffer [] = new byte[8192];
while(true){
int count = fis.read(buffer);
if(count == -1)
break;
byte[] processed = processBytesToConvert(buffer, count);
fos.write(processed);
}
fis.close();
fos.close();
So just read few bytes in buffer, convert it to hex string, get bytes from converted hex string, then write back these bytes to file, and continue for next few input bytes.
The problem here is that you try to read the whole file and store it in memory.
You should use stream, read some lines of your input file, convert them and write them in the output file. That way your program can scale, whatever the size of the input file is.
The key would be to read file in chunks instead of reading all of it in one go. Depending on its use you could vary size of the chunk. For example, if you are trying to make a hex viewer / editor determine how much content is being shown in the viewport and read only as much of data from file. Or if you are simply converting and dumping hex to another file use any chunk size that is small enough to fit in memory but big enough for performance. This should be tunable over some runs. Perhaps use filesystem NIO in Java 7 so that you can do all three tasks - reading, processing and writing - concurrently. The link included in question gives good primer on reading files.
what's the probably fastest way of reading relatively huge files with Java's I/O-methods? My current solution uses the BufferedInputStream saving to an byte-array with 1024 bytes allocated to it. Each buffer is than saved in an ArrayList for later use. The whole process is called via a separate thread (callable-interface).
Not very fast though.
ArrayList<byte[]> outputArr = new ArrayList<byte[]>();
try {
BufferedInputStream reader = new BufferedInputStream(new FileInputStream (dir+filename));
byte[] buffer = new byte[LIMIT]; // == 1024
int i = 0;
while (reader.available() != 0) {
reader.read(buffer);
i++;
if (i <= LIMIT){
outputArr.add(buffer);
i = 0;
buffer = null;
buffer = new byte[LIMIT];
}
else continue;
}
System.out.println("FileReader-Elements: "+outputArr.size()+" w. "+buffer.length+" byte each.");
I would use a memory mapped file which is fast enough to do in the same thread.
final FileChannel channel = new FileInputStream(fileName).getChannel();
MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
// when finished
channel.close();
This assumes the file is smaller than 2 GB and will take 10 milli-seconds or less.
Don't use available(): it's not reliable. And don't ignore the result of the read() method: it tells you how many bytes were actually read. And if you want to read everything in memory, use a ByteArrayOutputStream rather than using a List<byte[]>:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int read;
while ((read = reader.read(buffer)) >= 0) {
baos.write(buffer, 0, read);
}
byte[] everything = baos.toByteArray();
I think 1024 is a bit small as a buffer size. I would use a larger buffer (something like 16 KB or 32KB)
Note that Apache commons IO and Guava have utility methods that do this for you, and have been optimized already.
Have a look at Java NIO (Non-Blocking Input/Output) API. Also, this question might prove being useful.
I don't have much experience with IO, but I've heard that NIO is much more efficient way of handling large sets of data.
I'm implementing a file transfer server, and I've run into an issue with sending a file larger than 2 GB over the network. The issue starts when I get the File I want to work with and try to read its contents into a byte[]. I have a for loop :
for(long i = 0; i < fileToSend.length(); i += PACKET_SIZE){
fileBytes = getBytesFromFile(fileToSend, i);
where getBytesFromFile() reads a PACKET_SIZE amount of bytes from fileToSend which is then sent to the client in the for loop. getBytesFromFile() uses i as an offset; however, the offset variable in FileInputStream.read() has to be an int. I'm sure there is a better way to read this file into the array, I just haven't found it yet.
I would prefer to not use NIO yet, although I will switch to using that in the future. Indulge my madness :-)
It doesn't look like you're reading data from the file properly. When reading data from a stream in Java, it's standard practice to read data into a buffer. The size of the buffer can be your packet size.
File fileToSend = //...
InputStream in = new FileInputStream(fileToSend);
OutputStream out = //...
byte buffer[] = new byte[PACKET_SIZE];
int read;
while ((read = in.read(buffer)) != -1){
out.write(buffer, 0, read);
}
in.close();
out.close();
Note that, the size of the buffer array remains constant. But-- if the buffer cannot be filled (like when it reaches the end of the file), the remaining elements of the array will contain data from the last packet, so you must ignore these elements (this is what the out.write() line in my code sample does)
Umm, realize that your handling of the variable i is not correct..
Iteration 0: i=0
Iteration 1: i=PACKET_SIZE
...
...
Iteration n: i=PACKET_SIZE*n