Base64 encode file by chunks - java

I want to split a file into multiple chunks (in this case, trying lengths of 300) and Base64 encode it, since loading the entire file into memory throws a NegativeArraySizeException when Base64 encoding it. I tried using the following code:
int offset = 0;
bis = new BufferedInputStream(new FileInputStream(f));
while (offset + 300 <= f.length()) {
    byte[] temp = new byte[300];
    bis.skip(offset);
    bis.read(temp, 0, 300);
    offset += 300;
    System.out.println(Base64.encode(temp));
}
if (offset < f.length()) {
    byte[] temp = new byte[(int) f.length() - offset];
    bis.skip(offset);
    bis.read(temp, 0, temp.length);
    System.out.println(Base64.encode(temp));
}
At first it appears to work; however, at some point it switches to printing just "AAAAAAAAA", fills the entire console with it, and the new file is corrupted when decoded. What could be causing this error?

skip() "Skips over and discards n bytes of data from the input stream", and read() returns "the number of bytes read".
So, you read some bytes, skip some bytes, read some more, skip, .... eventually reaching EOF at which point read() returns -1, but you ignore that and use the content of temp which contains all 0's, that are then encoded to all A's.
Your code should be:
try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
    int len;
    byte[] temp = new byte[300];
    while ((len = in.read(temp)) > 0)
        System.out.println(Base64.encode(temp, 0, len));
}
This code reuses the single buffer allocated before the loop, so it will also cause much less garbage collection than your code.
If Base64.encode doesn't have a 3 parameter version, do this:
try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
    int len;
    byte[] temp = new byte[300];
    while ((len = in.read(temp)) > 0) {
        byte[] data;
        if (len == temp.length)
            data = temp;
        else {
            data = new byte[len];
            System.arraycopy(temp, 0, data, 0, len);
        }
        System.out.println(Base64.encode(data));
    }
}

Be sure to use a buffer size that is a multiple of 3 for encoding and a multiple of 4 for decoding when using chunks of data.
300 fulfills both, so that is already fine; this is just a note for anyone trying different buffer sizes.
Keep in mind that reading from a stream into a buffer can, in some circumstances, leave the buffer not fully filled even though the end of the stream has not yet been reached. This can happen, for example, when reading from a network stream and a timeout occurs.
You can handle that case, but taking it into account would lead to much more complex code that would no longer be educational.
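If you are on Java 8 or later, a simpler option is to let java.util.Base64 handle the chunking for you by wrapping the destination stream. A minimal sketch, assuming the goal is to stream the encoded output to a file (the file names are illustrative):
import java.io.*;
import java.util.Base64;

public class StreamingBase64 {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream("input.bin"));
             // wrap() returns an OutputStream that encodes bytes as they are written,
             // buffering partial 3-byte groups internally, so any buffer size works
             OutputStream encoded = Base64.getEncoder().wrap(new FileOutputStream("output.b64"))) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = in.read(buffer)) > 0) {
                encoded.write(buffer, 0, len);
            }
        } // closing the wrapped stream flushes remaining bytes and writes padding
    }
}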

Related

Downloaded files are corrupted when buffer length is > 1

I'm trying to write a function that downloads a file at a specific URL. The function produces a corrupt file unless I make the buffer an array of size 1 (as it is in the code below).
The ternary expression above the buffer initialization (which I plan to use), as well as any hard-coded integer value other than 1, produces a corrupted file.
Note: MAX_BUFFER_SIZE is a constant, defined as 8192 (2^13) in my code.
public static void downloadFile(String webPath, String localDir, String fileName) {
    try {
        File localFile;
        FileOutputStream writableLocalFile;
        InputStream stream;
        URL url = new URL(webPath);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        int size = connection.getContentLength(); // File size in bytes
        int read = 0; // Bytes read
        localFile = new File(localDir);
        // Ensure that the directory exists, otherwise create it.
        if (!localFile.exists())
            localFile.mkdirs();
        // Ensure that the file exists, otherwise create it.
        // Note that if we define the file path as we do below initially and call mkdirs(),
        // it will create a folder with the file name (i.e. test.exe).
        // There may be a better alternative; revisit later.
        localFile = new File(localDir + fileName);
        if (!localFile.exists())
            localFile.createNewFile();
        writableLocalFile = new FileOutputStream(localFile);
        stream = connection.getInputStream();
        byte[] buffer;
        int remaining;
        while (read != size) {
            remaining = size - read; // Bytes still to be read
            // remaining > MAX_BUFFER_SIZE ? MAX_BUFFER_SIZE : remaining
            buffer = new byte[1]; // Adjust buffer size according to remaining data (to be read).
            read += stream.read(buffer); // Read buffer-size amount of bytes from the stream.
            writableLocalFile.write(buffer, 0, buffer.length); // Args: buffer, offset, number of bytes to write
        }
        System.out.println("Read " + read + " bytes.");
        writableLocalFile.close();
        stream.close();
    } catch (Throwable t) {
        t.printStackTrace();
    }
}
The reason I've written it this way is so I may provide a real time progress bar to the user as they are downloading. I've removed it from the code to reduce clutter.
len = stream.read(buffer);
read += len;
writableLocalFile.write(buffer, 0, len);
You must not use buffer.length as the number of bytes read; you need to use the return value of the read call, because it might be a short read, in which case your buffer contains junk (0 bytes or data from previous reads) after the bytes actually read.
And instead of calculating the remaining bytes and sizing buffers dynamically, just use a fixed 16 KB buffer or so. The last read will be short, which is fine.
InputStream.read() may read fewer bytes than you requested, but you always append the whole buffer to the file. You need to capture the actual number of bytes read and append only those bytes to the file.
Additionally:
Watch for InputStream.read() returning -1 (EOF).
The server may return an incorrect size, so the check read != size is dangerous. I would advise against relying on the Content-Length HTTP header altogether; instead, just keep reading from the input stream until you hit EOF.
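Putting both answers together, a minimal sketch of the corrected loop (reusing the question's variable names and its MAX_BUFFER_SIZE constant; this is an illustrative fragment, not a verified drop-in):
InputStream stream = connection.getInputStream();
FileOutputStream writableLocalFile = new FileOutputStream(localFile);
byte[] buffer = new byte[MAX_BUFFER_SIZE]; // fixed-size buffer, allocated once
long read = 0; // running total, still usable for a progress bar
int len;
while ((len = stream.read(buffer)) != -1) { // -1 signals EOF; don't trust Content-Length
    writableLocalFile.write(buffer, 0, len); // write only the bytes actually read
    read += len;
}
System.out.println("Read " + read + " bytes.");
writableLocalFile.close();
stream.close();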

File writing in Java using FileOutputStream

String remoteFile2 = "/test/song.mp3";
File downloadFile2 = new File("D:/Downloads/song.mp3");
OutputStream outputStream2 = new BufferedOutputStream(new FileOutputStream(downloadFile2));
InputStream inputStream = ftpClient.retrieveFileStream(remoteFile2);
byte[] bytesArray = new byte[4096];
int bytesRead = -1;
while ((bytesRead = inputStream.read(bytesArray)) != -1) {
    outputStream2.write(bytesArray, 0, bytesRead);
}
This is sample file-writing code in Java.
byte[] bytesArray = new byte[4096];
In this line, what exactly does 4096 mean, and what is the possibility of changing this value?
When dealing with streams, you often read bytes in chunks.
If you read or write bytes one by one, there is overhead (initializing the array to store the byte, putting the byte on the stream, remembering the current position in the file, etc.) for each byte.
If you read a group of bytes instead, you still pay that overhead, but far less often: for example, with 4000 bytes read one at a time you pay the overhead 4000 times, but reading 100 bytes at a time you pay it only 4000/100 = 40 times.
The chunk length is usually chosen to balance the time to read/write a chunk against the size of the chunk.
It is often set to 2 KB or 4 KB, and may be related to the disk sector size (512 bytes, 2048 bytes, ...).
Here 4096 is the buffer size: on each pass of the loop, up to 4096 bytes are read first, and then the loop body runs.
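To see the per-read overhead for yourself, here is a rough timing sketch; the file name is illustrative, and the exact numbers depend heavily on the OS and disk cache:
import java.io.*;

public class BufferSizeTest {
    // read the whole file with the given buffer size and return elapsed milliseconds
    static long time(int bufferSize, File f) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = new FileInputStream(f)) { // unbuffered on purpose
            byte[] buf = new byte[bufferSize];
            while (in.read(buf) != -1) { /* discard; we only measure read overhead */ }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        File f = new File("big.bin"); // illustrative test file
        System.out.println("1-byte buffer:    " + time(1, f) + " ms");
        System.out.println("4096-byte buffer: " + time(4096, f) + " ms");
    }
}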

How can I read a specific number of bytes from a FileInputStream object using buffers

I have a series of objects stored within a file concatenated as below:
sizeOfFile1 || file1 || sizeOfFile2 || file2 ...
The sizes of the files are serialized Long objects, and the files are just the raw bytes of the files.
I am trying to extract the files from the input file. Below is my code:
FileInputStream fileInputStream = new FileInputStream("C:\\Test.tst");
ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);
while (fileInputStream.available() > 0)
{
    long size = (long) objectInputStream.readObject();
    FileOutputStream fileOutputStream = new FileOutputStream("C:\\" + size + ".tst");
    BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
    int chunkSize = 256;
    final byte[] temp = new byte[chunkSize];
    int finalChunkSize = (int) (size % chunkSize);
    final byte[] finalTemp = new byte[finalChunkSize];
    while (fileInputStream.available() > 0 && size > 0)
    {
        if (fileInputStream.available() > finalChunkSize)
        {
            int i = fileInputStream.read(temp);
            bufferedOutputStream.write(temp, 0, i);
            size = size - i;
        }
        else
        {
            int i = fileInputStream.read(finalTemp);
            bufferedOutputStream.write(finalTemp, 0, i);
            size = 0;
        }
    }
    bufferedOutputStream.close();
}
My code fails after it reads the first sizeOfFile; it just reads the rest of the input file into a single output file, even though there are multiple files stored.
Can anyone see the issue here?
Regards.
Wrap it in a DataInputStream and use readFully(byte[]).
But I question the design. Serialization and random access do not mix. It sounds like you should be using a database.
NB you are misusing available(). See the method's Javadoc page. It is never correct to use it as a count of the total number of bytes in the stream. There are few if any correct uses of available(), and this isn't one of them.
You could try NIO instead...
FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);
This reads only SIZE bytes from the file.
This uses DataInput to read the longs. In this particular case I am not using readFully() for the payload, as a segment might be too long to keep in memory:
DataInputStream in = new DataInputStream(new FileInputStream(file));
byte[] buf = new byte[64 * 1024];
while (true) {
    OutputStream out = ...;
    long size;
    try { size = in.readLong(); } catch (EOFException e) { break; }
    while (size > 0) {
        int len = in.read(buf, 0, (int) Math.min(size, buf.length));
        if (len < 0) throw new EOFException("unexpected end of stream");
        out.write(buf, 0, len);
        size -= len;
    }
    out.close();
}
Save yourself a lot of trouble by doing one of these things:
Switch to using Avro; trust me, you would be crazy not to. It's easy to learn and will accommodate schema changes. Using ObjectXXXStream is one of the worst ideas ever: as soon as you change your schema, your old files are garbage.
Or use Thrift.
Or use Hibernate (but this is probably not a great option; Hibernate takes a lot of time to learn and a lot of configuration).
If you really refuse to switch to Avro, I recommend reading up on Apache's IOUtils class. It has a method to copy from one input stream to another, saving you a lot of headaches. Unfortunately, what you want to do is a little more complicated: you want the size prefixing each file. You might be able to use a combination of SequenceInputStream objects to do that.
There are also GZIPOutputStream and ZipOutputStream, both of which ship with the JDK in java.util.zip.
I'm not going to write an example because I honestly think you should just learn Avro or Thrift and use that.
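That said, here is a rough sketch of the IOUtils approach mentioned above, assuming Commons IO 2.2+ (for the offset/length overload of copyLarge) and assuming the size prefixes are written with DataOutputStream.writeLong rather than object serialization:
import java.io.*;
import org.apache.commons.io.IOUtils;

public class SplitSegments {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream("C:\\Test.tst")))) {
            while (true) {
                long size;
                try { size = in.readLong(); } catch (EOFException e) { break; } // no more segments
                try (OutputStream out = new BufferedOutputStream(
                        new FileOutputStream("C:\\" + size + ".tst"))) {
                    // copy exactly 'size' bytes from the current position (offset 0 = skip nothing)
                    IOUtils.copyLarge(in, out, 0, size);
                }
            }
        }
    }
}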

Fastest way of reading relatively huge byte-files in Java

What's probably the fastest way of reading relatively huge files with Java's I/O methods? My current solution uses a BufferedInputStream that reads into a byte array with 1024 bytes allocated to it. Each buffer is then saved in an ArrayList for later use. The whole process is called via a separate thread (Callable interface).
Not very fast, though.
ArrayList<byte[]> outputArr = new ArrayList<byte[]>();
try {
    BufferedInputStream reader = new BufferedInputStream(new FileInputStream(dir + filename));
    byte[] buffer = new byte[LIMIT]; // == 1024
    int i = 0;
    while (reader.available() != 0) {
        reader.read(buffer);
        i++;
        if (i <= LIMIT) {
            outputArr.add(buffer);
            i = 0;
            buffer = null;
            buffer = new byte[LIMIT];
        }
        else continue;
    }
    System.out.println("FileReader-Elements: " + outputArr.size() + " w. " + buffer.length + " byte each.");
I would use a memory mapped file which is fast enough to do in the same thread.
final FileChannel channel = new FileInputStream(fileName).getChannel();
MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
// when finished
channel.close();
This assumes the file is smaller than 2 GB and will take 10 milliseconds or less.
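A minimal sketch of consuming the mapped buffer afterwards, e.g. copying it into fixed-size chunks like the question does (the file name and chunk size are illustrative):
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

public class MappedRead {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = new FileInputStream("big.bin").getChannel()) {
            // map the whole file; map() is limited to 2 GB, matching the assumption above
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            List<byte[]> chunks = new ArrayList<>();
            while (buffer.hasRemaining()) {
                byte[] chunk = new byte[Math.min(1024, buffer.remaining())]; // last chunk may be shorter
                buffer.get(chunk); // bulk copy out of the mapping
                chunks.add(chunk);
            }
            System.out.println("Read " + chunks.size() + " chunks.");
        }
    }
}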
Don't use available(): it's not reliable. And don't ignore the result of the read() method: it tells you how many bytes were actually read. And if you want to read everything in memory, use a ByteArrayOutputStream rather than using a List<byte[]>:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[16384]; // 16 KB; see the note on buffer size below
int read;
while ((read = reader.read(buffer)) >= 0) {
    baos.write(buffer, 0, read);
}
byte[] everything = baos.toByteArray();
I think 1024 is a bit small as a buffer size; I would use something larger, like 16 KB or 32 KB.
Note that Apache commons IO and Guava have utility methods that do this for you, and have been optimized already.
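As an aside, since Java 7 the JDK itself can read a whole file into memory in one call, which is often the simplest option when the file is known to fit (dir and filename are the question's variables):
import java.nio.file.Files;
import java.nio.file.Paths;

byte[] everything = Files.readAllBytes(Paths.get(dir + filename));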
Have a look at the Java NIO (New I/O) API. Also, this question might prove useful.
I don't have much experience with I/O, but I've heard that NIO can be a much more efficient way of handling large amounts of data.

Java Read File Larger than 2 GB (Using Chunking)

I'm implementing a file transfer server, and I've run into an issue with sending a file larger than 2 GB over the network. The issue starts when I get the File I want to work with and try to read its contents into a byte[]. I have a for loop:
for (long i = 0; i < fileToSend.length(); i += PACKET_SIZE) {
    fileBytes = getBytesFromFile(fileToSend, i);
where getBytesFromFile() reads PACKET_SIZE bytes from fileToSend, which are then sent to the client inside the for loop. getBytesFromFile() uses i as an offset; however, the offset parameter of FileInputStream.read() has to be an int. I'm sure there is a better way to read this file into the array; I just haven't found it yet.
I would prefer to not use NIO yet, although I will switch to using that in the future. Indulge my madness :-)
It doesn't look like you're reading data from the file properly. When reading data from a stream in Java, it's standard practice to read data into a buffer. The size of the buffer can be your packet size.
File fileToSend = //...
InputStream in = new FileInputStream(fileToSend);
OutputStream out = //...
byte buffer[] = new byte[PACKET_SIZE];
int read;
while ((read = in.read(buffer)) != -1) {
    out.write(buffer, 0, read);
}
in.close();
out.close();
Note that the size of the buffer array remains constant. But if the buffer cannot be filled (as when it reaches the end of the file), the trailing elements of the array will still contain data from the previous read, so you must ignore those elements (this is what the length argument in the out.write() line in my code sample does).
Umm, realize that your handling of the variable i is not correct:
Iteration 0: i=0
Iteration 1: i=PACKET_SIZE
...
...
Iteration n: i=PACKET_SIZE*n
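Since the root problem is that the offset parameter of read() is an int, one way to read each packet at a long file offset without NIO is RandomAccessFile, whose seek() takes a long. A minimal sketch of what getBytesFromFile() could look like under that approach (names follow the question; PACKET_SIZE is assumed to be an int constant):
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public static byte[] getBytesFromFile(File fileToSend, long offset) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(fileToSend, "r")) {
        raf.seek(offset); // seek() accepts a long, unlike the array offset of read()
        int len = (int) Math.min(PACKET_SIZE, raf.length() - offset); // last packet may be short
        byte[] packet = new byte[len];
        raf.readFully(packet); // fills the whole array or throws EOFException
        return packet;
    }
}
In practice you would open the RandomAccessFile once before the loop rather than once per packet; this form just keeps the sketch self-contained.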
