Zlib-compressed file is too big - Java

I am completely new to Java; I have decided to learn it by doing a small project. I need to compress a string using zlib and write it to a file. However, the file turns out to be too big. Here is a code example:
String input = "yasar\0yasar"; // test input. Input will have null character in it.
byte[] compressed = new byte[100]; // hold compressed content
Deflater compresser = new Deflater();
compresser.setInput(input.getBytes());
compresser.finish();
compresser.deflate(compressed);
File test_file = new File(System.getProperty("user.dir"), "test_file");
try {
    if (!test_file.exists()) {
        test_file.createNewFile();
    }
    try (FileOutputStream fos = new FileOutputStream(test_file)) {
        fos.write(compressed);
    }
} catch (IOException e) {
    e.printStackTrace();
}
This writes a 1 kilobyte file, while the file should be at most 11 bytes (because the content is 11 bytes here). I think the problem is in the way I initialize the byte array compressed as 100 bytes, but I don't know how big the compressed data will be in advance. What am I doing wrong here? How can I fix it?

If you don't want to write the whole array, and instead write just the part of it that was filled by Deflater, use OutputStream#write(byte[] array, int offset, int length).
Roughly like
String input = "yasar\0yasar"; // test input. Input will have null character in it.
byte[] compressed = new byte[100]; // hold compressed content
Deflater compresser = new Deflater();
compresser.setInput(input.getBytes());
compresser.finish();
int length = compresser.deflate(compressed);
File test_file = new File(System.getProperty("user.dir"), "test_file");
try {
    if (!test_file.exists()) {
        test_file.createNewFile();
    }
    try (FileOutputStream fos = new FileOutputStream(test_file)) {
        fos.write(compressed, 0, length); // write bytes 0 through length-1
    }
} catch (IOException e) {
    e.printStackTrace();
}
You will probably still see 1 kB or so in Windows, because the size shown there is either rounded (you wrote 100 bytes before) or it is the size on the filesystem, which is at least one block (4 kB, IIRC). Right-click the file and check the size in its properties; that shows the actual size.
If you don't know the size in advance, don't use Deflater directly; use a DeflaterOutputStream, which compresses data of any length as it is written.
try (OutputStream out = new DeflaterOutputStream(new FileOutputStream(test_file))) {
    out.write("hello!".getBytes());
}
The above example uses the default deflater settings, but you can pass a configured Deflater to the DeflaterOutputStream constructor to change the behavior.
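For instance, a minimal sketch (the compression level here is just an example) that plugs a configured Deflater into the stream:
Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
try (OutputStream out = new DeflaterOutputStream(new FileOutputStream(test_file), deflater)) {
    out.write("hello!".getBytes());
} finally {
    deflater.end(); // release the native zlib resources held by the Deflater
}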

You write all 100 bytes of the compressed array to the file, but you should write only the bytes actually produced, as returned by the deflater:
int compressedsize = compresser.deflate(compressed);
fos.write(compressed, 0, compressedsize);
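Note that a single deflate() call fills at most the buffer you pass it. If the compressed output could ever be larger than that buffer, loop until the Deflater reports finished(); a minimal sketch reusing the question's compresser and test_file, in place of the single call above:
try (FileOutputStream fos = new FileOutputStream(test_file)) {
    byte[] buf = new byte[100];
    while (!compresser.finished()) {
        int n = compresser.deflate(buf); // number of compressed bytes produced
        fos.write(buf, 0, n);
    }
}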

Related

Base64 Encoded to Decoded File Conversion Problem

I am processing very large files (> 2 GB). Each input file is Base64 encoded, and I am outputting to new files after decoding. Depending on the buffer size (LARGE_BUF) and for a given input file, my input-to-output conversion either works fine, is missing one or more bytes, or throws an exception at the outputStream.write line (IllegalArgumentException: Last unit does not have enough bits). Here is the code snippet (I could not cut and paste, so it may not be perfect):
.
.
final int LARGE_BUF = 1024;
byte[] inBuf = new byte[LARGE_BUF];
try (InputStream inputStream = new FileInputStream(inFile);
     OutputStream outStream = new FileOutputStream(outFile)) {
    for (int len; (len = inputStream.read(inBuf)) > 0; ) {
        String out = new String(inBuf, 0, len);
        outStream.write(Base64.getMimeDecoder().decode(out.getBytes()));
    }
}
For instance, for my sample input file, if LARGE_BUF is 1024 the output file is 4 bytes too small; if 2*1024, I get the exception mentioned above; if 7*1024, it works correctly. Grateful for any ideas. Thank you.
First, you are converting bytes into a String, then immediately back into bytes. So, remove the use of String entirely.
Second, base64 encoding turns each sequence of three bytes into four bytes, so when decoding, you need four bytes to properly decode three bytes of original data. It is not safe to create a new decoder for each arbitrarily read sequence of bytes, which may or may not have a length which is an exact multiple of four.
Finally, Base64.Decoder has a wrap(InputStream) method which makes this considerably easier:
try (InputStream inputStream = Base64.getDecoder().wrap(
        new BufferedInputStream(
            Files.newInputStream(Paths.get(inFile))))) {
    Files.copy(inputStream, Paths.get(outFile));
}
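One caveat, assuming the input really is MIME-style Base64 with line breaks (which the question's use of getMimeDecoder() suggests): wrap with the MIME decoder instead, since the basic decoder rejects line separators:
try (InputStream inputStream = Base64.getMimeDecoder().wrap(
        new BufferedInputStream(
            Files.newInputStream(Paths.get(inFile))))) {
    Files.copy(inputStream, Paths.get(outFile));
}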

Downloaded files are corrupted when buffer length is > 1

I'm trying to write a function which downloads a file at a specific URL. The function produces a corrupt file unless I make the buffer an array of size 1 (as it is in the code below).
Using the ternary statement above the buffer initialization (which I plan to use), or any hard-coded buffer size other than 1, produces a corrupted file.
Note: MAX_BUFFER_SIZE is a constant, defined as 8192 (2^13) in my code.
public static void downloadFile(String webPath, String localDir, String fileName) {
    try {
        File localFile;
        FileOutputStream writableLocalFile;
        InputStream stream;
        URL url = new URL(webPath);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        int size = connection.getContentLength(); // File size in bytes
        int read = 0; // Bytes read
        localFile = new File(localDir);
        // Ensure that the directory exists, otherwise create it.
        if (!localFile.exists())
            localFile.mkdirs();
        // Ensure that the file exists, otherwise create it.
        // Note that if we define the file path as we do below initially and call mkdirs(),
        // it will create a folder with the file name (i.e. test.exe).
        // There may be a better alternative; revisit later.
        localFile = new File(localDir + fileName);
        if (!localFile.exists())
            localFile.createNewFile();
        writableLocalFile = new FileOutputStream(localFile);
        stream = connection.getInputStream();
        byte[] buffer;
        int remaining;
        while (read != size) {
            remaining = size - read; // Bytes still to be read
            //remaining > MAX_BUFFER_SIZE ? MAX_BUFFER_SIZE : remaining
            buffer = new byte[1]; // Adjust buffer size according to remaining data (to be read).
            read += stream.read(buffer); // Read buffer-size amount of bytes from the stream.
            writableLocalFile.write(buffer, 0, buffer.length); // Args: bytes to write, offset, number of bytes
        }
        System.out.println("Read " + read + " bytes.");
        writableLocalFile.close();
        stream.close();
    } catch (Throwable t) {
        t.printStackTrace();
    }
}
The reason I've written it this way is so I may provide a real time progress bar to the user as they are downloading. I've removed it from the code to reduce clutter.
len = stream.read(buffer);
read += len;
writableLocalFile.write(buffer, 0, len);
You must not use buffer.length as the number of bytes read; use the return value of the read call instead, because read may return fewer bytes than requested, leaving junk (zero bytes or data from previous reads) in the buffer after the bytes actually read.
And instead of calculating the remaining bytes and using dynamic buffers, just go for 16k or something like that. The last read will be short, which is fine.
InputStream.read() may read fewer bytes than you requested, but you always append the whole buffer to the file. You need to capture the actual number of bytes read and append only those bytes to the file.
Additionally:
- Watch for InputStream.read() to return -1 (EOF).
- The server may return an incorrect size, so the check read != size is dangerous. I would advise not to rely on the Content-Length HTTP field at all; instead, just keep reading from the input stream until you hit EOF, as in the sketch below.
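Putting both points together, a minimal corrected loop might look like this (a sketch reusing the question's stream and writableLocalFile; the 8192 buffer size is just an example):
byte[] buffer = new byte[8192];
int len;
int read = 0;
while ((len = stream.read(buffer)) != -1) { // -1 signals EOF
    writableLocalFile.write(buffer, 0, len); // write only the bytes actually read
    read += len;
    // update the progress bar here from read (and size, if you trust it)
}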

Memory problems loading a file, plus converting into hex

I'm trying to make a file hexadecimal converter (input file -> output hex string of the file)
The code I came up with is
static String open2(String path) throws FileNotFoundException, IOException, OutOfMemoryError {
    System.out.println("BEGIN LOADING FILE");
    StringBuilder sb = new StringBuilder();
    //sb.ensureCapacity(2147483648);
    int size = 262144;
    FileInputStream f = new FileInputStream(path);
    FileChannel ch = f.getChannel();
    byte[] barray = new byte[size];
    ByteBuffer bb = ByteBuffer.wrap(barray);
    while (ch.read(bb) != -1) {
        //System.out.println(sb.capacity());
        sb.append(bytesToHex(barray));
        bb.clear();
    }
    System.out.println("FILE LOADED; BRING IT BACK");
    return sb.toString();
}
I am sure that "path" is a valid filename.
The problem is with big files (>= 500 MB): the program throws an OutOfMemoryError: Java heap space at the StringBuilder.append call.
To write this code I followed some tips from http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly, but I hit a snag when I tried to pre-allocate space for the StringBuilder sb: "2147483648 is too big for an int".
If I want to use this code even with very big files (let's say up to 2 GB, if I really have to stop somewhere), what's the best way to output a hexadecimal string conversion of the file in terms of speed?
I'm now working on writing the converted string to a file. However, I'm having the problem of writing the partially empty buffer to the file after the EOF of the original one:
static String open3(String path) throws FileNotFoundException, IOException {
    System.out.println("BEGIN LOADING FILE (Hope this is the last change)");
    FileWriter fos = new FileWriter("HEXTMP");
    int size = 262144;
    FileInputStream f = new FileInputStream(path);
    FileChannel ch = f.getChannel();
    byte[] barray = new byte[size];
    ByteBuffer bb = ByteBuffer.wrap(barray);
    while (ch.read(bb) != -1) {
        fos.write(bytesToHex(barray));
        bb.clear();
    }
    System.out.println("FILE LOADED; BRING IT BACK");
    return "HEXTMP";
}
Obviously the HEXTMP file created has a size that is a multiple of 256k: if the input file is 257k, it is processed as if it were 512k, and the output has a LOT of "000000" at the end.
I know I just have to create a final byte array with the cut length.
(I used a FileWriter because I wanted to write the hex string; otherwise it would have just copied the file as-is.)
Why are you loading the complete file?
You can load a few bytes into a buffer from the input file, process the bytes in the buffer, then write the processed bytes to the output file. Continue until all bytes from the input file have been processed.
FileInputStream fis = new FileInputStream("in file");
FileOutputStream fos = new FileOutputStream("out");
byte[] buffer = new byte[8192];
while (true) {
    int count = fis.read(buffer);
    if (count == -1)
        break;
    byte[] processed = processBytesToConvert(buffer, count); // your conversion routine
    fos.write(processed);
}
fis.close();
fos.close();
So just read a few bytes into the buffer, convert them to a hex string, get the bytes from the converted hex string, then write those bytes back to the file, and continue with the next few input bytes.
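The snippets above all call a bytesToHex helper that is never shown. A minimal sketch of one (hypothetical; the extra length parameter is an addition that also fixes the trailing "000000" problem from the question, since only the bytes actually read are converted):
static String bytesToHex(byte[] bytes, int len) {
    StringBuilder sb = new StringBuilder(len * 2);
    for (int i = 0; i < len; i++) {
        sb.append(String.format("%02x", bytes[i])); // two hex digits per byte
    }
    return sb.toString();
}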
The problem here is that you try to read the whole file and store it in memory.
You should use streams: read a chunk of your input file, convert it, and write it to the output file. That way your program can scale, whatever the size of the input file.
The key is to read the file in chunks instead of reading all of it in one go. Depending on the use case, you could vary the size of the chunk. For example, if you are building a hex viewer/editor, determine how much content is shown in the viewport and read only that much data from the file. If you are simply converting and dumping hex to another file, use any chunk size that is small enough to fit in memory but big enough for performance; this should be tunable over a few runs. Perhaps use filesystem NIO in Java 7 so that you can do all three tasks - reading, processing and writing - concurrently. The link included in the question gives a good primer on reading files.

Out of memory when encoding file to base64

Using Base64 from Apache commons
public byte[] encode(File file) throws FileNotFoundException, IOException {
    byte[] encoded;
    try (FileInputStream fin = new FileInputStream(file)) {
        byte[] fileContent = new byte[(int) file.length()];
        fin.read(fileContent);
        encoded = Base64.encodeBase64(fileContent);
    }
    return encoded;
}
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
at org.apache.commons.codec.binary.BaseNCodec.encode(BaseNCodec.java:342)
at org.apache.commons.codec.binary.Base64.encodeBase64(Base64.java:657)
at org.apache.commons.codec.binary.Base64.encodeBase64(Base64.java:622)
at org.apache.commons.codec.binary.Base64.encodeBase64(Base64.java:604)
I'm making small app for mobile device.
You cannot just load the whole file into memory, like here:
byte fileContent[] = new byte[(int) file.length()];
fin.read(fileContent);
Instead, load the file chunk by chunk and encode it in parts. Base64 is a simple encoding; it is enough to load 3 bytes and encode them at a time (this will produce 4 bytes after encoding). For performance reasons, consider loading multiples of 3 bytes, e.g. 3000 bytes - that should be just fine. Also consider buffering the input file.
An example:
byte[] fileContent = new byte[3000];
try (FileInputStream fin = new FileInputStream(file)) {
    int n;
    while ((n = fin.read(fileContent)) >= 0) {
        // encode only the bytes actually read, so a short final read
        // does not pick up stale data from the previous iteration
        Base64.encodeBase64(Arrays.copyOf(fileContent, n));
    }
}
Note that you cannot simply append the results of Base64.encodeBase64() to one encoded byte array. Actually, it is not loading the file but encoding it to Base64 that causes the out-of-memory problem. This is understandable, because the Base64 version is bigger (and you already have the whole file occupying a lot of memory).
Consider changing your method to:
public void encode(File file, OutputStream base64OutputStream)
and sending Base64-encoded data directly to the base64OutputStream rather than returning it.
UPDATE: Thanks to @StephenC I developed a much easier version:
public void encode(File file, OutputStream base64OutputStream) throws IOException {
    InputStream is = new FileInputStream(file);
    OutputStream out = new Base64OutputStream(base64OutputStream);
    IOUtils.copy(is, out);
    is.close();
    out.close();
}
It uses Base64OutputStream that translates input to Base64 on-the-fly and IOUtils class from Apache Commons IO.
Note: you must close the FileInputStream and Base64OutputStream explicitly (closing the Base64OutputStream writes the trailing = padding, if required), but buffering is handled by IOUtils.copy().
Either the file is too big, or your heap is too small, or you've got a memory leak.
If this only happens with really big files, put something into your code to check the file size and reject files that are unreasonably big.
If this happens with small files, increase your heap size by using the -Xmx command line option when you launch the JVM. (If this is in a web container or some other framework, check the documentation on how to do it.)
If the problem recurs, especially with small files, the chances are that you've got a memory leak.
The other point that should be made is that your current approach entails holding two complete copies of the file in memory. You should be able to reduce the memory usage, though you'll typically need a stream-based Base64 encoder to do this. (It depends on which flavor of the base64 encoding you are using ...)
This page describes a stream-based Base64 encoder / decoder library, and includes links to some alternatives.
Well, do not do it for the whole file at once.
Base64 works on 3 bytes at a time, so you can read your file in batches of a multiple of 3 bytes, encode them, and repeat until you finish the file:
// the base64 encoding - acceptable estimation of encoded size
StringBuilder sb = new StringBuilder((int) (file.length() / 3 * 4));
FileInputStream fin = null;
try {
    fin = new FileInputStream("some.file");
    // Max size of buffer (a multiple of 3, so each chunk encodes cleanly)
    int bSize = 3 * 512;
    // Buffer
    byte[] buf = new byte[bSize];
    // Actual number of bytes read
    int len;
    while ((len = fin.read(buf)) != -1) {
        byte[] encoded = Base64.encodeBase64(Arrays.copyOf(buf, len));
        // Although you might want to write the encoded bytes to another
        // stream, otherwise you'll run into the same problem again.
        sb.append(new String(encoded));
    }
} finally {
    if (null != fin) {
        fin.close();
    }
}
String base64EncodedFile = sb.toString();
You are not reading the whole file, just the first few kB. The read method returns how many bytes were actually read. You should call read in a loop until it returns -1, to be sure that you have read everything.
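For completeness, a sketch of such a loop over the question's pre-sized array (names taken from the question's code):
int off = 0;
while (off < fileContent.length) {
    int n = fin.read(fileContent, off, fileContent.length - off);
    if (n == -1) {
        break; // EOF before the expected length; the file may have shrunk
    }
    off += n; // advance past the bytes actually read
}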
The file is too big for both it and its Base64 encoding to fit in memory. Either:
- process the file in smaller pieces, or
- increase the memory available to the JVM with the -Xmx switch, e.g.
java -Xmx1024M YourProgram
This code works well for uploading a larger image:
bitmap=Bitmap.createScaledBitmap(bitmap, 100, 100, true);
ByteArrayOutputStream stream = new ByteArrayOutputStream();
bitmap.compress(Bitmap.CompressFormat.PNG, 100, stream); // compress to whichever format you want
byte [] byte_arr = stream.toByteArray();
String image_str = Base64.encodeBytes(byte_arr);
Well, it looks like your file is too large to keep the multiple copies necessary for an in-memory Base64 encoding in the available heap memory at the same time. Given that this is for a mobile device, it's probably not possible to increase the heap, so you have two options:
- make the file smaller (much smaller), or
- do it in a stream-based way, reading from an InputStream one small part of the file at a time, encoding it, and writing it to an OutputStream, without ever keeping the entire file in memory.
In the manifest's application tag, add the following:
android:largeHeap="true"
It worked for me.
Java 8 added Base64 methods, so Apache Commons is no longer needed to encode large files.
public static void encodeFileToBase64(String inputFile, String outputFile) {
    try (OutputStream out = Base64.getEncoder().wrap(new FileOutputStream(outputFile))) {
        Files.copy(Paths.get(inputFile), out);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

Reading and writing binary file in Java (seeing half of the file being corrupted)

I have some working code in Python that I need to convert to Java.
I have read quite a few threads on this forum but could not find an answer. I am reading a JPG image and converting it into a byte array. I then write this buffer to a different file. When I compare the files written by the Java and Python code, the bytes at the end do not match. Please let me know if you have a suggestion. I need to use the byte array to pack the image into a message that will be sent to a remote server.
Java code (Running on Android)
Reading the file:
File queryImg = new File(ImagePath);
int imageLen = (int)queryImg.length();
byte [] imgData = new byte[imageLen];
FileInputStream fis = new FileInputStream(queryImg);
fis.read(imgData);
Writing the file:
FileOutputStream f = new FileOutputStream(new File("/sdcard/output.raw"));
f.write(imgData);
f.flush();
f.close();
Thanks!
InputStream.read is not guaranteed to read any particular number of bytes and may read less than you asked it to. It returns the actual number read so you can have a loop that keeps track of progress:
public void pump(InputStream in, OutputStream out, int size) throws IOException {
    byte[] buffer = new byte[4096]; // Or whatever constant you feel like using
    int done = 0;
    while (done < size) {
        int read = in.read(buffer);
        if (read == -1) {
            throw new IOException("Something went horribly wrong");
        }
        out.write(buffer, 0, read);
        done += read;
    }
    // Maybe put cleanup code in here if you like, e.g. in.close, out.flush, out.close
}
I believe Apache Commons IO has classes for doing this kind of stuff so you don't need to write it yourself.
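For example, with commons-io on the classpath (an assumption; the file names are taken from the question), the whole copy collapses to one call:
try (InputStream in = new FileInputStream(queryImg);
     OutputStream out = new FileOutputStream("/sdcard/output.raw")) {
    IOUtils.copy(in, out); // org.apache.commons.io.IOUtils
}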
Your file length might be more than an int can hold; then you end up with a wrong array length, hence not reading the entire file into the buffer.
