compressing files in JAVA


My project requires me to receive a file via a REST service (using Jersey) and store it in the database.
The file size will be around 2-4 MB.
The received file can be in either zip or PDF format.
Before storing it in the database, I would like to compress it.
I googled and found that there are many available classes like GZip, Zip, and Deflater. I decided to use Deflater because it looked very simple, and I have written the following code for the compression.
Deflater deflater = new Deflater();
deflater.setInput(data);
deflater.finish(); // no more input will be supplied
ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
byte[] buffer = new byte[1024];
while (!deflater.finished()) {
int count = deflater.deflate(buffer); // number of compressed bytes placed in buffer
outputStream.write(buffer, 0, count);
}
deflater.end(); // release the native resources held by the Deflater
outputStream.close();
byte[] output = outputStream.toByteArray();
Could anyone please tell me whether the above code is fine for my use case, or whether I should use some other classes to do the same thing?
Thanks,
Kitty

ByteArrayOutputStream keeps the compressed output in memory. For big files, write the output to a FileOutputStream instead to avoid running out of memory.
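For the 2-4 MB case in the question, an in-memory round trip with java.util.zip.Deflater and Inflater might look like the sketch below. The class and method names (DeflateRoundTrip, compress, decompress) are made up for illustration. Keep in mind that zip files are already compressed, and PDF content usually is too, so the size reduction may be small.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateRoundTrip {

    // Compresses a byte array in memory (hypothetical helper name).
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish(); // signal that all input has been supplied
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int count = deflater.deflate(buffer);
                out.write(buffer, 0, count);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // free the native zlib memory
        }
    }

    // Restores the original bytes; wraps the checked exception for brevity.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Input ended early or needs a preset dictionary: stop instead of looping forever.
                    throw new IllegalArgumentException("Truncated or unsupported deflate data");
                }
                out.write(buffer, 0, count);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }
}
```

The decompress side is what you would call after reading the blob back from the database.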

Related

Upload to S3 using Gzip in Java

I'm new to Java and I'm trying to upload a large file (~10 GB) to Amazon S3. Could anyone please help me with how to use a GZIP output stream for it?
I've been through some documentation but got confused about byte streams and GZIP streams. Must they be used together? Can anyone help me with this piece of code?
Thanks in advance.

Have a look at this:
Is it possible to gzip and upload this string to Amazon S3 without ever being written to disk?
ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
GZIPOutputStream gzipOut = new GZIPOutputStream(byteOut);
// write your stuff to gzipOut
gzipOut.close(); // finish the gzip stream so byteOut holds the complete output
byte[] bites = byteOut.toByteArray();
// write the bites to the Amazon stream
Since it's a large file, you might want to have a look at multipart upload.

This question could have been more specific and there are several ways to achieve this. One approach might look like the below.
The example depends on the commons-io and commons-compress libraries, and uses classes from the java.nio.file package.
public static void compressAndUpload(AmazonS3 s3, InputStream in)
throws IOException
{
// Create temp file
Path tmpPath = Files.createTempFile("prefix", "suffix");
// Create and write to gzip compressor stream
OutputStream out = Files.newOutputStream(tmpPath);
GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(out);
IOUtils.copy(in, gzOut);
// Close the compressor so the gzip trailer is written before the size is read
gzOut.close();
// Read content from temp file
InputStream fileIn = Files.newInputStream(tmpPath);
long size = Files.size(tmpPath);
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentType("application/x-gzip");
metadata.setContentLength(size);
// Upload file to S3
s3.putObject(new PutObjectRequest("bucket", "key", fileIn, metadata));
}
Buffering, error handling and closing of streams are omitted for brevity.
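As a rough illustration of the compression step with the streams closed properly, here is a sketch that uses only JDK classes (java.util.zip.GZIPOutputStream instead of commons-compress). The names GzipToTempFile, gzipToTempFile and gunzip are hypothetical, the S3 upload itself is left out, and InputStream.transferTo requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipToTempFile {

    // Streams the input through gzip into a temp file without holding it all in memory.
    static Path gzipToTempFile(InputStream in) {
        try {
            Path tmp = Files.createTempFile("upload-", ".gz");
            // try-with-resources closes gzOut first, which writes the gzip trailer,
            // so the file is complete by the time its size is queried.
            try (OutputStream fileOut = Files.newOutputStream(tmp);
                 GZIPOutputStream gzOut = new GZIPOutputStream(fileOut)) {
                in.transferTo(gzOut);
            }
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a gzip file back into memory; only meant for checking small files.
    static byte[] gunzip(Path file) {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned Path can then be passed to Files.size and Files.newInputStream as in the answer above.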

JAVA: Stream any file to browser correctly

So I have created my own personal HTTP Server in Java from scratch.
So far it is working fine but with one major flaw.
When I try to pass big files to the browser I get a Java Heap Space error. I know how to fix this error through the JVM but I am looking for the long term solution for this.
//declare an integer for the byte length of the file
int length = (int) f.length();
//start the fileinput stream.
FileInputStream fis = new FileInputStream(f);
//byte array with the length of the file
byte[] bytes = new byte[length];
//write the file until the bytes is empty.
while ((length = fis.read(bytes)) != -1 ){
write(bytes, 0, length);
}
flush();
//close the file input stream
fis.close();
This way sends the file to the browser successfully and streams it perfectly, but because I am creating a byte array with the length of the file, I get the Heap Space error when the file is very big.
I have eliminated this issue by using a buffer as shown below, and I don't get Heap Space errors anymore. BUT the way shown below does not stream the files in the browser correctly. It's as if the file bytes are being shuffled and sent to the browser all together.
final int bufferSize = 4096;
byte buffer[] = new byte[bufferSize];
FileInputStream fis = new FileInputStream(f);
BufferedInputStream bis = new BufferedInputStream(fis);
while ( true )
{
int length = bis.read( buffer, 0, bufferSize );
if ( length < 0 ) break;
write( buffer, 0, length );
}
flush();
bis.close();
fis.close();
Note 1:
All the correct response headers are being sent perfectly to the browser.
Note 2:
Both ways work perfectly on a computer browser, but only the first way works on a smartphone's browser (and sometimes it gives me a Heap Space error).
If someone knows how to correctly send files to a browser and stream them correctly I would be a very very happy man.
Thank you in advance! :)

When reading from a BufferedInputStream you can let its buffer handle the buffering; there is no reason to read everything into a byte[] (and certainly not a byte[] the size of the entire File). Read one byte at a time, and rely on the internal buffer of the stream. Something like,
FileInputStream fis = new FileInputStream(f);
BufferedInputStream bis = new BufferedInputStream(fis);
int abyte;
while ((abyte = bis.read()) != -1 ){
write(abyte);
}
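For comparison, a fixed-size buffer keeps memory use constant regardless of file size and writes exactly the bytes that were read, in order. The sketch below is a generic copy loop, not the asker's server code: ChunkedCopy and copy are hypothetical names, and the OutputStream stands in for whatever stream the server writes the response body to.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class ChunkedCopy {

    // Copies everything from in to out in 8 KB chunks and returns the byte count.
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int n;
            // read() may return fewer bytes than the buffer holds,
            // so always write exactly n bytes, never the whole buffer.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

The returned count can be checked against the Content-Length header that was sent for the file.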

As far as I can see, you are already trying to send the file in chunks.
As far as I remember, even the Apache HttpClient + FileUpload solution has a file size limit of about 2.1 GB (correct me if I am wrong), so this is not an easy problem.
I haven't tried this solution yet, but as a test you could use java.io.RandomAccessFile in combination with FileInputStream/FileOutputStream on the client and the server, so that you read and write the file not all at once but as a sequence of blocks of, say, at most 30 MB each, which avoids the out-of-memory errors. An example of using RandomAccessFile can be found here: https://examples.javacodegeeks.com/core-java/io/randomaccessfile/java-randomaccessfile-example/
Still, you have given few details. Is your client supposed to be an ordinary Java application or not?
If you have any additional information, please let me know.
Good luck :)

Read hidden zip file

I have a jpeg, and on the end of it I wrote a zip file.
Inside this zip file is a single txt file called hidden.txt. I can change the extension to zip and read the file just fine on my laptop (debian) but when I try to read it using either a ZipInputStream or using ZipFile I get an error telling me it's not a zip file.
I tried separating the jpg part out first by reading the whole thing to a Bitmap then writing that to a byte[], however the byte[] encompassed more than just the image.
My method to combine the bitmap and the zipFile (a byte[])
private byte[] combineFiles(Bitmap drawn, byte[] zip) throws IOException {
InputStream in;
ByteArrayOutputStream out = new ByteArrayOutputStream();
/*write the first file*/
byte[] img;
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
drawn.compress(Bitmap.CompressFormat.JPEG, 100, byteArrayOutputStream);
img = byteArrayOutputStream.toByteArray();
in = new ByteArrayInputStream(img);
IOUtils.copy(in, out);
in.close();
/*add the second (hidden) file*/
in = new ByteArrayInputStream(zip);
IOUtils.copy(in, out);
out.flush();
in.close();
return out.toByteArray();
}
So really I have two questions:
1. How do I separate the jpg and zip portions of the file?
2. How do I unzip hidden.txt (preferably into a byte[])? I'm fairly certain I know this one, but what I am doing currently does not work, probably because I am doing #1 wrong.

Ok, well here's how I would do this. Although it's very hacky.
The problem is that it's hard to tell the index of the boundary between the image data and the zip data. Assuming that you can write arbitrary data after the image data and still have a working image file, here is something you could try:
write out the image data.
write out a magical string like "BEGIN_ZIP"
write out the zip data.
Now, when you are trying to read things back in:
byte[] data = readAllTheBytes();
int index = searchFor("BEGIN_ZIP", data) + "BEGIN_ZIP".length();
// now you know that the zip data begins at index and goes to the end of the byte array
// so just use a regular zipinputstream to read in the zip data.

In a JPEG file, the byte sequence 0xFF 0xD8 marks the start of the image and 0xFF 0xD9 marks the end of the image (see the description of the JPEG structure on Wikipedia). So simply search for the latter sequence in the file and you will be able to separate the image and zip parts. Then use ZipInputStream to read (decompress) the data from the zip part.
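A minimal sketch of that approach follows. It assumes the first 0xFF 0xD9 pair in the file is the real end of the image, which can be wrong for a JPEG with an embedded thumbnail; searching for the zip local file header signature (0x50 0x4B 0x03 0x04) is an alternative. HiddenZipReader and its methods are hypothetical names, sampleCombinedFile only builds stand-in data (not a real JPEG), and readAllBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class HiddenZipReader {

    // Returns the index just past the first JPEG end-of-image marker, or -1 if absent.
    static int indexAfterJpegEnd(byte[] data) {
        for (int i = 0; i + 1 < data.length; i++) {
            if ((data[i] & 0xFF) == 0xFF && (data[i + 1] & 0xFF) == 0xD9) {
                return i + 2;
            }
        }
        return -1;
    }

    // Reads one named entry from the zip data that starts at the given offset.
    static byte[] readEntry(byte[] data, int offset, String name) {
        try (ZipInputStream zin = new ZipInputStream(
                new ByteArrayInputStream(data, offset, data.length - offset))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    return zin.readAllBytes(); // reads to the end of the current entry
                }
            }
            return null; // entry not found
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds stand-in data: fake JPEG markers followed by a zip containing hidden.txt.
    static byte[] sampleCombinedFile(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] fakeJpeg = {(byte) 0xFF, (byte) 0xD8, 1, 2, 3, (byte) 0xFF, (byte) 0xD9};
        out.write(fakeJpeg, 0, fakeJpeg.length);
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            zout.putNextEntry(new ZipEntry("hidden.txt"));
            zout.write(text.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```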

java Files.readAllBytes(image.png) doesn't work

I was trying to read from one file and then write to another file. I use the code below to do so.
byte[] bytes = Files.readAllBytes(file1);
Writer Writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file2), "UTF-8"));
for(int i=0;i<bytes.length;i++)
Writer.write(bytes[i]);
Writer.close();
But when I change file1 to picture.png and file2 to picture2.png, this method doesn't work and I can't open picture2.png using image viewer.
What have I done wrong?

Writers are for writing text, possibly in different encodings (i.e. UTF-8, UTF-16, etc.). For writing raw bytes, don't use Writers. Just use (File)OutputStreams.
It is truly as simple as
byte[] bytes = ...;
FileOutputStream fos = ...;
fos.write(bytes);

The other answers explain why what you have potentially fails.
I'm curious why you're already using one Java NIO method, but not others? The library already has methods to do this for you.
byte[] bytes = Files.readAllBytes(file1);
Files.write(file2, bytes, StandardOpenOption.CREATE_NEW); // or relevant OpenOptions
or
FileOutputStream out = new FileOutputStream(file2); // or buffered
Files.copy(file1, out);
out.close();
or
Files.copy(file1, file2, options);

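The problem is that Writer.write() doesn't take a byte. It takes a character, which the OutputStreamWriter then encodes as UTF-8. Every byte value above 127 is a negative Java byte, so it is sign-extended into a character outside the ASCII range and written as several bytes instead of one.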
But once you've got the whole thing read in as a byte[], you can just use Files.write() to send the whole array to a file in much the same way that you read it in:
Files.write(filename, bytes);
This is the more modern NIO idiom, rather than using an OutputStream.
It's worth reading the tutorial.
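The difference is easy to see in memory. The sketch below pushes the 8-byte PNG signature through an OutputStreamWriter and through a plain stream; WriterVsStream, viaWriter and viaStream are made-up names for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriterVsStream {

    // The PNG file signature; its first byte (0x89) is above 127.
    static final byte[] PNG_SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    // Mimics the question: each byte is passed to Writer.write(int) as a character.
    static byte[] viaWriter(byte[] bytes) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            for (byte b : bytes) {
                writer.write(b); // the byte is widened to int and treated as a char
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Writes the bytes unchanged, as an OutputStream or Files.write would.
    static byte[] viaStream(byte[] bytes) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        sink.write(bytes, 0, bytes.length);
        return sink.toByteArray();
    }
}
```

Comparing viaWriter(PNG_SIGNATURE) with viaStream(PNG_SIGNATURE) shows that only the stream version returns the original 8 bytes.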

Fastest way to read/write Images from a File into a BufferedImage?

What is the fastest way to read Images from a File into a BufferedImage in Java/Grails?
What is the fastest way to write Images from a BufferedImage into a File in Java/Grails?
my variant (read):
byte [] imageByteArray = new File(basePath+imageSource).readBytes()
InputStream inStream = new ByteArrayInputStream(imageByteArray)
BufferedImage bufferedImage = ImageIO.read(inStream)
my variant (write):
BufferedImage bufferedImage = // some image
def fullPath = // image path + file name
byte [] currentImage
try{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write( bufferedImage, "jpg", baos );
baos.flush();
currentImage = baos.toByteArray();
baos.close();
}catch(IOException e){
System.out.println(e.getMessage());
}
def newFile = new FileOutputStream(fullPath)
newFile.write(currentImage)
newFile.close()

Your read solution basically reads the bytes twice: once from the file and once from the ByteArrayInputStream. Don't do that.
With Java 7 to read
BufferedImage bufferedImage = ImageIO.read(Files.newInputStream(Paths.get(basePath + imageSource)));
With Java 7 to write
ImageIO.write(bufferedImage, "jpg", Files.newOutputStream(Paths.get(fullPath)));
The call to Files.newInputStream will return a ChannelInputStream which (AFAIK) is not buffered. You'll want to wrap it:
new BufferedInputStream(Files.newInputStream(...));
That way there are fewer I/O calls to disk, depending on how you use it.

I'm late to the party, but anyway...
Actually, using:
ImageIO.read(new File(basePath + imageSource));
and
ImageIO.write(bufferedImage, "jpeg", new File(fullPath));
...might prove faster (try it, using a profiler, to make sure).
This is because these variants use RandomAccessFile-backed ImageInputStream/ImageOutputStream implementations behind the scenes, while the InputStream/OutputStream-based versions will by default use a disk-backed seekable stream implementation. The disk-backing involves writing the entire contents of the stream to a temporary file and possibly reading back from it (this is because image I/O often benefits from non-linear data access).
If you want to avoid extra I/O with the stream based versions, at the cost of using more memory, it is possible to call the ambiguously named ImageIO.setUseCache(false), to disable disk caching of the seekable input streams. This is obviously not a good idea if you are dealing with very large images.

Your writing code is almost fine. Just don't use the intermediate ByteArrayOutputStream, which is a giant bottleneck in your code. Instead, wrap the FileOutputStream in a BufferedOutputStream and write to that.
The same goes for your reading: remove the intermediate ByteArrayInputStream.
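Putting the File-based suggestions together, a small wrapper in plain Java could look like the sketch below. ImageFileIo and its method names are hypothetical, and whether this is actually the fastest variant for your images is something to measure rather than assume.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

public class ImageFileIo {

    // Reads an image straight from a File, with no intermediate byte[] copy.
    static BufferedImage read(File file) {
        try {
            BufferedImage image = ImageIO.read(file);
            if (image == null) {
                // ImageIO.read returns null when no registered reader understands the file
                throw new IllegalArgumentException("Unsupported image file: " + file);
            }
            return image;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes an image straight to a File in the given format, e.g. "png" or "jpg".
    static void write(BufferedImage image, String format, File file) {
        try {
            // ImageIO.write returns false when no writer supports the format for this image
            if (!ImageIO.write(image, format, file)) {
                throw new IllegalArgumentException("No writer for format: " + format);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```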
