BufferedImage reduces Image size - java

I was using Java's ImageIO and BufferedImage for some image operations and wanted to see how it behaves when I output the image as a jpg. There I found some interesting behaviour that I can't quite explain. I have the following code, which reads an image and then outputs the same image as "copy.jpg" in the same folder. The code is in Kotlin, but the functions used are Java functions:
val image = File("some/image/path.jpg")
val bufImage = ImageIO.read(image.inputStream())
FileOutputStream(File(image.parentFile, "copy.jpg")).use { os ->
    ImageIO.write(bufImage, "jpg", os)
}
I would expect it to output exactly the same file, except maybe for the meta information. However, the resulting file was almost a tenth of the size of the original. I doubt the meta information accounts for that much. The exact size difference varied depending on which image file I used, but every time the output image was smaller. Yet I could not see a quality difference from the old file. When zooming in, I saw the same pixels.
Why is the file size reduced so dramatically?

JPEG is lossy compression: it throws away lots of information in order to keep the file small.  (An uncompressed image file could be orders of magnitude larger.)
It's intended to throw away information that you're not likely to see or care about, of course; but it still loses some image data.
And the loss is generational: if you have an image that came from a JPEG file, and then recompress it to a JPEG file, it will usually lose more data, giving a worse-quality result than the first JPEG file — even if the compression settings are exactly the same.  (Trying to approximate an already-compressed image won't work the same as trying to approximate the original source image. And there's no way to recover information which is already lost!)
That's almost certainly what's happening here.  Your code reads a JPEG file and expands it into a BufferedImage (which holds the uncompressed image data), and then compresses it again into a new JPEG file, which loses further quality.  It's probably using a much higher compression level than the first file used, hence the smaller size.
I'd be surprised if you couldn't see any difference between the two JPEG files in an image viewer or editor, when magnified.  (JPEG artefacts are most obvious around sharp edges and boundaries, but if you know what to look for you can sometimes see them elsewhere.  Subtle changes can be easier to see if you can line up both images on the exact same area of screen and flip directly between them.)
You can control how much information is lost when creating a JPEG — but the ImageIO.write() method you're using doesn't provide a way to do that.  See this question for how to do it.  (It's in Java, but you should be able to follow it.)
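In outline, it means going through an ImageWriter with an explicit ImageWriteParam instead of the one-shot ImageIO.write(). A minimal sketch (the 0.9f quality value and the output file name are placeholders, not values from the question):

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import java.io.File;

// bufImage is the BufferedImage from the question's code
ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
ImageWriteParam param = writer.getDefaultWriteParam();
param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
param.setCompressionQuality(0.9f); // 0.0f = smallest file, 1.0f = best quality
try (ImageOutputStream out = ImageIO.createImageOutputStream(new File("copy.jpg"))) {
    writer.setOutput(out);
    writer.write(null, new IIOImage(bufImage, null, null), param);
} finally {
    writer.dispose();
}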
Obviously, the more information you're prepared to lose, the smaller file you can end up with.  But note that if you choose a high-quality setting, the result could be a lot larger than the first JPEG, even though it will probably still lose slightly more quality.
(That's why, if you're doing any sort of processing on an image, it's best to keep it in lossless formats until the very end, and compress to a lossy format like JPEG only once, to avoid losing quality each time you save and reload.)
As you indicate, another reason could be the loss of non-image data — you're unlikely to notice the loss of metadata such as camera settings, but the file could have had a sizeable thumbnail image too.

Related

Is there a way to incrementally write to an image file to avoid running out of RAM when rendering a large image?

Currently I am running out of RAM when rendering images, and while I have optimized it as much as I can for memory efficiency, I have realized that for images as large as I want, it will never be enough. So instead I want to write to a file as I am rendering. I have no idea how I could do this, and I'm almost certain that most image formats won't be suitable, because to recalculate the compression they would need to hold the image being appended to in RAM.
This is my current code. I'm happy to completely switch libraries to get this to work.
panel.logic.Calculate(false); // Renders the image - I can make this render in steps
Graphics2D g2d = bufferedImage.createGraphics();
panel.paint(g2d);
g2d.dispose();
SimpleDateFormat formatter = new SimpleDateFormat("dd-MM-yyyy_HH-mm-ss");
Date date = new Date();
File file = new File(formatter.format(date) + ".png");
try {
    ImageIO.write(bufferedImage, "png", file);
} catch (IOException ioException) {
    ioException.printStackTrace();
}
Yes, this is possible in several image formats, but it won't be easy with the standard Java Graphics2D API. In particular, the class java.awt.image.BufferedImage explicitly represents an image whose entire bitmap is held in memory.
I would start by asking: how large are the images you are thinking of here? Unless your generating program is unusually memory constrained, any image that is too big to hold in memory during generation will also be too big to hold in memory during display, so it would be useless, I think?
In order to write an image file in a "streaming" style, you will need a format that allows you to write pixels or regions. This will be hard in image formats that are more sophisticated like JPEG, but easier in image formats that are more pixel oriented like BMP or PNG.
For BMP, you would need to write the file headers, then stream out the pixel array into the file, which you could do pixel-by-pixel without holding the whole thing in memory (the format has no footer, so you are done after the last row). The format is described here: https://en.wikipedia.org/wiki/BMP_file_format
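As a rough illustration of what that involves, here is an untested sketch of a minimal 24-bit BMP streamer. streamBmp and rowSource are made-up names; rowSource is assumed to produce the 0xRRGGBB pixel at (x, y) on demand, so only one pixel is ever held in memory at a time:

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.function.IntBinaryOperator;

// Note: the classic BMP headers use 32-bit sizes, so this caps out at ~4 GB.
static void streamBmp(File out, int width, int height, IntBinaryOperator rowSource)
        throws IOException {
    int rowBytes = width * 3;
    int padding = (4 - rowBytes % 4) % 4;          // rows are padded to 4-byte boundaries
    int imageSize = (rowBytes + padding) * height;
    try (DataOutputStream o = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(out)))) {
        // BITMAPFILEHEADER (14 bytes, little-endian)
        o.writeByte('B'); o.writeByte('M');
        writeIntLE(o, 14 + 40 + imageSize);        // total file size
        writeIntLE(o, 0);                          // reserved
        writeIntLE(o, 14 + 40);                    // offset to pixel data
        // BITMAPINFOHEADER (40 bytes)
        writeIntLE(o, 40);                         // header size
        writeIntLE(o, width);
        writeIntLE(o, height);                     // positive height => bottom-up rows
        o.writeByte(1); o.writeByte(0);            // planes = 1
        o.writeByte(24); o.writeByte(0);           // 24 bits per pixel
        writeIntLE(o, 0);                          // BI_RGB, no compression
        writeIntLE(o, imageSize);
        writeIntLE(o, 2835); writeIntLE(o, 2835);  // ~72 DPI
        writeIntLE(o, 0); writeIntLE(o, 0);        // palette fields, unused
        // Pixel rows, bottom-up, BGR byte order
        for (int y = height - 1; y >= 0; y--) {
            for (int x = 0; x < width; x++) {
                int rgb = rowSource.applyAsInt(x, y);
                o.writeByte(rgb & 0xFF);           // blue
                o.writeByte((rgb >> 8) & 0xFF);    // green
                o.writeByte((rgb >> 16) & 0xFF);   // red
            }
            for (int p = 0; p < padding; p++) {
                o.writeByte(0);
            }
        }
    }
}

static void writeIntLE(DataOutputStream o, int v) throws IOException {
    o.writeByte(v & 0xFF);
    o.writeByte((v >> 8) & 0xFF);
    o.writeByte((v >> 16) & 0xFF);
    o.writeByte((v >> 24) & 0xFF);
}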
For PNG, it would be much the same, except that the file format is quite a bit more complicated and involves a compression layer (which can still be handled in a streaming format).
There aren't many libraries that handle this approach, because of the obvious limitation I outlined above, and that other commenters have outlined too: if an image is so large that it needs this, then it will be too large to ever display.
I think you might be able to persuade ImageIO to write a BMP or PNG in a streaming fashion if you implement a custom RenderedImage class. I might see if I can get that to work; I'll update here if so.
Example code
I had a go at writing a PNG image in a streaming fashion using ImageIO.
Here's the code:
https://gist.github.com/RichardBradley/e7326ec777faccb9579ad4e0b0358f87
I found that the PNG encoder will request the image one scanline at a time, regardless of the tile settings of the image.
See com/sun/imageio/plugins/png/PNGImageWriter.java:818
(This may in fact be a bug in PNGImageWriter, but no-one has noticed, because no-one writes images in a streaming style in real-world use.)
If you want to stream the data pixel-by-pixel instead of line-by-line, you could fork PNGImageWriter and refactor that section. I think it should be quite possible.
(I am not 100% certain that something inside the ImageIO / PNGImageWriter pipeline will not just buffer the image to memory anyway. You could turn the image size right up and retest to be sure.)
Your problem is not that the final image may not fit in memory.
The problem is that the rendering process takes too much memory.
This means that you have to modify the rendering process in such a way that it will write its intermediate results to disk instead of keeping it in memory.
This may mean that you can use the BMP format and write bit by bit to the disk as described in the answer provided by Rich (ok, in larger chunks, not really each single bit …), or you write an intermediate format of your own, or you allocate disk memory as cache memory.
But if your current rendering process finishes without an OOME, writing the resulting image to disk cannot be the real issue. It could only become one if writing meant that the given data structure had to be converted into a particular format first: for example, the renderer returns a byte array holding the image as BMP, but the output should be a JPEG; in that case, you might have to hold the image in memory twice, and that could cause the OOME.
But without knowing details about what panel.logic.Calculate() and panel.paint() are really doing (in detail!), the question is difficult to answer.
First, you can assign more memory to the JVM with the -Xmx parameter.
Second, use a lazy-load strategy: split the image, and load each piece only when it is displayed on the panel.
So you could take the following steps:
Downsize the image
Change the image format
Assign more memory to the JVM (see the example below)
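For the last point, the heap size is set on the java command line when you launch the program; the 4 GB figure and the class name here are just placeholders:

java -Xmx4g YourMainClass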

Convert image to a fixed format for throwing away all the extra annotations

I am trying to implement attachments in my application, and users are able to upload image files (png, jpg, jpeg). I have read the OWASP recommendations for image uploads, and one of the tips was to convert the input image to a bitmap (keeping only the bitmap data and throwing away all the extra annotations), then convert the bitmap to your desired output format. One reasonable way to do this is to convert to PBM format, then convert to PNG.
The image is saved as a byte array.
I am trying to rewrite the uploaded image using ImageTranscoder from the ImageIO library. But I am not really sure what it is doing, or whether all the possibly malicious code is removed from the image, because it seems that only the metadata is being rewritten.
Are there any suggestions or best practices for how to remove all possibly malicious code inside an image file?
You do not need an intermediate file format like PBM, as BufferedImage (which is the standard way of representing an in-memory bitmap in Java) is just plain pixel data. You can just go from encoded "anything" to decoded bitmap to encoded PNG.
The simplest way you could possibly do what you describe is:
ImageIO.write(ImageIO.read(input), "PNG", output);
This is rather naive code, and will break for many real-world files, or possibly just silently not output anything. You probably want to handle at least the most normal error cases, so something like below:
BufferedImage image = ImageIO.read(input);
if (image == null) {
    // TODO: Handle image not read (decoded)
}
else if (!ImageIO.write(image, "PNG", output)) {
    // TODO: Handle image not written (could not be encoded as PNG)
}
Other things to consider: The above will remove malicious code in the meta data. However, there might be special images crafted for DoS (small files decoding to huge in-memory representations, TIFF IFD loops, and much more). These problems need to be addressed in the image decoders for the various input formats. But at least your output files should be safe from this.
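One common mitigation on the decoding side (a sketch; the 50-megapixel limit is an arbitrary number for illustration) is to ask an ImageReader for the image dimensions, which come from the header alone, and reject oversized images before decoding any pixel data:

import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.Iterator;

// "input" is whatever you would otherwise pass to ImageIO.read()
try (ImageInputStream in = ImageIO.createImageInputStream(input)) {
    Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
    if (!readers.hasNext()) {
        throw new IOException("Unrecognized image format");
    }
    ImageReader reader = readers.next();
    try {
        reader.setInput(in);
        // Width/height are read from the header; no pixel data is decoded yet
        long pixels = (long) reader.getWidth(0) * reader.getHeight(0);
        if (pixels > 50_000_000L) { // roughly 200 MB as TYPE_INT_ARGB
            throw new IOException("Refusing to decode " + pixels + " pixels");
        }
        BufferedImage image = reader.read(0);
        // ... continue with the PNG re-encoding shown above
    } finally {
        reader.dispose();
    }
}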
In addition, malicious code could be stored in the ICC profile, which might be carried over to the output image. You can probably avoid this by force converting all images to the built-in sRGB color space, or writing the images without ICC profiles.
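One way to do that force-conversion (a sketch, not the only option): drawing into a fresh BufferedImage converts the pixels to the default sRGB model and leaves any embedded ICC profile behind (use TYPE_INT_RGB instead if you don't need transparency):

import javax.imageio.ImageIO;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// "image" and "output" are the decoded input and the destination from above
BufferedImage clean = new BufferedImage(image.getWidth(), image.getHeight(),
        BufferedImage.TYPE_INT_ARGB);
Graphics2D g = clean.createGraphics();
g.drawImage(image, 0, 0, null); // repaints the pixels into the plain sRGB model
g.dispose();
ImageIO.write(clean, "PNG", output);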
PS: The ImageTranscoder interface is intended for situations where you want to keep as much meta data as possible (that is why it has methods only for meta data), and allows transformation of meta data from one file format to another (one could argue the name should have been MetadataTranscoder).

join multiple jpeg images with low or constant memory footprint

I have multiple images, each with a resolution of around 2560x10000, and I want to join them all into one single image. I cannot use the BufferedImage method, as the final image and the image to be joined would be in memory at the same time, causing an OutOfMemory error. So I tried the approach below:
public static void joinJpegFiles(File infile, File outfile, float compQuality, int i) {
    try {
        RenderedImage renderedImage = ImageIO.read(infile);
        ImageWriter writer = null;
        Iterator<ImageWriter> iter = ImageIO.getImageWritersByFormatName("jpeg");
        if (iter.hasNext()) {
            writer = iter.next();
        }
        ImageOutputStream ioStream = ImageIO.createImageOutputStream(outfile);
        writer.setOutput(ioStream);
        ioStream.seek(ioStream.length()); // append after any existing data
        JPEGImageWriteParam param = new JPEGImageWriteParam(Locale.getDefault());
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(compQuality);
        writer.write(null, new IIOImage(renderedImage, null, null), param);
        ioStream.flush();
        ioStream.close();
        writer.dispose();
    } catch (IOException e) {
        System.out.println("write error: " + e.getMessage());
    }
}
This method is called for each image I want to join.
The issue with this approach is that the size of the final file increases and equals the sum of the sizes of all the images I joined, but only the first image is visible when I open the final image.
I still can't figure out what I am doing wrong, and I also couldn't find any sample code for joining JPEGs other than the BufferedImage and ImageIO.write approach. I read in a newsgroup that it works for the TIFF format, but I need this to work for JPEG/PNG.
I assume you have already solved this, or worked around it somehow, but in case anyone else needs to solve a similar problem:
It's a little unclear what you are trying to achieve here. Do you really want to create one large image, or create a single file containing multiple images?
Multiple images in single file:
Your code seems to append multiple standalone JPEG files into one file. The JPEG (JFIF) format does not support this, and most software will probably just see your file as the first JPEG with loads of junk bytes appended to it at the end. PNG does not allow storing multiple images in one file AFAIK. A format like TIFF does allow multiple images (it even allows you to store them as JPEG streams), which is probably why TIFF was brought up.
However, the JPEG standard has a concept called Abbreviated Streams, that is very much like how JPEG is usually stored in pyramidal TIFF. The ImageIO JPEGImageWriter does support this feature:
"Abbreviated streams are written using the sequence methods of ImageWriter. Stream metadata is used to write a tables-only image at the beginning of the stream, and the tables are set up for use, using ImageWriter.prepareWriteSequence. If no stream metadata is supplied to ImageWriter.prepareWriteSequence, then no tables-only image is written. If stream metadata containing no tables is supplied to ImageWriter.prepareWriteSequence, then a tables-only image containing default visually lossless tables is written."
I'm not sure how other software will interpret these kinds of files, and according to the libjpeg docs it probably won't even work:
"While abbreviated datastreams
can be useful in a closed environment, their use is strongly discouraged in
any situation where data exchange with other applications might be needed.
Caveat designer."
So.. It may or may not be appropriate for your use case.
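For completeness, the sequence methods the quote refers to might be used roughly like this (an untested sketch; writeSequence is a made-up name, and as noted above, other readers may not accept the resulting file):

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;

static void writeSequence(File outfile, List<BufferedImage> images) throws IOException {
    ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
    if (!writer.canWriteSequence()) {
        throw new IOException("Writer does not support sequences");
    }
    try (ImageOutputStream out = ImageIO.createImageOutputStream(outfile)) {
        writer.setOutput(out);
        ImageWriteParam param = writer.getDefaultWriteParam();
        writer.prepareWriteSequence(null); // null stream metadata => no tables-only image
        for (BufferedImage image : images) {
            writer.writeToSequence(new IIOImage(image, null, null), param);
        }
        writer.endWriteSequence();
    } finally {
        writer.dispose();
    }
}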
Compose multiple images into one large image:
If, on the other hand, you really want to compose multiple images into one large image (and later store it as a single JPEG), you could have a look at some code I wrote long ago to allow working with large images without using heap memory.
It uses memory-mapped byte buffers, and might be painfully slow if you don't have enough memory to keep the data in RAM. Also, the resulting BufferedImage will always be of TYPE_CUSTOM, so it will miss most of the potential hardware or native acceleration you would normally benefit from, and some operations may not work on it. However, at least you are limited by neither the heap size nor physical RAM.

How to Combine Images without loading them into RAM in Java

I have a very large (around a gigapixel) image I am trying to generate, and so far I can only create images up to around 40 megapixels in a BufferedImage before I get an out of memory error. I want to construct the image piece by piece, then combine the pieces without loading the images into memory. I could also do this by writing each piece to a file, but ImageIO does not support this.
I think JAI can help you build what you want. I would suggest looking at the data structures and streams offered by JAI.
Also, have a look at these questions, might help you with ideas.
How to save a large fractal image with the least possible memory footprint
How to create a big image file from many tiles
Appending to an Image File
You basically want to do the reverse of the second one there.
Good luck with your project ;)
Not a proper solution, just a sketch.
Unpacking a piece of an image is not easy when the image is compressed. You can decompress the image, with an external tool, into some trivial format (XPM, uncompressed TIFF). Then you could load pieces of this image as byte arrays, because the format is so straightforward, and create Image instances out of the raw data.
I see two easy solutions. The first is to create a custom binary format for your image. For saving, just generate one part at a time, seek() to the appropriate spot in the file, then write out your data. For loading, seek() to the appropriate spot in the file, then load your data.
The other solution is to learn an image format yourself. BMP is uncompressed and probably the easiest one to learn. Once learned, the above steps work quite well.
Remember to convert your image to a byte array for easy storage.
If there is no way to do it built into Java (for your sake I hope this is not the case, and that someone answers saying so), then you will need to implement an algorithm yourself, just as others have commented here.
You do not necessarily need to understand the entire algorithm yourself. If you take a pre-existing algorithm, you could modify it to load the file as a byte stream, create a byte buffer to keep reading chunks of the file, and modify the algorithm to accept the data one chunk at a time.
Some formats, such as JPG, might not be possible to implement with a linear stream of file chunks in this manner. As @warren suggested, BMP is probably the easiest to implement this way, since that file format just has a header of so many bytes and then dumps the pixel data straight out in binary (along with some padding). So you would load the sub-images that need to be combined, logically one at a time (though you could multithread this and load the next data concurrently to speed it up, as this process is going to take a long time), read the next line of data, save it out to your binary output stream, and so on.
You might even need to load the sub-images multiple times. For example, imagine an image being saved which is made up of 4 sub-images in a 2x2 grid. You might need to load image 1, read its first line of data, save that to your new file, release image 1, load image 2, read its first line of data, save, release 2, load 1 to read its 2nd line of data, and so on. You would be more likely to need to do this if you use a compressed image format for saving in.
To suggest BMP again: since BMP is not compressed and you can write the data in whatever order you want (assuming the file was opened in a manner which provides random access), you could skip around in the file you're saving, so that you can completely read one sub-image and save all of its data before moving on to the next (see the sketch below). That might provide run-time savings, but the uncompressed file it produces may also be terribly large.
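A rough sketch of that idea (untested; writeTile and its parameters are made-up names, and the layout assumes a 24-bit BMP whose headers have already been written):

import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.RandomAccessFile;

// Copies one sub-image into its place in the big output BMP by seeking to
// each destination row. fullWidth/fullHeight describe the final image;
// pixelDataOffset is the pixel-array offset from the BMP file header.
static void writeTile(RandomAccessFile out, long pixelDataOffset,
                      int fullWidth, int fullHeight,
                      int tileX, int tileY, BufferedImage tile) throws IOException {
    int stride = (fullWidth * 3 + 3) & ~3;              // rows padded to 4 bytes
    byte[] row = new byte[tile.getWidth() * 3];
    for (int y = 0; y < tile.getHeight(); y++) {
        int destY = tileY + y;
        // BMP stores rows bottom-up
        long rowStart = pixelDataOffset + (long) (fullHeight - 1 - destY) * stride;
        for (int x = 0; x < tile.getWidth(); x++) {
            int rgb = tile.getRGB(x, y);
            row[x * 3]     = (byte) (rgb & 0xFF);         // blue
            row[x * 3 + 1] = (byte) ((rgb >> 8) & 0xFF);  // green
            row[x * 3 + 2] = (byte) ((rgb >> 16) & 0xFF); // red
        }
        out.seek(rowStart + (long) tileX * 3);
        out.write(row);
    }
}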
And I could go on. There are likely to be multiple pitfalls, optimizations, and so on.
Instead of saving 1 huge file which is the result of combining other files, what if you created a new image file format which was merely made up of meta-data allowing it to reference other files in a way which combined them logically without actually creating 1 massive file? Whether or not creating a new image file format is an option depends on your software; if you are expecting people to take these images to use in other software, then this would not work - at least, not unless you could get your new image file format to catch on and become standard.

How can I do image manipulation on a very large BMP?

I am trying to do some manipulation (specifically, conversion to a different type, or splitting into tiles) on a set of very large (a few GB) BMP image files.
I'm not sure I understand the BMP file format, but is it necessary to load the entire file into memory? I was unable to find any API that didn't require loading the entire file at some point. ImageMagick wasn't able to do it either.
Java would be the best tool of choice for me, but any other solution including command line tools or desktop software would be acceptable.
Based on this, it should be reasonably apparent that you can use the fact that it's row-packed. You should be able to read part of a row, store it, advance to the same position in the next row and repeat until you have completed a tile of the desired size. Obviously you may be able to do multiple tiles at once if you can store an entire row worth of tiles in memory all at once.
It is not necessary to load more than the header of the file at once. The header format is described in the linked Wikipedia entry. It's probably worth paying attention to any compression schemes being used - compression is likely to make this task a bit harder (though still not impossible) :)
Not sure about 2011, the year of this question, but Java 7 (and possibly 6) has a good approach for such a task.
Use the ImageReader and ImageReadParam classes; they allow you to read any rectangular part of the source image if needed. Set the source rectangle and read:
// Initialization (the path is a placeholder)
ImageInputStream input = ImageIO.createImageInputStream(new File("huge.bmp"));
ImageReader m_reader = ImageIO.getImageReaders(input).next();
m_reader.setInput(input);
ImageReadParam m_params = m_reader.getDefaultReadParam();
// Reading: only the requested rectangle is decoded
m_params.setSourceRegion(readRect);
BufferedImage rdImg = m_reader.read(0, m_params); // 0 = image index within the file
// ... processing/displaying etc.
As BMP (this works for any standard Java image type, too) is a very plain format without any compression in 99.99% of cases, reading any part of it this way is a very fast process with low memory consumption, apart from the resulting BufferedImage. You can even reuse the previous BufferedImage for a subsequent request with a rect of the same dimensions; see ImageReadParam.setDestination(BufferedImage destination). But I didn't test this option.
Also, I found one small reading bug in the BMP reader class implementation of the Java runtime lib. If anybody is interested in it, I'll show a simple way to correct it.
