How to Combine Images without loading them into RAM in Java

I have a very large (around a gigapixel) image I am trying to generate, and so far I can only create images up to around 40 megapixels in a BufferedImage before I get an out-of-memory error. I want to construct the image piece by piece, then combine the pieces without ever loading them all into memory. I could do this by writing each piece to a file and combining them on disk, but ImageIO does not support that.

I think JAI can help you build what you want. I would suggest looking at the data structures and streams offered by JAI.
Also, have a look at these questions; they might give you some ideas.
How to save a large fractal image with the least possible memory footprint
How to create a big image file from many tiles
Appending to an Image File
You basically want to reverse the approach of the second one there.
Good luck with your project ;)

Not a proper solution, just a sketch.
Unpacking a piece of an image is not easy when the image is compressed. You can decompress the image with an external tool into some trivial format (XPM, uncompressed TIFF). Then you could load pieces of this image as byte arrays, because the format is so straightforward, and create Image instances out of the raw data.
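As a rough sketch of that idea, assuming the image has been decompressed to a raw, headerless RGB dump (3 bytes per pixel, rows top-to-bottom) whose width you know in advance; all names here are mine:

import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.RandomAccessFile;

// Read one w-by-h piece of a raw RGB dump without touching the rest of the file.
static BufferedImage readPiece(RandomAccessFile raf, int imageWidth,
                               int x, int y, int w, int h) throws IOException {
    BufferedImage piece = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
    byte[] row = new byte[w * 3];
    for (int j = 0; j < h; j++) {
        raf.seek(((long) (y + j) * imageWidth + x) * 3L); // offset of pixel (x, y + j)
        raf.readFully(row);
        for (int i = 0; i < w; i++) {
            int rgb = ((row[i * 3] & 0xFF) << 16)
                    | ((row[i * 3 + 1] & 0xFF) << 8)
                    | (row[i * 3 + 2] & 0xFF);
            piece.setRGB(i, j, rgb);
        }
    }
    return piece;
}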

I see two easy solutions. The first is to create a custom binary format for your image. For saving, generate one part at a time, seek() to the appropriate spot in the file, then write out your data. For loading, seek() to the appropriate spot in the file, then read your data.
The other solution is to learn an image format yourself. BMP is uncompressed, and it is probably the only easy one to learn. Once learned, the above steps work quite well.
Remember to convert your image to a byte array for easy storage.
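A minimal sketch of the first idea, assuming a headerless custom format of raw int ARGB pixels in row-major order (the method and its parameters are hypothetical):

import java.io.IOException;
import java.io.RandomAccessFile;

// Write one independently generated part into its place in the full image file.
static void savePart(RandomAccessFile raf, int imageWidth,
                     int partX, int partY, int partW, int partH,
                     int[] pixels) throws IOException {
    for (int row = 0; row < partH; row++) {
        // offset of the first pixel of this part's row, 4 bytes per pixel
        raf.seek(((long) (partY + row) * imageWidth + partX) * 4L);
        for (int col = 0; col < partW; col++) {
            raf.writeInt(pixels[row * partW + col]);
        }
    }
}

Loading works the same way: seek() to the same offsets and readInt() the pixels back.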

If there is no way to do it built into Java (for your sake I hope this is not the case and that someone answers saying so), then you will need to implement an algorithm yourself, as others have commented here.
You do not necessarily need to understand the entire algorithm yourself. If you take a pre-existing algorithm, you could modify it to load the file as a byte stream, create a byte buffer to keep reading chunks of the file, and have it accept the data one chunk at a time.
Some algorithms, such as JPEG, might not be possible to implement with a linear stream of file chunks in this manner. As @warren suggested, BMP is probably the easiest to implement in this way, since that file format just has a header of so many bytes and then dumps the raw pixel data straight out in binary (along with some row padding). So you would load the sub-images that need to be combined, logically one at a time (though you could multithread this and load the next data concurrently to speed it up, as this process is going to take a long time), read the next line of data, save it out to your binary output stream, and so on.
You might even need to load the sub-images multiple times. For example, imagine an image being saved which is made up of 4 sub-images in a 2x2 grid. You might need to load image 1, read its first line of data, save that to your new file, release image 1, load image 2, read its first line of data, save, release image 2, then load image 1 again to read its 2nd line of data, and so on. You would be more likely to need to do this if you save in a compressed image format.
To suggest BMP again: since BMP is not compressed and you can write its data in whatever order you want (assuming the file was opened in a manner that provides random access), you could skip around in the file you're saving, completely reading one sub-image and saving all of its data before moving on to the next. That might save run time, but it will also produce terrible saved file sizes.
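As a sketch of that random-access idea, assuming the sub-images are equally sized tile files in a grid and the output is a raw, headerless 3-bytes-per-pixel file (a real BMP would additionally need its header and row padding):

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import javax.imageio.ImageIO;

// Stitch a grid of tiles into one raw output file, one tile in RAM at a time.
static void stitch(File[][] tiles, int tileW, int tileH, File out) throws IOException {
    int cols = tiles[0].length;
    long rowBytes = (long) cols * tileW * 3; // one full output scanline
    try (RandomAccessFile raf = new RandomAccessFile(out, "rw")) {
        for (int ty = 0; ty < tiles.length; ty++) {
            for (int tx = 0; tx < cols; tx++) {
                BufferedImage tile = ImageIO.read(tiles[ty][tx]);
                byte[] row = new byte[tileW * 3];
                for (int y = 0; y < tileH; y++) {
                    for (int x = 0; x < tileW; x++) {
                        int rgb = tile.getRGB(x, y);
                        row[x * 3] = (byte) (rgb >> 16);    // R
                        row[x * 3 + 1] = (byte) (rgb >> 8); // G
                        row[x * 3 + 2] = (byte) rgb;        // B
                    }
                    // this tile's slice of output scanline (ty * tileH + y)
                    raf.seek((long) (ty * tileH + y) * rowBytes + (long) tx * tileW * 3);
                    raf.write(row);
                }
            }
        }
    }
}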
And I could go on. There are likely to be multiple pitfalls, optimizations, and so on.
Instead of saving 1 huge file which is the result of combining other files, what if you created a new image file format which was merely made up of meta-data allowing it to reference other files in a way which combined them logically without actually creating 1 massive file? Whether or not creating a new image file format is an option depends on your software; if you are expecting people to take these images to use in other software, then this would not work - at least, not unless you could get your new image file format to catch on and become standard.

Related

Is there a way to incrementally write to an image file to avoid running out of RAM when rendering a large image?

Currently I am running out of RAM when rendering images. While I have optimized for memory efficiency as much as I can, I have realized that for images as large as I want, it will never be enough, so instead I want to write to a file as I render. I have no idea how I could do this, and I'm almost certain most image formats won't be suitable, since recalculating the compression would require holding the image being appended to in RAM.
This is my current code. I'm happy to completely switch libraries to get this to work.
panel.logic.Calculate(false); // Renders the image - I can make this render in steps
Graphics2D g2d = bufferedImage.createGraphics();
panel.paint(g2d);
g2d.dispose();
SimpleDateFormat formatter = new SimpleDateFormat("dd-MM-yyyy_HH-mm-ss");
Date date = new Date();
File file = new File(formatter.format(date) + ".png");
try {
    ImageIO.write(bufferedImage, "png", file);
} catch (IOException ioException) {
    ioException.printStackTrace();
}
Yes, this is possible in several image formats, but it won't be easy with the standard Java Graphics2D API. In particular, the class java.awt.image.BufferedImage explicitly represents an image where the entire bitmap is held in memory.
I would start by asking: how large are the images you are thinking of here? Unless your generating program is unusually memory constrained, any image that is too big to hold in memory during generation will also be too big to hold in memory during display, so it would be useless, I think?
In order to write an image file in a "streaming" style, you will need a format that allows you to write pixels or regions. This will be hard in more sophisticated image formats like JPEG, but easier in more pixel-oriented formats like BMP or PNG.
For BMP, you would need to write the file header, then stream out the pixel array into the file, which you could do pixel-by-pixel without holding the whole thing in memory, then write the footer. The format is described here: https://en.wikipedia.org/wiki/BMP_file_format
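Here is a rough sketch of that; RowSource is a hypothetical callback that produces one scanline of BGR bytes on demand, and note that the 32-bit size fields cap a plain BMP at 4 GB:

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

interface RowSource { byte[] row(int y) throws IOException; } // width * 3 BGR bytes

// Stream a 24-bit BMP to disk one scanline at a time.
static void writeBmp(String path, int width, int height, RowSource src) throws IOException {
    int rowSize = (width * 3 + 3) & ~3;            // rows are padded to 4 bytes
    long fileSize = 54L + (long) rowSize * height; // must fit in 32 bits for a valid BMP
    try (OutputStream out = new BufferedOutputStream(new FileOutputStream(path))) {
        out.write('B'); out.write('M');            // 14-byte file header
        writeIntLE(out, (int) fileSize);
        writeIntLE(out, 0);                        // reserved
        writeIntLE(out, 54);                       // offset to pixel data
        writeIntLE(out, 40);                       // 40-byte BITMAPINFOHEADER
        writeIntLE(out, width);
        writeIntLE(out, height);                   // positive height = bottom-up rows
        out.write(1); out.write(0);                // colour planes = 1
        out.write(24); out.write(0);               // bits per pixel
        writeIntLE(out, 0);                        // BI_RGB, no compression
        writeIntLE(out, (int) ((long) rowSize * height)); // pixel data size
        writeIntLE(out, 0); writeIntLE(out, 0);    // pixels per metre (unset)
        writeIntLE(out, 0); writeIntLE(out, 0);    // colours used / important
        byte[] pad = new byte[rowSize - width * 3];
        for (int y = height - 1; y >= 0; y--) {    // bottom row first
            out.write(src.row(y));
            out.write(pad);
        }
    }
}

static void writeIntLE(OutputStream out, int v) throws IOException {
    out.write(v); out.write(v >> 8); out.write(v >> 16); out.write(v >> 24);
}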
For PNG, it would be much the same, except that the file format is quite a bit more complicated and involves a compression layer (which can still be handled in a streaming format).
There aren't many libraries that handle this approach, because of the obvious limitation I outlined above and that other commenters have pointed out: if an image is so large that it needs this, it will be too large to ever display.
I think you might be able to persuade ImageIO to write a BMP or PNG in a streaming fashion if you implement a custom RenderedImage class. I might see if I can get that to work; I'll update here if so.
Example code
I had a go at writing a PNG image in a streaming fashion using ImageIO.
Here's the code:
https://gist.github.com/RichardBradley/e7326ec777faccb9579ad4e0b0358f87
I found that the PNG encoder will request the image one scanline at a time, regardless of the Tile settings of the Image.
See com/sun/imageio/plugins/png/PNGImageWriter.java:818
(This may in fact be a bug in PNGImageWriter, but no one has noticed because no one writes images in a streaming style in real-world use.)
If you want to stream the data pixel-by-pixel instead of line-by-line, you could fork PNGImageWriter and refactor that section. I think it should be quite possible.
(I am not 100% certain that something inside the ImageIO / PNGImageWriter pipeline will not just buffer the image to memory anyway. You could turn the image size right up and retest to be sure.)
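To make the shape of the approach concrete, here is a heavily condensed sketch of the same idea (not the gist itself); it assumes the PNG writer only pulls pixels through getData()/getTile(), which matches what I observed above but isn't guaranteed by the API:

import java.awt.Rectangle;
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.*;
import java.util.Vector;

// A RenderedImage whose pixels are computed per requested region, so only
// one scanline-sized Raster needs to exist at a time.
class StreamingImage implements RenderedImage {
    private final int width, height;
    private final ColorModel colorModel = new ComponentColorModel(
            ColorSpace.getInstance(ColorSpace.CS_sRGB),
            false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
    private final SampleModel sampleModel;

    StreamingImage(int width, int height) {
        this.width = width;
        this.height = height;
        this.sampleModel = colorModel.createCompatibleSampleModel(width, 1);
    }

    // Replace with your real renderer (fractal, tile lookup, ...)
    private int sample(int x, int y, int band) {
        return (x ^ y ^ (band * 85)) & 0xFF;
    }

    @Override public Raster getData(Rectangle r) {
        WritableRaster raster = Raster.createWritableRaster(
                sampleModel.createCompatibleSampleModel(r.width, r.height),
                r.getLocation());
        for (int y = r.y; y < r.y + r.height; y++)
            for (int x = r.x; x < r.x + r.width; x++)
                for (int b = 0; b < 3; b++)
                    raster.setSample(x, y, b, sample(x, y, b));
        return raster;
    }
    @Override public Raster getData() { return getData(new Rectangle(width, height)); }
    @Override public WritableRaster copyData(WritableRaster r) {
        r.setRect(getData(r.getBounds()));
        return r;
    }
    @Override public Raster getTile(int tx, int ty) { // one scanline per tile
        return getData(new Rectangle(0, ty, width, 1));
    }
    @Override public ColorModel getColorModel() { return colorModel; }
    @Override public SampleModel getSampleModel() { return sampleModel; }
    @Override public int getWidth() { return width; }
    @Override public int getHeight() { return height; }
    @Override public int getMinX() { return 0; }
    @Override public int getMinY() { return 0; }
    @Override public int getNumXTiles() { return 1; }
    @Override public int getNumYTiles() { return height; }
    @Override public int getMinTileX() { return 0; }
    @Override public int getMinTileY() { return 0; }
    @Override public int getTileWidth() { return width; }
    @Override public int getTileHeight() { return 1; }
    @Override public int getTileGridXOffset() { return 0; }
    @Override public int getTileGridYOffset() { return 0; }
    @Override public Vector<RenderedImage> getSources() { return null; }
    @Override public Object getProperty(String name) { return java.awt.Image.UndefinedProperty; }
    @Override public String[] getPropertyNames() { return null; }
}

With that in place, ImageIO.write(new StreamingImage(40000, 40000), "png", new File("big.png")) should stream the image out scanline by scanline, subject to the buffering caveat above.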
Your problem is not that the final image may not fit in memory.
The problem is that the rendering process takes too much memory.
This means that you have to modify the rendering process in such a way that it writes its intermediate results to disk instead of keeping them in memory.
This may mean that you can use the BMP format and write bit by bit to the disk as described in the answer provided by Rich (ok, in larger chunks, not really each single bit …), or you write an intermediate format of your own, or you allocate disk memory as cache memory.
But if your current rendering process finishes without an OOME, writing the resulting image to disk cannot be the real issue. Only if writing means that the given data structure has to be converted again into a particular format could this cause a problem (for example, the renderer returns a byte array holding the image as BMP, but the output should be a JPEG; in that case, you may have to hold the image in memory twice, and that could cause the OOME).
But without knowing details about what panel.logic.Calculate() and panel.paint() are really doing (in detail!), the question is difficult to answer.
First, you can assign more memory to the JVM with the -Xmx parameter (for example, java -Xmx4g YourApp).
Second, use a lazy-load strategy: split the image, and load each split piece only when it is displayed on the panel.
Maybe you can take the following steps:
Downsize the image
Change the image format
Assign more memory to the JVM

BufferedImage reduces Image size

I was using Java's ImageIO and BufferedImage for some image operations and wanted to see how they behave when I output the image as a JPG. There I found some interesting behaviour that I can't quite explain. I have the following code, which reads an image and then outputs the same image as "copy.jpg" in the same folder. The code is in Kotlin, but the functions used are Java functions:
val image = File("some/image/path.jpg")
val bufImage = ImageIO.read(image.inputStream())
FileOutputStream(File(image.parentFile, "copy.jpg")).use { os ->
    ImageIO.write(bufImage, "jpg", os)
}
I would expect it to output exactly the same file, except maybe the meta information. However, the resulting file was almost a tenth of the original. I doubt the meta information accounts for that much. The exact size difference varied depending on which image file I used, but every time the output image was smaller. Yet I could not see any quality difference from the old file; when zooming in, I would see the same pixels.
Why is the file size reduced so dramatically?
JPEG is lossy compression: it throws away lots of information in order to keep the file small.  (An uncompressed image file could be orders of magnitude larger.)
It's intended to throw away information that you're not likely to see or care about, of course; but it still loses some image data.
And the loss is generational: if you have an image that came from a JPEG file, and then recompress it to a JPEG file, it will usually lose more data, giving a worse-quality result than the first JPEG file — even if the compression settings are exactly the same.  (Trying to approximate an already-compressed image won't work the same as trying to approximate the original source image. And there's no way to recover information which is already lost!)
That's almost certainly what's happening here.  Your code reads a JPEG file and expands it into a BufferedImage (which holds the uncompressed image data), and then compresses it again into a new JPEG file, which loses further quality.  It's probably using a lot higher compression than the first file used, hence the smaller size.
I'd be surprised if you couldn't see any difference between the two JPEG files in an image viewer or editor, when magnified.  (JPEG artefacts are most obvious around sharp edges and boundaries, but if you know what to look for you can sometimes see them elsewhere.  Subtle changes can be easier to see if you can line up both images on the exact same area of screen and flip directly between them.)
You can control how much information is lost when creating a JPEG — but the ImageIO.write() method you're using doesn't provide a way to do that.  See this question for how to do it.  (It's in Java, but you should be able to follow it.)
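For reference, here is a minimal sketch of the usual ImageWriter route; the quality value is whatever trade-off you choose:

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// Write a JPEG with an explicit quality setting instead of the
// default that ImageIO.write() picks for you.
static void writeJpeg(BufferedImage image, File file, float quality) throws IOException {
    ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
    ImageWriteParam param = writer.getDefaultWriteParam();
    param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
    param.setCompressionQuality(quality); // 0.0f = smallest file, 1.0f = best quality
    try (ImageOutputStream out = ImageIO.createImageOutputStream(file)) {
        writer.setOutput(out);
        writer.write(null, new IIOImage(image, null, null), param);
    } finally {
        writer.dispose();
    }
}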
Obviously, the more information you're prepared to lose, the smaller file you can end up with.  But note that if you choose a high-quality setting, the result could be a lot larger than the first JPEG, even though it will probably still lose slightly more quality.
(That's why, if you're doing any sort of processing on an image, it's best to keep it in lossless formats until the very end, and compress to a lossy format like JPEG only once, to avoid losing quality each time you save and reload.)
As you indicate, another reason could be the loss of non-image data — you're unlikely to notice the loss of metadata such as camera settings, but the file could have had a sizeable thumbnail image too.

How To Calculate JPG Data As It Loads From The Input Stream

I need to calculate RGB pixel data from a JPG file on demand. In other words, I cannot load the whole image. I need to open the stream, skip to the information I need, and ultimately return an array of RGB information I need.
I want to extract all the compression information I need, and use it to go after a specific targeted pixel.
The programming language I need to implement this in is Java. Are there any classes/APIs that will help me achieve this, or do I need to create my own JPGInputStream?
If your JPEG stream contains a sequential frame, you could decode each scan (usually 1, 3, or 4) as they arrive and display them. It would look pretty funky color-wise.
If your JPEG stream contains a progressive frame, you could also decode after each scan. In that case the progression would be pretty normal.
This kind of approach was great in the days of dialup internet where it could take minutes to download a single image. These days, there tends to be little value in it.

Optimising Java's NIO for small files

We have a file I/O bottleneck. We have a directory which contains lots of JPEG files, and we want to read them in in real time as a movie. Obviously this is not an ideal format, but this is a prototype object tracking system and there is no possibility to change the format as they are used elsewhere in the code.
From each file we build a frame object which basically means having a buffered image and an explicit bytebuffer containing all of the information from the image.
What is the best strategy for this? The data is on an SSD which in theory has read/write rates around 400 MB/s, but in practice it is reading no more than 20 files per second (3-4 MB/s) using the naive implementation:
bufferedImg = ImageIO.read(imageFile);[1]
byte[] data = ((DataBufferByte)bufferedImg.getRaster().getDataBuffer()).getData();[2]
imgBuf = ByteBuffer.wrap(data);
However, Java provides lots of possibilities for improving this:
(1) Channels, especially FileChannels.
(2) Gathering/scattering.
(3) Direct buffers.
(4) Memory-mapped buffers.
(5) Multithreading - use a bunch of Callables to access many files simultaneously (see the sketch after the answer below).
(6) Wrapping the files in a single large file.
(7) Other things I haven't thought of yet.
I would just like to know if anyone has extensively tested the different options and knows what is optimal. I assume that (3) is a must, but I would still like to optimise the reading of a single file as far as possible, and I am unsure of the best strategy.
Bonus question: in the code snippet above, when does the JVM actually 'hit the disk' and read in the contents of the file? Is it at [1], or is that just a file handle which 'points' to the object? It would make sense to evaluate lazily, but I don't know how the implementation of the ImageIO class works.
ImageIO.read(imageFile)
Since it returns a fully decoded BufferedImage, I assume it hits the disk right there, rather than returning just a file handle.
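As a starting point for option (5), here is a sketch using a fixed thread pool; JPEG decoding is mostly CPU-bound, so on an SSD parallel decode often helps more than fancier I/O (names are mine):

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.imageio.ImageIO;

// Decode several JPEG files in parallel, returning frames in file order.
static List<BufferedImage> readAll(List<File> files) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());
    try {
        List<Future<BufferedImage>> futures = new ArrayList<>();
        for (File f : files) {
            futures.add(pool.submit(() -> ImageIO.read(f))); // Callable per file
        }
        List<BufferedImage> frames = new ArrayList<>();
        for (Future<BufferedImage> fut : futures) {
            frames.add(fut.get()); // blocks until that frame is decoded
        }
        return frames;
    } finally {
        pool.shutdown();
    }
}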

How can I do image manipulation on a very large BMP?

I am trying to do some manipulation (specifically, conversion to a different format, or splitting into tiles) on a set of very large (a few GB) BMP image files.
I'm not sure I understand the BMP file format, but is it necessary to load the entire file into memory? I was unable to find any API that didn't require loading the entire file at some point. ImageMagick wasn't able to do it either.
Java would be the best tool of choice for me, but any other solution including command line tools or desktop software would be acceptable.
Based on this, it should be reasonably apparent that you can use the fact that it's row-packed. You should be able to read part of a row, store it, advance to the same position in the next row, and repeat until you have completed a tile of the desired size. Obviously you may be able to do multiple tiles at once if you can store an entire row's worth of tiles in memory at once.
It is not necessary to load more than the header of the file at once. The header format is described in the linked Wikipedia entry. It's probably worth paying attention to any compression scheme in use - compression is likely to make this task a bit harder (though still not impossible) :)
Not sure about 2011, when this question was asked, but Java 7 (and possibly 6) has a good approach for this.
Use the ImageReader and ImageReadParam classes; they allow you to read any rectangular part of the source image if needed. Set the source rectangle and read:
... // initialization
private javax.imageio.ImageReadParam m_params;
private javax.imageio.ImageReader m_reader;

... // reading
m_params.setSourceRegion(readRect);
BufferedImage rdImg = m_reader.read(i, m_params);

... // processing/displaying etc.
Since BMP is a very plain format without any compression in 99.99% of cases (and this works for any standard Java image type), reading any part of it this way is a very fast process with low memory consumption, apart from the resulting BufferedImage. You can even reuse the previous BufferedImage on consecutive requests with a rect of the same dimensions; see ImageReadParam.setDestination(BufferedImage destination). But I didn't test this option.
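A slightly fuller, self-contained sketch of the same approach (all names are mine):

import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Read one tile of a huge image without decoding the rest of the file.
static BufferedImage readTile(File file, Rectangle tileRect) throws IOException {
    try (ImageInputStream iis = ImageIO.createImageInputStream(file)) {
        Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
        if (!readers.hasNext()) {
            throw new IOException("No reader for " + file);
        }
        ImageReader reader = readers.next();
        try {
            reader.setInput(iis, true);
            ImageReadParam param = reader.getDefaultReadParam();
            param.setSourceRegion(tileRect); // decode only this rectangle
            return reader.read(0, param);    // image index 0
        } finally {
            reader.dispose();
        }
    }
}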
Also, I found one small reading bug in the BMP reader class implementation of the Java runtime library. If anybody is interested, I'll show a simple way to correct it.
