Lossless image extraction from PDF - java

I'm using PDFBox to extract images out of a PDF file and feed them to another image processing library (that can handle different image formats). My current code is like this:
PDImageXObject pdImage;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BufferedImage image = pdImage.getImage();
ImageIO.write(image, "png", baos);
byte[] imageBytes = baos.toByteArray();
This will take whatever is stored in the PDF file and use Java graphics to convert it to PNG. Is there a better way to avoid conversion and extract the image in whatever format it is embedded? I don't want to degrade image quality (I suppose mitigated by using a lossless format like PNG?) and incur conversion overhead.

The DEFLATE algorithm is used by the FlateDecode filter and by the PNG file format. However a stream of FlateDecode-compressed data isn't itself a PNG file.
Also, you need to consider the colorspace representation of the Image XObject (e.g. DeviceCMYK) versus what PNG actually supports.
By targeting lossless compression for your output image file you won't lose any information. (Be sure you actually need a lossless extracted image: people often assume that lossy compression will change an image beyond recognition, but in many cases, depending on the parameters, the loss is hardly noticeable to the naked eye and you can benefit substantially from the size savings of lossy compression.)
If extraction is slow, the cause may simply be the quality of the PDF software that extracts and saves the image.
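The point about a lossless target format can be checked with a minimal sketch that uses only javax.imageio, not PDFBox. The class name PngRoundTrip and its method names are made up for illustration; in the question's code the BufferedImage would come from pdImage.getImage(). Note that this only shows that the decoded pixels survive the PNG step, and says nothing about any colorspace conversion that happens before the BufferedImage exists.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class PngRoundTrip {

    /** Encodes the image as PNG in memory and returns the file bytes. */
    static byte[] toPng(BufferedImage image) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", baos)) {
            throw new IOException("No PNG writer available for this image type");
        }
        return baos.toByteArray();
    }

    /** Returns true if decoding the PNG bytes yields exactly the same ARGB pixels. */
    static boolean survivesPng(BufferedImage original) throws IOException {
        BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(toPng(original)));
        if (decoded.getWidth() != original.getWidth()
                || decoded.getHeight() != original.getHeight()) {
            return false;
        }
        for (int y = 0; y < original.getHeight(); y++) {
            for (int x = 0; x < original.getWidth(); x++) {
                if (original.getRGB(x, y) != decoded.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Synthetic test image standing in for pdImage.getImage()
        BufferedImage img = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 64; y++) {
            for (int x = 0; x < 64; x++) {
                img.setRGB(x, y, (x * 4) << 16 | (y * 4) << 8 | (x ^ y));
            }
        }
        System.out.println("PNG round trip lossless: " + survivesPng(img));
    }
}
```

The PNG step costs encoding time but should not cost image quality, which matches the answer above.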

Related

Extract image into a file from PDImageXObject without loading it into memory

This is related to How to extract image bytes out of PDF efficiently, but I'll try to restate the problem differently so it's less about PDF parsing and more about image processing.
I'm using PDFBox to extract images out of PDF files. There's a class PDImageXObject that represents the image inside the PDF. It contains image metadata (height, width, etc.) and exposes two APIs to pull out the image: BufferedImage getImage() and BufferedImage getImage(Rectangle rect, int subsampling);.
The current code is straightforward:
BufferedImage image = pdImage.getImage();
ImageIO.write(image, "jpg", baos);
However, for a large image, I'm having an issue with memory usage, as BufferedImage is storing uncompressed image data in memory, which is a lot bigger than the compressed result.
Is there a way to avoid loading the whole image into memory by breaking it up into tiles (e.g. 1024x1024) and iterating over them using the getImage signature that takes Rectangle? I'm seeing some promising information about JAI being able to use Tiles to output a compressed image without loading the uncompressed content into memory at once, but I don't understand how to tie it together with what I have from PDImageXObject. Or is there another way to do it? Is JAI still an active project?
By the way, the purpose of extracting the image is to feed it into the next component in the pipeline, which can handle multiple image formats. So, if some format other than jpg is more suited for tiled processing, that should be OK.
I'm aware of one possibility using something like BigBufferedImage. But I was thinking processing a Tile at a time looked promising.
OK, I found a library: Commons Imaging. Its Imaging class may help you.
I think you can also try the createInputStream() method to find out the size of the real data (byte length).

Modify image data in ByteBuffer

I would like to modify the per-pixel RGB color data of a PNG image that I have loaded as a ByteBuffer, preferably a simple, lightweight solution.
I currently load the data directly from the file into a ByteBuffer using a ReadableByteChannel, which does not decode the PNG data.
So the question is, how do I
Decode the ByteBuffer PNG data into something where I can modify the pixel data
Turn it back into a valid ByteBuffer ('valid' means that it would be accepted by an OpenGL shader)
The PNG image encoding includes (among other things) ZLIB compression, so you cannot access the pixel values directly. You need (practically speaking) to decode the image, for example by reading it into a BufferedImage with ImageIO.read( )
If for some reason (huge image, memory constraints) you would rather not load the full image into memory as a BufferedImage, you could use a progressive reader (e.g. PNGJ)
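The decode-then-modify route can be sketched with the standard library alone. The class PngPixelEdit and its methods are hypothetical names, and the output layout (tightly packed RGBA bytes, top row first) is an assumption about what the OpenGL upload expects; adjust it to whatever your texture call needs.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import javax.imageio.ImageIO;

public class PngPixelEdit {

    /** Decodes PNG file bytes held in a ByteBuffer into a BufferedImage. */
    static BufferedImage decode(ByteBuffer pngData) throws IOException {
        byte[] bytes = new byte[pngData.remaining()];
        pngData.duplicate().get(bytes); // duplicate() leaves the caller's position untouched
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(bytes));
        if (image == null) {
            throw new IOException("Data is not a decodable image");
        }
        return image;
    }

    /** Example modification: invert the RGB channels of every pixel, keep alpha. */
    static void invert(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int argb = image.getRGB(x, y);
                image.setRGB(x, y, (argb & 0xFF000000) | (~argb & 0x00FFFFFF));
            }
        }
    }

    /** Packs the pixels as raw RGBA bytes, row by row, top row first. */
    static ByteBuffer toRgba(BufferedImage image) {
        ByteBuffer out = ByteBuffer.allocateDirect(image.getWidth() * image.getHeight() * 4);
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int argb = image.getRGB(x, y);
                out.put((byte) (argb >> 16))   // red
                   .put((byte) (argb >> 8))    // green
                   .put((byte) argb)           // blue
                   .put((byte) (argb >>> 24)); // alpha
            }
        }
        out.flip(); // make the buffer readable from the start
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny PNG in memory to stand in for the file contents
        BufferedImage src = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(src, "png", baos);

        BufferedImage img = decode(ByteBuffer.wrap(baos.toByteArray()));
        invert(img);
        System.out.println("RGBA bytes: " + toRgba(img).remaining());
    }
}
```

getRGB/setRGB are convenient but slow for large images; for palette-based PNGs setRGB maps to the nearest palette entry, so converting to an ARGB image first may be preferable.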

How to save lossless jpg in java?

I have to save a JPEG image losslessly. I am working on a steganography project, but Java compresses my result when saving it. I have searched every forum and tried everything, but nothing worked.
Here is my example code for saving a JPEG image losslessly:
BufferedImage image = ImageIO.read(new File("sources/image.jpg"));
ImageWriter writer = ImageIO.getImageWritersByFormatName("JPEG").next();
JPEGImageWriteParam jpegParams = new JPEGImageWriteParam(null);
jpegParams.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
jpegParams.setCompressionQuality(1f);
writer.setOutput(ImageIO.createImageOutputStream(new File("example.jpg")));
writer.write(null, new IIOImage(image,null,null), jpegParams);
writer.dispose();
After this process the PSNR value I compute is 28.53173, and the size of "example.jpg" is bigger than that of "image.jpg".
I tried to import the JAI library, but I am not sure whether Java 8 supports JAI.
JPEG is lossy all the time. Even at 100% compression quality there will be some loss of information, but it will be minimised.
The reason why your example.jpg has a bigger size is that it was encoded with a 100% quality factor, while most JPEG encoders have a default value of 50%-75%, which is most likely what was used for the original image.jpg. You can try different quality factors to see when both files have the same size.
A lossless JPEG format does exist (available in JAI), but it should be thought of as a different format to the conventional JPEG. However, it's not widely used and you'll probably not be able to view it in most applications, which would practically defeat the point of sharing an innocent image.
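To see how much a quality setting of 1.0 actually changes the pixels, a PSNR check like the one in the question can be sketched as follows. This is only an illustration: JpegLossCheck, toJpeg and psnr are made-up names, the test image is synthetic, and the PSNR is computed over the three RGB channels, which may differ from how the question computed it.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

public class JpegLossCheck {

    /** Encodes the image as JPEG in memory at the given quality (0.0 to 1.0). */
    static byte[] toJpeg(BufferedImage image, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(baos)) {
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        return baos.toByteArray();
    }

    /** PSNR in dB over the R, G and B channels; infinite if the images are identical. */
    static double psnr(BufferedImage a, BufferedImage b) {
        long sumSq = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int p = a.getRGB(x, y);
                int q = b.getRGB(x, y);
                for (int shift = 0; shift <= 16; shift += 8) {
                    int d = ((p >> shift) & 0xFF) - ((q >> shift) & 0xFF);
                    sumSq += d * d;
                }
            }
        }
        if (sumSq == 0) {
            return Double.POSITIVE_INFINITY;
        }
        double mse = (double) sumSq / (3.0 * a.getWidth() * a.getHeight());
        return 10.0 * Math.log10(255.0 * 255.0 / mse);
    }

    public static void main(String[] args) throws IOException {
        // Smooth synthetic gradient; real photos will give different numbers
        BufferedImage img = new BufferedImage(32, 32, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 32; y++) {
            for (int x = 0; x < 32; x++) {
                img.setRGB(x, y, (x * 8) << 16 | (y * 8) << 8 | 128);
            }
        }
        BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(toJpeg(img, 1f)));
        System.out.println("PSNR at quality 1.0: " + psnr(img, decoded) + " dB");
    }
}
```

A finite PSNR here means some pixel values changed, which is what matters for steganography that hides data in pixel values: embedding has to survive, or happen after, the JPEG quantisation step.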

How to save photos in full resolution in android?

I have made an app that uses the camera to capture images. The images are passed back to my application as Bitmap. I want to know how to modify my code to save the Bitmap into JPEG format at its full resolution?
FileOutputStream out = new FileOutputStream(file);
bitmap.compress(Bitmap.CompressFormat.JPEG, 100, out);
out.flush();
out.close();
I think the Bitmap is being compressed into smaller size!
Compression refers to the reduction of physical disk space required to save the image. It doesn't automatically mean that the image quality or resolution is also reduced.
JPEG is part of a group of file formats that (mostly) belong to the lossy compression algorithms. In other words some minor image detail and quality is sacrificed to reduce the file size of the image, but it still wouldn't reduce the resolution of the image.
If you want to reduce the file size of the image, but don't want to lose any image quality, you need to use a file format which supports lossless compression. You can for example use Bitmap.CompressFormat.PNG.
WEBP supports both lossy and lossless compression (and often produces smaller files than PNG or JPG). But support for WEBP was only added in API level 14, so there might be some backwards compatibility problems. Use WEBP if possible, otherwise PNG if you care about image quality.
In any case let's look at the compress() method:
public boolean compress (Bitmap.CompressFormat format, int quality, OutputStream stream)
As you can see, you can choose the CompressFormat, pass in an OutputStream and pick a quality. The number you pass in as quality can be between 0 and 100 and is a hint to the compressor: 0 means smallest file size, 100 means highest quality. Note that for JPEG even a quality of 100 is not lossless; it only keeps the loss as small as the format allows, and the resolution is unaffected either way.
As an aside: Since PNG only supports lossless compression it will ignore the quality parameter completely and always save the image without reducing its quality!

How to prevent loss of image quality while using ImageIO.write() method?

I am dealing with images in Java, reading and writing them on my local disk. My problem is that when writing images I lose the quality of the image I read: the file size drops from 6.19 MB (the actual image size) to 1.22 MB (written using ImageIO.write()), which is a drastic loss. How can I prevent this loss when I use
ImageIO.write(image, "jpg", os);
to write the image?
Remember, I don't need any compression here. I just want to read the image and write the same image with the same quality and the same file size. I also tried
writer.write(null, new IIOImage(image, null, null), param);
but it increases my execution time and still performs compression.
Please help me out with this. Is there any chance to write an image losslessly using
ImageIO.write(image, "jpg", os);
or any other way?
Thanks in advance...!
You seem to be a little confused about JPEG "compression" and "quality" (and I partly blame the ImageIO API for that, as they named the setting "compressionQuality").
JPEG is always going to compress. That's the point of the format. If you don't want compression, JPEG is not the format you want to use. Try uncompressed TIFF or BMP. As the commenters have already said, normal JPEG is always going to be lossy (JPEG-LS and JPEG Lossless are really different algorithms btw).
If you still want to go with JPEG, here's some help. Unfortunately, there's no way to control the "compressionQuality" setting using:
ImageIO.write(image, "jpg", os);
You need to use the more verbose form:
ImageReader reader = ...;
reader.setInput(...);
IIOImage image = reader.readAll(0, null); // Important, also read metadata
RenderedImage renderedImage = image.getRenderedImage();
// Modify renderedImage as you like
ImageWriter writer = ImageIO.getImageWriter(reader);
ImageWriteParam param = writer.getDefaultWriteParam();
param.setCompressionMode(ImageWriteParam.MODE_COPY_FROM_METADATA); // This mode gives the closest match to the original compression
writer.setOutput(...);
writer.write(null, image, param); // Write image along with original meta data
Note: try/finally blocks, stream close() and reader/writer dispose() calls are omitted for clarity. You'll want them in real code. :-)
And this shouldn't really make your execution time noticeably longer, as ImageIO.write(...) uses similar code internally.
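For reference, here is one way the verbose form might look with the cleanup included. It is a sketch under assumptions: the class JpegReencode and the byte[] in, byte[] out signature are made up for illustration (the original elides the actual input and output), and no pixel modification is performed between reading and writing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;

public class JpegReencode {

    /** Reads an image with its metadata and writes it back reusing the original compression settings. */
    static byte[] reencode(byte[] imageBytes) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ImageInputStream in =
                 ImageIO.createImageInputStream(new ByteArrayInputStream(imageBytes))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No reader found for this data");
            }
            ImageReader reader = readers.next();
            ImageWriter writer = null;
            try {
                reader.setInput(in);
                IIOImage image = reader.readAll(0, null); // pixels plus metadata

                writer = ImageIO.getImageWriter(reader);
                if (writer == null) {
                    throw new IOException("No writer matches the reader's format");
                }
                ImageWriteParam param = writer.getDefaultWriteParam();
                // Take the compression settings from the metadata that was just read
                param.setCompressionMode(ImageWriteParam.MODE_COPY_FROM_METADATA);

                try (ImageOutputStream out = ImageIO.createImageOutputStream(result)) {
                    writer.setOutput(out);
                    writer.write(null, image, param);
                }
            } finally {
                reader.dispose();
                if (writer != null) {
                    writer.dispose();
                }
            }
        }
        return result.toByteArray();
    }
}
```

Even with the original settings copied, decoding and re-encoding a JPEG is still a lossy step, so the output is close to the input rather than identical to it.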
