PDFBox extracted image is much bigger than original page - java

I have an image-only PDF file that looks like a scan of a really big page. Preview shows me that it is about 42x30 inches, and 3047x2160 pixels. I guess it was scanned at 72dpi resolution.
I'm extracting this image with PDFBox by looking for instances of PDImageXObject, similar to https://stackoverflow.com/a/37664125/10026.
However, for this image, PDImageXObject.getWidth() and PDImageXObject.getHeight() give me 16928 and 12000, respectively. When I call PDImageXObject.getImage(), it creates an enormous BufferedImage in memory.
Is there a better way to get the image out so that it keeps the original pixel size?
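A quick arithmetic check may help explain the mismatch. The sketch below assumes that the 3047x2160 figure reported by Preview is the page size in PDF points (1/72 inch, the default PDF user space unit) rather than a pixel count; the class and method names are made up for illustration. Under that assumption the embedded image works out to roughly 400 dpi, which would mean 16928x12000 is the real pixel size of the scan.

```java
final class DpiEstimate {
    /** Default PDF user space: 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Pixels per inch of an image stretched across a page dimension given in points. */
    static double effectiveDpi(int imagePixels, double pagePoints) {
        double inches = pagePoints / POINTS_PER_INCH;
        return imagePixels / inches;
    }

    public static void main(String[] args) {
        // Numbers taken from the question: image 16928x12000, page 3047x2160.
        System.out.println("horizontal dpi: " + effectiveDpi(16928, 3047));
        System.out.println("vertical dpi:   " + effectiveDpi(12000, 2160));
    }
}
```

If that reading is right, a subsampling factor of about 5 or 6 would bring the image down to something close to the 3047x2160 size.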

Related

Extract image into a file from PDImageXObject without loading it into memory

This is related to How to extract image bytes out of PDF efficiently, but I'll try to restate the problem differently so it's less about PDF parsing and more about image processing.
I'm using PDFBox to extract images out of PDF files. There's a class, PDImageXObject, that represents the image inside the PDF and contains image metadata (height, width, etc.). It exposes two APIs for pulling out the image: BufferedImage getImage() and BufferedImage getImage(Rectangle rect, int subsampling).
The current code is straightforward:
BufferedImage image = pdImage.getImage();
ImageIO.write(image, "jpg", baos);
However, for a large image, I'm having an issue with memory usage, as BufferedImage is storing uncompressed image data in memory, which is a lot bigger than the compressed result.
Is there a way to avoid loading the whole image into memory by breaking it up into tiles (e.g. 1024x1024) and iterating over them using the getImage signature that takes Rectangle? I'm seeing some promising information about JAI being able to use Tiles to output a compressed image without loading the uncompressed content into memory at once, but I don't understand how to tie it together with what I have from PDImageXObject. Or is there another way to do it? Is JAI still an active project?
By the way, the purpose of extracting the image is to feed it into the next component in the pipeline, which can handle multiple image formats. So, if some format other than jpg is better suited for tiled processing, that should be OK.
I'm aware of one possibility using something like BigBufferedImage. But I was thinking processing a Tile at a time looked promising.
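As a starting point for the tile idea, here is a minimal sketch of how the tile rectangles themselves could be computed. It uses only java.awt.Rectangle; TileGrid and tiles are hypothetical names. Each rectangle could then be passed to the getImage(Rectangle, int) overload mentioned above (with a subsampling of 1), so only one tile is decoded into a BufferedImage at a time. How to stitch the tiles into one compressed output file is not covered here.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

final class TileGrid {
    /** Splits an image area into tiles, row by row; edge tiles are clipped to the image. */
    static List<Rectangle> tiles(int imageWidth, int imageHeight, int tileSize) {
        if (tileSize <= 0) {
            throw new IllegalArgumentException("tileSize must be positive");
        }
        List<Rectangle> result = new ArrayList<>();
        for (int y = 0; y < imageHeight; y += tileSize) {
            int h = Math.min(tileSize, imageHeight - y);
            for (int x = 0; x < imageWidth; x += tileSize) {
                int w = Math.min(tileSize, imageWidth - x);
                result.add(new Rectangle(x, y, w, h));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Image size from the first question, tile size from this one.
        List<Rectangle> grid = tiles(16928, 12000, 1024);
        System.out.println(grid.size() + " tiles, last one: " + grid.get(grid.size() - 1));
    }
}
```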
OK, I found a library: Commons Imaging. Its Imaging class may be able to help you.
I think you can also try the createInputStream() method to find out the size of the real data (its length in bytes).
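If a stream of still-encoded image bytes (for example a JPEG) can be obtained, one option that may be worth trying is to let ImageIO subsample while decoding, so the full-size raster is never handed back as a BufferedImage. The sketch below uses only javax.imageio; SubsampledRead and readSubsampled are hypothetical names, and whether the stream you get from PDFBox actually contains a complete encoded image file is an assumption that has to be checked for your PDF.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

final class SubsampledRead {
    /** Decodes an encoded image, keeping only every step-th pixel in each direction. */
    static BufferedImage readSubsampled(InputStream encoded, int step) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(encoded)) {
            if (in == null) {
                throw new IOException("Could not wrap the stream for ImageIO");
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader recognises this stream");
            }
            ImageReader reader = readers.next();
            try {
                // Forward-only, metadata ignored: we only want the pixels.
                reader.setInput(in, true, true);
                ImageReadParam param = reader.getDefaultReadParam();
                param.setSourceSubsampling(step, step, 0, 0);
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        }
    }
}
```

With a step of 4, for instance, the returned image has about a quarter of the width and height of the source.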

Rotate images without saving the whole image in memory

I'm saving a large PNG file (40000 x 3000) using PNGJ library. Now I need to rotate the image 90 degrees to the right without saving the whole image in memory. PNGJ library is limited to write images line by line, so I can't rotate each line and write the image column by column.
Is there any way to do that?
PNGJ library is limited to write images line by line
Actually, it's the PNG format that's line-oriented. And you can't read a single pixel of a PNG image without reading all the "previous" pixels. So, I guess you are out of luck.
The best you can do, I think, if you cannot store the full image in memory, is to load and write it in K horizontal stripes. You fill the first stripe by reading the full image (you only store the first pixels of each row, which correspond to the pixels of the first horizontal stripe of the rotated image, discarding the rest), write it, and read the file again to fill and write the second stripe, etc.
This involves K readings of the original file (of course, you should make the stripe as thick as your memory permits, so as to make K small). I hope you get the idea.
You can do that with PNGJ.
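The stripe idea can be sketched without any PNG library. The code below is only an illustration of the index arithmetic: RowSource is a hypothetical stand-in for a line-oriented decoder, pixels are plain ints, and the output rows are collected in a list where real code would hand each row to a line-by-line writer. For a 90 degree clockwise rotation, row r of the output is column r of the source read from bottom to top, so each stripe of output rows needs one full pass over the source.

```java
import java.util.ArrayList;
import java.util.List;

final class StripeRotate {
    /** Stand-in for a line-oriented decoder: rows are requested top to bottom. */
    interface RowSource {
        int width();
        int height();
        int[] row(int y);
    }

    /** Wraps an in-memory array, for demonstration only. */
    static RowSource fromArray(int[][] pixels) {
        return new RowSource() {
            public int width() { return pixels[0].length; }
            public int height() { return pixels.length; }
            public int[] row(int y) { return pixels[y]; }
        };
    }

    /** Rotates 90 degrees clockwise, holding at most stripeRows output rows at once. */
    static List<int[]> rotateClockwise(RowSource src, int stripeRows) {
        int w = src.width();
        int h = src.height();
        List<int[]> out = new ArrayList<>();
        for (int r0 = 0; r0 < w; r0 += stripeRows) {
            int rows = Math.min(stripeRows, w - r0);
            int[][] stripe = new int[rows][h];
            // One full pass over the source per stripe (the "K readings").
            for (int y = 0; y < h; y++) {
                int[] line = src.row(y);
                for (int i = 0; i < rows; i++) {
                    // Source column r0+i becomes output row r0+i, bottom row first.
                    stripe[i][h - 1 - y] = line[r0 + i];
                }
            }
            for (int[] destRow : stripe) {
                out.add(destRow); // real code would write the row and drop it
            }
        }
        return out;
    }
}
```

Memory use is stripeRows times the source height, so a thicker stripe trades memory for fewer passes over the file.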

Jpeg to Svg and Image Tracing

Currently we have a requirement where we have an image depicting the blueprint of the mall (red specifies the booked up areas and white specifies the available areas) and the image is available in a raster (JPEG) format.
We would like to drag and drop some icons onto the available areas of the image (in white). The image should also support zooming in and out.
Since JPEG is a raster format, zooming beyond a certain limit results in a jagged image. One proposed solution is to convert the image to SVG (Scalable Vector Graphics).
Going with the expanded form of SVG, it simply tells us that image is:
s=>scalable (i.e. you can zoom to any level without compromising the quality)
v=>vectorized (i.e co-ordinates are available)
So by simply looking at the XML format of the image, we can decide whether to allow dropping an object at fill=red or fill=white, where red and white are the two colors in the image. This might not be an appropriate solution, but I'm just guessing it this way.
Now the problems I see with this approach are:
Converting an image with some open source tool (Inkscape) - if we trace it with Inkscape, which uses potrace internally to trace the image, it can handle only black and white colors.
Note: most of the tools come with some license.
The problem with Inkscape is that it embeds the image into the SVG and does not create the coordinates. If you trace it with Inkscape, it only creates the outline of the image.
Converting it with some online utility - Not recommended in our case, but doing so results in a large size of the SVG image. For a 700 KB file, the SVG generated is about 39 MB, which when opened up on a browser crashes the browser.
Most of the time when the image is converted to an SVG, it becomes way too large, which is a big factor to worry about. There are utilities available like Gzip to compress files, but this is a two-step route - first you convert, then you compress.
Using Delineate (which employs the potrace and autotrace engines) - the quality of the image produced is not good.
Using Java code - Again the quality suffers. Java graphics are not fully developed to handle the conversion (size is again way too large)
Converting the image to PDF, then to SVG - this also embeds the image into the SVG file, which is useless as no co-ordinates are available
Does anybody have any idea how to deal with this situation? Can we handle the drag and drop on the raster (JPEG, PNG, etc.) image itself?
Thanks
Dishant Anand
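On the last question, one simple approach that might be enough is to check the pixel under the drop point directly on the raster image. The sketch below is only an illustration under the assumptions stated in the question (white means available, red means booked); DropZone, isAvailable and the threshold parameter are hypothetical, and the threshold exists because JPEG compression rarely leaves white areas at exactly 255.

```java
import java.awt.image.BufferedImage;

final class DropZone {
    /**
     * Returns true if the pixel at (x, y) is close enough to white.
     * minChannel is the lowest value each of R, G and B may have, e.g. 200.
     */
    static boolean isAvailable(BufferedImage plan, int x, int y, int minChannel) {
        if (x < 0 || y < 0 || x >= plan.getWidth() || y >= plan.getHeight()) {
            return false; // outside the blueprint
        }
        int rgb = plan.getRGB(x, y);
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return r >= minChannel && g >= minChannel && b >= minChannel;
    }
}
```

When the view is zoomed, the drop coordinates would first have to be mapped back to the coordinates of the unscaled image.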

Java-generated PDF renders fine on screen, but does not print correctly

I'm generating a complex PDF from a Swing application by printing my JComponent.
This PDF is created by getting a proxy Graphics2D object from an iText PdfTemplate object.
The PDF is viewable on-screen, but when printed in banner mode on a Lexmark 4650 from Windows, printing cuts off after the fourth page, with most items not being printed.
Is there a good way to look at the contents of the PDF to see if something is out of line? The PDF seems to be larger than it needs to be, given the information it contains.
Or, a way to get a useful error from the Lexmark printer?
EDIT 2011-10-18 13:45:00 PST: replaced PDF with a smaller version with less PDF shape data. Still not printing correctly.
Here is the output from the printer. You can see that printout cuts off shortly after the 410 depth.
We have seen printing fail when there is not enough memory - printing needs a much bigger raster than screen. Does increasing memory help?
There was a rendering issue where a line was being drawn to a coordinate of Integer.MIN_VALUE; this made the printer very unhappy.

Java Image Quality (JPEG)

Is it possible to get the current quality of an existing image?
I want to load a JPEG and save it again without any change in quality and DPI. (I need to do some pixel manipulation before saving)
JPEG is a lossy format.
The direct way to do this (read the image, do what you need to do, re-encode the image) will result in the image being slightly degraded.
That said, if that is fine, then you need to know that quality in JPEG encoding determines how much information to keep about contrast. The lower the quality, the less sharp a transition you can have. In other words, this is not a single setting stored in the JPEG file, but a setting that determines the amount of image data saved.
What you can do is to say that the final image needs to be around the same size as the original. You can then encode the result at different quality settings and choose the one giving the image size you want.
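The "encode at different quality settings and compare sizes" idea could look roughly like this with plain javax.imageio. JpegQualitySearch, encode and closestQuality are hypothetical names, the 0.05 step is an arbitrary choice, and the image is assumed to have no alpha channel (for example TYPE_INT_RGB), since the standard JPEG writer does not handle alpha well.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

final class JpegQualitySearch {
    /** Encodes the image as JPEG at the given quality (0.0 to 1.0). */
    static byte[] encode(BufferedImage image, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /** Tries qualities 0.05, 0.10, ... 1.0 and returns the one whose output is closest to targetBytes. */
    static float closestQuality(BufferedImage image, long targetBytes) throws IOException {
        float best = 0.05f;
        long bestDiff = Long.MAX_VALUE;
        for (int step = 1; step <= 20; step++) {
            float quality = step / 20f;
            long diff = Math.abs(encode(image, quality).length - targetBytes);
            if (diff < bestDiff) {
                bestDiff = diff;
                best = quality;
            }
        }
        return best;
    }
}
```

Here targetBytes would be the size of the original file. A matching file size is only a rough proxy for the original quality setting, and the re-encoded image is still a second generation of lossy compression.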
