How can I get image coordinates from a PDF into a JSON file?

I am writing code that builds an HTML page, including its images, from a page of a PDF document.
Using the PDFBox library I managed to extract the images from the PDF and place them in the HTML page, but I could not get the coordinates at which each image belongs on the page.
So I searched for a way to extract image coordinates from a PDF and tried it with PDFBox.
This is the code:
public static void main(String[] args) throws Exception
{
try
{
PDDocument document = PDDocument.load(
"/Users/tmdtjq/Downloads/PDFTest/test.pdf" );
PrintImageLocations printer = new PrintImageLocations();
List allPages = document.getDocumentCatalog().getAllPages();
for( int i=0; i<allPages.size(); i++ )
{
PDPage page = (PDPage)allPages.get( i );
int pageNum = i+1;
System.out.println( "Processing page: " + pageNum );
printer.processStream( page, page.findResources(),
page.getContents().getStream() );
}
}
finally
{
}
}
protected void processOperator( PDFOperator operator, List arguments ) throws IOException
{
String operation = operator.getOperation();
if( operation.equals( "Do" ) )
{
COSName objectName = (COSName)arguments.get( 0 );
Map xobjects = getResources().getXObjects();
PDXObject xobject = xobjects.get( objectName.getName() );
if( xobject instanceof PDXObjectImage )
{
try
{
PDXObjectImage image = (PDXObjectImage)xobject;
PDPage page = getCurrentPage();
Matrix ctm = getGraphicsState().getCurrentTransformationMatrix();
double rotationInRadians =(page.findRotation() * Math.PI)/180;
AffineTransform rotation = new AffineTransform();
rotation.setToRotation( rotationInRadians );
AffineTransform rotationInverse = rotation.createInverse();
Matrix rotationInverseMatrix = new Matrix();
rotationInverseMatrix.setFromAffineTransform( rotationInverse );
Matrix rotationMatrix = new Matrix();
rotationMatrix.setFromAffineTransform( rotation );
Matrix unrotatedCTM = ctm.multiply( rotationInverseMatrix );
float xScale = unrotatedCTM.getXScale();
float yScale = unrotatedCTM.getYScale();
float xPosition = unrotatedCTM.getXPosition();
float yPosition = unrotatedCTM.getYPosition();
System.out.println( "Found image[" + objectName.getName() + "] " +
"at " + xPosition + "," + yPosition +
" size=" + (xScale/100f*image.getWidth()) + "," + (yScale/100f*image.getHeight() ));
}
catch( NoninvertibleTransformException e )
{
throw new WrappedIOException( e );
}
}
}
}
The X,Y positions printed for every image are 0.0, 0.0.
I think this is because getGraphicsState() only returns the current graphics state.
But I want the coordinates of each image relative to the width and height of the PDF page, so that I can build the HTML page.
I think one solution would be to export the image coordinates from the PDF as JSON.
Please suggest a tool that exports image coordinates from a PDF to JSON, or a suitable PDF library.
(I have already used the pdf2json tool from FlexPaper. It produces a JSON file that contains only the text data (content, coordinates, font, ...) of a PDF page, not the image data.)

I was able to find the images by looking for the cm operator.
I overrode PDFTextStripper in the following way.
Note: it doesn't take rotation and mirroring into account!
public static class TextFinder extends PDFTextStripper {
public TextFinder() throws IOException {
super();
}
@Override
protected void startPage(PDPage page) throws IOException {
// process start of the page
super.startPage(page);
}
@Override
protected void processOperator(PDFOperator operator, List<COSBase> arguments)
throws IOException {
if ("cm".equals(operator.getOperation())) {
float width = ((COSNumber)arguments.get(0)).floatValue();
float height = ((COSNumber)arguments.get(3)).floatValue();
float x = ((COSNumber)arguments.get(4)).floatValue();
float y = ((COSNumber)arguments.get(5)).floatValue();
// process image coordinates
}
super.processOperator(operator, arguments);
}
@Override
protected void writeString(String text,
List<TextPosition> textPositions) throws IOException {
for (TextPosition position : textPositions) {
// process text coordinates
}
super.writeString(text, textPositions);
}
}
Of course, one can use PDFStreamEngine instead of PDFTextStripper, if one is not interested in finding text together with images.
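To use such values in an HTML page, the y axis has to be flipped, because PDF coordinates start at the bottom left of the page while HTML offsets start at the top left. The following is only a minimal sketch using the plain JDK: the class name, the JSON field names and the sample numbers are hypothetical, it assumes an unrotated image, and it assumes the image name contains no characters that need JSON escaping.

```java
import java.util.Locale;

public class ImagePlacementJson {

    /** Converts a PDF y coordinate (origin bottom left) into an HTML top offset (origin top left). */
    public static float toHtmlTop(float pageHeight, float y, float height) {
        return pageHeight - (y + height);
    }

    /** Serializes one image placement as a JSON object; all values are in PDF user space units. */
    public static String toJson(String name, int page, float pageHeight,
                                float x, float y, float width, float height) {
        float top = toHtmlTop(pageHeight, y, height);
        return String.format(Locale.ROOT,
                "{\"name\":\"%s\",\"page\":%d,\"left\":%.2f,\"top\":%.2f,\"width\":%.2f,\"height\":%.2f}",
                name, page, x, top, width, height);
    }

    public static void main(String[] args) {
        // Hypothetical values: a 200 x 100 image placed at (72, 600) on a page 792 units tall.
        System.out.println(toJson("Im0", 1, 792f, 72f, 600f, 200f, 100f));
    }
}
```

In the stripper above, x, y, width and height would come from the cm arguments, and the page height from the page's media box.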

Related

Detecting text field overflow

Assuming I have a PDF document with a text field with some font and size defined, is there a way to determine if some text will fit inside the field rectangle using PDFBox?
I'm trying to avoid cases where text is not fully displayed inside the field, so in case the text overflows given the font and size, I would like to change the font size to Auto (0).

This code recreates the appearance stream to be sure that it exists so that there is a bbox (which can be a little bit smaller than the rectangle).
public static void main(String[] args) throws IOException
{
// file can be found at https://issues.apache.org/jira/browse/PDFBOX-142
// https://issues.apache.org/jira/secure/attachment/12742551/Testformular1.pdf
try (PDDocument doc = PDDocument.load(new File("Testformular1.pdf")))
{
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
PDTextField field = (PDTextField) acroForm.getField("Name");
PDAnnotationWidget widget = field.getWidgets().get(0);
// force generation of appearance stream
field.setValue(field.getValue());
PDRectangle rectangle = widget.getRectangle();
PDAppearanceEntry ap = widget.getAppearance().getNormalAppearance();
PDAppearanceStream appearanceStream = ap.getAppearanceStream();
PDRectangle bbox = appearanceStream.getBBox();
float fieldWidth = Math.min(bbox.getWidth(), rectangle.getWidth());
String defaultAppearance = field.getDefaultAppearance();
System.out.println(defaultAppearance);
// Pattern must be improved, font may have numbers
// /Helv 12 Tf 0 g
final Pattern p = Pattern.compile("\\/([A-Za-z]+) (\\d+).+");
Matcher m = p.matcher(defaultAppearance);
if (!m.find() || m.groupCount() != 2)
{
System.out.println("oh-oh");
System.exit(-1);
}
String fontName = m.group(1);
int fontSize = Integer.parseInt(m.group(2));
PDResources resources = appearanceStream.getResources();
if (resources == null)
{
resources = acroForm.getDefaultResources();
}
PDFont font = resources.getFont(COSName.getPDFName(fontName));
float stringWidth = font.getStringWidth("Tilman Hausherr Tilman Hausherr");
System.out.println("stringWidth: " + stringWidth * fontSize / 1000);
System.out.println("field width: " + fieldWidth);
}
}
The output is:
/Helv 12 Tf 0 g
stringWidth: 180.7207
field width: 169.29082
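The check itself is plain arithmetic once the font and size are known, and the pattern can be made more tolerant. The sketch below uses only the JDK and is not PDFBox API: the class and method names are hypothetical, and glyphUnits stands for the value that font.getStringWidth returns (thousandths of a text space unit).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldFit {

    // Matches "/FontName size Tf"; font names may contain digits, sizes may be fractional.
    private static final Pattern DA_FONT =
            Pattern.compile("/(\\S+)\\s+(\\d+(?:\\.\\d+)?)\\s+Tf");

    /** Returns {fontName, fontSize} from a default appearance string, or null if none is found. */
    public static String[] parseFont(String defaultAppearance) {
        Matcher m = DA_FONT.matcher(defaultAppearance);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    /** Width of the text in user space units at the given font size. */
    public static float textWidth(float glyphUnits, float fontSize) {
        return glyphUnits * fontSize / 1000f;
    }

    /** True if the text is no wider than the field at the given font size. */
    public static boolean fits(float glyphUnits, float fontSize, float fieldWidth) {
        return textWidth(glyphUnits, fontSize) <= fieldWidth;
    }

    /** Largest font size at which the text still fits into the field width. */
    public static float largestFittingSize(float glyphUnits, float fieldWidth) {
        return fieldWidth * 1000f / glyphUnits;
    }
}
```

If fits(...) returns false, the font size could be set to 0 (Auto) as described in the question, or to a value at or below largestFittingSize(...).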

Get Images by rectangle

I have this method, which extracts the text in a specific area of a PDF page:
public static void getTextByRectangle(PDDocument doc,Rectangle rect) throws IOException{
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition( true );
stripper.addRegion( "class1", rect );
PDPage firstPage = doc.getPage(0);
stripper.extractRegions( firstPage );
System.out.println( "Text in the area:" + rect );
System.out.println( stripper.getTextForRegion( "class1" ) );
}
Is it possible to do the same thing for extracting images?

Yes, you can go through all images on the page and compare the position of each image with rect. The PrintImageLocations example from PDFBox shows how to get the image locations.
You need to create a class that extends PDFStreamEngine, like this:
public class PrintImageLocations extends PDFStreamEngine
Then override processOperator. From ctmNew you can get the image location; compare it with your rect to find the right image.
@Override
protected void processOperator(Operator operator, List<COSBase> operands) throws IOException {
String operation = operator.getName();
if ("Do".equals(operation)) {
COSName objectName = (COSName) operands.get(0);
PDXObject xobject = getResources().getXObject(objectName);
if (xobject instanceof PDImageXObject) {
PDImageXObject image = (PDImageXObject) xobject;
Matrix ctmNew = getGraphicsState().getCurrentTransformationMatrix();
float imageXScale = ctmNew.getScalingFactorX();
float imageYScale = ctmNew.getScalingFactorY();
// position in user space units. 1 unit = 1/72 inch at 72 dpi
System.out.println("position in PDF = " + ctmNew.getTranslateX() + ", " + ctmNew.getTranslateY() + " in user space units");
// displayed size in user space units
System.out.println("displayed size = " + imageXScale + ", " + imageYScale + " in user space units");
} else if (xobject instanceof PDFormXObject) {
PDFormXObject form = (PDFormXObject) xobject;
showForm(form);
}
} else {
super.processOperator(operator, operands);
}
}
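The comparison with rect can be sketched with the JDK geometry classes. This is an illustration under assumptions, not part of the PDFBox example: the class and method names are hypothetical, the six numbers are the CTM entries [a b c d e f], and the region must be given in the same bottom-left based coordinate system as the CTM. An image always fills the unit square, so transforming that square with the CTM gives the area the image covers, including rotated placements.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

public class ImageRegionCheck {

    /** Bounding box of an image drawn with the CTM [a b c d e f]. */
    public static Rectangle2D imageBounds(double a, double b, double c,
                                          double d, double e, double f) {
        AffineTransform ctm = new AffineTransform(a, b, c, d, e, f);
        Rectangle2D unitSquare = new Rectangle2D.Double(0, 0, 1, 1);
        return ctm.createTransformedShape(unitSquare).getBounds2D();
    }

    /** True if the image area overlaps the region of interest. */
    public static boolean isInRegion(Rectangle2D imageBounds, Rectangle2D region) {
        return region.intersects(imageBounds);
    }

    public static void main(String[] args) {
        // Hypothetical placement: a 200 x 100 image with its lower left corner at (72, 600).
        Rectangle2D bounds = imageBounds(200, 0, 0, 100, 72, 600);
        Rectangle2D region = new Rectangle2D.Double(50, 550, 100, 100);
        System.out.println(bounds + " in region: " + isInRegion(bounds, region));
    }
}
```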
Thanks to mkl and FiReTiTi for their advice.

PDFBox: put two A4 pages on one A3

I have a PDF document with one or more A4 pages.
The resulting PDF document should consist of A3 pages, each containing two of the original pages (odd on the left, even on the right).
I already managed to render the A4 pages into images, and the odd pages are placed correctly on the first half of the new A3 pages, but I cannot get the even pages placed.
public class CreateLandscapePDF {
public void renderPDF(File inputFile, String output) {
PDDocument docIn = null;
PDDocument docOut = null;
float width = 0;
float height = 0;
float posX = 0;
float posY = 0;
try {
docIn = PDDocument.load(inputFile);
PDFRenderer pdfRenderer = new PDFRenderer(docIn);
docOut = new PDDocument();
int pageCounter = 0;
for(PDPage pageIn : docIn.getPages()) {
pageIn.setRotation(270);
BufferedImage bufferedImage = pdfRenderer.renderImage(pageCounter);
width = bufferedImage.getHeight();
height = bufferedImage.getWidth();
PDPage pageOut = new PDPage(PDRectangle.A3);
PDImageXObject image = LosslessFactory.createFromImage(docOut, bufferedImage);
PDPageContentStream contentStream = new PDPageContentStream(docOut, pageOut, AppendMode.APPEND, true, true);
if((pageCounter & 1) == 0) {
pageOut.setRotation(90);
docOut.addPage(pageOut);
posX = 0;
posY = 0;
} else {
posX = 0;
posY = width;
}
contentStream.drawImage(image, posX, posY);
contentStream.close();
bufferedImage.flush();
pageCounter++;
}
docOut.save(output + "\\LandscapeTest.pdf");
docOut.close();
docIn.close();
} catch(IOException io) {
io.printStackTrace();
}
}
}
I'm using Apache PDFBox 2.0.2 (pdfbox-app-2.0.2.jar)

Thank you very much for your help and the link to the other question. I think I had already read it but wasn't able to use it in my code yet.
In the end PDF Clown did the job, though I think it's not very nice to use PDFBox and PDF Clown in the same program.
Anyway, here's my working code to combine A4 pages on A3 paper.
public class CombinePages {
public void run(String input, String output) {
try {
Document source = new File(input).getDocument();
Pages sourcePages = source.getPages();
Document target = new File().getDocument();
Page targetPage = null;
int pageCounter = 0;
double moveByX = .0;
for(Page sourcePage : source.getPages()) {
if((pageCounter & 1) == 0) {
//even page gets a blank page
targetPage = new Page(target);
target.setPageSize(PageFormat.getSize(PageFormat.SizeEnum.A3, PageFormat.OrientationEnum.Landscape));
target.getPages().add(targetPage);
moveByX = .0;
} else {
moveByX = .50;
}
//get content from source page
XObject xObject = sourcePages.get(pageCounter).toXObject(target);
PrimitiveComposer composer = new PrimitiveComposer(targetPage);
Dimension2D targetSize = targetPage.getSize();
Dimension2D sourceSize = xObject.getSize();
composer.showXObject(xObject, new Point2D.Double(targetSize.getWidth() * moveByX, targetSize.getHeight() * .0), new Dimension(sourceSize.getWidth(), sourceSize.getHeight()), XAlignmentEnum.Left, YAlignmentEnum.Top, 0);
composer.flush();
pageCounter++;
}
target.getFile().save(output + "\\CombinePages.pdf", SerializationModeEnum.Standard);
source.getFile().close();
} catch (FileNotFoundException fnf) {
log.error(fnf);
} catch (IOException io) {
log.error(io);
}
}
}
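Independent of the library, the placement comes down to two small calculations: which A3 sheet a source page lands on, and whether it goes on the left or right half. A minimal sketch with hypothetical names, using zero-based page indexes and sizes derived from the paper dimensions in millimetres at 72 units per inch:

```java
public class TwoUpLayout {

    private static final double POINTS_PER_MM = 72.0 / 25.4;

    public static final double A4_WIDTH = 210 * POINTS_PER_MM;
    public static final double A4_HEIGHT = 297 * POINTS_PER_MM;
    public static final double A3_LANDSCAPE_WIDTH = 420 * POINTS_PER_MM;

    /** Number of A3 sheets needed for the given number of A4 pages. */
    public static int sheetCount(int pageCount) {
        return (pageCount + 1) / 2;
    }

    /** Index of the A3 sheet that receives the given zero-based A4 page. */
    public static int sheetIndex(int pageIndex) {
        return pageIndex / 2;
    }

    /** Horizontal offset on the sheet: pages 1, 3, 5, ... go left, pages 2, 4, 6, ... go right. */
    public static double xOffset(int pageIndex) {
        return (pageIndex % 2 == 0) ? 0 : A3_LANDSCAPE_WIDTH / 2;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("page " + (i + 1) + " -> sheet " + (sheetIndex(i) + 1)
                    + ", x offset " + xOffset(i));
        }
    }
}
```

A new target page is only needed when xOffset returns 0; the following source page is then drawn onto the same target page at half the sheet width.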

pdfBox add different lines to pdf

I'm looking into generating a pdf-document. At the moment I'm trying out different approaches. I want to get more than one line in a pdf-document. Using a HelloWorld code example I came up with ...
package org.apache.pdfbox.examples.pdmodel;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
/**
* Creates a "Hello World" PDF using the built-in Helvetica font.
*
* The example is taken from the PDF file format specification.
*/
public final class HelloWorld
{
private HelloWorld()
{
}
public static void main(String[] args) throws IOException
{
String filename = "line.pdf";
String message = "line";
PDDocument doc = new PDDocument();
try
{
PDPage page = new PDPage();
doc.addPage(page);
PDFont font = PDType1Font.HELVETICA_BOLD;
PDPageContentStream contents = new PDPageContentStream(doc, page);
contents.beginText();
contents.setFont(font, 12);
// Loop to create 25 lines of text
for (int i = 0; i < 25; i++) {
int ty = 700 + i * 15;
contents.newLineAtOffset(100, ty);
//contents.newLineAtOffset(125, ty);
//contents.showText(Integer.toString(i));
contents.showText(message + " " + Integer.toString(i));
System.out.println(message + " " + Integer.toString(i));
}
contents.endText();
contents.close();
doc.save(filename);
}
finally
{
doc.close();
System.out.println("HelloWorld finished after 'doc.close()'.");
}
}
}
But looking at my resulting document I only see "line 0" once, and no other lines. What am I doing wrong?

Your issue is that you think PDPageContentStream.newLineAtOffset uses absolute coordinates. This is not the case, it uses relative coordinates, cf. the JavaDocs:
/**
* The Td operator.
* Move to the start of the next line, offset from the start of the current line by (tx, ty).
*
* @param tx The x translation.
* @param ty The y translation.
* @throws IOException If there is an error writing to the stream.
* @throws IllegalStateException If the method was not allowed to be called at this time.
*/
public void newLineAtOffset(float tx, float ty) throws IOException
So your additional lines are way off the visible page area.
Thus, you might want to do something like this:
...
contents.beginText();
contents.setFont(font, 12);
contents.newLineAtOffset(100, 700);
// Loop to create 25 lines of text
for (int i = 0; i < 25; i++) {
contents.showText(message + " " + Integer.toString(i));
System.out.println(message + " " + Integer.toString(i));
contents.newLineAtOffset(0, -15);
}
contents.endText();
...
Here you start at 100, 700 and move down for each line by 15.
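The effect of relative offsets can be traced without PDFBox. The sketch below is a hypothetical stand-in that only mimics the bookkeeping of the Td operator (each offset is added to the start of the current line); it compares a loop that passes absolute-looking values on every call with the loop shown above.

```java
import java.util.ArrayList;
import java.util.List;

public class TextCursorDemo {

    private float x;
    private float y;

    /** Mimics Td: the offset is added to the start of the current line. */
    public void newLineAtOffset(float tx, float ty) {
        x += tx;
        y += ty;
    }

    /** Line starts when every call passes (100, 700 + i * 15) as if it were absolute. */
    public static List<float[]> mistakenLoop(int lines) {
        TextCursorDemo cursor = new TextCursorDemo();
        List<float[]> starts = new ArrayList<>();
        for (int i = 0; i < lines; i++) {
            cursor.newLineAtOffset(100, 700 + i * 15);
            starts.add(new float[] { cursor.x, cursor.y });
        }
        return starts;
    }

    /** Line starts when the cursor is positioned once and then moved down by 15 per line. */
    public static List<float[]> relativeLoop(int lines) {
        TextCursorDemo cursor = new TextCursorDemo();
        List<float[]> starts = new ArrayList<>();
        cursor.newLineAtOffset(100, 700);
        for (int i = 0; i < lines; i++) {
            starts.add(new float[] { cursor.x, cursor.y });
            cursor.newLineAtOffset(0, -15);
        }
        return starts;
    }

    public static void main(String[] args) {
        for (float[] p : mistakenLoop(3)) {
            System.out.println("mistaken: " + p[0] + ", " + p[1]);
        }
        for (float[] p : relativeLoop(3)) {
            System.out.println("relative: " + p[0] + ", " + p[1]);
        }
    }
}
```

With the mistaken loop the second line already starts at (200, 1415), far above the top of the page, which is why only the first line is visible.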

In addition to mkl's answer you could also create a new text operation for each line. Doing that will enable you to use absolute coordinates.
...
contents.setFont(font, 12);
// Loop to create 25 lines of text
for (int i = 0; i < 25; i++) {
int ty = 700 - i * 15;
contents.beginText();
contents.newLineAtOffset(100, ty);
contents.showText(message + " " + Integer.toString(i));
System.out.println(message + " " + Integer.toString(i));
contents.endText();
}
...
Whether you need this or not depends on your use case.
For example I wanted to write some text right aligned. In that case it was easier to use absolute position, so I created a helper method like this:
public static void showTextRightAligned(PDPageContentStream contentStream, PDType1Font font, int fontsize, float rightX, float topY, String text) throws IOException
{
float textWidth = fontsize * font.getStringWidth(text) / 1000;
float leftX = rightX - textWidth;
contentStream.beginText();
contentStream.newLineAtOffset(leftX, topY);
contentStream.showText(text);
contentStream.endText();
}

You could do something like this:
contentStream.beginText();
// Start the text cursor near the top left of the page
contentStream.newLineAtOffset(20,750);
contentStream.setFont(PDType1Font.TIMES_ROMAN,8);
for (String readList : resultList) {
contentStream.showText(readList);
// Move the cursor down by 12 pt on every pass through the loop
contentStream.newLineAtOffset(0,-12);
}
contentStream.endText();

Loading an animated image to a BufferedImage array

I'm trying to implement animated textures into an OpenGL game seamlessly. I made a generic ImageDecoder class to translate any BufferedImage into a ByteBuffer. It works perfectly for now, though it doesn't load animated images.
I'm not trying to load an animated image as an ImageIcon. I need the BufferedImage to get an OpenGL-compliant ByteBuffer.
How can I load every frame of an animated image into a BufferedImage array?
On a similar note, how can I get the animation rate / period?
Does Java handle APNG?

The following code is an adaptation of my own implementation to accommodate the "into array" part.
The problem with GIFs is that there are different disposal methods, all of which have to be considered if you want this to work with every file. The code below tries to compensate for that. For example, there is a special implementation for the "doNotDispose" mode, which takes all frames from the start to N and paints them on top of each other into a BufferedImage.
The advantage of this method over the one posted by chubbsondubs is that it does not have to wait for the gif animation delays, but can be done basically instantly.
BufferedImage[] array = null;
ImageInputStream imageInputStream = ImageIO.createImageInputStream(new ByteArrayInputStream(data)); // or any other source stream
Iterator<ImageReader> imageReaders = ImageIO.getImageReaders(imageInputStream);
while (imageReaders.hasNext())
{
ImageReader reader = (ImageReader) imageReaders.next();
try
{
reader.setInput(imageInputStream);
int frames = reader.getNumImages(true);
array = new BufferedImage[frames];
for (int frameId = 0; frameId < frames; frameId++)
{
BufferedImage image;
int w = reader.getWidth(0);
int h = reader.getHeight(0);
int fw = reader.getWidth(frameId);
int fh = reader.getHeight(frameId);
if (h != fh || w != fw)
{
GifMeta gm = getGifMeta(reader.getImageMetadata(frameId));
// disposalMethodNames: "none", "doNotDispose","restoreToBackgroundColor","restoreToPrevious",
if ("doNotDispose".equals(gm.disposalMethod))
{
image = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
Graphics2D g = (Graphics2D) image.getGraphics();
for (int f = 0; f <= frameId; f++)
{
gm = getGifMeta(reader.getImageMetadata(f));
if ("doNotDispose".equals(gm.disposalMethod))
{
g.drawImage(reader.read(f), null, gm.imageLeftPosition, gm.imageTopPosition);
}
else
{
// XXX "Unimplemented disposalMethod (" + getName() + "): " + gm.disposalMethod);
}
}
g.dispose();
}
else
{
image = reader.read(frameId);
// XXX "Unimplemented disposalMethod (" + getName() + "): " + gm.disposalMethod;
}
}
else
{
image = reader.read(frameId);
}
if (image == null)
{
throw new NullPointerException();
}
array[frameId] = image;
}
}
finally
{
reader.dispose();
}
}
return array;
private final static class GifMeta
{
String disposalMethod = "none";
int imageLeftPosition = 0;
int imageTopPosition = 0;
int delayTime = 0;
}
private GifMeta getGifMeta(IIOMetadata meta)
{
GifMeta gm = new GifMeta();
final IIOMetadataNode gifMeta = (IIOMetadataNode) meta.getAsTree("javax_imageio_gif_image_1.0");
NodeList childNodes = gifMeta.getChildNodes();
for (int i = 0; i < childNodes.getLength(); ++i)
{
IIOMetadataNode subnode = (IIOMetadataNode) childNodes.item(i);
if (subnode.getNodeName().equals("GraphicControlExtension"))
{
gm.disposalMethod = subnode.getAttribute("disposalMethod");
gm.delayTime = Integer.parseInt(subnode.getAttribute("delayTime"));
}
else if (subnode.getNodeName().equals("ImageDescriptor"))
{
gm.imageLeftPosition = Integer.parseInt(subnode.getAttribute("imageLeftPosition"));
gm.imageTopPosition = Integer.parseInt(subnode.getAttribute("imageTopPosition"));
}
}
return gm;
}
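For the animation rate, the delayTime attribute collected above can be read for every frame. The following self-contained sketch uses only javax.imageio; the class and method names are hypothetical. GIF stores the delay in hundredths of a second, so the value is multiplied by 10 to get milliseconds, and a frame without a GraphicControlExtension is reported as 0.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageInputStream;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class GifFrameDelays {

    /** Returns the delay of each frame in milliseconds. */
    public static List<Integer> frameDelaysMillis(byte[] gifData) throws IOException {
        List<Integer> delays = new ArrayList<>();
        try (ImageInputStream in =
                     ImageIO.createImageInputStream(new ByteArrayInputStream(gifData))) {
            Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("gif");
            if (!readers.hasNext()) {
                throw new IOException("No GIF reader available");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                int frames = reader.getNumImages(true);
                for (int i = 0; i < frames; i++) {
                    delays.add(delayMillis(reader.getImageMetadata(i)));
                }
            } finally {
                reader.dispose();
            }
        }
        return delays;
    }

    /** Reads delayTime from the frame's GraphicControlExtension, or returns 0 if there is none. */
    static int delayMillis(IIOMetadata meta) {
        Node root = meta.getAsTree("javax_imageio_gif_image_1.0");
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if ("GraphicControlExtension".equals(child.getNodeName())) {
                Node delay = child.getAttributes().getNamedItem("delayTime");
                if (delay != null) {
                    return Integer.parseInt(delay.getNodeValue()) * 10;
                }
            }
        }
        return 0;
    }
}
```

Many viewers substitute a default delay when a file specifies 0, so a renderer built on these values may want to do the same.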

I don't think Java supports APNG by default, but you can use a third-party library to parse it:
http://code.google.com/p/javapng/source/browse/trunk/javapng2/src/apng/com/sixlegs/png/AnimatedPngImage.java?r=300
That might be your easiest method. As for getting the frames from an animated gif you have to register an ImageObserver:
new ImageIcon( url ).setImageObserver( new ImageObserver() {
public boolean imageUpdate( Image img, int infoFlags, int x, int y, int width, int height ) {
if( (infoFlags & ImageObserver.FRAMEBITS) == ImageObserver.FRAMEBITS ) {
// another frame was loaded do something with it.
}
// returning true asks for further updates
return true;
}
});
This loads asynchronously on another thread so imageUpdate() won't be called immediately. But it will be called for each frame as it parses it.
http://docs.oracle.com/javase/1.4.2/docs/api/java/awt/image/ImageObserver.html
