pdfBox add different lines to pdf - java

I'm looking into generating a pdf-document. At the moment I'm trying out different approaches. I want to get more than one line in a pdf-document. Using a HelloWorld code example I came up with ...
package org.apache.pdfbox.examples.pdmodel;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
/**
* Creates a "Hello World" PDF using the built-in Helvetica font.
*
* The example is taken from the PDF file format specification.
*/
public final class HelloWorld
{
private HelloWorld()
{
}
public static void main(String[] args) throws IOException
{
String filename = "line.pdf";
String message = "line";
PDDocument doc = new PDDocument();
try
{
PDPage page = new PDPage();
doc.addPage(page);
PDFont font = PDType1Font.HELVETICA_BOLD;
PDPageContentStream contents = new PDPageContentStream(doc, page);
contents.beginText();
contents.setFont(font, 12);
// Loop to create 25 lines of text
for (int y = 0; y< 25; y++) {
int ty = 700 + y * 15;
contents.newLineAtOffset(100, ty);
//contents.newLineAtOffset(125, ty);
//contents.showText(Integer.toString(i));
contents.showText(message + " " + Integer.toString(i));
System.out.println(message + " " + Integer.toString(i));
}
contents.endText();
contents.close();
doc.save(filename);
}
finally
{
doc.close();
System.out.println("HelloWorld finished after 'doc.close()'.");
}
}
}
But looking at my resulting document I only see "line 0" once, and no other lines. What am I doing wrong?

Your issue is that you think PDPageContentStream.newLineAtOffset uses absolute coordinates. This is not the case, it uses relative coordinates, cf. the JavaDocs:
/**
* The Td operator.
* Move to the start of the next line, offset from the start of the current line by (tx, ty).
*
* #param tx The x translation.
* #param ty The y translation.
* #throws IOException If there is an error writing to the stream.
* #throws IllegalStateException If the method was not allowed to be called at this time.
*/
public void newLineAtOffset(float tx, float ty) throws IOException
So your additional lines are way off the visible page area.
Thus, you might want to something like this:
...
contents.beginText();
contents.setFont(font, 12);
contents.newLineAtOffset(100, 700);
// Loop to create 25 lines of text
for (int i = 0; i < 25; i++) {
contents.showText(message + " " + Integer.toString(i));
System.out.println(message + " " + Integer.toString(i));
contents.newLineAtOffset(0, -15);
}
contents.endText();
...
Here you start at 100, 700 and move down for each line by 15.

In addition to mkl's answer you could also create a new text operation for each line. Doing that will enable you to use absolute coordinates.
...
contents.setFont(font, 12);
// Loop to create 25 lines of text
for (int i = 0; i < 25; i++) {
int ty = 700 + y * 15;
contents.beginText();
contents.newLineAtOffset(100, ty);
contents.showText(message + " " + Integer.toString(i));
System.out.println(message + " " + Integer.toString(i))
contents.endText();
}
...
Whether you need this or not depends on your usecase.
For example I wanted to write some text right aligned. In that case it was easier to use absolute position, so I created a helper method like this:
public static void showTextRightAligned(PDPageContentStream contentStream, PDType1Font font, int fontsize, float rightX, float topY, String text) throws IOException
{
float textWidth = fontsize * font.getStringWidth(text) / 1000;
float leftX = rightX - textWidth;
contentStream.beginText();
contentStream.newLineAtOffset(leftX, topY);
contentStream.showText(text);
contentStream.endText();
}

You could do something like this:
contentStream.beginText();
contentStream.newLineAtOffset(20,750);
//This begins the cursor at top right
contentStream.setFont(PDType1Font.TIMES_ROMAN,8);
for (String readList : resultList) {
contentStream.showText(readList);
contentStream.newLineAtOffset(0,-12);
//This will move cursor down by 12pts on every run of loop
}

Related

Negative X or Y obtained from PdfBox while extracting text position

I am trying to extract all text in a pdf along with their coordinates.
I am using Apache PDFBox 2.0.8 and following the sample program DrawPrintTextLocations .
It seems to work mostly, but for certain pdf-s i get negative values for the x and y coordinates of the bounding boxes. Refer this pdf file for example.
My app assumes the coordinate system as a normal pdf (x goes from left to right an y goes top to bottom). so these are throwing my computations off.
Below is the relevant piece of code.
import org.apache.fontbox.util.BoundingBox;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType3Font;
import org.apache.pdfbox.pdmodel.interactive.pagenavigation.PDThreadBead;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
import javax.imageio.ImageIO;
import java.awt.*;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;
import java.io.*;
import java.util.List;
/**
* This is an example on how to get some x/y coordinates of text and to show them in a rendered
* image.
*
* #author Ben Litchfield
* #author Tilman Hausherr
*/
public class DrawPrintTextLocations extends PDFTextStripper {
private AffineTransform flipAT;
private AffineTransform rotateAT;
private AffineTransform transAT;
private final float DPI = 200.0f;
private final double PT2PX = DPI / 72.0;
private final AffineTransform dpiAT = AffineTransform.getScaleInstance(PT2PX, PT2PX);
private final String filename;
static final int SCALE = 1;
private Graphics2D g2d;
private final PDDocument document;
/**
* Instantiate a new PDFTextStripper object.
*
* #param document
* #param filename
* #throws IOException If there is an error loading the properties.
*/
public DrawPrintTextLocations(PDDocument document, String filename) throws IOException {
this.document = document;
this.filename = filename;
}
/**
* This will print the documents data.
*
* #param args The command line arguments.
* #throws IOException If there is an error parsing the document.
*/
public static void main(String[] args) throws IOException {
String pdfLoc = "/debug/pdfbox/p2_VS008PI.pdf";
if (args.length == 1) {
pdfLoc = args[0];
}
try (PDDocument document = PDDocument.load(new File(pdfLoc))) {
DrawPrintTextLocations stripper = new DrawPrintTextLocations(document, pdfLoc);
stripper.setSortByPosition(true);
for (int page = 0; page < document.getNumberOfPages(); ++page) {
stripper.stripPage(page);
}
}
}
private void stripPage(int page) throws IOException {
PDFRenderer pdfRenderer = new PDFRenderer(document);
BufferedImage image = pdfRenderer.renderImageWithDPI(page, DPI);
PDPage pdPage = document.getPage(page);
PDRectangle cropBox = pdPage.getCropBox();
// flip y-axis
flipAT = new AffineTransform();
flipAT.translate(0, pdPage.getBBox().getHeight());
flipAT.scale(1, -1);
// page may be rotated
rotateAT = new AffineTransform();
int rotation = pdPage.getRotation();
if (rotation != 0) {
PDRectangle mediaBox = pdPage.getMediaBox();
switch (rotation) {
case 90:
rotateAT.translate(mediaBox.getHeight(), 0);
break;
case 270:
rotateAT.translate(0, mediaBox.getWidth());
break;
case 180:
rotateAT.translate(mediaBox.getWidth(), mediaBox.getHeight());
break;
default:
break;
}
rotateAT.rotate(Math.toRadians(rotation));
}
// cropbox
transAT = AffineTransform.getTranslateInstance(-cropBox.getLowerLeftX(), cropBox.getLowerLeftY());
g2d = image.createGraphics();
g2d.setStroke(new BasicStroke(0.1f));
g2d.scale(SCALE, SCALE);
setStartPage(page + 1);
setEndPage(page + 1);
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
writeText(document, dummy);
g2d.dispose();
String imageFilename = filename;
int pt = imageFilename.lastIndexOf('.');
imageFilename = imageFilename.substring(0, pt) + "-marked-" + (page + 1) + ".png";
ImageIO.write(image, "png", new File(imageFilename));
}
/**
* Override the default functionality of PDFTextStripper.
*/
#Override
protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
for (TextPosition text : textPositions) {
AffineTransform at = text.getTextMatrix().createAffineTransform();
PDFont font = text.getFont();
BoundingBox bbox = font.getBoundingBox();
float xadvance = font.getWidth(text.getCharacterCodes()[0]); // todo: should iterate all chars
Rectangle2D.Float rect1 = new Rectangle2D.Float(0, bbox.getLowerLeftY(), xadvance, bbox.getHeight());
if (font instanceof PDType3Font) {
at.concatenate(font.getFontMatrix().createAffineTransform());
} else {
at.scale(1 / 1000f, 1 / 1000f);
}
Shape s1 = at.createTransformedShape(rect1);
s1 = flipAT.createTransformedShape(s1);
s1 = rotateAT.createTransformedShape(s1);
s1 = dpiAT.createTransformedShape(s1);
g2d.setColor(Color.blue);
g2d.draw(s1);
Rectangle bounds = s1.getBounds();
if (bounds.getX() < 0 || bounds.getY() < 0) {
// THIS is where things go wrong
// i need these coordinates to be +ve
System.out.println(bounds.toString());
System.out.println(rect1.toString());
}
}
}
}
And here is some snippet of the output from the first page of the above pdf.
SECTION 10 – INSURANCE & OTHER FINANCIAL RESOURCES
java.awt.Rectangle[x=-3237,y=40,width=19,height=43]
java.awt.Rectangle[x=-3216,y=40,width=20,height=43]
java.awt.Rectangle[x=-3194,y=40,width=23,height=43]
java.awt.Rectangle[x=-3170,y=40,width=22,height=43]
The characters with negative coordinates are outside the cropbox (also characters with coordinates bigger than the cropbox height / width). See the cropbox as a cutout from something bigger. To see the whole thing, run this code
pdPage.setCropBox(pdPage.getMediaBox());
for each page of your PDF and then save and view it.
Per your comment
Following your advice of setting the crop box to the media box, actually changed the whole on screen appearance of the pdf, now i got 3 pages collated as one.
This suggests that physically, this is a folded sheet that has 3 pages on each side. The online PDF displays this as 6 pages for easy viewing on a computer.

PDFBox how to get the upper-left coordinates of an Image?

I'm using the following script to get the image positions within a Page. How can I transfer them to Pixel coordinates of the upper-left Corner? Because I want to create a Rectangle based on the position and size of the image and compare it to another Rectangle.
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.pdfbox.examples.util;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.contentstream.operator.DrawObject;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.PDFStreamEngine;
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.pdfbox.contentstream.operator.state.Concatenate;
import org.apache.pdfbox.contentstream.operator.state.Restore;
import org.apache.pdfbox.contentstream.operator.state.Save;
import org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters;
import org.apache.pdfbox.contentstream.operator.state.SetMatrix;
/**
* This is an example on how to get the x/y coordinates of image locations.
*
* #author Ben Litchfield
*/
public class PrintImageLocations extends PDFStreamEngine
{
/**
* Default constructor.
*
* #throws IOException If there is an error loading text stripper properties.
*/
public PrintImageLocations() throws IOException
{
addOperator(new Concatenate());
addOperator(new DrawObject());
addOperator(new SetGraphicsStateParameters());
addOperator(new Save());
addOperator(new Restore());
addOperator(new SetMatrix());
}
/**
* This will print the documents data.
*
* #param args The command line arguments.
*
* #throws IOException If there is an error parsing the document.
*/
public static void main( String[] args ) throws IOException
{
if( args.length != 1 )
{
usage();
}
else
{
PDDocument document = null;
try
{
document = PDDocument.load( new File(args[0]) );
PrintImageLocations printer = new PrintImageLocations();
int pageNum = 0;
for( PDPage page : document.getPages() )
{
pageNum++;
System.out.println( "Processing page: " + pageNum );
printer.processPage(page);
}
}
finally
{
if( document != null )
{
document.close();
}
}
}
}
/**
* This is used to handle an operation.
*
* #param operator The operation to perform.
* #param operands The list of arguments.
*
* #throws IOException If there is an error processing the operation.
*/
#Override
protected void processOperator( Operator operator, List<COSBase> operands) throws IOException
{
String operation = operator.getName();
if( "Do".equals(operation) )
{
COSName objectName = (COSName) operands.get( 0 );
PDXObject xobject = getResources().getXObject( objectName );
if( xobject instanceof PDImageXObject)
{
PDImageXObject image = (PDImageXObject)xobject;
int imageWidth = image.getWidth();
int imageHeight = image.getHeight();
System.out.println("*******************************************************************");
System.out.println("Found image [" + objectName.getName() + "]");
Matrix ctmNew = getGraphicsState().getCurrentTransformationMatrix();
float imageXScale = ctmNew.getScalingFactorX();
float imageYScale = ctmNew.getScalingFactorY();
// position in user space units. 1 unit = 1/72 inch at 72 dpi
System.out.println("position in PDF = " + ctmNew.getTranslateX() + ", " + ctmNew.getTranslateY() + " in user space units");
// raw size in pixels
System.out.println("raw image size = " + imageWidth + ", " + imageHeight + " in pixels");
// displayed size in user space units
System.out.println("displayed size = " + imageXScale + ", " + imageYScale + " in user space units");
// displayed size in inches at 72 dpi rendering
imageXScale /= 72;
imageYScale /= 72;
System.out.println("displayed size = " + imageXScale + ", " + imageYScale + " in inches at 72 dpi rendering");
// displayed size in millimeters at 72 dpi rendering
imageXScale *= 25.4;
imageYScale *= 25.4;
System.out.println("displayed size = " + imageXScale + ", " + imageYScale + " in millimeters at 72 dpi rendering");
System.out.println();
}
else if(xobject instanceof PDFormXObject)
{
PDFormXObject form = (PDFormXObject)xobject;
showForm(form);
}
}
else
{
super.processOperator( operator, operands);
}
}
/**
* This will print the usage for this document.
*/
private static void usage()
{
System.err.println( "Usage: java " + PrintImageLocations.class.getName() + " <input-pdf>" );
}
}
Source: https://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/util/PrintImageLocations.java
Possible Formula to get the coordinates of the upper left edge:
X1 = ctmNew.getTranslateX() * topixelfactor
Y1 = pageheight - ctmNew.getTranslateY() * toPixelFactor - imageHeight
I can get the pageHeight by page.getMediaBox().getHeight(); But the problem is to calculate the "toPixelFactor" as the Translation from ctmNew.getTranslateY() is given in User Space Units and not in Pixel.

How can I get Images coordinates in pdf into JSONfile?

I have coded creating html page included images extracting a page in pdf document.
I had tried to extract images from pdf and then I succeeded to extract images from pdf and to apply the images to html page using PDFBox lib. but I did not extract image coordinates in html page.
So searched how to extract image coordinates in pdf, I tried to extract image coordinates in pdf using PDFBox Library.
Below code :
public static void main(String[] args) throws Exception
{
try
{
PDDocument document = PDDocument.load(
"/Users/tmdtjq/Downloads/PDFTest/test.pdf" );
PrintImageLocations printer = new PrintImageLocations();
List allPages = document.getDocumentCatalog().getAllPages();
for( int i=0; i<allPages.size(); i++ )
{
PDPage page = (PDPage)allPages.get( i );
int pageNum = i+1;
System.out.println( "Processing page: " + pageNum );
printer.processStream( page, page.findResources(),
page.getContents().getStream() );
}
}
finally
{
}
}
protected void processOperator( PDFOperator operator, List arguments ) throws IOException
{
String operation = operator.getOperation();
if( operation.equals( "Do" ) )
{
COSName objectName = (COSName)arguments.get( 0 );
Map xobjects = getResources().getXObjects();
PDXObject xobject = xobjects.get( objectName.getName() );
if( xobject instanceof PDXObjectImage )
{
try
{
PDXObjectImage image = (PDXObjectImage)xobject;
PDPage page = getCurrentPage();
Matrix ctm = getGraphicsState().getCurrentTransformationMatrix();
double rotationInRadians =(page.findRotation() * Math.PI)/180;
AffineTransform rotation = new AffineTransform();
rotation.setToRotation( rotationInRadians );
AffineTransform rotationInverse = rotation.createInverse();
Matrix rotationInverseMatrix = new Matrix();
rotationInverseMatrix.setFromAffineTransform( rotationInverse );
Matrix rotationMatrix = new Matrix();
rotationMatrix.setFromAffineTransform( rotation );
Matrix unrotatedCTM = ctm.multiply( rotationInverseMatrix );
float xScale = unrotatedCTM.getXScale();
float yScale = unrotatedCTM.getYScale();
float xPosition = unrotatedCTM.getXPosition();
float yPosition = unrotatedCTM.getYPosition();
System.out.println( "Found image[" + objectName.getName() + "] " +
"at " + xPosition + "," + yPosition +
" size=" + (xScale/100f*image.getWidth()) + "," + (yScale/100f*image.getHeight() ));
}
catch( NoninvertibleTransformException e )
{
throw new WrappedIOException( e );
}
}
}
}
Outputs printing X,Y Positions in images is All 0.0, 0.0.
I think because getGraphicsState() is method to return the graphicsState.
But I want to get specific images coordinates applied to height,width of a PDF page in order to create html page.
I think maybe it is solution to extract JSON from images coordinates in PDF.
Please introduce image coordinates in PDF to JSON tool or suggest PDF Library.
(Already I used pdf2json tool in FlexPaper. this tool extracts JSONfile including not images data but just texts data(content, coordinates, font..) from PDF page.)
I was able to find images with searching for cm operator.
I overrided PDFTextStripper the following way:
Note: it doesn't take into account rotation and mirroring!
public static class TextFinder extends PDFTextStripper {
public TextFinder() throws IOException {
super();
}
#Override
protected void startPage(PDPage page) throws IOException {
// process start of the page
super.startPage(page);
}
#Override
public void process(PDFOperator operator, List<COSBase> arguments)
throws IOException {
if ("cm".equals(operator.getOperation())) {
float width = ((COSNumber)arguments.get(0)).floatValue();
float height = ((COSNumber)arguments.get(3)).floatValue();
float x = ((COSNumber)arguments.get(4)).floatValue();
float y = ((COSNumber)arguments.get(5)).floatValue();
// process image coordinates
}
super.processOperator(operator, arguments);
}
#Override
protected void writeString(String text,
List<TextPosition> textPositions) throws IOException {
for (TextPosition position : textPositions) {
// process text coordinates
}
super.writeString(text, textPositions);
}
}
Of course, one can use PDFStreamEngine instead of PDFTextStripper, if one is not interested in finding text together with images.

How to generate multiple lines in PDF using Apache pdfbox

I am using Pdfbox to generate PDF files using Java. The problem is that when i add long text contents in the document, it is not displayed properly. Only a part of it is displayed. That too in a single line.
I want text to be in multiple lines.
My code is given below:
PDPageContentStream pdfContent=new PDPageContentStream(pdfDocument, pdfPage, true, true);
pdfContent.beginText();
pdfContent.setFont(pdfFont, 11);
pdfContent.moveTextPositionByAmount(30,750);
pdfContent.drawString("I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox");
pdfContent.endText();
My output:
Adding to the answer of Mark you might want to know where to split your long string. You can use the PDFont method getStringWidth for that.
Putting everything together you get something like this (with minor differences depending on the PDFBox version):
PDFBox 1.8.x
PDDocument doc = null;
try
{
doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(doc, page);
PDFont pdfFont = PDType1Font.HELVETICA;
float fontSize = 25;
float leading = 1.5f * fontSize;
PDRectangle mediabox = page.getMediaBox();
float margin = 72;
float width = mediabox.getWidth() - 2*margin;
float startX = mediabox.getLowerLeftX() + margin;
float startY = mediabox.getUpperRightY() - margin;
String text = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox";
List<String> lines = new ArrayList<String>();
int lastSpace = -1;
while (text.length() > 0)
{
int spaceIndex = text.indexOf(' ', lastSpace + 1);
if (spaceIndex < 0)
spaceIndex = text.length();
String subString = text.substring(0, spaceIndex);
float size = fontSize * pdfFont.getStringWidth(subString) / 1000;
System.out.printf("'%s' - %f of %f\n", subString, size, width);
if (size > width)
{
if (lastSpace < 0)
lastSpace = spaceIndex;
subString = text.substring(0, lastSpace);
lines.add(subString);
text = text.substring(lastSpace).trim();
System.out.printf("'%s' is line\n", subString);
lastSpace = -1;
}
else if (spaceIndex == text.length())
{
lines.add(text);
System.out.printf("'%s' is line\n", text);
text = "";
}
else
{
lastSpace = spaceIndex;
}
}
contentStream.beginText();
contentStream.setFont(pdfFont, fontSize);
contentStream.moveTextPositionByAmount(startX, startY);
for (String line: lines)
{
contentStream.drawString(line);
contentStream.moveTextPositionByAmount(0, -leading);
}
contentStream.endText();
contentStream.close();
doc.save("break-long-string.pdf");
}
finally
{
if (doc != null)
{
doc.close();
}
}
(BreakLongString.java test testBreakString for PDFBox 1.8.x)
PDFBox 2.0.x
PDDocument doc = null;
try
{
doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(doc, page);
PDFont pdfFont = PDType1Font.HELVETICA;
float fontSize = 25;
float leading = 1.5f * fontSize;
PDRectangle mediabox = page.getMediaBox();
float margin = 72;
float width = mediabox.getWidth() - 2*margin;
float startX = mediabox.getLowerLeftX() + margin;
float startY = mediabox.getUpperRightY() - margin;
String text = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox";
List<String> lines = new ArrayList<String>();
int lastSpace = -1;
while (text.length() > 0)
{
int spaceIndex = text.indexOf(' ', lastSpace + 1);
if (spaceIndex < 0)
spaceIndex = text.length();
String subString = text.substring(0, spaceIndex);
float size = fontSize * pdfFont.getStringWidth(subString) / 1000;
System.out.printf("'%s' - %f of %f\n", subString, size, width);
if (size > width)
{
if (lastSpace < 0)
lastSpace = spaceIndex;
subString = text.substring(0, lastSpace);
lines.add(subString);
text = text.substring(lastSpace).trim();
System.out.printf("'%s' is line\n", subString);
lastSpace = -1;
}
else if (spaceIndex == text.length())
{
lines.add(text);
System.out.printf("'%s' is line\n", text);
text = "";
}
else
{
lastSpace = spaceIndex;
}
}
contentStream.beginText();
contentStream.setFont(pdfFont, fontSize);
contentStream.newLineAtOffset(startX, startY);
for (String line: lines)
{
contentStream.showText(line);
contentStream.newLineAtOffset(0, -leading);
}
contentStream.endText();
contentStream.close();
doc.save(new File(RESULT_FOLDER, "break-long-string.pdf"));
}
finally
{
if (doc != null)
{
doc.close();
}
}
(BreakLongString.java test testBreakString for PDFBox 2.0.x)
The result
This looks as expected.
Of course there are numerous improvements to make but this should show how to do it.
Adding unconditional line breaks
In a comment aleskv asked:
could you add line breaks when there are \n in the string?
One can easily extend the solution to unconditionally break at newline characters by first splitting the string at '\n' characters and then iterating over the split result.
E.g. if instead of the long string from above
String text = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox";
you want to process this even longer string with embedded new line characters
String textNL = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox.\nFurthermore, I have added some newline characters to the string at which lines also shall be broken.\nIt should work alright like this...";
you can simply replace
String text = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox";
List<String> lines = new ArrayList<String>();
int lastSpace = -1;
while (text.length() > 0)
{
[...]
}
in the solutions above by
String textNL = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox.\nFurthermore, I have added some newline characters to the string at which lines also shall be broken.\nIt should work alright like this...";
List<String> lines = new ArrayList<String>();
for (String text : textNL.split("\n"))
{
int lastSpace = -1;
while (text.length() > 0)
{
[...]
}
}
(from BreakLongString.java test testBreakStringNL)
The result:
I know it's a bit late, but i had a little problem with mkl's solution. If the last line would only contain one word, your algorithm writes it on the previous one.
For Example: "Lorem ipsum dolor sit amet" is your text and it should add a line break after "sit".
Lorem ipsum dolor sit
amet
But it does this:
Lorem ipsum dolor sit amet
I came up with my own solution i want to share with you.
/**
* #param text The text to write on the page.
* #param x The position on the x-axis.
* #param y The position on the y-axis.
* #param allowedWidth The maximum allowed width of the whole text (e.g. the width of the page - a defined margin).
* #param page The page for the text.
* #param contentStream The content stream to set the text properties and write the text.
* #param font The font used to write the text.
* #param fontSize The font size used to write the text.
* #param lineHeight The line height of the font (typically 1.2 * fontSize or 1.5 * fontSize).
* #throws IOException
*/
private void drawMultiLineText(String text, int x, int y, int allowedWidth, PDPage page, PDPageContentStream contentStream, PDFont font, int fontSize, int lineHeight) throws IOException {
List<String> lines = new ArrayList<String>();
String myLine = "";
// get all words from the text
// keep in mind that words are separated by spaces -> "Lorem ipsum!!!!:)" -> words are "Lorem" and "ipsum!!!!:)"
String[] words = text.split(" ");
for(String word : words) {
if(!myLine.isEmpty()) {
myLine += " ";
}
// test the width of the current line + the current word
int size = (int) (fontSize * font.getStringWidth(myLine + word) / 1000);
if(size > allowedWidth) {
// if the line would be too long with the current word, add the line without the current word
lines.add(myLine);
// and start a new line with the current word
myLine = word;
} else {
// if the current line + the current word would fit, add the current word to the line
myLine += word;
}
}
// add the rest to lines
lines.add(myLine);
for(String line : lines) {
contentStream.beginText();
contentStream.setFont(font, fontSize);
contentStream.moveTextPositionByAmount(x, y);
contentStream.drawString(line);
contentStream.endText();
y -= lineHeight;
}
}
///// FOR PDBOX 2.0.X
// FOR ADDING DYNAMIC PAGE ACCORDING THE LENGTH OF THE CONTENT
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
public class Document_Creation {
public static void main (String args[]) throws IOException {
PDDocument doc = null;
try
{
doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(doc, page);
PDFont pdfFont = PDType1Font.HELVETICA;
float fontSize = 25;
float leading = 1.5f * fontSize;
PDRectangle mediabox = page.getMediaBox();
float margin = 72;
float width = mediabox.getWidth() - 2*margin;
float startX = mediabox.getLowerLeftX() + margin;
float startY = mediabox.getUpperRightY() - margin;
String text = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox.An essay is, generally, a piece of writing that gives the author's own argument — but the definition is vague, overlapping with those of an article, a pamphlet, and a short story. Essays have traditionally been sub-classified as formal and informal. Formal essays are characterized by serious purpose, dignity, logical organization, length,whereas the informal essay is characterized by the personal element (self-revelation, individual tastes and experiences, confidential manner), humor, graceful style, rambling structure, unconventionality or novelty of theme.Lastly, one of the most attractive features of cats as housepets is their ease of care. Cats do not have to be walked. They get plenty of exercise in the house as they play, and they do their business in the litter box. Cleaning a litter box is a quick, painless procedure. Cats also take care of their own grooming. Bathing a cat is almost never necessary because under ordinary circumstances cats clean themselves. Cats are more particular about personal cleanliness than people are. In addition, cats can be left home alone for a few hours without fear. Unlike some pets, most cats will not destroy the furnishings when left alone. They are content to go about their usual activities until their owners return.";
List<String> lines = new ArrayList<String>();
int lastSpace = -1;
while (text.length() > 0)
{
int spaceIndex = text.indexOf(' ', lastSpace + 1);
if (spaceIndex < 0)
spaceIndex = text.length();
String subString = text.substring(0, spaceIndex);
float size = fontSize * pdfFont.getStringWidth(subString) / 1000;
System.out.printf("'%s' - %f of %f\n", subString, size, width);
if (size > width)
{
if (lastSpace < 0)
lastSpace = spaceIndex;
subString = text.substring(0, lastSpace);
lines.add(subString);
text = text.substring(lastSpace).trim();
System.out.printf("'%s' is line\n", subString);
lastSpace = -1;
}
else if (spaceIndex == text.length())
{
lines.add(text);
System.out.printf("'%s' is line\n", text);
text = "";
}
else
{
lastSpace = spaceIndex;
}
}
contentStream.beginText();
contentStream.setFont(pdfFont, fontSize);
contentStream.newLineAtOffset(startX, startY);
float currentY=startY;
for (String line: lines)
{
currentY -=leading;
if(currentY<=margin)
{
contentStream.endText();
contentStream.close();
PDPage new_Page = new PDPage();
doc.addPage(new_Page);
contentStream = new PDPageContentStream(doc, new_Page);
contentStream.beginText();
contentStream.setFont(pdfFont, fontSize);
contentStream.newLineAtOffset(startX, startY);
currentY=startY;
}
contentStream.showText(line);
contentStream.newLineAtOffset(0, -leading);
}
contentStream.endText();
contentStream.close();
doc.save("C:/Users/VINAYAK/Desktop/docccc/break-long-string.pdf");
}
finally
{
if (doc != null)
{
doc.close();
}
}
}
}
Just draw the string in a position below, typically done within a loop:
float textx = margin+cellMargin;
float texty = y-15;
for(int i = 0; i < content.length; i++){
for(int j = 0 ; j < content[i].length; j++){
String text = content[i][j];
contentStream.beginText();
contentStream.moveTextPositionByAmount(textx,texty);
contentStream.drawString(text);
contentStream.endText();
textx += colWidth;
}
texty-=rowHeight;
textx = margin+cellMargin;
}
These are the important lines:
contentStream.beginText();
contentStream.moveTextPositionByAmount(textx,texty);
contentStream.drawString(text);
contentStream.endText();
Just keep drawing new strings in new positions. For an example using a table, see here:
http://fahdshariff.blogspot.ca/2010/10/creating-tables-with-pdfbox.html
contentStream.moveTextPositionByAmount(textx,texty) is key point.
say for example if you are using a A4 size means 580,800 is width and height correspondling(approximately). so you have move your text based on the position of your document size.
PDFBox supports varies page format . so the height and width will vary for different page format
Pdfbox-layout abstracts out all the tedious details of managing the layout. As a complete Kotlin example, here is how to convert a text file to a pdf without worrying about line wrapping and pagination.
import org.apache.pdfbox.pdmodel.font.PDType1Font
import rst.pdfbox.layout.elements.Document
import rst.pdfbox.layout.elements.Paragraph
import java.io.File
fun main() {
val textFile = "input.txt"
val pdfFile = "output.pdf"
val font = PDType1Font.COURIER
val fontSize = 12f
val document = Document(40f, 50f, 40f, 60f)
val paragraph = Paragraph()
File(textFile).forEachLine {
paragraph.addText("$it\n", fontSize, font)
}
document.add(paragraph)
document.save(File(pdfFile))
}

highlight text using pdfbox when it's location in the pdf is known

Does pdfbox provide some utility to highlight the text when I have it's co-ordinates?
Bounds of the text is known.
I know there are other libraries that provide the same functionality like pdfclown etc. But does pdfbox provide something like that?
well i found this out. it is simple.
PDDocument doc = PDDocument.load(/*path to the file*/);
PDPage page = (PDPage)doc.getDocumentCatalog.getAllPages.get(i);
List annots = page.getAnnotations;
PDAnnotationTextMarkup markup = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.Su....);
markup.setRectangle(/*your PDRectangle*/);
markup.setQuads(/*float array of size eight with all the vertices of the PDRectangle in anticlockwise order*/);
annots.add(markup);
doc.save(/*path to the output file*/);
This is an extended answer from the number 1 here, and basically is the same code as above.
Improves the coordinates points in respect to the page size in the current document, as well the yellow color that is very lighter and sometimes if the word is short and smaller is difficult to see.
Also highlight the full word taking the X, Y coordinates from the top-left to the top-right. Takes the coordinates from the first character and from the last one in the string.
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.color.PDColor;
import org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationTextMarkup;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
public class MainSource extends PDFTextStripper {
public MainSource() throws IOException {
super();
}
public static void main(String[] args) throws IOException {
PDDocument document = null;
String fileName = "C:/AnyPDFFile.pdf";
try {
document = PDDocument.load( new File(fileName) );
PDFTextStripper stripper = new MainSource();
stripper.setSortByPosition( true );
stripper.setStartPage( 0 );
stripper.setEndPage( document.getNumberOfPages() );
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(document, dummy);
File file1 = new File("C:/AnyPDFFile-New.pdf");
document.save(file1);
}
finally {
if( document != null ) {
document.close();
}
}
}
/**
* Override the default functionality of PDFTextStripper.writeString()
*/
#Override
protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
boolean isFound = false;
float posXInit = 0,
posXEnd = 0,
posYInit = 0,
posYEnd = 0,
width = 0,
height = 0,
fontHeight = 0;
String[] criteria = {"Word1", "Word2", "Word3", ....};
for (int i = 0; i < criteria.length; i++) {
if (string.contains(criteria[i])) {
isFound = true;
}
}
if (isFound) {
posXInit = textPositions.get(0).getXDirAdj();
posXEnd = textPositions.get(textPositions.size() - 1).getXDirAdj() + textPositions.get(textPositions.size() - 1).getWidth();
posYInit = textPositions.get(0).getPageHeight() - textPositions.get(0).getYDirAdj();
posYEnd = textPositions.get(0).getPageHeight() - textPositions.get(textPositions.size() - 1).getYDirAdj();
width = textPositions.get(0).getWidthDirAdj();
height = textPositions.get(0).getHeightDir();
System.out.println(string + "X-Init = " + posXInit + "; Y-Init = " + posYInit + "; X-End = " + posXEnd + "; Y-End = " + posYEnd + "; Font-Height = " + fontHeight);
/* numeration is index-based. Starts from 0 */
float quadPoints[] = {posXInit, posYEnd + height + 2, posXEnd, posYEnd + height + 2, posXInit, posYInit - 2, posXEnd, posYEnd - 2};
List<PDAnnotation> annotations = document.getPage(this.getCurrentPageNo() - 1).getAnnotations();
PDAnnotationTextMarkup highlight = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
PDRectangle position = new PDRectangle();
position.setLowerLeftX(posXInit);
position.setLowerLeftY(posYEnd);
position.setUpperRightX(posXEnd);
position.setUpperRightY(posYEnd + height);
highlight.setRectangle(position);
// quadPoints is array of x,y coordinates in Z-like order (top-left, top-right, bottom-left,bottom-right)
// of the area to be highlighted
highlight.setQuadPoints(quadPoints);
PDColor yellow = new PDColor(new float[]{1, 1, 1 / 255F}, PDDeviceRGB.INSTANCE);
highlight.setColor(yellow);
annotations.add(highlight);
}
}
}
This works for pdfbox 2.0.7
PDDocument document = /* get doc */
/* numeration is index-based. Starts from 0 */
List<PDAnnotation> annotations = document.getPage(yourPageNumber - 1).getAnnotations();
PDAnnotationTextMarkup highlight = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
highlight.setRectangle(PDRectangle.A4);
// quadPoints is array of x,y coordinates in Z-like order (top-left, top-right, bottom-left,bottom-right)
// of the area to be highlighted
highlight.setQuadPoints(quadPoints);
PDColor yellow = new PDColor(new float[]{1, 1, 204 / 255F}, PDDeviceRGB.INSTANCE);
highlight.setColor(yellow);
annotations.add(highlight);
Note: such annotation will be displayed if you save doc in file, but it will not appear in image created from page since there is no AppearanceStream created for this annotation. I solved it with code drafts from PDFBOX-3353
Simplest way ... draw a rectangle in the desired location and set the height to 1 and the fill color to BLACK.
or ...
Using PDFBox ...
//create the page PDDocument doc = new PDDocument();
PDPage page1 = new PDPage();
doc.addPage(page1);
//create the stream
PDPageContentStream stream1 = new PDPageContentStream(doc, page1);
//to simply draw an underscore with the coordinates
//where the first is x start, second y start, third x end, fourth y end
stream1.drawLine(20, 740, 590, 740);
//to draw an underscore thicker than one pixel
//first x begin second y begin third length fourth thickness
stream1.addRect(345, 568, 70, 2);
stream1.setNonStrokingColor(Color.BLACK); stream1.fill();
Another solution could be drawing a yellow-ish rectangle with a lower alpha, like in the follow sample code:
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page, AppendMode.APPEND, true, true);
PDFont font = PDType1Font.COURIER;
final int fontSize = 16;
//Writing text
contentStream.beginText();
contentStream.setFont(font, fontSize );
contentStream.newLineAtOffset(25, 250);
contentStream.showText("Hello world");
contentStream.endText();
//Changing alpha mode
PDExtendedGraphicsState gs = new PDExtendedGraphicsState();
gs.setNonStrokingAlphaConstant(0.2f);
gs.setStrokingAlphaConstant(0.2f);
gs.setBlendMode(BlendMode.MULTIPLY);
contentStream.setGraphicsStateParameters(gs);
//Setting color
contentStream.setNonStrokingColor(new Color(255, 255, 0, 100));
//Highlighting (that is, drawing a rectangle)
contentStream.addRect(25, 250, font.getStringWidth("Hello world")*fontSize/1000, font.getBoundingBox().getHeight()*fontSize/1000);
contentStream.fill();
contentStream.close();
//Resetting alpha means creating a new content stream...
//writing a new rectangle just to test alpha changing
contentStream = new PDPageContentStream(document, page, AppendMode.APPEND, true, true);
gs = new PDExtendedGraphicsState();
gs.setNonStrokingAlphaConstant(1f);
gs.setStrokingAlphaConstant(1f);
gs.setBlendMode(BlendMode.MULTIPLY);
contentStream.setGraphicsStateParameters(gs);
contentStream.setNonStrokingColor(new Color(255, 255, 0, 100));
contentStream.addRect(50, 50, 50, 50);
contentStream.fill();
contentStream.close();
document.save(Constants.PATH);
document.close();
Producing this as result

Categories