Update: This works in Adobe Reader, but not in the default OS X PDF reader. Many of our users use the default OS X reader, so ideally I could get it working there (I know it supports annotations).
I am using Apache PDFBox 2.0.22 to add annotations to a PDF programmatically. The code I have runs and produces a PDF with an annotation, but the annotation's content text is empty (see screenshot). What am I doing wrong?
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.color.PDColor;
import org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationTextMarkup;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
public class Sample{
public static void main(String[] args) throws FileNotFoundException, IOException{
PDDocument doc = PDDocument.load(new File("test.pdf"));
try {
// get the first page of the document
PDPage page = (PDPage) doc.getDocumentCatalog().getPages().get(0);
List<PDAnnotation> annotations = page.getAnnotations();
// create the annotation instance
PDAnnotationTextMarkup txtMark = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
//set the rectangle
PDRectangle position = new PDRectangle();
position.setLowerLeftX(170);
position.setLowerLeftY(125);
position.setUpperRightX(195);
position.setUpperRightY(140);
txtMark.setRectangle(position);
//set the quadpoint
float[] quads = new float[8];
//x1,y1
quads[0] = position.getLowerLeftX();
quads[1] = position.getUpperRightY() - 2;
//x2,y2
quads[2] = position.getUpperRightX();
quads[3] = quads[1];
//x3,y3
quads[4] = quads[0];
quads[5] = position.getLowerLeftY() - 2;
//x4,y4
quads[6] = quads[2];
quads[7] = quads[5];
txtMark.setQuadPoints(quads);
txtMark.setAnnotationName("My annotation");
txtMark.setTitlePopup("title popup");
txtMark.setContents("Highlighted since it's important");
txtMark.setRichContents("Here is some rich content");
PDColor blue = new PDColor(new float[] { 0, 0, 1 }, PDDeviceRGB.INSTANCE);
txtMark.setColor(blue);
annotations.add(txtMark);
page.setAnnotations(annotations);
doc.save("test-out.pdf");
}finally
{
doc.close();
}
}
}
Just adding this as an answer for better formatting.
PDFBox is constantly changing, and maybe constructAppearances() will work better at some point. In the meantime, I followed the advice to create my own appearance stream for the annotation, but it still did not appear to work. I had tried everything, it seemed, and in every case the annotations would sometimes appear in Acrobat Reader DC but would not show in the default PDF viewers of Chrome or Firefox.
First of all, if you call setContents() on the annotation object and don't supply your own appearance stream, viewers like Acrobat Reader DC will sometimes try to construct their own appearance, which varies across viewers. Even if you supply your own, some viewers will still add their own interactive elements, e.g. a small yellow icon that indicates an annotation and shows its contents when you hover the mouse cursor over it. That, I guess, is why some programmers try not to call setContents().
Moreover, if you, like me, choose to use PDAnnotationTextMarkup with the SUB_TYPE_FREETEXT subtype just to write something over the existing PDF, Acrobat Reader DC specifically will attempt to create its own rendering. It seems to modify (correct?) the contents of the document and show a slightly changed version, which causes existing digital signatures to change state. This problem can be overcome according to this SO q&a.
But there seems to be a better way to write static text over the PDF that Acrobat Reader DC will not try to modify: the rubber stamp annotation. You can simply substitute PDAnnotationRubberStamp for PDAnnotationTextMarkup (PDFBox 3.0.0 has another name for the latter). Everything else stays the same.
So what I was doing wrong in all my annotation code was misplacing the bounding box of the annotation's appearance. setBBox takes a PDRectangle, and PDAnnotation.getRectangle() returns a PDRectangle, but the first cannot use the second as is. The rectangle returned by getRectangle() is in page coordinates, relative to the page's lower left corner, while the bounding box needs a rectangle whose lower left corner is at 0, 0.
So the code that I came up with is as follows (I haven't yet started on _font and font embedding in general, which seems like a huge topic):
private void addAnnotation(String name, PDDocument doc, PDPage page, float x, float y, String text) throws IOException {
List<PDAnnotation> annotations = page.getAnnotations();
PDAnnotationRubberStamp t = new PDAnnotationRubberStamp();
t.setAnnotationName(name); // might play important role
t.setPrinted(true); // always visible
t.setReadOnly(true); // does not interact with user
t.setContents(text);
// calculate realWidth, realHeight according to font size (e.g. using _font.getStringWidth(text))
float realWidth = 100, realHeight = 100;
PDRectangle rect = new PDRectangle(x, y, realWidth, realHeight);
t.setRectangle(rect);
PDAppearanceDictionary ap = new PDAppearanceDictionary();
ap.setNormalAppearance(createAppearanceStream(doc, t));
t.setAppearance(ap);
annotations.add(t);
page.setAnnotations(annotations);
// these must be set for incremental save to work properly (PDFBOX < 3.0.0 at least?)
ap.getCOSObject().setNeedToBeUpdated(true);
t.getCOSObject().setNeedToBeUpdated(true);
page.getResources().getCOSObject().setNeedToBeUpdated(true);
page.getCOSObject().setNeedToBeUpdated(true);
doc.getDocumentCatalog().getPages().getCOSObject().setNeedToBeUpdated(true);
doc.getDocumentCatalog().getCOSObject().setNeedToBeUpdated(true);
}
private void modifyAppearanceStream(PDAppearanceStream aps, PDAnnotation ann) throws IOException {
PDAppearanceContentStream apsContent = null;
try {
PDRectangle rect = ann.getRectangle();
rect = new PDRectangle(0, 0, rect.getWidth(), rect.getHeight()); // need to be relative - this is mega important because otherwise it appears as if nothing is printed
aps.setBBox(rect); // set bounding box to the dimensions of the annotation itself
// embed our unicode font (NB: yes, this needs to be done otherwise aps.getResources() == null which will cause NPE later during setFont)
PDResources res = new PDResources();
_fontName = res.add(_font).getName(); // okay I create _font elsewhere
aps.setResources(res);
// draw directly on the XObject's content stream
apsContent = new PDAppearanceContentStream(aps);
apsContent.beginText();
apsContent.setFont(PDType1Font.HELVETICA_BOLD, _fontSize); // _font
apsContent.setTextMatrix(Matrix.getTranslateInstance(0, 1));
apsContent.showText(ann.getContents());
apsContent.endText();
}
finally {
if (apsContent != null) {
try { apsContent.close(); } catch (Exception ex) { log.error(ex.getMessage(), ex); }
}
}
aps.getResources().getCOSObject().setNeedToBeUpdated(true);
aps.getCOSObject().setNeedToBeUpdated(true);
}
private PDAppearanceStream createAppearanceStream(final PDDocument document, PDAnnotation ann) throws IOException
{
PDAppearanceStream aps = new PDAppearanceStream(document);
modifyAppearanceStream(aps, ann);
return aps;
}
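The bounding-box point made in this answer can be checked in isolation with plain arithmetic. The following is only a sketch using float arrays in place of PDRectangle; the class and method names (BBoxSketch, toOriginBox) are made up for illustration and are not part of PDFBox.

```java
/** Sketch: derive an appearance-stream bounding box from an annotation rectangle. */
final class BBoxSketch {

    /**
     * Takes a page-space rectangle as {llx, lly, urx, ury} and returns a box of
     * the same size anchored at the origin, also as {llx, lly, urx, ury}.
     */
    static float[] toOriginBox(float[] pageRect) {
        float width = pageRect[2] - pageRect[0];
        float height = pageRect[3] - pageRect[1];
        return new float[] { 0f, 0f, width, height };
    }

    public static void main(String[] args) {
        // Annotation placed at (170, 125) on the page, 100 x 100 units in size.
        float[] annotationRect = { 170f, 125f, 270f, 225f };
        float[] bbox = toOriginBox(annotationRect);
        // prints [0.0, 0.0, 100.0, 100.0]
        System.out.println(java.util.Arrays.toString(bbox));
    }
}
```

Passing the page-space rectangle unchanged would describe a box starting at (170, 125) inside the appearance stream, while the text is drawn near the origin, which matches the symptom of nothing being visible.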
Context
I am writing Java code that fills PDF forms with user input using PDFBox.
Some of the input is in Chinese.
When I generate the PDF, there are no errors in the logs, but the rendered text does not match the input at all.
What I currently have
Here is what I do:
In the PDF file, I specified the SimSun font for the field using Adobe Pro.
This font handles Simplified Chinese characters.
I have the SimSun font installed on my server.
PDFBox doesn't report any error (if I remove the SimSun font from my server, PDFBox falls back on another font that is not able to render the characters). So I guess it is able to find the font and use it.
What I tried
I was able to make this work, but I had to manually load the font in the code and add it to the PDF (see examples below).
That is not a solution, though, as it means I would have to load the font every time and add it to the PDF. I would also have to do the same for many other languages.
As far as I understand, PDFBox should be able to use any font installed on the server.
Below is a test class that tries 3 different approaches. Only the last one works so far:
Classic generation
Simply put Chinese characters inside the text field without changing anything.
The characters are not rendered correctly (some of them are missing and the ones displayed do not match the input).
Generation with embedded font
Try to embed the SimSun font inside the PDF with the PDResources.add(font) method.
The result is the same as the first method.
Embed the font and use it
I embed the SimSun font and I also override the font used in the TextField to use the SimSun font I just added.
This approach works.
After quite a few readings, I found out that the issue might come from the version of the font I am using.
Windows 8 (which I use to create the form) uses v5.04 of Simsun font.
I use v2.10 on my laptop and my servers, both being Linux based (I can not find the v5.04).
However, I don't know:
If the issue is really coming from this.
If I have the right to use this font, as it is developed by Microsoft (and Apple).
Where to find the latest version of it.
I tried using another font but:
I only find OTF fonts (and not TTF) that support Chinese characters.
PDFBox does not support OTF (yet). It is planned for v3.0.0.
So if someone has an idea on how to make this work without having to embed and change the font's name in the code, that would be great!
Here are the PDF I used and the code that tests the 3 methods I talked about.
The TextField in the pdf is named comment.
package org.test;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Hello world!
*/
public class App {
private static final String SIMPLIFIED_CHINESE_STRING = "我不明白为什么它不起作用。";
public static void main(String[] args) throws IOException {
System.out.println("Hello World!");
// Test 1
classicGeneration();
// Test 2
generationWithEmbededFont();
// Test 3
generationWithFontOverride();
System.out.println("Bye!");
}
/**
* Classic PDF generation without any changes to the PDF.
*/
private static void classicGeneration() throws IOException {
PDDocument document = loadPdf();
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
PDField commentField = acroForm.getField("comment");
commentField.setValue(SIMPLIFIED_CHINESE_STRING);
document.save(new File("result-classic-generation.pdf"));
}
/**
* Trying to embed the font in the PDF. It doesn't seem to work.
* The result is the same as classicGeneration method.
*/
private static void generationWithEmbededFont() throws IOException {
PDDocument document = loadPdf();
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
PDFont font = PDType0Font.load(document, new File("/usr/share/fonts/SimSun.ttf"));
PDResources res = acroForm.getDefaultResources();
if (res == null) {
res = new PDResources();
}
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
PDField commentField = acroForm.getField("comment");
commentField.setValue(SIMPLIFIED_CHINESE_STRING);
document.save(new File("result-with-embeded-font.pdf"));
}
/**
* Embed the font in the PDF and change the font used in the TextField to use this one.
* Here the PDF is correctly rendered and all the characters are displayed.
* @throws IOException
*/
private static void generationWithFontOverride() throws IOException {
PDDocument document = loadPdf();
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
PDField commentField = acroForm.getField("comment");
// Load the font
InputStream resourceAsStream = Thread.currentThread().getContextClassLoader().getResourceAsStream("SimSun.ttf");
PDFont font = PDType0Font.load(document, resourceAsStream);
PDResources res = acroForm.getDefaultResources();
if (res == null) {
res = new PDResources();
}
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
// Change the font used by the TextField
COSDictionary dict = commentField.getCOSObject();
COSString defaultAppearance = (COSString) dict.getDictionaryObject(COSName.DA);
if (defaultAppearance != null) {
String currentFont = dict.getString(COSName.DA);
// Retrieve the current font size and color used for the field in order to use the same but with the new font.
String regex = "[\\w]* ([\\w\\s]*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(currentFont);
// Default font size if we fail to extract the current one
String fontSize = " 11 Tf";
if (matcher.find()) {
fontSize = " " + matcher.group(1);
}
// Change the font of the TextField.
dict.setString(COSName.DA, "/" + fontName.getName() + fontSize);
}
commentField.getCOSObject().addAll(dict);
commentField.setValue(SIMPLIFIED_CHINESE_STRING);
document.save(new File("result-with-font-override.pdf"));
}
// HELPER
private static PDDocument loadPdf() throws IOException {
InputStream stream = Thread.currentThread().getContextClassLoader().getResourceAsStream("sample.pdf");
return PDDocument.load(stream);
}
}
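The string handling in the third approach (swapping the font name inside the field's /DA default appearance string while keeping size and colour) can be tried without PDFBox. This is a minimal sketch under assumptions: the class and method names are hypothetical, the regular expression is a swapped-in alternative to the one in the test class, and the example DA strings are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: replace only the font name in a default appearance (DA) string. */
final class DefaultAppearanceSketch {

    // Matches "/FontName size Tf" anywhere in the DA string.
    private static final Pattern FONT_OP = Pattern.compile("/(\\S+)\\s+([0-9.]+)\\s+Tf");

    /** Returns the DA string with its font resource name replaced by newFontName. */
    static String withFont(String da, String newFontName) {
        Matcher m = FONT_OP.matcher(da);
        if (!m.find()) {
            // No font operator present: prepend one, using size 11 as the fallback.
            return "/" + newFontName + " 11 Tf " + da;
        }
        // Keep everything except the captured font name (group 1).
        return da.substring(0, m.start(1)) + newFontName + da.substring(m.end(1));
    }

    public static void main(String[] args) {
        System.out.println(withFont("/Helv 12 Tf 0 g", "F3"));
        System.out.println(withFont("0.2 0.4 0.6 rg /SimSun 0 Tf", "F3"));
    }
}
```

In the test class above, the new name would come from fontName.getName(), the value returned when the font is added to the form's default resources.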
I simply want to add color to the background of the PDF that I'm generating with this library.
I want my pages to have color as the background or even a picture. The documentation got me dizzy. There are no useful or meaningful descriptions; it can hardly be called documentation.
Why is this simple task so hard to achieve with this library? Do I have to go through the trouble of reading a whole book, just to understand how to use a library?
There is no straightforward answer online, or in their "examples", but I managed to find a similar question about having various page background-colors in the PDF file here.
UPDATE: It seems that the iText-7 eBooks/resources have been updated during the past 3 years. The following links are working as of 21/07/2021.
NEW EBOOK URL FOR ITEXT-7 BUILDING BLOCKS HERE
VARIOUS CODE EXAMPLES HERE
ALL RESOURCES INDEX HERE
The solution is overly complex, in my opinion. This is just background color and it is a task that could have been made considerably less time consuming to understand. Making a framework as modular and flexible as possible is understandable, but sometimes there are some trivial tasks that people just want to get done quickly.
Anyway, here is the solution for anyone who might have the same problem as I did:
//Class that creates the PDF
public class PdfCreator {
//Helper class so we can add colour to our pages when we call it from outer class
private static class PageBackgroundsEvent implements IEventHandler {
@Override
public void handleEvent(Event event) {
PdfDocumentEvent docEvent = (PdfDocumentEvent) event;
PdfPage page = docEvent.getPage();
PdfCanvas canvas = new PdfCanvas(page);
Rectangle rect = page.getPageSize();
//I used custom rgb for Color
Color bgColour = new DeviceRgb(255, 204, 204);
canvas .saveState()
.setFillColor(bgColour)
.rectangle(rect.getLeft(), rect.getBottom(), rect.getWidth(), rect.getHeight())
.fillStroke()
.restoreState();
}
}
//PATH_OF_FILE is the path that the PDF will be created at.
//createPdf is a hypothetical method name; the statements need an enclosing method to compile.
public void createPdf() throws IOException {
String filename = PATH_OF_FILE + "/myFile.pdf";
OutputStream outputStream = new FileOutputStream(new File(filename));
PdfWriter writer = new PdfWriter(outputStream);
PdfDocument pdfDoc = new PdfDocument(writer);
pdfDoc.addEventHandler(PdfDocumentEvent.START_PAGE, new PageBackgroundsEvent());
PageSize pageSize = pdfDoc.getDefaultPageSize();
Document document = new Document(pdfDoc, pageSize);
document.close();
}
}
Background images can be added the same way! See this link to set a background image:
pdfDoc.addEventHandler(PdfDocumentEvent.END_PAGE, event -> {
try {
new PdfCanvas(((PdfDocumentEvent) event).getPage())
.addImageAt(ImageDataFactory.create("filename.png"), 50f, 50f, true);
} catch (MalformedURLException e) {
throw new UncheckedIOException(e);
}
});
I'm trying to overlay a PDF on top of all pages of another PDF, at the top left-hand side of each page. The target PDFs will be of different sizes. The overlay PDF has a constant size, which is smaller than all the pages of the target PDF.
I can only seem to get PDFBox to put the overlay in the middle of the pages.
I would prefer not to convert the overlay PDF to a bitmap (PDImageXObject) and insert it onto the pages. Here is some rough code which I'm playing about with:
public static void main(String[] args) throws Exception {
String overlayPath = "C:\\OverlayPdf.pdf";
String overlayOnMePath = "C:\\ToBeOverlayedOn.pdf";
PDDocument overlayOnMe = PDDocument.load(new File(overlayOnMePath)); //Document to write to.
overlayPath = overlayPath + "Anno.pdf";
HashMap<Integer, String> overlayGuide = new HashMap<>();
for (int i = 0; i < overlayOnMe.getNumberOfPages(); i++) {
overlayGuide.put(i + 1, overlayPath);
}
Overlay overlay = new Overlay();
overlay.setInputPDF(overlayOnMe);
overlay.setOverlayPosition(Overlay.Position.FOREGROUND);
overlay.overlay(overlayGuide);
overlayOnMe.save(new File(overlayOnMePath + "_OVERLAYED.pdf"));
overlay.close();
}
My gut feeling is that it needs an affine transformation, but I couldn't get that working either.
I have created a new issue, and its fix allows you to pass a transform; this will be available in version 2.0.10 or higher. You do this by extending the Overlay class and overriding calculateAffineTransform. To put the stamp at the top left, the new method would look like this:
protected AffineTransform calculateAffineTransform(PDPage page, PDRectangle overlayMediaBox)
{
AffineTransform at = new AffineTransform();
PDRectangle pageMediaBox = page.getMediaBox();
at.translate(0, pageMediaBox.getHeight() - overlayMediaBox.getHeight());
return at;
}
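The translation used in this override can be tried on its own with java.awt.geom.AffineTransform, without PDFBox. The sketch below assumes, as the override does, that the page's lower left corner is at the origin; the class and method names and the 612 x 792 page and 100-unit overlay height are made up for illustration.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Sketch: the translation that moves an overlay to the top left of a page. */
final class OverlayPlacementSketch {

    /** Shifts the overlay up so that its top edge meets the top edge of the page. */
    static AffineTransform topLeft(float pageHeight, float overlayHeight) {
        AffineTransform at = new AffineTransform();
        at.translate(0, pageHeight - overlayHeight);
        return at;
    }

    public static void main(String[] args) {
        // A 792-unit-high page and a 100-unit-high overlay.
        AffineTransform at = topLeft(792f, 100f);
        // The overlay's own top left corner is (0, overlayHeight) in its coordinates.
        Point2D corner = at.transform(new Point2D.Float(0f, 100f), null);
        System.out.println("overlay top left lands at y = " + corner.getY());
    }
}
```

Because PDF user space has its origin at the bottom left, no horizontal shift is needed; only the vertical offset depends on the page size, which is why the transform has to be computed per page.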
I am working with Apache PDFBox 2.0.8. My objective is to convert a PDF into an image, enlarge the canvas, and put the contents in the center so that I can put a header and footer in the remaining space.
My issue is that the canvas is getting enlarged but the contents are not getting centered; they stick to the bottom.
public class PDFRescale {
public static void main(String[] args) {
try {
String pdfFilename = "/MuhimbiPOC/Templates/Source_doc_withheaderfooter.pdf";
PDDocument document = PDDocument.load(new File(pdfFilename));
PDFRenderer pdfRenderer = new PDFRenderer(document);
PDPage pge = new PDPage();
PDFRescale ps = new PDFRescale();
int pageCounter = 0;
for (PDPage page : document.getPages())
{
final PDRectangle mediaBox = pge.getMediaBox();
mediaBox.setUpperRightX((float) (mediaBox.getUpperRightX()));
mediaBox.setUpperRightY((float) (mediaBox.getUpperRightY() * 1.5));
mediaBox.setLowerLeftY((float) (mediaBox.getLowerLeftY() * 1.5));
// note that the page number parameter is zero based
page.setMediaBox(mediaBox);
BufferedImage bim = pdfRenderer.renderImageWithDPI(pageCounter, 140, ImageType.RGB);
// suffix in filename will be used as the file format
ImageIOUtil.writeImage(bim, pdfFilename + "-" + (pageCounter++) + ".png", 140);
}
System.out.println("Task Completed ... ");
document.close();
}
catch (IOException e) {
e.printStackTrace();
}
}
}
My issue is that the canvas is getting enlarged but the contents are not getting centered; they stick to the bottom.
That is your issue for PDF pages whose mediaBox.getLowerLeftY() is 0. While this is very common, it is not required. If you had worked with a more generic selection of PDFs, you'd have seen that the real issue is that the former contents can end up anywhere, even off-screen!
The cause is that you do
mediaBox.setUpperRightY((float) (mediaBox.getUpperRightY() * 1.5));
mediaBox.setLowerLeftY((float) (mediaBox.getLowerLeftY() * 1.5));
This would only work if the origin was somewhere on the horizontal mid-screen axis.
Instead use something like
float pad = mediaBox.getHeight() * 0.5f; // compute once, getHeight() changes after the first setter
mediaBox.setUpperRightY(mediaBox.getUpperRightY() + pad);
mediaBox.setLowerLeftY(mediaBox.getLowerLeftY() - pad);
Another issue of your code: you only set the MediaBox and ignore the CropBox. pdfRenderer.renderImageWithDPI on the other hand uses the CropBox. Only for PDF pages without explicit CropBox your code enlarges the page area. For a generic solution you should also adapt the CropBox.
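The difference between multiplying the coordinates and padding the box can be checked with plain numbers. This is only a sketch: the class and method names are hypothetical, and the boxes (0 to 792, and 100 to 892) are invented examples.

```java
/** Sketch: two ways of enlarging the vertical extent of a page box. */
final class MediaBoxSketch {

    /** Multiplies both Y bounds by 1.5, as the question's code does. */
    static float[] scaleBounds(float lowerY, float upperY) {
        return new float[] { lowerY * 1.5f, upperY * 1.5f };
    }

    /** Extends the box by half its original height at each end, keeping the old content centred. */
    static float[] padBounds(float lowerY, float upperY) {
        float pad = (upperY - lowerY) * 0.5f;
        return new float[] { lowerY - pad, upperY + pad };
    }

    public static void main(String[] args) {
        // Common case: the box starts at 0, so scaling only adds space on top.
        System.out.println(java.util.Arrays.toString(scaleBounds(0f, 792f)));
        System.out.println(java.util.Arrays.toString(padBounds(0f, 792f)));
        // Box starting at 100: scaling moves the lower edge up to 150 and cuts off content.
        System.out.println(java.util.Arrays.toString(scaleBounds(100f, 892f)));
        System.out.println(java.util.Arrays.toString(padBounds(100f, 892f)));
    }
}
```

Note that padding by half the height at both ends doubles the page height, whereas the scaling approach grows it by a factor of 1.5; a smaller pad factor would be needed to match the original target size.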
What I am trying to do here is create text and place it onto a blank page. That page would then be overlaid onto another document, and the result would be saved as one document. In 1.8 I was able to create a blank PDPage in a PDF, write text to it as needed, then overlay that PDF with another and save it or view it on screen using the code below:
overlayDoc = new PDDocument();
page = new PDPage();
overlayDoc.addPage(page);
overlayObj = new Overlay();
font = PDType1Font.COURIER_OBLIQUE;
try {
contentStream = new PDPageContentStream(overlayDoc, page);
contentStream.setFont(font, 10);
}
catch (Exception e){
System.out.println("content stream failed");
}
After I created the stream, when I needed to write something to the overlay document's contentStream, I would call this method, give it my x, y coords and tell it what text to write (again, this is in my 1.8 version):
protected void writeString(int x, int y, String text) {
if (text == null) return;
try {
contentStream.moveTo(x, y);
contentStream.beginText();
contentStream.drawString(text); // deprecated. Use showText(String text)
contentStream.endText();
}
catch (Exception e){
System.out.println(text + " failed. " + e.toString());
}
}
I would call this method whenever I needed to add text and to wherever I needed to do so. After this, I would close my content stream and then merge the documents together as such:
import org.apache.pdfbox.Overlay;
Overlay overlayObj = new Overlay();
....
PDDocument finalDoc = overlayObj.overlay(overlayDoc, originalDoc);
finalDoc now contains a PDDocument which is my original PDF with text overlaid where needed. I could save it and view it as a BufferedImage on the desktop. I moved to 2.0 because I needed to stay on the most recent library version and because I was having issues putting an image onto the page (see here).
The issue I am having in this question is that 2.0 no longer has something similar to the org.apache.pdfbox.Overlay class. Even more confusing, there are two Overlay classes in 1.8 (org.apache.pdfbox.Overlay and org.apache.pdfbox.util.Overlay), whereas in 2.0 there is only one. The class I need (org.apache.pdfbox.Overlay), or at least the methods it offers, are not present in 2.0 as far as I can tell. I can only find org.apache.pdfbox.multipdf.Overlay.
Here's some quick code that works, it adds "deprecated" over a document and saves it elsewhere:
PDDocument overlayDoc = new PDDocument();
PDPage page = new PDPage();
overlayDoc.addPage(page);
Overlay overlayObj = new Overlay();
PDFont font = PDType1Font.COURIER_OBLIQUE;
PDPageContentStream contentStream = new PDPageContentStream(overlayDoc, page);
contentStream.setFont(font, 50);
contentStream.setNonStrokingColor(0);
contentStream.beginText();
contentStream.moveTextPositionByAmount(200, 200);
contentStream.drawString("deprecated"); // deprecated. Use showText(String text)
contentStream.endText();
contentStream.close();
PDDocument originalDoc = PDDocument.load(new File("...inputfile.pdf"));
overlayObj.setOverlayPosition(Overlay.Position.FOREGROUND);
overlayObj.setInputPDF(originalDoc);
overlayObj.setAllPagesOverlayPDF(overlayDoc);
Map<Integer, String> ovmap = new HashMap<Integer, String>(); // empty map is a dummy
overlayObj.setOutputFile("... result-with-overlay.pdf");
overlayObj.overlay(ovmap);
overlayDoc.close();
originalDoc.close();
What I did in addition to your version:
declare variables
close the content stream
set a color
set to foreground
set a text position (not a stroke path position)
add an empty map
And of course, I read the OverlayPDF source code, which shows more of what you can do with the class.
Bonus content:
Do the same without using the Overlay class, which allows further manipulation of the document before saving it.
PDFont font = PDType1Font.COURIER_OBLIQUE;
PDDocument originalDoc = PDDocument.load(new File("...inputfile.pdf"));
PDPage page1 = originalDoc.getPage(0);
PDPageContentStream contentStream = new PDPageContentStream(originalDoc, page1, true, true, true);
contentStream.setFont(font, 50);
contentStream.setNonStrokingColor(0);
contentStream.beginText();
contentStream.moveTextPositionByAmount(200, 200);
contentStream.drawString("deprecated"); // deprecated. Use showText(String text)
contentStream.endText();
contentStream.close();
originalDoc.save("....result2.pdf");
originalDoc.close();