PDFbox Could not find font: /Helv - java

I am trying to add form fields to an existing PDF file, but the following error appears: PDFBox Could not find font: /Helv
My Java code looks like this:
PDDocument pdf = PDDocument.load(inputStream);
PDDocumentCatalog docCatalog = pdf.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
PDPage page = pdf.getPage(0);
PDTextField textBox = new PDTextField(acroForm);
textBox.setPartialName("SampleField");
acroForm.getFields().add(textBox);
PDAnnotationWidget widget = textBox.getWidgets().get(0);
PDRectangle rect = new PDRectangle(0, 0, 0, 0);
widget.setRectangle(rect);
widget.setPage(page);
widget.setAppearance(acroForm.getFields().get(0).getWidgets().get(0).getAppearance());
widget.setPrinted(false);
page.getAnnotations().add(widget);
acroForm.refreshAppearances();
acroForm.flatten();
pdf.save(outputStream);
pdf.close();
Do you have any idea why the exception appears?
Here is the top of the stack trace:
java.io.IOException: Could not find font: /Helv
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processSetFont(PDDefaultAppearanceString.java:179)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processOperator(PDDefaultAppearanceString.java:132)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processAppearanceStringOperators(PDDefaultAppearanceString.java:108)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.<init>(PDDefaultAppearanceString.java:86)
at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaultAppearanceString(PDVariableText.java:93)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:100)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:262)
at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.refreshAppearances(PDAcroForm.java:368)
at com.workjam.service.impl.PDFService.fillForm(PDFService.java:85)
Here is the link for PDF: https://drive.google.com/file/d/0B2--NSDOiujoR3hOZFYteUl2UE0/view?usp=sharing

Your new text field doesn't have a default appearance, so PDFBox makes one for you (/Helv 0 Tf 0 g).
Solution 1: get it from the field you're using (this will not work with every PDF because you're making several assumptions, i.e. that there is a field and that it is a text field)
textBox.setDefaultAppearance(((PDTextField)acroForm.getFields().get(0)).getDefaultAppearance());
Solution 2: initialize the default resources:
PDResources resources = new PDResources();
resources.put(COSName.getPDFName("Helv"), PDType1Font.HELVETICA);
acroForm.setDefaultResources(resources);
See also the CreateSimpleForm.java example from the source code download.
Update: this has been fixed in 2.0.8, see issue PDFBOX-3943.

The cause is a combination of two things: neither you nor the source PDF provides a default appearance for the text field, and PDFBox fills in defaults inconsistently.
The default appearance
According to the specification, each field containing variable text (e.g. your text field) must have a DA default appearance value:
DA
string
(Required; inheritable) The default appearance string containing a sequence of valid page-content graphics or text state operators that define such properties as the field’s text size and colour.
(ISO 32000-1, Table 222 – Additional entries common to all fields containing variable text)
In addition to parent fields in the field tree, the DA value can also be inherited from the AcroForm dictionary:
DA
string
(Optional) A document-wide default value for the DA attribute of variable text fields (see 12.7.3.3, “Variable Text”).
(ISO 32000-1, Table 218 – Entries in the interactive form dictionary)
In your PDF
You don't provide a default appearance, and your PDF does not have a default in the AcroForm dictionary.
Thus, strictly speaking, it is not valid at the moment you call acroForm.refreshAppearances(). So PDFBox could reject that call due to missing information.
It works differently, though, as PDFBox provides defaults for certain AcroForm dictionary entries if they are not present, in particular
final String adobeDefaultAppearanceString = "/Helv 0 Tf 0 g ";
// DA entry is required
if (getDefaultAppearance().length() == 0)
{
setDefaultAppearance(adobeDefaultAppearanceString);
}
Unfortunately, though, PDFBox does not ensure that the font Helv used here is in the default resources unless they are also missing completely.
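The failing lookup can be modelled without PDFBox. The following is a minimal sketch under stated assumptions: DefaultAppearanceCheck and its methods are hypothetical helpers (not PDFBox API), and the font names F1 and F2 are made up for illustration. It extracts the font name from a DA string and checks whether that name exists among the default resource fonts, which is the condition that is not met in this PDF.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of PDFBox: models the font lookup that fails.
class DefaultAppearanceCheck {

    // Returns the font resource name used by the Tf operator in a DA string,
    // e.g. "Helv" for "/Helv 0 Tf 0 g", or null if there is no Tf operator.
    static String fontNameOf(String da) {
        String[] tokens = da.trim().split("\\s+");
        for (int i = 2; i < tokens.length; i++) {
            // Tf takes two operands: the font name and the font size.
            if (tokens[i].equals("Tf") && tokens[i - 2].startsWith("/")) {
                return tokens[i - 2].substring(1);
            }
        }
        return null;
    }

    // True when the font named in the DA string is among the default resource fonts.
    static boolean isResolvable(String da, Set<String> defaultResourceFonts) {
        String name = fontNameOf(da);
        return name != null && defaultResourceFonts.contains(name);
    }

    public static void main(String[] args) {
        String da = "/Helv 0 Tf 0 g ";
        // Assumed font names for a PDF whose default resources lack Helv.
        Set<String> fonts = new HashSet<String>(Arrays.asList("F1", "F2"));
        System.out.println(isResolvable(da, fonts)); // false: "Could not find font"
        fonts.add("Helv");
        System.out.println(isResolvable(da, fonts)); // true once Helv is registered
    }
}
```

Registering Helv in the default resources, or giving the field a DA string that names a font already present, makes the check succeed.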
Solutions
I just saw Tilman wrote an answer here, too. You can find solutions to your issue there.

Related

pdfbox embedding subset font for annotations

I am trying to use Apache PDFBox v2.0.21 to modify existing PDF documents by adding signatures and annotations. That means that I am actively using incremental save mode. I am also embedding the LiberationSans font to accommodate some Unicode characters. It makes sense for me to use the subsetting feature of PDF embedded fonts, as embedding LiberationSans in full makes the PDF file 200+ KB larger.
After multiple trials and errors I finally managed to have something working - all but the font subsetting. The way I do this is to initialize the PDFont object once using
try (InputStream fs = PDFService.class.getResourceAsStream("/static/fonts/LiberationSans-Regular.ttf")) {
_font = PDType0Font.load(pddoc, fs, true);
}
And then to use custom Appearance Stream to show the text.
private void addAnnotation(String name, PDDocument doc, PDPage page, float x, float y, String text) throws IOException {
List<PDAnnotation> annotations = page.getAnnotations();
PDAnnotationRubberStamp t = new PDAnnotationRubberStamp();
t.setAnnotationName(name); // might play important role
t.setPrinted(true); // always visible
t.setReadOnly(true); // does not interact with user
t.setContents(text);
PDRectangle rect = ....;
t.setRectangle(rect);
PDAppearanceDictionary ap = new PDAppearanceDictionary();
ap.setNormalAppearance(createAppearanceStream(doc, t));
ap.getCOSObject().setNeedToBeUpdated(true);
t.setAppearance(ap);
annotations.add(t);
page.setAnnotations(annotations);
t.getCOSObject().setNeedToBeUpdated(true);
page.getResources().getCOSObject().setNeedToBeUpdated(true);
page.getCOSObject().setNeedToBeUpdated(true);
doc.getDocumentCatalog().getPages().getCOSObject().setNeedToBeUpdated(true);
doc.getDocumentCatalog().getCOSObject().setNeedToBeUpdated(true);
}
private PDAppearanceStream createAppearanceStream(final PDDocument document, PDAnnotation ann) throws IOException
{
PDAppearanceStream aps = new PDAppearanceStream(document);
PDRectangle rect = ann.getRectangle();
rect = new PDRectangle(0, 0, rect.getWidth(), rect.getHeight());
aps.setBBox(rect); // set bounding box to the dimensions of the annotation itself
// embed our unicode font (NB: yes, this needs to be done otherwise aps.getResources() == null which will cause NPE later during setFont)
PDResources res = new PDResources();
_fontName = res.add(_font).getName();
aps.setResources(res);
PDAppearanceContentStream apsContent = null;
try {
// draw directly on the XObject's content stream
apsContent = new PDAppearanceContentStream(aps);
apsContent.beginText();
apsContent.setFont(_font, _fontSize);
apsContent.showText(ann.getContents());
apsContent.endText();
}
finally {
if (apsContent != null) {
try { apsContent.close(); } catch (Exception ex) { log.error(ex.getMessage(), ex); }
}
}
aps.getResources().getCOSObject().setNeedToBeUpdated(true);
aps.getCOSObject().setNeedToBeUpdated(true);
return aps;
}
This code runs, but creates a PDF with dots instead of actual characters, which, I guess, means that the font subset has not been embedded. Moreover, I get the following warnings:
2021-04-17 12:33:31.326 WARN 20820 --- [ main] o.a.p.pdmodel.PDAbstractContentStream : attempting to use subset font LiberationSans without proper context
After looking through the source code, I guess that I am messing something up when creating the appearance stream: somehow it is not connected with the PDDocument, and the subsetting does not proceed normally. Note that the above code works well when the font is embedded fully (i.e. if I call PDType0Font.load with the last parameter set to false).
Can anyone think of some hint to give to me? Thank you!
I don't know - am I lucky? It is very often that luckiness in programming points to something completely wrong or misleading. In any case, if someone can still give a hint, my ears are more than open...
Again, after looking through the code, I saw the following in PDDocument.save():
// subset designated fonts
for (PDFont font : fontsToSubset)
{
font.subset();
}
This is not happening in PDDocument.saveIncremental() which I am using. Just to mess around with the code, I went and did the following just before calling saveIncremental() on my document:
_font.subset(); // you can see in the beginning of the question how _font is created
_font.getCOSObject().setNeedToBeUpdated(true);
pddoc.saveIncremental(baos);
Believe it or not, but the document was saved correctly - at least it appears correct in Acrobat Reader DC and Chrome & Firefox PDF viewers. Note that Unicode codepoints are added to the subset for the font during showText() on appearance content stream.
UPDATE 18/04/2021: as I mentioned in the comments, I got reports from users that started seeing messages like "Cannot extract the embedded font XXXXXX+LiberationSans-Regular from ...", when they opened the modified PDF files. Strangely enough, I didn't see these messages during my tests. It turns out that my copy of Acrobat Reader DC was newer than theirs, and specifically with the continuous release version 2021.001.20149 no errors were shown, while with the continuous release version 2020.012.20043 the above message was shown.
After investigating, it turned out that the problem was the way I was embedding the font. I am not aware whether any other way exists, and I am not familiar enough with the PDF specification to know otherwise. As you can see from the above code, I was loading the font ONCE for the document and then using it freely in the resource dictionary of the appearance stream of EVERY annotation. As a result, all the resource dictionaries of the annotation content streams referenced an F1 font that was defined with the SAME /BaseFont name. The PDF Reference, 3rd ed., p. 323, specifically states that:
"... the PostScript name of the font - ... - begins with a tag
followed by a plus sign (+). The tag consists of exactly six uppercase
letters; the choice of letters is arbitrary, but different subsets in
the same PDF file must have different tags..."
Once I started calling PDType0Font.load for each of my annotations, and calling subset() (and of course setNeedToBeUpdated) after creating the appearance stream for each of them, the /BaseFont names did indeed become different, and the older 2020 version of Acrobat Reader DC stopped complaining.
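The tag rule quoted above can be checked mechanically once the /BaseFont names have been read out of a file (for example with a PDF inspector). This is a minimal sketch, not PDFBox code: SubsetTagCheck is a hypothetical helper, and the font names in the usage note are invented for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: validates subset tags on /BaseFont names.
class SubsetTagCheck {

    // Exactly six uppercase letters, a plus sign, then the font's PostScript name.
    private static final Pattern SUBSET_NAME = Pattern.compile("^([A-Z]{6})\\+(.+)$");

    // Returns the six-letter tag, or null if the name carries no subset tag.
    static String tagOf(String baseFont) {
        Matcher m = SUBSET_NAME.matcher(baseFont);
        return m.matches() ? m.group(1) : null;
    }

    // True when every name carries a subset tag and no tag is used twice.
    static boolean tagsAreDistinct(List<String> baseFontNames) {
        Set<String> seen = new HashSet<String>();
        for (String name : baseFontNames) {
            String tag = tagOf(name);
            if (tag == null || !seen.add(tag)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, tagsAreDistinct applied to "ABCDEF+LiberationSans" and "GHIJKL+LiberationSans" returns true, while two subsets that both use "ABCDEF+LiberationSans" make it return false.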
[edit 07/10/2021: even using a single PDFont object per page (with multiple annotations sharing this font) and subsetting it once, after calling showText on the appearances of all annotations, does not appear to work. The subsetting seems to use only the letters passed to the first showText and not the others, so the 2nd, 3rd, etc. annotations render incorrectly when they contain characters that did not occur in the 1st annotation. So I reiterate that what worked was to call PDType0Font.load for each separate annotation and then (after modifying the appearance with showText, which marks the letters to be used during subsetting) to call subset() on each of these fonts (which changes the font name)]
Note that other than using iText RUPS for inspecting the PDF contents, one could use Foxit PDF viewer to at least ensure that the subset font names are different. Acrobat Reader DC and PDF-xChange in Properties -> Fonts just show the initial font name, like LiberationSans, without showing the 6-letter unique prefix.
UPDATE 19/04/2021: I am still working on this issue, because I still get reports about the infamous "Cannot extract the embedded font" message. It is quite possible that the original cause of that message was not (or not only) the fact that the different subsets had the same BaseFont names. One thing I am observing is that on some computers, the stamp annotations I am using cause Acrobat Reader DC to automatically open the so-called "Comments pane". There is an option to turn this off (Preferences -> Commenting -> Show comments pane when a PDF with comments is opened). When this pane opens, either manually or automatically, the error message appears (and I was at my wits' end trying to see why the same version of Acrobat Reader DC behaves differently on different machines). I think that Acrobat Reader tries to extract the full version of the font and fails, since it is only a subset. But I guess this has nothing to do with the semantic contents of the document, since the document still passes "qpdf --check". I am currently trying to find out whether it is possible to restrict stamps so that they do not allow comments, i.e. some way to disable the comments pane in Acrobat Reader DC, although I have little hope.
UPDATE 20/04/2021 opened a new question here

Apache PDFBox - Not able to read all fields from PDF

We are trying to read a PDF and populate values in it dynamically. Based on an incoming request, we run some rules to decide which PDF to use and then populate values in it dynamically. We are using Apache PDFBox version 2.0.11, and for some reason we are facing issues with one particular PDF template: we cannot read some of its fields, so the generated PDF is incomplete. We wonder whether this has something to do with the original PDF itself. Here is the code snippet we use to read the fields and populate them.
PDDocument pdfTemplate = PDDocument.load(inputStream);
PDDocumentCatalog docCatalog = pdfTemplate.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
acroForm.setXFA(null);
COSArrayList<PDField> list = (COSArrayList<PDField>) acroForm.getFields();
for (PDField field : list) {
field.setReadOnly(true);
logger.debug("Field name " + field.getFullyQualifiedName());
//use logic to populate value by calling field.setValue();
}
When we printed each field name, we observed that more than 30 percent of the fields were missing. Can anyone help with how to fix this? The PDF has 15 pages with different questions. If the issue is with the original PDF itself, what might be the reason that some of the fields cannot be read?
You probably have hierarchical fields on that form. Try something like the code below instead...
PDDocument pdfTemplate = PDDocument.load(inputStream);
PDDocumentCatalog docCatalog = pdfTemplate.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
PDFieldTree fieldTree = acroForm.getFieldTree();
Iterator<PDField> fieldTreeIterator = fieldTree.iterator();
while (fieldTreeIterator.hasNext()) {
PDField field = fieldTreeIterator.next();
if (field instanceof PDTerminalField) {
String fullyQualifiedName = field.getFullyQualifiedName();
logger.debug("Field name "+fullyQualifiedName);
}
}
PDAcroForm.getFields() only gets the root fields, not their children. PDAcroForm.getFieldTree() gets all fields but then you need to test to see if they're terminal before setting a value. Non-terminal fields can't have a value and don't have widgets (representations on the page) associated with them. You'll know this is the problem if the fully qualified name has periods in it. The periods represent the hierarchy.
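The difference between root fields and terminal fields can be illustrated with a small stand-alone model. This is only a sketch: FieldNode and FieldTreeWalk are hypothetical classes, not PDFBox types, and the model ignores widgets entirely. It shows why iterating over the roots alone misses fields, and how the fully qualified names are built by joining partial names with periods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a form field: a partial name plus child fields.
class FieldNode {
    final String partialName;
    final List<FieldNode> kids;

    FieldNode(String partialName, FieldNode... kids) {
        this.partialName = partialName;
        this.kids = Arrays.asList(kids);
    }

    // In this simplified model a field is terminal when it has no child fields.
    boolean isTerminal() {
        return kids.isEmpty();
    }
}

class FieldTreeWalk {

    // Collects the fully qualified names of all terminal fields, depth first.
    static List<String> terminalNames(List<FieldNode> roots) {
        List<String> out = new ArrayList<String>();
        for (FieldNode root : roots) {
            collect(root, "", out);
        }
        return out;
    }

    private static void collect(FieldNode node, String prefix, List<String> out) {
        String name = prefix.isEmpty() ? node.partialName : prefix + "." + node.partialName;
        if (node.isTerminal()) {
            out.add(name);
            return;
        }
        for (FieldNode kid : node.kids) {
            collect(kid, name, out);
        }
    }
}
```

With the roots "applicant" (containing "name" and "address", the latter containing "city") and "date", a loop over the roots sees only two fields, whereas terminalNames returns applicant.name, applicant.address.city and date.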
Issue was resolved after reconstructing the whole PDF again.

How to reuse font from one pdf in another in iText7?

I'm trying to open PDF file in iText7, write there some new piece of text, apply font from original PDF to it and save it in another PDF document. I'm using Java 1.8
Thus, I need a set of font names used in original pdf, from where user will choose one, that will be applied to a new paragraph.
And I also need to somehow apply this font.
For now I have this piece of code, that I've taken from here:
public static void main(String[] args) throws IOException {
PdfDocument pdf = new PdfDocument(new PdfReader("example.pdf"));
Set<PdfName> fonts = listAllUsedFonts(pdf);
fonts.stream().forEach(System.out::println);
}
public static Set<PdfName> listAllUsedFonts(PdfDocument pdfDoc) throws IOException {
PdfDictionary acroForm = pdfDoc.getCatalog().getPdfObject().getAsDictionary(PdfName.AcroForm);
if (acroForm == null) {
return null;
}
PdfDictionary dr = acroForm.getAsDictionary(PdfName.DR);
if (dr == null) {
return null;
}
PdfDictionary font = dr.getAsDictionary(PdfName.Font);
if (font == null) {
return null;
}
return font.keySet();
}
It returns this output:
/Helv
/ZaDb
However, the only font example.pdf has is Verdana (it is what document properties in Adobe Acrobat Pro says). Moreover, there are Verdana in two implementations: Bold and normal.
So, I have these questions:
1. Why does this function return two fonts instead of one (Verdana)?
2. How can I generate normal, readable font names to display to the user (e.g. Helvetica instead of Helv)?
3. How can I apply a font taken from the original document to the new paragraph?
Thank you in advance!
If you only wish to display the names of the fonts being used (which you are legally allowed to do) you can use the following code:
public void go() throws IOException {
final Set<String> usedFontNames = new HashSet<>();
IEventListener fontNameExtractionStrategy = new IEventListener() {
@Override
public void eventOccurred(IEventData iEventData, EventType eventType) {
if(iEventData instanceof TextRenderInfo)
{
TextRenderInfo tri = (TextRenderInfo) iEventData;
String fontName = tri.getFont().getFontProgram().getFontNames().getFontName();
usedFontNames.add(fontName);
}
}
@Override
public Set<EventType> getSupportedEvents() {
return null;
}
};
PdfCanvasProcessor parser = new PdfCanvasProcessor(fontNameExtractionStrategy);
File inputFile = new File("YOUR_INPUT_FILE_HERE.pdf");
PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputFile));
for(int i=1;i<=pdfDocument.getNumberOfPages();i++)
{
parser.processPageContent(pdfDocument.getPage(i));
}
pdfDocument.close();
for(String fontName : usedFontNames)
{
System.out.println(fontName);
}
}
You should not reuse a font from one PDF in another PDF, and here's why: fonts are hardly ever fully embedded in a PDF document. For instance: you use the font Verdana regular (238 KB) and the font Verdana bold (207 KB), but when you create a simple PDF document saying "Hello World" in regular and bold, the file size will be much smaller than 238 + 207 KB. Why is this? Because the PDF will only consist of a subset of the font Verdana regular and a subset of the font Verdana bold.
You may have noticed that I am talking of the font Verdana regular
and the font Verdana bold. Those are two different fonts from
the same font family. Reading your question, I notice that you don't make that distinction. You talk about the font Verdana with
two implementations bold and normal. This is incorrect. You should
talk about the font family Verdana and two fonts Verdana bold and
Verdana regular.
A PDF usually contains subsets of different fonts. It can even contain two different subsets of the same font. See also What are the extra characters in the font name of my PDF?
Your goal is to take the font of one PDF and to use that font in another PDF. However, suppose that your original PDF only contains the subset that is required to write "Hello World" and that you want to create a new PDF saying "Hello Universe." That will never work, because the subset won't contain the glyphs to render the letters U, n, i, v, and s.
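The coverage problem can be made concrete with plain string handling. The sketch below treats a subset as "the set of characters that occurred in the original text", which is a simplification (a real subset is a set of glyphs, not characters); SubsetCoverage is a hypothetical helper, not iText API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: approximates a font subset by the characters of its source text.
class SubsetCoverage {

    // Distinct code points of the text, in order of first occurrence.
    static Set<Integer> codePointsOf(String text) {
        Set<Integer> cps = new LinkedHashSet<Integer>();
        text.codePoints().forEach(cps::add);
        return cps;
    }

    // Characters needed by newText that the subset built from subsetSource lacks.
    static String missingFrom(String subsetSource, String newText) {
        Set<Integer> missing = codePointsOf(newText);
        missing.removeAll(codePointsOf(subsetSource));
        StringBuilder sb = new StringBuilder();
        for (int cp : missing) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }
}
```

Calling missingFrom("Hello World", "Hello Universe") returns "Univs": the letters e and r are already covered by "Hello World", while U, n, i, v and s are not, so the new text cannot be rendered with that subset.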
Also take into account that fonts are usually licensed. Many fonts
have a license that states that you can use the font to create a
document and embed that font in that document. However, there is
often a clause that says that other people are not allowed to
extract the font to use it in a different context. For instance: you paid for the font when you purchased a copy of MS Windows, but someone
who receives a PDF containing that font may not have a license to use
that font. See Does one need to have a license for fonts if we are using ttf files in itext?
Given the technical and legal issues related to your question, I don't think it makes sense to work on a code sample. Your design is flawed. You should work with a licensed font program instead of trying to extract a font from an existing PDF. This answers question 3: How can I apply font got from the original document to the new paragraph? You can't: it is forbidden by law (see Extra info below) and it might be technically impossible if the subset doesn't contain all the characters you need!
Furthermore, the sample you found on the official iText web site looks for the fonts defined in a form. /Helv and /ZaDb refer to Helvetica and ZapfDingbats. Those are two fonts of a set of 14 known as the Standard Type 1 fonts. These fonts are usually not embedded in the document, since every viewer is supposed to know how to render them. You don't need a full font program if you want to use these fonts; the font metrics are sufficient. For instance: iText ships with 14 AFM files (AFM = Adobe Font Metrics) that contain the font metrics.
You wonder why you don't find Verdana, since Verdana is used as the font for the text in your document, but you are looking in the wrong place. You are asking iText for the fonts used for the form, not for the fonts used in the text. This answers question 1: Why does this function return two fonts instead of one (Verdana)?
As for your question 2: you are looking at the internal name of the font, and that internal name can be anything (even /F1, /F2,...). The postscript name of the font is stored in the font dictionary. That's the name you need.
Extra info:
I checked the Verdana license:
Microsoft supplied font. You may use this font to create, display, and print content as permitted by the license terms or terms of use, of the Microsoft product, service, or content in which this font was included. You may only (i) embed this font in content as permitted by the embedding restrictions included in this font; and (ii) temporarily download this font to a printer or other output device to help print content. Any other use is prohibited.
The use you want to make of the font is prohibited. If you have a license for Verdana, you can embed the font in a PDF. However, it is not permitted to extract that font and use it for another purpose. You need to use the original font program.

Java PDFBox does not maintain the font appearance of a field if it appears several times in a PDF form

I need to fill a PDF form dynamically from my Java web app, and I found PDFBox to be really useful, except for an issue I am facing when I have multiple fields with the same name.
I have 5 fields with the same name (let's say 'wcode') in different places on a one-page PDF form. These fields have different fonts. Normally, when you fill out one field manually, the other fields automatically pick up the same value. The same thing happens when I fill it using PDFBox, except that PDFBox changes all 5 fields to have the same font as the first field that appears in the PDF form.
Here is the code used to fill the field.
PDDocument _pdfDocument = PDDocument.load(new File(originalPdf));
PDAcroForm acroForm = _pdfDocument.getDocumentCatalog().getAcroForm();
PDTextField myCodeField = (PDTextField) acroForm.getField("wcode");
if (myCodeField != null) {
myCodeField.setValue(my.getCode());
}
//Refresh layout && Flatten the document
acroForm.refreshAppearances();
acroForm.flatten();
_pdfDocument.save(outputFile);
I added
acroForm.refreshAppearances();
after some research but that did not change anything.
So if the first 'wcode' field to appear on the PDF form is 6pt, all the other 'wcode' fields in the rest of the PDF will be 6pt, even if I had set them to 12pt in their appearance properties.
I am using PDFBox 2.0.5
The issue has been resolved in version PDFBox 2.0.6 released about a month ago.
See the comment on JIRA issue 3837 here.

How to set multiline value to PDField

I'm using PDFBox 1.8.x for filling out an acroform in a PDF document. For this, I'm iterating over all PDFields in the document and setting the correct values which are stored in my HashMap (aParameter).
PDAcroForm acroform = aTemplate.getDocumentCatalog().getAcroForm();
List fields = acroform.getFields();
for(int i=0; i<fields.size(); i++)
{
PDField field = (PDField) fields.get(i);
String lValue = aParameter.get(field.getPartialName());
if(lValue != null) field.setValue(lValue);
field.setReadonly(true);
}
It works very well except for my one and only multiline text box. In my resulting PDF, the correct text is in the multiline text box, but without any line breaks. While searching for reasons, I found some very old answers saying that PDFBox doesn't support multiline text and can't insert a newline character on its own. Is this still the case? Or is there a way to automatically break the text onto a new line when the width of the text box is reached?
This answer looks like a solution, but they are using another technique (drawString) to write the PDF. Is there a possibility to modify this idea to match my behavior of filling out PDFields?
I'm happy about any ideas!
