PdfBox embed fonts into existing document


I have a pdf file which shows font properties in Okular (or whatever PDF viewer) like that:
Name: Helvetica
Type: Type1
File: /usr/share/fonts/truetype/liberation2/LiberationSans-regular.ttf
Embedded: No
I want to embed Helvetica with PDFBox 2.x without modifying the file content (the text) itself, so that the font is always available with the file.
Is it possible at all?
I tried something like:
PDDocument document = PDDocument.load(myFile);
InputStream stream = new FileInputStream(new File("/home/user/fonts_temp/Helvetica.ttf"));
PDFont fontToEmbed = PDType0Font.load(document, stream, true);
PDResources resources = document.getPage(pageNumber).getResources();
resources.add(fontToEmbed);
//or use the font from pdfbox:
resources.add(PDType1Font.HELVETICA);
document.save(somewhere);
document.close();
I also tried to call
COSName fontCosName = resources.add(PDType1Font.HELVETICA);
resources.put(fontCosName, font);
What am I doing wrong?
Edit:
@TilmanHausherr thank you for the clue! But I'm still missing something. Currently my code looks like:
PDFont helvetica = PDType0Font.load(document, new FileInputStream(new File("/path/Helvetica.ttf")), false);
...
PDResources resources = page.getResources();
for (COSName fontCosName : resources.getFontNames()) {
    if (resources.getFont(fontCosName).getName().equals("Helvetica")) {
        resources.put(fontCosName, helvetica);
    }
}
End result shows
Helvetica CID TrueType Fully Embedded
But the text set in that font is no longer displayed in the PDF file at all: the places where the font is used are literally empty, leaving a blank page. Something is still missing.
Font itself was downloaded from here

1. You'd need to know the name that is currently used in the resources, so check these with resources.getFontNames().
2. To replace a standard 14 font, use this font object:
PDTrueTypeFont.load(document, file, oldFont.getEncoding() /* or WinAnsiEncoding.INSTANCE which is usually right */ );
This ensures that the same encoding is used as for the standard 14 font. (It's different for the Zapf Dingbats and the Symbol fonts.)
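Putting the two points together, a minimal sketch could look like the one below. It assumes PDFBox 2.x on the class path, that the caller supplies a suitable TTF file, and that the font to replace is reported under the exact name "Helvetica". The class and method names (ReplaceStandardFont, fontKeysNamed, replace) are made up for illustration and are not part of PDFBox. It only looks at page-level resources, so fonts used inside form XObjects or annotation appearances would not be touched, and WinAnsiEncoding would be the wrong choice for Symbol or Zapf Dingbats.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.font.encoding.WinAnsiEncoding;

public class ReplaceStandardFont {

    /** Collects the resource keys whose font reports the given name (hypothetical helper). */
    static List<COSName> fontKeysNamed(PDResources resources, String baseName) throws IOException {
        List<COSName> keys = new ArrayList<>();
        for (COSName key : resources.getFontNames()) {
            PDFont font = resources.getFont(key);
            if (font != null && baseName.equals(font.getName())) {
                keys.add(key);
            }
        }
        return keys;
    }

    /** Replaces every page-level "Helvetica" entry with a simple TrueType font loaded from ttf. */
    static void replace(File in, File ttf, File out) throws IOException {
        try (PDDocument document = PDDocument.load(in)) {
            // Same encoding as the old simple font, so existing character codes keep their meaning.
            PDFont replacement = PDTrueTypeFont.load(document, ttf, WinAnsiEncoding.INSTANCE);
            for (PDPage page : document.getPages()) {
                PDResources resources = page.getResources();
                if (resources == null) {
                    continue;
                }
                // Keys are collected first so the dictionary is not modified while iterating it.
                for (COSName key : fontKeysNamed(resources, "Helvetica")) {
                    resources.put(key, replacement);
                }
            }
            document.save(out);
        }
    }
}
```

The font is kept under its existing resource key, so the content streams, which refer to fonts by that key, do not need to change.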

Related

Added font not loading correct pdfbox for acroform

I'm trying to embed the fonts using the following code,
which is based on Stackoverflow and PDFBOX-2661:
The font to embed as alternative to Helvetica is DejaVuSans.
// given: PDDocument document, PDAcroForm acroForm
InputStream font_file = ClassLoader.getSystemResourceAsStream("DejaVuSans.ttf");
font = PDType0Font.load(document, font_file);
if (font_file != null) {
    font_file.close();
}
System.err.println("Embedded font 'DejaVuSans.ttf' loaded.");
PDResources resources = acroForm.getDefaultResources();
if (resources == null) {
    resources = new PDResources();
}
resources.put(COSName.getPDFName("Helv"), font);
resources.put(COSName.getPDFName("Helvetica"), font);
// Also use "DejaVuSans.ttf" for "HeBo", "HelveticaBold" and "Helvetica-Bold" in a similar way, but this is left out to keep this short.
acroForm.setDefaultResources(resources);
// let pdfbox handle refreshing the values, now that all the fonts should be there.
acroForm.refreshAppearances();
However, acroForm.refreshAppearances() produces a lot of "Using fallback font LiberationSans for CID-keyed TrueType font DejaVuSans" warnings. Debugging it a bit, I found that createDescendantFont (via findFontOrSubstitute in org.apache.pdfbox.pdmodel.font.PDCIDFontType2) tries to load the font file "DejaVuSans" from the filesystem again instead of using the provided resource. Because the font is shipped inside the JAR file rather than installed among the system's fonts, it is not found, so the fallback font is used.
How can I make it recognise and load the font correctly?
What I already tried:
I tried extending the font loading mechanism, but as everything is private and/or final, I had to stop after I already copied about 10 files unchanged from the original code just to be able to access them; that must be possible in a different way.
Direct writes to the ContentStream seem to use a different way (contentStream.setFont(pdfFont, fontSize)), so that is not affected.

The current AcroForm form field refreshing mechanism in PDFBox is not really usable in combination with fonts yet to be subsetted.
The cause is that whenever a font is used for refreshing an appearance, it is retrieved from some resources dictionary. In those resource dictionaries, though, there is not your original PDType0Font but only a preliminary version of the PDF objects backing your PDType0Font. But these PDF objects don't know that they back a font that eventually shall be subsetted, so retrieval of that font generates a new, different PDType0Font object which claims to be non-embedded. So it also is not informed about glyphs to eventually embed.
This is also the reason why the PDType0Font.load method you use is documented (JavaDoc comments) with the hint "If you are loading a font for AcroForm, then use the 3-parameter constructor instead":
/**
* Loads a TTF to be embedded and subset into a document as a Type 0 font. If you are loading a
* font for AcroForm, then use the 3-parameter constructor instead.
*
* @param doc The PDF document that will hold the embedded font.
* @param input An input stream of a TrueType font. It will be closed before returning.
* @return A Type0 font with a CIDFontType2 descendant.
* @throws IOException If there is an error reading the font stream.
*/
public static PDType0Font load(PDDocument doc, InputStream input) throws IOException
And the 3-parameter constructor in its documentation tells you not to use subsetting for fonts for AcroForm usage:
/**
* Loads a TTF to be embedded into a document as a Type 0 font.
*
* @param doc The PDF document that will hold the embedded font.
* @param input An input stream of a TrueType font. It will be closed before returning.
* @param embedSubset True if the font will be subset before embedding. Set this to false when
* creating a font for AcroForm.
* @return A Type0 font with a CIDFontType2 descendant.
* @throws IOException If there is an error reading the font stream.
*/
public static PDType0Font load(PDDocument doc, InputStream input, boolean embedSubset)
throws IOException
But even using that 3 parameter constructor with embedSubset set to false does not render a good result. At first glance the rendered fields look ok:
But as soon as you click into them, something weird happens:
@Tilman, there probably still is something to fix here.
The underlying problem with the subset embedded font can also occur in other contexts, e.g.:
try ( PDDocument pdDocument = new PDDocument();
      InputStream font_file = [...] ) {
    PDType0Font font = PDType0Font.load(pdDocument, font_file);
    PDResources pdResources = new PDResources();
    COSName name = pdResources.add(font);
    PDPage pdPage = new PDPage();
    pdPage.setResources(pdResources);
    pdDocument.addPage(pdPage);
    try ( PDPageContentStream canvas = new PDPageContentStream(pdDocument, pdPage) ) {
        canvas.setFont(pdResources.getFont(name), 12);
        canvas.beginText();
        canvas.newLineAtOffset(30, 700);
        canvas.showText("Some test text.");
        canvas.endText();
    }
    pdDocument.save("sampleOfType0Issue.pdf");
}
(RefreshAppearances test testIllustrateType0Issue)

Font different in MS Edge than Chrome for PDF when using PDFBox

I have a PDF template created in Acrobat Reader DC which contains a field that I am trying to fill with some text. The field has a specific font that I want to keep. I am able to obtain the field and change the value.
However, when I open the PDF in Internet Explorer the font is a default font. The confusing part is that if I open it in Chrome then it shows the correct font. Not sure why that is, any help is appreciated. I am using PDFBox version 2.
(The font works if I don't use Java to edit the file, if I just manually change it inside Acrobat and save the file then it shows correctly.)
See below for the code used.
File file = new File("PDFToReadFrom.pdf");
PDDocument pdDoc = PDDocument.load(file);
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
for (PDField pdField : pdAcroForm.getFields()) {
    pdField.setValue("value");
}
pdDoc.save(new File("test.pdf"));
pdDoc.close();

I suggest comparing the two PDF files (the one edited with Java and the one generated by Acrobat) to check whether they use the same font.
According to this article, it seems that we could set the font when using PDFBox to create a PDF file.
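One way to do that comparison is to print the fonts each file declares. The sketch below assumes PDFBox 2.x; FontReport and describe are illustrative names, not PDFBox API. It reports each font's resource key, its name, and whether PDFBox considers it embedded.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;

public class FontReport {

    /** Returns one line per font in the given resources: "key -> name embedded=true|false". */
    static List<String> describe(PDResources resources) throws IOException {
        List<String> lines = new ArrayList<>();
        if (resources == null) {
            return lines; // a page or form without resources simply has nothing to report
        }
        for (COSName key : resources.getFontNames()) {
            PDFont font = resources.getFont(key);
            if (font != null) {
                lines.add(key.getName() + " -> " + font.getName()
                        + " embedded=" + font.isEmbedded());
            }
        }
        return lines;
    }
}
```

For a form like the one in the question, it would probably be worth calling describe(pdAcroForm.getDefaultResources()) as well as describe(page.getResources()) for each page, once for test.pdf and once for the file saved from Acrobat, and then comparing the two outputs.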

JAVA - font path instead font name in docx

I am using Apache POI to generate .docx document. I added external fonts to my project. For example:
String playfairDisplayRegular = this.getClass().getClassLoader().getResource("PlayfairDisplay-Regular.ttf").getFile();
I used playfairDisplayRegular in a paragraph. When I select text in the document, the font name field shows a path, for example:
/C:/Users/..../Documents...
instead of the font name (the font itself is working). Any ideas?
Greetings, Artur

URL.getFile() just returns the file name part (+ optional query part ?...) of the URL.
For resources (files possibly inside a jar, residing on the class path) one should rather not use File, but use an InputStream, whenever possible.
With java.awt.Font:
Font font = Font.createFont(Font.TRUETYPE_FONT,
getClass().getResourceAsStream("/PlayfairDisplay-Regular.ttf"));
In the docx you can now use font.getFamily() (for XWPFRun.setFontFamily) and such.
Embedding fonts in the docx:
Meanwhile apache poi might be able to embed fonts (license issue for you!), but doing it yourself should be simple: .docx is a zip format, fonts are in a /fonts/ subdirectory. You can test it in a small docx written in MSWord. Writing the file can be done by a zip file system: "jar:file:/C:/... .docx", and Files.copy.
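The zip file system step can be sketched with the JDK alone, as below. DocxFontCopy and copyIntoDocx are illustrative names, and the entry path is left to the caller because it is an assumption here; check it against a docx saved by MS Word. The sketch only copies the file into the container. A real .docx would additionally have to reference the font from its own parts, which is not shown.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;

public class DocxFontCopy {

    /**
     * Copies fontFile into an existing .docx (a zip container) at entryPath,
     * for example "/fonts/PlayfairDisplay-Regular.ttf" (path is an assumption).
     */
    static void copyIntoDocx(Path docx, Path fontFile, String entryPath) throws IOException {
        // "jar:file:/..." opens the zip as a file system, as described above.
        URI uri = URI.create("jar:" + docx.toUri());
        try (FileSystem zip = FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap())) {
            Path target = zip.getPath(entryPath);
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            Files.copy(fontFile, target, StandardCopyOption.REPLACE_EXISTING);
        } // closing the file system writes the changes back to the .docx
    }
}
```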

Using java.awt.Font will be problematic for me, because my syntax looks like this:
printParagraph(createParagraphWithAlignment(document, ParagraphAlignment.RIGHT),
"something",
new Font(playfairDisplayRegular, 12, Boolean.TRUE, Boolean.FALSE));
And methods used:
protected XWPFRun printParagraph(XWPFParagraph paragraph, String text, Font font) {
    XWPFRun run = paragraph.createRun();
    run.setText(text);
    run.setFontSize(font.getSize());
    run.setBold(font.getBold());
    run.setItalic(font.getItalic());
    run.setFontFamily(font.getName());
    return run;
}
protected XWPFParagraph createParagraphWithAlignment(IBody ibody, ParagraphAlignment alignment) {
    XWPFParagraph paragraph = castParagraph(ibody);
    paragraph.setAlignment(alignment);
    return paragraph;
}

Image not displaying (but loading) in PDF generation using resource stream

I created a PDF using PDFBox. The entire PDF generated perfectly, and even the images loaded, while I was using
PDImageXObject ptabelle = PDImageXObject.createFromFile("src/main/resources/pdf/ptabelle.png", pdDocument);
But the project will need to go live sometime so I have to replace the static path with a class loader. After doing all that the PDF generates, the text is displayed, but not the image.
The interesting thing is that inside the PDF the "box" where the image should be is there, but not the image.
Here is the code for the stream generation.
ClassLoader classLoader = getClass().getClassLoader();
PDStream pdStream = new PDStream(pdDocument, classLoader.getResourceAsStream("pdf/ptabelle.png"));
PDResources pdResources = new PDResources();
PDImageXObject ptabelle = new PDImageXObject(pdStream, pdResources);
PDPageContentStream pdPageContentStream = new PDPageContentStream(pdDocument, page4);
And here is the call in the code, the length + width variables are defined in the code.
pdPageContentStream.drawImage(ptabelle, TEXT_BEGIN, currentYCoord, 172, 107);

Instead of new PDImageXObject(pdStream, pdResources) which is for PDFBox internal use, please use the appropriate LosslessFactory method. So your code would look like this:
BufferedImage bim = ImageIO.read(classLoader.getResourceAsStream("pdf/ptabelle.png"));
PDImageXObject img = LosslessFactory.createFromImage(pdDocument, bim);
See also the javadoc of PDImageXObject.createFromFileByExtension, which explains what factory methods can be called instead.
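When switching from a file path to a class loader, it can also help to fail loudly if the resource is missing or cannot be decoded, since getResourceAsStream returns null for an unknown name and ImageIO.read returns null when no reader understands the data. A small JDK-only sketch follows; ResourceImages and its methods are illustrative names, not PDFBox API. The resulting BufferedImage can then be passed to LosslessFactory.createFromImage as shown above.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

public class ResourceImages {

    /** Decodes an image from a stream, turning the two silent failure modes into exceptions. */
    static BufferedImage read(InputStream in, String label) throws IOException {
        if (in == null) {
            throw new IOException("Resource not found on class path: " + label);
        }
        try (InputStream stream = in) {
            BufferedImage image = ImageIO.read(stream);
            if (image == null) {
                throw new IOException("No image reader could decode: " + label);
            }
            return image;
        }
    }

    /** Convenience wrapper for class path resources such as "pdf/ptabelle.png". */
    static BufferedImage fromClassPath(ClassLoader loader, String name) throws IOException {
        return read(loader.getResourceAsStream(name), name);
    }
}
```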

Embed non-embedded fonts in PDF with IText

So I have the following problem. I receive a PDF file which contains a set of fonts. These fonts are not embedded into the file. Here is a simple example:
I would like to embed these fonts inside the PDF, so they're self-contained and always available. But things don't seem that simple. I'm using IText to do my PDF processing.
I have read and tried the following questions/answers:
how-to-create-pdf-with-font-information-and-embed-actual-font-when-merging-them
embed-truetype-fonts-in-existing-pdf
embed-font-into-pdf-file-by-using-itext
how-to-check-that-all-used-fonts-are-embedded-in-pdf-with-java-itext
Chapter 16.1.4 Replacing a font of the book iText in Action - 2nd Edition
...
But what had gotten me closest was the following example: EmbedFontPostFacto.java (which comes from the book). I was able to embed the Arial font when providing the Arial.ttf file.
But with this, like with other examples, I need the source file of the font in order to embed it. In my case, I don't have the source file. But I might have them on the system however. So I'd like to query my available fonts on the system and see if it corresponds to the given font.
Something of the likes as
GraphicsEnvironment e = GraphicsEnvironment.getLocalGraphicsEnvironment();
java.awt.Font[] fonts = e.getAllFonts();
for (java.awt.Font f : fonts) {
    System.out.println(f.getFontName());
}
But I cannot transform the given java.awt.Font into a RandomAccessFile or a byte[] to be used in order to embed the font file itself. Is there another way for embedding fonts into a PDF, without having the source file of the font itself?

On Windows, C:\Windows\Fonts (or a similar directory) contains all font files, and Explorer also shows the font names, so a manual search is feasible.
In Java, you have GraphicsEnvironment.getAvailableFontFamilyNames() and Font.getFamily() to check for a name from the PDF like "Arial MT".
However, Font has no getter for the underlying file.
So list all files in the font directory and load each file in turn as a Font.
GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
Font font = Font.createFont(Font.TRUETYPE_FONT, ttfFile);
ge.registerFont(font); // If you want to load the font.
// java.awt.Font exposes the family via getFamily().
if (pdfFontName.startsWith(font.getFamily())) {
    System.out.printf("%s - %s / %s%n", ttfFile.getName(), font.getFamily(),
            font.getName());
}
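The directory scan described above could be wrapped up roughly as follows. This is a JDK-only sketch: FontFileFinder and its methods are illustrative names, only .ttf files are considered, and the name matching is a heuristic. It drops the six-letter subset tag that PDF font names can carry (as in "ABCDEF+ArialMT"), ignores spaces and case, and then checks whether the PDF name starts with the font family.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FontFileFinder {

    /** Drops a subset tag such as "ABCDEF+" from a PDF font name, if present. */
    static String stripSubsetTag(String pdfFontName) {
        return pdfFontName.matches("[A-Z]{6}\\+.+") ? pdfFontName.substring(7) : pdfFontName;
    }

    /** Heuristic: the PDF name, without tag and spaces, starts with the family name. */
    static boolean matches(String pdfFontName, String family) {
        String name = stripSubsetTag(pdfFontName).replace(" ", "").toLowerCase(Locale.ROOT);
        String fam = family.replace(" ", "").toLowerCase(Locale.ROOT);
        return !fam.isEmpty() && name.startsWith(fam);
    }

    /** Loads every .ttf file in fontDir and returns those whose family matches the PDF name. */
    static List<File> find(File fontDir, String pdfFontName) {
        List<File> hits = new ArrayList<>();
        File[] files = fontDir.listFiles((dir, n) -> n.toLowerCase(Locale.ROOT).endsWith(".ttf"));
        if (files == null) {
            return hits; // not a directory, or not readable
        }
        for (File file : files) {
            try {
                Font font = Font.createFont(Font.TRUETYPE_FONT, file);
                if (matches(pdfFontName, font.getFamily())) {
                    hits.add(file);
                }
            } catch (FontFormatException | IOException e) {
                // Unreadable or unsupported font file: skip it and keep scanning.
            }
        }
        return hits;
    }
}
```

Several files can match one family (regular, bold, italic), so the result is a list and picking the right style is left to the caller.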
