How to change font of an embedded resource using PDFBox - java

I'm trying to get rid of a custom font that has been used for years. Due to regulations I need to replace this font with a common one.
Anyways, I've tried to write a JUnit Test to change the font of a pdf using PDFBox.
This is what I have done:
@Test
public void changeFontOfAllPdfsToArial() throws Exception {
PDDocument document = PDDocument.load(new File("src/test/broken_pdf.pdf"));
for(PDPage page : document.getPages()) {
PDResources resources = page.getResources();
for(COSName key : resources.getFontNames()) {
PDFont font = resources.getFont(key);
System.out.println(font.getFontDescriptor().getFontName());
if(resources.getFont(key).toString().contains("CUSTOM")) {
}
}
}
document.save(new File(PDFs.get(0).getAbsolutePath() + "_test"));
}
Iterating through the list gives me all the fonts of the document.
I'm getting the COSName key of the resource, but how do I change the font of it? Thanks for your help!
Edit: Just to mention: the font is embedded.

Related

How to use TTF font with PDFBox AcroForm and then flatten document?

I have been trying to make a fillable PDF file with LibreOffice Writer 7.2.2.2. Here is what the document looks like:
All fields to the right of the vertical lines are form textboxes, each one having its own name (tbxOrderId, tbxFullName...). Each textbox uses SF Pro Text Light as its font. Only the one on the bottom right (tbxTotal), "Total €123.00", uses Oswald Regular. The document looks fine when I fill these fields with LibreOffice Writer.
Below this are my export settings. I chose Archive PDF/A-2b in order to embed the fonts into the document.
Here is the output when I run pdffonts on the exported PDF file.
However, when I run the following code which just changes the values of tbxOrderId and tbxTotal, the output PDF document is missing these fonts.
public class Start {
public static void main(String[] args) {
try {
PDDocument pDDocument = PDDocument.load(new File("/media/stoyank/Elements/Java/tmp/Receipt.pdf"));
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();
PDField field = pDAcroForm.getField("tbxOrderId");
field.setValue("192753");
field = pDAcroForm.getField("tbxTotal");
field.setValue("Total: €192.00");
pDAcroForm.flatten();
pDDocument.save("/media/stoyank/Elements/Java/tmp/output.pdf");
pDDocument.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
This is what the output document looks like:
I tried to add the font manually by referring to this Stack Overflow question, but still no success:
PDDocument pDDocument = PDDocument.load(new File("/media/stoyank/Elements/Java/tmp/Receipt.pdf"));
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();
InputStream font_file = ClassLoader.getSystemResourceAsStream("Oswald-Regular.ttf");
PDType0Font font = PDType0Font.load(pDDocument, font_file, false);
if (font_file != null) font_file.close();
PDResources resources = pDAcroForm.getDefaultResources();
if (resources == null) resources = new PDResources();
resources.put(COSName.getPDFName("Oswald-Regular"), font);
pDAcroForm.setDefaultResources(resources);
pDAcroForm.refreshAppearances();
PDField field = pDAcroForm.getField("tbxOrderId");
field.setValue("192753");
field = pDAcroForm.getField("tbxTotal");
field.setValue("Total: €192.00");
pDAcroForm.flatten();
pDDocument.save("/media/stoyank/Elements/Java/tmp/output.pdf");
pDDocument.close();
After I write into these textbox fields, I want to flatten the document.
Here is my folder structure:
System: Ubuntu 20.04
Also, here is a link to the ODT file that I then export to a PDF and the exported PDF.
Your file doesn't have correct appearance streams for the fields; this is a bug in the software that created the PDF. Call pDAcroForm.refreshAppearances(); as early as possible.
The code in pastebin is fine (it is based on CreateSimpleFormWithEmbeddedFont.java example), except that you should keep the default resources and not start with empty resources. So your code should look like this:
pDAcroForm.refreshAppearances();
PDType0Font formFont = PDType0Font.load(pDDocument, ...input stream..., false);
PDResources resources = pDAcroForm.getDefaultResources();
if (resources == null)
{
resources = new PDResources();
pDAcroForm.setDefaultResources(resources);
}
final String fontName = resources.add(formFont).getName();
// Acrobat sets the font size on the form level to be
// auto sized as default. This is done by setting the font size to '0'
String defaultAppearanceString = "/" + fontName + " 0 Tf 0 g";
PDTextField field = (PDTextField) (pDAcroForm.getField("tbxTotal"));
field.setDefaultAppearance(defaultAppearanceString);
field.setValue("Total: €192.00");

PdfBox - change font or fontName in pdf file?

Please advise.
I have PDF files with the font HPDFAA+Arial-BoldMTBold. This font name is incorrect, and it is a subset...
I can change fonts with the library Aspose.PDF.dll, https://docs.aspose.com/pdf/net/replace-text-in-pdf/, section "Replace fonts in existing PDF file", but that library is a trial version.
How can I do this with PDFBox? I want to replace this font with Arial-BoldMT or rename the font.
UPD: my attempts have led nowhere... In PDFontDescriptor I can rename the font, but how can I apply that to the PDFont? Or am I going the wrong way?
PDDocument pdfDocument = PDDocument.load(new File("Sample.pdf"));
PDPageTree pages = pdfDocument.getDocumentCatalog().getPages();
for (PDPage page : pages) {
PDResources res = page.getResources();
for (COSName fontName : res.getFontNames()) {
PDFont font = res.getFont(fontName);
PDFontDescriptor fontDescriptor = font.getFontDescriptor();
System.out.println("fontDes: " + fontDescriptor.getFontName());
String oldFontName = fontDescriptor.getFontName();
String newFontName = oldFontName.replace("Arial-BoldMTBold", "Arial-BoldMT");
fontDescriptor.setFontName(newFontName);
System.out.println("font: " + font.getName());
}
}
Here's code that is tailored to your file. It will only help you if this is about many similar files.
try (PDDocument doc = PDDocument.load(new File(XXX,"outerBox.pdf")))
{
PDPage page = doc.getPage(0);
for (COSName name : page.getResources().getFontNames())
{
PDFont font = page.getResources().getFont(name);
String fontName = font.getName();
if (font instanceof PDType0Font && fontName.endsWith("BoldMTBold"))
{
PDType0Font type0font = (PDType0Font) font;
String newFontName = fontName.substring(0, fontName.length() - 4);
type0font.getCOSObject().setString(COSName.BASE_FONT, newFontName);
PDCIDFont descendantFont = type0font.getDescendantFont();
descendantFont.getCOSObject().setString(COSName.BASE_FONT, newFontName);
PDFontDescriptor fontDescriptor = descendantFont.getFontDescriptor();
fontDescriptor.setFontName(newFontName);
}
}
doc.save(new File(XXX,"outerBox-saved.pdf"));
}
PDF structure, seen with PDFDebugger:
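The name handling in the answer above can be tried out without PDFBox. The following is only a sketch of the string manipulation: the class and method names (FontNames, dropDuplicateBold, stripSubsetTag) are made up for illustration and are not part of PDFBox. It assumes the usual subset tag convention of six uppercase letters followed by "+". Note that the answer keeps the subset tag; whether to drop it is a separate decision, since the embedded font is still a subset.

```java
import java.util.regex.Pattern;

final class FontNames {
    // A subset tag is six uppercase letters followed by '+', e.g. "HPDFAA+".
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    /** Returns the name without a leading subset tag, if one is present. */
    static String stripSubsetTag(String baseFont) {
        return SUBSET_TAG.matcher(baseFont).replaceFirst("");
    }

    /** Drops the duplicated trailing "Bold", like substring(0, length - 4) in the answer. */
    static String dropDuplicateBold(String baseFont) {
        return baseFont.endsWith("BoldMTBold")
                ? baseFont.substring(0, baseFont.length() - 4)
                : baseFont;
    }

    public static void main(String[] args) {
        String original = "HPDFAA+Arial-BoldMTBold";
        String renamed = dropDuplicateBold(original);
        System.out.println(renamed);                 // HPDFAA+Arial-BoldMT
        System.out.println(stripSubsetTag(renamed)); // Arial-BoldMT
    }
}
```

The resulting string would then be written to the BaseFont entries and the font descriptor, as the answer's code does.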

PDFBox does not correctly render Simsun (chinese) font

Context
I am writing Java code that fills PDF forms using PDFBox with some user inputs.
Some of the inputs are in Chinese.
When I generate the PDF, I don't get any errors in the logs, but the rendered text is absolutely not the same.
What I currently have
Here is what I do:
In the PDF file, I specified the SimSun font for the field using Adobe Pro.
This font handles Simplified Chinese characters.
I have the font SimSun installed on my server.
PDFBox doesn't display any error (if I remove the SimSun font from my server, PDFBox falls back on another font that is not able to render the characters). So I guess it is able to find the font and use it.
What I tried
I was able to make this work but I had to manually load the font in the code and add it to the PDF (see examples below).
But that is not a solution, as it means that I would have to load the font every time and add it to the PDF. I would also have to do the same for many other languages.
As far as I understood, PDFBox should be able to use any fonts installed on the server.
Below is a test class that tries 3 different approaches. Only the last one works so far:
Classic generation
Simply put Chinese characters inside the text field without changing anything.
The characters are not rendered correctly (some of them are missing and the ones displayed do not match the input).
Generation with embedded font
Try to embed the SimSun font inside the PDF with the PDResources.add(font) method.
The result is the same as the first method.
Embed the font and use it
I embed the SimSun font and I also override the font used in the TextField to use the SimSun font I just added.
This approach works.
After quite a bit of reading, I found out that the issue might come from the version of the font I am using.
Windows 8 (which I use to create the form) uses v5.04 of Simsun font.
I use v2.10 on my laptop and my servers, both being Linux based (I can not find the v5.04).
However, I don't know:
If the issue is really coming from this.
If I have the right to use this font, as it is developed by Microsoft (and Apple).
Where to find the latest version of it.
I tried using another font but:
I can only find OTF fonts (and not TTF) that support Chinese characters.
PDFBox does not support OTF (yet). It is planned for v3.0.0.
So if someone has an idea on how to make this work without having to embed and change the font's name in the code, that would be great!
Here is the PDF I used, along with the code that tests the 3 methods I talked about.
The TextField in the pdf is named comment.
package org.test;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Hello world!
*/
public class App {
private static final String SIMPLIFIED_CHINESE_STRING = "我不明白为什么它不起作用。";
public static void main(String[] args) throws IOException {
System.out.println("Hello World!");
// Test 1
classicGeneration();
// Test 2
generationWithEmbededFont();
// Test 3
generationWithFontOverride();
System.out.println("Bye!");
}
/**
* Classic PDF generation without any changes to the PDF.
*/
private static void classicGeneration() throws IOException {
PDDocument document = loadPdf();
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
PDField commentField = acroForm.getField("comment");
commentField.setValue(SIMPLIFIED_CHINESE_STRING);
document.save(new File("result-classic-generation.pdf"));
}
/**
* Trying to embed the font in the PDF. It doesn't seem to work.
* The result is the same as classicGeneration method.
*/
private static void generationWithEmbededFont() throws IOException {
PDDocument document = loadPdf();
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
PDFont font = PDType0Font.load(document, new File("/usr/share/fonts/SimSun.ttf"));
PDResources res = acroForm.getDefaultResources();
if (res == null) {
res = new PDResources();
}
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
PDField commentField = acroForm.getField("comment");
commentField.setValue(SIMPLIFIED_CHINESE_STRING);
document.save(new File("result-with-embeded-font.pdf"));
}
/**
* Embed the font in the PDF and change the font used in the TextField to use this one.
* Here the PDF is correctly rendered and all the characters are displayed.
* @throws IOException
*/
private static void generationWithFontOverride() throws IOException {
PDDocument document = loadPdf();
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
PDField commentField = acroForm.getField("comment");
// Load the font
InputStream resourceAsStream = Thread.currentThread().getContextClassLoader().getResourceAsStream("SimSun.ttf");
PDFont font = PDType0Font.load(document, resourceAsStream);
PDResources res = acroForm.getDefaultResources();
if (res == null) {
res = new PDResources();
}
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
// Change the font used by the TextField
COSDictionary dict = commentField.getCOSObject();
COSString defaultAppearance = (COSString) dict.getDictionaryObject(COSName.DA);
if (defaultAppearance != null) {
String currentFont = dict.getString(COSName.DA);
// Retrieve the current font size and color used for the field in order to use the same but with the new font.
String regex = "[\\w]* ([\\w\\s]*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(currentFont);
// Default font size if we fail to extract the current one
String fontSize = " 11 Tf";
if (matcher.find()) {
fontSize = " " + matcher.group(1);
}
// Change the font of the TextField.
dict.setString(COSName.DA, "/" + fontName.getName() + fontSize);
}
commentField.getCOSObject().addAll(dict);
commentField.setValue(SIMPLIFIED_CHINESE_STRING);
document.save(new File("result-with-font-override.pdf"));
}
// HELPER
private static PDDocument loadPdf() throws IOException {
InputStream stream = Thread.currentThread().getContextClassLoader().getResourceAsStream("sample.pdf");
return PDDocument.load(stream);
}
}
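The regex in generationWithFontOverride only keeps word characters and whitespace after the font name, so a default appearance with a fractional colour value such as "0.5 g" would be cut short. Below is a possible stand-alone variant of just the string rewrite, as a sketch only: the class DefaultAppearance and the method withFont are hypothetical names, and it assumes the DA string has the usual "/Name size Tf" form. The 11 pt fallback is taken from the code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DefaultAppearance {
    // Matches "/Name size Tf": the font resource name and the font size operand.
    private static final Pattern FONT_OP =
            Pattern.compile("/(\\S+)\\s+([0-9.]+)\\s+Tf");

    /** Swaps the font resource name and leaves size, colour and anything else alone. */
    static String withFont(String da, String newFontName) {
        Matcher m = FONT_OP.matcher(da);
        if (!m.find()) {
            // No font operator found: fall back to 11 pt, as the code above does.
            return "/" + newFontName + " 11 Tf";
        }
        return da.substring(0, m.start())
                + "/" + newFontName + " " + m.group(2) + " Tf"
                + da.substring(m.end());
    }

    public static void main(String[] args) {
        System.out.println(withFont("/Helv 11 Tf 0.5 g", "F3")); // /F3 11 Tf 0.5 g
    }
}
```

The returned string could then be passed to dict.setString(COSName.DA, ...) in place of the concatenation used in the test class.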

Unable to save Arabic words in a PDF - PDFBox Java

I am trying to save Arabic words in an editable PDF. It works fine with English words, but when I use Arabic words, I get this exception:
java.lang.IllegalArgumentException:
U+0627 is not available in this font Helvetica encoding: WinAnsiEncoding
Here is how I generated PDF:
public static void main(String[] args) throws IOException
{
String formTemplate = "myFormPdf.pdf";
try (PDDocument pdfDocument = PDDocument.load(new File(formTemplate)))
{
PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
if (acroForm != null)
{
PDTextField field = (PDTextField) acroForm.getField( "sampleField" );
field.setValue("جملة");
}
pdfDocument.save("updatedPdf.pdf");
}
}
This is how I made it work; I hope it helps others. Just use a font that supports the language that you want to use in the PDF.
public static void main(String[] args) throws IOException
{
String formTemplate = "myFormPdf.pdf";
try (PDDocument pdfDocument = PDDocument.load(new File(formTemplate)))
{
PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
if (acroForm != null)
{
// you can read ttf from resources as well, this is just for testing
PDFont font = PDType0Font.load(pdfDocument,new File("/path/to/font.ttf"));
// register the font in the form's default resources (assumed to be present)
String fontName = acroForm.getDefaultResources().add(font).getName();
PDTextField field = (PDTextField) acroForm.getField( "sampleField" );
field.setDefaultAppearance("/"+fontName +" 0 Tf 0 g");
field.setValue("جملة");
}
pdfDocument.save("updatedPdf.pdf");
}
}
Edited: Adding the comment of mkl
The font name and the font size are parameters of the Tf instruction, and the gray value 0 for black is the parameter for the g instruction. Parameters and instruction names must be appropriately separated.
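To make the separation of operands and operators concrete, here is a small sketch that assembles such a default appearance string. DaBuilder and build are illustrative names, not PDFBox API; the only thing taken from the answer is the "/name size Tf gray g" layout.

```java
final class DaBuilder {
    /** Builds "/<font> <size> Tf <gray> g" with single spaces between all tokens. */
    static String build(String fontResourceName, float fontSize, float gray) {
        return "/" + fontResourceName + " " + num(fontSize) + " Tf " + num(gray) + " g";
    }

    // Whole numbers are written without a decimal point, e.g. 0 instead of 0.0.
    private static String num(float v) {
        return v == Math.rint(v) ? String.valueOf((long) v) : String.valueOf(v);
    }

    public static void main(String[] args) {
        // Size 0 means "auto size"; gray 0 is black.
        System.out.println(build("F1", 0f, 0f)); // /F1 0 Tf 0 g
    }
}
```

Building the string in one place makes it harder to forget the space between the font name and the size, which is the mistake the comment warns about.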
You need a font which supports those Arabic symbols.
Once you've got a compatible font, you can load it using PDType0Font
final PDFont font = PDType0Font.load(...);
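A Type 0 font is a composite font: it references a descendant CIDFont and uses multi-byte character codes, so it is not limited to the 256 codes of a simple font and can address far more of the glyphs contained in the font file.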
See also the Cookbook - working with fonts (no examples with Type 0, but still useful).
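If you want to know in advance whether a value would trigger the "is not available in this font ... WinAnsiEncoding" exception, a rough pre-check is possible with the JDK alone. This is only an approximation under a stated assumption: it treats Java's windows-1252 charset as a stand-in for PDF's WinAnsiEncoding, which is close but not identical. WinAnsiCheck and fitsWinAnsi are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class WinAnsiCheck {
    /** True if every character of the text can be encoded in windows-1252. */
    static boolean fitsWinAnsi(String text) {
        // Assumption: windows-1252 approximates WinAnsiEncoding closely enough for a pre-check.
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsWinAnsi("Total: €192.00")); // true
        System.out.println(fitsWinAnsi("جملة"));           // false
    }
}
```

When the check fails, the field needs a font loaded with PDType0Font.load(...) as described above.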

PDF trailer error on pdfs created by PDFBox

I am trying to create a PDF file with PDFBox and then create an image from it using the commercial library jPDFImages (Qoppa Software). Yes, I know that PDFBox can also create images from PDFs, but for some reasons I need to use the commercial library.
I created the PDF file and passed it to jPDFImages, but I get an error: "Unable to find PDF trailer". Qoppa Software describes this error.
The problem seems to be in the PDF trailer created by PDFBox, but I don't understand how to set it up correctly. (I only have this problem with PDFs created with PDFBox.)
Here is my code for pdf creation:
public void createPDFFromImage( String file) throws Exception {
PDDocument doc = null;
try {
doc = new PDDocument();
BufferedImage bufferedImage = ImageIO.read(new File("/home/.../files/test.png"));
PDPage page = new PDPage();
doc.addPage( page );
PDJpeg ximage = new PDJpeg(doc,bufferedImage, (float) 0.95);
PDPageContentStream contentStream = new PDPageContentStream(doc, page);
contentStream.drawXObject(ximage, x, y, W, H);
contentStream.close();
doc.save(file);
} finally {
if( doc != null ) {
doc.close();
}
}
}
Here is an error from commercial library:
java.lang.RuntimeException: com.qoppa.pdf.PDFException: Unable to find PDF trailer.
Caused by: com.qoppa.pdf.PDFException: Unable to find PDF trailer.
I think the problem is in how I create the PDF. Maybe I need to add some information to the PDF to make it valid?
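One way to narrow this down is to check whether the saved file actually ends with the markers a PDF reader looks for. The sketch below is a plain JDK sanity check and makes no claim about what jPDFImages tests internally: it only looks for "startxref" followed by "%%EOF" in the last 1024 bytes. TrailerCheck and hasTrailerMarkers are made-up names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class TrailerCheck {
    /** True if "startxref" followed by "%%EOF" occurs in the last 1024 bytes. */
    static boolean hasTrailerMarkers(byte[] pdf) {
        int from = Math.max(0, pdf.length - 1024);
        // ISO-8859-1 maps each byte to one char, so binary data cannot break decoding.
        String tail = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
        int startxref = tail.lastIndexOf("startxref");
        int eof = tail.lastIndexOf("%%EOF");
        return startxref >= 0 && eof > startxref;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
            System.out.println(hasTrailerMarkers(bytes));
        }
    }
}
```

If this returns false for the file that is handed to the other library, the file may be incomplete at the time it is read, for example because it is read before doc.save(file) has finished.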
