PDFBox - change font or font name in a PDF file? - java

Please advise.
I have PDF files with the font HPDFAA+Arial-BoldMTBold. This font name is incorrect, and the font is a subset...
I changed the fonts with the Aspose.PDF.dll library (https://docs.aspose.com/pdf/net/replace-text-in-pdf/, section "Replace fonts in existing PDF file"), but I only have the trial version of that library.
How can I do this with PDFBox? I want to replace this font with Arial-BoldMT, or rename the font.
UPD: my attempts have led nowhere... In PDFontDescriptor I can rename the font, but how do I apply that to the PDFont? Or am I going the wrong way?
PDDocument pdfDocument = PDDocument.load(new File("Sample.pdf"));
PDPageTree pages = pdfDocument.getDocumentCatalog().getPages();
for (PDPage page : pages) {
    PDResources res = page.getResources();
    for (COSName fontName : res.getFontNames()) {
        PDFont font = res.getFont(fontName);
        PDFontDescriptor fontDescriptor = font.getFontDescriptor();
        System.out.println("fontDes: " + fontDescriptor.getFontName());
        String oldFontName = fontDescriptor.getFontName();
        String newFontName = oldFontName.replace("Arial-BoldMTBold", "Arial-BoldMT");
        // this changes only the FontName entry of the font descriptor
        fontDescriptor.setFontName(newFontName);
        System.out.println("font: " + font.getName());
    }
}

Here's code that is tailored to your file. It will only help you if this is about many similar files.
try (PDDocument doc = PDDocument.load(new File(XXX,"outerBox.pdf")))
{
    PDPage page = doc.getPage(0);
    for (COSName name : page.getResources().getFontNames())
    {
        PDFont font = page.getResources().getFont(name);
        String fontName = font.getName();
        if (font instanceof PDType0Font && fontName.endsWith("BoldMTBold"))
        {
            PDType0Font type0font = (PDType0Font) font;
            // strip the trailing duplicated "Bold" (4 characters)
            String newFontName = fontName.substring(0, fontName.length() - 4);
            // BaseFont is a name object in the PDF specification, so write it as a name
            type0font.getCOSObject().setName(COSName.BASE_FONT, newFontName);
            PDCIDFont descendantFont = type0font.getDescendantFont();
            descendantFont.getCOSObject().setName(COSName.BASE_FONT, newFontName);
            PDFontDescriptor fontDescriptor = descendantFont.getFontDescriptor();
            fontDescriptor.setFontName(newFontName);
        }
    }
    doc.save(new File(XXX,"outerBox-saved.pdf"));
}
PDF structure, seen with PDFDebugger:
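A side note on the name itself: in a PDF, the name of a subset font starts with a tag of six uppercase letters followed by a plus sign (here HPDFAA+). Below is a minimal sketch, using only the JDK, of how the target name could be computed before it is written back with code like the above. The class and method names (SubsetFontNames, isSubset, stripSubsetTag, dropDuplicateBold) are hypothetical helpers and not part of PDFBox, and the "BoldMTBold" rule is an assumption taken from the file in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox
final class SubsetFontNames {
    // Subset tag: exactly six uppercase letters followed by '+'
    private static final Pattern SUBSET = Pattern.compile("^([A-Z]{6})\\+(.+)$");

    /** Returns true if the name starts with a subset tag such as "HPDFAA+". */
    static boolean isSubset(String fontName) {
        return SUBSET.matcher(fontName).matches();
    }

    /** Returns the font name without a leading subset tag, if one is present. */
    static String stripSubsetTag(String fontName) {
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? m.group(2) : fontName;
    }

    /** Drops the trailing duplicated "Bold" (pattern assumed from the question's file). */
    static String dropDuplicateBold(String fontName) {
        return fontName.endsWith("BoldMTBold")
                ? fontName.substring(0, fontName.length() - "Bold".length())
                : fontName;
    }

    public static void main(String[] args) {
        String name = "HPDFAA+Arial-BoldMTBold";
        System.out.println(isSubset(name));
        System.out.println(dropDuplicateBold(name));
        System.out.println(stripSubsetTag(dropDuplicateBold(name)));
    }
}
```

As long as the embedded font program is still a subset, keeping the tag (as the answer's code does) seems the safer choice; stripSubsetTag is mainly useful for comparing or displaying names.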

Related

How to use TTF font with PDFBox AcroForm and then flatten document?

I have been trying to make a fillable PDF file with LibreOffice Writer 7.2.2.2. Here is what the document looks like:
All fields to the right of the vertical lines are form text boxes, each with its own name (tbxOrderId, tbxFullName...). Each text box uses SF Pro Text Light as its font. Only the one at the bottom right (tbxTotal), "Total €123.00", uses Oswald Regular. The document looks fine when I fill in these fields with LibreOffice Writer.
Below are my export settings. I chose Archive PDF/A-2b in order to embed the fonts in the document.
Here is the output when I run pdffonts on the exported PDF file.
However, when I run the following code, which just changes the values of tbxOrderId and tbxTotal, the output PDF document is missing these fonts.
public class Start {
    public static void main(String[] args) {
        try {
            PDDocument pDDocument = PDDocument.load(new File("/media/stoyank/Elements/Java/tmp/Receipt.pdf"));
            PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();
            PDField field = pDAcroForm.getField("tbxOrderId");
            field.setValue("192753");
            field = pDAcroForm.getField("tbxTotal");
            field.setValue("Total: €192.00");
            pDAcroForm.flatten();
            pDDocument.save("/media/stoyank/Elements/Java/tmp/output.pdf");
            pDDocument.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This is what the output document looks like:
I tried to add the font manually, following this Stack Overflow question, but still without success:
PDDocument pDDocument = PDDocument.load(new File("/media/stoyank/Elements/Java/tmp/Receipt.pdf"));
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();
InputStream font_file = ClassLoader.getSystemResourceAsStream("Oswald-Regular.ttf");
PDType0Font font = PDType0Font.load(pDDocument, font_file, false);
if (font_file != null) font_file.close();
PDResources resources = pDAcroForm.getDefaultResources();
if (resources == null) resources = new PDResources();
resources.put(COSName.getPDFName("Oswald-Regular"), font);
pDAcroForm.setDefaultResources(resources);
pDAcroForm.refreshAppearances();
PDField field = pDAcroForm.getField("tbxOrderId");
field.setValue("192753");
field = pDAcroForm.getField("tbxTotal");
field.setValue("Total: €192.00");
pDAcroForm.flatten();
pDDocument.save("/media/stoyank/Elements/Java/tmp/output.pdf");
pDDocument.close();
After I write into these textbox fields, I want to flatten the document.
Here is my folder structure:
System: Ubuntu 20.04
Also, here is a link to the ODT file that I then export to a PDF and the exported PDF.
Your file doesn't have correct appearance streams for the fields; this is a bug in the software that created the PDF. Call pDAcroForm.refreshAppearances(); as early as possible.
The code in the pastebin is fine (it is based on the CreateSimpleFormWithEmbeddedFont.java example), except that you should keep the existing default resources rather than start with empty resources. So your code should look like this:
pDAcroForm.refreshAppearances();
PDType0Font formFont = PDType0Font.load(pDDocument, ...input stream..., false);
PDResources resources = pDAcroForm.getDefaultResources();
if (resources == null)
{
    resources = new PDResources();
    pDAcroForm.setDefaultResources(resources);
}
final String fontName = resources.add(formFont).getName();
// Acrobat sets the font size on the form level to be
// auto sized as default. This is done by setting the font size to '0'
String defaultAppearanceString = "/" + fontName + " 0 Tf 0 g";
PDTextField field = (PDTextField) (pDAcroForm.getField("tbxTotal"));
field.setDefaultAppearance(defaultAppearanceString);
field.setValue("Total: €192.00");

Printing Chinese characters in pdfbox

I'm using the following set-up:
Java 11.0.1
pdfbox 2.0.15
Objective: Rendering a pdf that contains Chinese characters
Problem: java.lang.IllegalArgumentException: U+674E is not available in this font's encoding: WinAnsiEncoding
I already tried:
Using different fonts for Chinese character support. The latest one is NotoSansCJKtc-Regular.ttf
Set font to unicode as described here: Java: Write national characters to PDF using PDFBox, however the used loadTTF method is deprecated.
Using Arial-Unicode-MS_4302.ttf
My code looks like this (shortened a bit):
try (InputStream pdfIn = inputStream;
     PDDocument pdfDocument = PDDocument.load(pdfIn)) {
    PDFont formFont;
    // Check if Chinese characters are present
    if (!Util.containsHanScript(queryString)) {
        formFont = PDType0Font.load(pdfDocument,
                PdfReportGenerator.class.getResourceAsStream("LiberationSans-Regular.ttf"),
                false);
    } else {
        formFont = PDType0Font.load(pdfDocument,
                PdfReportGenerator.class.getResourceAsStream("NotoSansCJKtc-Regular.ttf"),
                false);
    }
    List<PDField> fields = acroForm.getFields();
    // Load fields into Map
    Map<String, PDField> pdfFields = new HashMap<>();
    for (PDField field : fields) {
        String key = field.getPartialName();
        pdfFields.put(key, field);
    }
    PDField currentField = pdfFields.get("someFieldID");
    PDVariableText pdfield = (PDVariableText) currentField;
    PDResources res = acroForm.getDefaultResources();
    String fontName = res.add(formFont).getName();
    String defaultAppearanceString = "/" + fontName + " 10 Tf 0 g";
    pdfield.setDefaultAppearance(defaultAppearanceString);
    pdfield.setValue("李柱");
    acroForm.flatten(fields, true);
    ByteArrayOutputStream pdfOut = new ByteArrayOutputStream();
    pdfDocument.save(pdfOut);
}
Expected result: Chinese characters on pdf.
Actual result: java.lang.IllegalArgumentException: U+674E is not available in this font's encoding: WinAnsiEncoding
So my question is about how to best support rendering of Chinese characters with pdfbox. Any help is appreciated.
The following code works for me; it uses the file from PDFBOX-4629:
PDDocument doc = PDDocument.load(new URL("https://issues.apache.org/jira/secure/attachment/12977270/Report_Template_DE.pdf").openStream());
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
PDVariableText field = (PDVariableText) acroForm.getField("search_query");
List<PDField> fields = acroForm.getFields();
PDFont font = PDType0Font.load(doc, new FileInputStream("c:/windows/fonts/arialuni.ttf"), false);
PDResources res = acroForm.getDefaultResources();
String fontName = res.add(font).getName();
String defaultAppearanceString = "/" + fontName + " 10 Tf 0 g";
field.setDefaultAppearance(defaultAppearanceString);
field.setValue("李柱");
acroForm.flatten(fields, true);
doc.save("saved.pdf");
doc.close();
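The question's code calls Util.containsHanScript, which is not shown. One way such a check could look, as a sketch using only the JDK (the class name HanScriptCheck is hypothetical, and the asker's actual helper may behave differently):

```java
// Hypothetical stand-in for the Util.containsHanScript helper from the question
final class HanScriptCheck {
    /** Returns true if any code point of the text belongs to the Han script. */
    static boolean containsHanScript(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsHanScript("李柱"));
        System.out.println(containsHanScript("search query"));
    }
}
```

Iterating over code points rather than chars keeps the check correct for Han characters outside the Basic Multilingual Plane.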

Unable to save Arabic words in a PDF - PDFBox Java

I am trying to save Arabic words in an editable PDF. It works fine with English words, but when I use Arabic words I get this exception:
java.lang.IllegalArgumentException:
U+0627 is not available in this font Helvetica encoding: WinAnsiEncoding
Here is how I generated PDF:
public static void main(String[] args) throws IOException
{
    String formTemplate = "myFormPdf.pdf";
    try (PDDocument pdfDocument = PDDocument.load(new File(formTemplate)))
    {
        PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
        if (acroForm != null)
        {
            PDTextField field = (PDTextField) acroForm.getField( "sampleField" );
            field.setValue("جملة");
        }
        pdfDocument.save("updatedPdf.pdf");
    }
}
This is how I made it work; I hope it helps others. Just use a font that supports the language you want to use in the PDF.
public static void main(String[] args) throws IOException
{
    String formTemplate = "myFormPdf.pdf";
    try (PDDocument pdfDocument = PDDocument.load(new File(formTemplate)))
    {
        PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
        if (acroForm != null)
        {
            // you can read ttf from resources as well, this is just for testing
            PDFont font = PDType0Font.load(pdfDocument, new File("/path/to/font.ttf"));
            // register the font in the form's default resources and get its resource name
            String fontName = acroForm.getDefaultResources().add(font).getName();
            PDTextField field = (PDTextField) acroForm.getField("sampleField");
            field.setDefaultAppearance("/" + fontName + " 0 Tf 0 g");
            field.setValue("جملة");
        }
        pdfDocument.save("updatedPdf.pdf");
    }
}
Edited: Adding the comment of mkl
The font name and the font size are parameters of the Tf instruction, and the gray value 0 for black is the parameter for the g instruction. Parameters and instruction names must be appropriately separated.
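To illustrate the point about separators, here is a small sketch of a helper that assembles such a default appearance string. The class and method names (DefaultAppearance, build) are hypothetical, and the helper assumes an integer font size and black text; it is not a PDFBox API.

```java
// Hypothetical helper, not part of PDFBox
final class DefaultAppearance {
    /**
     * Builds "/<font> <size> Tf 0 g": font resource name and size are the
     * operands of Tf, and the gray value 0 (black) is the operand of g.
     * Every operand and operator is separated by a single space.
     */
    static String build(String fontResourceName, int fontSize) {
        if (fontResourceName.isEmpty() || fontResourceName.contains(" ")) {
            throw new IllegalArgumentException("invalid font resource name");
        }
        return "/" + fontResourceName + " " + fontSize + " Tf 0 g";
    }

    public static void main(String[] args) {
        // Size 0 requests auto-sizing, as in the answer above
        System.out.println(build("F1", 0));
    }
}
```

Leaving out a space, for example concatenating the size and Tf directly, would merge tokens so that they no longer parse as separate operands and operators.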
You need a font which supports those Arabic symbols.
Once you've got a compatible font, you can load it using PDType0Font
final PDFont font = PDType0Font.load(...);
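A Type 0 font is a composite font: it references a descendant CIDFont and uses multi-byte character codes, so it is not limited to a 256-character encoding such as WinAnsiEncoding and can address all the glyphs the font file contains.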
See also the Cookbook - working with fonts (no examples with Type 0, but still useful).
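The exception arises because the field's current font uses WinAnsiEncoding, which has no code for Arabic letters. As a rough pre-check before calling setValue, one could test whether the text fits the JDK's windows-1252 charset. This is only an approximation: WinAnsiEncoding is based on code page 1252 but is not identical to it, and the class name WinAnsiCheck is hypothetical. The sketch also assumes the running JDK provides the windows-1252 charset.

```java
import java.nio.charset.Charset;

// Hypothetical helper; approximates WinAnsiEncoding with windows-1252
final class WinAnsiCheck {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Returns true if every character of the text can be encoded in windows-1252. */
    static boolean fitsWinAnsi(String text) {
        // A new encoder per call, because CharsetEncoder is not thread-safe
        return CP1252.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsWinAnsi("Total: €192.00"));
        System.out.println(fitsWinAnsi("جملة"));
    }
}
```

When the check fails, the text needs a font loaded through PDType0Font.load, as described above.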

How to change font of an embedded resource using PDFBox

I'm trying to get rid of a custom font that has been used for years. Due to regulations I need to replace this font with a common one.
Anyways, I've tried to write a JUnit Test to change the font of a pdf using PDFBox.
This is what I have done:
@Test
public void changeFontOfAllPdfsToArial() throws Exception {
    PDDocument document = PDDocument.load(new File("src/test/broken_pdf.pdf"));
    for (PDPage page : document.getPages()) {
        PDResources resources = page.getResources();
        for (COSName key : resources.getFontNames()) {
            PDFont font = resources.getFont(key);
            System.out.println(font.getFontDescriptor().getFontName());
            if (resources.getFont(key).toString().contains("CUSTOM")) {
                // how can the font be replaced here?
            }
        }
    }
    document.save(new File(PDFs.get(0).getAbsolutePath() + "_test"));
}
Iterating through the list gives me all the fonts of the document.
I'm getting the COSName key of the resource, but how do I change the font of it? Thanks for your help!
Edit: Just to mention: the font is embedded.

How to extract font styles of text contents using pdfbox?

I am using the PDFBox library to extract the text content from a PDF file. I am able to extract all the text, but I couldn't find a method to extract the font styles.
This is not the right way to extract fonts. To read the fonts, iterate through the PDF pages and get the fonts from each page's resources, as below:
// PDFBox 1.x API
PDDocument doc = PDDocument.load("C:/mydoc3.pdf");
List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
for (PDPage page : pages) {
    Map<String, PDFont> pageFonts = page.getResources().getFonts();
}
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class pdf2box {
    public static void main(String args[])
    {
        try
        {
            PDDocument pddDocument = PDDocument.load("table2.pdf");
            PDFTextStripper textStripper = new PDFTextStripper();
            System.out.println(textStripper.getText(pddDocument));
            textStripper.getFonts();
            pddDocument.close();
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
    }
}
// PDFBox 2.x API
File file = new File("sample.pdf");
PDDocument document = PDDocument.load(file);
for (int i = 0; i < document.getNumberOfPages(); ++i)
{
    PDPage page = document.getPage(i);
    PDResources res = page.getResources();
    for (COSName fontName : res.getFontNames())
    {
        PDFont font = res.getFont(fontName);
        System.out.println(font.getName());
    }
}
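Once the font names are available, a style can often be guessed from the name alone. The following is a heuristic sketch using only the JDK; it assumes that style words such as "Bold", "Italic" or "Oblique" appear in the font name, which is common but not guaranteed, and the class name FontStyleGuess is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: guesses a style from a font name such as "Arial-BoldMT"
final class FontStyleGuess {
    static String guessStyle(String fontName) {
        String n = fontName.toLowerCase(Locale.ROOT);
        boolean bold = n.contains("bold");
        boolean italic = n.contains("italic") || n.contains("oblique");
        if (bold && italic) {
            return "bold-italic";
        }
        if (bold) {
            return "bold";
        }
        return italic ? "italic" : "regular";
    }

    public static void main(String[] args) {
        System.out.println(guessStyle("HPDFAA+Arial-BoldMT"));
        System.out.println(guessStyle("Helvetica-Oblique"));
    }
}
```

Where the font descriptor carries flags or a weight, those would likely be more reliable than the name.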
