Setting substitution fonts in AcroForm itext7 - java

I have a PDF with an AcroForm and need to fill it with a string that contains glyphs from several languages (English, Chinese, Korean, Khmer).
In iText5 I've used:
AcroFields form = stamper.getAcroFields();
form.addSubstitutionFont(arialFont);
form.addSubstitutionFont(khmerFont);
It worked fine for Chinese and Korean, but Khmer ligatures were not rendered. I found out that I need the pdfCalligraph add-on to make ligatures work, but it is available for iText7 only. I've managed to add paragraphs with proper Khmer ligature rendering (adding typography as a dependency and loading a license key), but in an AcroForm this does not happen automatically. I'm struggling to find an iText7 equivalent of addSubstitutionFont and to make it work with pdfCalligraph.
Code I've used with iText7:
PdfReader reader = new PdfReader(templatePath);
PdfDocument pdf = new PdfDocument(reader, new PdfWriter(outputPath));
Document document = new Document(pdf);
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
PdfFont khmerFont = PdfFontFactory.createFont(pathToKhmerFont, PdfEncodings.IDENTITY_H, true);
PdfFont font = PdfFontFactory.createFont(pathToArialUnicodeFont, PdfEncodings.IDENTITY_H, true);
pdf.addFont(khmerFont);
pdf.addFont(font);
FontSet set = new FontSet();
set.addFont(pathToKhmerFont);
set.addFont(pathToArialUnicodeFont);
document.setFontProvider(new FontProvider(set));
document.setProperty(Property.FONT, "Arial");
form.setNeedAppearances(true);
String content = "khmer ថ្ងៃឈប់សម្រាក and chinese 假日 and korean 휴일";
PdfFormField tf = form.getField("Text3");
tf.setValue(content);
// tf.setFont(khmerFont);
tf.regenerateField();
// add a paragraph just to check pdfCalligraph works
document.add(new Paragraph(content));
pdf.close();
String used to test proper rendering: "khmer ថ្ងៃឈប់សម្រាក and chinese 假日 and korean 휴일"
iText5 in a form field without pdfCalligraph, but with substitution fonts:
iText7 in a form field with pdfCalligraph loaded (Arial font set via field.setFont(arialFont)):
iText7 in a form field with pdfCalligraph loaded (Khmer font set via field.setFont(khmerFont)):
iText7, same document, but in a paragraph instead of a form field, with pdfCalligraph loaded (this is the expected result, so pdfCalligraph is used for paragraphs but not for form fields):
So, as you can see, there are basically two issues:
How do I addSubstitutionFont in iText7?
How do I use pdfCalligraph in PdfFormField appearance?
I've also checked whether pdfCalligraph works in a text form field, and it looks like it does not. Here is the code I used to check it:
LicenseKey.loadLicenseFile(path_to_license);
String outputPath = path_to_output_doc;
PdfDocument pdf = new PdfDocument(new PdfWriter(outputPath));
Document document = new Document(pdf);
// prepare fonts for pdfCalligraph to use
FontSet set = new FontSet();
set.addFont("/path_to/Khmer.ttf");
set.addFont("/path_to/ArialUnicodeMS.ttf");
FontProvider fontProvider = new FontProvider(set);
document.setFontProvider(fontProvider);
document.setProperty(Property.FONT, "Arial");
String content = "khmer ថ្ងៃឈប់សម្រាក and chinese 假日 and korean 휴일";
// Add a paragraph to check if pdfCalligraph works
document.add(new Paragraph(content));
// Add a form text field
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
PdfTextFormField field = PdfFormField.createText(pdf, new Rectangle(36, 700, 400, 30), "test");
field.setValue(content);
form.addField(field);
document.close();
Output with the pdfCalligraph dependency loaded (as you can see, the paragraph is rendered properly, but in the form field all non-Helvetica characters are simply ignored):
Output without the pdfCalligraph dependency loaded (as you can see, the paragraph is not rendered properly, which is expected; the form field looks the same as with pdfCalligraph loaded):
Am I missing something?
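For context on the first question: the core of any substitution-font mechanism is splitting the value into runs that a single font can cover. The following is only a sketch of that idea using Character.UnicodeScript from the JDK. It is not iText API, the class name ScriptRunSketch and its Run type are made up for illustration, and how iText7 itself selects fonts inside form field appearances is not covered here.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText: splits text into single-script runs.
final class ScriptRunSketch {

    /** One stretch of text whose letters all belong to the same Unicode script. */
    static final class Run {
        final UnicodeScript script;
        final String text;

        Run(UnicodeScript script, String text) {
            this.script = script;
            this.text = text;
        }
    }

    static List<Run> split(String value) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        UnicodeScript currentScript = null;
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            // COMMON/INHERITED code points (spaces, digits, generic marks) stay in the current run.
            boolean neutral = script == UnicodeScript.COMMON || script == UnicodeScript.INHERITED;
            if (!neutral) {
                if (currentScript != null && script != currentScript) {
                    // The script changed: close the previous run and start a new one.
                    runs.add(new Run(currentScript, current.toString()));
                    current.setLength(0);
                }
                currentScript = script;
            }
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(new Run(currentScript == null ? UnicodeScript.COMMON : currentScript,
                    current.toString()));
        }
        return runs;
    }

    public static void main(String[] args) {
        // Latin "ab", three Khmer code points, Han "假日", Hangul "휴일", written as escapes.
        String sample = "ab \u1790\u17D2\u1784 \u5047\u65E5 \uD734\uC77C";
        for (Run run : split(sample)) {
            System.out.println(run.script + ": " + run.text.length() + " chars");
        }
    }
}
```

Each run could then be mapped to a font that covers its script (for example the Khmer font for KHMER runs and Arial Unicode for the rest). That mapping is an assumption of this sketch, not something the libraries above are confirmed to do for form fields.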

Related

ITEXT and PDFBOX is not detecting all the form fields present in the pdf

I used the code below to count the fields in a PDF with iText and PDFBox in Java. The attached PDF has 11 fields, but the fields on page 1 are not detected, and the size printed is 2 in both cases.
PdfDocument doc = new PdfDocument(new PdfReader(file));
PdfAcroForm form = PdfAcroForm.getAcroForm(doc, true);
System.out.println("form fields size from Itext:"+form.getFormFields().size());
PDDocument document = PDDocument.load(file);
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
List<PDField> fields = acroForm.getFields();
System.out.println("form fields size from PDFBOX:"+fields.size());
PDF FILE HERE IN THIS LINK
The form information in your PDF is inconsistent.
The global AcroForm form definition in your PDF contains only 2 fields, Text Field 6 and Text Field 7, which happen to be the two fields on page two.
Page one in its Annots array references ten form field widgets, each of them merged with a form field object. These fields are not referenced from the AcroForm form definition in your PDF. Thus, they are not part of the form of the PDF but merely some arbitrary annotations hanging around.
To fix the issue, simply reference the form fields of the widget annotations of page one from the AcroForm form definition.

Add HTML Markup using java Apache PDFBOX

I have been using PDFBox and EasyTable, which extends PDFBox, to draw data tables. I have hit a problem: I have a Java object with a string of HTML data that I need to add to the PDF using PDFBox. Digging through the documentation has not borne any fruit.
The code below is a hello world snippet; I want the text in the generated PDF to have H1 formatting.
// Create a document and add a page to it
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage( page );
// Create a new font object selecting one of the PDF base fonts
PDFont font = PDType1Font.HELVETICA_BOLD;
// Start a new content stream which will "hold" the to be created content
PDPageContentStream contentStream = new PDPageContentStream(document, page);
// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
contentStream.beginText();
contentStream.setFont( font, 12 );
contentStream.moveTextPositionByAmount( 100, 700 );
contentStream.drawString( "<h1>HelloWorld</h1>" );
contentStream.endText();
// Make sure that the content stream is closed:
contentStream.close();
// Save the results and ensure that the document is properly closed:
document.save( "Hello World.pdf");
document.close();
Use Jericho to convert the HTML to plain text while mapping the output of the tags correctly.
Sample:
public String extractAllText(String htmlText){
return new net.htmlparser.jericho
.Source(htmlText)
.getRenderer()
.setMaxLineLength(Integer.MAX_VALUE)
.setNewLine(null)
.toString();
}
Include it in your Gradle or Maven build:
compile group: 'net.htmlparser.jericho', name: 'jericho-html', version: '3.4'
PDFBox does not know HTML, at least not for creating content.
Thus, with plain PDFBox you have to parse the HTML yourself and derive special text drawing characteristics from the tags text is in.
E.g. when you encounter "<h1>HelloWorld</h1>", you have to extract the text "HelloWorld" and use the information that it is in a h1 tag to select an appropriate prime header font and font size to draw that "HelloWorld".
Alternatively you can look for a library doing that HTML parsing and transforming to PDF text drawing instructions for PDFBox, e.g. Open HTML to PDF.
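To make the "parse the HTML yourself" route concrete, here is a minimal sketch using only the JDK. It assumes flat, attribute-free h1, h2 and p elements; the class name HtmlRunSketch, the StyledRun type and the font sizes 24/18/12 are invented for illustration. A real HTML parser would be needed for anything beyond this toy input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns a few simple HTML elements into styled text runs.
final class HtmlRunSketch {

    /** Text plus the drawing characteristics derived from its enclosing tag. */
    static final class StyledRun {
        final String text;
        final float fontSize;
        final boolean bold;

        StyledRun(String text, float fontSize, boolean bold) {
            this.text = text;
            this.fontSize = fontSize;
            this.bold = bold;
        }
    }

    // Only flat <h1>, <h2> and <p> elements without attributes or nesting are recognized.
    private static final Pattern ELEMENT =
            Pattern.compile("<(h1|h2|p)>(.*?)</\\1>", Pattern.DOTALL);

    static List<StyledRun> parse(String html) {
        List<StyledRun> runs = new ArrayList<>();
        Matcher m = ELEMENT.matcher(html);
        while (m.find()) {
            String text = m.group(2).trim();
            switch (m.group(1)) {
                case "h1": runs.add(new StyledRun(text, 24f, true)); break;  // assumed size
                case "h2": runs.add(new StyledRun(text, 18f, true)); break;  // assumed size
                default:   runs.add(new StyledRun(text, 12f, false)); break; // plain paragraph
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        for (StyledRun run : parse("<h1>HelloWorld</h1><p>Some body text</p>")) {
            System.out.println(run.fontSize + (run.bold ? " bold: " : " regular: ") + run.text);
        }
    }
}
```

With PDFBox, each run would then be drawn separately: pick a bold or regular font such as PDType1Font.HELVETICA_BOLD, pass it with run.fontSize to contentStream.setFont(...), and draw run.text instead of the raw markup.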

PDFTable Itext arabic

I have written Java code and want Arabic words to be displayed in a PdfPTable that is added to an iText document to create a PDF.
As shown in the attached picture, the Arabic text is rendered as "???":
PdfPTable header = new PdfPTable(6);
PdfPTable tbame = new PdfPTable(1);
tbame.addCell(" >>>>>> " + install.getCustId().getFullName() + " <<<<<<");
tbame.setHorizontalAlignment(PdfPTable.ALIGN_CENTER);
tbame.setLockedWidth(false);
tbame.setExtendLastRow(false);
tbame.setWidthPercentage(100);
header.addCell("End");
header.addCell("Start");
Please read the documentation and you'll find out that the addCell(String content) method can not be used to add Arabic text for two reasons:
When you use this method, the default font Helvetica is used. You need to use a font that knows how to draw Arabic shapes. This is explained in the answer to this question: Itext Arabic Font coming as question marks
Arabic is written from right to left, which means that you need to change the run direction of the content of the cell as is explained in my answer to this question: RTL not working in pdf generation with itext 5.5 for Arabic text
A code snippet:
BaseFont bf = BaseFont.createFont("c:/WINDOWS/Fonts/arialuni.ttf",
BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(bf, 12);
Phrase phrase = new Phrase(
"\u0644\u0648\u0631\u0627\u0646\u0633 \u0627\u0644\u0639\u0631\u0628", font);
PdfPCell cell = new PdfPCell(phrase);
cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
table.addCell(cell);
If you don't have access to the font arialuni.ttf, you'll have to find another font that contains Arabic glyphs.
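Since the table in the question mixes Latin cells ("End", "Start") with Arabic names, it can help to decide per cell whether the RTL run direction is needed. The sketch below uses java.text.Bidi from the JDK for that decision; the class name RunDirectionSketch and the method needsRtl are hypothetical, and the rule "base direction of the first strong character" is an assumption that may not fit every mixed-direction string.

```java
import java.text.Bidi;

// Hypothetical helper: decides whether a cell's text should use an RTL run direction.
final class RunDirectionSketch {

    /** Returns true when the first strongly directional character is right-to-left. */
    static boolean needsRtl(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    public static void main(String[] args) {
        // Same Arabic sample as in the snippet above, written with Unicode escapes.
        String arabic = "\u0644\u0648\u0631\u0627\u0646\u0633 \u0627\u0644\u0639\u0631\u0628";
        System.out.println("Arabic sample needs RTL: " + needsRtl(arabic));
        System.out.println("\"End\" needs RTL: " + needsRtl("End"));
    }
}
```

In the iText 5 code above, the result would only control whether cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL) is called; the font with Arabic glyphs is still required either way.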

Make AcroFields (iTextSharp) non-editable and set bold

I'm using iTextSharp to populate PDF templates created in OpenOffice with data.
The fields are populated fine and I get a proper PDF, but it is still editable. I want a non-editable PDF. I also want to make some rows bold programmatically. Below is my code snippet.
PdfStamper stamper = new PdfStamper(reader, outputStream);
AcroFields fields = stamper.getAcroFields();
//loop
fields.setField("Desc_", "HILINSKI, MARK");
Please help me.
Thanks.
If you don't want the form to be editable, use form flattening as is done in the FillDataSheet example. Add this to your code:
fields.setGenerateAppearances(true);
stamper.setFormFlattening(true);
If you want to change the font of specific fields, use the setFieldProperty() method to change the "textfont" as is done in the TextFieldFonts example:
BaseFont bold =
BaseFont.createFont(BaseFont.HELVETICA_BOLD, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED);
fields.setFieldProperty("Desc_", "textfont", bold, null);
For more info, read the official documentation.

create Unicode pdf file contain Bangla letter using iText in java

I have to generate a PDF file using iText in the NetBeans IDE. The PDF may contain Bangla letters. I can already generate a PDF file with Bangla letters, but the problem is that the letters are not rendered in their correct form.
Suppose I have to show: বরিশাল -- but the PDF shows: http://i.stack.imgur.com/abwOV.jpg
Suppose I have to show: পড়ি -- but the PDF shows: পড় ি
My code to generate this file:
Document document = new Document();
BaseFont unicode = BaseFont.createFont("c:/windows/fonts/NikoshBan.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(unicode);
PdfWriter writer=PdfWriter.getInstance(document, new FileOutputStream("TableDat.pdf"));
document.open();
document.add(new Paragraph("বরিশাল",font));
document.close();
