How to display Arabic characters in a PDF file generated using PDFBox - Java

I'm trying to display an Arabic string in a PDF file generated using PDFBox. I can already display the string right-to-left (RTL) using ICU4J and a specific font, but the problem is this:
even though the string appears in the correct RTL order, the characters are still separated instead of joined, and trying several fonts has not solved the problem.
Here is the code snippet I used for testing:
PDDocument document = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
document.addPage(page);
String dir = "resources/fonts/";
//I've used several fonts
PDType0Font farialUni = PDType0Font.load(document, new File(dir + "ARIALUNI.ttf"));
PDFont fArabType = PDType0Font.load(document, new File(dir + "arabtype.ttf"));
PDFont fMuka = PDType0Font.load(document, new File(dir + "Mukadimah.ttf"));
PDFont fFreeSans = PDType0Font.load(document, new File(dir + "FreeSans.ttf"));
PDFont fNoto = PDType0Font.load(document, new File(dir + "NotoNaskhArabic-Regular.ttf"));
PDPageContentStream stream = new PDPageContentStream(document, page);
stream.beginText();
stream.setFont(fNoto, 12);
stream.setLeading(12 * 1.2);
stream.newLineAtOffset(100, 800);
// Switch text order to RTL
BiDiClass bidiClass = new BiDiClass();
String arabicText = "\u0627\u0644\u0633\u0644\u0627\u0645 \u0639\u0644\u064A\u0643\u0645 ";
// Use ICU to reverse the order
String out = bidiClass.makeLineLogicalOrder(arabicText, true);
System.out.println(out);
stream.showText(out);
stream.newLine();
// ligature
stream.showText(out);
stream.endText();
stream.close();
document.save("example.pdf");
document.close();
The string this code produces looks like this:
‫ال س ل ام‪ ‬ع ل ي ك م‪ ‬‬
Note: I added the space characters to the resulting string only for clarity.
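One way to check whether a string has been contextually shaped before it reaches showText is to look at which Unicode blocks its letters come from. The sketch below uses only the Java standard library; the class and method names (ArabicFormCheck, countBaseLetters, countPresentationForms) are made up for illustration. It rests on an assumption that this thread does not confirm: that the text is drawn code point by code point without any joining step, in which case letters from the basic Arabic block (U+0600 to U+06FF) appear in their isolated form, while code points from the Arabic Presentation Forms blocks already carry a joined shape.

```java
final class ArabicFormCheck {

    // Counts letters from the basic Arabic block (U+0600..U+06FF), i.e. unshaped text.
    static int countBaseLetters(String text) {
        return (int) text.codePoints()
                .filter(Character::isLetter)
                .filter(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.ARABIC)
                .count();
    }

    // Counts letters from the Arabic Presentation Forms A and B blocks,
    // which encode initial, medial, final and isolated shapes explicitly.
    static int countPresentationForms(String text) {
        return (int) text.codePoints()
                .filter(Character::isLetter)
                .filter(cp -> {
                    Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
                    return block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
                            || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_B;
                })
                .count();
    }
}
```

For the arabicText literal in the question, this reports only base letters and no presentation forms. If the assumption holds, the joining would have to happen in a separate shaping step before showText; ICU4J ships an ArabicShaping class for that purpose, though whether it resolves this particular case has not been tested here.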

Related

How to add a font with ITextRenderer in Java

I want to generate a PDF from an HTML file.
I use ITextRenderer.
The PDF is generated without problems in English and French.
However, when I generate an Arabic PDF, the Arabic characters are not shown.
I tried adding a font that supports Arabic, but it seems the font is not applied (the font already exists on my PC).
ITextRenderer renderer = new ITextRenderer();
if (language.equals("fr")) {
template = "LocationBillTemplateFr";
} else if (language.equals("en")) {
template = "LocationBillTemplateEn";
} else {
template = "LocationBillTemplateAr";
renderer.getFontResolver().addFont("C:\\WINDOWS\\Fonts\\simpbdo.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
}
TemplateEngine templateEngine = PdfGenerationUtils.prepareTemplateHtmlWithPdf();
String renderedHtmlContent = templateEngine.process(template, context);
String xHtml = PdfGenerationFromHtmlUtils.convertToXhtml(renderedHtmlContent);
renderer.setDocumentFromString(xHtml, PdfGenerationFromHtmlUtils.baseUrl());
renderer.layout();
String absolutePath = environment.getProperty("app.payment.receipt.location.folder");
OutputStream outputStream = new FileOutputStream(
absolutePath + File.separator + "Payment" + subscriptionId + ".pdf");
renderer.createPDF(outputStream);
outputStream.close();

PdfBox - change font or fontName in pdf file?

Please advise.
I have PDF files that use the font HPDFAA+Arial-BoldMTBold. This font name is incorrect, and the font is a subset...
I can change fonts with the Aspose.PDF.dll library (https://docs.aspose.com/pdf/net/replace-text-in-pdf/, section "Replace fonts in existing PDF file"), but I only have the trial version of that library.
How can I do this with PDFBox? I want to replace this font with Arial-BoldMT, or rename the font.
UPD: My attempts have led nowhere... In PDFontDescriptor I can rename the font, but how do I apply that to the PDFont? Or am I going the wrong way?
PDDocument pdfDocument = PDDocument.load(new File("Sample.pdf"));
PDPageTree pages = pdfDocument.getDocumentCatalog().getPages();
for (PDPage page : pages) {
PDResources res = page.getResources();
for (COSName fontName : res.getFontNames()) {
PDFont font = res.getFont(fontName);
PDFontDescriptor fontDescriptor = font.getFontDescriptor();
System.out.println("fontDes: " + fontDescriptor.getFontName());
String oldFontName = fontDescriptor.getFontName();
String newFontName = oldFontName.replace("Arial-BoldMTBold", "Arial-BoldMT");
fontDescriptor.setFontName(newFontName);
System.out.println("font: " + font.getName());
}
}
Here's code that is tailored to your file. It will only help you if this is about many similar files.
try (PDDocument doc = PDDocument.load(new File(XXX,"outerBox.pdf")))
{
PDPage page = doc.getPage(0);
for (COSName name : page.getResources().getFontNames())
{
PDFont font = page.getResources().getFont(name);
String fontName = font.getName();
if (font instanceof PDType0Font && fontName.endsWith("BoldMTBold"))
{
PDType0Font type0font = (PDType0Font) font;
String newFontName = fontName.substring(0, fontName.length() - 4);
type0font.getCOSObject().setString(COSName.BASE_FONT, newFontName);
PDCIDFont descendantFont = type0font.getDescendantFont();
descendantFont.getCOSObject().setString(COSName.BASE_FONT, newFontName);
PDFontDescriptor fontDescriptor = descendantFont.getFontDescriptor();
fontDescriptor.setFontName(newFontName);
}
}
doc.save(new File(XXX,"outerBox-saved.pdf"));
}
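The name handling in the code above can be tried in isolation, without a PDF. The sketch below is plain Java; the class FontNames and its methods are hypothetical helpers, not part of PDFBox. It relies on the PDF convention that a subset font name starts with a tag of six uppercase letters followed by a plus sign (HPDFAA+ in the question), and it mirrors the answer's substring call for the duplicated "Bold" suffix.

```java
import java.util.regex.Pattern;

final class FontNames {

    // A subset tag is six uppercase letters followed by '+', e.g. "HPDFAA+".
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    static boolean hasSubsetTag(String fontName) {
        return SUBSET_TAG.matcher(fontName).find();
    }

    // Returns the name without the subset tag; names without a tag are returned unchanged.
    static String stripSubsetTag(String fontName) {
        return SUBSET_TAG.matcher(fontName).replaceFirst("");
    }

    // Drops the trailing "Bold" only when the name ends with "BoldMTBold",
    // the same condition the answer checks before calling substring.
    static String dropDuplicateBold(String fontName) {
        return fontName.endsWith("BoldMTBold")
                ? fontName.substring(0, fontName.length() - 4)
                : fontName;
    }
}
```

With the font from the question, dropDuplicateBold turns "HPDFAA+Arial-BoldMTBold" into "HPDFAA+Arial-BoldMT". Note that removing the subset tag from the name only changes the label: the embedded font program still contains just the glyphs of the subset.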
PDF structure, seen with PDFDebugger:

No glyph for U+000A in font NotoSerifDevanagari-Bold

I am trying to create a PDF using PDFBox. I store EditText data as HTML in a SQLite DB.
I then retrieve the data from the SQLite DB and create a PDF from it. The data contains both Marathi and English text.
I am using the NotoSerifDevanagari-Bold font, which I added to the assets folder and load from there in code, but I am getting an error. Please find my code and the error below.
AssetManager assetManager;
PDFBoxResourceLoader.init(getApplicationContext());
File FilePath = Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DOWNLOADS);
assetManager = getAssets();
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
PDFont font = PDType0Font.load(document, assetManager.open("notoserifdevanagaribold.ttf"));
PDPageContentStream contentStream;
// Define a content stream for adding to the PDF
contentStream = new PDPageContentStream(document, page);
Cursor data = mDatabaseHelper.getDataByDeckname(deckname);
StringBuilder builder=new StringBuilder();
while (data.moveToNext()) {
String front_page_desc = data.getString(3);
String back_page_desc = data.getString(4);
contentStream.beginText();
contentStream.setNonStrokingColor(15, 38, 192);
contentStream.setFont(font, 12);
contentStream.newLineAtOffset(100, 700);
contentStream.showText(Html.fromHtml(front_page_desc).toString());
contentStream.endText();
contentStream.beginText();
contentStream.setNonStrokingColor(15, 38, 192);
contentStream.setFont(font, 12);
contentStream.newLineAtOffset(100, 700);
contentStream.showText(Html.fromHtml(back_page_desc).toString());
contentStream.endText();
}
contentStream.close();
String path = FilePath.getAbsolutePath() + "/temp.pdf";
document.save(path);
document.close();
ERROR
W/System.err: java.lang.IllegalArgumentException: No glyph for U+000A in font NotoSerifDevanagari-Bold
I tried many examples for the above error but was not able to fix the issue. The error occurs on the line contentStream.showText(Html.fromHtml(front_page_desc).toString());. Can someone please help me with this?
U+000A is the Unicode code point for the line feed (new line) character. It is a control character, not something a font draws, so any font will fail if you try to render it with showText.
To avoid this error, you can try something like this:
String[] lines = text.split("\\n");
for (String line : lines) {
// trim().isEmpty() also works where String.isBlank() (Java 11+) is not available
if (!line.trim().isEmpty()) {
contentStream.showText(line);
// add new line here if you want to
}
}
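The same idea can be packaged as a small helper that can be tested without PDFBox. This is a sketch in plain Java; PrintableLines and its split method are hypothetical names. Beyond the snippet above, it assumes the text may also contain \r\n line breaks or other ASCII control characters such as tabs, which it replaces with a space.

```java
import java.util.ArrayList;
import java.util.List;

final class PrintableLines {

    // Splits text into non-empty lines that contain no ASCII control characters.
    static List<String> split(String text) {
        List<String> lines = new ArrayList<>();
        // \R matches any line break sequence (\n, \r\n, \r, ...)
        for (String raw : text.split("\\R")) {
            // \p{Cntrl} matches ASCII control characters (0x00-0x1F and 0x7F), e.g. tabs
            String cleaned = raw.replaceAll("\\p{Cntrl}", " ");
            if (!cleaned.trim().isEmpty()) {
                lines.add(cleaned);
            }
        }
        return lines;
    }
}
```

Each returned line could then be passed to contentStream.showText(line), followed by contentStream.newLine() after a leading has been set with setLeading.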

Printing Chinese characters in pdfbox

I'm using the following set-up:
Java 11.0.1
pdfbox 2.0.15
Objective: Rendering a pdf that contains Chinese characters
Problem: java.lang.IllegalArgumentException: U+674E is not available in this font's encoding: WinAnsiEncoding
I already tried:
Using different fonts for Chinese character support. The latest one is NotoSansCJKtc-Regular.ttf
Setting the font to Unicode as described in "Java: Write national characters to PDF using PDFBox"; however, the loadTTF method used there is deprecated.
Using Arial-Unicode-MS_4302.ttf
My code looks like this (shortened a bit):
try (InputStream pdfIn = inputStream; PDDocument pdfDocument =
PDDocument.load(pdfIn)) {
PDFont formFont;
//Check if Chinese characters are present
if (!Util.containsHanScript(queryString)) {
formFont = PDType0Font.load(pdfDocument,
PdfReportGenerator.class.getResourceAsStream("LiberationSans-Regular.ttf"),
false);
} else {
formFont = PDType0Font.load(pdfDocument,
PdfReportGenerator.class.getResourceAsStream("NotoSansCJKtc-Regular.ttf"),
false);
}
List<PDField> fields = acroForm.getFields();
//Load fields into Map
Map<String, PDField> pdfFields = new HashMap<>();
for (PDField field : fields) {
String key = field.getPartialName();
pdfFields.put(key, field);
}
PDField currentField = pdfFields.get("someFieldID");
PDVariableText pdfield = (PDVariableText) currentField;
PDResources res = acroForm.getDefaultResources();
String fontName = res.add(formFont).getName();
String defaultAppearanceString = "/" + fontName + " 10 Tf 0 g";
pdfield.setDefaultAppearance(defaultAppearanceString);
pdfield.setValue("李柱");
acroForm.flatten(fields, true);
ByteArrayOutputStream pdfOut = new ByteArrayOutputStream();
pdfDocument.save(pdfOut);
}
Expected result: Chinese characters on pdf.
Actual result: java.lang.IllegalArgumentException: U+674E is not available in this font's encoding: WinAnsiEncoding
So my question is about how to best support rendering of Chinese characters with pdfbox. Any help is appreciated.
The following code works for me, it uses the file of PDFBOX-4629:
PDDocument doc = PDDocument.load(new URL("https://issues.apache.org/jira/secure/attachment/12977270/Report_Template_DE.pdf").openStream());
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
PDVariableText field = (PDVariableText) acroForm.getField("search_query");
List<PDField> fields = acroForm.getFields();
PDFont font = PDType0Font.load(doc, new FileInputStream("c:/windows/fonts/arialuni.ttf"), false);
PDResources res = acroForm.getDefaultResources();
String fontName = res.add(font).getName();
String defaultAppearanceString = "/" + fontName + " 10 Tf 0 g";
field.setDefaultAppearance(defaultAppearanceString);
field.setValue("李柱");
acroForm.flatten(fields, true);
doc.save("saved.pdf");
doc.close();
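The question's code picks the font with Util.containsHanScript, which is not shown. The following is a guess at what such a helper might look like, using only the Java standard library; the class name HanScript is hypothetical and the original Util class may work differently.

```java
final class HanScript {

    // Returns true if at least one code point in the text belongs to the Han script.
    static boolean containsHanScript(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }
}
```

A check like this only decides which font to load. The field value still has to be written with a font that is loaded through PDType0Font and referenced in the default appearance string, as the answer above does.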

Write Heart Symbol in pdf using itext java

I'm trying to write a heart symbol to a PDF from Java code.
This is my input to pdf : ❤️❤️❤️
But the generated PDF is empty, with nothing written.
I'm using iText to write the PDF.
The font used is tradegothic_lt_boldcondtwenty.ttf.
OutputStream file = new FileOutputStream(fileName);
Document document = new Document(PageSize.A6);
PdfWriter writer = PdfWriter.getInstance(document, file);
document.open();
PdfLayer nested = new PdfLayer("Layer 1", writer);
PdfContentByte cb = writer.getDirectContent();
cb.beginLayer(nested);
ColumnText ct = new ColumnText(cb);
Font font = getFont();
Phrase para1 = new Phrase("❤️❤️❤️",font);
ct.setSimpleColumn(para1,38,0,260,138,15, Element.ALIGN_LEFT);
ct.go();
cb.endLayer();
document.close();
file.close();
private Font getFont() {
final String methodName = "generatePDF";
LOGGER.entering(CLASSNAME, methodName);
Font font = null;
try {
String filename = "tradegothic_lt_boldcondtwenty.ttf";
FontFactory.register(filename, filename);
font = FontFactory.getFont(filename, BaseFont.CP1252, BaseFont.EMBEDDED,11.8f);
} catch(Exception exception) {
LOGGER.logp(Level.SEVERE, CLASSNAME, methodName, "Exception Occurred while fetching the Trade Gothic font." + exception);
font = FontFactory.getFont(FontFactory.HELVETICA_BOLD,11.8f);
}
return font;
}
Phrase para1 contains the heart correctly, but I can't see it in the PDF.
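One thing worth checking is the encoding: getFont() requests BaseFont.CP1252, a single-byte encoding. The sketch below, in plain Java with hypothetical names (Cp1252Check, fitsCp1252, describeCodePoints), tests whether a string fits into windows-1252 at all. It assumes the heart in the input is U+2764 followed by the variation selector U+FE0F, which is how this emoji is commonly encoded; the actual input may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class Cp1252Check {

    // Returns true if every character of the text can be encoded in windows-1252.
    static boolean fitsCp1252(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    // Lists the code points of the text, e.g. "U+2764 U+FE0F".
    static String describeCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }
}
```

If fitsCp1252 returns false for the input, a font created with BaseFont.CP1252 has no way to address those characters, which may explain the empty output. A Unicode encoding such as BaseFont.IDENTITY_H would be the usual alternative, but the chosen font would also need to contain a heart glyph, and whether Trade Gothic does is not known here.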
