Error trying to show Spanish or French characters with PDFBOX - java

I'm trying to create a PDF with PDFBOX-2.0.0-SNAPSHOT, but I'm running into errors.
This is the typical Hello World example with Spanish and French characters:
PDDocument document = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
document.addPage(page);
PDType1Font font = PDType1Font.HELVETICA;
PDPageContentStream stream = new PDPageContentStream(document, page);
String text = "áÁÀà";
stream.beginText();
stream.setFont(font, 12);
stream.newLineAtOffset(100, 700);
stream.showText(text);
stream.endText();
stream.close();
document.save("sample.pdf");
document.close();
And I get this error:
sep 02, 2015 12:42:43 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
ADVERTENCIA: Using fallback font ArialMT for base font ZapfDingbats
Exception in thread "main" java.lang.IllegalArgumentException: This font type only supports 8-bit code points
If I load the arialuni.ttf font instead, the code runs, but I only get question marks in the PDF file.
I have tried PDFBox 1.8 and it doesn't work either.
Any ideas?
Thanks in advance.
UPDATE:
After some testing I realized that if you change the encoding of the project (at least in IntelliJ IDEA) and don't retype the problematic characters in the code, the new encoding doesn't take effect.
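To illustrate the update above, here is a minimal sketch of what happens when bytes saved in one charset are read back as another, which is roughly what a compiler does when the project encoding and the file's real encoding disagree. The class and method names are made up for this example, and it uses only the JDK, not PDFBox. Writing the literal with Unicode escapes (or passing the matching -encoding option to javac) sidesteps the problem.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SourceEncodingDemo {
    // The four accented letters from the question, written as Unicode escapes
    // so the literal does not depend on the encoding of the source file.
    static final String TEXT = "\u00e1\u00c1\u00c0\u00e0";

    /** Simulates reading bytes that were saved in one charset as if they were another. */
    static String misread(String text, Charset savedAs, Charset readAs) {
        return new String(text.getBytes(savedAs), readAs);
    }

    public static void main(String[] args) {
        String garbled = misread(TEXT, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        // prints "4 chars intended, 8 chars after the misread"
        System.out.println(TEXT.length() + " chars intended, "
                + garbled.length() + " chars after the misread");
    }
}
```

Each accented letter takes two bytes in UTF-8, so a one-byte-per-character reader sees eight characters instead of four, and none of them is the intended letter.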

The PDType1Font.XXX constants are the standard fonts that PDF viewers supply themselves; they use an 8-bit encoding and don't support full Unicode. You should be able to use a TTF font instead, as in this example: https://github.com/apache/pdfbox/blob/trunk/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/EmbeddedFonts.java
PDType0Font font = PDType0Font.load(document, new File("path/YourFont.ttf"));
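If it helps to decide up front whether a string can go through one of the 8-bit fonts, a rough pre-check could look like the sketch below. It is JDK-only and makes one loud assumption: the windows-1252 charset is used as a stand-in for the PDF WinAnsiEncoding, which is close but not identical. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class EightBitCheck {
    // Assumption: windows-1252 approximates WinAnsiEncoding; the two differ in a few code points.
    private static final Charset WIN_ANSI_LIKE = Charset.forName("windows-1252");

    /** Returns true if every character of the text has a single-byte code in the stand-in charset. */
    static boolean fitsSimpleFont(String text) {
        // Encoders are not thread-safe, so a fresh one is created per call.
        CharsetEncoder encoder = WIN_ANSI_LIKE.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsSimpleFont("\u00e1\u00c1\u00c0\u00e0")); // accented Latin letters
        System.out.println(fitsSimpleFont("\u2714"));                   // heavy check mark
    }
}
```

Text that fails the check would need an embedded font loaded through PDType0Font, as in the line above.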

Related

PdfBox write hindi characters in pdf file

I tried many things to write Hindi characters using Apache PDFBox, but it seems to be an existing issue in the library.
I tried many of the available font files. Can someone help me out with this?
I tried the following:
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDFont font = PDTrueTypeFont.loadTTF( doc, new FileInputStream(new File("D:\\Data\\fonts\\dn.ttf")));
font.setFontEncoding(new WinAnsiEncoding());
PDPageContentStream content = new PDPageContentStream( doc, page, true, false );
content.setFont(font, 10);
content.beginText();
content.moveTextPositionByAmount( 200, 100 );
content.drawString( "हिंदी" ); // Writing word "Hindi" in hindi language.
content.endText();
content.close();
doc.save( new FileOutputStream(new File("D:\\testOutput1.pdf")));
doc.close();
It's working for me in PDFBox.
The trick here is to use a non-Unicode string instead of a Unicode string.
Use the Kruti Dev font given in the link below.
Then convert your Unicode string to a non-Unicode string.
Finally, use that converted string in your code.
That means replacing this line:
content.drawString( "हिंदी" ); // Writing word "Hindi" in hindi language.
with this line:
content.drawString( "fganh" ); // Writing word "Hindi" in hindi language.
Convert Unicode (Mangal) To Kruti Dev Font
I think this cannot be done using PDFBox, as there are a lot of issues with it.
I tried many fonts and the encoding types of PDFBox but failed to write in Hindi.
In the end I tried Node.js (Express) with pdfmaker(), which converts HTML to PDF. I had issues on my Linux server at first, but after installing the appropriate TTF font it worked.

Drawing glyph from ZapfDingbats using PDFbox

I am trying to draw a checkmark (found in the PDF standard ZapfDingbats font, Unicode 2714) in my PDF document. I'm a newbie to Apache's PDFBox, using version 2.0.0 at the moment (no specific reason except that it's the newest).
My code looks as follows:
PDDocument document = PDDocument.load(new File("myfile.pdf"));
PDPage page = document.getPages().get(0); // first page
PDPageContentStream contentStream = new PDPageContentStream(document, page, AppendMode.APPEND, true);
PDType1Font font = PDType1Font.ZAPF_DINGBATS;
String glyph = "\u2714";
contentStream.beginText();
contentStream.setFont(font, fontSize);
contentStream.newLineAtOffset(10, 10); // towards lower left corner of page
contentStream.showText(glyph);
contentStream.endText();
contentStream.close();
document.save("output.pdf");
document.close();
... but this produces a nice Exception:
Exception in thread "main" java.lang.IllegalArgumentException: U+2714 ('a20') is not available in this font's encoding: WinAnsiEncoding
at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:345)
at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:286)
:
Debugging through the code shows that what happens at PDType1Font.java:345 is:
(PDType1Font extends PDSimpleFont)
PDSimpleFont.glyphList correctly contains a mapping from the Unicode codepoint (U+2714) to a PDF name ("a20") as shown in the Exception text (set up in PDSimpleFont's constructor for the ZapfDingbat glyphs).
... but the PDSimpleFont.encoding, which is set to WinAnsiEncoding in PDType1Font's constructor line 110, does not contain the name a20 - these names (encodings) are set up statically in the WinAnsiEncoding class - see the WIN_ANSI_ENCODING_TABLE constant at line 36.
Has anyone managed to show Dingbat glyphs using PDFBox, even in an older version?
I suspect it is a bug: a20 should be converted to 064 according to "ZapfDingbats Set and Encoding", and I can't find where this is being done. Please open an issue in JIRA. In the meantime, here's a workaround if you're using Windows:
instead of
PDType1Font font = PDType1Font.ZAPF_DINGBATS;
use
PDFont font = PDType0Font.load(document, new File("c:/windows/fonts/arialuni.ttf"));
Update: now solved
This was indeed found to be a bug and JIRA issue PDFBOX-3298 addressed this. It is now resolved in PDFBox version 2.0.3.
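For anyone wondering what the "064" in the answer refers to: it is an octal code from the encoding table. The sketch below only does that arithmetic to show which single byte the table entry corresponds to. It makes no claim about how PDFBox handles it internally, and the class name is made up.

```java
final class DingbatCode {
    /** Converts an octal code, as printed in an encoding table, to the single-byte character it denotes. */
    static char fromOctal(String octal) {
        return (char) Integer.parseInt(octal, 8);
    }

    public static void main(String[] args) {
        char code = fromOctal("064");
        // prints "52 -> '4'"
        System.out.println((int) code + " -> '" + code + "'");
    }
}
```

So the glyph named a20 sits at byte 52 (hex 34), which is the position of the digit 4 in ASCII.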

Font embedding error PDFA/1a iText 5.5.6

The class below throws this exception: Exception in thread "main" com.itextpdf.text.DocumentException: com.itextpdf.text.pdf.PdfAConformanceException: All the fonts must be embedded. This one isn't: ZapfDingbats
I have the ZapfDingbats font embedded but I am still getting this Exception.
What I am trying to achieve is a list with a bullet in front of every item.
What am I missing here?
public class SquareBullet {
public static void main(String[] args) throws IOException, DocumentException, XMPException {
Document document = new Document();
PdfAWriter writer = PdfAWriter.getInstance(document, new FileOutputStream("list.pdf"), PdfAConformanceLevel.PDF_A_1A);
writer.setViewerPreferences(PdfAWriter.PageModeUseOutlines);
writer.setRunDirection(PdfAWriter.RUN_DIRECTION_LTR);
writer.setTagged(PdfAWriter.markAll);
writer.createXmpMetadata();
XmpWriter xmp = writer.getXmpWriter();
DublinCoreProperties.addSubject(xmp.getXmpMeta(), "Subject");
DublinCoreProperties.setTitle(xmp.getXmpMeta(), "Title", "en_US", "en_US");
DublinCoreProperties.setDescription(xmp.getXmpMeta(), "Description", "en_US", "en_US");
PdfProperties.setKeywords(xmp.getXmpMeta(), "Keywords");
PdfProperties.setVersion(xmp.getXmpMeta(), "1.4");
document.addLanguage("en_US");
document.open();
Font font = FontFactory.getFont(FontFactory.ZAPFDINGBATS, BaseFont.ZAPFDINGBATS, BaseFont.EMBEDDED, 12);
Font font1 = FontFactory.getFont(FontFactory.HELVETICA, BaseFont.WINANSI, BaseFont.EMBEDDED, 12);
ICC_Profile icc = ICC_Profile.getInstance(new FileInputStream("sRGB Color Space Profile.icm"));
writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
List list = new List(10);
list.setListSymbol(new Chunk(String.valueOf((char)110), font));
list.add(new ListItem(new Chunk("Test 1", font1)));
list.add(new ListItem(new Chunk("Test 2", font1)));
list.add(new ListItem(new Chunk("Test 3", font1)));
document.add(list);
document.close();
}
}
Your claim "I have the ZapfDingbats font embedded" is wrong.
Granted, you define the font like this:
Font font = FontFactory.getFont(FontFactory.ZAPFDINGBATS,
BaseFont.ZAPFDINGBATS, BaseFont.EMBEDDED, 12);
As you use BaseFont.EMBEDDED, you might assume that the font will be embedded, but it isn't. You can check this by using that font in any other PDF that isn't PDF/A: if you go to Document Properties > Fonts, you'll see that the font isn't embedded.
Why is this?
There are 14 special fonts in PDF. We refer to them as the Standard Type 1 fonts. Every PDF viewer should be able to render text that uses those fonts, hence these fonts don't need to be embedded: 4 Helvetica fonts (regular, bold, italic, bold-italic), 4 Times Roman fonts (regular, bold, italic, bold-italic), 4 Courier fonts (regular, bold, italic, bold-italic), Symbol, and ZapfDingbats.
iText ships with the AFM files of these fonts. AFM stands for Adobe Font Metrics and the files contain data about the widths, bounding boxes, and other metrics of glyphs that are available in each font.
The actual description of the shape of these fonts isn't shipped with iText. These are stored in a PFB (Printer Font Binary) file. Without these PFB files, iText can't (and won't) embed these Standard Type 1 fonts.
In other words: iText ignores the BaseFont.EMBEDDED parameter.
This is documented in many places. If you want to create PDF/A, you need font files, such as TTF, OTF, TTC files, or a combination of AFM and PFB files.
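As a rough guard, one could check a font name against those 14 names before asking for embedding. This is a JDK-only sketch with a hypothetical class name; it does not call iText, and it assumes the font is identified by its PostScript name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class Standard14Fonts {
    // PostScript names of the 14 Standard Type 1 fonts.
    private static final Set<String> NAMES = new HashSet<>(Arrays.asList(
            "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
            "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
            "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
            "Symbol", "ZapfDingbats"));

    /** True if the name is one of the fonts that needs a separate font file before it can be embedded. */
    static boolean isStandard14(String postScriptName) {
        return NAMES.contains(postScriptName);
    }

    static int count() {
        return NAMES.size();
    }

    public static void main(String[] args) {
        System.out.println(isStandard14("ZapfDingbats"));     // true
        System.out.println(isStandard14("Arial Unicode MS")); // false
    }
}
```

A true result means a PDF/A document needs a real font file (TTF, OTF, or AFM plus PFB) with comparable glyphs in place of the built-in font.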
You must add the "jasperreports-fonts-" jar to your classpath.

PDF Generation having multi-lingual text using Flying Saucer

I am trying to print Arabic and English text in PDF using Flying Saucer library. Here's my code :
String inputFile = "D:/test.xhtml";
String url = new File(inputFile).toURI().toURL().toString();
String outputFile = "D:/doc.pdf";
OutputStream os = new FileOutputStream(outputFile);
ITextRenderer renderer = new ITextRenderer();
ITextFontResolver resolver = renderer.getFontResolver();
resolver.addFont("D:/arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.setDocument(url);
renderer.layout();
renderer.createPDF(os);
os.close();
and my XHTML file has following data enclosed in paragraph tags:
اب اب اب اب Hello
The output generated displays only English characters but not Arabic glyphs. Please help.
For some reason, if no specific font is used, the generated PDF uses some kind of default font (probably Helvetica) that contains a very limited character set, which obviously does not contain the Greek code page.
Reference
Arial is a pretty standard font, installed by default in most operating systems, and implements a wide variety of alphabets (including Greek).
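Following that reasoning, the document itself has to name a font that covers the script. Below is a minimal sketch of building such an XHTML string in Java. It assumes the registered arialuni.ttf reports the family name "Arial Unicode MS" (check the actual family name of your font file), and the class and method names are hypothetical. It only builds the markup and does not call Flying Saucer.

```java
final class XhtmlWithFont {
    /**
     * Wraps text in a minimal XHTML page whose body selects the given font family.
     * The body text must already be XML-escaped; this sketch does no escaping.
     */
    static String wrap(String fontFamily, String bodyText) {
        return "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><style type=\"text/css\">"
                + "body { font-family: \"" + fontFamily + "\"; }"
                + "</style></head>"
                + "<body><p>" + bodyText + "</p></body></html>";
    }

    public static void main(String[] args) {
        // Two Arabic letters followed by English text, written with Unicode escapes.
        System.out.println(wrap("Arial Unicode MS", "\u0627\u0628 Hello"));
    }
}
```

The resulting string would be saved as the XHTML input file; whether the Arabic text is then shaped and ordered correctly is a separate question from font selection.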

How to write Chinese characters to a PDF file

I am trying to read text from a properties file that is encoded in UTF-8 and write it to a PDF file using the Document object in Java.
Document document = new Document();
File file = new File(FILES_PATH + ".pdf");
FileOutputStream fos = new FileOutputStream(file);
PdfWriter.getInstance(document, fos);
.
.
.
pdfTable table;
document.add(table);
document.close();
When I just read the value from the properties file, the Chinese characters are dropped.
When I try to re-encode the string, I get strange characters or "?" instead of the Chinese characters.
I tried encoding it as UTF-8, ISO-8859-1, GBK and GB2312.
How can I get Chinese characters into the PDF file?
It will not work that way.
In order to display Unicode characters in a PDF that are not covered by the built-in PDF fonts, you need to specify a custom font for the text fragment and create a separate fragment for each stretch of text covered by a given font. You also need to embed the fonts you use in the PDF document (so please check whether the license of those fonts permits distributing them).
So a single String may need to be rendered using several fonts. iText has the class FontSelector, which handles that task:
FontSelector selector = new FontSelector();
BaseFont bf1 = BaseFont.createFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
bf1.setSubset(true);
Font font1 = new Font(bf1, 12, Font.BOLD);
selector.addFont(font1);
// ... do that with all fonts you need
Phrase ph = selector.process(TEXT);
document.add(new Paragraph(ph));
You can find a more complex example in my article: Using dynamic fonts for international texts in iText
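To make the idea of "one fragment per font" concrete, here is a simplified, JDK-only model of that kind of splitting. It is not iText's implementation: each "font" is just a name plus a made-up coverage test, the fallback to the first font for uncovered characters is an assumption of this sketch, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

final class FontRunSplitter {
    /** One stretch of text plus the name of the font chosen for it. */
    static final class Run {
        final String fontName;
        final String text;
        Run(String fontName, String text) { this.fontName = fontName; this.text = text; }
        @Override public String toString() { return fontName + ":" + text; }
    }

    /** Picks, per code point, the first font whose coverage test accepts it, and merges neighbours. */
    static List<Run> split(String text, Map<String, IntPredicate> fontsInPriorityOrder) {
        if (fontsInPriorityOrder.isEmpty()) {
            throw new IllegalArgumentException("at least one font is required");
        }
        List<Run> runs = new ArrayList<>();
        String currentFont = null;
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String chosen = chooseFont(cp, fontsInPriorityOrder);
            if (current.length() > 0 && !chosen.equals(currentFont)) {
                runs.add(new Run(currentFont, current.toString()));
                current.setLength(0);
            }
            currentFont = chosen;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(new Run(currentFont, current.toString()));
        }
        return runs;
    }

    private static String chooseFont(int cp, Map<String, IntPredicate> fonts) {
        for (Map.Entry<String, IntPredicate> e : fonts.entrySet()) {
            if (e.getValue().test(cp)) return e.getKey();
        }
        // Assumption of this sketch: uncovered characters fall back to the first font.
        return fonts.keySet().iterator().next();
    }

    /** Hypothetical coverage tables: a Latin font and a CJK font, in priority order. */
    static Map<String, IntPredicate> sampleFonts() {
        Map<String, IntPredicate> fonts = new LinkedHashMap<>();
        fonts.put("latin", cp -> cp < 0x250);
        fonts.put("cjk", cp -> cp >= 0x4E00 && cp <= 0x9FFF);
        return fonts;
    }

    public static void main(String[] args) {
        // "Hi " and "!" go to the Latin font, the two Chinese characters to the CJK font.
        System.out.println(split("Hi \u4e2d\u6587!", sampleFonts()));
    }
}
```

With a real library, each run would become its own text chunk with the corresponding embedded font, which is the job FontSelector takes over in the answer above.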
