I am trying to draw a checkmark (U+2714, found in the standard PDF font ZapfDingbats) in my PDF document. I'm new to Apache PDFBox and am using version 2.0.0 at the moment (for no specific reason except that it's the newest).
My code looks as follows:
PDDocument document = PDDocument.load(new File("myfile.pdf"));
PDPage page = document.getPages().get(0); // first page
PDPageContentStream contentStream = new PDPageContentStream(document, page, AppendMode.APPEND, true);
PDType1Font font = PDType1Font.ZAPF_DINGBATS;
float fontSize = 12f; // assumed value; fontSize was not defined in the original snippet
String glyph = "\u2714";
contentStream.beginText();
contentStream.setFont(font, fontSize);
contentStream.newLineAtOffset(10, 10); // towards lower left corner of page
contentStream.showText(glyph);
contentStream.endText();
contentStream.close();
document.save("output.pdf");
document.close();
... but this produces a nice Exception:
Exception in thread "main" java.lang.IllegalArgumentException: U+2714 ('a20') is not available in this font's encoding: WinAnsiEncoding
at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:345)
at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:286)
:
Debugging through the code shows that what happens at PDType1Font.java:345 is:
(PDType1Font extends PDSimpleFont)
PDSimpleFont.glyphList correctly contains a mapping from the Unicode code point (U+2714) to a PDF glyph name ("a20"), as shown in the exception text (it is set up in PDSimpleFont's constructor for the ZapfDingbats glyphs).
... but PDSimpleFont.encoding, which is set to WinAnsiEncoding at line 110 of PDType1Font's constructor, does not contain the name a20. These names are set up statically in the WinAnsiEncoding class; see the WIN_ANSI_ENCODING_TABLE constant at line 36.
Has anyone managed to show Dingbats glyphs using PDFBox, even with an older version?
I suspect this is a bug: a20 should be converted to 064 according to "ZapfDingbats Set and Encoding", and I can't find where that is done. Please open an issue in JIRA. In the meantime, here's a workaround if you're using Windows:
instead of
PDType1Font font = PDType1Font.ZAPF_DINGBATS;
use
PDFont font = PDType0Font.load(document, new File("c:/windows/fonts/arialuni.ttf"));
Update: now solved
This was indeed found to be a bug and JIRA issue PDFBOX-3298 addressed this. It is now resolved in PDFBox version 2.0.3.
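The two-step lookup that the question traces through PDSimpleFont can be sketched without PDFBox. This is a simplified model, not PDFBox's actual code: the class name, the map names and the encode helper are hypothetical, and each table holds only one or two entries. It shows why a glyph name missing from the encoding table ends in the exception above, and how a ZapfDingbats table that maps a20 to octal 064 resolves it.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the lookup; NOT PDFBox's implementation.
class EncodingLookupSketch {
    // Step 1: Unicode code point -> glyph name (tiny excerpt).
    static final Map<Integer, String> GLYPH_LIST = new HashMap<>();
    // Step 2: glyph name -> single-byte code, one table per encoding.
    static final Map<String, Integer> WIN_ANSI = new HashMap<>();
    static final Map<String, Integer> ZAPF_DINGBATS = new HashMap<>();

    static {
        GLYPH_LIST.put(0x2714, "a20"); // heavy check mark
        GLYPH_LIST.put(0x0041, "A");
        WIN_ANSI.put("A", 0x41);       // no entry for "a20" here
        ZAPF_DINGBATS.put("a20", 064); // octal 064, as quoted in the answer
    }

    static int encode(int codePoint, Map<String, Integer> encoding, String encodingName) {
        String name = GLYPH_LIST.get(codePoint);
        Integer code = (name == null) ? null : encoding.get(name);
        if (code == null) {
            throw new IllegalArgumentException(String.format(
                "U+%04X ('%s') is not available in this font's encoding: %s",
                codePoint, name, encodingName));
        }
        return code;
    }

    public static void main(String[] args) {
        // Succeeds: the ZapfDingbats table knows the name a20.
        System.out.println(encode(0x2714, ZAPF_DINGBATS, "ZapfDingbatsEncoding"));
        try {
            // Fails: the WinAnsi table has no code for a20.
            encode(0x2714, WIN_ANSI, "WinAnsiEncoding");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the fix amounts to pairing the ZapfDingbats font with its own encoding table instead of the WinAnsi one.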
Related
I have tried many things to write Hindi characters using Apache PDFBox, but this seems to be an existing issue in the library.
I tried many of the available font files without success. Can someone help me out with this?
I tried the following:
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDFont font = PDTrueTypeFont.loadTTF( doc, new FileInputStream(new File("D:\\Data\\fonts\\dn.ttf")));
font.setFontEncoding(new WinAnsiEncoding());
PDPageContentStream content = new PDPageContentStream( doc, page, true, false );
content.setFont(font, 10);
content.beginText();
content.moveTextPositionByAmount( 200, 100 );
content.drawString( "हिंदी" ); // Writing word "Hindi" in hindi language.
content.endText();
content.close();
doc.save( new FileOutputStream(new File("D:\\testOutput1.pdf")));
doc.close();
It's working for me in PDFBox.
The trick here is to use a non-Unicode string instead of a Unicode string.
Use the Kruti Dev font from the link below.
Then convert your Unicode string to the font's non-Unicode (legacy) form.
Finally, use the converted string in your code.
That means replacing this line
content.drawString( "हिंदी" ); // Writing word "Hindi" in hindi language.
with this line:
content.drawString( "fganh" ); // Writing word "Hindi" in hindi language.
Convert Unicode (Mangal) To Kruti Dev Font
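Why the converted string gets through where the Unicode one does not can be checked with the JDK alone. This sketch assumes that the JDK's windows-1252 charset is a close enough stand-in for PDF's WinAnsiEncoding (the two are very similar but not identical); the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class WinAnsiCheckSketch {
    // True if every character of the text has a code in windows-1252.
    static boolean fitsWinAnsi(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // "Hindi" in Devanagari, written with escapes so the source file encoding does not matter.
        String unicodeHindi = "\u0939\u093f\u0902\u0926\u0940";
        // The same word after conversion for the Kruti Dev font: plain Latin letters.
        String krutiDevHindi = "fganh";

        System.out.println("Unicode string fits:   " + fitsWinAnsi(unicodeHindi));
        System.out.println("Converted string fits: " + fitsWinAnsi(krutiDevHindi));
    }
}
```

The converted string contains only Latin letters, which a single-byte encoding can carry; it is the Kruti Dev font that draws Devanagari shapes at those positions.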
I don't think this can be done with PDFBox, as there are a lot of issues with it.
I tried many fonts and the encoding types PDFBox offers, but failed to write Hindi.
In the end I tried pdfmaker() in Node.js Express, which converts HTML to PDF. I had issues on my Linux server at first, but after I installed an appropriate TTF font it worked!
I've located a region of interest in the page by tracking TextPosition objects using PDFTextStripper as shown in the example: https://github.com/apache/pdfbox/blob/trunk/examples/src/main/java/org/apache/pdfbox/examples/util/PrintTextLocations.java
As shown there, the position is read from TextPosition getters such as
text.getXDirAdj(), text.getWidthDirAdj(), text.getYDirAdj(), text.getHeightDir().
Starting from the following example, I kept everything the same except for setting the cropBox of the target page:
https://github.com/apache/pdfbox/blob/2.0.3/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java
OLD CROPBOX: [0.0,0.0,595.276,841.89] -> NEW CROPBOX [50.0,42.0,592.0,642.0].
So how can I use getYDirAdj and getXDirAdj to set the cropBox correctly?
The original pdf file I'm processing can be downloaded from here: http://downloadcenter.samsung.com/content/UM/201504/20150407095631744/ENG-US_NMATSCJ-1.103-0330.pdf
Cropping the page
In a comment the OP reduced his problem to
Ok. Given a java PDRectangle rect = new PDRectangle(40f, 680f, 510f, 100f) obtained from TextLocation how would a java code snippet, that sets the cropBox of a single page look like ? Or how would you do it? TextLocation based rect --> some transformation --> setCropBox(theRightBox).
To set the crop box of the page twelve of the given document to the given PDRectangle you can use code like this:
PDDocument pdDocument = PDDocument.load(resource);
PDPage page = pdDocument.getPage(12-1);
page.setCropBox(new PDRectangle(40f, 680f, 510f, 100f));
pdDocument.save(new File(RESULT_FOLDER, "ENG-US_NMATSCJ-1.103-0330-page12cropped.pdf"));
(SetCropBox.java test method testSetCropBoxENG_US_NMATSCJ_1_103_0330)
Adobe Reader now shows merely this part of page twelve:
Beware, though: the page in question does not only specify a media box (mandatory) and a crop box, it also defines a bleed box and an art box. Thus, applications that consider those boxes more interesting than the crop box might display the page differently. In particular, the art box (defined as "the extent of the page’s meaningful content") might be considered important by some applications.
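As for the "some transformation" step the OP asked about, here is a hedged sketch of the arithmetic. It assumes the region was measured the way the text stripper's DirAdj values are, that is from the top-left corner of the page with y growing downward, while a page box is expressed from the bottom-left corner with y growing upward. The helper name and the sample numbers are hypothetical (61.89 was picked so that an 841.89 pt high page reproduces the rectangle used above), and the sketch ignores page rotation and media boxes that do not start at 0,0.

```java
import java.util.Arrays;

class CropBoxMathSketch {
    /**
     * Converts a region given as (x, top, width, height) in top-left-origin
     * coordinates into {lowerLeftX, lowerLeftY, width, height} in
     * bottom-left-origin page coordinates.
     */
    static float[] toPageBox(float pageHeight, float x, float top, float width, float height) {
        // The lower edge of the region lies (top + height) below the top of the page.
        float lowerLeftY = pageHeight - top - height;
        return new float[] { x, lowerLeftY, width, height };
    }

    public static void main(String[] args) {
        // Hypothetical region on a page that is 841.89 pt high.
        float[] box = toPageBox(841.89f, 40f, 61.89f, 510f, 100f);
        System.out.println(Arrays.toString(box));
    }
}
```

The four values are in the same order as the arguments of the PDRectangle(x, y, width, height) constructor used above. Since getYDirAdj reports the text baseline, the top of a text region would presumably be the y value minus the text height.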
Rendering the cropped page
In a comment to this answer the OP remarked
This is good and works. It correctly saves the page in the PDF file. I've tried to do the same in JPG and failed.
I reduced the OP's code to the essentials
PDDocument pdDocument = PDDocument.load(resource);
PDPage page = pdDocument.getPage(12-1);
page.setCropBox(new PDRectangle(40f, 680f, 510f, 100f));
PDFRenderer renderer = new PDFRenderer(pdDocument);
BufferedImage img = renderer.renderImage(12 - 1, 4f);
ImageIOUtil.writeImage(img, new File(RESULT_FOLDER, "ENG-US_NMATSCJ-1.103-0330-page12cropped.jpg").getAbsolutePath(), 300);
pdDocument.close();
(SetCropBox.java test method testSetCropBoxImgENG_US_NMATSCJ_1_103_0330)
The result:
Thus, I cannot reproduce an issue here.
Possible details to check for:
ImageIOUtil is not part of the main PDFBox artifact, instead it is located in pdfbox-tools; does the version of that artifact match the version of the core pdfbox artifact?
I run the code in an Oracle Java 8 environment; other Java environments might give rise to different results.
There are minor differences in our implementations. E.g. I load the PDF via an InputStream, you directly from file system, I have hardcoded the page number, you have it in some variable, ... None of these differences should cause your problem, but who knows...
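One further sanity check is the size of the rendered image. In PDFBox 2.x the scale argument of renderImage is relative to 72 DPI, so the scale of 4 used above corresponds to 288 DPI, while the 300 passed to ImageIOUtil.writeImage is documented as the resolution to store in the image metadata. A minimal sketch of the arithmetic follows; the helper names are hypothetical, and it assumes the renderer sizes the image from the crop box and rounds down.

```java
class RenderSizeSketch {
    // Expected pixel count for a length in points at the given scale (1 = 72 DPI).
    static int pixels(float points, float scale) {
        return (int) Math.max(Math.floor(points * scale), 1);
    }

    // Effective resolution for the given scale.
    static float dpi(float scale) {
        return 72f * scale;
    }

    public static void main(String[] args) {
        float scale = 4f;
        // Crop box from the snippet above: 510 pt wide, 100 pt high.
        System.out.println(pixels(510f, scale) + " x " + pixels(100f, scale)
                + " px at " + dpi(scale) + " DPI");
    }
}
```

If the image written to disk has roughly these dimensions, the crop box was honoured by the renderer.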
I'm trying to create a PDF with PDFBOX-2.0.0-SNAPSHOT, but I'm running into errors.
This is the typical Hello World example with Spanish and French characters:
PDDocument document = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
document.addPage(page);
PDType1Font font = PDType1Font.HELVETICA;
PDPageContentStream stream = new PDPageContentStream(document, page);
String text = "áÁÀà";
stream.beginText();
stream.setFont(font, 12);
stream.newLineAtOffset(100, 700);
stream.showText(text);
stream.endText();
stream.close();
document.save("sample.pdf");
document.close();
And I get this error:
sep 02, 2015 12:42:43 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
ADVERTENCIA: Using fallback font ArialMT for base font ZapfDingbats
Exception in thread "main" java.lang.IllegalArgumentException: This font type only supports 8-bit code points
If I load the arialuni.ttf font, the code runs, but I only get question marks in the PDF file.
I have tried PDFBox 1.8 and it doesn't work either.
Any ideas?
Thanks in advance.
UPDATE:
After some tests I realized that if you change the encoding of the project (at least in IntelliJ IDEA) and don't retype the problematic characters in the code, the new encoding doesn't take effect.
The PDType1Font.XXX constants are fonts that are provided by the PDF viewer itself, and they don't support Unicode. You should be able to use a TTF font instead, as in: https://github.com/apache/pdfbox/blob/trunk/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/EmbeddedFonts.java
PDType0Font font = PDType0Font.load(document, new File("path/YourFont.ttf"));
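The source-encoding effect described in the question's update can be reproduced with the JDK alone. This sketch (class and method names are hypothetical) simulates a source file that was saved as UTF-8 but is read as ISO-8859-1: each accented letter turns into two wrong characters. Writing the literal with Unicode escapes is one way to make it independent of the project encoding.

```java
import java.nio.charset.StandardCharsets;

class SourceEncodingSketch {
    // Simulates reading UTF-8 bytes with the wrong (single-byte) charset.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The four letters from the question, written as escapes so that
        // the encoding of this source file cannot change them.
        String text = "\u00e1\u00c1\u00c0\u00e0";
        String garbled = misread(text);

        System.out.println("intended length: " + text.length());
        System.out.println("misread length:  " + garbled.length());
    }
}
```

Printing the code points of the string just before calling showText is a quick way to see whether the text that reaches PDFBox is the text that was typed.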
I'm having problems with character encodings with DynamicReports in Jasper Reports. I don't know where you should indicate the encoding. There are problems with accented characters. I have tried:
exporter.setParameter(JRExporterParameter.CHARACTER_ENCODING, "UTF-8"); //CP1252
exporter.setParameter(JRPdfExporterParameter.CHARACTER_ENCODING, "UTF-8");
The screen capture linked to below shows that the characters are shown correctly in my code but not in the report. How can I set the encoding in the report correctly?
I had the same problem today; here is my solution.
My problem was not about encoding, it was about the font.
DynamicReports creates the PDF document with the Helvetica font.
When I changed fontName to "DejaVu Serif", the problem was solved.
StyleBuilder myStyle= stl.style().setPadding(2);
myStyle.setFontName("DejaVu Serif");
TextColumnBuilder<Double> weightCol = col.column("Ağırlığı", "weight", type.doubleType());
weightCol.setStyle(myStyle);
I am using itext and ColdFusion (java) to write text strings to a PDF document. I have both trueType and openType fonts that I need to use. Truetype fonts seem to be working correctly, but the kerning is not being used for any font file ending in .otf. The code below writes "Line 1 of Text" in Airstream (OpenType) but the kerning between "T" and "e" is missing. When the same font is used in other programs, it has kerning. I downloaded a newer version of itext also, but the kerning still did not work. Does anyone know how to get kerning to work with otf fonts in itext?
<cfscript>
pdfContentByte = createObject("java","com.lowagie.text.pdf.PdfContentByte");
BaseFont= createObject("java","com.lowagie.text.pdf.BaseFont");
bf = BaseFont.createFont("c:\windows\fonts\AirstreamITCStd.otf", "" , BaseFont.EMBEDDED);
document = createobject("java","com.lowagie.text.Document").init();
fileOutput = createObject("java","java.io.FileOutputStream").init("c:\inetpub\test.pdf");
writer = createobject("java","com.lowagie.text.pdf.PdfWriter").getInstance(document,fileOutput);
document.open();
cb = writer.getDirectContent();
cb.beginText();
cb.setFontAndSize(bf, 72);
cb.showTextAlignedKerned(PdfContentByte.ALIGN_LEFT,"Line 1 of Text",0,72,0);
cb.endText();
document.close();
bf.hasKernPairs(); //returns NO
bf.getClass().getName(); //returns "com.lowagie.text.pdf.TrueTypeFont"
</cfscript>
According to the OpenType spec: http://www.microsoft.com/typography/otspec/kern.htm
OpenType™ fonts containing CFF outlines are not supported by the 'kern' table and must use the 'GPOS' OpenType Layout table.
I checked the source: the iText implementation only reads the 'kern' table for TrueType fonts and does not read the GPOS table at all, so the internal kerning data stays empty and hasKernPairs returns false.
So there are a few ways to solve this:
get rid of the OTF font you are using :)
patch TrueTypeFont so that it reads the GPOS table
wait for me: I'm working on processing CFF content; PDF is only a side interest of mine :) but I don't exclude the possibility :)
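Which of these tables a given font file actually contains can be checked by listing its table directory. The following sketch uses only the JDK and the sfnt header layout from the OpenType specification (a 12-byte header followed by 16-byte table records). The class and method names are hypothetical, and font collections (.ttc) are not handled.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

class FontTableSketch {
    // Reads the table directory of a single-font sfnt file and returns the table tags.
    static Set<String> tableTags(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        data.readInt();                       // sfnt version: 0x00010000 or 'OTTO'
        int numTables = data.readUnsignedShort();
        data.readFully(new byte[6]);          // searchRange, entrySelector, rangeShift

        Set<String> tags = new LinkedHashSet<>();
        byte[] tag = new byte[4];
        byte[] rest = new byte[12];           // checksum, offset, length
        for (int i = 0; i < numTables; i++) {
            data.readFully(tag);
            tags.add(new String(tag, StandardCharsets.US_ASCII));
            data.readFully(rest);
        }
        return tags;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java FontTableSketch <font file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Set<String> tags = tableTags(in);
            System.out.println("CFF outlines: " + tags.contains("CFF "));
            System.out.println("kern table:   " + tags.contains("kern"));
            System.out.println("GPOS table:   " + tags.contains("GPOS"));
        }
    }
}
```

A font that reports CFF outlines and GPOS but no kern table would match the situation described above, where hasKernPairs returns false.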
Have a look at this thread on how to use OpenType fonts in Java.
It states that OTF is not supported by Java (not even with iText); OTF support depends on the SDK version and the OS.
Alternatively, you could use FontForge to convert the OTF font to TTF.