Character encoding errors with DynamicReport

I'm having trouble with character encoding in DynamicReports (on top of JasperReports): accented characters come out wrong in the report, and I don't know where the encoding is supposed to be set. I have tried:
exporter.setParameter(JRExporterParameter.CHARACTER_ENCODING, "UTF-8"); //CP1252
exporter.setParameter(JRPdfExporterParameter.CHARACTER_ENCODING, "UTF-8");
The characters are shown correctly in my code but not in the report. How can I set the encoding in the report correctly?

I had the same problem today; here is how I solved it.
In my case the problem was not the encoding but the font.
DynamicReports creates the PDF document with the Helvetica font.
Changing the font name to "DejaVu Serif" solved the problem:
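import static net.sf.dynamicreports.report.builder.DynamicReports.*; // provides stl, col, type
// Style that uses a font containing the glyphs needed by the column title.
StyleBuilder myStyle = stl.style().setPadding(2);
myStyle.setFontName("DejaVu Serif");
TextColumnBuilder<Double> weightCol = col.column("Ağırlığı", "weight", type.doubleType());
weightCol.setStyle(myStyle);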
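Why a font change fixes what looks like an encoding problem can be checked with the JDK alone. The sketch below assumes that the default Helvetica font is used with a single-byte Western encoding equivalent to windows-1252 (an assumption about the defaults, not something stated in the answer) and lists the characters of the column title that such an encoding cannot represent. EncodableCheck and unencodable are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.LinkedHashSet;
import java.util.Set;

class EncodableCheck {

    /**
     * Returns the distinct characters of text that the named charset cannot
     * encode, in order of first appearance. Checks one char at a time, so it
     * is only meaningful for characters in the Basic Multilingual Plane.
     */
    static Set<Character> unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        Set<Character> missing = new LinkedHashSet<>();
        for (char c : text.toCharArray()) {
            if (!encoder.canEncode(c)) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The column title from the answer, written with Unicode escapes so the
        // result does not depend on the encoding of this source file.
        String title = "A\u011F\u0131rl\u0131\u011F\u0131";

        // Assumed stand-in for the encoding used with the default font.
        System.out.println(unencodable(title, "windows-1252"));

        // UTF-8 can encode every character, so nothing is reported.
        System.out.println(unencodable(title, "UTF-8"));
    }
}
```

For this title the first call reports ğ and ı, and the second reports nothing. That suggests why setting CHARACTER_ENCODING to UTF-8 is not enough on its own: the text is fine, but the font and its encoding also have to cover the letters.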

Related

Java iText7 UTF8 Characters [duplicate]

I am trying to create a PDF with Greek characters using iText 7 for Java.
Only Latin characters and digits are visible in the PDF.
I am loading the font with this code:
PdfFont normalFont = PdfFontFactory.createFont(FontConstants.HELVETICA, "CP1253");
What should I do?

This is the solution:
PdfFont normalFont = PdfFontFactory.createFont("C:\\Windows\\Fonts\\arial.ttf", "Identity-H", true);
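You can use any font that contains glyphs for your language; the final argument (true) asks iText to embed it. Identity-H also seems to be important as the encoding: the text is then written as two-byte codes for the embedded font instead of going through a single-byte code page such as CP1253.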

OpenHtmlToPdf cyrillic symbols displaying

I have a problem displaying Cyrillic symbols. My HTML contains Cyrillic text, but after conversion every such character is displayed as # (### instead of the symbols). I'm using the library like this:
var document = Jsoup.parse(new ByteArrayInputStream(resultHtml), "UTF-8", "/");
ByteArrayOutputStream os = new ByteArrayOutputStream();
try (os) {
    var temp = new W3CDom().fromJsoup(document);
    PdfRendererBuilder builder = new PdfRendererBuilder();
    builder.toStream(os);
    builder.useFont(new File("/resources/fonts/times.ttf"), "Times");
    builder.withW3cDocument(temp, null);
    builder.run();
}
return os;
The resultHtml is an HTML string and it is fine: with the iText 7 library the same input gives a PDF with the correct symbols. iText 7 is not free, though; I mention it only to narrow down the possible causes, so I assume the problem is in how I use this library. I don't have any resources related to the HTML, which is why the base URI is / and null. The library gives me two warnings, but I don't think they are the cause, because it says it is ignoring the declarations:
com.openhtmltopdf.css-parse WARNING:: (null#inline_style_1) so-language is an unrecognized CSS property at line 21. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: (null#inline_style_1) so-language is an unrecognized CSS property at line 32. Ignoring declaration.
I checked in the debugger: document is fine, because I can see the generated HTML with the Cyrillic symbols intact, but temp shows up as [#document:null]. I have read that this does not mean the document is null, but maybe that is the problem? I tried other charsets such as CP1251 and CP1252, but they give strange symbols too. At first I tried all the charsets without declaring a font, because the only font in use is Times New Roman and I assumed it was the default; then I added the font to the resources and declared it in code, but that did not help. I'm using version 1.0.10 of the library and version 1.14.3 of jsoup.
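The "strange symbols" seen with CP1251 and CP1252 are what one would expect if the bytes are UTF-8 and are decoded with a different charset. The JDK-only sketch below shows this with a short Cyrillic word; CharsetRoundTrip and decodeUtf8BytesAs are hypothetical names, and the sample word is an assumption standing in for the real HTML.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {

    /** Encodes text as UTF-8 and decodes the resulting bytes with the given charset. */
    static String decodeUtf8BytesAs(String text, Charset charset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, charset);
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written as escapes to keep this file ASCII-only.
        String cyrillic = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // Decoding with the charset the bytes were written in restores the text.
        String sameCharset = decodeUtf8BytesAs(cyrillic, StandardCharsets.UTF_8);
        System.out.println(cyrillic.equals(sameCharset)); // true

        // Decoding the same bytes as windows-1251 turns each letter into two
        // unrelated characters.
        String wrongCharset = decodeUtf8BytesAs(cyrillic, Charset.forName("windows-1251"));
        System.out.println(cyrillic.equals(wrongCharset)); // false
        System.out.println(wrongCharset.length());         // 12
    }
}
```

If resultHtml really is UTF-8, the "UTF-8" argument to Jsoup.parse is already the right one, and the ### output more plausibly comes from the font lookup, for example the file path or the family name passed to useFont. That is a guess; the question does not confirm it.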

Copy formatted text between PDF documents, empty unicode mapping for font

I'm using PDFBox 2.0.8 to extract PDF content, convert it to JSON, and then build a new document from that JSON (to strip possible vulnerabilities). I've extended the PDFTextStripper class to get the font information:
PDFont font = textPosition.getFont(); // the embedded font
Now I'm trying to write the same extracted character, with its font, to a new PDF document:
contentStream.setFont(font, 16);
contentStream.showText(text);
and the second line throws java.lang.IllegalArgumentException: No glyph for U+004A in font HLOXAY+Birka-SemiBoldItalic.
The text I want to write is "John Whitington" from the third page of the book "PDF Explained".
I've already read that this happens because the font doesn't have a Unicode mapping. But as I understand it, if this text is displayed correctly in all readers, there should be a way to copy it to another PDF.
I just want to copy the text and its font information in full between documents.
Sorry if this duplicates any question here, but after a few days of searching, I still can't find an acceptable solution. Thanks in advance for any help.

Drawing glyph from ZapfDingbats using PDFbox

I am trying to draw a check mark (U+2714, found in the standard PDF font ZapfDingbats) in my PDF document. I'm new to Apache PDFBox and am using version 2.0.0 at the moment (for no specific reason other than that it's the newest).
My code looks as follows:
PDDocument document = PDDocument.load(new File("myfile.pdf"));
PDPage page = document.getPages().get(0); // first page
PDPageContentStream contentStream = new PDPageContentStream(document, page, AppendMode.APPEND, true);
PDType1Font font = PDType1Font.ZAPF_DINGBATS;
String glyph = "\u2714";
contentStream.beginText();
contentStream.setFont(font, fontSize);
contentStream.newLineAtOffset(10, 10); // towards lower left corner of page
contentStream.showText(glyph);
contentStream.endText();
contentStream.close();
document.save("output.pdf");
document.close();
... but this produces a nice Exception:
Exception in thread "main" java.lang.IllegalArgumentException: U+2714 ('a20') is not available in this font's encoding: WinAnsiEncoding
at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:345)
at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:286)
:
Debugging through the code shows that what happens at PDType1Font.java:345 is:
(PDType1Font extends PDSimpleFont)
PDSimpleFont.glyphList correctly contains a mapping from the Unicode codepoint (U+2714) to a PDF name ("a20") as shown in the Exception text (set up in PDSimpleFont's constructor for the ZapfDingbat glyphs).
... but the PDSimpleFont.encoding, which is set to WinAnsiEncoding in PDType1Font's constructor line 110, does not contain the name a20 - these names (encodings) are set up statically in the WinAnsiEncoding class - see the WIN_ANSI_ENCODING_TABLE constant at line 36.
Has anyone managed to show ZapfDingbats glyphs using PDFBox, even in an older version?

I suspect it is a bug: according to "ZapfDingbats Set and Encoding", a20 should be converted to the character code 064 (octal), and I can't find where this is being done. Please open an issue in JIRA. In the meantime, here's a workaround if you're using Windows:
instead of
PDType1Font font = PDType1Font.ZAPF_DINGBATS;
use
PDFont font = PDType0Font.load(document, new File("c:/windows/fonts/arialuni.ttf"));
Update: now solved
This was indeed found to be a bug and JIRA issue PDFBOX-3298 addressed this. It is now resolved in PDFBox version 2.0.3.
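The code 064 mentioned in the answer is octal. As a small worked example (JDK only; DingbatCode and fromOctal are hypothetical names), the sketch below converts it to the single byte value that would have to be written for glyph a20:

```java
class DingbatCode {

    /** Parses an octal character code such as "064", as listed in font encoding tables. */
    static int fromOctal(String octal) {
        return Integer.parseInt(octal, 8);
    }

    public static void main(String[] args) {
        int code = fromOctal("064");                   // code the answer gives for glyph a20
        System.out.println(code);                      // 52
        System.out.println(Integer.toHexString(code)); // 34
        System.out.println((char) code);               // 4
    }
}
```

So the check mark sits at byte 0x34, the position the digit 4 occupies in ASCII. WinAnsiEncoding has no entry named a20, so a name lookup there finds nothing, which matches the exception in the question.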

Adding support to the new Rupee symbol to iTextPDF in Java

My project is a billing application and I am using the iTextPdf library for PDF generation. My requirement is to display the new Rupee symbol in the generated PDF instead of "Rs.".
I know that \u20B9 (U+20B9) is the Unicode code point of the new Rupee symbol. I am using the following code for formatting:
String formater(String a) {
    DecimalFormat formatter = new DecimalFormat("\u20B9 000");
    return formatter.format(Double.parseDouble(a));
}
But the generated PDF file does not show any Rupee symbol. So how can I use that with the iTextPdf library? Is there any additional font required to be merged with the library itself?
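Before looking at fonts, it can help to confirm that the formatting code is not what drops the symbol. The sketch below uses only the JDK: it runs the pattern from the question and inspects the result. RupeeFormat is a hypothetical name, and the Locale.ROOT symbols are an assumption added here so that the digits do not depend on the machine's default locale.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class RupeeFormat {

    /** Formats amount with the pattern from the question, using locale-independent symbols. */
    static String format(String amount) {
        DecimalFormat formatter =
                new DecimalFormat("\u20B9 000", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return formatter.format(Double.parseDouble(amount));
    }

    public static void main(String[] args) {
        String formatted = format("42");

        // Symbol, space, and three digits: five characters in total.
        System.out.println(formatted.length());                       // 5

        // Print the first character as a hex code point so the check does not
        // depend on what the console can display.
        System.out.println(Integer.toHexString(formatted.charAt(0))); // 20b9
    }
}
```

Since the string already starts with U+20B9, the symbol is most likely lost later, when the text is drawn with a font that has no glyph for it.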

Changing settings in the IDE is not the issue.
iText writes content into the PDF using a particular character set, and that determines whether the data is shown properly.
You can try these two links:
An SO question on how to check the character set
The iTextPdf site on how to correct the character set

Thanks to mkl and Naveen for the help.
I hope this helps someone. This is what I did:
Step 1: Obtained a font that contains the Rupee symbol (I ran Windows Update, which gave me a version of Arial with the Rupee symbol).
Step 2: Using iText, created an embedded base font with IDENTITY_H encoding:
BaseFont baseFont = BaseFont.createFont(this.getClass().getResource("arial.ttf").toString(), BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(baseFont);
The generated PDF now shows the new Rupee symbol.
