Surrogate Chinese Characters Not Displaying iText Java

Surrogate Chinese Characters Not Displaying iText Java - java

Using iText 5.5.11 from the maven repo https://mvnrepository.com/artifact/com.itextpdf/itextpdf/5.5.11
public class test {
public static void main(String[] args) throws DocumentException, IOException {
final String text = "BMP: \u6d4b \u8bd5 Surrogate: \uD841\uDF0E \uD841\uDF31 \uD859\uDC02";
BaseFont baseFont = BaseFont.createFont("C:\\Windows\\Fonts\\arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(baseFont, 6.8f);
Document doc = new Document();
PdfWriter.getInstance(doc, new FileOutputStream("out.pdf"));
doc.open();
Paragraph p = new Paragraph();
p.add(new Phrase(text, font));
doc.add(p);
doc.close();
}
}
The non-surrogate characters in the basic multilingual plane are rendered on the resulting pdf, but the surrogate characters are not.
Edit: Also tried with font "STSong-Light" with encoding "UniGB-UCS2-H" (as in examples in book). Same result - surrogate characters missing.
Edit2: Got it to work with "SimSun-ExtB" font

This is usually a sign that the font being used (in this case Arial) does not have the glyphs for your characters.

Related

iText- Appending arabic text in pdf table cell phrase at different positions in a page

I want to make a pdf report with English and Arabic texts. I have many tables/phrases across the page. I want to display Arabic text also along with English. I have seen the Arabic example in iText doxument also, using ColumnText. I couldn't help myself with that. My doubt is how to set canvas.setSimpleColumn(36, 750, 559, 780), the arguments in this method for tables/phrases at different positions. I have referred below questions also.Still I have issues.
Writing Arabic in pdf using itext,
http://developers.itextpdf.com/examples/font-examples/language-specific-exampleshe
Below is my code..
private static final String ARABIC = "\u0627\u0644\u0633\u0639\u0631 \u0627\u0644\u0627\u062c\u0645\u0627\u0644\u064a";
private static final String FONT = "resources/fonts/ARIALUNI.TTF";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("test.pdf"));
document.open();
Font f = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
PdfPTable table = new PdfPTable(3);
Phrase phrase = new Phrase();
Chunk chunk = new Chunk("test value", inlineFont);
phrase.add(chunk);
// I want to add Arabic text also here..but direction is not //correct.also coming as single alphabets
p.add(new Chunk(ARABIC, f));
PdfPCell cell1 = new PdfPCell(phrase);
cell1.setFixedHeight(50f);
table.addCell(cell1);
document.add(table);
document.close();

Your code is kind of sloppy.
For example:
you define a PdfPTable with 3 columns, but you only add a single cell. That table will never be rendered.
you define a Phrase with name phrase, but later in your code you use p.add(...). There is no variable with name p in your code.
...
This lack of respect for the StackOverflow reader can result in not getting an answer, because you are expecting the reader not only to fix the actual problem –not being able to use English and Arabic text in a single PdfPCell—, but also to fix all the other (avoidable) errors in your code.
This is a working example:
public static final String FONT = "resources/fonts/NotoNaskhArabic-Regular.ttf";
public static final String ARABIC = "\u0627\u0644\u0633\u0639\u0631 \u0627\u0644\u0627\u062c\u0645\u0627\u0644\u064a";
public void createPdf(String dest) throws IOException, DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();
Font f = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
PdfPTable table = new PdfPTable(1);
Phrase phrase = new Phrase();
Chunk chunk = new Chunk("test value");
phrase.add(chunk);
phrase.add(new Chunk(ARABIC, f));
PdfPCell cell = new PdfPCell(phrase);
cell.setUseDescender(true);
cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
table.addCell(cell);
document.add(table);
document.close();
}
The result looks like this:
As you can see, both the English and the Arabic text can be read fine. You may be surprised by the alignment and the order of the text. As we are working in the Right-to-Left writing system, left and right are switched. By default, text is left aligned, but as soon as we introduce the RTL run direction, this changes to right aligned.
In your code, you add the English text first, followed by the Arabic text. Text in Arabic is read from right to left. That's why you see the English text to the right, and why the Arabic text is added to the left of the English text.
All of this has been improved in iText 7. iText 7 has an extra pdfCalligraph module that takes care of other writing systems in a transparent way.

Can't export Vietnamese characters to PDF using iText

I'm trying to export Vietnamese characters to PDF using iText. I tried to use
BaseFont bf = BaseFont.createFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
It displays some unicode characters correctly, for example Russian, but not the ones with accents in Vietnamese language (ạ,ã,ố etc.).
Here's the class I wrote:
public class PDFMaker {
private final static String FILE = "FilePdf.pdf";
public static File fontFile = new File("fonts/arialuni.ttf");
public static void makePDF() throws IOException{
try{
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(FILE));
BaseFont bf = BaseFont.createFont(fontFile.getAbsolutePath(), BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(bf,15);
document.open();
document.add(new Paragraph("Đại học bách khoa Hà Nội", font));
document.close();
} catch (FileNotFoundException | DocumentException e) {
e.printStackTrace(System.out);
}}
It displays: Đi hc bách khoa Hà Ni. Please help.

The characters aren't shown, because MS Arial Unicode doesn't know those characters. You need to use another font. For instance: I downloaded a package of Vietnames fonts from SourceForge and I replaced arialuni.ttf in your code sample with vuArial.ttf (found in the downloaded package). When using that font, all the characters were visible.

Unicode characters in iText PDF

I need help with iText I look at some Google result and some here but don't find anything that work for me. I need to use polish character in my pdf but i got nothing for no. Here is a code that I think is important if need something else write in comment:
private static Font bigFont = new Font(Font.FontFamily.HELVETICA, 18, Font.BOLD);
another
Paragraph par = new Paragraph(Łabadzak, bigFont);
Can any1 tell me what to do to make that Ł visible in pdf and other polish character
UPDATE
I fund this but dunno how to use it for my project
Polish character in itext PDF

You need a unicode font. Here is an example:
BaseFont bf = BaseFont.createFont("arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Paragraph p = new Paragraph("Şinasi ıssız ile ağaç", new Font(bf, 22));
document.add(p);
http://abdullahakay.blogspot.com/2011/11/java-itext-unicode.html
EDIT:
Here, the font file name arialuni.tff is a static resource directly under /src/main/resources/ and can be any Unicode Font File of your choice. Here's a list of free Unicode Font Files available online.

It depends on used font and encoding. i found something like this:
http://itext-general.2136553.n4.nabble.com/Polish-National-Characters-are-not-getting-displayed-in-the-PDF-created-by-iTExt-td2163833.html
There is example like this:
BaseFont bf = BaseFont.createFont("c:/windows/fonts/arial.ttf",
BaseFont.CP1250, BaseFont.EMBEDDED);
Font font = new Font(bf, 12);
String polish = "\u0104\u0105\u0106\u0107\u0118\u0119";
document.add(new Paragraph(polish, font));
Remember that some fonts does not contain polish national characters.

In case you are using com.itextpdf.kernel package you can use any encoding which is not present in PdfEncodings class
PdfWriter writer ;
writer = new PdfWriter( dest ) ;
PdfDocument pdf = new PdfDocument( writer ) ;
Document document = new Document( pdf ) ;
FontProgram fontProgram = FontProgramFactory.createFont( ) ;
PdfFont font = PdfFontFactory.createFont( fontProgram, "Cp1254" ) ;
document.setFont( font );
for turkish characters I used "Cp1254"

iText doesn't like my special characters

I'm trying to generate a pdf file using iText.
The file gets produced just fine, but I can seem to use special characters like german ä, ö, ...
The sentence I want to be written is (for example)
■ ...ä...ö...
but the output is
â– ...Ã¤...Ã¶...
(I had to kind of blur the sentences, but I guess you see what I'm talking about...)
Somehow this black block-thing and all "Umlaute" can't be generated ...
The font used is the following:
private static Font smallBold = new Font(Font.FontFamily.TIMES_ROMAN, 12,
Font.BOLD);
So there should be no problem about the font not having these characters...
I'm using IntelliJ Idea to develop, the encoding of the .java file is set to UTF-8, so there should be no problem too...
I'm kind of lost here; does anyone know what i may do to get it working?
Thanks in advance and greetz
gilaras
---------------UPDATE---------------
So here's (part of) the code:
#Controller
public class Generator {
...
Font font = new Font(Font.FontFamily.TIMES_ROMAN, 9f, Font.BOLD);
...
Paragraph intro = new Paragraph("Ich interessiere mich für ...!", font_12_bold);
Paragraph wantContact = new Paragraph("■ Ich hätte gerne ... ", font);
...
Phrase south = new Phrase("■ Süden □ Ost-West ...");
...
#RequestMapping(value = "/generatePdf", method = RequestMethod.POST)
#ResponseBody
public String generatePdf(HttpServletRequest request) throws IOException, DocumentException, com.lowagie.text.DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(FILE));
addMetaData(document);
document.open();
addContent(document, request);
document.add(new Paragraph("äöü"));
document.close();
return "";
}
private void addContent(Document document, HttpServletRequest request)
throws DocumentException {
Paragraph preface = new Paragraph();
preface.setAlignment(Element.ALIGN_JUSTIFIED);
addEmptyLine(preface, 1);
preface.add(new Paragraph("Rückantwort", catFont));
addEmptyLine(preface, 2);
preface.add(intro);
addEmptyLine(preface, 1);
if (request.getParameter("dec1").equals("wantContact")) {
preface.add(wantContact);
} else {
...
}
document.add(preface);
}
private static void addEmptyLine(Paragraph paragraph, int number) {
for (int i = 0; i < number; i++) {
paragraph.add(new Paragraph(" "));
}
}
private static void addMetaData(Document document) {
document.addTitle("...");
document.addSubject("...");
document.addKeywords("...");
document.addAuthor("...");
document.addCreator("...");
}
}
I had to take some things out, but I kept some Umlaut-character and other special characters, so that you can see, where the problem occurs ... :-)

You might want to try and embed the font using this technique:
BaseFont times = BaseFont.createFont(BaseFont.TIMES_ROMAN, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(times, 12, Font.BOLD);

How to create a PDF document from languages of Unicode char set regarding using third party Fonts

I'm using PDFBox and iText to create a simple (just paragraphs) pdf document from various languages. Something like :
pdfBox:
private static void createPdfBoxDocument(File from, File to) {
PDDocument document = null;
try {
document = new TextToPDF().createPDFFromText(new FileReader(from));
document.save(new FileOutputStream(to));
} finally {
if (document != null)
document.close();
}
}
private void createPdfBoxDoc() throws IOException, FileNotFoundException, COSVisitorException {
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page);
PDType1Font font = PDType1Font.TIMES_ROMAN;
contentStream.setFont(font, 12);
contentStream.beginText();
contentStream.moveTextPositionByAmount(100, 400);
contentStream.drawString("š");
contentStream.endText();
contentStream.close();
document.save("test.pdf");
document.close();
}
itext:
private static Font blackFont = new Font(Font.FontFamily.COURIER, 12, Font.NORMAL, BaseColor.BLACK);
private static void createITextDocument(File from, File to) {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(to));
document.open();
addContent(document, getParagraphs(from));
document.close();
}
private static void addContent(Document document, List<String> paragraphs) {
for (int i = 0; i < paragraphs.size(); i++) {
document.add(new Paragraph(paragraphs.get(i), blackFont));
}
}
The input files are encoded in UTF-8 and some languages of Unicode char set, like Russian alphabet etc., are not rendered properly in pdf. The Fonts in both libraries don't support Unicode charset I suppose and I can't find any documentation on how to add and use third party fonts. Could please anybody help me out with an example ?

If you are using iText, it has quite good support.
In iText in Action (chapter 2.2.2) you can read more.
You have to download some unicode Fonts like arialuni.ttf and do it like this :
public static File fontFile = new File("fonts/arialuni.ttf");
public static void createITextDocument(File from, File to) throws DocumentException, IOException {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(to));
document.open();
writer.getAcroForm().setNeedAppearances(true);
BaseFont unicode = BaseFont.createFont(fontFile.getAbsolutePath(), BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
FontSelector fs = new FontSelector();
fs.addFont(new Font(unicode));
addContent(document, getParagraphs(from), fs);
document.close();
}
private static void addContent(Document document, List<String> paragraphs, FontSelector fs) throws DocumentException {
for (int i = 0; i < paragraphs.size(); i++) {
Phrase phrase = fs.process(paragraphs.get(i));
document.add(new Paragraph(phrase));
}
}
arialuni.ttf fonts work for me, so far I checked it support for
BG, ES, CS, DA, DE, ET, EL, EN, FR, IT, LV, LT, HU, MT, NL, PL, PT, RO, SK, SL, FI, SV
and only PDF in Romanian language wasn't created properly...
With PDFBox it's almost the same:
private void createPdfBoxDoc() throws IOException, FileNotFoundException, COSVisitorException {
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page);
PDFont font = PDTrueTypeFont.loadTTF(document, "fonts/arialuni.ttf");
contentStream.setFont(font, 12);
contentStream.beginText();
contentStream.moveTextPositionByAmount(100, 400);
contentStream.drawString("š");
contentStream.endText();
contentStream.close();
document.save("test.pdf");
document.close();
}
However as Gagravarr says, it doesn't work because of this issue PDFBOX-903 . Even with 1.6.0-SNAPSHOT version. Maybe trunk will work. I suggest you to use iText. It works there perfectly.

You may find this answer helpful - it confirms that you can't do what you need with one of the standard type 1 fonts, as they're Latin1 only
In theory, you just need to embed a suitable font into the document, which handles all your codepoints, and use that. However, there's at least one open bug with writing unicode strings, so there's a chance it might not work just yet... Try the latest pdfbox from svn trunk too though to see if it helps!

In my project, I just copied the font that supported UTF8 (or whatever language you want) to a directory (or you can used Windows fonts path) and add some code, it looked like this
BaseFont baseFont = BaseFont.createFont("c:\\a.ttf", BaseFont.IDENTITY_H,true);
Font font = new Font(baseFont);
document.add(new Paragraph("Not English Text",font));
Now, you can use this font to show your text in various languages.

//use this code.Sometimes setfont() willnot work with Paragraph
try
{
FileOutputStream out=new FileOutputStream(name);
Document doc=new Document();
PdfWriter.getInstance(doc, out);
doc.open();
Font f=new Font(FontFamily.TIMES_ROMAN,50.0f,Font.UNDERLINE,BaseColor.RED);
Paragraph p=new Paragraph("New PdF",f);
p.setAlignment(Paragraph.ALIGN_CENTER);
doc.add(p);
doc.close();
}
catch(Exception e)
{
System.out.println(e);
}
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Surrogate Chinese Characters Not Displaying iText Java - java

This is usually a sign that the font being used (in this case Arial) does not have the glyphs for your characters.

Related

iText- Appending arabic text in pdf table cell phrase at different positions in a page

Can't export Vietnamese characters to PDF using iText

Unicode characters in iText PDF

iText doesn't like my special characters

How to create a PDF document from languages of Unicode char set regarding using third party Fonts

Categories

Resources