How To Display Arabic Text in PDFBox - java

I want to create a report in Arabic using PDFBox. I have seen several solutions on Stack Overflow, but none of them solved my problem.
The Arabic words come out with their characters in reverse order. How can I fix this? If you have an example, please share it. Here is my code:
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
// TODO Auto-generated method stub
String relativeWebPath = "/font/arial.ttf";
String absoluteDiskPath = getServletContext().getRealPath(relativeWebPath);
File file = new File(absoluteDiskPath);
System.out.print(file);
ByteArrayOutputStream output=new ByteArrayOutputStream();
PDDocument document=new PDDocument();
PDFont font = PDType0Font.load(document, new File(absoluteDiskPath));
PDPage test=new PDPage();
document.addPage(test);
PDPageContentStream content=new PDPageContentStream(document, test);
final String EXAMPLE = "النص العربي";
System.out.print(EXAMPLE);
content.beginText();
content.newLineAtOffset(50, 680);
content.setFont(font, 12);
content.showText(EXAMPLE);
System.out.print(EXAMPLE);
content.endText();
content.close();
PDFTextStripper textStripper = new PDFTextStripper();
String Text = textStripper.getText(document);
System.out.print(Text);
document.save(output);
document.close();
response.setContentType("application/pdf");
response.addHeader("Content-Disposition", "inline; filename=\"TestReport.pdf\"");
response.getCharacterEncoding();
response.getOutputStream().write(output.toByteArray());
}

PDFBox relies on the ICU4J library from the International Components for Unicode (ICU) project to support bidirectional languages such as Arabic in PDF documents. You need to add the ICU4J jar to your project.
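To illustrate what "reverse order" means here, the following is a minimal sketch of converting a string from logical order (how it is stored) to visual order (how it is drawn left to right) before passing it to showText. It deliberately uses the JDK's java.text.Bidi rather than ICU4J so that it runs without extra jars. The class and method names are made up for illustration. It does not perform Arabic letter shaping (joined forms), for which something like ICU4J's ArabicShaping class would still be needed, and plain reversal can misplace combining marks.

```java
import java.text.Bidi;

class VisualOrderSketch {

    /** Returns the text in left-to-right visual order (no letter shaping). */
    public static String toVisualOrder(String logical) {
        // Base direction is taken from the first strong character, defaulting to RTL.
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Odd embedding levels are right-to-left: reverse the characters in the run.
            runs[i] = (levels[i] % 2 == 1)
                    ? new StringBuilder(run).reverse().toString()
                    : run;
        }
        // Put the runs themselves into visual order.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        // The result would be passed to content.showText(...) instead of the raw string.
        System.out.println(toVisualOrder("النص العربي"));
    }
}
```

In the question's code, the call would become content.showText(VisualOrderSketch.toVisualOrder(EXAMPLE)); whether the letters then join correctly depends on the shaping step that this sketch leaves out.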

Related

Telugu Text is not displaying properly in itext generated pdf

I am creating a small bilingual application in Telugu and English using Java and Spring Boot. I have to generate a PDF that contains both languages, and I am generating it with the iText library. The problem is that English appears properly in the PDF, but the Telugu text is not displayed properly.
I've tried multiple Telugu .ttf fonts, but the Telugu text is still not displayed properly.
Example: the Telugu text that I want to show is వ్యవసాయ శాఖ
but the output that I am getting is "Please refer to the image below"
Could anyone please guide me to solve this issue? Thank you.
I am attaching the code below; please check.
1. Code
public class samplepdf {
public void export(HttpServletRequest request, HttpServletResponse response)
throws DocumentException, ParseException, java.io.IOException, SAXException, ParserConfigurationException
{
ByteArrayOutputStream baos = null;
baos = new ByteArrayOutputStream();
//Create Document
Document doc = new Document();
//Pass 2 things ---> What to render & where to render
PdfWriter instance2 = PdfWriter.getInstance(doc, response.getOutputStream());
Paragraph bottom = new Paragraph();
bottom.add(Chunk.NEWLINE);
bottom.setAlignment(Element.ALIGN_BOTTOM);
bottom.setAlignment(Element.ALIGN_CENTER);
//Open the document to start the work
doc.open();
Paragraph paragraph2 = new Paragraph();
//String fontpath = request.getServletContext().getRealPath("/static/TeluguFonts/NotoSansTelugu-Light.ttf");
//String fontpath = request.getServletContext().getRealPath("/static/TeluguFonts/Chathura Light.ttf");
//String fontpath = request.getServletContext().getRealPath("/static/TeluguFonts/Gidugu.ttf");
//String fontpath = request.getServletContext().getRealPath("/static/TeluguFonts/akshar.ttf");
//String fontpath = request.getServletContext().getRealPath("/static/TeluguFonts/gautami.ttf");
//String fontpath = request.getServletContext().getRealPath("/static/TeluguFonts/Chathura Light.ttf");
String fontpath = request.getServletContext().getRealPath("/static/TeluguFonts/NotoSansTelugu-VariableFont.ttf");
BaseFont bf_cjk1 = BaseFont.createFont(fontpath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font cjk1 = new Font(bf_cjk1, 12,Font.NORMAL);
paragraph2.setFont(cjk1);
paragraph2.add(Chunk.NEWLINE);
paragraph2.add("వ్యవసాయ శాఖ");
paragraph2.add(Chunk.NEWLINE);
paragraph2.add(Chunk.NEWLINE);
doc.add(paragraph2);
//Here, Closing the document.
doc.close();
try {
ServletOutputStream os = response.getOutputStream();
os.flush();
os.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}

Creating PDF file in java using PDDocument results in corrupted PDF files

I'm trying to create temporary PDF files in Java using PDDocument. I'm employing the following method to create a temporary PDF file.
/* Create a temporary PDF file.*/
private File createPdf(String fileName) throws IOException {
final PDDocument document = new PDDocument();
final File file = File.createTempFile(fileName, ".pdf");
//write it
BufferedWriter bw = new BufferedWriter(new FileWriter(file));
bw.write("This is the temporary pdf file content");
bw.close();
document.save(file);
document.close();
return file;
}
This is the test.
@Test
public void testCreateAndMergePdfs() throws IOException {
Collection<File> pdfs = new ArrayList<>(Arrays.asList(createPdf("File1"), createPdf("File2")));
assertFalse(CollectionUtils.isEmpty(pdfs));
PdfPrintPojo pdfPrintPojo = new PdfPrintPojo(pdfs);
File mergedFile = service.createAndMergePDFs(pdfPrintPojo, "Merged");
assertNotNull(mergedFile);
List<File> list = new ArrayList<>(pdfs);
File file1 = list.get(0);
File file2 = list.get(1);
assertTrue(FileUtils.contentEquals(file1, file2));
}
What I'm trying to do here is to create and merge two PDF files. When I run the test, it creates two PDF files in the temp folder, for example, \AppData\Local\Temp\File16375814641476797612.pdf and \AppData\Local\Temp\File24102718409195239661.pdf and the merged file at \AppData\Local\Temp\Merged_merged_3755858389884894769.pdf. But the test fails at
assertTrue(FileUtils.contentEquals(file1, file2));
When I try to open the PDF files in the temp folder, it says that the PDF is corrupted. Also, I have no idea why the files are not being saved as File1 and File2. Can anyone help me with this?
Using Apache PDFBox tutorial, I managed to create a working PDF file(s). The method was changed as follows.
/* Create a temporary PDF file.*/
private File createPdf(String fileName) throws IOException {
// Create a document and add a page to it
final PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
// Create a new font object selecting one of the PDF base fonts
PDFont font = PDType1Font.HELVETICA_BOLD;
// Start a new content stream which will "hold" the to be created content
PDPageContentStream contentStream = new PDPageContentStream(document, page);
// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
contentStream.beginText();
contentStream.setFont(font, 12);
contentStream.newLineAtOffset(100, 700);
contentStream.showText("Hello World");
contentStream.endText();
// Make sure that the content stream is closed:
contentStream.close();
// Save the results and ensure that the document is properly closed:
File file = File.createTempFile(fileName, ".pdf");
document.save(file);
document.close();
return file;
}
As for the test, I took the approach of using PDDocument to load the files, then extract data as String using PDFTextStripper and using assertions on those Strings.
@Test
public void testCreateAndMergePdfs() throws IOException {
Collection<File> pdfs = new ArrayList<>(Arrays.asList(createPdf("File1"), createPdf("File2")));
assertFalse(CollectionUtils.isEmpty(pdfs));
PdfPrintPojo pdfPrintPojo = new PdfPrintPojo(pdfs);
File mergedFile = service.createAndMergePDFs(pdfPrintPojo, "Merged");
assertNotNull(mergedFile);
List<File> list = new ArrayList<>(pdfs);
/* Load the PDF files and extract data as String. */
PDDocument document1 = PDDocument.load(list.get(0));
PDDocument document2 = PDDocument.load(list.get(1));
PDDocument merged = PDDocument.load(mergedFile);
PDFTextStripper stripper = new PDFTextStripper();
String file1Data = stripper.getText(document1);
String file2Data = stripper.getText(document2);
String mergedData = stripper.getText(merged);
/* Assert that data from file 1 and 2 are equal with each other and merged file. */
assertEquals(file1Data, file2Data);
assertEquals(file1Data + file2Data, mergedData);
}
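On the side question of why the files are not saved as File1 and File2: File.createTempFile treats its first argument as a prefix and inserts a generated string between the prefix and the suffix, which is why names like File16375814641476797612.pdf appear. A small sketch of the difference, with hypothetical class and method names and a "pdf-test" directory prefix chosen only for illustration:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TempFileNaming {

    /** Same call as in the question: the name is only a prefix. */
    public static File createWithPrefix(String prefix) throws IOException {
        File file = File.createTempFile(prefix, ".pdf");
        file.deleteOnExit();
        return file;
    }

    /** Exact file name, placed in a fresh temp directory to avoid collisions. */
    public static Path createWithExactName(String name) throws IOException {
        Path dir = Files.createTempDirectory("pdf-test");
        return Files.createFile(dir.resolve(name + ".pdf"));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(createWithPrefix("File1").getName());      // File1<generated>.pdf
        System.out.println(createWithExactName("File1").getFileName()); // File1.pdf
    }
}
```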
The way you compare the file contents could be changed. Could you try the following?
@Test
public void testCreateAndMergePdfs() throws IOException {
Assert.assertEquals(FileUtils.readLines(file1, StandardCharsets.UTF_8), FileUtils.readLines(file2, StandardCharsets.UTF_8));
}
Or you can try
byte[] file1Bytes = Files.readAllBytes(Paths.get("Path to File 1"));
byte[] file2Bytes = Files.readAllBytes(Paths.get("Path to File 2"));
String file1 = new String(file1Bytes, StandardCharsets.UTF_8);
String file2 = new String(file2Bytes, StandardCharsets.UTF_8);
assertEquals("The content in the strings should match", file1, file2);
Or
File firstFile = new File("Path to File 1");
File secondFile = new File("Path to File 2");
assertThat(firstFile).hasSameContentAs(secondFile);
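One caveat with the second variant: a PDF is binary data, and decoding arbitrary bytes as UTF-8 replaces malformed sequences, so two different files could in principle decode to the same string. A sketch that compares the raw bytes instead, using only the JDK (class and method names are made up for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class FileBytes {

    /** True if both files contain exactly the same bytes. */
    public static boolean sameBytes(Path first, Path second) throws IOException {
        return Arrays.equals(Files.readAllBytes(first), Files.readAllBytes(second));
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("cmp", ".bin");
        Path b = Files.createTempFile("cmp", ".bin");
        Files.write(a, new byte[] {1, 2, 3});
        Files.write(b, new byte[] {1, 2, 3});
        System.out.println(sameBytes(a, b)); // true
        Files.write(b, new byte[] {1, 2, 4});
        System.out.println(sameBytes(a, b)); // false
    }
}
```

Note that two PDFs written at different times may still differ in metadata even when their visible content is the same, so a byte comparison can fail where comparing the extracted text would pass.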

Surrogate Chinese Characters Not Displaying iText Java

Using iText 5.5.11 from the maven repo https://mvnrepository.com/artifact/com.itextpdf/itextpdf/5.5.11
public class test {
public static void main(String[] args) throws DocumentException, IOException {
final String text = "BMP: \u6d4b \u8bd5 Surrogate: \uD841\uDF0E \uD841\uDF31 \uD859\uDC02";
BaseFont baseFont = BaseFont.createFont("C:\\Windows\\Fonts\\arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(baseFont, 6.8f);
Document doc = new Document();
PdfWriter.getInstance(doc, new FileOutputStream("out.pdf"));
doc.open();
Paragraph p = new Paragraph();
p.add(new Phrase(text, font));
doc.add(p);
doc.close();
}
}
The non-surrogate characters in the basic multilingual plane are rendered on the resulting pdf, but the surrogate characters are not.
Edit: Also tried with font "STSong-Light" with encoding "UniGB-UCS2-H" (as in examples in book). Same result - surrogate characters missing.
Edit2: Got it to work with "SimSun-ExtB" font
This is usually a sign that the font being used (in this case Arial Unicode MS, arialuni.ttf) does not have the glyphs for your characters.
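To see which characters in a string lie outside the Basic Multilingual Plane, and therefore need a font that covers the supplementary planes, the code points can be listed with plain JDK calls. This is only a diagnostic sketch with made-up names; it does not inspect the font itself.

```java
import java.util.List;
import java.util.stream.Collectors;

class SupplementaryScan {

    /** Lists the supplementary-plane code points of the text as U+XXXX strings. */
    public static List<String> supplementaryCodePoints(String text) {
        return text.codePoints()
                .filter(Character::isSupplementaryCodePoint)
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "BMP: \u6d4b \u8bd5 Surrogate: \uD841\uDF0E \uD841\uDF31 \uD859\uDC02";
        // prints [U+2070E, U+20731, U+26402]
        System.out.println(supplementaryCodePoints(text));
    }
}
```

All three code points fall in the CJK Unified Ideographs Extension B block, which is consistent with the question's second edit reporting success with the "SimSun-ExtB" font.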

iText doesn't like my special characters

I'm trying to generate a pdf file using iText.
The file gets produced just fine, but I can't seem to use special characters like the German ä, ö, ...
The sentence I want to be written is (for example)
■ ...ä...ö...
but the output is
■...ä...ö...
(I had to kind of blur the sentences, but I guess you see what I'm talking about...)
Somehow this black block-thing and all "Umlaute" can't be generated ...
The font used is the following:
private static Font smallBold = new Font(Font.FontFamily.TIMES_ROMAN, 12,
Font.BOLD);
So there should be no problem about the font not having these characters...
I'm using IntelliJ Idea to develop, the encoding of the .java file is set to UTF-8, so there should be no problem too...
I'm kind of lost here; does anyone know what I can do to get it working?
Thanks in advance and greetz
gilaras
---------------UPDATE---------------
So here's (part of) the code:
@Controller
public class Generator {
...
Font font = new Font(Font.FontFamily.TIMES_ROMAN, 9f, Font.BOLD);
...
Paragraph intro = new Paragraph("Ich interessiere mich für ...!", font_12_bold);
Paragraph wantContact = new Paragraph("■ Ich hätte gerne ... ", font);
...
Phrase south = new Phrase("■ Süden □ Ost-West ...");
...
@RequestMapping(value = "/generatePdf", method = RequestMethod.POST)
@ResponseBody
public String generatePdf(HttpServletRequest request) throws IOException, DocumentException, com.lowagie.text.DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(FILE));
addMetaData(document);
document.open();
addContent(document, request);
document.add(new Paragraph("äöü"));
document.close();
return "";
}
private void addContent(Document document, HttpServletRequest request)
throws DocumentException {
Paragraph preface = new Paragraph();
preface.setAlignment(Element.ALIGN_JUSTIFIED);
addEmptyLine(preface, 1);
preface.add(new Paragraph("Rückantwort", catFont));
addEmptyLine(preface, 2);
preface.add(intro);
addEmptyLine(preface, 1);
if (request.getParameter("dec1").equals("wantContact")) {
preface.add(wantContact);
} else {
...
}
document.add(preface);
}
private static void addEmptyLine(Paragraph paragraph, int number) {
for (int i = 0; i < number; i++) {
paragraph.add(new Paragraph(" "));
}
}
private static void addMetaData(Document document) {
document.addTitle("...");
document.addSubject("...");
document.addKeywords("...");
document.addAuthor("...");
document.addCreator("...");
}
}
I had to take some things out, but I kept some Umlaut-character and other special characters, so that you can see, where the problem occurs ... :-)
You might want to try and embed the font using this technique:
BaseFont times = BaseFont.createFont(BaseFont.TIMES_ROMAN, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(times, 12, Font.BOLD);

How to create a PDF document from languages of Unicode char set regarding using third party Fonts

I'm using PDFBox and iText to create a simple (just paragraphs) pdf document from various languages. Something like :
pdfBox:
private static void createPdfBoxDocument(File from, File to) {
PDDocument document = null;
try {
document = new TextToPDF().createPDFFromText(new FileReader(from));
document.save(new FileOutputStream(to));
} finally {
if (document != null)
document.close();
}
}
private void createPdfBoxDoc() throws IOException, FileNotFoundException, COSVisitorException {
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page);
PDType1Font font = PDType1Font.TIMES_ROMAN;
contentStream.setFont(font, 12);
contentStream.beginText();
contentStream.moveTextPositionByAmount(100, 400);
contentStream.drawString("š");
contentStream.endText();
contentStream.close();
document.save("test.pdf");
document.close();
}
itext:
private static Font blackFont = new Font(Font.FontFamily.COURIER, 12, Font.NORMAL, BaseColor.BLACK);
private static void createITextDocument(File from, File to) {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(to));
document.open();
addContent(document, getParagraphs(from));
document.close();
}
private static void addContent(Document document, List<String> paragraphs) {
for (int i = 0; i < paragraphs.size(); i++) {
document.add(new Paragraph(paragraphs.get(i), blackFont));
}
}
The input files are encoded in UTF-8, and text in some languages of the Unicode character set, such as the Russian alphabet, is not rendered properly in the PDF. I suppose the fonts in both libraries don't support the Unicode character set, and I can't find any documentation on how to add and use third-party fonts. Could anybody please help me out with an example?
If you are using iText, it has quite good support.
You can read more in iText in Action (chapter 2.2.2).
You have to download a Unicode font such as arialuni.ttf and use it like this:
public static File fontFile = new File("fonts/arialuni.ttf");
public static void createITextDocument(File from, File to) throws DocumentException, IOException {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(to));
document.open();
writer.getAcroForm().setNeedAppearances(true);
BaseFont unicode = BaseFont.createFont(fontFile.getAbsolutePath(), BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
FontSelector fs = new FontSelector();
fs.addFont(new Font(unicode));
addContent(document, getParagraphs(from), fs);
document.close();
}
private static void addContent(Document document, List<String> paragraphs, FontSelector fs) throws DocumentException {
for (int i = 0; i < paragraphs.size(); i++) {
Phrase phrase = fs.process(paragraphs.get(i));
document.add(new Paragraph(phrase));
}
}
arialuni.ttf works for me. So far I have checked that it supports
BG, ES, CS, DA, DE, ET, EL, EN, FR, IT, LV, LT, HU, MT, NL, PL, PT, RO, SK, SL, FI, SV
and only the PDF in Romanian wasn't created properly...
With PDFBox it's almost the same:
private void createPdfBoxDoc() throws IOException, FileNotFoundException, COSVisitorException {
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page);
PDFont font = PDTrueTypeFont.loadTTF(document, "fonts/arialuni.ttf");
contentStream.setFont(font, 12);
contentStream.beginText();
contentStream.moveTextPositionByAmount(100, 400);
contentStream.drawString("š");
contentStream.endText();
contentStream.close();
document.save("test.pdf");
document.close();
}
However, as Gagravarr says, it doesn't work because of issue PDFBOX-903, even with version 1.6.0-SNAPSHOT. Maybe trunk will work. I suggest you use iText; it works perfectly there.
You may find this answer helpful - it confirms that you can't do what you need with one of the standard type 1 fonts, as they're Latin1 only
In theory, you just need to embed a suitable font into the document, which handles all your codepoints, and use that. However, there's at least one open bug with writing unicode strings, so there's a chance it might not work just yet... Try the latest pdfbox from svn trunk too though to see if it helps!
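As a rough way to check in advance whether a string will hit the "Latin1 only" limit, the JDK can report whether every character is encodable in ISO-8859-1. This is a sketch under the assumption that ISO-8859-1 approximates what the standard fonts can show; the exact encoding a library applies may differ slightly (WinAnsi, for example, includes a few extra characters), so treat a false result as a hint that an embedded Unicode font is needed. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Latin1Check {

    /** True if every character of the text can be encoded in ISO-8859-1. */
    public static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("äöü"));    // true
        System.out.println(fitsLatin1("š"));      // false
        System.out.println(fitsLatin1("Привет")); // false
    }
}
```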
In my project, I just copied a font that supports Unicode (or whatever language you want) into a directory (you can also use the Windows fonts path) and added some code. It looked like this:
BaseFont baseFont = BaseFont.createFont("c:\\a.ttf", BaseFont.IDENTITY_H,true);
Font font = new Font(baseFont);
document.add(new Paragraph("Not English Text",font));
Now, you can use this font to show your text in various languages.
// Use this code. Sometimes setFont() will not work with Paragraph.
try
{
FileOutputStream out=new FileOutputStream(name);
Document doc=new Document();
PdfWriter.getInstance(doc, out);
doc.open();
Font f=new Font(FontFamily.TIMES_ROMAN,50.0f,Font.UNDERLINE,BaseColor.RED);
Paragraph p=new Paragraph("New PdF",f);
p.setAlignment(Paragraph.ALIGN_CENTER);
doc.add(p);
doc.close();
}
catch(Exception e)
{
System.out.println(e);
}
}
