getting wrong arabic translation in PDF iText


I am generating a PDF file from an HTML string, but the content of the generated PDF does not match the HTML: the text in the PDF is some random content. I read about the issue online, and the suggestion was to use Unicode notation such as %u0627%u0646%u0627%20%u0627%u0633%u0645%u0649%20%u0639%u0628%u062F%u0627%u0644%u0644%u0647. But when I put this into my HTML, it is printed as is.
Related issue: Writing Arabic in pdf using itext
package com.example.demo;
import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.styledxmlparser.css.media.MediaDeviceDescription;
import com.itextpdf.styledxmlparser.css.media.MediaType;
import com.itextpdf.html2pdf.resolver.font.DefaultFontProvider;
import com.itextpdf.layout.font.FontProvider;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
@SpringBootApplication
public class DemoApplication {
public static void main(String[] args) throws IOException {
SpringApplication.run(DemoApplication.class, args);
String htmlSource = getContent();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
ConverterProperties converterProperties = new ConverterProperties();
FontProvider dfp = new DefaultFontProvider(true, false, false);
dfp.addFont("/Library/Fonts/Arial.ttf");
converterProperties.setFontProvider(dfp);
converterProperties.setMediaDeviceDescription(new MediaDeviceDescription(MediaType.PRINT));
HtmlConverter.convertToPdf(htmlSource, outputStream, converterProperties);
byte[] bytes = outputStream.toByteArray();
File pdfFile = new File("java19.pdf");
FileOutputStream fos = new FileOutputStream(pdfFile);
fos.write(bytes);
fos.flush();
fos.close();
}
private static String getContent() {
return "<!DOCTYPE html>\n" +
"<html lang=\"en\">\n" +
"\n" +
"<head>\n" +
" <meta charset=\"UTF-8\">\n" +
" <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n" +
" <meta http-equiv=\"X-UA-Compatible\" content=\"ie=edge\">\n" +
" <title>Document</title>\n" +
" <style>\n" +
" @page {\n" +
" margin: 0;\n" +
" font-family: arial;\n" +
" }\n" +
" </style>\n" +
"</head>\n" +
"\n" +
"<body\n" +
" style=\"margin: 0;padding: 0;font-family: arial, sans-serif;font-size: 14px;line-height: 125%;width: 100%;-ms-text-size-adjust: 100%;-webkit-text-size-adjust: 100%;color: #222222;\">\n" +
" <table cellpadding=\"0\" cellspacing=\"0\" width=\"100%\" style=\"background: white; direction: rtl;\">\n" +
" <tbody>\n" +
" <tr>\n" +
" <td style=\"padding: 0 35px;\">\n" +
" <p> انا اسمى عبدالله\n" +
" </p>\n" +
" </td>\n" +
" </tr>\n" +
" </tbody>\n" +
" </table>\n" +
"\n" +
"</body>\n" +
"\n" +
"</html>";
}
}

It's difficult to determine exactly what the issue is without seeing the faulty output, but your "random content" sounds like an encoding issue.
Since you have your Arabic content directly in your source code, you have to be careful about encoding. For example, when the source is treated as ISO-8859-1, the Arabic text in the resulting PDF comes out garbled.
Using Unicode escape sequences (\uXXXX), you can indeed avoid some of these encoding issues. Replacing
" <p> انا اسمى عبدالله\n" +
with
" <p>\u0627\u0646\u0627 \u0627\u0633\u0645\u0649 \u0639\u0628\u062F\u0627\u0644\u0644\u0647" +
results in Arabic glyphs, even when using ISO-8859-1 encoding. Alternatively, you can use UTF-8 to get the correct content regardless of the use of Unicode escape sequences.
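Typing the escapes by hand is error-prone, so a small helper can generate them. The following is only a sketch: the class and method names (UnicodeEscaper, toJavaEscapes) are made up for illustration. Note that Java's notation is \uXXXX, not the %uXXXX form mentioned in the question, which Java does not interpret at all.

```java
public class UnicodeEscaper {

    /** Replaces every non-ASCII UTF-16 code unit with its Java escape sequence. */
    static String toJavaEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                sb.append(c); // plain ASCII is safe in any source encoding
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Arabic sample is itself written with escapes so that this file
        // compiles identically under any source encoding.
        String sample = "<p>\u0627\u0646\u0627</p>";
        System.out.println(toJavaEscapes(sample));
    }
}
```

The printed text can be pasted into a Java string literal as is; the HTML markup stays readable because only non-ASCII characters are escaped.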
Once your encoding issues are solved, you will likely still get output in which the glyphs are Arabic but are not rendered correctly.
For correct rendering of certain writing systems, iText 7 needs an optional module, pdfCalligraph. With this module enabled, the text is rendered correctly.
The code used for the tests above:
public static void main(String[] args) throws IOException {
// Needed for pdfCalligraph
LicenseKey.loadLicenseFile("all-products.xml");
File pdfFile = new File("java19.pdf");
OutputStream outputStream = new FileOutputStream(pdfFile);
String htmlSource = getContent();
ConverterProperties converterProperties = new ConverterProperties();
FontProvider dfp = new DefaultFontProvider(true, false, false);
dfp.addFont("/Library/Fonts/Arial.ttf");
converterProperties.setFontProvider(dfp);
converterProperties.setMediaDeviceDescription(new MediaDeviceDescription(MediaType.PRINT));
HtmlConverter.convertToPdf(htmlSource, outputStream, converterProperties);
}
private static String getContent() {
return "<!DOCTYPE html>\n" +
"<html lang=\"en\">\n" +
"\n" +
"<head>\n" +
" <meta charset=\"UTF-8\">\n" +
" <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n" +
" <meta http-equiv=\"X-UA-Compatible\" content=\"ie=edge\">\n" +
" <title>Document</title>\n" +
" <style>\n" +
" @page {\n" +
" margin: 0;\n" +
" font-family: arial;\n" +
" }\n" +
" </style>\n" +
"</head>\n" +
"\n" +
"<body\n" +
" style=\"margin: 0;padding: 0;font-family: arial, sans-serif;font-size: 14px;line-height: 125%;width: 100%;-ms-text-size-adjust: 100%;-webkit-text-size-adjust: 100%;color: #222222;\">\n" +
" <table cellpadding=\"0\" cellspacing=\"0\" width=\"100%\" style=\"background: white; direction: rtl;\">\n" +
" <tbody>\n" +
" <tr>\n" +
" <td style=\"padding: 0 35px;\">\n" +
// Arabic content
// " <p> انا اسمى عبدالله\n" +
// Arabic content with Unicode escape sequences
" <p>\u0627\u0646\u0627 \u0627\u0633\u0645\u0649 \u0639\u0628\u062F\u0627\u0644\u0644\u0647" +
" </p>\n" +
" </td>\n" +
" </tr>\n" +
" </tbody>\n" +
" </table>\n" +
"\n" +
"</body>\n" +
"\n" +
"</html>";
}

Please check to make sure that your sourcefile and compiler use the same encoding, e.g. UTF-8. I sometimes check that by including characters that are only available in unicode and not in other classic codepages.
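The effect of such a mismatch can be reproduced in isolation. The sketch below (class and method names are illustrative, not from any library) simulates UTF-8 source bytes being read as ISO-8859-1: every Arabic letter occupies two bytes in UTF-8, so each one turns into two unrelated Latin-1 characters.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /** Simulates reading UTF-8 encoded text with the wrong charset (ISO-8859-1). */
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String arabic = "\u0627\u0646\u0627"; // three Arabic letters
        String garbled = misread(arabic);

        System.out.println(arabic.length());        // 3
        System.out.println(garbled.length());       // 6
        System.out.println(garbled.equals(arabic)); // false
    }
}
```

If the string length seen at runtime is roughly double what the source shows, a charset mismatch between the editor and the compiler is a likely cause.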
I tried to reproduce the issue and I got the following warning in the logging when running the example code:
Cannot find pdfCalligraph module, which was implicitly required by one of the layout properties
This was already mentioned by Alexsey Subach and can cause the following issue:
Problems with text direction (I am no expert on Arabic but the text was aligned to the right)
Wrong combination of characters (For the details see this document: https://itextpdf.com/sites/default/files/2018-12/iText_pdfCalligraph_4pager.pdf )
This is the output I got without pdfCalligraph:
pdf result without calligraph
Created with the codebase on this repository
So in order to get everything to work perfectly like your browser does with the HTML for Arabic you will also need:
A commercial license for https://itextpdf.com/en/products/itext-7/pdfcalligraph
Code to load the license file (or you will get a LicenseFileNotLoadedException )
This dependency https://repo.itextsupport.com/releases/com/itextpdf/typography/2.0.6/
Your question is tagged iText 7, but depending on your requirements there may be free alternatives. Apache FOP, for example, should work with Arabic ligatures according to this source, although it would probably require rework because it is based on XSL-FO. In theory you could generate the XSL-FO with whatever templating mechanism you currently use (e.g. JSP, JSF, or Thymeleaf) and use something like a ServletFilter to convert the XSL-FO to a PDF on the fly during a request in a web application.

Make sure your fonts support the characters you need. If you use the Maven resources directory to include extra fonts during the build, check that the font file is not filtered (property replacement), as filtering corrupts the file: Maven corrupting binary files in source/main/resources when building jar
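One way to check whether the build altered a font is to compare a checksum of the original file with the copy that ends up on the classpath. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the byte arrays in main only stand in for real font data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FontIntegrityCheck {

    /** Computes the SHA-256 digest of everything readable from the stream. */
    static byte[] sha256(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            return digest.digest();
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException("Could not hash stream", e);
        }
    }

    /** Returns true when both streams carry byte-identical content. */
    static boolean sameContent(InputStream original, InputStream packaged) {
        return MessageDigest.isEqual(sha256(original), sha256(packaged));
    }

    public static void main(String[] args) {
        // Stand-in data: a real check would hash the font files themselves.
        byte[] font = {0, 1, 0, 0, (byte) 0xE9};
        byte[] filtered = {0, 1, 0, 0, (byte) 0x3F};

        System.out.println(sameContent(
                new ByteArrayInputStream(font), new ByteArrayInputStream(font)));     // true
        System.out.println(sameContent(
                new ByteArrayInputStream(font), new ByteArrayInputStream(filtered))); // false
    }
}
```

In a real project the first stream would come from the font in the source tree and the second from the built artifact, for example via Class.getResourceAsStream with whatever resource path your project uses.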

Related

Problem converting HTML with Japanese character To PDF using java with YaHP Html to Pdf Converter

I'm trying to convert html string with Japanese character to PDF using YaHP Html to Pdf Converter.
I am using Eclipse Photon Release (4.8.0)
Here is my main class that invokes the YaHP Html :
public static void main(String[] args) {
String pdfOutFileName = "C:\\test\\JP-Test.pdf";
double pageHeight = 80;
String htmlContent = "<html>\r\n" +
" <head>\r\n" +
" <meta http-equiv=Content-Type content=\"text/html; charset=UTF-8\">\r\n" +
" <style type=\"text/css\">\r\n" +
" span.cls_005hr{font-family:Arial,serif;font-size:16.8px;color:rgb(50,50,50);font-weight:normal;font-style:normal;text-decoration: none}\r\n" +
" div.cls_005hr{font-family:Arial,serif;font-size:14.8px;color:rgb(50,50,50);font-weight:normal;font-style:normal;text-decoration: none}\r\n" +
" </style>\r\n" +
" </head>\r\n" +
" <body>\r\n" +
" <table border=0 cellpadding=0 cellspacing=0 width=720>\r\n" +
" <col width=10 >\r\n" +
" <col width=710 >\r\n" +
" <tr>\r\n" +
" <td valign=\"middle\" height=\"80\" bgcolor=\"#f0f0f0\">\r\n" +
" <div><span class=\"cls_005hr\">JPTesting</span></div>\r\n" +
" </td>\r\n" +
" <td valign=\"middle\" height=\"80\" bgcolor=\"#f0f0f0\">\r\n" +
" <div><span class=\"cls_005hr\">株式会社　ビー・エス・デーインフォメーションテクノロジー</span></div>\r\n" +
" </td>\r\n" +
" </span>\r\n" +
" </tr>\r\n" +
" </table>\r\n" +
" </body>\r\n" +
"</html>";
System.out.println("htmlContent: [" + htmlContent + "]");
try {
ByteArrayOutputStream outFormPDF = new ByteArrayOutputStream();
outFormPDF = PDFUtil.convertHtmlToPDF(htmlContent, pageHeight);
byte[] bOutFormPDF = outFormPDF.toByteArray();
OutputStream os = new FileOutputStream(pdfOutFileName);
os.write(bOutFormPDF);
System.out.println("Successfully Finished writing PDF to output file");
os.close();
} catch (Exception e) {
System.out.println(e.getMessage());
}
and here is the PDFUtil class method that calls YaHP Converter
public static ByteArrayOutputStream convertHtmlToPDF (String htmlContent, double pageHeight) throws CConvertException, IOException {
ByteArrayOutputStream outFormPDF = new ByteArrayOutputStream();
Scanner scanner = new Scanner(htmlContent).useDelimiter("\\Z");
String htmlContents = scanner.next();
CYaHPConverter converter = new CYaHPConverter();
Map properties = new HashMap();
List headerFooterList = new ArrayList();
URL resource = PDFUtil.class.getClassLoader().getResource("fonts");
String fontDirectory = resource.getPath() ;
properties.put(IHtmlToPdfTransformer.PDF_RENDERER_CLASS, IHtmlToPdfTransformer.FLYINGSAUCER_PDF_RENDERER);
properties.put(IHtmlToPdfTransformer.FOP_TTF_FONT_PATH, fontDirectory);
PageSize pageSize = IHtmlToPdfTransformer.LEGALP;
if (pageHeight>0) {
String sHeight = Double.toString(pageHeight);
sHeight = sHeight.substring(0,sHeight.indexOf("."));
pageHeight = Double.parseDouble(sHeight);
System.out.println ("pageHeight : " + pageHeight);
pageSize = new PageSize(21.6d, pageHeight, 0.7d, 0.5d, 1.5d, 1.5d);
}
System.out.println ("Calling converter.convertToPdf");
converter.convertToPdf(htmlContents,
pageSize,
headerFooterList,
"file://tmp/Html2PdfConvertTemp",
outFormPDF,
properties);
System.out.println ("Successfully Called converter.convertToPdf");
scanner.close();
return outFormPDF;
}
For some reason, the output PDF file contains "JPTesting", but does not contain the Japanese letters : "株式会社　ビー・エス・デーインフォメーションテクノロジー" .
Any help would be much appreciated.

Found the solution. Posting my solution here in case anyone else may struggle with the same issue that I had.
I added a Japanese font from Google Fonts: https://fonts.google.com/?subset=japanese (I picked Shippori Mincho B1) and put it in my font resource directory.
I updated my HTML CSS to pick up the new font:
span.cls_005jp{font-family:Shippori Mincho B1, Arial,serif;...
I updated my HTML to use the new class for tags that may contain Japanese letters:
" <div><span class=\"cls_005jp\">株式会社　ビー・エス・デーインフォメーションテクノロジー</span></div>\r\n" +
Thank you, @g00se, for pointing me in the right direction!
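If the HTML is generated rather than written by hand, the class could be chosen automatically. The sketch below is an assumption, not part of the original solution: it uses Character.UnicodeScript from the JDK to detect Han, Hiragana, or Katakana characters and picks between the two class names used above. The helper names are hypothetical, and Han characters also occur in Chinese text.

```java
import java.lang.Character.UnicodeScript;

public class JapaneseDetector {

    /** Returns true if the text contains at least one Han, Hiragana, or Katakana character. */
    static boolean containsJapanese(String text) {
        return text.codePoints().anyMatch(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            return script == UnicodeScript.HAN
                    || script == UnicodeScript.HIRAGANA
                    || script == UnicodeScript.KATAKANA;
        });
    }

    /** Picks the CSS class whose font-family lists a Japanese-capable font first. */
    static String cssClassFor(String text) {
        return containsJapanese(text) ? "cls_005jp" : "cls_005hr";
    }

    public static void main(String[] args) {
        System.out.println(cssClassFor("JPTesting"));                // cls_005hr
        System.out.println(cssClassFor("\u682A\u5F0F\u4F1A\u793E")); // cls_005jp
    }
}
```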

How do I edit HTML inside a Java string easily?

I developed an Android app that uses HTML code to print the result in formatted form.
The code looks like this:
String adress = editText1.getText().toString();
String phone = editText2.getText().toString();
String licenseNo = editText3.getText().toString();
"\n" +
" <div class=\"down\">\n" +
" <div class=\"EASY\">\n" +
" <img src=\"" + image + "\" alt=\"QR\" >\n" +
" </div>\n" +
" <table class=\" info_table\">\n" +
" <tbody>\n" +
" <tr>\n" +
" <td class=\"info\">" + phone + "</td>\n" +
" <td class=\"info\"> " + licenseNo + "</td>\n" +
" </tr>\n" +
" <tr>\n" +
" <td class=\"info_A\" colspan=\"2\"> " + adress + "</td>\n" +
" </tr>\n" +
" </tbody>\n" +
" </table>\n" +
" </div>";
As you can see, it has to contain \n and + because it is inside a String object that includes variable values from user input.
I run into difficulty whenever I need to edit this piece of code, because I can only test it inside the application.
My question is: what is the best way to extract this code, edit and preview it outside Android Studio, and then reinsert it?
I tried find and replace, but it causes problems.
I hope I have explained this well enough.

You can create an XML file in the resources folder, read it, and then use String.format to fill in the values.
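A minimal sketch of that idea, assuming the template uses %s placeholders. The TEMPLATE constant only stands in for the contents of the external file (which you could then edit and preview in any HTML editor), and the class name, method name, and sample values are made up for illustration.

```java
public class HtmlTemplateDemo {

    // Stand-in for the text read from the template file; in the app this
    // string would be loaded from the resources instead of being hard-coded.
    static final String TEMPLATE =
            "<div class=\"down\">\n"
            + "  <img src=\"%s\" alt=\"QR\">\n"
            + "  <table class=\"info_table\" style=\"width: 100%%\">\n"
            + "    <tr><td class=\"info\">%s</td><td class=\"info\">%s</td></tr>\n"
            + "    <tr><td class=\"info_A\" colspan=\"2\">%s</td></tr>\n"
            + "  </table>\n"
            + "</div>";

    /** Fills the placeholders of the template in order. */
    static String render(String template, Object... values) {
        return String.format(template, values);
    }

    public static void main(String[] args) {
        // Sample values; in the app they would come from the EditText fields.
        String html = render(TEMPLATE, "qr.png", "555-0100", "LIC-42", "1 Example Street");
        System.out.println(html);
    }
}
```

Because String.format treats % as special, a literal percent sign in the template (for example in CSS widths) has to be written as %%. User input should also be HTML-escaped before it is inserted.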

IBM Maximo - sending HTML email

I need to send an HTML e-mail from Maximo, but only plain text arrives, without the tags and without the table. How can I do this correctly?
Code example:
package com.vetasi.testPackage;
import psdi.server.MXServer;
import psdi.server.SimpleCronTask;
import psdi.util.logging.MXLogger;
import psdi.util.logging.MXLoggerFactory;
public class Test1 extends SimpleCronTask{
MXLogger logger = MXLoggerFactory.getLogger("com.test.TestReportCron");
@Override
public void cronAction(){
String message1 = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n" +
"<html xmlns=\"http://www.w3.org/1999/xhtml\">\r\n" +
" <head>\r\n" +
" <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\r\n" +
" <title>Title</title>\r\n" +
" <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"/>\r\n" +
"</head>\r\n" +
"<body style=\"margin: 0; padding: 0;\">\r\n" +
" <table border=\"1\" cellpadding=\"0\" cellspacing=\"0\" width=\"100%\">\r\n" +
" <tr>\r\n" +
" <td>\r\n" +
" Hello!\r\n" +
" </td>\r\n" +
" </tr>\r\n" +
" </table>\r\n" +
"</body>\r\n" +
"</html>" ;
try{
MXServer.sendEMail("Misha89uatest@gmail.com", "maxadmin@us.ibm.com", "Hello my friend", message1);
}catch (Exception e) {
logger.error(e.getStackTrace());
e.printStackTrace();
}
}
}
Result: the e-mail arrives, but only as plain text.
When I use the JavaMail API, everything works fine, but it requires much more code, and I would prefer to use Maximo's own methods. Does anybody have any ideas?

Is there anything wrong with FileWriter?

I don't know what's going on with the FileWriter: it only writes out the HTML part, but nothing from the String array content, which stores a lot of long Strings. Is it because of Java's garbage collector?
I printed out content and everything is there, but FileWriter did not write anything from content to the file except the HTML part. I added System.out.println(k); inside the enhanced for loop. The content array is not null.
public void writeHtml(String[] content) {
File file = new File("final.html");
try {
try (FileWriter Fw = new FileWriter(file)) {
Fw.write("<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n"
+ "<html>\n"
+ "<head>\n"
+ "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=us-ascii\">\n"
+ "<title>" + fileName +" for The Delinquent</title>\n"
+ "<style type = \"text/css\">\n"
+ "body {font-family: \"Times New Roman, serif\"; font-size: 14 or 18; text-align: justify;};\n"
+ "p { margin-left: 1%; margin-right: 1%; }\n"
+ "</style>\n"
+ "</head><body>");
for (String k : content) {
Fw.write(k+"\n");
}
Fw.write("</body></html>");
}
} catch (Exception e) {
e.printStackTrace();
}
}
What final.html looks like after running the program:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<title>the_delinquent.txt for The Delinquent</title>
<style type = "text/css">
body {font-family: "Times New Roman, serif"; font-size: 14 or 18; text-align: justify;};
p { margin-left: 1%; margin-right: 1%; }
</style>
</head><body>
</body></html>
I know content is not empty because I did this:
for (String k: content) {
System.out.println(k + "\n");
bw.write(k + "\n");
}
Everything printed out. so weird : (

Your code is working. The only thing that could prevent the content from being written is an empty content array, one with no elements.

Your code is basically correct; maybe the content array is empty.
The following is in a more modern Java style.
public void writeHtml(String[] content) {
Path file = Paths.get("final.html");
try (BufferedWriter fw = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
fw.write("<!DOCTYPE html>\n"
+ "<html>\n"
+ "<head>\n"
+ "<meta charset=\"UTF-8\">\n"
+ "<title>" + fileName + " for The Delinquent</title>\n"
+ "<style type = \"text/css\">\n"
+ "body {font-family: \"Times New Roman, serif\";"
+ " font-size: 14 or 18; text-align: justify;};\n"
+ "p { margin-left: 1%; margin-right: 1%; }\n"
+ "</style>\n"
+ "</head><body>");
fw.write("Content has: " + content.length + " strings.<br>\n");
for (String k : content) {
fw.write("* " + k + "<br>\n");
}
fw.write("</body></html>\n");
} catch (IOException e) {
System.out.println("Error " + e.getMessage());
}
}
FileWriter is an old utility class that uses the default charset, so it is not portable. It is better to specify the charset so that it matches the charset declared in the HTML.
UTF-8 covers the full Unicode range of characters, including the comma-like (curly) quotes typical of MS Word. Java also uses Unicode internally, so it is a good match.
HTML 5 is the latest HTML version and is now generally usable.
At one spot a typo, /n, had been entered.
In HTML, multiple spaces and line breaks are collapsed into a single space, so I added <br> for a line break.
Normally one would add throws IOException to the method header to let the caller handle any irregularities.
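The last point can be sketched as follows. This is an illustrative variant rather than the code from the question: the class name and the extra Path parameter are assumptions, and the HTML header is shortened. It writes to a temporary file and reads it back, which also makes it easy to confirm that the array content really reaches the file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HtmlWriterDemo {

    /** Writes the content lines into a minimal HTML page; I/O errors go to the caller. */
    static void writeHtml(Path file, String[] content) throws IOException {
        try (BufferedWriter fw = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            fw.write("<!DOCTYPE html>\n<html><head><meta charset=\"UTF-8\"></head><body>\n");
            for (String k : content) {
                fw.write(k + "<br>\n");
            }
            fw.write("</body></html>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("final", ".html");
        writeHtml(file, new String[] {"first line", "second line"});

        // Read the file back to verify that the array content was written.
        String written = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(written.contains("second line<br>")); // true
        Files.delete(file);
    }
}
```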

How can I add a newline character to a String in Java?

In a Java application, I am creating a String like below (by concatenation):
String notaCorrente = dataOdierna + " - " + testoNotaCorrente;
My problem is that I want to add also something like an HTML newline character at the end of this String (that will be shown into an HTML page).
How can I implement it?

The newline character in Java is "\n" which will look like this:
String notaCorrente = dataOdierna + " - " + testoNotaCorrente + "\n";
However, this will not display as you expect on your HTML page. You can try adding an HTML break tag, or add the &#10; (Line Feed) and &#13; (Carriage Return) HTML entities:
String notaCorrente = dataOdierna + " - " + testoNotaCorrente + "<br>";
or
String notaCorrente = dataOdierna + " - " + testoNotaCorrente + "&#13;&#10;";

For a newline that will result in a line break in HTML, use
String notaCorrente = dataOdierna + " - " + testoNotaCorrente + "<br>";
For a newline that will result in a line break in your text editor, use
String notaCorrente = dataOdierna + " - " + testoNotaCorrente + System.lineSeparator();
And for both, use
String notaCorrente = dataOdierna + " - " + testoNotaCorrente + "<br>" + System.lineSeparator();
Why not \n?
\n is specific to certain operating systems, while others use \r\n. System.lineSeparator() will get you the one that is relevant to the system where you are executing your application. See the documentation for more info on this function, and Wikipedia for more info on newlines in general.
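The combined variant can be tried out with a short sketch. The class name, method name, and sample text are illustrative only.

```java
public class LineBreakDemo {

    /** Appends an HTML line break plus the platform line separator. */
    static String withHtmlBreak(String text) {
        return text + "<br>" + System.lineSeparator();
    }

    public static void main(String[] args) {
        String notaCorrente = withHtmlBreak("2024-01-31 - sample note");

        // The browser only reacts to <br>; the separator merely keeps the
        // generated HTML source readable in a text editor.
        System.out.println(notaCorrente.startsWith("2024-01-31 - sample note<br>")); // true
        System.out.println(notaCorrente.endsWith(System.lineSeparator()));           // true
    }
}
```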

Simply add <br/> (the HTML line break tag).
String notaCorrente = dataOdierna + " - " + testoNotaCorrente + "<br/>";
Then, when this content is displayed, the <br/> tag is rendered on the HTML page as a new line.
