convert html file with Arabic Characters to pdf using java [duplicate]

I am having trouble displaying Arabic characters from HTML content when generating a PDF: they come out as "?".
I am able to display Arabic text from a String variable, but I am not able to render the Arabic text from the HTML string.
I want the PDF to have two columns, with English on the left and Arabic on the right.
This is the program I use to create the PDF. Please help me in this regard.
try
{
Document document = new Document(PageSize.A4, 50, 50, 50, 50);
ByteArrayOutputStream out = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, out);
BaseFont bf = BaseFont.createFont("C:\\arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font font = new Font(bf, 8);
document.open();
BufferedReader br = new BufferedReader(new FileReader("C:\\style.css"));
StringBuffer fileContents = new StringBuffer();
String line = br.readLine();
while (line != null)
{
fileContents.append(line);
line = br.readLine();
}
br.close();
String styles = fileContents.toString(); //"p { font-family: Arial;}";
Paragraph cirNoEn = null;
Paragraph cirNoAr = null;
String htmlContentEn = null;
String htmlContentAr = null;
PdfPCell contentEnCell = new PdfPCell();
PdfPCell contentArCell = new PdfPCell();
cirNoEn = new Paragraph("Circular No. (" + cirEnNo + ")", new Font(bf, 14, Font.BOLD | Font.UNDERLINE));
cirNoAr = new Paragraph("رقم التعميم (" + cirArNo + ")", new Font(bf, 14, Font.BOLD | Font.UNDERLINE));
htmlContentEn = "<p><span> Dear….</span></p>";
htmlContentAr = "<p><span> رقم التعميم رقم التعميم </p><p> رقم التعميم ….</span></p>";
for (Element e : XMLWorkerHelper.parseToElementList(htmlContentEn, styles))
{
for (Chunk c : e.getChunks())
{
c.setFont(new Font(bf));
}
contentEnCell.addElement(e);
}
for (Element e : XMLWorkerHelper.parseToElementList(htmlContentAr, styles))
{
for (Chunk c:e.getChunks())
{
c.setFont(new Font(bf));
}
contentArCell.addElement(e);
}
PdfPCell emptyCell = new PdfPCell();
PdfPCell cirNoEnCell = new PdfPCell(cirNoEn);
PdfPCell cirNoArCell = new PdfPCell(cirNoAr);
cirNoEnCell.setHorizontalAlignment(Element.ALIGN_CENTER);
cirNoArCell.setHorizontalAlignment(Element.ALIGN_CENTER);
emptyCell.setBorder(Rectangle.NO_BORDER);
emptyCell.setFixedHeight(15);
cirNoEnCell.setBorder(Rectangle.NO_BORDER);
cirNoArCell.setBorder(Rectangle.NO_BORDER);
contentEnCell.setBorder(Rectangle.NO_BORDER);
contentArCell.setBorder(Rectangle.NO_BORDER);
cirNoArCell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
contentArCell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
contentEnCell.setNoWrap(false);
contentArCell.setNoWrap(false);
PdfPTable circularInfoTable = null;
emptyCell.setColspan(2);
circularInfoTable = new PdfPTable(2);
circularInfoTable.addCell(cirNoEnCell);
circularInfoTable.addCell(cirNoArCell);
circularInfoTable.addCell(emptyCell);
circularInfoTable.addCell(emptyCell);
circularInfoTable.addCell(emptyCell);
circularInfoTable.addCell(contentEnCell);
circularInfoTable.addCell(contentArCell);
circularInfoTable.addCell(emptyCell);
circularInfoTable.getDefaultCell().setBorder(PdfPCell.NO_BORDER);
circularInfoTable.setWidthPercentage(100);
document.add(circularInfoTable);
document.close();
}
catch (Exception e)
{
}

Please take a look at the ParseHtml7 and ParseHtml8 examples. They take HTML input with Arabic characters and they create a PDF with the same Arabic text:
Before we look at the code, allow me to explain that it's not a good idea to use non-ASCII characters in source code. For instance, don't do this:
htmlContentAr = "<p><span> رقم التعميم رقم التعميم</p><p>رقم التعميم ….</span></p>";
You never know how a Java file containing these glyphs will be stored. If it's not stored as UTF-8, the characters may end up looking like something completely different. Versioning systems are known to have problems with non-ASCII characters, and even compilers can get the encoding wrong. If you really want to store hard-coded String values in your code, use the Unicode escape notation (\uXXXX). Part of your problem is an encoding problem, and you can read more about this here: Can't get Czech characters while generating a PDF
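Both points can be checked with plain Java. The sketch below is only an illustration and not part of the ParseHtml examples: the class UnicodeEscapes and its two methods are hypothetical helpers. toEscapes rewrites a string in \uXXXX notation so that it can be pasted into source code as pure ASCII, and roundTrips shows whether a given charset can represent the text at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; it is not part of iText.
final class UnicodeEscapes {

    // Returns the text with every non-ASCII UTF-16 code unit written as a backslash-u escape.
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // True if encoding and then decoding with the charset gives back the same text.
    static boolean roundTrips(String text, Charset charset) {
        return new String(text.getBytes(charset), charset).equals(text);
    }

    public static void main(String[] args) {
        // Three Arabic letters, written with escapes so this file stays ASCII-only.
        String arabic = "\u0631\u0642\u0645";
        System.out.println(toEscapes(arabic));
        System.out.println(roundTrips(arabic, StandardCharsets.UTF_8));
        System.out.println(roundTrips(arabic, StandardCharsets.ISO_8859_1));
    }
}
```

A charset that cannot represent Arabic, such as ISO-8859-1, replaces each letter with a substitute byte when encoding, which is one way question marks can end up in the output.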
For the examples shown in the screen shots, I saved the following files using UTF-8 encoding:
This is what you'll find in the file arabic.html:
<html>
<body style="font-family: Noto Naskh Arabic">
<p>رقم التعميم رقم التعميم</p>
<p>رقم التعميم</p>
</body>
</html>
This is what you'll find in the file arabic2.html:
<html>
<body style="font-family: Noto Naskh Arabic">
<table>
<tr>
<td dir="rtl">رقم التعميم رقم التعميم</td>
<td dir="rtl">رقم التعميم</td>
</tr>
</table>
</body>
</html>
The second part of your problem concerns the font. It is important that you use a font that knows how to draw Arabic glyphs. It is hard to believe that you have arial.ttf right at the root of your C: drive. That's not a good idea. I would expect you to use C:/windows/fonts/arialuni.ttf which certainly knows Arabic glyphs.
Selecting the font isn't sufficient. Your HTML needs to know which font family to use. Because most of the examples in the documentation use Arial, I decided to use a NOTO font. I discovered these fonts by reading this question: iText pdf not displaying Chinese characters when using NOTO fonts or Source Hans. I really like these fonts because they are nice and (almost) every language is supported. For instance, I used NotoNaskhArabic-Regular.ttf, which means that I need to define the font family like this:
style="font-family: Noto Naskh Arabic"
I defined the style in the body tag of my HTML, but you can choose where to define it: in an external CSS file, in the styles section of the <head>, at the level of a <td> tag, and so on. That choice is entirely yours, but you have to define somewhere which font to use.
Of course: when XML Worker encounters font-family: Noto Naskh Arabic, iText doesn't know where to find the corresponding NotoNaskhArabic-Regular.ttf unless we register that font. We can do this, by creating an instance of the FontProvider interface. I chose to use the XMLWorkerFontProvider, but you're free to write your own FontProvider implementation:
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");
There is one more hurdle to take: Arabic is written from right to left. I see that you want to define the run direction at the level of the PdfPCell and that you add the HTML content to this cell using an ElementList. That's why I first wrote a similar example, named ParseHtml7:
public void createPdf(String file) throws IOException, DocumentException {
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
// step 3
document.open();
// step 4
// Styles
CSSResolver cssResolver = new StyleAttrCSSResolver();
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
// HTML
HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// Pipelines
ElementList elements = new ElementList();
ElementHandlerPipeline pdf = new ElementHandlerPipeline(elements, null);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
PdfPTable table = new PdfPTable(1);
PdfPCell cell = new PdfPCell();
cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
for (Element e : elements) {
cell.addElement(e);
}
table.addCell(cell);
document.add(table);
// step 5
document.close();
}
There is no table in the HTML, but we create our own PdfPTable, we add the content from the HTML to a PdfPCell with run direction RTL, and we add this cell to the table, and the table to the document.
Maybe that's your actual requirement, but why would you do this in such a convoluted way? If you need a table, why don't you create that table in HTML and define some cells as RTL, like this:
<td dir="rtl">...</td>
That way, you don't have to create an ElementList, you can just parse the HTML to PDF as is done in the ParseHtml8 example:
public void createPdf(String file) throws IOException, DocumentException {
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
// step 3
document.open();
// step 4
// Styles
CSSResolver cssResolver = new StyleAttrCSSResolver();
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// Pipelines
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
// step 5
document.close();
}
There is less code needed in this example, and when you want to change the layout, it's sufficient to change the HTML. You don't need to change your Java code.
One more example: in ParseHtml9, I create a table with an English name in one column ("Lawrence of Arabia") and the Arabic translation in the other column ("لورانس العرب"). Because I need different fonts for English and Arabic, I define the font at the <td> level:
<table>
<tr>
<td>Lawrence of Arabia</td>
<td dir="rtl" style="font-family: Noto Naskh Arabic">لورانس العرب</td>
</tr>
</table>
For the first column, the default font is used and no special settings are needed to write from left to right. For the second column, I define an Arabic font and I set the run direction to "rtl".
The result looks like this:
That's much easier than what you're trying to do in your code.

Related

iText 7 can not set margin

I have an HTML string that I need to convert to PDF, but the PDF must have a specific page size and margins. I followed the example, and the PDF now has the width and height that I set, but I can't change or remove the margin. Please help.
using (FileStream fs = new FileStream(somePDFFile, FileMode.OpenOrCreate, FileAccess.Write))
{
iText.Kernel.Pdf.PdfWriter pdfWriter = new iText.Kernel.Pdf.PdfWriter(fs);
iText.Kernel.Pdf.PdfDocument pdfDoc = new iText.Kernel.Pdf.PdfDocument(pdfWriter);
var v = pdfDoc.GetDefaultPageSize().ApplyMargins<iText.Kernel.Geom.Rectangle>(1, 1, 1, 1, true);
pdfDoc.GetDefaultPageSize().SetWidth(250f);
pdfDoc.GetDefaultPageSize().SetHeight(200f);
pdfDoc.GetCatalog().SetLang(new iText.Kernel.Pdf.PdfString("en-US"));
//Set the document to be tagged
pdfDoc.SetTagged();
iText.Html2pdf.ConverterProperties props = new iText.Html2pdf.ConverterProperties();
iText.Html2pdf.HtmlConverter.ConvertToPdf(htmlString, pdfDoc, props);
pdfDoc.Close();
}

I searched for an answer, but I could only find this approach:
public void createPdf(String src, String dest) throws IOException {
ConverterProperties properties = new ConverterProperties();
properties.setBaseUri(new File(src).getParent());
List<IElement> elements =
HtmlConverter.convertToElements(new FileInputStream(src), properties);
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
pdf.setTagged();
Document document = new Document(pdf);
document.setMargins(100, 50, 50, 100);
for (IElement element : elements) {
document.add((IBlockElement)element);
}
document.close();
}
In other words: I convert the HTML to a list of elements, and I then add those elements to a Document for which I define a margin.
My preferred solution would have been to define the margin at the level of the <body> tag as done in How to margin the body of the page (html)? Unfortunately, I noticed that this isn't supported yet (and I made a ticket for the iText development team to fix this).
I also tried the convertToDocument() method, but I wasn't able to set immediateFlush to false. I also asked the team to look into this.
Maybe there's also a property that could be introduced, although I'm not all that sure if this should be a ConverterProperties property, a PdfDocument property, or a PdfWriter property.
Update:
You could use the @page rule in CSS to define the margins. For instance:
<style>
@page {
margin-top: 200pt;
}
</style>
This creates a PDF with a top margin of 200pt.
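If the margins are only known at run time, the style block can be generated before the HTML is handed to the converter. The sketch below is an assumption-laden illustration: PageMarginCss, pageRule and wrap are hypothetical names, and it assumes the converter honours the four-value margin shorthand inside an @page rule (the answer above only confirms margin-top).

```java
import java.util.Locale;

// Hypothetical helper for illustration only; it is not part of iText or pdfHTML.
final class PageMarginCss {

    // Builds an @page rule with margins in points, in CSS order: top, right, bottom, left.
    static String pageRule(float top, float right, float bottom, float left) {
        // Locale.ROOT keeps the decimal separator a dot, whatever the default locale is.
        return String.format(Locale.ROOT,
                "@page { margin: %.1fpt %.1fpt %.1fpt %.1fpt; }",
                top, right, bottom, left);
    }

    // Wraps an HTML body fragment in a document whose head carries the given rule.
    static String wrap(String bodyHtml, String pageRule) {
        return "<html><head><style>" + pageRule + "</style></head><body>"
                + bodyHtml + "</body></html>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("<p>Hello</p>", pageRule(200f, 0f, 0f, 0f)));
    }
}
```

The resulting string would then take the place of htmlString in the conversion call shown in the question.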

Why on splitting table to new page page padding is changed?

Why on splitting table to new page page padding/margins is/are changed?
See what I mean:
Code:
//Some logic to get data.
PdfPTable table = new PdfPTable(cols);
table.setWidthPercentage(100);
table.setHorizontalAlignment(Element.ALIGN_JUSTIFIED_ALL);
Phrase headerText = new Phrase(header);
headerText.setFont(FontFactory.getFont(FontFactory.COURIER_BOLD,14.6f));
PdfPCell headerRow = new PdfPCell(headerText);
headerRow.setColspan(7);
headerRow.setBackgroundColor(BaseColor.LIGHT_GRAY);
headerRow.setHorizontalAlignment(Element.ALIGN_CENTER);
headerRow.setPadding(5);
table.addCell(headerRow);
Set<Integer> keys = data.keySet();
double sum = 0;
for (Integer key : keys) {
//There data is added in table...
}
//generate pdf
Document document = new Document();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfWriter.getInstance(document,byteArrayOutputStream);
document.open();
document.setMargins(5,5,5,5);
document.add(table);
document.add(p);
document.add(paragraph);
document.addCreationDate();
document.addTitle("Tenant activity");
document.close();
logger.debug("Pdf generated");
File f = new File("activity.pdf");
logger.debug("File path: " + f.getAbsolutePath());
How can I set padding for page / margin for table same as on first page, to each page?

This is the wrong order:
document.open();
document.setMargins(5,5,5,5);
This is the right order:
document.setMargins(5,5,5,5);
document.open();
When you open the document, or when you invoke document.newPage(), the next page is initialized, and you can't change page properties such as size or margins of that page.
So if you change the page size or margins, those changes will only be valid on the next page, not on the current page.
Why is this? Well, this is PDF, everything is page based, and once a page has been initialized, you'd get some really weird side-effects if you change those properties while adding content.

iText- Appending arabic text in pdf table cell phrase at different positions in a page

I want to make a PDF report with English and Arabic text. I have many tables/phrases across the page, and I want to display Arabic text along with the English. I have seen the Arabic example in the iText documentation that uses ColumnText, but I couldn't make it work for my case. My question is how to choose the arguments of canvas.setSimpleColumn(36, 750, 559, 780) for tables/phrases at different positions. I have also referred to the questions below, but I still have issues.
Writing Arabic in pdf using itext,
http://developers.itextpdf.com/examples/font-examples/language-specific-exampleshe
Below is my code..
private static final String ARABIC = "\u0627\u0644\u0633\u0639\u0631 \u0627\u0644\u0627\u062c\u0645\u0627\u0644\u064a";
private static final String FONT = "resources/fonts/ARIALUNI.TTF";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("test.pdf"));
document.open();
Font f = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
PdfPTable table = new PdfPTable(3);
Phrase phrase = new Phrase();
Chunk chunk = new Chunk("test value", inlineFont);
phrase.add(chunk);
// I want to add Arabic text also here..but direction is not //correct.also coming as single alphabets
p.add(new Chunk(ARABIC, f));
PdfPCell cell1 = new PdfPCell(phrase);
cell1.setFixedHeight(50f);
table.addCell(cell1);
document.add(table);
document.close();

Your code is kind of sloppy.
For example:
you define a PdfPTable with 3 columns, but you only add a single cell. That table will never be rendered.
you define a Phrase with name phrase, but later in your code you use p.add(...). There is no variable with name p in your code.
...
This lack of respect for the StackOverflow reader can result in not getting an answer, because you are expecting the reader not only to fix the actual problem (not being able to use English and Arabic text in a single PdfPCell), but also to fix all the other (avoidable) errors in your code.
This is a working example:
public static final String FONT = "resources/fonts/NotoNaskhArabic-Regular.ttf";
public static final String ARABIC = "\u0627\u0644\u0633\u0639\u0631 \u0627\u0644\u0627\u062c\u0645\u0627\u0644\u064a";
public void createPdf(String dest) throws IOException, DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();
Font f = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
PdfPTable table = new PdfPTable(1);
Phrase phrase = new Phrase();
Chunk chunk = new Chunk("test value");
phrase.add(chunk);
phrase.add(new Chunk(ARABIC, f));
PdfPCell cell = new PdfPCell(phrase);
cell.setUseDescender(true);
cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
table.addCell(cell);
document.add(table);
document.close();
}
The result looks like this:
As you can see, both the English and the Arabic text can be read fine. You may be surprised by the alignment and the order of the text. As we are working in the Right-to-Left writing system, left and right are switched. By default, text is left aligned, but as soon as we introduce the RTL run direction, this changes to right aligned.
In your code, you add the English text first, followed by the Arabic text. Text in Arabic is read from right to left. That's why you see the English text to the right, and why the Arabic text is added to the left of the English text.
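The ordering described above follows from the Unicode bidirectional algorithm and can be inspected without iText. The sketch below uses the JDK class java.text.Bidi purely as an illustration; iText 5 has its own bidi implementation, and the class name BidiRuns and its methods are hypothetical. It analyses the same phrase ("test value" followed by the Arabic string) as a right-to-left paragraph and lists the directional runs in logical order.

```java
import java.text.Bidi;

// Illustration only: this uses the JDK's java.text.Bidi, not iText's own bidi code.
final class BidiRuns {

    static final String ARABIC =
            "\u0627\u0644\u0633\u0639\u0631 \u0627\u0644\u0627\u062c\u0645\u0627\u0644\u064a";

    // Analyses the text as a paragraph whose base direction is right to left.
    static Bidi analyseRtl(String text) {
        return new Bidi(text, Bidi.DIRECTION_RIGHT_TO_LEFT);
    }

    // Describes each directional run, in logical (typing) order.
    static String describe(Bidi bidi) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            // An odd embedding level means the run is laid out right to left.
            String dir = (bidi.getRunLevel(i) % 2 == 1) ? "RTL" : "LTR";
            sb.append(dir)
              .append('[').append(bidi.getRunStart(i))
              .append(',').append(bidi.getRunLimit(i)).append(") ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe(analyseRtl("test value" + ARABIC)));
    }
}
```

In a right-to-left paragraph the runs are placed starting from the right edge, so the first logical run (the English text) sits on the right and the Arabic run follows to its left, while the letters inside the English run still read left to right.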
All of this has been improved in iText 7. iText 7 has an extra pdfCalligraph module that takes care of other writing systems in a transparent way.

iText: Importing styled Text and informations from an existing PDF

I'm generating PDFs using iText and it works fine. But I need a way to import HTML-styled information from an existing PDF at some point.
I know I could just use the XMLWorker class to generate the text directly from HTML in my own document. But because I'm not sure whether it actually supports all HTML features, I'm looking for a way around this.
Therefore a PDF is generated from HTML using XSLT. The content of this PDF should then be copied to my document.
There are two ways described in the book ("iText in Action").
One parses the PDF and gets you the text (or other information) from the document using PdfReaderContentParser and TextExtractionStrategy.
It looks like this:
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
TextExtractionStrategy strategy;
for(int i=1;i<=reader.getNumberOfPages();i++){
strategy = parser.processContent(i, new LocationTextExtractionStrategy());
document.add(new Chunk(strategy.getResultantText()));
}
But this only prints plain text to the document. Obviously there are more extraction strategies, and maybe one of them does exactly what I want, but I couldn't find it yet.
The second way is to copy an itextpdf.text.Image of each page of the PDF to your document. This is obviously not a good idea, because it will add the entire page to your document even if there is only one line of text in the existing PDF. It's done like this:
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(RESULT));
PdfReader reader = new PdfReader(pdf);
PdfImportedPage page;
for(int i=1;i<=reader.getNumberOfPages();i++){
page = writer.getImportedPage(reader,i);
document.add(Image.getInstance(page));
}
Like I said, this copies all the empty lines at the end of the PDF as well, but I need to continue my text immediately after the last line of text.
If I could convert this itext.text.Image into a java.awt.BufferedImage, I could use getSubimage() together with information I can extract from the PDF to cut away all the empty lines. But I wasn't able to find a way to do this.
These are the two ways I found. Because neither of them is suitable for my purpose as is, my question is:
Is there a way to import everything except the empty lines at the end, but including text-style information, tables and everything else, from a PDF to my document using iText?

You can trim away empty space of the XSLT generated PDF and then import the trimmed pages as in your code.
Example code
The following code borrows from the code in my answer to Using iTextPDF to trim a page's whitespace. In contrast to the code there, though, we have to manipulate the media box, not the crop box, because this is the only box respected by PdfWriter.getImportedPage.
Before importing a page from a given PdfReader, crop it using this method:
static void cropPdf(PdfReader reader) throws IOException
{
int n = reader.getNumberOfPages();
for (int i = 1; i <= n; i++)
{
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MarginFinder finder = parser.processContent(i, new MarginFinder());
Rectangle rect = new Rectangle(finder.getLlx(), finder.getLly(), finder.getUrx(), finder.getUry());
PdfDictionary page = reader.getPageN(i);
page.put(PdfName.MEDIABOX, new PdfArray(new float[]{rect.getLeft(), rect.getBottom(), rect.getRight(), rect.getTop()}));
}
}
(excerpt from ImportPageWithoutFreeSpace.java)
The extended render listener MarginFinder is taken as is from the question linked to above. You can find a copy here: MarginFinder.java.
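The cropping step boils down to finding the smallest rectangle that covers everything drawn on the page. The sketch below shows that idea in isolation and is not the linked MarginFinder: BoundsAccumulator is a hypothetical stand-in, and it assumes the real listener works by uniting the bounding boxes of the content it is notified about. Only the getter names mirror those used in cropPdf.

```java
// Hypothetical stand-in for illustration; the real MarginFinder is an iText render listener.
final class BoundsAccumulator {

    private boolean empty = true;
    private float llx, lly, urx, ury;

    // Grows the bounding box so that it also covers the rectangle with the given corners.
    void add(float x1, float y1, float x2, float y2) {
        float minX = Math.min(x1, x2), maxX = Math.max(x1, x2);
        float minY = Math.min(y1, y2), maxY = Math.max(y1, y2);
        if (empty) {
            llx = minX; lly = minY; urx = maxX; ury = maxY;
            empty = false;
        } else {
            llx = Math.min(llx, minX);
            lly = Math.min(lly, minY);
            urx = Math.max(urx, maxX);
            ury = Math.max(ury, maxY);
        }
    }

    boolean isEmpty() { return empty; }
    float getLlx() { return llx; }
    float getLly() { return lly; }
    float getUrx() { return urx; }
    float getUry() { return ury; }

    public static void main(String[] args) {
        BoundsAccumulator bounds = new BoundsAccumulator();
        bounds.add(100f, 700f, 200f, 712f); // e.g. one line of text
        bounds.add(50f, 650f, 180f, 662f);  // e.g. a second line further down
        System.out.println(bounds.getLlx() + " " + bounds.getLly() + " "
                + bounds.getUrx() + " " + bounds.getUry());
    }
}
```

The four resulting values play the role of the coordinates that cropPdf writes into the page's media box.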
Example run
Using this code
PdfReader readerText = new PdfReader(docText);
cropPdf(readerText);
PdfReader readerGraphics = new PdfReader(docGraphics);
cropPdf(readerGraphics);
try ( FileOutputStream fos = new FileOutputStream(new File(RESULT_FOLDER, "importPages.pdf")))
{
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, fos);
document.open();
document.add(new Paragraph("Let's import 'textOnly.pdf'", new Font(FontFamily.HELVETICA, 12, Font.BOLD)));
document.add(Image.getInstance(writer.getImportedPage(readerText, 1)));
document.add(new Paragraph("and now 'graphicsOnly.pdf'", new Font(FontFamily.HELVETICA, 12, Font.BOLD)));
document.add(Image.getInstance(writer.getImportedPage(readerGraphics, 1)));
document.add(new Paragraph("That's all, folks!", new Font(FontFamily.HELVETICA, 12, Font.BOLD)));
document.close();
}
finally
{
readerText.close();
readerGraphics.close();
}
(excerpt from unit test method testImportPages in ImportPageWithoutFreeSpace.java)
I imported both the page from the docText document
and the page from the docGraphics document
into a new document with some text before, between, and after. The result:
As you can see, source styles are preserved but free space around is discarded.

RTL not working in pdf generation with itext 5.5 for Arabic text

I have Java code that writes Arabic characters with the help of the iText 5.5 and XML Worker jars, but the text is written left to right even after calling writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL).
Code used is:
public class CreateArabic extends DefaultHandler {
/** Paths to and encodings of fonts we're going to use in this example */
public static String[][] FONTS = {
{"C:/arialuni.ttf", BaseFont.IDENTITY_H},
{"C:/abserif4_5.ttf", BaseFont.IDENTITY_H},
{"C:/damase.ttf", BaseFont.IDENTITY_H},
{"C:/fsex2p00_public.ttf", BaseFont.IDENTITY_H}
};
/** Holds the fonts that can be used for the peace message. */
public FontSelector fs;
public CreateArabic() {
fs = new FontSelector();
for (int i = 0; i < FONTS.length; i++) {
fs.addFont(FontFactory.getFont(FONTS[i][0], FONTS[i][1], BaseFont.EMBEDDED));
}
}
public static void main(String args[]) {
try {
// step 1
Rectangle pagesize = new Rectangle(8.5f * 72, 11 * 72);
Document document = new Document();//pagesize, 72, 72, 72, 72);// step1
PdfWriter writer = PdfWriter.getInstance(document,
new FileOutputStream("c:\\report.pdf"));
writer.setInitialLeading(200.5f);
//writer.getAcroForm().setNeedAppearances(true);
//writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
document.open();
FontFactory.registerDirectories();
Font font = FontFactory.getFont("C:\\damase.ttf",
BaseFont.IDENTITY_H, true, 22, Font.BOLD);
// step 3
document.open();
// step 4
XMLWorkerHelper helper = XMLWorkerHelper.getInstance();
// CSS
CSSResolver cssResolver = new StyleAttrCSSResolver();
CssFile cssFile = helper.getCSS(new FileInputStream(
"D:\\Itext_Test\\Test\\src\\test.css"));
cssResolver.addCss(cssFile);
// HTML
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider();
// fontProvider.addFontSubstitute("lowagie", "garamond");
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
HtmlPipelineContext htmlContext = new HtmlPipelineContext(
cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// // Pipelines
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver,
html);
writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
System.out.println("RUN DIRECTION --> "+writer.getRunDirection());
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker,Charset.forName("UTF-8"));
String htmlString2 = "<html><body style=\"color:red;\">Hello"+"??"+"</body></html>";
String htmlString = "<body style='font-family:arial;'>h"+"??"+"<p style='font-family:arial;' > ????? </p></body>";
String html1 ="<html><head></head><body>Hello <p style=\"color:red\" >oo ??</p> World! \u062a\u0639\u0637\u064a \u064a\u0648\u0646\u064a\u0643\u0648\u062f \u0631\u0642\u0645\u0627 \u0641\u0631\u064a\u062f\u0627 \u0644\u0643\u0644 \u062d\u0631\u0641 "+htmlString+"Testing</body></html>";
ByteArrayInputStream is = new ByteArrayInputStream(htmlString.getBytes("UTF-8"));
p.detectEncoding(is);
p.parse(is, Charset.forName("UTF-8"));//.parse(is, "UTF-8");//parse(is);//ASMO-708
// step 5
document.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
Output file is also attached.

As documented, this is not supposed to work:
writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
Arabic (and Hebrew) can only be rendered correctly in the context of ColumnText and PdfPCell. In other words: if you want to use Arabic from XML Worker, you need to create an ElementList and then add the elements to a ColumnText object as is done here.
You need to set the run direction at the level of the ColumnText object.

//This solution works for me: :)
// document
Document document = new Document(PageSize.LEGAL);
//fonts
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("/Users/ibrahimbakhsh/Library/Fonts/tradbdo.ttf", BaseFont.IDENTITY_H);
fontProvider.register("/Users/ibrahimbakhsh/Library/Fonts/trado.otf", BaseFont.IDENTITY_H);
fontProvider.register("/Users/ibrahimbakhsh/Library/Fonts/tahoma.ttf", BaseFont.IDENTITY_H);
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// CSS
CSSResolver cssResolver =
XMLWorkerHelper.getInstance().getDefaultCssResolver(true);
// Pipelines
ElementList elements = new ElementList();
ElementHandlerPipeline end = new ElementHandlerPipeline(elements, null);
HtmlPipeline html = new HtmlPipeline(htmlContext, end);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// HTML
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
htmlContext.autoBookmark(false);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new FileInputStream(HTML));
//writer
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
writer.setInitialLeading(12.5f);
writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
// step 4
document.open();
// step 5
for (Element e : elements) {
//out.println(e.toString());
if(e instanceof PdfPTable){
PdfPTable t = (PdfPTable) e;
t.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
ArrayList<PdfPRow> rows = t.getRows();
for(PdfPRow row:rows){
PdfPCell[] cells = row.getCells();
for(PdfPCell cell:cells){
cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
}
}
e = t;
}
document.add(e);
}
//try adding new table
PdfPTable table = new PdfPTable(1);
table.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
Font f = new Font(BaseFont.createFont("/Users/ibrahimbakhsh/Library/Fonts/trado.otf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED));
PdfPCell cell = new PdfPCell(new Paragraph("تجربة نص عربي",f));
table.addCell(cell);
document.add(table);
// step 6
document.close();

For developers who need a straightforward solution:
I used this trick and the output is very clean and nice!
Create a PdfPTable with 1 column.
For every paragraph of your content, create a Paragraph object and set its alignment to Paragraph.ALIGN_JUSTIFIED (I don't know why, but it causes the paragraph to align to the right of the page!).
Create a PdfPCell, remove its borders using setBorder(Rectangle.NO_BORDER), and add the paragraph to the cell.
Add the cell to the table.
Here is a code sample for your convenience.
public void main(){
/*
* create and initiate document
* */
// repeat this for all your paragraphs
PdfPTable pTable = new PdfPTable(1);
Paragraph paragraph = getCellParagraph();
paragraph.add("your RTL content");
PdfPCell cell = getPdfPCellNoBorder(paragraph);
pTable.addCell(cell);
// after add all your content
document.add(pTable);
}
private Paragraph getCellParagraph() {
Paragraph paragraph = new Paragraph();
paragraph.setAlignment(Paragraph.ALIGN_JUSTIFIED);
// set other styles you need like custom font
return paragraph;
}
private PdfPCell getPdfPCellNoBorder(Paragraph paragraph) {
PdfPCell cell = new PdfPCell();
cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
cell.setPaddingBottom(8);
cell.setBorder(Rectangle.NO_BORDER);
cell.addElement(paragraph);
return cell;
}
