iText converting incomplete Html file content to pdf using java

I am trying to convert an HTML file into a PDF using the iText lib (4.2.0). The problem is that it's not printing all the HTML content to the PDF; it only prints part of the data. Here is the code that converts the HTML to PDF:
InputStream il = new FileInputStream("/tmp/test.html");
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("/tmp/pdf.pdf"));
writer.setInitialLeading(12.5f);
// step 3
document.open();
// step 4: build the CSS -> HTML -> PDF pipeline and parse the HTML
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// CSS
CSSResolver cssResolver = new StyleAttrCSSResolver();
InputStream csspathtest = new FileInputStream("/tmp/test.css");
CssFile cssfiletest = XMLWorkerHelper.getCSS(csspathtest);
cssResolver.addCss(cssfiletest);
Pipeline<?> pipeline = new CssResolverPipeline(cssResolver,
        new HtmlPipeline(htmlContext,
                new PdfWriterPipeline(document, writer)));
XMLWorker worker = new XMLWorker(pipeline, true);
XMLParser p = new XMLParser(worker);
p.parse(il);
// step 5: close the document and release the input streams
document.close();
il.close();
csspathtest.close();
Here is the sample HTML file: http://codepaste.net/65kmhp
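XMLWorker treats its input as XHTML, so one quick sanity check is whether /tmp/test.html is well-formed XML at all; unclosed tags such as `<br>` or HTML-only entities such as `&nbsp;` are plausible culprits when only part of a page comes through (a hypothesis to test against the sample file, not a confirmed diagnosis). The sketch below uses only the JDK's DOM parser; `XhtmlCheck` and `firstWellFormednessError` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XhtmlCheck {

    /** Returns null if the markup is well-formed XML, otherwise where and why parsing first failed. */
    public static String firstWellFormednessError(String markup) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Don't fetch an external DTD named in a DOCTYPE (feature supported by the JDK's built-in parser).
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // Replace the default handler, which prints "[Fatal Error]" to stderr; fatal errors still abort.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstWellFormednessError("<html><body><p>one<br/>two</p></body></html>"));
        System.out.println(firstWellFormednessError("<html><body><p>one<br>two</p></body></html>"));
    }
}
```

To check the real file, read it with `new String(Files.readAllBytes(Paths.get("/tmp/test.html")), StandardCharsets.UTF_8)` and pass the result in; a non-null message points at the first spot the parser rejects. Because no DTD is loaded, named HTML entities are reported as well, so treat the output as a hint rather than proof.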

Related

iText7 PdfHtml - Display Page Number at Footer

I am using iText7 pdfHTML to convert my HTML template into a PDF.
I want to display "Page X of Y" in my footer; however, I always get "Page 1 of 1" (there are 20 pages).
Below is my code for converting multiple HTML files into one PDF.
// Converter Class file
ByteArrayOutputStream outputBaos = new ByteArrayOutputStream();
PdfWriter writer = new PdfWriter(outputBaos);
PdfDocument pdf = new PdfDocument(writer);
PdfMerger merger = new PdfMerger(pdf);
...
for (String template : templates) {
ByteArrayOutputStream tempOutput = new ByteArrayOutputStream();
PdfWriter tempWriter = new PdfWriter(tempOutput);
PdfDocument tempPdf = new PdfDocument(tempWriter);
HtmlConverter.convertToPdf(template, tempPdf, converterProperties);
tempPdf = new PdfDocument(
new PdfReader(
new ByteArrayInputStream(tempOutput.toByteArray())
)
);
merger.merge(tempPdf, 1, tempPdf.getNumberOfPages());
tempPdf.close();
}
pdf.close();
return outputBaos;
// HTML file
@page {
@bottom-right {
content: "Page " counter(page) " of " counter(pages);
}
}
How can I achieve this?
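The symptom follows from the loop: each template is converted into its own temporary PdfDocument, so when pdfHTML evaluates `counter(pages)` it only sees that one template's pages, and the merged file keeps those per-template numbers. One common workaround is to drop the page-number rule from the per-template CSS and stamp the numbers onto the merged document afterwards, once the total is known. The JDK-only sketch below shows just the numbering arithmetic for that second pass; `PageLabels` and its methods are hypothetical, and the actual stamping onto the PDF pages is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class PageLabels {

    /**
     * Given the page count of each converted template (in merge order),
     * returns the "Page X of Y" label for every page of the merged document.
     */
    public static List<String> footerLabels(int[] pagesPerTemplate) {
        int total = 0;
        for (int n : pagesPerTemplate) {
            total += n;
        }
        List<String> labels = new ArrayList<>();
        int current = 0; // global page number across all merged templates
        for (int n : pagesPerTemplate) {
            for (int i = 0; i < n; i++) {
                current++;
                labels.add("Page " + current + " of " + total);
            }
        }
        return labels;
    }

    /** What a per-document counter(page) / counter(pages) produces for the same input. */
    public static List<String> perTemplateLabels(int[] pagesPerTemplate) {
        List<String> labels = new ArrayList<>();
        for (int n : pagesPerTemplate) {
            for (int i = 1; i <= n; i++) {
                labels.add("Page " + i + " of " + n);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        int[] pages = {1, 1, 2};
        System.out.println(perTemplateLabels(pages)); // [Page 1 of 1, Page 1 of 1, Page 1 of 2, Page 2 of 2]
        System.out.println(footerLabels(pages));      // [Page 1 of 4, Page 2 of 4, Page 3 of 4, Page 4 of 4]
    }
}
```

In the merge loop, the per-template counts are the `tempPdf.getNumberOfPages()` values already passed to `merger.merge`, and `pdf.getNumberOfPages()` gives the total once all templates are merged.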

ă ș ț characters missing from pdf generated from html with PdfWriter

I am trying to convert some HTML content to a PDF using iText's PdfWriter, like this:
Document document = new Document();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
InputStream stream = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
XMLWorkerHelper.getInstance().parseXHtml(writer, document, stream, Charset.forName("UTF-8"));
document.close();
but the ă ș ț characters are missing from the generated PDF. I have tried setting the encoding and the font, with no luck; in both cases I used a font provider and passed it as a parameter to the parseXHtml method.
First I set the encoding, but nothing changed:
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider();
fontProvider.setUseUnicode(true);
fontProvider.defaultEncoding = BaseFont.CP1257;
I also tried setting the font, but it was not applied to the PDF:
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register(PATH_TO_TTF_FONT_FILE_HOSTED_ON_S3);
and then passed the font provider to parseXHtml:
XMLWorkerHelper.getInstance().parseXHtml(writer, document, stream, Charset.forName("UTF-8"), fontProvider);
Is there any way I could use the PdfWriter to convert all characters correctly from html to pdf?
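The attempts point at the encoding/font pair rather than the parser. CP1257 is the Baltic Windows code page and contains none of ă (U+0103), ș (U+0219) or ț (U+021B), so making it the default encoding cannot make them appear; a single-byte code page only ever covers 256 characters. The usual direction with iText is a TrueType font that really contains these glyphs, used with `BaseFont.IDENTITY_H`. The JDK-only sketch below (hypothetical class `EncodingCoverage`) lists which characters of a string a given charset cannot represent, a quick way to rule a code page in or out before wiring it into a font provider.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

public class EncodingCoverage {

    /** Returns the characters of {@code text} that the named charset cannot represent. */
    public static List<String> unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        List<String> missing = new ArrayList<>();
        // Walk by code point so characters outside the BMP are checked as a unit.
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch)) {
                missing.add(ch);
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        String romanian = "ă ș ț";
        System.out.println("ISO-8859-1 cannot encode: " + unencodable(romanian, "ISO-8859-1"));
        // windows-1257 is Java's name for Cp1257; guard in case this JVM lacks it.
        if (Charset.isSupported("windows-1257")) {
            System.out.println("windows-1257 cannot encode: " + unencodable(romanian, "windows-1257"));
        }
        System.out.println("UTF-8 cannot encode: " + unencodable(romanian, "UTF-8"));
    }
}
```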

itext xmlworker doesn't work with custom font family

My application parses HTML and puts the content in a PDF; now it needs to support multiple fonts, e.g. Myanmar and Chinese.
I am able to do this with phrases, but not with HTML, not even for a single font family.
Working code:
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream("/home/a2.pdf"));
Font font = FontFactory.getFont("KozMinPro-Regular", "UniJIS-UCS2-H", false);
Font font1 = FontFactory.getFont("/home/mm3.ttf", BaseFont.IDENTITY_H, false);
FontSelector fs = new FontSelector();
fs.addFont(font);
fs.addFont(font1);
Phrase phrase = fs.process("長");
phrase.add(fs.process("၀န္ထမ္းလစလႊာ( လ )"));
phrase.add(fs.process("123aab"));
document.open();
document.add(phrase);
document.close();
I tried with XMLWorker, but it didn't work:
final MyFontFactory fontFactory = new MyFontFactory();
FontFactory.register("/home/Downloads/mm3.ttf");
FontFactory.setFontImp(fontFactory);
final HtmlPipelineContext htmlContext = new HtmlPipelineContext(new CssAppliersImpl(fontFactory));
final CSSResolver cssResolver = XMLWorkerHelper.getInstance().getDefaultCssResolver(true);
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("/home/a2.pdf"));
document.open();
String str="<span style=\"font-family: &quot;Open Sans&quot;, sans-serif; white-space: pre-wrap; background-color: rgb(255, 255, 255);\">Deduc၀န္ထမ္းလစာေပးေခ်လႊာ( လစာငွေ )tions哈罗</span> ";
XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
InputStream is = new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_8));
worker.parseXHtml(writer, document, is, Charset.forName("UTF-8"));
// step 5
document.close();
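The working version relies on FontSelector, which, roughly, gives each character the first added font that has a glyph for it and groups consecutive characters sharing a font into one chunk. The XMLWorker attempt has no such step: the span only names "Open Sans", and the `htmlContext` built with `MyFontFactory` is never handed to the parser, because the final call is the four-argument `parseXHtml`. The JDK-only sketch below imitates the selection loop so the behaviour is easy to see. It uses Unicode scripts as a stand-in for "the font has a glyph" (a simplification; real code would ask the font itself), and `FontRuns`, `Run` and the font-to-script mapping are illustrative.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FontRuns {

    /** A run of consecutive characters assigned to the same (illustrative) font. */
    public static final class Run {
        final String font;
        final String text;
        Run(String font, String text) { this.font = font; this.text = text; }
        @Override public String toString() { return font + ":" + text; }
    }

    /**
     * Splits text into runs, giving each code point the first font (in insertion order)
     * whose supported scripts contain that code point's script. Characters no font
     * covers go to "missing", which is where a renderer would drop them.
     */
    public static List<Run> split(String text, Map<String, Set<Character.UnicodeScript>> fonts) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentFont = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            String chosen = "missing";
            for (Map.Entry<String, Set<Character.UnicodeScript>> e : fonts.entrySet()) {
                if (e.getValue().contains(script)) {
                    chosen = e.getKey();
                    break;
                }
            }
            // Start a new run whenever the chosen font changes.
            if (currentFont != null && !chosen.equals(currentFont)) {
                runs.add(new Run(currentFont, current.toString()));
                current.setLength(0);
            }
            currentFont = chosen;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (currentFont != null) {
            runs.add(new Run(currentFont, current.toString()));
        }
        return runs;
    }

    public static void main(String[] args) {
        Map<String, Set<Character.UnicodeScript>> fonts = new LinkedHashMap<>();
        fonts.put("KozMinPro-Regular", EnumSet.of(Character.UnicodeScript.HAN));
        fonts.put("mm3.ttf", EnumSet.of(Character.UnicodeScript.MYANMAR));
        fonts.put("Helvetica", EnumSet.of(Character.UnicodeScript.LATIN, Character.UnicodeScript.COMMON));
        System.out.println(split("長abc123", fonts)); // [KozMinPro-Regular:長, Helvetica:abc123]
    }
}
```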

How to change pdf font to Turkish-Style while converting HTML to PDF with ITEXT

I'm running this code block to convert an HTML page to a PDF document, but I don't see the Turkish characters in 'result.pdf'. My code:
try {
Rectangle pagesize = new Rectangle(800,1200);
final Document document = new Document(pagesize);
OutputStream os = new FileOutputStream("deneme.pdf");// ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document,os);
document.open();
HtmlCleaner cleaner = new HtmlCleaner();
CleanerProperties props = cleaner.getProperties();
TagNode rootNode = cleaner.clean("Source Html");
XmlSerializer serial = new PrettyXmlSerializer(props);
String htmlClean = serial.getAsString(rootNode);
System.out.println(htmlClean);//Tidy Html
CSSResolver cssResolver = XMLWorkerHelper.getInstance().getDefaultCssResolver(true);
/*
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider();
// fontProvider.setUseUnicode(true);
fontProvider.isRegistered("Helvetica");
fontProvider.addFontSubstitute("Helvetica", "Arial");
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
*/
// HTML
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
htmlContext.setImageProvider(new ImageProvider());
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
/*
BaseFont courier = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.EMBEDDED);
Font font = new Font(courier, 12, Font.NORMAL);
Chunk chunk = new Chunk("",font);
document.add(chunk);
*/
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new ByteArrayInputStream(htmlClean.getBytes("utf-8")));
document.close();
} catch (Exception e) {
e.printStackTrace();
}
I tried the code in the commented-out lines, but the result is the same (wrong).
How can I get the Turkish characters into the result?
When I tried this code block:
BaseFont freeSans = BaseFont.createFont("FreeSans.ttf","Cp1254", true);
Font font = new Font(freeSans,12, Font.NORMAL);
Chunk chunk = new Chunk("ŞşĞğİıÖö",font);
document.add(chunk);
I saw 'ŞşĞğİıÖö' in 'result.pdf'.
But how can I set this up for the XMLParser before parsing?
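The difference between the two experiments is the font's encoding. The working Chunk uses a font created with "Cp1254", the Windows Turkish code page, while a font chosen without an explicit encoding is typically created with Cp1252 (`BaseFont.WINANSI`), which has Ö and ö but not Ş, ş, Ğ, ğ, İ or ı. The JDK-only sketch below (hypothetical `PickEncoding.firstCovering`) returns the first candidate charset that can encode a whole string, which makes that gap visible; it does not touch iText.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

public class PickEncoding {

    /**
     * Returns the first candidate charset (supported by this JVM) that can encode
     * every character of the text, or null if none can.
     */
    public static String firstCovering(String text, List<String> candidates) {
        for (String name : candidates) {
            if (Charset.isSupported(name) && Charset.forName(name).newEncoder().canEncode(text)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String turkish = "ŞşĞğİıÖö";
        List<String> candidates = Arrays.asList("windows-1252", "windows-1254", "UTF-8");
        System.out.println(firstCovering(turkish, candidates));
    }
}
```

Java's `windows-1254` corresponds to the "Cp1254" name used in the question's `BaseFont.createFont` call; `windows-1252` is skipped here because it cannot encode the Turkish letters.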

How to add external CSS while generating PDF?

Currently I am using the following code to generate a PDF in a JSP file:
response.setContentType("application/force-download");
response.setHeader("Content-Disposition", "attachment;filename=reports.pdf");
Document document = new Document();
document.setPageSize(PageSize.A1);
PdfWriter writer = null;
writer = PdfWriter.getInstance(document, response.getOutputStream());
document.open();
ByteArrayInputStream bis = new ByteArrayInputStream(htmlSource.toString().getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, bis);
document.close();
With this I am able to generate the PDF, but I would like to add a CSS file while generating it. Please help.

I am not sure how to do this in Java, but in C# you can add external style sheet rules with this code:
StyleSheet css = new StyleSheet();
css.LoadTagStyle("body", "face", "Garamond");
css.LoadTagStyle("body", "encoding", "Identity-H");
css.LoadTagStyle("body", "size", "12pt");
Maybe this helps.
Regards,
vinit

Please take a look at the ParseHtmlTable1 example. In this example, we have HTML stored in a StringBuilder object and some CSS stored in a String. In my example, I convert the sb object and the CSS object to an InputStream. If you have files with the HTML and the CSS, you could easily use a FileInputStream.
Once you have an InputStream for the HTML and the CSS, you can use this code:
// CSS
CSSResolver cssResolver = new StyleAttrCSSResolver();
CssFile cssFile = XMLWorkerHelper.getCSS(new ByteArrayInputStream(CSS.getBytes()));
cssResolver.addCss(cssFile);
// HTML
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// Pipelines
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new ByteArrayInputStream(sb.toString().getBytes()));
Or, if you don't like all that code:
ByteArrayInputStream bis = new ByteArrayInputStream(htmlSource.toString().getBytes());
ByteArrayInputStream cis = new ByteArrayInputStream(cssSource.toString().getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, bis, cis);

You could also inline the stylesheet into the HTML before converting it. In PHP, for example:
$html .= '<style>'.file_get_contents(_BASE_PATH.'stylesheet.css').'</style>';
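The PHP line concatenates the stylesheet into the HTML inside a `<style>` element before conversion. A rough Java counterpart is sketched below with hypothetical helper names; it inserts the CSS just before `</head>`, or prepends it if there is none. Whether XMLWorker applies rules from an inline `<style>` block is worth verifying for your version; passing the CSS as its own stream, as in the ParseHtmlTable1 answer, is the more direct route.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineCss {

    private static final Pattern HEAD_END = Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    /** Inserts the CSS as a <style> element before </head>, or at the start if there is no </head>. */
    public static String inlineCss(String html, String css) {
        String style = "<style type=\"text/css\">\n" + css + "\n</style>\n";
        Matcher m = HEAD_END.matcher(html);
        if (m.find()) {
            int at = m.start();
            return html.substring(0, at) + style + html.substring(at);
        }
        return style + html;
    }

    /** File-based variant mirroring file_get_contents in the PHP line; the path is up to the caller. */
    public static String inlineCssFile(String html, String cssPath) throws IOException {
        String css = new String(Files.readAllBytes(Paths.get(cssPath)), StandardCharsets.UTF_8);
        return inlineCss(html, css);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Report</title></head><body><p>Hi</p></body></html>";
        System.out.println(inlineCss(html, "p { color: red; }"));
    }
}
```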

Change the content type to:
response.setContentType("application/pdf");
