How to convert html, css to PDF using Java? - java

I was wondering whether there are any good tutorials or approaches for converting HTML and XHTML web pages to PDF using Java.
Also, how can I convert HTML styled with Bootstrap CSS to PDF using Java?

There is a tool named wkhtmltopdf (https://wkhtmltopdf.org). We used it in our project.
The wkhtmltopdf command-line utility must be installed on the machine.
Download page: https://wkhtmltopdf.org/downloads.html
Note that it must also be installed on every server where your app will run.
Your code should fetch the HTML and the CSS. The CSS should be added to the HTML head in style tags. I'm not aware of a way to supply it as a file separate from the HTML.
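The inlining step could look roughly like this. It is only a sketch: the class name CssInliner and the method inlineCss are made up for illustration, and it assumes you already have the HTML and the CSS as strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CssInliner {
    // Matches the closing head tag regardless of case, e.g. </head> or </HEAD>.
    private static final Pattern HEAD_END =
            Pattern.compile("</head\\s*>", Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with the CSS embedded in a style element just before the closing head tag. */
    static String inlineCss(String html, String css) {
        String styleTag = "<style>\n" + css + "\n</style>\n";
        Matcher matcher = HEAD_END.matcher(html);
        if (!matcher.find()) {
            // No head element in the markup: prepend the style block as a fallback.
            return styleTag + html;
        }
        int headEnd = matcher.start();
        return html.substring(0, headEnd) + styleTag + html.substring(headEnd);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Report</title></head><body><h1>Hi</h1></body></html>";
        String css = "h1 { color: navy; }";
        System.out.println(inlineCss(html, css));
    }
}
```

For Bootstrap you would pass the contents of the Bootstrap stylesheet as the css argument.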
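The next step is to convert your HTML to a PDF. This code produces a byte[] containing your PDF.
// Pdf and Param appear to come from a Java wrapper around the wkhtmltopdf binary
// (the API matches the java-wkhtmltopdf wrapper); headerPath and footer are defined elsewhere.
String html = "<...> your html";
Pdf pdf = new Pdf();
pdf.addPageFromString(html);
// Extra command-line options that are passed through to wkhtmltopdf.
pdf.addParam(new Param("--disable-smart-shrinking"));
pdf.addParam(new Param("--header-html", headerPath));
pdf.addParam(new Param("--footer-html", footer.toAbsolutePath().toString()));
// Runs wkhtmltopdf and returns the generated PDF as a byte[].
return pdf.getPDF();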
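If you would rather not depend on a wrapper, the same thing can be done by calling the binary directly. This is a minimal sketch, assuming wkhtmltopdf is on the PATH; the class name WkhtmltopdfCli and its methods are hypothetical, and it needs Java 9 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class WkhtmltopdfCli {
    /** Builds the argument list; each argument is its own element, so no shell quoting is needed. */
    static List<String> buildCommand(Path htmlFile, Path pdfFile) {
        return List.of(
                "wkhtmltopdf",                 // assumption: the binary is on the PATH
                "--disable-smart-shrinking",
                htmlFile.toString(),
                pdfFile.toString());
    }

    /** Renders the HTML string to PDF bytes by running the wkhtmltopdf binary on temp files. */
    static byte[] render(String html) throws IOException, InterruptedException {
        Path htmlFile = Files.createTempFile("page", ".html");
        Path pdfFile = Files.createTempFile("page", ".pdf");
        try {
            Files.write(htmlFile, html.getBytes(StandardCharsets.UTF_8));
            Process process = new ProcessBuilder(buildCommand(htmlFile, pdfFile))
                    .inheritIO()               // forward wkhtmltopdf's log output to our console
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("wkhtmltopdf exited with code " + exitCode);
            }
            return Files.readAllBytes(pdfFile);
        } finally {
            // Clean up both temp files whether or not the conversion succeeded.
            Files.deleteIfExists(htmlFile);
            Files.deleteIfExists(pdfFile);
        }
    }
}
```

Going through temp files avoids having to juggle the process's stdin and stdout pipes at the same time.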

Related

HTML to PDF - java - batik

I am trying to export an HTML div tag (which also contains an SVG) to PDF using Batik. I can transform just the SVG to PDF, but with the div I get errors. Any suggestions on how to convert HTML content to PDF using Batik?
I referred to https://xmlgraphics.apache.org/batik/using/transcoder.html
Sample HTML that I am trying to use: <div id="toexport"><svg>MY CHART</svg> <div>OTHER CONTENTS like LEGENDS</div></div>

How to generate the PDF in CQ5.6.1 using page content in cq5

How can I generate a PDF in CQ 5.6.1 from page content?
My site has a "Generate PDF" button; when it is clicked, I have to generate a PDF file from the content of the same page.
Please let me know whether CQ has an out-of-the-box PDF generator, or whether I need to get a licensed product to generate the PDF.
Thanks.
Adobe CQ is integrated with Apache FOP, a formatter able to create PDF files. This tutorial describes how to enable a content rewriter that provides a PDF version of the content under the .pdf extension.
However, please keep in mind that this approach requires manually writing an XSLT transform file that can process your page (and every component on it) and output an XSL-FO document.
In a previous project (CQ 5.5) we used https://code.google.com/p/wkhtmltopdf/ to create PDF files. It worked pretty well!
I have used PhantomJS to create a custom PDF from CQ5 pages. For example, if you do not want to display the right trail in the PDF, or you want to disable the header and footer, all of this can be achieved with PhantomJS.
Create a servlet that executes a command on your server:
phantomjs <custom.js> 'page_url' nameofthepdf.pdf
Here custom.js shows or hides HTML content based on your needs.
This works for any page, whether it comes from CQ5 or any other tool.
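The part of the servlet that runs the command might look something like this. It is a sketch only: the class PhantomPdf, its method names, the URL check and the timeout are my own assumptions, and it assumes phantomjs is on the server's PATH.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class PhantomPdf {
    /** Builds the phantomjs command as separate arguments, so no shell is involved. */
    static List<String> buildCommand(String scriptPath, String pageUrl, String pdfName) {
        // Only accept web URLs, since pageUrl may come from a request parameter.
        if (!pageUrl.startsWith("http://") && !pageUrl.startsWith("https://")) {
            throw new IllegalArgumentException("unexpected URL: " + pageUrl);
        }
        return List.of("phantomjs", scriptPath, pageUrl, pdfName);
    }

    /** Runs phantomjs and fails if it does not finish in time or exits with an error. */
    static void generate(String scriptPath, String pageUrl, String pdfName, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(scriptPath, pageUrl, pdfName))
                .inheritIO()                   // forward phantomjs output to the server log/console
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();         // do not leave a hung browser process behind
            throw new IOException("phantomjs timed out after " + timeoutSeconds + " s");
        }
        if (process.exitValue() != 0) {
            throw new IOException("phantomjs exited with code " + process.exitValue());
        }
    }
}
```

The servlet would call generate(...) with the path to custom.js and then stream the resulting file back to the browser.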

Displaying Microsoft Office Document(word, excel etc) in JSF Page or HTML

I need to display documents in the web page instead of downloading them. There are HTML5 elements (object, iframe) for PDF, but not for Office documents. Is there a way to display Word, Excel, etc. documents in a JSF page or HTML?
Web browsers cannot natively display Microsoft Office documents.
The usual solution is to convert them to PDF on the server side, then let users view them (if they have a PDF plugin) or download the file.
If you are working with OpenXML documents (.docx, etc.), you can use an XML parser to obtain the content and display it as you wish.
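As a rough illustration of the XML parser route: a .docx file is a ZIP archive whose main body is stored in word/document.xml, with the text inside w:t elements grouped into w:p paragraphs. The sketch below uses only the JDK (Java 9+); the class name DocxText is made up, and it ignores tables, images and formatting.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class DocxText {
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of each paragraph (w:p) found in word/document.xml. */
    static List<String> paragraphs(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes stops at the end of the current ZIP entry.
                    return parse(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    private static List<String> parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations so untrusted uploads cannot trigger XXE.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document document = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        NodeList paragraphNodes = document.getElementsByTagNameNS(W_NS, "p");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < paragraphNodes.getLength(); i++) {
            // A paragraph's text is split across runs; join all w:t pieces.
            NodeList runs = ((Element) paragraphNodes.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder text = new StringBuilder();
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            result.add(text.toString());
        }
        return result;
    }
}
```

Each returned string could then be escaped and written into the page as a paragraph.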

HTML storage in Java

I want to download an HTML page, extract some useful text from it, convert the HTML to PDF, and then store the useful text and the PDF in a NoSQL store.
What is the most efficient way to pass the HTML to the module that extracts the useful text and the module that creates the PDF? I don't want to download the same HTML twice.
One way is to download the HTML to a local disk under a uniquely named folder and pass the path to the other modules so that they can process it.
This approach doesn't look good to me, as there is implementation overhead.
I would love to hold the entire HTML in a single variable so I can hand it to other modules and they can traverse it without loading it again. One idea that crossed my mind is to download and zip the HTML and related code/pictures, then store the binary in a byte[].
I haven't used these before, but a quick type search in Eclipse for "html" gave me this:
Class HTMLDocument
From the docs:
A document that models HTML. The purpose of this model is to support both browsing and editing
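A simpler option may be to keep the downloaded page in a plain String and hand that same String to both modules. The sketch below assumes Java 11+ for HttpClient; PagePipeline, Result and the two function parameters are hypothetical stand-ins for your own text-extraction and PDF modules.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.Function;

final class PagePipeline {
    /** Holds both outputs derived from a single download. */
    static final class Result {
        final String text;
        final byte[] pdf;

        Result(String text, byte[] pdf) {
            this.text = text;
            this.pdf = pdf;
        }
    }

    /** Downloads the page body once and returns it as a String. */
    static String download(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    /** Passes the same in-memory HTML to both modules, so nothing is fetched twice. */
    static Result process(String html,
                          Function<String, String> textExtractor,
                          Function<String, byte[]> pdfRenderer) {
        return new Result(textExtractor.apply(html), pdfRenderer.apply(html));
    }
}
```

You would call download(url) once and then process(html, extractor, renderer); the Result can be written to the NoSQL store in one go.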

Convert html to pdf with linked documents inline

I need to convert a bundle of static HTML documents into a single PDF file programmatically on the server side on a Java/J2EE platform, preferably using a batch process. The PDF files would be distributed to site users for offline browsing of the web pages.
The major points of the requirements are:
The banner at the top should not be present in the final PDF document.
The hyperlinks in the navigation bar on the left should be transformed into PDF bookmarks.
All hyperlinked content (HTML, PDF, DOC, DOCX, etc.) present in the web pages should be part of the final PDF document, with PDF bookmarks.
Is there any standard open source way of doing this?
Try Apache FOP. I just used it to convert XML to PDF and I think you can do the same with HTML/DOM. The website has a whole section on running FOP in a Java application and there's example code for DOM to PDF.
You can try iText - but I am not sure whether it handles all that you require.
Moreover, it is always better if you explore many options and then decide what you can and cannot do. In many cases there won't be any library/API that will out of the box support all that you ask for.
You can try www.alt-soft.com Xml2PDF for this
