Best way to convert HTML page into PDF file - java

I am writing a new service Convert-HTML-TO-PDF. But now I am confused that what way should I prefer.
What ways I have to implement:
Use Head-less browser and capture the HTML page and convert to PDF
Use Java/Node Lib to convert. Which will create HTML relevant component in PDF file and then render?
Now, please help me to understand what will be the best way to implement a service and why!
[update]
And what will be the advantages and disadvantages of each approach

In my view, the best way forward always depends on what you already have experience with and what approach you take. There is no right or wrong here, everyone has to decide that for themselves based on their preferences.
Each approach has advantages and disadvantages. Some of them are:
Headless Browser:
Advantage:
No large Libs necessary, therefore very memory saving
Disadvantage
the desired browser must be installed on the computer/server
rendering may differ for different browsers
Library:
Advantage:
different libraries available
for the popular libs there is a good documentation and code examples
Disadvantage
When upgrading to a newer version, code usually needs to be adapted.
When upgrading to a newer version, the result may look different.
In my projects I use a headless chrome browser. For this I found an easy to use api on Github, which uses the DevTools of Chrome.
It also includes a simple example how to print a page into a PDF.
For my purposes I have customized this example and write the HTML into a temporary file and then navigate to that file.
// Navigate to HTML-File
page.navigate(htmlTempFile.getAbsolutePath());
I can't say if this is the best way, but for me this was the easiest and most understandable way

Related

GWTCanvas doesn't work in IE8

We heavily use GWTCanvas in our project and it work excellent.
Except for IE8 in its standard document mode.
To solve this we have tried:
Update gwt-incubator to the latest version (2.1.0)
Patch GWTCanvas.java according to this link
But this didn't help. Has anybody make it works on IE8? Working and reliable solution/approach would be much appreciated.
UPDATE
It has been solved in this way:
patch excanvas.js and inject it into your GWT class
replace GWTCanvasImpl with your class via deffered binding in proper gwt.xml file
Hope this helps someone.
GWTCanvas uses the SVG specification to implement the vector objects on a given GWT site.
Although almost every other browser (Firefox, chrome and Opera and I am sure many others) have implemented the SVG one way or another, Ms does not support svg on a sufficient level yet.
Maybe including http://code.google.com/p/svgweb/ google javascript library implementation of SVG will solve your problems but then again it might not (have not tested it personally).
A different implementation of Vector Graphics fro GWT - gwt-graphics is another solution but again, if your vector objects are many, the emulation on IE gwt-graphics does make the applications not-responsive and just plain slow (personal experience).

How do I get the rendered web page page from a URL?

I don't want just the source code. I want the rendered page. This is an important distinction that I apparently cannot make by simply searching Google.
Does anyone know how I can get the rendered page from a URL?
This needs to be done in Java, hopefully without an extra library.
Another solution would be to use HTMLUnit which is a "GUI-less browser for JAVA". It is recommended by Google to generate snapshots of ajax-based webpages to make them crawlable.
You can try using a library that wraps a web browser, for example Berkelium. If you need it in Java, a Google search produced this Java wrapper API for Berkelium (I haven't tried it personally).
sites.google has an example of its use:

Simple way to apply template to existing HTML code

Suppose I have a bunch of (very simple) HTML pages, that I want to apply a common 'theme".
These files are downloaded using various Groovy scripts, and I would like to apply to them this styling during a maven build. How could I do that ?
Using which framework/library could I do that ?
Furthermore notice I want to do that in a static fashion, that's to say I want to have the following process to occur
Files are downloaded by Groovy scripts
They are processed (in a "magical" fashion) by this library
They may be sent by FTP/SCP to an hosting server
Do you know such an easy to use library ?
Depends on the details of the task but having in mind the steps you've described you can consider using velocity templates.
I would suggest using sitemesh decorator. I am a user of old version but a new release is being worked on that allows you to do exactly what you are asking for. Do a google search on sitemesh and you should find lots of examples.
In a nutshell sitemesh decoration: basic html + template = decorated page.

HTML with JS to Image/PDF

Is it possible to convert an html page with charts generated by javascript to an Image or PDF in Java?
I familiar with iText framework and it seems to be suitable but I am not sure how it handle JS generated things.
A quick search turned up this as a possible answer.
Using a library to convert to XSL-FO then another one to convert that to PDF.
Edit: This might interest you as well. There's a bit on some JBrowser class that seems to let you print web pages.
It depends on how the were generated. I suppose three possibilities:
Canvas tag.
You need to add a bit of JS code to get image using toDataURL canvas method
SVG.
You can add some code to get the full code of generated SVG document via innerHTML method.
Flash.
The worst case. I think it's hardly possible to achive what you want.
Solution 1
If you have access to plain HTML (taken after the JavaScript has executed and built the page), you can easily pass it to iText and convert it to PDF. I would recommend using Flyng-Saucer (which in turn uses iText) which has a very good and convenient API for this (See http://code.google.com/p/flying-saucer/ ).
Solution 2
On the other hand, if you do not have access to the final HTML output, you could use the Swing libraries to render the page and then take a screenshot of it. This will allow you to even use Flash, but I'm not sure whether this approach will be suitable to your problem.
However, if it is the case, you can load the web page into a Swing application (you will need to rely on a third-party browser component for JS support, but there are quite a lot out there), and then you can use the Robot class to get a screenshot of it.
Take a look at http://download.oracle.com/javase/6/docs/api/java/awt/Robot.html

Generate HTML files using XML configuration

My target is to assemble a static web site that has a lot of repeating code. Now, I could use JSP includes for that purpose. But the site will be modified infrequently and under very heavy load, also using features like gzip and I don't need the complications.
My idea is to put up a build process with some tool like ant, That build process will concatenate all HTML pieces, preprocess HTML, JS, CSS with minifier and finally apply gzip.
I want an XML configuration that will define the parts that need to go in every html page and their order.
I need advice on ant or any similar tool; how to approach the configuration, any external tools that will help? Any suggestions are much appreciated.
XSLT is perfectly suited to transform XML into another format like HTML.
You can download Apache Xalan to give it a try. Ant has support for XSLT processing.
In the java world, you can take a look at Apache Forrest, which precisely do that kind of things.
In other worlds, there also exist webgen, which is a competent Ruby site builder.
I also vaguey remember there are other alternatives, but i can't find back their name.

Categories