Svg integration in pdf using flying saucer - java

I need to convert HTML to PDF, and I have managed to do this with the Flying Saucer API. However, my HTML contains svg tags, and the SVG does not appear in the generated PDF. It can apparently be achieved by following a Stack Overflow question and a tutorial.
What is meant by replacedElementFactory in the following code?
ChainingReplacedElementFactory chainingReplacedElementFactory
= new ChainingReplacedElementFactory();
chainingReplacedElementFactory.addReplacedElementFactory(replacedElementFactory);
chainingReplacedElementFactory.addReplacedElementFactory(new SVGReplacedElementFactory());
renderer.getSharedContext().setReplacedElementFactory(chainingReplacedElementFactory);

It's just an error in the tutorial, the line with replacedElementFactory is not needed.
Here is my working example.
Java:
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xhtmlrenderer.pdf.ITextRenderer;
public class PdfSvg {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document inputDoc = builder.parse("svg.html");
ByteArrayOutputStream output = new ByteArrayOutputStream();
ITextRenderer renderer = new ITextRenderer();
ChainingReplacedElementFactory chainingReplacedElementFactory = new ChainingReplacedElementFactory();
chainingReplacedElementFactory.addReplacedElementFactory(new SVGReplacedElementFactory());
renderer.getSharedContext().setReplacedElementFactory(chainingReplacedElementFactory);
renderer.setDocument(inputDoc, "");
renderer.layout();
renderer.createPDF(output);
// try-with-resources closes the file stream even if writing fails
try (OutputStream fos = new FileOutputStream("svg.pdf")) {
output.writeTo(fos);
}
}
}
HTML:
<html>
<head>
<style type="text/css">
svg {display: block;width:100mm;height:100mm}
</style>
</head>
<body>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<circle cx="50" cy="50" r="40" stroke="black" stroke-width="3"
fill="red" />
</svg>
</div>
</body>
</html>
The ChainingReplacedElementFactory, SVGReplacedElement and SVGReplacedElementFactory classes come from the tutorial.

If you want an in-page solution, here is an alternative using @cloudformatter, which is a remote formatting service. I added their JavaScript to your fiddle along with some text and your Highcharts chart.
http://jsfiddle.net/yk0Lxzg0/1/
var click="return xepOnline.Formatter.Format('printme', {render:'download'})";
jQuery('#buttons').append('<button onclick="'+ click +'">PDF</button>');
The above code, placed in the fiddle, formats the div with id 'printme' to PDF for download. That div includes your chart and some text.
http://www.cloudformatter.com/CSS2Pdf.APIDoc.Usage shows usage instructions and has many more samples of charts in SVG formatted to PDF either by themselves or as part of pages combined with text, tables and such.

@Rajesh, I hope you have already found a solution to your problem. If not (or for anyone else having issues with Flying Saucer, Batik and svg tags), you might want to consider this:
removing every clip-path="url(#highcharts-xxxxxxx-xx)" attribute from the <g> tags did the trick for me.
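One way to apply this workaround before handing the markup to Flying Saucer is to strip the attribute from the parsed DOM. The following is a minimal sketch using only JDK classes. The class name ClipPathStripper and its methods are hypothetical, and it assumes that losing the clipping is acceptable for your chart.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Flying Saucer or Batik.
final class ClipPathStripper {

    /** Parses XHTML/SVG markup into a namespace-aware DOM document. */
    static Document parse(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    /** Removes clip-path from every <g> element and returns the number removed. */
    static int stripClipPaths(Document doc) {
        // "*" matches <g> in any namespace, including the SVG namespace.
        NodeList groups = doc.getElementsByTagNameNS("*", "g");
        int removed = 0;
        for (int i = 0; i < groups.getLength(); i++) {
            Element g = (Element) groups.item(i);
            if (g.hasAttribute("clip-path")) {
                g.removeAttribute("clip-path");
                removed++;
            }
        }
        return removed;
    }
}
```

The cleaned Document can then be passed to renderer.setDocument(inputDoc, "") in the same way as in the working example above.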

My code covers the missing part that is referred to as "SVGReplacedElementFactory".
I use it like this:
renderer
.getSharedContext()
.setReplacedElementFactory( new B64ImgReplacedElementFactory() );
import com.itextpdf.text.BadElementException;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.codec.Base64;
import org.apache.batik.transcoder.TranscoderException;
import org.apache.batik.transcoder.TranscoderInput;
import org.apache.batik.transcoder.TranscoderOutput;
import org.apache.batik.transcoder.image.JPEGTranscoder;
import org.apache.batik.transcoder.image.PNGTranscoder;
import org.w3c.dom.Element;
import org.xhtmlrenderer.extend.FSImage;
import org.xhtmlrenderer.extend.ReplacedElement;
import org.xhtmlrenderer.extend.ReplacedElementFactory;
import org.xhtmlrenderer.extend.UserAgentCallback;
import org.xhtmlrenderer.layout.LayoutContext;
import org.xhtmlrenderer.pdf.ITextFSImage;
import org.xhtmlrenderer.pdf.ITextImageElement;
import org.xhtmlrenderer.render.BlockBox;
import org.xhtmlrenderer.simple.extend.FormSubmissionListener;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
public class B64ImgReplacedElementFactory implements ReplacedElementFactory
{
public ReplacedElement createReplacedElement(LayoutContext c, BlockBox box, UserAgentCallback uac, int cssWidth, int cssHeight)
{
Element e = box.getElement();
if(e == null)
{
return null;
}
String nodeName = e.getNodeName();
if(nodeName.equals("img"))
{
String attribute = e.getAttribute("src");
FSImage fsImage;
try
{
fsImage = buildImage(attribute, uac);
}
catch(BadElementException e1)
{
fsImage = null;
}
catch(IOException e1)
{
fsImage = null;
}
if(fsImage != null)
{
if(cssWidth != -1 || cssHeight != -1)
{
fsImage.scale(cssWidth, cssHeight);
}
return new ITextImageElement(fsImage);
}
}
return null;
}
protected FSImage buildImage(String srcAttr, UserAgentCallback uac) throws IOException, BadElementException
{
if(srcAttr.startsWith("data:image/"))
{
// BASE64Decoder decoder = new BASE64Decoder();
// byte[] decodedBytes = decoder.decodeBuffer(b64encoded);
// byte[] decodedBytes = B64Decoder.decode(b64encoded);
byte[] decodedBytes = Base64.decode(srcAttr.substring(srcAttr.indexOf("base64,") + "base64,".length(), srcAttr.length()));
return new ITextFSImage(Image.getInstance(decodedBytes));
}
FSImage fsImage = uac.getImageResource(srcAttr).getImage();
if(fsImage == null)
{
return convertToPNG(srcAttr);
}
// the user agent resolved the image, so return it instead of discarding it
return fsImage;
}
private FSImage convertToPNG(String srcAttr) throws IOException, BadElementException
{
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PNGTranscoder t = new PNGTranscoder();
// t.addTranscodingHint(JPEGTranscoder.KEY_PIXEL_UNIT_TO_MILLIMETER, (25.4f / 72f));
t.addTranscodingHint(JPEGTranscoder.KEY_WIDTH, 4000.0F);
t.addTranscodingHint(JPEGTranscoder.KEY_HEIGHT, 4000.0F);
try
{
t.transcode(
new TranscoderInput(srcAttr),
new TranscoderOutput(byteArrayOutputStream)
);
}
catch(TranscoderException e)
{
e.printStackTrace();
}
byteArrayOutputStream.flush();
byteArrayOutputStream.close();
return new ITextFSImage(Image.getInstance(byteArrayOutputStream.toByteArray()));
}
public void remove(Element e)
{
}
@Override
public void setFormSubmissionListener(FormSubmissionListener formSubmissionListener)
{
}
public void reset()
{
}
}
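The data: URI branch of buildImage above can be illustrated in isolation. This sketch swaps iText's com.itextpdf.text.pdf.codec.Base64 for the JDK's java.util.Base64 (Java 8 or later), so it is not the answer's exact code. The class name DataUriDecoder is hypothetical.

```java
import java.util.Base64;

// Hypothetical helper showing only the data: URI handling.
final class DataUriDecoder {
    private static final String MARKER = "base64,";

    /** Returns the decoded payload of a base64 data: URI, or null if src is not one. */
    static byte[] decode(String src) {
        if (src == null || !src.startsWith("data:")) {
            return null;
        }
        int idx = src.indexOf(MARKER);
        if (idx < 0) {
            return null; // a data: URI without base64 encoding is not handled here
        }
        String payload = src.substring(idx + MARKER.length());
        // The MIME decoder skips line breaks that can appear in long attribute values.
        return Base64.getMimeDecoder().decode(payload);
    }
}
```

The resulting byte array is what the factory passes to Image.getInstance(decodedBytes).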


How to load external image in FileInputStream

I am converting HTML to PDF using Flying Saucer (ITextRenderer) and I have to render an image. Rendering an image from my local storage works fine, but I have to do the same thing with an external image.
Here is my code snippet for the HTML-to-PDF rendering:
{
try {
String url = new File(inputHtmlPath).toURI().toURL().toString();
System.out.println("URL: " + url);
OutputStream out = new FileOutputStream(outputPdfPath);
File signUpTemplate = new File("C:/Users/SFLTP022/Desktop/task/index1.html");
String content=FileUtils.readFileToString(signUpTemplate);
//Flying Saucer part
ITextRenderer renderer = new ITextRenderer();
renderer.getSharedContext().setReplacedElementFactory(new MediaReplacedElementFactory(renderer.getSharedContext().getReplacedElementFactory()));
renderer.setDocumentFromString(content.toString());
renderer.layout();
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
renderer.createPDF(baos);
//ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(url);
renderer.layout();
renderer.createPDF(out);
out.close();
}
Here is my class for rendering HTML images, in which I have to load the external image:
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URL;
import org.apache.commons.io.IOUtils;
import org.w3c.dom.Element;
import org.xhtmlrenderer.extend.FSImage;
import org.xhtmlrenderer.extend.ReplacedElement;
import org.xhtmlrenderer.extend.ReplacedElementFactory;
import org.xhtmlrenderer.extend.UserAgentCallback;
import org.xhtmlrenderer.layout.LayoutContext;
import org.xhtmlrenderer.pdf.ITextFSImage;
import org.xhtmlrenderer.pdf.ITextImageElement;
import org.xhtmlrenderer.render.BlockBox;
import org.xhtmlrenderer.simple.extend.FormSubmissionListener;
import com.lowagie.text.Image;
public class MediaReplacedElementFactory implements ReplacedElementFactory {
private final ReplacedElementFactory superFactory;
public MediaReplacedElementFactory(ReplacedElementFactory superFactory) {
this.superFactory = superFactory;
}
@Override
public ReplacedElement createReplacedElement(LayoutContext layoutContext, BlockBox blockBox, UserAgentCallback userAgentCallback, int cssWidth, int cssHeight) {
Element element = blockBox.getElement();
if (element == null) {
return null;
}
String nodeName = element.getNodeName();
String className = element.getAttribute("class");
// Replace any <div class="media" data-src="image.png" /> with the
// binary data of `image.png` into the PDF.
if ("div".equals(nodeName) && "media".equals(className)) {
if (!element.hasAttribute("data-src")) {
throw new RuntimeException("An element with class `media` is missing a `data-src` attribute indicating the media file.");
}
InputStream input = null;
try {
input = new FileInputStream("https://cdn.zetran.com/testasset/images/banner/zetran/banner-parts/base/png/" + element.getAttribute("data-src"));
final byte[] bytes = IOUtils.toByteArray(input);
final Image image = Image.getInstance(bytes);
final FSImage fsImage = new ITextFSImage(image);
if (fsImage != null) {
if ((cssWidth != -1) || (cssHeight != -1)) {
fsImage.scale(cssWidth, cssHeight);
}
return new ITextImageElement(fsImage);
}
} catch (Exception e) {
throw new RuntimeException("There was a problem trying to read a template embedded graphic.", e);
} finally {
IOUtils.closeQuietly(input);
}
}
return this.superFactory.createReplacedElement(layoutContext, blockBox, userAgentCallback, cssWidth, cssHeight);
}
@Override
public void reset() {
this.superFactory.reset();
}
public void remove(Element e) {
this.superFactory.remove(e);
}
@Override
public void setFormSubmissionListener(FormSubmissionListener listener) {
// TODO Auto-generated method stub
}
}
When I load a local image instead, it works fine, as shown below:
input = new FileInputStream("C:\\Users\\Public\\Pictures\\Sample Pictures\\" + element.getAttribute("data-src"));
My HTML for the local image looks like this:
<div id="logo" class="media" data-src="Desert.jpg" style="width: 177px; height: 60px" />
My HTML for the external image looks like this:
<div id="logo" class="media" data-src="base.png" style="width: 177px; height: 60px" />
The error message is:
java.lang.RuntimeException: There was a problem trying to read a template embedded graphic.
at com.boot.MediaReplacedElementFactory.createReplacedElement(MediaReplacedElementFactory.java:56)
at org.xhtmlrenderer.render.BlockBox.calcDimensions(BlockBox.java:716)
at org.xhtmlrenderer.render.BlockBox.calcDimensions(BlockBox.java:666)
at org.xhtmlrenderer.render.BlockBox.collapseBottomMargin(BlockBox.java:1205)
at org.xhtmlrenderer.render.BlockBox.collapseBottomMargin(BlockBox.java:1228)
at org.xhtmlrenderer.render.BlockBox.collapseMargins(BlockBox.java:1126)
at org.xhtmlrenderer.render.BlockBox.layout(BlockBox.java:811)
at org.xhtmlrenderer.render.BlockBox.layout(BlockBox.java:776)
at org.xhtmlrenderer.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
at org.xhtmlrenderer.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
at org.xhtmlrenderer.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
at org.xhtmlrenderer.render.BlockBox.layoutChildren(BlockBox.java:967)
at org.xhtmlrenderer.render.BlockBox.layout(BlockBox.java:847)
at org.xhtmlrenderer.render.BlockBox.layout(BlockBox.java:776)
at org.xhtmlrenderer.pdf.ITextRenderer.layout(ITextRenderer.java:229)
at com.boot.App6.generatePDF(App6.java:67)
at com.boot.App6.main(App6.java:27)
Caused by: java.io.FileNotFoundException: https:\cdn.zetran.com\testasset\images\banner\zetran\banner-parts\base\png\base.png (The filename, directory name, or volume label syntax is incorrect)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at com.boot.MediaReplacedElementFactory.createReplacedElement(MediaReplacedElementFactory.java:45)
Yes, you can write file paths as URLs, e.g. file://..., but FileInputStream can only handle locations that point to a local file. That is why the stack trace shows the https address being treated as an invalid Windows file name. HTTP URLs must be loaded with some kind of HTTP client.
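As a rough illustration of that distinction, the sketch below reads bytes either from a URL stream or from a local path, using only JDK classes. The class name ResourceBytes is hypothetical, and it assumes that any URL passed in is already properly encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper; the name is not from the question.
final class ResourceBytes {

    /** True for http, https and file URLs; false for plain local paths. */
    static boolean isUrl(String location) {
        String lower = location.toLowerCase(Locale.ROOT);
        return lower.startsWith("http://")
                || lower.startsWith("https://")
                || lower.startsWith("file:");
    }

    /** Reads all bytes from a URL or from a local file path. */
    static byte[] read(String location) {
        try {
            if (!isUrl(location)) {
                return Files.readAllBytes(Paths.get(location));
            }
            // URL streams work for remote resources, unlike FileInputStream.
            try (InputStream in = URI.create(location).toURL().openStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + location, e);
        }
    }
}
```

In the factory from the question, the bytes returned by ResourceBytes.read(...) for the full image address could be passed to Image.getInstance(bytes) in place of the FileInputStream call.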

Huge white space after header in PDF using Flying Saucer

I am trying to export an HTML page into a PDF using Flying Saucer. For some reason, the pages have a large white space after the header (id = "divTemplateHeaderPage1") divisions.
The jsFiddle link to my HTML code that is being used by PDF renderer: https://jsfiddle.net/Sparks245/uhxqdta6/.
Below is the Java code used for rendering the PDF (Test.html is the same HTML code in the fiddle) and rendering only one page.
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.json.HTTP;
import org.json.JSONException;
import org.json.*;
import org.json.simple.JSONArray;
import org.json.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;
import org.json.simple.parser.*;
import org.xhtmlrenderer.pdf.ITextRenderer;
import com.lowagie.text.DocumentException;
import com.lowagie.text.List;
import com.sun.xml.internal.bind.v2.runtime.unmarshaller.XsiNilLoader.Array;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;
@WebServlet("/PLPDFExport")
public class PLPDFExport extends HttpServlet
{
//Option for Serialization
private static final long serialVersionUID = 1L;
public PLPDFExport()
{
super();
}
//Get method
protected void doGet(HttpServletRequest request,
HttpServletResponse response)
throws ServletException,
IOException
{
}
//Post method
protected void doPost(HttpServletRequest request,
HttpServletResponse response)
throws ServletException,
IOException
{
StringBuffer jb = new StringBuffer();
String line = null;
int Pages;
String[] newArray = null;
try
{
BufferedReader reader = request.getReader();
while ((line = reader.readLine()) != null)
{ jb.append(line);
}
} catch (Exception e) { /*report an error*/ }
try
{
JSONObject obj = new JSONObject(jb.toString());
Pages = obj.getInt("Pages");
newArray = new String[1];
for(int cnt = 1; cnt <= 1; cnt++)
{
StringBuffer buf = new StringBuffer();
String base = "C:/Users/Sparks/Desktop/";
buf.append(readFile(base + "Test.html"));
newArray[0] = buf.toString();
}
}
catch (JSONException e)
{
// crash and burn
throw new IOException("Error parsing JSON request string");
}
//Get the parameters
OutputStream os = null;
try {
final File outputFile = File.createTempFile("FlyingSacuer.PDFRenderToMultiplePages", ".pdf");
os = new FileOutputStream(outputFile);
ITextRenderer renderer = new ITextRenderer();
// we need to create the target PDF
// we'll create one page per input string, but we call layout for the first
renderer.setScaleToFit(true);
renderer.isScaleToFit();
renderer.setDocumentFromString(newArray[0]);
renderer.layout();
try {
renderer.createPDF(os, false);
} catch (DocumentException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// each page after the first we add using layout() followed by writeNextDocument()
for (int i = 1; i < newArray.length; i++) {
renderer.setScaleToFit(true);
renderer.isScaleToFit();
renderer.setDocumentFromString(newArray[i]);
renderer.layout();
try {
renderer.writeNextDocument();
} catch (DocumentException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
// complete the PDF
renderer.finishPDF();
System.out.println("PDF Downloaded to " + outputFile );
System.out.println(newArray[0]);
}
finally {
if (os != null) {
try {
os.close();
} catch (IOException e) { /*ignore*/ }
}
}
//Return
response.setContentType("application/json");
response.setCharacterEncoding("UTF-8");
response.getWriter().write("File Uploaded");
}
String readFile(String fileName) throws IOException {
BufferedReader br = new BufferedReader(new FileReader(fileName));
try {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append("\n");
line = br.readLine();
}
return sb.toString();
} finally {
br.close();
}
}
}
The link for exported PDF: https://drive.google.com/file/d/13CmlJK0ZDLolt7C3yLN2k4uJqV3TX-4B/view?usp=sharing
I tried adding css properties like page-break-inside: avoid to the header divisions but it didn't work. Also I tried adding absolute positions and top margins to the body division (id = "divTemplateBodyPage1") just below the header div, but the white space continues to exist.
Any suggestions would be helpful.
Please take a look at the metadata of your PDF.
You are using an old third-party tool that is not endorsed by iText Group and that uses iText 2.1.7, a version of iText dating from 2009 that should no longer be used.
It would probably have been OK to write "My code isn't working" about 7 years ago, but if you used the most recent version of iText, your HTML would convert to PDF without this problem.
I only needed a single line of code to get this result:
HtmlConverter.convertToPdf(new File(src), new File(dest));
In this line, src is the path to the source HTML and dest is the path to the resulting PDF.
I only had to apply one minor change to your HTML. I changed the @page properties like this:
@page {
size: 27cm 38cm;
margin: 0.2cm;
}
If I hadn't changed this part of the CSS, the page size would have been A4, and not all of the content would have fit on the page. I also added a small margin because I didn't like the border sticking too close to the sides of the page.
Moral: don't use old versions of libraries! Download the latest version of iText: you need iText 7 Core and the pdfHTML add-on. You might also want to read the HTML to PDF tutorial.

Extract Multiple Embedded Images from a single PDF Page using PDFBox

I am using PDFBox 2.0.6. I have been successful in extracting images from a PDF file, but right now my code creates one image per PDF page. A page can contain any number of images, and I want each embedded image to be extracted as a separate image.
Here is the code:
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
public class DemoPdf {
public static void main(String args[]) throws Exception {
//Loading an existing PDF document
File file = new File("C:/Users/ADMIN/Downloads/Vehicle_Photographs.pdf");
PDDocument document = PDDocument.load(file);
//Instantiating the PDFRenderer class
PDFRenderer renderer = new PDFRenderer(document);
File imageFolder = new File("C:/Users/ADMIN/Desktop/image");
for (int page = 0; page < document.getNumberOfPages(); ++page) {
//Rendering an image from the PDF document
BufferedImage image = renderer.renderImage(page);
//Writing the image to a file
ImageIO.write(image, "JPEG", new File(imageFolder+"/" + page +".jpg"));
System.out.println("Image created"+ page);
}
//Closing the document
document.close();
}
}
Is it possible in PDFBox to extract all embedded images as separate images? Thanks.
Yes, it is possible to extract all images from all the pages in a PDF.
You may refer to this link: extract images from pdf using PDFBox.
The basic idea is to extend PDFStreamEngine and override the processOperator method, then call PDFStreamEngine.processPage for every page. If the object passed to processOperator is an image object, get a BufferedImage from it and save it.
Extend PDFStreamEngine and override processOperator, something like this:
@Override
protected void processOperator( Operator operator, List<COSBase> operands) throws IOException
{
String operation = operator.getName();
if( "Do".equals(operation) )
{
COSName objectName = (COSName) operands.get( 0 );
PDXObject xobject = getResources().getXObject( objectName );
if( xobject instanceof PDImageXObject)
{
PDImageXObject image = (PDImageXObject)xobject;
// save the image to a local file
BufferedImage bImage = image.getImage();
ImageIO.write(bImage,"PNG",new File("c:\\temp\\image_"+imageNumber+".png"));
imageNumber++;
}
else
{
}
}
else
{
super.processOperator( operator, operands);
}
}
This answer is similar to @jprism's, but it is intended for anyone who just wants ready-to-use code with a demo to copy and paste.
import org.apache.pdfbox.contentstream.PDFStreamEngine;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.UUID;
public class ExtractImagesUseCase extends PDFStreamEngine{
private final String filePath;
private final String outputDir;
// Constructor
public ExtractImagesUseCase(String filePath,
String outputDir){
this.filePath = filePath;
this.outputDir = outputDir;
}
// Execute
public void execute(){
// try-with-resources closes the document once all pages are processed
try(PDDocument document = PDDocument.load(new File(filePath))){
for(PDPage page : document.getPages()){
processPage(page);
}
}catch(IOException e){
e.printStackTrace();
}
}
@Override
protected void processOperator(Operator operator, List<COSBase> operands) throws IOException{
String operation = operator.getName();
if("Do".equals(operation)){
COSName objectName = (COSName) operands.get(0);
PDXObject pdxObject = getResources().getXObject(objectName);
if(pdxObject instanceof PDImageXObject){
// Image
PDImageXObject image = (PDImageXObject) pdxObject;
BufferedImage bImage = image.getImage();
// File
String randomName = UUID.randomUUID().toString();
File outputFile = new File(outputDir,randomName + ".png");
// Write image to file
ImageIO.write(bImage, "PNG", outputFile);
}else if(pdxObject instanceof PDFormXObject){
PDFormXObject form = (PDFormXObject) pdxObject;
showForm(form);
}
}
else super.processOperator(operator, operands);
}
}
Demo
public class ExtractImageDemo{
public static void main(String[] args){
String filePath = "C:\\Users\\John\\Downloads\\Documents\\sample-file.pdf";
String outputDir = "C:\\Users\\John\\Downloads\\Documents\\Output";
ExtractImagesUseCase useCase = new ExtractImagesUseCase(
filePath,
outputDir
);
useCase.execute();
}
}

convert html to pdf using iText

I want to convert an HTML file with images to PDF using iText. I am providing my source here.
This is my HTML file:
<html>
<body>
<img src='' width='62' height='80' style='float: left; margin-right: 28px;' alt="" />
<!-- <img src="add.png" alt="" /> -->
</body>
</html>
I want to convert this HTML file to PDF.
I am using the following Java code:
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.Charset;
import org.apache.commons.io.IOUtils;
import org.apache.pdfbox.encoding.Encoding;
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
import org.jsoup.select.Elements;
import org.w3c.tidy.Tidy;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.Pipeline;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.css.CssFilesImpl;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.HTML;
import com.itextpdf.tool.xml.html.TagProcessor;
import com.itextpdf.tool.xml.html.TagProcessorFactory;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;
import com.itextpdf.tool.xml.pipeline.html.ImageProvider;
import com.pdfcrowd.Client;
public class App
{
public static void main( String[] args ) throws DocumentException, IOException
{
// step 1
Document document = new Document();
document.newPage();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("pdf.pdf"));
// step 3
document.open();
// step 4
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
new FileInputStream("index.html"));
//step 5
document.close();
System.out.println( "PDF Created!" );
}
}
I am getting the following error:
Exception in thread "main" ExceptionConverter: java.io.IOException: The document has no pages.
at com.itextpdf.text.pdf.PdfPages.writePageTree(PdfPages.java:113)
at com.itextpdf.text.pdf.PdfWriter.close(PdfWriter.java:1243)
at com.itextpdf.text.pdf.PdfDocument.close(PdfDocument.java:849)
at com.itextpdf.text.Document.close(Document.java:416)
at App.main(App.java:64)
How can I convert an HTML file with images to PDF using iText? I am able to convert the HTML file if it has no images or if I hardcode the image path. Thanks in advance.
You need to implement a custom image tag processor to process the images embedded inside your html:
package com.example.itext.processor;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import com.itextpdf.text.Chunk;
import com.itextpdf.text.Element;
import com.itextpdf.text.Image;
import com.itextpdf.text.log.Level;
import com.itextpdf.text.log.Logger;
import com.itextpdf.text.log.LoggerFactory;
import com.itextpdf.text.pdf.codec.Base64;
import com.itextpdf.tool.xml.NoCustomContextException;
import com.itextpdf.tool.xml.Tag;
import com.itextpdf.tool.xml.WorkerContext;
import com.itextpdf.tool.xml.exceptions.LocaleMessages;
import com.itextpdf.tool.xml.exceptions.RuntimeWorkerException;
import com.itextpdf.tool.xml.html.HTML;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;
public class ImageTagProcessor extends com.itextpdf.tool.xml.html.Image {
private final Logger logger = LoggerFactory.getLogger(getClass());
/*
* (non-Javadoc)
*
* @see com.itextpdf.tool.xml.TagProcessor#endElement(com.itextpdf.tool.xml.Tag, java.util.List, com.itextpdf.text.Document)
*/
@Override
public List<Element> end(final WorkerContext ctx, final Tag tag, final List<Element> currentContent) {
final Map<String, String> attributes = tag.getAttributes();
String src = attributes.get(HTML.Attribute.SRC);
List<Element> elements = new ArrayList<Element>(1);
if (null != src && src.length() > 0) {
Image img = null;
if (src.startsWith("data:image/")) {
final String base64Data = src.substring(src.indexOf(",") + 1);
try {
img = Image.getInstance(Base64.decode(base64Data));
} catch (Exception e) {
if (logger.isLogging(Level.ERROR)) {
logger.error(String.format(LocaleMessages.getInstance().getMessage(LocaleMessages.HTML_IMG_RETRIEVE_FAIL), src), e);
}
}
if (img != null) {
try {
final HtmlPipelineContext htmlPipelineContext = getHtmlPipelineContext(ctx);
elements.add(getCssAppliers().apply(new Chunk((com.itextpdf.text.Image) getCssAppliers().apply(img, tag, htmlPipelineContext), 0, 0, true), tag,
htmlPipelineContext));
} catch (NoCustomContextException e) {
throw new RuntimeWorkerException(e);
}
}
}
if (img == null) {
elements = super.end(ctx, tag, currentContent);
}
}
return elements;
}
}
The following code snippet registers the custom image tag processor and converts an HTML document to PDF:
public static void main(String[] args) {
convertHtmlToPdf();
}
private static void convertHtmlToPdf() {
try {
final OutputStream file = new FileOutputStream(new File("C:\\Test.pdf"));
final Document document = new Document();
final PdfWriter writer = PdfWriter.getInstance(document, file);
document.open();
final TagProcessorFactory tagProcessorFactory = Tags.getHtmlTagProcessorFactory();
tagProcessorFactory.removeProcessor(HTML.Tag.IMG);
tagProcessorFactory.addProcessor(new ImageTagProcessor(), HTML.Tag.IMG);
final CssFilesImpl cssFiles = new CssFilesImpl();
cssFiles.add(XMLWorkerHelper.getInstance().getDefaultCSS());
final StyleAttrCSSResolver cssResolver = new StyleAttrCSSResolver(cssFiles);
final HtmlPipelineContext hpc = new HtmlPipelineContext(new CssAppliersImpl(new XMLWorkerFontProvider()));
hpc.setAcceptUnknown(true).autoBookmark(true).setTagFactory(tagProcessorFactory);
final HtmlPipeline htmlPipeline = new HtmlPipeline(hpc, new PdfWriterPipeline(document, writer));
final Pipeline<?> pipeline = new CssResolverPipeline(cssResolver, htmlPipeline);
final XMLWorker worker = new XMLWorker(pipeline, true);
final Charset charset = Charset.forName("UTF-8");
final XMLParser xmlParser = new XMLParser(true, worker, charset);
final InputStream is = new FileInputStream("C:\\test.html");
xmlParser.parse(is, charset);
is.close();
document.close();
file.close();
} catch (Exception e) {
e.printStackTrace();
// TODO
}
}
This exception occurs when no content has been added to the PDF page.
Try passing your InputStream like this:
String str = "<html>"
+ "<body>"
+ "<img src='' width='62' height='80' style='float: left; margin-right: 28px;' alt=\"\" />"
+ "<!-- <img src=\"add.png\" alt=\"\" /> -->"
+ "</body>"
+ "</html>";
InputStream is = new ByteArrayInputStream(str.getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);

How to add both an image and text to a PDF when saving HTML

import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.html.simpleparser.HTMLWorker;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.codec.Base64.OutputStream;
import java.io.FileOutputStream;
import java.io.StringReader;
import java.net.URL;
public class myclass {
public static void main(String[] args) {
String result = "<html><body><div>(i) the recognised association shall have the approval of the Forward Markets Commission established under the Forward Contracts (Regulation) Act, 1952 (74 of 1952) in respect of trading in derivatives and shall function in accordance with the guidelines or conditions laid down by the Forward Markets Commission; </div> <body> </html>";
Document document = new Document();
OutputStream file = null;
try {
PdfWriter.getInstance(document, new FileOutputStream(
"E://Image.pdf"));
document.open();
PdfWriter.getInstance(document, file);
document.open();
@SuppressWarnings("deprecation")
HTMLWorker htmlWorker = new HTMLWorker(document);
htmlWorker.parse(new StringReader(result));
String imageUrl = "http://www.taxmann.com/emailer/demo/mobileAapp/newAppDesign.jpg";
Image image2 = Image.getInstance(new URL(imageUrl));
document.add(image2);
document.close();
file.flush();
document.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
I am trying to save an image and text in a PDF file. When I add either the text or the image alone, it works fine, but I am not able to save both in the same PDF. How can I save both the image and the text in the PDF? I am using iText.
The problem may be the wrong import for OutputStream. In addition, file is always null, because you never assigned an output stream to it.
Try this:
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.net.URL;
import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.html.simpleparser.HTMLWorker;
import com.itextpdf.text.pdf.PdfWriter;
public class ItextExample {
@SuppressWarnings("deprecation")
public static void main(String[] args) {
String result = "<html><body><div>(i) the recognised association shall have the approval of the Forward Markets Commission established under the Forward Contracts (Regulation) Act, 1952 (74 of 1952) in respect of trading in derivatives and shall function in accordance with the guidelines or conditions laid down by the Forward Markets Commission; </div> <body> </html>";
Document document = new Document();
OutputStream file = null;
try {
file = new FileOutputStream("E://Image1.pdf");
PdfWriter.getInstance(document,file);
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
htmlWorker.parse(new StringReader(result));
String imageUrl = "http://www.taxmann.com/emailer/demo/mobileAapp/newAppDesign.jpg";
Image image2 = Image.getInstance(new URL(imageUrl));
document.add(image2);
} catch (Exception e) {
e.printStackTrace();
}finally {
try {
document.close();
file.flush();
}catch(Exception e) {
e.printStackTrace();
}
}
}
}
