How to convert HTML to PDF using iText [duplicate]

How to convert HTML to PDF using iText [duplicate] - java

This question already has answers here:
Converting HTML files to PDF [closed]
(8 answers)
Closed 9 years ago.
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
public class GeneratePDF {
public static void main(String[] args) {
try {
String k = "<html><body> This is my Project </body></html>";
OutputStream file = new FileOutputStream(new File("E:\\Test.pdf"));
Document document = new Document();
PdfWriter.getInstance(document, file);
document.open();
document.add(new Paragraph(k));
document.close();
file.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
This is my code to convert HTML to PDF. I am able to convert it but in PDF file it saves as whole HTML while I need to display only text. <html><body> This is my Project </body></html> gets saved to PDF while it should save only This is my Project.

You can do it with the HTMLWorker class (deprecated) like this:
import com.itextpdf.text.html.simpleparser.HTMLWorker;
//...
try {
String k = "<html><body> This is my Project </body></html>";
OutputStream file = new FileOutputStream(new File("C:\\Test.pdf"));
Document document = new Document();
PdfWriter.getInstance(document, file);
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
htmlWorker.parse(new StringReader(k));
document.close();
file.close();
} catch (Exception e) {
e.printStackTrace();
}
or using the XMLWorker, (download from this jar) using this code:
import com.itextpdf.tool.xml.XMLWorkerHelper;
//...
try {
String k = "<html><body> This is my Project </body></html>";
OutputStream file = new FileOutputStream(new File("C:\\Test.pdf"));
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, file);
document.open();
InputStream is = new ByteArrayInputStream(k.getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
document.close();
file.close();
} catch (Exception e) {
e.printStackTrace();
}

This links might be helpful to convert.
https://code.google.com/p/flying-saucer/
https://today.java.net/pub/a/today/2007/06/26/generating-pdfs-with-flying-saucer-and-itext.html
If it is a college Project, you can even go for these,
http://pd4ml.com/examples.htm
Example is given to convert HTML to PDF

Related

Convert Base64 into PDF without converting it into an Image in JAVA

I've done converting PNG to PDF using itext.
How can i convert BASE64 into a PDF File, how can i achieve this.
//b64Image contains base64code from html2canvas, this code is for converting
base64 to png
String b64Image=request.getParameter("img_val");
b64Image = b64Image.replace("data:image/png;base64,", "");
b64Image = b64Image.replace(" ", "+");
byte[] imageByte;
String path = "C:\\Users\\P.Balraj\\Desktop\\Screenshot.png";
BASE64Decoder decoder = new BASE64Decoder();
imageByte = decoder.decodeBuffer(b64Image);
File outputfile = new File(path);
BufferedOutputStream stream = new BufferedOutputStream(
new FileOutputStream(outputfile));
stream.write(imageByte, 0, imageByte.length);
stream.flush();
stream.close();
Thank you

You can add b64 image to pdf like this:
import java.io.FileOutputStream;
//com.lowagie... old version
//com.itextpdf... recent version
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Image;
import org.apache.commons.codec.binary.Base64;
public class Image64ToPDF {
public static void main(String ... args) {
Document document = new Document();
//
String b64Image = "iVBORw0KGgoAAAANSUhEUgAAAJcAAACXCAMAAAAvQTlLAAABF1BMVEX////qQzU0qFNChfT7vAU8gvTz9/5pm/bh6f3m7f4edvMhePPqPi/7tgD7uAD7ugAyfvPpOCjpNCIopUv0+vb62NYAnjf97ezU6tnrTkKv2LhCrF7whX4YokLpLRnC4cn+9/bymZPoHgDt8v4zqkHo9Ov0qqX2vrrpOzb8wgB1vob+7s+ExZNUsmv/+e9XkPXT4PzsX1WZzqX4ycbudW374+JiuHdKqU7btQPsuxaTtfj8w0U8lrSux/prrEQDpljF1vsmf+Gmz8SRsD40pGf/4Kj86cGmsjY3oH85maKAqfc/jtQ/k8Q5nZL3qBT81oTtXC/xfCb1lxvvbyHNtyb8y2MAaPK1syznAhj2nzjuZirzjSD80nX4U1R1AAAFc0lEQVR4nO2YC1faSBiGQwCtEAzJxBgICSGFoMilQERbWy+tdXfb3bXdrXvt//8dO1wOEDIZkkmGePbMo+eoR8DHb9755hs4jsFgMBgMBoPBYDAYDMYzxDo+HzsN1x1C3FqjPT62rNSdjpza0LRNU5aEGZJs2rbkNtpHKUqN3UoGCmV8CLKcqTSOrDSsxg3BRjmt3Oyhc75rq/ZQkIOdlmoZd5dmljO0MZVaR7LdnSVtPJRDWs3MBNfahdWRG7ZWSzPJoa/lCFI0qykm7ZidD83oVtOS2W2aWuMKQbHmyDV6Wk7UZK1juseUtGrbOxYOqWLRsLJcsmitvDJUwu8SR2uOUKHRYC033iLCalGJV8xsUaoW14idLSrVam+rliBJ8hQJjoe7q9aRjZWSzQqcn502BM7Tw4y5uUMoVcsa4sY/W2qMz63Vo+G079rm+jOEDJ1RB5N5OVND/k3HXZlRqhY3Dsy8YNeCWqXVrph0q3VcCSwWfnpxMhLFanFOwCoK9rZp7xyWjFa1uOOAvSgPQ5x2DZtWtbga+lg0h6H+oEPr0nH2+wlSi+KUF4rH3L8IMdNNWatfyp3+mtk0k3Zz88LweJiDYl+8YoJJK8xh6V+JOcjpV4+YlOJ7NXMeZlpQ7LeTlZncSFuL+36YW4it1lLIpG3F9RflmoqJ/yzEZKo31FA8lHIrTucNQ0i7RXBry7gI2bRh2OO0rTjuSsx5xMQvJ0IlbSl4BuV8fP0r/XQtu8R6yf7GTRF7kSD2+njo8xK/Yx5/MNiPArHYo9/r8CPOq5CPQOElqdeTfx1LfaxXNgLFV4Ra/dcIL9wTonmV35F6IeJ1laDXG1Kvks/rEBf7iF75Twl64WIf1Wuf0OvM71VK0muQoNcD8/pfeD2HfKH24+Pz9BKfnoPXoe8cEnMJepH2r/4V1XObuN8j54mzxLyIz0fk/IUL/o7mCeS8+qRgvP4oFosF+DH/LEx/mn9XKPu9iOcv7sGvJV5rmCe8COLgjV+MfF71BV/8dmGMiF7qE6Jg5BcPr5eY+5PneYPolYo+qzz5vcMbMDF3fTP1IinYK/+OIN+O3o4vXs60eL7Xiv5CiHiRb0eOs1YLKX6bWxEV7GCA6BMvyL2WCymKH/glauSCvUOknvR0nLF4h0J8fX2z8gITTA9D4k99rHhxix0pXvIeQMSVRKQrWyDuqjMeSnANf7jxevFqPcpr3CK08gPyLjEDtocPm1qwYriuv0k2n/gywuRfXvi1YMVCi+3tI7Sy2XjLCFvYBcIKoofdlPuIVSSfCVdUDbRYuIztIbXipn6KoqO9eNDd/uSXA6QW8Wi/Th0EiBn6lpAp7++QWtnybQJe3CRIDIARrsNqHfUeKZZIueC/HeQFzdRRQP6V+kQFPGi+Lfu3YzHO0bhGUPRni8l3EBugNZoY8/+m+ZOvYmXii9AmXYwYDwxjUtVaremSKkqrpdW7QF2VuHl/t1GxAfEAvYmiBy/lQk3vTTqQSU83DO+Dwf1nz7lduE1KC65LULPwCECQv2i+XTuKYp9AHjQ1hFggzR+X6Y99YG8Q2MVCAfTP8/jnY9yCAsRiVQyAn8vJdVQP1VgV45u/5PM0tOJWDIbsrhjjDoRB29IutmAY8acINC0d12C3APgoQ240FGznx2v1ot6iIlEN6J3brFTs6JEASuDYg8HoRbpBkVHnIy4mMLqUizVHqfIRagbAhOCdFjJaVRCymQG1Q28bIlDqvaDxYb1UemdntVqijXQVkzSgGp3qTnLlQ9FGPZ03NusGABwNe516OlILWvVqd6IbqqoakOkXfdId1XcaqiAUONdrWh2iaS34Q9o+DAaDwWAwGAwGg8FgMJD8B0Y0n3erXn7HAAAAAElFTkSuQmCC"; // .gif and .jpg are ok too!
String output = "c:/output.pdf";
try {
PdfWriter.getInstance(document, new FileOutputStream(output));
document.open();
byte[] decoded = Base64.decodeBase64(b64Image.getBytes());
document.add(Image.getInstance(decoded));
document.close();
}
catch (Exception e) {
e.printStackTrace();
}
}
}

Read embedded pdf file in excel using Java

I am new to Java programming. My current project requires me to read embedded(ole) files in an excel sheet and get text contents in them. Examples for reading embedded word file worked fine, however I am unable to find help reading an embedded pdf file. Tried few things by looking at similar examples.... which didn't work out.
http://poi.apache.org/spreadsheet/quick-guide.html#Embedded
I have code below, probably with help I can get in right direction. I have used Apache POI to read embedded files in excel and pdfbox to parse pdf data.
public class ReadExcel1 {
public static void main(String[] args) {
try {
FileInputStream file = new FileInputStream(new File("C:\\test.xls"));
POIFSFileSystem fs = new POIFSFileSystem(file);
HSSFWorkbook workbook = new HSSFWorkbook(fs);
for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
String oleName = obj.getOLE2ClassName();
if(oleName.equals("Acrobat Document")){
System.out.println("Acrobat reader document");
try{
DirectoryNode dn = (DirectoryNode) obj.getDirectory();
for (Iterator<Entry> entries = dn.getEntries(); entries.hasNext();) {
DocumentEntry nativeEntry = (DocumentEntry) dn.getEntry("CONTENTS");
byte[] data = new byte[nativeEntry.getSize()];
ByteArrayInputStream bao= new ByteArrayInputStream(data);
PDFParser pdfparser = new PDFParser(bao);
pdfparser.parse();
COSDocument cosDoc = pdfparser.getDocument();
PDFTextStripper pdfStripper = new PDFTextStripper();
PDDocument pdDoc = new PDDocument(cosDoc);
pdfStripper.setStartPage(1);
pdfStripper.setEndPage(2);
System.out.println("Text from the pdf "+pdfStripper.getText(pdDoc));
}
}catch(Exception e){
System.out.println("Error reading "+ e.getMessage());
}finally{
System.out.println("Finally ");
}
}else{
System.out.println("nothing ");
}
}
file.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Below is the output in eclipse
Acrobat reader document
Error reading Error: End-of-File, expected line
Finally
nothing

The PDF weren't OLE 1.0 packaged, but somehow differently embedded - at least the extraction worked for me.
This is not a general solution, because it depends on how the embedding application names the entries ... of course for PDFs you could check all DocumentNode-s for the magic number "%PDF" - and in case of OLE 1.0 packaged elements this needs to be done differently ...
I think, the real filename of the pdf is somewhere hidden in the \1Ole or CompObj entries, but for the example and apparently for your use case that's not necessary to determine.
import java.io.*;
import java.net.URL;
import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.util.IOUtils;
public class EmbeddedPdfInExcel {
public static void main(String[] args) throws Exception {
NPOIFSFileSystem fs = new NPOIFSFileSystem(new URL("http://jamesshaji.com/sample.xls").openStream());
HSSFWorkbook wb = new HSSFWorkbook(fs.getRoot(), true);
for (HSSFObjectData obj : wb.getAllEmbeddedObjects()) {
String oleName = obj.getOLE2ClassName();
DirectoryNode dn = (DirectoryNode)obj.getDirectory();
if(oleName.contains("Acro") && dn.hasEntry("CONTENTS")){
InputStream is = dn.createDocumentInputStream("CONTENTS");
FileOutputStream fos = new FileOutputStream(obj.getDirectory().getName()+".pdf");
IOUtils.copy(is, fos);
fos.close();
is.close();
}
}
fs.close();
}
}

com.itextpdf.tool.xml.exceptions.RuntimeWorkerException in Java

public class GeneratePDF {
public static void main(String[] args) {
try {
String k = "<html><body> This is my Project </body></html>";
OutputStream file = new FileOutputStream(new File("E:\\Test11.pdf"));
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, file);
document.open();
InputStream is = new ByteArrayInputStream(k.getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
document.close();
file.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
This is my code for convert HTML to Pdf for Static and small content Html its working fine But for dynamic and long Html content it com.itextpdf.tool.xml.exceptions.RuntimeWorkerException this Excpetion please help me where am doing Wrong .

The problem is that you have invalid html.
Try converting it using the HTMLWorker class

how to convert doc,docx files to pdf in java programatically

I am able to generate pdf from docx file using docx4j.But i need to convert doc file to pdf including images and tables.
Is there any way to convert doc to docx in java. or (doc to pdf)?

docx4j contains org.docx4j.convert.in.Doc, which uses POI to read the .doc, but it is a proof of concept, not production ready code. Last I checked, there were limits to POI's HWPF parsing of a binary .doc.
Further to mqchen's comment, you can use LibreOffice or OpenOffice to convert doc to docx. But if you are going to use LibreOffice or OpenOffice, you may as well use it to convert both .doc and .docx directly to PDF. Google 'jodconverter'.

Cribbing off the POI unit tests, I came up with this to extract the text from a word document:
public String getText(String document) {
try {
ZipInputStream is = new ZipInputStream(new FileInputStream(document));
try {
is.getNextEntry();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
IOUtils.copy(is, baos);
} finally {
baos.close();
}
byte[] byteArray = baos.toByteArray();
ByteArrayInputStream bais = new ByteArrayInputStream(byteArray);
HWPFDocument doc = new HWPFDocument(bais);
extractor = new WordExtractor(doc);
extractor.getText();
} finally {
is.close();
}
} catch (IOException e) {
throw new RuntimeException(e);
}
}
I do hope that points you in the right direction, if not sorts you entirely.

You can use jWordConvert for this.
jWordConvert is a Java library that can read and render Word documents
natively to convert to PDF, to convert to images, or to print the
documents automatically.
Details can be found at following link
http://www.qoppa.com/wordconvert/

https://github.com/guptachunky/Conversion-Work
This Github Link might be helpful for that.
https://github.com/guptachunky/Conversion-Work/blob/main/src/main/java/com/conversion/Conversion/Service/ConversionService.java
public void docToPdf(FileDetail fileDetail, HttpServletResponse response) {
InputStream doc;
try {
File docFile = converterToFile(fileDetail.getFile());
doc = new FileInputStream(docFile);
XWPFDocument document = new XWPFDocument(doc);
PdfOptions options = PdfOptions.create();
File file = File.createTempFile("output", ".pdf");
OutputStream out = new FileOutputStream(file);
PdfConverter.getInstance().convert(document, out, options);
getClaimFiles(file, response);
} catch (IOException e) {
response.setStatus(AppConstant.SOMETHING_WENT_WRONG);
}
}
public void getClaimFiles(File file, HttpServletResponse response) {
try {
response.setContentType("application/pdf");
response.setHeader("Content-Disposition",
"attachment; filename=dummy.pdf");
response.getOutputStream().write(Files.readAllBytes(file.toPath()));
} catch (Exception e) {
response.setStatus(AppConstant.SOMETHING_WENT_WRONG);
}
}

Java:using apache POI how to convert ms word file to pdf?

By using apache POI how to convert ms word file to pdf?
I an using the following code but its not working giving errors I guess I am importing the wrong classes?
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.poi.hslf.record.Document;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
public class TestCon {
/**
* #param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
POIFSFileSystem fs = null;
Document document = new Document();
try {
System.out.println("Starting the test");
fs = new POIFSFileSystem(new FileInputStream("/document/test2.doc"));
HWPFDocument doc = new HWPFDocument(fs);
WordExtractor we = new WordExtractor(doc);
OutputStream file = new FileOutputStream(new File("/document/test.pdf"));
PdfWriter writer = PdfWriter.getInstance(document, file);
Range range = doc.getRange();
document.open();
writer.setPageEmpty(true);
document.newPage();
writer.setPageEmpty(true);
String[] paragraphs = we.getParagraphText();
for (int i = 0; i < paragraphs.length; i++) {
org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
// CharacterRun run = pr.getCharacterRun(i);
// run.setBold(true);
// run.setCapitalized(true);
// run.setItalic(true);
paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
System.out.println("Length:" + paragraphs[i].length());
System.out.println("Paragraph" + i + ": " + paragraphs[i].toString());
// add the paragraph to the document
document.add(new Paragraph(paragraphs[i]));
}
System.out.println("Document testing completed");
} catch (Exception e) {
System.out.println("Exception during test");
e.printStackTrace();
} finally {
// close the document
document.close();
}
}
}

Got It solved
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
public class TestCon {
/**
* #param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
POIFSFileSystem fs = null;
Document document = new Document();
try {
System.out.println("Starting the test");
fs = new POIFSFileSystem(new FileInputStream("D:/Resume.doc"));
HWPFDocument doc = new HWPFDocument(fs);
WordExtractor we = new WordExtractor(doc);
OutputStream file = new FileOutputStream(new File("D:/test.pdf"));
PdfWriter writer = PdfWriter.getInstance(document, file);
Range range = doc.getRange();
document.open();
writer.setPageEmpty(true);
document.newPage();
writer.setPageEmpty(true);
String[] paragraphs = we.getParagraphText();
for (int i = 0; i < paragraphs.length; i++) {
org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
// CharacterRun run = pr.getCharacterRun(i);
// run.setBold(true);
// run.setCapitalized(true);
// run.setItalic(true);
paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
System.out.println("Length:" + paragraphs[i].length());
System.out.println("Paragraph" + i + ": " + paragraphs[i].toString());
// add the paragraph to the document
document.add(new Paragraph(paragraphs[i]));
}
System.out.println("Document testing completed");
} catch (Exception e) {
System.out.println("Exception during test");
e.printStackTrace();
} finally {
// close the document
document.close();
}
}
}

This worked For Me:-
Source :- http://www.programcreek.com/java-api-examples/index.php?api=org.apache.poi.xwpf.converter.pdf.PdfConverter
package pdf;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class PDF {
public static void main(String[] args) throws Exception {
String inputFile="D:/TEST.docx";
String outputFile="D:/TEST.pdf";
if (args != null && args.length == 2) {
inputFile=args[0];
outputFile=args[1];
}
System.out.println("inputFile:" + inputFile + ",outputFile:"+ outputFile);
FileInputStream in=new FileInputStream(inputFile);
XWPFDocument document=new XWPFDocument(in);
File outFile=new File(outputFile);
OutputStream out=new FileOutputStream(outFile);
PdfOptions options=null;
PdfConverter.getInstance().convert(document,out,options);
}
}

The below code worked for me:
Public class DocToPdfConverter{
public static void main(String[] args) {
String k=null;
OutputStream fileForPdf =null;
try {
String fileName="/document/test2.doc";
//Below Code is for .doc file
if(fileName.endsWith(".doc"))
{
HWPFDocument doc = new HWPFDocument(new FileInputStream(
fileName));
WordExtractor we=new WordExtractor(doc);
k = we.getText();
fileForPdf = new FileOutputStream(new File(
"/document/DocToPdf.pdf"));
we.close();
}
//Below Code for
else if(fileName.endsWith(".docx"))
{
XWPFDocument docx = new XWPFDocument(new FileInputStream(
fileName));
// using XWPFWordExtractor Class
XWPFWordExtractor we = new XWPFWordExtractor(docx);
k = we.getText();
fileForPdf = new FileOutputStream(new File(
"/document/DocxToPdf.pdf"));
we.close();
}
Document document = new Document();
PdfWriter.getInstance(document, fileForPdf);
document.open();
document.add(new Paragraph(k));
document.close();
fileForPdf.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}

There are several steps here:
Read Word document using POI into a format-agnostic form
Convert format-agnostic form into PDF
Write PDF
I don't know if POI will do step 2 for you. I'd recommend something else, like iText.

As a side note, it's also possible to read content on-the-fly directly from a Word/Excel content stream instead of reading it from the filesystem and serializing it to disk, for example when retrieving content from CMIS repositories:
e.g.
//HWPFDocument docx = new HWPFDocument(fs);
HWPFDocument docx = new HWPFDocument(doc.getContentStream().getStream());
(doc is of type org.apache.chemistry.opencmis.client.api.Document and in this case I adapted your code to retrieve a word file from an Alfresco repository by means of opencmis and transformed it to PDF)
HTH

In addition to Kushagra's answer, here the updated maven dependencies:
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
<version>2.0.1</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.xdocreport.converter</artifactId>
<version>2.0.1</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
<version>2.0.1</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>
<version>2.0.1</version>
</dependency>

This save my day, i load docx file from an url and convert it to pdf:
pom.xml
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.13</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.13</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>org.apache.poi.xwpf.converter.pdf</artifactId>
<version>LATEST</version>
</dependency>
main_class
public String wordToPDFPOI(String url) throws Exception {
InputStream doc = new URL(url).openStream();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
XWPFDocument document = new XWPFDocument(doc);
PdfOptions options = PdfOptions.create();
PdfConverter.getInstance().convert(document, baos, options);
String base64_encoded = Base64.encodeBytes(baos.toByteArray());
return base64_encoded;
}

All of the answers above will fail if the document has images. I would not suggest you to use apache poi since its library to convert word to pdf have been discontinued now. As of today I don't think that there is any open source library which do the conversion (they require some dependencies like some need MS word to be installed, etc). The best way I could think of (it will only work if you are deploying project on linux machine) is that install Libre Office (open source) in the linux machine and run this :
String command = "libreoffice --headless --convert-to pdf " + inputPath + " --outdir " + outputPath;
try {
Runtime.getRuntime().exec(command);
} catch (IOException e) {
e.printStackTrace();
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How to convert HTML to PDF using iText [duplicate] - java

Related

Convert Base64 into PDF without converting it into an Image in JAVA

Read embedded pdf file in excel using Java

com.itextpdf.tool.xml.exceptions.RuntimeWorkerException in Java

how to convert doc,docx files to pdf in java programatically

Java:using apache POI how to convert ms word file to pdf?

Categories

Resources