I am able to generate a PDF from a .docx file using docx4j, but I need to convert a .doc file to PDF, including its images and tables.
Is there any way to convert .doc to .docx in Java (or .doc directly to PDF)?
docx4j contains org.docx4j.convert.in.Doc, which uses POI to read the .doc, but it is a proof of concept, not production ready code. Last I checked, there were limits to POI's HWPF parsing of a binary .doc.
Further to mqchen's comment, you can use LibreOffice or OpenOffice to convert doc to docx. But if you are going to use LibreOffice or OpenOffice, you may as well use it to convert both .doc and .docx directly to PDF. Google 'jodconverter'.
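If you go the LibreOffice route without jodconverter, one option is to call LibreOffice's headless command line from Java. The following is a minimal sketch that assumes LibreOffice is installed and its `soffice` launcher is on the PATH; the class and method names are hypothetical. It builds `soffice --headless --convert-to <format> --outdir <dir> <input>` and predicts the output name, because LibreOffice names the result after the input file.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: assumes the LibreOffice "soffice" launcher is on the PATH.
class LibreOfficeConverter {

    // Builds: soffice --headless --convert-to <format> --outdir <dir> <input>
    static List<String> buildCommand(File input, File outDir, String format) {
        List<String> cmd = new ArrayList<>();
        cmd.add("soffice");
        cmd.add("--headless");           // run without a GUI
        cmd.add("--convert-to");
        cmd.add(format);                 // e.g. "pdf", or usually "docx" for .doc -> .docx
        cmd.add("--outdir");
        cmd.add(outDir.getAbsolutePath());
        cmd.add(input.getAbsolutePath());
        return cmd;
    }

    // The result keeps the input's base name with the new extension.
    static File expectedOutput(File input, File outDir, String format) {
        String name = input.getName();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return new File(outDir, base + "." + format);
    }

    // Runs the conversion, waits for it and returns the produced file.
    static File convert(File input, File outDir, String format)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(input, outDir, format))
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // Java 9+
                .start();
        int exit = p.waitFor();
        File out = expectedOutput(input, outDir, format);
        if (exit != 0 || !out.isFile()) {
            throw new IOException("soffice failed (exit code " + exit + ") for " + input);
        }
        return out;
    }
}
```

Usage would be something like `LibreOfficeConverter.convert(new File("in.doc"), new File("out"), "pdf")`. Parallel calls that share one LibreOffice user profile can interfere with each other, which is one reason a process manager such as jodconverter is attractive for server use.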
Cribbing off the POI unit tests, I came up with this to extract the text from a Word document:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public String getText(String document) {
    // A plain .doc file is opened directly. If your .doc sits inside a zip archive,
    // wrap the stream in a ZipInputStream and call getNextEntry() first.
    try (InputStream is = new FileInputStream(document)) {
        // HWPFDocument parses the binary .doc format; WordExtractor pulls the text out of it
        HWPFDocument doc = new HWPFDocument(is);
        WordExtractor extractor = new WordExtractor(doc);
        return extractor.getText();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
I do hope that points you in the right direction, if not sorts you entirely.
You can use jWordConvert for this.
jWordConvert is a Java library that can read and render Word documents natively to convert them to PDF or to images, or to print them automatically.
Details can be found at the following link:
http://www.qoppa.com/wordconvert/
This GitHub repository might be helpful for that: https://github.com/guptachunky/Conversion-Work
The conversion code is in https://github.com/guptachunky/Conversion-Work/blob/main/src/main/java/com/conversion/Conversion/Service/ConversionService.java
// Note: XWPFDocument reads only the OOXML (.docx) format; a binary .doc is rejected here and needs a different converter.
public void docToPdf(FileDetail fileDetail, HttpServletResponse response) {
    try {
        File docFile = converterToFile(fileDetail.getFile());
        File file = File.createTempFile("output", ".pdf");
        // try-with-resources closes the PDF stream before the file is read back below;
        // otherwise the response could receive an incomplete PDF
        try (InputStream doc = new FileInputStream(docFile);
             XWPFDocument document = new XWPFDocument(doc);
             OutputStream out = new FileOutputStream(file)) {
            PdfOptions options = PdfOptions.create();
            PdfConverter.getInstance().convert(document, out, options);
        }
        getClaimFiles(file, response);
    } catch (IOException e) {
        response.setStatus(AppConstant.SOMETHING_WENT_WRONG);
    }
}
public void getClaimFiles(File file, HttpServletResponse response) {
try {
response.setContentType("application/pdf");
response.setHeader("Content-Disposition",
"attachment; filename=dummy.pdf");
response.getOutputStream().write(Files.readAllBytes(file.toPath()));
} catch (Exception e) {
response.setStatus(AppConstant.SOMETHING_WENT_WRONG);
}
}
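The service above parses its input with XWPFDocument, which handles only the OOXML (.docx) format, while a legacy .doc is an OLE2 compound file. A JDK-only sketch for telling them apart by their leading bytes, so a .doc can be routed to another converter (POI's HWPF, or LibreOffice as suggested earlier), could look like this; the class name is hypothetical. A ZIP signature only shows that the file is a zip container (it could also be .xlsx or any other zip), and the OLE2 signature also matches .xls and .ppt.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class WordFormat {
    enum Kind { DOC_OLE2, DOCX_ZIP, UNKNOWN }

    // OLE2 compound file signature used by .doc (and .xls, .ppt)
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // Local file header signature "PK\3\4" that starts .docx (a zip package)
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    static Kind detect(byte[] header) {
        if (startsWith(header, OLE2)) return Kind.DOC_OLE2;
        if (startsWith(header, ZIP)) return Kind.DOCX_ZIP;
        return Kind.UNKNOWN;
    }

    static Kind detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8)); // readNBytes(int) needs Java 11+
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```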
Does anybody have an idea how to handle unstructured data like audio, video and images using HBase? I have tried a lot but didn't get anywhere. Any help is appreciated.
Option 1: convert the image to a byte array, prepare a Put request and insert it into the table. Audio and video files can be handled the same way.
See https://docs.oracle.com/javase/7/docs/api/javax/imageio/package-summary.html
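import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
/*
 * Convert an image to a byte array.
 * Note: the image is decoded and re-encoded as JPEG, so the bytes are not identical to the original file.
 */
private byte[] convertImageToByteArray(String imageName) throws IOException {
    BufferedImage originalImage = ImageIO.read(new File(imageName));
    if (originalImage == null) {
        // ImageIO.read returns null when no installed reader understands the file
        throw new IOException("Unsupported image format: " + imageName);
    }
    // Encode the BufferedImage as JPEG into an in-memory buffer
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    if (!ImageIO.write(originalImage, "jpg", baos)) {
        // ImageIO.write returns false when no JPEG writer accepts this image type
        throw new IOException("Could not encode " + imageName + " as JPEG");
    }
    return baos.toByteArray();
}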
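The ImageIO approach above decodes and re-encodes the picture, and it cannot help with audio or video at all. If the goal is simply to store the media file's bytes, a sketch that reads them unchanged with the JDK is below; the `MediaBytes` name is hypothetical. The resulting byte[] is what you would pass as the cell value of an HBase Put (for example via `Put#addColumn(family, qualifier, value)`), and writing the fetched value back restores the original file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class MediaBytes {
    // Reads the media file exactly as stored (JPEG, MP3, MP4, ...): no decoding, no re-encoding.
    static byte[] read(Path media) {
        try {
            return Files.readAllBytes(media);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes bytes fetched back from the table to a file, restoring the original media file.
    static void write(Path target, byte[] content) {
        try {
            Files.write(target, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Files.readAllBytes loads the whole file into memory, so this suits images and short clips rather than multi-gigabyte videos.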
Option 2: You can do it as below using the Apache Commons Lang API. This is probably a better option than the one above, because it applies to any Serializable object, including images, audio and video. It can be used not only for HBase; you can save the result in HDFS as well.
See my answer for more details.
For example: byte[] mediaInBytes = org.apache.commons.lang.SerializationUtils.serialize(Serializable obj)
For deserializing, you can use static Object deserialize(byte[] objectData)
See the documentation at the link above.
Example usage of SerializationUtils:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.commons.lang.SerializationUtils;
public class SerializationUtilsTest {
    public static void main(String[] args) {
        // File that receives the serialized bytes; for HBase/HDFS you would store the serialized byte[] instead
        String fileName = "testSerialization.ser";
        try {
            // Serialize a String into the file
            try (FileOutputStream fos = new FileOutputStream(fileName)) {
                SerializationUtils.serialize("SERIALIZE THIS", fos);
            }
            // Read it back and cast it to String
            try (FileInputStream fis = new FileInputStream(fileName)) {
                String ser = (String) SerializationUtils.deserialize(fis);
                System.out.println(ser);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Note: the Apache Commons Lang jar is always available on a Hadoop cluster (it is not an external dependency).
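SerializationUtils is a thin wrapper over standard Java serialization (ObjectOutputStream / ObjectInputStream). The JDK-only sketch below does the same round trip; the class name is hypothetical. It makes two things visible before you store media this way: a serialized byte[] carries a serialization header (it starts with the stream magic 0xACED), so the stored value is no longer the raw media file; and serializing a java.io.File object stores only its path, not the file contents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class JavaSerialization {
    // Same idea as SerializationUtils.serialize: an ObjectOutputStream over an in-memory buffer.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Same idea as SerializationUtils.deserialize(byte[]): reads one object back.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```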
I am creating a PDF file using JasperReports. The code for generating the PDF runs fine and the file is created at a specified path. But I want the file to be downloaded by the client rather than stored on some drive. I use the word "dynamically" in my question because the file is generated by JasperReports when the user clicks Download. I googled this and found that response.setHeader is responsible for the download, but it seems to need a source, i.e. the path where the file is stored. The code for generating the PDF is given below.
String ip="D:\\workspace\\Jsaper1\\src\\Coll.jasper";
String op="D:\\workspace\\Jsaper1\\src\\Timesheet.pdf";
try
{
File file=new File(ip);
InputStream is=new FileInputStream(file);
Map<String, Object> params = new HashMap<String, Object>();
Datasource da=new Datasource();
JRDataSource jrdsource=new JRBeanCollectionDataSource(da.getDataSource());
JasperReport jreport=(JasperReport) JRLoader.loadObject(file);
JasperPrint jasperPrint = JasperFillManager.fillReport(jreport, params, da.getDataSource1());
JasperExportManager.exportReportToPdfFile(jasperPrint, op);
sos.flush();
sos.close();
}
catch(Exception e)
{
e.printStackTrace();
}
Since the generated file is already written to disk, you can just open it and write it to the response's OutputStream. Use IOUtils#copy from Apache Commons IO to copy the content of the input stream to the output stream without the boilerplate. It also buffers internally, so there is no need to wrap your InputStream in a BufferedInputStream.
public class DownloadServlet extends HttpServlet{
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String fileName = "Timesheet.pdf"; // hypothetical: use the name of your generated report
response.setContentType("application/pdf");
response.setHeader("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
InputStream input = null;
OutputStream output = null;
try {
input = new FileInputStream(new File("file-path-where-generated-pdf-is-stored"));
output = response.getOutputStream();
IOUtils.copy(input, output);
output.flush();
} catch (IOException e) {
//log it
} finally{
close(input);
close(output);
input = null;
output = null;
}
}
//Right way to close resource
public static void close(Closeable c) {
if (c == null) return;
try {
c.close();
}catch (IOException e) {
//log it
}
}
}
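If you would rather not add Commons IO just for IOUtils#copy, the JDK can do the same copy. A minimal sketch, assuming Java 9+ for InputStream.transferTo (Files.copy(Path, OutputStream) is Java 7+); the `StreamCopy` name is hypothetical. In the servlet above, `out` would be `response.getOutputStream()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamCopy {
    // JDK-only equivalent of IOUtils.copy(in, out); returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        return in.transferTo(out);
    }

    // Streams a file on disk to any OutputStream, e.g. response.getOutputStream().
    static long copyFile(Path file, OutputStream out) throws IOException {
        return Files.copy(file, out);
    }
}
```

Neither method closes `out`, which leaves the decision to close the servlet stream to the caller.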
Here is a solution for you.
Insert this code in your JSP page. You could do something like this:
response.setContentType("application/pdf");
response.setHeader("Content-Disposition", "attachment; filename=output.pdf");
ServletOutputStream sosStream = response.getOutputStream();
// The JasperPrint was stored in the session earlier
JasperPrint jasperPrint = (JasperPrint) session.getAttribute("jasperPrint");
// Write the PDF straight into the response; no temporary file is needed
JasperExportManager.exportReportToPdfStream(jasperPrint, sosStream);
sosStream.flush();
sosStream.close();
// Reset the JSP's implicit "out" writer, since the binary output stream has been used
out.clear();
out = pageContext.pushBody();
Notes:
1. The response content type and Content-Disposition header are set first.
2. You can store the JasperPrint in the session and retrieve it here.
3. JasperExportManager.exportReportToPdfStream is called on the fly and writes the PDF into the response's output stream.
4. Because the PDF goes straight to the response, nothing needs to be copied through a temporary file.
You can call this code from the click action of a download button; the browser will then typically show a download dialog asking for the save location.
Try using the sample below as a reference:
// jasperPrint is your filled report (from JasperFillManager.fillReport(...))
ByteArrayOutputStream baos = new ByteArrayOutputStream();
JRAbstractExporter exporter = new JRPdfExporter();
exporter.setParameter(JRExporterParameter.JASPER_PRINT, jasperPrint);
exporter.setParameter(JRExporterParameter.OUTPUT_STREAM, baos);
// exportReport() returns void; the PDF bytes end up in baos
exporter.exportReport();
OutputStream os = null;
try {
    response.setContentType("application/pdf");
    response.setHeader("Content-disposition", "attachment; filename=\"Timesheet.pdf\"");
    response.setHeader("Expires", "0");
    response.setHeader("Cache-Control", "must-revalidate, post-check=0, pre-check=0");
    response.setContentLength(baos.size());
    os = response.getOutputStream();
    baos.writeTo(os);
} finally {
    if (os != null) {
        os.flush();
        os.close();
    }
}
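Quoting the filename in Content-Disposition is easy to get wrong (an unbalanced quote, or a name with spaces or non-ASCII characters). A small JDK-only helper is sketched below; it assumes Java 10+ for `URLEncoder.encode(String, Charset)`, and the class name is hypothetical. It emits a quoted ASCII fallback plus the RFC 6266 `filename*` parameter (RFC 5987 encoding), which recipients that understand it are expected to prefer.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ContentDisposition {
    // Builds an "attachment" header value with a quoted ASCII fallback and an encoded UTF-8 name.
    static String attachment(String fileName) {
        // Fallback for old clients: printable ASCII only, no quotes or backslashes
        String ascii = fileName.replaceAll("[^\\x20-\\x7E]", "_")
                .replace('"', '_')
                .replace('\\', '_');
        // URLEncoder does form encoding, so fix up '+' (space) and '*' for header use
        String encoded = URLEncoder.encode(fileName, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A");
        return "attachment; filename=\"" + ascii + "\"; filename*=UTF-8''" + encoded;
    }
}
```

With it, the header line becomes `response.setHeader("Content-Disposition", ContentDisposition.attachment("Timesheet.pdf"));`.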
Use a ByteArrayOutputStream in the report method. I have implemented this with DynamicReports (a Jasper API) and it works for me:
@RequestMapping(value = "/pdfDownload", method = RequestMethod.GET)
public void getPdfDownload(HttpServletResponse response) throws IOException, DRException {
    // Render the report into memory first (supply your own columns and data source);
    // show() is left out because it opens a desktop viewer window, which a server cannot do
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    report().columns().setDataSource()
            .toPdf(buffer);
    // Set the headers before writing the body; headers set after the response is committed may be ignored
    response.setContentType("application/pdf");
    response.setHeader("Content-Disposition", "attachment; filename=Accepted1.pdf");
    IOUtils.copy(new ByteArrayInputStream(buffer.toByteArray()), response.getOutputStream());
    response.flushBuffer();
}
You can use the code below. Here is a complete example using a Java servlet that lets the user download the PDF file.
Download the following jar files:
1. jasperreports-5.0.1.jar
2. commons-logging-1.1.2.jar
3. commons-digester-2.1.jar
4. commons-collections-3.2.1-1.0.0.jar
5. commons-beanutils-1.8.3.jar
6. groovy-all-2.1.3.jar
7. com.lowagie.text-2.1.7.jar
8. your database library
Now use the code below in the servlet's doGet method:
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    try {
        String path = "D:\\Software\\iReport-5.0.0-windows-installer\\u\\report4.jrxml";
        // Compile the .jrxml design into a JasperReport
        JasperReport jasReport = JasperCompileManager.compileReport(path);
        // Database connection
        Connection con = /* Your database connection */ ;
        // Report parameters, if your report declares any
        Map<String, Object> paramMap = new HashMap<String, Object>();
        paramMap.put("id", request.getParameter("id"));
        // Pass null instead of paramMap if your report has no parameters
        JasperPrint jasPrint = JasperFillManager.fillReport(jasReport, paramMap, con);
        response.setContentType("application/pdf");
        response.addHeader("Content-disposition", "attachment; filename=creditcard.pdf");
        ServletOutputStream sos = response.getOutputStream();
        // Write the PDF directly to the servlet response
        JasperExportManager.exportReportToPdfStream(jasPrint, sos);
    } catch (JRException ex) {
        Logger.getLogger(getClass().getName()).log(Level.SEVERE, null, ex);
    }
}
I am new to Java programming. My current project requires me to read embedded (OLE) files in an Excel sheet and get their text contents. The examples for reading an embedded Word file worked fine; however, I am unable to find help on reading an embedded PDF file. I tried a few things based on similar examples, which didn't work out.
http://poi.apache.org/spreadsheet/quick-guide.html#Embedded
My code is below; with some help I can probably get in the right direction. I have used Apache POI to read the embedded files in Excel and PDFBox to parse the PDF data.
public class ReadExcel1 {
public static void main(String[] args) {
try {
FileInputStream file = new FileInputStream(new File("C:\\test.xls"));
POIFSFileSystem fs = new POIFSFileSystem(file);
HSSFWorkbook workbook = new HSSFWorkbook(fs);
for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
String oleName = obj.getOLE2ClassName();
if(oleName.equals("Acrobat Document")){
System.out.println("Acrobat reader document");
try{
DirectoryNode dn = (DirectoryNode) obj.getDirectory();
for (Iterator<Entry> entries = dn.getEntries(); entries.hasNext();) {
DocumentEntry nativeEntry = (DocumentEntry) dn.getEntry("CONTENTS");
byte[] data = new byte[nativeEntry.getSize()];
ByteArrayInputStream bao= new ByteArrayInputStream(data);
PDFParser pdfparser = new PDFParser(bao);
pdfparser.parse();
COSDocument cosDoc = pdfparser.getDocument();
PDFTextStripper pdfStripper = new PDFTextStripper();
PDDocument pdDoc = new PDDocument(cosDoc);
pdfStripper.setStartPage(1);
pdfStripper.setEndPage(2);
System.out.println("Text from the pdf "+pdfStripper.getText(pdDoc));
}
}catch(Exception e){
System.out.println("Error reading "+ e.getMessage());
}finally{
System.out.println("Finally ");
}
}else{
System.out.println("nothing ");
}
}
file.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Below is the output in eclipse
Acrobat reader document
Error reading Error: End-of-File, expected line
Finally
nothing
The PDFs weren't OLE 1.0 packaged but embedded in some other way; at least the extraction below worked for me. Note also that in your code the data array is allocated with nativeEntry.getSize() but never filled from the CONTENTS entry, so PDFBox is parsing a buffer of zeros, which is the likely cause of the "End-of-File" error.
This is not a general solution, because it depends on how the embedding application names the entries. For PDFs you could instead check every DocumentNode for the magic number "%PDF"; OLE 1.0 packaged elements would need to be handled differently.
I think the real filename of the PDF is hidden somewhere in the \1Ole or CompObj entries, but for this example, and apparently for your use case, it doesn't need to be determined.
import java.io.*;
import java.net.URL;
import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.util.IOUtils;
public class EmbeddedPdfInExcel {
public static void main(String[] args) throws Exception {
// POI 3.x API: in POI 4.0+ NPOIFSFileSystem was merged into POIFSFileSystem
NPOIFSFileSystem fs = new NPOIFSFileSystem(new URL("http://jamesshaji.com/sample.xls").openStream());
HSSFWorkbook wb = new HSSFWorkbook(fs.getRoot(), true);
for (HSSFObjectData obj : wb.getAllEmbeddedObjects()) {
String oleName = obj.getOLE2ClassName();
DirectoryNode dn = (DirectoryNode)obj.getDirectory();
if(oleName.contains("Acro") && dn.hasEntry("CONTENTS")){
InputStream is = dn.createDocumentInputStream("CONTENTS");
FileOutputStream fos = new FileOutputStream(obj.getDirectory().getName()+".pdf");
IOUtils.copy(is, fos);
fos.close();
is.close();
}
}
fs.close();
}
}
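The "%PDF" magic-number check suggested above can be done on the raw bytes of each entry. A minimal JDK-only sketch follows; the class name is hypothetical, and reading each DocumentEntry into a byte[] (for example with `DirectoryNode.createDocumentInputStream` as in the answer's code plus `org.apache.poi.util.IOUtils.toByteArray`) is left out and assumed.

```java
class PdfSniffer {
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F'};

    // Position of the first "%PDF" in data, or -1 if it does not occur.
    static int indexOfPdfHeader(byte[] data) {
        outer:
        for (int i = 0; i + PDF_MAGIC.length <= data.length; i++) {
            for (int j = 0; j < PDF_MAGIC.length; j++) {
                if (data[i + j] != PDF_MAGIC[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // True if the entry's content starts with the PDF header, i.e. it is very likely a PDF.
    static boolean looksLikePdf(byte[] data) {
        return data != null && indexOfPdfHeader(data) == 0;
    }
}
```

A buffer of zeros, like the unfilled array in the question, fails this check, which makes such a bug visible before PDFBox is even called.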
public class GeneratePDF {
public static void main(String[] args) {
try {
String k = "<html><body> This is my Project </body></html>";
OutputStream file = new FileOutputStream(new File("E:\\Test11.pdf"));
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, file);
document.open();
InputStream is = new ByteArrayInputStream(k.getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
document.close();
file.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
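This is my code for converting HTML to PDF. For static, small HTML content it works fine, but for dynamic, long HTML content it throws com.itextpdf.tool.xml.exceptions.RuntimeWorkerException. Please help me find where I am going wrong.
The problem is that you have invalid HTML: XMLWorkerHelper.parseXHtml, as its name suggests, expects well-formed XHTML, so something like an unclosed tag in the longer, dynamic content can make parsing fail.
Alternatively, try converting it using the HTMLWorker class.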
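To find out whether the dynamic HTML is the culprit, one option is to run it through the JDK's XML parser before handing it to XMLWorker, which reports the first well-formedness error with its message. This is a sketch under the assumption that XMLWorker's failures come from malformed markup; the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XhtmlCheck {
    // Returns null if the markup is well-formed XML, otherwise the parser's error message.
    static String wellFormednessError(String html) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not download DTDs referenced by a DOCTYPE (supported by the JDK's built-in parser)
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This is a plain XML check, so HTML named entities such as `&nbsp;` are reported as errors here even if a dedicated HTML parser would accept them.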
I'm trying to generate a blank docx file using Java, add some text, then write it to a BLOB that I can return to our document processing engine (a custom mess of PL/SQL and Java). I have to use the 1.4 JVM inside Oracle 10g, so no Java 1.5 stuff. I don't have a problem writing the docx to a file on my local machine, but when I try to write to BLOB, I'm getting garbage. Am I doing something dumb? Any help is appreciated. Note in the code below, all the get[name]Xml() methods return an org.w3c.dom.Document.
public void save(String fileName) throws Exception {
ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(fileName));
addEntry(zos, getDocumentXml(), "word/document.xml");
addEntry(zos, getContentTypesXml(), "[Content_Types].xml");
addEntry(zos, getRelsXml(), "_rels/.rels");
zos.flush();
zos.close();
}
public oracle.sql.BLOB save() throws Exception {
java.sql.Connection conn = DbUtilities.openConnection();
BLOB outBlob = BLOB.createTemporary(conn, true, BLOB.DURATION_SESSION);
outBlob.open(BLOB.MODE_READWRITE);
ZipOutputStream zos = new ZipOutputStream(outBlob.setBinaryStream(0L));
addEntry(zos, getDocumentXml(), "word/document.xml");
addEntry(zos, getContentTypesXml(), "[Content_Types].xml");
addEntry(zos, getRelsXml(), "_rels/.rels");
zos.flush();
zos.close();
return outBlob;
}
private void addEntry(ZipOutputStream zos, Document doc, String fileName) throws Exception {
Transformer t = TransformerFactory.newInstance().newTransformer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
t.transform(new DOMSource(doc), new StreamResult(baos));
ZipEntry ze = new ZipEntry(fileName);
byte[] data = baos.toByteArray();
ze.setSize(data.length);
zos.putNextEntry(ze);
zos.write(data);
zos.flush();
zos.closeEntry();
}
It looks like the problem was in our document processing engine. It was expecting a zipped docx. Glad we document our code well.
Anyway, thanks to everyone who looked at my problem.
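One way to narrow down a "garbage in the BLOB" problem like this is to build the package in memory first and check that it is a real zip (it should start with the bytes "PK" and list the expected parts) before handing the bytes to the database, for instance via the JDBC 3.0 method `java.sql.Blob.setBytes(1, bytes)`. The sketch is written in Java 1.4 style (no generics, no try-with-resources) because of the Oracle 10g JVM constraint in the question; the `DocxBytes` class and its names/contents arrays are a hypothetical stand-in for the question's addEntry/DOM code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DocxBytes {
    // Writes each (name, content) pair as a zip entry into memory.
    static byte[] zip(String[] names, byte[][] contents) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(baos);
        try {
            for (int i = 0; i < names.length; i++) {
                zos.putNextEntry(new ZipEntry(names[i]));
                zos.write(contents[i]);
                zos.closeEntry();
            }
        } finally {
            zos.close(); // writes the central directory; without it the zip is truncated
        }
        return baos.toByteArray();
    }

    // Lists the entry names, to confirm the bytes really are a zip with the expected parts.
    static List listEntries(byte[] zipBytes) throws IOException {
        List names = new ArrayList();
        ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes));
        try {
            ZipEntry e;
            while ((e = zis.getNextEntry()) != null) {
                names.add(e.getName());
            }
        } finally {
            zis.close();
        }
        return names;
    }
}
```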