com.itextpdf.tool.xml.exceptions.RuntimeWorkerException in Java

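import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class GeneratePDF {
    public static void main(String[] args) {
        try {
            String k = "<html><body> This is my Project </body></html>";
            OutputStream file = new FileOutputStream(new File("E:\\Test11.pdf"));
            Document document = new Document();
            PdfWriter writer = PdfWriter.getInstance(document, file);
            document.open();
            InputStream is = new ByteArrayInputStream(k.getBytes());
            XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
            document.close();
            file.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}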
This is my code for converting HTML to PDF. For static, small HTML content it works fine, but for dynamic, long HTML content it throws com.itextpdf.tool.xml.exceptions.RuntimeWorkerException. Please help me see what I am doing wrong.

The problem is that your HTML is invalid.
Try converting it with the HTMLWorker class instead.
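To find out where the dynamic HTML is invalid, it can help to run it through an XML parser before handing it to parseXHtml, since XMLWorker expects well-formed XHTML (every tag closed, attributes quoted). The sketch below uses only the JDK's built-in parser; the class and method names (WellFormedCheck, firstXmlError) are made up for illustration and are not part of iText.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns null if the markup is well-formed XML, otherwise a short error description. */
    static String firstXmlError(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            // Line and column point at the place where the parser gave up
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstXmlError("<html><body><br></body></html>"));
    }
}
```

An unclosed tag such as <br> is reported as an error, while the same markup written with <br/> passes. Markup that declares an external DOCTYPE may make the parser try to fetch the DTD, so this check is best suited to body fragments like the one in the question.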

Related

Appending text to existing word file using XWPFDocument

I am trying to append text and a screenshot to an existing Word file, but every time I execute the code below I get this error:
org.apache.poi.EmptyFileException: The supplied file was empty (zero bytes long)
    at org.apache.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:74)
    at org.apache.poi.util.IOUtils.peekFirst8Bytes(IOUtils.java:57)
    at org.apache.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:135)
    at org.apache.poi.openxml4j.opc.internal.ZipHelper.verifyZipHeader(ZipHelper.java:175)
    at org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipStream(ZipHelper.java:209)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:98)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:324)
    at org.apache.poi.util.PackageHelper.open(PackageHelper.java:37)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:116)
    at test.tester.<init>(tester.java:44)
    at test.tester.main(tester.java:100)
failed to create file
Taking first ss
java.lang.NullPointerException
    at test.tester.setText(tester.java:62)
    at test.tester.main(tester.java:103)
Here is the code:
public class tester {
    FileOutputStream fos = null;
    XWPFDocument doc = null;
    // create para and run
    XWPFParagraph para = null;
    XWPFRun run = null;
    File file = null;

    public tester() {
        try {
            file = new File("WordDocWithImage.docx");
            writeToWord();
            //doc = new XWPFDocument();
            doc = new XWPFDocument(OPCPackage.open(file));
            //doc = new XWPFDocument(new FileInputStream("WordDocWithImage.docx"));
            para = doc.createParagraph();
            run = para.createRun();
            para.setAlignment(ParagraphAlignment.CENTER);
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("failed to create file");
        }
    }

    public FileOutputStream writeToWord() throws FileNotFoundException {
        fos = new FileOutputStream("WordDocWithImage.docx");
        return fos;
    }

    public void setText(String text) {
        run.setText(text);
    }

    public void takeScreenshot() throws IOException, AWTException, InvalidFormatException {
        // Take screenshot
        Robot robot = new Robot();
        Rectangle screenRect = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        BufferedImage screenFullImage = robot.createScreenCapture(screenRect);
        // convert buffered image to Input Stream
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(screenFullImage, "jpeg", baos);
        baos.flush();
        ByteArrayInputStream bis = new ByteArrayInputStream(baos.toByteArray());
        baos.close();
        // add image to word doc
        run.addBreak();
        run.addPicture(bis, XWPFDocument.PICTURE_TYPE_JPEG, "image file", Units.toEMU(450), Units.toEMU(250)); // 200x200 pixels
        bis.close();
    }

    public void writeToFile() {
        try {
            // write word doc to file
            doc.write(fos);
            fos.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        tester t = new tester();
        try {
            System.out.println("Taking first ss");
            t.setText("First Text");
            t.takeScreenshot();
            System.out.println("Taking second ss");
            t.setText("Second Text");
            t.takeScreenshot();
            t.writeToFile();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Please assist.
The supplied file was empty (zero bytes long)
The problem is with how the WordDocWithImage file is obtained in this line:
    file = new File("WordDocWithImage.docx");
It could not find the docx file. Check the file's location and give the correct path there.
EDIT: You need to point the output stream at a different location, for example a subfolder. I got the screenshots when I tried this:
    public FileOutputStream writeToWord() throws FileNotFoundException {
        fos = new FileOutputStream("path/subFolder/WordDocWithImage.docx");
        return fos;
    }
Note: I have tried the code.
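A possible explanation for why moving the output helps: in the question's constructor, writeToWord() opens a FileOutputStream on WordDocWithImage.docx just before OPCPackage.open(file) reads that same file, and opening a FileOutputStream without the append flag truncates an existing file to zero bytes. That would match the "zero bytes long" message. The sketch below shows only the truncation with plain JDK classes; TruncateDemo and lengthAfterOpeningForWrite are illustrative names, not POI API.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

class TruncateDemo {
    /** Opens the file for writing (no append) and returns its length while the stream is open. */
    static long lengthAfterOpeningForWrite(File file) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file)) {
            // Nothing has been written yet; the existing content is already gone
            return file.length();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("demo", ".docx");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), new byte[] {1, 2, 3});
        System.out.println("before: " + tmp.length());
        System.out.println("after:  " + lengthAfterOpeningForWrite(tmp));
    }
}
```

If this is what happens in the question, reading the existing document first and opening the output stream only in writeToFile(), or writing to a different path as in the EDIT, should avoid the empty-file error.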

Read embedded pdf file in excel using Java

I am new to Java programming. My current project requires me to read embedded (OLE) files in an Excel sheet and get their text contents. Examples for reading an embedded Word file worked fine, but I am unable to find help on reading an embedded PDF file. I tried a few things by looking at similar examples, which didn't work out.
http://poi.apache.org/spreadsheet/quick-guide.html#Embedded
I have the code below; with some help I can probably get headed in the right direction. I have used Apache POI to read embedded files in Excel and PDFBox to parse the PDF data.
public class ReadExcel1 {
    public static void main(String[] args) {
        try {
            FileInputStream file = new FileInputStream(new File("C:\\test.xls"));
            POIFSFileSystem fs = new POIFSFileSystem(file);
            HSSFWorkbook workbook = new HSSFWorkbook(fs);
            for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
                String oleName = obj.getOLE2ClassName();
                if (oleName.equals("Acrobat Document")) {
                    System.out.println("Acrobat reader document");
                    try {
                        DirectoryNode dn = (DirectoryNode) obj.getDirectory();
                        for (Iterator<Entry> entries = dn.getEntries(); entries.hasNext();) {
                            DocumentEntry nativeEntry = (DocumentEntry) dn.getEntry("CONTENTS");
                            byte[] data = new byte[nativeEntry.getSize()];
                            ByteArrayInputStream bao = new ByteArrayInputStream(data);
                            PDFParser pdfparser = new PDFParser(bao);
                            pdfparser.parse();
                            COSDocument cosDoc = pdfparser.getDocument();
                            PDFTextStripper pdfStripper = new PDFTextStripper();
                            PDDocument pdDoc = new PDDocument(cosDoc);
                            pdfStripper.setStartPage(1);
                            pdfStripper.setEndPage(2);
                            System.out.println("Text from the pdf " + pdfStripper.getText(pdDoc));
                        }
                    } catch (Exception e) {
                        System.out.println("Error reading " + e.getMessage());
                    } finally {
                        System.out.println("Finally ");
                    }
                } else {
                    System.out.println("nothing ");
                }
            }
            file.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Below is the output in eclipse
Acrobat reader document
Error reading Error: End-of-File, expected line
Finally
nothing
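One detail in the code above may be worth checking: data is allocated with new byte[nativeEntry.getSize()], but nothing is ever read into it, so the parser is given an array of zeros rather than the PDF bytes. That may be related to the End-of-File error. A minimal JDK-only sketch of filling a buffer from a stream follows; ReadFullyDemo and readExactly are illustrative names, and with POI the stream would have to come from the embedded entry itself.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFullyDemo {
    /** Reads exactly size bytes from the stream; throws EOFException if the stream is shorter. */
    static byte[] readExactly(InputStream in, int size) throws IOException {
        byte[] data = new byte[size];          // starts out as all zeros
        new DataInputStream(in).readFully(data); // this is the step that fills it
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {'%', 'P', 'D', 'F'};
        byte[] copy = readExactly(new ByteArrayInputStream(source), source.length);
        System.out.println(new String(copy, "US-ASCII"));
    }
}
```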
The PDFs weren't OLE 1.0 packaged but embedded in some other way; at least the extraction worked for me.
This is not a general solution, because it depends on how the embedding application names the entries. For PDFs you could of course check all DocumentNodes for the magic number "%PDF"; OLE 1.0 packaged elements need to be handled differently.
I think the real filename of the PDF is hidden somewhere in the \1Ole or CompObj entries, but for the example, and apparently for your use case, it isn't necessary to determine it.
import java.io.*;
import java.net.URL;
import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.util.IOUtils;

public class EmbeddedPdfInExcel {
    public static void main(String[] args) throws Exception {
        NPOIFSFileSystem fs = new NPOIFSFileSystem(new URL("http://jamesshaji.com/sample.xls").openStream());
        HSSFWorkbook wb = new HSSFWorkbook(fs.getRoot(), true);
        for (HSSFObjectData obj : wb.getAllEmbeddedObjects()) {
            String oleName = obj.getOLE2ClassName();
            DirectoryNode dn = (DirectoryNode) obj.getDirectory();
            if (oleName.contains("Acro") && dn.hasEntry("CONTENTS")) {
                InputStream is = dn.createDocumentInputStream("CONTENTS");
                FileOutputStream fos = new FileOutputStream(obj.getDirectory().getName() + ".pdf");
                IOUtils.copy(is, fos);
                fos.close();
                is.close();
            }
        }
        fs.close();
    }
}
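The magic-number check mentioned above could look roughly like this. It is a JDK-only sketch: PdfMagic and startsWithPdfMagic are made-up names, and the input stream is assumed to come from something like dn.createDocumentInputStream(...) for each entry you want to test.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PdfMagic {
    private static final byte[] MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the first four bytes of the stream are "%PDF". Consumes those bytes. */
    static boolean startsWithPdfMagic(InputStream in) throws IOException {
        byte[] head = new byte[MAGIC.length];
        int read = 0;
        // read() may return fewer bytes than requested, so loop until the header is complete
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // stream ended before four bytes were available
            }
            read += n;
        }
        return Arrays.equals(head, MAGIC);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWithPdfMagic(new ByteArrayInputStream(sample)));
    }
}
```

Because the check consumes the header bytes, the entry would need to be reopened (or the stream wrapped so it can be reset) before copying the whole PDF out.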

How to convert HTML to PDF using iText [duplicate]

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

public class GeneratePDF {
    public static void main(String[] args) {
        try {
            String k = "<html><body> This is my Project </body></html>";
            OutputStream file = new FileOutputStream(new File("E:\\Test.pdf"));
            Document document = new Document();
            PdfWriter.getInstance(document, file);
            document.open();
            document.add(new Paragraph(k));
            document.close();
            file.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This is my code to convert HTML to PDF. It produces a PDF, but the file contains the whole HTML markup, while I need it to show only the text: <html><body> This is my Project </body></html> gets saved to the PDF when only This is my Project should be.
You can do it with the HTMLWorker class (deprecated) like this:
import com.itextpdf.text.html.simpleparser.HTMLWorker;
//...
try {
    String k = "<html><body> This is my Project </body></html>";
    OutputStream file = new FileOutputStream(new File("C:\\Test.pdf"));
    Document document = new Document();
    PdfWriter.getInstance(document, file);
    document.open();
    HTMLWorker htmlWorker = new HTMLWorker(document);
    htmlWorker.parse(new StringReader(k));
    document.close();
    file.close();
} catch (Exception e) {
    e.printStackTrace();
}
or with XMLWorker (a separate jar that has to be downloaded), using this code:
import com.itextpdf.tool.xml.XMLWorkerHelper;
//...
try {
    String k = "<html><body> This is my Project </body></html>";
    OutputStream file = new FileOutputStream(new File("C:\\Test.pdf"));
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, file);
    document.open();
    InputStream is = new ByteArrayInputStream(k.getBytes());
    XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
    document.close();
    file.close();
} catch (Exception e) {
    e.printStackTrace();
}
These links might be helpful for the conversion:
https://code.google.com/p/flying-saucer/
https://today.java.net/pub/a/today/2007/06/26/generating-pdfs-with-flying-saucer-and-itext.html
If it is a college project, you could also try this:
http://pd4ml.com/examples.htm
It includes an example of converting HTML to PDF.

How to convert doc, docx files to PDF in Java programmatically

I am able to generate a PDF from a docx file using docx4j, but I need to convert a doc file to PDF, including images and tables.
Is there any way to convert doc to docx in Java (or doc to PDF)?
docx4j contains org.docx4j.convert.in.Doc, which uses POI to read the .doc, but it is a proof of concept, not production ready code. Last I checked, there were limits to POI's HWPF parsing of a binary .doc.
Further to mqchen's comment, you can use LibreOffice or OpenOffice to convert doc to docx. But if you are going to use LibreOffice or OpenOffice, you may as well use it to convert both .doc and .docx directly to PDF. Google 'jodconverter'.
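For reference, a rough sketch of driving LibreOffice directly from Java, without jodconverter. It assumes the soffice executable is on the PATH and uses LibreOffice's headless command-line conversion; OfficeToPdf, buildCommand, and convert are hypothetical names, not part of any library.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class OfficeToPdf {
    /** Builds the command line: soffice --headless --convert-to pdf --outdir <dir> <input>. */
    static List<String> buildCommand(File input, File outDir) {
        return Arrays.asList("soffice", "--headless", "--convert-to", "pdf",
                "--outdir", outDir.getPath(), input.getPath());
    }

    /** Runs the conversion and returns the process exit code (assumes soffice is installed). */
    static int convert(File input, File outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outDir))
                .inheritIO() // show LibreOffice's own messages on the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildCommand(new File("report.doc"), new File("out")));
    }
}
```

The same command handles both .doc and .docx input. Starting a process per file is slow, which is one reason to look at jodconverter if many documents need converting.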
Cribbing off the POI unit tests, I came up with this to extract the text from a word document:
public String getText(String document) {
    try {
        // Note: this treats the input as a zip archive and reads its first entry as the .doc
        ZipInputStream is = new ZipInputStream(new FileInputStream(document));
        try {
            is.getNextEntry();
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try {
                IOUtils.copy(is, baos);
            } finally {
                baos.close();
            }
            byte[] byteArray = baos.toByteArray();
            ByteArrayInputStream bais = new ByteArrayInputStream(byteArray);
            HWPFDocument doc = new HWPFDocument(bais);
            // declare the extractor locally and return the extracted text
            WordExtractor extractor = new WordExtractor(doc);
            return extractor.getText();
        } finally {
            is.close();
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
I do hope that points you in the right direction, if not sorts you entirely.
You can use jWordConvert for this.
jWordConvert is a Java library that can read and render Word documents
natively to convert to PDF, to convert to images, or to print the
documents automatically.
Details can be found at the following link:
http://www.qoppa.com/wordconvert/
This GitHub repository might be helpful for that:
https://github.com/guptachunky/Conversion-Work
https://github.com/guptachunky/Conversion-Work/blob/main/src/main/java/com/conversion/Conversion/Service/ConversionService.java
public void docToPdf(FileDetail fileDetail, HttpServletResponse response) {
    try {
        File docFile = converterToFile(fileDetail.getFile());
        File file = File.createTempFile("output", ".pdf");
        // try-with-resources closes both streams even if the conversion fails
        try (InputStream doc = new FileInputStream(docFile);
             OutputStream out = new FileOutputStream(file)) {
            XWPFDocument document = new XWPFDocument(doc);
            PdfOptions options = PdfOptions.create();
            PdfConverter.getInstance().convert(document, out, options);
        }
        getClaimFiles(file, response);
    } catch (IOException e) {
        response.setStatus(AppConstant.SOMETHING_WENT_WRONG);
    }
}

public void getClaimFiles(File file, HttpServletResponse response) {
    try {
        response.setContentType("application/pdf");
        response.setHeader("Content-Disposition",
                "attachment; filename=dummy.pdf");
        response.getOutputStream().write(Files.readAllBytes(file.toPath()));
    } catch (Exception e) {
        response.setStatus(AppConstant.SOMETHING_WENT_WRONG);
    }
}

html to xhtml conversion in java

How can we convert HTML to well-formed XHTML using the Http class API? If possible, please give some demonstration code. Thanks.
I just did it using Jsoup, if it works for you:
private String htmlToXhtml(final String html) {
    final Document document = Jsoup.parse(html);
    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    return document.html();
}
Some useful content where my solution came from:
Is it possible to convert HTML into XHTML with Jsoup 1.8.1?
http://developers.itextpdf.com/question/how-do-html-xml-conversion-generate-closed-tags
Have a look at J-Tidy: http://jtidy.sourceforge.net/ It usually does a quite good job cleaning up messy html and converting it to xhtml.
You can use the following method to get xhtml from html
public static String getXHTMLFromHTML(String inputFile,
        String outputFile) throws Exception {
    File file = new File(inputFile);
    // try-with-resources closes both streams, replacing the manual finally block
    try (FileOutputStream fos = new FileOutputStream(outputFile);
         InputStream is = new FileInputStream(file)) {
        Tidy tidy = new Tidy();
        tidy.setXHTML(true); // ask JTidy to emit XHTML
        tidy.parse(is, fos);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    return outputFile;
}
