JAVA Read sample doc file, fill with data and generate PDF - java

I am triing to make an automatization program in JAVA.
I have a sample doc file. I need to fill the blank parts or the <> "signed" parts from database,
than create pdf files.
I tried to read the word :
import java.io.*;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
public class ReadDocFile {
public static void main(String[] args) {
File file = null;
WordExtractor extractor = null ;
try {
file = new File("c:\\New.doc");
FileInputStream fis=new FileInputStream(file.getAbsolutePath());
HWPFDocument document=new HWPFDocument(fis);
extractor = new WordExtractor(document);
String [] fileData = extractor.getParagraphText();
for(int i=0;i<fileData.length;i++){
if(fileData[i] != null)
System.out.println(fileData[i]);
}
}
catch(Exception exep){}
}
}
but this attemption is bad in many way cause i only need to write some of the parts, and this method make a single test from the doc.
So can you advice me some api that write in a word doc eg: after Name : or in the 5 row write this:
And when it finish with the word it should generate a pdf and do it again ...
I am looking a solution wich i found xssfworkbook with some extra function ( generate pdf of the doc ).
Or read the sample pdf and fill with datas and save to a new pdf.
Thx

Use Itext (http://sourceforge.net/projects/itext/)
and Apache POI Project http://poi.apache.org/index.html
A sample code :
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.hwpf.extractor.WordExtractor;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
public static void main(String[] args) {
String pdfPath = "C:/";
String pdfDocPath = null;
try {
InputStream is = new BufferedInputStream(new FileInputStream("C:/Test.doc"));
WordExtractor wd = new WordExtractor(is);
String text = wd.getText();
/* FOR DOCX
// IMPORT
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
// CODE
XWPFDocument hdoc = new XWPFDocument(is);
extractor = new XWPFWordExtractor(hdoc);
String text = extractor.getText();
*/
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(pdfPath + "viewDoc.pdf"));
document.open();
document.add(new Paragraph(text));
document.close();
pdfDocPath = pdfPath + "viewDoc.pdf";
System.out.println("Pdf document path is" + pdfDocPath);
}
catch (FileNotFoundException e1) {
System.out.println("File does not exist.");
}
catch (IOException ioe) {
System.out.println("IO Exception");
}
catch (DocumentException e) {
e.printStackTrace();
}
}

Related

Cannot bring all the element from html to docx that have same tag with java

I have a project converting from html to docx with java, in the html document I have 2 paragraph with 2 header as a title, but when converting both of them to docx format, just one paragraph that successfully converted, but the other paragraph doesn't converted even they have same tag. Look the image below
And the code look like this
import java.io.File;
import java.io.FileOutputStream;
import java.util.List;
import java.util.Set;
import static org.apache.poi.hslf.model.textproperties.TextPropCollection.TextPropType.paragraph;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.VerticalAlign;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.zwobble.mammoth.DocumentConverter;
import org.zwobble.mammoth.Result;
/**
*
* #author Alwan
*/
public class TestWord {
/**
* #param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
try
{
File file = new File("src/test/TEST.docx");
DocumentConverter converter = new DocumentConverter();
Result<String> result = converter.extractRawText(file);
String html = result.getValue(); // The generated HTML
Set<String> warnings = result.getWarnings(); // Any warnings during conversion
String[] part = html.split("<p>");
String[] part2 = html.split("<h1>");
FileOutputStream out = new FileOutputStream(new File("testformat.docx"));
XWPFDocument doc = new XWPFDocument();
XWPFParagraph paragraph = doc.createParagraph();
XWPFRun paragraphOneRunOne = paragraph.createRun();
XWPFRun paragraphOneRunThree = paragraph.createRun();
for (int i = 0; i < html.length(); i++)
{
if (i % 2 != 0)
{
paragraphOneRunOne.setBold(true);
paragraphOneRunOne.setItalic(true);
paragraphOneRunOne.setText(part[i].trim());
paragraphOneRunOne.addBreak();
paragraphOneRunThree.setStrike(true);
paragraphOneRunThree.setFontSize(20);
paragraphOneRunThree.setSubscript(VerticalAlign.SUBSCRIPT);
paragraphOneRunThree.setText(part2[i].trim());
System.out.println(part2[i].trim());
System.out.println(part[i].trim());
doc.write(out);
out.close();
}
System.out.println("testformat.docx written successully");
}
System.out.println("Success");
} catch(Exception e) {
e.printStackTrace();
}
}
}
The question is, how to bring all the paragraph from html into the docx format when its have a same tag? Thank you for your attention before. Sorry for my bad english

Error in parsing the HTML using XMLworkerHelper

I am getting "Invalid nested tag font found, expected closing tag table" when trying to convert one HTML to PDF using iText. Following is the program I am using.
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.StringReader;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Font;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.Font.FontFamily;
import com.itextpdf.text.pdf.ColumnText;
import com.itextpdf.text.pdf.GrayColor;
import com.itextpdf.text.pdf.PdfPageEventHelper;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
public class HTML2PDF {
class Watermark extends PdfPageEventHelper{
//font for watermarking text
Font font = new Font(FontFamily.HELVETICA, 52, Font.ITALIC, new GrayColor(.75f));
public void onEndPage(PdfWriter writer, Document document){
ColumnText.showTextAligned(writer.getDirectContentUnder(),
Element.ALIGN_CENTER,
new Phrase("Watermarking Text", font),
297.5f, 421,-45);//-45 specifies the angle of Watermarking Text
}
}
public void convertHTML2PDF(){
//Create objects of Document and specify the Page size of a PDF
Document document = new Document(PageSize.A4);
try{
PdfWriter pdfWriter;
//get the instance of class PdfWriter with the document objects and output path
pdfWriter = PdfWriter.getInstance(document, new FileOutputStream("/opt/remedy/html2any-cmd-linux/bin/test.pdf"));
//setting the Watermarking in the PageEvent of PdfWriter
pdfWriter.setPageEvent(new Watermark());
document.open();
String htmlContent = "";
BufferedReader in = new BufferedReader(new FileReader("/opt/remedy/html2any-cmd-linux/bin/WO00000001004641434460592596_1.html"));
String temp;
//read the html files content and stores it in a String variable
while((temp = in.readLine())!=null){
htmlContent += temp;
}
in.close();
XMLWorkerHelper xmlWorker = XMLWorkerHelper.getInstance();
// converts the html into a PDF
xmlWorker.parseXHtml(pdfWriter, document, new StringReader(htmlContent));
document.close();
}
catch(Exception e)
{
e.printStackTrace();
}
}
//main method
public static void main(String[] args){
new HTML2PDF().convertHTML2PDF();
}
}
There are some tags in html which do not need to be closed. Could you suggest how this type of html can be parsed using XMLWorker?

how to add image and text in pdf for save html file

import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.html.simpleparser.HTMLWorker;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.codec.Base64.OutputStream;
import java.io.FileOutputStream;
import java.io.StringReader;
import java.net.URL;
public class myclass {
public static void main(String[] args) {
String result = "<html><body><div>(i) the recognised association shall have the approval of the Forward Markets Commission established under the Forward Contracts (Regulation) Act, 1952 (74 of 1952) in respect of trading in derivatives and shall function in accordance with the guidelines or conditions laid down by the Forward Markets Commission; </div> <body> </html>";
Document document = new Document();
OutputStream file = null;
try {
PdfWriter.getInstance(document, new FileOutputStream(
"E://Image.pdf"));
document.open();
PdfWriter.getInstance(document, file);
document.open();
#SuppressWarnings("deprecation")
HTMLWorker htmlWorker = new HTMLWorker(document);
htmlWorker.parse(new StringReader(result));
String imageUrl = "http://www.taxmann.com/emailer/demo/mobileAapp/newAppDesign.jpg";
Image image2 = Image.getInstance(new URL(imageUrl));
document.add(image2);
document.close();
file.flush();
document.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
I am trying to save image and text in pdf file. When we set Either text or image then it's working fine, simultaneously am not able to save image and text Both in pdf. How will I save image and text both in Pdf? I am Using iText.
May the problem is Wrong import for Outputstream and file is Always null . You never assigned any O/P strream.
Try this
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.net.URL;
import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.html.simpleparser.HTMLWorker;
import com.itextpdf.text.pdf.PdfWriter;
public class ItextExample {
#SuppressWarnings("deprecation")
public static void main(String[] args) {
String result = "<html><body><div>(i) the recognised association shall have the approval of the Forward Markets Commission established under the Forward Contracts (Regulation) Act, 1952 (74 of 1952) in respect of trading in derivatives and shall function in accordance with the guidelines or conditions laid down by the Forward Markets Commission; </div> <body> </html>";
Document document = new Document();
OutputStream file = null;
try {
file = new FileOutputStream("E://Image1.pdf");
PdfWriter.getInstance(document,file);
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
htmlWorker.parse(new StringReader(result));
String imageUrl = "http://www.taxmann.com/emailer/demo/mobileAapp/newAppDesign.jpg";
Image image2 = Image.getInstance(new URL(imageUrl));
document.add(image2);
} catch (Exception e) {
e.printStackTrace();
}finally {
try {
document.close();
file.flush();
}catch(Exception e) {
e.printStackTrace();
}
}
}
}

How to hide the data between the two tags(<hidden> .. </hidden>) in Word document

I am able to Read the input document using Apache POI and also able to find the data between the tags(What to be hidden) but the problem is i'm unable to write the data in the output file. How can i do the same to write the data and hide it in the output generated file..
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Range;
public class Hidden {
public static void main(String args[]) throws Exception
{
File file = new File("D://me1.doc");
FileInputStream fin = new FileInputStream(file);
FileOutputStream fout = new FileOutputStream("D://Test.doc");
HWPFDocument doc = new HWPFDocument(fin);
Range range = doc.getRange();
WordExtractor extractor = new WordExtractor(doc);
String para[] = extractor.getParagraphText();
String output="";
String hidden="";
for (String p : para) {
String[] w = p.split("[<\\>]");
for(int k=0 ;k<w.length;k++){
if(w[k]!=null && !"".equalsIgnoreCase(w[k])){
if("hidden".equalsIgnoreCase(w[k])){
k++;
CharacterRun run = range.getCharacterRun(k);
hidden= w[k];
k++;
System.out.println(hidden);
run.setVanished(true);
doc.write(fout);
}else{
}
}
}
}
fout.close();
fin.close();
}
}

How to render PNG image from DFX file by using kabeja package?

I'm new to this kabeja package so please can some one provide code example or reading material to render PNG from DXF file using Java?
This is the sample code that generate PNG image from DXF file.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.HashMap;
import org.kabeja.dxf.DXFDocument;
import org.kabeja.parser.*;
import org.kabeja.parser.ParserBuilder;
import org.kabeja.svg.SVGGenerator;
import org.kabeja.xml.*;
public class MyClass {
public static void main(String[] args) {
MyClas x=new MyClas();
x.parseFile("C:\\Users\\Space\\Desktop\\test2.dxf");
}
public void parseFile(String sourceFile) {
try {
FileOutputStream o=new FileOutputStream("C:\\Users\\Space\\Desktop\\test2.png");
InputStream in = new FileInputStream(sourceFile);//your stream from upload or somewhere
Parser dxfParser = ParserBuilder.createDefaultParser();
dxfParser.parse(in, "");
DXFDocument doc = dxfParser.getDocument();
SVGGenerator generator = new SVGGenerator();
//org.xml.sax.InputSource out = SAXPNGSerializer;
SAXSerializer out = new org.kabeja.batik.tools.SAXPNGSerializer();
out.setOutput(o);
generator.generate(doc,out,new HashMap());
} catch (ParseException e) {
e.printStackTrace();
} catch (Exception ioe) {
ioe.printStackTrace();
}
}
}
Hope you get what you required :)

Categories