Viewing .doc file with java applet - java

I have a web application. I generate an MS Word document in XML format (Word 2003 XML Document) on the server side, and I need to show this document to the user on the client side using some kind of viewer. So the question is: what libraries can I use to solve this problem? I need an API to view a Word document on the client side using Java.

You cannot reliably display a Word document in a web page using Java (or any other simple technology for that matter). There are several commercial libraries out there to render Word, but you will not find these to be easy, cheap or reliable solutions.
What you should do is the following:
(1) Open the Word engine on the server using a .NET program
(2) Convert the document to Rich Text using the Word engine
(3) Display the rich text either using the RTF Swing widget, or convert to HTML:
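import java.io.BufferedReader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.StyledDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.rtf.RTFEditorKit;
String rtf = [your document rich text];
BufferedReader input = new BufferedReader(new StringReader(rtf));
// parse the RTF into a Swing styled document
RTFEditorKit rtfKit = new RTFEditorKit();
StyledDocument doc = (StyledDocument) rtfKit.createDefaultDocument();
rtfKit.read(input, doc, 0);
input.close();
// serialize the styled document as HTML
HTMLEditorKit htmlKit = new HTMLEditorKit();
StringWriter output = new StringWriter();
htmlKit.write(output, doc, 0, doc.getLength());
String html = output.toString();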
The main risk in this approach is that the Word engine will either crash or have a memory leak. For this reason you have to have a mechanism for restarting it periodically and testing it to make sure it is functional and not hogging memory.

docx4all is a Swing-based applet that handles Word 2007 XML (i.e. not Word 2003 XML); we wrote it several years ago.
Get it from svn.
That's a possible approach for editing. If all you want is a viewer, why not convert to HTML or PDF? You can use docx4j for that. (Disclosure: "my" project).

You might have a look at Apache POI - Java API to Handle Microsoft Word Files, which is able to read all kinds of Word documents (OLE2 and OOXML formats, .doc and .docx extensions respectively).
Reading a .doc file can be as easy as:
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
public class ReadDocFile {
    public static void main(String[] args) {
        // try-with-resources closes the stream even if parsing fails
        try (FileInputStream fis = new FileInputStream("c:\\New.doc")) {
            HWPFDocument document = new HWPFDocument(fis);
            WordExtractor extractor = new WordExtractor(document);
            // one array entry per paragraph of the document
            String[] fileData = extractor.getParagraphText();
            for (String paragraph : fileData) {
                if (paragraph != null)
                    System.out.println(paragraph);
            }
        } catch (IOException e) {
            // report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }
}
You can find more at: HWPF Quick-Guide (specifically HWPF unit tests)
Note that, according to the POI site:
HWPF is still in early development.

I'd suggest looking at the OpenOffice source code and implementing that.
It's supposed to be written in Java.

Related

I can't import com.itextpdf.text.Document class

I'm building an Android app and I want to use iText to create a PDF file, but I can't use the Document class. As I have seen in tutorials, there should be an import com.itextpdf.text.Document for using the Document class. For this app, I'm using com.itextpdf:itext-pdfa:5.5.9 library. I want to create a simple PDF file with 2 paragraphs, something like this:
try{
File pdfFolder = new File(Environment.getExternalStoragePublicDirectory(
Environment.DIRECTORY_DOCUMENTS), "pdfdemo");
if (!pdfFolder.exists()) {
pdfFolder.mkdir();
}
Date date = new Date() ;
String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(date);
File myFile = new File(pdfFolder + timeStamp + ".pdf");
OutputStream output = new FileOutputStream(myFile);
Document document = new Document();
PdfAWriter.getInstance(document, output);
document.open();
document.add(new Paragraph(mSubjectEditText.getText().toString()));
document.add(new Paragraph(mBodyEditText.getText().toString()));
document.close();
}catch (Exception e) {}
Could anyone help me with this problem? What am I doing wrong?
You say:
I'm using com.itextpdf:itext-pdfa:5.5.9 library
That is wrong for two reasons:
itext-pdfa is an add-on to iText that is meant for writing or manipulating PDF/A documents. It requires the core iText library. Read about the different parts of iText on the official web site: https://developers.itextpdf.com/itext-java
You say you want to use iText on Android, but you are referring to iText for Java. iText for Java contains classes that are not allowed on Android (java.awt.*, javax.nio,...). You should use the Android port for iText, which is called iTextG: https://developers.itextpdf.com/itextg-android
It's as if you're using iText without having visited the official iText web site. How is that even possible?
Just open your app-level Gradle file and add the following line to your dependencies:
implementation 'com.itextpdf:itext-pdfa:5.5.9'
It works for me

Apache POI Replace String in Word

I've recently been working on an automated system to make and print out letters to post. The system works as follows:
I create a file, with all the information in it, and replace some things with %... placeholders. For example, %name, %date, etc.
When I run the application, I can select a name from the list, and it automatically loads the document, replaces all the placeholders with information supplied by a MySQL database, and prints out the document. For testing purposes, I'm just saving the document for now.
I've found some tutorials on the internet, and found some code that suited my needs. Unfortunately, this code only works for Word versions older than 2007 (.doc files). What would I change for 2007+ compatibility (.docx files)?
public static void main(String[] args){
try{
FileInputStream fis = new FileInputStream("/Users/Jasper/Desktop/document.doc");
POIFSFileSystem fs = new POIFSFileSystem(fis);
HWPFDocument doc = new HWPFDocument(fs);
Range range = doc.getRange();
range.replaceText("%name", "Jasper");
range.replaceText("%age", "17");
FileOutputStream fos = new FileOutputStream("/Users/Jasper/Desktop/document2.doc");
doc.write(fos);
fis.close();
fos.close();
}catch(Exception e){
e.printStackTrace();
}
}
Please see my answer here. I used this solution to automate the generation of documents, and it has been in production use for more than a year.
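Whichever library ends up reading the .docx text, the substitution step itself is ordinary string handling. The following is a minimal, library-independent sketch of that step only. The class name PlaceholderFill, the method fill, and the rule that a placeholder is % followed by letters, digits or underscores are assumptions made for illustration; none of them comes from POI.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apache POI.
final class PlaceholderFill {
    // A placeholder is '%' followed by word characters, e.g. %name or %date.
    private static final Pattern PLACEHOLDER = Pattern.compile("%(\\w+)");

    /** Replaces every known %key in text; unknown placeholders are left untouched. */
    static String fill(String text, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("name", "Jasper", "age", "17");
        // prints: Dear Jasper, you are 17. Ref: %id
        System.out.println(fill("Dear %name, you are %age. Ref: %id", values));
    }
}
```

Matching whole placeholder names avoids the prefix problem of chained replace calls, where replacing %name first would also corrupt a longer placeholder such as %names. In a .docx the text is stored in separate runs, so a placeholder may be split across several of them; this sketch assumes the text has already been joined into one string.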

Creating Word File With JTextPane Style Option

I want to save the contents of a JTextPane to a word file.
I don't have a problem saving but I can't currently keep some style options such as paragraph styles.
I use these libraries:
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
The code:
System.out.println("Kaydete basıldı");
String text = textPane.getText();
lblNewLabel.setText(text);
XWPFDocument document = new XWPFDocument();
XWPFParagraph paragraph = document.createParagraph();
XWPFRun run = paragraph.createRun();
run.setText(text);
try {
FileOutputStream dosyaCikis = new FileOutputStream(
"sercan.docx");
document.write(dosyaCikis);
dosyaCikis.close();
} catch (Exception e2) {
e2.printStackTrace();
}
Whether the solution uses Apache POI or something else does not matter; any help is appreciated.
This example shows how to set various style options (Apache POI):
SimpleDocument
Example code from the link:
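import org.apache.poi.xwpf.usermodel.Borders;
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.TextAlignment;
import org.apache.poi.xwpf.usermodel.UnderlinePatterns;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
XWPFDocument doc = new XWPFDocument();
// paragraph-level styling: alignment and borders
XWPFParagraph p1 = doc.createParagraph();
p1.setAlignment(ParagraphAlignment.CENTER);
p1.setBorderBottom(Borders.DOUBLE);
p1.setBorderTop(Borders.DOUBLE);
p1.setBorderRight(Borders.DOUBLE);
p1.setBorderLeft(Borders.DOUBLE);
p1.setBorderBetween(Borders.SINGLE);
p1.setVerticalAlignment(TextAlignment.TOP);
// run-level styling: bold, font, underline
XWPFRun r1 = p1.createRun();
r1.setBold(true);
r1.setText("The quick brown fox");
r1.setFontFamily("Courier");
r1.setUnderline(UnderlinePatterns.DOT_DOT_DASH);
r1.setTextPosition(100);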
Other examples (styles, images, etc.) can be found here:
Example Package
AFAIK the options for writing Word files are limited from standard Java libraries.
You probably want to use a tool that explicitly supports Word formats - the best bet is probably LibreOffice, which is Free software. The LibreOffice API supports Java and other languages.
For a fuller explanation look here:
What's a good Java API for creating Word documents?
However, that answer refers to OpenOffice; LibreOffice is a fork of it that has been more actively developed, following management issues over the years.
You could try docx_editor_kit. From the web page:
it can open docx file and reflect the content in JEditorPane (or
JTextPane). Also user can create styled content and store the content
in docx format.
Somewhat related is my docx4all, but it hasn't been updated recently, and it may be overkill for your purposes.
Both of these use docx4j (as opposed to POI).
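Whichever library writes the file, the styling first has to be read out of the pane's StyledDocument, because with the default editor kit textPane.getText() returns only the characters. Below is a minimal sketch of that reading step using only Swing's text model. The names StyledRuns, Run, collect and demo are made up for illustration, and only bold and italic are read. Mapping each Run onto a Word run (for example an XWPFRun with setBold) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Element;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

// Hypothetical helper for illustration only.
final class StyledRuns {
    /** One contiguous piece of text that shares the same character attributes. */
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;
        Run(String text, boolean bold, boolean italic) {
            this.text = text; this.bold = bold; this.italic = italic;
        }
    }

    /** Walks paragraph elements and their leaf elements, collecting styled runs. */
    static List<Run> collect(StyledDocument doc) {
        List<Run> runs = new ArrayList<>();
        Element root = doc.getDefaultRootElement();
        try {
            for (int p = 0; p < root.getElementCount(); p++) {
                Element paragraph = root.getElement(p);
                for (int i = 0; i < paragraph.getElementCount(); i++) {
                    Element leaf = paragraph.getElement(i);
                    int start = leaf.getStartOffset();
                    // the last leaf covers an implied newline past getLength()
                    int end = Math.min(leaf.getEndOffset(), doc.getLength());
                    if (start >= end) continue;
                    AttributeSet a = leaf.getAttributes();
                    runs.add(new Run(doc.getText(start, end - start),
                            StyleConstants.isBold(a), StyleConstants.isItalic(a)));
                }
            }
        } catch (BadLocationException e) {
            throw new IllegalStateException(e); // offsets come from the document itself
        }
        return runs;
    }

    /** Builds a tiny demo document: "Hello " plain followed by "world" in bold. */
    static StyledDocument demo() {
        StyledDocument doc = new DefaultStyledDocument();
        SimpleAttributeSet bold = new SimpleAttributeSet();
        StyleConstants.setBold(bold, true);
        try {
            doc.insertString(0, "Hello ", new SimpleAttributeSet());
            doc.insertString(doc.getLength(), "world", bold);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    public static void main(String[] args) {
        for (Run r : collect(demo())) {
            System.out.println("[" + r.text + "] bold=" + r.bold + " italic=" + r.italic);
        }
    }
}
```

With a real pane, collect(textPane.getStyledDocument()) would be the entry point. Paragraph-level attributes such as alignment live on the paragraph elements and could be read the same way via StyleConstants.getAlignment.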

Error in read .doc and .docx file's content

I want to read .txt, .doc and .docx files and print their contents. When I run the code below, some .doc and .txt files are read, but many files cannot be read.
import java.io.File;
import javax.swing.*;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
public class FindYourDocx
{
public static void main(String[] args)
{
String text = "";
int read, N = 1024 * 1024;
char[] buffer = new char[N];
try {
JFileChooser openFile=new JFileChooser();
openFile.setCurrentDirectory(new File("."));
openFile.showOpenDialog(null);
File f1=openFile.getSelectedFile();
String file1=f1.toString();
File f =new File(file1);
JOptionPane.showMessageDialog(null,f);
FileReader fr = new FileReader(f);
BufferedReader br = new BufferedReader(fr);
while(true) {
read = br.read(buffer, 0, N);
text += new String(buffer, 0, read);
System.out.println("Follows"+text+" ");
if(read < N) {
break;
}
System.out.println("Follows"+text+" "); }
} catch(Exception ex) {
ex.printStackTrace();
}
}}
When I execute the above code, for some files I get weird output like this:
http://i.stack.imgur.com/RwNWM.jpg
Could someone please help me solve this issue?
To read .docx I came across something like XWPFDocument using apacheio... what is this?
First of all you should think about your problem: what do the different file types look like as files, what is their structure, what is the content you would like to print, and what does "printing" mean at all? What you are doing is reading files, treating them as text and printing them to STDOUT. Is that what "printing" means in your case? I interpret "printing" as being able to send content to a printer and get some paper.
Another hint: Doc and Docx are binary files, which contain "printable" text "somewhere". You can't just read the files and do something with the data. You need to know what those file formats look like, where the content is, etc. Java can't do that out of the box; you need additional libraries to parse those file formats and do something with them.
There are many tutorials and questions around formats like docx:
How to read docx file content in java api using poi jar
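To make the point about file structure concrete, here is a small sketch that looks at the first bytes of a file instead of treating it as text. It relies on two well-known signatures: legacy .doc files are stored in an OLE2 compound file, and .docx files are ZIP archives. The class name WordFileSniffer and the labels it returns are made up for this example, and a ZIP signature alone does not prove that a file is a .docx.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class WordFileSniffer {
    // First 8 bytes of an OLE2 compound file, the container used by legacy .doc
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // First 4 bytes of a ZIP local file header, the container used by .docx
    private static final byte[] ZIP = { 'P', 'K', 0x03, 0x04 };

    /** Classifies a file by its leading bytes: "OLE2", "ZIP" or "OTHER". */
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2";
        if (startsWith(head, ZIP)) return "ZIP";
        return "OTHER";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            // readNBytes(int) requires Java 11 or newer
            System.out.println(sniff(in.readNBytes(8)));
        }
    }
}
```

Only files reported as OTHER are candidates for a plain FileReader; the other two kinds need a parsing library.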
To read .docx I came across something like XWPFDocument using apacheio... what is this?
You mean Apache POI. To find out more, check the website. In brief, both Apache POI and docx4j (which I note you have tagged) are Java libraries aimed at reading, manipulating, and writing Microsoft Office files.
'doc' files are Microsoft proprietary binary files. If you try to read them in and display them using the Java IO API alone, all you will see is a representation of the binary data. It won't be useful to you. You need to use an API specifically for loading up and traversing Word files, which is where Apache POI or docx4j come in.
'docx' files are a newer XML-based Microsoft Office format. A docx file is essentially a zipped folder containing the various assets that make up a Word file.
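The "zipped folder" structure can be seen with nothing but the JDK. The sketch below lists the entries of such a package; the class name DocxEntries and the method list are made up for this example. In a typical .docx the main body is stored in an entry named word/document.xml.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration only.
final class DocxEntries {
    /** Returns the names of all entries in a ZIP-packaged stream such as a .docx. */
    static List<String> list(InputStream in) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // pass the path of a .docx file as the first argument
        try (InputStream in = new FileInputStream(args[0])) {
            list(in).forEach(System.out::println);
        }
    }
}
```

Listing the entries only shows the packaging. Turning the XML inside them into paragraphs, styles and images is the job of the libraries mentioned here.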
As I said, in order to read a Word file properly, you will need to use one of the libraries mentioned. Both the Apache and docx4j websites contain plenty of example code to get you started opening and traversing Word documents (note that POI can work with the older .doc format, whereas docx4j is only for .docx files).
http://www.docx4java.org
http://poi.apache.org

Java library for reading Word documents

Is there an open-source Java library for reading Word documents (both .docx and the older .doc format)?
Read-only access is sufficient; I do not need to modify the Word documents using Java. However, I would like to have access to images and style information.
EDIT
I've checked out Apache POI, but it doesn't look like it is being actively maintained. See http://poi.apache.org/hwpf/index.html:
At the moment we unfortunately do not have someone taking care for HWPF and fostering its development.
Apache POI HWPF for .doc and XWPF for .docx files
There is an apache project that does this: http://poi.apache.org//
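import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.xmlbeans.XmlException;
public class XParseTest
{
    public static void main(String[] args) throws XmlException, OpenXML4JException, IOException
    {
        File file = new File("e:\\testing\\new.docx");
        FileInputStream fs = new FileInputStream(file);
        // open the .docx as an OPC (zip) package and extract its plain text
        OPCPackage d = OPCPackage.open(fs);
        XWPFWordExtractor xw = new XWPFWordExtractor(d);
        System.out.println(xw.getText());
    }
}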
This will parse a .docx file and print its text.
