How do I (should I?) use Apache POI HWPFDocument? - java

I'm thinking about including Apache POI in my application. The main goal is to output RTF documents, but DOC would be nice, too. However, the documentation is not very detailed about writing an HWPFDocument, and nothing I found on the web was helpful.
I can read DOC files without any problem, but I really can't see how to write a document. Could someone give me a short code example?
Thanks a lot!

If you want to produce RTF: these are plain text files, and they are supported in all versions of Word.
You can use iText for simple stuff:
http://itextdocs.lowagie.com/tutorial/rtf/index.php
Or you can export them the hard way:
//-- save as example.doc -------------
{
\rtf1
\ansi
\ansicpg1252
\deff0
\deflang1033
{\fonttbl
{\f0
\fswiss
\fcharset0 Arial;
}
}
{
\*
\generator Msftedit 5.41.21.2500;
}
\viewkind4
\uc1
\pard
\f0
\fs20
Hello World
\par
}
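As a rough Java sketch of the hand-written route above, the following builds the same minimal RTF envelope as a string and writes it to disk. The class name HelloRtf, the method toRtf, and the output name example.rtf are illustrative assumptions, and only ASCII text is handled here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of POI or iText.
class HelloRtf {

    // Wraps plain ASCII text in a minimal RTF envelope (one Arial font, 10pt).
    static String toRtf(String asciiText) {
        // Backslash and braces are RTF control characters, so escape them.
        // The backslash must be handled first to avoid double escaping.
        String escaped = asciiText
                .replace("\\", "\\\\")
                .replace("{", "\\{")
                .replace("}", "\\}");
        return "{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033"
                + "{\\fonttbl{\\f0\\fswiss\\fcharset0 Arial;}}"
                + "\\viewkind4\\uc1\\pard\\f0\\fs20 "
                + escaped
                + "\\par}";
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("example.rtf"); // assumed output name
        Files.write(out, toRtf("Hello World").getBytes(StandardCharsets.US_ASCII));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

Word should open the resulting file whether it is named .rtf or .doc, since the content is what identifies it as RTF.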

Well,
It has been a long time since I last used POI. I read on the Apache POI website that HWPFDocument is now orphaned. I would recommend using the WordML specification released by Microsoft instead.
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
I have used this method before. The easiest way is to create a WordML template and just replace the values using XPath.
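A minimal sketch of the template idea, using only the JDK's XML APIs. The tiny TEMPLATE string is a stand-in for a real WordML file saved from Word (which contains far more markup), and the placeholder token PLACEHOLDER_NAME, the class name, and the method name are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class WordMlTemplate {

    // Simplified stand-in for a Word 2003 XML template.
    static final String TEMPLATE =
            "<w:wordDocument xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\">"
          + "<w:body><w:p><w:r><w:t>Hello PLACEHOLDER_NAME</w:t></w:r></w:p></w:body>"
          + "</w:wordDocument>";

    // Replaces the placeholder inside every text element and returns the XML.
    static String fill(String templateXml, String placeholder, String value) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(templateXml)));

            // local-name() avoids registering a namespace context for the w: prefix.
            NodeList textNodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//*[local-name()='t']", doc, XPathConstants.NODESET);
            for (int i = 0; i < textNodes.getLength(); i++) {
                String text = textNodes.item(i).getTextContent();
                if (text.contains(placeholder)) {
                    textNodes.item(i).setTextContent(text.replace(placeholder, value));
                }
            }

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not fill template", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fill(TEMPLATE, "PLACEHOLDER_NAME", "World"));
    }
}
```

Because only the text content of existing elements changes, whatever formatting the template carries around those elements stays untouched.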

Related

Replacing text in XWPFParagraph without changing format of the docx file

I am developing a font converter app that converts Unicode text to Krutidev/Shree Lipi (Marathi/Hindi) font text. The original docx file contains formatted words (color, font, text size, hyperlinks, etc.).
I want the final docx to keep the same formatting as the original after the words are converted from Unicode to the other font.
Here is my code:
try {
    fileInputStream = new FileInputStream("StartDoc.docx");
    document = new XWPFDocument(fileInputStream);
    XWPFWordExtractor extractor = new XWPFWordExtractor(document);
    List<XWPFParagraph> paragraph = document.getParagraphs();
    Converter data = new Converter();
    for (XWPFParagraph p : document.getParagraphs()) {
        for (XWPFRun r : p.getRuns()) {
            String string2 = r.getText(0);
            data.uniToShree(string2);
            r.setText(string2, 0);
        }
    }
    // Write the document to the file system
    FileOutputStream out = new FileOutputStream(new File("Output.docx"));
    document.write(out);
    out.close();
    System.out.println("Output.docx written successfully");
}
catch (IOException e) {
    System.out.println("We had an error while reading the Word Doc");
}
Thank you for the question.
I worked with POI some years ago, though with Excel workbooks, but I'll still try to help you reach the root cause of your error.
The exception itself usually carries good debugging information!
A good first step to narrow down the error is to not overwrite the message that the exception provides.
Try printing the result of e.getLocalizedMessage() or e.getMessage() and see what you get.
Printing the stack trace with printStackTrace() is also often useful to pinpoint where the error lies!
Share your findings from the above method calls so that we can help debug the issue further.
[EDIT 1:]
So it seems you are able to process the file correctly as far as the font conversion of the data is concerned, but you are not able to reproduce the formatting of the original data in the converted file.
(thus, "We had an error while reading the Word Doc", is a lie getting printed ;) )
Now, there are 2 elements to a Word document:
Content
Structure or Schema
You are able to convert the data because you are working only on the content of your doc files.
To retain the formatting of the contents, your solution needs to be aware of the formatting of the doc files as well and take care of it.
MS Word, which defines the doc files and their extension (.docx), follows a particular set of schemas that define the formatting rules. These schemas are defined in Microsoft's XML namespace packages [1].
You can obtain the XML (HTML) form of the doc file quite easily (see the steps in [1] or the code in [2]) and even apply different schemas, or possibly your own schema definitions based on those provided by Microsoft's namespaces. You can do this either programmatically, for which you need to be versed in XML, XSL, and XSLT concepts (w3schools [3] is a good starting point), although this route is no less complex than writing your own version of MS Word; or by using MS Word's built-in tools as shown in [1].
[1]. https://www.microsoftpressstore.com/articles/article.aspx?p=2231769&seqNum=4#:~:text=During%20conversion%2C%20Word%20tags%20the,you%20can%20an%20HTML%20file.
[2]. https://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/testcases/org/apache/poi/hwpf/converter/TestWordToHtmlConverter.java
[3]. https://www.w3schools.com/xml/
My answer gives only a cursory overview of how to achieve what you want; depending on your inclination and available time, use your discretion before choosing one path over the other.
Hope it helps!

unable to recognize file type

This is my first post, and I'm new to Java. I'm working on a file parser. I've tried to work out whether the input is CSV or some other file format, but it does not look like a standard format. I'm working on an Apache Camel solution (my first and last idea :( ), but maybe some of you recognize this kind of file format? Additionally, I've got an .imp file as my output.
Here is my example input:
NrDok:FS-2222/17/W
Data:12.02.2017
SposobPlatn:GOT
NazwaWystawcy:MAAKAI Gawron
AdresWystawcy:33-123 bABA
KodWystawcy:33-112
MiastoWystawcy:bABA
UlicaWystawcy:czysfa 8
NIPWystawcy:123-19-85-123
NazwaOdbiorcy:abc abc-HANDLOWO-USŁUGOWE
AdresOdbiorcy:33-123 fghd
KodOdbiorcy:33-123
MiastoOdbiorcy:Tdsfs
UlicaOdbiorcy:dfdfdA 39
NIPOdbiorcy:82334349
TelefonOdbiorcy:654-522-124
NrOdbiorcyWSieciSklepow:efdsS-sffgsA
IloscLinii:1
Linia:Nazwa{ĆWIARTKA KG}Kod{C1}Vat{5}Jm{kg.}Asortyment{dfgv}Sww{}PKWIU{10.12.10}Ilosc{3.40}Cena{n3.21}Wartosc{n11.83}IleWOpak{1}CenaSp{b0.00}
DoZaplaty:252.32
And here is my example output file:
FH 2015.07.31 2015.07.31 F04443 Gotowka
FO 812-123-45-11 P.a.b.Uc"fdad" abcd deffF UL.fdfgdfdA 12/33 33-123 afvdf
FS 779-19-06-082 badfdf S.A. ul. Wisniowa 89 60-003 Poznan
FP 00218746 CHRZAN TARTY EXTRA POLONAISE 180G SZT 32.00 2.21 8 10.39.17.0 32.00 5900138000055
Is there an easy way to convert the first file into the second file's format? Maybe you know the type of this file? In the meantime, I'm continuing my work with Apache Camel.
Thanks in advance for your time and help!
I suggest you play with https://tika.apache.org/1.1/detection.html#Mime_Magic_Detection
It's a very good library for file type recognition.
A simple example is available here: https://www.tutorialspoint.com/tika/tika_document_type_detection.htm
Your file can be read as a standard Java .properties file. This file type allows both = and : as key/value separators. However, the fact that it contains non-ISO-8859-1 characters such as the Polish Ć may prevent Java from parsing it correctly.
This line
Nazwa{ĆWIARTKA KG}Kod{C1}Vat{5}Jm{kg.}Asortyment{dfgv}Sww{}PKWIU{10.12.10}Ilosc{3.40}Cena{n3.21}Wartosc{n11.83}IleWOpak{1}CenaSp{b0.00}
seems to be some custom serialization format of an object, in the form
key1{value1}key2{value2}...
Your output file contains lots of data that is not listed in the input, which makes me think that some data is queried from external systems to build the output. You should investigate that yourself. There is no way anyone can guess the transformation from the provided input.
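The two observations above can be sketched in plain Java. This is only an illustration under assumptions: the class and method names are made up, and the sample text is a shortened copy of the input from the question. Loading through a Reader, rather than an InputStream, lets the caller choose the character encoding, which sidesteps the ISO-8859-1 limitation mentioned above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class InvoiceFileSketch {

    // Matches one key{value} pair; the value may be empty, as in Sww{}.
    private static final Pattern PAIR = Pattern.compile("([^{}]+)\\{([^{}]*)\\}");

    // Reads "key:value" lines with the standard Properties parser.
    static Properties parseHeader(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Splits a key1{value1}key2{value2}... string into an ordered map.
    static Map<String, String> parseLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(line);
        while (matcher.find()) {
            fields.put(matcher.group(1), matcher.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "NrDok:FS-2222/17/W\n"
                + "Linia:Nazwa{\u0106WIARTKA KG}Kod{C1}Vat{5}Sww{}Ilosc{3.40}\n";
        Properties header = parseHeader(sample);
        System.out.println(header.getProperty("NrDok"));
        System.out.println(parseLine(header.getProperty("Linia")));
    }
}
```

One caveat: Properties keeps a single value per key, so a file with several Linia entries would lose all but the last one; reading such a file line by line and splitting at the first colon would avoid that.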

How to save file with custom file extension in java?

Dear brothers, I hope you are all right.
I'm designing a document program. However, rather than saving files with a .text extension or using an MS Office API in Java, I want to create my own custom file format, such as a ".sad" extension, so that this sort of file can only be read by my programs. How can this be done?
Your requirement seems ambiguous. Are you looking to make a program that creates MS Office Word documents or plain text files with a custom file extension?
In the case of the former, you can't have a custom extension as MS Word documents, by definition, have a .doc / .docx extension.
However, if you are looking to create a program that produces text files then you can easily have a custom extension. Just look at this tutorial: How to create a file in Java
I already stated why this is a bad idea. Yet I have a solution for you (more like a how-not-to-do-it)
Take your plain text you want to save, convert it to bytes and apply this "highly enthusiastic encryption nobody will ever be able to break" on it:
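// Needs java.io.ByteArrayOutputStream, java.nio.charset.StandardCharsets, java.util.Random
String plainText = "yadayada";
byte[] bytesFromText = plainText.getBytes(StandardCharsets.UTF_8);
ByteArrayOutputStream encrypted = new ByteArrayOutputStream(bytesFromText.length * 2);
Random random = new Random();
for (int i = 0; i < bytesFromText.length; i++) {
    if (i % 2 == 0) {
        // Insert a random junk byte before every even-indexed real byte
        encrypted.write(random.nextInt(256));
    }
    encrypted.write(bytesFromText[i]);
}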
I leave it up to you to figure out why this is a bad idea and how to decrypt it. ;)
You can create a file with any extension.
For example,
File f = new File("confidential.sad");
Hope this will work for you :)
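Note that new File("confidential.sad") on its own only creates a path object in memory; nothing appears on disk until something is written. A minimal sketch of writing and reading back a file with a custom extension follows. The class and method names are assumptions, and a temporary file is used so the example cleans up after itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
class CustomExtensionDemo {

    // Writes the text as UTF-8 bytes; the extension is just part of the name.
    static void save(Path target, String content) {
        try {
            Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file back as UTF-8 text.
    static String load(Path source) {
        try {
            return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Saves to a temporary .sad file, reads it back, then deletes it.
    static String roundTrip(String content) {
        try {
            Path file = Files.createTempFile("confidential", ".sad");
            try {
                save(file, content);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("my document text"));
    }
}
```

The extension alone does not stop other programs from opening the file: any text editor can still read these bytes.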
Working with custom files in Java
Here is a tutorial that explains how to create your own files with a custom extension such as .doc or .sad, embed some information in them, and read that information back from the file after saving.
ZIP
Similar applications often use archives to store data. Consider MS-Word and its documents with the .docx file extension. If you change the extension of any .docx file to .zip, you will find that the document is actually a zip archive, with only a different extension.
https://www.ict.social/java/files/working-with-custom-files-in-java-zip-archive
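A sketch of the same idea with the JDK's java.util.zip classes: several named parts are packed into one archive, which can then be saved under any extension. The class name, method names, and part names (content.txt, meta.txt) are hypothetical, and this is not the code from the linked tutorial.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only.
class ZipContainerSketch {

    // Packs each map entry as one UTF-8 text file inside a zip archive.
    static byte[] pack(Map<String, String> parts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, String> part : parts.entrySet()) {
                zip.putNextEntry(new ZipEntry(part.getKey()));
                zip.write(part.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads every entry of the archive back into a name-to-text map.
    static Map<String, String> unpack(byte[] archive) {
        Map<String, String> parts = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(
                new ByteArrayInputStream(archive), StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, read);
                }
                parts.put(entry.getName(),
                        new String(content.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("content.txt", "Hello from a custom container");
        parts.put("meta.txt", "version=1");
        Path file = Files.createTempFile("document", ".sad");
        Files.write(file, pack(parts));
        System.out.println(unpack(Files.readAllBytes(file)));
        Files.delete(file);
    }
}
```

As with .docx, renaming such a file to .zip lets any archive tool open it, so this gives structure rather than secrecy.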
I have published a library that saves files and handles everything with a single line of code. You can find it here, along with its documentation:
GitHub repository
The answer to your question is then simple:
String path = FileSaver
.get()
.save(file,"file.custom");

how to create an odt file programmatically with java?

How can I create an odt (LibreOffice/OpenOffice Writer) file with Java programmatically? A "hello world" example will be sufficient. I looked at the OpenOffice website but the documentation wasn't clear.
Take a look at ODFDOM - the OpenDocument API
ODFDOM is a free OpenDocument Format (ODF) library. Its purpose is to provide an easy common way to create, access and manipulate ODF files, without requiring detailed knowledge of the ODF specification. It is designed to provide the ODF developer community with an easy lightwork programming API portable to any object-oriented language.
The current reference implementation is written in Java.
// Create a text document from a standard template (empty documents within the JAR)
OdfTextDocument odt = OdfTextDocument.newTextDocument();
// Append text to the end of the document.
odt.addText("This is my very first ODF test");
// Save document
odt.save("MyFilename.odt");
later
As of this writing (2016-02), we are told that these classes are deprecated... big time, and the OdfTextDocument API documentation tells you:
As of release 0.8.8, replaced by org.odftoolkit.simple.TextDocument in Simple API.
This means you still include the same active .jar file in your project, simple-odf-0.8.1-incubating-jar-with-dependencies.jar, but you want to be unpacking the following .jar to get the documentation: simple-odf-0.8.1-incubating-javadoc.jar, rather than odfdom-java-0.8.10-incubating-javadoc.jar.
Incidentally, the documentation link downloads a bunch of jar files inside a .zip which says "0.6.1"... but most of the stuff inside appears to be more like 0.8.1. I have no idea why they say "as of 0.8.8" in the documentation for the "deprecated" classes: just about everything is already marked deprecated.
The equivalent simple code to the above is then:
odt_doc = org.odftoolkit.simple.TextDocument.newTextDocument()
para = odt_doc.getParagraphByIndex( 0, False )
para.appendTextContent( 'stuff and nonsense' )
odt_doc.save( 'mySpankingNewFile.odt' )
PS: I am using Jython, but the Java equivalent should be obvious.
I have not tried it, but using JOpenDocument may be an option. (It seems to be a pure Java library to generate OpenDocument files.)
A complement of previously given solutions would be JODReports, which allows creating office documents and reports in ODT format (from templates, composed using the LibreOffice/OpenOffice.org Writer word processor).
DocumentTemplateFactory templateFactory = new DocumentTemplateFactory();
DocumentTemplate template = templateFactory.getTemplate(new File("template.odt"));
Map<String, Object> data = new HashMap<>();
data.put("title", "Title of my doc");
data.put("picture", new RenderedImageSource(ImageIO.read(new File("/tmp/lena.png"))));
data.put("answer", "42");
//...
template.createDocument(data, new FileOutputStream("output.odt"));
Optionally the documents can then be converted to PDF, Word, RTF, etc. with JODConverter.
Edit/update
Here you can find a sample project using JODReports (with non-trivial formatting cases).
I have written a jruby DSL for programmatically manipulating ODF documents.
https://github.com/noah/ocelot
It's not strictly java, but it aims to be much simpler to use than the ODFDOM.
Creating a hello world document is as easy as:
% cat examples/hello.rb
include OCELOT
Text::create "hello" do
paragraph "Hello, world!"
end
There are a few more examples (including a spreadsheet example or two) here.
I have been searching for an answer to this question myself. I am working on a project that generates documents in different formats, and I badly needed a library to generate ODT files.
I can finally say that ODFToolkit, with the latest version of the simple-odf library, is the answer for generating text documents.
You can find the official page here:
Apache ODF Toolkit(Incubating) - Simple API
Here is a page to download version 0.8.1 (the latest version of the Simple API), as I could not find the latest version on the official page, only version 0.6.1.
And here you can find Apache ODF Toolkit (incubating) cookbook
You can try using JasperReports to generate your reports and then export them to ODS. The nice things about this approach are:
you get broad support for all JasperReports output formats, e.g. PDF, XLS, HTML, etc.
Jasper Studio makes it easy to design your reports
The ODF Toolkit project (code hosted at GitHub) is the new home of the former ODFDOM project, which was an Apache Incubator project until 2018-11-27.
Another solution may be the JODF Java API from the company Independentsoft.
For example, to create an Open Document file using this Java API we could do the following:
import com.independentsoft.office.odf.Paragraph;
import com.independentsoft.office.odf.TextDocument;

public class Example {
    public static void main(String[] args)
    {
        try
        {
            TextDocument doc = new TextDocument();
            Paragraph p1 = new Paragraph();
            p1.add("Hello World");
            doc.getBody().add(p1);
            doc.save("c:\\test\\output.odt", true);
        }
        catch (Exception e)
        {
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
    }
}
There are also .NET solutions for this API.

How to convert UTF-8 string to RTF string in java?

My project currently needs to export reports to MS Word, and I chose the RTFTemplate engine to do it. My problem is that I first need to convert all characters to an RTF string. Can anyone who has experience with this problem give me a suggestion?
Yes.
You can use RtfDocument from iText:
new RtfDocument().filterSpecialChar(baos, sentence, true, true);
I've described it with details in my blog:
http://lechlukasz.wordpress.com/2010/02/03/rtftemplate-and-character-encoding/
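For comparison, here is a library-free sketch of what such a conversion involves; it is not the iText implementation. It escapes the three RTF control characters and writes every non-ASCII UTF-16 unit as an RTF Unicode control word followed by a `?` fallback character, which assumes the document declares \uc1. The class and method names are made up for illustration.

```java
// Hypothetical helper for illustration only; not the iText implementation.
class RtfEscaper {

    // Converts a Java string into text that can be embedded in an RTF body.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                // RTF control characters are escaped with a leading backslash.
                out.append('\\').append(c);
            } else if (c < 0x80) {
                out.append(c);
            } else {
                // The RTF Unicode control word takes a signed 16-bit decimal,
                // so values above 32767 come out negative. The trailing '?'
                // is the single fallback character shown by old readers.
                out.append("\\u").append((int) (short) c).append('?');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Zażółć {gęślą} jaźń"));
    }
}
```

Characters outside the Basic Multilingual Plane are covered as well, because each half of a surrogate pair is emitted as its own control word. Line breaks are passed through unchanged and would still need to be turned into \par where a paragraph break is wanted.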
POI is an Apache project that helps you work with MS Office formats.
