how to create an odt file programmatically with java? - java

How can I create an odt (LibreOffice/OpenOffice Writer) file with Java programmatically? A "hello world" example will be sufficient. I looked at the OpenOffice website but the documentation wasn't clear.

Take a look at ODFDOM - the OpenDocument API
ODFDOM is a free OpenDocument Format
(ODF) library. Its purpose is to
provide an easy common way to create,
access and manipulate ODF files,
without requiring detailed knowledge
of the ODF specification. It is
designed to provide the ODF developer
community with an easy lightwork
programming API portable to any
object-oriented language.
The current reference implementation
is written in Java.
// Create a text document from a standard template (empty documents within the JAR)
OdfTextDocument odt = OdfTextDocument.newTextDocument();
// Append text to the end of the document.
odt.addText("This is my very first ODF test");
// Save document
odt.save("MyFilename.odt");
later
As of this writing (2016-02), we are told that these classes are deprecated... big time, and the OdfTextDocument API documentation tells you:
As of release 0.8.8, replaced by org.odftoolkit.simple.TextDocument in
Simple API.
This means you still include the same active .jar file in your project, simple-odf-0.8.1-incubating-jar-with-dependencies.jar, but you want to be unpacking the following .jar to get the documentation: simple-odf-0.8.1-incubating-javadoc.jar, rather than odfdom-java-0.8.10-incubating-javadoc.jar.
Incidentally, the documentation link downloads a bunch of jar files inside a .zip which says "0.6.1"... but most of the stuff inside appears to be more like 0.8.1. I have no idea why they say "as of 0.8.8" in the documentation for the "deprecated" classes: just about everything is already marked deprecated.
The equivalent simple code to the above is then:
odt_doc = org.odftoolkit.simple.TextDocument.newTextDocument()
para = odt_doc.getParagraphByIndex( 0, False )
para.appendTextContent( 'stuff and nonsense' )
odt_doc.save( 'mySpankingNewFile.odt' )
PS am using Jython, but the Java should be obvious.

I have not tried it, but using JOpenDocument may be an option. (It seems to be a pure Java library to generate OpenDocument files.)

A complement of previously given solutions would be JODReports, which allows creating office documents and reports in ODT format (from templates, composed using the LibreOffice/OpenOffice.org Writer word processor).
DocumentTemplateFactory templateFactory = new DocumentTemplateFactory();
DocumentTemplate template = templateFactory .getTemplate(new File("template.odt"));
Map data = new HashMap();
data.put("title", "Title of my doc");
data.put("picture", new RenderedImageSource(ImageIO.read(new File("/tmp/lena.png"))));
data.put("answer", "42");
//...
template.createDocument(data, new FileOutputStream("output.odt"));
Optionally the documents can then be converted to PDF, Word, RTF, etc. with JODConverter.
Edit/update
Here you can find a sample project using JODReports (with non-trivial formatting cases).

I have written a jruby DSL for programmatically manipulating ODF documents.
https://github.com/noah/ocelot
It's not strictly java, but it aims to be much simpler to use than the ODFDOM.
Creating a hello world document is as easy as:
% cat examples/hello.rb
include OCELOT
Text::create "hello" do
paragraph "Hello, world!"
end
There are a few more examples (including a spreadsheet example or two) here.

I have been searching for an answer about this question for myself. I am working on a project for generating documents with different formats and I was in a bad need for library to generate ODT files.
I finally can say the that ODFToolkit with the latest version of the simple-odf library is the answer for generating text documents.
You can find the the official page here :
Apache ODF Toolkit(Incubating) - Simple API
Here is a page to download version 0.8.1 (the latest version of Simple API) as I didn't find the latest version at the official page, only version 0.6.1
And here you can find Apache ODF Toolkit (incubating) cookbook

You can try using JasperReports to generate your reports, then export it to ODS. The nice thing about this approach is
you get broad support for all JasperReports output formats, e.g. PDF, XLS, HTML, etc.
Jasper Studio makes it easy to design your reports

The ODF Toolkit project (code hosted at Github) is the new home of the former ODFDOM project, which was until 2018-11-27 a Apache Incubator project.

the solution may be JODF Java API Independentsoft company.
For example, if we want to create an Open Document file using this Java API we could do the following:
import com.independentsoft.office.odf.Paragraph;
import com.independentsoft.office.odf.TextDocument;
public class Example {
public static void main(String[] args)
{
try
{
TextDocument doc = new TextDocument();
Paragraph p1 = new Paragraph();
p1.add("Hello World");
doc.getBody().add(p1);
doc.save("c:\\test\\output.odt", true);
}
catch (Exception e)
{
System.out.println(e.getMessage());
e.printStackTrace();
}
}
}
There are also .NET solutions for this API.

Related

How to add custom XML storage part to Word doc - preferrably with docx4j

I'm trying to populate a Word content control with XML data using docx4j (version 3.2.1). I'm evaluating this in order to use it for invoice generation. The documents we want to generate are not very complicated so this looks like a good approach to me.
I have created the content control through Word 2010 dev tools. This is how I try to inject the XML into the docx (taken from this example):
WordprocessingMLPackage wordMLPackage = Docx4J.load(new File(input_DOCX));
FileInputStream xmlStream = new FileInputStream(new File(input_XML));
Docx4J.bind(wordMLPackage, xmlStream, Docx4J.FLAG_BIND_INSERT_XML & Docx4J.FLAG_BIND_BIND_XML);
I get the following exception:
org.docx4j.openpackaging.exceptions.Docx4JException: Couldn't find CustomXmlDataStoragePart! exiting..
at org.docx4j.Docx4J.bind(Docx4J.java:300)
at org.docx4j.Docx4J.bind(Docx4J.java:271)
How can I add the CustomXmlDataStoragePart with docx4j, if it doesn't exist yet? Or should/can I do this in Word directly?
Note: I decided to prepare templates in Word directly, because later on these templates must be edited by non-technical users and I don't want to burden them with extra tools, if possible.
You say you "created the content control through Word 2010 dev tools". Unless you mean the content control toolkit, you need to use that or better, either of the OpenDoPE Word addins. Not both.
These tools add a custom xml part into the docx, and allow you to associate it with your content controls via XPath data bindings.
Then, when at runtime you invoke Docx4J.bind, docx4j finds that existing custom xml part, and replaces it with the xml file you provide which contains your runtime data.

OpenSearch Compatible Response From Java

Here is an example OpenSearch description file:
http://webcat.hud.ac.uk/OpenSearch.xml
When I send a query as like that:
http://webcat.hud.ac.uk/perl/opensearch.pl?keyword=new&startpage=1&itemsperpage=20
I get a response which is compatible to OpenSearch. How can I implement OpenSearch specification at Java or is there any library for it or is there any xsd that I can generate a Java code from it?
According to the OpenSearch website's section on "Reading OpenSearch", there is a Java library which can do this, called Apache Abdera. I have not used it myself, so I cannot comment on its quality, but it should be worth looking into - apparently it can both interpret AND create OpenSearch responses, so this may be exactly what you're looking for.
Alternatively, there are quite a few very good XML parsers for Java (see this question for some suggestions), so writing your own parser for a simple OpenSearch XML file shouldn't be too difficult, since the full specification is available online.
As for an XSD, I can't find an "official" one, however there are XSD's for OpenSearch in various open source projects which have been tested and you can use, such as this one, which is part of a project called "OpenSearch Validator."
Another potential choice for writing OpenSearch results is the very mature and widely-used Apache Lucene library, which is in the list of software "writing OpenSearch results" in the previously linked OpenSearch website.
ROME also supports OpenSearch with its ROME Module A9 OpenSearch.
Sample usage:
SyndFeed feed = new SyndFeedImpl();
feed.setFeedType(feedType);
// Add the opensearch module, you would get information like totalResults from the
// return results of your search
List mods = feed.getModules();
OpenSearchModule osm = new OpenSearchModuleImpl();
osm.setItemsPerPage(1);
osm.setStartIndex(1);
osm.setTotalResults(1024);
osm.setItemsPerPage(50);
OSQuery query = new OSQuery();
query.setRole("superset");
query.setSearchTerms("Java Syndication");
query.setStartPage(1);
osm.addQuery(query);
Link link = new Link();
link.setHref("http://www.bargainstriker.com/opensearch-description.xml");
link.setType("application/opensearchdescription+xml");
osm.setLink(link);
mods.add(osm);
feed.setModules(mods);
// end add module

Validating an XML NCName in Java

I'm getting some values from Java annotations in an annotation processor to generate metadata. Some of these values are supposed to indicate XML element or attribute names. I'd like to validate the input to find out if the provided values are actually legal NCNames according to the XML specification. Only the local name is important in this case, the namespace URI doesn't play a part here.
Is there some simple way of finding out if a string is a legal XML element or attribute name? Preferably I'd use some XML API that is readily available in Java SE. One of the reasons I'm doing this stuff in the first place is to cut back on dependencies. I'm using JDK 7 so I have access to the most up-to-date classes/methods.
So far, browsing through content handler classes and SAX/DOM stuff hasn't yielded any result.
If you're prepared to have Saxon on your class path you can do
new Name10Checker().isValidNCName(s);
I can't see anything simpler in the public JDK interface.
didn't find anything straightforward in any of the jdk 6 APIs (don't know about jdk 7). a quick but possibly "hackish" way to check would be to convert it to an xml doc and see if it parses:
String name = ...;
if(name.contains(">")) {
return false;
}
String xmlDoc = "<" + name + "/>";
DocumentBuilder db = ...;
db.parse(new InputSource(new StringReader(xmlDoc)));
I ran into the same problem and found lots of implementations in foss libraries, and even an old implementation in a Java class library, which has been removed ages ago... So here's a few options to choose from:
Java Class Library: XMLUtils.isValidNCName(String ncName) (note: removed in 2004)
Apache Axis: NCName.isValid(String stValue)
Saxonica: NameChecker.isValidNCName(CharSequence ncName)
OWL API: XMLUtils.isNCName(java.lang.CharSequence s)
Validator.nu HTML Parser: NCName.isNCName(java.lang.String str)
So, if you're using one of these libraries anyway, you're fine.
As I am not, I'll go with a copy of the XMLUtils from the OWL API, which has no external dependencies, is available under non-restrictive licenses (LGPL and Apache 2.0) and consists of nice and clean code.

Document processing in Liferay portal

I've been using Liferay a lot for past 2 years, but I have never needed any extensive document management.
Now I have a portlet where users upload documents (MS office OLE2 documents, ODS documents, PDF etc.) and I have to persist them with all metadata available.
I know how would I do that without using Liferay, I'd probably use Apache solr with Apache Tika (UpdateRichDocuments and ExtractingRequestHandler) or Apache Jackrabbit that are using Apache Tika under the hood (org.apache.jackrabbit.extractor.*).
The problem is, that If I look at the trunk of Liferay, there are some key classes :
Hooks (JCRHook, FileSystemHook, CMISHook, s3Hook) that are employed from within DLLocalServiceImpl kinda directly
Another alternative is using DLAppLocalServiceImpl that is employing DLRepositoryLocalServiceImpl and the files are persisted into repository also via Hooks, but a lot of additional stuff is done in there.
There is not jackrabbit-text-extractors library in Liferay, so I suppose If I wanted metadata to be extracted from PDF, DOCs, ODS documents, I would have very hard times... because the DL service layer doesn't accept additional properties
I think I'd have to avoid using DL services and JCR hook and access Jackrabbit directly... But I would loose the compatibility and possibility migrate my repository etc.
Could please anybody collaborate on this one please ? Thank you
SOLR for indexing, Jackrabbit for document storage. Managing Liferay Document Library in code is fairly easy, just look at the DL*LocalServiceUtil classes, namely DLFolderLocalServiceUtil and DLFileLocalServiceUtil. By default Liferay just creates a matching folder/file structure on the hard drive (with names changed) so you'd only need to write code or use Jackrabbit if you wanted more than this since Liferay allows up/download and viewing out of the box via the control panel and various portlets.
I haven't used JackRabbit with Liferay but once configured everything should be managed under the covers and you shouldn't need to worry about it on the front end.
When you say "with all metadata available" I'm not sure what is retained, but aside from renaming the file so that it can be tracked there shouldn't be any other changes. It should be quick and easy to test by uploading a file of each type and checking the entries in the LIFERAY/data/document_library directory and subdirectories. Again this would be different if Jackrabbit is used.
those two services DLLocalServiceImpl and DLAppLocalServiceImpl both are and will, I suppose, important. The former one if for direct access to repository. Notice that when adding a file via this service you need to persist corresponding DlFileEntry into database and than reference that addFile(...., fileEntryId, ...).
The latter service is doing additional stuff for you, mainly asset management and workflow.
Regarding your use case, I would avoid using document library, because no metadata can go down into the JCR repository. Actually only metadata/custom properties that you could store would be custom properties AKA Expando feature of Liferay portal.
Best way for you seem to be implement your own jackrabbit hook to store data into repository and let Liferay document library use that repository.
Think Edgar is correct. If you check the current trunk via http://svn.liferay.com/repos/public/portal/trunk/portal-service/src/com/liferay/documentlibrary/service/DLLocalService.java (login as guest and no password), you will no longer find the class DLFolderLocalServiceUtil. We are using the existing DLFolderLocalServiceUtil class as well. Thanks for the heads up. We will refactor our code so when 6.1 comes around we can still use the DocumentLibrary services.
You need to always use DLAppServiceUtil ( as Liferay instructs specifically ). Here is my working code that saves a file to the CMS:
public static void saveFileToCMS(ActionRequest aReq, long groupId, String fileName, File filenameWithPath) {
try {
ServiceContext serviceContext = ServiceContextFactory.getInstance(
Group.class.getName(), aReq);
// prevents duplicate entries based on unique title name
Random rand = new Random();
Integer suffix = new Integer(rand.nextInt(10000));
DLAppServiceUtil.addFileEntry(groupId, 0, fileName, "application/vnd.ms-excel",
fileName + suffix.toString(), "description goes here", "changelogname",
filenameWithPath, serviceContext);
//log.info("Successfully added the new file");
} catch (PortalException pe) {
log.error("Portal Exception occurred while saving file to CMS");
pe.printStackTrace();
} catch (SystemException e) {
log.error("System Exception occurred while saving file to CMS");
e.printStackTrace();
}
}

How do I (should I?) use Apache POI HWPFDocument?

I'm thinking about including the Apache POI into my application. Main goal is to output RTF document, but DOC would be nice, too. But the documentation is not very detailed about writing a HWPFDocument and everything I found on the web isn't helpful at all.
I can read DOC files, that's working without any problem. But I really can't see how I write a document. Maybe someone can give me a short code example?
Thanks a lot!
If you want to do RTF, These are text files and they are support in all versions of Word.
you can use itext for simple stuff
http://itextdocs.lowagie.com/tutorial/rtf/index.php
ro
you can export them the hard way
//-- save as example.doc -------------
{
\rtf1
\ansi
\ansicpg1252
\deff0
\deflang1033
{\fonttbl
{\f0
\fswiss
\fcharset0 Arial;
}
}
{
\*
\generator Msftedit 5.41.21.2500;
}
\viewkind4
\uc1
\pard
\f0
\fs20
Hello World
\par
}
Well,
It has been a long time since the last time I used POI. I read that the HWPFDocument is now orphaned (read on apache POI website). I would recommend using the WordML specification released by Microsoft instead.
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
I have used this method before. The easiest way is to create a WordML template and just replace the values using XPATH

Categories