Impossible to generate an Open Document from scratch? - java

I've been searching for hours for a way to generate an OpenOffice document from scratch (a .doc would also be fine, but not .docx) with a Java API (I'm using Grails). However, it seems that every method requires a preexisting document to work correctly. I've looked on JavaRanch (http://www.coderanch.com/how-to/java/AccessingFileFormats) and nothing I've seen meets my needs.
So my question is: is it actually possible to generate a .doc or .odt file from scratch? Creating a document outside the code seems pretty ugly to me.
Cheers

First, decide which "open office" you are talking about. Either it is "Office Open XML", which is Microsoft's branding for its XML document formats, or it is "OpenOffice", whose native document format (OpenDocument) is used by the LibreOffice/OpenOffice/StarOffice suites (and about 50 others).
Also, if you are searching for tutorials on how to access the internals of a document, odds are excellent that you will find instructions that require a document to already exist (so that it can be accessed).
Perhaps searching for programmatic document creation would help, but if that doesn't land you what you need, you could note the libraries used to modify a document, and see if any of them provide APIs for "clean" document generation. Odds are good that one of them does, even if it is incomplete.
Sorry for the general answer, but it has to be general, as from the tag list, it is not even clear in which document format you intend to write.
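Neither answer shows it, but for the OpenDocument case "from scratch" is less daunting than it sounds: an .odt file is a ZIP package of XML files. Below is a minimal sketch using only the JDK (no template, no OpenOffice installation), assuming ODF 1.2 and a single unstyled paragraph; the class name `MinimalOdt` and the output file `hello.odt` are illustrative. It writes the uncompressed `mimetype` entry first, then `META-INF/manifest.xml` and `content.xml`. LibreOffice/OpenOffice should open the result, but a strict validator may expect more parts (such as `styles.xml` and `meta.xml`), and this does nothing for .doc.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Builds a bare-bones OpenDocument Text (.odt) package with the JDK only (sketch). */
class MinimalOdt {

    static final String MIMETYPE = "application/vnd.oasis.opendocument.text";

    static byte[] build(String paragraph) throws IOException {
        String manifest = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<manifest:manifest xmlns:manifest=\"urn:oasis:names:tc:opendocument:xmlns:manifest:1.0\" manifest:version=\"1.2\">\n"
            + " <manifest:file-entry manifest:full-path=\"/\" manifest:version=\"1.2\" manifest:media-type=\"" + MIMETYPE + "\"/>\n"
            + " <manifest:file-entry manifest:full-path=\"content.xml\" manifest:media-type=\"text/xml\"/>\n"
            + "</manifest:manifest>\n";
        String content = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<office:document-content"
            + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
            + " xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\""
            + " office:version=\"1.2\">\n"
            + " <office:body><office:text>\n"
            + "  <text:p>" + escape(paragraph) + "</text:p>\n"
            + " </office:text></office:body>\n"
            + "</office:document-content>\n";

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // "mimetype" must be the first entry and stored without compression,
            // so size and CRC have to be set before the entry is written.
            byte[] mime = MIMETYPE.getBytes(StandardCharsets.US_ASCII);
            ZipEntry mimeEntry = new ZipEntry("mimetype");
            mimeEntry.setMethod(ZipEntry.STORED);
            mimeEntry.setSize(mime.length);
            mimeEntry.setCompressedSize(mime.length);
            CRC32 crc = new CRC32();
            crc.update(mime);
            mimeEntry.setCrc(crc.getValue());
            zip.putNextEntry(mimeEntry);
            zip.write(mime);
            zip.closeEntry();

            // The remaining parts use the default (deflated) compression.
            putDeflated(zip, "META-INF/manifest.xml", manifest);
            putDeflated(zip, "content.xml", content);
        }
        return bytes.toByteArray();
    }

    private static void putDeflated(ZipOutputStream zip, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(xml.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    // Escapes the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        Files.write(Paths.get("hello.odt"), build("Hello from plain Java"));
    }
}
```

Formatting (styles, tables, images) would mean adding more XML to `content.xml` and more parts to the manifest, which is where a library such as ODF Toolkit or UNO starts to pay off.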

OpenOffice.org's Universal Network Objects (UNO) API allows you to generate .doc and PDF files as well as OpenOffice documents. It supports several programming languages, such as Java, C++, and Visual Basic.
On the plus side, it is free, open source, and platform-independent.
You can build documents, spreadsheets, presentations, etc., either from scratch or from a template whose gaps you fill in.
To use it, you will need to include some libraries that come with the OpenOffice suite.
Useful links:
Open Office home
Open Office UNO Developer's Guide

Related

Create .vsdx files (Microsoft Visio) in Java

I'm looking for some info on how to create a .vsdx file in Java without any commercial libraries. According to other questions it seems to be pretty tough.
As a source we have a different, probably unknown file format called .epml that contains graphical information about EPCs, which we should be able to convert to an .xml file. As far as I understand the .vsdx format so far, that is only one of the many files required in the unzipped .vsdx. I'd be glad if anyone could tell me about my options for creating all the other files.
EDIT: The goal here is to be able to convert the graphic information of the .epml file so Visio is able to read & display it as in the source. Therefore, it doesn't have to be a .vsdx file if there are other possible options.
Thanks!
EPML is not an unknown format; it is an interchange format for EPC tools. Just try googling it :)
I would suggest converting your .epml files to .svg (there are free open source converters available, such as epml2svg). Visio can read and display .svg files, which means that writing code does not seem to be required to achieve your goal (converting .epml files into something Visio can show). AFAIR there is also an online version of the tool: you upload an EPML file, get back an SVG, and open it in Visio. That's it.
Side note - there are companies, like bpm-x for example, specializing in BPM tool-to-tool diagram conversion. Maybe they already have a solution for your original tool.
The .vsdx file uses the "Office Open XML" package format, which is also open and documented. But you are right that it's pretty tough to generate such a file from scratch. In principle, you could start with any code that is capable of handling Open XML packages. Microsoft has the Open XML SDK, but that's .NET (the MSDN how-to assumes you are using .NET, but it explains the basics of what an Open XML package consists of).
AFAIK, there are no open source Visio libraries for Java that you could use. Java and Visio seem to live in parallel universes. The only viable commercial option I've heard of seems to be Aspose.
Interesting - whilst I cannot give a final answer, here are some thoughts:
Question 1: Why would you want to avoid commercial tools, when the final result file will require some - namely "Visio"?
1) Creating Visio files from XML:
Create template XMLs from a VSDX. Identify the files that you need to edit; from what I've seen, these should be the masters and the pages files. Since you are able to make an XML from EPML, you should also know how to adapt it to a new structure.
This solution is probably by far the most tedious and the least flexible.
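For option 1, a reasonable first step is to see which parts a template actually contains. A .vsdx file is a ZIP package (Open Packaging Conventions), so the JDK's `ZipFile` can list them. This sketch assumes that page and master parts live under `visio/pages/` and `visio/masters/`, which is what I would expect in a Visio 2013+ file; check the names against your own template. The class name `VsdxParts` is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Lists the parts inside a .vsdx template so you can see which XML files to edit (sketch). */
class VsdxParts {

    // Returns every entry name in the package, sorted alphabetically.
    static List<String> listParts(File vsdx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(vsdx)) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                names.add(entry.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Keeps only page and master parts (assumed folder names, see above).
    static List<String> pagesAndMasters(List<String> parts) {
        List<String> result = new ArrayList<>();
        for (String part : parts) {
            if (part.startsWith("visio/pages/") || part.startsWith("visio/masters/")) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java VsdxParts template.vsdx
        for (String part : pagesAndMasters(listParts(new File(args[0])))) {
            System.out.println(part);
        }
    }
}
```

Once you know which parts matter, you can unzip them, replace their XML with what you generate from EPML, and zip the package back up, keeping the other parts (such as `[Content_Types].xml` and the `_rels` files) consistent with what you changed.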
2) Use Visio automation:
Presuming that the final document will need more than just graphics, namely shape data as well, an easier solution would consist of creating the graphics first
a) as SVG and import into Visio
b) even easier: automated drawing via Visio's automation capabilities (VBA, .NET, ...). The shapes to drop would already have been prepared as masters with all the relevant data and behaviour settings.
Then you would populate the data by means of one of the many data linking features (Wizard, Standard data linking, ODBC connections, etc.)

I have a .doc file which I want to convert into PDF using iText in Java. How do I do this?

I don't understand what to do. I added the latest version of the iText jar file, but I'm not getting anywhere.
Please give me the correct solution or code, step by step, because I'm doing this for the first time.
You can't solve your use case as easily as you think. iText does not support Word to PDF conversion, because this requires very good knowledge of the OOXML format.
I'm an employee at iText Software, and for the last few weeks I've been investigating XDocReport, a tool that lets users manipulate MS Word files (DOCX) and OpenOffice/LibreOffice files (ODF). They also have a plugin that converts DOCX or ODF to PDF, with iText. Users that want this functionality can choose whether to use iText 2 under the LGPL license or iText 5 under the AGPL license.
XDocReport has already done an excellent job creating their converter tool with the limited human resources they have, and I've also made some minor progress on the project with fixing bugs and layout problems. However, there are still a lot of things that need fixing in order to make the PDFs closely resemble the MS Word documents.
XDocReport has documentation for using its plugins, but it is not easy to do, and you need to know a lot about Maven and dependencies to make it work. I've created a plugin discovery project to make that easier, but we haven't released it yet: as of now it doesn't produce the PDF quality that we feel our customers would want. We have no timeline for the availability of this plugin.

Adding buttons, text fields, checkboxes etc. to an OpenOffice text document using UNO with Java

I am trying to create an OpenOffice document using the UNO API with Java.
I am already able to create a simple document and put some simple text into it.
What I want to do (or rather, need to do) is also add form controls (text fields, check boxes, push buttons) to it.
The idea is to create a form in OpenOffice which in the end can be exported to a PDF with interactive PDF forms.
I am able to create these with iText and manually in OpenOffice.
But I have not found any example, simple or otherwise, of creating such objects with the UNO API.
So any help, hints or links are appreciated (it's not that I haven't searched on Google, but maybe I just used the wrong keywords).
An alternative for you might be to use the ODF Toolkit, with its Java APIs for manipulating ODF documents, the native format of OpenOffice.org and its descendants.
I don't know if ODF Toolkit supports the features you are interested in though. Check at http://incubator.apache.org/odftoolkit/

easiest document format for displaying pages containing images and text in java

I want to be able to open up documents containing a combination of one or two pictures and text from java. The documents don't have to be pretty, but I need to be able to switch documents relatively quickly. I'm trying to figure out what the easiest method to do this is.
I can save the documents in whatever format is easiest for me, for instance HTML or PDF. But the documents must be reasonably easy to modify, and new ones easy to generate. I don't care whether the document is displayed within a Java frame or by an external tool, as long as the tool is common enough to be installed on most operating systems and I can switch documents quickly and without too much hassle. This is an internal tool, so it doesn't have to be of professional quality.
Unfortunately, various company limitations make it a real hassle to get approval to use open source packages that haven't been pre-approved. So I can't do the obvious thing and grab an open source implementation of PDF or HTML reader for java.
So, any suggestions on the easiest format for my documents and how to read it?
You can use XHTML. Your document would then be a directory containing the HTML document and the image files as-is. You do not need anything beyond the JDK to implement this, and you can use any browser to view such a document. Modification is easy too.
Note: by XHTML I mean HTML that can be parsed with a regular XML parser. I think it is the best choice for you.
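A minimal sketch of that approach with only the JDK: write each page as well-formed XHTML next to its image, open it in the default browser through `java.awt.Desktop`, and read it back with the built-in DOM parser. The directory layout (`doc1/index.html` plus `picture.png`) and the class name `XhtmlPage` are assumptions for illustration.

```java
import java.awt.Desktop;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** One "document" = one well-formed XHTML page plus its image files (sketch). */
class XhtmlPage {

    static String build(String title, String text, String imageFile) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
            + "<head><title>" + esc(title) + "</title></head>\n"
            + "<body>\n"
            + "<p>" + esc(text) + "</p>\n"
            + "<img src=\"" + esc(imageFile) + "\" alt=\"\"/>\n"
            + "</body>\n</html>\n";
    }

    // Because the page is well-formed XML, the JDK's DOM parser can read it back.
    static List<String> imageSources(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        NodeList imgs = doc.getElementsByTagName("img");
        List<String> sources = new ArrayList<>();
        for (int i = 0; i < imgs.getLength(); i++) {
            sources.add(((Element) imgs.item(i)).getAttribute("src"));
        }
        return sources;
    }

    // Escapes characters that are special in XML text and attribute values.
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createDirectories(Paths.get("doc1"));
        Path page = dir.resolve("index.html");
        Files.write(page, build("Document 1", "Some text", "picture.png").getBytes(StandardCharsets.UTF_8));
        // picture.png must be copied into doc1/ separately; the page only references it.
        if (Desktop.isDesktopSupported() && Desktop.getDesktop().isSupported(Desktop.Action.BROWSE)) {
            Desktop.getDesktop().browse(page.toUri());
        }
    }
}
```

Switching documents is then just a matter of calling `browse` with another page's URI; the `.html` extension is chosen so that any default browser opens it.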

How do I use Apache POI to read a .DOC file in Java to separate images from text?

I need to read a Word .doc file from Java that has text and images. I need to recognize the images & text and separate them into 2 files.
I've recently heard about Apache POI. How can I use Apache POI to read Word .doc files?
The examples and sample code on apache's site are pretty good. I recommend you start there.
http://poi.apache.org/hwpf/quick-guide.html
To get specific bits of text, first create a org.apache.poi.hwpf.HWPFDocument. Fetch the range with getRange(), then get paragraphs from that. You can then get text and other properties.
See here for an example of extracting an image, and here for the latest revision as of this writing.
And of course, the Javadocs
Note that, according to the POI site,
HWPF is still in early development.
It's not free (or even cheap!) but Aspose.Words should be able to do this. Their evaluation download will let you play with small files.
Do the destination files also have to be Docs? You could open the docs in Office and save them out as HTML. Then the separation becomes trivial. RTF is also a viable option, but I can't recommend a good RTF parser off the top of my head.
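If you take the save-as-HTML route, the split can be done with the HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`), so no extra library needs approval. A minimal sketch, assuming the exported images appear as ordinary `<img src=...>` tags; real Word output also contains VML markup and conditional comments that this does not try to handle. `HtmlSplitter` is a hypothetical name.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Splits an HTML export into plain text and the image files it references (sketch). */
class HtmlSplitter {

    final StringBuilder text = new StringBuilder();
    final List<String> images = new ArrayList<>();

    static HtmlSplitter split(Reader html) throws IOException {
        HtmlSplitter result = new HtmlSplitter();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Each run of character data between tags arrives here.
                result.text.append(data).append('\n');
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no end tag, so the parser reports it as a "simple" tag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        result.images.add(src.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared in the page; the Reader already decodes it.
        new ParserDelegator().parse(html, callback, true);
        return result;
    }

    public static void main(String[] args) throws IOException {
        HtmlSplitter s = split(new StringReader(
            "<html><body><p>Hello world</p><img src=\"image001.png\"><p>Bye</p></body></html>"));
        System.out.println(s.images);
        System.out.print(s.text);
    }
}
```

To get the two outputs the question asks for, write `text` to one file and copy the files listed in `images` (paths are relative to the exported HTML) somewhere else.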
Edit to say: I just remembered another possible solution: Jacob, but you'll need an instance of Office running on the same machine. It's short for Java COM Bridge and it lets you make calls to the COM libraries in Office to manipulate the documents. I'm sure it's not as scary as it might sound!
