Convert Excel to Word using Aspose.Words for Java

I have a requirement to convert an Excel template to Word. Then, using Aspose.Words for Java, I can merge all the Word templates (including the converted Excel template) into a PDF file.
Aspose, iText, POI, Jasper, BIRT etc. don't support this. Is there any API in Java that allows this kind of conversion?

You cannot convert Excel spreadsheets to Word documents directly with the Aspose.Cells APIs. FYI, Aspose.Cells is a spreadsheet management library that handles MS Excel file formats only; we have another component, Aspose.Words, that manages and merges MS Word documents. For your specific requirement, I think you need two Aspose APIs in two steps: first use Aspose.Cells to convert the spreadsheet (XLS/XLSX, etc.) to PDF, then use Aspose.Pdf to convert that PDF to a Word document.
I work as a Support Developer/Evangelist at Aspose.

You can try Apache POI, the Java API for Microsoft Documents. Have a look here:
http://viralpatel.net/blogs/java-read-write-excel-file-apache-poi/

Related

Simple PDF generation via Java batch: iText or Apache FOP?

I have to generate a simple PDF document from a small Java batch job (Java 7). The generated document will contain a list and a couple of tables (nothing fancy). Leaving licensing aside (AGPL is not an issue in this case), which library, iText or Apache FOP, is faster/easier to implement and performs better for the desired output?
As you said, you don't need fancy tables and you want a library that is fast and easy to implement, so I'd prefer iText, because it is much simpler than Apache FOP. Adding lists and tables to a PDF document with iText is very easy. Apache FOP, by contrast, is geared towards documents whose data is stored in XML: its main objective is to render XSL-FO (typically produced from your XML by an XSLT stylesheet) to PDF.
For more details, see: http://blog.xebia.com/comparing-apache-fop-with-itext/
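To make the XML-centred FOP workflow concrete, here is a minimal sketch of its first stage using only the JDK's built-in XSLT processor (`javax.xml.transform`): a small XML data file is turned into an XSL-FO document containing a table. The `<items>/<item name qty>` input format and the stylesheet are hypothetical. The final step, rendering the FO to PDF with Apache FOP (for example through its `FopFactory`/`Fop` classes), is not shown because FOP is an external dependency. With iText you would skip both the XML and the stylesheet and build the table directly in Java, which is the simplicity argument made above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch of FOP's usual first stage: XML data + XSLT stylesheet -> XSL-FO. */
class XmlToFoSketch {

    // Hypothetical stylesheet: every <item name=".." qty=".."/> becomes one row of an fo:table.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/items'>"
        + "<fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='A4' page-height='29.7cm' page-width='21cm' margin='2cm'>"
        + "<fo:region-body/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='A4'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<fo:table table-layout='fixed' width='100%'><fo:table-body>"
        + "<xsl:for-each select='item'>"
        + "<fo:table-row>"
        + "<fo:table-cell><fo:block><xsl:value-of select='@name'/></fo:block></fo:table-cell>"
        + "<fo:table-cell><fo:block><xsl:value-of select='@qty'/></fo:block></fo:table-cell>"
        + "</fo:table-row>"
        + "</xsl:for-each>"
        + "</fo:table-body></fo:table>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XML string and returns the XSL-FO document as a string. */
    static String toFo(String xmlData) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter fo = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xmlData)), new StreamResult(fo));
            return fo.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String data = "<items><item name='Apples' qty='3'/><item name='Pears' qty='5'/></items>";
        // The printed XSL-FO is what Apache FOP would then render to PDF (not shown here).
        System.out.println(toFo(data));
    }
}
```

The two-stage design is what lets FOP keep data (XML) and layout (stylesheet) apart; for a one-off batch with a list and two tables, that separation is mostly overhead.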

Difference between Apache POI API and Apache Tika API?

I have a requirement to extract specific columns/rows from an Excel/CSV file. Somebody suggested using Tika for this task.
While going through Tika, I came across the POI API and found it more friendly to use.
We may also need to parse PDF files in the future.
I am new to this technology; I would like to know the difference between the two and which one is more suitable for my requirement.
Thanks,
Krishna
Apache Tika provides a common way to extract consistent text and metadata from a wide range of formats. It also provides content detection, language detection and a few other bits. If you write your code to work with Apache Tika, then your code will be able to work with a huge range of formats in the same way. You don't need to worry about whether one format has a Title, or another calls the same logical thing a LongTitle or a Subject. You don't need to worry about which library to use for which format. You call Tika, it does the hard work for you, and back comes your consistent metadata and textual content.
Apache POI is one of the libraries that Tika uses. POI supports most of the main Microsoft Office formats, including Excel (.xls and .xlsx). It provides access to the whole of the file format, allowing you complete control over what information you read out. (It also supports writing). Tika uses POI to get text and metadata out of the various different Microsoft formats, but doesn't extract everything. Using POI directly would allow you to decide what you care about and get that.
If you want to support lots of file formats, use Tika. If you want full control of how you get the information out, use POI.
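Content detection, one of the Tika features mentioned above, starts from a file's leading bytes, and the same check tells you which POI family you need: HSSF for the old binary .xls (an OLE2 container) or XSSF for .xlsx (a ZIP package of XML parts). The JDK-only sketch below stops at the container level; it is an illustration, not Tika's or POI's detector (POI ships `FileMagic` and Tika its `Detector` implementations for the real job).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Sketch: tell an OLE2 container (.xls/.doc/.ppt) from a ZIP package (.xlsx/.docx/.pptx) by its first bytes. */
class OfficeContainerSniffer {

    enum Kind { OLE2, ZIP, UNKNOWN }

    // Signature at the start of every OLE2 compound document (read by POI's HSSF/HWPF/HSLF).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };
    // ZIP local file header "PK\3\4"; OOXML files are ZIP packages (read by POI's XSSF/XWPF/XSLF).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        // A ZIP could also be an .odt, a .jar, ...: a real detector inspects the entries too.
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                byte[] buf = new byte[8];
                int n = 0;
                int read;
                // Read up to 8 bytes; a short file simply yields a shorter header.
                while (n < buf.length && (read = in.read(buf, n, buf.length - n)) > 0) {
                    n += read;
                }
                System.out.println(path + ": " + sniff(Arrays.copyOf(buf, n)));
            }
        }
    }
}
```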
Apache POI is a full-blown parser/writer for most Microsoft document formats. It supports both the Office 2007 format (XSSF, .xlsx) and the older Excel 97-2003 binary format (HSSF, .xls). Apache POI provides two levels of API for parsing and generating Microsoft files: a higher-level API that is somewhat memory-intensive, because it reads the whole file and keeps it in memory (similar to DOM parsing in XML), and a lower-level API for memory-constrained use, which is similar to SAX/StAX parsing.
Apache Tika, on the other hand, is a content analysis tool: it supports Microsoft Excel along with a lot of other formats through its extraction components. Tika has no support for writing new files or generating content, but that is not its use case anyway.
So you have to choose depending on your needs.
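The DOM-versus-SAX/StAX distinction can be illustrated without POI, because an .xlsx worksheet is just an XML part (usually `xl/worksheets/sheet1.xml` inside the ZIP). The JDK-only sketch below streams such a part with StAX and collects one column's values without building a tree, which is the memory-friendly style the lower-level API follows. It simplifies under stated assumptions: the XML is passed in as a string instead of being read from the package, and it returns raw `<v>` contents, so string cells (`t="s"`) come back as indexes into the shared-strings table. POI's event API (for example `XSSFReader` with `XSSFSheetXMLHandler`) handles those details for you.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Plain-JDK illustration of SAX/StAX-style (streaming) reading of an OOXML worksheet part. */
class SheetStreamSketch {

    /** Returns the raw <v> values of one column (e.g. "B") in document order, without building a tree. */
    static List<String> columnValues(String sheetXml, String column) {
        List<String> values = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(sheetXml));
            String cellRef = null; // reference (e.g. "B2") of the <c> element we are currently inside
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("c")) {
                        cellRef = reader.getAttributeValue(null, "r");
                    } else if (name.equals("v") && cellRef != null
                            && cellRef.replaceAll("[0-9]", "").equals(column)) {
                        values.add(reader.getElementText()); // moves the cursor to </v>
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("c")) {
                    cellRef = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed worksheet XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String sheet = "<worksheet xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main'>"
                + "<sheetData>"
                + "<row r='1'><c r='A1'><v>1</v></c><c r='B1'><v>10.5</v></c></row>"
                + "<row r='2'><c r='A2'><v>2</v></c><c r='B2'><v>20</v></c></row>"
                + "</sheetData></worksheet>";
        System.out.println(columnValues(sheet, "B")); // prints [10.5, 20]
    }
}
```

Memory use stays roughly constant regardless of sheet size, because only the current event and the collected values are held.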

Document text extraction and modification

I recently came across Apache Tika, a beautiful toolkit which handles files of several types to extract the text (and some other information such as metadata).
The problem which I am facing is that given a document (in some format such as PDF, DOC, XLS and so on), I need to extract the text, modify some of it, and re-build the document in its original format (with the modified text). To my knowledge, Tika provides the facility of extraction of text, but does not 'stitch' modified documents back.
I feel that there are some libraries which do this for specific file types, but I am not aware of any toolkit similar to Tika, which provides an end-to-end solution for me by handling all the file types supported by Tika. I am also not sure if Tika itself can do this for me.
If someone knows anything of this sort, please let me know. I am looking for a library written in Java.
Regards,
Salil
EDIT: coderanch.com/how-to/java/AccessingFileFormats lists several toolkits, but I would appreciate something that comprehensively wraps all the formats supported by Tika.
Apache POI
Apache POI is your Java Excel solution (for Excel 97-2008). We have a complete API for porting other OOXML and OLE2 formats and welcome others to participate.
OLE2 files include most Microsoft Office files such as XLS, DOC, and PPT as well as MFC serialization API based file formats. The project provides APIs for the OLE2 Filesystem (POIFS) and OLE2 Document Properties (HPSF).
Office OpenXML Format is the new standards based XML file format found in Microsoft Office 2007 and 2008. This includes XLSX, DOCX and PPTX.
Eclipse Birt
Q: What report output formats does BIRT support?
Release 2.1 supports HTML, Paginated HTML and PDF.
Release 2.2 supports HTML, Paginated HTML, PDF, Word, XLS, and PostScript.
It appears that there are no better toolkits than the ones mentioned here. The only way out would be to write your own wrapper around one or more of these toolkits to get the work done. It would have been great if Tika itself provided that facility, but unfortunately that does not seem to be the case.
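As an example of what one hand-written, per-format piece of such a wrapper might look like: for the OOXML formats, the extract/modify/rebuild round trip does not strictly need a toolkit, because a .docx is a ZIP package whose body text lives in `word/document.xml`. The JDK-only sketch below copies every entry and applies a plain string replacement to that part. It is deliberately naive: Word often splits a phrase across several `<w:r>`/`<w:t>` runs, so the search text may not be found; the replacement must already be XML-escaped; and the search must not match markup. For anything serious, POI's XWPF API (`XWPFDocument`) is the safer route, and formats such as .doc or PDF need their own libraries. The `zip` and `readEntry` helpers exist only for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Naive sketch: edit the text of a .docx by rewriting its word/document.xml part. */
class DocxTextReplaceSketch {

    static final String BODY_PART = "word/document.xml";

    /** Copies every ZIP entry, applying a plain string replacement to the main body part only. */
    static byte[] replaceText(byte[] docx, String search, String replacement) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream zout = new ZipOutputStream(bout)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = readAll(zin);
                if (entry.getName().equals(BODY_PART)) {
                    // Naive: only matches text inside a single <w:t> run; replacement must be XML-escaped.
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = xml.replace(search, replacement).getBytes(StandardCharsets.UTF_8);
                }
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray(); // zout is closed by now, so the ZIP central directory is written
    }

    /** Demo helper: builds a ZIP from alternating (entryName, utf8Content) pairs. */
    static byte[] zip(String... namesAndContents) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bout)) {
            for (int i = 0; i + 1 < namesAndContents.length; i += 2) {
                zout.putNextEntry(new ZipEntry(namesAndContents[i]));
                zout.write(namesAndContents[i + 1].getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }

    /** Demo helper: returns the UTF-8 text of one ZIP entry, or null if it is absent. */
    static String readEntry(byte[] zip, String name) {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    return new String(readAll(zin), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] readAll(ZipInputStream zin) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = zin.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 4) { // usage: in.docx out.docx search replacement
            byte[] out = replaceText(Files.readAllBytes(Paths.get(args[0])), args[2], args[3]);
            Files.write(Paths.get(args[1]), out);
        } else { // in-memory demo with a toy body part
            byte[] doc = zip(BODY_PART, "<w:p><w:r><w:t>Hello NAME</w:t></w:r></w:p>");
            System.out.println(readEntry(replaceText(doc, "NAME", "World"), BODY_PART));
        }
    }
}
```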

Java: use PDFBox to convert MS Office files to PDF

Is it possible to convert from MS office file formats using Apache PDFBox (the documentation isn't clear about this, and the javadoc seems to indicate no such capability exists), or would I need to do some tedious conversions with Apache POI?
The reason I'm asking is the answer to this StackOverflow question:
https://stackoverflow.com/questions/10861227/convert-ms-office-to-pdf-in-java
I imagine I'll need to use Apache POI, but I wanted to clarify.
In order to do this conversion, you will need MS Office, or perhaps Google Drive. PDFBox does not convert from anything to PDF or vice versa -- it simply reads and writes PDF files. Apache POI will not do that type of conversion either -- it simply reads and writes MS Office files. Specifically, it does not render them. You could implement a rendering engine for each type of Office file yourself, but that would be a gargantuan task to say the least.
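As one concrete way to let an office suite do the rendering (LibreOffice is not mentioned in the answer; it is a swapped-in example), the JDK's `ProcessBuilder` can drive LibreOffice's headless converter. The sketch assumes a LibreOffice installation whose `soffice` executable is on the PATH; `--headless`, `--convert-to pdf` and `--outdir` are LibreOffice's command-line conversion options.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Sketch: delegate Office-to-PDF rendering to a locally installed LibreOffice (assumption). */
class SofficePdfSketch {

    /** Builds the headless conversion command; assumes "soffice" is on the PATH. */
    static List<String> command(File input, File outDir) {
        return Arrays.asList("soffice", "--headless", "--convert-to", "pdf",
                "--outdir", outDir.getPath(), input.getPath());
    }

    /** Runs the conversion and returns soffice's exit code; the PDF is written into outDir. */
    static int convert(File input, File outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(input, outDir))
                .inheritIO() // show soffice's own output and errors in this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        File outDir = new File(args.length > 1 ? args[1] : ".");
        int exit = convert(new File(args[0]), outDir);
        System.out.println("soffice exited with " + exit);
    }
}
```

The exit code is only a rough signal; checking that the expected PDF actually appeared in the output directory is more reliable.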
Take a look at https://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/.
One of the possible options it mentions is XWPFConverterPDFViaIText:
org.apache.poi.xwpf.converter.pdf provides the DOCX 2 Pdf converter
based on Apache POI XWPF and iText.
You can test this converter with the REST Converter service
http://xdocreport-converter.opensagres.cloudbees.net/

Using the same API to write both Word and PDF documents

Hi all,
Is there any kind of abstraction API over Apache POI/FOP that allows one to use the same API to write both Word and PDF documents?
I'm not aware of a unified API for the two libraries you have mentioned.
However you may still have a couple of options using a single API:
Use Apache POI to generate the documents in Word format, and then use a Word-to-PDF conversion library to create a PDF from the Word document. Another commenter has suggested iText.
Use OpenOffice via its Java API to create documents and export them in Microsoft Word or PDF format.
Docmosis will do what you require, assuming you mean a Java (or command line) API. It reads doc and odt files as templates, populates/manipulates via the Java API, and produces the output formats OpenOffice supports. Have a look at the online demo on the web site which lets you see various output formats to render a document in.
When I was working on a previous project, I found that Apache POI could be used for Microsoft documents.
For PDF generation and modification there is iText (iText.jar). Please check it; it should help you.
