Spreadsheet Parser in Java/Groovy - java

Hi I'm looking to parse spreadsheets (xls/ods) in Groovy. I have been using the Roo library for Ruby and was looking to try the same tasks in Groovy, as Java is already installed on a development server I use, and I would like to keep the number of technologies on the server to a simple core few.
I am aware that the ods format is zipped XML, and so can be parsed as such, but I would like to process the file using spreadsheet concepts, not XML concepts.
The ability to process xls files is not of major importance, but would save me having to save multiple xls files to ods (as this is for parsing data from clients).
Thanks

I would suggest Apache POI for access to .xls files.
I've never had to work with the .ods format, so no information on that one.

There's also JExcelAPI, which has a nice, clean, simple interface (for the most part).
Can't help you with ODS Files though.

How about looking at 'odftoolkit' ? http://odftoolkit.openoffice.org/

Groovy in Action has a chapter named "Groovy on Windows" that discusses using Scriptom, a Groovy/COM bridge (using JACOB under the covers), to access several Windows apps including Excel.
For OpenOffice, you can use ODF Toolkit, as Amit pointed out.

I second jdmichal's vote for Apache POI. I have selected it as our library of choose to handle Excel file input (.XLS). The project is also working on the .XLSX file format if you ever decide you want to support that. Based on your specifications, I don't think you want to get into converting things into CSV and it seems like you have established input and output paths. For anyone who hasn't had the joy of dealing with CSV to Excel conversion, it can get a bit dicey. I have spent hours dealing with issues created by Excel converting string data to numeric data. You can see other testimonies to this effect on the POI Case Studies page. Beyond these issues, I simply don't want to personally have to handle these inputs. I'd rather invest the programming effort and streamline the workflow for the future.
I too have not dealt with ODF and have no plans to support it in my current project. You might want to check out the OpenOffice.org ODF Toolkit Project.
Good luck and have fun,
- D.

I suggest you to take a look at SimpleXlsBuilder and SimpleXlsSlurper, both are based on apache POI and can fit your basic needs for reading from and writing to Excel 97 spreadsheets in a concise way.

If your spreadsheets are simple enught - without charts and other embedded contents - you should simply convert the spreadsheet to CSV.
Pros:
Both xls and ods will produce the same CSV - You'll have to handle just one input type.
You won't have to mess with new versions of (Open) Office.
Handling plaintext is always more fun than other obscure formats.
Cons:
One that I can think of - finding a reliable converter from xls and odf to csv. Shouldn't be too hard - OpenOffice has a built in one.

A couple things:
1) I agree that using a CSV format can simplify some of the development work. OpenCSV can help with processing CSV files. There are other good CSV parsers for Java out there. Just remember that anything that's available for Java can be used by Groovy due to Groovy's unparalleled integration with Java.
2) I know you said you wanted to avoid handling XML, but Groovy makes XML processing exceedingly simple.

Related

Invoke HSSF Serializer Invocation

I have to write a very large XLS file, I have tried Apache POI but it simply takes up too much memory for me to use.
I had a quick look through StackOverflow and I noticed some references to the Cocoon project and, specifically the HSSFSerializer. It seems that this is a more memory-efficient way to write XLS files to disk (from what I've read, please correct me if I'm wrong!).
I'm interested in the use case described here: http://cocoon.apache.org/2.1/userdocs/xls-serializer.html . I've already written the code to write out the file in the Gnumeric format, but I can't seem to find how to invoke the HSSFSerializer to convert it to XLS.
On further reading it seems like the Cocoon project is a web framework of sorts. I may very well be barking up the wrong tree, but:
Could you provide an example of reading in a file, running the HSSFSerializer on it and writing that output to another file? It's not clear how to do so from the documentation.
My friend, HSSF serializer is part of POI. You are just setting certain attributes in the xml to be serialized (but you need a whole process to create it). Also, setting a whole pipeline using this framework just to create a XLS seems odd as it changes the app's architecture. ¿Is that your decision?
From the docs:
An alternate way of generating a spreadsheet is via the Cocoon
serializer (yet you'll still be using HSSF indirectly). With Cocoon
you can serialize any XML datasource (which might be a ESQL page
outputting in SQL for instance) by simply applying the stylesheet and
designating the serializer.
If memory is an issue, try XSSF or SXSSF in POI.
I don't know if by "XLS" you mean a specific, prior to Office 2007, version of this "Horrible SpreadSheet Format" (which is what HSSF stands for), or just anything you can open with a recent version of MS Office, OpenOffice, ...
So depending on your client requirements (i.e. those that will open your Excel file), another option might be available : generating a .XLSX file.
It comes down to producing an XML file in the proper grammar, which seems to be fit to your situation, as you seem to have already done that with the Gnumeric XML-based file format without technical trouble, and without hitting memory-effisciency issues.
Please note other XML-based spreadsheet formats exist, that Excel and other clients would be able to use. You might want to dig into the open document file formats.
As to wether to use Apache Cocoon or something else:
Cocoon can sure host the XSL processing ; batch (Cocoon CLI) processing is available if you require Cocoon, but require it not to run as a webapp (though as far as I remember, CLI feature was broken in the lastest builds of the 2.1 series) ; and Cocoon comes with a load of features and technologies that could address further requirements.
Cocoon might be overkill if it just comes down to running an XSL transformation, for which there is a bunch of well-known, lighter tools you can pick from.

How to create .mpp file in java?

I am able to create .mpx file by using mpxj library in java.
I need write ( create ) .mpp file in java can any one suggest me please.
I maintain MPXJ, and the short answer to your enquiry is that, at present, MPXJ does not write MPP files.
The main reason for this is simply that despite the effort which has gone into understanding the MPP file structure, there is still a great deal of it which is not well understood, hence it is difficult to reliably generate. The other issue is that even if I was to produce some code which could generate an MPP file, the features it could write to that file are likely to lag behind what MPXJ supports in the MSPDI file format, again due to my incomplete understanding of the MPP format.
My suspicion is that the next version of MS project (project 15? Project 2013?) may probably offer a ".mppx" file format, similar to the ".docx" etc formats used by other applications in the MS Office suite. This will be XML-based and will be more straightforward to generate than the binary MPP file format currently is... let's see what Microsoft come up with!
Jon
Visit http://www.mpxj.org/faq/
Can I use MPXJ to write MPP files?
Not at present. Although it is technically feasible to generate an MPP file, the knowledge we have of the file structure is still relatively incomplete, despite the amount of data we are able to correctly extract. It is therefore likely to take a considerable amount of development effort to make this work, and it is conceivable that we will not be ablet to write the full set of attributes that MPXJ supports back into the MPP file - simply because we don't understand the format well enough. You are therefore probably better off using MSPDI which does support the full range of data items present in an MPP file.
You can
Try this: http://www.aspose.com/java/project-management-component.aspx
It writes MPP and Microsoft Project XML.
But this not free
Try this: http://www.aspose.com/java/project-management-component.aspx
It writes MPP and Microsoft Project XML.
I think by "mpp" you probably mean "Microsoft PowerPoint", correct?
Q: Why do you think MPXJ (Microsoft Project Exchange/Java) can't do this?
http://www.mpxj.org/
Welcome to MPXJ! This library provides a set of facilities to allow
project information to be manipulated in Java and .Net. MPXJ supports
a range of data formats: Microsoft Project Exchange (MPX), Microsoft
Project (MPP,MPT), Microsoft Project Data Interchange (MSPDI XML),
Microsoft Project Database (MPD), Planner (XML), Primavera (PM XML,
XER, and database), and Asta Powerproject (PP, MDB).

Generate PowerPoint 2007/2010 file using Java

Does anyone know of any API (commercial or open-source) that can generate/edit PowerPoint 2007/2010 presentations through Java. I have a template in the PowerPoint 2007/2010 format that I require to edit/update. So far I have been converting the .pptx file to xml and then editing and storing it back as .pptx. But the file gets corrupted while opening.
Is anyone aware of any other method or API that do this in Java?
We have done it programmatically (closed source at the moment, sorry) so might be able to help, but beware of a few gotchas.
One is that the POI project (at least when we looked at it last year), was quite incomplete. It didn't do PPTX Charts - which is the one feature we wanted. Infact the POI site may not be upto date, but they don't appear to support PowerPoint 20087 format (http://poi.apache.org/slideshow/index.html). Everybody recommends this project, but our evaluation was that it was pretty much useless for generating PowerPoint 2007 files via Java. Your milage may vary.
Apose also had some significant limitations when we looked at it; not doing Charts in PowerPoint 2007 being the blocking issue for us.
Another issue is that PowerPoint 2007 can be quite buggy. We have had a number of progammatically produced PPT files that caused lock ups, but when testing, we found that we can repro crashes and lock ups with simple PPTX documents created in PowerPoint 2007 - i.e. not our code.
In the end, we did the following: Unpacked a 'template' PowerPoint file to a folder, then on demand, filled the template XML with new values, zipped it up, renaming various elements & delivered it to the user as a valid PPTX. Works OK, other than the odd PowerPoint crash when people edit the file. If there was a market for it, I guess we could package up the code as a webservice (i.e xml/csv -> PPTX) or put together a commerical package, but we wouldn't do it for free.
docx4j (apache license) now includes a pptx4j component, which can open/edit/save pptx documents.
Yes. Check this out http://poi.apache.org/, they just released version 3.6 which now supports Office 2007 format documents. The best part is that it's free!
To generate a PowerPoint presentation from a template file, you can use PPT Templates.
This library provides a fluent API to replace variables inside the PPT template:
try(FileOutputStream out = new FileOutputStream("generated.pptx")) {
new PptMapper()
.text("variable", "Hello")
.text("other_variable", "World!")
.processTemplate(PptTemplateDemo.class.getResourceAsStream("/title.pptx"))
.write(out);
}
With this library, you can process text and images in the template.
Another solution that may work for you is Windward Reports (disclaimer, I'm the founder & CEO there). It uses PPTX as one of the supported template formats and merges in data to then generate a PPTX (or PDF, etc.) output.
If the edit/update you need can be handled via the data tags in Windward, this should be trivial for you. If what you need cannot be handled by the tags, then this won't work for you.
Well as mentioned by GrantB best way is to create a template, then load the template , traverse the xml graph,update the data and stream out to a output ppt. We recently did it to generate reports for clients that had complex visuals and charts in ppt. You can have a look here generate ppt in java

writing large excel spreadsheets

has anybody found a library that works well with large spreadsheets?
I've tried apache's POI but it fails miserably working with large files - both reading and writing. It uses massive amounts of memory leaving you needing a supercomputer to parse or create a 20+mb spreadsheet.
Surely there is a more memory efficient way and someone has written it?!
#pstanton..
I was working on a similar solution and was able to write large excel 2007 files with hundreds of rows exported from database. Here is the link to it:
http://vikramvkamath.blogspot.com/2010/07/writing-large-excel-files-excel-2007.html
My solution is an extension on Yegor Koslov's SheetWriter class follow this http link and it works very well for me.
Let me know in case you face any issues.
~Vikram
I cannot really recommend a library to you. But when you need the best performance, it might be worth a try to go to the people who came up with Excel in the first place. I guess the APIs that are available from .NET are much more efficient in handling Excel files. So the idea would be to implement a web service or similiar component in .NET that does most of the Excel-related grunt work for you and just invoke that from Java.
This is basically the same idea as Jannik's, but you use the Java COM Bridge to access the Excel APIs directly from Java. We have had good success doing this with Word. Obvious downside is that it only works on Windows.
Have you tried JExcelAPI as an alternative to POI ? I confess I can't comment on it's memory efficiency.
at time of posting, there is no pure java scalable solution for reading and writing large excel files.
May be CSV file format can help you. You just need to seperate each value by comma and save file with .csv extension.

Generating excel documents programmatically

Has anyone used a Java based library for generating excel documents? Preferably support for 2003?
I'm currently working with Apache POI, ( http://poi.apache.org/index.html ) which is very comprehensive. The 2003 file format version is still in beta, but seems to work well enough. I'm not exercising it's power very much, just straightforward reads and writes of Excel, but it seems reliable.
Whenever I have to do this I ask myself if one big html table would be enough.
much of the time it is. You can simply write html tags and label it as a .xls file. Excel will open it correctly
If you don't need fancy headings then just output CSV.
JExcelApi
I've used it personally for a report that is currently in production. It's a fairly decent library with sufficient docs, and it's open source.
It works very well, but has a few gotchas you should be aware of. None of them are deal breakers, just dictate how a few things should be done. Just be sure to read the FAQ. It will explain them and tell you how to avoid them.
A formatted HTML table will import correctly, but it would be better to use the Excel XML format from the Excel 2003 XML Toolbox for more advanced needs (multiple worksheets, formulas, etc).
You can also try SmartXLS for java,
it have more functions than poi and jexcelapi,and it is a commercial product.
http://www.smartxls.com/indexj.htm
I do it with Jacob as a general java and COM solution. However in my reading Jacob does not handle pretty well multiple calls (say millions of calls) and I needed to patch it a bit. These patches were not accepted by Jacob maintainer.
Anyway Jacob is open source (LGPL) and after patching it I have a production environment running for years yet.
After connecting to Excel with COM, you use standard Excel api to process documents. First you try how it works with VBS (VBScript Language Reference), then implement in java.
You can generate an excel file with a VBS and then cal the script from java like this:
String script = "your_VBS_Name.vbs"
String cmd = "D:\\YourPath" + script;
Runtime.getRuntime().exec(cmd);
to create the script is really simple
open notepad and follow the next example:
Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = True
Set objWorkbook = objExcel.Workbooks.Add()
objWorkbook.SaveAs("D:\yourExcel.xls")
objExcel.Quit
and then save it as your_VBS_Name.vbs
Thats it!

Categories