Has anybody found a library that works well with large spreadsheets?
I've tried Apache POI, but it fails miserably with large files, both reading and writing. It uses massive amounts of memory, leaving you needing a supercomputer to parse or create a 20+ MB spreadsheet.
Surely there is a more memory efficient way and someone has written it?!
@pstanton,
I was working on a similar problem and was able to write large Excel 2007 files with hundreds of rows exported from a database. Here is the link:
http://vikramvkamath.blogspot.com/2010/07/writing-large-excel-files-excel-2007.html
My solution is an extension of Yegor Kozlov's SheetWriter class, and it works very well for me.
Let me know in case you face any issues.
~Vikram
I cannot really recommend a library to you. But when you need the best performance, it might be worth going to the people who came up with Excel in the first place. I guess the APIs that are available from .NET are much more efficient at handling Excel files. So the idea would be to implement a web service or similar component in .NET that does most of the Excel-related grunt work for you, and just invoke that from Java.
This is basically the same idea as Jannik's, but you use the Java COM Bridge to access the Excel APIs directly from Java. We have had good success doing this with Word. Obvious downside is that it only works on Windows.
Have you tried JExcelAPI as an alternative to POI? I confess I can't comment on its memory efficiency.
At the time of posting, there is no pure Java scalable solution for reading and writing large Excel files.
Maybe the CSV file format can help you. You just need to separate each value with a comma and save the file with a .csv extension.
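To make the CSV suggestion concrete, here is a minimal sketch using only the JDK. The class and method names (`CsvExport`, `escape`, `toLine`, `write`) are made up for illustration, and the quoting follows the common RFC 4180 convention, which is an assumption about what your consumer expects. Rows are written one at a time, so memory use should not grow with the size of the file.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of any library mentioned in this thread.
class CsvExport {
    // Quote a field only when it contains a comma, a quote, or a line break.
    static String escape(String field) {
        if (field == null) return "";
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Join the escaped fields of one row with commas.
    static String toLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }

    // Stream rows to disk one at a time instead of building the whole file in memory.
    static void write(Path target, Iterable<List<String>> rows) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (List<String> row : rows) {
                out.write(toLine(row));
                out.write("\r\n");
            }
        }
    }
}
```

The obvious trade-off is that CSV carries no formatting, formulas, or multiple sheets.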
Guys, in our project we need to add a new feature: exporting test result data to PDF and Word files for the user. The test result data in our system is generally a simple 2D table (some tables are a little more complex and have merged cells). Is there a stable and mature open source Java solution for this? Or should we use a reporting solution like BIRT (although we feel such a solution may be too heavyweight for this feature)? Because the project deadline is tight, I am posting the question here to save some investigation time. Any suggestion will be much appreciated, thanks. By the way, our project is a Swing application.
JasperReports allows export to PDF and DOC; it might be easier than using POI and iText.
As time is short and this is a new feature, you might temporize by writing a comma-separated-value file with a name acceptable to Excel. At a later time, implement a more robust solution using Apache POI or something similar that your research identifies.
I need to create an Excel file with simple text data. Concretely, it will hold a list of email addresses, one per cell. The data volume will be approximately 200,000 entries, so I want to split it across at least 4 sheets in the same workbook. This is because I need to be able to open the file in old Excel versions (max 65,536 rows per sheet).
Because the data is simple (no charts, no functions, etc.), I can use many APIs. A few years ago I used Apache POI to manage Excel files with nice results, but I think it is too heavy (1.7 MB for version 3.7) and I want to know whether there are lighter APIs.
Also, I would like it to be available in a Maven repository.
Thanks in advance.
JExcelAPI is your friend here. All fully Mavenised and ready to go.
Not directly an answer, but this page seems to have some good information comparing the different solutions. Gary's suggestion of JExcelAPI and Apache POI are both covered, along with a few more.
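Whichever library ends up writing the file, splitting the 200,000 addresses over sheets is plain integer arithmetic. A small sketch of that step follows; the class and method names are hypothetical, and the 65,536 limit is the one quoted in the question.

```java
// Hypothetical helper: maps a record number to a sheet and a row within that sheet.
class SheetPartition {
    // Row limit per sheet in old Excel versions, as stated in the question.
    static final int MAX_ROWS_PER_SHEET = 65536;

    // Number of sheets needed for totalRows records, one record per row (ceiling division).
    static int sheetCount(int totalRows) {
        return (totalRows + MAX_ROWS_PER_SHEET - 1) / MAX_ROWS_PER_SHEET;
    }

    // Zero-based sheet index for a zero-based record index.
    static int sheetIndex(int recordIndex) {
        return recordIndex / MAX_ROWS_PER_SHEET;
    }

    // Zero-based row index inside that sheet.
    static int rowIndex(int recordIndex) {
        return recordIndex % MAX_ROWS_PER_SHEET;
    }

    public static void main(String[] args) {
        int total = 200_000;
        int last = total - 1;
        System.out.println(sheetCount(total) + " sheets needed");
        System.out.println("last record goes to sheet " + sheetIndex(last) + ", row " + rowIndex(last));
    }
}
```

For 200,000 records this gives 4 sheets, with the last sheet only partly filled.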
I have a set of complex Excel files (with figures in them) that I want to show in a web browser, so I need to convert them into HTML pages first. Since the Excel files are very complex, I cannot just parse them and generate an HTML table myself. The current manual solution, which works fine, is to use Microsoft Excel to save the spreadsheet as an HTML page. I want to automate this task programmatically from Java. Is there an existing solution or a way to do it? Thanks.
EDIT - I was able to create a macro for it, but I could not figure out how to execute a macro on an Excel file from a Java program. Does somebody know?
If OpenOffice does a good job of the export, then you could take a look at the source to see how it does it. OO is a combination of Java and C++, I believe, so you might get lucky and find a Java solution.
Otherwise, I would try to use Excel itself to do the export and find some way of calling it programmatically. If you go down this path you'd be better off using a Microsoft stack (C# would be the most similar to Java), as I would expect it to have all the functions you need already defined.
You might look into POI:
http://poi.apache.org/
I think your best bet is to call Excel from Java using JACOB
Creating direct COM calls (which is what you'll be doing from JACOB) is a bit tough, but you'll get the hang of it. I can't imagine that the Excel VBA macro is horribly complicated. Take a look at the sample code (Usage and Documentation) in the JACOB link for what this will look like.
One other thing: Remember to explicitly clear references. JACOB will release COM handles when objects are garbage collected, but if you are doing any sort of high performance work, you will want to close those connections as quickly as possible. We generally write all of our COM code in a series of try/finally statements - the code is messy, but robust.
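To show the shape of the try/finally pattern without depending on Excel being installed, here is a sketch in which a made-up `Handle` class stands in for a COM wrapper object. It is not a JACOB class; with real JACOB objects, the `release()` calls below would be replaced by whatever explicit release your wrapper offers. The log only exists to make the release order visible.

```java
import java.util.ArrayList;
import java.util.List;

class ComCleanupSketch {
    // Hypothetical stand-in for a COM wrapper object; NOT part of JACOB.
    static class Handle {
        final String name;
        final List<String> log;

        Handle(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("acquire " + name);
        }

        void release() {
            log.add("release " + name);
        }
    }

    // Acquires two nested handles and releases them in reverse order, even on failure.
    static List<String> export(boolean failDuringWork) {
        List<String> log = new ArrayList<>();
        Handle excel = new Handle("application", log);
        try {
            Handle workbook = new Handle("workbook", log);
            try {
                if (failDuringWork) {
                    throw new IllegalStateException("simulated COM failure");
                }
                log.add("save");
            } finally {
                workbook.release(); // inner handle is released first
            }
        } catch (IllegalStateException e) {
            log.add("error: " + e.getMessage());
        } finally {
            excel.release(); // outer handle is released last, even after a failure
        }
        return log;
    }
}
```

Each handle gets its own try/finally, which is where the messiness comes from, but no handle stays open when a call in the middle throws.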
Try using hypernumbers. (Disclaimer, I'm the CEO)
I ended up using the Scribd API. I uploaded the document to their server through their API in realtime and pasted an iframe with a link in it which is returned by Scribd.
Has anyone used a Java based library for generating excel documents? Preferably support for 2003?
I'm currently working with Apache POI ( http://poi.apache.org/index.html ), which is very comprehensive. The 2003 file format version is still in beta, but seems to work well enough. I'm not exercising its power very much, just straightforward reads and writes of Excel, but it seems reliable.
Whenever I have to do this I ask myself if one big HTML table would be enough.
Much of the time it is. You can simply write HTML tags and give the file an .xls extension. Excel will open it correctly.
If you don't need fancy headings then just output CSV.
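A rough sketch of the HTML table approach, using only the JDK. The class and method names are invented for this example; the only real requirement is that the output is a plain HTML table.

```java
import java.util.List;

// Hypothetical helper that renders rows as a plain HTML table.
class HtmlTableExport {
    // Escape the characters that would otherwise be read as markup.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build one header row followed by one table row per data row.
    static String toHtmlTable(List<String> headers, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<table>\n<tr>");
        for (String h : headers) {
            sb.append("<th>").append(escape(h)).append("</th>");
        }
        sb.append("</tr>\n");
        for (List<String> row : rows) {
            sb.append("<tr>");
            for (String cell : row) {
                sb.append("<td>").append(escape(cell)).append("</td>");
            }
            sb.append("</tr>\n");
        }
        return sb.append("</table>\n").toString();
    }
}
```

Write the returned string to a file with an .xls name. Be aware that newer Excel versions may show a warning that the file content does not match its extension before opening it.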
JExcelApi
I've used it personally for a report that is currently in production. It's a fairly decent library with sufficient docs, and it's open source.
It works very well, but has a few gotchas you should be aware of. None of them are deal breakers, just dictate how a few things should be done. Just be sure to read the FAQ. It will explain them and tell you how to avoid them.
A formatted HTML table will import correctly, but it would be better to use the Excel XML format from the Excel 2003 XML Toolbox for more advanced needs (multiple worksheets, formulas, etc).
You can also try SmartXLS for Java.
It has more functions than POI and JExcelAPI, and it is a commercial product.
http://www.smartxls.com/indexj.htm
I do it with Jacob as a general Java and COM solution. However, in my experience Jacob does not handle large numbers of calls (say, millions) very well, and I needed to patch it a bit. These patches were not accepted by the Jacob maintainer.
Anyway, Jacob is open source (LGPL), and after patching it I have had a production environment running for years.
After connecting to Excel through COM, you use the standard Excel API to process documents. First try out how it works with VBS (VBScript Language Reference), then implement it in Java.
You can generate an Excel file with a VBS script and then call the script from Java like this:
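String script = "your_VBS_Name.vbs";
// A .vbs file is not an executable, so run it through the Windows Script Host (cscript).
String cmd = "cscript //NoLogo D:\\YourPath\\" + script;
Runtime.getRuntime().exec(cmd);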
Creating the script is really simple.
Open Notepad and follow this example:
Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = True
Set objWorkbook = objExcel.Workbooks.Add()
objWorkbook.SaveAs("D:\yourExcel.xls")
objExcel.Quit
Then save it as your_VBS_Name.vbs
That's it!
Hi, I'm looking to parse spreadsheets (xls/ods) in Groovy. I have been using the Roo library for Ruby and want to try the same tasks in Groovy, as Java is already installed on a development server I use, and I would like to keep the number of technologies on the server to a simple core few.
I am aware that the ods format is zipped XML, and so can be parsed as such, but I would like to process the file using spreadsheet concepts, not XML concepts.
The ability to process xls files is not of major importance, but would save me having to save multiple xls files to ods (as this is for parsing data from clients).
Thanks
I would suggest Apache POI for access to .xls files.
I've never had to work with the .ods format, so no information on that one.
There's also JExcelAPI, which has a nice, clean, simple interface (for the most part).
Can't help you with ODS Files though.
How about looking at 'odftoolkit' ? http://odftoolkit.openoffice.org/
Groovy in Action has a chapter named "Groovy on Windows" that discusses using Scriptom, a Groovy/COM bridge (using JACOB under the covers), to access several Windows apps including Excel.
For OpenOffice, you can use ODF Toolkit, as Amit pointed out.
I second jdmichal's vote for Apache POI. I have selected it as our library of choice for handling Excel file input (.XLS). The project is also working on the .XLSX file format if you ever decide you want to support that. Based on your specifications, I don't think you want to get into converting things to CSV, and it seems like you have established input and output paths. For anyone who hasn't had the joy of dealing with CSV to Excel conversion, it can get a bit dicey. I have spent hours dealing with issues created by Excel converting string data to numeric data. You can see other testimonies to this effect on the POI Case Studies page. Beyond these issues, I simply don't want to have to handle these inputs personally. I'd rather invest the programming effort and streamline the workflow for the future.
I too have not dealt with ODF and have no plans to support it in my current project. You might want to check out the OpenOffice.org ODF Toolkit Project.
Good luck and have fun,
- D.
I suggest you take a look at SimpleXlsBuilder and SimpleXlsSlurper. Both are based on Apache POI and can cover your basic needs for reading from and writing to Excel 97 spreadsheets in a concise way.
If your spreadsheets are simple enough, without charts and other embedded content, you should simply convert the spreadsheet to CSV.
Pros:
Both xls and ods will produce the same CSV - You'll have to handle just one input type.
You won't have to mess with new versions of (Open) Office.
Handling plaintext is always more fun than other obscure formats.
Cons:
One that I can think of - finding a reliable converter from xls and odf to csv. Shouldn't be too hard - OpenOffice has a built in one.
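Once everything is CSV, reading it back takes very little code. Below is a minimal sketch of a line parser in plain Java (which can be called from Groovy as well). The class name is invented, and it assumes quoted fields with doubled quotes as escapes and no line breaks inside a field; a dedicated CSV library is the safer choice for messy client data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal parser for a single CSV record.
class CsvLineParser {
    // Splits one record into fields, honouring double-quoted fields and "" escapes.
    // Assumes the record contains no embedded line breaks.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field, possibly empty
        return fields;
    }
}
```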
A couple things:
1) I agree that using a CSV format can simplify some of the development work. OpenCSV can help with processing CSV files. There are other good CSV parsers for Java out there. Just remember that anything that's available for Java can be used by Groovy due to Groovy's unparalleled integration with Java.
2) I know you said you wanted to avoid handling XML, but Groovy makes XML processing exceedingly simple.
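For reference, here is roughly what the "zipped XML" route looks like with nothing but the JDK, written in Java so it can be dropped into a Groovy project unchanged. The `OdsReader` class is made up for this sketch. It relies on the sheet data of an .ods package living in the `content.xml` entry, and it deliberately ignores repeated-cell attributes and nested tables, so treat it as a starting point rather than a complete reader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader: returns the text of every cell, row by row, across all sheets.
class OdsReader {
    // OpenDocument namespace for table elements.
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    static List<List<String>> readRows(InputStream ods) throws Exception {
        byte[] xml = extractContentXml(ods);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<List<String>> rows = new ArrayList<>();
        NodeList rowNodes = doc.getElementsByTagNameNS(TABLE_NS, "table-row");
        for (int r = 0; r < rowNodes.getLength(); r++) {
            Element rowElement = (Element) rowNodes.item(r);
            NodeList cells = rowElement.getElementsByTagNameNS(TABLE_NS, "table-cell");
            List<String> row = new ArrayList<>();
            for (int c = 0; c < cells.getLength(); c++) {
                row.add(cells.item(c).getTextContent());
            }
            rows.add(row);
        }
        return rows;
    }

    // Finds the content.xml entry inside the zip package and returns its bytes.
    static byte[] extractContentXml(InputStream ods) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(ods)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("content.xml")) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        throw new IOException("content.xml not found in package");
    }
}
```

This still exposes XML concepts, which is exactly what the question wanted to avoid, so a spreadsheet-level library remains the nicer option if one fits.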