How to load old Microsoft Office XML file (Excel) using Java - java

I'm not able to load an Excel file in the older Office XML format (think Office 2002 or 2003 version) into Java. I tried JXL and Apache's POI (version 3.7). POI doesn't work since it appears to want the newer Office .xlsx format.
Here's an example of the older Office XML format.
One can generate a similar XML file from MS Excel 2010 by saving the workbook in the "XML Spreadsheet 2003" format.
Are there any open-source Java libraries that will load the XMLSS format? Otherwise I have no choice but to write a custom parser: read the XML file, then interpret the cell tags to build out the cell matrix. In this XML format, empty cells within a row are skipped; the next cell with data is positioned by an index attribute that acts like a column offset, I assume to save space in the XML file.

The format is called SpreadsheetML (not to be confused with .xlsx, which is also XML-based). A library called Xelem can handle it:
import nl.fountain.xelem.excel.Workbook;
import nl.fountain.xelem.lex.ExcelReader;
//...
ExcelReader reader = new ExcelReader();
Workbook xlWorkbook = reader.getWorkbook("c:\\my\\spreadsheet.xml");
System.out.println(xlWorkbook.getSheetNames());

Copying Mark Beardsley's answer from POI team http://apache-poi.1045710.n5.nabble.com/How-to-convert-xml-to-xls-td2306602.html :
You have got an Office 2003 xml file there, not an OpenXML file; it is an early attempt by Microsoft to create an xml based file format for Excel and it is in that sense a 'valid' Office file format.
Sadly, POI cannot interpret this file at all and that is why you saw the exception when you tried to wrap it up in the InputStream and pass it to WorkbookFactory(s) constructor. You do however have a number of options;
You could use Excel itself and manually open and save each file you wish to convert, as you already have done.
If you have access to Visual Studio and can write Visual Basic or C# code then you could use a control that will allow you to control Excel programmatically. This way you could automate a file conversion process using Excel itself. Then once the file has been converted either to the binary or OpenXML formats, POI can be used to process it.
If you are running on a stand alone PC on which a copy of Excel is installed and using the Windows operating system, then you could use OLE to do something very similar from Java code. As above, POI can be used to process the file following the conversion.
If you have access to OpenOffice, it has a rather good API that is accessible from Java code. You could use it to convert between the file types for you - it is simply a matter of discovering the correct filter to use in this case. OpenOffice is good for all except the most complex files and you should be able to use POI to process the file following conversion. However, if you choose this route, it may be best to do all of the work using OpenOffice's UNO api.
Depending upon what you want to do with the file's contents, you could create your own parser using core java code and either the SAX or Xerces parsers (consider using xmlBeans (http://xmlbeans.apache.org/) ). If you simply open the original xml file using a simple text editor, you can see that the structure is not complex and, if all you wish to get at is the raw data it contains, this could be your best option.
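To illustrate the "write your own parser" option, here is a minimal sketch using only the SAX parser that ships with the JDK. It assumes the file uses the usual urn:schemas-microsoft-com:office:spreadsheet namespace and handles the ss:Index attribute on Cell elements by padding the skipped columns with empty strings. The class and method names are made up for this example, and it deliberately ignores ss:Index on Row elements, merged cells and data types.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class SpreadsheetMlReader {
    // Namespace of the SpreadsheetML elements and of the ss:-prefixed attributes.
    static final String SS = "urn:schemas-microsoft-com:office:spreadsheet";

    /** Returns every Row in document order, with skipped cells filled in as "". */
    static List<List<String>> readRows(InputStream xml)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        List<List<String>> rows = new ArrayList<>();
        factory.newSAXParser().parse(xml, new DefaultHandler() {
            private List<String> row;    // row currently being read
            private int column;          // 0-based column of the current cell
            private StringBuilder text;  // non-null only while inside <Data>

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (!SS.equals(uri)) {
                    return;
                }
                if (local.equals("Row")) {
                    row = new ArrayList<>();
                    rows.add(row);
                    column = 0;
                } else if (local.equals("Cell")) {
                    // ss:Index is 1-based and marks a jump over empty cells.
                    String index = atts.getValue(SS, "Index");
                    if (index != null) {
                        column = Integer.parseInt(index) - 1;
                    }
                    while (row.size() < column) {
                        row.add("");
                    }
                    row.add(""); // placeholder, overwritten if the cell has <Data>
                } else if (local.equals("Data")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (!SS.equals(uri)) {
                    return;
                }
                if (local.equals("Data")) {
                    row.set(column, text.toString());
                    text = null;
                } else if (local.equals("Cell")) {
                    column++;
                }
            }
        });
        return rows;
    }
}
```

Calling SpreadsheetMlReader.readRows(new FileInputStream("spreadsheet.xml")) (the path is a placeholder) gives one list per Row, and rows from all worksheets end up in the same list. That is good enough for pulling out raw data but not for anything that depends on sheet names or formatting.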

After a lot of pain I've found a solution to this. JODConverter uses the OpenOffice.org/LibreOffice API and can convert SpreadsheetML to whatever formats OpenOffice.org supports.

You might get some result using the OpenOffice API. If not directly you could probably convert to a 'supported' format.
Otherwise the schema for the Office 2003 'SpreadsheetML' isn't very complicated. I have successfully created an XSLT scenario to convert a resultset (database query) to a (simple yet effective) Excel 2003 document (XML format). The other way around should not be very hard to achieve.
Cheers,
Wim

The answer today was to ask the vendor to change their Excel file format to an Excel binary rather than the old Office XML. Doing so allowed me to use Apache POI 3.7 to read the file with no issues. I appreciate the answers, as I had no idea there was no direct support in the Java-based open source libraries for this old Office XML format. Now I know next time to check earlier to see what format the Excel files are in before committing to a timeline.

I had the same problem some time ago, ended up writing a SAX parser to read the XML file. I wrote a blog post about it here.
You can find the sample project to parse the file in Github.

Related

java use pdfbox from msoffice to pdf

Is it possible to convert from MS office file formats using Apache PDFBox (the documentation isn't clear about this, and the javadoc seems to indicate no such capability exists), or would I need to do some tedious conversions with Apache POI?
The reason I'm asking is the answer to this StackOverflow question:
https://stackoverflow.com/questions/10861227/convert-ms-office-to-pdf-in-java
I imagine I'll need to use Apache POI, but I wanted to clarify.
In order to do this conversion, you will need MS Office, or perhaps Google Drive. PDFBox does not convert from anything to PDF or vice versa -- it simply reads and writes PDF files. Apache POI will not do that type of conversion either -- it simply reads and writes MS Office files. Specifically, it does not render them. You could implement a rendering engine for each type of Office file yourself, but that would be a gargantuan task to say the least.
Take a look at https://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/.
One of the possible options it mentions is XWPFConverterPDFViaIText:
org.apache.poi.xwpf.converter.pdf provides the DOCX 2 Pdf converter
based on Apache POI XWPF and iText.
You can test this converter with the REST Converter service
http://xdocreport-converter.opensagres.cloudbees.net/

Parse XLSX files in Java without external libraries?

Quick question: I've been asked to create a couple of parsers for XLSX file formats. Pretty much everywhere I've read says to grab the POI libraries; however, the system I am working on is very touchy about bringing in external APIs, so I'd far rather do some extra legwork myself than go down that route.
So is it possible (without spending days coding) to parse an XLSX file via a SAXParser, or am I a mug if I don't use the POI libraries?
Cheers
* UPDATE *
Since extracting the XLSX file and having a better look at the archive, I believe I can now parse these files without spending days coding; I could probably extract the information within a few hours. However, I am only looking to extract the physical cell data and not any reference data on those values, i.e. the cell reference. I am also looking to extract the XLSX metadata. I'll provide a quick answer on how I did this when I am done, for future reference.
Without spending a few days coding, it's not possible. You'll have to write code for at least two or three days. It's just a zip file, but it contains a bunch of XML files and a manifest XML.
A standard xlsx file is not XML, so no, it's not possible.
Correction: Walter Laan is correct; the xlsx format is indeed a zip file full of XML files and should be relatively easy to parse.
Effectively I did this, but obviously tailored my java to read the specific xlsx XML structure.
To open the xlsx in Java, use the ZipEntry APIs and enumerate the entries so that you drill down through all the folder structures. Then follow the guide below to read the XML:
http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/
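For anyone wanting to see roughly what the ZIP-plus-SAX approach looks like with only JDK classes, here is a rough sketch. It is not the code from the answer above. It assumes the conventional part names that Excel writes (xl/sharedStrings.xml for the string table, and a worksheet part such as xl/worksheets/sheet1.xml passed in by the caller), the class name is invented, and it returns cell values only, without cell references. Inline strings, dates and formulas are not treated specially.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class XlsxCellDump {
    /** Loads every file inside the .xlsx ZIP container into memory, keyed by entry name. */
    static Map<String, byte[]> unzip(InputStream xlsx) throws IOException {
        Map<String, byte[]> parts = new HashMap<>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                if (!entry.isDirectory()) {
                    // readAllBytes (Java 9+) stops at the end of the current entry.
                    parts.put(entry.getName(), zip.readAllBytes());
                }
            }
        }
        return parts;
    }

    private static void parse(byte[] xml, DefaultHandler handler)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new ByteArrayInputStream(xml), handler);
    }

    /** Returns the cell values of one worksheet part in document order. */
    static List<String> cellValues(InputStream xlsx, String sheetEntry)
            throws IOException, SAXException, ParserConfigurationException {
        Map<String, byte[]> parts = unzip(xlsx);
        if (!parts.containsKey(sheetEntry)) {
            throw new IllegalArgumentException("No such entry: " + sheetEntry);
        }
        // Step 1: read the shared string table; each <si> is one string,
        // possibly split over several <t> elements.
        List<String> shared = new ArrayList<>();
        byte[] sharedXml = parts.get("xl/sharedStrings.xml"); // assumed location
        if (sharedXml != null) {
            parse(sharedXml, new DefaultHandler() {
                private StringBuilder item;
                private boolean inText;
                @Override
                public void startElement(String uri, String local, String qName, Attributes a) {
                    if (local.equals("si")) item = new StringBuilder();
                    else if (local.equals("t")) inText = true;
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    if (inText) item.append(ch, start, length);
                }
                @Override
                public void endElement(String uri, String local, String qName) {
                    if (local.equals("t")) inText = false;
                    else if (local.equals("si")) shared.add(item.toString());
                }
            });
        }
        // Step 2: read <c> cells; t="s" means <v> holds an index into the string table.
        List<String> cells = new ArrayList<>();
        parse(parts.get(sheetEntry), new DefaultHandler() {
            private String type;
            private StringBuilder value;
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                if (local.equals("c")) type = a.getValue("t");
                else if (local.equals("v")) value = new StringBuilder();
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                if (value != null) value.append(ch, start, length);
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                if (local.equals("v")) {
                    String raw = value.toString();
                    value = null;
                    cells.add("s".equals(type) ? shared.get(Integer.parseInt(raw)) : raw);
                }
            }
        });
        return cells;
    }
}
```

The map returned by unzip also contains the other parts of the package, so document properties (normally under docProps/) could be read the same way with another handler.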
Cheers

Generate PowerPoint 2007/2010 file using Java

Does anyone know of any API (commercial or open-source) that can generate/edit PowerPoint 2007/2010 presentations through Java. I have a template in the PowerPoint 2007/2010 format that I require to edit/update. So far I have been converting the .pptx file to xml and then editing and storing it back as .pptx. But the file gets corrupted while opening.
Is anyone aware of any other method or API that do this in Java?
We have done it programmatically (closed source at the moment, sorry) so might be able to help, but beware of a few gotchas.
One is that the POI project (at least when we looked at it last year) was quite incomplete. It didn't do PPTX charts, which is the one feature we wanted. In fact, the POI site may not be up to date, but they don't appear to support the PowerPoint 2007 format (http://poi.apache.org/slideshow/index.html). Everybody recommends this project, but our evaluation was that it was pretty much useless for generating PowerPoint 2007 files via Java. Your mileage may vary.
Aspose also had some significant limitations when we looked at it; not doing charts in PowerPoint 2007 was the blocking issue for us.
Another issue is that PowerPoint 2007 can be quite buggy. We have had a number of programmatically produced PPT files that caused lock-ups, but when testing, we found that we could reproduce crashes and lock-ups with simple PPTX documents created in PowerPoint 2007 itself, i.e. not by our code.
In the end, we did the following: unpacked a 'template' PowerPoint file to a folder, then on demand filled the template XML with new values, zipped it up, renaming various elements, and delivered it to the user as a valid PPTX. Works OK, other than the odd PowerPoint crash when people edit the file. If there was a market for it, I guess we could package up the code as a web service (i.e. xml/csv -> PPTX) or put together a commercial package, but we wouldn't do it for free.
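The template approach described above can be sketched with the JDK's zip classes alone. This is not the poster's closed-source code, only an illustration under some assumptions: the template marks values with a made-up ${name} placeholder syntax, slides live under ppt/slides/ as UTF-8 XML, and each placeholder sits inside a single text run (PowerPoint sometimes splits typed text across runs, in which case a plain string replace will not find it). The class name is invented.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of any library.
class PptxTemplateFiller {
    /** Copies the template package to out, replacing ${key} placeholders in slide XML. */
    static void fill(InputStream template, OutputStream out, Map<String, String> values)
            throws IOException {
        try (ZipInputStream in = new ZipInputStream(template);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (ZipEntry entry; (entry = in.getNextEntry()) != null; ) {
                String name = entry.getName();
                byte[] data = in.readAllBytes(); // Java 9+; reads the current entry only
                if (name.startsWith("ppt/slides/") && name.endsWith(".xml")) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    for (Map.Entry<String, String> value : values.entrySet()) {
                        xml = xml.replace("${" + value.getKey() + "}", escape(value.getValue()));
                    }
                    data = xml.getBytes(StandardCharsets.UTF_8);
                }
                // A fresh ZipEntry lets ZipOutputStream recompute size and CRC
                // for the modified content; entries keep their names and order.
                zip.putNextEntry(new ZipEntry(name));
                zip.write(data);
                zip.closeEntry();
            }
        }
    }

    /** Escapes the characters that would otherwise break the slide XML. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Writing the entries straight back into a zip stream keeps the package layout of the template, so files such as [Content_Types].xml stay at the root of the archive. Note that closing the ZipOutputStream also closes the stream passed in as out.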
docx4j (apache license) now includes a pptx4j component, which can open/edit/save pptx documents.
Yes. Check this out http://poi.apache.org/, they just released version 3.6 which now supports Office 2007 format documents. The best part is that it's free!
To generate a PowerPoint presentation from a template file, you can use PPT Templates.
This library provides a fluent API to replace variables inside the PPT template:
try (FileOutputStream out = new FileOutputStream("generated.pptx")) {
    new PptMapper()
        .text("variable", "Hello")
        .text("other_variable", "World!")
        .processTemplate(PptTemplateDemo.class.getResourceAsStream("/title.pptx"))
        .write(out);
}
With this library, you can process text and images in the template.
Another solution that may work for you is Windward Reports (disclaimer, I'm the founder & CEO there). It uses PPTX as one of the supported template formats and merges in data to then generate a PPTX (or PDF, etc.) output.
If the edit/update you need can be handled via the data tags in Windward, this should be trivial for you. If what you need cannot be handled by the tags, then this won't work for you.
Well, as mentioned by GrantB, the best way is to create a template, load it, traverse the XML graph, update the data, and stream the result out to an output ppt. We recently did this to generate reports for clients that had complex visuals and charts in ppt. You can have a look here: generate ppt in java

Convert from Excel xlsx to xls in Java

I have an Excel 2007 xlsx file that I would like to programmatically convert to an .xls file. The xlsx file is an export from a reporting tool, and I would like to convert it to xls for better compatibility with the software stack of my application users. The xlsx is as plain as it gets. Just rows with data and basic type information (int/date/string). No formulas.
My platform is Java, and I do not have Microsoft Office installed. I'm looking for a solution that will allow me to convert between the formats with the least amount of effort. I.e. I'd like to avoid having to write a custom "copy application" that would read the xlsx file and copy the rows and formatting to another file. Preferably, the solution is open source and/or free.
I have looked at POI, and as far as I could tell, it can read and write both xls and xlsx files. But I was not able to tell by browsing the documentation and examples if it could read one format and write out in the other. Before I dig in any deeper, I would like to check if any of you out there have done anything like this before in Java, and if you have any tips.
Converting with POI would be a tedious task. I would like to point you to JODConverter. JODConverter uses OpenOffice to do the actual conversion, so it should work fine for that task.
However, that being said, I have not used JODConverter myself.

Spreadsheet Parser in Java/Groovy

Hi I'm looking to parse spreadsheets (xls/ods) in Groovy. I have been using the Roo library for Ruby and was looking to try the same tasks in Groovy, as Java is already installed on a development server I use, and I would like to keep the number of technologies on the server to a simple core few.
I am aware that the ods format is zipped XML, and so can be parsed as such, but I would like to process the file using spreadsheet concepts, not XML concepts.
The ability to process xls files is not of major importance, but would save me having to save multiple xls files to ods (as this is for parsing data from clients).
Thanks
I would suggest Apache POI for access to .xls files.
I've never had to work with the .ods format, so no information on that one.
There's also JExcelAPI, which has a nice, clean, simple interface (for the most part).
Can't help you with ODS Files though.
How about looking at 'odftoolkit' ? http://odftoolkit.openoffice.org/
Groovy in Action has a chapter named "Groovy on Windows" that discusses using Scriptom, a Groovy/COM bridge (using JACOB under the covers), to access several Windows apps including Excel.
For OpenOffice, you can use ODF Toolkit, as Amit pointed out.
I second jdmichal's vote for Apache POI. I have selected it as our library of choice to handle Excel file input (.XLS). The project is also working on the .XLSX file format if you ever decide you want to support that. Based on your specifications, I don't think you want to get into converting things into CSV and it seems like you have established input and output paths. For anyone who hasn't had the joy of dealing with CSV to Excel conversion, it can get a bit dicey. I have spent hours dealing with issues created by Excel converting string data to numeric data. You can see other testimonies to this effect on the POI Case Studies page. Beyond these issues, I simply don't want to personally have to handle these inputs. I'd rather invest the programming effort and streamline the workflow for the future.
I too have not dealt with ODF and have no plans to support it in my current project. You might want to check out the OpenOffice.org ODF Toolkit Project.
Good luck and have fun,
- D.
I suggest you take a look at SimpleXlsBuilder and SimpleXlsSlurper; both are based on Apache POI and can fit your basic needs for reading from and writing to Excel 97 spreadsheets in a concise way.
If your spreadsheets are simple enough (without charts and other embedded contents), you should simply convert the spreadsheet to CSV.
Pros:
Both xls and ods will produce the same CSV - You'll have to handle just one input type.
You won't have to mess with new versions of (Open) Office.
Handling plaintext is always more fun than other obscure formats.
Cons:
One that I can think of - finding a reliable converter from xls and odf to csv. Shouldn't be too hard - OpenOffice has a built in one.
A couple things:
1) I agree that using a CSV format can simplify some of the development work. OpenCSV can help with processing CSV files. There are other good CSV parsers for Java out there. Just remember that anything that's available for Java can be used by Groovy due to Groovy's unparalleled integration with Java.
2) I know you said you wanted to avoid handling XML, but Groovy makes XML processing exceedingly simple.
