Map Excel workbook with multiple sheets to XSD - java

I have an Excel workbook with multiple sheets. Each sheet holds a table, the different tables have different formats.
I need to read the entire workbook into my Java program. The most convenient method IMHO is to export the entire data into a single XML and parse it (using simpleXML or some other compatible parser).
I have found no method for applying a schema to multiple sheets of a workbook, only to a single sheet. Is it possible? If so, how?

When it comes to convenience, there are many factors that influence or define it. For example, it depends if this is an ongoing thing, or if it needs to be integrated into a process, etc.
Before recommending a solution as suggested, I would try to convince you to take a look at Apache's POI (the Java API for Microsoft Documents), specifically the Excel API. It gives you a Java API for your Java program that should allow you to read what you need pretty easily. It would be a one stop shop kind of thing.
Another approach might be to go through a JDBC-to-ODBC bridge and access the Excel file via the JDBC API. I can't tell from the details in your question whether your deployment model would allow for this (e.g. you may run on a platform that doesn't have an ODBC provider for Excel files), but on Windows it is certainly an option, and many places on the internet detail this approach.
If you insist on going down the XML export way, QTAssistant (I am associated with it) has a comprehensive solution (XML Builder) for generating XML from any supported relational data source. It provides a GUI and a command line. In your case it would need the XLS, an XSD which describes the XML you want to get out and a mapping file (basically another XML file) to create the XML you need. In general this feature is largely used to convert test data into XML for Web service calls, so it is geared towards a certain interaction pattern between the user, the tool, and the XML generation activities. If you're interested in more details, let me know.

Related

Saving Data from a JavaFX-Application without Database

Unfortunately I couldn't find anything specific to this topic / to my problem. Here we go:
I'm building a JavaFX business application for a friend of mine. Unfortunately I have no way to connect to a database. I want the application to load its saved state from a file. The application contains a list of clients, and each client has some specific properties. I do not want to hardcode this into a .prop or .txt file, because I'm sure that there's a different way of doing this, isn't there?
Thanks in advance, appreciate it!
Lots of choices for persisting data to local storage. The exact choice depends on your needs. You do not describe enough details to make a specific recommendation.
Here is a list of possibilities, roughly in increasing order of complexity of your data.
Text file
If you have small amounts of simple data, save to a text file. You can store each piece in a separate file, or combine into a single file. Recent versions of Java have new classes to make this easier than ever. See Oracle Tutorial.
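As a rough illustration of the text-file option, here is a minimal sketch using java.nio.file.Files from the JDK. The class name ClientTextFile, the file name and the one-client-per-line layout are assumptions made for the example, not anything taken from the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: stores one client name per line in a UTF-8 text file.
class ClientTextFile {

    // Writes all lines, replacing any existing file content.
    static void save(Path file, List<String> lines) {
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back as a list of lines.
    static List<String> load(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The temp directory is used here only to keep the example self-contained.
        Path file = Path.of(System.getProperty("java.io.tmpdir"), "clients.txt");
        save(file, List.of("Alice Example", "Bob Example"));
        System.out.println(load(file));
    }
}
```

This only fits data that is naturally one value per line. Once a client has several properties, one of the structured options below is probably a better match.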
Comma-separated & tab-delimited
For sets of structured data, write to text files in comma-separated values (CSV) or tab-delimited values. For example a list of people with rows for each person, and columns for name, phone number, and email address.
While reading/writing such files is easy enough to program yourself, I suggest using an established library to eliminate the drudgery, avoid bugs, and save yourself some time. There are a few such libraries written in Java.
My favorite is the Apache Commons CSV project. This library makes easy work of the chore of reading/writing such files. Despite the name, this library supports tab-delimited as well as comma-separated formats. I've written a few Answers here on Stack Overflow showing how to use this library.
By the way, plain old ASCII defines a few character positions explicitly for delimiting in data files, with four levels of grouping (document, group, record/row, and field). Unicode, of course, inherits these from ASCII as code points. I am puzzled why these have remained so obscure and so infrequently used. Seems much more logical to me than using commas and tabs which may well exist inside the data payload.
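To make the ASCII delimiter idea concrete, here is a small sketch that joins fields with the unit separator (U+001F) and rows with the record separator (U+001E). The class and method names are made up for illustration, and the sketch assumes the data itself never contains those two control characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical encoder/decoder using ASCII control characters as delimiters.
class AsciiDelimited {
    static final String UNIT_SEPARATOR = "\u001F";   // separates fields within a row
    static final String RECORD_SEPARATOR = "\u001E"; // separates rows

    // Joins fields and rows without any quoting or escaping.
    static String encode(List<List<String>> rows) {
        return rows.stream()
                .map(fields -> String.join(UNIT_SEPARATOR, fields))
                .collect(Collectors.joining(RECORD_SEPARATOR));
    }

    // Splits the text back into rows and fields; the -1 limit keeps trailing empty fields.
    static List<List<String>> decode(String data) {
        List<List<String>> rows = new ArrayList<>();
        for (String record : data.split(RECORD_SEPARATOR, -1)) {
            rows.add(Arrays.asList(record.split(UNIT_SEPARATOR, -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Commas and tabs inside the values need no special treatment.
        List<List<String>> rows = List.of(
                List.of("Alice, Jr.", "555\t0100", "alice@example.com"),
                List.of("Bob", "", ""));
        System.out.println(decode(encode(rows)).equals(rows));
    }
}
```

Because commas and tabs carry no special meaning here, no quoting rules are needed. The trade-off is that the resulting file is awkward to view or edit by hand.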
Serialization
You can write out the data values stored within an object. This is called serialization. Java has a serialization facility built-in, but be sure to study up on the details.
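For reference, a minimal sketch of the built-in facility might look like the following. The SavedClient class and its fields are hypothetical stand-ins for whatever the application actually stores, and the sketch leaves out the versioning and security considerations you would want to read up on first.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical store that persists a list of clients with Java serialization.
class ClientStore {

    // Hypothetical data class; any class to be written must implement Serializable.
    static class SavedClient implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String email;

        SavedClient(String name, String email) {
            this.name = name;
            this.email = email;
        }
    }

    // Writes the whole list as one object graph.
    static void save(Path file, List<SavedClient> clients) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new ArrayList<>(clients)); // ArrayList is itself Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the list back; only use this on files the application wrote itself.
    @SuppressWarnings("unchecked")
    static List<SavedClient> load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<SavedClient>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Path.of(System.getProperty("java.io.tmpdir"), "clients.ser");
        save(file, List.of(new SavedClient("Alice Example", "alice@example.com")));
        System.out.println(load(file).get(0).name);
    }
}
```

The output is a binary file tied to the class definition, so renaming or restructuring SavedClient later can make old files unreadable.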
To more simply write out an object’s values and later read them back in to reconstitute an object, I have enjoyed using the Simple XML Serialization project. This works well for relatively simple needs, and is aimed at the situation where you want the structure of a class to drive the process of determining what to write.
Java has other XML binding facilities both built-in and third-party. These are much more powerful in their flexibility. They are especially good for when you want to define and verify the XML structure in a rigid fashion such as defining a XML DTD or XML Schema against which to validate the data and perhaps even generate the Java class in which to represent the data.
Embedded database
For more complicated data, use an embedded relational database.
The SQLite database is bundled with many platforms. This is a C-based library, not pure Java. As the name indicates, SQLite is indeed quite "lite", lacking rigid data types and many other common database features. SQLite is meant more as an alternative to writing text files than as a competitor to more serious databases. It is a great product if your needs fit the sweet-spot of its capabilities.
My first choice for an embedded database would be H2 Database Engine. Built in pure Java. Can be run inside your app, or separately as a server (your choice). Has sophisticated relational database features. Has been around for years, often updated, and is well-worn. The principal author has much experience in the field.

VBA connection to Java

I have an Excel with some macros. The data is currently sourced manually. In order to automate the report, I need to source the data directly from Oracle database. Unfortunately, this cannot be done, as it is a production database and passwords cannot be shared with anyone.
The next best possible approach is to connect via the Java layer. How can I connect VBA with a Java service?
Any conceptual starting points will also be appreciated.
There is a very nice API from Apache called POI for processing Microsoft documents. http://poi.apache.org/
The other approach is to use OLEDB driver for Excel which will allow you to read data from Excel exactly as you will do from any database using JDBC.
Interop between different technologies like this is commonly achieved with a combination of HTTP and XML.
It's a long time since I saw this done, so the technologies might be out of date, but you can create an ADO record set from XML.
Excel can make an HTTP call to a Java server that returns the XML. This XML can then be used to create a record set for Excel to consume, just as if that record set were obtained directly from the database.
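As a conceptual starting point for the Java side, here is a minimal sketch built on the HttpServer class that ships with the JDK. The /report path, the port, the element names and the hard-coded rows are all assumptions for illustration. A real service would fetch the rows from Oracle via JDBC, and the XML shape shown here is not ADO's own persisted record set format, so the VBA side would have to parse it or the server would have to emit whatever format the client expects.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical service that exposes report rows as XML over HTTP.
class ReportXmlServer {

    // Escapes the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Each row is {id, customer}; the element names are made up for this sketch.
    static String toXml(List<String[]> rows) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?><rows>");
        for (String[] row : rows) {
            xml.append("<row><id>").append(escapeXml(row[0])).append("</id><customer>")
               .append(escapeXml(row[1])).append("</customer></row>");
        }
        return xml.append("</rows>").toString();
    }

    // Starts an HTTP server that answers GET /report with the XML document.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/report", exchange -> {
            // Sample data; a real service would query the database here.
            List<String[]> rows = List.of(
                    new String[] {"1", "Acme"},
                    new String[] {"2", "Smith & Sons"});
            byte[] body = toXml(rows).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/xml; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
        System.out.println("Serving XML at http://localhost:8080/report");
    }
}
```

With this arrangement the database password stays on the server, and the workbook only needs to know the service URL.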

Invoke HSSF Serializer Invocation

I have to write a very large XLS file, I have tried Apache POI but it simply takes up too much memory for me to use.
I had a quick look through StackOverflow and I noticed some references to the Cocoon project and, specifically the HSSFSerializer. It seems that this is a more memory-efficient way to write XLS files to disk (from what I've read, please correct me if I'm wrong!).
I'm interested in the use case described here: http://cocoon.apache.org/2.1/userdocs/xls-serializer.html . I've already written the code to write out the file in the Gnumeric format, but I can't seem to find how to invoke the HSSFSerializer to convert it to XLS.
On further reading it seems like the Cocoon project is a web framework of sorts. I may very well be barking up the wrong tree, but:
Could you provide an example of reading in a file, running the HSSFSerializer on it and writing that output to another file? It's not clear how to do so from the documentation.
My friend, the HSSF serializer is part of POI. You are just setting certain attributes in the XML to be serialized (but you need a whole process to create it). Also, setting up a whole pipeline using this framework just to create an XLS seems odd, as it changes the app's architecture. Is that your decision?
From the docs:
An alternate way of generating a spreadsheet is via the Cocoon
serializer (yet you'll still be using HSSF indirectly). With Cocoon
you can serialize any XML datasource (which might be a ESQL page
outputting in SQL for instance) by simply applying the stylesheet and
designating the serializer.
If memory is an issue, try SXSSF in POI, the streaming extension of XSSF. Note that it writes .xlsx files rather than .xls.
I don't know if by "XLS" you mean a specific, prior to Office 2007, version of this "Horrible SpreadSheet Format" (which is what HSSF stands for), or just anything you can open with a recent version of MS Office, OpenOffice, ...
So depending on your client requirements (i.e. those that will open your Excel file), another option might be available : generating a .XLSX file.
It comes down to producing an XML file in the proper grammar, which seems to fit your situation, as you seem to have already done that with the Gnumeric XML-based file format without technical trouble and without hitting memory-efficiency issues.
Please note other XML-based spreadsheet formats exist, that Excel and other clients would be able to use. You might want to dig into the open document file formats.
As to whether to use Apache Cocoon or something else:
Cocoon can certainly host the XSL processing; batch processing (the Cocoon CLI) is available if you require Cocoon but need it not to run as a webapp (though as far as I remember, the CLI feature was broken in the latest builds of the 2.1 series); and Cocoon comes with a load of features and technologies that could address further requirements.
Cocoon might be overkill if it just comes down to running an XSL transformation, for which there is a bunch of well-known, lighter tools you can pick from.
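One of those lighter options is the javax.xml.transform API that ships with the JDK. The sketch below runs a stylesheet over an XML string. The XsltRunner class, the sample document and the stylesheet are invented for illustration and have nothing to do with the Gnumeric or XLSX grammars, which would need a real stylesheet of their own.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper that applies an XSLT 1.0 stylesheet without any framework.
class XsltRunner {

    // Applies the stylesheet to the XML and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up input document and stylesheet, just to show the mechanics.
        String xml = "<rows><row name=\"a\" value=\"1\"/><row name=\"b\" value=\"2\"/></rows>";
        String xslt =
                "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
              + "<xsl:output method=\"text\"/>"
              + "<xsl:template match=\"/rows\">"
              + "<xsl:for-each select=\"row\">"
              + "<xsl:value-of select=\"@name\"/>=<xsl:value-of select=\"@value\"/>;"
              + "</xsl:for-each>"
              + "</xsl:template>"
              + "</xsl:stylesheet>";
        System.out.println(transform(xml, xslt));
    }
}
```

For large inputs you would pass file-based StreamSource and StreamResult objects instead of strings, so the result does not have to be held in memory as one string.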

Spreadsheet Parser in Java/Groovy

Hi I'm looking to parse spreadsheets (xls/ods) in Groovy. I have been using the Roo library for Ruby and was looking to try the same tasks in Groovy, as Java is already installed on a development server I use, and I would like to keep the number of technologies on the server to a simple core few.
I am aware that the ods format is zipped XML, and so can be parsed as such, but I would like to process the file using spreadsheet concepts, not XML concepts.
The ability to process xls files is not of major importance, but would save me having to save multiple xls files to ods (as this is for parsing data from clients).
Thanks
I would suggest Apache POI for access to .xls files.
I've never had to work with the .ods format, so no information on that one.
There's also JExcelAPI, which has a nice, clean, simple interface (for the most part).
Can't help you with ODS Files though.
How about looking at 'odftoolkit' ? http://odftoolkit.openoffice.org/
Groovy in Action has a chapter named "Groovy on Windows" that discusses using Scriptom, a Groovy/COM bridge (using JACOB under the covers), to access several Windows apps including Excel.
For OpenOffice, you can use ODF Toolkit, as Amit pointed out.
I second jdmichal's vote for Apache POI. I have selected it as our library of choice to handle Excel file input (.XLS). The project is also working on the .XLSX file format if you ever decide you want to support that. Based on your specifications, I don't think you want to get into converting things into CSV and it seems like you have established input and output paths. For anyone who hasn't had the joy of dealing with CSV to Excel conversion, it can get a bit dicey. I have spent hours dealing with issues created by Excel converting string data to numeric data. You can see other testimonies to this effect on the POI Case Studies page. Beyond these issues, I simply don't want to personally have to handle these inputs. I'd rather invest the programming effort and streamline the workflow for the future.
I too have not dealt with ODF and have no plans to support it in my current project. You might want to check out the OpenOffice.org ODF Toolkit Project.
Good luck and have fun,
- D.
I suggest you take a look at SimpleXlsBuilder and SimpleXlsSlurper. Both are based on Apache POI and can fit your basic needs for reading from and writing to Excel 97 spreadsheets in a concise way.
If your spreadsheets are simple enough (without charts and other embedded content), you could simply convert the spreadsheet to CSV.
Pros:
Both xls and ods will produce the same CSV - You'll have to handle just one input type.
You won't have to mess with new versions of (Open) Office.
Handling plaintext is always more fun than other obscure formats.
Cons:
One that I can think of - finding a reliable converter from xls and odf to csv. Shouldn't be too hard - OpenOffice has a built in one.
A couple things:
1) I agree that using a CSV format can simplify some of the development work. OpenCSV can help with processing CSV files. There are other good CSV parsers for Java out there. Just remember that anything that's available for Java can be used by Groovy due to Groovy's unparalleled integration with Java.
2) I know you said you wanted to avoid handling XML, but Groovy makes XML processing exceedingly simple.

Generate Dynamic Excel from Java

We have a pre-defined Excel document structure with lots of formulas and macros.
During download of the Excel file, our Java application populates certain cells with data. When the user opens the file after download, the macros and formulas embedded in it read the pre-populated data and behave accordingly.
We are currently using ExtenXLS to generate the dynamic Excel document from Java. The licence is CPU-based and does not cover boxes with dual-core CPUs, so we are forced to buy more licences.
Is there a better tool we can look at, one that is either free or has minimal product and support costs (support is a must) and a simple licence?
I quite liked using the Apache POI Project HSSF library (http://poi.apache.org/) - it was fairly easy to use. I didn't use it in that much depth, but it seemed fairly powerful. Also, there's JExcelAPI (http://sourceforge.net/projects/jexcelapi/) which I've not used.
If your users will have a recent version of Excel, it isn't too hard to tweak the XML file format by hand. Just save an existing document as XML, and find the places you want to replace.
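A rough sketch of that find-and-replace idea follows. It assumes you have edited the saved XML by hand to put markers such as ${customer} where the data should go. The marker syntax, the XmlTemplateFiller class and the sample fragment are my own conventions for the example, not anything Excel defines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that fills ${name} markers in an XML template.
class XmlTemplateFiller {

    // Escapes the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Replaces every ${key} marker with the escaped value for that key.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", escapeXml(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // A fragment in the style of a saved XML spreadsheet, with hand-placed markers.
        String template = "<Row><Cell><Data ss:Type=\"String\">${customer}</Data></Cell>"
                + "<Cell><Data ss:Type=\"Number\">${amount}</Data></Cell></Row>";
        Map<String, String> values = new LinkedHashMap<>();
        values.put("customer", "Smith & Sons");
        values.put("amount", "42.50");
        System.out.println(fill(template, values));
    }
}
```

Escaping the values matters because an unescaped & or < in the data would make the file unreadable. It is also worth checking that the XML format you save to preserves the macros your workbook relies on.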
I work on an open source project called XLLoop - this framework allows you to expose POJO functions as Excel functions.
So, instead of populating the excel sheet with data you could create a function that downloaded the data and have it populate in place.
