Programmatically convert complex Excel file into HTML format - java

I have a set of complex Excel files (with figures in them) that I want to show in a web browser, so I need to convert them into HTML pages first. Since the Excel files are very complex, I cannot just parse them and generate an HTML table with HTML tags. The current manual solution that works fine is to use Microsoft Excel to save the spreadsheet as an HTML page. I want to automate this task programmatically through Java. Is there any existing solution or a way to do it? Thanks.
EDIT - I was able to create a macro for it, but could not figure out how to execute a macro on an Excel file from a Java program. Does somebody know?

If OpenOffice does a good job of the export then you could take a look at the source to see how it does it. OO is a combination of Java and C++, I believe, so you might get lucky and find a Java solution.
Otherwise, I would try to use Excel itself to do the export and find some way of calling it programmatically. If you go down this path you'd be better off using a Microsoft stack (C# would be the most similar to Java), as I would expect it to have all the functions you need already defined.

You might look into POI:
http://poi.apache.org/

I think your best bet is to call Excel from Java using JACOB.
Creating direct COM calls (which is what you'll be doing from JACOB) is a bit tough, but you'll get the hang of it. I can't imagine that the Excel VBA macro is horribly complicated. Take a look at the sample code (Usage and Documentation) in the JACOB link for what this will look like.
One other thing: Remember to explicitly clear references. JACOB will release COM handles when objects are garbage collected, but if you are doing any sort of high performance work, you will want to close those connections as quickly as possible. We generally write all of our COM code in a series of try/finally statements - the code is messy, but robust.

Try using hypernumbers. (Disclaimer, I'm the CEO)

I ended up using the Scribd API. I uploaded the document to their server through their API in realtime and pasted an iframe with a link in it which is returned by Scribd.

Related

Apache POI and EXCEL

I'm using the Apache POI API to access an Excel .xlsx file; with it I can read and write cells.
My problem is: How can I do that with the .xlsx file opened in Excel GUI?
If I try to do that I have conflict arising from concurrent access to the same resource (The process cannot access the file because it is being used by another process).
I have been told that the answer is Excel RTD with C#, C++ or other languages.
But I want to stick with Java. What could I do? Is switching to Linux an option?
THANKS!!!
AFAIK poi only works on the file system, so there is no interaction through Real-Time Data. I think you should not edit the xlsx file while it is still open in excel if you want to prevent corruption.
If you want to use RTD, you should try to find java bindings for that. I think they are COM based, so maybe JACOB can help you. http://sourceforge.net/projects/jacob-project/
See also this discussion: http://sourceforge.net/p/jacob-project/discussion/375946/thread/946012e8/
Oh. Btw. COM is Windows only, so I would stay on Windows :)
Accessing and modifying a resource from two separate entities at the same time does not imply that you'll end up with a synchronized version at both ends. On the contrary, if you do manage to do so, you are likely to end up with an incorrect/bogus/corrupted result. Translated into Java, you may think of it as multiple threads altering a variable in an unsynchronized way.
Some programs (notepad++, idea, eclipse on editor reactivation, etc) have implemented additional mechanisms which will detect if a file has been modified on the file-system outside the program itself, and provide you with options such as: reload file, ignore modifications, merge, etc, and others simply ignore these changes overwriting the file.
My guess is you'd have to do a similar thing or rethink your scenario about updating the files and triggering notifications.
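The threading analogy above can be made concrete with a small sketch. The class and method names (`SharedCounter`, `runWorkers`) are made up for illustration, and this says nothing about how Excel itself locks files; it only shows that uncoordinated writers can lose updates while coordinated ones do not.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: several writers updating one shared value.
class SharedCounter {
    private int unsafeCount = 0;                                  // plain field, no coordination
    private final AtomicInteger safeCount = new AtomicInteger();  // atomic updates

    void incrementUnsafe() { unsafeCount++; }            // read-modify-write, can lose updates
    void incrementSafe() { safeCount.incrementAndGet(); }

    int unsafe() { return unsafeCount; }
    int safe() { return safeCount.get(); }

    // Starts `threads` workers that each increment both counters `perThread` times,
    // then waits for all of them to finish.
    static SharedCounter runWorkers(int threads, int perThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementUnsafe();
                    counter.incrementSafe();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        SharedCounter counter = runWorkers(4, 100_000);
        System.out.println("safe   = " + counter.safe());    // always threads * perThread
        System.out.println("unsafe = " + counter.unsafe());  // may be lower, varies per run
    }
}
```

The unsynchronized total can come out short because two threads may read the same old value before either writes back, which is the same kind of lost update you risk when two programs write one file.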
As the other users said, there is no way to do this from poi. Options:
Your best option is RTD (you write a thin RTD "server" in C#, install it in the registry, and talk to it from java, e.g. via some socket; within excel, users just enter RTD formulas in their cells, for which excel calls your rtd server to get the latest data).
You can also write the data directly to excel using COM (there are also java libraries to do this, such as teamdev's jexcel, or you could write your own com wrappers).
You can write your own excel plugin.
Finally, there are lower level solutions which I've heard talk of but don't understand.

writing large excel spreadsheets

Has anybody found a library that works well with large spreadsheets?
I've tried Apache's POI but it fails miserably working with large files, both reading and writing. It uses massive amounts of memory, leaving you needing a supercomputer to parse or create a 20+ MB spreadsheet.
Surely there is a more memory efficient way and someone has written it?!
@pstanton,
I was working on a similar solution and was able to write large excel 2007 files with hundreds of rows exported from database. Here is the link to it:
http://vikramvkamath.blogspot.com/2010/07/writing-large-excel-files-excel-2007.html
My solution is an extension of Yegor Koslov's SheetWriter class (follow this http link), and it works very well for me.
Let me know in case you face any issues.
~Vikram
I cannot really recommend a library to you. But when you need the best performance, it might be worth a try to go to the people who came up with Excel in the first place. I guess the APIs that are available from .NET are much more efficient in handling Excel files. So the idea would be to implement a web service or similar component in .NET that does most of the Excel-related grunt work for you and just invoke that from Java.
This is basically the same idea as Jannik's, but you use the Java COM Bridge to access the Excel APIs directly from Java. We have had good success doing this with Word. Obvious downside is that it only works on Windows.
Have you tried JExcelAPI as an alternative to POI? I confess I can't comment on its memory efficiency.
At the time of posting, there is no pure Java scalable solution for reading and writing large Excel files.
Maybe the CSV file format can help you. You just need to separate each value with a comma and save the file with a .csv extension.
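As a rough sketch of the CSV suggestion: the one catch with "separate each value by comma" is that values containing commas, quotes, or line breaks need quoting. The class below is a hypothetical helper (`CsvRows` is not from any library) that quotes fields in the usual RFC 4180 style and writes one row at a time, so memory use should not grow with the number of rows.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

// Hypothetical helper for streaming rows out as CSV using only the JDK.
class CsvRows {
    // Quotes a field if it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped fields of one row with commas.
    static String toLine(List<String> fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(escape(fields.get(i)));
        }
        return line.toString();
    }

    // Writes each row as soon as it is produced instead of building the file in memory.
    static void writeAll(Iterable<List<String>> rows, Writer out) {
        try {
            for (List<String> row : rows) {
                out.write(toLine(row));
                out.write("\r\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In practice you would pass a `BufferedWriter` over a file and feed rows straight from the database cursor. Formatting, formulas, and multiple sheets are of course lost in CSV.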

Generating excel documents programmatically

Has anyone used a Java based library for generating excel documents? Preferably support for 2003?
I'm currently working with Apache POI ( http://poi.apache.org/index.html ), which is very comprehensive. The 2003 file format version is still in beta, but seems to work well enough. I'm not exercising its power very much, just straightforward reads and writes of Excel, but it seems reliable.
Whenever I have to do this I ask myself if one big html table would be enough.
Much of the time it is. You can simply write HTML tags and save the result as a .xls file. Excel will open it correctly.
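A minimal sketch of the HTML-table approach, using only the JDK. The class name `HtmlTableExport` is invented for this example; the only real work is escaping cell text so that it cannot break the markup.

```java
import java.util.List;

// Hypothetical helper: renders rows as one plain HTML table.
class HtmlTableExport {
    // Escapes the characters that would otherwise be read as markup.
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    // Builds a complete HTML document containing a single table.
    static String toHtmlTable(List<List<String>> rows) {
        StringBuilder html = new StringBuilder("<html><body><table>\n");
        for (List<String> row : rows) {
            html.append("<tr>");
            for (String cell : row) {
                html.append("<td>").append(escapeHtml(cell)).append("</td>");
            }
            html.append("</tr>\n");
        }
        return html.append("</table></body></html>\n").toString();
    }
}
```

Writing the returned string to a file such as `report.xls` (the name is just a placeholder) gives something Excel can open. Newer Excel versions may warn that the file format does not match the extension before opening it.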
If you don't need fancy headings then just output CSV.
JExcelApi
I've used it personally for a report that is currently in production. It's a fairly decent library with sufficient docs, and it's open source.
It works very well, but has a few gotchas you should be aware of. None of them are deal breakers, just dictate how a few things should be done. Just be sure to read the FAQ. It will explain them and tell you how to avoid them.
A formatted HTML table will import correctly, but it would be better to use the Excel XML format from the Excel 2003 XML Toolbox for more advanced needs (multiple worksheets, formulas, etc).
You can also try SmartXLS for Java.
It has more functions than POI and JExcelApi, and it is a commercial product.
http://www.smartxls.com/indexj.htm
I do it with Jacob as a general Java and COM solution. However, in my reading Jacob does not handle a large number of calls (say millions) very well, and I needed to patch it a bit. These patches were not accepted by the Jacob maintainer.
Anyway, Jacob is open source (LGPL), and after patching it I have had a production environment running for years now.
After connecting to Excel with COM, you use the standard Excel API to process documents. First try out how it works with VBS (VBScript Language Reference), then implement it in Java.
You can generate an Excel file with a VBS and then call the script from Java like this:
String script = "your_VBS_Name.vbs";
// A .vbs file is not an executable, so run it through the Windows Script Host
String cmd = "cscript //NoLogo D:\\YourPath\\" + script;
Runtime.getRuntime().exec(cmd);
Creating the script is really simple.
Open Notepad and follow this example:
Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = True
Set objWorkbook = objExcel.Workbooks.Add()
objWorkbook.SaveAs("D:\yourExcel.xls")
objExcel.Quit
and then save it as your_VBS_Name.vbs
That's it!
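If you want to wait for the script to finish and see its output, `ProcessBuilder` is a slightly more defensive way to launch it. This is only a sketch: `VbsRunner` is a made-up class name, and it assumes `cscript` (the console Windows Script Host) is on the PATH, which is normally the case on Windows.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that runs a VBScript file through the Windows Script Host.
class VbsRunner {
    // Builds the command as separate arguments so that paths with spaces survive.
    // "//NoLogo" suppresses the cscript banner.
    static List<String> buildCommand(Path script) {
        return Arrays.asList("cscript", "//NoLogo", script.toString());
    }

    // Starts the script, forwards its output to this process, and
    // returns the exit code once the script has finished.
    static int run(Path script) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(script))
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

For example, `VbsRunner.run(Paths.get("D:\\YourPath", "your_VBS_Name.vbs"))` would block until Excel has saved the workbook and the script has exited.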

Clientside Javascript --> Serverside Java --> user is served a .doc

I am helping someone out with a javascript-based web app (even though I know next to nothing about web development) and we are unsure about the best way to implement a feature we'd like to have.
Basically, the user will be using our tool to view all kinds of boring data in tables, columns, etc. via javascript. We want to implement a feature where the user can click a button or link that then allows the user to download the displayed data in a .doc file.
Our basic idea so far is something like:
call a Java function on the server with the desired data passed in as a String when the link is clicked
generate the .doc file on the server
automatically "open" a link to the file in the client's browser to initiate the download
Is this possible? If so, is it feasible? Or, can you recommend a better solution?
edit: the data does not reside on the server; rather, it is queried from a SQL database
Yep, it's possible. Your saviour is the Apache POI library. Its HWPF library will help you generate Microsoft Word files using Java. The rest is just clever use of HTTP.
Your basic idea sounds a bit Rube-Goldbergesque.
Is the data you want in the document present on the server? If so, then all you need to do is display a plain HTML link with GET parameters that describes the data (i.e. data for customer X from date A to date B). The link will be handled on the server by a Servlet that gets the data and produces the .DOC file as its output to be downloaded by the browser - a very simple one-step process that doesn't even involve any JavaScript.
Passing large amounts of data around via GET/POST might not be the best idea. You could just pass in the same parameters you used to generate the HTML page earlier. You don't even need a third-party library to generate the DOC. You could just generate a plain old HTML file with a .doc extension and Word will be happy to open it.
Sounds like the Docmosis Java library could help. Check out the online demo, since it shows something similar to what you're asking: generating a real doc file from a web site based on selections in the web page. Docmosis can query from databases and run pretty much anywhere.
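The link-plus-download idea from the answers above can be sketched roughly as follows. To keep the example self-contained it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in for a servlet container; the class name, the `/report.doc` path, and the `renderHtml` helper are all assumptions made for illustration, and a real handler would run the SQL query where the placeholder text is built.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: serve an HTML document that the browser downloads as a .doc file.
class DocDownloadSketch {
    // Placeholder for the real report; a real version would query the database
    // using the request parameters instead of echoing them.
    static String renderHtml(String parameters) {
        String safe = parameters.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<html><body><p>Report for: " + safe + "</p></body></html>";
    }

    // Header value that tells the browser to save the response under the given name.
    static String contentDisposition(String fileName) {
        return "attachment; filename=\"" + fileName.replace("\"", "") + "\"";
    }

    // Starts a small HTTP server that answers GET /report.doc?customer=X style links.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/report.doc", exchange -> {
            String query = exchange.getRequestURI().getQuery();
            byte[] body = renderHtml(query == null ? "" : query).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/msword");
            exchange.getResponseHeaders().set("Content-Disposition", contentDisposition("report.doc"));
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

With the server started on some port, a plain link to `/report.doc?customer=X` should make the browser offer a download, with no JavaScript involved. In a servlet you would set the same two headers on the `HttpServletResponse`.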

How do I use Apache POI to read a .DOC file in Java to separate images from text?

I need to read a Word .doc file from Java that has text and images. I need to recognize the images & text and separate them into 2 files.
I've recently heard about "Apache POI." How can I use Apache POI to read Word .doc files?
The examples and sample code on apache's site are pretty good. I recommend you start there.
http://poi.apache.org/hwpf/quick-guide.html
To get specific bits of text, first create a org.apache.poi.hwpf.HWPFDocument. Fetch the range with getRange(), then get paragraphs from that. You can then get text and other properties.
Here for an example of extracting an image. Here for the latest revision as of this writing.
And of course, the Javadocs
Note that, according to the POI site,
HWPF is still in early development.
It's not free (or even cheap!) but Aspose.Words should be able to do this. Their evaluation download will let you play with small files.
Do the destination files also have to be Docs? You could open the docs in Office and save them out as HTML. Then the separation becomes trivial. RTF is also a viable option, but I can't recommend a good RTF parser off the top of my head.
Edit to say: I just remembered another possible solution: Jacob, but you'll need an instance of Office running on the same machine. It's short for Java COM Bridge and it lets you make calls to the COM libraries in Office to manipulate the documents. I'm sure it's not as scary as it might sound!
