My program downloads files and then combines them into one large Excel file. The problem is that the website I'm downloading from serves them as .html files but gives them the .xls extension. Excel can still open them manually, but Apache POI cannot read them because the file content does not match the extension. My process is as follows:
1 - Run part of my program which downloads a file through my web browser using Selenium - This works fine
2 - Manually open each downloaded file and Save-As an xlsx file (Note: to be clear, it is when I open them in Excel manually that I'm told there is a file format/extension difference)
3 - Run the rest of my program which combs through each new file (the ones created in step 2) and appends all the data to the ultimate output file - This works fine
Is there any way to automate the process or am I going to have to continue to do it manually?
You said in the comment that you opened the file in a text editor and saw that it is HTML5.
I would use an HTML parser such as jsoup to extract the data you need and then create a new file with Apache POI.
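Before routing a download to an HTML parser or to POI, it can help to check what the file really contains instead of trusting the extension. The following is only a sketch using the JDK alone: the class name `SpreadsheetSniffer` and its `Format` enum are made up for illustration, and the check is a heuristic based on the first bytes (the OLE2 signature used by binary .xls files, the `PK` signature of zip-based .xlsx files, or a leading `<` for HTML/XML markup).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: guesses the real format of a download from its leading bytes. */
final class SpreadsheetSniffer {

    enum Format { OLE2_XLS, ZIP_XLSX, MARKUP, UNKNOWN }

    // Signature of the OLE2 compound file container used by binary .xls files.
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    /** Reads up to 512 bytes from the file and classifies them. */
    static Format sniff(Path file) throws IOException {
        byte[] head = new byte[512];
        int length;
        try (InputStream in = Files.newInputStream(file)) {
            length = in.readNBytes(head, 0, head.length); // readNBytes needs Java 9+
        }
        return sniff(head, length);
    }

    /** Classifies the first {@code length} bytes of {@code head}. */
    static Format sniff(byte[] head, int length) {
        if (startsWith(head, length, OLE2_MAGIC)) {
            return Format.OLE2_XLS;
        }
        if (length >= 2 && head[0] == 'P' && head[1] == 'K') {
            return Format.ZIP_XLSX; // zip container; .xlsx is one possibility
        }
        int i = 0;
        // Skip a UTF-8 byte order mark if present.
        if (length >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            i = 3;
        }
        // Skip leading ASCII whitespace.
        while (i < length && (head[i] == ' ' || head[i] == '\t'
                || head[i] == '\r' || head[i] == '\n')) {
            i++;
        }
        // Markup (HTML or XML) starts with '<'.
        return (i < length && head[i] == '<') ? Format.MARKUP : Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int length, int[] magic) {
        if (length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Files reported as `MARKUP` would go through the HTML parser route, while `OLE2_XLS` and `ZIP_XLSX` files could be handed to POI directly.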
You can use the EasyXLS library. It can read HTML files and save them as XLSX.
ExcelDocument workbookForXLSX = new ExcelDocument();
for (int i = 0; i < fileCount; i++) {
    ExcelDocument workbookForHTML = new ExcelDocument();
    workbookForHTML.easy_LoadHTMLFile(filePath[i]); // or a stream to the file
    workbookForXLSX.easy_addWorksheet((ExcelWorksheet)workbookForHTML.easy_getSheetAt(0));
    workbookForHTML.Dispose();
}
workbookForXLSX.easy_WriteXLSXFile(filePathXLSX);
workbookForXLSX.Dispose();
You can download the Excel library for Java from:
https://www.easyxls.com/java-excel-library
More details about reading HTML files and what HTML tags are supported at:
https://www.easyxls.com/manual/basics/import-from-html-file-format.html
Related
I am currently using Apache POI to enter data into an Excel file. The only problem is that I cannot keep the file open in Excel if I have to append data to that same file. Is there any sample code that would allow me to do so?
My basic requirement is to fetch runtime data from a source (this I am able to do) and add it to the Excel sheet while the file is still open.
Any suggestions?
I don't think that this is possible without a C# add-in or some kind of macro. You could write a simple C# add-in for Excel that connects to your Java program and receives the real-time data. The add-in would then write it to the spreadsheet.
I want to create an Excel file from Java (for example with Apache POI) that contains a web query with a link to my application, but I couldn't find any reference for it. Is that possible?
I'd even settle for updating the link of a web query in an existing Excel file.
Thank you.
What I finally did was create an xlsx file with a temporary connection. Then I opened the file with a zip reader, modified the connections.xml file inside, and zipped everything back into an xlsx.
Works like a charm.
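The unzip, edit, re-zip step described above can also be done in code with `java.util.zip`. This is a minimal sketch, not the poster's actual procedure: the class name `XlsxConnectionEditor` is hypothetical, the entry path `xl/connections.xml` is an assumption about where the connections part sits in the package, and the edit itself is passed in as a function because the exact XML change depends on the connection. Source and target are assumed to be different files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: copies an xlsx package while rewriting its connections part. */
final class XlsxConnectionEditor {

    // Assumed location of the connections part inside the xlsx zip container.
    static final String CONNECTIONS_ENTRY = "xl/connections.xml";

    /**
     * Copies every zip entry from source to target unchanged, except the
     * connections entry, whose XML text is passed through {@code edit}.
     * Source and target must be different files.
     */
    static void rewriteConnections(Path source, Path target, UnaryOperator<String> edit)
            throws IOException {
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(target))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // A fresh entry lets the output stream compute sizes and CRC itself.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                if (CONNECTIONS_ENTRY.equals(entry.getName())) {
                    // readAllBytes stops at the end of the current entry (Java 9+).
                    String xml = new String(zin.readAllBytes(), StandardCharsets.UTF_8);
                    zout.write(edit.apply(xml).getBytes(StandardCharsets.UTF_8));
                } else {
                    zin.transferTo(zout); // copy the entry as-is (Java 9+)
                }
                zout.closeEntry();
            }
        }
    }
}
```

A call might look like `XlsxConnectionEditor.rewriteConnections(template, output, xml -> xml.replace(tempUrl, realUrl))`, where `template`, `output`, `tempUrl` and `realUrl` are placeholders. A plain string replace is the simplest option; an XML parser would be safer if the URL needs escaping.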
I am working on a Struts-based web application. In that application, we generate and download an xls file from a JSP.
In the JSP file and web.xml, I have set the content type to "application/vnd.ms-excel".
It seems the xls files generated by the JSP pages are not real Excel files, but a text format that MS Excel understands. Hence Excel opens the files and displays the output much like files saved by MS Excel itself. Since newer versions of MS Office (2007/2010) check both the file extension and the content inside the file, they issue a warning that the file format does not match the content.
To get rid of the warning, how can I ensure that the generated xls is a real Excel file?
Please help.
For future readers who might need this...
Excel will complain anyway, even if your excel file is 'correct', as long as its structure doesn't properly match the extension. So if you're saving as an .XLS, it expects to see the classic excel file.
The popup you are getting is caused by a security feature introduced in Office 2007 called Extension Hardening. You can disable it if you want, either manually in the registry or by saving the patch to a .REG file and sharing it with your clients.
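Save the lines below into a GiveItSomeName.reg file, which you can then email to your clients and tell them to execute. The first line is the header that .reg files need. The 12.0 in the key is the Office 2007 version number; Office 2010 uses 14.0 instead.
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Excel\Security]
"ExtensionHardening"=dword:00000000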
I have an app that generates a docx file based on user input. It uses Apache POI to generate the docx file and I can get the FileOutputStream from that; the document opens perfectly on a local machine when I write it to a file.
The webapp uses Dojo xhrPost to send the necessary data to the server to generate the document. What I am wondering is how to get the docx file to the client.
I know I can do it by creating a temp file and passing the location of that file to the client to download, but I would think there is a way to do it by piping the output stream straight to the client, which would be much cleaner.
Any suggestions?
The answer from Mr Shiny in this SO question has an example streaming an excel file, should be very similar for a docx:
How can I get an Input Stream from HSSFWorkbook Object
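Except that the content type for a docx should be application/vnd.openxmlformats-officedocument.wordprocessingml.document (application/msword is the type for the older .doc format).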
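As a rough sketch of the no-temp-file approach: anything that can write itself to an `OutputStream` can be rendered into memory and then sent as the response body. The class `DocxDownload` and the `DocumentWriter` interface below are hypothetical stand-ins so the example needs only the JDK; with POI, `XWPFDocument.write(OutputStream)` matches that shape, so a method reference such as `document::write` could be passed in. The content type used is the registered one for .docx files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical helper for sending a generated docx without a temp file. */
final class DocxDownload {

    // Registered MIME type for Office Open XML word processing documents (.docx).
    static final String CONTENT_TYPE =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    /** Stand-in for any object that can serialize itself to a stream. */
    @FunctionalInterface
    interface DocumentWriter {
        void write(OutputStream out) throws IOException;
    }

    /** Renders the document into memory so the length is known before sending. */
    static byte[] render(DocumentWriter writer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writer.write(buffer);
        return buffer.toByteArray();
    }

    /** Builds a Content-Disposition value that asks the browser to download the file. */
    static String contentDisposition(String fileName) {
        // Quotes are stripped so the name cannot break out of the quoted string.
        return "attachment; filename=\"" + fileName.replace("\"", "") + "\"";
    }
}

// Possible use inside a servlet's doPost (needs the servlet API, not shown here):
//   byte[] body = DocxDownload.render(document::write);
//   response.setContentType(DocxDownload.CONTENT_TYPE);
//   response.setHeader("Content-Disposition", DocxDownload.contentDisposition("report.docx"));
//   response.setContentLength(body.length);
//   response.getOutputStream().write(body);
```

Buffering is optional: for large documents the writer could be pointed straight at the response's output stream, at the cost of not sending a content length.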
I want to show a PDF document from a Java (Swing) application in a system independent manner (provided that a PDF viewer is properly installed on the target system).
Also I'd like to deploy this PDF document using Java WebStart.
Could you please tell me the "standard" way to achieve this? (I confess, I'm too lazy/busy to look up the details ...) Thanks!
I assume you mean you want to deploy the PDF along with the Java Web Start application? If so, you simply need to package the PDF(s) with your webstart application. When your application downloads and runs, the PDFs will have come too, and your code can use the getClass().getResource("/where/is/my.pdf") type of lookup to locate the PDF and then operate on it for display. You might also need to get your code to read the PDF out of the resources and save it in a temp file (File.createTempFile()) so that the PDF viewer can see it.
Rough idea:
// Needs java.awt.Desktop, java.io.File, java.io.InputStream, java.nio.file.Files, java.nio.file.StandardCopyOption
// Find the PDF bundled with the Web Start application
try (InputStream in = getClass().getResourceAsStream("/where/is/my.pdf")) {
    // Create a temp file to copy the PDF to; the suffix must include the dot
    File tmpFile = File.createTempFile("my", ".pdf");
    // Ask the JVM to delete the temp file on exit
    tmpFile.deleteOnExit();
    // Copy the resource into the temp file (Files.copy needs Java 7+)
    Files.copy(in, tmpFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
    // Display the file in the system's PDF viewer
    Desktop.getDesktop().open(tmpFile);
}
You can use the Java 6 Desktop system and Desktop.open() to open the associated desktop application for your document (in this case, a PDF file).
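A small sketch of a defensive call, since `Desktop` is not available on every platform and `open` is an optional action. The class and method names are made up for illustration.

```java
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;

/** Hypothetical helper: opens a file in the associated desktop application. */
final class PdfOpener {

    /**
     * Returns true if the file was handed to the desktop for opening, false if
     * the file is missing or the platform does not support the OPEN action.
     */
    static boolean openWithDefaultViewer(File file) throws IOException {
        if (!file.isFile()) {
            return false; // Desktop.open would reject a missing file
        }
        if (!Desktop.isDesktopSupported()) {
            return false; // e.g. a headless environment
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.OPEN)) {
            return false;
        }
        desktop.open(file); // throws IOException if no application is associated
        return true;
    }
}
```

When this returns false, the application could fall back to showing the file's path so the user can open it manually.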