POI reading Excel file with body in String - java

Currently I am trying to read an Excel file that is polled via Apache Camel (2.25.1).
This means the method receives the file contents as a String:
@Handler
public void processFile(@Body String body) {
For reading the Excel file I use Apache POI and POI-ooxml (both 4.1.2).
However, using the String directly
WorkbookFactory.create(new ByteArrayInputStream(body.getBytes(Charset.forName("UTF-8"))))
throws a "java.io.IOException: ZIP entry size is too large or invalid".
Using the String with the platform default encoding:
WorkbookFactory.create(new ByteArrayInputStream(body.getBytes()))
throws "org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException: No valid entries or contents found, this is not a valid OOXML (Office Open XML) file".
I also tried:
File file = exchange.getIn().getBody(File.class);
Workbook workbook = new XSSFWorkbook(new FileInputStream(file));
Probably because the file is read from an FTP server, this throws a java.io.FileNotFoundException: Invalid file path.
However, the following code does work:
URL url = new URL(fileFtpPath);
URLConnection urlc = url.openConnection();
InputStream ftpIs = urlc.getInputStream();
Workbook workbook = new XSSFWorkbook(ftpIs);
But I would prefer not to open a connection to the FTP server myself, since Camel has already read the file and the Excel contents I need are available (in String body).
Is there any way to read the contents of the Excel file from the String with Apache POI?
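The two exceptions above are what one would expect if the .xlsx bytes were damaged on their way through a String: an .xlsx file is a ZIP archive, and decoding arbitrary binary data as text and encoding it back is not lossless. Here is a minimal sketch of that effect using only the JDK. The class name and the sample bytes are made up for illustration, and it does not involve Camel or POI:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringRoundTrip {

    // Decode bytes as UTF-8 text and encode them back, which is what happens
    // when a binary body is requested as a String and later turned into bytes.
    static byte[] roundTrip(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ZIP local file header signature "PK\3\4", followed by three bytes
        // that are not valid UTF-8 (illustrative sample data).
        byte[] binary = {0x50, 0x4B, 0x03, 0x04, (byte) 0xFF, (byte) 0xFE, (byte) 0x80};
        byte[] result = roundTrip(binary);

        // Invalid sequences are replaced by U+FFFD during decoding,
        // so the re-encoded array no longer matches the input.
        System.out.println("original:   " + Arrays.toString(binary));
        System.out.println("round trip: " + Arrays.toString(result));
        System.out.println("identical:  " + Arrays.equals(binary, result));
    }
}
```

If this is indeed the cause, asking for the body as a byte[] or an InputStream instead of a String would avoid the decoding step altogether.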

I have my routes in XML, so I use Groovy to process Excel files. Perhaps you will find it helpful:
import org.apache.poi.ss.usermodel.WorkbookFactory
def workbook = WorkbookFactory.create(request.getBody(File.class))
def sheet = workbook.getSheetAt(0)
...
There is another approach, usually used for large Excel files, where we work with a stream. To go this way, we implement XSSFSheetXMLHandler.SheetContentsHandler from org.apache.poi.xssf.eventusermodel.
You can find a copy of the original POI example in this SO question; for some reason it was recently deleted from the POI svn. If you are interested, my Groovy version looks like this:
import org.apache.poi.openxml4j.opc.OPCPackage
import org.apache.poi.ooxml.util.SAXHelper
import org.apache.poi.xssf.eventusermodel.XSSFReader
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable
import org.apache.poi.hssf.usermodel.HSSFDataFormatter
import org.xml.sax.InputSource
class MyHandler implements XSSFSheetXMLHandler.SheetContentsHandler {
...
}
def pkg = OPCPackage.open(request.getBody(InputStream.class))
def xssfReader = new XSSFReader(pkg)
def sheetParser = SAXHelper.newXMLReader()
// Pass an instance of the contents handler, not the class itself
def handler = new XSSFSheetXMLHandler(xssfReader.getStylesTable(), null, new ReadOnlySharedStringsTable(pkg), new MyHandler(), new HSSFDataFormatter(), false)
sheetParser.setContentHandler(handler)
sheetParser.parse(new InputSource(xssfReader.getSheetsData().next()))

You can convert the body directly into an InputStream and pass it to the XSSFWorkbook constructor:
Exchange exchange = consumerTemplate.receive("file://C:/ftp/?noop=true", pollCount);
InputStream stream = exchange.getIn().getBody(InputStream.class);
XSSFWorkbook workbook = new XSSFWorkbook(stream);
XSSFSheet sheet = workbook.getSheetAt(0);

Related

JXL Library - Excel file fails to open after file path added as hyperlink

In SoapUI, I am writing code in Groovy and have added a file path as a hyperlink using the JXL library. At runtime I get no error from the code, but when I open results.xls, I get the error below.
I have been trying to find a solution to this problem without success, and I am not sure what I have done wrong in the code.
I have tried several forms of the file path as a variable, but none of them works:
c:\temp\test.xml
file:///C:\temp\test.xml
Here is my code:
import jxl.*
import jxl.write.*
import com.eviware.soapui.support.XmlHolder
import com.eviware.soapui.model.project.ProjectFactoryRegistry
import com.eviware.soapui.impl.wsdl.WsdlProjectFactory
import groovy.xml.XmlUtil
//Read the source document
Workbook sourceDocument = Workbook.getWorkbook(new File("${directory}\\Data.xls"))
Sheet dataSheet = sourceDocument.getSheet("TestData")
//Create a new document as a target from the source
WritableWorkbook targetDocument = Workbook.createWorkbook(new File("${directory}\\TestResults.xls"), sourceDocument)
WritableSheet resultSheet = targetDocument.createSheet("Results", 0)
for (r=1; r<dataSheet.getRows(); r++) {
//do something....
//add hyperlink
def xmlFile = "file:///C:\\temp\\test.xml"
WritableHyperlink hyperlink = new WritableHyperlink(1, counter, new File(xmlFile), "FileLink");
resultSheet.addHyperlink(hyperlink);
}
targetDocument.write()
targetDocument.close()
sourceDocument.close()
Please note that I cannot use POI for some reason, so I stick with JXL.

Excel says: "Excel file is not a valid file extension or format type..."

I am using Apache POI to create an Excel file. When I generate it with some simple Java code and try to open it with Microsoft Excel, Excel says the file cannot be opened because the file format or extension is not valid.
I'm using the latest POI and Microsoft Office 2019 32-bit.
package com.company;
import java.io.*;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
public class Main {
public static void main(String[] args) throws FileNotFoundException, IOException{
Workbook wb = new HSSFWorkbook();
// An output stream accepts output bytes and sends them to sink.
OutputStream fileOut = new FileOutputStream("Geek.xlsx");
// Creating Sheets using sheet object
Sheet sheet1 = wb.createSheet("Array");
Sheet sheet2 = wb.createSheet("String");
Sheet sheet3 = wb.createSheet("LinkedList");
Sheet sheet4 = wb.createSheet("Tree");
Sheet sheet5 = wb.createSheet("Dynamic Programing");
Sheet sheet6 = wb.createSheet("Puzzles");
System.out.println("Sheets Has been Created successfully");
wb.write(fileOut);
}
}
It builds and runs fine, but it produces an invalid file! What might I have done wrong?
The workbook class has to match the file extension:
Workbook workbook = new XSSFWorkbook(); // <--- for creating Geek.xlsx file
Workbook workbook = new HSSFWorkbook(); // <--- for creating Geek.xls file
In case somebody else runs into this problem: I resolved it by changing the file extension in the Java code to .xls (the older Excel file format), and it works like a charm.
P.S.: Thanks, Avi Meltser

After extracting .xls ole - file is empty with Microsoft Excel but readable with apache-poi

I have a .doc file with an embedded .xls and an embedded .doc.
I can extract both files and save them.
When I open the extracted .doc document, everything is fine.
When I open the extracted .xls document, it is empty: the editor shows nothing, not even empty cells.
So I read the extracted .xls document again with Apache POI, and when I look at the sheet name or the contents of the cells, everything is there.
Do you have any idea what is going on?
My setup is:
apache-poi version 3.15 (I also tried some minor versions)
The word and excel files were created with office 2007.
The relevant part of the code:
POIFSFileSystem fs = new POIFSFileSystem(file);
POIOLE2TextExtractor poiole2TextExtractor = ExtractorFactory.createExtractor(fs);
POITextExtractor[] embeddedExtractors = ExtractorFactory.getEmbededDocsTextExtractors(poiole2TextExtractor);
for (POITextExtractor textExtractor : embeddedExtractors) {
// If the embedded object was an Excel spreadsheet.
if (textExtractor instanceof ExcelExtractor) {
ExcelExtractor excelExtractor = (ExcelExtractor) textExtractor;
DirectoryNode directoryNode = (DirectoryNode) excelExtractor.getRoot();
HSSFWorkbook hssfWorkbook = new HSSFWorkbook(directoryNode, true);
File tmp = new File(targetfolder, "test.xls");
FileOutputStream fileOutputStream = new FileOutputStream(tmp);
hssfWorkbook.write(fileOutputStream);
fileOutputStream.flush();
fileOutputStream.close();
hssfWorkbook.close();
}
Thank you :)
So I eventually found the problem.
For HSSFWorkbook I needed to set the following attribute:
hssfWorkbook.setHidden(false);
For the .xlsx format (2007), calling that method throws a NotImplementedException, so you have to fix it manually. I solved it as follows:
String workbookContent = new String(ZipFileUtils.getInnerFile(tmp, "xl/workbook.xml"), "UTF-8");
workbookContent = workbookContent.replaceFirst("visibility=\"hidden\"", "");
ZipFileUtils.replaceZippedFile(tmp, "xl/workbook.xml",
workbookContent.getBytes( "UTF-8"), new FileOutputStream(tmp2));
Here tmp is my extracted .xlsx file, and for the moment I save the result to a new file, tmp2.
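ZipFileUtils in the snippet above is not part of POI or the JDK, and its source is not shown. As a rough idea of what such a helper might do, here is a sketch built on java.util.zip. The class and method names (ZipEntryReplacer, readEntry, replaceEntry, zipOf) are hypothetical, and the demo uses a tiny in-memory archive rather than a real workbook:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryReplacer {

    // Returns the bytes of the named entry, or null if the archive has no such entry.
    static byte[] readEntry(InputStream source, String entryName) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(source)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    copy(zin, out);
                    return out.toByteArray();
                }
            }
        }
        return null;
    }

    // Copies every entry from source to target, writing newContent
    // in place of the entry whose name equals entryName.
    static void replaceEntry(InputStream source, String entryName,
                             byte[] newContent, OutputStream target) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(source);
             ZipOutputStream zout = new ZipOutputStream(target)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // A fresh ZipEntry avoids carrying over the old size and CRC.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals(entryName)) {
                    zout.write(newContent);
                } else {
                    copy(zin, zout);
                }
                zout.closeEntry();
            }
        }
    }

    // Builds a one-entry archive in memory (demo helper only).
    static byte[] zipOf(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry(entryName));
            zout.write(content);
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        String name = "xl/workbook.xml";
        byte[] zip = zipOf(name, "<workbookView visibility=\"hidden\"/>".getBytes(StandardCharsets.UTF_8));

        // Same text replacement as in the answer above.
        String xml = new String(readEntry(new ByteArrayInputStream(zip), name), StandardCharsets.UTF_8);
        String patched = xml.replaceFirst("visibility=\"hidden\"", "");

        ByteArrayOutputStream result = new ByteArrayOutputStream();
        replaceEntry(new ByteArrayInputStream(zip), name, patched.getBytes(StandardCharsets.UTF_8), result);

        byte[] after = readEntry(new ByteArrayInputStream(result.toByteArray()), name);
        System.out.println(new String(after, StandardCharsets.UTF_8));
    }
}
```

Writing to a separate target, as the answer does with tmp2, keeps the source archive readable while the copy is being produced.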

Save embedded files from .xls document (Apache POI)

I would like to save all files embedded in an Excel document (xls/HSSF), without a file extension.
I've been trying for a long time now, and I really don't know if this is even possible. I also tried Apache Tika, but I don't want to use Tika for this because I need POI for other tasks anyway.
I tried the sample code from the Busy Developers' Guide, but it does not extract files in the old Office formats (doc, ppt, xls). It also reports an error at new SlideShow(new HSLFSlideShow(dn, fs)): Remove argument to match HSLFSlideShow(dn).
My actual code is:
public static void saveEmbeddedXLS(InputStream fis_param, String embDIR) throws IOException, InvalidFormatException{
//HSSF - XLS
int i = 0;
System.out.println("Starting Embedded Search in xls...");
POIFSFileSystem fs = new POIFSFileSystem(fis_param);//create FileSystem using fileInputStream
HSSFWorkbook workbook = new HSSFWorkbook(fs);
for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
System.out.println("Objects : "+ obj.getOLE2ClassName());//the OLE2 Class Name of the object
String oleName = obj.getOLE2ClassName();//Document Type
DirectoryNode dn = (DirectoryNode) obj.getDirectory();//get Directory Node
//Trying to create an InputStream for the embedded document. createDocumentInputStream expects a String argument; where/how can I get the correct value for it?
InputStream is = dn.createDocumentInputStream(dn);//This line is incorrect! How can I do it correctly?
FileOutputStream fos = new FileOutputStream("embDIR" + i);//Outputfilepath + Number
IOUtils.copy(is, fos);//FileInputStream > FileOutput Stream (save File without extension)
i++;
}
}
So my simple question is:
Is it possible to save ALL attachments from an xls file without any extension (as simple as possible)? And can any one provide me a solution? Many Thanks!

Why do I fail to read Excel 2007 using POI?

When I try to initialize a Workbook object I always get this error:
The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
But I followed the official sample to do this. Here is my code:
File inputFile = new File(inputFileName);
InputStream is = new FileInputStream(inputFile);
Workbook wb = new XSSFWorkbook(is);
The exception occurs at this line:
Workbook wb = new XSSFWorkbook(is);
Here are the POI jars I include:
poi-3.8-20120326.jar
poi-ooxml-3.8-20120326.jar
poi-ooxml-schemas-3.8-20120326.jar
xmlbeans-2.3.0.jar
Can anyone give me some guidance? An example showing how to read a complete Excel 2007 document would be appreciated. Thanks in advance!
I assume you have rechecked that your original file is indeed in Office 2007+ XML format, right?
Edit:
Then, if you are sure the format is OK and it works for you with WorkbookFactory.create, you can find the answer in the code of that method:
/**
* Creates the appropriate HSSFWorkbook / XSSFWorkbook from
* the given InputStream.
* Your input stream MUST either support mark/reset, or
* be wrapped as a {@link PushbackInputStream}!
*/
public static Workbook create(InputStream inp) throws IOException, InvalidFormatException {
// If clearly doesn't do mark/reset, wrap up
if(! inp.markSupported()) {
inp = new PushbackInputStream(inp, 8);
}
if(POIFSFileSystem.hasPOIFSHeader(inp)) {
return new HSSFWorkbook(inp);
}
if(POIXMLDocument.hasOOXMLHeader(inp)) {
return new XSSFWorkbook(OPCPackage.open(inp));
}
throw new IllegalArgumentException("Your InputStream was neither an OLE2 stream, nor an OOXML stream");
}
This is the bit that you were missing: new XSSFWorkbook(OPCPackage.open(inp))
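The method quoted above decides between HSSF and XSSF by peeking at the first bytes of the stream. The same idea can be sketched with only the JDK, assuming the commonly documented signatures: an OLE2 compound document (.xls) starts with D0 CF 11 E0, and a ZIP archive (which is what an .xlsx file is) starts with "PK\3\4". The class and enum names below are made up for illustration, and this is not POI's own implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

public class ExcelFormatSniffer {

    enum Format { OLE2, ZIP_BASED, UNKNOWN }

    // First four bytes of an OLE2 compound document (e.g. .xls).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0};
    // ZIP local file header signature "PK\3\4" (e.g. .xlsx). This only
    // shows that the data is a ZIP archive, not that it is a workbook.
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Peeks at up to four bytes and pushes them back, so the caller can
    // still read the stream from the beginning afterwards.
    // The PushbackInputStream must have a pushback buffer of at least 4 bytes.
    static Format sniff(PushbackInputStream in) throws IOException {
        byte[] header = new byte[4];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                break; // stream ended early
            }
            read += n;
        }
        if (read > 0) {
            in.unread(header, 0, read);
        }
        if (startsWith(header, read, OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (startsWith(header, read, ZIP_MAGIC)) {
            return Format.ZIP_BASED;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int length, int[] magic) {
        if (length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative headers only, not complete files.
        byte[] xlsLike = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, 0, 0};
        byte[] xlsxLike = {0x50, 0x4B, 0x03, 0x04, 0, 0};

        System.out.println(sniff(new PushbackInputStream(new ByteArrayInputStream(xlsLike), 8)));
        System.out.println(sniff(new PushbackInputStream(new ByteArrayInputStream(xlsxLike), 8)));
    }
}
```

A check like this can help confirm whether a file that fails to open really has the format its extension suggests.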
