Trouble getting StreamingReader to work with Excel xlsx Java - java

This is the code I have for reading a very large Excel file (xlsx, 23.5 MB, 700,000+ rows):
String dir = rootPath + File.separator + "tmpFiles" + File.separator
+ FILE_NAME;
File fisNew = new File(dir);
Workbook w = StreamingReader.builder()
.rowCacheSize(100)
.open(fisNew);
Sheet worksheet = null;
worksheet = w.getSheetAt(0);
worksheet.getRow(0).getPhysicalNumberOfCells();
I get an UnsupportedOperationException on this line:
worksheet.getRow(0).getPhysicalNumberOfCells();
I also don't get an actual String value when I print SpecialtyUtil.removeWhiteSpaces(excelheader.getCell(0)). I expect the name of the column, but I get some StreamingSheet string instead. I'm not sure what I need to change in order to process an xlsx file.
EDIT: Any idea how to write to an Excel file using StreamingReader? I know that it is an unsupported operation, but is there a workaround?

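If you look at the source code on GitHub (link below), StreamingSheet does not support every Sheet method; the unsupported ones throw UnsupportedOperationException. (getPhysicalNumberOfCells() is a Row method, so in your line the exception most likely comes from the getRow(0) call on the sheet.) For example, here is the snippet for getPhysicalNumberOfRows():
/**
 * Not supported
 */
@Override
public int getPhysicalNumberOfRows() {
    throw new UnsupportedOperationException();
}
The GitHub link is given below.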
https://github.com/monitorjbl/excel-streaming-reader/blob/master/src/main/java/com/monitorjbl/xlsx/impl/StreamingSheet.java#L97

You can use getLastRowNum():
int lastRowNum = sheet.getLastRowNum(); // zero-based index of the last row
Here is the implementation:
@Override
public int getLastRowNum() {
    return reader.getLastRowNum();
}
StreamingSheet.java
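Since a streaming sheet is read forward-only, one way to get the header row is to take the first row from the iterator instead of calling getRow(0). The sketch below illustrates that pattern with plain Java collections only: the `Iterable<List<String>>` is a hypothetical stand-in for a sheet (POI's Sheet is an Iterable of Row, and Row is an Iterable of Cell), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class StreamingHeaderSketch {
    /** Result of one forward pass: the header cells and the number of rows after the header. */
    static final class Scan {
        final List<String> header;
        final int dataRows;

        Scan(List<String> header, int dataRows) {
            this.header = header;
            this.dataRows = dataRows;
        }
    }

    /** Reads the first row as the header, then counts the remaining rows, with no random access. */
    static Scan scan(Iterable<List<String>> rows) {
        Iterator<List<String>> it = rows.iterator();
        // The header is simply the first row the iterator yields (empty if there are no rows).
        List<String> header = it.hasNext() ? new ArrayList<>(it.next()) : new ArrayList<>();
        int dataRows = 0;
        while (it.hasNext()) {
            it.next();
            dataRows++;
        }
        return new Scan(header, dataRows);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in data: one header row followed by two data rows.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("id", "name"),
                Arrays.asList("1", "Ann"),
                Arrays.asList("2", "Bob"));
        Scan s = scan(rows);
        System.out.println(s.header.size() + " columns, " + s.dataRows + " data rows");
    }
}
```

With the real library the same shape would presumably be a for-each loop over the sheet, reading cell values from the first Row it yields.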

Related

Save embedded files from .xls document (Apache POI)

I would like to save all attached files from an Excel (xls/HSSF) without extension.
I've been trying for a long time now, and I really don't know if this is even possible. I also tried Apache Tika, but I don't want to use Tika for this, because I need POI for other tasks, anyway.
I tried the sample code from the Busy Developers' Guide, but it does not extract files in the old Office formats (doc, ppt, xls). It also reports an error when trying to create new SlideShow(new HSLFSlideShow(dn, fs)): "Remove argument to match HSLFSlideShow(dn)".
My actual code is:
public static void saveEmbeddedXLS(InputStream fis_param, String embDIR) throws IOException, InvalidFormatException{
//HSSF - XLS
int i = 0;
System.out.println("Starting Embedded Search in xls...");
POIFSFileSystem fs = new POIFSFileSystem(fis_param);//create FileSystem using fileInputStream
HSSFWorkbook workbook = new HSSFWorkbook(fs);
for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
System.out.println("Objects : "+ obj.getOLE2ClassName());//the OLE2 Class Name of the object
String oleName = obj.getOLE2ClassName();//Document Type
DirectoryNode dn = (DirectoryNode) obj.getDirectory();//get Directory Node
//Trying to create an input Stream with the embedded document, argument of createDocumentInputStream should be: String; Where/How can I get this correct parameter for the function?
InputStream is = dn.createDocumentInputStream(dn);//This line is incorrect! How can I do i correctly?
FileOutputStream fos = new FileOutputStream("embDIR" + i);//Outputfilepath + Number
IOUtils.copy(is, fos);//FileInputStream > FileOutput Stream (save File without extension)
i++;
}
}
So my simple question is:
Is it possible to save ALL attachments from an xls file without any extension (as simply as possible)? Can anyone provide a solution? Many thanks!

Apache Poi, how to get the linked workbook?

I am trying to analyze Excel files that contain links to other files, and I would like to know the linked file's name and path. I'm using Apache POI 3.14.
I figured it out for Ref3DPtg objects, but I don't know how to do it for Ref3DPxg. I only get access to the cell address and the sheet name.
Does anyone know how to do it?
Code:
...
if (ptg instanceof Ref3DPxg) {
    cellAddress = ptg.format2DRefAsString();
    sheetName = ptg.getSheetName();
    workbookName = ???;
} else if (ptg instanceof Ref3DPtg) {
    // no problem with Ref3DPtg
}
The XLSX file format doesn't store an external reference as =[Other.xlsx]Sheet1!A1 but as =[23]Sheet1!A1, so resolving the file name takes a few steps. First, get the external workbook number from the Pxg. Next, get the ExternalLinksTable for that workbook number from the Workbook, noting the off-by-one: external workbook 0 is the current workbook, so external workbook 1 corresponds to external link 0. Finally, fetch the file name from that link.
So, your code should be something like:
if (ptg instanceof Ref3DPxg) {
    Ref3DPxg pxg = (Ref3DPxg) ptg;
    int extWB = pxg.getExternalWorkbookNumber();
    int extLink = extWB - 1;
    ExternalLinksTable links = wb.getExternalLinksTable().get(extLink);
    String filename = links.getLinkedFileName();
}
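The off-by-one mapping is the easy part to get wrong, so here is a minimal sketch of just that step in plain Java. It assumes the linked file names have already been collected into a list in link order; `linkedFileName` and the sample file names are hypothetical and not part of POI.

```java
import java.util.Arrays;
import java.util.List;

final class ExternalLinkIndexSketch {
    /**
     * Maps an external workbook number to a linked file name.
     * Number 0 means the current workbook, so number n maps to list index n - 1.
     * Returns null for the current workbook or for a number that is out of range.
     */
    static String linkedFileName(List<String> linkedFiles, int externalWorkbookNumber) {
        int linkIndex = externalWorkbookNumber - 1;
        if (linkIndex < 0 || linkIndex >= linkedFiles.size()) {
            return null;
        }
        return linkedFiles.get(linkIndex);
    }

    public static void main(String[] args) {
        // Hypothetical link table contents, in the order the links are stored.
        List<String> links = Arrays.asList("Other.xlsx", "Budget.xlsx");
        System.out.println(linkedFileName(links, 1)); // first external link
        System.out.println(linkedFileName(links, 0)); // current workbook, no link
    }
}
```

Returning null for workbook number 0 is a choice made for this sketch; real code might instead treat that case as a reference within the same file.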

Why am I getting the number of rows in an Excel sheet as 0 using JAVA?

I am trying to write some data into an Excel file (.xlsx). The first row is written to the file correctly, but when I try to retrieve the number of rows so that I can append more rows, I either get an "Invalid format exception" or the code runs but reports a row count of 1. Please help me solve this; the code I am working with is below.
public static int getRows(String memail, String testid) throws FileNotFoundException, IOException, InvalidFormatException {
    int rc;
    FileInputStream file = new FileInputStream(new File("email1.xlsx"));
    XSSFWorkbook wb = new XSSFWorkbook(file);
    WorkbookFactory.create(file);
    XSSFSheet sheet = wb.getSheet(testid);
    rc = sheet.getLastRowNum();
    return rc;
}
Remove the following line
WorkbookFactory.create(file);
and try again.
You have already created the workbook from the file with new XSSFWorkbook(file), and that line tries to create it again from the same stream.
You can also try Sheet.getPhysicalNumberOfRows(), which returns the number of physically defined rows (NOT the number of rows in the sheet). One more point: are you sure you are getting the correct sheet from wb.getSheet(testid)?
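To make the difference between the two counts concrete, here is a small model in plain Java. It does not use POI: a `TreeMap` keyed by row index stands in for a sheet whose rows may have gaps, and all names are hypothetical. Returning -1 for an empty sheet is a choice of this sketch, not a statement about what POI returns.

```java
import java.util.TreeMap;

final class RowCountSketch {
    /** Zero-based index of the last defined row, or -1 if there are no rows (sketch convention). */
    static int lastRowNum(TreeMap<Integer, String> rows) {
        return rows.isEmpty() ? -1 : rows.lastKey();
    }

    /** Number of rows that are actually defined, ignoring gaps. */
    static int physicalNumberOfRows(TreeMap<Integer, String> rows) {
        return rows.size();
    }

    /** Index to use when appending a new row after the last defined one. */
    static int nextRowToWrite(TreeMap<Integer, String> rows) {
        return lastRowNum(rows) + 1;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> rows = new TreeMap<>();
        rows.put(0, "header");
        rows.put(2, "first data row");
        rows.put(5, "second data row");
        System.out.println("last row index: " + lastRowNum(rows));
        System.out.println("defined rows:   " + physicalNumberOfRows(rows));
        System.out.println("append at:      " + nextRowToWrite(rows));
    }
}
```

In this model a sheet holding only a header row at index 0 has a last row index of 0 and one defined row, so the next row would be written at index 1.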

Null pointer exception with jexcel while writing

I need to append contents to an existing excel file using JExcel.
I am trying the following approach:
Read from existing workbook
workbook = Workbook.getWorkbook(new File(errorFilePath));
Create a writable workbook from the existing workbook in a temp file
if (!tempFile.exists()) {
tempFile.getParentFile().mkdirs();
tempFile.createNewFile();
}
newCopy = Workbook.createWorkbook(tempFile, workbook);
excelSheet = newCopy.getSheet(0);
Write to the writable workbook (times is a writable cell format variable)
Label label;
label = new Label(column, row, stringData, times);
excelSheet.addCell(label);
In the finally block: close both the existing and the writable workbook, delete the existing workbook, then rename the temp file to the existing (now deleted) workbook name
finally {
if (null != newCopy) {
newCopy.write();
newCopy.close();
}
if (null != workbook) {
workbook.close();
}
if (null != errorFile && errorFile.exists()) {
errorFile.delete();
}
if (null != tempFile) {
tempFile.renameTo(new File(errorFilePath));
}
}
The problem is that everything works fine on the first run (without redeploying).
But whenever I change some Java code and the web application redeploys, I get a null pointer exception while closing the newly created workbook (after writing).
I am getting the following stack trace (originating from the line newCopy.write()):
java.lang.NullPointerException
at jxl.write.biff.CellValue.getData(CellValue.java:259)
at jxl.write.biff.LabelRecord.getData(LabelRecord.java:141)
at jxl.biff.WritableRecordData.getBytes(WritableRecordData.java:71)
at jxl.write.biff.File.write(File.java:147)
at jxl.write.biff.RowRecord.writeCells(RowRecord.java:329)
at jxl.write.biff.SheetWriter.write(SheetWriter.java:479)
at jxl.write.biff.WritableSheetImpl.write(WritableSheetImpl.java:1514)
at jxl.write.biff.WritableWorkbookImpl.write(WritableWorkbookImpl.java:950)
Java Version : 1.6
JExcel Version : 2.6.10
Windows 7
My first suspicion is that you pass one or more null arguments on this line:
label = new Label(column, row, stringData, times);
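One way to test that suspicion is to validate the arguments before the cell is created, so a null shows up at the call site rather than later inside write(). The sketch below is plain Java and does not use the JExcel classes; `checkedText` is a hypothetical helper, and replacing a null text with an empty string is an assumption of this sketch, not something the question specifies.

```java
import java.util.Objects;

final class LabelArgsSketch {
    /**
     * Hypothetical guard for the values passed to new Label(column, row, stringData, times).
     * Rejects a null cell format immediately and normalizes a null text value.
     */
    static String checkedText(String stringData, Object cellFormat) {
        // Fail fast with a clear message instead of a later NullPointerException during write().
        Objects.requireNonNull(cellFormat, "cell format (times) must not be null");
        // Assumption of this sketch: a missing text value is written as an empty string.
        return stringData == null ? "" : stringData;
    }

    public static void main(String[] args) {
        Object times = new Object(); // stand-in for the writable cell format
        System.out.println("[" + checkedText(null, times) + "]");
        System.out.println("[" + checkedText("error message", times) + "]");
    }
}
```

If the guard never fires, the null is probably coming from somewhere else, which would point back at the library itself.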
I faced the same issue.
I was trying to add rows to the sheet dynamically in a loop using insertRow. After spending several hours on it, I concluded it was probably a bug in the latest version of the jxl API.
Versions of the JXL API after 2.6.9 seem to have a bug in insertRow. I switched from 2.6.12 back to 2.6.9.

POI - Excel remains held by some process and cannot be edited

I am using POI to read, edit and write Excel files.
My process flow is: write an Excel file, read it, and write it again.
If I then try to edit the file in the normal desktop Excel application while my Java app is still running, the file cannot be saved; Excel says some process is holding it.
I am properly closing all file handles.
Please help and tell me how to fix this issue.
SXSSFWorkbook wb = new SXSSFWorkbook(WINDOW_SIZE);
Sheet sheet = getCurrentSheet();//stores the current sheet in a instance variable
for (int rowNum = 0; rowNum < data.size(); rowNum++) {
if (rowNum > RECORDS_PER_SHEET) {
if (rowNum % (RECORDS_PER_SHEET * numberOfSheets) == 1) {
numberOfSheets++;
setCurrentSheet(wb.getXSSFWorkbook().createSheet());
sheet = getCurrentSheet();
}
}
final Row row = sheet.createRow(effectiveRowCnt);
for (int columnCount = 0; columnCount < data.get(rowNum).size(); columnCount++) {
final Object value = data.get(rowNum).get(columnCount);
final Cell cell = row.createCell(columnCount);
//This method creates the row and cell at the given loc and adds value
createContent(value, cell, effectiveRowCnt, columnCount, false, false);
}
}
public void closeFile(boolean toOpen) {
FileOutputStream out = null;
try {
out = new FileOutputStream(getFileName());
wb.write(out);
}
finally {
try {
if (out != null) {
out.close();
out = null;
if(toOpen){
// Open the file for user with default program
final Desktop dt = Desktop.getDesktop();
dt.open(new File(getFileName()));
}
}
}
}
}
The code looks correct. After out.close();, there shouldn't be any locks left.
Things that could still happen:
You have another Java process (for example hanging in a debugger). Your new process tries to write the file, fails (because of process 1) and in the finally, it tries to open Excel which sees the same problem. Make sure you log all exceptions that happen in wb.write(out);
Note: The code above looks correct in this respect, since it only starts Excel when out != null and that should only be the case when Java could open the file.
Maybe the file wasn't written completely (i.e. there was an exception during write()). Excel tries to open the corrupt file and gives you the wrong error message.
Use a tool like Process Explorer to find out which process keeps a lock on a file.
I tried all the options. After looking thoroughly, it seems the problem is the Event User model.
I am using the below code for reading the data:
final OPCPackage pkg = OPCPackage.open(getFileName());
final XSSFReader r = new XSSFReader(pkg);
final SharedStringsTable sst = r.getSharedStringsTable();
final XMLReader parser = fetchSheetParser(sst);
final Iterator<InputStream> sheets = r.getSheetsData();
while (sheets.hasNext()) {
final InputStream sheet = sheets.next();
final InputSource sheetSource = new InputSource(sheet);
parser.parse(sheetSource);
sheet.close();
}
I assume the excel is for some reason held by this process for some time. If I remove this code and use the below code:
final File file = new File(getFileName());
final FileInputStream fis = new FileInputStream(file);
final XSSFWorkbook xWb = new XSSFWorkbook(fis);
The process works fine and the Excel file does not remain locked.
I figured it out actually.
A very simple line was required, but for some reason it was not explained in the New Halloween Document (http://poi.apache.org/spreadsheet/how-to.html#sxssf).
I checked the Busy Developer's Guide and got the solution.
I needed to add
pkg.close(); //To close the OPCPackage
With this added, my code works fine with any number of reads and writes on the same Excel file.
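A forgotten close() is easy to miss when parsing can throw, so one option is try-with-resources, which closes the resource on both the normal and the exceptional path. The sketch below shows only that mechanism in plain Java: `TrackedResource` is a hypothetical stand-in for the package object, and as far as I know OPCPackage implements Closeable, so the same construct should apply to it.

```java
import java.io.Closeable;

final class CloseOnExitSketch {
    /** Hypothetical stand-in for a resource that holds a file handle until closed. */
    static final class TrackedResource implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Uses the resource inside try-with-resources; close() runs even if the body throws. */
    static TrackedResource readWith(TrackedResource resource, boolean failWhileReading) {
        try (TrackedResource r = resource) {
            if (failWhileReading) {
                throw new IllegalStateException("simulated parse failure");
            }
        } catch (IllegalStateException e) {
            // Swallowed only so the demo can inspect the resource afterwards.
        }
        return resource;
    }

    public static void main(String[] args) {
        System.out.println("closed after success: " + readWith(new TrackedResource(), false).closed);
        System.out.println("closed after failure: " + readWith(new TrackedResource(), true).closed);
    }
}
```

Both lines report true, because the resource is closed before the catch block runs.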
