Check if Excel file supports Apache POI HSSF or XSSF - java

I am using the Apache POI HSSF/XSSF library to read Excel files in my Java code.
Our problem is that we don't know which version of Excel file we will receive. It could be a very old one or a new one with macros and so on.
Workbook workbook = null;
if (FileUtils.getFileExt(file).equalsIgnoreCase("xls")) {
    workbook = new HSSFWorkbook(new FileInputStream(file));
} else if (FileUtils.getFileExt(file).equalsIgnoreCase("xlsx")) {
    workbook = new XSSFWorkbook(new FileInputStream(file));
}
According to the Excel documentation, there are more formats, such as xlsm and xlt, that we might receive.
Is there another way to determine which implementation to use (HSSF or XSSF)? Is it possible that an xls file will not be supported by HSSF?

Related

How to set Page Breaks View Mode for .xls using HSSFWorkbook in Apache POI

I am creating an .xls Excel file using Apache POI. I need to set the Page Breaks View by default. I looked at a related question about .xlsx files, but I didn't find anything on how to set the Page Breaks View mode using HSSF in Apache POI.
The binary BIFF file format of *.xls and the Office Open XML file format of *.xlsx are two totally different file formats. You can't mix them. In Apache POI, HSSF is for the former and XSSF for the latter. The high-level classes of Apache POI try to provide methods for both formats, which is done through the interfaces in SS. Outside the high-level classes, one must strictly differentiate between the two.
Setting page break preview for a sheet is not provided by the high-level classes so far, so we need the underlying low-level classes. For XSSF these are the org.openxmlformats.schemas.spreadsheetml.x2006.main.* classes, which are the XML representation of the XSSF internals. For HSSF these are the org.apache.poi.hssf.record.* classes, which are the binary representation of the HSSF internals.
Setting page break preview for a sheet could be done like this for both formats:
import java.io.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.hssf.model.InternalSheet;
import org.apache.poi.hssf.record.WindowTwoRecord;

public class ExcelPageBreakPreview {

    public static void main(String[] args) throws Exception {

        //Workbook workbook = WorkbookFactory.create(new FileInputStream("./ExcelTemplate.xlsx")); String filePath = "./ExcelInPageBreakPreview.xlsx";
        Workbook workbook = WorkbookFactory.create(new FileInputStream("./ExcelTemplate.xls")); String filePath = "./ExcelInPageBreakPreview.xls";

        Sheet sheet = workbook.getSheetAt(0);

        //set sheet in PageBreakPreview
        if (sheet instanceof XSSFSheet) {
            XSSFSheet xssfSheet = (XSSFSheet) sheet;
            xssfSheet.lockSelectLockedCells(true);
            xssfSheet.getCTWorksheet().getSheetViews().getSheetViewArray(0).setView(org.openxmlformats.schemas.spreadsheetml.x2006.main.STSheetViewType.PAGE_BREAK_PREVIEW);
        } else if (sheet instanceof HSSFSheet) {
            HSSFSheet hssfSheet = (HSSFSheet) sheet;
            InternalSheet internalSheet = hssfSheet.getSheet();
            WindowTwoRecord record = internalSheet.getWindowTwo();
            record.setSavedInPageBreakPreview(true);
        }

        FileOutputStream fileOut = new FileOutputStream(filePath);
        workbook.write(fileOut);
        fileOut.close();
        workbook.close();
    }
}
Older Apache POI versions might not expose HSSFSheet.getSheet() as public. In that case, one needs to use reflection to get the InternalSheet:
//InternalSheet internalSheet = hssfSheet.getSheet();
java.lang.reflect.Field _sheet = HSSFSheet.class.getDeclaredField("_sheet");
_sheet.setAccessible(true);
InternalSheet internalSheet = (InternalSheet)_sheet.get(hssfSheet);

Error: Type mismatch: cannot convert from HSSFWorkbook to Workbook

I use poi-3.2-FINAL-20081019.jar and get this error:
Type mismatch: cannot convert from HSSFWorkbook to Workbook
try {
if (strType.equals("xls")) {
wb = new HSSFWorkbook(inputStream);
} else {
wb = new XSSFWorkbook(inputStream);
}
Sheet sheet = wb.getSheetAt(0);
How to fix it?
As the date in your jar shows - poi-3.2-FINAL-20081019.jar - you're using a jar that's almost 10 years old! You need to upgrade to something more modern, and at the very least something from this decade...
Right now (November 2017), the latest version is Apache POI 3.17. You can find the latest version on the Apache POI homepage, and see all the fixes in the Changelog.
In addition, you should switch to using WorkbookFactory rather than looking at file extensions to work out which class to use. That hides all the detection complexity for you, works around misnamed files, etc.
Your code can then become as simple as:
Workbook wb = WorkbookFactory.create(new File("input.xlsx"));
Sheet s = wb.getSheetAt(0);
(Use a File if you can, rather than an InputStream, for lower memory use.)
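To give a feel for why content-based detection copes with misnamed files, here is a rough sketch of sniffing the container type from the leading bytes. It is not POI's actual implementation: ExcelFormatSniffer and its Container enum are hypothetical names made up for illustration, it uses only the JDK (Java 11+), and it relies on the well-known OLE2 and ZIP signatures. It only identifies the container, so an OLE2 hit could still be a .doc or a password-protected .xlsx, and a ZIP hit could be any ZIP archive.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper (not part of Apache POI): guesses the container format from the first bytes. */
final class ExcelFormatSniffer {

    enum Container { OLE2, ZIP, UNKNOWN }

    // Signature of an OLE2 compound document, the container of binary .xls/.xlt files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header signature of a ZIP archive, the container of .xlsx/.xlsm files.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a buffer holding at least the first few bytes of a file. */
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    /** Reads only the first eight bytes of the file, ignoring its extension. */
    static Container sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ExcelFormatSniffer.java report.xls data.xlsx
        for (String name : args) {
            System.out.println(name + " -> " + sniff(Path.of(name)));
        }
    }
}
```

In real code, WorkbookFactory already performs this kind of check, so a sketch like this is mainly useful for rejecting obviously wrong uploads early or for logging what was actually received.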
Use a newer version:
https://mvnrepository.com/artifact/org.apache.poi/poi/3.5-FINAL
Read more at: http://poi.apache.org/spreadsheet/converting.html

How to distinguish between XSSF and HSSF automatically in Apache POI?

I would like to be able to open an Excel file of arbitrary type. Is it possible to select between HSSFWorkbook and XSSFWorkbook automatically?
Currently I write
Workbook workbook = new HSSFWorkbook(excelFile);
Can I write something universal instead?
Yes! All you need to do is use WorkbookFactory.
As per this part of the docs, it's better to use a File than an InputStream. So, just do something like:
File file = new File("input.xls");
Workbook wb = WorkbookFactory.create(file);
That will create whichever of HSSFWorkbook or XSSFWorkbook your file needs.

While Reading the data from Excel file with extension xlsx using apache poi it takes long time

When reading an Excel file with the xlsx extension using Apache POI, it takes a long time to identify the extension. Can you please explain why it takes so long?
if (file.getExcelFile().getOriginalFilename().endsWith("xls")) {
    workbook = new HSSFWorkbook(file.getExcelFile().getInputStream());
} else if (file.getExcelFile().getOriginalFilename().endsWith("xlsx")) {
    workbook = new XSSFWorkbook(file.getExcelFile().getInputStream());
} else {
    throw new IllegalArgumentException("Received file does not have a standard excel extension.");
}
Promoting a comment to an answer - don't try to do this yourself, Apache POI has built-in code for doing this for you!
You should use WorkbookFactory.create(File) to do it, e.g. just
workbook = WorkbookFactory.create(file.getExcelFile());
As explained in the Apache POI docs, use a File directly in preference to an InputStream for quicker and lower-memory processing.
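If only an InputStream is available, as with the upload in the question, the leading bytes can still be inspected without consuming the stream by using mark/reset. The sketch below shows that general technique with the JDK only (Java 11+). HeaderPeek and the fake upload bytes are hypothetical and for illustration only, and this is not POI code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: looks at the start of a stream and then rewinds it. */
final class HeaderPeek {

    /** Returns up to n leading bytes and resets the stream; the stream must support mark/reset. */
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        in.mark(n);                 // remember the current position
        try {
            return in.readNBytes(n);
        } finally {
            in.reset();             // rewind so a later parser sees the full content
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an uploaded file: a ZIP signature followed by dummy bytes.
        byte[] fakeUpload = { 0x50, 0x4B, 0x03, 0x04, 1, 2, 3 };
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(fakeUpload));

        byte[] header = peek(in, 4);
        boolean looksLikeZip = header.length == 4 && header[0] == 0x50 && header[1] == 0x4B;

        System.out.println("ZIP signature: " + looksLikeZip);
        System.out.println("bytes still readable: " + in.readAllBytes().length);
    }
}
```

The point of the design is that the header check and the actual parsing share one stream: after peek returns, the stream can be handed to whatever reads the workbook as if nothing had been read from it.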

apache poi copy from very large xls to new workbook sheet

I have a very large xls file that contains two sheets. I want to combine these two sheets into one and copy it to a new workbook, but I get an out-of-memory exception when I try to open this large xls as below:
FileInputStream fis = new FileInputStream(new File("input.xls"));
HSSFWorkbook workbook = new HSSFWorkbook(fis);
I tried using the event API for xls: http://poi.apache.org/spreadsheet/how-to.html#event_api
But with that we can only read the cell values, and here I need to copy them to a new Excel sheet.
Apache POI provides the low-memory-footprint SXSSF API for writing data to xlsx. It does not hold everything in memory at once, which makes it a good fit when working with very large Excel files. You may want to consider this.
