apache poi copy from very large xls to new workbook sheet - java

I have a very large xls file which contains two sheets. I want to combine these two sheets into one and copy the result to a new workbook. But I get an out-of-memory exception when I try to open this large xls as below:
FileInputStream fis = new FileInputStream(new File("input.xls"));
HSSFWorkbook workbook = new HSSFWorkbook(fis);
I tried using the event API for xls: http://poi.apache.org/spreadsheet/how-to.html#event_api
But with that we can only read the cell values, whereas here I need to copy them to a new Excel sheet.

Apache POI provides SXSSF, a low-memory-footprint API for writing data to xlsx. It keeps only a window of rows in memory instead of the whole sheet, so it is a good fit for writing very large Excel files. You may want to consider it for the output side.

Related

Check if Excel file supports Apache POI HSSF or XSSF

I am using the Apache POI HSSF/XSSF library to read Excel files in my Java code.
Now we are struggling because we are not sure which version of Excel we will receive. It can be a very old one or a new one with macros and so on.
Workbook workbook = null;
if (FileUtils.getFileExt(file).equalsIgnoreCase("xls")) {
    workbook = new HSSFWorkbook(new FileInputStream(file));
} else if (FileUtils.getFileExt(file).equalsIgnoreCase("xlsx")) {
    workbook = new XSSFWorkbook(new FileInputStream(file));
}
According to the Excel docs there are more formats, like xlsm and xlt, which we may receive.
Is there any other way to recognize which implementation I should use (HSSF vs XSSF)? Is there any possibility that an xls file will not be supported by HSSF?
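One way to avoid trusting the extension is to look at the first bytes of the file. The sketch below is plain Java with no POI calls; the class name ExcelFormatSniffer and the Kind enum are made up for illustration. It only tells the OLE2 container (used by binary formats such as .xls and .xlt) apart from a ZIP container (used by .xlsx and .xlsm), so a match means "candidate for HSSF" or "candidate for XSSF", not a guarantee that the file is a valid workbook.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI.
final class ExcelFormatSniffer {

    enum Kind { OLE2, ZIP, UNKNOWN }

    // Signature of an OLE2 compound document (.xls, .xlt, but also .doc, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header of a ZIP archive (.xlsx, .xlsm, but also any other zip).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a buffer holding the leading bytes of a file.
    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Kind.OLE2;   // candidate for HSSFWorkbook
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Kind.ZIP;    // candidate for XSSFWorkbook
        }
        return Kind.UNKNOWN;
    }

    // Reads up to 8 bytes from the file and classifies them.
    static Kind sniff(Path file) throws IOException {
        byte[] header = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (filled < header.length) {
                int read = in.read(header, filled, header.length - filled);
                if (read < 0) {
                    break;      // file is shorter than 8 bytes
                }
                filled += read;
            }
        }
        return sniff(Arrays.copyOf(header, filled));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With this, the extension check could become a switch on ExcelFormatSniffer.sniff(file.toPath()), falling back to an error for UNKNOWN.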

Copying multiple Excel sheets from different workbooks into a single workbook

Problem: How can I copy multiple Excel sheets from different workbooks into a single workbook without copying cell by cell? That causes performance issues because I have a lot of data in the sheets. Is there any option to copy whole sheets without iterating over every cell using Java? Just copy the whole sheets into the other workbook.
The VBA macro below will copy every worksheet from the source workbook you set and paste each one at the end of the list of sheets in your target workbook.
Sub CopyWorkbook()
    Dim sh As Worksheet, wb As Workbook
    Set wb = Workbooks("Target workbook")
    For Each sh In Workbooks("source workbook").Worksheets
        sh.Copy After:=wb.Sheets(wb.Sheets.Count)
    Next sh
End Sub

How to distinguish between XSSF and HSSF automatically in Apache POI?

I would like to be able to open an Excel file of arbitrary type. Is it possible to select between HSSFWorkbook and XSSFWorkbook automatically?
Currently I write
Workbook workbook = new HSSFWorkbook(excelFile);
Can I write something universal?
Yes! All you need to do is use WorkbookFactory
As per this part of the docs, it's better to use a File than an InputStream. So, just do something like:
File file = new File("input.xls");
Workbook wb = WorkbookFactory.create(file);
That will create whichever of HSSFWorkbook or XSSFWorkbook your file needs

.getAllPictures() Method doesn't get all the pictures in .xlsx excel file

I can't get all the pictures from an .xlsx file that I made in MS Excel, whereas when I create an .xlsx file with Google Spreadsheets, my code can read all the pictures that I inserted there.
My code is here:
XSSFWorkbook workbook = new XSSFWorkbook("Sample.xlsx");
XSSFSheet sheet = workbook.getSheetAt(0);
List<XSSFPictureData> lst = workbook.getAllPictures();
Is there a way to read ALL the images in an .xlsx file that was created by MS Excel?

Files vs InputStreams

I'm reading an Excel file with POI (3.7). I'm learning about POI from the POI Quick Guide. Now my concern is about this:
When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file
In version 3.7 of POI, WorkbookFactory doesn't have the following method
WorkbookFactory.create(new File("MyExcel.xls"))
and I try to load my file in these ways:
First way
InputStream is = getClass().getResourceAsStream("/MyExcel.xlsx");
Workbook wb = WorkbookFactory.create(is);
Second way
String path = getClass().getResource("/MyExcel.xlsx").getPath();
FileInputStream fis = new FileInputStream(new File(path));
Workbook wb = WorkbookFactory.create(fis);
Now I want to ask you: what is the underlying difference between these three ways of loading an Excel file? Which of them do you suggest?
The first way is good if your file is located in the file system. The second way is actually wrong. If your file is part of your application, i.e. available from the classpath, use the following code:
Workbook wb = WorkbookFactory.create(getClass().getResourceAsStream("/MyExcel.xlsx"));
What's wrong with your code? It may work only if your classes are located directly in the file system. If, however, they are packed into a jar, the line new FileInputStream(new File(path)) will throw FileNotFoundException, because the file does not exist in the file system but is packed into the jar.
Using an InputStream has a higher memory footprint than using a File
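If the File-based path is wanted but the workbook ships inside a jar, one possible workaround is to copy the resource stream to a temporary file first and open that. The following is a minimal sketch using only the JDK; the class and method names (ResourceToTempFile, copyToTempFile) and the "workbook-" prefix are invented for illustration, and whether the extra copy pays off depends on the file size and POI version.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Apache POI.
final class ResourceToTempFile {

    // Copies the whole stream into a new temporary file and returns its path.
    // The stream is closed afterwards; the file is removed when the JVM exits.
    static Path copyToTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("workbook-", suffix);
        try (InputStream src = in) {
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        tmp.toFile().deleteOnExit();
        return tmp;
    }
}

// Possible usage from an instance method, assuming a POI version whose
// WorkbookFactory has create(File) (the question notes that 3.7 does not):
//
//   Path p = ResourceToTempFile.copyToTempFile(
//           getClass().getResourceAsStream("/MyExcel.xlsx"), ".xlsx");
//   Workbook wb = WorkbookFactory.create(p.toFile());
```

This works the same whether the classes are in a directory or packed into a jar, because it never converts the resource URL into a file path.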
