I am trying to validate an Excel file using Java before dumping it to a database.
Here is the code snippet that causes the error.
try {
    fis = new FileInputStream(file);
    wb = new XSSFWorkbook(fis);
    XSSFSheet sh = wb.getSheet("Sheet1");
    for (int i = 0; i < 44; i++) {
        XSSFCell a1 = sh.getRow(1).getCell(i);
        printXSSFCellType(a1);
    }
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Here is the error I get:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(Unknown Source)
at java.util.ArrayList.<init>(Unknown Source)
at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:78)
at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:75)
at org.apache.xmlbeans.impl.values.NamespaceContext.getNamespaceContextStack(NamespaceContext.java:98)
at org.apache.xmlbeans.impl.values.NamespaceContext.push(NamespaceContext.java:106)
at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1273)
at org.apache.xmlbeans.impl.values.XmlObjectBase.stringValue(XmlObjectBase.java:1484)
at org.apache.xmlbeans.impl.values.XmlObjectBase.getStringValue(XmlObjectBase.java:1492)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTCellImpl.getR(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFCell.<init>(XSSFCell.java:105)
at org.apache.poi.xssf.usermodel.XSSFRow.<init>(XSSFRow.java:70)
at org.apache.poi.xssf.usermodel.XSSFSheet.initRows(XSSFSheet.java:179)
at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:143)
at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:130)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:286)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:207)
at com.xls.validate.ExcelValidator.main(ExcelValidator.java:79)
This works perfectly fine when the .xlsx file is less than 1 MB.
I understand this is because my .xlsx file is around 5-10 MB and POI tries to load the entire sheet at once in JVM memory.
What can be a possible workaround?
There are two options available to you. Option #1 - increase the size of your JVM heap, so that Java has more memory available to it. Processing Excel files in POI using the UserModel code is DOM-based, so the whole file (including its parsed form) needs to be buffered into memory. Try a question like this one for advice on how to increase the heap.
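For example, if you run the validator from the command line, something like this raises the maximum heap to 2 GB (the jar name here is made up; the main class is the one from your stack trace, and the size is only an illustration):

java -Xmx2048m -cp excel-validator.jar com.xls.validate.ExcelValidator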
Option #2, which is more work - switch to event-based (SAX) processing. This only processes part of the file at a time, so it needs much, much less memory. However, it requires more work from you, which is why you might be better off throwing a few more GB of memory at the problem - memory is cheap while programmers aren't! The SpreadSheet howto page has instructions on how to do SAX parsing of .xlsx files, and there are various example files provided by POI you can look at for advice.
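To give a flavour of the SAX route, here is a minimal, hedged sketch (the file name is illustrative; the howto page has the complete pattern including a ContentHandler for the sheet XML):

import java.io.File;
import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.XSSFReader;

OPCPackage pkg = OPCPackage.open(new File("big.xlsx"), PackageAccess.READ);
XSSFReader reader = new XSSFReader(pkg);
Iterator<InputStream> sheets = reader.getSheetsData();
while (sheets.hasNext()) {
    try (InputStream sheet = sheets.next()) {
        // feed this sheet's XML into your SAX ContentHandler here,
        // one sheet at a time, so only a small window is ever in memory
    }
}
pkg.close();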
Also, another thing - you seem to be loading a File via a stream, which is bad, as it means even more of the file needs buffering into memory. See the POI documentation for more on this, including instructions on how to work with the File directly.
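For instance, a hedged one-liner for File-based opening (the path is illustrative):

import java.io.File;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

Workbook wb = WorkbookFactory.create(new File("big.xlsx")); // File, not InputStream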
You can use an SXSSF workbook from POI for memory-related issues. Refer here
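Note that SXSSF is a streaming write API. A minimal sketch (the window size of 100 rows, file name and row count are just illustrations):

import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep a 100-row window in memory
Sheet sheet = wb.createSheet();
for (int r = 0; r < 1000000; r++) {
    sheet.createRow(r).createCell(0).setCellValue(r); // rows beyond the window are flushed to disk
}
try (FileOutputStream out = new FileOutputStream("out.xlsx")) {
    wb.write(out);
}
wb.dispose(); // removes the temporary files backing the window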
I faced a similar issue while reading and merging multiple CSVs into a single XLSX file.
I had a total of 3 CSV sheets, each with 30k rows, totalling 90k rows.
It got resolved by using SXSSF as below:
public static void mergeCSVsToXLSX(Long jobExecutionId, Map<String, String> csvSheetNameAndFile, String xlsxFile) {
    try (SXSSFWorkbook wb = new SXSSFWorkbook(100)) { // keep 100 rows in memory, exceeding rows will be flushed to disk
        wb.setCompressTempFiles(true);
        csvSheetNameAndFile.forEach((sheetName, csv) -> {
            try (CSVReader reader = new CSVReader(new FileReader(csv))) {
                SXSSFSheet sheet = wb.createSheet(sheetName);
                sheet.setRandomAccessWindowSize(100);
                String[] nextLine;
                int r = 0;
                while ((nextLine = reader.readNext()) != null) {
                    Row row = sheet.createRow(r++);
                    for (int i = 0; i < nextLine.length; i++) {
                        Cell cell = row.createCell(i);
                        cell.setCellValue(nextLine[i]);
                    }
                }
            } catch (IOException ioException) {
                logger.error("Error in reading CSV file {} for jobId {} with exception {}", csv, jobExecutionId,
                        ioException.getMessage());
            }
        });
        // close the output stream once the workbook has been written
        try (FileOutputStream out = new FileOutputStream(xlsxFile)) {
            wb.write(out);
        }
        wb.dispose();
    } catch (IOException ioException) {
        logger.error("Error in creating workbook for jobId {} with exception {}", jobExecutionId,
                ioException.getMessage());
    }
}
Use the Event API (HSSF only).
The event API is newer than the User API. It is intended for intermediate developers who are willing to learn a little bit of the low-level API structures. It's relatively simple to use, but requires a basic understanding of the parts of an Excel file (or a willingness to learn). The advantage provided is that you can read an XLS with a relatively small memory footprint.
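A minimal, hedged sketch of the Event API (the file name is illustrative, and a complete reader would also capture the SSTRecord so that LabelSSTRecord indexes can be resolved to strings):

import java.io.FileInputStream;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.hssf.record.LabelSSTRecord;
import org.apache.poi.hssf.record.NumberRecord;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("big.xls"));
HSSFRequest request = new HSSFRequest();
request.addListenerForAllRecords(record -> {
    if (record instanceof NumberRecord) {
        NumberRecord n = (NumberRecord) record;
        System.out.println("numeric cell: " + n.getValue());
    } else if (record instanceof LabelSSTRecord) {
        // string cells arrive as indexes into the shared strings table
    }
});
new HSSFEventFactory().processWorkbookEvents(request, fs);
fs.close();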
Well, here's a link with some detailed info about your error and how to fix it: http://javarevisited.blogspot.com/2011/09/javalangoutofmemoryerror-permgen-space.html?m=1.
Let me try to explain your error:
The java.lang.OutOfMemoryError has two variants: one in the Java heap space, and the other in the PermGen space.
Your error could be caused by a memory leak, a low amount of system RAM, or very little RAM allocated to the Java Virtual Machine.
The difference between the two variants is that PermGen space stores the String pool and metadata about loaded classes and methods, while the Java heap space stores object instances. So if you have a lot of strings or classes in your project and not enough allocated/system RAM, you will get an OutOfMemoryError. The default amount of RAM the JVM allocates to PermGen is 64 MB, which is quite a small amount of memory. The linked article explains much more about this error and provides detailed information about how to fix it.
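For example, on pre-Java-8 JVMs both pools can be sized explicitly at startup (the sizes and jar name here are only illustrative):

java -Xmx1024m -XX:MaxPermSize=256m -jar myapp.jar

-Xmx raises the heap limit and -XX:MaxPermSize raises the PermGen limit. (Java 8 removed PermGen in favour of Metaspace, where the corresponding flag is -XX:MaxMetaspaceSize.)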
Hope this helps!
To resolve the OutOfMemoryError, follow this.
You cannot modify existing cells in an SXSSFWorkbook, but you can create a new file along with your modifications using SXSSFWorkbook.
It's possible by passing the workbook object along with the rowAccessWindowSize:

SXSSFWorkbook workbook = new SXSSFWorkbook(new XSSFWorkbook(new FileInputStream(file)), 100);
// Your changes in workbook
workbook.write(out);
I too faced the same OOM issue while parsing an xlsx file... after two days of struggle, I finally found the code below, which worked for me.
This code is based on sjxlsx. It reads the xlsx and stores it in an HSSF sheet.
// read the xlsx file
SimpleXLSXWorkbook workbook = new SimpleXLSXWorkbook(new File("C:/test.xlsx"));
HSSFWorkbook hsfWorkbook = new HSSFWorkbook();
org.apache.poi.ss.usermodel.Sheet hsfSheet = hsfWorkbook.createSheet();
Sheet sheetToRead = workbook.getSheet(0, false);
SheetRowReader reader = sheetToRead.newReader();
Cell[] row;
int rowPos = 0;
while ((row = reader.readRow()) != null) {
    org.apache.poi.ss.usermodel.Row hfsRow = hsfSheet.createRow(rowPos);
    int cellPos = 0;
    for (Cell cell : row) {
        if (cell != null) {
            org.apache.poi.ss.usermodel.Cell hfsCell = hfsRow.createCell(cellPos);
            hfsCell.setCellType(org.apache.poi.ss.usermodel.Cell.CELL_TYPE_STRING);
            hfsCell.setCellValue(cell.getValue());
        }
        cellPos++;
    }
    rowPos++;
}
return hsfSheet;
I have a large .xlsx file (141 MB, containing 293,413 lines with 62 columns each) I need to perform some operations within.
I am having problems with loading this file (OutOfMemoryError), as POI has a large memory footprint on XSSF (.xlsx) workbooks.
This SO question is similar, and the solution presented is to increase the VM's allocated/maximum memory.
It seems to work for that kind of file size (9 MB), but for me it simply doesn't work, even if I allocate all available system memory. (Well, it's no surprise considering the file is over 15 times larger.)
I'd like to know if there is any way to load the workbook such that it won't consume all the memory, and yet without doing the processing based on (going into) XSSF's underlying XML. (In other words, maintaining a puritan POI solution.)
If there isn't, though, you are welcome to say so ("There isn't.") and point me the way to an "XML" solution.
I was in a similar situation with a webserver environment. The typical size of the uploads were ~150k rows and it wouldn't have been good to consume a ton of memory from a single request. The Apache POI Streaming API works well for this, but it requires a total redesign of your read logic. I already had a bunch of read logic using the standard API that I didn't want to have to redo, so I wrote this instead: https://github.com/monitorjbl/excel-streaming-reader
It's not entirely a drop-in replacement for the standard XSSFWorkbook class, but if you're just iterating through rows it behaves similarly:
import com.monitorjbl.xlsx.StreamingReader;
InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
StreamingReader reader = StreamingReader.builder()
        .rowCacheSize(100)  // number of rows to keep in memory (defaults to 10)
        .bufferSize(4096)   // buffer size to use when reading InputStream to file (defaults to 1024)
        .sheetIndex(0)      // index of sheet to use (defaults to 0)
        .read(is);          // InputStream or File for XLSX file (required)

for (Row r : reader) {
    for (Cell c : r) {
        System.out.println(c.getStringCellValue());
    }
}
There are some caveats to using it; due to the way XLSX sheets are structured, not all data is available in the current window of the stream. However, if you're just trying to read simple data out from the cells, it works pretty well for that.
An improvement in memory usage can be made by using a File instead of a Stream.
(It is better to use a streaming API, but the streaming APIs have limitations; see http://poi.apache.org/spreadsheet/index.html)
So instead of
Workbook workbook = WorkbookFactory.create(inputStream);
do
Workbook workbook = WorkbookFactory.create(new File("yourfile.xlsx"));
This is according to : http://poi.apache.org/spreadsheet/quick-guide.html#FileInputStream
Files vs InputStreams
"When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file."
The Excel support in Apache POI, HSSF and XSSF, supports 3 different modes.
One is a full, DOM-Like in-memory "UserModel", which supports both reading and writing. Using the common SS (SpreadSheet) interfaces, you can code for both HSSF (.xls) and XSSF (.xlsx) basically transparently. However, it needs lots of memory.
POI also supports a streaming read-only way to process the files, the EventModel. This is much more low-level than the UserModel, and gets you very close to the file format. For HSSF (.xls) you get a stream of records, and optionally some help with handling them (missing cells, format tracking etc). For XSSF (.xlsx) you get streams of SAX events from the different parts of the file, with help to get the right part of the file and also easy processing of common but small bits of the file.
For XSSF (.xlsx) only, POI also supports a write-only streaming write, suitable for low level but low memory writing. It largely just supports new files though (certain kinds of append are possible). There is no HSSF equivalent, and due to back-and-forth byte offsets and index offsets in many records it would be pretty hard to do...
For your specific case, as described in your clarifying comments, I think you'll want to use the XSSF EventModel code. See the POI documentation to get started, then try looking at these three classes in POI and Tika which use it for more details.
POI now includes an API for these cases: SXSSF (http://poi.apache.org/spreadsheet/index.html).
It does not load everything in memory, so it could allow you to handle such a file.
Note: I have read that SXSSF works as a writing API. Loading should be done using XSSF without streaming the file through an InputStream (to avoid a full load of it in memory).
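For the reading side, a minimal sketch of File-based loading with XSSF (the path is illustrative):

import java.io.File;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

OPCPackage pkg = OPCPackage.open(new File("big.xlsx")); // opened from a File, not a stream
XSSFWorkbook wb = new XSSFWorkbook(pkg);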
Check this post, where I show how to use a SAX parser to process an XLSX file:
https://stackoverflow.com/a/44969009/4587961
In short, I extended org.xml.sax.helpers.DefaultHandler, which processes the XML structure of an XLSX file. It is an event parser - SAX.
class SheetHandler extends DefaultHandler {
    private static final String ROW_EVENT = "row";
    private static final String CELL_EVENT = "c";

    private SharedStringsTable sst;
    private String lastContents;
    private boolean nextIsString;
    private List<String> cellCache = new LinkedList<>();
    private List<String[]> rowCache = new LinkedList<>();

    private SheetHandler(SharedStringsTable sst) {
        this.sst = sst;
    }

    public void startElement(String uri, String localName, String name,
                             Attributes attributes) throws SAXException {
        // c => cell
        if (CELL_EVENT.equals(name)) {
            String cellType = attributes.getValue("t");
            nextIsString = cellType != null && cellType.equals("s");
        }
        // Clear contents cache
        lastContents = "";
    }

    public void endElement(String uri, String localName, String name)
            throws SAXException {
        // Process the last contents as required.
        // Do it now, as characters() may be called more than once
        if (nextIsString) {
            int idx = Integer.parseInt(lastContents);
            lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
            nextIsString = false;
        }
        // v => contents of a cell
        // Output after we've seen the string contents
        if (name.equals("v")) {
            cellCache.add(lastContents);
        } else if (ROW_EVENT.equals(name)) {
            // flush the completed row here, on the row's end element,
            // so the last row of the sheet is not dropped
            if (!cellCache.isEmpty()) {
                rowCache.add(cellCache.toArray(new String[cellCache.size()]));
            }
            cellCache.clear();
        }
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        lastContents += new String(ch, start, length);
    }

    public List<String[]> getRowCache() {
        return rowCache;
    }
}
And then I parse the XML representing the XLSX file:
private List<String[]> processFirstSheet(String filename) throws Exception {
    OPCPackage pkg = OPCPackage.open(filename, PackageAccess.READ);
    XSSFReader r = new XSSFReader(pkg);
    SharedStringsTable sst = r.getSharedStringsTable();

    SheetHandler handler = new SheetHandler(sst);
    XMLReader parser = fetchSheetParser(handler);

    Iterator<InputStream> sheetIterator = r.getSheetsData();
    if (!sheetIterator.hasNext()) {
        pkg.close();
        return Collections.emptyList();
    }

    InputStream sheetInputStream = sheetIterator.next();
    BufferedInputStream bisSheet = new BufferedInputStream(sheetInputStream);
    InputSource sheetSource = new InputSource(bisSheet);
    parser.parse(sheetSource);
    List<String[]> res = handler.getRowCache();
    bisSheet.close();
    pkg.close();
    return res;
}

public XMLReader fetchSheetParser(ContentHandler handler) throws SAXException {
    XMLReader parser = new SAXParser();
    parser.setContentHandler(handler);
    return parser;
}
Based on monitorjbl's answer and the test suite explored from POI, the following worked for me on a multi-sheet xlsx file with 200K records (size > 50 MB):
import com.monitorjbl.xlsx.StreamingReader;
. . .
try (
        InputStream is = new FileInputStream(new File("sample.xlsx"));
        Workbook workbook = StreamingReader.builder().open(is);
) {
    DataFormatter dataFormatter = new DataFormatter();
    for (Sheet sheet : workbook) {
        System.out.println("Processing sheet: " + sheet.getSheetName());
        for (Row row : sheet) {
            for (Cell cell : row) {
                String value = dataFormatter.formatCellValue(cell);
            }
        }
    }
}
For the latest code, use this:

InputStream file = new FileInputStream(
        new File("uploads/" + request.getSession().getAttribute("username") + "/" + userFile));
Workbook workbook = StreamingReader.builder()
        .rowCacheSize(100)  // number of rows to keep in memory
        .bufferSize(4096)   // buffer size to use when reading the InputStream
        .open(file);        // InputStream or File for XLSX file (required)
DataFormatter dataFormatter = new DataFormatter();
Iterator<Row> rowIterator = workbook.getSheetAt(0).rowIterator();
while (rowIterator.hasNext()) {
    Row row = rowIterator.next();
    Iterator<Cell> cellIterator = row.cellIterator();
    while (cellIterator.hasNext()) {
        Cell cell = cellIterator.next();
        String cellValue = dataFormatter.formatCellValue(cell);
    }
}
You can use SXSSF instead of using HSSF. I could generate an Excel file with 200,000 rows.
I need to read several xlsx files looking for data specific to an employee, and simultaneously create another xlsx file (if I find data in any of the files) with the employee ID appended to the name of the file I found the data in. E.g. there is an employee with emp id 1 and there are several xlsx files such as A, B, C... and so on; I need to look for data relating to emp id 1 in each file, and for the files where I get a hit I need to create a file named 1_A.xlsx.
Now, although I have built the logic and am using Apache POI APIs for reading and writing, my code throws an Out Of Memory error after creating just the first file with the data, and it is unable to read the rest of the files.
I have tried using SXSSF instead of XSSF, but the same OOM happens.
Increasing the heap space is not an option for me.
Please help here... Thanks in advance.
Here is a piece of the code:
//Reader:
Row row = null;
List<Row> listOfRecords = new ArrayList<Row>();
try {
    FileInputStream fis = new FileInputStream(metaDataFile);
    new InputStreamReader(fis, "ISO-8859-1");
    XSSFWorkbook wb = new XSSFWorkbook(fis);
    XSSFSheet sheet = wb.getSheetAt(0);
    Iterator<Row> rowIterator = sheet.iterator();
    while (rowIterator.hasNext()) {
        row = rowIterator.next();
        if (!isEmptyRow(row)) {
            listOfRecords.add(row);
        }
    }
    wb.close();
    fis.close();

//Writer
    LOGGER.info("in createWorkbook ");
    Workbook empWorkbook = new SXSSFWorkbook(200);
    Sheet empSheet = empWorkbook.createSheet("Itype Sheet For Emp_"
            + personnelNumber);
    int rowNum = listOfRecords.size();
    System.out.println("Creating excel");
    Cell c = null;
    for (int i = 0; i < rowNum; i++) {
        Row record = listOfRecords.get(i);
        Row empRow = empSheet.createRow(i++);
        if (!isEmptyRow(record)) {
            int colNum = record.getLastCellNum() + 1;
            for (int j = 0; j < colNum; j++) {
                Cell newCell = empRow.createCell(j);
                System.out.println("cellVal:"
                        + String.valueOf(record.getCell(j)));
                newCell.setCellValue(String.valueOf(record.getCell(j)));
            }
        }
    }
The writer method is called from within the reader.
Reading multiple xlsx files is indeed tricky business, but I finally solved it.
I had to break down my code several times to realise that the OOM error was due to the fact that after reading 3 files no more memory was left to process the rest of the files.
xlsx files are compressed XML files. So when we try to read them using the XSSF or SXSSF APIs, the entire DOM is loaded into memory, thereby choking it.
I found an excellent solution here:
https://github.com/monitorjbl/excel-streaming-reader
Hope this will help others who come here facing the same issue.
I am using the Apache POI streaming API (SXSSFWorkbook) to write data to an Excel file.
But the Excel file gets corrupted for more than 100,000 records with 300 columns, generally when the size is greater than 100 MB. Is there any way to write huge data to an Excel file?
class Test1 {
    public static void main(String[] args) throws FileNotFoundException {
        SXSSFWorkbook workbook = new SXSSFWorkbook(100);
        workbook.setCompressTempFiles(true);
        Sheet sheet = null;
        Row row = null;
        Cell cell = null;
        sheet = workbook.createSheet("Demo1");
        FileOutputStream outStream = new FileOutputStream("D:\\Test1.xlsx");
        try {
            for (int i = 0; i < 100000; i++) {
                row = sheet.createRow(i);
                for (int j = 0; j < 300; j++) {
                    cell = row.createCell(j);
                    cell.setCellValue(" row : " + i + " col: " + j);
                }
            }
            workbook.write(outStream);
        } catch (Exception exception) {
            exception.printStackTrace();
        } finally {
            workbook.dispose();
            try {
                outStream.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
}
Edit 1:
What I found is that it is not a problem with the Apache POI streaming API. It generates the file with 1 million records, but Excel does not load that file; it gives a 'There isn't enough memory to complete this action' error.
I am using Excel 2013 32-bit, which can use only up to 2 GB of memory. The Excel file I created with 100k records and 300 columns has a file size of 108 MB. When I try to open this file in Excel, it takes up a whole lot of system memory. As soon as the memory consumption reaches 1.7 GB, Excel gives an error.
What is the minimum configuration to load 1 million rows generated using the Apache streaming API? Any help would be appreciated.
Thanks.
Edit 2:
If I open the Excel file generated using the Apache streaming API in zip format (by renaming .xlsx to .zip), the size of the XML file in the xl/worksheets folder is around 2 GB for 100k records and 300 columns. Is there any way to reduce the size of this XML file?
I have never tried generating more than about 100 to 120 columns myself, but a restriction of 255 columns max is not really surprising (it had been that way with older Excel formats). Your observation that 100k rows with 200 columns works fine while 100k rows with 300 columns fails is a strong indicator of such a restriction.
Then you should be able to generate the 1 million (exactly 1,048,576) row sheet with up to 255 columns. For any extra rows and extra columns you will need to create extra sheets.
So, with your 300-column target, you would generate sheet1 with the first 255 columns (or somewhat fewer if there is a reasonable logical grouping) and sheet2 with the remaining columns.
For more rows, repeat the two-sheet approach with a new pair of sheets until all rows have been generated; a sketch of the row rollover is shown below.
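A minimal sketch of that rollover, under this answer's assumptions (MAX_ROWS/MAX_COLS, totalRows and the cellValue() helper are illustrative, not any POI API):

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

final int MAX_ROWS = 1048576; // hard xlsx row limit per sheet
final int MAX_COLS = 255;     // the column cap assumed above
int totalRows = 2000000;      // illustrative
SXSSFWorkbook wb = new SXSSFWorkbook(1); // window of 1 row; see the note below
Sheet sheet = null;
for (int r = 0; r < totalRows; r++) {
    if (r % MAX_ROWS == 0) {
        sheet = wb.createSheet("part" + (r / MAX_ROWS + 1)); // roll over to a fresh sheet
    }
    Row row = sheet.createRow(r % MAX_ROWS);
    for (int c = 0; c < MAX_COLS; c++) {
        row.createCell(c).setCellValue(cellValue(r, c)); // cellValue() is a hypothetical data source
    }
}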
BTW, did you recognize that with SXSSFWorkbook a rowAccessWindowSize of "1" gives the best performance?
I've been having some issues parsing .xlsx files with Apache POI - I am getting java.lang.OutOfMemoryError: Java heap space in my deployed app. I'm only processing files under 5 MB and around 70,000 rows, so my suspicion from reading a number of other questions is that something is amiss.
As suggested in this comment, I decided to run SSPerformanceTest.java with the suggested variables to see if there is anything wrong with my code or setup. The results show a significant difference between HSSF (.xls) and XSSF (.xlsx):
1) HSSF 50000 50 1: Elapsed 1 seconds
2) SXSSF 50000 50 1: Elapsed 5 seconds
3) XSSF 50000 50 1: Elapsed 15 seconds
The FAQ specifically says:
If you can't run that with 50,000 rows and 50 columns in all of HSSF, XSSF and SXSSF in under 3 seconds (ideally a lot less!), the problem is with your environment.
Next, it says to run XLS2CSV.java, which I have done. Feeding in the XSSF file generated above (with 50,000 rows and 50 columns) takes around 15 seconds - the same amount of time it took to write the file.
Is something wrong with my environment, and if so how do I investigate further?
Stats from VisualVM show the heap used shooting up to 1.2 GB during the processing. Surely this is way too high, considering that's an extra gig on top of the heap compared to before processing began?
Note: the heap space exception mentioned above only happens in production (on Google App Engine) and only for .xlsx files; however, the tests mentioned in this question have all been run on my development machine with -Xmx2g. I'm hoping that if I can fix the problem on my development setup, it will use less memory when I deploy.
Stack trace from app engine:
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.xmlbeans.impl.store.Cur.createElementXobj(Cur.java:260)
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.startElement(Cur.java:2997)
at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3211)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1802)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
I was facing the same kind of issue reading a bulky .xlsx file using Apache POI, and I came across
excel-streaming-reader-github
This library serves as a wrapper around the streaming API while preserving the syntax of the standard POI API.
This library can help you read large files.
The average XLSX workbook I work with is about 18-22 sheets of 750,000 rows with 13-20 columns. This is spinning in a Spring web application with lots of other functionality. I gave the whole application not that much memory: -Xms1024m -Xmx4096m - and it works great!
First of all, the dumping code: it is wrong to load each and every data row into memory and then start dumping it. In my case (reporting from a PostgreSQL database) I reworked the data dump procedure to use a RowCallbackHandler to write to my XLSX; once I reach "my limit" of 750,000 rows, I create a new sheet. And the workbook is created with a visibility window of 50 rows. In this way I am able to dump huge volumes: the size of the XLSX file is about 1230 MB.
Some code to write sheets:
jdbcTemplate.query(
new PreparedStatementCreator() {
@Override
public PreparedStatement createPreparedStatement(Connection connection) throws SQLException {
PreparedStatement statement = connection.prepareStatement(finalQuery, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
statement.setFetchSize(100);
statement.setFetchDirection(ResultSet.FETCH_FORWARD);
return statement;
}
}, new RowCallbackHandler() {
Sheet sheet = null;
int i = 750000;
int tableId = 0;
@Override
public void processRow(ResultSet resultSet) throws SQLException {
if (i == 750000) {
tableId++;
i = 0;
sheet = wb.createSheet(sheetName.concat(String.format("%02d%n", tableId)));
Row r = sheet.createRow(0);
Cell c = r.createCell(0);
c.setCellValue("id");
c = r.createCell(1);
c.setCellValue("Дата");
c = r.createCell(2);
c.setCellValue("Комментарий");
c = r.createCell(3);
c.setCellValue("Сумма операции");
c = r.createCell(4);
c.setCellValue("Дебет");
c = r.createCell(5);
c.setCellValue("Страхователь");
c = r.createCell(6);
c.setCellValue("Серия договора");
c = r.createCell(7);
c.setCellValue("Номер договора");
c = r.createCell(8);
c.setCellValue("Основной агент");
c = r.createCell(9);
c.setCellValue("Кредит");
c = r.createCell(10);
c.setCellValue("Программа");
c = r.createCell(11);
c.setCellValue("Дата начала покрытия");
c = r.createCell(12);
c.setCellValue("Дата планового окончания покрытия");
c = r.createCell(13);
c.setCellValue("Периодичность уплаты взносов");
}
i++;
PremiumEntity e = PremiumEntity.builder()
.Id(resultSet.getString("id"))
.OperationDate(resultSet.getDate("operation_date"))
.Comments(resultSet.getString("comments"))
.SumOperation(resultSet.getBigDecimal("sum_operation").doubleValue())
.DebetAccount(resultSet.getString("debet_account"))
.Strahovatelname(resultSet.getString("strahovatelname"))
.Seria(resultSet.getString("seria"))
.NomPolica(resultSet.getLong("nom_polica"))
.Agentname(resultSet.getString("agentname"))
.CreditAccount(resultSet.getString("credit_account"))
.Program(resultSet.getString("program"))
.PoliciStartDate(resultSet.getDate("polici_start_date"))
.PoliciPlanEndDate(resultSet.getDate("polici_plan_end_date"))
.Periodichn(resultSet.getString("id_periodichn"))
.build();
Row r = sheet.createRow(i);
Cell c = r.createCell(0);
c.setCellValue(e.getId());
if (e.getOperationDate() != null) {
c = r.createCell(1);
c.setCellStyle(dateStyle);
c.setCellValue(e.getOperationDate());
}
c = r.createCell(2);
c.setCellValue(e.getComments());
c = r.createCell(3);
c.setCellValue(e.getSumOperation());
c = r.createCell(4);
c.setCellValue(e.getDebetAccount());
c = r.createCell(5);
c.setCellValue(e.getStrahovatelname());
c = r.createCell(6);
c.setCellValue(e.getSeria());
c = r.createCell(7);
c.setCellValue(e.getNomPolica());
c = r.createCell(8);
c.setCellValue(e.getAgentname());
c = r.createCell(9);
c.setCellValue(e.getCreditAccount());
c = r.createCell(10);
c.setCellValue(e.getProgram());
if (e.getPoliciStartDate() != null) {
c = r.createCell(11);
c.setCellStyle(dateStyle);
c.setCellValue(e.getPoliciStartDate());
}
;
if (e.getPoliciPlanEndDate() != null) {
c = r.createCell(12);
c.setCellStyle(dateStyle);
c.setCellValue(e.getPoliciPlanEndDate());
}
c = r.createCell(13);
c.setCellValue(e.getPeriodichn());
}
});
After reworking my code on dumping the data to XLSX, I came to the problem that it requires a 64-bit Office to open the files. So I needed to split my workbook with lots of sheets into separate XLSX files with single sheets to make them readable on an average machine. And again I used small visibility windows and streamed processing, and kept the whole application working well without any signs of OutOfMemory.
Some code to read and split sheets:
OPCPackage opcPackage = OPCPackage.open(originalFile, PackageAccess.READ);
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(opcPackage);
XSSFReader xssfReader = new XSSFReader(opcPackage);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
int index = 0;
while (iter.hasNext()) {
    InputStream stream = iter.next();
    String sheetName = iter.getSheetName();

    DataFormatter formatter = new DataFormatter();
    InputSource sheetSource = new InputSource(stream);

    SheetToWorkbookSaver saver = new SheetToWorkbookSaver(sheetName);
    try {
        XMLReader sheetParser = SAXHelper.newXMLReader();
        ContentHandler handler = new XSSFSheetXMLHandler(
                styles, null, strings, saver, formatter, false);
        sheetParser.setContentHandler(handler);
        sheetParser.parse(sheetSource);
    } catch (ParserConfigurationException e) {
        throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
    }
    stream.close();

    // this creates new File descriptors inside storage
    FileDto partFile = new FileDto("report_".concat(StringUtils.trimToEmpty(sheetName)).concat(".xlsx"));
    File cloneFile = fileStorage.read(partFile);
    FileOutputStream cloneFos = new FileOutputStream(cloneFile);
    saver.getWb().write(cloneFos);
    cloneFos.close();
}
and
public class SheetToWorkbookSaver implements XSSFSheetXMLHandler.SheetContentsHandler {

    private SXSSFWorkbook wb;
    private Sheet sheet;
    private CellStyle dateStyle;
    private Row currentRow;

    public SheetToWorkbookSaver(String workbookName) {
        this.wb = new SXSSFWorkbook(50);
        this.dateStyle = this.wb.createCellStyle();
        this.dateStyle.setDataFormat(this.wb.getCreationHelper().createDataFormat().getFormat("dd.mm.yyyy"));
        this.sheet = this.wb.createSheet(workbookName);
    }

    @Override
    public void startRow(int rowNum) {
        this.currentRow = this.sheet.createRow(rowNum);
    }

    @Override
    public void endRow(int rowNum) {
    }

    @Override
    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        int thisCol = (new CellReference(cellReference)).getCol();
        Cell c = this.currentRow.createCell(thisCol);
        c.setCellValue(formattedValue);
        c.setCellComment(comment);
    }

    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
    }

    public SXSSFWorkbook getWb() {
        return wb;
    }
}
So it reads and writes data. I guess in your case you should rework your code to the same pattern: keep only a small footprint of data in memory. So I would suggest, for reading, creating a custom SheetContentsHandler which pushes data to some database, where it can be easily processed, aggregated, etc.
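As a hedged illustration of that suggestion (this is not the code above; the staging table, statement and batch size are made up):

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;

public class DbSavingHandler implements XSSFSheetXMLHandler.SheetContentsHandler {
    // e.g. prepared from "INSERT INTO staging_rows (row_num, row_data) VALUES (?, ?)" - hypothetical table
    private final PreparedStatement insert;
    private final List<String> currentRow = new ArrayList<>();

    public DbSavingHandler(PreparedStatement insert) {
        this.insert = insert;
    }

    @Override
    public void startRow(int rowNum) {
        currentRow.clear();
    }

    @Override
    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        currentRow.add(formattedValue);
    }

    @Override
    public void endRow(int rowNum) {
        try {
            insert.setInt(1, rowNum);
            insert.setString(2, String.join("|", currentRow));
            insert.addBatch();
            if (rowNum > 0 && rowNum % 1000 == 0) {
                insert.executeBatch(); // flush periodically so rows never pile up in memory
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // not needed for a plain data dump
    }
}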
I am absolutely new to Eclipse Java coding. I am trying to finish a project for managing inventory. The part I am having trouble with is that when I try to write the items into the Excel cells, I get errors saying that the array is out of bounds.
PS: item and item.getPartname etc. are all defined in another class file.
Please help. Thanks.
FileOutputStream os = new FileOutputStream("orderreceipt");

//Create a new workbook for writing data
HSSFWorkbook wb2 = new HSSFWorkbook();

//Create a new sheet:
HSSFSheet newsheet = wb2.createSheet("MyNewSheet");

//Create a new row:
for (int i = 0; i < 6; i++) {
    HSSFRow newrow = newsheet.createRow(i);
    sentorder item = (sentorder) items.get(i);
    for (short j = 0; j < 5; j++) {
        HSSFCell cell = newrow.createCell(j);
        cell.setCellValue(item.getPartname());
        cell.setCellValue(item.getPartnumber());
        cell.setCellValue(item.getQuantity());
        cell.setCellValue(new Date());
        HSSFCellStyle styleOfCell = wb2.createCellStyle();
        styleOfCell.setDataFormat(HSSFDataFormat
                .getBuiltinFormat("m/d/yy"));
        styleOfCell.setFillForegroundColor(HSSFColor.AQUA.index);
        styleOfCell.setFillPattern(HSSFCellStyle.BORDER_THIN);
        cell.setCellStyle(styleOfCell);
    }
}
wb2.write(os);
I can see quite a few problems with the attached code:
Missing file extension when creating the new FileOutputStream - since you're generating an .xls workbook, you'd probably like to store it in an XLS file (the extension is not added automatically); also, make sure you have a proper file path to a directory you have write permissions for (a local application dir, as in this case, should be OK though).
As already mentioned, you are re-setting the same cell value 4 times.
You are creating the same cell style multiple times (this is not cached behind the scenes, and there is a fairly limited number of cell styles which can be created, so if you were generating a couple of thousand rows you might get into trouble).
You don't flush() and close() the stream after writing your workbook. Streams in Java are precious resources which need to be manually closed.
Without a stack trace it's difficult to say 100% where the ArrayOutOfBounds issue you're seeing is coming from; however, my guess would be that you're trying to access an item (from the items collection) with an index that doesn't exist, which is a consequence of driving your report data from row indexes instead of the list of items you have.
Also, since you're quite new to Java, a couple of guidelines which will hopefully allow you to produce better and less error-prone code in the future:
Use proper Java naming conventions - please follow the standard Java naming convention (http://java.about.com/od/javasyntax/a/nameconventions.htm); your code will be easier to read and reason about (especially when you're looking for help from the community) - i.e. the sentorder class should be named SentOrder.
Try to split your code into smaller, more testable modules, i.e. you can have a helper createDataRow method called from your main method; in general, having more than a couple of inner loops in one method makes it incredibly difficult to test, debug and reason about.
Unless you really need to generate the .xls format, consider using the XSSF* classes for generating an xlsx document - it has many improvements over HSSF* (including much better dataFormat support).
With those in mind, I've rewritten your example:
public void improved(List<SentOrder> items) throws IOException {
    HSSFWorkbook workbook = new HSSFWorkbook();
    HSSFSheet sheet = workbook.createSheet("MyNewSheet");

    HSSFCellStyle styleOfCell = workbook.createCellStyle();
    styleOfCell.setDataFormat(HSSFDataFormat.getBuiltinFormat("m/d/yy"));
    styleOfCell.setFillForegroundColor(HSSFColor.AQUA.index);
    styleOfCell.setFillPattern(HSSFCellStyle.BORDER_THIN);

    int rowIndex = 0;
    for (SentOrder item : items) {
        HSSFRow row = sheet.createRow(rowIndex++);

        HSSFCell nameCell = row.createCell(0);
        nameCell.setCellValue(item.getPartName());

        HSSFCell numberCell = row.createCell(1);
        numberCell.setCellValue(item.getPartNumber());

        HSSFCell quantityCell = row.createCell(2);
        quantityCell.setCellValue(item.getQuantity());

        HSSFCell dateCell = row.createCell(3);
        dateCell.setCellValue(new Date());
        dateCell.setCellStyle(styleOfCell);
    }

    FileOutputStream os = new FileOutputStream("order_receipt.xls");
    try {
        workbook.write(os);
    } finally {
        os.flush();
        os.close();
    }
}