API for Processing large XLSX file - java

I have a question: is there any API that can process both .xlsx and .xls files? My requirement is this: I have an Excel file and need to encrypt the values in some specific columns without affecting cell formatting such as cell color, formulas, date formats, currency formats, charts, etc. I tried the Apache POI library without success: it is very slow and does not work on large files. I also searched on Google but didn't find a suitable answer.

An alternative to POI is JXL.
We used it successfully with rather large files.

Related

How to get the cell type using jOpenDocument while reading an ODS cell?

I need to parse an .ods document into string values. If a cell holds a date, a currency, or any format other than number and text, I need to read the data according to the cell type and convert it to a String myself. I am doing it this way because I don't want the data to be converted implicitly when the value is read normally. Please help me fetch the cell data based on the cell type, and also detect whether a cell contains a formula.
Note: the code should be written in Java using the jOpenDocument jar, since Apache POI cannot parse OpenDocument formats. I already have code for parsing .xls and .xlsx, written with Apache POI.
Thanks in advance for your reply.
You can use MutableCell.getValue() to get the value of a cell and MutableCell.getValueType() to get its data type, which tells you which conversion to apply.

Java API: automated formulas in Excel with trash output [duplicate]

I am filling cells of an Excel file using Apache POI, and there are a lot of formula cells in the document. However, their values are not refreshed when I open the document in Excel.
It's my understanding that I need to use a FormulaEvaluator to refresh formula cells. Is there a way, though, to update all formula cells at once? There are a lot of them, and while making an exhaustive list is not out of the question, it's certainly not something I'm very willing to do.
Sure. Refreshing all the formulas in a workbook is possibly the more typical use case anyway.
If you're using HSSF, call evaluateAllFormulaCells:
HSSFFormulaEvaluator.evaluateAllFormulaCells(hssfWorkbook)
If you're using XSSF, call evaluateAllFormulaCells:
XSSFFormulaEvaluator.evaluateAllFormulaCells(xssfWorkbook)
More details are available on the POI website.
wb.setForceFormulaRecalculation(true);
// replace "wb" with your HSSFWorkbook/XSSFWorkbook object
https://poi.apache.org/apidocs/org/apache/poi/hssf/usermodel/HSSFWorkbook.html#setForceFormulaRecalculation-boolean-
https://poi.apache.org/apidocs/org/apache/poi/xssf/usermodel/XSSFWorkbook.html#setForceFormulaRecalculation-boolean-

Parsing xlsx files as chunks via streaming/pagination strategy using apache poi

I have a case where .xlsx and .xlsm files holding a huge amount of data (on the order of 80-100 MB) cause out-of-heap-memory errors on servers when they are loaded with the load() method of the Workbook object, which takes a FileInputStream as a parameter.
The intent is to load the data, validate the cell content, and report an error if there is an invalid record. If all the data is correct, it is then written to the table. Hence, the following question did not serve my purpose:
Error While Reading Large Excel Files (xlsx) Via Apache POI
The problem involves paginated parsing, data validating and then writing to database.
As xlsx files are in zip format containing content XML, you may remove pages by a simple parsing/discarding, creating a smaller content XML. Then create a smaller xlsx and use Apache POI. Use a test xlsx to develop the parsing. The XML in general has no line breaks or indentation; so an XML beautifier / tree editor might help. Excel uses shared strings so the actual content is hard to see.
Use a zip file system (URLs "jar:file://... .xlsx") to operate on the xlsx.
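A minimal sketch of the zip file system idea, using only the JDK. The class name, the writeSample helper, and the stand-in package contents are hypothetical; a real workbook contains more parts than the two worksheet entries written here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class XlsxZipListing {

    /** Lists the worksheet parts stored under /xl/worksheets in an .xlsx package. */
    static List<String> listSheetParts(Path xlsx) throws IOException {
        List<String> parts = new ArrayList<>();
        // Mount the .xlsx as a file system over its zip contents.
        try (FileSystem zip = FileSystems.newFileSystem(xlsx, (ClassLoader) null)) {
            Path dir = zip.getPath("/xl/worksheets");
            try (Stream<Path> entries = Files.list(dir)) {
                entries.filter(Files::isRegularFile) // skip sub-folders such as _rels
                       .map(Path::toString)
                       .sorted()
                       .forEach(parts::add);
            }
        }
        return parts;
    }

    /** Writes a tiny stand-in package (hypothetical content, not a complete workbook). */
    static void writeSample(Path target) throws IOException {
        String[] names = {"xl/worksheets/sheet1.xml", "xl/worksheets/sheet2.xml"};
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.write("<worksheet/>".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("sample", ".xlsx");
        writeSample(sample);
        System.out.println(listSheetParts(sample));
        Files.delete(sample);
    }
}
```

Once the package is mounted this way, Files.newInputStream on a single sheet part gives a stream that can be parsed or filtered without loading the whole workbook.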
A StAX parser is a good approach to this situation.
https://docs.oracle.com/javase/tutorial/jaxp/stax/index.html
We can iterate over the sheets to obtain the index of the value in each cell, and use the SharedStringsTable object to get the value at a particular cell location.
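A rough sketch of that approach with the JDK's own StAX API. Note the swap: instead of POI's SharedStringsTable it keeps the shared strings in a plain List, and the class and method names are hypothetical. It assumes the usual sheet layout where a cell `<c r="A1" t="s">` stores an index into sharedStrings.xml inside `<v>`; inline strings and phonetic runs are not treated specially.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxSheetReader {

    private static XMLStreamReader open(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE); // merge adjacent text events
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);  // spreadsheet parts need no DTD
        return factory.createXMLStreamReader(xml);
    }

    /** Collects the text of each <si> entry of sharedStrings.xml, in index order. */
    static List<String> readSharedStrings(InputStream xml) throws XMLStreamException {
        List<String> strings = new ArrayList<>();
        XMLStreamReader reader = open(xml);
        StringBuilder text = new StringBuilder();
        boolean inT = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("si")) text.setLength(0);
                else if (name.equals("t")) inT = true;
            } else if (event == XMLStreamConstants.CHARACTERS && inT) {
                text.append(reader.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("t")) inT = false;
                else if (name.equals("si")) strings.add(text.toString());
            }
        }
        reader.close();
        return strings;
    }

    /** Streams a worksheet part and hands each (cell reference, value) pair to the handler. */
    static void readCells(InputStream sheetXml, List<String> shared,
                          BiConsumer<String, String> handler) throws XMLStreamException {
        XMLStreamReader reader = open(sheetXml);
        String ref = null;
        String type = null;
        StringBuilder value = null;
        boolean inV = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("c")) {          // new cell: remember its reference and type
                    ref = reader.getAttributeValue(null, "r");
                    type = reader.getAttributeValue(null, "t");
                    value = null;
                } else if (name.equals("v")) {   // stored value of the current cell
                    value = new StringBuilder();
                    inV = true;
                }
            } else if (event == XMLStreamConstants.CHARACTERS && inV) {
                value.append(reader.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("v")) {
                    inV = false;
                } else if (name.equals("c") && value != null) {
                    String raw = value.toString();
                    // t="s" means the stored value is an index into the shared strings.
                    handler.accept(ref, "s".equals(type) ? shared.get(Integer.parseInt(raw)) : raw);
                }
            }
        }
        reader.close();
    }

    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws XMLStreamException {
        String ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        String sst = "<sst xmlns='" + ns + "'><si><t>name</t></si><si><t>Alice</t></si></sst>";
        String sheet = "<worksheet xmlns='" + ns + "'><sheetData>"
                + "<row r='1'><c r='A1' t='s'><v>0</v></c><c r='B1'><v>42</v></c></row>"
                + "<row r='2'><c r='A2' t='s'><v>1</v></c></row>"
                + "</sheetData></worksheet>";
        List<String> shared = readSharedStrings(stream(sst));
        readCells(stream(sheet), shared, (ref, value) -> System.out.println(ref + " = " + value));
    }
}
```

Because the handler sees one cell at a time, validation can run per row and only the shared strings list has to stay in memory, which is the point of streaming for files in the 80-100 MB range.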

Using Apache POI can I stop users copying formatting?

I have made an excel sheet which is generated by Java. The cells can only accept certain values, depending on data validation done against lists on a separate sheet.
This all works great, but if a user copies some values from another cell and pastes it into the cell it avoids validation... is there any way to prevent this?
You can do this by setting data format for that cell.
style = wb.createCellStyle();
style.setDataFormat(wb.createDataFormat().getFormat("0.000%"));
which will display the value as a percentage.
For documentation, go to:
Apache HSSF Doc

File size issue while reading and writing with POI

I am using the Apache POI API to work with my spreadsheet files.
I have observed that if we edit an existing .xls file, its size is not the same as when that same file (with the same data) is written in one go.
It is normal for an Excel spreadsheet to grow after being opened or edited. When a spreadsheet is opened in Microsoft Excel the formulas are automatically calculated, so this increases the size of the file. If a spreadsheet is opened by Apache POI, it is up to the developer to call the FormulaEvaluator to update all the values. When a spreadsheet is read by Apache POI and the formulas have not been evaluated, formula answers may be invalid.
POI will always write out one record per cell.
Excel, however, will sometimes bunch several similar sequential cells up into a single record. For example, if you have 3 cells in a row that are blank but styled, then Excel will generate a MulBlankRecord which holds all of them. For several cells in a row with simple numbers in them, Excel uses a MulRKRecord.
When POI reads in a file, it expands all the Mul* records out. At write time, the individual cell records are written, so the file gets slightly bigger. I think there's an entry in the POI bugzilla for the enhancement to get POI to coalesce cells into Mul records, but no-one seems to have volunteered to work on it yet...
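To get a feel for how much "slightly bigger" is, here is a back-of-the-envelope calculation. It assumes the BIFF8 record layouts as I understand them from the .xls format documentation: a 4-byte header per record, 6 bytes of data for a Blank record, 10 bytes for an RK record, and Mul* records that store the row and the first and last column once. The class and method names are hypothetical.

```java
class MulRecordSizes {

    static final int HEADER = 4; // record id (2 bytes) + data length (2 bytes)

    /** One Blank record per cell: row (2) + column (2) + style index (2). */
    static int blankRecords(int cells) {
        return cells * (HEADER + 6);
    }

    /** One MulBlank record: row (2) + first column (2) + style index per cell (2 each) + last column (2). */
    static int mulBlankRecord(int cells) {
        return HEADER + 2 + 2 + 2 * cells + 2;
    }

    /** One RK record per cell: row (2) + column (2) + style index (2) + RK number (4). */
    static int rkRecords(int cells) {
        return cells * (HEADER + 10);
    }

    /** One MulRK record: row (2) + first column (2) + (style index + RK number) per cell (6 each) + last column (2). */
    static int mulRkRecord(int cells) {
        return HEADER + 2 + 2 + 6 * cells + 2;
    }

    public static void main(String[] args) {
        int cells = 3;
        System.out.println("blank cells:  " + blankRecords(cells) + " bytes expanded, "
                + mulBlankRecord(cells) + " bytes coalesced");
        System.out.println("number cells: " + rkRecords(cells) + " bytes expanded, "
                + mulRkRecord(cells) + " bytes coalesced");
    }
}
```

Under these assumptions, three styled blank cells take 30 bytes as separate records versus 16 bytes in one MulBlank record, so the growth per cell is small but adds up on sheets with long runs of similar cells.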
