How to protect formulas in a column during JXLS transformation - java

Attached is a snapshot of the .xlsx template I use to write data with JXLS. I write data only to column 1, but in the downloaded file the formulas in column 4 have changed. How can I protect the formulas in a whole column from the JXLS transformation?
Below is the code snippet I use to write the data:
XLSTransformer transformer = new XLSTransformer();
Workbook destWorkbook = null;
// try-with-resources closes the template stream once the transformation is done
try (InputStream is = new BufferedInputStream(new FileInputStream(templateFile))) {
    destWorkbook = transformer.transformXLS(is, beans);
} catch (ParsePropertyException | InvalidFormatException | IOException e) {
    throw new LoyException(e);
}
return destWorkbook;
I tried locking the cells and protecting the sheet, but it didn't work.
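One possible workaround is to post-process the workbook returned by transformXLS with plain POI and copy the column's formulas back from the template. This is a sketch under explicit assumptions, not a JXLS feature: it assumes POI 4.x or later (where getCellType() returns the CellType enum), that the template can be opened a second time as a Workbook, and that the transformation does not insert or delete rows, so template row i still corresponds to output row i. The class and method names are hypothetical.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

// Hypothetical helper: copies the formulas of one column from the template
// back into the transformed workbook. Assumes rows were not shifted.
class FormulaColumnRestorer {

    /** Returns the number of formula cells that were restored. */
    static int restoreColumn(Workbook template, Workbook output, int sheetIndex, int columnIndex) {
        Sheet src = template.getSheetAt(sheetIndex);
        Sheet dst = output.getSheetAt(sheetIndex);
        int restored = 0;
        for (Row srcRow : src) {
            Cell srcCell = srcRow.getCell(columnIndex);
            if (srcCell == null || srcCell.getCellType() != CellType.FORMULA) {
                continue; // only formula cells are copied back
            }
            Row dstRow = dst.getRow(srcRow.getRowNum());
            if (dstRow == null) {
                dstRow = dst.createRow(srcRow.getRowNum());
            }
            Cell dstCell = dstRow.getCell(columnIndex);
            if (dstCell == null) {
                dstCell = dstRow.createCell(columnIndex);
            }
            dstCell.setCellFormula(srcCell.getCellFormula());
            restored++;
        }
        // Restored formulas have no up-to-date cached values, so ask Excel to recalculate on open
        output.setForceFormulaRecalculation(true);
        return restored;
    }
}
```

It would be called right after transformXLS returns, e.g. `FormulaColumnRestorer.restoreColumn(templateWorkbook, destWorkbook, 0, 3)`; column 4 is index 3 because POI column indexes are 0-based.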


How to read large Excel file using POI in Java? [duplicate]

I have a large .xlsx file (141 MB, containing 293,413 rows with 62 columns each) on which I need to perform some operations.
I am having problems loading this file (OutOfMemoryError), as POI has a large memory footprint for XSSF (.xlsx) workbooks.
This SO question is similar, and the solution presented there is to increase the VM's allocated/maximum memory.
That seems to work for that file size (9 MB), but for me it simply doesn't work, even if I allocate all available system memory. (That's no surprise, considering my file is over 15 times larger.)
I'd like to know whether there is any way to load the workbook without consuming all the memory, and yet without processing XSSF's underlying XML directly. (In other words, I'd like to keep a pure POI solution.)
If there isn't, you are welcome to say so ("There isn't.") and point me to an XML-based solution.
I was in a similar situation in a web-server environment. Typical uploads were ~150k rows, and it wouldn't have been good to consume a ton of memory on a single request. The Apache POI streaming API works well for this, but it requires a total redesign of your read logic. I already had a lot of read logic using the standard API that I didn't want to redo, so I wrote this instead: https://github.com/monitorjbl/excel-streaming-reader
It's not entirely a drop-in replacement for the standard XSSFWorkbook class, but if you're just iterating through rows it behaves similarly:
import com.monitorjbl.xlsx.StreamingReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
StreamingReader reader = StreamingReader.builder()
        .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
        .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
        .sheetIndex(0)        // index of sheet to use (defaults to 0)
        .read(is);            // InputStream or File for XLSX file (required)
for (Row r : reader) {
    for (Cell c : r) {
        System.out.println(c.getStringCellValue());
    }
}
There are some caveats to using it: because of the way XLSX sheets are structured, not all data is available in the current window of the stream. However, if you're just trying to read simple data out of the cells, it works pretty well.
An improvement in memory usage can be achieved by using a File instead of a stream.
(A streaming API is better, but the streaming APIs have limitations; see http://poi.apache.org/spreadsheet/index.html)
So instead of
Workbook workbook = WorkbookFactory.create(inputStream);
do
Workbook workbook = WorkbookFactory.create(new File("yourfile.xlsx"));
This is according to: http://poi.apache.org/spreadsheet/quick-guide.html#FileInputStream
Files vs InputStreams
"When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file."
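A minimal sketch of the File-based approach, assuming POI 4.x or later: WorkbookFactory.create(File, String, boolean) is given the File directly and, with readOnly set to true, opens it read-only, which is all a reader needs. The helper class and method names are illustrative.

```java
import java.io.File;
import java.io.IOException;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

class FileBackedOpen {

    /** Opens the workbook from a File (not an InputStream) and returns the row count of each sheet. */
    static int[] countRows(File file) throws IOException {
        // null password, readOnly = true: POI reads the file directly instead of buffering a stream
        try (Workbook workbook = WorkbookFactory.create(file, null, true)) {
            int[] counts = new int[workbook.getNumberOfSheets()];
            for (int i = 0; i < counts.length; i++) {
                counts[i] = workbook.getSheetAt(i).getPhysicalNumberOfRows();
            }
            return counts;
        }
    }
}
```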
The Excel support in Apache POI, HSSF and XSSF, supports 3 different modes.
One is a full, DOM-Like in-memory "UserModel", which supports both reading and writing. Using the common SS (SpreadSheet) interfaces, you can code for both HSSF (.xls) and XSSF (.xlsx) basically transparently. However, it needs lots of memory.
POI also supports a streaming read-only way to process the files, the EventModel. This is much more low-level than the UserModel, and gets you very close to the file format. For HSSF (.xls) you get a stream of records, and optionally some help with handling them (missing cells, format tracking etc). For XSSF (.xlsx) you get streams of SAX events from the different parts of the file, with help to get the right part of the file and also easy processing of common but small bits of the file.
For XSSF (.xlsx) only, POI also supports a write-only streaming write, suitable for low level but low memory writing. It largely just supports new files though (certain kinds of append are possible). There is no HSSF equivalent, and due to back-and-forth byte offsets and index offsets in many records it would be pretty hard to do...
For your specific case, as described in your clarifying comments, I think you'll want to use the XSSF EventModel code. See the POI documentation to get started, then try looking at these three classes in POI and Tika which use it for more details.
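For the write-only streaming mode (SXSSF) mentioned above, here is a minimal sketch assuming POI 4.x or later; the class name, sheet name and sample data are illustrative. SXSSFWorkbook(window) keeps only the most recent `window` rows in memory and flushes older rows to a temporary file, which dispose() deletes afterwards.

```java
import java.io.IOException;
import java.io.OutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

class StreamingWriter {

    /** Writes rowCount rows of sample data while keeping at most `window` rows in memory. */
    static void writeRows(OutputStream out, int rowCount, int window) throws IOException {
        SXSSFWorkbook wb = new SXSSFWorkbook(window); // rows beyond the window go to a temp file
        try {
            Sheet sheet = wb.createSheet("data");
            for (int r = 0; r < rowCount; r++) {
                Row row = sheet.createRow(r);
                row.createCell(0).setCellValue(r);
                row.createCell(1).setCellValue("row " + r);
            }
            wb.write(out);
        } finally {
            wb.dispose(); // delete the temporary files that backed the flushed rows
            wb.close();
        }
    }
}
```

Note that flushed rows can no longer be accessed, which is why this mode largely suits writing new files from top to bottom.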
POI now includes an API for these cases, SXSSF: http://poi.apache.org/spreadsheet/index.html
It does not load everything into memory, so it could allow you to handle such a file.
Note: I have read that SXSSF is a write-only API. Loading should be done with XSSF, opening the workbook from a File rather than an InputStream (to avoid buffering the whole file in memory).
Check this post, where I show how to use a SAX parser to process an XLSX file:
https://stackoverflow.com/a/44969009/4587961
In short, I extended org.xml.sax.helpers.DefaultHandler, which processes the XML structure of XLSX files. It is an event (SAX) parser.
import java.util.LinkedList;
import java.util.List;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SheetHandler extends DefaultHandler {
    private static final String ROW_EVENT = "row";
    private static final String CELL_EVENT = "c";
    private static final String VALUE_EVENT = "v";
    private final SharedStringsTable sst;
    private String lastContents = "";
    private boolean nextIsString;
    private final List<String> cellCache = new LinkedList<>();
    private final List<String[]> rowCache = new LinkedList<>();

    SheetHandler(SharedStringsTable sst) {
        this.sst = sst;
    }

    @Override
    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        if (CELL_EVENT.equals(name)) {
            // c => cell; t="s" means <v> holds an index into the shared strings table
            nextIsString = "s".equals(attributes.getValue("t"));
        } else if (ROW_EVENT.equals(name)) {
            cellCache.clear();
        }
        // Clear contents cache
        lastContents = "";
    }

    @Override
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        if (VALUE_EVENT.equals(name)) {
            // v => contents of a cell; resolve it now, as characters() may be called more than once
            if (nextIsString) {
                int idx = Integer.parseInt(lastContents);
                lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
            }
            cellCache.add(lastContents);
        } else if (ROW_EVENT.equals(name) && !cellCache.isEmpty()) {
            // Flush on </row> so that the last row of the sheet is kept as well
            rowCache.add(cellCache.toArray(new String[0]));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        lastContents += new String(ch, start, length);
    }

    public List<String[]> getRowCache() {
        return rowCache;
    }
}
And then I parse the XML representing the XLSX file:
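import java.io.BufferedInputStream;
import java.io.InputStream;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

private List<String[]> processFirstSheet(String filename) throws Exception {
    OPCPackage pkg = OPCPackage.open(filename, PackageAccess.READ);
    try {
        XSSFReader r = new XSSFReader(pkg);
        SharedStringsTable sst = r.getSharedStringsTable();
        SheetHandler handler = new SheetHandler(sst);
        XMLReader parser = fetchSheetParser(handler);
        Iterator<InputStream> sheetIterator = r.getSheetsData();
        if (!sheetIterator.hasNext()) {
            return Collections.emptyList();
        }
        // try-with-resources closes the sheet stream even if parsing fails
        try (InputStream bisSheet = new BufferedInputStream(sheetIterator.next())) {
            parser.parse(new InputSource(bisSheet));
        }
        return handler.getRowCache();
    } finally {
        pkg.revert(); // release the file; the package was opened read-only, so nothing is saved
    }
}

public XMLReader fetchSheetParser(ContentHandler handler)
        throws SAXException, ParserConfigurationException {
    // JDK SAX parser; not namespace-aware by default, so names arrive as "row", "c", "v"
    XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    parser.setContentHandler(handler);
    return parser;
}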
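The same event-driven idea can be tried without POI or a real file. This sketch uses only the JDK's SAX parser on a hand-written sheet XML string; a plain List<String> stands in for SharedStringsTable, and the class name is hypothetical. Flushing each row on </row>, rather than when the next <row> starts, means the final row is kept as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class MiniSheetParser {

    /** Collects <v> values per <row>; t="s" cells are resolved through the shared-strings list. */
    static List<List<String>> parse(String sheetXml, List<String> sharedStrings) throws Exception {
        List<List<String>> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private List<String> currentRow;
            private final StringBuilder text = new StringBuilder();
            private boolean sharedString;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("row".equals(qName)) {
                    currentRow = new ArrayList<>();
                } else if ("c".equals(qName)) {
                    sharedString = "s".equals(attrs.getValue("t"));
                } else if ("v".equals(qName)) {
                    text.setLength(0); // characters() may be called several times per value
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName)) {
                    String value = text.toString();
                    currentRow.add(sharedString ? sharedStrings.get(Integer.parseInt(value)) : value);
                } else if ("row".equals(qName)) {
                    rows.add(currentRow);
                }
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(sheetXml)));
        return rows;
    }
}
```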
Based on monitorjbl's answer and POI's test suite, the following worked for me on a multi-sheet xlsx file with 200K records (size > 50 MB):
import com.monitorjbl.xlsx.StreamingReader;
. . .
try (
InputStream is = new FileInputStream(new File("sample.xlsx"));
Workbook workbook = StreamingReader.builder().open(is);
) {
DataFormatter dataFormatter = new DataFormatter();
for (Sheet sheet : workbook) {
System.out.println("Processing sheet: " + sheet.getSheetName());
for (Row row : sheet) {
for (Cell cell : row) {
String value = dataFormatter.formatCellValue(cell);
}
}
}
}
For the latest version of the library, use this:
DataFormatter dataFormatter = new DataFormatter();
InputStream file = new FileInputStream(
        new File("uploads/" + request.getSession().getAttribute("username") + "/" + userFile));
Workbook workbook = StreamingReader.builder()
        .rowCacheSize(100) // number of rows to keep in memory
        .bufferSize(4096)  // buffer size to use when reading the InputStream to a file
        .open(file);       // InputStream or File for the XLSX file (required)
Iterator<Row> rowIterator = workbook.getSheetAt(0).rowIterator();
while (rowIterator.hasNext()) {
    Row row = rowIterator.next();
    Iterator<Cell> cellIterator = row.cellIterator();
    while (cellIterator.hasNext()) {
        Cell cell = cellIterator.next();
        String cellValue = dataFormatter.formatCellValue(cell);
    }
}
You can use SXSSF instead of HSSF. I was able to generate an Excel file with 200,000 rows.

Excel POI Unexpected missing row when some rows already present

Has anyone seen this error before?
I get it as soon as I create my HSSFWorkbook:
try {
LOGGER.info("Open Excel file: " + filename);
InputStream inputStream = new FileInputStream(filename);
Workbook wb = new HSSFWorkbook(inputStream);
Sheet sheet = wb.getSheetAt(0);
/* save excel */
FileOutputStream fileOut = new FileOutputStream(filenameOutput);
wb.write(fileOut);
fileOut.close();
wb.close();
inputStream.close();
} catch (IOException e1) {
LOGGER.log(Level.SEVERE, e1.getMessage(), e1);
}
The error is thrown by new HSSFWorkbook(inputStream):
Exception in thread "main" java.lang.RuntimeException: Unexpected missing row when some rows already present
at org.apache.poi.hssf.usermodel.HSSFSheet.setPropertiesFromSheet(HSSFSheet.java:212)
at org.apache.poi.hssf.usermodel.HSSFSheet.<init>(HSSFSheet.java:137)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:338)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:289)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:224)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:382)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:364)
Any ideas?
I have seen this question on http://apache-poi.1045710.n5.nabble.com/Unexpected-missing-row-when-some-rows-already-present-td5527417.html, but it doesn't have a good answer.
I use POI 3.12.
UPDATE: My Excel file contains 53 rows and hundreds of columns, so it is not empty. The file was generated with Business Objects, and some people have the very same issue with no solution.
Source code of HSSFSheet is here
OK, I've found the solution.
Upgrading to 3.14 solves the problem.
In 3.13 and earlier, the exception is thrown by this code:
RowRecord row = sheet.getNextRow();
boolean rowRecordsAlreadyPresent = row != null;
if (hrow == null) {
    // Some tools (like Perl module Spreadsheet::WriteExcel - bug 41187) skip the RowRecords
    // Excel, OpenOffice.org and GoogleDocs are all OK with this, so POI should be too.
    if (rowRecordsAlreadyPresent) {
        // if at least one row record is present, all should be present.
        throw new RuntimeException("Unexpected missing row when some rows already present"); // HSSFSheet.java:212
    }
But in 3.14 they simply commented it out:
if (hrow == null) {
/* we removed this check, see bug 47245 for the discussion around this
// Some tools (like Perl module Spreadsheet::WriteExcel - bug 41187) skip the RowRecords
// Excel, OpenOffice.org and GoogleDocs are all OK with this, so POI should be too.
if (rowRecordsAlreadyPresent) {
// if at least one row record is present, all should be present.
throw new RuntimeException("Unexpected missing row when some rows already present");
}*/
// create the row record on the fly now.
RowRecord rowRec = new RowRecord(cval.getRow());
sheet.addRow(rowRec);
hrow = createRowFromRecord(rowRec);
}
As the comment says, some tools skip the RowRecords; in my case that happens when I generate my .xls with Business Objects.
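Example:
Workbook workbook;
try {
    workbook = WorkbookFactory.create(is);
} catch (IOException | InvalidFormatException e) {
    // Without a workbook there is nothing to read, so fail here instead of with a NullPointerException below
    throw new IllegalStateException("Could not open the workbook", e);
}
Sheet sheet = workbook.getSheetAt(0);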
XSSF or HSSF doesn't matter, both should work well.
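To check that HSSF and XSSF both work through WorkbookFactory, here is a small round-trip sketch, assuming POI 4.x or later (where WorkbookFactory.create(InputStream) declares only IOException as a checked exception). It writes a workbook to bytes, reopens it with WorkbookFactory and reports which implementation came back. The helper name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

class FormatRoundTrip {

    /** Serializes the workbook, reopens it through WorkbookFactory and returns the reopened class name. */
    static String reopen(Workbook original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(bytes);
        // WorkbookFactory inspects the leading bytes and picks HSSF (.xls) or XSSF (.xlsx)
        try (Workbook reopened = WorkbookFactory.create(new ByteArrayInputStream(bytes.toByteArray()))) {
            return reopened.getClass().getSimpleName();
        }
    }
}
```

For an HSSFWorkbook with at least one sheet this should report HSSFWorkbook, and for an XSSFWorkbook it should report XSSFWorkbook.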

Reading from an array and creating a workbook in .xls (2003) format through POI

I am working as a Java developer and I want to convert a .csv file to .xls (2003) format. The structure of my .csv file is like this:
REC_STATUS,TRADE_ID,SETTLEMENT_DATE,TRADE_EFFECTIVE_DATE,PAYMENT_TYPE,VERSION,BREAK_DOWN_BUCKET,CAUSE,NUM_CASHFLOWS_AFFECTED,PROFILE
Found only in File :B,178942690,01-Feb-16,03-Dec-14,"Coupon",5,NOISY_BREAK_BUCKET,REC_TOOL_ISSUE_PAYMENT_DIRECTION_MISMATCH | REC_TOOL_ISSUE_NOTIONAL_MISMATCH | TRADE_VERSION,1,AVS Offshore
Found only in File :B,197728700,Various,21-Dec-15,"Coupon,(x20)",2,ACTUAL DATA BREAK BUCKET,ACTUAL_DATA_BREAK,20,AVS Offshore
I have used the opencsv parser (http://opencsv.sourceforge.net/) for this and coded it as shown below:
CSVReader reader = new CSVReader(
new FileReader("c:/xxx.csv"), ',', '"');
// Read all rows at once
List<String[]> allRows = reader.readAll();
Now, after reading, I want to create an .xls workbook with the same layout as the .csv file: read the header first, then the corresponding values, and write them in .xls (2003) format. Please advise how I can achieve this. I am using POI version 3.10.
Folks, please advise how to put the array contents into the workbook.
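import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

class Handle
{
    // Writes the rows read by opencsv (header first) to an Excel 97-2003 (.xls) file
    public static void data(List<String[]> list, String path) throws IOException
    {
        // HSSFWorkbook produces .xls; the original new XSSFWorkbook() produced .xlsx
        // and made the cast to HSSFSheet fail with a ClassCastException
        Workbook wb = new HSSFWorkbook();
        Sheet sheet = wb.createSheet("Sheet");
        for (int i = 0; i < list.size(); i++)
        {
            Row row = sheet.createRow(i);
            String[] data = list.get(i);
            for (int j = 0; j < data.length; j++)
            {
                Cell cell = row.createCell(j);
                cell.setCellValue(data[j]);
            }
        }
        FileOutputStream fos = new FileOutputStream(path + ".xls");
        try {
            wb.write(fos);
        } finally {
            fos.close();
        }
    }
}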
As @Gagravarr said, you can use the common SS classes to handle both .xls and .xlsx formats with the same code.
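To illustrate the common SS interfaces, here is a sketch that writes the parsed CSV rows as either .xls or .xlsx depending on a flag; only the constructor is format-specific. It assumes a POI version where Workbook is Closeable (3.11 or later, as far as I know), so it would not compile unchanged against the POI 3.10 mentioned in the question. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

class CsvRowsToWorkbook {

    /** Writes the rows (header first) as .xls when xlsx is false, .xlsx otherwise. */
    static void write(List<String[]> rows, boolean xlsx, OutputStream out) throws IOException {
        // Only this line depends on the format; everything below uses the SS interfaces
        try (Workbook wb = xlsx ? new XSSFWorkbook() : new HSSFWorkbook()) {
            Sheet sheet = wb.createSheet("Sheet");
            for (int r = 0; r < rows.size(); r++) {
                Row row = sheet.createRow(r);
                String[] values = rows.get(r);
                for (int c = 0; c < values.length; c++) {
                    row.createCell(c).setCellValue(values[c]);
                }
            }
            wb.write(out);
        }
    }
}
```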

POI - Excel remains held by some process and cannot be edited

I am using POI to read, edit and write Excel files.
My process flow is: write an Excel file, read it, and write it again.
If I then try to edit the file in the desktop Excel application while my Java app is still running, Excel cannot save it; it says another process is holding the file.
I am properly closing all file handles.
Please help me fix this issue.
SXSSFWorkbook wb = new SXSSFWorkbook(WINDOW_SIZE);
Sheet sheet = getCurrentSheet();//stores the current sheet in a instance variable
for (int rowNum = 0; rowNum < data.size(); rowNum++) {
if (rowNum > RECORDS_PER_SHEET) {
if (rowNum % (RECORDS_PER_SHEET * numberOfSheets) == 1) {
numberOfSheets++;
setCurrentSheet(wb.getXSSFWorkbook().createSheet());
sheet = getCurrentSheet();
}
}
final Row row = sheet.createRow(effectiveRowCnt);
for (int columnCount = 0; columnCount < data.get(rowNum).size(); columnCount++) {
final Object value = data.get(rowNum).get(columnCount);
final Cell cell = row.createCell(columnCount);
//This method creates the row and cell at the given loc and adds value
createContent(value, cell, effectiveRowCnt, columnCount, false, false);
}
}
public void closeFile(boolean toOpen) throws IOException {
    FileOutputStream out = null;
    try {
        out = new FileOutputStream(getFileName());
        wb.write(out);
    } finally {
        if (out != null) {
            out.close();
            out = null;
            if (toOpen) {
                // Open the file for the user with the default program
                final Desktop dt = Desktop.getDesktop();
                dt.open(new File(getFileName()));
            }
        }
    }
}
The code looks correct. After out.close();, there shouldn't be any locks left.
Things that could still happen:
You have another Java process (for example hanging in a debugger). Your new process tries to write the file, fails (because of process 1) and in the finally, it tries to open Excel which sees the same problem. Make sure you log all exceptions that happen in wb.write(out);
Note: The code above looks correct in this respect, since it only starts Excel when out != null and that should only be the case when Java could open the file.
Maybe the file wasn't written completely (i.e. there was an exception during write()). Excel tries to open the corrupt file and gives you the wrong error message.
Use a tool like Process Explorer to find out which process keeps a lock on a file.
I tried all the options. After looking thoroughly, it seems the problem is the Event User model.
I am using the below code for reading the data:
final OPCPackage pkg = OPCPackage.open(getFileName());
final XSSFReader r = new XSSFReader(pkg);
final SharedStringsTable sst = r.getSharedStringsTable();
final XMLReader parser = fetchSheetParser(sst);
final Iterator<InputStream> sheets = r.getSheetsData();
while (sheets.hasNext()) {
final InputStream sheet = sheets.next();
final InputSource sheetSource = new InputSource(sheet);
parser.parse(sheetSource);
sheet.close();
}
I assume the Excel file is, for some reason, held by this process for some time. If I remove this code and use the code below:
final File file = new File(getFileName());
final FileInputStream fis = new FileInputStream(file);
final XSSFWorkbook xWb = new XSSFWorkbook(fis);
The process works fine and the Excel file does not remain locked.
I figured it out.
A very simple line was required, but for some reason it was not explained in the New Halloween Document (http://poi.apache.org/spreadsheet/how-to.html#sxssf).
I checked the Busy Developer's Guide and found the solution.
I needed to add:
pkg.close(); //To close the OPCPackage
With this line added, my code works fine with any number of reads and writes on the same Excel file.
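To make sure the package is released even when parsing throws, it can be closed in a finally block. This is a sketch assuming POI 4.x or later; the class name is hypothetical and the SAX parsing step is left out, so the loop only counts the sheet parts. The answer above used pkg.close(); as far as I can tell, for a package opened with PackageAccess.READ recent POI versions log a warning on close() and revert instead, so this sketch calls revert() directly.

```java
import java.io.File;
import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.XSSFReader;

class SheetStreamReader {

    /** Opens every sheet part of the file and returns how many there were, always releasing the file. */
    static int countSheets(File file) throws Exception {
        OPCPackage pkg = OPCPackage.open(file, PackageAccess.READ);
        try {
            XSSFReader reader = new XSSFReader(pkg);
            Iterator<InputStream> sheets = reader.getSheetsData();
            int count = 0;
            while (sheets.hasNext()) {
                try (InputStream sheet = sheets.next()) {
                    count++; // a real reader would hand `sheet` to a SAX XMLReader here
                }
            }
            return count;
        } finally {
            // Without this the package keeps its handle on the file
            pkg.revert(); // read-only package: release it without saving
        }
    }
}
```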

Why do I fail to read Excel 2007 files using POI?

When I try to initialize a Workbook object I always get this error:
The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
But I followed the official sample to do this; here is my code:
File inputFile = new File(inputFileName);
InputStream is = new FileInputStream(inputFile);
Workbook wb = new XSSFWorkbook(is);
The exception occurs at this line:
Workbook wb = new XSSFWorkbook(is);
Here are the POI jars I include:
poi-3.8-20120326.jar
poi-ooxml-3.8-20120326.jar
poi-ooxml-schemas-3.8-20120326.jar
xmlbeans-2.3.0.jar
Can anyone give me some guidance? An example showing how to read a complete Excel 2007 document would be appreciated. Thanks in advance!
I assume you have rechecked that your original file really is in Office 2007+ XML format?
Edit:
Then, if you are sure the format is OK and it works for you with WorkbookFactory.create, you can find the answer in the code of that method:
/**
* Creates the appropriate HSSFWorkbook / XSSFWorkbook from
* the given InputStream.
* Your input stream MUST either support mark/reset, or
* be wrapped as a {@link PushbackInputStream}!
*/
public static Workbook create(InputStream inp) throws IOException, InvalidFormatException {
// If clearly doesn't do mark/reset, wrap up
if(! inp.markSupported()) {
inp = new PushbackInputStream(inp, 8);
}
if(POIFSFileSystem.hasPOIFSHeader(inp)) {
return new HSSFWorkbook(inp);
}
if(POIXMLDocument.hasOOXMLHeader(inp)) {
return new XSSFWorkbook(OPCPackage.open(inp));
}
throw new IllegalArgumentException("Your InputStream was neither an OLE2 stream, nor an OOXML stream");
}
This is the bit that you were missing: new XSSFWorkbook(OPCPackage.open(inp))
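The header check that WorkbookFactory performs can also be approximated by hand, to confirm what a file really is before choosing HSSF or XSSF. This JDK-only sketch assumes Java 9+ for InputStream.readNBytes: OLE2 compound documents (.xls) begin with the bytes D0 CF 11 E0 A1 B1 1A E1, and OOXML files (.xlsx) are ZIP archives beginning with PK 03 04. A ZIP signature alone does not prove the archive is a workbook, so treat the result as a guess. The class and enum names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ExcelFormatSniffer {

    enum Format { OLE2_XLS, OOXML_ZIP, UNKNOWN }

    // OLE2 compound document signature (.xls)
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // ZIP local file header signature "PK\3\4" (.xlsx and other OOXML files)
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Reads up to the first 8 bytes of the stream and guesses the container format. */
    static Format sniff(InputStream in) throws IOException {
        byte[] head = new byte[OLE2.length];
        int n = in.readNBytes(head, 0, head.length);
        if (n == OLE2.length && Arrays.equals(head, OLE2)) {
            return Format.OLE2_XLS;
        }
        if (n >= ZIP.length && Arrays.equals(Arrays.copyOf(head, ZIP.length), ZIP)) {
            return Format.OOXML_ZIP;
        }
        return Format.UNKNOWN;
    }
}
```

The method consumes the bytes it reads, so open a fresh stream before handing the file to POI.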
