POI: wrong number of cells per row from method getPhysicalNumberOfCells - java

I'm using the POI Java library to read an Excel file.
My Excel file has a simple structure made up of 8 columns.
The problem is that when I read the column count with the method getPhysicalNumberOfCells, I get a different number for each row.

The problem is that getPhysicalNumberOfCells means something different from what I was thinking.
getPhysicalNumberOfCells returns the number of cells in a row that have content.
In fact, POI stores for each row only the data that was added in Excel. For example, if you have data in columns 0, 3 and 5, getPhysicalNumberOfCells will always return 3, because 3 is the number of "filled" cells.
To get the logical number of cells in a row, we use:
getLastCellNum()
According to the documentation, this method gets the index of the last cell, increased BY ONE. So in the example above, with a maximum index of 5, it returns 6.
I think this was done to simplify iteration over the cells of a row.
Moreover, there is a method
getFirstCellNum()
that returns the index of the first cell.
An example inspired by the official documentation:
short minColIdx = row.getFirstCellNum();
short maxColIdx = row.getLastCellNum();
for (short colIdx = minColIdx; colIdx < maxColIdx; colIdx++) {
    Cell cell = row.getCell(colIdx);
    // getCell returns null for cells that were never created
    if (cell == null) continue;
    // get value of cell
}
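The counting rule can be checked without POI. The following is only a sketch: SparseRow is a hypothetical stand-in for a POI row (not a POI class), and its three methods merely mimic the documented meaning of getPhysicalNumberOfCells, getFirstCellNum and getLastCellNum.

```java
import java.util.TreeMap;

public class SparseRowDemo {
    // Hypothetical stand-in for a POI row: only cells that were actually
    // written are stored, keyed by zero-based column index.
    static final class SparseRow {
        private final TreeMap<Integer, String> cells = new TreeMap<>();

        void put(int columnIndex, String value) {
            cells.put(columnIndex, value);
        }

        // Mimics getPhysicalNumberOfCells(): the number of stored cells.
        int physicalNumberOfCells() {
            return cells.size();
        }

        // Mimics getFirstCellNum(): index of the first stored cell, -1 if the row is empty.
        int firstCellNum() {
            return cells.isEmpty() ? -1 : cells.firstKey();
        }

        // Mimics getLastCellNum(): index of the last stored cell PLUS ONE, -1 if the row is empty.
        int lastCellNum() {
            return cells.isEmpty() ? -1 : cells.lastKey() + 1;
        }
    }

    public static void main(String[] args) {
        SparseRow row = new SparseRow();
        row.put(0, "a");
        row.put(3, "b");
        row.put(5, "c");
        System.out.println(row.physicalNumberOfCells()); // prints 3
        System.out.println(row.lastCellNum());           // prints 6
    }
}
```

With data in columns 0, 3 and 5, the physical count is 3 while the loop bound is 6, which is why the two numbers differ from row to row.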

Related

Retrieve the cell position in A1 notation while processing cells in a spreadsheet using sheets-api

Currently I am using sheets-api with a mask of sheets.data.rowData.values.formattedValue to retrieve the formatted values (in string as seen by the user on the spreadsheet) only. And I then try to convert each of the strings to their appropriate data types (as expected, based on the column to which the cell belongs).
But I have a use case of listing all the cells (in A1 notation) that are uninitialised, null, empty, or of an incorrect data type.
Q: How do I know the cell location (in A1 notation) of the CellData that I am processing currently?
PS: Using the row/column indexes to guess the cell location in A1 notation seems like a hack, and is also inaccurate when using the mask for formattedValue. Uninitialised cells are automatically omitted from the list of cell values in a row. So even though your spreadsheet might have a size of 4 x 8 (rows x columns), the RowData for some rows might contain fewer than 8 entries (as some cells might be blank). It is strange, but that is the current behaviour of the sheets-api, sadly.

Apache POI - Formula to find maximum length of characters in a column

Hi All,
My objective was to find the maximum length of characters in a column. To achieve that I thought of using the below piece of code but it is not providing the expected result.
String range = "MAX(LEN(A1:A3))";
formulaCell.setCellFormula(range);
formulaEvaluator.evaluateInCell(formulaCell);
Below is the issue I am facing:
The formula given in the code is being set as =MAX(LEN(@A1:A3)), because of which the formula returns the length of the cell in the same row where the formula is inserted.
I am not sure why Apache POI is adding '@' to the formula. Could someone please let me know if there are any alternative ways?
It is not Apache POI that puts the @ into the formula. Excel 365 is doing this because the formula part LEN(A1:A3) is wrong for a normal cell formula, as LEN expects only one parameter and not multiple values or a cell range. So MAX(LEN(A1:A3)) would have to be an array formula.
So if you set the formula like this:
...
String formula = "MAX(LEN(A1:A3))";
//cell.setCellFormula(formula);
CellRange cellRange = sheet.setArrayFormula(formula, CellRangeAddress.valueOf("B1"));
...
Then Excel would handle that formula correctly.
But FormulaEvaluator cannot evaluate such array formulas correctly. It would evaluate only MAX(LEN(A1)).
So if Apache POI itself needs to know the maximum length of characters in a column, then the only way is to iterate over all rows (cells) in that column, get the length of the cell contents for each cell, and take the maximum.
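A minimal sketch of that iteration, assuming the cell contents of the column have already been read into strings (for instance with POI's DataFormatter), with null standing for a missing cell. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class MaxColumnLength {
    // Returns the length of the longest non-null value, or 0 if there is none.
    static int maxLength(List<String> columnValues) {
        int max = 0;
        for (String value : columnValues) {
            if (value != null) {
                max = Math.max(max, value.length());
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Assumed contents of A1:A3; A2 is a missing cell.
        List<String> a1ToA3 = Arrays.asList("abc", null, "hello world");
        System.out.println(maxLength(a1ToA3)); // prints 11
    }
}
```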
Btw.: In case someone wonders why Excel 365 does not need this formula to be entered as an array formula using Ctrl+Shift+Enter and does not show array markers ({=MAX(LEN(A1:A3))}) around it:
Excel 365 has a new feature called dynamic array formulas. This feature detects array formulas - in this case because LEN has a cell range as its argument - and then marks such formulas as array formulas automatically. Additionally, it puts metadata on the cell saying that the cell contains an array formula. So the special markers {...} are no longer needed. But Apache POI does not support that new feature.

Apache POI formula to check any of the cell from the range is not blank

I am working with Apache POI, using conditional formatting. I want to write a formula such that, if any of the columns within the specified range is not a number, all of them are highlighted. I am trying the formula ISNUMBER($J1:P1000), but this does not work.
ConditionalFormattingRule rule = sheetCF.createConditionalFormattingRule("ISNUMBER($J1:P1000))");
If I try just a single cell with the formula ISNUMBER($J1), it works. But I want a condition so that if any of the cells in J to P is a number, some highlighting is done.
Details of the code to highlight some cells based on a rule are given in this thread, so I am not repeating them: Apache POI - Conditional formatting - need to set different cell range for rule and formatting
As I understand the question now (also taking the comments into account), the requirement is to highlight the whole range J1:P[n] (I will take J1:P1000 as an example) if any of the cells within this range contains numeric content. This is possible using a formula as the ConditionalFormattingRule.
Background knowledge:
Conditional formatting (CF) works by having rules applied to a range of cells and formats to use if a rule is fulfilled. While the CF process runs, each cell in the applied range is tested to see whether it fulfills the rule. If so, the format is used; if not, it is not.
So if the rule is a formula, then we must look at this formula from the point of view of each single cell in the range. Here it plays an important role whether cell references in the formula are relative or are fixed using $.
In cell references, the $ can fix the column reference as well as the row reference. For example, in A1 both the column reference and the row reference are relative. In $A1, the column reference to column A is fixed and the row reference is relative. In A$1, the column reference is relative and the row reference to row 1 is fixed. In $A$1, both the column reference to column A and the row reference to row 1 are fixed. So this last reference will always refer to cell A1.
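To make the relative/fixed distinction concrete, here is a small sketch that shifts a single-cell reference the way a rule formula is seen from another cell of the applied range. It is plain Java and not part of Apache POI; the names shift, columnToIndex and indexToColumn are hypothetical, and only single-cell references with non-negative offsets are handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReferenceShift {
    // Optional $, column letters, optional $, row number.
    private static final Pattern REF = Pattern.compile("(\\$?)([A-Z]+)(\\$?)(\\d+)");

    // "A" -> 0, "Z" -> 25, "AA" -> 26
    static int columnToIndex(String letters) {
        int index = 0;
        for (char ch : letters.toCharArray()) {
            index = index * 26 + (ch - 'A' + 1);
        }
        return index - 1;
    }

    // 0 -> "A", 25 -> "Z", 26 -> "AA"
    static String indexToColumn(int index) {
        StringBuilder sb = new StringBuilder();
        for (int n = index + 1; n > 0; n = (n - 1) / 26) {
            sb.insert(0, (char) ('A' + (n - 1) % 26));
        }
        return sb.toString();
    }

    // Moves the relative parts of ref by the given offsets; parts fixed with $ stay put.
    static String shift(String ref, int rowDelta, int colDelta) {
        Matcher m = REF.matcher(ref);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a single-cell reference: " + ref);
        }
        boolean colFixed = !m.group(1).isEmpty();
        boolean rowFixed = !m.group(3).isEmpty();
        String col = colFixed ? m.group(2) : indexToColumn(columnToIndex(m.group(2)) + colDelta);
        int row = Integer.parseInt(m.group(4)) + (rowFixed ? 0 : rowDelta);
        return m.group(1) + col + m.group(3) + row;
    }

    public static void main(String[] args) {
        // Seen from a cell 2 rows down and 3 columns right of the top-left cell:
        System.out.println(shift("A1", 2, 3));   // prints D3
        System.out.println(shift("$A1", 2, 3));  // prints $A3
        System.out.println(shift("A$1", 2, 3));  // prints D$1
        System.out.println(shift("$A$1", 2, 3)); // prints $A$1
    }
}
```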
The concrete examples:
In my answer to Apache POI - Conditional formatting - need to set different cell range for rule and formatting, which is related to this answer, a formula rule AND(ISNUMBER($C1), $C1>5) is applied to the range G1:L1000. So from the point of view of a single cell in G1:L1000, the rule checks the following:
Is the cell value in column C (always column C because this reference is fixed), in the same row as the single cell (because the row references are relative), numeric and greater than 5?
In a comment I suggested a rule AND(ISNUMBER($C1), $C1>5, G1="") applied to the same range G1:L1000. This checks the same as the above, and additionally:
Is the single cell itself, in columns G:L (not always column G, because the column reference is relative) and in its own row (because the row references are relative), empty (equal to an empty string)?
Now your actual requirement:
"to highlight the whole range J1:P1000 if any of the cells within this range contains numeric content"
The function COUNT counts only numbers. So COUNT($J$1:$P$1000) will be greater than 0 if any cell in J1:P1000 contains a number.
So
ConditionalFormattingRule rule = sheetCF.createConditionalFormattingRule("(COUNT($J$1:$P$1000)>0)");
applied to CellRangeAddress.valueOf("J1:P1000") could work as you want.
From the point of view of each single cell, COUNT must count the whole range. That's why the references in $J$1:$P$1000 are all fixed and not relative.

Reading <no data> with POI event user model

I am not sure if I have understood this properly, but I think that if I use the POI event model and there is no data (just blank, not even a space) in an Excel column, the data is not read at all.
My problem is this: say I have a model with 15 columns, and I am reading the data to populate the model from an Excel sheet.
I read the Excel file and store the data in a List, and then add it to the model.
If one of the columns has no data, POI reads nothing, and hence the List I create has only 14 columns.
Then I cannot match the columns in the model with the List.
How do I solve this issue?
If you were using HSSF for a .xls file, you could have used MissingRecordAwareHSSFListener, which would have sent you special records to indicate gaps. A good example of that is XLS2CSVmra. However, you say you're on XSSF for .xlsx, so it's a little different.
There is an Apache POI example that covers missing records in a .xlsx file; you'll want to look at the XLSX2CSV example. Basically, you should have a variable that holds the last column number you saw. When startElement fires for c (cell), check the reference column against the last one seen. If there's a gap, pass empty cells to your logic at that point, then record the new column and process the cell.
The logic to get the current column number would be something like:
// Get the cell reference
String r = attributes.getValue("r");
int firstDigit = -1;
for (int c = 0; c < r.length(); ++c) {
    if (Character.isDigit(r.charAt(c))) {
        firstDigit = c;
        break;
    }
}
thisColumn = nameToColumn(r.substring(0, firstDigit));
Don't forget to reset the column counter on a new row (endElement for row is often a good place to trigger that).
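The gap handling described above could look roughly like the following. This is a sketch in plain Java, not the code from the XLSX2CSV example: nameToColumn here is my own version of the helper named in the snippet, while columnOf and fillGaps are hypothetical names, and fillGaps assumes the cells of one row arrive in column order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnGapFiller {
    // Converts column letters to a zero-based index: "A" -> 0, "Z" -> 25, "AA" -> 26.
    static int nameToColumn(String name) {
        int column = 0;
        for (int i = 0; i < name.length(); i++) {
            column = column * 26 + (name.charAt(i) - 'A' + 1);
        }
        return column - 1;
    }

    // Extracts the zero-based column index from a cell reference such as "C7".
    static int columnOf(String cellRef) {
        int firstDigit = 0;
        while (firstDigit < cellRef.length() && !Character.isDigit(cellRef.charAt(firstDigit))) {
            firstDigit++;
        }
        return nameToColumn(cellRef.substring(0, firstDigit));
    }

    // refs and values hold the cells actually reported for one row.
    // Missing columns are filled with "" up to columnCount entries.
    static List<String> fillGaps(List<String> refs, List<String> values, int columnCount) {
        List<String> row = new ArrayList<>();
        int lastColumn = -1; // reset for every new row
        for (int i = 0; i < refs.size(); i++) {
            int thisColumn = columnOf(refs.get(i));
            for (int gap = lastColumn + 1; gap < thisColumn; gap++) {
                row.add(""); // a skipped column
            }
            row.add(values.get(i));
            lastColumn = thisColumn;
        }
        while (row.size() < columnCount) {
            row.add(""); // trailing empty columns
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> refs = Arrays.asList("A1", "C1", "D1");
        List<String> values = Arrays.asList("x", "y", "z");
        System.out.println(fillGaps(refs, values, 5)); // prints [x, , y, z, ]
    }
}
```

Because the column index is computed from all the letters of the reference, this also works beyond column Z.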
Thanks @Gagravarr, for giving me the initial idea to solve the problem.
I did not follow exactly what you suggested, though. Instead, I compared the characters (ASCII values) of the column references, the current one and the last one.
I kept track of the last column reference and checked that the difference between two column references is always 1, i.e. B1 - A1 == 1.
Any deviation from this I treated as a skipped column.
The solution works perfectly; below is a snippet of what I did:
// Note: comparing only the first character assumes single-letter columns (A-Z)
char currColChar = attributes.getValue("r").charAt(0);
// Check if we have missed a column due to no data
if ((currColChar - lastColumnChar) > 1) {
    // logic for column skipped
}
lastColumnChar = currColChar;

Apache POI: Difference between row.getLastCellNum and row.getPhysicalNumberOfCells

What is the difference between row.getLastCellNum() and row.getPhysicalNumberOfCells() in POI HSSFRow? Or are they the same?
getLastCellNum()
Gets the index of the last cell contained in this row PLUS ONE
getPhysicalNumberOfCells()
Gets the number of defined cells (NOT number of cells in the actual row!). That is to say, if only columns 0, 4, 5 have values, then there would be 3.
