I am using the Apache POI library to read values from an Excel sheet into a Java program.
I iterate through each row of a table to get the values I need.
Within the object Row, there is a TreeMap that contains XSSFCell objects as values.
Normally I get the following TreeMap:
Here key 4 is included; its value is often an empty string, as in this picture.
For some reason, for some objects I get the following TreeMap:
Where the key 4 is missing.
Both Row Objects belong to the same table.
This is how I use my object Row:
XSSFSheet mySheet = myWorkBook.getSheet("nameOfSheet");
Iterator<Row> rowIterator = mySheet.iterator();
while (rowIterator.hasNext()) {
Row row = rowIterator.next();
// here I call my method
}
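You can prevent this from causing inconsistency in your application by calling getCell() with both a column index and a MissingCellPolicy, probably Row.RETURN_BLANK_AS_NULL (written Row.MissingCellPolicy.RETURN_BLANK_AS_NULL in newer POI versions, where the policy is an enum).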
The Apache POI guide explains:
In some cases, when iterating, you need full control over how missing
or blank rows and cells are treated, and you need to ensure you visit
every cell and not just those defined in the file. (The CellIterator
will only return the cells defined in the file, which is largely those
with values or stylings, but it depends on Excel).
In cases such as these, you should fetch the first and last column
information for a row, then call getCell(int, MissingCellPolicy) to
fetch the cell. Use a MissingCellPolicy to control how blank or null
cells are handled.
// Decide which rows to process
int rowStart = Math.min(15, sheet.getFirstRowNum());
int rowEnd = Math.max(1400, sheet.getLastRowNum());
for (int rowNum = rowStart; rowNum < rowEnd; rowNum++) {
Row r = sheet.getRow(rowNum);
if (r == null) {
// This whole row is empty
// Handle it as needed
continue;
}
int lastColumn = Math.max(r.getLastCellNum(), MY_MINIMUM_COLUMN_COUNT);
for (int cn = 0; cn < lastColumn; cn++) {
Cell c = r.getCell(cn, Row.RETURN_BLANK_AS_NULL);
if (c == null) {
// The spreadsheet is empty in this cell
} else {
// Do something useful with the cell's contents
}
}
}
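To see why an iterator can skip column 4 while an index lookup does not, here is a small stand-in model that uses only the JDK. `SparseRow`, its `TreeMap` field and the `Policy` enum are hypothetical names that imitate, but are not, POI's `Row` and `Row.MissingCellPolicy`; the sketch also assumes that cells are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a POI row: only cells stored in the file have an entry.
class SparseRow {
    // Mirrors the names of POI's missing-cell policies; this enum is NOT the POI type.
    enum Policy { RETURN_NULL_AND_BLANK, RETURN_BLANK_AS_NULL, CREATE_NULL_AS_BLANK }

    private final TreeMap<Integer, String> cells = new TreeMap<>();

    void setCell(int column, String value) {
        cells.put(column, value);
    }

    // Iterator-style view: visits only the columns that are actually stored.
    List<Integer> definedColumns() {
        return new ArrayList<>(cells.keySet());
    }

    // Index-style lookup: any column can be requested, the policy decides the answer.
    String getCell(int column, Policy policy) {
        String value = cells.get(column);
        switch (policy) {
            case RETURN_BLANK_AS_NULL:
                // Missing and blank cells are both reported as null.
                return (value == null || value.isEmpty()) ? null : value;
            case CREATE_NULL_AS_BLANK:
                // A missing cell is created as a blank one and then returned.
                if (value == null) {
                    value = "";
                    cells.put(column, value);
                }
                return value;
            default: // RETURN_NULL_AND_BLANK: report exactly what is stored
                return value;
        }
    }

    public static void main(String[] args) {
        SparseRow withBlank = new SparseRow();
        withBlank.setCell(3, "a");
        withBlank.setCell(4, "");   // blank cell that is stored in the file
        withBlank.setCell(5, "b");

        SparseRow withGap = new SparseRow();
        withGap.setCell(3, "a");    // column 4 was never stored
        withGap.setCell(5, "b");

        System.out.println(withBlank.definedColumns());
        System.out.println(withGap.definedColumns());
        // With RETURN_BLANK_AS_NULL both rows answer the same way for column 4.
        System.out.println(withBlank.getCell(4, Policy.RETURN_BLANK_AS_NULL));
        System.out.println(withGap.getCell(4, Policy.RETURN_BLANK_AS_NULL));
    }
}
```

The point of the model is that the two rows differ only in what is stored, so code that iterates stored cells sees different keys, while code that asks for each index with a policy gets a uniform answer.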
I am working with Excel reports. I need to generate a pivot table with one particular field value preselected in the row labels, rather than having all values selected. I am using Apache POI.
This is what I get automatically when I load the Excel sheet:
This is what I need:
AreaReference source = new AreaReference("A1:D5", SpreadsheetVersion.EXCEL2007);
CellReference position = new CellReference(10,0);
XSSFPivotTable pivotTable = sheet1.createPivotTable(source, position,wb.getSheet("1econtent"));
pivotTable.addReportFilter(2);
pivotTable.addRowLabel(0);
pivotTable.addColumnLabel(DataConsolidateFunction.SUM, 1);
pivotTable.addColumnLabel(DataConsolidateFunction.SUM, 1,"% of value");
pivotTable.getCTPivotTableDefinition().getDataFields().getDataFieldArray(1).setShowDataAs(org.openxmlformats.schemas.spreadsheetml.x2006.main.STShowDataAs.PERCENT_OF_COL);
DataFormat dataformat = wb.createDataFormat();
short numFmtId = dataformat.getFormat("0.00%");
pivotTable.getCTPivotTableDefinition().getDataFields().getDataFieldArray(1).setNumFmtId(numFmtId);
pivotTable.getCTPivotTableDefinition().getPivotTableStyleInfo().setName("PivotStyleMedium10");
I tried many ways but I didn't find any answer.
Pivot table creation in Apache POI is still only rudimentary.
Apache POI adds as many pivot field items of type "default" (<item t="default"/>) as there are rows in the data source. It does this because it does not look at the data, so it assumes there may be as many different values as there are rows. This is fine because Excel rebuilds its pivot cache when opening the file.
But if we want to preselect items, this is not sufficient: we must know which items exist and can be preselected.
So we need at least as many numbered items as we want to preselect: <item x="0"/><item x="1"/><item x="2"/>.... And we must build a cache definition that has shared elements for those items.
To fulfill that requirement, we need to determine the unique labels across all rows of the data source. Then, for each unique label, turn the item into a numbered item. Then build a cache definition that has shared elements for those items. Then set all unwanted items to hidden.
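The bookkeeping behind those steps can be sketched without POI. The class and method names below (`PivotItemPlan`, `uniqueLabels`, `hiddenFlags`) are made up for illustration, and the label values are assumed sample data; the sketch only computes which index each label would get and which items would be hidden, it does not touch any pivot table XML.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: plans the numbered items and hidden flags for one pivot field.
class PivotItemPlan {
    // Unique labels in sorted, case-insensitive order; position i would become <item x="i"/>.
    static List<String> uniqueLabels(List<String> columnValues) {
        TreeSet<String> unique = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        unique.addAll(columnValues);
        return new ArrayList<>(unique);
    }

    // hidden.get(i) is true for every item that should NOT be preselected.
    static List<Boolean> hiddenFlags(List<String> labels, String selected) {
        List<Boolean> hidden = new ArrayList<>();
        for (String label : labels) {
            hidden.add(!selected.equals(label));
        }
        return hidden;
    }

    public static void main(String[] args) {
        // Assumed sample data: the label column of the data source, header excluded.
        List<String> column = Arrays.asList("Jane", "Tarzan", "Terk", "Jack");
        List<String> labels = uniqueLabels(column);
        List<Boolean> hidden = hiddenFlags(labels, "Jane");
        for (int i = 0; i < labels.size(); i++) {
            System.out.println("x=" + i + " label=" + labels.get(i) + " hidden=" + hidden.get(i));
        }
    }
}
```

The same order has to be used for the numbered items and for the shared items in the cache definition, which is why a sorted set is a convenient choice here.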
Let's have a complete example which creates what you have shown in your question's pictures. The code in your question is incomplete, so I had to guess what data you have as the data source. Please avoid that in future questions and always provide a Minimal, Reproducible Example; otherwise you are unlikely to get further answers.
import java.io.FileOutputStream;
import org.apache.poi.ss.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.*;
import org.apache.poi.xssf.usermodel.*;
class CreatePivotTablePercentAndFilter {
public static void main(String[] args) throws Exception {
try (Workbook workbook = new XSSFWorkbook();
FileOutputStream fileout = new FileOutputStream("ooxml-pivottable.xlsx") ) {
Sheet pivotSheet = workbook.createSheet("Pivot");
Sheet dataSheet = workbook.createSheet("Data");
setCellData(dataSheet);
AreaReference areaReference = new AreaReference("A1:D5", SpreadsheetVersion.EXCEL2007);
XSSFPivotTable pivotTable = ((XSSFSheet)pivotSheet).createPivotTable(areaReference, new CellReference("A4"), dataSheet);
pivotTable.addRowLabel(0);
pivotTable.addColumnLabel(DataConsolidateFunction.SUM, 1);
pivotTable.addColumnLabel(DataConsolidateFunction.SUM, 1,"% of value");
pivotTable.getCTPivotTableDefinition().getDataFields().getDataFieldArray(1).setShowDataAs(
org.openxmlformats.schemas.spreadsheetml.x2006.main.STShowDataAs.PERCENT_OF_COL);
DataFormat dataformat = workbook.createDataFormat();
short numFmtId = dataformat.getFormat("0.00%");
pivotTable.getCTPivotTableDefinition().getDataFields().getDataFieldArray(1).setNumFmtId(numFmtId);
/*
Apache POI adds 5 pivot field items of type "default" (<item t="default"/>) for each row label here.
This is because there are 5 rows (A1:D5) and, because it does not look at the data,
it assumes at most 5 different values.
This is fine because Excel rebuilds its pivot cache when opening the file.
But if we want to preselect items, this is not sufficient: we must know which items exist and can be preselected.
So we need at least as many numbered items as we want to preselect: <item x="0"/><item x="1"/><item x="2"/>...
And we must build a cache definition that has shared elements for those items.
To fulfill that, we need to determine the unique labels in the column.
Then, for each unique label, turn the item into a numbered item.
Then build a cache definition that has shared elements for those items.
Then set all unwanted items to hidden.
*/
//determine unique labels in column 0
java.util.TreeSet<String> uniqueItems = new java.util.TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
for (int r = areaReference.getFirstCell().getRow()+1; r < areaReference.getLastCell().getRow()+1; r++) {
uniqueItems.add(dataSheet.getRow(r).getCell(areaReference.getFirstCell().getCol()).getStringCellValue());
}
System.out.println(uniqueItems);
int i = 0;
for (String item : uniqueItems) {
//take the items as numbered items: <item x="0"/><item x="1"/><item x="2"/>
pivotTable.getCTPivotTableDefinition().getPivotFields().getPivotFieldArray(0).getItems().getItemArray(i).unsetT();
pivotTable.getCTPivotTableDefinition().getPivotFields().getPivotFieldArray(0).getItems().getItemArray(i).setX((long)i);
//build a cache definition which has shared elements for those items
//<sharedItems><s v="Jack"/><s v="Jane"/><s v="Tarzan"/><s v="Terk"/></sharedItems>
pivotTable.getPivotCacheDefinition().getCTPivotCacheDefinition().getCacheFields().getCacheFieldArray(0)
.getSharedItems().addNewS().setV(item);
i++;
}
//Now we can predefine a filter.
//If multiple items need to be selectable, MultipleItemSelectionAllowed must be set first.
pivotTable.getCTPivotTableDefinition().getPivotFields().getPivotFieldArray(0).setMultipleItemSelectionAllowed(true);
//Then set H(idden) to true for all items which shall not be selected. All except "Jane" in this case.
i = 0;
for (String item : uniqueItems) {
if (!"Jane".equals(item))
pivotTable.getCTPivotTableDefinition().getPivotFields().getPivotFieldArray(0).getItems().getItemArray(i).setH(true);
i++;
}
workbook.write(fileout);
}
}
static void setCellData(Sheet sheet) {
Row row;
Cell cell;
Object[][] data = new Object[][]{
new Object[]{"Names", "Values", "ColC", "ColD"},
new Object[]{"Jane", 10d, "?", "?"},
new Object[]{"Tarzan", 5d, "?", "?"},
new Object[]{"Terk", 10d, "?", "?"},
new Object[]{"Jack", 10d, "?", "?"}
};
for (int r = 0; r < data.length; r++) {
row = sheet.createRow(r);
Object[] rowData = data[r];
for (int c = 0; c < rowData.length; c++) {
cell = row.createCell(c);
if (rowData[c] instanceof String) {
cell.setCellValue((String)rowData[c]);
} else if (rowData[c] instanceof Double) {
cell.setCellValue((Double)rowData[c]);
}
}
}
}
}
I need to remove several rows from an Excel xls sheet.
These rows always have the same first cell, which is why I check the first cell of every row to find them:
HSSFCell myCell = myRow.getCell(0);
myCell.setCellType(Cell.CELL_TYPE_STRING);
String foundString = myCell.getStringCellValue();
if(foundString.equals(searchString)){
foundRows.add(rowCount);
}
rowCount++;
I then go on and "remove" those rows using removeRow which nulls all values
public static void removeRows() {
List<Integer> foundRowsToDelete = new ArrayList<Integer>();
//Copy values to another list
for(int i=0; i<foundRows.size(); i++){
foundRowsToDelete.add(foundRows.get(i));
}
//Delete values from rows, leaving empty rows
while(foundRowsToDelete.size()!=0){
int rowIndex = foundRowsToDelete.get(0);
Row removingRow = mySheet.getRow(rowIndex);
if (removingRow != null) {
mySheet.removeRow(removingRow);
foundRowsToDelete.remove(0);
}
}
//Move empty rows to bottom of the sheet
for(int i = 0; i < mySheet.getLastRowNum(); i++){
if(isRowEmpty(i)){
mySheet.shiftRows(i+1, mySheet.getLastRowNum(), -1);
i--;
}
}
}
I check if they are empty through using the duplicated rowcounter
//Comparision of previously detected empty rows and given row count
public static boolean isRowEmpty(int suspectedRowNumber) {
for(int i=0;i<foundRows.size();i++){
if (suspectedRowNumber == foundRows.get(i)){
foundRows.remove(i);
return true;
}
}
return false;
}
However, only the first of these rows gets deleted. The rest stay empty.
I therefore assume that something is wrong with the incrementing I do, but I just can't figure out exactly what.
Thanks for your help in advance.
It's not immediately clear why your code isn't working, but I would look at a couple of things to debug it.
Your foundRowsToDelete ArrayList is populated with the values contained in the foundRows list. Are you sure that foundRows actually contains what you expect?
Is there a reason you don't remove the row when initially iterating through the rows in your sheet? Maybe something like this:
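Sheet sheet = workbook.getSheetAt(0);
// Loop by index rather than with for-each: removing a row while the
// sheet's own iterator is active can throw a ConcurrentModificationException.
for (int i = sheet.getFirstRowNum(); i <= sheet.getLastRowNum(); i++) {
    Row row = sheet.getRow(i);
    if (row == null) {
        continue; // no row stored at this index
    }
    Cell myCell = row.getCell(0);
    if (myCell != null && myCell.getCellType() == Cell.CELL_TYPE_STRING) {
        String foundString = myCell.getStringCellValue();
        if (foundString.equalsIgnoreCase(searchString)) {
            // why not just remove here?
            sheet.removeRow(row);
        }
    }
}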
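One likely source of trouble in the question's code is that row indexes recorded before any shifting no longer match the sheet once rows have moved up. A different approach, not taken from the answer above, is to walk from the bottom of the sheet to the top, so that the rows still to be checked keep their index. The sketch below models the sheet as a `TreeMap` from row index to first-cell text; `RowRemovalModel` and `removeAndCompact` are hypothetical names, and no POI calls are involved.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: key = row index, value = text of the row's first cell.
class RowRemovalModel {
    // Removes every row whose first cell equals 'search' and closes the gaps.
    static void removeAndCompact(TreeMap<Integer, String> sheet, String search) {
        if (sheet.isEmpty()) {
            return;
        }
        int first = sheet.firstKey();
        int last = sheet.lastKey();
        // Bottom-up: rows above the current one have not moved yet.
        for (int i = last; i >= first; i--) {
            if (!search.equals(sheet.get(i))) {
                continue;
            }
            sheet.remove(i);
            // Copy the rows below the removed one, then move each up by one index.
            TreeMap<Integer, String> below = new TreeMap<>(sheet.tailMap(i + 1));
            for (Map.Entry<Integer, String> e : below.entrySet()) {
                sheet.remove(e.getKey());
                sheet.put(e.getKey() - 1, e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> sheet = new TreeMap<>();
        sheet.put(0, "head");
        sheet.put(1, "X");
        sheet.put(2, "a");
        sheet.put(3, "X");
        sheet.put(4, "X");
        sheet.put(5, "b");
        removeAndCompact(sheet, "X");
        System.out.println(sheet);
    }
}
```

In POI terms the inner loop plays the role of a `shiftRows` call with a shift of -1; the model only illustrates the order of operations.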
When trying to read an Excel sheet I get an exception if some cell is empty:
Cell[] rowCells = sheet.getRow(1);
or
Cell cell = sheet.getCell(0,1);
I always get the same message:
java.lang.ArrayIndexOutOfBoundsException: 1
at jxl.read.biff.SheetImpl.getCell(SheetImpl.java:356)
at gui.ReadExcel.read(ReadExcel.java:45)
at gui.GUIcontroller.chooseSaveFile(GUIcontroller.java:101)
What is the problem? How can I know if the cell is empty, so I won't copy its value?
You can use the getRows or getColumns method to check the bounds of the sheet. The ArrayIndexOutOfBoundsException occurs because you are trying to access a cell that lies beyond the farthest non-empty cell.
int rows = sheet.getRows();
int columns = sheet.getColumns();
int row = 1;
int column = 0;
if (row < rows) {
    Cell[] rowCells = sheet.getRow(row); // Won't throw an exception
}
if (column < columns && row < rows) {
    Cell cell = sheet.getCell(column, row); // jxl expects (column, row)
}
In this case you can't read the cell because, as far as jxl is concerned, it doesn't really exist in the spreadsheet. It has not been created yet, so there is no cell to get. This may sound odd, because Excel sheets seem to go on forever, but the file doesn't store data for all those empty cells; otherwise the file size would be huge. So when jxl goes to read the data, it simply tells you there is nothing there.
If you want to read the cells and all your cells are grouped together, then you could try:
int width = sheet.getColumns();
int height = sheet.getRows();
List<Cell> cells = new ArrayList<Cell>();
for(int i=0; i<width; i++){
for(int j=0; j<height; j++){
cells.add(sheet.getCell(i, j));
}
}
If they're not grouped together and you're not sure which cells may be empty, there is still a fairly simple solution:
List<Cell> cells = new ArrayList<Cell>();
Cell cell = null;
try{
cell = sheet.getCell(0, 1);
}catch(Exception e){
e.printStackTrace();
}finally{
if(cell != null){
cells.add(cell);
}
}
This way you can safely attempt to read a cell and throw it away if it doesn't contain anything.
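As an alternative to catching the exception, the coordinates can be checked against the stored range before the lookup. The sketch below models the stored cells as a jagged `String[][]` instead of a jxl sheet; `GridBounds` and `cellOrNull` are hypothetical names, and the `(column, row)` argument order is chosen only to mirror jxl's `getCell(column, row)`.

```java
// Hypothetical model: rows[r][c] holds the text of the cell in row r, column c.
class GridBounds {
    // Returns the value at (column, row), or null when the coordinates
    // fall outside the range that is actually stored.
    static String cellOrNull(String[][] rows, int column, int row) {
        if (row < 0 || row >= rows.length) {
            return null;
        }
        String[] cells = rows[row];
        if (column < 0 || column >= cells.length) {
            return null;
        }
        return cells[column];
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"a", "b"},
            {"c"}          // the second row stores only one cell
        };
        System.out.println(cellOrNull(rows, 1, 0));
        System.out.println(cellOrNull(rows, 1, 1));
    }
}
```

Checking first keeps the normal case free of exception handling and makes it explicit that a missing cell is an expected situation.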
I hope this is what you were looking for.
I have a program reading excel sheet from a java program.
I am iterating over cells as below:
Iterator cells = row.cellIterator();
String temp;
StringBuilder sb;
SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
while (cells.hasNext()) {
Cell cell = (Cell) cells.next();
temp = null;
switch (cell.getCellType()) {
case Cell.CELL_TYPE_STRING:
temp = cell.getRichStringCellValue().getString();
break;
case Cell.CELL_TYPE_NUMERIC:
if (DateUtil.isCellDateFormatted(cell)) {
temp = sdf.format(cell.getDateCellValue());
} else {
temp = df.format(new BigDecimal(cell.getNumericCellValue()));
}
break;
default:
}
if (temp == null || temp.equalsIgnoreCase("null")) {
sb.append("").append(";");
} else {
sb.append(temp).append(";");
}
}
As seen, I am trying to create a string builder containing values from excel row in semicolon separated way.
Issue is, if a column value is empty, I want it as an empty value in the string builder with two consecutive semicolons.
However, the call
Cell cell = (Cell) cells.next();
simply ignores the empty cells and jumps over to next non empty cell.
So the line
if (temp == null || temp.equalsIgnoreCase("null"))
is never met.
How to get a handle on empty column values as well in the iterator ?
This is virtually a duplicate of this question, and so my answer to that question basically applies exactly to you too.
The Cell Iterator only iterates over cells that are defined in the file. If the cell has never been used in Excel, it probably won't appear in the file (Excel isn't always consistent...), so POI won't see it.
If you want to make sure you hit every cell, you should look up by index instead, and either check for null cells (indicating the cell has never existed in the file), or set a MissingCellPolicy to control how you want null and blank cells to be treated.
So, if you really do want to get every cell, do something like:
Row r = sheet.getRow(myRowNum);
int lastColumn = Math.max(r.getLastCellNum(), MY_MINIMUM_COLUMN_COUNT);
for (int cn=0; cn<lastColumn; cn++) {
Cell c = r.getCell(cn, Row.RETURN_BLANK_AS_NULL);
if (c == null) {
// The spreadsheet is empty in this cell
} else {
// Do something useful with the cell's contents
}
}
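Applied to the semicolon-separated output from the question, the index-based loop could look like the following sketch. It uses a `TreeMap` as a stand-in for the row's stored cells rather than POI itself; `RowJoiner`, `join` and the `minimumColumns` parameter are hypothetical names.

```java
import java.util.TreeMap;

// Hypothetical model: key = column index, value = cell text; absent key = missing cell.
class RowJoiner {
    // Emits one field per column index, so missing cells become empty fields.
    static String join(TreeMap<Integer, String> cells, int minimumColumns) {
        int lastColumn = cells.isEmpty() ? 0 : cells.lastKey() + 1;
        lastColumn = Math.max(lastColumn, minimumColumns);
        StringBuilder sb = new StringBuilder();
        for (int cn = 0; cn < lastColumn; cn++) {
            String value = cells.get(cn);
            if (value != null) {
                sb.append(value);
            }
            sb.append(';'); // every column ends with a separator, filled or not
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> cells = new TreeMap<>();
        cells.put(0, "a");
        cells.put(2, "c"); // column 1 is missing
        System.out.println(join(cells, 4));
    }
}
```

Because the loop is driven by the column index and not by the stored cells, the number of fields per line stays constant across rows.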
You can do this
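int previous = -1; // column index of the last cell seen
while (cells.hasNext()) {
    Cell cell = (Cell) cells.next();
    int current = cell.getColumnIndex();
    // One empty field for every column skipped since the previous cell
    for (int skipped = previous + 1; skipped < current; skipped++) {
        sb.append(";");
    }
    // ... append this cell's own value followed by ";" here ...
    previous = current;
}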
or you can do
int i = 0; // next column index we expect to see
while (cells.hasNext()) {
    Cell cell = (Cell) cells.next();
    int current = cell.getColumnIndex();
    // Pad with empty fields until we reach the column of this cell
    while (i < current) {
        sb.append(";");
        i++;
    }
    // ... append this cell's own value followed by ";" here ...
    i++;
}
The answer posted by @Gagravarr works perfectly for me, but MissingCellPolicy is an enum now, so when getting the cell, instead of using
Cell c = r.getCell(cn, Row.RETURN_BLANK_AS_NULL);
I have used
Cell c =r.getCell(cn,Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
I am using Apache POI java and want to get the total number of rows which are not empty. I successfully processed a whole row with all its columns. Now I am assuming that I get an excel sheet with multiple rows and not a single row...so how to go about that? I was thinking of getting total number of rows (int n) and then loop until i<=n but not sure.
Suggestions are most welcome :)
Note: Apache POI version is 3.8. I am not dealing with Xlsx format...only xls.
Yes I tried this code but got 20 in return....which is not possible given I have only 5 rows
FileInputStream fileInputStream = new FileInputStream("COD.xls");
HSSFWorkbook workbook = new HSSFWorkbook(fileInputStream);
HSSFSheet worksheet = workbook.getSheet("COD");
HSSFRow row1 = worksheet.getRow(3);
Iterator rows = worksheet.rowIterator();
int noOfRows = 0;
while( rows.hasNext() ) {
HSSFRow row = (HSSFRow) rows.next();
noOfRows++;
}
System.out.println("Number of Rows: " + noOfRows);
for (int i = 0; i <= sheet.getLastRowNum(); i++) {
    Row tempRow = sheet.getRow(i);
    if (tempRow != null) {
        //Your Code Here
    }
}
The problem is that POI treats empty rows as physical rows. This happens at times in Excel: although such rows look empty, they still exist in the file.
If you open your Excel sheet, select everything below your data, and delete it (I know it looks empty, but do it anyway), POI will return the right number.
You may want to use getPhysicalNumberOfRows() rather than getLastRowNum()?
You can iterate over the rows which are not empty using this:
Iterator<Row> rowIterator = sheet.rowIterator();
while (rowIterator.hasNext()) {
Row row = (Row) rowIterator.next();
// Your code here
}
Thanks
worksheet.getLastRowNum() // *base index 0*
This method gives you the number of the last row that contains anything. Even if you have filled only 5 rows, you may have left something, such as spaces, in the remaining 15 rows or in the 21st row, which is why it returns 20 as the last row number.
It may also be that you enter data in every 5th row (starting from row 1); then your 5th entry is in the 21st row, so again this method returns 20.
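The difference between the last row number, the number of stored rows and the number of rows with content can be shown with a small model. The sketch uses a `TreeMap` from row index to row text instead of a POI sheet; `RowCountModel` and its methods are hypothetical names, and returning -1 for an empty sheet is this model's own choice.

```java
import java.util.TreeMap;

// Hypothetical model: key = 0-based row index, value = the row's text ("" = stored but blank).
class RowCountModel {
    // 0-based index of the last stored row; -1 for an empty sheet (model's choice).
    static int lastRowNum(TreeMap<Integer, String> rows) {
        return rows.isEmpty() ? -1 : rows.lastKey();
    }

    // Number of rows that are stored at all, blank or not.
    static int physicalRows(TreeMap<Integer, String> rows) {
        return rows.size();
    }

    // Number of stored rows that contain something other than whitespace.
    static int nonEmptyRows(TreeMap<Integer, String> rows) {
        int count = 0;
        for (String text : rows.values()) {
            if (text != null && !text.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> rows = new TreeMap<>();
        // Data in every 5th row: 1st, 6th, 11th, 16th and 21st row (0-based 0..20).
        for (int r = 0; r <= 20; r += 5) {
            rows.put(r, "data");
        }
        rows.put(19, " "); // a row that looks empty but is stored
        System.out.println(lastRowNum(rows));
        System.out.println(physicalRows(rows));
        System.out.println(nonEmptyRows(rows));
    }
}
```

Only the third count ignores rows that are stored but blank, which is why it usually has to be computed by inspecting the cells.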