I am not sure if I have understood this properly, but I think that when I use the POI event model and a cell in an Excel column is completely blank (not even a space), that cell is not read at all.
The problem I have is this: say I have a model with 15 columns, and I am reading data from an Excel sheet to populate the model.
I read the Excel sheet, store the data in a List, and then add it to the model.
If one of the columns has no data, POI reads nothing for it, and so the List I create has only 14 columns.
Then I cannot match the columns in the model with those in the List.
How do I solve this issue?
If you were using HSSF for a .xls file, you could have used MissingRecordAwareHSSFListener, which would have sent you special records to indicate the gaps. A good example of that is XLS2CSVmra. However, you say you're on XSSF for .xlsx, so it's a little different.
There is an Apache POI example that covers missing records in a .xlsx file; you'll want to look at the XLSX2CSV example. Basically, you should keep a variable that holds the last column number you saw. When startElement fires for c (a cell), check the reference column against the last one seen. If there's a gap, send empty cells to your logic at that point, then record the new column and carry on processing.
The logic to get the current column number would be something like:
// Get the cell reference, e.g. "C3"
String r = attributes.getValue("r");
int firstDigit = -1;
for (int c = 0; c < r.length(); ++c) {
    if (Character.isDigit(r.charAt(c))) {
        firstDigit = c;
        break;
    }
}
thisColumn = nameToColumn(r.substring(0, firstDigit));
Don't forget to reset the column counter on a new row (endElement for row is often a good one to trigger that on)
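For reference, the nameToColumn helper used above comes from the XLSX2CSV example; it turns a column name such as "A" or "AB" into a zero-based index, roughly like this:
// Converts a column name ("A" -> 0, "B" -> 1, ..., "Z" -> 25, "AA" -> 26, ...)
private int nameToColumn(String name) {
    int column = -1;
    for (int i = 0; i < name.length(); ++i) {
        int c = name.charAt(i);
        column = (column + 1) * 26 + c - 'A';
    }
    return column;
}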
Thanks #Gagravarr for giving me the initial idea to solve the problem.
I did not follow exactly what you suggested, though. Instead, what I did was compare the characters (ASCII values) of the column references, the current one and the last one.
I kept track of the last column reference and checked that the difference between two consecutive column references is always 1, i.e. B1 - A1 == 1.
Any deviation from this I treated as a skipped column.
The solution works perfectly; below is a snippet of what I did:
char currColChar = attributes.getValue("r").charAt(0);
// Check if we have missed a column due to no data
if ((currColChar - lastColumnChar) > 1) {
    // logic for the skipped column
}
lastColumnChar = currColChar;
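For context, here is a minimal sketch of how that check can sit inside the SAX handler; lastColumnChar is a field on the handler, handleEmptyCell() is a hypothetical callback (not from the original post), and, like the snippet above, it assumes single-letter column references:
@Override
public void startElement(String uri, String localName, String name, Attributes attributes) {
    if ("c".equals(name)) {
        char currColChar = attributes.getValue("r").charAt(0);
        // Emit one placeholder per column skipped because it had no data
        for (char col = (char) (lastColumnChar + 1); col < currColChar; col++) {
            handleEmptyCell(col); // hypothetical: add a blank entry to the List
        }
        lastColumnChar = currColChar;
    }
    // ... existing handling of the other elements (v, is, row, ...)
}

@Override
public void endElement(String uri, String localName, String name) {
    if ("row".equals(name)) {
        lastColumnChar = (char) ('A' - 1); // reset so the next row starts fresh
    }
}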
Related
Currently I am using the sheets-api with a mask of sheets.data.rowData.values.formattedValue to retrieve only the formatted values (strings, as seen by the user on the spreadsheet). I then try to convert each of the strings to its appropriate data type (as expected, based on the column to which the cell belongs).
But I have a use case of listing all the cells (in A1 notation) that are uninitialised / null / empty / have an incorrect data type.
Q: How do I know the cell location (in A1 notation) of the CellData that I am processing currently?
PS: Using the row/column indexes to guess the cell location in A1 notation seems like a hack, and is also inaccurate while using the mask for formattedValue: uninitialised cells are automatically omitted from the list of cell values in a row. So even though your spreadsheet might be 4 x 8 (rows x columns), the RowData for some rows might contain fewer than 8 entries (as some cells might be blank). It is strange, but that is the current behaviour of the sheets-api, sadly.
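For reference, the index-based conversion referred to above is simple in itself; the fragile part is that omitted blank cells shift the indices. A sketch, assuming zero-based row and column indices:
// Naive (rowIndex, colIndex) -> A1 conversion; only reliable when no blank
// cells have been omitted from the values list before the one you are reading.
static String toA1(int rowIndex, int colIndex) {
    StringBuilder col = new StringBuilder();
    for (int c = colIndex; c >= 0; c = c / 26 - 1) {
        col.insert(0, (char) ('A' + c % 26));
    }
    return col + String.valueOf(rowIndex + 1); // e.g. (0, 0) -> "A1", (2, 27) -> "AB3"
}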
I want to iterate a dataframe by partitions, and for each partition iterate over all of its rows and build a deleteList containing an HBase Delete object for each row.
I'm using Spark and HBase with Java and I've created a Row object with the following code:
df.foreachPartition((ForeachPartitionFunction<Row>) iterator -> {
    while (iterator.hasNext()) {
        Row row = RowFactory.create(iterator.next());
        deleteList.add(new Delete(Bytes.toBytes(String.valueOf(row))));
    }
});
But it won't work, because I cannot access the row's value correctly. The df has one column, named "hbase_key".
It's hard to tell from your post which class Row is exactly, but I suspect it is org.apache.spark.sql.Row?
If that's the case, try methods like getString(i) or similar, where i is the index of the column in the row you are trying to access.
Again, depending on how you are configuring your HBase access, I suspect that in your case index 0 would be the value of the row key of the physical HBase table, and the subsequent indices would be the respective column values returned with your row. But again, that would depend on how exactly you arrived at this point in your code.
Your Row object should have methods to access other data types as well, such as getInt(i), etc.
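A minimal sketch of that idea, assuming org.apache.spark.sql.Row and that the single "hbase_key" column holds the HBase row key as a string (imports: org.apache.spark.api.java.function.ForeachPartitionFunction, org.apache.hadoop.hbase.client.Delete, org.apache.hadoop.hbase.util.Bytes):
df.foreachPartition((ForeachPartitionFunction<Row>) iterator -> {
    List<Delete> deleteList = new ArrayList<>();
    while (iterator.hasNext()) {
        Row row = iterator.next();                 // already a Row; no RowFactory needed
        String key = row.getString(0);             // or row.getAs("hbase_key")
        deleteList.add(new Delete(Bytes.toBytes(key)));
    }
    // ... hand deleteList to your HBase Table / BufferedMutator here
});
Note that deleteList is created inside the lambda, because foreachPartition runs on the executors rather than the driver.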
I got a task which I am not sure how to solve.
I have to fill a JTable with rows I get from a .txt document. The problem is that the different .txt documents have more or fewer rows and columns for the JTable.
example:
inside the cars.txt:
id;hp;price;quantity
1;100;7000;5
4;120;20000;2
7;300;80000;3
inside the bikes.txt
id;price;quantity;color;year
3;80;20;red;2010
5;200;40;green;2011
12;150;10;blue;2007
So, when a .txt is chosen, a JDialog will pop up with a JTable inside, where the data will be shown.
I thought that I could maybe create a class Anything with an instance variable String[][] whose sizes I can define by reading the .txt; after saving the data in one array I can count how many rows and how many columns it has.
With the cars.txt example that would be String[3][4] (3 data rows, 4 columns).
Is that a good way to do it, or is there a better way?
Thanks for the help :D
Your question is a bit vague on what you want to do specifically.
Do you want to simply fill the table with all the data given, or do you only want certain columns used? When you choose the text files, are you aware of which column names they have (can you hardcode this or not)?
A good start would be...
Edited: here's the solution.
DefaultTableModel dtm = (DefaultTableModel) yourJTable.getModel();
// This splits your txt file content into a string array, one entry per row
String[] rowSplit = yourTxtFileThatYouRead.split("\n");
// This assumes that your txt file contains the column headers in its first row
dtm.setColumnIdentifiers(rowSplit[0].split(";"));
// Start the iteration at 1 to skip the column headers
for (int i = 1; i < rowSplit.length; ++i) {
    dtm.addRow(rowSplit[i].split(";"));
}
The first part sets the column headers and allows for variation in your table's column count.
The second part adds the rows sequentially.
As shown in How to Use Tables: Creating a Table Model, you can extend AbstractTableModel to manage models of arbitrary dimensions. Let your model manage a List<List<String>>. Parse the first line of each file into a List<String> that is accessed by your implementations of getColumnCount() and getColumnName(). Parse subsequent lines into one List<String> per row; access the List of such rows in your implementation of getValueAt(). A related example that manages a Map<String, String> is shown here. Although more complex, you can use Class Literals as Runtime-Type Tokens for non-string data; return the token in your implementation of getColumnClass() to get the default render and editor for supported types. Finally, consider one of these file based JDBC drivers for flat files.
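A minimal sketch of the AbstractTableModel approach described above (the class and field names are illustrative, not from the answer), assuming the lines of the chosen file have already been read into a List<String>:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.swing.table.AbstractTableModel;

class DelimitedTableModel extends AbstractTableModel {

    private final List<String> columnNames;   // parsed from the first line
    private final List<List<String>> rows;    // one inner list per data line

    DelimitedTableModel(List<String> lines) {
        columnNames = Arrays.asList(lines.get(0).split(";"));
        rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            rows.add(Arrays.asList(lines.get(i).split(";")));
        }
    }

    @Override public int getRowCount() { return rows.size(); }
    @Override public int getColumnCount() { return columnNames.size(); }
    @Override public String getColumnName(int col) { return columnNames.get(col); }
    @Override public Object getValueAt(int row, int col) { return rows.get(row).get(col); }
}
Because the column count and names come from the first line of whichever file was chosen, the same model works for cars.txt and bikes.txt alike; pass it to new JTable(model) inside the JDialog.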
I'm using POI java library to read an Excel.
My Excel file has a simple structure composed of 8 columns.
The problem is that, reading the column length with the method getPhysicalNumberOfCells, I get a different number for each row.
The problem is that getPhysicalNumberOfCells has a different meaning from what I was thinking.
getPhysicalNumberOfCells returns the number of cells in a row that have content.
In fact, POI stores for each row only the data actually added to the Excel file; for example, if you have data in columns 0, 3 and 5, getPhysicalNumberOfCells will always return 3, because 3 is the number of "filled" cells.
To get the logical number of cells in a row, use:
getLastCellNum()
According to the documentation, this method gets the index of the last cell PLUS ONE, so in the example above, with a maximum index of 5, it returns 6.
I think this has been done to simplify iteration over the row's cells.
Moreover, there is a method
getFirstCellNum()
that returns the index of the first cell.
An example inspired by the official documentation:
short minColIdx = row.getFirstCellNum();
short maxColIdx = row.getLastCellNum();
for (short colIdx = minColIdx; colIdx < maxColIdx; colIdx++) {
    Cell cell = row.getCell(colIdx);
    // get the value of the cell
}
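Note that within that range getCell can return null for the gaps; if you would rather not null-check, a small sketch using POI's Row.MissingCellPolicy:
for (short colIdx = minColIdx; colIdx < maxColIdx; colIdx++) {
    // CREATE_NULL_AS_BLANK turns missing cells into blank ones, so cell is never null here
    Cell cell = row.getCell(colIdx, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
    // get the value of the cell; gaps show up as CellType.BLANK
}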
I would like to generate an Excel file using Apache POI in which I can display about 2000 records, where each record comprises a date and a value.
I would like this Excel file to be formatted properly: colouring the cell backgrounds and applying appropriate number formats.
I can do both of these tasks, but I cannot do the formatting as efficiently as I would like.
The 3 methods I have tried for applying the formatting are as follows. All three involve using a pre-formatted Excel template; the question, however, is how much formatting to do in Excel (and how to apply it) and how much to do in Java.
Method 1:
Formatting one row in Excel itself, and copying the formatting using Java code. For instance:
Row existingRow = mySheet.getRow(4);
Cell existingCell = existingRow.getCell(0);
CellStyle currentStyle = existingCell.getCellStyle();
for (int w = 0; w < refData.size(); w++) {
    MyValues aa = refData.get(w);
    Row r = CellUtil.getRow(w + 4, mySheet);
    CellUtil.getCell(r, 0).setCellValue(aa.getMarketDate());
    if (w > 0) {
        CellUtil.getCell(r, 0).setCellStyle(currentStyle);
    }
}
Method 2: Select the cells containing the required format in Excel, paste them over the region I require (2000 rows), and then just fill in the data using Apache POI.
Method 3: Apply the formatting to the columns using Excel, and then just fill in the data using Apache POI.
The third method is by far preferable for me, because (a) I do not need to write Java code when I can just pre-format in Excel [note that my real-life problem involves tens of columns, not just one column], and (b) applying a format to a column is highly advantageous in terms of the memory used by the workbook.
The only problem is that when Apache POI writes to cells where the format was copied and pasted, they are displayed fine; when it writes to cells where the format was applied to the column, it removes the formatting before writing.
Is there any way of getting around this? I assume there isn't, because Apache POI works by considering each row individually; for instance, to apply a format to a column, one needs to apply the format to each cell in the column individually.
One way would be to use a VBA macro to apply the formatting to the column upon opening the workbook.
Private Sub Workbook_Open()
'column formatting
End Sub
However, this obviously has the disadvantage that the user would need to enable macros.
You need to read the style from the "guide" cell and apply it to the column. This way new cells will get it as their style.
e.g. (once, before you start writing values):
Row guideRow = mySheet.getRow(0);
for (int ii = 0; ii < guideRow.getLastCellNum(); ++ii) {
    CellStyle currentStyle = guideRow.getCell(ii).getCellStyle();
    mySheet.setDefaultColumnStyle(ii, currentStyle);
}