Generic way to parse excel files

I have a requirement to parse an Excel file and create a list of objects from it. We are using org.apache.poi to read the Excel file and can get the required details. Currently we read each cell value by its column index and set it on the corresponding object field. We believe reading values by index is fragile, and we would like a generic way to parse the file so that adding or removing columns does not require much code change. I came across this article, which almost fulfills the requirements but uses reflection, which we are not allowed to use. Is there a way to parse the Excel file without relying on cell indexes, so that a change in the file's format requires little effort?
public List<DTO> jsonConverter(Workbook workbook, Sheet sheet, String filename)
throws ParseException {
List<DTO> listOfDTOs = new ArrayList<>();
Row row;
for (int index = 1; index <= sheet.getLastRowNum(); index++) {
row = sheet.getRow(index);
if (row != null) {
DTO dto = new DTO();
dto.setFieldX(
getCellValueAsStringBasedOnCellType(
workbook, row.getCell(0, MissingCellPolicy.CREATE_NULL_AS_BLANK)));
dto.setFieldY(
getCellValueAsStringBasedOnCellType(
workbook, row.getCell(1, MissingCellPolicy.CREATE_NULL_AS_BLANK)));
listOfDTOs.add(dto);
}
}
return listOfDTOs;
}
public String getCellValueAsStringBasedOnCellType(Workbook workbook, Cell cell) {
DataFormatter formatter = new DataFormatter();
if (cell != null && cell.getCellType() == CellType.FORMULA) {
FormulaEvaluator evaluator = workbook.getCreationHelper().createFormulaEvaluator();
return formatter.formatCellValue(cell, evaluator);
}
return formatter.formatCellValue(cell);
}

Sure. It's fairly simple:
The setup
Read the first row in the Excel file, treat it as the column headers, and read every cell as a string.
Store these in an array.
Now you can turn any index into the column name, simply using the expression headers[idx].
Thus, for any given cell, you know the header name. Now you need to translate this knowledge into the right call. Say you're in column E (idx == 4) and the header value headers[4] is "Address"; you want to take that string and turn it into the right call. You end up needing to invoke:
String cellValue = getCellValueAsStringBasedOnCellType(workbook, row.getCell(4, MissingCellPolicy.CREATE_NULL_AS_BLANK));
dto.setFieldAddress(cellValue);
Everything in that code snippet is the same for any value of that string, except the setFieldAddress name.
So, we need to turn the string "Address" into the act of invoking setFieldAddress.
The solution
java.util.function and hashmaps to the rescue!
This is a way to store in a variable the concept of taking a dto instance and setting the Address field:
BiConsumer<DTO, String> setAddress = (dto, value) -> dto.setFieldAddress(value);
or even simpler:
BiConsumer<DTO, String> setAddress = DTO::setFieldAddress;
These snippets do the same thing: they don't set an address; they are a recipe for how that is done. You store the concept of setting an address on a DTO in a variable so you can run it later, as many times as you want. This is generally called a 'closure' or a 'lambda'.
We can store these things in a map:
Map<String, BiConsumer<DTO, String>> dtoSetters = new HashMap<>();
dtoSetters.put("Address", DTO::setFieldAddress);
And then we can just figure it out:
int colIdx = ...;
String headerName = headers[colIdx];
var setter = dtoSetters.get(headerName);
if (setter == null) throw new IllegalStateException("Unexpected column header in excel sheet: " + headerName);
String cellValue = getCellValueAsStringBasedOnCellType(workbook, row.getCell(colIdx, MissingCellPolicy.CREATE_NULL_AS_BLANK));
setter.accept(dto, cellValue); // BiConsumer's method is accept, not apply
Thus, make that map (once, at system boot, e.g. with static initializers), replace your dto.setFieldX code with the above, and voila.
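Putting the pieces together, here is a minimal sketch of the whole idea. To keep it runnable without POI, it assumes the sheet has already been read into string arrays (one per row, header row first); AddressDto, HeaderDrivenParser and the two header names are made up for illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical DTO; the real one would have your own fields and setters.
class AddressDto {
    private String name;
    private String address;
    void setFieldName(String v) { name = v; }
    void setFieldAddress(String v) { address = v; }
    String getFieldName() { return name; }
    String getFieldAddress() { return address; }
}

class HeaderDrivenParser {
    // Built once: header text -> recipe for setting the matching field.
    private static final Map<String, BiConsumer<AddressDto, String>> SETTERS = new HashMap<>();
    static {
        SETTERS.put("Name", AddressDto::setFieldName);
        SETTERS.put("Address", AddressDto::setFieldAddress);
    }

    // rows.get(0) is the header row; every later row becomes one DTO.
    static List<AddressDto> parse(List<String[]> rows) {
        List<AddressDto> result = new ArrayList<>();
        if (rows.isEmpty()) return result;
        String[] headers = rows.get(0);
        for (int r = 1; r < rows.size(); r++) {
            String[] cells = rows.get(r);
            AddressDto dto = new AddressDto();
            for (int col = 0; col < headers.length && col < cells.length; col++) {
                BiConsumer<AddressDto, String> setter = SETTERS.get(headers[col]);
                if (setter == null) {
                    throw new IllegalStateException("Unexpected column header: " + headers[col]);
                }
                setter.accept(dto, cells[col]);
            }
            result.add(dto);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Address", "Name"}); // column order does not matter
        rows.add(new String[] {"1 Main St", "Ann"});
        for (AddressDto dto : parse(rows)) {
            System.out.println(dto.getFieldName() + " lives at " + dto.getFieldAddress());
        }
    }
}
```

With this shape, adding a column means adding one SETTERS.put line, and reordering columns in the file needs no code change at all.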

Related

Cannot unbox null value

I have the following code:
List<Details> detailsList = new ArrayList<>();
List<String[]> csv = csvReader.readAll();
final Map<String, Integer> mappedHeaders = mapHeaders(csv.get(0));
List<String[]> data = csv.subList(1, csv.size());
for (String[] entry : data) {
    Details details = new Details(
        entry[mappedHeaders.get("A")],
        entry[mappedHeaders.get("B")],
        entry[mappedHeaders.get("C")]);
    detailsList.add(details);
}
I'm essentially reading in a CSV file as a list of string arrays, where the first list item holds the CSV headers and all remaining elements correspond to the data rows. However, since different CSV files with the same features might order the feature columns differently, I don't know the ordering in advance. For that, I have a mapHeaders method which maps the headers to indices so I can later properly put together the Details object (for example, if the headers are ["B", "A", "C"], mappedHeaders would be {B: 0; A: 1; C: 2}).
I also have some test data files of different column orderings and all but one of them work as they should. However, the one that doesn't work gives me
java.lang.NullPointerException: cannot unbox null value
when trying to evaluate entry[mappedHeaders.get("A")]. Additionally, when running the code in debugging mode, the mappedHeaders contains the correct keys and values and the value for "A" isn't null.
I have also tried entry[mappedHeaders.getOrDefault("A", Arrays.asList(csv.get(0)).indexOf("A"))], which returns -1. The only thing that works is entry[mappedHeaders.getOrDefault("A", 0)], since A is the first column in the failing case, but that workaround doesn't seem feasible, as there might be more failing cases that I don't know about where the ordering is different. What might be the reason for such behavior? Might it be some weird encoding issue?

That's because you are trying to unbox a null value.
A method like intValue(), longValue() or doubleValue() is being called implicitly on a null reference.
Integer val = null;
if (val == 1) {
// NullPointerException
}
Integer val = null;
if (val == null) {
// This works
}
Integer val = 0;
if (val == 1) {
// This works
}
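In the question, the unboxing is implicit: mappedHeaders.get("A") returns an Integer, and using it as an array index unboxes it, so a missing key fails exactly as above. Since indexOf("A") returned -1, the first header read from the file is apparently not exactly "A". One possible cause (an assumption, the question does not confirm it) is an invisible character such as a byte-order mark or stray whitespace in front of the first header. The question's mapHeaders is not shown, so the version below is a hypothetical replacement that normalizes the header names and reports a missing column explicitly instead of unboxing null.

```java
import java.util.HashMap;
import java.util.Map;

class HeaderIndex {
    // Hypothetical stand-in for the question's mapHeaders: strips a leading
    // byte-order mark (U+FEFF) and surrounding whitespace before storing each key.
    static Map<String, Integer> mapHeaders(String[] headerRow) {
        Map<String, Integer> indexByName = new HashMap<>();
        for (int i = 0; i < headerRow.length; i++) {
            String name = headerRow[i];
            if (name.startsWith("\uFEFF")) {
                name = name.substring(1); // trim() alone does not remove a BOM
            }
            indexByName.put(name.trim(), i);
        }
        return indexByName;
    }

    // Looks the column up without unboxing a possible null.
    static int indexOf(Map<String, Integer> indexByName, String column) {
        Integer index = indexByName.get(column);
        if (index == null) {
            throw new IllegalArgumentException(
                "Missing column '" + column + "', found " + indexByName.keySet());
        }
        return index;
    }

    public static void main(String[] args) {
        String[] header = {"\uFEFFA", "B", "C"}; // first header carries a BOM
        Map<String, Integer> idx = mapHeaders(header);
        System.out.println(indexOf(idx, "A"));
    }
}
```

If the cause turns out to be something else, the exception message at least lists the keys that were actually read, which makes the mismatch visible.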

Comparing Keys in a Hashmap

I have a test.csv file that is formatted as:
Home,Owner,Lat,Long
5th Street,John,5.6765,-6.56464564
7th Street,Bob,7.75,-4.4534564
9th Street,Kyle,4.64,-9.566467364
10th Street,Jim,14.234,-2.5667564
I have a hash map populated from a second file that contains the same headers as the CSV, in a different format and with no accompanying data.
For example:
Map<Integer, String> container = new HashMap<>();
where,
Key, Value
[0][NULL]
[1][Owner]
[2][Lat]
[3][NULL]
I have also created a second hash map, filled like this:
BufferedReader reader = new BufferedReader (new FileReader("test.csv"));
CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT);
Boolean headerParsed = false;
CSVRecord headerRecord = null;
int i;
Map<String,String> value = new HashMap<>();
for (final CSVRecord record : parser) {
if (!headerParsed = false) {
headerRecord = record;
headerParsed = true;
}
for (i =0; i< record.size(); i++) {
value.put (headerRecord.get(0), record.get(0));
}
}
I want to read and compare the hash maps: if the container map has a value that is also a key in the value map, I put the corresponding value into an object.
Example object:
public DataSet (//args) {
this.home
this.owner
this.lat
this.longitude
}
I want to create a function that compares the hash maps and, whenever a key in the value map matches an entry in the container map, sets the corresponding value on the object. Something really simple that is also efficient at handling the setting.
Please note: I kept the CSV header and rows small here; in real life, the CSV could have any number of fields (Home, Owner, Lat, Long, houseType, houseColor, etc.) and any number of values associated with those fields.

First off, your approach to this problem is unnecessarily long. From what I see, all you are trying to do is this:
Select two columns from a CSV file and add them to a data structure. I stress the word two because a map has a key and a value: one column becomes the key, and the other becomes the value.
What you should do instead:
Import the names of columns you wish to add to the data structure into two strings. (You may read them from a file).
Iterate over the CSV file using the CSVParser class that you did.
Store the value corresponding to the first desired column in a string, repeat with the value corresponding to the second desired column, and push them both into a DataSet object, and push the DataSet object into a List<DataSet>.
If you prefer to stick to your way of solving the problem:
Basically, the empty file is supposed to hold just the headers (column names), and that's why you named the corresponding hash map container. The second file is supposed to contain the values, and hence you named the corresponding hash map value.
First off, where you say
if (!headerParsed = false) {
headerRecord = record;
headerParsed = true;
}
you probably mean to say
if (!headerParsed) {
headerRecord = record;
headerParsed = true;
}
and where you say
for (i =0; i< record.size(); i++) {
value.put(headerRecord.get(0), record.get(0));
}
you probably mean
for (i =0; i< record.size(); i++) {
value.put(headerRecord.get(i), record.get(i));
}
i.e. You iterate over one record and store the value corresponding to each column.
Now, I haven't tried this code on my desktop, but the for loop also iterates over Home and Long, which you do not want to store, so you should add a check before calling value.put (otherwise value.put("Home", "5th Street") would store a column you meant to skip). Wrap it inside an if conditional and check whether headerRecord.get(i) exists in the container hash map.
for (i = 0; i < record.size(); i++) {
    // container maps index -> header name, so look the header up among its values
    if (container.containsValue(headerRecord.get(i))) {
        value.put(headerRecord.get(i), record.get(i));
    }
}
Now, the thing is that the data structure itself depends on which values from the container hash map you want to store. It could be Home and Lat, or Owner and Long. So we are stuck. How about you create a data structure like the one below:
class DataSet {
    String val1;
    String val2;
}
Also, note that this DataSet is only for storing ONE row. For storing information from multiple rows, you need to create a list of DataSet objects.
Lastly, the container file contains ALL the column names. Not all these columns will be stored in the Data Set (i.e. You chose to NULL Home and Long. You could have chosen to NULL Owner and Lat), hence the header file is not what you need to make this decision.
If you think about it, you can just iterate over the value hash map and store each entry's key in val1 and its value in val2.
List<DataSet> myList = new ArrayList<>();
Iterator<Map.Entry<String, String>> it = value.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry<String, String> pair = it.next();
    DataSet row = new DataSet(); // a fresh object per entry
    row.val1 = pair.getKey();
    row.val2 = pair.getValue();
    myList.add(row);
    it.remove(); // empties the map as it is consumed
}
I hope this helps.
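Since the question mentions that the real CSV can have any number of fields, a fixed two-field DataSet may be limiting. A rough sketch of a more general variant is below: it keeps, for each row, only the columns whose header is in a set of wanted names. It assumes the CSV has already been read into string arrays (header first) in place of CSVRecord, and ColumnFilter and select are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ColumnFilter {
    // Keeps, for every data row, only the columns whose header is in `wanted`.
    static List<Map<String, String>> select(List<String[]> rows, Set<String> wanted) {
        List<Map<String, String>> result = new ArrayList<>();
        if (rows.isEmpty()) return result;
        String[] header = rows.get(0);
        for (int r = 1; r < rows.size(); r++) {
            String[] record = rows.get(r);
            Map<String, String> values = new LinkedHashMap<>(); // keeps column order
            for (int i = 0; i < header.length && i < record.length; i++) {
                if (wanted.contains(header[i])) {
                    values.put(header[i], record[i]);
                }
            }
            result.add(values);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"Home", "Owner", "Lat", "Long"},
            new String[] {"5th Street", "John", "5.6765", "-6.56464564"},
            new String[] {"7th Street", "Bob", "7.75", "-4.4534564"});
        // The non-null entries of the question's container map.
        Set<String> wanted = new HashSet<>(Arrays.asList("Owner", "Lat"));
        System.out.println(select(rows, wanted));
    }
}
```

Each resulting map can then be copied into a DataSet, or used directly if a map per row is enough.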

Excel Columns into Java Using API

I had to create a program that calculates the GPA, using Apache POI to read an xlsx Excel file. It contains 220 rows and 4 columns, such as
Course Number | Course Name | Credit Hours | Course Multiplier
110 | Eng 1 CP | 5.0 | 1.0
There are 220 other courses.
I was able to print the data using cell.getStringCellValue and cell.getNumericCellValue, but I can't get the printed data into arrays.
I want to create an array called courseNumList and put the first course number in courseNumList[0], the second course number in courseNumList[1], and so on.
I want to create 4 arrays, but what is a good way to do that?
private static ArrayList<Object> c = new ArrayList <Object>();
public static void readXLSXFile() throws IOException {
InputStream ExcelFileToRead = new FileInputStream("C:/Users/14KimTa/Desktop/Downloads/Course_List.xlsx");
XSSFWorkbook wb = new XSSFWorkbook(ExcelFileToRead);
XSSFWorkbook test = new XSSFWorkbook();
XSSFSheet sheet = wb.getSheetAt(0);
XSSFRow row;
XSSFCell cell;
Iterator rows = sheet.rowIterator();
while (rows.hasNext())
{
row=(XSSFRow) rows.next();
Iterator cells = row.cellIterator();
while (cells.hasNext())
{
cell=(XSSFCell) cells.next();
if (cell.getCellType() == XSSFCell.CELL_TYPE_STRING)
{
System.out.print(cell.getStringCellValue()+" ");
c.add(cell.getStringCellValue());
}
else if(cell.getCellType() == XSSFCell.CELL_TYPE_NUMERIC)
{
System.out.print(cell.getNumericCellValue()+" ");
}
}
System.out.println();
}
}
This is my code so far.
I tried to put each column into its own array, but it is not working at all.
Thanks!

I would create a new class to define your data, Course, with one field for each of your columns (4 fields). Then I would create some kind of List (ArrayList<Course> looks good) to hold all Courses. An array of Courses would work too, since you know how many there are from the start. In a loop, I would create one Course object for each row, setting the fields based on the values from cell.getStringCellValue() and cell.getNumericCellValue(), adding the Course to the List (or array) after processing each row.

I don't think creating one array per column is a good idea. Keeping track of the data in one row by following the same index across 4 arrays is cumbersome and bad practice.
I would rather create a Java class, Course, with 4 fields: courseNumber, courseName, creditHours and courseMultiplier. Then I would create a collection of such objects, e.g. Collection<Course> courses = new ArrayList<Course>();, and populate it according to the data read from Excel, one object per row.
EDIT:
I'd suggest you create a custom type instead of using Object for your ArrayList type parameter. You're not gaining much by using Object.
Then, for each row, you'd do the following:
//...obtain your values from cells and populate `courseNumber`, `courseName`,`creditHours` and `courseMultiplier` accordingly
Course course = new Course();
course.setCourseNumber(courseNumber);
course.setCourseName(courseName);
course.setCreditHours(creditHours);
course.setCourseMultiplier(courseMultiplier);
c.add(course);
This snippet of code should be placed inside the loop you use for iterating through rows.
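As a rough sketch of what both answers describe, the Course class and the per-row loop could look like this. To stay runnable without POI, each row is represented as an Object[] of already-read cell values (an assumption), with numeric cells as Double because getNumericCellValue() returns a double. CourseLoader and fromRows are made-up names, and a constructor is used here where the answer above uses setters.

```java
import java.util.ArrayList;
import java.util.List;

// One object per spreadsheet row, with one field per column.
class Course {
    private final int courseNumber;
    private final String courseName;
    private final double creditHours;
    private final double courseMultiplier;

    Course(int courseNumber, String courseName, double creditHours, double courseMultiplier) {
        this.courseNumber = courseNumber;
        this.courseName = courseName;
        this.creditHours = creditHours;
        this.courseMultiplier = courseMultiplier;
    }

    int getCourseNumber() { return courseNumber; }
    String getCourseName() { return courseName; }
    double getCreditHours() { return creditHours; }
    double getCourseMultiplier() { return courseMultiplier; }
}

class CourseLoader {
    // Each Object[] stands in for the four cell values of one row, in column order:
    // {course number, course name, credit hours, course multiplier}.
    static List<Course> fromRows(List<Object[]> rows) {
        List<Course> courses = new ArrayList<>();
        for (Object[] row : rows) {
            int number = ((Number) row[0]).intValue();
            String name = (String) row[1];
            double hours = ((Number) row[2]).doubleValue();
            double multiplier = ((Number) row[3]).doubleValue();
            courses.add(new Course(number, name, hours, multiplier));
        }
        return courses;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {110.0, "Eng 1 CP", 5.0, 1.0});
        for (Course c : fromRows(rows)) {
            System.out.println(c.getCourseNumber() + " " + c.getCourseName());
        }
    }
}
```

If you still need a single column as a list, you can derive it from the List<Course> with a loop over the getters, so nothing is lost compared to four parallel arrays.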

Query View doesn't return any values in couchdb

I have a view defined as:
function(doc)
{
if (doc.type == "user")
{
emit([doc.uid, doc.groupid], null);
}
}
In Java code, I have written
List<String> keys = new ArrayList<String>();
keys.add("93");
keys.add("23");
ViewQuery q = createQuery("getProfileInfo").descending(true).keys(keys).includeDocs(true);
ViewResult vr = db.queryView(q);
List<Row> rows = vr.getRows();
for (Row row : rows) {
System.out.println("Key--->"+row.getKey());
System.out.println("Value--->"+row.getValue());
}
My code always returns 0 rows - what have I missed?

I suspect you have a type mismatch, but it's impossible to tell for sure without seeing the view's rows. So, if I'm wrong please post an example row from your view.
'keys' is encoded to JSON before it is sent to CouchDB. You're adding two strings - "93" and "23" - but I'm guessing they're actually integers in the documents. In JSON, a string and an integer are encoded differently. A pair of strings is encoded to ["93", "23"] and a pair of integers is encoded to [93, 23].
If I'm correct then 'keys' should be defined as List<Integer> (or however that looks in Java).

Modified the code to
ComplexKey keys = ComplexKey.of("93","23");
ViewQuery q = createQuery("getProfileInfo").descending(true).key(keys).includeDocs(true);
ViewResult vr = db.queryView(q);
List<Row> rows = vr.getRows();
for (Row row : rows) {
System.out.println("Key--->"+row.getKey());
System.out.println("Value--->"+row.getValue());
}
and it works fine now.

How to merge CSV files in Java

My first CSV file looks like this with header included (header is included only at the top not after every entry):
NAME,SURNAME,AGE
Fred,Krueger,Unknown
.... n records
My second file might look like this:
NAME,MIDDLENAME,SURNAME,AGE
Jason,Noname,Scarry,16
.... n records with this header template
The merged file should look like this:
NAME,SURNAME,AGE,MIDDLENAME
Fred,Krueger,Unknown,
Jason,Scarry,16,Noname
....
Basically if headers don't match, all new header titles (columns) should be added after original header and their values according to that order.
Update
The CSV files above were made smaller so I can illustrate what I want to achieve. In reality the CSV files are generated one step before this merge and can have up to 100 columns.
How can I do this?

I'd create a model for the 'bigger' format (a simple class with four fields, plus a collection for instances of this class) and implement two parsers, one for the first format and one for the second. Create records for all rows of both CSV files and implement a writer to output the CSV in the correct format. In brief:
public void convert(File output, File...input) {
List<Record> records = new ArrayList<Record>();
for (File file:input) {
if (isThreeColumnFormat(file)) { // pseudocode helper that inspects the file's header
records.addAll(ThreeColumnFormatParser.parse(file));
} else {
records.addAll(FourColumnFormatParser.parse(file));
}
}
CsvWriter.write(output, records);
}
From your comment I see that you have a lot of different CSV formats with some common columns.
You could define the model for any row in the various csv files like this:
public class Record {
    Object id; // some sort of unique identifier
    Map<String, String> values = new HashMap<>(); // all key/values of a single row
    public Record(Object id) { this.id = id; }
    public void put(String key, String value) {
        values.put(key, value);
    }
    public String get(String key) {
        return values.get(key); // null if this row's file did not have the column
    }
}
For parsing any file you would first read the header and add the column headers to a global keystore (will be needed later on for outputting), then create records for all rows, like:
//...
List<Record> records = new ArrayList<Record>();
for (File file : getAllFiles()) {
    List<String> keys = getColumnsHeaders(file);
    KeyStore.addAll(keys); // the store is a Set
    int lineNumber = 0;
    for (String line : getLines(file)) { // pseudocode helper: the data lines, header excluded
        String[] values = line.split(DELIMITER);
        Record record = new Record(file.getName() + lineNumber++); // as an example for id
        for (int i = 0; i < values.length; i++) {
            record.put(keys.get(i), values[i]);
        }
        records.add(record);
    }
}
// ...
Now the keystore has all used column header names and we can iterate over the collection of all records, get all values for all keys (and get null if the file for this record didn't use the key), assemble the csv lines and write everything to a new file.

Read in the header of the first file and create a list of the column names. Now read the header of the second file and add any column names that don't exist already in the list to the end of the list. Now you have your columns in the order that you want and you can write this to the new file first.
Next I would parse each file and for each row I would create a Map of column name to value. Once the row is parsed you could then iterate over the new list of column names and pull the values from the map and write them immediately to the new file. If the value is null don't print anything (just a comma, if required).
There might be more efficient solutions available, but I think this meets the requirements you set out.
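The two passes described above can be sketched roughly as follows. To keep it self-contained, each file is passed in as a list of lines, and the CSV is assumed to be simple (no quoted fields, no commas inside values); CsvMerger and merge are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CsvMerger {
    // Each input file is a list of lines; the first line is its header.
    static List<String> merge(List<List<String>> files) {
        // Pass 1: union of all column names, in first-seen order.
        Set<String> columns = new LinkedHashSet<>();
        for (List<String> lines : files) {
            if (!lines.isEmpty()) {
                columns.addAll(Arrays.asList(lines.get(0).split(",", -1)));
            }
        }
        List<String> out = new ArrayList<>();
        out.add(String.join(",", columns));

        // Pass 2: map each row by column name, then emit it in the merged order.
        for (List<String> lines : files) {
            if (lines.isEmpty()) continue;
            String[] header = lines.get(0).split(",", -1);
            for (String line : lines.subList(1, lines.size())) {
                String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
                Map<String, String> row = new HashMap<>();
                for (int i = 0; i < header.length && i < cells.length; i++) {
                    row.put(header[i], cells[i]);
                }
                List<String> merged = new ArrayList<>();
                for (String column : columns) {
                    merged.add(row.getOrDefault(column, "")); // empty if the file lacks it
                }
                out.add(String.join(",", merged));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("NAME,SURNAME,AGE", "Fred,Krueger,Unknown");
        List<String> second = Arrays.asList("NAME,MIDDLENAME,SURNAME,AGE", "Jason,Noname,Scarry,16");
        for (String line : merge(Arrays.asList(first, second))) {
            System.out.println(line);
        }
    }
}
```

For real files with quoted fields, the split(",") calls would need to be replaced by a proper CSV parser; the two-pass structure stays the same.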

Try this:
http://ondra.zizka.cz/stranky/programovani/ruzne/querying-transforming-csv-using-sql.texy
crunch input.csv output.csv "SELECT AVG(duration) AS durAvg FROM (SELECT * FROM indata ORDER BY duration LIMIT 2 OFFSET 6)"
