SuperCSV reading from multiple files and parsing into one bean object - java

I am currently trying to read in multiple CSV files using beanReader before taking a few columns from each and parsing them into one bean.
So far I cannot seem to parse columns from different files into one bean object. Is this even possible with ICsvBeanReader?

Yes, it's possible :) As of Super CSV 2.2.0 you can read into an existing bean (see javadoc).
The following example uses 3 readers simultaneously (operating on 3 different files) - the first reader is used to create the bean, the other 2 just update the existing bean. This approach assumes that each file has the same number of rows (and that each row number represents the same person). If they don't, but they share some unique identifier, you'll have to read all the records from the first file into memory first, then update from the second/third matching on the identifier.
I've tried to make it a little bit smart, so you don't have to hard-code the name mapping - it just nulls out the headers it doesn't know about (so that Super CSV doesn't attempt to map fields that don't exist in your bean - see the partial reading examples on the website). Of course this will only work if your file has headers - otherwise you'll just have to hard code the mapping arrays with nulls in the appropriate places.
Person bean
public class Person {
private String firstName;
private String sex;
private String country;
// getters/setters
}
Example code
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

import org.junit.Test;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class Example {
    private static final String FILE1 = "firstName,lastName\nJohn,Smith\nSally,Jones";
    private static final String FILE2 = "age,sex\n21,male\n24,female";
    private static final String FILE3 = "city,country\nBrisbane,Australia\nBerlin,Germany";
    private static final List<String> DESIRED_HEADERS = Arrays.asList("firstName", "sex", "country");

    @Test
    public void testMultipleFiles() throws Exception {
        try (
            ICsvBeanReader reader1 = new CsvBeanReader(new StringReader(FILE1), CsvPreference.STANDARD_PREFERENCE);
            ICsvBeanReader reader2 = new CsvBeanReader(new StringReader(FILE2), CsvPreference.STANDARD_PREFERENCE);
            ICsvBeanReader reader3 = new CsvBeanReader(new StringReader(FILE3), CsvPreference.STANDARD_PREFERENCE)) {
            String[] mapping1 = getNameMappingFromHeader(reader1);
            String[] mapping2 = getNameMappingFromHeader(reader2);
            String[] mapping3 = getNameMappingFromHeader(reader3);
            Person person;
            // reader1 creates the bean; reader2 and reader3 update the same instance
            while ((person = reader1.read(Person.class, mapping1)) != null) {
                reader2.read(person, mapping2);
                reader3.read(person, mapping3);
                System.out.println(person);
            }
        }
    }

    private String[] getNameMappingFromHeader(ICsvBeanReader reader) throws IOException {
        String[] header = reader.getHeader(true);
        // only read in the desired fields (set unknown headers to null to ignore)
        for (int i = 0; i < header.length; i++) {
            if (!DESIRED_HEADERS.contains(header[i])) {
                header[i] = null;
            }
        }
        return header;
    }
}
Output
Person [firstName=John, sex=male, country=Australia]
Person [firstName=Sally, sex=female, country=Germany]
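If the files do not line up row by row but share an identifier, the approach mentioned above (load the first file into memory, then update from the others by key) might look like the following sketch. It uses only the JDK and naive comma splitting to stay self-contained; the `id` column, the `MergeById` class and the sample data are assumptions for illustration and are not part of the original example or of Super CSV.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeById {

    /**
     * Merges rows from several CSV contents (header line first) into one record per id.
     * Assumes every file has an "id" column and no quoted commas.
     */
    static Map<String, Map<String, String>> merge(List<String> files, List<String> wanted) {
        Map<String, Map<String, String>> byId = new LinkedHashMap<>();
        for (String file : files) {
            String[] lines = file.split("\n");
            String[] header = lines[0].split(",");
            int idCol = Arrays.asList(header).indexOf("id");
            for (int r = 1; r < lines.length; r++) {
                String[] cells = lines[r].split(",");
                // create the record on first sight of the id, otherwise update it
                Map<String, String> record = byId.computeIfAbsent(cells[idCol], k -> new LinkedHashMap<>());
                for (int c = 0; c < header.length; c++) {
                    if (wanted.contains(header[c])) {
                        record.put(header[c], cells[c]);
                    }
                }
            }
        }
        return byId;
    }

    public static void main(String[] args) {
        String file1 = "id,firstName,lastName\n1,John,Smith\n2,Sally,Jones";
        String file2 = "id,age,sex\n2,24,female\n1,21,male"; // rows in a different order
        List<String> wanted = Arrays.asList("firstName", "sex");
        // prints {1={firstName=John, sex=male}, 2={firstName=Sally, sex=female}}
        System.out.println(merge(Arrays.asList(file1, file2), wanted));
    }
}
```

With Super CSV itself, the same idea would mean keeping the beans from the first reader in a map keyed by the identifier and passing the matching bean to the later readers.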

Related

Accessing Object Property for Individual using Apache Jena

I have created an OWL ontology using Protégé, describing a patient database system. I am now trying to write Java code using Apache Jena to read the OWL file I created and then perform a number of operations on it. My primary goal is to find a specific Individual by name (a patient name, for example), then access a specific Object Property for that individual and output its value. For example, a patient "John" has an object property "Treated_By" which corresponds to another individual "Amy" (Amy is an individual of type doctor). However, I have been unable to figure out which Jena method retrieves Object Property values from a given individual.
Here is my code (Please ignore comments, they are fragments of previous attempts for this task):
public class Main {
public static void main(String[] args) {
OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
String fileName = "C:/Users/Ahmed Medhat/Documents/assignment1ontv3.0.owl";
try {
InputStream inputStream = new FileInputStream(fileName);
model.read(inputStream, "RDF/XML");
//model.read(inputStream, "OWL/XML");
inputStream.close();
} catch (Exception e) {
System.out.println(e.getMessage());
}
Scanner sc = new Scanner(System.in);
System.out.println("Enter Patient Name: ");
String patientName = sc.next();
ExtendedIterator<Individual> itI = model.listIndividuals();
while (itI.hasNext()) {
Individual i = itI.next();
String localName = i.getLocalName();
//System.out.println(patientName);
//System.out.println(localName);
if(localName.equals(patientName))
{
//System.out.println("Conditional Code Accessed.");
OntClass Class = i.getOntClass();
System.out.println("Patient Disease is: " + Class.listDeclaredProperties());
}
System.out.println("Failed.");
}
}
}
Try this (replace the property URI with the full URI of Treated_By in your ontology):
final Property p = model.createObjectProperty("http://example.org/Treated_By");
final RDFNode object = i.getPropertyValue(p);

Only partially parse a CSV file with OpenCSV

I have a CSV file which I want to parse in Java with OpenCSV's CSVReader.
To do so I have created a bean object to which the information is mapped. Mine is a bit long, so here's an example I got from a tutorial:
package net.viralpatel.java;
public class Country {
private String countryName;
private String capital;
public String getCountryName() {
return countryName;
}
public void setCountryName(String countryName) {
this.countryName = countryName;
}
public String getCapital() {
return capital;
}
public void setCapital(String capital) {
this.capital = capital;
}
}
The code I used to parse my CSV file and map the information to the bean resembles this one :
ColumnPositionMappingStrategy strat = new ColumnPositionMappingStrategy();
strat.setType(Country.class);
String[] columns = new String[] {"countryName", "capital"};
strat.setColumnMapping(columns);
CsvToBean csv = new CsvToBean();
String csvFilename = "C:\\sample.csv";
CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
List list = csv.parse(strat, csvReader);
The problem is that my CSV contains not only raw data but also column titles and other data. For the column titles, I solved the issue by only reading my file from a certain line with :
CSVReader csvReader = new CSVReader(new FileReader(csvFilename), ';', '\'', 1);
(1 being the line from which the reading starts)
The other data is mostly strings in (for example) integer columns at the end of the file.
For example, I have a "Max Speed" column with integer information, right next to a "Distance" column that also holds integers. But at the end of the "Distance" column there is the total distance, so the String "total:" sits in the "Max Speed" column right next to it.
What can I do to ensure that the reader ignores these last lines and only reads the raw information above?
PS: the CSV files I read have different lengths, so saying "stop reading after line X" won't do the trick. On the other hand, the "appendix" lines are always the same, so saying "stop reading two lines before the end of the file" should work.
Thank you very much for your help.
You can always drop down a level and check the raw string array before mapping it into a bean, like this:
ColumnPositionMappingStrategy<Country> strat = new ColumnPositionMappingStrategy<Country>();
strat.setType(Country.class);
String[] columns = new String[] {"countryName", "capital"};
strat.setColumnMapping(columns);
PublicProcessLineCsvToBean<Country> csv = new PublicProcessLineCsvToBean<Country>();
String csvFilename = "C:\\sample.csv";
CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
List<Country> list = new ArrayList<Country>();
String[] row = csvReader.readNext(); // skip header
if (row == null) throw new RuntimeException("File is empty");
row = csvReader.readNext();
String[] nextRow = csvReader.readNext();
while (row != null) {
    if (nextRow == null) break; // 'row' is the last line of the file, so skip it
    if ("Total:".equalsIgnoreCase(row[1])) break; // check column for special strings
    list.add(csv.processLine(strat, row));
    row = nextRow;
    nextRow = csvReader.readNext();
}
and to make processLine public:
public static class PublicProcessLineCsvToBean<T> extends CsvToBean<T> {
    @Override
    public T processLine(MappingStrategy<T> mapper, String[] line) throws IllegalAccessException, InvocationTargetException, InstantiationException, IntrospectionException {
        return super.processLine(mapper, line);
    }
}
If you are using a newer version of opencsv, inject a CsvToBeanFilter into your CsvToBean instance. The opencsv javadoc gives an excellent example of how to create a filter. For your example you would just create a filter whose allowLine method returns false if the Max Speed is null, empty or "total:".
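As a rough illustration of the filter logic described above, here is a JDK-only sketch. The `allowLine(String[])` method has the same shape as the one in opencsv's CsvToBeanFilter, but the `SummaryRowFilter` class, the column position and the sample rows are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SummaryRowFilter {

    // zero-based position of the "Max Speed" column (assumed for this sketch)
    private final int maxSpeedColumn;

    SummaryRowFilter(int maxSpeedColumn) {
        this.maxSpeedColumn = maxSpeedColumn;
    }

    /** Returns false for rows that should be skipped: missing, empty or "total:" Max Speed. */
    public boolean allowLine(String[] line) {
        if (line == null || line.length <= maxSpeedColumn) {
            return false;
        }
        String value = line[maxSpeedColumn];
        if (value == null) {
            return false;
        }
        value = value.trim();
        return !value.isEmpty() && !value.equalsIgnoreCase("total:");
    }

    public static void main(String[] args) {
        SummaryRowFilter filter = new SummaryRowFilter(0);
        List<String[]> rows = Arrays.asList(
                new String[] {"80", "12"},
                new String[] {"95", "30"},
                new String[] {"total:", "42"}); // trailing summary row
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (filter.allowLine(row)) {
                kept.add(row);
            }
        }
        System.out.println(kept.size()); // prints 2
    }
}
```

To use it with opencsv, the class would declare that it implements CsvToBeanFilter; how the filter is registered differs between versions, so check the javadoc for the release you are on.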

How to read from particular header in opencsv?

I have a CSV file and I want to extract a particular column from it. For example:
Say, I have csv:
id1,caste1,salary,name1
63,Graham,101153.06,Abraham
103,Joseph,122451.02,Charlie
63,Webster,127965.91,Violet
76,Smith,156150.62,Eric
97,Moreno,55867.74,Mia
65,Reynolds,106918.14,Richard
How can I use opencsv to read only the data under the header caste1?
Magnilex and Sparky are right in that CSVReader does not support reading values by column name. But that being said there are two ways you can do this.
Given that you know the column name and that the default CSVReader returns the header as its first line, you can search the header for the column's position and then use that position from there on:
private int getHeaderLocation(String[] headers, String columnName) {
return Arrays.asList(headers).indexOf(columnName);
}
so your method would look like (leaving out a lot of error checks you will need to put in)
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
int columnPosition;
// the first line is the header; find the column we care about
nextLine = reader.readNext();
columnPosition = getHeaderLocation(nextLine, "caste1");
while ((nextLine = reader.readNext()) != null && columnPosition > -1) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[columnPosition]);
}
I would only do the above if you were pressed for time and it was only one column you cared about. That is because openCSV can convert directly to an object that has the variables the same as the header column names using the CsvToBean class and the HeaderColumnNameMappingStrategy.
So first you would define a class that has the fields (and really you only need to put in the fields you want - extras are ignored and missing ones are null or default values).
public class CastleDTO {
    private int id1;
    private String caste1; // must match the CSV header name "caste1"
    private double salary;
    private String name1;
    // have all the getters and setters here....
}
Then your code would look like
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
HeaderColumnNameMappingStrategy<CastleDTO> castleStrategy = new HeaderColumnNameMappingStrategy<CastleDTO>();
castleStrategy.setType(CastleDTO.class); // tell the strategy which bean to populate
CsvToBean<CastleDTO> csvToBean = new CsvToBean<CastleDTO>();
List<CastleDTO> castleList = csvToBean.parse(castleStrategy, reader);
for (CastleDTO dto : castleList) {
    System.out.println(dto.getCaste1());
}
There is no built in functionality in opencsv for reading from a column by name.
The official FAQ example has the following example on how to read from a file:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
// nextLine[] is an array of values from the line
System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
You simply fetch the value in the second column of each row by accessing the row with nextLine[1] (remember, array indices are zero-based).
So, in your case you could simply read from the second column:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
System.out.println(nextLine[1]);
}
For a more sophisticated way of determining the column index from its header, refer to the answer from Scott Conway.
From the opencsv docs:
Starting with version 4.2, there’s another handy way of reading CSV files that doesn’t even require creating special classes. If your CSV file has headers, you can just initialize a CSVReaderHeaderAware and start reading the values out as a map:
reader = new CSVReaderHeaderAware(new FileReader("yourfile.csv"));
record = reader.readMap();
.readMap() will return a single record. You need to call .readMap() repeatedly to get all the records until you get null when it runs to the end (or to the first empty line), e.g.:
Map<String, String> values;
while ((values = reader.readMap()) != null) {
// consume the values here
}
The class also has another constructor which allows more customization, e.g.:
CSVReaderHeaderAware reader = new CSVReaderHeaderAware(
new InputStreamReader(inputStream),
0, // skipLines
parser, // custom parser
false, // keep end of lines
true, // verify reader
0, // multiline limit
null // null for default locale
);
One downside which I have found is that since the reader is lazy it does not offer a record count, therefore, if you need to know the total number (for example to display correct progress information), then you'll need to use another reader just for counting lines.
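A separate counting pass, as mentioned above, could be as small as the following JDK-only sketch. `RecordCounter` is a made-up helper, and it assumes one record per physical line, so it would over-count files with quoted multi-line fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class RecordCounter {

    /** Counts the lines after the header; assumes one record per line. */
    static int countRecords(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            int lines = 0;
            while (in.readLine() != null) {
                lines++;
            }
            return Math.max(0, lines - 1); // subtract the header line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "id1,caste1\n63,Graham\n103,Joseph\n";
        System.out.println(countRecords(new StringReader(csv))); // prints 2
    }
}
```

In real use the `Reader` would be a second `FileReader` on the same file, opened and closed before the header-aware reader starts.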
You also have available the CSVReaderHeaderAwareBuilder
I had a task to remove several columns from existing csv, example of csv:
FirstName, LastName, City, County, Zip
Steve,Hopkins,London,Greater London,15554
James,Bond,Vilnius,Vilniaus,03250
I needed only FirstName and LastName columns with values and it is very important that order should be the same - default rd.readMap() does not preserve the order, code for this task:
String[] COLUMN_NAMES_TO_REMOVE = new String[]{"", "City", "County", "Zip"};
CSVReaderHeaderAware rd = new CSVReaderHeaderAware(new FileReader("old.csv"));
CSVWriter writer = new CSVWriter(new FileWriter("new.csv"),
CSVWriter.DEFAULT_SEPARATOR, CSVWriter.NO_QUOTE_CHARACTER, CSVWriter.NO_ESCAPE_CHARACTER, CSVWriter.DEFAULT_LINE_END);
// let's get private field
Field privateField = CSVReaderHeaderAware.class.getDeclaredField("headerIndex");
privateField.setAccessible(true);
Map<String, Integer> headerIndex = (Map<String, Integer>) privateField.get(rd);
// do ordering in natural order - 0, 1, 2 ... n
Map<String, Integer> sortedInNaturalOrder = headerIndex.entrySet().stream()
.sorted(Map.Entry.comparingByValue(Comparator.naturalOrder()))
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
(oldValue, newValue) -> oldValue, LinkedHashMap::new));
// let's get headers in natural order
List<String> headers = sortedInNaturalOrder.keySet().stream().distinct().collect(Collectors.toList());
// let's remove headers
List<String> removedColumns = new ArrayList<String>(Arrays.asList(COLUMN_NAMES_TO_REMOVE));
headers.removeAll(removedColumns);
// save column names
writer.writeNext(headers.toArray(new String[headers.size()]));
List<String> keys = new ArrayList<>();
Map<String, String> values;
while ((values = rd.readMap()) != null) {
for (String key : headers) {
keys.add(values.get(key));
if (keys.size() == headers.size()) {
String[] itemsArray = new String[headers.size()];
itemsArray = keys.toArray(itemsArray);
// save values
writer.writeNext(itemsArray);
keys.clear();
}
}
}
writer.flush();
Output:
FirstName, LastName
Steve,Hopkins
James,Bond
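For comparison, here is a sketch of a different technique that avoids reflection: work on the raw rows, compute the indexes of the wanted columns from the header row, and copy only those cells. It is not the code above rewritten, just a JDK-only illustration; `ColumnPicker` is a hypothetical helper, and it assumes every row has at least as many cells as the header.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnPicker {

    /** Returns the rows (header first) restricted to the wanted columns, in file order. */
    static List<String[]> keepColumns(List<String[]> rows, List<String> wanted) {
        String[] header = rows.get(0);
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < header.length; i++) {
            if (wanted.contains(header[i].trim())) {
                keep.add(i); // indexes are collected in ascending order
            }
        }
        List<String[]> result = new ArrayList<>();
        for (String[] row : rows) {
            String[] out = new String[keep.size()];
            for (int k = 0; k < out.length; k++) {
                out[k] = row[keep.get(k)].trim();
            }
            result.add(out);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                "FirstName, LastName, City, County, Zip".split(","),
                "Steve,Hopkins,London,Greater London,15554".split(","),
                "James,Bond,Vilnius,Vilniaus,03250".split(","));
        for (String[] row : keepColumns(rows, Arrays.asList("FirstName", "LastName"))) {
            System.out.println(String.join(",", row));
        }
    }
}
```

The rows could equally come from CSVReader.readAll(), which returns a List of String[], and each resulting array could be handed to CSVWriter.writeNext.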
Looking at the javadoc:
If you create a CSVReader object, you can call readAll() to pull in the entire file. It returns a List of String[], with each String[] holding the tokens of one line of the file. The tokens are already split on the delimiter, so for each line you only need the second element (index 1). The same idea on a single raw line, splitting it manually:
public static void main(String[] args){
String data = "63,Graham,101153.06,Abraham";
String result[] = data.split(",");
System.out.print(result[1]);
}

Spring Batch Write Header

I have a spring-batch file that extracts data from a database and writes it to a .CSV file.
I would like to add the names of the columns that are extracted as the headers of the file without hard coding them on the file.
Is it possible to write the header when I get the results, or is there another solution?
Thanks
fileItemWriter.setHeaderCallback(new FlatFileHeaderCallback() {
    @Override
    public void writeHeader(Writer writer) throws IOException {
        // join with the delimiter; Arrays.toString would add "[", "]" and spaces
        writer.write(String.join(",", names));
    }
});
The names array can be obtained via reflection from the domain class used by your rowMapper (its field names become the column names), something like the following, where FieldUtils comes from Apache Commons Lang:
private String[] reflectFields() throws ClassNotFoundException {
Class job = Class.forName("DomainClassName");
Field[] fields = FieldUtils.getAllFields(job);
names = new String[fields.length];
for(int i=0; i<fields.length; i++){
names[i] = fields[i].getName();
}
return names;
}
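A JDK-only variant of the same idea is sketched below, without Commons Lang. `HeaderFromFields` and the nested `Employee` class are hypothetical stand-ins for your own domain class. Note that the JDK does not guarantee the order in which `getDeclaredFields()` returns fields, so the header order should be checked against the order your writer uses for the values.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class HeaderFromFields {

    // Hypothetical domain class standing in for the one used by the row mapper.
    static class Employee {
        private long id;
        private String name;
        private double salary;
    }

    /** Builds a comma-separated header from the non-static, non-synthetic declared fields. */
    static String headerLine(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (!field.isSynthetic() && !Modifier.isStatic(field.getModifiers())) {
                names.add(field.getName());
            }
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        // the resulting line can be written from a header callback
        System.out.println(headerLine(Employee.class));
    }
}
```

Unlike FieldUtils.getAllFields, this only looks at the fields declared on the class itself, not on its superclasses.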

Java CSV Reader/Writer Questions

I have some questions regarding reading and writing to CSV files (or if there is a simpler alternative).
Scenario:
I need to have a simple database of people and some basic information about them. I need to be able to add new entries and search through the file for entries. I also need to be able to find an entry and modify it (i.e change their name or fill in a currently empty field).
Now I'm not sure if a CSV reader/writer is the best route or not? I wouldn't know where to begin with SQL in Java but if anyone knows of a good resource for learning that, that would be great.
Currently I am using SuperCSV, I put together a test project based around some example code:
class ReadingObjects {
// private static UserBean userDB[] = new UserBean[2];
private static ArrayList<UserBean> arrUserDB = new ArrayList<UserBean>();
static final CellProcessor[] userProcessors = new CellProcessor[] {
new StrMinMax(5, 20),
new StrMinMax(8, 35),
new ParseDate("dd/MM/yyyy"),
new Optional(new ParseInt()),
null
};
public static void main(String[] args) throws Exception {
ICsvBeanReader inFile = new CsvBeanReader(new FileReader("foo.csv"), CsvPreference.EXCEL_PREFERENCE);
try {
final String[] header = inFile.getCSVHeader(true);
UserBean user;
int i = 0;
while( (user = inFile.read(UserBean.class, header, userProcessors)) != null) {
UserBean addMe = new UserBean(user.getUsername(), user.getPassword(), user.getTown(), user.getDate(), user.getZip());
arrUserDB.add(addMe);
i++;
}
} finally {
inFile.close();
}
for(UserBean currentUser:arrUserDB){
if (currentUser.getUsername().equals("Klaus")) {
System.out.println("Found Klaus! :D");
}
}
WritingMaps.add();
}
}
And a writer class:
class WritingMaps {
public static void add() throws Exception {
ICsvMapWriter writer = new CsvMapWriter(new FileWriter("foo.csv", true), CsvPreference.EXCEL_PREFERENCE);
try {
final String[] header = new String[] { "username", "password", "date", "zip", "town"};
String test = System.getProperty("line.separator");
// set up some data to write
final HashMap<String, ? super Object> data1 = new HashMap<String, Object>();
data1.put(header[0], "Karlasa");
data1.put(header[1], "fdsfsdfsdfs");
data1.put(header[2], "17/01/2010");
data1.put(header[3], 1111);
data1.put(header[4], "New York");
System.out.println(data1);
// the actual writing
// writer.writeHeader(header);
writer.write(data1, header);
// writer.write(data2, header);
} finally {
writer.close();
}
}
}
Issues:
I'm struggling to get the writer to add a new line to the CSV file. Purely for human readability purposes, not such a big deal.
I'm not sure how I would add data to an existing record to modify it. (remove and add it again? Not sure how to do this).
Thanks.
Have you considered an embedded database like H2, HSQL or SQLite? They can all persist to the filesystem and you'll discover a more flexible datastore with less code.
The easiest solution is to read the file at application startup into an in-memory structure (list of UserBean, for example), to add, remove, modify beans in this in-memory structure, and to write the whole list of UserBean to the file when the app closes, or when the user chooses to Save.
Regarding newlines when writing, the javadoc seems to indicate that the writer will take care of that. Just call write for each of your user beans, and the writer will automatically insert newlines between rows.
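The load, modify, save-everything approach suggested above could be sketched like this with only the JDK. `UserStore`, the three-column layout (username, password, town) and the sample data are assumptions for illustration, and the naive split does not handle quoted commas; with Super CSV you would read and write the beans through its readers and writers instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UserStore {

    // Each record is {username, password, town}; this layout is assumed.
    private final List<String[]> users = new ArrayList<>();

    /** Loads records from CSV lines, replacing whatever was held before. */
    void load(List<String> lines) {
        users.clear();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                users.add(line.split(",", -1)); // -1 keeps trailing empty fields
            }
        }
    }

    /** Updates the town of the first user with that name; returns false if not found. */
    boolean setTown(String username, String town) {
        for (String[] user : users) {
            if (user[0].equals(username)) {
                user[2] = town;
                return true;
            }
        }
        return false;
    }

    void add(String username, String password, String town) {
        users.add(new String[] {username, password, town});
    }

    /** Serialises every record back to one CSV line each. */
    List<String> toLines() {
        List<String> lines = new ArrayList<>();
        for (String[] user : users) {
            lines.add(String.join(",", user));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("users", ".csv");
        Files.write(file, Arrays.asList("Klaus,secret123,", "Karla,password1,New York"), StandardCharsets.UTF_8);

        UserStore store = new UserStore();
        store.load(Files.readAllLines(file, StandardCharsets.UTF_8)); // read once at startup
        store.setTown("Klaus", "Berlin");                             // fill in an empty field
        store.add("Anna", "hunter2hunter", "Oslo");                   // new entry
        Files.write(file, store.toLines(), StandardCharsets.UTF_8);   // rewrite the whole file on save

        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

Rewriting the whole file on save sidesteps the question of editing a record in place: the record is changed in memory and the file is simply regenerated.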
