I have two CSV files containing the same data, but the columns of the two files are in a different order.
I want to output both lists in the same order.
List csv1, when printed:
System.out.println(csv1);
Employee, Address, Name, Email
List csv2, when printed:
System.out.println(csv2);
Address, Email, Employee, Name
How can I sort the lists to print in this column order?
Employee, Name, Email, Address
Note: I can't use fixed integer indexes like col(1), col(3) because column 1 in csv1 does not match column 1 in csv2.
The data is read as follows:
List<String> ret = new ArrayList<>();
BufferedReader r = new BufferedReader(new InputStreamReader(str));
Stream<String> lines = r.lines().skip(1);
lines.forEachOrdered(line -> {
line = line.replace("\"", "");
ret.add(line);
});
I've assumed that you need to parse these two CSV files and print the records in a fixed column order.
You can use the Apache Commons CSV library for the parsing. I've used the examples below.
Solution using an external library:
test1.csv
Address,Email,Employee,Name
SecondMainRoad,test2@gmail.com,Frank,Michael
test2.csv
Employee,Address,Name,Email
John,FirstMainRoad,Doe,test@gmail.com
Sample program
public static void main(String[] args) throws IOException {
try (Reader csvReader = Files.newBufferedReader(Paths.get("test2.csv"))) {
// Initialize CSV parser and iterator.
CSVParser csvParser = new CSVParser(csvReader, CSVFormat.Builder.create()
.setRecordSeparator(System.lineSeparator())
.setHeader()
.setSkipHeaderRecord(true)
.setIgnoreEmptyLines(true)
.build());
Iterator<CSVRecord> csvRecordIterator = csvParser.iterator();
while(csvRecordIterator.hasNext())
{
final CSVRecord csvRecord = csvRecordIterator.next();
final Map<String, String> recordMap = csvRecord.toMap();
System.out.println(String.format("Employee:%s", recordMap.get("Employee")));
System.out.println(String.format("Name:%s", recordMap.get("Name")));
System.out.println(String.format("Email:%s", recordMap.get("Email")));
System.out.println(String.format("Address:%s", recordMap.get("Address")));
}
}
}
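For the sample test2.csv above, this should print something like:
Employee:John
Name:Doe
Email:test@gmail.com
Address:FirstMainRoad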
Standalone solution:
public class CSVTesterMain {
public static void main(String[] args) {
// I have used String variables to hold the CSV data; in your case, you can replace these with the lines read from your files.
String csv1= "Employee,Address,Name,Email\r\n" +
"John,FirstMainRoad,Doe,test#gmail.com\r\n" +
"Henry,ThirdCrossStreet,Joseph,email#gmail.com";
String csv2 = "Address,Email,Employee,Name\r\n" +
"SecondMainRoad,test2#gmail.com,Michael,Sessner\r\n" +
"CrossRoad,test25#gmail.com,Vander,John";
// Map key - To hold header information
// Map Value - List of lines holding values to the corresponding headers.
Map<String, List<String>> dataMap = new HashMap<>();
Stream<String> csv1LineStream = csv1.lines();
Stream<String> csv2LineStream = csv2.lines();
// We are using the same method to parse different CSV formats. We keep a reference to the headers
// in the form of the map key, which will help us emit the output later in our format.
populateDataMap(csv1LineStream, dataMap);
populateDataMap(csv2LineStream, dataMap);
// Now we have a dataMap that holds data from multiple CSV files. The key of the map determines
// the header sequence of its lines.
// Print the output as per the sequence Employee, Name, Email, Address
System.out.println("Employee,Name,Email,Address");
dataMap.forEach((header, lineList) -> {
// Logic to determine the index value for each column.
List<String> headerList = Arrays.asList(header.split(","));
int employeeIdx = headerList.indexOf("Employee");
int nameIdx = headerList.indexOf("Name");
int emailIdx = headerList.indexOf("Email");
int addressIdx = headerList.indexOf("Address");
// Now we know the index value of each of these columns that can be emitted in our format.
// You can output to a file in your case.
// Iterate through each line, split and output as per the format.
lineList.forEach(line -> {
String[] data = line.split(",");
System.out.println(String.format("%s,%s,%s,%s", data[employeeIdx],
data[nameIdx],
data[emailIdx],
data[addressIdx]
));
});
});
}
private static void populateDataMap(Stream<String> csvLineStream, Map<String, List<String>> dataMap) {
// Populate data map associating the data to respective headers.
Iterator<String> csvIterator = csvLineStream.iterator();
// Fetch header. (In my example, I am sure that my first line is always the header).
String header = csvIterator.next();
if(! dataMap.containsKey(header))
dataMap.put(header, new ArrayList<>());
// Iterate through the remaining lines and populate data map.
while(csvIterator.hasNext())
dataMap.get(header).add(csvIterator.next());
}
}
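For the sample data above, this prints something like the following (the relative order of the two header groups may vary because dataMap is a HashMap; use a LinkedHashMap if insertion order matters):
Employee,Name,Email,Address
John,Doe,test@gmail.com,FirstMainRoad
Henry,Joseph,email@gmail.com,ThirdCrossStreet
Michael,Sessner,test2@gmail.com,SecondMainRoad
Vander,John,test25@gmail.com,CrossRoad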
Here I am using the Jackson CSV dataformat library to parse the CSV files.
Dependency
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-csv</artifactId>
<version>2.13.2</version>
</dependency>
File 1
employee, address, name, email
1, address 1, Name 1, name1@example.com
2, address 2, Name 2, name2@example.com
3, address 3, Name 3, name3@example.com
File 2
address, email, employee, name
address 4, name4@example.com, 4, Name 4
address 5, name5@example.com, 5, Name 5
address 6, name6@example.com, 6, Name 6
Java Program
Here EmployeeDetails is a POJO class (a minimal sketch follows below), and the directory containing the CSV files is expected to be passed as an argument.
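The POJO itself isn't shown in the original; a minimal sketch could look like this (field names are assumed to line up with the lower-case CSV headers employee, address, name, email):

public class EmployeeDetails {
    private String employee;
    private String address;
    private String name;
    private String email;

    public String getEmployee() { return employee; }
    public void setEmployee(String employee) { this.employee = employee; }
    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}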
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.*;
import java.util.ArrayList;
import java.util.List;
public class EmployeeDataParser {
public static void main(String[] args) {
File directoryPath = new File(args[0]);
File[] filesList = directoryPath.listFiles();
List<EmployeeDetails> employeeDetails = new ArrayList<>();
EmployeeDataParser employeeDataParser=new EmployeeDataParser();
for(File file : filesList) {
System.out.println("File path: "+file.getAbsolutePath());
employeeDataParser.readEmployeeData(employeeDetails, file.getAbsolutePath());
}
System.out.println("number of employees into list: " + employeeDetails.size());
employeeDataParser.printEmployeeDetails(employeeDetails);
}
private List<EmployeeDetails> readEmployeeData(List<EmployeeDetails> employeeDetails,
String filePath){
CsvMapper csvMapper = new CsvMapper();
CsvSchema schema = CsvSchema.emptySchema().withHeader();
ObjectReader oReader = csvMapper.readerFor(EmployeeDetails.class).with(schema);
try (Reader reader = new FileReader(filePath)) {
MappingIterator<EmployeeDetails> mi = oReader.readValues(reader);
while (mi.hasNext()) {
EmployeeDetails current = mi.next();
employeeDetails.add(current);
}
} catch (IOException e) {
System.out.println("IOException Caught !!!");
System.out.println(e.getStackTrace());
}
return employeeDetails;
}
private void printEmployeeDetails(List<EmployeeDetails> employeeDetails) {
System.out.printf("%5s %10s %15s %25s", "Employee", "Name", "Email", "Address");
System.out.println();
for(EmployeeDetails empDetail:employeeDetails){
System.out.format("%5s %15s %25s %15s", empDetail.getEmployee(),
empDetail.getName(),
empDetail.getEmail(),
empDetail.getAddress());
System.out.println();
}
}
}
I have 3 different types of CSV files, each with different headers. I currently use a MultiResourceItemReader and delegate the reading to a FlatFileItemReader as follows:
@Bean
@StepScope
public MultiResourceItemReader<Model> multiResourceItemReader() {
MultiResourceItemReader<Model> resourceItemReader = new MultiResourceItemReader<>();
resourceItemReader.setResources( getInputResources() );
resourceItemReader.setDelegate( reader() );
return resourceItemReader;
}
@Bean
@StepScope
public FlatFileItemReader reader() {
log.debug("Header : {}", extraInfoHolder.getHeader());
return new FlatFileItemReaderBuilder<Model>()
.skippedLinesCallback(line -> {
String rsrc = multiResourceItemReader().getCurrentResource().toString();
log.debug("Current Resource : {}", rsrc);
// Verify file header is what we expect
if (!StringUtils.equals( line, extraInfoHolder.getHeader() )) {
throw new IllegalArgumentException( String.format("Invalid header in %s", rsrc) );
}
})
.name( "myReader" )
.linesToSkip( HEADER_ROW )
.lineMapper( new DefaultLineMapper() {
{
setLineTokenizer( getDelimitedLineTokenizer() );
setFieldSetMapper( getBeanWrapperFieldSetMapper() );
}} )
.build();
}
However, I'd like to read the CSV file into a HashMap instead of a Model POJO. For example, if the file is formatted as follows:
First Name, Last Name, Age
Doug, Jones, 57
Sam, Reed, 39
I'd like to read each line into a map where the key is the header token and the value is the file value:
Map 1: First Name -> Doug, Last Name -> Jones, Age -> 57
Map 2: First Name -> Sam, Last Name -> Reed, Age -> 39
In classic Spring Batch fashion, I'd like to read one row, convert it into a map, process + write it, then read the next row. How can I achieve this?
This will return the maps that you want,
private static List<Map<String, Object>> getMapsFrom(String file) throws IOException {
List<Map<String, Object>> maps = new ArrayList<>();
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(new File(file))))) {
int index = 0;
String line;
String[] keys = new String[3];
while ((line = br.readLine()) != null) {
if (index++ == 0){
keys = line.split(",");
}else{
String[] values = line.split(",");
for (int i = 0; i < values.length; i++) {
values[i] = values[i].trim();
}
Map<String, Object> map = new HashMap<>();
map.put(keys[0], values[0]);
map.put(keys[1], values[1]);
map.put(keys[2], Integer.parseInt(values[2]));
maps.add(map);
}
}
}
return maps;
}
assuming your csv file is always in the form of
First Name, Last Name, Age
Doug, Jones, 57
Sam, Reed, 39
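If you would rather keep this inside Spring Batch itself, a rough sketch of a FlatFileItemReader that maps each row to a Map could look like the bean below. The column names are hard-coded here for the sample file; in your setup they would come from extraInfoHolder, and the resource is injected by the MultiResourceItemReader as in your existing configuration. This is only a sketch under those assumptions, not your actual config.

// imports assumed: org.springframework.batch.item.file.FlatFileItemReader,
// org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder,
// org.springframework.batch.item.file.mapping.DefaultLineMapper,
// org.springframework.batch.item.file.transform.DelimitedLineTokenizer, java.util.*
@Bean
@StepScope
public FlatFileItemReader<Map<String, String>> mapReader() {
    // The tokenizer knows the column names, so each FieldSet can be read by name.
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    tokenizer.setNames("First Name", "Last Name", "Age");

    DefaultLineMapper<Map<String, String>> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(tokenizer);
    // FieldSetMapper that turns every row into a header -> value map.
    lineMapper.setFieldSetMapper(fieldSet -> {
        Map<String, String> row = new LinkedHashMap<>();
        for (String name : fieldSet.getNames()) {
            row.put(name, fieldSet.readString(name));
        }
        return row;
    });

    return new FlatFileItemReaderBuilder<Map<String, String>>()
            .name("mapReader")
            .linesToSkip(1) // skip the header row
            .lineMapper(lineMapper)
            .build();
}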
I am learning how to work with files in Java. I have a sample file which contains key/value pairs. I am trying to find certain keys, and if a key matches, the output file should be updated with both the key and its value. I am able to get the keys into the output file but unable to get the values too. StringBuilder may work here to append strings, but I don't know how.
Below are my input and output files.
Input File:
born time 9 AM London -- kingNumber 1234567890 -- address: abc/cd/ef -- birthmonth: unknown
born time 9 AM Europe -- kingNumber 1234567890 -- address: abc/cd/ef -- birthmonth: december
Expected Output File:
kingNumber 1234567890 birthmonth unknown
kingNumber 1234567890 birthmonth december
Current Output File:
kingNumber birthmonth
kingNumber birthmonth
I am able to write the keys ("kingNumber" and "birthmonth" in this case) to the output file, but I am not sure how to get their values too.
String kn = "kingNumber:";
String bd = "birthmonth:";
try {
File f = new File("sample.txt");
Scanner sc = new Scanner(f);
FileWriter fw = new FileWriter("output.txt");
while(sc.hasNextLine()) {
String lineContains = sc.next();
if(lineContains.contains(kn)) {
fw.write(kn + "\n");
// This is where I am stuck. What
// can I do to get it's value (number in this case).
}
else if(lineContains.contains(bd)) {
fw.write(bd);
// This is where I am stuck. What
// can I do to get it's value (birthday in this case).
}
}
} catch (IOException e) {
e.printStackTrace();
}
You could use java.util.regex.Pattern and java.util.regex.Matcher with a pattern like:
^born\stime\s([a-zA-Z0-9\s]*)\s--\skingNumber\s(\d+)\s--\saddress:\s([a-zA-Z0-9\s/]*)\s--\sbirthmonth:\s([a-zA-Z0-9\s]*)$
write less, do more.
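For example, a minimal sketch of applying that pattern per line (the class name and the println standing in for your FileWriter are just illustrative):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KingNumberExtractor {
    // Same pattern as above; group 2 is the kingNumber, group 4 is the birthmonth.
    private static final Pattern LINE_PATTERN = Pattern.compile(
            "^born\\stime\\s([a-zA-Z0-9\\s]*)\\s--\\skingNumber\\s(\\d+)"
            + "\\s--\\saddress:\\s([a-zA-Z0-9\\s/]*)\\s--\\sbirthmonth:\\s([a-zA-Z0-9\\s]*)$");

    public static void main(String[] args) {
        String line = "born time 9 AM London -- kingNumber 1234567890 -- address: abc/cd/ef -- birthmonth: unknown";
        Matcher m = LINE_PATTERN.matcher(line);
        if (m.matches()) {
            // Prints "kingNumber 1234567890 birthmonth unknown", matching the expected output.
            System.out.println("kingNumber " + m.group(2) + " birthmonth " + m.group(4));
        }
    }
}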
I have written a simple parser that follows the data format from your example.
You will need to call it like this:
PairParser parser = new PairParser(lineContains);
Then you can get a value from the parser by its key.
How to get a value:
parser.getValue("kingNumber")
Note that keys do not include the trailing colon character.
The parser code is here:
package com.grenader.example;
import java.util.HashMap;
import java.util.Map;
public class PairParser {
private Map<String, String> data = new HashMap<>();
/**
* Constructor, prepare the data
* @param dataString line from the given data file
*/
public PairParser(String dataString) {
if (dataString == null || dataString.isEmpty())
throw new IllegalArgumentException("Data line cannot be empty");
// Split the input line into an array of string blocks using '--' as a separator
String[] blocks = dataString.split("--");
for (String block : blocks)
{
if (block.startsWith("born time")) // skip this one because it doesn't look like a key/value pair
continue;
String[] strings = block.split("\\s");
if (strings.length != 3) // does not have exactly 3 items (the first item is empty), skip this one as well
continue;
String key = strings[1];
String value = strings[2];
if (key.endsWith(":"))
key = key.substring(0, key.length()-1).trim();
data.put(key.trim(), value.trim());
}
}
/**
* Return value based on key
* @param key
* @return
*/
public String getValue(String key)
{
return data.get(key);
}
/**
* Return number of key/value pairs
* @return
*/
public int size()
{
return data.size();
}
}
And here is the Unit Test to make sure that the code works
package com.grenader.example;
import com.grenader.example.PairParser;
import org.junit.Test;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;
public class PairParserTest {
@Test
public void getValue_Ok() {
PairParser parser = new PairParser("born time 9 AM London -- kingNumber 1234567890 -- address: abc/cd/ef -- birthmonth: unknown");
assertEquals("1234567890", parser.getValue("kingNumber"));
assertEquals("unknown", parser.getValue("birthmonth"));
}
@Test(expected = IllegalArgumentException.class)
public void getValue_Null() {
new PairParser(null);
fail("This test should fail with Exception");
}
@Test(expected = IllegalArgumentException.class)
public void getValue_EmptyLine() {
new PairParser("");
fail("This test should fail with Exception");
}
@Test()
public void getValue_BadData() {
PairParser parser = new PairParser("bad data bad data");
assertEquals(0, parser.size());
}
}
I have two CSV files. One master CSV file has around 500,000 records. Another, daily CSV file has 50,000 records.
The daily CSV file is missing a few columns, which have to be fetched from the master CSV file.
For example
DailyCSV File
id,name,city,zip,occupation
1,Jhon,Florida,50069,Accountant
MasterCSV File
id,name,city,zip,occupation,company,exp,salary
1, Jhon, Florida, 50069, Accountant, AuditFirm, 3, $5000
What I have to do is read both files, match the records by ID, and if the ID is present in the master file, fetch company, exp, and salary and write them to a new CSV file.
How can I achieve this?
What I have done currently:
while (true) {
line = bstream.readLine();
lineMaster = bstreamMaster.readLine();
if (line == null || lineMaster == null)
{
break;
}
else
{
while(lineMaster != null)
readlineSplit = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
String splitId = readlineSplit[4];
String[] readLineSplitMaster =lineMaster.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
String SplitIDMaster = readLineSplitMaster[13];
System.out.println(splitId + "|" + SplitIDMaster);
//System.out.println(splitId.equalsIgnoreCase(SplitIDMaster));
if (splitId.equalsIgnoreCase(SplitIDMaster)) {
String writeLine = readlineSplit[0] + "," + readlineSplit[1] + "," + readlineSplit[2] + "," + readlineSplit[3] + "," + readlineSplit[4] + "," + readlineSplit[5] + "," + readLineSplitMaster[15]+ "," + readLineSplitMaster[16] + "," + readLineSplitMaster[17];
System.out.println(writeLine);
pstream.print(writeLine + "\r\n");
}
}
}
pstream.close();
fout.flush();
bstream.close();
bstreamMaster.close();
First of all, your current parsing approach will be painfully slow. Use a dedicated CSV parsing library to speed things up. With uniVocity-parsers you can process your 500K records in less than a second. This is how you can use it to solve your problem:
First let's define a few utility methods to read/write your files:
//opens the file for reading (using UTF-8 encoding)
private static Reader newReader(String pathToFile) {
try {
return new InputStreamReader(new FileInputStream(new File(pathToFile)), "UTF-8");
} catch (Exception e) {
throw new IllegalArgumentException("Unable to open file for reading at " + pathToFile, e);
}
}
//creates a file for writing (using UTF-8 encoding)
private static Writer newWriter(String pathToFile) {
try {
return new OutputStreamWriter(new FileOutputStream(new File(pathToFile)), "UTF-8");
} catch (Exception e) {
throw new IllegalArgumentException("Unable to open file for writing at " + pathToFile, e);
}
}
Then, we can start reading your daily CSV file, and generate a Map:
public static void main(String... args){
//First we parse the daily update file.
CsvParserSettings settings = new CsvParserSettings();
//here we tell the parser to read the CSV headers
settings.setHeaderExtractionEnabled(true);
//and to select ONLY the following columns.
//This ensures rows with a fixed size will be returned in case some records come with less or more columns than anticipated.
settings.selectFields("id", "name", "city", "zip", "occupation");
CsvParser parser = new CsvParser(settings);
//Here we parse all data into a list.
List<String[]> dailyRecords = parser.parseAll(newReader("/path/to/daily.csv"));
//And convert them to a map. ID's are the keys.
Map<String, String[]> mapOfDailyRecords = toMap(dailyRecords);
... //we'll get back here in a second.
This is the code to generate a Map from the list of daily records:
/* Converts a list of records to a map. Uses element at index 0 as the key */
private static Map<String, String[]> toMap(List<String[]> records) {
HashMap<String, String[]> map = new HashMap<String, String[]>();
for (String[] row : records) {
//column 0 will always have an ID.
map.put(row[0], row);
}
return map;
}
With the map of records, we can process your master file and generate the list of updates:
private static List<Object[]> processMasterFile(final Map<String, String[]> mapOfDailyRecords) {
//we'll put the updated data here
final List<Object[]> output = new ArrayList<Object[]>();
//configures the parser to process only the columns you are interested in.
CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true);
settings.selectFields("id", "company", "exp", "salary");
//All parsed rows will be submitted to the following RowProcessor. This way the bigger Master file won't
//have all its rows stored in memory.
settings.setRowProcessor(new AbstractRowProcessor() {
@Override
public void rowProcessed(String[] row, ParsingContext context) {
// Incoming rows from MASTER will have the ID as index 0.
// If the daily update map contains the ID, we'll get the daily row
String[] dailyData = mapOfDailyRecords.get(row[0]);
if (dailyData != null) {
//We got a match. Let's join the data from the daily row with the master row.
Object[] mergedRow = new Object[8];
for (int i = 0; i < dailyData.length; i++) {
mergedRow[i] = dailyData[i];
}
for (int i = 1; i < row.length; i++) { //starts from 1 to skip the ID at index 0
mergedRow[i + dailyData.length - 1] = row[i];
}
output.add(mergedRow);
}
}
});
CsvParser parser = new CsvParser(settings);
//the parse() method will submit all rows to the RowProcessor defined above.
parser.parse(newReader("/path/to/master.csv"));
return output;
}
Finally, we can get the merged data and write everything to another file:
... // getting back to the main method here
//Now we process the master data and get a list of updates
List<Object[]> updatedData = processMasterFile(mapOfDailyRecords);
//And write the updated data to another file
CsvWriterSettings writerSettings = new CsvWriterSettings();
writerSettings.setHeaders("id", "name", "city", "zip", "occupation", "company", "exp", "salary");
writerSettings.setHeaderWritingEnabled(true);
CsvWriter writer = new CsvWriter(newWriter("/path/to/updates.csv"), writerSettings);
//Here we write everything, and get the job done.
writer.writeRowsAndClose(updatedData);
}
This should work like a charm. Hope it helps.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
I will approach the problem in a step-by-step manner.
First I would parse/read the master CSV file and keep its content in a HashMap, where the key is each record's unique 'id'. For the value, you can store another map, or simply create a Java class to hold the information.
Example of hash:
{
'1' : { 'name': 'Jhon',
'City': 'Florida',
'zip' : 50069,
....
}
}
Next, read your other (daily) CSV file. For each row, read the 'id' and check whether the key exists in the HashMap you created earlier.
If it exists, access the information you need from the HashMap and write it to a new CSV file, as sketched below.
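A minimal sketch of that approach in plain Java (the file names master.csv, daily.csv and merged.csv are assumptions, and the naive split(",") does not handle quoted fields):

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CsvJoin {
    public static void main(String[] args) throws IOException {
        // Step 1: index the master file by id.
        Map<String, String[]> masterById = new HashMap<>();
        List<String> masterLines = Files.readAllLines(Paths.get("master.csv"));
        for (String line : masterLines.subList(1, masterLines.size())) {
            String[] cols = line.split(",");
            masterById.put(cols[0].trim(), cols);
        }
        // Step 2: read the daily file, look up each id, and write the enriched row.
        try (BufferedWriter out = Files.newBufferedWriter(Paths.get("merged.csv"))) {
            out.write("id,name,city,zip,occupation,company,exp,salary");
            out.newLine();
            List<String> dailyLines = Files.readAllLines(Paths.get("daily.csv"));
            for (String line : dailyLines.subList(1, dailyLines.size())) {
                String[] daily = line.split(",");
                String[] master = masterById.get(daily[0].trim());
                if (master != null) {
                    // Master columns: id,name,city,zip,occupation,company,exp,salary
                    out.write(String.join(",", line, master[5].trim(), master[6].trim(), master[7].trim()));
                    out.newLine();
                }
            }
        }
    }
}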
Also, you might want to consider using a third-party CSV parser to make this task easier.
If you use Maven, you can follow this example I found on the net; otherwise you can just search for an Apache Commons CSV parser example on the internet.
http://examples.javacodegeeks.com/core-java/apache/commons/csv-commons/writeread-csv-files-with-apache-commons-csv-example/
I am looking for an idea of how to accomplish this task, so I'll start with how my program works.
My program reads a CSV file of key/value pairs separated by a comma.
L1234456,ygja-3bcb-iiiv-pppp-a8yr-c3d2-ct7v-giap-24yj-3gie
L6789101,zgna-3mcb-iiiv-pppp-a8yr-c3d2-ct7v-gggg-zz33-33ie
etc
The function below takes a file and parses it into an ArrayList of String[], then returns that ArrayList.
public ArrayList<String[]> parseFile(File csvFile) {
Scanner scan = null;
try {
scan = new Scanner(csvFile);
} catch (FileNotFoundException e) {
}
ArrayList<String[]> records = new ArrayList<String[]>();
String[] record = new String[2];
while (scan.hasNext()) {
record = scan.nextLine().trim().split(",");
records.add(record);
}
return records;
}
Here is the code where I am calling parseFile and passing in the CSV file:
ArrayList<String[]> Records = parseFile(csvFile);
I then created another ArrayList for the records that couldn't be parsed.
ArrayList<String> NotParsed = new ArrayList<String>();
The program then continues to sanitize the key/value pairs. We first start with the first key in the record, e.g. L1234456. If the record could not be sanitized, the current key is replaced with the text "CouldNotBeParsed".
for (int i = 0; i < Records.size(); i++) {
if(!validateRecord(Records.get(i)[0].toString())) {
Logging.info("Records could not be parsed " + Records.get(i)[0]);
NotParsed.add(Records.get(i)[0].toString());
Records.get(i)[0] = "CouldNotBeParsed";
} else {
Logging.info(Records.get(i)[0] + " has been sanitized");
}
}
Next we do the second part of the key/value pair, e.g. ygja-3bcb-iiiv-pppp-a8yr-c3d2-ct7v-giap-24yj-3gie:
for (int i = 0; i < Records.size(); i++) {
if(!validateRecordKey(Records.get(i)[1].toString())) {
Logging.info("Record Key could not be parsed " + Records.get(i)[0]);
NotParsed.add(Records.get(i)[1].toString());
Records.get(i)[1] = "CouldNotBeParsed";
} else {
Logging.info(Records.get(i)[1] + " has been sanitized");
}
}
The problem is that I need both parts of each key/value pair to be sanitized, and I need a separate list of the key/value pairs that could not be sanitized as well as a list of the ones that were, so the latter can be inserted into a database. The ones that cannot will be printed out to the user.
I thought about looping through the records and removing the ones with the "CouldNotBeParsed" text, which would leave just the ones that could be parsed. I also tried removing records during the for loop with Records.remove(i); however, that messes up the for loop: if the first record could not be sanitized and is removed, the next record is skipped on the next iteration because record 2 has become record 1. That's why I went with adding the text.
Actually I need two lists, one for the records that were sanitized and another for those that weren't.
So I was thinking there must be a better way to do this, or a better method of sanitizing both parts of the key/value pairs at the same time, or something of that nature. Suggestions?
Start by changing the data structure: rather than using a list of two-element String[] arrays, define a class for your key-value pairs:
class KeyValuePair {
private final String key;
private final String value;
public KeyValuePair(String k, String v) { key = k; value = v; }
public String getKey() { return key; }
public String getValue() { return value; }
}
Note that the class is immutable.
Now make an object with three lists of KeyValuePair objects:
class ParseResult {
private final List<KeyValuePair> sanitized;
private final List<KeyValuePair> badKey;
private final List<KeyValuePair> badValue;
public ParseResult(List<KeyValuePair> s, List<KeyValuePair> bk, List<KeyValuePair> bv) {
sanitized = s;
badKey = bk;
badValue = bv;
}
public List<KeyValuePair> getSanitized() { return sanitized; }
public List<KeyValuePair> getBadKey() { return badKey; }
public List<KeyValuePair> getBadValue() { return badValue; }
}
Finally, populate these three lists in a single loop that reads from the file:
public static ParseResult parseFile(File csvFile) {
Scanner scan = null;
try {
scan = new Scanner(csvFile);
} catch (FileNotFoundException e) {
???
// Do something about this exception.
// Consider not catching it here, letting the caller deal with it.
}
final List<KeyValuePair> sanitized = new ArrayList<KeyValuePair>();
final List<KeyValuePair> badKey = new ArrayList<KeyValuePair>();
final List<KeyValuePair> badValue = new ArrayList<KeyValuePair>();
while (scan.hasNext()) {
String[] tokens = scan.nextLine().trim().split(",");
if (tokens.length != 2) {
???
// Do something about this - either throw an exception,
// or log a message and continue.
}
KeyValuePair kvp = new KeyValuePair(tokens[0], tokens[1]);
// Do the validation on the spot
if (!validateRecordKey(kvp.getKey())) {
badKey.add(kvp);
} else if (!validateRecord(kvp.getValue())) {
badValue.add(kvp);
} else {
sanitized.add(kvp);
}
}
return new ParseResult(sanitized, badKey, badValue);
}
Now you have a single function that produces a single result with all your records cleanly separated into three buckets: sanitized records, records with bad keys, and records with good keys but bad values.
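A brief usage sketch (the file name and the insertIntoDatabase call are hypothetical):

ParseResult result = parseFile(new File("records.csv"));
for (KeyValuePair kvp : result.getSanitized()) {
    insertIntoDatabase(kvp); // hypothetical database call
}
for (KeyValuePair kvp : result.getBadKey()) {
    System.out.println("Could not sanitize key: " + kvp.getKey());
}
for (KeyValuePair kvp : result.getBadValue()) {
    System.out.println("Could not sanitize value for key: " + kvp.getKey());
}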
I am a novice to Java; however, I cannot seem to figure this one out. I have a CSV file in the following format:
String1,String2
String1,String2
String1,String2
String1,String2
Each line is a pair. The 2nd line is a new record, same with the 3rd. In the real world the CSV file will change in size; sometimes it will be 3 records, or 4, or even 10.
My issue is: how do I read the values into an array and dynamically adjust its size? I would imagine we would first have to parse through the CSV file, get the number of records/elements, then create the array based on that size, then go through the CSV again and store the values in the array.
I'm just not sure how to accomplish this.
Any help would be appreciated.
You can use an ArrayList instead of an array. An ArrayList is a dynamic array. For example:
Scanner scan = new Scanner(new File("yourfile"));
ArrayList<String[]> records = new ArrayList<String[]>();
String[] record = new String[2];
while(scan.hasNext())
{
record = scan.nextLine().split(",");
records.add(record);
}
//now records has your records.
//here is a way to loop through the records (process)
for(String[] temp : records)
{
for(String temp1 : temp)
{
System.out.print(temp1 + " ");
}
System.out.print("\n");
}
Just replace "yourfile" with the absolute path to your file.
Here is a more traditional for loop for processing the data, if you don't like the enhanced for loop in the first example:
for(int i = 0; i < records.size(); i++)
{
for(int j = 0; j < records.get(i).length; j++)
{
System.out.print(records.get(i)[j] + " ");
}
System.out.print("\n");
}
Both for loops are doing the same thing though.
You can simply read the CSV into a 2-dimensional array (a List of String[] rows) in just 2 lines with the open-source library uniVocity-parsers.
Refer to the following code as an example:
public static void main(String[] args) throws FileNotFoundException {
/**
* ---------------------------------------
* Read CSV rows into 2-dimensional array
* ---------------------------------------
*/
// 1st, creates a CSV parser with the configs
CsvParser parser = new CsvParser(new CsvParserSettings());
// 2nd, parses all rows from the CSV file into a 2-dimensional array
List<String[]> resolvedData = parser.parseAll(new FileReader("/examples/example.csv"));
// 3rd, process the 2-dimensional array with business logic
// ......
}
tl;dr
Use the Java Collections rather than arrays, specifically a List or Set, to auto-expand as you add items.
Define a class to hold your data read from CSV, instantiating an object for each row read.
Use the Apache Commons CSV library to help with the chore of reading/writing CSV files.
Class to hold data
Define a class to hold the data of each row being read from your CSV. Let's use Person class with a given name and surname, to be more concrete than the example in your Question.
In Java 16 and later, more briefly define the class as a record.
record Person ( String givenName , String surname ) {}
In older Java, define a conventional class.
package work.basil.example;
public class Person {
public String givenName, surname;
public Person ( String givenName , String surname ) {
this.givenName = givenName;
this.surname = surname;
}
@Override
public String toString ( ) {
return "Person{ " +
"givenName='" + givenName + '\'' +
" | surname='" + surname + '\'' +
" }";
}
}
Collections, not arrays
Using the Java Collections is generally better than using mere arrays. The collections are more flexible and more powerful. See Oracle Tutorial.
Here we will use the List interface to collect each Person object instantiated from data read in from the CSV file. We use the concrete ArrayList implementation of List which uses arrays in the background. The important part here, related to your Question, is that you can add objects to a List without worrying about resizing. The List implementation is responsible for any needed resizing.
If you happen to know the approximate size of your list to be populated, you can supply an optional initial capacity as a hint when creating the List.
Apache Commons CSV
The Apache Commons CSV library does a nice job of reading and writing several variants of CSV and Tab-delimited formats.
Example app
Here is an example app, in a single PersonIo.java file. The Io is short for input-output.
Example data.
GivenName,Surname
Alice,Albert
Bob,Babin
Charlie,Comtois
Darlene,Deschamps
Source code.
package work.basil.example;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
public class PersonIo {
public static void main ( String[] args ) {
PersonIo app = new PersonIo();
app.doIt();
}
private void doIt ( ) {
Path path = Paths.get( "/Users/basilbourque/people.csv" );
List < Person > people = this.read( path );
System.out.println( "People: \n" + people );
}
private List < Person > read ( final Path path ) {
Objects.requireNonNull( path );
if ( Files.notExists( path ) ) {
System.out.println( "ERROR - no file found for path: " + path + ". Message # de1f0be7-901f-4b57-85ae-3eecac66c8f6." );
}
List < Person > people = List.of(); // Default to empty list.
try {
// Hold data read from file.
int initialCapacity = ( int ) Files.lines( path ).count();
people = new ArrayList <>( initialCapacity );
// Read CSV file.
BufferedReader reader = Files.newBufferedReader( path );
Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
for ( CSVRecord record : records ) {
// GivenName,Surname
// Alice,Albert
// Bob,Babin
// Charlie,Comtois
// Darlene,Deschamps
String givenName = record.get( "GivenName" );
String surname = record.get( "Surname" );
// Use read data to instantiate.
Person p = new Person( givenName , surname );
// Collect
people.add( p );
}
} catch ( IOException e ) {
e.printStackTrace();
}
return people;
}
}
When run.
People:
[Person{ givenName='Alice' | surname='Albert' }, Person{ givenName='Bob' | surname='Babin' }, Person{ givenName='Charlie' | surname='Comtois' }, Person{ givenName='Darlene' | surname='Deschamps' }]