I have 3 different types of CSV files, each with different headers. I currently use a MultiResourceItemReader and delegate the reading to a FlatFileItemReader as follows:
@Bean
@StepScope
public MultiResourceItemReader<Model> multiResourceItemReader() {
MultiResourceItemReader<Model> resourceItemReader = new MultiResourceItemReader<>();
resourceItemReader.setResources( getInputResources() );
resourceItemReader.setDelegate( reader() );
return resourceItemReader;
}
@Bean
@StepScope
public FlatFileItemReader<Model> reader() {
log.debug("Header : {}", extraInfoHolder.getHeader());
return new FlatFileItemReaderBuilder<Model>()
.skippedLinesCallback(line -> {
String rsrc = multiResourceItemReader().getCurrentResource().toString();
log.debug("Current Resource : {}", rsrc);
// Verify file header is what we expect
if (!StringUtils.equals( line, extraInfoHolder.getHeader() )) {
throw new IllegalArgumentException( String.format("Invalid header in %s", rsrc) );
}
})
.name( "myReader" )
.linesToSkip( HEADER_ROW )
.lineMapper( new DefaultLineMapper() {
{
setLineTokenizer( getDelimitedLineTokenizer() );
setFieldSetMapper( getBeanWrapperFieldSetMapper() );
}} )
.build();
}
However, I'd like to read each CSV row into a HashMap instead of a Model POJO. For example, if the file is formatted as follows:
First Name, Last Name, Age
Doug, Jones, 57
Sam, Reed, 39
I'd like to read each line into a map where the key is the header token and the value is the corresponding value from the file:
Map 1: First Name -> Doug, Last Name -> Jones, Age -> 57
Map 2: First Name -> Sam, Last Name -> Reed, Age -> 39
In classic Spring Batch fashion, I'd like to read one row, convert it into a map, process + write it, then read the next row. How can I achieve this?
This will return the maps that you want:
private static List<Map<String, Object>> getMapsFrom(String file) throws IOException {
    List<Map<String, Object>> maps = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        int index = 0;
        String line;
        String[] keys = new String[3];
        while ((line = br.readLine()) != null) {
            if (index++ == 0) {
                // First line is the header; its tokens become the map keys.
                keys = line.split(",");
                for (int i = 0; i < keys.length; i++) {
                    keys[i] = keys[i].trim();
                }
            } else {
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim();
                }
                Map<String, Object> map = new HashMap<>();
                map.put(keys[0], values[0]);
                map.put(keys[1], values[1]);
                map.put(keys[2], Integer.parseInt(values[2])); // Age parses as a number
                maps.add(map);
            }
        }
    }
    return maps;
}
assuming your csv file is always in the form of
First Name, Last Name, Age
Doug, Jones, 57
Sam, Reed, 39
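If you'd rather stay inside Spring Batch instead of reading the file by hand, a minimal sketch (my own, not part of the original answer) is a FieldSetMapper that returns a map. It assumes the DelimitedLineTokenizer has been configured with the column names from the expected header, so FieldSet.getNames() can supply the keys:

import java.util.LinkedHashMap;
import java.util.Map;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;

public class MapFieldSetMapper implements FieldSetMapper<Map<String, String>> {
    @Override
    public Map<String, String> mapFieldSet(FieldSet fieldSet) {
        // Keys come from the tokenizer's configured column names, values from the current row.
        Map<String, String> map = new LinkedHashMap<>();
        for (String name : fieldSet.getNames()) {
            map.put(name, fieldSet.readString(name));
        }
        return map;
    }
}

Wire it in with .fieldSetMapper(new MapFieldSetMapper()) on a FlatFileItemReaderBuilder<Map<String, String>> and the reader emits one map per row, which can then be processed and written one at a time in the usual chunk-oriented fashion.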
Related
I have 2 csv files which contain the same data, but the columns of the two files are in different order.
I want to output both lists in the same column order.
Printing the first list:
System.out.println(csv1);
Employee, Address, Name, Email
Printing the second list:
System.out.println(csv2);
Address, Email, Employee, Name
How can I sort the lists to print in the column order:
Employee, Name, Email, Address
Note: I can't use integer indexes like col(1), col(3) because column 1 in csv1 does not match column 1 in csv2.
The data is read as follows:
List<String> ret = new ArrayList<>();
BufferedReader r = new BufferedReader(new InputStreamReader(str));
Stream<String> lines = r.lines().skip(1); // skip the header line
lines.forEachOrdered(line -> {
    line = line.replace("\"", "");
    ret.add(line);
});
I've assumed that you need to parse these two csv files and print the columns in a fixed order.
You can use the Apache Commons CSV library for parsing. I've considered the examples below.
Solution using external library:
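For reference, the Maven dependency (the CSVFormat.Builder API used below needs commons-csv 1.9 or later; the exact version here is an assumption):

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-csv</artifactId>
<version>1.9.0</version>
</dependency>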
test1.csv
Address,Email,Employee,Name
SecondMainRoad,test2@gmail.com,Frank,Michael
test2.csv
Employee,Address,Name,Email
John,FirstMainRoad,Doe,test@gmail.com
Sample program
public static void main(String[] args) throws IOException {
try (Reader csvReader = Files.newBufferedReader(Paths.get("test2.csv"))) {
// Initialize CSV parser and iterator.
CSVParser csvParser = new CSVParser(csvReader, CSVFormat.Builder.create()
.setRecordSeparator(System.lineSeparator())
.setHeader()
.setSkipHeaderRecord(true)
.setIgnoreEmptyLines(true)
.build());
Iterator<CSVRecord> csvRecordIterator = csvParser.iterator();
while(csvRecordIterator.hasNext())
{
final CSVRecord csvRecord = csvRecordIterator.next();
final Map<String, String> recordMap = csvRecord.toMap();
System.out.println(String.format("Employee:%s", recordMap.get("Employee")));
System.out.println(String.format("Name:%s", recordMap.get("Name")));
System.out.println(String.format("Email:%s", recordMap.get("Email")));
System.out.println(String.format("Address:%s", recordMap.get("Address")));
}
}
}
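Running this against test2.csv above should print:

Employee:John
Name:Doe
Email:test@gmail.com
Address:FirstMainRoad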
Standalone Solution:
public class CSVTesterMain {
public static void main(String[] args) {
// I have used string variables to hold the csv data; in this case, you can replace them with lines read from your files.
String csv1= "Employee,Address,Name,Email\r\n" +
"John,FirstMainRoad,Doe,test#gmail.com\r\n" +
"Henry,ThirdCrossStreet,Joseph,email#gmail.com";
String csv2 = "Address,Email,Employee,Name\r\n" +
"SecondMainRoad,test2#gmail.com,Michael,Sessner\r\n" +
"CrossRoad,test25#gmail.com,Vander,John";
// Map key - To hold header information
// Map Value - List of lines holding values to the corresponding headers.
Map<String, List<String>> dataMap = new HashMap<>();
Stream<String> csv1LineStream = csv1.lines();
Stream<String> csv2LineStream = csv2.lines();
// We are using the same method to parse different csv formats. We keep a reference to the headers
// in the form of the map key, which will help us emit output later in our format.
populateDataMap(csv1LineStream, dataMap);
populateDataMap(csv2LineStream, dataMap);
// Now we have dataMap that holds data from multiple csv files. Key of the map is responsible to
// determine the header sequence.
// Print the output as per the sequence Employee, Name, Email, Address
System.out.println("Employee,Name,Email,Address");
dataMap.forEach((header, lineList) -> {
// Logic to determine the index value for each column.
List<String> headerList = Arrays.asList(header.split(","));
int employeeIdx = headerList.indexOf("Employee");
int nameIdx = headerList.indexOf("Name");
int emailIdx = headerList.indexOf("Email");
int addressIdx = headerList.indexOf("Address");
// Now we know the index value of each of these columns that can be emitted in our format.
// You can output to a file in your case.
// Iterate through each line, split and output as per the format.
lineList.forEach(line -> {
String[] data = line.split(",");
System.out.println(String.format("%s,%s,%s,%s", data[employeeIdx],
data[nameIdx],
data[emailIdx],
data[addressIdx]
));
});
});
}
private static void populateDataMap(Stream<String> csvLineStream, Map<String, List<String>> dataMap) {
// Populate data map associating the data to respective headers.
Iterator<String> csvIterator = csvLineStream.iterator();
// Fetch header. (In my example, I am sure that my first line is always the header).
String header = csvIterator.next();
if(! dataMap.containsKey(header))
dataMap.put(header, new ArrayList<>());
// Iterate through the remaining lines and populate data map.
while(csvIterator.hasNext())
dataMap.get(header).add(csvIterator.next());
}
}
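Sample output (a sketch of what this prints; the relative order of the two file groups may vary because dataMap is a HashMap):

Employee,Name,Email,Address
John,Doe,test@gmail.com,FirstMainRoad
Henry,Joseph,email@gmail.com,ThirdCrossStreet
Michael,Sessner,test2@gmail.com,SecondMainRoad
Vander,John,test25@gmail.com,CrossRoad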
Here I am using the Jackson CSV dataformat library to parse the csv files.
Dependency
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-csv</artifactId>
<version>2.13.2</version>
</dependency>
File 1
employee, address, name, email
1, address 1, Name 1, name1@example.com
2, address 2, Name 2, name2@example.com
3, address 3, Name 3, name3@example.com
File 2
address, email, employee, name
address 4, name4@example.com, 4, Name 4
address 5, name5@example.com, 5, Name 5
address 6, name6@example.com, 6, Name 6
Java Program
Here EmployeeDetails is a POJO class. And it is expected that the location of the csv files is passed as an argument.
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.*;
import java.util.ArrayList;
import java.util.List;
public class EmployeeDataParser {
public static void main(String[] args) {
File directoryPath = new File(args[0]);
File[] filesList = directoryPath.listFiles();
List<EmployeeDetails> employeeDetails = new ArrayList<>();
EmployeeDataParser employeeDataParser=new EmployeeDataParser();
for(File file : filesList) {
System.out.println("File path: "+file.getAbsolutePath());
employeeDataParser.readEmployeeData(employeeDetails, file.getAbsolutePath());
}
System.out.println("number of employees into list: " + employeeDetails.size());
employeeDataParser.printEmployeeDetails(employeeDetails);
}
private List<EmployeeDetails> readEmployeeData(List<EmployeeDetails> employeeDetails,
String filePath){
CsvMapper csvMapper = new CsvMapper();
CsvSchema schema = CsvSchema.emptySchema().withHeader();
ObjectReader oReader = csvMapper.readerFor(EmployeeDetails.class).with(schema);
try (Reader reader = new FileReader(filePath)) {
MappingIterator<EmployeeDetails> mi = oReader.readValues(reader);
while (mi.hasNext()) {
EmployeeDetails current = mi.next();
employeeDetails.add(current);
}
} catch (IOException e) {
System.out.println("IOException Caught !!!");
e.printStackTrace(); // getStackTrace() would only print an array reference
}
return employeeDetails;
}
private void printEmployeeDetails(List<EmployeeDetails> employeeDetails) {
// Use the same column widths for the header and the rows so the columns line up.
System.out.printf("%10s %15s %25s %20s%n", "Employee", "Name", "Email", "Address");
for (EmployeeDetails empDetail : employeeDetails) {
System.out.printf("%10s %15s %25s %20s%n", empDetail.getEmployee(),
empDetail.getName(),
empDetail.getEmail(),
empDetail.getAddress());
}
}
}
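For completeness, a minimal sketch of the EmployeeDetails POJO assumed above (the real class just needs fields matching the CSV headers plus getters and setters; Lombok would shorten this):

public class EmployeeDetails {
    private String employee;
    private String address;
    private String name;
    private String email;

    // Getters and setters; Jackson uses them to bind the CSV columns by header name.
    public String getEmployee() { return employee; }
    public void setEmployee(String employee) { this.employee = employee; }
    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}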
My issue here is that I need to compute the average time for each id.
Sample data
T1,2020-01-16,11:16pm,start
T2,2020-01-16,11:18pm,start
T1,2020-01-16,11:20pm,end
T2,2020-01-16,11:23pm,end
I have written code that keeps the first and third columns in a map, something like
T1, 11:16pm
but I was not able to compute the values after putting them in the map. I also tried keeping them in a string array and splitting line by line, but I faced the same issue with that approach.
public class AverageTimeGenerate {
public static void main(String[] args) throws IOException {
File file = new File("/abc.txt");
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
while (true) {
String line = reader.readLine();
if (line == null) {
break;
}
ArrayList<String> list = new ArrayList<>();
String[] tokens = line.split(",");
for (String s: tokens) {
list.add(s);
}
Map<String, String> map = new HashMap<>();
String[] data = line.split(",");
String ids= data[0];
String dates = data[1];
String transactionTime = data[2];
String transactionStartAndEndTime = data[3];
String[] transactionIds = ids.split("/n");
String[] timeOfEachTransaction = transactionTime.split("/n");
for(String id : transactionIds) {
for(String time : timeOfEachTransaction) {
map.put(id, time);
}
}
}
}
}
}
Can anyone suggest whether it is possible to find duplicates in a map and compute values from it, or is there any other way I can do this so that the output looks like:
T1 2:00
T2 5:00
I don't know what your logic is for computing the average time, but you can save the data for one particular transaction in a map. The structure can look like this: the transaction id is the key, and all of its times go into a list.
Map<String,List<String>> map = new HashMap<String,List<String>>();
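A minimal sketch of filling that structure and computing the elapsed minutes per id with java.time (assumptions: each id has exactly one start and one end line, in that order, on the same day; the case-insensitive formatter handles the lowercase "pm"):

Map<String, List<String>> map = new HashMap<>();
for (String line : Files.readAllLines(Paths.get("abc.txt"))) {
    String[] fields = line.split(",");
    // fields[0] is the id, fields[2] is the time, e.g. "11:16pm"
    map.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields[2]);
}
DateTimeFormatter fmt = new DateTimeFormatterBuilder()
        .parseCaseInsensitive()
        .appendPattern("h:mma")
        .toFormatter(Locale.ENGLISH);
map.forEach((id, times) -> {
    long minutes = Duration.between(
            LocalTime.parse(times.get(0), fmt),
            LocalTime.parse(times.get(1), fmt)).toMinutes();
    System.out.println(id + " " + minutes + " minutes");
});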
You can do it like this (note the hh:mma pattern; HH:mm would silently ignore the am/pm suffix):
Map<String, String> result = Files.lines(Paths.get("abc.txt"))
.map(line -> line.split(","))
.map(arr -> {
try {
return new AbstractMap.SimpleEntry<>(arr[0],
new SimpleDateFormat("hh:mma").parse(arr[2]));
} catch (ParseException e) {
return null;
}
}).collect(Collectors.groupingBy(Map.Entry::getKey,
Collectors.collectingAndThen(Collectors
.mapping(Map.Entry::getValue, Collectors.toList()),
list -> toStringTime.apply(convert.apply(list)))));
For simplicity, I've declared two helper functions: convert halves the difference between the two times (assuming each id has exactly two entries, start first), and toStringTime formats the result as minutes:seconds.
Function<List<Date>, Long> convert = list -> (list.get(1).getTime() - list.get(0).getTime()) / 2;
Function<Long, String> toStringTime = l -> l / 60000 + ":" + l % 60000 / 1000;
I have three input fields.
First Name
Last Name
Date Of Birth
I would like to get random data for each input from a property file.
This is how the property file looks. Field name and = should be ignored.
- First Name= Robert, Brian, Shawn, Bay, John, Paul
- Last Name= Jerry, Adam ,Lu , Eric
- Date of Birth= 01/12/12,12/10/12,1/2/17
Example: for First Name, one name should be randomly selected from the following names:
Robert, Brian, Shawn, Bay, John, Paul
Also I need to ignore anything before =
FileInputStream objfile = new FileInputStream(System.getProperty("user.dir") + path);
in = new BufferedReader(new InputStreamReader(objfile));
String line = in.readLine();
while (line != null && !line.trim().isEmpty()) {
String[] eachRecord = line.trim().split(",");
Random rand = new Random();
// I need to pick the first name randomly from the file from row 1.
send(firstName, eachRecord[0]);
If you know that you're always going to have just those 3 lines in your property file, I would put each into a map with an index as the key, then randomly generate a key in the range of the map.
// your code here to read the file in
HashMap<Integer, String> firstNameMap = new HashMap<>();
HashMap<Integer, String> lastNameMap = new HashMap<>();
HashMap<Integer, String> dobMap = new HashMap<>();
String line;
while ((line = in.readLine()) != null) {
    String[] parts = line.split("=");
    if (parts[0].trim().equals("First Name")) {
        String[] values = parts[1].split(",");
        for (int i = 0; i < values.length; ++i) {
            firstNameMap.put(i, values[i].trim());
        }
    }
    else if (parts[0].trim().equals("Last Name")) {
        // do the same as First Name but for lastNameMap
    }
    else if (parts[0].trim().equals("Date of Birth")) {
        // do the same as First Name but for dobMap
    }
}
// Now you can use the size of the map and a random number to get a value
// first name for instance:
int randomNum = ThreadLocalRandom.current().nextInt(0, firstNameMap.size());
System.out.println("First Name: " + firstNameMap.get(randomNum));
// and you would do the same for the other fields
The code can easily be refactored with some helper methods to make it cleaner, we'll leave that as a HW assignment :)
This way you have a cache of all your values that you can call at anytime and get a random value. I realize this isn't the most optimum solution having nested loops and 3 different maps but if your input file only contains 3 lines and you're not expecting to have millions of inputs it should be just fine.
Haven't programmed stuff like this in a long time.
Feel free to test it, and let me know if it works.
The result of this code should be a HashMap object called values
You can then get the specific fields you want from it, using get(field_name)
For example - values.get("First Name"). Make sure to use the correct case, because "first name" won't work.
If you want it all to be lower case, you can just add .toLowerCase() at the end of the line that puts the field and value into the HashMap
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
public class Test
{
// arguments are passed using the text field below this editor
public static void main(String[] args) throws Exception
{
// Assumption: the property file path is passed as the first argument;
// set "in" to however you actually read your file.
BufferedReader in = new BufferedReader(new FileReader(args[0]));
HashMap<String, String> values = new HashMap<String, String>();
String line;
while (((line = in.readLine()) != null) && !line.trim().isEmpty()) {
if(!line.contains("=")) {
continue;
}
String[] lineParts = line.split("=");
String[] eachRecord = lineParts[1].split(",");
System.out.println("adding value of field type = " + lineParts[0].trim());
// now add the mapping to the values HashMap - values[field_name] = random_field_value
values.put(lineParts[0].trim(), eachRecord[(int) (Math.random() * eachRecord.length)].trim());
}
System.out.println("First Name = " + values.get("First Name"));
System.out.println("Last Name = " + values.get("Last Name"));
System.out.println("Date of Birth = " + values.get("Date of Birth"));
}
}
I have two files, each having the same format, with approximately 100,000 lines. For each line in file one I extract the second component (column), and if I find a match in the second column of the second file, I extract their third components, combine them, and store or output the result.
Though my implementation works, the program runs extremely slowly; it takes more than an hour to iterate over the files, compare, and output all the results.
I am reading and storing the data of both files in ArrayLists, then iterating over those lists and doing the comparison. Below is my code. Is there any performance-related glitch, or is this just normal for such an operation?
Note: I was using String.split(), but I understand from other posts that StringTokenizer is faster.
public ArrayList<String> match(String file1, String file2) throws IOException{
ArrayList<String> finalOut = new ArrayList<>();
try {
ArrayList<String> data = readGenreDataIntoMemory(file1);
ArrayList<String> data1 = readGenreDataIntoMemory(file2);
StringTokenizer st = null;
for(String line : data){
HashSet<String> genres = new HashSet<>();
boolean sameMovie = false;
String movie2 = "";
st = new StringTokenizer(line, "|");
//String line[] = fline.split("\\|");
String ratingInfo = st.nextToken();
String movie1 = st.nextToken();
String genreInfo = st.nextToken();
if(!genreInfo.equals("null")){
for(String s : genreInfo.split(",")){
genres.add(s);
}
}
StringTokenizer st1 = null;
for(String line1 : data1){
st1 = new StringTokenizer(line1, "|");
st1.nextToken();
movie2 = st1.nextToken();
String genreInfo2= st1.nextToken();
//If the movie name are similar then they should have the same genre
//Update their genres to be the same
if(!genreInfo2.equals("null") && movie1.equals(movie2)){
for(String s : genreInfo2.split(",")){
genres.add(s);
}
sameMovie = true;
break;
}
}
if(sameMovie){
finalOut.add(ratingInfo+""+movie1+""+genres.toString()+"\n");
}else{
finalOut.add(line);
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
return finalOut;
}
I would use the Streams API
String file1 = "files1.txt";
String file2 = "files2.txt";
// get all the lines by movie name for each file.
Map<String, List<String[]>> map = Stream.of(Files.lines(Paths.get(file1)),
Files.lines(Paths.get(file2)))
.flatMap(p -> p)
.parallel()
.map(s -> s.split("[|]", 3))
.collect(Collectors.groupingByConcurrent(sa -> sa[1], Collectors.toList()));
// merge all the genres for each movie.
map.forEach((movie, lines) -> {
Set<String> genres = lines.stream()
.flatMap(l -> Stream.of(l[2].split(",")))
.collect(Collectors.toSet());
System.out.println("movie: " + movie + " genres: " + genres);
});
This has the advantage of being O(n) instead of O(n^2) and it's multi-threaded.
Do a hash join.
As of now you are doing a nested-loop join, which is O(n^2); the hash join will be amortized O(n).
Put the contents of each file in a hash map, with the field you want (the second field) as the key.
Map<String,String> map1 = new HashMap<>();
Map<String,String> map2 = new HashMap<>();
// build map1 from file1 and map2 from file2, keyed on the second field
Then do the hash join
for(String key1 : map1.keySet()){
if(map2.containsKey(key1)){
// do your thing you found the match
}
}
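A fuller sketch of the idea (assuming '|'-delimited lines as in the question, with the movie name as the second field; the helper name is my own):

static Map<String, String> indexBySecondField(String file) throws IOException {
    Map<String, String> index = new HashMap<>();
    for (String line : Files.readAllLines(Paths.get(file))) {
        String[] parts = line.split("\\|", 3);
        index.put(parts[1], line); // key: second field, value: whole line
    }
    return index;
}

Map<String, String> map1 = indexBySecondField(file1);
Map<String, String> map2 = indexBySecondField(file2);
for (Map.Entry<String, String> e : map1.entrySet()) {
    String match = map2.get(e.getKey());
    if (match != null) {
        // found the match; combine e.getValue() and match here
    }
}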
I have a big CSV file, thousands of rows, and I want to aggregate some columns using java code.
The file in the form:
1,2012,T1
2,2015,T2
3,2013,T1
4,2012,T1
The results should be:
T, Year, Count
T1,2012, 2
T1,2013, 1
T2,2015, 1
Put your data into a Map-like structure, and add +1 to the stored value each time a key (in your case T + year) is found.
You can use a map keyed on the combined value, like:
Map<String, Integer> rowMap = new HashMap<>();
rowMap.put("T1,2012", 1);
rowMap.put("T2,2015", 1);
or you can define your own class with T and Year fields, overriding the hashCode and equals methods. Then you can use:
Map<YourClass, Integer> map = new HashMap<>();
String csv =
"1,2012,T1\n"
+ "2,2015,T2\n"
+ "3,2013,T1\n"
+ "4,2012,T1\n";
Map<String, Integer> map = new TreeMap<>();
BufferedReader reader = new BufferedReader(new StringReader(csv));
String line;
while ((line = reader.readLine()) != null) {
String[] fields = line.split(",");
String key = fields[2] + "," + fields[1];
Integer value = map.get(key);
if (value == null)
value = 0;
map.put(key, value + 1);
}
System.out.println(map);
// -> {T1,2012=2, T1,2013=1, T2,2015=1}
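As a side note, the get/null-check/put sequence above can be collapsed with Map.merge (Java 8+):

map.merge(key, 1, Integer::sum); // insert 1, or add 1 to the existing count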
Use uniVocity-parsers for the best performance. It should take 1 second to process 1 million rows.
CsvParserSettings settings = new CsvParserSettings();
settings.selectIndexes(1, 2); //select the columns we are going to read
final Map<List<String>, Integer> results = new LinkedHashMap<List<String>, Integer>(); //stores the results here
//Use a custom implementation of RowProcessor
settings.setRowProcessor(new AbstractRowProcessor() {
@Override
public void rowProcessed(String[] row, ParsingContext context) {
List<String> key = Arrays.asList(row); // converts the input array to a List - lists implement hashCode and equals based on their values so they can be used as keys on your map.
Integer count = results.get(key);
if (count == null) {
count = 0;
}
results.put(key, count + 1);
}
});
//creates a parser with the above configuration and RowProcessor
CsvParser parser = new CsvParser(settings);
String input = "1,2012,T1"
+ "\n2,2015,T2"
+ "\n3,2013,T1"
+ "\n4,2012,T1";
//the parse() method will parse and submit all rows to your RowProcessor - use a FileReader to read a file instead the String I'm using as example.
parser.parse(new StringReader(input));
//Here are the results:
for(Entry<List<String>, Integer> entry : results.entrySet()){
System.out.println(entry.getKey() + " -> " + entry.getValue());
}
Output:
[2012, T1] -> 2
[2015, T2] -> 1
[2013, T1] -> 1
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).