Aggregate data in CSV file using Java


I have a big CSV file with thousands of rows, and I want to aggregate some columns using Java code.
The file is in the form:
1,2012,T1
2,2015,T2
3,2013,T1
4,2012,T1
The results should be:
T, Year, Count
T1,2012, 2
T1,2013, 1
T2,2015, 1

Put your data into a Map-like structure, and each time a key is found (in your case "" + T + year), add 1 to its stored value.
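A minimal sketch of that counting idea, assuming each line looks like "id,year,T" as in the question. The class and method names (CsvCounter, countByTAndYear) are made up for illustration; Map.merge does the "add 1 to the stored value" step.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CsvCounter {
    // Counts rows per (T, year) pair; each line is expected to look like "id,year,T".
    static Map<String, Integer> countByTAndYear(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps the keys sorted
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 3) {
                continue; // skip blank or malformed lines
            }
            String key = fields[2].trim() + "," + fields[1].trim();
            counts.merge(key, 1, Integer::sum); // store 1, or add 1 to the stored value
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,2012,T1", "2,2015,T2", "3,2013,T1", "4,2012,T1");
        countByTAndYear(lines).forEach((key, count) -> System.out.println(key + ", " + count));
    }
}
```

For a real file, the lines could come from Files.readAllLines or a BufferedReader instead of the hard-coded list.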

You can use a map like
Map<String, Integer> rowMap = new HashMap<>();
rowMap.put("T1", 1);
rowMap.put("T2", 2);
rowMap.put("2012", 1);
or you can define your own class with T and Year fields, overriding the hashCode and equals methods. Then you can use
Map<YourClass, Integer> map = new HashMap<>();
so that the key (T1, 2012) maps to the count 2.
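The custom key class could look roughly like the sketch below. TYearKey and TYearKeyDemo are hypothetical names, and the year is assumed to always be a valid integer; the point is only that equals and hashCode are both based on the same two fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class TYearKeyDemo {
    // Counts lines of the form "id,year,T" per (T, year) key.
    static Map<TYearKey, Integer> count(String[] lines) {
        Map<TYearKey, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            TYearKey key = new TYearKey(f[2], Integer.parseInt(f[1]));
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"1,2012,T1", "2,2015,T2", "3,2013,T1", "4,2012,T1"};
        System.out.println(count(lines)); // HashMap iteration order is unspecified
    }
}

// Hypothetical key class holding the two grouping columns.
final class TYearKey {
    private final String t;
    private final int year;

    TYearKey(String t, int year) {
        this.t = t;
        this.year = year;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof TYearKey)) return false;
        TYearKey that = (TYearKey) other;
        return year == that.year && t.equals(that.t);
    }

    @Override
    public int hashCode() {
        return Objects.hash(t, year); // must agree with equals
    }

    @Override
    public String toString() {
        return t + "," + year;
    }
}
```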

String csv =
"1,2012,T1\n"
+ "2,2015,T2\n"
+ "3,2013,T1\n"
+ "4,2012,T1\n";
Map<String, Integer> map = new TreeMap<>();
BufferedReader reader = new BufferedReader(new StringReader(csv));
String line;
while ((line = reader.readLine()) != null) {
String[] fields = line.split(",");
String key = fields[2] + "," + fields[1];
Integer value = map.get(key);
if (value == null)
value = 0;
map.put(key, value + 1);
}
System.out.println(map);
// -> {T1,2012=2, T1,2013=1, T2,2015=1}

Use uniVocity-parsers for the best performance. It should take 1 second to process 1 million rows.
CsvParserSettings settings = new CsvParserSettings();
settings.selectIndexes(1, 2); //select the columns we are going to read
final Map<List<String>, Integer> results = new LinkedHashMap<List<String>, Integer>(); //stores the results here
//Use a custom implementation of RowProcessor
settings.setRowProcessor(new AbstractRowProcessor() {
@Override
public void rowProcessed(String[] row, ParsingContext context) {
List<String> key = Arrays.asList(row); // converts the input array to a List - lists implement hashCode and equals based on their values so they can be used as keys on your map.
Integer count = results.get(key);
if (count == null) {
count = 0;
}
results.put(key, count + 1);
}
});
//creates a parser with the above configuration and RowProcessor
CsvParser parser = new CsvParser(settings);
String input = "1,2012,T1"
+ "\n2,2015,T2"
+ "\n3,2013,T1"
+ "\n4,2012,T1";
//the parse() method will parse and submit all rows to your RowProcessor - use a FileReader to read a file instead of the String I'm using as an example.
parser.parse(new StringReader(input));
//Here are the results:
for(Entry<List<String>, Integer> entry : results.entrySet()){
System.out.println(entry.getKey() + " -> " + entry.getValue());
}
Output:
[2012, T1] -> 2
[2015, T2] -> 1
[2013, T1] -> 1
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

Related

Find duplicates in first column and take average based on third column

My issue is that I need to compute the average time for each id.
Sample data
T1,2020-01-16,11:16pm,start
T2,2020-01-16,11:18pm,start
T1,2020-01-16,11:20pm,end
T2,2020-01-16,11:23pm,end
I have written code that keeps the first and third columns in a map, something like
T1, 11:16pm
but I was not able to compute values after putting them in the map. I also tried keeping them in a string array and splitting line by line, but I face the same issue with that approach.
public class AverageTimeGenerate {
public static void main(String[] args) throws IOException {
File file = new File("/abc.txt");
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
while (true) {
String line = reader.readLine();
if (line == null) {
break;
}
ArrayList<String> list = new ArrayList<>();
String[] tokens = line.split(",");
for (String s: tokens) {
list.add(s);
}
Map<String, String> map = new HashMap<>();
String[] data = line.split(",");
String ids= data[0];
String dates = data[1];
String transactionTime = data[2];
String transactionStartAndEndTime = data[3];
String[] transactionIds = ids.split("/n");
String[] timeOfEachTransaction = transactionTime.split("/n");
for(String id : transactionIds) {
for(String time : timeOfEachTransaction) {
map.put(id, time);
}
}
}
}
}
}
Can anyone suggest whether it is possible to find duplicates in a map and compute values from them, or is there any other way to do this so that the output looks like
T1 2:00
T2 5:00

I don't know what your logic for computing the average time is, but you can store the data for each transaction in a map. The map structure can look like this: the transaction id is the key, and all of its times go into an ArrayList.
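One possible way to go from such a per-id map to a number is sketched below. It rests on assumptions that are not in the question: "average time" is taken to mean the elapsed time between a "start" row and the next "end" row of the same id, averaged over all such pairs, and no pair crosses midnight. The class name AverageTimePerId is made up.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class AverageTimePerId {
    // Accepts times such as "11:16pm"; case-insensitive so "pm" and "PM" both parse.
    private static final DateTimeFormatter TIME = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("h:mma")
            .toFormatter(Locale.ENGLISH);

    // Returns the average elapsed seconds between "start" and "end" rows for each id.
    static Map<String, Long> averageSeconds(List<String> lines) {
        Map<String, LocalTime> openStarts = new HashMap<>();
        Map<String, List<Duration>> durations = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            String id = f[0].trim();
            LocalTime time = LocalTime.parse(f[2].trim(), TIME);
            if ("start".equals(f[3].trim())) {
                openStarts.put(id, time); // remember when this id started
            } else {
                LocalTime start = openStarts.remove(id);
                if (start != null) {
                    durations.computeIfAbsent(id, k -> new ArrayList<>())
                             .add(Duration.between(start, time));
                }
            }
        }
        Map<String, Long> averages = new TreeMap<>();
        durations.forEach((id, list) -> averages.put(id,
                (long) list.stream().mapToLong(Duration::getSeconds).average().orElse(0)));
        return averages;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "T1,2020-01-16,11:16pm,start",
                "T2,2020-01-16,11:18pm,start",
                "T1,2020-01-16,11:20pm,end",
                "T2,2020-01-16,11:23pm,end");
        averageSeconds(lines).forEach((id, secs) ->
                System.out.println(id + " " + secs / 60 + ":" + String.format("%02d", secs % 60)));
    }
}
```

With the sample rows this gives 240 seconds for T1 and 300 seconds for T2. If the intended definition of the average is different, only the last step needs to change.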
Map<String,List<String>> map = new HashMap<String,List<String>>();

You can do it like this:
Map<String, String> result = Files.lines(Paths.get("abc.txt"))
.map(line -> line.split(","))
.map(arr -> {
try {
return new AbstractMap.SimpleEntry<>(arr[0],
new SimpleDateFormat("HH:mm").parse(arr[2]));
} catch (ParseException e) {
return null;
}
}).collect(Collectors.groupingBy(Map.Entry::getKey,
Collectors.collectingAndThen(Collectors
.mapping(Map.Entry::getValue, Collectors.toList()),
list -> toStringTime.apply(convert.apply(list)))));
For simplicity, I've declared two helper functions:
Function<List<Date>, Long> convert = list -> (list.get(1).getTime() - list.get(0).getTime()) / 2;
Function<Long, String> toStringTime = l -> l / 60000 + ":" + l % 60000 / 1000;

How to select random text value from specific row using java

I have three input fields.
First Name
Last Name
Date Of Birth
I would like to get random data for each input from a property file.
This is how the property file looks. Field name and = should be ignored.
- First Name= Robert, Brian, Shawn, Bay, John, Paul
- Last Name= Jerry, Adam ,Lu , Eric
- Date of Birth= 01/12/12,12/10/12,1/2/17
Example: For First Name: File should randomly select one name from the following names
Robert, Brian, Shawn, Bay, John, Paul
Also I need to ignore anything before =
FileInputStream objfile = new FileInputStream(System.getProperty("user.dir") + path);
in = new BufferedReader(new InputStreamReader(objfile ));
String line = in.readLine();
while (line != null && !line.trim().isEmpty()) {
String eachRecord[]=line.trim().split(",");
Random rand = new Random();
//I need to pick first name randomly from the file from row 1.
send(firstName,(eachRecord[0]));

If you know that you're always going to have just those 3 lines in your property file, I would put each into a map with an index as the key, then randomly generate a key in the range of the map.
// your code here to read the file in
Map<Integer, String> firstNameMap = new HashMap<>();
Map<Integer, String> lastNameMap = new HashMap<>();
Map<Integer, String> dobMap = new HashMap<>();
String line;
while ((line = in.readLine()) != null) {
String[] parts = line.split("=");
if (parts.length < 2) {
continue; // skip lines without a "field=values" pair
}
String field = parts[0].trim();
String[] values = parts[1].split(",");
if (field.equals("First Name")) {
for (int i = 0; i < values.length; ++i) {
firstNameMap.put(i, values[i].trim());
}
}
else if (field.equals("Last Name")) {
// do the same as FN but for lastNameMap
}
else if (field.equals("Date of Birth")) {
// do the same as FN but for dobMap
}
}
// Now you can use the size of the map and a random number to get a value
// first name for instance:
int randomNum = ThreadLocalRandom.current().nextInt(firstNameMap.size()); // 0 (inclusive) to size (exclusive)
System.out.println("First Name: " + firstNameMap.get(randomNum));
// and you would do the same for the other fields
The code can easily be refactored with some helper methods to make it cleaner; we'll leave that as a homework assignment :)
This way you have a cache of all your values that you can call at any time to get a random value. I realize this isn't the optimal solution, with nested loops and 3 different maps, but if your input file only contains 3 lines and you're not expecting millions of inputs, it should be just fine.
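For what it's worth, a refactoring with helper methods might look something like this sketch. It assumes every line has the form "Field Name= value, value, ..." with no leading dash, and the names RandomFieldPicker, parse, and pick are made up. A single map from field name to list of values replaces the three separate maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class RandomFieldPicker {
    // Parses lines such as "First Name= Robert, Brian" into field name -> list of values.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // no "=" on this line, nothing to read
            }
            String name = line.substring(0, eq).trim();
            List<String> values = new ArrayList<>();
            for (String value : line.substring(eq + 1).split(",")) {
                if (!value.trim().isEmpty()) {
                    values.add(value.trim());
                }
            }
            fields.put(name, values);
        }
        return fields;
    }

    // Picks one value at random for the given field name.
    static String pick(Map<String, List<String>> fields, String name, Random random) {
        List<String> values = fields.get(name);
        if (values == null || values.isEmpty()) {
            throw new IllegalArgumentException("No values for field: " + name);
        }
        return values.get(random.nextInt(values.size()));
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = parse(Arrays.asList(
                "First Name= Robert, Brian, Shawn, Bay, John, Paul",
                "Last Name= Jerry, Adam ,Lu , Eric",
                "Date of Birth= 01/12/12,12/10/12,1/2/17"));
        Random random = new Random();
        for (String name : fields.keySet()) {
            System.out.println(name + ": " + pick(fields, name, random));
        }
    }
}
```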

Haven't programmed stuff like this in a long time.
Feel free to test it, and let me know if it works.
The result of this code should be a HashMap object called values
You can then get the specific fields you want from it, using get(field_name)
For example - values.get("First Name"). Make sure to use the correct case, because "first name" won't work.
If you want it all to be lower case, you can just add .toLowerCase() at the end of the line that puts the field and value into the HashMap
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
public class Test
{
public static void main(String[] args) throws IOException
{
// set the value of "in" (a BufferedReader over your property file) here, so you actually read from it
HashMap<String, String> values = new HashMap<String, String>();
String line;
while (((line = in.readLine()) != null) && !line.trim().isEmpty()) {
if(!line.contains("=")) {
continue;
}
String[] lineParts = line.split("=");
String[] eachRecord = lineParts[1].split(",");
System.out.println("adding value of field type = " + lineParts[0].trim());
// now add the mapping to the values HashMap - values[field_name] = random_field_value
values.put(lineParts[0].trim(), eachRecord[(int) (Math.random() * eachRecord.length)].trim());
}
System.out.println("First Name = " + values.get("First Name"));
System.out.println("Last Name = " + values.get("Last Name"));
System.out.println("Date of Birth = " + values.get("Date of Birth"));
}
}

How to Loop next element in hashmap

I have a set of strings like this
A_2007-04, A_2007-09, A_Agent, A_Daily, A_Execute, A_Exec, B_Action, B_HealthCheck
I want output as:
Key = A, Value = [2007-04,2007-09,Agent,Execute,Exec]
Key = B, Value = [Action,HealthCheck]
I'm using HashMap to do this
pckg:{A,B}
count:total no of strings
reports:set of strings
Logic I used is nested loop:
for (String l : reports[i]) {
for (String r : pckg) {
String[] g = l.split("_");
if (g[0].equalsIgnoreCase(r)) {
report.add(g[1]);
dirFiles.put(g[0], report);
} else {
break;
}
}
}
I'm getting output as
Key = A, Value = [2007-04,2007-09,Agent,Execute,Exec]
How to get second key?
Can someone suggest logic for this?

Assuming that you use Java 8, it can be done using computeIfAbsent to initialize the List of values when a key is new, as follows:
List<String> tokens = Arrays.asList(
"A_2007-04", "A_2007-09", "A_Agent", "A_Daily", "A_Execute",
"A_Exec", "P_Action", "P_HealthCheck"
);
Map<String, List<String>> map = new HashMap<>();
for (String token : tokens) {
String[] g = token.split("_");
map.computeIfAbsent(g[0], key -> new ArrayList<>()).add(g[1]);
}

In terms of raw code this should do what I think you are trying to achieve:
// Create a collection of String any way you like, but for testing
// I've simply split a flat string into an array.
String flatString = "A_2007-04,A_2007-09,A_Agent,A_Daily,A_Execute,A_Exec,"
+ "P_Action,P_HealthCheck";
String[] reports = flatString.split(",");
Map<String, List<String>> mapFromReportKeyToValues = new HashMap<>();
for (String report : reports) {
int underscoreIndex = report.indexOf("_");
String key = report.substring(0, underscoreIndex);
String newValue = report.substring(underscoreIndex + 1);
List<String> existingValues = mapFromReportKeyToValues.get(key);
if (existingValues == null) {
// This key hasn't been seen before, so create a new list
// to contain values which belong under this key.
existingValues = new ArrayList<>();
mapFromReportKeyToValues.put(key, existingValues);
}
existingValues.add(newValue);
}
System.out.println("Generated map:\n" + mapFromReportKeyToValues);
Though I recommend tidying it up and organising it into a method or methods as fits your project code.
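As one possible tidy-up, the same grouping can be wrapped in a single method using the Streams API. This is only a sketch: ReportGrouper and groupByPrefix are made-up names, a TreeMap is chosen so the keys come out sorted, and names without an underscore are simply skipped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ReportGrouper {
    // Groups names like "A_2007-04" by the text before the first underscore.
    static Map<String, List<String>> groupByPrefix(List<String> reports) {
        return reports.stream()
                .filter(r -> r.indexOf('_') > 0) // ignore names without a prefix
                .collect(Collectors.groupingBy(
                        r -> r.substring(0, r.indexOf('_')),
                        TreeMap::new,
                        Collectors.mapping(r -> r.substring(r.indexOf('_') + 1),
                                Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> reports = Arrays.asList("A_2007-04", "A_2007-09", "A_Agent", "A_Daily",
                "A_Execute", "A_Exec", "B_Action", "B_HealthCheck");
        groupByPrefix(reports).forEach((key, values) ->
                System.out.println("Key = " + key + ", Value = " + values));
    }
}
```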

Doing this with Map<String, ArrayList<String>> would be another good approach, I think:
String reports[] = {"A_2007-04", "A_2007-09", "A_Agent", "A_Daily",
"A_Execute", "A_Exec", "P_Action", "P_HealthCheck"};
Map<String, ArrayList<String>> map = new HashMap<>();
for (String rep : reports) {
String s[] = rep.split("_");
String prefix = s[0], suffix = s[1];
ArrayList<String> list = new ArrayList<>();
if (map.containsKey(prefix)) {
list = map.get(prefix);
}
list.add(suffix);
map.put(prefix, list);
}
// Print
for (Map.Entry<String, ArrayList<String>> entry : map.entrySet()) {
String key = entry.getKey();
ArrayList<String> valueList = entry.getValue();
System.out.println(key + " " + valueList);
}

for (String l : reports[i]) {
String[] g = l.split("_");
for (String r : pckg) {
if (g[0].equalsIgnoreCase(r)) {
report = dirFiles.get(g[0]);
if(report == null){ report = new ArrayList<String>(); } //create new report
report.add(g[1]);
dirFiles.put(g[0], report);
}
}
}
Removed the else part of the if condition. You were using break there, which exits the inner loop, so you never get to evaluate the keys beyond the first key.
Added a check for existing values, as suggested by Orin2005.
Also moved the statement String[] g = l.split("_"); outside the inner loop so that it isn't executed multiple times.

Reading and matching contents of two big files

I have two files each having the same format with approximately 100,000 lines. For each line in file one I am extracting the second component or column and if I find a match in the second column of second file, I extract their third components and combine them, store or output it.
My implementation works, but the program runs extremely slowly: it takes more than an hour to iterate over the files, compare, and output all the results.
I am reading and storing the data of both files in ArrayLists, then iterating over those lists and doing the comparison. Below is my code. Is there a performance-related problem in it, or is this just normal for such an operation?
Note: I was using String.split(), but I understand from another post that StringTokenizer is faster.
public ArrayList<String> match(String file1, String file2) throws IOException{
ArrayList<String> finalOut = new ArrayList<>();
try {
ArrayList<String> data = readGenreDataIntoMemory(file1);
ArrayList<String> data1 = readGenreDataIntoMemory(file2);
StringTokenizer st = null;
for(String line : data){
HashSet<String> genres = new HashSet<>();
boolean sameMovie = false;
String movie2 = "";
st = new StringTokenizer(line, "|");
//String line[] = fline.split("\\|");
String ratingInfo = st.nextToken();
String movie1 = st.nextToken();
String genreInfo = st.nextToken();
if(!genreInfo.equals("null")){
for(String s : genreInfo.split(",")){
genres.add(s);
}
}
StringTokenizer st1 = null;
for(String line1 : data1){
st1 = new StringTokenizer(line1, "|");
st1.nextToken();
movie2 = st1.nextToken();
String genreInfo2= st1.nextToken();
//If the movie name are similar then they should have the same genre
//Update their genres to be the same
if(!genreInfo2.equals("null") && movie1.equals(movie2)){
for(String s : genreInfo2.split(",")){
genres.add(s);
}
sameMovie = true;
break;
}
}
if(sameMovie){
finalOut.add(ratingInfo+""+movie1+""+genres.toString()+"\n");
}else if(sameMovie == false){
finalOut.add(line);
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
return finalOut;
}

I would use the Streams API
String file1 = "files1.txt";
String file2 = "files2.txt";
// get all the lines by movie name for each file.
Map<String, List<String[]>> map = Stream.of(Files.lines(Paths.get(file1)),
Files.lines(Paths.get(file2)))
.flatMap(p -> p)
.parallel()
.map(s -> s.split("[|]", 3))
.collect(Collectors.groupingByConcurrent(sa -> sa[1], Collectors.toList()));
// merge all the genres for each movie.
map.forEach((movie, lines) -> {
Set<String> genres = lines.stream()
.flatMap(l -> Stream.of(l[2].split(",")))
.collect(Collectors.toSet());
System.out.println("movie: " + movie + " genres: " + genres);
});
This has the advantage of being O(n) instead of O(n^2) and it's multi-threaded.

Do a hash join.
Right now you are doing a nested loop join, which is O(n^2); a hash join is amortized O(n).
Put the contents of each file in a hash map, keyed by the field you want to match on (the second field).
Map<String,String> map1 = new HashMap<>();
// build the map from file1
Map<String,String> map2 = new HashMap<>();
// build the map from file2
Then do the hash join
for(String key1 : map1.keySet()){
if(map2.containsKey(key1)){
// do your thing you found the match
}
}
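A fuller sketch of the hash join, under a few assumptions: lines have the form "rating|movie|genres", the literal text "null" means no genres (as in the question's code), and merged lines are written back in the same pipe-separated form, which differs from the question's output format. Only the second file is indexed here, and the first file is streamed against it. The names GenreHashJoin and join, and the sample movie rows, are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GenreHashJoin {
    // Splits "a,b" into a set; the literal "null" means there are no genres.
    private static Set<String> genres(String field) {
        Set<String> result = new LinkedHashSet<>();
        if (!field.equals("null")) {
            result.addAll(Arrays.asList(field.split(",")));
        }
        return result;
    }

    // For each line of file 1, merges in the genres of the same movie from file 2.
    static List<String> join(List<String> file1Lines, List<String> file2Lines) {
        // Build phase: index file 2 by movie name (the second column).
        Map<String, Set<String>> genresByMovie = new HashMap<>();
        for (String line : file2Lines) {
            String[] f = line.split("\\|", 3);
            genresByMovie.computeIfAbsent(f[1], k -> new LinkedHashSet<>()).addAll(genres(f[2]));
        }
        // Probe phase: one hash lookup per line of file 1 instead of an inner loop.
        List<String> out = new ArrayList<>();
        for (String line : file1Lines) {
            String[] f = line.split("\\|", 3);
            Set<String> other = genresByMovie.get(f[1]);
            if (other == null || other.isEmpty()) {
                out.add(line); // no usable match, keep the line unchanged
            } else {
                Set<String> merged = genres(f[2]);
                merged.addAll(other);
                out.add(f[0] + "|" + f[1] + "|" + String.join(",", merged));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("5|Alien|Horror", "3|Heat|null", "4|Up|Animation");
        List<String> file2 = Arrays.asList("2|Alien|Sci-Fi,Horror", "1|Heat|Crime");
        join(file1, file2).forEach(System.out::println);
    }
}
```

Each file is read once, so the work grows roughly linearly with the number of lines rather than with their product.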

How to sort a string into a map and print the results

I have a string in the format nm=Alan&hei=72&hair=brown
I would like to split this information up, add a conversion to the first value and print the results in the format
nm Name Alan
hei Height 72
hair Hair Color brown
I've looked at various methods using the split function and hashmaps but have had no luck piecing it all together.
Any advice would be very useful to me.

Map<String, String> aliases = new HashMap<String, String>();
aliases.put("nm", "Name");
aliases.put("hei", "Height");
aliases.put("hair", "Hair Color");
String[] params = str.split("&"); // str is the input; gives you string array: nm=Alan, hei=72, hair=brown
for (String p : params) {
String[] nv = p.split("=");
String name = nv[0];
String value = nv[1];
System.out.println(name + " " + aliases.get(name) + " " + value);
}
I really do not understand what your problem was...

Try something like this:
final String DELIMITER = "&";
Map<String, String> map = new HashMap<>();
map.put("nm", "Name");
map.put("hei", "Height");
map.put("hair", "Hair color");
StringBuilder builder = new StringBuilder();
String input = "nm=Alan&hei=72&hair=brown";
String[] splitted = input.split(DELIMITER);
for (String str : splitted) {
int index = str.indexOf("=");
String key = str.substring(0, index);
builder.append(key).append(" ");
builder.append(map.get(key)).append(" ");
builder.append(str.substring(index + 1)); // skip the "=" itself
builder.append("\n");
}
System.out.println(builder);

A HashMap consists of many key, value pairs. So when you use split, devise an appropriate regex (&). Once you have your string array, you can use one of the elements as the key (think about which element will make the best key). However, you may now be wondering- "how do I place the rest of elements as the values?". Perhaps you can create a new class which stores the rest of the elements and use objects of this class as values for the hashmap.
Then printing becomes easy- merely search for the value of the corresponding key. This value will be an object; use the appropriate method on this object to retrieve the elements and you should be able to print everything.
Also, remember to handle exceptions in your code. e.g. check for nulls, etc.
Another thing: your question mentions the word "sort". I don't fully understand what that means in this context...
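To make the idea above a bit more concrete, here is a rough sketch with a small value class. AttributeTable and Attribute are made-up names, and the label lookup table is assumed to be filled in by hand. A LinkedHashMap is used so the rows print in input order.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class AttributeTable {
    // Hypothetical value class: the display label and the parsed value for one key.
    static final class Attribute {
        final String label;
        final String value;

        Attribute(String label, String value) {
            this.label = label;
            this.value = value;
        }
    }

    // Parses "k1=v1&k2=v2" into key -> Attribute, preserving the input order.
    static Map<String, Attribute> parse(String input, Map<String, String> labels) {
        Map<String, Attribute> table = new LinkedHashMap<>();
        for (String pair : input.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                continue; // skip fragments without a "=" sign
            }
            // Fall back to the raw key when no label is known for it.
            table.put(kv[0], new Attribute(labels.getOrDefault(kv[0], kv[0]), kv[1]));
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> labels = new HashMap<>();
        labels.put("nm", "Name");
        labels.put("hei", "Height");
        labels.put("hair", "Hair Color");
        parse("nm=Alan&hei=72&hair=brown", labels).forEach((key, attr) ->
                System.out.println(key + " " + attr.label + " " + attr.value));
    }
}
```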

Map<String, String> propsMap = new HashMap<String, String>();
Map<String, String> propAlias = new HashMap<String, String>();
propAlias.put("nm", "Name");
propAlias.put("hei", "Height");
propAlias.put("hair", "Hair Color");
String[] props = input.split("&");
if (props != null && props.length > 0) {
for (String prop : props) {
String[] propVal = prop.split("=");
if (propVal != null && propVal.length == 2) {
propsMap.put(propVal[0], propVal[1]);
}
}
}
for (Map.Entry<String, String> tuple : propsMap.entrySet()) {
if (propAlias.containsKey(tuple.getKey())) {
System.out.println(tuple.getKey() + " " + propAlias.get(tuple.getKey()) + " " + tuple.getValue());
}
}
