Find duplicates in first column and take average based on third column - java

My issue here is that I need to compute the average time for each Id.
Sample data
T1,2020-01-16,11:16pm,start
T2,2020-01-16,11:18pm,start
T1,2020-01-16,11:20pm,end
T2,2020-01-16,11:23pm,end
I have written code that keeps the first column and third column in a map, something like
T1, 11:16pm
but I was not able to compute values after putting them in the map. I also tried keeping the lines in a string array and splitting them line by line, but I faced the same issue with that approach as well.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class AverageTimeGenerate {
    public static void main(String[] args) throws IOException {
        File file = new File("/abc.txt");
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            while (true) {
                String line = reader.readLine();
                if (line == null) {
                    break;
                }
                // note: the map is recreated for every line, so nothing accumulates
                Map<String, String> map = new HashMap<>();
                String[] data = line.split(",");
                String id = data[0];
                String date = data[1];
                String transactionTime = data[2];
                String transactionStartOrEnd = data[3];
                map.put(id, transactionTime);
            }
        }
    }
}
Can anyone suggest whether it is possible to find duplicates in a map and compute values from them, or is there any other way I can do this, so that the output looks like:
T1 2:00
T2 5:00

I don't know what your logic is for computing the average time, but you can save the data for one particular transaction in a map. The map structure can be like this: the transaction id is the key, and all of that transaction's times go in an ArrayList.
Map<String,List<String>> map = new HashMap<String,List<String>>();
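A minimal sketch of that idea, assuming the comma-separated format from the question, that each id has exactly one start line followed by one end line, and that the expected output is half the start-to-end gap (as the T1 2:00 example suggests):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.*;

public class PerIdTimes {
    public static void main(String[] args) throws IOException, ParseException {
        Map<String, List<String>> map = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get("abc.txt"))) {
            String[] data = line.split(",");
            // key = transaction id, value = every time seen for that id
            map.computeIfAbsent(data[0], k -> new ArrayList<>()).add(data[2]);
        }
        SimpleDateFormat fmt = new SimpleDateFormat("hh:mma"); // parses 11:16pm
        for (Map.Entry<String, List<String>> e : map.entrySet()) {
            List<String> times = e.getValue();
            long start = fmt.parse(times.get(0)).getTime();
            long end = fmt.parse(times.get(1)).getTime();
            long half = (end - start) / 2;              // milliseconds
            System.out.printf("%s %d:%02d%n", e.getKey(),
                    half / 60000, half % 60000 / 1000); // minutes:seconds
        }
    }
}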

You can do it like this. For simplicity I've declared two helper functions, which must be in scope before the pipeline uses them: convert halves the gap between the two times recorded for an id, and toStringTime formats the result as minutes:seconds.
Function<List<Date>, Long> convert =
        list -> (list.get(1).getTime() - list.get(0).getTime()) / 2;
Function<Long, String> toStringTime =
        l -> l / 60000 + ":" + String.format("%02d", l % 60000 / 1000);

Map<String, String> result = Files.lines(Paths.get("abc.txt"))
        .map(line -> line.split(","))
        .map(arr -> {
            try {
                // hh:mma parses times like 11:16pm
                return new AbstractMap.SimpleEntry<>(arr[0],
                        new SimpleDateFormat("hh:mma").parse(arr[2]));
            } catch (ParseException e) {
                throw new IllegalArgumentException(e); // a null entry would break the collector
            }
        })
        .collect(Collectors.groupingBy(Map.Entry::getKey,
                Collectors.collectingAndThen(
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList()),
                        list -> toStringTime.apply(convert.apply(list)))));
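A sketch of the same grouping using java.time instead of the legacy Date classes, assuming the same abc.txt layout and that each id's start line precedes its end line (note that java.time parsing is case-sensitive by default, hence the builder):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class AverageTimeJavaTime {
    public static void main(String[] args) throws IOException {
        DateTimeFormatter fmt = new DateTimeFormatterBuilder()
                .parseCaseInsensitive()        // accept "pm" as well as "PM"
                .appendPattern("hh:mma")
                .toFormatter(Locale.ENGLISH);

        // group every time by id, in file order (start first, then end)
        Map<String, List<LocalTime>> byId = Files.lines(Paths.get("abc.txt"))
                .map(line -> line.split(","))
                .collect(Collectors.groupingBy(arr -> arr[0],
                        Collectors.mapping(arr -> LocalTime.parse(arr[2], fmt),
                                Collectors.toList())));

        byId.forEach((id, times) -> {
            Duration half = Duration.between(times.get(0), times.get(1)).dividedBy(2);
            System.out.printf("%s %d:%02d%n", id, half.toMinutes(), half.getSeconds() % 60);
        });
    }
}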


Get,Put key and values from nested hashmap

I want to create a nested HashMap which returns the frequency of terms among multiple files. Like,
Map<String, Map<String, Integer>> wordToDocumentMap=new HashMap<>();
I have been able to return the number of times a term appears in a file.
String str = "Wikipedia is a free online encyclopedia, created and edited by "
        + "volunteers around the world."; // String str stands in for a file, e.g. a.java
// The query string
String query = "edited Wikipedia volunteers";
// Split the given string and the query string on whitespace
String[] strArr = str.split("\\s+");
String[] queryArr = query.split("\\s+");
// Map to hold the frequency of each word of the query in the string
Map<String, Integer> map = new HashMap<>(); // for frequency count
for (String q : queryArr) {
    for (String s : strArr) {
        if (q.equals(s)) {
            map.put(q, map.getOrDefault(q, 0) + 1);
        }
    }
}
// Display the map
System.out.println(map);
My code counts the frequency of each query term individually, but I want to map each query term and its frequency to the containing file's name. I have searched around the web for a solution but am finding it tough to find one that applies to me. Any help would be appreciated!
I hope I'm understanding you correctly. What you want is to read in a list of files and map each file name to the frequency map you create in the code above. So let's start with your code and turn it into a function:
public Map<String, Integer> createFreqMap(String str, String query) {
    // Split the given string and the query string on whitespace
    String[] strArr = str.split("\\s+");
    String[] queryArr = query.split("\\s+");
    // Map to hold the frequency of each word of the query in the string
    Map<String, Integer> map = new HashMap<>();
    for (String q : queryArr) {
        for (String s : strArr) {
            if (q.equals(s)) {
                map.put(q, map.getOrDefault(q, 0) + 1);
            }
        }
    }
    // Display the map
    System.out.println(map);
    return map;
}
OK, so now you have a nifty function that makes a frequency map from a string and a query.
Now you're going to want to set up a system for reading in a file to a string.
There are a bunch of ways to do this. You can look here for some ways that work for different java versions: https://stackoverflow.com/a/326440/9789673
Let's go with this (assuming Java 11 or later):
String content = Files.readString(path, StandardCharsets.US_ASCII);
Where path is the path to the file you want.
Now we can put it all together:
String[] paths = {"this.txt", "that.txt"};
Map<String, Map<String, Integer>> output = new HashMap<>();
String query = "edited Wikipedia volunteers";
for (int i = 0; i < paths.length; i++) {
    String content = Files.readString(Path.of(paths[i]), StandardCharsets.US_ASCII);
    output.put(paths[i], createFreqMap(content, query));
}
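Printing output afterwards then shows one frequency map per file name (the file names and counts here are of course illustrative):
// one entry per file: filename -> {queryTerm -> count}
output.forEach((file, freq) -> System.out.println(file + " -> " + freq));
// e.g. this.txt -> {edited=1, Wikipedia=2, volunteers=1}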

How to Loop next element in hashmap

I have a set of strings like this
A_2007-04, A_2007-09, A_Agent, A_Daily, A_Execute, A_Exec, B_Action, B_HealthCheck
I want output as:
Key = A, Value = [2007-04,2007-09,Agent,Execute,Exec]
Key = B, Value = [Action,HealthCheck]
I'm using a HashMap to do this, with:
pckg: {A, B}
count: the total number of strings
reports: the set of strings
The logic I used is a nested loop:
for (String l : reports[i]) {
    for (String r : pckg) {
        String[] g = l.split("_");
        if (g[0].equalsIgnoreCase(r)) {
            report.add(g[1]);
            dirFiles.put(g[0], report);
        } else {
            break;
        }
    }
}
I'm getting output as
Key = A, Value = [2007-04,2007-09,Agent,Execute,Exec]
How do I get the second key? Can someone suggest logic for this?
Assuming that you use Java 8, it can be done with computeIfAbsent, which initializes the List of values whenever a new key is seen:
List<String> tokens = Arrays.asList(
        "A_2007-04", "A_2007-09", "A_Agent", "A_Daily", "A_Execute",
        "A_Exec", "P_Action", "P_HealthCheck");
Map<String, List<String>> map = new HashMap<>();
for (String token : tokens) {
    String[] g = token.split("_");
    map.computeIfAbsent(g[0], key -> new ArrayList<>()).add(g[1]);
}
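Printing the map afterwards (a HashMap makes no ordering promise, so the two keys may come out in either order) gives something like:
System.out.println(map);
// {P=[Action, HealthCheck], A=[2007-04, 2007-09, Agent, Daily, Execute, Exec]}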
In terms of raw code this should do what I think you are trying to achieve:
// Create a collection of String any way you like, but for testing
// I've simply split a flat string into an array.
String flatString = "A_2007-04,A_2007-09,A_Agent,A_Daily,A_Execute,A_Exec,"
+ "P_Action,P_HealthCheck";
String[] reports = flatString.split(",");
Map<String, List<String>> mapFromReportKeyToValues = new HashMap<>();
for (String report : reports) {
    int underscoreIndex = report.indexOf("_");
    String key = report.substring(0, underscoreIndex);
    String newValue = report.substring(underscoreIndex + 1);
    List<String> existingValues = mapFromReportKeyToValues.get(key);
    if (existingValues == null) {
        // This key hasn't been seen before, so create a new list
        // to contain values which belong under this key.
        existingValues = new ArrayList<>();
        mapFromReportKeyToValues.put(key, existingValues);
    }
    existingValues.add(newValue);
}
System.out.println("Generated map:\n" + mapFromReportKeyToValues);
Though I recommend tidying it up and organising it into a method or methods as fits your project code.
Doing this with a Map<String, ArrayList<String>> is another good approach, I think:
String[] reports = {"A_2007-04", "A_2007-09", "A_Agent", "A_Daily",
        "A_Execute", "A_Exec", "P_Action", "P_HealthCheck"};
Map<String, ArrayList<String>> map = new HashMap<>();
for (String rep : reports) {
    String[] s = rep.split("_");
    String prefix = s[0], suffix = s[1];
    ArrayList<String> list = new ArrayList<>();
    if (map.containsKey(prefix)) {
        list = map.get(prefix);
    }
    list.add(suffix);
    map.put(prefix, list);
}
// Print
for (Map.Entry<String, ArrayList<String>> entry : map.entrySet()) {
    String key = entry.getKey();
    ArrayList<String> valueList = entry.getValue();
    System.out.println(key + " " + valueList);
}
for (String l : reports[i]) {
    String[] g = l.split("_");
    for (String r : pckg) {
        if (g[0].equalsIgnoreCase(r)) {
            report = dirFiles.get(g[0]);
            if (report == null) {
                report = new ArrayList<String>(); // create a new report list
            }
            report.add(g[1]);
            dirFiles.put(g[0], report);
        }
    }
}
I removed the else part of the if condition: the break there exits the inner loop, so you never evaluate the keys beyond the first one. I added a check for existing values, as suggested by Orin2005. I have also moved the statement String[] g = l.split("_"); outside the inner loop so that it doesn't get executed multiple times.

Reading and matching contents of two big files

I have two files, each in the same format, with approximately 100,000 lines. For each line in file one I extract the second component (column), and if I find a match in the second column of the second file, I extract their third components, combine them, and store or output the result.
Though my implementation works, the program runs extremely slowly; it takes more than an hour to iterate over the files, compare, and output all the results.
I am reading and storing the data of both files in ArrayLists, then iterating over those lists and doing the comparison. Below is my code; is there any performance-related glitch, or is this just normal for such an operation?
Note: I was using String.split(), but I understood from another post that StringTokenizer is faster.
public ArrayList<String> match(String file1, String file2) throws IOException {
    ArrayList<String> finalOut = new ArrayList<>();
    try {
        ArrayList<String> data = readGenreDataIntoMemory(file1);
        ArrayList<String> data1 = readGenreDataIntoMemory(file2);
        for (String line : data) {
            HashSet<String> genres = new HashSet<>();
            boolean sameMovie = false;
            StringTokenizer st = new StringTokenizer(line, "|");
            String ratingInfo = st.nextToken();
            String movie1 = st.nextToken();
            String genreInfo = st.nextToken();
            if (!genreInfo.equals("null")) {
                for (String s : genreInfo.split(",")) {
                    genres.add(s);
                }
            }
            for (String line1 : data1) {
                StringTokenizer st1 = new StringTokenizer(line1, "|");
                st1.nextToken();
                String movie2 = st1.nextToken();
                String genreInfo2 = st1.nextToken();
                // If the movie names are the same they should have the same genres,
                // so merge the second file's genres in
                if (!genreInfo2.equals("null") && movie1.equals(movie2)) {
                    for (String s : genreInfo2.split(",")) {
                        genres.add(s);
                    }
                    sameMovie = true;
                    break;
                }
            }
            if (sameMovie) {
                finalOut.add(ratingInfo + "" + movie1 + "" + genres.toString() + "\n");
            } else {
                finalOut.add(line);
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    return finalOut;
}
I would use the Streams API:
String file1 = "files1.txt";
String file2 = "files2.txt";
// get all the lines by movie name for each file.
Map<String, List<String[]>> map = Stream.of(Files.lines(Paths.get(file1)),
Files.lines(Paths.get(file2)))
.flatMap(p -> p)
.parallel()
.map(s -> s.split("[|]", 3))
.collect(Collectors.groupingByConcurrent(sa -> sa[1], Collectors.toList()));
// merge all the genres for each movie.
map.forEach((movie, lines) -> {
Set<String> genres = lines.stream()
.flatMap(l -> Stream.of(l[2].split(",")))
.collect(Collectors.toSet());
System.out.println("movie: " + movie + " genres: " + genres);
});
This has the advantage of being O(n) instead of O(n^2) and it's multi-threaded.
Do a hash join.
As of now you are doing a nested-loop join, which is O(n^2); a hash join is amortized O(n).
Put the contents of each file in a hash map, with the field you want (the second field) as the key.
Map<String, String> map1 = new HashMap<>();
Map<String, String> map2 = new HashMap<>();
// build map1 from file1 and map2 from file2, keyed on the second field
Then do the hash join:
for (String key1 : map1.keySet()) {
    if (map2.containsKey(key1)) {
        // do your thing, you found the match
    }
}
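A minimal self-contained sketch of the whole join, assuming the same pipe-delimited rating|movie|genres layout as the question (buildMap is a hypothetical helper, and a duplicate movie name within one file would overwrite the earlier entry here):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class HashJoin {
    // key = movie name (second field), value = genres (third field)
    static Map<String, String> buildMap(String file) throws IOException {
        Map<String, String> map = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get(file))) {
            String[] f = line.split("\\|");
            map.put(f[1], f[2]);
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> map1 = buildMap("file1.txt");
        Map<String, String> map2 = buildMap("file2.txt");
        // one pass over map1; each map2 lookup is O(1) amortized
        for (Map.Entry<String, String> e : map1.entrySet()) {
            String genres2 = map2.get(e.getKey());
            if (genres2 != null) {
                System.out.println(e.getKey() + "|" + e.getValue() + "," + genres2);
            }
        }
    }
}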

Aggregate data in CSV file using Java

I have a big CSV file, thousands of rows, and I want to aggregate some columns using Java code.
The file is in the form:
1,2012,T1
2,2015,T2
3,2013,T1
4,2012,T1
The results should be:
T, Year, Count
T1,2012, 2
T1,2013, 1
T2,2015, 1
Put your data into a map-like structure, adding +1 to the stored value each time a key (in your case "" + T + year) is found.
You can use a map keyed on the concatenation, like
Map<String, Integer> rowMap = new HashMap<>();
rowMap.put("T1,2012", 1);
rowMap.put("T2,2015", 1);
or you can define your own class with T and Year fields, overriding the hashCode and equals methods. Then you can use
Map<YourClass, Integer> map = new HashMap<>();
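A sketch of that second suggestion (the class name and fields are illustrative); a composite key only works in a HashMap because equals and hashCode agree:
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class TYear {
    final String t;   // e.g. "T1"
    final int year;   // e.g. 2012

    TYear(String t, int year) {
        this.t = t;
        this.year = year;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TYear)) return false;
        TYear other = (TYear) o;
        return year == other.year && t.equals(other.t);
    }

    @Override
    public int hashCode() {
        return Objects.hash(t, year);
    }
}
Counting then becomes one merge call per row:
Map<TYear, Integer> counts = new HashMap<>();
counts.merge(new TYear("T1", 2012), 1, Integer::sum); // +1 for this key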
The following produces counts like T1,2012, 2:
String csv =
"1,2012,T1\n"
+ "2,2015,T2\n"
+ "3,2013,T1\n"
+ "4,2012,T1\n";
Map<String, Integer> map = new TreeMap<>();
BufferedReader reader = new BufferedReader(new StringReader(csv));
String line;
while ((line = reader.readLine()) != null) {
    String[] fields = line.split(",");
    String key = fields[2] + "," + fields[1];
    Integer value = map.get(key);
    if (value == null)
        value = 0;
    map.put(key, value + 1);
}
System.out.println(map);
// -> {T1,2012=2, T1,2013=1, T2,2015=1}
Use uniVocity-parsers for the best performance. It should take 1 second to process 1 million rows.
CsvParserSettings settings = new CsvParserSettings();
settings.selectIndexes(1, 2); // select the columns we are going to read
// stores the results here
final Map<List<String>, Integer> results = new LinkedHashMap<List<String>, Integer>();

// Use a custom implementation of RowProcessor
settings.setRowProcessor(new AbstractRowProcessor() {
    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        // converts the input array to a List - lists implement hashCode and equals
        // based on their values, so they can be used as keys on your map
        List<String> key = Arrays.asList(row);
        Integer count = results.get(key);
        if (count == null) {
            count = 0;
        }
        results.put(key, count + 1);
    }
});

// creates a parser with the above configuration and RowProcessor
CsvParser parser = new CsvParser(settings);

String input = "1,2012,T1"
        + "\n2,2015,T2"
        + "\n3,2013,T1"
        + "\n4,2012,T1";

// the parse() method will parse and submit all rows to your RowProcessor -
// use a FileReader to read a file instead of the String used as an example here
parser.parse(new StringReader(input));

// Here are the results:
for (Entry<List<String>, Integer> entry : results.entrySet()) {
    System.out.println(entry.getKey() + " -> " + entry.getValue());
}
Output:
[2012, T1] -> 2
[2015, T2] -> 1
[2013, T1] -> 1
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

how to avoid duplicates and count the list once stored in LinkedHashMap

I have a scenario where I need to count the number of names stored in a LinkedHashMap; names can be duplicated, but I should not count the duplicate names.
Below is sample code:
LinkedHashMap<Long, MyApplicationDTO> myApps =
        (LinkedHashMap<Long, MyApplicationDTO>) request.getAttribute("data");
for (Map.Entry app : myApps.entrySet()) {
    Long ID = (Long) app.getKey();
    MyApplicationDTO singleMyApp = (MyApplicationDTO) app.getValue();
    LinkedHashMap<Long, MyDTO> myList = singleMyApp.getMyList();
    String name = "";
    for (Map.Entry details : myList.entrySet()) {
        Long id1 = (Long) details.getKey();
        MyDTO myDetails = (MyDTO) details.getValue();
        name = myDetails.getName(); // For the first iteration this stores A
        // How to loop so that I only get the count of names as 3, avoiding
        // duplicate names from the list below?
        // A B A B A B C
    }
}
On the screen I have something like below:
Name :
A
B
A
B
A
B
C
I have to print the count of the names as 3 (the number of distinct names).
As you iterate over the entrySet, add all the names to a Set<String>, then output set.size(). The Set will not add duplicates when you call set.add(name), so the size of the set is the count of unique names.
LinkedHashMap<Long, MyApplicationDTO> myApps =
        (LinkedHashMap<Long, MyApplicationDTO>) request.getAttribute("data");
Set<String> uniqueNames = new HashSet<String>(); // outside the loops so it survives them
for (Map.Entry app : myApps.entrySet()) {
    Long ID = (Long) app.getKey();
    MyApplicationDTO singleMyApp = (MyApplicationDTO) app.getValue();
    LinkedHashMap<Long, MyDTO> myList = singleMyApp.getMyList();
    for (Map.Entry details : myList.entrySet()) {
        Long id1 = (Long) details.getKey();
        MyDTO myDetails = (MyDTO) details.getValue();
        uniqueNames.add(myDetails.getName()); // duplicates are silently ignored
    }
}
To get the count of unique names:
uniqueNames.size();
I'm not sure I'm understanding your question entirely, but if you're just looking to count the unique values in the LinkedHashMap you can do something like this:
public static void main(String[] args) {
    LinkedHashMap<Long, String> myApps = new LinkedHashMap<Long, String>();
    myApps.put(4L, "A");
    myApps.put(14L, "B");
    myApps.put(44L, "A");
    myApps.put(54L, "B");
    myApps.put(46L, "A");
    myApps.put(543L, "B");
    myApps.put(144L, "C");
    ArrayList<String> names = new ArrayList<String>();
    for (Map.Entry app : myApps.entrySet()) {
        if (!(names.contains(app.getValue()))) {
            names.add(app.getValue().toString());
        }
    }
    System.out.println(names.size());
    for (String s : names) {
        System.out.print(s + " ");
    }
}
A one-liner with the Java 8 Stream API:
LinkedHashMap<Long, MyDTO> myList = singleMyApp.getMyList();
long distinctNamesCount = myList.values().stream().map(MyDTO::getName).distinct().count();
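The same idea checked with plain strings in place of MyDTO (sample values matching the question's screen output):
List<String> names = Arrays.asList("A", "B", "A", "B", "A", "B", "C");
long distinct = names.stream().distinct().count(); // 3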
