I want to create a nested HashMap which returns the frequency of terms among multiple files. Like,
Map<String, Map<String, Integer>> wordToDocumentMap=new HashMap<>();
I have been able to return the number of times a term appears in a file.
Map<String, Integer> map = new HashMap<>();//for frequecy count
String str = "Wikipedia is a free online encyclopedia, created and edited by
volunteers around the world."; //String str suppose a file a.java
// The query string
String query = "edited Wikipedia volunteers";
// Split the given string and the query string on space
String[] strArr = str.split("\\s+");
String[] queryArr = query.split("\\s+");
// Map to hold the frequency of each word of query in the string
Map<String, Integer> map = new HashMap<>();
for (String q : queryArr) {
for (String s : strArr) {
if (q.equals(s)) {
map.put(q, map.getOrDefault(q, 0) + 1);
}
}
}
// Display the map
System.out.println(map);
In my code its count the frequency of the given query Individually. But I want to Map the query term and its frequency with its filenames. I have searched around the web for a solution but am finding it tough to find a solution that applies to me. Any help would be appreciated!
I hope I'm understanding you correctly.
What you want is to be able to read in a list of files and map the file name to the map you create in the code above. So let's start with your code and let's turn it into a function:
public Map<String, Integer> createFreqMap(String str, String query) {
Map<String, Integer> map = new HashMap<>();//for frequecy count
// The query string
String query = "edited Wikipedia volunteers";
// Split the given string and the query string on space
String[] strArr = str.split("\\s+");
String[] queryArr = query.split("\\s+");
// Map to hold the frequency of each word of query in the string
Map<String, Integer> map = new HashMap<>();
for (String q : queryArr) {
for (String s : strArr) {
if (q.equals(s)) {
map.put(q, map.getOrDefault(q, 0) + 1);
}
}
}
// Display the map
System.out.println(map);
return map;
}
OK so now you have a nifty function that makes a map from a string and a query
Now you're going to want to set up a system for reading in a file to a string.
There are a bunch of ways to do this. You can look here for some ways that work for different java versions: https://stackoverflow.com/a/326440/9789673
lets go with this (assuming >java 11):
String content = Files.readString(path, StandardCharsets.US_ASCII);
Where path is the path to the file you want.
Now we can put it all together:
String[] paths = ["this.txt", "that.txt"]
Map<String, Map<String, Integer>> output = new HashMap<>();
String query = "edited Wikipedia volunteers"; //String query = "hello";
for (int i = 0; i < paths.length; i++) {
String content = Files.readString(paths[i], StandardCharsets.US_ASCII);
output.put(paths[i], createFreqMap(content, query);
}
Related
My issue here is I need to compute average time for each Id and compute average time of each id.
Sample data
T1,2020-01-16,11:16pm,start
T2,2020-01-16,11:18pm,start
T1,2020-01-16,11:20pm,end
T2,2020-01-16,11:23pm,end
I have written a code in such a way that I kept first column and third column in a map.. something like
T1, 11:16pm
but I could not able to compute values after keeping those values in a map. Also tried to keep them in string array and split into line by line. By same issue facing for that approach also.
**
public class AverageTimeGenerate {
public static void main(String[] args) throws IOException {
File file = new File("/abc.txt");
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
while (true) {
String line = reader.readLine();
if (line == null) {
break;
}
ArrayList<String> list = new ArrayList<>();
String[] tokens = line.split(",");
for (String s: tokens) {
list.add(s);
}
Map<String, String> map = new HashMap<>();
String[] data = line.split(",");
String ids= data[0];
String dates = data[1];
String transactionTime = data[2];
String transactionStartAndEndTime = data[3];
String[] transactionIds = ids.split("/n");
String[] timeOfEachTransaction = transactionTime.split("/n");
for(String id : transactionIds) {
for(String time : timeOfEachTransaction) {
map.put(id, time);
}
}
}
}
}
}
Can anyone suggest me is it possible to find duplicates in a map and compute values in map, Or is there any other way I can do this so that the output should be like
`T1 2:00
T2 5:00'
I don't know what is your logic to complete the average time but you can save data in map for one particular transaction. The map structure can be like this. Transaction id will be the key and all the time will be in array list.
Map<String,List<String>> map = new HashMap<String,List<String>>();
You can do like this:
Map<String, String> result = Files.lines(Paths.get("abc.txt"))
.map(line -> line.split(","))
.map(arr -> {
try {
return new AbstractMap.SimpleEntry<>(arr[0],
new SimpleDateFormat("HH:mm").parse(arr[2]));
} catch (ParseException e) {
return null;
}
}).collect(Collectors.groupingBy(Map.Entry::getKey,
Collectors.collectingAndThen(Collectors
.mapping(Map.Entry::getValue, Collectors.toList()),
list -> toStringTime.apply(convert.apply(list)))));
for simplify I've declared two functions.
Function<List<Date>, Long> convert = list -> (list.get(1).getTime() - list.get(0).getTime()) / 2;
Function<Long, String> toStringTime = l -> l / 60000 + ":" + l % 60000 / 1000;
The string can be
"accountno=18&username=abc&password=1236" or "username=abc&accountno=18&password=1236" or the accountno can be present anywhere in the string.
I need to get the accountno details from this string using a key value pair. I used spilt on "&" but I'm unable to get the result.
import java.util.regex.*;
public class RegexStrings {
public static void main(String[] args) {
String input = "accountno=18&username=abc&password=1236";
String exten = null;
Matcher m = Pattern.compile("^accountno: (.&?)$", Pattern.MULTILINE).matcher(input);
if (m.find()) {
exten = m.group(1);
}
System.out.println("AccountNo: "+exten);
}
}
How can I get the accountno value from this above string as key value pair in java
You may handle this by first splitting on & to isolate each key/value pair, then iterate that collection and populate a map:
String input = "accountno=18&username=abc&password=1236";
String[] parts = input.split("&");
Map<String, String> map = new HashMap<>();
for (String part : parts) {
map.put(part.split("=")[0], part.split("=")[1]);
}
System.out.println("account number is: " + map.get("accountno"));
This prints:
account number is: 18
Using some simple tools, like string.split and Map, you can easly do that:
Map<String, String> parse(String frase){
Map<String, String> map = new TreeMap<>();
String words[] = frase.aplit("\\&");
for(String word : words){
String keyValuePair = word.split("\\=");
String key = keyValuePair[0];
String value = keyValuePair[1];
map.put(key, value);
}
return map;
}
To get a specific value, like "accountno", just retrive that key map.get("accountno")
The same answer mentioned by #vinicius can be achieved using Java 8 by :
Map<String, String> map = Arrays.stream(input.split("&"))
.map(str -> str.split("="))
.collect(Collectors.toMap(s -> s[0],s -> s[1]));
//To retrieve accountno from map
System.out.println(map.get("accountno"));
As you said
the accountno can be present anywhere in the string
String input = "accountno=18&username=abc&password=1236";
//input = "username=abc&accountno=19&password=1236";
Matcher m = Pattern.compile("&?accountno\\s*=\\s*(\\w+)&?").matcher(input);
if (m.find()) {
System.out.println("Account no " + m.group(1));
}
This would work even when accountno is somewhere in the middle of the string
Output:
Account no 18
You can try out regex here:
https://regex101.com/r/nOHmzc/2
I have a set of strings like this
A_2007-04, A_2007-09, A_Agent, A_Daily, A_Execute, A_Exec, B_Action, B_HealthCheck
I want output as:
Key = A, Value = [2007-04,2007-09,Agent,Execute,Exec]
Key = B, Value = [Action,HealthCheck]
I'm using HashMap to do this
pckg:{A,B}
count:total no of strings
reports:set of strings
Logic I used is nested loop:
for (String l : reports[i]) {
for (String r : pckg) {
String[] g = l.split("_");
if (g[0].equalsIgnoreCase(r)) {
report.add(g[1]);
dirFiles.put(g[0], report);
} else {
break;
}
}
}
I'm getting output as
Key = A, Value = [2007-04,2007-09,Agent,Execute,Exec]
How to get second key?
Can someone suggest logic for this?
Assuming that you use Java 8, it can be done using computeIfAbsent to initialize the List of values when it is a new key as next:
List<String> tokens = Arrays.asList(
"A_2007-04", "A_2007-09", "A_Agent", "A_Daily", "A_Execute",
"A_Exec", "P_Action", "P_HealthCheck"
);
Map<String, List<String>> map = new HashMap<>();
for (String token : tokens) {
String[] g = token.split("_");
map.computeIfAbsent(g[0], key -> new ArrayList<>()).add(g[1]);
}
In terms of raw code this should do what I think you are trying to achieve:
// Create a collection of String any way you like, but for testing
// I've simply split a flat string into an array.
String flatString = "A_2007-04,A_2007-09,A_Agent,A_Daily,A_Execute,A_Exec,"
+ "P_Action,P_HealthCheck";
String[] reports = flatString.split(",");
Map<String, List<String>> mapFromReportKeyToValues = new HashMap<>();
for (String report : reports) {
int underscoreIndex = report.indexOf("_");
String key = report.substring(0, underscoreIndex);
String newValue = report.substring(underscoreIndex + 1);
List<String> existingValues = mapFromReportKeyToValues.get(key);
if (existingValues == null) {
// This key hasn't been seen before, so create a new list
// to contain values which belong under this key.
existingValues = new ArrayList<>();
mapFromReportKeyToValues.put(key, existingValues);
}
existingValues.add(newValue);
}
System.out.println("Generated map:\n" + mapFromReportKeyToValues);
Though I recommend tidying it up and organising it into a method or methods as fits your project code.
Doing this with Map<String, ArrayList<String>> will be another good approach I think:
String reports[] = {"A_2007-04", "A_2007-09", "A_Agent", "A_Daily",
"A_Execute", "A_Exec", "P_Action", "P_HealthCheck"};
Map<String, ArrayList<String>> map = new HashMap<>();
for (String rep : reports) {
String s[] = rep.split("_");
String prefix = s[0], suffix = s[1];
ArrayList<String> list = new ArrayList<>();
if (map.containsKey(prefix)) {
list = map.get(prefix);
}
list.add(suffix);
map.put(prefix, list);
}
// Print
for (Map.Entry<String, ArrayList<String>> entry : map.entrySet()) {
String key = entry.getKey();
ArrayList<String> valueList = entry.getValue();
System.out.println(key + " " + valueList);
}
for (String l : reports[i]) {
String[] g = l.split("_");
for (String r : pckg) {
if (g[0].equalsIgnoreCase(r)) {
report = dirFiles.get(g[0]);
if(report == null){ report = new ArrayList<String>(); } //create new report
report.add(g[1]);
dirFiles.put(g[0], report);
}
}
}
Removed the else part of the if condition. You are using break there which exits the inner loop and you never get to evaluate the keys beyond first key.
Added checking for existing values. As suggested by Orin2005.
Also I have moved the statement String[] g = l.split("_"); outside inner loop so that it doesn't get executed multiple times.
I have two files each having the same format with approximately 100,000 lines. For each line in file one I am extracting the second component or column and if I find a match in the second column of second file, I extract their third components and combine them, store or output it.
Though my implementation works but the programs runs extremely slow, it takes more than an hour to iterate over the files, compare and output all the results.
I am reading and storing the data of both files in ArrayList then iterate over those list and do the comparison. Below is my code, is there any performance related glitch or its just normal for such an operation.
Note : I was using String.split() but I understand form other post that StringTokenizer is faster.
public ArrayList<String> match(String file1, String file2) throws IOException{
ArrayList<String> finalOut = new ArrayList<>();
try {
ArrayList<String> data = readGenreDataIntoMemory(file1);
ArrayList<String> data1 = readGenreDataIntoMemory(file2);
StringTokenizer st = null;
for(String line : data){
HashSet<String> genres = new HashSet<>();
boolean sameMovie = false;
String movie2 = "";
st = new StringTokenizer(line, "|");
//String line[] = fline.split("\\|");
String ratingInfo = st.nextToken();
String movie1 = st.nextToken();
String genreInfo = st.nextToken();
if(!genreInfo.equals("null")){
for(String s : genreInfo.split(",")){
genres.add(s);
}
}
StringTokenizer st1 = null;
for(String line1 : data1){
st1 = new StringTokenizer(line1, "|");
st1.nextToken();
movie2 = st1.nextToken();
String genreInfo2= st1.nextToken();
//If the movie name are similar then they should have the same genre
//Update their genres to be the same
if(!genreInfo2.equals("null") && movie1.equals(movie2)){
for(String s : genreInfo2.split(",")){
genres.add(s);
}
sameMovie = true;
break;
}
}
if(sameMovie){
finalOut.add(ratingInfo+""+movieName+""+genres.toString()+"\n");
}else if(sameMovie == false){
finalOut.add(line);
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
return finalOut;
}
I would use the Streams API
String file1 = "files1.txt";
String file2 = "files2.txt";
// get all the lines by movie name for each file.
Map<String, List<String[]>> map = Stream.of(Files.lines(Paths.get(file1)),
Files.lines(Paths.get(file2)))
.flatMap(p -> p)
.parallel()
.map(s -> s.split("[|]", 3))
.collect(Collectors.groupingByConcurrent(sa -> sa[1], Collectors.toList()));
// merge all the genres for each movie.
map.forEach((movie, lines) -> {
Set<String> genres = lines.stream()
.flatMap(l -> Stream.of(l[2].split(",")))
.collect(Collectors.toSet());
System.out.println("movie: " + movie + " genres: " + genres);
});
This has the advantage of being O(n) instead of O(n^2) and it's multi-threaded.
Do a hash join.
As of now you are doing an outer loop join which is O(n^2), the hash join will be amortized O(n)
Put the contents of each file in a hash map, with key the field you want (second field).
Map<String,String> map1 = new HashMap<>();
// build the map from file1
Then do the hash join
for(String key1 : map1.keySet()){
if(map2.containsKey(key1)){
// do your thing you found the match
}
}
I have a string in the format nm=Alan&hei=72&hair=brown
I would like to split this information up, add a conversion to the first value and print the results in the format
nm Name Alan
hei Height 72
hair Hair Color brown
I've looked at various methods using the split function and hashmaps but have had no luck piecing it all together.
Any advice would be very useful to me.
Map<String, String> aliases = new HashMap<String, String>();
aliases.put("nm", "Name");
aliases.put("hei", "Height");
aliases.put("hair", "Hair Color");
String[] params = str.split("&"); // gives you string array: nm=Alan, hei=72, hair=brown
for (String p : params) {
String[] nv = p.split("=");
String name = nv[0];
String value = nv[1];
System.out.println(nv[0] + " " + aliases.get(nv[0]) + " " + nv[1]);
}
I really do not understand what you problem was...
Try something like this:
static final String DELIMETER = "&"
Map<String,String> map = ...
map.put("nm","Name");
map.put("hei","Height");
map.put("hair","Hair color");
StringBuilder builder = new StringBuilder();
String input = "nm=Alan&hei=72&hair=brown"
String[] splitted = input.split(DELIMETER);
for(Stirng str : splitted){
int index = str.indexOf("=");
String key = str.substring(0,index);
builder.append(key);
builder.append(map.get(key));
builder.append(str.substring(index));
builder.append("\n");
}
A HashMap consists of many key, value pairs. So when you use split, devise an appropriate regex (&). Once you have your string array, you can use one of the elements as the key (think about which element will make the best key). However, you may now be wondering- "how do I place the rest of elements as the values?". Perhaps you can create a new class which stores the rest of the elements and use objects of this class as values for the hashmap.
Then printing becomes easy- merely search for the value of the corresponding key. This value will be an object; use the appropriate method on this object to retrieve the elements and you should be able to print everything.
Also, remember to handle exceptions in your code. e.g. check for nulls, etc.
Another thing: your qn mentions the word "sort". I don't fully get what that means in this context...
Map<String, String> propsMap = new HashMap<String, String>();
Map<String, String> propAlias = new HashMap<String, String>();
propAlias.put("nm", "Name");
propAlias.put("hei", "Height");
propAlias.put("hair", "Hair Color");
String[] props = input.split("&");
if (props != null && props.length > 0) {
for (String prop : props) {
String[] propVal = prop.split("=");
if (propVal != null && propVal.length == 2) {
propsMap.put(propVal[0], propVal[1]);
}
}
}
for (Map.Entry tuple : propsMap.getEntrySet()) {
if (propAlias.containsKey(tuple.getKey())) {
System.out.println(tuple.getKey() + " " + propAlias.get(tuple.getKey()) + " " + tuple.getValue());
}
}