The question below is in Java.
Sample data: https://tartarus.org/martin/PorterStemmer/output.txt
I have a String array named tokenizationString that contains words similar to the list above, with many duplicates.
I have to convert that string array into a HashMap and then use the HashMap to count the number of times each word is used (count the duplicated values in the string array, but I have to use HashMap-related methods).
I am thinking of doing it this way:
Map<Integer, String> hashMap = new HashMap<Integer, String>();
for (int i = 0; i < tokenizationString.length; i++) {
    hashMap.put(i, tokenizationString[i]);
}
After that, I will have to sort the words by the number of times they are used.
In the end I want to be able to print out the result like:
the "was used" 502 "times"
i "was used" 50342 "times"
apple "was used" 50 "times"
Firstly, your map should be a Map<String, Integer> (a word and its frequency).
Here is a Java 8 stream solution:
public static void main(String[] args) {
    try (Stream<String> lines = Files.lines(Paths.get("out.txt"))) {
        Map<String, Long> frequency = lines
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet()
                .stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (o, n) -> o,
                        LinkedHashMap::new));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
The code above reads the file line by line and collects the lines into a frequency map. It then streams the map's entry set, sorts it by value in reverse order, and collects the result into a LinkedHashMap. A LinkedHashMap is used because it maintains insertion order. Take a look at the Java 8 Stream API.
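The stream solution above builds the sorted map but never prints it. A minimal sketch of the printing step, using an in-memory word list in place of the file (the words are made-up sample data):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FrequencyPrint {
    // Builds the frequency map, sorts it by count in descending order,
    // and renders each entry in the "word was used N times" format.
    public static List<String> report(List<String> words) {
        Map<String, Long> frequency = words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return frequency.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .map(e -> e.getKey() + " was used " + e.getValue() + " times")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        report(Arrays.asList("i", "the", "i", "apple", "i")).forEach(System.out::println);
    }
}
```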
Instead of
hashMap.put(i, tokenizationString[i]);
first check if the word is already present, and then increment the corresponding entry:
int count = hashMap.containsKey(tokenizationString[i]) ? hashMap.get(tokenizationString[i]) : 0;
hashMap.put(tokenizationString[i], count + 1);
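On Java 8+, the same check-and-increment can be collapsed into a single call with Map.merge. A small sketch (the inline sample array stands in for tokenizationString):

```java
import java.util.HashMap;
import java.util.Map;

public class WordCount {
    public static Map<String, Integer> count(String[] words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            // merge puts 1 for a new key, or adds 1 to the existing count
            freq.merge(w, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        String[] tokenizationString = {"the", "i", "the", "apple", "the"};
        System.out.println(count(tokenizationString));
    }
}
```

`freq.put(w, freq.getOrDefault(w, 0) + 1)` would work the same way.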
You can achieve this with the Google Guava library's Multimap, as below. A working example is at this link - https://gist.github.com/dkalawadia/8d06fba1c2c87dd94ab3e803dff619b0
FileInputStream fstream = null;
BufferedReader br = null;
try {
    fstream = new FileInputStream("C:\\temp\\output.txt");
    br = new BufferedReader(new InputStreamReader(fstream));
    String strLine;
    Multimap<String, String> multimap = ArrayListMultimap.create();
    // Read the file line by line; each line is stored under itself as the key,
    // so the size of each key's collection is that word's frequency
    while ((strLine = br.readLine()) != null) {
        multimap.put(strLine, strLine);
    }
    for (String key : multimap.keySet()) {
        System.out.println(key + " was used " + multimap.get(key).size() + " times");
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    // close() itself throws IOException, so the cleanup needs its own try/catch
    try {
        if (br != null) {
            br.close(); // closing the reader also closes the underlying stream
        } else if (fstream != null) {
            fstream.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
I have a large HashMap<String, List<String>> that I want to save in a file. I don't want to serialize it with Java's default methods because they also store a lot of things I don't need, such as class information (I basically only want the strings).
I would also like to know where each of the keys is stored in the file, so I don't have to scan the entire file to find it (the file/hashmap will be too large to keep it all in memory). My idea was to loop through the entries, calculate how many bytes have been used to write each key/value pair, and store each pair's exact position in a HashMap of the form <String, Long>.
For example, imagine I have a hashmap
{
"car01":["car", "coche", "macchina", "automobil"],
"dog01": ["dog", "perro", "cane", "cao"]
}
Then the file could be something like
car01[car,coche,macchina,automobil]dog01[dog,perro,cane,cao]
And the index hashmap could be something like
{"car01":0, "dog01":35}
I tried iterating like this:
long characterCount = 0;
HashMap<String, List<String>> index = indexOfIndexes.get(indexName);
Path path = Paths.get(outputfile);
try(Writer writer = Files.newBufferedWriter(path)) {
index.forEach((key, value) -> {
try {
writer.write(key + value);
}
catch (IOException ex) { throw new UncheckedIOException(ex); }
});
} catch(UncheckedIOException ex) { throw ex.getCause(); }
But I don't know how to calculate the amount of characters/bytes used efficiently each time.
I think you can use String's getBytes method to calculate the serialized length.
Something like:
long[] characterCount = {0}; // a lambda cannot reassign a plain local variable,
                             // so track the offset in a one-element array
HashMap<String, List<String>> index = indexOfIndexes.get(indexName);
Map<String, Long> count = new HashMap<>();
Path path = Paths.get(outputfile);
try (Writer writer = Files.newBufferedWriter(path)) {
    index.forEach((key, value) -> {
        try {
            count.put(key, characterCount[0]);
            writer.write(key + value);
            characterCount[0] += (key + value).getBytes().length;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    });
} catch (UncheckedIOException ex) {
    throw ex.getCause();
}
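One caveat: the no-argument getBytes() uses the platform default charset, while Files.newBufferedWriter defaults to UTF-8, so the counted offsets can drift from what is actually written to disk on some platforms. Passing the charset explicitly keeps the count and the writer in sync; a small sketch:

```java
import java.nio.charset.StandardCharsets;

public class ByteLength {
    // Length in bytes of the text as encoded on disk; pass the same
    // charset the Writer uses, since the platform default may differ.
    public static long encodedLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("car01[car, coche]")); // ASCII: one byte per char
        System.out.println(encodedLength("cão"));               // 'ã' takes two bytes in UTF-8
    }
}
```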
Based on @Haijin's answer:
Writer writer = null;
long characterCount = 0;
HashMap<String, List<String>> index = indexOfIndexes.get(indexName);
HashMap<String, Long> count = new HashMap<>();
try {
    writer = new BufferedWriter(new FileWriter(outputfile));
    for (String key : index.keySet()) {
        count.put(key, characterCount);
        writer.write(key + index.get(key));
        characterCount = characterCount + (key + index.get(key)).getBytes().length;
    }
    characterPositions.put(indexName, count);
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (writer != null) try { writer.close(); } catch (IOException ignore) {}
}
I want to find a key (taken from a HashMap) in a file using Java 8. If any key is found in the file, it should print true. Below is my code:
public static void main(String[] args) {
String fileName = "C:\\Users\\ABC\\Desktop\\Paper_R2_Final.txt";
Map<String, String> myMap = new HashMap<String,String>();
myMap.put("ecological", "myFirstKey");
myMap.put("Survey", "mySecondKey");
myMap.put("Worth", "myThirdKey");
//read file into stream, try-with-resources
try (Stream<String> stream = Files.lines(Paths.get(fileName),StandardCharsets.ISO_8859_1)) {
myMap.forEach((k,v)->System.out.println("key: " + k + ", value: " + v));
//Problem in the below line
System.out.println(stream.anyMatch(line->line.contains((CharSequence) myMap.keySet())));
} catch (IOException e) {
e.printStackTrace();
}
}
Try this:
public static void main(String[] args) {
    String fileName = "C:\\Users\\ABC\\Desktop\\Paper_R2_Final.txt";
    Map<String, String> myMap = new HashMap<String, String>();
    myMap.put("ecological", "myFirstKey");
    myMap.put("Survey", "mySecondKey");
    myMap.put("Worth", "myThirdKey");
    List<String> myList = new ArrayList<String>(myMap.keySet());
    // true if the line contains any of the keys
    Predicate<String> p = (str) -> myList.stream().anyMatch(key -> str.contains(key));
    // read file into stream, try-with-resources
    try (Stream<String> stream = Files.lines(Paths.get(fileName), StandardCharsets.ISO_8859_1)) {
        boolean foundAKey = stream.anyMatch(p);
        if (foundAKey) {
            // a key is found
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
myMap.keySet() gives you a Set, which is a collection. Casting it to a CharSequence makes no sense and will not give you what you are expecting.
One way of doing what you want would be to tokenize your line (for example split on the spaces), and check if your keySet contains the tokens one by one.
Some pseudo java code :
keySet = myMap.keySet();
for each line in the file {
tokens = line.split(" ");
for each token in tokens {
if keySet.contains(token) {
// Do whatever you want
}
}
}
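The pseudocode above, turned into a compilable sketch (the file path in main is a placeholder):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeyScanner {
    // True if any whitespace-separated token of any line is one of the keys.
    public static boolean containsAnyKey(List<String> lines, Set<String> keySet) {
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                if (keySet.contains(token)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Set<String> keySet = new HashSet<>(Arrays.asList("ecological", "Survey", "Worth"));
        // "input.txt" is a placeholder path
        System.out.println(containsAnyKey(Files.readAllLines(Paths.get("input.txt")), keySet));
    }
}
```

Unlike calling contains on the raw line, tokenizing first will not report a match for substrings of longer words.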
Try this:
Files.lines(Paths.get(fileName), StandardCharsets.ISO_8859_1)
.anyMatch(line -> myMap.keySet().stream().anyMatch(line::contains));
I want to combine these two text files
Driver details text file:
AB11; Angela
AB22; Beatrice
Journeys text file:
AB22,Edinburgh ,6
AB11,Thunderdome,1
AB11,Station,5
And I want my output to be only the names and where the person has been. It should look like this:
Angela
Thunderdome
Station
Beatrice
Edinburgh
Here is my code. I'm not sure what I'm doing wrong, but I'm not getting the right output.
ArrayList<String> names = new ArrayList<String>();
TreeSet<String> destinations = new TreeSet<String>();
public TaxiReader() {
BufferedReader brName = null;
BufferedReader brDest = null;
try {
// Have the buffered readers start to read the text files
brName = new BufferedReader(new FileReader("taxi_details.txt"));
brDest = new BufferedReader(new FileReader("2017_journeys.txt"));
String line = brName.readLine();
String lines = brDest.readLine();
while (line != null && lines != null ){
// The input lines are split on the basis of certain characters that the text files use to split up the fields within them
String name [] = line.split(";");
String destination [] = lines.split(",");
// Add names and destinations to the different arraylists
String x = new String(name[1]);
//names.add(x);
String y = new String (destination[1]);
destinations.add(y);
// add arraylists to treemap
TreeMap <String, TreeSet<String>> taxiDetails = new TreeMap <String, TreeSet<String>> ();
taxiDetails.put(x, destinations);
System.out.println(taxiDetails);
// Reads the next line of the text files
line = brName.readLine();
lines = brDest.readLine();
}
// Catch blocks exist here to catch every potential error
} catch (FileNotFoundException ex) {
ex.printStackTrace();
} catch (IOException ex) {
ex.printStackTrace();
// Finally block exists to close the files and handle any potential exceptions that can happen as a result
} finally {
try {
if (brName != null)
brName.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
public static void main (String [] args){
TaxiReader reader = new TaxiReader();
}
You are reading 2 files in parallel, I don't think that's gonna work too well. Try reading one file at a time.
Also you might want to rethink your data structures.
The first file relates a key "AB11" to a value "Angela". A map is better than an ArrayList (note that the first file is separated by ";", not ","):
Map<String, String> names = new HashMap<String, String>();
String key = line.split(";")[0].trim();   // "AB11"
String value = line.split(";")[1].trim(); // "Angela"
names.put(key, value);
names.get("AB11"); // "Angela"
Similarly, the second file relates a key "AB11" to multiple values "Thunderdome", "Station". You could also use a map for this:
Map<String, List<String>> destinations = new HashMap<String, List<String>>();
String key = line.split(",")[0];   // "AB11"
String value = line.split(",")[1]; // "Station"
if (destinations.get(key) == null) {
    List<String> values = new LinkedList<String>();
    values.add(value);
    destinations.put(key, values);
} else {
    // we already have a destination stored for this key;
    // add the new destination to the list
    List<String> values = destinations.get(key);
    values.add(value);
}
To get the output you want:
// for each entry in the names map
for(Map.Entry<String, String> entry : names.entrySet()) {
String key = entry.getKey();
String name = entry.getValue();
// print the name
System.out.println(name);
// use the key to retrieve the list of destinations for this name
List<String> values = destinations.get(key);
for(String destination : values) {
// print each destination with a small indentation
System.out.println(" " + destination);
}
}
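Putting the two maps together, a compact sketch (the file names are taken from the question; the parsing assumes the ";" and "," separators shown in the sample data):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TaxiReport {
    // "AB11; Angela" -> {AB11=Angela}
    public static Map<String, String> readNames(List<String> lines) {
        Map<String, String> names = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(";");
            names.put(parts[0].trim(), parts[1].trim());
        }
        return names;
    }

    // "AB11,Station,5" -> {AB11=[Station, ...]}
    public static Map<String, List<String>> readDestinations(List<String> lines) {
        Map<String, List<String>> destinations = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            destinations.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>())
                        .add(parts[1].trim());
        }
        return destinations;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> names =
                readNames(Files.readAllLines(Paths.get("taxi_details.txt")));
        Map<String, List<String>> destinations =
                readDestinations(Files.readAllLines(Paths.get("2017_journeys.txt")));
        for (Map.Entry<String, String> e : names.entrySet()) {
            System.out.println(e.getValue());
            for (String d : destinations.getOrDefault(e.getKey(), Collections.emptyList())) {
                System.out.println("  " + d);
            }
        }
    }
}
```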
I have a Log File where 2 records belong together with the same ID :
2016-09-29 10:50:48.377 [http-100-exec-1] 4711 ffb0dbcc-2615-40f8 request-log...
2016-09-29 10:50:48.377 [http-100-exec-1] 4711 ffb0dbcc-2615-40f8 response-log...
2016-09-29 10:50:47.749 [http-100-exec-1] 4711 5af0cc2f-5525-4748 request-log...
2016-09-29 10:50:47.867 [http-100-exec-1] 4711 fc2f7ff6-da1e-4309 request-log...
2016-09-29 10:50:47.758 [http-100-exec-1] 4711 5af0cc2f-5525-4748 response-log...
2016-09-29 10:50:47.873 [http-100-exec-1] 4711 fc2f7ff6-da1e-4309 response-log...
Now, I want to open this file with a BufferedReader and parse each line into a sorted table. Each parsed line should be sorted by the ID (2 records have always the same ID) (last column, e.g. ffb0dbcc-2615-40f8), in the table.
How can I do this?
One option here would be to use a sorted map to store each line from the log file.
Update:
It appears that the IDs may not all be distinct. In this case, we can keep a counter of the records read in and form the map key from the actual ID plus the counter. Zero-padding the counter keeps the string sort consistent with the order the records were read in.
For example, the two records with ID ffb0dbcc-2615-40f8 might get keys like ffb0dbcc-2615-40f8-000000 and ffb0dbcc-2615-40f8-000001.
Map<String, String> map = new TreeMap<>();
BufferedReader br = null;
try {
    String line;
    br = new BufferedReader(new FileReader("C:\\log.txt"));
    int counter = 0;
    while ((line = br.readLine()) != null) {
        // the ID is the 5th whitespace-separated column
        String key = line.split("\\s+")[4];
        // zero-pad so that, e.g., "-000002" sorts before "-000010"
        key = key + "-" + String.format("%06d", counter);
        map.put(key, line);
        ++counter;
    }
} catch (IOException e) {
    e.printStackTrace();
} finally {
    try {
        if (br != null) br.close();
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
// now you can iterate over the log statements in order by ID
for (Map.Entry<String, String> entry : map.entrySet()) {
    System.out.println(entry.getKey() + " => " + entry.getValue());
}
Using streams:
public static List<String> getSortedLines(String path) throws IOException {
    return Files.lines(Paths.get(path))
            .sorted((line1, line2) -> get5thWord(line1).compareTo(get5thWord(line2)))
            .collect(Collectors.toList());
}

public static String get5thWord(String line) {
    return line.split(" ")[4];
}
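If the goal is just to keep the two records with the same ID next to each other, grouping by the ID column avoids inventing synthetic keys altogether. A sketch assuming the same five-column layout as the sample log:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LogGrouper {
    // TreeMap keeps the IDs sorted; each value holds every record for that ID.
    public static Map<String, List<String>> groupById(List<String> lines) {
        return lines.stream().collect(Collectors.groupingBy(
                line -> line.split("\\s+")[4],
                TreeMap::new,
                Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2016-09-29 10:50:48.377 [http-100-exec-1] 4711 bbb0dbcc request-log",
                "2016-09-29 10:50:47.749 [http-100-exec-1] 4711 aaa0cc2f request-log",
                "2016-09-29 10:50:48.377 [http-100-exec-1] 4711 bbb0dbcc response-log");
        groupById(lines).forEach((id, records) ->
                System.out.println(id + " has " + records.size() + " records"));
    }
}
```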
I'm building an RMI game, and the client loads a file that has some keys and values which are used by several different objects. It is a save-game file, but I can't use java.util.Properties for this (the specification doesn't allow it). I have to read the entire file, ignore commented lines, and skip the keys that are not relevant in some classes. The properties are unique, but they may appear in any order. My file currently looks like this:
# Bio
playerOrigin=Newlands
playerClass=Warlock
# Armor
playerHelmet=empty
playerUpperArmor=armor900
playerBottomArmor=armor457
playerBoots=boot109
etc
These properties are written and placed according to the player's progress, and the file reader has to read to the end of the file and pick up only the matched keys. I've tried different approaches, but so far nothing has come close to the results I would have had using java.util.Properties. Any idea?
This will read your "properties" file line by line and parse each input line and place the values in a key/value map. Each key in the map is unique (duplicate keys are not allowed).
package samples;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.TreeMap;
public class ReadProperties {

    public static void main(String[] args) {
        try {
            TreeMap<String, String> map = getProperties("./sample.properties");
            System.out.println(map);
        } catch (IOException e) {
            // error using the file
        }
    }

    public static TreeMap<String, String> getProperties(String infile) throws IOException {
        final int lhs = 0;
        final int rhs = 1;
        TreeMap<String, String> map = new TreeMap<String, String>();
        BufferedReader bfr = new BufferedReader(new FileReader(new File(infile)));
        String line;
        while ((line = bfr.readLine()) != null) {
            if (!line.startsWith("#") && !line.isEmpty()) {
                String[] pair = line.trim().split("=");
                map.put(pair[lhs].trim(), pair[rhs].trim());
            }
        }
        bfr.close();
        return map;
    }
}
The output looks like:
{playerBoots=boot109, playerBottomArmor=armor457, playerClass=Warlock, playerHelmet=empty, playerOrigin=Newlands, playerUpperArmor=armor900}
You access each element of the map with map.get("key string");.
EDIT: this code doesn't check for a malformed or missing "=" string. You could add that yourself on the return from split by checking the size of the pair array.
I'm currently unable to come up with a framework that provides just that (I'm sure there are plenty, though), but you should be able to do it yourself.
Basically, you read the file line by line and check whether the first non-whitespace character is a hash (#) or whether the line is whitespace only. You ignore those lines and try to split the others on =. If such a split doesn't give you an array of exactly 2 strings, you have a malformed entry and handle it accordingly. Otherwise, the first array element is your key and the second is your value.
Alternately, you could use a regular expression to get the key/value pairs.
(?m)^(?!#)(\w+)=(\w+)$
will return capture groups for each key and its value, and will ignore comment lines: the negative lookahead (?!#) skips lines that start with a hash without consuming any characters, and the ^...$ anchors keep the pattern from matching inside a comment line.
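A sketch of applying such a pattern with Pattern/Matcher; the (?m) flag makes ^ and $ match at line boundaries, and the (?!#) lookahead skips comment lines:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropRegex {
    // One key=value entry per line; lines starting with '#' are ignored.
    private static final Pattern ENTRY = Pattern.compile("(?m)^(?!#)(\\w+)=(\\w+)$");

    public static Map<String, String> parse(String text) {
        Map<String, String> map = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            map.put(m.group(1), m.group(2));
        }
        return map;
    }

    public static void main(String[] args) {
        String text = "# Bio\nplayerOrigin=Newlands\nplayerClass=Warlock\n";
        System.out.println(parse(text)); // {playerOrigin=Newlands, playerClass=Warlock}
    }
}
```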
After some study I came up with this solution:
public static String[] getUserIdentification(File file) throws IOException {
String key[] = new String[3];
FileReader fr = new FileReader(file);
BufferedReader br = new BufferedReader(fr);
String lines;
try {
while ((lines = br.readLine()) != null) {
String[] value = lines.split("=");
if (lines.startsWith("domain=") && key[0] == null) {
if (value.length <= 1) {
throw new IOException(
"Missing domain information");
} else {
key[0] = value[1];
}
}
if (lines.startsWith("user=") && key[1] == null) {
if (value.length <= 1) {
throw new IOException("Missing user information");
} else {
key[1] = value[1];
}
}
if (lines.startsWith("password=") && key[2] == null) {
if (value.length <= 1) {
throw new IOException("Missing password information");
} else {
key[2] = value[1];
}
} else
continue;
}
br.close();
} catch (IOException e) {
e.printStackTrace();
}
return key;
}
I'm using this piece of code to check the properties. Of course it would be wiser to use the Properties library, but unfortunately I can't.
A shorter way to do it:
Properties properties = new Properties();
String confPath = "src/main/resources/.env";
try {
properties.load(new FileInputStream(confPath));
} catch (IOException e) {
e.printStackTrace();
}
String specificValueByKey = properties.getProperty("KEY");
Set<Object> allKeys = properties.keySet();
Collection<Object> values = properties.values();