Search multiple HashMaps at the same time - java

tldr: How can I search for an entry in multiple (read-only) Java HashMaps at the same time?
The long version:
I have several dictionaries of various sizes stored as HashMap< String, String >. Once they are read in, they are never to be changed (strictly read-only).
I want to check whether and which dictionary had stored an entry with my key.
My code was originally looking for a key like this:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
if (map.containsKey(key))
return new DictionaryEntry(map.get(key), i);
}
return null;
}
Then it got a little more complicated: my search string could contain typos, or be a variant of the stored entry. For example, if the stored key was "banana", I might look up "bannana" or "a banana" but would still like the entry for "banana" returned. Using the Levenshtein distance, I now loop through all dictionaries and each entry in them:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
for (Map.Entry<String, String> entry : map.entrySet()) {
// Calculate Levenshtein distance, store closest match etc.
}
}
// return closest match or null.
}
So far everything works as it should and I'm getting the entry I want. Unfortunately I have to look up around 7000 strings, in five dictionaries of various sizes (~ 30 - 70k entries) and it takes a while. From my processing output I have the strong impression my lookup dominates overall runtime.
My first idea to improve runtime was to search all dictionaries in parallel. Since none of the dictionaries is ever changed and no more than one thread accesses a dictionary at a time, I don't see any safety concerns.
The question is just: how do I do this? I have never used multithreading before. My search only came up with ConcurrentHashMap (which, to my understanding, I don't need) and the Runnable interface, where I'd have to put my processing into the run() method. I think I could rewrite my current class to fit into Runnable, but I was wondering if there is a simpler way to do this (or how to do it simply with Runnable; right now my limited understanding is that I would have to restructure a lot).
Since I was asked to share the Levenshtein-Logic: It's really nothing fancy, but here you go:
private int _maxLSDistance = 10;
public Map.Entry<String, String> getClosestMatch(String key) {
Map.Entry<String, String> closestMatch = null;
// distance of the best match found so far (declared here; the original used an undeclared _lsDistance)
int closestDistance = Integer.MAX_VALUE;
if (key == null) {
return null;
}
for (Map.Entry<String, String> entry : _dictionary.entrySet()) {
// Perfect match
if (entry.getKey().equals(key)) {
return entry;
}
// Similar match
int dist = StringUtils.getLevenshteinDistance(entry.getKey(), key);
// Keep the entry if "dist" is below the threshold and smaller than the distance of the stored entry
if (dist < _maxLSDistance && dist < closestDistance) {
closestMatch = entry;
closestDistance = dist;
}
}
return closestMatch;
}

To use multithreading in your case, you could do something like this:
The "monitor" class, which stores the results and coordinates the threads:
import java.util.ArrayList;
public class Results {
private int nrOfDictionaries = 4; // number of dictionary threads still running
private ArrayList<String> results = new ArrayList<String>();
public synchronized void prepare() {
nrOfDictionaries = 4;
results = new ArrayList<String>();
}
public synchronized void oneDictionaryFinished() {
nrOfDictionaries--;
System.out.println("one dictionary finished");
notifyAll();
}
public synchronized boolean isReady() throws InterruptedException {
while (nrOfDictionaries != 0) {
wait();
}
return true;
}
public synchronized void addResult(String result) {
results.add(result);
}
// synchronized so the reading thread is guaranteed to see the other threads' additions
public synchronized ArrayList<String> getAllResults() {
return results;
}
}
The thread itself, which can be set to search a specific dictionary:
public class ThreadDictionarySearch extends Thread {
// the actual dictionary
private String dictionary;
private Results results;
public ThreadDictionarySearch(Results results, String dictionary) {
this.dictionary = dictionary;
this.results = results;
}
@Override
public void run() {
for (int i = 0; i < 4; i++) {
// search dictionary;
results.addResult("result of " + dictionary);
System.out.println("adding result from " + dictionary);
}
results.oneDictionaryFinished();
}
}
And the main method for demonstration:
public static void main(String[] args) throws Exception {
Results results = new Results();
ThreadDictionarySearch threadA = new ThreadDictionarySearch(results, "dictionary A");
ThreadDictionarySearch threadB = new ThreadDictionarySearch(results, "dictionary B");
ThreadDictionarySearch threadC = new ThreadDictionarySearch(results, "dictionary C");
ThreadDictionarySearch threadD = new ThreadDictionarySearch(results, "dictionary D");
threadA.start();
threadB.start();
threadC.start();
threadD.start();
if (results.isReady()) {
// it stays here until all dictionaries are searched,
// because "Results" tells it to wait() while not finished
for (String string : results.getAllResults()) {
System.out.println("RESULT: " + string);
}
}
}
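The counter plus wait()/notifyAll() logic in Results is essentially what java.util.concurrent.CountDownLatch provides out of the box. A minimal sketch of the same demo built on a latch, assuming the real search goes where the placeholder string is added; `LatchDictionarySearch` and `searchAll` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class LatchDictionarySearch {
    // Starts one thread per dictionary name and returns once all of them are done.
    static List<String> searchAll(List<String> dictionaryNames) throws InterruptedException {
        // A synchronized list replaces the synchronized addResult() of Results.
        List<String> results = Collections.synchronizedList(new ArrayList<String>());
        // The latch replaces the nrOfDictionaries counter and the wait()/notifyAll() pair.
        CountDownLatch remaining = new CountDownLatch(dictionaryNames.size());
        for (String name : dictionaryNames) {
            new Thread(() -> {
                try {
                    results.add("result of " + name); // placeholder for the real search
                } finally {
                    remaining.countDown(); // always signal, even if the search throws
                }
            }).start();
        }
        remaining.await(); // blocks until every thread has counted down
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> names = Arrays.asList("dictionary A", "dictionary B", "dictionary C", "dictionary D");
        for (String result : searchAll(names)) {
            System.out.println("RESULT: " + result);
        }
    }
}
```

Because countDown() happens-before the return from await(), the main thread sees every result once await() returns, without an explicitly synchronized getter.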

I think the easiest would be to use a stream over the entry set:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
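map.entrySet().parallelStream().forEach( (entry) ->
{
// Calculate Levenshtein distance, store closest match etc.
// This lambda runs on several threads, so any shared "closest match" must be updated in a thread-safe way.
}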
);
}
// return closest match or null.
}
Provided you are using Java 8, of course. You could also wrap the outer loop into an IntStream, and you could use Stream.reduce (or Stream.min) directly to get the entry with the smallest distance.
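A sketch of that combination, assuming the dictionaries are available as a List<Map<String, String>>: the outer loop becomes a parallel IntStream and Stream.min (a specialised reduce) picks the closest key, so no thread writes to a shared variable. `ParallelClosestMatch`, `Match` and `closestMatch` are hypothetical names, and the small `levenshtein` method stands in for commons-lang's StringUtils.getLevenshteinDistance so the example has no dependencies:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

class ParallelClosestMatch {
    // Best candidate: which dictionary it came from, the key, and its distance to the query.
    static final class Match {
        final int dictionary;
        final String key;
        final int distance;
        Match(int dictionary, String key, int distance) {
            this.dictionary = dictionary;
            this.key = key;
            this.distance = distance;
        }
    }

    // Standard two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just computed becomes "prev" for the next i
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // One stream element per dictionary; min() merges the per-thread results,
    // so no shared "closest match" variable is ever written.
    static Match closestMatch(List<Map<String, String>> dictionaries, String query, int maxDistance) {
        return IntStream.range(0, dictionaries.size())
                .parallel()
                .boxed()
                .flatMap(i -> dictionaries.get(i).keySet().stream()
                        .map(key -> new Match(i, key, levenshtein(key, query))))
                .filter(m -> m.distance < maxDistance)
                .min(Comparator.comparingInt(m -> m.distance))
                .orElse(null);
    }

    public static void main(String[] args) {
        Map<String, String> english = new HashMap<>();
        english.put("banana", "fruit");
        english.put("tomato", "vegetable");
        Map<String, String> german = new HashMap<>();
        german.put("Banane", "Frucht");
        Match m = closestMatch(Arrays.asList(english, german), "bannana", 10);
        System.out.println(m.key + " from dictionary " + m.dictionary + ", distance " + m.distance);
    }
}
```

Note that the parallelism here is across dictionaries: each dictionary's keys are scanned sequentially inside one task, which fits the five-dictionary setup in the question.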

Maybe try thread pools:
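ExecutorService es = Executors.newFixedThreadPool(_numDictionaries);
for (int i = 0; i < _numDictionaries; i++) {
//prepare a Runnable (or a Callable, if the task should return its match) that contains the logic of your search
es.submit(prepared_runnable);
}
es.shutdown(); // accept no new tasks; the submitted searches still run to completion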
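A more concrete version of that skeleton, as a sketch: each task is a Callable that returns the best match from one dictionary, and the caller combines the Future results. The distance function is passed in (for example the question's StringUtils::getLevenshteinDistance); `PooledDictionarySearch`, `Match` and `search` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntBiFunction;

class PooledDictionarySearch {
    static final class Match {
        final int dictionary;
        final String key;
        final int distance;
        Match(int dictionary, String key, int distance) {
            this.dictionary = dictionary;
            this.key = key;
            this.distance = distance;
        }
    }

    // Submits one Callable per dictionary and combines the per-dictionary winners.
    static Match search(List<Map<String, String>> dictionaries, String query,
                        ToIntBiFunction<String, String> distance)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, dictionaries.size()));
        try {
            List<Future<Match>> futures = new ArrayList<>();
            for (int i = 0; i < dictionaries.size(); i++) {
                final int index = i;
                final Map<String, String> dict = dictionaries.get(i);
                // The maps are only read, so several tasks can scan them at once.
                futures.add(pool.submit(() -> {
                    Match best = null;
                    for (String key : dict.keySet()) {
                        int d = distance.applyAsInt(key, query);
                        if (best == null || d < best.distance) {
                            best = new Match(index, key, d);
                        }
                    }
                    return best;
                }));
            }
            Match overall = null;
            for (Future<Match> future : futures) {
                Match m = future.get(); // waits for that dictionary's task to finish
                if (m != null && (overall == null || m.distance < overall.distance)) {
                    overall = m;
                }
            }
            return overall;
        } finally {
            pool.shutdown(); // let the worker threads exit once the tasks are done
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> english = new HashMap<>();
        english.put("banana", "fruit");
        Map<String, String> german = new HashMap<>();
        german.put("Banane", "Frucht");
        // Exact-match "distance" just for the demo: 0 if equal, 1 otherwise.
        Match m = search(Arrays.asList(english, german), "Banane", (a, b) -> a.equals(b) ? 0 : 1);
        System.out.println(m.key + " found in dictionary " + m.dictionary);
    }
}
```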
You may also want a cheap test that rules out strings which cannot possibly match (e.g. a large difference in length), so you can skip the expensive distance calculation and move on to the next candidate as soon as possible.
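The reason a length check works for Levenshtein distance: the distance is never smaller than the difference in length, since each surplus character costs at least one insertion or deletion. A minimal sketch of such a pre-filter, using the question's rule that a match must be strictly below the threshold; `LengthPrefilter`, `mightMatch` and `candidates` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class LengthPrefilter {
    // |len(a) - len(b)| is a lower bound on the Levenshtein distance, so if that
    // bound already reaches the threshold the full computation can be skipped.
    static boolean mightMatch(String candidate, String query, int maxDistance) {
        return Math.abs(candidate.length() - query.length()) < maxDistance;
    }

    // Keeps only the keys that survive the cheap length check.
    static List<String> candidates(Collection<String> keys, String query, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            if (mightMatch(key, query, maxDistance)) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("banana", "a", "a very long dictionary key");
        System.out.println(candidates(keys, "bannana", 3));
    }
}
```

The same bound can use the best distance found so far: once a candidate at distance d is stored, any key whose length differs from the query by d or more can be skipped.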

I have strong doubts that HashMaps are a suitable solution here, especially if you want some fuzzy matching and stop words. You should use a proper full-text search solution like Elasticsearch or Apache Solr, or at least an available engine like Apache Lucene.
That being said, you can use a poor man's version: create a list of your maps and a SortedMap, iterate over the list, take the keys of the current HashMap and store them in the SortedMap together with the index of their HashMap. To retrieve a key, you first search the SortedMap for that key, get the respective HashMap from the list using the index position, and look up the key in only that one HashMap. This should be fast enough without needing multiple threads to dig through the HashMaps. However, you could turn the code below into a Runnable and run multiple lookups in parallel.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
public class Search {
public static void main(String[] arg) {
if (arg.length == 0) {
System.out.println("Must give a search word!");
System.exit(1);
}
String searchString = arg[0].toLowerCase();
/*
* Populating our HashMaps.
*/
HashMap<String, String> english = new HashMap<String, String>();
english.put("banana", "fruit");
english.put("tomato", "vegetable");
HashMap<String, String> german = new HashMap<String, String>();
german.put("Banane", "Frucht");
german.put("Tomate", "Gemüse");
/*
* Now we create our ArrayList of HashMaps for fast retrieval
*/
List<HashMap<String, String>> maps = new ArrayList<HashMap<String, String>>();
maps.add(english);
maps.add(german);
/*
 * This is our index. Its comparator ignores case, so we can store the keys
 * as they are and still find them with the lowercased search string.
 */
TreeMap<String, Integer> index = new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER);
/*
 * Populating the index:
 */
for (int i = 0; i < maps.size(); i++) {
// We iterate through our HashMaps...
HashMap<String, String> currentMap = maps.get(i);
for (String key : currentMap.keySet()) {
/* ...and populate our index with the original keys,
 * referencing the list position from which the key originates.
 */
index.put(key, i);
}
}
// In case our index contains our search string (ignoring case)...
if (index.containsKey(searchString)) {
/*
 * ... we find out in which of the maps stored in "maps"
 * the word in the index originated.
 */
Integer mapIndex = index.get(searchString);
/*
 * Next, we look up said map.
 */
HashMap<String, String> origin = maps.get(mapIndex);
/*
 * The HashMap is case-sensitive, so we ask the index for the key as it
 * was stored (e.g. "Banane" for the search string "banane").
 */
String originalKey = index.ceilingKey(searchString);
/*
 * Last, we retrieve the value from the origin map
 */
String result = origin.get(originalKey);
/*
 * The above steps can be shortened to
 * String result = maps.get(index.get(searchString)).get(index.ceilingKey(searchString));
 */
System.out.println(result);
} else {
System.out.println("\"" + searchString + "\" is not in the index!");
}
}
}
Please note that this is a rather naive implementation only provided for illustration purposes. It doesn't address several problems (you can't have duplicate index entries, for example).
With this solution, you are basically trading startup speed for query speed.

Since your concern is getting a faster response, I would suggest dividing the work between threads.
Say you have 5 dictionaries: one thread could search three of them and another thread the remaining two.
Whichever thread finds the match then halts or terminates the other thread.
You may need some extra logic to divide the work, but that won't noticeably affect your run time.
You may also need a few more changes in your code to get your close match:
for (Map.Entry entry : _dictionary.entrySet()) {
You are iterating over the entry set, but you only compare the keys. I would suggest iterating over keySet() instead, since you are not really interested in the values in that map:
for (String key : _dictionary.keySet()) {
For more details on the performance of maps, please read this link: Map performances
Iteration over the collection-views of a LinkedHashMap requires time proportional to the size of the map, regardless of its capacity. Iteration over a HashMap is likely to be more expensive, requiring time proportional to its capacity.
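For the exact-match case, the idea of letting whichever thread finds the key stop the others maps onto ExecutorService.invokeAny, which returns the result of a task that completed without throwing and cancels the remaining ones. A sketch under that assumption (the dictionaries are already split into non-empty groups, one task per group); `FirstHitSearch` and `findFirst` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FirstHitSearch {
    // Each group of dictionaries is searched by its own task; "groups" must not be empty.
    static String findFirst(List<List<Map<String, String>>> groups, String key)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(groups.size());
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (List<Map<String, String>> group : groups) {
                tasks.add(() -> {
                    for (Map<String, String> dictionary : group) {
                        // Cancellation only interrupts the thread; a long fuzzy scan has to
                        // check the flag itself to stop early.
                        if (Thread.currentThread().isInterrupted()) {
                            throw new InterruptedException();
                        }
                        String value = dictionary.get(key);
                        if (value != null) {
                            return value;
                        }
                    }
                    // Throwing tells invokeAny that this task has no result.
                    throw new NoSuchElementException(key);
                });
            }
            return pool.invokeAny(tasks); // first successful result; the others are cancelled
        } catch (ExecutionException noGroupHadTheKey) {
            return null;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> a = new HashMap<>();
        a.put("banana", "fruit");
        Map<String, String> b = new HashMap<>();
        b.put("tomato", "vegetable");
        Map<String, String> c = new HashMap<>();
        c.put("Banane", "Frucht");
        // Two dictionaries in one thread's group, one in the other.
        List<List<Map<String, String>>> groups = Arrays.asList(Arrays.asList(a, b), Arrays.asList(c));
        System.out.println(findFirst(groups, "Banane"));
    }
}
```

Cancellation in Java is cooperative, which is why the loop checks the interrupt flag. For the fuzzy search you still have to look at every dictionary to find the closest match, so stopping early only pays off once a perfect match is found.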

Related

Is there a Map object which takes an index, a key and an object? Java

I'm trying to emulate a rotor of an Enigma machine in Java.
I need an object which takes an index, a key and an object, because I unsuccessfully tried HashMaps like this:
private HashMap<Integer,Integer> rotorWiring = new HashMap<Integer, Integer>();
private HashMap<Integer,Integer> reverseRotorWiring = new HashMap<Integer, Integer>();
//The "wiring" of the rotor is set from a String,
public void setRotorWiring(String Wiring) {
if (Wiring.length()==26) {
for (int i=0; i<Wiring.length();i++ ) {
char tempChar = Wiring.charAt(i);
int valueOfChar = (int)tempChar-64;
if (valueOfChar<=26){
this.rotorWiring.put(i+1,valueOfChar);
this.reverseRotorWiring.put(valueOfChar,i+1);
}
}
}
}
So far so good; this allows me to translate e.g. an A to an E. However, when I tried to simulate a turn of the rotor like this:
//It should be mentioned that I am designing the program to only accept characters a to z inclusive.
public void turn() {
for (int i=1;i<=rotorWiring.size();i++) {
if (i!=26) {
rotorWiring.replace(i, rotorWiring.get(i+1));
}
else {
rotorWiring.replace(i, rotorWiring.get(1));
}
}
for (int i=1;i<=rotorWiring.size();i++) {
if (i!=26) {
reverseRotorWiring.replace(i, rotorWiring.get(i+1));
}
}
}
However, I noticed that this simulates an offset of the rotor's internal wiring rather than a turn. I'm asking for a "Map"-like solution with an index, key and object, because that would allow me to offset the index of all the keys and objects by 1, thus simulating a turn.
I am, however, open to suggestions for different solutions to this problem.
It should be mentioned that I'm a bit of a novice, and therefore appreciate rather in-depth explanations.
Many thanks.
Welcome to Stack Overflow. There is no implementation of what you have described in the JDK. However, there are several ways to store an Integer-String-Object triple. Note that both the index and the key are unique by definition, and that the index and key are tightly coupled. You might want to put a Map into another Map:
Map<Integer, Map<String, MyObject>> map;
Or use a collection characteristic for indices:
List<Map<String, MyObject>>
Be careful when removing items, since that changes the index of all subsequent elements; replace them with null instead to keep the indices. Alternatively, you can create a decorator for your object that holds the index/key:
Map<Integer, MyDecoratedObject> map;
Where the MyDecoratedObject would look like:
public class MyDecoratedObject {
private final String key; // or int index
private final MyObject delegate;
// Full-args constructor, getters
}
Finally, it's up to you to pick the way that best satisfies your requirements.
A map of maps was the solution! It was solved like this:
private HashMap<Integer,HashMap<Integer,Integer>> rotorWiring = new HashMap<Integer, HashMap<Integer,Integer>>();
private HashMap<Integer,HashMap<Integer,Integer>> reverseRotorWiring = new HashMap<Integer, HashMap<Integer,Integer>>();
public void setRotorWiring(String Wiring) {
if (Wiring.length()==26) {
for (int i=0; i<Wiring.length();i++ ) {
HashMap<Integer, Integer> wire = new HashMap<Integer, Integer>();
HashMap<Integer, Integer> reverseWire = new HashMap<Integer, Integer>();
char tempChar = Wiring.charAt(i);
int valueOfChar = (int)tempChar-64;
if (valueOfChar<=26){
wire.put(i+1,valueOfChar);
reverseWire.put(valueOfChar,i+1);
rotorWiring.put(i, wire);
reverseRotorWiring.put(i, reverseWire);
}
}
}
}
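Another way to model a turn, sketched here as an alternative to the map of maps, is to leave the wiring fixed and track the rotor position as an offset: shift the incoming letter by the position, follow the wire, and shift back. This sketch ignores the ring setting and the stepping of neighbouring rotors, assumes uppercase A to Z input and a wiring string that is a permutation of the alphabet, and uses the commonly published wiring of Enigma rotor I as sample data; `Rotor` and its methods are hypothetical names:

```java
class Rotor {
    private final int[] forward = new int[26];  // forward[i]: contact that input i is wired to
    private final int[] backward = new int[26]; // inverse permutation for the return path
    private int position = 0;                   // how many steps the rotor has turned (0..25)

    Rotor(String wiring) {
        if (wiring.length() != 26) {
            throw new IllegalArgumentException("wiring must have 26 letters");
        }
        for (int i = 0; i < 26; i++) {
            int out = wiring.charAt(i) - 'A';
            forward[i] = out;
            backward[out] = i;
        }
    }

    // Turning never touches the wiring; it only changes the offset.
    void turn() {
        position = (position + 1) % 26;
    }

    // Shift the signal into the rotor's frame, follow the wire, shift back out.
    int encodeForward(int c) {
        return (forward[(c + position) % 26] - position + 26) % 26;
    }

    int encodeBackward(int c) {
        return (backward[(c + position) % 26] - position + 26) % 26;
    }

    char encode(char letter) {
        return (char) ('A' + encodeForward(letter - 'A'));
    }

    public static void main(String[] args) {
        Rotor rotor = new Rotor("EKMFLGDQVZNTOWYHXUSPAIBRCJ");
        System.out.println(rotor.encode('A'));
        rotor.turn();
        System.out.println(rotor.encode('A'));
    }
}
```

Turning is then a single increment, and because the backward path uses the inverse permutation, encodeBackward undoes encodeForward at every position.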

Java Hashmap - Multiple thread put

We recently had a discussion at work about whether we need to use ConcurrentHashMap or can simply use a regular HashMap in our multithreaded environment. There were two arguments for HashMap: it is faster than ConcurrentHashMap, so we should use it if possible; and ConcurrentModificationException apparently only appears when you iterate over the Map while it is being modified, so "if we only PUT and GET from the map, what is the problem with the regular HashMap?"
I thought that concurrent PUT actions, or concurrent PUT and READ, could lead to exceptions, so I put together a test to show this. The test is simple: create 10 threads, each of which writes the same 1000 key-value pairs into the map again and again for 5 seconds, then print the resulting map.
The results were quite confusing actually:
Length:1299
Errors recorded: 0
I thought each key-value pair was unique in a HashMap, but looking through the map, I can find multiple Key-Value pairs that are identical. I expected either some kind of exception or corrupted keys or values, but I did not expect this. How does this occur?
Here's the code I used, for reference:
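import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
public class ConcurrentErrorTest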
{
static final long runtime = 5000;
static final AtomicInteger errCount = new AtomicInteger();
static final int count = 10;
public static void main(String[] args) throws InterruptedException
{
List<Thread> threads = new LinkedList<>();
final Map<String, Integer> map = getMap();
for (int i = 0; i < count; i++)
{
Thread t = getThread(map);
threads.add(t);
t.start();
}
for (int i = 0; i < count; i++)
{
threads.get(i).join(runtime + 1000);
}
for (String s : map.keySet())
{
System.out.println(s + " " + map.get(s));
}
System.out.println("Length:" + map.size());
System.out.println("Errors recorded: " + errCount.get());
}
private static Map<String, Integer> getMap()
{
Map<String, Integer> map = new HashMap<>();
return map;
}
private static Map<String, Integer> getConcMap()
{
Map<String, Integer> map = new ConcurrentHashMap<>();
return map;
}
private static Thread getThread(final Map<String, Integer> map)
{
return new Thread(new Runnable() {
@Override
public void run()
{
long start = System.currentTimeMillis();
long now = start;
while (now - start < runtime)
{
try
{
for (int i = 0; i < 1000; i++)
map.put("i=" + i, i);
now = System.currentTimeMillis();
}
catch (Exception e)
{
System.out.println("P - Error occured: " + e.toString());
errCount.incrementAndGet();
}
}
}
});
}
}
What you're facing seems to be a TOCTTOU (time-of-check to time-of-use) class of problem. (Yes, this kind of bug happens so often that it has its own name. :))
When you insert an entry into a map, at least the following two things need to happen:
Check whether the key already exists.
If the check returned true, update the existing entry, if it didn't, add a new one.
If these two don't happen atomically (as they would in a correctly synchronized map implementation), then several threads can come to the conclusion that the key doesn't exist yet in step 1, but by the time they reach step 2, that isn't true any more. So multiple threads will happily insert an entry with the same key.
Please note that this isn't the only problem that can happen, and depending on the implementation and your luck with visibility, you can get all kinds of different and unexpected failures.
In a multithreaded environment you should always use ConcurrentHashMap if you are going to perform any operation other than get.
Most of the time you won't get an exception, but you will definitely get corrupt data, because without synchronization one thread's writes are not guaranteed to be visible to the other threads.
When multiple threads check for the existence of a key at the same time, several of them can find it missing, and they all insert the data.
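To see the difference in practice, the question's experiment can be repeated with a ConcurrentHashMap. This sketch uses a fixed number of rounds instead of a five-second timer, and uses ConcurrentHashMap.merge, an atomic insert-or-update, to also count how often each key was written; `ConcurrentPutTest` and `run` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentPutTest {
    // Each thread writes "i=0".."i=999" once per round. merge() performs the
    // check-then-act (insert 1 or add 1) as one atomic step per key.
    static Map<String, Integer> run(int threadCount, int rounds) throws InterruptedException {
        Map<String, Integer> writes = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            Thread thread = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    for (int i = 0; i < 1000; i++) {
                        writes.merge("i=" + i, 1, Integer::sum);
                    }
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join(); // join() also makes each thread's writes visible here
        }
        return writes;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> writes = run(10, 5);
        System.out.println("Length:" + writes.size());
        System.out.println("Writes of i=0: " + writes.get("i=0"));
    }
}
```

With 10 threads and 5 rounds, every one of the 1000 keys should appear exactly once with a count of 50. Swapping in a plain HashMap removes both guarantees.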

Converting singly linked list to a map

I have been given an assignment to upgrade an existing program:
Figure out how to recode the qualifying exam problem using a Map for each terminal line, on the assumption that the size of the problem is dominated by the number of input lines, not the 500 terminal lines
The program takes in a text file where each line has a number and a name. The number is the PC number and the name is the user who logged on. For each PC, the program returns the user who logged on to it the most. Here is the existing code:
public class LineUsageData {
SinglyLinkedList<Usage> singly = new SinglyLinkedList<Usage>();
//function to add a user to the linked list or to increment count by 1
public void addObservation(Usage usage){
for(int i = 0; i < singly.size(); ++i){
if(usage.getName().equals(singly.get(i).getName())){
singly.get(i).incrementCount(1);
return;
}
}
singly.add(usage);
}
//returns the user with the most connections to the PC
public String getMaxUsage(){
int tempHigh = 0;
int high = 0;
String userAndCount = "";
for(int i = 0; i < singly.size(); ++i){//goes through list and keeps highest
tempHigh = singly.get(i).getCount();
if(tempHigh > high){
high = tempHigh;
userAndCount = singly.get(i).getName() + " " + singly.get(i).getCount();
}
}
return userAndCount;
}
}
I am having trouble on the theoretical side. We can use a HashMap or a TreeMap. I am trying to think through how I would form a map that holds the list of users for each PC. I can reuse the Usage object, which holds the name and the count of the user, but I am not supposed to alter that object.
When checking whether a Usage is present in the list, you perform a linear search each time (O(N)). If you replace your list with a Map<String, Usage>, you'll be able to search for a name in sublinear time: TreeMap has O(log N) time for search and update, HashMap has amortized O(1) (constant) time.
So, the most effective data structure in this case is HashMap.
import java.util.*;
public class LineUsageData {
Map<String, Usage> map = new HashMap<String, Usage>();
//function to add a user to the map or to increment count by 1
public void addObservation(Usage usage) {
Usage existentUsage = map.get(usage.getName());
if (existentUsage == null) {
map.put(usage.getName(), usage);
} else {
existentUsage.incrementCount(1);
}
}
//returns the user with the most connections to the PC
public String getMaxUsage() {
Usage maxUsage = null;
for (Usage usage : map.values()) {
if (maxUsage == null || usage.getCount() > maxUsage.getCount()) {
maxUsage = usage;
}
}
return maxUsage == null ? null : maxUsage.getName() + " " + maxUsage.getCount();
}
// alternative version that uses Collections.max
public String getMaxUsageAlt() {
Usage maxUsage = map.isEmpty() ? null :
Collections.max(map.values(), new Comparator<Usage>() {
@Override
public int compare(Usage o1, Usage o2) {
// Integer.compare avoids the overflow that subtraction can cause
return Integer.compare(o1.getCount(), o2.getCount());
}
});
return maxUsage == null ? null : maxUsage.getName() + " " + maxUsage.getCount();
}
}
A Map can also be iterated in time proportional to its size, so you can use the same procedure to find the maximum element in it. I gave you two options: a manual approach, or the Collections.max utility method.
In simple terms: you use a LinkedList (singly or doubly linked) when you have a list of items and you usually plan to traverse them,
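The assignment also asks for a Map for each terminal line. One reading of that, sketched here, is an outer map from terminal number to a per-terminal map of user name to login count. It assumes each input line yields a terminal number and a user name, and counts with plain Integer values instead of the Usage class, whose source isn't shown; `TerminalUsage` and its methods are hypothetical names, and computeIfAbsent/merge require Java 8:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class TerminalUsage {
    // terminal number -> (user name -> number of logins on that terminal)
    private final Map<Integer, Map<String, Integer>> usage = new HashMap<>();

    // One input line: create the terminal's map on first use, then count the login.
    void addObservation(int terminal, String user) {
        usage.computeIfAbsent(terminal, t -> new HashMap<>()).merge(user, 1, Integer::sum);
    }

    // User with the most logins on one terminal, or null if the terminal never appeared.
    String maxUser(int terminal) {
        Map<String, Integer> counts = usage.get(terminal);
        if (counts == null) {
            return null;
        }
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Report for every terminal that appeared, sorted by terminal number.
    SortedMap<Integer, String> report() {
        SortedMap<Integer, String> result = new TreeMap<>();
        for (Integer terminal : usage.keySet()) {
            result.put(terminal, maxUser(terminal));
        }
        return result;
    }

    public static void main(String[] args) {
        TerminalUsage data = new TerminalUsage();
        data.addObservation(1, "alice");
        data.addObservation(1, "alice");
        data.addObservation(1, "bob");
        data.addObservation(2, "bob");
        System.out.println(data.report());
    }
}
```

Each observation costs one hash lookup per level, so the total work grows with the number of input lines rather than with the number of terminals, which is what the assignment's assumption asks for.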
and a Map implementation when you have "Dictionary-like" entries, where a key corresponds to a value and you plan to access the value using the key.
To convert your SinglyLinkedList to a HashMap or TreeMap, you need to find out which property of your items will be used as the key (it must have unique values).
Assuming you are using the name property from your Usage class, you can do this
(a simple example):
//You could also use TreeMap, depending on your needs.
Map<String, Usage> usageMap = new HashMap<String, Usage>();
//Iterate through your SinglyLinkedList.
for(Usage usage : singly) {
//Add all items to the Map
usageMap.put(usage.getName(), usage);
}
//Access a value using its name as the key of the Map.
Usage accessedUsage = usageMap.get("AUsageName");
Also note that:
Map<String, Usage> usageMap = new HashMap<>();
is valid since Java 7, thanks to type inference with the diamond operator.
I solved this offline and didn't get a chance to see some of the answers, which both look very helpful. Sorry about that, Nick and Aivean, and thanks for the responses. Here is the code I ended up writing to get this to work:
public class LineUsageData {
Map<Integer, Usage> map = new HashMap<Integer, Usage>();
int hash = 0;
public void addObservation(Usage usage){
hash = usage.getName().hashCode();
System.out.println(hash);
while((map.get(hash)) != null){
if(map.get(hash).getName().equals(usage.name)){
map.get(hash).count++;
return;
}else{
hash++;
}
}
map.put(hash, usage);
}
public String getMaxUsage(){
String str = "";
int tempHigh = 0;
int high = 0;
//for loop
for(Integer key : map.keySet()){
tempHigh = map.get(key).getCount();
if(tempHigh > high){
high = tempHigh;
str = map.get(key).getName() + " " + map.get(key).getCount();
}
}
return str;
}
}

How can I retrieve the values in HashMaps stored in an ArrayList?

I am a beginner in Java. Basically, I have loaded each text document and stored each individual word of the document in a HashMap. After that, I tried storing all the HashMaps in an ArrayList. Now I am stuck on how to retrieve all the words in the HashMaps that are in the ArrayList!
private static long numOfWords = 0;
private String userInputString;
private static long wordCount(String data) {
long words = 0;
int index = 0;
boolean prevWhiteSpace = true;
while (index < data.length()) {
//Intialise character variable that will be checked.
char c = data.charAt(index++);
//Determine whether it is a space.
boolean currWhiteSpace = Character.isWhitespace(c);
//If previous is a space and character checked is not a space,
if (prevWhiteSpace && !currWhiteSpace) {
words++;
}
//Assign current character's determination of whether it is a spacing as previous.
prevWhiteSpace = currWhiteSpace;
}
return words;
} //
public static ArrayList StoreLoadedFiles()throws Exception{
final File f1 = new File ("C:/Users/Admin/Desktop/dataFiles/"); //specify the directory to load files
String data=""; //reset the words stored
ArrayList<HashMap> hmArr = new ArrayList<HashMap>(); //array of hashmap
for (final File fileEntry : f1.listFiles()) {
Scanner input = new Scanner(fileEntry); //load files
while (input.hasNext()) { //while there are still words in the document, continue to load all the words in a file
data += input.next();
input.useDelimiter("\t"); //similar to split function
} //while loop
String textWords = data.replaceAll("\\s+", " "); //remove all found whitespaces
HashMap<String, Integer> hm = new HashMap<String, Integer>(); //Creates a Hashmap that would be renewed when next document is loaded.
String[] words = textWords.split(" "); //store individual words into a String array
for (int j = 0; j < numOfWords; j++) {
int wordAppearCount = 0;
if (hm.containsKey(words[j].toLowerCase().replaceAll("\\W", ""))) { //replace non-word characters
wordAppearCount = hm.get(words[j].toLowerCase().replaceAll("\\W", "")); //remove non-word character and retrieve the index of the word
}
if (!words[j].toLowerCase().replaceAll("\\W", "").equals("")) {
//Words stored in hashmap are in lower case and have special characters removed.
hm.put(words[j].toLowerCase().replaceAll("\\W", ""), ++wordAppearCount);//index of word and string word stored in hashmap
}
}
hmArr.add(hm);//stores every single hashmap inside an ArrayList of hashmap
} //end of for loop
return hmArr; //return hashmap ArrayList
}
public static void LoadAllHashmapWords(ArrayList m){
for(int i=0;i<m.size();i++){
m.get(i); //stuck here!
}
Firstly, your logic won't work correctly. In the StoreLoadedFiles() method you iterate through the words with for (int j = 0; j < numOfWords; j++) {. The numOfWords field is initialized to zero, so this loop won't execute at all. You should use the length of the words array instead.
Having said that: to retrieve the values from each HashMap in a list of HashMaps, first iterate through the list, and for each HashMap take its entry set. Map.Entry is basically the key-value pair that you store in the HashMap. When you invoke map.entrySet(), it returns a java.util.Set<Map.Entry<Key, Value>>. A Set is returned because the keys are unique.
So a complete program will look like this:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map.Entry;
import java.util.Scanner;
public class FileWordCounter {
public static List<HashMap<String, Integer>> storeLoadedFiles() {
final File directory = new File("C:/Users/Admin/Desktop/dataFiles/");
List<HashMap<String, Integer>> listOfWordCountMap = new ArrayList<HashMap<String, Integer>>();
Scanner input = null;
StringBuilder data;
try {
for (final File fileEntry : directory.listFiles()) {
input = new Scanner(fileEntry);
input.useDelimiter("\t");
data = new StringBuilder();
while (input.hasNext()) {
data.append(input.next());
}
input.close();
String wordsInFile = data.toString().replaceAll("\\s+", " ");
HashMap<String, Integer> wordCountMap = new HashMap<String, Integer>();
for(String word : wordsInFile.split(" ")){
String strippedWord = word.toLowerCase().replaceAll("\\W", "");
int wordAppearCount = 0;
if(strippedWord.length() > 0){
if(wordCountMap.containsKey(strippedWord)){
wordAppearCount = wordCountMap.get(strippedWord);
}
wordCountMap.put(strippedWord, ++wordAppearCount);
}
}
listOfWordCountMap.add(wordCountMap);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} finally {
if(input != null) {
input.close();
}
}
return listOfWordCountMap;
}
public static void loadAllHashmapWords(List<HashMap<String, Integer>> listOfWordCountMap) {
for(HashMap<String, Integer> wordCountMap : listOfWordCountMap){
for(Entry<String, Integer> wordCountEntry : wordCountMap.entrySet()){
System.out.println(wordCountEntry.getKey() + " - " + wordCountEntry.getValue());
}
}
}
public static void main(String[] args) {
List<HashMap<String, Integer>> listOfWordCountMap = storeLoadedFiles();
loadAllHashmapWords(listOfWordCountMap);
}
}
Since you are a beginner in Java programming, I would like to point out a few best practices that you could start using from the beginning.
Closing resources: In your while loop that reads from files, you open a Scanner with Scanner input = new Scanner(fileEntry);, but you never close it. This causes resource leaks (open file handles). You should always close resources in a finally block (or, since Java 7, use try-with-resources).
Avoid unnecessary redundant calls: If an operation inside a loop is the same on every iteration, try moving it outside the loop. In your case, for example, setting the scanner delimiter with input.useDelimiter("\t"); is a one-time operation after the scanner is created, so you could move it outside the while loop.
Use StringBuilder instead of String: Repeated string manipulations such as concatenation should be done with a StringBuilder (or StringBuffer when you need synchronization) instead of += or +. String is immutable, meaning its value cannot be changed, so every concatenation creates a new String object. This results in a lot of unused instances in memory, whereas a StringBuilder is mutable and its value can be changed in place.
Naming convention: The usual naming convention in Java is to start with a lowercase letter and capitalize the first letter of each following word. So it's standard practice to name a method storeLoadedFiles as opposed to StoreLoadedFiles. (This could be opinion-based ;))
Give descriptive names: It's good practice to give descriptive names; it helps in later code maintenance. For example, it's better to name a variable wordCountMap as opposed to hm. If someone reads your code in the future, descriptive names will help them understand it better and faster. Again, opinion-based.
Use generics as much as possible: This avoids explicit casts and lets the compiler catch type errors.
Avoid repetition: Similar to point 2, if an operation produces the same result and is needed multiple times, store the result in a variable and use the variable. In your case, you were calling words[j].toLowerCase().replaceAll("\\W", "") multiple times. The result is always the same, but each call creates unnecessary instances. So you could store it in a String and use that String everywhere else.
Use the for-each loop wherever possible: This relieves you from keeping track of indices.
These are just suggestions. I tried to include most of them in my code, but I won't say it's perfect. Since you are a beginner, if you start following these best practices now, they'll become second nature. Happy coding.. :)
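For the first tip: since Java 7, the try/finally pattern for closing a Scanner can be written as try-with-resources, which closes the resource automatically. A minimal sketch of counting the words of one file this way, keeping the question's lowercasing and removal of non-word characters; `WordCountFile` and `countWords` are hypothetical names, and Map.merge requires Java 8:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class WordCountFile {
    // The Scanner is opened in the try-with-resources header, so it is closed
    // when the block ends, whether normally or by an exception.
    static Map<String, Integer> countWords(Path file) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner input = new Scanner(file)) {
            while (input.hasNext()) {
                String word = input.next().toLowerCase().replaceAll("\\W", "");
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum); // insert 1 or add 1
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("words", ".txt");
        Files.write(file, "The cat, the hat.".getBytes(StandardCharsets.UTF_8));
        System.out.println(countWords(file));
        Files.delete(file);
    }
}
```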
for (HashMap<String, Integer> map : m) {
for(Entry<String,Integer> e:map.entrySet()){
//your code here
}
}
Or, if you are using Java 8, you can use lambdas:
m.stream().forEach((map) -> {
map.entrySet().stream().forEach((e) -> {
//your code here
});
});
But before all you have to change method signature to public static void LoadAllHashmapWords(List<HashMap<String,Integer>> m) otherwise you would have to use a cast.
P.S. are you sure your extracting method works? I've tested it a bit and had list of empty hashmaps all the time.
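If the goal is to get all words across all documents rather than per document, the per-document maps can be merged into one total. A sketch using Java 8's Map.merge; `CombineCounts` and `total` are hypothetical names, and the keySet() of the result is then the set of all words:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CombineCounts {
    // Adds up each word's count over all per-document maps.
    static Map<String, Integer> total(List<? extends Map<String, Integer>> perDocument) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> document : perDocument) {
            document.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        HashMap<String, Integer> first = new HashMap<>();
        first.put("banana", 2);
        first.put("apple", 1);
        HashMap<String, Integer> second = new HashMap<>();
        second.put("banana", 3);
        List<HashMap<String, Integer>> documents = new ArrayList<>();
        documents.add(first);
        documents.add(second);
        System.out.println(total(documents));
    }
}
```

The `? extends Map` parameter lets the method accept the List<HashMap<String, Integer>> returned by the code above without a cast.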

Can I create an array of sets?

Here is what I am trying to do.
I am reading in a list of words with each having a level of complexity. Each line has a word followed by a comma and the level of the word. "watch, 2" for example. I wish to put all of the words of a given level into a set to ensure their uniqueness in that level. There are 5 levels of complexity, so ideally I'd like an array with 5 elements, each of which is a set.
I can then add words to each of the sets as I read them in. Later on, I wish to pull out a random word of a specified level.
I'm happy with everything except how to create an array of sets. I've read several other posts here that seem to agree that this can't be done exactly as I would hope, but I can't find a good work around. (No, I'm not willing to have 5 sets in a switch statement. Goes against the grain.)
Thanks.
You can use a Map. Use the level as the key and, as the value, a Set containing the words. This lets you pull out the words for a given level: when a random word is requested from a level, get the value (the Set) using the level as the key and pick a random element from it. This also scales if you increase the number of levels.
// Requires from java.util: ArrayList, HashMap, HashSet, List, Map, Random, Set
public static void main(String[] args) {
Map<Integer, Set<String>> levelSet = new HashMap<>();
//Your code goes here to get the level and word
//
String word="";
int level=0;
addStringToLevel(levelSet,word,level);
}
private static void addStringToLevel(Map<Integer, Set<String>> levelSet,
String word, int level) {
if(levelSet.get(level) == null)
{
// this means this is the first string added for this level
// so create a container to hold the words
levelSet.put(level, new HashSet<>());
}
Set<String> wordContainer = levelSet.get(level);
wordContainer.add(word);
}
private static String getStringFromLevel(Map<Integer, Set<String>> levelSet,
int level) {
Set<String> wordContainer = levelSet.get(level);
if(wordContainer == null || wordContainer.isEmpty())
{
return null;
}
// A Set has no index, so copy it into a List and pick a random position
List<String> words = new ArrayList<>(wordContainer);
return words.get(new Random().nextInt(words.size()));
}
If you are willing to use Guava, try SetMultimap. It will take care of everything for you.
SetMultimap<Integer, String> map = HashMultimap.create();
map.put(5, "value");
The collection will take care of creating the inner Set instances for you unlike the array or List solutions which require either pre-creating the Sets or checking that they exist.
Consider using a List instead of an array.
Doing so might make your life easier.
List<Set<String>> wordSetLevels = new ArrayList<>();
// ...
for (int i = 0; i < 5; i++) {
wordSetLevels.add(new HashSet<String>());
}
wordSetLevels = Collections.unmodifiableList(wordSetLevels);
// ...
wordSetLevels.get(2).add("watch");
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class Main {
private Set<String>[] process(List<String> words) {
@SuppressWarnings("unchecked")
Set<String>[] arrayOfSets = new Set[5];
for(int i=0; i<arrayOfSets.length; i++) {
arrayOfSets[i] = new HashSet<String>();
}
for(String word: words) {
int index = getIndex(word);
String val = getValue(word);
arrayOfSets[index].add(val);
}
return arrayOfSets;
}
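// Assumes each line looks like "watch, 2" and levels are numbered 1 to 5
private int getIndex(String str) {
return Integer.parseInt(str.split(",")[1].trim()) - 1;
}
private String getValue(String str) {
return str.split(",")[0].trim();
}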
}
