Word list search with hashmap in Java

I have a word list with more than 50,000 words. As you can see below, I read the words and add them to an ArrayList, but after that, checking whether a word is in the list is very slow. That's why a HashMap came to mind: I want to read my words and, when I receive a word from the user, check whether it is in the HashMap. Even though I did some research, I could not find exactly how to do it. How can I do this?
public ArrayList<String> wordReader () throws FileNotFoundException {
    File txt = new File(path);
    Scanner scanner = new Scanner(txt);
    ArrayList<String> words = new ArrayList<String>();
    while (scanner.hasNextLine()) {
        String data = scanner.nextLine();
        words.add(data);
    }
    scanner.close();
    return words;
}

If I have understood your problem correctly, you're having performance issues traversing an ArrayList of 50,000 words when you check whether a specific word exists in the list.
This is because searching for an element in an unsorted List has O(n) complexity. You could improve performance by using a sorted data structure such as a BST (Binary Search Tree), which, when kept balanced, brings the search operation down to O(log n).
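If you want to try the BST route without writing the tree yourself, java.util.TreeSet is a ready-made option: it is backed by a TreeMap, which the JDK documents as a red-black tree (a self-balancing BST) with guaranteed O(log n) add and contains. A minimal sketch, assuming the words have already been read into a List; the class and method names are illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class TreeLookup {
    // Copies the words into a TreeSet (a red-black tree, i.e. a balanced BST),
    // so each later lookup costs O(log n) comparisons instead of an O(n) scan.
    static TreeSet<String> buildIndex(List<String> words) {
        return new TreeSet<>(words);
    }

    // Descends the tree comparing strings with compareTo().
    static boolean contains(TreeSet<String> index, String word) {
        return index.contains(word);
    }

    public static void main(String[] args) {
        TreeSet<String> index = buildIndex(Arrays.asList("pear", "apple", "fig"));
        System.out.println(contains(index, "fig"));   // true
        System.out.println(contains(index, "grape")); // false
        System.out.println(index);                    // [apple, fig, pear]
    }
}
```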
Your idea of using a Map is also viable: a HashMap's add and get operations range from O(1) (with a theoretically perfect hash function that causes no collisions among the keys) to O(n) (with a poor hash function that causes many collisions). Moreover, Java 8 introduced an optimization in the HashMap implementation: when many entries collide in the same bucket, that bucket is stored as a balanced tree rather than a list, which bounds the worst case at O(log n).
https://www.logicbig.com/tutorials/core-java-tutorial/java-collections/java-map-cheatsheet.html
However, using a HashMap for what I assume is a dictionary (distinct words only) may be unnecessary, since you would be using each word as both key and value. Instead of a HashMap, you could use a Set, as others have pointed out, or more specifically a HashSet. A HashSet is in fact backed by a HashMap instance under the hood, so it gives us all the performance advantages discussed above (which is why I wrote that preface).
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HashSet.html
Your implementation could look like this:
public Set<String> wordReader(String path) throws FileNotFoundException {
    File txt = new File(path);
    Scanner scanner = new Scanner(txt);
    Set<String> words = new HashSet<>();
    while (scanner.hasNextLine()) {
        String data = scanner.nextLine();
        words.add(data);
    }
    scanner.close();
    return words;
}
public boolean isWordContained(Set<String> set, String word) {
    return set.contains(word);
}

Since you will be checking whether the word input is present in your list of words read from the file, you can use a HashSet<String> instead of using an ArrayList<String>.
Your method would then become
public HashSet<String> wordReader () throws FileNotFoundException {
    File txt = new File(path);
    Scanner scanner = new Scanner(txt);
    HashSet<String> words = new HashSet<String>();
    while (scanner.hasNextLine()) {
        String data = scanner.nextLine();
        words.add(data);
    }
    scanner.close();
    return words;
}
Now, after you read the word input, you can check whether it is present in the HashSet. This is a much faster operation, as a lookup takes expected constant time.
public boolean isWordPresent(String word, HashSet<String> words) {
    return words.contains(word);
}
As a side note, HashSet internally uses a HashMap to perform the operations.

I would use a Set, not a List, since a set automatically ignores duplicates when you add them. Set.add() returns true (and adds the element) if it was not already present, and false otherwise.
public Set<String> wordReader () throws FileNotFoundException {
    File txt = new File(path);
    Scanner scanner = new Scanner(txt);
    Set<String> words = new HashSet<>();
    while (scanner.hasNextLine()) {
        String data = scanner.nextLine();
        if (!words.add(data)) {
            // present - Do something
        }
    }
    scanner.close();
    return words;
}
Because a HashSet is not ordered, it is not a random-access collection. If you need index-based access, you can copy the set into a list as follows:
Set<String> words = wordReader();
List<String> wordList = new ArrayList<>(words);
Now you can retrieve the words by index.
You may also want to make your method more versatile by passing the file name as an argument.
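A minimal sketch of that suggestion, assuming Java 8+ and a UTF-8 text file with one word per line; the class and method names (WordFile, readWords) are hypothetical. Files.readAllLines(Path) reads the whole file and decodes it as UTF-8:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

class WordFile {
    // Reads every line of the file at the given path into a HashSet,
    // so later membership checks are expected O(1) instead of an O(n) list scan.
    static Set<String> readWords(String path) throws IOException {
        return new HashSet<>(Files.readAllLines(Paths.get(path)));
    }
}
```

Read the set once and reuse it for every user query; rereading the file for each lookup would bring back the linear cost.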

Related

How to find the number of unique words in array list

So I am trying to create a for loop to find the unique elements in an ArrayList.
I already have an ArrayList filled with 20 places entered by the user (repeats are allowed), but I am stuck on how to count the number of distinct places in the list, excluding duplicates. (I would like to avoid using hashing.)
Input:
[park, park, sea, beach, town]
Output:
[Number of unique places = 4]
Here's a rough example of the code I'm trying to write:
public static void main(String[] args) {
    ArrayList<City> place = new ArrayList<>();
    Scanner sc = new Scanner(System.in);
    for (...) { // this is just to receive 20 inputs from users using the scanner
        ...
    }
    // This is where I am lost on creating a for loop...
}
You can use a Set for that.
https://docs.oracle.com/javase/7/docs/api/java/util/Set.html
Store the list data in a Set. A Set never contains duplicates, so its size is the number of elements without duplicates.
Use this method to get the set size:
https://docs.oracle.com/javase/7/docs/api/java/util/Set.html#size()
Sample Code.
List<String> citiesWithDuplicates =
Arrays.asList(new String[] {"park", "park", "sea", "beach", "town"});
Set<String> cities = new HashSet<>(citiesWithDuplicates);
System.out.println("Number of unique places = " + cities.size());
If you are able to use Java 8, you can use the distinct() method of Java streams (note that count() returns a long, not an int):
long numOfUniquePlaces = list.stream().distinct().count();
Otherwise, using a set is the easiest solution. Since you don't want to use hashing, use a TreeSet (although HashSet is the better solution in most cases). If that is not an option either, you'll have to check each element manually to see whether it's a duplicate.
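A sketch of the TreeSet route, using plain strings for the places. TreeSet detects duplicates with compareTo() rather than hashCode(), so no hashing is involved; for the City objects in the question, City would have to implement Comparable<City> (or you would pass a Comparator to the TreeSet constructor). The class name is illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class UniqueCount {
    // TreeSet keeps its elements ordered with compareTo(); an element that
    // compares equal (result 0) to one already present is simply not added.
    static int countUnique(List<String> places) {
        Set<String> unique = new TreeSet<>(places);
        return unique.size();
    }

    public static void main(String[] args) {
        List<String> places = Arrays.asList("park", "park", "sea", "beach", "town");
        System.out.println("Number of unique places = " + countUnique(places)); // 4
    }
}
```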
One way that comes to mind (without using Set or hashvalues) is to make a second list.
ArrayList<City> places = new ArrayList<>();
// Fill array
ArrayList<String> uniquePlaces = new ArrayList<>();
for (City city : places) {
    if (!uniquePlaces.contains(city.getPlace())) {
        uniquePlaces.add(city.getPlace());
    }
}
// number of unique places:
int uniqueCount = uniquePlaces.size();
Note that this is not super efficient =D: contains() scans uniquePlaces on every iteration, so the loop is O(n²), which is fine for 20 entries.
If you do not want to use implementations of the Set or Map interfaces (which would solve your problem in one line of code) and want to stick with ArrayList, I suggest using something like the Collections.sort() method. It will sort your elements; then iterate through the sorted list, compare each element with the previous one, and count the distinct ones. This trick makes your iteration problem easier to solve.
Anyway, I strongly recommend using one of the implementations of Set interface.
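The sort-and-compare idea above can be sketched like this (class and method names are illustrative). After sorting, equal strings sit next to each other, so counting the positions where an element differs from its predecessor gives the number of distinct elements in O(n log n) without any hashing:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedUniqueCount {
    static int countUnique(List<String> places) {
        // Sort a copy so the caller's list is left untouched.
        List<String> sorted = new ArrayList<>(places);
        Collections.sort(sorted);
        int unique = 0;
        for (int i = 0; i < sorted.size(); i++) {
            // A new distinct value starts wherever an element differs from the one before it.
            if (i == 0 || !sorted.get(i).equals(sorted.get(i - 1))) {
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> places = Arrays.asList("park", "park", "sea", "beach", "town");
        System.out.println("Number of unique places = " + countUnique(places)); // 4
    }
}
```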
Use the following approach. It adds an element to the distinct list only at its last occurrence, so each element is kept exactly once even if it appears several times.
List<String> citiesWithDuplicates = Arrays.asList(
        "park", "park", "sea", "beach", "town", "park", "beach");
List<String> distinctCities = new ArrayList<String>();
int currentIndex = 0;
for (String city : citiesWithDuplicates) {
    // keep the element only if this position is its last occurrence in the list
    int index = citiesWithDuplicates.lastIndexOf(city);
    if (index == currentIndex) {
        distinctCities.add(city);
    }
    currentIndex++;
}
System.out.println("[ Number of unique places = "
        + distinctCities.size() + "]");
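Well, if you do not want to use HashSet or similar options, a quick and dirty nested for loop like this does the trick (it is just very slow if you have a lot of items; 20 is fine). Each element is compared only with the elements before it, and it counts as unique if none of them is equal. City must override equals() so that places are compared by content rather than by object identity.
int differentCount = 0;
for (int i = 0; i < place.size(); i++) {
    boolean seenBefore = false;
    // compare only with earlier elements, never with the element itself
    for (int j = 0; j < i; j++) {
        if (place.get(i).equals(place.get(j))) {
            seenBefore = true;
            break;
        }
    }
    if (!seenBefore)
        differentCount++;
}
System.out.printf("Number of unique places = %d%n", differentCount);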

How to unscramble a list of words using a HashMap?

Basically, I will be given two large input files. One is a list of words; the other contains the same words, but scrambled. I have to use a HashMap to match the words with the scrambled words and then print each scrambled word with the real word next to it, in alphabetical order.
For example:
rdib bird
tca cat
gdo dog
etc.
I'm having some trouble so far. I have created a method to sort and get the key from the words, but I'm not sure where to go from there. I think I still need to work with the scrambled words and then print everything out. Any help explaining these things would be much appreciated, for this is my first time using a HashMap. My current, very incomplete code is below.
import java.io.*;
import java.util.*;
public class Project5
{
    public static void main (String[] args) throws Exception
    {
        BufferedReader dictionaryList = new BufferedReader( new FileReader( args[0] ) );
        BufferedReader scrambleList = new BufferedReader( new FileReader( args[1] ) );
        HashMap<String, String> dWordMap = new HashMap<String, String>();
        while (dictionaryList.ready())
        {
            String word = dictionaryList.readLine();
            dWordMap.put(createKey(word), word);
        }
        dictionaryList.close();
        while (scrambleList.ready())
        {
            String scrambledWord = scrambleList.readLine();
            // the map stores one String per key, so get() returns a String, not a List
            String dictionaryWord = dWordMap.get(createKey(scrambledWord));
            System.out.println(scrambledWord + " " + dictionaryWord);
        }
        scrambleList.close();
    }
    private static String createKey(String word)
    {
        char[] characterWord = word.toCharArray();
        Arrays.sort(characterWord);
        return new String(characterWord);
    }
}
Make dWordMap just a HashMap<String, String>. For the line you're not sure of, use dWordMap.put(createKey(word), word).
Then loop through scrambleList; the matching word is dWordMap.get(createKey(scrambledWord)).
You should probably also handle the case where the scrambled word is not in the original word list (get() then returns null).
The key concept to understand about a HashMap is that it makes it O(1) (expected, given a reasonable hash function) to test whether the map contains a given key, and O(1) to retrieve the value associated with a given key. This means these operations take constant time: whether the map has 5 elements or 5000, determining whether it contains "ehllo" takes about the same time. To match the two lists (dictionary and scrambled), you need a key that is the same for both. As you have started to do in your solution, sorting the letters of the word is a good choice. So your HashMap will look something like this:
{
"ehllo": "hello",
"dlorw": "world"
}
One pass through the dictionary list builds that map; then a pass through the scrambled list takes each scrambled word, sorts its letters, and looks up the unscrambled word in the map.
You should break this task down into smaller parts. Check each one works before moving on to the next.
First you need a method that turns a word into a key. A string containing the letters of the word alphabetised is a good key. So write something that passes the test:
assertEquals("dgo", createKey("dog"));
assertEquals("act", createKey("cat"));
Next you need to populate your map with words from your list. You need something that passes this test:
Map<String,String> map = new HashMap<>();
addToMap(map,"dog");
assertEquals("dog", map.get("dgo"));
Your addToMap() method will make use of createKey().
It should now be clear that you can use the map you've created, and createKey() to find "dog" from any ordering:
Map<String,String> map = new HashMap<>();
addToMap(map,"dog");
assertEquals("dog", map.get(createKey("odg")));
You can encapsulate this into a method, so that:
Map<String,String> map = new HashMap<>();
addToMap(map,"dog");
assertEquals("dog", getFromScrambledWord(map,"odg"));
All that's left is to put it all together:
Loop through your dictionary file and call addToMap() for each line
Loop through your scrambled-word file, call getFromScrambledWord() for each one, print the result
If you also need to put this list in alphabetical order, then instead of printing it, store the result in a List, sort the list then loop through it to print.
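Putting the steps together, here is one possible sketch of createKey(), addToMap() and getFromScrambledWord(), plus a helper that produces the sorted output. It takes the two word lists as Lists instead of reading the files, skips scrambled words with no match, sorts by the unscrambled word to match the bird/cat/dog example, and assumes no two dictionary words are anagrams of each other (a later one would overwrite the earlier one under the same key). The unscrambleAll() name is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Unscrambler {
    // Key = the word's letters in alphabetical order, e.g. "dog" -> "dgo".
    static String createKey(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    static void addToMap(Map<String, String> map, String word) {
        map.put(createKey(word), word);
    }

    // Returns the dictionary word with the same letters, or null if there is none.
    static String getFromScrambledWord(Map<String, String> map, String scrambled) {
        return map.get(createKey(scrambled));
    }

    // Builds "scrambled real" lines, sorted alphabetically by the real word.
    static List<String> unscrambleAll(List<String> dictionary, List<String> scrambledWords) {
        Map<String, String> map = new HashMap<>();
        for (String word : dictionary) {
            addToMap(map, word);
        }
        List<String[]> pairs = new ArrayList<>();
        for (String scrambled : scrambledWords) {
            String real = getFromScrambledWord(map, scrambled);
            if (real != null) { // skip words missing from the dictionary
                pairs.add(new String[] {scrambled, real});
            }
        }
        pairs.sort((a, b) -> a[1].compareTo(b[1]));
        List<String> lines = new ArrayList<>();
        for (String[] pair : pairs) {
            lines.add(pair[0] + " " + pair[1]);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = unscrambleAll(
                Arrays.asList("dog", "bird", "cat"),
                Arrays.asList("rdib", "tca", "gdo"));
        lines.forEach(System.out::println); // rdib bird, tca cat, gdo dog (one per line)
    }
}
```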
I've deliberately made this not very object-oriented, because you're clearly a beginner. To make things more OO, make the Map a private field in a class of your own:
public class WordStore {
    private final Map<String,String> words = new HashMap<>();
    public void addWord(String word) {
        // your implementation here
    }
    public String getFromScrambled(String scrambledWord) {
        // your implementation here
    }
}
So the test would be more like:
WordStore store = new WordStore(); // the Map is inside the WordStore
store.addWord("dog"); // like addToMap()
assertEquals("dog", store.getFromScrambled("odg"));
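One way the WordStore skeleton could be filled in, as a sketch that reuses the sorted-letters key; the private createKey() helper is an assumption carried over from the earlier steps, and getFromScrambled() returns null when no stored word has the same letters:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class WordStore {
    // Maps each sorted-letter key (e.g. "dgo") to the dictionary word ("dog").
    private final Map<String, String> words = new HashMap<>();

    private static String createKey(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    public void addWord(String word) {
        words.put(createKey(word), word);
    }

    // Returns the matching dictionary word, or null if none has the same letters.
    public String getFromScrambled(String scrambledWord) {
        return words.get(createKey(scrambledWord));
    }
}
```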

Read a file and spit out the count of the words using HashMap and HashSet

This is my first experience in writing code using HashMap and HashSet and I am a little confused where to start from. I want to read a file and count the number of strings used but I have to do this using HashMap and HashSet.
Any ideas on where to start from?
So I will read the file and put the strings in an array and then read it from the array and putting them into a HashSet? Is this an idiotic idea?
The constraint is that the only O(n) operation in the program should be iterating through the text file.
Thank you for the contribution in increasing my knowledge ;)
First, read the entire file into a String object.
Then use the java.util.StringTokenizer class, which splits the text into words (tokens).
Read the tokens one by one and check each one as follows,
using the word as the key and its frequency as the value in the HashMap:
HashMap<String, Integer> map = new HashMap<>();
HashSet<String> set = new HashSet<>();
StringTokenizer st = new StringTokenizer(strObj);
while (st.hasMoreTokens()) {
    String s = st.nextToken();
    if (map.containsKey(s))
    {
        int count = map.get(s);
        map.put(s, count + 1);
    }
    else
    {
        // first occurrence of this word
        map.put(s, 1);
    }
    set.add(s); // the set ends up holding each distinct word once
}
You're close, but you can skip the middle man (that array).
You can use a HashMap<String, Integer> to map each string to its count.
What you need your program to do is:
1. Read the next string from the file.
2. Check whether that string exists in your HashMap:
   If it does, get the Integer that the String maps to, increment it, and put it back into the map.
   If it does not, put the String into the map with the Integer 1.
3. Repeat from step 1 until the whole file has been read.
4. Get the value collection from the HashMap: Collection<Integer> counts = map.values();
5. Sum the collection using streams: int sum = counts.stream().mapToInt(i -> i).sum();
6. Output the value of sum.
I'm sure you can figure out how to convert that to code yourself! :)
For more information, see the HashMap documentation (check out the values() method), the Stream documentation, and the documentation for mapToInt(), the funky bit of code in step 5.
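The steps above might translate to something like this sketch. It reads whitespace-separated tokens with a Scanner (pass a FileReader for a real file; the StringReader in main is only for illustration), so the text is traversed once and each map update is expected O(1). A HashSet of the distinct words is built from the map's keys to cover the HashSet requirement; the class name is illustrative:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

class WordCounter {
    // Steps 1-3: read each token once and update its count in the map.
    static Map<String, Integer> countWords(Reader source) {
        Map<String, Integer> map = new HashMap<>();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                String word = scanner.next();
                Integer count = map.get(word);
                map.put(word, count == null ? 1 : count + 1);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = countWords(new StringReader("the cat saw the dog"));
        // Steps 4-6: the total number of words is the sum of all counts.
        Collection<Integer> counts = map.values();
        int sum = counts.stream().mapToInt(i -> i).sum();
        Set<String> distinct = new HashSet<>(map.keySet());
        System.out.println(sum);             // 5
        System.out.println(distinct.size()); // 4
    }
}
```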
In addition to Sharad's answer, here is the part that reads from the file:
// I would have loved to use Integer, but it is immutable...
class Wrapper
{
    public Wrapper(int value)
    {
        this.value = value;
    }
    public int value;
}
HashMap<String, Wrapper> counts = new HashMap<String, Wrapper>();
Scanner scanner = new Scanner(new File(fileName));
while (scanner.hasNext())
{
    String token = scanner.next();
    Wrapper count = counts.get(token);
    if (count == null)
    {
        counts.put(token, new Wrapper(1));
    }
    else
    {
        // mutate the existing wrapper in place: no second put(), so no second hash lookup
        ++count.value;
    }
}
scanner.close();
I varied Sharad's algorithm a little: it doesn't have to calculate the hash value twice when the word is already in the map, and using generics saves you from having to cast.
If you only need the set of strings in the file, you can get it via counts.keySet().

Implementing the dictionary as a sorted singly linked list Java

I have to create a dictionary where you input a text file of 5 sentences and it takes the words in them and sorts them alphabetically using a singly linked list. I have the text file but really need help with making them into a linked list and sorting them. I understand how to create linked lists but I don't know how to create them from a text file and sort them. Any help would be appreciated.
import java.util.*;
public class Dictionary {
    public static void main(String[] args) {
        String[] things = {"a", "dog", "eats"};
        List<String> list1 = new LinkedList<String>();
        for (String x : things)
            list1.add(x);
        String[] things2 = {"The", "Cat", "Walks"};
        List<String> list2 = new LinkedList<String>();
        for (String y : things2)
            list2.add(y);
        list1.addAll(list2);
        list2 = null;
        printMe(list1);
        printMe(list1);
    }
    private static void printMe(List<String> l) {
        for (String b : l)
            System.out.printf("%s ", b);
        System.out.println();
    }
}
Try using a Scanner to read the input file
Well, the Scanner class has methods to iterate through tokens based on a pattern you supply.
You can supply the pattern (a regular expression) on each call to "hasNext(Pattern)" and "next(Pattern)", which match the next token against it, or call the "useDelimiter(Pattern)" method to change the pattern that separates tokens and then use the standard "hasNext()" and "next()" iterator methods.
If you don't set a delimiter, it uses this:
// A pattern for java whitespace
private static Pattern WHITESPACE_PATTERN = Pattern.compile(
    "\\p{javaWhitespace}+");
I won't get into the regular expressions here, but your general flow would be:
Scanner scanner = new Scanner(reader);
scanner.useDelimiter(Pattern.compile("some regex pattern")); // if you want something other than the default
while (scanner.hasNext()) {
    String word = scanner.next();
}
Most likely, reader would be an instance of java.io.FileReader. If you want better throughput for a large file, wrap the FileReader in a java.io.BufferedReader.
For sorting, you can either call Collections.sort() after adding all the words, or, while adding each word, iterate through the existing linked list with the ListIterator returned by the List.listIterator() method, find the first element that is lexicographically greater than the new word, step back with previous(), and insert the word before that element using the ListIterator.add() method.
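A sketch of the second option (insert in order while reading), assuming whitespace-separated words; class and method names are illustrative. The key detail is that ListIterator.add() inserts before the element that next() would return, so after finding the first greater element you step back with previous() before adding. Note that java.util.LinkedList is actually doubly linked, and String.compareTo() is case-sensitive, so capitalized words sort before lowercase ones (comparing with String.CASE_INSENSITIVE_ORDER would avoid that):

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedList;
import java.util.ListIterator;
import java.util.Scanner;

class SortedWordList {
    // Inserts word before the first element that is lexicographically greater,
    // so the list stays sorted after every insertion.
    static void insertSorted(LinkedList<String> list, String word) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next().compareTo(word) > 0) {
                it.previous(); // step back so add() lands before the greater element
                break;
            }
        }
        it.add(word); // appends at the end if no greater element was found
    }

    static LinkedList<String> readSorted(Reader reader) {
        LinkedList<String> words = new LinkedList<>();
        try (Scanner scanner = new Scanner(new BufferedReader(reader))) {
            while (scanner.hasNext()) { // default delimiter: whitespace
                insertSorted(words, scanner.next());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        LinkedList<String> words = readSorted(new StringReader("a dog eats The Cat Walks"));
        System.out.println(words); // [Cat, The, Walks, a, dog, eats]
    }
}
```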

Finding a loose match for a string in arraylist

I have a huge array list which contains 1000 entries, one of which is "world". And I have the word "big world". I want "big world" to be matched with "world" in the ArrayList.
What is the most cost-effective way of doing it? I cannot use the .contains method of ArrayList, and if I traverse all 1000 entries and match them by pattern, it's going to be very costly in terms of performance. I am using Java for this.
Could you please let me know what is the best way for this?
Cheers,
J
You can split up every single element of the ArrayList into words and stop as soon as you find one of them.
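A sketch of that idea under one interpretation: split the query ("big world") and each list entry into whitespace-separated words, and return the first entry that shares a word with the query. It is still a linear scan over the list, but it exits at the first hit; the names are illustrative:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LooseMatch {
    // Returns the first entry sharing at least one word with the query, or null.
    static String findLooseMatch(List<String> entries, String query) {
        Set<String> queryWords = new HashSet<>(Arrays.asList(query.trim().split("\\s+")));
        for (String entry : entries) {
            for (String word : entry.trim().split("\\s+")) {
                if (queryWords.contains(word)) {
                    return entry; // stop as soon as one word matches
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("hello there", "world", "big apple");
        System.out.println(findLooseMatch(entries, "big world")); // world
    }
}
```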
I suppose from your profile that you develop in Java; with Lucene you could easily do something like this:
public class NodesAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        Tokenizer tokenizer = new StandardTokenizer(reader)
        TokenFilter lowerCaseFilter = new LowerCaseFilter(tokenizer)
        TokenFilter stopFilter = new StopFilter(lowerCaseFilter, Data.stopWords.collect{ it.text } as String[])
        SnowballFilter snowballFilter = new SnowballFilter(stopFilter, new org.tartarus.snowball.ext.ItalianStemmer())
        return snowballFilter
    }
}
Analyzer analyzer = new NodesAnalyzer()
TokenStream ts = analyzer.tokenStream(null, new StringReader(str));
Token token = ts.next()
while (token != null) {
    String cur = token.term()
    token = ts.next();
}
Note: this is Groovy code that I copied from a personal project so you will have to translate things like Data.stopWords.collect{ it.text } as String[] to use with plain Java
Assuming you don't know the content of the ArrayList elements, you will have to traverse the whole ArrayList.
Traversing the ArrayList costs O(n).
Sorting the ArrayList wouldn't help, because you are searching for a string within a set of strings, and sorting would be even more expensive: O(n log n).
If you have to search the list repeatedly, it may make sense to use the sort() and binarySearch() methods of Collections.
Addendum: As noted by @user177883, the cost of an O(n log n) sort must be weighed against the benefit of subsequent O(log n) searches.
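A sketch of the sort-once, search-many approach (class name illustrative). Collections.binarySearch() returns a non-negative index only when the exact string is present, so this only helps for exact matches, not for loose or substring matches:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedSearch {
    private final List<String> sorted;

    // One O(n log n) sort up front...
    SortedSearch(List<String> entries) {
        sorted = new ArrayList<>(entries);
        Collections.sort(sorted);
    }

    // ...then each exact-match lookup is an O(log n) binary search.
    boolean contains(String word) {
        return Collections.binarySearch(sorted, word) >= 0;
    }

    public static void main(String[] args) {
        SortedSearch index = new SortedSearch(Arrays.asList("world", "apple", "heart"));
        System.out.println(index.contains("world")); // true
        System.out.println(index.contains("ear"));   // false: only exact matches are found
    }
}
```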
The word "heart" matches the [word] "ear".
As an exact match is insufficient, this approach would be inadequate.
I had a very similar issue.
I solved it with this if/else if statement:
if (myArrayList.contains(wordThatIsEntered)
        && wordThatCantBeMatched.equals(wordThatIsEntered)) {
    Toast.makeText(getApplicationContext(),
            "WORD CAN'T BE THE SAME OR THAT WORD ISN'T HERE",
            Toast.LENGTH_SHORT).show();
}
else if (myArrayList.contains(wordThatIsEntered)) {
    Toast.makeText(getApplicationContext(),
            "FOUND THE EXACT WORD YOU ARE LOOKING FOR!",
            Toast.LENGTH_SHORT).show();
}
